Merge tag 'v3.7.0' into aosp/main
2023-08-10 v3.7.0
This release includes new codec interfaces, compression efficiency and
perceptual quality improvements, speed and memory optimizations, and many
bug fixes.
This release is ABI compatible with the last release.
- New Features
* New codec controls (a usage sketch follows this section):
* AV1E_SET_QUANTIZER_ONE_PASS: Set the quantizer for each frame.
* AV1E_ENABLE_RATE_GUIDE_DELTAQ: Enable the rate-distribution-guided delta
quantization in all intra mode. The "enable-rate-guide-deltaq" option is
added for this control.
* AV1E_SET_RATE_DISTRIBUTION_INFO: Set the input file for the rate
distribution used in all intra mode. The "rate-distribution-info" option
is added for this control (a file-format sketch follows this message).
* AV1E_GET_LUMA_CDEF_STRENGTH
* AV1E_SET_BITRATE_ONE_PASS_CBR
* AOM_SCALING_MODE is extended to include 2/3 and 1/3 scaling.
* aom_tune_metric is extended to include AOM_TUNE_VMAF_SALIENCY_MAP.
The "tune" option is extended to include "vmaf_saliency_map".
* SVC example encoder svc_encoder_rtc is able to use the rate control
library.
* Loopfilter level and CDEF filter level are now supported by the RTC rate
control library.
* New speed (--cpu-used) 11, intended for RTC screen sharing, added for
faster encoding with ~3% BDrate loss and a 16% IC (instruction count)
speedup compared to speed 10.
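
As a usage sketch for the new controls (not authoritative; assumes "codec"
is an aom_codec_ctx_t already initialized for 1-pass CBR encoding):

    #include "aom/aom_encoder.h"
    #include "aom/aomcx.h"

    // Update the target bitrate (kilobits per second) without a full
    // aom_codec_enc_config_set() round trip (1-pass CBR, single layer).
    aom_codec_control(&codec, AV1E_SET_BITRATE_ONE_PASS_CBR, 600u);

    // Force a specific quantizer in [0, 63] for the next frame; per the
    // control's documentation this turns off cyclic refresh (1-pass only).
    aom_codec_control(&codec, AV1E_SET_QUANTIZER_ONE_PASS, 40);
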
- Compression Efficiency Improvements
* Improved VoD encoding performance
* 0.1-0.6% BDrate gains for encoding speeds 2 to 6
* Rate control accuracy improvement in VBR mode
* RTC encoding improvements
* Screen content mode: 10-19% BDrate gains for speeds 6-10
* Temporal layers video mode, for speed 10:
* 2 temporal layers on low resolutions: 13-15% BDrate gain
* 3 temporal layers on VGA/HD: 3-4% BDrate gain
- Perceptual Quality Improvements
* Fixed multiple block and color artifacts for RTC screen content by:
* Incorporating color into the RD cost for IDTX
* Reducing thresholds for palette mode in non-RD mode
* Allowing more palette mode testing
* Improved color sensitivity for altref in non-RD mode.
* Reduced video flickering for temporal layer encoding.
- Speedup and Memory Optimizations
* Sped up the VoD encoder
* 2-5% for encoding speed 2 to 4
* 9-15% for encoding speed 5 to 6
* ARM
* Standard bitdepth
* speed 5: +31%
* speed 4: +2%
* speed 3: +9%
* speed 2: +157%
* High bitdepth
* speed 5: +85%
* RTC speedups
* Screen content mode
* 15% IC speedup for speeds 6-8
* ARM: 7% for speed 9, 3% for speed 10
* Temporal layers video mode
* 7% speedup for 3 temporal layers on VGA/HD, for speed 10
* Single layer video
* x86: 2% IC speedup for speeds 7-10
* ARM: 2-4% speedup across speeds 5-10
- Other Improvements
* VoD: Major improvements to global motion estimation, now enabled up to
speed 4
* RTC
* Fixes to make lossless coding work.
* Fixes to make frame dropper (--drop_frames) work for single and temporal
layers.
* Improvements to RPS (reference picture selection) recovery frames.
* Improvements to rate control for temporal layers.
* libwebm is updated to libwebm-1.0.0.29-9-g1930e3c
- Bug Fixes
* aomedia:3261 Assertion failed when encoding av1 with film grain and
'--monochrome' flag
* aomedia:3276 ensure all allocations are checked (partial fix)
* aomedia:3451 The libaom library calls exit()
* aomedia:3450 enable -Wshadow for C++ sources
* aomedia:3449 Test Seg Faults After
b459af3e345be402db052a143fcc5383d4b74cbd
* aomedia:3416 prune unused symbols / restrict symbol visibility
* aomedia:3443 Jenkins failure:
UninstantiatedParameterizedTestSuite<EstimateNoiseTest>
* aomedia:3434 realtime failures with CONFIG_BITSTREAM_DEBUG=1
* aomedia:3433 DeltaqModeTest crash w/row_mt=0
* aomedia:3429 Encoder crash when turn on both ExternalResize and
g_threads > 2
* aomedia:3438 Build failure with
`-DSANITIZE=address -DBUILD_SHARED_LIBS=ON` when using clang.
* aomedia:3435 Block artifacts when scrolling with AV1 in screen sharing
scenarios
* aomedia:3170 vmaf tune presets produce extreme glitches in one scene
* aomedia:3401 Building shared libaom with MSVC results in a race condition
with the export library
* aomedia:3420 Floating point exception in av1_tpl_get_frame_importance()
* aomedia:3424 heap-buffer-overflow in ScaleFilterCols_16_C() (SIGABRT)
* aomedia:3417 examples/svc_encoder_rtc.c is using internal macros and
functions
* aomedia:3372 SEGV in assign_frame_buffer_p av1_common_int.h
* aomedia:3130 'cpu-features.h' file not found on Android NDK 22
* aomedia:3415 Encoder/decoder mismatch for svc_encoder_rtc running
1 SL 3 TL
* aomedia:3412 Lossless Mode Fails Loopback Bit Test
* aomedia:3409 The use of AV1_VAR_OFFS in av1/encoder/var_based_part.c is
incorrect for high bit depths
* aomedia:3403 test_libaom fails with error message
"feenableexcept() failed" on Linux arm
* aomedia:3370 Random color block at fast motion area
* aomedia:3393 Assertion failure in av1_convolve_2d_sr_c()
* aomedia:3392 Strong artifacting for high bit-depth real-time
* aomedia:3376 aomenc --threads=10 --deltaq-mode=3 crashes after
"Allintra: multi-threading of calculating differential contrast"
* aomedia:3380 Crashes and ASan and TSan errors in deltaq-mode=3
multithreading code
* chromium:1410766 heap-buffer-overflow in aom_yv12_copy_v_c
* Cannot set level via AV1E_SET_TARGET_SEQ_LEVEL_IDX
* Encoding failure due to the use of loop restoration with unintended use of
lossless mode.
* Signed integer overflow in scan_past_frames
* Signed integer overflow in update_a_sep_sym
* Flickering in AV1 1440p/2160p HDR transcodes
* Fixed artifacts with screen share at encoder speed 10
* Fixed prediction setup for IDTX
Bug: 299684368
Test: atest CtsMediaV2TestCases
(cherry picked from https://android-review.googlesource.com/q/commit:eb47d839a7b1731d294f150ee256cec1546958b3)
Merged-In: Ic153bff94ca4bdb8f60f2769026615cd10e07bac
Change-Id: Ic153bff94ca4bdb8f60f2769026615cd10e07bac
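
For AV1E_SET_RATE_DISTRIBUTION_INFO, the aomcx.h hunk below documents the
expected input: a text file of (rows x cols) whitespace-separated floats,
one per 16x16 block, with rows = (frame_height + 15) / 16 and
cols = (frame_width + 15) / 16. A minimal sketch of producing such a file
(uniform, illustrative values; the helper name is hypothetical):

    #include <stdio.h>

    static int write_rate_distribution(const char *path, int frame_width,
                                       int frame_height,
                                       float bits_per_block) {
      const int rows = (frame_height + 15) / 16;
      const int cols = (frame_width + 15) / 16;
      FILE *f = fopen(path, "w");
      if (!f) return -1;
      for (int r = 0; r < rows; ++r) {
        // One row of per-block bit estimates; the newline also counts as
        // whitespace separation.
        for (int c = 0; c < cols; ++c) fprintf(f, "%f ", bits_per_block);
        fprintf(f, "\n");
      }
      return fclose(f);  // 0 on success, EOF on write failure.
    }

The file is then passed with --rate-distribution-info=rate_distribution.txt
together with --deltaq-mode=3 and --enable-rate-guide-deltaq=1.
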
diff --git a/.mailmap b/.mailmap
index 7ee51a4..7d31a70 100644
--- a/.mailmap
+++ b/.mailmap
@@ -9,6 +9,8 @@
Arild Fuldseth <[email protected]> <[email protected]>
Aasaipriya Chandran <[email protected]>
Aasaipriya Chandran <[email protected]> Aasaipriya C <[email protected]>
+Apurve Pandey <[email protected]>
+Apurve Kumar Pandey <[email protected]> Apurve Pandey
Bohan Li <[email protected]>
Changjun Yang <[email protected]>
Chi Yo Tsai <[email protected]>
@@ -53,8 +55,11 @@
Michael Horowitz <[email protected]> <[email protected]>
Mingliang Chen <[email protected]>
Monty Montgomery <[email protected]>
+Mudassir Galaganath <[email protected]>
+Mudassir Galaganath <[email protected]> Mudassir Galagnath
Nathan E. Egge <[email protected]>
Nathan E. Egge <[email protected]> <[email protected]>
+Onur Guleryuz <[email protected]>
Pascal Massimino <[email protected]>
Pascal Massimino <[email protected]> <[email protected]>
Paul Wilkins <[email protected]>
diff --git a/AUTHORS b/AUTHORS
index 3695891..79056a1 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -26,7 +26,7 @@
Aniket Wanare <[email protected]>
Ankur Saxena <[email protected]>
Anupam Pandey <[email protected]>
-Apurve Pandey <[email protected]>
+Apurve Kumar Pandey <[email protected]>
Arild Fuldseth <[email protected]>
Aron Rosenberg <[email protected]>
Arun Singh Negi <[email protected]>
@@ -36,6 +36,7 @@
Brennan Shacklett <[email protected]>
Brion Vibber <[email protected]>
Bruno Berthier <[email protected]>
+Casey Smalley <[email protected]>
Changjun Yang <[email protected]>
Charles 'Buck' Krasic <[email protected]>
Cheng Chen <[email protected]>
@@ -80,6 +81,7 @@
Fritz Koenig <[email protected]>
Fyodor Kyslov <[email protected]>
Gaute Strokkenes <[email protected]>
+George Steed <[email protected]>
Gerda Zsejke More <[email protected]>
Geza Lore <[email protected]>
Ghislain MARY <[email protected]>
@@ -140,6 +142,7 @@
Katsuhisa Yuasa <[email protected]>
Kavi Ramamurthy <[email protected]>
KO Myung-Hun <[email protected]>
+Konstantinos Margaritis <[email protected]>
Krishna Malladi <[email protected]>
Kwanghoon Son <[email protected]>
Kyle Siefring <[email protected]>
@@ -163,6 +166,7 @@
Makoto Kato <[email protected]>
Mans Rullgard <[email protected]>
Marco Paniconi <[email protected]>
+Mark Horvath <[email protected]>
Mark Mentovai <[email protected]>
Mark Wachsler <[email protected]>
Martin Ettl <[email protected]>
@@ -184,8 +188,9 @@
Mirko Bonadei <[email protected]>
Monty Montgomery <[email protected]>
Morton Jonuschat <[email protected]>
-Mudassir Galagnath <[email protected]>
+Mudassir Galaganath <[email protected]>
Mufaddal Chakera <[email protected]>
+Narayan Kalaburgi <[email protected]>
Narayan <[email protected]>
Nathan E. Egge <[email protected]>
Neeraj Gadgil <[email protected]>
@@ -195,6 +200,7 @@
Nithya V S <[email protected]>
Ola Hugosson <[email protected]>
Oleg Nalivayko <[email protected]>
+Onur Guleryuz <[email protected]>
Parag Salasakar <[email protected]>
Pascal Massimino <[email protected]>
Patrik Westin <[email protected]>
@@ -232,6 +238,7 @@
Ryan Overbeck <[email protected]>
Sachin Kumar Garg <[email protected]>
Sai Deng <[email protected]>
+Salome Thirot <[email protected]>
Sami Boukortt <[email protected]>
Sami Pietilä <[email protected]>
Samuel Thibault <[email protected]>
@@ -298,6 +305,7 @@
Yingying Ma <[email protected]>
Yongzhe Wang <[email protected]>
Yuan Tong <[email protected]>
+Yu-Chen (Eric) Sun <[email protected]>
Yue Chen <[email protected]>
Yunqing Wang <[email protected]>
Yury Gitman <[email protected]>
diff --git a/Android.bp b/Android.bp
index 7f37b93..c7cb621 100644
--- a/Android.bp
+++ b/Android.bp
@@ -30,6 +30,7 @@
"av1/common/arm/cdef_block_neon.c",
"av1/common/arm/cfl_neon.c",
"av1/common/arm/convolve_neon.c",
+ "av1/common/arm/highbd_convolve_neon.c",
"av1/common/arm/highbd_inv_txfm_neon.c",
"av1/common/arm/jnt_convolve_neon.c",
"av1/common/arm/reconinter_neon.c",
@@ -164,6 +165,7 @@
"av1/encoder/arm/neon/av1_error_neon.c",
"av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c",
"av1/encoder/arm/neon/av1_highbd_quantize_neon.c",
+ "av1/encoder/arm/neon/av1_k_means_neon.c",
"av1/encoder/arm/neon/encodetxb_neon.c",
"av1/encoder/arm/neon/highbd_fwd_txfm_neon.c",
"av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c",
@@ -171,6 +173,7 @@
"av1/encoder/arm/neon/picksrt_neon.c",
"av1/encoder/arm/neon/quantize_neon.c",
"av1/encoder/arm/neon/rdopt_neon.c",
+ "av1/encoder/arm/neon/reconinter_enc_neon.c",
"av1/encoder/arm/neon/temporal_filter_neon.c",
"av1/encoder/arm/neon/wedge_utils_neon.c",
]
@@ -252,6 +255,7 @@
"av1/encoder/ml.c",
"av1/encoder/motion_search_facade.c",
"av1/encoder/mv_prec.c",
+ "av1/encoder/nonrd_opt.c",
"av1/encoder/nonrd_pickmode.c",
"av1/encoder/palette.c",
"av1/encoder/partition_search.c",
@@ -282,11 +286,25 @@
"third_party/vector/vector.c",
]
-aom_av1_rc_qmode_sources = [
- "av1/qmode_rc/ducky_encode.cc",
- "av1/qmode_rc/ratectrl_qmode.cc",
- "av1/qmode_rc/ratectrl_qmode_interface.cc",
- "av1/qmode_rc/reference_manager.cc",
+aom_av1_rc_sources = [
+ "av1/ratectrl_rtc.cc",
+]
+
+aom_common_app_util_sources = [
+ "av1/arg_defs.c",
+ "common/args.c",
+ "common/args_helper.c",
+ "common/av1_config.c",
+ "common/ivfdec.c",
+ "common/md5_utils.c",
+ "common/rawenc.c",
+ "common/tools_common.c",
+ "common/y4menc.c",
+]
+
+aom_decoder_app_util_sources = [
+ "common/obudec.c",
+ "common/video_reader.c",
]
aom_dsp_common_asm_sse2 = [
@@ -316,7 +334,9 @@
]
aom_dsp_common_intrin_neon = [
+ "aom_dsp/arm/aom_convolve8_neon.c",
"aom_dsp/arm/aom_convolve_copy_neon.c",
+ "aom_dsp/arm/avg_pred_neon.c",
"aom_dsp/arm/blend_a64_mask_neon.c",
"aom_dsp/arm/fwd_txfm_neon.c",
"aom_dsp/arm/highbd_intrapred_neon.c",
@@ -422,10 +442,18 @@
aom_dsp_encoder_intrin_neon = [
"aom_dsp/arm/avg_neon.c",
"aom_dsp/arm/hadamard_neon.c",
+ "aom_dsp/arm/highbd_avg_neon.c",
+ "aom_dsp/arm/highbd_hadamard_neon.c",
"aom_dsp/arm/highbd_quantize_neon.c",
+ "aom_dsp/arm/highbd_sad4d_neon.c",
+ "aom_dsp/arm/highbd_sad_neon.c",
"aom_dsp/arm/highbd_variance_neon.c",
- "aom_dsp/arm/sad4d_neon.c",
+ "aom_dsp/arm/masked_sad4d_neon.c",
+ "aom_dsp/arm/masked_sad_neon.c",
+ "aom_dsp/arm/obmc_sad_neon.c",
+ "aom_dsp/arm/obmc_variance_neon.c",
"aom_dsp/arm/sad_neon.c",
+ "aom_dsp/arm/sadxd_neon.c",
"aom_dsp/arm/sse_neon.c",
"aom_dsp/arm/subpel_variance_neon.c",
"aom_dsp/arm/sum_squares_neon.c",
@@ -441,6 +469,7 @@
"aom_dsp/x86/highbd_quantize_intrin_sse2.c",
"aom_dsp/x86/highbd_subtract_sse2.c",
"aom_dsp/x86/highbd_variance_sse2.c",
+ "aom_dsp/x86/jnt_sad_sse2.c",
"aom_dsp/x86/quantize_sse2.c",
"aom_dsp/x86/sum_squares_sse2.c",
"aom_dsp/x86/variance_sse2.c",
@@ -448,6 +477,7 @@
aom_dsp_encoder_intrin_sse4_1 = [
"aom_dsp/flow_estimation/x86/corner_match_sse4.c",
+ "aom_dsp/flow_estimation/x86/disflow_sse4.c",
"aom_dsp/x86/avg_intrin_sse4.c",
"aom_dsp/x86/highbd_variance_sse4.c",
"aom_dsp/x86/obmc_sad_sse4.c",
@@ -456,7 +486,6 @@
]
aom_dsp_encoder_intrin_ssse3 = [
- "aom_dsp/x86/jnt_sad_ssse3.c",
"aom_dsp/x86/jnt_variance_ssse3.c",
"aom_dsp/x86/masked_sad4d_ssse3.c",
"aom_dsp/x86/masked_sad_intrin_ssse3.c",
@@ -481,6 +510,7 @@
"aom_dsp/noise_model.c",
"aom_dsp/noise_util.c",
"aom_dsp/psnr.c",
+ "aom_dsp/pyramid.c",
"aom_dsp/quantize.c",
"aom_dsp/sad.c",
"aom_dsp/sad_av1.c",
@@ -490,11 +520,28 @@
"aom_dsp/variance.c",
]
+aom_encoder_app_util_sources = [
+ "common/ivfenc.c",
+ "common/video_writer.c",
+ "common/warnings.c",
+ "common/y4minput.c",
+ "examples/encoder_util.c",
+]
+
aom_encoder_stats_sources = [
"stats/aomstats.c",
"stats/rate_hist.c",
]
+aom_libwebm_sources = [
+ "third_party/libwebm/common/hdr_util.cc",
+ "third_party/libwebm/mkvmuxer/mkvmuxer.cc",
+ "third_party/libwebm/mkvmuxer/mkvmuxerutil.cc",
+ "third_party/libwebm/mkvmuxer/mkvwriter.cc",
+ "third_party/libwebm/mkvparser/mkvparser.cc",
+ "third_party/libwebm/mkvparser/mkvreader.cc",
+]
+
aom_mem_sources = [
"aom_mem/aom_mem.c",
]
@@ -503,14 +550,6 @@
"aom_ports/float.asm",
]
-aom_rc_interface_sources = [
- "common/y4minput.c",
- "test/decode_test_driver.cc",
- "test/encode_test_driver.cc",
- "test/ratectrl_rtc_test.cc",
- "test/test_aom_rc_interface.cc",
-]
-
aom_rtcd_sources = [
"aom_dsp/aom_dsp_rtcd.c",
"aom_scale/aom_scale_rtcd.c",
@@ -534,7 +573,6 @@
aom_util_sources = [
"aom_util/aom_thread.c",
- "aom_util/debug_util.c",
]
aom_webm_decoder_sources = [
@@ -545,13 +583,6 @@
"common/webmenc.cc",
]
-av1_rc_qmode_sources = [
- "common/tools_common.c",
- "common/y4minput.c",
- "test/ducky_encode_test.cc",
- "test/ratectrl_qmode_test.cc",
-]
-
aom_rtcd_sources_gen = [
]
@@ -562,10 +593,6 @@
aom_version_sources_gen = [
]
-av1_rc_qmode_sources_gen = [
- "gen_src/usage_exit.c",
-]
-
aom_av1_common_sources += ["common/av1_config.c"]
package {
diff --git a/CHANGELOG b/CHANGELOG
index 531c6d9..f35903d 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,133 @@
+2023-08-10 v3.7.0
+ This release includes new codec interfaces, compression efficiency and
+ perceptual quality improvements, speed and memory optimizations, and many bug fixes.
+ This release is ABI compatible with the last release.
+
+ - New Features
+ * New codec controls:
+ * AV1E_SET_QUANTIZER_ONE_PASS: Set the quantizer for each frame.
+ * AV1E_ENABLE_RATE_GUIDE_DELTAQ: Enable the rate-distribution-guided delta
+ quantization in all intra mode. The "enable-rate-guide-deltaq" option is
+ added for this control.
+ * AV1E_SET_RATE_DISTRIBUTION_INFO: Set the input file for the rate
+ distribution used in all intra mode. The "rate-distribution-info" option
+ is added for this control.
+ * AV1E_GET_LUMA_CDEF_STRENGTH
+ * AV1E_SET_BITRATE_ONE_PASS_CBR
+ * AOM_SCALING_MODE is extended to include 2/3 and 1/3 scaling.
+ * aom_tune_metric is extended to include AOM_TUNE_VMAF_SALIENCY_MAP.
+ The "tune" option is extended to include "vmaf_saliency_map".
+ * SVC example encoder svc_encoder_rtc is able to use the rate control
+ library.
+ * Loopfilter level and CDEF filter level are now supported by the RTC rate
+ control library.
+ * New speed (--cpu-used) 11, intended for RTC screen sharing, added for
+ faster encoding with ~3% BDrate loss and a 16% IC (instruction count)
+ speedup compared to speed 10.
+
+ - Compression Efficiency Improvements
+ * Improved VoD encoding performance
+ * 0.1-0.6% BDrate gains for encoding speeds 2 to 6
+ * Rate control accuracy improvement in VBR mode
+ * RTC encoding improvements
+ * Screen content mode: 10-19% BDrate gains for speeds 6-10
+ * Temporal layers video mode, for speed 10:
+ * 2 temporal layers on low resolutions: 13-15% BDrate gain
+ * 3 temporal layers on VGA/HD: 3-4% BDrate gain
+
+ - Perceptual Quality Improvements
+ * Fixed multiple block and color artifacts for RTC screen content by:
+ * Incorporating color into the RD cost for IDTX
+ * Reducing thresholds for palette mode in non-RD mode
+ * Allowing more palette mode testing
+ * Improved color sensitivity for altref in non-RD mode.
+ * Reduced video flickering for temporal layer encoding.
+
+ - Speedup and Memory Optimizations
+ * Sped up the VoD encoder
+ * 2-5% for encoding speed 2 to 4
+ * 9-15% for encoding speed 5 to 6
+ * ARM
+ * Standard bitdepth
+ * speed 5: +31%
+ * speed 4: +2%
+ * speed 3: +9%
+ * speed 2: +157%
+ * High bitdepth
+ * speed 5: +85%
+ * RTC speedups
+ * Screen content mode
+ * 15% IC speedup for speeds 6-8
+ * ARM: 7% for speed 9, 3% for speed 10
+ * Temporal layers video mode
+ * 7% speedup for 3 temporal layers on VGA/HD, for speed 10
+ * Single layer video
+ * x86: 2% IC speedup for speeds 7-10
+ * ARM: 2-4% speedup across speeds 5-10
+
+ - Other Improvements
+ * VoD: Major improvements to global motion estimation, now enabled up to
+ speed 4
+ * RTC
+ * Fixes to make lossless coding work.
+ * Fixes to make frame dropper (--drop_frames) work for single and temporal
+ layers.
+ * Improvements to RPS (reference picture selection) recovery frames.
+ * Improvements to rate control for temporal layers.
+ * libwebm is updated to libwebm-1.0.0.29-9-g1930e3c
+
+ - Bug Fixes
+ * aomedia:3261 Assertion failed when encoding av1 with film grain and
+ '--monochrome' flag
+ * aomedia:3276 ensure all allocations are checked (partial fix)
+ * aomedia:3451 The libaom library calls exit()
+ * aomedia:3450 enable -Wshadow for C++ sources
+ * aomedia:3449 Test Seg Faults After
+ b459af3e345be402db052a143fcc5383d4b74cbd
+ * aomedia:3416 prune unused symbols / restrict symbol visibility
+ * aomedia:3443 Jenkins failure:
+ UninstantiatedParameterizedTestSuite<EstimateNoiseTest>
+ * aomedia:3434 realtime failures with CONFIG_BITSTREAM_DEBUG=1
+ * aomedia:3433 DeltaqModeTest crash w/row_mt=0
+ * aomedia:3429 Encoder crash when turn on both ExternalResize and
+ g_threads > 2
+ * aomedia:3438 Build failure with
+ `-DSANITIZE=address -DBUILD_SHARED_LIBS=ON` when using clang.
+ * aomedia:3435 Block artifacts when scrolling with AV1 in screen sharing
+ scenarios
+ * aomedia:3170 vmaf tune presets produce extreme glitches in one scene
+ * aomedia:3401 Building shared libaom with MSVC results in a race condition
+ with the export library
+ * aomedia:3420 Floating point exception in av1_tpl_get_frame_importance()
+ * aomedia:3424 heap-buffer-overflow in ScaleFilterCols_16_C() (SIGABRT)
+ * aomedia:3417 examples/svc_encoder_rtc.c is using internal macros and
+ functions
+ * aomedia:3372 SEGV in assign_frame_buffer_p av1_common_int.h
+ * aomedia:3130 'cpu-features.h' file not found on Android NDK 22
+ * aomedia:3415 Encoder/decoder mismatch for svc_encoder_rtc running
+ 1 SL 3 TL
+ * aomedia:3412 Lossless Mode Fails Loopback Bit Test
+ * aomedia:3409 The use of AV1_VAR_OFFS in av1/encoder/var_based_part.c is
+ incorrect for high bit depths
+ * aomedia:3403 test_libaom fails with error message
+ "feenableexcept() failed" on Linux arm
+ * aomedia:3370 Random color block at fast motion area
+ * aomedia:3393 Assertion failure in av1_convolve_2d_sr_c()
+ * aomedia:3392 Strong artifacting for high bit-depth real-time
+ * aomedia:3376 aomenc --threads=10 --deltaq-mode=3 crashes after
+ "Allintra: multi-threading of calculating differential contrast"
+ * aomedia:3380 Crashes and ASan and TSan errors in deltaq-mode=3
+ multithreading code
+ * chromium:1410766 heap-buffer-overflow in aom_yv12_copy_v_c
+ * Cannot set level via AV1E_SET_TARGET_SEQ_LEVEL_IDX
+ * Encoding failure due to the use of loop restoration with unintended use of
+ lossless mode.
+ * Signed integer overflow in scan_past_frames
+ * Signed integer overflow in update_a_sep_sym
+ * Flickering in AV1 1440p/2160p HDR transcodes
+ * Fixed artifacts with screen share at encoder speed 10
+ * Fixed prediction setup for IDTX
+
2023-05-08 v3.6.1
This release includes several bug fixes. This release is ABI
compatible with the last release. See
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 87d88fa..8f459f3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -11,7 +11,7 @@
if(CONFIG_TFLITE)
cmake_minimum_required(VERSION 3.11)
else()
- cmake_minimum_required(VERSION 3.7)
+ cmake_minimum_required(VERSION 3.9)
endif()
set(AOM_ROOT "${CMAKE_CURRENT_SOURCE_DIR}")
@@ -41,6 +41,13 @@
endif()
endif()
+if(MSVC AND MSVC_VERSION LESS 1920)
+ message(
+ WARNING
+ "MSVC versions prior to 2019 (v16) are not supported and may generate"
+ " incorrect code!")
+endif()
+
# Library version info. Update LT_CURRENT, LT_REVISION and LT_AGE when making a
# public release by following the guidelines in the libtool document:
# https://www.gnu.org/software/libtool/manual/libtool.html#Updating-version-info
@@ -51,9 +58,9 @@
# passed to libtool.
#
# We set SO_FILE_VERSION = [c-a].a.r
-set(LT_CURRENT 9)
-set(LT_REVISION 1)
-set(LT_AGE 6)
+set(LT_CURRENT 10)
+set(LT_REVISION 0)
+set(LT_AGE 7)
math(EXPR SO_VERSION "${LT_CURRENT} - ${LT_AGE}")
set(SO_FILE_VERSION "${SO_VERSION}.${LT_AGE}.${LT_REVISION}")
unset(LT_CURRENT)
@@ -210,13 +217,9 @@
include_directories(${AOM_ROOT} ${AOM_CONFIG_DIR} ${AOM_ROOT}/apps
${AOM_ROOT}/common ${AOM_ROOT}/examples ${AOM_ROOT}/stats)
-if(CONFIG_RUNTIME_CPU_DETECT AND ANDROID_NDK)
- include_directories(${ANDROID_NDK}/sources/android/cpufeatures)
-endif()
-
# Targets
add_library(aom_version ${AOM_VERSION_SOURCES})
-add_dummy_source_file_to_target(aom_version c)
+add_no_op_source_file_to_target(aom_version c)
add_custom_command(OUTPUT "${AOM_CONFIG_DIR}/config/aom_version.h"
COMMAND ${CMAKE_COMMAND} ARGS
-DAOM_CONFIG_DIR=${AOM_CONFIG_DIR}
@@ -270,10 +273,26 @@
set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_encoder_stats)
endif()
-add_library(aom ${AOM_SOURCES} $<TARGET_OBJECTS:aom_rtcd>)
+# Xcode generator cannot take a library composed solely of objects. See
+# https://gitlab.kitware.com/cmake/cmake/-/issues/17500
+if(XCODE)
+ set(target_objs_aom ${AOM_SOURCES})
+else()
+ add_library(aom_obj OBJECT ${AOM_SOURCES})
+ set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_obj)
+ set(target_objs_aom $<TARGET_OBJECTS:aom_obj>)
+endif()
+add_library(aom ${target_objs_aom} $<TARGET_OBJECTS:aom_rtcd>)
+
if(BUILD_SHARED_LIBS)
- add_library(aom_static STATIC ${AOM_SOURCES} $<TARGET_OBJECTS:aom_rtcd>)
+ add_library(aom_static STATIC ${target_objs_aom} $<TARGET_OBJECTS:aom_rtcd>)
set_target_properties(aom_static PROPERTIES OUTPUT_NAME aom)
+ if(MSVC OR (WIN32 AND NOT MINGW))
+ # Fix race condition on the export library file between the two versions.
+ # Affects MSVC in all three flavors (stock, Clang/CL, LLVM-- the latter sets
+ # MSVC and MINGW both to FALSE).
+ set_target_properties(aom PROPERTIES ARCHIVE_OUTPUT_NAME "aom_dll")
+ endif()
if(NOT MSVC)
# Extract version string and set VERSION/SOVERSION for the aom target.
@@ -304,7 +323,7 @@
endif()
endif()
-if(CONFIG_AV1_RC_RTC AND CONFIG_AV1_ENCODER AND NOT BUILD_SHARED_LIBS)
+if(CONFIG_AV1_ENCODER AND NOT CONFIG_REALTIME_ONLY AND NOT BUILD_SHARED_LIBS)
list(APPEND AOM_AV1_RC_SOURCES "${AOM_ROOT}/av1/ratectrl_rtc.h"
"${AOM_ROOT}/av1/ratectrl_rtc.cc")
add_library(aom_av1_rc ${AOM_AV1_RC_SOURCES})
@@ -312,33 +331,13 @@
if(NOT WIN32 AND NOT APPLE)
target_link_libraries(aom_av1_rc ${AOM_LIB_LINK_TYPE} m)
endif()
-endif()
-
-if(CONFIG_AV1_ENCODER AND NOT CONFIG_REALTIME_ONLY AND NOT BUILD_SHARED_LIBS)
- list(APPEND AOM_AV1_RC_QMODE_SOURCES
- "${AOM_ROOT}/av1/qmode_rc/ratectrl_qmode_interface.h"
- "${AOM_ROOT}/av1/qmode_rc/ratectrl_qmode_interface.cc"
- "${AOM_ROOT}/av1/qmode_rc/reference_manager.h"
- "${AOM_ROOT}/av1/qmode_rc/reference_manager.cc"
- "${AOM_ROOT}/av1/qmode_rc/ratectrl_qmode.h"
- "${AOM_ROOT}/av1/qmode_rc/ratectrl_qmode.cc"
- "${AOM_ROOT}/av1/qmode_rc/ducky_encode.h"
- "${AOM_ROOT}/av1/qmode_rc/ducky_encode.cc")
- add_library(av1_rc_qmode ${AOM_AV1_RC_QMODE_SOURCES})
- target_link_libraries(av1_rc_qmode ${AOM_LIB_LINK_TYPE} aom)
- if(NOT MSVC AND NOT APPLE)
- target_link_libraries(av1_rc_qmode ${AOM_LIB_LINK_TYPE} m)
- endif()
- set_target_properties(av1_rc_qmode PROPERTIES LINKER_LANGUAGE CXX)
+ set_target_properties(aom_av1_rc PROPERTIES LINKER_LANGUAGE CXX)
endif()
# List of object and static library targets.
set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_rtcd aom_mem aom_scale aom)
-if(CONFIG_AV1_RC_RTC AND CONFIG_AV1_ENCODER AND NOT BUILD_SHARED_LIBS)
- set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_av1_rc)
-endif()
if(CONFIG_AV1_ENCODER AND NOT CONFIG_REALTIME_ONLY AND NOT BUILD_SHARED_LIBS)
- set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} av1_rc_qmode)
+ set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_av1_rc)
endif()
if(BUILD_SHARED_LIBS)
set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} aom_static)
@@ -362,13 +361,13 @@
endif()
endforeach()
-# Generate C/C++ stub files containing the function usage_exit(). Users of the
+# Generate a C file containing the function usage_exit(). Users of the
# aom_common_app_util library must define this function. This is a convenience
# to allow omission of the function from applications that might want to use
# other pieces of the util support without defining usage_exit().
-file(WRITE "${AOM_GEN_SRC_DIR}/usage_exit.c" "void usage_exit(void) {}")
-file(WRITE "${AOM_GEN_SRC_DIR}/usage_exit.cc"
- "extern \"C\" void usage_exit(void) {}")
+file(WRITE "${AOM_GEN_SRC_DIR}/usage_exit.c"
+ "#include <stdlib.h>\n\n#include \"common/tools_common.h\"\n\n"
+ "void usage_exit(void) { exit(EXIT_FAILURE); }\n")
#
# Application and application support targets.
@@ -461,7 +460,7 @@
if(CONFIG_LIBYUV OR CONFIG_TUNE_BUTTERAUGLI)
add_library(yuv OBJECT ${AOM_LIBYUV_SOURCES})
if(NOT MSVC)
- target_compile_options(yuv PRIVATE -Wno-unused-parameter)
+ target_compile_options(yuv PRIVATE -Wno-shadow)
endif()
include_directories("${AOM_ROOT}/third_party/libyuv/include")
endif()
@@ -495,7 +494,7 @@
$<TARGET_OBJECTS:aom_common_app_util>
$<TARGET_OBJECTS:aom_encoder_app_util>)
- add_executable(svc_encoder_rtc "${AOM_ROOT}/examples/svc_encoder_rtc.c"
+ add_executable(svc_encoder_rtc "${AOM_ROOT}/examples/svc_encoder_rtc.cc"
$<TARGET_OBJECTS:aom_common_app_util>
$<TARGET_OBJECTS:aom_encoder_app_util>)
@@ -634,15 +633,15 @@
if(PKG_CONFIG_FOUND)
pkg_check_modules(VMAF REQUIRED libvmaf)
if(BUILD_SHARED_LIBS)
- target_link_libraries(aom PRIVATE ${VMAF_LDFLAGS} ${VMAF_LIBRARIES})
- else()
- target_link_libraries(aom
- PRIVATE ${VMAF_LDFLAGS} ${VMAF_LIBRARIES} -static)
+ target_link_libraries(aom_static
+ PRIVATE ${VMAF_LDFLAGS} ${VMAF_LIBRARIES})
endif()
- target_include_directories(aom PRIVATE ${VMAF_INCLUDE_DIRS})
+ target_link_libraries(aom PRIVATE ${VMAF_LDFLAGS} ${VMAF_LIBRARIES})
target_include_directories(aom_dsp_encoder PRIVATE ${VMAF_INCLUDE_DIRS})
if(VMAF_CFLAGS)
- append_compiler_flag("${VMAF_CFLAGS}")
+ foreach(flag "${VMAF_CFLAGS}")
+ append_compiler_flag("${flag}")
+ endforeach()
endif()
else()
message(FATAL_ERROR "CONFIG_TUNE_VMAF error: pkg-config not found.")
@@ -665,7 +664,7 @@
if(ENABLE_TOOLS)
if(CONFIG_AV1_DECODER)
- add_executable(dump_obu "${AOM_GEN_SRC_DIR}/usage_exit.cc"
+ add_executable(dump_obu "${AOM_GEN_SRC_DIR}/usage_exit.c"
"${AOM_ROOT}/tools/dump_obu.cc"
"${AOM_ROOT}/tools/obu_parser.cc"
"${AOM_ROOT}/tools/obu_parser.h"
@@ -795,7 +794,7 @@
# here, it really is the Xcode generator's fault, or just a deficiency in
# Xcode itself.
foreach(aom_app ${AOM_APP_TARGETS})
- add_dummy_source_file_to_target("${aom_app}" "cc")
+ add_no_op_source_file_to_target("${aom_app}" "cc")
endforeach()
endif()
endif()
@@ -824,7 +823,15 @@
endif()
if(BUILD_SHARED_LIBS)
- if(NOT WIN32 AND NOT APPLE)
+ # Don't use -Wl,-z,defs with Clang's sanitizers.
+ #
+ # Clang's AddressSanitizer documentation says "When linking shared libraries,
+ # the AddressSanitizer run-time is not linked, so -Wl,-z,defs may cause link
+ # errors (don't use it with AddressSanitizer)." See
+ # https://clang.llvm.org/docs/AddressSanitizer.html#usage.
+ if(NOT WIN32
+ AND NOT APPLE
+ AND NOT (CMAKE_C_COMPILER_ID MATCHES "Clang" AND SANITIZE))
# The -z defs linker option reports unresolved symbol references from object
# files when building a shared library.
if("${CMAKE_VERSION}" VERSION_LESS "3.13")
@@ -935,7 +942,7 @@
get_cmake_property(all_cmake_vars VARIABLES)
foreach(var ${all_cmake_vars})
if("${var}" MATCHES "SOURCES$\|_INTRIN_\|_ASM_"
- AND NOT "${var}" MATCHES "_APP_\|DOXYGEN\|LIBWEBM\|LIBYUV\|_PKG_\|TEST")
+ AND NOT "${var}" MATCHES "DOXYGEN\|LIBYUV\|_PKG_\|TEST")
list(APPEND aom_source_vars ${var})
endif()
endforeach()
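
Regarding the usage_exit() hunk above: applications that link
aom_common_app_util must supply usage_exit() themselves; the generated file
only covers targets that omit it. A hedged sketch of a custom tool providing
its own definition (the tool name and message are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    #include "common/tools_common.h"  // declares usage_exit()

    void usage_exit(void) {
      fprintf(stderr, "Usage: my_tool <infile> <outfile>\n");
      exit(EXIT_FAILURE);
    }
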
diff --git a/METADATA b/METADATA
index eeaae2a..2534531 100644
--- a/METADATA
+++ b/METADATA
@@ -20,10 +20,10 @@
type: GIT
value: "https://aomedia.googlesource.com/aom/"
}
- version: "v3.6.1"
+ version: "v3.7.0"
last_upgrade_date {
year: 2023
- month: 5
- day: 11
+ month: 10
+ day: 9
}
}
diff --git a/README.android b/README.android
index bff3665..fcffb85 100644
--- a/README.android
+++ b/README.android
@@ -1,12 +1,12 @@
Name: libaom
URL: https://aomedia.org
-Version: v3.6.1
+Version: v3.7.0
License: BSD
License File: libaom/LICENSE
-Date: Thursday May 11 2023
-Branch: helia
-Commit: 7ade96172b95adc91a5d85bf80c90989cd543ee8
+Date: Monday October 09 2023
+Branch: ironbark
+Commit: 6054fae218eda6e53e1e3b4f7ef0fff4877c7bf1
Description:
Contains the sources used to compile libaom.
diff --git a/README.md b/README.md
index 0d51080..d7b66e0 100644
--- a/README.md
+++ b/README.md
@@ -217,27 +217,26 @@
### Microsoft Visual Studio builds {#microsoft-visual-studio-builds}
Building the AV1 codec library in Microsoft Visual Studio is supported. Visual
-Studio 2017 (15.0) or later is required. The following example demonstrates
+Studio 2019 (16.0) or later is required. The following example demonstrates
generating projects and a solution for the Microsoft IDE:
~~~
# This does not require a bash shell; Command Prompt (cmd.exe) is fine.
# This assumes the build host is a Windows x64 computer.
- # To build with Visual Studio 2019 for the x64 target:
+ # To create a Visual Studio 2022 solution for the x64 target:
+ $ cmake path/to/aom -G "Visual Studio 17 2022"
+
+ # To create a Visual Studio 2022 solution for the 32-bit x86 target:
+ $ cmake path/to/aom -G "Visual Studio 17 2022" -A Win32
+
+ # To create a Visual Studio 2019 solution for the x64 target:
$ cmake path/to/aom -G "Visual Studio 16 2019"
- $ cmake --build .
- # To build with Visual Studio 2019 for the 32-bit x86 target:
+ # To create a Visual Studio 2019 solution for the 32-bit x86 target:
$ cmake path/to/aom -G "Visual Studio 16 2019" -A Win32
- $ cmake --build .
- # To build with Visual Studio 2017 for the x64 target:
- $ cmake path/to/aom -G "Visual Studio 15 2017" -T host=x64 -A x64
- $ cmake --build .
-
- # To build with Visual Studio 2017 for the 32-bit x86 target:
- $ cmake path/to/aom -G "Visual Studio 15 2017" -T host=x64
+ # To build the solution:
$ cmake --build .
~~~
@@ -575,12 +574,19 @@
`Generate Password` Password link at the top of the page. You’ll be given
instructions for creating a cookie to use with our Git repos.
+You must also have a Gerrit account associated with your Google account. To do
+this visit the [Gerrit review server](https://aomedia-review.googlesource.com)
+and click "Sign in" (top right).
+
### Contributor agreement {#contributor-agreement}
You will be required to execute a
[contributor agreement](http://aomedia.org/license) to ensure that the AOMedia
Project has the right to distribute your changes.
+Note: If you are pushing changes on behalf of an Alliance for Open Media member
+organization this step is not necessary.
+
### Testing your code {#testing-your-code}
The testing basics are covered in the [testing section](#testing-the-av1-codec)
diff --git a/README.version b/README.version
index 398ffb8..1ce2056 100644
--- a/README.version
+++ b/README.version
@@ -1,4 +1,3 @@
URL: https://aomedia.googlesource.com/aom/
-Version: v3.6.1
+Version: v3.7.0
Local Modifications:
-* cherry-pick 4781b9f7f6 nonrd_opt: align scan tables
diff --git a/aom/aom_codec.h b/aom/aom_codec.h
index 6a9fb7b..d5b8790 100644
--- a/aom/aom_codec.h
+++ b/aom/aom_codec.h
@@ -417,19 +417,21 @@
* \param[in] ctx Pointer to this instance's context.
*
*/
-const char *aom_codec_error(aom_codec_ctx_t *ctx);
+const char *aom_codec_error(const aom_codec_ctx_t *ctx);
/*!\brief Retrieve detailed error information for codec context
*
* Returns a human readable string providing detailed information about
- * the last error.
+ * the last error. The returned string is only valid until the next
+ * aom_codec_* function call (except aom_codec_error and
+ * aom_codec_error_detail) on the codec context.
*
* \param[in] ctx Pointer to this instance's context.
*
* \retval NULL
* No detailed information is available.
*/
-const char *aom_codec_error_detail(aom_codec_ctx_t *ctx);
+const char *aom_codec_error_detail(const aom_codec_ctx_t *ctx);
/* REQUIRED FUNCTIONS
*
@@ -444,9 +446,11 @@
* \param[in] ctx Pointer to this instance's context
*
* \retval #AOM_CODEC_OK
- * The codec algorithm initialized.
- * \retval #AOM_CODEC_MEM_ERROR
- * Memory allocation failed.
+ * The codec instance has been destroyed.
+ * \retval #AOM_CODEC_INVALID_PARAM
+ * ctx is a null pointer.
+ * \retval #AOM_CODEC_ERROR
+ * Codec context not initialized.
*/
aom_codec_err_t aom_codec_destroy(aom_codec_ctx_t *ctx);
diff --git a/aom/aom_decoder.h b/aom/aom_decoder.h
index 5ce7c7b..f3f11d8 100644
--- a/aom/aom_decoder.h
+++ b/aom/aom_decoder.h
@@ -113,7 +113,7 @@
* \param[in] ver ABI version number. Must be set to
* AOM_DECODER_ABI_VERSION
* \retval #AOM_CODEC_OK
- * The decoder algorithm initialized.
+ * The decoder algorithm has been initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
diff --git a/aom/aom_encoder.h b/aom/aom_encoder.h
index c0efe79..e3d8d29 100644
--- a/aom/aom_encoder.h
+++ b/aom/aom_encoder.h
@@ -903,7 +903,7 @@
/*!\brief Initialize an encoder instance
*
- * Initializes a encoder context using the given interface. Applications
+ * Initializes an encoder context using the given interface. Applications
* should call the aom_codec_enc_init convenience macro instead of this
* function directly, to ensure that the ABI version number parameter
* is properly initialized.
@@ -912,6 +912,9 @@
* is not thread safe and should be guarded with a lock if being used
* in a multithreaded context.
*
+ * If aom_codec_enc_init_ver() fails, it is not necessary to call
+ * aom_codec_destroy() on the encoder context.
+ *
* \param[in] ctx Pointer to this instance's context.
* \param[in] iface Pointer to the algorithm interface to use.
* \param[in] cfg Configuration to use, if known.
@@ -919,7 +922,7 @@
* \param[in] ver ABI version number. Must be set to
* AOM_ENCODER_ABI_VERSION
* \retval #AOM_CODEC_OK
- * The decoder algorithm initialized.
+ * The encoder algorithm has been initialized.
* \retval #AOM_CODEC_MEM_ERROR
* Memory allocation failed.
*/
@@ -1024,6 +1027,10 @@
* \param[in] img Image data to encode, NULL to flush.
* Encoding sample values outside the range
* [0..(1<<img->bit_depth)-1] is undefined behavior.
+ * Note: Although img is declared as a const pointer,
+ * if AV1E_SET_DENOISE_NOISE_LEVEL is set to a nonzero
+ * value aom_codec_encode() modifies (denoises) the
+ * samples in img->planes[i].
* \param[in] pts Presentation time stamp, in timebase units. If img
* is NULL, pts is ignored.
* \param[in] duration Duration to show frame, in timebase units. If img
diff --git a/aom/aomcx.h b/aom/aomcx.h
index 906cf2a..a5db0a5 100644
--- a/aom/aomcx.h
+++ b/aom/aomcx.h
@@ -1481,6 +1481,52 @@
*/
AV1E_ENABLE_SB_QP_SWEEP = 158,
+ /*!\brief Codec control to set quantizer for the next frame, int parameter.
+ *
+ * - Valid range [0, 63]
+ *
+ * This will turn off cyclic refresh. Only applicable to 1-pass.
+ */
+ AV1E_SET_QUANTIZER_ONE_PASS = 159,
+
+ /*!\brief Codec control to enable the rate distribution guided delta
+ * quantization in all intra mode, unsigned int parameter
+ *
+ * - 0 = disable (default)
+ * - 1 = enable
+ *
+ * \attention This feature requires --deltaq-mode=3, as well as an input
+ * file which contains the rate distribution for each 16x16 block,
+ * passed in by --rate-distribution-info=rate_distribution.txt.
+ */
+ AV1E_ENABLE_RATE_GUIDE_DELTAQ = 160,
+
+ /*!\brief Codec control to set the input file for rate distribution used
+ * in all intra mode, const char * parameter
+ * The input should be the name of a text file, which
+ * contains (rows x cols) float values separated by whitespace.
+ * Each float value represents the number of bits for each 16x16 block.
+ * rows = (frame_height + 15) / 16
+ * cols = (frame_width + 15) / 16
+ *
+ * \attention This feature requires --enable-rate-guide-deltaq=1.
+ */
+ AV1E_SET_RATE_DISTRIBUTION_INFO = 161,
+
+ /*!\brief Codec control to get the CDEF strength for Y / luma plane,
+ * int * parameter.
+ * Returns an integer array of CDEF_MAX_STRENGTHS elements.
+ */
+ AV1E_GET_LUMA_CDEF_STRENGTH = 162,
+
+ /*!\brief Codec control to set the target bitrate in kilobits per second,
+ * unsigned int parameter. For 1 pass CBR mode, single layer encoding.
+ * This control replaces the call aom_codec_enc_config_set(&codec, &cfg)
+ * when only the target bitrate is changed, and so is much cheaper as it
+ * bypasses a lot of unneeded code checks.
+ */
+ AV1E_SET_BITRATE_ONE_PASS_CBR = 163,
+
// Any new encoder control IDs should be added above.
// Maximum allowed encoder control ID is 229.
// No encoder control ID should be added below.
@@ -1497,7 +1543,9 @@
AOME_THREEFOUR = 3,
AOME_ONEFOUR = 4,
AOME_ONEEIGHT = 5,
- AOME_ONETWO = 6
+ AOME_ONETWO = 6,
+ AOME_TWOTHREE = 7,
+ AOME_ONETHREE = 8
} AOM_SCALING_MODE;
/*!\brief Max number of segments
@@ -1579,6 +1627,7 @@
AOM_TUNE_VMAF_MAX_GAIN = 6,
AOM_TUNE_VMAF_NEG_MAX_GAIN = 7,
AOM_TUNE_BUTTERAUGLI = 8,
+ AOM_TUNE_VMAF_SALIENCY_MAP = 9,
} aom_tune_metric;
/*!\brief Distortion metric to use for RD optimization.
@@ -1608,7 +1657,12 @@
int temporal_layer_id; /**< Temporal layer ID */
} aom_svc_layer_id_t;
-/*!brief Parameter type for SVC */
+/*!\brief Parameter type for SVC
+ *
+ * In the arrays of size AOM_MAX_LAYERS, the index for spatial layer `sl` and
+ * temporal layer `tl` is sl * number_temporal_layers + tl.
+ *
+ */
typedef struct aom_svc_params {
int number_spatial_layers; /**< Number of spatial layers */
int number_temporal_layers; /**< Number of temporal layers */
@@ -1616,7 +1670,7 @@
int min_quantizers[AOM_MAX_LAYERS]; /**< Min Q for each layer */
int scaling_factor_num[AOM_MAX_SS_LAYERS]; /**< Scaling factor-numerator */
int scaling_factor_den[AOM_MAX_SS_LAYERS]; /**< Scaling factor-denominator */
- /*! Target bitrate for each layer */
+ /*! Target bitrate for each layer, in kilobits per second */
int layer_target_bitrate[AOM_MAX_LAYERS];
/*! Frame rate factor for each temporal layer */
int framerate_factor[AOM_MAX_TS_LAYERS];
@@ -2103,6 +2157,21 @@
AOM_CTRL_USE_TYPE(AV1E_ENABLE_SB_QP_SWEEP, unsigned int)
#define AOM_CTRL_AV1E_ENABLE_SB_QP_SWEEP
+AOM_CTRL_USE_TYPE(AV1E_SET_QUANTIZER_ONE_PASS, int)
+#define AOM_CTRL_AV1E_SET_QUANTIZER_ONE_PASS
+
+AOM_CTRL_USE_TYPE(AV1E_ENABLE_RATE_GUIDE_DELTAQ, unsigned int)
+#define AOM_CTRL_AV1E_ENABLE_RATE_GUIDE_DELTAQ
+
+AOM_CTRL_USE_TYPE(AV1E_SET_RATE_DISTRIBUTION_INFO, const char *)
+#define AOM_CTRL_AV1E_SET_RATE_DISTRIBUTION_INFO
+
+AOM_CTRL_USE_TYPE(AV1E_GET_LUMA_CDEF_STRENGTH, int *)
+#define AOM_CTRL_AV1E_GET_LUMA_CDEF_STRENGTH
+
+AOM_CTRL_USE_TYPE(AV1E_SET_BITRATE_ONE_PASS_CBR, unsigned int)
+#define AOM_CTRL_AV1E_SET_BITRATE_ONE_PASS_CBR
+
/*!\endcond */
/*! @} - end defgroup aom_encoder */
#ifdef __cplusplus
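
To illustrate the layer indexing documented in the aom_svc_params hunk above,
a minimal sketch (layer counts and bitrates are illustrative; assumes "codec"
is an initialized encoder context):

    aom_svc_params_t svc_params = { 0 };
    svc_params.number_spatial_layers = 2;
    svc_params.number_temporal_layers = 3;
    for (int sl = 0; sl < svc_params.number_spatial_layers; ++sl) {
      for (int tl = 0; tl < svc_params.number_temporal_layers; ++tl) {
        // Index per the new documentation: sl * number_temporal_layers + tl.
        const int layer = sl * svc_params.number_temporal_layers + tl;
        svc_params.layer_target_bitrate[layer] = 100 * (layer + 1);  // kbps
      }
    }
    aom_codec_control(&codec, AV1E_SET_SVC_PARAMS, &svc_params);
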
diff --git a/aom/src/aom_codec.c b/aom/src/aom_codec.c
index bc2039a..4e75fcb 100644
--- a/aom/src/aom_codec.c
+++ b/aom/src/aom_codec.c
@@ -52,12 +52,12 @@
return "Unrecognized error code";
}
-const char *aom_codec_error(aom_codec_ctx_t *ctx) {
+const char *aom_codec_error(const aom_codec_ctx_t *ctx) {
return (ctx) ? aom_codec_err_to_string(ctx->err)
: aom_codec_err_to_string(AOM_CODEC_INVALID_PARAM);
}
-const char *aom_codec_error_detail(aom_codec_ctx_t *ctx) {
+const char *aom_codec_error_detail(const aom_codec_ctx_t *ctx) {
if (ctx && ctx->err)
return ctx->priv ? ctx->priv->err_detail : ctx->err_detail;
@@ -81,7 +81,7 @@
}
aom_codec_caps_t aom_codec_get_caps(aom_codec_iface_t *iface) {
- return (iface) ? iface->caps : 0;
+ return iface ? iface->caps : 0;
}
aom_codec_err_t aom_codec_control(aom_codec_ctx_t *ctx, int ctrl_id, ...) {
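
A small sketch of what the const-qualified accessors above enable: read-only
error-reporting helpers can now take a const context (the helper name is
illustrative; note the aom_codec_error_detail() string is only valid until
the next aom_codec_* call on the context):

    #include <stdio.h>

    #include "aom/aom_codec.h"

    static void report_error(const aom_codec_ctx_t *ctx, const char *op) {
      const char *detail = aom_codec_error_detail(ctx);
      fprintf(stderr, "%s failed: %s\n", op, aom_codec_error(ctx));
      if (detail) fprintf(stderr, "  detail: %s\n", detail);
    }
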
diff --git a/aom/src/aom_encoder.c b/aom/src/aom_encoder.c
index 6ec2f34..a4acbcc 100644
--- a/aom/src/aom_encoder.c
+++ b/aom/src/aom_encoder.c
@@ -80,6 +80,10 @@
res = ctx->iface->init(ctx);
if (res) {
+ // IMPORTANT: ctx->priv->err_detail must be null or point to a string
+ // that remains valid after ctx->priv is destroyed, such as a C string
+ // literal. This makes it safe to call aom_codec_error_detail() after
+ // aom_codec_enc_init_ver() failed.
ctx->err_detail = ctx->priv ? ctx->priv->err_detail : NULL;
aom_codec_destroy(ctx);
}
@@ -92,7 +96,6 @@
aom_codec_enc_cfg_t *cfg,
unsigned int usage) {
aom_codec_err_t res;
- int i;
if (!iface || !cfg)
res = AOM_CODEC_INVALID_PARAM;
@@ -101,26 +104,24 @@
else {
res = AOM_CODEC_INVALID_PARAM;
- for (i = 0; i < iface->enc.cfg_count; ++i) {
+ for (int i = 0; i < iface->enc.cfg_count; ++i) {
if (iface->enc.cfgs[i].g_usage == usage) {
*cfg = iface->enc.cfgs[i];
res = AOM_CODEC_OK;
+ /* default values */
+ memset(&cfg->encoder_cfg, 0, sizeof(cfg->encoder_cfg));
+ cfg->encoder_cfg.super_block_size = 0; // Dynamic
+ cfg->encoder_cfg.max_partition_size = 128;
+ cfg->encoder_cfg.min_partition_size = 4;
+ cfg->encoder_cfg.disable_trellis_quant = 3;
break;
}
}
}
- /* default values */
- if (cfg) {
- memset(&cfg->encoder_cfg, 0, sizeof(cfg->encoder_cfg));
- cfg->encoder_cfg.super_block_size = 0; // Dynamic
- cfg->encoder_cfg.max_partition_size = 128;
- cfg->encoder_cfg.min_partition_size = 4;
- cfg->encoder_cfg.disable_trellis_quant = 3;
- }
return res;
}
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
/* On X86, disable the x87 unit's internal 80 bit precision for better
* consistency with the SSE unit's 64 bit precision.
*/
@@ -131,15 +132,17 @@
#else
#define FLOATING_POINT_SET_PRECISION
#define FLOATING_POINT_RESTORE_PRECISION
-#endif // ARCH_X86 || ARCH_X86_64
+#endif // AOM_ARCH_X86 || AOM_ARCH_X86_64
#if HAVE_FEXCEPT && CONFIG_DEBUG
#define FLOATING_POINT_SET_EXCEPTIONS \
const int float_excepts = \
feenableexcept(FE_DIVBYZERO | FE_UNDERFLOW | FE_OVERFLOW);
#define FLOATING_POINT_RESTORE_EXCEPTIONS \
- fedisableexcept(FE_ALL_EXCEPT); \
- feenableexcept(float_excepts);
+ if (float_excepts != -1) { \
+ fedisableexcept(FE_ALL_EXCEPT); \
+ feenableexcept(float_excepts); \
+ }
#else
#define FLOATING_POINT_SET_EXCEPTIONS
#define FLOATING_POINT_RESTORE_EXCEPTIONS
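
The FLOATING_POINT_RESTORE_EXCEPTIONS change above guards against
feenableexcept() failing: it returns the previous exception mask, or -1 on
failure (see aomedia:3403, "feenableexcept() failed" on Linux arm), and -1
must not be re-enabled. A standalone sketch of the same pattern (glibc
extension; the wrapper function is illustrative):

    #define _GNU_SOURCE
    #include <fenv.h>

    static void run_with_fp_traps(void (*fn)(void)) {
      // Returns the previous mask of enabled exceptions, or -1 on failure.
      const int prev =
          feenableexcept(FE_DIVBYZERO | FE_UNDERFLOW | FE_OVERFLOW);
      fn();
      if (prev != -1) {
        fedisableexcept(FE_ALL_EXCEPT);
        feenableexcept(prev);
      }
    }
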
diff --git a/aom_dsp/aom_dsp.cmake b/aom_dsp/aom_dsp.cmake
index c5c2db7..4c60e5c 100644
--- a/aom_dsp/aom_dsp.cmake
+++ b/aom_dsp/aom_dsp.cmake
@@ -112,12 +112,14 @@
list(APPEND AOM_DSP_COMMON_INTRIN_NEON
"${AOM_ROOT}/aom_dsp/arm/aom_convolve_copy_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/aom_convolve8_neon.c"
"${AOM_ROOT}/aom_dsp/arm/fwd_txfm_neon.c"
"${AOM_ROOT}/aom_dsp/arm/loopfilter_neon.c"
"${AOM_ROOT}/aom_dsp/arm/highbd_intrapred_neon.c"
"${AOM_ROOT}/aom_dsp/arm/intrapred_neon.c"
"${AOM_ROOT}/aom_dsp/arm/subtract_neon.c"
- "${AOM_ROOT}/aom_dsp/arm/blend_a64_mask_neon.c")
+ "${AOM_ROOT}/aom_dsp/arm/blend_a64_mask_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/avg_pred_neon.c")
if(CONFIG_AV1_HIGHBITDEPTH)
list(APPEND AOM_DSP_COMMON_INTRIN_SSE2
@@ -176,7 +178,7 @@
# Flow estimation library
if(NOT CONFIG_REALTIME_ONLY)
- list(APPEND AOM_DSP_ENCODER_SOURCES
+ list(APPEND AOM_DSP_ENCODER_SOURCES "${AOM_ROOT}/aom_dsp/pyramid.c"
"${AOM_ROOT}/aom_dsp/flow_estimation/corner_detect.c"
"${AOM_ROOT}/aom_dsp/flow_estimation/corner_match.c"
"${AOM_ROOT}/aom_dsp/flow_estimation/disflow.c"
@@ -184,7 +186,8 @@
"${AOM_ROOT}/aom_dsp/flow_estimation/ransac.c")
list(APPEND AOM_DSP_ENCODER_INTRIN_SSE4_1
- "${AOM_ROOT}/aom_dsp/flow_estimation/x86/corner_match_sse4.c")
+ "${AOM_ROOT}/aom_dsp/flow_estimation/x86/corner_match_sse4.c"
+ "${AOM_ROOT}/aom_dsp/flow_estimation/x86/disflow_sse4.c")
list(APPEND AOM_DSP_ENCODER_INTRIN_AVX2
"${AOM_ROOT}/aom_dsp/flow_estimation/x86/corner_match_avx2.c")
@@ -208,7 +211,8 @@
"${AOM_ROOT}/aom_dsp/x86/quantize_x86.h"
"${AOM_ROOT}/aom_dsp/x86/blk_sse_sum_sse2.c"
"${AOM_ROOT}/aom_dsp/x86/sum_squares_sse2.c"
- "${AOM_ROOT}/aom_dsp/x86/variance_sse2.c")
+ "${AOM_ROOT}/aom_dsp/x86/variance_sse2.c"
+ "${AOM_ROOT}/aom_dsp/x86/jnt_sad_sse2.c")
list(APPEND AOM_DSP_ENCODER_ASM_SSSE3_X86_64
"${AOM_ROOT}/aom_dsp/x86/fwd_txfm_ssse3_x86_64.asm"
@@ -245,8 +249,7 @@
"${AOM_ROOT}/aom_dsp/x86/masked_variance_intrin_ssse3.c"
"${AOM_ROOT}/aom_dsp/x86/quantize_ssse3.c"
"${AOM_ROOT}/aom_dsp/x86/variance_impl_ssse3.c"
- "${AOM_ROOT}/aom_dsp/x86/jnt_variance_ssse3.c"
- "${AOM_ROOT}/aom_dsp/x86/jnt_sad_ssse3.c")
+ "${AOM_ROOT}/aom_dsp/x86/jnt_variance_ssse3.c")
list(APPEND AOM_DSP_ENCODER_INTRIN_SSE4_1
"${AOM_ROOT}/aom_dsp/x86/avg_intrin_sse4.c"
@@ -254,12 +257,17 @@
"${AOM_ROOT}/aom_dsp/x86/obmc_sad_sse4.c"
"${AOM_ROOT}/aom_dsp/x86/obmc_variance_sse4.c")
- list(APPEND AOM_DSP_ENCODER_INTRIN_NEON "${AOM_ROOT}/aom_dsp/arm/sad4d_neon.c"
+ list(APPEND AOM_DSP_ENCODER_INTRIN_NEON
+ "${AOM_ROOT}/aom_dsp/arm/sadxd_neon.c"
"${AOM_ROOT}/aom_dsp/arm/sad_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/masked_sad_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/masked_sad4d_neon.c"
"${AOM_ROOT}/aom_dsp/arm/subpel_variance_neon.c"
"${AOM_ROOT}/aom_dsp/arm/variance_neon.c"
"${AOM_ROOT}/aom_dsp/arm/hadamard_neon.c"
"${AOM_ROOT}/aom_dsp/arm/avg_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/obmc_variance_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/obmc_sad_neon.c"
"${AOM_ROOT}/aom_dsp/arm/sse_neon.c"
"${AOM_ROOT}/aom_dsp/arm/sum_squares_neon.c")
@@ -283,7 +291,11 @@
"${AOM_ROOT}/aom_dsp/x86/highbd_variance_sse4.c")
list(APPEND AOM_DSP_ENCODER_INTRIN_NEON
+ "${AOM_ROOT}/aom_dsp/arm/highbd_avg_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/highbd_hadamard_neon.c"
"${AOM_ROOT}/aom_dsp/arm/highbd_quantize_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/highbd_sad_neon.c"
+ "${AOM_ROOT}/aom_dsp/arm/highbd_sad4d_neon.c"
"${AOM_ROOT}/aom_dsp/arm/highbd_variance_neon.c")
endif()
@@ -322,8 +334,8 @@
function(setup_aom_dsp_targets)
add_library(aom_dsp_common OBJECT ${AOM_DSP_COMMON_SOURCES})
list(APPEND AOM_LIB_TARGETS aom_dsp_common)
- create_dummy_source_file("aom_av1" "c" "dummy_source_file")
- add_library(aom_dsp OBJECT "${dummy_source_file}")
+ create_no_op_source_file("aom_av1" "c" "no_op_source_file")
+ add_library(aom_dsp OBJECT "${no_op_source_file}")
target_sources(aom PRIVATE $<TARGET_OBJECTS:aom_dsp_common>)
if(BUILD_SHARED_LIBS)
target_sources(aom_static PRIVATE $<TARGET_OBJECTS:aom_dsp_common>)
@@ -331,8 +343,8 @@
list(APPEND AOM_LIB_TARGETS aom_dsp)
# Not all generators support libraries consisting only of object files. Add a
- # dummy source file to the aom_dsp target.
- add_dummy_source_file_to_target("aom_dsp" "c")
+ # source file to the aom_dsp target.
+ add_no_op_source_file_to_target("aom_dsp" "c")
if(CONFIG_AV1_DECODER)
add_library(aom_dsp_decoder OBJECT ${AOM_DSP_DECODER_SOURCES})
diff --git a/aom_dsp/aom_dsp_rtcd_defs.pl b/aom_dsp/aom_dsp_rtcd_defs.pl
index b3f8ec7..e738971 100755
--- a/aom_dsp/aom_dsp_rtcd_defs.pl
+++ b/aom_dsp/aom_dsp_rtcd_defs.pl
@@ -16,8 +16,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
EOF
}
@@ -86,104 +86,104 @@
}
specialize qw/aom_dc_top_predictor_4x4 neon sse2/;
-specialize qw/aom_dc_top_predictor_4x8 sse2/;
-specialize qw/aom_dc_top_predictor_4x16 sse2/;
-specialize qw/aom_dc_top_predictor_8x4 sse2/;
+specialize qw/aom_dc_top_predictor_4x8 neon sse2/;
+specialize qw/aom_dc_top_predictor_4x16 neon sse2/;
+specialize qw/aom_dc_top_predictor_8x4 neon sse2/;
specialize qw/aom_dc_top_predictor_8x8 neon sse2/;
-specialize qw/aom_dc_top_predictor_8x16 sse2/;
-specialize qw/aom_dc_top_predictor_8x32 sse2/;
-specialize qw/aom_dc_top_predictor_16x4 sse2/;
-specialize qw/aom_dc_top_predictor_16x8 sse2/;
+specialize qw/aom_dc_top_predictor_8x16 neon sse2/;
+specialize qw/aom_dc_top_predictor_8x32 neon sse2/;
+specialize qw/aom_dc_top_predictor_16x4 neon sse2/;
+specialize qw/aom_dc_top_predictor_16x8 neon sse2/;
specialize qw/aom_dc_top_predictor_16x16 neon sse2/;
-specialize qw/aom_dc_top_predictor_16x32 sse2/;
-specialize qw/aom_dc_top_predictor_16x64 sse2/;
-specialize qw/aom_dc_top_predictor_32x8 sse2/;
-specialize qw/aom_dc_top_predictor_32x16 sse2 avx2/;
+specialize qw/aom_dc_top_predictor_16x32 neon sse2/;
+specialize qw/aom_dc_top_predictor_16x64 neon sse2/;
+specialize qw/aom_dc_top_predictor_32x8 neon sse2/;
+specialize qw/aom_dc_top_predictor_32x16 neon sse2 avx2/;
specialize qw/aom_dc_top_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_dc_top_predictor_32x64 sse2 avx2/;
-specialize qw/aom_dc_top_predictor_64x16 sse2 avx2/;
-specialize qw/aom_dc_top_predictor_64x32 sse2 avx2/;
-specialize qw/aom_dc_top_predictor_64x64 sse2 avx2/;
+specialize qw/aom_dc_top_predictor_32x64 neon sse2 avx2/;
+specialize qw/aom_dc_top_predictor_64x16 neon sse2 avx2/;
+specialize qw/aom_dc_top_predictor_64x32 neon sse2 avx2/;
+specialize qw/aom_dc_top_predictor_64x64 neon sse2 avx2/;
specialize qw/aom_dc_left_predictor_4x4 neon sse2/;
-specialize qw/aom_dc_left_predictor_4x8 sse2/;
-specialize qw/aom_dc_left_predictor_4x16 sse2/;
-specialize qw/aom_dc_left_predictor_8x4 sse2/;
+specialize qw/aom_dc_left_predictor_4x8 neon sse2/;
+specialize qw/aom_dc_left_predictor_4x16 neon sse2/;
+specialize qw/aom_dc_left_predictor_8x4 neon sse2/;
specialize qw/aom_dc_left_predictor_8x8 neon sse2/;
-specialize qw/aom_dc_left_predictor_8x16 sse2/;
-specialize qw/aom_dc_left_predictor_8x32 sse2/;
-specialize qw/aom_dc_left_predictor_16x4 sse2/;
-specialize qw/aom_dc_left_predictor_16x8 sse2/;
+specialize qw/aom_dc_left_predictor_8x16 neon sse2/;
+specialize qw/aom_dc_left_predictor_8x32 neon sse2/;
+specialize qw/aom_dc_left_predictor_16x4 neon sse2/;
+specialize qw/aom_dc_left_predictor_16x8 neon sse2/;
specialize qw/aom_dc_left_predictor_16x16 neon sse2/;
-specialize qw/aom_dc_left_predictor_16x32 sse2/;
-specialize qw/aom_dc_left_predictor_16x64 sse2/;
-specialize qw/aom_dc_left_predictor_32x8 sse2/;
-specialize qw/aom_dc_left_predictor_32x16 sse2 avx2/;
+specialize qw/aom_dc_left_predictor_16x32 neon sse2/;
+specialize qw/aom_dc_left_predictor_16x64 neon sse2/;
+specialize qw/aom_dc_left_predictor_32x8 neon sse2/;
+specialize qw/aom_dc_left_predictor_32x16 neon sse2 avx2/;
specialize qw/aom_dc_left_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_dc_left_predictor_32x64 sse2 avx2/;
-specialize qw/aom_dc_left_predictor_64x16 sse2 avx2/;
-specialize qw/aom_dc_left_predictor_64x32 sse2 avx2/;
-specialize qw/aom_dc_left_predictor_64x64 sse2 avx2/;
+specialize qw/aom_dc_left_predictor_32x64 neon sse2 avx2/;
+specialize qw/aom_dc_left_predictor_64x16 neon sse2 avx2/;
+specialize qw/aom_dc_left_predictor_64x32 neon sse2 avx2/;
+specialize qw/aom_dc_left_predictor_64x64 neon sse2 avx2/;
specialize qw/aom_dc_128_predictor_4x4 neon sse2/;
-specialize qw/aom_dc_128_predictor_4x8 sse2/;
-specialize qw/aom_dc_128_predictor_4x16 sse2/;
-specialize qw/aom_dc_128_predictor_8x4 sse2/;
+specialize qw/aom_dc_128_predictor_4x8 neon sse2/;
+specialize qw/aom_dc_128_predictor_4x16 neon sse2/;
+specialize qw/aom_dc_128_predictor_8x4 neon sse2/;
specialize qw/aom_dc_128_predictor_8x8 neon sse2/;
-specialize qw/aom_dc_128_predictor_8x16 sse2/;
-specialize qw/aom_dc_128_predictor_8x32 sse2/;
-specialize qw/aom_dc_128_predictor_16x4 sse2/;
-specialize qw/aom_dc_128_predictor_16x8 sse2/;
+specialize qw/aom_dc_128_predictor_8x16 neon sse2/;
+specialize qw/aom_dc_128_predictor_8x32 neon sse2/;
+specialize qw/aom_dc_128_predictor_16x4 neon sse2/;
+specialize qw/aom_dc_128_predictor_16x8 neon sse2/;
specialize qw/aom_dc_128_predictor_16x16 neon sse2/;
-specialize qw/aom_dc_128_predictor_16x32 sse2/;
-specialize qw/aom_dc_128_predictor_16x64 sse2/;
-specialize qw/aom_dc_128_predictor_32x8 sse2/;
-specialize qw/aom_dc_128_predictor_32x16 sse2 avx2/;
+specialize qw/aom_dc_128_predictor_16x32 neon sse2/;
+specialize qw/aom_dc_128_predictor_16x64 neon sse2/;
+specialize qw/aom_dc_128_predictor_32x8 neon sse2/;
+specialize qw/aom_dc_128_predictor_32x16 neon sse2 avx2/;
specialize qw/aom_dc_128_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_dc_128_predictor_32x64 sse2 avx2/;
-specialize qw/aom_dc_128_predictor_64x16 sse2 avx2/;
-specialize qw/aom_dc_128_predictor_64x32 sse2 avx2/;
-specialize qw/aom_dc_128_predictor_64x64 sse2 avx2/;
+specialize qw/aom_dc_128_predictor_32x64 neon sse2 avx2/;
+specialize qw/aom_dc_128_predictor_64x16 neon sse2 avx2/;
+specialize qw/aom_dc_128_predictor_64x32 neon sse2 avx2/;
+specialize qw/aom_dc_128_predictor_64x64 neon sse2 avx2/;
specialize qw/aom_v_predictor_4x4 neon sse2/;
-specialize qw/aom_v_predictor_4x8 sse2/;
-specialize qw/aom_v_predictor_4x16 sse2/;
-specialize qw/aom_v_predictor_8x4 sse2/;
+specialize qw/aom_v_predictor_4x8 neon sse2/;
+specialize qw/aom_v_predictor_4x16 neon sse2/;
+specialize qw/aom_v_predictor_8x4 neon sse2/;
specialize qw/aom_v_predictor_8x8 neon sse2/;
-specialize qw/aom_v_predictor_8x16 sse2/;
-specialize qw/aom_v_predictor_8x32 sse2/;
-specialize qw/aom_v_predictor_16x4 sse2/;
-specialize qw/aom_v_predictor_16x8 sse2/;
+specialize qw/aom_v_predictor_8x16 neon sse2/;
+specialize qw/aom_v_predictor_8x32 neon sse2/;
+specialize qw/aom_v_predictor_16x4 neon sse2/;
+specialize qw/aom_v_predictor_16x8 neon sse2/;
specialize qw/aom_v_predictor_16x16 neon sse2/;
-specialize qw/aom_v_predictor_16x32 sse2/;
-specialize qw/aom_v_predictor_16x64 sse2/;
-specialize qw/aom_v_predictor_32x8 sse2/;
-specialize qw/aom_v_predictor_32x16 sse2 avx2/;
+specialize qw/aom_v_predictor_16x32 neon sse2/;
+specialize qw/aom_v_predictor_16x64 neon sse2/;
+specialize qw/aom_v_predictor_32x8 neon sse2/;
+specialize qw/aom_v_predictor_32x16 neon sse2 avx2/;
specialize qw/aom_v_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_v_predictor_32x64 sse2 avx2/;
-specialize qw/aom_v_predictor_64x16 sse2 avx2/;
-specialize qw/aom_v_predictor_64x32 sse2 avx2/;
-specialize qw/aom_v_predictor_64x64 sse2 avx2/;
+specialize qw/aom_v_predictor_32x64 neon sse2 avx2/;
+specialize qw/aom_v_predictor_64x16 neon sse2 avx2/;
+specialize qw/aom_v_predictor_64x32 neon sse2 avx2/;
+specialize qw/aom_v_predictor_64x64 neon sse2 avx2/;
specialize qw/aom_h_predictor_4x4 neon sse2/;
-specialize qw/aom_h_predictor_4x8 sse2/;
-specialize qw/aom_h_predictor_4x16 sse2/;
-specialize qw/aom_h_predictor_8x4 sse2/;
+specialize qw/aom_h_predictor_4x8 neon sse2/;
+specialize qw/aom_h_predictor_4x16 neon sse2/;
+specialize qw/aom_h_predictor_8x4 neon sse2/;
specialize qw/aom_h_predictor_8x8 neon sse2/;
-specialize qw/aom_h_predictor_8x16 sse2/;
-specialize qw/aom_h_predictor_8x32 sse2/;
-specialize qw/aom_h_predictor_16x4 sse2/;
-specialize qw/aom_h_predictor_16x8 sse2/;
+specialize qw/aom_h_predictor_8x16 neon sse2/;
+specialize qw/aom_h_predictor_8x32 neon sse2/;
+specialize qw/aom_h_predictor_16x4 neon sse2/;
+specialize qw/aom_h_predictor_16x8 neon sse2/;
specialize qw/aom_h_predictor_16x16 neon sse2/;
-specialize qw/aom_h_predictor_16x32 sse2/;
-specialize qw/aom_h_predictor_16x64 sse2/;
-specialize qw/aom_h_predictor_32x8 sse2/;
-specialize qw/aom_h_predictor_32x16 sse2/;
+specialize qw/aom_h_predictor_16x32 neon sse2/;
+specialize qw/aom_h_predictor_16x64 neon sse2/;
+specialize qw/aom_h_predictor_32x8 neon sse2/;
+specialize qw/aom_h_predictor_32x16 neon sse2/;
specialize qw/aom_h_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_h_predictor_32x64 sse2/;
-specialize qw/aom_h_predictor_64x16 sse2/;
-specialize qw/aom_h_predictor_64x32 sse2/;
-specialize qw/aom_h_predictor_64x64 sse2/;
+specialize qw/aom_h_predictor_32x64 neon sse2/;
+specialize qw/aom_h_predictor_64x16 neon sse2/;
+specialize qw/aom_h_predictor_64x32 neon sse2/;
+specialize qw/aom_h_predictor_64x64 neon sse2/;
specialize qw/aom_paeth_predictor_4x4 ssse3 neon/;
specialize qw/aom_paeth_predictor_4x8 ssse3 neon/;
@@ -268,24 +268,24 @@
# TODO(yunqingwang): optimize rectangular DC_PRED to replace division
# by multiply and shift.
specialize qw/aom_dc_predictor_4x4 neon sse2/;
-specialize qw/aom_dc_predictor_4x8 sse2/;
-specialize qw/aom_dc_predictor_4x16 sse2/;
-specialize qw/aom_dc_predictor_8x4 sse2/;
+specialize qw/aom_dc_predictor_4x8 neon sse2/;
+specialize qw/aom_dc_predictor_4x16 neon sse2/;
+specialize qw/aom_dc_predictor_8x4 neon sse2/;
specialize qw/aom_dc_predictor_8x8 neon sse2/;
-specialize qw/aom_dc_predictor_8x16 sse2/;
-specialize qw/aom_dc_predictor_8x32 sse2/;
-specialize qw/aom_dc_predictor_16x4 sse2/;
-specialize qw/aom_dc_predictor_16x8 sse2/;
+specialize qw/aom_dc_predictor_8x16 neon sse2/;
+specialize qw/aom_dc_predictor_8x32 neon sse2/;
+specialize qw/aom_dc_predictor_16x4 neon sse2/;
+specialize qw/aom_dc_predictor_16x8 neon sse2/;
specialize qw/aom_dc_predictor_16x16 neon sse2/;
-specialize qw/aom_dc_predictor_16x32 sse2/;
-specialize qw/aom_dc_predictor_16x64 sse2/;
-specialize qw/aom_dc_predictor_32x8 sse2/;
-specialize qw/aom_dc_predictor_32x16 sse2 avx2/;
+specialize qw/aom_dc_predictor_16x32 neon sse2/;
+specialize qw/aom_dc_predictor_16x64 neon sse2/;
+specialize qw/aom_dc_predictor_32x8 neon sse2/;
+specialize qw/aom_dc_predictor_32x16 neon sse2 avx2/;
specialize qw/aom_dc_predictor_32x32 neon sse2 avx2/;
-specialize qw/aom_dc_predictor_32x64 sse2 avx2/;
-specialize qw/aom_dc_predictor_64x64 sse2 avx2/;
-specialize qw/aom_dc_predictor_64x32 sse2 avx2/;
-specialize qw/aom_dc_predictor_64x16 sse2 avx2/;
+specialize qw/aom_dc_predictor_32x64 neon sse2 avx2/;
+specialize qw/aom_dc_predictor_64x64 neon sse2 avx2/;
+specialize qw/aom_dc_predictor_64x32 neon sse2 avx2/;
+specialize qw/aom_dc_predictor_64x16 neon sse2 avx2/;
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
specialize qw/aom_highbd_v_predictor_4x4 sse2 neon/;
specialize qw/aom_highbd_v_predictor_4x8 sse2 neon/;
@@ -310,57 +310,104 @@
# TODO(yunqingwang): optimize rectangular DC_PRED to replace division
# by multiply and shift.
specialize qw/aom_highbd_dc_predictor_4x4 sse2 neon/;
- specialize qw/aom_highbd_dc_predictor_4x8 sse2/;
- specialize qw/aom_highbd_dc_predictor_8x4 sse2/;;
+ specialize qw/aom_highbd_dc_predictor_4x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_predictor_4x16 neon/;
+ specialize qw/aom_highbd_dc_predictor_8x4 sse2 neon/;
specialize qw/aom_highbd_dc_predictor_8x8 sse2 neon/;
- specialize qw/aom_highbd_dc_predictor_8x16 sse2/;;
- specialize qw/aom_highbd_dc_predictor_16x8 sse2/;
+ specialize qw/aom_highbd_dc_predictor_8x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_predictor_8x32 neon/;
+ specialize qw/aom_highbd_dc_predictor_16x4 neon/;
+ specialize qw/aom_highbd_dc_predictor_16x8 sse2 neon/;
specialize qw/aom_highbd_dc_predictor_16x16 sse2 neon/;
- specialize qw/aom_highbd_dc_predictor_16x32 sse2/;
- specialize qw/aom_highbd_dc_predictor_32x16 sse2/;
+ specialize qw/aom_highbd_dc_predictor_16x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_predictor_16x64 neon/;
+ specialize qw/aom_highbd_dc_predictor_32x8 neon/;
+ specialize qw/aom_highbd_dc_predictor_32x16 sse2 neon/;
specialize qw/aom_highbd_dc_predictor_32x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_predictor_32x64 neon/;
+ specialize qw/aom_highbd_dc_predictor_64x16 neon/;
+ specialize qw/aom_highbd_dc_predictor_64x32 neon/;
specialize qw/aom_highbd_dc_predictor_64x64 neon/;
- specialize qw/aom_highbd_h_predictor_4x4 sse2/;
- specialize qw/aom_highbd_h_predictor_4x8 sse2/;
- specialize qw/aom_highbd_h_predictor_8x4 sse2/;
- specialize qw/aom_highbd_h_predictor_8x8 sse2/;
- specialize qw/aom_highbd_h_predictor_8x16 sse2/;
- specialize qw/aom_highbd_h_predictor_16x8 sse2/;
- specialize qw/aom_highbd_h_predictor_16x16 sse2/;
- specialize qw/aom_highbd_h_predictor_16x32 sse2/;
- specialize qw/aom_highbd_h_predictor_32x16 sse2/;
- specialize qw/aom_highbd_h_predictor_32x32 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_4x4 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_4x4 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_4x4 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_4x8 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_4x8 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_4x8 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_8x4 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_8x4 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_8x4 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_8x8 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_8x8 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_8x8 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_8x16 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_8x16 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_8x16 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_16x8 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_16x8 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_16x8 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_16x16 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_16x16 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_16x16 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_16x32 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_16x32 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_16x32 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_32x16 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_32x16 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_32x16 sse2/;
- specialize qw/aom_highbd_dc_left_predictor_32x32 sse2/;
- specialize qw/aom_highbd_dc_top_predictor_32x32 sse2/;
- specialize qw/aom_highbd_dc_128_predictor_32x32 sse2/;
+ specialize qw/aom_highbd_h_predictor_4x4 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_4x8 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_4x16 neon/;
+ specialize qw/aom_highbd_h_predictor_8x4 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_8x8 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_8x16 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_8x32 neon/;
+ specialize qw/aom_highbd_h_predictor_16x4 neon/;
+ specialize qw/aom_highbd_h_predictor_16x8 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_16x16 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_16x32 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_16x64 neon/;
+ specialize qw/aom_highbd_h_predictor_32x8 neon/;
+ specialize qw/aom_highbd_h_predictor_32x16 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_32x32 sse2 neon/;
+ specialize qw/aom_highbd_h_predictor_32x64 neon/;
+ specialize qw/aom_highbd_h_predictor_64x16 neon/;
+ specialize qw/aom_highbd_h_predictor_64x32 neon/;
+ specialize qw/aom_highbd_h_predictor_64x64 neon/;
+
+ specialize qw/aom_highbd_dc_128_predictor_4x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_4x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_4x16 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_8x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_8x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_8x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_8x32 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_16x4 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_16x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_16x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_16x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_16x64 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_32x8 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_32x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_32x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_32x64 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_64x16 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_64x32 neon/;
+ specialize qw/aom_highbd_dc_128_predictor_64x64 neon/;
+
+ specialize qw/aom_highbd_dc_left_predictor_4x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_4x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_4x16 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_8x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_8x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_8x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_8x32 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_16x4 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_16x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_16x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_16x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_16x64 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_32x8 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_32x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_32x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_32x64 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_64x16 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_64x32 neon/;
+ specialize qw/aom_highbd_dc_left_predictor_64x64 neon/;
+
+ specialize qw/aom_highbd_dc_top_predictor_4x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_4x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_4x16 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_8x4 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_8x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_8x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_8x32 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_16x4 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_16x8 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_16x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_16x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_16x64 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_32x8 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_32x16 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_32x32 sse2 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_32x64 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_64x16 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_64x32 neon/;
+ specialize qw/aom_highbd_dc_top_predictor_64x64 neon/;
specialize qw/aom_highbd_paeth_predictor_4x4 neon/;
specialize qw/aom_highbd_paeth_predictor_4x8 neon/;
@@ -451,8 +498,8 @@
add_proto qw/void aom_convolve8_vert/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h";
specialize qw/aom_convolve_copy neon sse2 avx2/;
-specialize qw/aom_convolve8_horiz sse2 ssse3/, "$avx2_ssse3";
-specialize qw/aom_convolve8_vert sse2 ssse3/, "$avx2_ssse3";
+specialize qw/aom_convolve8_horiz neon sse2 ssse3/, "$avx2_ssse3";
+specialize qw/aom_convolve8_vert neon sse2 ssse3/, "$avx2_ssse3";
add_proto qw/void aom_scaled_2d/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h";
specialize qw/aom_scaled_2d ssse3 neon/;
@@ -607,12 +654,16 @@
add_proto qw/void aom_fdct4x4_lp/, "const int16_t *input, int16_t *output, int stride";
specialize qw/aom_fdct4x4_lp neon sse2/;
- add_proto qw/void aom_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/aom_fdct8x8 neon sse2/, "$ssse3_x86_64";
- # High bit depth
- if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
- add_proto qw/void aom_highbd_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/aom_highbd_fdct8x8 sse2/;
+ if (aom_config("CONFIG_INTERNAL_STATS") eq "yes"){
+  # 8x8 DCT transform for psnr-hvs. Unlike the other transforms, it isn't
+  # compatible with av1 scan orders, because it does two transposes.
+ add_proto qw/void aom_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride";
+ specialize qw/aom_fdct8x8 neon sse2/, "$ssse3_x86_64";
+ # High bit depth
+ if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
+ add_proto qw/void aom_highbd_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride";
+ specialize qw/aom_highbd_fdct8x8 sse2/;
+ }
}
# FFT/IFFT (float) only used for denoising (and noise power spectral density estimation)
add_proto qw/void aom_fft2x2_float/, "const float *input, float *temp, float *output";
@@ -743,13 +794,13 @@
specialize qw/aom_sum_squares_2d_i16 sse2 avx2 neon/;
add_proto qw/uint64_t aom_sum_squares_i16/, "const int16_t *src, uint32_t N";
- specialize qw/aom_sum_squares_i16 sse2/;
+ specialize qw/aom_sum_squares_i16 sse2 neon/;
add_proto qw/uint64_t aom_var_2d_u8/, "uint8_t *src, int src_stride, int width, int height";
- specialize qw/aom_var_2d_u8 sse2 avx2/;
+ specialize qw/aom_var_2d_u8 sse2 avx2 neon/;
add_proto qw/uint64_t aom_var_2d_u16/, "uint8_t *src, int src_stride, int width, int height";
- specialize qw/aom_var_2d_u16 sse2 avx2/;
+ specialize qw/aom_var_2d_u16 sse2 avx2 neon/;
}
#
@@ -802,9 +853,12 @@
specialize qw/aom_sad_skip_16x8 sse2 neon/;
specialize qw/aom_sad_skip_8x16 sse2 neon/;
specialize qw/aom_sad_skip_8x8 sse2 neon/;
+ specialize qw/aom_sad_skip_8x4 neon/;
specialize qw/aom_sad_skip_4x8 sse2 neon/;
+ specialize qw/aom_sad_skip_4x4 neon/;
specialize qw/aom_sad_skip_4x16 sse2 neon/;
+ specialize qw/aom_sad_skip_16x4 neon/;
specialize qw/aom_sad_skip_8x32 sse2 neon/;
specialize qw/aom_sad_skip_32x8 sse2 neon/;
specialize qw/aom_sad_skip_16x64 sse2 neon/;
@@ -834,43 +888,31 @@
specialize qw/aom_sad16x64_avg sse2 neon/;
specialize qw/aom_sad64x16_avg sse2 neon/;
- specialize qw/aom_dist_wtd_sad128x128_avg ssse3/;
- specialize qw/aom_dist_wtd_sad128x64_avg ssse3/;
- specialize qw/aom_dist_wtd_sad64x128_avg ssse3/;
- specialize qw/aom_dist_wtd_sad64x64_avg ssse3/;
- specialize qw/aom_dist_wtd_sad64x32_avg ssse3/;
- specialize qw/aom_dist_wtd_sad32x64_avg ssse3/;
- specialize qw/aom_dist_wtd_sad32x32_avg ssse3/;
- specialize qw/aom_dist_wtd_sad32x16_avg ssse3/;
- specialize qw/aom_dist_wtd_sad16x32_avg ssse3/;
- specialize qw/aom_dist_wtd_sad16x16_avg ssse3/;
- specialize qw/aom_dist_wtd_sad16x8_avg ssse3/;
- specialize qw/aom_dist_wtd_sad8x16_avg ssse3/;
- specialize qw/aom_dist_wtd_sad8x8_avg ssse3/;
- specialize qw/aom_dist_wtd_sad8x4_avg ssse3/;
- specialize qw/aom_dist_wtd_sad4x8_avg ssse3/;
- specialize qw/aom_dist_wtd_sad4x4_avg ssse3/;
+ specialize qw/aom_dist_wtd_sad128x128_avg sse2/;
+ specialize qw/aom_dist_wtd_sad128x64_avg sse2/;
+ specialize qw/aom_dist_wtd_sad64x128_avg sse2/;
+ specialize qw/aom_dist_wtd_sad64x64_avg sse2/;
+ specialize qw/aom_dist_wtd_sad64x32_avg sse2/;
+ specialize qw/aom_dist_wtd_sad32x64_avg sse2/;
+ specialize qw/aom_dist_wtd_sad32x32_avg sse2/;
+ specialize qw/aom_dist_wtd_sad32x16_avg sse2/;
+ specialize qw/aom_dist_wtd_sad16x32_avg sse2/;
+ specialize qw/aom_dist_wtd_sad16x16_avg sse2/;
+ specialize qw/aom_dist_wtd_sad16x8_avg sse2/;
+ specialize qw/aom_dist_wtd_sad8x16_avg sse2/;
+ specialize qw/aom_dist_wtd_sad8x8_avg sse2/;
+ specialize qw/aom_dist_wtd_sad8x4_avg sse2/;
+ specialize qw/aom_dist_wtd_sad4x8_avg sse2/;
+ specialize qw/aom_dist_wtd_sad4x4_avg sse2/;
- specialize qw/aom_dist_wtd_sad4x16_avg ssse3/;
- specialize qw/aom_dist_wtd_sad16x4_avg ssse3/;
- specialize qw/aom_dist_wtd_sad8x32_avg ssse3/;
- specialize qw/aom_dist_wtd_sad32x8_avg ssse3/;
- specialize qw/aom_dist_wtd_sad16x64_avg ssse3/;
- specialize qw/aom_dist_wtd_sad64x16_avg ssse3/;
-
- add_proto qw/unsigned int/, "aom_sad4xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
- add_proto qw/unsigned int/, "aom_sad8xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
- add_proto qw/unsigned int/, "aom_sad16xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
- add_proto qw/unsigned int/, "aom_sad32xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
- add_proto qw/unsigned int/, "aom_sad64xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
- add_proto qw/unsigned int/, "aom_sad128xh", "const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height";
-
- specialize qw/aom_sad4xh sse2/;
- specialize qw/aom_sad8xh sse2/;
- specialize qw/aom_sad16xh sse2/;
- specialize qw/aom_sad32xh sse2/;
- specialize qw/aom_sad64xh sse2/;
- specialize qw/aom_sad128xh sse2/;
+ if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
+ specialize qw/aom_dist_wtd_sad4x16_avg sse2/;
+ specialize qw/aom_dist_wtd_sad16x4_avg sse2/;
+ specialize qw/aom_dist_wtd_sad8x32_avg sse2/;
+ specialize qw/aom_dist_wtd_sad32x8_avg sse2/;
+ specialize qw/aom_dist_wtd_sad16x64_avg sse2/;
+ specialize qw/aom_dist_wtd_sad64x16_avg sse2/;
+ }
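+
+  # A rough model of the dist_wtd *_avg kernels: the reference and second
+  # predictor are combined with unequal weights from DIST_WTD_COMP_PARAMS
+  # (the weight pair sums to 16, hence the shift by 4) and the SAD is taken
+  # against the blend:
+  #
+  #   comp[i] = (fwd_offset * ref[i] + bck_offset * second_pred[i] + 8) >> 4
+  #   sad     = sum(abs(src[i] - comp[i]))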
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
foreach (@encoder_block_sizes) {
@@ -884,50 +926,53 @@
}
add_proto qw/unsigned int/, "aom_highbd_dist_wtd_sad${w}x${h}_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param";
}
- specialize qw/aom_highbd_sad128x128 avx2/;
- specialize qw/aom_highbd_sad128x64 avx2/;
- specialize qw/aom_highbd_sad64x128 avx2/;
- specialize qw/aom_highbd_sad64x64 avx2 sse2/;
- specialize qw/aom_highbd_sad64x32 avx2 sse2/;
- specialize qw/aom_highbd_sad32x64 avx2 sse2/;
- specialize qw/aom_highbd_sad32x32 avx2 sse2/;
- specialize qw/aom_highbd_sad32x16 avx2 sse2/;
- specialize qw/aom_highbd_sad16x32 avx2 sse2/;
- specialize qw/aom_highbd_sad16x16 avx2 sse2/;
- specialize qw/aom_highbd_sad16x8 avx2 sse2/;
- specialize qw/aom_highbd_sad8x16 sse2/;
- specialize qw/aom_highbd_sad8x8 sse2/;
- specialize qw/aom_highbd_sad8x4 sse2/;
- specialize qw/aom_highbd_sad4x8 sse2/;
- specialize qw/aom_highbd_sad4x4 sse2/;
+ specialize qw/aom_highbd_sad128x128 avx2 neon/;
+ specialize qw/aom_highbd_sad128x64 avx2 neon/;
+ specialize qw/aom_highbd_sad64x128 avx2 neon/;
+ specialize qw/aom_highbd_sad64x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad64x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad32x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad32x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad32x16 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad16x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad16x16 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad16x8 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad8x16 sse2 neon/;
+ specialize qw/aom_highbd_sad8x8 sse2 neon/;
+ specialize qw/aom_highbd_sad8x4 sse2 neon/;
+ specialize qw/aom_highbd_sad4x8 sse2 neon/;
+ specialize qw/aom_highbd_sad4x4 sse2 neon/;
- specialize qw/aom_highbd_sad4x16 sse2/;
- specialize qw/aom_highbd_sad16x4 avx2 sse2/;
- specialize qw/aom_highbd_sad8x32 sse2/;
- specialize qw/aom_highbd_sad32x8 avx2 sse2/;
- specialize qw/aom_highbd_sad16x64 avx2 sse2/;
- specialize qw/aom_highbd_sad64x16 avx2 sse2/;
+ specialize qw/aom_highbd_sad4x16 sse2 neon/;
+ specialize qw/aom_highbd_sad16x4 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad8x32 sse2 neon/;
+ specialize qw/aom_highbd_sad32x8 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad16x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad64x16 avx2 sse2 neon/;
- specialize qw/aom_highbd_sad_skip_128x128 avx2/;
- specialize qw/aom_highbd_sad_skip_128x64 avx2/;
- specialize qw/aom_highbd_sad_skip_64x128 avx2/;
- specialize qw/aom_highbd_sad_skip_64x64 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_64x32 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x64 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x32 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x16 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x32 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x16 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x8 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_8x16 sse2/;
- specialize qw/aom_highbd_sad_skip_8x8 sse2/;
- specialize qw/aom_highbd_sad_skip_4x8 sse2/;
+ specialize qw/aom_highbd_sad_skip_128x128 avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_128x64 avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x128 avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x16 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x32 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x16 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x8 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x4 neon/;
+ specialize qw/aom_highbd_sad_skip_8x16 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_8x4 neon/;
+ specialize qw/aom_highbd_sad_skip_8x8 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_4x8 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_4x4 neon/;
- specialize qw/aom_highbd_sad_skip_4x16 sse2/;
- specialize qw/aom_highbd_sad_skip_8x32 sse2/;
- specialize qw/aom_highbd_sad_skip_32x8 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x64 avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_64x16 avx2 sse2/;
+ specialize qw/aom_highbd_sad_skip_4x16 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_8x32 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x8 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x64 avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x16 avx2 sse2 neon/;
specialize qw/aom_highbd_sad128x128_avg avx2/;
specialize qw/aom_highbd_sad128x64_avg avx2/;
@@ -957,7 +1002,7 @@
foreach (@encoder_block_sizes) {
($w, $h) = @$_;
add_proto qw/unsigned int/, "aom_masked_sad${w}x${h}", "const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask";
- specialize "aom_masked_sad${w}x${h}", qw/ssse3 avx2/;
+ specialize "aom_masked_sad${w}x${h}", qw/ssse3 avx2 neon/;
}
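
  # Masked SAD blends the two inputs through a 6-bit mask before the
  # difference is taken; a sketch (mask values lie in [0, 64]):
  #
  #   p[i] = (msk[i] * ref[i] + (64 - msk[i]) * second_pred[i] + 32) >> 6
  #   sad  = sum(abs(src[i] - p[i]))
  #
  # with invert_mask swapping which input msk[i] weights.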
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
@@ -976,7 +1021,7 @@
($w, $h) = @$_;
add_proto qw/unsigned int/, "aom_obmc_sad${w}x${h}", "const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask";
if (! (($w == 128 && $h == 32) || ($w == 32 && $h == 128))) {
- specialize "aom_obmc_sad${w}x${h}", qw/sse4_1 avx2/;
+ specialize "aom_obmc_sad${w}x${h}", qw/sse4_1 avx2 neon/;
}
}
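
  # OBMC SAD compares a prediction against a pre-weighted source: wsrc and
  # mask carry 12 fractional bits, so conceptually
  #
  #   sad = sum(round(abs(wsrc[i] - pre[i] * mask[i]) / 2^12))
  #
  # (a sketch of the reference behaviour, not the exact C code).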
@@ -998,7 +1043,6 @@
($w, $h) = @$_;
add_proto qw/void/, "aom_sad${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]";
add_proto qw/void/, "aom_sad${w}x${h}x3d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]";
- add_proto qw/void/, "aom_sad${w}x${h}x4d_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]";
add_proto qw/void/, "aom_sad_skip_${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]";
add_proto qw/void/, "aom_masked_sad${w}x${h}x4d", "const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]";
}
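
  # The x4d/x3d forms amortize work across candidates: one call computes the
  # SAD of the same source block against four (or three) reference pointers,
  # conceptually
  #
  #   for (r = 0; r < 4; ++r)
  #     sad_array[r] = sad(src_ptr, src_stride, ref_ptr[r], ref_stride);
  #
  # while the sad_skip_ variants sample every other row and scale the result
  # back up, roughly halving the cost of a search.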
@@ -1018,7 +1062,6 @@
specialize qw/aom_sad8x16x4d neon sse2/;
specialize qw/aom_sad8x8x4d neon sse2/;
specialize qw/aom_sad8x4x4d neon sse2/;
- specialize qw/aom_sad4x32x4d neon sse2/;
specialize qw/aom_sad4x8x4d neon sse2/;
specialize qw/aom_sad4x4x4d neon sse2/;
@@ -1044,88 +1087,66 @@
specialize qw/aom_sad_skip_16x32x4d avx2 sse2 neon/;
specialize qw/aom_sad_skip_16x16x4d avx2 sse2 neon/;
specialize qw/aom_sad_skip_16x8x4d avx2 sse2 neon/;
+ specialize qw/aom_sad_skip_16x4x4d neon/;
specialize qw/aom_sad_skip_8x32x4d sse2 neon/;
specialize qw/aom_sad_skip_8x16x4d sse2 neon/;
specialize qw/aom_sad_skip_8x8x4d sse2 neon/;
- specialize qw/aom_sad_skip_4x32x4d sse2 neon/;
+ specialize qw/aom_sad_skip_8x4x4d neon/;
specialize qw/aom_sad_skip_4x16x4d sse2 neon/;
specialize qw/aom_sad_skip_4x8x4d sse2 neon/;
+ specialize qw/aom_sad_skip_4x4x4d neon/;
- specialize qw/aom_sad128x128x3d avx2/;
- specialize qw/aom_sad128x64x3d avx2/;
- specialize qw/aom_sad64x128x3d avx2/;
- specialize qw/aom_sad64x64x3d avx2/;
- specialize qw/aom_sad64x32x3d avx2/;
- specialize qw/aom_sad32x64x3d avx2/;
- specialize qw/aom_sad32x32x3d avx2/;
- specialize qw/aom_sad32x16x3d avx2/;
- specialize qw/aom_sad16x32x3d avx2/;
- specialize qw/aom_sad16x16x3d avx2/;
- specialize qw/aom_sad16x8x3d avx2/;
+ specialize qw/aom_sad128x128x3d neon avx2/;
+ specialize qw/aom_sad128x64x3d neon avx2/;
+ specialize qw/aom_sad64x128x3d neon avx2/;
+ specialize qw/aom_sad64x64x3d neon avx2/;
+ specialize qw/aom_sad64x32x3d neon avx2/;
+ specialize qw/aom_sad32x64x3d neon avx2/;
+ specialize qw/aom_sad32x32x3d neon avx2/;
+ specialize qw/aom_sad32x16x3d neon avx2/;
+ specialize qw/aom_sad16x32x3d neon avx2/;
+ specialize qw/aom_sad16x16x3d neon avx2/;
+ specialize qw/aom_sad16x8x3d neon avx2/;
+ specialize qw/aom_sad8x16x3d neon/;
+ specialize qw/aom_sad8x8x3d neon/;
+ specialize qw/aom_sad8x4x3d neon/;
+ specialize qw/aom_sad4x8x3d neon/;
+ specialize qw/aom_sad4x4x3d neon/;
- specialize qw/aom_sad64x16x3d avx2/;
- specialize qw/aom_sad32x8x3d avx2/;
- specialize qw/aom_sad16x64x3d avx2/;
+ specialize qw/aom_sad64x16x3d neon avx2/;
+ specialize qw/aom_sad32x8x3d neon avx2/;
+ specialize qw/aom_sad16x64x3d neon avx2/;
+ specialize qw/aom_sad16x4x3d neon/;
+ specialize qw/aom_sad8x32x3d neon/;
+ specialize qw/aom_sad4x16x3d neon/;
- if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
- specialize qw/aom_sad128x128x4d_avg sse2/;
- specialize qw/aom_sad128x64x4d_avg sse2/;
- specialize qw/aom_sad64x128x4d_avg sse2/;
- specialize qw/aom_sad64x64x4d_avg sse2/;
- specialize qw/aom_sad64x32x4d_avg sse2/;
- specialize qw/aom_sad64x16x4d_avg sse2/;
- specialize qw/aom_sad32x64x4d_avg sse2/;
- specialize qw/aom_sad32x32x4d_avg sse2/;
- specialize qw/aom_sad32x16x4d_avg sse2/;
- specialize qw/aom_sad32x8x4d_avg sse2/;
- specialize qw/aom_sad16x64x4d_avg sse2/;
- specialize qw/aom_sad16x32x4d_avg sse2/;
- specialize qw/aom_sad16x16x4d_avg sse2/;
- specialize qw/aom_sad16x8x4d_avg sse2/;
+ specialize qw/aom_masked_sad128x128x4d ssse3 neon/;
+ specialize qw/aom_masked_sad128x64x4d ssse3 neon/;
+ specialize qw/aom_masked_sad64x128x4d ssse3 neon/;
+ specialize qw/aom_masked_sad64x64x4d ssse3 neon/;
+ specialize qw/aom_masked_sad64x32x4d ssse3 neon/;
+ specialize qw/aom_masked_sad64x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad32x64x4d ssse3 neon/;
+ specialize qw/aom_masked_sad32x32x4d ssse3 neon/;
+ specialize qw/aom_masked_sad32x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad32x8x4d ssse3 neon/;
+ specialize qw/aom_masked_sad16x64x4d ssse3 neon/;
+ specialize qw/aom_masked_sad16x32x4d ssse3 neon/;
+ specialize qw/aom_masked_sad16x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad16x8x4d ssse3 neon/;
- specialize qw/aom_sad8x16x4d_avg sse2/;
- specialize qw/aom_sad8x8x4d_avg sse2/;
- specialize qw/aom_sad8x4x4d_avg sse2/;
- specialize qw/aom_sad4x16x4d_avg sse2/;
- specialize qw/aom_sad4x8x4d_avg sse2/;
- specialize qw/aom_sad4x4x4d_avg sse2/;
+ specialize qw/aom_masked_sad8x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad8x8x4d ssse3 neon/;
+ specialize qw/aom_masked_sad8x4x4d ssse3 neon/;
+ specialize qw/aom_masked_sad4x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad4x8x4d ssse3 neon/;
+ specialize qw/aom_masked_sad4x4x4d ssse3 neon/;
- specialize qw/aom_sad4x32x4d_avg sse2/;
- specialize qw/aom_sad4x16x4d_avg sse2/;
- specialize qw/aom_sad16x4x4d_avg sse2/;
- specialize qw/aom_sad8x32x4d_avg sse2/;
- specialize qw/aom_sad32x8x4d_avg sse2/;
- specialize qw/aom_sad64x16x4d_avg sse2/;
- }
-
- specialize qw/aom_masked_sad128x128x4d ssse3/;
- specialize qw/aom_masked_sad128x64x4d ssse3/;
- specialize qw/aom_masked_sad64x128x4d ssse3/;
- specialize qw/aom_masked_sad64x64x4d ssse3/;
- specialize qw/aom_masked_sad64x32x4d ssse3/;
- specialize qw/aom_masked_sad64x16x4d ssse3/;
- specialize qw/aom_masked_sad32x64x4d ssse3/;
- specialize qw/aom_masked_sad32x32x4d ssse3/;
- specialize qw/aom_masked_sad32x16x4d ssse3/;
- specialize qw/aom_masked_sad32x8x4d ssse3/;
- specialize qw/aom_masked_sad16x64x4d ssse3/;
- specialize qw/aom_masked_sad16x32x4d ssse3/;
- specialize qw/aom_masked_sad16x16x4d ssse3/;
- specialize qw/aom_masked_sad16x8x4d ssse3/;
-
- specialize qw/aom_masked_sad8x16x4d ssse3/;
- specialize qw/aom_masked_sad8x8x4d ssse3/;
- specialize qw/aom_masked_sad8x4x4d ssse3/;
- specialize qw/aom_masked_sad4x16x4d ssse3/;
- specialize qw/aom_masked_sad4x8x4d ssse3/;
- specialize qw/aom_masked_sad4x4x4d ssse3/;
-
- specialize qw/aom_masked_sad4x32x4d ssse3/;
- specialize qw/aom_masked_sad4x16x4d ssse3/;
- specialize qw/aom_masked_sad16x4x4d ssse3/;
- specialize qw/aom_masked_sad8x32x4d ssse3/;
- specialize qw/aom_masked_sad32x8x4d ssse3/;
- specialize qw/aom_masked_sad64x16x4d ssse3/;
+ specialize qw/aom_masked_sad4x16x4d ssse3 neon/;
+ specialize qw/aom_masked_sad16x4x4d ssse3 neon/;
+ specialize qw/aom_masked_sad8x32x4d ssse3 neon/;
+ specialize qw/aom_masked_sad32x8x4d ssse3 neon/;
+ specialize qw/aom_masked_sad64x16x4d ssse3 neon/;
#
# Multi-block SAD, comparing a reference to N independent blocks
#
@@ -1139,50 +1160,53 @@
specialize "aom_highbd_sad${w}x${h}x4d", qw/sse2/;
}
}
- specialize qw/aom_highbd_sad128x128x4d avx2/;
- specialize qw/aom_highbd_sad128x64x4d avx2/;
- specialize qw/aom_highbd_sad64x128x4d avx2/;
- specialize qw/aom_highbd_sad64x64x4d sse2 avx2/;
- specialize qw/aom_highbd_sad64x32x4d sse2 avx2/;
- specialize qw/aom_highbd_sad32x64x4d sse2 avx2/;
- specialize qw/aom_highbd_sad32x32x4d sse2 avx2/;
- specialize qw/aom_highbd_sad32x16x4d sse2 avx2/;
- specialize qw/aom_highbd_sad16x32x4d sse2 avx2/;
- specialize qw/aom_highbd_sad16x16x4d sse2 avx2/;
- specialize qw/aom_highbd_sad16x8x4d sse2 avx2/;
- specialize qw/aom_highbd_sad8x16x4d sse2/;
- specialize qw/aom_highbd_sad8x8x4d sse2/;
- specialize qw/aom_highbd_sad8x4x4d sse2/;
- specialize qw/aom_highbd_sad4x8x4d sse2/;
- specialize qw/aom_highbd_sad4x4x4d sse2/;
+ specialize qw/aom_highbd_sad128x128x4d avx2 neon/;
+ specialize qw/aom_highbd_sad128x64x4d avx2 neon/;
+ specialize qw/aom_highbd_sad64x128x4d avx2 neon/;
+ specialize qw/aom_highbd_sad64x64x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad64x32x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad32x64x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad32x32x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad32x16x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad16x32x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad16x16x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad16x8x4d sse2 avx2 neon/;
+ specialize qw/aom_highbd_sad8x16x4d sse2 neon/;
+ specialize qw/aom_highbd_sad8x8x4d sse2 neon/;
+ specialize qw/aom_highbd_sad8x4x4d sse2 neon/;
+ specialize qw/aom_highbd_sad4x8x4d sse2 neon/;
+ specialize qw/aom_highbd_sad4x4x4d sse2 neon/;
- specialize qw/aom_highbd_sad4x16x4d sse2/;
- specialize qw/aom_highbd_sad16x4x4d avx2 sse2/;
- specialize qw/aom_highbd_sad8x32x4d sse2/;
- specialize qw/aom_highbd_sad32x8x4d avx2 sse2/;
- specialize qw/aom_highbd_sad16x64x4d avx2 sse2/;
- specialize qw/aom_highbd_sad64x16x4d avx2 sse2/;
+ specialize qw/aom_highbd_sad4x16x4d sse2 neon/;
+ specialize qw/aom_highbd_sad16x4x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad8x32x4d sse2 neon/;
+ specialize qw/aom_highbd_sad32x8x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad16x64x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad64x16x4d avx2 sse2 neon/;
- specialize qw/aom_highbd_sad_skip_128x128x4d avx2/;
- specialize qw/aom_highbd_sad_skip_128x64x4d avx2/;
- specialize qw/aom_highbd_sad_skip_64x128x4d avx2/;
- specialize qw/aom_highbd_sad_skip_64x64x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_64x32x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x64x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x32x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_32x16x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x32x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x16x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x8x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_8x16x4d sse2/;
- specialize qw/aom_highbd_sad_skip_8x8x4d sse2/;
- specialize qw/aom_highbd_sad_skip_4x8x4d sse2/;
+ specialize qw/aom_highbd_sad_skip_128x128x4d avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_128x64x4d avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x128x4d avx2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x64x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x32x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x64x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x32x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x16x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x32x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x16x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x8x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x4x4d neon/;
+ specialize qw/aom_highbd_sad_skip_8x16x4d sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_8x8x4d sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_8x4x4d neon/;
+ specialize qw/aom_highbd_sad_skip_4x8x4d sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_4x4x4d neon/;
- specialize qw/aom_highbd_sad_skip_4x16x4d sse2/;
- specialize qw/aom_highbd_sad_skip_8x32x4d sse2/;
- specialize qw/aom_highbd_sad_skip_32x8x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_16x64x4d avx2 sse2/;
- specialize qw/aom_highbd_sad_skip_64x16x4d avx2 sse2/;
+ specialize qw/aom_highbd_sad_skip_4x16x4d sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_8x32x4d sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_32x8x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_16x64x4d avx2 sse2 neon/;
+ specialize qw/aom_highbd_sad_skip_64x16x4d avx2 sse2 neon/;
specialize qw/aom_highbd_sad128x128x3d avx2/;
specialize qw/aom_highbd_sad128x64x3d avx2/;
@@ -1214,13 +1238,15 @@
specialize qw/aom_avg_8x8_quad avx2 sse2 neon/;
add_proto qw/void aom_minmax_8x8/, "const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max";
- specialize qw/aom_minmax_8x8 sse2/;
+ specialize qw/aom_minmax_8x8 sse2 neon/;
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/unsigned int aom_highbd_avg_8x8/, "const uint8_t *, int p";
+ specialize qw/aom_highbd_avg_8x8 neon/;
add_proto qw/unsigned int aom_highbd_avg_4x4/, "const uint8_t *, int p";
specialize qw/aom_highbd_avg_4x4 neon/;
add_proto qw/void aom_highbd_minmax_8x8/, "const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max";
+ specialize qw/aom_highbd_minmax_8x8 neon/;
}
add_proto qw/void aom_int_pro_row/, "int16_t *hbuf, const uint8_t *ref, const int ref_stride, const int width, const int height, int norm_factor";
@@ -1238,7 +1264,7 @@
# hadamard transform and satd for implementing the temporal dependency model
#
add_proto qw/void aom_hadamard_4x4/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
- specialize qw/aom_hadamard_4x4 sse2/;
+ specialize qw/aom_hadamard_4x4 sse2 neon/;
add_proto qw/void aom_hadamard_8x8/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
specialize qw/aom_hadamard_8x8 sse2 neon/;
@@ -1247,7 +1273,7 @@
specialize qw/aom_hadamard_16x16 avx2 sse2 neon/;
add_proto qw/void aom_hadamard_32x32/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
- specialize qw/aom_hadamard_32x32 avx2 sse2/;
+ specialize qw/aom_hadamard_32x32 avx2 sse2 neon/;
add_proto qw/void aom_hadamard_lp_8x8/, "const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff";
specialize qw/aom_hadamard_lp_8x8 sse2 neon/;
@@ -1258,18 +1284,15 @@
add_proto qw/void aom_hadamard_lp_8x8_dual/, "const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff";
specialize qw/aom_hadamard_lp_8x8_dual sse2 avx2 neon/;
- add_proto qw/void aom_pixel_scale/, "const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8";
- specialize qw/aom_pixel_scale sse2/;
-
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/void aom_highbd_hadamard_8x8/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
- specialize qw/aom_highbd_hadamard_8x8 avx2/;
+ specialize qw/aom_highbd_hadamard_8x8 avx2 neon/;
add_proto qw/void aom_highbd_hadamard_16x16/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
- specialize qw/aom_highbd_hadamard_16x16 avx2/;
+ specialize qw/aom_highbd_hadamard_16x16 avx2 neon/;
add_proto qw/void aom_highbd_hadamard_32x32/, "const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff";
- specialize qw/aom_highbd_hadamard_32x32 avx2/;
+ specialize qw/aom_highbd_hadamard_32x32 avx2 neon/;
}
add_proto qw/int aom_satd/, "const tran_low_t *coeff, int length";
specialize qw/aom_satd neon sse2 avx2/;
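
  # SATD here is simply the sum of absolute transform coefficients; a scalar
  # sketch of aom_satd over a Hadamard-transformed block:
  #
  #   int satd = 0;
  #   for (int i = 0; i < length; ++i) satd += abs((int)coeff[i]);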
@@ -1299,17 +1322,11 @@
#
# Specialty Variance
#
- add_proto qw/void aom_get16x16var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
- add_proto qw/void aom_get8x8var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
-
- specialize qw/aom_get16x16var neon/;
- specialize qw/aom_get8x8var sse2 neon/;
-
add_proto qw/void aom_get_var_sse_sum_8x8_quad/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8";
specialize qw/aom_get_var_sse_sum_8x8_quad avx2 sse2 neon/;
add_proto qw/void aom_get_var_sse_sum_16x16_dual/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16";
- specialize qw/aom_get_var_sse_sum_16x16_dual avx2/;
+ specialize qw/aom_get_var_sse_sum_16x16_dual avx2 sse2 neon/;
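# The fused helpers above rely on the identity used throughout this file:
#
#   variance = sse - (sum * sum) / (w * h)
#
# so one pass over the pixels yields sse, sum and hence the variance; the
# _quad/_dual kernels batch that pass across several sub-blocks at once.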
add_proto qw/unsigned int aom_mse16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
add_proto qw/unsigned int aom_mse16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
@@ -1323,9 +1340,6 @@
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
foreach $bd (8, 10, 12) {
- add_proto qw/void/, "aom_highbd_${bd}_get16x16var", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
- add_proto qw/void/, "aom_highbd_${bd}_get8x8var", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
-
add_proto qw/unsigned int/, "aom_highbd_${bd}_mse16x16", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
add_proto qw/unsigned int/, "aom_highbd_${bd}_mse16x8", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
add_proto qw/unsigned int/, "aom_highbd_${bd}_mse8x16", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
@@ -1340,10 +1354,7 @@
#
#
add_proto qw/unsigned int aom_get_mb_ss/, "const int16_t *";
- add_proto qw/unsigned int aom_get4x4sse_cs/, "const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride";
-
specialize qw/aom_get_mb_ss sse2/;
- specialize qw/aom_get4x4sse_cs neon/;
#
# Variance / Subpixel Variance / Subpixel Avg Variance
@@ -1522,7 +1533,7 @@
foreach (@encoder_block_sizes) {
($w, $h) = @$_;
add_proto qw/unsigned int/, "aom_masked_sub_pixel_variance${w}x${h}", "const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse";
- specialize "aom_masked_sub_pixel_variance${w}x${h}", qw/ssse3/;
+ specialize "aom_masked_sub_pixel_variance${w}x${h}", qw/ssse3 neon/;
}
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
@@ -1543,8 +1554,8 @@
($w, $h) = @$_;
add_proto qw/unsigned int/, "aom_obmc_variance${w}x${h}", "const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse";
add_proto qw/unsigned int/, "aom_obmc_sub_pixel_variance${w}x${h}", "const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse";
- specialize "aom_obmc_variance${w}x${h}", qw/sse4_1 avx2/;
- specialize "aom_obmc_sub_pixel_variance${w}x${h}", q/sse4_1/;
+ specialize "aom_obmc_variance${w}x${h}", qw/sse4_1 avx2 neon/;
+ specialize "aom_obmc_sub_pixel_variance${w}x${h}", qw/sse4_1 neon/;
}
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
@@ -1602,6 +1613,7 @@
# Comp Avg
#
add_proto qw/void aom_comp_avg_pred/, "uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride";
+ specialize qw/aom_comp_avg_pred avx2 neon/;
add_proto qw/void aom_dist_wtd_comp_avg_pred/, "uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param";
specialize qw/aom_dist_wtd_comp_avg_pred ssse3/;
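# Plain compound averaging is a rounded mean of the two predictors,
#
#   comp_pred[i] = (pred[i] + ref[i] + 1) >> 1
#
# while the dist_wtd variant swaps the 1:1 weights for a pair from
# DIST_WTD_COMP_PARAMS summing to 16, as sketched for the dist_wtd SAD
# kernels earlier.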
@@ -1609,47 +1621,52 @@
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/unsigned int aom_highbd_12_variance128x128/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance128x128 sse2/;
+ specialize qw/aom_highbd_12_variance128x128 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance128x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance128x64 sse2/;
+ specialize qw/aom_highbd_12_variance128x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance64x128/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance64x128 sse2/;
+ specialize qw/aom_highbd_12_variance64x128 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance64x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance64x64 sse2/;
+ specialize qw/aom_highbd_12_variance64x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance64x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance64x32 sse2/;
+ specialize qw/aom_highbd_12_variance64x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance32x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance32x64 sse2/;
+ specialize qw/aom_highbd_12_variance32x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance32x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance32x32 sse2/;
+ specialize qw/aom_highbd_12_variance32x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance32x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance32x16 sse2/;
+ specialize qw/aom_highbd_12_variance32x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance16x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance16x32 sse2/;
+ specialize qw/aom_highbd_12_variance16x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance16x16 sse2/;
+ specialize qw/aom_highbd_12_variance16x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance16x8 sse2/;
+ specialize qw/aom_highbd_12_variance16x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance8x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance8x16 sse2/;
+ specialize qw/aom_highbd_12_variance8x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance8x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_variance8x8 sse2/;
+ specialize qw/aom_highbd_12_variance8x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_variance8x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_12_variance8x4 neon/;
+
add_proto qw/unsigned int aom_highbd_12_variance4x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_12_variance4x8 neon/;
+
add_proto qw/unsigned int aom_highbd_12_variance4x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_12_variance4x4 neon/;
add_proto qw/unsigned int aom_highbd_10_variance128x128/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
specialize qw/aom_highbd_10_variance128x128 sse2 avx2 neon/;
@@ -1691,84 +1708,113 @@
specialize qw/aom_highbd_10_variance8x8 sse2 avx2 neon/;
add_proto qw/unsigned int aom_highbd_10_variance8x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_10_variance8x4 neon/;
+
add_proto qw/unsigned int aom_highbd_10_variance4x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_10_variance4x8 neon/;
+
add_proto qw/unsigned int aom_highbd_10_variance4x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_10_variance4x4 neon/;
add_proto qw/unsigned int aom_highbd_8_variance128x128/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance128x128 sse2/;
+ specialize qw/aom_highbd_8_variance128x128 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance128x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance128x64 sse2/;
+ specialize qw/aom_highbd_8_variance128x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance64x128/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance64x128 sse2/;
+ specialize qw/aom_highbd_8_variance64x128 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance64x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance64x64 sse2/;
+ specialize qw/aom_highbd_8_variance64x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance64x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance64x32 sse2/;
+ specialize qw/aom_highbd_8_variance64x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance32x64/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance32x64 sse2/;
+ specialize qw/aom_highbd_8_variance32x64 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance32x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance32x32 sse2/;
+ specialize qw/aom_highbd_8_variance32x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance32x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance32x16 sse2/;
+ specialize qw/aom_highbd_8_variance32x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance16x32/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance16x32 sse2/;
+ specialize qw/aom_highbd_8_variance16x32 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance16x16 sse2/;
+ specialize qw/aom_highbd_8_variance16x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance16x8 sse2/;
+ specialize qw/aom_highbd_8_variance16x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance8x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance8x16 sse2/;
+ specialize qw/aom_highbd_8_variance8x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance8x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_variance8x8 sse2/;
+ specialize qw/aom_highbd_8_variance8x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_variance8x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_8_variance8x4 neon/;
+
add_proto qw/unsigned int aom_highbd_8_variance4x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_8_variance4x8 neon/;
+
add_proto qw/unsigned int aom_highbd_8_variance4x4/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize qw/aom_highbd_8_variance4x4 neon/;
- add_proto qw/void aom_highbd_8_get16x16var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
- add_proto qw/void aom_highbd_8_get8x8var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
+ if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
+ foreach $bd (8, 10, 12) {
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance64x16", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance64x16" , qw/neon/;
- add_proto qw/void aom_highbd_10_get16x16var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
- add_proto qw/void aom_highbd_10_get8x8var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance32x8", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance32x8" , qw/neon/;
- add_proto qw/void aom_highbd_12_get16x16var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
- add_proto qw/void aom_highbd_12_get8x8var/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum";
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance16x64", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance16x64" , qw/neon/;
+
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance16x4", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance16x4" , qw/neon/;
+
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance8x32", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance8x32" , qw/neon/;
+
+ add_proto qw/unsigned int/, "aom_highbd_${bd}_variance4x16", "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse";
+ specialize "aom_highbd_${bd}_variance4x16" , qw/neon/;
+ }
+ }
add_proto qw/unsigned int aom_highbd_8_mse16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_mse16x16 sse2/;
+ specialize qw/aom_highbd_8_mse16x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_8_mse16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_8_mse16x8 neon/;
add_proto qw/unsigned int aom_highbd_8_mse8x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_8_mse8x16 neon/;
add_proto qw/unsigned int aom_highbd_8_mse8x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_8_mse8x8 sse2/;
+ specialize qw/aom_highbd_8_mse8x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_10_mse16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_10_mse16x16 sse2/;
+ specialize qw/aom_highbd_10_mse16x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_10_mse16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_10_mse16x8 neon/;
add_proto qw/unsigned int aom_highbd_10_mse8x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_10_mse8x16 neon/;
add_proto qw/unsigned int aom_highbd_10_mse8x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_10_mse8x8 sse2/;
+ specialize qw/aom_highbd_10_mse8x8 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_mse16x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_mse16x16 sse2/;
+ specialize qw/aom_highbd_12_mse16x16 sse2 neon/;
add_proto qw/unsigned int aom_highbd_12_mse16x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_12_mse16x8 neon/;
add_proto qw/unsigned int aom_highbd_12_mse8x16/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
+ specialize qw/aom_highbd_12_mse8x16 neon/;
add_proto qw/unsigned int aom_highbd_12_mse8x8/, "const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse";
- specialize qw/aom_highbd_12_mse8x8 sse2/;
+ specialize qw/aom_highbd_12_mse8x8 sse2 neon/;
add_proto qw/void aom_highbd_comp_avg_pred/, "uint8_t *comp_pred8, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride";
@@ -2028,7 +2074,7 @@
add_proto qw/void aom_comp_mask_pred/, "uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask";
- specialize qw/aom_comp_mask_pred ssse3 avx2/;
+ specialize qw/aom_comp_mask_pred ssse3 avx2 neon/;
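# aom_comp_mask_pred materializes the same 6-bit A64 mask blend sketched for
# the masked SAD kernels earlier, writing the blended block to comp_pred
# rather than folding it into a SAD.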
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/void aom_highbd_comp_mask_pred/, "uint8_t *comp_pred, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask";
@@ -2037,8 +2083,11 @@
# Flow estimation library
if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
- add_proto qw/double av1_compute_cross_correlation/, "unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2";
+ add_proto qw/double av1_compute_cross_correlation/, "const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2";
specialize qw/av1_compute_cross_correlation sse4_1 avx2/;
+
+ add_proto qw/void aom_compute_flow_at_point/, "const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v";
+ specialize qw/aom_compute_flow_at_point sse4_1/;
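+
+  # (Assumed from context: this refines the local optical-flow vector (u, v)
+  # at one feature point for the flow-based global motion search.)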
}
} # CONFIG_AV1_ENCODER
diff --git a/aom_dsp/arm/aom_convolve8_neon.c b/aom_dsp/arm/aom_convolve8_neon.c
new file mode 100644
index 0000000..3d07a0f
--- /dev/null
+++ b/aom_dsp/arm/aom_convolve8_neon.c
@@ -0,0 +1,1176 @@
+/*
+ * Copyright (c) 2014 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include <assert.h>
+#include <string.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/aom_filter.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/transpose_neon.h"
+#include "aom_ports/mem.h"
+
+#if AOM_ARCH_AARCH64 && \
+ (defined(__ARM_FEATURE_DOTPROD) || defined(__ARM_FEATURE_MATMUL_INT8))
+
+DECLARE_ALIGNED(16, static const uint8_t, dot_prod_permute_tbl[48]) = {
+ 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6,
+ 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10,
+ 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14
+};
+
+DECLARE_ALIGNED(16, static const uint8_t, dot_prod_tran_concat_tbl[32]) = {
+ 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
+ 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31
+};
+
+DECLARE_ALIGNED(16, static const uint8_t, dot_prod_merge_block_tbl[48]) = {
+ /* Shift left and insert new last column in transposed 4x4 block. */
+ 1, 2, 3, 16, 5, 6, 7, 20, 9, 10, 11, 24, 13, 14, 15, 28,
+ /* Shift left and insert two new columns in transposed 4x4 block. */
+ 2, 3, 16, 17, 6, 7, 20, 21, 10, 11, 24, 25, 14, 15, 28, 29,
+ /* Shift left and insert three new columns in transposed 4x4 block. */
+ 3, 16, 17, 18, 7, 20, 21, 22, 11, 24, 25, 26, 15, 28, 29, 30
+};
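+/* Note on indexing: with vqtbl2q_u8, indices 0-15 select bytes from the
+ * first source register (the previous block) and 16-31 from the second (the
+ * incoming block). So row one of the merge table above, { 1, 2, 3, 16, ... },
+ * drops the oldest sample of each transposed row and appends one new one. */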
+
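+/* Two dot-product paths follow: when the I8MM extension
+ * (__ARM_FEATURE_MATMUL_INT8) is available, USDOT multiplies the unsigned
+ * samples against the signed filter directly; otherwise the DOTPROD SDOT
+ * path is used, which first biases the samples into the signed range and
+ * compensates with a precomputed correction term. */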
+#if defined(__ARM_FEATURE_MATMUL_INT8)
+
+static INLINE int16x4_t convolve8_4_usdot(uint8x16_t samples,
+ const int8x8_t filter,
+ const uint8x16x2_t permute_tbl) {
+ uint8x16_t permuted_samples[2];
+ int32x4_t sum;
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+
+  /* Accumulate the dot product from zero; USDOT takes the unsigned samples
+   * directly, so no range-clamp correction is needed (cf. the SDOT path). */
+ sum = vusdotq_lane_s32(vdupq_n_s32(0), permuted_samples[0], filter, 0);
+ sum = vusdotq_lane_s32(sum, permuted_samples[1], filter, 1);
+
+ /* Further narrowing and packing is performed by the caller. */
+ return vqmovn_s32(sum);
+}
+
+static INLINE uint8x8_t convolve8_8_usdot(uint8x16_t samples,
+ const int8x8_t filter,
+ const uint8x16x3_t permute_tbl) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum0, sum1;
+ int16x8_t sum;
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ /* First 4 output values. */
+ sum0 = vusdotq_lane_s32(vdupq_n_s32(0), permuted_samples[0], filter, 0);
+ sum0 = vusdotq_lane_s32(sum0, permuted_samples[1], filter, 1);
+ /* Second 4 output values. */
+ sum1 = vusdotq_lane_s32(vdupq_n_s32(0), permuted_samples[1], filter, 0);
+ sum1 = vusdotq_lane_s32(sum1, permuted_samples[2], filter, 1);
+
+ /* Narrow and re-pack. */
+ sum = vcombine_s16(vqmovn_s32(sum0), vqmovn_s32(sum1));
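+  /* vqrshrun applies the rounded right shift by FILTER_BITS (7) and
+   * saturates the result into the unsigned 8-bit pixel range. */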
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int8x8_t filter = vmovn_s16(vld1q_s16(filter_x));
+ uint8x16_t s0, s1, s2, s3;
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)x_step_q4;
+ (void)filter_y;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1);
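+  /* Back up the source pointer by (SUBPEL_TAPS / 2 - 1) = 3 samples so the
+   * 8-tap filter window is centred on the output pixel. */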
+
+ if (w == 4) {
+ const uint8x16x2_t perm_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+ do {
+ int16x4_t t0, t1, t2, t3;
+ uint8x8_t d01, d23;
+
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ t0 = convolve8_4_usdot(s0, filter, perm_tbl);
+ t1 = convolve8_4_usdot(s1, filter, perm_tbl);
+ t2 = convolve8_4_usdot(s2, filter, perm_tbl);
+ t3 = convolve8_4_usdot(s3, filter, perm_tbl);
+ d01 = vqrshrun_n_s16(vcombine_s16(t0, t1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(t2, t3), FILTER_BITS);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ const uint8x16x3_t perm_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+ const uint8_t *s;
+ uint8_t *d;
+ int width;
+ uint8x8_t d0, d1, d2, d3;
+
+ do {
+ width = w;
+ s = src;
+ d = dst;
+ do {
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_usdot(s0, filter, perm_tbl);
+ d1 = convolve8_8_usdot(s1, filter, perm_tbl);
+ d2 = convolve8_8_usdot(s2, filter, perm_tbl);
+ d3 = convolve8_8_usdot(s3, filter, perm_tbl);
+
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ }
+}
+
+static INLINE void transpose_concat_4x4(uint8x8_t a0, uint8x8_t a1,
+ uint8x8_t a2, uint8x8_t a3,
+ uint8x16_t *b,
+ const uint8x16_t permute_tbl) {
+ /* Transpose 8-bit elements and concatenate result rows as follows:
+ * a0: 00, 01, 02, 03, XX, XX, XX, XX
+ * a1: 10, 11, 12, 13, XX, XX, XX, XX
+ * a2: 20, 21, 22, 23, XX, XX, XX, XX
+ * a3: 30, 31, 32, 33, XX, XX, XX, XX
+ *
+ * b: 00, 10, 20, 30, 01, 11, 21, 31, 02, 12, 22, 32, 03, 13, 23, 33
+ *
+ * The 'permute_tbl' is always 'dot_prod_tran_concat_tbl' above. Passing it
+ * as an argument is preferable to loading it directly from memory as this
+ * inline helper is called many times from the same parent function.
+ */
+
+ uint8x16x2_t samples = { { vcombine_u8(a0, a1), vcombine_u8(a2, a3) } };
+ *b = vqtbl2q_u8(samples, permute_tbl);
+}
+
+static INLINE void transpose_concat_8x4(uint8x8_t a0, uint8x8_t a1,
+ uint8x8_t a2, uint8x8_t a3,
+ uint8x16_t *b0, uint8x16_t *b1,
+ const uint8x16x2_t permute_tbl) {
+ /* Transpose 8-bit elements and concatenate result rows as follows:
+ * a0: 00, 01, 02, 03, 04, 05, 06, 07
+ * a1: 10, 11, 12, 13, 14, 15, 16, 17
+ * a2: 20, 21, 22, 23, 24, 25, 26, 27
+ * a3: 30, 31, 32, 33, 34, 35, 36, 37
+ *
+ * b0: 00, 10, 20, 30, 01, 11, 21, 31, 02, 12, 22, 32, 03, 13, 23, 33
+ * b1: 04, 14, 24, 34, 05, 15, 25, 35, 06, 16, 26, 36, 07, 17, 27, 37
+ *
+ * The 'permute_tbl' is always 'dot_prod_tran_concat_tbl' above. Passing it
+ * as an argument is preferable to loading it directly from memory as this
+ * inline helper is called many times from the same parent function.
+ */
+
+ uint8x16x2_t samples = { { vcombine_u8(a0, a1), vcombine_u8(a2, a3) } };
+ *b0 = vqtbl2q_u8(samples, permute_tbl.val[0]);
+ *b1 = vqtbl2q_u8(samples, permute_tbl.val[1]);
+}
+
+static INLINE int16x4_t convolve8_4_usdot_partial(const uint8x16_t samples_lo,
+ const uint8x16_t samples_hi,
+ const int8x8_t filter) {
+ /* Sample permutation is performed by the caller. */
+ int32x4_t sum;
+
+ sum = vusdotq_lane_s32(vdupq_n_s32(0), samples_lo, filter, 0);
+ sum = vusdotq_lane_s32(sum, samples_hi, filter, 1);
+
+ /* Further narrowing and packing is performed by the caller. */
+ return vqmovn_s32(sum);
+}
+
+static INLINE uint8x8_t convolve8_8_usdot_partial(const uint8x16_t samples0_lo,
+ const uint8x16_t samples0_hi,
+ const uint8x16_t samples1_lo,
+ const uint8x16_t samples1_hi,
+ const int8x8_t filter) {
+ /* Sample permutation is performed by the caller. */
+ int32x4_t sum0, sum1;
+ int16x8_t sum;
+
+ /* First 4 output values. */
+ sum0 = vusdotq_lane_s32(vdupq_n_s32(0), samples0_lo, filter, 0);
+ sum0 = vusdotq_lane_s32(sum0, samples0_hi, filter, 1);
+ /* Second 4 output values. */
+ sum1 = vusdotq_lane_s32(vdupq_n_s32(0), samples1_lo, filter, 0);
+ sum1 = vusdotq_lane_s32(sum1, samples1_hi, filter, 1);
+
+ /* Narrow and re-pack. */
+ sum = vcombine_s16(vqmovn_s32(sum0), vqmovn_s32(sum1));
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int8x8_t filter = vmovn_s16(vld1q_s16(filter_y));
+ const uint8x16x3_t merge_block_tbl = vld1q_u8_x3(dot_prod_merge_block_tbl);
+ uint8x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint8x16x2_t samples_LUT;
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)filter_x;
+ (void)x_step_q4;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1) * src_stride;
+
+ if (w == 4) {
+ const uint8x16_t tran_concat_tbl = vld1q_u8(dot_prod_tran_concat_tbl);
+ uint8x16_t s0123, s1234, s2345, s3456, s4567, s5678, s6789, s78910;
+ int16x4_t d0, d1, d2, d3;
+ uint8x8_t d01, d23;
+
+ load_u8_8x7(src, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ src += 7 * src_stride;
+
+ s7 = vdup_n_u8(0);
+ s8 = vdup_n_u8(0);
+ s9 = vdup_n_u8(0);
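+  /* Only 7 rows are available ahead of the loop, so rows 7-9 were zeroed
+   * above to keep the transposes below fully defined. The vectors built from
+   * them are recomputed by the merge step in the first loop iteration. */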
+
+ /* This operation combines a conventional transpose and the sample permute
+ * (see horizontal case) required before computing the dot product.
+ */
+ transpose_concat_4x4(s0, s1, s2, s3, &s0123, tran_concat_tbl);
+ transpose_concat_4x4(s1, s2, s3, s4, &s1234, tran_concat_tbl);
+ transpose_concat_4x4(s2, s3, s4, s5, &s2345, tran_concat_tbl);
+ transpose_concat_4x4(s3, s4, s5, s6, &s3456, tran_concat_tbl);
+ transpose_concat_4x4(s4, s5, s6, s7, &s4567, tran_concat_tbl);
+ transpose_concat_4x4(s5, s6, s7, s8, &s5678, tran_concat_tbl);
+ transpose_concat_4x4(s6, s7, s8, s9, &s6789, tran_concat_tbl);
+
+ do {
+ load_u8_8x4(src, src_stride, &s7, &s8, &s9, &s10);
+
+ transpose_concat_4x4(s7, s8, s9, s10, &s78910, tran_concat_tbl);
+
+ /* Merge new data into block from previous iteration. */
+ samples_LUT.val[0] = s3456;
+ samples_LUT.val[1] = s78910;
+ s4567 = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[0]);
+ s5678 = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[1]);
+ s6789 = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[2]);
+
+ d0 = convolve8_4_usdot_partial(s0123, s4567, filter);
+ d1 = convolve8_4_usdot_partial(s1234, s5678, filter);
+ d2 = convolve8_4_usdot_partial(s2345, s6789, filter);
+ d3 = convolve8_4_usdot_partial(s3456, s78910, filter);
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ /* Prepare block for next iteration - re-using as much as possible. */
+ /* Shuffle everything up four rows. */
+ s0123 = s4567;
+ s1234 = s5678;
+ s2345 = s6789;
+ s3456 = s78910;
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h != 0);
+ } else {
+ const uint8x16x2_t tran_concat_tbl = vld1q_u8_x2(dot_prod_tran_concat_tbl);
+ uint8x16_t s0123_lo, s0123_hi, s1234_lo, s1234_hi, s2345_lo, s2345_hi,
+ s3456_lo, s3456_hi, s4567_lo, s4567_hi, s5678_lo, s5678_hi, s6789_lo,
+ s6789_hi, s78910_lo, s78910_hi;
+ uint8x8_t d0, d1, d2, d3;
+ const uint8_t *s;
+ uint8_t *d;
+ int height;
+
+ do {
+ height = h;
+ s = src;
+ d = dst;
+
+ load_u8_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ s7 = vdup_n_u8(0);
+ s8 = vdup_n_u8(0);
+ s9 = vdup_n_u8(0);
+
+ /* This operation combines a conventional transpose and the sample permute
+ * (see horizontal case) required before computing the dot product.
+ */
+ transpose_concat_8x4(s0, s1, s2, s3, &s0123_lo, &s0123_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s1, s2, s3, s4, &s1234_lo, &s1234_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s2, s3, s4, s5, &s2345_lo, &s2345_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s3, s4, s5, s6, &s3456_lo, &s3456_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s4, s5, s6, s7, &s4567_lo, &s4567_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s5, s6, s7, s8, &s5678_lo, &s5678_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s6, s7, s8, s9, &s6789_lo, &s6789_hi,
+ tran_concat_tbl);
+
+ do {
+ load_u8_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ transpose_concat_8x4(s7, s8, s9, s10, &s78910_lo, &s78910_hi,
+ tran_concat_tbl);
+
+ /* Merge new data into block from previous iteration. */
+ samples_LUT.val[0] = s3456_lo;
+ samples_LUT.val[1] = s78910_lo;
+ s4567_lo = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[0]);
+ s5678_lo = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[1]);
+ s6789_lo = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[2]);
+
+ samples_LUT.val[0] = s3456_hi;
+ samples_LUT.val[1] = s78910_hi;
+ s4567_hi = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[0]);
+ s5678_hi = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[1]);
+ s6789_hi = vqtbl2q_u8(samples_LUT, merge_block_tbl.val[2]);
+
+ d0 = convolve8_8_usdot_partial(s0123_lo, s4567_lo, s0123_hi, s4567_hi,
+ filter);
+ d1 = convolve8_8_usdot_partial(s1234_lo, s5678_lo, s1234_hi, s5678_hi,
+ filter);
+ d2 = convolve8_8_usdot_partial(s2345_lo, s6789_lo, s2345_hi, s6789_hi,
+ filter);
+ d3 = convolve8_8_usdot_partial(s3456_lo, s78910_lo, s3456_hi, s78910_hi,
+ filter);
+
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ /* Prepare block for next iteration - re-using as much as possible. */
+ /* Shuffle everything up four rows. */
+ s0123_lo = s4567_lo;
+ s0123_hi = s4567_hi;
+ s1234_lo = s5678_lo;
+ s1234_hi = s5678_hi;
+ s2345_lo = s6789_lo;
+ s2345_hi = s6789_hi;
+ s3456_lo = s78910_lo;
+ s3456_hi = s78910_hi;
+
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ src += 8;
+ dst += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+#else // !defined(__ARM_FEATURE_MATMUL_INT8)
+
+static INLINE int16x4_t convolve8_4_sdot(uint8x16_t samples,
+ const int8x8_t filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x2_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[2];
+ int32x4_t sum;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ sum = vdotq_lane_s32(correction, permuted_samples[0], filter, 0);
+ sum = vdotq_lane_s32(sum, permuted_samples[1], filter, 1);
+
+ /* Further narrowing and packing is performed by the caller. */
+ return vqmovn_s32(sum);
+}
+
+static INLINE uint8x8_t convolve8_8_sdot(uint8x16_t samples,
+ const int8x8_t filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum0, sum1;
+ int16x8_t sum;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum0 = vdotq_lane_s32(correction, permuted_samples[0], filter, 0);
+ sum0 = vdotq_lane_s32(sum0, permuted_samples[1], filter, 1);
+ /* Second 4 output values. */
+ sum1 = vdotq_lane_s32(correction, permuted_samples[1], filter, 0);
+ sum1 = vdotq_lane_s32(sum1, permuted_samples[2], filter, 1);
+
+ /* Narrow and re-pack. */
+ sum = vcombine_s16(vqmovn_s32(sum0), vqmovn_s32(sum1));
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int8x8_t filter = vmovn_s16(vld1q_s16(filter_x));
+ const int16x8_t correct_tmp = vmulq_n_s16(vld1q_s16(filter_x), 128);
+ const int32x4_t correction = vdupq_n_s32((int32_t)vaddvq_s16(correct_tmp));
+ const uint8x16_t range_limit = vdupq_n_u8(128);
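+  /* Biasing the samples down by 128 makes the dot product compute
+   * sum(filter[k] * (src[k] - 128)); seeding the accumulator with
+   * correction = 128 * sum(filter[k]) restores sum(filter[k] * src[k]). */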
+ uint8x16_t s0, s1, s2, s3;
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)x_step_q4;
+ (void)filter_y;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1);
+
+ if (w == 4) {
+ const uint8x16x2_t perm_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+ do {
+ int16x4_t t0, t1, t2, t3;
+ uint8x8_t d01, d23;
+
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ t0 = convolve8_4_sdot(s0, filter, correction, range_limit, perm_tbl);
+ t1 = convolve8_4_sdot(s1, filter, correction, range_limit, perm_tbl);
+ t2 = convolve8_4_sdot(s2, filter, correction, range_limit, perm_tbl);
+ t3 = convolve8_4_sdot(s3, filter, correction, range_limit, perm_tbl);
+ d01 = vqrshrun_n_s16(vcombine_s16(t0, t1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(t2, t3), FILTER_BITS);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ const uint8x16x3_t perm_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+ const uint8_t *s;
+ uint8_t *d;
+ int width;
+ uint8x8_t d0, d1, d2, d3;
+
+ do {
+ width = w;
+ s = src;
+ d = dst;
+ do {
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_sdot(s0, filter, correction, range_limit, perm_tbl);
+ d1 = convolve8_8_sdot(s1, filter, correction, range_limit, perm_tbl);
+ d2 = convolve8_8_sdot(s2, filter, correction, range_limit, perm_tbl);
+ d3 = convolve8_8_sdot(s3, filter, correction, range_limit, perm_tbl);
+
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ }
+}
+
+static INLINE void transpose_concat_4x4(int8x8_t a0, int8x8_t a1, int8x8_t a2,
+ int8x8_t a3, int8x16_t *b,
+ const uint8x16_t permute_tbl) {
+ /* Transpose 8-bit elements and concatenate result rows as follows:
+ * a0: 00, 01, 02, 03, XX, XX, XX, XX
+ * a1: 10, 11, 12, 13, XX, XX, XX, XX
+ * a2: 20, 21, 22, 23, XX, XX, XX, XX
+ * a3: 30, 31, 32, 33, XX, XX, XX, XX
+ *
+ * b: 00, 10, 20, 30, 01, 11, 21, 31, 02, 12, 22, 32, 03, 13, 23, 33
+ *
+ * The 'permute_tbl' is always 'dot_prod_tran_concat_tbl' above. Passing it
+ * as an argument is preferable to loading it directly from memory as this
+ * inline helper is called many times from the same parent function.
+ */
+
+ int8x16x2_t samples = { { vcombine_s8(a0, a1), vcombine_s8(a2, a3) } };
+ *b = vqtbl2q_s8(samples, permute_tbl);
+}
+
+static INLINE void transpose_concat_8x4(int8x8_t a0, int8x8_t a1, int8x8_t a2,
+ int8x8_t a3, int8x16_t *b0,
+ int8x16_t *b1,
+ const uint8x16x2_t permute_tbl) {
+ /* Transpose 8-bit elements and concatenate result rows as follows:
+ * a0: 00, 01, 02, 03, 04, 05, 06, 07
+ * a1: 10, 11, 12, 13, 14, 15, 16, 17
+ * a2: 20, 21, 22, 23, 24, 25, 26, 27
+ * a3: 30, 31, 32, 33, 34, 35, 36, 37
+ *
+ * b0: 00, 10, 20, 30, 01, 11, 21, 31, 02, 12, 22, 32, 03, 13, 23, 33
+ * b1: 04, 14, 24, 34, 05, 15, 25, 35, 06, 16, 26, 36, 07, 17, 27, 37
+ *
+ * The 'permute_tbl' is always 'dot_prod_tran_concat_tbl' above. Passing it
+ * as an argument is preferable to loading it directly from memory as this
+ * inline helper is called many times from the same parent function.
+ */
+
+ int8x16x2_t samples = { { vcombine_s8(a0, a1), vcombine_s8(a2, a3) } };
+ *b0 = vqtbl2q_s8(samples, permute_tbl.val[0]);
+ *b1 = vqtbl2q_s8(samples, permute_tbl.val[1]);
+}
+
+static INLINE int16x4_t convolve8_4_sdot_partial(const int8x16_t samples_lo,
+ const int8x16_t samples_hi,
+ const int32x4_t correction,
+ const int8x8_t filter) {
+ /* Sample range-clamping and permutation are performed by the caller. */
+ int32x4_t sum;
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ sum = vdotq_lane_s32(correction, samples_lo, filter, 0);
+ sum = vdotq_lane_s32(sum, samples_hi, filter, 1);
+
+ /* Further narrowing and packing is performed by the caller. */
+ return vqmovn_s32(sum);
+}
+
+static INLINE uint8x8_t convolve8_8_sdot_partial(const int8x16_t samples0_lo,
+ const int8x16_t samples0_hi,
+ const int8x16_t samples1_lo,
+ const int8x16_t samples1_hi,
+ const int32x4_t correction,
+ const int8x8_t filter) {
+ /* Sample range-clamping and permutation are performed by the caller. */
+ int32x4_t sum0, sum1;
+ int16x8_t sum;
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum0 = vdotq_lane_s32(correction, samples0_lo, filter, 0);
+ sum0 = vdotq_lane_s32(sum0, samples0_hi, filter, 1);
+ /* Second 4 output values. */
+ sum1 = vdotq_lane_s32(correction, samples1_lo, filter, 0);
+ sum1 = vdotq_lane_s32(sum1, samples1_hi, filter, 1);
+
+ /* Narrow and re-pack. */
+ sum = vcombine_s16(vqmovn_s32(sum0), vqmovn_s32(sum1));
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int8x8_t filter = vmovn_s16(vld1q_s16(filter_y));
+ const int16x8_t correct_tmp = vmulq_n_s16(vld1q_s16(filter_y), 128);
+ const int32x4_t correction = vdupq_n_s32((int32_t)vaddvq_s16(correct_tmp));
+ const uint8x8_t range_limit = vdup_n_u8(128);
+ const uint8x16x3_t merge_block_tbl = vld1q_u8_x3(dot_prod_merge_block_tbl);
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6;
+ int8x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ int8x16x2_t samples_LUT;
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)filter_x;
+ (void)x_step_q4;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1) * src_stride;
+
+ if (w == 4) {
+ const uint8x16_t tran_concat_tbl = vld1q_u8(dot_prod_tran_concat_tbl);
+ int8x16_t s0123, s1234, s2345, s3456, s4567, s5678, s6789, s78910;
+ int16x4_t d0, d1, d2, d3;
+ uint8x8_t d01, d23;
+
+ load_u8_8x7(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+ src += 7 * src_stride;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ s0 = vreinterpret_s8_u8(vsub_u8(t0, range_limit));
+ s1 = vreinterpret_s8_u8(vsub_u8(t1, range_limit));
+ s2 = vreinterpret_s8_u8(vsub_u8(t2, range_limit));
+ s3 = vreinterpret_s8_u8(vsub_u8(t3, range_limit));
+ s4 = vreinterpret_s8_u8(vsub_u8(t4, range_limit));
+ s5 = vreinterpret_s8_u8(vsub_u8(t5, range_limit));
+ s6 = vreinterpret_s8_u8(vsub_u8(t6, range_limit));
+ s7 = vdup_n_s8(0);
+ s8 = vdup_n_s8(0);
+ s9 = vdup_n_s8(0);
+
+ /* This operation combines a conventional transpose and the sample permute
+ * (see horizontal case) required before computing the dot product.
+ */
+ transpose_concat_4x4(s0, s1, s2, s3, &s0123, tran_concat_tbl);
+ transpose_concat_4x4(s1, s2, s3, s4, &s1234, tran_concat_tbl);
+ transpose_concat_4x4(s2, s3, s4, s5, &s2345, tran_concat_tbl);
+ transpose_concat_4x4(s3, s4, s5, s6, &s3456, tran_concat_tbl);
+ transpose_concat_4x4(s4, s5, s6, s7, &s4567, tran_concat_tbl);
+ transpose_concat_4x4(s5, s6, s7, s8, &s5678, tran_concat_tbl);
+ transpose_concat_4x4(s6, s7, s8, s9, &s6789, tran_concat_tbl);
+
+ do {
+ uint8x8_t t7, t8, t9, t10;
+
+ load_u8_8x4(src, src_stride, &t7, &t8, &t9, &t10);
+
+ s7 = vreinterpret_s8_u8(vsub_u8(t7, range_limit));
+ s8 = vreinterpret_s8_u8(vsub_u8(t8, range_limit));
+ s9 = vreinterpret_s8_u8(vsub_u8(t9, range_limit));
+ s10 = vreinterpret_s8_u8(vsub_u8(t10, range_limit));
+
+ transpose_concat_4x4(s7, s8, s9, s10, &s78910, tran_concat_tbl);
+
+ /* Merge new data into block from previous iteration. */
+ samples_LUT.val[0] = s3456;
+ samples_LUT.val[1] = s78910;
+ s4567 = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[0]);
+ s5678 = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[1]);
+ s6789 = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[2]);
+
+ d0 = convolve8_4_sdot_partial(s0123, s4567, correction, filter);
+ d1 = convolve8_4_sdot_partial(s1234, s5678, correction, filter);
+ d2 = convolve8_4_sdot_partial(s2345, s6789, correction, filter);
+ d3 = convolve8_4_sdot_partial(s3456, s78910, correction, filter);
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ /* Prepare block for next iteration - re-using as much as possible. */
+ /* Shuffle everything up four rows. */
+ s0123 = s4567;
+ s1234 = s5678;
+ s2345 = s6789;
+ s3456 = s78910;
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h != 0);
+ } else {
+ const uint8x16x2_t tran_concat_tbl = vld1q_u8_x2(dot_prod_tran_concat_tbl);
+ int8x16_t s0123_lo, s0123_hi, s1234_lo, s1234_hi, s2345_lo, s2345_hi,
+ s3456_lo, s3456_hi, s4567_lo, s4567_hi, s5678_lo, s5678_hi, s6789_lo,
+ s6789_hi, s78910_lo, s78910_hi;
+ uint8x8_t d0, d1, d2, d3;
+ const uint8_t *s;
+ uint8_t *d;
+ int height;
+
+ do {
+ height = h;
+ s = src;
+ d = dst;
+
+ load_u8_8x7(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+ s += 7 * src_stride;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ s0 = vreinterpret_s8_u8(vsub_u8(t0, range_limit));
+ s1 = vreinterpret_s8_u8(vsub_u8(t1, range_limit));
+ s2 = vreinterpret_s8_u8(vsub_u8(t2, range_limit));
+ s3 = vreinterpret_s8_u8(vsub_u8(t3, range_limit));
+ s4 = vreinterpret_s8_u8(vsub_u8(t4, range_limit));
+ s5 = vreinterpret_s8_u8(vsub_u8(t5, range_limit));
+ s6 = vreinterpret_s8_u8(vsub_u8(t6, range_limit));
+ s7 = vdup_n_s8(0);
+ s8 = vdup_n_s8(0);
+ s9 = vdup_n_s8(0);
+
+ /* This operation combines a conventional transpose and the sample permute
+ * (see horizontal case) required before computing the dot product.
+ */
+ transpose_concat_8x4(s0, s1, s2, s3, &s0123_lo, &s0123_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s1, s2, s3, s4, &s1234_lo, &s1234_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s2, s3, s4, s5, &s2345_lo, &s2345_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s3, s4, s5, s6, &s3456_lo, &s3456_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s4, s5, s6, s7, &s4567_lo, &s4567_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s5, s6, s7, s8, &s5678_lo, &s5678_hi,
+ tran_concat_tbl);
+ transpose_concat_8x4(s6, s7, s8, s9, &s6789_lo, &s6789_hi,
+ tran_concat_tbl);
+
+ do {
+ uint8x8_t t7, t8, t9, t10;
+
+ load_u8_8x4(s, src_stride, &t7, &t8, &t9, &t10);
+
+ s7 = vreinterpret_s8_u8(vsub_u8(t7, range_limit));
+ s8 = vreinterpret_s8_u8(vsub_u8(t8, range_limit));
+ s9 = vreinterpret_s8_u8(vsub_u8(t9, range_limit));
+ s10 = vreinterpret_s8_u8(vsub_u8(t10, range_limit));
+
+ transpose_concat_8x4(s7, s8, s9, s10, &s78910_lo, &s78910_hi,
+ tran_concat_tbl);
+
+ /* Merge new data into block from previous iteration. */
+ samples_LUT.val[0] = s3456_lo;
+ samples_LUT.val[1] = s78910_lo;
+ s4567_lo = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[0]);
+ s5678_lo = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[1]);
+ s6789_lo = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[2]);
+
+ samples_LUT.val[0] = s3456_hi;
+ samples_LUT.val[1] = s78910_hi;
+ s4567_hi = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[0]);
+ s5678_hi = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[1]);
+ s6789_hi = vqtbl2q_s8(samples_LUT, merge_block_tbl.val[2]);
+
+ d0 = convolve8_8_sdot_partial(s0123_lo, s4567_lo, s0123_hi, s4567_hi,
+ correction, filter);
+ d1 = convolve8_8_sdot_partial(s1234_lo, s5678_lo, s1234_hi, s5678_hi,
+ correction, filter);
+ d2 = convolve8_8_sdot_partial(s2345_lo, s6789_lo, s2345_hi, s6789_hi,
+ correction, filter);
+ d3 = convolve8_8_sdot_partial(s3456_lo, s78910_lo, s3456_hi, s78910_hi,
+ correction, filter);
+
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ /* Prepare block for next iteration - re-using as much as possible. */
+ /* Shuffle everything up four rows. */
+ s0123_lo = s4567_lo;
+ s0123_hi = s4567_hi;
+ s1234_lo = s5678_lo;
+ s1234_hi = s5678_hi;
+ s2345_lo = s6789_lo;
+ s2345_hi = s6789_hi;
+ s3456_lo = s78910_lo;
+ s3456_hi = s78910_hi;
+
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ src += 8;
+ dst += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+#endif // defined(__ARM_FEATURE_MATMUL_INT8)
+
+#else // !(AOM_ARCH_AARCH64 &&
+ // (defined(__ARM_FEATURE_DOTPROD) ||
+ // defined(__ARM_FEATURE_MATMUL_INT8)))
+
+static INLINE int16x4_t convolve8_4(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t filter) {
+ const int16x4_t filter_lo = vget_low_s16(filter);
+ const int16x4_t filter_hi = vget_high_s16(filter);
+ int16x4_t sum;
+
+ sum = vmul_lane_s16(s0, filter_lo, 0);
+ sum = vmla_lane_s16(sum, s1, filter_lo, 1);
+ sum = vmla_lane_s16(sum, s2, filter_lo, 2);
+ sum = vmla_lane_s16(sum, s5, filter_hi, 1);
+ sum = vmla_lane_s16(sum, s6, filter_hi, 2);
+ sum = vmla_lane_s16(sum, s7, filter_hi, 3);
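+  /* The two centre taps carry the filter's largest coefficients, so their
+   * products are added last with saturating arithmetic to guard against
+   * int16_t overflow. */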
+ sum = vqadd_s16(sum, vmul_lane_s16(s3, filter_lo, 3));
+ sum = vqadd_s16(sum, vmul_lane_s16(s4, filter_hi, 0));
+ return sum;
+}
+
+static INLINE uint8x8_t convolve8_8(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t filter) {
+ const int16x4_t filter_lo = vget_low_s16(filter);
+ const int16x4_t filter_hi = vget_high_s16(filter);
+ int16x8_t sum;
+
+ sum = vmulq_lane_s16(s0, filter_lo, 0);
+ sum = vmlaq_lane_s16(sum, s1, filter_lo, 1);
+ sum = vmlaq_lane_s16(sum, s2, filter_lo, 2);
+ sum = vmlaq_lane_s16(sum, s5, filter_hi, 1);
+ sum = vmlaq_lane_s16(sum, s6, filter_hi, 2);
+ sum = vmlaq_lane_s16(sum, s7, filter_hi, 3);
+ sum = vqaddq_s16(sum, vmulq_lane_s16(s3, filter_lo, 3));
+ sum = vqaddq_s16(sum, vmulq_lane_s16(s4, filter_hi, 0));
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int16x8_t filter = vld1q_s16(filter_x);
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)x_step_q4;
+ (void)filter_y;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1);
+
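+  /* Without dot-product instructions the rows are transposed first so the
+   * horizontal filter can be applied down a register, then the results are
+   * transposed back before storing. */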
+ if (h == 4) {
+ uint8x8_t t0, t1, t2, t3, d01, d23;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, d0, d1, d2, d3;
+
+ load_u8_8x4(src, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
+ src += 7;
+
+ do {
+ load_u8_8x4(src, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+
+ d0 = convolve8_4(s0, s1, s2, s3, s4, s5, s6, s7, filter);
+ d1 = convolve8_4(s1, s2, s3, s4, s5, s6, s7, s8, filter);
+ d2 = convolve8_4(s2, s3, s4, s5, s6, s7, s8, s9, filter);
+ d3 = convolve8_4(s3, s4, s5, s6, s7, s8, s9, s10, filter);
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS);
+
+ transpose_u8_4x4(&d01, &d23);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 2 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ src += 4;
+ dst += 4;
+ w -= 4;
+ } while (w != 0);
+ } else {
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, t7, d0, d1, d2, d3;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+
+ if (w == 4) {
+ do {
+ load_u8_8x8(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ load_u8_8x8(src + 7, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6,
+ &t7);
+ transpose_u8_4x8(&t0, &t1, &t2, &t3, t4, t5, t6, t7);
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+
+ d0 = convolve8_8(s0, s1, s2, s3, s4, s5, s6, s7, filter);
+ d1 = convolve8_8(s1, s2, s3, s4, s5, s6, s7, s8, filter);
+ d2 = convolve8_8(s2, s3, s4, s5, s6, s7, s8, s9, filter);
+ d3 = convolve8_8(s3, s4, s5, s6, s7, s8, s9, s10, filter);
+
+ transpose_u8_8x4(&d0, &d1, &d2, &d3);
+
+ store_u8_4x1(dst + 0 * dst_stride, d0, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d1, 0);
+ store_u8_4x1(dst + 2 * dst_stride, d2, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d3, 0);
+ store_u8_4x1(dst + 4 * dst_stride, d0, 1);
+ store_u8_4x1(dst + 5 * dst_stride, d1, 1);
+ store_u8_4x1(dst + 6 * dst_stride, d2, 1);
+ store_u8_4x1(dst + 7 * dst_stride, d3, 1);
+
+ src += 8 * src_stride;
+ dst += 8 * dst_stride;
+ h -= 8;
+ } while (h > 0);
+ } else {
+ uint8x8_t d4, d5, d6, d7;
+ int16x8_t s11, s12, s13, s14;
+ int width;
+ const uint8_t *s;
+ uint8_t *d;
+
+ do {
+ load_u8_8x8(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ width = w;
+ s = src + 7;
+ d = dst;
+
+ do {
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve8_8(s0, s1, s2, s3, s4, s5, s6, s7, filter);
+ d1 = convolve8_8(s1, s2, s3, s4, s5, s6, s7, s8, filter);
+ d2 = convolve8_8(s2, s3, s4, s5, s6, s7, s8, s9, filter);
+ d3 = convolve8_8(s3, s4, s5, s6, s7, s8, s9, s10, filter);
+ d4 = convolve8_8(s4, s5, s6, s7, s8, s9, s10, s11, filter);
+ d5 = convolve8_8(s5, s6, s7, s8, s9, s10, s11, s12, filter);
+ d6 = convolve8_8(s6, s7, s8, s9, s10, s11, s12, s13, filter);
+ d7 = convolve8_8(s7, s8, s9, s10, s11, s12, s13, s14, filter);
+
+ transpose_u8_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
+
+ store_u8_8x8(d, dst_stride, d0, d1, d2, d3, d4, d5, d6, d7);
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s5 = s13;
+ s6 = s14;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 8 * src_stride;
+ dst += 8 * dst_stride;
+ h -= 8;
+ } while (h > 0);
+ }
+ }
+}
+
+void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride,
+ uint8_t *dst, ptrdiff_t dst_stride,
+ const int16_t *filter_x, int x_step_q4,
+ const int16_t *filter_y, int y_step_q4, int w,
+ int h) {
+ const int16x8_t filter = vld1q_s16(filter_y);
+
+ assert((intptr_t)dst % 4 == 0);
+ assert(dst_stride % 4 == 0);
+
+ (void)filter_x;
+ (void)x_step_q4;
+ (void)y_step_q4;
+
+ src -= ((SUBPEL_TAPS / 2) - 1) * src_stride;
+
+ if (w == 4) {
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, d01, d23;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, d0, d1, d2, d3;
+
+ load_u8_8x7(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t4)));
+ s5 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t5)));
+ s6 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t6)));
+
+ src += 7 * src_stride;
+
+ do {
+ load_u8_8x4(src, src_stride, &t0, &t1, &t2, &t3);
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+
+ d0 = convolve8_4(s0, s1, s2, s3, s4, s5, s6, s7, filter);
+ d1 = convolve8_4(s1, s2, s3, s4, s5, s6, s7, s8, filter);
+ d2 = convolve8_4(s2, s3, s4, s5, s6, s7, s8, s9, filter);
+ d3 = convolve8_4(s3, s4, s5, s6, s7, s8, s9, s10, filter);
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS);
+ d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS);
+
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h != 0);
+ } else {
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, d0, d1, d2, d3;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ int height;
+ const uint8_t *s;
+ uint8_t *d;
+
+ do {
+ load_u8_8x7(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ height = h;
+ s = src + 7 * src_stride;
+ d = dst;
+
+ do {
+ load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+
+ d0 = convolve8_8(s0, s1, s2, s3, s4, s5, s6, s7, filter);
+ d1 = convolve8_8(s1, s2, s3, s4, s5, s6, s7, s8, filter);
+ d2 = convolve8_8(s2, s3, s4, s5, s6, s7, s8, s9, filter);
+ d3 = convolve8_8(s3, s4, s5, s6, s7, s8, s9, s10, filter);
+
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ src += 8;
+ dst += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+#endif  // AOM_ARCH_AARCH64 &&
+        // (defined(__ARM_FEATURE_DOTPROD) ||
+        //  defined(__ARM_FEATURE_MATMUL_INT8))
diff --git a/aom_dsp/arm/avg_neon.c b/aom_dsp/arm/avg_neon.c
index 991fd3f..ef2f3af 100644
--- a/aom_dsp/arm/avg_neon.c
+++ b/aom_dsp/arm/avg_neon.c
@@ -9,7 +9,9 @@
*/
#include <arm_neon.h>
+#include <assert.h>
+#include "config/aom_config.h"
#include "config/aom_dsp_rtcd.h"
#include "aom/aom_integer.h"
#include "aom_dsp/arm/mem_neon.h"
@@ -17,7 +19,7 @@
#include "aom_dsp/arm/transpose_neon.h"
#include "aom_ports/mem.h"
-#if !defined(__aarch64__)
+#if !AOM_ARCH_AARCH64
static INLINE uint32x2_t horizontal_add_u16x8_v(const uint16x8_t a) {
const uint32x4_t b = vpaddlq_u16(a);
const uint64x2_t c = vpaddlq_u32(b);
@@ -29,7 +31,7 @@
unsigned int aom_avg_4x4_neon(const uint8_t *a, int a_stride) {
const uint8x16_t b = load_unaligned_u8q(a, a_stride);
const uint16x8_t c = vaddl_u8(vget_low_u8(b), vget_high_u8(b));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const uint32_t d = vaddlvq_u16(c);
return (d + 8) >> 4;
#else
@@ -52,7 +54,7 @@
sum = vaddw_u8(sum, e);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const uint32_t d = vaddlvq_u16(sum);
return (d + 32) >> 6;
#else
@@ -92,52 +94,90 @@
void aom_int_pro_row_neon(int16_t *hbuf, const uint8_t *ref,
const int ref_stride, const int width,
const int height, int norm_factor) {
- const uint8_t *idx = ref;
- const uint16x8_t zero = vdupq_n_u16(0);
- const int16x8_t neg_norm_factor = vdupq_n_s16(-norm_factor);
+ assert(width % 16 == 0);
+ assert(height % 4 == 0);
- for (int wd = 0; wd < width; wd += 16) {
- uint16x8_t vec0 = zero;
- uint16x8_t vec1 = zero;
- idx = ref + wd;
- for (int ht = 0; ht < height; ++ht) {
- const uint8x16_t tmp = vld1q_u8(idx);
- idx += ref_stride;
- vec0 = vaddw_u8(vec0, vget_low_u8(tmp));
- vec1 = vaddw_u8(vec1, vget_high_u8(tmp));
+ const int16x8_t neg_norm_factor = vdupq_n_s16(-norm_factor);
+ uint16x8_t sum_lo[2], sum_hi[2];
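+  // Sums are accumulated in 16-bit lanes; this assumes height never exceeds
+  // 256, since 256 * 255 still fits in a uint16_t.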
+
+ int w = 0;
+ do {
+ const uint8_t *r = ref + w;
+ uint8x16_t r0 = vld1q_u8(r + 0 * ref_stride);
+ uint8x16_t r1 = vld1q_u8(r + 1 * ref_stride);
+ uint8x16_t r2 = vld1q_u8(r + 2 * ref_stride);
+ uint8x16_t r3 = vld1q_u8(r + 3 * ref_stride);
+
+ sum_lo[0] = vaddl_u8(vget_low_u8(r0), vget_low_u8(r1));
+ sum_hi[0] = vaddl_u8(vget_high_u8(r0), vget_high_u8(r1));
+ sum_lo[1] = vaddl_u8(vget_low_u8(r2), vget_low_u8(r3));
+ sum_hi[1] = vaddl_u8(vget_high_u8(r2), vget_high_u8(r3));
+
+ r += 4 * ref_stride;
+
+ for (int h = height - 4; h != 0; h -= 4) {
+ r0 = vld1q_u8(r + 0 * ref_stride);
+ r1 = vld1q_u8(r + 1 * ref_stride);
+ r2 = vld1q_u8(r + 2 * ref_stride);
+ r3 = vld1q_u8(r + 3 * ref_stride);
+
+ uint16x8_t tmp0_lo = vaddl_u8(vget_low_u8(r0), vget_low_u8(r1));
+ uint16x8_t tmp0_hi = vaddl_u8(vget_high_u8(r0), vget_high_u8(r1));
+ uint16x8_t tmp1_lo = vaddl_u8(vget_low_u8(r2), vget_low_u8(r3));
+ uint16x8_t tmp1_hi = vaddl_u8(vget_high_u8(r2), vget_high_u8(r3));
+
+ sum_lo[0] = vaddq_u16(sum_lo[0], tmp0_lo);
+ sum_hi[0] = vaddq_u16(sum_hi[0], tmp0_hi);
+ sum_lo[1] = vaddq_u16(sum_lo[1], tmp1_lo);
+ sum_hi[1] = vaddq_u16(sum_hi[1], tmp1_hi);
+
+ r += 4 * ref_stride;
}
- const int16x8_t result0 =
- vshlq_s16(vreinterpretq_s16_u16(vec0), neg_norm_factor);
- const int16x8_t result1 =
- vshlq_s16(vreinterpretq_s16_u16(vec1), neg_norm_factor);
+ sum_lo[0] = vaddq_u16(sum_lo[0], sum_lo[1]);
+ sum_hi[0] = vaddq_u16(sum_hi[0], sum_hi[1]);
- vst1q_s16(hbuf + wd, result0);
- vst1q_s16(hbuf + wd + 8, result1);
- }
+ const int16x8_t avg0 =
+ vshlq_s16(vreinterpretq_s16_u16(sum_lo[0]), neg_norm_factor);
+ const int16x8_t avg1 =
+ vshlq_s16(vreinterpretq_s16_u16(sum_hi[0]), neg_norm_factor);
+
+ vst1q_s16(hbuf + w, avg0);
+ vst1q_s16(hbuf + w + 8, avg1);
+ w += 16;
+ } while (w < width);
}
void aom_int_pro_col_neon(int16_t *vbuf, const uint8_t *ref,
const int ref_stride, const int width,
const int height, int norm_factor) {
- for (int ht = 0; ht < height; ++ht) {
- uint16x8_t sum = vdupq_n_u16(0);
- for (int wd = 0; wd < width; wd += 16) {
- const uint8x16_t vec = vld1q_u8(ref + wd);
- sum = vaddq_u16(sum, vpaddlq_u8(vec));
+ assert(width % 16 == 0);
+ assert(height % 4 == 0);
+
+ const int16x4_t neg_norm_factor = vdup_n_s16(-norm_factor);
+ uint16x8_t sum[4];
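+  // vpaddlq_u8 pairwise-widens 16 bytes into eight 16-bit sums; vpadalq_u8
+  // then accumulates further columns of each row into the same lanes.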
+
+ int h = 0;
+ do {
+ sum[0] = vpaddlq_u8(vld1q_u8(ref + 0 * ref_stride));
+ sum[1] = vpaddlq_u8(vld1q_u8(ref + 1 * ref_stride));
+ sum[2] = vpaddlq_u8(vld1q_u8(ref + 2 * ref_stride));
+ sum[3] = vpaddlq_u8(vld1q_u8(ref + 3 * ref_stride));
+
+ for (int w = 16; w < width; w += 16) {
+ sum[0] = vpadalq_u8(sum[0], vld1q_u8(ref + 0 * ref_stride + w));
+ sum[1] = vpadalq_u8(sum[1], vld1q_u8(ref + 1 * ref_stride + w));
+ sum[2] = vpadalq_u8(sum[2], vld1q_u8(ref + 2 * ref_stride + w));
+ sum[3] = vpadalq_u8(sum[3], vld1q_u8(ref + 3 * ref_stride + w));
}
-#if defined(__aarch64__)
- vbuf[ht] = ((int16_t)vaddvq_u16(sum)) >> norm_factor;
-#else
- const uint32x4_t a = vpaddlq_u16(sum);
- const uint64x2_t b = vpaddlq_u32(a);
- const uint32x2_t c = vadd_u32(vreinterpret_u32_u64(vget_low_u64(b)),
- vreinterpret_u32_u64(vget_high_u64(b)));
- vbuf[ht] = ((int16_t)vget_lane_u32(c, 0)) >> norm_factor;
-#endif
- ref += ref_stride;
- }
+ uint16x4_t sum_4d = vmovn_u32(horizontal_add_4d_u16x8(sum));
+ int16x4_t avg = vshl_s16(vreinterpret_s16_u16(sum_4d), neg_norm_factor);
+ vst1_s16(vbuf + h, avg);
+
+ ref += 4 * ref_stride;
+ h += 4;
+ } while (h < height);
}
// coeff: 16 bits, dynamic range [-32640, 32640].
@@ -177,7 +217,7 @@
v_mean = vpadalq_s16(v_mean, diff);
v_low = vget_low_s16(diff);
v_sse = vmlal_s16(v_sse, v_low, v_low);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
v_sse = vmlal_high_s16(v_sse, diff, diff);
#else
const int16x4_t v_high = vget_high_s16(diff);
@@ -192,27 +232,56 @@
return var;
}
-#if CONFIG_AV1_HIGHBITDEPTH
-unsigned int aom_highbd_avg_4x4_neon(const uint8_t *s, int p) {
- const uint16_t *src = CONVERT_TO_SHORTPTR(s);
- const uint16x4_t r0 = vld1_u16(src);
- src += p;
- uint16x4_t r1, r2, r3;
- r1 = vld1_u16(src);
- src += p;
- r2 = vld1_u16(src);
- src += p;
- r3 = vld1_u16(src);
- const uint16x4_t s1 = vadd_u16(r0, r1);
- const uint16x4_t s2 = vadd_u16(r2, r3);
- const uint16x4_t s3 = vadd_u16(s1, s2);
-#if defined(__aarch64__)
- return (vaddv_u16(s3) + 8) >> 4;
+void aom_minmax_8x8_neon(const uint8_t *a, int a_stride, const uint8_t *b,
+ int b_stride, int *min, int *max) {
+ // Load and concatenate.
+ const uint8x16_t a01 = load_u8_8x2(a + 0 * a_stride, a_stride);
+ const uint8x16_t a23 = load_u8_8x2(a + 2 * a_stride, a_stride);
+ const uint8x16_t a45 = load_u8_8x2(a + 4 * a_stride, a_stride);
+ const uint8x16_t a67 = load_u8_8x2(a + 6 * a_stride, a_stride);
+
+ const uint8x16_t b01 = load_u8_8x2(b + 0 * b_stride, b_stride);
+ const uint8x16_t b23 = load_u8_8x2(b + 2 * b_stride, b_stride);
+ const uint8x16_t b45 = load_u8_8x2(b + 4 * b_stride, b_stride);
+ const uint8x16_t b67 = load_u8_8x2(b + 6 * b_stride, b_stride);
+
+ // Absolute difference.
+ const uint8x16_t ab01_diff = vabdq_u8(a01, b01);
+ const uint8x16_t ab23_diff = vabdq_u8(a23, b23);
+ const uint8x16_t ab45_diff = vabdq_u8(a45, b45);
+ const uint8x16_t ab67_diff = vabdq_u8(a67, b67);
+
+ // Max values between the Q vectors.
+ const uint8x16_t ab0123_max = vmaxq_u8(ab01_diff, ab23_diff);
+ const uint8x16_t ab4567_max = vmaxq_u8(ab45_diff, ab67_diff);
+ const uint8x16_t ab0123_min = vminq_u8(ab01_diff, ab23_diff);
+ const uint8x16_t ab4567_min = vminq_u8(ab45_diff, ab67_diff);
+
+ const uint8x16_t ab07_max = vmaxq_u8(ab0123_max, ab4567_max);
+ const uint8x16_t ab07_min = vminq_u8(ab0123_min, ab4567_min);
+
+#if AOM_ARCH_AARCH64
+ *min = *max = 0; // Clear high bits
+ *((uint8_t *)max) = vmaxvq_u8(ab07_max);
+ *((uint8_t *)min) = vminvq_u8(ab07_min);
#else
- const uint16x4_t h1 = vpadd_u16(s3, s3);
- const uint16x4_t h2 = vpadd_u16(h1, h1);
- const uint16x4_t res = vrshr_n_u16(h2, 4);
- return vget_lane_u16(res, 0);
+ // Split into 64-bit vectors and execute pairwise min/max.
+ uint8x8_t ab_max = vmax_u8(vget_high_u8(ab07_max), vget_low_u8(ab07_max));
+ uint8x8_t ab_min = vmin_u8(vget_high_u8(ab07_min), vget_low_u8(ab07_min));
+
+  // Three rounds of pairwise vpmax/vpmin propagate the max/min value to
+  // every lane.
+ ab_max = vpmax_u8(ab_max, ab_max);
+ ab_min = vpmin_u8(ab_min, ab_min);
+
+ ab_max = vpmax_u8(ab_max, ab_max);
+ ab_min = vpmin_u8(ab_min, ab_min);
+
+ ab_max = vpmax_u8(ab_max, ab_max);
+ ab_min = vpmin_u8(ab_min, ab_min);
+
+ *min = *max = 0; // Clear high bits
+  // Store directly to avoid a costly NEON->GPR transfer.
+ vst1_lane_u8((uint8_t *)max, ab_max, 0);
+ vst1_lane_u8((uint8_t *)min, ab_min, 0);
#endif
}
-#endif // CONFIG_AV1_HIGHBITDEPTH
diff --git a/aom_dsp/arm/avg_pred_neon.c b/aom_dsp/arm/avg_pred_neon.c
new file mode 100644
index 0000000..04e0904
--- /dev/null
+++ b/aom_dsp/arm/avg_pred_neon.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include <assert.h>
+
+#include "config/aom_dsp_rtcd.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/blend.h"
+
+void aom_comp_avg_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width,
+ int height, const uint8_t *ref, int ref_stride) {
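+  // vrhaddq_u8 computes the rounding average (p + r + 1) >> 1, matching the
+  // scalar ROUND_POWER_OF_TWO(pred[i] + ref[i], 1).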
+ if (width > 8) {
+ do {
+ const uint8_t *pred_ptr = pred;
+ const uint8_t *ref_ptr = ref;
+ uint8_t *comp_pred_ptr = comp_pred;
+ int w = width;
+
+ do {
+ const uint8x16_t p = vld1q_u8(pred_ptr);
+ const uint8x16_t r = vld1q_u8(ref_ptr);
+ const uint8x16_t avg = vrhaddq_u8(p, r);
+
+ vst1q_u8(comp_pred_ptr, avg);
+
+ ref_ptr += 16;
+ pred_ptr += 16;
+ comp_pred_ptr += 16;
+ w -= 16;
+ } while (w != 0);
+
+ ref += ref_stride;
+ pred += width;
+ comp_pred += width;
+ } while (--height != 0);
+ } else if (width == 8) {
+ int h = height / 2;
+
+ do {
+ const uint8x16_t p = vld1q_u8(pred);
+ const uint8x16_t r = load_u8_8x2(ref, ref_stride);
+ const uint8x16_t avg = vrhaddq_u8(p, r);
+
+ vst1q_u8(comp_pred, avg);
+
+ ref += 2 * ref_stride;
+ pred += 16;
+ comp_pred += 16;
+ } while (--h != 0);
+ } else {
+ int h = height / 4;
+ assert(width == 4);
+
+ do {
+ const uint8x16_t p = vld1q_u8(pred);
+ const uint8x16_t r = load_unaligned_u8q(ref, ref_stride);
+ const uint8x16_t avg = vrhaddq_u8(p, r);
+
+ vst1q_u8(comp_pred, avg);
+
+ ref += 4 * ref_stride;
+ pred += 16;
+ comp_pred += 16;
+ } while (--h != 0);
+ }
+}
+
+void aom_comp_mask_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width,
+ int height, const uint8_t *ref, int ref_stride,
+ const uint8_t *mask, int mask_stride,
+ int invert_mask) {
+ const uint8_t *src0 = invert_mask ? pred : ref;
+ const uint8_t *src1 = invert_mask ? ref : pred;
+ const int src_stride0 = invert_mask ? width : ref_stride;
+ const int src_stride1 = invert_mask ? ref_stride : width;
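+  // Each output pixel is the blend (m * s0 + (64 - m) * s1 + 32) >> 6, with
+  // AOM_BLEND_A64_MAX_ALPHA = 64 and AOM_BLEND_A64_ROUND_BITS = 6.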
+
+ if (width > 8) {
+ const uint8x16_t max_alpha = vdupq_n_u8(AOM_BLEND_A64_MAX_ALPHA);
+ do {
+ const uint8_t *src0_ptr = src0;
+ const uint8_t *src1_ptr = src1;
+ const uint8_t *mask_ptr = mask;
+ uint8_t *comp_pred_ptr = comp_pred;
+ int w = width;
+
+ do {
+ const uint8x16_t s0 = vld1q_u8(src0_ptr);
+ const uint8x16_t s1 = vld1q_u8(src1_ptr);
+ const uint8x16_t m0 = vld1q_u8(mask_ptr);
+
+ uint8x16_t m0_inv = vsubq_u8(max_alpha, m0);
+ uint16x8_t blend_u16_lo = vmull_u8(vget_low_u8(s0), vget_low_u8(m0));
+ uint16x8_t blend_u16_hi = vmull_u8(vget_high_u8(s0), vget_high_u8(m0));
+ blend_u16_lo =
+ vmlal_u8(blend_u16_lo, vget_low_u8(s1), vget_low_u8(m0_inv));
+ blend_u16_hi =
+ vmlal_u8(blend_u16_hi, vget_high_u8(s1), vget_high_u8(m0_inv));
+
+ uint8x8_t blend_u8_lo =
+ vrshrn_n_u16(blend_u16_lo, AOM_BLEND_A64_ROUND_BITS);
+ uint8x8_t blend_u8_hi =
+ vrshrn_n_u16(blend_u16_hi, AOM_BLEND_A64_ROUND_BITS);
+ uint8x16_t blend_u8 = vcombine_u8(blend_u8_lo, blend_u8_hi);
+
+ vst1q_u8(comp_pred_ptr, blend_u8);
+
+ src0_ptr += 16;
+ src1_ptr += 16;
+ mask_ptr += 16;
+ comp_pred_ptr += 16;
+ w -= 16;
+ } while (w != 0);
+
+ src0 += src_stride0;
+ src1 += src_stride1;
+ mask += mask_stride;
+ comp_pred += width;
+ } while (--height != 0);
+ } else if (width == 8) {
+ const uint8x8_t max_alpha = vdup_n_u8(AOM_BLEND_A64_MAX_ALPHA);
+
+ do {
+ const uint8x8_t s0 = vld1_u8(src0);
+ const uint8x8_t s1 = vld1_u8(src1);
+ const uint8x8_t m0 = vld1_u8(mask);
+
+ uint8x8_t m0_inv = vsub_u8(max_alpha, m0);
+ uint16x8_t blend_u16 = vmull_u8(s0, m0);
+ blend_u16 = vmlal_u8(blend_u16, s1, m0_inv);
+ uint8x8_t blend_u8 = vrshrn_n_u16(blend_u16, AOM_BLEND_A64_ROUND_BITS);
+
+ vst1_u8(comp_pred, blend_u8);
+
+ src0 += src_stride0;
+ src1 += src_stride1;
+ mask += mask_stride;
+ comp_pred += 8;
+ } while (--height != 0);
+ } else {
+ const uint8x8_t max_alpha = vdup_n_u8(AOM_BLEND_A64_MAX_ALPHA);
+ int h = height / 2;
+ assert(width == 4);
+
+ do {
+ const uint8x8_t s0 = load_unaligned_u8(src0, src_stride0);
+ const uint8x8_t s1 = load_unaligned_u8(src1, src_stride1);
+ const uint8x8_t m0 = load_unaligned_u8(mask, mask_stride);
+
+ uint8x8_t m0_inv = vsub_u8(max_alpha, m0);
+ uint16x8_t blend_u16 = vmull_u8(s0, m0);
+ blend_u16 = vmlal_u8(blend_u16, s1, m0_inv);
+ uint8x8_t blend_u8 = vrshrn_n_u16(blend_u16, AOM_BLEND_A64_ROUND_BITS);
+
+ vst1_u8(comp_pred, blend_u8);
+
+ src0 += 2 * src_stride0;
+ src1 += 2 * src_stride1;
+ mask += 2 * mask_stride;
+ comp_pred += 8;
+ } while (--h != 0);
+ }
+}
diff --git a/aom_dsp/arm/blend_a64_mask_neon.c b/aom_dsp/arm/blend_a64_mask_neon.c
index f11d57e..c3ee0b7 100644
--- a/aom_dsp/arm/blend_a64_mask_neon.c
+++ b/aom_dsp/arm/blend_a64_mask_neon.c
@@ -86,19 +86,21 @@
const int16x8_t vec_round_bits) {
int16x8_t src0_0, src0_1;
int16x8_t src1_0, src1_1;
- uint64x2_t tu0 = vdupq_n_u64(0), tu1 = vdupq_n_u64(0), tu2 = vdupq_n_u64(0),
- tu3 = vdupq_n_u64(0);
+ uint16x8_t tu0 = vdupq_n_u16(0);
+ uint16x8_t tu1 = vdupq_n_u16(0);
+ uint16x8_t tu2 = vdupq_n_u16(0);
+ uint16x8_t tu3 = vdupq_n_u16(0);
int16x8_t mask0_1, mask2_3;
int16x8_t res0, res1;
load_unaligned_u16_4x4(src0, src0_stride, &tu0, &tu1);
load_unaligned_u16_4x4(src1, src1_stride, &tu2, &tu3);
- src0_0 = vreinterpretq_s16_u64(tu0);
- src0_1 = vreinterpretq_s16_u64(tu1);
+ src0_0 = vreinterpretq_s16_u16(tu0);
+ src0_1 = vreinterpretq_s16_u16(tu1);
- src1_0 = vreinterpretq_s16_u64(tu2);
- src1_1 = vreinterpretq_s16_u64(tu3);
+ src1_0 = vreinterpretq_s16_u16(tu2);
+ src1_1 = vreinterpretq_s16_u16(tu3);
mask0_1 = vcombine_s16(mask0, mask1);
mask2_3 = vcombine_s16(mask2, mask3);
@@ -150,9 +152,10 @@
assert(IS_POWER_OF_TWO(h));
assert(IS_POWER_OF_TWO(w));
- uint8x8_t s0, s1, s2, s3;
- uint32x2_t tu0 = vdup_n_u32(0), tu1 = vdup_n_u32(0), tu2 = vdup_n_u32(0),
- tu3 = vdup_n_u32(0);
+ uint8x8_t s0 = vdup_n_u8(0);
+ uint8x8_t s1 = vdup_n_u8(0);
+ uint8x8_t s2 = vdup_n_u8(0);
+ uint8x8_t s3 = vdup_n_u8(0);
uint8x16_t t0, t1, t2, t3, t4, t5, t6, t7;
int16x8_t mask0, mask1, mask2, mask3;
int16x8_t mask4, mask5, mask6, mask7;
@@ -197,10 +200,10 @@
} while (i < h);
} else {
do {
- load_unaligned_u8_4x4(mask_tmp, mask_stride, &tu0, &tu1);
+ load_unaligned_u8_4x4(mask_tmp, mask_stride, &s0, &s1);
- mask0 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu0)));
- mask1 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu1)));
+ mask0 = vreinterpretq_s16_u16(vmovl_u8(s0));
+ mask1 = vreinterpretq_s16_u16(vmovl_u8(s1));
mask0_low = vget_low_s16(mask0);
mask1_low = vget_high_s16(mask0);
@@ -412,14 +415,9 @@
} while (i < h);
} else {
do {
- load_unaligned_u8_4x4(mask_tmp, 2 * mask_stride, &tu0, &tu1);
- load_unaligned_u8_4x4(mask_tmp + mask_stride, 2 * mask_stride, &tu2,
- &tu3);
-
- s0 = vreinterpret_u8_u32(tu0);
- s1 = vreinterpret_u8_u32(tu1);
- s2 = vreinterpret_u8_u32(tu2);
- s3 = vreinterpret_u8_u32(tu3);
+ load_unaligned_u8_4x4(mask_tmp, 2 * mask_stride, &s0, &s1);
+ load_unaligned_u8_4x4(mask_tmp + mask_stride, 2 * mask_stride, &s2,
+ &s3);
mask0 = vreinterpretq_s16_u16(vaddl_u8(s0, s2));
mask1 = vreinterpretq_s16_u16(vaddl_u8(s1, s3));
diff --git a/aom_dsp/arm/fwd_txfm_neon.c b/aom_dsp/arm/fwd_txfm_neon.c
index 7fccdab..a7d66b3 100644
--- a/aom_dsp/arm/fwd_txfm_neon.c
+++ b/aom_dsp/arm/fwd_txfm_neon.c
@@ -67,7 +67,10 @@
int16x4_t out_1 = vrshrn_n_s32(temp3, DCT_CONST_BITS);
int16x4_t out_3 = vrshrn_n_s32(temp4, DCT_CONST_BITS);
- transpose_s16_4x4d(&out_0, &out_1, &out_2, &out_3);
+  // Only transpose the results of the first pass.
+ if (i == 0) {
+ transpose_s16_4x4d(&out_0, &out_1, &out_2, &out_3);
+ }
*input_0 = out_0;
*input_1 = out_1;
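The skipped transpose is justified by separability. With each pass applying a 1-D operator A followed by a transpose, two passes give

    pass 1: (A X)^T = X^T A^T
    pass 2: A X^T A^T = (A X A^T)^T

so omitting the transpose in the second pass yields the transpose of the conventional 2-D result Y = A X A^T; the i == 0 guard above implies the callers of this 4x4 forward transform tolerate that layout.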
diff --git a/aom_dsp/arm/hadamard_neon.c b/aom_dsp/arm/hadamard_neon.c
index 75dd7d6..82ce0cd 100644
--- a/aom_dsp/arm/hadamard_neon.c
+++ b/aom_dsp/arm/hadamard_neon.c
@@ -15,6 +15,38 @@
#include "aom_dsp/arm/mem_neon.h"
#include "aom_dsp/arm/transpose_neon.h"
+static INLINE void hadamard_4x4_one_pass(int16x4_t *a0, int16x4_t *a1,
+ int16x4_t *a2, int16x4_t *a3) {
+ const int16x4_t b0 = vhadd_s16(*a0, *a1);
+ const int16x4_t b1 = vhsub_s16(*a0, *a1);
+ const int16x4_t b2 = vhadd_s16(*a2, *a3);
+ const int16x4_t b3 = vhsub_s16(*a2, *a3);
+
+ *a0 = vadd_s16(b0, b2);
+ *a1 = vadd_s16(b1, b3);
+ *a2 = vsub_s16(b0, b2);
+ *a3 = vsub_s16(b1, b3);
+}
+
+void aom_hadamard_4x4_neon(const int16_t *src_diff, ptrdiff_t src_stride,
+ tran_low_t *coeff) {
+ int16x4_t a0 = vld1_s16(src_diff);
+ int16x4_t a1 = vld1_s16(src_diff + src_stride);
+ int16x4_t a2 = vld1_s16(src_diff + 2 * src_stride);
+ int16x4_t a3 = vld1_s16(src_diff + 3 * src_stride);
+
+ hadamard_4x4_one_pass(&a0, &a1, &a2, &a3);
+
+ transpose_s16_4x4d(&a0, &a1, &a2, &a3);
+
+ hadamard_4x4_one_pass(&a0, &a1, &a2, &a3);
+
+ store_s16_to_tran_low(coeff, a0);
+ store_s16_to_tran_low(coeff + 4, a1);
+ store_s16_to_tran_low(coeff + 8, a2);
+ store_s16_to_tran_low(coeff + 12, a3);
+}
+
static void hadamard8x8_one_pass(int16x8_t *a0, int16x8_t *a1, int16x8_t *a2,
int16x8_t *a3, int16x8_t *a4, int16x8_t *a5,
int16x8_t *a6, int16x8_t *a7) {
@@ -154,44 +186,106 @@
void aom_hadamard_16x16_neon(const int16_t *src_diff, ptrdiff_t src_stride,
tran_low_t *coeff) {
- DECLARE_ALIGNED(32, tran_low_t, temp_coeff[16 * 16]);
/* Rearrange 16x16 to 8x32 and remove stride.
* Top left first. */
- aom_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride,
- temp_coeff + 0);
+ aom_hadamard_8x8_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
/* Top right. */
- aom_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride,
- temp_coeff + 64);
+ aom_hadamard_8x8_neon(src_diff + 8 + 0 * src_stride, src_stride, coeff + 64);
/* Bottom left. */
- aom_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride,
- temp_coeff + 128);
+ aom_hadamard_8x8_neon(src_diff + 0 + 8 * src_stride, src_stride, coeff + 128);
/* Bottom right. */
- aom_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride,
- temp_coeff + 192);
+ aom_hadamard_8x8_neon(src_diff + 8 + 8 * src_stride, src_stride, coeff + 192);
- tran_low_t *t_coeff = temp_coeff;
- for (int i = 0; i < 64; i += 8) {
- const int16x8_t a0 = load_tran_low_to_s16q(t_coeff + 0);
- const int16x8_t a1 = load_tran_low_to_s16q(t_coeff + 64);
- const int16x8_t a2 = load_tran_low_to_s16q(t_coeff + 128);
- const int16x8_t a3 = load_tran_low_to_s16q(t_coeff + 192);
+ for (int i = 0; i < 64; i += 16) {
+ const int16x8_t a00 = load_tran_low_to_s16q(coeff + 0);
+ const int16x8_t a01 = load_tran_low_to_s16q(coeff + 64);
+ const int16x8_t a02 = load_tran_low_to_s16q(coeff + 128);
+ const int16x8_t a03 = load_tran_low_to_s16q(coeff + 192);
- const int16x8_t b0 = vhaddq_s16(a0, a1);
- const int16x8_t b1 = vhsubq_s16(a0, a1);
- const int16x8_t b2 = vhaddq_s16(a2, a3);
- const int16x8_t b3 = vhsubq_s16(a2, a3);
+ const int16x8_t b00 = vhaddq_s16(a00, a01);
+ const int16x8_t b01 = vhsubq_s16(a00, a01);
+ const int16x8_t b02 = vhaddq_s16(a02, a03);
+ const int16x8_t b03 = vhsubq_s16(a02, a03);
- const int16x8_t c0 = vaddq_s16(b0, b2);
- const int16x8_t c1 = vaddq_s16(b1, b3);
- const int16x8_t c2 = vsubq_s16(b0, b2);
- const int16x8_t c3 = vsubq_s16(b1, b3);
+ const int16x8_t c00 = vaddq_s16(b00, b02);
+ const int16x8_t c01 = vaddq_s16(b01, b03);
+ const int16x8_t c02 = vsubq_s16(b00, b02);
+ const int16x8_t c03 = vsubq_s16(b01, b03);
- store_s16q_to_tran_low_offset_4(coeff + 0, c0);
- store_s16q_to_tran_low_offset_4(coeff + 64, c1);
- store_s16q_to_tran_low_offset_4(coeff + 128, c2);
- store_s16q_to_tran_low_offset_4(coeff + 192, c3);
+ const int16x8_t a10 = load_tran_low_to_s16q(coeff + 8 + 0);
+ const int16x8_t a11 = load_tran_low_to_s16q(coeff + 8 + 64);
+ const int16x8_t a12 = load_tran_low_to_s16q(coeff + 8 + 128);
+ const int16x8_t a13 = load_tran_low_to_s16q(coeff + 8 + 192);
- t_coeff += 8;
- coeff += (4 + (((i >> 3) & 1) << 3));
+ const int16x8_t b10 = vhaddq_s16(a10, a11);
+ const int16x8_t b11 = vhsubq_s16(a10, a11);
+ const int16x8_t b12 = vhaddq_s16(a12, a13);
+ const int16x8_t b13 = vhsubq_s16(a12, a13);
+
+ const int16x8_t c10 = vaddq_s16(b10, b12);
+ const int16x8_t c11 = vaddq_s16(b11, b13);
+ const int16x8_t c12 = vsubq_s16(b10, b12);
+ const int16x8_t c13 = vsubq_s16(b11, b13);
+
+ store_s16_to_tran_low(coeff + 0 + 0, vget_low_s16(c00));
+ store_s16_to_tran_low(coeff + 0 + 4, vget_low_s16(c10));
+ store_s16_to_tran_low(coeff + 0 + 8, vget_high_s16(c00));
+ store_s16_to_tran_low(coeff + 0 + 12, vget_high_s16(c10));
+
+ store_s16_to_tran_low(coeff + 64 + 0, vget_low_s16(c01));
+ store_s16_to_tran_low(coeff + 64 + 4, vget_low_s16(c11));
+ store_s16_to_tran_low(coeff + 64 + 8, vget_high_s16(c01));
+ store_s16_to_tran_low(coeff + 64 + 12, vget_high_s16(c11));
+
+ store_s16_to_tran_low(coeff + 128 + 0, vget_low_s16(c02));
+ store_s16_to_tran_low(coeff + 128 + 4, vget_low_s16(c12));
+ store_s16_to_tran_low(coeff + 128 + 8, vget_high_s16(c02));
+ store_s16_to_tran_low(coeff + 128 + 12, vget_high_s16(c12));
+
+ store_s16_to_tran_low(coeff + 192 + 0, vget_low_s16(c03));
+ store_s16_to_tran_low(coeff + 192 + 4, vget_low_s16(c13));
+ store_s16_to_tran_low(coeff + 192 + 8, vget_high_s16(c03));
+ store_s16_to_tran_low(coeff + 192 + 12, vget_high_s16(c13));
+
+ coeff += 16;
+ }
+}
+
+void aom_hadamard_32x32_neon(const int16_t *src_diff, ptrdiff_t src_stride,
+ tran_low_t *coeff) {
+ /* Top left first. */
+ aom_hadamard_16x16_neon(src_diff + 0 + 0 * src_stride, src_stride, coeff + 0);
+ /* Top right. */
+ aom_hadamard_16x16_neon(src_diff + 16 + 0 * src_stride, src_stride,
+ coeff + 256);
+ /* Bottom left. */
+ aom_hadamard_16x16_neon(src_diff + 0 + 16 * src_stride, src_stride,
+ coeff + 512);
+ /* Bottom right. */
+ aom_hadamard_16x16_neon(src_diff + 16 + 16 * src_stride, src_stride,
+ coeff + 768);
+
+ for (int i = 0; i < 256; i += 4) {
+ const int32x4_t a0 = vld1q_s32(coeff);
+ const int32x4_t a1 = vld1q_s32(coeff + 256);
+ const int32x4_t a2 = vld1q_s32(coeff + 512);
+ const int32x4_t a3 = vld1q_s32(coeff + 768);
+
+ const int32x4_t b0 = vshrq_n_s32(vaddq_s32(a0, a1), 2);
+ const int32x4_t b1 = vshrq_n_s32(vsubq_s32(a0, a1), 2);
+ const int32x4_t b2 = vshrq_n_s32(vaddq_s32(a2, a3), 2);
+ const int32x4_t b3 = vshrq_n_s32(vsubq_s32(a2, a3), 2);
+
+ const int32x4_t c0 = vaddq_s32(b0, b2);
+ const int32x4_t c1 = vaddq_s32(b1, b3);
+ const int32x4_t c2 = vsubq_s32(b0, b2);
+ const int32x4_t c3 = vsubq_s32(b1, b3);
+
+ vst1q_s32(coeff + 0, c0);
+ vst1q_s32(coeff + 256, c1);
+ vst1q_s32(coeff + 512, c2);
+ vst1q_s32(coeff + 768, c3);
+
+ coeff += 4;
}
}
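One lane of hadamard_4x4_one_pass in scalar form (a sketch; vhadd_s16/vhsub_s16 are halving operations, (x +/- y) >> 1 with arithmetic shift, so each pass halves once in its first butterfly stage):

    #include <stdint.h>

    static void hadamard_pass_scalar(int16_t *a0, int16_t *a1, int16_t *a2,
                                     int16_t *a3) {
      const int16_t b0 = (int16_t)((*a0 + *a1) >> 1);  // halving add
      const int16_t b1 = (int16_t)((*a0 - *a1) >> 1);  // halving sub
      const int16_t b2 = (int16_t)((*a2 + *a3) >> 1);
      const int16_t b3 = (int16_t)((*a2 - *a3) >> 1);
      *a0 = b0 + b2;
      *a1 = b1 + b3;
      *a2 = b0 - b2;
      *a3 = b1 - b3;
    }

aom_hadamard_4x4_neon applies this pass, transposes, then applies it again, giving the 2-D Hadamard transform of the 4x4 block scaled by 1/4 overall from the two halving stages.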
diff --git a/aom_dsp/arm/highbd_avg_neon.c b/aom_dsp/arm/highbd_avg_neon.c
new file mode 100644
index 0000000..47d5dae
--- /dev/null
+++ b/aom_dsp/arm/highbd_avg_neon.c
@@ -0,0 +1,125 @@
+/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All Rights Reserved.
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
+#include "aom_ports/mem.h"
+
+uint32_t aom_highbd_avg_4x4_neon(const uint8_t *a, int a_stride) {
+ const uint16_t *a_ptr = CONVERT_TO_SHORTPTR(a);
+ uint16x4_t sum, a0, a1, a2, a3;
+
+ load_u16_4x4(a_ptr, a_stride, &a0, &a1, &a2, &a3);
+
+ sum = vadd_u16(a0, a1);
+ sum = vadd_u16(sum, a2);
+ sum = vadd_u16(sum, a3);
+
+ return (horizontal_add_u16x4(sum) + (1 << 3)) >> 4;
+}
+
+uint32_t aom_highbd_avg_8x8_neon(const uint8_t *a, int a_stride) {
+ const uint16_t *a_ptr = CONVERT_TO_SHORTPTR(a);
+ uint16x8_t sum, a0, a1, a2, a3, a4, a5, a6, a7;
+
+ load_u16_8x8(a_ptr, a_stride, &a0, &a1, &a2, &a3, &a4, &a5, &a6, &a7);
+
+ sum = vaddq_u16(a0, a1);
+ sum = vaddq_u16(sum, a2);
+ sum = vaddq_u16(sum, a3);
+ sum = vaddq_u16(sum, a4);
+ sum = vaddq_u16(sum, a5);
+ sum = vaddq_u16(sum, a6);
+ sum = vaddq_u16(sum, a7);
+
+ return (horizontal_add_u16x8(sum) + (1 << 5)) >> 6;
+}
+
+void aom_highbd_minmax_8x8_neon(const uint8_t *s8, int p, const uint8_t *d8,
+ int dp, int *min, int *max) {
+ const uint16_t *a_ptr = CONVERT_TO_SHORTPTR(s8);
+ const uint16_t *b_ptr = CONVERT_TO_SHORTPTR(d8);
+
+ const uint16x8_t a0 = vld1q_u16(a_ptr + 0 * p);
+ const uint16x8_t a1 = vld1q_u16(a_ptr + 1 * p);
+ const uint16x8_t a2 = vld1q_u16(a_ptr + 2 * p);
+ const uint16x8_t a3 = vld1q_u16(a_ptr + 3 * p);
+ const uint16x8_t a4 = vld1q_u16(a_ptr + 4 * p);
+ const uint16x8_t a5 = vld1q_u16(a_ptr + 5 * p);
+ const uint16x8_t a6 = vld1q_u16(a_ptr + 6 * p);
+ const uint16x8_t a7 = vld1q_u16(a_ptr + 7 * p);
+
+ const uint16x8_t b0 = vld1q_u16(b_ptr + 0 * dp);
+ const uint16x8_t b1 = vld1q_u16(b_ptr + 1 * dp);
+ const uint16x8_t b2 = vld1q_u16(b_ptr + 2 * dp);
+ const uint16x8_t b3 = vld1q_u16(b_ptr + 3 * dp);
+ const uint16x8_t b4 = vld1q_u16(b_ptr + 4 * dp);
+ const uint16x8_t b5 = vld1q_u16(b_ptr + 5 * dp);
+ const uint16x8_t b6 = vld1q_u16(b_ptr + 6 * dp);
+ const uint16x8_t b7 = vld1q_u16(b_ptr + 7 * dp);
+
+ const uint16x8_t abs_diff0 = vabdq_u16(a0, b0);
+ const uint16x8_t abs_diff1 = vabdq_u16(a1, b1);
+ const uint16x8_t abs_diff2 = vabdq_u16(a2, b2);
+ const uint16x8_t abs_diff3 = vabdq_u16(a3, b3);
+ const uint16x8_t abs_diff4 = vabdq_u16(a4, b4);
+ const uint16x8_t abs_diff5 = vabdq_u16(a5, b5);
+ const uint16x8_t abs_diff6 = vabdq_u16(a6, b6);
+ const uint16x8_t abs_diff7 = vabdq_u16(a7, b7);
+
+ const uint16x8_t max01 = vmaxq_u16(abs_diff0, abs_diff1);
+ const uint16x8_t max23 = vmaxq_u16(abs_diff2, abs_diff3);
+ const uint16x8_t max45 = vmaxq_u16(abs_diff4, abs_diff5);
+ const uint16x8_t max67 = vmaxq_u16(abs_diff6, abs_diff7);
+
+ const uint16x8_t max0123 = vmaxq_u16(max01, max23);
+ const uint16x8_t max4567 = vmaxq_u16(max45, max67);
+ const uint16x8_t max07 = vmaxq_u16(max0123, max4567);
+
+ const uint16x8_t min01 = vminq_u16(abs_diff0, abs_diff1);
+ const uint16x8_t min23 = vminq_u16(abs_diff2, abs_diff3);
+ const uint16x8_t min45 = vminq_u16(abs_diff4, abs_diff5);
+ const uint16x8_t min67 = vminq_u16(abs_diff6, abs_diff7);
+
+ const uint16x8_t min0123 = vminq_u16(min01, min23);
+ const uint16x8_t min4567 = vminq_u16(min45, min67);
+ const uint16x8_t min07 = vminq_u16(min0123, min4567);
+
+#if AOM_ARCH_AARCH64
+ *max = (int)vmaxvq_u16(max07);
+ *min = (int)vminvq_u16(min07);
+#else
+ // Split into 64-bit vectors and execute pairwise min/max.
+ uint16x4_t ab_max = vmax_u16(vget_high_u16(max07), vget_low_u16(max07));
+ uint16x4_t ab_min = vmin_u16(vget_high_u16(min07), vget_low_u16(min07));
+
+  // Two rounds of pairwise max/min propagate the value to every lane of the
+  // four-lane vectors.
+  ab_max = vpmax_u16(ab_max, ab_max);
+  ab_min = vpmin_u16(ab_min, ab_min);
+
+  ab_max = vpmax_u16(ab_max, ab_max);
+  ab_min = vpmin_u16(ab_min, ab_min);
+
+ *min = *max = 0; // Clear high bits
+ // Store directly to avoid costly neon->gpr transfer.
+ vst1_lane_u16((uint16_t *)max, ab_max, 0);
+ vst1_lane_u16((uint16_t *)min, ab_min, 0);
+#endif
+}
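aom_highbd_avg_4x4_neon and aom_highbd_avg_8x8_neon above compute rounded block means; a scalar model of the same rounding (a sketch):

    #include <stdint.h>

    // Rounded mean of an n x n block of 16-bit samples; log2_n2 = log2(n * n),
    // i.e. 4 for the 4x4 kernel ((sum + 8) >> 4) and 6 for 8x8
    // ((sum + 32) >> 6).
    static uint32_t highbd_avg_nxn_scalar(const uint16_t *p, int stride, int n,
                                          int log2_n2) {
      uint32_t sum = 0;
      for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) sum += p[i * stride + j];
      }
      return (sum + (1u << (log2_n2 - 1))) >> log2_n2;
    }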
diff --git a/aom_dsp/arm/highbd_hadamard_neon.c b/aom_dsp/arm/highbd_hadamard_neon.c
new file mode 100644
index 0000000..aad2046
--- /dev/null
+++ b/aom_dsp/arm/highbd_hadamard_neon.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All Rights Reserved.
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include "config/aom_dsp_rtcd.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/transpose_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
+#include "aom_ports/mem.h"
+
+static INLINE void hadamard_highbd_col8_first_pass(int16x8_t *a0, int16x8_t *a1,
+ int16x8_t *a2, int16x8_t *a3,
+ int16x8_t *a4, int16x8_t *a5,
+ int16x8_t *a6,
+ int16x8_t *a7) {
+ int16x8_t b0 = vaddq_s16(*a0, *a1);
+ int16x8_t b1 = vsubq_s16(*a0, *a1);
+ int16x8_t b2 = vaddq_s16(*a2, *a3);
+ int16x8_t b3 = vsubq_s16(*a2, *a3);
+ int16x8_t b4 = vaddq_s16(*a4, *a5);
+ int16x8_t b5 = vsubq_s16(*a4, *a5);
+ int16x8_t b6 = vaddq_s16(*a6, *a7);
+ int16x8_t b7 = vsubq_s16(*a6, *a7);
+
+ int16x8_t c0 = vaddq_s16(b0, b2);
+ int16x8_t c2 = vsubq_s16(b0, b2);
+ int16x8_t c1 = vaddq_s16(b1, b3);
+ int16x8_t c3 = vsubq_s16(b1, b3);
+ int16x8_t c4 = vaddq_s16(b4, b6);
+ int16x8_t c6 = vsubq_s16(b4, b6);
+ int16x8_t c5 = vaddq_s16(b5, b7);
+ int16x8_t c7 = vsubq_s16(b5, b7);
+
+ *a0 = vaddq_s16(c0, c4);
+ *a2 = vsubq_s16(c0, c4);
+ *a7 = vaddq_s16(c1, c5);
+ *a6 = vsubq_s16(c1, c5);
+ *a3 = vaddq_s16(c2, c6);
+ *a1 = vsubq_s16(c2, c6);
+ *a4 = vaddq_s16(c3, c7);
+ *a5 = vsubq_s16(c3, c7);
+}
+
+static INLINE void hadamard_highbd_col4_second_pass(int16x4_t a0, int16x4_t a1,
+ int16x4_t a2, int16x4_t a3,
+ int16x4_t a4, int16x4_t a5,
+ int16x4_t a6, int16x4_t a7,
+ tran_low_t *coeff) {
+ int32x4_t b0 = vaddl_s16(a0, a1);
+ int32x4_t b1 = vsubl_s16(a0, a1);
+ int32x4_t b2 = vaddl_s16(a2, a3);
+ int32x4_t b3 = vsubl_s16(a2, a3);
+ int32x4_t b4 = vaddl_s16(a4, a5);
+ int32x4_t b5 = vsubl_s16(a4, a5);
+ int32x4_t b6 = vaddl_s16(a6, a7);
+ int32x4_t b7 = vsubl_s16(a6, a7);
+
+ int32x4_t c0 = vaddq_s32(b0, b2);
+ int32x4_t c2 = vsubq_s32(b0, b2);
+ int32x4_t c1 = vaddq_s32(b1, b3);
+ int32x4_t c3 = vsubq_s32(b1, b3);
+ int32x4_t c4 = vaddq_s32(b4, b6);
+ int32x4_t c6 = vsubq_s32(b4, b6);
+ int32x4_t c5 = vaddq_s32(b5, b7);
+ int32x4_t c7 = vsubq_s32(b5, b7);
+
+ int32x4_t d0 = vaddq_s32(c0, c4);
+ int32x4_t d2 = vsubq_s32(c0, c4);
+ int32x4_t d7 = vaddq_s32(c1, c5);
+ int32x4_t d6 = vsubq_s32(c1, c5);
+ int32x4_t d3 = vaddq_s32(c2, c6);
+ int32x4_t d1 = vsubq_s32(c2, c6);
+ int32x4_t d4 = vaddq_s32(c3, c7);
+ int32x4_t d5 = vsubq_s32(c3, c7);
+
+ vst1q_s32(coeff + 0, d0);
+ vst1q_s32(coeff + 4, d1);
+ vst1q_s32(coeff + 8, d2);
+ vst1q_s32(coeff + 12, d3);
+ vst1q_s32(coeff + 16, d4);
+ vst1q_s32(coeff + 20, d5);
+ vst1q_s32(coeff + 24, d6);
+ vst1q_s32(coeff + 28, d7);
+}
+
+void aom_highbd_hadamard_8x8_neon(const int16_t *src_diff, ptrdiff_t src_stride,
+ tran_low_t *coeff) {
+ int16x4_t b0, b1, b2, b3, b4, b5, b6, b7;
+
+ int16x8_t s0 = vld1q_s16(src_diff + 0 * src_stride);
+ int16x8_t s1 = vld1q_s16(src_diff + 1 * src_stride);
+ int16x8_t s2 = vld1q_s16(src_diff + 2 * src_stride);
+ int16x8_t s3 = vld1q_s16(src_diff + 3 * src_stride);
+ int16x8_t s4 = vld1q_s16(src_diff + 4 * src_stride);
+ int16x8_t s5 = vld1q_s16(src_diff + 5 * src_stride);
+ int16x8_t s6 = vld1q_s16(src_diff + 6 * src_stride);
+ int16x8_t s7 = vld1q_s16(src_diff + 7 * src_stride);
+
+ // For the first pass we can stay in 16-bit elements (4095*8 = 32760).
+ hadamard_highbd_col8_first_pass(&s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+
+ transpose_s16_8x8(&s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+
+ // For the second pass we need to widen to 32-bit elements, so we're
+ // processing 4 columns at a time.
+ // Skip the second transpose because it is not required.
+
+ b0 = vget_low_s16(s0);
+ b1 = vget_low_s16(s1);
+ b2 = vget_low_s16(s2);
+ b3 = vget_low_s16(s3);
+ b4 = vget_low_s16(s4);
+ b5 = vget_low_s16(s5);
+ b6 = vget_low_s16(s6);
+ b7 = vget_low_s16(s7);
+
+ hadamard_highbd_col4_second_pass(b0, b1, b2, b3, b4, b5, b6, b7, coeff);
+
+ b0 = vget_high_s16(s0);
+ b1 = vget_high_s16(s1);
+ b2 = vget_high_s16(s2);
+ b3 = vget_high_s16(s3);
+ b4 = vget_high_s16(s4);
+ b5 = vget_high_s16(s5);
+ b6 = vget_high_s16(s6);
+ b7 = vget_high_s16(s7);
+
+ hadamard_highbd_col4_second_pass(b0, b1, b2, b3, b4, b5, b6, b7, coeff + 32);
+}
+
+void aom_highbd_hadamard_16x16_neon(const int16_t *src_diff,
+ ptrdiff_t src_stride, tran_low_t *coeff) {
+ // Rearrange 16x16 to 8x32 and remove stride.
+ // Top left first.
+ aom_highbd_hadamard_8x8_neon(src_diff, src_stride, coeff);
+ // Top right.
+ aom_highbd_hadamard_8x8_neon(src_diff + 8, src_stride, coeff + 64);
+ // Bottom left.
+ aom_highbd_hadamard_8x8_neon(src_diff + 8 * src_stride, src_stride,
+ coeff + 128);
+ // Bottom right.
+ aom_highbd_hadamard_8x8_neon(src_diff + 8 * src_stride + 8, src_stride,
+ coeff + 192);
+
+ for (int i = 0; i < 16; i++) {
+ int32x4_t a0 = vld1q_s32(coeff + 4 * i);
+ int32x4_t a1 = vld1q_s32(coeff + 4 * i + 64);
+ int32x4_t a2 = vld1q_s32(coeff + 4 * i + 128);
+ int32x4_t a3 = vld1q_s32(coeff + 4 * i + 192);
+
+ int32x4_t b0 = vhaddq_s32(a0, a1);
+ int32x4_t b1 = vhsubq_s32(a0, a1);
+ int32x4_t b2 = vhaddq_s32(a2, a3);
+ int32x4_t b3 = vhsubq_s32(a2, a3);
+
+ int32x4_t c0 = vaddq_s32(b0, b2);
+ int32x4_t c1 = vaddq_s32(b1, b3);
+ int32x4_t c2 = vsubq_s32(b0, b2);
+ int32x4_t c3 = vsubq_s32(b1, b3);
+
+ vst1q_s32(coeff + 4 * i, c0);
+ vst1q_s32(coeff + 4 * i + 64, c1);
+ vst1q_s32(coeff + 4 * i + 128, c2);
+ vst1q_s32(coeff + 4 * i + 192, c3);
+ }
+}
+
+void aom_highbd_hadamard_32x32_neon(const int16_t *src_diff,
+ ptrdiff_t src_stride, tran_low_t *coeff) {
+ // Rearrange 32x32 to 16x64 and remove stride.
+ // Top left first.
+ aom_highbd_hadamard_16x16_neon(src_diff, src_stride, coeff);
+ // Top right.
+ aom_highbd_hadamard_16x16_neon(src_diff + 16, src_stride, coeff + 256);
+ // Bottom left.
+ aom_highbd_hadamard_16x16_neon(src_diff + 16 * src_stride, src_stride,
+ coeff + 512);
+ // Bottom right.
+ aom_highbd_hadamard_16x16_neon(src_diff + 16 * src_stride + 16, src_stride,
+ coeff + 768);
+
+ for (int i = 0; i < 64; i++) {
+ int32x4_t a0 = vld1q_s32(coeff + 4 * i);
+ int32x4_t a1 = vld1q_s32(coeff + 4 * i + 256);
+ int32x4_t a2 = vld1q_s32(coeff + 4 * i + 512);
+ int32x4_t a3 = vld1q_s32(coeff + 4 * i + 768);
+
+ int32x4_t b0 = vshrq_n_s32(vaddq_s32(a0, a1), 2);
+ int32x4_t b1 = vshrq_n_s32(vsubq_s32(a0, a1), 2);
+ int32x4_t b2 = vshrq_n_s32(vaddq_s32(a2, a3), 2);
+ int32x4_t b3 = vshrq_n_s32(vsubq_s32(a2, a3), 2);
+
+ int32x4_t c0 = vaddq_s32(b0, b2);
+ int32x4_t c1 = vaddq_s32(b1, b3);
+ int32x4_t c2 = vsubq_s32(b0, b2);
+ int32x4_t c3 = vsubq_s32(b1, b3);
+
+ vst1q_s32(coeff + 4 * i, c0);
+ vst1q_s32(coeff + 4 * i + 256, c1);
+ vst1q_s32(coeff + 4 * i + 512, c2);
+ vst1q_s32(coeff + 4 * i + 768, c3);
+ }
+}
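The comments in aom_highbd_hadamard_8x8_neon above are backed by a simple bound: each of the three butterfly stages in a pass at most doubles the magnitude, so with 12-bit residuals

    |pass 1 output| <= 8 * 4095 = 32760 < 2^15   (int16_t is enough)
    |pass 2 output| <= 8 * 32760 = 262080 < 2^19 (needs int32_t)

which is why the second pass starts with the widening vaddl_s16/vsubl_s16.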
diff --git a/aom_dsp/arm/highbd_intrapred_neon.c b/aom_dsp/arm/highbd_intrapred_neon.c
index fa2f11e..63f53c3 100644
--- a/aom_dsp/arm/highbd_intrapred_neon.c
+++ b/aom_dsp/arm/highbd_intrapred_neon.c
@@ -20,66 +20,333 @@
// -----------------------------------------------------------------------------
// DC
-static INLINE void highbd_dc_predictor(uint16_t *dst, ptrdiff_t stride, int bw,
- const uint16_t *above,
- const uint16_t *left) {
- assert(bw >= 4);
- assert(IS_POWER_OF_TWO(bw));
- int expected_dc, sum = 0;
- const int count = bw * 2;
- uint32x4_t sum_q = vdupq_n_u32(0);
- uint32x2_t sum_d;
- uint16_t *dst_1;
- if (bw >= 8) {
- for (int i = 0; i < bw; i += 8) {
- sum_q = vpadalq_u16(sum_q, vld1q_u16(above));
- sum_q = vpadalq_u16(sum_q, vld1q_u16(left));
- above += 8;
- left += 8;
- }
- sum_d = vadd_u32(vget_low_u32(sum_q), vget_high_u32(sum_q));
- sum = vget_lane_s32(vreinterpret_s32_u64(vpaddl_u32(sum_d)), 0);
- expected_dc = (sum + (count >> 1)) / count;
- const uint16x8_t dc = vdupq_n_u16((uint16_t)expected_dc);
- for (int r = 0; r < bw; r++) {
- dst_1 = dst;
- for (int i = 0; i < bw; i += 8) {
- vst1q_u16(dst_1, dc);
- dst_1 += 8;
- }
- dst += stride;
- }
- } else { // 4x4
- sum_q = vaddl_u16(vld1_u16(above), vld1_u16(left));
- sum_d = vadd_u32(vget_low_u32(sum_q), vget_high_u32(sum_q));
- sum = vget_lane_s32(vreinterpret_s32_u64(vpaddl_u32(sum_d)), 0);
- expected_dc = (sum + (count >> 1)) / count;
- const uint16x4_t dc = vdup_n_u16((uint16_t)expected_dc);
- for (int r = 0; r < bw; r++) {
- vst1_u16(dst, dc);
- dst += stride;
- }
+static INLINE void highbd_dc_store_4xh(uint16_t *dst, ptrdiff_t stride, int h,
+ uint16x4_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1_u16(dst + i * stride, dc);
}
}
-#define INTRA_PRED_HIGHBD_SIZED_NEON(type, width) \
- void aom_highbd_##type##_predictor_##width##x##width##_neon( \
- uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
- const uint16_t *left, int bd) { \
- (void)bd; \
- highbd_##type##_predictor(dst, stride, width, above, left); \
+static INLINE void highbd_dc_store_8xh(uint16_t *dst, ptrdiff_t stride, int h,
+ uint16x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u16(dst + i * stride, dc);
+ }
+}
+
+static INLINE void highbd_dc_store_16xh(uint16_t *dst, ptrdiff_t stride, int h,
+ uint16x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u16(dst + i * stride, dc);
+ vst1q_u16(dst + i * stride + 8, dc);
+ }
+}
+
+static INLINE void highbd_dc_store_32xh(uint16_t *dst, ptrdiff_t stride, int h,
+ uint16x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u16(dst + i * stride, dc);
+ vst1q_u16(dst + i * stride + 8, dc);
+ vst1q_u16(dst + i * stride + 16, dc);
+ vst1q_u16(dst + i * stride + 24, dc);
+ }
+}
+
+static INLINE void highbd_dc_store_64xh(uint16_t *dst, ptrdiff_t stride, int h,
+ uint16x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u16(dst + i * stride, dc);
+ vst1q_u16(dst + i * stride + 8, dc);
+ vst1q_u16(dst + i * stride + 16, dc);
+ vst1q_u16(dst + i * stride + 24, dc);
+ vst1q_u16(dst + i * stride + 32, dc);
+ vst1q_u16(dst + i * stride + 40, dc);
+ vst1q_u16(dst + i * stride + 48, dc);
+ vst1q_u16(dst + i * stride + 56, dc);
+ }
+}
+
+static INLINE uint32x4_t horizontal_add_and_broadcast_long_u16x8(uint16x8_t a) {
+ // Need to assume input is up to 16 bits wide from dc 64x64 partial sum, so
+ // promote first.
+ const uint32x4_t b = vpaddlq_u16(a);
+#if AOM_ARCH_AARCH64
+ const uint32x4_t c = vpaddq_u32(b, b);
+ return vpaddq_u32(c, c);
+#else
+ const uint32x2_t c = vadd_u32(vget_low_u32(b), vget_high_u32(b));
+ const uint32x2_t d = vpadd_u32(c, c);
+ return vcombine_u32(d, d);
+#endif
+}
+
+static INLINE uint16x8_t highbd_dc_load_partial_sum_4(const uint16_t *left) {
+  // Nothing to do since the sum already fits in one vector, but this saves
+  // special-casing w == 4 or h == 4. The combine is zero cost for a sane
+  // compiler since vld1 already sets the top half of a vector to zero as part
+  // of the operation.
+ return vcombine_u16(vld1_u16(left), vdup_n_u16(0));
+}
+
+static INLINE uint16x8_t highbd_dc_load_partial_sum_8(const uint16_t *left) {
+  // Nothing to do since the sum already fits in one vector, but this saves
+  // special-casing w == 8 or h == 8.
+ return vld1q_u16(left);
+}
+
+static INLINE uint16x8_t highbd_dc_load_partial_sum_16(const uint16_t *left) {
+ const uint16x8_t a0 = vld1q_u16(left + 0); // up to 12 bits
+ const uint16x8_t a1 = vld1q_u16(left + 8);
+ return vaddq_u16(a0, a1); // up to 13 bits
+}
+
+static INLINE uint16x8_t highbd_dc_load_partial_sum_32(const uint16_t *left) {
+ const uint16x8_t a0 = vld1q_u16(left + 0); // up to 12 bits
+ const uint16x8_t a1 = vld1q_u16(left + 8);
+ const uint16x8_t a2 = vld1q_u16(left + 16);
+ const uint16x8_t a3 = vld1q_u16(left + 24);
+ const uint16x8_t b0 = vaddq_u16(a0, a1); // up to 13 bits
+ const uint16x8_t b1 = vaddq_u16(a2, a3);
+ return vaddq_u16(b0, b1); // up to 14 bits
+}
+
+static INLINE uint16x8_t highbd_dc_load_partial_sum_64(const uint16_t *left) {
+ const uint16x8_t a0 = vld1q_u16(left + 0); // up to 12 bits
+ const uint16x8_t a1 = vld1q_u16(left + 8);
+ const uint16x8_t a2 = vld1q_u16(left + 16);
+ const uint16x8_t a3 = vld1q_u16(left + 24);
+ const uint16x8_t a4 = vld1q_u16(left + 32);
+ const uint16x8_t a5 = vld1q_u16(left + 40);
+ const uint16x8_t a6 = vld1q_u16(left + 48);
+ const uint16x8_t a7 = vld1q_u16(left + 56);
+ const uint16x8_t b0 = vaddq_u16(a0, a1); // up to 13 bits
+ const uint16x8_t b1 = vaddq_u16(a2, a3);
+ const uint16x8_t b2 = vaddq_u16(a4, a5);
+ const uint16x8_t b3 = vaddq_u16(a6, a7);
+ const uint16x8_t c0 = vaddq_u16(b0, b1); // up to 14 bits
+ const uint16x8_t c1 = vaddq_u16(b2, b3);
+ return vaddq_u16(c0, c1); // up to 15 bits
+}
+
+#define HIGHBD_DC_PREDICTOR(w, h, shift) \
+ void aom_highbd_dc_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)bd; \
+ const uint16x8_t a = highbd_dc_load_partial_sum_##w(above); \
+ const uint16x8_t l = highbd_dc_load_partial_sum_##h(left); \
+ const uint32x4_t sum = \
+ horizontal_add_and_broadcast_long_u16x8(vaddq_u16(a, l)); \
+ const uint16x4_t dc0 = vrshrn_n_u32(sum, shift); \
+ highbd_dc_store_##w##xh(dst, stride, (h), vdupq_lane_u16(dc0, 0)); \
}
-#define INTRA_PRED_SQUARE(type) \
- INTRA_PRED_HIGHBD_SIZED_NEON(type, 4) \
- INTRA_PRED_HIGHBD_SIZED_NEON(type, 8) \
- INTRA_PRED_HIGHBD_SIZED_NEON(type, 16) \
- INTRA_PRED_HIGHBD_SIZED_NEON(type, 32) \
- INTRA_PRED_HIGHBD_SIZED_NEON(type, 64)
+void aom_highbd_dc_predictor_4x4_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+  // In the rectangular cases we simply extend the shorter vector to uint16x8
+  // in order to accumulate. In the 4x4 case, however, there is no shorter
+  // vector to extend, so it is beneficial to do the whole calculation in
+  // uint16x4 instead.
+ (void)bd;
+ const uint16x4_t a = vld1_u16(above); // up to 12 bits
+ const uint16x4_t l = vld1_u16(left);
+ uint16x4_t sum = vpadd_u16(a, l); // up to 13 bits
+ sum = vpadd_u16(sum, sum); // up to 14 bits
+ sum = vpadd_u16(sum, sum);
+ const uint16x4_t dc = vrshr_n_u16(sum, 3);
+ highbd_dc_store_4xh(dst, stride, 4, dc);
+}
-INTRA_PRED_SQUARE(dc)
+HIGHBD_DC_PREDICTOR(8, 8, 4)
+HIGHBD_DC_PREDICTOR(16, 16, 5)
+HIGHBD_DC_PREDICTOR(32, 32, 6)
+HIGHBD_DC_PREDICTOR(64, 64, 7)
-#undef INTRA_PRED_SQUARE
+#undef HIGHBD_DC_PREDICTOR
+
+static INLINE int divide_using_multiply_shift(int num, int shift1,
+ int multiplier, int shift2) {
+ const int interm = num >> shift1;
+ return interm * multiplier >> shift2;
+}
+
+#define HIGHBD_DC_MULTIPLIER_1X2 0xAAAB
+#define HIGHBD_DC_MULTIPLIER_1X4 0x6667
+#define HIGHBD_DC_SHIFT2 17
+
+static INLINE int highbd_dc_predictor_rect(int bw, int bh, int sum, int shift1,
+ uint32_t multiplier) {
+ return divide_using_multiply_shift(sum + ((bw + bh) >> 1), shift1, multiplier,
+ HIGHBD_DC_SHIFT2);
+}
+
+#undef HIGHBD_DC_SHIFT2
+
+#define HIGHBD_DC_PREDICTOR_RECT(w, h, q, shift, mult) \
+ void aom_highbd_dc_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)bd; \
+ uint16x8_t sum_above = highbd_dc_load_partial_sum_##w(above); \
+ uint16x8_t sum_left = highbd_dc_load_partial_sum_##h(left); \
+ uint16x8_t sum_vec = vaddq_u16(sum_left, sum_above); \
+ int sum = horizontal_add_and_broadcast_long_u16x8(sum_vec)[0]; \
+ int dc0 = highbd_dc_predictor_rect((w), (h), sum, (shift), (mult)); \
+ highbd_dc_store_##w##xh(dst, stride, (h), vdup##q##_n_u16(dc0)); \
+ }
+
+HIGHBD_DC_PREDICTOR_RECT(4, 8, , 2, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(4, 16, , 2, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(8, 4, q, 2, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(8, 16, q, 3, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(8, 32, q, 3, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(16, 4, q, 2, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(16, 8, q, 3, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(16, 32, q, 4, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(16, 64, q, 4, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(32, 8, q, 3, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(32, 16, q, 4, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(32, 64, q, 5, HIGHBD_DC_MULTIPLIER_1X2)
+HIGHBD_DC_PREDICTOR_RECT(64, 16, q, 4, HIGHBD_DC_MULTIPLIER_1X4)
+HIGHBD_DC_PREDICTOR_RECT(64, 32, q, 5, HIGHBD_DC_MULTIPLIER_1X2)
+
+#undef HIGHBD_DC_PREDICTOR_RECT
+#undef HIGHBD_DC_MULTIPLIER_1X2
+#undef HIGHBD_DC_MULTIPLIER_1X4
+
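divide_using_multiply_shift above replaces the division by w + h (not a power of two) with a shift plus a Q17 fixed-point reciprocal: shift1 strips the power-of-two factor and 0xAAAB ~ 2^17/3 (0x6667 ~ 2^17/5) handles the rest. A self-contained check for the 4x8 case, exhaustive over the reachable 12-bit range:

    #include <assert.h>

    static int div_mul_shift(int num, int shift1, int mult, int shift2) {
      return (num >> shift1) * mult >> shift2;  // mirrors the helper above
    }

    int main(void) {
      // 4x8: w + h = 12 = 4 * 3, bias (w + h) / 2 = 6, shift1 = 2, shift2 = 17.
      for (int sum = 0; sum <= 12 * 4095; ++sum) {
        assert(div_mul_shift(sum + 6, 2, 0xAAAB, 17) == (sum + 6) / 12);
      }
      return 0;
    }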
+// -----------------------------------------------------------------------------
+// DC_128
+
+#define HIGHBD_DC_PREDICTOR_128(w, h, q) \
+ void aom_highbd_dc_128_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)above; \
+ (void)left; \
+ highbd_dc_store_##w##xh(dst, stride, (h), \
+ vdup##q##_n_u16(0x80 << (bd - 8))); \
+ }
+
+HIGHBD_DC_PREDICTOR_128(4, 4, )
+HIGHBD_DC_PREDICTOR_128(4, 8, )
+HIGHBD_DC_PREDICTOR_128(4, 16, )
+HIGHBD_DC_PREDICTOR_128(8, 4, q)
+HIGHBD_DC_PREDICTOR_128(8, 8, q)
+HIGHBD_DC_PREDICTOR_128(8, 16, q)
+HIGHBD_DC_PREDICTOR_128(8, 32, q)
+HIGHBD_DC_PREDICTOR_128(16, 4, q)
+HIGHBD_DC_PREDICTOR_128(16, 8, q)
+HIGHBD_DC_PREDICTOR_128(16, 16, q)
+HIGHBD_DC_PREDICTOR_128(16, 32, q)
+HIGHBD_DC_PREDICTOR_128(16, 64, q)
+HIGHBD_DC_PREDICTOR_128(32, 8, q)
+HIGHBD_DC_PREDICTOR_128(32, 16, q)
+HIGHBD_DC_PREDICTOR_128(32, 32, q)
+HIGHBD_DC_PREDICTOR_128(32, 64, q)
+HIGHBD_DC_PREDICTOR_128(64, 16, q)
+HIGHBD_DC_PREDICTOR_128(64, 32, q)
+HIGHBD_DC_PREDICTOR_128(64, 64, q)
+
+#undef HIGHBD_DC_PREDICTOR_128
+
+// -----------------------------------------------------------------------------
+// DC_LEFT
+
+static INLINE uint32x4_t highbd_dc_load_sum_4(const uint16_t *left) {
+ const uint16x4_t a = vld1_u16(left); // up to 12 bits
+ const uint16x4_t b = vpadd_u16(a, a); // up to 13 bits
+ return vcombine_u32(vpaddl_u16(b), vdup_n_u32(0));
+}
+
+static INLINE uint32x4_t highbd_dc_load_sum_8(const uint16_t *left) {
+ return horizontal_add_and_broadcast_long_u16x8(vld1q_u16(left));
+}
+
+static INLINE uint32x4_t highbd_dc_load_sum_16(const uint16_t *left) {
+ return horizontal_add_and_broadcast_long_u16x8(
+ highbd_dc_load_partial_sum_16(left));
+}
+
+static INLINE uint32x4_t highbd_dc_load_sum_32(const uint16_t *left) {
+ return horizontal_add_and_broadcast_long_u16x8(
+ highbd_dc_load_partial_sum_32(left));
+}
+
+static INLINE uint32x4_t highbd_dc_load_sum_64(const uint16_t *left) {
+ return horizontal_add_and_broadcast_long_u16x8(
+ highbd_dc_load_partial_sum_64(left));
+}
+
+#define DC_PREDICTOR_LEFT(w, h, shift, q) \
+ void aom_highbd_dc_left_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)above; \
+ (void)bd; \
+ const uint32x4_t sum = highbd_dc_load_sum_##h(left); \
+ const uint16x4_t dc0 = vrshrn_n_u32(sum, (shift)); \
+ highbd_dc_store_##w##xh(dst, stride, (h), vdup##q##_lane_u16(dc0, 0)); \
+ }
+
+DC_PREDICTOR_LEFT(4, 4, 2, )
+DC_PREDICTOR_LEFT(4, 8, 3, )
+DC_PREDICTOR_LEFT(4, 16, 4, )
+DC_PREDICTOR_LEFT(8, 4, 2, q)
+DC_PREDICTOR_LEFT(8, 8, 3, q)
+DC_PREDICTOR_LEFT(8, 16, 4, q)
+DC_PREDICTOR_LEFT(8, 32, 5, q)
+DC_PREDICTOR_LEFT(16, 4, 2, q)
+DC_PREDICTOR_LEFT(16, 8, 3, q)
+DC_PREDICTOR_LEFT(16, 16, 4, q)
+DC_PREDICTOR_LEFT(16, 32, 5, q)
+DC_PREDICTOR_LEFT(16, 64, 6, q)
+DC_PREDICTOR_LEFT(32, 8, 3, q)
+DC_PREDICTOR_LEFT(32, 16, 4, q)
+DC_PREDICTOR_LEFT(32, 32, 5, q)
+DC_PREDICTOR_LEFT(32, 64, 6, q)
+DC_PREDICTOR_LEFT(64, 16, 4, q)
+DC_PREDICTOR_LEFT(64, 32, 5, q)
+DC_PREDICTOR_LEFT(64, 64, 6, q)
+
+#undef DC_PREDICTOR_LEFT
+
+// -----------------------------------------------------------------------------
+// DC_TOP
+
+#define DC_PREDICTOR_TOP(w, h, shift, q) \
+ void aom_highbd_dc_top_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)bd; \
+ (void)left; \
+ const uint32x4_t sum = highbd_dc_load_sum_##w(above); \
+ const uint16x4_t dc0 = vrshrn_n_u32(sum, (shift)); \
+ highbd_dc_store_##w##xh(dst, stride, (h), vdup##q##_lane_u16(dc0, 0)); \
+ }
+
+DC_PREDICTOR_TOP(4, 4, 2, )
+DC_PREDICTOR_TOP(4, 8, 2, )
+DC_PREDICTOR_TOP(4, 16, 2, )
+DC_PREDICTOR_TOP(8, 4, 3, q)
+DC_PREDICTOR_TOP(8, 8, 3, q)
+DC_PREDICTOR_TOP(8, 16, 3, q)
+DC_PREDICTOR_TOP(8, 32, 3, q)
+DC_PREDICTOR_TOP(16, 4, 4, q)
+DC_PREDICTOR_TOP(16, 8, 4, q)
+DC_PREDICTOR_TOP(16, 16, 4, q)
+DC_PREDICTOR_TOP(16, 32, 4, q)
+DC_PREDICTOR_TOP(16, 64, 4, q)
+DC_PREDICTOR_TOP(32, 8, 5, q)
+DC_PREDICTOR_TOP(32, 16, 5, q)
+DC_PREDICTOR_TOP(32, 32, 5, q)
+DC_PREDICTOR_TOP(32, 64, 5, q)
+DC_PREDICTOR_TOP(64, 16, 6, q)
+DC_PREDICTOR_TOP(64, 32, 6, q)
+DC_PREDICTOR_TOP(64, 64, 6, q)
+
+#undef DC_PREDICTOR_TOP
// -----------------------------------------------------------------------------
// V_PRED
@@ -213,6 +480,170 @@
HIGHBD_V_NXM(64, 64)
// -----------------------------------------------------------------------------
+// H_PRED
+
+static INLINE void highbd_h_store_4x4(uint16_t *dst, ptrdiff_t stride,
+ uint16x4_t left) {
+ vst1_u16(dst + 0 * stride, vdup_lane_u16(left, 0));
+ vst1_u16(dst + 1 * stride, vdup_lane_u16(left, 1));
+ vst1_u16(dst + 2 * stride, vdup_lane_u16(left, 2));
+ vst1_u16(dst + 3 * stride, vdup_lane_u16(left, 3));
+}
+
+static INLINE void highbd_h_store_8x4(uint16_t *dst, ptrdiff_t stride,
+ uint16x4_t left) {
+ vst1q_u16(dst + 0 * stride, vdupq_lane_u16(left, 0));
+ vst1q_u16(dst + 1 * stride, vdupq_lane_u16(left, 1));
+ vst1q_u16(dst + 2 * stride, vdupq_lane_u16(left, 2));
+ vst1q_u16(dst + 3 * stride, vdupq_lane_u16(left, 3));
+}
+
+static INLINE void highbd_h_store_16x1(uint16_t *dst, uint16x8_t left) {
+ vst1q_u16(dst + 0, left);
+ vst1q_u16(dst + 8, left);
+}
+
+static INLINE void highbd_h_store_16x4(uint16_t *dst, ptrdiff_t stride,
+ uint16x4_t left) {
+ highbd_h_store_16x1(dst + 0 * stride, vdupq_lane_u16(left, 0));
+ highbd_h_store_16x1(dst + 1 * stride, vdupq_lane_u16(left, 1));
+ highbd_h_store_16x1(dst + 2 * stride, vdupq_lane_u16(left, 2));
+ highbd_h_store_16x1(dst + 3 * stride, vdupq_lane_u16(left, 3));
+}
+
+static INLINE void highbd_h_store_32x1(uint16_t *dst, uint16x8_t left) {
+ vst1q_u16(dst + 0, left);
+ vst1q_u16(dst + 8, left);
+ vst1q_u16(dst + 16, left);
+ vst1q_u16(dst + 24, left);
+}
+
+static INLINE void highbd_h_store_32x4(uint16_t *dst, ptrdiff_t stride,
+ uint16x4_t left) {
+ highbd_h_store_32x1(dst + 0 * stride, vdupq_lane_u16(left, 0));
+ highbd_h_store_32x1(dst + 1 * stride, vdupq_lane_u16(left, 1));
+ highbd_h_store_32x1(dst + 2 * stride, vdupq_lane_u16(left, 2));
+ highbd_h_store_32x1(dst + 3 * stride, vdupq_lane_u16(left, 3));
+}
+
+static INLINE void highbd_h_store_64x1(uint16_t *dst, uint16x8_t left) {
+ vst1q_u16(dst + 0, left);
+ vst1q_u16(dst + 8, left);
+ vst1q_u16(dst + 16, left);
+ vst1q_u16(dst + 24, left);
+ vst1q_u16(dst + 32, left);
+ vst1q_u16(dst + 40, left);
+ vst1q_u16(dst + 48, left);
+ vst1q_u16(dst + 56, left);
+}
+
+static INLINE void highbd_h_store_64x4(uint16_t *dst, ptrdiff_t stride,
+ uint16x4_t left) {
+ highbd_h_store_64x1(dst + 0 * stride, vdupq_lane_u16(left, 0));
+ highbd_h_store_64x1(dst + 1 * stride, vdupq_lane_u16(left, 1));
+ highbd_h_store_64x1(dst + 2 * stride, vdupq_lane_u16(left, 2));
+ highbd_h_store_64x1(dst + 3 * stride, vdupq_lane_u16(left, 3));
+}
+
+void aom_highbd_h_predictor_4x4_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ highbd_h_store_4x4(dst, stride, vld1_u16(left));
+}
+
+void aom_highbd_h_predictor_4x8_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ uint16x8_t l = vld1q_u16(left);
+ highbd_h_store_4x4(dst + 0 * stride, stride, vget_low_u16(l));
+ highbd_h_store_4x4(dst + 4 * stride, stride, vget_high_u16(l));
+}
+
+void aom_highbd_h_predictor_8x4_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ highbd_h_store_8x4(dst, stride, vld1_u16(left));
+}
+
+void aom_highbd_h_predictor_8x8_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ uint16x8_t l = vld1q_u16(left);
+ highbd_h_store_8x4(dst + 0 * stride, stride, vget_low_u16(l));
+ highbd_h_store_8x4(dst + 4 * stride, stride, vget_high_u16(l));
+}
+
+void aom_highbd_h_predictor_16x4_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ highbd_h_store_16x4(dst, stride, vld1_u16(left));
+}
+
+void aom_highbd_h_predictor_16x8_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ uint16x8_t l = vld1q_u16(left);
+ highbd_h_store_16x4(dst + 0 * stride, stride, vget_low_u16(l));
+ highbd_h_store_16x4(dst + 4 * stride, stride, vget_high_u16(l));
+}
+
+void aom_highbd_h_predictor_32x8_neon(uint16_t *dst, ptrdiff_t stride,
+ const uint16_t *above,
+ const uint16_t *left, int bd) {
+ (void)above;
+ (void)bd;
+ uint16x8_t l = vld1q_u16(left);
+ highbd_h_store_32x4(dst + 0 * stride, stride, vget_low_u16(l));
+ highbd_h_store_32x4(dst + 4 * stride, stride, vget_high_u16(l));
+}
+
+// For cases where height >= 16 we use pairs of loads to get LDP instructions.
+#define HIGHBD_H_WXH_LARGE(w, h) \
+ void aom_highbd_h_predictor_##w##x##h##_neon( \
+ uint16_t *dst, ptrdiff_t stride, const uint16_t *above, \
+ const uint16_t *left, int bd) { \
+ (void)above; \
+ (void)bd; \
+ for (int i = 0; i < (h) / 16; ++i) { \
+ uint16x8_t l0 = vld1q_u16(left + 0); \
+ uint16x8_t l1 = vld1q_u16(left + 8); \
+ highbd_h_store_##w##x4(dst + 0 * stride, stride, vget_low_u16(l0)); \
+ highbd_h_store_##w##x4(dst + 4 * stride, stride, vget_high_u16(l0)); \
+ highbd_h_store_##w##x4(dst + 8 * stride, stride, vget_low_u16(l1)); \
+ highbd_h_store_##w##x4(dst + 12 * stride, stride, vget_high_u16(l1)); \
+ left += 16; \
+ dst += 16 * stride; \
+ } \
+ }
+
+HIGHBD_H_WXH_LARGE(4, 16)
+HIGHBD_H_WXH_LARGE(8, 16)
+HIGHBD_H_WXH_LARGE(8, 32)
+HIGHBD_H_WXH_LARGE(16, 16)
+HIGHBD_H_WXH_LARGE(16, 32)
+HIGHBD_H_WXH_LARGE(16, 64)
+HIGHBD_H_WXH_LARGE(32, 16)
+HIGHBD_H_WXH_LARGE(32, 32)
+HIGHBD_H_WXH_LARGE(32, 64)
+HIGHBD_H_WXH_LARGE(64, 16)
+HIGHBD_H_WXH_LARGE(64, 32)
+HIGHBD_H_WXH_LARGE(64, 64)
+
+#undef HIGHBD_H_WXH_LARGE
+
+// -----------------------------------------------------------------------------
// PAETH
static INLINE void highbd_paeth_4or8_x_h_neon(uint16_t *dest, ptrdiff_t stride,
diff --git a/aom_dsp/arm/highbd_loopfilter_neon.c b/aom_dsp/arm/highbd_loopfilter_neon.c
index 0b720ce..2b5128e 100644
--- a/aom_dsp/arm/highbd_loopfilter_neon.c
+++ b/aom_dsp/arm/highbd_loopfilter_neon.c
@@ -247,12 +247,12 @@
filter4_masks(p0q0, p1q1, hev_thresh, outer_mask, inner_thresh, &hev_mask,
&needs_filter4_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter4_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -313,12 +313,12 @@
filter4_masks(p0q0, p1q1, hev_thresh, outer_mask, inner_thresh, &hev_mask,
&needs_filter4_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter4_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -437,12 +437,12 @@
filter6_masks(p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh, bd,
&needs_filter_mask, &is_flat3_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -528,12 +528,12 @@
filter6_masks(p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh, bd,
&needs_filter_mask, &is_flat3_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -684,12 +684,12 @@
filter8_masks(p3q3, p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh,
bd, &needs_filter_mask, &is_flat4_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -783,12 +783,12 @@
filter8_masks(p3q3, p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh,
bd, &needs_filter_mask, &is_flat4_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
// Copy the masks to the high bits for packed comparisons later.
const uint16x8_t hev_mask_8 = vcombine_u16(hev_mask, hev_mask);
@@ -976,12 +976,12 @@
filter8_masks(p3q3, p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh,
bd, &needs_filter_mask, &is_flat4_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
const uint16x8_t p4q4 = vcombine_u16(src[2], src[11]);
const uint16x8_t p5q5 = vcombine_u16(src[1], src[12]);
const uint16x8_t p6q6 = vcombine_u16(src[0], src[13]);
@@ -1083,7 +1083,7 @@
static INLINE uint16x8x2_t permute_acdb64(const uint16x8_t ab,
const uint16x8_t cd) {
uint16x8x2_t acdb;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
// a[b] <- [c]d
acdb.val[0] = vreinterpretq_u16_u64(
vtrn1q_u64(vreinterpretq_u64_u16(ab), vreinterpretq_u64_u16(cd)));
@@ -1099,7 +1099,7 @@
acdb.val[1] = vreinterpretq_u16_u64(
vsetq_lane_u64(vgetq_lane_u64(vreinterpretq_u64_u16(cd), 1),
vreinterpretq_u64_u16(ab), 0));
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
return acdb;
}
@@ -1144,12 +1144,12 @@
filter8_masks(p3q3, p2q2, p1q1, p0q0, hev_thresh, outer_mask, inner_thresh,
bd, &needs_filter_mask, &is_flat4_mask, &hev_mask);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (vaddv_u16(needs_filter_mask) == 0) {
// None of the values will be filtered.
return;
}
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
const uint16x8_t p4q4 =
vcombine_u16(vget_low_u16(src_p[3]), vget_high_u16(src_q[0]));
const uint16x8_t p5q5 =
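Beyond the macro rename (AOM_ARCH_AARCH64 comes from config/aom_config.h rather than probing __aarch64__ directly), the guard exists because vaddv_u16 is an AArch64-only horizontal reduction. A sketch of an equivalent early-exit test that also covers 32-bit ARM (the helper name is illustrative):

    #include <arm_neon.h>

    static inline int all_lanes_zero_u16x4(uint16x4_t mask) {
    #if defined(__aarch64__)
      return vaddv_u16(mask) == 0;  // single-instruction reduction
    #else
      // View the 64-bit vector as one scalar; zero iff every lane is zero.
      return vget_lane_u64(vreinterpret_u64_u16(mask), 0) == 0;
    #endif
    }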
diff --git a/aom_dsp/arm/highbd_quantize_neon.c b/aom_dsp/arm/highbd_quantize_neon.c
index 927e13c..77a7aac 100644
--- a/aom_dsp/arm/highbd_quantize_neon.c
+++ b/aom_dsp/arm/highbd_quantize_neon.c
@@ -12,6 +12,8 @@
#include <arm_neon.h>
#include <assert.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/quantize.h"
#include "aom_dsp/arm/mem_neon.h"
@@ -19,7 +21,7 @@
#include "av1/encoder/av1_quantize.h"
static INLINE uint32_t sum_abs_coeff(const uint32x4_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddvq_u32(a);
#else
const uint64x2_t b = vpaddlq_u32(a);
@@ -98,7 +100,7 @@
}
static INLINE uint16_t get_max_eob(int16x8_t v_eobmax) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
return (uint16_t)vmaxvq_s16(v_eobmax);
#else
const int16x4_t v_eobmax_3210 =
@@ -116,7 +118,7 @@
}
static INLINE uint16_t get_min_eob(int16x8_t v_eobmin) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
return (uint16_t)vminvq_s16(v_eobmin);
#else
const int16x4_t v_eobmin_3210 =
diff --git a/aom_dsp/arm/highbd_sad4d_neon.c b/aom_dsp/arm/highbd_sad4d_neon.c
new file mode 100644
index 0000000..f2fda36
--- /dev/null
+++ b/aom_dsp/arm/highbd_sad4d_neon.c
@@ -0,0 +1,360 @@
+/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
+
+static INLINE void highbd_sad4xhx4d_small_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr0 = CONVERT_TO_SHORTPTR(ref_ptr[0]);
+ const uint16_t *ref16_ptr1 = CONVERT_TO_SHORTPTR(ref_ptr[1]);
+ const uint16_t *ref16_ptr2 = CONVERT_TO_SHORTPTR(ref_ptr[2]);
+ const uint16_t *ref16_ptr3 = CONVERT_TO_SHORTPTR(ref_ptr[3]);
+
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+
+ int i = 0;
+ do {
+ uint16x4_t s = vld1_u16(src16_ptr + i * src_stride);
+ uint16x4_t r0 = vld1_u16(ref16_ptr0 + i * ref_stride);
+ uint16x4_t r1 = vld1_u16(ref16_ptr1 + i * ref_stride);
+ uint16x4_t r2 = vld1_u16(ref16_ptr2 + i * ref_stride);
+ uint16x4_t r3 = vld1_u16(ref16_ptr3 + i * ref_stride);
+
+ sum[0] = vabal_u16(sum[0], s, r0);
+ sum[1] = vabal_u16(sum[1], s, r1);
+ sum[2] = vabal_u16(sum[2], s, r2);
+ sum[3] = vabal_u16(sum[3], s, r3);
+
+ } while (++i < h);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void highbd_sad8xhx4d_small_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr0 = CONVERT_TO_SHORTPTR(ref_ptr[0]);
+ const uint16_t *ref16_ptr1 = CONVERT_TO_SHORTPTR(ref_ptr[1]);
+ const uint16_t *ref16_ptr2 = CONVERT_TO_SHORTPTR(ref_ptr[2]);
+ const uint16_t *ref16_ptr3 = CONVERT_TO_SHORTPTR(ref_ptr[3]);
+
+ uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint32x4_t sum_u32[4];
+
+ int i = 0;
+ do {
+ uint16x8_t s = vld1q_u16(src16_ptr + i * src_stride);
+
+ sum[0] = vabaq_u16(sum[0], s, vld1q_u16(ref16_ptr0 + i * ref_stride));
+ sum[1] = vabaq_u16(sum[1], s, vld1q_u16(ref16_ptr1 + i * ref_stride));
+ sum[2] = vabaq_u16(sum[2], s, vld1q_u16(ref16_ptr2 + i * ref_stride));
+ sum[3] = vabaq_u16(sum[3], s, vld1q_u16(ref16_ptr3 + i * ref_stride));
+
+ } while (++i < h);
+
+ sum_u32[0] = vpaddlq_u16(sum[0]);
+ sum_u32[1] = vpaddlq_u16(sum[1]);
+ sum_u32[2] = vpaddlq_u16(sum[2]);
+ sum_u32[3] = vpaddlq_u16(sum[3]);
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum_u32));
+}
+
+static INLINE void sad8_neon(uint16x8_t src, uint16x8_t ref,
+ uint32x4_t *const sad_sum) {
+ uint16x8_t abs_diff = vabdq_u16(src, ref);
+ *sad_sum = vpadalq_u16(*sad_sum, abs_diff);
+}
+
+static INLINE void highbd_sad8xhx4d_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr0 = CONVERT_TO_SHORTPTR(ref_ptr[0]);
+ const uint16_t *ref16_ptr1 = CONVERT_TO_SHORTPTR(ref_ptr[1]);
+ const uint16_t *ref16_ptr2 = CONVERT_TO_SHORTPTR(ref_ptr[2]);
+ const uint16_t *ref16_ptr3 = CONVERT_TO_SHORTPTR(ref_ptr[3]);
+
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+
+ int i = 0;
+ do {
+ uint16x8_t s = vld1q_u16(src16_ptr + i * src_stride);
+ sad8_neon(s, vld1q_u16(ref16_ptr0 + i * ref_stride), &sum[0]);
+ sad8_neon(s, vld1q_u16(ref16_ptr1 + i * ref_stride), &sum[1]);
+ sad8_neon(s, vld1q_u16(ref16_ptr2 + i * ref_stride), &sum[2]);
+ sad8_neon(s, vld1q_u16(ref16_ptr3 + i * ref_stride), &sum[3]);
+
+ } while (++i < h);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void highbd_sad16xhx4d_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr0 = CONVERT_TO_SHORTPTR(ref_ptr[0]);
+ const uint16_t *ref16_ptr1 = CONVERT_TO_SHORTPTR(ref_ptr[1]);
+ const uint16_t *ref16_ptr2 = CONVERT_TO_SHORTPTR(ref_ptr[2]);
+ const uint16_t *ref16_ptr3 = CONVERT_TO_SHORTPTR(ref_ptr[3]);
+
+ uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum[4];
+
+ int i = 0;
+ do {
+ uint16x8_t s0 = vld1q_u16(src16_ptr + i * src_stride);
+ sad8_neon(s0, vld1q_u16(ref16_ptr0 + i * ref_stride), &sum_lo[0]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr1 + i * ref_stride), &sum_lo[1]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr2 + i * ref_stride), &sum_lo[2]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr3 + i * ref_stride), &sum_lo[3]);
+
+ uint16x8_t s1 = vld1q_u16(src16_ptr + i * src_stride + 8);
+ sad8_neon(s1, vld1q_u16(ref16_ptr0 + i * ref_stride + 8), &sum_hi[0]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr1 + i * ref_stride + 8), &sum_hi[1]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr2 + i * ref_stride + 8), &sum_hi[2]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr3 + i * ref_stride + 8), &sum_hi[3]);
+
+ } while (++i < h);
+
+ sum[0] = vaddq_u32(sum_lo[0], sum_hi[0]);
+ sum[1] = vaddq_u32(sum_lo[1], sum_hi[1]);
+ sum[2] = vaddq_u32(sum_lo[2], sum_hi[2]);
+ sum[3] = vaddq_u32(sum_lo[3], sum_hi[3]);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void highbd_sadwxhx4d_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int w, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr0 = CONVERT_TO_SHORTPTR(ref_ptr[0]);
+ const uint16_t *ref16_ptr1 = CONVERT_TO_SHORTPTR(ref_ptr[1]);
+ const uint16_t *ref16_ptr2 = CONVERT_TO_SHORTPTR(ref_ptr[2]);
+ const uint16_t *ref16_ptr3 = CONVERT_TO_SHORTPTR(ref_ptr[3]);
+
+ uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum[4];
+
+ int i = 0;
+ do {
+ int j = 0;
+ do {
+ uint16x8_t s0 = vld1q_u16(src16_ptr + i * src_stride + j);
+ sad8_neon(s0, vld1q_u16(ref16_ptr0 + i * ref_stride + j), &sum_lo[0]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr1 + i * ref_stride + j), &sum_lo[1]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr2 + i * ref_stride + j), &sum_lo[2]);
+ sad8_neon(s0, vld1q_u16(ref16_ptr3 + i * ref_stride + j), &sum_lo[3]);
+
+ uint16x8_t s1 = vld1q_u16(src16_ptr + i * src_stride + j + 8);
+ sad8_neon(s1, vld1q_u16(ref16_ptr0 + i * ref_stride + j + 8), &sum_hi[0]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr1 + i * ref_stride + j + 8), &sum_hi[1]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr2 + i * ref_stride + j + 8), &sum_hi[2]);
+ sad8_neon(s1, vld1q_u16(ref16_ptr3 + i * ref_stride + j + 8), &sum_hi[3]);
+
+ uint16x8_t s2 = vld1q_u16(src16_ptr + i * src_stride + j + 16);
+ sad8_neon(s2, vld1q_u16(ref16_ptr0 + i * ref_stride + j + 16),
+ &sum_lo[0]);
+ sad8_neon(s2, vld1q_u16(ref16_ptr1 + i * ref_stride + j + 16),
+ &sum_lo[1]);
+ sad8_neon(s2, vld1q_u16(ref16_ptr2 + i * ref_stride + j + 16),
+ &sum_lo[2]);
+ sad8_neon(s2, vld1q_u16(ref16_ptr3 + i * ref_stride + j + 16),
+ &sum_lo[3]);
+
+ uint16x8_t s3 = vld1q_u16(src16_ptr + i * src_stride + j + 24);
+ sad8_neon(s3, vld1q_u16(ref16_ptr0 + i * ref_stride + j + 24),
+ &sum_hi[0]);
+ sad8_neon(s3, vld1q_u16(ref16_ptr1 + i * ref_stride + j + 24),
+ &sum_hi[1]);
+ sad8_neon(s3, vld1q_u16(ref16_ptr2 + i * ref_stride + j + 24),
+ &sum_hi[2]);
+ sad8_neon(s3, vld1q_u16(ref16_ptr3 + i * ref_stride + j + 24),
+ &sum_hi[3]);
+
+ j += 32;
+ } while (j < w);
+
+ } while (++i < h);
+
+ sum[0] = vaddq_u32(sum_lo[0], sum_hi[0]);
+ sum[1] = vaddq_u32(sum_lo[1], sum_hi[1]);
+ sum[2] = vaddq_u32(sum_lo[2], sum_hi[2]);
+ sum[3] = vaddq_u32(sum_lo[3], sum_hi[3]);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void highbd_sad128xhx4d_large_neon(
+ const uint8_t *src_ptr, int src_stride, const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4], int h) {
+ highbd_sadwxhx4d_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, res,
+ 128, h);
+}
+
+static INLINE void highbd_sad64xhx4d_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ highbd_sadwxhx4d_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, res, 64,
+ h);
+}
+
+static INLINE void highbd_sad32xhx4d_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *const ref_ptr[4],
+ int ref_stride, uint32_t res[4],
+ int h) {
+ highbd_sadwxhx4d_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, res, 32,
+ h);
+}
+
+#define HBD_SAD_WXH_4D_SMALL_NEON(w, h) \
+ void aom_highbd_sad##w##x##h##x4d_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *const ref_array[4], \
+ int ref_stride, uint32_t sad_array[4]) { \
+ highbd_sad##w##xhx4d_small_neon(src, src_stride, ref_array, ref_stride, \
+ sad_array, (h)); \
+ }
+
+#define HBD_SAD_WXH_4D_LARGE_NEON(w, h) \
+ void aom_highbd_sad##w##x##h##x4d_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *const ref_array[4], \
+ int ref_stride, uint32_t sad_array[4]) { \
+ highbd_sad##w##xhx4d_large_neon(src, src_stride, ref_array, ref_stride, \
+ sad_array, (h)); \
+ }
+
+HBD_SAD_WXH_4D_SMALL_NEON(4, 4)
+HBD_SAD_WXH_4D_SMALL_NEON(4, 8)
+
+HBD_SAD_WXH_4D_SMALL_NEON(8, 4)
+HBD_SAD_WXH_4D_SMALL_NEON(8, 8)
+HBD_SAD_WXH_4D_SMALL_NEON(8, 16)
+
+HBD_SAD_WXH_4D_LARGE_NEON(16, 8)
+HBD_SAD_WXH_4D_LARGE_NEON(16, 16)
+HBD_SAD_WXH_4D_LARGE_NEON(16, 32)
+
+HBD_SAD_WXH_4D_LARGE_NEON(32, 16)
+HBD_SAD_WXH_4D_LARGE_NEON(32, 32)
+HBD_SAD_WXH_4D_LARGE_NEON(32, 64)
+
+HBD_SAD_WXH_4D_LARGE_NEON(64, 32)
+HBD_SAD_WXH_4D_LARGE_NEON(64, 64)
+HBD_SAD_WXH_4D_LARGE_NEON(64, 128)
+
+HBD_SAD_WXH_4D_LARGE_NEON(128, 64)
+HBD_SAD_WXH_4D_LARGE_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+HBD_SAD_WXH_4D_SMALL_NEON(4, 16)
+
+HBD_SAD_WXH_4D_LARGE_NEON(8, 32)
+
+HBD_SAD_WXH_4D_LARGE_NEON(16, 4)
+HBD_SAD_WXH_4D_LARGE_NEON(16, 64)
+
+HBD_SAD_WXH_4D_LARGE_NEON(32, 8)
+
+HBD_SAD_WXH_4D_LARGE_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
+#define HBD_SAD_SKIP_WXH_4D_SMALL_NEON(w, h) \
+ void aom_highbd_sad_skip_##w##x##h##x4d_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *const ref_array[4], \
+ int ref_stride, uint32_t sad_array[4]) { \
+ highbd_sad##w##xhx4d_small_neon(src, 2 * src_stride, ref_array, \
+ 2 * ref_stride, sad_array, ((h) >> 1)); \
+ sad_array[0] <<= 1; \
+ sad_array[1] <<= 1; \
+ sad_array[2] <<= 1; \
+ sad_array[3] <<= 1; \
+ }
+
+#define HBD_SAD_SKIP_WXH_4D_LARGE_NEON(w, h) \
+ void aom_highbd_sad_skip_##w##x##h##x4d_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *const ref_array[4], \
+ int ref_stride, uint32_t sad_array[4]) { \
+ highbd_sad##w##xhx4d_large_neon(src, 2 * src_stride, ref_array, \
+ 2 * ref_stride, sad_array, ((h) >> 1)); \
+ sad_array[0] <<= 1; \
+ sad_array[1] <<= 1; \
+ sad_array[2] <<= 1; \
+ sad_array[3] <<= 1; \
+ }
+
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(4, 4)
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(4, 8)
+
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(8, 4)
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(8, 8)
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(8, 16)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(16, 8)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(16, 16)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(16, 32)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(32, 16)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(32, 32)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(32, 64)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(64, 32)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(64, 64)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(64, 128)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(128, 64)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(4, 16)
+
+HBD_SAD_SKIP_WXH_4D_SMALL_NEON(8, 32)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(16, 4)
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(16, 64)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(32, 8)
+
+HBD_SAD_SKIP_WXH_4D_LARGE_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
diff --git a/aom_dsp/arm/highbd_sad_neon.c b/aom_dsp/arm/highbd_sad_neon.c
new file mode 100644
index 0000000..919eb55
--- /dev/null
+++ b/aom_dsp/arm/highbd_sad_neon.c
@@ -0,0 +1,285 @@
+/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
+
+static INLINE uint32_t highbd_sad4xh_small_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr = CONVERT_TO_SHORTPTR(ref_ptr);
+ uint32x4_t sum = vdupq_n_u32(0);
+
+ int i = h;
+ do {
+ uint16x4_t s = vld1_u16(src16_ptr);
+ uint16x4_t r = vld1_u16(ref16_ptr);
+ sum = vabal_u16(sum, s, r);
+
+ src16_ptr += src_stride;
+ ref16_ptr += ref_stride;
+ } while (--i != 0);
+
+ return horizontal_add_u32x4(sum);
+}
+
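+// For 8-wide blocks the "small" variant accumulates absolute differences in
+// 16-bit lanes. With 12-bit input each row adds at most 4095 per lane, so up
+// to 16 rows (16 * 4095 = 65520) fit without overflow.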
+static INLINE uint32_t highbd_sad8xh_small_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr = CONVERT_TO_SHORTPTR(ref_ptr);
+ uint16x8_t sum = vdupq_n_u16(0);
+
+ int i = h;
+ do {
+ uint16x8_t s = vld1q_u16(src16_ptr);
+ uint16x8_t r = vld1q_u16(ref16_ptr);
+ sum = vabaq_u16(sum, s, r);
+
+ src16_ptr += src_stride;
+ ref16_ptr += ref_stride;
+ } while (--i != 0);
+
+ return horizontal_add_u16x8(sum);
+}
+
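+// The "large" variant widens each row's absolute differences into 32-bit
+// accumulators (vpadalq_u16), so taller blocks cannot overflow.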
+static INLINE uint32_t highbd_sad8xh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr = CONVERT_TO_SHORTPTR(ref_ptr);
+ uint32x4_t sum_u32 = vdupq_n_u32(0);
+
+ int i = h;
+ do {
+ uint16x8_t s = vld1q_u16(src16_ptr);
+ uint16x8_t r = vld1q_u16(ref16_ptr);
+ uint16x8_t sum_u16 = vabdq_u16(s, r);
+ sum_u32 = vpadalq_u16(sum_u32, sum_u16);
+
+ src16_ptr += src_stride;
+ ref16_ptr += ref_stride;
+ } while (--i != 0);
+
+ return horizontal_add_u32x4(sum_u32);
+}
+
+static INLINE uint32_t highbd_sad16xh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr = CONVERT_TO_SHORTPTR(ref_ptr);
+ uint32x4_t sum[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ int i = h;
+ do {
+ uint16x8_t s0 = vld1q_u16(src16_ptr);
+ uint16x8_t r0 = vld1q_u16(ref16_ptr);
+ uint16x8_t diff0 = vabdq_u16(s0, r0);
+ sum[0] = vpadalq_u16(sum[0], diff0);
+
+ uint16x8_t s1 = vld1q_u16(src16_ptr + 8);
+ uint16x8_t r1 = vld1q_u16(ref16_ptr + 8);
+ uint16x8_t diff1 = vabdq_u16(s1, r1);
+ sum[1] = vpadalq_u16(sum[1], diff1);
+
+ src16_ptr += src_stride;
+ ref16_ptr += ref_stride;
+ } while (--i != 0);
+
+ sum[0] = vaddq_u32(sum[0], sum[1]);
+ return horizontal_add_u32x4(sum[0]);
+}
+
+static INLINE uint32_t highbd_sadwxh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int w, int h) {
+ const uint16_t *src16_ptr = CONVERT_TO_SHORTPTR(src_ptr);
+ const uint16_t *ref16_ptr = CONVERT_TO_SHORTPTR(ref_ptr);
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+
+ int i = h;
+ do {
+ int j = 0;
+ do {
+ uint16x8_t s0 = vld1q_u16(src16_ptr + j);
+ uint16x8_t r0 = vld1q_u16(ref16_ptr + j);
+ uint16x8_t diff0 = vabdq_u16(s0, r0);
+ sum[0] = vpadalq_u16(sum[0], diff0);
+
+ uint16x8_t s1 = vld1q_u16(src16_ptr + j + 8);
+ uint16x8_t r1 = vld1q_u16(ref16_ptr + j + 8);
+ uint16x8_t diff1 = vabdq_u16(s1, r1);
+ sum[1] = vpadalq_u16(sum[1], diff1);
+
+ uint16x8_t s2 = vld1q_u16(src16_ptr + j + 16);
+ uint16x8_t r2 = vld1q_u16(ref16_ptr + j + 16);
+ uint16x8_t diff2 = vabdq_u16(s2, r2);
+ sum[2] = vpadalq_u16(sum[2], diff2);
+
+ uint16x8_t s3 = vld1q_u16(src16_ptr + j + 24);
+ uint16x8_t r3 = vld1q_u16(ref16_ptr + j + 24);
+ uint16x8_t diff3 = vabdq_u16(s3, r3);
+ sum[3] = vpadalq_u16(sum[3], diff3);
+
+ j += 32;
+ } while (j < w);
+
+ src16_ptr += src_stride;
+ ref16_ptr += ref_stride;
+ } while (--i != 0);
+
+ sum[0] = vaddq_u32(sum[0], sum[1]);
+ sum[2] = vaddq_u32(sum[2], sum[3]);
+ sum[0] = vaddq_u32(sum[0], sum[2]);
+
+ return horizontal_add_u32x4(sum[0]);
+}
+
+static INLINE unsigned int highbd_sad128xh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ return highbd_sadwxh_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, 128,
+ h);
+}
+
+static INLINE unsigned int highbd_sad64xh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ return highbd_sadwxh_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, 64,
+ h);
+}
+
+static INLINE unsigned int highbd_sad32xh_large_neon(const uint8_t *src_ptr,
+ int src_stride,
+ const uint8_t *ref_ptr,
+ int ref_stride, int h) {
+ return highbd_sadwxh_large_neon(src_ptr, src_stride, ref_ptr, ref_stride, 32,
+ h);
+}
+
+#define HBD_SAD_WXH_SMALL_NEON(w, h) \
+ unsigned int aom_highbd_sad##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref, \
+ int ref_stride) { \
+ return highbd_sad##w##xh_small_neon(src, src_stride, ref, ref_stride, \
+ (h)); \
+ }
+
+#define HBD_SAD_WXH_LARGE_NEON(w, h) \
+ unsigned int aom_highbd_sad##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref, \
+ int ref_stride) { \
+ return highbd_sad##w##xh_large_neon(src, src_stride, ref, ref_stride, \
+ (h)); \
+ }
+
+HBD_SAD_WXH_SMALL_NEON(4, 4)
+HBD_SAD_WXH_SMALL_NEON(4, 8)
+
+HBD_SAD_WXH_SMALL_NEON(8, 4)
+HBD_SAD_WXH_SMALL_NEON(8, 8)
+HBD_SAD_WXH_SMALL_NEON(8, 16)
+
+HBD_SAD_WXH_LARGE_NEON(16, 8)
+HBD_SAD_WXH_LARGE_NEON(16, 16)
+HBD_SAD_WXH_LARGE_NEON(16, 32)
+
+HBD_SAD_WXH_LARGE_NEON(32, 16)
+HBD_SAD_WXH_LARGE_NEON(32, 32)
+HBD_SAD_WXH_LARGE_NEON(32, 64)
+
+HBD_SAD_WXH_LARGE_NEON(64, 32)
+HBD_SAD_WXH_LARGE_NEON(64, 64)
+HBD_SAD_WXH_LARGE_NEON(64, 128)
+
+HBD_SAD_WXH_LARGE_NEON(128, 64)
+HBD_SAD_WXH_LARGE_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+HBD_SAD_WXH_SMALL_NEON(4, 16)
+
+HBD_SAD_WXH_LARGE_NEON(8, 32)
+
+HBD_SAD_WXH_LARGE_NEON(16, 4)
+HBD_SAD_WXH_LARGE_NEON(16, 64)
+
+HBD_SAD_WXH_LARGE_NEON(32, 8)
+
+HBD_SAD_WXH_LARGE_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
+#define HBD_SAD_SKIP_WXH_SMALL_NEON(w, h) \
+ unsigned int aom_highbd_sad_skip_##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref, \
+ int ref_stride) { \
+ return 2 * highbd_sad##w##xh_small_neon(src, 2 * src_stride, ref, \
+ 2 * ref_stride, (h) / 2); \
+ }
+
+#define HBD_SAD_SKIP_WXH_LARGE_NEON(w, h) \
+ unsigned int aom_highbd_sad_skip_##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref, \
+ int ref_stride) { \
+ return 2 * highbd_sad##w##xh_large_neon(src, 2 * src_stride, ref, \
+ 2 * ref_stride, (h) / 2); \
+ }
+
+HBD_SAD_SKIP_WXH_SMALL_NEON(4, 4)
+HBD_SAD_SKIP_WXH_SMALL_NEON(4, 8)
+
+HBD_SAD_SKIP_WXH_SMALL_NEON(8, 4)
+HBD_SAD_SKIP_WXH_SMALL_NEON(8, 8)
+HBD_SAD_SKIP_WXH_SMALL_NEON(8, 16)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(16, 8)
+HBD_SAD_SKIP_WXH_LARGE_NEON(16, 16)
+HBD_SAD_SKIP_WXH_LARGE_NEON(16, 32)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(32, 16)
+HBD_SAD_SKIP_WXH_LARGE_NEON(32, 32)
+HBD_SAD_SKIP_WXH_LARGE_NEON(32, 64)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(64, 32)
+HBD_SAD_SKIP_WXH_LARGE_NEON(64, 64)
+HBD_SAD_SKIP_WXH_LARGE_NEON(64, 128)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(128, 64)
+HBD_SAD_SKIP_WXH_LARGE_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+HBD_SAD_SKIP_WXH_SMALL_NEON(4, 16)
+
+HBD_SAD_SKIP_WXH_SMALL_NEON(8, 32)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(16, 4)
+HBD_SAD_SKIP_WXH_LARGE_NEON(16, 64)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(32, 8)
+
+HBD_SAD_SKIP_WXH_LARGE_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
diff --git a/aom_dsp/arm/highbd_variance_neon.c b/aom_dsp/arm/highbd_variance_neon.c
index 3c3877a..948f2f7 100644
--- a/aom_dsp/arm/highbd_variance_neon.c
+++ b/aom_dsp/arm/highbd_variance_neon.c
@@ -1,4 +1,5 @@
/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
* Copyright (c) 2022, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
@@ -16,156 +17,515 @@
#include "aom_dsp/variance.h"
#include "aom_dsp/aom_filter.h"
+#include "aom_dsp/arm/mem_neon.h"
#include "aom_dsp/arm/sum_neon.h"
-typedef void (*high_variance_fn_t)(const uint16_t *src, int src_stride,
- const uint16_t *ref, int ref_stride,
- uint32_t *sse, int *sum);
+// Process a block of width 4 two rows at a time.
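+// With 12-bit input each int16 sum lane sees one difference per two-row
+// iteration, so the tallest 4-wide block (4x16) stays in range:
+// 8 * 4095 = 32760 <= INT16_MAX.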
+static INLINE void highbd_variance_4xh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int h,
+ uint64_t *sse, int64_t *sum) {
+ int16x8_t sum_s16 = vdupq_n_s16(0);
+ int32x4_t sse_s32 = vdupq_n_s32(0);
-void aom_highbd_calc16x16var_neon(const uint16_t *src, int src_stride,
- const uint16_t *ref, int ref_stride,
- uint32_t *sse, int *sum) {
- int i, j;
- int16x8_t v_sum = vdupq_n_s16(0);
- int32x4_t v_sse_lo = vdupq_n_s32(0);
- int32x4_t v_sse_hi = vdupq_n_s32(0);
+ int i = h;
+ do {
+ const uint16x8_t s = load_unaligned_u16_4x2(src_ptr, src_stride);
+ const uint16x8_t r = load_unaligned_u16_4x2(ref_ptr, ref_stride);
- for (i = 0; i < 16; ++i) {
- for (j = 0; j < 16; j += 8) {
- const uint16x8_t v_a = vld1q_u16(&src[j]);
- const uint16x8_t v_b = vld1q_u16(&ref[j]);
- const int16x8_t sv_diff = vreinterpretq_s16_u16(vsubq_u16(v_a, v_b));
- v_sum = vaddq_s16(v_sum, sv_diff);
- v_sse_lo =
- vmlal_s16(v_sse_lo, vget_low_s16(sv_diff), vget_low_s16(sv_diff));
- v_sse_hi =
- vmlal_s16(v_sse_hi, vget_high_s16(sv_diff), vget_high_s16(sv_diff));
- }
- src += src_stride;
- ref += ref_stride;
- }
+ int16x8_t diff = vreinterpretq_s16_u16(vsubq_u16(s, r));
+ sum_s16 = vaddq_s16(sum_s16, diff);
- *sum = horizontal_add_s16x8(v_sum);
- *sse = (unsigned int)horizontal_add_s32x4(vaddq_s32(v_sse_lo, v_sse_hi));
+ sse_s32 = vmlal_s16(sse_s32, vget_low_s16(diff), vget_low_s16(diff));
+ sse_s32 = vmlal_s16(sse_s32, vget_high_s16(diff), vget_high_s16(diff));
+
+ src_ptr += 2 * src_stride;
+ ref_ptr += 2 * ref_stride;
+ i -= 2;
+ } while (i != 0);
+
+ *sum = horizontal_add_s16x8(sum_s16);
+ *sse = horizontal_add_s32x4(sse_s32);
}
-void aom_highbd_calc8x8var_neon(const uint16_t *src, int src_stride,
- const uint16_t *ref, int ref_stride,
- uint32_t *sse, int *sum) {
- int i;
- int16x8_t v_sum = vdupq_n_s16(0);
- int32x4_t v_sse_lo = vdupq_n_s32(0);
- int32x4_t v_sse_hi = vdupq_n_s32(0);
+// For 8-bit and 10-bit data, since we're using two int32x4 accumulators, all
+// block sizes can be processed in 32-bit elements: even for a 128x128 block
+// the combined per-lane total is at most 1023*1023*128*32 = 4286582784, which
+// still fits in an unsigned 32-bit element.
+static INLINE void highbd_variance_large_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int w, int h,
+ uint64_t *sse, int64_t *sum) {
+ int32x4_t sum_s32 = vdupq_n_s32(0);
+ int32x4_t sse_s32[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
- for (i = 0; i < 8; ++i) {
- const uint16x8_t v_a = vld1q_u16(&src[0]);
- const uint16x8_t v_b = vld1q_u16(&ref[0]);
- const int16x8_t sv_diff = vreinterpretq_s16_u16(vsubq_u16(v_a, v_b));
- v_sum = vaddq_s16(v_sum, sv_diff);
- v_sse_lo =
- vmlal_s16(v_sse_lo, vget_low_s16(sv_diff), vget_low_s16(sv_diff));
- v_sse_hi =
- vmlal_s16(v_sse_hi, vget_high_s16(sv_diff), vget_high_s16(sv_diff));
- src += src_stride;
- ref += ref_stride;
- }
+ int i = h;
+ do {
+ int j = 0;
+ do {
+ const uint16x8_t s = vld1q_u16(src_ptr + j);
+ const uint16x8_t r = vld1q_u16(ref_ptr + j);
- *sum = horizontal_add_s16x8(v_sum);
- *sse = (unsigned int)horizontal_add_s32x4(vaddq_s32(v_sse_lo, v_sse_hi));
+ const int16x8_t diff = vreinterpretq_s16_u16(vsubq_u16(s, r));
+ sum_s32 = vpadalq_s16(sum_s32, diff);
+
+ sse_s32[0] =
+ vmlal_s16(sse_s32[0], vget_low_s16(diff), vget_low_s16(diff));
+ sse_s32[1] =
+ vmlal_s16(sse_s32[1], vget_high_s16(diff), vget_high_s16(diff));
+
+ j += 8;
+ } while (j < w);
+
+ src_ptr += src_stride;
+ ref_ptr += ref_stride;
+ } while (--i != 0);
+
+ *sum = horizontal_add_s32x4(sum_s32);
+ *sse = horizontal_long_add_u32x4(vaddq_u32(
+ vreinterpretq_u32_s32(sse_s32[0]), vreinterpretq_u32_s32(sse_s32[1])));
}
-void aom_highbd_calc4x4var_neon(const uint16_t *src, int src_stride,
- const uint16_t *ref, int ref_stride,
- uint32_t *sse, int *sum) {
- int i;
- int16x8_t v_sum = vdupq_n_s16(0);
- int32x4_t v_sse_lo = vdupq_n_s32(0);
- int32x4_t v_sse_hi = vdupq_n_s32(0);
-
- for (i = 0; i < 4; i += 2) {
- const uint16x4_t v_a_r0 = vld1_u16(&src[0]);
- const uint16x4_t v_b_r0 = vld1_u16(&ref[0]);
- const uint16x4_t v_a_r1 = vld1_u16(&src[src_stride]);
- const uint16x4_t v_b_r1 = vld1_u16(&ref[ref_stride]);
- const uint16x8_t v_a = vcombine_u16(v_a_r0, v_a_r1);
- const uint16x8_t v_b = vcombine_u16(v_b_r0, v_b_r1);
- const int16x8_t sv_diff = vreinterpretq_s16_u16(vsubq_u16(v_a, v_b));
- v_sum = vaddq_s16(v_sum, sv_diff);
- v_sse_lo =
- vmlal_s16(v_sse_lo, vget_low_s16(sv_diff), vget_low_s16(sv_diff));
- v_sse_hi =
- vmlal_s16(v_sse_hi, vget_high_s16(sv_diff), vget_high_s16(sv_diff));
- src += src_stride << 1;
- ref += ref_stride << 1;
- }
-
- *sum = horizontal_add_s16x8(v_sum);
- *sse = (unsigned int)horizontal_add_s32x4(vaddq_s32(v_sse_lo, v_sse_hi));
+static INLINE void highbd_variance_8xh_neon(const uint16_t *src, int src_stride,
+ const uint16_t *ref, int ref_stride,
+ int h, uint64_t *sse,
+ int64_t *sum) {
+ highbd_variance_large_neon(src, src_stride, ref, ref_stride, 8, h, sse, sum);
}
-static void highbd_10_variance_neon(const uint16_t *src, int src_stride,
- const uint16_t *ref, int ref_stride, int w,
- int h, uint32_t *sse, int *sum,
- high_variance_fn_t var_fn, int block_size) {
- int i, j;
- uint64_t sse_long = 0;
- int32_t sum_long = 0;
-
- for (i = 0; i < h; i += block_size) {
- for (j = 0; j < w; j += block_size) {
- unsigned int sse0;
- int sum0;
- var_fn(src + src_stride * i + j, src_stride, ref + ref_stride * i + j,
- ref_stride, &sse0, &sum0);
- sse_long += sse0;
- sum_long += sum0;
- }
- }
- *sum = ROUND_POWER_OF_TWO(sum_long, 2);
- *sse = (uint32_t)ROUND_POWER_OF_TWO(sse_long, 4);
+static INLINE void highbd_variance_16xh_neon(const uint16_t *src,
+ int src_stride,
+ const uint16_t *ref,
+ int ref_stride, int h,
+ uint64_t *sse, int64_t *sum) {
+ highbd_variance_large_neon(src, src_stride, ref, ref_stride, 16, h, sse, sum);
}
-#define VAR_FN(w, h, block_size, shift) \
- uint32_t aom_highbd_10_variance##w##x##h##_neon( \
- const uint8_t *src8, int src_stride, const uint8_t *ref8, \
- int ref_stride, uint32_t *sse) { \
- int sum; \
- int64_t var; \
- uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
- uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
- highbd_10_variance_neon( \
- src, src_stride, ref, ref_stride, w, h, sse, &sum, \
- aom_highbd_calc##block_size##x##block_size##var_neon, block_size); \
- var = (int64_t)(*sse) - (((int64_t)sum * sum) >> shift); \
- return (var >= 0) ? (uint32_t)var : 0; \
+static INLINE void highbd_variance_32xh_neon(const uint16_t *src,
+ int src_stride,
+ const uint16_t *ref,
+ int ref_stride, int h,
+ uint64_t *sse, int64_t *sum) {
+ highbd_variance_large_neon(src, src_stride, ref, ref_stride, 32, h, sse, sum);
+}
+
+static INLINE void highbd_variance_64xh_neon(const uint16_t *src,
+ int src_stride,
+ const uint16_t *ref,
+ int ref_stride, int h,
+ uint64_t *sse, int64_t *sum) {
+ highbd_variance_large_neon(src, src_stride, ref, ref_stride, 64, h, sse, sum);
+}
+
+static INLINE void highbd_variance_128xh_neon(const uint16_t *src,
+ int src_stride,
+ const uint16_t *ref,
+ int ref_stride, int h,
+ uint64_t *sse, int64_t *sum) {
+ highbd_variance_large_neon(src, src_stride, ref, ref_stride, 128, h, sse,
+ sum);
+}
+
+// For 12-bit data, we can only accumulate up to 128 elements in the sum of
+// squares (4095*4095*128 = 2146435200), and because we're using two int32x4
+// accumulators, we can only process up to 32 32-element rows (32*32/8 = 128)
+// or 16 64-element rows before we have to accumulate into 64-bit elements.
+// Therefore blocks of size 32x64, 64x32, 64x64, 64x128, 128x64, 128x128 are
+// processed in a different helper function.
+
+// Process a block of any size where the width is divisible by 8, with
+// accumulation into 64-bit elements.
+static INLINE void highbd_variance_xlarge_neon(
+ const uint16_t *src_ptr, int src_stride, const uint16_t *ref_ptr,
+ int ref_stride, int w, int h, int h_limit, uint64_t *sse, int64_t *sum) {
+ int32x4_t sum_s32 = vdupq_n_s32(0);
+ int64x2_t sse_s64 = vdupq_n_s64(0);
+
+ // 'h_limit' is the number of 'w'-width rows we can process before our 32-bit
+ // accumulator overflows. After hitting this limit we accumulate into 64-bit
+ // elements.
+ int h_tmp = h > h_limit ? h_limit : h;
+
+ int i = 0;
+ do {
+ int32x4_t sse_s32[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
+ do {
+ int j = 0;
+ do {
+ const uint16x8_t s0 = vld1q_u16(src_ptr + j);
+ const uint16x8_t r0 = vld1q_u16(ref_ptr + j);
+
+ const int16x8_t diff = vreinterpretq_s16_u16(vsubq_u16(s0, r0));
+ sum_s32 = vpadalq_s16(sum_s32, diff);
+
+ sse_s32[0] =
+ vmlal_s16(sse_s32[0], vget_low_s16(diff), vget_low_s16(diff));
+ sse_s32[1] =
+ vmlal_s16(sse_s32[1], vget_high_s16(diff), vget_high_s16(diff));
+
+ j += 8;
+ } while (j < w);
+
+ src_ptr += src_stride;
+ ref_ptr += ref_stride;
+ i++;
+ } while (i < h_tmp);
+
+ sse_s64 = vpadalq_s32(sse_s64, sse_s32[0]);
+ sse_s64 = vpadalq_s32(sse_s64, sse_s32[1]);
+ h_tmp += h_limit;
+ } while (i < h);
+
+ *sum = horizontal_add_s32x4(sum_s32);
+ *sse = (uint64_t)horizontal_add_s64x2(sse_s64);
+}
+
+static INLINE void highbd_variance_32xh_xlarge_neon(
+ const uint16_t *src, int src_stride, const uint16_t *ref, int ref_stride,
+ int h, uint64_t *sse, int64_t *sum) {
+ highbd_variance_xlarge_neon(src, src_stride, ref, ref_stride, 32, h, 32, sse,
+ sum);
+}
+
+static INLINE void highbd_variance_64xh_xlarge_neon(
+ const uint16_t *src, int src_stride, const uint16_t *ref, int ref_stride,
+ int h, uint64_t *sse, int64_t *sum) {
+ highbd_variance_xlarge_neon(src, src_stride, ref, ref_stride, 64, h, 16, sse,
+ sum);
+}
+
+static INLINE void highbd_variance_128xh_xlarge_neon(
+ const uint16_t *src, int src_stride, const uint16_t *ref, int ref_stride,
+ int h, uint64_t *sse, int64_t *sum) {
+ highbd_variance_xlarge_neon(src, src_stride, ref, ref_stride, 128, h, 8, sse,
+ sum);
+}
+
+#define HBD_VARIANCE_WXH_8_NEON(w, h) \
+ uint32_t aom_highbd_8_variance##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ int sum; \
+ uint64_t sse_long = 0; \
+ int64_t sum_long = 0; \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_variance_##w##xh_neon(src, src_stride, ref, ref_stride, h, \
+ &sse_long, &sum_long); \
+ *sse = (uint32_t)sse_long; \
+ sum = (int)sum_long; \
+ return *sse - (uint32_t)(((int64_t)sum * sum) / (w * h)); \
}
-VAR_FN(128, 128, 16, 14)
-VAR_FN(128, 64, 16, 13)
-VAR_FN(64, 128, 16, 13)
-VAR_FN(64, 64, 16, 12)
-VAR_FN(64, 32, 16, 11)
-VAR_FN(32, 64, 16, 11)
-VAR_FN(32, 32, 16, 10)
-VAR_FN(32, 16, 16, 9)
-VAR_FN(16, 32, 16, 9)
-VAR_FN(16, 16, 16, 8)
-VAR_FN(16, 8, 8, 7)
-VAR_FN(8, 16, 8, 7)
-VAR_FN(8, 8, 8, 6)
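+// For 10-bit input the statistics are scaled back to an 8-bit range before
+// computing the variance: the sum is rounded down by 2 bits and the sse by
+// 4 bits (the 12-bit variants below use 4 and 8 bits).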
+#define HBD_VARIANCE_WXH_10_NEON(w, h) \
+ uint32_t aom_highbd_10_variance##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ int sum; \
+ int64_t var; \
+ uint64_t sse_long = 0; \
+ int64_t sum_long = 0; \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_variance_##w##xh_neon(src, src_stride, ref, ref_stride, h, \
+ &sse_long, &sum_long); \
+ *sse = (uint32_t)ROUND_POWER_OF_TWO(sse_long, 4); \
+ sum = (int)ROUND_POWER_OF_TWO(sum_long, 2); \
+ var = (int64_t)(*sse) - (((int64_t)sum * sum) / (w * h)); \
+ return (var >= 0) ? (uint32_t)var : 0; \
+ }
-VAR_FN(16, 4, 4, 6)
-VAR_FN(4, 16, 4, 6)
+#define HBD_VARIANCE_WXH_12_NEON(w, h) \
+ uint32_t aom_highbd_12_variance##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ int sum; \
+ int64_t var; \
+ uint64_t sse_long = 0; \
+ int64_t sum_long = 0; \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_variance_##w##xh_neon(src, src_stride, ref, ref_stride, h, \
+ &sse_long, &sum_long); \
+ *sse = (uint32_t)ROUND_POWER_OF_TWO(sse_long, 8); \
+ sum = (int)ROUND_POWER_OF_TWO(sum_long, 4); \
+ var = (int64_t)(*sse) - (((int64_t)sum * sum) / (w * h)); \
+ return (var >= 0) ? (uint32_t)var : 0; \
+ }
-VAR_FN(8, 4, 4, 5)
-VAR_FN(4, 8, 4, 5)
-VAR_FN(4, 4, 4, 4)
+#define HBD_VARIANCE_WXH_12_XLARGE_NEON(w, h) \
+ uint32_t aom_highbd_12_variance##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ int sum; \
+ int64_t var; \
+ uint64_t sse_long = 0; \
+ int64_t sum_long = 0; \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_variance_##w##xh_xlarge_neon(src, src_stride, ref, ref_stride, h, \
+ &sse_long, &sum_long); \
+ *sse = (uint32_t)ROUND_POWER_OF_TWO(sse_long, 8); \
+ sum = (int)ROUND_POWER_OF_TWO(sum_long, 4); \
+ var = (int64_t)(*sse) - (((int64_t)sum * sum) / (w * h)); \
+ return (var >= 0) ? (uint32_t)var : 0; \
+ }
+
+// 8-bit
+HBD_VARIANCE_WXH_8_NEON(4, 4)
+HBD_VARIANCE_WXH_8_NEON(4, 8)
+
+HBD_VARIANCE_WXH_8_NEON(8, 4)
+HBD_VARIANCE_WXH_8_NEON(8, 8)
+HBD_VARIANCE_WXH_8_NEON(8, 16)
+
+HBD_VARIANCE_WXH_8_NEON(16, 8)
+HBD_VARIANCE_WXH_8_NEON(16, 16)
+HBD_VARIANCE_WXH_8_NEON(16, 32)
+
+HBD_VARIANCE_WXH_8_NEON(32, 16)
+HBD_VARIANCE_WXH_8_NEON(32, 32)
+HBD_VARIANCE_WXH_8_NEON(32, 64)
+
+HBD_VARIANCE_WXH_8_NEON(64, 32)
+HBD_VARIANCE_WXH_8_NEON(64, 64)
+HBD_VARIANCE_WXH_8_NEON(64, 128)
+
+HBD_VARIANCE_WXH_8_NEON(128, 64)
+HBD_VARIANCE_WXH_8_NEON(128, 128)
+
+// 10-bit
+HBD_VARIANCE_WXH_10_NEON(4, 4)
+HBD_VARIANCE_WXH_10_NEON(4, 8)
+
+HBD_VARIANCE_WXH_10_NEON(8, 4)
+HBD_VARIANCE_WXH_10_NEON(8, 8)
+HBD_VARIANCE_WXH_10_NEON(8, 16)
+
+HBD_VARIANCE_WXH_10_NEON(16, 8)
+HBD_VARIANCE_WXH_10_NEON(16, 16)
+HBD_VARIANCE_WXH_10_NEON(16, 32)
+
+HBD_VARIANCE_WXH_10_NEON(32, 16)
+HBD_VARIANCE_WXH_10_NEON(32, 32)
+HBD_VARIANCE_WXH_10_NEON(32, 64)
+
+HBD_VARIANCE_WXH_10_NEON(64, 32)
+HBD_VARIANCE_WXH_10_NEON(64, 64)
+HBD_VARIANCE_WXH_10_NEON(64, 128)
+
+HBD_VARIANCE_WXH_10_NEON(128, 64)
+HBD_VARIANCE_WXH_10_NEON(128, 128)
+
+// 12-bit
+HBD_VARIANCE_WXH_12_NEON(4, 4)
+HBD_VARIANCE_WXH_12_NEON(4, 8)
+
+HBD_VARIANCE_WXH_12_NEON(8, 4)
+HBD_VARIANCE_WXH_12_NEON(8, 8)
+HBD_VARIANCE_WXH_12_NEON(8, 16)
+
+HBD_VARIANCE_WXH_12_NEON(16, 8)
+HBD_VARIANCE_WXH_12_NEON(16, 16)
+HBD_VARIANCE_WXH_12_NEON(16, 32)
+
+HBD_VARIANCE_WXH_12_NEON(32, 16)
+HBD_VARIANCE_WXH_12_NEON(32, 32)
+HBD_VARIANCE_WXH_12_XLARGE_NEON(32, 64)
+
+HBD_VARIANCE_WXH_12_XLARGE_NEON(64, 32)
+HBD_VARIANCE_WXH_12_XLARGE_NEON(64, 64)
+HBD_VARIANCE_WXH_12_XLARGE_NEON(64, 128)
+
+HBD_VARIANCE_WXH_12_XLARGE_NEON(128, 64)
+HBD_VARIANCE_WXH_12_XLARGE_NEON(128, 128)
#if !CONFIG_REALTIME_ONLY
-VAR_FN(64, 16, 16, 10)
-VAR_FN(16, 64, 16, 10)
-VAR_FN(8, 32, 8, 8)
-VAR_FN(32, 8, 8, 8)
+// 8-bit
+HBD_VARIANCE_WXH_8_NEON(4, 16)
+
+HBD_VARIANCE_WXH_8_NEON(8, 32)
+
+HBD_VARIANCE_WXH_8_NEON(16, 4)
+HBD_VARIANCE_WXH_8_NEON(16, 64)
+
+HBD_VARIANCE_WXH_8_NEON(32, 8)
+
+HBD_VARIANCE_WXH_8_NEON(64, 16)
+
+// 10-bit
+HBD_VARIANCE_WXH_10_NEON(4, 16)
+
+HBD_VARIANCE_WXH_10_NEON(8, 32)
+
+HBD_VARIANCE_WXH_10_NEON(16, 4)
+HBD_VARIANCE_WXH_10_NEON(16, 64)
+
+HBD_VARIANCE_WXH_10_NEON(32, 8)
+
+HBD_VARIANCE_WXH_10_NEON(64, 16)
+
+// 12-bit
+HBD_VARIANCE_WXH_12_NEON(4, 16)
+
+HBD_VARIANCE_WXH_12_NEON(8, 32)
+
+HBD_VARIANCE_WXH_12_NEON(16, 4)
+HBD_VARIANCE_WXH_12_NEON(16, 64)
+
+HBD_VARIANCE_WXH_12_NEON(32, 8)
+
+HBD_VARIANCE_WXH_12_NEON(64, 16)
+
#endif // !CONFIG_REALTIME_ONLY
-#undef VAR_FN
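+// Squared 12-bit differences are at most 4095 * 4095 < 2^24, and the MSE
+// block sizes here are at most 16x16 (32 products per 32-bit lane), so
+// unsigned 32-bit accumulators cannot overflow.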
+static INLINE uint32_t highbd_mse_wxh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int w, int h,
+ unsigned int *sse) {
+ uint32x4_t sse_u32[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ int i = h;
+ do {
+ int j = 0;
+ do {
+ uint16x8_t s = vld1q_u16(src_ptr + j);
+ uint16x8_t r = vld1q_u16(ref_ptr + j);
+
+ uint16x8_t diff = vabdq_u16(s, r);
+
+ sse_u32[0] =
+ vmlal_u16(sse_u32[0], vget_low_u16(diff), vget_low_u16(diff));
+ sse_u32[1] =
+ vmlal_u16(sse_u32[1], vget_high_u16(diff), vget_high_u16(diff));
+
+ j += 8;
+ } while (j < w);
+
+ src_ptr += src_stride;
+ ref_ptr += ref_stride;
+ } while (--i != 0);
+
+ *sse = horizontal_add_u32x4(vaddq_u32(sse_u32[0], sse_u32[1]));
+ return *sse;
+}
+
+#if defined(__ARM_FEATURE_DOTPROD)
+
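+// For aom_highbd_8_mse* the input samples are 8-bit values held in 16-bit
+// containers, so they can be narrowed to u8 and a single dot-product
+// instruction can accumulate 16 squared differences per call.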
+static INLINE uint32_t highbd_mse8_8xh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int h,
+ unsigned int *sse) {
+ uint32x4_t sse_u32 = vdupq_n_u32(0);
+
+ int i = h / 2;
+ do {
+ uint16x8_t s0 = vld1q_u16(src_ptr);
+ src_ptr += src_stride;
+ uint16x8_t s1 = vld1q_u16(src_ptr);
+ src_ptr += src_stride;
+ uint16x8_t r0 = vld1q_u16(ref_ptr);
+ ref_ptr += ref_stride;
+ uint16x8_t r1 = vld1q_u16(ref_ptr);
+ ref_ptr += ref_stride;
+
+ uint8x16_t s = vcombine_u8(vmovn_u16(s0), vmovn_u16(s1));
+ uint8x16_t r = vcombine_u8(vmovn_u16(r0), vmovn_u16(r1));
+
+ uint8x16_t diff = vabdq_u8(s, r);
+ sse_u32 = vdotq_u32(sse_u32, diff, diff);
+ } while (--i != 0);
+
+ *sse = horizontal_add_u32x4(sse_u32);
+ return *sse;
+}
+
+static INLINE uint32_t highbd_mse8_16xh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int h,
+ unsigned int *sse) {
+ uint32x4_t sse_u32 = vdupq_n_u32(0);
+
+ int i = h;
+ do {
+ uint16x8_t s0 = vld1q_u16(src_ptr);
+ uint16x8_t s1 = vld1q_u16(src_ptr + 8);
+ uint16x8_t r0 = vld1q_u16(ref_ptr);
+ uint16x8_t r1 = vld1q_u16(ref_ptr + 8);
+
+ uint8x16_t s = vcombine_u8(vmovn_u16(s0), vmovn_u16(s1));
+ uint8x16_t r = vcombine_u8(vmovn_u16(r0), vmovn_u16(r1));
+
+ uint8x16_t diff = vabdq_u8(s, r);
+ sse_u32 = vdotq_u32(sse_u32, diff, diff);
+
+ src_ptr += src_stride;
+ ref_ptr += ref_stride;
+ } while (--i != 0);
+
+ *sse = horizontal_add_u32x4(sse_u32);
+ return *sse;
+}
+
+#else // !defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE uint32_t highbd_mse8_8xh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int h,
+ unsigned int *sse) {
+ return highbd_mse_wxh_neon(src_ptr, src_stride, ref_ptr, ref_stride, 8, h,
+ sse);
+}
+
+static INLINE uint32_t highbd_mse8_16xh_neon(const uint16_t *src_ptr,
+ int src_stride,
+ const uint16_t *ref_ptr,
+ int ref_stride, int h,
+ unsigned int *sse) {
+ return highbd_mse_wxh_neon(src_ptr, src_stride, ref_ptr, ref_stride, 16, h,
+ sse);
+}
+
+#endif // defined(__ARM_FEATURE_DOTPROD)
+
+#define HIGHBD_MSE_WXH_NEON(w, h) \
+ uint32_t aom_highbd_8_mse##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_mse8_##w##xh_neon(src, src_stride, ref, ref_stride, h, sse); \
+ return *sse; \
+ } \
+ \
+ uint32_t aom_highbd_10_mse##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_mse_wxh_neon(src, src_stride, ref, ref_stride, w, h, sse); \
+ *sse = ROUND_POWER_OF_TWO(*sse, 4); \
+ return *sse; \
+ } \
+ \
+ uint32_t aom_highbd_12_mse##w##x##h##_neon( \
+ const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, \
+ int ref_stride, uint32_t *sse) { \
+ uint16_t *src = CONVERT_TO_SHORTPTR(src_ptr); \
+ uint16_t *ref = CONVERT_TO_SHORTPTR(ref_ptr); \
+ highbd_mse_wxh_neon(src, src_stride, ref, ref_stride, w, h, sse); \
+ *sse = ROUND_POWER_OF_TWO(*sse, 8); \
+ return *sse; \
+ }
+
+HIGHBD_MSE_WXH_NEON(16, 16)
+HIGHBD_MSE_WXH_NEON(16, 8)
+HIGHBD_MSE_WXH_NEON(8, 16)
+HIGHBD_MSE_WXH_NEON(8, 8)
+
+#undef HIGHBD_MSE_WXH_NEON
diff --git a/aom_dsp/arm/intrapred_neon.c b/aom_dsp/arm/intrapred_neon.c
index 8e6dc12..2161378 100644
--- a/aom_dsp/arm/intrapred_neon.c
+++ b/aom_dsp/arm/intrapred_neon.c
@@ -17,518 +17,1029 @@
#include "aom/aom_integer.h"
#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
#include "aom_dsp/intrapred_common.h"
//------------------------------------------------------------------------------
// DC 4x4
-// 'do_above' and 'do_left' facilitate branch removal when inlined.
-static INLINE void dc_4x4(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
- const uint8_t *left, int do_above, int do_left) {
- uint16x8_t sum_top;
- uint16x8_t sum_left;
- uint8x8_t dc0;
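+// Sum the four edge pixels with a pairwise-add cascade; the total ends up in
+// lane 0 of the returned vector.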
+static INLINE uint16x8_t dc_load_sum_4(const uint8_t *in) {
+ const uint8x8_t a = load_u8_4x1_lane0(in);
+ const uint16x4_t p0 = vpaddl_u8(a);
+ const uint16x4_t p1 = vpadd_u16(p0, p0);
+ return vcombine_u16(p1, vdup_n_u16(0));
+}
- if (do_above) {
- const uint8x8_t A = vld1_u8(above); // top row
- const uint16x4_t p0 = vpaddl_u8(A); // cascading summation of the top
- const uint16x4_t p1 = vpadd_u16(p0, p0);
- sum_top = vcombine_u16(p1, p1);
- }
-
- if (do_left) {
- const uint8x8_t L = vld1_u8(left); // left border
- const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
- const uint16x4_t p1 = vpadd_u16(p0, p0);
- sum_left = vcombine_u16(p1, p1);
- }
-
- if (do_above && do_left) {
- const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
- dc0 = vrshrn_n_u16(sum, 3);
- } else if (do_above) {
- dc0 = vrshrn_n_u16(sum_top, 2);
- } else if (do_left) {
- dc0 = vrshrn_n_u16(sum_left, 2);
- } else {
- dc0 = vdup_n_u8(0x80);
- }
-
- {
- const uint8x8_t dc = vdup_lane_u8(dc0, 0);
- int i;
- for (i = 0; i < 4; ++i) {
- vst1_lane_u32((uint32_t *)(dst + i * stride), vreinterpret_u32_u8(dc), 0);
- }
+static INLINE void dc_store_4xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ store_u8_4x1(dst + i * stride, dc, 0);
}
}
void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- dc_4x4(dst, stride, above, left, 1, 1);
+ const uint16x8_t sum_top = dc_load_sum_4(above);
+ const uint16x8_t sum_left = dc_load_sum_4(left);
+ const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, 3);
+ dc_store_4xh(dst, stride, 4, vdup_lane_u8(dc0, 0));
}
void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint16x8_t sum_left = dc_load_sum_4(left);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_left, 2);
(void)above;
- dc_4x4(dst, stride, NULL, left, 0, 1);
+ dc_store_4xh(dst, stride, 4, vdup_lane_u8(dc0, 0));
}
void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_sum_4(above);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_top, 2);
(void)left;
- dc_4x4(dst, stride, above, NULL, 1, 0);
+ dc_store_4xh(dst, stride, 4, vdup_lane_u8(dc0, 0));
}
void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t dc0 = vdup_n_u8(0x80);
(void)above;
(void)left;
- dc_4x4(dst, stride, NULL, NULL, 0, 0);
+ dc_store_4xh(dst, stride, 4, dc0);
}
//------------------------------------------------------------------------------
// DC 8x8
-// 'do_above' and 'do_left' facilitate branch removal when inlined.
-static INLINE void dc_8x8(uint8_t *dst, ptrdiff_t stride, const uint8_t *above,
- const uint8_t *left, int do_above, int do_left) {
- uint16x8_t sum_top;
- uint16x8_t sum_left;
- uint8x8_t dc0;
+static INLINE uint16x8_t dc_load_sum_8(const uint8_t *in) {
+ // This isn't used in the case where we want to load both above and left
+ // vectors, since we want to avoid performing the reduction twice.
+ const uint8x8_t a = vld1_u8(in);
+ const uint16x4_t p0 = vpaddl_u8(a);
+ const uint16x4_t p1 = vpadd_u16(p0, p0);
+ const uint16x4_t p2 = vpadd_u16(p1, p1);
+ return vcombine_u16(p2, vdup_n_u16(0));
+}
- if (do_above) {
- const uint8x8_t A = vld1_u8(above); // top row
- const uint16x4_t p0 = vpaddl_u8(A); // cascading summation of the top
- const uint16x4_t p1 = vpadd_u16(p0, p0);
- const uint16x4_t p2 = vpadd_u16(p1, p1);
- sum_top = vcombine_u16(p2, p2);
- }
+static INLINE uint16x8_t horizontal_add_and_broadcast_u16x8(uint16x8_t a) {
+#if AOM_ARCH_AARCH64
+ // On AArch64 we could also use vdupq_n_u16(vaddvq_u16(a)) here to save an
+ // instruction, however the addv instruction is usually slightly more
+ // expensive than a pairwise addition, so the need for immediately
+ // broadcasting the result again seems to negate any benefit.
+ const uint16x8_t b = vpaddq_u16(a, a);
+ const uint16x8_t c = vpaddq_u16(b, b);
+ return vpaddq_u16(c, c);
+#else
+ const uint16x4_t b = vadd_u16(vget_low_u16(a), vget_high_u16(a));
+ const uint16x4_t c = vpadd_u16(b, b);
+ const uint16x4_t d = vpadd_u16(c, c);
+ return vcombine_u16(d, d);
+#endif
+}
- if (do_left) {
- const uint8x8_t L = vld1_u8(left); // left border
- const uint16x4_t p0 = vpaddl_u8(L); // cascading summation of the left
- const uint16x4_t p1 = vpadd_u16(p0, p0);
- const uint16x4_t p2 = vpadd_u16(p1, p1);
- sum_left = vcombine_u16(p2, p2);
- }
-
- if (do_above && do_left) {
- const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
- dc0 = vrshrn_n_u16(sum, 4);
- } else if (do_above) {
- dc0 = vrshrn_n_u16(sum_top, 3);
- } else if (do_left) {
- dc0 = vrshrn_n_u16(sum_left, 3);
- } else {
- dc0 = vdup_n_u8(0x80);
- }
-
- {
- const uint8x8_t dc = vdup_lane_u8(dc0, 0);
- int i;
- for (i = 0; i < 8; ++i) {
- vst1_u32((uint32_t *)(dst + i * stride), vreinterpret_u32_u8(dc));
- }
+static INLINE void dc_store_8xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x8_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1_u8(dst + i * stride, dc);
}
}
void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- dc_8x8(dst, stride, above, left, 1, 1);
+ const uint8x8_t sum_top = vld1_u8(above);
+ const uint8x8_t sum_left = vld1_u8(left);
+ uint16x8_t sum = vaddl_u8(sum_left, sum_top);
+ sum = horizontal_add_and_broadcast_u16x8(sum);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, 4);
+ dc_store_8xh(dst, stride, 8, vdup_lane_u8(dc0, 0));
}
void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint16x8_t sum_left = dc_load_sum_8(left);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_left, 3);
(void)above;
- dc_8x8(dst, stride, NULL, left, 0, 1);
+ dc_store_8xh(dst, stride, 8, vdup_lane_u8(dc0, 0));
}
void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_sum_8(above);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_top, 3);
(void)left;
- dc_8x8(dst, stride, above, NULL, 1, 0);
+ dc_store_8xh(dst, stride, 8, vdup_lane_u8(dc0, 0));
}
void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t dc0 = vdup_n_u8(0x80);
(void)above;
(void)left;
- dc_8x8(dst, stride, NULL, NULL, 0, 0);
+ dc_store_8xh(dst, stride, 8, dc0);
}
//------------------------------------------------------------------------------
// DC 16x16
-// 'do_above' and 'do_left' facilitate branch removal when inlined.
-static INLINE void dc_16x16(uint8_t *dst, ptrdiff_t stride,
- const uint8_t *above, const uint8_t *left,
- int do_above, int do_left) {
- uint16x8_t sum_top;
- uint16x8_t sum_left;
- uint8x8_t dc0;
+static INLINE uint16x8_t dc_load_partial_sum_16(const uint8_t *in) {
+ const uint8x16_t a = vld1q_u8(in);
+  // Delay the remainder of the reduction until
+  // horizontal_add_and_broadcast_u16x8, since we want to do it once rather
+  // than twice in the case we are loading both above and left.
+ return vpaddlq_u8(a);
+}
- if (do_above) {
- const uint8x16_t A = vld1q_u8(above); // top row
- const uint16x8_t p0 = vpaddlq_u8(A); // cascading summation of the top
- const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0));
- const uint16x4_t p2 = vpadd_u16(p1, p1);
- const uint16x4_t p3 = vpadd_u16(p2, p2);
- sum_top = vcombine_u16(p3, p3);
- }
+static INLINE uint16x8_t dc_load_sum_16(const uint8_t *in) {
+ return horizontal_add_and_broadcast_u16x8(dc_load_partial_sum_16(in));
+}
- if (do_left) {
- const uint8x16_t L = vld1q_u8(left); // left row
- const uint16x8_t p0 = vpaddlq_u8(L); // cascading summation of the left
- const uint16x4_t p1 = vadd_u16(vget_low_u16(p0), vget_high_u16(p0));
- const uint16x4_t p2 = vpadd_u16(p1, p1);
- const uint16x4_t p3 = vpadd_u16(p2, p2);
- sum_left = vcombine_u16(p3, p3);
- }
-
- if (do_above && do_left) {
- const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
- dc0 = vrshrn_n_u16(sum, 5);
- } else if (do_above) {
- dc0 = vrshrn_n_u16(sum_top, 4);
- } else if (do_left) {
- dc0 = vrshrn_n_u16(sum_left, 4);
- } else {
- dc0 = vdup_n_u8(0x80);
- }
-
- {
- const uint8x16_t dc = vdupq_lane_u8(dc0, 0);
- int i;
- for (i = 0; i < 16; ++i) {
- vst1q_u8(dst + i * stride, dc);
- }
+static INLINE void dc_store_16xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + i * stride, dc);
}
}
void aom_dc_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- dc_16x16(dst, stride, above, left, 1, 1);
+ const uint16x8_t sum_top = dc_load_partial_sum_16(above);
+ const uint16x8_t sum_left = dc_load_partial_sum_16(left);
+ uint16x8_t sum = vaddq_u16(sum_left, sum_top);
+ sum = horizontal_add_and_broadcast_u16x8(sum);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, 5);
+ dc_store_16xh(dst, stride, 16, vdupq_lane_u8(dc0, 0));
}
void aom_dc_left_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint16x8_t sum_left = dc_load_sum_16(left);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_left, 4);
(void)above;
- dc_16x16(dst, stride, NULL, left, 0, 1);
+ dc_store_16xh(dst, stride, 16, vdupq_lane_u8(dc0, 0));
}
void aom_dc_top_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_sum_16(above);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_top, 4);
(void)left;
- dc_16x16(dst, stride, above, NULL, 1, 0);
+ dc_store_16xh(dst, stride, 16, vdupq_lane_u8(dc0, 0));
}
void aom_dc_128_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint8x16_t dc0 = vdupq_n_u8(0x80);
(void)above;
(void)left;
- dc_16x16(dst, stride, NULL, NULL, 0, 0);
+ dc_store_16xh(dst, stride, 16, dc0);
}
//------------------------------------------------------------------------------
// DC 32x32
-// 'do_above' and 'do_left' facilitate branch removal when inlined.
-static INLINE void dc_32x32(uint8_t *dst, ptrdiff_t stride,
- const uint8_t *above, const uint8_t *left,
- int do_above, int do_left) {
- uint16x8_t sum_top;
- uint16x8_t sum_left;
- uint8x8_t dc0;
+static INLINE uint16x8_t dc_load_partial_sum_32(const uint8_t *in) {
+ const uint8x16_t a0 = vld1q_u8(in);
+ const uint8x16_t a1 = vld1q_u8(in + 16);
+  // Delay the remainder of the reduction until
+  // horizontal_add_and_broadcast_u16x8, since we want to do it once rather
+  // than twice in the case we are loading both above and left.
+ return vpadalq_u8(vpaddlq_u8(a0), a1);
+}
- if (do_above) {
- const uint8x16_t A0 = vld1q_u8(above); // top row
- const uint8x16_t A1 = vld1q_u8(above + 16);
- const uint16x8_t p0 = vpaddlq_u8(A0); // cascading summation of the top
- const uint16x8_t p1 = vpaddlq_u8(A1);
- const uint16x8_t p2 = vaddq_u16(p0, p1);
- const uint16x4_t p3 = vadd_u16(vget_low_u16(p2), vget_high_u16(p2));
- const uint16x4_t p4 = vpadd_u16(p3, p3);
- const uint16x4_t p5 = vpadd_u16(p4, p4);
- sum_top = vcombine_u16(p5, p5);
- }
+static INLINE uint16x8_t dc_load_sum_32(const uint8_t *in) {
+ return horizontal_add_and_broadcast_u16x8(dc_load_partial_sum_32(in));
+}
- if (do_left) {
- const uint8x16_t L0 = vld1q_u8(left); // left row
- const uint8x16_t L1 = vld1q_u8(left + 16);
- const uint16x8_t p0 = vpaddlq_u8(L0); // cascading summation of the left
- const uint16x8_t p1 = vpaddlq_u8(L1);
- const uint16x8_t p2 = vaddq_u16(p0, p1);
- const uint16x4_t p3 = vadd_u16(vget_low_u16(p2), vget_high_u16(p2));
- const uint16x4_t p4 = vpadd_u16(p3, p3);
- const uint16x4_t p5 = vpadd_u16(p4, p4);
- sum_left = vcombine_u16(p5, p5);
- }
-
- if (do_above && do_left) {
- const uint16x8_t sum = vaddq_u16(sum_left, sum_top);
- dc0 = vrshrn_n_u16(sum, 6);
- } else if (do_above) {
- dc0 = vrshrn_n_u16(sum_top, 5);
- } else if (do_left) {
- dc0 = vrshrn_n_u16(sum_left, 5);
- } else {
- dc0 = vdup_n_u8(0x80);
- }
-
- {
- const uint8x16_t dc = vdupq_lane_u8(dc0, 0);
- int i;
- for (i = 0; i < 32; ++i) {
- vst1q_u8(dst + i * stride, dc);
- vst1q_u8(dst + i * stride + 16, dc);
- }
+static INLINE void dc_store_32xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + i * stride, dc);
+ vst1q_u8(dst + i * stride + 16, dc);
}
}
void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- dc_32x32(dst, stride, above, left, 1, 1);
+ const uint16x8_t sum_top = dc_load_partial_sum_32(above);
+ const uint16x8_t sum_left = dc_load_partial_sum_32(left);
+ uint16x8_t sum = vaddq_u16(sum_left, sum_top);
+ sum = horizontal_add_and_broadcast_u16x8(sum);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, 6);
+ dc_store_32xh(dst, stride, 32, vdupq_lane_u8(dc0, 0));
}
void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint16x8_t sum_left = dc_load_sum_32(left);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_left, 5);
(void)above;
- dc_32x32(dst, stride, NULL, left, 0, 1);
+ dc_store_32xh(dst, stride, 32, vdupq_lane_u8(dc0, 0));
}
void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_sum_32(above);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_top, 5);
(void)left;
- dc_32x32(dst, stride, above, NULL, 1, 0);
+ dc_store_32xh(dst, stride, 32, vdupq_lane_u8(dc0, 0));
}
void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above,
const uint8_t *left) {
+ const uint8x16_t dc0 = vdupq_n_u8(0x80);
(void)above;
(void)left;
- dc_32x32(dst, stride, NULL, NULL, 0, 0);
+ dc_store_32xh(dst, stride, 32, dc0);
}
+//------------------------------------------------------------------------------
+// DC 64x64
+
+static INLINE uint16x8_t dc_load_partial_sum_64(const uint8_t *in) {
+ const uint8x16_t a0 = vld1q_u8(in);
+ const uint8x16_t a1 = vld1q_u8(in + 16);
+ const uint8x16_t a2 = vld1q_u8(in + 32);
+ const uint8x16_t a3 = vld1q_u8(in + 48);
+ const uint16x8_t p01 = vpadalq_u8(vpaddlq_u8(a0), a1);
+ const uint16x8_t p23 = vpadalq_u8(vpaddlq_u8(a2), a3);
+  // Delay the remainder of the reduction until
+  // horizontal_add_and_broadcast_u16x8, since we want to do it once rather
+  // than twice in the case we are loading both above and left.
+ return vaddq_u16(p01, p23);
+}
+
+static INLINE uint16x8_t dc_load_sum_64(const uint8_t *in) {
+ return horizontal_add_and_broadcast_u16x8(dc_load_partial_sum_64(in));
+}
+
+static INLINE void dc_store_64xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t dc) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + i * stride, dc);
+ vst1q_u8(dst + i * stride + 16, dc);
+ vst1q_u8(dst + i * stride + 32, dc);
+ vst1q_u8(dst + i * stride + 48, dc);
+ }
+}
+
+void aom_dc_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_partial_sum_64(above);
+ const uint16x8_t sum_left = dc_load_partial_sum_64(left);
+ uint16x8_t sum = vaddq_u16(sum_left, sum_top);
+ sum = horizontal_add_and_broadcast_u16x8(sum);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, 7);
+ dc_store_64xh(dst, stride, 64, vdupq_lane_u8(dc0, 0));
+}
+
+void aom_dc_left_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above,
+ const uint8_t *left) {
+ const uint16x8_t sum_left = dc_load_sum_64(left);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_left, 6);
+ (void)above;
+ dc_store_64xh(dst, stride, 64, vdupq_lane_u8(dc0, 0));
+}
+
+void aom_dc_top_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above,
+ const uint8_t *left) {
+ const uint16x8_t sum_top = dc_load_sum_64(above);
+ const uint8x8_t dc0 = vrshrn_n_u16(sum_top, 6);
+ (void)left;
+ dc_store_64xh(dst, stride, 64, vdupq_lane_u8(dc0, 0));
+}
+
+void aom_dc_128_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above,
+ const uint8_t *left) {
+ const uint8x16_t dc0 = vdupq_n_u8(0x80);
+ (void)above;
+ (void)left;
+ dc_store_64xh(dst, stride, 64, dc0);
+}
+
+//------------------------------------------------------------------------------
+// DC rectangular cases
+
+#define DC_MULTIPLIER_1X2 0x5556
+#define DC_MULTIPLIER_1X4 0x3334
+
+#define DC_SHIFT2 16
+
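+// Rectangular DC averages divide by w + h, which is 3 or 5 times a power of
+// two. 'shift1' removes the log2(min(w, h)) factor and a fixed-point
+// reciprocal (0x5556 ~= 2^16 / 3, 0x3334 ~= 2^16 / 5) handles the remaining
+// division by 3 or 5.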
+static INLINE int divide_using_multiply_shift(int num, int shift1,
+ int multiplier, int shift2) {
+ const int interm = num >> shift1;
+ return interm * multiplier >> shift2;
+}
+
+static INLINE int calculate_dc_from_sum(int bw, int bh, uint32_t sum,
+ int shift1, int multiplier) {
+ const int expected_dc = divide_using_multiply_shift(
+ sum + ((bw + bh) >> 1), shift1, multiplier, DC_SHIFT2);
+ assert(expected_dc < (1 << 8));
+ return expected_dc;
+}
+
+#undef DC_SHIFT2
+
+void aom_dc_predictor_4x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x8_t a = load_u8_4x1_lane0(above);
+ uint8x8_t l = vld1_u8(left);
+ uint32_t sum = horizontal_add_u16x8(vaddl_u8(a, l));
+ uint32_t dc = calculate_dc_from_sum(4, 8, sum, 2, DC_MULTIPLIER_1X2);
+ dc_store_4xh(dst, stride, 8, vdup_n_u8(dc));
+}
+
+void aom_dc_predictor_8x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x8_t a = vld1_u8(above);
+ uint8x8_t l = load_u8_4x1_lane0(left);
+ uint32_t sum = horizontal_add_u16x8(vaddl_u8(a, l));
+ uint32_t dc = calculate_dc_from_sum(8, 4, sum, 2, DC_MULTIPLIER_1X2);
+ dc_store_8xh(dst, stride, 4, vdup_n_u8(dc));
+}
+
+void aom_dc_predictor_4x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x8_t a = load_u8_4x1_lane0(above);
+ uint8x16_t l = vld1q_u8(left);
+ uint16x8_t sum_al = vaddw_u8(vpaddlq_u8(l), a);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(4, 16, sum, 2, DC_MULTIPLIER_1X4);
+ dc_store_4xh(dst, stride, 16, vdup_n_u8(dc));
+}
+
+void aom_dc_predictor_16x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x16_t a = vld1q_u8(above);
+ uint8x8_t l = load_u8_4x1_lane0(left);
+ uint16x8_t sum_al = vaddw_u8(vpaddlq_u8(a), l);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(16, 4, sum, 2, DC_MULTIPLIER_1X4);
+ dc_store_16xh(dst, stride, 4, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_8x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x8_t a = vld1_u8(above);
+ uint8x16_t l = vld1q_u8(left);
+ uint16x8_t sum_al = vaddw_u8(vpaddlq_u8(l), a);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(8, 16, sum, 3, DC_MULTIPLIER_1X2);
+ dc_store_8xh(dst, stride, 16, vdup_n_u8(dc));
+}
+
+void aom_dc_predictor_16x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x16_t a = vld1q_u8(above);
+ uint8x8_t l = vld1_u8(left);
+ uint16x8_t sum_al = vaddw_u8(vpaddlq_u8(a), l);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(16, 8, sum, 3, DC_MULTIPLIER_1X2);
+ dc_store_16xh(dst, stride, 8, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_8x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint8x8_t a = vld1_u8(above);
+ uint16x8_t sum_left = dc_load_partial_sum_32(left);
+ uint16x8_t sum_al = vaddw_u8(sum_left, a);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(8, 32, sum, 3, DC_MULTIPLIER_1X4);
+ dc_store_8xh(dst, stride, 32, vdup_n_u8(dc));
+}
+
+void aom_dc_predictor_32x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_top = dc_load_partial_sum_32(above);
+ uint8x8_t l = vld1_u8(left);
+ uint16x8_t sum_al = vaddw_u8(sum_top, l);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(32, 8, sum, 3, DC_MULTIPLIER_1X4);
+ dc_store_32xh(dst, stride, 8, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_16x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_16(above);
+ uint16x8_t sum_left = dc_load_partial_sum_32(left);
+ uint16x8_t sum_al = vaddq_u16(sum_left, sum_above);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(16, 32, sum, 4, DC_MULTIPLIER_1X2);
+ dc_store_16xh(dst, stride, 32, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_32x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_32(above);
+ uint16x8_t sum_left = dc_load_partial_sum_16(left);
+ uint16x8_t sum_al = vaddq_u16(sum_left, sum_above);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(32, 16, sum, 4, DC_MULTIPLIER_1X2);
+ dc_store_32xh(dst, stride, 16, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_16x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_16(above);
+ uint16x8_t sum_left = dc_load_partial_sum_64(left);
+ uint16x8_t sum_al = vaddq_u16(sum_left, sum_above);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(16, 64, sum, 4, DC_MULTIPLIER_1X4);
+ dc_store_16xh(dst, stride, 64, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_64x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_64(above);
+ uint16x8_t sum_left = dc_load_partial_sum_16(left);
+ uint16x8_t sum_al = vaddq_u16(sum_above, sum_left);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(64, 16, sum, 4, DC_MULTIPLIER_1X4);
+ dc_store_64xh(dst, stride, 16, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_32x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_32(above);
+ uint16x8_t sum_left = dc_load_partial_sum_64(left);
+ uint16x8_t sum_al = vaddq_u16(sum_above, sum_left);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(32, 64, sum, 5, DC_MULTIPLIER_1X2);
+ dc_store_32xh(dst, stride, 64, vdupq_n_u8(dc));
+}
+
+void aom_dc_predictor_64x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ uint16x8_t sum_above = dc_load_partial_sum_64(above);
+ uint16x8_t sum_left = dc_load_partial_sum_32(left);
+ uint16x8_t sum_al = vaddq_u16(sum_above, sum_left);
+ uint32_t sum = horizontal_add_u16x8(sum_al);
+ uint32_t dc = calculate_dc_from_sum(64, 32, sum, 5, DC_MULTIPLIER_1X2);
+ dc_store_64xh(dst, stride, 32, vdupq_n_u8(dc));
+}
+
+#undef DC_MULTIPLIER_1X2
+#undef DC_MULTIPLIER_1X4
+
+#define DC_PREDICTOR_128(w, h, q) \
+ void aom_dc_128_predictor_##w##x##h##_neon(uint8_t *dst, ptrdiff_t stride, \
+ const uint8_t *above, \
+ const uint8_t *left) { \
+ (void)above; \
+ (void)left; \
+ dc_store_##w##xh(dst, stride, (h), vdup##q##_n_u8(0x80)); \
+ }
+
+DC_PREDICTOR_128(4, 8, )
+DC_PREDICTOR_128(4, 16, )
+DC_PREDICTOR_128(8, 4, )
+DC_PREDICTOR_128(8, 16, )
+DC_PREDICTOR_128(8, 32, )
+DC_PREDICTOR_128(16, 4, q)
+DC_PREDICTOR_128(16, 8, q)
+DC_PREDICTOR_128(16, 32, q)
+DC_PREDICTOR_128(16, 64, q)
+DC_PREDICTOR_128(32, 8, q)
+DC_PREDICTOR_128(32, 16, q)
+DC_PREDICTOR_128(32, 64, q)
+DC_PREDICTOR_128(64, 32, q)
+DC_PREDICTOR_128(64, 16, q)
+
+#undef DC_PREDICTOR_128
+
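+// 'shift' is log2(h): the rounding shift that averages the h summed left
+// pixels.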
+#define DC_PREDICTOR_LEFT(w, h, shift, q) \
+ void aom_dc_left_predictor_##w##x##h##_neon(uint8_t *dst, ptrdiff_t stride, \
+ const uint8_t *above, \
+ const uint8_t *left) { \
+ (void)above; \
+ const uint16x8_t sum = dc_load_sum_##h(left); \
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, (shift)); \
+ dc_store_##w##xh(dst, stride, (h), vdup##q##_lane_u8(dc0, 0)); \
+ }
+
+DC_PREDICTOR_LEFT(4, 8, 3, )
+DC_PREDICTOR_LEFT(8, 4, 2, )
+DC_PREDICTOR_LEFT(8, 16, 4, )
+DC_PREDICTOR_LEFT(16, 8, 3, q)
+DC_PREDICTOR_LEFT(16, 32, 5, q)
+DC_PREDICTOR_LEFT(32, 16, 4, q)
+DC_PREDICTOR_LEFT(32, 64, 6, q)
+DC_PREDICTOR_LEFT(64, 32, 5, q)
+DC_PREDICTOR_LEFT(4, 16, 4, )
+DC_PREDICTOR_LEFT(16, 4, 2, q)
+DC_PREDICTOR_LEFT(8, 32, 5, )
+DC_PREDICTOR_LEFT(32, 8, 3, q)
+DC_PREDICTOR_LEFT(16, 64, 6, q)
+DC_PREDICTOR_LEFT(64, 16, 4, q)
+
+#undef DC_PREDICTOR_LEFT
+
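+// 'shift' is log2(w): the rounding shift that averages the w summed top
+// pixels.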
+#define DC_PREDICTOR_TOP(w, h, shift, q) \
+ void aom_dc_top_predictor_##w##x##h##_neon(uint8_t *dst, ptrdiff_t stride, \
+ const uint8_t *above, \
+ const uint8_t *left) { \
+ (void)left; \
+ const uint16x8_t sum = dc_load_sum_##w(above); \
+ const uint8x8_t dc0 = vrshrn_n_u16(sum, (shift)); \
+ dc_store_##w##xh(dst, stride, (h), vdup##q##_lane_u8(dc0, 0)); \
+ }
+
+DC_PREDICTOR_TOP(4, 8, 2, )
+DC_PREDICTOR_TOP(4, 16, 2, )
+DC_PREDICTOR_TOP(8, 4, 3, )
+DC_PREDICTOR_TOP(8, 16, 3, )
+DC_PREDICTOR_TOP(8, 32, 3, )
+DC_PREDICTOR_TOP(16, 4, 4, q)
+DC_PREDICTOR_TOP(16, 8, 4, q)
+DC_PREDICTOR_TOP(16, 32, 4, q)
+DC_PREDICTOR_TOP(16, 64, 4, q)
+DC_PREDICTOR_TOP(32, 8, 5, q)
+DC_PREDICTOR_TOP(32, 16, 5, q)
+DC_PREDICTOR_TOP(32, 64, 5, q)
+DC_PREDICTOR_TOP(64, 16, 6, q)
+DC_PREDICTOR_TOP(64, 32, 6, q)
+
+#undef DC_PREDICTOR_TOP
+
// -----------------------------------------------------------------------------
-void aom_d135_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
- const uint8_t *above, const uint8_t *left) {
- const uint8x8_t XABCD_u8 = vld1_u8(above - 1);
- const uint64x1_t XABCD = vreinterpret_u64_u8(XABCD_u8);
- const uint64x1_t ____XABC = vshl_n_u64(XABCD, 32);
- const uint32x2_t zero = vdup_n_u32(0);
- const uint32x2_t IJKL = vld1_lane_u32((const uint32_t *)left, zero, 0);
- const uint8x8_t IJKL_u8 = vreinterpret_u8_u32(IJKL);
- const uint64x1_t LKJI____ = vreinterpret_u64_u8(vrev32_u8(IJKL_u8));
- const uint64x1_t LKJIXABC = vorr_u64(LKJI____, ____XABC);
- const uint8x8_t KJIXABC_ = vreinterpret_u8_u64(vshr_n_u64(LKJIXABC, 8));
- const uint8x8_t JIXABC__ = vreinterpret_u8_u64(vshr_n_u64(LKJIXABC, 16));
- const uint8_t D = vget_lane_u8(XABCD_u8, 4);
- const uint8x8_t JIXABCD_ = vset_lane_u8(D, JIXABC__, 6);
- const uint8x8_t LKJIXABC_u8 = vreinterpret_u8_u64(LKJIXABC);
- const uint8x8_t avg1 = vhadd_u8(JIXABCD_, LKJIXABC_u8);
- const uint8x8_t avg2 = vrhadd_u8(avg1, KJIXABC_);
- const uint64x1_t avg2_u64 = vreinterpret_u64_u8(avg2);
- const uint32x2_t r3 = vreinterpret_u32_u8(avg2);
- const uint32x2_t r2 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 8));
- const uint32x2_t r1 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 16));
- const uint32x2_t r0 = vreinterpret_u32_u64(vshr_n_u64(avg2_u64, 24));
- vst1_lane_u32((uint32_t *)(dst + 0 * stride), r0, 0);
- vst1_lane_u32((uint32_t *)(dst + 1 * stride), r1, 0);
- vst1_lane_u32((uint32_t *)(dst + 2 * stride), r2, 0);
- vst1_lane_u32((uint32_t *)(dst + 3 * stride), r3, 0);
+static INLINE void v_store_4xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x8_t d0) {
+ for (int i = 0; i < h; ++i) {
+ store_u8_4x1(dst + i * stride, d0, 0);
+ }
+}
+
+static INLINE void v_store_8xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x8_t d0) {
+ for (int i = 0; i < h; ++i) {
+ vst1_u8(dst + i * stride, d0);
+ }
+}
+
+static INLINE void v_store_16xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t d0) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + i * stride, d0);
+ }
+}
+
+static INLINE void v_store_32xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t d0, uint8x16_t d1) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + 0, d0);
+ vst1q_u8(dst + 16, d1);
+ dst += stride;
+ }
+}
+
+static INLINE void v_store_64xh(uint8_t *dst, ptrdiff_t stride, int h,
+ uint8x16_t d0, uint8x16_t d1, uint8x16_t d2,
+ uint8x16_t d3) {
+ for (int i = 0; i < h; ++i) {
+ vst1q_u8(dst + 0, d0);
+ vst1q_u8(dst + 16, d1);
+ vst1q_u8(dst + 32, d2);
+ vst1q_u8(dst + 48, d3);
+ dst += stride;
+ }
}
void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int i;
- uint32x2_t d0u32 = vdup_n_u32(0);
(void)left;
-
- d0u32 = vld1_lane_u32((const uint32_t *)above, d0u32, 0);
- for (i = 0; i < 4; i++, dst += stride)
- vst1_lane_u32((uint32_t *)dst, d0u32, 0);
+ v_store_4xh(dst, stride, 4, load_u8_4x1_lane0(above));
}
void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int i;
- uint8x8_t d0u8 = vdup_n_u8(0);
(void)left;
-
- d0u8 = vld1_u8(above);
- for (i = 0; i < 8; i++, dst += stride) vst1_u8(dst, d0u8);
+ v_store_8xh(dst, stride, 8, vld1_u8(above));
}
void aom_v_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int i;
- uint8x16_t q0u8 = vdupq_n_u8(0);
(void)left;
-
- q0u8 = vld1q_u8(above);
- for (i = 0; i < 16; i++, dst += stride) vst1q_u8(dst, q0u8);
+ v_store_16xh(dst, stride, 16, vld1q_u8(above));
}
void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int i;
- uint8x16_t q0u8 = vdupq_n_u8(0);
- uint8x16_t q1u8 = vdupq_n_u8(0);
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
(void)left;
+ v_store_32xh(dst, stride, 32, d0, d1);
+}
- q0u8 = vld1q_u8(above);
- q1u8 = vld1q_u8(above + 16);
- for (i = 0; i < 32; i++, dst += stride) {
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q1u8);
- }
+void aom_v_predictor_4x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_4xh(dst, stride, 8, load_u8_4x1_lane0(above));
+}
+
+void aom_v_predictor_4x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_4xh(dst, stride, 16, load_u8_4x1_lane0(above));
+}
+
+void aom_v_predictor_8x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_8xh(dst, stride, 4, vld1_u8(above));
+}
+
+void aom_v_predictor_8x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_8xh(dst, stride, 16, vld1_u8(above));
+}
+
+void aom_v_predictor_8x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_8xh(dst, stride, 32, vld1_u8(above));
+}
+
+void aom_v_predictor_16x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_16xh(dst, stride, 4, vld1q_u8(above));
+}
+
+void aom_v_predictor_16x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_16xh(dst, stride, 8, vld1q_u8(above));
+}
+
+void aom_v_predictor_16x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_16xh(dst, stride, 32, vld1q_u8(above));
+}
+
+void aom_v_predictor_16x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)left;
+ v_store_16xh(dst, stride, 64, vld1q_u8(above));
+}
+
+void aom_v_predictor_32x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ (void)left;
+ v_store_32xh(dst, stride, 8, d0, d1);
+}
+
+void aom_v_predictor_32x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ (void)left;
+ v_store_32xh(dst, stride, 16, d0, d1);
+}
+
+void aom_v_predictor_32x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ (void)left;
+ v_store_32xh(dst, stride, 64, d0, d1);
+}
+
+void aom_v_predictor_64x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ const uint8x16_t d2 = vld1q_u8(above + 32);
+ const uint8x16_t d3 = vld1q_u8(above + 48);
+ (void)left;
+ v_store_64xh(dst, stride, 16, d0, d1, d2, d3);
+}
+
+void aom_v_predictor_64x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ const uint8x16_t d2 = vld1q_u8(above + 32);
+ const uint8x16_t d3 = vld1q_u8(above + 48);
+ (void)left;
+ v_store_64xh(dst, stride, 32, d0, d1, d2, d3);
+}
+
+void aom_v_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(above);
+ const uint8x16_t d1 = vld1q_u8(above + 16);
+ const uint8x16_t d2 = vld1q_u8(above + 32);
+ const uint8x16_t d3 = vld1q_u8(above + 48);
+ (void)left;
+ v_store_64xh(dst, stride, 64, d0, d1, d2, d3);
+}
+
+// -----------------------------------------------------------------------------
+
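+// Horizontal prediction broadcasts each left-column pixel across its whole
+// row. The helpers below unroll eight rows at a time, one vdup(q)_lane per
+// row.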
+static INLINE void h_store_4x8(uint8_t *dst, ptrdiff_t stride, uint8x8_t d0) {
+ store_u8_4x1(dst + 0 * stride, vdup_lane_u8(d0, 0), 0);
+ store_u8_4x1(dst + 1 * stride, vdup_lane_u8(d0, 1), 0);
+ store_u8_4x1(dst + 2 * stride, vdup_lane_u8(d0, 2), 0);
+ store_u8_4x1(dst + 3 * stride, vdup_lane_u8(d0, 3), 0);
+ store_u8_4x1(dst + 4 * stride, vdup_lane_u8(d0, 4), 0);
+ store_u8_4x1(dst + 5 * stride, vdup_lane_u8(d0, 5), 0);
+ store_u8_4x1(dst + 6 * stride, vdup_lane_u8(d0, 6), 0);
+ store_u8_4x1(dst + 7 * stride, vdup_lane_u8(d0, 7), 0);
+}
+
+static INLINE void h_store_8x8(uint8_t *dst, ptrdiff_t stride, uint8x8_t d0) {
+ vst1_u8(dst + 0 * stride, vdup_lane_u8(d0, 0));
+ vst1_u8(dst + 1 * stride, vdup_lane_u8(d0, 1));
+ vst1_u8(dst + 2 * stride, vdup_lane_u8(d0, 2));
+ vst1_u8(dst + 3 * stride, vdup_lane_u8(d0, 3));
+ vst1_u8(dst + 4 * stride, vdup_lane_u8(d0, 4));
+ vst1_u8(dst + 5 * stride, vdup_lane_u8(d0, 5));
+ vst1_u8(dst + 6 * stride, vdup_lane_u8(d0, 6));
+ vst1_u8(dst + 7 * stride, vdup_lane_u8(d0, 7));
+}
+
+static INLINE void h_store_16x8(uint8_t *dst, ptrdiff_t stride, uint8x8_t d0) {
+ vst1q_u8(dst + 0 * stride, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 1 * stride, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 2 * stride, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 3 * stride, vdupq_lane_u8(d0, 3));
+ vst1q_u8(dst + 4 * stride, vdupq_lane_u8(d0, 4));
+ vst1q_u8(dst + 5 * stride, vdupq_lane_u8(d0, 5));
+ vst1q_u8(dst + 6 * stride, vdupq_lane_u8(d0, 6));
+ vst1q_u8(dst + 7 * stride, vdupq_lane_u8(d0, 7));
+}
+
+static INLINE void h_store_32x8(uint8_t *dst, ptrdiff_t stride, uint8x8_t d0) {
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 0));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 1));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 2));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 3));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 3));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 4));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 4));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 5));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 5));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 6));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 6));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 7));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 7));
+}
+
+static INLINE void h_store_64x8(uint8_t *dst, ptrdiff_t stride, uint8x8_t d0) {
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 0));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 1));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 2));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 3));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 3));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 3));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 3));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 4));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 4));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 4));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 4));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 5));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 5));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 5));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 5));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 6));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 6));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 6));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 6));
+ dst += stride;
+ vst1q_u8(dst + 0, vdupq_lane_u8(d0, 7));
+ vst1q_u8(dst + 16, vdupq_lane_u8(d0, 7));
+ vst1q_u8(dst + 32, vdupq_lane_u8(d0, 7));
+ vst1q_u8(dst + 48, vdupq_lane_u8(d0, 7));
}
void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- uint8x8_t d0u8 = vdup_n_u8(0);
- uint32x2_t d1u32 = vdup_n_u32(0);
+ const uint8x8_t d0 = load_u8_4x1_lane0(left);
(void)above;
-
- d1u32 = vld1_lane_u32((const uint32_t *)left, d1u32, 0);
-
- d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 0);
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 1);
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 2);
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u32(d1u32), 3);
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d0u8), 0);
+ store_u8_4x1(dst + 0 * stride, vdup_lane_u8(d0, 0), 0);
+ store_u8_4x1(dst + 1 * stride, vdup_lane_u8(d0, 1), 0);
+ store_u8_4x1(dst + 2 * stride, vdup_lane_u8(d0, 2), 0);
+ store_u8_4x1(dst + 3 * stride, vdup_lane_u8(d0, 3), 0);
}
void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- uint8x8_t d0u8 = vdup_n_u8(0);
- uint64x1_t d1u64 = vdup_n_u64(0);
+ const uint8x8_t d0 = vld1_u8(left);
(void)above;
-
- d1u64 = vld1_u64((const uint64_t *)left);
-
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 0);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 1);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 2);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 3);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 4);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 5);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 6);
- vst1_u8(dst, d0u8);
- dst += stride;
- d0u8 = vdup_lane_u8(vreinterpret_u8_u64(d1u64), 7);
- vst1_u8(dst, d0u8);
+ h_store_8x8(dst, stride, d0);
}
void aom_h_predictor_16x16_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int j;
- uint8x8_t d2u8 = vdup_n_u8(0);
- uint8x16_t q0u8 = vdupq_n_u8(0);
- uint8x16_t q1u8 = vdupq_n_u8(0);
+ const uint8x16_t d0 = vld1q_u8(left);
(void)above;
-
- q1u8 = vld1q_u8(left);
- d2u8 = vget_low_u8(q1u8);
- for (j = 0; j < 2; j++, d2u8 = vget_high_u8(q1u8)) {
- q0u8 = vdupq_lane_u8(d2u8, 0);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 1);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 2);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 3);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 4);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 5);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 6);
- vst1q_u8(dst, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 7);
- vst1q_u8(dst, q0u8);
- dst += stride;
- }
+ h_store_16x8(dst, stride, vget_low_u8(d0));
+ h_store_16x8(dst + 8 * stride, stride, vget_high_u8(d0));
}
void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t stride,
const uint8_t *above, const uint8_t *left) {
- int j, k;
- uint8x8_t d2u8 = vdup_n_u8(0);
- uint8x16_t q0u8 = vdupq_n_u8(0);
- uint8x16_t q1u8 = vdupq_n_u8(0);
+ const uint8x16_t d0 = vld1q_u8(left);
+ const uint8x16_t d1 = vld1q_u8(left + 16);
(void)above;
+ h_store_32x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_32x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ h_store_32x8(dst + 16 * stride, stride, vget_low_u8(d1));
+ h_store_32x8(dst + 24 * stride, stride, vget_high_u8(d1));
+}
- for (k = 0; k < 2; k++, left += 16) {
- q1u8 = vld1q_u8(left);
- d2u8 = vget_low_u8(q1u8);
- for (j = 0; j < 2; j++, d2u8 = vget_high_u8(q1u8)) {
- q0u8 = vdupq_lane_u8(d2u8, 0);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 1);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 2);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 3);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 4);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 5);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 6);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- q0u8 = vdupq_lane_u8(d2u8, 7);
- vst1q_u8(dst, q0u8);
- vst1q_u8(dst + 16, q0u8);
- dst += stride;
- }
+void aom_h_predictor_4x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t d0 = vld1_u8(left);
+ (void)above;
+ h_store_4x8(dst, stride, d0);
+}
+
+void aom_h_predictor_4x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ (void)above;
+ h_store_4x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_4x8(dst + 8 * stride, stride, vget_high_u8(d0));
+}
+
+void aom_h_predictor_8x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t d0 = load_u8_4x1_lane0(left);
+ (void)above;
+ vst1_u8(dst + 0 * stride, vdup_lane_u8(d0, 0));
+ vst1_u8(dst + 1 * stride, vdup_lane_u8(d0, 1));
+ vst1_u8(dst + 2 * stride, vdup_lane_u8(d0, 2));
+ vst1_u8(dst + 3 * stride, vdup_lane_u8(d0, 3));
+}
+
+void aom_h_predictor_8x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ (void)above;
+ h_store_8x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_8x8(dst + 8 * stride, stride, vget_high_u8(d0));
+}
+
+void aom_h_predictor_8x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ const uint8x16_t d1 = vld1q_u8(left + 16);
+ (void)above;
+ h_store_8x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_8x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ h_store_8x8(dst + 16 * stride, stride, vget_low_u8(d1));
+ h_store_8x8(dst + 24 * stride, stride, vget_high_u8(d1));
+}
+
+void aom_h_predictor_16x4_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t d0 = load_u8_4x1_lane0(left);
+ (void)above;
+ vst1q_u8(dst + 0 * stride, vdupq_lane_u8(d0, 0));
+ vst1q_u8(dst + 1 * stride, vdupq_lane_u8(d0, 1));
+ vst1q_u8(dst + 2 * stride, vdupq_lane_u8(d0, 2));
+ vst1q_u8(dst + 3 * stride, vdupq_lane_u8(d0, 3));
+}
+
+void aom_h_predictor_16x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t d0 = vld1_u8(left);
+ (void)above;
+ h_store_16x8(dst, stride, d0);
+}
+
+void aom_h_predictor_16x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ const uint8x16_t d1 = vld1q_u8(left + 16);
+ (void)above;
+ h_store_16x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_16x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ h_store_16x8(dst + 16 * stride, stride, vget_low_u8(d1));
+ h_store_16x8(dst + 24 * stride, stride, vget_high_u8(d1));
+}
+
+void aom_h_predictor_16x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ const uint8x16_t d1 = vld1q_u8(left + 16);
+ const uint8x16_t d2 = vld1q_u8(left + 32);
+ const uint8x16_t d3 = vld1q_u8(left + 48);
+ (void)above;
+ h_store_16x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_16x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ h_store_16x8(dst + 16 * stride, stride, vget_low_u8(d1));
+ h_store_16x8(dst + 24 * stride, stride, vget_high_u8(d1));
+ h_store_16x8(dst + 32 * stride, stride, vget_low_u8(d2));
+ h_store_16x8(dst + 40 * stride, stride, vget_high_u8(d2));
+ h_store_16x8(dst + 48 * stride, stride, vget_low_u8(d3));
+ h_store_16x8(dst + 56 * stride, stride, vget_high_u8(d3));
+}
+
+void aom_h_predictor_32x8_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x8_t d0 = vld1_u8(left);
+ (void)above;
+ h_store_32x8(dst, stride, d0);
+}
+
+void aom_h_predictor_32x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ (void)above;
+ h_store_32x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_32x8(dst + 8 * stride, stride, vget_high_u8(d0));
+}
+
+void aom_h_predictor_32x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left + 0);
+ const uint8x16_t d1 = vld1q_u8(left + 16);
+ const uint8x16_t d2 = vld1q_u8(left + 32);
+ const uint8x16_t d3 = vld1q_u8(left + 48);
+ (void)above;
+ h_store_32x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_32x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ h_store_32x8(dst + 16 * stride, stride, vget_low_u8(d1));
+ h_store_32x8(dst + 24 * stride, stride, vget_high_u8(d1));
+ h_store_32x8(dst + 32 * stride, stride, vget_low_u8(d2));
+ h_store_32x8(dst + 40 * stride, stride, vget_high_u8(d2));
+ h_store_32x8(dst + 48 * stride, stride, vget_low_u8(d3));
+ h_store_32x8(dst + 56 * stride, stride, vget_high_u8(d3));
+}
+
+void aom_h_predictor_64x16_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ (void)above;
+ h_store_64x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_64x8(dst + 8 * stride, stride, vget_high_u8(d0));
+}
+
+void aom_h_predictor_64x32_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)above;
+ for (int i = 0; i < 2; ++i) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ h_store_64x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_64x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ left += 16;
+ dst += 16 * stride;
+ }
+}
+
+void aom_h_predictor_64x64_neon(uint8_t *dst, ptrdiff_t stride,
+ const uint8_t *above, const uint8_t *left) {
+ (void)above;
+ for (int i = 0; i < 4; ++i) {
+ const uint8x16_t d0 = vld1q_u8(left);
+ h_store_64x8(dst + 0 * stride, stride, vget_low_u8(d0));
+ h_store_64x8(dst + 8 * stride, stride, vget_high_u8(d0));
+ left += 16;
+ dst += 16 * stride;
}
}
@@ -638,7 +1149,6 @@
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
};
-/* clang-format on */
static AOM_FORCE_INLINE void dr_prediction_z1_HxW_internal_neon_64(
int H, int W, uint8x8_t *dst, const uint8_t *above, int upsample_above,
int dx) {
@@ -653,23 +1163,12 @@
// final pixels will be calculated as:
// (above[x] * 32 + 16 + (above[x+1] - above[x]) * shift) >> 5
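+  // Since above[x] * 32 has no bits set below bit 5, this is equivalent to
+  // above[x] + round((above[x+1] - above[x]) * shift / 32), with shift in
+  // [0, 31] selecting the 1/32-pel interpolation position.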
- uint16x8_t a0, a1;
- uint16x8_t diff, a32;
- uint16x8_t a16;
- uint8x8_t a_mbase_x;
-
- a16 = vdupq_n_u16(16);
- a_mbase_x = vdup_n_u8(above[max_base_x]);
- uint16x8_t v_32 = vdupq_n_u16(32);
- int16x8_t v_upsample_above = vdupq_n_s16(upsample_above);
- uint16x8_t c3f = vdupq_n_u16(0x3f);
+ const uint16x8_t a16 = vdupq_n_u16(16);
+ const uint8x8_t a_mbase_x = vdup_n_u8(above[max_base_x]);
+ const uint8x8_t v_32 = vdup_n_u8(32);
int x = dx;
for (int r = 0; r < W; r++) {
- uint16x8_t res;
- uint16x8_t shift;
- uint8x8x2_t v_tmp_a0_128;
-
int base = x >> frac_bits;
int base_max_diff = (max_base_x - base) >> upsample_above;
if (base_max_diff <= 0) {
@@ -681,24 +1180,22 @@
if (base_max_diff > H) base_max_diff = H;
+ uint8x8x2_t a01_128;
+ uint16x8_t shift;
if (upsample_above) {
- v_tmp_a0_128 = vld2_u8(above + base);
- shift = vshrq_n_u16(
- vandq_u16(vshlq_u16(vdupq_n_u16(x), v_upsample_above), c3f), 1);
+ a01_128 = vld2_u8(above + base);
+ shift = vdupq_n_u16(((x << upsample_above) & 0x3f) >> 1);
} else {
- v_tmp_a0_128.val[0] = vld1_u8(above + base);
- v_tmp_a0_128.val[1] = vld1_u8(above + base + 1);
- shift = vshrq_n_u16(vandq_u16(vdupq_n_u16(x), c3f), 1);
+ a01_128.val[0] = vld1_u8(above + base);
+ a01_128.val[1] = vld1_u8(above + base + 1);
+ shift = vdupq_n_u16((x & 0x3f) >> 1);
}
- a0 = vmovl_u8(v_tmp_a0_128.val[0]);
- a1 = vmovl_u8(v_tmp_a0_128.val[1]);
- diff = vsubq_u16(a1, a0); // a[x+1] - a[x]
- a32 = vmlaq_u16(a16, a0, v_32); // a[x] * 32 + 16
- res = vmlaq_u16(a32, diff, shift);
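+    // vsubl/vmlal widen from u8 to u16 as part of the arithmetic, so no
+    // separate vmovl conversions are needed.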
+ uint16x8_t diff = vsubl_u8(a01_128.val[1], a01_128.val[0]);
+ uint16x8_t a32 = vmlal_u8(a16, a01_128.val[0], v_32);
+ uint16x8_t res = vmlaq_u16(a32, diff, shift);
uint8x8_t mask = vld1_u8(BaseMask[base_max_diff]);
- dst[r] =
- vorr_u8(vand_u8(mask, vshrn_n_u16(res, 5)), vbic_u8(a_mbase_x, mask));
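+    // Single bit-select blend: lanes at or beyond max_base_x take the
+    // replicated above[max_base_x] value.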
+ dst[r] = vbsl_u8(mask, vshrn_n_u16(res, 5), a_mbase_x);
x += dx;
}
@@ -743,17 +1240,10 @@
// final pixels will be calculated as:
// (above[x] * 32 + 16 + (above[x+1] - above[x]) * shift) >> 5
- uint8x16x2_t a0, a1;
- uint16x8x2_t diff, a32;
- uint16x8_t a16, c3f;
- uint8x16_t a_mbase_x;
-
- a16 = vdupq_n_u16(16);
- a_mbase_x = vdupq_n_u8(above[max_base_x]);
- c3f = vdupq_n_u16(0x3f);
- uint16x8_t v_32 = vdupq_n_u16(32);
- uint8x16_t v_zero = vdupq_n_u8(0);
- int16x8_t v_upsample_above = vdupq_n_s16(upsample_above);
+ const uint16x8_t a16 = vdupq_n_u16(16);
+ const uint8x16_t a_mbase_x = vdupq_n_u8(above[max_base_x]);
+ const uint8x8_t v_32 = vdup_n_u8(32);
+ const uint8x16_t v_zero = vdupq_n_u8(0);
int x = dx;
for (int r = 0; r < W; r++) {
@@ -776,30 +1266,24 @@
uint8x8x2_t v_tmp_a0_128 = vld2_u8(above + base);
a0_128 = vcombine_u8(v_tmp_a0_128.val[0], v_tmp_a0_128.val[1]);
a1_128 = vextq_u8(a0_128, v_zero, 8);
- shift = vshrq_n_u16(
- vandq_u16(vshlq_u16(vdupq_n_u16(x), v_upsample_above), c3f), 1);
+ shift = vdupq_n_u16(((x << upsample_above) & 0x3f) >> 1);
} else {
a0_128 = vld1q_u8(above + base);
a1_128 = vld1q_u8(above + base + 1);
- shift = vshrq_n_u16(vandq_u16(vdupq_n_u16(x), c3f), 1);
+ shift = vdupq_n_u16((x & 0x3f) >> 1);
}
- a0 = vzipq_u8(a0_128, v_zero);
- a1 = vzipq_u8(a1_128, v_zero);
- diff.val[0] = vsubq_u16(vreinterpretq_u16_u8(a1.val[0]),
- vreinterpretq_u16_u8(a0.val[0])); // a[x+1] - a[x]
- diff.val[1] = vsubq_u16(vreinterpretq_u16_u8(a1.val[1]),
- vreinterpretq_u16_u8(a0.val[1])); // a[x+1] - a[x]
- a32.val[0] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[0]),
- v_32); // a[x] * 32 + 16
- a32.val[1] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[1]),
- v_32); // a[x] * 32 + 16
+ uint16x8x2_t diff, a32;
+ diff.val[0] = vsubl_u8(vget_low_u8(a1_128), vget_low_u8(a0_128));
+ diff.val[1] = vsubl_u8(vget_high_u8(a1_128), vget_high_u8(a0_128));
+ a32.val[0] = vmlal_u8(a16, vget_low_u8(a0_128), v_32);
+ a32.val[1] = vmlal_u8(a16, vget_high_u8(a0_128), v_32);
res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shift);
res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shift);
uint8x16_t v_temp =
vcombine_u8(vshrn_n_u16(res.val[0], 5), vshrn_n_u16(res.val[1], 5));
uint8x16_t mask = vld1q_u8(BaseMask[base_max_diff]);
- dst[r] = vorrq_u8(vandq_u8(mask, v_temp), vbicq_u8(a_mbase_x, mask));
+ dst[r] = vbslq_u8(mask, v_temp, a_mbase_x);
x += dx;
}
@@ -831,22 +1315,13 @@
// final pixels will be calculated as:
// (above[x] * 32 + 16 + (above[x+1] - above[x]) * shift) >> 5
- uint8x16_t a_mbase_x;
- uint8x16x2_t a0, a1;
- uint16x8x2_t diff, a32;
- uint16x8_t a16, c3f;
-
- a_mbase_x = vdupq_n_u8(above[max_base_x]);
- a16 = vdupq_n_u16(16);
- c3f = vdupq_n_u16(0x3f);
- uint16x8_t v_32 = vdupq_n_u16(32);
- uint8x16_t v_zero = vdupq_n_u8(0);
+ const uint8x16_t a_mbase_x = vdupq_n_u8(above[max_base_x]);
+ const uint16x8_t a16 = vdupq_n_u16(16);
+ const uint8x8_t v_32 = vdup_n_u8(32);
int x = dx;
for (int r = 0; r < N; r++) {
- uint16x8x2_t res;
uint8x16_t res16[2];
- uint8x16_t a0_128, a1_128;
int base = x >> frac_bits;
int base_max_diff = (max_base_x - base);
@@ -859,27 +1334,21 @@
}
if (base_max_diff > 32) base_max_diff = 32;
- uint16x8_t shift = vshrq_n_u16(vandq_u16(vdupq_n_u16(x), c3f), 1);
+ uint16x8_t shift = vdupq_n_u16((x & 0x3f) >> 1);
for (int j = 0, jj = 0; j < 32; j += 16, jj++) {
int mdiff = base_max_diff - j;
if (mdiff <= 0) {
res16[jj] = a_mbase_x;
} else {
+ uint16x8x2_t a32, diff, res;
+ uint8x16_t a0_128, a1_128;
a0_128 = vld1q_u8(above + base + j);
a1_128 = vld1q_u8(above + base + j + 1);
- a0 = vzipq_u8(a0_128, v_zero);
- a1 = vzipq_u8(a1_128, v_zero);
- diff.val[0] =
- vsubq_u16(vreinterpretq_u16_u8(a1.val[0]),
- vreinterpretq_u16_u8(a0.val[0])); // a[x+1] - a[x]
- diff.val[1] =
- vsubq_u16(vreinterpretq_u16_u8(a1.val[1]),
- vreinterpretq_u16_u8(a0.val[1])); // a[x+1] - a[x]
- a32.val[0] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[0]),
- v_32); // a[x] * 32 + 16
- a32.val[1] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[1]),
- v_32); // a[x] * 32 + 16
+ diff.val[0] = vsubl_u8(vget_low_u8(a1_128), vget_low_u8(a0_128));
+ diff.val[1] = vsubl_u8(vget_high_u8(a1_128), vget_high_u8(a0_128));
+ a32.val[0] = vmlal_u8(a16, vget_low_u8(a0_128), v_32);
+ a32.val[1] = vmlal_u8(a16, vget_high_u8(a0_128), v_32);
res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shift);
res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shift);
@@ -892,10 +1361,8 @@
mask.val[0] = vld1q_u8(BaseMask[base_max_diff]);
mask.val[1] = vld1q_u8(BaseMask[base_max_diff] + 16);
- dstvec[r].val[0] = vorrq_u8(vandq_u8(mask.val[0], res16[0]),
- vbicq_u8(a_mbase_x, mask.val[0]));
- dstvec[r].val[1] = vorrq_u8(vandq_u8(mask.val[1], res16[1]),
- vbicq_u8(a_mbase_x, mask.val[1]));
+ dstvec[r].val[0] = vbslq_u8(mask.val[0], res16[0], a_mbase_x);
+ dstvec[r].val[1] = vbslq_u8(mask.val[1], res16[1], a_mbase_x);
x += dx;
}
}
@@ -927,23 +1394,15 @@
// final pixels will be calculated as:
// (above[x] * 32 + 16 + (above[x+1] - above[x]) * shift) >> 5
- uint8x16x2_t a0, a1;
- uint16x8x2_t a32, diff;
- uint16x8_t a16, c3f;
- uint8x16_t a_mbase_x, max_base_x128, mask128;
-
- a16 = vdupq_n_u16(16);
- a_mbase_x = vdupq_n_u8(above[max_base_x]);
- max_base_x128 = vdupq_n_u8(max_base_x);
- c3f = vdupq_n_u16(0x3f);
- uint16x8_t v_32 = vdupq_n_u16(32);
- uint8x16_t v_zero = vdupq_n_u8(0);
- uint8x16_t step = vdupq_n_u8(16);
+ const uint16x8_t a16 = vdupq_n_u16(16);
+ const uint8x16_t a_mbase_x = vdupq_n_u8(above[max_base_x]);
+ const uint8x16_t max_base_x128 = vdupq_n_u8(max_base_x);
+ const uint8x8_t v_32 = vdup_n_u8(32);
+ const uint8x16_t v_zero = vdupq_n_u8(0);
+ const uint8x16_t step = vdupq_n_u8(16);
int x = dx;
for (int r = 0; r < N; r++, dst += stride) {
- uint16x8x2_t res;
-
int base = x >> frac_bits;
if (base >= max_base_x) {
for (int i = r; i < N; ++i) {
@@ -956,8 +1415,7 @@
return;
}
- uint16x8_t shift = vshrq_n_u16(vandq_u16(vdupq_n_u16(x), c3f), 1);
- uint8x16_t a0_128, a1_128, res128;
+ uint16x8_t shift = vdupq_n_u16((x & 0x3f) >> 1);
uint8x16_t base_inc128 =
vaddq_u8(vdupq_n_u8(base), vcombine_u8(vcreate_u8(0x0706050403020100),
vcreate_u8(0x0F0E0D0C0B0A0908)));
@@ -967,28 +1425,21 @@
if (mdif <= 0) {
vst1q_u8(dst + j, a_mbase_x);
} else {
+ uint16x8x2_t a32, diff, res;
+ uint8x16_t a0_128, a1_128, mask128, res128;
a0_128 = vld1q_u8(above + base + j);
a1_128 = vld1q_u8(above + base + 1 + j);
- a0 = vzipq_u8(a0_128, v_zero);
- a1 = vzipq_u8(a1_128, v_zero);
- diff.val[0] =
- vsubq_u16(vreinterpretq_u16_u8(a1.val[0]),
- vreinterpretq_u16_u8(a0.val[0])); // a[x+1] - a[x]
- diff.val[1] =
- vsubq_u16(vreinterpretq_u16_u8(a1.val[1]),
- vreinterpretq_u16_u8(a0.val[1])); // a[x+1] - a[x]
- a32.val[0] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[0]),
- v_32); // a[x] * 32 + 16
- a32.val[1] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0.val[1]),
- v_32); // a[x] * 32 + 16
+ diff.val[0] = vsubl_u8(vget_low_u8(a1_128), vget_low_u8(a0_128));
+ diff.val[1] = vsubl_u8(vget_high_u8(a1_128), vget_high_u8(a0_128));
+ a32.val[0] = vmlal_u8(a16, vget_low_u8(a0_128), v_32);
+ a32.val[1] = vmlal_u8(a16, vget_high_u8(a0_128), v_32);
res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shift);
res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shift);
uint8x16_t v_temp =
vcombine_u8(vshrn_n_u16(res.val[0], 5), vshrn_n_u16(res.val[1], 5));
mask128 = vcgtq_u8(vqsubq_u8(max_base_x128, base_inc128), v_zero);
- res128 =
- vorrq_u8(vandq_u8(mask128, v_temp), vbicq_u8(a_mbase_x, mask128));
+ res128 = vbslq_u8(mask128, v_temp, a_mbase_x);
vst1q_u8(dst + j, res128);
base_inc128 = vaddq_u8(base_inc128, step);
@@ -1023,7 +1474,6 @@
break;
default: break;
}
- return;
}
/* ---------------------P R E D I C T I O N Z 2--------------------------- */
@@ -1077,11 +1527,17 @@
int16x4_t dy64 = vdup_n_s16(dy);
int16x4_t v_frac_bits_y = vdup_n_s16(-frac_bits_y);
int16x4_t min_base_y64 = vdup_n_s16(min_base_y);
- int16x4_t v_one = vdup_lane_s16(v_1234, 0);
+
+#if AOM_ARCH_AARCH64
+ // Use ext rather than loading left + 14 directly to avoid over-read.
+ const uint8x16_t left_m2 = vld1q_u8(left - 2);
+ const uint8x16_t left_0 = vld1q_u8(left);
+ const uint8x16_t left_14 = vextq_u8(left_0, left_0, 14);
+ const uint8x16x2_t left_vals = { { left_m2, left_14 } };
+#endif // AOM_ARCH_AARCH64
for (int r = 0; r < N; r++) {
uint16x8_t res, shift;
- uint16x4_t ydx;
uint8x8_t resx, resy;
uint16x4x2_t v_shift;
v_shift.val[1] = vdup_n_u16(0);
@@ -1105,7 +1561,7 @@
v_shift.val[0] = vreinterpret_u16_u8(v_zero_u8);
v_shift.val[1] = vreinterpret_u16_u8(v_zero_u8);
} else {
- ydx = vdup_n_u16(y * dx);
+ uint16x4_t ydx = vdup_n_u16(y * dx);
if (upsample_above) {
uint8x8x2_t v_tmp;
@@ -1128,29 +1584,39 @@
}
// y calc
- uint8x8_t a0_y, a1_y;
if (base_x < min_base_x) {
- DECLARE_ALIGNED(32, int16_t, base_y_c[4]);
int16x4_t v_r6 = vdup_n_s16(r << 6);
int16x4_t y_c64 = vmls_s16(v_r6, v_1234, dy64);
int16x4_t base_y_c64 = vshl_s16(y_c64, v_frac_bits_y);
uint16x4_t mask64 = vcgt_s16(min_base_y64, base_y_c64);
+ // Values in base_y_c64 range from -2 through 14 inclusive.
base_y_c64 = vbic_s16(base_y_c64, vreinterpret_s16_u16(mask64));
+
+#if AOM_ARCH_AARCH64
+ uint8x8_t left_idx0 = vreinterpret_u8_s16(base_y_c64 + 2); // [0, 16]
+ uint8x8_t left_idx1 = vreinterpret_u8_s16(base_y_c64 + 3); // [1, 17]
+
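+      // vqtbl2 gathers the left pixels in one lookup; interleaving with
+      // zero via vtrn1 leaves one pixel per 16-bit lane, the same layout
+      // the non-AArch64 path below produces with its lane loads.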
+ uint8x8_t a0_y = vtrn1_u8(vqtbl2_u8(left_vals, left_idx0), v_zero_u8);
+ uint8x8_t a1_y = vtrn1_u8(vqtbl2_u8(left_vals, left_idx1), v_zero_u8);
+#else // !AOM_ARCH_AARCH64
+ DECLARE_ALIGNED(32, int16_t, base_y_c[4]);
+
vst1_s16(base_y_c, base_y_c64);
- a0_y = v_zero_u8;
+ uint8x8_t a0_y = vdup_n_u8(0);
a0_y = vld1_lane_u8(left + base_y_c[0], a0_y, 0);
a0_y = vld1_lane_u8(left + base_y_c[1], a0_y, 2);
a0_y = vld1_lane_u8(left + base_y_c[2], a0_y, 4);
a0_y = vld1_lane_u8(left + base_y_c[3], a0_y, 6);
- base_y_c64 = vadd_s16(base_y_c64, v_one);
+ base_y_c64 = vadd_s16(base_y_c64, vdup_n_s16(1));
vst1_s16(base_y_c, base_y_c64);
- a1_y = v_zero_u8;
+ uint8x8_t a1_y = vdup_n_u8(0);
a1_y = vld1_lane_u8(left + base_y_c[0], a1_y, 0);
a1_y = vld1_lane_u8(left + base_y_c[1], a1_y, 2);
a1_y = vld1_lane_u8(left + base_y_c[2], a1_y, 4);
a1_y = vld1_lane_u8(left + base_y_c[3], a1_y, 6);
+#endif // AOM_ARCH_AARCH64
if (upsample_left) {
v_shift.val[1] = vshr_n_u16(
@@ -1173,7 +1639,7 @@
resy = vext_u8(resx, v_zero_u8, 4);
uint8x8_t mask = vld1_u8(BaseMask[base_min_diff]);
- uint8x8_t v_resxy = vorr_u8(vand_u8(mask, resy), vbic_u8(resx, mask));
+ uint8x8_t v_resxy = vbsl_u8(mask, resy, resx);
vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(v_resxy), 0);
dst += stride;
@@ -1217,27 +1683,31 @@
// above[x+1] - above[x]
// final pixels will be calculated as:
// (above[x] * 32 + 16 + (above[x+1] - above[x]) * shift) >> 5
- uint8x16x2_t a0_x, a1_x;
uint16x8x2_t diff, a32;
- uint16x8_t c1234, a16, c3f;
- uint8x16_t a0_x128, a1_x128;
- int16x8_t min_base_y128, dy128;
- uint16x8_t v_32 = vdupq_n_u16(32);
uint8x16_t v_zero = vdupq_n_u8(0);
int16x8_t v_upsample_left = vdupq_n_s16(upsample_left);
int16x8_t v_upsample_above = vdupq_n_s16(upsample_above);
int16x8_t v_frac_bits_y = vdupq_n_s16(-frac_bits_y);
- a16 = vdupq_n_u16(16);
- c3f = vdupq_n_u16(0x3f);
- min_base_y128 = vdupq_n_s16(min_base_y);
- dy128 = vdupq_n_s16(dy);
- c1234 = vcombine_u16(vcreate_u16(0x0004000300020001),
- vcreate_u16(0x0008000700060005));
+ uint16x8_t a16 = vdupq_n_u16(16);
+ uint16x8_t c3f = vdupq_n_u16(0x3f);
+ int16x8_t min_base_y128 = vdupq_n_s16(min_base_y);
+ int16x8_t dy128 = vdupq_n_s16(dy);
+ uint16x8_t c1234 = vcombine_u16(vcreate_u16(0x0004000300020001),
+ vcreate_u16(0x0008000700060005));
+
+#if AOM_ARCH_AARCH64
+ // Use ext rather than loading left + 30 directly to avoid over-read.
+ const uint8x16_t left_m2 = vld1q_u8(left - 2);
+ const uint8x16_t left_0 = vld1q_u8(left + 0);
+ const uint8x16_t left_16 = vld1q_u8(left + 16);
+ const uint8x16_t left_14 = vextq_u8(left_0, left_16, 14);
+ const uint8x16_t left_30 = vextq_u8(left_16, left_16, 14);
+ const uint8x16x3_t left_vals = { { left_m2, left_14, left_30 } };
+#endif // AOM_ARCH_AARCH64
for (int r = 0; r < N; r++) {
uint8x8_t resx, resy, resxy;
- uint16x8_t r6, ydx;
uint16x8x2_t res, shift;
shift.val[1] = vdupq_n_u16(0);
@@ -1255,16 +1725,16 @@
if (base_min_diff < 0) base_min_diff = 0;
}
+ uint8x8_t a0_x0, a1_x0;
if (base_shift > 7) {
- a0_x.val[0] = v_zero;
- a0_x.val[1] = v_zero;
- a1_x.val[0] = v_zero;
- a1_x.val[1] = v_zero;
+ a0_x0 = vdup_n_u8(0);
+ a1_x0 = vdup_n_u8(0);
shift.val[0] = vreinterpretq_u16_u8(v_zero);
shift.val[1] = vreinterpretq_u16_u8(v_zero);
} else {
- ydx = vdupq_n_u16(y * dx);
- r6 = vshlq_n_u16(vextq_u16(c1234, vreinterpretq_u16_u8(v_zero), 2), 6);
+ uint16x8_t ydx = vdupq_n_u16(y * dx);
+ uint16x8_t r6 =
+ vshlq_n_u16(vextq_u16(c1234, vreinterpretq_u16_u8(v_zero), 2), 6);
if (upsample_above) {
uint8x8x2_t v_tmp;
@@ -1274,24 +1744,27 @@
uint8x8_t v_index_high = vld1_u8(EvenOddMaskx[base_shift] + 8);
shift.val[0] = vshrq_n_u16(
vandq_u16(vshlq_u16(vsubq_u16(r6, ydx), v_upsample_above), c3f), 1);
- a0_x.val[0] =
- vreinterpretq_u8_u16(vmovl_u8(vtbl2_u8(v_tmp, v_index_low)));
- a1_x.val[0] =
- vreinterpretq_u8_u16(vmovl_u8(vtbl2_u8(v_tmp, v_index_high)));
+ a0_x0 = vtbl2_u8(v_tmp, v_index_low);
+ a1_x0 = vtbl2_u8(v_tmp, v_index_high);
} else {
+ uint8x16_t a0_x128, a1_x128;
a0_x128 = vld1q_u8(above + base_x + base_shift);
a1_x128 = vextq_u8(a0_x128, v_zero, 1);
vector_shuffle(&a0_x128, &v_zero, base_shift);
vector_shuffle(&a1_x128, &v_zero, base_shift);
shift.val[0] = vshrq_n_u16(vandq_u16(vsubq_u16(r6, ydx), c3f), 1);
- a0_x.val[0] = vreinterpretq_u8_u16(vmovl_u8(vget_low_u8(a0_x128)));
- a1_x.val[0] = vreinterpretq_u8_u16(vmovl_u8(vget_low_u8(a1_x128)));
+ a0_x0 = vget_low_u8(a0_x128);
+ a1_x0 = vget_low_u8(a1_x128);
}
}
+ diff.val[0] = vsubl_u8(a1_x0, a0_x0); // a[x+1] - a[x]
+ a32.val[0] = vmlal_u8(a16, a0_x0, vdup_n_u8(32)); // a[x] * 32 + 16
+ res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shift.val[0]);
+ resx = vshrn_n_u16(res.val[0], 5);
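+    // resx now holds the full x-direction (above-row) prediction for this
+    // row; when base_x >= min_base_x it is stored as-is and the left-column
+    // calculation below is skipped.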
+
// y calc
if (base_x < min_base_x) {
- DECLARE_ALIGNED(32, int16_t, base_y_c[16]);
int16x8_t y_c128, base_y_c128;
uint16x8_t mask128;
int16x8_t v_r6 = vdupq_n_s16(r << 6);
@@ -1300,30 +1773,43 @@
base_y_c128 = vshlq_s16(y_c128, v_frac_bits_y);
mask128 = vcgtq_s16(min_base_y128, base_y_c128);
+ // Values in base_y_c128 range from -2 through 31 inclusive.
base_y_c128 = vbicq_s16(base_y_c128, vreinterpretq_s16_u16(mask128));
- vst1q_s16(base_y_c, base_y_c128);
- a0_x.val[1] = v_zero;
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[0], a0_x.val[1], 0);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[1], a0_x.val[1], 2);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[2], a0_x.val[1], 4);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[3], a0_x.val[1], 6);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[4], a0_x.val[1], 8);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[5], a0_x.val[1], 10);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[6], a0_x.val[1], 12);
- a0_x.val[1] = vld1q_lane_u8(left + base_y_c[7], a0_x.val[1], 14);
- base_y_c128 =
- vaddq_s16(base_y_c128, vreinterpretq_s16_u16(vshrq_n_u16(a16, 4)));
+#if AOM_ARCH_AARCH64
+ uint8x16_t left_idx0 = vreinterpretq_u8_s16(base_y_c128 + 2); // [0, 33]
+ uint8x16_t left_idx1 = vreinterpretq_u8_s16(base_y_c128 + 3); // [1, 34]
+ uint8x16_t left_idx01 = vuzp1q_u8(left_idx0, left_idx1);
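+      // vuzp1 keeps the low byte of each 16-bit index, packing the a0
+      // indices into the low half and the a1 indices into the high half, so
+      // one vqtbl3 lookup gathers both rows at once.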
+
+ uint8x16_t a01_x = vqtbl3q_u8(left_vals, left_idx01);
+ uint8x8_t a0_x1 = vget_low_u8(a01_x);
+ uint8x8_t a1_x1 = vget_high_u8(a01_x);
+#else // !AOM_ARCH_AARCH64
+ DECLARE_ALIGNED(32, int16_t, base_y_c[16]);
+
vst1q_s16(base_y_c, base_y_c128);
- a1_x.val[1] = v_zero;
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[0], a1_x.val[1], 0);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[1], a1_x.val[1], 2);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[2], a1_x.val[1], 4);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[3], a1_x.val[1], 6);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[4], a1_x.val[1], 8);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[5], a1_x.val[1], 10);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[6], a1_x.val[1], 12);
- a1_x.val[1] = vld1q_lane_u8(left + base_y_c[7], a1_x.val[1], 14);
+ uint8x8_t a0_x1 = vdup_n_u8(0);
+ a0_x1 = vld1_lane_u8(left + base_y_c[0], a0_x1, 0);
+ a0_x1 = vld1_lane_u8(left + base_y_c[1], a0_x1, 1);
+ a0_x1 = vld1_lane_u8(left + base_y_c[2], a0_x1, 2);
+ a0_x1 = vld1_lane_u8(left + base_y_c[3], a0_x1, 3);
+ a0_x1 = vld1_lane_u8(left + base_y_c[4], a0_x1, 4);
+ a0_x1 = vld1_lane_u8(left + base_y_c[5], a0_x1, 5);
+ a0_x1 = vld1_lane_u8(left + base_y_c[6], a0_x1, 6);
+ a0_x1 = vld1_lane_u8(left + base_y_c[7], a0_x1, 7);
+
+ base_y_c128 = vaddq_s16(base_y_c128, vdupq_n_s16(1));
+ vst1q_s16(base_y_c, base_y_c128);
+ uint8x8_t a1_x1 = vdup_n_u8(0);
+ a1_x1 = vld1_lane_u8(left + base_y_c[0], a1_x1, 0);
+ a1_x1 = vld1_lane_u8(left + base_y_c[1], a1_x1, 1);
+ a1_x1 = vld1_lane_u8(left + base_y_c[2], a1_x1, 2);
+ a1_x1 = vld1_lane_u8(left + base_y_c[3], a1_x1, 3);
+ a1_x1 = vld1_lane_u8(left + base_y_c[4], a1_x1, 4);
+ a1_x1 = vld1_lane_u8(left + base_y_c[5], a1_x1, 5);
+ a1_x1 = vld1_lane_u8(left + base_y_c[6], a1_x1, 6);
+ a1_x1 = vld1_lane_u8(left + base_y_c[7], a1_x1, 7);
+#endif // AOM_ARCH_AARCH64
if (upsample_left) {
shift.val[1] = vshrq_n_u16(
@@ -1334,26 +1820,18 @@
shift.val[1] =
vshrq_n_u16(vandq_u16(vreinterpretq_u16_s16(y_c128), c3f), 1);
}
+
+ diff.val[1] = vsubl_u8(a1_x1, a0_x1);
+ a32.val[1] = vmlal_u8(a16, a0_x1, vdup_n_u8(32));
+ res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shift.val[1]);
+ resy = vshrn_n_u16(res.val[1], 5);
+ uint8x8_t mask = vld1_u8(BaseMask[base_min_diff]);
+ resxy = vbsl_u8(mask, resy, resx);
+ vst1_u8(dst, resxy);
+ } else {
+ vst1_u8(dst, resx);
}
- diff.val[0] =
- vsubq_u16(vreinterpretq_u16_u8(a1_x.val[0]),
- vreinterpretq_u16_u8(a0_x.val[0])); // a[x+1] - a[x]
- diff.val[1] =
- vsubq_u16(vreinterpretq_u16_u8(a1_x.val[1]),
- vreinterpretq_u16_u8(a0_x.val[1])); // a[x+1] - a[x]
- a32.val[0] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0_x.val[0]),
- v_32); // a[x] * 32 + 16
- a32.val[1] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0_x.val[1]),
- v_32); // a[x] * 32 + 16
- res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shift.val[0]);
- res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shift.val[1]);
- resx = vshrn_n_u16(res.val[0], 5);
- resy = vshrn_n_u16(res.val[1], 5);
- uint8x8_t mask = vld1_u8(BaseMask[base_min_diff]);
-
- resxy = vorr_u8(vand_u8(mask, resy), vbic_u8(resx, mask));
- vst1_u8(dst, resxy);
dst += stride;
}
}
@@ -1371,22 +1849,17 @@
const int frac_bits_x = 6;
const int frac_bits_y = 6;
- uint16x8_t a16, c1, c3f;
- int16x8_t min_base_y256, dy256;
uint16x8x2_t a32, c0123, c1234, diff, shifty;
- uint8x16x2_t a0_x, a1_x, a0_y, a1_y;
- uint8x16_t a0_x128, a1_x128;
+ uint8x16x2_t a0_x, a1_x;
uint16x8_t v_32 = vdupq_n_u16(32);
uint8x16_t v_zero = vdupq_n_u8(0);
int16x8_t v_frac_bits_y = vdupq_n_s16(-frac_bits_y);
- DECLARE_ALIGNED(32, int16_t, base_y_c[16]);
-
- a16 = vdupq_n_u16(16);
- c1 = vshrq_n_u16(a16, 4);
- min_base_y256 = vdupq_n_s16(min_base_y);
- c3f = vdupq_n_u16(0x3f);
- dy256 = vdupq_n_s16(dy);
+ uint16x8_t a16 = vdupq_n_u16(16);
+ uint16x8_t c1 = vshrq_n_u16(a16, 4);
+ int16x8_t min_base_y256 = vdupq_n_s16(min_base_y);
+ uint16x8_t c3f = vdupq_n_u16(0x3f);
+ int16x8_t dy256 = vdupq_n_s16(dy);
c0123.val[0] = vcombine_u16(vcreate_u16(0x0003000200010000),
vcreate_u16(0x0007000600050004));
c0123.val[1] = vcombine_u16(vcreate_u16(0x000B000A00090008),
@@ -1394,12 +1867,25 @@
c1234.val[0] = vaddq_u16(c0123.val[0], c1);
c1234.val[1] = vaddq_u16(c0123.val[1], c1);
+#if AOM_ARCH_AARCH64
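+  // Build tbl lookup tables covering left[-1] through left[62]: left_vals0
+  // is offset by -1 and left_vals1 by 0, so the same indices (base_y + 1)
+  // gather left[base_y] and left[base_y + 1] respectively.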
+ const uint8x16_t left_m1 = vld1q_u8(left - 1);
+ const uint8x16_t left_0 = vld1q_u8(left + 0);
+ const uint8x16_t left_16 = vld1q_u8(left + 16);
+ const uint8x16_t left_32 = vld1q_u8(left + 32);
+ const uint8x16_t left_48 = vld1q_u8(left + 48);
+ const uint8x16_t left_15 = vextq_u8(left_0, left_16, 15);
+ const uint8x16_t left_31 = vextq_u8(left_16, left_32, 15);
+ const uint8x16_t left_47 = vextq_u8(left_32, left_48, 15);
+ const uint8x16x4_t left_vals0 = { { left_m1, left_15, left_31, left_47 } };
+ const uint8x16x4_t left_vals1 = { { left_0, left_16, left_32, left_48 } };
+#endif // AOM_ARCH_AARCH64
+
for (int r = 0; r < H; r++) {
uint16x8x2_t res, r6, shift;
- uint16x8_t ydx, j256;
+ uint16x8_t j256;
uint8x16_t resx, resy, resxy;
int y = r + 1;
- ydx = vdupq_n_u16((uint16_t)(y * dx));
+ uint16x8_t ydx = vdupq_n_u16((uint16_t)(y * dx));
int base_x = (-y * dx) >> frac_bits_x;
for (int j = 0; j < W; j += 16) {
@@ -1417,6 +1903,7 @@
}
if (base_shift < 16) {
+ uint8x16_t a0_x128, a1_x128;
a0_x128 = vld1q_u8(above + base_x + base_shift + j);
a1_x128 = vld1q_u8(above + base_x + base_shift + 1 + j);
vector_shuffle(&a0_x128, &v_zero, base_shift);
@@ -1471,19 +1958,20 @@
mask256.val[0] = vcgtq_s16(min_base_y256, base_y_c256.val[0]);
mask256.val[1] = vcgtq_s16(min_base_y256, base_y_c256.val[1]);
- base_y_c256.val[0] = vorrq_s16(
- vandq_s16(vreinterpretq_s16_u16(mask256.val[0]), min_base_y256),
- vbicq_s16(base_y_c256.val[0],
- vreinterpretq_s16_u16(mask256.val[0])));
- base_y_c256.val[1] = vorrq_s16(
- vandq_s16(vreinterpretq_s16_u16(mask256.val[1]), min_base_y256),
- vbicq_s16(base_y_c256.val[1],
- vreinterpretq_s16_u16(mask256.val[1])));
+ base_y_c256.val[0] =
+ vbslq_s16(mask256.val[0], min_base_y256, base_y_c256.val[0]);
+ base_y_c256.val[1] =
+ vbslq_s16(mask256.val[1], min_base_y256, base_y_c256.val[1]);
int16_t min_y = vgetq_lane_s16(base_y_c256.val[1], 7);
int16_t max_y = vgetq_lane_s16(base_y_c256.val[0], 0);
int16_t offset_diff = max_y - min_y;
+ uint8x8_t a0_y0;
+ uint8x8_t a0_y1;
+ uint8x8_t a1_y0;
+ uint8x8_t a1_y1;
+
if (offset_diff < 16) {
assert(offset_diff >= 0);
int16x8_t min_y256 =
@@ -1503,7 +1991,7 @@
a0_y128 = vandq_u8(a0_y128, v_loadmaskz2);
a1_y128 = vld1q_u8(left + min_y + 1);
a1_y128 = vandq_u8(a1_y128, v_loadmaskz2);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
a0_y128 = vqtbl1q_u8(a0_y128, vreinterpretq_u8_s8(base_y_offset128));
a1_y128 = vqtbl1q_u8(a1_y128, vreinterpretq_u8_s8(base_y_offset128));
#else
@@ -1524,73 +2012,91 @@
v_res.val[1] = vtbl2_u8(v_tmp, v_index_high);
a1_y128 = vcombine_u8(v_res.val[0], v_res.val[1]);
#endif
- a0_y = vzipq_u8(a0_y128, v_zero);
- a1_y = vzipq_u8(a1_y128, v_zero);
+ a0_y0 = vget_low_u8(a0_y128);
+ a0_y1 = vget_high_u8(a0_y128);
+ a1_y0 = vget_low_u8(a1_y128);
+ a1_y1 = vget_high_u8(a1_y128);
} else {
+ // Values in base_y_c256 range from -1 through 62 inclusive.
base_y_c256.val[0] = vbicq_s16(base_y_c256.val[0],
vreinterpretq_s16_u16(mask256.val[0]));
base_y_c256.val[1] = vbicq_s16(base_y_c256.val[1],
vreinterpretq_s16_u16(mask256.val[1]));
+
+#if AOM_ARCH_AARCH64
+ // Values in left_idx{0,1} range from 0 through 63 inclusive.
+ uint8x16_t left_idx0 = vreinterpretq_u8_s16(base_y_c256.val[0] + 1);
+ uint8x16_t left_idx1 = vreinterpretq_u8_s16(base_y_c256.val[1] + 1);
+
+ uint8x16_t left_idx01 = vuzp1q_u8(left_idx0, left_idx1);
+
+ uint8x16_t a0_y01 = vqtbl4q_u8(left_vals0, left_idx01);
+ uint8x16_t a1_y01 = vqtbl4q_u8(left_vals1, left_idx01);
+
+ a0_y0 = vget_low_u8(a0_y01);
+ a0_y1 = vget_high_u8(a0_y01);
+ a1_y0 = vget_low_u8(a1_y01);
+ a1_y1 = vget_high_u8(a1_y01);
+#else // !AOM_ARCH_AARCH64
+ DECLARE_ALIGNED(32, int16_t, base_y_c[16]);
+
vst1q_s16(base_y_c, base_y_c256.val[0]);
vst1q_s16(base_y_c + 8, base_y_c256.val[1]);
- a0_y.val[0] = v_zero;
- a0_y.val[1] = v_zero;
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[0], a0_y.val[0], 0);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[1], a0_y.val[0], 2);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[2], a0_y.val[0], 4);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[3], a0_y.val[0], 6);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[4], a0_y.val[0], 8);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[5], a0_y.val[0], 10);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[6], a0_y.val[0], 12);
- a0_y.val[0] = vld1q_lane_u8(left + base_y_c[7], a0_y.val[0], 14);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[8], a0_y.val[1], 0);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[9], a0_y.val[1], 2);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[10], a0_y.val[1], 4);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[11], a0_y.val[1], 6);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[12], a0_y.val[1], 8);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[13], a0_y.val[1], 10);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[14], a0_y.val[1], 12);
- a0_y.val[1] = vld1q_lane_u8(left + base_y_c[15], a0_y.val[1], 14);
+ a0_y0 = vdup_n_u8(0);
+ a0_y0 = vld1_lane_u8(left + base_y_c[0], a0_y0, 0);
+ a0_y0 = vld1_lane_u8(left + base_y_c[1], a0_y0, 1);
+ a0_y0 = vld1_lane_u8(left + base_y_c[2], a0_y0, 2);
+ a0_y0 = vld1_lane_u8(left + base_y_c[3], a0_y0, 3);
+ a0_y0 = vld1_lane_u8(left + base_y_c[4], a0_y0, 4);
+ a0_y0 = vld1_lane_u8(left + base_y_c[5], a0_y0, 5);
+ a0_y0 = vld1_lane_u8(left + base_y_c[6], a0_y0, 6);
+ a0_y0 = vld1_lane_u8(left + base_y_c[7], a0_y0, 7);
+ a0_y1 = vdup_n_u8(0);
+ a0_y1 = vld1_lane_u8(left + base_y_c[8], a0_y1, 0);
+ a0_y1 = vld1_lane_u8(left + base_y_c[9], a0_y1, 1);
+ a0_y1 = vld1_lane_u8(left + base_y_c[10], a0_y1, 2);
+ a0_y1 = vld1_lane_u8(left + base_y_c[11], a0_y1, 3);
+ a0_y1 = vld1_lane_u8(left + base_y_c[12], a0_y1, 4);
+ a0_y1 = vld1_lane_u8(left + base_y_c[13], a0_y1, 5);
+ a0_y1 = vld1_lane_u8(left + base_y_c[14], a0_y1, 6);
+ a0_y1 = vld1_lane_u8(left + base_y_c[15], a0_y1, 7);
base_y_c256.val[0] =
vaddq_s16(base_y_c256.val[0], vreinterpretq_s16_u16(c1));
base_y_c256.val[1] =
vaddq_s16(base_y_c256.val[1], vreinterpretq_s16_u16(c1));
+
vst1q_s16(base_y_c, base_y_c256.val[0]);
vst1q_s16(base_y_c + 8, base_y_c256.val[1]);
- a1_y.val[0] = v_zero;
- a1_y.val[1] = v_zero;
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[0], a1_y.val[0], 0);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[1], a1_y.val[0], 2);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[2], a1_y.val[0], 4);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[3], a1_y.val[0], 6);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[4], a1_y.val[0], 8);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[5], a1_y.val[0], 10);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[6], a1_y.val[0], 12);
- a1_y.val[0] = vld1q_lane_u8(left + base_y_c[7], a1_y.val[0], 14);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[8], a1_y.val[1], 0);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[9], a1_y.val[1], 2);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[10], a1_y.val[1], 4);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[11], a1_y.val[1], 6);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[12], a1_y.val[1], 8);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[13], a1_y.val[1], 10);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[14], a1_y.val[1], 12);
- a1_y.val[1] = vld1q_lane_u8(left + base_y_c[15], a1_y.val[1], 14);
+ a1_y0 = vdup_n_u8(0);
+ a1_y0 = vld1_lane_u8(left + base_y_c[0], a1_y0, 0);
+ a1_y0 = vld1_lane_u8(left + base_y_c[1], a1_y0, 1);
+ a1_y0 = vld1_lane_u8(left + base_y_c[2], a1_y0, 2);
+ a1_y0 = vld1_lane_u8(left + base_y_c[3], a1_y0, 3);
+ a1_y0 = vld1_lane_u8(left + base_y_c[4], a1_y0, 4);
+ a1_y0 = vld1_lane_u8(left + base_y_c[5], a1_y0, 5);
+ a1_y0 = vld1_lane_u8(left + base_y_c[6], a1_y0, 6);
+ a1_y0 = vld1_lane_u8(left + base_y_c[7], a1_y0, 7);
+ a1_y1 = vdup_n_u8(0);
+ a1_y1 = vld1_lane_u8(left + base_y_c[8], a1_y1, 0);
+ a1_y1 = vld1_lane_u8(left + base_y_c[9], a1_y1, 1);
+ a1_y1 = vld1_lane_u8(left + base_y_c[10], a1_y1, 2);
+ a1_y1 = vld1_lane_u8(left + base_y_c[11], a1_y1, 3);
+ a1_y1 = vld1_lane_u8(left + base_y_c[12], a1_y1, 4);
+ a1_y1 = vld1_lane_u8(left + base_y_c[13], a1_y1, 5);
+ a1_y1 = vld1_lane_u8(left + base_y_c[14], a1_y1, 6);
+ a1_y1 = vld1_lane_u8(left + base_y_c[15], a1_y1, 7);
+#endif // AOM_ARCH_AARCH64
}
+
shifty.val[0] = vshrq_n_u16(
vandq_u16(vreinterpretq_u16_s16(y_c256.val[0]), c3f), 1);
shifty.val[1] = vshrq_n_u16(
vandq_u16(vreinterpretq_u16_s16(y_c256.val[1]), c3f), 1);
- diff.val[0] =
- vsubq_u16(vreinterpretq_u16_u8(a1_y.val[0]),
- vreinterpretq_u16_u8(a0_y.val[0])); // a[x+1] - a[x]
- diff.val[1] =
- vsubq_u16(vreinterpretq_u16_u8(a1_y.val[1]),
- vreinterpretq_u16_u8(a0_y.val[1])); // a[x+1] - a[x]
- a32.val[0] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0_y.val[0]),
- v_32); // a[x] * 32 + 16
- a32.val[1] = vmlaq_u16(a16, vreinterpretq_u16_u8(a0_y.val[1]),
- v_32); // a[x] * 32 + 16
+ diff.val[0] = vsubl_u8(a1_y0, a0_y0); // a[x+1] - a[x]
+ diff.val[1] = vsubl_u8(a1_y1, a0_y1); // a[x+1] - a[x]
+ a32.val[0] = vmlal_u8(a16, a0_y0, vdup_n_u8(32)); // a[x] * 32 + 16
+ a32.val[1] = vmlal_u8(a16, a0_y1, vdup_n_u8(32)); // a[x] * 32 + 16
res.val[0] = vmlaq_u16(a32.val[0], diff.val[0], shifty.val[0]);
res.val[1] = vmlaq_u16(a32.val[1], diff.val[1], shifty.val[1]);
@@ -1600,7 +2106,7 @@
resy = v_zero;
}
uint8x16_t mask = vld1q_u8(BaseMask[base_min_diff]);
- resxy = vorrq_u8(vandq_u8(mask, resy), vbicq_u8(resx, mask));
+ resxy = vbslq_u8(mask, resy, resx);
vst1q_u8(dst + j, resxy);
} // for j
dst += stride;
@@ -1629,7 +2135,6 @@
upsample_above, upsample_left, dx, dy);
break;
}
- return;
}
/* ---------------------P R E D I C T I O N Z 3--------------------------- */
@@ -1813,7 +2318,7 @@
w11 = vzipq_u32(vreinterpretq_u32_u16(w6.val[1]),
vreinterpretq_u32_u16(w7.val[1]));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[0] = vzip1q_u64(vreinterpretq_u64_u32(w8.val[0]),
vreinterpretq_u64_u32(w9.val[0]));
d[1] = vzip2q_u64(vreinterpretq_u64_u32(w8.val[0]),
@@ -1883,7 +2388,7 @@
w15 = vzipq_u32(vreinterpretq_u32_u16(w10.val[1]),
vreinterpretq_u32_u16(w11.val[1]));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[0] = vzip1q_u64(vreinterpretq_u64_u32(w12.val[0]),
vreinterpretq_u64_u32(w13.val[0]));
d[1] = vzip2q_u64(vreinterpretq_u64_u32(w12.val[0]),
@@ -1938,7 +2443,7 @@
w15 = vzipq_u32(vreinterpretq_u32_u16(w10.val[1]),
vreinterpretq_u32_u16(w11.val[1]));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[8] = vzip1q_u64(vreinterpretq_u64_u32(w12.val[0]),
vreinterpretq_u64_u32(w13.val[0]));
d[9] = vzip2q_u64(vreinterpretq_u64_u32(w12.val[0]),
@@ -2011,7 +2516,7 @@
// Store first 4-line result
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[0].val[0] = vzip1q_u64(vreinterpretq_u64_u32(w6.val[0]),
vreinterpretq_u64_u32(w14.val[0]));
d[0].val[1] = vzip2q_u64(vreinterpretq_u64_u32(w6.val[0]),
@@ -2067,7 +2572,7 @@
// Store second 4-line result
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[4].val[0] = vzip1q_u64(vreinterpretq_u64_u32(w6.val[0]),
vreinterpretq_u64_u32(w14.val[0]));
d[4].val[1] = vzip2q_u64(vreinterpretq_u64_u32(w6.val[0]),
@@ -2134,7 +2639,7 @@
// Store first 4-line result
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[8].val[0] = vzip1q_u64(vreinterpretq_u64_u32(w6.val[0]),
vreinterpretq_u64_u32(w14.val[0]));
d[8].val[1] = vzip2q_u64(vreinterpretq_u64_u32(w6.val[0]),
@@ -2190,7 +2695,7 @@
// Store second 4-line result
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
d[12].val[0] = vzip1q_u64(vreinterpretq_u64_u32(w6.val[0]),
vreinterpretq_u64_u32(w14.val[0]));
d[12].val[1] = vzip2q_u64(vreinterpretq_u64_u32(w6.val[0]),
@@ -3212,7 +3717,7 @@
int width, int height) {
const uint8x8_t top_left = vdup_n_u8(top_row[-1]);
const uint16x8_t top_left_x2 = vdupq_n_u16(top_row[-1] + top_row[-1]);
- uint8x8_t top;
+ uint8x8_t UNINITIALIZED_IS_SAFE(top);
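+  // top is fully written by one of the two branches below; the macro only
+  // suppresses spurious -Wmaybe-uninitialized warnings about the
+  // partial-lane load in the width == 4 case.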
if (width == 4) {
load_u8_4x1(top_row, &top, 0);
} else { // width == 8
diff --git a/aom_dsp/arm/loopfilter_neon.c b/aom_dsp/arm/loopfilter_neon.c
index f3f86a2..8fc7ccb 100644
--- a/aom_dsp/arm/loopfilter_neon.c
+++ b/aom_dsp/arm/loopfilter_neon.c
@@ -628,7 +628,7 @@
// row1: x p6 p5 p4 p3 p2 p1 p0 | q0 q1 q2 q3 q4 q5 q6 y
// row2: x p6 p5 p4 p3 p2 p1 p0 | q0 q1 q2 q3 q4 q5 q6 y
// row3: x p6 p5 p4 p3 p2 p1 p0 | q0 q1 q2 q3 q4 q5 q6 y
- load_u8_8x16(src - 8, stride, &row0, &row1, &row2, &row3);
+ load_u8_16x4(src - 8, stride, &row0, &row1, &row2, &row3);
pxp3 = vget_low_u8(row0);
p6p2 = vget_low_u8(row1);
@@ -841,8 +841,7 @@
// row1: p1 p0 | q0 q1
// row2: p1 p0 | q0 q1
// row3: p1 p0 | q0 q1
- load_unaligned_u8_4x4(src - 2, stride, (uint32x2_t *)&p1p0,
- (uint32x2_t *)&q0q1);
+ load_unaligned_u8_4x4(src - 2, stride, &p1p0, &q0q1);
transpose_u8_4x4(&p1p0, &q0q1);
@@ -1037,7 +1036,7 @@
void aom_lpf_horizontal_4_neon(uint8_t *src, int stride, const uint8_t *blimit,
const uint8_t *limit, const uint8_t *thresh) {
- uint8x8_t p0q0, UNINITIALIZED_IS_SAFE(p1q1);
+ uint8x8_t UNINITIALIZED_IS_SAFE(p0q0), UNINITIALIZED_IS_SAFE(p1q1);
load_u8_4x1(src - 2 * stride, &p1q1, 0);
load_u8_4x1(src - 1 * stride, &p0q0, 0);
diff --git a/aom_dsp/arm/masked_sad4d_neon.c b/aom_dsp/arm/masked_sad4d_neon.c
new file mode 100644
index 0000000..98daeda
--- /dev/null
+++ b/aom_dsp/arm/masked_sad4d_neon.c
@@ -0,0 +1,563 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/blend.h"
+#include "mem_neon.h"
+#include "sum_neon.h"
+
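+// Blend one row of 16 pixels from a0 and b0 with the 6-bit alpha mask m0,
+// pred = (m0 * a0 + (64 - m0) * b0 + 32) >> 6, and accumulate the absolute
+// differences against the source row s0 pairwise into the eight 16-bit
+// lanes of sad.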
+static INLINE uint16x8_t masked_sad_16x1_neon(uint16x8_t sad,
+ const uint8x16_t s0,
+ const uint8x16_t a0,
+ const uint8x16_t b0,
+ const uint8x16_t m0) {
+ uint8x16_t m0_inv = vsubq_u8(vdupq_n_u8(AOM_BLEND_A64_MAX_ALPHA), m0);
+ uint16x8_t blend_u16_lo = vmull_u8(vget_low_u8(m0), vget_low_u8(a0));
+ uint16x8_t blend_u16_hi = vmull_u8(vget_high_u8(m0), vget_high_u8(a0));
+ blend_u16_lo = vmlal_u8(blend_u16_lo, vget_low_u8(m0_inv), vget_low_u8(b0));
+ blend_u16_hi = vmlal_u8(blend_u16_hi, vget_high_u8(m0_inv), vget_high_u8(b0));
+
+ uint8x8_t blend_u8_lo = vrshrn_n_u16(blend_u16_lo, AOM_BLEND_A64_ROUND_BITS);
+ uint8x8_t blend_u8_hi = vrshrn_n_u16(blend_u16_hi, AOM_BLEND_A64_ROUND_BITS);
+ uint8x16_t blend_u8 = vcombine_u8(blend_u8_lo, blend_u8_hi);
+ return vpadalq_u8(sad, vabdq_u8(blend_u8, s0));
+}
+
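+// Masked 4D SAD for large blocks, with the mask weighting second_pred and
+// its complement weighting the reference (the inverted-mask case). Row sums
+// are kept in 16-bit lanes, which can only absorb h_overflow rows of
+// absolute differences without overflowing, so they are flushed into the
+// 32-bit accumulators every h_overflow rows (32 rows at width 128, 64 rows
+// at width 64, as set by the wrappers below).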
+static INLINE void masked_inv_sadwxhx4d_large_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int width, int height, int h_overflow) {
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ int h_limit = height > h_overflow ? h_overflow : height;
+
+ int ref_offset = 0;
+ int i = 0;
+ do {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ do {
+ int j = 0;
+ do {
+ uint8x16_t s0 = vld1q_u8(src + j);
+ uint8x16_t p0 = vld1q_u8(second_pred + j);
+ uint8x16_t m0 = vld1q_u8(mask + j);
+ sum_lo[0] = masked_sad_16x1_neon(sum_lo[0], s0, p0,
+ vld1q_u8(ref[0] + ref_offset + j), m0);
+ sum_lo[1] = masked_sad_16x1_neon(sum_lo[1], s0, p0,
+ vld1q_u8(ref[1] + ref_offset + j), m0);
+ sum_lo[2] = masked_sad_16x1_neon(sum_lo[2], s0, p0,
+ vld1q_u8(ref[2] + ref_offset + j), m0);
+ sum_lo[3] = masked_sad_16x1_neon(sum_lo[3], s0, p0,
+ vld1q_u8(ref[3] + ref_offset + j), m0);
+
+ uint8x16_t s1 = vld1q_u8(src + j + 16);
+ uint8x16_t p1 = vld1q_u8(second_pred + j + 16);
+ uint8x16_t m1 = vld1q_u8(mask + j + 16);
+ sum_hi[0] = masked_sad_16x1_neon(
+ sum_hi[0], s1, p1, vld1q_u8(ref[0] + ref_offset + j + 16), m1);
+ sum_hi[1] = masked_sad_16x1_neon(
+ sum_hi[1], s1, p1, vld1q_u8(ref[1] + ref_offset + j + 16), m1);
+ sum_hi[2] = masked_sad_16x1_neon(
+ sum_hi[2], s1, p1, vld1q_u8(ref[2] + ref_offset + j + 16), m1);
+ sum_hi[3] = masked_sad_16x1_neon(
+ sum_hi[3], s1, p1, vld1q_u8(ref[3] + ref_offset + j + 16), m1);
+
+ j += 32;
+ } while (j < width);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += width;
+ mask += mask_stride;
+ } while (++i < h_limit);
+
+ sum[0] = vpadalq_u16(sum[0], sum_lo[0]);
+ sum[0] = vpadalq_u16(sum[0], sum_hi[0]);
+ sum[1] = vpadalq_u16(sum[1], sum_lo[1]);
+ sum[1] = vpadalq_u16(sum[1], sum_hi[1]);
+ sum[2] = vpadalq_u16(sum[2], sum_lo[2]);
+ sum[2] = vpadalq_u16(sum[2], sum_hi[2]);
+ sum[3] = vpadalq_u16(sum[3], sum_lo[3]);
+ sum[3] = vpadalq_u16(sum[3], sum_hi[3]);
+
+ h_limit += h_overflow;
+ } while (i < height);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
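
The h_overflow chunking above bounds the 16-bit accumulators: each masked_sad_16x1_neon call adds at most 2 * 255 = 510 to a uint16 lane (vpadalq_u8 of two absolute differences), and for width 128 the j-loop touches each sum_lo/sum_hi lane four times per row, i.e. at most 4 * 510 = 2040 per row. Flushing into the 32-bit sums every h_overflow = 32 rows therefore caps a lane at 32 * 2040 = 65280 < 65535; the width-64 variant below uses h_overflow = 64 for the same bound (64 * 2 * 510 = 65280).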
+
+static INLINE void masked_inv_sad128xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ masked_inv_sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, second_pred,
+ mask, mask_stride, res, 128, h, 32);
+}
+
+static INLINE void masked_inv_sad64xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ masked_inv_sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, second_pred,
+ mask, mask_stride, res, 64, h, 64);
+}
+
+static INLINE void masked_sadwxhx4d_large_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int width, int height, int h_overflow) {
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ int h_limit = height > h_overflow ? h_overflow : height;
+
+ int ref_offset = 0;
+ int i = 0;
+ do {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ do {
+ int j = 0;
+ do {
+ uint8x16_t s0 = vld1q_u8(src + j);
+ uint8x16_t p0 = vld1q_u8(second_pred + j);
+ uint8x16_t m0 = vld1q_u8(mask + j);
+ sum_lo[0] = masked_sad_16x1_neon(
+ sum_lo[0], s0, vld1q_u8(ref[0] + ref_offset + j), p0, m0);
+ sum_lo[1] = masked_sad_16x1_neon(
+ sum_lo[1], s0, vld1q_u8(ref[1] + ref_offset + j), p0, m0);
+ sum_lo[2] = masked_sad_16x1_neon(
+ sum_lo[2], s0, vld1q_u8(ref[2] + ref_offset + j), p0, m0);
+ sum_lo[3] = masked_sad_16x1_neon(
+ sum_lo[3], s0, vld1q_u8(ref[3] + ref_offset + j), p0, m0);
+
+ uint8x16_t s1 = vld1q_u8(src + j + 16);
+ uint8x16_t p1 = vld1q_u8(second_pred + j + 16);
+ uint8x16_t m1 = vld1q_u8(mask + j + 16);
+ sum_hi[0] = masked_sad_16x1_neon(
+ sum_hi[0], s1, vld1q_u8(ref[0] + ref_offset + j + 16), p1, m1);
+ sum_hi[1] = masked_sad_16x1_neon(
+ sum_hi[1], s1, vld1q_u8(ref[1] + ref_offset + j + 16), p1, m1);
+ sum_hi[2] = masked_sad_16x1_neon(
+ sum_hi[2], s1, vld1q_u8(ref[2] + ref_offset + j + 16), p1, m1);
+ sum_hi[3] = masked_sad_16x1_neon(
+ sum_hi[3], s1, vld1q_u8(ref[3] + ref_offset + j + 16), p1, m1);
+
+ j += 32;
+ } while (j < width);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += width;
+ mask += mask_stride;
+ } while (++i < h_limit);
+
+ sum[0] = vpadalq_u16(sum[0], sum_lo[0]);
+ sum[0] = vpadalq_u16(sum[0], sum_hi[0]);
+ sum[1] = vpadalq_u16(sum[1], sum_lo[1]);
+ sum[1] = vpadalq_u16(sum[1], sum_hi[1]);
+ sum[2] = vpadalq_u16(sum[2], sum_lo[2]);
+ sum[2] = vpadalq_u16(sum[2], sum_hi[2]);
+ sum[3] = vpadalq_u16(sum[3], sum_lo[3]);
+ sum[3] = vpadalq_u16(sum[3], sum_hi[3]);
+
+ h_limit += h_overflow;
+ } while (i < height);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void masked_sad128xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ masked_sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, second_pred,
+ mask, mask_stride, res, 128, h, 32);
+}
+
+static INLINE void masked_sad64xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ masked_sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, second_pred,
+ mask, mask_stride, res, 64, h, 64);
+}
+
+static INLINE void masked_inv_sad32xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x16_t s0 = vld1q_u8(src);
+ uint8x16_t p0 = vld1q_u8(second_pred);
+ uint8x16_t m0 = vld1q_u8(mask);
+ sum_lo[0] = masked_sad_16x1_neon(sum_lo[0], s0, p0,
+ vld1q_u8(ref[0] + ref_offset), m0);
+ sum_lo[1] = masked_sad_16x1_neon(sum_lo[1], s0, p0,
+ vld1q_u8(ref[1] + ref_offset), m0);
+ sum_lo[2] = masked_sad_16x1_neon(sum_lo[2], s0, p0,
+ vld1q_u8(ref[2] + ref_offset), m0);
+ sum_lo[3] = masked_sad_16x1_neon(sum_lo[3], s0, p0,
+ vld1q_u8(ref[3] + ref_offset), m0);
+
+ uint8x16_t s1 = vld1q_u8(src + 16);
+ uint8x16_t p1 = vld1q_u8(second_pred + 16);
+ uint8x16_t m1 = vld1q_u8(mask + 16);
+ sum_hi[0] = masked_sad_16x1_neon(sum_hi[0], s1, p1,
+ vld1q_u8(ref[0] + ref_offset + 16), m1);
+ sum_hi[1] = masked_sad_16x1_neon(sum_hi[1], s1, p1,
+ vld1q_u8(ref[1] + ref_offset + 16), m1);
+ sum_hi[2] = masked_sad_16x1_neon(sum_hi[2], s1, p1,
+ vld1q_u8(ref[2] + ref_offset + 16), m1);
+ sum_hi[3] = masked_sad_16x1_neon(sum_hi[3], s1, p1,
+ vld1q_u8(ref[3] + ref_offset + 16), m1);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 32;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_long_add_4d_u16x8(sum_lo, sum_hi));
+}
+
+static INLINE void masked_sad32xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x16_t s0 = vld1q_u8(src);
+ uint8x16_t p0 = vld1q_u8(second_pred);
+ uint8x16_t m0 = vld1q_u8(mask);
+ sum_lo[0] = masked_sad_16x1_neon(sum_lo[0], s0,
+ vld1q_u8(ref[0] + ref_offset), p0, m0);
+ sum_lo[1] = masked_sad_16x1_neon(sum_lo[1], s0,
+ vld1q_u8(ref[1] + ref_offset), p0, m0);
+ sum_lo[2] = masked_sad_16x1_neon(sum_lo[2], s0,
+ vld1q_u8(ref[2] + ref_offset), p0, m0);
+ sum_lo[3] = masked_sad_16x1_neon(sum_lo[3], s0,
+ vld1q_u8(ref[3] + ref_offset), p0, m0);
+
+ uint8x16_t s1 = vld1q_u8(src + 16);
+ uint8x16_t p1 = vld1q_u8(second_pred + 16);
+ uint8x16_t m1 = vld1q_u8(mask + 16);
+ sum_hi[0] = masked_sad_16x1_neon(
+ sum_hi[0], s1, vld1q_u8(ref[0] + ref_offset + 16), p1, m1);
+ sum_hi[1] = masked_sad_16x1_neon(
+ sum_hi[1], s1, vld1q_u8(ref[1] + ref_offset + 16), p1, m1);
+ sum_hi[2] = masked_sad_16x1_neon(
+ sum_hi[2], s1, vld1q_u8(ref[2] + ref_offset + 16), p1, m1);
+ sum_hi[3] = masked_sad_16x1_neon(
+ sum_hi[3], s1, vld1q_u8(ref[3] + ref_offset + 16), p1, m1);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 32;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_long_add_4d_u16x8(sum_lo, sum_hi));
+}
+
+static INLINE void masked_inv_sad16xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ uint16x8_t sum_u16[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint32x4_t sum_u32[4];
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x16_t s0 = vld1q_u8(src);
+ uint8x16_t p0 = vld1q_u8(second_pred);
+ uint8x16_t m0 = vld1q_u8(mask);
+ sum_u16[0] = masked_sad_16x1_neon(sum_u16[0], s0, p0,
+ vld1q_u8(ref[0] + ref_offset), m0);
+ sum_u16[1] = masked_sad_16x1_neon(sum_u16[1], s0, p0,
+ vld1q_u8(ref[1] + ref_offset), m0);
+ sum_u16[2] = masked_sad_16x1_neon(sum_u16[2], s0, p0,
+ vld1q_u8(ref[2] + ref_offset), m0);
+ sum_u16[3] = masked_sad_16x1_neon(sum_u16[3], s0, p0,
+ vld1q_u8(ref[3] + ref_offset), m0);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 16;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ sum_u32[0] = vpaddlq_u16(sum_u16[0]);
+ sum_u32[1] = vpaddlq_u16(sum_u16[1]);
+ sum_u32[2] = vpaddlq_u16(sum_u16[2]);
+ sum_u32[3] = vpaddlq_u16(sum_u16[3]);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum_u32));
+}
+
+static INLINE void masked_sad16xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum_u16[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint32x4_t sum_u32[4];
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x16_t s0 = vld1q_u8(src);
+ uint8x16_t p0 = vld1q_u8(second_pred);
+ uint8x16_t m0 = vld1q_u8(mask);
+ sum_u16[0] = masked_sad_16x1_neon(sum_u16[0], s0,
+ vld1q_u8(ref[0] + ref_offset), p0, m0);
+ sum_u16[1] = masked_sad_16x1_neon(sum_u16[1], s0,
+ vld1q_u8(ref[1] + ref_offset), p0, m0);
+ sum_u16[2] = masked_sad_16x1_neon(sum_u16[2], s0,
+ vld1q_u8(ref[2] + ref_offset), p0, m0);
+ sum_u16[3] = masked_sad_16x1_neon(sum_u16[3], s0,
+ vld1q_u8(ref[3] + ref_offset), p0, m0);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 16;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ sum_u32[0] = vpaddlq_u16(sum_u16[0]);
+ sum_u32[1] = vpaddlq_u16(sum_u16[1]);
+ sum_u32[2] = vpaddlq_u16(sum_u16[2]);
+ sum_u32[3] = vpaddlq_u16(sum_u16[3]);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum_u32));
+}
+
+static INLINE uint16x8_t masked_sad_8x1_neon(uint16x8_t sad, const uint8x8_t s0,
+ const uint8x8_t a0,
+ const uint8x8_t b0,
+ const uint8x8_t m0) {
+ uint8x8_t m0_inv = vsub_u8(vdup_n_u8(AOM_BLEND_A64_MAX_ALPHA), m0);
+ uint16x8_t blend_u16 = vmull_u8(m0, a0);
+ blend_u16 = vmlal_u8(blend_u16, m0_inv, b0);
+
+ uint8x8_t blend_u8 = vrshrn_n_u16(blend_u16, AOM_BLEND_A64_ROUND_BITS);
+ return vabal_u8(sad, blend_u8, s0);
+}
+
+static INLINE void masked_inv_sad8xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x8_t s0 = vld1_u8(src);
+ uint8x8_t p0 = vld1_u8(second_pred);
+ uint8x8_t m0 = vld1_u8(mask);
+ sum[0] =
+ masked_sad_8x1_neon(sum[0], s0, p0, vld1_u8(ref[0] + ref_offset), m0);
+ sum[1] =
+ masked_sad_8x1_neon(sum[1], s0, p0, vld1_u8(ref[1] + ref_offset), m0);
+ sum[2] =
+ masked_sad_8x1_neon(sum[2], s0, p0, vld1_u8(ref[2] + ref_offset), m0);
+ sum[3] =
+ masked_sad_8x1_neon(sum[3], s0, p0, vld1_u8(ref[3] + ref_offset), m0);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 8;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+static INLINE void masked_sad8xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ uint8x8_t s0 = vld1_u8(src);
+ uint8x8_t p0 = vld1_u8(second_pred);
+ uint8x8_t m0 = vld1_u8(mask);
+
+ sum[0] =
+ masked_sad_8x1_neon(sum[0], s0, vld1_u8(ref[0] + ref_offset), p0, m0);
+ sum[1] =
+ masked_sad_8x1_neon(sum[1], s0, vld1_u8(ref[1] + ref_offset), p0, m0);
+ sum[2] =
+ masked_sad_8x1_neon(sum[2], s0, vld1_u8(ref[2] + ref_offset), p0, m0);
+ sum[3] =
+ masked_sad_8x1_neon(sum[3], s0, vld1_u8(ref[3] + ref_offset), p0, m0);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ second_pred += 8;
+ mask += mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+static INLINE void masked_inv_sad4xhx4d_neon(
+ const uint8_t *src, int src_stride, const uint8_t *const ref[4],
+ int ref_stride, const uint8_t *second_pred, const uint8_t *mask,
+ int mask_stride, uint32_t res[4], int h) {
+ uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h / 2;
+ do {
+ uint8x8_t s = load_unaligned_u8(src, src_stride);
+ uint8x8_t r0 = load_unaligned_u8(ref[0] + ref_offset, ref_stride);
+ uint8x8_t r1 = load_unaligned_u8(ref[1] + ref_offset, ref_stride);
+ uint8x8_t r2 = load_unaligned_u8(ref[2] + ref_offset, ref_stride);
+ uint8x8_t r3 = load_unaligned_u8(ref[3] + ref_offset, ref_stride);
+ uint8x8_t p0 = vld1_u8(second_pred);
+ uint8x8_t m0 = load_unaligned_u8(mask, mask_stride);
+
+ sum[0] = masked_sad_8x1_neon(sum[0], s, p0, r0, m0);
+ sum[1] = masked_sad_8x1_neon(sum[1], s, p0, r1, m0);
+ sum[2] = masked_sad_8x1_neon(sum[2], s, p0, r2, m0);
+ sum[3] = masked_sad_8x1_neon(sum[3], s, p0, r3, m0);
+
+ src += 2 * src_stride;
+ ref_offset += 2 * ref_stride;
+ second_pred += 2 * 4;
+ mask += 2 * mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+static INLINE void masked_sad4xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride,
+ const uint8_t *second_pred,
+ const uint8_t *mask, int mask_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h / 2;
+ do {
+ uint8x8_t s = load_unaligned_u8(src, src_stride);
+ uint8x8_t r0 = load_unaligned_u8(ref[0] + ref_offset, ref_stride);
+ uint8x8_t r1 = load_unaligned_u8(ref[1] + ref_offset, ref_stride);
+ uint8x8_t r2 = load_unaligned_u8(ref[2] + ref_offset, ref_stride);
+ uint8x8_t r3 = load_unaligned_u8(ref[3] + ref_offset, ref_stride);
+ uint8x8_t p0 = vld1_u8(second_pred);
+ uint8x8_t m0 = load_unaligned_u8(mask, mask_stride);
+
+ sum[0] = masked_sad_8x1_neon(sum[0], s, r0, p0, m0);
+ sum[1] = masked_sad_8x1_neon(sum[1], s, r1, p0, m0);
+ sum[2] = masked_sad_8x1_neon(sum[2], s, r2, p0, m0);
+ sum[3] = masked_sad_8x1_neon(sum[3], s, r3, p0, m0);
+
+ src += 2 * src_stride;
+ ref_offset += 2 * ref_stride;
+ second_pred += 2 * 4;
+ mask += 2 * mask_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+#define MASKED_SAD4D_WXH_NEON(w, h) \
+ void aom_masked_sad##w##x##h##x4d_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref[4], \
+ int ref_stride, const uint8_t *second_pred, const uint8_t *msk, \
+ int msk_stride, int invert_mask, uint32_t res[4]) { \
+ if (invert_mask) { \
+ return masked_inv_sad##w##xhx4d_neon(src, src_stride, ref, ref_stride, \
+ second_pred, msk, msk_stride, res, \
+ h); \
+ } else { \
+ return masked_sad##w##xhx4d_neon(src, src_stride, ref, ref_stride, \
+ second_pred, msk, msk_stride, res, h); \
+ } \
+ }
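
A hypothetical usage sketch of one generated function (all buffer names here are illustrative), evaluating four candidate references against the same source block in a single pass:

    uint32_t sad[4];
    const uint8_t *refs[4] = { ref0, ref1, ref2, ref3 };
    aom_masked_sad16x16x4d_neon(src, src_stride, refs, ref_stride, second_pred,
                                mask, mask_stride, /*invert_mask=*/0, sad);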
+
+MASKED_SAD4D_WXH_NEON(4, 8)
+MASKED_SAD4D_WXH_NEON(4, 4)
+
+MASKED_SAD4D_WXH_NEON(8, 16)
+MASKED_SAD4D_WXH_NEON(8, 8)
+MASKED_SAD4D_WXH_NEON(8, 4)
+
+MASKED_SAD4D_WXH_NEON(16, 32)
+MASKED_SAD4D_WXH_NEON(16, 16)
+MASKED_SAD4D_WXH_NEON(16, 8)
+
+MASKED_SAD4D_WXH_NEON(32, 64)
+MASKED_SAD4D_WXH_NEON(32, 32)
+MASKED_SAD4D_WXH_NEON(32, 16)
+
+MASKED_SAD4D_WXH_NEON(64, 128)
+MASKED_SAD4D_WXH_NEON(64, 64)
+MASKED_SAD4D_WXH_NEON(64, 32)
+
+MASKED_SAD4D_WXH_NEON(128, 128)
+MASKED_SAD4D_WXH_NEON(128, 64)
+
+#if !CONFIG_REALTIME_ONLY
+MASKED_SAD4D_WXH_NEON(4, 16)
+MASKED_SAD4D_WXH_NEON(16, 4)
+MASKED_SAD4D_WXH_NEON(8, 32)
+MASKED_SAD4D_WXH_NEON(32, 8)
+MASKED_SAD4D_WXH_NEON(16, 64)
+MASKED_SAD4D_WXH_NEON(64, 16)
+#endif
diff --git a/aom_dsp/arm/masked_sad_neon.c b/aom_dsp/arm/masked_sad_neon.c
new file mode 100644
index 0000000..340df05
--- /dev/null
+++ b/aom_dsp/arm/masked_sad_neon.c
@@ -0,0 +1,257 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/blend.h"
+#include "mem_neon.h"
+#include "sum_neon.h"
+
+static INLINE uint16x8_t masked_sad_16x1_neon(uint16x8_t sad,
+ const uint8_t *src,
+ const uint8_t *a,
+ const uint8_t *b,
+ const uint8_t *m) {
+ uint8x16_t m0 = vld1q_u8(m);
+ uint8x16_t a0 = vld1q_u8(a);
+ uint8x16_t b0 = vld1q_u8(b);
+ uint8x16_t s0 = vld1q_u8(src);
+
+ uint8x16_t m0_inv = vsubq_u8(vdupq_n_u8(AOM_BLEND_A64_MAX_ALPHA), m0);
+ uint16x8_t blend_u16_lo = vmull_u8(vget_low_u8(m0), vget_low_u8(a0));
+ uint16x8_t blend_u16_hi = vmull_u8(vget_high_u8(m0), vget_high_u8(a0));
+ blend_u16_lo = vmlal_u8(blend_u16_lo, vget_low_u8(m0_inv), vget_low_u8(b0));
+ blend_u16_hi = vmlal_u8(blend_u16_hi, vget_high_u8(m0_inv), vget_high_u8(b0));
+
+ uint8x8_t blend_u8_lo = vrshrn_n_u16(blend_u16_lo, AOM_BLEND_A64_ROUND_BITS);
+ uint8x8_t blend_u8_hi = vrshrn_n_u16(blend_u16_hi, AOM_BLEND_A64_ROUND_BITS);
+ uint8x16_t blend_u8 = vcombine_u8(blend_u8_lo, blend_u8_hi);
+
+ return vpadalq_u8(sad, vabdq_u8(blend_u8, s0));
+}
+
+static INLINE unsigned masked_sad_128xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+ // Eight accumulator vectors are required to avoid overflow in the 128x128
+ // case.
+ assert(height <= 128);
+ uint16x8_t sad[] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0), vdupq_n_u16(0) };
+
+ do {
+ sad[0] = masked_sad_16x1_neon(sad[0], &src[0], &a[0], &b[0], &m[0]);
+ sad[1] = masked_sad_16x1_neon(sad[1], &src[16], &a[16], &b[16], &m[16]);
+ sad[2] = masked_sad_16x1_neon(sad[2], &src[32], &a[32], &b[32], &m[32]);
+ sad[3] = masked_sad_16x1_neon(sad[3], &src[48], &a[48], &b[48], &m[48]);
+ sad[4] = masked_sad_16x1_neon(sad[4], &src[64], &a[64], &b[64], &m[64]);
+ sad[5] = masked_sad_16x1_neon(sad[5], &src[80], &a[80], &b[80], &m[80]);
+ sad[6] = masked_sad_16x1_neon(sad[6], &src[96], &a[96], &b[96], &m[96]);
+ sad[7] = masked_sad_16x1_neon(sad[7], &src[112], &a[112], &b[112], &m[112]);
+
+ src += src_stride;
+ a += a_stride;
+ b += b_stride;
+ m += m_stride;
+ height--;
+ } while (height != 0);
+
+ return horizontal_long_add_u16x8(sad[0], sad[1]) +
+ horizontal_long_add_u16x8(sad[2], sad[3]) +
+ horizontal_long_add_u16x8(sad[4], sad[5]) +
+ horizontal_long_add_u16x8(sad[6], sad[7]);
+}
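
Worked bound for the eight accumulators above: vpadalq_u8 adds at most 2 * 255 = 510 to a uint16 lane per row, so one accumulator per 16-pixel column holds at most 128 * 510 = 65280 < 65535 over the maximum height of 128 rows, and 128 / 16 = 8 columns need eight vectors.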
+
+static INLINE unsigned masked_sad_64xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+ // Four accumulator vectors are required to avoid overflow in the 64x128 case.
+ assert(height <= 128);
+ uint16x8_t sad[] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ do {
+ sad[0] = masked_sad_16x1_neon(sad[0], &src[0], &a[0], &b[0], &m[0]);
+ sad[1] = masked_sad_16x1_neon(sad[1], &src[16], &a[16], &b[16], &m[16]);
+ sad[2] = masked_sad_16x1_neon(sad[2], &src[32], &a[32], &b[32], &m[32]);
+ sad[3] = masked_sad_16x1_neon(sad[3], &src[48], &a[48], &b[48], &m[48]);
+
+ src += src_stride;
+ a += a_stride;
+ b += b_stride;
+ m += m_stride;
+ height--;
+ } while (height != 0);
+
+ return horizontal_long_add_u16x8(sad[0], sad[1]) +
+ horizontal_long_add_u16x8(sad[2], sad[3]);
+}
+
+static INLINE unsigned masked_sad_32xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+  // A single accumulator is sufficient up to height=64 without overflow.
+ assert(height <= 64);
+ uint16x8_t sad = vdupq_n_u16(0);
+
+ do {
+ sad = masked_sad_16x1_neon(sad, &src[0], &a[0], &b[0], &m[0]);
+ sad = masked_sad_16x1_neon(sad, &src[16], &a[16], &b[16], &m[16]);
+
+ src += src_stride;
+ a += a_stride;
+ b += b_stride;
+ m += m_stride;
+ height--;
+ } while (height != 0);
+
+ return horizontal_add_u16x8(sad);
+}
+
+static INLINE unsigned masked_sad_16xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+  // A single accumulator is sufficient up to height=128 without overflow.
+ assert(height <= 128);
+ uint16x8_t sad = vdupq_n_u16(0);
+
+ do {
+ sad = masked_sad_16x1_neon(sad, src, a, b, m);
+
+ src += src_stride;
+ a += a_stride;
+ b += b_stride;
+ m += m_stride;
+ height--;
+ } while (height != 0);
+
+ return horizontal_add_u16x8(sad);
+}
+
+static INLINE unsigned masked_sad_8xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+  // A single accumulator is sufficient up to height=128 without overflow.
+ assert(height <= 128);
+ uint16x4_t sad = vdup_n_u16(0);
+
+ do {
+ uint8x8_t m0 = vld1_u8(m);
+ uint8x8_t a0 = vld1_u8(a);
+ uint8x8_t b0 = vld1_u8(b);
+ uint8x8_t s0 = vld1_u8(src);
+
+ uint8x8_t m0_inv = vsub_u8(vdup_n_u8(AOM_BLEND_A64_MAX_ALPHA), m0);
+ uint16x8_t blend_u16 = vmull_u8(m0, a0);
+ blend_u16 = vmlal_u8(blend_u16, m0_inv, b0);
+ uint8x8_t blend_u8 = vrshrn_n_u16(blend_u16, AOM_BLEND_A64_ROUND_BITS);
+
+ sad = vpadal_u8(sad, vabd_u8(blend_u8, s0));
+
+ src += src_stride;
+ a += a_stride;
+ b += b_stride;
+ m += m_stride;
+ height--;
+ } while (height != 0);
+
+ return horizontal_add_u16x4(sad);
+}
+
+static INLINE unsigned masked_sad_4xh_neon(const uint8_t *src, int src_stride,
+ const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ const uint8_t *m, int m_stride,
+ int height) {
+ // Process two rows per loop iteration.
+ assert(height % 2 == 0);
+
+  // A single accumulator is sufficient up to height=256 without overflow.
+ assert(height <= 256);
+ uint16x4_t sad = vdup_n_u16(0);
+
+ do {
+ uint8x8_t m0 = load_unaligned_u8(m, m_stride);
+ uint8x8_t a0 = load_unaligned_u8(a, a_stride);
+ uint8x8_t b0 = load_unaligned_u8(b, b_stride);
+ uint8x8_t s0 = load_unaligned_u8(src, src_stride);
+
+ uint8x8_t m0_inv = vsub_u8(vdup_n_u8(AOM_BLEND_A64_MAX_ALPHA), m0);
+ uint16x8_t blend_u16 = vmull_u8(m0, a0);
+ blend_u16 = vmlal_u8(blend_u16, m0_inv, b0);
+ uint8x8_t blend_u8 = vrshrn_n_u16(blend_u16, AOM_BLEND_A64_ROUND_BITS);
+
+ sad = vpadal_u8(sad, vabd_u8(blend_u8, s0));
+
+ src += 2 * src_stride;
+ a += 2 * a_stride;
+ b += 2 * b_stride;
+ m += 2 * m_stride;
+ height -= 2;
+ } while (height != 0);
+
+ return horizontal_add_u16x4(sad);
+}
+
+#define MASKED_SAD_WXH_NEON(width, height) \
+ unsigned aom_masked_sad##width##x##height##_neon( \
+ const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
+ const uint8_t *second_pred, const uint8_t *msk, int msk_stride, \
+ int invert_mask) { \
+ if (!invert_mask) \
+ return masked_sad_##width##xh_neon(src, src_stride, ref, ref_stride, \
+ second_pred, width, msk, msk_stride, \
+ height); \
+ else \
+ return masked_sad_##width##xh_neon(src, src_stride, second_pred, width, \
+ ref, ref_stride, msk, msk_stride, \
+ height); \
+ }
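
The invert_mask branch reuses the same kernel with ref and second_pred swapped, relying on the blend identity

    AOM_BLEND_A64(64 - m, a, b) == AOM_BLEND_A64(m, b, a)

since m*a + (64 - m)*b turns into (64 - m)*a + m*b when the operand roles are exchanged.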
+
+MASKED_SAD_WXH_NEON(4, 4)
+MASKED_SAD_WXH_NEON(4, 8)
+MASKED_SAD_WXH_NEON(8, 4)
+MASKED_SAD_WXH_NEON(8, 8)
+MASKED_SAD_WXH_NEON(8, 16)
+MASKED_SAD_WXH_NEON(16, 8)
+MASKED_SAD_WXH_NEON(16, 16)
+MASKED_SAD_WXH_NEON(16, 32)
+MASKED_SAD_WXH_NEON(32, 16)
+MASKED_SAD_WXH_NEON(32, 32)
+MASKED_SAD_WXH_NEON(32, 64)
+MASKED_SAD_WXH_NEON(64, 32)
+MASKED_SAD_WXH_NEON(64, 64)
+MASKED_SAD_WXH_NEON(64, 128)
+MASKED_SAD_WXH_NEON(128, 64)
+MASKED_SAD_WXH_NEON(128, 128)
+#if !CONFIG_REALTIME_ONLY
+MASKED_SAD_WXH_NEON(4, 16)
+MASKED_SAD_WXH_NEON(16, 4)
+MASKED_SAD_WXH_NEON(8, 32)
+MASKED_SAD_WXH_NEON(32, 8)
+MASKED_SAD_WXH_NEON(16, 64)
+MASKED_SAD_WXH_NEON(64, 16)
+#endif
diff --git a/aom_dsp/arm/mem_neon.h b/aom_dsp/arm/mem_neon.h
index 73a5127..16d44c5 100644
--- a/aom_dsp/arm/mem_neon.h
+++ b/aom_dsp/arm/mem_neon.h
@@ -73,14 +73,18 @@
#endif // __GNUC__ < 9
#endif // defined(__GNUC__) && !defined(__clang__)
-static INLINE void store_row2_u8_8x8(uint8_t *s, int p, const uint8x8_t s0,
- const uint8x8_t s1) {
+static INLINE void store_u8_8x2(uint8_t *s, ptrdiff_t p, const uint8x8_t s0,
+ const uint8x8_t s1) {
vst1_u8(s, s0);
s += p;
vst1_u8(s, s1);
s += p;
}
+static INLINE uint8x16_t load_u8_8x2(const uint8_t *s, ptrdiff_t p) {
+ return vcombine_u8(vld1_u8(s), vld1_u8(s + p));
+}
+
/* These intrinsics require immediate values, so we must use #defines
to enforce that. */
#define load_u8_4x1(s, s0, lane) \
@@ -89,6 +93,13 @@
vld1_lane_u32((uint32_t *)(s), vreinterpret_u32_u8(*(s0)), lane)); \
} while (0)
+// Load four bytes into the low half of a uint8x8_t, zero the upper half.
+static INLINE uint8x8_t load_u8_4x1_lane0(const uint8_t *p) {
+ uint8x8_t ret = vdup_n_u8(0);
+ load_u8_4x1(p, &ret, 0);
+ return ret;
+}
+
static INLINE void load_u8_8x8(const uint8_t *s, ptrdiff_t p,
uint8x8_t *const s0, uint8x8_t *const s1,
uint8x8_t *const s2, uint8x8_t *const s3,
@@ -111,16 +122,24 @@
*s7 = vld1_u8(s);
}
-static INLINE void load_u8_8x16(const uint8_t *s, ptrdiff_t p,
- uint8x16_t *const s0, uint8x16_t *const s1,
- uint8x16_t *const s2, uint8x16_t *const s3) {
- *s0 = vld1q_u8(s);
+static INLINE void load_u8_8x7(const uint8_t *s, ptrdiff_t p,
+ uint8x8_t *const s0, uint8x8_t *const s1,
+ uint8x8_t *const s2, uint8x8_t *const s3,
+ uint8x8_t *const s4, uint8x8_t *const s5,
+ uint8x8_t *const s6) {
+ *s0 = vld1_u8(s);
s += p;
- *s1 = vld1q_u8(s);
+ *s1 = vld1_u8(s);
s += p;
- *s2 = vld1q_u8(s);
+ *s2 = vld1_u8(s);
s += p;
- *s3 = vld1q_u8(s);
+ *s3 = vld1_u8(s);
+ s += p;
+ *s4 = vld1_u8(s);
+ s += p;
+ *s5 = vld1_u8(s);
+ s += p;
+ *s6 = vld1_u8(s);
}
static INLINE void load_u8_8x4(const uint8_t *s, const ptrdiff_t p,
@@ -148,6 +167,40 @@
s += p;
}
+static INLINE void load_u16_4x7(const uint16_t *s, ptrdiff_t p,
+ uint16x4_t *const s0, uint16x4_t *const s1,
+ uint16x4_t *const s2, uint16x4_t *const s3,
+ uint16x4_t *const s4, uint16x4_t *const s5,
+ uint16x4_t *const s6) {
+ *s0 = vld1_u16(s);
+ s += p;
+ *s1 = vld1_u16(s);
+ s += p;
+ *s2 = vld1_u16(s);
+ s += p;
+ *s3 = vld1_u16(s);
+ s += p;
+ *s4 = vld1_u16(s);
+ s += p;
+ *s5 = vld1_u16(s);
+ s += p;
+ *s6 = vld1_u16(s);
+}
+
+static INLINE void load_s16_8x2(const int16_t *s, const ptrdiff_t p,
+ int16x8_t *const s0, int16x8_t *const s1) {
+ *s0 = vld1q_s16(s);
+ s += p;
+ *s1 = vld1q_s16(s);
+}
+
+static INLINE void load_u16_8x2(const uint16_t *s, const ptrdiff_t p,
+ uint16x8_t *const s0, uint16x8_t *const s1) {
+ *s0 = vld1q_u16(s);
+ s += p;
+ *s1 = vld1q_u16(s);
+}
+
static INLINE void load_u16_8x4(const uint16_t *s, const ptrdiff_t p,
uint16x8_t *const s0, uint16x8_t *const s1,
uint16x8_t *const s2, uint16x8_t *const s3) {
@@ -161,6 +214,66 @@
s += p;
}
+static INLINE void load_s16_4x11(const int16_t *s, ptrdiff_t p,
+ int16x4_t *const s0, int16x4_t *const s1,
+ int16x4_t *const s2, int16x4_t *const s3,
+ int16x4_t *const s4, int16x4_t *const s5,
+ int16x4_t *const s6, int16x4_t *const s7,
+ int16x4_t *const s8, int16x4_t *const s9,
+ int16x4_t *const s10) {
+ *s0 = vld1_s16(s);
+ s += p;
+ *s1 = vld1_s16(s);
+ s += p;
+ *s2 = vld1_s16(s);
+ s += p;
+ *s3 = vld1_s16(s);
+ s += p;
+ *s4 = vld1_s16(s);
+ s += p;
+ *s5 = vld1_s16(s);
+ s += p;
+ *s6 = vld1_s16(s);
+ s += p;
+ *s7 = vld1_s16(s);
+ s += p;
+ *s8 = vld1_s16(s);
+ s += p;
+ *s9 = vld1_s16(s);
+ s += p;
+ *s10 = vld1_s16(s);
+}
+
+static INLINE void load_u16_4x11(const uint16_t *s, ptrdiff_t p,
+ uint16x4_t *const s0, uint16x4_t *const s1,
+ uint16x4_t *const s2, uint16x4_t *const s3,
+ uint16x4_t *const s4, uint16x4_t *const s5,
+ uint16x4_t *const s6, uint16x4_t *const s7,
+ uint16x4_t *const s8, uint16x4_t *const s9,
+ uint16x4_t *const s10) {
+ *s0 = vld1_u16(s);
+ s += p;
+ *s1 = vld1_u16(s);
+ s += p;
+ *s2 = vld1_u16(s);
+ s += p;
+ *s3 = vld1_u16(s);
+ s += p;
+ *s4 = vld1_u16(s);
+ s += p;
+ *s5 = vld1_u16(s);
+ s += p;
+ *s6 = vld1_u16(s);
+ s += p;
+ *s7 = vld1_u16(s);
+ s += p;
+ *s8 = vld1_u16(s);
+ s += p;
+ *s9 = vld1_u16(s);
+ s += p;
+ *s10 = vld1_u16(s);
+}
+
static INLINE void load_s16_4x8(const int16_t *s, ptrdiff_t p,
int16x4_t *const s0, int16x4_t *const s1,
int16x4_t *const s2, int16x4_t *const s3,
@@ -183,6 +296,88 @@
*s7 = vld1_s16(s);
}
+static INLINE void load_s16_4x7(const int16_t *s, ptrdiff_t p,
+ int16x4_t *const s0, int16x4_t *const s1,
+ int16x4_t *const s2, int16x4_t *const s3,
+ int16x4_t *const s4, int16x4_t *const s5,
+ int16x4_t *const s6) {
+ *s0 = vld1_s16(s);
+ s += p;
+ *s1 = vld1_s16(s);
+ s += p;
+ *s2 = vld1_s16(s);
+ s += p;
+ *s3 = vld1_s16(s);
+ s += p;
+ *s4 = vld1_s16(s);
+ s += p;
+ *s5 = vld1_s16(s);
+ s += p;
+ *s6 = vld1_s16(s);
+}
+
+static INLINE void load_s16_4x5(const int16_t *s, ptrdiff_t p,
+ int16x4_t *const s0, int16x4_t *const s1,
+ int16x4_t *const s2, int16x4_t *const s3,
+ int16x4_t *const s4) {
+ *s0 = vld1_s16(s);
+ s += p;
+ *s1 = vld1_s16(s);
+ s += p;
+ *s2 = vld1_s16(s);
+ s += p;
+ *s3 = vld1_s16(s);
+ s += p;
+ *s4 = vld1_s16(s);
+}
+
+static INLINE void load_u16_4x5(const uint16_t *s, const ptrdiff_t p,
+ uint16x4_t *const s0, uint16x4_t *const s1,
+ uint16x4_t *const s2, uint16x4_t *const s3,
+ uint16x4_t *const s4) {
+ *s0 = vld1_u16(s);
+ s += p;
+ *s1 = vld1_u16(s);
+ s += p;
+ *s2 = vld1_u16(s);
+ s += p;
+ *s3 = vld1_u16(s);
+ s += p;
+ *s4 = vld1_u16(s);
+ s += p;
+}
+
+static INLINE void load_u8_8x5(const uint8_t *s, ptrdiff_t p,
+ uint8x8_t *const s0, uint8x8_t *const s1,
+ uint8x8_t *const s2, uint8x8_t *const s3,
+ uint8x8_t *const s4) {
+ *s0 = vld1_u8(s);
+ s += p;
+ *s1 = vld1_u8(s);
+ s += p;
+ *s2 = vld1_u8(s);
+ s += p;
+ *s3 = vld1_u8(s);
+ s += p;
+ *s4 = vld1_u8(s);
+}
+
+static INLINE void load_u16_8x5(const uint16_t *s, const ptrdiff_t p,
+ uint16x8_t *const s0, uint16x8_t *const s1,
+ uint16x8_t *const s2, uint16x8_t *const s3,
+ uint16x8_t *const s4) {
+ *s0 = vld1q_u16(s);
+ s += p;
+ *s1 = vld1q_u16(s);
+ s += p;
+ *s2 = vld1q_u16(s);
+ s += p;
+ *s3 = vld1q_u16(s);
+ s += p;
+ *s4 = vld1q_u16(s);
+ s += p;
+}
+
static INLINE void load_s16_4x4(const int16_t *s, ptrdiff_t p,
int16x4_t *const s0, int16x4_t *const s1,
int16x4_t *const s2, int16x4_t *const s3) {
@@ -197,6 +392,11 @@
/* These intrinsics require immediate values, so we must use #defines
to enforce that. */
+#define store_u8_2x1(s, s0, lane) \
+ do { \
+ vst1_lane_u16((uint16_t *)(s), vreinterpret_u16_u8(s0), lane); \
+ } while (0)
+
#define store_u8_4x1(s, s0, lane) \
do { \
vst1_lane_u32((uint32_t *)(s), vreinterpret_u32_u8(s0), lane); \
@@ -282,6 +482,13 @@
vst1_u16(s, s3);
}
+static INLINE void store_u16_8x2(uint16_t *s, ptrdiff_t dst_stride,
+ const uint16x8_t s0, const uint16x8_t s1) {
+ vst1q_u16(s, s0);
+ s += dst_stride;
+ vst1q_u16(s, s1);
+}
+
static INLINE void store_u16_8x4(uint16_t *s, ptrdiff_t dst_stride,
const uint16x8_t s0, const uint16x8_t s1,
const uint16x8_t s2, const uint16x8_t s3) {
@@ -328,6 +535,21 @@
vst1_s16(s, s3);
}
+/* These intrinsics require immediate values, so we must use #defines
+ to enforce that. */
+#define store_s16_2x1(s, s0, lane) \
+ do { \
+ vst1_lane_s32((int32_t *)(s), vreinterpret_s32_s16(s0), lane); \
+ } while (0)
+#define store_u16_2x1(s, s0, lane) \
+ do { \
+ vst1_lane_u32((uint32_t *)(s), vreinterpret_u32_u16(s0), lane); \
+ } while (0)
+#define store_u16q_2x1(s, s0, lane) \
+ do { \
+ vst1q_lane_u32((uint32_t *)(s), vreinterpretq_u32_u16(s0), lane); \
+ } while (0)
+
static INLINE void store_s16_8x4(int16_t *s, ptrdiff_t dst_stride,
const int16x8_t s0, const int16x8_t s1,
const int16x8_t s2, const int16x8_t s3) {
@@ -340,6 +562,96 @@
vst1q_s16(s, s3);
}
+static INLINE void load_u8_8x11(const uint8_t *s, ptrdiff_t p,
+ uint8x8_t *const s0, uint8x8_t *const s1,
+ uint8x8_t *const s2, uint8x8_t *const s3,
+ uint8x8_t *const s4, uint8x8_t *const s5,
+ uint8x8_t *const s6, uint8x8_t *const s7,
+ uint8x8_t *const s8, uint8x8_t *const s9,
+ uint8x8_t *const s10) {
+ *s0 = vld1_u8(s);
+ s += p;
+ *s1 = vld1_u8(s);
+ s += p;
+ *s2 = vld1_u8(s);
+ s += p;
+ *s3 = vld1_u8(s);
+ s += p;
+ *s4 = vld1_u8(s);
+ s += p;
+ *s5 = vld1_u8(s);
+ s += p;
+ *s6 = vld1_u8(s);
+ s += p;
+ *s7 = vld1_u8(s);
+ s += p;
+ *s8 = vld1_u8(s);
+ s += p;
+ *s9 = vld1_u8(s);
+ s += p;
+ *s10 = vld1_u8(s);
+}
+
+static INLINE void load_s16_8x11(const int16_t *s, ptrdiff_t p,
+ int16x8_t *const s0, int16x8_t *const s1,
+ int16x8_t *const s2, int16x8_t *const s3,
+ int16x8_t *const s4, int16x8_t *const s5,
+ int16x8_t *const s6, int16x8_t *const s7,
+ int16x8_t *const s8, int16x8_t *const s9,
+ int16x8_t *const s10) {
+ *s0 = vld1q_s16(s);
+ s += p;
+ *s1 = vld1q_s16(s);
+ s += p;
+ *s2 = vld1q_s16(s);
+ s += p;
+ *s3 = vld1q_s16(s);
+ s += p;
+ *s4 = vld1q_s16(s);
+ s += p;
+ *s5 = vld1q_s16(s);
+ s += p;
+ *s6 = vld1q_s16(s);
+ s += p;
+ *s7 = vld1q_s16(s);
+ s += p;
+ *s8 = vld1q_s16(s);
+ s += p;
+ *s9 = vld1q_s16(s);
+ s += p;
+ *s10 = vld1q_s16(s);
+}
+
+static INLINE void load_u16_8x11(const uint16_t *s, ptrdiff_t p,
+ uint16x8_t *const s0, uint16x8_t *const s1,
+ uint16x8_t *const s2, uint16x8_t *const s3,
+ uint16x8_t *const s4, uint16x8_t *const s5,
+ uint16x8_t *const s6, uint16x8_t *const s7,
+ uint16x8_t *const s8, uint16x8_t *const s9,
+ uint16x8_t *const s10) {
+ *s0 = vld1q_u16(s);
+ s += p;
+ *s1 = vld1q_u16(s);
+ s += p;
+ *s2 = vld1q_u16(s);
+ s += p;
+ *s3 = vld1q_u16(s);
+ s += p;
+ *s4 = vld1q_u16(s);
+ s += p;
+ *s5 = vld1q_u16(s);
+ s += p;
+ *s6 = vld1q_u16(s);
+ s += p;
+ *s7 = vld1q_u16(s);
+ s += p;
+ *s8 = vld1q_u16(s);
+ s += p;
+ *s9 = vld1q_u16(s);
+ s += p;
+ *s10 = vld1q_u16(s);
+}
+
static INLINE void load_s16_8x8(const int16_t *s, ptrdiff_t p,
int16x8_t *const s0, int16x8_t *const s1,
int16x8_t *const s2, int16x8_t *const s3,
@@ -362,6 +674,61 @@
*s7 = vld1q_s16(s);
}
+static INLINE void load_u16_8x7(const uint16_t *s, ptrdiff_t p,
+ uint16x8_t *const s0, uint16x8_t *const s1,
+ uint16x8_t *const s2, uint16x8_t *const s3,
+ uint16x8_t *const s4, uint16x8_t *const s5,
+ uint16x8_t *const s6) {
+ *s0 = vld1q_u16(s);
+ s += p;
+ *s1 = vld1q_u16(s);
+ s += p;
+ *s2 = vld1q_u16(s);
+ s += p;
+ *s3 = vld1q_u16(s);
+ s += p;
+ *s4 = vld1q_u16(s);
+ s += p;
+ *s5 = vld1q_u16(s);
+ s += p;
+ *s6 = vld1q_u16(s);
+}
+
+static INLINE void load_s16_8x7(const int16_t *s, ptrdiff_t p,
+ int16x8_t *const s0, int16x8_t *const s1,
+ int16x8_t *const s2, int16x8_t *const s3,
+ int16x8_t *const s4, int16x8_t *const s5,
+ int16x8_t *const s6) {
+ *s0 = vld1q_s16(s);
+ s += p;
+ *s1 = vld1q_s16(s);
+ s += p;
+ *s2 = vld1q_s16(s);
+ s += p;
+ *s3 = vld1q_s16(s);
+ s += p;
+ *s4 = vld1q_s16(s);
+ s += p;
+ *s5 = vld1q_s16(s);
+ s += p;
+ *s6 = vld1q_s16(s);
+}
+
+static INLINE void load_s16_8x5(const int16_t *s, ptrdiff_t p,
+ int16x8_t *const s0, int16x8_t *const s1,
+ int16x8_t *const s2, int16x8_t *const s3,
+ int16x8_t *const s4) {
+ *s0 = vld1q_s16(s);
+ s += p;
+ *s1 = vld1q_s16(s);
+ s += p;
+ *s2 = vld1q_s16(s);
+ s += p;
+ *s3 = vld1q_s16(s);
+ s += p;
+ *s4 = vld1q_s16(s);
+}
+
static INLINE void load_s16_8x4(const int16_t *s, ptrdiff_t p,
int16x8_t *const s0, int16x8_t *const s1,
int16x8_t *const s2, int16x8_t *const s3) {
@@ -404,71 +771,61 @@
return vreinterpretq_u8_u32(a_u32);
}
-static INLINE void load_unaligned_u8_4x8(const uint8_t *buf, int stride,
- uint32x2_t *tu0, uint32x2_t *tu1,
- uint32x2_t *tu2, uint32x2_t *tu3) {
+static INLINE uint8x8_t load_unaligned_u8_2x2(const uint8_t *buf, int stride) {
+ uint16_t a;
+ uint16x4_t a_u16;
+
+ memcpy(&a, buf, 2);
+ buf += stride;
+ a_u16 = vdup_n_u16(a);
+ memcpy(&a, buf, 2);
+ a_u16 = vset_lane_u16(a, a_u16, 1);
+ return vreinterpret_u8_u16(a_u16);
+}
+
+static INLINE uint8x8_t load_unaligned_u8_4x1(const uint8_t *buf) {
uint32_t a;
+ uint32x2_t a_u32;
+
+ memcpy(&a, buf, 4);
+ a_u32 = vdup_n_u32(0);
+ a_u32 = vset_lane_u32(a, a_u32, 0);
+ return vreinterpret_u8_u32(a_u32);
+}
+
+static INLINE uint8x8_t load_unaligned_u8_4x2(const uint8_t *buf, int stride) {
+ uint32_t a;
+ uint32x2_t a_u32;
memcpy(&a, buf, 4);
buf += stride;
- *tu0 = vdup_n_u32(a);
+ a_u32 = vdup_n_u32(a);
memcpy(&a, buf, 4);
- buf += stride;
- *tu0 = vset_lane_u32(a, *tu0, 1);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu1 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu1 = vset_lane_u32(a, *tu1, 1);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu2 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu2 = vset_lane_u32(a, *tu2, 1);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu3 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- *tu3 = vset_lane_u32(a, *tu3, 1);
+ a_u32 = vset_lane_u32(a, a_u32, 1);
+ return vreinterpret_u8_u32(a_u32);
}
static INLINE void load_unaligned_u8_4x4(const uint8_t *buf, int stride,
- uint32x2_t *tu0, uint32x2_t *tu1) {
- uint32_t a;
-
- memcpy(&a, buf, 4);
- buf += stride;
- *tu0 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu0 = vset_lane_u32(a, *tu0, 1);
- memcpy(&a, buf, 4);
- buf += stride;
- *tu1 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- *tu1 = vset_lane_u32(a, *tu1, 1);
+ uint8x8_t *tu0, uint8x8_t *tu1) {
+ *tu0 = load_unaligned_u8_4x2(buf, stride);
+ buf += 2 * stride;
+ *tu1 = load_unaligned_u8_4x2(buf, stride);
}
-static INLINE void load_unaligned_u8_4x1(const uint8_t *buf, int stride,
- uint32x2_t *tu0) {
- uint32_t a;
-
- memcpy(&a, buf, 4);
- buf += stride;
- *tu0 = vset_lane_u32(a, *tu0, 0);
+static INLINE void load_unaligned_u8_3x8(const uint8_t *buf, int stride,
+ uint8x8_t *tu0, uint8x8_t *tu1,
+ uint8x8_t *tu2) {
+ load_unaligned_u8_4x4(buf, stride, tu0, tu1);
+ buf += 4 * stride;
+ *tu2 = load_unaligned_u8_4x2(buf, stride);
}
-static INLINE void load_unaligned_u8_4x2(const uint8_t *buf, int stride,
- uint32x2_t *tu0) {
- uint32_t a;
-
- memcpy(&a, buf, 4);
- buf += stride;
- *tu0 = vdup_n_u32(a);
- memcpy(&a, buf, 4);
- *tu0 = vset_lane_u32(a, *tu0, 1);
+static INLINE void load_unaligned_u8_4x8(const uint8_t *buf, int stride,
+ uint8x8_t *tu0, uint8x8_t *tu1,
+ uint8x8_t *tu2, uint8x8_t *tu3) {
+ load_unaligned_u8_4x4(buf, stride, tu0, tu1);
+ buf += 4 * stride;
+ load_unaligned_u8_4x4(buf, stride, tu2, tu3);
}
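
These unaligned-load helpers go through memcpy rather than dereferencing a widened pointer: a byte-wise copy is the portable way to express an unaligned load (no alignment or strict-aliasing violations), and compilers typically reduce it to a single load instruction. A minimal sketch of the idiom (with <string.h> and <stdint.h>):

    uint32_t v;
    memcpy(&v, buf, 4);               // safe at any alignment
    // vs. *(const uint32_t *)buf;    // undefined if buf is misaligned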
/* These intrinsics require immediate values, so we must use #defines
@@ -487,17 +844,6 @@
memcpy(dst, &a, 2); \
} while (0)
-static INLINE void load_unaligned_u8_2x2(const uint8_t *buf, int stride,
- uint16x4_t *tu0) {
- uint16_t a;
-
- memcpy(&a, buf, 2);
- buf += stride;
- *tu0 = vdup_n_u16(a);
- memcpy(&a, buf, 2);
- *tu0 = vset_lane_u16(a, *tu0, 1);
-}
-
static INLINE void load_u8_16x8(const uint8_t *s, ptrdiff_t p,
uint8x16_t *const s0, uint8x16_t *const s1,
uint8x16_t *const s2, uint8x16_t *const s3,
@@ -532,21 +878,65 @@
*s3 = vld1q_u8(s);
}
-static INLINE void load_unaligned_u16_4x4(const uint16_t *buf, uint32_t stride,
- uint64x2_t *tu0, uint64x2_t *tu1) {
+static INLINE void load_u16_8x8(const uint16_t *s, const ptrdiff_t p,
+ uint16x8_t *s0, uint16x8_t *s1, uint16x8_t *s2,
+ uint16x8_t *s3, uint16x8_t *s4, uint16x8_t *s5,
+ uint16x8_t *s6, uint16x8_t *s7) {
+ *s0 = vld1q_u16(s);
+ s += p;
+ *s1 = vld1q_u16(s);
+ s += p;
+ *s2 = vld1q_u16(s);
+ s += p;
+ *s3 = vld1q_u16(s);
+ s += p;
+ *s4 = vld1q_u16(s);
+ s += p;
+ *s5 = vld1q_u16(s);
+ s += p;
+ *s6 = vld1q_u16(s);
+ s += p;
+ *s7 = vld1q_u16(s);
+}
+
+static INLINE void load_u16_16x4(const uint16_t *s, ptrdiff_t p,
+ uint16x8_t *const s0, uint16x8_t *const s1,
+ uint16x8_t *const s2, uint16x8_t *const s3,
+ uint16x8_t *const s4, uint16x8_t *const s5,
+ uint16x8_t *const s6, uint16x8_t *const s7) {
+ *s0 = vld1q_u16(s);
+ *s1 = vld1q_u16(s + 8);
+ s += p;
+ *s2 = vld1q_u16(s);
+ *s3 = vld1q_u16(s + 8);
+ s += p;
+ *s4 = vld1q_u16(s);
+ *s5 = vld1q_u16(s + 8);
+ s += p;
+ *s6 = vld1q_u16(s);
+ *s7 = vld1q_u16(s + 8);
+}
+
+static INLINE uint16x8_t load_unaligned_u16_4x2(const uint16_t *buf,
+ uint32_t stride) {
uint64_t a;
+ uint64x2_t a_u64;
memcpy(&a, buf, 8);
buf += stride;
- *tu0 = vdupq_n_u64(a);
+ a_u64 = vdupq_n_u64(0);
+ a_u64 = vsetq_lane_u64(a, a_u64, 0);
memcpy(&a, buf, 8);
buf += stride;
- *tu0 = vsetq_lane_u64(a, *tu0, 1);
- memcpy(&a, buf, 8);
- buf += stride;
- *tu1 = vdupq_n_u64(a);
- memcpy(&a, buf, 8);
- *tu1 = vsetq_lane_u64(a, *tu1, 1);
+ a_u64 = vsetq_lane_u64(a, a_u64, 1);
+ return vreinterpretq_u16_u64(a_u64);
+}
+
+static INLINE void load_unaligned_u16_4x4(const uint16_t *buf, uint32_t stride,
+ uint16x8_t *tu0, uint16x8_t *tu1) {
+ *tu0 = load_unaligned_u16_4x2(buf, stride);
+ buf += 2 * stride;
+ *tu1 = load_unaligned_u16_4x2(buf, stride);
}
static INLINE void load_s32_4x4(int32_t *s, int32_t p, int32x4_t *s1,
@@ -609,17 +999,9 @@
vst1q_s32(buf + 4, v1);
}
-// Stores the second result at an offset of 8 (instead of 4) to match the output
-// with that of C implementation and the function is similar to
-// store_s16q_to_tran_low(). The offset in the function name signifies that
-// pointer should be incremented by at least 4 in the calling function after
-// store_s16q_to_tran_low_offset_4() call.
-static INLINE void store_s16q_to_tran_low_offset_4(tran_low_t *buf,
- const int16x8_t a) {
- const int32x4_t v0 = vmovl_s16(vget_low_s16(a));
- const int32x4_t v1 = vmovl_s16(vget_high_s16(a));
+static INLINE void store_s16_to_tran_low(tran_low_t *buf, const int16x4_t a) {
+ const int32x4_t v0 = vmovl_s16(a);
vst1q_s32(buf, v0);
- vst1q_s32(buf + 8, v1);
}
#endif // AOM_AOM_DSP_ARM_MEM_NEON_H_
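
store_s16_to_tran_low widens int16 lanes to 32 bits before storing because tran_low_t is a 32-bit type in libaom. A scalar model of the new helper (assuming typedef int32_t tran_low_t):

    static void store_s16_to_tran_low_c(tran_low_t *buf, const int16_t a[4]) {
      for (int i = 0; i < 4; ++i) buf[i] = (int32_t)a[i];  // sign-extend 16 -> 32
    }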
diff --git a/aom_dsp/arm/obmc_sad_neon.c b/aom_dsp/arm/obmc_sad_neon.c
new file mode 100644
index 0000000..a692cbb
--- /dev/null
+++ b/aom_dsp/arm/obmc_sad_neon.c
@@ -0,0 +1,250 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "mem_neon.h"
+#include "sum_neon.h"
+
+static INLINE void obmc_sad_8x1_s16_neon(int16x8_t ref_s16, const int32_t *mask,
+ const int32_t *wsrc, uint32x4_t *sum) {
+ int32x4_t wsrc_lo = vld1q_s32(wsrc);
+ int32x4_t wsrc_hi = vld1q_s32(wsrc + 4);
+
+ int32x4_t mask_lo = vld1q_s32(mask);
+ int32x4_t mask_hi = vld1q_s32(mask + 4);
+
+ int16x8_t mask_s16 =
+ vuzpq_s16(vreinterpretq_s16_s32(mask_lo), vreinterpretq_s16_s32(mask_hi))
+ .val[0];
+
+ int32x4_t pre_lo = vmull_s16(vget_low_s16(ref_s16), vget_low_s16(mask_s16));
+ int32x4_t pre_hi = vmull_s16(vget_high_s16(ref_s16), vget_high_s16(mask_s16));
+
+ uint32x4_t abs_lo = vreinterpretq_u32_s32(vabdq_s32(wsrc_lo, pre_lo));
+ uint32x4_t abs_hi = vreinterpretq_u32_s32(vabdq_s32(wsrc_hi, pre_hi));
+
+ *sum = vrsraq_n_u32(*sum, abs_lo, 12);
+ *sum = vrsraq_n_u32(*sum, abs_hi, 12);
+}
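
A scalar model of obmc_sad_8x1_s16_neon; the 12-bit rounding shift mirrors the vrsraq_n_u32(., ., 12) accumulate above, and names here are illustrative:

    #include <stdint.h>
    static unsigned obmc_sad_8x1_c(const uint8_t *ref, const int32_t *wsrc,
                                   const int32_t *mask) {
      unsigned sad = 0;
      for (int i = 0; i < 8; ++i) {
        const int32_t diff = wsrc[i] - mask[i] * ref[i];
        const uint32_t a = (uint32_t)(diff < 0 ? -diff : diff);
        sad += (a + (1u << 11)) >> 12;  // rounding shift right by 12
      }
      return sad;
    }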
+
+#if AOM_ARCH_AARCH64
+
+// Use tbl for doing a double-width zero extension from 8->32 bits since we can
+// do this in one instruction rather than two (indices out of range (255 here)
+// are set to zero by tbl).
+DECLARE_ALIGNED(16, static const uint8_t, obmc_variance_permute_idx[]) = {
+ 0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255,
+ 4, 255, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255, 7, 255, 255, 255,
+ 8, 255, 255, 255, 9, 255, 255, 255, 10, 255, 255, 255, 11, 255, 255, 255,
+ 12, 255, 255, 255, 13, 255, 255, 255, 14, 255, 255, 255, 15, 255, 255, 255
+};
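
For example, applying pre_idx0 to r = { r0, r1, ..., r15 } selects the bytes { r0, 0, 0, 0, r1, 0, 0, 0, r2, 0, 0, 0, r3, 0, 0, 0 }, which on a little-endian target reinterprets as the uint32x4_t { r0, r1, r2, r3 }: four lanes zero-extended from 8 to 32 bits by a single vqtbl1q_u8.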
+
+static INLINE void obmc_sad_8x1_s32_neon(uint32x4_t ref_u32_lo,
+ uint32x4_t ref_u32_hi,
+ const int32_t *mask,
+ const int32_t *wsrc,
+ uint32x4_t sum[2]) {
+ int32x4_t wsrc_lo = vld1q_s32(wsrc);
+ int32x4_t wsrc_hi = vld1q_s32(wsrc + 4);
+ int32x4_t mask_lo = vld1q_s32(mask);
+ int32x4_t mask_hi = vld1q_s32(mask + 4);
+
+ int32x4_t pre_lo = vmulq_s32(vreinterpretq_s32_u32(ref_u32_lo), mask_lo);
+ int32x4_t pre_hi = vmulq_s32(vreinterpretq_s32_u32(ref_u32_hi), mask_hi);
+
+ uint32x4_t abs_lo = vreinterpretq_u32_s32(vabdq_s32(wsrc_lo, pre_lo));
+ uint32x4_t abs_hi = vreinterpretq_u32_s32(vabdq_s32(wsrc_hi, pre_hi));
+
+ sum[0] = vrsraq_n_u32(sum[0], abs_lo, 12);
+ sum[1] = vrsraq_n_u32(sum[1], abs_hi, 12);
+}
+
+static INLINE unsigned int obmc_sad_large_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int width,
+ int height) {
+ uint32x4_t sum[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ // Use tbl for doing a double-width zero extension from 8->32 bits since we
+ // can do this in one instruction rather than two.
+ uint8x16_t pre_idx0 = vld1q_u8(&obmc_variance_permute_idx[0]);
+ uint8x16_t pre_idx1 = vld1q_u8(&obmc_variance_permute_idx[16]);
+ uint8x16_t pre_idx2 = vld1q_u8(&obmc_variance_permute_idx[32]);
+ uint8x16_t pre_idx3 = vld1q_u8(&obmc_variance_permute_idx[48]);
+
+ int h = height;
+ do {
+ int w = width;
+ const uint8_t *ref_ptr = ref;
+ do {
+ uint8x16_t r = vld1q_u8(ref_ptr);
+
+ uint32x4_t ref_u32_lo = vreinterpretq_u32_u8(vqtbl1q_u8(r, pre_idx0));
+ uint32x4_t ref_u32_hi = vreinterpretq_u32_u8(vqtbl1q_u8(r, pre_idx1));
+ obmc_sad_8x1_s32_neon(ref_u32_lo, ref_u32_hi, mask, wsrc, sum);
+
+ ref_u32_lo = vreinterpretq_u32_u8(vqtbl1q_u8(r, pre_idx2));
+ ref_u32_hi = vreinterpretq_u32_u8(vqtbl1q_u8(r, pre_idx3));
+ obmc_sad_8x1_s32_neon(ref_u32_lo, ref_u32_hi, mask + 8, wsrc + 8, sum);
+
+ ref_ptr += 16;
+ wsrc += 16;
+ mask += 16;
+ w -= 16;
+ } while (w != 0);
+
+ ref += ref_stride;
+ } while (--h != 0);
+
+ return horizontal_add_u32x4(vaddq_u32(sum[0], sum[1]));
+}
+
+#else // !AOM_ARCH_AARCH64
+
+static INLINE unsigned int obmc_sad_large_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int width,
+ int height) {
+ uint32x4_t sum = vdupq_n_u32(0);
+
+ int h = height;
+ do {
+ int w = width;
+ const uint8_t *ref_ptr = ref;
+ do {
+ uint8x16_t r = vld1q_u8(ref_ptr);
+
+ int16x8_t ref_s16 = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(r)));
+ obmc_sad_8x1_s16_neon(ref_s16, mask, wsrc, &sum);
+
+ ref_s16 = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(r)));
+ obmc_sad_8x1_s16_neon(ref_s16, mask + 8, wsrc + 8, &sum);
+
+ ref_ptr += 16;
+ wsrc += 16;
+ mask += 16;
+ w -= 16;
+ } while (w != 0);
+
+ ref += ref_stride;
+ } while (--h != 0);
+
+ return horizontal_add_u32x4(sum);
+}
+
+#endif // AOM_ARCH_AARCH64
+
+static INLINE unsigned int obmc_sad_128xh_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h) {
+ return obmc_sad_large_neon(ref, ref_stride, wsrc, mask, 128, h);
+}
+
+static INLINE unsigned int obmc_sad_64xh_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h) {
+ return obmc_sad_large_neon(ref, ref_stride, wsrc, mask, 64, h);
+}
+
+static INLINE unsigned int obmc_sad_32xh_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h) {
+ return obmc_sad_large_neon(ref, ref_stride, wsrc, mask, 32, h);
+}
+
+static INLINE unsigned int obmc_sad_16xh_neon(const uint8_t *ref,
+ int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h) {
+ return obmc_sad_large_neon(ref, ref_stride, wsrc, mask, 16, h);
+}
+
+static INLINE unsigned int obmc_sad_8xh_neon(const uint8_t *ref, int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int height) {
+ uint32x4_t sum = vdupq_n_u32(0);
+
+ int h = height;
+ do {
+ uint8x8_t r = vld1_u8(ref);
+
+ int16x8_t ref_s16 = vreinterpretq_s16_u16(vmovl_u8(r));
+ obmc_sad_8x1_s16_neon(ref_s16, mask, wsrc, &sum);
+
+ ref += ref_stride;
+ wsrc += 8;
+ mask += 8;
+ } while (--h != 0);
+
+ return horizontal_add_u32x4(sum);
+}
+
+static INLINE unsigned int obmc_sad_4xh_neon(const uint8_t *ref, int ref_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int height) {
+ uint32x4_t sum = vdupq_n_u32(0);
+
+ int h = height / 2;
+ do {
+ uint8x8_t r = load_unaligned_u8(ref, ref_stride);
+
+ int16x8_t ref_s16 = vreinterpretq_s16_u16(vmovl_u8(r));
+ obmc_sad_8x1_s16_neon(ref_s16, mask, wsrc, &sum);
+
+ ref += 2 * ref_stride;
+ wsrc += 8;
+ mask += 8;
+ } while (--h != 0);
+
+ return horizontal_add_u32x4(sum);
+}
+
+#define OBMC_SAD_WXH_NEON(w, h) \
+ unsigned int aom_obmc_sad##w##x##h##_neon( \
+ const uint8_t *ref, int ref_stride, const int32_t *wsrc, \
+ const int32_t *mask) { \
+ return obmc_sad_##w##xh_neon(ref, ref_stride, wsrc, mask, h); \
+ }
+
+OBMC_SAD_WXH_NEON(4, 4)
+OBMC_SAD_WXH_NEON(4, 8)
+OBMC_SAD_WXH_NEON(4, 16)
+
+OBMC_SAD_WXH_NEON(8, 4)
+OBMC_SAD_WXH_NEON(8, 8)
+OBMC_SAD_WXH_NEON(8, 16)
+OBMC_SAD_WXH_NEON(8, 32)
+
+OBMC_SAD_WXH_NEON(16, 4)
+OBMC_SAD_WXH_NEON(16, 8)
+OBMC_SAD_WXH_NEON(16, 16)
+OBMC_SAD_WXH_NEON(16, 32)
+OBMC_SAD_WXH_NEON(16, 64)
+
+OBMC_SAD_WXH_NEON(32, 8)
+OBMC_SAD_WXH_NEON(32, 16)
+OBMC_SAD_WXH_NEON(32, 32)
+OBMC_SAD_WXH_NEON(32, 64)
+
+OBMC_SAD_WXH_NEON(64, 16)
+OBMC_SAD_WXH_NEON(64, 32)
+OBMC_SAD_WXH_NEON(64, 64)
+OBMC_SAD_WXH_NEON(64, 128)
+
+OBMC_SAD_WXH_NEON(128, 64)
+OBMC_SAD_WXH_NEON(128, 128)
diff --git a/aom_dsp/arm/obmc_variance_neon.c b/aom_dsp/arm/obmc_variance_neon.c
new file mode 100644
index 0000000..50cd5f3
--- /dev/null
+++ b/aom_dsp/arm/obmc_variance_neon.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "mem_neon.h"
+#include "sum_neon.h"
+
+static INLINE void obmc_variance_8x1_s16_neon(int16x8_t pre_s16,
+ const int32_t *wsrc,
+ const int32_t *mask,
+ int32x4_t *ssev,
+ int32x4_t *sumv) {
+ // For 4xh and 8xh we observe it is faster to avoid the double-widening of
+ // pre. Instead we do a single widening step and narrow the mask to 16-bits
+ // to allow us to perform a widening multiply. Widening multiply
+ // instructions have better throughput on some micro-architectures but for
+ // the larger block sizes this benefit is outweighed by the additional
+ // instruction needed to first narrow the mask vectors.
+
+ int32x4_t wsrc_s32_lo = vld1q_s32(&wsrc[0]);
+ int32x4_t wsrc_s32_hi = vld1q_s32(&wsrc[4]);
+ int16x8_t mask_s16 = vuzpq_s16(vreinterpretq_s16_s32(vld1q_s32(&mask[0])),
+ vreinterpretq_s16_s32(vld1q_s32(&mask[4])))
+ .val[0];
+
+ int32x4_t diff_s32_lo =
+ vmlsl_s16(wsrc_s32_lo, vget_low_s16(pre_s16), vget_low_s16(mask_s16));
+ int32x4_t diff_s32_hi =
+ vmlsl_s16(wsrc_s32_hi, vget_high_s16(pre_s16), vget_high_s16(mask_s16));
+
+ // ROUND_POWER_OF_TWO_SIGNED(value, 12) rounds to nearest with ties away
+ // from zero, however vrshrq_n_s32 rounds to nearest with ties rounded up.
+ // This difference only affects the bit patterns at the rounding breakpoints
+ // exactly, so we can add -1 to all negative numbers to move the breakpoint
+ // one value across and into the correct rounding region.
+ diff_s32_lo = vsraq_n_s32(diff_s32_lo, diff_s32_lo, 31);
+ diff_s32_hi = vsraq_n_s32(diff_s32_hi, diff_s32_hi, 31);
+ int32x4_t round_s32_lo = vrshrq_n_s32(diff_s32_lo, 12);
+ int32x4_t round_s32_hi = vrshrq_n_s32(diff_s32_hi, 12);
+
+ *sumv = vrsraq_n_s32(*sumv, diff_s32_lo, 12);
+ *sumv = vrsraq_n_s32(*sumv, diff_s32_hi, 12);
+ *ssev = vmlaq_s32(*ssev, round_s32_lo, round_s32_lo);
+ *ssev = vmlaq_s32(*ssev, round_s32_hi, round_s32_hi);
+}
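
Worked example of the breakpoint fix, taking diff = -2048: ROUND_POWER_OF_TWO_SIGNED(-2048, 12) = -((2048 + 2048) >> 12) = -1, but a plain vrshrq_n_s32 would give (-2048 + 2048) >> 12 = 0. vsraq_n_s32(diff, diff, 31) adds diff >> 31 = -1 to negative lanes first, so the rounding shift then yields (-2049 + 2048) >> 12 = -1, matching the C reference. Non-negative lanes are unchanged since diff >> 31 = 0 for them.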
+
+#if AOM_ARCH_AARCH64
+
+// Use tbl for doing a double-width zero extension from 8->32 bits since we can
+// do this in one instruction rather than two (indices out of range (255 here)
+// are set to zero by tbl).
+DECLARE_ALIGNED(16, static const uint8_t, obmc_variance_permute_idx[]) = {
+ 0, 255, 255, 255, 1, 255, 255, 255, 2, 255, 255, 255, 3, 255, 255, 255,
+ 4, 255, 255, 255, 5, 255, 255, 255, 6, 255, 255, 255, 7, 255, 255, 255,
+ 8, 255, 255, 255, 9, 255, 255, 255, 10, 255, 255, 255, 11, 255, 255, 255,
+ 12, 255, 255, 255, 13, 255, 255, 255, 14, 255, 255, 255, 15, 255, 255, 255
+};
+
+static INLINE void obmc_variance_8x1_s32_neon(
+ int32x4_t pre_lo, int32x4_t pre_hi, const int32_t *wsrc,
+ const int32_t *mask, int32x4_t *ssev, int32x4_t *sumv) {
+ int32x4_t wsrc_lo = vld1q_s32(&wsrc[0]);
+ int32x4_t wsrc_hi = vld1q_s32(&wsrc[4]);
+ int32x4_t mask_lo = vld1q_s32(&mask[0]);
+ int32x4_t mask_hi = vld1q_s32(&mask[4]);
+
+ int32x4_t diff_lo = vmlsq_s32(wsrc_lo, pre_lo, mask_lo);
+ int32x4_t diff_hi = vmlsq_s32(wsrc_hi, pre_hi, mask_hi);
+
+ // ROUND_POWER_OF_TWO_SIGNED(value, 12) rounds to nearest with ties away from
+ // zero, however vrshrq_n_s32 rounds to nearest with ties rounded up. This
+ // difference only affects the bit patterns at the rounding breakpoints
+ // exactly, so we can add -1 to all negative numbers to move the breakpoint
+ // one value across and into the correct rounding region.
+ diff_lo = vsraq_n_s32(diff_lo, diff_lo, 31);
+ diff_hi = vsraq_n_s32(diff_hi, diff_hi, 31);
+ int32x4_t round_lo = vrshrq_n_s32(diff_lo, 12);
+ int32x4_t round_hi = vrshrq_n_s32(diff_hi, 12);
+
+ *sumv = vrsraq_n_s32(*sumv, diff_lo, 12);
+ *sumv = vrsraq_n_s32(*sumv, diff_hi, 12);
+ *ssev = vmlaq_s32(*ssev, round_lo, round_lo);
+ *ssev = vmlaq_s32(*ssev, round_hi, round_hi);
+}
+
+static INLINE void obmc_variance_large_neon(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int width,
+ int height, unsigned *sse,
+ int *sum) {
+ assert(width % 16 == 0);
+
+ // Use tbl for doing a double-width zero extension from 8->32 bits since we
+ // can do this in one instruction rather than two.
+ uint8x16_t pre_idx0 = vld1q_u8(&obmc_variance_permute_idx[0]);
+ uint8x16_t pre_idx1 = vld1q_u8(&obmc_variance_permute_idx[16]);
+ uint8x16_t pre_idx2 = vld1q_u8(&obmc_variance_permute_idx[32]);
+ uint8x16_t pre_idx3 = vld1q_u8(&obmc_variance_permute_idx[48]);
+
+ int32x4_t ssev = vdupq_n_s32(0);
+ int32x4_t sumv = vdupq_n_s32(0);
+
+ int h = height;
+ do {
+ int w = width;
+ do {
+ uint8x16_t pre_u8 = vld1q_u8(pre);
+
+ int32x4_t pre_s32_lo = vreinterpretq_s32_u8(vqtbl1q_u8(pre_u8, pre_idx0));
+ int32x4_t pre_s32_hi = vreinterpretq_s32_u8(vqtbl1q_u8(pre_u8, pre_idx1));
+ obmc_variance_8x1_s32_neon(pre_s32_lo, pre_s32_hi, &wsrc[0], &mask[0],
+ &ssev, &sumv);
+
+ pre_s32_lo = vreinterpretq_s32_u8(vqtbl1q_u8(pre_u8, pre_idx2));
+ pre_s32_hi = vreinterpretq_s32_u8(vqtbl1q_u8(pre_u8, pre_idx3));
+ obmc_variance_8x1_s32_neon(pre_s32_lo, pre_s32_hi, &wsrc[8], &mask[8],
+ &ssev, &sumv);
+
+ wsrc += 16;
+ mask += 16;
+ pre += 16;
+ w -= 16;
+ } while (w != 0);
+
+ pre += pre_stride - width;
+ } while (--h != 0);
+
+ *sse = horizontal_add_s32x4(ssev);
+ *sum = horizontal_add_s32x4(sumv);
+}
+
+#else // !AOM_ARCH_AARCH64
+
+static INLINE void obmc_variance_large_neon(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int width,
+ int height, unsigned *sse,
+ int *sum) {
+ // Non-aarch64 targets do not have a 128-bit tbl instruction, so use the
+ // widening version of the core kernel instead.
+
+ assert(width % 16 == 0);
+
+ int32x4_t ssev = vdupq_n_s32(0);
+ int32x4_t sumv = vdupq_n_s32(0);
+
+ int h = height;
+ do {
+ int w = width;
+ do {
+ uint8x16_t pre_u8 = vld1q_u8(pre);
+
+ int16x8_t pre_s16 = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(pre_u8)));
+ obmc_variance_8x1_s16_neon(pre_s16, &wsrc[0], &mask[0], &ssev, &sumv);
+
+ pre_s16 = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(pre_u8)));
+ obmc_variance_8x1_s16_neon(pre_s16, &wsrc[8], &mask[8], &ssev, &sumv);
+
+ wsrc += 16;
+ mask += 16;
+ pre += 16;
+ w -= 16;
+ } while (w != 0);
+
+ pre += pre_stride - width;
+ } while (--h != 0);
+
+ *sse = horizontal_add_s32x4(ssev);
+ *sum = horizontal_add_s32x4(sumv);
+}
+
+#endif // AOM_ARCH_AARCH64
+
+static INLINE void obmc_variance_neon_128xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ obmc_variance_large_neon(pre, pre_stride, wsrc, mask, 128, h, sse, sum);
+}
+
+static INLINE void obmc_variance_neon_64xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ obmc_variance_large_neon(pre, pre_stride, wsrc, mask, 64, h, sse, sum);
+}
+
+static INLINE void obmc_variance_neon_32xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ obmc_variance_large_neon(pre, pre_stride, wsrc, mask, 32, h, sse, sum);
+}
+
+static INLINE void obmc_variance_neon_16xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ obmc_variance_large_neon(pre, pre_stride, wsrc, mask, 16, h, sse, sum);
+}
+
+static INLINE void obmc_variance_neon_8xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ int32x4_t ssev = vdupq_n_s32(0);
+ int32x4_t sumv = vdupq_n_s32(0);
+
+ do {
+ uint8x8_t pre_u8 = vld1_u8(pre);
+ int16x8_t pre_s16 = vreinterpretq_s16_u16(vmovl_u8(pre_u8));
+
+ obmc_variance_8x1_s16_neon(pre_s16, wsrc, mask, &ssev, &sumv);
+
+ pre += pre_stride;
+ wsrc += 8;
+ mask += 8;
+ } while (--h != 0);
+
+ *sse = horizontal_add_s32x4(ssev);
+ *sum = horizontal_add_s32x4(sumv);
+}
+
+static INLINE void obmc_variance_neon_4xh(const uint8_t *pre, int pre_stride,
+ const int32_t *wsrc,
+ const int32_t *mask, int h,
+ unsigned *sse, int *sum) {
+ assert(h % 2 == 0);
+
+ int32x4_t ssev = vdupq_n_s32(0);
+ int32x4_t sumv = vdupq_n_s32(0);
+
+ do {
+ uint8x8_t pre_u8 = load_unaligned_u8(pre, pre_stride);
+ int16x8_t pre_s16 = vreinterpretq_s16_u16(vmovl_u8(pre_u8));
+
+ obmc_variance_8x1_s16_neon(pre_s16, wsrc, mask, &ssev, &sumv);
+
+ pre += 2 * pre_stride;
+ wsrc += 8;
+ mask += 8;
+ h -= 2;
+ } while (h != 0);
+
+ *sse = horizontal_add_s32x4(ssev);
+ *sum = horizontal_add_s32x4(sumv);
+}
+
+#define OBMC_VARIANCE_WXH_NEON(W, H) \
+ unsigned aom_obmc_variance##W##x##H##_neon( \
+ const uint8_t *pre, int pre_stride, const int32_t *wsrc, \
+ const int32_t *mask, unsigned *sse) { \
+ int sum; \
+ obmc_variance_neon_##W##xh(pre, pre_stride, wsrc, mask, H, sse, &sum); \
+ return *sse - (unsigned)(((int64_t)sum * sum) / (W * H)); \
+ }
+
+OBMC_VARIANCE_WXH_NEON(4, 4)
+OBMC_VARIANCE_WXH_NEON(4, 8)
+OBMC_VARIANCE_WXH_NEON(8, 4)
+OBMC_VARIANCE_WXH_NEON(8, 8)
+OBMC_VARIANCE_WXH_NEON(8, 16)
+OBMC_VARIANCE_WXH_NEON(16, 8)
+OBMC_VARIANCE_WXH_NEON(16, 16)
+OBMC_VARIANCE_WXH_NEON(16, 32)
+OBMC_VARIANCE_WXH_NEON(32, 16)
+OBMC_VARIANCE_WXH_NEON(32, 32)
+OBMC_VARIANCE_WXH_NEON(32, 64)
+OBMC_VARIANCE_WXH_NEON(64, 32)
+OBMC_VARIANCE_WXH_NEON(64, 64)
+OBMC_VARIANCE_WXH_NEON(64, 128)
+OBMC_VARIANCE_WXH_NEON(128, 64)
+OBMC_VARIANCE_WXH_NEON(128, 128)
+OBMC_VARIANCE_WXH_NEON(4, 16)
+OBMC_VARIANCE_WXH_NEON(16, 4)
+OBMC_VARIANCE_WXH_NEON(8, 32)
+OBMC_VARIANCE_WXH_NEON(32, 8)
+OBMC_VARIANCE_WXH_NEON(16, 64)
+OBMC_VARIANCE_WXH_NEON(64, 16)
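
The vsraq_n_s32(diff, diff, 31) adjustment in the kernels above can be checked
against a scalar model. A minimal sketch (hypothetical helper names; right
shifts of negative values are assumed to be arithmetic, as they are on the
compilers this code targets):

    #include <assert.h>
    #include <stdint.h>

    // Round to nearest with ties away from zero, as
    // ROUND_POWER_OF_TWO_SIGNED(value, 12) does.
    static int32_t round_ties_away(int32_t v, int n) {
      return (v < 0) ? -(int32_t)((-(int64_t)v + (1 << (n - 1))) >> n)
                     : (int32_t)(((int64_t)v + (1 << (n - 1))) >> n);
    }

    // The NEON trick: v += v >> 31 adds -1 to negative values only, after
    // which a rounding shift with ties rounded up (vrshrq_n_s32) matches.
    static int32_t round_neon_model(int32_t v, int n) {
      v += v >> 31;
      return (int32_t)(((int64_t)v + (1 << (n - 1))) >> n);
    }

    int main(void) {
      for (int32_t v = -(1 << 20); v <= (1 << 20); ++v) {
        assert(round_ties_away(v, 12) == round_neon_model(v, 12));
      }
      return 0;
    }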
diff --git a/aom_dsp/arm/sad4d_neon.c b/aom_dsp/arm/sad4d_neon.c
deleted file mode 100644
index e1eccc3..0000000
--- a/aom_dsp/arm/sad4d_neon.c
+++ /dev/null
@@ -1,534 +0,0 @@
-/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include <arm_neon.h>
-
-#include "config/aom_config.h"
-#include "config/aom_dsp_rtcd.h"
-
-#include "aom/aom_integer.h"
-#include "aom_dsp/arm/sum_neon.h"
-
-#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
-
-static INLINE void sad16_neon(uint8x16_t src, uint8x16_t ref,
- uint32x4_t *const sad_sum) {
- uint8x16_t abs_diff = vabdq_u8(src, ref);
- *sad_sum = vdotq_u32(*sad_sum, abs_diff, vdupq_n_u8(1));
-}
-
-static INLINE void sad128xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
- uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
-
- int i = 0;
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- const uint8x16_t s2 = vld1q_u8(src + i * src_stride + 32);
- sad16_neon(s2, vld1q_u8(ref[0] + i * ref_stride + 32), &sum_lo[0]);
- sad16_neon(s2, vld1q_u8(ref[1] + i * ref_stride + 32), &sum_lo[1]);
- sad16_neon(s2, vld1q_u8(ref[2] + i * ref_stride + 32), &sum_lo[2]);
- sad16_neon(s2, vld1q_u8(ref[3] + i * ref_stride + 32), &sum_lo[3]);
-
- const uint8x16_t s3 = vld1q_u8(src + i * src_stride + 48);
- sad16_neon(s3, vld1q_u8(ref[0] + i * ref_stride + 48), &sum_hi[0]);
- sad16_neon(s3, vld1q_u8(ref[1] + i * ref_stride + 48), &sum_hi[1]);
- sad16_neon(s3, vld1q_u8(ref[2] + i * ref_stride + 48), &sum_hi[2]);
- sad16_neon(s3, vld1q_u8(ref[3] + i * ref_stride + 48), &sum_hi[3]);
-
- const uint8x16_t s4 = vld1q_u8(src + i * src_stride + 64);
- sad16_neon(s4, vld1q_u8(ref[0] + i * ref_stride + 64), &sum_lo[0]);
- sad16_neon(s4, vld1q_u8(ref[1] + i * ref_stride + 64), &sum_lo[1]);
- sad16_neon(s4, vld1q_u8(ref[2] + i * ref_stride + 64), &sum_lo[2]);
- sad16_neon(s4, vld1q_u8(ref[3] + i * ref_stride + 64), &sum_lo[3]);
-
- const uint8x16_t s5 = vld1q_u8(src + i * src_stride + 80);
- sad16_neon(s5, vld1q_u8(ref[0] + i * ref_stride + 80), &sum_hi[0]);
- sad16_neon(s5, vld1q_u8(ref[1] + i * ref_stride + 80), &sum_hi[1]);
- sad16_neon(s5, vld1q_u8(ref[2] + i * ref_stride + 80), &sum_hi[2]);
- sad16_neon(s5, vld1q_u8(ref[3] + i * ref_stride + 80), &sum_hi[3]);
-
- const uint8x16_t s6 = vld1q_u8(src + i * src_stride + 96);
- sad16_neon(s6, vld1q_u8(ref[0] + i * ref_stride + 96), &sum_lo[0]);
- sad16_neon(s6, vld1q_u8(ref[1] + i * ref_stride + 96), &sum_lo[1]);
- sad16_neon(s6, vld1q_u8(ref[2] + i * ref_stride + 96), &sum_lo[2]);
- sad16_neon(s6, vld1q_u8(ref[3] + i * ref_stride + 96), &sum_lo[3]);
-
- const uint8x16_t s7 = vld1q_u8(src + i * src_stride + 112);
- sad16_neon(s7, vld1q_u8(ref[0] + i * ref_stride + 112), &sum_hi[0]);
- sad16_neon(s7, vld1q_u8(ref[1] + i * ref_stride + 112), &sum_hi[1]);
- sad16_neon(s7, vld1q_u8(ref[2] + i * ref_stride + 112), &sum_hi[2]);
- sad16_neon(s7, vld1q_u8(ref[3] + i * ref_stride + 112), &sum_hi[3]);
-
- i++;
- } while (i < h);
-
- uint32x4_t res0 = vpaddq_u32(vaddq_u32(sum_lo[0], sum_hi[0]),
- vaddq_u32(sum_lo[1], sum_hi[1]));
- uint32x4_t res1 = vpaddq_u32(vaddq_u32(sum_lo[2], sum_hi[2]),
- vaddq_u32(sum_lo[3], sum_hi[3]));
- vst1q_u32(res, vpaddq_u32(res0, res1));
-}
-
-static INLINE void sad64xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
- uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
-
- int i = 0;
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- const uint8x16_t s2 = vld1q_u8(src + i * src_stride + 32);
- sad16_neon(s2, vld1q_u8(ref[0] + i * ref_stride + 32), &sum_lo[0]);
- sad16_neon(s2, vld1q_u8(ref[1] + i * ref_stride + 32), &sum_lo[1]);
- sad16_neon(s2, vld1q_u8(ref[2] + i * ref_stride + 32), &sum_lo[2]);
- sad16_neon(s2, vld1q_u8(ref[3] + i * ref_stride + 32), &sum_lo[3]);
-
- const uint8x16_t s3 = vld1q_u8(src + i * src_stride + 48);
- sad16_neon(s3, vld1q_u8(ref[0] + i * ref_stride + 48), &sum_hi[0]);
- sad16_neon(s3, vld1q_u8(ref[1] + i * ref_stride + 48), &sum_hi[1]);
- sad16_neon(s3, vld1q_u8(ref[2] + i * ref_stride + 48), &sum_hi[2]);
- sad16_neon(s3, vld1q_u8(ref[3] + i * ref_stride + 48), &sum_hi[3]);
-
- i++;
- } while (i < h);
-
- uint32x4_t res0 = vpaddq_u32(vaddq_u32(sum_lo[0], sum_hi[0]),
- vaddq_u32(sum_lo[1], sum_hi[1]));
- uint32x4_t res1 = vpaddq_u32(vaddq_u32(sum_lo[2], sum_hi[2]),
- vaddq_u32(sum_lo[3], sum_hi[3]));
- vst1q_u32(res, vpaddq_u32(res0, res1));
-}
-
-static INLINE void sad32xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
- uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
-
- int i = 0;
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- i++;
- } while (i < h);
-
- uint32x4_t res0 = vpaddq_u32(vaddq_u32(sum_lo[0], sum_hi[0]),
- vaddq_u32(sum_lo[1], sum_hi[1]));
- uint32x4_t res1 = vpaddq_u32(vaddq_u32(sum_lo[2], sum_hi[2]),
- vaddq_u32(sum_lo[3], sum_hi[3]));
- vst1q_u32(res, vpaddq_u32(res0, res1));
-}
-
-static INLINE void sad16xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
- vdupq_n_u32(0) };
-
- int i = 0;
- do {
- const uint8x16_t s = vld1q_u8(src + i * src_stride);
- sad16_neon(s, vld1q_u8(ref[0] + i * ref_stride), &sum[0]);
- sad16_neon(s, vld1q_u8(ref[1] + i * ref_stride), &sum[1]);
- sad16_neon(s, vld1q_u8(ref[2] + i * ref_stride), &sum[2]);
- sad16_neon(s, vld1q_u8(ref[3] + i * ref_stride), &sum[3]);
-
- i++;
- } while (i < h);
-
- uint32x4_t res0 = vpaddq_u32(sum[0], sum[1]);
- uint32x4_t res1 = vpaddq_u32(sum[2], sum[3]);
- vst1q_u32(res, vpaddq_u32(res0, res1));
-}
-
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
-
-static INLINE void sad16_neon(uint8x16_t src, uint8x16_t ref,
- uint16x8_t *const sad_sum) {
- uint8x16_t abs_diff = vabdq_u8(src, ref);
- *sad_sum = vpadalq_u8(*sad_sum, abs_diff);
-}
-
-static INLINE void sad128xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- vst1q_u32(res, vdupq_n_u32(0));
- int h_tmp = h > 32 ? 32 : h;
-
- int i = 0;
- do {
- uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
- uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- const uint8x16_t s2 = vld1q_u8(src + i * src_stride + 32);
- sad16_neon(s2, vld1q_u8(ref[0] + i * ref_stride + 32), &sum_lo[0]);
- sad16_neon(s2, vld1q_u8(ref[1] + i * ref_stride + 32), &sum_lo[1]);
- sad16_neon(s2, vld1q_u8(ref[2] + i * ref_stride + 32), &sum_lo[2]);
- sad16_neon(s2, vld1q_u8(ref[3] + i * ref_stride + 32), &sum_lo[3]);
-
- const uint8x16_t s3 = vld1q_u8(src + i * src_stride + 48);
- sad16_neon(s3, vld1q_u8(ref[0] + i * ref_stride + 48), &sum_hi[0]);
- sad16_neon(s3, vld1q_u8(ref[1] + i * ref_stride + 48), &sum_hi[1]);
- sad16_neon(s3, vld1q_u8(ref[2] + i * ref_stride + 48), &sum_hi[2]);
- sad16_neon(s3, vld1q_u8(ref[3] + i * ref_stride + 48), &sum_hi[3]);
-
- const uint8x16_t s4 = vld1q_u8(src + i * src_stride + 64);
- sad16_neon(s4, vld1q_u8(ref[0] + i * ref_stride + 64), &sum_lo[0]);
- sad16_neon(s4, vld1q_u8(ref[1] + i * ref_stride + 64), &sum_lo[1]);
- sad16_neon(s4, vld1q_u8(ref[2] + i * ref_stride + 64), &sum_lo[2]);
- sad16_neon(s4, vld1q_u8(ref[3] + i * ref_stride + 64), &sum_lo[3]);
-
- const uint8x16_t s5 = vld1q_u8(src + i * src_stride + 80);
- sad16_neon(s5, vld1q_u8(ref[0] + i * ref_stride + 80), &sum_hi[0]);
- sad16_neon(s5, vld1q_u8(ref[1] + i * ref_stride + 80), &sum_hi[1]);
- sad16_neon(s5, vld1q_u8(ref[2] + i * ref_stride + 80), &sum_hi[2]);
- sad16_neon(s5, vld1q_u8(ref[3] + i * ref_stride + 80), &sum_hi[3]);
-
- const uint8x16_t s6 = vld1q_u8(src + i * src_stride + 96);
- sad16_neon(s6, vld1q_u8(ref[0] + i * ref_stride + 96), &sum_lo[0]);
- sad16_neon(s6, vld1q_u8(ref[1] + i * ref_stride + 96), &sum_lo[1]);
- sad16_neon(s6, vld1q_u8(ref[2] + i * ref_stride + 96), &sum_lo[2]);
- sad16_neon(s6, vld1q_u8(ref[3] + i * ref_stride + 96), &sum_lo[3]);
-
- const uint8x16_t s7 = vld1q_u8(src + i * src_stride + 112);
- sad16_neon(s7, vld1q_u8(ref[0] + i * ref_stride + 112), &sum_hi[0]);
- sad16_neon(s7, vld1q_u8(ref[1] + i * ref_stride + 112), &sum_hi[1]);
- sad16_neon(s7, vld1q_u8(ref[2] + i * ref_stride + 112), &sum_hi[2]);
- sad16_neon(s7, vld1q_u8(ref[3] + i * ref_stride + 112), &sum_hi[3]);
-
- i++;
- } while (i < h_tmp);
-
- res[0] += horizontal_long_add_u16x8(sum_lo[0], sum_hi[0]);
- res[1] += horizontal_long_add_u16x8(sum_lo[1], sum_hi[1]);
- res[2] += horizontal_long_add_u16x8(sum_lo[2], sum_hi[2]);
- res[3] += horizontal_long_add_u16x8(sum_lo[3], sum_hi[3]);
-
- h_tmp += 32;
- } while (i < h);
-}
-
-static INLINE void sad64xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- vst1q_u32(res, vdupq_n_u32(0));
- int h_tmp = h > 64 ? 64 : h;
-
- int i = 0;
- do {
- uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
- uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- const uint8x16_t s2 = vld1q_u8(src + i * src_stride + 32);
- sad16_neon(s2, vld1q_u8(ref[0] + i * ref_stride + 32), &sum_lo[0]);
- sad16_neon(s2, vld1q_u8(ref[1] + i * ref_stride + 32), &sum_lo[1]);
- sad16_neon(s2, vld1q_u8(ref[2] + i * ref_stride + 32), &sum_lo[2]);
- sad16_neon(s2, vld1q_u8(ref[3] + i * ref_stride + 32), &sum_lo[3]);
-
- const uint8x16_t s3 = vld1q_u8(src + i * src_stride + 48);
- sad16_neon(s3, vld1q_u8(ref[0] + i * ref_stride + 48), &sum_hi[0]);
- sad16_neon(s3, vld1q_u8(ref[1] + i * ref_stride + 48), &sum_hi[1]);
- sad16_neon(s3, vld1q_u8(ref[2] + i * ref_stride + 48), &sum_hi[2]);
- sad16_neon(s3, vld1q_u8(ref[3] + i * ref_stride + 48), &sum_hi[3]);
-
- i++;
- } while (i < h_tmp);
-
- res[0] += horizontal_long_add_u16x8(sum_lo[0], sum_hi[0]);
- res[1] += horizontal_long_add_u16x8(sum_lo[1], sum_hi[1]);
- res[2] += horizontal_long_add_u16x8(sum_lo[2], sum_hi[2]);
- res[3] += horizontal_long_add_u16x8(sum_lo[3], sum_hi[3]);
-
- h_tmp += 64;
- } while (i < h);
-}
-
-static INLINE void sad32xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
- uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- int i = 0;
- do {
- const uint8x16_t s0 = vld1q_u8(src + i * src_stride);
- sad16_neon(s0, vld1q_u8(ref[0] + i * ref_stride), &sum_lo[0]);
- sad16_neon(s0, vld1q_u8(ref[1] + i * ref_stride), &sum_lo[1]);
- sad16_neon(s0, vld1q_u8(ref[2] + i * ref_stride), &sum_lo[2]);
- sad16_neon(s0, vld1q_u8(ref[3] + i * ref_stride), &sum_lo[3]);
-
- const uint8x16_t s1 = vld1q_u8(src + i * src_stride + 16);
- sad16_neon(s1, vld1q_u8(ref[0] + i * ref_stride + 16), &sum_hi[0]);
- sad16_neon(s1, vld1q_u8(ref[1] + i * ref_stride + 16), &sum_hi[1]);
- sad16_neon(s1, vld1q_u8(ref[2] + i * ref_stride + 16), &sum_hi[2]);
- sad16_neon(s1, vld1q_u8(ref[3] + i * ref_stride + 16), &sum_hi[3]);
-
- i++;
- } while (i < h);
-
- res[0] = horizontal_long_add_u16x8(sum_lo[0], sum_hi[0]);
- res[1] = horizontal_long_add_u16x8(sum_lo[1], sum_hi[1]);
- res[2] = horizontal_long_add_u16x8(sum_lo[2], sum_hi[2]);
- res[3] = horizontal_long_add_u16x8(sum_lo[3], sum_hi[3]);
-}
-
-static INLINE void sad16xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- int i = 0;
- do {
- const uint8x16_t s = vld1q_u8(src + i * src_stride);
- sad16_neon(s, vld1q_u8(ref[0] + i * ref_stride), &sum[0]);
- sad16_neon(s, vld1q_u8(ref[1] + i * ref_stride), &sum[1]);
- sad16_neon(s, vld1q_u8(ref[2] + i * ref_stride), &sum[2]);
- sad16_neon(s, vld1q_u8(ref[3] + i * ref_stride), &sum[3]);
-
- i++;
- } while (i < h);
-
- res[0] = horizontal_add_u16x8(sum[0]);
- res[1] = horizontal_add_u16x8(sum[1]);
- res[2] = horizontal_add_u16x8(sum[2]);
- res[3] = horizontal_add_u16x8(sum[3]);
-}
-
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
-
-static INLINE void sad8_neon(uint8x8_t src, uint8x8_t ref,
- uint16x8_t *const sad_sum) {
- uint8x8_t abs_diff = vabd_u8(src, ref);
- *sad_sum = vaddw_u8(*sad_sum, abs_diff);
-}
-
-static INLINE void sad8xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- int i = 0;
- do {
- const uint8x8_t s = vld1_u8(src + i * src_stride);
- sad8_neon(s, vld1_u8(ref[0] + i * ref_stride), &sum[0]);
- sad8_neon(s, vld1_u8(ref[1] + i * ref_stride), &sum[1]);
- sad8_neon(s, vld1_u8(ref[2] + i * ref_stride), &sum[2]);
- sad8_neon(s, vld1_u8(ref[3] + i * ref_stride), &sum[3]);
-
- i++;
- } while (i < h);
-
- res[0] = horizontal_add_u16x8(sum[0]);
- res[1] = horizontal_add_u16x8(sum[1]);
- res[2] = horizontal_add_u16x8(sum[2]);
- res[3] = horizontal_add_u16x8(sum[3]);
-}
-
-static INLINE void sad4xhx4d_neon(const uint8_t *src, int src_stride,
- const uint8_t *const ref[4], int ref_stride,
- uint32_t res[4], int h) {
- uint16x8_t sum[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
- vdupq_n_u16(0) };
-
- int i = 0;
- do {
- uint32x2_t s, r0, r1, r2, r3;
- uint32_t s_lo, s_hi, r0_lo, r0_hi, r1_lo, r1_hi, r2_lo, r2_hi, r3_lo, r3_hi;
-
- memcpy(&s_lo, src + i * src_stride, 4);
- memcpy(&r0_lo, ref[0] + i * ref_stride, 4);
- memcpy(&r1_lo, ref[1] + i * ref_stride, 4);
- memcpy(&r2_lo, ref[2] + i * ref_stride, 4);
- memcpy(&r3_lo, ref[3] + i * ref_stride, 4);
- s = vdup_n_u32(s_lo);
- r0 = vdup_n_u32(r0_lo);
- r1 = vdup_n_u32(r1_lo);
- r2 = vdup_n_u32(r2_lo);
- r3 = vdup_n_u32(r3_lo);
-
- memcpy(&s_hi, src + (i + 1) * src_stride, 4);
- memcpy(&r0_hi, ref[0] + (i + 1) * ref_stride, 4);
- memcpy(&r1_hi, ref[1] + (i + 1) * ref_stride, 4);
- memcpy(&r2_hi, ref[2] + (i + 1) * ref_stride, 4);
- memcpy(&r3_hi, ref[3] + (i + 1) * ref_stride, 4);
- s = vset_lane_u32(s_hi, s, 1);
- r0 = vset_lane_u32(r0_hi, r0, 1);
- r1 = vset_lane_u32(r1_hi, r1, 1);
- r2 = vset_lane_u32(r2_hi, r2, 1);
- r3 = vset_lane_u32(r3_hi, r3, 1);
-
- sad8_neon(vreinterpret_u8_u32(s), vreinterpret_u8_u32(r0), &sum[0]);
- sad8_neon(vreinterpret_u8_u32(s), vreinterpret_u8_u32(r1), &sum[1]);
- sad8_neon(vreinterpret_u8_u32(s), vreinterpret_u8_u32(r2), &sum[2]);
- sad8_neon(vreinterpret_u8_u32(s), vreinterpret_u8_u32(r3), &sum[3]);
-
- i += 2;
- } while (i < h);
-
- res[0] = horizontal_add_u16x8(sum[0]);
- res[1] = horizontal_add_u16x8(sum[1]);
- res[2] = horizontal_add_u16x8(sum[2]);
- res[3] = horizontal_add_u16x8(sum[3]);
-}
-
-#define SAD_WXH_4D_NEON(w, h) \
- void aom_sad##w##x##h##x4d_neon(const uint8_t *src, int src_stride, \
- const uint8_t *const ref[4], int ref_stride, \
- uint32_t res[4]) { \
- sad##w##xhx4d_neon(src, src_stride, ref, ref_stride, res, (h)); \
- }
-
-SAD_WXH_4D_NEON(4, 4)
-SAD_WXH_4D_NEON(4, 8)
-SAD_WXH_4D_NEON(4, 16)
-SAD_WXH_4D_NEON(4, 32)
-
-SAD_WXH_4D_NEON(8, 4)
-SAD_WXH_4D_NEON(8, 8)
-SAD_WXH_4D_NEON(8, 16)
-SAD_WXH_4D_NEON(8, 32)
-
-SAD_WXH_4D_NEON(16, 4)
-SAD_WXH_4D_NEON(16, 8)
-SAD_WXH_4D_NEON(16, 16)
-SAD_WXH_4D_NEON(16, 32)
-SAD_WXH_4D_NEON(16, 64)
-
-SAD_WXH_4D_NEON(32, 8)
-SAD_WXH_4D_NEON(32, 16)
-SAD_WXH_4D_NEON(32, 32)
-SAD_WXH_4D_NEON(32, 64)
-
-SAD_WXH_4D_NEON(64, 16)
-SAD_WXH_4D_NEON(64, 32)
-SAD_WXH_4D_NEON(64, 64)
-SAD_WXH_4D_NEON(64, 128)
-
-SAD_WXH_4D_NEON(128, 64)
-SAD_WXH_4D_NEON(128, 128)
-
-#undef SAD_WXH_4D_NEON
-
-#define SAD_SKIP_WXH_4D_NEON(w, h) \
- void aom_sad_skip_##w##x##h##x4d_neon(const uint8_t *src, int src_stride, \
- const uint8_t *const ref[4], \
- int ref_stride, uint32_t res[4]) { \
- sad##w##xhx4d_neon(src, 2 * src_stride, ref, 2 * ref_stride, res, \
- ((h) >> 1)); \
- res[0] <<= 1; \
- res[1] <<= 1; \
- res[2] <<= 1; \
- res[3] <<= 1; \
- }
-
-SAD_SKIP_WXH_4D_NEON(4, 8)
-SAD_SKIP_WXH_4D_NEON(4, 16)
-SAD_SKIP_WXH_4D_NEON(4, 32)
-
-SAD_SKIP_WXH_4D_NEON(8, 8)
-SAD_SKIP_WXH_4D_NEON(8, 16)
-SAD_SKIP_WXH_4D_NEON(8, 32)
-
-SAD_SKIP_WXH_4D_NEON(16, 8)
-SAD_SKIP_WXH_4D_NEON(16, 16)
-SAD_SKIP_WXH_4D_NEON(16, 32)
-SAD_SKIP_WXH_4D_NEON(16, 64)
-
-SAD_SKIP_WXH_4D_NEON(32, 8)
-SAD_SKIP_WXH_4D_NEON(32, 16)
-SAD_SKIP_WXH_4D_NEON(32, 32)
-SAD_SKIP_WXH_4D_NEON(32, 64)
-
-SAD_SKIP_WXH_4D_NEON(64, 16)
-SAD_SKIP_WXH_4D_NEON(64, 32)
-SAD_SKIP_WXH_4D_NEON(64, 64)
-SAD_SKIP_WXH_4D_NEON(64, 128)
-
-SAD_SKIP_WXH_4D_NEON(128, 64)
-SAD_SKIP_WXH_4D_NEON(128, 128)
-
-#undef SAD_SKIP_WXH_4D_NEON
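
The 4d kernels from this file move to the new aom_dsp/arm/sadxd_neon.c below,
which also adds x3d variants. The contract all of these kernels implement is
the plain scalar reduction sketched here (an illustration, not the library's C
fallback):

    #include <stdint.h>

    // Scalar model of aom_sadWxHx4d: one SAD per candidate reference block.
    static void sad_wxh_x4d_model(const uint8_t *src, int src_stride,
                                  const uint8_t *const ref[4], int ref_stride,
                                  int w, int h, uint32_t res[4]) {
      for (int k = 0; k < 4; ++k) {
        uint32_t sad = 0;
        for (int i = 0; i < h; ++i) {
          for (int j = 0; j < w; ++j) {
            const int d = src[i * src_stride + j] - ref[k][i * ref_stride + j];
            sad += (uint32_t)(d < 0 ? -d : d);
          }
        }
        res[k] = sad;
      }
    }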
diff --git a/aom_dsp/arm/sad_neon.c b/aom_dsp/arm/sad_neon.c
index 5ba7f10..60efef8 100644
--- a/aom_dsp/arm/sad_neon.c
+++ b/aom_dsp/arm/sad_neon.c
@@ -10,9 +10,12 @@
*/
#include <arm_neon.h>
+
#include "config/aom_config.h"
#include "config/aom_dsp_rtcd.h"
+
#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
#include "aom_dsp/arm/sum_neon.h"
#if defined(__ARM_FEATURE_DOTPROD)
@@ -289,24 +292,13 @@
int i = h / 2;
do {
- uint32x2_t s, r;
- uint32_t s0, s1, r0, r1;
+ uint8x8_t s = load_unaligned_u8(src_ptr, src_stride);
+ uint8x8_t r = load_unaligned_u8(ref_ptr, ref_stride);
- memcpy(&s0, src_ptr, 4);
- memcpy(&r0, ref_ptr, 4);
- s = vdup_n_u32(s0);
- r = vdup_n_u32(r0);
- src_ptr += src_stride;
- ref_ptr += ref_stride;
+ sum = vabal_u8(sum, s, r);
- memcpy(&s1, src_ptr, 4);
- memcpy(&r1, ref_ptr, 4);
- s = vset_lane_u32(s1, s, 1);
- r = vset_lane_u32(r1, r, 1);
- src_ptr += src_stride;
- ref_ptr += ref_stride;
-
- sum = vabal_u8(sum, vreinterpret_u8_u32(s), vreinterpret_u8_u32(r));
+ src_ptr += 2 * src_stride;
+ ref_ptr += 2 * ref_stride;
} while (--i != 0);
return horizontal_add_u16x8(sum);
@@ -320,25 +312,19 @@
SAD_WXH_NEON(4, 4)
SAD_WXH_NEON(4, 8)
-SAD_WXH_NEON(4, 16)
SAD_WXH_NEON(8, 4)
SAD_WXH_NEON(8, 8)
SAD_WXH_NEON(8, 16)
-SAD_WXH_NEON(8, 32)
-SAD_WXH_NEON(16, 4)
SAD_WXH_NEON(16, 8)
SAD_WXH_NEON(16, 16)
SAD_WXH_NEON(16, 32)
-SAD_WXH_NEON(16, 64)
-SAD_WXH_NEON(32, 8)
SAD_WXH_NEON(32, 16)
SAD_WXH_NEON(32, 32)
SAD_WXH_NEON(32, 64)
-SAD_WXH_NEON(64, 16)
SAD_WXH_NEON(64, 32)
SAD_WXH_NEON(64, 64)
SAD_WXH_NEON(64, 128)
@@ -346,6 +332,15 @@
SAD_WXH_NEON(128, 64)
SAD_WXH_NEON(128, 128)
+#if !CONFIG_REALTIME_ONLY
+SAD_WXH_NEON(4, 16)
+SAD_WXH_NEON(8, 32)
+SAD_WXH_NEON(16, 4)
+SAD_WXH_NEON(16, 64)
+SAD_WXH_NEON(32, 8)
+SAD_WXH_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
#undef SAD_WXH_NEON
#define SAD_SKIP_WXH_NEON(w, h) \
@@ -356,24 +351,21 @@
sad##w##xh_neon(src, 2 * src_stride, ref, 2 * ref_stride, (h) / 2); \
}
+SAD_SKIP_WXH_NEON(4, 4)
SAD_SKIP_WXH_NEON(4, 8)
-SAD_SKIP_WXH_NEON(4, 16)
+SAD_SKIP_WXH_NEON(8, 4)
SAD_SKIP_WXH_NEON(8, 8)
SAD_SKIP_WXH_NEON(8, 16)
-SAD_SKIP_WXH_NEON(8, 32)
SAD_SKIP_WXH_NEON(16, 8)
SAD_SKIP_WXH_NEON(16, 16)
SAD_SKIP_WXH_NEON(16, 32)
-SAD_SKIP_WXH_NEON(16, 64)
-SAD_SKIP_WXH_NEON(32, 8)
SAD_SKIP_WXH_NEON(32, 16)
SAD_SKIP_WXH_NEON(32, 32)
SAD_SKIP_WXH_NEON(32, 64)
-SAD_SKIP_WXH_NEON(64, 16)
SAD_SKIP_WXH_NEON(64, 32)
SAD_SKIP_WXH_NEON(64, 64)
SAD_SKIP_WXH_NEON(64, 128)
@@ -381,6 +373,15 @@
SAD_SKIP_WXH_NEON(128, 64)
SAD_SKIP_WXH_NEON(128, 128)
+#if !CONFIG_REALTIME_ONLY
+SAD_SKIP_WXH_NEON(4, 16)
+SAD_SKIP_WXH_NEON(8, 32)
+SAD_SKIP_WXH_NEON(16, 4)
+SAD_SKIP_WXH_NEON(16, 64)
+SAD_SKIP_WXH_NEON(32, 8)
+SAD_SKIP_WXH_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
#undef SAD_SKIP_WXH_NEON
#if defined(__ARM_FEATURE_DOTPROD)
@@ -732,28 +733,15 @@
int i = h / 2;
do {
- uint32x2_t s, r;
- uint32_t s0, s1, r0, r1;
- uint8x8_t p, avg;
+ uint8x8_t s = load_unaligned_u8(src_ptr, src_stride);
+ uint8x8_t r = load_unaligned_u8(ref_ptr, ref_stride);
+ uint8x8_t p = vld1_u8(second_pred);
- memcpy(&s0, src_ptr, 4);
- memcpy(&r0, ref_ptr, 4);
- s = vdup_n_u32(s0);
- r = vdup_n_u32(r0);
- src_ptr += src_stride;
- ref_ptr += ref_stride;
+ uint8x8_t avg = vrhadd_u8(r, p);
+ sum = vabal_u8(sum, s, avg);
- memcpy(&s1, src_ptr, 4);
- memcpy(&r1, ref_ptr, 4);
- s = vset_lane_u32(s1, s, 1);
- r = vset_lane_u32(r1, r, 1);
- src_ptr += src_stride;
- ref_ptr += ref_stride;
-
- p = vld1_u8(second_pred);
- avg = vrhadd_u8(vreinterpret_u8_u32(r), p);
-
- sum = vabal_u8(sum, vreinterpret_u8_u32(s), avg);
+ src_ptr += 2 * src_stride;
+ ref_ptr += 2 * ref_stride;
second_pred += 8;
} while (--i != 0);
@@ -770,25 +758,19 @@
SAD_WXH_AVG_NEON(4, 4)
SAD_WXH_AVG_NEON(4, 8)
-SAD_WXH_AVG_NEON(4, 16)
SAD_WXH_AVG_NEON(8, 4)
SAD_WXH_AVG_NEON(8, 8)
SAD_WXH_AVG_NEON(8, 16)
-SAD_WXH_AVG_NEON(8, 32)
-SAD_WXH_AVG_NEON(16, 4)
SAD_WXH_AVG_NEON(16, 8)
SAD_WXH_AVG_NEON(16, 16)
SAD_WXH_AVG_NEON(16, 32)
-SAD_WXH_AVG_NEON(16, 64)
-SAD_WXH_AVG_NEON(32, 8)
SAD_WXH_AVG_NEON(32, 16)
SAD_WXH_AVG_NEON(32, 32)
SAD_WXH_AVG_NEON(32, 64)
-SAD_WXH_AVG_NEON(64, 16)
SAD_WXH_AVG_NEON(64, 32)
SAD_WXH_AVG_NEON(64, 64)
SAD_WXH_AVG_NEON(64, 128)
@@ -796,4 +778,13 @@
SAD_WXH_AVG_NEON(128, 64)
SAD_WXH_AVG_NEON(128, 128)
+#if !CONFIG_REALTIME_ONLY
+SAD_WXH_AVG_NEON(4, 16)
+SAD_WXH_AVG_NEON(8, 32)
+SAD_WXH_AVG_NEON(16, 4)
+SAD_WXH_AVG_NEON(16, 64)
+SAD_WXH_AVG_NEON(32, 8)
+SAD_WXH_AVG_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
#undef SAD_WXH_AVG_NEON
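
The 4xh paths above now rely on load_unaligned_u8 from aom_dsp/arm/mem_neon.h,
which packs two 4-byte rows into a single 8-byte vector. A sketch of its
behaviour, mirroring the memcpy/vset_lane sequence it replaces (hypothetical
name for the model function):

    #include <arm_neon.h>
    #include <stdint.h>
    #include <string.h>

    // Gather 4 bytes from each of two consecutive rows into one uint8x8_t.
    static inline uint8x8_t load_unaligned_u8_model(const uint8_t *buf,
                                                    int stride) {
      uint32_t lo, hi;
      memcpy(&lo, buf, 4);           // row 0 -> lane 0
      memcpy(&hi, buf + stride, 4);  // row 1 -> lane 1
      uint32x2_t v = vdup_n_u32(lo);
      v = vset_lane_u32(hi, v, 1);
      return vreinterpret_u8_u32(v);
    }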
diff --git a/aom_dsp/arm/sadxd_neon.c b/aom_dsp/arm/sadxd_neon.c
new file mode 100644
index 0000000..81803b1
--- /dev/null
+++ b/aom_dsp/arm/sadxd_neon.c
@@ -0,0 +1,688 @@
+/*
+ * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include <assert.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_integer.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/sum_neon.h"
+
+#if defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE void sad16_neon(uint8x16_t src, uint8x16_t ref,
+ uint32x4_t *const sad_sum) {
+ uint8x16_t abs_diff = vabdq_u8(src, ref);
+ *sad_sum = vdotq_u32(*sad_sum, abs_diff, vdupq_n_u8(1));
+}
+
+static INLINE void sadwxhx3d_large_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride, uint32_t res[4], int w,
+ int h) {
+ uint32x4_t sum_lo[3] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0) };
+ uint32x4_t sum_hi[3] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ int j = 0;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src + j);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset + j), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset + j), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset + j), &sum_lo[2]);
+
+ const uint8x16_t s1 = vld1q_u8(src + j + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + j + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + j + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + j + 16), &sum_hi[2]);
+
+ j += 32;
+ } while (j < w);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_add_u32x4(vaddq_u32(sum_lo[0], sum_hi[0]));
+ res[1] = horizontal_add_u32x4(vaddq_u32(sum_lo[1], sum_hi[1]));
+ res[2] = horizontal_add_u32x4(vaddq_u32(sum_lo[2], sum_hi[2]));
+}
+
+static INLINE void sad128xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx3d_large_neon(src, src_stride, ref, ref_stride, res, 128, h);
+}
+
+static INLINE void sad64xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx3d_large_neon(src, src_stride, ref, ref_stride, res, 64, h);
+}
+
+static INLINE void sad32xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx3d_large_neon(src, src_stride, ref, ref_stride, res, 32, h);
+}
+
+static INLINE void sad16xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint32x4_t sum[3] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s = vld1q_u8(src);
+ sad16_neon(s, vld1q_u8(ref[0] + ref_offset), &sum[0]);
+ sad16_neon(s, vld1q_u8(ref[1] + ref_offset), &sum[1]);
+ sad16_neon(s, vld1q_u8(ref[2] + ref_offset), &sum[2]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_add_u32x4(sum[0]);
+ res[1] = horizontal_add_u32x4(sum[1]);
+ res[2] = horizontal_add_u32x4(sum[2]);
+}
+
+#else // !(defined(__ARM_FEATURE_DOTPROD))
+
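+// Without udot, widen by pairwise addition: vpadalq_u8 sums adjacent pairs of
+// 8-bit absolute differences into eight 16-bit accumulator lanes.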
+static INLINE void sad16_neon(uint8x16_t src, uint8x16_t ref,
+ uint16x8_t *const sad_sum) {
+ uint8x16_t abs_diff = vabdq_u8(src, ref);
+ *sad_sum = vpadalq_u8(*sad_sum, abs_diff);
+}
+
+static INLINE void sadwxhx3d_large_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3],
+ int ref_stride, uint32_t res[3], int w,
+ int h, int h_overflow) {
+ uint32x4_t sum[3] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0) };
+ int h_limit = h > h_overflow ? h_overflow : h;
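+  // The 16-bit lane accumulators hold only a limited number of rows of
+  // absolute differences before overflowing, so widen the partial sums into
+  // the 32-bit accumulators every h_overflow rows.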
+
+ int ref_offset = 0;
+ int i = 0;
+ do {
+ uint16x8_t sum_lo[3] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0) };
+ uint16x8_t sum_hi[3] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0) };
+
+ do {
+ int j = 0;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src + j);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset + j), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset + j), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset + j), &sum_lo[2]);
+
+ const uint8x16_t s1 = vld1q_u8(src + j + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + j + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + j + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + j + 16), &sum_hi[2]);
+
+ j += 32;
+ } while (j < w);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (++i < h_limit);
+
+ sum[0] = vpadalq_u16(sum[0], sum_lo[0]);
+ sum[0] = vpadalq_u16(sum[0], sum_hi[0]);
+ sum[1] = vpadalq_u16(sum[1], sum_lo[1]);
+ sum[1] = vpadalq_u16(sum[1], sum_hi[1]);
+ sum[2] = vpadalq_u16(sum[2], sum_lo[2]);
+ sum[2] = vpadalq_u16(sum[2], sum_hi[2]);
+
+ h_limit += h_overflow;
+ } while (i < h);
+
+ res[0] = horizontal_add_u32x4(sum[0]);
+ res[1] = horizontal_add_u32x4(sum[1]);
+ res[2] = horizontal_add_u32x4(sum[2]);
+}
+
+static INLINE void sad128xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ sadwxhx3d_large_neon(src, src_stride, ref, ref_stride, res, 128, h, 32);
+}
+
+static INLINE void sad64xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ sadwxhx3d_large_neon(src, src_stride, ref, ref_stride, res, 64, h, 64);
+}
+
+static INLINE void sad32xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ uint16x8_t sum_lo[3] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0) };
+ uint16x8_t sum_hi[3] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset), &sum_lo[2]);
+
+ const uint8x16_t s1 = vld1q_u8(src + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + 16), &sum_hi[2]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_long_add_u16x8(sum_lo[0], sum_hi[0]);
+ res[1] = horizontal_long_add_u16x8(sum_lo[1], sum_hi[1]);
+ res[2] = horizontal_long_add_u16x8(sum_lo[2], sum_hi[2]);
+}
+
+static INLINE void sad16xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ uint16x8_t sum[3] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s = vld1q_u8(src);
+ sad16_neon(s, vld1q_u8(ref[0] + ref_offset), &sum[0]);
+ sad16_neon(s, vld1q_u8(ref[1] + ref_offset), &sum[1]);
+ sad16_neon(s, vld1q_u8(ref[2] + ref_offset), &sum[2]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_add_u16x8(sum[0]);
+ res[1] = horizontal_add_u16x8(sum[1]);
+ res[2] = horizontal_add_u16x8(sum[2]);
+}
+
+#endif // defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE void sad8xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ uint16x8_t sum[3];
+
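+  // Peel the first row: vabdl_u8 initializes the accumulators from the first
+  // absolute differences, avoiding an explicit zeroing pass.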
+ uint8x8_t s = vld1_u8(src);
+ sum[0] = vabdl_u8(s, vld1_u8(ref[0]));
+ sum[1] = vabdl_u8(s, vld1_u8(ref[1]));
+ sum[2] = vabdl_u8(s, vld1_u8(ref[2]));
+
+ src += src_stride;
+ int ref_offset = ref_stride;
+ int i = h - 1;
+ do {
+ s = vld1_u8(src);
+ sum[0] = vabal_u8(sum[0], s, vld1_u8(ref[0] + ref_offset));
+ sum[1] = vabal_u8(sum[1], s, vld1_u8(ref[1] + ref_offset));
+ sum[2] = vabal_u8(sum[2], s, vld1_u8(ref[2] + ref_offset));
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_add_u16x8(sum[0]);
+ res[1] = horizontal_add_u16x8(sum[1]);
+ res[2] = horizontal_add_u16x8(sum[2]);
+}
+
+static INLINE void sad4xhx3d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[3], int ref_stride,
+ uint32_t res[3], int h) {
+ assert(h % 2 == 0);
+ uint16x8_t sum[3];
+
+ uint8x8_t s = load_unaligned_u8(src, src_stride);
+ uint8x8_t r0 = load_unaligned_u8(ref[0], ref_stride);
+ uint8x8_t r1 = load_unaligned_u8(ref[1], ref_stride);
+ uint8x8_t r2 = load_unaligned_u8(ref[2], ref_stride);
+
+ sum[0] = vabdl_u8(s, r0);
+ sum[1] = vabdl_u8(s, r1);
+ sum[2] = vabdl_u8(s, r2);
+
+ src += 2 * src_stride;
+ int ref_offset = 2 * ref_stride;
+ int i = (h / 2) - 1;
+ do {
+ s = load_unaligned_u8(src, src_stride);
+ r0 = load_unaligned_u8(ref[0] + ref_offset, ref_stride);
+ r1 = load_unaligned_u8(ref[1] + ref_offset, ref_stride);
+ r2 = load_unaligned_u8(ref[2] + ref_offset, ref_stride);
+
+ sum[0] = vabal_u8(sum[0], s, r0);
+ sum[1] = vabal_u8(sum[1], s, r1);
+ sum[2] = vabal_u8(sum[2], s, r2);
+
+ src += 2 * src_stride;
+ ref_offset += 2 * ref_stride;
+ } while (--i != 0);
+
+ res[0] = horizontal_add_u16x8(sum[0]);
+ res[1] = horizontal_add_u16x8(sum[1]);
+ res[2] = horizontal_add_u16x8(sum[2]);
+}
+
+#define SAD_WXH_3D_NEON(w, h) \
+ void aom_sad##w##x##h##x3d_neon(const uint8_t *src, int src_stride, \
+ const uint8_t *const ref[4], int ref_stride, \
+ uint32_t res[4]) { \
+ sad##w##xhx3d_neon(src, src_stride, ref, ref_stride, res, (h)); \
+ }
+
+SAD_WXH_3D_NEON(4, 4)
+SAD_WXH_3D_NEON(4, 8)
+
+SAD_WXH_3D_NEON(8, 4)
+SAD_WXH_3D_NEON(8, 8)
+SAD_WXH_3D_NEON(8, 16)
+
+SAD_WXH_3D_NEON(16, 8)
+SAD_WXH_3D_NEON(16, 16)
+SAD_WXH_3D_NEON(16, 32)
+
+SAD_WXH_3D_NEON(32, 16)
+SAD_WXH_3D_NEON(32, 32)
+SAD_WXH_3D_NEON(32, 64)
+
+SAD_WXH_3D_NEON(64, 32)
+SAD_WXH_3D_NEON(64, 64)
+SAD_WXH_3D_NEON(64, 128)
+
+SAD_WXH_3D_NEON(128, 64)
+SAD_WXH_3D_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+SAD_WXH_3D_NEON(4, 16)
+SAD_WXH_3D_NEON(8, 32)
+SAD_WXH_3D_NEON(16, 4)
+SAD_WXH_3D_NEON(16, 64)
+SAD_WXH_3D_NEON(32, 8)
+SAD_WXH_3D_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
+#undef SAD_WXH_3D_NEON
+
+#if defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE void sadwxhx4d_large_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride, uint32_t res[4], int w,
+ int h) {
+ uint32x4_t sum_lo[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum_hi[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ uint32x4_t sum[4];
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ int j = 0;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src + j);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset + j), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset + j), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset + j), &sum_lo[2]);
+ sad16_neon(s0, vld1q_u8(ref[3] + ref_offset + j), &sum_lo[3]);
+
+ const uint8x16_t s1 = vld1q_u8(src + j + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + j + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + j + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + j + 16), &sum_hi[2]);
+ sad16_neon(s1, vld1q_u8(ref[3] + ref_offset + j + 16), &sum_hi[3]);
+
+ j += 32;
+ } while (j < w);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ sum[0] = vaddq_u32(sum_lo[0], sum_hi[0]);
+ sum[1] = vaddq_u32(sum_lo[1], sum_hi[1]);
+ sum[2] = vaddq_u32(sum_lo[2], sum_hi[2]);
+ sum[3] = vaddq_u32(sum_lo[3], sum_hi[3]);
+
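+  // Reduce each accumulator to a scalar and pack the four results so that a
+  // single vst1q_u32 stores all four SADs.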
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void sad128xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, res, 128, h);
+}
+
+static INLINE void sad64xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, res, 64, h);
+}
+
+static INLINE void sad32xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, res, 32, h);
+}
+
+static INLINE void sad16xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s = vld1q_u8(src);
+ sad16_neon(s, vld1q_u8(ref[0] + ref_offset), &sum[0]);
+ sad16_neon(s, vld1q_u8(ref[1] + ref_offset), &sum[1]);
+ sad16_neon(s, vld1q_u8(ref[2] + ref_offset), &sum[2]);
+ sad16_neon(s, vld1q_u8(ref[3] + ref_offset), &sum[3]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+#else // !(defined(__ARM_FEATURE_DOTPROD))
+
+static INLINE void sadwxhx4d_large_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4],
+ int ref_stride, uint32_t res[4], int w,
+ int h, int h_overflow) {
+ uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
+ vdupq_n_u32(0) };
+ int h_limit = h > h_overflow ? h_overflow : h;
+
+ int ref_offset = 0;
+ int i = 0;
+ do {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ do {
+ int j = 0;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src + j);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset + j), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset + j), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset + j), &sum_lo[2]);
+ sad16_neon(s0, vld1q_u8(ref[3] + ref_offset + j), &sum_lo[3]);
+
+ const uint8x16_t s1 = vld1q_u8(src + j + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + j + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + j + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + j + 16), &sum_hi[2]);
+ sad16_neon(s1, vld1q_u8(ref[3] + ref_offset + j + 16), &sum_hi[3]);
+
+ j += 32;
+ } while (j < w);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (++i < h_limit);
+
+ sum[0] = vpadalq_u16(sum[0], sum_lo[0]);
+ sum[0] = vpadalq_u16(sum[0], sum_hi[0]);
+ sum[1] = vpadalq_u16(sum[1], sum_lo[1]);
+ sum[1] = vpadalq_u16(sum[1], sum_hi[1]);
+ sum[2] = vpadalq_u16(sum[2], sum_lo[2]);
+ sum[2] = vpadalq_u16(sum[2], sum_hi[2]);
+ sum[3] = vpadalq_u16(sum[3], sum_lo[3]);
+ sum[3] = vpadalq_u16(sum[3], sum_hi[3]);
+
+ h_limit += h_overflow;
+ } while (i < h);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum));
+}
+
+static INLINE void sad128xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, res, 128, h, 32);
+}
+
+static INLINE void sad64xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ sadwxhx4d_large_neon(src, src_stride, ref, ref_stride, res, 64, h, 64);
+}
+
+static INLINE void sad32xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum_lo[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint16x8_t sum_hi[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s0 = vld1q_u8(src);
+ sad16_neon(s0, vld1q_u8(ref[0] + ref_offset), &sum_lo[0]);
+ sad16_neon(s0, vld1q_u8(ref[1] + ref_offset), &sum_lo[1]);
+ sad16_neon(s0, vld1q_u8(ref[2] + ref_offset), &sum_lo[2]);
+ sad16_neon(s0, vld1q_u8(ref[3] + ref_offset), &sum_lo[3]);
+
+ const uint8x16_t s1 = vld1q_u8(src + 16);
+ sad16_neon(s1, vld1q_u8(ref[0] + ref_offset + 16), &sum_hi[0]);
+ sad16_neon(s1, vld1q_u8(ref[1] + ref_offset + 16), &sum_hi[1]);
+ sad16_neon(s1, vld1q_u8(ref[2] + ref_offset + 16), &sum_hi[2]);
+ sad16_neon(s1, vld1q_u8(ref[3] + ref_offset + 16), &sum_hi[3]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_long_add_4d_u16x8(sum_lo, sum_hi));
+}
+
+static INLINE void sad16xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum_u16[4] = { vdupq_n_u16(0), vdupq_n_u16(0), vdupq_n_u16(0),
+ vdupq_n_u16(0) };
+ uint32x4_t sum_u32[4];
+
+ int ref_offset = 0;
+ int i = h;
+ do {
+ const uint8x16_t s = vld1q_u8(src);
+ sad16_neon(s, vld1q_u8(ref[0] + ref_offset), &sum_u16[0]);
+ sad16_neon(s, vld1q_u8(ref[1] + ref_offset), &sum_u16[1]);
+ sad16_neon(s, vld1q_u8(ref[2] + ref_offset), &sum_u16[2]);
+ sad16_neon(s, vld1q_u8(ref[3] + ref_offset), &sum_u16[3]);
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ sum_u32[0] = vpaddlq_u16(sum_u16[0]);
+ sum_u32[1] = vpaddlq_u16(sum_u16[1]);
+ sum_u32[2] = vpaddlq_u16(sum_u16[2]);
+ sum_u32[3] = vpaddlq_u16(sum_u16[3]);
+
+ vst1q_u32(res, horizontal_add_4d_u32x4(sum_u32));
+}
+
+#endif // defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE void sad8xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum[4];
+
+ uint8x8_t s = vld1_u8(src);
+ sum[0] = vabdl_u8(s, vld1_u8(ref[0]));
+ sum[1] = vabdl_u8(s, vld1_u8(ref[1]));
+ sum[2] = vabdl_u8(s, vld1_u8(ref[2]));
+ sum[3] = vabdl_u8(s, vld1_u8(ref[3]));
+
+ src += src_stride;
+ int ref_offset = ref_stride;
+ int i = h - 1;
+ do {
+ s = vld1_u8(src);
+ sum[0] = vabal_u8(sum[0], s, vld1_u8(ref[0] + ref_offset));
+ sum[1] = vabal_u8(sum[1], s, vld1_u8(ref[1] + ref_offset));
+ sum[2] = vabal_u8(sum[2], s, vld1_u8(ref[2] + ref_offset));
+ sum[3] = vabal_u8(sum[3], s, vld1_u8(ref[3] + ref_offset));
+
+ src += src_stride;
+ ref_offset += ref_stride;
+ } while (--i != 0);
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+static INLINE void sad4xhx4d_neon(const uint8_t *src, int src_stride,
+ const uint8_t *const ref[4], int ref_stride,
+ uint32_t res[4], int h) {
+ uint16x8_t sum[4];
+
+ uint8x8_t s = load_unaligned_u8(src, src_stride);
+ uint8x8_t r0 = load_unaligned_u8(ref[0], ref_stride);
+ uint8x8_t r1 = load_unaligned_u8(ref[1], ref_stride);
+ uint8x8_t r2 = load_unaligned_u8(ref[2], ref_stride);
+ uint8x8_t r3 = load_unaligned_u8(ref[3], ref_stride);
+
+ sum[0] = vabdl_u8(s, r0);
+ sum[1] = vabdl_u8(s, r1);
+ sum[2] = vabdl_u8(s, r2);
+ sum[3] = vabdl_u8(s, r3);
+
+ src += 2 * src_stride;
+ int ref_offset = 2 * ref_stride;
+ int i = h / 2;
+ while (--i != 0) {
+ s = load_unaligned_u8(src, src_stride);
+ r0 = load_unaligned_u8(ref[0] + ref_offset, ref_stride);
+ r1 = load_unaligned_u8(ref[1] + ref_offset, ref_stride);
+ r2 = load_unaligned_u8(ref[2] + ref_offset, ref_stride);
+ r3 = load_unaligned_u8(ref[3] + ref_offset, ref_stride);
+
+ sum[0] = vabal_u8(sum[0], s, r0);
+ sum[1] = vabal_u8(sum[1], s, r1);
+ sum[2] = vabal_u8(sum[2], s, r2);
+ sum[3] = vabal_u8(sum[3], s, r3);
+
+ src += 2 * src_stride;
+ ref_offset += 2 * ref_stride;
+ }
+
+ vst1q_u32(res, horizontal_add_4d_u16x8(sum));
+}
+
+#define SAD_WXH_4D_NEON(w, h) \
+ void aom_sad##w##x##h##x4d_neon(const uint8_t *src, int src_stride, \
+ const uint8_t *const ref[4], int ref_stride, \
+ uint32_t res[4]) { \
+ sad##w##xhx4d_neon(src, src_stride, ref, ref_stride, res, (h)); \
+ }
+
+SAD_WXH_4D_NEON(4, 4)
+SAD_WXH_4D_NEON(4, 8)
+
+SAD_WXH_4D_NEON(8, 4)
+SAD_WXH_4D_NEON(8, 8)
+SAD_WXH_4D_NEON(8, 16)
+
+SAD_WXH_4D_NEON(16, 8)
+SAD_WXH_4D_NEON(16, 16)
+SAD_WXH_4D_NEON(16, 32)
+
+SAD_WXH_4D_NEON(32, 16)
+SAD_WXH_4D_NEON(32, 32)
+SAD_WXH_4D_NEON(32, 64)
+
+SAD_WXH_4D_NEON(64, 32)
+SAD_WXH_4D_NEON(64, 64)
+SAD_WXH_4D_NEON(64, 128)
+
+SAD_WXH_4D_NEON(128, 64)
+SAD_WXH_4D_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+SAD_WXH_4D_NEON(4, 16)
+SAD_WXH_4D_NEON(8, 32)
+SAD_WXH_4D_NEON(16, 4)
+SAD_WXH_4D_NEON(16, 64)
+SAD_WXH_4D_NEON(32, 8)
+SAD_WXH_4D_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
+#undef SAD_WXH_4D_NEON
+
+#define SAD_SKIP_WXH_4D_NEON(w, h) \
+ void aom_sad_skip_##w##x##h##x4d_neon(const uint8_t *src, int src_stride, \
+ const uint8_t *const ref[4], \
+ int ref_stride, uint32_t res[4]) { \
+ sad##w##xhx4d_neon(src, 2 * src_stride, ref, 2 * ref_stride, res, \
+ ((h) >> 1)); \
+ res[0] <<= 1; \
+ res[1] <<= 1; \
+ res[2] <<= 1; \
+ res[3] <<= 1; \
+ }
+
+SAD_SKIP_WXH_4D_NEON(4, 4)
+SAD_SKIP_WXH_4D_NEON(4, 8)
+
+SAD_SKIP_WXH_4D_NEON(8, 4)
+SAD_SKIP_WXH_4D_NEON(8, 8)
+SAD_SKIP_WXH_4D_NEON(8, 16)
+
+SAD_SKIP_WXH_4D_NEON(16, 8)
+SAD_SKIP_WXH_4D_NEON(16, 16)
+SAD_SKIP_WXH_4D_NEON(16, 32)
+
+SAD_SKIP_WXH_4D_NEON(32, 16)
+SAD_SKIP_WXH_4D_NEON(32, 32)
+SAD_SKIP_WXH_4D_NEON(32, 64)
+
+SAD_SKIP_WXH_4D_NEON(64, 32)
+SAD_SKIP_WXH_4D_NEON(64, 64)
+SAD_SKIP_WXH_4D_NEON(64, 128)
+
+SAD_SKIP_WXH_4D_NEON(128, 64)
+SAD_SKIP_WXH_4D_NEON(128, 128)
+
+#if !CONFIG_REALTIME_ONLY
+SAD_SKIP_WXH_4D_NEON(4, 16)
+SAD_SKIP_WXH_4D_NEON(8, 32)
+SAD_SKIP_WXH_4D_NEON(16, 4)
+SAD_SKIP_WXH_4D_NEON(16, 64)
+SAD_SKIP_WXH_4D_NEON(32, 8)
+SAD_SKIP_WXH_4D_NEON(64, 16)
+#endif // !CONFIG_REALTIME_ONLY
+
+#undef SAD_SKIP_WXH_4D_NEON
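
The skip variants trade accuracy for speed by sampling every other row and
doubling the result, as the macro above shows. A scalar model (illustrative
only):

    #include <stdint.h>

    // Approximate a full WxH SAD from h/2 rows at doubled strides.
    static uint32_t sad_skip_model(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride, int w,
                                   int h) {
      uint32_t sad = 0;
      for (int i = 0; i < h; i += 2) {
        for (int j = 0; j < w; ++j) {
          const int d = src[i * src_stride + j] - ref[i * ref_stride + j];
          sad += (uint32_t)(d < 0 ? -d : d);
        }
      }
      return 2 * sad;  // matches the res[k] <<= 1 in the macro
    }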
diff --git a/aom_dsp/arm/sse_neon.c b/aom_dsp/arm/sse_neon.c
index 2c988dc..d1d3d93 100644
--- a/aom_dsp/arm/sse_neon.c
+++ b/aom_dsp/arm/sse_neon.c
@@ -348,7 +348,8 @@
int64_t aom_highbd_sse_neon(const uint8_t *a8, int a_stride, const uint8_t *b8,
int b_stride, int width, int height) {
- const uint16x8_t q0 = { 0, 1, 2, 3, 4, 5, 6, 7 };
+ static const uint16_t k01234567[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+ const uint16x8_t q0 = vld1q_u16(k01234567);
int64_t sse = 0;
uint16_t *a = CONVERT_TO_SHORTPTR(a8);
uint16_t *b = CONVERT_TO_SHORTPTR(b8);
diff --git a/aom_dsp/arm/subpel_variance_neon.c b/aom_dsp/arm/subpel_variance_neon.c
index a058860..9599ae0 100644
--- a/aom_dsp/arm/subpel_variance_neon.c
+++ b/aom_dsp/arm/subpel_variance_neon.c
@@ -549,3 +549,239 @@
#undef SUBPEL_AVG_VARIANCE_WXH_NEON
#undef SPECIALIZED_SUBPEL_AVG_VARIANCE_WXH_NEON
+
+#if !CONFIG_REALTIME_ONLY
+
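+// The kernels below apply an eighth-pel bilinear filter in up to two passes:
+// horizontal (pixel_step == 1) over h + padding rows, then vertical
+// (pixel_step == w) over the h output rows. A scalar sketch of one output
+// pixel (assuming taps (8 - offset, offset), which sum to 8):
+//   dst[j] = (src[j] * (8 - offset) + src[j + pixel_step] * offset + 4) >> 3;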
+#define OBMC_SUBPEL_VARIANCE_WXH_NEON(w, h, padding) \
+ unsigned int aom_obmc_sub_pixel_variance##w##x##h##_neon( \
+ const uint8_t *pre, int pre_stride, int xoffset, int yoffset, \
+ const int32_t *wsrc, const int32_t *mask, unsigned int *sse) { \
+ uint8_t tmp0[w * (h + padding)]; \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_bil_w##w(pre, tmp0, pre_stride, 1, h + padding, \
+ xoffset); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ return aom_obmc_variance##w##x##h(tmp1, w, wsrc, mask, sse); \
+ }
+
+#define SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(w, h, padding) \
+ unsigned int aom_obmc_sub_pixel_variance##w##x##h##_neon( \
+ const uint8_t *pre, int pre_stride, int xoffset, int yoffset, \
+ const int32_t *wsrc, const int32_t *mask, unsigned int *sse) { \
+ if (xoffset == 0) { \
+ if (yoffset == 0) { \
+ return aom_obmc_variance##w##x##h##_neon(pre, pre_stride, wsrc, mask, \
+ sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp[w * h]; \
+ var_filter_block2d_avg(pre, tmp, pre_stride, pre_stride, w, h); \
+ return aom_obmc_variance##w##x##h##_neon(tmp, w, wsrc, mask, sse); \
+ } else { \
+ uint8_t tmp[w * h]; \
+ var_filter_block2d_bil_w##w(pre, tmp, pre_stride, pre_stride, h, \
+ yoffset); \
+ return aom_obmc_variance##w##x##h##_neon(tmp, w, wsrc, mask, sse); \
+ } \
+ } else if (xoffset == 4) { \
+ uint8_t tmp0[w * (h + padding)]; \
+ if (yoffset == 0) { \
+ var_filter_block2d_avg(pre, tmp0, pre_stride, 1, w, h); \
+ return aom_obmc_variance##w##x##h##_neon(tmp0, w, wsrc, mask, sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp1[w * (h + padding)]; \
+ var_filter_block2d_avg(pre, tmp0, pre_stride, 1, w, h + padding); \
+ var_filter_block2d_avg(tmp0, tmp1, w, w, w, h); \
+ return aom_obmc_variance##w##x##h##_neon(tmp1, w, wsrc, mask, sse); \
+ } else { \
+ uint8_t tmp1[w * (h + padding)]; \
+ var_filter_block2d_avg(pre, tmp0, pre_stride, 1, w, h + padding); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ return aom_obmc_variance##w##x##h##_neon(tmp1, w, wsrc, mask, sse); \
+ } \
+ } else { \
+ uint8_t tmp0[w * (h + padding)]; \
+ if (yoffset == 0) { \
+ var_filter_block2d_bil_w##w(pre, tmp0, pre_stride, 1, h, xoffset); \
+ return aom_obmc_variance##w##x##h##_neon(tmp0, w, wsrc, mask, sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_bil_w##w(pre, tmp0, pre_stride, 1, h + padding, \
+ xoffset); \
+ var_filter_block2d_avg(tmp0, tmp1, w, w, w, h); \
+ return aom_obmc_variance##w##x##h##_neon(tmp1, w, wsrc, mask, sse); \
+ } else { \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_bil_w##w(pre, tmp0, pre_stride, 1, h + padding, \
+ xoffset); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ return aom_obmc_variance##w##x##h##_neon(tmp1, w, wsrc, mask, sse); \
+ } \
+ } \
+ }
+
+OBMC_SUBPEL_VARIANCE_WXH_NEON(4, 4, 2)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(4, 8, 2)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(4, 16, 2)
+
+OBMC_SUBPEL_VARIANCE_WXH_NEON(8, 4, 1)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(8, 8, 1)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(8, 16, 1)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(8, 32, 1)
+
+OBMC_SUBPEL_VARIANCE_WXH_NEON(16, 4, 1)
+OBMC_SUBPEL_VARIANCE_WXH_NEON(16, 8, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(16, 16, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(16, 32, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(16, 64, 1)
+
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(32, 8, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(32, 16, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(32, 32, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(32, 64, 1)
+
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(64, 16, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(64, 32, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(64, 64, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(64, 128, 1)
+
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(128, 64, 1)
+SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON(128, 128, 1)
+
+#undef OBMC_SUBPEL_VARIANCE_WXH_NEON
+#undef SPECIALIZED_OBMC_SUBPEL_VARIANCE_WXH_NEON
+#endif // !CONFIG_REALTIME_ONLY
+
+#define MASKED_SUBPEL_VARIANCE_WXH_NEON(w, h, padding) \
+ unsigned int aom_masked_sub_pixel_variance##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, int xoffset, int yoffset, \
+ const uint8_t *ref, int ref_stride, const uint8_t *second_pred, \
+ const uint8_t *msk, int msk_stride, int invert_mask, \
+ unsigned int *sse) { \
+ uint8_t tmp0[w * (h + padding)]; \
+ uint8_t tmp1[w * h]; \
+ uint8_t tmp2[w * h]; \
+ var_filter_block2d_bil_w##w(src, tmp0, src_stride, 1, (h + padding), \
+ xoffset); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ aom_comp_mask_pred_neon(tmp2, second_pred, w, h, tmp1, w, msk, msk_stride, \
+ invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp2, w, ref, ref_stride, sse); \
+ }
+
+#define SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(w, h, padding) \
+ unsigned int aom_masked_sub_pixel_variance##w##x##h##_neon( \
+ const uint8_t *src, int src_stride, int xoffset, int yoffset, \
+ const uint8_t *ref, int ref_stride, const uint8_t *second_pred, \
+ const uint8_t *msk, int msk_stride, int invert_mask, \
+ unsigned int *sse) { \
+ if (xoffset == 0) { \
+ uint8_t tmp0[w * h]; \
+ if (yoffset == 0) { \
+ aom_comp_mask_pred_neon(tmp0, second_pred, w, h, src, src_stride, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp0, w, ref, ref_stride, sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_avg(src, tmp0, src_stride, src_stride, w, h); \
+ aom_comp_mask_pred_neon(tmp1, second_pred, w, h, tmp0, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp1, w, ref, ref_stride, sse); \
+ } else { \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_bil_w##w(src, tmp0, src_stride, src_stride, h, \
+ yoffset); \
+ aom_comp_mask_pred_neon(tmp1, second_pred, w, h, tmp0, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp1, w, ref, ref_stride, sse); \
+ } \
+ } else if (xoffset == 4) { \
+ uint8_t tmp0[w * (h + padding)]; \
+ if (yoffset == 0) { \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_avg(src, tmp0, src_stride, 1, w, h); \
+ aom_comp_mask_pred_neon(tmp1, second_pred, w, h, tmp0, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp1, w, ref, ref_stride, sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp1[w * h]; \
+ uint8_t tmp2[w * h]; \
+ var_filter_block2d_avg(src, tmp0, src_stride, 1, w, (h + padding)); \
+ var_filter_block2d_avg(tmp0, tmp1, w, w, w, h); \
+ aom_comp_mask_pred_neon(tmp2, second_pred, w, h, tmp1, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp2, w, ref, ref_stride, sse); \
+ } else { \
+ uint8_t tmp1[w * h]; \
+ uint8_t tmp2[w * h]; \
+ var_filter_block2d_avg(src, tmp0, src_stride, 1, w, (h + padding)); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ aom_comp_mask_pred_neon(tmp2, second_pred, w, h, tmp1, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp2, w, ref, ref_stride, sse); \
+ } \
+ } else { \
+ if (yoffset == 0) { \
+ uint8_t tmp0[w * h]; \
+ uint8_t tmp1[w * h]; \
+ var_filter_block2d_bil_w##w(src, tmp0, src_stride, 1, h, xoffset); \
+ aom_comp_mask_pred_neon(tmp1, second_pred, w, h, tmp0, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp1, w, ref, ref_stride, sse); \
+ } else if (yoffset == 4) { \
+ uint8_t tmp0[w * (h + padding)]; \
+ uint8_t tmp1[w * h]; \
+ uint8_t tmp2[w * h]; \
+ var_filter_block2d_bil_w##w(src, tmp0, src_stride, 1, (h + padding), \
+ xoffset); \
+ var_filter_block2d_avg(tmp0, tmp1, w, w, w, h); \
+ aom_comp_mask_pred_neon(tmp2, second_pred, w, h, tmp1, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp2, w, ref, ref_stride, sse); \
+ } else { \
+ uint8_t tmp0[w * (h + padding)]; \
+ uint8_t tmp1[w * (h + padding)]; \
+ uint8_t tmp2[w * h]; \
+ var_filter_block2d_bil_w##w(src, tmp0, src_stride, 1, (h + padding), \
+ xoffset); \
+ var_filter_block2d_bil_w##w(tmp0, tmp1, w, w, h, yoffset); \
+ aom_comp_mask_pred_neon(tmp2, second_pred, w, h, tmp1, w, msk, \
+ msk_stride, invert_mask); \
+ return aom_variance##w##x##h##_neon(tmp2, w, ref, ref_stride, sse); \
+ } \
+ } \
+ }
+
+MASKED_SUBPEL_VARIANCE_WXH_NEON(4, 4, 2)
+MASKED_SUBPEL_VARIANCE_WXH_NEON(4, 8, 2)
+
+MASKED_SUBPEL_VARIANCE_WXH_NEON(8, 4, 1)
+MASKED_SUBPEL_VARIANCE_WXH_NEON(8, 8, 1)
+MASKED_SUBPEL_VARIANCE_WXH_NEON(8, 16, 1)
+
+MASKED_SUBPEL_VARIANCE_WXH_NEON(16, 8, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(16, 16, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(16, 32, 1)
+
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(32, 16, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(32, 32, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(32, 64, 1)
+
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(64, 32, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(64, 64, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(64, 128, 1)
+
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(128, 64, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(128, 128, 1)
+
+// Realtime mode doesn't use the 4:1/1:4 aspect-ratio rectangular blocks.
+#if !CONFIG_REALTIME_ONLY
+MASKED_SUBPEL_VARIANCE_WXH_NEON(4, 16, 2)
+MASKED_SUBPEL_VARIANCE_WXH_NEON(8, 32, 1)
+MASKED_SUBPEL_VARIANCE_WXH_NEON(16, 4, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(16, 64, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(32, 8, 1)
+SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON(64, 16, 1)
+#endif // !CONFIG_REALTIME_ONLY
+
+#undef MASKED_SUBPEL_VARIANCE_WXH_NEON
+#undef SPECIALIZED_MASKED_SUBPEL_VARIANCE_WXH_NEON
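
For orientation, the masked pipeline above composes three steps: bilinear filtering of src, a mask-weighted blend against second_pred, and a plain variance against ref. Below is a hypothetical scalar helper sketching the blend step, assuming the usual A64 mask convention (mask values in [0, 64], round to nearest); which of the two inputs receives the mask weight flips with invert_mask:

static uint8_t blend_a64_sketch(uint8_t p0, uint8_t p1, uint8_t m,
                                int invert_mask) {
  if (invert_mask) {  // Swap which prediction the mask weights.
    uint8_t t = p0;
    p0 = p1;
    p1 = t;
  }
  return (uint8_t)((m * p0 + (64 - m) * p1 + 32) >> 6);  // /64, rounded
}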
diff --git a/aom_dsp/arm/sum_neon.h b/aom_dsp/arm/sum_neon.h
index 855edf6..ff68c12 100644
--- a/aom_dsp/arm/sum_neon.h
+++ b/aom_dsp/arm/sum_neon.h
@@ -15,7 +15,7 @@
#include "aom_ports/mem.h"
static INLINE int horizontal_add_s16x8(const int16x8_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_s16(a);
#else
const int32x4_t b = vpaddlq_s16(a);
@@ -27,7 +27,7 @@
}
static INLINE int horizontal_add_s32x4(const int32x4_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddvq_s32(a);
#else
const int64x2_t b = vpaddlq_s32(a);
@@ -37,8 +37,16 @@
#endif
}
+static INLINE int64_t horizontal_add_s64x2(const int64x2_t a) {
+#if AOM_ARCH_AARCH64
+ return vaddvq_s64(a);
+#else
+ return vgetq_lane_s64(a, 0) + vgetq_lane_s64(a, 1);
+#endif
+}
+
static INLINE uint64_t horizontal_add_u64x2(const uint64x2_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddvq_u64(a);
#else
return vgetq_lane_u64(a, 0) + vgetq_lane_u64(a, 1);
@@ -46,7 +54,7 @@
}
static INLINE uint64_t horizontal_long_add_u32x4(const uint32x4_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u32(a);
#else
const uint64x2_t b = vpaddlq_u32(a);
@@ -55,7 +63,7 @@
}
static INLINE unsigned int horizontal_add_u32x4(const uint32x4_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddvq_u32(a);
#else
const uint64x2_t b = vpaddlq_u32(a);
@@ -65,9 +73,24 @@
#endif
}
+static INLINE uint32x4_t horizontal_add_4d_u32x4(const uint32x4_t sum[4]) {
+#if AOM_ARCH_AARCH64
+ uint32x4_t res01 = vpaddq_u32(sum[0], sum[1]);
+ uint32x4_t res23 = vpaddq_u32(sum[2], sum[3]);
+ return vpaddq_u32(res01, res23);
+#else
+ uint32x4_t res = vdupq_n_u32(0);
+ res = vsetq_lane_u32(horizontal_add_u32x4(sum[0]), res, 0);
+ res = vsetq_lane_u32(horizontal_add_u32x4(sum[1]), res, 1);
+ res = vsetq_lane_u32(horizontal_add_u32x4(sum[2]), res, 2);
+ res = vsetq_lane_u32(horizontal_add_u32x4(sum[3]), res, 3);
+ return res;
+#endif
+}
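
Lane i of the returned vector is the horizontal sum of input vector i, so four independent reductions collapse into three vpaddq_u32 instructions on AArch64. A scalar reference of the contract (hypothetical helper, arrays standing in for the vectors):

static void horizontal_add_4d_u32x4_ref(const uint32_t sum[4][4],
                                        uint32_t res[4]) {
  for (int i = 0; i < 4; i++) {
    // res lane i == horizontal sum of input vector i.
    res[i] = sum[i][0] + sum[i][1] + sum[i][2] + sum[i][3];
  }
}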
+
static INLINE uint32_t horizontal_long_add_u16x8(const uint16x8_t vec_lo,
const uint16x8_t vec_hi) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u16(vec_lo) + vaddlvq_u16(vec_hi);
#else
const uint32x4_t vec_l_lo =
@@ -82,8 +105,33 @@
#endif
}
+static INLINE uint32x4_t horizontal_long_add_4d_u16x8(
+ const uint16x8_t sum_lo[4], const uint16x8_t sum_hi[4]) {
+ const uint32x4_t a0 = vpaddlq_u16(sum_lo[0]);
+ const uint32x4_t a1 = vpaddlq_u16(sum_lo[1]);
+ const uint32x4_t a2 = vpaddlq_u16(sum_lo[2]);
+ const uint32x4_t a3 = vpaddlq_u16(sum_lo[3]);
+ const uint32x4_t b0 = vpadalq_u16(a0, sum_hi[0]);
+ const uint32x4_t b1 = vpadalq_u16(a1, sum_hi[1]);
+ const uint32x4_t b2 = vpadalq_u16(a2, sum_hi[2]);
+ const uint32x4_t b3 = vpadalq_u16(a3, sum_hi[3]);
+#if AOM_ARCH_AARCH64
+ const uint32x4_t c0 = vpaddq_u32(b0, b1);
+ const uint32x4_t c1 = vpaddq_u32(b2, b3);
+ return vpaddq_u32(c0, c1);
+#else
+ const uint32x2_t c0 = vadd_u32(vget_low_u32(b0), vget_high_u32(b0));
+ const uint32x2_t c1 = vadd_u32(vget_low_u32(b1), vget_high_u32(b1));
+ const uint32x2_t c2 = vadd_u32(vget_low_u32(b2), vget_high_u32(b2));
+ const uint32x2_t c3 = vadd_u32(vget_low_u32(b3), vget_high_u32(b3));
+ const uint32x2_t d0 = vpadd_u32(c0, c1);
+ const uint32x2_t d1 = vpadd_u32(c2, c3);
+ return vcombine_u32(d0, d1);
+#endif
+}
+
static INLINE uint32_t horizontal_add_u16x8(const uint16x8_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u16(a);
#else
const uint32x4_t b = vpaddlq_u16(a);
@@ -94,8 +142,25 @@
#endif
}
+static INLINE uint32x4_t horizontal_add_4d_u16x8(const uint16x8_t sum[4]) {
+#if AOM_ARCH_AARCH64
+ const uint16x8_t a0 = vpaddq_u16(sum[0], sum[1]);
+ const uint16x8_t a1 = vpaddq_u16(sum[2], sum[3]);
+ const uint16x8_t b0 = vpaddq_u16(a0, a1);
+ return vpaddlq_u16(b0);
+#else
+ const uint16x4_t a0 = vadd_u16(vget_low_u16(sum[0]), vget_high_u16(sum[0]));
+ const uint16x4_t a1 = vadd_u16(vget_low_u16(sum[1]), vget_high_u16(sum[1]));
+ const uint16x4_t a2 = vadd_u16(vget_low_u16(sum[2]), vget_high_u16(sum[2]));
+ const uint16x4_t a3 = vadd_u16(vget_low_u16(sum[3]), vget_high_u16(sum[3]));
+ const uint16x4_t b0 = vpadd_u16(a0, a1);
+ const uint16x4_t b1 = vpadd_u16(a2, a3);
+ return vpaddlq_u16(vcombine_u16(b0, b1));
+#endif
+}
+
static INLINE uint32_t horizontal_add_u32x2(const uint32x2_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddv_u32(a);
#else
const uint64x1_t b = vpaddl_u32(a);
@@ -103,8 +168,17 @@
#endif
}
+static INLINE uint64_t horizontal_long_add_u32x2(const uint32x2_t a) {
+#if AOM_ARCH_AARCH64
+ return vaddlv_u32(a);
+#else
+ const uint64x1_t b = vpaddl_u32(a);
+ return vget_lane_u64(b, 0);
+#endif
+}
+
static INLINE uint32_t horizontal_add_u16x4(const uint16x4_t a) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlv_u16(a);
#else
const uint32x2_t b = vpaddl_u16(a);
diff --git a/aom_dsp/arm/sum_squares_neon.c b/aom_dsp/arm/sum_squares_neon.c
index bf212a9..626cf21 100644
--- a/aom_dsp/arm/sum_squares_neon.c
+++ b/aom_dsp/arm/sum_squares_neon.c
@@ -35,7 +35,7 @@
int stride, int height) {
int32x4_t sum_squares[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
- int h = 0;
+ int h = height;
do {
int16x4_t s0 = vld1_s16(src + 0 * stride);
int16x4_t s1 = vld1_s16(src + 1 * stride);
@@ -48,8 +48,8 @@
sum_squares[1] = vmlal_s16(sum_squares[1], s3, s3);
src += 4 * stride;
- h += 4;
- } while (h < height);
+ h -= 4;
+ } while (h != 0);
return horizontal_long_add_u32x4(
vreinterpretq_u32_s32(vaddq_s32(sum_squares[0], sum_squares[1])));
@@ -60,7 +60,7 @@
int height) {
uint64x2_t sum_squares = vdupq_n_u64(0);
- int h = 0;
+ int h = height;
do {
int32x4_t ss_row[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
int w = 0;
@@ -86,8 +86,8 @@
sum_squares, vreinterpretq_u32_s32(vaddq_s32(ss_row[0], ss_row[1])));
src += 4 * stride;
- h += 4;
- } while (h < height);
+ h -= 4;
+ } while (h != 0);
return horizontal_add_u64x2(sum_squares);
}
@@ -134,7 +134,7 @@
int32x4_t sse[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
int32x2_t sum_acc[2] = { vdup_n_s32(0), vdup_n_s32(0) };
- int h = 0;
+ int h = height;
do {
int16x4_t s0 = vld1_s16(src + 0 * stride);
int16x4_t s1 = vld1_s16(src + 1 * stride);
@@ -152,8 +152,8 @@
sum_acc[1] = vpadal_s16(sum_acc[1], s3);
src += 4 * stride;
- h += 4;
- } while (h < height);
+ h -= 4;
+ } while (h != 0);
*sum += horizontal_add_s32x4(vcombine_s32(sum_acc[0], sum_acc[1]));
return horizontal_long_add_u32x4(
@@ -166,7 +166,7 @@
uint64x2_t sse = vdupq_n_u64(0);
int32x4_t sum_acc = vdupq_n_s32(0);
- int h = 0;
+ int h = height;
do {
int32x4_t sse_row[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
int w = 0;
@@ -198,8 +198,8 @@
vreinterpretq_u32_s32(vaddq_s32(sse_row[0], sse_row[1])));
src += 4 * stride;
- h += 4;
- } while (h < height);
+ h -= 4;
+ } while (h != 0);
*sum += horizontal_add_s32x4(sum_acc);
return horizontal_add_u64x2(sse);
@@ -223,3 +223,478 @@
return sse;
}
+
+static INLINE uint64_t aom_sum_squares_i16_4xn_neon(const int16_t *src,
+ uint32_t n) {
+ uint64x2_t sum_u64 = vdupq_n_u64(0);
+
+ int i = n;
+ do {
+ uint32x4_t sum;
+ int16x4_t s0 = vld1_s16(src);
+
+ sum = vreinterpretq_u32_s32(vmull_s16(s0, s0));
+
+ sum_u64 = vpadalq_u32(sum_u64, sum);
+
+ src += 4;
+ i -= 4;
+ } while (i >= 4);
+
+ if (i > 0) {
+ return horizontal_add_u64x2(sum_u64) + aom_sum_squares_i16_c(src, i);
+ }
+ return horizontal_add_u64x2(sum_u64);
+}
+
+static INLINE uint64_t aom_sum_squares_i16_8xn_neon(const int16_t *src,
+ uint32_t n) {
+ uint64x2_t sum_u64[2] = { vdupq_n_u64(0), vdupq_n_u64(0) };
+
+ int i = n;
+ do {
+ uint32x4_t sum[2];
+ int16x8_t s0 = vld1q_s16(src);
+
+ sum[0] =
+ vreinterpretq_u32_s32(vmull_s16(vget_low_s16(s0), vget_low_s16(s0)));
+ sum[1] =
+ vreinterpretq_u32_s32(vmull_s16(vget_high_s16(s0), vget_high_s16(s0)));
+
+ sum_u64[0] = vpadalq_u32(sum_u64[0], sum[0]);
+ sum_u64[1] = vpadalq_u32(sum_u64[1], sum[1]);
+
+ src += 8;
+ i -= 8;
+ } while (i >= 8);
+
+ if (i > 0) {
+ return horizontal_add_u64x2(vaddq_u64(sum_u64[0], sum_u64[1])) +
+ aom_sum_squares_i16_c(src, i);
+ }
+ return horizontal_add_u64x2(vaddq_u64(sum_u64[0], sum_u64[1]));
+}
+
+uint64_t aom_sum_squares_i16_neon(const int16_t *src, uint32_t n) {
+ // This function seems to be called only for values of N >= 64. See
+ // av1/encoder/compound_type.c.
+ if (LIKELY(n >= 8)) {
+ return aom_sum_squares_i16_8xn_neon(src, n);
+ }
+ if (n >= 4) {
+ return aom_sum_squares_i16_4xn_neon(src, n);
+ }
+ return aom_sum_squares_i16_c(src, n);
+}
+
+#if defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE uint64_t aom_var_2d_u8_4xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x2_t sum_u32 = vdup_n_u32(0);
+ uint32x2_t sse_u32 = vdup_n_u32(0);
+
+ int h = height / 2;
+ do {
+ int w = width;
+ uint8_t *src_ptr = src;
+ do {
+ uint8x8_t s0 = load_unaligned_u8(src_ptr, src_stride);
+
+ sum_u32 = vdot_u32(sum_u32, s0, vdup_n_u8(1));
+
+ sse_u32 = vdot_u32(sse_u32, s0, s0);
+
+ src_ptr += 8;
+ w -= 8;
+ } while (w >= 8);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += 2 * src_stride;
+ } while (--h != 0);
+
+ sum += horizontal_long_add_u32x2(sum_u32);
+ sse += horizontal_long_add_u32x2(sse_u32);
+
+ return sse - sum * sum / (width * height);
+}
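
The return expression in these helpers is the one-pass variance identity with N = width * height:

  sum_i (x_i - mean)^2 = sum_i x_i^2 - (sum_i x_i)^2 / N = sse - sum * sum / N

The integer division truncates, which is presumably also how the aom_var_2d_u8_c fallback computes it, so the NEON and C paths agree.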
+
+static INLINE uint64_t aom_var_2d_u8_8xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x2_t sum_u32 = vdup_n_u32(0);
+ uint32x2_t sse_u32 = vdup_n_u32(0);
+
+ int h = height;
+ do {
+ int w = width;
+ uint8_t *src_ptr = src;
+ do {
+ uint8x8_t s0 = vld1_u8(src_ptr);
+
+ sum_u32 = vdot_u32(sum_u32, s0, vdup_n_u8(1));
+
+ sse_u32 = vdot_u32(sse_u32, s0, s0);
+
+ src_ptr += 8;
+ w -= 8;
+ } while (w >= 8);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += src_stride;
+ } while (--h != 0);
+
+ sum += horizontal_long_add_u32x2(sum_u32);
+ sse += horizontal_long_add_u32x2(sse_u32);
+
+ return sse - sum * sum / (width * height);
+}
+
+static INLINE uint64_t aom_var_2d_u8_16xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x4_t sum_u32 = vdupq_n_u32(0);
+ uint32x4_t sse_u32 = vdupq_n_u32(0);
+
+ int h = height;
+ do {
+ int w = width;
+ uint8_t *src_ptr = src;
+ do {
+ uint8x16_t s0 = vld1q_u8(src_ptr);
+
+ sum_u32 = vdotq_u32(sum_u32, s0, vdupq_n_u8(1));
+
+ sse_u32 = vdotq_u32(sse_u32, s0, s0);
+
+ src_ptr += 16;
+ w -= 16;
+ } while (w >= 16);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += src_stride;
+ } while (--h != 0);
+
+ sum += horizontal_long_add_u32x4(sum_u32);
+ sse += horizontal_long_add_u32x4(sse_u32);
+
+ return sse - sum * sum / (width * height);
+}
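
The UDOT idiom above extracts both statistics from one widening instruction each: a dot product against an all-ones vector is the plain sum, and a dot product of a vector with itself is the sum of squares. In scalar form (hypothetical reference helper):

static void dot_trick_ref(const uint8_t *s, int n, uint32_t *sum,
                          uint32_t *sse) {
  *sum = 0;
  *sse = 0;
  for (int i = 0; i < n; i++) {
    *sum += s[i];         // models vdotq_u32(sum, s, vdupq_n_u8(1))
    *sse += s[i] * s[i];  // models vdotq_u32(sse, s, s)
  }
}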
+
+#else // !defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE uint64_t aom_var_2d_u8_4xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x2_t sum_u32 = vdup_n_u32(0);
+ uint32x4_t sse_u32 = vdupq_n_u32(0);
+
+ // 255*256 = 65280, so we can accumulate up to 256 8-bit elements in a 16-bit
+ // element before we need to accumulate to 32-bit elements. Since we're
+ // accumulating in uint16x4_t vectors, this means we can accumulate up to 4
+ // rows of 256 elements. Therefore the limit can be computed as: h_limit = (4
+ // * 256) / width.
+ int h_limit = (4 * 256) / width;
+ int h_tmp = height > h_limit ? h_limit : height;
+
+ int h = 0;
+ do {
+ uint16x4_t sum_u16 = vdup_n_u16(0);
+ do {
+ uint8_t *src_ptr = src;
+ int w = width;
+ do {
+ uint8x8_t s0 = load_unaligned_u8(src_ptr, src_stride);
+
+ sum_u16 = vpadal_u8(sum_u16, s0);
+
+ uint16x8_t sse_u16 = vmull_u8(s0, s0);
+
+ sse_u32 = vpadalq_u16(sse_u32, sse_u16);
+
+ src_ptr += 8;
+ w -= 8;
+ } while (w >= 8);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += 2 * src_stride;
+ h += 2;
+ } while (h < h_tmp && h < height);
+
+ sum_u32 = vpadal_u16(sum_u32, sum_u16);
+ h_tmp += h_limit;
+ } while (h < height);
+
+ sum += horizontal_long_add_u32x2(sum_u32);
+ sse += horizontal_long_add_u32x4(sse_u32);
+
+ return sse - sum * sum / (width * height);
+}
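
To make the bound in the comment concrete: 255 * 256 = 65280 <= 65535, so one 16-bit lane can absorb 256 maximal pixels before it must be widened, and the 4-lane uint16x4_t accumulator therefore holds 4 * 256 = 1024 pixels in total. At width 4 that gives h_limit = 1024 / 4 = 256 rows between vpadal_u16 widening steps; the 16-wide variant below accumulates into a uint16x8_t and so doubles the budget to 8 * 256 pixels.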
+
+static INLINE uint64_t aom_var_2d_u8_8xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x2_t sum_u32 = vdup_n_u32(0);
+ uint32x4_t sse_u32 = vdupq_n_u32(0);
+
+ // 255*256 = 65280, so we can accumulate up to 256 8-bit elements in a 16-bit
+ // element before we need to accumulate to 32-bit elements. Since we're
+ // accumulating in uint16x4_t vectors, this means we can accumulate up to 4
+ // rows of 256 elements. Therefore the limit can be computed as: h_limit = (4
+ // * 256) / width.
+ int h_limit = (4 * 256) / width;
+ int h_tmp = height > h_limit ? h_limit : height;
+
+ int h = 0;
+ do {
+ uint16x4_t sum_u16 = vdup_n_u16(0);
+ do {
+ uint8_t *src_ptr = src;
+ int w = width;
+ do {
+ uint8x8_t s0 = vld1_u8(src_ptr);
+
+ sum_u16 = vpadal_u8(sum_u16, s0);
+
+ uint16x8_t sse_u16 = vmull_u8(s0, s0);
+
+ sse_u32 = vpadalq_u16(sse_u32, sse_u16);
+
+ src_ptr += 8;
+ w -= 8;
+ } while (w >= 8);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += src_stride;
+ ++h;
+ } while (h < h_tmp && h < height);
+
+ sum_u32 = vpadal_u16(sum_u32, sum_u16);
+ h_tmp += h_limit;
+ } while (h < height);
+
+ sum += horizontal_long_add_u32x2(sum_u32);
+ sse += horizontal_long_add_u32x4(sse_u32);
+
+ return sse - sum * sum / (width * height);
+}
+
+static INLINE uint64_t aom_var_2d_u8_16xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x4_t sum_u32 = vdupq_n_u32(0);
+ uint32x4_t sse_u32[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
+
+ // 255*256 = 65280, so we can accumulate up to 256 8-bit elements in a 16-bit
+ // element before we need to accumulate to 32-bit elements. Since we're
+ // accumulating in uint16x8_t vectors, this means we can accumulate up to 8
+ // rows of 256 elements. Therefore the limit can be computed as: h_limit = (8
+ // * 256) / width.
+ int h_limit = (8 * 256) / width;
+ int h_tmp = height > h_limit ? h_limit : height;
+
+ int h = 0;
+ do {
+ uint16x8_t sum_u16 = vdupq_n_u16(0);
+ do {
+ int w = width;
+ uint8_t *src_ptr = src;
+ do {
+ uint8x16_t s0 = vld1q_u8(src_ptr);
+
+ sum_u16 = vpadalq_u8(sum_u16, s0);
+
+ uint16x8_t sse_u16_lo = vmull_u8(vget_low_u8(s0), vget_low_u8(s0));
+ uint16x8_t sse_u16_hi = vmull_u8(vget_high_u8(s0), vget_high_u8(s0));
+
+ sse_u32[0] = vpadalq_u16(sse_u32[0], sse_u16_lo);
+ sse_u32[1] = vpadalq_u16(sse_u32[1], sse_u16_hi);
+
+ src_ptr += 16;
+ w -= 16;
+ } while (w >= 16);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint8_t v = src[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src += src_stride;
+ ++h;
+ } while (h < h_tmp && h < height);
+
+ sum_u32 = vpadalq_u16(sum_u32, sum_u16);
+ h_tmp += h_limit;
+ } while (h < height);
+
+ sum += horizontal_long_add_u32x4(sum_u32);
+ sse += horizontal_long_add_u32x4(vaddq_u32(sse_u32[0], sse_u32[1]));
+
+ return sse - sum * sum / (width * height);
+}
+
+#endif // defined(__ARM_FEATURE_DOTPROD)
+
+uint64_t aom_var_2d_u8_neon(uint8_t *src, int src_stride, int width,
+ int height) {
+ if (width >= 16) {
+ return aom_var_2d_u8_16xh_neon(src, src_stride, width, height);
+ }
+ if (width >= 8) {
+ return aom_var_2d_u8_8xh_neon(src, src_stride, width, height);
+ }
+ if (width >= 4 && height % 2 == 0) {
+ return aom_var_2d_u8_4xh_neon(src, src_stride, width, height);
+ }
+ return aom_var_2d_u8_c(src, src_stride, width, height);
+}
+
+static INLINE uint64_t aom_var_2d_u16_4xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint16_t *src_u16 = CONVERT_TO_SHORTPTR(src);
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x2_t sum_u32 = vdup_n_u32(0);
+ uint64x2_t sse_u64 = vdupq_n_u64(0);
+
+ int h = height;
+ do {
+ int w = width;
+ uint16_t *src_ptr = src_u16;
+ do {
+ uint16x4_t s0 = vld1_u16(src_ptr);
+
+ sum_u32 = vpadal_u16(sum_u32, s0);
+
+ uint32x4_t sse_u32 = vmull_u16(s0, s0);
+
+ sse_u64 = vpadalq_u32(sse_u64, sse_u32);
+
+ src_ptr += 4;
+ w -= 4;
+ } while (w >= 4);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint16_t v = src_u16[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src_u16 += src_stride;
+ } while (--h != 0);
+
+ sum += horizontal_long_add_u32x2(sum_u32);
+ sse += horizontal_add_u64x2(sse_u64);
+
+ return sse - sum * sum / (width * height);
+}
+
+static INLINE uint64_t aom_var_2d_u16_8xh_neon(uint8_t *src, int src_stride,
+ int width, int height) {
+ uint16_t *src_u16 = CONVERT_TO_SHORTPTR(src);
+ uint64_t sum = 0;
+ uint64_t sse = 0;
+ uint32x4_t sum_u32 = vdupq_n_u32(0);
+ uint64x2_t sse_u64[2] = { vdupq_n_u64(0), vdupq_n_u64(0) };
+
+ int h = height;
+ do {
+ int w = width;
+ uint16_t *src_ptr = src_u16;
+ do {
+ uint16x8_t s0 = vld1q_u16(src_ptr);
+
+ sum_u32 = vpadalq_u16(sum_u32, s0);
+
+ uint32x4_t sse_u32_lo = vmull_u16(vget_low_u16(s0), vget_low_u16(s0));
+ uint32x4_t sse_u32_hi = vmull_u16(vget_high_u16(s0), vget_high_u16(s0));
+
+ sse_u64[0] = vpadalq_u32(sse_u64[0], sse_u32_lo);
+ sse_u64[1] = vpadalq_u32(sse_u64[1], sse_u32_hi);
+
+ src_ptr += 8;
+ w -= 8;
+ } while (w >= 8);
+
+ // Process remaining columns in the row using C.
+ while (w > 0) {
+ int idx = width - w;
+ const uint16_t v = src_u16[idx];
+ sum += v;
+ sse += v * v;
+ w--;
+ }
+
+ src_u16 += src_stride;
+ } while (--h != 0);
+
+ sum += horizontal_long_add_u32x4(sum_u32);
+ sse += horizontal_add_u64x2(vaddq_u64(sse_u64[0], sse_u64[1]));
+
+ return sse - sum * sum / (width * height);
+}
+
+uint64_t aom_var_2d_u16_neon(uint8_t *src, int src_stride, int width,
+ int height) {
+ if (width >= 8) {
+ return aom_var_2d_u16_8xh_neon(src, src_stride, width, height);
+ }
+ if (width >= 4) {
+ return aom_var_2d_u16_4xh_neon(src, src_stride, width, height);
+ }
+ return aom_var_2d_u16_c(src, src_stride, width, height);
+}
diff --git a/aom_dsp/arm/transpose_neon.h b/aom_dsp/arm/transpose_neon.h
index 26fc1fd..8218140 100644
--- a/aom_dsp/arm/transpose_neon.h
+++ b/aom_dsp/arm/transpose_neon.h
@@ -13,6 +13,8 @@
#include <arm_neon.h>
+#include "config/aom_config.h"
+
// Swap high and low halves.
static INLINE uint16x8_t transpose64_u16q(const uint16x8_t a) {
return vextq_u16(a, a, 4);
@@ -258,13 +260,19 @@
a[3] = vreinterpretq_u16_u32(c1.val[1]);
}
-static INLINE uint16x8x2_t aom_vtrnq_u64_to_u16(const uint32x4_t a0,
- const uint32x4_t a1) {
+static INLINE uint16x8x2_t aom_vtrnq_u64_to_u16(uint32x4_t a0, uint32x4_t a1) {
uint16x8x2_t b0;
+#if AOM_ARCH_AARCH64
+ b0.val[0] = vreinterpretq_u16_u64(
+ vtrn1q_u64(vreinterpretq_u64_u32(a0), vreinterpretq_u64_u32(a1)));
+ b0.val[1] = vreinterpretq_u16_u64(
+ vtrn2q_u64(vreinterpretq_u64_u32(a0), vreinterpretq_u64_u32(a1)));
+#else
b0.val[0] = vcombine_u16(vreinterpret_u16_u32(vget_low_u32(a0)),
vreinterpret_u16_u32(vget_low_u32(a1)));
b0.val[1] = vcombine_u16(vreinterpret_u16_u32(vget_high_u32(a0)),
vreinterpret_u16_u32(vget_high_u32(a1)));
+#endif
return b0;
}
@@ -343,7 +351,7 @@
uint16x4_t *a6, uint16x4_t *a7,
uint16x8_t *o0, uint16x8_t *o1,
uint16x8_t *o2, uint16x8_t *o3) {
- // Swap 16 bit elements. Goes from:
+ // Combine rows. Goes from:
// a0: 00 01 02 03
// a1: 10 11 12 13
// a2: 20 21 22 23
@@ -353,53 +361,40 @@
// a6: 60 61 62 63
// a7: 70 71 72 73
// to:
- // b0.val[0]: 00 10 02 12
- // b0.val[1]: 01 11 03 13
- // b1.val[0]: 20 30 22 32
- // b1.val[1]: 21 31 23 33
- // b2.val[0]: 40 50 42 52
- // b2.val[1]: 41 51 43 53
- // b3.val[0]: 60 70 62 72
- // b3.val[1]: 61 71 63 73
+ // b0: 00 01 02 03 40 41 42 43
+ // b1: 10 11 12 13 50 51 52 53
+ // b2: 20 21 22 23 60 61 62 63
+ // b3: 30 31 32 33 70 71 72 73
- uint16x4x2_t b0 = vtrn_u16(*a0, *a1);
- uint16x4x2_t b1 = vtrn_u16(*a2, *a3);
- uint16x4x2_t b2 = vtrn_u16(*a4, *a5);
- uint16x4x2_t b3 = vtrn_u16(*a6, *a7);
+ const uint16x8_t b0 = vcombine_u16(*a0, *a4);
+ const uint16x8_t b1 = vcombine_u16(*a1, *a5);
+ const uint16x8_t b2 = vcombine_u16(*a2, *a6);
+ const uint16x8_t b3 = vcombine_u16(*a3, *a7);
+
+ // Swap 16 bit elements resulting in:
+ // c0.val[0]: 00 10 02 12 40 50 42 52
+ // c0.val[1]: 01 11 03 13 41 51 43 53
+ // c1.val[0]: 20 30 22 32 60 70 62 72
+ // c1.val[1]: 21 31 23 33 61 71 63 73
+
+ const uint16x8x2_t c0 = vtrnq_u16(b0, b1);
+ const uint16x8x2_t c1 = vtrnq_u16(b2, b3);
// Swap 32 bit elements resulting in:
- // c0.val[0]: 00 10 20 30
- // c0.val[1]: 02 12 22 32
- // c1.val[0]: 01 11 21 31
- // c1.val[1]: 03 13 23 33
- // c2.val[0]: 40 50 60 70
- // c2.val[1]: 42 52 62 72
- // c3.val[0]: 41 51 61 71
- // c3.val[1]: 43 53 63 73
+ // d0.val[0]: 00 10 20 30 40 50 60 70
+ // d0.val[1]: 02 12 22 32 42 52 62 72
+ // d1.val[0]: 01 11 21 31 41 51 61 71
+ // d1.val[1]: 03 13 23 33 43 53 63 73
- uint32x2x2_t c0 = vtrn_u32(vreinterpret_u32_u16(b0.val[0]),
- vreinterpret_u32_u16(b1.val[0]));
- uint32x2x2_t c1 = vtrn_u32(vreinterpret_u32_u16(b0.val[1]),
- vreinterpret_u32_u16(b1.val[1]));
- uint32x2x2_t c2 = vtrn_u32(vreinterpret_u32_u16(b2.val[0]),
- vreinterpret_u32_u16(b3.val[0]));
- uint32x2x2_t c3 = vtrn_u32(vreinterpret_u32_u16(b2.val[1]),
- vreinterpret_u32_u16(b3.val[1]));
+ const uint32x4x2_t d0 = vtrnq_u32(vreinterpretq_u32_u16(c0.val[0]),
+ vreinterpretq_u32_u16(c1.val[0]));
+ const uint32x4x2_t d1 = vtrnq_u32(vreinterpretq_u32_u16(c0.val[1]),
+ vreinterpretq_u32_u16(c1.val[1]));
- // Swap 64 bit elements resulting in:
- // o0: 00 10 20 30 40 50 60 70
- // o1: 01 11 21 31 41 51 61 71
- // o2: 02 12 22 32 42 52 62 72
- // o3: 03 13 23 33 43 53 63 73
-
- *o0 = vcombine_u16(vreinterpret_u16_u32(c0.val[0]),
- vreinterpret_u16_u32(c2.val[0]));
- *o1 = vcombine_u16(vreinterpret_u16_u32(c1.val[0]),
- vreinterpret_u16_u32(c3.val[0]));
- *o2 = vcombine_u16(vreinterpret_u16_u32(c0.val[1]),
- vreinterpret_u16_u32(c2.val[1]));
- *o3 = vcombine_u16(vreinterpret_u16_u32(c1.val[1]),
- vreinterpret_u16_u32(c3.val[1]));
+ *o0 = vreinterpretq_u16_u32(d0.val[0]);
+ *o1 = vreinterpretq_u16_u32(d1.val[0]);
+ *o2 = vreinterpretq_u16_u32(d0.val[1]);
+ *o3 = vreinterpretq_u16_u32(d1.val[1]);
}
static INLINE void transpose_s16_4x8(int16x4_t *a0, int16x4_t *a1,
@@ -408,7 +403,7 @@
int16x4_t *a6, int16x4_t *a7,
int16x8_t *o0, int16x8_t *o1,
int16x8_t *o2, int16x8_t *o3) {
- // Swap 16 bit elements. Goes from:
+ // Combine rows. Goes from:
// a0: 00 01 02 03
// a1: 10 11 12 13
// a2: 20 21 22 23
@@ -418,53 +413,40 @@
// a6: 60 61 62 63
// a7: 70 71 72 73
// to:
- // b0.val[0]: 00 10 02 12
- // b0.val[1]: 01 11 03 13
- // b1.val[0]: 20 30 22 32
- // b1.val[1]: 21 31 23 33
- // b2.val[0]: 40 50 42 52
- // b2.val[1]: 41 51 43 53
- // b3.val[0]: 60 70 62 72
- // b3.val[1]: 61 71 63 73
+ // b0: 00 01 02 03 40 41 42 43
+ // b1: 10 11 12 13 50 51 52 53
+ // b2: 20 21 22 23 60 61 62 63
+ // b3: 30 31 32 33 70 71 72 73
- int16x4x2_t b0 = vtrn_s16(*a0, *a1);
- int16x4x2_t b1 = vtrn_s16(*a2, *a3);
- int16x4x2_t b2 = vtrn_s16(*a4, *a5);
- int16x4x2_t b3 = vtrn_s16(*a6, *a7);
+ const int16x8_t b0 = vcombine_s16(*a0, *a4);
+ const int16x8_t b1 = vcombine_s16(*a1, *a5);
+ const int16x8_t b2 = vcombine_s16(*a2, *a6);
+ const int16x8_t b3 = vcombine_s16(*a3, *a7);
+
+ // Swap 16 bit elements resulting in:
+ // c0.val[0]: 00 10 02 12 40 50 42 52
+ // c0.val[1]: 01 11 03 13 41 51 43 53
+ // c1.val[0]: 20 30 22 32 60 70 62 72
+ // c1.val[1]: 21 31 23 33 61 71 63 73
+
+ const int16x8x2_t c0 = vtrnq_s16(b0, b1);
+ const int16x8x2_t c1 = vtrnq_s16(b2, b3);
// Swap 32 bit elements resulting in:
- // c0.val[0]: 00 10 20 30
- // c0.val[1]: 02 12 22 32
- // c1.val[0]: 01 11 21 31
- // c1.val[1]: 03 13 23 33
- // c2.val[0]: 40 50 60 70
- // c2.val[1]: 42 52 62 72
- // c3.val[0]: 41 51 61 71
- // c3.val[1]: 43 53 63 73
+ // d0.val[0]: 00 10 20 30 40 50 60 70
+ // d0.val[1]: 02 12 22 32 42 52 62 72
+ // d1.val[0]: 01 11 21 31 41 51 61 71
+ // d1.val[1]: 03 13 23 33 43 53 63 73
- int32x2x2_t c0 = vtrn_s32(vreinterpret_s32_s16(b0.val[0]),
- vreinterpret_s32_s16(b1.val[0]));
- int32x2x2_t c1 = vtrn_s32(vreinterpret_s32_s16(b0.val[1]),
- vreinterpret_s32_s16(b1.val[1]));
- int32x2x2_t c2 = vtrn_s32(vreinterpret_s32_s16(b2.val[0]),
- vreinterpret_s32_s16(b3.val[0]));
- int32x2x2_t c3 = vtrn_s32(vreinterpret_s32_s16(b2.val[1]),
- vreinterpret_s32_s16(b3.val[1]));
+ const int32x4x2_t d0 = vtrnq_s32(vreinterpretq_s32_s16(c0.val[0]),
+ vreinterpretq_s32_s16(c1.val[0]));
+ const int32x4x2_t d1 = vtrnq_s32(vreinterpretq_s32_s16(c0.val[1]),
+ vreinterpretq_s32_s16(c1.val[1]));
- // Swap 64 bit elements resulting in:
- // o0: 00 10 20 30 40 50 60 70
- // o1: 01 11 21 31 41 51 61 71
- // o2: 02 12 22 32 42 52 62 72
- // o3: 03 13 23 33 43 53 63 73
-
- *o0 = vcombine_s16(vreinterpret_s16_s32(c0.val[0]),
- vreinterpret_s16_s32(c2.val[0]));
- *o1 = vcombine_s16(vreinterpret_s16_s32(c1.val[0]),
- vreinterpret_s16_s32(c3.val[0]));
- *o2 = vcombine_s16(vreinterpret_s16_s32(c0.val[1]),
- vreinterpret_s16_s32(c2.val[1]));
- *o3 = vcombine_s16(vreinterpret_s16_s32(c1.val[1]),
- vreinterpret_s16_s32(c3.val[1]));
+ *o0 = vreinterpretq_s16_s32(d0.val[0]);
+ *o1 = vreinterpretq_s16_s32(d1.val[0]);
+ *o2 = vreinterpretq_s16_s32(d0.val[1]);
+ *o3 = vreinterpretq_s16_s32(d1.val[1]);
}
static INLINE void transpose_u16_8x8(uint16x8_t *a0, uint16x8_t *a1,
@@ -514,25 +496,45 @@
const uint32x4x2_t c3 = vtrnq_u32(vreinterpretq_u32_u16(b2.val[1]),
vreinterpretq_u32_u16(b3.val[1]));
- *a0 = vcombine_u16(vget_low_u16(vreinterpretq_u16_u32(c0.val[0])),
- vget_low_u16(vreinterpretq_u16_u32(c2.val[0])));
- *a4 = vcombine_u16(vget_high_u16(vreinterpretq_u16_u32(c0.val[0])),
- vget_high_u16(vreinterpretq_u16_u32(c2.val[0])));
+ // Swap 64 bit elements resulting in:
+ // d0.val[0]: 00 10 20 30 40 50 60 70
+ // d0.val[1]: 04 14 24 34 44 54 64 74
+ // d1.val[0]: 01 11 21 31 41 51 61 71
+ // d1.val[1]: 05 15 25 35 45 55 65 75
+ // d2.val[0]: 02 12 22 32 42 52 62 72
+ // d2.val[1]: 06 16 26 36 46 56 66 76
+ // d3.val[0]: 03 13 23 33 43 53 63 73
+ // d3.val[1]: 07 17 27 37 47 57 67 77
- *a2 = vcombine_u16(vget_low_u16(vreinterpretq_u16_u32(c0.val[1])),
- vget_low_u16(vreinterpretq_u16_u32(c2.val[1])));
- *a6 = vcombine_u16(vget_high_u16(vreinterpretq_u16_u32(c0.val[1])),
- vget_high_u16(vreinterpretq_u16_u32(c2.val[1])));
+ const uint16x8x2_t d0 = aom_vtrnq_u64_to_u16(c0.val[0], c2.val[0]);
+ const uint16x8x2_t d1 = aom_vtrnq_u64_to_u16(c1.val[0], c3.val[0]);
+ const uint16x8x2_t d2 = aom_vtrnq_u64_to_u16(c0.val[1], c2.val[1]);
+ const uint16x8x2_t d3 = aom_vtrnq_u64_to_u16(c1.val[1], c3.val[1]);
- *a1 = vcombine_u16(vget_low_u16(vreinterpretq_u16_u32(c1.val[0])),
- vget_low_u16(vreinterpretq_u16_u32(c3.val[0])));
- *a5 = vcombine_u16(vget_high_u16(vreinterpretq_u16_u32(c1.val[0])),
- vget_high_u16(vreinterpretq_u16_u32(c3.val[0])));
+ *a0 = d0.val[0];
+ *a1 = d1.val[0];
+ *a2 = d2.val[0];
+ *a3 = d3.val[0];
+ *a4 = d0.val[1];
+ *a5 = d1.val[1];
+ *a6 = d2.val[1];
+ *a7 = d3.val[1];
+}
- *a3 = vcombine_u16(vget_low_u16(vreinterpretq_u16_u32(c1.val[1])),
- vget_low_u16(vreinterpretq_u16_u32(c3.val[1])));
- *a7 = vcombine_u16(vget_high_u16(vreinterpretq_u16_u32(c1.val[1])),
- vget_high_u16(vreinterpretq_u16_u32(c3.val[1])));
+static INLINE int16x8x2_t aom_vtrnq_s64_to_s16(int32x4_t a0, int32x4_t a1) {
+ int16x8x2_t b0;
+#if AOM_ARCH_AARCH64
+ b0.val[0] = vreinterpretq_s16_s64(
+ vtrn1q_s64(vreinterpretq_s64_s32(a0), vreinterpretq_s64_s32(a1)));
+ b0.val[1] = vreinterpretq_s16_s64(
+ vtrn2q_s64(vreinterpretq_s64_s32(a0), vreinterpretq_s64_s32(a1)));
+#else
+ b0.val[0] = vcombine_s16(vreinterpret_s16_s32(vget_low_s32(a0)),
+ vreinterpret_s16_s32(vget_low_s32(a1)));
+ b0.val[1] = vcombine_s16(vreinterpret_s16_s32(vget_high_s32(a0)),
+ vreinterpret_s16_s32(vget_high_s32(a1)));
+#endif
+ return b0;
}
static INLINE void transpose_s16_8x8(int16x8_t *a0, int16x8_t *a1,
@@ -582,37 +584,32 @@
const int32x4x2_t c3 = vtrnq_s32(vreinterpretq_s32_s16(b2.val[1]),
vreinterpretq_s32_s16(b3.val[1]));
- *a0 = vcombine_s16(vget_low_s16(vreinterpretq_s16_s32(c0.val[0])),
- vget_low_s16(vreinterpretq_s16_s32(c2.val[0])));
- *a4 = vcombine_s16(vget_high_s16(vreinterpretq_s16_s32(c0.val[0])),
- vget_high_s16(vreinterpretq_s16_s32(c2.val[0])));
+ // Swap 64 bit elements resulting in:
+ // d0.val[0]: 00 10 20 30 40 50 60 70
+ // d0.val[1]: 04 14 24 34 44 54 64 74
+ // d1.val[0]: 01 11 21 31 41 51 61 71
+ // d1.val[1]: 05 15 25 35 45 55 65 75
+ // d2.val[0]: 02 12 22 32 42 52 62 72
+ // d2.val[1]: 06 16 26 36 46 56 66 76
+ // d3.val[0]: 03 13 23 33 43 53 63 73
+ // d3.val[1]: 07 17 27 37 47 57 67 77
- *a2 = vcombine_s16(vget_low_s16(vreinterpretq_s16_s32(c0.val[1])),
- vget_low_s16(vreinterpretq_s16_s32(c2.val[1])));
- *a6 = vcombine_s16(vget_high_s16(vreinterpretq_s16_s32(c0.val[1])),
- vget_high_s16(vreinterpretq_s16_s32(c2.val[1])));
+ const int16x8x2_t d0 = aom_vtrnq_s64_to_s16(c0.val[0], c2.val[0]);
+ const int16x8x2_t d1 = aom_vtrnq_s64_to_s16(c1.val[0], c3.val[0]);
+ const int16x8x2_t d2 = aom_vtrnq_s64_to_s16(c0.val[1], c2.val[1]);
+ const int16x8x2_t d3 = aom_vtrnq_s64_to_s16(c1.val[1], c3.val[1]);
- *a1 = vcombine_s16(vget_low_s16(vreinterpretq_s16_s32(c1.val[0])),
- vget_low_s16(vreinterpretq_s16_s32(c3.val[0])));
- *a5 = vcombine_s16(vget_high_s16(vreinterpretq_s16_s32(c1.val[0])),
- vget_high_s16(vreinterpretq_s16_s32(c3.val[0])));
-
- *a3 = vcombine_s16(vget_low_s16(vreinterpretq_s16_s32(c1.val[1])),
- vget_low_s16(vreinterpretq_s16_s32(c3.val[1])));
- *a7 = vcombine_s16(vget_high_s16(vreinterpretq_s16_s32(c1.val[1])),
- vget_high_s16(vreinterpretq_s16_s32(c3.val[1])));
+ *a0 = d0.val[0];
+ *a1 = d1.val[0];
+ *a2 = d2.val[0];
+ *a3 = d3.val[0];
+ *a4 = d0.val[1];
+ *a5 = d1.val[1];
+ *a6 = d2.val[1];
+ *a7 = d3.val[1];
}
-static INLINE int16x8x2_t aom_vtrnq_s64_to_s16(int32x4_t a0, int32x4_t a1) {
- int16x8x2_t b0;
- b0.val[0] = vcombine_s16(vreinterpret_s16_s32(vget_low_s32(a0)),
- vreinterpret_s16_s32(vget_low_s32(a1)));
- b0.val[1] = vcombine_s16(vreinterpret_s16_s32(vget_high_s32(a0)),
- vreinterpret_s16_s32(vget_high_s32(a1)));
- return b0;
-}
-
-static INLINE void transpose_s16_8x8q(int16x8_t *a0, int16x8_t *out) {
+static INLINE void transpose_s16_8x8q(int16x8_t *a, int16x8_t *out) {
// Swap 16 bit elements. Goes from:
// a0: 00 01 02 03 04 05 06 07
// a1: 10 11 12 13 14 15 16 17
@@ -632,10 +629,10 @@
// b3.val[0]: 60 70 62 72 64 74 66 76
// b3.val[1]: 61 71 63 73 65 75 67 77
- const int16x8x2_t b0 = vtrnq_s16(*a0, *(a0 + 1));
- const int16x8x2_t b1 = vtrnq_s16(*(a0 + 2), *(a0 + 3));
- const int16x8x2_t b2 = vtrnq_s16(*(a0 + 4), *(a0 + 5));
- const int16x8x2_t b3 = vtrnq_s16(*(a0 + 6), *(a0 + 7));
+ const int16x8x2_t b0 = vtrnq_s16(a[0], a[1]);
+ const int16x8x2_t b1 = vtrnq_s16(a[2], a[3]);
+ const int16x8x2_t b2 = vtrnq_s16(a[4], a[5]);
+ const int16x8x2_t b3 = vtrnq_s16(a[6], a[7]);
// Swap 32 bit elements resulting in:
// c0.val[0]: 00 10 20 30 04 14 24 34
@@ -665,19 +662,53 @@
// d2.val[1]: 06 16 26 36 46 56 66 76
// d3.val[0]: 03 13 23 33 43 53 63 73
// d3.val[1]: 07 17 27 37 47 57 67 77
+
const int16x8x2_t d0 = aom_vtrnq_s64_to_s16(c0.val[0], c2.val[0]);
const int16x8x2_t d1 = aom_vtrnq_s64_to_s16(c1.val[0], c3.val[0]);
const int16x8x2_t d2 = aom_vtrnq_s64_to_s16(c0.val[1], c2.val[1]);
const int16x8x2_t d3 = aom_vtrnq_s64_to_s16(c1.val[1], c3.val[1]);
- *out = d0.val[0];
- *(out + 1) = d1.val[0];
- *(out + 2) = d2.val[0];
- *(out + 3) = d3.val[0];
- *(out + 4) = d0.val[1];
- *(out + 5) = d1.val[1];
- *(out + 6) = d2.val[1];
- *(out + 7) = d3.val[1];
+ out[0] = d0.val[0];
+ out[1] = d1.val[0];
+ out[2] = d2.val[0];
+ out[3] = d3.val[0];
+ out[4] = d0.val[1];
+ out[5] = d1.val[1];
+ out[6] = d2.val[1];
+ out[7] = d3.val[1];
+}
+
+static INLINE void transpose_u16_4x4d(uint16x4_t *a0, uint16x4_t *a1,
+ uint16x4_t *a2, uint16x4_t *a3) {
+ // Swap 16 bit elements. Goes from:
+ // a0: 00 01 02 03
+ // a1: 10 11 12 13
+ // a2: 20 21 22 23
+ // a3: 30 31 32 33
+ // to:
+ // b0.val[0]: 00 10 02 12
+ // b0.val[1]: 01 11 03 13
+ // b1.val[0]: 20 30 22 32
+ // b1.val[1]: 21 31 23 33
+
+ const uint16x4x2_t b0 = vtrn_u16(*a0, *a1);
+ const uint16x4x2_t b1 = vtrn_u16(*a2, *a3);
+
+ // Swap 32 bit elements resulting in:
+ // c0.val[0]: 00 10 20 30
+ // c0.val[1]: 02 12 22 32
+ // c1.val[0]: 01 11 21 31
+ // c1.val[1]: 03 13 23 33
+
+ const uint32x2x2_t c0 = vtrn_u32(vreinterpret_u32_u16(b0.val[0]),
+ vreinterpret_u32_u16(b1.val[0]));
+ const uint32x2x2_t c1 = vtrn_u32(vreinterpret_u32_u16(b0.val[1]),
+ vreinterpret_u32_u16(b1.val[1]));
+
+ *a0 = vreinterpret_u16_u32(c0.val[0]);
+ *a1 = vreinterpret_u16_u32(c1.val[0]);
+ *a2 = vreinterpret_u16_u32(c0.val[1]);
+ *a3 = vreinterpret_u16_u32(c1.val[1]);
}
static INLINE void transpose_s16_4x4d(int16x4_t *a0, int16x4_t *a1,
@@ -715,8 +746,15 @@
static INLINE int32x4x2_t aom_vtrnq_s64_to_s32(int32x4_t a0, int32x4_t a1) {
int32x4x2_t b0;
+#if AOM_ARCH_AARCH64
+ b0.val[0] = vreinterpretq_s32_s64(
+ vtrn1q_s64(vreinterpretq_s64_s32(a0), vreinterpretq_s64_s32(a1)));
+ b0.val[1] = vreinterpretq_s32_s64(
+ vtrn2q_s64(vreinterpretq_s64_s32(a0), vreinterpretq_s64_s32(a1)));
+#else
b0.val[0] = vcombine_s32(vget_low_s32(a0), vget_low_s32(a1));
b0.val[1] = vcombine_s32(vget_high_s32(a0), vget_high_s32(a1));
+#endif
return b0;
}
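
Both paths of these aom_vtrnq_*64_to_* helpers compute the same interleave; the AArch64 branch simply maps it onto single TRN1/TRN2 instructions at 64-bit granularity. A scalar reference (hypothetical helper), viewing each 128-bit vector as two 64-bit halves:

#include <string.h>

static void vtrnq_s64_to_s32_ref(const int32_t a0[4], const int32_t a1[4],
                                 int32_t out0[4], int32_t out1[4]) {
  memcpy(&out0[0], &a0[0], 2 * sizeof(int32_t));  // low half of a0
  memcpy(&out0[2], &a1[0], 2 * sizeof(int32_t));  // low half of a1
  memcpy(&out1[0], &a0[2], 2 * sizeof(int32_t));  // high half of a0
  memcpy(&out1[2], &a1[2], 2 * sizeof(int32_t));  // high half of a1
}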
diff --git a/aom_dsp/arm/variance_neon.c b/aom_dsp/arm/variance_neon.c
index 40e40f0..5e33996 100644
--- a/aom_dsp/arm/variance_neon.c
+++ b/aom_dsp/arm/variance_neon.c
@@ -27,7 +27,7 @@
uint32x4_t ref_sum = vdupq_n_u32(0);
uint32x4_t sse_u32 = vdupq_n_u32(0);
- int i = 0;
+ int i = h;
do {
uint8x16_t s = load_unaligned_u8q(src, src_stride);
uint8x16_t r = load_unaligned_u8q(ref, ref_stride);
@@ -40,8 +40,8 @@
src += 4 * src_stride;
ref += 4 * ref_stride;
- i += 4;
- } while (i < h);
+ i -= 4;
+ } while (i != 0);
int32x4_t sum_diff =
vsubq_s32(vreinterpretq_s32_u32(src_sum), vreinterpretq_s32_u32(ref_sum));
@@ -56,7 +56,7 @@
uint32x4_t ref_sum = vdupq_n_u32(0);
uint32x4_t sse_u32 = vdupq_n_u32(0);
- int i = 0;
+ int i = h;
do {
uint8x16_t s = vcombine_u8(vld1_u8(src), vld1_u8(src + src_stride));
uint8x16_t r = vcombine_u8(vld1_u8(ref), vld1_u8(ref + ref_stride));
@@ -69,8 +69,8 @@
src += 2 * src_stride;
ref += 2 * ref_stride;
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
int32x4_t sum_diff =
vsubq_s32(vreinterpretq_s32_u32(src_sum), vreinterpretq_s32_u32(ref_sum));
@@ -85,7 +85,7 @@
uint32x4_t ref_sum = vdupq_n_u32(0);
uint32x4_t sse_u32 = vdupq_n_u32(0);
- int i = 0;
+ int i = h;
do {
uint8x16_t s = vld1q_u8(src);
uint8x16_t r = vld1q_u8(ref);
@@ -98,8 +98,7 @@
src += src_stride;
ref += ref_stride;
- i++;
- } while (i < h);
+ } while (--i != 0);
int32x4_t sum_diff =
vsubq_s32(vreinterpretq_s32_u32(src_sum), vreinterpretq_s32_u32(ref_sum));
@@ -114,7 +113,7 @@
uint32x4_t ref_sum = vdupq_n_u32(0);
uint32x4_t sse_u32 = vdupq_n_u32(0);
- int i = 0;
+ int i = h;
do {
int j = 0;
do {
@@ -132,8 +131,7 @@
src += src_stride;
ref += ref_stride;
- i++;
- } while (i < h);
+ } while (--i != 0);
int32x4_t sum_diff =
vsubq_s32(vreinterpretq_s32_u32(src_sum), vreinterpretq_s32_u32(ref_sum));
@@ -171,7 +169,7 @@
// 32767 / 255 ~= 128, but we use an 8-wide accumulator; so 256 4-wide rows.
assert(h <= 256);
- int i = 0;
+ int i = h;
do {
uint8x8_t s = load_unaligned_u8(src, src_stride);
uint8x8_t r = load_unaligned_u8(ref, ref_stride);
@@ -184,8 +182,8 @@
src += 2 * src_stride;
ref += 2 * ref_stride;
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
*sum = horizontal_add_s16x8(sum_s16);
*sse = (uint32_t)horizontal_add_s32x4(sse_s32);
@@ -201,7 +199,7 @@
// 32767 / 255 ~= 128
assert(h <= 128);
- int i = 0;
+ int i = h;
do {
uint8x8_t s = vld1_u8(src);
uint8x8_t r = vld1_u8(ref);
@@ -215,8 +213,7 @@
src += src_stride;
ref += ref_stride;
- i++;
- } while (i < h);
+ } while (--i != 0);
*sum = horizontal_add_s16x8(sum_s16);
*sse = (uint32_t)horizontal_add_s32x4(vaddq_s32(sse_s32[0], sse_s32[1]));
@@ -232,7 +229,7 @@
// 32767 / 255 ~= 128, so 128 16-wide rows.
assert(h <= 128);
- int i = 0;
+ int i = h;
do {
uint8x16_t s = vld1q_u8(src);
uint8x16_t r = vld1q_u8(ref);
@@ -256,8 +253,7 @@
src += src_stride;
ref += ref_stride;
- i++;
- } while (i < h);
+ } while (--i != 0);
*sum = horizontal_add_s16x8(vaddq_s16(sum_s16[0], sum_s16[1]));
*sse = (uint32_t)horizontal_add_s32x4(vaddq_s32(sse_s32[0], sse_s32[1]));
@@ -378,17 +374,6 @@
#undef VARIANCE_WXH_NEON
-void aom_get8x8var_neon(const uint8_t *src, int src_stride, const uint8_t *ref,
- int ref_stride, unsigned int *sse, int *sum) {
- variance_8xh_neon(src, src_stride, ref, ref_stride, 8, sse, sum);
-}
-
-void aom_get16x16var_neon(const uint8_t *src, int src_stride,
- const uint8_t *ref, int ref_stride, unsigned int *sse,
- int *sum) {
- variance_16xh_neon(src, src_stride, ref, ref_stride, 16, sse, sum);
-}
-
// TODO(yunqingwang): Perform variance of two/four 8x8 blocks similar to that of
// AVX2. Also, implement the NEON for variance computation present in this
// function.
@@ -409,6 +394,25 @@
var8x8[i] = sse8x8[i] - (uint32_t)(((int64_t)sum8x8[i] * sum8x8[i]) >> 6);
}
+void aom_get_var_sse_sum_16x16_dual_neon(const uint8_t *src, int src_stride,
+ const uint8_t *ref, int ref_stride,
+ uint32_t *sse16x16,
+ unsigned int *tot_sse, int *tot_sum,
+ uint32_t *var16x16) {
+ int sum16x16[2] = { 0 };
+  // Loop over two horizontally adjacent 16x16 blocks, i.e. one 32x16 region.
+ for (int k = 0; k < 2; k++) {
+ variance_16xh_neon(src + (k * 16), src_stride, ref + (k * 16), ref_stride,
+ 16, &sse16x16[k], &sum16x16[k]);
+ }
+
+ *tot_sse += sse16x16[0] + sse16x16[1];
+ *tot_sum += sum16x16[0] + sum16x16[1];
+ for (int i = 0; i < 2; i++)
+ var16x16[i] =
+ sse16x16[i] - (uint32_t)(((int64_t)sum16x16[i] * sum16x16[i]) >> 8);
+}
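
Each block is 16x16 = 256 pixels, so the mean correction sum * sum / 256 becomes the >> 8 above; the 8x8 helper earlier in this function's neighborhood uses >> 6 for the same reason (64 pixels). In short:

  var = sse - sum^2 / N, with N a power of two, implemented as a right shift.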
+
#if defined(__ARM_FEATURE_DOTPROD)
static INLINE unsigned int mse8xh_neon(const uint8_t *src, int src_stride,
@@ -416,7 +420,7 @@
unsigned int *sse, int h) {
uint32x4_t sse_u32 = vdupq_n_u32(0);
- int i = 0;
+ int i = h;
do {
uint8x16_t s = vcombine_u8(vld1_u8(src), vld1_u8(src + src_stride));
uint8x16_t r = vcombine_u8(vld1_u8(ref), vld1_u8(ref + ref_stride));
@@ -427,8 +431,8 @@
src += 2 * src_stride;
ref += 2 * ref_stride;
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
*sse = horizontal_add_u32x4(sse_u32);
return horizontal_add_u32x4(sse_u32);
@@ -439,7 +443,7 @@
unsigned int *sse, int h) {
uint32x4_t sse_u32[2] = { vdupq_n_u32(0), vdupq_n_u32(0) };
- int i = 0;
+ int i = h;
do {
uint8x16_t s0 = vld1q_u8(src);
uint8x16_t s1 = vld1q_u8(src + src_stride);
@@ -454,25 +458,13 @@
src += 2 * src_stride;
ref += 2 * ref_stride;
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
*sse = horizontal_add_u32x4(vaddq_u32(sse_u32[0], sse_u32[1]));
return horizontal_add_u32x4(vaddq_u32(sse_u32[0], sse_u32[1]));
}
-unsigned int aom_get4x4sse_cs_neon(const uint8_t *src, int src_stride,
- const uint8_t *ref, int ref_stride) {
- uint8x16_t s = load_unaligned_u8q(src, src_stride);
- uint8x16_t r = load_unaligned_u8q(ref, ref_stride);
-
- uint8x16_t abs_diff = vabdq_u8(s, r);
-
- uint32x4_t sse = vdotq_u32(vdupq_n_u32(0), abs_diff, abs_diff);
-
- return horizontal_add_u32x4(sse);
-}
-
#else // !defined(__ARM_FEATURE_DOTPROD)
static INLINE unsigned int mse8xh_neon(const uint8_t *src, int src_stride,
@@ -483,7 +475,7 @@
uint16x8_t diff[2];
int32x4_t sse_s32[2] = { vdupq_n_s32(0), vdupq_n_s32(0) };
- int i = 0;
+ int i = h;
do {
s[0] = vld1_u8(src);
src += src_stride;
@@ -507,8 +499,8 @@
sse_s32[0] = vmlal_s16(sse_s32[0], diff_hi[0], diff_hi[0]);
sse_s32[1] = vmlal_s16(sse_s32[1], diff_hi[1], diff_hi[1]);
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
sse_s32[0] = vaddq_s32(sse_s32[0], sse_s32[1]);
@@ -525,7 +517,7 @@
int32x4_t sse_s32[4] = { vdupq_n_s32(0), vdupq_n_s32(0), vdupq_n_s32(0),
vdupq_n_s32(0) };
- int i = 0;
+ int i = h;
do {
s[0] = vld1q_u8(src);
src += src_stride;
@@ -561,8 +553,8 @@
sse_s32[2] = vmlal_s16(sse_s32[2], diff_hi[2], diff_hi[2]);
sse_s32[3] = vmlal_s16(sse_s32[3], diff_hi[3], diff_hi[3]);
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
sse_s32[0] = vaddq_s32(sse_s32[0], sse_s32[1]);
sse_s32[2] = vaddq_s32(sse_s32[2], sse_s32[3]);
@@ -572,40 +564,6 @@
return horizontal_add_u32x4(vreinterpretq_u32_s32(sse_s32[0]));
}
-unsigned int aom_get4x4sse_cs_neon(const uint8_t *src, int src_stride,
- const uint8_t *ref, int ref_stride) {
- uint8x8_t s[4], r[4];
- int16x4_t diff[4];
- int32x4_t sse;
-
- s[0] = vld1_u8(src);
- src += src_stride;
- r[0] = vld1_u8(ref);
- ref += ref_stride;
- s[1] = vld1_u8(src);
- src += src_stride;
- r[1] = vld1_u8(ref);
- ref += ref_stride;
- s[2] = vld1_u8(src);
- src += src_stride;
- r[2] = vld1_u8(ref);
- ref += ref_stride;
- s[3] = vld1_u8(src);
- r[3] = vld1_u8(ref);
-
- diff[0] = vget_low_s16(vreinterpretq_s16_u16(vsubl_u8(s[0], r[0])));
- diff[1] = vget_low_s16(vreinterpretq_s16_u16(vsubl_u8(s[1], r[1])));
- diff[2] = vget_low_s16(vreinterpretq_s16_u16(vsubl_u8(s[2], r[2])));
- diff[3] = vget_low_s16(vreinterpretq_s16_u16(vsubl_u8(s[3], r[3])));
-
- sse = vmull_s16(diff[0], diff[0]);
- sse = vmlal_s16(sse, diff[1], diff[1]);
- sse = vmlal_s16(sse, diff[2], diff[2]);
- sse = vmlal_s16(sse, diff[3], diff[3]);
-
- return horizontal_add_u32x4(vreinterpretq_u32_s32(sse));
-}
-
#endif // defined(__ARM_FEATURE_DOTPROD)
#define MSE_WXH_NEON(w, h) \
@@ -647,7 +605,7 @@
int h) {
uint64x2_t square_result = vdupq_n_u64(0);
uint32_t d0, d1;
- int i = 0;
+ int i = h;
uint8_t *dst_ptr = dst;
uint16_t *src_ptr = src;
do {
@@ -678,8 +636,8 @@
const uint16x8_t src_16x8 = vcombine_u16(src0_16x4, src1_16x4);
COMPUTE_MSE_16BIT(src_16x8, dst_16x8)
- i += 2;
- } while (i < h);
+ i -= 2;
+ } while (i != 0);
uint64x1_t sum =
vadd_u64(vget_high_u64(square_result), vget_low_u64(square_result));
return vget_lane_u64(sum, 0);
@@ -689,16 +647,18 @@
uint16_t *src, int sstride,
int h) {
uint64x2_t square_result = vdupq_n_u64(0);
- int i = 0;
+ int i = h;
do {
// d7 d6 d5 d4 d3 d2 d1 d0 - 8 bit
- const uint16x8_t dst_16x8 = vmovl_u8(vld1_u8(&dst[i * dstride]));
+ const uint16x8_t dst_16x8 = vmovl_u8(vld1_u8(dst));
// s7 s6 s5 s4 s3 s2 s1 s0 - 16 bit
- const uint16x8_t src_16x8 = vld1q_u16(&src[i * sstride]);
+ const uint16x8_t src_16x8 = vld1q_u16(src);
COMPUTE_MSE_16BIT(src_16x8, dst_16x8)
- i++;
- } while (i < h);
+
+ dst += dstride;
+ src += sstride;
+ } while (--i != 0);
uint64x1_t sum =
vadd_u64(vget_high_u64(square_result), vget_low_u64(square_result));
return vget_lane_u64(sum, 0);
diff --git a/aom_dsp/avg.c b/aom_dsp/avg.c
index ceb1026..7b36bf3 100644
--- a/aom_dsp/avg.c
+++ b/aom_dsp/avg.c
@@ -87,7 +87,7 @@
int i, j;
const uint16_t *s = CONVERT_TO_SHORTPTR(s8);
const uint16_t *d = CONVERT_TO_SHORTPTR(d8);
- *min = 255;
+ *min = 65535;
*max = 0;
for (i = 0; i < 8; ++i, s += p, d += dp) {
for (j = 0; j < 8; ++j) {
@@ -99,14 +99,6 @@
}
#endif // CONFIG_AV1_HIGHBITDEPTH
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride,
- int16_t *coeff, int log_scale, int h8, int w8) {
- for (int idy = 0; idy < h8 * 8; ++idy)
- for (int idx = 0; idx < w8 * 8; ++idx)
- coeff[idy * (h8 * 8) + idx] = src_diff[idy * src_stride + idx]
- << log_scale;
-}
-
static void hadamard_col4(const int16_t *src_diff, ptrdiff_t src_stride,
int16_t *coeff) {
int16_t b0 = (src_diff[0 * src_stride] + src_diff[1 * src_stride]) >> 1;
@@ -333,19 +325,19 @@
aom_hadamard_16x16_c(src_ptr, src_stride, coeff + idx * 256);
}
- // coeff: 15 bit, dynamic range [-16320, 16320]
+ // coeff: 16 bit, dynamic range [-32768, 32767]
for (idx = 0; idx < 256; ++idx) {
tran_low_t a0 = coeff[0];
tran_low_t a1 = coeff[256];
tran_low_t a2 = coeff[512];
tran_low_t a3 = coeff[768];
- tran_low_t b0 = (a0 + a1) >> 2; // (a0 + a1): 16 bit, [-32640, 32640]
+ tran_low_t b0 = (a0 + a1) >> 2; // (a0 + a1): 17 bit, [-65536, 65535]
tran_low_t b1 = (a0 - a1) >> 2; // b0-b3: 15 bit, dynamic range
- tran_low_t b2 = (a2 + a3) >> 2; // [-16320, 16320]
+ tran_low_t b2 = (a2 + a3) >> 2; // [-16384, 16383]
tran_low_t b3 = (a2 - a3) >> 2;
- coeff[0] = b0 + b2; // 16 bit, [-32640, 32640]
+ coeff[0] = b0 + b2; // 16 bit, [-32768, 32767]
coeff[256] = b1 + b3;
coeff[512] = b0 - b2;
coeff[768] = b1 - b3;
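
A quick check of the corrected dynamic-range comments, as a worked chain:

  a0, a1 in [-32768, 32767]  ->  a0 + a1 in [-65536, 65534]  (17 bit)
  (a0 + a1) >> 2             ->  [-16384, 16383]             (15 bit)
  b0 + b2                    ->  [-32768, 32766]             (16 bit)

so the final combine still fits within a 16-bit lane, which is what SIMD versions of this 32x32 stage rely on.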
diff --git a/aom_dsp/entdec.c b/aom_dsp/entdec.c
index da43e8a..5bbcdda 100644
--- a/aom_dsp/entdec.c
+++ b/aom_dsp/entdec.c
@@ -205,14 +205,14 @@
assert(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
assert(icdf[nsyms - 1] == OD_ICDF(CDF_PROB_TOP));
assert(32768U <= r);
- assert(7 - EC_PROB_SHIFT - CDF_SHIFT >= 0);
+ assert(7 - EC_PROB_SHIFT >= 0);
c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
v = r;
ret = -1;
do {
u = v;
v = ((r >> 8) * (uint32_t)(icdf[++ret] >> EC_PROB_SHIFT) >>
- (7 - EC_PROB_SHIFT - CDF_SHIFT));
+ (7 - EC_PROB_SHIFT));
v += EC_MIN_PROB * (N - ret);
} while (c < v);
assert(v < u);
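
The update above approximates v = r * cdf / 32768 with a cheap narrow multiply. As a hedged numeric check (assuming EC_PROB_SHIFT is 6, as defined in entcode.h): with r = 40000 and icdf[ret] = 16384 (half the probability mass remaining), (40000 >> 8) * (16384 >> 6) >> 1 = 156 * 256 >> 1 = 19968, close to the exact r / 2 = 20000. The EC_MIN_PROB * (N - ret) term then guarantees every remaining symbol keeps a nonzero slice of the range even when its coded probability rounds to zero.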
diff --git a/aom_dsp/entenc.c b/aom_dsp/entenc.c
index 2fd4493..dfc1624 100644
--- a/aom_dsp/entenc.c
+++ b/aom_dsp/entenc.c
@@ -49,11 +49,11 @@
}*/
/*Takes updated low and range values, renormalizes them so that
- 32768 <= rng < 65536 (flushing bytes from low to the pre-carry buffer if
+ 32768 <= rng < 65536 (flushing bytes from low to the output buffer if
necessary), and stores them back in the encoder context.
low: The new value of low.
rng: The new value of the range.*/
-static void od_ec_enc_normalize(od_ec_enc *enc, od_ec_window low,
+static void od_ec_enc_normalize(od_ec_enc *enc, od_ec_enc_window low,
unsigned rng) {
int d;
int c;
@@ -63,44 +63,59 @@
/*The number of leading zeros in the 16-bit binary representation of rng.*/
d = 16 - OD_ILOG_NZ(rng);
s = c + d;
- /*TODO: Right now we flush every time we have at least one byte available.
- Instead we should use an od_ec_window and flush right before we're about to
- shift bits off the end of the window.
- For a 32-bit window this is about the same amount of work, but for a 64-bit
- window it should be a fair win.*/
- if (s >= 0) {
- uint16_t *buf;
- uint32_t storage;
- uint32_t offs;
- unsigned m;
- buf = enc->precarry_buf;
- storage = enc->precarry_storage;
- offs = enc->offs;
- if (offs + 2 > storage) {
- storage = 2 * storage + 2;
- buf = (uint16_t *)realloc(buf, sizeof(*buf) * storage);
- if (buf == NULL) {
+
+  /* We flush every time "low" cannot safely and efficiently accommodate any
+     more data. Overall, c must not exceed 63 at the time the bytes are
+     flushed out. To guarantee this, "s" cannot exceed 56 bits because we
+     have to keep 1 byte for the carry. We also subtract 16 to keep room for
+     the next symbol's worth of "d" bits (max 15). An alternative condition
+     would be (e < d), where e is the number of leading zeros in "low",
+     indicating there is not enough room to accommodate "rng" worth of "d"
+     bits in "low". However, that approach needs additional computation:
+     (i) compute "e", (ii) push the leading 0x00's as a special case.
+  */
+ if (s >= 40) { // 56 - 16
+ unsigned char *out = enc->buf;
+ uint32_t storage = enc->storage;
+ uint32_t offs = enc->offs;
+ if (offs + 8 > storage) {
+ storage = 2 * storage + 8;
+ out = (unsigned char *)realloc(out, sizeof(*out) * storage);
+ if (out == NULL) {
enc->error = -1;
enc->offs = 0;
return;
}
- enc->precarry_buf = buf;
- enc->precarry_storage = storage;
+ enc->buf = out;
+ enc->storage = storage;
}
- c += 16;
- m = (1 << c) - 1;
- if (s >= 8) {
- assert(offs < storage);
- buf[offs++] = (uint16_t)(low >> c);
- low &= m;
- c -= 8;
- m >>= 8;
- }
- assert(offs < storage);
- buf[offs++] = (uint16_t)(low >> c);
+    // Add 1 byte here since enc->cnt is biased to count 1 byte less than is
+    // actually pending (enc->cnt is initialized to -9), which keeps the
+    // bookkeeping correct.
+ uint8_t num_bytes_ready = (s >> 3) + 1;
+
+ // Update "c" to contain the number of non-ready bits in "low". Since "low"
+ // has 64-bit capacity, we need to add the (64 - 40) cushion bits and take
+ // off the number of ready bits.
+ c += 24 - (num_bytes_ready << 3);
+
+ // Prepare "output" and update "low"
+ uint64_t output = low >> c;
+ low = low & (((uint64_t)1 << c) - 1);
+
+ // Prepare data and carry mask
+ uint64_t mask = (uint64_t)1 << (num_bytes_ready << 3);
+ uint64_t carry = output & mask;
+
+ mask = mask - 0x01;
+ output = output & mask;
+
+ // Write data in a single operation
+ write_enc_data_to_out_buf(out, offs, output, carry, &enc->offs,
+ num_bytes_ready);
+
+      // Update the encoder state so that enc->cnt will hold the number of
+      // residual bits.
s = c + d - 24;
- low &= m;
- enc->offs = offs;
}
enc->low = low << d;
enc->rng = rng << d;
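
Compared with the old byte-at-a-time pre-carry path, the rewritten normalize batches output through the 64-bit od_ec_enc_window: flushing waits until at least 40 bits are pending (56 usable bits minus 16 reserved for the next symbol, with one byte kept free for the carry). For example, with s = 42 pending bits, num_bytes_ready = (42 >> 3) + 1 = 6, so six bytes leave the window in a single write_enc_data_to_out_buf call rather than six separate flushes through a uint16_t pre-carry buffer.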
@@ -117,12 +132,6 @@
enc->storage = 0;
enc->error = -1;
}
- enc->precarry_buf = (uint16_t *)malloc(sizeof(*enc->precarry_buf) * size);
- enc->precarry_storage = size;
- if (size > 0 && enc->precarry_buf == NULL) {
- enc->precarry_storage = 0;
- enc->error = -1;
- }
}
/*Reinitializes the encoder.*/
@@ -141,21 +150,16 @@
}
/*Frees the buffers used by the encoder.*/
-void od_ec_enc_clear(od_ec_enc *enc) {
- free(enc->precarry_buf);
- free(enc->buf);
-}
+void od_ec_enc_clear(od_ec_enc *enc) { free(enc->buf); }
/*Encodes a symbol given its frequency in Q15.
fl: CDF_PROB_TOP minus the cumulative frequency of all symbols that come
- before the
- one to be encoded.
+ before the one to be encoded.
fh: CDF_PROB_TOP minus the cumulative frequency of all symbols up to and
- including
- the one to be encoded.*/
+ including the one to be encoded.*/
static void od_ec_encode_q15(od_ec_enc *enc, unsigned fl, unsigned fh, int s,
int nsyms) {
- od_ec_window l;
+ od_ec_enc_window l;
unsigned r;
unsigned u;
unsigned v;
@@ -164,20 +168,17 @@
assert(32768U <= r);
assert(fh <= fl);
assert(fl <= 32768U);
- assert(7 - EC_PROB_SHIFT - CDF_SHIFT >= 0);
+ assert(7 - EC_PROB_SHIFT >= 0);
const int N = nsyms - 1;
if (fl < CDF_PROB_TOP) {
- u = ((r >> 8) * (uint32_t)(fl >> EC_PROB_SHIFT) >>
- (7 - EC_PROB_SHIFT - CDF_SHIFT)) +
+ u = ((r >> 8) * (uint32_t)(fl >> EC_PROB_SHIFT) >> (7 - EC_PROB_SHIFT)) +
EC_MIN_PROB * (N - (s - 1));
- v = ((r >> 8) * (uint32_t)(fh >> EC_PROB_SHIFT) >>
- (7 - EC_PROB_SHIFT - CDF_SHIFT)) +
+ v = ((r >> 8) * (uint32_t)(fh >> EC_PROB_SHIFT) >> (7 - EC_PROB_SHIFT)) +
EC_MIN_PROB * (N - (s + 0));
l += r - u;
r = u - v;
} else {
- r -= ((r >> 8) * (uint32_t)(fh >> EC_PROB_SHIFT) >>
- (7 - EC_PROB_SHIFT - CDF_SHIFT)) +
+ r -= ((r >> 8) * (uint32_t)(fh >> EC_PROB_SHIFT) >> (7 - EC_PROB_SHIFT)) +
EC_MIN_PROB * (N - (s + 0));
}
od_ec_enc_normalize(enc, l, r);
@@ -191,7 +192,7 @@
val: The value to encode (0 or 1).
f: The probability that the val is one, scaled by 32768.*/
void od_ec_encode_bool_q15(od_ec_enc *enc, int val, unsigned f) {
- od_ec_window l;
+ od_ec_enc_window l;
unsigned r;
unsigned v;
assert(0 < f);
@@ -251,12 +252,11 @@
mask = ((1U << nbits) - 1) << shift;
if (enc->offs > 0) {
/*The first byte has been finalized.*/
- enc->precarry_buf[0] =
- (uint16_t)((enc->precarry_buf[0] & ~mask) | val << shift);
+ enc->buf[0] = (unsigned char)((enc->buf[0] & ~mask) | val << shift);
} else if (9 + enc->cnt + (enc->rng == 0x8000) > nbits) {
/*The first byte has yet to be output.*/
- enc->low = (enc->low & ~((od_ec_window)mask << (16 + enc->cnt))) |
- (od_ec_window)val << (16 + enc->cnt + shift);
+ enc->low = (enc->low & ~((od_ec_enc_window)mask << (16 + enc->cnt))) |
+ (od_ec_enc_window)val << (16 + enc->cnt + shift);
} else {
/*The encoder hasn't even encoded _nbits of data yet.*/
enc->error = -1;
@@ -276,11 +276,10 @@
unsigned char *od_ec_enc_done(od_ec_enc *enc, uint32_t *nbytes) {
unsigned char *out;
uint32_t storage;
- uint16_t *buf;
uint32_t offs;
- od_ec_window m;
- od_ec_window e;
- od_ec_window l;
+ od_ec_enc_window m;
+ od_ec_enc_window e;
+ od_ec_enc_window l;
int c;
int s;
if (enc->error) return NULL;
@@ -295,8 +294,7 @@
(double)tell / enc->nb_symbols);
}
#endif
- /*We output the minimum number of bits that ensures that the symbols encoded
- thus far will be decoded correctly regardless of the bits that follow.*/
+
l = enc->low;
c = enc->cnt;
s = 10;
@@ -304,36 +302,14 @@
e = ((l + m) & ~m) | (m + 1);
s += c;
offs = enc->offs;
- buf = enc->precarry_buf;
- if (s > 0) {
- unsigned n;
- storage = enc->precarry_storage;
- if (offs + ((s + 7) >> 3) > storage) {
- storage = storage * 2 + ((s + 7) >> 3);
- buf = (uint16_t *)realloc(buf, sizeof(*buf) * storage);
- if (buf == NULL) {
- enc->error = -1;
- return NULL;
- }
- enc->precarry_buf = buf;
- enc->precarry_storage = storage;
- }
- n = (1 << (c + 16)) - 1;
- do {
- assert(offs < storage);
- buf[offs++] = (uint16_t)(e >> (c + 16));
- e &= n;
- s -= 8;
- c -= 8;
- n >>= 8;
- } while (s > 0);
- }
+
/*Make sure there's enough room for the entropy-coded bits.*/
out = enc->buf;
storage = enc->storage;
- c = OD_MAXI((s + 7) >> 3, 0);
- if (offs + c > storage) {
- storage = offs + c;
+ const int s_bits = (s + 7) >> 3;
+ int b = OD_MAXI(s_bits, 0);
+ if (offs + b > storage) {
+ storage = offs + b;
out = (unsigned char *)realloc(out, sizeof(*out) * storage);
if (out == NULL) {
enc->error = -1;
@@ -342,23 +318,30 @@
enc->buf = out;
enc->storage = storage;
}
- *nbytes = offs;
- /*Perform carry propagation.*/
- assert(offs <= storage);
- out = out + storage - offs;
- c = 0;
- while (offs > 0) {
- offs--;
- c = buf[offs] + c;
- out[offs] = (unsigned char)c;
- c >>= 8;
+
+ /*We output the minimum number of bits that ensures that the symbols encoded
+ thus far will be decoded correctly regardless of the bits that follow.*/
+ if (s > 0) {
+ uint64_t n;
+ n = ((uint64_t)1 << (c + 16)) - 1;
+ do {
+ assert(offs < storage);
+ uint16_t val = (uint16_t)(e >> (c + 16));
+ out[offs] = (unsigned char)(val & 0x00FF);
+ if (val & 0x0100) {
+ assert(offs > 0);
+ propagate_carry_bwd(out, offs - 1);
+ }
+ offs++;
+
+ e &= n;
+ s -= 8;
+ c -= 8;
+ n >>= 8;
+ } while (s > 0);
}
- /*Note: Unless there's an allocation error, if you keep encoding into the
- current buffer and call this function again later, everything will work
- just fine (you won't get a new packet out, but you will get a single
- buffer with the new data appended to the old).
- However, this function is O(N) where N is the amount of data coded so far,
- so calling it more than once for a given packet is a bad idea.*/
+ *nbytes = offs;
+
return out;
}
@@ -407,17 +390,10 @@
void od_ec_enc_rollback(od_ec_enc *dst, const od_ec_enc *src) {
unsigned char *buf;
uint32_t storage;
- uint16_t *precarry_buf;
- uint32_t precarry_storage;
assert(dst->storage >= src->storage);
- assert(dst->precarry_storage >= src->precarry_storage);
buf = dst->buf;
storage = dst->storage;
- precarry_buf = dst->precarry_buf;
- precarry_storage = dst->precarry_storage;
OD_COPY(dst, src, 1);
dst->buf = buf;
dst->storage = storage;
- dst->precarry_buf = precarry_buf;
- dst->precarry_storage = precarry_storage;
}
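
The rewrite above removes the separate 16-bit pre-carry buffer: bytes are now
written to the output buffer as soon as they are ready, and a carry generated
by a later byte is rippled backwards through the bytes already written. A
minimal standalone sketch of that backward ripple (illustrative only, not the
patch's exact code):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Add a carry into buf[offs] and let it ripple backwards while bytes
     * overflow, mirroring the propagate_carry_bwd() helper this patch adds. */
    static void add_carry(uint8_t *buf, uint32_t offs) {
      unsigned carry = 1;
      do {
        const unsigned sum = buf[offs] + carry;
        buf[offs--] = (uint8_t)sum;
        carry = sum >> 8;
      } while (carry);
    }

    int main(void) {
      uint8_t buf[3] = { 0x12, 0xff, 0xff };
      /* A carry landing on buf[2] ripples through both 0xff bytes:
       * 12 ff ff -> 13 00 00. */
      add_carry(buf, 2);
      assert(buf[0] == 0x13 && buf[1] == 0x00 && buf[2] == 0x00);
      printf("%02x %02x %02x\n", buf[0], buf[1], buf[2]);
      return 0;
    }

A single carry can touch every byte written so far, but runs of 0xff bytes are
rare in typical streams, so this is cheaper on average than the old
unconditional carry-resolution pass over the whole buffer in od_ec_enc_done().
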
diff --git a/aom_dsp/entenc.h b/aom_dsp/entenc.h
index 3551d42..467e47b 100644
--- a/aom_dsp/entenc.h
+++ b/aom_dsp/entenc.h
@@ -13,11 +13,14 @@
#define AOM_AOM_DSP_ENTENC_H_
#include <stddef.h>
#include "aom_dsp/entcode.h"
+#include "aom_ports/bitops.h"
#ifdef __cplusplus
extern "C" {
#endif
+typedef uint64_t od_ec_enc_window;
+
typedef struct od_ec_enc od_ec_enc;
#define OD_MEASURE_EC_OVERHEAD (0)
@@ -30,14 +33,10 @@
unsigned char *buf;
/*The size of the buffer.*/
uint32_t storage;
- /*A buffer for output bytes with their associated carry flags.*/
- uint16_t *precarry_buf;
- /*The size of the pre-carry buffer.*/
- uint32_t precarry_storage;
/*The offset at which the next entropy-coded byte will be written.*/
uint32_t offs;
/*The low end of the current range.*/
- od_ec_window low;
+ od_ec_enc_window low;
/*The number of values in the current range.*/
uint16_t rng;
/*The number of bits of data in the current value.*/
@@ -78,6 +77,32 @@
void od_ec_enc_checkpoint(od_ec_enc *dst, const od_ec_enc *src);
void od_ec_enc_rollback(od_ec_enc *dst, const od_ec_enc *src);
+// buf is the frame bitbuffer; offs is the offset into buf at which the carry is to be added
+static AOM_INLINE void propagate_carry_bwd(unsigned char *buf, uint32_t offs) {
+ uint16_t sum, carry = 1;
+ do {
+ sum = (uint16_t)buf[offs] + 1;
+ buf[offs--] = (unsigned char)sum;
+ carry = sum >> 8;
+ } while (carry);
+}
+
+// Reverse the byte order and write the data to the buffer, adding the carry bit
+static AOM_INLINE void write_enc_data_to_out_buf(unsigned char *out,
+ uint32_t offs, uint64_t output,
+ uint64_t carry,
+ uint32_t *enc_offs,
+ uint8_t num_bytes_ready) {
+ const uint64_t reg = get_byteswap64(output) >> ((8 - num_bytes_ready) << 3);
+ memcpy(&out[offs], &reg, 8);
+ // Propagate the carry backwards if one exists
+ if (carry) {
+ assert(offs > 0);
+ propagate_carry_bwd(out, offs - 1);
+ }
+ *enc_offs = offs + num_bytes_ready;
+}
+
#ifdef __cplusplus
} // extern "C"
#endif
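
write_enc_data_to_out_buf() above leans on get_byteswap64() from
aom_ports/bitops.h to reverse the byte order of the 64-bit accumulator, so a
single 8-byte memcpy emits the ready bytes most-significant first. A
standalone sketch of the trick, assuming a little-endian host and using a
portable byteswap stand-in (hypothetical name):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Portable stand-in for get_byteswap64(). */
    static uint64_t byteswap64(uint64_t x) {
      uint64_t r = 0;
      for (int i = 0; i < 8; i++) r = (r << 8) | ((x >> (8 * i)) & 0xff);
      return r;
    }

    int main(void) {
      const uint64_t output = 0xabcdefULL;  /* 3 bytes ready */
      const int num_bytes_ready = 3;
      uint8_t out[16] = { 0 };
      const uint64_t reg = byteswap64(output) >> ((8 - num_bytes_ready) << 3);
      /* Like the patch, this always copies 8 bytes, so the destination must
       * have at least 8 bytes of headroom past the write offset. */
      memcpy(out, &reg, 8);
      printf("%02x %02x %02x\n", out[0], out[1], out[2]);  /* ab cd ef */
      return 0;
    }
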
diff --git a/aom_dsp/flow_estimation/corner_detect.c b/aom_dsp/flow_estimation/corner_detect.c
index c49e3fa..7848295 100644
--- a/aom_dsp/flow_estimation/corner_detect.c
+++ b/aom_dsp/flow_estimation/corner_detect.c
@@ -17,21 +17,149 @@
#include "third_party/fastfeat/fast.h"
+#include "aom_dsp/aom_dsp_common.h"
#include "aom_dsp/flow_estimation/corner_detect.h"
+#include "aom_mem/aom_mem.h"
+#include "av1/common/common.h"
-// Fast_9 wrapper
#define FAST_BARRIER 18
-int av1_fast_corner_detect(unsigned char *buf, int width, int height,
- int stride, int *points, int max_points) {
- int num_points;
- xy *const frm_corners_xy = aom_fast9_detect_nonmax(buf, width, height, stride,
- FAST_BARRIER, &num_points);
- num_points = (num_points <= max_points ? num_points : max_points);
- if (num_points > 0 && frm_corners_xy) {
- memcpy(points, frm_corners_xy, sizeof(*frm_corners_xy) * num_points);
- free(frm_corners_xy);
- return num_points;
+
+size_t av1_get_corner_list_size() { return sizeof(CornerList); }
+
+CornerList *av1_alloc_corner_list() {
+ CornerList *corners = (CornerList *)aom_calloc(1, sizeof(CornerList));
+ if (!corners) {
+ return NULL;
}
- free(frm_corners_xy);
- return 0;
+
+ corners->valid = false;
+#if CONFIG_MULTITHREAD
+ pthread_mutex_init(&corners->mutex, NULL);
+#endif // CONFIG_MULTITHREAD
+ return corners;
+}
+
+void compute_corner_list(const ImagePyramid *pyr, CornerList *corners) {
+ const uint8_t *buf = pyr->layers[0].buffer;
+ int width = pyr->layers[0].width;
+ int height = pyr->layers[0].height;
+ int stride = pyr->layers[0].stride;
+
+ int *scores = NULL;
+ int num_corners;
+ xy *const frame_corners_xy = aom_fast9_detect_nonmax(
+ buf, width, height, stride, FAST_BARRIER, &scores, &num_corners);
+
+ if (num_corners <= 0) {
+ // Some error occurred, so no corners are available
+ corners->num_corners = 0;
+ } else if (num_corners <= MAX_CORNERS) {
+ // Use all detected corners
+ memcpy(corners->corners, frame_corners_xy,
+ sizeof(*frame_corners_xy) * num_corners);
+ corners->num_corners = num_corners;
+ } else {
+ // There are more than MAX_CORNERS corners available, so pick out a subset
+ // of the sharpest corners, as these will be the most useful for flow
+ // estimation
+ int histogram[256];
+ av1_zero(histogram);
+ for (int i = 0; i < num_corners; i++) {
+ assert(FAST_BARRIER <= scores[i] && scores[i] <= 255);
+ histogram[scores[i]] += 1;
+ }
+
+ int threshold = -1;
+ int found_corners = 0;
+ for (int bucket = 255; bucket >= 0; bucket--) {
+ if (found_corners + histogram[bucket] > MAX_CORNERS) {
+ // Set threshold here
+ threshold = bucket;
+ break;
+ }
+ found_corners += histogram[bucket];
+ }
+ assert(threshold != -1 && "Failed to select a valid threshold");
+
+ int copied_corners = 0;
+ for (int i = 0; i < num_corners; i++) {
+ if (scores[i] > threshold) {
+ assert(copied_corners < MAX_CORNERS);
+ corners->corners[2 * copied_corners + 0] = frame_corners_xy[i].x;
+ corners->corners[2 * copied_corners + 1] = frame_corners_xy[i].y;
+ copied_corners += 1;
+ }
+ }
+ assert(copied_corners == found_corners);
+ corners->num_corners = copied_corners;
+ }
+
+ free(scores);
+ free(frame_corners_xy);
+}
+
+void av1_compute_corner_list(const ImagePyramid *pyr, CornerList *corners) {
+ assert(corners);
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ if (!corners->valid) {
+ compute_corner_list(pyr, corners);
+ corners->valid = true;
+ }
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+}
+
+#ifndef NDEBUG
+// Check if a corner list has already been computed.
+// This is mostly a debug helper - as it is necessary to hold corners->mutex
+// while reading the valid flag, we cannot just write:
+// assert(corners->valid);
+// This function allows the check to be correctly written as:
+// assert(aom_is_corner_list_valid(corners));
+bool aom_is_corner_list_valid(CornerList *corners) {
+ assert(corners);
+
+ // Per the comments in the CornerList struct, we must take this mutex
+ // before reading or writing the "valid" flag, and hold it while computing
+ // the corner list, to ensure proper behaviour if multiple threads call
+ // this function simultaneously
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ bool valid = corners->valid;
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ return valid;
+}
+#endif
+
+void av1_invalidate_corner_list(CornerList *corners) {
+ if (corners) {
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+ corners->valid = false;
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+ }
+}
+
+void av1_free_corner_list(CornerList *corners) {
+ if (corners) {
+#if CONFIG_MULTITHREAD
+ pthread_mutex_destroy(&corners->mutex);
+#endif // CONFIG_MULTITHREAD
+ aom_free(corners);
+ }
}
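
compute_corner_list() above caps the corner count with a one-pass score
histogram: buckets are consumed from the highest score downwards until taking
another whole bucket would exceed MAX_CORNERS, and only corners scoring
strictly above that bucket are kept. A self-contained sketch of the same
selection (hypothetical names, cap of 4 for brevity):

    #include <stdio.h>

    /* Keep at most max_keep items, preferring the highest scores, using a
     * 256-bin histogram threshold as in compute_corner_list(). */
    static int pick_threshold(const int *scores, int n, int max_keep) {
      int histogram[256] = { 0 };
      for (int i = 0; i < n; i++) histogram[scores[i]]++;
      int kept = 0;
      for (int bucket = 255; bucket >= 0; bucket--) {
        if (kept + histogram[bucket] > max_keep) return bucket;
        kept += histogram[bucket];
      }
      return -1;  /* n <= max_keep: keep everything */
    }

    int main(void) {
      const int scores[7] = { 20, 90, 41, 90, 77, 30, 55 };
      const int threshold = pick_threshold(scores, 7, 4);
      for (int i = 0; i < 7; i++)
        if (scores[i] > threshold) printf("keep score %d\n", scores[i]);
      /* threshold == 41, keeping 90, 90, 77 and 55 */
      return 0;
    }

As in the patch, corners whose score ties the threshold bucket are dropped, so
slightly fewer than MAX_CORNERS may be kept when many scores tie.
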
diff --git a/aom_dsp/flow_estimation/corner_detect.h b/aom_dsp/flow_estimation/corner_detect.h
index 4481c4e..c77813e 100644
--- a/aom_dsp/flow_estimation/corner_detect.h
+++ b/aom_dsp/flow_estimation/corner_detect.h
@@ -14,14 +14,64 @@
#include <stdio.h>
#include <stdlib.h>
+#include <stdbool.h>
#include <memory.h>
+#include "aom_dsp/pyramid.h"
+#include "aom_util/aom_thread.h"
+
#ifdef __cplusplus
extern "C" {
#endif
-int av1_fast_corner_detect(unsigned char *buf, int width, int height,
- int stride, int *points, int max_points);
+#define MAX_CORNERS 4096
+
+typedef struct corner_list {
+#if CONFIG_MULTITHREAD
+ // Mutex which is used to prevent the corner list from being computed twice
+ // at the same time
+ //
+ // Semantics:
+ // * This mutex must be held whenever reading or writing the `valid` flag
+ //
+ // * This mutex must also be held while computing the corner list,
+ // to ensure that only one thread may do so at a time.
+ //
+ // * However, once you have read the valid flag and seen a true value,
+ // it is safe to drop the mutex and read from the remaining fields.
+ // This is because, once the corner list is computed, its contents
+ // will not be changed until the parent frame buffer is recycled,
+ // which will not happen until there are no more outstanding references
+ // to the frame buffer.
+ pthread_mutex_t mutex;
+#endif // CONFIG_MULTITHREAD
+ // Flag indicating whether the corner list contains valid data
+ bool valid;
+ // Number of corners found
+ int num_corners;
+ // (x, y) coordinates of each corner
+ int corners[2 * MAX_CORNERS];
+} CornerList;
+
+size_t av1_get_corner_list_size();
+
+CornerList *av1_alloc_corner_list();
+
+void av1_compute_corner_list(const ImagePyramid *pyr, CornerList *corners);
+
+#ifndef NDEBUG
+// Check if a corner list has already been computed.
+// This is mostly a debug helper - as it is necessary to hold corners->mutex
+// while reading the valid flag, we cannot just write:
+// assert(corners->valid);
+// This function allows the check to be correctly written as:
+// assert(aom_is_corner_list_valid(corners));
+bool aom_is_corner_list_valid(CornerList *corners);
+#endif
+
+void av1_invalidate_corner_list(CornerList *corners);
+
+void av1_free_corner_list(CornerList *corners);
#ifdef __cplusplus
}
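
The mutex-plus-valid-flag arrangement described above is a compute-at-most-once
pattern: every caller takes the lock, but only the first one through does the
expensive work. A minimal pthread sketch of the same pattern (illustrative
only, hypothetical names):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
      pthread_mutex_t mutex;
      bool valid;  /* guarded by mutex */
      int value;   /* stable once valid has been observed true */
    } LazyResult;

    static void ensure_computed(LazyResult *r) {
      pthread_mutex_lock(&r->mutex);
      if (!r->valid) {
        r->value = 42;  /* the expensive computation runs exactly once */
        r->valid = true;
      }
      pthread_mutex_unlock(&r->mutex);
      /* Having observed valid == true under the mutex, callers may read
       * r->value without the lock, exactly as the CornerList comment
       * describes: the data is immutable until invalidated. */
    }

    int main(void) {
      LazyResult r = { PTHREAD_MUTEX_INITIALIZER, false, 0 };
      ensure_computed(&r);
      ensure_computed(&r);  /* second call is a cheap flag check */
      printf("%d\n", r.value);
      return 0;
    }
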
diff --git a/aom_dsp/flow_estimation/corner_match.c b/aom_dsp/flow_estimation/corner_match.c
index f675604..f34178e 100644
--- a/aom_dsp/flow_estimation/corner_match.c
+++ b/aom_dsp/flow_estimation/corner_match.c
@@ -19,6 +19,7 @@
#include "aom_dsp/flow_estimation/corner_match.h"
#include "aom_dsp/flow_estimation/flow_estimation.h"
#include "aom_dsp/flow_estimation/ransac.h"
+#include "aom_dsp/pyramid.h"
#include "aom_scale/yv12config.h"
#define SEARCH_SZ 9
@@ -26,30 +27,32 @@
#define THRESHOLD_NCC 0.75
-/* Compute var(im) * MATCH_SZ_SQ over a MATCH_SZ by MATCH_SZ window of im,
+/* Compute var(frame) * MATCH_SZ_SQ over a MATCH_SZ by MATCH_SZ window of frame,
centered at (x, y).
*/
-static double compute_variance(unsigned char *im, int stride, int x, int y) {
+static double compute_variance(const unsigned char *frame, int stride, int x,
+ int y) {
int sum = 0;
int sumsq = 0;
int var;
int i, j;
for (i = 0; i < MATCH_SZ; ++i)
for (j = 0; j < MATCH_SZ; ++j) {
- sum += im[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)];
- sumsq += im[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)] *
- im[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)];
+ sum += frame[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)];
+ sumsq += frame[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)] *
+ frame[(i + y - MATCH_SZ_BY2) * stride + (j + x - MATCH_SZ_BY2)];
}
var = sumsq * MATCH_SZ_SQ - sum * sum;
return (double)var;
}
-/* Compute corr(im1, im2) * MATCH_SZ * stddev(im1), where the
+/* Compute corr(frame1, frame2) * MATCH_SZ * stddev(frame1), where the
correlation/standard deviation are taken over MATCH_SZ by MATCH_SZ windows
of each image, centered at (x1, y1) and (x2, y2) respectively.
*/
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1,
- int y1, unsigned char *im2, int stride2,
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1,
+ int x1, int y1,
+ const unsigned char *frame2, int stride2,
int x2, int y2) {
int v1, v2;
int sum1 = 0;
@@ -60,8 +63,8 @@
int i, j;
for (i = 0; i < MATCH_SZ; ++i)
for (j = 0; j < MATCH_SZ; ++j) {
- v1 = im1[(i + y1 - MATCH_SZ_BY2) * stride1 + (j + x1 - MATCH_SZ_BY2)];
- v2 = im2[(i + y2 - MATCH_SZ_BY2) * stride2 + (j + x2 - MATCH_SZ_BY2)];
+ v1 = frame1[(i + y1 - MATCH_SZ_BY2) * stride1 + (j + x1 - MATCH_SZ_BY2)];
+ v2 = frame2[(i + y2 - MATCH_SZ_BY2) * stride2 + (j + x2 - MATCH_SZ_BY2)];
sum1 += v1;
sum2 += v2;
sumsq2 += v2 * v2;
@@ -84,28 +87,30 @@
(point1y - point2y) * (point1y - point2y)) <= thresh * thresh;
}
-static void improve_correspondence(unsigned char *frm, unsigned char *ref,
- int width, int height, int frm_stride,
- int ref_stride,
+static void improve_correspondence(const unsigned char *src,
+ const unsigned char *ref, int width,
+ int height, int src_stride, int ref_stride,
Correspondence *correspondences,
int num_correspondences) {
int i;
for (i = 0; i < num_correspondences; ++i) {
int x, y, best_x = 0, best_y = 0;
double best_match_ncc = 0.0;
+ // For this algorithm, all points have integer coordinates.
+ // It's a little more efficient to convert them to ints once,
+ // before the inner loops
+ int x0 = (int)correspondences[i].x;
+ int y0 = (int)correspondences[i].y;
+ int rx0 = (int)correspondences[i].rx;
+ int ry0 = (int)correspondences[i].ry;
for (y = -SEARCH_SZ_BY2; y <= SEARCH_SZ_BY2; ++y) {
for (x = -SEARCH_SZ_BY2; x <= SEARCH_SZ_BY2; ++x) {
double match_ncc;
- if (!is_eligible_point(correspondences[i].rx + x,
- correspondences[i].ry + y, width, height))
+ if (!is_eligible_point(rx0 + x, ry0 + y, width, height)) continue;
+ if (!is_eligible_distance(x0, y0, rx0 + x, ry0 + y, width, height))
continue;
- if (!is_eligible_distance(correspondences[i].x, correspondences[i].y,
- correspondences[i].rx + x,
- correspondences[i].ry + y, width, height))
- continue;
- match_ncc = av1_compute_cross_correlation(
- frm, frm_stride, correspondences[i].x, correspondences[i].y, ref,
- ref_stride, correspondences[i].rx + x, correspondences[i].ry + y);
+ match_ncc = av1_compute_cross_correlation(src, src_stride, x0, y0, ref,
+ ref_stride, rx0 + x, ry0 + y);
if (match_ncc > best_match_ncc) {
best_match_ncc = match_ncc;
best_y = y;
@@ -119,19 +124,18 @@
for (i = 0; i < num_correspondences; ++i) {
int x, y, best_x = 0, best_y = 0;
double best_match_ncc = 0.0;
+ int x0 = (int)correspondences[i].x;
+ int y0 = (int)correspondences[i].y;
+ int rx0 = (int)correspondences[i].rx;
+ int ry0 = (int)correspondences[i].ry;
for (y = -SEARCH_SZ_BY2; y <= SEARCH_SZ_BY2; ++y)
for (x = -SEARCH_SZ_BY2; x <= SEARCH_SZ_BY2; ++x) {
double match_ncc;
- if (!is_eligible_point(correspondences[i].x + x,
- correspondences[i].y + y, width, height))
- continue;
- if (!is_eligible_distance(
- correspondences[i].x + x, correspondences[i].y + y,
- correspondences[i].rx, correspondences[i].ry, width, height))
+ if (!is_eligible_point(x0 + x, y0 + y, width, height)) continue;
+ if (!is_eligible_distance(x0 + x, y0 + y, rx0, ry0, width, height))
continue;
match_ncc = av1_compute_cross_correlation(
- ref, ref_stride, correspondences[i].rx, correspondences[i].ry, frm,
- frm_stride, correspondences[i].x + x, correspondences[i].y + y);
+ ref, ref_stride, rx0, ry0, src, src_stride, x0 + x, y0 + y);
if (match_ncc > best_match_ncc) {
best_match_ncc = match_ncc;
best_y = y;
@@ -143,14 +147,15 @@
}
}
-int aom_determine_correspondence(unsigned char *src, int *src_corners,
- int num_src_corners, unsigned char *ref,
- int *ref_corners, int num_ref_corners,
+int aom_determine_correspondence(const unsigned char *src,
+ const int *src_corners, int num_src_corners,
+ const unsigned char *ref,
+ const int *ref_corners, int num_ref_corners,
int width, int height, int src_stride,
- int ref_stride, int *correspondence_pts) {
+ int ref_stride,
+ Correspondence *correspondences) {
// TODO(sarahparker) Improve this to include 2-way match
int i, j;
- Correspondence *correspondences = (Correspondence *)correspondence_pts;
int num_correspondences = 0;
for (i = 0; i < num_src_corners; ++i) {
double best_match_ncc = 0.0;
@@ -195,71 +200,44 @@
return num_correspondences;
}
-static bool get_inliers_from_indices(MotionModel *params,
- int *correspondences) {
- int *inliers_tmp = (int *)aom_calloc(2 * MAX_CORNERS, sizeof(*inliers_tmp));
- if (!inliers_tmp) return false;
-
- for (int i = 0; i < params->num_inliers; i++) {
- int index = params->inliers[i];
- inliers_tmp[2 * i] = correspondences[4 * index];
- inliers_tmp[2 * i + 1] = correspondences[4 * index + 1];
- }
- memcpy(params->inliers, inliers_tmp, sizeof(*inliers_tmp) * 2 * MAX_CORNERS);
- aom_free(inliers_tmp);
- return true;
-}
-
-int av1_compute_global_motion_feature_based(
- TransformationType type, unsigned char *src_buffer, int src_width,
- int src_height, int src_stride, int *src_corners, int num_src_corners,
- YV12_BUFFER_CONFIG *ref, int bit_depth, int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions) {
- int i;
- int num_ref_corners;
+bool av1_compute_global_motion_feature_match(
+ TransformationType type, YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *ref,
+ int bit_depth, MotionModel *motion_models, int num_motion_models) {
int num_correspondences;
- int *correspondences;
- int ref_corners[2 * MAX_CORNERS];
- unsigned char *ref_buffer = ref->y_buffer;
- RansacFunc ransac = av1_get_ransac_type(type);
+ Correspondence *correspondences;
+ ImagePyramid *src_pyramid = src->y_pyramid;
+ CornerList *src_corners = src->corners;
+ ImagePyramid *ref_pyramid = ref->y_pyramid;
+ CornerList *ref_corners = ref->corners;
- if (ref->flags & YV12_FLAG_HIGHBITDEPTH) {
- ref_buffer = av1_downconvert_frame(ref, bit_depth);
- }
+ // Precompute information we will need about each frame
+ aom_compute_pyramid(src, bit_depth, src_pyramid);
+ av1_compute_corner_list(src_pyramid, src_corners);
+ aom_compute_pyramid(ref, bit_depth, ref_pyramid);
+ av1_compute_corner_list(ref_pyramid, ref_corners);
- num_ref_corners =
- av1_fast_corner_detect(ref_buffer, ref->y_width, ref->y_height,
- ref->y_stride, ref_corners, MAX_CORNERS);
+ const uint8_t *src_buffer = src_pyramid->layers[0].buffer;
+ const int src_width = src_pyramid->layers[0].width;
+ const int src_height = src_pyramid->layers[0].height;
+ const int src_stride = src_pyramid->layers[0].stride;
+
+ const uint8_t *ref_buffer = ref_pyramid->layers[0].buffer;
+ assert(ref_pyramid->layers[0].width == src_width);
+ assert(ref_pyramid->layers[0].height == src_height);
+ const int ref_stride = ref_pyramid->layers[0].stride;
// find correspondences between the two images
- correspondences =
- (int *)aom_malloc(num_src_corners * 4 * sizeof(*correspondences));
- if (!correspondences) return 0;
+ correspondences = (Correspondence *)aom_malloc(src_corners->num_corners *
+ sizeof(*correspondences));
+ if (!correspondences) return false;
num_correspondences = aom_determine_correspondence(
- src_buffer, (int *)src_corners, num_src_corners, ref_buffer,
- (int *)ref_corners, num_ref_corners, src_width, src_height, src_stride,
- ref->y_stride, correspondences);
+ src_buffer, src_corners->corners, src_corners->num_corners, ref_buffer,
+ ref_corners->corners, ref_corners->num_corners, src_width, src_height,
+ src_stride, ref_stride, correspondences);
- ransac(correspondences, num_correspondences, num_inliers_by_motion,
- params_by_motion, num_motions);
-
- // Set num_inliers = 0 for motions with too few inliers so they are ignored.
- for (i = 0; i < num_motions; ++i) {
- if (num_inliers_by_motion[i] < MIN_INLIER_PROB * num_correspondences ||
- num_correspondences == 0) {
- num_inliers_by_motion[i] = 0;
- } else if (!get_inliers_from_indices(&params_by_motion[i],
- correspondences)) {
- aom_free(correspondences);
- return 0;
- }
- }
+ bool result = ransac(correspondences, num_correspondences, type,
+ motion_models, num_motion_models);
aom_free(correspondences);
-
- // Return true if any one of the motions has inliers.
- for (i = 0; i < num_motions; ++i) {
- if (num_inliers_by_motion[i] > 0) return 1;
- }
- return 0;
+ return result;
}
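
For reference, the sums accumulated by compute_variance() and
av1_compute_cross_correlation_c() above are the ingredients of a mean-removed
correlation score over a MATCH_SZ x MATCH_SZ window. A small double-precision
sketch of plain normalized cross-correlation between two patches (simplified;
the library keeps scaled integer sums rather than normalizing both sides):

    #include <math.h>
    #include <stdio.h>

    #define WIN 3  /* tiny window for illustration; the library uses MATCH_SZ */

    /* Plain NCC between two WIN x WIN patches; 1.0 means perfectly
     * correlated. */
    static double ncc(const unsigned char *a, int stride_a,
                      const unsigned char *b, int stride_b) {
      double sum_a = 0, sum_b = 0, sum_aa = 0, sum_bb = 0, sum_ab = 0;
      const int n = WIN * WIN;
      for (int i = 0; i < WIN; i++)
        for (int j = 0; j < WIN; j++) {
          const double va = a[i * stride_a + j];
          const double vb = b[i * stride_b + j];
          sum_a += va;
          sum_b += vb;
          sum_aa += va * va;
          sum_bb += vb * vb;
          sum_ab += va * vb;
        }
      const double cov = n * sum_ab - sum_a * sum_b;
      const double var_a = n * sum_aa - sum_a * sum_a;
      const double var_b = n * sum_bb - sum_b * sum_b;
      if (var_a <= 0 || var_b <= 0) return 0;
      return cov / sqrt(var_a * var_b);
    }

    int main(void) {
      const unsigned char p[9] = { 10, 20, 30, 40, 50, 60, 70, 80, 90 };
      unsigned char q[9];
      for (int i = 0; i < 9; i++) q[i] = (unsigned char)(p[i] + 5);
      printf("ncc = %.3f\n", ncc(p, 3, q, 3));  /* ~1.000 */
      return 0;
    }

Because the window means are removed and the score is normalized by the
standard deviations, a uniform brightness offset between the two frames, as in
this example, barely changes the score.
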
diff --git a/aom_dsp/flow_estimation/corner_match.h b/aom_dsp/flow_estimation/corner_match.h
index 71afadf..bb69944 100644
--- a/aom_dsp/flow_estimation/corner_match.h
+++ b/aom_dsp/flow_estimation/corner_match.h
@@ -12,10 +12,12 @@
#ifndef AOM_AOM_DSP_FLOW_ESTIMATION_CORNER_MATCH_H_
#define AOM_AOM_DSP_FLOW_ESTIMATION_CORNER_MATCH_H_
+#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_dsp/flow_estimation/flow_estimation.h"
#include "aom_scale/yv12config.h"
@@ -27,22 +29,17 @@
#define MATCH_SZ_BY2 ((MATCH_SZ - 1) / 2)
#define MATCH_SZ_SQ (MATCH_SZ * MATCH_SZ)
-typedef struct {
- int x, y;
- int rx, ry;
-} Correspondence;
-
-int aom_determine_correspondence(unsigned char *src, int *src_corners,
- int num_src_corners, unsigned char *ref,
- int *ref_corners, int num_ref_corners,
+int aom_determine_correspondence(const unsigned char *src,
+ const int *src_corners, int num_src_corners,
+ const unsigned char *ref,
+ const int *ref_corners, int num_ref_corners,
int width, int height, int src_stride,
- int ref_stride, int *correspondence_pts);
+ int ref_stride,
+ Correspondence *correspondences);
-int av1_compute_global_motion_feature_based(
- TransformationType type, unsigned char *src_buffer, int src_width,
- int src_height, int src_stride, int *src_corners, int num_src_corners,
- YV12_BUFFER_CONFIG *ref, int bit_depth, int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions);
+bool av1_compute_global_motion_feature_match(
+ TransformationType type, YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *ref,
+ int bit_depth, MotionModel *motion_models, int num_motion_models);
#ifdef __cplusplus
}
diff --git a/aom_dsp/flow_estimation/disflow.c b/aom_dsp/flow_estimation/disflow.c
index 2a6ad4b..a8e7b06 100644
--- a/aom_dsp/flow_estimation/disflow.c
+++ b/aom_dsp/flow_estimation/disflow.c
@@ -9,626 +9,643 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
-#include <stdbool.h>
-#include <stddef.h>
-#include <stdint.h>
+// Dense Inverse Search flow algorithm
+// Paper: https://arxiv.org/abs/1603.03590
+#include <assert.h>
+#include <math.h>
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_dsp/flow_estimation/disflow.h"
-#include "aom_dsp/flow_estimation/flow_estimation.h"
#include "aom_dsp/flow_estimation/ransac.h"
+#include "aom_dsp/pyramid.h"
+#include "aom_mem/aom_mem.h"
-#include "aom_scale/yv12config.h"
+#include "config/aom_dsp_rtcd.h"
+// TODO(rachelbarker):
+// Implement specialized functions for upscaling flow fields,
+// replacing av1_upscale_plane_double_prec().
+// Then we can avoid needing to include code from av1/
#include "av1/common/resize.h"
-// Number of pyramid levels in disflow computation
-#define N_LEVELS 2
-// Size of square patches in the disflow dense grid
-#define PATCH_SIZE 8
-// Center point of square patch
-#define PATCH_CENTER ((PATCH_SIZE + 1) >> 1)
-// Step size between patches, lower value means greater patch overlap
-#define PATCH_STEP 1
-// Minimum size of border padding for disflow
-#define MIN_PAD 7
-// Warp error convergence threshold for disflow
-#define DISFLOW_ERROR_TR 0.01
-// Max number of iterations if warp convergence is not found
-#define DISFLOW_MAX_ITR 10
+// Amount to downsample the flow field by.
+// eg. DOWNSAMPLE_SHIFT = 2 (DOWNSAMPLE_FACTOR == 4) means we calculate
+// one flow point for each 4x4 pixel region of the frame
+// Must be a power of 2
+#define DOWNSAMPLE_SHIFT 3
+#define DOWNSAMPLE_FACTOR (1 << DOWNSAMPLE_SHIFT)
+// Number of outermost flow field entries (on each edge) which can't be
+// computed, because the patch they correspond to extends outside of the
+// frame
+// The border is (DISFLOW_PATCH_SIZE >> 1) pixels wide, which corresponds to
+// ((DISFLOW_PATCH_SIZE >> 1) >> DOWNSAMPLE_SHIFT) flow field entries
+#define FLOW_BORDER ((DISFLOW_PATCH_SIZE >> 1) >> DOWNSAMPLE_SHIFT)
+// When downsampling the flow field, each flow field entry covers a square
+// region of pixels in the image pyramid. This value is equal to the position
+// of the center of that region, as an offset from the top/left edge.
+//
+// Note: Using ((DOWNSAMPLE_FACTOR - 1) / 2) is equivalent to the more
+// natural expression ((DOWNSAMPLE_FACTOR / 2) - 1),
+// unless DOWNSAMPLE_FACTOR == 1 (ie, no downsampling), in which case
+// this gives the correct offset of 0 instead of -1.
+#define UPSAMPLE_CENTER_OFFSET ((DOWNSAMPLE_FACTOR - 1) / 2)
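
To make the macro arithmetic concrete: with DOWNSAMPLE_SHIFT == 3, each flow
field entry covers an 8x8 block of pixels whose center sits 3 pixels in from
the block's top-left corner. A quick compile-and-run check of the two center
formulas compared in the note above (not part of the patch):

    #include <assert.h>
    #include <stdio.h>

    int main(void) {
      const int shift = 3;
      const int factor = 1 << shift;  /* 8 */
      /* The two formulas agree whenever factor > 1 ... */
      assert((factor - 1) / 2 == (factor / 2) - 1);
      /* ... but only (factor - 1) / 2 gives the correct 0 for factor == 1,
       * rather than -1. */
      assert((1 - 1) / 2 == 0);
      printf("center offset = %d\n", (factor - 1) / 2);  /* 3 */
      return 0;
    }
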
-// Struct for an image pyramid
-typedef struct {
- int n_levels;
- int pad_size;
- int has_gradient;
- int widths[N_LEVELS];
- int heights[N_LEVELS];
- int strides[N_LEVELS];
- int level_loc[N_LEVELS];
- unsigned char *level_buffer;
- double *level_dx_buffer;
- double *level_dy_buffer;
-} ImagePyramid;
-
-// Don't use points around the frame border since they are less reliable
-static INLINE int valid_point(int x, int y, int width, int height) {
- return (x > (PATCH_SIZE + PATCH_CENTER)) &&
- (x < (width - PATCH_SIZE - PATCH_CENTER)) &&
- (y > (PATCH_SIZE + PATCH_CENTER)) &&
- (y < (height - PATCH_SIZE - PATCH_CENTER));
+static INLINE void get_cubic_kernel_dbl(double x, double *kernel) {
+ assert(0 <= x && x < 1);
+ double x2 = x * x;
+ double x3 = x2 * x;
+ kernel[0] = -0.5 * x + x2 - 0.5 * x3;
+ kernel[1] = 1.0 - 2.5 * x2 + 1.5 * x3;
+ kernel[2] = 0.5 * x + 2.0 * x2 - 1.5 * x3;
+ kernel[3] = -0.5 * x2 + 0.5 * x3;
}
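
These coefficients are the Catmull-Rom cubic interpolation weights: for any
0 <= x < 1 the four taps sum to exactly 1, and at x == 0 they reduce to
{0, 1, 0, 0}, passing the center sample through unchanged. A quick standalone
check:

    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    static void get_cubic_kernel_dbl(double x, double *kernel) {
      const double x2 = x * x;
      const double x3 = x2 * x;
      kernel[0] = -0.5 * x + x2 - 0.5 * x3;
      kernel[1] = 1.0 - 2.5 * x2 + 1.5 * x3;
      kernel[2] = 0.5 * x + 2.0 * x2 - 1.5 * x3;
      kernel[3] = -0.5 * x2 + 0.5 * x3;
    }

    int main(void) {
      double k[4];
      for (double x = 0.0; x < 1.0; x += 0.125) {
        get_cubic_kernel_dbl(x, k);
        /* Partition of unity: interpolating a constant signal returns
         * that constant. */
        assert(fabs(k[0] + k[1] + k[2] + k[3] - 1.0) < 1e-12);
      }
      get_cubic_kernel_dbl(0.5, k);
      /* Prints -0.0625 0.5625 0.5625 -0.0625 */
      printf("%g %g %g %g\n", k[0], k[1], k[2], k[3]);
      return 0;
    }
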
-static int determine_disflow_correspondence(int *frm_corners,
- int num_frm_corners, double *flow_u,
- double *flow_v, int width,
- int height, int stride,
- double *correspondences) {
+static INLINE void get_cubic_kernel_int(double x, int *kernel) {
+ double kernel_dbl[4];
+ get_cubic_kernel_dbl(x, kernel_dbl);
+
+ kernel[0] = (int)rint(kernel_dbl[0] * (1 << DISFLOW_INTERP_BITS));
+ kernel[1] = (int)rint(kernel_dbl[1] * (1 << DISFLOW_INTERP_BITS));
+ kernel[2] = (int)rint(kernel_dbl[2] * (1 << DISFLOW_INTERP_BITS));
+ kernel[3] = (int)rint(kernel_dbl[3] * (1 << DISFLOW_INTERP_BITS));
+}
+
+static INLINE double get_cubic_value_dbl(const double *p,
+ const double *kernel) {
+ return kernel[0] * p[0] + kernel[1] * p[1] + kernel[2] * p[2] +
+ kernel[3] * p[3];
+}
+
+static INLINE int get_cubic_value_int(const int *p, const int *kernel) {
+ return kernel[0] * p[0] + kernel[1] * p[1] + kernel[2] * p[2] +
+ kernel[3] * p[3];
+}
+
+static INLINE double bicubic_interp_one(const double *arr, int stride,
+ double *h_kernel, double *v_kernel) {
+ double tmp[1 * 4];
+
+ // Horizontal convolution
+ for (int i = -1; i < 3; ++i) {
+ tmp[i + 1] = get_cubic_value_dbl(&arr[i * stride - 1], h_kernel);
+ }
+
+ // Vertical convolution
+ return get_cubic_value_dbl(tmp, v_kernel);
+}
+
+static int determine_disflow_correspondence(CornerList *corners,
+ const FlowField *flow,
+ Correspondence *correspondences) {
+ const int width = flow->width;
+ const int height = flow->height;
+ const int stride = flow->stride;
+
int num_correspondences = 0;
- int x, y;
- for (int i = 0; i < num_frm_corners; ++i) {
- x = frm_corners[2 * i];
- y = frm_corners[2 * i + 1];
- if (valid_point(x, y, width, height)) {
- correspondences[4 * num_correspondences] = x;
- correspondences[4 * num_correspondences + 1] = y;
- correspondences[4 * num_correspondences + 2] = x + flow_u[y * stride + x];
- correspondences[4 * num_correspondences + 3] = y + flow_v[y * stride + x];
- num_correspondences++;
- }
+ for (int i = 0; i < corners->num_corners; ++i) {
+ const int x0 = corners->corners[2 * i];
+ const int y0 = corners->corners[2 * i + 1];
+
+ // Offset points, to compensate for the fact that (say) a flow field entry
+ // at horizontal index i is nominally associated with the pixel at
+ // horizontal coordinate (i << DOWNSAMPLE_SHIFT) + UPSAMPLE_CENTER_OFFSET.
+ // This offset must be applied before we split the coordinate into integer
+ // and fractional parts, in order for the interpolation to be correct.
+ const int x = x0 - UPSAMPLE_CENTER_OFFSET;
+ const int y = y0 - UPSAMPLE_CENTER_OFFSET;
+
+ // Split the pixel coordinates into integer flow field coordinates and
+ // an offset for interpolation
+ const int flow_x = x >> DOWNSAMPLE_SHIFT;
+ const double flow_sub_x =
+ (x & (DOWNSAMPLE_FACTOR - 1)) / (double)DOWNSAMPLE_FACTOR;
+ const int flow_y = y >> DOWNSAMPLE_SHIFT;
+ const double flow_sub_y =
+ (y & (DOWNSAMPLE_FACTOR - 1)) / (double)DOWNSAMPLE_FACTOR;
+
+ // Make sure that bicubic interpolation won't read outside of the flow field
+ if (flow_x < 1 || (flow_x + 2) >= width) continue;
+ if (flow_y < 1 || (flow_y + 2) >= height) continue;
+
+ double h_kernel[4];
+ double v_kernel[4];
+ get_cubic_kernel_dbl(flow_sub_x, h_kernel);
+ get_cubic_kernel_dbl(flow_sub_y, v_kernel);
+
+ const double flow_u = bicubic_interp_one(&flow->u[flow_y * stride + flow_x],
+ stride, h_kernel, v_kernel);
+ const double flow_v = bicubic_interp_one(&flow->v[flow_y * stride + flow_x],
+ stride, h_kernel, v_kernel);
+
+ // Use original points (without offsets) when filling in correspondence
+ // array
+ correspondences[num_correspondences].x = x0;
+ correspondences[num_correspondences].y = y0;
+ correspondences[num_correspondences].rx = x0 + flow_u;
+ correspondences[num_correspondences].ry = y0 + flow_v;
+ num_correspondences++;
}
return num_correspondences;
}
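
A worked example of the coordinate split above, with DOWNSAMPLE_SHIFT == 3: a
corner at pixel x0 = 30 becomes x = 30 - 3 = 27, giving flow_x = 27 >> 3 = 3
and flow_sub_x = (27 & 7) / 8 = 0.375, so the flow vector is interpolated
around flow field column 3. As a standalone sketch:

    #include <stdio.h>

    #define DOWNSAMPLE_SHIFT 3
    #define DOWNSAMPLE_FACTOR (1 << DOWNSAMPLE_SHIFT)
    #define UPSAMPLE_CENTER_OFFSET ((DOWNSAMPLE_FACTOR - 1) / 2)

    int main(void) {
      const int x0 = 30;                          /* corner, in pixels */
      const int x = x0 - UPSAMPLE_CENTER_OFFSET;  /* align to entry centers */
      const int flow_x = x >> DOWNSAMPLE_SHIFT;   /* integer field index */
      const double flow_sub_x =
          (x & (DOWNSAMPLE_FACTOR - 1)) / (double)DOWNSAMPLE_FACTOR;
      printf("flow_x = %d, flow_sub_x = %.3f\n", flow_x, flow_sub_x);
      /* Prints: flow_x = 3, flow_sub_x = 0.375 */
      return 0;
    }
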
-static double getCubicValue(double p[4], double x) {
- return p[1] + 0.5 * x *
- (p[2] - p[0] +
- x * (2.0 * p[0] - 5.0 * p[1] + 4.0 * p[2] - p[3] +
- x * (3.0 * (p[1] - p[2]) + p[3] - p[0])));
-}
+// Compare two patches of DISFLOW_PATCH_SIZE x DISFLOW_PATCH_SIZE pixels, one
+// rooted at position (x, y) in src and the other at (x + u, y + v) in ref.
+// This function computes the per-pixel differences between the two patches
+// and stores them in dt, scaled to match the dx/dy gradient arrays.
+static INLINE void compute_flow_error(const uint8_t *src, const uint8_t *ref,
+ int width, int height, int stride, int x,
+ int y, double u, double v, int16_t *dt) {
+ // Split offset into integer and fractional parts, and compute cubic
+ // interpolation kernels
+ const int u_int = (int)floor(u);
+ const int v_int = (int)floor(v);
+ const double u_frac = u - floor(u);
+ const double v_frac = v - floor(v);
-static void get_subcolumn(unsigned char *ref, double col[4], int stride, int x,
- int y_start) {
- int i;
- for (i = 0; i < 4; ++i) {
- col[i] = ref[(i + y_start) * stride + x];
- }
-}
+ int h_kernel[4];
+ int v_kernel[4];
+ get_cubic_kernel_int(u_frac, h_kernel);
+ get_cubic_kernel_int(v_frac, v_kernel);
-static double bicubic(unsigned char *ref, double x, double y, int stride) {
- double arr[4];
- int k;
- int i = (int)x;
- int j = (int)y;
- for (k = 0; k < 4; ++k) {
- double arr_temp[4];
- get_subcolumn(ref, arr_temp, stride, i + k - 1, j - 1);
- arr[k] = getCubicValue(arr_temp, y - j);
- }
- return getCubicValue(arr, x - i);
-}
+ // Storage for intermediate values between the two convolution directions
+ int tmp_[DISFLOW_PATCH_SIZE * (DISFLOW_PATCH_SIZE + 3)];
+ int *tmp = tmp_ + DISFLOW_PATCH_SIZE; // Offset by one row
-// Interpolate a warped block using bicubic interpolation when possible
-static unsigned char interpolate(unsigned char *ref, double x, double y,
- int width, int height, int stride) {
- if (x < 0 && y < 0)
- return ref[0];
- else if (x < 0 && y > height - 1)
- return ref[(height - 1) * stride];
- else if (x > width - 1 && y < 0)
- return ref[width - 1];
- else if (x > width - 1 && y > height - 1)
- return ref[(height - 1) * stride + (width - 1)];
- else if (x < 0) {
- int v;
- int i = (int)y;
- double a = y - i;
- if (y > 1 && y < height - 2) {
- double arr[4];
- get_subcolumn(ref, arr, stride, 0, i - 1);
- return clamp((int)(getCubicValue(arr, a) + 0.5), 0, 255);
- }
- v = (int)(ref[i * stride] * (1 - a) + ref[(i + 1) * stride] * a + 0.5);
- return clamp(v, 0, 255);
- } else if (y < 0) {
- int v;
- int j = (int)x;
- double b = x - j;
- if (x > 1 && x < width - 2) {
- double arr[4] = { ref[j - 1], ref[j], ref[j + 1], ref[j + 2] };
- return clamp((int)(getCubicValue(arr, b) + 0.5), 0, 255);
- }
- v = (int)(ref[j] * (1 - b) + ref[j + 1] * b + 0.5);
- return clamp(v, 0, 255);
- } else if (x > width - 1) {
- int v;
- int i = (int)y;
- double a = y - i;
- if (y > 1 && y < height - 2) {
- double arr[4];
- get_subcolumn(ref, arr, stride, width - 1, i - 1);
- return clamp((int)(getCubicValue(arr, a) + 0.5), 0, 255);
- }
- v = (int)(ref[i * stride + width - 1] * (1 - a) +
- ref[(i + 1) * stride + width - 1] * a + 0.5);
- return clamp(v, 0, 255);
- } else if (y > height - 1) {
- int v;
- int j = (int)x;
- double b = x - j;
- if (x > 1 && x < width - 2) {
- int row = (height - 1) * stride;
- double arr[4] = { ref[row + j - 1], ref[row + j], ref[row + j + 1],
- ref[row + j + 2] };
- return clamp((int)(getCubicValue(arr, b) + 0.5), 0, 255);
- }
- v = (int)(ref[(height - 1) * stride + j] * (1 - b) +
- ref[(height - 1) * stride + j + 1] * b + 0.5);
- return clamp(v, 0, 255);
- } else if (x > 1 && y > 1 && x < width - 2 && y < height - 2) {
- return clamp((int)(bicubic(ref, x, y, stride) + 0.5), 0, 255);
- } else {
- int i = (int)y;
- int j = (int)x;
- double a = y - i;
- double b = x - j;
- int v = (int)(ref[i * stride + j] * (1 - a) * (1 - b) +
- ref[i * stride + j + 1] * (1 - a) * b +
- ref[(i + 1) * stride + j] * a * (1 - b) +
- ref[(i + 1) * stride + j + 1] * a * b);
- return clamp(v, 0, 255);
- }
-}
+ // Clamp coordinates so that all pixels we fetch will remain within the
+ // allocated border region, but allow them to go far enough out that
+ // the border pixels' values do not change.
+ // Since we are calculating an 8x8 block, the bottom-right pixel
+ // in the block has coordinates (x0 + 7, y0 + 7). Then, the cubic
+ // interpolation has 4 taps, meaning that the output of pixel
+ // (x_w, y_w) depends on the pixels in the range
+ // ([x_w - 1, x_w + 2], [y_w - 1, y_w + 2]).
+ //
+ // Thus the most extreme coordinates which will be fetched are
+ // (x0 - 1, y0 - 1) and (x0 + 9, y0 + 9).
+ const int x0 = clamp(x + u_int, -9, width);
+ const int y0 = clamp(y + v_int, -9, height);
-// Warps a block using flow vector [u, v] and computes the mse
-static double compute_warp_and_error(unsigned char *ref, unsigned char *frm,
- int width, int height, int stride, int x,
- int y, double u, double v, int16_t *dt) {
- int i, j;
- unsigned char warped;
- double x_w, y_w;
- double mse = 0;
- int16_t err = 0;
- for (i = y; i < y + PATCH_SIZE; ++i)
- for (j = x; j < x + PATCH_SIZE; ++j) {
- x_w = (double)j + u;
- y_w = (double)i + v;
- warped = interpolate(ref, x_w, y_w, width, height, stride);
- err = warped - frm[j + i * stride];
- mse += err * err;
- dt[(i - y) * PATCH_SIZE + (j - x)] = err;
- }
+ // Horizontal convolution
+ for (int i = -1; i < DISFLOW_PATCH_SIZE + 2; ++i) {
+ const int y_w = y0 + i;
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; ++j) {
+ const int x_w = x0 + j;
+ int arr[4];
- mse /= (PATCH_SIZE * PATCH_SIZE);
- return mse;
-}
+ arr[0] = (int)ref[y_w * stride + (x_w - 1)];
+ arr[1] = (int)ref[y_w * stride + (x_w + 0)];
+ arr[2] = (int)ref[y_w * stride + (x_w + 1)];
+ arr[3] = (int)ref[y_w * stride + (x_w + 2)];
-// Computes the components of the system of equations used to solve for
-// a flow vector. This includes:
-// 1.) The hessian matrix for optical flow. This matrix is in the
-// form of:
-//
-// M = |sum(dx * dx) sum(dx * dy)|
-// |sum(dx * dy) sum(dy * dy)|
-//
-// 2.) b = |sum(dx * dt)|
-// |sum(dy * dt)|
-// Where the sums are computed over a square window of PATCH_SIZE.
-static INLINE void compute_flow_system(const double *dx, int dx_stride,
- const double *dy, int dy_stride,
- const int16_t *dt, int dt_stride,
- double *M, double *b) {
- for (int i = 0; i < PATCH_SIZE; i++) {
- for (int j = 0; j < PATCH_SIZE; j++) {
- M[0] += dx[i * dx_stride + j] * dx[i * dx_stride + j];
- M[1] += dx[i * dx_stride + j] * dy[i * dy_stride + j];
- M[3] += dy[i * dy_stride + j] * dy[i * dy_stride + j];
-
- b[0] += dx[i * dx_stride + j] * dt[i * dt_stride + j];
- b[1] += dy[i * dy_stride + j] * dt[i * dt_stride + j];
+ // Apply kernel and round, keeping 6 extra bits of precision.
+ //
+ // 6 is the maximum allowable number of extra bits which will avoid
+ // the intermediate values overflowing an int16_t. The most extreme
+ // intermediate value occurs when:
+ // * The input pixels are [0, 255, 255, 0]
+ // * u_frac = 0.5
+ // In this case, the un-scaled output is 255 * 1.125 = 286.875.
+ // As an integer with 6 fractional bits, that is 18360, which fits
+ // in an int16_t. But with 7 fractional bits it would be 36720,
+ // which is too large.
+ tmp[i * DISFLOW_PATCH_SIZE + j] = ROUND_POWER_OF_TWO(
+ get_cubic_value_int(arr, h_kernel), DISFLOW_INTERP_BITS - 6);
}
}
- M[2] = M[1];
-}
+ // Vertical convolution
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; ++i) {
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; ++j) {
+ const int *p = &tmp[i * DISFLOW_PATCH_SIZE + j];
+ const int arr[4] = { p[-DISFLOW_PATCH_SIZE], p[0], p[DISFLOW_PATCH_SIZE],
+ p[2 * DISFLOW_PATCH_SIZE] };
+ const int result = get_cubic_value_int(arr, v_kernel);
-// Solves a general Mx = b where M is a 2x2 matrix and b is a 2x1 matrix
-static INLINE void solve_2x2_system(const double *M, const double *b,
- double *output_vec) {
- double M_0 = M[0];
- double M_3 = M[3];
- double det = (M_0 * M_3) - (M[1] * M[2]);
- if (det < 1e-5) {
- // Handle singular matrix
- // TODO(sarahparker) compare results using pseudo inverse instead
- M_0 += 1e-10;
- M_3 += 1e-10;
- det = (M_0 * M_3) - (M[1] * M[2]);
- }
- const double det_inv = 1 / det;
- const double mult_b0 = det_inv * b[0];
- const double mult_b1 = det_inv * b[1];
- output_vec[0] = M_3 * mult_b0 - M[1] * mult_b1;
- output_vec[1] = -M[2] * mult_b0 + M_0 * mult_b1;
-}
-
-/*
-static INLINE void image_difference(const uint8_t *src, int src_stride,
- const uint8_t *ref, int ref_stride,
- int16_t *dst, int dst_stride, int height,
- int width) {
- const int block_unit = 8;
- // Take difference in 8x8 blocks to make use of optimized diff function
- for (int i = 0; i < height; i += block_unit) {
- for (int j = 0; j < width; j += block_unit) {
- aom_subtract_block(block_unit, block_unit, dst + i * dst_stride + j,
- dst_stride, src + i * src_stride + j, src_stride,
- ref + i * ref_stride + j, ref_stride);
+ // Apply kernel and round.
+ // This time, we have to round off the 6 extra bits which were kept
+ // earlier, but we also want to keep DISFLOW_DERIV_SCALE_LOG2 extra bits
+ // of precision to match the scale of the dx and dy arrays.
+ const int round_bits = DISFLOW_INTERP_BITS + 6 - DISFLOW_DERIV_SCALE_LOG2;
+ const int warped = ROUND_POWER_OF_TWO(result, round_bits);
+ const int src_px = src[(x + j) + (y + i) * stride] << 3;
+ const int err = warped - src_px;
+ dt[i * DISFLOW_PATCH_SIZE + j] = err;
}
}
}
-*/
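
The 6-extra-bits bound quoted in compute_flow_error() above can be checked
directly: with the worst-case inputs named in the comment, the unscaled
intermediate is 255 * 1.125 = 286.875, which is 18360 at 6 fractional bits
(within int16_t range) but 36720 at 7 (overflow). A standalone check:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
      /* Worst case of the horizontal cubic pass: pixels { 0, 255, 255, 0 }
       * with u_frac = 0.5, whose kernel weights are
       * { -0.0625, 0.5625, 0.5625, -0.0625 }. */
      const double worst = 255 * (0.5625 + 0.5625);     /* 286.875 */
      const int with_6_bits = (int)(worst * (1 << 6));  /* 18360 */
      const int with_7_bits = (int)(worst * (1 << 7));  /* 36720 */
      assert(with_6_bits <= INT16_MAX);  /* fits */
      assert(with_7_bits > INT16_MAX);   /* would overflow */
      printf("%d %d\n", with_6_bits, with_7_bits);
      return 0;
    }
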
-static INLINE void convolve_2d_sobel_y(const uint8_t *src, int src_stride,
- double *dst, int dst_stride, int w,
- int h, int dir, double norm) {
- int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
- DECLARE_ALIGNED(256, static const int16_t, sobel_a[3]) = { 1, 0, -1 };
- DECLARE_ALIGNED(256, static const int16_t, sobel_b[3]) = { 1, 2, 1 };
+static INLINE void sobel_filter(const uint8_t *src, int src_stride,
+ int16_t *dst, int dst_stride, int dir) {
+ int16_t tmp_[DISFLOW_PATCH_SIZE * (DISFLOW_PATCH_SIZE + 2)];
+ int16_t *tmp = tmp_ + DISFLOW_PATCH_SIZE;
+
+ // Sobel filter kernel
+ // This must have an overall scale factor equal to DISFLOW_DERIV_SCALE,
+ // in order to produce correctly scaled outputs.
+ // To work out the scale factor, we multiply two factors:
+ //
+ // * For the derivative filter (sobel_a), comparing our filter
+ // image[x - 1] - image[x + 1]
+ // to the standard form
+ // d/dx image[x] = image[x+1] - image[x]
+ // tells us that we're actually calculating -2 * d/dx image[x]
+ //
+ // * For the smoothing filter (sobel_b), all coefficients are positive
+ // so the scale factor is just the sum of the coefficients
+ //
+ // Thus we need to make sure that DISFLOW_DERIV_SCALE = 2 * sum(sobel_b)
+ // (and take care of the - sign from sobel_a elsewhere)
+ static const int16_t sobel_a[3] = { 1, 0, -1 };
+ static const int16_t sobel_b[3] = { 1, 2, 1 };
const int taps = 3;
- int im_h = h + taps - 1;
- int im_stride = w;
- const int fo_vert = 1;
- const int fo_horiz = 1;
// horizontal filter
- const uint8_t *src_horiz = src - fo_vert * src_stride;
- const int16_t *x_filter = dir ? sobel_a : sobel_b;
- for (int y = 0; y < im_h; ++y) {
- for (int x = 0; x < w; ++x) {
- int16_t sum = 0;
+ const int16_t *h_kernel = dir ? sobel_a : sobel_b;
+
+ for (int y = -1; y < DISFLOW_PATCH_SIZE + 1; ++y) {
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
for (int k = 0; k < taps; ++k) {
- sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
+ sum += h_kernel[k] * src[y * src_stride + (x + k - 1)];
}
- im_block[y * im_stride + x] = sum;
+ tmp[y * DISFLOW_PATCH_SIZE + x] = sum;
}
}
// vertical filter
- int16_t *src_vert = im_block + fo_vert * im_stride;
- const int16_t *y_filter = dir ? sobel_b : sobel_a;
- for (int y = 0; y < h; ++y) {
- for (int x = 0; x < w; ++x) {
- int16_t sum = 0;
+ const int16_t *v_kernel = dir ? sobel_b : sobel_a;
+
+ for (int y = 0; y < DISFLOW_PATCH_SIZE; ++y) {
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
for (int k = 0; k < taps; ++k) {
- sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
+ sum += v_kernel[k] * tmp[(y + k - 1) * DISFLOW_PATCH_SIZE + x];
}
- dst[y * dst_stride + x] = sum * norm;
+ dst[y * dst_stride + x] = sum;
}
}
}
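
The scale-factor reasoning above can be verified on a simple ramp image: the
derivative tap {1, 0, -1} applied to image[x] = x yields (x - 1) - (x + 1) =
-2 per pixel, and the smoothing tap {1, 2, 1} multiplies that by 4, so the
filter reports -8 times the true unit gradient, matching the requirement that
DISFLOW_DERIV_SCALE = 2 * sum(sobel_b) (with the sign compensated elsewhere,
as the comment notes). A standalone check:

    #include <stdio.h>

    int main(void) {
      const int a[3] = { 1, 0, -1 };  /* derivative tap */
      const int b[3] = { 1, 2, 1 };   /* smoothing tap */
      int result = 0;
      for (int i = 0; i < 3; i++)    /* smoothing, across rows */
        for (int j = 0; j < 3; j++)  /* derivative, across columns */
          result += b[i] * a[j] * (100 + j - 1);  /* pixel value = x */
      printf("filtered value = %d\n", result);  /* -8 */
      return 0;
    }
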
-// Compute an image gradient using a sobel filter.
-// If dir == 1, compute the x gradient. If dir == 0, compute y. This function
-// assumes the images have been padded so that they can be processed in units
-// of 8.
-static INLINE void sobel_xy_image_gradient(const uint8_t *src, int src_stride,
- double *dst, int dst_stride,
- int height, int width, int dir) {
- double norm = 1.0;
- // TODO(sarahparker) experiment with doing this over larger block sizes
- const int block_unit = 8;
- // Filter in 8x8 blocks to eventually make use of optimized convolve function
- for (int i = 0; i < height; i += block_unit) {
- for (int j = 0; j < width; j += block_unit) {
- convolve_2d_sobel_y(src + i * src_stride + j, src_stride,
- dst + i * dst_stride + j, dst_stride, block_unit,
- block_unit, dir, norm);
+// Computes the components of the system of equations used to solve for
+// a flow vector.
+//
+// The flow equations are a least-squares system, derived as follows:
+//
+// For each pixel in the patch, we calculate the current error `dt`,
+// and the x and y gradients `dx` and `dy` of the source patch.
+// This means that, to first order, the squared error for this pixel is
+//
+// (dt + u * dx + v * dy)^2
+//
+// where (u, v) are the incremental changes to the flow vector.
+//
+// We then want to find the values of u and v which minimize the sum
+// of the squared error across all pixels. Conveniently, this fits exactly
+// into the form of a least squares problem, with one equation
+//
+// u * dx + v * dy = -dt
+//
+// for each pixel.
+//
+// Summing across all pixels in a square window of size DISFLOW_PATCH_SIZE,
+// and absorbing the - sign elsewhere, this results in the least squares system
+//
+// M = |sum(dx * dx) sum(dx * dy)|
+// |sum(dx * dy) sum(dy * dy)|
+//
+// b = |sum(dx * dt)|
+// |sum(dy * dt)|
+static INLINE void compute_flow_matrix(const int16_t *dx, int dx_stride,
+ const int16_t *dy, int dy_stride,
+ double *M) {
+ int tmp[4] = { 0 };
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; j++) {
+ tmp[0] += dx[i * dx_stride + j] * dx[i * dx_stride + j];
+ tmp[1] += dx[i * dx_stride + j] * dy[i * dy_stride + j];
+ // Don't compute tmp[2], as it should be equal to tmp[1]
+ tmp[3] += dy[i * dy_stride + j] * dy[i * dy_stride + j];
+ }
+ }
+
+ // Apply regularization
+ // We follow the standard regularization method of adding `k * I` before
+ // inverting. This ensures that the matrix will be invertible.
+ //
+ // Setting the regularization strength k to 1 seems to work well here, as
+ // typical values coming from the other equations are very large (1e5 to
+ // 1e6, with an upper limit of around 6e7, at the time of writing).
+ // It also preserves the property that all matrix values are whole numbers,
+ // which is convenient for integerized SIMD implementation.
+ tmp[0] += 1;
+ tmp[3] += 1;
+
+ tmp[2] = tmp[1];
+
+ M[0] = (double)tmp[0];
+ M[1] = (double)tmp[1];
+ M[2] = (double)tmp[2];
+ M[3] = (double)tmp[3];
+}
+
+static INLINE void compute_flow_vector(const int16_t *dx, int dx_stride,
+ const int16_t *dy, int dy_stride,
+ const int16_t *dt, int dt_stride,
+ int *b) {
+ memset(b, 0, 2 * sizeof(*b));
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; j++) {
+ b[0] += dx[i * dx_stride + j] * dt[i * dt_stride + j];
+ b[1] += dy[i * dy_stride + j] * dt[i * dt_stride + j];
}
}
}
-static void free_pyramid(ImagePyramid *pyr) {
- aom_free(pyr->level_buffer);
- if (pyr->has_gradient) {
- aom_free(pyr->level_dx_buffer);
- aom_free(pyr->level_dy_buffer);
- }
- aom_free(pyr);
+// Try to invert the matrix M
+// Note: Due to the nature of how a least-squares matrix is constructed, all of
+// the eigenvalues will be >= 0, and therefore det M >= 0 as well.
+// The regularization term `+ k * I` further ensures that det M >= k^2.
+// As mentioned in compute_flow_matrix(), here we use k = 1, so det M >= 1.
+// So we don't have to worry about non-invertible matrices here.
+static INLINE void invert_2x2(const double *M, double *M_inv) {
+ double det = (M[0] * M[3]) - (M[1] * M[2]);
+ assert(det >= 1);
+ const double det_inv = 1 / det;
+
+ M_inv[0] = M[3] * det_inv;
+ M_inv[1] = -M[1] * det_inv;
+ M_inv[2] = -M[2] * det_inv;
+ M_inv[3] = M[0] * det_inv;
}
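
Putting compute_flow_matrix(), compute_flow_vector() and invert_2x2() together,
each refinement step solves a regularized 2x2 least-squares system. A
standalone sketch with toy numbers, checking that the solved step reproduces b:

    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    static void invert_2x2(const double *M, double *M_inv) {
      const double det = (M[0] * M[3]) - (M[1] * M[2]);
      assert(det >= 1);
      const double det_inv = 1 / det;
      M_inv[0] = M[3] * det_inv;
      M_inv[1] = -M[1] * det_inv;
      M_inv[2] = -M[2] * det_inv;
      M_inv[3] = M[0] * det_inv;
    }

    int main(void) {
      /* Toy system M * step = b, with M already regularized (+1 on the
       * diagonal) so that det >= 1 and the inverse always exists. */
      const double M[4] = { 5.0, 2.0, 2.0, 3.0 };  /* det = 11 */
      const double b[2] = { 9.0, 7.0 };
      double M_inv[4];
      invert_2x2(M, M_inv);
      const double step_u = M_inv[0] * b[0] + M_inv[1] * b[1];
      const double step_v = M_inv[2] * b[0] + M_inv[3] * b[1];
      /* Check: M * (step_u, step_v) reproduces b. */
      assert(fabs(M[0] * step_u + M[1] * step_v - b[0]) < 1e-12);
      assert(fabs(M[2] * step_u + M[3] * step_v - b[1]) < 1e-12);
      printf("step = (%f, %f)\n", step_u, step_v);  /* (13/11, 17/11) */
      return 0;
    }
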
-static ImagePyramid *alloc_pyramid(int width, int height, int pad_size,
- int compute_gradient) {
- ImagePyramid *pyr = aom_calloc(1, sizeof(*pyr));
- if (!pyr) return NULL;
- pyr->has_gradient = compute_gradient;
- // 2 * width * height is the upper bound for a buffer that fits
- // all pyramid levels + padding for each level
- const int buffer_size = sizeof(*pyr->level_buffer) * 2 * width * height +
- (width + 2 * pad_size) * 2 * pad_size * N_LEVELS;
- pyr->level_buffer = aom_malloc(buffer_size);
- if (!pyr->level_buffer) {
- free_pyramid(pyr);
- return NULL;
- }
- memset(pyr->level_buffer, 0, buffer_size);
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x,
+ int y, int width, int height, int stride,
+ double *u, double *v) {
+ double M[4];
+ double M_inv[4];
+ int b[2];
+ int16_t dt[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
+ int16_t dx[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
+ int16_t dy[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
- if (compute_gradient) {
- const int gradient_size =
- sizeof(*pyr->level_dx_buffer) * 2 * width * height +
- (width + 2 * pad_size) * 2 * pad_size * N_LEVELS;
- pyr->level_dx_buffer = aom_calloc(1, gradient_size);
- pyr->level_dy_buffer = aom_calloc(1, gradient_size);
- if (!(pyr->level_dx_buffer && pyr->level_dy_buffer)) {
- free_pyramid(pyr);
- return NULL;
- }
- }
- return pyr;
-}
+ // Compute gradients within this patch
+ const uint8_t *src_patch = &src[y * stride + x];
+ sobel_filter(src_patch, stride, dx, DISFLOW_PATCH_SIZE, 1);
+ sobel_filter(src_patch, stride, dy, DISFLOW_PATCH_SIZE, 0);
-static INLINE void update_level_dims(ImagePyramid *frm_pyr, int level) {
- frm_pyr->widths[level] = frm_pyr->widths[level - 1] >> 1;
- frm_pyr->heights[level] = frm_pyr->heights[level - 1] >> 1;
- frm_pyr->strides[level] = frm_pyr->widths[level] + 2 * frm_pyr->pad_size;
- // Point the beginning of the next level buffer to the correct location inside
- // the padded border
- frm_pyr->level_loc[level] =
- frm_pyr->level_loc[level - 1] +
- frm_pyr->strides[level - 1] *
- (2 * frm_pyr->pad_size + frm_pyr->heights[level - 1]);
-}
-
-// Compute coarse to fine pyramids for a frame
-static void compute_flow_pyramids(unsigned char *frm, const int frm_width,
- const int frm_height, const int frm_stride,
- int n_levels, int pad_size, int compute_grad,
- ImagePyramid *frm_pyr) {
- int cur_width, cur_height, cur_stride, cur_loc;
- assert((frm_width >> n_levels) > 0);
- assert((frm_height >> n_levels) > 0);
-
- // Initialize first level
- frm_pyr->n_levels = n_levels;
- frm_pyr->pad_size = pad_size;
- frm_pyr->widths[0] = frm_width;
- frm_pyr->heights[0] = frm_height;
- frm_pyr->strides[0] = frm_width + 2 * frm_pyr->pad_size;
- // Point the beginning of the level buffer to the location inside
- // the padded border
- frm_pyr->level_loc[0] =
- frm_pyr->strides[0] * frm_pyr->pad_size + frm_pyr->pad_size;
- // This essentially copies the original buffer into the pyramid buffer
- // without the original padding
- av1_resize_plane(frm, frm_height, frm_width, frm_stride,
- frm_pyr->level_buffer + frm_pyr->level_loc[0],
- frm_pyr->heights[0], frm_pyr->widths[0],
- frm_pyr->strides[0]);
-
- if (compute_grad) {
- cur_width = frm_pyr->widths[0];
- cur_height = frm_pyr->heights[0];
- cur_stride = frm_pyr->strides[0];
- cur_loc = frm_pyr->level_loc[0];
- assert(frm_pyr->has_gradient && frm_pyr->level_dx_buffer != NULL &&
- frm_pyr->level_dy_buffer != NULL);
- // Computation x gradient
- sobel_xy_image_gradient(frm_pyr->level_buffer + cur_loc, cur_stride,
- frm_pyr->level_dx_buffer + cur_loc, cur_stride,
- cur_height, cur_width, 1);
-
- // Computation y gradient
- sobel_xy_image_gradient(frm_pyr->level_buffer + cur_loc, cur_stride,
- frm_pyr->level_dy_buffer + cur_loc, cur_stride,
- cur_height, cur_width, 0);
- }
-
- // Start at the finest level and resize down to the coarsest level
- for (int level = 1; level < n_levels; ++level) {
- update_level_dims(frm_pyr, level);
- cur_width = frm_pyr->widths[level];
- cur_height = frm_pyr->heights[level];
- cur_stride = frm_pyr->strides[level];
- cur_loc = frm_pyr->level_loc[level];
-
- av1_resize_plane(frm_pyr->level_buffer + frm_pyr->level_loc[level - 1],
- frm_pyr->heights[level - 1], frm_pyr->widths[level - 1],
- frm_pyr->strides[level - 1],
- frm_pyr->level_buffer + cur_loc, cur_height, cur_width,
- cur_stride);
-
- if (compute_grad) {
- assert(frm_pyr->has_gradient && frm_pyr->level_dx_buffer != NULL &&
- frm_pyr->level_dy_buffer != NULL);
- // Computation x gradient
- sobel_xy_image_gradient(frm_pyr->level_buffer + cur_loc, cur_stride,
- frm_pyr->level_dx_buffer + cur_loc, cur_stride,
- cur_height, cur_width, 1);
-
- // Computation y gradient
- sobel_xy_image_gradient(frm_pyr->level_buffer + cur_loc, cur_stride,
- frm_pyr->level_dy_buffer + cur_loc, cur_stride,
- cur_height, cur_width, 0);
- }
- }
-}
-
-static INLINE void compute_flow_at_point(unsigned char *frm, unsigned char *ref,
- double *dx, double *dy, int x, int y,
- int width, int height, int stride,
- double *u, double *v) {
- double M[4] = { 0 };
- double b[2] = { 0 };
- double tmp_output_vec[2] = { 0 };
- double error = 0;
- int16_t dt[PATCH_SIZE * PATCH_SIZE];
- double o_u = *u;
- double o_v = *v;
+ compute_flow_matrix(dx, DISFLOW_PATCH_SIZE, dy, DISFLOW_PATCH_SIZE, M);
+ invert_2x2(M, M_inv);
for (int itr = 0; itr < DISFLOW_MAX_ITR; itr++) {
- error = compute_warp_and_error(ref, frm, width, height, stride, x, y, *u,
- *v, dt);
- if (error <= DISFLOW_ERROR_TR) break;
- compute_flow_system(dx, stride, dy, stride, dt, PATCH_SIZE, M, b);
- solve_2x2_system(M, b, tmp_output_vec);
- *u += tmp_output_vec[0];
- *v += tmp_output_vec[1];
+ compute_flow_error(src, ref, width, height, stride, x, y, *u, *v, dt);
+ compute_flow_vector(dx, DISFLOW_PATCH_SIZE, dy, DISFLOW_PATCH_SIZE, dt,
+ DISFLOW_PATCH_SIZE, b);
+
+ // Solve flow equations to find a better estimate for the flow vector
+ // at this point
+ const double step_u = M_inv[0] * b[0] + M_inv[1] * b[1];
+ const double step_v = M_inv[2] * b[0] + M_inv[3] * b[1];
+ *u += fclamp(step_u * DISFLOW_STEP_SIZE, -2, 2);
+ *v += fclamp(step_v * DISFLOW_STEP_SIZE, -2, 2);
+
+ if (fabs(step_u) + fabs(step_v) < DISFLOW_STEP_SIZE_THRESOLD) {
+ // Stop iteration when we're close to convergence
+ break;
+ }
}
- if (fabs(*u - o_u) > PATCH_SIZE || fabs(*v - o_u) > PATCH_SIZE) {
- *u = o_u;
- *v = o_v;
+}
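
One detail of the refinement loop above deserves emphasis: the per-iteration
update is scaled by DISFLOW_STEP_SIZE and clamped to +/- 2 pixels, so a single
noisy gradient estimate cannot throw the flow vector far off. A trivial sketch
of that damped update (DISFLOW_STEP_SIZE lives in disflow.h and is not shown in
this hunk; the 0.1 below is only a placeholder):

    #include <stdio.h>

    static double fclamp(double x, double lo, double hi) {
      return x < lo ? lo : (x > hi ? hi : x);
    }

    int main(void) {
      const double step_size = 0.1;    /* placeholder for DISFLOW_STEP_SIZE */
      const double raw_step_u = 35.0;  /* large, unreliable raw step */
      double u = 0.0;
      u += fclamp(raw_step_u * step_size, -2, 2);
      printf("u after one iteration = %.2f\n", u);  /* 2.00: clamped */
      return 0;
    }
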
+
+static void fill_flow_field_borders(double *flow, int width, int height,
+ int stride) {
+ // Calculate the bounds of the rectangle which was filled in by
+ // compute_flow_field() before calling this function.
+ // These indices are inclusive on both ends.
+ const int left_index = FLOW_BORDER;
+ const int right_index = (width - FLOW_BORDER - 1);
+ const int top_index = FLOW_BORDER;
+ const int bottom_index = (height - FLOW_BORDER - 1);
+
+ // Left area
+ for (int i = top_index; i <= bottom_index; i += 1) {
+ double *row = flow + i * stride;
+ const double left = row[left_index];
+ for (int j = 0; j < left_index; j++) {
+ row[j] = left;
+ }
+ }
+
+ // Right area
+ for (int i = top_index; i <= bottom_index; i += 1) {
+ double *row = flow + i * stride;
+ const double right = row[right_index];
+ for (int j = right_index + 1; j < width; j++) {
+ row[j] = right;
+ }
+ }
+
+ // Top area
+ const double *top_row = flow + top_index * stride;
+ for (int i = 0; i < top_index; i++) {
+ double *row = flow + i * stride;
+ memcpy(row, top_row, width * sizeof(*row));
+ }
+
+ // Bottom area
+ const double *bottom_row = flow + bottom_index * stride;
+ for (int i = bottom_index + 1; i < height; i++) {
+ double *row = flow + i * stride;
+ memcpy(row, bottom_row, width * sizeof(*row));
}
}
// make sure flow_u and flow_v start at 0
-static bool compute_flow_field(ImagePyramid *frm_pyr, ImagePyramid *ref_pyr,
- double *flow_u, double *flow_v) {
- int cur_width, cur_height, cur_stride, cur_loc, patch_loc, patch_center;
- double *u_upscale =
- aom_malloc(frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_u));
- double *v_upscale =
- aom_malloc(frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_v));
- if (!(u_upscale && v_upscale)) {
- aom_free(u_upscale);
- aom_free(v_upscale);
- return false;
- }
+static void compute_flow_field(const ImagePyramid *src_pyr,
+ const ImagePyramid *ref_pyr, FlowField *flow) {
+ assert(src_pyr->n_levels == ref_pyr->n_levels);
- assert(frm_pyr->n_levels == ref_pyr->n_levels);
+ double *flow_u = flow->u;
+ double *flow_v = flow->v;
+
+ const size_t flow_size = flow->stride * (size_t)flow->height;
+ double *u_upscale = aom_malloc(flow_size * sizeof(*u_upscale));
+ double *v_upscale = aom_malloc(flow_size * sizeof(*v_upscale));
// Compute flow field from coarsest to finest level of the pyramid
- for (int level = frm_pyr->n_levels - 1; level >= 0; --level) {
- cur_width = frm_pyr->widths[level];
- cur_height = frm_pyr->heights[level];
- cur_stride = frm_pyr->strides[level];
- cur_loc = frm_pyr->level_loc[level];
+ for (int level = src_pyr->n_levels - 1; level >= 0; --level) {
+ const PyramidLayer *cur_layer = &src_pyr->layers[level];
+ const int cur_width = cur_layer->width;
+ const int cur_height = cur_layer->height;
+ const int cur_stride = cur_layer->stride;
- for (int i = PATCH_SIZE; i < cur_height - PATCH_SIZE; i += PATCH_STEP) {
- for (int j = PATCH_SIZE; j < cur_width - PATCH_SIZE; j += PATCH_STEP) {
- patch_loc = i * cur_stride + j;
- patch_center = patch_loc + PATCH_CENTER * cur_stride + PATCH_CENTER;
- compute_flow_at_point(frm_pyr->level_buffer + cur_loc,
- ref_pyr->level_buffer + cur_loc,
- frm_pyr->level_dx_buffer + cur_loc + patch_loc,
- frm_pyr->level_dy_buffer + cur_loc + patch_loc, j,
- i, cur_width, cur_height, cur_stride,
- flow_u + patch_center, flow_v + patch_center);
+ const uint8_t *src_buffer = cur_layer->buffer;
+ const uint8_t *ref_buffer = ref_pyr->layers[level].buffer;
+
+ const int cur_flow_width = cur_width >> DOWNSAMPLE_SHIFT;
+ const int cur_flow_height = cur_height >> DOWNSAMPLE_SHIFT;
+ const int cur_flow_stride = flow->stride;
+
+ for (int i = FLOW_BORDER; i < cur_flow_height - FLOW_BORDER; i += 1) {
+ for (int j = FLOW_BORDER; j < cur_flow_width - FLOW_BORDER; j += 1) {
+ const int flow_field_idx = i * cur_flow_stride + j;
+
+ // Calculate the position of a patch of size DISFLOW_PATCH_SIZE pixels,
+ // which is centered on the region covered by this flow field entry
+ const int patch_center_x =
+ (j << DOWNSAMPLE_SHIFT) + UPSAMPLE_CENTER_OFFSET; // In pixels
+ const int patch_center_y =
+ (i << DOWNSAMPLE_SHIFT) + UPSAMPLE_CENTER_OFFSET; // In pixels
+ const int patch_tl_x = patch_center_x - DISFLOW_PATCH_CENTER;
+ const int patch_tl_y = patch_center_y - DISFLOW_PATCH_CENTER;
+ assert(patch_tl_x >= 0);
+ assert(patch_tl_y >= 0);
+
+ aom_compute_flow_at_point(src_buffer, ref_buffer, patch_tl_x,
+ patch_tl_y, cur_width, cur_height, cur_stride,
+ &flow_u[flow_field_idx],
+ &flow_v[flow_field_idx]);
}
}
- // TODO(sarahparker) Replace this with upscale function in resize.c
+
+ // Fill in the areas which we haven't explicitly computed, with copies
+ // of the outermost values which we did compute
+ fill_flow_field_borders(flow_u, cur_flow_width, cur_flow_height,
+ cur_flow_stride);
+ fill_flow_field_borders(flow_v, cur_flow_width, cur_flow_height,
+ cur_flow_stride);
+
if (level > 0) {
- int h_upscale = frm_pyr->heights[level - 1];
- int w_upscale = frm_pyr->widths[level - 1];
- int s_upscale = frm_pyr->strides[level - 1];
- for (int i = 0; i < h_upscale; ++i) {
- for (int j = 0; j < w_upscale; ++j) {
- u_upscale[j + i * s_upscale] =
- flow_u[(int)(j >> 1) + (int)(i >> 1) * cur_stride];
- v_upscale[j + i * s_upscale] =
- flow_v[(int)(j >> 1) + (int)(i >> 1) * cur_stride];
+ const int upscale_flow_width = cur_flow_width << 1;
+ const int upscale_flow_height = cur_flow_height << 1;
+ const int upscale_stride = flow->stride;
+
+ av1_upscale_plane_double_prec(
+ flow_u, cur_flow_height, cur_flow_width, cur_flow_stride, u_upscale,
+ upscale_flow_height, upscale_flow_width, upscale_stride);
+ av1_upscale_plane_double_prec(
+ flow_v, cur_flow_height, cur_flow_width, cur_flow_stride, v_upscale,
+ upscale_flow_height, upscale_flow_width, upscale_stride);
+
+ // Multiply all flow vectors by 2.
+ // When we move down a pyramid level, the image resolution doubles.
+ // Thus we need to double all vectors in order for them to represent
+ // the same translation at the next level down
+ for (int i = 0; i < upscale_flow_height; i++) {
+ for (int j = 0; j < upscale_flow_width; j++) {
+ const int index = i * upscale_stride + j;
+ flow_u[index] = u_upscale[index] * 2.0;
+ flow_v[index] = v_upscale[index] * 2.0;
}
}
- memcpy(flow_u, u_upscale,
- frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_u));
- memcpy(flow_v, v_upscale,
- frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_v));
+
+ // If we didn't fill in the rightmost column or bottommost row during
+ // upsampling (in order to keep the ratio at exactly 2), fill them
+ // in here by copying the next closest column/row
+ const PyramidLayer *next_layer = &src_pyr->layers[level - 1];
+ const int next_flow_width = next_layer->width >> DOWNSAMPLE_SHIFT;
+ const int next_flow_height = next_layer->height >> DOWNSAMPLE_SHIFT;
+
+ // Rightmost column
+ if (next_flow_width > upscale_flow_width) {
+ assert(next_flow_width == upscale_flow_width + 1);
+ for (int i = 0; i < upscale_flow_height; i++) {
+ const int index = i * upscale_stride + upscale_flow_width;
+ flow_u[index] = flow_u[index - 1];
+ flow_v[index] = flow_v[index - 1];
+ }
+ }
+
+ // Bottommost row
+ if (next_flow_height > upscale_flow_height) {
+ assert(next_flow_height == upscale_flow_height + 1);
+ for (int j = 0; j < next_flow_width; j++) {
+ const int index = upscale_flow_height * upscale_stride + j;
+ flow_u[index] = flow_u[index - upscale_stride];
+ flow_v[index] = flow_v[index - upscale_stride];
+ }
+ }
}
}
aom_free(u_upscale);
aom_free(v_upscale);
- return true;
}
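The doubling step admits a one-line sanity check: a displacement of $(u, v)$ pixels measured at level $k$ spans twice as many pixels at level $k-1$, so after the spatial upsampling each vector is scaled as $(u_{k-1}, v_{k-1}) = 2\,(u_k, v_k)$; a coarse-level vector of (1.5, -2.0), for example, becomes (3.0, -4.0) one level down.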
-int av1_compute_global_motion_disflow_based(
- TransformationType type, unsigned char *frm_buffer, int frm_width,
- int frm_height, int frm_stride, int *frm_corners, int num_frm_corners,
- YV12_BUFFER_CONFIG *ref, int bit_depth, int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions) {
- unsigned char *ref_buffer = ref->y_buffer;
- const int ref_width = ref->y_width;
- const int ref_height = ref->y_height;
- const int pad_size = AOMMAX(PATCH_SIZE, MIN_PAD);
- int num_correspondences;
- double *correspondences;
- RansacFuncDouble ransac = av1_get_ransac_double_prec_type(type);
- assert(frm_width == ref_width);
- assert(frm_height == ref_height);
+static FlowField *alloc_flow_field(int frame_width, int frame_height) {
+ FlowField *flow = (FlowField *)aom_malloc(sizeof(FlowField));
+ if (flow == NULL) return NULL;
- // Ensure the number of pyramid levels will work with the frame resolution
- const int msb =
- frm_width < frm_height ? get_msb(frm_width) : get_msb(frm_height);
- const int n_levels = AOMMIN(msb, N_LEVELS);
+ // Calculate the size of the bottom (largest) layer of the flow pyramid
+ flow->width = frame_width >> DOWNSAMPLE_SHIFT;
+ flow->height = frame_height >> DOWNSAMPLE_SHIFT;
+ flow->stride = flow->width;
- if (ref->flags & YV12_FLAG_HIGHBITDEPTH) {
- ref_buffer = av1_downconvert_frame(ref, bit_depth);
+ const size_t flow_size = flow->stride * (size_t)flow->height;
+ flow->u = aom_calloc(flow_size, sizeof(*flow->u));
+ flow->v = aom_calloc(flow_size, sizeof(*flow->v));
+
+ if (flow->u == NULL || flow->v == NULL) {
+ aom_free(flow->u);
+ aom_free(flow->v);
+ aom_free(flow);
+ return NULL;
}
- // TODO(sarahparker) We will want to do the source pyramid computation
- // outside of this function so it doesn't get recomputed for every
- // reference. We also don't need to compute every pyramid level for the
- // reference in advance, since lower levels can be overwritten once their
- // flow field is computed and upscaled. I'll add these optimizations
- // once the full implementation is working.
- // Allocate frm image pyramids
- int compute_gradient = 1;
- ImagePyramid *frm_pyr =
- alloc_pyramid(frm_width, frm_height, pad_size, compute_gradient);
- if (!frm_pyr) return 0;
- compute_flow_pyramids(frm_buffer, frm_width, frm_height, frm_stride, n_levels,
- pad_size, compute_gradient, frm_pyr);
- // Allocate ref image pyramids
- compute_gradient = 0;
- ImagePyramid *ref_pyr =
- alloc_pyramid(ref_width, ref_height, pad_size, compute_gradient);
- if (!ref_pyr) {
- free_pyramid(frm_pyr);
- return 0;
- }
- compute_flow_pyramids(ref_buffer, ref_width, ref_height, ref->y_stride,
- n_levels, pad_size, compute_gradient, ref_pyr);
+ return flow;
+}
- int ret = 0;
- double *flow_u =
- aom_malloc(frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_u));
- double *flow_v =
- aom_malloc(frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_v));
- if (!(flow_u && flow_v)) goto Error;
+static void free_flow_field(FlowField *flow) {
+ aom_free(flow->u);
+ aom_free(flow->v);
+ aom_free(flow);
+}
- memset(flow_u, 0,
- frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_u));
- memset(flow_v, 0,
- frm_pyr->strides[0] * frm_pyr->heights[0] * sizeof(*flow_v));
+// Compute flow field between `src` and `ref`, and then use that flow to
+// compute a global motion model relating the two frames.
+//
+// Following the convention in flow_estimation.h, the flow vectors are computed
+// at fixed points in `src` and point to the corresponding locations in `ref`,
+// regardless of the temporal ordering of the frames.
+bool av1_compute_global_motion_disflow(TransformationType type,
+ YV12_BUFFER_CONFIG *src,
+ YV12_BUFFER_CONFIG *ref, int bit_depth,
+ MotionModel *motion_models,
+ int num_motion_models) {
+ // Precompute information we will need about each frame
+ ImagePyramid *src_pyramid = src->y_pyramid;
+ CornerList *src_corners = src->corners;
+ ImagePyramid *ref_pyramid = ref->y_pyramid;
+ aom_compute_pyramid(src, bit_depth, src_pyramid);
+ av1_compute_corner_list(src_pyramid, src_corners);
+ aom_compute_pyramid(ref, bit_depth, ref_pyramid);
- if (!compute_flow_field(frm_pyr, ref_pyr, flow_u, flow_v)) goto Error;
+ const int src_width = src_pyramid->layers[0].width;
+ const int src_height = src_pyramid->layers[0].height;
+ assert(ref_pyramid->layers[0].width == src_width);
+ assert(ref_pyramid->layers[0].height == src_height);
+
+ FlowField *flow = alloc_flow_field(src_width, src_height);
+ if (!flow) return false;
+
+ compute_flow_field(src_pyramid, ref_pyramid, flow);
// find correspondences between the two images using the flow field
- correspondences = aom_malloc(num_frm_corners * 4 * sizeof(*correspondences));
- if (!correspondences) goto Error;
- num_correspondences = determine_disflow_correspondence(
- frm_corners, num_frm_corners, flow_u, flow_v, frm_width, frm_height,
- frm_pyr->strides[0], correspondences);
- ransac(correspondences, num_correspondences, num_inliers_by_motion,
- params_by_motion, num_motions);
-
- // Set num_inliers = 0 for motions with too few inliers so they are ignored.
- for (int i = 0; i < num_motions; ++i) {
- if (num_inliers_by_motion[i] < MIN_INLIER_PROB * num_correspondences) {
- num_inliers_by_motion[i] = 0;
- }
+ Correspondence *correspondences =
+ aom_malloc(src_corners->num_corners * sizeof(*correspondences));
+ if (!correspondences) {
+ free_flow_field(flow);
+ return false;
}
- // Return true if any one of the motions has inliers.
- for (int i = 0; i < num_motions; ++i) {
- if (num_inliers_by_motion[i] > 0) {
- ret = 1;
- break;
- }
- }
+ const int num_correspondences =
+ determine_disflow_correspondence(src_corners, flow, correspondences);
+
+ bool result = ransac(correspondences, num_correspondences, type,
+ motion_models, num_motion_models);
aom_free(correspondences);
-Error:
- free_pyramid(frm_pyr);
- free_pyramid(ref_pyr);
- aom_free(flow_u);
- aom_free(flow_v);
- return ret;
+ free_flow_field(flow);
+ return result;
}
diff --git a/aom_dsp/flow_estimation/disflow.h b/aom_dsp/flow_estimation/disflow.h
index 52fb261..2e97ba2 100644
--- a/aom_dsp/flow_estimation/disflow.h
+++ b/aom_dsp/flow_estimation/disflow.h
@@ -12,18 +12,88 @@
#ifndef AOM_AOM_DSP_FLOW_ESTIMATION_DISFLOW_H_
#define AOM_AOM_DSP_FLOW_ESTIMATION_DISFLOW_H_
+#include <stdbool.h>
+
#include "aom_dsp/flow_estimation/flow_estimation.h"
+#include "aom_dsp/rect.h"
#include "aom_scale/yv12config.h"
#ifdef __cplusplus
extern "C" {
#endif
-int av1_compute_global_motion_disflow_based(
- TransformationType type, unsigned char *frm_buffer, int frm_width,
- int frm_height, int frm_stride, int *frm_corners, int num_frm_corners,
- YV12_BUFFER_CONFIG *ref, int bit_depth, int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions);
+// Number of pyramid levels in disflow computation
+#define DISFLOW_PYRAMID_LEVELS 12
+
+// Size of square patches in the disflow dense grid
+// Must be a power of 2
+#define DISFLOW_PATCH_SIZE_LOG2 3
+#define DISFLOW_PATCH_SIZE (1 << DISFLOW_PATCH_SIZE_LOG2)
+// Center point of square patch
+#define DISFLOW_PATCH_CENTER ((DISFLOW_PATCH_SIZE / 2) - 1)
+
+// Overall scale of the `dx`, `dy` and `dt` arrays in the disflow code
+// In other words, the various derivatives are calculated with an internal
+// precision of (8 + DISFLOW_DERIV_SCALE_LOG2) bits, from an 8-bit input.
+//
+// This must be carefully synchronized with the code in sobel_filter()
+// (which fills the dx and dy arrays) and compute_flow_error() (which
+// fills dt); see the comments in those functions for more details
+#define DISFLOW_DERIV_SCALE_LOG2 3
+#define DISFLOW_DERIV_SCALE (1 << DISFLOW_DERIV_SCALE_LOG2)
+
+// Scale factor applied to each step in the main refinement loop
+//
+// This should be <= 1.0 to avoid overshoot. Values below 1.0
+// may help in some cases, but they slow convergence overall and
+// so require careful tuning.
+// TODO(rachelbarker): Tune this value
+#define DISFLOW_STEP_SIZE 1.0
+
+// Step size at which we should terminate iteration
+// The idea here is that, if we take a step which is much smaller than 1px in
+// size, then the values won't change much from iteration to iteration, so
+// many future steps will also be small, and that won't have much effect
+// on the ultimate result. So we can terminate early.
+//
+// To look at it another way, when we take a small step, that means that
+// either we're near to convergence (so can stop), or we're stuck in a
+// shallow valley and will take many iterations to get unstuck.
+//
+// Solving the latter properly requires fancier methods, such as "gradient
+// descent with momentum". For now, we terminate to avoid wasting a ton of
+// time on points which are either nearly-converged or stuck.
+//
+// Terminating at 1/8 px seems to give good results for global motion estimation
+#define DISFLOW_STEP_SIZE_THRESHOLD (1. / 8.)
+
+// Max number of iterations if warp convergence is not found
+#define DISFLOW_MAX_ITR 4
+
+// Internal precision of cubic interpolation filters
+// The limiting factor here is that:
+// * Before integerizing, the maximum value of any kernel tap is 1.0
+// * After integerizing, each tap must fit into an int16_t.
+// Thus the largest multiplier we can get away with is 2^14 = 16384,
+// as 2^15 = 32768 is too large to fit in an int16_t.
+#define DISFLOW_INTERP_BITS 14
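A quick arithmetic check of that bound, as a sketch (integerize_tap is a hypothetical helper; rint comes from <math.h>):

    /* With DISFLOW_INTERP_BITS == 14, a maximal tap of 1.0 integerizes to
       1 << 14 == 16384, which fits in int16_t (max 32767); at 15 bits it
       would become 32768, which does not. */
    int16_t integerize_tap(double kernel_tap) {
      return (int16_t)rint(kernel_tap * (1 << DISFLOW_INTERP_BITS));
    }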
+
+typedef struct {
+ // x and y directions of flow, per patch
+ double *u;
+ double *v;
+
+ // Sizes of the above arrays
+ int width;
+ int height;
+ int stride;
+} FlowField;
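A hedged sketch of how a FlowField entry maps back to source pixels; DOWNSAMPLE_SHIFT lives in disflow.c, so this is illustrative rather than compilable against this header alone:

    /* Sketch (hypothetical helper): fetch the flow vector covering source
       pixel (x, y), mirroring the indexing in compute_flow_field(). */
    static void flow_at_pixel(const FlowField *flow, int x, int y,
                              double *u, double *v) {
      const int i = y >> DOWNSAMPLE_SHIFT;  /* flow-field row */
      const int j = x >> DOWNSAMPLE_SHIFT;  /* flow-field column */
      *u = flow->u[i * flow->stride + j];
      *v = flow->v[i * flow->stride + j];
    }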
+
+bool av1_compute_global_motion_disflow(TransformationType type,
+ YV12_BUFFER_CONFIG *src,
+ YV12_BUFFER_CONFIG *ref, int bit_depth,
+ MotionModel *motion_models,
+ int num_motion_models);
#ifdef __cplusplus
}
diff --git a/aom_dsp/flow_estimation/flow_estimation.c b/aom_dsp/flow_estimation/flow_estimation.c
index d8cf8bd..a6bf942 100644
--- a/aom_dsp/flow_estimation/flow_estimation.c
+++ b/aom_dsp/flow_estimation/flow_estimation.c
@@ -11,49 +11,48 @@
#include <assert.h>
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_dsp/flow_estimation/corner_match.h"
#include "aom_dsp/flow_estimation/disflow.h"
#include "aom_dsp/flow_estimation/flow_estimation.h"
#include "aom_ports/mem.h"
#include "aom_scale/yv12config.h"
-int aom_compute_global_motion(TransformationType type,
- unsigned char *src_buffer, int src_width,
- int src_height, int src_stride, int *src_corners,
- int num_src_corners, YV12_BUFFER_CONFIG *ref,
- int bit_depth,
- GlobalMotionEstimationType gm_estimation_type,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions) {
- switch (gm_estimation_type) {
- case GLOBAL_MOTION_FEATURE_BASED:
- return av1_compute_global_motion_feature_based(
- type, src_buffer, src_width, src_height, src_stride, src_corners,
- num_src_corners, ref, bit_depth, num_inliers_by_motion,
- params_by_motion, num_motions);
- case GLOBAL_MOTION_DISFLOW_BASED:
- return av1_compute_global_motion_disflow_based(
- type, src_buffer, src_width, src_height, src_stride, src_corners,
- num_src_corners, ref, bit_depth, num_inliers_by_motion,
- params_by_motion, num_motions);
+// For each global motion method, how many pyramid levels should we allocate?
+// Note that this is a maximum, and fewer levels will be allocated if the frame
+// is not large enough to need all of the specified levels
+const int global_motion_pyr_levels[GLOBAL_MOTION_METHODS] = {
+ 1, // GLOBAL_MOTION_METHOD_FEATURE_MATCH
+ 16, // GLOBAL_MOTION_METHOD_DISFLOW
+};
+
+// clang-format off
+const double kIdentityParams[MAX_PARAMDIM] = {
+ 0.0, 0.0, 1.0, 0.0, 0.0, 1.0
+};
+// clang-format on
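The six-parameter layout matches project_points_affine() in ransac.c: translation in params[0..1], then the 2x2 matrix in row-major order in params[2..5]. A small sketch of applying a model to a point (apply_model is a hypothetical helper):

    static void apply_model(const double *p, double x, double y,
                            double *rx, double *ry) {
      *rx = p[2] * x + p[3] * y + p[0];
      *ry = p[4] * x + p[5] * y + p[1];
    }
    /* With kIdentityParams = { 0, 0, 1, 0, 0, 1 } this returns (x, y). */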
+
+// Compute a global motion model between the given source and ref frames.
+//
+// As is standard for video codecs, the resulting model maps from (x, y)
+// coordinates in `src` to the corresponding points in `ref`, regardless
+// of the temporal order of the two frames.
+//
+// Returns true if global motion estimation succeeded, false if not.
+// The output models should only be used if this function succeeds.
+bool aom_compute_global_motion(TransformationType type, YV12_BUFFER_CONFIG *src,
+ YV12_BUFFER_CONFIG *ref, int bit_depth,
+ GlobalMotionMethod gm_method,
+ MotionModel *motion_models,
+ int num_motion_models) {
+ switch (gm_method) {
+ case GLOBAL_MOTION_METHOD_FEATURE_MATCH:
+ return av1_compute_global_motion_feature_match(
+ type, src, ref, bit_depth, motion_models, num_motion_models);
+ case GLOBAL_MOTION_METHOD_DISFLOW:
+ return av1_compute_global_motion_disflow(
+ type, src, ref, bit_depth, motion_models, num_motion_models);
default: assert(0 && "Unknown global motion estimation type");
}
return 0;
}
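A hedged usage sketch of the new interface (error handling elided; assumes MAX_CORNERS is now provided by corner_detect.h, and that the caller owns the inliers buffer, since ransac() writes inlier (x, y) pairs into motion_models[i].inliers):

    MotionModel model = { 0 };
    model.inliers = aom_malloc(2 * MAX_CORNERS * sizeof(*model.inliers));
    if (model.inliers &&
        aom_compute_global_motion(ROTZOOM, src, ref, bit_depth,
                                  GLOBAL_MOTION_METHOD_DISFLOW, &model, 1)) {
      /* model.params[] holds the fitted model; model.num_inliers > 0. */
    }
    aom_free(model.inliers);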
-
-unsigned char *av1_downconvert_frame(YV12_BUFFER_CONFIG *frm, int bit_depth) {
- int i, j;
- uint16_t *orig_buf = CONVERT_TO_SHORTPTR(frm->y_buffer);
- uint8_t *buf_8bit = frm->y_buffer_8bit;
- assert(buf_8bit);
- if (!frm->buf_8bit_valid) {
- for (i = 0; i < frm->y_height; ++i) {
- for (j = 0; j < frm->y_width; ++j) {
- buf_8bit[i * frm->y_stride + j] =
- orig_buf[i * frm->y_stride + j] >> (bit_depth - 8);
- }
- }
- frm->buf_8bit_valid = 1;
- }
- return buf_8bit;
-}
diff --git a/aom_dsp/flow_estimation/flow_estimation.h b/aom_dsp/flow_estimation/flow_estimation.h
index ab9d328..4f2192c 100644
--- a/aom_dsp/flow_estimation/flow_estimation.h
+++ b/aom_dsp/flow_estimation/flow_estimation.h
@@ -12,6 +12,8 @@
#ifndef AOM_AOM_DSP_FLOW_ESTIMATION_H_
#define AOM_AOM_DSP_FLOW_ESTIMATION_H_
+#include "aom_dsp/pyramid.h"
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_ports/mem.h"
#include "aom_scale/yv12config.h"
@@ -19,8 +21,7 @@
extern "C" {
#endif
-#define MAX_PARAMDIM 9
-#define MAX_CORNERS 4096
+#define MAX_PARAMDIM 6
#define MIN_INLIER_PROB 0.1
/* clang-format off */
@@ -36,27 +37,56 @@
// number of parameters used by each transformation in TransformationTypes
static const int trans_model_params[TRANS_TYPES] = { 0, 2, 4, 6 };
+// Available methods which can be used for global motion estimation
typedef enum {
- GLOBAL_MOTION_FEATURE_BASED,
- GLOBAL_MOTION_DISFLOW_BASED,
-} GlobalMotionEstimationType;
+ GLOBAL_MOTION_METHOD_FEATURE_MATCH,
+ GLOBAL_MOTION_METHOD_DISFLOW,
+ GLOBAL_MOTION_METHOD_LAST = GLOBAL_MOTION_METHOD_DISFLOW,
+ GLOBAL_MOTION_METHODS
+} GlobalMotionMethod;
typedef struct {
- double params[MAX_PARAMDIM - 1];
+ double params[MAX_PARAMDIM];
int *inliers;
int num_inliers;
} MotionModel;
-int aom_compute_global_motion(TransformationType type,
- unsigned char *src_buffer, int src_width,
- int src_height, int src_stride, int *src_corners,
- int num_src_corners, YV12_BUFFER_CONFIG *ref,
- int bit_depth,
- GlobalMotionEstimationType gm_estimation_type,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions);
+// Data structure to store a single correspondence point during global
+// motion search.
+//
+// A correspondence (x, y) -> (rx, ry) means that point (x, y) in the
+// source frame corresponds to point (rx, ry) in the ref frame.
+typedef struct {
+ double x, y;
+ double rx, ry;
+} Correspondence;
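Given that convention, a correspondence is presumably assembled as in this sketch (the real logic is in determine_disflow_correspondence(), outside this excerpt):

    /* A corner at (x, y) in the source frame, with flow (u, v) sampled at
       that position, yields (x, y) -> (x + u, y + v). */
    Correspondence c;
    c.x = x;
    c.y = y;
    c.rx = x + u;
    c.ry = y + v;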
-unsigned char *av1_downconvert_frame(YV12_BUFFER_CONFIG *frm, int bit_depth);
+// For each global motion method, how many pyramid levels should we allocate?
+// Note that this is a maximum, and fewer levels will be allocated if the frame
+// is not large enough to need all of the specified levels
+extern const int global_motion_pyr_levels[GLOBAL_MOTION_METHODS];
+
+// Which global motion method should we use in practice?
+// Disflow is faster and gives better results than feature matching in
+// practically all cases, so we use disflow by default
+static const GlobalMotionMethod default_global_motion_method =
+ GLOBAL_MOTION_METHOD_DISFLOW;
+
+extern const double kIdentityParams[MAX_PARAMDIM];
+
+// Compute a global motion model between the given source and ref frames.
+//
+// As is standard for video codecs, the resulting model maps from (x, y)
+// coordinates in `src` to the corresponding points in `ref`, regardless
+// of the temporal order of the two frames.
+//
+// Returns true if global motion estimation succeeded, false if not.
+// The output models should only be used if this function succeeds.
+bool aom_compute_global_motion(TransformationType type, YV12_BUFFER_CONFIG *src,
+ YV12_BUFFER_CONFIG *ref, int bit_depth,
+ GlobalMotionMethod gm_method,
+ MotionModel *motion_models,
+ int num_motion_models);
#ifdef __cplusplus
}
diff --git a/aom_dsp/flow_estimation/ransac.c b/aom_dsp/flow_estimation/ransac.c
index 8ffc30d..81c5f2c 100644
--- a/aom_dsp/flow_estimation/ransac.c
+++ b/aom_dsp/flow_estimation/ransac.c
@@ -13,37 +13,54 @@
#include <math.h>
#include <time.h>
#include <stdio.h>
-#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
#include <assert.h>
#include "aom_dsp/flow_estimation/ransac.h"
#include "aom_dsp/mathutils.h"
+#include "aom_mem/aom_mem.h"
// TODO(rachelbarker): Remove dependence on code in av1/encoder/
#include "av1/encoder/random.h"
#define MAX_MINPTS 4
-#define MAX_DEGENERATE_ITER 10
#define MINPTS_MULTIPLIER 5
#define INLIER_THRESHOLD 1.25
-#define MIN_TRIALS 20
+#define INLIER_THRESHOLD_SQUARED (INLIER_THRESHOLD * INLIER_THRESHOLD)
+#define NUM_TRIALS 20
+
+// Flag to enable functions for finding TRANSLATION type models.
+//
+// These modes are not considered currently due to a spec bug (see comments
+// in gm_get_motion_vector() in av1/common/mv.h). Thus we don't need to compile
+// the corresponding search functions, but it is nice to keep the source
+// around, disabled, for completeness.
+#define ALLOW_TRANSLATION_MODELS 0
////////////////////////////////////////////////////////////////////////////////
// ransac
-typedef int (*IsDegenerateFunc)(double *p);
-typedef void (*NormalizeFunc)(double *p, int np, double *T);
-typedef void (*DenormalizeFunc)(double *params, double *T1, double *T2);
-typedef int (*FindTransformationFunc)(int points, double *points1,
- double *points2, double *params);
-typedef void (*ProjectPointsDoubleFunc)(double *mat, double *points,
- double *proj, int n, int stride_points,
- int stride_proj);
+typedef bool (*IsDegenerateFunc)(double *p);
+typedef bool (*FindTransformationFunc)(int points, const double *points1,
+ const double *points2, double *params);
+typedef void (*ProjectPointsFunc)(const double *mat, const double *points,
+ double *proj, int n, int stride_points,
+ int stride_proj);
-static void project_points_double_translation(double *mat, double *points,
- double *proj, int n,
- int stride_points,
- int stride_proj) {
+// vtable-like structure which stores all of the information needed by RANSAC
+// for a particular model type
+typedef struct {
+ IsDegenerateFunc is_degenerate;
+ FindTransformationFunc find_transformation;
+ ProjectPointsFunc project_points;
+ int minpts;
+} RansacModelInfo;
+
+#if ALLOW_TRANSLATION_MODELS
+static void project_points_translation(const double *mat, const double *points,
+ double *proj, int n, int stride_points,
+ int stride_proj) {
int i;
for (i = 0; i < n; ++i) {
const double x = *(points++), y = *(points++);
@@ -53,23 +70,11 @@
proj += stride_proj - 2;
}
}
+#endif // ALLOW_TRANSLATION_MODELS
-static void project_points_double_rotzoom(double *mat, double *points,
- double *proj, int n,
- int stride_points, int stride_proj) {
- int i;
- for (i = 0; i < n; ++i) {
- const double x = *(points++), y = *(points++);
- *(proj++) = mat[2] * x + mat[3] * y + mat[0];
- *(proj++) = -mat[3] * x + mat[2] * y + mat[1];
- points += stride_points - 2;
- proj += stride_proj - 2;
- }
-}
-
-static void project_points_double_affine(double *mat, double *points,
- double *proj, int n, int stride_points,
- int stride_proj) {
+static void project_points_affine(const double *mat, const double *points,
+ double *proj, int n, int stride_points,
+ int stride_proj) {
int i;
for (i = 0; i < n; ++i) {
const double x = *(points++), y = *(points++);
@@ -80,261 +85,135 @@
}
}
-static void normalize_homography(double *pts, int n, double *T) {
- double *p = pts;
- double mean[2] = { 0, 0 };
- double msqe = 0;
- double scale;
- int i;
+#if ALLOW_TRANSLATION_MODELS
+static bool find_translation(int np, const double *pts1, const double *pts2,
+ double *params) {
+ double sumx = 0;
+ double sumy = 0;
- assert(n > 0);
- for (i = 0; i < n; ++i, p += 2) {
- mean[0] += p[0];
- mean[1] += p[1];
- }
- mean[0] /= n;
- mean[1] /= n;
- for (p = pts, i = 0; i < n; ++i, p += 2) {
- p[0] -= mean[0];
- p[1] -= mean[1];
- msqe += sqrt(p[0] * p[0] + p[1] * p[1]);
- }
- msqe /= n;
- scale = (msqe == 0 ? 1.0 : sqrt(2) / msqe);
- T[0] = scale;
- T[1] = 0;
- T[2] = -scale * mean[0];
- T[3] = 0;
- T[4] = scale;
- T[5] = -scale * mean[1];
- T[6] = 0;
- T[7] = 0;
- T[8] = 1;
- for (p = pts, i = 0; i < n; ++i, p += 2) {
- p[0] *= scale;
- p[1] *= scale;
- }
-}
-
-static void invnormalize_mat(double *T, double *iT) {
- double is = 1.0 / T[0];
- double m0 = -T[2] * is;
- double m1 = -T[5] * is;
- iT[0] = is;
- iT[1] = 0;
- iT[2] = m0;
- iT[3] = 0;
- iT[4] = is;
- iT[5] = m1;
- iT[6] = 0;
- iT[7] = 0;
- iT[8] = 1;
-}
-
-static void denormalize_homography(double *params, double *T1, double *T2) {
- double iT2[9];
- double params2[9];
- invnormalize_mat(T2, iT2);
- multiply_mat(params, T1, params2, 3, 3, 3);
- multiply_mat(iT2, params2, params, 3, 3, 3);
-}
-
-static void denormalize_affine_reorder(double *params, double *T1, double *T2) {
- double params_denorm[MAX_PARAMDIM];
- params_denorm[0] = params[0];
- params_denorm[1] = params[1];
- params_denorm[2] = params[4];
- params_denorm[3] = params[2];
- params_denorm[4] = params[3];
- params_denorm[5] = params[5];
- params_denorm[6] = params_denorm[7] = 0;
- params_denorm[8] = 1;
- denormalize_homography(params_denorm, T1, T2);
- params[0] = params_denorm[2];
- params[1] = params_denorm[5];
- params[2] = params_denorm[0];
- params[3] = params_denorm[1];
- params[4] = params_denorm[3];
- params[5] = params_denorm[4];
- params[6] = params[7] = 0;
-}
-
-static void denormalize_rotzoom_reorder(double *params, double *T1,
- double *T2) {
- double params_denorm[MAX_PARAMDIM];
- params_denorm[0] = params[0];
- params_denorm[1] = params[1];
- params_denorm[2] = params[2];
- params_denorm[3] = -params[1];
- params_denorm[4] = params[0];
- params_denorm[5] = params[3];
- params_denorm[6] = params_denorm[7] = 0;
- params_denorm[8] = 1;
- denormalize_homography(params_denorm, T1, T2);
- params[0] = params_denorm[2];
- params[1] = params_denorm[5];
- params[2] = params_denorm[0];
- params[3] = params_denorm[1];
- params[4] = -params[3];
- params[5] = params[2];
- params[6] = params[7] = 0;
-}
-
-static void denormalize_translation_reorder(double *params, double *T1,
- double *T2) {
- double params_denorm[MAX_PARAMDIM];
- params_denorm[0] = 1;
- params_denorm[1] = 0;
- params_denorm[2] = params[0];
- params_denorm[3] = 0;
- params_denorm[4] = 1;
- params_denorm[5] = params[1];
- params_denorm[6] = params_denorm[7] = 0;
- params_denorm[8] = 1;
- denormalize_homography(params_denorm, T1, T2);
- params[0] = params_denorm[2];
- params[1] = params_denorm[5];
- params[2] = params[5] = 1;
- params[3] = params[4] = 0;
- params[6] = params[7] = 0;
-}
-
-static int find_translation(int np, double *pts1, double *pts2, double *mat) {
- int i;
- double sx, sy, dx, dy;
- double sumx, sumy;
-
- double T1[9], T2[9];
- normalize_homography(pts1, np, T1);
- normalize_homography(pts2, np, T2);
-
- sumx = 0;
- sumy = 0;
- for (i = 0; i < np; ++i) {
- dx = *(pts2++);
- dy = *(pts2++);
- sx = *(pts1++);
- sy = *(pts1++);
+ for (int i = 0; i < np; ++i) {
+ double dx = *(pts2++);
+ double dy = *(pts2++);
+ double sx = *(pts1++);
+ double sy = *(pts1++);
sumx += dx - sx;
sumy += dy - sy;
}
- mat[0] = sumx / np;
- mat[1] = sumy / np;
- denormalize_translation_reorder(mat, T1, T2);
- return 0;
+
+ params[0] = sumx / np;
+ params[1] = sumy / np;
+ params[2] = 1;
+ params[3] = 0;
+ params[4] = 0;
+ params[5] = 1;
+ return true;
+}
+#endif // ALLOW_TRANSLATION_MODELS
+
+static bool find_rotzoom(int np, const double *pts1, const double *pts2,
+ double *params) {
+ const int n = 4; // Size of least-squares problem
+ double mat[4 * 4]; // Accumulator for A'A
+ double y[4]; // Accumulator for A'b
+ double a[4]; // Single row of A
+ double b; // Single element of b
+
+ least_squares_init(mat, y, n);
+ for (int i = 0; i < np; ++i) {
+ double dx = *(pts2++);
+ double dy = *(pts2++);
+ double sx = *(pts1++);
+ double sy = *(pts1++);
+
+ a[0] = 1;
+ a[1] = 0;
+ a[2] = sx;
+ a[3] = sy;
+ b = dx;
+ least_squares_accumulate(mat, y, a, b, n);
+
+ a[0] = 0;
+ a[1] = 1;
+ a[2] = sy;
+ a[3] = -sx;
+ b = dy;
+ least_squares_accumulate(mat, y, a, b, n);
+ }
+
+ // Fill in params[0] .. params[3] with output model
+ if (!least_squares_solve(mat, y, params, n)) {
+ return false;
+ }
+
+ // Fill in remaining parameters
+ params[4] = -params[3];
+ params[5] = params[2];
+
+ return true;
}
-static int find_rotzoom(int np, double *pts1, double *pts2, double *mat) {
- const int np2 = np * 2;
- double *a = (double *)aom_malloc(sizeof(*a) * (np2 * 5 + 20));
- if (a == NULL) return 1;
- double *b = a + np2 * 4;
- double *temp = b + np2;
- int i;
- double sx, sy, dx, dy;
+static bool find_affine(int np, const double *pts1, const double *pts2,
+ double *params) {
+ // Note: The least squares problem for affine models is 6-dimensional,
+ // but it splits into two independent 3-dimensional subproblems.
+ // Solving these two subproblems separately and recombining at the end
+ // results in less total computation than solving the 6-dimensional
+ // problem directly.
+ //
+ // The two subproblems correspond to all the parameters which contribute
+ // to the x output of the model, and all the parameters which contribute
+ // to the y output, respectively.
- double T1[9], T2[9];
- normalize_homography(pts1, np, T1);
- normalize_homography(pts2, np, T2);
+ const int n = 3; // Size of each least-squares problem
+ double mat[2][3 * 3]; // Accumulator for A'A
+ double y[2][3]; // Accumulator for A'b
+ double x[2][3]; // Output vector
+ double a[2][3]; // Single row of A
+ double b[2]; // Single element of b
- for (i = 0; i < np; ++i) {
- dx = *(pts2++);
- dy = *(pts2++);
- sx = *(pts1++);
- sy = *(pts1++);
+ least_squares_init(mat[0], y[0], n);
+ least_squares_init(mat[1], y[1], n);
+ for (int i = 0; i < np; ++i) {
+ double dx = *(pts2++);
+ double dy = *(pts2++);
+ double sx = *(pts1++);
+ double sy = *(pts1++);
- a[i * 2 * 4 + 0] = sx;
- a[i * 2 * 4 + 1] = sy;
- a[i * 2 * 4 + 2] = 1;
- a[i * 2 * 4 + 3] = 0;
- a[(i * 2 + 1) * 4 + 0] = sy;
- a[(i * 2 + 1) * 4 + 1] = -sx;
- a[(i * 2 + 1) * 4 + 2] = 0;
- a[(i * 2 + 1) * 4 + 3] = 1;
+ a[0][0] = 1;
+ a[0][1] = sx;
+ a[0][2] = sy;
+ b[0] = dx;
+ least_squares_accumulate(mat[0], y[0], a[0], b[0], n);
- b[2 * i] = dx;
- b[2 * i + 1] = dy;
+ a[1][0] = 1;
+ a[1][1] = sx;
+ a[1][2] = sy;
+ b[1] = dy;
+ least_squares_accumulate(mat[1], y[1], a[1], b[1], n);
}
- if (!least_squares(4, a, np2, 4, b, temp, mat)) {
- aom_free(a);
- return 1;
+
+ if (!least_squares_solve(mat[0], y[0], x[0], n)) {
+ return false;
}
- denormalize_rotzoom_reorder(mat, T1, T2);
- aom_free(a);
- return 0;
-}
-
-static int find_affine(int np, double *pts1, double *pts2, double *mat) {
- assert(np > 0);
- const int np2 = np * 2;
- double *a = (double *)aom_malloc(sizeof(*a) * (np2 * 7 + 42));
- if (a == NULL) return 1;
- double *b = a + np2 * 6;
- double *temp = b + np2;
- int i;
- double sx, sy, dx, dy;
-
- double T1[9], T2[9];
- normalize_homography(pts1, np, T1);
- normalize_homography(pts2, np, T2);
-
- for (i = 0; i < np; ++i) {
- dx = *(pts2++);
- dy = *(pts2++);
- sx = *(pts1++);
- sy = *(pts1++);
-
- a[i * 2 * 6 + 0] = sx;
- a[i * 2 * 6 + 1] = sy;
- a[i * 2 * 6 + 2] = 0;
- a[i * 2 * 6 + 3] = 0;
- a[i * 2 * 6 + 4] = 1;
- a[i * 2 * 6 + 5] = 0;
- a[(i * 2 + 1) * 6 + 0] = 0;
- a[(i * 2 + 1) * 6 + 1] = 0;
- a[(i * 2 + 1) * 6 + 2] = sx;
- a[(i * 2 + 1) * 6 + 3] = sy;
- a[(i * 2 + 1) * 6 + 4] = 0;
- a[(i * 2 + 1) * 6 + 5] = 1;
-
- b[2 * i] = dx;
- b[2 * i + 1] = dy;
+ if (!least_squares_solve(mat[1], y[1], x[1], n)) {
+ return false;
}
- if (!least_squares(6, a, np2, 6, b, temp, mat)) {
- aom_free(a);
- return 1;
- }
- denormalize_affine_reorder(mat, T1, T2);
- aom_free(a);
- return 0;
-}
-static int get_rand_indices(int npoints, int minpts, int *indices,
- unsigned int *seed) {
- int i, j;
- int ptr = lcg_rand16(seed) % npoints;
- if (minpts > npoints) return 0;
- indices[0] = ptr;
- ptr = (ptr == npoints - 1 ? 0 : ptr + 1);
- i = 1;
- while (i < minpts) {
- int index = lcg_rand16(seed) % npoints;
- while (index) {
- ptr = (ptr == npoints - 1 ? 0 : ptr + 1);
- for (j = 0; j < i; ++j) {
- if (indices[j] == ptr) break;
- }
- if (j == i) index--;
- }
- indices[i++] = ptr;
- }
- return 1;
+ // Rearrange least squares result to form output model
+ params[0] = x[0][0];
+ params[1] = x[1][0];
+ params[2] = x[0][1];
+ params[3] = x[0][2];
+ params[4] = x[1][1];
+ params[5] = x[1][2];
+
+ return true;
}
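find_rotzoom() and find_affine() both lean on the least_squares_* helpers from aom_dsp/mathutils.h. A sketch of their presumed semantics (not the actual implementation): the accumulate step builds the normal equations one row of A at a time, and the solve step then solves (A'A) x = A'b:

    /* Presumed effect of least_squares_accumulate(mat, y, a, b, n):
       mat accumulates A'A (n x n, row-major) and y accumulates A'b. */
    for (int r = 0; r < n; ++r) {
      for (int c = 0; c < n; ++c) mat[r * n + c] += a[r] * a[c];
      y[r] += a[r] * b;
    }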
typedef struct {
int num_inliers;
- double variance;
+ double sse; // Sum of squared errors of inliers
int *inlier_indices;
} RANSAC_MOTION;
@@ -345,13 +224,13 @@
if (motion_a->num_inliers > motion_b->num_inliers) return -1;
if (motion_a->num_inliers < motion_b->num_inliers) return 1;
- if (motion_a->variance < motion_b->variance) return -1;
- if (motion_a->variance > motion_b->variance) return 1;
+ if (motion_a->sse < motion_b->sse) return -1;
+ if (motion_a->sse > motion_b->sse) return 1;
return 0;
}
-static int is_better_motion(const RANSAC_MOTION *motion_a,
- const RANSAC_MOTION *motion_b) {
+static bool is_better_motion(const RANSAC_MOTION *motion_a,
+ const RANSAC_MOTION *motion_b) {
return compare_motions(motion_a, motion_b) < 0;
}
@@ -364,24 +243,14 @@
}
}
-static const double kInfiniteVariance = 1e12;
-
-static void clear_motion(RANSAC_MOTION *motion, int num_points) {
- motion->num_inliers = 0;
- motion->variance = kInfiniteVariance;
- memset(motion->inlier_indices, 0,
- sizeof(*motion->inlier_indices) * num_points);
-}
-
-static int ransac(const int *matched_points, int npoints,
- int *num_inliers_by_motion, MotionModel *params_by_motion,
- int num_desired_motions, int minpts,
- IsDegenerateFunc is_degenerate,
- FindTransformationFunc find_transformation,
- ProjectPointsDoubleFunc projectpoints) {
- int trial_count = 0;
+// Returns true on success, false on error
+static bool ransac_internal(const Correspondence *matched_points, int npoints,
+ MotionModel *motion_models, int num_desired_motions,
+ const RansacModelInfo *model_info) {
+ assert(npoints >= 0);
int i = 0;
- int ret_val = 0;
+ int minpts = model_info->minpts;
+ bool ret_val = true;
unsigned int seed = (unsigned int)npoints;
@@ -389,7 +258,7 @@
double *points1, *points2;
double *corners1, *corners2;
- double *image1_coord;
+ double *projected_corners;
// Store information for the num_desired_motions best transformations found
// and the worst motion among them, as well as the motion currently under
@@ -401,123 +270,115 @@
// currently under consideration.
double params_this_motion[MAX_PARAMDIM];
- double *cnp1, *cnp2;
-
- for (i = 0; i < num_desired_motions; ++i) {
- num_inliers_by_motion[i] = 0;
- }
if (npoints < minpts * MINPTS_MULTIPLIER || npoints == 0) {
- return 1;
+ return false;
}
+ int min_inliers = AOMMAX((int)(MIN_INLIER_PROB * npoints), minpts);
+
points1 = (double *)aom_malloc(sizeof(*points1) * npoints * 2);
points2 = (double *)aom_malloc(sizeof(*points2) * npoints * 2);
corners1 = (double *)aom_malloc(sizeof(*corners1) * npoints * 2);
corners2 = (double *)aom_malloc(sizeof(*corners2) * npoints * 2);
- image1_coord = (double *)aom_malloc(sizeof(*image1_coord) * npoints * 2);
+ projected_corners =
+ (double *)aom_malloc(sizeof(*projected_corners) * npoints * 2);
motions =
(RANSAC_MOTION *)aom_calloc(num_desired_motions, sizeof(RANSAC_MOTION));
- current_motion.inlier_indices =
- (int *)aom_malloc(sizeof(*current_motion.inlier_indices) * npoints);
- if (!(points1 && points2 && corners1 && corners2 && image1_coord && motions &&
- current_motion.inlier_indices)) {
- ret_val = 1;
+
+ // Allocate one large buffer which will be carved up to store the inlier
+ // indices for the current motion plus each of the num_desired_motions
+ // output models
+ // This allows us to keep the allocation/deallocation logic simple, without
+ // having to (for example) check that `motions` is non-null before allocating
+ // the inlier arrays
+ int *inlier_buffer = (int *)aom_malloc(sizeof(*inlier_buffer) * npoints *
+ (num_desired_motions + 1));
+
+ if (!(points1 && points2 && corners1 && corners2 && projected_corners &&
+ motions && inlier_buffer)) {
+ ret_val = false;
goto finish_ransac;
}
- for (i = 0; i < num_desired_motions; ++i) {
- motions[i].inlier_indices =
- (int *)aom_malloc(sizeof(*motions->inlier_indices) * npoints);
- if (!motions[i].inlier_indices) {
- ret_val = 1;
- goto finish_ransac;
- }
- clear_motion(motions + i, npoints);
- }
- clear_motion(&current_motion, npoints);
-
+ // Once all our allocations are known-good, we can fill in our structures
worst_kept_motion = motions;
- cnp1 = corners1;
- cnp2 = corners2;
+ for (i = 0; i < num_desired_motions; ++i) {
+ motions[i].inlier_indices = inlier_buffer + i * npoints;
+ }
+ memset(&current_motion, 0, sizeof(current_motion));
+ current_motion.inlier_indices = inlier_buffer + num_desired_motions * npoints;
+
for (i = 0; i < npoints; ++i) {
- *(cnp1++) = *(matched_points++);
- *(cnp1++) = *(matched_points++);
- *(cnp2++) = *(matched_points++);
- *(cnp2++) = *(matched_points++);
+ corners1[2 * i + 0] = matched_points[i].x;
+ corners1[2 * i + 1] = matched_points[i].y;
+ corners2[2 * i + 0] = matched_points[i].rx;
+ corners2[2 * i + 1] = matched_points[i].ry;
}
- while (MIN_TRIALS > trial_count) {
- double sum_distance = 0.0;
- double sum_distance_squared = 0.0;
+ for (int trial_count = 0; trial_count < NUM_TRIALS; trial_count++) {
+ lcg_pick(npoints, minpts, indices, &seed);
- clear_motion(&current_motion, npoints);
+ copy_points_at_indices(points1, corners1, indices, minpts);
+ copy_points_at_indices(points2, corners2, indices, minpts);
- int degenerate = 1;
- int num_degenerate_iter = 0;
-
- while (degenerate) {
- num_degenerate_iter++;
- if (!get_rand_indices(npoints, minpts, indices, &seed)) {
- ret_val = 1;
- goto finish_ransac;
- }
-
- copy_points_at_indices(points1, corners1, indices, minpts);
- copy_points_at_indices(points2, corners2, indices, minpts);
-
- degenerate = is_degenerate(points1);
- if (num_degenerate_iter > MAX_DEGENERATE_ITER) {
- ret_val = 1;
- goto finish_ransac;
- }
- }
-
- if (find_transformation(minpts, points1, points2, params_this_motion)) {
- trial_count++;
+ if (model_info->is_degenerate(points1)) {
continue;
}
- projectpoints(params_this_motion, corners1, image1_coord, npoints, 2, 2);
+ if (!model_info->find_transformation(minpts, points1, points2,
+ params_this_motion)) {
+ continue;
+ }
+ model_info->project_points(params_this_motion, corners1, projected_corners,
+ npoints, 2, 2);
+
+ current_motion.num_inliers = 0;
+ double sse = 0.0;
for (i = 0; i < npoints; ++i) {
- double dx = image1_coord[i * 2] - corners2[i * 2];
- double dy = image1_coord[i * 2 + 1] - corners2[i * 2 + 1];
- double distance = sqrt(dx * dx + dy * dy);
+ double dx = projected_corners[i * 2] - corners2[i * 2];
+ double dy = projected_corners[i * 2 + 1] - corners2[i * 2 + 1];
+ double squared_error = dx * dx + dy * dy;
- if (distance < INLIER_THRESHOLD) {
+ if (squared_error < INLIER_THRESHOLD_SQUARED) {
current_motion.inlier_indices[current_motion.num_inliers++] = i;
- sum_distance += distance;
- sum_distance_squared += distance * distance;
+ sse += squared_error;
}
}
- if (current_motion.num_inliers >= worst_kept_motion->num_inliers &&
- current_motion.num_inliers > 1) {
- double mean_distance;
- mean_distance = sum_distance / ((double)current_motion.num_inliers);
- current_motion.variance =
- sum_distance_squared / ((double)current_motion.num_inliers - 1.0) -
- mean_distance * mean_distance * ((double)current_motion.num_inliers) /
- ((double)current_motion.num_inliers - 1.0);
- if (is_better_motion(&current_motion, worst_kept_motion)) {
- // This motion is better than the worst currently kept motion. Remember
- // the inlier points and variance. The parameters for each kept motion
- // will be recomputed later using only the inliers.
- worst_kept_motion->num_inliers = current_motion.num_inliers;
- worst_kept_motion->variance = current_motion.variance;
- memcpy(worst_kept_motion->inlier_indices, current_motion.inlier_indices,
- sizeof(*current_motion.inlier_indices) * npoints);
- assert(npoints > 0);
- // Determine the new worst kept motion and its num_inliers and variance.
- for (i = 0; i < num_desired_motions; ++i) {
- if (is_better_motion(worst_kept_motion, &motions[i])) {
- worst_kept_motion = &motions[i];
- }
+ if (current_motion.num_inliers < min_inliers) {
+ // Reject models with too few inliers
+ continue;
+ }
+
+ current_motion.sse = sse;
+ if (is_better_motion(&current_motion, worst_kept_motion)) {
+ // This motion is better than the worst currently kept motion. Remember
+ // the inlier points and sse. The parameters for each kept motion
+ // will be recomputed later using only the inliers.
+ worst_kept_motion->num_inliers = current_motion.num_inliers;
+ worst_kept_motion->sse = current_motion.sse;
+
+ // Rather than copying the (potentially many) inlier indices from
+ // current_motion.inlier_indices to worst_kept_motion->inlier_indices,
+ // we can swap the underlying pointers.
+ //
+ // This is okay because current_motion.inlier_indices is not used again
+ // until the next trial, where its previous contents are ignored
+ // anyway. And both arrays will be deallocated together at the
+ // end of this function, so there are no lifetime issues.
+ int *tmp = worst_kept_motion->inlier_indices;
+ worst_kept_motion->inlier_indices = current_motion.inlier_indices;
+ current_motion.inlier_indices = tmp;
+
+ // Determine the new worst kept motion and its num_inliers and sse.
+ for (i = 0; i < num_desired_motions; ++i) {
+ if (is_better_motion(worst_kept_motion, &motions[i])) {
+ worst_kept_motion = &motions[i];
}
}
}
- trial_count++;
}
// Sort the motions, best first.
@@ -525,310 +386,96 @@
// Recompute the motions using only the inliers.
for (i = 0; i < num_desired_motions; ++i) {
- if (motions[i].num_inliers >= minpts) {
+ int num_inliers = motions[i].num_inliers;
+ if (num_inliers > 0) {
+ assert(num_inliers >= minpts);
+
copy_points_at_indices(points1, corners1, motions[i].inlier_indices,
- motions[i].num_inliers);
+ num_inliers);
copy_points_at_indices(points2, corners2, motions[i].inlier_indices,
- motions[i].num_inliers);
+ num_inliers);
- find_transformation(motions[i].num_inliers, points1, points2,
- params_by_motion[i].params);
+ if (!model_info->find_transformation(num_inliers, points1, points2,
+ motion_models[i].params)) {
+ // In the unlikely event that this model fitting fails,
+ // we don't have a good fallback. So just clear the output
+ // model and move on
+ memcpy(motion_models[i].params, kIdentityParams,
+ MAX_PARAMDIM * sizeof(*(motion_models[i].params)));
+ motion_models[i].num_inliers = 0;
+ continue;
+ }
- params_by_motion[i].num_inliers = motions[i].num_inliers;
- memcpy(params_by_motion[i].inliers, motions[i].inlier_indices,
- sizeof(*motions[i].inlier_indices) * npoints);
- num_inliers_by_motion[i] = motions[i].num_inliers;
+ // Populate inliers array
+ for (int j = 0; j < num_inliers; j++) {
+ int index = motions[i].inlier_indices[j];
+ const Correspondence *corr = &matched_points[index];
+ motion_models[i].inliers[2 * j + 0] = (int)rint(corr->x);
+ motion_models[i].inliers[2 * j + 1] = (int)rint(corr->y);
+ }
+ motion_models[i].num_inliers = num_inliers;
+ } else {
+ memcpy(motion_models[i].params, kIdentityParams,
+ MAX_PARAMDIM * sizeof(*(motion_models[i].params)));
+ motion_models[i].num_inliers = 0;
}
}
finish_ransac:
- aom_free(points1);
- aom_free(points2);
- aom_free(corners1);
+ aom_free(inlier_buffer);
+ aom_free(motions);
+ aom_free(projected_corners);
aom_free(corners2);
- aom_free(image1_coord);
- aom_free(current_motion.inlier_indices);
- if (motions) {
- for (i = 0; i < num_desired_motions; ++i) {
- aom_free(motions[i].inlier_indices);
- }
- aom_free(motions);
- }
+ aom_free(corners1);
+ aom_free(points2);
+ aom_free(points1);
return ret_val;
}
-static int ransac_double_prec(const double *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions, int minpts,
- IsDegenerateFunc is_degenerate,
- FindTransformationFunc find_transformation,
- ProjectPointsDoubleFunc projectpoints) {
- int trial_count = 0;
- int i = 0;
- int ret_val = 0;
-
- unsigned int seed = (unsigned int)npoints;
-
- int indices[MAX_MINPTS] = { 0 };
-
- double *points1, *points2;
- double *corners1, *corners2;
- double *image1_coord;
-
- // Store information for the num_desired_motions best transformations found
- // and the worst motion among them, as well as the motion currently under
- // consideration.
- RANSAC_MOTION *motions, *worst_kept_motion = NULL;
- RANSAC_MOTION current_motion;
-
- // Store the parameters and the indices of the inlier points for the motion
- // currently under consideration.
- double params_this_motion[MAX_PARAMDIM];
-
- double *cnp1, *cnp2;
-
- for (i = 0; i < num_desired_motions; ++i) {
- num_inliers_by_motion[i] = 0;
- }
- if (npoints < minpts * MINPTS_MULTIPLIER || npoints == 0) {
- return 1;
- }
-
- points1 = (double *)aom_malloc(sizeof(*points1) * npoints * 2);
- points2 = (double *)aom_malloc(sizeof(*points2) * npoints * 2);
- corners1 = (double *)aom_malloc(sizeof(*corners1) * npoints * 2);
- corners2 = (double *)aom_malloc(sizeof(*corners2) * npoints * 2);
- image1_coord = (double *)aom_malloc(sizeof(*image1_coord) * npoints * 2);
- motions =
- (RANSAC_MOTION *)aom_calloc(num_desired_motions, sizeof(RANSAC_MOTION));
- current_motion.inlier_indices =
- (int *)aom_malloc(sizeof(*current_motion.inlier_indices) * npoints);
- if (!(points1 && points2 && corners1 && corners2 && image1_coord && motions &&
- current_motion.inlier_indices)) {
- ret_val = 1;
- goto finish_ransac;
- }
-
- for (i = 0; i < num_desired_motions; ++i) {
- motions[i].inlier_indices =
- (int *)aom_malloc(sizeof(*motions->inlier_indices) * npoints);
- if (!motions[i].inlier_indices) {
- ret_val = 1;
- goto finish_ransac;
- }
- clear_motion(motions + i, npoints);
- }
- clear_motion(&current_motion, npoints);
-
- worst_kept_motion = motions;
-
- cnp1 = corners1;
- cnp2 = corners2;
- for (i = 0; i < npoints; ++i) {
- *(cnp1++) = *(matched_points++);
- *(cnp1++) = *(matched_points++);
- *(cnp2++) = *(matched_points++);
- *(cnp2++) = *(matched_points++);
- }
-
- while (MIN_TRIALS > trial_count) {
- double sum_distance = 0.0;
- double sum_distance_squared = 0.0;
-
- clear_motion(&current_motion, npoints);
-
- int degenerate = 1;
- int num_degenerate_iter = 0;
-
- while (degenerate) {
- num_degenerate_iter++;
- if (!get_rand_indices(npoints, minpts, indices, &seed)) {
- ret_val = 1;
- goto finish_ransac;
- }
-
- copy_points_at_indices(points1, corners1, indices, minpts);
- copy_points_at_indices(points2, corners2, indices, minpts);
-
- degenerate = is_degenerate(points1);
- if (num_degenerate_iter > MAX_DEGENERATE_ITER) {
- ret_val = 1;
- goto finish_ransac;
- }
- }
-
- if (find_transformation(minpts, points1, points2, params_this_motion)) {
- trial_count++;
- continue;
- }
-
- projectpoints(params_this_motion, corners1, image1_coord, npoints, 2, 2);
-
- for (i = 0; i < npoints; ++i) {
- double dx = image1_coord[i * 2] - corners2[i * 2];
- double dy = image1_coord[i * 2 + 1] - corners2[i * 2 + 1];
- double distance = sqrt(dx * dx + dy * dy);
-
- if (distance < INLIER_THRESHOLD) {
- current_motion.inlier_indices[current_motion.num_inliers++] = i;
- sum_distance += distance;
- sum_distance_squared += distance * distance;
- }
- }
-
- if (current_motion.num_inliers >= worst_kept_motion->num_inliers &&
- current_motion.num_inliers > 1) {
- double mean_distance;
- mean_distance = sum_distance / ((double)current_motion.num_inliers);
- current_motion.variance =
- sum_distance_squared / ((double)current_motion.num_inliers - 1.0) -
- mean_distance * mean_distance * ((double)current_motion.num_inliers) /
- ((double)current_motion.num_inliers - 1.0);
- if (is_better_motion(&current_motion, worst_kept_motion)) {
- // This motion is better than the worst currently kept motion. Remember
- // the inlier points and variance. The parameters for each kept motion
- // will be recomputed later using only the inliers.
- worst_kept_motion->num_inliers = current_motion.num_inliers;
- worst_kept_motion->variance = current_motion.variance;
- memcpy(worst_kept_motion->inlier_indices, current_motion.inlier_indices,
- sizeof(*current_motion.inlier_indices) * npoints);
- assert(npoints > 0);
- // Determine the new worst kept motion and its num_inliers and variance.
- for (i = 0; i < num_desired_motions; ++i) {
- if (is_better_motion(worst_kept_motion, &motions[i])) {
- worst_kept_motion = &motions[i];
- }
- }
- }
- }
- trial_count++;
- }
-
- // Sort the motions, best first.
- qsort(motions, num_desired_motions, sizeof(RANSAC_MOTION), compare_motions);
-
- // Recompute the motions using only the inliers.
- for (i = 0; i < num_desired_motions; ++i) {
- if (motions[i].num_inliers >= minpts) {
- copy_points_at_indices(points1, corners1, motions[i].inlier_indices,
- motions[i].num_inliers);
- copy_points_at_indices(points2, corners2, motions[i].inlier_indices,
- motions[i].num_inliers);
-
- find_transformation(motions[i].num_inliers, points1, points2,
- params_by_motion[i].params);
- memcpy(params_by_motion[i].inliers, motions[i].inlier_indices,
- sizeof(*motions[i].inlier_indices) * npoints);
- }
- num_inliers_by_motion[i] = motions[i].num_inliers;
- }
-
-finish_ransac:
- aom_free(points1);
- aom_free(points2);
- aom_free(corners1);
- aom_free(corners2);
- aom_free(image1_coord);
- aom_free(current_motion.inlier_indices);
- if (motions) {
- for (i = 0; i < num_desired_motions; ++i) {
- aom_free(motions[i].inlier_indices);
- }
- aom_free(motions);
- }
-
- return ret_val;
-}
-
-static int is_collinear3(double *p1, double *p2, double *p3) {
+static bool is_collinear3(double *p1, double *p2, double *p3) {
static const double collinear_eps = 1e-3;
const double v =
(p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0]);
return fabs(v) < collinear_eps;
}
-static int is_degenerate_translation(double *p) {
+#if ALLOW_TRANSLATION_MODELS
+static bool is_degenerate_translation(double *p) {
return (p[0] - p[2]) * (p[0] - p[2]) + (p[1] - p[3]) * (p[1] - p[3]) <= 2;
}
+#endif // ALLOW_TRANSLATION_MODELS
-static int is_degenerate_affine(double *p) {
+static bool is_degenerate_affine(double *p) {
return is_collinear3(p, p + 2, p + 4);
}
-static int ransac_translation(int *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3,
- is_degenerate_translation, find_translation,
- project_points_double_translation);
-}
+static const RansacModelInfo ransac_model_info[TRANS_TYPES] = {
+ // IDENTITY
+ { NULL, NULL, NULL, 0 },
+// TRANSLATION
+#if ALLOW_TRANSLATION_MODELS
+ { is_degenerate_translation, find_translation, project_points_translation,
+ 3 },
+#else
+ { NULL, NULL, NULL, 0 },
+#endif
+ // ROTZOOM
+ { is_degenerate_affine, find_rotzoom, project_points_affine, 3 },
+ // AFFINE
+ { is_degenerate_affine, find_affine, project_points_affine, 3 },
+};
-static int ransac_rotzoom(int *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3, is_degenerate_affine,
- find_rotzoom, project_points_double_rotzoom);
-}
+// Returns true on success, false on error
+bool ransac(const Correspondence *matched_points, int npoints,
+ TransformationType type, MotionModel *motion_models,
+ int num_desired_motions) {
+#if ALLOW_TRANSLATION_MODELS
+ assert(type > IDENTITY && type < TRANS_TYPES);
+#else
+ assert(type > TRANSLATION && type < TRANS_TYPES);
+#endif // ALLOW_TRANSLATION_MODELS
-static int ransac_affine(int *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3, is_degenerate_affine,
- find_affine, project_points_double_affine);
-}
-
-RansacFunc av1_get_ransac_type(TransformationType type) {
- switch (type) {
- case AFFINE: return ransac_affine;
- case ROTZOOM: return ransac_rotzoom;
- case TRANSLATION: return ransac_translation;
- default: assert(0); return NULL;
- }
-}
-
-static int ransac_translation_double_prec(double *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac_double_prec(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3,
- is_degenerate_translation, find_translation,
- project_points_double_translation);
-}
-
-static int ransac_rotzoom_double_prec(double *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac_double_prec(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3,
- is_degenerate_affine, find_rotzoom,
- project_points_double_rotzoom);
-}
-
-static int ransac_affine_double_prec(double *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion,
- int num_desired_motions) {
- return ransac_double_prec(matched_points, npoints, num_inliers_by_motion,
- params_by_motion, num_desired_motions, 3,
- is_degenerate_affine, find_affine,
- project_points_double_affine);
-}
-
-RansacFuncDouble av1_get_ransac_double_prec_type(TransformationType type) {
- switch (type) {
- case AFFINE: return ransac_affine_double_prec;
- case ROTZOOM: return ransac_rotzoom_double_prec;
- case TRANSLATION: return ransac_translation_double_prec;
- default: assert(0); return NULL;
- }
+ return ransac_internal(matched_points, npoints, motion_models,
+ num_desired_motions, &ransac_model_info[type]);
}
diff --git a/aom_dsp/flow_estimation/ransac.h b/aom_dsp/flow_estimation/ransac.h
index aa3a243..6047580 100644
--- a/aom_dsp/flow_estimation/ransac.h
+++ b/aom_dsp/flow_estimation/ransac.h
@@ -16,6 +16,7 @@
#include <stdlib.h>
#include <math.h>
#include <memory.h>
+#include <stdbool.h>
#include "aom_dsp/flow_estimation/flow_estimation.h"
@@ -23,14 +24,9 @@
extern "C" {
#endif
-typedef int (*RansacFunc)(int *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions);
-typedef int (*RansacFuncDouble)(double *matched_points, int npoints,
- int *num_inliers_by_motion,
- MotionModel *params_by_motion, int num_motions);
-RansacFunc av1_get_ransac_type(TransformationType type);
-RansacFuncDouble av1_get_ransac_double_prec_type(TransformationType type);
+bool ransac(const Correspondence *matched_points, int npoints,
+ TransformationType type, MotionModel *motion_models,
+ int num_desired_motions);
#ifdef __cplusplus
}
diff --git a/aom_dsp/flow_estimation/x86/corner_match_avx2.c b/aom_dsp/flow_estimation/x86/corner_match_avx2.c
index 9830ad8..87c76fa 100644
--- a/aom_dsp/flow_estimation/x86/corner_match_avx2.c
+++ b/aom_dsp/flow_estimation/x86/corner_match_avx2.c
@@ -24,12 +24,13 @@
#error "Need to change byte_mask in corner_match_sse4.c if MATCH_SZ != 13"
#endif
-/* Compute corr(im1, im2) * MATCH_SZ * stddev(im1), where the
+/* Compute corr(frame1, frame2) * MATCH_SZ * stddev(frame1), where the
correlation/standard deviation are taken over MATCH_SZ by MATCH_SZ windows
of each image, centered at (x1, y1) and (x2, y2) respectively.
*/
-double av1_compute_cross_correlation_avx2(unsigned char *im1, int stride1,
- int x1, int y1, unsigned char *im2,
+double av1_compute_cross_correlation_avx2(const unsigned char *frame1,
+ int stride1, int x1, int y1,
+ const unsigned char *frame2,
int stride2, int x2, int y2) {
int i, stride1_i = 0, stride2_i = 0;
__m256i temp1, sum_vec, sumsq2_vec, cross_vec, v, v1_1, v2_1;
@@ -41,13 +42,13 @@
sumsq2_vec = zero;
cross_vec = zero;
- im1 += (y1 - MATCH_SZ_BY2) * stride1 + (x1 - MATCH_SZ_BY2);
- im2 += (y2 - MATCH_SZ_BY2) * stride2 + (x2 - MATCH_SZ_BY2);
+ frame1 += (y1 - MATCH_SZ_BY2) * stride1 + (x1 - MATCH_SZ_BY2);
+ frame2 += (y2 - MATCH_SZ_BY2) * stride2 + (x2 - MATCH_SZ_BY2);
for (i = 0; i < MATCH_SZ; ++i) {
- v1 = _mm_and_si128(_mm_loadu_si128((__m128i *)&im1[stride1_i]), mask);
+ v1 = _mm_and_si128(_mm_loadu_si128((__m128i *)&frame1[stride1_i]), mask);
v1_1 = _mm256_cvtepu8_epi16(v1);
- v2 = _mm_and_si128(_mm_loadu_si128((__m128i *)&im2[stride2_i]), mask);
+ v2 = _mm_and_si128(_mm_loadu_si128((__m128i *)&frame2[stride2_i]), mask);
v2_1 = _mm256_cvtepu8_epi16(v2);
v = _mm256_insertf128_si256(_mm256_castsi128_si256(v1), v2, 1);
diff --git a/aom_dsp/flow_estimation/x86/corner_match_sse4.c b/aom_dsp/flow_estimation/x86/corner_match_sse4.c
index 40eec6c..b3cb5bc 100644
--- a/aom_dsp/flow_estimation/x86/corner_match_sse4.c
+++ b/aom_dsp/flow_estimation/x86/corner_match_sse4.c
@@ -28,12 +28,13 @@
#error "Need to change byte_mask in corner_match_sse4.c if MATCH_SZ != 13"
#endif
-/* Compute corr(im1, im2) * MATCH_SZ * stddev(im1), where the
+/* Compute corr(frame1, frame2) * MATCH_SZ * stddev(frame1), where the
correlation/standard deviation are taken over MATCH_SZ by MATCH_SZ windows
of each image, centered at (x1, y1) and (x2, y2) respectively.
*/
-double av1_compute_cross_correlation_sse4_1(unsigned char *im1, int stride1,
- int x1, int y1, unsigned char *im2,
+double av1_compute_cross_correlation_sse4_1(const unsigned char *frame1,
+ int stride1, int x1, int y1,
+ const unsigned char *frame2,
int stride2, int x2, int y2) {
int i;
// 2 16-bit partial sums in lanes 0, 4 (== 2 32-bit partial sums in lanes 0,
@@ -47,14 +48,14 @@
const __m128i mask = _mm_load_si128((__m128i *)byte_mask);
const __m128i zero = _mm_setzero_si128();
- im1 += (y1 - MATCH_SZ_BY2) * stride1 + (x1 - MATCH_SZ_BY2);
- im2 += (y2 - MATCH_SZ_BY2) * stride2 + (x2 - MATCH_SZ_BY2);
+ frame1 += (y1 - MATCH_SZ_BY2) * stride1 + (x1 - MATCH_SZ_BY2);
+ frame2 += (y2 - MATCH_SZ_BY2) * stride2 + (x2 - MATCH_SZ_BY2);
for (i = 0; i < MATCH_SZ; ++i) {
const __m128i v1 =
- _mm_and_si128(_mm_loadu_si128((__m128i *)&im1[i * stride1]), mask);
+ _mm_and_si128(_mm_loadu_si128((__m128i *)&frame1[i * stride1]), mask);
const __m128i v2 =
- _mm_and_si128(_mm_loadu_si128((__m128i *)&im2[i * stride2]), mask);
+ _mm_and_si128(_mm_loadu_si128((__m128i *)&frame2[i * stride2]), mask);
// Using the 'sad' intrinsic here is a bit faster than adding
// v1_l + v1_r and v2_l + v2_r, plus it avoids the need for a 16->32 bit
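
Both SIMD kernels compute the same window statistic described in the comments above. As a reading aid, here is a scalar sketch of that computation, modeled loosely on the C reference in corner_match.c (treat the exact normalization as an assumption rather than a verbatim copy of the reference):

    #include <math.h>

    #define MATCH_SZ 13
    #define MATCH_SZ_BY2 ((MATCH_SZ - 1) / 2)
    #define MATCH_SZ_SQ (MATCH_SZ * MATCH_SZ)

    // Returns the covariance of the two windows divided by the second
    // window's standard deviation (all in raw-sum units), i.e. a quantity
    // proportional to corr(frame1, frame2) * stddev(frame1).
    static double cross_correlation_ref(const unsigned char *frame1,
                                        int stride1, int x1, int y1,
                                        const unsigned char *frame2,
                                        int stride2, int x2, int y2) {
      int sum1 = 0, sum2 = 0, sumsq2 = 0, cross = 0;
      for (int i = 0; i < MATCH_SZ; ++i) {
        for (int j = 0; j < MATCH_SZ; ++j) {
          const int v1 = frame1[(y1 - MATCH_SZ_BY2 + i) * stride1 +
                                (x1 - MATCH_SZ_BY2 + j)];
          const int v2 = frame2[(y2 - MATCH_SZ_BY2 + i) * stride2 +
                                (x2 - MATCH_SZ_BY2 + j)];
          sum1 += v1;
          sum2 += v2;
          sumsq2 += v2 * v2;
          cross += v1 * v2;
        }
      }
      const int var2 = sumsq2 * MATCH_SZ_SQ - sum2 * sum2;
      const int cov = cross * MATCH_SZ_SQ - sum1 * sum2;
      return cov / sqrt((double)var2);
    }
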
diff --git a/aom_dsp/flow_estimation/x86/disflow_sse4.c b/aom_dsp/flow_estimation/x86/disflow_sse4.c
new file mode 100644
index 0000000..a62e9a4
--- /dev/null
+++ b/aom_dsp/flow_estimation/x86/disflow_sse4.c
@@ -0,0 +1,560 @@
+/*
+ * Copyright (c) 2022, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 3-Clause Clear License
+ * and the Alliance for Open Media Patent License 1.0. If the BSD 3-Clause Clear
+ * License was not distributed with this source code in the LICENSE file, you
+ * can obtain it at aomedia.org/license/software-license/bsd-3-c-c/. If the
+ * Alliance for Open Media Patent License 1.0 was not distributed with this
+ * source code in the PATENTS file, you can obtain it at
+ * aomedia.org/license/patent-license/.
+ */
+
+#include <assert.h>
+#include <math.h>
+#include <smmintrin.h>
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/flow_estimation/disflow.h"
+#include "aom_dsp/x86/synonyms.h"
+
+#include "config/aom_dsp_rtcd.h"
+
+// Internal cross-check against C code
+// If you set this to 1 and compile in debug mode, then the outputs of the two
+// convolution stages will be checked against the plain C version of the code,
+// and an assertion will fire if the results differ.
+#define CHECK_RESULTS 0
+
+// Note: Max sum(+ve coefficients) = 1.125 * scale
+static INLINE void get_cubic_kernel_dbl(double x, double *kernel) {
+ assert(0 <= x && x < 1);
+ double x2 = x * x;
+ double x3 = x2 * x;
+ kernel[0] = -0.5 * x + x2 - 0.5 * x3;
+ kernel[1] = 1.0 - 2.5 * x2 + 1.5 * x3;
+ kernel[2] = 0.5 * x + 2.0 * x2 - 1.5 * x3;
+ kernel[3] = -0.5 * x2 + 0.5 * x3;
+}
+
+static INLINE void get_cubic_kernel_int(double x, int16_t *kernel) {
+ double kernel_dbl[4];
+ get_cubic_kernel_dbl(x, kernel_dbl);
+
+ kernel[0] = (int16_t)rint(kernel_dbl[0] * (1 << DISFLOW_INTERP_BITS));
+ kernel[1] = (int16_t)rint(kernel_dbl[1] * (1 << DISFLOW_INTERP_BITS));
+ kernel[2] = (int16_t)rint(kernel_dbl[2] * (1 << DISFLOW_INTERP_BITS));
+ kernel[3] = (int16_t)rint(kernel_dbl[3] * (1 << DISFLOW_INTERP_BITS));
+}
+
+#if CHECK_RESULTS
+static INLINE int get_cubic_value_int(const int *p, const int16_t *kernel) {
+ return kernel[0] * p[0] + kernel[1] * p[1] + kernel[2] * p[2] +
+ kernel[3] * p[3];
+}
+#endif // CHECK_RESULTS
+
+// Compare two regions of width x height pixels, one rooted at position
+// (x, y) in src and the other at (x + u, y + v) in ref.
+// This function returns the sum of squared pixel differences between
+// the two regions.
+//
+// TODO(rachelbarker): Test speed/quality impact of using bilinear interpolation
+// instead of bicubic interpolation
+static INLINE void compute_flow_error(const uint8_t *src, const uint8_t *ref,
+ int width, int height, int stride, int x,
+ int y, double u, double v, int16_t *dt) {
+ // This function is written to do 8x8 convolutions only
+ assert(DISFLOW_PATCH_SIZE == 8);
+
+ // Split offset into integer and fractional parts, and compute cubic
+ // interpolation kernels
+ const int u_int = (int)floor(u);
+ const int v_int = (int)floor(v);
+ const double u_frac = u - floor(u);
+ const double v_frac = v - floor(v);
+
+ int16_t h_kernel[4];
+ int16_t v_kernel[4];
+ get_cubic_kernel_int(u_frac, h_kernel);
+ get_cubic_kernel_int(v_frac, v_kernel);
+
+ // Storage for intermediate values between the two convolution directions
+ int16_t tmp_[DISFLOW_PATCH_SIZE * (DISFLOW_PATCH_SIZE + 3)];
+ int16_t *tmp = tmp_ + DISFLOW_PATCH_SIZE; // Offset by one row
+
+ // Clamp coordinates so that all pixels we fetch will remain within the
+ // allocated border region, but allow them to go far enough out that
+ // the border pixels' values do not change.
+ // Since we are calculating an 8x8 block, the bottom-right pixel
+ // in the block has coordinates (x0 + 7, y0 + 7). Then, the cubic
+ // interpolation has 4 taps, meaning that the output of pixel
+ // (x_w, y_w) depends on the pixels in the range
+ // ([x_w - 1, x_w + 2], [y_w - 1, y_w + 2]).
+ //
+ // Thus the most extreme coordinates which will be fetched are
+ // (x0 - 1, y0 - 1) and (x0 + 9, y0 + 9).
+ const int x0 = clamp(x + u_int, -9, width);
+ const int y0 = clamp(y + v_int, -9, height);
+
+ // Horizontal convolution
+
+ // Prepare the kernel vectors
+ // We split the kernel into two vectors with kernel indices:
+ // 0, 1, 0, 1, 0, 1, 0, 1, and
+ // 2, 3, 2, 3, 2, 3, 2, 3
+ __m128i h_kernel_01 = xx_set2_epi16(h_kernel[0], h_kernel[1]);
+ __m128i h_kernel_23 = xx_set2_epi16(h_kernel[2], h_kernel[3]);
+
+ __m128i round_const_h = _mm_set1_epi32(1 << (DISFLOW_INTERP_BITS - 6 - 1));
+
+ for (int i = -1; i < DISFLOW_PATCH_SIZE + 2; ++i) {
+ const int y_w = y0 + i;
+ const uint8_t *ref_row = &ref[y_w * stride + (x0 - 1)];
+ int16_t *tmp_row = &tmp[i * DISFLOW_PATCH_SIZE];
+
+ // Load this row of pixels.
+ // For an 8x8 patch, we need to load the 8 image pixels + 3 extras,
+ // for a total of 11 pixels. Here we load 16 pixels, but only use
+ // the first 11.
+ __m128i row = _mm_loadu_si128((__m128i *)ref_row);
+
+ // Expand pixels to int16s
+ __m128i px_0to7_i16 = _mm_cvtepu8_epi16(row);
+ __m128i px_4to10_i16 = _mm_cvtepu8_epi16(_mm_srli_si128(row, 4));
+
+ // The convolution sums below use _mm_madd_epi16(), which multiplies
+ // pointwise and then sums adjacent pairs.
+
+ // Compute first four outputs
+ // input pixels 0, 1, 1, 2, 2, 3, 3, 4
+ // * kernel 0, 1, 0, 1, 0, 1, 0, 1
+ __m128i px0 =
+ _mm_unpacklo_epi16(px_0to7_i16, _mm_srli_si128(px_0to7_i16, 2));
+ // input pixels 2, 3, 3, 4, 4, 5, 5, 6
+ // * kernel 2, 3, 2, 3, 2, 3, 2, 3
+ __m128i px1 = _mm_unpacklo_epi16(_mm_srli_si128(px_0to7_i16, 4),
+ _mm_srli_si128(px_0to7_i16, 6));
+ // Convolve with kernel and sum 2x2 boxes to form first 4 outputs
+ __m128i sum0 = _mm_add_epi32(_mm_madd_epi16(px0, h_kernel_01),
+ _mm_madd_epi16(px1, h_kernel_23));
+
+ __m128i out0 = _mm_srai_epi32(_mm_add_epi32(sum0, round_const_h),
+ DISFLOW_INTERP_BITS - 6);
+
+ // Compute second four outputs
+ __m128i px2 =
+ _mm_unpacklo_epi16(px_4to10_i16, _mm_srli_si128(px_4to10_i16, 2));
+ __m128i px3 = _mm_unpacklo_epi16(_mm_srli_si128(px_4to10_i16, 4),
+ _mm_srli_si128(px_4to10_i16, 6));
+ __m128i sum1 = _mm_add_epi32(_mm_madd_epi16(px2, h_kernel_01),
+ _mm_madd_epi16(px3, h_kernel_23));
+
+ // Round by just enough bits that the result is guaranteed to fit into
+ // an int16_t. Then the next stage can use 16 x 16 -> 32 bit multiplies,
+ // which should be a fair bit faster than the 32 x 32 -> 32 bit
+ // multiplies it would otherwise need.
+ // This means shifting down so we have 6 extra bits, for a maximum value
+ // of +18360, which can occur if u_frac == 0.5 and the input pixels are
+ // {0, 255, 255, 0}.
+ __m128i out1 = _mm_srai_epi32(_mm_add_epi32(sum1, round_const_h),
+ DISFLOW_INTERP_BITS - 6);
+
+ _mm_storeu_si128((__m128i *)tmp_row, _mm_packs_epi32(out0, out1));
+
+#if CHECK_RESULTS && !defined(NDEBUG)
+ // Cross-check
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; ++j) {
+ const int x_w = x0 + j;
+ int arr[4];
+
+ arr[0] = (int)ref[y_w * stride + (x_w - 1)];
+ arr[1] = (int)ref[y_w * stride + (x_w + 0)];
+ arr[2] = (int)ref[y_w * stride + (x_w + 1)];
+ arr[3] = (int)ref[y_w * stride + (x_w + 2)];
+
+ // Apply kernel and round, keeping 6 extra bits of precision.
+ //
+ // 6 is the maximum allowable number of extra bits which will avoid
+ // the intermediate values overflowing an int16_t. The most extreme
+ // intermediate value occurs when:
+ // * The input pixels are [0, 255, 255, 0]
+ // * u_frac = 0.5
+ // In this case, the un-scaled output is 255 * 1.125 = 286.875.
+ // As an integer with 6 fractional bits, that is 18360, which fits
+ // in an int16_t. But with 7 fractional bits it would be 36720,
+ // which is too large.
+ const int c_value = ROUND_POWER_OF_TWO(get_cubic_value_int(arr, h_kernel),
+ DISFLOW_INTERP_BITS - 6);
+ (void)c_value; // Suppress warnings
+ assert(tmp_row[j] == c_value);
+ }
+#endif // CHECK_RESULTS
+ }
+
+ // Vertical convolution
+ const int round_bits = DISFLOW_INTERP_BITS + 6 - DISFLOW_DERIV_SCALE_LOG2;
+ __m128i round_const_v = _mm_set1_epi32(1 << (round_bits - 1));
+
+ __m128i v_kernel_01 = xx_set2_epi16(v_kernel[0], v_kernel[1]);
+ __m128i v_kernel_23 = xx_set2_epi16(v_kernel[2], v_kernel[3]);
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; ++i) {
+ int16_t *tmp_row = &tmp[i * DISFLOW_PATCH_SIZE];
+
+ // Load 4 rows of 8 x 16-bit values
+ __m128i px0 = _mm_loadu_si128((__m128i *)(tmp_row - DISFLOW_PATCH_SIZE));
+ __m128i px1 = _mm_loadu_si128((__m128i *)tmp_row);
+ __m128i px2 = _mm_loadu_si128((__m128i *)(tmp_row + DISFLOW_PATCH_SIZE));
+ __m128i px3 =
+ _mm_loadu_si128((__m128i *)(tmp_row + 2 * DISFLOW_PATCH_SIZE));
+
+ // We want to calculate px0 * v_kernel[0] + px1 * v_kernel[1] + ... ,
+ // but each multiply expands its output to 32 bits. So we need to be
+ // a little clever about how we do this
+ __m128i sum0 = _mm_add_epi32(
+ _mm_madd_epi16(_mm_unpacklo_epi16(px0, px1), v_kernel_01),
+ _mm_madd_epi16(_mm_unpacklo_epi16(px2, px3), v_kernel_23));
+ __m128i sum1 = _mm_add_epi32(
+ _mm_madd_epi16(_mm_unpackhi_epi16(px0, px1), v_kernel_01),
+ _mm_madd_epi16(_mm_unpackhi_epi16(px2, px3), v_kernel_23));
+
+ __m128i sum0_rounded =
+ _mm_srai_epi32(_mm_add_epi32(sum0, round_const_v), round_bits);
+ __m128i sum1_rounded =
+ _mm_srai_epi32(_mm_add_epi32(sum1, round_const_v), round_bits);
+
+ __m128i warped = _mm_packs_epi32(sum0_rounded, sum1_rounded);
+ __m128i src_pixels_u8 =
+ _mm_loadl_epi64((__m128i *)&src[(y + i) * stride + x]);
+ __m128i src_pixels = _mm_slli_epi16(_mm_cvtepu8_epi16(src_pixels_u8), 3);
+
+ // Calculate delta from the target patch
+ __m128i err = _mm_sub_epi16(warped, src_pixels);
+ _mm_storeu_si128((__m128i *)&dt[i * DISFLOW_PATCH_SIZE], err);
+
+#if CHECK_RESULTS
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; ++j) {
+ int16_t *p = &tmp[i * DISFLOW_PATCH_SIZE + j];
+ int arr[4] = { p[-DISFLOW_PATCH_SIZE], p[0], p[DISFLOW_PATCH_SIZE],
+ p[2 * DISFLOW_PATCH_SIZE] };
+ const int result = get_cubic_value_int(arr, v_kernel);
+
+ // Apply kernel and round.
+ // This time, we have to round off the 6 extra bits which were kept
+ // earlier, but we also want to keep DISFLOW_DERIV_SCALE_LOG2 extra bits
+ // of precision to match the scale of the dx and dy arrays.
+ const int c_warped = ROUND_POWER_OF_TWO(result, round_bits);
+ const int c_src_px = src[(x + j) + (y + i) * stride] << 3;
+ const int c_err = c_warped - c_src_px;
+ (void)c_err;
+ assert(dt[i * DISFLOW_PATCH_SIZE + j] == c_err);
+ }
+#endif // CHECK_RESULTS
+ }
+}
+
+static INLINE void sobel_filter_x(const uint8_t *src, int src_stride,
+ int16_t *dst, int dst_stride) {
+ int16_t tmp_[DISFLOW_PATCH_SIZE * (DISFLOW_PATCH_SIZE + 2)];
+ int16_t *tmp = tmp_ + DISFLOW_PATCH_SIZE;
+#if CHECK_RESULTS
+ const int taps = 3;
+#endif // CHECK_RESULTS
+
+ // Horizontal filter
+ // As the kernel is simply {1, 0, -1}, we implement this as simply
+ // out[x] = image[x-1] - image[x+1]
+ // rather than doing a "proper" convolution operation
+ for (int y = -1; y < DISFLOW_PATCH_SIZE + 1; ++y) {
+ const uint8_t *src_row = src + y * src_stride;
+ int16_t *tmp_row = tmp + y * DISFLOW_PATCH_SIZE;
+
+ // Load pixels and expand to 16 bits
+ __m128i row = _mm_loadu_si128((__m128i *)(src_row - 1));
+ __m128i px0 = _mm_cvtepu8_epi16(row);
+ __m128i px2 = _mm_cvtepu8_epi16(_mm_srli_si128(row, 2));
+
+ __m128i out = _mm_sub_epi16(px0, px2);
+
+ // Store to intermediate array
+ _mm_storeu_si128((__m128i *)tmp_row, out);
+
+#if CHECK_RESULTS
+ // Cross-check
+ static const int16_t h_kernel[3] = { 1, 0, -1 };
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
+ for (int k = 0; k < taps; ++k) {
+ sum += h_kernel[k] * src_row[x + k - 1];
+ }
+ (void)sum;
+ assert(tmp_row[x] == sum);
+ }
+#endif // CHECK_RESULTS
+ }
+
+ // Vertical filter
+ // Here the kernel is {1, 2, 1}, which can be implemented
+ // with simple sums rather than multiplies and adds.
+ // In order to minimize dependency chains, we evaluate in the order
+ // (image[y - 1] + image[y + 1]) + (image[y] << 1)
+ // This way, the first addition and the shift can happen in parallel
+ for (int y = 0; y < DISFLOW_PATCH_SIZE; ++y) {
+ const int16_t *tmp_row = tmp + y * DISFLOW_PATCH_SIZE;
+ int16_t *dst_row = dst + y * dst_stride;
+
+ __m128i px0 = _mm_loadu_si128((__m128i *)(tmp_row - DISFLOW_PATCH_SIZE));
+ __m128i px1 = _mm_loadu_si128((__m128i *)tmp_row);
+ __m128i px2 = _mm_loadu_si128((__m128i *)(tmp_row + DISFLOW_PATCH_SIZE));
+
+ __m128i out =
+ _mm_add_epi16(_mm_add_epi16(px0, px2), _mm_slli_epi16(px1, 1));
+
+ _mm_storeu_si128((__m128i *)dst_row, out);
+
+#if CHECK_RESULTS
+ static const int16_t v_kernel[3] = { 1, 2, 1 };
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
+ for (int k = 0; k < taps; ++k) {
+ sum += v_kernel[k] * tmp[(y + k - 1) * DISFLOW_PATCH_SIZE + x];
+ }
+ (void)sum;
+ assert(dst_row[x] == sum);
+ }
+#endif // CHECK_RESULTS
+ }
+}
+
+static INLINE void sobel_filter_y(const uint8_t *src, int src_stride,
+ int16_t *dst, int dst_stride) {
+ int16_t tmp_[DISFLOW_PATCH_SIZE * (DISFLOW_PATCH_SIZE + 2)];
+ int16_t *tmp = tmp_ + DISFLOW_PATCH_SIZE;
+#if CHECK_RESULTS
+ const int taps = 3;
+#endif // CHECK_RESULTS
+
+ // Horizontal filter
+ // Here the kernel is {1, 2, 1}, which can be implemented
+ // with simple sums rather than multiplies and adds.
+ // In order to minimize dependency chains, we evaluate in the order
+ // (image[y - 1] + image[y + 1]) + (image[y] << 1)
+ // This way, the first addition and the shift can happen in parallel
+ for (int y = -1; y < DISFLOW_PATCH_SIZE + 1; ++y) {
+ const uint8_t *src_row = src + y * src_stride;
+ int16_t *tmp_row = tmp + y * DISFLOW_PATCH_SIZE;
+
+ // Load pixels and expand to 16 bits
+ __m128i row = _mm_loadu_si128((__m128i *)(src_row - 1));
+ __m128i px0 = _mm_cvtepu8_epi16(row);
+ __m128i px1 = _mm_cvtepu8_epi16(_mm_srli_si128(row, 1));
+ __m128i px2 = _mm_cvtepu8_epi16(_mm_srli_si128(row, 2));
+
+ __m128i out =
+ _mm_add_epi16(_mm_add_epi16(px0, px2), _mm_slli_epi16(px1, 1));
+
+ // Store to intermediate array
+ _mm_storeu_si128((__m128i *)tmp_row, out);
+
+#if CHECK_RESULTS
+ // Cross-check
+ static const int16_t h_kernel[3] = { 1, 2, 1 };
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
+ for (int k = 0; k < taps; ++k) {
+ sum += h_kernel[k] * src_row[x + k - 1];
+ }
+ (void)sum;
+ assert(tmp_row[x] == sum);
+ }
+#endif // CHECK_RESULTS
+ }
+
+ // Vertical filter
+ // As the kernel is simply {1, 0, -1}, we implement this as simply
+ // out[x] = image[x-1] - image[x+1]
+ // rather than doing a "proper" convolution operation
+ for (int y = 0; y < DISFLOW_PATCH_SIZE; ++y) {
+ const int16_t *tmp_row = tmp + y * DISFLOW_PATCH_SIZE;
+ int16_t *dst_row = dst + y * dst_stride;
+
+ __m128i px0 = _mm_loadu_si128((__m128i *)(tmp_row - DISFLOW_PATCH_SIZE));
+ __m128i px2 = _mm_loadu_si128((__m128i *)(tmp_row + DISFLOW_PATCH_SIZE));
+
+ __m128i out = _mm_sub_epi16(px0, px2);
+
+ _mm_storeu_si128((__m128i *)dst_row, out);
+
+#if CHECK_RESULTS
+ static const int16_t v_kernel[3] = { 1, 0, -1 };
+ for (int x = 0; x < DISFLOW_PATCH_SIZE; ++x) {
+ int sum = 0;
+ for (int k = 0; k < taps; ++k) {
+ sum += v_kernel[k] * tmp[(y + k - 1) * DISFLOW_PATCH_SIZE + x];
+ }
+ (void)sum;
+ assert(dst_row[x] == sum);
+ }
+#endif // CHECK_RESULTS
+ }
+}
+
+static INLINE void compute_flow_vector(const int16_t *dx, int dx_stride,
+ const int16_t *dy, int dy_stride,
+ const int16_t *dt, int dt_stride,
+ int *b) {
+ __m128i b0_acc = _mm_setzero_si128();
+ __m128i b1_acc = _mm_setzero_si128();
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ // Need to load 8 values of dx, 8 of dy, 8 of dt, which conveniently
+ // works out to one register each. Then just calculate dx * dt, dy * dt,
+ // and (implicitly) sum horizontally in pairs.
+ // This gives four 32-bit partial sums for each of b[0] and b[1],
+ // which can be accumulated and summed at the end.
+ __m128i dx_row = _mm_loadu_si128((__m128i *)&dx[i * dx_stride]);
+ __m128i dy_row = _mm_loadu_si128((__m128i *)&dy[i * dy_stride]);
+ __m128i dt_row = _mm_loadu_si128((__m128i *)&dt[i * dt_stride]);
+
+ b0_acc = _mm_add_epi32(b0_acc, _mm_madd_epi16(dx_row, dt_row));
+ b1_acc = _mm_add_epi32(b1_acc, _mm_madd_epi16(dy_row, dt_row));
+ }
+
+ // We need to set b[0] = sum(b0_acc), b[1] = sum(b1_acc).
+ // We might as well use a `hadd` instruction to do 4 of the additions
+ // needed here. Then that just leaves two more additions, which can be
+ // done in scalar code
+ __m128i partial_sum = _mm_hadd_epi32(b0_acc, b1_acc);
+ b[0] = _mm_extract_epi32(partial_sum, 0) + _mm_extract_epi32(partial_sum, 1);
+ b[1] = _mm_extract_epi32(partial_sum, 2) + _mm_extract_epi32(partial_sum, 3);
+
+#if CHECK_RESULTS
+ int c_result[2] = { 0 };
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; j++) {
+ c_result[0] += dx[i * dx_stride + j] * dt[i * dt_stride + j];
+ c_result[1] += dy[i * dy_stride + j] * dt[i * dt_stride + j];
+ }
+ }
+
+ assert(b[0] == c_result[0]);
+ assert(b[1] == c_result[1]);
+#endif // CHECK_RESULTS
+}
+
+static INLINE void compute_flow_matrix(const int16_t *dx, int dx_stride,
+ const int16_t *dy, int dy_stride,
+ double *M) {
+ __m128i acc[4] = { 0 };
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ __m128i dx_row = _mm_loadu_si128((__m128i *)&dx[i * dx_stride]);
+ __m128i dy_row = _mm_loadu_si128((__m128i *)&dy[i * dy_stride]);
+
+ acc[0] = _mm_add_epi32(acc[0], _mm_madd_epi16(dx_row, dx_row));
+ acc[1] = _mm_add_epi32(acc[1], _mm_madd_epi16(dx_row, dy_row));
+ // Don't compute acc[2], as it should be equal to acc[1]
+ acc[3] = _mm_add_epi32(acc[3], _mm_madd_epi16(dy_row, dy_row));
+ }
+
+ // Condense sums
+ __m128i partial_sum_0 = _mm_hadd_epi32(acc[0], acc[1]);
+ __m128i partial_sum_1 = _mm_hadd_epi32(acc[1], acc[3]);
+ __m128i result = _mm_hadd_epi32(partial_sum_0, partial_sum_1);
+
+ // Apply regularization
+ // We follow the standard regularization method of adding `k * I` before
+ // inverting. This ensures that the matrix will be invertible.
+ //
+ // Setting the regularization strength k to 1 seems to work well here, as
+ // typical values coming from the other equations are very large (1e5 to
+ // 1e6, with an upper limit of around 6e7, at the time of writing).
+ // It also preserves the property that all matrix values are whole numbers,
+ // which is convenient for integerized SIMD implementation.
+ result = _mm_add_epi32(result, _mm_set_epi32(1, 0, 0, 1));
+
+#if CHECK_RESULTS
+ int tmp[4] = { 0 };
+
+ for (int i = 0; i < DISFLOW_PATCH_SIZE; i++) {
+ for (int j = 0; j < DISFLOW_PATCH_SIZE; j++) {
+ tmp[0] += dx[i * dx_stride + j] * dx[i * dx_stride + j];
+ tmp[1] += dx[i * dx_stride + j] * dy[i * dy_stride + j];
+ // Don't compute tmp[2], as it should be equal to tmp[1]
+ tmp[3] += dy[i * dy_stride + j] * dy[i * dy_stride + j];
+ }
+ }
+
+ // Apply regularization
+ tmp[0] += 1;
+ tmp[3] += 1;
+
+ tmp[2] = tmp[1];
+
+ assert(tmp[0] == _mm_extract_epi32(result, 0));
+ assert(tmp[1] == _mm_extract_epi32(result, 1));
+ assert(tmp[2] == _mm_extract_epi32(result, 2));
+ assert(tmp[3] == _mm_extract_epi32(result, 3));
+#endif // CHECK_RESULTS
+
+ // Convert results to doubles and store
+ _mm_storeu_pd(M, _mm_cvtepi32_pd(result));
+ _mm_storeu_pd(M + 2, _mm_cvtepi32_pd(_mm_srli_si128(result, 8)));
+}
+
+// Try to invert the matrix M
+// Note: Due to the nature of how a least-squares matrix is constructed, all of
+// the eigenvalues will be >= 0, and therefore det M >= 0 as well.
+// The regularization term `+ k * I` further ensures that det M >= k^2.
+// As mentioned in compute_flow_matrix(), here we use k = 1, so det M >= 1.
+// So we don't have to worry about non-invertible matrices here.
+static INLINE void invert_2x2(const double *M, double *M_inv) {
+ double det = (M[0] * M[3]) - (M[1] * M[2]);
+ assert(det >= 1);
+ const double det_inv = 1 / det;
+
+ M_inv[0] = M[3] * det_inv;
+ M_inv[1] = -M[1] * det_inv;
+ M_inv[2] = -M[2] * det_inv;
+ M_inv[3] = M[0] * det_inv;
+}
+
+void aom_compute_flow_at_point_sse4_1(const uint8_t *src, const uint8_t *ref,
+ int x, int y, int width, int height,
+ int stride, double *u, double *v) {
+ double M[4];
+ double M_inv[4];
+ int b[2];
+ int16_t dt[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
+ int16_t dx[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
+ int16_t dy[DISFLOW_PATCH_SIZE * DISFLOW_PATCH_SIZE];
+
+ // Compute gradients within this patch
+ const uint8_t *src_patch = &src[y * stride + x];
+ sobel_filter_x(src_patch, stride, dx, DISFLOW_PATCH_SIZE);
+ sobel_filter_y(src_patch, stride, dy, DISFLOW_PATCH_SIZE);
+
+ compute_flow_matrix(dx, DISFLOW_PATCH_SIZE, dy, DISFLOW_PATCH_SIZE, M);
+ invert_2x2(M, M_inv);
+
+ for (int itr = 0; itr < DISFLOW_MAX_ITR; itr++) {
+ compute_flow_error(src, ref, width, height, stride, x, y, *u, *v, dt);
+ compute_flow_vector(dx, DISFLOW_PATCH_SIZE, dy, DISFLOW_PATCH_SIZE, dt,
+ DISFLOW_PATCH_SIZE, b);
+
+ // Solve flow equations to find a better estimate for the flow vector
+ // at this point
+ const double step_u = M_inv[0] * b[0] + M_inv[1] * b[1];
+ const double step_v = M_inv[2] * b[0] + M_inv[3] * b[1];
+ *u += fclamp(step_u * DISFLOW_STEP_SIZE, -2, 2);
+ *v += fclamp(step_v * DISFLOW_STEP_SIZE, -2, 2);
+
+ if (fabs(step_u) + fabs(step_v) < DISFLOW_STEP_SIZE_THRESOLD) {
+ // Stop iteration when we're close to convergence
+ break;
+ }
+ }
+}
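
The 18360 bound quoted in the CHECK_RESULTS comments above can be verified with a few lines of standalone C. In this sketch DISFLOW_INTERP_BITS is assumed to be 14, matching its definition in disflow.h at the time of writing; verify against your tree:

    #include <assert.h>
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define INTERP_BITS 14  // assumed value of DISFLOW_INTERP_BITS

    // Same cubic (Catmull-Rom) weights as get_cubic_kernel_dbl() above.
    static void cubic_kernel(double x, double *k) {
      const double x2 = x * x, x3 = x2 * x;
      k[0] = -0.5 * x + x2 - 0.5 * x3;
      k[1] = 1.0 - 2.5 * x2 + 1.5 * x3;
      k[2] = 0.5 * x + 2.0 * x2 - 1.5 * x3;
      k[3] = -0.5 * x2 + 0.5 * x3;
    }

    int main(void) {
      double k[4];
      cubic_kernel(0.5, k);  // worst case: weights {-1/16, 9/16, 9/16, -1/16}
      int16_t ki[4];
      for (int i = 0; i < 4; ++i)
        ki[i] = (int16_t)rint(k[i] * (1 << INTERP_BITS));
      // Worst-case input {0, 255, 255, 0}, rounded to 6 fractional bits.
      const int raw = ki[1] * 255 + ki[2] * 255;
      const int out =
          (raw + (1 << (INTERP_BITS - 6 - 1))) >> (INTERP_BITS - 6);
      printf("max intermediate = %d\n", out);  // prints 18360
      assert(out <= INT16_MAX);  // fits in an int16_t, as claimed
      return 0;
    }
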
diff --git a/aom_dsp/fwd_txfm.c b/aom_dsp/fwd_txfm.c
index 3d30444..5503501 100644
--- a/aom_dsp/fwd_txfm.c
+++ b/aom_dsp/fwd_txfm.c
@@ -16,19 +16,16 @@
void aom_fdct4x4_c(const int16_t *input, tran_low_t *output, int stride) {
// The 2D transform is done with two passes which are actually pretty
// similar. In the first one, we transform the columns and transpose
- // the results. In the second one, we transform the rows. To achieve that,
- // as the first pass results are transposed, we transpose the columns (that
- // is the transposed rows) and transpose the results (so that it goes back
- // in normal/row positions).
+ // the results. In the second one, we transform the rows.
// We need an intermediate buffer between passes.
tran_low_t intermediate[4 * 4];
const tran_low_t *in_low = NULL;
tran_low_t *out = intermediate;
- // Do the two transform/transpose passes
+ // Do the two transform passes
for (int pass = 0; pass < 2; ++pass) {
- tran_high_t in_high[4]; // canbe16
- tran_high_t step[4]; // canbe16
- tran_high_t temp1, temp2; // needs32
+ tran_high_t in_high[4]; // canbe16
+ tran_high_t step[4]; // canbe16
+ tran_low_t temp[4];
for (int i = 0; i < 4; ++i) {
// Load inputs.
if (pass == 0) {
@@ -39,30 +36,40 @@
if (i == 0 && in_high[0]) {
++in_high[0];
}
+ ++input; // Next column
} else {
assert(in_low != NULL);
in_high[0] = in_low[0 * 4];
in_high[1] = in_low[1 * 4];
in_high[2] = in_low[2 * 4];
in_high[3] = in_low[3 * 4];
- ++in_low;
+ ++in_low; // Next column (which is a transposed row)
}
// Transform.
step[0] = in_high[0] + in_high[3];
step[1] = in_high[1] + in_high[2];
step[2] = in_high[1] - in_high[2];
step[3] = in_high[0] - in_high[3];
- temp1 = (step[0] + step[1]) * cospi_16_64;
- temp2 = (step[0] - step[1]) * cospi_16_64;
- out[0] = (tran_low_t)fdct_round_shift(temp1);
- out[2] = (tran_low_t)fdct_round_shift(temp2);
- temp1 = step[2] * cospi_24_64 + step[3] * cospi_8_64;
- temp2 = -step[2] * cospi_8_64 + step[3] * cospi_24_64;
- out[1] = (tran_low_t)fdct_round_shift(temp1);
- out[3] = (tran_low_t)fdct_round_shift(temp2);
- // Do next column (which is a transposed row in second/horizontal pass)
- ++input;
- out += 4;
+ temp[0] = (tran_low_t)fdct_round_shift((step[0] + step[1]) * cospi_16_64);
+ temp[2] = (tran_low_t)fdct_round_shift((step[0] - step[1]) * cospi_16_64);
+ temp[1] = (tran_low_t)fdct_round_shift(step[2] * cospi_24_64 +
+ step[3] * cospi_8_64);
+ temp[3] = (tran_low_t)fdct_round_shift(-step[2] * cospi_8_64 +
+ step[3] * cospi_24_64);
+ // Only transpose the first pass.
+ if (pass == 0) {
+ out[0] = temp[0];
+ out[1] = temp[1];
+ out[2] = temp[2];
+ out[3] = temp[3];
+ out += 4;
+ } else {
+ out[0 * 4] = temp[0];
+ out[1 * 4] = temp[1];
+ out[2 * 4] = temp[2];
+ out[3 * 4] = temp[3];
+ ++out;
+ }
}
// Setup in/out for next pass.
in_low = intermediate;
@@ -78,19 +85,16 @@
void aom_fdct4x4_lp_c(const int16_t *input, int16_t *output, int stride) {
// The 2D transform is done with two passes which are actually pretty
// similar. In the first one, we transform the columns and transpose
- // the results. In the second one, we transform the rows. To achieve that,
- // as the first pass results are transposed, we transpose the columns (that
- // is the transposed rows) and transpose the results (so that it goes back
- // in normal/row positions).
+ // the results. In the second one, we transform the rows.
// We need an intermediate buffer between passes.
int16_t intermediate[4 * 4];
const int16_t *in_low = NULL;
int16_t *out = intermediate;
- // Do the two transform/transpose passes
+ // Do the two transform passes
for (int pass = 0; pass < 2; ++pass) {
- int32_t in_high[4]; // canbe16
- int32_t step[4]; // canbe16
- int32_t temp1, temp2; // needs32
+ int32_t in_high[4]; // canbe16
+ int32_t step[4]; // canbe16
+ int16_t temp[4];
for (int i = 0; i < 4; ++i) {
// Load inputs.
if (pass == 0) {
@@ -98,6 +102,7 @@
in_high[1] = input[1 * stride] * 16;
in_high[2] = input[2 * stride] * 16;
in_high[3] = input[3 * stride] * 16;
+ ++input;
if (i == 0 && in_high[0]) {
++in_high[0];
}
@@ -114,17 +119,26 @@
step[1] = in_high[1] + in_high[2];
step[2] = in_high[1] - in_high[2];
step[3] = in_high[0] - in_high[3];
- temp1 = (step[0] + step[1]) * (int32_t)cospi_16_64;
- temp2 = (step[0] - step[1]) * (int32_t)cospi_16_64;
- out[0] = (int16_t)fdct_round_shift(temp1);
- out[2] = (int16_t)fdct_round_shift(temp2);
- temp1 = step[2] * (int32_t)cospi_24_64 + step[3] * (int32_t)cospi_8_64;
- temp2 = -step[2] * (int32_t)cospi_8_64 + step[3] * (int32_t)cospi_24_64;
- out[1] = (int16_t)fdct_round_shift(temp1);
- out[3] = (int16_t)fdct_round_shift(temp2);
- // Do next column (which is a transposed row in second/horizontal pass)
- ++input;
- out += 4;
+ temp[0] = (int16_t)fdct_round_shift((step[0] + step[1]) * cospi_16_64);
+ temp[2] = (int16_t)fdct_round_shift((step[0] - step[1]) * cospi_16_64);
+ temp[1] = (int16_t)fdct_round_shift(step[2] * cospi_24_64 +
+ step[3] * cospi_8_64);
+ temp[3] = (int16_t)fdct_round_shift(-step[2] * cospi_8_64 +
+ step[3] * cospi_24_64);
+ // Only transpose the first pass.
+ if (pass == 0) {
+ out[0] = temp[0];
+ out[1] = temp[1];
+ out[2] = temp[2];
+ out[3] = temp[3];
+ out += 4;
+ } else {
+ out[0 * 4] = temp[0];
+ out[1 * 4] = temp[1];
+ out[2 * 4] = temp[2];
+ out[3 * 4] = temp[3];
+ ++out;
+ }
}
// Setup in/out for next pass.
in_low = intermediate;
@@ -137,6 +151,7 @@
}
}
+#if CONFIG_INTERNAL_STATS
void aom_fdct8x8_c(const int16_t *input, tran_low_t *final_output, int stride) {
int i, j;
tran_low_t intermediate[64];
@@ -220,8 +235,9 @@
for (j = 0; j < 8; ++j) final_output[j + i * 8] /= 2;
}
}
+#endif // CONFIG_INTERNAL_STATS
-#if CONFIG_AV1_HIGHBITDEPTH
+#if CONFIG_AV1_HIGHBITDEPTH && CONFIG_INTERNAL_STATS
void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *final_output,
int stride) {
aom_fdct8x8_c(input, final_output, stride);
diff --git a/aom_dsp/grain_table.h b/aom_dsp/grain_table.h
index 3f75101..49e8498 100644
--- a/aom_dsp/grain_table.h
+++ b/aom_dsp/grain_table.h
@@ -52,7 +52,7 @@
/*!\brief Add a mapping from [time_stamp, end_time) to the given grain
* parameters
*
- * \param[in/out] table The grain table
+ * \param[in,out] table The grain table
* \param[in] time_stamp The start time stamp
* \param[in] end_stamp The end time_stamp
* \param[in] grain The grain parameters
diff --git a/aom_dsp/mathutils.h b/aom_dsp/mathutils.h
index 22b0202..cbb6cf4 100644
--- a/aom_dsp/mathutils.h
+++ b/aom_dsp/mathutils.h
@@ -63,32 +63,51 @@
// Solves for n-dim x in a least squares sense to minimize |Ax - b|^2
// The solution is simply x = (A'A)^-1 A'b or simply the solution for
// the system: A'A x = A'b
-static INLINE int least_squares(int n, double *A, int rows, int stride,
- double *b, double *scratch, double *x) {
- int i, j, k;
- double *scratch_ = NULL;
- double *AtA, *Atb;
- if (!scratch) {
- scratch_ = (double *)aom_malloc(sizeof(*scratch) * n * (n + 1));
- if (!scratch_) return 0;
- scratch = scratch_;
- }
- AtA = scratch;
- Atb = scratch + n * n;
+//
+// This process is split into three steps in order to avoid needing to
+// explicitly allocate the A matrix, which may be very large if there
+// are many equations to solve.
+//
+// The process for using this is (in pseudocode):
+//
+// Allocate mat (size n*n), y (size n), a (size n), x (size n)
+// least_squares_init(mat, y, n)
+// for each equation a . x = b {
+// least_squares_accumulate(mat, y, a, b, n)
+// }
+// least_squares_solve(mat, y, x, n)
+//
+// where:
+// * mat, y are accumulators for the values A'A and A'b respectively,
+// * a, b are the coefficients of each individual equation,
+// * x is the result vector
+// * and n is the problem size
+static INLINE void least_squares_init(double *mat, double *y, int n) {
+ memset(mat, 0, n * n * sizeof(double));
+ memset(y, 0, n * sizeof(double));
+}
- for (i = 0; i < n; ++i) {
- for (j = i; j < n; ++j) {
- AtA[i * n + j] = 0.0;
- for (k = 0; k < rows; ++k)
- AtA[i * n + j] += A[k * stride + i] * A[k * stride + j];
- AtA[j * n + i] = AtA[i * n + j];
+// Round the given positive value to nearest integer
+static AOM_FORCE_INLINE int iroundpf(float x) {
+ assert(x >= 0.0);
+ return (int)(x + 0.5f);
+}
+
+static INLINE void least_squares_accumulate(double *mat, double *y,
+ const double *a, double b, int n) {
+ for (int i = 0; i < n; i++) {
+ for (int j = 0; j < n; j++) {
+ mat[i * n + j] += a[i] * a[j];
}
- Atb[i] = 0;
- for (k = 0; k < rows; ++k) Atb[i] += A[k * stride + i] * b[k];
}
- int ret = linsolve(n, AtA, n, Atb, x);
- aom_free(scratch_);
- return ret;
+ for (int i = 0; i < n; i++) {
+ y[i] += a[i] * b;
+ }
+}
+
+static INLINE int least_squares_solve(double *mat, double *y, double *x,
+ int n) {
+ return linsolve(n, mat, n, y, x);
}
// Matrix multiply
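
The pseudocode in the comment above maps directly onto code. A minimal illustrative sketch (not libaom code), fitting y = m*t + c to three points, assuming this header and its linsolve() helper are available:

    // Fit y = m*t + c to (0,1), (1,3), (2,5) with the accumulate-style API.
    static int fit_line_example(double *m_out, double *c_out) {
      double mat[2 * 2], y[2], x[2];
      least_squares_init(mat, y, 2);
      const double ts[3] = { 0.0, 1.0, 2.0 };
      const double ys[3] = { 1.0, 3.0, 5.0 };
      for (int k = 0; k < 3; ++k) {
        const double a[2] = { ts[k], 1.0 };  // one equation a . x = b
        least_squares_accumulate(mat, y, a, ys[k], 2);
      }
      if (!least_squares_solve(mat, y, x, 2)) return 0;
      *m_out = x[0];  // ~= 2.0 (slope)
      *c_out = x[1];  // ~= 1.0 (intercept)
      return 1;
    }
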
@@ -108,4 +127,19 @@
}
}
+static AOM_INLINE float approx_exp(float y) {
+#define A ((1 << 23) / 0.69314718056f) // (1 << 23) / ln(2)
+#define B 127    // Offset for the exponent per the IEEE floating point standard.
+#define C 60801  // Magic number that controls the accuracy of the approximation.
+ union {
+ float as_float;
+ int32_t as_int32;
+ } container;
+ container.as_int32 = ((int32_t)(y * A)) + ((B << 23) - C);
+ return container.as_float;
+#undef A
+#undef B
+#undef C
+}
#endif // AOM_AOM_DSP_MATHUTILS_H_
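
The new approx_exp() appears to be the classic Schraudolph bit-trick: y * A + (B << 23) builds the IEEE-754 bit pattern of 2^(y / ln 2) directly, and C trades a small bias for lower overall error. A quick accuracy probe (illustrative only; the approximation is only meaningful while y * A + (B << 23) - C stays within int32 range, roughly |y| < 87):

    #include <math.h>
    #include <stdio.h>
    #include "aom_dsp/mathutils.h"  // assumes the libaom tree is available

    int main(void) {
      for (double y = -10.0; y <= 10.0; y += 2.5) {
        const float approx = approx_exp((float)y);
        const float exact = expf((float)y);
        printf("y=%6.2f approx=%12.5g exact=%12.5g rel err=%6.3f%%\n", y,
               approx, exact, 100.0 * fabs(approx - exact) / exact);
      }
      return 0;
    }
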
diff --git a/aom_dsp/noise_model.c b/aom_dsp/noise_model.c
index 8521232..065ec9a 100644
--- a/aom_dsp/noise_model.c
+++ b/aom_dsp/noise_model.c
@@ -571,7 +571,6 @@
const int num_blocks_w = (w + block_size - 1) / block_size;
const int num_blocks_h = (h + block_size - 1) / block_size;
int num_flat = 0;
- int bx = 0, by = 0;
double *plane = (double *)aom_malloc(n * sizeof(*plane));
double *block = (double *)aom_malloc(n * sizeof(*block));
index_and_score_t *scores = (index_and_score_t *)aom_malloc(
@@ -587,19 +586,18 @@
#ifdef NOISE_MODEL_LOG_SCORE
fprintf(stderr, "score = [");
#endif
- for (by = 0; by < num_blocks_h; ++by) {
- for (bx = 0; bx < num_blocks_w; ++bx) {
+ for (int by = 0; by < num_blocks_h; ++by) {
+ for (int bx = 0; bx < num_blocks_w; ++bx) {
// Compute gradient covariance matrix.
- double Gxx = 0, Gxy = 0, Gyy = 0;
- double var = 0;
- double mean = 0;
- int xi, yi;
aom_flat_block_finder_extract_block(block_finder, data, w, h, stride,
bx * block_size, by * block_size,
plane, block);
+ double Gxx = 0, Gxy = 0, Gyy = 0;
+ double mean = 0;
+ double var = 0;
- for (yi = 1; yi < block_size - 1; ++yi) {
- for (xi = 1; xi < block_size - 1; ++xi) {
+ for (int yi = 1; yi < block_size - 1; ++yi) {
+ for (int xi = 1; xi < block_size - 1; ++xi) {
const double gx = (block[yi * block_size + xi + 1] -
block[yi * block_size + xi - 1]) /
2;
@@ -1623,6 +1621,8 @@
return 1;
}
+// TODO(aomedia:3151): Handle a monochrome image (sd->u_buffer and sd->v_buffer
+// are null pointers) correctly.
int aom_denoise_and_model_run(struct aom_denoise_and_model_t *ctx,
YV12_BUFFER_CONFIG *sd,
aom_film_grain_t *film_grain, int apply_denoise) {
@@ -1680,10 +1680,12 @@
if (apply_denoise) {
memcpy(raw_data[0], ctx->denoised[0],
(strides[0] * sd->y_height) << use_highbd);
- memcpy(raw_data[1], ctx->denoised[1],
- (strides[1] * sd->uv_height) << use_highbd);
- memcpy(raw_data[2], ctx->denoised[2],
- (strides[2] * sd->uv_height) << use_highbd);
+ if (!sd->monochrome) {
+ memcpy(raw_data[1], ctx->denoised[1],
+ (strides[1] * sd->uv_height) << use_highbd);
+ memcpy(raw_data[2], ctx->denoised[2],
+ (strides[2] * sd->uv_height) << use_highbd);
+ }
}
}
return 1;
diff --git a/aom_dsp/noise_model.h b/aom_dsp/noise_model.h
index f385251..8228aea 100644
--- a/aom_dsp/noise_model.h
+++ b/aom_dsp/noise_model.h
@@ -293,13 +293,13 @@
* parameter will be true when the input buffer was successfully denoised and
* grain was modelled. Returns false on error.
*
- * \param[in] ctx Struct allocated with
+ * \param[in] ctx Struct allocated with
* aom_denoise_and_model_alloc that holds some
* buffers for denoising and the current noise
* estimate.
- * \param[in/out] buf The raw input buffer to be denoised.
+ * \param[in,out] buf The raw input buffer to be denoised.
* \param[out] grain Output film grain parameters
- * \param[out] apply_denoise Whether or not to apply the denoising to the
+ * \param[in] apply_denoise Whether or not to apply the denoising to the
* frame that will be encoded
*/
int aom_denoise_and_model_run(struct aom_denoise_and_model_t *ctx,
diff --git a/aom_dsp/prob.h b/aom_dsp/prob.h
index 5e25b9c..5711a40 100644
--- a/aom_dsp/prob.h
+++ b/aom_dsp/prob.h
@@ -31,16 +31,12 @@
#define CDF_SIZE(x) ((x) + 1)
#define CDF_PROB_BITS 15
#define CDF_PROB_TOP (1 << CDF_PROB_BITS)
-#define CDF_INIT_TOP 32768
-#define CDF_SHIFT (15 - CDF_PROB_BITS)
/*The value stored in an iCDF is CDF_PROB_TOP minus the actual cumulative
probability (an "inverse" CDF).
This function converts from one representation to the other (and is its own
inverse).*/
#define AOM_ICDF(x) (CDF_PROB_TOP - (x))
-#if CDF_SHIFT == 0
-
#define AOM_CDF2(a0) AOM_ICDF(a0), AOM_ICDF(CDF_PROB_TOP), 0
#define AOM_CDF3(a0, a1) AOM_ICDF(a0), AOM_ICDF(a1), AOM_ICDF(CDF_PROB_TOP), 0
#define AOM_CDF4(a0, a1, a2) \
@@ -101,535 +97,6 @@
AOM_ICDF(a11), AOM_ICDF(a12), AOM_ICDF(a13), AOM_ICDF(a14), \
AOM_ICDF(CDF_PROB_TOP), 0
-#else
-#define AOM_CDF2(a0) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 2) + \
- ((CDF_INIT_TOP - 2) >> 1)) / \
- ((CDF_INIT_TOP - 2)) + \
- 1) \
- , AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF3(a0, a1) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 3) + \
- ((CDF_INIT_TOP - 3) >> 1)) / \
- ((CDF_INIT_TOP - 3)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 3) + \
- ((CDF_INIT_TOP - 3) >> 1)) / \
- ((CDF_INIT_TOP - 3)) + \
- 2), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF4(a0, a1, a2) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) + \
- ((CDF_INIT_TOP - 4) >> 1)) / \
- ((CDF_INIT_TOP - 4)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) + \
- ((CDF_INIT_TOP - 4) >> 1)) / \
- ((CDF_INIT_TOP - 4)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 4) + \
- ((CDF_INIT_TOP - 4) >> 1)) / \
- ((CDF_INIT_TOP - 4)) + \
- 3), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF5(a0, a1, a2, a3) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
- ((CDF_INIT_TOP - 5) >> 1)) / \
- ((CDF_INIT_TOP - 5)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
- ((CDF_INIT_TOP - 5) >> 1)) / \
- ((CDF_INIT_TOP - 5)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
- ((CDF_INIT_TOP - 5) >> 1)) / \
- ((CDF_INIT_TOP - 5)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 5) + \
- ((CDF_INIT_TOP - 5) >> 1)) / \
- ((CDF_INIT_TOP - 5)) + \
- 4), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF6(a0, a1, a2, a3, a4) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
- ((CDF_INIT_TOP - 6) >> 1)) / \
- ((CDF_INIT_TOP - 6)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
- ((CDF_INIT_TOP - 6) >> 1)) / \
- ((CDF_INIT_TOP - 6)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
- ((CDF_INIT_TOP - 6) >> 1)) / \
- ((CDF_INIT_TOP - 6)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
- ((CDF_INIT_TOP - 6) >> 1)) / \
- ((CDF_INIT_TOP - 6)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 6) + \
- ((CDF_INIT_TOP - 6) >> 1)) / \
- ((CDF_INIT_TOP - 6)) + \
- 5), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF7(a0, a1, a2, a3, a4, a5) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 7) + \
- ((CDF_INIT_TOP - 7) >> 1)) / \
- ((CDF_INIT_TOP - 7)) + \
- 6), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF8(a0, a1, a2, a3, a4, a5, a6) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 8) + \
- ((CDF_INIT_TOP - 8) >> 1)) / \
- ((CDF_INIT_TOP - 8)) + \
- 7), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF9(a0, a1, a2, a3, a4, a5, a6, a7) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 9) + \
- ((CDF_INIT_TOP - 9) >> 1)) / \
- ((CDF_INIT_TOP - 9)) + \
- 8), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF10(a0, a1, a2, a3, a4, a5, a6, a7, a8) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 10) + \
- ((CDF_INIT_TOP - 10) >> 1)) / \
- ((CDF_INIT_TOP - 10)) + \
- 9), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF11(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 11) + \
- ((CDF_INIT_TOP - 11) >> 1)) / \
- ((CDF_INIT_TOP - 11)) + \
- 10), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF12(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 10), \
- AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 12) + \
- ((CDF_INIT_TOP - 12) >> 1)) / \
- ((CDF_INIT_TOP - 12)) + \
- 11), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF13(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 10), \
- AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 11), \
- AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 13) + \
- ((CDF_INIT_TOP - 13) >> 1)) / \
- ((CDF_INIT_TOP - 13)) + \
- 12), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF14(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 10), \
- AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 11), \
- AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 12), \
- AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 14) + \
- ((CDF_INIT_TOP - 14) >> 1)) / \
- ((CDF_INIT_TOP - 14)) + \
- 13), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF15(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 10), \
- AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 11), \
- AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 12), \
- AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 13), \
- AOM_ICDF((((a13)-14) * ((CDF_INIT_TOP >> CDF_SHIFT) - 15) + \
- ((CDF_INIT_TOP - 15) >> 1)) / \
- ((CDF_INIT_TOP - 15)) + \
- 14), \
- AOM_ICDF(CDF_PROB_TOP), 0
-#define AOM_CDF16(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, \
- a14) \
- AOM_ICDF((((a0)-1) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 1) \
- , \
- AOM_ICDF((((a1)-2) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 2), \
- AOM_ICDF((((a2)-3) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 3), \
- AOM_ICDF((((a3)-4) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 4), \
- AOM_ICDF((((a4)-5) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 5), \
- AOM_ICDF((((a5)-6) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 6), \
- AOM_ICDF((((a6)-7) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 7), \
- AOM_ICDF((((a7)-8) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 8), \
- AOM_ICDF((((a8)-9) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 9), \
- AOM_ICDF((((a9)-10) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 10), \
- AOM_ICDF((((a10)-11) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 11), \
- AOM_ICDF((((a11)-12) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 12), \
- AOM_ICDF((((a12)-13) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 13), \
- AOM_ICDF((((a13)-14) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 14), \
- AOM_ICDF((((a14)-15) * ((CDF_INIT_TOP >> CDF_SHIFT) - 16) + \
- ((CDF_INIT_TOP - 16) >> 1)) / \
- ((CDF_INIT_TOP - 16)) + \
- 15), \
- AOM_ICDF(CDF_PROB_TOP), 0
-
-#endif
-
static INLINE uint8_t get_prob(unsigned int num, unsigned int den) {
assert(den != 0);
{
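
With the dead CDF_SHIFT != 0 branch removed, the AOM_CDFn macros reduce to plain inverse-CDF initializers. An illustrative use (the values here are chosen for the example, not taken from libaom's tables):

    #include "aom_dsp/prob.h"

    // A uniform 4-symbol distribution: cumulative probabilities 1/4, 2/4,
    // 3/4 in Q15. AOM_CDF4 stores them inverted (CDF_PROB_TOP - cdf) and
    // appends a zeroed counter slot used during adaptation.
    static const aom_cdf_prob uniform4_cdf[CDF_SIZE(4)] = { AOM_CDF4(
        8192, 16384, 24576) };
    // Expands to { 24576, 16384, 8192, 0, 0 }.
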
diff --git a/aom_dsp/psnr.c b/aom_dsp/psnr.c
index 08fb69c..f71590c 100644
--- a/aom_dsp/psnr.c
+++ b/aom_dsp/psnr.c
@@ -44,9 +44,9 @@
}
#if CONFIG_AV1_HIGHBITDEPTH
-static int64_t encoder_highbd_8_sse(const uint8_t *a8, int a_stride,
- const uint8_t *b8, int b_stride, int w,
- int h) {
+static int64_t encoder_highbd_sse(const uint8_t *a8, int a_stride,
+ const uint8_t *b8, int b_stride, int w,
+ int h) {
const uint16_t *a = CONVERT_TO_SHORTPTR(a8);
const uint16_t *b = CONVERT_TO_SHORTPTR(b8);
int64_t sse = 0;
@@ -84,10 +84,8 @@
for (y = 0; y < height / 16; ++y) {
const uint8_t *pa = a;
const uint8_t *pb = b;
- unsigned int sse;
for (x = 0; x < width / 16; ++x) {
- aom_mse16x16(pa, a_stride, pb, b_stride, &sse);
- total_sse += sse;
+ total_sse += aom_sse(pa, a_stride, pb, b_stride, 16, 16);
pa += 16;
pb += 16;
@@ -128,22 +126,20 @@
const int dh = height % 16;
if (dw > 0) {
- total_sse += encoder_highbd_8_sse(&a[width - dw], a_stride, &b[width - dw],
- b_stride, dw, height);
+ total_sse += encoder_highbd_sse(&a[width - dw], a_stride, &b[width - dw],
+ b_stride, dw, height);
}
if (dh > 0) {
- total_sse += encoder_highbd_8_sse(&a[(height - dh) * a_stride], a_stride,
- &b[(height - dh) * b_stride], b_stride,
- width - dw, dh);
+ total_sse += encoder_highbd_sse(&a[(height - dh) * a_stride], a_stride,
+ &b[(height - dh) * b_stride], b_stride,
+ width - dw, dh);
}
for (y = 0; y < height / 16; ++y) {
const uint8_t *pa = a;
const uint8_t *pb = b;
- unsigned int sse;
for (x = 0; x < width / 16; ++x) {
- aom_highbd_8_mse16x16(pa, a_stride, pb, b_stride, &sse);
- total_sse += sse;
+ total_sse += aom_highbd_sse(pa, a_stride, pb, b_stride, 16, 16);
pa += 16;
pb += 16;
}
diff --git a/aom_dsp/pyramid.c b/aom_dsp/pyramid.c
new file mode 100644
index 0000000..a26d302
--- /dev/null
+++ b/aom_dsp/pyramid.c
@@ -0,0 +1,411 @@
+/*
+ * Copyright (c) 2022, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "aom_dsp/pyramid.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/bitops.h"
+#include "aom_util/aom_thread.h"
+
+// TODO(rachelbarker): Move needed code from av1/ to aom_dsp/
+#include "av1/common/resize.h"
+
+#include <assert.h>
+#include <string.h>
+
+// Lifecycle:
+// * Frame buffer alloc code calls aom_get_pyramid_alloc_size()
+// to work out how much space is needed for a given number of pyramid
+// levels. This is counted in the size checked against the max allocation
+// limit
+// * Then calls aom_alloc_pyramid() to actually create the pyramid
+// * Pyramid is initially marked as invalid (no data)
+// * Whenever pyramid is needed, we check the valid flag. If set, use existing
+// data. If not set, compute full pyramid
+// * Whenever frame buffer is reused, clear the valid flag
+// * Whenever frame buffer is resized, reallocate pyramid
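+//
+// A hypothetical caller (illustration only, not part of this change) would
+// follow this lifecycle roughly as:
+//   ImagePyramid *pyr = aom_alloc_pyramid(w, h, n_levels, is_16bit);
+//   aom_compute_pyramid(frame, bit_depth, pyr);  // no-op if already valid
+//   ... read pyr->layers[level] ...
+//   aom_invalidate_pyramid(pyr);  // whenever the frame buffer is reused
+//   aom_free_pyramid(pyr);        // when the frame buffer is destroyed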
+
+size_t aom_get_pyramid_alloc_size(int width, int height, int n_levels,
+ bool image_is_16bit) {
+ // Limit number of levels on small frames
+ const int msb = get_msb(AOMMIN(width, height));
+ const int max_levels = AOMMAX(msb - MIN_PYRAMID_SIZE_LOG2, 1);
+ n_levels = AOMMIN(n_levels, max_levels);
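+  // e.g. for a 64x48 frame, get_msb(48) = 5, so n_levels is capped at
+  // 5 - MIN_PYRAMID_SIZE_LOG2 = 2, keeping the coarsest level at least
+  // MIN_PYRAMID_SIZE pixels along each side.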
+
+ size_t alloc_size = 0;
+ alloc_size += sizeof(ImagePyramid);
+ alloc_size += n_levels * sizeof(PyramidLayer);
+
+ // Calculate how much memory is needed for downscaled frame buffers
+ size_t buffer_size = 0;
+
+ // Work out if we need to allocate a few extra bytes for alignment.
+ // aom_memalign() will ensure that the start of the allocation is aligned
+ // to a multiple of PYRAMID_ALIGNMENT. But we want the first image pixel
+ // to be aligned, not the first byte of the allocation.
+ //
+ // In the loop below, we ensure that the stride of every image is a multiple
+ // of PYRAMID_ALIGNMENT. Thus the allocated size of each pyramid level will
+ // also be a multiple of PYRAMID_ALIGNMENT. Thus, as long as we can get the
+ // first pixel in the first pyramid layer aligned properly, that will
+ // automatically mean that the first pixel of every row of every layer is
+ // properly aligned too.
+ //
+ // Thus all we need to consider is the first pixel in the first layer.
+ // This is located at offset
+ // extra_bytes + level_stride * PYRAMID_PADDING + PYRAMID_PADDING
+ // bytes into the buffer. Since level_stride is a multiple of
+ // PYRAMID_ALIGNMENT, we can ignore that. So we need
+ // extra_bytes + PYRAMID_PADDING = multiple of PYRAMID_ALIGNMENT
+ //
+ // To solve this, we can round PYRAMID_PADDING up to the next multiple
+  // of PYRAMID_ALIGNMENT, then subtract the original value to calculate
+ // how many extra bytes are needed.
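+  //
+  // For example, with PYRAMID_PADDING = 16 and PYRAMID_ALIGNMENT = 32:
+  // first_px_offset = (16 + 31) & ~31 = 32, so extra_bytes = 32 - 16 = 16.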
+ size_t first_px_offset =
+ (PYRAMID_PADDING + PYRAMID_ALIGNMENT - 1) & ~(PYRAMID_ALIGNMENT - 1);
+ size_t extra_bytes = first_px_offset - PYRAMID_PADDING;
+ buffer_size += extra_bytes;
+
+ // If the original image is stored in an 8-bit buffer, then we can point the
+ // lowest pyramid level at that buffer rather than allocating a new one.
+ int first_allocated_level = image_is_16bit ? 0 : 1;
+
+ for (int level = first_allocated_level; level < n_levels; level++) {
+ int level_width = width >> level;
+ int level_height = height >> level;
+
+ // Allocate padding for each layer
+ int padded_width = level_width + 2 * PYRAMID_PADDING;
+ int padded_height = level_height + 2 * PYRAMID_PADDING;
+
+ // Align the layer stride to be a multiple of PYRAMID_ALIGNMENT
+ // This ensures that, as long as the top-left pixel in this pyramid level is
+ // properly aligned, then so will the leftmost pixel in every row of the
+ // pyramid level.
+ int level_stride =
+ (padded_width + PYRAMID_ALIGNMENT - 1) & ~(PYRAMID_ALIGNMENT - 1);
+
+ buffer_size += level_stride * padded_height;
+ }
+
+ alloc_size += buffer_size;
+
+ return alloc_size;
+}
+
+ImagePyramid *aom_alloc_pyramid(int width, int height, int n_levels,
+ bool image_is_16bit) {
+ // Limit number of levels on small frames
+ const int msb = get_msb(AOMMIN(width, height));
+ const int max_levels = AOMMAX(msb - MIN_PYRAMID_SIZE_LOG2, 1);
+ n_levels = AOMMIN(n_levels, max_levels);
+
+ ImagePyramid *pyr = aom_calloc(1, sizeof(*pyr));
+ if (!pyr) {
+ return NULL;
+ }
+
+ pyr->layers = aom_calloc(n_levels, sizeof(PyramidLayer));
+ if (!pyr->layers) {
+ aom_free(pyr);
+ return NULL;
+ }
+
+ pyr->valid = false;
+ pyr->n_levels = n_levels;
+
+ // Compute sizes and offsets for each pyramid level
+ // These are gathered up first, so that we can allocate all pyramid levels
+ // in a single buffer
+ size_t buffer_size = 0;
+ size_t *layer_offsets = aom_calloc(n_levels, sizeof(size_t));
+ if (!layer_offsets) {
+    aom_free(pyr->layers);
+    aom_free(pyr);
+ return NULL;
+ }
+
+ // Work out if we need to allocate a few extra bytes for alignment.
+ // aom_memalign() will ensure that the start of the allocation is aligned
+ // to a multiple of PYRAMID_ALIGNMENT. But we want the first image pixel
+ // to be aligned, not the first byte of the allocation.
+ //
+ // In the loop below, we ensure that the stride of every image is a multiple
+ // of PYRAMID_ALIGNMENT. Thus the allocated size of each pyramid level will
+ // also be a multiple of PYRAMID_ALIGNMENT. Thus, as long as we can get the
+ // first pixel in the first pyramid layer aligned properly, that will
+ // automatically mean that the first pixel of every row of every layer is
+ // properly aligned too.
+ //
+ // Thus all we need to consider is the first pixel in the first layer.
+ // This is located at offset
+ // extra_bytes + level_stride * PYRAMID_PADDING + PYRAMID_PADDING
+ // bytes into the buffer. Since level_stride is a multiple of
+ // PYRAMID_ALIGNMENT, we can ignore that. So we need
+ // extra_bytes + PYRAMID_PADDING = multiple of PYRAMID_ALIGNMENT
+ //
+ // To solve this, we can round PYRAMID_PADDING up to the next multiple
+  // of PYRAMID_ALIGNMENT, then subtract the original value to calculate
+ // how many extra bytes are needed.
+ size_t first_px_offset =
+ (PYRAMID_PADDING + PYRAMID_ALIGNMENT - 1) & ~(PYRAMID_ALIGNMENT - 1);
+ size_t extra_bytes = first_px_offset - PYRAMID_PADDING;
+ buffer_size += extra_bytes;
+
+ // If the original image is stored in an 8-bit buffer, then we can point the
+ // lowest pyramid level at that buffer rather than allocating a new one.
+ int first_allocated_level = image_is_16bit ? 0 : 1;
+
+ for (int level = first_allocated_level; level < n_levels; level++) {
+ PyramidLayer *layer = &pyr->layers[level];
+
+ int level_width = width >> level;
+ int level_height = height >> level;
+
+ // Allocate padding for each layer
+ int padded_width = level_width + 2 * PYRAMID_PADDING;
+ int padded_height = level_height + 2 * PYRAMID_PADDING;
+
+ // Align the layer stride to be a multiple of PYRAMID_ALIGNMENT
+ // This ensures that, as long as the top-left pixel in this pyramid level is
+ // properly aligned, then so will the leftmost pixel in every row of the
+ // pyramid level.
+ int level_stride =
+ (padded_width + PYRAMID_ALIGNMENT - 1) & ~(PYRAMID_ALIGNMENT - 1);
+
+ size_t level_alloc_start = buffer_size;
+ size_t level_start =
+ level_alloc_start + PYRAMID_PADDING * level_stride + PYRAMID_PADDING;
+
+ buffer_size += level_stride * padded_height;
+
+ layer_offsets[level] = level_start;
+ layer->width = level_width;
+ layer->height = level_height;
+ layer->stride = level_stride;
+ }
+
+ pyr->buffer_alloc =
+ aom_memalign(PYRAMID_ALIGNMENT, buffer_size * sizeof(*pyr->buffer_alloc));
+ if (!pyr->buffer_alloc) {
+    aom_free(pyr->layers);
+    aom_free(pyr);
+    aom_free(layer_offsets);
+ return NULL;
+ }
+
+ // Fill in pointers for each level
+ // If image is 8-bit, then the lowest level is left unconfigured for now,
+ // and will be set up properly when the pyramid is filled in
+ for (int level = first_allocated_level; level < n_levels; level++) {
+ PyramidLayer *layer = &pyr->layers[level];
+ layer->buffer = pyr->buffer_alloc + layer_offsets[level];
+ }
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_init(&pyr->mutex, NULL);
+#endif // CONFIG_MULTITHREAD
+
+ aom_free(layer_offsets);
+ return pyr;
+}
+
+// Fill the border region of a pyramid frame.
+// This must be called after the main image area is filled out.
+// `img_buf` should point to the first pixel in the image area,
+// i.e. it should be pyr->layers[level].buffer.
+static INLINE void fill_border(uint8_t *img_buf, const int width,
+ const int height, const int stride) {
+ // Fill left and right areas
+ for (int row = 0; row < height; row++) {
+ uint8_t *row_start = &img_buf[row * stride];
+ uint8_t left_pixel = row_start[0];
+ memset(row_start - PYRAMID_PADDING, left_pixel, PYRAMID_PADDING);
+ uint8_t right_pixel = row_start[width - 1];
+ memset(row_start + width, right_pixel, PYRAMID_PADDING);
+ }
+
+ // Fill top area
+ for (int row = -PYRAMID_PADDING; row < 0; row++) {
+ uint8_t *row_start = &img_buf[row * stride];
+ memcpy(row_start - PYRAMID_PADDING, img_buf - PYRAMID_PADDING,
+ width + 2 * PYRAMID_PADDING);
+ }
+
+ // Fill bottom area
+ uint8_t *last_row_start = &img_buf[(height - 1) * stride];
+ for (int row = height; row < height + PYRAMID_PADDING; row++) {
+ uint8_t *row_start = &img_buf[row * stride];
+ memcpy(row_start - PYRAMID_PADDING, last_row_start - PYRAMID_PADDING,
+ width + 2 * PYRAMID_PADDING);
+ }
+}
+
+// Compute the coarse-to-fine pyramid levels for a frame.
+// This must only be called while holding frame_pyr->mutex
+static INLINE void fill_pyramid(const YV12_BUFFER_CONFIG *frame, int bit_depth,
+ ImagePyramid *frame_pyr) {
+ int n_levels = frame_pyr->n_levels;
+ const int frame_width = frame->y_crop_width;
+ const int frame_height = frame->y_crop_height;
+ const int frame_stride = frame->y_stride;
+ assert((frame_width >> n_levels) >= 0);
+ assert((frame_height >> n_levels) >= 0);
+
+ PyramidLayer *first_layer = &frame_pyr->layers[0];
+ if (frame->flags & YV12_FLAG_HIGHBITDEPTH) {
+ // For frames stored in a 16-bit buffer, we need to downconvert to 8 bits
+ assert(first_layer->width == frame_width);
+ assert(first_layer->height == frame_height);
+
+ uint16_t *frame_buffer = CONVERT_TO_SHORTPTR(frame->y_buffer);
+ uint8_t *pyr_buffer = first_layer->buffer;
+ int pyr_stride = first_layer->stride;
+ for (int y = 0; y < frame_height; y++) {
+ uint16_t *frame_row = frame_buffer + y * frame_stride;
+ uint8_t *pyr_row = pyr_buffer + y * pyr_stride;
+ for (int x = 0; x < frame_width; x++) {
+ pyr_row[x] = frame_row[x] >> (bit_depth - 8);
+ }
+ }
+
+ fill_border(pyr_buffer, frame_width, frame_height, pyr_stride);
+ } else {
+ // For frames stored in an 8-bit buffer, we need to configure the first
+ // pyramid layer to point at the original image buffer
+ first_layer->buffer = frame->y_buffer;
+ first_layer->width = frame_width;
+ first_layer->height = frame_height;
+ first_layer->stride = frame_stride;
+ }
+
+ // Fill in the remaining levels through progressive downsampling
+ for (int level = 1; level < n_levels; ++level) {
+ PyramidLayer *prev_layer = &frame_pyr->layers[level - 1];
+ uint8_t *prev_buffer = prev_layer->buffer;
+ int prev_stride = prev_layer->stride;
+
+ PyramidLayer *this_layer = &frame_pyr->layers[level];
+ uint8_t *this_buffer = this_layer->buffer;
+ int this_width = this_layer->width;
+ int this_height = this_layer->height;
+ int this_stride = this_layer->stride;
+
+    // Compute this pyramid level by downsampling the previous, finer level.
+ //
+ // We downsample by a factor of exactly 2, clipping the rightmost and
+    // bottommost pixel of the source level if needed. We do this for
+ // two main reasons:
+ //
+ // 1) In the disflow code, when stepping from a higher pyramid level to a
+ // lower pyramid level, we need to not just interpolate the flow field
+ // but also to scale each flow vector by the upsampling ratio.
+ // So it is much more convenient if this ratio is simply 2.
+ //
+ // 2) Up/downsampling by a factor of 2 can be implemented much more
+ // efficiently than up/downsampling by a generic ratio.
+ // TODO(rachelbarker): Use optimized downsample-by-2 function
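+    // For example, an odd-sized 101x75 source level yields a 50x37 level
+    // here; the rightmost column and bottom row of the source are dropped.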
+ av1_resize_plane(prev_buffer, this_height << 1, this_width << 1,
+ prev_stride, this_buffer, this_height, this_width,
+ this_stride);
+ fill_border(this_buffer, this_width, this_height, this_stride);
+ }
+}
+
+// Fill out a downsampling pyramid for a given frame.
+//
+// The top level (index 0) will always be an 8-bit copy of the input frame,
+// regardless of the input bit depth. Additional levels are then downscaled
+// by powers of 2.
+//
+// For small input frames, the number of levels actually constructed
+// will be limited so that the smallest image is at least MIN_PYRAMID_SIZE
+// pixels along each side.
+//
+// However, if the input frame has a side of length < MIN_PYRAMID_SIZE,
+// we will still construct the top level.
+void aom_compute_pyramid(const YV12_BUFFER_CONFIG *frame, int bit_depth,
+ ImagePyramid *pyr) {
+ assert(pyr);
+
+ // Per the comments in the ImagePyramid struct, we must take this mutex
+ // before reading or writing the "valid" flag, and hold it while computing
+ // the pyramid, to ensure proper behaviour if multiple threads call this
+ // function simultaneously
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ if (!pyr->valid) {
+ fill_pyramid(frame, bit_depth, pyr);
+ pyr->valid = true;
+ }
+
+ // At this point, the pyramid is guaranteed to be valid, and can be safely
+ // read from without holding the mutex any more
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+}
+
+#ifndef NDEBUG
+// Check if a pyramid has already been computed.
+// This is mostly a debug helper - as it is necessary to hold pyr->mutex
+// while reading the valid flag, we cannot just write:
+// assert(pyr->valid);
+// This function allows the check to be correctly written as:
+// assert(aom_is_pyramid_valid(pyr));
+bool aom_is_pyramid_valid(ImagePyramid *pyr) {
+ assert(pyr);
+
+ // Per the comments in the ImagePyramid struct, we must take this mutex
+ // before reading or writing the "valid" flag, and hold it while computing
+ // the pyramid, to ensure proper behaviour if multiple threads call this
+ // function simultaneously
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ bool valid = pyr->valid;
+
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+
+ return valid;
+}
+#endif
+
+// Mark a pyramid as no longer containing valid data.
+// This must be done whenever the corresponding frame buffer is reused
+void aom_invalidate_pyramid(ImagePyramid *pyr) {
+ if (pyr) {
+#if CONFIG_MULTITHREAD
+ pthread_mutex_lock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+ pyr->valid = false;
+#if CONFIG_MULTITHREAD
+ pthread_mutex_unlock(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+ }
+}
+
+// Release the memory associated with a pyramid
+void aom_free_pyramid(ImagePyramid *pyr) {
+ if (pyr) {
+#if CONFIG_MULTITHREAD
+ pthread_mutex_destroy(&pyr->mutex);
+#endif // CONFIG_MULTITHREAD
+ aom_free(pyr->buffer_alloc);
+ aom_free(pyr->layers);
+ aom_free(pyr);
+ }
+}
diff --git a/aom_dsp/pyramid.h b/aom_dsp/pyramid.h
new file mode 100644
index 0000000..812aae1
--- /dev/null
+++ b/aom_dsp/pyramid.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2022, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AOM_DSP_PYRAMID_H_
+#define AOM_AOM_DSP_PYRAMID_H_
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "config/aom_config.h"
+
+#include "aom_scale/yv12config.h"
+#include "aom_util/aom_thread.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Minimum dimensions of a downsampled image
+#define MIN_PYRAMID_SIZE_LOG2 3
+#define MIN_PYRAMID_SIZE (1 << MIN_PYRAMID_SIZE_LOG2)
+
+// Size of border around each pyramid image, in pixels
+// Similarly to the border around regular image buffers, this border is filled
+// with copies of the outermost pixels of the frame, to allow for more efficient
+// convolution code
+// TODO(rachelbarker): How many pixels do we actually need here?
+// I think we only need 9 for disflow, but how many for corner matching?
+#define PYRAMID_PADDING 16
+
+// Byte alignment of each line within the image pyramids.
+// That is, the first pixel inside the image (ie, not in the border region),
+// on each row of each pyramid level, is aligned to this byte alignment.
+// This value must be a power of 2.
+#define PYRAMID_ALIGNMENT 32
+
+typedef struct {
+ uint8_t *buffer;
+ int width;
+ int height;
+ int stride;
+} PyramidLayer;
+
+// Struct for an image pyramid
+typedef struct image_pyramid {
+#if CONFIG_MULTITHREAD
+ // Mutex which is used to prevent the pyramid being computed twice at the
+ // same time
+ //
+ // Semantics:
+ // * This mutex must be held whenever reading or writing the `valid` flag
+ //
+ // * This mutex must also be held while computing the image pyramid,
+ // to ensure that only one thread may do so at a time.
+ //
+ // * However, once you have read the valid flag and seen a true value,
+ // it is safe to drop the mutex and read from the remaining fields.
+ // This is because, once the image pyramid is computed, its contents
+ // will not be changed until the parent frame buffer is recycled,
+ // which will not happen until there are no more outstanding references
+ // to the frame buffer.
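+  //
+  // A reader thus does, in sketch form:
+  //   pthread_mutex_lock(&pyr->mutex);
+  //   bool valid = pyr->valid;
+  //   pthread_mutex_unlock(&pyr->mutex);
+  //   if (valid) { /* layers may now be read without the mutex */ }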
+ pthread_mutex_t mutex;
+#endif
+ // Flag indicating whether the pyramid contains valid data
+ bool valid;
+ // Number of allocated/filled levels in this pyramid
+ int n_levels;
+ // Pointer to allocated buffer
+ uint8_t *buffer_alloc;
+ // Data for each level
+ // The `buffer` pointers inside this array point into the region which
+ // is stored in the `buffer_alloc` field here
+ PyramidLayer *layers;
+} ImagePyramid;
+
+size_t aom_get_pyramid_alloc_size(int width, int height, int n_levels,
+ bool image_is_16bit);
+
+ImagePyramid *aom_alloc_pyramid(int width, int height, int n_levels,
+ bool image_is_16bit);
+
+// Fill out a downsampling pyramid for a given frame.
+//
+// The top level (index 0) will always be an 8-bit copy of the input frame,
+// regardless of the input bit depth. Additional levels are then downscaled
+// by powers of 2.
+//
+// For small input frames, the number of levels actually constructed
+// will be limited so that the smallest image is at least MIN_PYRAMID_SIZE
+// pixels along each side.
+//
+// However, if the input frame has a side of length < MIN_PYRAMID_SIZE,
+// we will still construct the top level.
+void aom_compute_pyramid(const YV12_BUFFER_CONFIG *frame, int bit_depth,
+ ImagePyramid *pyr);
+
+#ifndef NDEBUG
+// Check if a pyramid has already been computed.
+// This is mostly a debug helper - as it is necessary to hold pyr->mutex
+// while reading the valid flag, we cannot just write:
+// assert(pyr->valid);
+// This function allows the check to be correctly written as:
+// assert(aom_is_pyramid_valid(pyr));
+bool aom_is_pyramid_valid(ImagePyramid *pyr);
+#endif
+
+// Mark a pyramid as no longer containing valid data.
+// This must be done whenever the corresponding frame buffer is reused
+void aom_invalidate_pyramid(ImagePyramid *pyr);
+
+// Release the memory associated with a pyramid
+void aom_free_pyramid(ImagePyramid *pyr);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // AOM_AOM_DSP_PYRAMID_H_
diff --git a/aom_dsp/sad.c b/aom_dsp/sad.c
index 5b7b0e4..341a5ff 100644
--- a/aom_dsp/sad.c
+++ b/aom_dsp/sad.c
@@ -35,13 +35,6 @@
return sad;
}
-#define SAD_MXH(m) \
- unsigned int aom_sad##m##xh_c(const uint8_t *a, int a_stride, \
- const uint8_t *b, int b_stride, int width, \
- int height) { \
- return sad(a, a_stride, b, b_stride, width, height); \
- }
-
#define SADMXN(m, n) \
unsigned int aom_sad##m##x##n##_c(const uint8_t *src, int src_stride, \
const uint8_t *ref, int ref_stride) { \
@@ -68,7 +61,6 @@
return 2 * sad(src, 2 * src_stride, ref, 2 * ref_stride, (m), (n / 2)); \
}
-#if CONFIG_REALTIME_ONLY
// Calculate sad against 4 reference locations and store each in sad_array
#define SAD_MXNX4D(m, n) \
void aom_sad##m##x##n##x4d_c(const uint8_t *src, int src_stride, \
@@ -89,37 +81,6 @@
2 * ref_stride, (m), (n / 2)); \
} \
}
-#else // !CONFIG_REALTIME_ONLY
-// Calculate sad against 4 reference locations and store each in sad_array
-#define SAD_MXNX4D(m, n) \
- void aom_sad##m##x##n##x4d_c(const uint8_t *src, int src_stride, \
- const uint8_t *const ref_array[4], \
- int ref_stride, uint32_t sad_array[4]) { \
- int i; \
- for (i = 0; i < 4; ++i) { \
- sad_array[i] = \
- aom_sad##m##x##n##_c(src, src_stride, ref_array[i], ref_stride); \
- } \
- } \
- void aom_sad##m##x##n##x4d_avg_c( \
- const uint8_t *src, int src_stride, const uint8_t *const ref_array[4], \
- int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]) { \
- int i; \
- for (i = 0; i < 4; ++i) { \
- sad_array[i] = aom_sad##m##x##n##_avg_c(src, src_stride, ref_array[i], \
- ref_stride, second_pred); \
- } \
- } \
- void aom_sad_skip_##m##x##n##x4d_c(const uint8_t *src, int src_stride, \
- const uint8_t *const ref_array[4], \
- int ref_stride, uint32_t sad_array[4]) { \
- int i; \
- for (i = 0; i < 4; ++i) { \
- sad_array[i] = 2 * sad(src, 2 * src_stride, ref_array[i], \
- 2 * ref_stride, (m), (n / 2)); \
- } \
- }
-#endif // CONFIG_REALTIME_ONLY
// Call SIMD version of aom_sad_mxnx4d if the 3d version is unavailable.
#define SAD_MXNX3D(m, n) \
void aom_sad##m##x##n##x3d_c(const uint8_t *src, int src_stride, \
@@ -208,13 +169,7 @@
SAD_MXNX4D(4, 4)
SAD_MXNX3D(4, 4)
-SAD_MXH(128)
-SAD_MXH(64)
-SAD_MXH(32)
-SAD_MXH(16)
-SAD_MXH(8)
-SAD_MXH(4)
-
+#if !CONFIG_REALTIME_ONLY
SADMXN(4, 16)
SAD_MXNX4D(4, 16)
SADMXN(16, 4)
@@ -227,7 +182,6 @@
SAD_MXNX4D(16, 64)
SADMXN(64, 16)
SAD_MXNX4D(64, 16)
-#if !CONFIG_REALTIME_ONLY
SAD_MXNX3D(4, 16)
SAD_MXNX3D(16, 4)
SAD_MXNX3D(8, 32)
diff --git a/aom_dsp/simd/v128_intrinsics_arm.h b/aom_dsp/simd/v128_intrinsics_arm.h
index 2d497f4..6488de7 100644
--- a/aom_dsp/simd/v128_intrinsics_arm.h
+++ b/aom_dsp/simd/v128_intrinsics_arm.h
@@ -14,6 +14,8 @@
#include <arm_neon.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/simd/v64_intrinsics_arm.h"
typedef int64x2_t v128;
@@ -29,7 +31,7 @@
SIMD_INLINE v128 v128_from_v64(v64 a, v64 b) { return vcombine_s64(b, a); }
SIMD_INLINE v128 v128_from_64(uint64_t a, uint64_t b) {
- return vcombine_s64((int64x1_t)b, (int64x1_t)a);
+ return vcombine_s64(vcreate_s64(b), vcreate_s64(a));
}
SIMD_INLINE v128 v128_from_32(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
@@ -97,11 +99,11 @@
int16x8_t t2 = vmulq_s16(
vmovl_s8(vreinterpret_s8_s64(vget_high_s64(a))),
vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_s64(vget_high_s64(b)))));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_s16(t1) + vaddlvq_s16(t2);
#else
int64x2_t t = vpaddlq_s32(vaddq_s32(vpaddlq_s16(t1), vpaddlq_s16(t2)));
- return (int64_t)vget_high_s64(t) + (int64_t)vget_low_s64(t);
+ return vget_lane_s64(vadd_s64(vget_high_s64(t), vget_low_s64(t)), 0);
#endif
}
@@ -113,11 +115,11 @@
SIMD_INLINE int64_t v128_dotp_s32(v128 a, v128 b) {
int64x2_t t = vpaddlq_s32(
vmulq_s32(vreinterpretq_s32_s64(a), vreinterpretq_s32_s64(b)));
- return (int64_t)vget_high_s64(t) + (int64_t)vget_low_s64(t);
+ return vget_lane_s64(vadd_s64(vget_high_s64(t), vget_low_s64(t)), 0);
}
SIMD_INLINE uint64_t v128_hadd_u8(v128 x) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u8(vreinterpretq_u8_s64(x));
#else
uint64x2_t t = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vreinterpretq_u8_s64(x))));
@@ -155,11 +157,12 @@
}
SIMD_INLINE uint32_t v128_sad_u8_sum(sad128_internal s) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u16(s.hi) + vaddlvq_u16(s.lo);
#else
uint64x2_t t = vpaddlq_u32(vpaddlq_u16(vaddq_u16(s.hi, s.lo)));
- return (uint32_t)(uint64_t)(vget_high_u64(t) + vget_low_u64(t));
+ return (uint32_t)vget_lane_u64(vadd_u64(vget_high_u64(t), vget_low_u64(t)),
+ 0);
#endif
}
@@ -285,7 +288,7 @@
}
SIMD_INLINE v128 v128_mulhi_s16(v128 a, v128 b) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_s16(vuzp2q_s16(
vreinterpretq_s16_s32(vmull_s16(vreinterpret_s16_s64(vget_low_s64(a)),
vreinterpret_s16_s64(vget_low_s64(b)))),
@@ -303,7 +306,7 @@
}
SIMD_INLINE v128 v128_madd_s16(v128 a, v128 b) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int32x4_t t1 = vmull_s16(vreinterpret_s16_s64(vget_low_s64(a)),
vreinterpret_s16_s64(vget_low_s64(b)));
int32x4_t t2 =
@@ -316,7 +319,7 @@
}
SIMD_INLINE v128 v128_madd_us8(v128 a, v128 b) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t t1 = vmulq_s16(
vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_s64(vget_low_s64(a)))),
vmovl_s8(vreinterpret_s8_s64(vget_low_s64(b))));
@@ -368,7 +371,7 @@
SIMD_INLINE uint32_t v128_movemask_8(v128 a) {
a = vreinterpretq_s64_u8(vcltq_s8(vreinterpretq_s8_s64(a), vdupq_n_s8(0)));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x16_t m =
vandq_u8(vreinterpretq_u8_s64(a),
vreinterpretq_u8_u64(vdupq_n_u64(0x8040201008040201ULL)));
@@ -377,8 +380,8 @@
uint64x2_t m = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(
vandq_u8(vreinterpretq_u8_s64(a),
vreinterpretq_u8_u64(vdupq_n_u64(0x8040201008040201ULL))))));
- return v64_low_u32(
- v64_ziplo_8(v128_high_v64((v128)m), v128_low_v64((v128)m)));
+ int64x2_t s = vreinterpretq_s64_u64(m);
+ return v64_low_u32(v64_ziplo_8(vget_high_s64(s), vget_low_s64(s)));
#endif
}
@@ -413,7 +416,7 @@
}
SIMD_INLINE v128 v128_ziplo_8(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u8(
vzip1q_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x)));
#else
@@ -423,7 +426,7 @@
}
SIMD_INLINE v128 v128_ziphi_8(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u8(
vzip2q_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x)));
#else
@@ -438,7 +441,7 @@
}
SIMD_INLINE v128 v128_ziplo_16(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u16(
vzip1q_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x)));
#else
@@ -448,7 +451,7 @@
}
SIMD_INLINE v128 v128_ziphi_16(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u16(
vzip2q_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x)));
#else
@@ -463,7 +466,7 @@
}
SIMD_INLINE v128 v128_ziplo_32(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u32(
vzip1q_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x)));
#else
@@ -473,7 +476,7 @@
}
SIMD_INLINE v128 v128_ziphi_32(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u32(
vzip2q_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x)));
#else
@@ -488,16 +491,15 @@
}
SIMD_INLINE v128 v128_ziplo_64(v128 a, v128 b) {
- return v128_from_v64(vget_low_s64((int64x2_t)a), vget_low_s64((int64x2_t)b));
+ return v128_from_v64(vget_low_s64(a), vget_low_s64(b));
}
SIMD_INLINE v128 v128_ziphi_64(v128 a, v128 b) {
- return v128_from_v64(vget_high_s64((int64x2_t)a),
- vget_high_s64((int64x2_t)b));
+ return v128_from_v64(vget_high_s64(a), vget_high_s64(b));
}
SIMD_INLINE v128 v128_unziplo_8(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u8(
vuzp1q_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x)));
#else
@@ -507,7 +509,7 @@
}
SIMD_INLINE v128 v128_unziphi_8(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u8(
vuzp2q_u8(vreinterpretq_u8_s64(y), vreinterpretq_u8_s64(x)));
#else
@@ -517,7 +519,7 @@
}
SIMD_INLINE v128 v128_unziplo_16(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u16(
vuzp1q_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x)));
#else
@@ -528,7 +530,7 @@
}
SIMD_INLINE v128 v128_unziphi_16(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u16(
vuzp2q_u16(vreinterpretq_u16_s64(y), vreinterpretq_u16_s64(x)));
#else
@@ -539,7 +541,7 @@
}
SIMD_INLINE v128 v128_unziplo_32(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u32(
vuzp1q_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x)));
#else
@@ -550,7 +552,7 @@
}
SIMD_INLINE v128 v128_unziphi_32(v128 x, v128 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u32(
vuzp2q_u32(vreinterpretq_u32_s64(y), vreinterpretq_u32_s64(x)));
#else
@@ -637,16 +639,18 @@
}
SIMD_INLINE v128 v128_shuffle_8(v128 x, v128 pattern) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpretq_s64_u8(
vqtbl1q_u8(vreinterpretq_u8_s64(x), vreinterpretq_u8_s64(pattern)));
#else
uint8x8x2_t p = { { vget_low_u8(vreinterpretq_u8_s64(x)),
vget_high_u8(vreinterpretq_u8_s64(x)) } };
- return v128_from_64((uint64_t)vreinterpret_s64_u8(vtbl2_u8(
- p, vreinterpret_u8_s64(vget_high_s64(pattern)))),
- (uint64_t)vreinterpret_s64_u8(vtbl2_u8(
- p, vreinterpret_u8_s64(vget_low_s64(pattern)))));
+ uint8x8_t shuffle_hi =
+ vtbl2_u8(p, vreinterpret_u8_s64(vget_high_s64(pattern)));
+ uint8x8_t shuffle_lo =
+ vtbl2_u8(p, vreinterpret_u8_s64(vget_low_s64(pattern)));
+ return v128_from_64(vget_lane_u64(vreinterpret_u64_u8(shuffle_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle_lo), 0));
#endif
}
@@ -697,72 +701,72 @@
SIMD_INLINE v128 v128_shl_8(v128 a, unsigned int c) {
return (c > 7) ? v128_zero()
- : vreinterpretq_s64_u8(
- vshlq_u8(vreinterpretq_u8_s64(a), vdupq_n_s8(c)));
+ : vreinterpretq_s64_u8(vshlq_u8(vreinterpretq_u8_s64(a),
+ vdupq_n_s8((int8_t)c)));
}
SIMD_INLINE v128 v128_shr_u8(v128 a, unsigned int c) {
return (c > 7) ? v128_zero()
- : vreinterpretq_s64_u8(
- vshlq_u8(vreinterpretq_u8_s64(a), vdupq_n_s8(-c)));
+ : vreinterpretq_s64_u8(vshlq_u8(vreinterpretq_u8_s64(a),
+ vdupq_n_s8(-(int8_t)c)));
}
SIMD_INLINE v128 v128_shr_s8(v128 a, unsigned int c) {
return (c > 7) ? v128_ones()
- : vreinterpretq_s64_s8(
- vshlq_s8(vreinterpretq_s8_s64(a), vdupq_n_s8(-c)));
+ : vreinterpretq_s64_s8(vshlq_s8(vreinterpretq_s8_s64(a),
+ vdupq_n_s8(-(int8_t)c)));
}
SIMD_INLINE v128 v128_shl_16(v128 a, unsigned int c) {
return (c > 15) ? v128_zero()
- : vreinterpretq_s64_u16(
- vshlq_u16(vreinterpretq_u16_s64(a), vdupq_n_s16(c)));
+ : vreinterpretq_s64_u16(vshlq_u16(vreinterpretq_u16_s64(a),
+ vdupq_n_s16((int16_t)c)));
}
SIMD_INLINE v128 v128_shr_u16(v128 a, unsigned int c) {
return (c > 15) ? v128_zero()
- : vreinterpretq_s64_u16(
- vshlq_u16(vreinterpretq_u16_s64(a), vdupq_n_s16(-c)));
+ : vreinterpretq_s64_u16(vshlq_u16(vreinterpretq_u16_s64(a),
+ vdupq_n_s16(-(int16_t)c)));
}
SIMD_INLINE v128 v128_shr_s16(v128 a, unsigned int c) {
return (c > 15) ? v128_ones()
- : vreinterpretq_s64_s16(
- vshlq_s16(vreinterpretq_s16_s64(a), vdupq_n_s16(-c)));
+ : vreinterpretq_s64_s16(vshlq_s16(vreinterpretq_s16_s64(a),
+ vdupq_n_s16(-(int16_t)c)));
}
SIMD_INLINE v128 v128_shl_32(v128 a, unsigned int c) {
return (c > 31) ? v128_zero()
- : vreinterpretq_s64_u32(
- vshlq_u32(vreinterpretq_u32_s64(a), vdupq_n_s32(c)));
+ : vreinterpretq_s64_u32(vshlq_u32(vreinterpretq_u32_s64(a),
+ vdupq_n_s32((int32_t)c)));
}
SIMD_INLINE v128 v128_shr_u32(v128 a, unsigned int c) {
return (c > 31) ? v128_zero()
- : vreinterpretq_s64_u32(
- vshlq_u32(vreinterpretq_u32_s64(a), vdupq_n_s32(-c)));
+ : vreinterpretq_s64_u32(vshlq_u32(vreinterpretq_u32_s64(a),
+ vdupq_n_s32(-(int32_t)c)));
}
SIMD_INLINE v128 v128_shr_s32(v128 a, unsigned int c) {
return (c > 31) ? v128_ones()
- : vreinterpretq_s64_s32(
- vshlq_s32(vreinterpretq_s32_s64(a), vdupq_n_s32(-c)));
+ : vreinterpretq_s64_s32(vshlq_s32(vreinterpretq_s32_s64(a),
+ vdupq_n_s32(-(int32_t)c)));
}
SIMD_INLINE v128 v128_shl_64(v128 a, unsigned int c) {
return (c > 63) ? v128_zero()
- : vreinterpretq_s64_u64(
- vshlq_u64(vreinterpretq_u64_s64(a), vdupq_n_s64(c)));
+ : vreinterpretq_s64_u64(vshlq_u64(vreinterpretq_u64_s64(a),
+ vdupq_n_s64((int64_t)c)));
}
SIMD_INLINE v128 v128_shr_u64(v128 a, unsigned int c) {
return (c > 63) ? v128_zero()
- : vreinterpretq_s64_u64(
- vshlq_u64(vreinterpretq_u64_s64(a), vdupq_n_s64(-c)));
+ : vreinterpretq_s64_u64(vshlq_u64(vreinterpretq_u64_s64(a),
+ vdupq_n_s64(-(int64_t)c)));
}
SIMD_INLINE v128 v128_shr_s64(v128 a, unsigned int c) {
- return (c > 63) ? v128_ones() : vshlq_s64(a, vdupq_n_s64(-c));
+ return (c > 63) ? v128_ones() : vshlq_s64(a, vdupq_n_s64(-(int64_t)c));
}
#if defined(__OPTIMIZE__) && __OPTIMIZE__ && !defined(__clang__)
@@ -949,8 +953,8 @@
SIMD_INLINE uint32_t v128_sad_u16_sum(sad128_internal_u16 s) {
uint64x2_t t = vpaddlq_u32(s);
- return (uint32_t)(uint64_t)vget_high_u64(t) +
- (uint32_t)(uint64_t)vget_low_u64(t);
+ return (uint32_t)vget_lane_u64(vadd_u64(vget_high_u64(t), vget_low_u64(t)),
+ 0);
}
typedef v128 ssd128_internal_s16;
diff --git a/aom_dsp/simd/v256_intrinsics_v128.h b/aom_dsp/simd/v256_intrinsics_v128.h
index 0d22667..4cd83f7 100644
--- a/aom_dsp/simd/v256_intrinsics_v128.h
+++ b/aom_dsp/simd/v256_intrinsics_v128.h
@@ -12,6 +12,8 @@
#ifndef AOM_AOM_DSP_SIMD_V256_INTRINSICS_V128_H_
#define AOM_AOM_DSP_SIMD_V256_INTRINSICS_V128_H_
+#include "config/aom_config.h"
+
#if HAVE_NEON
#include "aom_dsp/simd/v128_intrinsics_arm.h"
#elif HAVE_SSE2
@@ -614,7 +616,7 @@
SIMD_INLINE v256 v256_shuffle_8(v256 x, v256 pattern) {
#if HAVE_NEON
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x16x2_t p = { { vreinterpretq_u8_s64(x.val[0]),
vreinterpretq_u8_s64(x.val[1]) } };
return v256_from_v128(
@@ -626,15 +628,18 @@
vget_high_u8(vreinterpretq_u8_s64(x.val[0])),
vget_low_u8(vreinterpretq_u8_s64(x.val[1])),
vget_high_u8(vreinterpretq_u8_s64(x.val[1])) } };
- return v256_from_64(
- (uint64_t)vreinterpret_s64_u8(
- vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(pattern.val[1])))),
- (uint64_t)vreinterpret_s64_u8(
- vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(pattern.val[1])))),
- (uint64_t)vreinterpret_s64_u8(
- vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(pattern.val[0])))),
- (uint64_t)vreinterpret_s64_u8(
- vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(pattern.val[0])))));
+ uint8x8_t shuffle1_hi =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(pattern.val[1])));
+ uint8x8_t shuffle1_lo =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(pattern.val[1])));
+ uint8x8_t shuffle0_hi =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(pattern.val[0])));
+ uint8x8_t shuffle0_lo =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(pattern.val[0])));
+ return v256_from_64(vget_lane_u64(vreinterpret_u64_u8(shuffle1_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle1_lo), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_lo), 0));
#endif
#else
v128 c16 = v128_dup_8(16);
@@ -650,7 +655,7 @@
SIMD_INLINE v256 v256_wideshuffle_8(v256 x, v256 y, v256 pattern) {
#if HAVE_NEON
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x16x4_t p = { {
vreinterpretq_u8_s64(y.val[0]),
vreinterpretq_u8_s64(y.val[1]),
@@ -672,24 +677,26 @@
vget_high_u8(vreinterpretq_u8_s64(y.val[0])),
vget_low_u8(vreinterpretq_u8_s64(y.val[1])),
vget_high_u8(vreinterpretq_u8_s64(y.val[1])) } };
- v256 r1 =
- v256_from_64((uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- p, vreinterpret_u8_s64(vget_high_s64(p32.val[1])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- p, vreinterpret_u8_s64(vget_low_s64(p32.val[1])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- p, vreinterpret_u8_s64(vget_high_s64(p32.val[0])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- p, vreinterpret_u8_s64(vget_low_s64(p32.val[0])))));
- v256 r2 =
- v256_from_64((uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- q, vreinterpret_u8_s64(vget_high_s64(pattern.val[1])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- q, vreinterpret_u8_s64(vget_low_s64(pattern.val[1])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- q, vreinterpret_u8_s64(vget_high_s64(pattern.val[0])))),
- (uint64_t)vreinterpret_s64_u8(vtbl4_u8(
- q, vreinterpret_u8_s64(vget_low_s64(pattern.val[0])))));
+ uint8x8_t shuffle1_hi =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(p32.val[1])));
+ uint8x8_t shuffle1_lo =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(p32.val[1])));
+ uint8x8_t shuffle0_hi =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_high_s64(p32.val[0])));
+ uint8x8_t shuffle0_lo =
+ vtbl4_u8(p, vreinterpret_u8_s64(vget_low_s64(p32.val[0])));
+ v256 r1 = v256_from_64(vget_lane_u64(vreinterpret_u64_u8(shuffle1_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle1_lo), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_lo), 0));
+ shuffle1_hi = vtbl4_u8(q, vreinterpret_u8_s64(vget_high_s64(pattern.val[1])));
+ shuffle1_lo = vtbl4_u8(q, vreinterpret_u8_s64(vget_low_s64(pattern.val[1])));
+ shuffle0_hi = vtbl4_u8(q, vreinterpret_u8_s64(vget_high_s64(pattern.val[0])));
+ shuffle0_lo = vtbl4_u8(q, vreinterpret_u8_s64(vget_low_s64(pattern.val[0])));
+ v256 r2 = v256_from_64(vget_lane_u64(vreinterpret_u64_u8(shuffle1_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle1_lo), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_hi), 0),
+ vget_lane_u64(vreinterpret_u64_u8(shuffle0_lo), 0));
return v256_blend_8(r1, r2, v256_cmplt_s8(pattern, c32));
#endif
#else
diff --git a/aom_dsp/simd/v64_intrinsics_arm.h b/aom_dsp/simd/v64_intrinsics_arm.h
index a4ecdf4..8d07c34 100644
--- a/aom_dsp/simd/v64_intrinsics_arm.h
+++ b/aom_dsp/simd/v64_intrinsics_arm.h
@@ -13,6 +13,9 @@
#define AOM_AOM_DSP_SIMD_V64_INTRINSICS_ARM_H_
#include <arm_neon.h>
+#include <string.h>
+
+#include "config/aom_config.h"
#include "aom_dsp/simd/v64_intrinsics_arm.h"
#include "aom_ports/arm.h"
@@ -50,7 +53,7 @@
SIMD_INLINE v64 v64_from_64(uint64_t x) { return vcreate_s64(x); }
-SIMD_INLINE uint64_t v64_u64(v64 x) { return (uint64_t)x; }
+SIMD_INLINE uint64_t v64_u64(v64 x) { return (uint64_t)vget_lane_s64(x, 0); }
SIMD_INLINE uint32_t u32_load_aligned(const void *p) {
return *((uint32_t *)p);
@@ -77,8 +80,7 @@
} __attribute__((__packed__));
((struct Unaligned32Struct *)p)->value = a;
#else
- vst1_lane_u32((uint32_t *)p, vreinterpret_u32_s64((uint64x1_t)(uint64_t)a),
- 0);
+ memcpy(p, &a, 4);
#endif
}
@@ -106,7 +108,8 @@
vext_s8(vreinterpret_s8_s64(b), vreinterpret_s8_s64(a), c))
: b;
#else
- return c ? v64_from_64(((uint64_t)b >> c * 8) | ((uint64_t)a << (8 - c) * 8))
+ return c ? v64_from_64(((uint64_t)vget_lane_s64(b, 0) >> c * 8) |
+ ((uint64_t)vget_lane_s64(a, 0) << (8 - c) * 8))
: b;
#endif
}
@@ -129,35 +132,36 @@
int16x8_t t =
vmulq_s16(vmovl_s8(vreinterpret_s8_s64(x)),
vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_s64(y))));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_s16(t);
#else
int64x2_t r = vpaddlq_s32(vpaddlq_s16(t));
- return (int64_t)vadd_s64(vget_high_s64(r), vget_low_s64(r));
+ return vget_lane_s64(vadd_s64(vget_high_s64(r), vget_low_s64(r)), 0);
#endif
}
SIMD_INLINE int64_t v64_dotp_s16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_s32(
vmull_s16(vreinterpret_s16_s64(x), vreinterpret_s16_s64(y)));
#else
int64x2_t r =
vpaddlq_s32(vmull_s16(vreinterpret_s16_s64(x), vreinterpret_s16_s64(y)));
- return (int64_t)(vget_high_s64(r) + vget_low_s64(r));
+ return vget_lane_s64(vadd_s64(vget_high_s64(r), vget_low_s64(r)), 0);
#endif
}
SIMD_INLINE uint64_t v64_hadd_u8(v64 x) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlv_u8(vreinterpret_u8_s64(x));
#else
- return (uint64_t)vpaddl_u32(vpaddl_u16(vpaddl_u8(vreinterpret_u8_s64(x))));
+ return vget_lane_u64(
+ vpaddl_u32(vpaddl_u16(vpaddl_u8(vreinterpret_u8_s64(x)))), 0);
#endif
}
SIMD_INLINE int64_t v64_hadd_s16(v64 a) {
- return (int64_t)vpaddl_s32(vpaddl_s16(vreinterpret_s16_s64(a)));
+ return vget_lane_s64(vpaddl_s32(vpaddl_s16(vreinterpret_s16_s64(a))), 0);
}
typedef uint16x8_t sad64_internal;
@@ -171,11 +175,12 @@
}
SIMD_INLINE uint32_t v64_sad_u8_sum(sad64_internal s) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_u16(s);
#else
uint64x2_t r = vpaddlq_u32(vpaddlq_u16(s));
- return (uint32_t)(uint64_t)(vget_high_u64(r) + vget_low_u64(r));
+ return (uint32_t)vget_lane_u64(vadd_u64(vget_high_u64(r), vget_low_u64(r)),
+ 0);
#endif
}
@@ -191,7 +196,7 @@
}
SIMD_INLINE uint32_t v64_ssd_u8_sum(ssd64_internal s) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddvq_u32(s);
#else
uint64x2_t t = vpaddlq_u32(s);
@@ -287,7 +292,7 @@
}
SIMD_INLINE v64 v64_mulhi_s16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t t = vreinterpretq_s16_s32(
vmull_s16(vreinterpret_s16_s64(x), vreinterpret_s16_s64(y)));
return vget_low_s64(vreinterpretq_s64_s16(vuzp2q_s16(t, t)));
@@ -367,7 +372,7 @@
}
SIMD_INLINE v64 v64_ziplo_8(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u8(
vzip1_u8(vreinterpret_u8_s64(y), vreinterpret_u8_s64(x)));
#else
@@ -377,7 +382,7 @@
}
SIMD_INLINE v64 v64_ziphi_8(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u8(
vzip2_u8(vreinterpret_u8_s64(y), vreinterpret_u8_s64(x)));
#else
@@ -387,7 +392,7 @@
}
SIMD_INLINE v64 v64_ziplo_16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u16(
vzip1_u16(vreinterpret_u16_s64(y), vreinterpret_u16_s64(x)));
#else
@@ -397,7 +402,7 @@
}
SIMD_INLINE v64 v64_ziphi_16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u16(
vzip2_u16(vreinterpret_u16_s64(y), vreinterpret_u16_s64(x)));
#else
@@ -407,7 +412,7 @@
}
SIMD_INLINE v64 v64_ziplo_32(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u32(
vzip1_u32(vreinterpret_u32_s64(y), vreinterpret_u32_s64(x)));
#else
@@ -417,7 +422,7 @@
}
SIMD_INLINE v64 v64_ziphi_32(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u32(
vzip2_u32(vreinterpret_u32_s64(y), vreinterpret_u32_s64(x)));
#else
@@ -463,7 +468,7 @@
}
SIMD_INLINE v64 v64_unziplo_8(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u8(
vuzp1_u8(vreinterpret_u8_s64(y), vreinterpret_u8_s64(x)));
#else
@@ -473,7 +478,7 @@
}
SIMD_INLINE v64 v64_unziphi_8(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u8(
vuzp2_u8(vreinterpret_u8_s64(y), vreinterpret_u8_s64(x)));
#else
@@ -483,7 +488,7 @@
}
SIMD_INLINE v64 v64_unziplo_16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u16(
vuzp1_u16(vreinterpret_u16_s64(y), vreinterpret_u16_s64(x)));
#else
@@ -493,7 +498,7 @@
}
SIMD_INLINE v64 v64_unziphi_16(v64 x, v64 y) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vreinterpret_s64_u16(
vuzp2_u16(vreinterpret_u16_s64(y), vreinterpret_u16_s64(x)));
#else
@@ -556,43 +561,48 @@
}
SIMD_INLINE v64 v64_shl_8(v64 a, unsigned int c) {
- return vreinterpret_s64_u8(vshl_u8(vreinterpret_u8_s64(a), vdup_n_s8(c)));
+ return vreinterpret_s64_u8(
+ vshl_u8(vreinterpret_u8_s64(a), vdup_n_s8((int8_t)c)));
}
SIMD_INLINE v64 v64_shr_u8(v64 a, unsigned int c) {
- return vreinterpret_s64_u8(vshl_u8(vreinterpret_u8_s64(a), vdup_n_s8(-c)));
+ return vreinterpret_s64_u8(
+ vshl_u8(vreinterpret_u8_s64(a), vdup_n_s8(-(int8_t)c)));
}
SIMD_INLINE v64 v64_shr_s8(v64 a, unsigned int c) {
- return vreinterpret_s64_s8(vshl_s8(vreinterpret_s8_s64(a), vdup_n_s8(-c)));
+ return vreinterpret_s64_s8(
+ vshl_s8(vreinterpret_s8_s64(a), vdup_n_s8(-(int8_t)c)));
}
SIMD_INLINE v64 v64_shl_16(v64 a, unsigned int c) {
- return vreinterpret_s64_u16(vshl_u16(vreinterpret_u16_s64(a), vdup_n_s16(c)));
+ return vreinterpret_s64_u16(
+ vshl_u16(vreinterpret_u16_s64(a), vdup_n_s16((int16_t)c)));
}
SIMD_INLINE v64 v64_shr_u16(v64 a, unsigned int c) {
return vreinterpret_s64_u16(
- vshl_u16(vreinterpret_u16_s64(a), vdup_n_s16(-(int)c)));
+ vshl_u16(vreinterpret_u16_s64(a), vdup_n_s16(-(int16_t)c)));
}
SIMD_INLINE v64 v64_shr_s16(v64 a, unsigned int c) {
return vreinterpret_s64_s16(
- vshl_s16(vreinterpret_s16_s64(a), vdup_n_s16(-(int)c)));
+ vshl_s16(vreinterpret_s16_s64(a), vdup_n_s16(-(int16_t)c)));
}
SIMD_INLINE v64 v64_shl_32(v64 a, unsigned int c) {
- return vreinterpret_s64_u32(vshl_u32(vreinterpret_u32_s64(a), vdup_n_s32(c)));
+ return vreinterpret_s64_u32(
+ vshl_u32(vreinterpret_u32_s64(a), vdup_n_s32((int32_t)c)));
}
SIMD_INLINE v64 v64_shr_u32(v64 a, unsigned int c) {
return vreinterpret_s64_u32(
- vshl_u32(vreinterpret_u32_s64(a), vdup_n_s32(-(int)c)));
+ vshl_u32(vreinterpret_u32_s64(a), vdup_n_s32(-(int32_t)c)));
}
SIMD_INLINE v64 v64_shr_s32(v64 a, unsigned int c) {
return vreinterpret_s64_s32(
- vshl_s32(vreinterpret_s32_s64(a), vdup_n_s32(-(int)c)));
+ vshl_s32(vreinterpret_s32_s64(a), vdup_n_s32(-(int32_t)c)));
}
// The following functions require an immediate.
diff --git a/aom_dsp/variance.c b/aom_dsp/variance.c
index f72feea..63c1e5f 100644
--- a/aom_dsp/variance.c
+++ b/aom_dsp/variance.c
@@ -25,24 +25,6 @@
#include "av1/common/filter.h"
#include "av1/common/reconinter.h"
-uint32_t aom_get4x4sse_cs_c(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride) {
- int distortion = 0;
- int r, c;
-
- for (r = 0; r < 4; ++r) {
- for (c = 0; c < 4; ++c) {
- int diff = a[c] - b[c];
- distortion += diff * diff;
- }
-
- a += a_stride;
- b += b_stride;
- }
-
- return distortion;
-}
-
uint32_t aom_get_mb_ss_c(const int16_t *a) {
unsigned int i, sum = 0;
@@ -198,17 +180,6 @@
return aom_variance##W##x##H(temp3, W, b, b_stride, sse); \
}
-/* Identical to the variance call except it takes an additional parameter, sum,
- * and returns that value using pass-by-reference instead of returning
- * sse - sum^2 / w*h
- */
-#define GET_VAR(W, H) \
- void aom_get##W##x##H##var_c(const uint8_t *a, int a_stride, \
- const uint8_t *b, int b_stride, uint32_t *sse, \
- int *sum) { \
- variance(a, a_stride, b, b_stride, W, H, sse, sum); \
- }
-
void aom_get_var_sse_sum_8x8_quad_c(const uint8_t *a, int a_stride,
const uint8_t *b, int b_stride,
uint32_t *sse8x8, int *sum8x8,
@@ -231,7 +202,7 @@
const uint8_t *ref_ptr, int ref_stride,
uint32_t *sse16x16, unsigned int *tot_sse,
int *tot_sum, uint32_t *var16x16) {
- int sum16x16[64] = { 0 };
+ int sum16x16[2] = { 0 };
// Loop over two consecutive 16x16 blocks and process as one 16x32 block.
for (int k = 0; k < 2; k++) {
variance(src_ptr + (k * 16), source_stride, ref_ptr + (k * 16), ref_stride,
@@ -281,9 +252,6 @@
VARIANCES(8, 4)
VARIANCES(4, 8)
VARIANCES(4, 4)
-VARIANCES(4, 2)
-VARIANCES(2, 4)
-VARIANCES(2, 2)
// Realtime mode doesn't use rectangular blocks.
#if !CONFIG_REALTIME_ONLY
@@ -295,9 +263,6 @@
VARIANCES(64, 16)
#endif
-GET_VAR(16, 16)
-GET_VAR(8, 8)
-
MSE(16, 16)
MSE(16, 8)
MSE(8, 16)
@@ -428,25 +393,6 @@
return (var >= 0) ? (uint32_t)var : 0; \
}
-#define HIGHBD_GET_VAR(S) \
- void aom_highbd_8_get##S##x##S##var_c(const uint8_t *src, int src_stride, \
- const uint8_t *ref, int ref_stride, \
- uint32_t *sse, int *sum) { \
- highbd_8_variance(src, src_stride, ref, ref_stride, S, S, sse, sum); \
- } \
- \
- void aom_highbd_10_get##S##x##S##var_c(const uint8_t *src, int src_stride, \
- const uint8_t *ref, int ref_stride, \
- uint32_t *sse, int *sum) { \
- highbd_10_variance(src, src_stride, ref, ref_stride, S, S, sse, sum); \
- } \
- \
- void aom_highbd_12_get##S##x##S##var_c(const uint8_t *src, int src_stride, \
- const uint8_t *ref, int ref_stride, \
- uint32_t *sse, int *sum) { \
- highbd_12_variance(src, src_stride, ref, ref_stride, S, S, sse, sum); \
- }
-
#define HIGHBD_MSE(W, H) \
uint32_t aom_highbd_8_mse##W##x##H##_c(const uint8_t *src, int src_stride, \
const uint8_t *ref, int ref_stride, \
@@ -706,9 +652,6 @@
HIGHBD_VARIANCES(8, 4)
HIGHBD_VARIANCES(4, 8)
HIGHBD_VARIANCES(4, 4)
-HIGHBD_VARIANCES(4, 2)
-HIGHBD_VARIANCES(2, 4)
-HIGHBD_VARIANCES(2, 2)
// Realtime mode doesn't use 4x rectangular blocks.
#if !CONFIG_REALTIME_ONLY
@@ -720,9 +663,6 @@
HIGHBD_VARIANCES(64, 16)
#endif
-HIGHBD_GET_VAR(8)
-HIGHBD_GET_VAR(16)
-
HIGHBD_MSE(16, 16)
HIGHBD_MSE(16, 8)
HIGHBD_MSE(8, 16)
diff --git a/aom_dsp/x86/aom_subpixel_8t_ssse3.asm b/aom_dsp/x86/aom_subpixel_8t_ssse3.asm
index 3ca7921..e5fafb0 100644
--- a/aom_dsp/x86/aom_subpixel_8t_ssse3.asm
+++ b/aom_dsp/x86/aom_subpixel_8t_ssse3.asm
@@ -30,7 +30,7 @@
%define LOCAL_VARS_SIZE 16*6
%macro SETUP_LOCAL_VARS 0
- ; TODO(slavarnway): using xmm registers for these on ARCH_X86_64 +
+ ; TODO(slavarnway): using xmm registers for these on AOM_ARCH_X86_64 +
; pmaddubsw has a higher latency on some platforms, this might be eased by
; interleaving the instructions.
%define k0k1 [rsp + 16*0]
@@ -52,7 +52,7 @@
mova k2k3, m1
mova k4k5, m2
mova k6k7, m3
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define krd m12
%define tmp0 [rsp + 16*4]
%define tmp1 [rsp + 16*5]
@@ -72,7 +72,7 @@
%endm
;-------------------------------------------------------------------------------
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define LOCAL_VARS_SIZE_H4 0
%else
%define LOCAL_VARS_SIZE_H4 16*4
@@ -83,7 +83,7 @@
src, sstride, dst, dstride, height, filter
mova m4, [filterq]
packsswb m4, m4
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define k0k1k4k5 m8
%define k2k3k6k7 m9
%define krd m10
@@ -346,7 +346,7 @@
psraw m0, 7
psraw m4, 7
%ifidn %1, h8_add_src
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
pcmpeqb m2, m2 ;all ones
psrlw m2, 8 ;even_byte_mask
%else
@@ -383,7 +383,7 @@
; TODO(Linfeng): Detect cpu type and choose the code with better performance.
%define X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON 1
-%if ARCH_X86_64 && X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+%if AOM_ARCH_X86_64 && X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
%define NUM_GENERAL_REG_USED 9
%else
%define NUM_GENERAL_REG_USED 6
@@ -403,9 +403,9 @@
dec heightd
-%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+%if AOM_ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define src1q r7
%define sstride6q r8
%define dst_stride dstrideq
@@ -528,7 +528,7 @@
movx [dstq], m0
%else
- ; ARCH_X86_64
+ ; AOM_ARCH_X86_64
movx m0, [srcq ] ;A
movx m1, [srcq + sstrideq ] ;B
@@ -628,7 +628,7 @@
%endif
movx [dstq], m0
-%endif ; ARCH_X86_64
+%endif ; AOM_ARCH_X86_64
.done:
REP_RET
@@ -642,9 +642,9 @@
mova m4, [filterq]
SETUP_LOCAL_VARS
-%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
+%if AOM_ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define src1q r7
%define sstride6q r8
%define dst_stride dstrideq
@@ -724,7 +724,7 @@
REP_RET
%else
- ; ARCH_X86_64
+ ; AOM_ARCH_X86_64
dec heightd
movu m1, [srcq ] ;A
@@ -860,7 +860,7 @@
.done:
REP_RET
-%endif ; ARCH_X86_64
+%endif ; AOM_ARCH_X86_64
%endm
diff --git a/aom_dsp/x86/avg_intrin_avx2.c b/aom_dsp/x86/avg_intrin_avx2.c
index c85d8c5..49fcd72 100644
--- a/aom_dsp/x86/avg_intrin_avx2.c
+++ b/aom_dsp/x86/avg_intrin_avx2.c
@@ -16,6 +16,14 @@
#include "aom_dsp/x86/bitdepth_conversion_avx2.h"
#include "aom_ports/mem.h"
+static INLINE void sign_extend_16bit_to_32bit_avx2(__m256i in, __m256i zero,
+ __m256i *out_lo,
+ __m256i *out_hi) {
+ const __m256i sign_bits = _mm256_cmpgt_epi16(zero, in);
+ *out_lo = _mm256_unpacklo_epi16(in, sign_bits);
+ *out_hi = _mm256_unpackhi_epi16(in, sign_bits);
+}
+
static void hadamard_col8x2_avx2(__m256i *in, int iter) {
__m256i a0 = in[0];
__m256i a1 = in[1];
@@ -224,6 +232,12 @@
DECLARE_ALIGNED(32, int16_t, temp_coeff[32 * 32]);
int16_t *t_coeff = temp_coeff;
int idx;
+ __m256i coeff0_lo, coeff1_lo, coeff2_lo, coeff3_lo, b0_lo, b1_lo, b2_lo,
+ b3_lo;
+ __m256i coeff0_hi, coeff1_hi, coeff2_hi, coeff3_hi, b0_hi, b1_hi, b2_hi,
+ b3_hi;
+ __m256i b0, b1, b2, b3;
+ const __m256i zero = _mm256_setzero_si256();
for (idx = 0; idx < 4; ++idx) {
// src_diff: 9 bit, dynamic range [-255, 255]
const int16_t *src_ptr =
@@ -238,15 +252,38 @@
const __m256i coeff2 = _mm256_loadu_si256((const __m256i *)(t_coeff + 512));
const __m256i coeff3 = _mm256_loadu_si256((const __m256i *)(t_coeff + 768));
- __m256i b0 = _mm256_add_epi16(coeff0, coeff1);
- __m256i b1 = _mm256_sub_epi16(coeff0, coeff1);
- __m256i b2 = _mm256_add_epi16(coeff2, coeff3);
- __m256i b3 = _mm256_sub_epi16(coeff2, coeff3);
+    // Sign extend 16 bit to 32 bit, so the adds and subtracts below cannot
+    // overflow int16_t; results are packed back to 16 bits with saturation.
+ sign_extend_16bit_to_32bit_avx2(coeff0, zero, &coeff0_lo, &coeff0_hi);
+ sign_extend_16bit_to_32bit_avx2(coeff1, zero, &coeff1_lo, &coeff1_hi);
+ sign_extend_16bit_to_32bit_avx2(coeff2, zero, &coeff2_lo, &coeff2_hi);
+ sign_extend_16bit_to_32bit_avx2(coeff3, zero, &coeff3_lo, &coeff3_hi);
- b0 = _mm256_srai_epi16(b0, 2);
- b1 = _mm256_srai_epi16(b1, 2);
- b2 = _mm256_srai_epi16(b2, 2);
- b3 = _mm256_srai_epi16(b3, 2);
+ b0_lo = _mm256_add_epi32(coeff0_lo, coeff1_lo);
+ b0_hi = _mm256_add_epi32(coeff0_hi, coeff1_hi);
+
+ b1_lo = _mm256_sub_epi32(coeff0_lo, coeff1_lo);
+ b1_hi = _mm256_sub_epi32(coeff0_hi, coeff1_hi);
+
+ b2_lo = _mm256_add_epi32(coeff2_lo, coeff3_lo);
+ b2_hi = _mm256_add_epi32(coeff2_hi, coeff3_hi);
+
+ b3_lo = _mm256_sub_epi32(coeff2_lo, coeff3_lo);
+ b3_hi = _mm256_sub_epi32(coeff2_hi, coeff3_hi);
+
+ b0_lo = _mm256_srai_epi32(b0_lo, 2);
+ b1_lo = _mm256_srai_epi32(b1_lo, 2);
+ b2_lo = _mm256_srai_epi32(b2_lo, 2);
+ b3_lo = _mm256_srai_epi32(b3_lo, 2);
+
+ b0_hi = _mm256_srai_epi32(b0_hi, 2);
+ b1_hi = _mm256_srai_epi32(b1_hi, 2);
+ b2_hi = _mm256_srai_epi32(b2_hi, 2);
+ b3_hi = _mm256_srai_epi32(b3_hi, 2);
+
+ b0 = _mm256_packs_epi32(b0_lo, b0_hi);
+ b1 = _mm256_packs_epi32(b1_lo, b1_hi);
+ b2 = _mm256_packs_epi32(b2_lo, b2_hi);
+ b3 = _mm256_packs_epi32(b3_lo, b3_hi);
store_tran_low(_mm256_add_epi16(b0, b2), coeff);
store_tran_low(_mm256_add_epi16(b1, b3), coeff + 256);
diff --git a/aom_dsp/x86/avg_intrin_sse2.c b/aom_dsp/x86/avg_intrin_sse2.c
index 71e7028..ca2752e 100644
--- a/aom_dsp/x86/avg_intrin_sse2.c
+++ b/aom_dsp/x86/avg_intrin_sse2.c
@@ -17,6 +17,14 @@
#include "aom_dsp/x86/mem_sse2.h"
#include "aom_ports/mem.h"
+static INLINE void sign_extend_16bit_to_32bit_sse2(__m128i in, __m128i zero,
+ __m128i *out_lo,
+ __m128i *out_hi) {
+ const __m128i sign_bits = _mm_cmplt_epi16(in, zero);
+ *out_lo = _mm_unpacklo_epi16(in, sign_bits);
+ *out_hi = _mm_unpackhi_epi16(in, sign_bits);
+}
+
void aom_minmax_8x8_sse2(const uint8_t *s, int p, const uint8_t *d, int dp,
int *min, int *max) {
__m128i u0, s0, d0, diff, maxabsdiff, minabsdiff, negdiff, absdiff0, absdiff;
@@ -344,56 +352,6 @@
hadamard_8x8_sse2(src_diff, src_stride, coeff, 1);
}
-void aom_pixel_scale_sse2(const int16_t *src_diff, ptrdiff_t src_stride,
- int16_t *coeff, int log_scale, int h8, int w8) {
- __m128i src[8];
- const int16_t *org_src_diff = src_diff;
- int16_t *org_coeff = coeff;
- int coeff_stride = w8 << 3;
- for (int idy = 0; idy < h8; ++idy) {
- for (int idx = 0; idx < w8; ++idx) {
- src_diff = org_src_diff + (idx << 3);
- coeff = org_coeff + (idx << 3);
-
- src[0] = _mm_load_si128((const __m128i *)src_diff);
- src[1] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[2] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[3] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[4] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[5] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[6] = _mm_load_si128((const __m128i *)(src_diff += src_stride));
- src[7] = _mm_load_si128((const __m128i *)(src_diff + src_stride));
-
- src[0] = _mm_slli_epi16(src[0], log_scale);
- src[1] = _mm_slli_epi16(src[1], log_scale);
- src[2] = _mm_slli_epi16(src[2], log_scale);
- src[3] = _mm_slli_epi16(src[3], log_scale);
- src[4] = _mm_slli_epi16(src[4], log_scale);
- src[5] = _mm_slli_epi16(src[5], log_scale);
- src[6] = _mm_slli_epi16(src[6], log_scale);
- src[7] = _mm_slli_epi16(src[7], log_scale);
-
- _mm_store_si128((__m128i *)coeff, src[0]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[1]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[2]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[3]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[4]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[5]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[6]);
- coeff += coeff_stride;
- _mm_store_si128((__m128i *)coeff, src[7]);
- }
- org_src_diff += (src_stride << 3);
- org_coeff += (coeff_stride << 3);
- }
-}
-
static INLINE void hadamard_lp_8x8_sse2(const int16_t *src_diff,
ptrdiff_t src_stride, int16_t *coeff) {
__m128i src[8];
@@ -552,6 +510,12 @@
DECLARE_ALIGNED(32, int16_t, temp_coeff[32 * 32]);
int16_t *t_coeff = temp_coeff;
int idx;
+ __m128i coeff0_lo, coeff1_lo, coeff2_lo, coeff3_lo, b0_lo, b1_lo, b2_lo,
+ b3_lo;
+ __m128i coeff0_hi, coeff1_hi, coeff2_hi, coeff3_hi, b0_hi, b1_hi, b2_hi,
+ b3_hi;
+ __m128i b0, b1, b2, b3;
+ const __m128i zero = _mm_setzero_si128();
for (idx = 0; idx < 4; ++idx) {
const int16_t *src_ptr =
src_diff + (idx >> 1) * 16 * src_stride + (idx & 0x01) * 16;
@@ -565,15 +529,38 @@
__m128i coeff2 = _mm_load_si128((const __m128i *)(t_coeff + 512));
__m128i coeff3 = _mm_load_si128((const __m128i *)(t_coeff + 768));
- __m128i b0 = _mm_add_epi16(coeff0, coeff1);
- __m128i b1 = _mm_sub_epi16(coeff0, coeff1);
- __m128i b2 = _mm_add_epi16(coeff2, coeff3);
- __m128i b3 = _mm_sub_epi16(coeff2, coeff3);
+ // Sign extend 16 bit to 32 bit.
+ sign_extend_16bit_to_32bit_sse2(coeff0, zero, &coeff0_lo, &coeff0_hi);
+ sign_extend_16bit_to_32bit_sse2(coeff1, zero, &coeff1_lo, &coeff1_hi);
+ sign_extend_16bit_to_32bit_sse2(coeff2, zero, &coeff2_lo, &coeff2_hi);
+ sign_extend_16bit_to_32bit_sse2(coeff3, zero, &coeff3_lo, &coeff3_hi);
- b0 = _mm_srai_epi16(b0, 2);
- b1 = _mm_srai_epi16(b1, 2);
- b2 = _mm_srai_epi16(b2, 2);
- b3 = _mm_srai_epi16(b3, 2);
+ b0_lo = _mm_add_epi32(coeff0_lo, coeff1_lo);
+ b0_hi = _mm_add_epi32(coeff0_hi, coeff1_hi);
+
+ b1_lo = _mm_sub_epi32(coeff0_lo, coeff1_lo);
+ b1_hi = _mm_sub_epi32(coeff0_hi, coeff1_hi);
+
+ b2_lo = _mm_add_epi32(coeff2_lo, coeff3_lo);
+ b2_hi = _mm_add_epi32(coeff2_hi, coeff3_hi);
+
+ b3_lo = _mm_sub_epi32(coeff2_lo, coeff3_lo);
+ b3_hi = _mm_sub_epi32(coeff2_hi, coeff3_hi);
+
+ b0_lo = _mm_srai_epi32(b0_lo, 2);
+ b1_lo = _mm_srai_epi32(b1_lo, 2);
+ b2_lo = _mm_srai_epi32(b2_lo, 2);
+ b3_lo = _mm_srai_epi32(b3_lo, 2);
+
+ b0_hi = _mm_srai_epi32(b0_hi, 2);
+ b1_hi = _mm_srai_epi32(b1_hi, 2);
+ b2_hi = _mm_srai_epi32(b2_hi, 2);
+ b3_hi = _mm_srai_epi32(b3_hi, 2);
+
+ b0 = _mm_packs_epi32(b0_lo, b0_hi);
+ b1 = _mm_packs_epi32(b1_lo, b1_hi);
+ b2 = _mm_packs_epi32(b2_lo, b2_hi);
+ b3 = _mm_packs_epi32(b3_lo, b3_hi);
coeff0 = _mm_add_epi16(b0, b2);
coeff1 = _mm_add_epi16(b1, b3);
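The hunks above fix a 16-bit overflow in the Hadamard cross sums: adding or subtracting two 16-bit coefficients can exceed the int16_t range, so both the AVX2 and SSE2 paths now sign-extend to 32 bits, do the add/subtract and the >> 2 there, and saturate back to 16 bits via packs. A minimal scalar sketch of the difference (illustrative only, not libaom code; assumes a two's-complement target):

#include <stdint.h>
#include <stdio.h>

// Saturate a 32-bit value to int16_t, mirroring _mm_packs_epi32.
static int16_t saturate16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

int main(void) {
  const int16_t c0 = 30000, c1 = 20000;  // hypothetical coefficients
  // Old path: the sum wraps when truncated to 16 bits before the shift.
  const int16_t wrapped = (int16_t)((int16_t)(c0 + c1) >> 2);
  // New path: widen to 32 bits, shift, then pack with saturation.
  const int16_t widened = saturate16(((int32_t)c0 + c1) >> 2);
  printf("16-bit path: %d, widened path: %d\n", wrapped, widened);
  return 0;
}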
diff --git a/aom_dsp/x86/blk_sse_sum_avx2.c b/aom_dsp/x86/blk_sse_sum_avx2.c
index f7c0eb0..fdf7de3 100644
--- a/aom_dsp/x86/blk_sse_sum_avx2.c
+++ b/aom_dsp/x86/blk_sse_sum_avx2.c
@@ -31,7 +31,7 @@
out_buffer = _mm256_castsi256_si128(regx_sum);
*x_sum += _mm_cvtsi128_si32(out_buffer);
out_buffer = _mm256_castsi256_si128(regx2_sum);
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
*x2_sum += _mm_cvtsi128_si64(out_buffer);
#else
{
diff --git a/aom_dsp/x86/blk_sse_sum_sse2.c b/aom_dsp/x86/blk_sse_sum_sse2.c
index ef0a024..bf89427 100644
--- a/aom_dsp/x86/blk_sse_sum_sse2.c
+++ b/aom_dsp/x86/blk_sse_sum_sse2.c
@@ -41,7 +41,7 @@
temp_buffer2 = _mm_unpackhi_epi32(regx2_sum, _mm_setzero_si128());
regx2_sum = _mm_add_epi64(temp_buffer1, temp_buffer2);
regx2_sum = _mm_add_epi64(regx2_sum, _mm_srli_si128(regx2_sum, 8));
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
*x2_sum += _mm_cvtsi128_si64(regx2_sum);
#else
{
@@ -82,7 +82,7 @@
temp_buffer2 = _mm_unpackhi_epi32(regx2_sum, _mm_setzero_si128());
regx2_sum = _mm_add_epi64(temp_buffer1, temp_buffer2);
regx2_sum = _mm_add_epi64(regx2_sum, _mm_srli_si128(regx2_sum, 8));
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
*x2_sum += _mm_cvtsi128_si64(regx2_sum);
#else
{
diff --git a/aom_dsp/x86/convolve_avx2.h b/aom_dsp/x86/convolve_avx2.h
index a709008..f5a382c 100644
--- a/aom_dsp/x86/convolve_avx2.h
+++ b/aom_dsp/x86/convolve_avx2.h
@@ -329,20 +329,20 @@
_mm256_castsi128_si256(_mm_loadu_si128( \
(__m128i *)(&src_ptr[i * src_stride + src_stride + j]))), \
0x20); \
- const __m256i s_16l = _mm256_unpacklo_epi8(data, v_zero); \
- const __m256i s_16h = _mm256_unpackhi_epi8(data, v_zero); \
- const __m256i s_ll = _mm256_unpacklo_epi16(s_16l, s_16l); \
- const __m256i s_lh = _mm256_unpackhi_epi16(s_16l, s_16l); \
+ const __m256i s_16lo = _mm256_unpacklo_epi8(data, v_zero); \
+ const __m256i s_16hi = _mm256_unpackhi_epi8(data, v_zero); \
+ const __m256i s_lolo = _mm256_unpacklo_epi16(s_16lo, s_16lo); \
+ const __m256i s_lohi = _mm256_unpackhi_epi16(s_16lo, s_16lo); \
\
- const __m256i s_hl = _mm256_unpacklo_epi16(s_16h, s_16h); \
- const __m256i s_hh = _mm256_unpackhi_epi16(s_16h, s_16h); \
+ const __m256i s_hilo = _mm256_unpacklo_epi16(s_16hi, s_16hi); \
+ const __m256i s_hihi = _mm256_unpackhi_epi16(s_16hi, s_16hi); \
\
- s[0] = _mm256_alignr_epi8(s_lh, s_ll, 2); \
- s[1] = _mm256_alignr_epi8(s_lh, s_ll, 10); \
- s[2] = _mm256_alignr_epi8(s_hl, s_lh, 2); \
- s[3] = _mm256_alignr_epi8(s_hl, s_lh, 10); \
- s[4] = _mm256_alignr_epi8(s_hh, s_hl, 2); \
- s[5] = _mm256_alignr_epi8(s_hh, s_hl, 10); \
+ s[0] = _mm256_alignr_epi8(s_lohi, s_lolo, 2); \
+ s[1] = _mm256_alignr_epi8(s_lohi, s_lolo, 10); \
+ s[2] = _mm256_alignr_epi8(s_hilo, s_lohi, 2); \
+ s[3] = _mm256_alignr_epi8(s_hilo, s_lohi, 10); \
+ s[4] = _mm256_alignr_epi8(s_hihi, s_hilo, 2); \
+ s[5] = _mm256_alignr_epi8(s_hihi, s_hilo, 10); \
\
const __m256i res_lo = convolve_12taps(s, coeffs_h); \
\
@@ -373,21 +373,21 @@
_mm256_castsi128_si256( \
_mm_loadu_si128((__m128i *)(&src_ptr[i * src_stride + j + 4]))), \
0x20); \
- const __m256i s_16l = _mm256_unpacklo_epi8(data, v_zero); \
- const __m256i s_16h = _mm256_unpackhi_epi8(data, v_zero); \
+ const __m256i s_16lo = _mm256_unpacklo_epi8(data, v_zero); \
+ const __m256i s_16hi = _mm256_unpackhi_epi8(data, v_zero); \
\
- const __m256i s_ll = _mm256_unpacklo_epi16(s_16l, s_16l); \
- const __m256i s_lh = _mm256_unpackhi_epi16(s_16l, s_16l); \
+ const __m256i s_lolo = _mm256_unpacklo_epi16(s_16lo, s_16lo); \
+ const __m256i s_lohi = _mm256_unpackhi_epi16(s_16lo, s_16lo); \
\
- const __m256i s_hl = _mm256_unpacklo_epi16(s_16h, s_16h); \
- const __m256i s_hh = _mm256_unpackhi_epi16(s_16h, s_16h); \
+ const __m256i s_hilo = _mm256_unpacklo_epi16(s_16hi, s_16hi); \
+ const __m256i s_hihi = _mm256_unpackhi_epi16(s_16hi, s_16hi); \
\
- s[0] = _mm256_alignr_epi8(s_lh, s_ll, 2); \
- s[1] = _mm256_alignr_epi8(s_lh, s_ll, 10); \
- s[2] = _mm256_alignr_epi8(s_hl, s_lh, 2); \
- s[3] = _mm256_alignr_epi8(s_hl, s_lh, 10); \
- s[4] = _mm256_alignr_epi8(s_hh, s_hl, 2); \
- s[5] = _mm256_alignr_epi8(s_hh, s_hl, 10); \
+ s[0] = _mm256_alignr_epi8(s_lohi, s_lolo, 2); \
+ s[1] = _mm256_alignr_epi8(s_lohi, s_lolo, 10); \
+ s[2] = _mm256_alignr_epi8(s_hilo, s_lohi, 2); \
+ s[3] = _mm256_alignr_epi8(s_hilo, s_lohi, 10); \
+ s[4] = _mm256_alignr_epi8(s_hihi, s_hilo, 2); \
+ s[5] = _mm256_alignr_epi8(s_hihi, s_hilo, 10); \
\
const __m256i res_lo = convolve_12taps(s, coeffs_h); \
\
diff --git a/aom_dsp/x86/fwd_txfm_impl_sse2.h b/aom_dsp/x86/fwd_txfm_impl_sse2.h
index 89fe189..7ee8ba3 100644
--- a/aom_dsp/x86/fwd_txfm_impl_sse2.h
+++ b/aom_dsp/x86/fwd_txfm_impl_sse2.h
@@ -180,25 +180,8 @@
const __m128i w1 = _mm_srai_epi32(v1, DCT_CONST_BITS2);
const __m128i w2 = _mm_srai_epi32(v2, DCT_CONST_BITS2);
const __m128i w3 = _mm_srai_epi32(v3, DCT_CONST_BITS2);
- // w0 = [o0 o4 o8 oC]
- // w1 = [o2 o6 oA oE]
- // w2 = [o1 o5 o9 oD]
- // w3 = [o3 o7 oB oF]
- // remember the o's are numbered according to the correct output location
- const __m128i x0 = _mm_packs_epi32(w0, w1);
- const __m128i x1 = _mm_packs_epi32(w2, w3);
- {
- // x0 = [o0 o4 o8 oC o2 o6 oA oE]
- // x1 = [o1 o5 o9 oD o3 o7 oB oF]
- const __m128i y0 = _mm_unpacklo_epi16(x0, x1);
- const __m128i y1 = _mm_unpackhi_epi16(x0, x1);
- // y0 = [o0 o1 o4 o5 o8 o9 oC oD]
- // y1 = [o2 o3 o6 o7 oA oB oE oF]
- *in0 = _mm_unpacklo_epi32(y0, y1);
- // in0 = [o0 o1 o2 o3 o4 o5 o6 o7]
- *in1 = _mm_unpackhi_epi32(y0, y1);
- // in1 = [o8 o9 oA oB oC oD oE oF]
- }
+ *in0 = _mm_packs_epi32(w0, w2);
+ *in1 = _mm_packs_epi32(w1, w3);
}
}
}
@@ -230,6 +213,7 @@
_mm_storeu_si128((__m128i *)(output + 2 * 4), in1);
}
+#if CONFIG_INTERNAL_STATS
void FDCT8x8_2D(const int16_t *input, tran_low_t *output, int stride) {
int pass;
// Constants
@@ -539,6 +523,7 @@
store_output(&in7, (output + 7 * 8));
}
}
+#endif // CONFIG_INTERNAL_STATS
#undef ADD_EPI16
#undef SUB_EPI16
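The 4x4 forward-transform cleanup above removes the unpack/transpose that restored natural output order; the coefficients are now left in the order _mm_packs_epi32 produces (per the deleted comments, in0 becomes [o0 o4 o8 oC o1 o5 o9 oD]), which the remaining callers presumably tolerate. For reference, a small demo of that intrinsic's lane order and saturation (the test harness is mine, not libaom's):

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  // _mm_packs_epi32(lo, hi) emits the four lanes of 'lo' first, then the
  // four lanes of 'hi', saturating each 32-bit value to int16_t.
  const __m128i lo = _mm_setr_epi32(1, 2, 3, 100000);
  const __m128i hi = _mm_setr_epi32(-1, -2, -3, -100000);
  int16_t out[8];
  _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(lo, hi));
  for (int i = 0; i < 8; i++) printf("%d ", out[i]);
  printf("\n");  // prints: 1 2 3 32767 -1 -2 -3 -32768
  return 0;
}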
diff --git a/aom_dsp/x86/fwd_txfm_ssse3_x86_64.asm b/aom_dsp/x86/fwd_txfm_ssse3_x86_64.asm
index c1fb259..0687904 100644
--- a/aom_dsp/x86/fwd_txfm_ssse3_x86_64.asm
+++ b/aom_dsp/x86/fwd_txfm_ssse3_x86_64.asm
@@ -45,7 +45,7 @@
SECTION .text
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
INIT_XMM ssse3
cglobal fdct8x8, 3, 5, 13, input, output, stride
diff --git a/aom_dsp/x86/highbd_sad4d_sse2.asm b/aom_dsp/x86/highbd_sad4d_sse2.asm
index 9442cd0..03839b4 100644
--- a/aom_dsp/x86/highbd_sad4d_sse2.asm
+++ b/aom_dsp/x86/highbd_sad4d_sse2.asm
@@ -221,21 +221,21 @@
; 3: If 0, then normal sad, if 2, then skip every other row
%macro HIGH_SADNXN4D 2-3 0
%if %3 == 0 ; normal sad
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
cglobal highbd_sad%1x%2x4d, 5, 8, 8, src, src_stride, ref1, ref_stride, \
res, ref2, ref3, ref4
%else
cglobal highbd_sad%1x%2x4d, 4, 7, 8, src, src_stride, ref1, ref_stride, \
ref2, ref3, ref4
-%endif ; ARCH_X86_64
+%endif ; AOM_ARCH_X86_64
%else ; %3 == 2, downsample
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
cglobal highbd_sad_skip_%1x%2x4d, 5, 8, 8, src, src_stride, ref1, ref_stride, \
res, ref2, ref3, ref4
%else
cglobal highbd_sad_skip_%1x%2x4d, 4, 7, 8, src, src_stride, ref1, ref_stride, \
ref2, ref3, ref4
-%endif ; ARCH_X86_64
+%endif ; AOM_ARCH_X86_64
%endif ; sad/avg/skip
; set m1
diff --git a/aom_dsp/x86/highbd_sad_sse2.asm b/aom_dsp/x86/highbd_sad_sse2.asm
index 48b93bf..3dc4e4e 100644
--- a/aom_dsp/x86/highbd_sad_sse2.asm
+++ b/aom_dsp/x86/highbd_sad_sse2.asm
@@ -34,11 +34,11 @@
cglobal highbd_sad%1x%2_avg, 5, 1 + %3, %5, src, src_stride, ref, ref_stride, \
second_pred, n_rows
%else ; %3 == 7
-cglobal highbd_sad%1x%2_avg, 5, ARCH_X86_64 + %3, %5, src, src_stride, \
+cglobal highbd_sad%1x%2_avg, 5, AOM_ARCH_X86_64 + %3, %5, src, src_stride, \
ref, ref_stride, \
second_pred, \
src_stride3, ref_stride3
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define n_rowsd r7d
%else ; x86-32
%define n_rowsd dword r0m
diff --git a/aom_dsp/x86/highbd_subpel_variance_impl_sse2.asm b/aom_dsp/x86/highbd_subpel_variance_impl_sse2.asm
index 5c78933..c0ccc18 100644
--- a/aom_dsp/x86/highbd_subpel_variance_impl_sse2.asm
+++ b/aom_dsp/x86/highbd_subpel_variance_impl_sse2.asm
@@ -81,7 +81,7 @@
%endmacro
%macro INC_SRC_BY_SRC_STRIDE 0
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
add srcq, src_stridemp
add srcq, src_stridemp
%else
@@ -94,7 +94,7 @@
%define filter_idx_shift 5
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%if %2 == 1 ; avg
cglobal highbd_sub_pixel_avg_variance%1xh, 9, 10, 13, src, src_stride, \
x_offset, y_offset, \
@@ -271,11 +271,11 @@
.x_zero_y_nonhalf:
; x_offset == 0 && y_offset == bilin interpolation
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && mmsize == 16
+%if AOM_ARCH_X86_64 && mmsize == 16
mova m8, [bilin_filter+y_offsetq]
mova m9, [bilin_filter+y_offsetq+16]
mova m10, [GLOBAL(pw_8)]
@@ -283,7 +283,7 @@
%define filter_y_b m9
%define filter_rnd m10
%else ; x86-32 or mmx
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; x_offset == 0, reuse x_offset reg
%define tempq x_offsetq
add y_offsetq, g_bilin_filterm
@@ -498,11 +498,11 @@
.x_half_y_nonhalf:
; x_offset == 0.5 && y_offset == bilin interpolation
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && mmsize == 16
+%if AOM_ARCH_X86_64 && mmsize == 16
mova m8, [bilin_filter+y_offsetq]
mova m9, [bilin_filter+y_offsetq+16]
mova m10, [GLOBAL(pw_8)]
@@ -510,7 +510,7 @@
%define filter_y_b m9
%define filter_rnd m10
%else ; x86_32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; x_offset == 0.5. We can reuse x_offset reg
%define tempq x_offsetq
add y_offsetq, g_bilin_filterm
@@ -620,11 +620,11 @@
jnz .x_nonhalf_y_nonzero
; x_offset == bilin interpolation && y_offset == 0
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift
-%if ARCH_X86_64 && mmsize == 16
+%if AOM_ARCH_X86_64 && mmsize == 16
mova m8, [bilin_filter+x_offsetq]
mova m9, [bilin_filter+x_offsetq+16]
mova m10, [GLOBAL(pw_8)]
@@ -632,7 +632,7 @@
%define filter_x_b m9
%define filter_rnd m10
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; y_offset == 0. We can reuse y_offset reg.
%define tempq y_offsetq
add x_offsetq, g_bilin_filterm
@@ -719,11 +719,11 @@
jne .x_nonhalf_y_nonhalf
; x_offset == bilin interpolation && y_offset == 0.5
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift
-%if ARCH_X86_64 && mmsize == 16
+%if AOM_ARCH_X86_64 && mmsize == 16
mova m8, [bilin_filter+x_offsetq]
mova m9, [bilin_filter+x_offsetq+16]
mova m10, [GLOBAL(pw_8)]
@@ -731,7 +731,7 @@
%define filter_x_b m9
%define filter_rnd m10
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; y_offset == 0.5. We can reuse y_offset reg.
%define tempq y_offsetq
add x_offsetq, g_bilin_filterm
@@ -846,12 +846,12 @@
.x_nonhalf_y_nonhalf:
; loading filter - this is same as in 8-bit depth
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift ; filter_idx_shift = 5
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && mmsize == 16
+%if AOM_ARCH_X86_64 && mmsize == 16
mova m8, [bilin_filter+x_offsetq]
mova m9, [bilin_filter+x_offsetq+16]
mova m10, [bilin_filter+y_offsetq]
@@ -863,7 +863,7 @@
%define filter_y_b m11
%define filter_rnd m12
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; In this case there is NO unused register, so the src_stride register is
; used; src_stride must be reloaded from the stack when it is needed later.
%define tempq src_strideq
diff --git a/aom_dsp/x86/highbd_variance_sse2.c b/aom_dsp/x86/highbd_variance_sse2.c
index d45885c..e897aab 100644
--- a/aom_dsp/x86/highbd_variance_sse2.c
+++ b/aom_dsp/x86/highbd_variance_sse2.c
@@ -98,43 +98,6 @@
*sse = (uint32_t)ROUND_POWER_OF_TWO(sse_long, 8);
}
-#define HIGH_GET_VAR(S) \
- void aom_highbd_get##S##x##S##var_sse2(const uint8_t *src8, int src_stride, \
- const uint8_t *ref8, int ref_stride, \
- uint32_t *sse, int *sum) { \
- uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
- uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
- aom_highbd_calc##S##x##S##var_sse2(src, src_stride, ref, ref_stride, sse, \
- sum); \
- } \
- \
- void aom_highbd_10_get##S##x##S##var_sse2( \
- const uint8_t *src8, int src_stride, const uint8_t *ref8, \
- int ref_stride, uint32_t *sse, int *sum) { \
- uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
- uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
- aom_highbd_calc##S##x##S##var_sse2(src, src_stride, ref, ref_stride, sse, \
- sum); \
- *sum = ROUND_POWER_OF_TWO(*sum, 2); \
- *sse = ROUND_POWER_OF_TWO(*sse, 4); \
- } \
- \
- void aom_highbd_12_get##S##x##S##var_sse2( \
- const uint8_t *src8, int src_stride, const uint8_t *ref8, \
- int ref_stride, uint32_t *sse, int *sum) { \
- uint16_t *src = CONVERT_TO_SHORTPTR(src8); \
- uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \
- aom_highbd_calc##S##x##S##var_sse2(src, src_stride, ref, ref_stride, sse, \
- sum); \
- *sum = ROUND_POWER_OF_TWO(*sum, 4); \
- *sse = ROUND_POWER_OF_TWO(*sse, 8); \
- }
-
-HIGH_GET_VAR(16)
-HIGH_GET_VAR(8)
-
-#undef HIGH_GET_VAR
-
#define VAR_FN(w, h, block_size, shift) \
uint32_t aom_highbd_8_variance##w##x##h##_sse2( \
const uint8_t *src8, int src_stride, const uint8_t *ref8, \
diff --git a/aom_dsp/x86/intrapred_sse4.c b/aom_dsp/x86/intrapred_sse4.c
index 3f72dc4..fb30420 100644
--- a/aom_dsp/x86/intrapred_sse4.c
+++ b/aom_dsp/x86/intrapred_sse4.c
@@ -602,7 +602,7 @@
const __m128i c1234 = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
for (int r = 0; r < N; r++) {
- __m128i b, res, res1, shift, shifty;
+ __m128i b, res, res1, shift;
__m128i resx, resy, resxy, r6, ydx;
int y = r + 1;
@@ -620,11 +620,7 @@
}
if (base_shift > 7) {
- a0_x = _mm_setzero_si128();
- a1_x = _mm_setzero_si128();
- a0_y = _mm_setzero_si128();
- a1_y = _mm_setzero_si128();
- shift = _mm_setzero_si128();
+ resx = _mm_setzero_si128();
} else {
a0_above = _mm_loadu_si128((__m128i *)(above + base_x + base_shift));
ydx = _mm_set1_epi16(y * dx);
@@ -649,9 +645,15 @@
}
a0_x = _mm_cvtepu8_epi16(a0_above);
a1_x = _mm_cvtepu8_epi16(a1_above);
- a0_y = _mm_setzero_si128();
- a1_y = _mm_setzero_si128();
- shifty = shift;
+
+ diff = _mm_sub_epi16(a1_x, a0_x); // a[x+1] - a[x]
+ a32 = _mm_slli_epi16(a0_x, 5); // a[x] * 32
+ a32 = _mm_add_epi16(a32, a16); // a[x] * 32 + 16
+
+ b = _mm_mullo_epi16(diff, shift);
+ res = _mm_add_epi16(a32, b);
+ res = _mm_srli_epi16(res, 5);
+ resx = _mm_packus_epi16(res, res);
}
// y calc
@@ -678,34 +680,27 @@
left[base_y_c[6]], left[base_y_c[7]]);
if (upsample_left) {
- shifty = _mm_srli_epi16(
+ shift = _mm_srli_epi16(
_mm_and_si128(_mm_slli_epi16(y_c, upsample_left), c3f), 1);
} else {
- shifty = _mm_srli_epi16(_mm_and_si128(y_c, c3f), 1);
+ shift = _mm_srli_epi16(_mm_and_si128(y_c, c3f), 1);
}
+
+ diff = _mm_sub_epi16(a1_y, a0_y); // a[x+1] - a[x]
+ a32 = _mm_slli_epi16(a0_y, 5); // a[x] * 32
+ a32 = _mm_add_epi16(a32, a16); // a[x] * 32 + 16
+
+ b = _mm_mullo_epi16(diff, shift);
+ res1 = _mm_add_epi16(a32, b);
+ res1 = _mm_srli_epi16(res1, 5);
+
+ resy = _mm_packus_epi16(res1, res1);
+ resxy = _mm_blendv_epi8(resx, resy, *(__m128i *)Mask[0][base_min_diff]);
+ _mm_storel_epi64((__m128i *)dst, resxy);
+ } else {
+ _mm_storel_epi64((__m128i *)dst, resx);
}
- diff = _mm_sub_epi16(a1_x, a0_x); // a[x+1] - a[x]
- a32 = _mm_slli_epi16(a0_x, 5); // a[x] * 32
- a32 = _mm_add_epi16(a32, a16); // a[x] * 32 + 16
-
- b = _mm_mullo_epi16(diff, shift);
- res = _mm_add_epi16(a32, b);
- res = _mm_srli_epi16(res, 5);
-
- diff = _mm_sub_epi16(a1_y, a0_y); // a[x+1] - a[x]
- a32 = _mm_slli_epi16(a0_y, 5); // a[x] * 32
- a32 = _mm_add_epi16(a32, a16); // a[x] * 32 + 16
-
- b = _mm_mullo_epi16(diff, shifty);
- res1 = _mm_add_epi16(a32, b);
- res1 = _mm_srli_epi16(res1, 5);
-
- resx = _mm_packus_epi16(res, res);
- resy = _mm_packus_epi16(res1, res1);
-
- resxy = _mm_blendv_epi8(resx, resy, *(__m128i *)Mask[0][base_min_diff]);
- _mm_storel_epi64((__m128i *)(dst), resxy);
dst += stride;
}
}
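Both the x and y branches of the restructured predictor above now compute the interpolation inline, per the comments: res = (32 * a[x] + 16 + (a[x+1] - a[x]) * shift) >> 5. A scalar sketch of that formula (illustrative only; the kernel applies it to eight lanes at once):

#include <stdint.h>

// 5-bit linear interpolation between two neighbouring reference samples,
// with shift in [0, 31]; the final clamp mirrors _mm_packus_epi16.
static uint8_t dr_interp(uint8_t a0, uint8_t a1, int shift) {
  const int diff = a1 - a0;        // a[x+1] - a[x]
  const int a32 = (a0 << 5) + 16;  // a[x] * 32 + 16
  int res = (a32 + diff * shift) >> 5;
  if (res < 0) res = 0;
  if (res > 255) res = 255;
  return (uint8_t)res;
}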
diff --git a/aom_dsp/x86/jnt_sad_ssse3.c b/aom_dsp/x86/jnt_sad_sse2.c
similarity index 66%
rename from aom_dsp/x86/jnt_sad_ssse3.c
rename to aom_dsp/x86/jnt_sad_sse2.c
index 357f70a..16d2f4b 100644
--- a/aom_dsp/x86/jnt_sad_ssse3.c
+++ b/aom_dsp/x86/jnt_sad_sse2.c
@@ -10,16 +10,16 @@
*/
#include <assert.h>
-#include <emmintrin.h> // SSE2
-#include <tmmintrin.h>
+#include <emmintrin.h>
#include "config/aom_config.h"
#include "config/aom_dsp_rtcd.h"
#include "aom_dsp/x86/synonyms.h"
-unsigned int aom_sad4xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad4xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i;
assert(width == 4);
(void)width;
@@ -59,8 +59,9 @@
return res;
}
-unsigned int aom_sad8xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad8xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i;
assert(width == 8);
(void)width;
@@ -91,8 +92,9 @@
return res;
}
-unsigned int aom_sad16xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad16xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i;
assert(width == 16);
(void)width;
@@ -116,8 +118,9 @@
return res;
}
-unsigned int aom_sad32xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad32xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i, j;
assert(width == 32);
(void)width;
@@ -143,8 +146,9 @@
return res;
}
-unsigned int aom_sad64xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad64xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i, j;
assert(width == 64);
(void)width;
@@ -170,8 +174,9 @@
return res;
}
-unsigned int aom_sad128xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b,
- int b_stride, int width, int height) {
+static unsigned int sad128xh_sse2(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride, int width,
+ int height) {
int i, j;
assert(width == 128);
(void)width;
@@ -197,47 +202,37 @@
return res;
}
-#define dist_wtd_sadMxN_sse2(m, n) \
- unsigned int aom_dist_wtd_sad##m##x##n##_avg_ssse3( \
+#define DIST_WTD_SADMXN_SSE2(m, n) \
+ unsigned int aom_dist_wtd_sad##m##x##n##_avg_sse2( \
const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param) { \
uint8_t comp_pred[m * n]; \
aom_dist_wtd_comp_avg_pred(comp_pred, second_pred, m, n, ref, ref_stride, \
jcp_param); \
- return aom_sad##m##xh_sse2(src, src_stride, comp_pred, m, m, n); \
+ return sad##m##xh_sse2(src, src_stride, comp_pred, m, m, n); \
}
-#define dist_wtd_sadMxN_avx2(m, n) \
- unsigned int aom_dist_wtd_sad##m##x##n##_avg_avx2( \
- const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
- const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param) { \
- uint8_t comp_pred[m * n]; \
- aom_dist_wtd_comp_avg_pred(comp_pred, second_pred, m, n, ref, ref_stride, \
- jcp_param); \
- return aom_sad##m##xh_avx2(src, src_stride, comp_pred, m, m, n); \
- }
-
-/* clang-format off */
-dist_wtd_sadMxN_sse2(128, 128)
-dist_wtd_sadMxN_sse2(128, 64)
-dist_wtd_sadMxN_sse2(64, 128)
-dist_wtd_sadMxN_sse2(64, 64)
-dist_wtd_sadMxN_sse2(64, 32)
-dist_wtd_sadMxN_sse2(32, 64)
-dist_wtd_sadMxN_sse2(32, 32)
-dist_wtd_sadMxN_sse2(32, 16)
-dist_wtd_sadMxN_sse2(16, 32)
-dist_wtd_sadMxN_sse2(16, 16)
-dist_wtd_sadMxN_sse2(16, 8)
-dist_wtd_sadMxN_sse2(8, 16)
-dist_wtd_sadMxN_sse2(8, 8)
-dist_wtd_sadMxN_sse2(8, 4)
-dist_wtd_sadMxN_sse2(4, 8)
-dist_wtd_sadMxN_sse2(4, 4)
-dist_wtd_sadMxN_sse2(4, 16)
-dist_wtd_sadMxN_sse2(16, 4)
-dist_wtd_sadMxN_sse2(8, 32)
-dist_wtd_sadMxN_sse2(32, 8)
-dist_wtd_sadMxN_sse2(16, 64)
-dist_wtd_sadMxN_sse2(64, 16)
- /* clang-format on */
+DIST_WTD_SADMXN_SSE2(128, 128)
+DIST_WTD_SADMXN_SSE2(128, 64)
+DIST_WTD_SADMXN_SSE2(64, 128)
+DIST_WTD_SADMXN_SSE2(64, 64)
+DIST_WTD_SADMXN_SSE2(64, 32)
+DIST_WTD_SADMXN_SSE2(32, 64)
+DIST_WTD_SADMXN_SSE2(32, 32)
+DIST_WTD_SADMXN_SSE2(32, 16)
+DIST_WTD_SADMXN_SSE2(16, 32)
+DIST_WTD_SADMXN_SSE2(16, 16)
+DIST_WTD_SADMXN_SSE2(16, 8)
+DIST_WTD_SADMXN_SSE2(8, 16)
+DIST_WTD_SADMXN_SSE2(8, 8)
+DIST_WTD_SADMXN_SSE2(8, 4)
+DIST_WTD_SADMXN_SSE2(4, 8)
+DIST_WTD_SADMXN_SSE2(4, 4)
+#if !CONFIG_REALTIME_ONLY
+DIST_WTD_SADMXN_SSE2(4, 16)
+DIST_WTD_SADMXN_SSE2(16, 4)
+DIST_WTD_SADMXN_SSE2(8, 32)
+DIST_WTD_SADMXN_SSE2(32, 8)
+DIST_WTD_SADMXN_SSE2(16, 64)
+DIST_WTD_SADMXN_SSE2(64, 16)
+#endif
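After this rename the dist-weighted SAD entry points need only SSE2: each generated aom_dist_wtd_sadMxN_avg_sse2 builds a distance-weighted compound prediction and scores it with one of the static sadWxH helpers. The scalar shape of those helpers, as a reference sketch (not the vectorized code itself):

#include <stdint.h>
#include <stdlib.h>

// Sum of absolute differences over a width x height block.
static unsigned int sad_ref(const uint8_t *a, int a_stride, const uint8_t *b,
                            int b_stride, int width, int height) {
  unsigned int res = 0;
  for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) res += abs(a[j] - b[j]);
    a += a_stride;
    b += b_stride;
  }
  return res;
}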
diff --git a/aom_dsp/x86/obmc_intrinsic_ssse3.h b/aom_dsp/x86/obmc_intrinsic_ssse3.h
index 48486c6..27398ff 100644
--- a/aom_dsp/x86/obmc_intrinsic_ssse3.h
+++ b/aom_dsp/x86/obmc_intrinsic_ssse3.h
@@ -24,7 +24,7 @@
static INLINE int64_t xx_hsum_epi64_si64(__m128i v_q) {
v_q = _mm_add_epi64(v_q, _mm_srli_si128(v_q, 8));
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return _mm_cvtsi128_si64(v_q);
#else
{
diff --git a/aom_dsp/x86/sad4d_sse2.asm b/aom_dsp/x86/sad4d_sse2.asm
index 6de708b..6edad99 100644
--- a/aom_dsp/x86/sad4d_sse2.asm
+++ b/aom_dsp/x86/sad4d_sse2.asm
@@ -15,13 +15,6 @@
SECTION .text
-%macro AVG_4x2x4 2
- movh m2, [second_predq]
- movlhps m2, m2
- pavgb %1, m2
- pavgb %2, m2
- lea second_predq, [second_predq+8]
-%endmacro
; 'spill_src_stride' greatly affects how the code works.
;
; When 'spill_src_stride' is false, the 'src_strideq' resides in
@@ -64,8 +57,8 @@
lea ref4q, [ref4q+ref_strideq*2]
%endmacro
-; PROCESS_4x2x4 first, do_avg
-%macro PROCESS_4x2x4 2
+; PROCESS_4x2x4 first
+%macro PROCESS_4x2x4 1
movd m0, [srcq]
HANDLE_SECOND_OFFSET
%if %1 == 1
@@ -87,9 +80,6 @@
movlhps m0, m0
movlhps m6, m4
movlhps m7, m5
-%if %2 == 1
- AVG_4x2x4 m6, m7
-%endif
psadbw m6, m0
psadbw m7, m0
%else
@@ -110,9 +100,6 @@
movlhps m0, m0
movlhps m1, m2
movlhps m3, m4
-%if %2 == 1
- AVG_4x2x4 m1, m3
-%endif
psadbw m1, m0
psadbw m3, m0
paddd m6, m1
@@ -120,8 +107,8 @@
%endif
%endmacro
-; PROCESS_8x2x4 first, do_avg
-%macro PROCESS_8x2x4 2
+; PROCESS_8x2x4 first
+%macro PROCESS_8x2x4 1
movh m0, [srcq]
HANDLE_SECOND_OFFSET
%if %1 == 1
@@ -134,14 +121,6 @@
movhps m5, [ref2q+ref_strideq]
movhps m6, [ref3q+ref_strideq]
movhps m7, [ref4q+ref_strideq]
-%if %2 == 1
- movu m3, [second_predq]
- pavgb m4, m3
- pavgb m5, m3
- pavgb m6, m3
- pavgb m7, m3
- lea second_predq, [second_predq+mmsize]
-%endif
psadbw m4, m0
psadbw m5, m0
psadbw m6, m0
@@ -152,11 +131,6 @@
movhps m0, [srcq + second_offset]
movhps m1, [ref1q+ref_strideq]
movhps m2, [ref2q+ref_strideq]
-%if %2 == 1
- movu m3, [second_predq]
- pavgb m1, m3
- pavgb m2, m3
-%endif
psadbw m1, m0
psadbw m2, m0
paddd m4, m1
@@ -166,11 +140,6 @@
movhps m1, [ref3q+ref_strideq]
movh m2, [ref4q]
movhps m2, [ref4q+ref_strideq]
-%if %2 == 1
- pavgb m1, m3
- pavgb m2, m3
- lea second_predq, [second_predq+mmsize]
-%endif
psadbw m1, m0
psadbw m2, m0
paddd m6, m1
@@ -178,37 +147,24 @@
%endif
%endmacro
-; PROCESS_FIRST_MMSIZE do_avg
-%macro PROCESS_FIRST_MMSIZE 1
+; PROCESS_FIRST_MMSIZE
+%macro PROCESS_FIRST_MMSIZE 0
mova m0, [srcq]
movu m4, [ref1q]
movu m5, [ref2q]
movu m6, [ref3q]
movu m7, [ref4q]
-%if %1 == 1
- movu m3, [second_predq]
- pavgb m4, m3
- pavgb m5, m3
- pavgb m6, m3
- pavgb m7, m3
- lea second_predq, [second_predq+mmsize]
-%endif
psadbw m4, m0
psadbw m5, m0
psadbw m6, m0
psadbw m7, m0
%endmacro
-; PROCESS_16x1x4 offset, do_avg
-%macro PROCESS_16x1x4 2
+; PROCESS_16x1x4 offset
+%macro PROCESS_16x1x4 1
mova m0, [srcq + %1]
movu m1, [ref1q + ref_offsetq + %1]
movu m2, [ref2q + ref_offsetq + %1]
-%if %2 == 1
- movu m3, [second_predq]
- pavgb m1, m3
- pavgb m2, m3
-%endif
psadbw m1, m0
psadbw m2, m0
paddd m4, m1
@@ -216,11 +172,6 @@
movu m1, [ref3q + ref_offsetq + %1]
movu m2, [ref4q + ref_offsetq + %1]
-%if %2 == 1
- pavgb m1, m3
- pavgb m2, m3
- lea second_predq, [second_predq+mmsize]
-%endif
psadbw m1, m0
psadbw m2, m0
paddd m6, m1
@@ -233,9 +184,8 @@
; Macro Arguments:
; 1: Width
; 2: Height
-; 3: If 0, then normal sad, else avg
-; 4: If 0, then normal sad, else skip rows
-%macro SADNXN4D 2-4 0,0
+; 3: If 0, then normal sad, else skip rows
+%macro SADNXN4D 2-3 0
%define spill_src_stride 0
%define spill_ref_stride 0
@@ -249,8 +199,8 @@
; Remove loops in the 4x4 and 8x4 case
%define use_loop (use_ref_offset || %2 > 4)
-%if %4 == 1 ; skip rows
-%if ARCH_X86_64
+%if %3 == 1 ; skip rows
+%if AOM_ARCH_X86_64
%if use_ref_offset
cglobal sad_skip_%1x%2x4d, 5, 10, 8, src, src_stride, ref1, ref_stride, res, \
ref2, ref3, ref4, cnt, ref_offset
@@ -276,8 +226,8 @@
ref3, ref4
%endif
%endif
-%elif %3 == 0 ; normal sad
-%if ARCH_X86_64
+%else ; normal sad
+%if AOM_ARCH_X86_64
%if use_ref_offset
cglobal sad%1x%2x4d, 5, 10, 8, src, src_stride, ref1, ref_stride, res, ref2, \
ref3, ref4, cnt, ref_offset
@@ -301,34 +251,6 @@
ref4
%endif
%endif
-%else ; avg
-%if ARCH_X86_64
-%if use_ref_offset
-cglobal sad%1x%2x4d_avg, 6, 11, 8, src, src_stride, ref1, ref_stride, \
- second_pred, res, ref2, ref3, ref4, cnt, \
- ref_offset
-%elif use_loop
-cglobal sad%1x%2x4d_avg, 6, 10, 8, src, src_stride, ref1, ref_stride, \
- second_pred, res, ref2, ref3, ref4, cnt
-%else
-cglobal sad%1x%2x4d_avg, 6, 9, 8, src, src_stride, ref1, ref_stride, \
- second_pred, res, ref2, ref3, ref4
-%endif
-%else
-%if use_ref_offset
-cglobal sad%1x%2x4d_avg, 5, 7, 8, src, ref4, ref1, ref_offset, second_pred, ref2, ref3
- %define spill_src_stride 1
- %define spill_ref_stride 1
- %define spill_cnt 1
-%elif use_loop
-cglobal sad%1x%2x4d_avg, 5, 7, 8, src, ref4, ref1, ref_stride, second_pred, ref2, ref3
- %define spill_src_stride 1
- %define spill_cnt 1
-%else
-cglobal sad%1x%2x4d_avg, 5, 7, 8, src, ref4, ref1, ref_stride, second_pred, ref2, ref3
- %define spill_src_stride 1
-%endif
-%endif
%endif
%if spill_src_stride
@@ -345,7 +267,7 @@
%define cntd word [rsp]
%endif
-%if %4 == 1
+%if %3 == 1
sal src_strided, 1
sal ref_strided, 1
%endif
@@ -362,14 +284,12 @@
%define external_loop (use_ref_offset && %1 > mmsize && %1 != %2)
%if use_ref_offset
- PROCESS_FIRST_MMSIZE %3
+ PROCESS_FIRST_MMSIZE
%if %1 > mmsize
mov ref_offsetq, 0
- mov cntd, %2 >> %4
+ mov cntd, %2 >> %3
; Jump part way into the loop for the square version of this width
%if %3 == 1
- jmp mangle(private_prefix %+ _sad%1x%1x4d_avg %+ SUFFIX).midloop
-%elif %4 == 1
jmp mangle(private_prefix %+ _sad_skip_%1x%1x4d %+ SUFFIX).midloop
%else
jmp mangle(private_prefix %+ _sad%1x%1x4d %+ SUFFIX).midloop
@@ -377,14 +297,14 @@
%else
mov ref_offsetq, ref_strideq
add srcq, src_strideq
- mov cntd, (%2 >> %4) - 1
+ mov cntd, (%2 >> %3) - 1
%endif
%if external_loop == 0
.loop:
; Unrolled horizontal loop
%assign h_offset 0
%rep %1/mmsize
- PROCESS_16x1x4 h_offset, %3
+ PROCESS_16x1x4 h_offset
%if h_offset == 0
; The first row of the first column is done outside the loop and jumps here
.midloop:
@@ -398,13 +318,13 @@
jnz .loop
%endif
%else
- PROCESS_%1x2x4 1, %3
+ PROCESS_%1x2x4 1
ADVANCE_END_OF_TWO_LINES
%if use_loop
- mov cntd, (%2/2 >> %4) - 1
+ mov cntd, (%2/2 >> %3) - 1
.loop:
%endif
- PROCESS_%1x2x4 0, %3
+ PROCESS_%1x2x4 0
%if use_loop
ADVANCE_END_OF_TWO_LINES
sub cntd, 1
@@ -421,13 +341,10 @@
%if %3 == 0
%define resultq r4
%define resultmp r4mp
-%else
- %define resultq r5
- %define resultmp r5mp
%endif
; Undo modifications on parameters on the stack
-%if %4 == 1
+%if %3 == 1
%if spill_src_stride
shr src_strided, 1
%endif
@@ -446,7 +363,7 @@
punpcklqdq m4, m6
punpckhqdq m5, m7
paddd m4, m5
-%if %4 == 1
+%if %3 == 1
pslld m4, 1
%endif
movifnidn resultq, resultmp
@@ -455,7 +372,7 @@
%else
pshufd m6, m6, 0x08
pshufd m7, m7, 0x08
-%if %4 == 1
+%if %3 == 1
pslld m6, 1
pslld m7, 1
%endif
@@ -492,7 +409,6 @@
SADNXN4D 16, 64
SADNXN4D 64, 16
%endif
-%if CONFIG_REALTIME_ONLY==0
SADNXN4D 128, 128, 1
SADNXN4D 128, 64, 1
SADNXN4D 64, 128, 1
@@ -506,39 +422,16 @@
SADNXN4D 16, 8, 1
SADNXN4D 8, 16, 1
SADNXN4D 8, 8, 1
-SADNXN4D 8, 4, 1
SADNXN4D 4, 8, 1
-SADNXN4D 4, 4, 1
+%if CONFIG_REALTIME_ONLY==0
SADNXN4D 4, 16, 1
-SADNXN4D 16, 4, 1
SADNXN4D 8, 32, 1
SADNXN4D 32, 8, 1
SADNXN4D 16, 64, 1
SADNXN4D 64, 16, 1
%endif
-SADNXN4D 128, 128, 0, 1
-SADNXN4D 128, 64, 0, 1
-SADNXN4D 64, 128, 0, 1
-SADNXN4D 64, 64, 0, 1
-SADNXN4D 64, 32, 0, 1
-SADNXN4D 32, 64, 0, 1
-SADNXN4D 32, 32, 0, 1
-SADNXN4D 32, 16, 0, 1
-SADNXN4D 16, 32, 0, 1
-SADNXN4D 16, 16, 0, 1
-SADNXN4D 16, 8, 0, 1
-SADNXN4D 8, 16, 0, 1
-SADNXN4D 8, 8, 0, 1
-SADNXN4D 4, 8, 0, 1
-%if CONFIG_REALTIME_ONLY==0
-SADNXN4D 4, 16, 0, 1
-SADNXN4D 8, 32, 0, 1
-SADNXN4D 32, 8, 0, 1
-SADNXN4D 16, 64, 0, 1
-SADNXN4D 64, 16, 0, 1
-%endif
; Different assembly is needed when the height gets subsampled to 2
-; SADNXN4D 16, 4, 0, 1
-; SADNXN4D 8, 4, 0, 1
-; SADNXN4D 4, 4, 0, 1
+; SADNXN4D 16, 4, 1
+; SADNXN4D 8, 4, 1
+; SADNXN4D 4, 4, 1
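With the unused _avg variants removed, the x4d macros keep only the normal and row-skipping forms: one source block is scored against four reference blocks at once, and the skip form halves the rows (the strides are doubled) and doubles the result, as the pslld by 1 above shows. A scalar reference for that behaviour (a sketch, not the asm):

#include <stdint.h>
#include <stdlib.h>

// Score one source block against four references; 'skip' processes every
// other row and doubles the result to approximate the full-height SAD.
static void sad_nx4d_ref(const uint8_t *src, int src_stride,
                         const uint8_t *const ref[4], int ref_stride,
                         int width, int height, int skip, uint32_t res[4]) {
  const int step = skip ? 2 : 1;
  for (int r = 0; r < 4; r++) {
    uint32_t sad = 0;
    for (int i = 0; i < height; i += step)
      for (int j = 0; j < width; j++)
        sad += abs(src[i * src_stride + j] - ref[r][i * ref_stride + j]);
    res[r] = skip ? 2 * sad : sad;
  }
}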
diff --git a/aom_dsp/x86/sad_sse2.asm b/aom_dsp/x86/sad_sse2.asm
index de9845a..dbe8ca3 100644
--- a/aom_dsp/x86/sad_sse2.asm
+++ b/aom_dsp/x86/sad_sse2.asm
@@ -42,11 +42,11 @@
cglobal sad%1x%2_avg, 5, 1 + %3, 5, src, src_stride, ref, ref_stride, \
second_pred, n_rows
%else ; %3 == 7
-cglobal sad%1x%2_avg, 5, ARCH_X86_64 + %3, 6, src, src_stride, \
+cglobal sad%1x%2_avg, 5, AOM_ARCH_X86_64 + %3, 6, src, src_stride, \
ref, ref_stride, \
second_pred, \
src_stride3, ref_stride3
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define n_rowsd r7d
%else ; x86-32
%define n_rowsd dword r0m
diff --git a/aom_dsp/x86/subpel_variance_sse2.asm b/aom_dsp/x86/subpel_variance_sse2.asm
index cbf2890..d1d8373 100644
--- a/aom_dsp/x86/subpel_variance_sse2.asm
+++ b/aom_dsp/x86/subpel_variance_sse2.asm
@@ -98,7 +98,7 @@
%endmacro
%macro INC_SRC_BY_SRC_STRIDE 0
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
add srcq, src_stridemp
%else
add srcq, src_strideq
@@ -117,7 +117,7 @@
; 11, not 13, if the registers are ordered correctly. May make a minor speed
; difference on Win64
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%if %2 == 1 ; avg
cglobal sub_pixel_avg_variance%1xh, 9, 10, 13, src, src_stride, \
x_offset, y_offset, dst, dst_stride, \
@@ -355,11 +355,11 @@
.x_zero_y_nonhalf:
; x_offset == 0 && y_offset == bilin interpolation
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && %1 > 4
+%if AOM_ARCH_X86_64 && %1 > 4
mova m8, [bilin_filter+y_offsetq]
%if notcpuflag(ssse3) ; FIXME(rbultje) don't scatter registers on x86-64
mova m9, [bilin_filter+y_offsetq+16]
@@ -369,7 +369,7 @@
%define filter_y_b m9
%define filter_rnd m10
%else ; x86-32 or mmx
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; x_offset == 0, reuse x_offset reg
%define tempq x_offsetq
add y_offsetq, g_bilin_filterm
@@ -678,11 +678,11 @@
.x_half_y_nonhalf:
; x_offset == 0.5 && y_offset == bilin interpolation
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && %1 > 4
+%if AOM_ARCH_X86_64 && %1 > 4
mova m8, [bilin_filter+y_offsetq]
%if notcpuflag(ssse3) ; FIXME(rbultje) don't scatter registers on x86-64
mova m9, [bilin_filter+y_offsetq+16]
@@ -692,7 +692,7 @@
%define filter_y_b m9
%define filter_rnd m10
%else ;x86_32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; x_offset == 0.5. We can reuse x_offset reg
%define tempq x_offsetq
add y_offsetq, g_bilin_filterm
@@ -836,11 +836,11 @@
jnz .x_nonhalf_y_nonzero
; x_offset == bilin interpolation && y_offset == 0
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift
-%if ARCH_X86_64 && %1 > 4
+%if AOM_ARCH_X86_64 && %1 > 4
mova m8, [bilin_filter+x_offsetq]
%if notcpuflag(ssse3) ; FIXME(rbultje) don't scatter registers on x86-64
mova m9, [bilin_filter+x_offsetq+16]
@@ -850,7 +850,7 @@
%define filter_x_b m9
%define filter_rnd m10
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
;y_offset == 0. We can reuse y_offset reg.
%define tempq y_offsetq
add x_offsetq, g_bilin_filterm
@@ -978,11 +978,11 @@
jne .x_nonhalf_y_nonhalf
; x_offset == bilin interpolation && y_offset == 0.5
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift
-%if ARCH_X86_64 && %1 > 4
+%if AOM_ARCH_X86_64 && %1 > 4
mova m8, [bilin_filter+x_offsetq]
%if notcpuflag(ssse3) ; FIXME(rbultje) don't scatter registers on x86-64
mova m9, [bilin_filter+x_offsetq+16]
@@ -992,7 +992,7 @@
%define filter_x_b m9
%define filter_rnd m10
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; y_offset == 0.5. We can reuse y_offset reg.
%define tempq y_offsetq
add x_offsetq, g_bilin_filterm
@@ -1176,12 +1176,12 @@
STORE_AND_RET %1
.x_nonhalf_y_nonhalf:
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea bilin_filter, [GLOBAL(bilin_filter_m)]
%endif
shl x_offsetd, filter_idx_shift
shl y_offsetd, filter_idx_shift
-%if ARCH_X86_64 && %1 > 4
+%if AOM_ARCH_X86_64 && %1 > 4
mova m8, [bilin_filter+x_offsetq]
%if notcpuflag(ssse3) ; FIXME(rbultje) don't scatter registers on x86-64
mova m9, [bilin_filter+x_offsetq+16]
@@ -1197,7 +1197,7 @@
%define filter_y_b m11
%define filter_rnd m12
%else ; x86-32
-%if ARCH_X86=1 && CONFIG_PIC=1
+%if AOM_ARCH_X86=1 && CONFIG_PIC=1
; In this case there is NO unused register, so the src_stride register is
; used; src_stride must be reloaded from the stack when it is needed later.
%define tempq src_strideq
diff --git a/aom_dsp/x86/subtract_sse2.asm b/aom_dsp/x86/subtract_sse2.asm
index af38022..fd508c0 100644
--- a/aom_dsp/x86/subtract_sse2.asm
+++ b/aom_dsp/x86/subtract_sse2.asm
@@ -40,8 +40,8 @@
%macro loop16 6
mova m0, [srcq+%1]
mova m4, [srcq+%2]
- mova m1, [predq+%3]
- mova m5, [predq+%4]
+ movu m1, [predq+%3]
+ movu m5, [predq+%4]
punpckhbw m2, m0, m7
punpckhbw m3, m1, m7
punpcklbw m0, m7
diff --git a/aom_dsp/x86/sum_squares_sse2.c b/aom_dsp/x86/sum_squares_sse2.c
index 25be856..cf3ed98 100644
--- a/aom_dsp/x86/sum_squares_sse2.c
+++ b/aom_dsp/x86/sum_squares_sse2.c
@@ -23,7 +23,7 @@
}
static INLINE uint64_t xx_cvtsi128_si64(__m128i a) {
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return (uint64_t)_mm_cvtsi128_si64(a);
#else
{
diff --git a/aom_dsp/x86/synonyms.h b/aom_dsp/x86/synonyms.h
index d538015..6744ec5 100644
--- a/aom_dsp/x86/synonyms.h
+++ b/aom_dsp/x86/synonyms.h
@@ -85,6 +85,16 @@
#endif
}
+// Fill an SSE register using an interleaved pair of values, i.e. set the
+// 8 channels to {a, b, a, b, a, b, a, b}, using the same channel ordering
+// as when a register is stored to / loaded from memory.
+//
+// This is useful for rearranging filter kernels for use with the
+// _mm_madd_epi16 instruction.
+static INLINE __m128i xx_set2_epi16(int16_t a, int16_t b) {
+  return _mm_setr_epi16(a, b, a, b, a, b, a, b);
+}
+
static INLINE __m128i xx_round_epu16(__m128i v_val_w) {
return _mm_avg_epu16(v_val_w, _mm_setzero_si128());
}
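The new xx_set2_epi16 helper broadcasts a tap pair so that _mm_madd_epi16 can apply a two-tap filter to adjacent 16-bit samples in one instruction. A self-contained usage sketch with hypothetical taps {3, 5}:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

static __m128i xx_set2_epi16(int16_t a, int16_t b) {
  return _mm_setr_epi16(a, b, a, b, a, b, a, b);
}

int main(void) {
  const __m128i taps = xx_set2_epi16(3, 5);
  const int16_t px[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  const __m128i v = _mm_loadu_si128((const __m128i *)px);
  // Each 32-bit lane becomes 3 * px[2i] + 5 * px[2i + 1].
  const __m128i acc = _mm_madd_epi16(v, taps);
  int32_t out[4];
  _mm_storeu_si128((__m128i *)out, acc);
  for (int i = 0; i < 4; i++) printf("%d ", out[i]);
  printf("\n");  // prints: 13 29 45 61
  return 0;
}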
diff --git a/aom_dsp/x86/variance_avx2.c b/aom_dsp/x86/variance_avx2.c
index a475fb7..046d6f1 100644
--- a/aom_dsp/x86/variance_avx2.c
+++ b/aom_dsp/x86/variance_avx2.c
@@ -269,6 +269,95 @@
_mm256_storeu_si256((__m256i *)(comp_pred), roundA);
}
+void aom_comp_avg_pred_avx2(uint8_t *comp_pred, const uint8_t *pred, int width,
+ int height, const uint8_t *ref, int ref_stride) {
+ int row = 0;
+ if (width == 8) {
+ do {
+ const __m256i pred_0123 = _mm256_loadu_si256((const __m256i *)(pred));
+ const __m128i ref_0 = _mm_loadl_epi64((const __m128i *)(ref));
+ const __m128i ref_1 =
+ _mm_loadl_epi64((const __m128i *)(ref + ref_stride));
+ const __m128i ref_2 =
+ _mm_loadl_epi64((const __m128i *)(ref + 2 * ref_stride));
+ const __m128i ref_3 =
+ _mm_loadl_epi64((const __m128i *)(ref + 3 * ref_stride));
+ const __m128i ref_01 = _mm_unpacklo_epi64(ref_0, ref_1);
+ const __m128i ref_23 = _mm_unpacklo_epi64(ref_2, ref_3);
+
+ const __m256i ref_0123 =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(ref_01), ref_23, 1);
+ const __m256i average = _mm256_avg_epu8(pred_0123, ref_0123);
+ _mm256_storeu_si256((__m256i *)(comp_pred), average);
+
+ row += 4;
+ pred += 32;
+ comp_pred += 32;
+ ref += 4 * ref_stride;
+ } while (row < height);
+ } else if (width == 16) {
+ do {
+ const __m256i pred_0 = _mm256_loadu_si256((const __m256i *)(pred));
+ const __m256i pred_1 = _mm256_loadu_si256((const __m256i *)(pred + 32));
+ const __m256i tmp0 =
+ _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)(ref)));
+ const __m256i ref_0 = _mm256_inserti128_si256(
+ tmp0, _mm_loadu_si128((const __m128i *)(ref + ref_stride)), 1);
+ const __m256i tmp1 = _mm256_castsi128_si256(
+ _mm_loadu_si128((const __m128i *)(ref + 2 * ref_stride)));
+ const __m256i ref_1 = _mm256_inserti128_si256(
+ tmp1, _mm_loadu_si128((const __m128i *)(ref + 3 * ref_stride)), 1);
+ const __m256i average_0 = _mm256_avg_epu8(pred_0, ref_0);
+ const __m256i average_1 = _mm256_avg_epu8(pred_1, ref_1);
+ _mm256_storeu_si256((__m256i *)(comp_pred), average_0);
+ _mm256_storeu_si256((__m256i *)(comp_pred + 32), average_1);
+
+ row += 4;
+ pred += 64;
+ comp_pred += 64;
+ ref += 4 * ref_stride;
+ } while (row < height);
+ } else if (width == 32) {
+ do {
+ const __m256i pred_0 = _mm256_loadu_si256((const __m256i *)(pred));
+ const __m256i pred_1 = _mm256_loadu_si256((const __m256i *)(pred + 32));
+ const __m256i ref_0 = _mm256_loadu_si256((const __m256i *)(ref));
+ const __m256i ref_1 =
+ _mm256_loadu_si256((const __m256i *)(ref + ref_stride));
+ const __m256i average_0 = _mm256_avg_epu8(pred_0, ref_0);
+ const __m256i average_1 = _mm256_avg_epu8(pred_1, ref_1);
+ _mm256_storeu_si256((__m256i *)(comp_pred), average_0);
+ _mm256_storeu_si256((__m256i *)(comp_pred + 32), average_1);
+
+ row += 2;
+ pred += 64;
+ comp_pred += 64;
+ ref += 2 * ref_stride;
+ } while (row < height);
+ } else if (width % 64 == 0) {
+ do {
+ for (int x = 0; x < width; x += 64) {
+ const __m256i pred_0 = _mm256_loadu_si256((const __m256i *)(pred + x));
+ const __m256i pred_1 =
+ _mm256_loadu_si256((const __m256i *)(pred + x + 32));
+ const __m256i ref_0 = _mm256_loadu_si256((const __m256i *)(ref + x));
+ const __m256i ref_1 =
+ _mm256_loadu_si256((const __m256i *)(ref + x + 32));
+ const __m256i average_0 = _mm256_avg_epu8(pred_0, ref_0);
+ const __m256i average_1 = _mm256_avg_epu8(pred_1, ref_1);
+ _mm256_storeu_si256((__m256i *)(comp_pred + x), average_0);
+ _mm256_storeu_si256((__m256i *)(comp_pred + x + 32), average_1);
+ }
+ row++;
+ pred += width;
+ comp_pred += width;
+ ref += ref_stride;
+ } while (row < height);
+ } else {
+ aom_comp_avg_pred_c(comp_pred, pred, width, height, ref, ref_stride);
+ }
+}
+
void aom_comp_mask_pred_avx2(uint8_t *comp_pred, const uint8_t *pred, int width,
int height, const uint8_t *ref, int ref_stride,
const uint8_t *mask, int mask_stride,
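aom_comp_avg_pred_avx2 above vectorizes the plain compound average for widths 8, 16, 32, and multiples of 64, falling back to aom_comp_avg_pred_c otherwise. The scalar operation it accelerates, for reference (_mm256_avg_epu8 computes the rounded average (a + b + 1) >> 1 per byte):

#include <stdint.h>

// Scalar sketch of aom_comp_avg_pred_c's behaviour; pred and comp_pred are
// stored contiguously at 'width' stride, ref uses its own stride.
static void comp_avg_pred_ref(uint8_t *comp_pred, const uint8_t *pred,
                              int width, int height, const uint8_t *ref,
                              int ref_stride) {
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++)
      comp_pred[x] = (uint8_t)((pred[x] + ref[x] + 1) >> 1);
    comp_pred += width;
    pred += width;
    ref += ref_stride;
  }
}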
diff --git a/aom_dsp/x86/variance_sse2.c b/aom_dsp/x86/variance_sse2.c
index 7d4ff4f..faec9cf 100644
--- a/aom_dsp/x86/variance_sse2.c
+++ b/aom_dsp/x86/variance_sse2.c
@@ -46,6 +46,12 @@
return _mm_unpacklo_epi8(p0, _mm_setzero_si128());
}
+static INLINE void load16_8to16_sse2(const uint8_t *const p, __m128i *out) {
+ const __m128i p0 = _mm_loadu_si128((const __m128i *)p);
+ out[0] = _mm_unpacklo_epi8(p0, _mm_setzero_si128()); // lower 8 values
+ out[1] = _mm_unpackhi_epi8(p0, _mm_setzero_si128()); // upper 8 values
+}
+
// Accumulate 4 32bit numbers in val to 1 32bit number
static INLINE unsigned int add32x4_sse2(__m128i val) {
val = _mm_add_epi32(val, _mm_srli_si128(val, 8));
@@ -232,14 +238,6 @@
}
}
-void aom_get8x8var_sse2(const uint8_t *src_ptr, int src_stride,
- const uint8_t *ref_ptr, int ref_stride,
- unsigned int *sse, int *sum) {
- __m128i vsse, vsum;
- variance8_sse2(src_ptr, src_stride, ref_ptr, ref_stride, 8, &vsse, &vsum);
- variance_final_128_pel_sse2(vsse, vsum, sse, sum);
-}
-
void aom_get_var_sse_sum_8x8_quad_sse2(const uint8_t *src_ptr, int src_stride,
const uint8_t *ref_ptr, int ref_stride,
uint32_t *sse8x8, int *sum8x8,
@@ -271,6 +269,42 @@
var8x8[i] = sse8x8[i] - (uint32_t)(((int64_t)sum8x8[i] * sum8x8[i]) >> 6);
}
+void aom_get_var_sse_sum_16x16_dual_sse2(const uint8_t *src_ptr, int src_stride,
+ const uint8_t *ref_ptr, int ref_stride,
+ uint32_t *sse16x16,
+ unsigned int *tot_sse, int *tot_sum,
+ uint32_t *var16x16) {
+ int sum16x16[2] = { 0 };
+  // Loop over the two horizontally adjacent 16x16 blocks of one 32x16 region.
+ for (int k = 0; k < 2; k++) {
+ const uint8_t *src = src_ptr;
+ const uint8_t *ref = ref_ptr;
+ __m128i vsum = _mm_setzero_si128();
+ __m128i vsse = _mm_setzero_si128();
+ for (int i = 0; i < 16; i++) {
+ __m128i s[2];
+ __m128i r[2];
+ load16_8to16_sse2(src + (k * 16), s);
+ load16_8to16_sse2(ref + (k * 16), r);
+ const __m128i diff0 = _mm_sub_epi16(s[0], r[0]);
+ const __m128i diff1 = _mm_sub_epi16(s[1], r[1]);
+ vsse = _mm_add_epi32(vsse, _mm_madd_epi16(diff0, diff0));
+ vsse = _mm_add_epi32(vsse, _mm_madd_epi16(diff1, diff1));
+ vsum = _mm_add_epi16(vsum, _mm_add_epi16(diff0, diff1));
+ src += src_stride;
+ ref += ref_stride;
+ }
+ variance_final_256_pel_sse2(vsse, vsum, &sse16x16[k], &sum16x16[k]);
+ }
+
+  // Calculate variance at the 16x16 level, plus the total sse and sum of the
+  // 32x16 region.
+ *tot_sse += sse16x16[0] + sse16x16[1];
+ *tot_sum += sum16x16[0] + sum16x16[1];
+ for (int i = 0; i < 2; i++)
+ var16x16[i] =
+ sse16x16[i] - (uint32_t)(((int64_t)sum16x16[i] * sum16x16[i]) >> 8);
+}
+
#define AOM_VAR_NO_LOOP_SSE2(bw, bh, bits, max_pixels) \
unsigned int aom_variance##bw##x##bh##_sse2( \
const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, \
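The new dual kernel above accumulates sse and sum for two side-by-side 16x16 blocks and derives each variance as sse - sum^2 / 256, matching the >> 8 in the code. A scalar reference for one block (a sketch, not the SIMD code):

#include <stdint.h>

static uint32_t variance16x16_ref(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride) {
  int64_t sse = 0, sum = 0;
  for (int i = 0; i < 16; i++) {
    for (int j = 0; j < 16; j++) {
      const int d = src[j] - ref[j];
      sse += d * d;
      sum += d;
    }
    src += src_stride;
    ref += ref_stride;
  }
  // Population variance scaled by the 256 pixels: sse - sum^2 / 256.
  return (uint32_t)(sse - ((sum * sum) >> 8));
}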
diff --git a/aom_ports/aom_once.h b/aom_ports/aom_once.h
index 0974a3b..680120f 100644
--- a/aom_ports/aom_once.h
+++ b/aom_ports/aom_once.h
@@ -60,29 +60,6 @@
InitOnceComplete(&aom_init_once, 0, NULL);
}
-#elif CONFIG_MULTITHREAD && defined(__OS2__)
-#define INCL_DOS
-#include <os2.h>
-static void aom_once(void (*func)(void)) {
- static volatile int done;
-
- /* If the initialization is complete, return early. */
- if (done) return;
-
- /* Causes all other threads in the process to block themselves
- * and give up their time slice.
- */
- DosEnterCritSec();
-
- if (!done) {
- func();
- done = 1;
- }
-
- /* Restores normal thread dispatching for the current process. */
- DosExitCritSec();
-}
-
#elif CONFIG_MULTITHREAD && HAVE_PTHREAD_H
#include <pthread.h>
static void aom_once(void (*func)(void)) {
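With the OS/2 branch removed, the remaining non-Windows threaded path is the pthread one. Assuming it follows the usual pthread_once pattern (a sketch, not necessarily the verbatim libaom body):

#include <pthread.h>

static void aom_once_sketch(void (*func)(void)) {
  static pthread_once_t once = PTHREAD_ONCE_INIT;
  // pthread_once guarantees func runs exactly once across all threads.
  pthread_once(&once, func);
}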
diff --git a/aom_ports/aom_ports.cmake b/aom_ports/aom_ports.cmake
index 5d9f69a..e3b67e4 100644
--- a/aom_ports/aom_ports.cmake
+++ b/aom_ports/aom_ports.cmake
@@ -27,6 +27,12 @@
list(APPEND AOM_PORTS_SOURCES_ARM "${AOM_ROOT}/aom_ports/arm.h"
"${AOM_ROOT}/aom_ports/arm_cpudetect.c")
+if(CONFIG_RUNTIME_CPU_DETECT AND ANDROID_NDK)
+ include_directories(${ANDROID_NDK}/sources/android/cpufeatures)
+ list(APPEND AOM_PORTS_SOURCES_ARM
+ "${ANDROID_NDK}/sources/android/cpufeatures/cpu-features.c")
+endif()
+
list(APPEND AOM_PORTS_SOURCES_PPC "${AOM_ROOT}/aom_ports/ppc.h"
"${AOM_ROOT}/aom_ports/ppc_cpudetect.c")
@@ -43,9 +49,13 @@
#
# * The libaom target must exist before this function is called.
function(setup_aom_ports_targets)
- if(WIN32 AND "${AOM_TARGET_CPU}" STREQUAL "x86_64")
+ if(XCODE AND "${AOM_TARGET_CPU}" STREQUAL "x86_64")
add_asm_library("aom_ports" "AOM_PORTS_ASM_X86")
- set(aom_ports_asm_lib 1)
+    # Xcode is the only generator that embeds the asm sources directly in
+    # the aom targets.
+ set(aom_ports_is_embedded 1)
+ set(aom_ports_has_symbols 1)
+ elseif(WIN32 AND "${AOM_TARGET_CPU}" STREQUAL "x86_64")
+ add_asm_library("aom_ports" "AOM_PORTS_ASM_X86")
set(aom_ports_has_symbols 1)
elseif("${AOM_TARGET_CPU}" MATCHES "arm")
add_library(aom_ports OBJECT ${AOM_PORTS_SOURCES_ARM})
@@ -68,14 +78,7 @@
# libaom_srcs.*; if it becomes necessary for a particular generator another
# method should be used.
if(aom_ports_has_symbols)
- if(aom_ports_asm_lib)
- # When aom_ports is an asm library its name changes based on build
- # configuration. This handles adding sources to the correct target(s).
- target_sources(aom_ports_static PRIVATE ${AOM_PORTS_INCLUDES})
- if(BUILD_SHARED_LIBS)
- target_sources(aom_ports_shared PRIVATE ${AOM_PORTS_INCLUDES})
- endif()
- else()
+ if(NOT aom_ports_is_embedded)
target_sources(aom_ports PRIVATE ${AOM_PORTS_INCLUDES})
endif()
set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} PARENT_SCOPE)
diff --git a/aom_ports/aom_timer.h b/aom_ports/aom_timer.h
index ff58799..642c5a0 100644
--- a/aom_ports/aom_timer.h
+++ b/aom_ports/aom_timer.h
@@ -14,10 +14,11 @@
#include "config/aom_config.h"
-#include "aom/aom_integer.h"
-
#if CONFIG_OS_SUPPORT
+#include <stddef.h>
+#include <stdint.h>
+
#if defined(_WIN32)
/*
* Win32 specific includes
diff --git a/aom_ports/arm_cpudetect.c b/aom_ports/arm_cpudetect.c
index 305b22c..276ef61 100644
--- a/aom_ports/arm_cpudetect.c
+++ b/aom_ports/arm_cpudetect.c
@@ -57,12 +57,14 @@
}
#elif defined(_MSC_VER) /* end !CONFIG_RUNTIME_CPU_DETECT || __APPLE__ */
+#if HAVE_NEON && !AOM_ARCH_AARCH64
/*For GetExceptionCode() and EXCEPTION_ILLEGAL_INSTRUCTION.*/
#undef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#undef WIN32_EXTRA_LEAN
#define WIN32_EXTRA_LEAN
#include <windows.h>
+#endif // HAVE_NEON && !AOM_ARCH_AARCH64
int aom_arm_cpu_caps(void) {
int flags;
@@ -71,6 +73,9 @@
return flags;
}
mask = arm_cpu_env_mask();
+#if AOM_ARCH_AARCH64
+ return HAS_NEON & mask;
+#else
/* MSVC has no inline __asm support for ARM, but it does let you __emit
* instructions via their assembled hex code.
* All of these instructions should be essentially nops.
@@ -85,8 +90,9 @@
/*Ignore exception.*/
}
}
-#endif /* HAVE_NEON */
+#endif /* HAVE_NEON */
return flags & mask;
+#endif // AOM_ARCH_AARCH64
}
#elif defined(__ANDROID__) /* end _MSC_VER */
diff --git a/aom_ports/bitops.h b/aom_ports/bitops.h
index 44df173..3c5b992 100644
--- a/aom_ports/bitops.h
+++ b/aom_ports/bitops.h
@@ -13,6 +13,7 @@
#define AOM_AOM_PORTS_BITOPS_H_
#include <assert.h>
+#include <stdint.h>
#include "aom_ports/msvc.h"
#include "config/aom_config.h"
@@ -32,7 +33,13 @@
// Returns (int)floor(log2(n)). n must be > 0.
// These versions of get_msb() are only valid when n != 0 because all
// of the optimized versions are undefined when n == 0.
-// https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+
+// get_byteswap64:
+// Returns the number (uint64_t) with byte-positions reversed
+// e.g. input 0x123456789ABCDEF0 returns 0xF0DEBC9A78563412
+
+// GCC compiler: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+// MSVC: https://learn.microsoft.com/en-us/cpp/c-runtime-library/
// use GNU builtins where available.
#if defined(__GNUC__) && \
@@ -41,6 +48,10 @@
assert(n != 0);
return 31 ^ __builtin_clz(n);
}
+
+static INLINE uint64_t get_byteswap64(uint64_t num) {
+ return __builtin_bswap64(num);
+}
#elif defined(USE_MSC_INTRINSICS)
#pragma intrinsic(_BitScanReverse)
@@ -50,17 +61,19 @@
_BitScanReverse(&first_set_bit, n);
return first_set_bit;
}
+
+static INLINE uint64_t get_byteswap64(uint64_t num) {
+ return _byteswap_uint64(num);
+}
#undef USE_MSC_INTRINSICS
#else
static INLINE int get_msb(unsigned int n) {
int log = 0;
unsigned int value = n;
- int i;
assert(n != 0);
- for (i = 4; i >= 0; --i) {
- const int shift = (1 << i);
+ for (int shift = 16; shift != 0; shift >>= 1) {
const unsigned int x = value >> shift;
if (x != 0) {
value = x;
@@ -69,6 +82,26 @@
}
return log;
}
+
+static INLINE uint64_t get_byteswap64(uint64_t num) {
+ uint64_t out = 0x00;
+ uint64_t mask = 0xFF00000000000000;
+ int bit_shift = 56; // 7 bytes
+ // 4 ms bytes
+ do {
+ out |= (num & mask) >> bit_shift;
+ mask >>= 8;
+ bit_shift -= 16;
+ } while (bit_shift >= 0);
+ // 4 ls bytes
+ bit_shift = 8; // 1 byte
+ do {
+ out |= (num & mask) << bit_shift;
+ mask >>= 8;
+ bit_shift += 16;
+ } while (bit_shift <= 56);
+ return out;
+}
#endif
#ifdef __cplusplus
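For reference, an independent loop-based byteswap that can be asserted against the example value documented above (the harness is mine, not libaom's):

#include <assert.h>
#include <stdint.h>

// Build the output byte by byte, least-significant byte of 'num' first.
static uint64_t byteswap64_ref(uint64_t num) {
  uint64_t out = 0;
  for (int i = 0; i < 8; i++) out = (out << 8) | ((num >> (8 * i)) & 0xFF);
  return out;
}

int main(void) {
  assert(byteswap64_ref(0x123456789ABCDEF0ULL) == 0xF0DEBC9A78563412ULL);
  return 0;
}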
diff --git a/aom_ports/x86.h b/aom_ports/x86.h
index d44d386..c089984 100644
--- a/aom_ports/x86.h
+++ b/aom_ports/x86.h
@@ -44,7 +44,7 @@
} aom_cpu_t;
#if defined(__GNUC__) && __GNUC__ || defined(__ANDROID__)
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
#define cpuid(func, func2, ax, bx, cx, dx) \
__asm__ __volatile__("cpuid \n\t" \
: "=a"(ax), "=b"(bx), "=c"(cx), "=d"(dx) \
@@ -60,7 +60,7 @@
#endif
#elif defined(__SUNPRO_C) || \
defined(__SUNPRO_CC) /* end __GNUC__ or __ANDROID__*/
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
#define cpuid(func, func2, ax, bx, cx, dx) \
asm volatile( \
"xchg %rsi, %rbx \n\t" \
@@ -80,7 +80,7 @@
: "a"(func), "c"(func2))
#endif
#else /* end __SUNPRO__ */
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
#if defined(_MSC_VER) && _MSC_VER > 1500
#define cpuid(func, func2, a, b, c, d) \
do { \
@@ -258,7 +258,7 @@
asm volatile("rdtsc\n\t" : "=a"(tsc) :);
return tsc;
#else
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return (unsigned int)__rdtsc();
#else
__asm rdtsc;
@@ -276,7 +276,7 @@
asm volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi));
return ((uint64_t)hi << 32) | lo;
#else
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return (uint64_t)__rdtsc();
#else
__asm rdtsc;
@@ -298,7 +298,7 @@
unsigned int ui;
return (unsigned int)__rdtscp(&ui);
#else
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return (unsigned int)__rdtscp();
#else
__asm rdtscp;
@@ -336,7 +336,7 @@
#elif defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#define x86_pause_hint() asm volatile("pause \n\t")
#else
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
#define x86_pause_hint() _mm_pause();
#else
#define x86_pause_hint() __asm pause
@@ -361,7 +361,7 @@
asm volatile("fstcw %0\n\t" : "=m"(*&mode) :);
return mode;
}
-#elif ARCH_X86_64
+#elif AOM_ARCH_X86_64
/* No fldcw intrinsics on Windows x64, punt to external asm */
extern void aom_winx64_fldcw(unsigned short mode);
extern unsigned short aom_winx64_fstcw(void);
diff --git a/aom_scale/aom_scale_rtcd.pl b/aom_scale/aom_scale_rtcd.pl
index e84b6f9..ae0a856 100644
--- a/aom_scale/aom_scale_rtcd.pl
+++ b/aom_scale/aom_scale_rtcd.pl
@@ -26,7 +26,7 @@
add_proto qw/void aom_vertical_band_2_1_scale_i/, "unsigned char *source, int src_pitch, unsigned char *dest, int dest_pitch, unsigned int dest_width";
}
-add_proto qw/int aom_yv12_realloc_with_new_border/, "struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes";
+add_proto qw/int aom_yv12_realloc_with_new_border/, "struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes";
add_proto qw/void aom_yv12_extend_frame_borders/, "struct yv12_buffer_config *ybf, const int num_planes";
diff --git a/aom_scale/generic/yv12config.c b/aom_scale/generic/yv12config.c
index de56263..82376f4 100644
--- a/aom_scale/generic/yv12config.c
+++ b/aom_scale/generic/yv12config.c
@@ -12,6 +12,8 @@
#include <assert.h>
#include "aom/internal/aom_image_internal.h"
+#include "aom_dsp/pyramid.h"
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_mem/aom_mem.h"
#include "aom_ports/mem.h"
#include "aom_scale/yv12config.h"
@@ -31,7 +33,14 @@
if (ybf->buffer_alloc_sz > 0) {
aom_free(ybf->buffer_alloc);
}
- if (ybf->y_buffer_8bit) aom_free(ybf->y_buffer_8bit);
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+ if (ybf->y_pyramid) {
+ aom_free_pyramid(ybf->y_pyramid);
+ }
+ if (ybf->corners) {
+ av1_free_corner_list(ybf->corners);
+ }
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
aom_remove_metadata_from_frame_buffer(ybf);
/* buffer_alloc isn't accessed by most functions. Rather y_buffer,
u_buffer and v_buffer point to buffer_alloc and are used. Clear out
@@ -51,7 +60,7 @@
const uint64_t uvplane_size, const int aligned_width,
const int aligned_height, const int uv_width, const int uv_height,
const int uv_stride, const int uv_border_w, const int uv_border_h,
- int alloc_y_buffer_8bit, int alloc_y_plane_only) {
+ int num_pyramid_levels, int alloc_y_plane_only) {
if (ybf) {
const int aom_byte_align = (byte_alignment == 0) ? 1 : byte_alignment;
const uint64_t frame_size =
@@ -59,11 +68,24 @@
uint8_t *buf = NULL;
+#if CONFIG_REALTIME_ONLY || !CONFIG_AV1_ENCODER
+ // We should only need an 8-bit version of the source frame if we are
+ // encoding in non-realtime mode
+ (void)num_pyramid_levels;
+ assert(num_pyramid_levels == 0);
+#endif // CONFIG_REALTIME_ONLY || !CONFIG_AV1_ENCODER
+
#if defined AOM_MAX_ALLOCABLE_MEMORY
// The size of ybf->buffer_alloc.
uint64_t alloc_size = frame_size;
- // The size of ybf->y_buffer_8bit.
- if (use_highbitdepth) alloc_size += yplane_size;
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+ // The size of ybf->y_pyramid
+ if (num_pyramid_levels > 0) {
+ alloc_size += aom_get_pyramid_alloc_size(
+ width, height, num_pyramid_levels, use_highbitdepth);
+ alloc_size += av1_get_corner_list_size();
+ }
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
// The decoder may allocate REF_FRAMES frame buffers in the frame buffer
// pool. Bound the total amount of allocated memory as if these REF_FRAMES
// frame buffers were allocated in a single allocation.
@@ -159,17 +181,21 @@
ybf->use_external_reference_buffers = 0;
- if (use_highbitdepth && alloc_y_buffer_8bit) {
- if (ybf->y_buffer_8bit) aom_free(ybf->y_buffer_8bit);
- ybf->y_buffer_8bit = (uint8_t *)aom_memalign(32, (size_t)yplane_size);
- if (!ybf->y_buffer_8bit) return AOM_CODEC_MEM_ERROR;
- } else {
- if (ybf->y_buffer_8bit) {
- aom_free(ybf->y_buffer_8bit);
- ybf->y_buffer_8bit = NULL;
- ybf->buf_8bit_valid = 0;
- }
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+ if (ybf->y_pyramid) {
+ aom_free_pyramid(ybf->y_pyramid);
+ ybf->y_pyramid = NULL;
}
+ if (ybf->corners) {
+ av1_free_corner_list(ybf->corners);
+ ybf->corners = NULL;
+ }
+ if (num_pyramid_levels > 0) {
+ ybf->y_pyramid = aom_alloc_pyramid(width, height, num_pyramid_levels,
+ use_highbitdepth);
+ ybf->corners = av1_alloc_corner_list();
+ }
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
ybf->corrupted = 0; /* assume not corrupted by errors */
return 0;
@@ -209,7 +235,7 @@
int border, int byte_alignment,
aom_codec_frame_buffer_t *fb,
aom_get_frame_buffer_cb_fn_t cb, void *cb_priv,
- int alloc_y_buffer_8bit, int alloc_y_plane_only) {
+ int num_pyramid_levels, int alloc_y_plane_only) {
#if CONFIG_SIZE_LIMIT
if (width > DECODE_WIDTH_LIMIT || height > DECODE_HEIGHT_LIMIT)
return AOM_CODEC_MEM_ERROR;
@@ -236,19 +262,21 @@
ybf, width, height, ss_x, ss_y, use_highbitdepth, border,
byte_alignment, fb, cb, cb_priv, y_stride, yplane_size, uvplane_size,
aligned_width, aligned_height, uv_width, uv_height, uv_stride,
- uv_border_w, uv_border_h, alloc_y_buffer_8bit, alloc_y_plane_only);
+ uv_border_w, uv_border_h, num_pyramid_levels, alloc_y_plane_only);
}
return AOM_CODEC_MEM_ERROR;
}
int aom_alloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
int ss_x, int ss_y, int use_highbitdepth, int border,
- int byte_alignment, int alloc_y_plane_only) {
+ int byte_alignment, int num_pyramid_levels,
+ int alloc_y_plane_only) {
if (ybf) {
aom_free_frame_buffer(ybf);
return aom_realloc_frame_buffer(ybf, width, height, ss_x, ss_y,
use_highbitdepth, border, byte_alignment,
- NULL, NULL, NULL, 0, alloc_y_plane_only);
+ NULL, NULL, NULL, num_pyramid_levels,
+ alloc_y_plane_only);
}
return AOM_CODEC_MEM_ERROR;
}
diff --git a/aom_scale/generic/yv12extend.c b/aom_scale/generic/yv12extend.c
index 997ff54..5546112 100644
--- a/aom_scale/generic/yv12extend.c
+++ b/aom_scale/generic/yv12extend.c
@@ -491,7 +491,8 @@
}
int aom_yv12_realloc_with_new_border_c(YV12_BUFFER_CONFIG *ybf, int new_border,
- int byte_alignment, int num_planes) {
+ int byte_alignment,
+ int num_pyramid_levels, int num_planes) {
if (ybf) {
if (new_border == ybf->border) return 0;
YV12_BUFFER_CONFIG new_buf;
@@ -499,7 +500,7 @@
const int error = aom_alloc_frame_buffer(
&new_buf, ybf->y_crop_width, ybf->y_crop_height, ybf->subsampling_x,
ybf->subsampling_y, ybf->flags & YV12_FLAG_HIGHBITDEPTH, new_border,
- byte_alignment, 0);
+ byte_alignment, num_pyramid_levels, 0);
if (error) return error;
// Copy image buffer
aom_yv12_copy_frame(ybf, &new_buf, num_planes);
diff --git a/aom_scale/yv12config.h b/aom_scale/yv12config.h
index 581e923..f192a30 100644
--- a/aom_scale/yv12config.h
+++ b/aom_scale/yv12config.h
@@ -32,6 +32,11 @@
#define AOM_ENC_ALLINTRA_BORDER 64
#define AOM_DEC_BORDER_IN_PIXELS 64
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+struct image_pyramid;
+struct corner_list;
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+
/*!\endcond */
/*!
* \brief YV12 frame buffer data structure
@@ -90,10 +95,12 @@
// external reference frame is no longer used.
uint8_t *store_buf_adr[3];
- // If the frame is stored in a 16-bit buffer, this stores an 8-bit version
- // for use in global motion detection. It is allocated on-demand.
- uint8_t *y_buffer_8bit;
- int buf_8bit_valid;
+ // Global motion search data
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+ // 8-bit downsampling pyramid for the Y plane
+ struct image_pyramid *y_pyramid;
+ struct corner_list *corners;
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
uint8_t *buffer_alloc;
size_t buffer_alloc_sz;
@@ -121,23 +128,44 @@
#define YV12_FLAG_HIGHBITDEPTH 8
+// Allocate a frame buffer
+//
+// If ybf currently contains an image, all associated memory will be freed and
+// then reallocated. In contrast, aom_realloc_frame_buffer() will reuse any
+// existing allocations where possible. So, if ybf is likely to already be
+// set up, please consider aom_realloc_frame_buffer() instead.
+//
+// See aom_realloc_frame_buffer() for the meanings of the arguments, and
+// available return values.
int aom_alloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
int ss_x, int ss_y, int use_highbitdepth, int border,
- int byte_alignment, int alloc_y_plane_only);
+ int byte_alignment, int num_pyramid_levels,
+ int alloc_y_plane_only);
// Updates the yv12 buffer config with the frame buffer. |byte_alignment| must
// be a power of 2, from 32 to 1024. 0 sets legacy alignment. If cb is not
// NULL, then libaom is using the frame buffer callbacks to handle memory.
// If cb is not NULL, libaom will call cb with minimum size in bytes needed
// to decode the current frame. If cb is NULL, libaom will allocate memory
-// internally to decode the current frame. Returns 0 on success. Returns < 0
-// on failure.
+// internally to decode the current frame.
+//
+// If num_pyramid_levels > 0, then an image pyramid will be allocated with
+// the specified number of levels.
+//
+// Any buffer which may become a source or ref frame buffer in the encoder
+// must have num_pyramid_levels = cpi->image_pyramid_levels. This will cause
+// an image pyramid to be allocated if one is needed.
+//
+// Any other buffers (in particular, any buffers inside the decoder)
+// must have num_pyramid_levels = 0, as a pyramid is unneeded there.
+//
+// Returns 0 on success. Returns < 0 on failure.
int aom_realloc_frame_buffer(YV12_BUFFER_CONFIG *ybf, int width, int height,
int ss_x, int ss_y, int use_highbitdepth,
int border, int byte_alignment,
aom_codec_frame_buffer_t *fb,
aom_get_frame_buffer_cb_fn_t cb, void *cb_priv,
- int alloc_y_buffer_8bit, int alloc_y_plane_only);
+ int num_pyramid_levels, int alloc_y_plane_only);
int aom_free_frame_buffer(YV12_BUFFER_CONFIG *ybf);
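For illustration, a hedged usage sketch of the reworked allocator (not part of
the patch; the dimensions and the 4-level pyramid depth are hypothetical,
standing in for cpi->image_pyramid_levels):

    #include <string.h>
    #include "aom_scale/yv12config.h"

    YV12_BUFFER_CONFIG buf;
    memset(&buf, 0, sizeof(buf));
    // Encoder-side source/ref buffer: request an image pyramid.
    if (aom_alloc_frame_buffer(&buf, 1920, 1080, /*ss_x=*/1, /*ss_y=*/1,
                               /*use_highbitdepth=*/0, AOM_BORDER_IN_PIXELS,
                               /*byte_alignment=*/0, /*num_pyramid_levels=*/4,
                               /*alloc_y_plane_only=*/0) != 0) {
      // allocation failed (AOM_CODEC_MEM_ERROR)
    }
    // Decoder-side and realtime buffers must pass num_pyramid_levels = 0.
    aom_free_frame_buffer(&buf);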
diff --git a/aom_util/aom_thread.c b/aom_util/aom_thread.c
index 3916021..2c62b24 100644
--- a/aom_util/aom_thread.c
+++ b/aom_util/aom_thread.c
@@ -62,22 +62,30 @@
pthread_setname_np(pthread_self(), thread_name);
}
#endif
- int done = 0;
- while (!done) {
- pthread_mutex_lock(&worker->impl_->mutex_);
+ pthread_mutex_lock(&worker->impl_->mutex_);
+ for (;;) {
while (worker->status_ == OK) { // wait in idling mode
pthread_cond_wait(&worker->impl_->condition_, &worker->impl_->mutex_);
}
if (worker->status_ == WORK) {
+ // When worker->status_ is WORK, the main thread doesn't change
+ // worker->status_ and will wait until the worker changes worker->status_
+ // to OK. See change_state(). So the worker can safely call execute()
+ // without holding worker->impl_->mutex_. When the worker reacquires
+ // worker->impl_->mutex_, worker->status_ must still be WORK.
+ pthread_mutex_unlock(&worker->impl_->mutex_);
execute(worker);
+ pthread_mutex_lock(&worker->impl_->mutex_);
+ assert(worker->status_ == WORK);
worker->status_ = OK;
- } else if (worker->status_ == NOT_OK) { // finish the worker
- done = 1;
+ // signal to the main thread that we're done (for sync())
+ pthread_cond_signal(&worker->impl_->condition_);
+ } else {
+ assert(worker->status_ == NOT_OK); // finish the worker
+ break;
}
- // signal to the main thread that we're done (for sync())
- pthread_cond_signal(&worker->impl_->condition_);
- pthread_mutex_unlock(&worker->impl_->mutex_);
}
+ pthread_mutex_unlock(&worker->impl_->mutex_);
return THREAD_RETURN(NULL); // Thread is finished
}
diff --git a/aom_util/aom_thread.h b/aom_util/aom_thread.h
index 7db7924..2df190f 100644
--- a/aom_util/aom_thread.h
+++ b/aom_util/aom_thread.h
@@ -143,151 +143,6 @@
ok = SleepConditionVariableCS(condition, mutex, INFINITE);
return !ok;
}
-#elif defined(__OS2__)
-#define INCL_DOS
-#include <os2.h> // NOLINT
-
-#include <errno.h> // NOLINT
-#include <stdlib.h> // NOLINT
-#include <sys/builtin.h> // NOLINT
-
-#define pthread_t TID
-#define pthread_mutex_t HMTX
-
-typedef struct {
- HEV event_sem_;
- HEV ack_sem_;
- volatile unsigned wait_count_;
-} pthread_cond_t;
-
-//------------------------------------------------------------------------------
-// simplistic pthread emulation layer
-
-#define THREADFN void *
-#define THREAD_RETURN(val) (val)
-
-typedef struct {
- void *(*start_)(void *);
- void *arg_;
-} thread_arg;
-
-static void thread_start(void *arg) {
- thread_arg targ = *(thread_arg *)arg;
- free(arg);
-
- targ.start_(targ.arg_);
-}
-
-static INLINE int pthread_create(pthread_t *const thread, const void *attr,
- void *(*start)(void *), void *arg) {
- int tid;
- thread_arg *targ = (thread_arg *)malloc(sizeof(*targ));
- if (targ == NULL) return 1;
-
- (void)attr;
-
- targ->start_ = start;
- targ->arg_ = arg;
- tid = (pthread_t)_beginthread(thread_start, NULL, 1024 * 1024, targ);
- if (tid == -1) {
- free(targ);
- return 1;
- }
-
- *thread = tid;
- return 0;
-}
-
-static INLINE int pthread_join(pthread_t thread, void **value_ptr) {
- (void)value_ptr;
- return DosWaitThread(&thread, DCWW_WAIT) != 0;
-}
-
-// Mutex
-static INLINE int pthread_mutex_init(pthread_mutex_t *const mutex,
- void *mutexattr) {
- (void)mutexattr;
- return DosCreateMutexSem(NULL, mutex, 0, FALSE) != 0;
-}
-
-static INLINE int pthread_mutex_trylock(pthread_mutex_t *const mutex) {
- return DosRequestMutexSem(*mutex, SEM_IMMEDIATE_RETURN) == 0 ? 0 : EBUSY;
-}
-
-static INLINE int pthread_mutex_lock(pthread_mutex_t *const mutex) {
- return DosRequestMutexSem(*mutex, SEM_INDEFINITE_WAIT) != 0;
-}
-
-static INLINE int pthread_mutex_unlock(pthread_mutex_t *const mutex) {
- return DosReleaseMutexSem(*mutex) != 0;
-}
-
-static INLINE int pthread_mutex_destroy(pthread_mutex_t *const mutex) {
- return DosCloseMutexSem(*mutex) != 0;
-}
-
-// Condition
-static INLINE int pthread_cond_destroy(pthread_cond_t *const condition) {
- int ok = 1;
- ok &= DosCloseEventSem(condition->event_sem_) == 0;
- ok &= DosCloseEventSem(condition->ack_sem_) == 0;
- return !ok;
-}
-
-static INLINE int pthread_cond_init(pthread_cond_t *const condition,
- void *cond_attr) {
- int ok = 1;
- (void)cond_attr;
-
- ok &=
- DosCreateEventSem(NULL, &condition->event_sem_, DCE_POSTONE, FALSE) == 0;
- ok &= DosCreateEventSem(NULL, &condition->ack_sem_, DCE_POSTONE, FALSE) == 0;
- if (!ok) {
- pthread_cond_destroy(condition);
- return 1;
- }
- condition->wait_count_ = 0;
- return 0;
-}
-
-static INLINE int pthread_cond_signal(pthread_cond_t *const condition) {
- int ok = 1;
-
- if (!__atomic_cmpxchg32(&condition->wait_count_, 0, 0)) {
- ok &= DosPostEventSem(condition->event_sem_) == 0;
- ok &= DosWaitEventSem(condition->ack_sem_, SEM_INDEFINITE_WAIT) == 0;
- }
-
- return !ok;
-}
-
-static INLINE int pthread_cond_broadcast(pthread_cond_t *const condition) {
- int ok = 1;
-
- while (!__atomic_cmpxchg32(&condition->wait_count_, 0, 0))
- ok &= pthread_cond_signal(condition) == 0;
-
- return !ok;
-}
-
-static INLINE int pthread_cond_wait(pthread_cond_t *const condition,
- pthread_mutex_t *const mutex) {
- int ok = 1;
-
- __atomic_increment(&condition->wait_count_);
-
- ok &= pthread_mutex_unlock(mutex) == 0;
-
- ok &= DosWaitEventSem(condition->event_sem_, SEM_INDEFINITE_WAIT) == 0;
-
- __atomic_decrement(&condition->wait_count_);
-
- ok &= DosPostEventSem(condition->ack_sem_) == 0;
-
- pthread_mutex_lock(mutex);
-
- return !ok;
-}
#else // _WIN32
#include <pthread.h> // NOLINT
#define THREADFN void *
diff --git a/aom_util/aom_util.cmake b/aom_util/aom_util.cmake
index 1a1bfe1..6bf4faf 100644
--- a/aom_util/aom_util.cmake
+++ b/aom_util/aom_util.cmake
@@ -15,9 +15,12 @@
list(APPEND AOM_UTIL_SOURCES "${AOM_ROOT}/aom_util/aom_thread.c"
"${AOM_ROOT}/aom_util/aom_thread.h"
- "${AOM_ROOT}/aom_util/endian_inl.h"
- "${AOM_ROOT}/aom_util/debug_util.c"
- "${AOM_ROOT}/aom_util/debug_util.h")
+ "${AOM_ROOT}/aom_util/endian_inl.h")
+
+if(CONFIG_BITSTREAM_DEBUG)
+ list(APPEND AOM_UTIL_SOURCES "${AOM_ROOT}/aom_util/debug_util.c"
+ "${AOM_ROOT}/aom_util/debug_util.h")
+endif()
# Creates the aom_util build target and makes libaom depend on it. The libaom
# target must exist before this function is called.
diff --git a/aom_util/debug_util.c b/aom_util/debug_util.c
index 3e9c314..7b24550 100644
--- a/aom_util/debug_util.c
+++ b/aom_util/debug_util.c
@@ -32,7 +32,7 @@
int aom_bitstream_queue_get_frame_read(void) { return frame_idx_r; }
#if CONFIG_BITSTREAM_DEBUG
-#define QUEUE_MAX_SIZE 2000000
+#define QUEUE_MAX_SIZE 4000000
static int result_queue[QUEUE_MAX_SIZE];
static int nsymbs_queue[QUEUE_MAX_SIZE];
static aom_cdf_prob cdf_queue[QUEUE_MAX_SIZE][16];
diff --git a/apps/aomdec.c b/apps/aomdec.c
index ab4a37f..1efc091 100644
--- a/apps/aomdec.c
+++ b/apps/aomdec.c
@@ -9,7 +9,6 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
-#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
@@ -118,10 +117,18 @@
};
#if CONFIG_LIBYUV
-static INLINE int libyuv_scale(aom_image_t *src, aom_image_t *dst,
+// Returns 0 on success and returns -1 on failure.
+static INLINE int libyuv_scale(const aom_image_t *src, aom_image_t *dst,
FilterModeEnum mode) {
+ if (src->fmt != dst->fmt) {
+ fprintf(stderr,
+ "%s failed to scale output frame because format changed from %s to "
+ "%s\n",
+ exec_name, image_format_to_string(dst->fmt),
+ image_format_to_string(src->fmt));
+ return -1;
+ }
if (src->fmt == AOM_IMG_FMT_I42016) {
- assert(dst->fmt == AOM_IMG_FMT_I42016);
return I420Scale_16(
(uint16_t *)src->planes[AOM_PLANE_Y], src->stride[AOM_PLANE_Y] / 2,
(uint16_t *)src->planes[AOM_PLANE_U], src->stride[AOM_PLANE_U] / 2,
@@ -131,15 +138,18 @@
dst->stride[AOM_PLANE_U] / 2, (uint16_t *)dst->planes[AOM_PLANE_V],
dst->stride[AOM_PLANE_V] / 2, dst->d_w, dst->d_h, mode);
}
- assert(src->fmt == AOM_IMG_FMT_I420);
- assert(dst->fmt == AOM_IMG_FMT_I420);
- return I420Scale(src->planes[AOM_PLANE_Y], src->stride[AOM_PLANE_Y],
- src->planes[AOM_PLANE_U], src->stride[AOM_PLANE_U],
- src->planes[AOM_PLANE_V], src->stride[AOM_PLANE_V], src->d_w,
- src->d_h, dst->planes[AOM_PLANE_Y], dst->stride[AOM_PLANE_Y],
- dst->planes[AOM_PLANE_U], dst->stride[AOM_PLANE_U],
- dst->planes[AOM_PLANE_V], dst->stride[AOM_PLANE_V], dst->d_w,
- dst->d_h, mode);
+ if (src->fmt == AOM_IMG_FMT_I420) {
+ return I420Scale(src->planes[AOM_PLANE_Y], src->stride[AOM_PLANE_Y],
+ src->planes[AOM_PLANE_U], src->stride[AOM_PLANE_U],
+ src->planes[AOM_PLANE_V], src->stride[AOM_PLANE_V],
+ src->d_w, src->d_h, dst->planes[AOM_PLANE_Y],
+ dst->stride[AOM_PLANE_Y], dst->planes[AOM_PLANE_U],
+ dst->stride[AOM_PLANE_U], dst->planes[AOM_PLANE_V],
+ dst->stride[AOM_PLANE_V], dst->d_w, dst->d_h, mode);
+ }
+ fprintf(stderr, "%s cannot scale output frame of format %s\n", exec_name,
+ image_format_to_string(src->fmt));
+ return -1;
}
#endif
@@ -371,7 +381,7 @@
case '7': snprintf(q, q_len - 1, "%07d", frame_in); break;
case '8': snprintf(q, q_len - 1, "%08d", frame_in); break;
case '9': snprintf(q, q_len - 1, "%09d", frame_in); break;
- default: die("Unrecognized pattern %%%c\n", p[1]); break;
+ default: die("Unrecognized pattern %%%c\n", p[1]);
}
pat_len = strlen(q);
@@ -878,7 +888,7 @@
if (img->d_w != scaled_img->d_w || img->d_h != scaled_img->d_h) {
#if CONFIG_LIBYUV
- libyuv_scale(img, scaled_img, kFilterBox);
+ if (libyuv_scale(img, scaled_img, kFilterBox) != 0) goto fail;
img = scaled_img;
#else
fprintf(
diff --git a/apps/aomenc.c b/apps/aomenc.c
index ef208fd..09306f2 100644
--- a/apps/aomenc.c
+++ b/apps/aomenc.c
@@ -237,6 +237,8 @@
AV1E_SET_ENABLE_TX_SIZE_SEARCH,
AV1E_SET_LOOPFILTER_CONTROL,
AV1E_SET_AUTO_INTRA_TOOLS_OFF,
+ AV1E_ENABLE_RATE_GUIDE_DELTAQ,
+ AV1E_SET_RATE_DISTRIBUTION_INFO,
0 };
const arg_def_t *main_args[] = { &g_av1_codec_arg_defs.help,
@@ -437,6 +439,8 @@
#endif
&g_av1_codec_arg_defs.dv_cost_upd_freq,
&g_av1_codec_arg_defs.partition_info_path,
+ &g_av1_codec_arg_defs.enable_rate_guide_deltaq,
+ &g_av1_codec_arg_defs.rate_distribution_info,
&g_av1_codec_arg_defs.enable_directional_intra,
&g_av1_codec_arg_defs.enable_tx_size_search,
&g_av1_codec_arg_defs.loopfilter_control,
@@ -533,6 +537,8 @@
const char *vmaf_model_path;
#endif
const char *partition_info_path;
+ unsigned int enable_rate_guide_deltaq;
+ const char *rate_distribution_info;
aom_color_range_t color_range;
const char *two_pass_input;
const char *two_pass_output;
@@ -1130,6 +1136,12 @@
} else if (arg_match(&arg, &g_av1_codec_arg_defs.partition_info_path,
argi)) {
config->partition_info_path = arg.val;
+ } else if (arg_match(&arg, &g_av1_codec_arg_defs.enable_rate_guide_deltaq,
+ argi)) {
+ config->enable_rate_guide_deltaq = arg_parse_uint(&arg);
+ } else if (arg_match(&arg, &g_av1_codec_arg_defs.rate_distribution_info,
+ argi)) {
+ config->rate_distribution_info = arg.val;
} else if (arg_match(&arg, &g_av1_codec_arg_defs.use_fixed_qp_offsets,
argi)) {
config->cfg.use_fixed_qp_offsets = arg_parse_uint(&arg);
@@ -1294,21 +1306,6 @@
}
}
-static const char *image_format_to_string(aom_img_fmt_t f) {
- switch (f) {
- case AOM_IMG_FMT_I420: return "I420";
- case AOM_IMG_FMT_I422: return "I422";
- case AOM_IMG_FMT_I444: return "I444";
- case AOM_IMG_FMT_YV12: return "YV12";
- case AOM_IMG_FMT_NV12: return "NV12";
- case AOM_IMG_FMT_YV1216: return "YV1216";
- case AOM_IMG_FMT_I42016: return "I42016";
- case AOM_IMG_FMT_I42216: return "I42216";
- case AOM_IMG_FMT_I44416: return "I44416";
- default: return "Other";
- }
-}
-
static void show_stream_config(struct stream_state *stream,
struct AvxEncoderConfig *global,
struct AvxInputContext *input) {
@@ -1540,6 +1537,16 @@
AV1E_SET_PARTITION_INFO_PATH,
stream->config.partition_info_path);
}
+ if (stream->config.enable_rate_guide_deltaq) {
+ AOM_CODEC_CONTROL_TYPECHECKED(&stream->encoder,
+ AV1E_ENABLE_RATE_GUIDE_DELTAQ,
+ stream->config.enable_rate_guide_deltaq);
+ }
+ if (stream->config.rate_distribution_info) {
+ AOM_CODEC_CONTROL_TYPECHECKED(&stream->encoder,
+ AV1E_SET_RATE_DISTRIBUTION_INFO,
+ stream->config.rate_distribution_info);
+ }
if (stream->config.film_grain_filename) {
AOM_CODEC_CONTROL_TYPECHECKED(&stream->encoder, AV1E_SET_FILM_GRAIN_TABLE,
diff --git a/av1/arg_defs.c b/av1/arg_defs.c
index abfd4b3..35a2ab4 100644
--- a/av1/arg_defs.c
+++ b/av1/arg_defs.c
@@ -47,6 +47,7 @@
{ "vmaf", AOM_TUNE_VMAF_MAX_GAIN },
{ "vmaf_neg", AOM_TUNE_VMAF_NEG_MAX_GAIN },
{ "butteraugli", AOM_TUNE_BUTTERAUGLI },
+ { "vmaf_saliency_map", AOM_TUNE_VMAF_SALIENCY_MAP },
{ NULL, 0 }
};
@@ -226,13 +227,17 @@
ARG_DEF(NULL, "use-16bit-internal", 0, "Force use of 16-bit pipeline"),
.dropframe_thresh =
ARG_DEF(NULL, "drop-frame", 1, "Temporal resampling threshold (buf %)"),
- .resize_mode = ARG_DEF(NULL, "resize-mode", 1, "Frame resize mode"),
+ .resize_mode = ARG_DEF(
+ NULL, "resize-mode", 1,
+ "Frame resize mode (0: off (default), 1: fixed, 2: random, 3: dynamic)"),
.resize_denominator =
ARG_DEF(NULL, "resize-denominator", 1, "Frame resize denominator"),
.resize_kf_denominator = ARG_DEF(NULL, "resize-kf-denominator", 1,
"Frame resize keyframe denominator"),
.superres_mode =
- ARG_DEF(NULL, "superres-mode", 1, "Frame super-resolution mode"),
+ ARG_DEF(NULL, "superres-mode", 1,
+ "Frame super-resolution mode (0: disabled (default), 1: fixed, "
+ "2: random, 3: qthresh, 4: auto)"),
.superres_denominator = ARG_DEF(NULL, "superres-denominator", 1,
"Frame super-resolution denominator"),
.superres_kf_denominator =
@@ -495,6 +500,16 @@
#endif
.partition_info_path = ARG_DEF(NULL, "partition-info-path", 1,
"Partition information read and write path"),
+ .enable_rate_guide_deltaq =
+ ARG_DEF(NULL, "enable-rate-guide-deltaq", 1,
+              "Enable rate guide deltaq (1), off by default (0). "
+ "It requires --deltaq-mode=3. "
+ "If turned on, it requires an input file specified "
+ "by --rate-distribution-info."),
+ .rate_distribution_info =
+ ARG_DEF(NULL, "rate-distribution-info", 1,
+              "Rate distribution information input. "
+ "It requires --enable-rate-guide-deltaq=1."),
.film_grain_test = ARG_DEF(
NULL, "film-grain-test", 1,
"Film grain test vectors (0: none (default), 1: test-1 2: test-2, "
diff --git a/av1/arg_defs.h b/av1/arg_defs.h
index e15a84c..b9d0cfe 100644
--- a/av1/arg_defs.h
+++ b/av1/arg_defs.h
@@ -21,6 +21,7 @@
#include "common/webmenc.h"
#endif
#include "aom/aomcx.h"
+#include "aom_dsp/flow_estimation/flow_estimation.h"
enum TestDecodeFatality {
TEST_DECODE_OFF,
@@ -185,6 +186,8 @@
arg_def_t vmaf_model_path;
#endif
arg_def_t partition_info_path;
+ arg_def_t enable_rate_guide_deltaq;
+ arg_def_t rate_distribution_info;
arg_def_t film_grain_test;
arg_def_t film_grain_table;
#if CONFIG_DENOISE
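Taken together with the aomenc.c hunks above, the two new options are
exercised along the lines of "aomenc --deltaq-mode=3
--enable-rate-guide-deltaq=1 --rate-distribution-info=rate_map.txt ..." (the
input file name here is hypothetical).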
diff --git a/av1/av1.cmake b/av1/av1.cmake
index ae1eba7..43b7665 100644
--- a/av1/av1.cmake
+++ b/av1/av1.cmake
@@ -214,6 +214,7 @@
"${AOM_ROOT}/av1/encoder/rd.h"
"${AOM_ROOT}/av1/encoder/rdopt.c"
"${AOM_ROOT}/av1/encoder/nonrd_pickmode.c"
+ "${AOM_ROOT}/av1/encoder/nonrd_opt.c"
"${AOM_ROOT}/av1/encoder/nonrd_opt.h"
"${AOM_ROOT}/av1/encoder/rdopt.h"
"${AOM_ROOT}/av1/encoder/rdopt_data_defs.h"
@@ -357,9 +358,11 @@
"${AOM_ROOT}/av1/encoder/arm/neon/av1_error_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/encodetxb_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c"
+ "${AOM_ROOT}/av1/encoder/arm/neon/av1_k_means_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/highbd_fwd_txfm_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/wedge_utils_neon.c"
+ "${AOM_ROOT}/av1/encoder/arm/neon/reconinter_enc_neon.c"
"${AOM_ROOT}/av1/encoder/arm/neon/temporal_filter_neon.c")
list(APPEND AOM_AV1_ENCODER_INTRIN_ARM_CRC32
@@ -400,6 +403,11 @@
"${AOM_ROOT}/av1/encoder/tune_butteraugli.h")
endif()
+if(CONFIG_SALIENCY_MAP)
+ list(APPEND AOM_AV1_ENCODER_SOURCES "${AOM_ROOT}/av1/encoder/saliency_map.c"
+ "${AOM_ROOT}/av1/encoder/saliency_map.h")
+endif()
+
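Note that the saliency-map sources above are compiled only when the
corresponding config flag is on, e.g. configuring with
"-DCONFIG_SALIENCY_MAP=1" (assuming the flag is exposed like other libaom
config options) before "--tune=vmaf_saliency_map" can take effect.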
if(CONFIG_OPTICAL_FLOW_API)
list(APPEND AOM_AV1_ENCODER_SOURCES
"${AOM_ROOT}/av1/encoder/sparse_linear_solver.c"
@@ -437,6 +445,9 @@
"${AOM_ROOT}/av1/common/x86/highbd_wiener_convolve_avx2.c"
"${AOM_ROOT}/av1/common/x86/highbd_warp_affine_avx2.c")
+ list(APPEND AOM_AV1_COMMON_INTRIN_NEON
+ "${AOM_ROOT}/av1/common/arm/highbd_convolve_neon.c")
+
list(APPEND AOM_AV1_ENCODER_INTRIN_SSE2
"${AOM_ROOT}/av1/encoder/x86/highbd_block_error_intrin_sse2.c"
"${AOM_ROOT}/av1/encoder/x86/highbd_temporal_filter_sse2.c")
diff --git a/av1/av1_cx_iface.c b/av1/av1_cx_iface.c
index 178d966..403f994 100644
--- a/av1/av1_cx_iface.c
+++ b/av1/av1_cx_iface.c
@@ -20,13 +20,17 @@
#include "aom/aom_encoder.h"
#include "aom/internal/aom_codec_internal.h"
+#include "aom_dsp/flow_estimation/flow_estimation.h"
+
#include "av1/av1_iface_common.h"
#include "av1/encoder/bitstream.h"
#include "av1/encoder/encoder.h"
+#include "av1/encoder/encoder_alloc.h"
#include "av1/encoder/encoder_utils.h"
#include "av1/encoder/ethread.h"
#include "av1/encoder/external_partition.h"
#include "av1/encoder/firstpass.h"
+#include "av1/encoder/rc_utils.h"
#include "av1/arg_defs.h"
#include "common/args_helper.h"
@@ -53,6 +57,8 @@
aom_tune_metric tuning;
const char *vmaf_model_path;
const char *partition_info_path;
+ unsigned int enable_rate_guide_deltaq;
+ const char *rate_distribution_info;
aom_dist_metric dist_metric;
unsigned int cq_level; // constrained quality level
unsigned int rc_max_intra_bitrate_pct;
@@ -231,6 +237,8 @@
AOM_TUNE_PSNR, // tuning
"/usr/local/share/model/vmaf_v0.6.1.json", // VMAF model path
".", // partition info path
+ 0, // enable rate guide deltaq
+ "./rate_map.txt", // rate distribution input
AOM_DIST_METRIC_PSNR, // dist_metric
10, // cq_level
0, // rc_max_intra_bitrate_pct
@@ -380,6 +388,8 @@
AOM_TUNE_PSNR, // tuning
"/usr/local/share/model/vmaf_v0.6.1.json", // VMAF model path
".", // partition info path
+ 0, // enable rate guide deltaq
+ "./rate_map.txt", // rate distribution input
AOM_DIST_METRIC_PSNR, // dist_metric
10, // cq_level
0, // rc_max_intra_bitrate_pct
@@ -552,6 +562,7 @@
ratio->den /= denom;
}
+// Called by encoder_encode() only. Must not be called by encoder_init().
static aom_codec_err_t update_error_state(
aom_codec_alg_priv_t *ctx, const struct aom_internal_error_info *error) {
const aom_codec_err_t res = error->error_code;
@@ -703,12 +714,13 @@
RANGE_CHECK_HI(extra_cfg, enable_auto_alt_ref, 1);
RANGE_CHECK_HI(extra_cfg, enable_auto_bwd_ref, 2);
RANGE_CHECK(extra_cfg, cpu_used, 0,
- (cfg->g_usage == AOM_USAGE_REALTIME) ? 10 : 9);
+ (cfg->g_usage == AOM_USAGE_REALTIME) ? 11 : 9);
RANGE_CHECK_HI(extra_cfg, noise_sensitivity, 6);
RANGE_CHECK(extra_cfg, superblock_size, AOM_SUPERBLOCK_SIZE_64X64,
AOM_SUPERBLOCK_SIZE_DYNAMIC);
RANGE_CHECK_HI(cfg, large_scale_tile, 1);
RANGE_CHECK_HI(extra_cfg, single_tile_decoding, 1);
+ RANGE_CHECK_HI(extra_cfg, enable_rate_guide_deltaq, 1);
RANGE_CHECK_HI(extra_cfg, row_mt, 1);
RANGE_CHECK_HI(extra_cfg, fp_mt, 1);
@@ -761,6 +773,10 @@
ERROR("Current pass is larger than total number of passes.");
}
+ if (cfg->g_profile == (unsigned int)PROFILE_1 && cfg->monochrome) {
+ ERROR("Monochrome is not supported in profile 1");
+ }
+
if (cfg->g_profile <= (unsigned int)PROFILE_1 &&
cfg->g_bit_depth > AOM_BITS_10) {
ERROR("Codec bit-depth 12 not supported in profile < 2");
@@ -814,7 +830,7 @@
}
#endif
- RANGE_CHECK(extra_cfg, tuning, AOM_TUNE_PSNR, AOM_TUNE_BUTTERAUGLI);
+ RANGE_CHECK(extra_cfg, tuning, AOM_TUNE_PSNR, AOM_TUNE_VMAF_SALIENCY_MAP);
RANGE_CHECK(extra_cfg, dist_metric, AOM_DIST_METRIC_PSNR,
AOM_DIST_METRIC_QM_PSNR);
@@ -988,9 +1004,9 @@
extra_cfg->reduced_tx_type_set = cfg->reduced_tx_type_set;
}
-static aom_codec_err_t set_encoder_config(AV1EncoderConfig *oxcf,
- const aom_codec_enc_cfg_t *cfg,
- struct av1_extracfg *extra_cfg) {
+static void set_encoder_config(AV1EncoderConfig *oxcf,
+ const aom_codec_enc_cfg_t *cfg,
+ struct av1_extracfg *extra_cfg) {
if (cfg->encoder_cfg.init_by_cfg_file) {
update_default_encoder_config(&cfg->encoder_cfg, extra_cfg);
}
@@ -1082,16 +1098,6 @@
dec_model_cfg->decoder_model_info_present_flag = 0;
dec_model_cfg->display_model_info_present_flag = 1;
} else if (extra_cfg->timing_info_type == AOM_TIMING_DEC_MODEL) {
- // if( extra_cfg->arnr_strength > 0 )
- // {
- // printf("Only --arnr-strength=0 can currently be used with
- // --timing-info=model."); return AOM_CODEC_INVALID_PARAM;
- // }
- // if( extra_cfg->enable_superres)
- // {
- // printf("Only --superres-mode=0 can currently be used with
- // --timing-info=model."); return AOM_CODEC_INVALID_PARAM;
- // }
dec_model_cfg->num_units_in_decoding_tick = cfg->g_timebase.num;
dec_model_cfg->timing_info.equal_picture_interval = 0;
dec_model_cfg->decoder_model_info_present_flag = 1;
@@ -1156,7 +1162,19 @@
tool_cfg->enable_interintra_comp = extra_cfg->enable_interintra_comp;
tool_cfg->ref_frame_mvs_present =
extra_cfg->enable_ref_frame_mvs & extra_cfg->enable_order_hint;
- tool_cfg->enable_global_motion = extra_cfg->enable_global_motion;
+
+ // Explicitly disable global motion in a few cases:
+ // * For realtime mode, we never search global motion, and disabling
+ // it here prevents later code from allocating buffers we don't need
+ // * For large scale tile mode, some of the intended use cases expect
+ // all frame headers to be identical. This breaks if global motion is
+ // used, since global motion data is stored in the frame header.
+  //   e.g., see test/lightfield_test.sh, which checks that all frame headers
+ // are the same.
+ tool_cfg->enable_global_motion = extra_cfg->enable_global_motion &&
+ cfg->g_usage != AOM_USAGE_REALTIME &&
+ !cfg->large_scale_tile;
+
tool_cfg->error_resilient_mode =
cfg->g_error_resilient | extra_cfg->error_resilient_mode;
tool_cfg->frame_parallel_decoding_mode =
@@ -1452,13 +1470,14 @@
oxcf->partition_info_path = extra_cfg->partition_info_path;
+ oxcf->enable_rate_guide_deltaq = extra_cfg->enable_rate_guide_deltaq;
+ oxcf->rate_distribution_info = extra_cfg->rate_distribution_info;
+
oxcf->strict_level_conformance = extra_cfg->strict_level_conformance;
oxcf->kf_max_pyr_height = extra_cfg->kf_max_pyr_height;
oxcf->sb_qp_sweep = extra_cfg->sb_qp_sweep;
-
- return AOM_CODEC_OK;
}
AV1EncoderConfig av1_get_encoder_config(const aom_codec_enc_cfg_t *cfg) {
@@ -1557,7 +1576,7 @@
}
static aom_codec_err_t update_extra_cfg(aom_codec_alg_priv_t *ctx,
- struct av1_extracfg *extra_cfg) {
+ const struct av1_extracfg *extra_cfg) {
const aom_codec_err_t res = validate_config(ctx, &ctx->cfg, extra_cfg);
if (res == AOM_CODEC_OK) {
ctx->extra_cfg = *extra_cfg;
@@ -1620,22 +1639,28 @@
static aom_codec_err_t ctrl_set_row_mt(aom_codec_alg_priv_t *ctx,
va_list args) {
+ unsigned int row_mt = CAST(AV1E_SET_ROW_MT, args);
+ if (row_mt == ctx->extra_cfg.row_mt) return AOM_CODEC_OK;
struct av1_extracfg extra_cfg = ctx->extra_cfg;
- extra_cfg.row_mt = CAST(AV1E_SET_ROW_MT, args);
+ extra_cfg.row_mt = row_mt;
return update_extra_cfg(ctx, &extra_cfg);
}
static aom_codec_err_t ctrl_set_tile_columns(aom_codec_alg_priv_t *ctx,
va_list args) {
+ unsigned int tile_columns = CAST(AV1E_SET_TILE_COLUMNS, args);
+ if (tile_columns == ctx->extra_cfg.tile_columns) return AOM_CODEC_OK;
struct av1_extracfg extra_cfg = ctx->extra_cfg;
- extra_cfg.tile_columns = CAST(AV1E_SET_TILE_COLUMNS, args);
+ extra_cfg.tile_columns = tile_columns;
return update_extra_cfg(ctx, &extra_cfg);
}
static aom_codec_err_t ctrl_set_tile_rows(aom_codec_alg_priv_t *ctx,
va_list args) {
+ unsigned int tile_rows = CAST(AV1E_SET_TILE_ROWS, args);
+ if (tile_rows == ctx->extra_cfg.tile_rows) return AOM_CODEC_OK;
struct av1_extracfg extra_cfg = ctx->extra_cfg;
- extra_cfg.tile_rows = CAST(AV1E_SET_TILE_ROWS, args);
+ extra_cfg.tile_rows = tile_rows;
return update_extra_cfg(ctx, &extra_cfg);
}
@@ -2139,6 +2164,9 @@
va_list args) {
struct av1_extracfg extra_cfg = ctx->extra_cfg;
extra_cfg.aq_mode = CAST(AV1E_SET_AQ_MODE, args);
+
+ // Skip AQ mode if using fixed QP for current frame.
+ if (ctx->ppi->cpi->rc.use_external_qp_one_pass) extra_cfg.aq_mode = 0;
return update_extra_cfg(ctx, &extra_cfg);
}
@@ -2248,6 +2276,25 @@
return update_extra_cfg(ctx, &extra_cfg);
}
+static aom_codec_err_t ctrl_enable_rate_guide_deltaq(aom_codec_alg_priv_t *ctx,
+ va_list args) {
+ struct av1_extracfg extra_cfg = ctx->extra_cfg;
+ extra_cfg.enable_rate_guide_deltaq =
+ CAST(AV1E_ENABLE_RATE_GUIDE_DELTAQ, args);
+ return update_extra_cfg(ctx, &extra_cfg);
+}
+
+static aom_codec_err_t ctrl_set_rate_distribution_info(
+ aom_codec_alg_priv_t *ctx, va_list args) {
+ struct av1_extracfg extra_cfg = ctx->extra_cfg;
+ const char *str = CAST(AV1E_SET_RATE_DISTRIBUTION_INFO, args);
+ const aom_codec_err_t ret = allocate_and_set_string(
+ str, default_extra_cfg.rate_distribution_info,
+ &extra_cfg.rate_distribution_info, ctx->ppi->error.detail);
+ if (ret != AOM_CODEC_OK) return ret;
+ return update_extra_cfg(ctx, &extra_cfg);
+}
+
static aom_codec_err_t ctrl_set_film_grain_test_vector(
aom_codec_alg_priv_t *ctx, va_list args) {
struct av1_extracfg extra_cfg = ctx->extra_cfg;
@@ -2411,10 +2458,15 @@
const int val = CAST(AV1E_SET_TARGET_SEQ_LEVEL_IDX, args);
const int level = val % 100;
const int operating_point_idx = val / 100;
- if (operating_point_idx >= 0 &&
- operating_point_idx < MAX_NUM_OPERATING_POINTS) {
- extra_cfg.target_seq_level_idx[operating_point_idx] = (AV1_LEVEL)level;
+ if (operating_point_idx < 0 ||
+ operating_point_idx >= MAX_NUM_OPERATING_POINTS) {
+ char *const err_string = ctx->ppi->error.detail;
+ snprintf(err_string, ARG_ERR_MSG_MAX_LEN,
+ "Invalid operating point index: %d", operating_point_idx);
+ ctx->base.err_detail = err_string;
+ return AOM_CODEC_INVALID_PARAM;
}
+ extra_cfg.target_seq_level_idx[operating_point_idx] = (AV1_LEVEL)level;
return update_extra_cfg(ctx, &extra_cfg);
}
@@ -2484,6 +2536,39 @@
return AOM_CODEC_OK;
}
+static aom_codec_err_t ctrl_set_quantizer_one_pass(aom_codec_alg_priv_t *ctx,
+ va_list args) {
+ const int qp = CAST(AV1E_SET_QUANTIZER_ONE_PASS, args);
+
+ if (qp < 0 || qp > 63) return AOM_CODEC_INVALID_PARAM;
+
+ aom_codec_enc_cfg_t *cfg = &ctx->cfg;
+ struct av1_extracfg extra_cfg = ctx->extra_cfg;
+ cfg->rc_min_quantizer = cfg->rc_max_quantizer = qp;
+ extra_cfg.aq_mode = 0;
+ ctx->ppi->cpi->rc.use_external_qp_one_pass = 1;
+
+ return update_extra_cfg(ctx, &extra_cfg);
+}
+
+static aom_codec_err_t ctrl_set_bitrate_one_pass_cbr(aom_codec_alg_priv_t *ctx,
+ va_list args) {
+ AV1_PRIMARY *const ppi = ctx->ppi;
+ AV1_COMP *const cpi = ppi->cpi;
+ AV1EncoderConfig *oxcf = &cpi->oxcf;
+ if (!is_one_pass_rt_params(cpi) || oxcf->rc_cfg.mode != AOM_CBR ||
+ cpi->ppi->use_svc || ppi->num_fp_contexts != 1 || ppi->cpi_lap != NULL) {
+ return AOM_CODEC_INVALID_PARAM;
+ }
+ const int new_bitrate = CAST(AV1E_SET_BITRATE_ONE_PASS_CBR, args);
+ ctx->cfg.rc_target_bitrate = new_bitrate;
+ oxcf->rc_cfg.target_bandwidth = new_bitrate * 1000;
+ set_primary_rc_buffer_sizes(oxcf, ppi);
+ av1_new_framerate(cpi, cpi->framerate);
+ check_reset_rc_flag(cpi);
+ return AOM_CODEC_OK;
+}
+
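A hedged usage sketch for the two one-pass controls defined above, assuming
codec is an aom_codec_ctx_t already initialized for one-pass realtime CBR
encoding:

    // Pin the frame QP (0..63); per the handler above, this also forces
    // aq-mode to 0 via use_external_qp_one_pass.
    if (aom_codec_control(&codec, AV1E_SET_QUANTIZER_ONE_PASS, 40) !=
        AOM_CODEC_OK) {
      // out-of-range QP
    }
    // Retarget the CBR bitrate (in kbps) mid-stream without reinitializing;
    // rejected unless the encoder is one-pass RT CBR, non-SVC, single context.
    aom_codec_control(&codec, AV1E_SET_BITRATE_ONE_PASS_CBR, 600);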
#if !CONFIG_REALTIME_ONLY
aom_codec_err_t av1_create_stats_buffer(FIRSTPASS_STATS **frame_stats_buffer,
STATS_BUFFER_CTX *stats_buf_context,
@@ -2517,19 +2602,33 @@
COMPRESSOR_STAGE stage,
int lap_lag_in_frames) {
aom_codec_err_t res = AOM_CODEC_OK;
+ BufferPool *buffer_pool = *p_buffer_pool;
- if (*p_buffer_pool == NULL) {
- *p_buffer_pool = (BufferPool *)aom_calloc(1, sizeof(BufferPool));
- if (*p_buffer_pool == NULL) return AOM_CODEC_MEM_ERROR;
-
+ if (buffer_pool == NULL) {
+ buffer_pool = (BufferPool *)aom_calloc(1, sizeof(BufferPool));
+ if (buffer_pool == NULL) return AOM_CODEC_MEM_ERROR;
+ buffer_pool->num_frame_bufs =
+ (oxcf->mode == ALLINTRA) ? FRAME_BUFFERS_ALLINTRA : FRAME_BUFFERS;
+ buffer_pool->frame_bufs = (RefCntBuffer *)aom_calloc(
+ buffer_pool->num_frame_bufs, sizeof(*buffer_pool->frame_bufs));
+ if (buffer_pool->frame_bufs == NULL) {
+ buffer_pool->num_frame_bufs = 0;
+ aom_free(buffer_pool);
+ return AOM_CODEC_MEM_ERROR;
+ }
#if CONFIG_MULTITHREAD
- if (pthread_mutex_init(&((*p_buffer_pool)->pool_mutex), NULL)) {
+ if (pthread_mutex_init(&buffer_pool->pool_mutex, NULL)) {
+ aom_free(buffer_pool->frame_bufs);
+ buffer_pool->frame_bufs = NULL;
+ buffer_pool->num_frame_bufs = 0;
+ aom_free(buffer_pool);
return AOM_CODEC_MEM_ERROR;
}
#endif
+ *p_buffer_pool = buffer_pool;
}
- *p_cpi = av1_create_compressor(ppi, oxcf, *p_buffer_pool, stage,
- lap_lag_in_frames);
+ *p_cpi =
+ av1_create_compressor(ppi, oxcf, buffer_pool, stage, lap_lag_in_frames);
if (*p_cpi == NULL) res = AOM_CODEC_MEM_ERROR;
return res;
@@ -2705,6 +2804,8 @@
&extra_cfg->second_pass_log);
check_and_free_string(default_extra_cfg.partition_info_path,
&extra_cfg->partition_info_path);
+ check_and_free_string(default_extra_cfg.rate_distribution_info,
+ &extra_cfg->rate_distribution_info);
check_and_free_string(default_extra_cfg.film_grain_table_filename,
&extra_cfg->film_grain_table_filename);
}
@@ -2894,10 +2995,6 @@
if (res == AOM_CODEC_OK) {
AV1_COMP *cpi = ppi->cpi;
- const int num_layers =
- cpi->svc.number_spatial_layers * cpi->svc.number_temporal_layers;
- if (!av1_alloc_layer_context(cpi, num_layers)) return AOM_CODEC_MEM_ERROR;
-
// Set up internal flags
if (ctx->base.init_flags & AOM_CODEC_USE_PSNR) ppi->b_calculate_psnr = 1;
@@ -2944,7 +3041,7 @@
subsampling_x, subsampling_y, use_highbitdepth, lag_in_frames,
src_border_in_pixels, cpi->common.features.byte_alignment,
ctx->num_lap_buffers, (cpi->oxcf.kf_cfg.key_freq_max == 0),
- cpi->oxcf.tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels);
}
if (!ppi->lookahead)
aom_internal_error(&ppi->error, AOM_CODEC_MEM_ERROR,
@@ -3015,6 +3112,19 @@
}
#endif // CONFIG_MULTITHREAD
}
+
+  // Re-allocate thread data if the number of workers for the encoder
+  // multi-threading stage exceeds prev_num_enc_workers.
+ const int num_enc_workers =
+ av1_get_num_mod_workers_for_alloc(&ppi->p_mt_info, MOD_ENC);
+ if (ppi->p_mt_info.prev_num_enc_workers < num_enc_workers &&
+ num_enc_workers <= ppi->p_mt_info.num_workers) {
+ free_thread_data(ppi);
+ for (int j = 0; j < ppi->num_fp_contexts; j++)
+ aom_free(ppi->parallel_cpi[j]->td.tctx);
+ av1_init_tile_thread_data(ppi, cpi->oxcf.pass == AOM_RC_FIRST_PASS);
+ }
+
for (int i = 0; i < ppi->num_fp_contexts; i++) {
av1_init_frame_mt(ppi, ppi->parallel_cpi[i]);
}
@@ -3414,6 +3524,7 @@
AV1_PRIMARY *const ppi = ctx->ppi;
AV1_COMP *const cpi = ppi->cpi;
AV1_COMMON *const cm = &cpi->common;
+ AV1EncoderConfig *oxcf = &cpi->oxcf;
aom_svc_params_t *const params = va_arg(args, aom_svc_params_t *);
int64_t target_bandwidth = 0;
ppi->number_spatial_layers = params->number_spatial_layers;
@@ -3425,6 +3536,13 @@
ctx->ppi->use_svc = 1;
const int num_layers =
ppi->number_spatial_layers * ppi->number_temporal_layers;
+ for (int layer = 0; layer < num_layers; ++layer) {
+ if (params->max_quantizers[layer] > 63 ||
+ params->min_quantizers[layer] < 0 ||
+ params->min_quantizers[layer] > params->max_quantizers[layer]) {
+ return AOM_CODEC_INVALID_PARAM;
+ }
+ }
if (!av1_alloc_layer_context(cpi, num_layers)) return AOM_CODEC_MEM_ERROR;
for (sl = 0; sl < ppi->number_spatial_layers; ++sl) {
@@ -3450,7 +3568,10 @@
}
av1_init_layer_context(cpi);
}
+ oxcf->rc_cfg.target_bandwidth = target_bandwidth;
+ set_primary_rc_buffer_sizes(oxcf, cpi->ppi);
av1_update_layer_context_change_config(cpi, target_bandwidth);
+ check_reset_rc_flag(cpi);
}
av1_check_fpmt_config(ctx->ppi, &ctx->ppi->cpi->oxcf);
return AOM_CODEC_OK;
@@ -3660,6 +3781,17 @@
argv, err_string)) {
err = allocate_and_set_string(value, default_extra_cfg.partition_info_path,
&extra_cfg.partition_info_path, err_string);
+ } else if (arg_match_helper(&arg,
+ &g_av1_codec_arg_defs.enable_rate_guide_deltaq,
+ argv, err_string)) {
+ extra_cfg.enable_rate_guide_deltaq =
+ arg_parse_uint_helper(&arg, err_string);
+ } else if (arg_match_helper(&arg,
+ &g_av1_codec_arg_defs.rate_distribution_info,
+ argv, err_string)) {
+ err =
+ allocate_and_set_string(value, default_extra_cfg.rate_distribution_info,
+ &extra_cfg.rate_distribution_info, err_string);
} else if (arg_match_helper(&arg, &g_av1_codec_arg_defs.dist_metric, argv,
err_string)) {
extra_cfg.dist_metric = arg_parse_enum_helper(&arg, err_string);
@@ -3951,8 +4083,12 @@
const int val = arg_parse_int_helper(&arg, err_string);
const int level = val % 100;
const int operating_point_idx = val / 100;
- if (operating_point_idx >= 0 &&
- operating_point_idx < MAX_NUM_OPERATING_POINTS) {
+ if (operating_point_idx < 0 ||
+ operating_point_idx >= MAX_NUM_OPERATING_POINTS) {
+ snprintf(err_string, ARG_ERR_MSG_MAX_LEN,
+ "Invalid operating point index: %d", operating_point_idx);
+ err = AOM_CODEC_INVALID_PARAM;
+ } else {
extra_cfg.target_seq_level_idx[operating_point_idx] = (AV1_LEVEL)level;
}
} else if (arg_match_helper(&arg,
@@ -4050,6 +4186,16 @@
return AOM_CODEC_OK;
}
+static aom_codec_err_t ctrl_get_luma_cdef_strength(aom_codec_alg_priv_t *ctx,
+ va_list args) {
+ int *arg = va_arg(args, int *);
+ AV1_COMMON const *cm = &ctx->ppi->cpi->common;
+ if (arg == NULL) return AOM_CODEC_INVALID_PARAM;
+ memcpy(arg, cm->cdef_info.cdef_strengths, CDEF_MAX_STRENGTHS * sizeof(*arg));
+
+ return AOM_CODEC_OK;
+}
+
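A matching sketch for the new getter (codec as above; the array must hold
CDEF_MAX_STRENGTHS entries, 16 in AV1):

    int cdef_strengths[16];  // CDEF_MAX_STRENGTHS entries
    if (aom_codec_control(&codec, AV1E_GET_LUMA_CDEF_STRENGTH,
                          cdef_strengths) == AOM_CODEC_OK) {
      // cdef_strengths[i] holds the luma strength for CDEF index i
    }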
static aom_codec_ctrl_fn_map_t encoder_ctrl_maps[] = {
{ AV1_COPY_REFERENCE, ctrl_copy_reference },
{ AOME_USE_REFERENCE, ctrl_use_reference },
@@ -4165,6 +4311,8 @@
{ AV1E_SET_SINGLE_TILE_DECODING, ctrl_set_single_tile_decoding },
{ AV1E_SET_VMAF_MODEL_PATH, ctrl_set_vmaf_model_path },
{ AV1E_SET_PARTITION_INFO_PATH, ctrl_set_partition_info_path },
+ { AV1E_ENABLE_RATE_GUIDE_DELTAQ, ctrl_enable_rate_guide_deltaq },
+ { AV1E_SET_RATE_DISTRIBUTION_INFO, ctrl_set_rate_distribution_info },
{ AV1E_SET_FILM_GRAIN_TEST_VECTOR, ctrl_set_film_grain_test_vector },
{ AV1E_SET_FILM_GRAIN_TABLE, ctrl_set_film_grain_table },
{ AV1E_SET_DENOISE_NOISE_LEVEL, ctrl_set_denoise_noise_level },
@@ -4190,6 +4338,8 @@
{ AV1E_SET_SKIP_POSTPROC_FILTERING, ctrl_set_skip_postproc_filtering },
{ AV1E_SET_AUTO_INTRA_TOOLS_OFF, ctrl_set_auto_intra_tools_off },
{ AV1E_SET_RTC_EXTERNAL_RC, ctrl_set_rtc_external_rc },
+ { AV1E_SET_QUANTIZER_ONE_PASS, ctrl_set_quantizer_one_pass },
+ { AV1E_SET_BITRATE_ONE_PASS_CBR, ctrl_set_bitrate_one_pass_cbr },
// Getters
{ AOME_GET_LAST_QUANTIZER, ctrl_get_quantizer },
@@ -4205,6 +4355,7 @@
{ AV1E_GET_BASELINE_GF_INTERVAL, ctrl_get_baseline_gf_interval },
{ AV1E_GET_TARGET_SEQ_LEVEL_IDX, ctrl_get_target_seq_level_idx },
{ AV1E_GET_NUM_OPERATING_POINTS, ctrl_get_num_operating_points },
+ { AV1E_GET_LUMA_CDEF_STRENGTH, ctrl_get_luma_cdef_strength },
CTRL_MAP_END,
};
diff --git a/av1/av1_dx_iface.c b/av1/av1_dx_iface.c
index 809268f..a1e7558 100644
--- a/av1/av1_dx_iface.c
+++ b/av1/av1_dx_iface.c
@@ -121,16 +121,13 @@
AV1Decoder *const pbi = frame_worker_data->pbi;
aom_free(pbi->common.tpl_mvs);
pbi->common.tpl_mvs = NULL;
- av1_remove_common(&frame_worker_data->pbi->common);
+ av1_remove_common(&pbi->common);
av1_free_cdef_buffers(&pbi->common, &pbi->cdef_worker, &pbi->cdef_sync);
av1_free_cdef_sync(&pbi->cdef_sync);
av1_free_restoration_buffers(&pbi->common);
av1_decoder_remove(pbi);
}
aom_free(frame_worker_data);
-#if CONFIG_MULTITHREAD
- pthread_mutex_destroy(&ctx->buffer_pool->pool_mutex);
-#endif
}
if (ctx->buffer_pool) {
@@ -140,6 +137,9 @@
}
av1_free_ref_frame_buffers(ctx->buffer_pool);
av1_free_internal_frame_buffers(&ctx->buffer_pool->int_frame_buffers);
+#if CONFIG_MULTITHREAD
+ pthread_mutex_destroy(&ctx->buffer_pool->pool_mutex);
+#endif
}
aom_free(ctx->frame_worker);
@@ -428,9 +428,23 @@
ctx->buffer_pool = (BufferPool *)aom_calloc(1, sizeof(BufferPool));
if (ctx->buffer_pool == NULL) return AOM_CODEC_MEM_ERROR;
+ ctx->buffer_pool->num_frame_bufs = FRAME_BUFFERS;
+ ctx->buffer_pool->frame_bufs = (RefCntBuffer *)aom_calloc(
+ ctx->buffer_pool->num_frame_bufs, sizeof(*ctx->buffer_pool->frame_bufs));
+ if (ctx->buffer_pool->frame_bufs == NULL) {
+ ctx->buffer_pool->num_frame_bufs = 0;
+ aom_free(ctx->buffer_pool);
+ ctx->buffer_pool = NULL;
+ return AOM_CODEC_MEM_ERROR;
+ }
#if CONFIG_MULTITHREAD
if (pthread_mutex_init(&ctx->buffer_pool->pool_mutex, NULL)) {
+ aom_free(ctx->buffer_pool->frame_bufs);
+ ctx->buffer_pool->frame_bufs = NULL;
+ ctx->buffer_pool->num_frame_bufs = 0;
+ aom_free(ctx->buffer_pool);
+ ctx->buffer_pool = NULL;
set_error_detail(ctx, "Failed to allocate buffer pool mutex");
return AOM_CODEC_MEM_ERROR;
}
@@ -443,18 +457,24 @@
}
AVxWorker *const worker = ctx->frame_worker;
- FrameWorkerData *frame_worker_data = NULL;
winterface->init(worker);
worker->thread_name = "aom frameworker";
worker->data1 = aom_memalign(32, sizeof(FrameWorkerData));
if (worker->data1 == NULL) {
+ winterface->end(worker);
+ aom_free(worker);
+ ctx->frame_worker = NULL;
set_error_detail(ctx, "Failed to allocate frame_worker_data");
return AOM_CODEC_MEM_ERROR;
}
- frame_worker_data = (FrameWorkerData *)worker->data1;
+ FrameWorkerData *frame_worker_data = (FrameWorkerData *)worker->data1;
frame_worker_data->pbi = av1_decoder_create(ctx->buffer_pool);
if (frame_worker_data->pbi == NULL) {
- set_error_detail(ctx, "Failed to allocate frame_worker_data");
+ winterface->end(worker);
+ aom_free(frame_worker_data);
+ aom_free(worker);
+ ctx->frame_worker = NULL;
+ set_error_detail(ctx, "Failed to allocate frame_worker_data->pbi");
return AOM_CODEC_MEM_ERROR;
}
frame_worker_data->frame_context_ready = 0;
diff --git a/av1/common/alloccommon.c b/av1/common/alloccommon.c
index 677078d..6e95f70 100644
--- a/av1/common/alloccommon.c
+++ b/av1/common/alloccommon.c
@@ -36,7 +36,7 @@
void av1_free_ref_frame_buffers(BufferPool *pool) {
int i;
- for (i = 0; i < FRAME_BUFFERS; ++i) {
+ for (i = 0; i < pool->num_frame_bufs; ++i) {
if (pool->frame_bufs[i].ref_count > 0 &&
pool->frame_bufs[i].raw_frame_buffer.data != NULL) {
pool->release_fb_cb(pool->cb_priv, &pool->frame_bufs[i].raw_frame_buffer);
@@ -51,6 +51,9 @@
pool->frame_bufs[i].seg_map = NULL;
aom_free_frame_buffer(&pool->frame_bufs[i].buf);
}
+ aom_free(pool->frame_bufs);
+ pool->frame_bufs = NULL;
+ pool->num_frame_bufs = 0;
}
static INLINE void free_cdef_linebuf_conditional(
@@ -286,12 +289,12 @@
}
// Assumes cm->rst_info[p].restoration_unit_size is already initialized
-void av1_alloc_restoration_buffers(AV1_COMMON *cm) {
+void av1_alloc_restoration_buffers(AV1_COMMON *cm, bool is_sgr_enabled) {
const int num_planes = av1_num_planes(cm);
for (int p = 0; p < num_planes; ++p)
av1_alloc_restoration_struct(cm, &cm->rst_info[p], p > 0);
- if (cm->rst_tmpbuf == NULL) {
+ if (cm->rst_tmpbuf == NULL && is_sgr_enabled) {
CHECK_MEM_ERROR(cm, cm->rst_tmpbuf,
(int32_t *)aom_memalign(16, RESTORATION_TMPBUF_SIZE));
}
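Related context for the two alloccommon.c hunks above: frame_bufs is no longer
a fixed FRAME_BUFFERS-sized array embedded in BufferPool but a heap allocation
sized at pool creation (FRAME_BUFFERS_ALLINTRA for all-intra encodes; see the
av1_cx_iface.c and av1_dx_iface.c hunks), so the free path now also releases
the array and resets num_frame_bufs. Likewise, rst_tmpbuf is only allocated
when self-guided restoration is actually in use.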
diff --git a/av1/common/alloccommon.h b/av1/common/alloccommon.h
index fc4a8ba..d31b4c5 100644
--- a/av1/common/alloccommon.h
+++ b/av1/common/alloccommon.h
@@ -14,6 +14,8 @@
#define INVALID_IDX -1 // Invalid buffer index.
+#include <stdbool.h>
+
#include "config/aom_config.h"
#include "av1/common/enums.h"
@@ -48,7 +50,7 @@
void av1_free_cdef_buffers(struct AV1Common *const cm,
struct AV1CdefWorker **cdef_worker,
struct AV1CdefSyncData *cdef_sync);
-void av1_alloc_restoration_buffers(struct AV1Common *cm);
+void av1_alloc_restoration_buffers(struct AV1Common *cm, bool is_sgr_enabled);
void av1_free_restoration_buffers(struct AV1Common *cm);
int av1_alloc_state_buffers(struct AV1Common *cm, int width, int height);
diff --git a/av1/common/arm/av1_inv_txfm_neon.c b/av1/common/arm/av1_inv_txfm_neon.c
index 1628cbf..8afcd1f 100644
--- a/av1/common/arm/av1_inv_txfm_neon.c
+++ b/av1/common/arm/av1_inv_txfm_neon.c
@@ -467,12 +467,13 @@
}
static INLINE void load_buffer_32bit_to_16bit_neon(const int32_t *input,
+ int stride,
int16x8_t *const a,
int out_size) {
- for (int i = 0; i < 8; ++i) {
+ for (int i = 0; i < out_size; ++i) {
a[i] = vcombine_s16(vmovn_s32(vld1q_s32(input)),
vmovn_s32(vld1q_s32(input + 4)));
- input += out_size;
+ input += stride;
}
}
@@ -3590,28 +3591,22 @@
const int buf_size_w_div8 = txfm_size_col >> 3;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
- const int32_t *input_1;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
+ const int input_stride = txfm_size_row;
int temp_b = 0;
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
- input_1 = input;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- load_buffer_32bit_to_16bit_neon(input_1, &a[k], txfm_size_col);
- transpose_s16_8x8q(&a[k], &a[k]);
- input_1 += 8;
- }
- input += (txfm_size_col * 8);
+ int16x8_t *cur_a = &a[i * txfm_size_col];
+ load_buffer_32bit_to_16bit_neon(input, input_stride, cur_a,
+ buf_size_nonzero_w);
+ input += 8;
if (abs(rect_type) == 1) {
- int y = i * txfm_size_col;
- round_shift_for_rect(&a[y], &a[y], txfm_size_col);
+ round_shift_for_rect(cur_a, cur_a, buf_size_nonzero_w);
}
- identity_txfm_round_neon(&a[i * txfm_size_col], &a[i * txfm_size_col],
- txw_idx, txfm_size_col, -shift[0]);
+ identity_txfm_round_neon(cur_a, cur_a, txw_idx, buf_size_nonzero_w,
+ -shift[0]);
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- transpose_s16_8x8q(&a[k], &b[temp_b + txfm_size_row * j]);
+ transpose_s16_8x8q(&cur_a[j * 8], &b[temp_b + txfm_size_row * j]);
}
temp_b += 8;
}
@@ -3646,9 +3641,9 @@
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int buf_size_w_div8 = txfm_size_col >> 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
+ const int input_stride = txfm_size_row;
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
- const int32_t *input_1;
int temp_b = 0;
const transform_neon row_txfm =
lowbd_txfm_all_1d_zeros_w_arr[txw_idx][hitx_1d_tab[tx_type]][fun_idx_x];
@@ -3658,33 +3653,26 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
- input_1 = input;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- load_buffer_32bit_to_16bit_neon(input_1, &a[k], txfm_size_col);
- transpose_s16_8x8q(&a[k], &a[k]);
- input_1 += 8;
- }
- input += (txfm_size_col * 8);
+ int16x8_t *cur_a = &a[i * txfm_size_col];
+ load_buffer_32bit_to_16bit_neon(input, input_stride, cur_a,
+ buf_size_nonzero_w);
+ input += 8;
if (abs(rect_type) == 1) {
- int y = i * txfm_size_col;
- round_shift_for_rect(&a[y], &a[y], txfm_size_col);
+ round_shift_for_rect(cur_a, cur_a, buf_size_nonzero_w);
}
- row_txfm(&a[i * txfm_size_col], &a[i * txfm_size_col], INV_COS_BIT);
- av1_round_shift_array_16_neon(&a[i * txfm_size_col], txfm_size_col,
- -shift[0]);
+ row_txfm(cur_a, cur_a, INV_COS_BIT);
+ av1_round_shift_array_16_neon(cur_a, txfm_size_col, -shift[0]);
if (lr_flip == 1) {
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- flip_buf_ud_neon(&a[k], 8);
+ flip_buf_ud_neon(&cur_a[j * 8], 8);
transpose_s16_8x8q(
- &a[k], &b[temp_b + txfm_size_row * (buf_size_w_div8 - 1 - j)]);
+ &cur_a[j * 8],
+ &b[temp_b + txfm_size_row * (buf_size_w_div8 - 1 - j)]);
}
temp_b += 8;
} else {
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- transpose_s16_8x8q(&a[k], &b[temp_b + txfm_size_row * j]);
+ transpose_s16_8x8q(&cur_a[j * 8], &b[temp_b + txfm_size_row * j]);
}
temp_b += 8;
}
@@ -3720,9 +3708,9 @@
const int buf_size_w_div8 = txfm_size_col >> 3;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
+ const int input_stride = txfm_size_row;
const int fun_idx_y = lowbd_txfm_all_1d_zeros_idx[eoby];
- const int32_t *input_1;
int temp_b = 0;
const transform_neon col_txfm =
lowbd_txfm_all_1d_zeros_w_arr[txh_idx][vitx_1d_tab[tx_type]][fun_idx_y];
@@ -3732,23 +3720,17 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
- input_1 = input;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- load_buffer_32bit_to_16bit_neon(input_1, &a[k], txfm_size_col);
- transpose_s16_8x8q(&a[k], &a[k]);
- input_1 += 8;
- }
- input += (txfm_size_col * 8);
+ int16x8_t *cur_a = &a[i * txfm_size_col];
+ load_buffer_32bit_to_16bit_neon(input, input_stride, cur_a,
+ buf_size_nonzero_w);
+ input += 8;
if (abs(rect_type) == 1) {
- int y = i * txfm_size_col;
- round_shift_for_rect(&a[y], &a[y], txfm_size_col);
+ round_shift_for_rect(cur_a, cur_a, buf_size_nonzero_w);
}
- identity_txfm_round_neon(&a[i * txfm_size_col], &a[i * txfm_size_col],
- txw_idx, txfm_size_col, -shift[0]);
+ identity_txfm_round_neon(cur_a, cur_a, txw_idx, buf_size_nonzero_w,
+ -shift[0]);
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- transpose_s16_8x8q(&a[k], &b[temp_b + txfm_size_row * j]);
+ transpose_s16_8x8q(&cur_a[j * 8], &b[temp_b + txfm_size_row * j]);
}
temp_b += 8;
}
@@ -3796,9 +3778,11 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < txfm_size_row; i++) {
- row_txfm(input, buf_ptr, INV_COS_BIT, stage_range);
+ for (int c = 0; c < txfm_size_col; ++c)
+ temp_in[c] = input[c * txfm_size_row];
+ row_txfm(temp_in, buf_ptr, INV_COS_BIT, stage_range);
- input += txfm_size_col;
+ input++;
buf_ptr += txfm_size_col;
}
@@ -3858,11 +3842,12 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < txfm_size_row; i++) {
- for (int j = 0; j < txfm_size_col; j++)
- temp_in[j] = round_shift((int64_t)input[j] * NewInvSqrt2, NewSqrt2Bits);
+ for (int c = 0; c < txfm_size_col; c++)
+ temp_in[c] = round_shift((int64_t)input[c * txfm_size_row] * NewInvSqrt2,
+ NewSqrt2Bits);
row_txfm(temp_in, buf_ptr, INV_COS_BIT, stage_range);
- input += txfm_size_col;
+ input++;
buf_ptr += txfm_size_col;
}
@@ -3922,11 +3907,12 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < txfm_size_row; i++) {
- for (int j = 0; j < txfm_size_col; j++)
- temp_in[j] = round_shift((int64_t)input[j] * NewInvSqrt2, NewSqrt2Bits);
+ for (int c = 0; c < txfm_size_col; c++)
+ temp_in[c] = round_shift((int64_t)input[c * txfm_size_row] * NewInvSqrt2,
+ NewSqrt2Bits);
row_txfm(temp_in, buf_ptr, INV_COS_BIT, stage_range);
- input += txfm_size_col;
+ input++;
buf_ptr += txfm_size_col;
}
@@ -3986,9 +3972,11 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < txfm_size_row; i++) {
- row_txfm(input, buf_ptr, INV_COS_BIT, stage_range);
+ for (int c = 0; c < txfm_size_col; c++)
+ temp_in[c] = input[c * txfm_size_row];
+ row_txfm(temp_in, buf_ptr, INV_COS_BIT, stage_range);
av1_round_shift_array(buf_ptr, txfm_size_col, -shift[0]);
- input += txfm_size_col;
+ input++;
buf_ptr += txfm_size_col;
}
@@ -4048,9 +4036,11 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < txfm_size_row; i++) {
- row_txfm(input, buf_ptr, INV_COS_BIT, stage_range);
+ for (int c = 0; c < txfm_size_col; c++)
+ temp_in[c] = input[c * txfm_size_row];
+ row_txfm(temp_in, buf_ptr, INV_COS_BIT, stage_range);
av1_round_shift_array(buf_ptr, txfm_size_col, -shift[0]);
- input += txfm_size_col;
+ input++;
buf_ptr += txfm_size_col;
}
@@ -4097,11 +4087,10 @@
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int buf_size_w_div8 = txfm_size_col >> 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
const int fun_idx_y = lowbd_txfm_all_1d_zeros_idx[eoby];
- const int32_t *input_1;
int temp_b = 0;
const transform_neon row_txfm =
@@ -4115,33 +4104,26 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
- input_1 = input;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- load_buffer_32bit_to_16bit_neon(input_1, &a[k], input_stride);
- transpose_s16_8x8q(&a[k], &a[k]);
- input_1 += 8;
- }
- input += (input_stride * 8);
+ int16x8_t *cur_a = &a[i * txfm_size_col];
+ load_buffer_32bit_to_16bit_neon(input, input_stride, cur_a,
+ buf_size_nonzero_w);
+ input += 8;
if (abs(rect_type) == 1) {
- int y = i * txfm_size_col;
- round_shift_for_rect(&a[y], &a[y], input_stride);
+ round_shift_for_rect(cur_a, cur_a, buf_size_nonzero_w);
}
- row_txfm(&a[i * txfm_size_col], &a[i * txfm_size_col], INV_COS_BIT);
- av1_round_shift_array_16_neon(&a[i * txfm_size_col], txfm_size_col,
- -shift[0]);
+ row_txfm(cur_a, cur_a, INV_COS_BIT);
+ av1_round_shift_array_16_neon(cur_a, txfm_size_col, -shift[0]);
if (lr_flip == 1) {
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- flip_buf_ud_neon(&a[k], 8);
+ flip_buf_ud_neon(&cur_a[j * 8], 8);
transpose_s16_8x8q(
- &a[k], &b[temp_b + txfm_size_row * (buf_size_w_div8 - 1 - j)]);
+ &cur_a[j * 8],
+ &b[temp_b + txfm_size_row * (buf_size_w_div8 - 1 - j)]);
}
temp_b += 8;
} else {
for (int j = 0; j < buf_size_w_div8; ++j) {
- int k = j * 8 + i * txfm_size_col;
- transpose_s16_8x8q(&a[k], &b[temp_b + txfm_size_row * j]);
+ transpose_s16_8x8q(&cur_a[j * 8], &b[temp_b + txfm_size_row * j]);
}
temp_b += 8;
}
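The av1_inv_txfm_neon.c hunks above all follow one pattern: the 32-bit
coefficient buffer is now traversed with a stride of txfm_size_row (note the
matching input[c * txfm_size_row] / input++ changes), i.e. it is read in
transposed order. Each 8-row strip can therefore be fetched with a single
strided load_buffer_32bit_to_16bit_neon() call instead of the previous per-8x8
load plus transpose_s16_8x8q(), and the loader now honors out_size instead of
always reading 8 vectors.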
diff --git a/av1/common/arm/blend_a64_hmask_neon.c b/av1/common/arm/blend_a64_hmask_neon.c
index 89252ef..baad328 100644
--- a/av1/common/arm/blend_a64_hmask_neon.c
+++ b/av1/common/arm/blend_a64_hmask_neon.c
@@ -34,8 +34,6 @@
uint8x8_t tmp0, tmp1;
uint8x16_t res_q;
uint16x8_t res, res_low, res_high;
- uint32x2_t tmp0_32 = vdup_n_u32(0), tmp1_32 = vdup_n_u32(0);
- uint16x4_t tmp0_16 = vdup_n_u16(0), tmp1_16 = vdup_n_u16(0);
const uint8x8_t vdup_64 = vdup_n_u8((uint8_t)64);
if (w >= 16) {
@@ -91,10 +89,8 @@
__builtin_prefetch(src0 + 1 * src0_stride);
__builtin_prefetch(src1 + 0 * src1_stride);
__builtin_prefetch(src1 + 1 * src1_stride);
- load_unaligned_u8_4x2(src0, src0_stride, &tmp0_32);
- tmp0 = vreinterpret_u8_u32(tmp0_32);
- load_unaligned_u8_4x2(src1, src1_stride, &tmp1_32);
- tmp1 = vreinterpret_u8_u32(tmp1_32);
+ tmp0 = load_unaligned_u8_4x2(src0, src0_stride);
+ tmp1 = load_unaligned_u8_4x2(src1, src1_stride);
res = vmull_u8(m, tmp0);
res = vmlal_u8(res, max_minus_m, tmp1);
const uint8x8_t result = vrshrn_n_u16(res, AOM_BLEND_A64_ROUND_BITS);
@@ -113,10 +109,8 @@
__builtin_prefetch(src0 + 1 * src0_stride);
__builtin_prefetch(src1 + 0 * src1_stride);
__builtin_prefetch(src1 + 1 * src1_stride);
- load_unaligned_u8_2x2(src0, src0_stride, &tmp0_16);
- tmp0 = vreinterpret_u8_u16(tmp0_16);
- load_unaligned_u8_2x2(src1, src1_stride, &tmp1_16);
- tmp1 = vreinterpret_u8_u16(tmp1_16);
+ tmp0 = load_unaligned_u8_2x2(src0, src0_stride);
+ tmp1 = load_unaligned_u8_2x2(src1, src1_stride);
res = vmull_u8(m, tmp0);
res = vmlal_u8(res, max_minus_m, tmp1);
const uint8x8_t result = vrshrn_n_u16(res, AOM_BLEND_A64_ROUND_BITS);
diff --git a/av1/common/arm/blend_a64_vmask_neon.c b/av1/common/arm/blend_a64_vmask_neon.c
index 2132fbd..c316977 100644
--- a/av1/common/arm/blend_a64_vmask_neon.c
+++ b/av1/common/arm/blend_a64_vmask_neon.c
@@ -27,8 +27,6 @@
uint8x8_t tmp0, tmp1;
uint8x16_t tmp0_q, tmp1_q, res_q;
uint16x8_t res, res_low, res_high;
- uint32x2_t tmp0_32 = vdup_n_u32(0), tmp1_32 = vdup_n_u32(0);
- uint16x4_t tmp0_16 = vdup_n_u16(0), tmp1_16 = vdup_n_u16(0);
assert(IMPLIES(src0 == dst, src0_stride == dst_stride));
assert(IMPLIES(src1 == dst, src1_stride == dst_stride));
@@ -89,10 +87,8 @@
const uint16x4_t max_minus_m2 = vdup_n_u16(64 - (uint16_t)mask[i + 1]);
const uint8x8_t max_minus_m =
vmovn_u16(vcombine_u16(max_minus_m1, max_minus_m2));
- load_unaligned_u8_4x2(src0, src0_stride, &tmp0_32);
- tmp0 = vreinterpret_u8_u32(tmp0_32);
- load_unaligned_u8_4x2(src1, src1_stride, &tmp1_32);
- tmp1 = vreinterpret_u8_u32(tmp1_32);
+ tmp0 = load_unaligned_u8_4x2(src0, src0_stride);
+ tmp1 = load_unaligned_u8_4x2(src1, src1_stride);
res = vmull_u8(m, tmp0);
res = vmlal_u8(res, max_minus_m, tmp1);
const uint8x8_t result = vrshrn_n_u16(res, AOM_BLEND_A64_ROUND_BITS);
@@ -118,10 +114,8 @@
const uint16x4x2_t max_minus_m_trn = vtrn_u16(
vreinterpret_u16_u8(max_minus_m1), vreinterpret_u16_u8(max_minus_m2));
const uint8x8_t max_minus_m = vreinterpret_u8_u16(max_minus_m_trn.val[0]);
- load_unaligned_u8_2x2(src0, src0_stride, &tmp0_16);
- tmp0 = vreinterpret_u8_u16(tmp0_16);
- load_unaligned_u8_2x2(src1, src1_stride, &tmp1_16);
- tmp1 = vreinterpret_u8_u16(tmp1_16);
+ tmp0 = load_unaligned_u8_2x2(src0, src0_stride);
+ tmp1 = load_unaligned_u8_2x2(src1, src1_stride);
res = vmull_u8(m, tmp0);
res = vmlal_u8(res, max_minus_m, tmp1);
const uint8x8_t result = vrshrn_n_u16(res, AOM_BLEND_A64_ROUND_BITS);
diff --git a/av1/common/arm/cfl_neon.c b/av1/common/arm/cfl_neon.c
index 371be5f..0871b4f 100644
--- a/av1/common/arm/cfl_neon.c
+++ b/av1/common/arm/cfl_neon.c
@@ -10,6 +10,7 @@
*/
#include <arm_neon.h>
+#include "config/aom_config.h"
#include "config/av1_rtcd.h"
#include "av1/common/cfl.h"
@@ -31,12 +32,12 @@
// Store half of a vector.
static INLINE void vsth_u16(uint16_t *ptr, uint16x4_t val) {
- *((uint32_t *)ptr) = vreinterpret_u32_u16(val)[0];
+ vst1_lane_u32((uint32_t *)ptr, vreinterpret_u32_u16(val), 0);
}
// Store half of a vector.
static INLINE void vsth_u8(uint8_t *ptr, uint8x8_t val) {
- *((uint32_t *)ptr) = vreinterpret_u32_u8(val)[0];
+ vst1_lane_u32((uint32_t *)ptr, vreinterpret_u32_u8(val), 0);
}
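The replaced stores wrote through a casted uint32_t pointer and indexed the vector with the GNU subscript extension, which assumes 4-byte alignment and is aliasing-unsafe; vst1_lane_u32 expresses the same single 32-bit store through the intrinsic layer. An equivalent, fully portable scalar form for comparison (a sketch, not the library's code):

#include <arm_neon.h>
#include <string.h>

// Store the low 4 bytes of an 8-byte vector with no alignment assumption.
static inline void vsth_u8_portable(uint8_t *ptr, uint8x8_t val) {
  const uint32_t v = vget_lane_u32(vreinterpret_u32_u8(val), 0);
  memcpy(ptr, &v, 4);
}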
static void cfl_luma_subsampling_420_lbd_neon(const uint8_t *input,
@@ -132,7 +133,7 @@
}
#if CONFIG_AV1_HIGHBITDEPTH
-#ifndef __aarch64__
+#if !AOM_ARCH_AARCH64
uint16x8_t vpaddq_u16(uint16x8_t a, uint16x8_t b) {
return vcombine_u16(vpadd_u16(vget_low_u16(a), vget_high_u16(a)),
vpadd_u16(vget_low_u16(b), vget_high_u16(b)));
@@ -269,7 +270,7 @@
// unsigned integer for the sum, we can do one addition operation inside 16
// bits (8 lanes) before having to convert to 32 bits (4 lanes).
const uint16_t *sum_buf = src;
- uint32x4_t sum_32x4 = { 0, 0, 0, 0 };
+ uint32x4_t sum_32x4 = vdupq_n_u32(0);
do {
// For all widths, we load, add and combine the data so it fits in 4 lanes.
if (width == 4) {
@@ -313,7 +314,7 @@
// Permute and add in such a way that each lane contains the block sum.
// [A+C+B+D, B+D+A+C, C+A+D+B, D+B+C+A]
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
sum_32x4 = vpaddq_u32(sum_32x4, sum_32x4);
sum_32x4 = vpaddq_u32(sum_32x4, sum_32x4);
#else
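On AArch64 the two vpaddq_u32 calls above fold the four 32-bit lanes until every lane holds the block sum; Armv7 has no vpaddq_u32, so the #else branch (outside this hunk) needs an equivalent reduction. A stand-in with the same split/pairwise-add/recombine shape as the vpaddq_u16 compatibility helper earlier in this file (hypothetical name; a sketch, not the file's actual fallback):

// Armv7 stand-in for vpaddq_u32: pairwise-add each half, then recombine.
static inline uint32x4_t vpaddq_u32_compat(uint32x4_t a, uint32x4_t b) {
  return vcombine_u32(vpadd_u32(vget_low_u32(a), vget_high_u32(a)),
                      vpadd_u32(vget_low_u32(b), vget_high_u32(b)));
}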
diff --git a/av1/common/arm/convolve_neon.c b/av1/common/arm/convolve_neon.c
index 012b3f7..713aaad 100644
--- a/av1/common/arm/convolve_neon.c
+++ b/av1/common/arm/convolve_neon.c
@@ -13,6 +13,7 @@
#include <assert.h>
#include <arm_neon.h>
+#include "config/aom_config.h"
#include "config/av1_rtcd.h"
#include "aom_dsp/aom_dsp_common.h"
@@ -44,17 +45,18 @@
return sum;
}
-#if !defined(__aarch64__)
-static INLINE uint8x8_t convolve8_horiz_4x1(
- const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
- const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
- const int16x4_t s6, const int16x4_t s7, const int16x8_t filter,
- const int16x4_t shift_round_0, const int16x4_t shift_by_bits) {
+#if !AOM_ARCH_AARCH64
+static INLINE uint8x8_t convolve8_x_4x1(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t filter,
+ const int16x4_t horiz_const) {
const int16x4_t filter_lo = vget_low_s16(filter);
const int16x4_t filter_hi = vget_high_s16(filter);
- int16x4_t sum;
+ int16x4_t sum = horiz_const;
- sum = vmul_lane_s16(s0, filter_lo, 0);
+ sum = vmla_lane_s16(sum, s0, filter_lo, 0);
sum = vmla_lane_s16(sum, s1, filter_lo, 1);
sum = vmla_lane_s16(sum, s2, filter_lo, 2);
sum = vmla_lane_s16(sum, s3, filter_lo, 3);
@@ -63,12 +65,200 @@
sum = vmla_lane_s16(sum, s6, filter_hi, 2);
sum = vmla_lane_s16(sum, s7, filter_hi, 3);
- sum = vqrshl_s16(sum, shift_round_0);
- sum = vqrshl_s16(sum, shift_by_bits);
-
- return vqmovun_s16(vcombine_s16(sum, sum));
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ return vqrshrun_n_s16(vcombine_s16(sum, vdup_n_s16(0)), FILTER_BITS - 1);
}
-#endif // !defined(__arch64__)
+#endif // !AOM_ARCH_AARCH64
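The "subtract 1 from the right shift" remark above works because every AV1 interpolation filter tap is even: halving the taps keeps the accumulator an exact integer at half scale, so one less bit of rounding shift reproduces the full-scale result. A quick check of the identity (FILTER_BITS == 7 assumed, matching libaom):

#include <assert.h>
#include <stdint.h>

// For even sums (guaranteed when every tap is even), a rounding shift by
// 6 on the halved sum equals a rounding shift by 7 on the full sum.
static void check_halved_filter_shift(int32_t sum) {
  assert((sum & 1) == 0);
  assert(((sum / 2 + (1 << 5)) >> 6) == ((sum + (1 << 6)) >> 7));
}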
+
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
+
+static INLINE int32x4_t convolve12_4_usdot(uint8x16_t samples,
+ const int8x16_t filters,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum;
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ /* First 4 output values. */
+ sum = vusdotq_laneq_s32(horiz_const, permuted_samples[0], filters, 0);
+ sum = vusdotq_laneq_s32(sum, permuted_samples[1], filters, 1);
+ sum = vusdotq_laneq_s32(sum, permuted_samples[2], filters, 2);
+
+ return sum;
+}
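The bracketed index comments above describe the three vqtbl1q_u8 lookups as overlapping 4-byte sample windows. The 48-byte table they assume (dot_prod_permute_tbl, defined elsewhere in this file) would look like the following sketch; the values are inferred from the comments, not copied from the library:

// Each 16-byte row gathers four overlapping 4-byte windows for one
// vqtbl1q_u8 call, matching the index comments above.
static const uint8_t dot_prod_permute_tbl_sketch[48] = {
  0, 1, 2,  3,  1, 2,  3,  4,  2,  3,  4,  5,  3,  4,  5,  6,
  4, 5, 6,  7,  5, 6,  7,  8,  6,  7,  8,  9,  7,  8,  9,  10,
  8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14
};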
+
+static INLINE int16x8_t convolve12_8_usdot(uint8x16_t samples0,
+ uint8x16_t samples1,
+ const int8x16_t filters,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[4];
+ int32x4_t sum[2];
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples0, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples0, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_u8(samples0, permute_tbl.val[2]);
+ /* {12, 13, 14, 15, 13, 14, 15, 16, 14, 15, 16, 17, 15, 16, 17, 18 } */
+ permuted_samples[3] = vqtbl1q_u8(samples1, permute_tbl.val[2]);
+
+ /* First 4 output values. */
+ sum[0] = vusdotq_laneq_s32(horiz_const, permuted_samples[0], filters, 0);
+ sum[0] = vusdotq_laneq_s32(sum[0], permuted_samples[1], filters, 1);
+ sum[0] = vusdotq_laneq_s32(sum[0], permuted_samples[2], filters, 2);
+ /* Second 4 output values. */
+ sum[1] = vusdotq_laneq_s32(horiz_const, permuted_samples[1], filters, 0);
+ sum[1] = vusdotq_laneq_s32(sum[1], permuted_samples[2], filters, 1);
+ sum[1] = vusdotq_laneq_s32(sum[1], permuted_samples[3], filters, 2);
+
+ /* Narrow and re-pack. */
+ return vcombine_s16(vqrshrn_n_s32(sum[0], FILTER_BITS),
+ vqrshrn_n_s32(sum[1], FILTER_BITS));
+}
+
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE int16x4_t convolve12_horiz_4_sdot(
+ uint8x16_t samples, const int8x16_t filters, const int32x4_t correction,
+ const uint8x16_t range_limit, const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
+ sum = vdotq_laneq_s32(sum, permuted_samples[1], filters, 1);
+ sum = vdotq_laneq_s32(sum, permuted_samples[2], filters, 2);
+
+ /* Narrow and re-pack. */
+ return vshrn_n_s32(sum, ROUND0_BITS);
+}
+
+static INLINE int16x8_t convolve12_horiz_8_sdot(
+ uint8x16_t samples0, uint8x16_t samples1, const int8x16_t filters,
+ const int32x4_t correction, const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples[2], permuted_samples[4];
+ int32x4_t sum[2];
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples[0] = vreinterpretq_s8_u8(vsubq_u8(samples0, range_limit));
+ clamped_samples[1] = vreinterpretq_s8_u8(vsubq_u8(samples1, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[2]);
+ /* {12, 13, 14, 15, 13, 14, 15, 16, 14, 15, 16, 17, 15, 16, 17, 18 } */
+ permuted_samples[3] = vqtbl1q_s8(clamped_samples[1], permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum[0] = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
+ sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[1], filters, 1);
+ sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[2], filters, 2);
+ /* Second 4 output values. */
+ sum[1] = vdotq_laneq_s32(correction, permuted_samples[1], filters, 0);
+ sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[2], filters, 1);
+ sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[3], filters, 2);
+
+ /* Narrow and re-pack. */
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS),
+ vshrn_n_s32(sum[1], ROUND0_BITS));
+}
+
+static INLINE int32x4_t convolve12_4_sdot(uint8x16_t samples,
+ const int8x16_t filters,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum;
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
+ sum = vdotq_laneq_s32(sum, permuted_samples[1], filters, 1);
+ sum = vdotq_laneq_s32(sum, permuted_samples[2], filters, 2);
+
+ return sum;
+}
+
+static INLINE int16x8_t convolve12_8_sdot(uint8x16_t samples0,
+ uint8x16_t samples1,
+ const int8x16_t filters,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples[2], permuted_samples[4];
+ int32x4_t sum[2];
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples[0] = vreinterpretq_s8_u8(vsubq_u8(samples0, range_limit));
+ clamped_samples[1] = vreinterpretq_s8_u8(vsubq_u8(samples1, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[2]);
+ /* {12, 13, 14, 15, 13, 14, 15, 16, 14, 15, 16, 17, 15, 16, 17, 18 } */
+ permuted_samples[3] = vqtbl1q_s8(clamped_samples[1], permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum[0] = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
+ sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[1], filters, 1);
+ sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[2], filters, 2);
+ /* Second 4 output values. */
+ sum[1] = vdotq_laneq_s32(correction, permuted_samples[1], filters, 0);
+ sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[2], filters, 1);
+ sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[3], filters, 2);
+
+ /* Narrow and re-pack. */
+ return vcombine_s16(vqrshrn_n_s32(sum[0], FILTER_BITS),
+ vqrshrn_n_s32(sum[1], FILTER_BITS));
+}
+
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
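The 'correction' seed used by the sdot helpers above follows from simple algebra: sdot multiplies signed 8-bit values, so each unsigned sample s is biased to s - 128, and the lost 128 * sum(taps) is added back once per output. A scalar model of the identity (a sketch; the real code precomputes the constant with vshlq_n_s16/vpaddlq_s16/vaddvq_s32):

#include <stdint.h>

// sum(f[i] * s[i]) == sum(f[i] * (s[i] - 128)) + 128 * sum(f[i]), so
// seeding the accumulator with the precomputed bias restores the exact
// unsigned result after the signed dot products.
static int32_t dot12_with_correction(const uint8_t *s, const int8_t *f) {
  int32_t correction = 0;
  for (int i = 0; i < 12; ++i) correction += 128 * f[i];
  int32_t sum = correction;
  for (int i = 0; i < 12; ++i) sum += (s[i] - 128) * f[i];
  return sum;  // equal to the plain sum(f[i] * s[i])
}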
static INLINE uint8x8_t convolve8_vert_8x4(
const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
@@ -90,12 +280,10 @@
return vqrshrun_n_s16(sum, FILTER_BITS - 1);
}
-static INLINE int16x4_t convolve8_vert_4x4_s32(
+static INLINE int16x4_t convolve8_vert_4_s32(
const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
- const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
- const int32x4_t round_shift_vec, const int32x4_t offset_const,
- const int32x4_t sub_const_vec) {
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter) {
const int16x4_t y_filter_lo = vget_low_s16(y_filter);
const int16x4_t y_filter_hi = vget_high_s16(y_filter);
int32x4_t sum;
@@ -109,19 +297,14 @@
sum = vmlal_lane_s16(sum, s6, y_filter_hi, 2);
sum = vmlal_lane_s16(sum, s7, y_filter_hi, 3);
- sum = vaddq_s32(sum, offset_const);
- sum = vqrshlq_s32(sum, round_shift_vec);
- sum = vsubq_s32(sum, sub_const_vec);
-
- return vmovn_s32(sum);
+ return vqrshrn_n_s32(sum, 2 * FILTER_BITS - ROUND0_BITS);
}
-static INLINE uint8x8_t convolve8_vert_8x4_s32(
- const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
- const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
- const int16x8_t s6, const int16x8_t s7, const int16x8_t y_filter,
- const int32x4_t round_shift_vec, const int32x4_t offset_const,
- const int32x4_t sub_const_vec, const int16x8_t vec_round_bits) {
+static INLINE uint8x8_t
+convolve8_vert_8_s32(const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t y_filter, const int16x8_t sub_const) {
const int16x4_t y_filter_lo = vget_low_s16(y_filter);
const int16x4_t y_filter_hi = vget_high_s16(y_filter);
int32x4_t sum0, sum1;
@@ -145,132 +328,163 @@
sum1 = vmlal_lane_s16(sum1, vget_high_s16(s6), y_filter_hi, 2);
sum1 = vmlal_lane_s16(sum1, vget_high_s16(s7), y_filter_hi, 3);
- sum0 = vaddq_s32(sum0, offset_const);
- sum1 = vaddq_s32(sum1, offset_const);
- sum0 = vqrshlq_s32(sum0, round_shift_vec);
- sum1 = vqrshlq_s32(sum1, round_shift_vec);
- sum0 = vsubq_s32(sum0, sub_const_vec);
- sum1 = vsubq_s32(sum1, sub_const_vec);
-
- res = vcombine_s16(vmovn_s32(sum0), vmovn_s32(sum1));
- res = vqrshlq_s16(res, vec_round_bits);
+ res = vcombine_s16(vqrshrn_n_s32(sum0, 2 * FILTER_BITS - ROUND0_BITS),
+ vqrshrn_n_s32(sum1, 2 * FILTER_BITS - ROUND0_BITS));
+ res = vsubq_s16(res, sub_const);
return vqmovun_s16(res);
}
-static INLINE int16x4_t convolve12_vert_4x4_s32(
- const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
- const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
- const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
- const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
- const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
- const int32x4_t round_shift_vec, const int32x4_t offset_const,
- const int32x4_t sub_const_vec) {
- const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
- const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
- int32x4_t sum;
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
- sum = vmull_lane_s16(s0, y_filter_0_3, 0);
- sum = vmlal_lane_s16(sum, s1, y_filter_0_3, 1);
- sum = vmlal_lane_s16(sum, s2, y_filter_0_3, 2);
- sum = vmlal_lane_s16(sum, s3, y_filter_0_3, 3);
- sum = vmlal_lane_s16(sum, s4, y_filter_4_7, 0);
- sum = vmlal_lane_s16(sum, s5, y_filter_4_7, 1);
- sum = vmlal_lane_s16(sum, s6, y_filter_4_7, 2);
- sum = vmlal_lane_s16(sum, s7, y_filter_4_7, 3);
- sum = vmlal_lane_s16(sum, s8, y_filter_8_11, 0);
- sum = vmlal_lane_s16(sum, s9, y_filter_8_11, 1);
- sum = vmlal_lane_s16(sum, s10, y_filter_8_11, 2);
- sum = vmlal_lane_s16(sum, s11, y_filter_8_11, 3);
+static INLINE void convolve_x_sr_12tap_neon(const uint8_t *src, int src_stride,
+ uint8_t *dst, int dst_stride, int w, int h,
+ const int16_t *x_filter_ptr) {
+ const int16x8_t filter_0_7 = vld1q_s16(x_filter_ptr);
+ const int16x4_t filter_8_11 = vld1_s16(x_filter_ptr + 8);
+ const int16x8_t filter_8_15 = vcombine_s16(filter_8_11, vdup_n_s16(0));
+ const int8x16_t filter =
+ vcombine_s8(vmovn_s16(filter_0_7), vmovn_s16(filter_8_15));
- sum = vaddq_s32(sum, offset_const);
- sum = vqrshlq_s32(sum, round_shift_vec);
- sum = vsubq_s32(sum, sub_const_vec);
+ // Special case the following no-op filter as 128 won't fit into the
+ // 8-bit signed dot-product instruction:
+ // { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }
+ if (vgetq_lane_s16(filter_0_7, 5) == 128) {
+ uint8x8_t d0;
- return vmovn_s32(sum);
+ // Undo the horizontal offset in the calling function.
+ src += 5;
+
+ for (int i = 0; i < h; i++) {
+ for (int j = 0; j < w; j += 8) {
+ d0 = vld1_u8(src + i * src_stride + j);
+ if (w == 2) {
+ store_u8_2x1(dst + i * dst_stride, d0, 0);
+ } else if (w == 4) {
+ store_u8_4x1(dst + i * dst_stride, d0, 0);
+ } else {
+ vst1_u8(dst + i * dst_stride + j, d0);
+ }
+ }
+ }
+ } else {
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use a single rounding
+ // right shift by FILTER_BITS, instead of a first rounding right shift by
+ // ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ const int32x4_t horiz_const = vdupq_n_s32(1 << (ROUND0_BITS - 1));
+
+ if (w <= 4) {
+ uint8x16_t s0, s1, s2, s3;
+ int32x4_t d0, d1, d2, d3;
+ int16x8_t t01, t23;
+ uint8x8_t d01, d23;
+
+ do {
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve12_4_usdot(s0, filter, permute_tbl, horiz_const);
+ d1 = convolve12_4_usdot(s1, filter, permute_tbl, horiz_const);
+ d2 = convolve12_4_usdot(s2, filter, permute_tbl, horiz_const);
+ d3 = convolve12_4_usdot(s3, filter, permute_tbl, horiz_const);
+
+ t01 = vcombine_s16(vqrshrn_n_s32(d0, FILTER_BITS),
+ vqrshrn_n_s32(d1, FILTER_BITS));
+ t23 = vcombine_s16(vqrshrn_n_s32(d2, FILTER_BITS),
+ vqrshrn_n_s32(d3, FILTER_BITS));
+
+ d01 = vqmovun_s16(t01);
+ d23 = vqmovun_s16(t23);
+
+ if (w == 2) {
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+ }
+ }
+
+ dst += 4 * dst_stride;
+ src += 4 * src_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ uint8x16_t s0, s1, s2, s3, s4, s5, s6, s7;
+ int16x8_t d0, d1, d2, d3;
+ uint8x8_t dd0, dd1, dd2, dd3;
+
+ do {
+ const uint8_t *s = src;
+ uint8_t *d = dst;
+ int width = w;
+
+ do {
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+ load_u8_16x4(s + 4, src_stride, &s4, &s5, &s6, &s7);
+
+ d0 = convolve12_8_usdot(s0, s4, filter, permute_tbl, horiz_const);
+ d1 = convolve12_8_usdot(s1, s5, filter, permute_tbl, horiz_const);
+ d2 = convolve12_8_usdot(s2, s6, filter, permute_tbl, horiz_const);
+ d3 = convolve12_8_usdot(s3, s7, filter, permute_tbl, horiz_const);
+
+ dd0 = vqmovun_s16(d0);
+ dd1 = vqmovun_s16(d1);
+ dd2 = vqmovun_s16(d2);
+ dd3 = vqmovun_s16(d3);
+
+ store_u8_8x2(d + 0 * dst_stride, dst_stride, dd0, dd1);
+ if (h != 2) {
+ store_u8_8x2(d + 2 * dst_stride, dst_stride, dd2, dd3);
+ }
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ }
+ }
}
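The "shim" comments in this function assert that seeding the accumulator with 1 << (ROUND0_BITS - 1) lets a single rounding shift by FILTER_BITS replace the old two-stage rounding. The identity is exact because the inter-stage rounding constant, scaled back up by 2^ROUND0_BITS, is exactly 1 << (FILTER_BITS - 1). A self-contained check, assuming libaom's FILTER_BITS == 7 and ROUND0_BITS == 3:

#include <assert.h>
#include <stdint.h>

#define FB 7  // FILTER_BITS (assumed value)
#define RB 3  // ROUND0_BITS (assumed value)

// Old scheme: rounding shift by RB, then rounding shift by FB - RB.
static int32_t two_stage(int32_t x) {
  const int32_t y = (x + (1 << (RB - 1))) >> RB;
  return (y + (1 << (FB - RB - 1))) >> (FB - RB);
}

// New scheme: fold 1 << (RB - 1) into the accumulator (the horiz_const
// shim), then let vqrshrn_n_s32 add 1 << (FB - 1) and shift by FB.
static int32_t single_stage(int32_t x) {
  return (x + (1 << (RB - 1)) + (1 << (FB - 1))) >> FB;
}

int main(void) {
  for (int32_t x = -(1 << 15); x < (1 << 15); ++x) {
    assert(two_stage(x) == single_stage(x));
  }
  return 0;
}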
-static INLINE uint8x8_t convolve12_vert_8x4_s32(
- const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
- const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
- const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
- const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
- const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
- const int32x4_t round_shift_vec, const int32x4_t offset_const,
- const int32x4_t sub_const_vec, const int16x8_t vec_round_bits) {
- const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
- const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
- int32x4_t sum0, sum1;
- int16x8_t res;
-
- sum0 = vmull_lane_s16(vget_low_s16(s0), y_filter_0_3, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_0_3, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_0_3, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_0_3, 3);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_4_7, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_4_7, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s6), y_filter_4_7, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s7), y_filter_4_7, 3);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s8), y_filter_8_11, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s9), y_filter_8_11, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s10), y_filter_8_11, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s11), y_filter_8_11, 3);
-
- sum1 = vmull_lane_s16(vget_high_s16(s0), y_filter_0_3, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_0_3, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_0_3, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_0_3, 3);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_4_7, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_4_7, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s6), y_filter_4_7, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s7), y_filter_4_7, 3);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s8), y_filter_8_11, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s9), y_filter_8_11, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s10), y_filter_8_11, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s11), y_filter_8_11, 3);
-
- sum0 = vaddq_s32(sum0, offset_const);
- sum1 = vaddq_s32(sum1, offset_const);
- sum0 = vqrshlq_s32(sum0, round_shift_vec);
- sum1 = vqrshlq_s32(sum1, round_shift_vec);
- sum0 = vsubq_s32(sum0, sub_const_vec);
- sum1 = vsubq_s32(sum1, sub_const_vec);
-
- res = vcombine_s16(vmovn_s32(sum0), vmovn_s32(sum1));
- res = vqrshlq_s16(res, vec_round_bits);
-
- return vqmovun_s16(res);
-}
-
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
-
void av1_convolve_x_sr_neon(const uint8_t *src, int src_stride, uint8_t *dst,
int dst_stride, int w, int h,
const InterpFilterParams *filter_params_x,
const int subpel_x_qn,
ConvolveParams *conv_params) {
- if (filter_params_x->taps > 8) {
- av1_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_x,
- subpel_x_qn, conv_params);
- return;
- }
+ (void)conv_params;
const uint8_t horiz_offset = filter_params_x->taps / 2 - 1;
- const int8_t bits = FILTER_BITS - conv_params->round_0;
-
- assert(bits >= 0);
- assert((FILTER_BITS - conv_params->round_1) >= 0 ||
- ((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS));
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ src -= horiz_offset;
+
+ if (filter_params_x->taps > 8) {
+ convolve_x_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ x_filter_ptr);
+ return;
+ }
+
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
-
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t shift_by_bits = vdupq_n_s16(-bits);
-
- src -= horiz_offset;
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use a single
+ // rounding right shift by FILTER_BITS, instead of a first rounding right
+ // shift by ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ // The outermost -1 is needed because we halved the filter values.
+ const int32x4_t horiz_const = vdupq_n_s32(1 << ((ROUND0_BITS - 1) - 1));
if (w <= 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
@@ -280,49 +494,33 @@
uint8x8_t d01, d23;
do {
- s0 = vld1q_u8(src + 0 * src_stride);
- s1 = vld1q_u8(src + 1 * src_stride);
- s2 = vld1q_u8(src + 2 * src_stride);
- s3 = vld1q_u8(src + 3 * src_stride);
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
- t0 = convolve8_4_usdot(s0, x_filter, permute_tbl, vdupq_n_s32(0));
- t1 = convolve8_4_usdot(s1, x_filter, permute_tbl, vdupq_n_s32(0));
- t2 = convolve8_4_usdot(s2, x_filter, permute_tbl, vdupq_n_s32(0));
- t3 = convolve8_4_usdot(s3, x_filter, permute_tbl, vdupq_n_s32(0));
+ t0 = convolve8_4_usdot(s0, x_filter, permute_tbl, horiz_const);
+ t1 = convolve8_4_usdot(s1, x_filter, permute_tbl, horiz_const);
+ t2 = convolve8_4_usdot(s2, x_filter, permute_tbl, horiz_const);
+ t3 = convolve8_4_usdot(s3, x_filter, permute_tbl, horiz_const);
t01 = vcombine_s16(vmovn_s32(t0), vmovn_s32(t1));
t23 = vcombine_s16(vmovn_s32(t2), vmovn_s32(t3));
- t01 = vqrshlq_s16(t01, shift_round_0);
- t23 = vqrshlq_s16(t23, shift_round_0);
-
- t01 = vqrshlq_s16(t01, shift_by_bits);
- t23 = vqrshlq_s16(t23, shift_by_bits);
-
- d01 = vqmovun_s16(t01);
- d23 = vqmovun_s16(t23);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ d01 = vqrshrun_n_s16(t01, FILTER_BITS - 1);
+ d23 = vqrshrun_n_s16(t23, FILTER_BITS - 1);
if (w == 2) {
- vst1_lane_u16((uint16_t *)(dst + 0 * dst_stride),
- vreinterpret_u16_u8(d01), 0);
- vst1_lane_u16((uint16_t *)(dst + 1 * dst_stride),
- vreinterpret_u16_u8(d01), 2);
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d01, 2);
if (h != 2) {
- vst1_lane_u16((uint16_t *)(dst + 2 * dst_stride),
- vreinterpret_u16_u8(d23), 0);
- vst1_lane_u16((uint16_t *)(dst + 3 * dst_stride),
- vreinterpret_u16_u8(d23), 2);
+ store_u8_2x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
}
} else {
- vst1_lane_u32((uint32_t *)(dst + 0 * dst_stride),
- vreinterpret_u32_u8(d01), 0);
- vst1_lane_u32((uint32_t *)(dst + 1 * dst_stride),
- vreinterpret_u32_u8(d01), 1);
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
if (h != 2) {
- vst1_lane_u32((uint32_t *)(dst + 2 * dst_stride),
- vreinterpret_u32_u8(d23), 0);
- vst1_lane_u32((uint32_t *)(dst + 3 * dst_stride),
- vreinterpret_u32_u8(d23), 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
}
}
@@ -343,29 +541,18 @@
uint8_t *d = dst;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- t0 = convolve8_8_usdot(s0, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- t1 = convolve8_8_usdot(s1, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- t2 = convolve8_8_usdot(s2, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- t3 = convolve8_8_usdot(s3, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
+ t0 = convolve8_x_8_usdot(s0, x_filter, permute_tbl, horiz_const);
+ t1 = convolve8_x_8_usdot(s1, x_filter, permute_tbl, horiz_const);
+ t2 = convolve8_x_8_usdot(s2, x_filter, permute_tbl, horiz_const);
+ t3 = convolve8_x_8_usdot(s3, x_filter, permute_tbl, horiz_const);
- t0 = vqrshlq_s16(t0, shift_by_bits);
- t1 = vqrshlq_s16(t1, shift_by_bits);
- t2 = vqrshlq_s16(t2, shift_by_bits);
- t3 = vqrshlq_s16(t3, shift_by_bits);
-
- d0 = vqmovun_s16(t0);
- d1 = vqmovun_s16(t1);
- d2 = vqmovun_s16(t2);
- d3 = vqmovun_s16(t3);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ d0 = vqrshrun_n_s16(t0, FILTER_BITS - 1);
+ d1 = vqrshrun_n_s16(t1, FILTER_BITS - 1);
+ d2 = vqrshrun_n_s16(t2, FILTER_BITS - 1);
+ d3 = vqrshrun_n_s16(t3, FILTER_BITS - 1);
vst1_u8(d + 0 * dst_stride, d0);
vst1_u8(d + 1 * dst_stride, d1);
@@ -386,40 +573,174 @@
}
}
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE void convolve_x_sr_12tap_neon(const uint8_t *src, int src_stride,
+ uint8_t *dst, int dst_stride, int w, int h,
+ const int16_t *x_filter_ptr) {
+ const int16x8_t filter_0_7 = vld1q_s16(x_filter_ptr);
+ const int16x4_t filter_8_11 = vld1_s16(x_filter_ptr + 8);
+ const int16x8_t filter_8_15 = vcombine_s16(filter_8_11, vdup_n_s16(0));
+ const int8x16_t filter =
+ vcombine_s8(vmovn_s16(filter_0_7), vmovn_s16(filter_8_15));
+
+ const int32x4_t correct_tmp =
+ vaddq_s32(vpaddlq_s16(vshlq_n_s16(filter_0_7, 7)),
+ vpaddlq_s16(vshlq_n_s16(filter_8_15, 7)));
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use a single rounding
+ // right shift by FILTER_BITS, instead of a first rounding right shift by
+ // ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ int32x4_t correction =
+ vdupq_n_s32(vaddvq_s32(correct_tmp) + (1 << (ROUND0_BITS - 1)));
+ const uint8x16_t range_limit = vdupq_n_u8(128);
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+
+ // Special case the following no-op filter as 128 won't fit into the
+ // 8-bit signed dot-product instruction:
+ // { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }
+ if (vgetq_lane_s16(filter_0_7, 5) == 128) {
+ uint8x8_t d0;
+
+ // Undo the horizontal offset in the calling function.
+ src += 5;
+
+ for (int i = 0; i < h; i++) {
+ for (int j = 0; j < w; j += 8) {
+ d0 = vld1_u8(src + i * src_stride + j);
+ if (w == 2) {
+ store_u8_2x1(dst + i * dst_stride, d0, 0);
+ } else if (w == 4) {
+ store_u8_4x1(dst + i * dst_stride, d0, 0);
+ } else {
+ vst1_u8(dst + i * dst_stride + j, d0);
+ }
+ }
+ }
+ } else {
+ if (w <= 4) {
+ uint8x16_t s0, s1, s2, s3;
+ int32x4_t d0, d1, d2, d3;
+ int16x8_t t01, t23;
+ uint8x8_t d01, d23;
+
+ do {
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 =
+ convolve12_4_sdot(s0, filter, correction, range_limit, permute_tbl);
+ d1 =
+ convolve12_4_sdot(s1, filter, correction, range_limit, permute_tbl);
+ d2 =
+ convolve12_4_sdot(s2, filter, correction, range_limit, permute_tbl);
+ d3 =
+ convolve12_4_sdot(s3, filter, correction, range_limit, permute_tbl);
+
+ t01 = vcombine_s16(vqrshrn_n_s32(d0, FILTER_BITS),
+ vqrshrn_n_s32(d1, FILTER_BITS));
+ t23 = vcombine_s16(vqrshrn_n_s32(d2, FILTER_BITS),
+ vqrshrn_n_s32(d3, FILTER_BITS));
+
+ d01 = vqmovun_s16(t01);
+ d23 = vqmovun_s16(t23);
+
+ if (w == 2) {
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+ }
+ }
+
+ dst += 4 * dst_stride;
+ src += 4 * src_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ uint8x16_t s0, s1, s2, s3, s4, s5, s6, s7;
+ int16x8_t d0, d1, d2, d3;
+ uint8x8_t dd0, dd1, dd2, dd3;
+
+ do {
+ const uint8_t *s = src;
+ uint8_t *d = dst;
+ int width = w;
+
+ do {
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+ load_u8_16x4(s + 4, src_stride, &s4, &s5, &s6, &s7);
+
+ d0 = convolve12_8_sdot(s0, s4, filter, correction, range_limit,
+ permute_tbl);
+ d1 = convolve12_8_sdot(s1, s5, filter, correction, range_limit,
+ permute_tbl);
+ d2 = convolve12_8_sdot(s2, s6, filter, correction, range_limit,
+ permute_tbl);
+ d3 = convolve12_8_sdot(s3, s7, filter, correction, range_limit,
+ permute_tbl);
+
+ dd0 = vqmovun_s16(d0);
+ dd1 = vqmovun_s16(d1);
+ dd2 = vqmovun_s16(d2);
+ dd3 = vqmovun_s16(d3);
+
+ store_u8_8x2(d + 0 * dst_stride, dst_stride, dd0, dd1);
+ if (h != 2) {
+ store_u8_8x2(d + 2 * dst_stride, dst_stride, dd2, dd3);
+ }
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ }
+ }
+}
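Both 12-tap paths special-case the kernel { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 } not only because 128 overflows int8: that kernel is the identity filter, so the convolution collapses to a copy of the centre sample, which the fallback implements as plain vld1/vst1 row copies after re-adding the 5-sample offset. A one-line check of the collapse, with FILTER_BITS == 7 assumed:

#include <assert.h>
#include <stdint.h>

// With f = { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }, the rounded result
// (128 * s[5] + 64) >> 7 returns s[5] unchanged for every 8-bit sample.
static void check_identity_filter(uint8_t s5) {
  assert(((128 * (int32_t)s5 + (1 << 6)) >> 7) == s5);
}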
void av1_convolve_x_sr_neon(const uint8_t *src, int src_stride, uint8_t *dst,
int dst_stride, int w, int h,
const InterpFilterParams *filter_params_x,
const int subpel_x_qn,
ConvolveParams *conv_params) {
- if (filter_params_x->taps > 8) {
- av1_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_x,
- subpel_x_qn, conv_params);
- return;
- }
+ (void)conv_params;
const uint8_t horiz_offset = filter_params_x->taps / 2 - 1;
- const int8_t bits = FILTER_BITS - conv_params->round_0;
-
- assert(bits >= 0);
- assert((FILTER_BITS - conv_params->round_1) >= 0 ||
- ((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS));
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ src -= horiz_offset;
+
+ if (filter_params_x->taps > 8) {
+ convolve_x_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ x_filter_ptr);
+ return;
+ }
+
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
// Dot product constants.
const int16x8_t correct_tmp = vshll_n_s8(x_filter, 7);
- const int32x4_t correction = vdupq_n_s32(vaddlvq_s16(correct_tmp));
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use a single
+ // rounding right shift by FILTER_BITS, instead of a first rounding right
+ // shift by ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ // The outermost -1 is needed because we halved the filter values.
+ const int32x4_t correction =
+ vdupq_n_s32(vaddlvq_s16(correct_tmp) + (1 << ((ROUND0_BITS - 1) - 1)));
const uint8x16_t range_limit = vdupq_n_u8(128);
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t shift_by_bits = vdupq_n_s16(-bits);
-
- src -= horiz_offset;
-
if (w <= 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
uint8x16_t s0, s1, s2, s3;
@@ -428,10 +749,7 @@
uint8x8_t d01, d23;
do {
- s0 = vld1q_u8(src + 0 * src_stride);
- s1 = vld1q_u8(src + 1 * src_stride);
- s2 = vld1q_u8(src + 2 * src_stride);
- s3 = vld1q_u8(src + 3 * src_stride);
+ load_u8_16x4(src, src_stride, &s0, &s1, &s2, &s3);
t0 = convolve8_4_sdot(s0, x_filter, correction, range_limit, permute_tbl);
t1 = convolve8_4_sdot(s1, x_filter, correction, range_limit, permute_tbl);
@@ -441,36 +759,23 @@
t01 = vcombine_s16(vmovn_s32(t0), vmovn_s32(t1));
t23 = vcombine_s16(vmovn_s32(t2), vmovn_s32(t3));
- t01 = vqrshlq_s16(t01, shift_round_0);
- t23 = vqrshlq_s16(t23, shift_round_0);
-
- t01 = vqrshlq_s16(t01, shift_by_bits);
- t23 = vqrshlq_s16(t23, shift_by_bits);
-
- d01 = vqmovun_s16(t01);
- d23 = vqmovun_s16(t23);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ d01 = vqrshrun_n_s16(t01, FILTER_BITS - 1);
+ d23 = vqrshrun_n_s16(t23, FILTER_BITS - 1);
if (w == 2) {
- vst1_lane_u16((uint16_t *)(dst + 0 * dst_stride),
- vreinterpret_u16_u8(d01), 0);
- vst1_lane_u16((uint16_t *)(dst + 1 * dst_stride),
- vreinterpret_u16_u8(d01), 2);
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d01, 2);
if (h != 2) {
- vst1_lane_u16((uint16_t *)(dst + 2 * dst_stride),
- vreinterpret_u16_u8(d23), 0);
- vst1_lane_u16((uint16_t *)(dst + 3 * dst_stride),
- vreinterpret_u16_u8(d23), 2);
+ store_u8_2x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
}
} else {
- vst1_lane_u32((uint32_t *)(dst + 0 * dst_stride),
- vreinterpret_u32_u8(d01), 0);
- vst1_lane_u32((uint32_t *)(dst + 1 * dst_stride),
- vreinterpret_u32_u8(d01), 1);
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
if (h != 2) {
- vst1_lane_u32((uint32_t *)(dst + 2 * dst_stride),
- vreinterpret_u32_u8(d23), 0);
- vst1_lane_u32((uint32_t *)(dst + 3 * dst_stride),
- vreinterpret_u32_u8(d23), 1);
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
}
}
@@ -478,7 +783,6 @@
src += 4 * src_stride;
dst += 4 * dst_stride;
} while (h > 0);
-
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
uint8x16_t s0, s1, s2, s3;
@@ -491,29 +795,22 @@
uint8_t *d = dst;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- t0 = convolve8_8_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- t1 = convolve8_8_sdot(s1, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- t2 = convolve8_8_sdot(s2, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- t3 = convolve8_8_sdot(s3, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ t0 = convolve8_x_8_sdot(s0, x_filter, correction, range_limit,
+ permute_tbl);
+ t1 = convolve8_x_8_sdot(s1, x_filter, correction, range_limit,
+ permute_tbl);
+ t2 = convolve8_x_8_sdot(s2, x_filter, correction, range_limit,
+ permute_tbl);
+ t3 = convolve8_x_8_sdot(s3, x_filter, correction, range_limit,
+ permute_tbl);
- t0 = vqrshlq_s16(t0, shift_by_bits);
- t1 = vqrshlq_s16(t1, shift_by_bits);
- t2 = vqrshlq_s16(t2, shift_by_bits);
- t3 = vqrshlq_s16(t3, shift_by_bits);
-
- d0 = vqmovun_s16(t0);
- d1 = vqmovun_s16(t1);
- d2 = vqmovun_s16(t2);
- d3 = vqmovun_s16(t3);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ d0 = vqrshrun_n_s16(t0, FILTER_BITS - 1);
+ d1 = vqrshrun_n_s16(t1, FILTER_BITS - 1);
+ d2 = vqrshrun_n_s16(t2, FILTER_BITS - 1);
+ d3 = vqrshrun_n_s16(t3, FILTER_BITS - 1);
vst1_u8(d + 0 * dst_stride, d0);
vst1_u8(d + 1 * dst_stride, d1);
@@ -534,18 +831,18 @@
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
-static INLINE uint8x8_t convolve8_horiz_8x8(
- const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
- const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
- const int16x8_t s6, const int16x8_t s7, const int16x8_t filter,
- const int16x8_t shift_round_0, const int16x8_t shift_by_bits) {
+static INLINE uint8x8_t
+convolve8_horiz_8x8(const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t filter, const int16x8_t horiz_const) {
const int16x4_t filter_lo = vget_low_s16(filter);
const int16x4_t filter_hi = vget_high_s16(filter);
- int16x8_t sum;
+ int16x8_t sum = horiz_const;
- sum = vmulq_lane_s16(s0, filter_lo, 0);
+ sum = vmlaq_lane_s16(sum, s0, filter_lo, 0);
sum = vmlaq_lane_s16(sum, s1, filter_lo, 1);
sum = vmlaq_lane_s16(sum, s2, filter_lo, 2);
sum = vmlaq_lane_s16(sum, s3, filter_lo, 3);
@@ -554,10 +851,218 @@
sum = vmlaq_lane_s16(sum, s6, filter_hi, 2);
sum = vmlaq_lane_s16(sum, s7, filter_hi, 3);
- sum = vqrshlq_s16(sum, shift_round_0);
- sum = vqrshlq_s16(sum, shift_by_bits);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ return vqrshrun_n_s16(sum, FILTER_BITS - 1);
+}
- return vqmovun_s16(sum);
+static INLINE int16x4_t convolve12_x_4x4_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t x_filter_0_7, const int16x4_t x_filter_8_11,
+ const int32x4_t horiz_const) {
+ const int16x4_t x_filter_0_3 = vget_low_s16(x_filter_0_7);
+ const int16x4_t x_filter_4_7 = vget_high_s16(x_filter_0_7);
+ int32x4_t sum = horiz_const;
+
+ sum = vmlal_lane_s16(sum, s0, x_filter_0_3, 0);
+ sum = vmlal_lane_s16(sum, s1, x_filter_0_3, 1);
+ sum = vmlal_lane_s16(sum, s2, x_filter_0_3, 2);
+ sum = vmlal_lane_s16(sum, s3, x_filter_0_3, 3);
+ sum = vmlal_lane_s16(sum, s4, x_filter_4_7, 0);
+ sum = vmlal_lane_s16(sum, s5, x_filter_4_7, 1);
+ sum = vmlal_lane_s16(sum, s6, x_filter_4_7, 2);
+ sum = vmlal_lane_s16(sum, s7, x_filter_4_7, 3);
+ sum = vmlal_lane_s16(sum, s8, x_filter_8_11, 0);
+ sum = vmlal_lane_s16(sum, s9, x_filter_8_11, 1);
+ sum = vmlal_lane_s16(sum, s10, x_filter_8_11, 2);
+ sum = vmlal_lane_s16(sum, s11, x_filter_8_11, 3);
+
+ return vqrshrn_n_s32(sum, FILTER_BITS);
+}
+
+// 4-column-per-iteration filtering for 12-tap convolve_x_sr.
+// Processes one row at a time.
+static INLINE void x_filter_12tap_w4_single_row(
+ const uint8_t *src_ptr, int src_stride, uint8_t *dst_ptr,
+ const int dst_stride, int w, int h, const int16x8_t x_filter_0_7,
+ const int16x4_t x_filter_8_11) {
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use a single rounding
+ // right shift by FILTER_BITS, instead of a first rounding right shift by
+ // ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ const int32x4_t horiz_const = vdupq_n_s32(1 << (ROUND0_BITS - 1));
+
+ do {
+ const uint8_t *s = src_ptr;
+ uint8_t *d = dst_ptr;
+ int width = w;
+
+ do {
+ uint8x8_t dd0;
+ uint8x16_t t0;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, d0;
+ int16x8_t tt0, tt1;
+
+ t0 = vld1q_u8(s);
+ tt0 = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(t0)));
+ tt1 = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(t0)));
+
+ s0 = vget_low_s16(tt0);
+ s4 = vget_high_s16(tt0);
+ s8 = vget_low_s16(tt1);
+ s12 = vget_high_s16(tt1);
+
+ s1 = vext_s16(s0, s4, 1); // a1 a2 a3 a4
+ s2 = vext_s16(s0, s4, 2); // a2 a3 a4 a5
+ s3 = vext_s16(s0, s4, 3); // a3 a4 a5 a6
+ s5 = vext_s16(s4, s8, 1); // a5 a6 a7 a8
+ s6 = vext_s16(s4, s8, 2); // a6 a7 a8 a9
+ s7 = vext_s16(s4, s8, 3); // a7 a8 a9 a10
+ s9 = vext_s16(s8, s12, 1); // a9 a10 a11 a12
+ s10 = vext_s16(s8, s12, 2); // a10 a11 a12 a13
+ s11 = vext_s16(s8, s12, 3); // a11 a12 a13 a14
+
+ d0 = convolve12_x_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, x_filter_0_7, x_filter_8_11, horiz_const);
+
+ dd0 = vqmovun_s16(vcombine_s16(d0, vdup_n_s16(0)));
+
+ if (w == 2) {
+ store_u8_2x1(d, dd0, 0);
+ } else {
+ store_u8_4x1(d, dd0, 0);
+ }
+
+ s += 4;
+ d += 4;
+ width -= 4;
+ } while (width > 0);
+
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ } while (--h != 0);
+}
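For reference, a scalar model of one convolve12_x_4x4_s16 output lane, folding in the horiz_const shim and the saturating narrow performed by vqrshrn_n_s32 (a sketch, assuming libaom's FILTER_BITS == 7 and ROUND0_BITS == 3):

#include <stdint.h>

static int16_t convolve12_x_scalar(const int16_t *s, const int16_t *f) {
  int32_t sum = 1 << (3 - 1);  // horiz_const: 1 << (ROUND0_BITS - 1)
  for (int k = 0; k < 12; ++k) sum += s[k] * f[k];
  sum = (sum + (1 << 6)) >> 7;  // vqrshrn_n_s32(sum, FILTER_BITS)
  if (sum > INT16_MAX) sum = INT16_MAX;  // saturate like the 'q' intrinsic
  if (sum < INT16_MIN) sum = INT16_MIN;
  return (int16_t)sum;
}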
+
+static INLINE void convolve_x_sr_12tap_neon(const uint8_t *src_ptr,
+ int src_stride, uint8_t *dst_ptr,
+ const int dst_stride, int w, int h,
+ const int16_t *x_filter_ptr) {
+ const int16x8_t x_filter_0_7 = vld1q_s16(x_filter_ptr);
+ const int16x4_t x_filter_8_11 = vld1_s16(x_filter_ptr + 8);
+
+#if AOM_ARCH_AARCH64
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use a single rounding
+ // right shift by FILTER_BITS, instead of a first rounding right shift by
+ // ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ const int32x4_t horiz_const = vdupq_n_s32(1 << (ROUND0_BITS - 1));
+
+ do {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint8x8_t t0, t1, t2, t3;
+
+ const uint8_t *s = src_ptr;
+ uint8_t *d = dst_ptr;
+ int width = w;
+
+ load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s7 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+
+ load_u8_8x4(s + 8, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
+ s += 11;
+
+ do {
+ int16x4_t s11, s12, s13, s14, d0, d1, d2, d3;
+ int16x8_t d01, d23;
+ uint8x8_t dd01, dd23;
+
+ load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+
+ s11 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s12 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s13 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s14 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+
+ d0 = convolve12_x_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, x_filter_0_7, x_filter_8_11, horiz_const);
+ d1 = convolve12_x_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, x_filter_0_7, x_filter_8_11, horiz_const);
+ d2 = convolve12_x_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, x_filter_0_7, x_filter_8_11, horiz_const);
+ d3 = convolve12_x_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13,
+ s14, x_filter_0_7, x_filter_8_11, horiz_const);
+
+ transpose_s16_4x4d(&d0, &d1, &d2, &d3);
+
+ d01 = vcombine_s16(d0, d1);
+ d23 = vcombine_s16(d2, d3);
+
+ dd01 = vqmovun_s16(d01);
+ dd23 = vqmovun_s16(d23);
+
+ if (w == 2) {
+ store_u8_2x1(d + 0 * dst_stride, dd01, 0);
+ store_u8_2x1(d + 1 * dst_stride, dd01, 2);
+ if (h != 2) {
+ store_u8_2x1(d + 2 * dst_stride, dd23, 0);
+ store_u8_2x1(d + 3 * dst_stride, dd23, 2);
+ }
+ } else {
+ store_u8_4x1(d + 0 * dst_stride, dd01, 0);
+ store_u8_4x1(d + 1 * dst_stride, dd01, 1);
+ if (h != 2) {
+ store_u8_4x1(d + 2 * dst_stride, dd23, 0);
+ store_u8_4x1(d + 3 * dst_stride, dd23, 1);
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4;
+ d += 4;
+ width -= 4;
+ } while (width > 0);
+
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+ } while (h >= 4);
+
+ if (h > 0) {
+ x_filter_12tap_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w, h,
+ x_filter_0_7, x_filter_8_11);
+ }
+#else // !AOM_ARCH_AARCH64
+ x_filter_12tap_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w, h,
+ x_filter_0_7, x_filter_8_11);
+#endif // AOM_ARCH_AARCH64
}
void av1_convolve_x_sr_neon(const uint8_t *src, int src_stride, uint8_t *dst,
@@ -565,33 +1070,33 @@
const InterpFilterParams *filter_params_x,
const int subpel_x_qn,
ConvolveParams *conv_params) {
- if (filter_params_x->taps > 8) {
- av1_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_x,
- subpel_x_qn, conv_params);
- return;
- }
+ (void)conv_params;
const uint8_t horiz_offset = filter_params_x->taps / 2 - 1;
- const int8_t bits = FILTER_BITS - conv_params->round_0;
-
- uint8x8_t t0;
-#if defined(__aarch64__)
- uint8x8_t t1, t2, t3;
-#endif
-
- assert(bits >= 0);
- assert((FILTER_BITS - conv_params->round_1) >= 0 ||
- ((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS));
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ src -= horiz_offset;
+
+ if (filter_params_x->taps > 8) {
+ convolve_x_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ x_filter_ptr);
+ return;
+ }
+
+ uint8x8_t t0;
+#if AOM_ARCH_AARCH64
+ uint8x8_t t1, t2, t3;
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use a single
+ // rounding right shift by FILTER_BITS, instead of a first rounding right
+ // shift by ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ // The outermost -1 is needed because we halved the filter values.
+ const int16x8_t horiz_const = vdupq_n_s16(1 << ((ROUND0_BITS - 1) - 1));
+#endif // AOM_ARCH_AARCH64
// Filter values are even, so downshift by 1 to reduce precision requirements.
const int16x8_t x_filter = vshrq_n_s16(vld1q_s16(x_filter_ptr), 1);
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t shift_by_bits = vdupq_n_s16(-bits);
-
- src -= horiz_offset;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
if (h == 4) {
uint8x8_t d01, d23;
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, d0, d1, d2, d3;
@@ -628,42 +1133,32 @@
s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
d0 = convolve8_4x4(s0, s1, s2, s3, s4, s5, s6, s7, x_filter);
-
d1 = convolve8_4x4(s1, s2, s3, s4, s5, s6, s7, s8, x_filter);
-
d2 = convolve8_4x4(s2, s3, s4, s5, s6, s7, s8, s9, x_filter);
-
d3 = convolve8_4x4(s3, s4, s5, s6, s7, s8, s9, s10, x_filter);
- d01_temp = vqrshlq_s16(vcombine_s16(d0, d1), shift_round_0);
- d23_temp = vqrshlq_s16(vcombine_s16(d2, d3), shift_round_0);
+ d01_temp = vcombine_s16(d0, d1);
+ d23_temp = vcombine_s16(d2, d3);
- d01_temp = vqrshlq_s16(d01_temp, shift_by_bits);
- d23_temp = vqrshlq_s16(d23_temp, shift_by_bits);
+ d01_temp = vaddq_s16(d01_temp, horiz_const);
+ d23_temp = vaddq_s16(d23_temp, horiz_const);
- d01 = vqmovun_s16(d01_temp);
- d23 = vqmovun_s16(d23_temp);
+ // We halved the convolution filter values so subtract 1 from the right shift.
+ d01 = vqrshrun_n_s16(d01_temp, FILTER_BITS - 1);
+ d23 = vqrshrun_n_s16(d23_temp, FILTER_BITS - 1);
transpose_u8_4x4(&d01, &d23);
- if (w != 2) {
- vst1_lane_u32((uint32_t *)(dst + 0 * dst_stride), // 00 01 02 03
- vreinterpret_u32_u8(d01), 0);
- vst1_lane_u32((uint32_t *)(dst + 1 * dst_stride), // 10 11 12 13
- vreinterpret_u32_u8(d23), 0);
- vst1_lane_u32((uint32_t *)(dst + 2 * dst_stride), // 20 21 22 23
- vreinterpret_u32_u8(d01), 1);
- vst1_lane_u32((uint32_t *)(dst + 3 * dst_stride), // 30 31 32 33
- vreinterpret_u32_u8(d23), 1);
+ if (w == 2) {
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 2 * dst_stride, d01, 2);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
} else {
- vst1_lane_u16((uint16_t *)(dst + 0 * dst_stride), // 00 01
- vreinterpret_u16_u8(d01), 0);
- vst1_lane_u16((uint16_t *)(dst + 1 * dst_stride), // 10 11
- vreinterpret_u16_u8(d23), 0);
- vst1_lane_u16((uint16_t *)(dst + 2 * dst_stride), // 20 21
- vreinterpret_u16_u8(d01), 2);
- vst1_lane_u16((uint16_t *)(dst + 3 * dst_stride), // 30 31
- vreinterpret_u16_u8(d23), 2);
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 2 * dst_stride, d01, 1);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
}
s0 = s4;
@@ -678,18 +1173,18 @@
w -= 4;
} while (w > 0);
} else {
-#endif
+#endif // AOM_ARCH_AARCH64
int width;
const uint8_t *s;
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t s8, s9, s10;
uint8x8_t t4, t5, t6, t7;
-#endif
+#endif // AOM_ARCH_AARCH64
if (w <= 4) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
do {
load_u8_8x8(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
@@ -729,88 +1224,53 @@
__builtin_prefetch(src + 6 * src_stride);
__builtin_prefetch(src + 7 * src_stride);
t0 = convolve8_horiz_8x8(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- shift_round_0, shift_by_bits);
+ horiz_const);
t1 = convolve8_horiz_8x8(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- shift_round_0, shift_by_bits);
+ horiz_const);
t2 = convolve8_horiz_8x8(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- shift_round_0, shift_by_bits);
+ horiz_const);
t3 = convolve8_horiz_8x8(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- shift_round_0, shift_by_bits);
+ horiz_const);
transpose_u8_8x4(&t0, &t1, &t2, &t3);
- if ((w == 4) && (h > 4)) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t0),
- 0); // 00 01 02 03
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t1),
- 0); // 10 11 12 13
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t2),
- 0); // 20 21 22 23
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t3),
- 0); // 30 31 32 33
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t0),
- 1); // 40 41 42 43
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t1),
- 1); // 50 51 52 53
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t2),
- 1); // 60 61 62 63
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t3),
- 1); // 70 71 72 73
- dst += dst_stride;
- } else if ((w == 4) && (h == 2)) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t0),
- 0); // 00 01 02 03
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t1),
- 0); // 10 11 12 13
- dst += dst_stride;
- } else if ((w == 2) && (h > 4)) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t0),
- 0); // 00 01
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t1),
- 0); // 10 11
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t2),
- 0); // 20 21
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t3),
- 0); // 30 31
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t0),
- 2); // 40 41
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t1),
- 2); // 50 51
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t2),
- 2); // 60 61
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t3),
- 2); // 70 71
- dst += dst_stride;
- } else if ((w == 2) && (h == 2)) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t0),
- 0); // 00 01
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t1),
- 0); // 10 11
- dst += dst_stride;
+ if (w == 4) {
+ store_u8_4x1(dst + 0 * dst_stride, t0, 0);
+ store_u8_4x1(dst + 1 * dst_stride, t1, 0);
+ if (h > 4) {
+ store_u8_4x1(dst + 2 * dst_stride, t2, 0);
+ store_u8_4x1(dst + 3 * dst_stride, t3, 0);
+ store_u8_4x1(dst + 4 * dst_stride, t0, 1);
+ store_u8_4x1(dst + 5 * dst_stride, t1, 1);
+ store_u8_4x1(dst + 6 * dst_stride, t2, 1);
+ store_u8_4x1(dst + 7 * dst_stride, t3, 1);
+ }
+ } else if (w == 2) {
+ store_u8_2x1(dst + 0 * dst_stride, t0, 0);
+ store_u8_2x1(dst + 1 * dst_stride, t1, 0);
+ if (h > 4) {
+ store_u8_2x1(dst + 2 * dst_stride, t2, 0);
+ store_u8_2x1(dst + 3 * dst_stride, t3, 0);
+ store_u8_2x1(dst + 4 * dst_stride, t0, 2);
+ store_u8_2x1(dst + 5 * dst_stride, t1, 2);
+ store_u8_2x1(dst + 6 * dst_stride, t2, 2);
+ store_u8_2x1(dst + 7 * dst_stride, t3, 2);
+ }
}
+
+ dst += 8 * dst_stride;
h -= 8;
} while (h > 0);
-#else
+#else // !AOM_ARCH_AARCH64
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use a single
+ // rounding right shift by FILTER_BITS, instead of a first rounding right
+ // shift by ROUND0_BITS followed by a second rounding right shift by
+ // (FILTER_BITS - ROUND0_BITS).
+ // The outermost -1 is needed because we halved the filter values.
+ const int16x4_t horiz_const = vdup_n_s16(1 << ((ROUND0_BITS - 1) - 1));
int16x8_t tt0;
int16x4_t x0, x1, x2, x3, x4, x5, x6, x7;
- const int16x4_t shift_round_0_low = vget_low_s16(shift_round_0);
- const int16x4_t shift_by_bits_low = vget_low_s16(shift_by_bits);
+
do {
t0 = vld1_u8(src); // a0 a1 a2 a3 a4 a5 a6 a7
tt0 = vreinterpretq_s16_u16(vmovl_u8(t0));
@@ -830,24 +1290,23 @@
src += src_stride;
- t0 = convolve8_horiz_4x1(x0, x1, x2, x3, x4, x5, x6, x7, x_filter,
- shift_round_0_low, shift_by_bits_low);
+ t0 = convolve8_x_4x1(x0, x1, x2, x3, x4, x5, x6, x7, x_filter,
+ horiz_const);
if (w == 4) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(t0),
- 0); // 00 01 02 03
+ store_u8_4x1(dst, t0, 0);
dst += dst_stride;
} else if (w == 2) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(t0), 0); // 00 01
+ store_u8_2x1(dst, t0, 0);
dst += dst_stride;
}
h -= 1;
} while (h > 0);
-#endif
+#endif // AOM_ARCH_AARCH64
} else {
uint8_t *d;
int16x8_t s11;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t s12, s13, s14;
do {
__builtin_prefetch(src + 0 * src_stride);
@@ -893,35 +1352,30 @@
s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
t0 = convolve8_horiz_8x8(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t1 = convolve8_horiz_8x8(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t2 = convolve8_horiz_8x8(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t3 = convolve8_horiz_8x8(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t4 = convolve8_horiz_8x8(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t5 = convolve8_horiz_8x8(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t6 = convolve8_horiz_8x8(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
- shift_round_0, shift_by_bits);
-
+ horiz_const);
t7 = convolve8_horiz_8x8(s7, s8, s9, s10, s11, s12, s13, s14,
- x_filter, shift_round_0, shift_by_bits);
+ x_filter, horiz_const);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
if (h != 2) {
store_u8_8x8(d, dst_stride, t0, t1, t2, t3, t4, t5, t6, t7);
} else {
- store_row2_u8_8x8(d, dst_stride, t0, t1);
+ store_u8_8x2(d, dst_stride, t0, t1);
}
+
s0 = s8;
s1 = s9;
s2 = s10;
@@ -937,7 +1391,14 @@
dst += 8 * dst_stride;
h -= 8;
} while (h > 0);
-#else
+#else // !AOM_ARCH_AARCH64
+  // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use a single
+  // rounding right shift by FILTER_BITS - instead of a first rounding right
+  // shift by ROUND0_BITS, followed by a second rounding right shift by
+  // FILTER_BITS - ROUND0_BITS.
+  // The outermost -1 is needed because we halved the filter values.
+ const int16x8_t horiz_const = vdupq_n_s16(1 << ((ROUND0_BITS - 1) - 1));
+
do {
t0 = vld1_u8(src); // a0 a1 a2 a3 a4 a5 a6 a7
s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
@@ -962,7 +1423,8 @@
s7 = vextq_s16(s11, s7, 7); // a7 a8 a9 a10 a11 a12 a13 a14
t0 = convolve8_horiz_8x8(s11, s1, s2, s3, s4, s5, s6, s7, x_filter,
- shift_round_0, shift_by_bits);
+ horiz_const);
+
vst1_u8(d, t0);
s += 8;
@@ -973,40 +1435,467 @@
dst += dst_stride;
h -= 1;
} while (h > 0);
-#endif
+#endif // AOM_ARCH_AARCH64
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
}
-#endif
+#endif // AOM_ARCH_AARCH64
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
+
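+// Vertical 6-tap convolution. The six nonzero coefficients are assumed to
+// occupy taps 1-6 of the 8-tap kernel (taps 0 and 7 are zero), which is why
+// the first load below starts at src_ptr + src_stride: the caller applied an
+// 8-tap vertical offset, and one row of it is undone here.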
+static INLINE void convolve_y_sr_6tap_neon(const uint8_t *src_ptr,
+ int src_stride, uint8_t *dst_ptr,
+ const int dst_stride, int w, int h,
+ const int16x8_t y_filter_0_7) {
+ if (w <= 4) {
+ uint8x8_t t0, t1, t2, t3, t4, t5;
+ int16x4_t s0, s1, s2, s3, s4, s5, d0;
+ uint8x8_t d01;
+
+#if AOM_ARCH_AARCH64
+ uint8x8_t t6, t7, t8;
+ int16x4_t s6, s7, s8, d1, d2, d3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ const uint8_t *s = src_ptr + src_stride;
+ uint8_t *d = dst_ptr;
+
+ load_u8_8x5(s, src_stride, &t0, &t1, &t2, &t3, &t4);
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t4)));
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x4(s, src_stride, &t5, &t6, &t7, &t8);
+ s5 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t5)));
+ s6 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t6)));
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t7)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t8)));
+
+ d0 = convolve6_4x4(s0, s1, s2, s3, s4, s5, y_filter_0_7);
+ d1 = convolve6_4x4(s1, s2, s3, s4, s5, s6, y_filter_0_7);
+ d2 = convolve6_4x4(s2, s3, s4, s5, s6, s7, y_filter_0_7);
+ d3 = convolve6_4x4(s3, s4, s5, s6, s7, s8, y_filter_0_7);
+
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS - 1);
+ d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS - 1);
+
+ if (w == 2) {
+ store_u8_2x1(d + 0 * dst_stride, d01, 0);
+ store_u8_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(d + 2 * dst_stride, d23, 0);
+ store_u8_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(d + 0 * dst_stride, d01, 0);
+ store_u8_4x1(d + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(d + 2 * dst_stride, d23, 0);
+ store_u8_4x1(d + 3 * dst_stride, d23, 1);
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ t5 = vld1_u8(s);
+ s5 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t5)));
+
+ d0 = convolve6_4x4(s0, s1, s2, s3, s4, s5, y_filter_0_7);
+ d01 = vqrshrun_n_s16(vcombine_s16(d0, vdup_n_s16(0)), FILTER_BITS - 1);
+
+ if (w == 2) {
+ store_u8_2x1(d, d01, 0);
+ } else {
+ store_u8_4x1(d, d01, 0);
+ }
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h > 0);
+ } else {
+    // Width is a multiple of 8 and height is a multiple of 4 here.
+ uint8x8_t t0, t1, t2, t3, t4, t5;
+ int16x8_t s0, s1, s2, s3, s4, s5, dd0;
+ uint8x8_t d0;
+#if AOM_ARCH_AARCH64
+ uint8x8_t t6, t7, t8;
+ int16x8_t s6, s7, s8, dd1, dd2, dd3;
+ uint8x8_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int height = h;
+ const uint8_t *s = src_ptr + src_stride;
+ uint8_t *d = dst_ptr;
+
+ load_u8_8x5(s, src_stride, &t0, &t1, &t2, &t3, &t4);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x4(s, src_stride, &t5, &t6, &t7, &t8);
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t7));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t8));
+
+ dd0 = convolve6_8x4(s0, s1, s2, s3, s4, s5, y_filter_0_7);
+ dd1 = convolve6_8x4(s1, s2, s3, s4, s5, s6, y_filter_0_7);
+ dd2 = convolve6_8x4(s2, s3, s4, s5, s6, s7, y_filter_0_7);
+ dd3 = convolve6_8x4(s3, s4, s5, s6, s7, s8, y_filter_0_7);
+
+ d0 = vqrshrun_n_s16(dd0, FILTER_BITS - 1);
+ d1 = vqrshrun_n_s16(dd1, FILTER_BITS - 1);
+ d2 = vqrshrun_n_s16(dd2, FILTER_BITS - 1);
+ d3 = vqrshrun_n_s16(dd3, FILTER_BITS - 1);
+
+ if (h != 2) {
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+ } else {
+ store_u8_8x2(d, dst_stride, d0, d1);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t5 = vld1_u8(s);
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+
+ dd0 = convolve6_8x4(s0, s1, s2, s3, s4, s5, y_filter_0_7);
+ d0 = vqrshrun_n_s16(dd0, FILTER_BITS - 1);
+
+ vst1_u8(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
+static INLINE int16x4_t convolve12_y_4x4_s32(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+ int16x4_t sum;
+
+ sum = vmul_lane_s16(s0, y_filter_0_3, 0);
+ sum = vmla_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s4, y_filter_4_7, 0);
+
+ sum = vmla_lane_s16(sum, s7, y_filter_4_7, 3);
+ sum = vmla_lane_s16(sum, s8, y_filter_8_11, 0);
+ sum = vmla_lane_s16(sum, s9, y_filter_8_11, 1);
+ sum = vmla_lane_s16(sum, s10, y_filter_8_11, 2);
+ sum = vmla_lane_s16(sum, s11, y_filter_8_11, 3);
+
+  // Separate out the two filter values in the middle of the kernel that have
+  // the largest magnitude, and use saturating addition to prevent overflow.
+  // This means we can stay at 16-bit elements rather than widening everything
+  // to a 32-bit result, which would require twice the number of instructions.
+ sum = vqadd_s16(sum, vmul_lane_s16(s5, y_filter_4_7, 1));
+ sum = vqadd_s16(sum, vmul_lane_s16(s6, y_filter_4_7, 2));
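+  // (With 8-bit input, the ten smaller products are assumed to accumulate
+  // within int16 range via the plain vmla steps above; only adding the two
+  // large middle taps can overflow, hence the saturating adds here.)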
+
+ return sum;
+}
+
+static INLINE uint8x8_t convolve12_y_8x4_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
+ const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+ int16x8_t sum;
+
+ sum = vmulq_lane_s16(s0, y_filter_0_3, 0);
+ sum = vmlaq_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmlaq_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmlaq_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmlaq_lane_s16(sum, s4, y_filter_4_7, 0);
+
+ sum = vmlaq_lane_s16(sum, s7, y_filter_4_7, 3);
+ sum = vmlaq_lane_s16(sum, s8, y_filter_8_11, 0);
+ sum = vmlaq_lane_s16(sum, s9, y_filter_8_11, 1);
+ sum = vmlaq_lane_s16(sum, s10, y_filter_8_11, 2);
+ sum = vmlaq_lane_s16(sum, s11, y_filter_8_11, 3);
+
+  // Separate out the two filter values in the middle of the kernel that have
+  // the largest magnitude, and use saturating addition to prevent overflow.
+  // This means we can stay at 16-bit elements rather than widening everything
+  // to a 32-bit result, which would require twice the number of instructions.
+ sum = vqaddq_s16(sum, vmulq_lane_s16(s5, y_filter_4_7, 1));
+ sum = vqaddq_s16(sum, vmulq_lane_s16(s6, y_filter_4_7, 2));
+
+ return vqrshrun_n_s16(sum, FILTER_BITS);
+}
+
+static INLINE void convolve_y_sr_12tap_neon(const uint8_t *src_ptr,
+ int src_stride, uint8_t *dst_ptr,
+ int dst_stride, int w, int h,
+ const int16_t *y_filter_ptr) {
+  // Special case the following no-op filter, which can be applied as a
+  // simple copy:
+  // { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }
+ if (y_filter_ptr[5] == 128) {
+    // Undo the vertical offset in the calling function.
+ src_ptr += 5 * src_stride;
+
+ if (w <= 4) {
+ for (int i = 0; i < h; i += 2) {
+ uint8x8_t d0 = load_unaligned_u8(src_ptr + i * src_stride, src_stride);
+ if (w == 2) {
+ store_u8_2x1(dst_ptr + i * dst_stride, d0, 0);
+          store_u8_2x1(dst_ptr + (i + 1) * dst_stride, d0, 2);
+ } else if (w == 4) {
+ store_u8_4x1(dst_ptr + i * dst_stride, d0, 0);
+ store_u8_4x1(dst_ptr + (i + 1) * dst_stride, d0, 1);
+ }
+ }
+ } else {
+ for (int i = 0; i < h; i++) {
+ for (int j = 0; j < w; j += 8) {
+ uint8x8_t d0 = vld1_u8(src_ptr + i * src_stride + j);
+ vst1_u8(dst_ptr + i * dst_stride + j, d0);
+ }
+ }
+ }
+ return;
+ }
+
+ const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);
+ const int16x4_t y_filter_8_11 = vld1_s16(y_filter_ptr + 8);
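+  // Unlike the 6- and 8-tap paths, the coefficients are used at full
+  // precision here, since the 12-tap filters are not guaranteed to be even,
+  // so the final narrowing shifts by FILTER_BITS rather than FILTER_BITS - 1.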
+
+ if (w <= 4) {
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+ int16x4_t d0, d1, d2, d3;
+ int16x8_t dd01, dd23;
+ uint8x8_t d01, d23;
+
+ load_u8_8x11(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7,
+ &t8, &t9, &t10);
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t4)));
+ s5 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t5)));
+ s6 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t6)));
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t7)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t8)));
+ s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t9)));
+ s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t10)));
+
+ src_ptr += 11 * src_stride;
+
+ do {
+ load_u8_8x4(src_ptr, src_stride, &t11, &t12, &t13, &t14);
+ s11 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t11)));
+ s12 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t12)));
+ s13 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t13)));
+ s14 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t14)));
+
+ d0 = convolve12_y_4x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11);
+ d1 = convolve12_y_4x4_s32(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, y_filter_0_7, y_filter_8_11);
+ d2 = convolve12_y_4x4_s32(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, y_filter_0_7, y_filter_8_11);
+ d3 = convolve12_y_4x4_s32(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13,
+ s14, y_filter_0_7, y_filter_8_11);
+
+ dd01 = vcombine_s16(d0, d1);
+ dd23 = vcombine_s16(d2, d3);
+
+ d01 = vqrshrun_n_s16(dd01, FILTER_BITS);
+ d23 = vqrshrun_n_s16(dd23, FILTER_BITS);
+
+ if (w == 2) {
+ store_u8_2x1(dst_ptr + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst_ptr + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(dst_ptr + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst_ptr + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(dst_ptr + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst_ptr + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(dst_ptr + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst_ptr + 3 * dst_stride, d23, 1);
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14;
+ uint8x8_t d0, d1, d2, d3;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+
+ do {
+ const uint8_t *s = src_ptr;
+ uint8_t *d = dst_ptr;
+ int height = h;
+
+ load_u8_8x11(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7, &t8,
+ &t9, &t10);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t7));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t8));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t9));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t10));
+
+ s += 11 * src_stride;
+
+ do {
+ load_u8_8x4(s, src_stride, &t11, &t12, &t13, &t14);
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t11));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t12));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t13));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t14));
+
+ d0 = convolve12_y_8x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11);
+ d1 = convolve12_y_8x4_s32(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, y_filter_0_7, y_filter_8_11);
+ d2 = convolve12_y_8x4_s32(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, y_filter_0_7, y_filter_8_11);
+ d3 = convolve12_y_8x4_s32(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, s14, y_filter_0_7, y_filter_8_11);
+
+ if (h != 2) {
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+ } else {
+ store_u8_8x2(d, dst_stride, d0, d1);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
void av1_convolve_y_sr_neon(const uint8_t *src, int src_stride, uint8_t *dst,
int dst_stride, int w, int h,
const InterpFilterParams *filter_params_y,
const int subpel_y_qn) {
- if (filter_params_y->taps > 8) {
- av1_convolve_y_sr_c(src, src_stride, dst, dst_stride, w, h, filter_params_y,
- subpel_y_qn);
- return;
- }
+ const int y_filter_taps = get_filter_tap(filter_params_y, subpel_y_qn);
const int vert_offset = filter_params_y->taps / 2 - 1;
src -= vert_offset * src_stride;
const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_y, subpel_y_qn & SUBPEL_MASK);
+
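+  // Dispatch on the actual tap count for this subpel position: 12-tap filters
+  // take the dedicated path with full-precision coefficients; shorter filters
+  // are halved below, with fewer than 8 taps routed to the 6-tap path and
+  // 8 taps falling through to the main body of this function.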
+ if (y_filter_taps > 8) {
+ convolve_y_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ y_filter_ptr);
+ return;
+ }
+
+  // Filter values are even, so downshift by 1 to reduce precision requirements.
const int16x8_t y_filter = vshrq_n_s16(vld1q_s16(y_filter_ptr), 1);
+ if (y_filter_taps < 8) {
+ convolve_y_sr_6tap_neon(src, src_stride, dst, dst_stride, w, h, y_filter);
+ return;
+ }
+
if (w <= 4) {
uint8x8_t d01;
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x8_t d23;
int16x4_t s8, s9, s10, d1, d2, d3;
-#endif
+#endif // AOM_ARCH_AARCH64
s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(vld1_u8(src))));
src += src_stride;
s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(vld1_u8(src))));
@@ -1025,7 +1914,7 @@
do {
s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(vld1_u8(src))));
src += src_stride;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(vld1_u8(src))));
src += src_stride;
s9 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(vld1_u8(src))));
@@ -1048,41 +1937,23 @@
d01 = vqrshrun_n_s16(vcombine_s16(d0, d1), FILTER_BITS - 1);
d23 = vqrshrun_n_s16(vcombine_s16(d2, d3), FILTER_BITS - 1);
- if ((w == 4) && (h != 2)) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d01),
- 0); // 00 01 02 03
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d01),
- 1); // 10 11 12 13
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d23),
- 0); // 20 21 22 23
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d23),
- 1); // 30 31 32 33
- dst += dst_stride;
- } else if ((w == 4) && (h == 2)) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d01),
- 0); // 00 01 02 03
- dst += dst_stride;
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d01),
- 1); // 10 11 12 13
- dst += dst_stride;
- } else if ((w == 2) && (h != 2)) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d01), 0); // 00 01
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d01), 2); // 10 11
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d23), 0); // 20 21
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d23), 2); // 30 31
- dst += dst_stride;
- } else if ((w == 2) && (h == 2)) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d01), 0); // 00 01
- dst += dst_stride;
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d01), 2); // 10 11
- dst += dst_stride;
+
+ if (w == 2) {
+ store_u8_2x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(dst + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(dst + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst + 3 * dst_stride, d23, 1);
+ }
}
+
s0 = s4;
s1 = s5;
s2 = s6;
@@ -1090,8 +1961,9 @@
s4 = s8;
s5 = s9;
s6 = s10;
+ dst += 4 * dst_stride;
h -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
__builtin_prefetch(dst + 0 * dst_stride);
__builtin_prefetch(src + 0 * src_stride);
@@ -1100,11 +1972,9 @@
d01 = vqrshrun_n_s16(vcombine_s16(d0, d0), FILTER_BITS - 1);
if (w == 4) {
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d01), 0);
- dst += dst_stride;
+ store_u8_4x1(dst, d01, 0);
} else if (w == 2) {
- vst1_lane_u16((uint16_t *)dst, vreinterpret_u16_u8(d01), 0);
- dst += dst_stride;
+ store_u8_2x1(dst, d01, 0);
}
s0 = s1;
s1 = s2;
@@ -1113,8 +1983,9 @@
s4 = s5;
s5 = s6;
s6 = s7;
+ dst += dst_stride;
h -= 1;
-#endif
+#endif // AOM_ARCH_AARCH64
} while (h > 0);
} else {
int height;
@@ -1122,10 +1993,10 @@
uint8_t *d;
uint8x8_t t0;
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x8_t t1, t2, t3;
int16x8_t s8, s9, s10;
-#endif
+#endif // AOM_ARCH_AARCH64
do {
__builtin_prefetch(src + 0 * src_stride);
__builtin_prefetch(src + 1 * src_stride);
@@ -1155,7 +2026,7 @@
do {
s7 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
s += src_stride;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
s8 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
s += src_stride;
s9 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
@@ -1175,20 +2046,11 @@
t1 = convolve8_vert_8x4(s1, s2, s3, s4, s5, s6, s7, s8, y_filter);
t2 = convolve8_vert_8x4(s2, s3, s4, s5, s6, s7, s8, s9, y_filter);
t3 = convolve8_vert_8x4(s3, s4, s5, s6, s7, s8, s9, s10, y_filter);
+
if (h != 2) {
- vst1_u8(d, t0);
- d += dst_stride;
- vst1_u8(d, t1);
- d += dst_stride;
- vst1_u8(d, t2);
- d += dst_stride;
- vst1_u8(d, t3);
- d += dst_stride;
+ store_u8_8x4(d, dst_stride, t0, t1, t2, t3);
} else {
- vst1_u8(d, t0);
- d += dst_stride;
- vst1_u8(d, t1);
- d += dst_stride;
+ store_u8_8x2(d, dst_stride, t0, t1);
}
s0 = s4;
s1 = s5;
@@ -1197,8 +2059,9 @@
s4 = s8;
s5 = s9;
s6 = s10;
+ d += 4 * dst_stride;
height -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
__builtin_prefetch(d);
__builtin_prefetch(s);
@@ -1215,7 +2078,7 @@
s5 = s6;
s6 = s7;
height -= 1;
-#endif
+#endif // AOM_ARCH_AARCH64
} while (height > 0);
src += 8;
dst += 8;
@@ -1224,13 +2087,12 @@
}
}
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
-static INLINE int16x4_t convolve12_4_usdot(uint8x16_t samples,
- const int8x16_t filters,
- const uint8x16x3_t permute_tbl,
- const int32x4_t horiz_const,
- const int32x4_t shift_round_0) {
+static INLINE int16x4_t convolve12_horiz_4_usdot(uint8x16_t samples,
+ const int8x16_t filters,
+ const uint8x16x3_t permute_tbl,
+ int32x4_t horiz_const) {
uint8x16_t permuted_samples[3];
int32x4_t sum;
@@ -1248,17 +2110,14 @@
sum = vusdotq_laneq_s32(sum, permuted_samples[2], filters, 2);
/* Narrow and re-pack. */
- sum = vqrshlq_s32(sum, shift_round_0);
-
- return vmovn_s32(sum);
+ return vshrn_n_s32(sum, ROUND0_BITS);
}
-static INLINE int16x8_t convolve12_8_usdot(uint8x16_t samples0,
- uint8x16_t samples1,
- const int8x16_t filters,
- const uint8x16x3_t permute_tbl,
- const int32x4_t horiz_const,
- const int32x4_t shift_round_0) {
+static INLINE int16x8_t convolve12_horiz_8_usdot(uint8x16_t samples0,
+ uint8x16_t samples1,
+ const int8x16_t filters,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
uint8x16_t permuted_samples[4];
int32x4_t sum[2];
@@ -1282,16 +2141,14 @@
sum[1] = vusdotq_laneq_s32(sum[1], permuted_samples[3], filters, 2);
/* Narrow and re-pack. */
- sum[0] = vqrshlq_s32(sum[0], shift_round_0);
- sum[1] = vqrshlq_s32(sum[1], shift_round_0);
-
- return vcombine_s16(vmovn_s32(sum[0]), vmovn_s32(sum[1]));
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS),
+ vshrn_n_s32(sum[1], ROUND0_BITS));
}
-static INLINE void av1_convolve_2d_sr_horiz_12tap_neon(
+static INLINE void convolve_2d_sr_horiz_12tap_neon(
const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
const int dst_stride, int w, int h, const int16x8_t x_filter_0_7,
- const int16x4_t x_filter_8_11, const int round_0) {
+ const int16x4_t x_filter_8_11) {
const int bd = 8;
// Special case the following no-op filter as 128 won't fit into the
@@ -1299,7 +2156,6 @@
// { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }
if (vgetq_lane_s16(x_filter_0_7, 5) == 128) {
const int16x8_t horiz_const = vdupq_n_s16((1 << (bd - 1)));
- const int16x8_t shift_round_0 = vdupq_n_s16(FILTER_BITS - round_0);
// Undo the horizontal offset in the calling function.
src_ptr += 5;
@@ -1307,10 +2163,10 @@
for (int j = 0; j < w; j += 8) {
uint8x8_t s0 = vld1_u8(src_ptr + i * src_stride + j);
uint16x8_t t0 = vaddw_u8(vreinterpretq_u16_s16(horiz_const), s0);
- int16x8_t d0 = vqrshlq_s16(vreinterpretq_s16_u16(t0), shift_round_0);
+ int16x8_t d0 =
+ vshlq_n_s16(vreinterpretq_s16_u16(t0), FILTER_BITS - ROUND0_BITS);
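+        // For this identity filter the intermediate value is simply
+        // (s + (1 << (bd - 1))) << (FILTER_BITS - ROUND0_BITS), i.e. the same
+        // offset-and-shift format the full 12-tap pass produces.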
if (w == 2) {
- vst1q_lane_s32((int32_t *)(dst_ptr + i * dst_stride),
- vreinterpretq_s32_s16(d0), 0);
+ store_s16_2x1(dst_ptr + i * dst_stride, vget_low_s16(d0), 0);
} else if (w == 4) {
vst1_s16(dst_ptr + i * dst_stride, vget_low_s16(d0));
} else {
@@ -1325,9 +2181,10 @@
};
const int8x16_t x_filter = vcombine_s8(vmovn_s16(x_filter_s16.val[0]),
vmovn_s16(x_filter_s16.val[1]));
-
- const int32x4_t horiz_const = vdupq_n_s32((1 << (bd + FILTER_BITS - 1)));
- const int32x4_t shift_round_0 = vdupq_n_s32(-round_0);
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use non-rounding shifts
+ // - which are generally faster than rounding shifts on modern CPUs.
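+  // For example, with ROUND0_BITS == 3, folding the + 4 rounding bias into
+  // the accumulator up front lets the plain vshrn_n_s32(sum, ROUND0_BITS)
+  // used below behave exactly like a rounding shift.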
+ const int32x4_t horiz_const =
+ vdupq_n_s32((1 << (bd + FILTER_BITS - 1)) + (1 << (ROUND0_BITS - 1)));
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
if (w <= 4) {
@@ -1340,34 +2197,20 @@
uint8x16_t s0, s1, s2, s3;
int16x4_t d0, d1, d2, d3;
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve12_4_usdot(s0, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d1 = convolve12_4_usdot(s1, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d2 = convolve12_4_usdot(s2, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d3 = convolve12_4_usdot(s3, x_filter, permute_tbl, horiz_const,
- shift_round_0);
+ d0 = convolve12_horiz_4_usdot(s0, x_filter, permute_tbl, horiz_const);
+ d1 = convolve12_horiz_4_usdot(s1, x_filter, permute_tbl, horiz_const);
+ d2 = convolve12_horiz_4_usdot(s2, x_filter, permute_tbl, horiz_const);
+ d3 = convolve12_horiz_4_usdot(s3, x_filter, permute_tbl, horiz_const);
if (w == 2) {
- vst1_lane_s32((int32_t *)(d + 0 * dst_stride),
- vreinterpret_s32_s16(d0), 0);
- vst1_lane_s32((int32_t *)(d + 1 * dst_stride),
- vreinterpret_s32_s16(d1), 0);
- vst1_lane_s32((int32_t *)(d + 2 * dst_stride),
- vreinterpret_s32_s16(d2), 0);
- vst1_lane_s32((int32_t *)(d + 3 * dst_stride),
- vreinterpret_s32_s16(d3), 0);
+ store_s16_2x1(d + 0 * dst_stride, d0, 0);
+ store_s16_2x1(d + 1 * dst_stride, d1, 0);
+ store_s16_2x1(d + 2 * dst_stride, d2, 0);
+ store_s16_2x1(d + 3 * dst_stride, d3, 0);
} else {
- vst1_s16(d + 0 * dst_stride, d0);
- vst1_s16(d + 1 * dst_stride, d1);
- vst1_s16(d + 2 * dst_stride, d2);
- vst1_s16(d + 3 * dst_stride, d3);
+ store_s16_4x4(d, dst_stride, d0, d1, d2, d3);
}
s += 4;
@@ -1391,11 +2234,10 @@
s0 = vld1q_u8(s);
- d0 = convolve12_4_usdot(s0, x_filter, permute_tbl, horiz_const,
- shift_round_0);
+ d0 = convolve12_horiz_4_usdot(s0, x_filter, permute_tbl, horiz_const);
if (w == 2) {
- vst1_lane_s32((int32_t *)d, vreinterpret_s32_s16(d0), 0);
+ store_s16_2x1(d, d0, 0);
} else {
vst1_s16(d, d0);
}
@@ -1418,28 +2260,19 @@
uint8x16_t s0[2], s1[2], s2[2], s3[2];
int16x8_t d0, d1, d2, d3;
- s0[0] = vld1q_u8(s + 0 * src_stride);
- s1[0] = vld1q_u8(s + 1 * src_stride);
- s2[0] = vld1q_u8(s + 2 * src_stride);
- s3[0] = vld1q_u8(s + 3 * src_stride);
- s0[1] = vld1q_u8(s + 0 * src_stride + 4);
- s1[1] = vld1q_u8(s + 1 * src_stride + 4);
- s2[1] = vld1q_u8(s + 2 * src_stride + 4);
- s3[1] = vld1q_u8(s + 3 * src_stride + 4);
+ load_u8_16x4(s, src_stride, &s0[0], &s1[0], &s2[0], &s3[0]);
+ load_u8_16x4(s + 4, src_stride, &s0[1], &s1[1], &s2[1], &s3[1]);
- d0 = convolve12_8_usdot(s0[0], s0[1], x_filter, permute_tbl,
- horiz_const, shift_round_0);
- d1 = convolve12_8_usdot(s1[0], s1[1], x_filter, permute_tbl,
- horiz_const, shift_round_0);
- d2 = convolve12_8_usdot(s2[0], s2[1], x_filter, permute_tbl,
- horiz_const, shift_round_0);
- d3 = convolve12_8_usdot(s3[0], s3[1], x_filter, permute_tbl,
- horiz_const, shift_round_0);
+ d0 = convolve12_horiz_8_usdot(s0[0], s0[1], x_filter, permute_tbl,
+ horiz_const);
+ d1 = convolve12_horiz_8_usdot(s1[0], s1[1], x_filter, permute_tbl,
+ horiz_const);
+ d2 = convolve12_horiz_8_usdot(s2[0], s2[1], x_filter, permute_tbl,
+ horiz_const);
+ d3 = convolve12_horiz_8_usdot(s3[0], s3[1], x_filter, permute_tbl,
+ horiz_const);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
@@ -1463,8 +2296,8 @@
s0[0] = vld1q_u8(s);
s0[1] = vld1q_u8(s + 4);
- d0 = convolve12_8_usdot(s0[0], s0[1], x_filter, permute_tbl,
- horiz_const, shift_round_0);
+ d0 = convolve12_horiz_8_usdot(s0[0], s0[1], x_filter, permute_tbl,
+ horiz_const);
vst1q_s16(d, d0);
@@ -1480,82 +2313,12 @@
}
}
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-static INLINE int16x4_t convolve12_4_sdot(uint8x16_t samples,
- const int8x16_t filters,
- const int32x4_t correction,
- const uint8x16_t range_limit,
- const uint8x16x3_t permute_tbl,
- const int32x4_t shift_round_0) {
- int8x16_t clamped_samples, permuted_samples[3];
- int32x4_t sum;
-
- /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
- clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
-
- /* Permute samples ready for dot product. */
- /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
- permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
- /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
- permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
- /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
- permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
-
- /* Accumulate dot product into 'correction' to account for range clamp. */
- /* First 4 output values. */
- sum = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
- sum = vdotq_laneq_s32(sum, permuted_samples[1], filters, 1);
- sum = vdotq_laneq_s32(sum, permuted_samples[2], filters, 2);
-
- /* Narrow and re-pack. */
- sum = vqrshlq_s32(sum, shift_round_0);
-
- return vmovn_s32(sum);
-}
-
-static INLINE int16x8_t convolve12_8_sdot(
- uint8x16_t samples0, uint8x16_t samples1, const int8x16_t filters,
- const int32x4_t correction, const uint8x16_t range_limit,
- const uint8x16x3_t permute_tbl, const int32x4_t shift_round_0) {
- int8x16_t clamped_samples[2], permuted_samples[4];
- int32x4_t sum[2];
-
- /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
- clamped_samples[0] = vreinterpretq_s8_u8(vsubq_u8(samples0, range_limit));
- clamped_samples[1] = vreinterpretq_s8_u8(vsubq_u8(samples1, range_limit));
-
- /* Permute samples ready for dot product. */
- /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
- permuted_samples[0] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[0]);
- /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
- permuted_samples[1] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[1]);
- /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
- permuted_samples[2] = vqtbl1q_s8(clamped_samples[0], permute_tbl.val[2]);
- /* {12, 13, 14, 15, 13, 14, 15, 16, 14, 15, 16, 17, 15, 16, 17, 18 } */
- permuted_samples[3] = vqtbl1q_s8(clamped_samples[1], permute_tbl.val[2]);
-
- /* Accumulate dot product into 'correction' to account for range clamp. */
- /* First 4 output values. */
- sum[0] = vdotq_laneq_s32(correction, permuted_samples[0], filters, 0);
- sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[1], filters, 1);
- sum[0] = vdotq_laneq_s32(sum[0], permuted_samples[2], filters, 2);
- /* Second 4 output values. */
- sum[1] = vdotq_laneq_s32(correction, permuted_samples[1], filters, 0);
- sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[2], filters, 1);
- sum[1] = vdotq_laneq_s32(sum[1], permuted_samples[3], filters, 2);
-
- /* Narrow and re-pack. */
- sum[0] = vqrshlq_s32(sum[0], shift_round_0);
- sum[1] = vqrshlq_s32(sum[1], shift_round_0);
-
- return vcombine_s16(vmovn_s32(sum[0]), vmovn_s32(sum[1]));
-}
-
-static INLINE void av1_convolve_2d_sr_horiz_12tap_neon(
+static INLINE void convolve_2d_sr_horiz_12tap_neon(
const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
const int dst_stride, int w, int h, const int16x8_t x_filter_0_7,
- const int16x4_t x_filter_8_11, const int round_0) {
+ const int16x4_t x_filter_8_11) {
const int bd = 8;
// Special case the following no-op filter as 128 won't fit into the
@@ -1563,7 +2326,6 @@
// { 0, 0, 0, 0, 0, 128, 0, 0, 0, 0, 0, 0 }
if (vgetq_lane_s16(x_filter_0_7, 5) == 128) {
const int16x8_t horiz_const = vdupq_n_s16((1 << (bd - 1)));
- const int16x8_t shift_round_0 = vdupq_n_s16(FILTER_BITS - round_0);
// Undo the horizontal offset in the calling function.
src_ptr += 5;
@@ -1571,10 +2333,10 @@
for (int j = 0; j < w; j += 8) {
uint8x8_t s0 = vld1_u8(src_ptr + i * src_stride + j);
uint16x8_t t0 = vaddw_u8(vreinterpretq_u16_s16(horiz_const), s0);
- int16x8_t d0 = vqrshlq_s16(vreinterpretq_s16_u16(t0), shift_round_0);
+ int16x8_t d0 =
+ vshlq_n_s16(vreinterpretq_s16_u16(t0), FILTER_BITS - ROUND0_BITS);
if (w == 2) {
- vst1q_lane_s32((int32_t *)(dst_ptr + i * dst_stride),
- vreinterpretq_s32_s16(d0), 0);
+ store_s16_2x1(dst_ptr + i * dst_stride, vget_low_s16(d0), 0);
} else if (w == 4) {
vst1_s16(dst_ptr + i * dst_stride, vget_low_s16(d0));
} else {
@@ -1583,8 +2345,6 @@
}
}
} else {
- const int32x4_t shift_round_0 = vdupq_n_s32(-round_0);
-
// Narrow filter values to 8-bit.
const int16x8x2_t x_filter_s16 = {
{ x_filter_0_7, vcombine_s16(x_filter_8_11, vdup_n_s16(0)) }
@@ -1592,8 +2352,11 @@
const int8x16_t x_filter = vcombine_s8(vmovn_s16(x_filter_s16.val[0]),
vmovn_s16(x_filter_s16.val[1]));
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use non-rounding shifts
+ // - which are generally faster than rounding shifts on modern CPUs.
+ const int32_t horiz_const =
+ ((1 << (bd + FILTER_BITS - 1)) + (1 << (ROUND0_BITS - 1)));
// Dot product constants.
- const int32_t horiz_const = (1 << (bd + FILTER_BITS - 1));
const int32x4_t correct_tmp =
vaddq_s32(vpaddlq_s16(vshlq_n_s16(x_filter_s16.val[0], 7)),
vpaddlq_s16(vshlq_n_s16(x_filter_s16.val[1], 7)));
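+  // Shifting each coefficient left by 7 and summing gives 128 * sum(filter),
+  // which compensates for clamping the samples to [-128, 127] (by subtracting
+  // 128) ahead of the signed dot product.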
@@ -1612,34 +2375,24 @@
uint8x16_t s0, s1, s2, s3;
int16x4_t d0, d1, d2, d3;
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve12_4_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d1 = convolve12_4_sdot(s1, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d2 = convolve12_4_sdot(s2, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d3 = convolve12_4_sdot(s3, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ d0 = convolve12_horiz_4_sdot(s0, x_filter, correction, range_limit,
+ permute_tbl);
+ d1 = convolve12_horiz_4_sdot(s1, x_filter, correction, range_limit,
+ permute_tbl);
+ d2 = convolve12_horiz_4_sdot(s2, x_filter, correction, range_limit,
+ permute_tbl);
+ d3 = convolve12_horiz_4_sdot(s3, x_filter, correction, range_limit,
+ permute_tbl);
if (w == 2) {
- vst1_lane_s32((int32_t *)(d + 0 * dst_stride),
- vreinterpret_s32_s16(d0), 0);
- vst1_lane_s32((int32_t *)(d + 1 * dst_stride),
- vreinterpret_s32_s16(d1), 0);
- vst1_lane_s32((int32_t *)(d + 2 * dst_stride),
- vreinterpret_s32_s16(d2), 0);
- vst1_lane_s32((int32_t *)(d + 3 * dst_stride),
- vreinterpret_s32_s16(d3), 0);
+ store_s16_2x1(d + 0 * dst_stride, d0, 0);
+ store_s16_2x1(d + 1 * dst_stride, d1, 0);
+ store_s16_2x1(d + 2 * dst_stride, d2, 0);
+ store_s16_2x1(d + 3 * dst_stride, d3, 0);
} else {
- vst1_s16(d + 0 * dst_stride, d0);
- vst1_s16(d + 1 * dst_stride, d1);
- vst1_s16(d + 2 * dst_stride, d2);
- vst1_s16(d + 3 * dst_stride, d3);
+ store_s16_4x4(d, dst_stride, d0, d1, d2, d3);
}
s += 4;
@@ -1663,11 +2416,11 @@
s0 = vld1q_u8(s);
- d0 = convolve12_4_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ d0 = convolve12_horiz_4_sdot(s0, x_filter, correction, range_limit,
+ permute_tbl);
if (w == 2) {
- vst1_lane_s32((int32_t *)d, vreinterpret_s32_s16(d0), 0);
+ store_s16_2x1(d, d0, 0);
} else {
vst1_s16(d, d0);
}
@@ -1690,28 +2443,19 @@
uint8x16_t s0[2], s1[2], s2[2], s3[2];
int16x8_t d0, d1, d2, d3;
- s0[0] = vld1q_u8(s + 0 * src_stride);
- s1[0] = vld1q_u8(s + 1 * src_stride);
- s2[0] = vld1q_u8(s + 2 * src_stride);
- s3[0] = vld1q_u8(s + 3 * src_stride);
- s0[1] = vld1q_u8(s + 0 * src_stride + 4);
- s1[1] = vld1q_u8(s + 1 * src_stride + 4);
- s2[1] = vld1q_u8(s + 2 * src_stride + 4);
- s3[1] = vld1q_u8(s + 3 * src_stride + 4);
+ load_u8_16x4(s, src_stride, &s0[0], &s1[0], &s2[0], &s3[0]);
+ load_u8_16x4(s + 4, src_stride, &s0[1], &s1[1], &s2[1], &s3[1]);
- d0 = convolve12_8_sdot(s0[0], s0[1], x_filter, correction,
- range_limit, permute_tbl, shift_round_0);
- d1 = convolve12_8_sdot(s1[0], s1[1], x_filter, correction,
- range_limit, permute_tbl, shift_round_0);
- d2 = convolve12_8_sdot(s2[0], s2[1], x_filter, correction,
- range_limit, permute_tbl, shift_round_0);
- d3 = convolve12_8_sdot(s3[0], s3[1], x_filter, correction,
- range_limit, permute_tbl, shift_round_0);
+ d0 = convolve12_horiz_8_sdot(s0[0], s0[1], x_filter, correction,
+ range_limit, permute_tbl);
+ d1 = convolve12_horiz_8_sdot(s1[0], s1[1], x_filter, correction,
+ range_limit, permute_tbl);
+ d2 = convolve12_horiz_8_sdot(s2[0], s2[1], x_filter, correction,
+ range_limit, permute_tbl);
+ d3 = convolve12_horiz_8_sdot(s3[0], s3[1], x_filter, correction,
+ range_limit, permute_tbl);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
@@ -1735,8 +2479,8 @@
s0[0] = vld1q_u8(s);
s0[1] = vld1q_u8(s + 4);
- d0 = convolve12_8_sdot(s0[0], s0[1], x_filter, correction,
- range_limit, permute_tbl, shift_round_0);
+ d0 = convolve12_horiz_8_sdot(s0[0], s0[1], x_filter, correction,
+ range_limit, permute_tbl);
vst1q_s16(d, d0);
@@ -1752,7 +2496,7 @@
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
static INLINE int16x4_t convolve12_horiz_4x4_s16(
const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
@@ -1760,7 +2504,7 @@
const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
const int16x8_t x_filter_0_7, const int16x4_t x_filter_8_11,
- const int32x4_t horiz_const, const int32x4_t shift_round_0) {
+ const int32x4_t horiz_const) {
const int16x4_t x_filter_0_3 = vget_low_s16(x_filter_0_7);
const int16x4_t x_filter_4_7 = vget_high_s16(x_filter_0_7);
int32x4_t sum;
@@ -1779,9 +2523,7 @@
sum = vmlal_lane_s16(sum, s10, x_filter_8_11, 2);
sum = vmlal_lane_s16(sum, s11, x_filter_8_11, 3);
- sum = vqrshlq_s32(sum, shift_round_0);
-
- return vmovn_s32(sum);
+ return vshrn_n_s32(sum, ROUND0_BITS);
}
// 4 column per iteration horizontal filtering for 12-tap convolve_2d_sr.
@@ -1789,8 +2531,7 @@
static INLINE void horiz_filter_12tap_w4_single_row(
const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
const int dst_stride, int w, int h, const int16x8_t x_filter_0_7,
- const int16x4_t x_filter_8_11, const int32x4_t horiz_const,
- const int32x4_t shift_round_0) {
+ const int16x4_t x_filter_8_11, const int32x4_t horiz_const) {
do {
const uint8_t *s = src_ptr;
int16_t *d = dst_ptr;
@@ -1822,10 +2563,10 @@
d0 = convolve12_horiz_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
s11, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
if (w == 2) {
- vst1_lane_s32((int32_t *)d, vreinterpret_s32_s16(d0), 0);
+ store_s16_2x1(d, d0, 0);
} else {
vst1_s16(d, d0);
}
@@ -1841,15 +2582,17 @@
} while (h > 0);
}
-static INLINE void av1_convolve_2d_sr_horiz_12tap_neon(
+static INLINE void convolve_2d_sr_horiz_12tap_neon(
const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
const int dst_stride, int w, int h, const int16x8_t x_filter_0_7,
- const int16x4_t x_filter_8_11, const int round_0) {
+ const int16x4_t x_filter_8_11) {
const int bd = 8;
- const int32x4_t shift_round_0 = vdupq_n_s32(-(round_0));
- const int32x4_t horiz_const = vdupq_n_s32((1 << (bd + FILTER_BITS - 1)));
+ // This shim of 1 << (ROUND0_BITS - 1) enables us to use non-rounding shifts -
+ // which are generally faster than rounding shifts on modern CPUs.
+ const int32x4_t horiz_const =
+ vdupq_n_s32((1 << (bd + FILTER_BITS - 1)) + (1 << (ROUND0_BITS - 1)));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
do {
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
uint8x8_t t0, t1, t2, t3;
@@ -1892,33 +2635,26 @@
d0 = convolve12_horiz_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
s11, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
d1 = convolve12_horiz_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
s11, s12, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
d2 = convolve12_horiz_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
s12, s13, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
d3 = convolve12_horiz_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
s13, s14, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
transpose_s16_4x4d(&d0, &d1, &d2, &d3);
if (w == 2) {
- vst1_lane_s32((int32_t *)(d + 0 * dst_stride), vreinterpret_s32_s16(d0),
- 0);
- vst1_lane_s32((int32_t *)(d + 1 * dst_stride), vreinterpret_s32_s16(d1),
- 0);
- vst1_lane_s32((int32_t *)(d + 2 * dst_stride), vreinterpret_s32_s16(d2),
- 0);
- vst1_lane_s32((int32_t *)(d + 3 * dst_stride), vreinterpret_s32_s16(d3),
- 0);
+ store_s16_2x1(d + 0 * dst_stride, d0, 0);
+ store_s16_2x1(d + 1 * dst_stride, d1, 0);
+ store_s16_2x1(d + 2 * dst_stride, d2, 0);
+ store_s16_2x1(d + 3 * dst_stride, d3, 0);
} else {
- vst1_s16((d + 0 * dst_stride), d0);
- vst1_s16((d + 1 * dst_stride), d1);
- vst1_s16((d + 2 * dst_stride), d2);
- vst1_s16((d + 3 * dst_stride), d3);
+ store_s16_4x4(d, dst_stride, d0, d1, d2, d3);
}
s0 = s4;
@@ -1946,177 +2682,21 @@
if (h) {
horiz_filter_12tap_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride,
w, h, x_filter_0_7, x_filter_8_11,
- horiz_const, shift_round_0);
+ horiz_const);
}
-#else // !defined(__aarch64__)
+#else // !AOM_ARCH_AARCH64
horiz_filter_12tap_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w,
- h, x_filter_0_7, x_filter_8_11, horiz_const,
- shift_round_0);
-#endif // defined(__aarch64__)
+ h, x_filter_0_7, x_filter_8_11, horiz_const);
+#endif // AOM_ARCH_AARCH64
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-static INLINE void av1_convolve_2d_sr_vert_12tap_neon(
- int16_t *src_ptr, int src_stride, uint8_t *dst_ptr, int dst_stride, int w,
- int h, const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
- ConvolveParams *conv_params) {
- const int bd = 8;
- const int16_t round_bits =
- FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
- const int16x8_t vec_round_bits = vdupq_n_s16(-round_bits);
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int32_t sub_const = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int32x4_t round_shift_vec = vdupq_n_s32(-(conv_params->round_1));
- const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
- const int32x4_t sub_const_vec = vdupq_n_s32(sub_const);
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
- if (w <= 4) {
- int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
- int16x4_t d0, d1, d2, d3;
- int16x8_t dd01, dd23;
- uint8x8_t d01, d23;
-
- load_s16_4x8(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
- src_ptr += (8 * src_stride);
- load_s16_4x4(src_ptr, src_stride, &s8, &s9, &s10, &s11);
- src_ptr += (3 * src_stride);
-
- do {
- load_s16_4x4(src_ptr, src_stride, &s11, &s12, &s13, &s14);
- src_ptr += 4 * src_stride;
-
- d0 = convolve12_vert_4x4_s32(
- s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, y_filter_0_7,
- y_filter_8_11, round_shift_vec, offset_const, sub_const_vec);
- d1 = convolve12_vert_4x4_s32(
- s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, y_filter_0_7,
- y_filter_8_11, round_shift_vec, offset_const, sub_const_vec);
- d2 = convolve12_vert_4x4_s32(
- s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, y_filter_0_7,
- y_filter_8_11, round_shift_vec, offset_const, sub_const_vec);
- d3 = convolve12_vert_4x4_s32(
- s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, y_filter_0_7,
- y_filter_8_11, round_shift_vec, offset_const, sub_const_vec);
-
- dd01 = vqrshlq_s16(vcombine_s16(d0, d1), vec_round_bits);
- dd23 = vqrshlq_s16(vcombine_s16(d2, d3), vec_round_bits);
-
- d01 = vqmovun_s16(dd01);
- d23 = vqmovun_s16(dd23);
-
- if (w == 2) {
- vst1_lane_u16((uint16_t *)dst_ptr, vreinterpret_u16_u8(d01), 0);
- dst_ptr += dst_stride;
- vst1_lane_u16((uint16_t *)dst_ptr, vreinterpret_u16_u8(d01), 2);
- dst_ptr += dst_stride;
- if (h != 2) {
- vst1_lane_u16((uint16_t *)dst_ptr, vreinterpret_u16_u8(d23), 0);
- dst_ptr += dst_stride;
- vst1_lane_u16((uint16_t *)dst_ptr, vreinterpret_u16_u8(d23), 2);
- dst_ptr += dst_stride;
- }
- } else {
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_u8(d01), 0);
- dst_ptr += dst_stride;
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_u8(d01), 1);
- dst_ptr += dst_stride;
- if (h != 2) {
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_u8(d23), 0);
- dst_ptr += dst_stride;
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_u8(d23), 1);
- dst_ptr += dst_stride;
- }
- }
-
- s0 = s4;
- s1 = s5;
- s2 = s6;
- s3 = s7;
- s4 = s8;
- s5 = s9;
- s6 = s10;
- s7 = s11;
- s8 = s12;
- s9 = s13;
- s10 = s14;
- h -= 4;
- } while (h > 0);
-
- } else {
- do {
- int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
- uint8x8_t d0, d1, d2, d3;
-
- int16_t *s = src_ptr;
- uint8_t *d = dst_ptr;
-
- int height = h;
-
- load_s16_8x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
- s += (8 * src_stride);
- load_s16_8x4(s, src_stride, &s8, &s9, &s10, &s11);
- s += (3 * src_stride);
-
- do {
- load_s16_8x4(s, src_stride, &s11, &s12, &s13, &s14);
- s += 4 * src_stride;
-
- d0 = convolve12_vert_8x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9,
- s10, s11, y_filter_0_7, y_filter_8_11,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d1 = convolve12_vert_8x4_s32(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
- s11, s12, y_filter_0_7, y_filter_8_11,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d2 = convolve12_vert_8x4_s32(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
- s12, s13, y_filter_0_7, y_filter_8_11,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d3 = convolve12_vert_8x4_s32(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
- s13, s14, y_filter_0_7, y_filter_8_11,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
-
- vst1_u8(d, d0);
- d += dst_stride;
- vst1_u8(d, d1);
- d += dst_stride;
- if (h != 2) {
- vst1_u8(d, d2);
- d += dst_stride;
- vst1_u8(d, d3);
- d += dst_stride;
- }
-
- s0 = s4;
- s1 = s5;
- s2 = s6;
- s3 = s7;
- s4 = s8;
- s5 = s9;
- s6 = s10;
- s7 = s11;
- s8 = s12;
- s9 = s13;
- s10 = s14;
- height -= 4;
- } while (height > 0);
-
- src_ptr += 8;
- dst_ptr += 8;
- w -= 8;
- } while (w > 0);
- }
-}
-
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
-
-static INLINE void av1_convolve_2d_sr_horiz_neon(
+static INLINE void convolve_2d_sr_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, int im_stride, int w,
- int im_h, const int16x8_t x_filter_s16, const int round_0) {
+ int im_h, const int16x8_t x_filter_s16) {
const int bd = 8;
const uint8_t *src_ptr = src;
@@ -2128,13 +2708,14 @@
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int8x8_t x_filter = vshrn_n_s16(x_filter_s16, 1);
- const int32x4_t horiz_const = vdupq_n_s32(1 << (bd + FILTER_BITS - 2));
-
- assert(round_0 > 0);
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // The outermost -1 is needed because we halved the filter values.
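+  // With bd == 8 and the AV1 defaults FILTER_BITS == 7 and ROUND0_BITS == 3,
+  // this works out to (1 << 13) + 2: the offset that keeps the intermediate
+  // value non-negative plus the halved rounding bias.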
+ const int32x4_t horiz_const = vdupq_n_s32((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
if (w <= 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0 - 1));
uint8x16_t s0, s1, s2, s3;
int32x4_t t0, t1, t2, t3;
int16x4_t d0, d1, d2, d3;
@@ -2142,32 +2723,26 @@
do {
assert(height >= 4);
- load_u8_8x16(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
t0 = convolve8_4_usdot(s0, x_filter, permute_tbl, horiz_const);
t1 = convolve8_4_usdot(s1, x_filter, permute_tbl, horiz_const);
t2 = convolve8_4_usdot(s2, x_filter, permute_tbl, horiz_const);
t3 = convolve8_4_usdot(s3, x_filter, permute_tbl, horiz_const);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
- d1 = vqrshl_s16(vmovn_s32(t1), shift_round_0);
- d2 = vqrshl_s16(vmovn_s32(t2), shift_round_0);
- d3 = vqrshl_s16(vmovn_s32(t3), shift_round_0);
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrn_n_s32(t0, ROUND0_BITS - 1);
+ d1 = vshrn_n_s32(t1, ROUND0_BITS - 1);
+ d2 = vshrn_n_s32(t2, ROUND0_BITS - 1);
+ d3 = vshrn_n_s32(t3, ROUND0_BITS - 1);
if (w == 2) {
- vst1_lane_u32((uint32_t *)(dst_ptr + 0 * dst_stride),
- vreinterpret_u32_s16(d0), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 1 * dst_stride),
- vreinterpret_u32_s16(d1), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 2 * dst_stride),
- vreinterpret_u32_s16(d2), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 3 * dst_stride),
- vreinterpret_u32_s16(d3), 0);
+ store_s16_2x1(dst_ptr + 0 * dst_stride, d0, 0);
+ store_s16_2x1(dst_ptr + 1 * dst_stride, d1, 0);
+ store_s16_2x1(dst_ptr + 2 * dst_stride, d2, 0);
+ store_s16_2x1(dst_ptr + 3 * dst_stride, d3, 0);
} else {
- vst1_s16(dst_ptr + 0 * dst_stride, d0);
- vst1_s16(dst_ptr + 1 * dst_stride, d1);
- vst1_s16(dst_ptr + 2 * dst_stride, d2);
- vst1_s16(dst_ptr + 3 * dst_stride, d3);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
}
src_ptr += 4 * src_stride;
@@ -2181,10 +2756,11 @@
do {
s0 = vld1q_u8(src_ptr);
t0 = convolve8_4_usdot(s0, x_filter, permute_tbl, horiz_const);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrn_n_s32(t0, ROUND0_BITS - 1);
if (w == 2) {
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_s16(d0), 0);
+ store_s16_2x1(dst_ptr, d0, 0);
} else {
vst1_s16(dst_ptr, d0);
}
@@ -2196,7 +2772,6 @@
}
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0 - 1));
uint8x16_t s0, s1, s2, s3;
int16x8_t d0, d1, d2, d3;
@@ -2208,24 +2783,14 @@
int width = w;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_usdot(s0, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d1 = convolve8_8_usdot(s1, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d2 = convolve8_8_usdot(s2, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d3 = convolve8_8_usdot(s3, x_filter, permute_tbl, horiz_const,
- shift_round_0);
+ d0 = convolve8_horiz_8_usdot(s0, x_filter, permute_tbl, horiz_const);
+ d1 = convolve8_horiz_8_usdot(s1, x_filter, permute_tbl, horiz_const);
+ d2 = convolve8_horiz_8_usdot(s2, x_filter, permute_tbl, horiz_const);
+ d3 = convolve8_horiz_8_usdot(s3, x_filter, permute_tbl, horiz_const);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
@@ -2247,8 +2812,7 @@
do {
s0 = vld1q_u8(s);
- d0 = convolve8_8_usdot(s0, x_filter, permute_tbl, horiz_const,
- shift_round_0);
+ d0 = convolve8_horiz_8_usdot(s0, x_filter, permute_tbl, horiz_const);
vst1q_s16(d, d0);
s += 8;
@@ -2264,11 +2828,11 @@
}
}
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-static INLINE void av1_convolve_2d_sr_horiz_neon(
+static INLINE void convolve_2d_sr_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, int im_stride, int w,
- int im_h, const int16x8_t x_filter_s16, const int round_0) {
+ int im_h, const int16x8_t x_filter_s16) {
const int bd = 8;
const uint8_t *src_ptr = src;
@@ -2280,18 +2844,18 @@
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int8x8_t x_filter = vshrn_n_s16(x_filter_s16, 1);
- const int32_t horiz_const = (1 << (bd + FILTER_BITS - 2));
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // The outermost -1 is needed because we halved the filter values.
+ const int32_t horiz_const =
+ ((1 << (bd + FILTER_BITS - 2)) + (1 << ((ROUND0_BITS - 1) - 1)));
// Dot product constants.
const int16x8_t correct_tmp = vshlq_n_s16(x_filter_s16, 6);
- const int32x4_t correction =
- vdupq_n_s32(vaddlvq_s16(correct_tmp) + horiz_const);
+ int32x4_t correction = vdupq_n_s32(vaddlvq_s16(correct_tmp) + horiz_const);
const uint8x16_t range_limit = vdupq_n_u8(128);
- assert(round_0 > 0);
-
if (w <= 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0 - 1));
uint8x16_t s0, s1, s2, s3;
int32x4_t t0, t1, t2, t3;
int16x4_t d0, d1, d2, d3;
@@ -2299,32 +2863,26 @@
do {
assert(height >= 4);
- load_u8_8x16(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
t0 = convolve8_4_sdot(s0, x_filter, correction, range_limit, permute_tbl);
t1 = convolve8_4_sdot(s1, x_filter, correction, range_limit, permute_tbl);
t2 = convolve8_4_sdot(s2, x_filter, correction, range_limit, permute_tbl);
t3 = convolve8_4_sdot(s3, x_filter, correction, range_limit, permute_tbl);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
- d1 = vqrshl_s16(vmovn_s32(t1), shift_round_0);
- d2 = vqrshl_s16(vmovn_s32(t2), shift_round_0);
- d3 = vqrshl_s16(vmovn_s32(t3), shift_round_0);
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrn_n_s32(t0, ROUND0_BITS - 1);
+ d1 = vshrn_n_s32(t1, ROUND0_BITS - 1);
+ d2 = vshrn_n_s32(t2, ROUND0_BITS - 1);
+ d3 = vshrn_n_s32(t3, ROUND0_BITS - 1);
if (w == 2) {
- vst1_lane_u32((uint32_t *)(dst_ptr + 0 * dst_stride),
- vreinterpret_u32_s16(d0), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 1 * dst_stride),
- vreinterpret_u32_s16(d1), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 2 * dst_stride),
- vreinterpret_u32_s16(d2), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 3 * dst_stride),
- vreinterpret_u32_s16(d3), 0);
+ store_s16_2x1(dst_ptr + 0 * dst_stride, d0, 0);
+ store_s16_2x1(dst_ptr + 1 * dst_stride, d1, 0);
+ store_s16_2x1(dst_ptr + 2 * dst_stride, d2, 0);
+ store_s16_2x1(dst_ptr + 3 * dst_stride, d3, 0);
} else {
- vst1_s16(dst_ptr + 0 * dst_stride, d0);
- vst1_s16(dst_ptr + 1 * dst_stride, d1);
- vst1_s16(dst_ptr + 2 * dst_stride, d2);
- vst1_s16(dst_ptr + 3 * dst_stride, d3);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
}
src_ptr += 4 * src_stride;
@@ -2339,10 +2897,11 @@
s0 = vld1q_u8(src_ptr);
t0 = convolve8_4_sdot(s0, x_filter, correction, range_limit,
permute_tbl);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrn_n_s32(t0, ROUND0_BITS - 1);
if (w == 2) {
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_s16(d0), 0);
+ store_s16_2x1(dst_ptr, d0, 0);
} else {
vst1_s16(dst_ptr, d0);
}
@@ -2354,7 +2913,6 @@
}
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0 - 1));
uint8x16_t s0, s1, s2, s3;
int16x8_t d0, d1, d2, d3;
@@ -2366,24 +2924,18 @@
int width = w;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d1 = convolve8_8_sdot(s1, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d2 = convolve8_8_sdot(s2, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d3 = convolve8_8_sdot(s3, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ d0 = convolve8_horiz_8_sdot(s0, x_filter, correction, range_limit,
+ permute_tbl);
+ d1 = convolve8_horiz_8_sdot(s1, x_filter, correction, range_limit,
+ permute_tbl);
+ d2 = convolve8_horiz_8_sdot(s2, x_filter, correction, range_limit,
+ permute_tbl);
+ d3 = convolve8_horiz_8_sdot(s3, x_filter, correction, range_limit,
+ permute_tbl);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
@@ -2406,7 +2958,9 @@
do {
s0 = vld1q_u8(s);
d0 = convolve8_8_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ permute_tbl, vdupq_n_s16(0));
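+      // Passing a zero shift vector makes the helper's final saturating
+      // rounding shift a no-op, letting this single-row tail reuse
+      // convolve8_8_sdot and apply the cheaper immediate shift below.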
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrq_n_s16(d0, ROUND0_BITS - 1);
vst1q_s16(d, d0);
s += 8;
@@ -2422,14 +2976,16 @@
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
// Horizontal filtering for convolve_2d_sr for width multiple of 8
// Processes one row at a time
-static INLINE void horiz_filter_w8_single_row(
- const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
- const int dst_stride, int width, int height, const int16x8_t x_filter,
- const int16x8_t horiz_const, const int16x8_t shift_round_0) {
+static INLINE void horiz_filter_w8_single_row(const uint8_t *src_ptr,
+ int src_stride, int16_t *dst_ptr,
+ const int dst_stride, int width,
+ int height,
+ const int16x8_t x_filter,
+ const int16x8_t horiz_const) {
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
do {
uint8x8_t t0 = vld1_u8(src_ptr);
@@ -2455,8 +3011,8 @@
s6 = vextq_s16(sum, s7, 6); // a6 a7 a8 a9 a10 a11 a12 a13
s7 = vextq_s16(sum, s7, 7); // a7 a8 a9 a10 a11 a12 a13 a14
- int16x8_t res0 = convolve8_8x8_s16(sum, s1, s2, s3, s4, s5, s6, s7,
- x_filter, horiz_const, shift_round_0);
+ int16x8_t res0 = convolve8_horiz_8x8_s16(sum, s1, s2, s3, s4, s5, s6, s7,
+ x_filter, horiz_const);
vst1q_s16(dst_tmp, res0);
@@ -2472,10 +3028,12 @@
// Horizontal filtering for convolve_2d_sr for width <= 4
// Processes one row at a time
-static INLINE void horiz_filter_w4_single_row(
- const uint8_t *src_ptr, int src_stride, int16_t *dst_ptr,
- const int dst_stride, int width, int height, const int16x8_t x_filter,
- const int16x4_t horiz_const, const int16x4_t shift_round_0) {
+static INLINE void horiz_filter_w4_single_row(const uint8_t *src_ptr,
+ int src_stride, int16_t *dst_ptr,
+ const int dst_stride, int width,
+ int height,
+ const int16x8_t x_filter,
+ const int16x4_t horiz_const) {
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
do {
const uint8_t *s = src_ptr;
@@ -2500,11 +3058,11 @@
s6 = vext_s16(s4, s7, 2); // a6 a7 a8 a9
s7 = vext_s16(s4, s7, 3); // a7 a8 a9 a10
- int16x4_t d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
+ int16x4_t d0 = convolve8_horiz_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ x_filter, horiz_const);
if (width == 2) {
- vst1_lane_u32((uint32_t *)dst_ptr, vreinterpret_u32_s16(d0), 0);
+ store_s16_2x1(dst_ptr, d0, 0);
} else {
vst1_s16(dst_ptr, d0);
}
@@ -2515,9 +3073,9 @@
} while (height > 0);
}
-static INLINE void av1_convolve_2d_sr_horiz_neon(
+static INLINE void convolve_2d_sr_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, int im_stride, int w,
- int im_h, const int16x8_t x_filter_s16, const int round_0) {
+ int im_h, const int16x8_t x_filter_s16) {
const int bd = 8;
const uint8_t *src_ptr = src;
@@ -2530,13 +3088,14 @@
// requirements.
const int16x8_t x_filter = vshrq_n_s16(x_filter_s16, 1);
- assert(round_0 > 0);
-
if (w <= 4) {
- const int16x4_t horiz_const = vdup_n_s16((1 << (bd + FILTER_BITS - 2)));
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0 - 1));
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // The outermost -1 is needed because we halved the filter values.
+ const int16x4_t horiz_const = vdup_n_s16((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
do {
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, d0, d1, d2, d3;
uint8x8_t t0, t1, t2, t3;
@@ -2565,31 +3124,24 @@
s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
- d1 = convolve8_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- horiz_const, shift_round_0);
- d2 = convolve8_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- horiz_const, shift_round_0);
- d3 = convolve8_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- horiz_const, shift_round_0);
+ d0 = convolve8_horiz_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
+ d1 = convolve8_horiz_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ horiz_const);
+ d2 = convolve8_horiz_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ horiz_const);
+ d3 = convolve8_horiz_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ horiz_const);
transpose_s16_4x4d(&d0, &d1, &d2, &d3);
if (w == 2) {
- vst1_lane_u32((uint32_t *)(dst_ptr + 0 * dst_stride),
- vreinterpret_u32_s16(d0), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 1 * dst_stride),
- vreinterpret_u32_s16(d1), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 2 * dst_stride),
- vreinterpret_u32_s16(d2), 0);
- vst1_lane_u32((uint32_t *)(dst_ptr + 3 * dst_stride),
- vreinterpret_u32_s16(d3), 0);
+ store_s16_2x1(dst_ptr + 0 * dst_stride, d0, 0);
+ store_s16_2x1(dst_ptr + 1 * dst_stride, d1, 0);
+ store_s16_2x1(dst_ptr + 2 * dst_stride, d2, 0);
+ store_s16_2x1(dst_ptr + 3 * dst_stride, d3, 0);
} else {
- vst1_s16((dst_ptr + 0 * dst_stride), d0);
- vst1_s16((dst_ptr + 1 * dst_stride), d1);
- vst1_s16((dst_ptr + 2 * dst_stride), d2);
- vst1_s16((dst_ptr + 3 * dst_stride), d3);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
}
src_ptr += 4 * src_stride;
@@ -2600,19 +3152,22 @@
if (height) {
assert(height < 4);
horiz_filter_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w,
- height, x_filter, horiz_const, shift_round_0);
+ height, x_filter, horiz_const);
}
-#else // !defined(__aarch64__)
+#else // !AOM_ARCH_AARCH64
horiz_filter_w4_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w,
- height, x_filter, horiz_const, shift_round_0);
-#endif // defined(__aarch64__)
+ height, x_filter, horiz_const);
+#endif // AOM_ARCH_AARCH64
} else {
- const int16x8_t horiz_const = vdupq_n_s16((1 << (bd + FILTER_BITS - 2)));
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0 - 1));
+ // This shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // The outermost -1 is needed because we halved the filter values.
+ const int16x8_t horiz_const = vdupq_n_s16((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
for (; height >= 8; height -= 8) {
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14,
@@ -2651,22 +3206,22 @@
s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
- d0 = convolve8_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
- d1 = convolve8_8x8_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- horiz_const, shift_round_0);
- d2 = convolve8_8x8_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- horiz_const, shift_round_0);
- d3 = convolve8_8x8_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- horiz_const, shift_round_0);
- d4 = convolve8_8x8_s16(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
- horiz_const, shift_round_0);
- d5 = convolve8_8x8_s16(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
- horiz_const, shift_round_0);
- d6 = convolve8_8x8_s16(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
- horiz_const, shift_round_0);
- d7 = convolve8_8x8_s16(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
- horiz_const, shift_round_0);
+ d0 = convolve8_horiz_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
+ d1 = convolve8_horiz_8x8_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ horiz_const);
+ d2 = convolve8_horiz_8x8_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ horiz_const);
+ d3 = convolve8_horiz_8x8_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ horiz_const);
+ d4 = convolve8_horiz_8x8_s16(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
+ horiz_const);
+ d5 = convolve8_horiz_8x8_s16(s5, s6, s7, s8, s9, s10, s11, s12,
+ x_filter, horiz_const);
+ d6 = convolve8_horiz_8x8_s16(s6, s7, s8, s9, s10, s11, s12, s13,
+ x_filter, horiz_const);
+ d7 = convolve8_horiz_8x8_s16(s7, s8, s9, s10, s11, s12, s13, s14,
+ x_filter, horiz_const);
transpose_s16_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
@@ -2741,10 +3296,11 @@
d2 = vaddq_s16(d2, horiz_const);
d3 = vaddq_s16(d3, horiz_const);
- d0 = vqrshlq_s16(d0, shift_round_0);
- d1 = vqrshlq_s16(d1, shift_round_0);
- d2 = vqrshlq_s16(d2, shift_round_0);
- d3 = vqrshlq_s16(d3, shift_round_0);
+ // We halved the convolution filter values so -1 from the right shift.
+ d0 = vshrq_n_s16(d0, ROUND0_BITS - 1);
+ d1 = vshrq_n_s16(d1, ROUND0_BITS - 1);
+ d2 = vshrq_n_s16(d2, ROUND0_BITS - 1);
+ d3 = vshrq_n_s16(d3, ROUND0_BITS - 1);
store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
@@ -2767,92 +3323,141 @@
if (height) {
assert(height < 4);
horiz_filter_w8_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w,
- height, x_filter, horiz_const, shift_round_0);
+ height, x_filter, horiz_const);
}
-#else // !defined(__aarch64__)
+#else // !AOM_ARCH_AARCH64
horiz_filter_w8_single_row(src_ptr, src_stride, dst_ptr, dst_stride, w,
- height, x_filter, horiz_const, shift_round_0);
-#endif // defined(__aarch64__)
+ height, x_filter, horiz_const);
+#endif // AOM_ARCH_AARCH64
}
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
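
For concreteness, with the constants used in this file (bd = 8, FILTER_BITS = 7, ROUND0_BITS = 3) the combined horizontal offset horiz_const defined above evaluates to:

    // 1 << (bd + FILTER_BITS - 2)   = 1 << 13 = 8192  (half the usual
    //                                 1 << 14, since the taps are halved)
    // 1 << ((ROUND0_BITS - 1) - 1)  = 1 << 1  = 2     (rounding shim)
    const int16x8_t horiz_const = vdupq_n_s16(8192 + 2);

The large power-of-two term is the standard convolve offset that keeps the 16-bit intermediate values non-negative; it is removed again after the vertical pass (see the sub_const arithmetic below).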
-static INLINE void av1_convolve_2d_sr_vert_8tap_neon(
+static INLINE int32x4_t convolve12_vert_4_s32(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+ int32x4_t sum;
+
+ sum = vmull_lane_s16(s0, y_filter_0_3, 0);
+ sum = vmlal_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmlal_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmlal_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmlal_lane_s16(sum, s4, y_filter_4_7, 0);
+ sum = vmlal_lane_s16(sum, s5, y_filter_4_7, 1);
+ sum = vmlal_lane_s16(sum, s6, y_filter_4_7, 2);
+ sum = vmlal_lane_s16(sum, s7, y_filter_4_7, 3);
+ sum = vmlal_lane_s16(sum, s8, y_filter_8_11, 0);
+ sum = vmlal_lane_s16(sum, s9, y_filter_8_11, 1);
+ sum = vmlal_lane_s16(sum, s10, y_filter_8_11, 2);
+ sum = vmlal_lane_s16(sum, s11, y_filter_8_11, 3);
+
+ return sum;
+}
+
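convolve12_vert_4_s32 takes the 12-tap filter as an 8-lane vector plus a 4-lane tail so that every multiply can use the lane-indexed vmlal_lane_s16 form. The split mirrors how the caller loads the kernel later in this patch:

    const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);      // taps 0..7
    const int16x4_t y_filter_8_11 = vld1_s16(y_filter_ptr + 8);  // taps 8..11

vmlal_lane_s16 only accepts a 64-bit (4-lane) vector as its lane operand, which is also why the 8-lane half is first carved into y_filter_0_3 and y_filter_4_7 with vget_low_s16/vget_high_s16.
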
+static INLINE uint8x8_t convolve12_vert_8_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
+ const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int16x8_t sub_const) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+ int32x4_t sum0, sum1;
+ int16x8_t res;
+
+ sum0 = vmull_lane_s16(vget_low_s16(s0), y_filter_0_3, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_0_3, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_0_3, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_0_3, 3);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_4_7, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_4_7, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s6), y_filter_4_7, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s7), y_filter_4_7, 3);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s8), y_filter_8_11, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s9), y_filter_8_11, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s10), y_filter_8_11, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s11), y_filter_8_11, 3);
+
+ sum1 = vmull_lane_s16(vget_high_s16(s0), y_filter_0_3, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_0_3, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_0_3, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_0_3, 3);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_4_7, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_4_7, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s6), y_filter_4_7, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s7), y_filter_4_7, 3);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s8), y_filter_8_11, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s9), y_filter_8_11, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s10), y_filter_8_11, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s11), y_filter_8_11, 3);
+
+ res = vcombine_s16(vqrshrn_n_s32(sum0, 2 * FILTER_BITS - ROUND0_BITS),
+ vqrshrn_n_s32(sum1, 2 * FILTER_BITS - ROUND0_BITS));
+ res = vsubq_s16(res, sub_const);
+
+ return vqmovun_s16(res);
+}
+
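The single sub_const = 1 << (bd - 1) that convolve12_vert_8_s32 above subtracts (and which the vertical kernels below pass in) replaces the offset_bits/round_1 bookkeeping of the deleted code; with the round amounts fixed at compile time the two are arithmetically identical. Tracking the offset through both passes (bd = 8, FILTER_BITS = 7, ROUND0_BITS = 3):

    offset in im_block after the horizontal pass:
        1 << (bd + FILTER_BITS - 1 - ROUND0_BITS)          = 1 << 11
    after the vertical filter (tap sum is 1 << FILTER_BITS):
        (1 << 11) * (1 << 7)                               = 1 << 18
    after vqrshrn by 2 * FILTER_BITS - ROUND0_BITS = 11:
        1 << (18 - 11)                                     = 1 << 7

so exactly 1 << (bd - 1) = 128 of offset survives into the narrowed result, and a single vsubq_s16 removes it before vqmovun_s16 saturates to 8 bits.
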
+static INLINE void convolve_2d_sr_vert_12tap_neon(
int16_t *src_ptr, int src_stride, uint8_t *dst_ptr, int dst_stride, int w,
- int h, const int16x8_t y_filter, ConvolveParams *conv_params) {
+ int h, const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11) {
const int bd = 8;
- const int16_t round_bits =
- FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
- const int16x8_t vec_round_bits = vdupq_n_s16(-round_bits);
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
-
- const int32_t sub_const = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
-
- const int32x4_t round_shift_vec = vdupq_n_s32(-(conv_params->round_1));
- const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
- const int32x4_t sub_const_vec = vdupq_n_s32(sub_const);
+ const int16x8_t sub_const = vdupq_n_s16(1 << (bd - 1));
if (w <= 4) {
- int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
- int16x8_t dd0;
- uint8x8_t d01;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+ int32x4_t d0, d1, d2, d3;
+ int16x8_t dd01, dd23;
+ uint8x8_t d01, d23;
-#if defined(__aarch64__)
- int16x4_t s8, s9, s10, d1, d2, d3;
- int16x8_t dd1;
- uint8x8_t d23;
-#endif // defined(__aarch64__)
-
- int16_t *s = src_ptr;
- uint8_t *d = dst_ptr;
-
- load_s16_4x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
- s += (7 * src_stride);
+ load_s16_4x11(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7,
+ &s8, &s9, &s10);
+ src_ptr += 11 * src_stride;
do {
-#if defined(__aarch64__)
- load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
- s += (4 * src_stride);
+ load_s16_4x4(src_ptr, src_stride, &s11, &s12, &s13, &s14);
- d0 = convolve8_vert_4x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d1 = convolve8_vert_4x4_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d2 = convolve8_vert_4x4_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d3 = convolve8_vert_4x4_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
+ d0 = convolve12_vert_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11);
+ d1 = convolve12_vert_4_s32(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, y_filter_0_7, y_filter_8_11);
+ d2 = convolve12_vert_4_s32(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, y_filter_0_7, y_filter_8_11);
+ d3 = convolve12_vert_4_s32(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13,
+ s14, y_filter_0_7, y_filter_8_11);
- dd0 = vqrshlq_s16(vcombine_s16(d0, d1), vec_round_bits);
- dd1 = vqrshlq_s16(vcombine_s16(d2, d3), vec_round_bits);
+ dd01 = vcombine_s16(vqrshrn_n_s32(d0, 2 * FILTER_BITS - ROUND0_BITS),
+ vqrshrn_n_s32(d1, 2 * FILTER_BITS - ROUND0_BITS));
+ dd23 = vcombine_s16(vqrshrn_n_s32(d2, 2 * FILTER_BITS - ROUND0_BITS),
+ vqrshrn_n_s32(d3, 2 * FILTER_BITS - ROUND0_BITS));
- d01 = vqmovun_s16(dd0);
- d23 = vqmovun_s16(dd1);
+ dd01 = vsubq_s16(dd01, sub_const);
+ dd23 = vsubq_s16(dd23, sub_const);
- if (w == 4) {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 0);
- d += dst_stride;
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 1);
- d += dst_stride;
+ d01 = vqmovun_s16(dd01);
+ d23 = vqmovun_s16(dd23);
+
+ if (w == 2) {
+ store_u8_2x1(dst_ptr + 0 * dst_stride, d01, 0);
+ store_u8_2x1(dst_ptr + 1 * dst_stride, d01, 2);
if (h != 2) {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d23), 0);
- d += dst_stride;
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d23), 1);
- d += dst_stride;
+ store_u8_2x1(dst_ptr + 2 * dst_stride, d23, 0);
+ store_u8_2x1(dst_ptr + 3 * dst_stride, d23, 2);
}
} else {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 0);
- d += dst_stride;
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 2);
- d += dst_stride;
+ store_u8_4x1(dst_ptr + 0 * dst_stride, d01, 0);
+ store_u8_4x1(dst_ptr + 1 * dst_stride, d01, 1);
if (h != 2) {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d23), 0);
- d += dst_stride;
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d23), 2);
- d += dst_stride;
+ store_u8_4x1(dst_ptr + 2 * dst_stride, d23, 0);
+ store_u8_4x1(dst_ptr + 3 * dst_stride, d23, 1);
}
}
@@ -2863,79 +3468,47 @@
s4 = s8;
s5 = s9;
s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
h -= 4;
-#else // !defined(__aarch64__)
- s7 = vld1_s16(s);
- s += src_stride;
-
- d0 = convolve8_vert_4x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
-
- dd0 = vqrshlq_s16(vcombine_s16(d0, d0), vec_round_bits);
- d01 = vqmovun_s16(dd0);
-
- if (w == 2) {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 0);
- d += dst_stride;
- } else {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 0);
- d += dst_stride;
- }
-
- s0 = s1;
- s1 = s2;
- s2 = s3;
- s3 = s4;
- s4 = s5;
- s5 = s6;
- s6 = s7;
- h--;
-#endif // defined(__aarch64__)
} while (h > 0);
- } else {
- // if width is a multiple of 8 & height is a multiple of 4
- int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
- uint8x8_t d0;
-#if defined(__aarch64__)
- int16x8_t s8, s9, s10;
- uint8x8_t d1, d2, d3;
-#endif // defined(__aarch64__)
+ } else {
do {
- int height = h;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+ uint8x8_t d0, d1, d2, d3;
+
int16_t *s = src_ptr;
uint8_t *d = dst_ptr;
- load_s16_8x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
- s += (7 * src_stride);
+ int height = h;
+
+ load_s16_8x11(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7, &s8,
+ &s9, &s10);
+ s += 11 * src_stride;
do {
-#if defined(__aarch64__)
- load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
- s += (4 * src_stride);
+ load_s16_8x4(s, src_stride, &s11, &s12, &s13, &s14);
- d0 = convolve8_vert_8x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d1 = convolve8_vert_8x4_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d2 = convolve8_vert_8x4_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d3 = convolve8_vert_8x4_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
+ d0 = convolve12_vert_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11, sub_const);
+ d1 = convolve12_vert_8_s32(s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, y_filter_0_7, y_filter_8_11, sub_const);
+        d2 = convolve12_vert_8_s32(s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+                                   s12, s13, y_filter_0_7, y_filter_8_11,
+                                   sub_const);
+ d3 = convolve12_vert_8_s32(s3, s4, s5, s6, s7, s8, s9, s10, s11, s12,
+ s13, s14, y_filter_0_7, y_filter_8_11,
+ sub_const);
- vst1_u8(d, d0);
- d += dst_stride;
- vst1_u8(d, d1);
- d += dst_stride;
if (h != 2) {
- vst1_u8(d, d2);
- d += dst_stride;
- vst1_u8(d, d3);
- d += dst_stride;
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+ } else {
+ store_u8_8x2(d, dst_stride, d0, d1);
}
s0 = s4;
@@ -2945,27 +3518,13 @@
s4 = s8;
s5 = s9;
s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
height -= 4;
-#else // !defined(__aarch64__)
- s7 = vld1q_s16(s);
- s += src_stride;
-
- d0 = convolve8_vert_8x4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
-
- vst1_u8(d, d0);
- d += dst_stride;
-
- s0 = s1;
- s1 = s2;
- s2 = s3;
- s3 = s4;
- s4 = s5;
- s5 = s6;
- s6 = s7;
- height--;
-#endif // defined(__aarch64__)
} while (height > 0);
src_ptr += 8;
@@ -2975,11 +3534,170 @@
}
}
-static INLINE int16x4_t convolve6_vert_4x4_s32(
- const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
- const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
- const int16x8_t y_filter, const int32x4_t round_shift_vec,
- const int32x4_t offset_const, const int32x4_t sub_const_vec) {
+static INLINE void convolve_2d_sr_vert_8tap_neon(int16_t *src_ptr,
+ int src_stride,
+ uint8_t *dst_ptr,
+ int dst_stride, int w, int h,
+ const int16x8_t y_filter) {
+ const int bd = 8;
+ const int16x8_t sub_const = vdupq_n_s16(1 << (bd - 1));
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
+ uint8x8_t d01;
+
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10, d1, d2, d3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ int16_t *s = src_ptr;
+ uint8_t *d = dst_ptr;
+
+ load_s16_4x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_vert_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter);
+ d1 = convolve8_vert_4_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter);
+ d2 = convolve8_vert_4_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter);
+ d3 = convolve8_vert_4_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter);
+
+ d01 = vqmovun_s16(vsubq_s16(vcombine_s16(d0, d1), sub_const));
+ d23 = vqmovun_s16(vsubq_s16(vcombine_s16(d2, d3), sub_const));
+
+ if (w == 2) {
+ store_u8_2x1(d + 0 * dst_stride, d01, 0);
+ store_u8_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u8_2x1(d + 2 * dst_stride, d23, 0);
+ store_u8_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ store_u8_4x1(d + 0 * dst_stride, d01, 0);
+ store_u8_4x1(d + 1 * dst_stride, d01, 1);
+ if (h != 2) {
+ store_u8_4x1(d + 2 * dst_stride, d23, 0);
+ store_u8_4x1(d + 3 * dst_stride, d23, 1);
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1_s16(s);
+ s += src_stride;
+
+ d0 = convolve8_vert_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter);
+
+ d01 = vqmovun_s16(vsubq_s16(vcombine_s16(d0, vdup_n_s16(0)), sub_const));
+
+ if (w == 2) {
+ store_u8_2x1(d, d01, 0);
+ } else {
+ store_u8_4x1(d, d01, 0);
+ }
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ d += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h > 0);
+ } else {
+ // if width is a multiple of 8 & height is a multiple of 4
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint8x8_t d0;
+#if AOM_ARCH_AARCH64
+ int16x8_t s8, s9, s10;
+ uint8x8_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int height = h;
+ int16_t *s = src_ptr;
+ uint8_t *d = dst_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_vert_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ sub_const);
+ d1 = convolve8_vert_8_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ sub_const);
+ d2 = convolve8_vert_8_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ sub_const);
+ d3 = convolve8_vert_8_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ sub_const);
+
+ if (h != 2) {
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+ } else {
+ store_u8_8x2(d, dst_stride, d0, d1);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1q_s16(s);
+
+ d0 = convolve8_vert_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ sub_const);
+
+ vst1_u8(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
+static INLINE int16x4_t
+convolve6_vert_4_s32(const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter) {
const int16x4_t y_filter_lo = vget_low_s16(y_filter);
const int16x4_t y_filter_hi = vget_high_s16(y_filter);
int32x4_t sum;
@@ -2991,19 +3709,13 @@
sum = vmlal_lane_s16(sum, s4, y_filter_hi, 1);
sum = vmlal_lane_s16(sum, s5, y_filter_hi, 2);
- sum = vaddq_s32(sum, offset_const);
- sum = vqrshlq_s32(sum, round_shift_vec);
- sum = vsubq_s32(sum, sub_const_vec);
-
- return vmovn_s32(sum);
+ return vqrshrn_n_s32(sum, 2 * FILTER_BITS - ROUND0_BITS);
}
-static INLINE uint8x8_t convolve6_vert_8x4_s32(
- const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
- const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
- const int16x8_t y_filter, const int32x4_t round_shift_vec,
- const int32x4_t offset_const, const int32x4_t sub_const_vec,
- const int16x8_t vec_round_bits) {
+static INLINE uint8x8_t
+convolve6_vert_8_s32(const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filter, const int16x8_t sub_const) {
const int16x4_t y_filter_lo = vget_low_s16(y_filter);
const int16x4_t y_filter_hi = vget_high_s16(y_filter);
int32x4_t sum0, sum1;
@@ -3023,97 +3735,61 @@
sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_hi, 1);
sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_hi, 2);
- sum0 = vaddq_s32(sum0, offset_const);
- sum1 = vaddq_s32(sum1, offset_const);
- sum0 = vqrshlq_s32(sum0, round_shift_vec);
- sum1 = vqrshlq_s32(sum1, round_shift_vec);
- sum0 = vsubq_s32(sum0, sub_const_vec);
- sum1 = vsubq_s32(sum1, sub_const_vec);
-
- res = vcombine_s16(vmovn_s32(sum0), vmovn_s32(sum1));
- res = vqrshlq_s16(res, vec_round_bits);
+ res = vcombine_s16(vqrshrn_n_s32(sum0, 2 * FILTER_BITS - ROUND0_BITS),
+ vqrshrn_n_s32(sum1, 2 * FILTER_BITS - ROUND0_BITS));
+ res = vsubq_s16(res, sub_const);
return vqmovun_s16(res);
}
-static INLINE void av1_convolve_2d_sr_vert_6tap_neon(
- int16_t *src_ptr, int src_stride, uint8_t *dst_ptr, int dst_stride, int w,
- int h, const int16x8_t y_filter, ConvolveParams *conv_params) {
+static INLINE void convolve_2d_sr_vert_6tap_neon(int16_t *src_ptr,
+ int src_stride,
+ uint8_t *dst_ptr,
+ int dst_stride, int w, int h,
+ const int16x8_t y_filter) {
const int bd = 8;
- const int16_t round_bits =
- FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
- const int16x8_t vec_round_bits = vdupq_n_s16(-round_bits);
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
-
- const int32_t sub_const = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
-
- const int32x4_t round_shift_vec = vdupq_n_s32(-(conv_params->round_1));
- const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
- const int32x4_t sub_const_vec = vdupq_n_s32(sub_const);
+ const int16x8_t sub_const = vdupq_n_s16(1 << (bd - 1));
if (w <= 4) {
int16x4_t s0, s1, s2, s3, s4, s5, d0;
- int16x8_t dd0;
uint8x8_t d01;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x4_t s6, s7, s8, d1, d2, d3;
- int16x8_t dd1;
uint8x8_t d23;
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
int16_t *s = src_ptr;
uint8_t *d = dst_ptr;
- s0 = vld1_s16(s + 0 * src_stride);
- s1 = vld1_s16(s + 1 * src_stride);
- s2 = vld1_s16(s + 2 * src_stride);
- s3 = vld1_s16(s + 3 * src_stride);
- s4 = vld1_s16(s + 4 * src_stride);
- s += (5 * src_stride);
+ load_s16_4x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_4x4(s, src_stride, &s5, &s6, &s7, &s8);
- s += (4 * src_stride);
- d0 = convolve6_vert_4x4_s32(s0, s1, s2, s3, s4, s5, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d1 = convolve6_vert_4x4_s32(s1, s2, s3, s4, s5, s6, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d2 = convolve6_vert_4x4_s32(s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
- d3 = convolve6_vert_4x4_s32(s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
+ d0 = convolve6_vert_4_s32(s0, s1, s2, s3, s4, s5, y_filter);
+ d1 = convolve6_vert_4_s32(s1, s2, s3, s4, s5, s6, y_filter);
+ d2 = convolve6_vert_4_s32(s2, s3, s4, s5, s6, s7, y_filter);
+ d3 = convolve6_vert_4_s32(s3, s4, s5, s6, s7, s8, y_filter);
- dd0 = vqrshlq_s16(vcombine_s16(d0, d1), vec_round_bits);
- dd1 = vqrshlq_s16(vcombine_s16(d2, d3), vec_round_bits);
+ d01 = vqmovun_s16(vsubq_s16(vcombine_s16(d0, d1), sub_const));
+ d23 = vqmovun_s16(vsubq_s16(vcombine_s16(d2, d3), sub_const));
- d01 = vqmovun_s16(dd0);
- d23 = vqmovun_s16(dd1);
-
- if (w == 4) {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 0);
- d += dst_stride;
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 1);
- d += dst_stride;
+ if (w == 2) {
+ store_u8_2x1(d + 0 * dst_stride, d01, 0);
+ store_u8_2x1(d + 1 * dst_stride, d01, 2);
if (h != 2) {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d23), 0);
- d += dst_stride;
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d23), 1);
- d += dst_stride;
+ store_u8_2x1(d + 2 * dst_stride, d23, 0);
+ store_u8_2x1(d + 3 * dst_stride, d23, 2);
}
} else {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 0);
- d += dst_stride;
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 2);
- d += dst_stride;
+ store_u8_4x1(d + 0 * dst_stride, d01, 0);
+ store_u8_4x1(d + 1 * dst_stride, d01, 1);
if (h != 2) {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d23), 0);
- d += dst_stride;
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d23), 2);
- d += dst_stride;
+ store_u8_4x1(d + 2 * dst_stride, d23, 0);
+ store_u8_4x1(d + 3 * dst_stride, d23, 1);
}
}
@@ -3122,23 +3798,19 @@
s2 = s6;
s3 = s7;
s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
h -= 4;
-#else // !defined(__aarch64__)
+#else // !AOM_ARCH_AARCH64
s5 = vld1_s16(s);
- s += src_stride;
- d0 = convolve6_vert_4x4_s32(s0, s1, s2, s3, s4, s5, y_filter,
- round_shift_vec, offset_const, sub_const_vec);
-
- dd0 = vqrshlq_s16(vcombine_s16(d0, d0), vec_round_bits);
- d01 = vqmovun_s16(dd0);
+ d0 = convolve6_vert_4_s32(s0, s1, s2, s3, s4, s5, y_filter);
+ d01 = vqmovun_s16(vsubq_s16(vcombine_s16(d0, vdup_n_s16(0)), sub_const));
if (w == 2) {
- vst1_lane_u16((uint16_t *)d, vreinterpret_u16_u8(d01), 0);
- d += dst_stride;
+ store_u8_2x1(d, d01, 0);
} else {
- vst1_lane_u32((uint32_t *)d, vreinterpret_u32_u8(d01), 0);
- d += dst_stride;
+ store_u8_4x1(d, d01, 0);
}
s0 = s1;
@@ -3146,57 +3818,41 @@
s2 = s3;
s3 = s4;
s4 = s5;
+ s += src_stride;
+ d += dst_stride;
h--;
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
} while (h > 0);
} else {
// if width is a multiple of 8 & height is a multiple of 4
int16x8_t s0, s1, s2, s3, s4, s5;
uint8x8_t d0;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t s6, s7, s8;
uint8x8_t d1, d2, d3;
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
do {
int height = h;
int16_t *s = src_ptr;
uint8_t *d = dst_ptr;
- s0 = vld1q_s16(s + 0 * src_stride);
- s1 = vld1q_s16(s + 1 * src_stride);
- s2 = vld1q_s16(s + 2 * src_stride);
- s3 = vld1q_s16(s + 3 * src_stride);
- s4 = vld1q_s16(s + 4 * src_stride);
- s += (5 * src_stride);
+ load_s16_8x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_8x4(s, src_stride, &s5, &s6, &s7, &s8);
- s += (4 * src_stride);
- d0 = convolve6_vert_8x4_s32(s0, s1, s2, s3, s4, s5, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d1 = convolve6_vert_8x4_s32(s1, s2, s3, s4, s5, s6, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d2 = convolve6_vert_8x4_s32(s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
- d3 = convolve6_vert_8x4_s32(s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
+ d0 = convolve6_vert_8_s32(s0, s1, s2, s3, s4, s5, y_filter, sub_const);
+ d1 = convolve6_vert_8_s32(s1, s2, s3, s4, s5, s6, y_filter, sub_const);
+ d2 = convolve6_vert_8_s32(s2, s3, s4, s5, s6, s7, y_filter, sub_const);
+ d3 = convolve6_vert_8_s32(s3, s4, s5, s6, s7, s8, y_filter, sub_const);
- vst1_u8(d, d0);
- d += dst_stride;
- vst1_u8(d, d1);
- d += dst_stride;
if (h != 2) {
- vst1_u8(d, d2);
- d += dst_stride;
- vst1_u8(d, d3);
- d += dst_stride;
+ store_u8_8x4(d, dst_stride, d0, d1, d2, d3);
+ } else {
+ store_u8_8x2(d, dst_stride, d0, d1);
}
s0 = s4;
@@ -3204,25 +3860,25 @@
s2 = s6;
s3 = s7;
s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
height -= 4;
-#else // !defined(__aarch64__)
+#else // !AOM_ARCH_AARCH64
s5 = vld1q_s16(s);
- s += src_stride;
- d0 = convolve6_vert_8x4_s32(s0, s1, s2, s3, s4, s5, y_filter,
- round_shift_vec, offset_const,
- sub_const_vec, vec_round_bits);
+ d0 = convolve6_vert_8_s32(s0, s1, s2, s3, s4, s5, y_filter, sub_const);
vst1_u8(d, d0);
- d += dst_stride;
s0 = s1;
s1 = s2;
s2 = s3;
s3 = s4;
s4 = s5;
+ s += src_stride;
+ d += dst_stride;
height--;
-#endif // defined(__aarch64__)
+#endif // AOM_ARCH_AARCH64
} while (height > 0);
src_ptr += 8;
@@ -3238,6 +3894,7 @@
const InterpFilterParams *filter_params_y,
const int subpel_x_qn, const int subpel_y_qn,
ConvolveParams *conv_params) {
+ (void)conv_params;
const int y_filter_taps = get_filter_tap(filter_params_y, subpel_y_qn);
const int clamped_y_taps = y_filter_taps < 6 ? 6 : y_filter_taps;
const int im_h = h + clamped_y_taps - 1;
@@ -3260,13 +3917,11 @@
const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);
const int16x4_t y_filter_8_11 = vld1_s16(y_filter_ptr + 8);
- av1_convolve_2d_sr_horiz_12tap_neon(src_ptr, src_stride, im_block,
- im_stride, w, im_h, x_filter_0_7,
- x_filter_8_11, conv_params->round_0);
+ convolve_2d_sr_horiz_12tap_neon(src_ptr, src_stride, im_block, im_stride, w,
+ im_h, x_filter_0_7, x_filter_8_11);
- av1_convolve_2d_sr_vert_12tap_neon(im_block, im_stride, dst, dst_stride, w,
- h, y_filter_0_7, y_filter_8_11,
- conv_params);
+ convolve_2d_sr_vert_12tap_neon(im_block, im_stride, dst, dst_stride, w, h,
+ y_filter_0_7, y_filter_8_11);
} else {
DECLARE_ALIGNED(16, int16_t,
im_block[(MAX_SB_SIZE + HORIZ_EXTRA_ROWS) * MAX_SB_SIZE]);
@@ -3274,15 +3929,15 @@
const int16x8_t x_filter = vld1q_s16(x_filter_ptr);
const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
- av1_convolve_2d_sr_horiz_neon(src_ptr, src_stride, im_block, im_stride, w,
- im_h, x_filter, conv_params->round_0);
+ convolve_2d_sr_horiz_8tap_neon(src_ptr, src_stride, im_block, im_stride, w,
+ im_h, x_filter);
if (clamped_y_taps <= 6) {
- av1_convolve_2d_sr_vert_6tap_neon(im_block, im_stride, dst, dst_stride, w,
- h, y_filter, conv_params);
+ convolve_2d_sr_vert_6tap_neon(im_block, im_stride, dst, dst_stride, w, h,
+ y_filter);
} else {
- av1_convolve_2d_sr_vert_8tap_neon(im_block, im_stride, dst, dst_stride, w,
- h, y_filter, conv_params);
+ convolve_2d_sr_vert_8tap_neon(im_block, im_stride, dst, dst_stride, w, h,
+ y_filter);
}
}
}
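
Taken together, the kernels above implement the usual separable pipeline: a horizontal pass into the higher-precision im_block, then a 6-, 8- or 12-tap vertical pass back down to 8-bit pixels. A minimal scalar model of the rounding being matched, for the 8-tap case and the constants used here (bd = 8, FILTER_BITS = 7, ROUND0_BITS = 3); ROUND_POWER_OF_TWO and clamp are the libaom helpers from aom_dsp/aom_dsp_common.h, and the reference implementation reaches the same values via an extra offset that cancels out:

    static int16_t model_horiz(const uint8_t *s, const int16_t *f) {
      int32_t sum = 1 << (8 + 7 - 1);  // offset; removed after the 2nd pass
      for (int k = 0; k < 8; ++k) sum += f[k] * s[k];
      return (int16_t)ROUND_POWER_OF_TWO(sum, 3);  // ROUND0_BITS
    }

    static uint8_t model_vert(const int16_t *m, int stride, const int16_t *f) {
      int32_t sum = 0;
      for (int k = 0; k < 8; ++k) sum += f[k] * m[k * stride];
      sum = ROUND_POWER_OF_TWO(sum, 2 * 7 - 3) - (1 << 7);  // sub_const
      return (uint8_t)clamp(sum, 0, 255);
    }

This is a cross-checking model, not the production path: the NEON code gets to the same results with halved taps, non-rounding shifts and saturating narrows.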
@@ -3329,7 +3984,7 @@
tt = convolve8_4(t[0], t[1], t[2], t[3], t[4], t[5], t[6], t[7],
filters);
d = vqrshrun_n_s16(vcombine_s16(tt, tt), 7);
- vst1_lane_u32((uint32_t *)&temp[4 * z], vreinterpret_u32_u8(d), 0);
+ store_u8_4x1(&temp[4 * z], d, 0);
} else {
int i;
for (i = 0; i < 4; ++i) {
@@ -3342,14 +3997,10 @@
// transpose the 4x4 filters values back to dst
{
const uint8x8x4_t d4 = vld4_u8(temp);
- vst1_lane_u32((uint32_t *)&dst[x + 0 * dst_stride],
- vreinterpret_u32_u8(d4.val[0]), 0);
- vst1_lane_u32((uint32_t *)&dst[x + 1 * dst_stride],
- vreinterpret_u32_u8(d4.val[1]), 0);
- vst1_lane_u32((uint32_t *)&dst[x + 2 * dst_stride],
- vreinterpret_u32_u8(d4.val[2]), 0);
- vst1_lane_u32((uint32_t *)&dst[x + 3 * dst_stride],
- vreinterpret_u32_u8(d4.val[3]), 0);
+ store_u8_4x1(&dst[x + 0 * dst_stride], d4.val[0], 0);
+ store_u8_4x1(&dst[x + 1 * dst_stride], d4.val[1], 0);
+ store_u8_4x1(&dst[x + 2 * dst_stride], d4.val[2], 0);
+ store_u8_4x1(&dst[x + 3 * dst_stride], d4.val[3], 0);
}
x += 4;
} while (x < w);
@@ -3403,14 +4054,8 @@
load_u8_8x8(temp, 8, &d[0], &d[1], &d[2], &d[3], &d[4], &d[5], &d[6],
&d[7]);
transpose_u8_8x8(&d[0], &d[1], &d[2], &d[3], &d[4], &d[5], &d[6], &d[7]);
- vst1_u8(&dst[x + 0 * dst_stride], d[0]);
- vst1_u8(&dst[x + 1 * dst_stride], d[1]);
- vst1_u8(&dst[x + 2 * dst_stride], d[2]);
- vst1_u8(&dst[x + 3 * dst_stride], d[3]);
- vst1_u8(&dst[x + 4 * dst_stride], d[4]);
- vst1_u8(&dst[x + 5 * dst_stride], d[5]);
- vst1_u8(&dst[x + 6 * dst_stride], d[6]);
- vst1_u8(&dst[x + 7 * dst_stride], d[7]);
+ store_u8_8x8(dst + x, dst_stride, d[0], d[1], d[2], d[3], d[4], d[5],
+ d[6], d[7]);
x += 8;
} while (x < w);
@@ -3449,7 +4094,7 @@
tt = convolve8_4(t[0], t[1], t[2], t[3], t[4], t[5], t[6], t[7], filters);
d = vqrshrun_n_s16(vcombine_s16(tt, tt), 7);
- vst1_lane_u32((uint32_t *)dst, vreinterpret_u32_u8(d), 0);
+ store_u8_4x1(dst, d, 0);
} else {
memcpy(dst, &src_y[3 * src_stride], w);
}
diff --git a/av1/common/arm/convolve_neon.h b/av1/common/arm/convolve_neon.h
index 4e9f636..14a6ebe 100644
--- a/av1/common/arm/convolve_neon.h
+++ b/av1/common/arm/convolve_neon.h
@@ -13,6 +13,8 @@
#include <arm_neon.h>
+#include "config/aom_config.h"
+
#define HORIZ_EXTRA_ROWS ((SUBPEL_TAPS + 7) & ~0x07)
static INLINE int16x4_t convolve8_4(const int16x4_t s0, const int16x4_t s1,
@@ -230,7 +232,10 @@
return sum;
}
-#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+// clang versions < 16 did not include the dotprod feature for Arm architecture
+// versions that should have it by default, e.g., armv8.6-a.
+#if AOM_ARCH_AARCH64 && \
+ (defined(__ARM_FEATURE_DOTPROD) || defined(__ARM_FEATURE_MATMUL_INT8))
DECLARE_ALIGNED(16, static const uint8_t, dot_prod_permute_tbl[48]) = {
0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6,
@@ -238,9 +243,62 @@
8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14
};
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif  // AOM_ARCH_AARCH64 && (defined(__ARM_FEATURE_DOTPROD) ||
+        // defined(__ARM_FEATURE_MATMUL_INT8))
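
The table turns one 16-byte load into the four overlapping 4-tap windows that a dot-product instruction consumes. Applying its first 16 indices to source bytes a0..a15 with vqtbl1q_u8 gives:

    // indices { 0,1,2,3, 1,2,3,4, 2,3,4,5, 3,4,5,6 } select
    // { a0 a1 a2 a3 | a1 a2 a3 a4 | a2 a3 a4 a5 | a3 a4 a5 a6 }

i.e. the sliding windows for outputs 0..3; the other two rows do the same for taps 4..7 and for outputs 4..7, as the comments inside the helpers below spell out.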
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
+
+static INLINE int16x8_t convolve8_x_8_usdot(uint8x16_t samples,
+ const int8x8_t filters,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum[2];
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ /* First 4 output values. */
+ sum[0] = vusdotq_lane_s32(horiz_const, permuted_samples[0], filters, 0);
+ sum[0] = vusdotq_lane_s32(sum[0], permuted_samples[1], filters, 1);
+ /* Second 4 output values. */
+ sum[1] = vusdotq_lane_s32(horiz_const, permuted_samples[1], filters, 0);
+ sum[1] = vusdotq_lane_s32(sum[1], permuted_samples[2], filters, 1);
+
+ return vcombine_s16(vmovn_s32(sum[0]), vmovn_s32(sum[1]));
+}
+
+static INLINE int16x8_t convolve8_horiz_8_usdot(uint8x16_t samples,
+ const int8x8_t filters,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum[2];
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ /* First 4 output values. */
+ sum[0] = vusdotq_lane_s32(horiz_const, permuted_samples[0], filters, 0);
+ sum[0] = vusdotq_lane_s32(sum[0], permuted_samples[1], filters, 1);
+ /* Second 4 output values. */
+ sum[1] = vusdotq_lane_s32(horiz_const, permuted_samples[1], filters, 0);
+ sum[1] = vusdotq_lane_s32(sum[1], permuted_samples[2], filters, 1);
+
+ /* Narrow and re-pack. */
+ // We halved the convolution filter values so -1 from the right shift.
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
+}
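
A sketch of how a caller might drive this USDOT helper for one 8-wide row, by analogy with the SDOT call sites earlier in the patch (the names x_filter_s8 and horiz_const_s32 are illustrative: the filter must already be halved and narrowed to int8, and the constant carries the offset plus rounding shim in 32-bit form):

    const uint8x16x3_t perm = vld1q_u8_x3(dot_prod_permute_tbl);
    const uint8x16_t s = vld1q_u8(src_ptr);
    const int16x8_t d =
        convolve8_horiz_8_usdot(s, x_filter_s8, perm, horiz_const_s32);
    vst1q_s16(dst_ptr, d);

Unlike the SDOT path there is no range-clamp step: vusdotq_lane_s32 multiplies unsigned samples by signed filter values directly, so the u8 pixels are used as-is.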
static INLINE int32x4_t convolve8_4_usdot(uint8x16_t samples,
const int8x8_t filters,
@@ -263,37 +321,41 @@
return sum;
}
-static INLINE int16x8_t convolve8_8_usdot(uint8x16_t samples,
- const int8x8_t filters,
- const uint8x16x3_t permute_tbl,
- const int32x4_t horiz_const,
- const int16x8_t shift_round_0) {
- uint8x16_t permuted_samples[3];
- int32x4_t sum0, sum1;
- int16x8_t sum;
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE int16x8_t convolve8_horiz_8_sdot(uint8x16_t samples,
+ const int8x8_t filters,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum[2];
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
/* Permute samples ready for dot product. */
/* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
- permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
/* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
- permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
/* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
- permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+ /* Accumulate dot product into 'correction' to account for range clamp. */
/* First 4 output values. */
- sum0 = vusdotq_lane_s32(horiz_const, permuted_samples[0], filters, 0);
- sum0 = vusdotq_lane_s32(sum0, permuted_samples[1], filters, 1);
+ sum[0] = vdotq_lane_s32(correction, permuted_samples[0], filters, 0);
+ sum[0] = vdotq_lane_s32(sum[0], permuted_samples[1], filters, 1);
/* Second 4 output values. */
- sum1 = vusdotq_lane_s32(horiz_const, permuted_samples[1], filters, 0);
- sum1 = vusdotq_lane_s32(sum1, permuted_samples[2], filters, 1);
+ sum[1] = vdotq_lane_s32(correction, permuted_samples[1], filters, 0);
+ sum[1] = vdotq_lane_s32(sum[1], permuted_samples[2], filters, 1);
/* Narrow and re-pack. */
- sum = vcombine_s16(vmovn_s32(sum0), vmovn_s32(sum1));
- return vqrshlq_s16(sum, shift_round_0);
+ /* We halved the convolution filter values so -1 from the right shift. */
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
}
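
The correction term compensates for the range clamp as follows: SDOT needs signed bytes, so each unsigned sample x is mapped to x - 128 by the vsubq_u8/vreinterpretq pair above. Since

    sum_k f[k] * (x[k] - 128)  ==  sum_k f[k] * x[k]  -  128 * sum_k f[k]

pre-loading the accumulator with 128 * (sum of the filter taps), plus whatever offset and rounding-shim constants the caller needs, restores the true convolution. That constant is computed once per filter at the call sites (outside this hunk), not per pixel.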
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
-
static INLINE int32x4_t convolve8_4_sdot(uint8x16_t samples,
const int8x8_t filters,
const int32x4_t correction,
@@ -353,7 +415,38 @@
return vqrshlq_s16(sum, shift_round_0);
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+static INLINE int16x8_t convolve8_x_8_sdot(uint8x16_t samples,
+ const int8x8_t filters,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum[2];
+
+ /* Clamp sample range to [-128, 127] for 8-bit signed dot product. */
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ /* Permute samples ready for dot product. */
+ /* { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 } */
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ /* { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 } */
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ /* { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 } */
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ /* Accumulate dot product into 'correction' to account for range clamp. */
+ /* First 4 output values. */
+ sum[0] = vdotq_lane_s32(correction, permuted_samples[0], filters, 0);
+ sum[0] = vdotq_lane_s32(sum[0], permuted_samples[1], filters, 1);
+ /* Second 4 output values. */
+ sum[1] = vdotq_lane_s32(correction, permuted_samples[1], filters, 0);
+ sum[1] = vdotq_lane_s32(sum[1], permuted_samples[2], filters, 1);
+
+ /* Narrow and re-pack. */
+ return vcombine_s16(vmovn_s32(sum[0]), vmovn_s32(sum[1]));
+}
+
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
static INLINE int16x4_t convolve8_4x4_s16(
const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
@@ -379,114 +472,92 @@
return sum;
}
-static INLINE uint16x4_t convolve6_4_s32(const int16x4_t s0, const int16x4_t s1,
- const int16x4_t s2, const int16x4_t s3,
- const int16x4_t s4, const int16x4_t s5,
- const int16x8_t y_filter,
- const int32x4_t round_shift_vec,
- const int32x4_t offset_const) {
- const int16x4_t y_filter_lo = vget_low_s16(y_filter);
- const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+static INLINE int16x4_t convolve6_4x4(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter_0_7) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+ int16x4_t sum;
- int32x4_t sum = offset_const;
- sum = vmlal_lane_s16(sum, s0, y_filter_lo, 1);
- sum = vmlal_lane_s16(sum, s1, y_filter_lo, 2);
- sum = vmlal_lane_s16(sum, s2, y_filter_lo, 3);
- sum = vmlal_lane_s16(sum, s3, y_filter_hi, 0);
- sum = vmlal_lane_s16(sum, s4, y_filter_hi, 1);
- sum = vmlal_lane_s16(sum, s5, y_filter_hi, 2);
+ // Filter values at indices 0 and 7 are 0.
+ sum = vmul_lane_s16(s0, y_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s1, y_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s2, y_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s3, y_filter_4_7, 0);
+ sum = vmla_lane_s16(sum, s4, y_filter_4_7, 1);
+ sum = vmla_lane_s16(sum, s5, y_filter_4_7, 2);
- sum = vqrshlq_s32(sum, round_shift_vec);
- return vqmovun_s32(sum);
+ return sum;
}
-static INLINE uint16x8_t convolve6_8_s32(const int16x8_t s0, const int16x8_t s1,
- const int16x8_t s2, const int16x8_t s3,
- const int16x8_t s4, const int16x8_t s5,
- const int16x8_t y_filter,
- const int32x4_t round_shift_vec,
- const int32x4_t offset_const) {
- const int16x4_t y_filter_lo = vget_low_s16(y_filter);
- const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+static INLINE int16x8_t convolve6_8x4(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filters) {
+ const int16x4_t y_filter_lo = vget_low_s16(y_filters);
+ const int16x4_t y_filter_hi = vget_high_s16(y_filters);
+ int16x8_t sum;
- int32x4_t sum0 = offset_const;
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s0), y_filter_lo, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_lo, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_lo, 3);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_hi, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_hi, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_hi, 2);
+ // Filter values at indices 0 and 7 are 0.
+ sum = vmulq_lane_s16(s0, y_filter_lo, 1);
+ sum = vmlaq_lane_s16(sum, s1, y_filter_lo, 2);
+ sum = vmlaq_lane_s16(sum, s2, y_filter_lo, 3);
+ sum = vmlaq_lane_s16(sum, s3, y_filter_hi, 0);
+ sum = vmlaq_lane_s16(sum, s4, y_filter_hi, 1);
+ sum = vmlaq_lane_s16(sum, s5, y_filter_hi, 2);
- int32x4_t sum1 = offset_const;
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s0), y_filter_lo, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_lo, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_lo, 3);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_hi, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_hi, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_hi, 2);
-
- sum0 = vqrshlq_s32(sum0, round_shift_vec);
- sum1 = vqrshlq_s32(sum1, round_shift_vec);
- return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+ return sum;
}
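
Both 6-tap helpers rely on the storage convention noted in the "indices 0 and 7 are 0" comments: a 6-tap kernel arrives in an 8-tap array with zero end taps,

    // y_filter_0_7 = { 0, f1, f2, f3, f4, f5, f6, 0 }

which is why the multiplies start at lane 1 rather than lane 0, and why 6-tap callers (such as highbd_convolve_y_sr_6tap_neon in the new file below) begin reading one row further into the source than the 8-tap variants.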
-static INLINE uint16x4_t convolve8_4_s32(const int16x4_t s0, const int16x4_t s1,
- const int16x4_t s2, const int16x4_t s3,
- const int16x4_t s4, const int16x4_t s5,
- const int16x4_t s6, const int16x4_t s7,
- const int16x8_t y_filter,
- const int32x4_t round_shift_vec,
- const int32x4_t offset_const) {
- const int16x4_t y_filter_lo = vget_low_s16(y_filter);
- const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+#if !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
- int32x4_t sum = offset_const;
- sum = vmlal_lane_s16(sum, s0, y_filter_lo, 0);
- sum = vmlal_lane_s16(sum, s1, y_filter_lo, 1);
- sum = vmlal_lane_s16(sum, s2, y_filter_lo, 2);
- sum = vmlal_lane_s16(sum, s3, y_filter_lo, 3);
- sum = vmlal_lane_s16(sum, s4, y_filter_hi, 0);
- sum = vmlal_lane_s16(sum, s5, y_filter_hi, 1);
- sum = vmlal_lane_s16(sum, s6, y_filter_hi, 2);
- sum = vmlal_lane_s16(sum, s7, y_filter_hi, 3);
+static INLINE int16x4_t convolve8_horiz_4x4_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t filter,
+ const int16x4_t horiz_const) {
+ const int16x4_t filter_lo = vget_low_s16(filter);
+ const int16x4_t filter_hi = vget_high_s16(filter);
+ int16x4_t sum;
- sum = vqrshlq_s32(sum, round_shift_vec);
- return vqmovun_s32(sum);
+ sum = horiz_const;
+ sum = vmla_lane_s16(sum, s0, filter_lo, 0);
+ sum = vmla_lane_s16(sum, s1, filter_lo, 1);
+ sum = vmla_lane_s16(sum, s2, filter_lo, 2);
+ sum = vmla_lane_s16(sum, s3, filter_lo, 3);
+ sum = vmla_lane_s16(sum, s4, filter_hi, 0);
+ sum = vmla_lane_s16(sum, s5, filter_hi, 1);
+ sum = vmla_lane_s16(sum, s6, filter_hi, 2);
+ sum = vmla_lane_s16(sum, s7, filter_hi, 3);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ return vshr_n_s16(sum, ROUND0_BITS - 1);
}
-static INLINE uint16x8_t convolve8_8_s32(const int16x8_t s0, const int16x8_t s1,
- const int16x8_t s2, const int16x8_t s3,
- const int16x8_t s4, const int16x8_t s5,
- const int16x8_t s6, const int16x8_t s7,
- const int16x8_t y_filter,
- const int32x4_t round_shift_vec,
- const int32x4_t offset_const) {
- const int16x4_t y_filter_lo = vget_low_s16(y_filter);
- const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+static INLINE int16x8_t convolve8_horiz_8x8_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t filter,
+ const int16x8_t horiz_const) {
+ const int16x4_t filter_lo = vget_low_s16(filter);
+ const int16x4_t filter_hi = vget_high_s16(filter);
+ int16x8_t sum;
- int32x4_t sum0 = offset_const;
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s0), y_filter_lo, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_lo, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_lo, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_lo, 3);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_hi, 0);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_hi, 1);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s6), y_filter_hi, 2);
- sum0 = vmlal_lane_s16(sum0, vget_low_s16(s7), y_filter_hi, 3);
+ sum = horiz_const;
+ sum = vmlaq_lane_s16(sum, s0, filter_lo, 0);
+ sum = vmlaq_lane_s16(sum, s1, filter_lo, 1);
+ sum = vmlaq_lane_s16(sum, s2, filter_lo, 2);
+ sum = vmlaq_lane_s16(sum, s3, filter_lo, 3);
+ sum = vmlaq_lane_s16(sum, s4, filter_hi, 0);
+ sum = vmlaq_lane_s16(sum, s5, filter_hi, 1);
+ sum = vmlaq_lane_s16(sum, s6, filter_hi, 2);
+ sum = vmlaq_lane_s16(sum, s7, filter_hi, 3);
- int32x4_t sum1 = offset_const;
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s0), y_filter_lo, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_lo, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_lo, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_lo, 3);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_hi, 0);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_hi, 1);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s6), y_filter_hi, 2);
- sum1 = vmlal_lane_s16(sum1, vget_high_s16(s7), y_filter_hi, 3);
-
- sum0 = vqrshlq_s32(sum0, round_shift_vec);
- sum1 = vqrshlq_s32(sum1, round_shift_vec);
- return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+ // We halved the convolution filter values so -1 from the right shift.
+ return vshrq_n_s16(sum, ROUND0_BITS - 1);
}
+#endif // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
+
#endif // AOM_AV1_COMMON_ARM_CONVOLVE_NEON_H_
diff --git a/av1/common/arm/highbd_convolve_neon.c b/av1/common/arm/highbd_convolve_neon.c
new file mode 100644
index 0000000..fb18e28
--- /dev/null
+++ b/av1/common/arm/highbd_convolve_neon.c
@@ -0,0 +1,2381 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <assert.h>
+#include <arm_neon.h>
+
+#include "config/aom_config.h"
+#include "config/av1_rtcd.h"
+
+#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/arm/mem_neon.h"
+#include "aom_dsp/arm/transpose_neon.h"
+#include "aom_ports/mem.h"
+#include "av1/common/convolve.h"
+#include "av1/common/filter.h"
+#include "av1/common/arm/highbd_convolve_neon.h"
+
+static INLINE void highbd_convolve_y_sr_6tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, const int bd) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+ const int16_t *s = (const int16_t *)(src_ptr + src_stride);
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = highbd_convolve6_4_s32_s16(s0, s1, s2, s3, s4, s5, y_filter_0_7,
+ zero_s32);
+ d1 = highbd_convolve6_4_s32_s16(s1, s2, s3, s4, s5, s6, y_filter_0_7,
+ zero_s32);
+ d2 = highbd_convolve6_4_s32_s16(s2, s3, s4, s5, s6, s7, y_filter_0_7,
+ zero_s32);
+ d3 = highbd_convolve6_4_s32_s16(s3, s4, s5, s6, s7, s8, y_filter_0_7,
+ zero_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ d01 = vminq_u16(d01, max);
+ d23 = vminq_u16(d23, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ // if width is a multiple of 8 & height is a multiple of 4
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x8_t d0, d1, d2, d3;
+
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)(src_ptr + src_stride);
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = highbd_convolve6_8_s32_s16(s0, s1, s2, s3, s4, s5, y_filter_0_7,
+ zero_s32);
+ d1 = highbd_convolve6_8_s32_s16(s1, s2, s3, s4, s5, s6, y_filter_0_7,
+ zero_s32);
+ d2 = highbd_convolve6_8_s32_s16(s2, s3, s4, s5, s6, s7, y_filter_0_7,
+ zero_s32);
+ d3 = highbd_convolve6_8_s32_s16(s3, s4, s5, s6, s7, s8, y_filter_0_7,
+ zero_s32);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
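+// Vertical 8-tap convolution; same structure as the 6-tap path, but with a
+// seven-row preload and an eight-row filter window.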
+static INLINE void highbd_convolve_y_sr_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, int bd) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_4_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ zero_s32);
+ d1 = highbd_convolve8_4_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ zero_s32);
+ d2 = highbd_convolve8_4_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ zero_s32);
+ d3 = highbd_convolve8_4_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ zero_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ d01 = vminq_u16(d01, max);
+ d23 = vminq_u16(d23, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_8_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, zero_s32);
+ d1 = highbd_convolve8_8_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, zero_s32);
+ d2 = highbd_convolve8_8_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9,
+ y_filter, zero_s32);
+ d3 = highbd_convolve8_8_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10,
+ y_filter, zero_s32);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
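+// Vertical 12-tap convolution; preloads eleven source rows and splits the
+// filter into an 8-tap half (y_filter_0_7) and a 4-tap half (y_filter_8_11).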
+static INLINE void highbd_convolve_y_sr_12tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, int bd) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);
+ const int16x4_t y_filter_8_11 = vld1_s16(y_filter_ptr + 8);
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x11(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7, &s8,
+ &s9, &s10);
+ s += 11 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s11, &s12, &s13, &s14);
+
+ d0 = highbd_convolve12_y_4_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9,
+ s10, s11, y_filter_0_7, y_filter_8_11,
+ zero_s32);
+ d1 = highbd_convolve12_y_4_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8, s9,
+ s10, s11, s12, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+ d2 = highbd_convolve12_y_4_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, s12, s13, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+ d3 = highbd_convolve12_y_4_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, s13, s14, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ d01 = vminq_u16(d01, max);
+ d23 = vminq_u16(d23, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ uint16x8_t d0, d1, d2, d3;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x11(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7, &s8,
+ &s9, &s10);
+ s += 11 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s11, &s12, &s13, &s14);
+
+ d0 = highbd_convolve12_y_8_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7, s8,
+ s9, s10, s11, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+ d1 = highbd_convolve12_y_8_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8, s9,
+ s10, s11, s12, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+ d2 = highbd_convolve12_y_8_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, s12, s13, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+ d3 = highbd_convolve12_y_8_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ s12, s13, s14, y_filter_0_7,
+ y_filter_8_11, zero_s32);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
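+// Dispatch on the effective filter length: more than 8 taps selects the
+// 12-tap path, fewer than 8 the 6-tap path, and exactly 8 the 8-tap path.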
+void av1_highbd_convolve_y_sr_neon(const uint16_t *src, int src_stride,
+ uint16_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_y,
+ const int subpel_y_qn, int bd) {
+ const int y_filter_taps = get_filter_tap(filter_params_y, subpel_y_qn);
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_y, subpel_y_qn & SUBPEL_MASK);
+
+ src -= vert_offset * src_stride;
+
+ if (y_filter_taps > 8) {
+ highbd_convolve_y_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ y_filter_ptr, bd);
+ return;
+ }
+ if (y_filter_taps < 8) {
+ highbd_convolve_y_sr_6tap_neon(src, src_stride, dst, dst_stride, w, h,
+ y_filter_ptr, bd);
+ return;
+ }
+
+ highbd_convolve_y_sr_8tap_neon(src, src_stride, dst, dst_stride, w, h,
+ y_filter_ptr, bd);
+}
+
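+// Horizontal 8-tap convolution. The kernel applies the round_0 shift
+// (shift_s32); the remaining FILTER_BITS - round_0 shift is applied here
+// before clamping to the pixel range.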
+static INLINE void highbd_convolve_x_sr_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *x_filter_ptr, ConvolveParams *conv_params,
+ int bd) {
+ const int16x8_t x_filter = vld1q_s16(x_filter_ptr);
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int bits = FILTER_BITS - conv_params->round_0;
+ const int16x8_t bits_s16 = vdupq_n_s16(-bits);
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+
+ if (w <= 4) {
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ do {
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ load_s16_8x2(s + 8, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve8_horiz4_s32_s16(s0, s1, x_filter, shift_s32,
+ zero_s32);
+ d1 = highbd_convolve8_horiz4_s32_s16(s2, s3, x_filter, shift_s32,
+ zero_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d01 = vqrshlq_u16(d01, bits_s16);
+ d01 = vminq_u16(d01, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int width = w;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x4(s, src_stride, &s0, &s2, &s4, &s6);
+ s += 8;
+
+ do {
+ load_s16_8x4(s, src_stride, &s1, &s3, &s5, &s7);
+
+ d0 = highbd_convolve8_horiz8_s32_s16(s0, s1, x_filter, shift_s32,
+ zero_s32);
+ d1 = highbd_convolve8_horiz8_s32_s16(s2, s3, x_filter, shift_s32,
+ zero_s32);
+ d2 = highbd_convolve8_horiz8_s32_s16(s4, s5, x_filter, shift_s32,
+ zero_s32);
+ d3 = highbd_convolve8_horiz8_s32_s16(s6, s7, x_filter, shift_s32,
+ zero_s32);
+
+ d0 = vqrshlq_u16(d0, bits_s16);
+ d1 = vqrshlq_u16(d1, bits_s16);
+ d2 = vqrshlq_u16(d2, bits_s16);
+ d3 = vqrshlq_u16(d3, bits_s16);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s1;
+ s2 = s3;
+ s4 = s5;
+ s6 = s7;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ }
+}
+
+static INLINE void highbd_convolve_x_sr_12tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *x_filter_ptr, ConvolveParams *conv_params,
+ int bd) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int bits = FILTER_BITS - conv_params->round_0;
+ const int16x8_t bits_s16 = vdupq_n_s16(-bits);
+ const int16x8_t x_filter_0_7 = vld1q_s16(x_filter_ptr);
+ const int16x4_t x_filter_8_11 = vld1_s16(x_filter_ptr + 8);
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+
+ if (w <= 4) {
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ do {
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ load_s16_8x2(s + 8, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve12_horiz4_s32_s16(s0, s1, x_filter_0_7, x_filter_8_11,
+ shift_s32, zero_s32);
+ d1 = highbd_convolve12_horiz4_s32_s16(s2, s3, x_filter_0_7, x_filter_8_11,
+ shift_s32, zero_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d01 = vqrshlq_u16(d01, bits_s16);
+ d01 = vminq_u16(d01, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int width = w;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x4(s, src_stride, &s0, &s3, &s6, &s9);
+ s += 8;
+
+ do {
+ load_s16_8x4(s, src_stride, &s1, &s4, &s7, &s10);
+ load_s16_8x4(s + 8, src_stride, &s2, &s5, &s8, &s11);
+
+ d0 = highbd_convolve12_horiz8_s32_s16(
+ s0, s1, s2, x_filter_0_7, x_filter_8_11, shift_s32, zero_s32);
+ d1 = highbd_convolve12_horiz8_s32_s16(
+ s3, s4, s5, x_filter_0_7, x_filter_8_11, shift_s32, zero_s32);
+ d2 = highbd_convolve12_horiz8_s32_s16(
+ s6, s7, s8, x_filter_0_7, x_filter_8_11, shift_s32, zero_s32);
+ d3 = highbd_convolve12_horiz8_s32_s16(
+ s9, s10, s11, x_filter_0_7, x_filter_8_11, shift_s32, zero_s32);
+
+ d0 = vqrshlq_u16(d0, bits_s16);
+ d1 = vqrshlq_u16(d1, bits_s16);
+ d2 = vqrshlq_u16(d2, bits_s16);
+ d3 = vqrshlq_u16(d3, bits_s16);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s1;
+ s1 = s2;
+ s3 = s4;
+ s4 = s5;
+ s6 = s7;
+ s7 = s8;
+ s9 = s10;
+ s10 = s11;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ }
+}
+
+void av1_highbd_convolve_x_sr_neon(const uint16_t *src, int src_stride,
+ uint16_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_x,
+ const int subpel_x_qn,
+ ConvolveParams *conv_params, int bd) {
+ const int x_filter_taps = get_filter_tap(filter_params_x, subpel_x_qn);
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+
+ src -= horiz_offset;
+
+ if (x_filter_taps > 8) {
+ highbd_convolve_x_sr_12tap_neon(src, src_stride, dst, dst_stride, w, h,
+ x_filter_ptr, conv_params, bd);
+ return;
+ }
+
+ highbd_convolve_x_sr_8tap_neon(src, src_stride, dst, dst_stride, w, h,
+ x_filter_ptr, conv_params, bd);
+}
+
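+// Second (vertical) pass of the 2D convolution: applies the round_1 shift
+// together with the offset and correction terms, then clamps to the pixel
+// range.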
+static INLINE void highbd_convolve_2d_y_sr_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, ConvolveParams *conv_params,
+ int bd, const int offset, const int correction) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+ const int round1_shift = conv_params->round_1;
+ const int32x4_t round1_shift_s32 = vdupq_n_s32(-round1_shift);
+ const int32x4_t correction_s32 = vdupq_n_s32(correction);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_4_sr_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, round1_shift_s32, offset_s32,
+ correction_s32);
+ d1 = highbd_convolve8_4_sr_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, round1_shift_s32, offset_s32,
+ correction_s32);
+ d2 = highbd_convolve8_4_sr_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9,
+ y_filter, round1_shift_s32, offset_s32,
+ correction_s32);
+ d3 = highbd_convolve8_4_sr_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10,
+ y_filter, round1_shift_s32, offset_s32,
+ correction_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ d01 = vminq_u16(d01, max);
+ d23 = vminq_u16(d23, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_8_sr_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, round1_shift_s32,
+ offset_s32, correction_s32);
+ d1 = highbd_convolve8_8_sr_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, round1_shift_s32,
+ offset_s32, correction_s32);
+ d2 = highbd_convolve8_8_sr_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9,
+ y_filter, round1_shift_s32,
+ offset_s32, correction_s32);
+ d3 = highbd_convolve8_8_sr_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10,
+ y_filter, round1_shift_s32,
+ offset_s32, correction_s32);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
+static INLINE void highbd_convolve_2d_y_sr_12tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, ConvolveParams *conv_params,
+ const int bd, const int offset, const int correction) {
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ const int16x8_t y_filter_0_7 = vld1q_s16(y_filter_ptr);
+ const int16x4_t y_filter_8_11 = vld1_s16(y_filter_ptr + 8);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+ const int round1_shift = conv_params->round_1;
+ const int32x4_t round1_shift_s32 = vdupq_n_s32(-round1_shift);
+ const int32x4_t correction_s32 = vdupq_n_s32(correction);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x11(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7, &s8,
+ &s9, &s10);
+ s += 11 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s11, &s12, &s13, &s14);
+
+ d0 = highbd_convolve12_y_4_sr_s32_s16(
+ s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d1 = highbd_convolve12_y_4_sr_s32_s16(
+ s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d2 = highbd_convolve12_y_4_sr_s32_s16(
+ s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d3 = highbd_convolve12_y_4_sr_s32_s16(
+ s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ d01 = vminq_u16(d01, max);
+ d23 = vminq_u16(d23, max);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ uint16x8_t d0, d1, d2, d3;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14;
+
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x11(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7, &s8,
+ &s9, &s10);
+ s += 11 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s11, &s12, &s13, &s14);
+
+ d0 = highbd_convolve12_y_8_sr_s32_s16(
+ s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d1 = highbd_convolve12_y_8_sr_s32_s16(
+ s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d2 = highbd_convolve12_y_8_sr_s32_s16(
+ s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+ d3 = highbd_convolve12_y_8_sr_s32_s16(
+ s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, y_filter_0_7,
+ y_filter_8_11, round1_shift_s32, offset_s32, correction_s32);
+
+ d0 = vminq_u16(d0, max);
+ d1 = vminq_u16(d1, max);
+ d2 = vminq_u16(d2, max);
+ d3 = vminq_u16(d3, max);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s7 = s11;
+ s8 = s12;
+ s9 = s13;
+ s10 = s14;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
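+// First (horizontal) pass of the 2D convolution. Results stay at
+// intermediate precision with the offset added, so no clamping is done here.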
+static INLINE void highbd_convolve_x_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *x_filter_ptr, ConvolveParams *conv_params,
+ const int offset) {
+ const int16x8_t x_filter = vld1q_s16(x_filter_ptr);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+
+ if (w <= 4) {
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ do {
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ load_s16_8x2(s + 8, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve8_horiz4_s32_s16(s0, s1, x_filter, shift_s32,
+ offset_s32);
+ d1 = highbd_convolve8_horiz4_s32_s16(s2, s3, x_filter, shift_s32,
+ offset_s32);
+
+ d01 = vcombine_u16(d0, d1);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int width = w;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x4(s, src_stride, &s0, &s2, &s4, &s6);
+ s += 8;
+
+ do {
+ load_s16_8x4(s, src_stride, &s1, &s3, &s5, &s7);
+
+ d0 = highbd_convolve8_horiz8_s32_s16(s0, s1, x_filter, shift_s32,
+ offset_s32);
+ d1 = highbd_convolve8_horiz8_s32_s16(s2, s3, x_filter, shift_s32,
+ offset_s32);
+ d2 = highbd_convolve8_horiz8_s32_s16(s4, s5, x_filter, shift_s32,
+ offset_s32);
+ d3 = highbd_convolve8_horiz8_s32_s16(s6, s7, x_filter, shift_s32,
+ offset_s32);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s1;
+ s2 = s3;
+ s4 = s5;
+ s6 = s7;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ }
+}
+
+static INLINE void highbd_convolve_2d_x_sr_12tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *x_filter_ptr, ConvolveParams *conv_params,
+ const int offset) {
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int16x8_t x_filter_0_7 = vld1q_s16(x_filter_ptr);
+ const int16x4_t x_filter_8_11 = vld1_s16(x_filter_ptr + 8);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+
+ if (w <= 4) {
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ do {
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ load_s16_8x2(s + 8, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve12_horiz4_s32_s16(s0, s1, x_filter_0_7, x_filter_8_11,
+ shift_s32, offset_s32);
+ d1 = highbd_convolve12_horiz4_s32_s16(s2, s3, x_filter_0_7, x_filter_8_11,
+ shift_s32, offset_s32);
+
+ d01 = vcombine_u16(d0, d1);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int width = w;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x4(s, src_stride, &s0, &s3, &s6, &s9);
+ s += 8;
+
+ do {
+ load_s16_8x4(s, src_stride, &s1, &s4, &s7, &s10);
+ load_s16_8x4(s + 8, src_stride, &s2, &s5, &s8, &s11);
+
+ d0 = highbd_convolve12_horiz8_s32_s16(
+ s0, s1, s2, x_filter_0_7, x_filter_8_11, shift_s32, offset_s32);
+ d1 = highbd_convolve12_horiz8_s32_s16(
+ s3, s4, s5, x_filter_0_7, x_filter_8_11, shift_s32, offset_s32);
+ d2 = highbd_convolve12_horiz8_s32_s16(
+ s6, s7, s8, x_filter_0_7, x_filter_8_11, shift_s32, offset_s32);
+ d3 = highbd_convolve12_horiz8_s32_s16(
+ s9, s10, s11, x_filter_0_7, x_filter_8_11, shift_s32, offset_s32);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s1;
+ s1 = s2;
+ s3 = s4;
+ s4 = s5;
+ s6 = s7;
+ s7 = s8;
+ s9 = s10;
+ s10 = s11;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ }
+}
+
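+// 2D single-reference convolution: a horizontal pass into the intermediate
+// im_block buffer, followed by a vertical pass into the destination. Filters
+// longer than 8 taps take the 12-tap kernels.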
+void av1_highbd_convolve_2d_sr_neon(const uint16_t *src, int src_stride,
+ uint16_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_x,
+ const InterpFilterParams *filter_params_y,
+ const int subpel_x_qn,
+ const int subpel_y_qn,
+ ConvolveParams *conv_params, int bd) {
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+ const int im_h = h + filter_params_y->taps - 1;
+ const int im_stride = MAX_SB_SIZE;
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const int x_offset_initial = (1 << (bd + FILTER_BITS - 1));
+ const int y_offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int y_offset_initial = (1 << y_offset_bits);
+ const int y_offset_correction =
+ ((1 << (y_offset_bits - conv_params->round_1)) +
+ (1 << (y_offset_bits - conv_params->round_1 - 1)));
+
+ const uint16_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
+
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_y, subpel_y_qn & SUBPEL_MASK);
+
+ if (filter_params_x->taps > 8) {
+ highbd_convolve_2d_x_sr_12tap_neon(src_ptr, src_stride, im_block, im_stride,
+ w, im_h, x_filter_ptr, conv_params,
+ x_offset_initial);
+
+ highbd_convolve_2d_y_sr_12tap_neon(im_block, im_stride, dst, dst_stride, w,
+ h, y_filter_ptr, conv_params, bd,
+ y_offset_initial, y_offset_correction);
+ } else {
+ highbd_convolve_x_8tap_neon(src_ptr, src_stride, im_block, im_stride, w,
+ im_h, x_filter_ptr, conv_params,
+ x_offset_initial);
+
+ highbd_convolve_2d_y_sr_8tap_neon(im_block, im_stride, dst, dst_stride, w,
+ h, y_filter_ptr, conv_params, bd,
+ y_offset_initial, y_offset_correction);
+ }
+}
+
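+// Horizontal pass of the scaled 2D convolution. Each group of four outputs
+// may use a different source position and subpel filter, so both are derived
+// from x_qn in SIMD and then gathered through scalar pointers.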
+static INLINE void highbd_convolve_2d_x_scale_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int subpel_x_qn, const int x_step_qn,
+ const InterpFilterParams *filter_params, ConvolveParams *conv_params,
+ const int offset) {
+ const uint32x4_t idx = { 0, 1, 2, 3 };
+ const uint32x4_t subpel_mask = vdupq_n_u32(SCALE_SUBPEL_MASK);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+
+ if (w <= 4) {
+ int height = h;
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0;
+
+ uint16_t *d = dst_ptr;
+
+ do {
+ int x_qn = subpel_x_qn;
+
+ // Load 4 src vectors at a time; they might be the same, but we have to
+ // calculate the indices anyway. Doing it in SIMD and then storing the
+ // indices is faster than evaluating the expression
+ // &src_ptr[((x_qn + 0 * x_step_qn) >> SCALE_SUBPEL_BITS)] four times.
+ // Ideally this would be a gather using the indices, but NEON does not
+ // have one, so we have to emulate it.
+ const uint32x4_t xqn_idx = vmlaq_n_u32(vdupq_n_u32(x_qn), idx, x_step_qn);
+ // We have to multiply by 2 to get the byte offset, since
+ // sizeof(uint16_t) == 2.
+ const uint32x4_t src_idx_u32 =
+ vshlq_n_u32(vshrq_n_u32(xqn_idx, SCALE_SUBPEL_BITS), 1);
+#if AOM_ARCH_AARCH64
+ uint64x2_t src4[2];
+ src4[0] = vaddw_u32(vdupq_n_u64((const uint64_t)src_ptr),
+ vget_low_u32(src_idx_u32));
+ src4[1] = vaddw_u32(vdupq_n_u64((const uint64_t)src_ptr),
+ vget_high_u32(src_idx_u32));
+ int16_t *src4_ptr[4];
+ uint64_t *tmp_ptr = (uint64_t *)&src4_ptr;
+ vst1q_u64(tmp_ptr, src4[0]);
+ vst1q_u64(tmp_ptr + 2, src4[1]);
+#else
+ uint32x4_t src4;
+ src4 = vaddq_u32(vdupq_n_u32((const uint32_t)src_ptr), src_idx_u32);
+ int16_t *src4_ptr[4];
+ uint32_t *tmp_ptr = (uint32_t *)&src4_ptr;
+ vst1q_u32(tmp_ptr, src4);
+#endif // AOM_ARCH_AARCH64
+ // Same for the filter vectors
+ const int32x4_t filter_idx_s32 = vreinterpretq_s32_u32(
+ vshrq_n_u32(vandq_u32(xqn_idx, subpel_mask), SCALE_EXTRA_BITS));
+ int32_t x_filter4_idx[4];
+ vst1q_s32(x_filter4_idx, filter_idx_s32);
+ const int16_t *x_filter4_ptr[4];
+
+ // Load source
+ s0 = vld1q_s16(src4_ptr[0]);
+ s1 = vld1q_s16(src4_ptr[1]);
+ s2 = vld1q_s16(src4_ptr[2]);
+ s3 = vld1q_s16(src4_ptr[3]);
+
+ // We could easily do this using SIMD as well instead of calling the
+ // inline function 4 times.
+ x_filter4_ptr[0] =
+ av1_get_interp_filter_subpel_kernel(filter_params, x_filter4_idx[0]);
+ x_filter4_ptr[1] =
+ av1_get_interp_filter_subpel_kernel(filter_params, x_filter4_idx[1]);
+ x_filter4_ptr[2] =
+ av1_get_interp_filter_subpel_kernel(filter_params, x_filter4_idx[2]);
+ x_filter4_ptr[3] =
+ av1_get_interp_filter_subpel_kernel(filter_params, x_filter4_idx[3]);
+
+ // Actually load the filters
+ const int16x8_t x_filter0 = vld1q_s16(x_filter4_ptr[0]);
+ const int16x8_t x_filter1 = vld1q_s16(x_filter4_ptr[1]);
+ const int16x8_t x_filter2 = vld1q_s16(x_filter4_ptr[2]);
+ const int16x8_t x_filter3 = vld1q_s16(x_filter4_ptr[3]);
+
+ // Group low and high parts and transpose
+ int16x4_t filters_lo[] = { vget_low_s16(x_filter0),
+ vget_low_s16(x_filter1),
+ vget_low_s16(x_filter2),
+ vget_low_s16(x_filter3) };
+ int16x4_t filters_hi[] = { vget_high_s16(x_filter0),
+ vget_high_s16(x_filter1),
+ vget_high_s16(x_filter2),
+ vget_high_s16(x_filter3) };
+ transpose_u16_4x4((uint16x4_t *)filters_lo);
+ transpose_u16_4x4((uint16x4_t *)filters_hi);
+
+ // Run the 2D Scale convolution
+ d0 = highbd_convolve8_2d_scale_horiz4x8_s32_s16(
+ s0, s1, s2, s3, filters_lo, filters_hi, shift_s32, offset_s32);
+
+ if (w == 2) {
+ store_u16_2x1(d + 0 * dst_stride, d0, 0);
+ } else {
+ vst1_u16(d + 0 * dst_stride, d0);
+ }
+
+ src_ptr += src_stride;
+ d += dst_stride;
+ height--;
+ } while (height > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0;
+
+ do {
+ int width = w;
+ int x_qn = subpel_x_qn;
+ uint16_t *d = dst_ptr;
+ const uint16_t *s = src_ptr;
+
+ do {
+ // Load 4 src vectors at a time; they might be the same, but we have to
+ // calculate the indices anyway. Doing it in SIMD and then storing the
+ // indices is faster than evaluating the expression
+ // &src_ptr[((x_qn + 0 * x_step_qn) >> SCALE_SUBPEL_BITS)] four times.
+ // Ideally this would be a gather using the indices, but NEON does not
+ // have one, so we have to emulate it.
+ const uint32x4_t xqn_idx =
+ vmlaq_n_u32(vdupq_n_u32(x_qn), idx, x_step_qn);
+ // We have to multiply by 2 to get the byte offset, since
+ // sizeof(uint16_t) == 2.
+ const uint32x4_t src_idx_u32 =
+ vshlq_n_u32(vshrq_n_u32(xqn_idx, SCALE_SUBPEL_BITS), 1);
+#if AOM_ARCH_AARCH64
+ uint64x2_t src4[2];
+ src4[0] = vaddw_u32(vdupq_n_u64((const uint64_t)s),
+ vget_low_u32(src_idx_u32));
+ src4[1] = vaddw_u32(vdupq_n_u64((const uint64_t)s),
+ vget_high_u32(src_idx_u32));
+ int16_t *src4_ptr[4];
+ uint64_t *tmp_ptr = (uint64_t *)&src4_ptr;
+ vst1q_u64(tmp_ptr, src4[0]);
+ vst1q_u64(tmp_ptr + 2, src4[1]);
+#else
+ uint32x4_t src4;
+ src4 = vaddq_u32(vdupq_n_u32((const uint32_t)s), src_idx_u32);
+ int16_t *src4_ptr[4];
+ uint32_t *tmp_ptr = (uint32_t *)&src4_ptr;
+ vst1q_u32(tmp_ptr, src4);
+#endif // AOM_ARCH_AARCH64
+ // Same for the filter vectors
+ const int32x4_t filter_idx_s32 = vreinterpretq_s32_u32(
+ vshrq_n_u32(vandq_u32(xqn_idx, subpel_mask), SCALE_EXTRA_BITS));
+ int32_t x_filter4_idx[4];
+ vst1q_s32(x_filter4_idx, filter_idx_s32);
+ const int16_t *x_filter4_ptr[4];
+
+ // Load source
+ s0 = vld1q_s16(src4_ptr[0]);
+ s1 = vld1q_s16(src4_ptr[1]);
+ s2 = vld1q_s16(src4_ptr[2]);
+ s3 = vld1q_s16(src4_ptr[3]);
+
+ // We could easily do this using SIMD as well instead of calling the
+ // inline function 4 times.
+ x_filter4_ptr[0] = av1_get_interp_filter_subpel_kernel(
+ filter_params, x_filter4_idx[0]);
+ x_filter4_ptr[1] = av1_get_interp_filter_subpel_kernel(
+ filter_params, x_filter4_idx[1]);
+ x_filter4_ptr[2] = av1_get_interp_filter_subpel_kernel(
+ filter_params, x_filter4_idx[2]);
+ x_filter4_ptr[3] = av1_get_interp_filter_subpel_kernel(
+ filter_params, x_filter4_idx[3]);
+
+ // Actually load the filters
+ const int16x8_t x_filter0 = vld1q_s16(x_filter4_ptr[0]);
+ const int16x8_t x_filter1 = vld1q_s16(x_filter4_ptr[1]);
+ const int16x8_t x_filter2 = vld1q_s16(x_filter4_ptr[2]);
+ const int16x8_t x_filter3 = vld1q_s16(x_filter4_ptr[3]);
+
+ // Group low and high parts and transpose
+ int16x4_t filters_lo[] = { vget_low_s16(x_filter0),
+ vget_low_s16(x_filter1),
+ vget_low_s16(x_filter2),
+ vget_low_s16(x_filter3) };
+ int16x4_t filters_hi[] = { vget_high_s16(x_filter0),
+ vget_high_s16(x_filter1),
+ vget_high_s16(x_filter2),
+ vget_high_s16(x_filter3) };
+ transpose_u16_4x4((uint16x4_t *)filters_lo);
+ transpose_u16_4x4((uint16x4_t *)filters_hi);
+
+ // Run the 2D Scale X convolution
+ d0 = highbd_convolve8_2d_scale_horiz4x8_s32_s16(
+ s0, s1, s2, s3, filters_lo, filters_hi, shift_s32, offset_s32);
+
+ vst1_u16(d, d0);
+
+ x_qn += 4 * x_step_qn;
+ d += 4;
+ width -= 4;
+ } while (width > 0);
+
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ height--;
+ } while (height > 0);
+ }
+}
+
+static INLINE void highbd_convolve_2d_y_scale_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int subpel_y_qn, const int y_step_qn,
+ const InterpFilterParams *filter_params, const int round1_bits,
+ const int offset) {
+ const int32x4_t offset_s32 = vdupq_n_s32(1 << offset);
+
+ const int32x4_t round1_shift_s32 = vdupq_n_s32(-round1_bits);
+ if (w <= 4) {
+ int height = h;
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t d0;
+
+ uint16_t *d = dst_ptr;
+
+ int y_qn = subpel_y_qn;
+ do {
+ const int16_t *s =
+ (const int16_t *)&src_ptr[(y_qn >> SCALE_SUBPEL_BITS) * src_stride];
+
+ load_s16_4x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+
+ const int y_filter_idx = (y_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+ const int16_t *y_filter_ptr =
+ av1_get_interp_filter_subpel_kernel(filter_params, y_filter_idx);
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+
+ d0 = highbd_convolve8_4_sr_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, round1_shift_s32, offset_s32,
+ vdupq_n_s32(0));
+
+ if (w == 2) {
+ store_u16_2x1(d, d0, 0);
+ } else {
+ vst1_u16(d, d0);
+ }
+
+ y_qn += y_step_qn;
+ d += dst_stride;
+ height--;
+ } while (height > 0);
+ } else {
+ int width = w;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0;
+
+ do {
+ int height = h;
+ int y_qn = subpel_y_qn;
+
+ uint16_t *d = dst_ptr;
+
+ do {
+ const int16_t *s =
+ (const int16_t *)&src_ptr[(y_qn >> SCALE_SUBPEL_BITS) * src_stride];
+ load_s16_8x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+
+ const int y_filter_idx = (y_qn & SCALE_SUBPEL_MASK) >> SCALE_EXTRA_BITS;
+ const int16_t *y_filter_ptr =
+ av1_get_interp_filter_subpel_kernel(filter_params, y_filter_idx);
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+
+ d0 = highbd_convolve8_8_sr_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, round1_shift_s32,
+ offset_s32, vdupq_n_s32(0));
+ vst1q_u16(d, d0);
+
+ y_qn += y_step_qn;
+ d += dst_stride;
+ height--;
+ } while (height > 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ width -= 8;
+ } while (width > 0);
+ }
+}
+
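+// Distance-weighted compound average: blends the convolve buffer and the new
+// prediction as (d16 * fwd_offset + s * bck_offset) >> DIST_PRECISION_BITS
+// before removing the round offset and clamping.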
+static INLINE void highbd_dist_wtd_comp_avg_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, ConvolveParams *conv_params, const int round_bits,
+ const int offset, const int bd) {
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ const int dst16_stride = conv_params->dst_stride;
+ const int32x4_t round_shift_s32 = vdupq_n_s32(-round_bits);
+ const int16x4_t offset_s16 = vdup_n_s16(offset);
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+ uint16x4_t fwd_offset_u16 = vdup_n_u16(conv_params->fwd_offset);
+ uint16x4_t bck_offset_u16 = vdup_n_u16(conv_params->bck_offset);
+
+ // Weighted averaging
+ if (w <= 4) {
+ for (int y = 0; y < h; ++y) {
+ const uint16x4_t s = vld1_u16(src_ptr + y * src_stride);
+ const uint16x4_t d16 = vld1_u16(dst16 + y * dst16_stride);
+ // We use vmull_u16/vmlal_u16 instead of vmull_s16/vmlal_s16 because the
+ // latter sign-extend and the values are non-negative. However, d0/d1 are
+ // signed integers, so we use vqmovun to do saturated narrowing to
+ // unsigned.
+ int32x4_t d0 = vreinterpretq_s32_u32(vmull_u16(d16, fwd_offset_u16));
+ d0 = vreinterpretq_s32_u32(
+ vmlal_u16(vreinterpretq_u32_s32(d0), s, bck_offset_u16));
+ d0 = vshrq_n_s32(d0, DIST_PRECISION_BITS);
+ // Subtract the rounding offset and apply the final rounding shift.
+ d0 = vqrshlq_s32(vsubw_s16(d0, offset_s16), round_shift_s32);
+ uint16x4_t d = vqmovun_s32(d0);
+ d = vmin_u16(d, vget_low_u16(max));
+ if (w == 2) {
+ store_u16_2x1(dst_ptr + y * dst_stride, d, 0);
+ } else {
+ vst1_u16(dst_ptr + y * dst_stride, d);
+ }
+ }
+ } else {
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; x += 8) {
+ const uint16x8_t s = vld1q_u16(src_ptr + y * src_stride + x);
+ const uint16x8_t d16 = vld1q_u16(dst16 + y * dst16_stride + x);
+ // We use vmull_u16/vmlal_u16 instead of vmull_s16/vmlal_s16 because the
+ // latter sign-extend and the values are non-negative. However, d0/d1 are
+ // signed integers, so we use vqmovun to do saturated narrowing to
+ // unsigned.
+ int32x4_t d0 =
+ vreinterpretq_s32_u32(vmull_u16(vget_low_u16(d16), fwd_offset_u16));
+ int32x4_t d1 = vreinterpretq_s32_u32(
+ vmull_u16(vget_high_u16(d16), fwd_offset_u16));
+ d0 = vreinterpretq_s32_u32(vmlal_u16(vreinterpretq_u32_s32(d0),
+ vget_low_u16(s), bck_offset_u16));
+ d1 = vreinterpretq_s32_u32(vmlal_u16(vreinterpretq_u32_s32(d1),
+ vget_high_u16(s), bck_offset_u16));
+ d0 = vshrq_n_s32(d0, DIST_PRECISION_BITS);
+ d1 = vshrq_n_s32(d1, DIST_PRECISION_BITS);
+ d0 = vqrshlq_s32(vsubw_s16(d0, offset_s16), round_shift_s32);
+ d1 = vqrshlq_s32(vsubw_s16(d1, offset_s16), round_shift_s32);
+ uint16x8_t d01 = vcombine_u16(vqmovun_s32(d0), vqmovun_s32(d1));
+ d01 = vminq_u16(d01, max);
+ vst1q_u16(dst_ptr + y * dst_stride + x, d01);
+ }
+ }
+ }
+}
+
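+// Plain compound average: vhaddq_s32 halves the sum of the two predictions
+// before the round offset is removed and the result is clamped.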
+static INLINE void highbd_comp_avg_neon(const uint16_t *src_ptr, int src_stride,
+ uint16_t *dst_ptr, int dst_stride,
+ int w, int h,
+ ConvolveParams *conv_params,
+ const int round_bits, const int offset,
+ const int bd) {
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ const int dst16_stride = conv_params->dst_stride;
+ const int32x4_t round_shift_s32 = vdupq_n_s32(-round_bits);
+ const int16x4_t offset_s16 = vdup_n_s16(offset);
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+
+ if (w <= 4) {
+ for (int y = 0; y < h; ++y) {
+ const uint16x4_t s = vld1_u16(src_ptr + y * src_stride);
+ const uint16x4_t d16 = vld1_u16(dst16 + y * dst16_stride);
+ int32x4_t s_s32 = vreinterpretq_s32_u32(vmovl_u16(s));
+ int32x4_t d16_s32 = vreinterpretq_s32_u32(vmovl_u16(d16));
+ int32x4_t d0 = vhaddq_s32(s_s32, d16_s32);
+ d0 = vsubw_s16(d0, offset_s16);
+ d0 = vqrshlq_s32(d0, round_shift_s32);
+ uint16x4_t d = vqmovun_s32(d0);
+ d = vmin_u16(d, vget_low_u16(max));
+ if (w == 2) {
+ store_u16_2x1(dst_ptr + y * dst_stride, d, 0);
+ } else {
+ vst1_u16(dst_ptr + y * dst_stride, d);
+ }
+ }
+ } else {
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; x += 8) {
+ const uint16x8_t s = vld1q_u16(src_ptr + y * src_stride + x);
+ const uint16x8_t d16 = vld1q_u16(dst16 + y * dst16_stride + x);
+ int32x4_t s_lo = vreinterpretq_s32_u32(vmovl_u16(vget_low_u16(s)));
+ int32x4_t s_hi = vreinterpretq_s32_u32(vmovl_u16(vget_high_u16(s)));
+ int32x4_t d16_lo = vreinterpretq_s32_u32(vmovl_u16(vget_low_u16(d16)));
+ int32x4_t d16_hi = vreinterpretq_s32_u32(vmovl_u16(vget_high_u16(d16)));
+ int32x4_t d0 = vhaddq_s32(s_lo, d16_lo);
+ int32x4_t d1 = vhaddq_s32(s_hi, d16_hi);
+ d0 = vsubw_s16(d0, offset_s16);
+ d1 = vsubw_s16(d1, offset_s16);
+ d0 = vqrshlq_s32(d0, round_shift_s32);
+ d1 = vqrshlq_s32(d1, round_shift_s32);
+ uint16x8_t d01 = vcombine_u16(vqmovun_s32(d0), vqmovun_s32(d1));
+ d01 = vminq_u16(d01, max);
+ vst1q_u16(dst_ptr + y * dst_stride + x, d01);
+ }
+ }
+ }
+}
+
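+// Converts intermediate convolve output back to pixels: subtracts the round
+// offset, applies the final rounding shift and clamps to [0, (1 << bd) - 1].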
+static INLINE void highbd_convolve_correct_offset_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int round_bits, const int offset, const int bd) {
+ const int32x4_t round_shift_s32 = vdupq_n_s32(-round_bits);
+ const int16x4_t offset_s16 = vdup_n_s16(offset);
+ const uint16x8_t max = vdupq_n_u16((1 << bd) - 1);
+
+ if (w <= 4) {
+ for (int y = 0; y < h; ++y) {
+ const int16x4_t s = vld1_s16((const int16_t *)src_ptr + y * src_stride);
+ const int32x4_t d0 =
+ vqrshlq_s32(vsubl_s16(s, offset_s16), round_shift_s32);
+ uint16x4_t d = vqmovun_s32(d0);
+ d = vmin_u16(d, vget_low_u16(max));
+ if (w == 2) {
+ store_u16_2x1(dst_ptr + y * dst_stride, d, 0);
+ } else {
+ vst1_u16(dst_ptr + y * dst_stride, d);
+ }
+ }
+ } else {
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; x += 8) {
+ // Subtract the rounding offset and apply the final rounding shift.
+ const int16x8_t s =
+ vld1q_s16((const int16_t *)src_ptr + y * src_stride + x);
+ const int32x4_t d0 = vqrshlq_s32(vsubl_s16(vget_low_s16(s), offset_s16),
+ round_shift_s32);
+ const int32x4_t d1 = vqrshlq_s32(
+ vsubl_s16(vget_high_s16(s), offset_s16), round_shift_s32);
+ uint16x8_t d01 = vcombine_u16(vqmovun_s32(d0), vqmovun_s32(d1));
+ d01 = vminq_u16(d01, max);
+ vst1q_u16(dst_ptr + y * dst_stride + x, d01);
+ }
+ }
+ }
+}
+
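+// The two full-size scratch buffers are heap-allocated rather than placed on
+// the stack; on allocation failure the function returns without filtering.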
+void av1_highbd_convolve_2d_scale_neon(
+ const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+ int h, const InterpFilterParams *filter_params_x,
+ const InterpFilterParams *filter_params_y, const int subpel_x_qn,
+ const int x_step_qn, const int subpel_y_qn, const int y_step_qn,
+ ConvolveParams *conv_params, int bd) {
+ uint16_t *im_block = (uint16_t *)aom_memalign(
+ 16, 2 * sizeof(uint16_t) * MAX_SB_SIZE * (MAX_SB_SIZE + MAX_FILTER_TAP));
+ if (!im_block) return;
+ uint16_t *im_block2 = (uint16_t *)aom_memalign(
+ 16, 2 * sizeof(uint16_t) * MAX_SB_SIZE * (MAX_SB_SIZE + MAX_FILTER_TAP));
+ if (!im_block2) {
+ aom_free(im_block); // free the first block and return.
+ return;
+ }
+
+ int im_h = (((h - 1) * y_step_qn + subpel_y_qn) >> SCALE_SUBPEL_BITS) +
+ filter_params_y->taps;
+ const int im_stride = MAX_SB_SIZE;
+ const int bits =
+ FILTER_BITS * 2 - conv_params->round_0 - conv_params->round_1;
+ assert(bits >= 0);
+
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const int x_offset_bits = (1 << (bd + FILTER_BITS - 1));
+ const int y_offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int y_offset_correction =
+ ((1 << (y_offset_bits - conv_params->round_1)) +
+ (1 << (y_offset_bits - conv_params->round_1 - 1)));
+
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ const int dst16_stride = conv_params->dst_stride;
+
+ const uint16_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
+
+ highbd_convolve_2d_x_scale_8tap_neon(
+ src_ptr, src_stride, im_block, im_stride, w, im_h, subpel_x_qn, x_step_qn,
+ filter_params_x, conv_params, x_offset_bits);
+ if (conv_params->is_compound && !conv_params->do_average) {
+ highbd_convolve_2d_y_scale_8tap_neon(
+ im_block, im_stride, dst16, dst16_stride, w, h, subpel_y_qn, y_step_qn,
+ filter_params_y, conv_params->round_1, y_offset_bits);
+ } else {
+ highbd_convolve_2d_y_scale_8tap_neon(
+ im_block, im_stride, im_block2, im_stride, w, h, subpel_y_qn, y_step_qn,
+ filter_params_y, conv_params->round_1, y_offset_bits);
+ }
+
+ // Do the compound averaging outside the loop; this avoids branching within
+ // the main loop.
+ if (conv_params->is_compound) {
+ if (conv_params->do_average) {
+ if (conv_params->use_dist_wtd_comp_avg) {
+ highbd_dist_wtd_comp_avg_neon(im_block2, im_stride, dst, dst_stride, w,
+ h, conv_params, bits, y_offset_correction,
+ bd);
+ } else {
+ highbd_comp_avg_neon(im_block2, im_stride, dst, dst_stride, w, h,
+ conv_params, bits, y_offset_correction, bd);
+ }
+ }
+ } else {
+ highbd_convolve_correct_offset_neon(im_block2, im_stride, dst, dst_stride,
+ w, h, bits, y_offset_correction, bd);
+ }
+ aom_free(im_block);
+ aom_free(im_block2);
+}
+
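+// Horizontal 8-tap filter for the distance-weighted path: the wtd kernels
+// take a weight of 1 << (FILTER_BITS - round_1) alongside the round offset,
+// and no clamping is done at this stage.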
+static INLINE void highbd_convolve_dist_wtd_x_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *x_filter_ptr, ConvolveParams *conv_params,
+ const int offset) {
+ const int16x8_t x_filter = vld1q_s16(x_filter_ptr);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int weight_bits = FILTER_BITS - conv_params->round_1;
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+ const int32x4_t weight_s32 = vdupq_n_s32(1 << weight_bits);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+
+ if (w <= 4) {
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ do {
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ load_s16_8x2(s + 8, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve8_wtd_horiz4_s32_s16(
+ s0, s1, x_filter, shift_s32, zero_s32, weight_s32, offset_s32);
+ d1 = highbd_convolve8_wtd_horiz4_s32_s16(
+ s2, s3, x_filter, shift_s32, zero_s32, weight_s32, offset_s32);
+ d01 = vcombine_u16(d0, d1);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3;
+ uint16x8_t d0, d1;
+
+ do {
+ int width = w;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x2(s, src_stride, &s0, &s2);
+ s += 8;
+
+ do {
+ load_s16_8x2(s, src_stride, &s1, &s3);
+
+ d0 = highbd_convolve8_wtd_horiz8_s32_s16(
+ s0, s1, x_filter, shift_s32, zero_s32, weight_s32, offset_s32);
+ d1 = highbd_convolve8_wtd_horiz8_s32_s16(
+ s2, s3, x_filter, shift_s32, zero_s32, weight_s32, offset_s32);
+
+ store_u16_8x2(d, dst_stride, d0, d1);
+
+ s0 = s1;
+ s2 = s3;
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width > 0);
+ src_ptr += 2 * src_stride;
+ dst_ptr += 2 * dst_stride;
+ height -= 2;
+ } while (height > 0);
+ }
+}
+
+static INLINE void highbd_convolve_dist_wtd_y_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, ConvolveParams *conv_params,
+ const int offset) {
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_0);
+ const int weight_bits = FILTER_BITS - conv_params->round_1;
+ const int32x4_t zero_s32 = vdupq_n_s32(0);
+ const int32x4_t weight_s32 = vdupq_n_s32(1 << weight_bits);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x4_t d0, d1;
+ uint16x8_t d01;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_wtd_4_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, shift_s32, zero_s32,
+ weight_s32, offset_s32);
+ d1 = highbd_convolve8_wtd_4_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, shift_s32, zero_s32,
+ weight_s32, offset_s32);
+ d01 = vcombine_u16(d0, d1);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ }
+
+ s0 = s2;
+ s1 = s3;
+ s2 = s4;
+ s3 = s5;
+ s4 = s6;
+ s5 = s7;
+ s6 = s8;
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ h -= 2;
+ } while (h > 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x8_t d0, d1;
+
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_8x2(s, src_stride, &s7, &s8);
+
+ d0 = highbd_convolve8_wtd_8_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, shift_s32, zero_s32,
+ weight_s32, offset_s32);
+ d1 = highbd_convolve8_wtd_8_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, shift_s32, zero_s32,
+ weight_s32, offset_s32);
+
+ store_u16_8x2(d, dst_stride, d0, d1);
+
+ s0 = s2;
+ s1 = s3;
+ s2 = s4;
+ s3 = s5;
+ s4 = s6;
+ s5 = s7;
+ s6 = s8;
+ s += 2 * src_stride;
+ d += 2 * dst_stride;
+ height -= 2;
+ } while (height > 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
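+// When averaging, the filtered result goes to the im_block scratch buffer
+// and is then blended into dst; otherwise it is written directly to the
+// convolve buffer (dst16).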
+void av1_highbd_dist_wtd_convolve_x_neon(
+ const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params, int bd) {
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ int dst16_stride = conv_params->dst_stride;
+ const int im_stride = MAX_SB_SIZE;
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+ (1 << (offset_bits - conv_params->round_1 - 1));
+ const int round_bits =
+ 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ assert(round_bits >= 0);
+
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+
+ src -= horiz_offset;
+
+ // horizontal filter
+ if (conv_params->do_average) {
+ highbd_convolve_dist_wtd_x_8tap_neon(src, src_stride, im_block, im_stride,
+ w, h, x_filter_ptr, conv_params,
+ round_offset);
+ } else {
+ highbd_convolve_dist_wtd_x_8tap_neon(src, src_stride, dst16, dst16_stride,
+ w, h, x_filter_ptr, conv_params,
+ round_offset);
+ }
+
+ if (conv_params->do_average) {
+ if (conv_params->use_dist_wtd_comp_avg) {
+ highbd_dist_wtd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ } else {
+ highbd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ }
+ }
+}
+
+void av1_highbd_dist_wtd_convolve_y_neon(
+ const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+ int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn,
+ ConvolveParams *conv_params, int bd) {
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ int dst16_stride = conv_params->dst_stride;
+ const int im_stride = MAX_SB_SIZE;
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+ (1 << (offset_bits - conv_params->round_1 - 1));
+ const int round_bits =
+ 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ assert(round_bits >= 0);
+
+ const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_y, subpel_y_qn & SUBPEL_MASK);
+
+ src -= vert_offset * src_stride;
+
+ // vertical filter
+ if (conv_params->do_average) {
+ highbd_convolve_dist_wtd_y_8tap_neon(src, src_stride, im_block, im_stride,
+ w, h, y_filter_ptr, conv_params,
+ round_offset);
+ } else {
+ highbd_convolve_dist_wtd_y_8tap_neon(src, src_stride, dst16, dst16_stride,
+ w, h, y_filter_ptr, conv_params,
+ round_offset);
+ }
+
+ if (conv_params->do_average) {
+ if (conv_params->use_dist_wtd_comp_avg) {
+ highbd_dist_wtd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ } else {
+ highbd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ }
+ }
+}
+
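+// 2D copy for the compound path: shifts samples left by round_bits and adds
+// the round offset, matching the precision a full convolution would produce.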
+static INLINE void highbd_2d_copy_neon(const uint16_t *src_ptr, int src_stride,
+ uint16_t *dst_ptr, int dst_stride, int w,
+ int h, const int round_bits,
+ const int offset) {
+ if (w <= 4) {
+ const int16x4_t round_shift_s16 = vdup_n_s16(round_bits);
+ const uint16x4_t offset_u16 = vdup_n_u16(offset);
+
+ for (int y = 0; y < h; ++y) {
+ const uint16x4_t s = vld1_u16(src_ptr + y * src_stride);
+ uint16x4_t d = vshl_u16(s, round_shift_s16);
+ d = vadd_u16(d, offset_u16);
+ if (w == 2) {
+ store_u16_2x1(dst_ptr + y * dst_stride, d, 0);
+ } else {
+ vst1_u16(dst_ptr + y * dst_stride, d);
+ }
+ }
+ } else {
+ const int16x8_t round_shift_s16 = vdupq_n_s16(round_bits);
+ const uint16x8_t offset_u16 = vdupq_n_u16(offset);
+
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; x += 8) {
+ const uint16x8_t s = vld1q_u16(src_ptr + y * src_stride + x);
+ uint16x8_t d = vshlq_u16(s, round_shift_s16);
+ d = vaddq_u16(d, offset_u16);
+ vst1q_u16(dst_ptr + y * dst_stride + x, d);
+ }
+ }
+ }
+}
+
+void av1_highbd_dist_wtd_convolve_2d_copy_neon(const uint16_t *src,
+ int src_stride, uint16_t *dst,
+ int dst_stride, int w, int h,
+ ConvolveParams *conv_params,
+ int bd) {
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+
+ const int im_stride = MAX_SB_SIZE;
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ int dst16_stride = conv_params->dst_stride;
+ const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
+ (1 << (offset_bits - conv_params->round_1 - 1));
+ const int round_bits =
+ 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ assert(round_bits >= 0);
+
+ if (conv_params->do_average) {
+ highbd_2d_copy_neon(src, src_stride, im_block, im_stride, w, h, round_bits,
+ round_offset);
+ } else {
+ highbd_2d_copy_neon(src, src_stride, dst16, dst16_stride, w, h, round_bits,
+ round_offset);
+ }
+
+ if (conv_params->do_average) {
+ if (conv_params->use_dist_wtd_comp_avg) {
+ highbd_dist_wtd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ } else {
+ highbd_comp_avg_neon(im_block, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, round_offset, bd);
+ }
+ }
+}
+
+static INLINE void highbd_convolve_y_8tap_neon(
+ const uint16_t *src_ptr, int src_stride, uint16_t *dst_ptr, int dst_stride,
+ int w, int h, const int16_t *y_filter_ptr, ConvolveParams *conv_params,
+ int offset) {
+ const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
+ const int32x4_t offset_s32 = vdupq_n_s32(offset);
+ const int32x4_t shift_s32 = vdupq_n_s32(-conv_params->round_1);
+
+ if (w <= 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x4_t d0, d1, d2, d3;
+ uint16x8_t d01, d23;
+
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_4x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_4x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_sr_4_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, shift_s32, offset_s32);
+ d1 = highbd_convolve8_sr_4_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, shift_s32, offset_s32);
+ d2 = highbd_convolve8_sr_4_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9,
+ y_filter, shift_s32, offset_s32);
+ d3 = highbd_convolve8_sr_4_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10,
+ y_filter, shift_s32, offset_s32);
+
+ d01 = vcombine_u16(d0, d1);
+ d23 = vcombine_u16(d2, d3);
+
+ if (w == 2) {
+ store_u16q_2x1(d + 0 * dst_stride, d01, 0);
+ store_u16q_2x1(d + 1 * dst_stride, d01, 2);
+ if (h != 2) {
+ store_u16q_2x1(d + 2 * dst_stride, d23, 0);
+ store_u16q_2x1(d + 3 * dst_stride, d23, 2);
+ }
+ } else {
+ vst1_u16(d + 0 * dst_stride, vget_low_u16(d01));
+ vst1_u16(d + 1 * dst_stride, vget_high_u16(d01));
+ if (h != 2) {
+ vst1_u16(d + 2 * dst_stride, vget_low_u16(d23));
+ vst1_u16(d + 3 * dst_stride, vget_high_u16(d23));
+ }
+ }
+
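+      // Slide the source window down by four rows for the next iteration.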
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ h -= 4;
+ } while (h > 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10;
+ uint16x8_t d0, d1, d2, d3;
+ do {
+ int height = h;
+ const int16_t *s = (const int16_t *)src_ptr;
+ uint16_t *d = dst_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = highbd_convolve8_8_s32_s16(s0, s1, s2, s3, s4, s5, s6, s7,
+ y_filter, offset_s32);
+ d1 = highbd_convolve8_8_s32_s16(s1, s2, s3, s4, s5, s6, s7, s8,
+ y_filter, offset_s32);
+ d2 = highbd_convolve8_8_s32_s16(s2, s3, s4, s5, s6, s7, s8, s9,
+ y_filter, offset_s32);
+ d3 = highbd_convolve8_8_s32_s16(s3, s4, s5, s6, s7, s8, s9, s10,
+ y_filter, offset_s32);
+
+ if (h == 2) {
+ store_u16_8x2(d, dst_stride, d0, d1);
+ } else {
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+ }
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+ } while (height > 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w > 0);
+ }
+}
+
+void av1_highbd_dist_wtd_convolve_2d_neon(
+ const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w,
+ int h, const InterpFilterParams *filter_params_x,
+ const InterpFilterParams *filter_params_y, const int subpel_x_qn,
+ const int subpel_y_qn, ConvolveParams *conv_params, int bd) {
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+ DECLARE_ALIGNED(16, uint16_t,
+ im_block2[(MAX_SB_SIZE + MAX_FILTER_TAP) * MAX_SB_SIZE]);
+
+ CONV_BUF_TYPE *dst16 = conv_params->dst;
+ int dst16_stride = conv_params->dst_stride;
+
+ const int im_h = h + filter_params_y->taps - 1;
+ const int im_stride = MAX_SB_SIZE;
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const int round_bits =
+ 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ const int x_offset_initial = (1 << (bd + FILTER_BITS - 1));
+ const int y_offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
+ const int y_offset_initial = (1 << y_offset_bits);
+ const int y_offset_correction =
+ ((1 << (y_offset_bits - conv_params->round_1)) +
+ (1 << (y_offset_bits - conv_params->round_1 - 1)));
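+  // x_offset_initial keeps the horizontal intermediate values non-negative.
+  // y_offset_initial and y_offset_correction implement the compound rounding
+  // offset, which is subtracted again once the two predictions are averaged.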
+
+ const uint16_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
+
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_y, subpel_y_qn & SUBPEL_MASK);
+
+ // horizontal filter
+ highbd_convolve_x_8tap_neon(src_ptr, src_stride, im_block, im_stride, w, im_h,
+ x_filter_ptr, conv_params, x_offset_initial);
+ // vertical filter
+ if (conv_params->do_average) {
+ highbd_convolve_y_8tap_neon(im_block, im_stride, im_block2, im_stride, w, h,
+ y_filter_ptr, conv_params, y_offset_initial);
+ } else {
+ highbd_convolve_y_8tap_neon(im_block, im_stride, dst16, dst16_stride, w, h,
+ y_filter_ptr, conv_params, y_offset_initial);
+ }
+
+  // Do the compound averaging outside the loop; this avoids branching within
+  // the main loop.
+ if (conv_params->do_average) {
+ if (conv_params->use_dist_wtd_comp_avg) {
+ highbd_dist_wtd_comp_avg_neon(im_block2, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits,
+ y_offset_correction, bd);
+ } else {
+ highbd_comp_avg_neon(im_block2, im_stride, dst, dst_stride, w, h,
+ conv_params, round_bits, y_offset_correction, bd);
+ }
+ }
+}
+
+#define UPSCALE_NORMATIVE_TAPS 8
+
+void av1_highbd_convolve_horiz_rs_neon(const uint16_t *src, int src_stride,
+ uint16_t *dst, int dst_stride, int w,
+ int h, const int16_t *x_filters,
+ int x0_qn, int x_step_qn, int bd) {
+ const int horiz_offset = UPSCALE_NORMATIVE_TAPS / 2 - 1;
+
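+  // Lane offsets 0..3, used to compute four source and filter indices per
+  // iteration.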
+ const int32x4_t idx = { 0, 1, 2, 3 };
+ const int32x4_t subpel_mask = vdupq_n_s32(RS_SCALE_SUBPEL_MASK);
+ const int32x4_t shift_s32 = vdupq_n_s32(-FILTER_BITS);
+ const int32x4_t offset_s32 = vdupq_n_s32(0);
+ const uint16x4_t max = vdup_n_u16((1 << bd) - 1);
+
+ const uint16_t *src_ptr = src - horiz_offset;
+ uint16_t *dst_ptr = dst;
+
+ if (w <= 4) {
+ int height = h;
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0;
+
+ uint16_t *d = dst_ptr;
+ do {
+ int x_qn = x0_qn;
+
+      // Load 4 src vectors at a time; they might be the same, but we have to
+      // calculate the indices anyway. Doing it in SIMD and then storing the
+      // indices is faster than calculating the expression
+      // &src_ptr[((x_qn + 0*x_step_qn) >> RS_SCALE_SUBPEL_BITS)] 4 times.
+      // Ideally this would be a gather using the indices, but NEON does not
+      // have one, so we have to emulate it.
+ const int32x4_t xqn_idx = vmlaq_n_s32(vdupq_n_s32(x_qn), idx, x_step_qn);
+      // We have to multiply by 2 to get the actual byte offsets, as
+      // sizeof(uint16_t) == 2.
+ const int32x4_t src_idx =
+ vshlq_n_s32(vshrq_n_s32(xqn_idx, RS_SCALE_SUBPEL_BITS), 1);
+      // Similarly for the filter vector indices, we calculate the filter
+      // indices for 4 columns. First we calculate the indices:
+      // (x_qn & RS_SCALE_SUBPEL_MASK) >> RS_SCALE_EXTRA_BITS
+      // Then we calculate the actual pointers, multiplying by
+      // UPSCALE_NORMATIVE_TAPS, and again shift left by 1 to convert to byte
+      // offsets.
+ const int32x4_t x_filter4_idx = vshlq_n_s32(
+ vshrq_n_s32(vandq_s32(xqn_idx, subpel_mask), RS_SCALE_EXTRA_BITS), 1);
+      // Even though pointers are unsigned 32/64-bit ints, we do signed
+      // addition. The reason for this is that x_qn can be negative, leading
+      // to negative offsets. Argon test
+      // profile0_core/streams/test10573_11003.obu was failing because of
+      // this.
+#if AOM_ARCH_AARCH64
+ uint64x2_t tmp4[2];
+ tmp4[0] = vreinterpretq_u64_s64(vaddw_s32(
+ vdupq_n_s64((const int64_t)src_ptr), vget_low_s32(src_idx)));
+ tmp4[1] = vreinterpretq_u64_s64(vaddw_s32(
+ vdupq_n_s64((const int64_t)src_ptr), vget_high_s32(src_idx)));
+ int16_t *src4_ptr[4];
+ uint64_t *tmp_ptr = (uint64_t *)&src4_ptr;
+ vst1q_u64(tmp_ptr, tmp4[0]);
+ vst1q_u64(tmp_ptr + 2, tmp4[1]);
+
+ // filter vectors
+ tmp4[0] = vreinterpretq_u64_s64(vmlal_s32(
+ vdupq_n_s64((const int64_t)x_filters), vget_low_s32(x_filter4_idx),
+ vdup_n_s32(UPSCALE_NORMATIVE_TAPS)));
+ tmp4[1] = vreinterpretq_u64_s64(vmlal_s32(
+ vdupq_n_s64((const int64_t)x_filters), vget_high_s32(x_filter4_idx),
+ vdup_n_s32(UPSCALE_NORMATIVE_TAPS)));
+
+ const int16_t *x_filter4_ptr[4];
+ tmp_ptr = (uint64_t *)&x_filter4_ptr;
+ vst1q_u64(tmp_ptr, tmp4[0]);
+ vst1q_u64(tmp_ptr + 2, tmp4[1]);
+#else
+ uint32x4_t tmp4;
+ tmp4 = vreinterpretq_u32_s32(
+ vaddq_s32(vdupq_n_s32((const int32_t)src_ptr), src_idx));
+ int16_t *src4_ptr[4];
+ uint32_t *tmp_ptr = (uint32_t *)&src4_ptr;
+ vst1q_u32(tmp_ptr, tmp4);
+ // filter vectors
+ tmp4 = vreinterpretq_u32_s32(
+ vmlaq_s32(vdupq_n_s32((const int32_t)x_filters), x_filter4_idx,
+ vdupq_n_s32(UPSCALE_NORMATIVE_TAPS)));
+
+ const int16_t *x_filter4_ptr[4];
+ tmp_ptr = (uint32_t *)&x_filter4_ptr;
+ vst1q_u32(tmp_ptr, tmp4);
+#endif // AOM_ARCH_AARCH64
+ // Load source
+ s0 = vld1q_s16(src4_ptr[0]);
+ s1 = vld1q_s16(src4_ptr[1]);
+ s2 = vld1q_s16(src4_ptr[2]);
+ s3 = vld1q_s16(src4_ptr[3]);
+
+ // Actually load the filters
+ const int16x8_t x_filter0 = vld1q_s16(x_filter4_ptr[0]);
+ const int16x8_t x_filter1 = vld1q_s16(x_filter4_ptr[1]);
+ const int16x8_t x_filter2 = vld1q_s16(x_filter4_ptr[2]);
+ const int16x8_t x_filter3 = vld1q_s16(x_filter4_ptr[3]);
+
+ // Group low and high parts and transpose
+ int16x4_t filters_lo[] = { vget_low_s16(x_filter0),
+ vget_low_s16(x_filter1),
+ vget_low_s16(x_filter2),
+ vget_low_s16(x_filter3) };
+ int16x4_t filters_hi[] = { vget_high_s16(x_filter0),
+ vget_high_s16(x_filter1),
+ vget_high_s16(x_filter2),
+ vget_high_s16(x_filter3) };
+ transpose_u16_4x4((uint16x4_t *)filters_lo);
+ transpose_u16_4x4((uint16x4_t *)filters_hi);
+
+      // Run the 2D Scale X convolution
+ d0 = highbd_convolve8_2d_scale_horiz4x8_s32_s16(
+ s0, s1, s2, s3, filters_lo, filters_hi, shift_s32, offset_s32);
+
+ d0 = vmin_u16(d0, max);
+
+ if (w == 2) {
+ store_u16_2x1(d + 0 * dst_stride, d0, 0);
+ } else {
+ vst1_u16(d + 0 * dst_stride, d0);
+ }
+
+ src_ptr += src_stride;
+ d += dst_stride;
+ height--;
+ } while (height > 0);
+ } else {
+ int height = h;
+ int16x8_t s0, s1, s2, s3;
+ uint16x4_t d0;
+
+ do {
+ int width = w;
+ int x_qn = x0_qn;
+ uint16_t *d = dst_ptr;
+ const uint16_t *s = src_ptr;
+
+ do {
+        // Load 4 src vectors at a time; they might be the same, but we have
+        // to calculate the indices anyway. Doing it in SIMD and then storing
+        // the indices is faster than calculating the expression
+        // &src_ptr[((x_qn + 0*x_step_qn) >> RS_SCALE_SUBPEL_BITS)] 4 times.
+        // Ideally this would be a gather using the indices, but NEON does
+        // not have one, so we have to emulate it.
+ const int32x4_t xqn_idx =
+ vmlaq_n_s32(vdupq_n_s32(x_qn), idx, x_step_qn);
+        // We have to multiply by 2 to get the actual byte offsets, as
+        // sizeof(uint16_t) == 2.
+ const int32x4_t src_idx =
+ vshlq_n_s32(vshrq_n_s32(xqn_idx, RS_SCALE_SUBPEL_BITS), 1);
+
+        // Similarly for the filter vector indices, we calculate the filter
+        // indices for 4 columns. First we calculate the indices:
+        // (x_qn & RS_SCALE_SUBPEL_MASK) >> RS_SCALE_EXTRA_BITS
+        // Then we calculate the actual pointers, multiplying by
+        // UPSCALE_NORMATIVE_TAPS, and again shift left by 1 to convert to
+        // byte offsets.
+ const int32x4_t x_filter4_idx = vshlq_n_s32(
+ vshrq_n_s32(vandq_s32(xqn_idx, subpel_mask), RS_SCALE_EXTRA_BITS),
+ 1);
+        // Even though pointers are unsigned 32/64-bit ints, we do signed
+        // addition. The reason for this is that x_qn can be negative,
+        // leading to negative offsets. Argon test
+        // profile0_core/streams/test10573_11003.obu was failing because of
+        // this.
+#if AOM_ARCH_AARCH64
+ uint64x2_t tmp4[2];
+ tmp4[0] = vreinterpretq_u64_s64(
+ vaddw_s32(vdupq_n_s64((const int64_t)s), vget_low_s32(src_idx)));
+ tmp4[1] = vreinterpretq_u64_s64(
+ vaddw_s32(vdupq_n_s64((const int64_t)s), vget_high_s32(src_idx)));
+ int16_t *src4_ptr[4];
+ uint64_t *tmp_ptr = (uint64_t *)&src4_ptr;
+ vst1q_u64(tmp_ptr, tmp4[0]);
+ vst1q_u64(tmp_ptr + 2, tmp4[1]);
+
+ // filter vectors
+ tmp4[0] = vreinterpretq_u64_s64(vmlal_s32(
+ vdupq_n_s64((const int64_t)x_filters), vget_low_s32(x_filter4_idx),
+ vdup_n_s32(UPSCALE_NORMATIVE_TAPS)));
+ tmp4[1] = vreinterpretq_u64_s64(vmlal_s32(
+ vdupq_n_s64((const int64_t)x_filters), vget_high_s32(x_filter4_idx),
+ vdup_n_s32(UPSCALE_NORMATIVE_TAPS)));
+
+ const int16_t *x_filter4_ptr[4];
+ tmp_ptr = (uint64_t *)&x_filter4_ptr;
+ vst1q_u64(tmp_ptr, tmp4[0]);
+ vst1q_u64(tmp_ptr + 2, tmp4[1]);
+#else
+ uint32x4_t tmp4;
+ tmp4 = vreinterpretq_u32_s32(
+ vaddq_s32(vdupq_n_s32((const int32_t)s), src_idx));
+ int16_t *src4_ptr[4];
+ uint32_t *tmp_ptr = (uint32_t *)&src4_ptr;
+ vst1q_u32(tmp_ptr, tmp4);
+ // filter vectors
+ tmp4 = vreinterpretq_u32_s32(
+ vmlaq_s32(vdupq_n_s32((const int32_t)x_filters), x_filter4_idx,
+ vdupq_n_s32(UPSCALE_NORMATIVE_TAPS)));
+
+ const int16_t *x_filter4_ptr[4];
+ tmp_ptr = (uint32_t *)&x_filter4_ptr;
+ vst1q_u32(tmp_ptr, tmp4);
+#endif // AOM_ARCH_AARCH64
+
+ // Load source
+ s0 = vld1q_s16(src4_ptr[0]);
+ s1 = vld1q_s16(src4_ptr[1]);
+ s2 = vld1q_s16(src4_ptr[2]);
+ s3 = vld1q_s16(src4_ptr[3]);
+
+ // Actually load the filters
+ const int16x8_t x_filter0 = vld1q_s16(x_filter4_ptr[0]);
+ const int16x8_t x_filter1 = vld1q_s16(x_filter4_ptr[1]);
+ const int16x8_t x_filter2 = vld1q_s16(x_filter4_ptr[2]);
+ const int16x8_t x_filter3 = vld1q_s16(x_filter4_ptr[3]);
+
+ // Group low and high parts and transpose
+ int16x4_t filters_lo[] = { vget_low_s16(x_filter0),
+ vget_low_s16(x_filter1),
+ vget_low_s16(x_filter2),
+ vget_low_s16(x_filter3) };
+ int16x4_t filters_hi[] = { vget_high_s16(x_filter0),
+ vget_high_s16(x_filter1),
+ vget_high_s16(x_filter2),
+ vget_high_s16(x_filter3) };
+ transpose_u16_4x4((uint16x4_t *)filters_lo);
+ transpose_u16_4x4((uint16x4_t *)filters_hi);
+
+ // Run the 2D Scale X convolution
+ d0 = highbd_convolve8_2d_scale_horiz4x8_s32_s16(
+ s0, s1, s2, s3, filters_lo, filters_hi, shift_s32, offset_s32);
+
+ d0 = vmin_u16(d0, max);
+ vst1_u16(d, d0);
+
+ x_qn += 4 * x_step_qn;
+ d += 4;
+ width -= 4;
+ } while (width > 0);
+
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ height--;
+ } while (height > 0);
+ }
+}
diff --git a/av1/common/arm/highbd_convolve_neon.h b/av1/common/arm/highbd_convolve_neon.h
new file mode 100644
index 0000000..f9d028f
--- /dev/null
+++ b/av1/common/arm/highbd_convolve_neon.h
@@ -0,0 +1,550 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_COMMON_ARM_HIGHBD_CONVOLVE_NEON_H_
+#define AOM_AV1_COMMON_ARM_HIGHBD_CONVOLVE_NEON_H_
+
+#include <arm_neon.h>
+
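+// The 6-tap filters are stored padded to 8 taps with zero end taps, so only
+// lanes 1-6 of the filter vector are accumulated below.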
+static INLINE int32x4_t highbd_convolve6_4_s32(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter, const int32x4_t offset) {
+ const int16x4_t y_filter_lo = vget_low_s16(y_filter);
+ const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+
+ int32x4_t sum = vmlal_lane_s16(offset, s0, y_filter_lo, 1);
+ sum = vmlal_lane_s16(sum, s1, y_filter_lo, 2);
+ sum = vmlal_lane_s16(sum, s2, y_filter_lo, 3);
+ sum = vmlal_lane_s16(sum, s3, y_filter_hi, 0);
+ sum = vmlal_lane_s16(sum, s4, y_filter_hi, 1);
+ sum = vmlal_lane_s16(sum, s5, y_filter_hi, 2);
+
+ return sum;
+}
+
+static INLINE uint16x4_t highbd_convolve6_4_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter, const int32x4_t offset) {
+ int32x4_t sum =
+ highbd_convolve6_4_s32(s0, s1, s2, s3, s4, s5, y_filter, offset);
+
+ return vqrshrun_n_s32(sum, COMPOUND_ROUND1_BITS);
+}
+
+static INLINE void highbd_convolve6_8_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filter, const int32x4_t offset, int32x4_t *sum0,
+ int32x4_t *sum1) {
+ const int16x4_t y_filter_lo = vget_low_s16(y_filter);
+ const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+
+ *sum0 = vmlal_lane_s16(offset, vget_low_s16(s0), y_filter_lo, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s1), y_filter_lo, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s2), y_filter_lo, 3);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s3), y_filter_hi, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s4), y_filter_hi, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s5), y_filter_hi, 2);
+
+ *sum1 = vmlal_lane_s16(offset, vget_high_s16(s0), y_filter_lo, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s1), y_filter_lo, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s2), y_filter_lo, 3);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s3), y_filter_hi, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s4), y_filter_hi, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s5), y_filter_hi, 2);
+}
+
+static INLINE uint16x8_t highbd_convolve6_8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filter, const int32x4_t offset) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve6_8_s32(s0, s1, s2, s3, s4, s5, y_filter, offset, &sum0,
+ &sum1);
+
+ return vcombine_u16(vqrshrun_n_s32(sum0, COMPOUND_ROUND1_BITS),
+ vqrshrun_n_s32(sum1, COMPOUND_ROUND1_BITS));
+}
+
+static INLINE int32x4_t highbd_convolve8_4_s32(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
+ const int32x4_t offset) {
+ const int16x4_t y_filter_lo = vget_low_s16(y_filter);
+ const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+
+ int32x4_t sum = vmlal_lane_s16(offset, s0, y_filter_lo, 0);
+ sum = vmlal_lane_s16(sum, s1, y_filter_lo, 1);
+ sum = vmlal_lane_s16(sum, s2, y_filter_lo, 2);
+ sum = vmlal_lane_s16(sum, s3, y_filter_lo, 3);
+ sum = vmlal_lane_s16(sum, s4, y_filter_hi, 0);
+ sum = vmlal_lane_s16(sum, s5, y_filter_hi, 1);
+ sum = vmlal_lane_s16(sum, s6, y_filter_hi, 2);
+ sum = vmlal_lane_s16(sum, s7, y_filter_hi, 3);
+
+ return sum;
+}
+
+static INLINE uint16x4_t highbd_convolve8_4_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
+ const int32x4_t offset) {
+ int32x4_t sum =
+ highbd_convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset);
+
+ return vqrshrun_n_s32(sum, COMPOUND_ROUND1_BITS);
+}
+
+static INLINE uint16x4_t highbd_convolve8_sr_4_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
+ const int32x4_t shift_s32, const int32x4_t offset) {
+ int32x4_t sum =
+ highbd_convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ return vqmovun_s32(sum);
+}
+
+static INLINE uint16x4_t highbd_convolve8_wtd_4_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
+ const int32x4_t shift_s32, const int32x4_t offset, const int32x4_t weight,
+ const int32x4_t offset2) {
+ int32x4_t sum =
+ highbd_convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ sum = vmlaq_s32(offset2, sum, weight);
+
+ return vqmovun_s32(sum);
+}
+
+// Like above, but also performs round shifting and subtracts the correction
+// term.
+static INLINE uint16x4_t highbd_convolve8_4_sr_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x8_t y_filter,
+ const int32x4_t round_shift, const int32x4_t offset,
+ const int32x4_t correction) {
+ int32x4_t sum =
+ highbd_convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset);
+
+ sum = vsubq_s32(vqrshlq_s32(sum, round_shift), correction);
+ return vqmovun_s32(sum);
+}
+
+static INLINE void highbd_convolve8_8_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t y_filter,
+ const int32x4_t offset, int32x4_t *sum0, int32x4_t *sum1) {
+ const int16x4_t y_filter_lo = vget_low_s16(y_filter);
+ const int16x4_t y_filter_hi = vget_high_s16(y_filter);
+
+ *sum0 = vmlal_lane_s16(offset, vget_low_s16(s0), y_filter_lo, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s1), y_filter_lo, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s2), y_filter_lo, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s3), y_filter_lo, 3);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s4), y_filter_hi, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s5), y_filter_hi, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s6), y_filter_hi, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s7), y_filter_hi, 3);
+
+ *sum1 = vmlal_lane_s16(offset, vget_high_s16(s0), y_filter_lo, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s1), y_filter_lo, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s2), y_filter_lo, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s3), y_filter_lo, 3);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s4), y_filter_hi, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s5), y_filter_hi, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s6), y_filter_hi, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s7), y_filter_hi, 3);
+}
+
+static INLINE uint16x8_t highbd_convolve8_8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t y_filter,
+ const int32x4_t offset) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset,
+ &sum0, &sum1);
+
+ return vcombine_u16(vqrshrun_n_s32(sum0, COMPOUND_ROUND1_BITS),
+ vqrshrun_n_s32(sum1, COMPOUND_ROUND1_BITS));
+}
+
+static INLINE uint16x8_t highbd_convolve8_wtd_8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t y_filter,
+ const int32x4_t shift_s32, const int32x4_t offset, const int32x4_t weight,
+ const int32x4_t offset2) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset,
+ &sum0, &sum1);
+
+ sum0 = vqrshlq_s32(sum0, shift_s32);
+ sum1 = vqrshlq_s32(sum1, shift_s32);
+ sum0 = vmlaq_s32(offset2, sum0, weight);
+ sum1 = vmlaq_s32(offset2, sum1, weight);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
+// Like above, but also performs round shifting and subtracts the correction
+// term.
+static INLINE uint16x8_t highbd_convolve8_8_sr_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t y_filter,
+ const int32x4_t round_shift, const int32x4_t offset,
+ const int32x4_t correction) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, offset,
+ &sum0, &sum1);
+
+ sum0 = vsubq_s32(vqrshlq_s32(sum0, round_shift), correction);
+ sum1 = vsubq_s32(vqrshlq_s32(sum1, round_shift), correction);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
+static INLINE int32x4_t highbd_convolve12_y_4_s32(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t offset) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+
+ int32x4_t sum = vmlal_lane_s16(offset, s0, y_filter_0_3, 0);
+ sum = vmlal_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmlal_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmlal_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmlal_lane_s16(sum, s4, y_filter_4_7, 0);
+ sum = vmlal_lane_s16(sum, s5, y_filter_4_7, 1);
+ sum = vmlal_lane_s16(sum, s6, y_filter_4_7, 2);
+ sum = vmlal_lane_s16(sum, s7, y_filter_4_7, 3);
+ sum = vmlal_lane_s16(sum, s8, y_filter_8_11, 0);
+ sum = vmlal_lane_s16(sum, s9, y_filter_8_11, 1);
+ sum = vmlal_lane_s16(sum, s10, y_filter_8_11, 2);
+ sum = vmlal_lane_s16(sum, s11, y_filter_8_11, 3);
+
+ return sum;
+}
+
+static INLINE uint16x4_t highbd_convolve12_y_4_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t offset) {
+ int32x4_t sum =
+ highbd_convolve12_y_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11, offset);
+
+ return vqrshrun_n_s32(sum, COMPOUND_ROUND1_BITS);
+}
+
+// Like above, but also performs round shifting and subtracts the correction
+// term.
+static INLINE uint16x4_t highbd_convolve12_y_4_sr_s32_s16(
+ const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7, const int16x4_t s8,
+ const int16x4_t s9, const int16x4_t s10, const int16x4_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t round_shift, const int32x4_t offset,
+ const int32x4_t correction) {
+ int32x4_t sum =
+ highbd_convolve12_y_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10,
+ s11, y_filter_0_7, y_filter_8_11, offset);
+
+ sum = vsubq_s32(vqrshlq_s32(sum, round_shift), correction);
+ return vqmovun_s32(sum);
+}
+
+static INLINE void highbd_convolve12_y_8_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
+ const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t offset, int32x4_t *sum0, int32x4_t *sum1) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter_0_7);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter_0_7);
+
+ *sum0 = vmlal_lane_s16(offset, vget_low_s16(s0), y_filter_0_3, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s1), y_filter_0_3, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s2), y_filter_0_3, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s3), y_filter_0_3, 3);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s4), y_filter_4_7, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s5), y_filter_4_7, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s6), y_filter_4_7, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s7), y_filter_4_7, 3);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s8), y_filter_8_11, 0);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s9), y_filter_8_11, 1);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s10), y_filter_8_11, 2);
+ *sum0 = vmlal_lane_s16(*sum0, vget_low_s16(s11), y_filter_8_11, 3);
+
+ *sum1 = vmlal_lane_s16(offset, vget_high_s16(s0), y_filter_0_3, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s1), y_filter_0_3, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s2), y_filter_0_3, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s3), y_filter_0_3, 3);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s4), y_filter_4_7, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s5), y_filter_4_7, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s6), y_filter_4_7, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s7), y_filter_4_7, 3);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s8), y_filter_8_11, 0);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s9), y_filter_8_11, 1);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s10), y_filter_8_11, 2);
+ *sum1 = vmlal_lane_s16(*sum1, vget_high_s16(s11), y_filter_8_11, 3);
+}
+
+static INLINE uint16x8_t highbd_convolve12_y_8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
+ const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t offset) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve12_y_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ y_filter_0_7, y_filter_8_11, offset, &sum0, &sum1);
+
+ return vcombine_u16(vqrshrun_n_s32(sum0, COMPOUND_ROUND1_BITS),
+ vqrshrun_n_s32(sum1, COMPOUND_ROUND1_BITS));
+}
+
+// Like above, but also performs round shifting and subtracts the correction
+// term.
+static INLINE uint16x8_t highbd_convolve12_y_8_sr_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7, const int16x8_t s8,
+ const int16x8_t s9, const int16x8_t s10, const int16x8_t s11,
+ const int16x8_t y_filter_0_7, const int16x4_t y_filter_8_11,
+ const int32x4_t round_shift, const int32x4_t offset,
+ const int32x4_t correction) {
+ int32x4_t sum0;
+ int32x4_t sum1;
+ highbd_convolve12_y_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ y_filter_0_7, y_filter_8_11, offset, &sum0, &sum1);
+
+ sum0 = vsubq_s32(vqrshlq_s32(sum0, round_shift), correction);
+ sum1 = vsubq_s32(vqrshlq_s32(sum1, round_shift), correction);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
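+// Horizontal convolution of 4 adjacent output pixels: vextq_s16 builds the
+// shifted source windows so that each sN_lo vector holds, for all 4 outputs,
+// the source sample at tap position N.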
+static INLINE int32x4_t highbd_convolve8_horiz4_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int32x4_t offset) {
+ const int16x8_t s2 = vextq_s16(s0, s1, 1);
+ const int16x8_t s3 = vextq_s16(s0, s1, 2);
+ const int16x8_t s4 = vextq_s16(s0, s1, 3);
+ const int16x4_t s0_lo = vget_low_s16(s0);
+ const int16x4_t s1_lo = vget_low_s16(s2);
+ const int16x4_t s2_lo = vget_low_s16(s3);
+ const int16x4_t s3_lo = vget_low_s16(s4);
+ const int16x4_t s4_lo = vget_high_s16(s0);
+ const int16x4_t s5_lo = vget_high_s16(s2);
+ const int16x4_t s6_lo = vget_high_s16(s3);
+ const int16x4_t s7_lo = vget_high_s16(s4);
+
+ return highbd_convolve8_4_s32(s0_lo, s1_lo, s2_lo, s3_lo, s4_lo, s5_lo, s6_lo,
+ s7_lo, x_filter_0_7, offset);
+}
+
+static INLINE uint16x4_t highbd_convolve8_horiz4_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int32x4_t shift_s32, const int32x4_t offset) {
+ int32x4_t sum = highbd_convolve8_horiz4_s32(s0, s1, x_filter_0_7, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ return vqmovun_s32(sum);
+}
+
+static INLINE uint16x4_t highbd_convolve8_wtd_horiz4_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int32x4_t shift_s32, const int32x4_t offset, const int32x4_t weight,
+ const int32x4_t offset2) {
+ int32x4_t sum = highbd_convolve8_horiz4_s32(s0, s1, x_filter_0_7, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ sum = vmlaq_s32(offset2, sum, weight);
+ return vqmovun_s32(sum);
+}
+
+static INLINE void highbd_convolve8_horiz8_s32(
+ const int16x8_t s0, const int16x8_t s0_hi, const int16x8_t x_filter_0_7,
+ const int32x4_t offset, int32x4_t *sum0, int32x4_t *sum1) {
+ const int16x8_t s1 = vextq_s16(s0, s0_hi, 1);
+ const int16x8_t s2 = vextq_s16(s0, s0_hi, 2);
+ const int16x8_t s3 = vextq_s16(s0, s0_hi, 3);
+ const int16x8_t s4 = vextq_s16(s0, s0_hi, 4);
+ const int16x8_t s5 = vextq_s16(s0, s0_hi, 5);
+ const int16x8_t s6 = vextq_s16(s0, s0_hi, 6);
+ const int16x8_t s7 = vextq_s16(s0, s0_hi, 7);
+
+ highbd_convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, x_filter_0_7, offset,
+ sum0, sum1);
+}
+
+static INLINE uint16x8_t highbd_convolve8_horiz8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int32x4_t shift_s32, const int32x4_t offset) {
+ int32x4_t sum0, sum1;
+ highbd_convolve8_horiz8_s32(s0, s1, x_filter_0_7, offset, &sum0, &sum1);
+
+ sum0 = vqrshlq_s32(sum0, shift_s32);
+ sum1 = vqrshlq_s32(sum1, shift_s32);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
+static INLINE uint16x8_t highbd_convolve8_wtd_horiz8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int32x4_t shift_s32, const int32x4_t offset, const int32x4_t weight,
+ const int32x4_t offset2) {
+ int32x4_t sum0, sum1;
+ highbd_convolve8_horiz8_s32(s0, s1, x_filter_0_7, offset, &sum0, &sum1);
+
+ sum0 = vqrshlq_s32(sum0, shift_s32);
+ sum1 = vqrshlq_s32(sum1, shift_s32);
+ sum0 = vmlaq_s32(offset2, sum0, weight);
+ sum1 = vmlaq_s32(offset2, sum1, weight);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
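+// 12-tap analogue of the 8-tap horizontal helpers above: the shifted windows
+// for taps 0-11 are built from two overlapping 8-lane source vectors.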
+static INLINE int32x4_t highbd_convolve12_horiz4_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int16x4_t x_filter_8_11, const int32x4_t offset) {
+ const int16x8_t s2 = vextq_s16(s0, s1, 1);
+ const int16x8_t s3 = vextq_s16(s0, s1, 2);
+ const int16x8_t s4 = vextq_s16(s0, s1, 3);
+ const int16x8_t s5 = vextq_s16(s0, s1, 4);
+ const int16x8_t s6 = vextq_s16(s0, s1, 5);
+ const int16x8_t s7 = vextq_s16(s0, s1, 6);
+ const int16x8_t s8 = vextq_s16(s0, s1, 7);
+ const int16x4_t s0_lo = vget_low_s16(s0);
+ const int16x4_t s1_lo = vget_low_s16(s2);
+ const int16x4_t s2_lo = vget_low_s16(s3);
+ const int16x4_t s3_lo = vget_low_s16(s4);
+ const int16x4_t s4_lo = vget_high_s16(s0);
+ const int16x4_t s5_lo = vget_high_s16(s2);
+ const int16x4_t s6_lo = vget_high_s16(s3);
+ const int16x4_t s7_lo = vget_high_s16(s4);
+ const int16x4_t s8_lo = vget_high_s16(s5);
+ const int16x4_t s9_lo = vget_high_s16(s6);
+ const int16x4_t s10_lo = vget_high_s16(s7);
+ const int16x4_t s11_lo = vget_high_s16(s8);
+
+ return highbd_convolve12_y_4_s32(s0_lo, s1_lo, s2_lo, s3_lo, s4_lo, s5_lo,
+ s6_lo, s7_lo, s8_lo, s9_lo, s10_lo, s11_lo,
+ x_filter_0_7, x_filter_8_11, offset);
+}
+
+static INLINE uint16x4_t highbd_convolve12_horiz4_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t x_filter_0_7,
+ const int16x4_t x_filter_8_11, const int32x4_t shift_s32,
+ const int32x4_t offset) {
+ int32x4_t sum =
+ highbd_convolve12_horiz4_s32(s0, s1, x_filter_0_7, x_filter_8_11, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ return vqmovun_s32(sum);
+}
+
+static INLINE void highbd_convolve12_horiz8_s32(
+ const int16x8_t s0_0, const int16x8_t s0_1, const int16x8_t s0_2,
+ const int16x8_t x_filter_0_7, const int16x4_t x_filter_8_11,
+ const int32x4_t offset, int32x4_t *sum0, int32x4_t *sum1) {
+ const int16x8_t s1 = vextq_s16(s0_0, s0_1, 1);
+ const int16x8_t s2 = vextq_s16(s0_0, s0_1, 2);
+ const int16x8_t s3 = vextq_s16(s0_0, s0_1, 3);
+ const int16x8_t s4 = vextq_s16(s0_0, s0_1, 4);
+ const int16x8_t s5 = vextq_s16(s0_0, s0_1, 5);
+ const int16x8_t s6 = vextq_s16(s0_0, s0_1, 6);
+ const int16x8_t s7 = vextq_s16(s0_0, s0_1, 7);
+ const int16x8_t s8 = s0_1;
+ const int16x8_t s9 = vextq_s16(s0_1, s0_2, 1);
+ const int16x8_t s10 = vextq_s16(s0_1, s0_2, 2);
+ const int16x8_t s11 = vextq_s16(s0_1, s0_2, 3);
+
+ highbd_convolve12_y_8_s32(s0_0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11,
+ x_filter_0_7, x_filter_8_11, offset, sum0, sum1);
+}
+
+static INLINE uint16x8_t highbd_convolve12_horiz8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t x_filter_0_7, const int16x4_t x_filter_8_11,
+ const int32x4_t shift_s32, const int32x4_t offset) {
+ int32x4_t sum0, sum1;
+ highbd_convolve12_horiz8_s32(s0, s1, s2, x_filter_0_7, x_filter_8_11, offset,
+ &sum0, &sum1);
+
+ sum0 = vqrshlq_s32(sum0, shift_s32);
+ sum1 = vqrshlq_s32(sum1, shift_s32);
+
+ return vcombine_u16(vqmovun_s32(sum0), vqmovun_s32(sum1));
+}
+
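+// Each of the 4 output pixels uses its own filter phase, so both the source
+// windows and the filters are transposed; each vmlal_s16 then accumulates one
+// tap position across all 4 phases at once.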
+static INLINE int32x4_t highbd_convolve8_2d_scale_horiz4x8_s32(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x4_t *filters_lo,
+ const int16x4_t *filters_hi, const int32x4_t offset) {
+ int16x4_t s_lo[] = { vget_low_s16(s0), vget_low_s16(s1), vget_low_s16(s2),
+ vget_low_s16(s3) };
+ int16x4_t s_hi[] = { vget_high_s16(s0), vget_high_s16(s1), vget_high_s16(s2),
+ vget_high_s16(s3) };
+
+ transpose_u16_4x4((uint16x4_t *)s_lo);
+ transpose_u16_4x4((uint16x4_t *)s_hi);
+
+ int32x4_t sum = vmlal_s16(offset, s_lo[0], filters_lo[0]);
+ sum = vmlal_s16(sum, s_lo[1], filters_lo[1]);
+ sum = vmlal_s16(sum, s_lo[2], filters_lo[2]);
+ sum = vmlal_s16(sum, s_lo[3], filters_lo[3]);
+ sum = vmlal_s16(sum, s_hi[0], filters_hi[0]);
+ sum = vmlal_s16(sum, s_hi[1], filters_hi[1]);
+ sum = vmlal_s16(sum, s_hi[2], filters_hi[2]);
+ sum = vmlal_s16(sum, s_hi[3], filters_hi[3]);
+
+ return sum;
+}
+
+static INLINE uint16x4_t highbd_convolve8_2d_scale_horiz4x8_s32_s16(
+ const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x4_t *filters_lo,
+ const int16x4_t *filters_hi, const int32x4_t shift_s32,
+ const int32x4_t offset) {
+ int32x4_t sum = highbd_convolve8_2d_scale_horiz4x8_s32(
+ s0, s1, s2, s3, filters_lo, filters_hi, offset);
+
+ sum = vqrshlq_s32(sum, shift_s32);
+ return vqmovun_s32(sum);
+}
+
+#endif // AOM_AV1_COMMON_ARM_HIGHBD_CONVOLVE_NEON_H_
diff --git a/av1/common/arm/highbd_inv_txfm_neon.c b/av1/common/arm/highbd_inv_txfm_neon.c
index 90306b8..d197fca 100644
--- a/av1/common/arm/highbd_inv_txfm_neon.c
+++ b/av1/common/arm/highbd_inv_txfm_neon.c
@@ -17,7 +17,7 @@
#include "config/aom_config.h"
#include "config/av1_rtcd.h"
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
#define TRANSPOSE_4X4(x0, x1, x2, x3, y0, y1, y2, y3) \
do { \
int32x4x2_t swap_low = vtrnq_s32(x0, x1); \
@@ -49,7 +49,11 @@
y3 = vextq_s32(swap_low.val[1], \
vextq_s32(swap_high.val[1], swap_high.val[1], 2), 2); \
} while (0)
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
+
+static INLINE void transpose_4x4(const int32x4_t *in, int32x4_t *out) {
+ TRANSPOSE_4X4(in[0], in[1], in[2], in[3], out[0], out[1], out[2], out[3]);
+}
static INLINE void transpose_8x8(const int32x4_t *in, int32x4_t *out) {
TRANSPOSE_4X4(in[0], in[2], in[4], in[6], out[0], out[2], out[4], out[6]);
@@ -59,16 +63,12 @@
out[15]);
}
-static INLINE void av1_round_shift_array_32_neon(int32x4_t *input,
- int32x4_t *output,
- const int size,
- const int bit) {
+static INLINE void round_shift_array_32_neon(int32x4_t *input,
+ int32x4_t *output, const int size,
+ const int bit) {
const int32x4_t v_bit = vdupq_n_s32(-bit);
- const int32x4_t rnding = vdupq_n_s32(1 << (bit - 1));
- int i;
- for (i = 0; i < size; i++) {
- int32x4_t vradd = vaddq_s32(input[i], rnding);
- output[i] = vshlq_s32(vradd, v_bit);
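+  // vrshlq_s32 with a negative shift amount performs a rounding right shift,
+  // so no separate rounding constant is needed.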
+ for (int i = 0; i < size; i++) {
+ output[i] = vrshlq_s32(input[i], v_bit);
}
}
@@ -173,42 +173,25 @@
}
}
-static void round_shift_8x8(int32x4_t *in, int shift, const int32x4_t *rnding) {
- if (shift != 0) {
- const int32x4_t v_shift = vdupq_n_s32(-shift);
- int32x4_t vradd = vaddq_s32(in[0], *rnding);
- in[0] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[1], *rnding);
- in[1] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[2], *rnding);
- in[2] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[3], *rnding);
- in[3] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[4], *rnding);
- in[4] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[5], *rnding);
- in[5] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[6], *rnding);
- in[6] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[7], *rnding);
- in[7] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[8], *rnding);
- in[8] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[9], *rnding);
- in[9] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[10], *rnding);
- in[10] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[11], *rnding);
- in[11] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[12], *rnding);
- in[12] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[13], *rnding);
- in[13] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[14], *rnding);
- in[14] = vshlq_s32(vradd, v_shift);
- vradd = vaddq_s32(in[15], *rnding);
- in[15] = vshlq_s32(vradd, v_shift);
- }
+static void round_shift_8x8(int32x4_t *in, int shift) {
+ assert(shift != 0);
+ const int32x4_t v_shift = vdupq_n_s32(-shift);
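+  // vrshlq_s32 rounds to nearest while shifting right by a negative shift
+  // amount, so no rounding constant is required.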
+ in[0] = vrshlq_s32(in[0], v_shift);
+ in[1] = vrshlq_s32(in[1], v_shift);
+ in[2] = vrshlq_s32(in[2], v_shift);
+ in[3] = vrshlq_s32(in[3], v_shift);
+ in[4] = vrshlq_s32(in[4], v_shift);
+ in[5] = vrshlq_s32(in[5], v_shift);
+ in[6] = vrshlq_s32(in[6], v_shift);
+ in[7] = vrshlq_s32(in[7], v_shift);
+ in[8] = vrshlq_s32(in[8], v_shift);
+ in[9] = vrshlq_s32(in[9], v_shift);
+ in[10] = vrshlq_s32(in[10], v_shift);
+ in[11] = vrshlq_s32(in[11], v_shift);
+ in[12] = vrshlq_s32(in[12], v_shift);
+ in[13] = vrshlq_s32(in[13], v_shift);
+ in[14] = vrshlq_s32(in[14], v_shift);
+ in[15] = vrshlq_s32(in[15], v_shift);
}
static void highbd_clamp_s32_neon(int32x4_t *in, int32x4_t *out,
@@ -567,7 +550,10 @@
// Stage 0-1-2
- TRANSPOSE_4X4(in[0], in[1], in[2], in[3], u0, u1, u2, u3);
+ u0 = in[0];
+ u1 = in[1];
+ u2 = in[2];
+ u3 = in[3];
const int32x4_t v_bit = vdupq_n_s32(-bit);
@@ -611,7 +597,10 @@
int32x4_t x0, x1, x2, x3;
int32x4_t u0, u1, u2, u3;
- TRANSPOSE_4X4(in[0], in[1], in[2], in[3], x0, x1, x2, x3);
+ x0 = in[0];
+ x1 = in[1];
+ x2 = in[2];
+ x3 = in[3];
s0 = vmulq_n_s32(x0, sinpi[1]);
s1 = vmulq_n_s32(x0, sinpi[2]);
@@ -655,12 +644,12 @@
vreinterpretq_s16_s32(u0x.val[1]), vreinterpretq_s16_s32(zero), 1));
u0x = vzipq_s32(u0x.val[0], u0x.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
u0 = vreinterpretq_s32_s64(vzip1q_s64(vreinterpretq_s64_s32(u0x.val[0]),
vreinterpretq_s64_s32(u0x.val[1])));
#else
u0 = vcombine_s32(vget_low_s32(u0x.val[0]), vget_low_s32(u0x.val[1]));
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
// u1
int32x4x2_t u1x;
u1x.val[0] = vreinterpretq_s32_s64(
@@ -680,12 +669,12 @@
vreinterpretq_s16_s32(u1x.val[1]), vreinterpretq_s16_s32(zero), 1));
u1x = vzipq_s32(u1x.val[0], u1x.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
u1 = vreinterpretq_s32_s64(vzip1q_s64(vreinterpretq_s64_s32(u1x.val[0]),
vreinterpretq_s64_s32(u1x.val[1])));
#else
u1 = vcombine_s32(vget_low_s32(u1x.val[0]), vget_low_s32(u1x.val[1]));
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
// u2
int32x4x2_t u2x;
@@ -706,12 +695,12 @@
vreinterpretq_s16_s32(u2x.val[1]), vreinterpretq_s16_s32(zero), 1));
u2x = vzipq_s32(u2x.val[0], u2x.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
u2 = vreinterpretq_s32_s64(vzip1q_s64(vreinterpretq_s64_s32(u2x.val[0]),
vreinterpretq_s64_s32(u2x.val[1])));
#else
u2 = vcombine_s32(vget_low_s32(u2x.val[0]), vget_low_s32(u2x.val[1]));
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
// u3
int32x4x2_t u3x;
@@ -732,12 +721,12 @@
vreinterpretq_s16_s32(u3x.val[1]), vreinterpretq_s16_s32(zero), 1));
u3x = vzipq_s32(u3x.val[0], u3x.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
u3 = vreinterpretq_s32_s64(vzip1q_s64(vreinterpretq_s64_s32(u3x.val[0]),
vreinterpretq_s64_s32(u3x.val[1])));
#else
u3 = vcombine_s32(vget_low_s32(u3x.val[0]), vget_low_s32(u3x.val[1]));
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
out[0] = u0;
out[1] = u1;
@@ -803,7 +792,6 @@
static void iidentity4_neon(int32x4_t *in, int32x4_t *out, int bit, int do_cols,
int bd, int out_shift) {
(void)bit;
- int32x4_t v[4];
int32x4_t zero = vdupq_n_s32(0);
int32x2_t fact = vdup_n_s32(NewSqrt2);
int32x4x2_t a0;
@@ -821,7 +809,7 @@
vshrq_n_s64(vreinterpretq_s64_s32(a0.val[1]), NewSqrt2Bits));
a0 = vzipq_s32(a0.val[0], a0.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
out[i] = vreinterpretq_s32_s64(vzip1q_s64(
vreinterpretq_s64_s32(a0.val[0]), vreinterpretq_s64_s32(a0.val[1])));
#else
@@ -835,13 +823,6 @@
round_shift_4x4(out, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo, &clamp_hi, 4);
}
- v[0] = out[0];
- v[1] = out[1];
- v[2] = out[2];
- v[3] = out[3];
-
- // Transpose for 4x4
- TRANSPOSE_4X4(v[0], v[1], v[2], v[3], out[0], out[1], out[2], out[3]);
}
void av1_inv_txfm2d_add_4x4_neon(const int32_t *input, uint16_t *output,
@@ -854,96 +835,112 @@
case DCT_DCT:
load_buffer_4x4(input, in);
idct4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
idct4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_DCT:
load_buffer_4x4(input, in);
idct4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case DCT_ADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
idct4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_ADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case FLIPADST_DCT:
load_buffer_4x4(input, in);
idct4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case DCT_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
idct4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 1, -shift[1], bd);
break;
case ADST_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_ADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case IDTX:
load_buffer_4x4(input, in);
iidentity4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iidentity4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_DCT:
load_buffer_4x4(input, in);
iidentity4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
idct4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case H_DCT:
load_buffer_4x4(input, in);
idct4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iidentity4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_ADST:
load_buffer_4x4(input, in);
iidentity4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case H_ADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iidentity4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_FLIPADST:
load_buffer_4x4(input, in);
iidentity4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iadst4x4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case H_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_neon(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_4x4(in, in);
iidentity4_neon(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
@@ -1069,11 +1066,10 @@
}
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range_out = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo_out = vdupq_n_s32(-(1 << (log_range_out - 1)));
const int32x4_t clamp_hi_out = vdupq_n_s32((1 << (log_range_out - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo_out, &clamp_hi_out, 16);
}
}
@@ -1384,8 +1380,7 @@
int fliplr, int flipud, int shift, int bd) {
uint16x8_t u0, u1, u2, u3, u4, u5, u6, u7;
uint16x8_t v0, v1, v2, v3, v4, v5, v6, v7;
- const int32x4_t rnding = vdupq_n_s32(1 << (shift - 1));
- round_shift_8x8(in, shift, &rnding);
+ round_shift_8x8(in, shift);
v0 = vld1q_u16(output + 0 * stride);
v1 = vld1q_u16(output + 1 * stride);
@@ -1434,75 +1429,66 @@
switch (tx_type) {
case DCT_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ idct8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case DCT_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ idct8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case FLIPADST_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 1, -shift[1], bd);
+ idct8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 1, -shift[1], bd);
break;
case DCT_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 0, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 0, -shift[1], bd);
break;
case ADST_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 0, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 1, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 1, -shift[1], bd);
break;
case FLIPADST_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_neon(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 1, -shift[1], bd);
+ iadst8x8_neon(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_neon(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 1, -shift[1], bd);
break;
default: assert(0);
}
@@ -1989,11 +1975,10 @@
addsub_neon(u[7], u[8], out + 7, out + 8, &clamp_lo, &clamp_hi);
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range_out = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo_out = vdupq_n_s32(-(1 << (log_range_out - 1)));
const int32x4_t clamp_hi_out = vdupq_n_s32((1 << (log_range_out - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo_out, &clamp_hi_out, 16);
}
}
@@ -2530,12 +2515,11 @@
addsub_neon(v[7], v[8], out + 7, out + 8, &clamp_lo, &clamp_hi);
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range_out = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo_out = vdupq_n_s32(-(1 << (log_range_out - 1)));
const int32x4_t clamp_hi_out =
vdupq_n_s32((1 << (log_range_out - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo_out, &clamp_hi_out, 16);
}
}
@@ -2821,6 +2805,7 @@
&clamp_hi_out, &v_shift, &offset);
}
}
+
static void iidentity16_neon(int32x4_t *in, int32x4_t *out, int bit,
int do_cols, int bd, int out_shift) {
(void)bit;
@@ -2839,7 +2824,7 @@
a0.val[1] = vreinterpretq_s32_s64(
vshrq_n_s64(vreinterpretq_s64_s32(a0.val[1]), NewSqrt2Bits));
a0 = vzipq_s32(a0.val[0], a0.val[1]);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
out[i] = vreinterpretq_s32_s64(vzip1q_s64(
vreinterpretq_s64_s32(a0.val[0]), vreinterpretq_s64_s32(a0.val[1])));
#else
@@ -2848,14 +2833,14 @@
}
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo = vdupq_n_s32(-(1 << (log_range - 1)));
const int32x4_t clamp_hi = vdupq_n_s32((1 << (log_range - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo, &clamp_hi, 16);
}
}
+
static INLINE void idct64_stage8_neon(int32x4_t *u, const int32_t *cospi,
const int32x4_t *clamp_lo,
const int32x4_t *clamp_hi,
@@ -4687,12 +4672,11 @@
addsub_neon(bf0[15], bf0[16], out + 15, out + 16, &clamp_lo, &clamp_hi);
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range_out = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo_out = vdupq_n_s32(-(1 << (log_range_out - 1)));
const int32x4_t clamp_hi_out = vdupq_n_s32((1 << (log_range_out - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
- round_shift_8x8(out + 16, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
+ round_shift_8x8(out + 16, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo_out, &clamp_hi_out, 32);
}
}
@@ -4720,12 +4704,11 @@
}
if (!do_cols) {
- const int32x4_t rnding_shift = vdupq_n_s32(1 << (out_shift - 1));
const int log_range_out = AOMMAX(16, bd + 6);
const int32x4_t clamp_lo_out = vdupq_n_s32(-(1 << (log_range_out - 1)));
const int32x4_t clamp_hi_out = vdupq_n_s32((1 << (log_range_out - 1)) - 1);
- round_shift_8x8(out, out_shift, &rnding_shift);
- round_shift_8x8(out + 16, out_shift, &rnding_shift);
+ round_shift_8x8(out, out_shift);
+ round_shift_8x8(out + 16, out_shift);
highbd_clamp_s32_neon(out, out, &clamp_lo_out, &clamp_hi_out, 32);
}
}
@@ -4791,7 +4774,7 @@
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
const transform_1d_neon col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][1];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
assert(col_txfm != NULL);
assert(row_txfm != NULL);
@@ -4800,9 +4783,8 @@
// 1st stage: column transform
int32x4_t buf0[8];
- const int32_t *input_row = input;
- int32x4_t *buf0_cur = buf0;
- load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_row);
+ load_buffer_32bit_input(input, input_stride, buf0, txfm_size_col);
+ load_buffer_32bit_input(input + 4, input_stride, buf0 + 4, txfm_size_col);
round_shift_rect_array_32_neon(buf0, buf0, txfm_size_row);
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
row_txfm(buf0 + 4, buf0 + 4, INV_COS_BIT, 0, bd, -shift[0]);
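
For reference, load_buffer_32bit_input() as used above is assumed to load `size` vectors of four contiguous coefficients, stepping `stride` elements between loads (a sketch of the helper's semantics, not necessarily its exact body):

#include <arm_neon.h>

static INLINE void load_buffer_32bit_input(const int32_t *in, int stride,
                                           int32x4_t *out, int size) {
  for (int i = 0; i < size; ++i) {
    out[i] = vld1q_s32(in + i * stride);  // four contiguous 32-bit values
  }
}

This lines up with the corrected stride bound in the hunk above, txfm_size_row rather than txfm_size_col, now that the coefficients are read in transposed order.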
@@ -4824,7 +4806,7 @@
// 2nd stage: column transform
col_txfm(buf1, buf1, INV_COS_BIT, 1, bd, 0);
- av1_round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
+ round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
// write to buffer
highbd_write_buffer_4xn_neon(buf1, output, stride, ud_flip, txfm_size_row,
@@ -4855,12 +4837,7 @@
const int32_t *input_row = input;
load_buffer_32bit_input(input_row, 4, buf0, txfm_size_col);
- TRANSPOSE_4X4(buf0[0], buf0[2], buf0[4], buf0[6], buf1[0], buf1[1], buf1[2],
- buf1[3]);
- TRANSPOSE_4X4(buf0[1], buf0[3], buf0[5], buf0[7], buf1[4], buf1[5], buf1[6],
- buf1[7]);
-
- round_shift_rect_array_32_neon(buf1, buf0, txfm_size_col);
+ round_shift_rect_array_32_neon(buf0, buf0, txfm_size_col);
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *buf1_ptr;
@@ -4873,10 +4850,11 @@
// 2nd stage: column transform
for (int i = 0; i < 2; i++) {
- col_txfm(buf1_ptr + i * txfm_size_row, buf1_ptr + i * txfm_size_row,
- INV_COS_BIT, 1, bd, 0);
+ int32x4_t *buf1_cur = buf1_ptr + i * txfm_size_row;
+ transpose_4x4(buf1_cur, buf1_cur);
+ col_txfm(buf1_cur, buf1_cur, INV_COS_BIT, 1, bd, 0);
}
- av1_round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
+ round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
// write to buffer
highbd_write_buffer_8xn_neon(buf1_ptr, output, stride, ud_flip, txfm_size_row,
bd);
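
The new transpose_4x4() helper called above is assumed to transpose four int32x4_t rows, with src == dst safe because the intermediates are held in locals; a sketch using standard NEON transpose steps:

#include <arm_neon.h>

static INLINE void transpose_4x4(const int32x4_t *in, int32x4_t *out) {
  // Pairwise-transpose rows 0/1 and 2/3, then recombine half-vectors.
  const int32x4x2_t p01 = vtrnq_s32(in[0], in[1]);
  const int32x4x2_t p23 = vtrnq_s32(in[2], in[3]);
  out[0] = vcombine_s32(vget_low_s32(p01.val[0]), vget_low_s32(p23.val[0]));
  out[1] = vcombine_s32(vget_low_s32(p01.val[1]), vget_low_s32(p23.val[1]));
  out[2] = vcombine_s32(vget_high_s32(p01.val[0]), vget_high_s32(p23.val[0]));
  out[3] = vcombine_s32(vget_high_s32(p01.val[1]), vget_high_s32(p23.val[1]));
}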
@@ -4896,7 +4874,7 @@
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
const transform_1d_neon col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][2];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
assert(col_txfm != NULL);
assert(row_txfm != NULL);
@@ -4905,10 +4883,10 @@
// 1st stage: column transform
int32x4_t buf0[16];
- const int32_t *input_row = input;
- int32x4_t *buf0_cur = buf0;
- load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_row);
for (int i = 0; i < (txfm_size_row >> 2); i++) {
+ const int32_t *input_row = input + i * 4;
+ int32x4_t *buf0_cur = buf0 + i * 4;
+ load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_col);
row_txfm(buf0 + (i << 2), buf0 + (i << 2), INV_COS_BIT, 0, bd, -shift[0]);
}
@@ -4929,7 +4907,7 @@
// 2nd stage: column transform
col_txfm(buf1, buf1, INV_COS_BIT, 1, bd, 0);
- av1_round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
+ round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
// write to buffer
highbd_write_buffer_4xn_neon(buf1, output, stride, ud_flip, txfm_size_row,
@@ -4961,11 +4939,7 @@
const int32_t *input_row = input;
load_buffer_32bit_input(input_row, 4, buf0, txfm_size_col);
- for (int j = 0; j < buf_size_w_div8; j++) {
- TRANSPOSE_4X4(buf0[j], buf0[j + 4], buf0[j + 8], buf0[j + 12], buf1[4 * j],
- buf1[4 * j + 1], buf1[4 * j + 2], buf1[4 * j + 3]);
- }
- row_txfm(buf1, buf0, INV_COS_BIT, 0, bd, -shift[0]);
+ row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *buf1_ptr;
if (lr_flip) {
@@ -4977,10 +4951,11 @@
// 2nd stage: column transform
for (int i = 0; i < buf_size_w_div8; i++) {
- col_txfm(buf1_ptr + i * txfm_size_row, buf1_ptr + i * txfm_size_row,
- INV_COS_BIT, 1, bd, 0);
+ int32x4_t *buf1_cur = buf1_ptr + i * txfm_size_row;
+ transpose_4x4(buf1_cur, buf1_cur);
+ col_txfm(buf1_cur, buf1_cur, INV_COS_BIT, 1, bd, 0);
}
- av1_round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
+ round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
// write to buffer
for (int i = 0; i < (txfm_size_col >> 3); i++) {
@@ -4990,9 +4965,10 @@
}
}
-void highbd_inv_txfm2d_add_4x16_neon(const int32_t *input, uint16_t *output,
- int stride, TX_TYPE tx_type, int eob,
- const int bd) {
+static void highbd_inv_txfm2d_add_4x16_neon(const int32_t *input,
+ uint16_t *output, int stride,
+ TX_TYPE tx_type, int eob,
+ const int bd) {
(void)eob;
TX_SIZE tx_size = TX_4X16;
int32x4_t buf1[16];
@@ -5039,16 +5015,17 @@
// 2nd stage: column transform
col_txfm(buf1, buf1, INV_COS_BIT, 1, bd, 0);
- av1_round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
+ round_shift_array_32_neon(buf1, buf1, txfm_size_row, -shift[1]);
// write to buffer
highbd_write_buffer_4xn_neon(buf1, output, stride, ud_flip, txfm_size_row,
bd);
}
-void highbd_inv_txfm2d_add_16x4_neon(const int32_t *input, uint16_t *output,
- int stride, TX_TYPE tx_type, int eob,
- const int bd) {
+static void highbd_inv_txfm2d_add_16x4_neon(const int32_t *input,
+ uint16_t *output, int stride,
+ TX_TYPE tx_type, int eob,
+ const int bd) {
(void)eob;
TX_SIZE tx_size = TX_16X4;
int32x4_t buf1[16];
@@ -5092,7 +5069,7 @@
col_txfm(buf1_ptr + i * txfm_size_row, buf1_ptr + i * txfm_size_row,
INV_COS_BIT, 1, bd, 0);
}
- av1_round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
+ round_shift_array_32_neon(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
// write to buffer
for (int i = 0; i < (txfm_size_col >> 3); i++) {
@@ -5261,46 +5238,49 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
- const int buf_size_w_div4 = input_stride >> 2;
+ const int buf_size_w = AOMMIN(32, txfm_size_col);
+ const int buf_size_w_div4 = buf_size_w >> 2;
const int buf_size_h_div8 = (eoby + 8) >> 3;
+ const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eoby];
const transform_1d_neon row_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
+ assert(row_txfm != NULL);
const transform_1d_neon col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][fun_idx];
+ assert(col_txfm != NULL);
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < (buf_size_h_div8 << 1); ++i) {
int32x4_t buf0[16];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < buf_size_w_div4; ++j) {
- int32x4_t *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0, buf_size_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_rect_array_32_neon(buf0, buf0, input_stride);
+ round_shift_rect_array_32_neon(buf0, buf0, buf_size_w);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *_buf1 = buf1 + i * 4;
for (int j = 0; j < buf_size_w_div4; ++j) {
- _buf1[j * txfm_size_row + 0] = buf0[j * 4 + 0];
- _buf1[j * txfm_size_row + 1] = buf0[j * 4 + 1];
- _buf1[j * txfm_size_row + 2] = buf0[j * 4 + 2];
- _buf1[j * txfm_size_row + 3] = buf0[j * 4 + 3];
+ int32x4_t *buf0_cur = buf0 + j * 4;
+ TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
+ buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
+ _buf1[j * txfm_size_row + 0] = buf0_cur[0];
+ _buf1[j * txfm_size_row + 1] = buf0_cur[1];
+ _buf1[j * txfm_size_row + 2] = buf0_cur[2];
+ _buf1[j * txfm_size_row + 3] = buf0_cur[3];
}
}
for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
- av1_round_shift_array_32_neon(buf1 + i * txfm_size_row,
- buf1 + i * txfm_size_row, txfm_size_row,
- -shift[1]);
+ round_shift_array_32_neon(buf1 + i * txfm_size_row,
+ buf1 + i * txfm_size_row, txfm_size_row,
+ -shift[1]);
}
// write to buffer
@@ -5322,46 +5302,43 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
- const int buf_size_w_div8 = input_stride >> 2;
+ const int buf_size_w_div4 = AOMMIN(32, txfm_size_col) >> 2;
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = buf_size_nonzero_w_div8 << 3;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eobx];
const transform_1d_neon row_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][fun_idx];
+ assert(row_txfm != NULL);
const transform_1d_neon col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][0];
+ assert(col_txfm != NULL);
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < (row_max >> 2); ++i) {
int32x4_t buf0[16];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < (buf_size_nonzero_w_div8 << 1); ++j) {
- int32x4_t *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
-
- TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
- buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_rect_array_32_neon(buf0, buf0, buf_size_nonzero_w_div8 << 3);
+ round_shift_rect_array_32_neon(buf0, buf0, buf_size_nonzero_w);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *_buf1 = buf1 + i * 4;
if (lr_flip) {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(buf0[4 * j + 3], buf0[4 * j + 2], buf0[4 * j + 1],
buf0[4 * j],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 0],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 1],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 2],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 3]);
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 0],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 1],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 2],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 3]);
}
} else {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(
buf0[j * 4 + 0], buf0[j * 4 + 1], buf0[j * 4 + 2], buf0[j * 4 + 3],
_buf1[j * txfm_size_row + 0], _buf1[j * txfm_size_row + 1],
@@ -5369,13 +5346,13 @@
}
}
}
- for (int i = 0; i < buf_size_w_div8; i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
- av1_round_shift_array_32_neon(buf1 + i * txfm_size_row,
- buf1 + i * txfm_size_row, txfm_size_row,
- -shift[1]);
+ round_shift_array_32_neon(buf1 + i * txfm_size_row,
+ buf1 + i * txfm_size_row, txfm_size_row,
+ -shift[1]);
}
// write to buffer
@@ -5386,6 +5363,7 @@
}
}
}
+
static void inv_txfm2d_add_idtx_neon(const int32_t *input, uint16_t *output,
int stride, TX_TYPE tx_type,
TX_SIZE tx_size, const int bd) {
@@ -5395,40 +5373,43 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
+ const int buf_size_w = AOMMIN(32, txfm_size_col);
+ const int buf_size_w_div4 = buf_size_w >> 2;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const transform_1d_neon row_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
+ assert(row_txfm != NULL);
const transform_1d_neon col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][0];
+ assert(col_txfm != NULL);
for (int i = 0; i < (row_max >> 2); ++i) {
int32x4_t buf0[32];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < (input_stride >> 2); ++j) {
- int32x4_t *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0, buf_size_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_rect_array_32_neon(buf0, buf0, input_stride);
+ round_shift_rect_array_32_neon(buf0, buf0, buf_size_w);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *_buf1 = buf1 + i * 4;
- for (int j = 0; j < (input_stride >> 2); ++j) {
- _buf1[j * txfm_size_row + 0] = buf0[j * 4 + 0];
- _buf1[j * txfm_size_row + 1] = buf0[j * 4 + 1];
- _buf1[j * txfm_size_row + 2] = buf0[j * 4 + 2];
- _buf1[j * txfm_size_row + 3] = buf0[j * 4 + 3];
+ for (int j = 0; j < buf_size_w_div4; ++j) {
+ int32x4_t *buf0_cur = buf0 + j * 4;
+ TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
+ buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
+ _buf1[j * txfm_size_row + 0] = buf0_cur[0];
+ _buf1[j * txfm_size_row + 1] = buf0_cur[1];
+ _buf1[j * txfm_size_row + 2] = buf0_cur[2];
+ _buf1[j * txfm_size_row + 3] = buf0_cur[3];
}
}
- for (int i = 0; i < (input_stride >> 2); i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
- av1_round_shift_array_32_neon(buf1 + i * txfm_size_row,
- buf1 + i * txfm_size_row, txfm_size_row,
- -shift[1]);
+ round_shift_array_32_neon(buf1 + i * txfm_size_row,
+ buf1 + i * txfm_size_row, txfm_size_row,
+ -shift[1]);
}
// write to buffer
@@ -5439,9 +5420,11 @@
}
}
}
-void inv_txfm2d_add_no_identity_neon(const int32_t *input, uint16_t *output,
- int stride, TX_TYPE tx_type,
- TX_SIZE tx_size, const int bd) {
+
+static void inv_txfm2d_add_no_identity_neon(const int32_t *input,
+ uint16_t *output, int stride,
+ TX_TYPE tx_type, TX_SIZE tx_size,
+ const int bd) {
int32x4_t buf1[64 * 16];
int eobx, eoby;
get_eobx_eoby_scan_default(&eobx, &eoby, tx_size);
@@ -5450,10 +5433,10 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int buf_size_w_div8 = txfm_size_col >> 2;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_w_div4 = txfm_size_col >> 2;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -5470,32 +5453,26 @@
// 1st stage: column transform
for (int i = 0; i < buf_size_nonzero_h_div8 << 1; i++) {
int32x4_t buf0[64];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < buf_size_nonzero_w_div8 << 1; ++j) {
- int32x4_t *buf0_cur = &buf0[j * 4];
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
-
- TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
- buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_rect_array_32_neon(buf0, buf0, buf_size_nonzero_w_div8 << 3);
+ round_shift_rect_array_32_neon(buf0, buf0, buf_size_nonzero_w);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
int32x4_t *_buf1 = &buf1[i * 4];
if (lr_flip) {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(buf0[4 * j + 3], buf0[4 * j + 2], buf0[4 * j + 1],
buf0[4 * j],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 0],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 1],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 2],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 3]);
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 0],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 1],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 2],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 3]);
}
} else {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(
buf0[j * 4 + 0], buf0[j * 4 + 1], buf0[j * 4 + 2], buf0[j * 4 + 3],
_buf1[j * txfm_size_row + 0], _buf1[j * txfm_size_row + 1],
@@ -5504,13 +5481,13 @@
}
}
// 2nd stage: column transform
- for (int i = 0; i < buf_size_w_div8; i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
- av1_round_shift_array_32_neon(buf1 + i * txfm_size_row,
- buf1 + i * txfm_size_row, txfm_size_row,
- -shift[1]);
+ round_shift_array_32_neon(buf1 + i * txfm_size_row,
+ buf1 + i * txfm_size_row, txfm_size_row,
+ -shift[1]);
}
// write to buffer
@@ -5522,10 +5499,11 @@
}
}
-void highbd_inv_txfm2d_add_no_identity_neon(const int32_t *input,
- uint16_t *output, int stride,
- TX_TYPE tx_type, TX_SIZE tx_size,
- int eob, const int bd) {
+static void highbd_inv_txfm2d_add_no_identity_neon(const int32_t *input,
+ uint16_t *output, int stride,
+ TX_TYPE tx_type,
+ TX_SIZE tx_size, int eob,
+ const int bd) {
int32x4_t buf1[64 * 16];
int eobx, eoby;
highbd_get_eobx_eoby_scan_default(&eobx, &eoby, tx_size, eob);
@@ -5592,9 +5570,9 @@
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
- av1_round_shift_array_32_neon(buf1 + i * txfm_size_row,
- buf1 + i * txfm_size_row, txfm_size_row,
- -shift[1]);
+ round_shift_array_32_neon(buf1 + i * txfm_size_row,
+ buf1 + i * txfm_size_row, txfm_size_row,
+ -shift[1]);
}
// write to buffer
@@ -5606,10 +5584,11 @@
}
}
-void av1_highbd_inv_txfm2d_add_universe_neon(const int32_t *input,
- uint8_t *output, int stride,
- TX_TYPE tx_type, TX_SIZE tx_size,
- int eob, const int bd) {
+static void highbd_inv_txfm2d_add_universe_neon(const int32_t *input,
+ uint8_t *output, int stride,
+ TX_TYPE tx_type,
+ TX_SIZE tx_size, int eob,
+ const int bd) {
switch (tx_type) {
case DCT_DCT:
case ADST_DCT:
@@ -5643,9 +5622,9 @@
}
}
-void av1_inv_txfm2d_add_universe_neon(const int32_t *input, uint8_t *output,
- int stride, TX_TYPE tx_type,
- TX_SIZE tx_size, const int bd) {
+static void inv_txfm2d_add_universe_neon(const int32_t *input, uint8_t *output,
+ int stride, TX_TYPE tx_type,
+ TX_SIZE tx_size, const int bd) {
switch (tx_type) {
case DCT_DCT:
case ADST_DCT:
@@ -5692,9 +5671,9 @@
case V_DCT:
case V_ADST:
case V_FLIPADST:
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride, tx_type,
- txfm_param->tx_size,
- txfm_param->eob, bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, tx_type,
+ txfm_param->tx_size, txfm_param->eob,
+ bd);
break;
default:
av1_inv_txfm2d_add_8x8_neon(src, CONVERT_TO_SHORTPTR(dest), stride,
@@ -5702,6 +5681,7 @@
break;
}
}
+
void av1_highbd_inv_txfm_add_4x4_neon(const tran_low_t *input, uint8_t *dest,
int stride, const TxfmParam *txfm_param) {
assert(av1_ext_tx_used[txfm_param->tx_set_type][txfm_param->tx_type]);
@@ -5733,8 +5713,8 @@
void av1_inv_txfm2d_add_8x16_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_8X16, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type, TX_8X16,
+ bd);
}
void av1_highbd_inv_txfm_add_4x16_neon(const tran_low_t *input, uint8_t *dest,
@@ -5760,176 +5740,173 @@
void av1_highbd_inv_txfm_add_8x16_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_8X16,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_8X16, txfm_param->eob, txfm_param->bd);
}
void av1_highbd_inv_txfm_add_16x8_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_16X8,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_16X8, txfm_param->eob, txfm_param->bd);
}
void av1_inv_txfm2d_add_16x8_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_16X8, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type, TX_16X8,
+ bd);
}
void av1_highbd_inv_txfm_add_16x32_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_16X32,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_16X32, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_16x32_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_16X32, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_16X32, bd);
}
void av1_highbd_inv_txfm_add_32x16_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_32X16,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_32X16, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_32x16_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_32X16, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_32X16, bd);
}
void av1_highbd_inv_txfm_add_32x32_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_32X32,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_32X32, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_32x32_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_32X32, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_32X32, bd);
}
void av1_highbd_inv_txfm_add_64x64_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_64X64,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_64X64, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_64x64_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_64X64, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_64X64, bd);
}
void av1_highbd_inv_txfm_add_32x64_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_32X64,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_32X64, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_32x64_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_32X64, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_32X64, bd);
}
void av1_highbd_inv_txfm_add_64x32_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_64X32,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_64X32, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_64x32_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_64X32, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_64X32, bd);
}
void av1_highbd_inv_txfm_add_64x16_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_64X16,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_64X16, txfm_param->eob,
+ txfm_param->bd);
}
+
void av1_inv_txfm2d_add_64x16_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_64X16, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_64X16, bd);
}
void av1_highbd_inv_txfm_add_16x64_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_16X64,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_16X64, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_16x64_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_16X64, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_16X64, bd);
}
void av1_highbd_inv_txfm_add_16x16_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_16X16,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_16X16, txfm_param->eob,
+ txfm_param->bd);
}
void av1_inv_txfm2d_add_16x16_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_16X16, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
+ TX_16X16, bd);
}
void av1_highbd_inv_txfm_add_32x8_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_32X8,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_32X8, txfm_param->eob, txfm_param->bd);
}
void av1_inv_txfm2d_add_32x8_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_32X8, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type, TX_32X8,
+ bd);
}
void av1_highbd_inv_txfm_add_8x32_neon(const tran_low_t *input, uint8_t *dest,
int stride,
const TxfmParam *txfm_param) {
- av1_highbd_inv_txfm2d_add_universe_neon(input, dest, stride,
- txfm_param->tx_type, TX_8X32,
- txfm_param->eob, txfm_param->bd);
+ highbd_inv_txfm2d_add_universe_neon(input, dest, stride, txfm_param->tx_type,
+ TX_8X32, txfm_param->eob, txfm_param->bd);
}
void av1_inv_txfm2d_add_8x32_neon(const tran_low_t *input, uint16_t *dest,
int stride, TX_TYPE tx_type, const int bd) {
- av1_inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type,
- TX_8X32, bd);
+ inv_txfm2d_add_universe_neon(input, (uint8_t *)dest, stride, tx_type, TX_8X32,
+ bd);
}
void av1_highbd_inv_txfm_add_neon(const tran_low_t *input, uint8_t *dest,
diff --git a/av1/common/arm/jnt_convolve_neon.c b/av1/common/arm/jnt_convolve_neon.c
index 36c8f9c..564f7c2 100644
--- a/av1/common/arm/jnt_convolve_neon.c
+++ b/av1/common/arm/jnt_convolve_neon.c
@@ -22,548 +22,566 @@
#include "av1/common/common.h"
#include "av1/common/arm/convolve_neon.h"
-#if !defined(__aarch64__)
-static INLINE void compute_avg_4x1(
- uint16x4_t res0, uint16x4_t d0, const uint16_t fwd_offset,
- const uint16_t bck_offset, const int16x4_t sub_const_vec,
- const int16_t round_bits, const int use_dist_wtd_comp_avg, uint8x8_t *t0) {
- int16x4_t tmp0;
- uint16x4_t tmp_u0;
- uint32x4_t sum0;
- int32x4_t dst0;
- int16x8_t tmp4;
+#if !AOM_ARCH_AARCH64
+static INLINE void compute_dist_wtd_avg_4x1(uint16x4_t dd0, uint16x4_t d0,
+ const uint16_t fwd_offset,
+ const uint16_t bck_offset,
+ const int16x4_t round_offset,
+ uint8x8_t *d0_u8) {
+ uint32x4_t blend0 = vmull_n_u16(dd0, fwd_offset);
+ blend0 = vmlal_n_u16(blend0, d0, bck_offset);
- if (use_dist_wtd_comp_avg) {
- const int32x4_t round_bits_vec = vdupq_n_s32((int32_t)(-round_bits));
+ uint16x4_t avg0 = vshrn_n_u32(blend0, DIST_PRECISION_BITS);
- sum0 = vmull_n_u16(res0, fwd_offset);
- sum0 = vmlal_n_u16(sum0, d0, bck_offset);
+ int16x4_t dst0 = vsub_s16(vreinterpret_s16_u16(avg0), round_offset);
- sum0 = vshrq_n_u32(sum0, DIST_PRECISION_BITS);
+ int16x8_t dst0q = vcombine_s16(dst0, vdup_n_s16(0));
- dst0 = vsubq_s32(vreinterpretq_s32_u32(sum0), vmovl_s16(sub_const_vec));
-
- dst0 = vqrshlq_s32(dst0, round_bits_vec);
-
- tmp0 = vmovn_s32(dst0);
- tmp4 = vcombine_s16(tmp0, tmp0);
-
- *t0 = vqmovun_s16(tmp4);
- } else {
- const int16x4_t round_bits_vec = vdup_n_s16(-round_bits);
- tmp_u0 = vhadd_u16(res0, d0);
-
- tmp0 = vsub_s16(vreinterpret_s16_u16(tmp_u0), sub_const_vec);
-
- tmp0 = vqrshl_s16(tmp0, round_bits_vec);
-
- tmp4 = vcombine_s16(tmp0, vdup_n_s16(0));
-
- *t0 = vqmovun_s16(tmp4);
- }
+ *d0_u8 = vqrshrun_n_s16(dst0q, FILTER_BITS - ROUND0_BITS);
}
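
For intuition, a scalar model of the distance-weighted compound average the rewritten helper above vectorizes. The macro values are restated locally only to keep the snippet self-contained; they are libaom's usual values, stated here as an assumption rather than taken from this patch:

#include <stdint.h>

#define DIST_PRECISION_BITS 4  // assumed libaom value
#define FILTER_BITS 7          // assumed libaom value
#define ROUND0_BITS 3          // assumed libaom value

static inline uint8_t dist_wtd_avg_scalar(uint16_t dd, uint16_t d,
                                          uint16_t fwd_offset,
                                          uint16_t bck_offset,
                                          int16_t round_offset) {
  // Weighted blend of the two predictions (vmull_n_u16/vmlal_n_u16).
  uint32_t blend = (uint32_t)dd * fwd_offset + (uint32_t)d * bck_offset;
  // Narrowing shift (vshrn_n_u32) and offset removal (vsub_s16).
  int32_t avg = (int32_t)(blend >> DIST_PRECISION_BITS) - round_offset;
  // Final rounding, saturating shift back to pixel range (vqrshrun_n_s16);
  // an arithmetic right shift is assumed for negative values, as on the
  // targets here.
  int32_t res = (avg + (1 << (FILTER_BITS - ROUND0_BITS - 1))) >>
                (FILTER_BITS - ROUND0_BITS);
  return (uint8_t)(res < 0 ? 0 : res > 255 ? 255 : res);
}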
-static INLINE void compute_avg_8x1(
- uint16x8_t res0, uint16x8_t d0, const uint16_t fwd_offset,
- const uint16_t bck_offset, const int16x4_t sub_const,
- const int16_t round_bits, const int use_dist_wtd_comp_avg, uint8x8_t *t0) {
- int16x8_t f0;
- uint32x4_t sum0, sum2;
- int32x4_t dst0, dst2;
+static INLINE void compute_basic_avg_4x1(uint16x4_t dd0, uint16x4_t d0,
+ const int16x4_t round_offset,
+ uint8x8_t *d0_u8) {
+ uint16x4_t avg0 = vhadd_u16(dd0, d0);
- uint16x8_t tmp_u0;
+ int16x4_t dst0 = vsub_s16(vreinterpret_s16_u16(avg0), round_offset);
- if (use_dist_wtd_comp_avg) {
- const int32x4_t sub_const_vec = vmovl_s16(sub_const);
- const int32x4_t round_bits_vec = vdupq_n_s32(-(int32_t)round_bits);
+ int16x8_t dst0q = vcombine_s16(dst0, vdup_n_s16(0));
- sum0 = vmull_n_u16(vget_low_u16(res0), fwd_offset);
- sum0 = vmlal_n_u16(sum0, vget_low_u16(d0), bck_offset);
- sum0 = vshrq_n_u32(sum0, DIST_PRECISION_BITS);
-
- sum2 = vmull_n_u16(vget_high_u16(res0), fwd_offset);
- sum2 = vmlal_n_u16(sum2, vget_high_u16(d0), bck_offset);
- sum2 = vshrq_n_u32(sum2, DIST_PRECISION_BITS);
-
- dst0 = vsubq_s32(vreinterpretq_s32_u32(sum0), sub_const_vec);
- dst2 = vsubq_s32(vreinterpretq_s32_u32(sum2), sub_const_vec);
-
- dst0 = vqrshlq_s32(dst0, round_bits_vec);
- dst2 = vqrshlq_s32(dst2, round_bits_vec);
-
- f0 = vcombine_s16(vmovn_s32(dst0), vmovn_s32(dst2));
-
- *t0 = vqmovun_s16(f0);
-
- } else {
- const int16x8_t sub_const_vec = vcombine_s16(sub_const, sub_const);
- const int16x8_t round_bits_vec = vdupq_n_s16(-round_bits);
-
- tmp_u0 = vhaddq_u16(res0, d0);
-
- f0 = vsubq_s16(vreinterpretq_s16_u16(tmp_u0), sub_const_vec);
-
- f0 = vqrshlq_s16(f0, round_bits_vec);
-
- *t0 = vqmovun_s16(f0);
- }
+ *d0_u8 = vqrshrun_n_s16(dst0q, FILTER_BITS - ROUND0_BITS);
}
-#endif // !defined(__arch64__)
-static INLINE void compute_avg_4x4(
- uint16x4_t res0, uint16x4_t res1, uint16x4_t res2, uint16x4_t res3,
+static INLINE void compute_dist_wtd_avg_8x1(uint16x8_t dd0, uint16x8_t d0,
+ const uint16_t fwd_offset,
+ const uint16_t bck_offset,
+ const int16x8_t round_offset,
+ uint8x8_t *d0_u8) {
+ uint32x4_t blend0_lo = vmull_n_u16(vget_low_u16(dd0), fwd_offset);
+ blend0_lo = vmlal_n_u16(blend0_lo, vget_low_u16(d0), bck_offset);
+ uint32x4_t blend0_hi = vmull_n_u16(vget_high_u16(dd0), fwd_offset);
+ blend0_hi = vmlal_n_u16(blend0_hi, vget_high_u16(d0), bck_offset);
+
+ uint16x8_t avg0 = vcombine_u16(vshrn_n_u32(blend0_lo, DIST_PRECISION_BITS),
+ vshrn_n_u32(blend0_hi, DIST_PRECISION_BITS));
+
+ int16x8_t dst0 = vsubq_s16(vreinterpretq_s16_u16(avg0), round_offset);
+
+ *d0_u8 = vqrshrun_n_s16(dst0, FILTER_BITS - ROUND0_BITS);
+}
+
+static INLINE void compute_basic_avg_8x1(uint16x8_t dd0, uint16x8_t d0,
+ const int16x8_t round_offset,
+ uint8x8_t *d0_u8) {
+ uint16x8_t avg0 = vhaddq_u16(dd0, d0);
+
+ int16x8_t dst0 = vsubq_s16(vreinterpretq_s16_u16(avg0), round_offset);
+
+ *d0_u8 = vqrshrun_n_s16(dst0, FILTER_BITS - ROUND0_BITS);
+}
+
+#endif // !AOM_ARCH_AARCH64
+
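
compute_basic_avg_*() above is the unweighted path: vhadd_u16 is a truncating halving add, (dd + d) >> 1, computed without widening. A matching scalar sketch, reusing the assumed macro values from the previous snippet:

static inline uint8_t basic_avg_scalar(uint16_t dd, uint16_t d,
                                       int16_t round_offset) {
  // Truncating halving add (vhadd_u16), then offset removal.
  int32_t avg = (int32_t)(((uint32_t)dd + d) >> 1) - round_offset;
  // Rounding, saturating shift to pixel range (vqrshrun_n_s16).
  int32_t res = (avg + (1 << (FILTER_BITS - ROUND0_BITS - 1))) >>
                (FILTER_BITS - ROUND0_BITS);
  return (uint8_t)(res < 0 ? 0 : res > 255 ? 255 : res);
}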
+static INLINE void compute_dist_wtd_avg_4x4(
+ uint16x4_t dd0, uint16x4_t dd1, uint16x4_t dd2, uint16x4_t dd3,
uint16x4_t d0, uint16x4_t d1, uint16x4_t d2, uint16x4_t d3,
const uint16_t fwd_offset, const uint16_t bck_offset,
- const int16x4_t sub_const_vec, const int16_t round_bits,
- const int use_dist_wtd_comp_avg, uint8x8_t *t0, uint8x8_t *t1) {
- int16x4_t tmp0, tmp1, tmp2, tmp3;
- uint16x4_t tmp_u0, tmp_u1, tmp_u2, tmp_u3;
- uint32x4_t sum0, sum1, sum2, sum3;
+ const int16x8_t round_offset, uint8x8_t *d01_u8, uint8x8_t *d23_u8) {
+ uint32x4_t blend0 = vmull_n_u16(dd0, fwd_offset);
+ blend0 = vmlal_n_u16(blend0, d0, bck_offset);
+ uint32x4_t blend1 = vmull_n_u16(dd1, fwd_offset);
+ blend1 = vmlal_n_u16(blend1, d1, bck_offset);
+ uint32x4_t blend2 = vmull_n_u16(dd2, fwd_offset);
+ blend2 = vmlal_n_u16(blend2, d2, bck_offset);
+ uint32x4_t blend3 = vmull_n_u16(dd3, fwd_offset);
+ blend3 = vmlal_n_u16(blend3, d3, bck_offset);
- int32x4_t dst0, dst1, dst2, dst3;
- int16x8_t tmp4, tmp5;
+ uint16x4_t avg0 = vshrn_n_u32(blend0, DIST_PRECISION_BITS);
+ uint16x4_t avg1 = vshrn_n_u32(blend1, DIST_PRECISION_BITS);
+ uint16x4_t avg2 = vshrn_n_u32(blend2, DIST_PRECISION_BITS);
+ uint16x4_t avg3 = vshrn_n_u32(blend3, DIST_PRECISION_BITS);
- if (use_dist_wtd_comp_avg) {
- const int32x4_t round_bits_vec = vdupq_n_s32((int32_t)(-round_bits));
- const int32x4_t const_vec = vmovl_s16(sub_const_vec);
+ int16x8_t dst_01 = vreinterpretq_s16_u16(vcombine_u16(avg0, avg1));
+ int16x8_t dst_23 = vreinterpretq_s16_u16(vcombine_u16(avg2, avg3));
- sum0 = vmull_n_u16(res0, fwd_offset);
- sum0 = vmlal_n_u16(sum0, d0, bck_offset);
- sum1 = vmull_n_u16(res1, fwd_offset);
- sum1 = vmlal_n_u16(sum1, d1, bck_offset);
- sum2 = vmull_n_u16(res2, fwd_offset);
- sum2 = vmlal_n_u16(sum2, d2, bck_offset);
- sum3 = vmull_n_u16(res3, fwd_offset);
- sum3 = vmlal_n_u16(sum3, d3, bck_offset);
+ dst_01 = vsubq_s16(dst_01, round_offset);
+ dst_23 = vsubq_s16(dst_23, round_offset);
- sum0 = vshrq_n_u32(sum0, DIST_PRECISION_BITS);
- sum1 = vshrq_n_u32(sum1, DIST_PRECISION_BITS);
- sum2 = vshrq_n_u32(sum2, DIST_PRECISION_BITS);
- sum3 = vshrq_n_u32(sum3, DIST_PRECISION_BITS);
-
- dst0 = vsubq_s32(vreinterpretq_s32_u32(sum0), const_vec);
- dst1 = vsubq_s32(vreinterpretq_s32_u32(sum1), const_vec);
- dst2 = vsubq_s32(vreinterpretq_s32_u32(sum2), const_vec);
- dst3 = vsubq_s32(vreinterpretq_s32_u32(sum3), const_vec);
-
- dst0 = vqrshlq_s32(dst0, round_bits_vec);
- dst1 = vqrshlq_s32(dst1, round_bits_vec);
- dst2 = vqrshlq_s32(dst2, round_bits_vec);
- dst3 = vqrshlq_s32(dst3, round_bits_vec);
-
- tmp4 = vcombine_s16(vmovn_s32(dst0), vmovn_s32(dst1));
- tmp5 = vcombine_s16(vmovn_s32(dst2), vmovn_s32(dst3));
-
- *t0 = vqmovun_s16(tmp4);
- *t1 = vqmovun_s16(tmp5);
- } else {
- const int16x4_t round_bits_vec = vdup_n_s16(-round_bits);
- tmp_u0 = vhadd_u16(res0, d0);
- tmp_u1 = vhadd_u16(res1, d1);
- tmp_u2 = vhadd_u16(res2, d2);
- tmp_u3 = vhadd_u16(res3, d3);
-
- tmp0 = vsub_s16(vreinterpret_s16_u16(tmp_u0), sub_const_vec);
- tmp1 = vsub_s16(vreinterpret_s16_u16(tmp_u1), sub_const_vec);
- tmp2 = vsub_s16(vreinterpret_s16_u16(tmp_u2), sub_const_vec);
- tmp3 = vsub_s16(vreinterpret_s16_u16(tmp_u3), sub_const_vec);
-
- tmp0 = vqrshl_s16(tmp0, round_bits_vec);
- tmp1 = vqrshl_s16(tmp1, round_bits_vec);
- tmp2 = vqrshl_s16(tmp2, round_bits_vec);
- tmp3 = vqrshl_s16(tmp3, round_bits_vec);
-
- tmp4 = vcombine_s16(tmp0, tmp1);
- tmp5 = vcombine_s16(tmp2, tmp3);
-
- *t0 = vqmovun_s16(tmp4);
- *t1 = vqmovun_s16(tmp5);
- }
+ *d01_u8 = vqrshrun_n_s16(dst_01, FILTER_BITS - ROUND0_BITS);
+ *d23_u8 = vqrshrun_n_s16(dst_23, FILTER_BITS - ROUND0_BITS);
}
-static INLINE void compute_avg_8x4(
- uint16x8_t res0, uint16x8_t res1, uint16x8_t res2, uint16x8_t res3,
+static INLINE void compute_basic_avg_4x4(uint16x4_t dd0, uint16x4_t dd1,
+ uint16x4_t dd2, uint16x4_t dd3,
+ uint16x4_t d0, uint16x4_t d1,
+ uint16x4_t d2, uint16x4_t d3,
+ const int16x8_t round_offset,
+ uint8x8_t *d01_u8, uint8x8_t *d23_u8) {
+ uint16x4_t avg0 = vhadd_u16(dd0, d0);
+ uint16x4_t avg1 = vhadd_u16(dd1, d1);
+ uint16x4_t avg2 = vhadd_u16(dd2, d2);
+ uint16x4_t avg3 = vhadd_u16(dd3, d3);
+
+ int16x8_t dst_01 = vreinterpretq_s16_u16(vcombine_u16(avg0, avg1));
+ int16x8_t dst_23 = vreinterpretq_s16_u16(vcombine_u16(avg2, avg3));
+
+ dst_01 = vsubq_s16(dst_01, round_offset);
+ dst_23 = vsubq_s16(dst_23, round_offset);
+
+ *d01_u8 = vqrshrun_n_s16(dst_01, FILTER_BITS - ROUND0_BITS);
+ *d23_u8 = vqrshrun_n_s16(dst_23, FILTER_BITS - ROUND0_BITS);
+}
+
+static INLINE void compute_dist_wtd_avg_8x4(
+ uint16x8_t dd0, uint16x8_t dd1, uint16x8_t dd2, uint16x8_t dd3,
uint16x8_t d0, uint16x8_t d1, uint16x8_t d2, uint16x8_t d3,
const uint16_t fwd_offset, const uint16_t bck_offset,
- const int16x4_t sub_const, const int16_t round_bits,
- const int use_dist_wtd_comp_avg, uint8x8_t *t0, uint8x8_t *t1,
- uint8x8_t *t2, uint8x8_t *t3) {
- int16x8_t f0, f1, f2, f3;
- uint32x4_t sum0, sum1, sum2, sum3;
- uint32x4_t sum4, sum5, sum6, sum7;
- int32x4_t dst0, dst1, dst2, dst3;
- int32x4_t dst4, dst5, dst6, dst7;
- uint16x8_t tmp_u0, tmp_u1, tmp_u2, tmp_u3;
+ const int16x8_t round_offset, uint8x8_t *d0_u8, uint8x8_t *d1_u8,
+ uint8x8_t *d2_u8, uint8x8_t *d3_u8) {
+ uint32x4_t blend0_lo = vmull_n_u16(vget_low_u16(dd0), fwd_offset);
+ blend0_lo = vmlal_n_u16(blend0_lo, vget_low_u16(d0), bck_offset);
+ uint32x4_t blend0_hi = vmull_n_u16(vget_high_u16(dd0), fwd_offset);
+ blend0_hi = vmlal_n_u16(blend0_hi, vget_high_u16(d0), bck_offset);
- if (use_dist_wtd_comp_avg) {
- const int32x4_t sub_const_vec = vmovl_s16(sub_const);
- const int32x4_t round_bits_vec = vdupq_n_s32(-(int32_t)round_bits);
+ uint32x4_t blend1_lo = vmull_n_u16(vget_low_u16(dd1), fwd_offset);
+ blend1_lo = vmlal_n_u16(blend1_lo, vget_low_u16(d1), bck_offset);
+ uint32x4_t blend1_hi = vmull_n_u16(vget_high_u16(dd1), fwd_offset);
+ blend1_hi = vmlal_n_u16(blend1_hi, vget_high_u16(d1), bck_offset);
- sum0 = vmull_n_u16(vget_low_u16(res0), fwd_offset);
- sum0 = vmlal_n_u16(sum0, vget_low_u16(d0), bck_offset);
- sum1 = vmull_n_u16(vget_low_u16(res1), fwd_offset);
- sum1 = vmlal_n_u16(sum1, vget_low_u16(d1), bck_offset);
- sum0 = vshrq_n_u32(sum0, DIST_PRECISION_BITS);
- sum1 = vshrq_n_u32(sum1, DIST_PRECISION_BITS);
+ uint32x4_t blend2_lo = vmull_n_u16(vget_low_u16(dd2), fwd_offset);
+ blend2_lo = vmlal_n_u16(blend2_lo, vget_low_u16(d2), bck_offset);
+ uint32x4_t blend2_hi = vmull_n_u16(vget_high_u16(dd2), fwd_offset);
+ blend2_hi = vmlal_n_u16(blend2_hi, vget_high_u16(d2), bck_offset);
- sum2 = vmull_n_u16(vget_high_u16(res0), fwd_offset);
- sum2 = vmlal_n_u16(sum2, vget_high_u16(d0), bck_offset);
- sum3 = vmull_n_u16(vget_high_u16(res1), fwd_offset);
- sum3 = vmlal_n_u16(sum3, vget_high_u16(d1), bck_offset);
- sum2 = vshrq_n_u32(sum2, DIST_PRECISION_BITS);
- sum3 = vshrq_n_u32(sum3, DIST_PRECISION_BITS);
+ uint32x4_t blend3_lo = vmull_n_u16(vget_low_u16(dd3), fwd_offset);
+ blend3_lo = vmlal_n_u16(blend3_lo, vget_low_u16(d3), bck_offset);
+ uint32x4_t blend3_hi = vmull_n_u16(vget_high_u16(dd3), fwd_offset);
+ blend3_hi = vmlal_n_u16(blend3_hi, vget_high_u16(d3), bck_offset);
- sum4 = vmull_n_u16(vget_low_u16(res2), fwd_offset);
- sum4 = vmlal_n_u16(sum4, vget_low_u16(d2), bck_offset);
- sum5 = vmull_n_u16(vget_low_u16(res3), fwd_offset);
- sum5 = vmlal_n_u16(sum5, vget_low_u16(d3), bck_offset);
- sum4 = vshrq_n_u32(sum4, DIST_PRECISION_BITS);
- sum5 = vshrq_n_u32(sum5, DIST_PRECISION_BITS);
+ uint16x8_t avg0 = vcombine_u16(vshrn_n_u32(blend0_lo, DIST_PRECISION_BITS),
+ vshrn_n_u32(blend0_hi, DIST_PRECISION_BITS));
+ uint16x8_t avg1 = vcombine_u16(vshrn_n_u32(blend1_lo, DIST_PRECISION_BITS),
+ vshrn_n_u32(blend1_hi, DIST_PRECISION_BITS));
+ uint16x8_t avg2 = vcombine_u16(vshrn_n_u32(blend2_lo, DIST_PRECISION_BITS),
+ vshrn_n_u32(blend2_hi, DIST_PRECISION_BITS));
+ uint16x8_t avg3 = vcombine_u16(vshrn_n_u32(blend3_lo, DIST_PRECISION_BITS),
+ vshrn_n_u32(blend3_hi, DIST_PRECISION_BITS));
- sum6 = vmull_n_u16(vget_high_u16(res2), fwd_offset);
- sum6 = vmlal_n_u16(sum6, vget_high_u16(d2), bck_offset);
- sum7 = vmull_n_u16(vget_high_u16(res3), fwd_offset);
- sum7 = vmlal_n_u16(sum7, vget_high_u16(d3), bck_offset);
- sum6 = vshrq_n_u32(sum6, DIST_PRECISION_BITS);
- sum7 = vshrq_n_u32(sum7, DIST_PRECISION_BITS);
+ int16x8_t dst0 = vsubq_s16(vreinterpretq_s16_u16(avg0), round_offset);
+ int16x8_t dst1 = vsubq_s16(vreinterpretq_s16_u16(avg1), round_offset);
+ int16x8_t dst2 = vsubq_s16(vreinterpretq_s16_u16(avg2), round_offset);
+ int16x8_t dst3 = vsubq_s16(vreinterpretq_s16_u16(avg3), round_offset);
- dst0 = vsubq_s32(vreinterpretq_s32_u32(sum0), sub_const_vec);
- dst1 = vsubq_s32(vreinterpretq_s32_u32(sum1), sub_const_vec);
- dst2 = vsubq_s32(vreinterpretq_s32_u32(sum2), sub_const_vec);
- dst3 = vsubq_s32(vreinterpretq_s32_u32(sum3), sub_const_vec);
- dst4 = vsubq_s32(vreinterpretq_s32_u32(sum4), sub_const_vec);
- dst5 = vsubq_s32(vreinterpretq_s32_u32(sum5), sub_const_vec);
- dst6 = vsubq_s32(vreinterpretq_s32_u32(sum6), sub_const_vec);
- dst7 = vsubq_s32(vreinterpretq_s32_u32(sum7), sub_const_vec);
-
- dst0 = vqrshlq_s32(dst0, round_bits_vec);
- dst1 = vqrshlq_s32(dst1, round_bits_vec);
- dst2 = vqrshlq_s32(dst2, round_bits_vec);
- dst3 = vqrshlq_s32(dst3, round_bits_vec);
- dst4 = vqrshlq_s32(dst4, round_bits_vec);
- dst5 = vqrshlq_s32(dst5, round_bits_vec);
- dst6 = vqrshlq_s32(dst6, round_bits_vec);
- dst7 = vqrshlq_s32(dst7, round_bits_vec);
-
- f0 = vcombine_s16(vmovn_s32(dst0), vmovn_s32(dst2));
- f1 = vcombine_s16(vmovn_s32(dst1), vmovn_s32(dst3));
- f2 = vcombine_s16(vmovn_s32(dst4), vmovn_s32(dst6));
- f3 = vcombine_s16(vmovn_s32(dst5), vmovn_s32(dst7));
-
- *t0 = vqmovun_s16(f0);
- *t1 = vqmovun_s16(f1);
- *t2 = vqmovun_s16(f2);
- *t3 = vqmovun_s16(f3);
-
- } else {
- const int16x8_t sub_const_vec = vcombine_s16(sub_const, sub_const);
- const int16x8_t round_bits_vec = vdupq_n_s16(-round_bits);
-
- tmp_u0 = vhaddq_u16(res0, d0);
- tmp_u1 = vhaddq_u16(res1, d1);
- tmp_u2 = vhaddq_u16(res2, d2);
- tmp_u3 = vhaddq_u16(res3, d3);
-
- f0 = vsubq_s16(vreinterpretq_s16_u16(tmp_u0), sub_const_vec);
- f1 = vsubq_s16(vreinterpretq_s16_u16(tmp_u1), sub_const_vec);
- f2 = vsubq_s16(vreinterpretq_s16_u16(tmp_u2), sub_const_vec);
- f3 = vsubq_s16(vreinterpretq_s16_u16(tmp_u3), sub_const_vec);
-
- f0 = vqrshlq_s16(f0, round_bits_vec);
- f1 = vqrshlq_s16(f1, round_bits_vec);
- f2 = vqrshlq_s16(f2, round_bits_vec);
- f3 = vqrshlq_s16(f3, round_bits_vec);
-
- *t0 = vqmovun_s16(f0);
- *t1 = vqmovun_s16(f1);
- *t2 = vqmovun_s16(f2);
- *t3 = vqmovun_s16(f3);
- }
+ *d0_u8 = vqrshrun_n_s16(dst0, FILTER_BITS - ROUND0_BITS);
+ *d1_u8 = vqrshrun_n_s16(dst1, FILTER_BITS - ROUND0_BITS);
+ *d2_u8 = vqrshrun_n_s16(dst2, FILTER_BITS - ROUND0_BITS);
+ *d3_u8 = vqrshrun_n_s16(dst3, FILTER_BITS - ROUND0_BITS);
}
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
+static INLINE void compute_basic_avg_8x4(uint16x8_t dd0, uint16x8_t dd1,
+ uint16x8_t dd2, uint16x8_t dd3,
+ uint16x8_t d0, uint16x8_t d1,
+ uint16x8_t d2, uint16x8_t d3,
+ const int16x8_t round_offset,
+ uint8x8_t *d0_u8, uint8x8_t *d1_u8,
+ uint8x8_t *d2_u8, uint8x8_t *d3_u8) {
+ uint16x8_t avg0, avg1, avg2, avg3;
-static INLINE void dist_wtd_convolve_2d_horiz_neon(
+ avg0 = vhaddq_u16(dd0, d0);
+ avg1 = vhaddq_u16(dd1, d1);
+ avg2 = vhaddq_u16(dd2, d2);
+ avg3 = vhaddq_u16(dd3, d3);
+
+ int16x8_t dst0 = vsubq_s16(vreinterpretq_s16_u16(avg0), round_offset);
+ int16x8_t dst1 = vsubq_s16(vreinterpretq_s16_u16(avg1), round_offset);
+ int16x8_t dst2 = vsubq_s16(vreinterpretq_s16_u16(avg2), round_offset);
+ int16x8_t dst3 = vsubq_s16(vreinterpretq_s16_u16(avg3), round_offset);
+
+ *d0_u8 = vqrshrun_n_s16(dst0, FILTER_BITS - ROUND0_BITS);
+ *d1_u8 = vqrshrun_n_s16(dst1, FILTER_BITS - ROUND0_BITS);
+ *d2_u8 = vqrshrun_n_s16(dst2, FILTER_BITS - ROUND0_BITS);
+ *d3_u8 = vqrshrun_n_s16(dst3, FILTER_BITS - ROUND0_BITS);
+}
+
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
+
+static INLINE int16x4_t convolve8_4_2d_h(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const uint8x16x2_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[2];
+ int32x4_t sum;
+
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+
+ // First 4 output values.
+ sum = vusdotq_lane_s32(horiz_const, permuted_samples[0], x_filter, 0);
+ sum = vusdotq_lane_s32(sum, permuted_samples[1], x_filter, 1);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ return vshrn_n_s32(sum, ROUND0_BITS - 1);
+}
+
+static INLINE int16x8_t convolve8_8_2d_h(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t horiz_const) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum[2];
+
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ // { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 }
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ // First 4 output values.
+ sum[0] = vusdotq_lane_s32(horiz_const, permuted_samples[0], x_filter, 0);
+ sum[0] = vusdotq_lane_s32(sum[0], permuted_samples[1], x_filter, 1);
+ // Second 4 output values.
+ sum[1] = vusdotq_lane_s32(horiz_const, permuted_samples[1], x_filter, 0);
+ sum[1] = vusdotq_lane_s32(sum[1], permuted_samples[2], x_filter, 1);
+
+ // Narrow and re-pack.
+ // We halved the convolution filter values so -1 from the right shift.
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
+}
+
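
Each permuted register above holds four overlapping 4-sample windows, so the two vusdotq_lane_s32 calls together form a full 8-tap dot product per output pixel: filter lane 0 covers taps 0-3 against samples k..k+3, lane 1 covers taps 4-7 against samples k+4..k+7. A scalar sketch of a single output value (hypothetical helper; ROUND0_BITS as assumed earlier):

static inline int16_t convolve8_2d_h_scalar(const uint8_t *s /* &src[k] */,
                                            const int8_t *f /* halved taps */,
                                            int32_t horiz_const) {
  int32_t sum = horiz_const;  // rounding shim already folded in
  for (int t = 0; t < 8; ++t) sum += (int32_t)f[t] * s[t];
  // The taps were halved, hence ROUND0_BITS - 1 rather than ROUND0_BITS.
  return (int16_t)(sum >> (ROUND0_BITS - 1));
}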
+static INLINE void dist_wtd_convolve_2d_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, const int im_stride,
- const int16x8_t x_filter_s16, const int im_h, int w, const int round_0) {
+ const int16x8_t x_filter_s16, const int im_h, int w) {
const int bd = 8;
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
+ const int32x4_t horiz_const = vdupq_n_s32((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
+ // Horizontal filter.
+ const int8x8_t x_filter = vmovn_s16(x_filter_s16);
+
+ const uint8_t *src_ptr = src;
int16_t *dst_ptr = im_block;
int dst_stride = im_stride;
- int width = w;
int height = im_h;
- const int8x8_t x_filter = vmovn_s16(x_filter_s16);
- const int32x4_t horiz_const = vdupq_n_s32(1 << (bd + FILTER_BITS - 2));
-
if (w == 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0));
uint8x16_t s0, s1, s2, s3;
- int32x4_t t0, t1, t2, t3;
int16x4_t d0, d1, d2, d3;
do {
- s0 = vld1q_u8(src + 0 * src_stride);
- s1 = vld1q_u8(src + 1 * src_stride);
- s2 = vld1q_u8(src + 2 * src_stride);
- s3 = vld1q_u8(src + 3 * src_stride);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
- t0 = convolve8_4_usdot(s0, x_filter, permute_tbl, horiz_const);
- t1 = convolve8_4_usdot(s1, x_filter, permute_tbl, horiz_const);
- t2 = convolve8_4_usdot(s2, x_filter, permute_tbl, horiz_const);
- t3 = convolve8_4_usdot(s3, x_filter, permute_tbl, horiz_const);
+ d0 = convolve8_4_2d_h(s0, x_filter, permute_tbl, horiz_const);
+ d1 = convolve8_4_2d_h(s1, x_filter, permute_tbl, horiz_const);
+ d2 = convolve8_4_2d_h(s2, x_filter, permute_tbl, horiz_const);
+ d3 = convolve8_4_2d_h(s3, x_filter, permute_tbl, horiz_const);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
- d1 = vqrshl_s16(vmovn_s32(t1), shift_round_0);
- d2 = vqrshl_s16(vmovn_s32(t2), shift_round_0);
- d3 = vqrshl_s16(vmovn_s32(t3), shift_round_0);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
- vst1_s16((dst_ptr + 0 * dst_stride), d0);
- vst1_s16((dst_ptr + 1 * dst_stride), d1);
- vst1_s16((dst_ptr + 2 * dst_stride), d2);
- vst1_s16((dst_ptr + 3 * dst_stride), d3);
-
- src += 4 * src_stride;
+ src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
height -= 4;
} while (height > 0);
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0));
- const uint8_t *s;
- int16_t *d;
uint8x16_t s0, s1, s2, s3;
int16x8_t d0, d1, d2, d3;
do {
- width = w;
- s = src;
- d = dst_ptr;
+ const uint8_t *s = src_ptr;
+ int16_t *d = dst_ptr;
+ int width = w;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_usdot(s0, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d1 = convolve8_8_usdot(s1, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d2 = convolve8_8_usdot(s2, x_filter, permute_tbl, horiz_const,
- shift_round_0);
- d3 = convolve8_8_usdot(s3, x_filter, permute_tbl, horiz_const,
- shift_round_0);
+ d0 = convolve8_8_2d_h(s0, x_filter, permute_tbl, horiz_const);
+ d1 = convolve8_8_2d_h(s1, x_filter, permute_tbl, horiz_const);
+ d2 = convolve8_8_2d_h(s2, x_filter, permute_tbl, horiz_const);
+ d3 = convolve8_8_2d_h(s3, x_filter, permute_tbl, horiz_const);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
width -= 8;
} while (width > 0);
-
- src += 4 * src_stride;
+ src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
height -= 4;
} while (height > 0);
}
}
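
The shim comments above rest on a simple identity: a rounding right shift by n equals a plain shift applied after pre-adding 1 << (n - 1). Folding that constant into horiz_const is what lets the non-rounding vshrn_n_s32 replace a rounding shift. A tiny self-check (sketch; n = ROUND0_BITS - 1, assuming ROUND0_BITS == 3):

#include <assert.h>

static inline void shim_demo(void) {
  const int n = 2;   // ROUND0_BITS - 1
  const int x = 14;  // 14 / 4 = 3.5
  assert((x >> n) == 3);                     // plain shift truncates
  assert(((x + (1 << (n - 1))) >> n) == 4);  // pre-added shim rounds
}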
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-static INLINE void dist_wtd_convolve_2d_horiz_neon(
+static INLINE int16x4_t convolve8_4_2d_h(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x2_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[2];
+ int32x4_t sum;
+
+ // Clamp sample range to [-128, 127] for 8-bit signed dot product.
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+
+ // Accumulate dot product into 'correction' to account for range clamp.
+ sum = vdotq_lane_s32(correction, permuted_samples[0], x_filter, 0);
+ sum = vdotq_lane_s32(sum, permuted_samples[1], x_filter, 1);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ return vshrn_n_s32(sum, ROUND0_BITS - 1);
+}
+
+static INLINE int16x8_t convolve8_8_2d_h(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum[2];
+
+ // Clamp sample range to [-128, 127] for 8-bit signed dot product.
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+  // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ // { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 }
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ // Accumulate dot product into 'correction' to account for range clamp.
+ // First 4 output values.
+ sum[0] = vdotq_lane_s32(correction, permuted_samples[0], x_filter, 0);
+ sum[0] = vdotq_lane_s32(sum[0], permuted_samples[1], x_filter, 1);
+ // Second 4 output values.
+ sum[1] = vdotq_lane_s32(correction, permuted_samples[1], x_filter, 0);
+ sum[1] = vdotq_lane_s32(sum[1], permuted_samples[2], x_filter, 1);
+
+ // Narrow and re-pack.
+ // We halved the convolution filter values so -1 from the right shift.
+ return vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
+}
+
+static INLINE void dist_wtd_convolve_2d_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, const int im_stride,
- const int16x8_t x_filter_s16, const int im_h, int w, const int round_0) {
+ const int16x8_t x_filter_s16, const int im_h, int w) {
const int bd = 8;
- int16_t *dst_ptr = im_block;
- int dst_stride = im_stride;
- int width = w;
- int height = im_h;
-
- const int8x8_t x_filter = vmovn_s16(x_filter_s16);
const int32_t horiz_const = (1 << (bd + FILTER_BITS - 2));
- // Dot product constants.
- const int16x8_t correct_tmp = vshlq_n_s16(x_filter_s16, 7);
- const int32x4_t correction =
- vdupq_n_s32(vaddlvq_s16(correct_tmp) + horiz_const);
+ // Dot product constants and other shims.
+ const int32_t correction_s32 = vaddlvq_s16(vshlq_n_s16(x_filter_s16, 7));
+ // Fold horiz_const into the dot-product filter correction constant. The
+ // additional shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-
+ // rounding shifts - which are generally faster than rounding shifts on
+ // modern CPUs. (The extra -1 is needed because we halved the filter values.)
+ const int32x4_t correction = vdupq_n_s32(correction_s32 + horiz_const +
+ (1 << ((ROUND0_BITS - 1) - 1)));
const uint8x16_t range_limit = vdupq_n_u8(128);
+ // Horizontal filter.
+ const int8x8_t x_filter = vmovn_s16(x_filter_s16);
+
+ const uint8_t *src_ptr = src;
+ int16_t *dst_ptr = im_block;
+ int dst_stride = im_stride;
+ int height = im_h;
if (w == 4) {
const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0));
uint8x16_t s0, s1, s2, s3;
- int32x4_t t0, t1, t2, t3;
int16x4_t d0, d1, d2, d3;
do {
- s0 = vld1q_u8(src + 0 * src_stride);
- s1 = vld1q_u8(src + 1 * src_stride);
- s2 = vld1q_u8(src + 2 * src_stride);
- s3 = vld1q_u8(src + 3 * src_stride);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
- t0 = convolve8_4_sdot(s0, x_filter, correction, range_limit, permute_tbl);
- t1 = convolve8_4_sdot(s1, x_filter, correction, range_limit, permute_tbl);
- t2 = convolve8_4_sdot(s2, x_filter, correction, range_limit, permute_tbl);
- t3 = convolve8_4_sdot(s3, x_filter, correction, range_limit, permute_tbl);
+ d0 = convolve8_4_2d_h(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_4_2d_h(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_4_2d_h(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_4_2d_h(s3, x_filter, correction, range_limit, permute_tbl);
- d0 = vqrshl_s16(vmovn_s32(t0), shift_round_0);
- d1 = vqrshl_s16(vmovn_s32(t1), shift_round_0);
- d2 = vqrshl_s16(vmovn_s32(t2), shift_round_0);
- d3 = vqrshl_s16(vmovn_s32(t3), shift_round_0);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
- vst1_s16((dst_ptr + 0 * dst_stride), d0);
- vst1_s16((dst_ptr + 1 * dst_stride), d1);
- vst1_s16((dst_ptr + 2 * dst_stride), d2);
- vst1_s16((dst_ptr + 3 * dst_stride), d3);
-
- src += 4 * src_stride;
+ src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
height -= 4;
} while (height > 0);
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0));
- const uint8_t *s;
- int16_t *d;
uint8x16_t s0, s1, s2, s3;
int16x8_t d0, d1, d2, d3;
do {
- width = w;
- s = src;
- d = dst_ptr;
+ const uint8_t *s = src_ptr;
+ int16_t *d = dst_ptr;
+ int width = w;
do {
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d1 = convolve8_8_sdot(s1, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d2 = convolve8_8_sdot(s2, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d3 = convolve8_8_sdot(s3, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ d0 = convolve8_8_2d_h(s0, x_filter, correction, range_limit,
+ permute_tbl);
+ d1 = convolve8_8_2d_h(s1, x_filter, correction, range_limit,
+ permute_tbl);
+ d2 = convolve8_8_2d_h(s2, x_filter, correction, range_limit,
+ permute_tbl);
+ d3 = convolve8_8_2d_h(s3, x_filter, correction, range_limit,
+ permute_tbl);
- vst1q_s16(d + 0 * dst_stride, d0);
- vst1q_s16(d + 1 * dst_stride, d1);
- vst1q_s16(d + 2 * dst_stride, d2);
- vst1q_s16(d + 3 * dst_stride, d3);
+ store_s16_8x4(d, dst_stride, d0, d1, d2, d3);
s += 8;
d += 8;
width -= 8;
} while (width > 0);
-
- src += 4 * src_stride;
+ src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
height -= 4;
} while (height > 0);
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
-static INLINE void dist_wtd_convolve_2d_horiz_neon(
+static INLINE int16x4_t convolve8_4_2d_h(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t x_filter,
+ const int16x4_t horiz_const) {
+ const int16x4_t x_filter_0_3 = vget_low_s16(x_filter);
+ const int16x4_t x_filter_4_7 = vget_high_s16(x_filter);
+
+ int16x4_t sum = horiz_const;
+ sum = vmla_lane_s16(sum, s0, x_filter_0_3, 0);
+ sum = vmla_lane_s16(sum, s1, x_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s2, x_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s3, x_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s4, x_filter_4_7, 0);
+ sum = vmla_lane_s16(sum, s5, x_filter_4_7, 1);
+ sum = vmla_lane_s16(sum, s6, x_filter_4_7, 2);
+ sum = vmla_lane_s16(sum, s7, x_filter_4_7, 3);
+
+ // We halved the convolution filter values, so shift by one bit less.
+ return vshr_n_s16(sum, ROUND0_BITS - 1);
+}
+
+static INLINE int16x8_t convolve8_8_2d_h(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t x_filter,
+ const int16x8_t horiz_const) {
+ const int16x4_t x_filter_0_3 = vget_low_s16(x_filter);
+ const int16x4_t x_filter_4_7 = vget_high_s16(x_filter);
+
+ int16x8_t sum = horiz_const;
+ sum = vmlaq_lane_s16(sum, s0, x_filter_0_3, 0);
+ sum = vmlaq_lane_s16(sum, s1, x_filter_0_3, 1);
+ sum = vmlaq_lane_s16(sum, s2, x_filter_0_3, 2);
+ sum = vmlaq_lane_s16(sum, s3, x_filter_0_3, 3);
+ sum = vmlaq_lane_s16(sum, s4, x_filter_4_7, 0);
+ sum = vmlaq_lane_s16(sum, s5, x_filter_4_7, 1);
+ sum = vmlaq_lane_s16(sum, s6, x_filter_4_7, 2);
+ sum = vmlaq_lane_s16(sum, s7, x_filter_4_7, 3);
+
+ // We halved the convolution filter values, so shift by one bit less.
+ return vshrq_n_s16(sum, ROUND0_BITS - 1);
+}
+
+static INLINE void dist_wtd_convolve_2d_horiz_8tap_neon(
const uint8_t *src, int src_stride, int16_t *im_block, const int im_stride,
- const int16x8_t x_filter, const int im_h, int w, const int round_0) {
+ const int16x8_t x_filter, const int im_h, int w) {
const int bd = 8;
- const uint8_t *s;
- int16_t *dst_ptr;
- int dst_stride;
- int width, height;
- dst_ptr = im_block;
- dst_stride = im_stride;
- height = im_h;
- width = w;
+ const uint8_t *src_ptr = src;
+ int16_t *dst_ptr = im_block;
+ int dst_stride = im_stride;
+ int height = im_h;
if (w == 4) {
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
- int16x8_t tt0;
uint8x8_t t0;
-
- const int16x4_t horiz_const = vdup_n_s16((1 << (bd + FILTER_BITS - 2)));
- const int16x4_t shift_round_0 = vdup_n_s16(-(round_0));
-
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x4_t s8, s9, s10, d1, d2, d3;
- int16x8_t tt1, tt2, tt3;
uint8x8_t t1, t2, t3;
-#endif
- do {
- s = src;
- __builtin_prefetch(s + 0 * src_stride);
-#if defined(__aarch64__)
- __builtin_prefetch(s + 1 * src_stride);
- __builtin_prefetch(s + 2 * src_stride);
- __builtin_prefetch(s + 3 * src_stride);
+#endif // AOM_ARCH_AARCH64
- load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
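+ // (1 << (bd + FILTER_BITS - 2)) is half the usual intermediate offset, again
+ // because the filter values were halved; it biases the intermediate sums so
+ // they stay non-negative for the vertical pass.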
+ const int16x4_t horiz_const = vdup_n_s16((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
+ do {
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+#if AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+
+ load_u8_8x4(src_ptr, src_stride, &t0, &t1, &t2, &t3);
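+ // Transpose the rows so that each vector holds one sample position across
+ // all four rows, letting a single multiply-accumulate chain compute four
+ // outputs at once.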
transpose_u8_8x4(&t0, &t1, &t2, &t3);
- tt0 = vreinterpretq_s16_u16(vmovl_u8(t0));
- tt1 = vreinterpretq_s16_u16(vmovl_u8(t1));
- tt2 = vreinterpretq_s16_u16(vmovl_u8(t2));
- tt3 = vreinterpretq_s16_u16(vmovl_u8(t3));
- s0 = vget_low_s16(tt0);
- s1 = vget_low_s16(tt1);
- s2 = vget_low_s16(tt2);
- s3 = vget_low_s16(tt3);
- s4 = vget_high_s16(tt0);
- s5 = vget_high_s16(tt1);
- s6 = vget_high_s16(tt2);
+
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
__builtin_prefetch(dst_ptr + 0 * dst_stride);
__builtin_prefetch(dst_ptr + 1 * dst_stride);
__builtin_prefetch(dst_ptr + 2 * dst_stride);
__builtin_prefetch(dst_ptr + 3 * dst_stride);
- s += 7;
- load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ load_u8_8x4(src_ptr + 7, src_stride, &t0, &t1, &t2, &t3);
transpose_u8_8x4(&t0, &t1, &t2, &t3);
- tt0 = vreinterpretq_s16_u16(vmovl_u8(t0));
- tt1 = vreinterpretq_s16_u16(vmovl_u8(t1));
- tt2 = vreinterpretq_s16_u16(vmovl_u8(t2));
- tt3 = vreinterpretq_s16_u16(vmovl_u8(t3));
- s7 = vget_low_s16(tt0);
- s8 = vget_low_s16(tt1);
- s9 = vget_low_s16(tt2);
- s10 = vget_low_s16(tt3);
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
- d1 = convolve8_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- horiz_const, shift_round_0);
- d2 = convolve8_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- horiz_const, shift_round_0);
- d3 = convolve8_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- horiz_const, shift_round_0);
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s10 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+
+ d0 = convolve8_4_2d_h(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
+ d1 = convolve8_4_2d_h(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ horiz_const);
+ d2 = convolve8_4_2d_h(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ horiz_const);
+ d3 = convolve8_4_2d_h(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ horiz_const);
transpose_s16_4x4d(&d0, &d1, &d2, &d3);
+ store_s16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
- vst1_s16((dst_ptr + 0 * dst_stride), d0);
- vst1_s16((dst_ptr + 1 * dst_stride), d1);
- vst1_s16((dst_ptr + 2 * dst_stride), d2);
- vst1_s16((dst_ptr + 3 * dst_stride), d3);
-
- src += 4 * src_stride;
+ src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
height -= 4;
-#else
- t0 = vld1_u8(s); // a0 a1 a2 a3 a4 a5 a6 a7
- tt0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
- s0 = vget_low_s16(tt0); // a0 a1 a2 a3
- s4 = vget_high_s16(tt0); // a4 a5 a6 a7
- __builtin_prefetch(dst_ptr);
- s += 8;
- t0 = vld1_u8(s); // a8 a9 a10 a11
+#else // !AOM_ARCH_AARCH64
+ t0 = vld1_u8(src_ptr); // a0 a1 a2 a3 a4 a5 a6 a7
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0))); // a0 a1 a2 a3
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0))); // a4 a5 a6 a7
+ __builtin_prefetch(dst_ptr);
+
+ t0 = vld1_u8(src_ptr + 8); // a8 a9 a10 a11
s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
s1 = vext_s16(s0, s4, 1); // a1 a2 a3 a4
@@ -573,39 +591,47 @@
s6 = vext_s16(s4, s7, 2); // a6 a7 a8 a9
s7 = vext_s16(s4, s7, 3); // a7 a8 a9 a10
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
-
+ d0 = convolve8_4_2d_h(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
vst1_s16(dst_ptr, d0);
- src += src_stride;
+ src_ptr += src_stride;
dst_ptr += dst_stride;
- height -= 1;
-#endif
+ height--;
+#endif // AOM_ARCH_AARCH64
} while (height > 0);
} else {
- int16_t *d_tmp;
- int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
- int16x8_t res0;
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8, d0;
uint8x8_t t0;
+#if AOM_ARCH_AARCH64
+ int16x8_t s9, s10, s11, s12, s13, s14;
+ int16x8_t d1, d2, d3, d4, d5, d6, d7;
+ uint8x8_t t1, t2, t3, t4, t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
- const int16x8_t horiz_const = vdupq_n_s16((1 << (bd + FILTER_BITS - 2)));
- const int16x8_t shift_round_0 = vdupq_n_s16(-(round_0));
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
+ const int16x8_t horiz_const = vdupq_n_s16((1 << (bd + FILTER_BITS - 2)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
do {
-#if defined(__aarch64__)
- uint8x8_t t1, t2, t3, t4, t5, t6, t7;
- int16x8_t s8, s9, s10, s11, s12, s13, s14;
- int16x8_t res1, res2, res3, res4, res5, res6, res7;
- __builtin_prefetch(src + 0 * src_stride);
- __builtin_prefetch(src + 1 * src_stride);
- __builtin_prefetch(src + 2 * src_stride);
- __builtin_prefetch(src + 3 * src_stride);
- __builtin_prefetch(src + 4 * src_stride);
- __builtin_prefetch(src + 5 * src_stride);
- __builtin_prefetch(src + 6 * src_stride);
- __builtin_prefetch(src + 7 * src_stride);
- load_u8_8x8(src, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ const uint8_t *s;
+ int16_t *d = dst_ptr;
+ int width = w;
+
+#if AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+ __builtin_prefetch(src_ptr + 4 * src_stride);
+ __builtin_prefetch(src_ptr + 5 * src_stride);
+ __builtin_prefetch(src_ptr + 6 * src_stride);
+ __builtin_prefetch(src_ptr + 7 * src_stride);
+
+ load_u8_8x8(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
@@ -614,9 +640,8 @@
s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
- width = w;
- s = src + 7;
- d_tmp = dst_ptr;
+ s = src_ptr + 7;
+
__builtin_prefetch(dst_ptr + 0 * dst_stride);
__builtin_prefetch(dst_ptr + 1 * dst_stride);
__builtin_prefetch(dst_ptr + 2 * dst_stride);
@@ -629,6 +654,7 @@
do {
load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
@@ -638,28 +664,26 @@
s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
- res0 = convolve8_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
- res1 = convolve8_8x8_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
- horiz_const, shift_round_0);
- res2 = convolve8_8x8_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
- horiz_const, shift_round_0);
- res3 = convolve8_8x8_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- horiz_const, shift_round_0);
- res4 = convolve8_8x8_s16(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
- horiz_const, shift_round_0);
- res5 = convolve8_8x8_s16(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
- horiz_const, shift_round_0);
- res6 = convolve8_8x8_s16(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
- horiz_const, shift_round_0);
- res7 = convolve8_8x8_s16(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
- horiz_const, shift_round_0);
+ d0 = convolve8_8_2d_h(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
+ d1 = convolve8_8_2d_h(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ horiz_const);
+ d2 = convolve8_8_2d_h(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ horiz_const);
+ d3 = convolve8_8_2d_h(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ horiz_const);
+ d4 = convolve8_8_2d_h(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
+ horiz_const);
+ d5 = convolve8_8_2d_h(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
+ horiz_const);
+ d6 = convolve8_8_2d_h(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
+ horiz_const);
+ d7 = convolve8_8_2d_h(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
+ horiz_const);
- transpose_s16_8x8(&res0, &res1, &res2, &res3, &res4, &res5, &res6,
- &res7);
+ transpose_s16_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
+ store_s16_8x8(d, dst_stride, d0, d1, d2, d3, d4, d5, d6, d7);
- store_s16_8x8(d_tmp, dst_stride, res0, res1, res2, res3, res4, res5,
- res6, res7);
s0 = s8;
s1 = s9;
s2 = s10;
@@ -668,337 +692,624 @@
s5 = s13;
s6 = s14;
s += 8;
- d_tmp += 8;
+ d += 8;
width -= 8;
} while (width > 0);
- src += 8 * src_stride;
+ src_ptr += 8 * src_stride;
dst_ptr += 8 * dst_stride;
height -= 8;
-#else
- int16x8_t temp_0;
- t0 = vld1_u8(src);
+#else // !AOM_ARCH_AARCH64
+ t0 = vld1_u8(src_ptr);
s0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
- width = w;
- s = src + 8;
- d_tmp = dst_ptr;
+ s = src_ptr + 8;
__builtin_prefetch(dst_ptr);
do {
t0 = vld1_u8(s); // a8 a9 a10 a11 a12 a13 a14 a15
- s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
- temp_0 = s0;
- s0 = s7;
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t0));
- s1 = vextq_s16(temp_0, s7, 1); // a1 a2 a3 a4 a5 a6 a7 a8
- s2 = vextq_s16(temp_0, s7, 2); // a2 a3 a4 a5 a6 a7 a8 a9
- s3 = vextq_s16(temp_0, s7, 3); // a3 a4 a5 a6 a7 a8 a9 a10
- s4 = vextq_s16(temp_0, s7, 4); // a4 a5 a6 a7 a8 a9 a10 a11
- s5 = vextq_s16(temp_0, s7, 5); // a5 a6 a7 a8 a9 a10 a11 a12
- s6 = vextq_s16(temp_0, s7, 6); // a6 a7 a8 a9 a10 a11 a12 a13
- s7 = vextq_s16(temp_0, s7, 7); // a7 a8 a9 a10 a11 a12 a13 a14
+ s1 = vextq_s16(s0, s8, 1); // a1 a2 a3 a4 a5 a6 a7 a8
+ s2 = vextq_s16(s0, s8, 2); // a2 a3 a4 a5 a6 a7 a8 a9
+ s3 = vextq_s16(s0, s8, 3); // a3 a4 a5 a6 a7 a8 a9 a10
+ s4 = vextq_s16(s0, s8, 4); // a4 a5 a6 a7 a8 a9 a10 a11
+ s5 = vextq_s16(s0, s8, 5); // a5 a6 a7 a8 a9 a10 a11 a12
+ s6 = vextq_s16(s0, s8, 6); // a6 a7 a8 a9 a10 a11 a12 a13
+ s7 = vextq_s16(s0, s8, 7); // a7 a8 a9 a10 a11 a12 a13 a14
- res0 = convolve8_8x8_s16(temp_0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- horiz_const, shift_round_0);
- vst1q_s16(d_tmp, res0);
+ d0 = convolve8_8_2d_h(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ horiz_const);
+ vst1q_s16(d, d0);
+ s0 = s8;
s += 8;
- d_tmp += 8;
+ d += 8;
width -= 8;
} while (width > 0);
- src += src_stride;
+ src_ptr += src_stride;
dst_ptr += dst_stride;
- height -= 1;
-#endif
+ height--;
+#endif // AOM_ARCH_AARCH64
} while (height > 0);
}
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-static INLINE void dist_wtd_convolve_2d_vert_6tap_neon(
+static INLINE uint16x4_t
+convolve6_4_2d_v(const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter, const int32x4_t offset_const) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ int32x4_t sum = offset_const;
+ // Filter values at indices 0 and 7 are 0.
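+ // (6-tap filters are stored with a zero in the first and last position of
+ // the 8-tap array, so those two multiplies can be skipped.)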
+ sum = vmlal_lane_s16(sum, s0, y_filter_0_3, 1);
+ sum = vmlal_lane_s16(sum, s1, y_filter_0_3, 2);
+ sum = vmlal_lane_s16(sum, s2, y_filter_0_3, 3);
+ sum = vmlal_lane_s16(sum, s3, y_filter_4_7, 0);
+ sum = vmlal_lane_s16(sum, s4, y_filter_4_7, 1);
+ sum = vmlal_lane_s16(sum, s5, y_filter_4_7, 2);
+
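+ // Saturating-rounding-narrow into the unsigned 16-bit compound domain.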
+ return vqrshrun_n_s32(sum, COMPOUND_ROUND1_BITS);
+}
+
+static INLINE uint16x8_t
+convolve6_8_2d_v(const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filter, const int32x4_t offset_const) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
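+ // vmlal widens to 32 bits, so process the low and high halves of the 8-wide
+ // input separately.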
+ int32x4_t sum0 = offset_const;
+ // Filter values at indices 0 and 7 are 0.
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s0), y_filter_0_3, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_0_3, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_0_3, 3);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_4_7, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_4_7, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_4_7, 2);
+
+ int32x4_t sum1 = offset_const;
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s0), y_filter_0_3, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_0_3, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_0_3, 3);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_4_7, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_4_7, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_4_7, 2);
+
+ return vcombine_u16(vqrshrun_n_s32(sum0, COMPOUND_ROUND1_BITS),
+ vqrshrun_n_s32(sum1, COMPOUND_ROUND1_BITS));
+}
+
+static INLINE void dist_wtd_convolve_2d_vert_6tap_dist_wtd_avg_neon(
int16_t *src_ptr, const int src_stride, uint8_t *dst8_ptr, int dst8_stride,
ConvolveParams *conv_params, const int16x8_t y_filter, int h, int w) {
- CONV_BUF_TYPE *dst_ptr = conv_params->dst;
- const int dst_stride = conv_params->dst_stride;
-
const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int16_t sub_const = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
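+ // In the usual 8-bit configuration (ROUND0_BITS == 3, COMPOUND_ROUND1_BITS
+ // == 7), offset_bits is 19 and round_offset is (1 << 12) + (1 << 11) = 6144.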
- const int16_t round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
- const int offset = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int32x4_t round_shift_vec = vdupq_n_s32(-(conv_params->round_1));
- const int32x4_t offset_const = vdupq_n_s32(1 << offset);
- const int16x4_t sub_const_vec = vdup_n_s16(sub_const);
const uint16_t fwd_offset = conv_params->fwd_offset;
const uint16_t bck_offset = conv_params->bck_offset;
- const int do_average = conv_params->do_average;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
if (w == 4) {
int16x4_t s0, s1, s2, s3, s4, s5;
uint16x4_t dd0, d0;
uint8x8_t d01_u8;
-
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x4_t s6, s7, s8;
uint16x4_t dd1, dd2, dd3, d1, d2, d3;
uint8x8_t d23_u8;
-#endif
+#endif // AOM_ARCH_AARCH64
- s0 = vld1_s16(src_ptr + 0 * src_stride);
- s1 = vld1_s16(src_ptr + 1 * src_stride);
- s2 = vld1_s16(src_ptr + 2 * src_stride);
- s3 = vld1_s16(src_ptr + 3 * src_stride);
- s4 = vld1_s16(src_ptr + 4 * src_stride);
+ load_s16_4x5(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4);
src_ptr += 5 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_4x4(src_ptr, src_stride, &s5, &s6, &s7, &s8);
- d0 = convolve6_4_s32(s0, s1, s2, s3, s4, s5, y_filter, round_shift_vec,
- offset_const);
- d1 = convolve6_4_s32(s1, s2, s3, s4, s5, s6, y_filter, round_shift_vec,
- offset_const);
- d2 = convolve6_4_s32(s2, s3, s4, s5, s6, s7, y_filter, round_shift_vec,
- offset_const);
- d3 = convolve6_4_s32(s3, s4, s5, s6, s7, s8, y_filter, round_shift_vec,
- offset_const);
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_4_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_4_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_4_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
- if (do_average) {
- load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
- bck_offset, sub_const_vec, round_bits,
- use_dist_wtd_comp_avg, &d01_u8, &d23_u8);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01_u8, &d23_u8);
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 1);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d23_u8), 0);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d23_u8), 1);
- dst8_ptr += dst8_stride;
- } else {
- store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
- }
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+ dst8_ptr += 4 * dst8_stride;
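+ // Slide the source window down by four rows for the next iteration.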
s0 = s4;
s1 = s5;
s2 = s6;
s3 = s7;
s4 = s8;
-
src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
h -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
s5 = vld1_s16(src_ptr);
- d0 = convolve6_4_s32(s0, s1, s2, s3, s4, s5, y_filter, round_shift_vec,
- offset_const);
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
- if (do_average) {
- dd0 = vld1_u16(dst_ptr);
+ dd0 = vld1_u16(dst_ptr);
- compute_avg_4x1(dd0, d0, fwd_offset, bck_offset, sub_const_vec,
- round_bits, use_dist_wtd_comp_avg, &d01_u8);
+ compute_dist_wtd_avg_4x1(dd0, d0, fwd_offset, bck_offset,
+ vget_low_s16(round_offset_vec), &d01_u8);
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst8_ptr += dst8_stride;
+ store_u8_4x1(dst8_ptr, d01_u8, 0);
+ dst8_ptr += dst8_stride;
- } else {
- vst1_u16(dst_ptr, d0);
- }
s0 = s1;
s1 = s2;
s2 = s3;
s3 = s4;
s4 = s5;
-
src_ptr += src_stride;
dst_ptr += dst_stride;
h--;
-#endif
- } while (h > 0);
-
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
} else {
int16x8_t s0, s1, s2, s3, s4, s5;
uint16x8_t dd0, d0;
uint8x8_t d0_u8;
-
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t s6, s7, s8;
uint16x8_t dd1, dd2, dd3, d1, d2, d3;
uint8x8_t d1_u8, d2_u8, d3_u8;
-#endif
+#endif // AOM_ARCH_AARCH64
do {
int16_t *s = src_ptr;
- uint16_t *d = dst_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
uint8_t *d_u8 = dst8_ptr;
int height = h;
- s0 = vld1q_s16(s + 0 * src_stride);
- s1 = vld1q_s16(s + 1 * src_stride);
- s2 = vld1q_s16(s + 2 * src_stride);
- s3 = vld1q_s16(s + 3 * src_stride);
- s4 = vld1q_s16(s + 4 * src_stride);
+ load_s16_8x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
s += 5 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_8x4(s, src_stride, &s5, &s6, &s7, &s8);
- d0 = convolve6_8_s32(s0, s1, s2, s3, s4, s5, y_filter, round_shift_vec,
- offset_const);
- d1 = convolve6_8_s32(s1, s2, s3, s4, s5, s6, y_filter, round_shift_vec,
- offset_const);
- d2 = convolve6_8_s32(s2, s3, s4, s5, s6, s7, y_filter, round_shift_vec,
- offset_const);
- d3 = convolve6_8_s32(s3, s4, s5, s6, s7, s8, y_filter, round_shift_vec,
- offset_const);
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_8_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_8_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_8_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
- if (do_average) {
- load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
- bck_offset, sub_const_vec, round_bits,
- use_dist_wtd_comp_avg, &d0_u8, &d1_u8, &d2_u8,
- &d3_u8);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- vst1_u8(d_u8, d0_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d1_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d2_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d3_u8);
- d_u8 += dst8_stride;
- } else {
- store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
- }
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
s0 = s4;
s1 = s5;
s2 = s6;
s3 = s7;
s4 = s8;
-
s += 4 * src_stride;
d += 4 * dst_stride;
height -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
s5 = vld1q_s16(s);
- d0 = convolve6_8_s32(s0, s1, s2, s3, s4, s5, y_filter, round_shift_vec,
- offset_const);
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
- if (do_average) {
- dd0 = vld1q_u16(d);
+ dd0 = vld1q_u16(d);
- compute_avg_8x1(dd0, d0, fwd_offset, bck_offset, sub_const_vec,
- round_bits, use_dist_wtd_comp_avg, &d0_u8);
+ compute_dist_wtd_avg_8x1(dd0, d0, fwd_offset, bck_offset,
+ round_offset_vec, &d0_u8);
- vst1_u8(d_u8, d0_u8);
- d_u8 += dst8_stride;
-
- } else {
- vst1q_u16(d, d0);
- }
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
s0 = s1;
s1 = s2;
s2 = s3;
s3 = s4;
s4 = s5;
-
s += src_stride;
d += dst_stride;
height--;
-#endif
- } while (height > 0);
-
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
src_ptr += 8;
dst_ptr += 8;
dst8_ptr += 8;
w -= 8;
- } while (w > 0);
+ } while (w != 0);
}
}
-static INLINE void dist_wtd_convolve_2d_vert_8tap_neon(
+static INLINE void dist_wtd_convolve_2d_vert_6tap_avg_neon(
int16_t *src_ptr, const int src_stride, uint8_t *dst8_ptr, int dst8_stride,
ConvolveParams *conv_params, const int16x8_t y_filter, int h, int w) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
CONV_BUF_TYPE *dst_ptr = conv_params->dst;
const int dst_stride = conv_params->dst_stride;
- const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int16_t sub_const = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
+ if (w == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5;
+ uint16x4_t dd0, d0;
+ uint8x8_t d01_u8;
+#if AOM_ARCH_AARCH64
+ int16x4_t s6, s7, s8;
+ uint16x4_t dd1, dd2, dd3, d1, d2, d3;
+ uint8x8_t d23_u8;
+#endif // AOM_ARCH_AARCH64
- const int16_t round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
- const int offset = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int32x4_t round_shift_vec = vdupq_n_s32(-(conv_params->round_1));
- const int32x4_t offset_const = vdupq_n_s32(1 << offset);
- const int16x4_t sub_const_vec = vdup_n_s16(sub_const);
+ load_s16_4x5(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4);
+ src_ptr += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_4x4(src_ptr, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_4_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_4_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_4_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
+
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01_u8, &d23_u8);
+
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+ dst8_ptr += 4 * dst8_stride;
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ s5 = vld1_s16(src_ptr);
+
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+
+ dd0 = vld1_u16(dst_ptr);
+
+ compute_basic_avg_4x1(dd0, d0, vget_low_s16(round_offset_vec), &d01_u8);
+
+ store_u8_4x1(dst8_ptr, d01_u8, 0);
+ dst8_ptr += dst8_stride;
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5;
+ uint16x8_t dd0, d0;
+ uint8x8_t d0_u8;
+#if AOM_ARCH_AARCH64
+ int16x8_t s6, s7, s8;
+ uint16x8_t dd1, dd2, dd3, d1, d2, d3;
+ uint8x8_t d1_u8, d2_u8, d3_u8;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int16_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ load_s16_8x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_8x4(s, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_8_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_8_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_8_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ s5 = vld1q_s16(s);
+
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+
+ dd0 = vld1q_u16(d);
+
+ compute_basic_avg_8x1(dd0, d0, round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ dst8_ptr += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_vert_6tap_neon(
+ int16_t *src_ptr, const int src_stride, ConvolveParams *conv_params,
+ const int16x8_t y_filter, int h, int w) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+
+ if (w == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5;
+ uint16x4_t d0;
+#if AOM_ARCH_AARCH64
+ int16x4_t s6, s7, s8;
+ uint16x4_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ load_s16_4x5(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4);
+ src_ptr += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_4x4(src_ptr, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_4_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_4_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_4_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
+
+ store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ s5 = vld1_s16(src_ptr);
+
+ d0 = convolve6_4_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+
+ vst1_u16(dst_ptr, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5;
+ uint16x8_t d0;
+#if AOM_ARCH_AARCH64
+ int16x8_t s6, s7, s8;
+ uint16x8_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int16_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ load_s16_8x5(s, src_stride, &s0, &s1, &s2, &s3, &s4);
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_8x4(s, src_stride, &s5, &s6, &s7, &s8);
+
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+ d1 = convolve6_8_2d_v(s1, s2, s3, s4, s5, s6, y_filter, offset_const);
+ d2 = convolve6_8_2d_v(s2, s3, s4, s5, s6, s7, y_filter, offset_const);
+ d3 = convolve6_8_2d_v(s3, s4, s5, s6, s7, s8, y_filter, offset_const);
+
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ s5 = vld1q_s16(s);
+
+ d0 = convolve6_8_2d_v(s0, s1, s2, s3, s4, s5, y_filter, offset_const);
+
+ vst1q_u16(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+static INLINE uint16x4_t
+convolve8_4_2d_v(const int16x4_t s0, const int16x4_t s1, const int16x4_t s2,
+ const int16x4_t s3, const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t y_filter, const int32x4_t offset_const) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ int32x4_t sum = offset_const;
+ sum = vmlal_lane_s16(sum, s0, y_filter_0_3, 0);
+ sum = vmlal_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmlal_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmlal_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmlal_lane_s16(sum, s4, y_filter_4_7, 0);
+ sum = vmlal_lane_s16(sum, s5, y_filter_4_7, 1);
+ sum = vmlal_lane_s16(sum, s6, y_filter_4_7, 2);
+ sum = vmlal_lane_s16(sum, s7, y_filter_4_7, 3);
+
+ return vqrshrun_n_s32(sum, COMPOUND_ROUND1_BITS);
+}
+
+static INLINE uint16x8_t
+convolve8_8_2d_v(const int16x8_t s0, const int16x8_t s1, const int16x8_t s2,
+ const int16x8_t s3, const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t y_filter, const int32x4_t offset_const) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ int32x4_t sum0 = offset_const;
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s0), y_filter_0_3, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s1), y_filter_0_3, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s2), y_filter_0_3, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s3), y_filter_0_3, 3);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s4), y_filter_4_7, 0);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s5), y_filter_4_7, 1);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s6), y_filter_4_7, 2);
+ sum0 = vmlal_lane_s16(sum0, vget_low_s16(s7), y_filter_4_7, 3);
+
+ int32x4_t sum1 = offset_const;
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s0), y_filter_0_3, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s1), y_filter_0_3, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s2), y_filter_0_3, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s3), y_filter_0_3, 3);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s4), y_filter_4_7, 0);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s5), y_filter_4_7, 1);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s6), y_filter_4_7, 2);
+ sum1 = vmlal_lane_s16(sum1, vget_high_s16(s7), y_filter_4_7, 3);
+
+ return vcombine_u16(vqrshrun_n_s32(sum0, COMPOUND_ROUND1_BITS),
+ vqrshrun_n_s32(sum1, COMPOUND_ROUND1_BITS));
+}
+
+static INLINE void dist_wtd_convolve_2d_vert_8tap_dist_wtd_avg_neon(
+ int16_t *src_ptr, const int src_stride, uint8_t *dst8_ptr, int dst8_stride,
+ ConvolveParams *conv_params, const int16x8_t y_filter, int h, int w) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
const uint16_t fwd_offset = conv_params->fwd_offset;
const uint16_t bck_offset = conv_params->bck_offset;
- const int do_average = conv_params->do_average;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
if (w == 4) {
int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
uint16x4_t dd0, d0;
uint8x8_t d01_u8;
-
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x4_t s8, s9, s10;
uint16x4_t dd1, dd2, dd3, d1, d2, d3;
uint8x8_t d23_u8;
-#endif
+#endif // AOM_ARCH_AARCH64
- load_s16_4x8(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+ load_s16_4x7(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
src_ptr += 7 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_4x4(src_ptr, src_stride, &s7, &s8, &s9, &s10);
- d0 = convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const);
- d1 = convolve8_4_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const);
- d2 = convolve8_4_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
- round_shift_vec, offset_const);
- d3 = convolve8_4_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
- round_shift_vec, offset_const);
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_4_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_4_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_4_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
- if (do_average) {
- load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
- bck_offset, sub_const_vec, round_bits,
- use_dist_wtd_comp_avg, &d01_u8, &d23_u8);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01_u8, &d23_u8);
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 1);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d23_u8), 0);
- dst8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d23_u8), 1);
- dst8_ptr += dst8_stride;
- } else {
- store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
- }
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+ dst8_ptr += 4 * dst8_stride;
s0 = s4;
s1 = s5;
@@ -1007,28 +1318,23 @@
s4 = s8;
s5 = s9;
s6 = s10;
-
src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
h -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
s7 = vld1_s16(src_ptr);
- d0 = convolve8_4_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const);
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
- if (do_average) {
- dd0 = vld1_u16(dst_ptr);
+ dd0 = vld1_u16(dst_ptr);
- compute_avg_4x1(dd0, d0, fwd_offset, bck_offset, sub_const_vec,
- round_bits, use_dist_wtd_comp_avg, &d01_u8);
+ compute_dist_wtd_avg_4x1(dd0, d0, fwd_offset, bck_offset,
+ vget_low_s16(round_offset_vec), &d01_u8);
- vst1_lane_u32((uint32_t *)dst8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst8_ptr += dst8_stride;
+ store_u8_4x1(dst8_ptr, d01_u8, 0);
+ dst8_ptr += dst8_stride;
- } else {
- vst1_u16(dst_ptr, d0);
- }
s0 = s1;
s1 = s2;
s2 = s3;
@@ -1036,65 +1342,51 @@
s4 = s5;
s5 = s6;
s6 = s7;
-
src_ptr += src_stride;
dst_ptr += dst_stride;
h--;
-#endif
- } while (h > 0);
-
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
} else {
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
uint16x8_t dd0, d0;
uint8x8_t d0_u8;
-
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int16x8_t s8, s9, s10;
uint16x8_t dd1, dd2, dd3, d1, d2, d3;
uint8x8_t d1_u8, d2_u8, d3_u8;
-#endif
+#endif // AOM_ARCH_AARCH64
do {
int16_t *s = src_ptr;
- uint16_t *d = dst_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
uint8_t *d_u8 = dst8_ptr;
int height = h;
- load_s16_8x8(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6, &s7);
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
s += 7 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
- d0 = convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const);
- d1 = convolve8_8_s32(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
- round_shift_vec, offset_const);
- d2 = convolve8_8_s32(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
- round_shift_vec, offset_const);
- d3 = convolve8_8_s32(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
- round_shift_vec, offset_const);
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_8_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_8_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_8_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
- if (do_average) {
- load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
- bck_offset, sub_const_vec, round_bits,
- use_dist_wtd_comp_avg, &d0_u8, &d1_u8, &d2_u8,
- &d3_u8);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- vst1_u8(d_u8, d0_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d1_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d2_u8);
- d_u8 += dst8_stride;
- vst1_u8(d_u8, d3_u8);
- d_u8 += dst8_stride;
- } else {
- store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
- }
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
s0 = s4;
s1 = s5;
@@ -1103,28 +1395,22 @@
s4 = s8;
s5 = s9;
s6 = s10;
-
s += 4 * src_stride;
d += 4 * dst_stride;
height -= 4;
-#else
+#else // !AOM_ARCH_AARCH64
s7 = vld1q_s16(s);
- d0 = convolve8_8_s32(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
- round_shift_vec, offset_const);
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
- if (do_average) {
- dd0 = vld1q_u16(d);
+ dd0 = vld1q_u16(d);
- compute_avg_8x1(dd0, d0, fwd_offset, bck_offset, sub_const_vec,
- round_bits, use_dist_wtd_comp_avg, &d0_u8);
+ compute_dist_wtd_avg_8x1(dd0, d0, fwd_offset, bck_offset,
+ round_offset_vec, &d0_u8);
- vst1_u8(d_u8, d0_u8);
- d_u8 += dst8_stride;
-
- } else {
- vst1q_u16(d, d0);
- }
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
s0 = s1;
s1 = s2;
@@ -1133,18 +1419,318 @@
s4 = s5;
s5 = s6;
s6 = s7;
-
s += src_stride;
d += dst_stride;
height--;
-#endif
- } while (height > 0);
-
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
src_ptr += 8;
dst_ptr += 8;
dst8_ptr += 8;
w -= 8;
- } while (w > 0);
+ } while (w != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_vert_8tap_avg_neon(
+ int16_t *src_ptr, const int src_stride, uint8_t *dst8_ptr, int dst8_stride,
+ ConvolveParams *conv_params, const int16x8_t y_filter, int h, int w) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+
+ if (w == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t dd0, d0;
+ uint8x8_t d01_u8;
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10;
+ uint16x4_t dd1, dd2, dd3, d1, d2, d3;
+ uint8x8_t d23_u8;
+#endif // AOM_ARCH_AARCH64
+
+ load_s16_4x7(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ src_ptr += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_4x4(src_ptr, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_4_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_4_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_4_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
+
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01_u8, &d23_u8);
+
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+ dst8_ptr += 4 * dst8_stride;
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1_s16(src_ptr);
+
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+
+ dd0 = vld1_u16(dst_ptr);
+
+ compute_basic_avg_4x1(dd0, d0, vget_low_s16(round_offset_vec), &d01_u8);
+
+ store_u8_4x1(dst8_ptr, d01_u8, 0);
+ dst8_ptr += dst8_stride;
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t dd0, d0;
+ uint8x8_t d0_u8;
+#if AOM_ARCH_AARCH64
+ int16x8_t s8, s9, s10;
+ uint16x8_t dd1, dd2, dd3, d1, d2, d3;
+ uint8x8_t d1_u8, d2_u8, d3_u8;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int16_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_8_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_8_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_8_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1q_s16(s);
+
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+
+ dd0 = vld1q_u16(d);
+
+ compute_basic_avg_8x1(dd0, d0, round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ dst8_ptr += 8;
+ w -= 8;
+ } while (w != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_vert_8tap_neon(
+ int16_t *src_ptr, const int src_stride, ConvolveParams *conv_params,
+ const int16x8_t y_filter, int h, int w) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int32x4_t offset_const = vdupq_n_s32(1 << offset_bits);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+
+ if (w == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t d0;
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10;
+ uint16x4_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ load_s16_4x7(src_ptr, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ src_ptr += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_4x4(src_ptr, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_4_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_4_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_4_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
+
+ store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ h -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1_s16(src_ptr);
+
+ d0 = convolve8_4_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+
+ vst1_u16(dst_ptr, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ h--;
+#endif // AOM_ARCH_AARCH64
+ } while (h != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0;
+#if AOM_ARCH_AARCH64
+ int16x8_t s8, s9, s10;
+ uint16x8_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ int16_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
+
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+ d1 = convolve8_8_2d_v(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ offset_const);
+ d2 = convolve8_8_2d_v(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ offset_const);
+ d3 = convolve8_8_2d_v(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ offset_const);
+
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ s7 = vld1q_s16(s);
+
+ d0 = convolve8_8_2d_v(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ offset_const);
+
+ vst1q_u16(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ w -= 8;
+ } while (w != 0);
}
}
@@ -1154,8 +1740,8 @@
const InterpFilterParams *filter_params_y,
const int subpel_x_qn, const int subpel_y_qn,
ConvolveParams *conv_params) {
- assert(!(w % 4));
- assert(!(h % 4));
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
DECLARE_ALIGNED(16, int16_t,
im_block[(MAX_SB_SIZE + HORIZ_EXTRA_ROWS) * MAX_SB_SIZE]);
@@ -1167,7 +1753,6 @@
const int im_stride = MAX_SB_SIZE;
const int vert_offset = filter_params_y->taps / 2 - 1;
const int horiz_offset = filter_params_x->taps / 2 - 1;
- const int round_0 = conv_params->round_0 - 1;
const uint8_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_x, subpel_x_qn & SUBPEL_MASK);
@@ -1179,162 +1764,367 @@
const int16x8_t x_filter = vshrq_n_s16(vld1q_s16(x_filter_ptr), 1);
const int16x8_t y_filter = vld1q_s16(y_filter_ptr);
- dist_wtd_convolve_2d_horiz_neon(src_ptr, src_stride, im_block, im_stride,
- x_filter, im_h, w, round_0);
+ dist_wtd_convolve_2d_horiz_8tap_neon(src_ptr, src_stride, im_block, im_stride,
+ x_filter, im_h, w);
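+ // For each tap count, dispatch to one of three specialized vertical paths
+ // so that the do_average / use_dist_wtd_comp_avg checks are hoisted out of
+ // the inner pixel loops.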
if (clamped_y_taps == 6) {
- dist_wtd_convolve_2d_vert_6tap_neon(im_block + im_stride, im_stride, dst8,
- dst8_stride, conv_params, y_filter, h,
- w);
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_2d_vert_6tap_dist_wtd_avg_neon(
+ im_block + im_stride, im_stride, dst8, dst8_stride, conv_params,
+ y_filter, h, w);
+ } else {
+ dist_wtd_convolve_2d_vert_6tap_avg_neon(im_block + im_stride, im_stride,
+ dst8, dst8_stride, conv_params,
+ y_filter, h, w);
+ }
+ } else {
+ dist_wtd_convolve_2d_vert_6tap_neon(im_block + im_stride, im_stride,
+ conv_params, y_filter, h, w);
+ }
} else {
- dist_wtd_convolve_2d_vert_8tap_neon(im_block, im_stride, dst8, dst8_stride,
- conv_params, y_filter, h, w);
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_2d_vert_8tap_dist_wtd_avg_neon(
+ im_block, im_stride, dst8, dst8_stride, conv_params, y_filter, h,
+ w);
+ } else {
+ dist_wtd_convolve_2d_vert_8tap_avg_neon(im_block, im_stride, dst8,
+ dst8_stride, conv_params,
+ y_filter, h, w);
+ }
+ } else {
+ dist_wtd_convolve_2d_vert_8tap_neon(im_block, im_stride, conv_params,
+ y_filter, h, w);
+ }
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_copy_dist_wtd_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const uint16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const uint16x8_t round_offset_vec = vdupq_n_u16(round_offset);
+ const uint8x8_t shift_by_bits = vdup_n_u8(1 << (FILTER_BITS - ROUND0_BITS));
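+ // vmlal_u8 below forms round_offset + (src << (FILTER_BITS - ROUND0_BITS))
+ // in a single widening multiply-accumulate.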
+
+ const uint16_t fwd_offset = conv_params->fwd_offset;
+ const uint16_t bck_offset = conv_params->bck_offset;
+
+ CONV_BUF_TYPE *dst = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ uint8x8_t s0, s1, s2, s3, d01, d23;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+
+ do {
+ load_u8_8x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vget_low_u16(vmlal_u8(round_offset_vec, s0, shift_by_bits));
+ d1 = vget_low_u16(vmlal_u8(round_offset_vec, s1, shift_by_bits));
+ d2 = vget_low_u16(vmlal_u8(round_offset_vec, s2, shift_by_bits));
+ d3 = vget_low_u16(vmlal_u8(round_offset_vec, s3, shift_by_bits));
+
+ load_u16_4x4(dst, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_dist_wtd_avg_4x4(
+ dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset, bck_offset,
+ vreinterpretq_s16_u16(round_offset_vec), &d01, &d23);
+
+ store_u8_4x1(dst8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(dst8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(dst8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(dst8 + 3 * dst8_stride, d23, 1);
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ dst8 += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ uint8x8_t s0, s1, s2, s3, d0_u8, d1_u8, d2_u8, d3_u8;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+
+ do {
+ const uint8_t *s = src;
+ CONV_BUF_TYPE *d = dst;
+ uint8_t *d_u8 = dst8;
+ int width = w;
+
+ do {
+ load_u8_8x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vmlal_u8(round_offset_vec, s0, shift_by_bits);
+ d1 = vmlal_u8(round_offset_vec, s1, shift_by_bits);
+ d2 = vmlal_u8(round_offset_vec, s2, shift_by_bits);
+ d3 = vmlal_u8(round_offset_vec, s3, shift_by_bits);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset,
+ vreinterpretq_s16_u16(round_offset_vec),
+ &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ dst8 += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_copy_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const uint16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const uint16x8_t round_offset_vec = vdupq_n_u16(round_offset);
+ const uint8x8_t shift_by_bits = vdup_n_u8(1 << (FILTER_BITS - ROUND0_BITS));
+
+ CONV_BUF_TYPE *dst = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ uint8x8_t s0, s1, s2, s3, d01, d23;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+
+ do {
+ load_u8_8x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vget_low_u16(vmlal_u8(round_offset_vec, s0, shift_by_bits));
+ d1 = vget_low_u16(vmlal_u8(round_offset_vec, s1, shift_by_bits));
+ d2 = vget_low_u16(vmlal_u8(round_offset_vec, s2, shift_by_bits));
+ d3 = vget_low_u16(vmlal_u8(round_offset_vec, s3, shift_by_bits));
+
+ load_u16_4x4(dst, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ vreinterpretq_s16_u16(round_offset_vec), &d01,
+ &d23);
+
+ store_u8_4x1(dst8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(dst8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(dst8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(dst8 + 3 * dst8_stride, d23, 1);
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ dst8 += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ uint8x8_t s0, s1, s2, s3, d0_u8, d1_u8, d2_u8, d3_u8;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+
+ do {
+ const uint8_t *s = src;
+ CONV_BUF_TYPE *d = dst;
+ uint8_t *d_u8 = dst8;
+ int width = w;
+
+ do {
+ load_u8_8x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vmlal_u8(round_offset_vec, s0, shift_by_bits);
+ d1 = vmlal_u8(round_offset_vec, s1, shift_by_bits);
+ d2 = vmlal_u8(round_offset_vec, s2, shift_by_bits);
+ d3 = vmlal_u8(round_offset_vec, s3, shift_by_bits);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ vreinterpretq_s16_u16(round_offset_vec), &d0_u8,
+ &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ dst8 += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_2d_copy_neon(const uint8_t *src,
+ int src_stride, int w, int h,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const uint16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const uint16x8_t round_offset_vec = vdupq_n_u16(round_offset);
+ const uint8x8_t shift_by_bits = vdup_n_u8(1 << (FILTER_BITS - ROUND0_BITS));
+
+ CONV_BUF_TYPE *dst = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ uint8x8_t s0, s1, s2, s3;
+ uint16x4_t d0, d1, d2, d3;
+
+ do {
+ load_u8_8x4(src, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vget_low_u16(vmlal_u8(round_offset_vec, s0, shift_by_bits));
+ d1 = vget_low_u16(vmlal_u8(round_offset_vec, s1, shift_by_bits));
+ d2 = vget_low_u16(vmlal_u8(round_offset_vec, s2, shift_by_bits));
+ d3 = vget_low_u16(vmlal_u8(round_offset_vec, s3, shift_by_bits));
+
+ store_u16_4x4(dst, dst_stride, d0, d1, d2, d3);
+
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ uint8x8_t s0, s1, s2, s3;
+ uint16x8_t d0, d1, d2, d3;
+
+ do {
+ const uint8_t *s = src;
+ CONV_BUF_TYPE *d = dst;
+ int width = w;
+
+ do {
+ load_u8_8x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = vmlal_u8(round_offset_vec, s0, shift_by_bits);
+ d1 = vmlal_u8(round_offset_vec, s1, shift_by_bits);
+ d2 = vmlal_u8(round_offset_vec, s2, shift_by_bits);
+ d3 = vmlal_u8(round_offset_vec, s3, shift_by_bits);
+
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src += 4 * src_stride;
+ dst += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
}
}
void av1_dist_wtd_convolve_2d_copy_neon(const uint8_t *src, int src_stride,
uint8_t *dst8, int dst8_stride, int w,
int h, ConvolveParams *conv_params) {
- uint8x8_t res0_8, res1_8, res2_8, res3_8, tmp_shift0, tmp_shift1, tmp_shift2,
- tmp_shift3;
- uint16x8_t res_q0, res_q1, res_q2, res_q3, tmp_q0, tmp_q1, tmp_q2, tmp_q3;
- uint16x4_t tmp4, tmp5, tmp6, tmp7, res4, res5, res6, res7;
- const uint8_t *src1, *src2;
- uint8_t *dst8_1;
- CONV_BUF_TYPE *dst = conv_params->dst, *dst_1, *dst_2;
- const int dst_stride = conv_params->dst_stride;
- int x, y;
- const int16_t bits =
- FILTER_BITS * 2 - conv_params->round_1 - conv_params->round_0;
- const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int16x4_t sub_const_vec = vdup_n_s16((int16_t)round_offset);
- const uint16x8_t dup_round_offset16x8 = vdupq_n_u16((uint16_t)round_offset);
- const int16x4_t dup_bits16x4 = vdup_n_s16(bits);
- const int16x8_t dup_bits16x8 = vdupq_n_s16(bits);
-
- if (!(w & 0x07)) {
- for (y = 0; y < (h >> 2); ++y) {
- src1 = src;
- dst8_1 = dst8;
- dst_1 = dst;
- for (x = 0; x < (w >> 3); ++x) {
- src2 = src1;
- load_u8_8x4(src2, src_stride, &res0_8, &res1_8, &res2_8, &res3_8);
-
- res_q0 = vaddq_u16(vshlq_u16(vmovl_u8(res0_8), dup_bits16x8),
- dup_round_offset16x8);
- res_q1 = vaddq_u16(vshlq_u16(vmovl_u8(res1_8), dup_bits16x8),
- dup_round_offset16x8);
- res_q2 = vaddq_u16(vshlq_u16(vmovl_u8(res2_8), dup_bits16x8),
- dup_round_offset16x8);
- res_q3 = vaddq_u16(vshlq_u16(vmovl_u8(res3_8), dup_bits16x8),
- dup_round_offset16x8);
-
- if (conv_params->do_average) {
- dst_2 = dst_1;
- load_u16_8x4(dst_2, dst_stride, &tmp_q0, &tmp_q1, &tmp_q2, &tmp_q3);
-
- compute_avg_8x4(tmp_q0, tmp_q1, tmp_q2, tmp_q3, res_q0, res_q1,
- res_q2, res_q3, conv_params->fwd_offset,
- conv_params->bck_offset, sub_const_vec, bits,
- conv_params->use_dist_wtd_comp_avg, &tmp_shift0,
- &tmp_shift1, &tmp_shift2, &tmp_shift3);
-
- vst1_u8(dst8_1 + (0 * dst8_stride), tmp_shift0);
- vst1_u8(dst8_1 + (1 * dst8_stride), tmp_shift1);
- vst1_u8(dst8_1 + (2 * dst8_stride), tmp_shift2);
- vst1_u8(dst8_1 + (3 * dst8_stride), tmp_shift3);
-
- } else {
- vst1q_u16(dst_1 + (0 * dst_stride), res_q0);
- vst1q_u16(dst_1 + (1 * dst_stride), res_q1);
- vst1q_u16(dst_1 + (2 * dst_stride), res_q2);
- vst1q_u16(dst_1 + (3 * dst_stride), res_q3);
- }
- src1 = src1 + 8;
- dst_1 = dst_1 + 8;
- dst8_1 = dst8_1 + 8;
- }
- src += src_stride * 4;
- dst8 += dst8_stride * 4;
- dst += dst_stride * 4;
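+  // Dispatch once per call on do_average / use_dist_wtd_comp_avg instead of
+  // testing these flags inside the pixel loops as the deleted code did.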
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_2d_copy_dist_wtd_avg_neon(
+ src, src_stride, dst8, dst8_stride, w, h, conv_params);
+ } else {
+ dist_wtd_convolve_2d_copy_avg_neon(src, src_stride, dst8, dst8_stride, w,
+ h, conv_params);
}
- } else if (!(w & 0x03)) {
- for (y = 0; y < (h >> 2); ++y) {
- src1 = src;
- dst8_1 = dst8;
- dst_1 = dst;
-
- load_u8_8x4(src1, src_stride, &res0_8, &res1_8, &res2_8, &res3_8);
-
- res4 = vadd_u16(vshl_u16(vget_low_u16(vmovl_u8(res0_8)), dup_bits16x4),
- vreinterpret_u16_s16(sub_const_vec));
- res5 = vadd_u16(vshl_u16(vget_low_u16(vmovl_u8(res1_8)), dup_bits16x4),
- vreinterpret_u16_s16(sub_const_vec));
- res6 = vadd_u16(vshl_u16(vget_low_u16(vmovl_u8(res2_8)), dup_bits16x4),
- vreinterpret_u16_s16(sub_const_vec));
- res7 = vadd_u16(vshl_u16(vget_low_u16(vmovl_u8(res3_8)), dup_bits16x4),
- vreinterpret_u16_s16(sub_const_vec));
- if (conv_params->do_average) {
- load_u16_4x4(dst_1, dst_stride, &tmp4, &tmp5, &tmp6, &tmp7);
-
- compute_avg_4x4(tmp4, tmp5, tmp6, tmp7, res4, res5, res6, res7,
- conv_params->fwd_offset, conv_params->bck_offset,
- sub_const_vec, bits, conv_params->use_dist_wtd_comp_avg,
- &tmp_shift0, &tmp_shift1);
-
- vst1_lane_u32((uint32_t *)(dst8_1), vreinterpret_u32_u8(tmp_shift0), 0);
- dst8_1 += dst8_stride;
- vst1_lane_u32((uint32_t *)(dst8_1), vreinterpret_u32_u8(tmp_shift0), 1);
- dst8_1 += dst8_stride;
- vst1_lane_u32((uint32_t *)(dst8_1), vreinterpret_u32_u8(tmp_shift1), 0);
- dst8_1 += dst8_stride;
- vst1_lane_u32((uint32_t *)(dst8_1), vreinterpret_u32_u8(tmp_shift1), 1);
-
- } else {
- vst1_u16(dst_1, res4);
- dst_1 += dst_stride;
- vst1_u16(dst_1, res5);
- dst_1 += dst_stride;
- vst1_u16(dst_1, res6);
- dst_1 += dst_stride;
- vst1_u16(dst_1, res7);
- }
- src += src_stride * 4;
- dst += dst_stride * 4;
- dst8 += dst8_stride * 4;
- }
+ } else {
+ dist_wtd_convolve_2d_copy_neon(src, src_stride, w, h, conv_params);
}
}
-#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_MATMUL_INT8)
-void av1_dist_wtd_convolve_x_neon(const uint8_t *src, int src_stride,
- uint8_t *dst8, int dst8_stride, int w, int h,
- const InterpFilterParams *filter_params_x,
- const int subpel_x_qn,
- ConvolveParams *conv_params) {
- assert(!(w % 4));
- assert(!(h % 4));
+static INLINE uint16x4_t convolve8_4_x(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const uint8x16x2_t permute_tbl,
+ const int32x4_t round_offset) {
+ uint8x16_t permuted_samples[2];
+ int32x4_t sum;
- const int horiz_offset = filter_params_x->taps / 2 - 1;
- const int bits = FILTER_BITS - conv_params->round_1;
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+
+ // First 4 output values.
+ sum = vusdotq_lane_s32(round_offset, permuted_samples[0], x_filter, 0);
+ sum = vusdotq_lane_s32(sum, permuted_samples[1], x_filter, 1);
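+  // Each vusdotq_lane_s32 accumulates four filter taps per 32-bit output
+  // lane, so the two calls above apply the full 8-tap filter to all four
+  // output pixels.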
+
+ // We halved the convolution filter values so -1 from the right shift.
+ return vreinterpret_u16_s16(vshrn_n_s32(sum, ROUND0_BITS - 1));
+}
+
+static INLINE uint16x8_t convolve8_8_x(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const uint8x16x3_t permute_tbl,
+ const int32x4_t round_offset) {
+ uint8x16_t permuted_samples[3];
+ int32x4_t sum[2];
+
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_u8(samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_u8(samples, permute_tbl.val[1]);
+ // { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 }
+ permuted_samples[2] = vqtbl1q_u8(samples, permute_tbl.val[2]);
+
+ // First 4 output values.
+ sum[0] = vusdotq_lane_s32(round_offset, permuted_samples[0], x_filter, 0);
+ sum[0] = vusdotq_lane_s32(sum[0], permuted_samples[1], x_filter, 1);
+ // Second 4 output values.
+ sum[1] = vusdotq_lane_s32(round_offset, permuted_samples[1], x_filter, 0);
+ sum[1] = vusdotq_lane_s32(sum[1], permuted_samples[2], x_filter, 1);
+
+ // Narrow and re-pack.
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x8_t res = vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
+ return vreinterpretq_u16_s16(res);
+}
+
+static INLINE void dist_wtd_convolve_x_dist_wtd_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
+ const int32x4_t round_offset_shim = vdupq_n_s32(
+ (round_offset << (ROUND0_BITS - 1)) + (1 << ((ROUND0_BITS - 1) - 1)));
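+  // Hedged worked example, assuming ROUND0_BITS = 3 and the round_offset of
+  // 6144 derived above: the shim is (6144 << 2) + (1 << 1) = 24578. Adding
+  // it before the truncating vshrn_n_s32(sum, 2) in convolve8_4_x matches a
+  // rounding shift followed by a separate round_offset addition.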
+
const uint16_t fwd_offset = conv_params->fwd_offset;
const uint16_t bck_offset = conv_params->bck_offset;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
- const int16x4_t round_offset64 = vdup_n_s16(round_offset);
- const int16x8_t round_offset128 = vdupq_n_s16(round_offset);
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t horiz_const = vdupq_n_s16(bits);
// Horizontal filter.
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
@@ -1343,12 +2133,11 @@
// requirements.
const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
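+  // Because the taps are even, halving them keeps the kernel exact while
+  // freeing one bit of intermediate headroom; the final right shifts are
+  // reduced by one to compensate (the "-1" comments in convolve8_4_x and
+  // convolve8_8_x above).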
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
const uint8_t *src_ptr = src - horiz_offset;
- CONV_BUF_TYPE *dst = conv_params->dst;
- CONV_BUF_TYPE *dst_ptr = dst;
- uint8_t *dst_u8_ptr = dst8;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
int dst_stride = conv_params->dst_stride;
- int width = w;
int height = h;
if (w == 4) {
@@ -1356,167 +2145,90 @@
do {
uint8x16_t s0, s1, s2, s3;
- int32x4_t d0, d1, d2, d3;
- int16x8_t d01, d23;
- uint16x4_t dd0, dd1, dd2, dd3;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
uint8x8_t d01_u8, d23_u8;
- s0 = vld1q_u8(src_ptr + 0 * src_stride);
- s1 = vld1q_u8(src_ptr + 1 * src_stride);
- s2 = vld1q_u8(src_ptr + 2 * src_stride);
- s3 = vld1q_u8(src_ptr + 3 * src_stride);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_4_usdot(s0, x_filter, permute_tbl, vdupq_n_s32(0));
- d1 = convolve8_4_usdot(s1, x_filter, permute_tbl, vdupq_n_s32(0));
- d2 = convolve8_4_usdot(s2, x_filter, permute_tbl, vdupq_n_s32(0));
- d3 = convolve8_4_usdot(s3, x_filter, permute_tbl, vdupq_n_s32(0));
+ d0 = convolve8_4_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_4_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_4_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_4_x(s3, x_filter, permute_tbl, round_offset_shim);
- d01 = vcombine_s16(vmovn_s32(d0), vmovn_s32(d1));
- d23 = vcombine_s16(vmovn_s32(d2), vmovn_s32(d3));
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
- d01 = vqrshlq_s16(d01, shift_round_0);
- d23 = vqrshlq_s16(d23, shift_round_0);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01_u8, &d23_u8);
- d01 = vrshlq_s16(d01, horiz_const);
- d23 = vrshlq_s16(d23, horiz_const);
-
- d01 = vaddq_s16(d01, round_offset128);
- d23 = vaddq_s16(d23, round_offset128);
-
- if (conv_params->do_average) {
- dd0 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd1 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd2 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd3 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
-
- compute_avg_4x4(dd0, dd1, dd2, dd3,
- vreinterpret_u16_s16(vget_low_s16(d01)),
- vreinterpret_u16_s16(vget_high_s16(d01)),
- vreinterpret_u16_s16(vget_low_s16(d23)),
- vreinterpret_u16_s16(vget_high_s16(d23)), fwd_offset,
- bck_offset, round_offset64, round_bits,
- use_dist_wtd_comp_avg, &d01_u8, &d23_u8);
-
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d01_u8), 1);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d23_u8), 0);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d23_u8), 1);
- dst_u8_ptr += dst8_stride;
- } else {
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d01), 0);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d01), 1);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d23), 0);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d23), 1);
- dst_ptr += dst_stride;
- }
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
height -= 4;
- } while (height > 0);
+ } while (height != 0);
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
do {
const uint8_t *s = src_ptr;
CONV_BUF_TYPE *d = dst_ptr;
- uint8_t *d_u8 = dst_u8_ptr;
- width = w;
+ uint8_t *d_u8 = dst8_ptr;
+ int width = w;
do {
uint8x16_t s0, s1, s2, s3;
- int16x8_t d0, d1, d2, d3;
- uint16x8_t dd0, dd1, dd2, dd3;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
uint8x8_t d0_u8, d1_u8, d2_u8, d3_u8;
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_usdot(s0, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- d1 = convolve8_8_usdot(s1, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- d2 = convolve8_8_usdot(s2, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
- d3 = convolve8_8_usdot(s3, x_filter, permute_tbl, vdupq_n_s32(0),
- shift_round_0);
+ d0 = convolve8_8_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_8_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_8_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_8_x(s3, x_filter, permute_tbl, round_offset_shim);
- d0 = vrshlq_s16(d0, horiz_const);
- d1 = vrshlq_s16(d1, horiz_const);
- d2 = vrshlq_s16(d2, horiz_const);
- d3 = vrshlq_s16(d3, horiz_const);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- d0 = vaddq_s16(d0, round_offset128);
- d1 = vaddq_s16(d1, round_offset128);
- d2 = vaddq_s16(d2, round_offset128);
- d3 = vaddq_s16(d3, round_offset128);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- if (conv_params->do_average) {
- load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
-
- compute_avg_8x4(dd0, dd1, dd2, dd3, vreinterpretq_u16_s16(d0),
- vreinterpretq_u16_s16(d1), vreinterpretq_u16_s16(d2),
- vreinterpretq_u16_s16(d3), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &d0_u8, &d1_u8, &d2_u8, &d3_u8);
-
- store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
- } else {
- store_u16_8x4(d, dst_stride, vreinterpretq_u16_s16(d0),
- vreinterpretq_u16_s16(d1), vreinterpretq_u16_s16(d2),
- vreinterpretq_u16_s16(d3));
- }
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
s += 8;
d += 8;
d_u8 += 8;
width -= 8;
- } while (width > 0);
-
+ } while (width != 0);
src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
- dst_u8_ptr += 4 * dst8_stride;
+ dst8_ptr += 4 * dst8_stride;
height -= 4;
- } while (height > 0);
+ } while (height != 0);
}
}
-#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+static INLINE void dist_wtd_convolve_x_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
-void av1_dist_wtd_convolve_x_neon(const uint8_t *src, int src_stride,
- uint8_t *dst8, int dst8_stride, int w, int h,
- const InterpFilterParams *filter_params_x,
- const int subpel_x_qn,
- ConvolveParams *conv_params) {
- assert(!(w % 4));
- assert(!(h % 4));
-
- const int horiz_offset = filter_params_x->taps / 2 - 1;
- const int bits = FILTER_BITS - conv_params->round_1;
const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
- const uint16_t fwd_offset = conv_params->fwd_offset;
- const uint16_t bck_offset = conv_params->bck_offset;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
- const int16x4_t round_offset64 = vdup_n_s16(round_offset);
- const int16x8_t round_offset128 = vdupq_n_s16(round_offset);
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t horiz_const = vdupq_n_s16(bits);
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
+ const int32x4_t round_offset_shim = vdupq_n_s32(
+ (round_offset << (ROUND0_BITS - 1)) + (1 << ((ROUND0_BITS - 1) - 1)));
// Horizontal filter.
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
@@ -1524,17 +2236,267 @@
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
- // Dot-product constants.
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
+ int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+ uint8x8_t d01_u8, d23_u8;
+
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_4_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_4_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_4_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_4_x(s3, x_filter, permute_tbl, round_offset_shim);
+
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01_u8, &d23_u8);
+
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int width = w;
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+ uint8x8_t d0_u8, d1_u8, d2_u8, d3_u8;
+
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_8_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_8_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_8_x(s3, x_filter, permute_tbl, round_offset_shim);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_x_neon(
+ const uint8_t *src, int src_stride, int w, int h,
+ const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ // A shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-rounding
+ // shifts - which are generally faster than rounding shifts on modern CPUs.
+ // (The extra -1 is needed because we halved the filter values.)
+ const int32x4_t round_offset_shim = vdupq_n_s32(
+ (round_offset << (ROUND0_BITS - 1)) + (1 << ((ROUND0_BITS - 1) - 1)));
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x4_t d0, d1, d2, d3;
+
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_4_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_4_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_4_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_4_x(s3, x_filter, permute_tbl, round_offset_shim);
+
+ store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
+
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int width = w;
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x8_t d0, d1, d2, d3;
+
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_x(s0, x_filter, permute_tbl, round_offset_shim);
+ d1 = convolve8_8_x(s1, x_filter, permute_tbl, round_offset_shim);
+ d2 = convolve8_8_x(s2, x_filter, permute_tbl, round_offset_shim);
+ d3 = convolve8_8_x(s3, x_filter, permute_tbl, round_offset_shim);
+
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+#elif AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
+
+static INLINE uint16x4_t convolve8_4_x(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x2_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[2];
+ int32x4_t sum;
+
+ // Clamp sample range to [-128, 127] for 8-bit signed dot product.
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+ // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+
+ // Accumulate dot product into 'correction' to account for range clamp.
+ sum = vdotq_lane_s32(correction, permuted_samples[0], x_filter, 0);
+ sum = vdotq_lane_s32(sum, permuted_samples[1], x_filter, 1);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ return vreinterpret_u16_s16(vshrn_n_s32(sum, ROUND0_BITS - 1));
+}
+
+static INLINE uint16x8_t convolve8_8_x(uint8x16_t samples,
+ const int8x8_t x_filter,
+ const int32x4_t correction,
+ const uint8x16_t range_limit,
+ const uint8x16x3_t permute_tbl) {
+ int8x16_t clamped_samples, permuted_samples[3];
+ int32x4_t sum[2];
+
+ // Clamp sample range to [-128, 127] for 8-bit signed dot product.
+ clamped_samples = vreinterpretq_s8_u8(vsubq_u8(samples, range_limit));
+
+  // Permute samples ready for dot product.
+ // { 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6 }
+ permuted_samples[0] = vqtbl1q_s8(clamped_samples, permute_tbl.val[0]);
+ // { 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10 }
+ permuted_samples[1] = vqtbl1q_s8(clamped_samples, permute_tbl.val[1]);
+ // { 8, 9, 10, 11, 9, 10, 11, 12, 10, 11, 12, 13, 11, 12, 13, 14 }
+ permuted_samples[2] = vqtbl1q_s8(clamped_samples, permute_tbl.val[2]);
+
+ // Accumulate dot product into 'correction' to account for range clamp.
+ // First 4 output values.
+ sum[0] = vdotq_lane_s32(correction, permuted_samples[0], x_filter, 0);
+ sum[0] = vdotq_lane_s32(sum[0], permuted_samples[1], x_filter, 1);
+ // Second 4 output values.
+ sum[1] = vdotq_lane_s32(correction, permuted_samples[1], x_filter, 0);
+ sum[1] = vdotq_lane_s32(sum[1], permuted_samples[2], x_filter, 1);
+
+ // Narrow and re-pack.
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x8_t res = vcombine_s16(vshrn_n_s32(sum[0], ROUND0_BITS - 1),
+ vshrn_n_s32(sum[1], ROUND0_BITS - 1));
+ return vreinterpretq_u16_s16(res);
+}
+
+static INLINE void dist_wtd_convolve_x_dist_wtd_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ const uint16_t fwd_offset = conv_params->fwd_offset;
+ const uint16_t bck_offset = conv_params->bck_offset;
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
+
+ // Dot-product constants and other shims.
const uint8x16_t range_limit = vdupq_n_u8(128);
const int32_t correction_s32 = vaddlvq_s16(vshll_n_s8(x_filter, 7));
- const int32x4_t correction = vdupq_n_s32(correction_s32);
+ // Fold round_offset into the dot-product filter correction constant. The
+ // additional shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-
+ // rounding shifts - which are generally faster than rounding shifts on
+ // modern CPUs. (The extra -1 is needed because we halved the filter values.)
+ int32x4_t correction =
+ vdupq_n_s32(correction_s32 + (round_offset << (ROUND0_BITS - 1)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
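+  // Why the clamp-and-correct scheme works (a sketch): for each output,
+  // dot(s - 128, f) + 128 * sum(f) == dot(s, f), and correction_s32 above
+  // is exactly 128 * sum(f). Folding round_offset and the rounding shim
+  // into the same accumulator makes the whole adjustment free.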
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
const uint8_t *src_ptr = src - horiz_offset;
- CONV_BUF_TYPE *dst = conv_params->dst;
- CONV_BUF_TYPE *dst_ptr = dst;
- uint8_t *dst_u8_ptr = dst8;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
int dst_stride = conv_params->dst_stride;
- int width = w;
int height = h;
if (w == 4) {
@@ -1542,305 +2504,435 @@
do {
uint8x16_t s0, s1, s2, s3;
- int32x4_t d0, d1, d2, d3;
- int16x8_t d01, d23;
- uint16x4_t dd0, dd1, dd2, dd3;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
uint8x8_t d01_u8, d23_u8;
- s0 = vld1q_u8(src_ptr + 0 * src_stride);
- s1 = vld1q_u8(src_ptr + 1 * src_stride);
- s2 = vld1q_u8(src_ptr + 2 * src_stride);
- s3 = vld1q_u8(src_ptr + 3 * src_stride);
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_4_sdot(s0, x_filter, correction, range_limit, permute_tbl);
- d1 = convolve8_4_sdot(s1, x_filter, correction, range_limit, permute_tbl);
- d2 = convolve8_4_sdot(s2, x_filter, correction, range_limit, permute_tbl);
- d3 = convolve8_4_sdot(s3, x_filter, correction, range_limit, permute_tbl);
+ d0 = convolve8_4_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_4_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_4_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_4_x(s3, x_filter, correction, range_limit, permute_tbl);
- d01 = vcombine_s16(vmovn_s32(d0), vmovn_s32(d1));
- d23 = vcombine_s16(vmovn_s32(d2), vmovn_s32(d3));
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
- d01 = vqrshlq_s16(d01, shift_round_0);
- d23 = vqrshlq_s16(d23, shift_round_0);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01_u8, &d23_u8);
- d01 = vrshlq_s16(d01, horiz_const);
- d23 = vrshlq_s16(d23, horiz_const);
-
- d01 = vaddq_s16(d01, round_offset128);
- d23 = vaddq_s16(d23, round_offset128);
-
- if (conv_params->do_average) {
- dd0 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd1 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd2 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
- dd3 = vld1_u16(dst_ptr);
- dst_ptr += dst_stride;
-
- compute_avg_4x4(dd0, dd1, dd2, dd3,
- vreinterpret_u16_s16(vget_low_s16(d01)),
- vreinterpret_u16_s16(vget_high_s16(d01)),
- vreinterpret_u16_s16(vget_low_s16(d23)),
- vreinterpret_u16_s16(vget_high_s16(d23)), fwd_offset,
- bck_offset, round_offset64, round_bits,
- use_dist_wtd_comp_avg, &d01_u8, &d23_u8);
-
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d01_u8), 0);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d01_u8), 1);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d23_u8), 0);
- dst_u8_ptr += dst8_stride;
- vst1_lane_u32((uint32_t *)dst_u8_ptr, vreinterpret_u32_u8(d23_u8), 1);
- dst_u8_ptr += dst8_stride;
- } else {
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d01), 0);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d01), 1);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d23), 0);
- dst_ptr += dst_stride;
- vst1q_lane_u64((uint64_t *)dst_ptr, vreinterpretq_u64_s16(d23), 1);
- dst_ptr += dst_stride;
- }
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
height -= 4;
- } while (height > 0);
+ } while (height != 0);
} else {
const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
do {
const uint8_t *s = src_ptr;
CONV_BUF_TYPE *d = dst_ptr;
- uint8_t *d_u8 = dst_u8_ptr;
- width = w;
+ uint8_t *d_u8 = dst8_ptr;
+ int width = w;
do {
uint8x16_t s0, s1, s2, s3;
- int16x8_t d0, d1, d2, d3;
- uint16x8_t dd0, dd1, dd2, dd3;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
uint8x8_t d0_u8, d1_u8, d2_u8, d3_u8;
- s0 = vld1q_u8(s + 0 * src_stride);
- s1 = vld1q_u8(s + 1 * src_stride);
- s2 = vld1q_u8(s + 2 * src_stride);
- s3 = vld1q_u8(s + 3 * src_stride);
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
- d0 = convolve8_8_sdot(s0, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d1 = convolve8_8_sdot(s1, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d2 = convolve8_8_sdot(s2, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
- d3 = convolve8_8_sdot(s3, x_filter, correction, range_limit,
- permute_tbl, shift_round_0);
+ d0 = convolve8_8_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_8_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_8_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_8_x(s3, x_filter, correction, range_limit, permute_tbl);
- d0 = vrshlq_s16(d0, horiz_const);
- d1 = vrshlq_s16(d1, horiz_const);
- d2 = vrshlq_s16(d2, horiz_const);
- d3 = vrshlq_s16(d3, horiz_const);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- d0 = vaddq_s16(d0, round_offset128);
- d1 = vaddq_s16(d1, round_offset128);
- d2 = vaddq_s16(d2, round_offset128);
- d3 = vaddq_s16(d3, round_offset128);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- if (conv_params->do_average) {
- load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
-
- compute_avg_8x4(dd0, dd1, dd2, dd3, vreinterpretq_u16_s16(d0),
- vreinterpretq_u16_s16(d1), vreinterpretq_u16_s16(d2),
- vreinterpretq_u16_s16(d3), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &d0_u8, &d1_u8, &d2_u8, &d3_u8);
-
- store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
- } else {
- store_u16_8x4(d, dst_stride, vreinterpretq_u16_s16(d0),
- vreinterpretq_u16_s16(d1), vreinterpretq_u16_s16(d2),
- vreinterpretq_u16_s16(d3));
- }
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
s += 8;
d += 8;
d_u8 += 8;
width -= 8;
- } while (width > 0);
-
+ } while (width != 0);
src_ptr += 4 * src_stride;
dst_ptr += 4 * dst_stride;
- dst_u8_ptr += 4 * dst8_stride;
+ dst8_ptr += 4 * dst8_stride;
height -= 4;
- } while (height > 0);
+ } while (height != 0);
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+static INLINE void dist_wtd_convolve_x_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
-void av1_dist_wtd_convolve_x_neon(const uint8_t *src, int src_stride,
- uint8_t *dst8, int dst8_stride, int w, int h,
- const InterpFilterParams *filter_params_x,
- const int subpel_x_qn,
- ConvolveParams *conv_params) {
- assert(!(w % 4));
- assert(!(h % 4));
-
- CONV_BUF_TYPE *dst = conv_params->dst;
- int dst_stride = conv_params->dst_stride;
- const int horiz_offset = filter_params_x->taps / 2 - 1;
- const int bits = FILTER_BITS - conv_params->round_1;
const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
- const uint16_t fwd_offset = conv_params->fwd_offset;
- const uint16_t bck_offset = conv_params->bck_offset;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
- // horizontal filter
+ // Horizontal filter.
const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
+ // Dot-product constants and other shims.
+ const uint8x16_t range_limit = vdupq_n_u8(128);
+ const int32_t correction_s32 = vaddlvq_s16(vshll_n_s8(x_filter, 7));
+ // Fold round_offset into the dot-product filter correction constant. The
+ // additional shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-
+ // rounding shifts - which are generally faster than rounding shifts on
+ // modern CPUs. (The extra -1 is needed because we halved the filter values.)
+ int32x4_t correction =
+ vdupq_n_s32(correction_s32 + (round_offset << (ROUND0_BITS - 1)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
+ int dst_stride = conv_params->dst_stride;
+ int height = h;
+ if (w == 4) {
+ const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x4_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+ uint8x8_t d01_u8, d23_u8;
+
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_4_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_4_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_4_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_4_x(s3, x_filter, correction, range_limit, permute_tbl);
+
+ load_u16_4x4(dst_ptr, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01_u8, &d23_u8);
+
+ store_u8_4x1(dst8_ptr + 0 * dst8_stride, d01_u8, 0);
+ store_u8_4x1(dst8_ptr + 1 * dst8_stride, d01_u8, 1);
+ store_u8_4x1(dst8_ptr + 2 * dst8_stride, d23_u8, 0);
+ store_u8_4x1(dst8_ptr + 3 * dst8_stride, d23_u8, 1);
+
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int width = w;
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x8_t d0, d1, d2, d3, dd0, dd1, dd2, dd3;
+ uint8x8_t d0_u8, d1_u8, d2_u8, d3_u8;
+
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_8_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_8_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_8_x(s3, x_filter, correction, range_limit, permute_tbl);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_x_neon(
+ const uint8_t *src, int src_stride, int w, int h,
+ const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int8x8_t x_filter = vshrn_n_s16(vld1q_s16(x_filter_ptr), 1);
+
+ // Dot-product constants and other shims.
+ const uint8x16_t range_limit = vdupq_n_u8(128);
+ const int32_t correction_s32 = vaddlvq_s16(vshll_n_s8(x_filter, 7));
+ // Fold round_offset into the dot-product filter correction constant. The
+ // additional shim of 1 << ((ROUND0_BITS - 1) - 1) enables us to use non-
+ // rounding shifts - which are generally faster than rounding shifts on
+ // modern CPUs. (The extra -1 is needed because we halved the filter values.)
+ int32x4_t correction =
+ vdupq_n_s32(correction_s32 + (round_offset << (ROUND0_BITS - 1)) +
+ (1 << ((ROUND0_BITS - 1) - 1)));
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ int dst_stride = conv_params->dst_stride;
+ int height = h;
+
+ if (w == 4) {
+ const uint8x16x2_t permute_tbl = vld1q_u8_x2(dot_prod_permute_tbl);
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x4_t d0, d1, d2, d3;
+
+ load_u8_16x4(src_ptr, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_4_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_4_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_4_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_4_x(s3, x_filter, correction, range_limit, permute_tbl);
+
+ store_u16_4x4(dst_ptr, dst_stride, d0, d1, d2, d3);
+
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ } else {
+ const uint8x16x3_t permute_tbl = vld1q_u8_x3(dot_prod_permute_tbl);
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int width = w;
+
+ do {
+ uint8x16_t s0, s1, s2, s3;
+ uint16x8_t d0, d1, d2, d3;
+
+ load_u8_16x4(s, src_stride, &s0, &s1, &s2, &s3);
+
+ d0 = convolve8_8_x(s0, x_filter, correction, range_limit, permute_tbl);
+ d1 = convolve8_8_x(s1, x_filter, correction, range_limit, permute_tbl);
+ d2 = convolve8_8_x(s2, x_filter, correction, range_limit, permute_tbl);
+ d3 = convolve8_8_x(s3, x_filter, correction, range_limit, permute_tbl);
+
+ store_u16_8x4(d, dst_stride, d0, d1, d2, d3);
+
+ s += 8;
+ d += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+ } while (height != 0);
+ }
+}
+
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
+
+static INLINE uint16x4_t convolve8_4_x(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t x_filter,
+ const int16x4_t round_offset) {
+ const int16x4_t x_filter_0_3 = vget_low_s16(x_filter);
+ const int16x4_t x_filter_4_7 = vget_high_s16(x_filter);
+
+ int16x4_t sum = vmul_lane_s16(s0, x_filter_0_3, 0);
+ sum = vmla_lane_s16(sum, s1, x_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s2, x_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s3, x_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s4, x_filter_4_7, 0);
+ sum = vmla_lane_s16(sum, s5, x_filter_4_7, 1);
+ sum = vmla_lane_s16(sum, s6, x_filter_4_7, 2);
+ sum = vmla_lane_s16(sum, s7, x_filter_4_7, 3);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x4_t res = vrsra_n_s16(round_offset, sum, ROUND0_BITS - 1);
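+  // vrsra_n_s16 is a rounding shift-right-and-accumulate, fusing the
+  // rounding shift and the round_offset addition into one instruction.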
+ return vreinterpret_u16_s16(res);
+}
+
+static INLINE uint16x8_t convolve8_8_x(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t x_filter,
+ const int16x8_t round_offset) {
+ const int16x4_t x_filter_0_3 = vget_low_s16(x_filter);
+ const int16x4_t x_filter_4_7 = vget_high_s16(x_filter);
+
+ int16x8_t sum = vmulq_lane_s16(s0, x_filter_0_3, 0);
+ sum = vmlaq_lane_s16(sum, s1, x_filter_0_3, 1);
+ sum = vmlaq_lane_s16(sum, s2, x_filter_0_3, 2);
+ sum = vmlaq_lane_s16(sum, s3, x_filter_0_3, 3);
+ sum = vmlaq_lane_s16(sum, s4, x_filter_4_7, 0);
+ sum = vmlaq_lane_s16(sum, s5, x_filter_4_7, 1);
+ sum = vmlaq_lane_s16(sum, s6, x_filter_4_7, 2);
+ sum = vmlaq_lane_s16(sum, s7, x_filter_4_7, 3);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x8_t res = vrsraq_n_s16(round_offset, sum, ROUND0_BITS - 1);
+ return vreinterpretq_u16_s16(res);
+}
+
+static INLINE void dist_wtd_convolve_x_dist_wtd_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ const uint16_t fwd_offset = conv_params->fwd_offset;
+ const uint16_t bck_offset = conv_params->bck_offset;
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
// Filter values are even, so downshift by 1 to reduce intermediate precision
// requirements.
const int16x8_t x_filter = vshrq_n_s16(vld1q_s16(x_filter_ptr), 1);
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
+ int dst_stride = conv_params->dst_stride;
const uint8_t *s;
uint8_t *d_u8;
- uint8_t *dst_u8_ptr;
- CONV_BUF_TYPE *d, *dst_ptr;
- int width, height;
+ CONV_BUF_TYPE *d;
+ int width;
+ int height = h;
+
uint8x8_t t0;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
uint8x8_t t1, t2, t3, t4, t5, t6, t7;
-#endif
- s = src_ptr;
- dst_ptr = dst;
- dst_u8_ptr = dst8;
- width = w;
- height = h;
+#endif // AOM_ARCH_AARCH64
- if ((w == 4) || (h == 4)) {
- int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
- int16x8_t tt0;
- uint16x4_t res4;
-#if defined(__aarch64__)
- int16x4_t s8, s9, s10, d1, d2, d3;
- int16x8_t tt1, tt2, tt3;
- uint16x4_t res5, res6, res7;
- uint32x2_t tu0 = vdup_n_u32(0), tu1 = vdup_n_u32(0);
- int16x8_t u0, u1;
-#else
- int16x4_t temp_0;
-#endif
- const int16x4_t zero = vdup_n_s16(0);
- const int16x4_t round_offset_vec = vdup_n_s16(round_offset);
- const int16x4_t shift_round_0 = vdup_n_s16(-conv_params->round_0 + 1);
- const int16x4_t horiz_const = vdup_n_s16(bits);
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x4_t d0, dd0;
+ uint8x8_t d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s9, s10;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
do {
- s = src_ptr;
d = dst_ptr;
- d_u8 = dst_u8_ptr;
+ d_u8 = dst8_ptr;
width = w;
- __builtin_prefetch(s + 0 * src_stride);
-#if defined(__aarch64__)
- __builtin_prefetch(s + 1 * src_stride);
- __builtin_prefetch(s + 2 * src_stride);
- __builtin_prefetch(s + 3 * src_stride);
- load_u8_8x4(s, src_stride, &t0, &t1, &t2, &t3);
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+#if AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+
+ load_u8_8x4(src_ptr, src_stride, &t0, &t1, &t2, &t3);
transpose_u8_8x4(&t0, &t1, &t2, &t3);
- tt0 = vreinterpretq_s16_u16(vmovl_u8(t0));
- tt1 = vreinterpretq_s16_u16(vmovl_u8(t1));
- tt2 = vreinterpretq_s16_u16(vmovl_u8(t2));
- tt3 = vreinterpretq_s16_u16(vmovl_u8(t3));
- s0 = vget_low_s16(tt0);
- s1 = vget_low_s16(tt1);
- s2 = vget_low_s16(tt2);
- s3 = vget_low_s16(tt3);
- s4 = vget_high_s16(tt0);
- s5 = vget_high_s16(tt1);
- s6 = vget_high_s16(tt2);
+
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
__builtin_prefetch(d + 0 * dst_stride);
__builtin_prefetch(d + 1 * dst_stride);
__builtin_prefetch(d + 2 * dst_stride);
__builtin_prefetch(d + 3 * dst_stride);
- s += 7;
+
+ s = src_ptr + 7;
+
do {
- load_unaligned_u8_4x4(s, src_stride, &tu0, &tu1);
- t0 = vreinterpret_u8_u32(tu0);
- t1 = vreinterpret_u8_u32(tu1);
-
+ load_unaligned_u8_4x4(s, src_stride, &t0, &t1);
transpose_u8_4x4(&t0, &t1);
- u0 = vreinterpretq_s16_u16(vmovl_u8(t0));
- u1 = vreinterpretq_s16_u16(vmovl_u8(t1));
- s7 = vget_low_s16(u0);
- s8 = vget_low_s16(u1);
- s9 = vget_high_s16(u0);
- s10 = vget_high_s16(u1);
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s10 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter, zero,
- shift_round_0);
- d0 = vrshl_s16(d0, horiz_const);
- d0 = vadd_s16(d0, round_offset_vec);
- d1 = convolve8_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter, zero,
- shift_round_0);
- d1 = vrshl_s16(d1, horiz_const);
- d1 = vadd_s16(d1, round_offset_vec);
- d2 = convolve8_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter, zero,
- shift_round_0);
- d2 = vrshl_s16(d2, horiz_const);
- d2 = vadd_s16(d2, round_offset_vec);
- d3 = convolve8_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter, zero,
- shift_round_0);
- d3 = vrshl_s16(d3, horiz_const);
- d3 = vadd_s16(d3, round_offset_vec);
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ vget_low_s16(round_offset_vec));
- transpose_s16_4x4d(&d0, &d1, &d2, &d3);
+ transpose_u16_4x4d(&d0, &d1, &d2, &d3);
- if (conv_params->do_average) {
- __builtin_prefetch(d + 0 * dst_stride);
- __builtin_prefetch(d + 1 * dst_stride);
- __builtin_prefetch(d + 2 * dst_stride);
- __builtin_prefetch(d + 3 * dst_stride);
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
- __builtin_prefetch(d_u8 + 0 * dst8_stride);
- __builtin_prefetch(d_u8 + 1 * dst8_stride);
- __builtin_prefetch(d_u8 + 2 * dst8_stride);
- __builtin_prefetch(d_u8 + 3 * dst8_stride);
+ __builtin_prefetch(d_u8 + 0 * dst8_stride);
+ __builtin_prefetch(d_u8 + 1 * dst8_stride);
+ __builtin_prefetch(d_u8 + 2 * dst8_stride);
+ __builtin_prefetch(d_u8 + 3 * dst8_stride);
- load_u16_4x4(d, dst_stride, &res4, &res5, &res6, &res7);
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_4x4(res4, res5, res6, res7, vreinterpret_u16_s16(d0),
- vreinterpret_u16_s16(d1), vreinterpret_u16_s16(d2),
- vreinterpret_u16_s16(d3), fwd_offset, bck_offset,
- round_offset_vec, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01, &d23);
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t0),
- 0); // 00 01 02 03
- vst1_lane_u32((uint32_t *)(d_u8 + dst8_stride),
- vreinterpret_u32_u8(t0),
- 1); // 10 11 12 13
- vst1_lane_u32((uint32_t *)(d_u8 + 2 * dst8_stride),
- vreinterpret_u32_u8(t1),
- 0); // 20 21 22 23
- vst1_lane_u32((uint32_t *)(d_u8 + 3 * dst8_stride),
- vreinterpret_u32_u8(t1),
- 1); // 30 31 32 33
- } else {
- store_u16_4x4(d, dst_stride, vreinterpret_u16_s16(d0),
- vreinterpret_u16_s16(d1), vreinterpret_u16_s16(d2),
- vreinterpret_u16_s16(d3));
- }
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
s0 = s4;
s1 = s5;
@@ -1849,90 +2941,76 @@
s4 = s8;
s5 = s9;
s6 = s10;
-
s += 4;
- width -= 4;
d += 4;
d_u8 += 4;
- } while (width > 0);
- src_ptr += (src_stride << 2);
- dst_ptr += (dst_stride << 2);
- dst_u8_ptr += (dst8_stride << 2);
+ width -= 4;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
height -= 4;
-#else
- t0 = vld1_u8(s); // a0 a1 a2 a3 a4 a5 a6 a7
- tt0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
- s0 = vget_low_s16(tt0); // a0 a1 a2 a3
- s4 = vget_high_s16(tt0); // a4 a5 a6 a7
+#else // !AOM_ARCH_AARCH64
+ t0 = vld1_u8(src_ptr); // a0 a1 a2 a3 a4 a5 a6 a7
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+
__builtin_prefetch(d);
- s += 8;
+ s = src_ptr + 8;
+
do {
t0 = vld1_u8(s); // a8 a9 a10 a11
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
- // a8 a9 a10 a11
- s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
- temp_0 = s7;
s1 = vext_s16(s0, s4, 1); // a1 a2 a3 a4
s2 = vext_s16(s0, s4, 2); // a2 a3 a4 a5
s3 = vext_s16(s0, s4, 3); // a3 a4 a5 a6
- s5 = vext_s16(s4, s7, 1); // a5 a6 a7 a8
- s6 = vext_s16(s4, s7, 2); // a6 a7 a8 a9
- s7 = vext_s16(s4, s7, 3); // a7 a8 a9 a10
+ s5 = vext_s16(s4, s8, 1); // a5 a6 a7 a8
+ s6 = vext_s16(s4, s8, 2); // a6 a7 a8 a9
+ s7 = vext_s16(s4, s8, 3); // a7 a8 a9 a10
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter, zero,
- shift_round_0);
- d0 = vrshl_s16(d0, horiz_const);
- d0 = vadd_s16(d0, round_offset_vec);
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+
+ __builtin_prefetch(d);
+ __builtin_prefetch(d_u8);
+
+ dd0 = vld1_u16(d);
+
+ compute_dist_wtd_avg_4x1(dd0, d0, fwd_offset, bck_offset,
+ vget_low_s16(round_offset_vec), &d01);
+
+ store_u8_4x1(d_u8, d01, 0);
+
s0 = s4;
- s4 = temp_0;
- if (conv_params->do_average) {
- __builtin_prefetch(d);
- __builtin_prefetch(d_u8);
-
- res4 = vld1_u16(d);
-
- compute_avg_4x1(res4, vreinterpret_u16_s16(d0), fwd_offset,
- bck_offset, round_offset_vec, round_bits,
- use_dist_wtd_comp_avg, &t0);
-
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t0),
- 0); // 00 01 02 03
- } else {
- vst1_u16(d, vreinterpret_u16_s16(d0));
- }
-
+ s4 = s8;
s += 4;
- width -= 4;
d += 4;
d_u8 += 4;
- } while (width > 0);
- src_ptr += (src_stride);
- dst_ptr += (dst_stride);
- dst_u8_ptr += (dst8_stride);
+ width -= 4;
+ } while (width != 0);
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ dst8_ptr += dst8_stride;
height--;
-#endif
- } while (height > 0);
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
} else {
- CONV_BUF_TYPE *d_tmp;
- uint8_t *d_u8_tmp;
- int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
- int16x8_t res0;
- uint16x8_t res8;
- const int16x8_t round_offset128 = vdupq_n_s16(round_offset);
- const int16x4_t round_offset64 = vdup_n_s16(round_offset);
- const int16x8_t shift_round_0 = vdupq_n_s16(-conv_params->round_0 + 1);
- const int16x8_t horiz_const = vdupq_n_s16(bits);
- const int16x8_t zero = vdupq_n_s16(0);
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8;
- d = dst_ptr = dst;
- d_u8 = dst_u8_ptr = dst8;
do {
-#if defined(__aarch64__)
- int16x8_t s11, s12, s13, s14;
- int16x8_t s8, s9, s10;
- int16x8_t res1, res2, res3, res4, res5, res6, res7;
- uint16x8_t res9, res10, res11;
+ d = dst_ptr;
+ d_u8 = dst8_ptr;
+ width = w;
+
+#if AOM_ARCH_AARCH64
+ int16x8_t s9, s10, s11, s12, s13, s14;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8;
+
__builtin_prefetch(src_ptr + 0 * src_stride);
__builtin_prefetch(src_ptr + 1 * src_stride);
__builtin_prefetch(src_ptr + 2 * src_stride);
@@ -1941,8 +3019,10 @@
__builtin_prefetch(src_ptr + 5 * src_stride);
__builtin_prefetch(src_ptr + 6 * src_stride);
__builtin_prefetch(src_ptr + 7 * src_stride);
+
load_u8_8x8(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
@@ -1951,11 +3031,6 @@
s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
- width = w;
- s = src_ptr + 7;
- d = dst_ptr;
- d_u8_tmp = dst_u8_ptr;
-
__builtin_prefetch(dst_ptr + 0 * dst_stride);
__builtin_prefetch(dst_ptr + 1 * dst_stride);
__builtin_prefetch(dst_ptr + 2 * dst_stride);
@@ -1965,12 +3040,12 @@
__builtin_prefetch(dst_ptr + 6 * dst_stride);
__builtin_prefetch(dst_ptr + 7 * dst_stride);
- do {
- d_u8 = d_u8_tmp;
- d_tmp = d;
+ s = src_ptr + 7;
+ do {
load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
@@ -1980,79 +3055,654 @@
s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
- res0 = convolve8_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, x_filter, zero,
- shift_round_0);
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
+ d1 = convolve8_8_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ round_offset_vec);
+ d2 = convolve8_8_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ round_offset_vec);
+ d3 = convolve8_8_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ round_offset_vec);
+ d4 = convolve8_8_x(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
+ round_offset_vec);
+ d5 = convolve8_8_x(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
+ round_offset_vec);
+ d6 = convolve8_8_x(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
+ round_offset_vec);
+ d7 = convolve8_8_x(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
+ round_offset_vec);
- res0 = vrshlq_s16(res0, horiz_const);
- res0 = vaddq_s16(res0, round_offset128);
+ transpose_u16_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
- res1 = convolve8_8x8_s16(s1, s2, s3, s4, s5, s6, s7, s8, x_filter, zero,
- shift_round_0);
- res1 = vrshlq_s16(res1, horiz_const);
- res1 = vaddq_s16(res1, round_offset128);
- res2 = convolve8_8x8_s16(s2, s3, s4, s5, s6, s7, s8, s9, x_filter, zero,
- shift_round_0);
- res2 = vrshlq_s16(res2, horiz_const);
- res2 = vaddq_s16(res2, round_offset128);
- res3 = convolve8_8x8_s16(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
- zero, shift_round_0);
- res3 = vrshlq_s16(res3, horiz_const);
- res3 = vaddq_s16(res3, round_offset128);
- res4 = convolve8_8x8_s16(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
- zero, shift_round_0);
- res4 = vrshlq_s16(res4, horiz_const);
- res4 = vaddq_s16(res4, round_offset128);
- res5 = convolve8_8x8_s16(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
- zero, shift_round_0);
- res5 = vrshlq_s16(res5, horiz_const);
- res5 = vaddq_s16(res5, round_offset128);
- res6 = convolve8_8x8_s16(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
- zero, shift_round_0);
- res6 = vrshlq_s16(res6, horiz_const);
- res6 = vaddq_s16(res6, round_offset128);
- res7 = convolve8_8x8_s16(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
- zero, shift_round_0);
- res7 = vrshlq_s16(res7, horiz_const);
- res7 = vaddq_s16(res7, round_offset128);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- transpose_s16_8x8(&res0, &res1, &res2, &res3, &res4, &res5, &res6,
- &res7);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- if (conv_params->do_average) {
- load_u16_8x4(d_tmp, dst_stride, &res8, &res9, &res10, &res11);
- d_tmp += (dst_stride << 2);
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
- compute_avg_8x4(res8, res9, res10, res11, vreinterpretq_u16_s16(res0),
- vreinterpretq_u16_s16(res1),
- vreinterpretq_u16_s16(res2),
- vreinterpretq_u16_s16(res3), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1, &t2, &t3);
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
- store_u8_8x4(d_u8, dst8_stride, t0, t1, t2, t3);
- d_u8 += (dst8_stride << 2);
+ compute_dist_wtd_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7, fwd_offset,
+ bck_offset, round_offset_vec, &d4_u8, &d5_u8,
+ &d6_u8, &d7_u8);
- load_u16_8x4(d_tmp, dst_stride, &res8, &res9, &res10, &res11);
- d_tmp += (dst_stride << 2);
+ store_u8_8x4(d_u8 + 4 * dst8_stride, dst8_stride, d4_u8, d5_u8, d6_u8,
+ d7_u8);
- compute_avg_8x4(res8, res9, res10, res11, vreinterpretq_u16_s16(res4),
- vreinterpretq_u16_s16(res5),
- vreinterpretq_u16_s16(res6),
- vreinterpretq_u16_s16(res7), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1, &t2, &t3);
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s5 = s13;
+ s6 = s14;
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 8 * src_stride;
+ dst_ptr += 8 * dst_stride;
+ dst8_ptr += 8 * dst8_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr);
- store_u8_8x4(d_u8, dst8_stride, t0, t1, t2, t3);
- d_u8 += (dst8_stride << 2);
- } else {
- store_u16_8x8(
- d_tmp, dst_stride, vreinterpretq_u16_s16(res0),
- vreinterpretq_u16_s16(res1), vreinterpretq_u16_s16(res2),
- vreinterpretq_u16_s16(res3), vreinterpretq_u16_s16(res4),
- vreinterpretq_u16_s16(res5), vreinterpretq_u16_s16(res6),
- vreinterpretq_u16_s16(res7));
- d_tmp += (dst_stride << 3);
- }
+ t0 = vld1_u8(src_ptr);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
+
+ __builtin_prefetch(dst_ptr);
+
+ s = src_ptr + 8;
+
+ do {
+ t0 = vld1_u8(s); // a8 a9 a10 a11 a12 a13 a14 a15
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t0));
+
+ s1 = vextq_s16(s0, s8, 1); // a1 a2 a3 a4 a5 a6 a7 a8
+ s2 = vextq_s16(s0, s8, 2); // a2 a3 a4 a5 a6 a7 a8 a9
+ s3 = vextq_s16(s0, s8, 3); // a3 a4 a5 a6 a7 a8 a9 a10
+ s4 = vextq_s16(s0, s8, 4); // a4 a5 a6 a7 a8 a9 a10 a11
+ s5 = vextq_s16(s0, s8, 5); // a5 a6 a7 a8 a9 a10 a11 a12
+ s6 = vextq_s16(s0, s8, 6); // a6 a7 a8 a9 a10 a11 a12 a13
+ s7 = vextq_s16(s0, s8, 7); // a7 a8 a9 a10 a11 a12 a13 a14
+
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
+
+ dd0 = vld1q_u16(d);
+
+ compute_dist_wtd_avg_8x1(dd0, d0, fwd_offset, bck_offset,
+ round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+
+ s0 = s8;
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ dst8_ptr += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ }
+}
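
The compute_dist_wtd_avg_{4,8}x{1,4} helpers used above fold the distance-weighted compound average into a few NEON ops. A scalar model of the per-pixel arithmetic they are believed to implement, matching the libaom C reference (assumed constants: DIST_PRECISION_BITS = 4, round_bits = 2 * FILTER_BITS - ROUND0_BITS - COMPOUND_ROUND1_BITS = 4, round_offset = 6144 for 8-bit):

#include <stdint.h>

/* Scalar model of one distance-weighted compound pixel. dd is the stored
 * 16-bit prediction from the conv buffer, d the new convolution result;
 * fwd_offset + bck_offset == 16 in AV1's weight table. */
static uint8_t dist_wtd_avg_pixel(uint16_t dd, uint16_t d,
                                  uint16_t fwd_offset, uint16_t bck_offset,
                                  int32_t round_offset) {
  /* Weighted blend of the two predictions, >> DIST_PRECISION_BITS. */
  int32_t blend = ((int32_t)dd * fwd_offset + (int32_t)d * bck_offset) >> 4;
  /* Remove the compound bias, round-shift by round_bits, clip to 8 bits
   * (the NEON helpers get the clip for free from vqrshrun). */
  int32_t res = (blend - round_offset + 8) >> 4;
  return (uint8_t)(res < 0 ? 0 : (res > 255 ? 255 : res));
}
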
+
+static INLINE void dist_wtd_convolve_x_avg_neon(
+ const uint8_t *src, int src_stride, uint8_t *dst8, int dst8_stride, int w,
+ int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int16x8_t x_filter = vshrq_n_s16(vld1q_s16(x_filter_ptr), 1);
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ uint8_t *dst8_ptr = dst8;
+ int dst_stride = conv_params->dst_stride;
+ const uint8_t *s;
+ uint8_t *d_u8;
+ CONV_BUF_TYPE *d;
+ int width;
+ int height = h;
+
+ uint8x8_t t0;
+#if AOM_ARCH_AARCH64
+ uint8x8_t t1, t2, t3, t4, t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x4_t d0, dd0;
+ uint8x8_t d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s9, s10;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ d = dst_ptr;
+ d_u8 = dst8_ptr;
+ width = w;
+
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+#if AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+
+ load_u8_8x4(src_ptr, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ s = src_ptr + 7;
+
+ do {
+ load_unaligned_u8_4x4(s, src_stride, &t0, &t1);
+ transpose_u8_4x4(&t0, &t1);
+
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s10 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ vget_low_s16(round_offset_vec));
+
+ transpose_u16_4x4d(&d0, &d1, &d2, &d3);
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ __builtin_prefetch(d_u8 + 0 * dst8_stride);
+ __builtin_prefetch(d_u8 + 1 * dst8_stride);
+ __builtin_prefetch(d_u8 + 2 * dst8_stride);
+ __builtin_prefetch(d_u8 + 3 * dst8_stride);
+
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01, &d23);
+
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4;
+ d += 4;
+ d_u8 += 4;
+ width -= 4;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ dst8_ptr += 4 * dst8_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = vld1_u8(src_ptr); // a0 a1 a2 a3 a4 a5 a6 a7
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+
+ __builtin_prefetch(d);
+
+ s = src_ptr + 8;
+
+ do {
+ t0 = vld1_u8(s); // a8 a9 a10 a11
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+
+ s1 = vext_s16(s0, s4, 1); // a1 a2 a3 a4
+ s2 = vext_s16(s0, s4, 2); // a2 a3 a4 a5
+ s3 = vext_s16(s0, s4, 3); // a3 a4 a5 a6
+ s5 = vext_s16(s4, s8, 1); // a5 a6 a7 a8
+ s6 = vext_s16(s4, s8, 2); // a6 a7 a8 a9
+ s7 = vext_s16(s4, s8, 3); // a7 a8 a9 a10
+
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+
+ __builtin_prefetch(d);
+ __builtin_prefetch(d_u8);
+
+ dd0 = vld1_u16(d);
+
+ compute_basic_avg_4x1(dd0, d0, vget_low_s16(round_offset_vec), &d01);
+
+ store_u8_4x1(d_u8, d01, 0);
+
+ s0 = s4;
+ s4 = s8;
+ s += 4;
+ d += 4;
+ d_u8 += 4;
+ width -= 4;
+ } while (width != 0);
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ dst8_ptr += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8;
+
+ do {
+ d = dst_ptr;
+ d_u8 = dst8_ptr;
+ width = w;
+
+#if AOM_ARCH_AARCH64
+ int16x8_t s9, s10, s11, s12, s13, s14;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8;
+
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+ __builtin_prefetch(src_ptr + 4 * src_stride);
+ __builtin_prefetch(src_ptr + 5 * src_stride);
+ __builtin_prefetch(src_ptr + 6 * src_stride);
+ __builtin_prefetch(src_ptr + 7 * src_stride);
+
+ load_u8_8x8(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ __builtin_prefetch(dst_ptr + 0 * dst_stride);
+ __builtin_prefetch(dst_ptr + 1 * dst_stride);
+ __builtin_prefetch(dst_ptr + 2 * dst_stride);
+ __builtin_prefetch(dst_ptr + 3 * dst_stride);
+ __builtin_prefetch(dst_ptr + 4 * dst_stride);
+ __builtin_prefetch(dst_ptr + 5 * dst_stride);
+ __builtin_prefetch(dst_ptr + 6 * dst_stride);
+ __builtin_prefetch(dst_ptr + 7 * dst_stride);
+
+ s = src_ptr + 7;
+
+ do {
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
+ d1 = convolve8_8_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ round_offset_vec);
+ d2 = convolve8_8_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ round_offset_vec);
+ d3 = convolve8_8_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ round_offset_vec);
+ d4 = convolve8_8_x(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
+ round_offset_vec);
+ d5 = convolve8_8_x(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
+ round_offset_vec);
+ d6 = convolve8_8_x(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
+ round_offset_vec);
+ d7 = convolve8_8_x(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
+ round_offset_vec);
+
+ transpose_u16_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
+
+ compute_basic_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7,
+ round_offset_vec, &d4_u8, &d5_u8, &d6_u8, &d7_u8);
+
+ store_u8_8x4(d_u8 + 4 * dst8_stride, dst8_stride, d4_u8, d5_u8, d6_u8,
+ d7_u8);
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s5 = s13;
+ s6 = s14;
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += 8 * src_stride;
+ dst_ptr += 8 * dst_stride;
+ dst8_ptr += 8 * dst8_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr);
+
+ t0 = vld1_u8(src_ptr);
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
+
+ __builtin_prefetch(dst_ptr);
+
+ s = src_ptr + 8;
+
+ do {
+ t0 = vld1_u8(s); // a8 a9 a10 a11 a12 a13 a14 a15
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t0));
+
+ s1 = vextq_s16(s0, s8, 1); // a1 a2 a3 a4 a5 a6 a7 a8
+ s2 = vextq_s16(s0, s8, 2); // a2 a3 a4 a5 a6 a7 a8 a9
+ s3 = vextq_s16(s0, s8, 3); // a3 a4 a5 a6 a7 a8 a9 a10
+ s4 = vextq_s16(s0, s8, 4); // a4 a5 a6 a7 a8 a9 a10 a11
+ s5 = vextq_s16(s0, s8, 5); // a5 a6 a7 a8 a9 a10 a11 a12
+ s6 = vextq_s16(s0, s8, 6); // a6 a7 a8 a9 a10 a11 a12 a13
+ s7 = vextq_s16(s0, s8, 7); // a7 a8 a9 a10 a11 a12 a13 a14
+
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
+
+ dd0 = vld1q_u16(d);
+
+ compute_basic_avg_8x1(dd0, d0, round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+
+ s0 = s8;
+ s += 8;
+ d += 8;
+ d_u8 += 8;
+ width -= 8;
+ } while (width != 0);
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ dst8_ptr += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ }
+}
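
This variant differs from the one above only in the averaging step: when use_dist_wtd_comp_avg is unset, the two predictions are blended with equal weight. A minimal scalar sketch of that step, under the same assumed constants:

#include <stdint.h>

/* Scalar model of the equal-weight compound average: (dd + d) >> 1
 * replaces the fwd/bck weighted blend; the bias removal, rounding shift
 * and clip are identical to the distance-weighted variant. */
static uint8_t basic_avg_pixel(uint16_t dd, uint16_t d, int32_t round_offset) {
  int32_t avg = ((int32_t)dd + (int32_t)d) >> 1;
  int32_t res = (avg - round_offset + 8) >> 4; /* round_bits == 4 */
  return (uint8_t)(res < 0 ? 0 : (res > 255 ? 255 : res));
}
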
+
+static INLINE void dist_wtd_convolve_x_neon(
+ const uint8_t *src, int src_stride, int w, int h,
+ const InterpFilterParams *filter_params_x, const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ // Horizontal filter.
+ const int16_t *x_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_x, subpel_x_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate precision
+ // requirements.
+ const int16x8_t x_filter = vshrq_n_s16(vld1q_s16(x_filter_ptr), 1);
+
+ const int horiz_offset = filter_params_x->taps / 2 - 1;
+ const uint8_t *src_ptr = src - horiz_offset;
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ int dst_stride = conv_params->dst_stride;
+ const uint8_t *s;
+ CONV_BUF_TYPE *d;
+ int width;
+ int height = h;
+
+ uint8x8_t t0;
+#if AOM_ARCH_AARCH64
+ uint8x8_t t1, t2, t3, t4, t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x4_t d0;
+#if AOM_ARCH_AARCH64
+ int16x4_t s9, s10;
+ uint16x4_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ d = dst_ptr;
+ width = w;
+
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+#if AOM_ARCH_AARCH64
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+
+ load_u8_8x4(src_ptr, src_stride, &t0, &t1, &t2, &t3);
+ transpose_u8_8x4(&t0, &t1, &t2, &t3);
+
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s1 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s2 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+ s3 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t3)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s5 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s6 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t2)));
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ s = src_ptr + 7;
+
+ do {
+ load_unaligned_u8_4x4(s, src_stride, &t0, &t1);
+ transpose_u8_4x4(&t0, &t1);
+
+ s7 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+ s9 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s10 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t1)));
+
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ vget_low_s16(round_offset_vec));
+
+ transpose_u16_4x4d(&d0, &d1, &d2, &d3);
+
+ store_u16_4x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4;
+ d += 4;
+ width -= 4;
+ } while (width != 0);
+ src_ptr += 4 * src_stride;
+ dst_ptr += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = vld1_u8(src_ptr); // a0 a1 a2 a3 a4 a5 a6 a7
+ s0 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+ s4 = vget_high_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+
+ __builtin_prefetch(d);
+
+ s = src_ptr + 8;
+
+ do {
+ t0 = vld1_u8(s); // a8 a9 a10 a11
+ s8 = vget_low_s16(vreinterpretq_s16_u16(vmovl_u8(t0)));
+
+ s1 = vext_s16(s0, s4, 1); // a1 a2 a3 a4
+ s2 = vext_s16(s0, s4, 2); // a2 a3 a4 a5
+ s3 = vext_s16(s0, s4, 3); // a3 a4 a5 a6
+ s5 = vext_s16(s4, s8, 1); // a5 a6 a7 a8
+ s6 = vext_s16(s4, s8, 2); // a6 a7 a8 a9
+ s7 = vext_s16(s4, s8, 3); // a7 a8 a9 a10
+
+ d0 = convolve8_4_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ vget_low_s16(round_offset_vec));
+
+ vst1_u16(d, d0);
+
+ s0 = s4;
+ s4 = s8;
+ s += 4;
+ d += 4;
+ width -= 4;
+ } while (width != 0);
+ src_ptr += src_stride;
+ dst_ptr += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7, s8;
+ uint16x8_t d0;
+
+ do {
+ d = dst_ptr;
+ width = w;
+
+#if AOM_ARCH_AARCH64
+ int16x8_t s9, s10, s11, s12, s13, s14;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7;
+
+ __builtin_prefetch(src_ptr + 0 * src_stride);
+ __builtin_prefetch(src_ptr + 1 * src_stride);
+ __builtin_prefetch(src_ptr + 2 * src_stride);
+ __builtin_prefetch(src_ptr + 3 * src_stride);
+ __builtin_prefetch(src_ptr + 4 * src_stride);
+ __builtin_prefetch(src_ptr + 5 * src_stride);
+ __builtin_prefetch(src_ptr + 6 * src_stride);
+ __builtin_prefetch(src_ptr + 7 * src_stride);
+
+ load_u8_8x8(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ __builtin_prefetch(dst_ptr + 0 * dst_stride);
+ __builtin_prefetch(dst_ptr + 1 * dst_stride);
+ __builtin_prefetch(dst_ptr + 2 * dst_stride);
+ __builtin_prefetch(dst_ptr + 3 * dst_stride);
+ __builtin_prefetch(dst_ptr + 4 * dst_stride);
+ __builtin_prefetch(dst_ptr + 5 * dst_stride);
+ __builtin_prefetch(dst_ptr + 6 * dst_stride);
+ __builtin_prefetch(dst_ptr + 7 * dst_stride);
+
+ s = src_ptr + 7;
+
+ do {
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ transpose_u8_8x8(&t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
+ d1 = convolve8_8_x(s1, s2, s3, s4, s5, s6, s7, s8, x_filter,
+ round_offset_vec);
+ d2 = convolve8_8_x(s2, s3, s4, s5, s6, s7, s8, s9, x_filter,
+ round_offset_vec);
+ d3 = convolve8_8_x(s3, s4, s5, s6, s7, s8, s9, s10, x_filter,
+ round_offset_vec);
+ d4 = convolve8_8_x(s4, s5, s6, s7, s8, s9, s10, s11, x_filter,
+ round_offset_vec);
+ d5 = convolve8_8_x(s5, s6, s7, s8, s9, s10, s11, s12, x_filter,
+ round_offset_vec);
+ d6 = convolve8_8_x(s6, s7, s8, s9, s10, s11, s12, s13, x_filter,
+ round_offset_vec);
+ d7 = convolve8_8_x(s7, s8, s9, s10, s11, s12, s13, s14, x_filter,
+ round_offset_vec);
+
+ transpose_u16_8x8(&d0, &d1, &d2, &d3, &d4, &d5, &d6, &d7);
+
+ store_u16_8x8(d, dst_stride, d0, d1, d2, d3, d4, d5, d6, d7);
s0 = s8;
s1 = s9;
@@ -2064,235 +3714,878 @@
s += 8;
d += 8;
width -= 8;
- d_u8_tmp += 8;
- } while (width > 0);
+ } while (width != 0);
src_ptr += 8 * src_stride;
dst_ptr += 8 * dst_stride;
- dst_u8_ptr += 8 * dst8_stride;
height -= 8;
-#else
- int16x8_t temp_0;
+#else // !AOM_ARCH_AARCH64
__builtin_prefetch(src_ptr);
+
t0 = vld1_u8(src_ptr);
s0 = vreinterpretq_s16_u16(vmovl_u8(t0)); // a0 a1 a2 a3 a4 a5 a6 a7
- width = w;
- s = src_ptr + 8;
- d = dst_ptr;
- d_u8_tmp = dst_u8_ptr;
-
__builtin_prefetch(dst_ptr);
+ s = src_ptr + 8;
+
do {
- d_u8 = d_u8_tmp;
- d_tmp = d;
-
t0 = vld1_u8(s); // a8 a9 a10 a11 a12 a13 a14 a15
- s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
- temp_0 = s0;
- s0 = s7;
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t0));
- s1 = vextq_s16(temp_0, s7, 1); // a1 a2 a3 a4 a5 a6 a7 a8
- s2 = vextq_s16(temp_0, s7, 2); // a2 a3 a4 a5 a6 a7 a8 a9
- s3 = vextq_s16(temp_0, s7, 3); // a3 a4 a5 a6 a7 a8 a9 a10
- s4 = vextq_s16(temp_0, s7, 4); // a4 a5 a6 a7 a8 a9 a10 a11
- s5 = vextq_s16(temp_0, s7, 5); // a5 a6 a7 a8 a9 a10 a11 a12
- s6 = vextq_s16(temp_0, s7, 6); // a6 a7 a8 a9 a10 a11 a12 a13
- s7 = vextq_s16(temp_0, s7, 7); // a7 a8 a9 a10 a11 a12 a13 a14
+ s1 = vextq_s16(s0, s8, 1); // a1 a2 a3 a4 a5 a6 a7 a8
+ s2 = vextq_s16(s0, s8, 2); // a2 a3 a4 a5 a6 a7 a8 a9
+ s3 = vextq_s16(s0, s8, 3); // a3 a4 a5 a6 a7 a8 a9 a10
+ s4 = vextq_s16(s0, s8, 4); // a4 a5 a6 a7 a8 a9 a10 a11
+ s5 = vextq_s16(s0, s8, 5); // a5 a6 a7 a8 a9 a10 a11 a12
+ s6 = vextq_s16(s0, s8, 6); // a6 a7 a8 a9 a10 a11 a12 a13
+ s7 = vextq_s16(s0, s8, 7); // a7 a8 a9 a10 a11 a12 a13 a14
- res0 = convolve8_8x8_s16(temp_0, s1, s2, s3, s4, s5, s6, s7, x_filter,
- zero, shift_round_0);
+ d0 = convolve8_8_x(s0, s1, s2, s3, s4, s5, s6, s7, x_filter,
+ round_offset_vec);
- res0 = vrshlq_s16(res0, horiz_const);
- res0 = vaddq_s16(res0, round_offset128);
+ vst1q_u16(d, d0);
- if (conv_params->do_average) {
- res8 = vld1q_u16(d_tmp);
- d_tmp += (dst_stride);
-
- compute_avg_8x1(res8, vreinterpretq_u16_s16(res0), fwd_offset,
- bck_offset, round_offset64, round_bits,
- use_dist_wtd_comp_avg, &t0);
-
- vst1_u8(d_u8, t0);
- d_u8 += (dst8_stride);
- } else {
- vst1q_u16(d_tmp, vreinterpretq_u16_s16(res0));
- d_tmp += (dst_stride);
- }
-
+ s0 = s8;
s += 8;
d += 8;
width -= 8;
- d_u8_tmp += 8;
- } while (width > 0);
+ } while (width != 0);
src_ptr += src_stride;
dst_ptr += dst_stride;
- dst_u8_ptr += dst8_stride;
height--;
-#endif
- } while (height > 0);
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
}
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
-void av1_dist_wtd_convolve_y_neon(const uint8_t *src, int src_stride,
+void av1_dist_wtd_convolve_x_neon(const uint8_t *src, int src_stride,
uint8_t *dst8, int dst8_stride, int w, int h,
- const InterpFilterParams *filter_params_y,
- const int subpel_y_qn,
+ const InterpFilterParams *filter_params_x,
+ const int subpel_x_qn,
ConvolveParams *conv_params) {
- assert(!(w % 4));
- assert(!(h % 4));
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_x_dist_wtd_avg_neon(src, src_stride, dst8, dst8_stride,
+ w, h, filter_params_x, subpel_x_qn,
+ conv_params);
+ } else {
+ dist_wtd_convolve_x_avg_neon(src, src_stride, dst8, dst8_stride, w, h,
+ filter_params_x, subpel_x_qn, conv_params);
+ }
+ } else {
+ dist_wtd_convolve_x_neon(src, src_stride, w, h, filter_params_x,
+ subpel_x_qn, conv_params);
+ }
+}
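
Splitting the kernel into _dist_wtd_avg, _avg and plain variants hoists the do_average / use_dist_wtd_comp_avg branches out of the pixel loops, so each specialized path runs branch-free inside. For reference, the compound constants repeated in each variant work out as below (a compile-time check, assuming libaom's 8-bit values FILTER_BITS = 7, ROUND0_BITS = 3, COMPOUND_ROUND1_BITS = 7):

#include <assert.h>

/* Worked values of the compound rounding constants for bd == 8. */
enum {
  kOffsetBits = 8 + 2 * 7 - 3, /* bd + 2 * FILTER_BITS - ROUND0_BITS = 19 */
  kRoundOffset = (1 << (kOffsetBits - 7)) + (1 << (kOffsetBits - 7 - 1)),
  kRoundBits = 2 * 7 - 3 - 7 /* 4 */
};
static_assert(kOffsetBits == 19, "offset_bits");
static_assert(kRoundOffset == 6144, "round_offset: 4096 + 2048");
static_assert(kRoundBits == 4, "round_bits");

So the conv buffer holds biased 15-bit values, and the averaging step removes the 6144 bias before narrowing back to 8 bits.
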
- CONV_BUF_TYPE *dst = conv_params->dst;
- const int dst_stride = conv_params->dst_stride;
- const int vert_offset = filter_params_y->taps / 2 - 1;
- const int bits = FILTER_BITS - conv_params->round_0;
+static INLINE uint16x4_t convolve6_4_y(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x8_t y_filter,
+ const int16x4_t round_offset) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ // Filter values at indices 0 and 7 are 0.
+ int16x4_t sum = vmul_lane_s16(s0, y_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s1, y_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s2, y_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s3, y_filter_4_7, 0);
+ sum = vmla_lane_s16(sum, s4, y_filter_4_7, 1);
+ sum = vmla_lane_s16(sum, s5, y_filter_4_7, 2);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x4_t res = vrsra_n_s16(round_offset, sum, ROUND0_BITS - 1);
+ return vreinterpret_u16_s16(res);
+}
+
+static INLINE uint16x8_t convolve6_8_y(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t y_filter,
+ const int16x8_t round_offset) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ // Filter values at indices 0 and 7 are 0.
+ int16x8_t sum = vmulq_lane_s16(s0, y_filter_0_3, 1);
+ sum = vmlaq_lane_s16(sum, s1, y_filter_0_3, 2);
+ sum = vmlaq_lane_s16(sum, s2, y_filter_0_3, 3);
+ sum = vmlaq_lane_s16(sum, s3, y_filter_4_7, 0);
+ sum = vmlaq_lane_s16(sum, s4, y_filter_4_7, 1);
+ sum = vmlaq_lane_s16(sum, s5, y_filter_4_7, 2);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x8_t res = vrsraq_n_s16(round_offset, sum, ROUND0_BITS - 1);
+ return vreinterpretq_u16_s16(res);
+}
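
convolve6_{4,8}_y exploit the kernel property noted in the comment: taps 0 and 7 of the 8-tap filter are zero, saving two multiply-accumulates and two rows of loads per output. A scalar sketch of the same computation (taps are the halved 8-tap kernel, as loaded by the callers):

#include <stdint.h>

/* Scalar sketch of the 6-tap special case: only taps 1..6 and six source
 * samples contribute. */
static int16_t convolve6_pixel(const int16_t s[6], const int16_t taps[8],
                               int16_t round_offset) {
  int32_t sum = 0;
  for (int k = 0; k < 6; k++) sum += s[k] * taps[k + 1];
  /* vrsra_n_s16(round_offset, sum, ROUND0_BITS - 1) with ROUND0_BITS == 3:
   * rounding right shift by 2, accumulated onto the offset. */
  return (int16_t)(round_offset + ((sum + 2) >> 2));
}
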
+
+static INLINE void dist_wtd_convolve_y_6tap_dist_wtd_avg_neon(
+ const uint8_t *src_ptr, int src_stride, uint8_t *dst8_ptr,
+ const int dst8_stride, int w, int h, const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
const int bd = 8;
- const int offset_bits = bd + 2 * FILTER_BITS - conv_params->round_0;
- const int round_offset = (1 << (offset_bits - conv_params->round_1)) +
- (1 << (offset_bits - conv_params->round_1 - 1));
- const int round_bits =
- 2 * FILTER_BITS - conv_params->round_0 - conv_params->round_1;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
const uint16_t fwd_offset = conv_params->fwd_offset;
const uint16_t bck_offset = conv_params->bck_offset;
- const int use_dist_wtd_comp_avg = conv_params->use_dist_wtd_comp_avg;
- const int shift_value = (conv_params->round_1 - 1 - bits);
- // vertical filter
- const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
- filter_params_y, subpel_y_qn & SUBPEL_MASK);
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
- const uint8_t *src_ptr = src - (vert_offset * src_stride);
-
- // Filter values are even, so downshift by 1 to reduce intermediate precision
- // requirements.
- const int16x8_t y_filter = vshrq_n_s16(vld1q_s16(y_filter_ptr), 1);
-
- const uint8_t *s;
- uint8_t *d_u8;
- uint8_t *dst_u8_ptr;
- CONV_BUF_TYPE *d, *dst_ptr;
- int width, height;
-
- s = src_ptr;
- dst_ptr = dst;
- dst_u8_ptr = dst8;
- width = w;
- height = h;
-
- // used to get rid of multiplication = (vertical filter output sum) *
- // (1<<bits).
- assert((conv_params->round_1 - 2) >= bits);
-
- if ((w == 4) || (h == 4)) {
- int16x4_t s0, s1, s2, s3, s4, s5, s6, s7, d0;
- uint16x4_t res4;
- uint32x2_t tu0 = vdup_n_u32(0), tu1 = vdup_n_u32(0), tu2 = vdup_n_u32(0),
- tu3 = vdup_n_u32(0);
- int16x8_t u0, u1, u2, u3;
- uint8x8_t t0;
-
-#if defined(__aarch64__)
- int16x4_t s8, s9, s10, d1, d2, d3;
- uint16x4_t res5, res6, res7;
- uint8x8_t t1;
-#endif
- const int16x4_t round_offset64 = vdup_n_s16(round_offset);
- const int16x4_t shift_vec = vdup_n_s16(-shift_value);
- const int16x4_t zero = vdup_n_s16(0);
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5;
+ uint16x4_t d0, dd0;
+ uint8x8_t t0, t1, t2, t3, t4, d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s6, s7, s8;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
do {
- s = src_ptr;
- d = dst_ptr;
- d_u8 = dst_u8_ptr;
- height = h;
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve6_4_y(s1, s2, s3, s4, s5, s6, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve6_4_y(s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve6_4_y(s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01, &d23);
+
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ d_u8 += 4 * dst8_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ dd0 = vld1_u16(d);
+
+ compute_dist_wtd_avg_4x1(dd0, d0, fwd_offset, bck_offset,
+ vget_low_s16(round_offset_vec), &d01);
+
+ store_u8_4x1(d_u8, d01, 0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ d_u8 += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 4;
+ dst_ptr += 4;
+ dst8_ptr += 4;
+ width -= 4;
+ } while (width != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5;
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8, t0, t1, t2, t3, t4;
+#if AOM_ARCH_AARCH64
+ int16x8_t s6, s7, s8, s9, s10, s11, s12;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8, t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr + (5 * src_stride);
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ load_u8_8x5(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+ d1 = convolve6_8_y(s1, s2, s3, s4, s5, s6, y_filter, round_offset_vec);
+ d2 = convolve6_8_y(s2, s3, s4, s5, s6, s7, y_filter, round_offset_vec);
+ d3 = convolve6_8_y(s3, s4, s5, s6, s7, s8, y_filter, round_offset_vec);
+ d4 = convolve6_8_y(s4, s5, s6, s7, s8, s9, y_filter, round_offset_vec);
+ d5 = convolve6_8_y(s5, s6, s7, s8, s9, s10, y_filter, round_offset_vec);
+ d6 =
+ convolve6_8_y(s6, s7, s8, s9, s10, s11, y_filter, round_offset_vec);
+ d7 = convolve6_8_y(s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
+
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
+
+ compute_dist_wtd_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7, fwd_offset,
+ bck_offset, round_offset_vec, &d4_u8, &d5_u8,
+ &d6_u8, &d7_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d4_u8, d5_u8, d6_u8, d7_u8);
+ d_u8 += 4 * dst8_stride;
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ s5 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+
+ dd0 = vld1q_u16(d);
+
+ compute_dist_wtd_avg_8x1(dd0, d0, fwd_offset, bck_offset,
+ round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
+
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ dst8_ptr += 8;
+ width -= 8;
+ } while (width != 0);
+ }
+}
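
Unlike the horizontal paths, these vertical loops need no transposes: a five-row window of source values stays live in registers, four (or eight) output rows are computed per iteration, and the s0 = s4 ... s4 = s8 moves slide the window so each input row is loaded exactly once. A scalar sketch of the rotation pattern (the plain sum stands in for the real 6-tap dot product):

#include <stdint.h>

/* Scalar analogue of the register rotation in the 6-tap vertical loops. */
static void slide_window_example(const int16_t *col, int n, int16_t *out) {
  int16_t s0 = col[0], s1 = col[1], s2 = col[2], s3 = col[3], s4 = col[4];
  for (int i = 5; i + 3 < n; i += 4) {
    const int16_t s5 = col[i], s6 = col[i + 1], s7 = col[i + 2],
                  s8 = col[i + 3];
    out[i - 5] = (int16_t)(s0 + s1 + s2 + s3 + s4 + s5);
    out[i - 4] = (int16_t)(s1 + s2 + s3 + s4 + s5 + s6);
    out[i - 3] = (int16_t)(s2 + s3 + s4 + s5 + s6 + s7);
    out[i - 2] = (int16_t)(s3 + s4 + s5 + s6 + s7 + s8);
    s0 = s4; /* slide the five-row window down by four rows */
    s1 = s5;
    s2 = s6;
    s3 = s7;
    s4 = s8;
  }
}
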
+
+static INLINE void dist_wtd_convolve_y_6tap_avg_neon(
+ const uint8_t *src_ptr, int src_stride, uint8_t *dst8_ptr,
+ const int dst8_stride, int w, int h, const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5;
+ uint16x4_t d0, dd0;
+ uint8x8_t t0, t1, t2, t3, t4, d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s6, s7, s8;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve6_4_y(s1, s2, s3, s4, s5, s6, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve6_4_y(s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve6_4_y(s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01, &d23);
+
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ d_u8 += 4 * dst8_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ dd0 = vld1_u16(d);
+
+ compute_basic_avg_4x1(dd0, d0, vget_low_s16(round_offset_vec), &d01);
+
+ store_u8_4x1(d_u8, d01, 0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ d_u8 += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 4;
+ dst_ptr += 4;
+ dst8_ptr += 4;
+ width -= 4;
+ } while (width != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5;
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8, t0, t1, t2, t3, t4;
+#if AOM_ARCH_AARCH64
+ int16x8_t s6, s7, s8, s9, s10, s11, s12;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8, t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr + (5 * src_stride);
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ load_u8_8x5(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+ d1 = convolve6_8_y(s1, s2, s3, s4, s5, s6, y_filter, round_offset_vec);
+ d2 = convolve6_8_y(s2, s3, s4, s5, s6, s7, y_filter, round_offset_vec);
+ d3 = convolve6_8_y(s3, s4, s5, s6, s7, s8, y_filter, round_offset_vec);
+ d4 = convolve6_8_y(s4, s5, s6, s7, s8, s9, y_filter, round_offset_vec);
+ d5 = convolve6_8_y(s5, s6, s7, s8, s9, s10, y_filter, round_offset_vec);
+ d6 =
+ convolve6_8_y(s6, s7, s8, s9, s10, s11, y_filter, round_offset_vec);
+ d7 = convolve6_8_y(s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
+
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
+
+ compute_basic_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7,
+ round_offset_vec, &d4_u8, &d5_u8, &d6_u8, &d7_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d4_u8, d5_u8, d6_u8, d7_u8);
+ d_u8 += 4 * dst8_stride;
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ s5 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+
+ dd0 = vld1q_u16(d);
+
+ compute_basic_avg_8x1(dd0, d0, round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
+
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ dst8_ptr += 8;
+ width -= 8;
+ } while (width != 0);
+ }
+}
+
+static INLINE void dist_wtd_convolve_y_6tap_neon(const uint8_t *src_ptr,
+ int src_stride, int w, int h,
+ const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5;
+ uint16x4_t d0;
+ uint8x8_t t0, t1, t2, t3, t4;
+#if AOM_ARCH_AARCH64
+ int16x4_t s6, s7, s8;
+ uint16x4_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+
+ s += 5 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve6_4_y(s1, s2, s3, s4, s5, s6, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve6_4_y(s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve6_4_y(s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ store_u16_4x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+
+ d0 = convolve6_4_y(s0, s1, s2, s3, s4, s5, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ vst1_u16(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 4;
+ dst_ptr += 4;
+ width -= 4;
+ } while (width != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5;
+ uint16x8_t d0;
+ uint8x8_t t0, t1, t2, t3, t4;
+#if AOM_ARCH_AARCH64
+ int16x8_t s6, s7, s8, s9, s10, s11, s12;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7;
+ uint8x8_t t5, t6, t7;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr + (5 * src_stride);
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ load_u8_8x5(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+ d1 = convolve6_8_y(s1, s2, s3, s4, s5, s6, y_filter, round_offset_vec);
+ d2 = convolve6_8_y(s2, s3, s4, s5, s6, s7, y_filter, round_offset_vec);
+ d3 = convolve6_8_y(s3, s4, s5, s6, s7, s8, y_filter, round_offset_vec);
+ d4 = convolve6_8_y(s4, s5, s6, s7, s8, s9, y_filter, round_offset_vec);
+ d5 = convolve6_8_y(s5, s6, s7, s8, s9, s10, y_filter, round_offset_vec);
+ d6 =
+ convolve6_8_y(s6, s7, s8, s9, s10, s11, y_filter, round_offset_vec);
+ d7 = convolve6_8_y(s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+
+ store_u16_8x8(d, dst_stride, d0, d1, d2, d3, d4, d5, d6, d7);
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ s5 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
+
+ d0 = convolve6_8_y(s0, s1, s2, s3, s4, s5, y_filter, round_offset_vec);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+
+ vst1q_u16(d, d0);
+
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ width -= 8;
+ } while (width != 0);
+ }
+}
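
The plain (no-average) variant writes biased 16-bit results into conv_params->dst rather than pixels; a later pass averages against that buffer. A sketch of the presumed two-pass calling pattern (the driver function here is illustrative, not libaom's actual caller, which sets up ConvolveParams via get_conv_params_no_round):

#include "av1/common/convolve.h"
#include "av1/common/filter.h"

/* Hypothetical two-pass compound flow for one block. */
static void compound_predict_y_sketch(
    const uint8_t *ref0, int ref0_stride, const uint8_t *ref1,
    int ref1_stride, uint8_t *dst8, int dst8_stride, int w, int h,
    const InterpFilterParams *filter_params_y, int subpel_y_qn,
    ConvolveParams *conv_params) {
  /* Pass 1: no averaging; biased 16-bit results land in conv_params->dst. */
  conv_params->do_average = 0;
  av1_dist_wtd_convolve_y_neon(ref0, ref0_stride, dst8, dst8_stride, w, h,
                               filter_params_y, subpel_y_qn, conv_params);
  /* Pass 2: convolve ref1, average against the buffer, write final pixels. */
  conv_params->do_average = 1;
  av1_dist_wtd_convolve_y_neon(ref1, ref1_stride, dst8, dst8_stride, w, h,
                               filter_params_y, subpel_y_qn, conv_params);
}
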
+
+static INLINE uint16x4_t convolve8_4_y(const int16x4_t s0, const int16x4_t s1,
+ const int16x4_t s2, const int16x4_t s3,
+ const int16x4_t s4, const int16x4_t s5,
+ const int16x4_t s6, const int16x4_t s7,
+ const int16x8_t y_filter,
+ const int16x4_t round_offset) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ int16x4_t sum = vmul_lane_s16(s0, y_filter_0_3, 0);
+ sum = vmla_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmla_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmla_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmla_lane_s16(sum, s4, y_filter_4_7, 0);
+ sum = vmla_lane_s16(sum, s5, y_filter_4_7, 1);
+ sum = vmla_lane_s16(sum, s6, y_filter_4_7, 2);
+ sum = vmla_lane_s16(sum, s7, y_filter_4_7, 3);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x4_t res = vrsra_n_s16(round_offset, sum, ROUND0_BITS - 1);
+ return vreinterpret_u16_s16(res);
+}
+
+static INLINE uint16x8_t convolve8_8_y(const int16x8_t s0, const int16x8_t s1,
+ const int16x8_t s2, const int16x8_t s3,
+ const int16x8_t s4, const int16x8_t s5,
+ const int16x8_t s6, const int16x8_t s7,
+ const int16x8_t y_filter,
+ const int16x8_t round_offset) {
+ const int16x4_t y_filter_0_3 = vget_low_s16(y_filter);
+ const int16x4_t y_filter_4_7 = vget_high_s16(y_filter);
+
+ int16x8_t sum = vmulq_lane_s16(s0, y_filter_0_3, 0);
+ sum = vmlaq_lane_s16(sum, s1, y_filter_0_3, 1);
+ sum = vmlaq_lane_s16(sum, s2, y_filter_0_3, 2);
+ sum = vmlaq_lane_s16(sum, s3, y_filter_0_3, 3);
+ sum = vmlaq_lane_s16(sum, s4, y_filter_4_7, 0);
+ sum = vmlaq_lane_s16(sum, s5, y_filter_4_7, 1);
+ sum = vmlaq_lane_s16(sum, s6, y_filter_4_7, 2);
+ sum = vmlaq_lane_s16(sum, s7, y_filter_4_7, 3);
+
+ // We halved the convolution filter values so -1 from the right shift.
+ int16x8_t res = vrsraq_n_s16(round_offset, sum, ROUND0_BITS - 1);
+ return vreinterpretq_u16_s16(res);
+}
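
convolve8_{4,8}_y are the full 8-tap equivalents. Because every AV1 filter tap is even, halving the taps and shortening the rounding shift by one is exact: with R = ROUND0_BITS,
(dot(s, taps) + (1 << (R - 1))) >> R == (dot(s, taps / 2) + (1 << (R - 2))) >> (R - 1).
A scalar sketch with R = 3, mirroring the vrsra_n_s16(..., ROUND0_BITS - 1) above:

#include <stdint.h>

/* Scalar sketch of the 8-tap vertical kernel operating on halved taps. */
static uint16_t convolve8_pixel_y(const int16_t s[8],
                                  const int16_t halved_taps[8],
                                  int16_t round_offset) {
  int32_t sum = 0;
  for (int k = 0; k < 8; k++) sum += s[k] * halved_taps[k];
  /* Rounding right shift by ROUND0_BITS - 1 == 2, accumulated onto the
   * compound offset; reinterpreted as unsigned, as in the NEON helper. */
  return (uint16_t)(int16_t)(round_offset + ((sum + 2) >> 2));
}
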
+
+static INLINE void dist_wtd_convolve_y_8tap_dist_wtd_avg_neon(
+ const uint8_t *src_ptr, int src_stride, uint8_t *dst8_ptr,
+ const int dst8_stride, int w, int h, const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ const uint16_t fwd_offset = conv_params->fwd_offset;
+ const uint16_t bck_offset = conv_params->bck_offset;
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t d0, dd0;
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
__builtin_prefetch(s + 0 * src_stride);
__builtin_prefetch(s + 1 * src_stride);
__builtin_prefetch(s + 2 * src_stride);
__builtin_prefetch(s + 3 * src_stride);
- load_unaligned_u8_4x8(s, src_stride, &tu0, &tu1, &tu2, &tu3);
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+ t5 = load_unaligned_u8_4x1(s + 5 * src_stride);
+ t6 = load_unaligned_u8_4x1(s + 6 * src_stride);
- u0 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu0)));
- u1 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu1)));
- u2 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu2)));
- u3 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu3)));
-
- s0 = vget_low_s16(u0);
- s1 = vget_high_s16(u0);
- s2 = vget_low_s16(u1);
- s3 = vget_high_s16(u1);
- s4 = vget_low_s16(u2);
- s5 = vget_high_s16(u2);
- s6 = vget_low_s16(u3);
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t5)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t6)));
__builtin_prefetch(d + 0 * dst_stride);
__builtin_prefetch(d + 1 * dst_stride);
__builtin_prefetch(d + 2 * dst_stride);
__builtin_prefetch(d + 3 * dst_stride);
- s += (7 * src_stride);
+ s += 7 * src_stride;
+
do {
-#if defined(__aarch64__)
- load_unaligned_u8_4x4(s, src_stride, &tu0, &tu1);
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
- u0 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu0)));
- u1 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu1)));
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s9 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s10 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
- s7 = vget_low_s16(u0);
- s8 = vget_high_s16(u0);
- s9 = vget_low_s16(u1);
- s10 = vget_high_s16(u1);
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ vget_low_s16(round_offset_vec));
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, zero,
- shift_vec);
- d0 = vadd_s16(d0, round_offset64);
- d1 = convolve8_4x4_s16(s1, s2, s3, s4, s5, s6, s7, s8, y_filter, zero,
- shift_vec);
- d1 = vadd_s16(d1, round_offset64);
- d2 = convolve8_4x4_s16(s2, s3, s4, s5, s6, s7, s8, s9, y_filter, zero,
- shift_vec);
- d2 = vadd_s16(d2, round_offset64);
- d3 = convolve8_4x4_s16(s3, s4, s5, s6, s7, s8, s9, s10, y_filter, zero,
- shift_vec);
- d3 = vadd_s16(d3, round_offset64);
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
- if (conv_params->do_average) {
- __builtin_prefetch(d + 0 * dst_stride);
- __builtin_prefetch(d + 1 * dst_stride);
- __builtin_prefetch(d + 2 * dst_stride);
- __builtin_prefetch(d + 3 * dst_stride);
+ __builtin_prefetch(d_u8 + 0 * dst8_stride);
+ __builtin_prefetch(d_u8 + 1 * dst8_stride);
+ __builtin_prefetch(d_u8 + 2 * dst8_stride);
+ __builtin_prefetch(d_u8 + 3 * dst8_stride);
- __builtin_prefetch(d_u8 + 0 * dst8_stride);
- __builtin_prefetch(d_u8 + 1 * dst8_stride);
- __builtin_prefetch(d_u8 + 2 * dst8_stride);
- __builtin_prefetch(d_u8 + 3 * dst8_stride);
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- load_u16_4x4(d, dst_stride, &res4, &res5, &res6, &res7);
- d += (dst_stride << 2);
+ compute_dist_wtd_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d01, &d23);
- compute_avg_4x4(res4, res5, res6, res7, vreinterpret_u16_s16(d0),
- vreinterpret_u16_s16(d1), vreinterpret_u16_s16(d2),
- vreinterpret_u16_s16(d3), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1);
-
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t0), 0);
- d_u8 += dst8_stride;
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t0), 1);
- d_u8 += dst8_stride;
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t1), 0);
- d_u8 += dst8_stride;
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t1), 1);
- d_u8 += dst8_stride;
- } else {
- store_u16_4x4(d, dst_stride, vreinterpret_u16_s16(d0),
- vreinterpret_u16_s16(d1), vreinterpret_u16_s16(d2),
- vreinterpret_u16_s16(d3));
- d += (dst_stride << 2);
- }
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
s0 = s4;
s1 = s5;
@@ -2301,35 +4594,25 @@
s4 = s8;
s5 = s9;
s6 = s10;
-
- s += (src_stride << 2);
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ d_u8 += 4 * dst8_stride;
height -= 4;
-#else
- load_unaligned_u8_4x1(s, src_stride, &tu0);
- u0 = vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(tu0)));
- s7 = vget_low_s16(u0);
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
- d0 = convolve8_4x4_s16(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, zero,
- shift_vec);
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
- d0 = vadd_s16(d0, round_offset64);
+ __builtin_prefetch(d);
- if (conv_params->do_average) {
- __builtin_prefetch(d);
+ dd0 = vld1_u16(d);
- res4 = vld1_u16(d);
- d += (dst_stride);
+ compute_dist_wtd_avg_4x1(dd0, d0, fwd_offset, bck_offset,
+ vget_low_s16(round_offset_vec), &d01);
- compute_avg_4x1(res4, vreinterpret_u16_s16(d0), fwd_offset,
- bck_offset, round_offset64, round_bits,
- use_dist_wtd_comp_avg, &t0);
-
- vst1_lane_u32((uint32_t *)d_u8, vreinterpret_u32_u8(t0), 0);
- d_u8 += dst8_stride;
- } else {
- vst1_u16(d, vreinterpret_u16_s16(d0));
- d += (dst_stride);
- }
+ store_u8_4x1(d_u8, d01, 0);
s0 = s1;
s1 = s2;
@@ -2338,43 +4621,42 @@
s4 = s5;
s5 = s6;
s6 = s7;
-
- s += (src_stride);
+ s += src_stride;
+ d += dst_stride;
+ d_u8 += dst8_stride;
height--;
-#endif
- } while (height > 0);
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
src_ptr += 4;
dst_ptr += 4;
- dst_u8_ptr += 4;
+ dst8_ptr += 4;
width -= 4;
- } while (width > 0);
+ } while (width != 0);
} else {
- CONV_BUF_TYPE *d_tmp;
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
- int16x8_t res0;
- uint16x8_t res8;
- uint8x8_t t0, t1, t2, t3, t4, t5, t6, t7;
- const int16x8_t round_offset128 = vdupq_n_s16(round_offset);
- const int16x8_t shift_vec = vdupq_n_s16(-shift_value);
- const int16x4_t round_offset64 = vdup_n_s16(round_offset);
- const int16x8_t zero = vdupq_n_s16(0);
-#if defined(__aarch64__)
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8, t0, t1, t2, t3, t4, t5, t6;
+#if AOM_ARCH_AARCH64
int16x8_t s8, s9, s10, s11, s12, s13, s14;
- int16x8_t res1, res2, res3, res4, res5, res6, res7;
- uint16x8_t res10, res11, res9;
-#endif
- dst_ptr = dst;
- dst_u8_ptr = dst8;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8, t7;
+#endif // AOM_ARCH_AARCH64
+
do {
- __builtin_prefetch(src_ptr + 0 * src_stride);
- __builtin_prefetch(src_ptr + 1 * src_stride);
- __builtin_prefetch(src_ptr + 2 * src_stride);
- __builtin_prefetch(src_ptr + 3 * src_stride);
- __builtin_prefetch(src_ptr + 4 * src_stride);
- __builtin_prefetch(src_ptr + 5 * src_stride);
- __builtin_prefetch(src_ptr + 6 * src_stride);
- __builtin_prefetch(src_ptr + 7 * src_stride);
- load_u8_8x8(src_ptr, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ __builtin_prefetch(s + 0 * src_stride);
+ __builtin_prefetch(s + 1 * src_stride);
+ __builtin_prefetch(s + 2 * src_stride);
+ __builtin_prefetch(s + 3 * src_stride);
+ __builtin_prefetch(s + 4 * src_stride);
+ __builtin_prefetch(s + 5 * src_stride);
+ __builtin_prefetch(s + 6 * src_stride);
+ __builtin_prefetch(s + 7 * src_stride);
+ load_u8_8x7(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
@@ -2384,13 +4666,10 @@
s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
- height = h;
- s = src_ptr + (7 * src_stride);
- d_tmp = dst_ptr;
- d_u8 = dst_u8_ptr;
+ s += 7 * src_stride;
do {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
@@ -2407,71 +4686,45 @@
__builtin_prefetch(dst_ptr + 2 * dst_stride);
__builtin_prefetch(dst_ptr + 3 * dst_stride);
- res0 = convolve8_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, zero,
- shift_vec);
- res0 = vaddq_s16(res0, round_offset128);
- res1 = convolve8_8x8_s16(s1, s2, s3, s4, s5, s6, s7, s8, y_filter, zero,
- shift_vec);
- res1 = vaddq_s16(res1, round_offset128);
- res2 = convolve8_8x8_s16(s2, s3, s4, s5, s6, s7, s8, s9, y_filter, zero,
- shift_vec);
- res2 = vaddq_s16(res2, round_offset128);
- res3 = convolve8_8x8_s16(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
- zero, shift_vec);
- res3 = vaddq_s16(res3, round_offset128);
- res4 = convolve8_8x8_s16(s4, s5, s6, s7, s8, s9, s10, s11, y_filter,
- zero, shift_vec);
- res4 = vaddq_s16(res4, round_offset128);
- res5 = convolve8_8x8_s16(s5, s6, s7, s8, s9, s10, s11, s12, y_filter,
- zero, shift_vec);
- res5 = vaddq_s16(res5, round_offset128);
- res6 = convolve8_8x8_s16(s6, s7, s8, s9, s10, s11, s12, s13, y_filter,
- zero, shift_vec);
- res6 = vaddq_s16(res6, round_offset128);
- res7 = convolve8_8x8_s16(s7, s8, s9, s10, s11, s12, s13, s14, y_filter,
- zero, shift_vec);
- res7 = vaddq_s16(res7, round_offset128);
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
+ d1 = convolve8_8_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ round_offset_vec);
+ d2 = convolve8_8_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ round_offset_vec);
+ d3 = convolve8_8_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ round_offset_vec);
+ d4 = convolve8_8_y(s4, s5, s6, s7, s8, s9, s10, s11, y_filter,
+ round_offset_vec);
+ d5 = convolve8_8_y(s5, s6, s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+ d6 = convolve8_8_y(s6, s7, s8, s9, s10, s11, s12, s13, y_filter,
+ round_offset_vec);
+ d7 = convolve8_8_y(s7, s8, s9, s10, s11, s12, s13, s14, y_filter,
+ round_offset_vec);
- if (conv_params->do_average) {
- __builtin_prefetch(d_tmp + 0 * dst8_stride);
- __builtin_prefetch(d_tmp + 1 * dst8_stride);
- __builtin_prefetch(d_tmp + 2 * dst8_stride);
- __builtin_prefetch(d_tmp + 3 * dst8_stride);
+ __builtin_prefetch(d + 0 * dst8_stride);
+ __builtin_prefetch(d + 1 * dst8_stride);
+ __builtin_prefetch(d + 2 * dst8_stride);
+ __builtin_prefetch(d + 3 * dst8_stride);
- load_u16_8x4(d_tmp, dst_stride, &res8, &res9, &res10, &res11);
- d_tmp += (dst_stride << 2);
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
- compute_avg_8x4(res8, res9, res10, res11, vreinterpretq_u16_s16(res0),
- vreinterpretq_u16_s16(res1),
- vreinterpretq_u16_s16(res2),
- vreinterpretq_u16_s16(res3), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1, &t2, &t3);
+ compute_dist_wtd_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3, fwd_offset,
+ bck_offset, round_offset_vec, &d0_u8, &d1_u8,
+ &d2_u8, &d3_u8);
- store_u8_8x4(d_u8, dst8_stride, t0, t1, t2, t3);
- d_u8 += (dst8_stride << 2);
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
- load_u16_8x4(d_tmp, dst_stride, &res8, &res9, &res10, &res11);
- d_tmp += (dst_stride << 2);
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
- compute_avg_8x4(res8, res9, res10, res11, vreinterpretq_u16_s16(res4),
- vreinterpretq_u16_s16(res5),
- vreinterpretq_u16_s16(res6),
- vreinterpretq_u16_s16(res7), fwd_offset, bck_offset,
- round_offset64, round_bits, use_dist_wtd_comp_avg,
- &t0, &t1, &t2, &t3);
+ compute_dist_wtd_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7, fwd_offset,
+ bck_offset, round_offset_vec, &d4_u8, &d5_u8,
+ &d6_u8, &d7_u8);
- store_u8_8x4(d_u8, dst8_stride, t0, t1, t2, t3);
- d_u8 += (dst8_stride << 2);
- } else {
- store_u16_8x8(
- d_tmp, dst_stride, vreinterpretq_u16_s16(res0),
- vreinterpretq_u16_s16(res1), vreinterpretq_u16_s16(res2),
- vreinterpretq_u16_s16(res3), vreinterpretq_u16_s16(res4),
- vreinterpretq_u16_s16(res5), vreinterpretq_u16_s16(res6),
- vreinterpretq_u16_s16(res7));
- d_tmp += (dst_stride << 3);
- }
+ store_u8_8x4(d_u8, dst8_stride, d4_u8, d5_u8, d6_u8, d7_u8);
+ d_u8 += 4 * dst8_stride;
s0 = s8;
s1 = s9;
@@ -2480,16 +4733,16 @@
s4 = s12;
s5 = s13;
s6 = s14;
- s += (8 * src_stride);
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
height -= 8;
-#else
+#else // !AOM_ARCH_AARCH64
s7 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
__builtin_prefetch(dst_ptr);
- res0 = convolve8_8x8_s16(s0, s1, s2, s3, s4, s5, s6, s7, y_filter, zero,
- shift_vec);
- res0 = vaddq_s16(res0, round_offset128);
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
s0 = s1;
s1 = s2;
@@ -2499,31 +4752,585 @@
s5 = s6;
s6 = s7;
- if (conv_params->do_average) {
- __builtin_prefetch(d_tmp);
+ __builtin_prefetch(d);
- res8 = vld1q_u16(d_tmp);
- d_tmp += (dst_stride);
+ dd0 = vld1q_u16(d);
- compute_avg_8x1(res8, vreinterpretq_u16_s16(res0), fwd_offset,
- bck_offset, round_offset64, round_bits,
- use_dist_wtd_comp_avg, &t0);
+ compute_dist_wtd_avg_8x1(dd0, d0, fwd_offset, bck_offset,
+ round_offset_vec, &d0_u8);
- vst1_u8(d_u8, t0);
- d_u8 += (dst8_stride);
- } else {
- vst1q_u16(d_tmp, vreinterpretq_u16_s16(res0));
- d_tmp += dst_stride;
- }
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
- s += (src_stride);
+ s += src_stride;
+ d += dst_stride;
height--;
-#endif
- } while (height > 0);
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
src_ptr += 8;
dst_ptr += 8;
- dst_u8_ptr += 8;
+ dst8_ptr += 8;
width -= 8;
- } while (width > 0);
+ } while (width != 0);
+ }
+}
+
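The compute_dist_wtd_avg_* helpers used above blend the stored first prediction with the newly convolved one using the distance-derived weights. A scalar sketch, assuming AV1's DIST_PRECISION_BITS == 4 (so fwd_offset + bck_offset == 16) and this function's constants (round_bits = 2 * FILTER_BITS - ROUND0_BITS - COMPOUND_ROUND1_BITS = 4):

#include <stdint.h>

// Scalar model of the distance-weighted compound average; dd is the stored
// prediction, d the fresh one, both still carrying round_offset.
static uint8_t dist_wtd_avg_sketch(uint16_t dd, uint16_t d,
                                   uint16_t fwd_offset, uint16_t bck_offset,
                                   int16_t round_offset) {
  int32_t tmp = ((int32_t)dd * fwd_offset + (int32_t)d * bck_offset) >> 4;
  tmp -= round_offset;          // remove the intermediate-precision offset
  tmp = (tmp + (1 << 3)) >> 4;  // ROUND_POWER_OF_TWO(tmp, round_bits = 4)
  return (uint8_t)(tmp < 0 ? 0 : (tmp > 255 ? 255 : tmp));  // clip_pixel()
}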
+static INLINE void dist_wtd_convolve_y_8tap_avg_neon(
+ const uint8_t *src_ptr, int src_stride, uint8_t *dst8_ptr,
+ const int dst8_stride, int w, int h, const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t d0, dd0;
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6, d01;
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10;
+ uint16x4_t d1, d2, d3, dd1, dd2, dd3;
+ uint8x8_t d23;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ __builtin_prefetch(s + 0 * src_stride);
+ __builtin_prefetch(s + 1 * src_stride);
+ __builtin_prefetch(s + 2 * src_stride);
+ __builtin_prefetch(s + 3 * src_stride);
+
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+ t5 = load_unaligned_u8_4x1(s + 5 * src_stride);
+ t6 = load_unaligned_u8_4x1(s + 6 * src_stride);
+
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t5)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t6)));
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s9 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s10 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ __builtin_prefetch(d_u8 + 0 * dst8_stride);
+ __builtin_prefetch(d_u8 + 1 * dst8_stride);
+ __builtin_prefetch(d_u8 + 2 * dst8_stride);
+ __builtin_prefetch(d_u8 + 3 * dst8_stride);
+
+ load_u16_4x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_4x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d01, &d23);
+
+ store_u8_4x1(d_u8 + 0 * dst8_stride, d01, 0);
+ store_u8_4x1(d_u8 + 1 * dst8_stride, d01, 1);
+ store_u8_4x1(d_u8 + 2 * dst8_stride, d23, 0);
+ store_u8_4x1(d_u8 + 3 * dst8_stride, d23, 1);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ d_u8 += 4 * dst8_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ __builtin_prefetch(d);
+
+ dd0 = vld1_u16(d);
+
+ compute_basic_avg_4x1(dd0, d0, vget_low_s16(round_offset_vec), &d01);
+
+ store_u8_4x1(d_u8, d01, 0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ s += src_stride;
+ d += dst_stride;
+ d_u8 += dst8_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 4;
+ dst_ptr += 4;
+ dst8_ptr += 4;
+ width -= 4;
+ } while (width != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0, dd0;
+ uint8x8_t d0_u8, t0, t1, t2, t3, t4, t5, t6;
+#if AOM_ARCH_AARCH64
+ int16x8_t s8, s9, s10, s11, s12, s13, s14;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7, dd1, dd2, dd3, dd4, dd5, dd6, dd7;
+ uint8x8_t d1_u8, d2_u8, d3_u8, d4_u8, d5_u8, d6_u8, d7_u8, t7;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ uint8_t *d_u8 = dst8_ptr;
+ int height = h;
+
+ __builtin_prefetch(s + 0 * src_stride);
+ __builtin_prefetch(s + 1 * src_stride);
+ __builtin_prefetch(s + 2 * src_stride);
+ __builtin_prefetch(s + 3 * src_stride);
+ __builtin_prefetch(s + 4 * src_stride);
+ __builtin_prefetch(s + 5 * src_stride);
+ __builtin_prefetch(s + 6 * src_stride);
+ __builtin_prefetch(s + 7 * src_stride);
+ load_u8_8x7(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ __builtin_prefetch(dst_ptr + 0 * dst_stride);
+ __builtin_prefetch(dst_ptr + 1 * dst_stride);
+ __builtin_prefetch(dst_ptr + 2 * dst_stride);
+ __builtin_prefetch(dst_ptr + 3 * dst_stride);
+
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
+ d1 = convolve8_8_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ round_offset_vec);
+ d2 = convolve8_8_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ round_offset_vec);
+ d3 = convolve8_8_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ round_offset_vec);
+ d4 = convolve8_8_y(s4, s5, s6, s7, s8, s9, s10, s11, y_filter,
+ round_offset_vec);
+ d5 = convolve8_8_y(s5, s6, s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+ d6 = convolve8_8_y(s6, s7, s8, s9, s10, s11, s12, s13, y_filter,
+ round_offset_vec);
+ d7 = convolve8_8_y(s7, s8, s9, s10, s11, s12, s13, s14, y_filter,
+ round_offset_vec);
+
+ __builtin_prefetch(d + 0 * dst8_stride);
+ __builtin_prefetch(d + 1 * dst8_stride);
+ __builtin_prefetch(d + 2 * dst8_stride);
+ __builtin_prefetch(d + 3 * dst8_stride);
+
+ load_u16_8x4(d, dst_stride, &dd0, &dd1, &dd2, &dd3);
+
+ compute_basic_avg_8x4(dd0, dd1, dd2, dd3, d0, d1, d2, d3,
+ round_offset_vec, &d0_u8, &d1_u8, &d2_u8, &d3_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d0_u8, d1_u8, d2_u8, d3_u8);
+ d_u8 += 4 * dst8_stride;
+
+ load_u16_8x4(d + 4 * dst_stride, dst_stride, &dd4, &dd5, &dd6, &dd7);
+
+ compute_basic_avg_8x4(dd4, dd5, dd6, dd7, d4, d5, d6, d7,
+ round_offset_vec, &d4_u8, &d5_u8, &d6_u8, &d7_u8);
+
+ store_u8_8x4(d_u8, dst8_stride, d4_u8, d5_u8, d6_u8, d7_u8);
+ d_u8 += 4 * dst8_stride;
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s5 = s13;
+ s6 = s14;
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ s7 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
+
+ __builtin_prefetch(dst_ptr);
+
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+
+ __builtin_prefetch(d);
+
+ dd0 = vld1q_u16(d);
+
+ compute_basic_avg_8x1(dd0, d0, round_offset_vec, &d0_u8);
+
+ vst1_u8(d_u8, d0_u8);
+ d_u8 += dst8_stride;
+
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ dst8_ptr += 8;
+ width -= 8;
+ } while (width != 0);
+ }
+}
+
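compute_basic_avg_* is presumed to be the unweighted counterpart of the dist-weighted helper sketched earlier: a plain mean replaces the weighted sum, with the same offset removal and final rounding.

#include <stdint.h>

// Scalar model of the basic compound average (same assumed constants as the
// dist-weighted sketch above).
static uint8_t basic_avg_sketch(uint16_t dd, uint16_t d, int16_t round_offset) {
  int32_t tmp = ((int32_t)dd + d) >> 1;  // plain mean of the two predictions
  tmp -= round_offset;
  tmp = (tmp + (1 << 3)) >> 4;           // ROUND_POWER_OF_TWO(tmp, 4)
  return (uint8_t)(tmp < 0 ? 0 : (tmp > 255 ? 255 : tmp));
}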
+static INLINE void dist_wtd_convolve_y_8tap_neon(const uint8_t *src_ptr,
+ int src_stride, int w, int h,
+ const int16x8_t y_filter,
+ ConvolveParams *conv_params) {
+ const int bd = 8;
+ const int offset_bits = bd + 2 * FILTER_BITS - ROUND0_BITS;
+ const int16_t round_offset = (1 << (offset_bits - COMPOUND_ROUND1_BITS)) +
+ (1 << (offset_bits - COMPOUND_ROUND1_BITS - 1));
+ const int16x8_t round_offset_vec = vdupq_n_s16(round_offset);
+
+ CONV_BUF_TYPE *dst_ptr = conv_params->dst;
+ const int dst_stride = conv_params->dst_stride;
+ int width = w;
+
+ if (w == 4 || h == 4) {
+ int16x4_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x4_t d0;
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6;
+#if AOM_ARCH_AARCH64
+ int16x4_t s8, s9, s10;
+ uint16x4_t d1, d2, d3;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ __builtin_prefetch(s + 0 * src_stride);
+ __builtin_prefetch(s + 1 * src_stride);
+ __builtin_prefetch(s + 2 * src_stride);
+ __builtin_prefetch(s + 3 * src_stride);
+
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+ t4 = load_unaligned_u8_4x1(s + 4 * src_stride);
+ t5 = load_unaligned_u8_4x1(s + 5 * src_stride);
+ t6 = load_unaligned_u8_4x1(s + 6 * src_stride);
+
+ s0 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s1 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s2 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s3 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+ s4 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t4)));
+ s5 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t5)));
+ s6 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t6)));
+
+ __builtin_prefetch(d + 0 * dst_stride);
+ __builtin_prefetch(d + 1 * dst_stride);
+ __builtin_prefetch(d + 2 * dst_stride);
+ __builtin_prefetch(d + 3 * dst_stride);
+
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s + 0 * src_stride);
+ t1 = load_unaligned_u8_4x1(s + 1 * src_stride);
+ t2 = load_unaligned_u8_4x1(s + 2 * src_stride);
+ t3 = load_unaligned_u8_4x1(s + 3 * src_stride);
+
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+ s8 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t1)));
+ s9 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t2)));
+ s10 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t3)));
+
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+ d1 = convolve8_4_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ vget_low_s16(round_offset_vec));
+ d2 = convolve8_4_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ vget_low_s16(round_offset_vec));
+ d3 = convolve8_4_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ store_u16_4x4(d, dst_stride, d0, d1, d2, d3);
+
+ s0 = s4;
+ s1 = s5;
+ s2 = s6;
+ s3 = s7;
+ s4 = s8;
+ s5 = s9;
+ s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
+ height -= 4;
+#else // !AOM_ARCH_AARCH64
+ t0 = load_unaligned_u8_4x1(s);
+ s7 = vreinterpret_s16_u16(vget_low_u16(vmovl_u8(t0)));
+
+ d0 = convolve8_4_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ vget_low_s16(round_offset_vec));
+
+ vst1_u16(d, d0);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 4;
+ dst_ptr += 4;
+ width -= 4;
+ } while (width != 0);
+ } else {
+ int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
+ uint16x8_t d0;
+ uint8x8_t t0, t1, t2, t3, t4, t5, t6;
+#if AOM_ARCH_AARCH64
+ int16x8_t s8, s9, s10, s11, s12, s13, s14;
+ uint16x8_t d1, d2, d3, d4, d5, d6, d7;
+ uint8x8_t t7;
+#endif // AOM_ARCH_AARCH64
+
+ do {
+ const uint8_t *s = src_ptr;
+ CONV_BUF_TYPE *d = dst_ptr;
+ int height = h;
+
+ __builtin_prefetch(s + 0 * src_stride);
+ __builtin_prefetch(s + 1 * src_stride);
+ __builtin_prefetch(s + 2 * src_stride);
+ __builtin_prefetch(s + 3 * src_stride);
+ __builtin_prefetch(s + 4 * src_stride);
+ __builtin_prefetch(s + 5 * src_stride);
+ __builtin_prefetch(s + 6 * src_stride);
+ __builtin_prefetch(s + 7 * src_stride);
+ load_u8_8x7(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6);
+
+ s0 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s1 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s2 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s3 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s4 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s5 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s6 = vreinterpretq_s16_u16(vmovl_u8(t6));
+
+ s += 7 * src_stride;
+
+ do {
+#if AOM_ARCH_AARCH64
+ load_u8_8x8(s, src_stride, &t0, &t1, &t2, &t3, &t4, &t5, &t6, &t7);
+
+ s7 = vreinterpretq_s16_u16(vmovl_u8(t0));
+ s8 = vreinterpretq_s16_u16(vmovl_u8(t1));
+ s9 = vreinterpretq_s16_u16(vmovl_u8(t2));
+ s10 = vreinterpretq_s16_u16(vmovl_u8(t3));
+ s11 = vreinterpretq_s16_u16(vmovl_u8(t4));
+ s12 = vreinterpretq_s16_u16(vmovl_u8(t5));
+ s13 = vreinterpretq_s16_u16(vmovl_u8(t6));
+ s14 = vreinterpretq_s16_u16(vmovl_u8(t7));
+
+ __builtin_prefetch(dst_ptr + 0 * dst_stride);
+ __builtin_prefetch(dst_ptr + 1 * dst_stride);
+ __builtin_prefetch(dst_ptr + 2 * dst_stride);
+ __builtin_prefetch(dst_ptr + 3 * dst_stride);
+
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
+ d1 = convolve8_8_y(s1, s2, s3, s4, s5, s6, s7, s8, y_filter,
+ round_offset_vec);
+ d2 = convolve8_8_y(s2, s3, s4, s5, s6, s7, s8, s9, y_filter,
+ round_offset_vec);
+ d3 = convolve8_8_y(s3, s4, s5, s6, s7, s8, s9, s10, y_filter,
+ round_offset_vec);
+ d4 = convolve8_8_y(s4, s5, s6, s7, s8, s9, s10, s11, y_filter,
+ round_offset_vec);
+ d5 = convolve8_8_y(s5, s6, s7, s8, s9, s10, s11, s12, y_filter,
+ round_offset_vec);
+ d6 = convolve8_8_y(s6, s7, s8, s9, s10, s11, s12, s13, y_filter,
+ round_offset_vec);
+ d7 = convolve8_8_y(s7, s8, s9, s10, s11, s12, s13, s14, y_filter,
+ round_offset_vec);
+
+ store_u16_8x8(d, dst_stride, d0, d1, d2, d3, d4, d5, d6, d7);
+
+ s0 = s8;
+ s1 = s9;
+ s2 = s10;
+ s3 = s11;
+ s4 = s12;
+ s5 = s13;
+ s6 = s14;
+ s += 8 * src_stride;
+ d += 8 * dst_stride;
+ height -= 8;
+#else // !AOM_ARCH_AARCH64
+ s7 = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(s)));
+
+ __builtin_prefetch(dst_ptr);
+
+ d0 = convolve8_8_y(s0, s1, s2, s3, s4, s5, s6, s7, y_filter,
+ round_offset_vec);
+
+ s0 = s1;
+ s1 = s2;
+ s2 = s3;
+ s3 = s4;
+ s4 = s5;
+ s5 = s6;
+ s6 = s7;
+
+ vst1q_u16(d, d0);
+
+ s += src_stride;
+ d += dst_stride;
+ height--;
+#endif // AOM_ARCH_AARCH64
+ } while (height != 0);
+ src_ptr += 8;
+ dst_ptr += 8;
+ width -= 8;
+ } while (width != 0);
+ }
+}
+
+void av1_dist_wtd_convolve_y_neon(const uint8_t *src, int src_stride,
+ uint8_t *dst8, int dst8_stride, int w, int h,
+ const InterpFilterParams *filter_params_y,
+ const int subpel_y_qn,
+ ConvolveParams *conv_params) {
+ assert(w % 4 == 0);
+ assert(h % 4 == 0);
+
+ // Vertical filter.
+ const int16_t *y_filter_ptr = av1_get_interp_filter_subpel_kernel(
+ filter_params_y, subpel_y_qn & SUBPEL_MASK);
+ // Filter values are even, so downshift by 1 to reduce intermediate
+ // precision requirements.
+ const int16x8_t y_filter = vshrq_n_s16(vld1q_s16(y_filter_ptr), 1);
+
+ const int vert_offset = filter_params_y->taps / 2 - 1;
+ const uint8_t *src_ptr = src - (vert_offset * src_stride);
+
+ if (get_filter_tap(filter_params_y, subpel_y_qn) <= 6) {
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_y_6tap_dist_wtd_avg_neon(
+ src_ptr + src_stride, src_stride, dst8, dst8_stride, w, h, y_filter,
+ conv_params);
+ } else {
+ dist_wtd_convolve_y_6tap_avg_neon(src_ptr + src_stride, src_stride,
+ dst8, dst8_stride, w, h, y_filter,
+ conv_params);
+ }
+ } else {
+ dist_wtd_convolve_y_6tap_neon(src_ptr + src_stride, src_stride, w, h,
+ y_filter, conv_params);
+ }
+ } else {
+ if (conv_params->do_average) {
+ if (UNLIKELY(conv_params->use_dist_wtd_comp_avg)) {
+ dist_wtd_convolve_y_8tap_dist_wtd_avg_neon(src_ptr, src_stride, dst8,
+ dst8_stride, w, h, y_filter,
+ conv_params);
+ } else {
+ dist_wtd_convolve_y_8tap_avg_neon(src_ptr, src_stride, dst8,
+ dst8_stride, w, h, y_filter,
+ conv_params);
+ }
+ } else {
+ dist_wtd_convolve_y_8tap_neon(src_ptr, src_stride, w, h, y_filter,
+ conv_params);
+ }
}
}
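Dispatch note on the entry point above: the filter taps are halved once up front (safe because AV1 subpel filter coefficients are all even), and kernels with at most 6 taps start one row later (src_ptr + src_stride) since the outermost taps of the 8-tap array are zero. With bd = 8 the shared constants work out as follows (a worked sketch assuming FILTER_BITS = 7, ROUND0_BITS = 3, COMPOUND_ROUND1_BITS = 7):

// offset_bits  = 8 + 2 * 7 - 3         = 19
// round_offset = (1 << 12) + (1 << 11) = 6144
// round_bits   = 2 * 7 - 3 - 7         = 4  (final shift back to 8 bits)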
diff --git a/av1/common/arm/reconintra_neon.c b/av1/common/arm/reconintra_neon.c
index 43c470f..8d190fb 100644
--- a/av1/common/arm/reconintra_neon.c
+++ b/av1/common/arm/reconintra_neon.c
@@ -12,6 +12,8 @@
#include <arm_neon.h>
#include <assert.h>
+#include "config/aom_config.h"
+
#include "aom/aom_integer.h"
#include "aom_dsp/arm/sum_neon.h"
@@ -126,7 +128,7 @@
out_45 = vmlaq_s16(out_45, vreinterpretq_s16_u16(p_b_hi), f5f4_hi);
int16x8_t out_67 = vmulq_s16(vreinterpretq_s16_u16(p_b_lo), f7f6_lo);
out_67 = vmlaq_s16(out_67, vreinterpretq_s16_u16(p_b_hi), f7f6_hi);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const int16x8_t out_0123 = vpaddq_s16(out_01, out_23);
const int16x8_t out_4567 = vpaddq_s16(out_45, out_67);
const int16x8_t out_01234567 = vpaddq_s16(out_0123, out_4567);
@@ -137,7 +139,7 @@
vqmovn_s32(vpaddlq_s16(out_67)));
const int16x8_t out_01234567 = vcombine_s16(
vqmovn_s32(vpaddlq_s16(out_0123)), vqmovn_s32(vpaddlq_s16(out_4567)));
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
const uint32x2_t out_r =
vreinterpret_u32_u8(vqmovun_s16(vrshrq_n_s16(out_01234567, 4)));
// Storing
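vpaddq_s16() is AArch64-only; the #else branch reproduces the reduction on 32-bit ARM by widening pairwise (vpaddlq_s16) and saturating-narrowing back (vqmovn_s32). A scalar model of the 128-bit pairwise add:

#include <stdint.h>

// Scalar model of vpaddq_s16: out[0..3] are adjacent pairs of a, out[4..7]
// adjacent pairs of b (wrapping add, as in the intrinsic).
static void vpaddq_s16_model(const int16_t a[8], const int16_t b[8],
                             int16_t out[8]) {
  for (int i = 0; i < 4; i++) out[i] = (int16_t)(a[2 * i] + a[2 * i + 1]);
  for (int i = 0; i < 4; i++) out[i + 4] = (int16_t)(b[2 * i] + b[2 * i + 1]);
}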
diff --git a/av1/common/arm/resize_neon.c b/av1/common/arm/resize_neon.c
index df4f7d6..5f6d214 100644
--- a/av1/common/arm/resize_neon.c
+++ b/av1/common/arm/resize_neon.c
@@ -783,7 +783,7 @@
const int buffer_height = (4 * dst_y_h + SUBPEL_TAPS - 2 + 7) & ~7;
uint8_t *const temp_buffer =
(uint8_t *)malloc(buffer_stride * buffer_height);
- if (temp_buffer) {
+ if (!temp_buffer) {
malloc_failed = 1;
break;
}
diff --git a/av1/common/arm/selfguided_neon.c b/av1/common/arm/selfguided_neon.c
index f5eb36c..d14088e 100644
--- a/av1/common/arm/selfguided_neon.c
+++ b/av1/common/arm/selfguided_neon.c
@@ -1503,7 +1503,10 @@
{
int16_t *src_ptr;
uint8_t *dst_ptr;
+#if CONFIG_AV1_HIGHBITDEPTH
+ uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst8);
uint16_t *dst16_ptr;
+#endif
int16x4_t d0, d4;
int16x8_t r0, s0;
uint16x8_t r4;
@@ -1515,14 +1518,14 @@
const int32x4_t xq1_vec = vdupq_n_s32(xq[1]);
const int16x8_t zero = vdupq_n_s16(0);
const uint16x8_t max = vdupq_n_u16((1 << bit_depth) - 1);
- uint16_t *dst16 = CONVERT_TO_SHORTPTR(dst8);
- dst_ptr = dst8;
src_ptr = (int16_t *)dgd16;
do {
w = width;
count = 0;
dst_ptr = dst8 + rc * dst_stride;
+#if CONFIG_AV1_HIGHBITDEPTH
dst16_ptr = dst16 + rc * dst_stride;
+#endif
do {
s0 = vld1q_s16(src_ptr + count);
@@ -1565,19 +1568,20 @@
if (highbd) {
r4 = vminq_u16(r4, max);
vst1q_u16(dst16_ptr, r4);
+ dst16_ptr += 8;
} else {
t0 = vqmovn_u16(r4);
vst1_u8(dst_ptr, t0);
+ dst_ptr += 8;
}
#else
(void)max;
t0 = vqmovn_u16(r4);
vst1_u8(dst_ptr, t0);
+ dst_ptr += 8;
#endif
w -= 8;
count += 8;
- dst_ptr += 8;
- dst16_ptr += 8;
} while (w > 0);
src_ptr += dgd16_stride;
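Context for the new guard: in high-bitdepth builds dst8 is a disguised 16-bit pointer, recovered via CONVERT_TO_SHORTPTR, so the uint16_t locals are only meaningful when CONFIG_AV1_HIGHBITDEPTH is set; the per-branch pointer advances keep each pointer stepped only along its own store path. The assumed aom_dsp convention, shown for orientation only:

// Presumed aom_dsp pointer-punning macros; not part of this change.
// #define CONVERT_TO_SHORTPTR(x) ((uint16_t *)(((uintptr_t)(x)) << 1))
// #define CONVERT_TO_BYTEPTR(x) ((uint8_t *)(((uintptr_t)(x)) >> 1))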
diff --git a/av1/common/arm/warp_plane_neon.c b/av1/common/arm/warp_plane_neon.c
index 03b6db8..b4d3148 100644
--- a/av1/common/arm/warp_plane_neon.c
+++ b/av1/common/arm/warp_plane_neon.c
@@ -222,8 +222,7 @@
int16x8_t *tmp_dst, int sx, int alpha,
int k, const int offset_bits_horiz,
const int reduce_bits_horiz) {
- const uint8x16_t mask = { 255, 0, 255, 0, 255, 0, 255, 0,
- 255, 0, 255, 0, 255, 0, 255, 0 };
+ const uint8x16_t mask = vreinterpretq_u8_u16(vdupq_n_u16(0x00ff));
const int32x4_t add_const = vdupq_n_s32((int32_t)(1 << offset_bits_horiz));
const int16x8_t shift = vdupq_n_s16(-(int16_t)reduce_bits_horiz);
@@ -488,9 +487,9 @@
int32x4_t res_lo, res_hi;
int16x8_t result_final;
uint8x16_t src_1, src_2, src_3, src_4;
- uint8x16_t indx_vec = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
- };
+ static const uint8_t k0To15[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
+ 8, 9, 10, 11, 12, 13, 14, 15 };
+ uint8x16_t indx_vec = vld1q_u8(k0To15);
uint8x16_t cmp_vec;
const int reduce_bits_horiz = conv_params->round_0;
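Both warp_plane hunks replace brace-initialized NEON vector literals with intrinsic-built values (vdupq_n_u16, vld1q_u8 of a static table); brace initialization of NEON vector types is a GCC/Clang extension, so this is presumably a portability cleanup. A little-endian equivalence sketch for the first constant:

#include <arm_neon.h>
#include <stdint.h>

// Each u16 lane 0x00ff maps to the byte pair { 0xff, 0x00 }, i.e. the old
// literal { 255, 0, 255, 0, ... } on little-endian targets.
static void check_mask(uint8_t lanes[16]) {
  vst1q_u8(lanes, vreinterpretq_u8_u16(vdupq_n_u16(0x00ff)));
}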
diff --git a/av1/common/arm/wiener_convolve_neon.c b/av1/common/arm/wiener_convolve_neon.c
index 0a12c88..d7f511d 100644
--- a/av1/common/arm/wiener_convolve_neon.c
+++ b/av1/common/arm/wiener_convolve_neon.c
@@ -153,7 +153,7 @@
height = intermediate_height;
  // For aarch64.
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
int processed_height = 0;
uint16_t *d_tmp;
int width, remaining_height;
@@ -248,21 +248,11 @@
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
uint8x8_t t0;
s = src_tmp_ptr;
- s0 = vld1q_s16(s);
- s += src_stride;
- s1 = vld1q_s16(s);
- s += src_stride;
- s2 = vld1q_s16(s);
- s += src_stride;
- s3 = vld1q_s16(s);
- s += src_stride;
- s4 = vld1q_s16(s);
- s += src_stride;
- s5 = vld1q_s16(s);
- s += src_stride;
- s6 = vld1q_s16(s);
- s += src_stride;
d = dst_tmp_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
height = h;
do {
@@ -273,14 +263,7 @@
__builtin_prefetch(dst_tmp_ptr + 2 * dst_stride);
__builtin_prefetch(dst_tmp_ptr + 3 * dst_stride);
- s7 = vld1q_s16(s);
- s += src_stride;
- s8 = vld1q_s16(s);
- s += src_stride;
- s9 = vld1q_s16(s);
- s += src_stride;
- s10 = vld1q_s16(s);
- s += src_stride;
+ load_s16_8x4(s, src_stride, &s7, &s8, &s9, &s10);
t0 = wiener_convolve8_vert_4x8(s0, s1, s2, s3, s4, s5, s6, filter_y_tmp,
bd, conv_params->round_1);
@@ -291,14 +274,7 @@
t3 = wiener_convolve8_vert_4x8(s3, s4, s5, s6, s7, s8, s9, filter_y_tmp,
bd, conv_params->round_1);
- vst1_u8(d, t0);
- d += dst_stride;
- vst1_u8(d, t1);
- d += dst_stride;
- vst1_u8(d, t2);
- d += dst_stride;
- vst1_u8(d, t3);
- d += dst_stride;
+ store_u8_8x4(d, dst_stride, t0, t1, t2, t3);
s0 = s4;
s1 = s5;
@@ -307,6 +283,8 @@
s4 = s8;
s5 = s9;
s6 = s10;
+ s += 4 * src_stride;
+ d += 4 * dst_stride;
height -= 4;
} while (height > 3);
@@ -336,21 +314,11 @@
uint8x8_t t0;
int16x8_t s0, s1, s2, s3, s4, s5, s6, s7;
s = src_tmp_ptr;
- s0 = vld1q_s16(s);
- s += src_stride;
- s1 = vld1q_s16(s);
- s += src_stride;
- s2 = vld1q_s16(s);
- s += src_stride;
- s3 = vld1q_s16(s);
- s += src_stride;
- s4 = vld1q_s16(s);
- s += src_stride;
- s5 = vld1q_s16(s);
- s += src_stride;
- s6 = vld1q_s16(s);
- s += src_stride;
d = dst_tmp_ptr;
+
+ load_s16_8x7(s, src_stride, &s0, &s1, &s2, &s3, &s4, &s5, &s6);
+ s += 7 * src_stride;
+
height = h;
PROCESS_ROW_FOR_VERTICAL_FILTER
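load_s16_8x7()/load_s16_8x4() are presumed mem_neon.h helpers folding the unrolled vld1q_s16 ladders removed here; an assumed-shape sketch:

#include <arm_neon.h>
#include <stddef.h>

// Assumed shape of load_s16_8x4(); load_s16_8x7() is presumed to extend the
// same pattern to seven rows.
static inline void load_s16_8x4_sketch(const int16_t *s, ptrdiff_t p,
                                       int16x8_t *s0, int16x8_t *s1,
                                       int16x8_t *s2, int16x8_t *s3) {
  *s0 = vld1q_s16(s + 0 * p);
  *s1 = vld1q_s16(s + 1 * p);
  *s2 = vld1q_s16(s + 2 * p);
  *s3 = vld1q_s16(s + 3 * p);
}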
diff --git a/av1/common/av1_common_int.h b/av1/common/av1_common_int.h
index 304a551..4c0cb99 100644
--- a/av1/common/av1_common_int.h
+++ b/av1/common/av1_common_int.h
@@ -16,6 +16,7 @@
#include "config/av1_rtcd.h"
#include "aom/internal/aom_codec_internal.h"
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_util/aom_thread.h"
#include "av1/common/alloccommon.h"
#include "av1/common/av1_loopfilter.h"
@@ -184,7 +185,8 @@
aom_get_frame_buffer_cb_fn_t get_fb_cb;
aom_release_frame_buffer_cb_fn_t release_fb_cb;
- RefCntBuffer frame_bufs[FRAME_BUFFERS];
+ RefCntBuffer *frame_bufs;
+ uint8_t num_frame_bufs;
// Frame buffers allocated internally by the codec.
InternalFrameBufferList int_frame_buffers;
@@ -1092,10 +1094,11 @@
int i;
lock_buffer_pool(cm->buffer_pool);
- for (i = 0; i < FRAME_BUFFERS; ++i)
+ const int num_frame_bufs = cm->buffer_pool->num_frame_bufs;
+ for (i = 0; i < num_frame_bufs; ++i)
if (frame_bufs[i].ref_count == 0) break;
- if (i != FRAME_BUFFERS) {
+ if (i != num_frame_bufs) {
if (frame_bufs[i].buf.use_external_reference_buffers) {
// If this frame buffer's y_buffer, u_buffer, and v_buffer point to the
// external reference buffers. Restore the buffer pointers to point to the
@@ -1132,7 +1135,10 @@
if (new_fb_idx == INVALID_IDX) return NULL;
cm->cur_frame = &cm->buffer_pool->frame_bufs[new_fb_idx];
- cm->cur_frame->buf.buf_8bit_valid = 0;
+#if CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
+ aom_invalidate_pyramid(cm->cur_frame->buf.y_pyramid);
+ av1_invalidate_corner_list(cm->cur_frame->buf.corners);
+#endif // CONFIG_AV1_ENCODER && !CONFIG_REALTIME_ONLY
av1_zero(cm->cur_frame->interp_filter_selected);
return cm->cur_frame;
}
@@ -1237,10 +1243,8 @@
const int mem_size =
((mi_params->mi_rows + MAX_MIB_SIZE) >> 1) * (mi_params->mi_stride >> 1);
- int realloc = cm->tpl_mvs == NULL;
- if (cm->tpl_mvs) realloc |= cm->tpl_mvs_mem_size < mem_size;
- if (realloc) {
+ if (cm->tpl_mvs == NULL || cm->tpl_mvs_mem_size < mem_size) {
aom_free(cm->tpl_mvs);
CHECK_MEM_ERROR(cm, cm->tpl_mvs,
(TPL_MV_REF *)aom_calloc(mem_size, sizeof(*cm->tpl_mvs)));
@@ -1613,18 +1617,6 @@
sizeof(xd->left_txfm_context_buffer));
}
-// Disable array-bounds checks as the TX_SIZE enum contains values larger than
-// TX_SIZES_ALL (TX_INVALID) which make extending the array as a workaround
-// infeasible. The assert is enough for static analysis and this or other tools
-// asan, valgrind would catch oob access at runtime.
-#if defined(__GNUC__) && __GNUC__ >= 4
-#pragma GCC diagnostic ignored "-Warray-bounds"
-#endif
-
-#if defined(__GNUC__) && __GNUC__ >= 4
-#pragma GCC diagnostic warning "-Warray-bounds"
-#endif
-
static INLINE void set_txfm_ctx(TXFM_CONTEXT *txfm_ctx, uint8_t txs, int len) {
int i;
for (i = 0; i < len; ++i) txfm_ctx[i] = txs;
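The BufferPool change turns frame_bufs into a heap allocation with a runtime bound, so consumers iterate num_frame_bufs instead of the FRAME_BUFFERS compile-time constant. The allocation site is outside these hunks; a hypothetical sketch of the pattern the new fields imply:

// Hypothetical init-time allocation matching the new fields; the real call
// site lives elsewhere in this change.
pool->num_frame_bufs = num_frame_bufs;
pool->frame_bufs = (RefCntBuffer *)aom_calloc(pool->num_frame_bufs,
                                              sizeof(*pool->frame_bufs));
if (!pool->frame_bufs) return AOM_CODEC_MEM_ERROR;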
diff --git a/av1/common/av1_inv_txfm2d.c b/av1/common/av1_inv_txfm2d.c
index 154c9d2..ee67dff 100644
--- a/av1/common/av1_inv_txfm2d.c
+++ b/av1/common/av1_inv_txfm2d.c
@@ -29,10 +29,10 @@
uint16_t *dest = CONVERT_TO_SHORTPTR(dest8);
for (i = 0; i < 4; i++) {
- a1 = ip[0] >> UNIT_QUANT_SHIFT;
- c1 = ip[1] >> UNIT_QUANT_SHIFT;
- d1 = ip[2] >> UNIT_QUANT_SHIFT;
- b1 = ip[3] >> UNIT_QUANT_SHIFT;
+ a1 = ip[4 * 0] >> UNIT_QUANT_SHIFT;
+ c1 = ip[4 * 1] >> UNIT_QUANT_SHIFT;
+ d1 = ip[4 * 2] >> UNIT_QUANT_SHIFT;
+ b1 = ip[4 * 3] >> UNIT_QUANT_SHIFT;
a1 += c1;
d1 -= b1;
e1 = (a1 - d1) >> 1;
@@ -41,20 +41,20 @@
a1 -= b1;
d1 += c1;
- op[0] = a1;
- op[1] = b1;
- op[2] = c1;
- op[3] = d1;
- ip += 4;
- op += 4;
+ op[4 * 0] = a1;
+ op[4 * 1] = b1;
+ op[4 * 2] = c1;
+ op[4 * 3] = d1;
+ ip++;
+ op++;
}
ip = output;
for (i = 0; i < 4; i++) {
- a1 = ip[4 * 0];
- c1 = ip[4 * 1];
- d1 = ip[4 * 2];
- b1 = ip[4 * 3];
+ a1 = ip[0];
+ c1 = ip[1];
+ d1 = ip[2];
+ b1 = ip[3];
a1 += c1;
d1 -= b1;
e1 = (a1 - d1) >> 1;
@@ -73,7 +73,7 @@
dest[stride * 2] = highbd_clip_pixel_add(dest[stride * 2], c1, bd);
dest[stride * 3] = highbd_clip_pixel_add(dest[stride * 3], d1, bd);
- ip++;
+ ip += 4;
dest++;
}
}
@@ -88,7 +88,7 @@
uint16_t *dest = CONVERT_TO_SHORTPTR(dest8);
(void)bd;
- a1 = ip[0] >> UNIT_QUANT_SHIFT;
+ a1 = ip[0 * 4] >> UNIT_QUANT_SHIFT;
e1 = a1 >> 1;
a1 -= e1;
op[0] = a1;
@@ -271,19 +271,19 @@
for (r = 0; r < txfm_size_row; ++r) {
if (abs(rect_type) == 1) {
for (c = 0; c < txfm_size_col; ++c) {
- temp_in[c] = round_shift((int64_t)input[c] * NewInvSqrt2, NewSqrt2Bits);
+ temp_in[c] = round_shift(
+ (int64_t)input[c * txfm_size_row + r] * NewInvSqrt2, NewSqrt2Bits);
}
clamp_buf(temp_in, txfm_size_col, bd + 8);
txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
} else {
for (c = 0; c < txfm_size_col; ++c) {
- temp_in[c] = input[c];
+ temp_in[c] = input[c * txfm_size_row + r];
}
clamp_buf(temp_in, txfm_size_col, bd + 8);
txfm_func_row(temp_in, buf_ptr, cos_bit_row, stage_range_row);
}
av1_round_shift_array(buf_ptr, txfm_size_col, -shift[0]);
- input += txfm_size_col;
buf_ptr += txfm_size_col;
}
@@ -393,9 +393,9 @@
// - Copying over these values in top-left 32x32 locations.
// - Setting the rest of the locations to 0.
int32_t mod_input[64 * 64];
- for (int row = 0; row < 32; ++row) {
- memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
- memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
+ for (int col = 0; col < 32; ++col) {
+ memcpy(mod_input + col * 64, input + col * 32, 32 * sizeof(*mod_input));
+ memset(mod_input + col * 64 + 32, 0, 32 * sizeof(*mod_input));
}
memset(mod_input + 32 * 64, 0, 32 * 64 * sizeof(*mod_input));
DECLARE_ALIGNED(32, int, txfm_buf[64 * 64 + 64 + 64]);
@@ -408,11 +408,9 @@
// Remap 32x32 input into a modified 64x32 by:
// - Copying over these values in top-left 32x32 locations.
// - Setting the rest of the locations to 0.
- int32_t mod_input[64 * 32];
- for (int row = 0; row < 32; ++row) {
- memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
- memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
- }
+ int32_t mod_input[32 * 64];
+ memcpy(mod_input, input, 32 * 32 * sizeof(*mod_input));
+ memset(mod_input + 32 * 32, 0, 32 * 32 * sizeof(*mod_input));
DECLARE_ALIGNED(32, int, txfm_buf[64 * 32 + 64 + 64]);
inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_64X32,
bd);
@@ -423,9 +421,11 @@
// Remap 32x32 input into a modified 32x64 input by:
// - Copying over these values in top-left 32x32 locations.
// - Setting the rest of the locations to 0.
- int32_t mod_input[32 * 64];
- memcpy(mod_input, input, 32 * 32 * sizeof(*mod_input));
- memset(mod_input + 32 * 32, 0, 32 * 32 * sizeof(*mod_input));
+ int32_t mod_input[64 * 32];
+ for (int col = 0; col < 32; ++col) {
+ memcpy(mod_input + col * 64, input + col * 32, 32 * sizeof(*mod_input));
+ memset(mod_input + col * 64 + 32, 0, 32 * sizeof(*mod_input));
+ }
DECLARE_ALIGNED(32, int, txfm_buf[64 * 32 + 64 + 64]);
inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_32X64,
bd);
@@ -436,9 +436,11 @@
// Remap 16x32 input into a modified 16x64 input by:
// - Copying over these values in top-left 16x32 locations.
// - Setting the rest of the locations to 0.
- int32_t mod_input[16 * 64];
- memcpy(mod_input, input, 16 * 32 * sizeof(*mod_input));
- memset(mod_input + 16 * 32, 0, 16 * 32 * sizeof(*mod_input));
+ int32_t mod_input[64 * 16];
+ for (int col = 0; col < 16; ++col) {
+ memcpy(mod_input + col * 64, input + col * 32, 32 * sizeof(*mod_input));
+ memset(mod_input + col * 64 + 32, 0, 32 * sizeof(*mod_input));
+ }
DECLARE_ALIGNED(32, int, txfm_buf[16 * 64 + 64 + 64]);
inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_16X64,
bd);
@@ -449,11 +451,9 @@
// Remap 32x16 input into a modified 64x16 by:
// - Copying over these values in top-left 32x16 locations.
// - Setting the rest of the locations to 0.
- int32_t mod_input[64 * 16];
- for (int row = 0; row < 16; ++row) {
- memcpy(mod_input + row * 64, input + row * 32, 32 * sizeof(*mod_input));
- memset(mod_input + row * 64 + 32, 0, 32 * sizeof(*mod_input));
- }
+ int32_t mod_input[16 * 64];
+ memcpy(mod_input, input, 16 * 32 * sizeof(*mod_input));
+ memset(mod_input + 16 * 32, 0, 16 * 32 * sizeof(*mod_input));
DECLARE_ALIGNED(32, int, txfm_buf[16 * 64 + 64 + 64]);
inv_txfm2d_add_facade(mod_input, output, stride, txfm_buf, tx_type, TX_64X16,
bd);
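These av1_inv_txfm2d.c hunks track the facade's switch to column-major input (temp_in[c] = input[c * txfm_size_row + r] above): the 64-point remap buffers are now padded per column when the row count grows (TX_32X64, TX_16X64) and with one contiguous block when only columns are added (TX_64X32, TX_64X16). Worked index check for TX_32X64 (64 rows, 32 columns):

// Source, 32 rows per column: element (r, c) sits at input[c * 32 + r].
// Destination, 64 rows per column: the same element must land at
// mod_input[c * 64 + r], with mod_input[c * 64 + 32 .. c * 64 + 63] zeroed;
// hence the strided memcpy/memset loop.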
diff --git a/av1/common/av1_rtcd_defs.pl b/av1/common/av1_rtcd_defs.pl
index ba1dcbb..17dcc49 100644
--- a/av1/common/av1_rtcd_defs.pl
+++ b/av1/common/av1_rtcd_defs.pl
@@ -17,12 +17,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -92,7 +92,7 @@
if(aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/void av1_highbd_convolve_horiz_rs/, "const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd";
- specialize qw/av1_highbd_convolve_horiz_rs sse4_1/;
+ specialize qw/av1_highbd_convolve_horiz_rs sse4_1 neon/;
add_proto qw/void av1_highbd_wiener_convolve_add_src/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, const ConvolveParams *conv_params, int bd";
specialize qw/av1_highbd_wiener_convolve_add_src ssse3 avx2/;
@@ -282,7 +282,7 @@
add_proto qw/void aom_upsampled_pred/, "MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search";
- specialize qw/aom_upsampled_pred sse2/;
+ specialize qw/aom_upsampled_pred neon sse2/;
#
#
#
@@ -290,7 +290,7 @@
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
int ref_stride, int subpel_search";
- specialize qw/aom_comp_avg_upsampled_pred sse2/;
+ specialize qw/aom_comp_avg_upsampled_pred sse2 neon/;
add_proto qw/void aom_dist_wtd_comp_avg_upsampled_pred/, "MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
@@ -298,13 +298,6 @@
int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param, int subpel_search";
specialize qw/aom_dist_wtd_comp_avg_upsampled_pred ssse3/;
- add_proto qw/void aom_comp_mask_upsampled_pred/, "MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search";
- specialize qw/aom_comp_mask_upsampled_pred sse2/;
-
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
add_proto qw/void aom_highbd_upsampled_pred/, "MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred8, int width, int height, int subpel_x_q3,
@@ -402,21 +395,26 @@
# Motion search
#
if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
- add_proto qw/void av1_apply_temporal_filter/, "const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count";
+ add_proto qw/void av1_apply_temporal_filter/, "const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count";
specialize qw/av1_apply_temporal_filter sse2 avx2 neon/;
+
+ add_proto qw/double av1_estimate_noise_from_single_plane/, "const uint8_t *src, int height, int width, int stride, int edge_thresh";
+ specialize qw/av1_estimate_noise_from_single_plane avx2/;
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
- add_proto qw/void av1_highbd_apply_temporal_filter/, "const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count";
+ add_proto qw/void av1_highbd_apply_temporal_filter/, "const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count";
specialize qw/av1_highbd_apply_temporal_filter sse2 avx2/;
+
+ add_proto qw/double av1_highbd_estimate_noise_from_single_plane/, "const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh";
}
}
add_proto qw/void av1_quantize_b/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan, const qm_val_t * qm_ptr, const qm_val_t * iqm_ptr, int log_scale";
add_proto qw/void av1_calc_indices_dim1/, "const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k";
- specialize qw/av1_calc_indices_dim1 sse2 avx2/;
+ specialize qw/av1_calc_indices_dim1 sse2 avx2 neon/;
add_proto qw/void av1_calc_indices_dim2/, "const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k";
- specialize qw/av1_calc_indices_dim2 sse2 avx2/;
+ specialize qw/av1_calc_indices_dim2 sse2 avx2 neon/;
# ENCODEMB INVOKE
if (aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
@@ -429,9 +427,6 @@
specialize qw/av1_highbd_quantize_fp sse4_1 avx2 neon/;
}
- add_proto qw/void av1_highbd_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride";
- specialize qw/av1_highbd_fwht4x4 sse4_1 neon/;
-
# End av1_high encoder functions
# txb
@@ -452,7 +447,7 @@
specialize qw/av1_get_crc32c_value sse4_2 arm_crc32/;
if (aom_config("CONFIG_REALTIME_ONLY") ne "yes") {
- add_proto qw/void av1_compute_stats/, "int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats";
+ add_proto qw/void av1_compute_stats/, "int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats";
specialize qw/av1_compute_stats sse4_1 avx2/;
add_proto qw/void av1_calc_proj_params/, " const uint8_t *src8, int width, int height, int src_stride, const uint8_t *dat8, int dat_stride, int32_t *flt0, int flt0_stride, int32_t *flt1, int flt1_stride, int64_t H[2][2], int64_t C[2], const sgr_params_type *params";
specialize qw/av1_calc_proj_params sse4_1 avx2/;
@@ -594,14 +589,14 @@
specialize qw/av1_dist_wtd_convolve_x sse2 avx2 neon/;
specialize qw/av1_dist_wtd_convolve_y sse2 avx2 neon/;
if(aom_config("CONFIG_AV1_HIGHBITDEPTH") eq "yes") {
- specialize qw/av1_highbd_dist_wtd_convolve_2d sse4_1 avx2/;
- specialize qw/av1_highbd_dist_wtd_convolve_x sse4_1 avx2/;
- specialize qw/av1_highbd_dist_wtd_convolve_y sse4_1 avx2/;
- specialize qw/av1_highbd_dist_wtd_convolve_2d_copy sse4_1 avx2/;
- specialize qw/av1_highbd_convolve_2d_sr ssse3 avx2/;
- specialize qw/av1_highbd_convolve_x_sr ssse3 avx2/;
- specialize qw/av1_highbd_convolve_y_sr ssse3 avx2/;
- specialize qw/av1_highbd_convolve_2d_scale sse4_1/;
+ specialize qw/av1_highbd_dist_wtd_convolve_2d sse4_1 avx2 neon/;
+ specialize qw/av1_highbd_dist_wtd_convolve_x sse4_1 avx2 neon/;
+ specialize qw/av1_highbd_dist_wtd_convolve_y sse4_1 avx2 neon/;
+ specialize qw/av1_highbd_dist_wtd_convolve_2d_copy sse4_1 avx2 neon/;
+ specialize qw/av1_highbd_convolve_2d_sr ssse3 avx2 neon/;
+ specialize qw/av1_highbd_convolve_x_sr ssse3 avx2 neon/;
+ specialize qw/av1_highbd_convolve_y_sr ssse3 avx2 neon/;
+ specialize qw/av1_highbd_convolve_2d_scale sse4_1 neon/;
}
# INTRA_EDGE functions
diff --git a/av1/common/blockd.h b/av1/common/blockd.h
index 5f90e57..e7f1b6b 100644
--- a/av1/common/blockd.h
+++ b/av1/common/blockd.h
@@ -518,11 +518,6 @@
/*!\cond */
-#if CONFIG_DEBUG
-#define CFL_SUB8X8_VAL_MI_SIZE (4)
-#define CFL_SUB8X8_VAL_MI_SQUARE \
- (CFL_SUB8X8_VAL_MI_SIZE * CFL_SUB8X8_VAL_MI_SIZE)
-#endif // CONFIG_DEBUG
#define CFL_MAX_BLOCK_SIZE (BLOCK_32X32)
#define CFL_BUF_LINE (32)
#define CFL_BUF_LINE_I128 (CFL_BUF_LINE >> 3)
@@ -537,9 +532,10 @@
// Cache the DC_PRED when performing RDO, so it does not have to be recomputed
// for every scaling parameter
- int dc_pred_is_cached[CFL_PRED_PLANES];
- // The DC_PRED cache is disable when decoding
- int use_dc_pred_cache;
+ bool dc_pred_is_cached[CFL_PRED_PLANES];
+ // Whether the DC_PRED cache is enabled. The DC_PRED cache is disabled when
+ // decoding.
+ bool use_dc_pred_cache;
// Only cache the first row of the DC_PRED
int16_t dc_pred_cache[CFL_PRED_PLANES][CFL_BUF_LINE];
diff --git a/av1/common/cdef_block_simd.h b/av1/common/cdef_block_simd.h
index df67871..5c62201 100644
--- a/av1/common/cdef_block_simd.h
+++ b/av1/common/cdef_block_simd.h
@@ -270,6 +270,12 @@
return max;
}
+// MSVC takes far too much time optimizing these.
+// https://bugs.chromium.org/p/aomedia/issues/detail?id=3395
+#if defined(_MSC_VER) && !defined(__clang__)
+#pragma optimize("", off)
+#endif
+
CDEF_INLINE void filter_block_4x4(const int is_lowbd, void *dest, int dstride,
const uint16_t *in, int pri_strength,
int sec_strength, int dir, int pri_damping,
@@ -617,6 +623,10 @@
}
}
+#if defined(_MSC_VER) && !defined(__clang__)
+#pragma optimize("", on)
+#endif
+
SIMD_INLINE void copy_block_4xh(const int is_lowbd, void *dest, int dstride,
const uint16_t *in, int height) {
uint8_t *dst8 = (uint8_t *)dest;
@@ -674,14 +684,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint8_t *dst8 = (uint8_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/1);
} else {
- filter_block_4x4(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/1);
@@ -693,14 +702,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint8_t *dst8 = (uint8_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/0);
} else {
- filter_block_4x4(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/0);
@@ -711,14 +719,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint8_t *dst8 = (uint8_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/0,
/*enable_secondary=*/1);
} else {
- filter_block_4x4(/*is_lowbd=*/1, dst8, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/1, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/0,
/*enable_secondary=*/1);
@@ -730,7 +737,6 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint8_t *dst8 = (uint8_t *)dest;
(void)pri_strength;
(void)sec_strength;
(void)dir;
@@ -740,9 +746,9 @@
(void)block_width;
if (block_width == 8) {
- copy_block_8xh(/*is_lowbd=*/1, dst8, dstride, in, block_height);
+ copy_block_8xh(/*is_lowbd=*/1, dest, dstride, in, block_height);
} else {
- copy_block_4xh(/*is_lowbd=*/1, dst8, dstride, in, block_height);
+ copy_block_4xh(/*is_lowbd=*/1, dest, dstride, in, block_height);
}
}
@@ -751,14 +757,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint16_t *dst16 = (uint16_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/1);
} else {
- filter_block_4x4(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/1);
@@ -770,14 +775,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint16_t *dst16 = (uint16_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/0);
} else {
- filter_block_4x4(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/1,
/*enable_secondary=*/0);
@@ -788,14 +792,13 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint16_t *dst16 = (uint16_t *)dest;
if (block_width == 8) {
- filter_block_8x8(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_8x8(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/0,
/*enable_secondary=*/1);
} else {
- filter_block_4x4(/*is_lowbd=*/0, dst16, dstride, in, pri_strength,
+ filter_block_4x4(/*is_lowbd=*/0, dest, dstride, in, pri_strength,
sec_strength, dir, pri_damping, sec_damping, coeff_shift,
block_height, /*enable_primary=*/0,
/*enable_secondary=*/1);
@@ -807,7 +810,6 @@
int pri_damping, int sec_damping,
int coeff_shift, int block_width,
int block_height) {
- uint16_t *dst16 = (uint16_t *)dest;
(void)pri_strength;
(void)sec_strength;
(void)dir;
@@ -816,9 +818,9 @@
(void)coeff_shift;
(void)block_width;
if (block_width == 8) {
- copy_block_8xh(/*is_lowbd=*/0, dst16, dstride, in, block_height);
+ copy_block_8xh(/*is_lowbd=*/0, dest, dstride, in, block_height);
} else {
- copy_block_4xh(/*is_lowbd=*/0, dst16, dstride, in, block_height);
+ copy_block_4xh(/*is_lowbd=*/0, dest, dstride, in, block_height);
}
}
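The pragma bracket added above works around MSVC spending excessive time optimizing the CDEF filter kernels (aomedia:3395); clang-cl defines both _MSC_VER and __clang__, so it keeps full optimization. A minimal sketch of the same bracketing pattern, with a hypothetical placeholder function standing in for the real kernels:

#if defined(_MSC_VER) && !defined(__clang__)
#pragma optimize("", off)  // cl.exe only: stop optimizing from this point on.
#endif

// Hypothetical stand-in for the heavy CDEF kernels bracketed above.
static int cdef_kernel_placeholder(int x) { return x * x; }

#if defined(_MSC_VER) && !defined(__clang__)
#pragma optimize("", on)  // Restore the optimization level from the /O flags.
#endif

Note that "on" does not force optimization on; it resets to whatever the command-line /O options requested, which is why the bracket is safe in both debug and release builds.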
diff --git a/av1/common/cfl.c b/av1/common/cfl.c
index 98199cb..6d4221e 100644
--- a/av1/common/cfl.c
+++ b/av1/common/cfl.c
@@ -27,9 +27,7 @@
cfl->store_y = 0;
// The DC_PRED cache is disabled by default and is only enabled in
// cfl_rd_pick_alpha
- cfl->use_dc_pred_cache = 0;
- cfl->dc_pred_is_cached[CFL_PRED_U] = 0;
- cfl->dc_pred_is_cached[CFL_PRED_V] = 0;
+ clear_cfl_dc_pred_cache_flags(cfl);
}
void cfl_store_dc_pred(MACROBLOCKD *const xd, const uint8_t *input,
diff --git a/av1/common/cfl.h b/av1/common/cfl.h
index 0d53764..af8b833 100644
--- a/av1/common/cfl.h
+++ b/av1/common/cfl.h
@@ -61,11 +61,17 @@
return ROUND_POWER_OF_TWO_SIGNED(scaled_luma_q6, 6);
}
-static INLINE CFL_PRED_TYPE get_cfl_pred_type(PLANE_TYPE plane) {
+static INLINE CFL_PRED_TYPE get_cfl_pred_type(int plane) {
assert(plane > 0);
return (CFL_PRED_TYPE)(plane - 1);
}
+static INLINE void clear_cfl_dc_pred_cache_flags(CFL_CTX *cfl) {
+ cfl->use_dc_pred_cache = false;
+ cfl->dc_pred_is_cached[CFL_PRED_U] = false;
+ cfl->dc_pred_is_cached[CFL_PRED_V] = false;
+}
+
void cfl_predict_block(MACROBLOCKD *const xd, uint8_t *dst, int dst_stride,
TX_SIZE tx_size, int plane);
diff --git a/av1/common/convolve.c b/av1/common/convolve.c
index 54b2bb0..9bca542 100644
--- a/av1/common/convolve.c
+++ b/av1/common/convolve.c
@@ -99,7 +99,13 @@
for (int k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+
+  // TODO(aomedia:3393): for the 12-tap filter, in extreme cases the result
+  // can fall outside the following range. For better prediction, clamping
+  // could be added for the 12-tap filter to ensure the horizontal filtering
+  // result fits in 16 bits. The same applies to the vertical filtering.
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
im_block[y * im_stride + x] =
(int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
}
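These relaxed asserts keep the 16-bit intermediate-range check for filters of up to 8 taps but skip it for the 12-tap filter, whose larger positive-tap mass can overshoot the bound in extreme cases (aomedia:3393). A hedged sketch of the worst case the assert encodes, assuming FILTER_BITS == 7 and the 1 << (bd + FILTER_BITS - 1) rounding offset the horizontal pass starts from:

#include <stdint.h>

// Hedged sketch: the largest horizontal sum a filter can produce, assuming
// FILTER_BITS == 7. Positive taps see the maximum pixel, negative taps see 0.
static int32_t worst_case_horiz_sum(const int16_t *taps, int ntaps, int bd) {
  const int32_t max_pel = (1 << bd) - 1;
  int32_t sum = 1 << (bd + 7 - 1);  // Rounding offset added before the taps.
  for (int k = 0; k < ntaps; ++k) {
    if (taps[k] > 0) sum += taps[k] * max_pel;
  }
  return sum;  // The old assert required this to stay below 1 << (bd + 8).
}

Roughly: the positive taps of the standard 8-tap filters sum to well under 192, so the total stays below (64 + 192) << bd < 1 << (bd + 8); the 12-tap filter's taps do not leave that margin, hence the taps > 8 escape hatch.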
@@ -116,7 +122,8 @@
for (int k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
int16_t res = ROUND_POWER_OF_TWO(sum, conv_params->round_1) -
((1 << (offset_bits - conv_params->round_1)) +
(1 << (offset_bits - conv_params->round_1 - 1)));
@@ -173,6 +180,114 @@
}
}
+// This function produces output identical to av1_convolve_2d_sr_c, and is an
+// optimized version for intrabc. It uses the following 2-tap filter:
+// DECLARE_ALIGNED(256, static const int16_t,
+// av1_intrabc_bilinear_filter[2 * SUBPEL_SHIFTS]) = {
+// 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+// 64, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+// };
+void av1_convolve_2d_sr_intrabc_c(const uint8_t *src, int src_stride,
+ uint8_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_x,
+ const InterpFilterParams *filter_params_y,
+ const int subpel_x_qn, const int subpel_y_qn,
+ ConvolveParams *conv_params) {
+ assert(subpel_x_qn == 8);
+ assert(subpel_y_qn == 8);
+ assert(filter_params_x->taps == 2 && filter_params_y->taps == 2);
+ assert((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS);
+ (void)filter_params_x;
+ (void)subpel_x_qn;
+ (void)filter_params_y;
+ (void)subpel_y_qn;
+ (void)conv_params;
+
+ int16_t im_block[(MAX_SB_SIZE + MAX_FILTER_TAP - 1) * MAX_SB_SIZE];
+ int im_h = h + 1;
+ int im_stride = w;
+ assert(w <= MAX_SB_SIZE && h <= MAX_SB_SIZE);
+ const int bd = 8;
+
+ // horizontal filter
+ // explicitly operate for subpel_x_qn = 8.
+ int16_t *im = im_block;
+ for (int y = 0; y < im_h; ++y) {
+ for (int x = 0; x < w; ++x) {
+ const int32_t sum = (1 << bd) + src[x] + src[x + 1];
+ assert(0 <= sum && sum < (1 << (bd + 2)));
+ im[x] = sum;
+ }
+ src += src_stride;
+ im += im_stride;
+ }
+
+ // vertical filter
+ // explicitly operate for subpel_y_qn = 8.
+ int16_t *src_vert = im_block;
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; ++x) {
+ const int32_t sum =
+ (1 << (bd + 2)) + src_vert[x] + src_vert[im_stride + x];
+ assert(0 <= sum && sum < (1 << (bd + 4)));
+ const int16_t res =
+ ROUND_POWER_OF_TWO(sum, 2) - ((1 << bd) + (1 << (bd - 1)));
+ dst[x] = clip_pixel(res);
+ }
+ src_vert += im_stride;
+ dst += dst_stride;
+ }
+}
+
+// This function produces output identical to av1_convolve_y_sr_c, and is an
+// optimized version for intrabc.
+void av1_convolve_y_sr_intrabc_c(const uint8_t *src, int src_stride,
+ uint8_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_y,
+ const int subpel_y_qn) {
+ assert(subpel_y_qn == 8);
+ assert(filter_params_y->taps == 2);
+ (void)filter_params_y;
+ (void)subpel_y_qn;
+
+ // vertical filter
+ // explicitly operate for subpel_y_qn = 8.
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; ++x) {
+ const int32_t res = src[x] + src[src_stride + x];
+ dst[x] = clip_pixel(ROUND_POWER_OF_TWO(res, 1));
+ }
+ src += src_stride;
+ dst += dst_stride;
+ }
+}
+
+// This function produces output identical to av1_convolve_x_sr_c, and is an
+// optimized version for intrabc.
+void av1_convolve_x_sr_intrabc_c(const uint8_t *src, int src_stride,
+ uint8_t *dst, int dst_stride, int w, int h,
+ const InterpFilterParams *filter_params_x,
+ const int subpel_x_qn,
+ ConvolveParams *conv_params) {
+ assert(subpel_x_qn == 8);
+ assert(filter_params_x->taps == 2);
+ assert((conv_params->round_0 + conv_params->round_1) == 2 * FILTER_BITS);
+ (void)filter_params_x;
+ (void)subpel_x_qn;
+ (void)conv_params;
+
+ // horizontal filter
+ // explicitly operate for subpel_x_qn = 8.
+ for (int y = 0; y < h; ++y) {
+ for (int x = 0; x < w; ++x) {
+ const int32_t res = src[x] + src[x + 1];
+ dst[x] = clip_pixel(ROUND_POWER_OF_TWO(res, 1));
+ }
+ src += src_stride;
+ dst += dst_stride;
+ }
+}
+
void av1_dist_wtd_convolve_2d_c(const uint8_t *src, int src_stride,
uint8_t *dst, int dst_stride, int w, int h,
const InterpFilterParams *filter_params_x,
@@ -200,7 +315,8 @@
for (int k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
im_block[y * im_stride + x] =
(int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
}
@@ -217,7 +333,8 @@
for (int k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
if (conv_params->do_average) {
int32_t tmp = dst16[y * dst16_stride + x];
@@ -402,7 +519,8 @@
for (int k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_x[k - fo_horiz];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
im_block[y * im_stride + x] =
(int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
}
@@ -424,7 +542,8 @@
for (int k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_y[(k - fo_vert) * im_stride];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
if (conv_params->is_compound) {
if (conv_params->do_average) {
@@ -529,23 +648,22 @@
const InterpFilterParams *filter_params_y = interp_filters[1];
// TODO(jingning, yunqing): Add SIMD support to 2-tap filter case.
- // Do we have SIMD support to 4-tap case?
// 2-tap filter indicates that it is for IntraBC.
if (filter_params_x->taps == 2 || filter_params_y->taps == 2) {
assert(filter_params_x->taps == 2 && filter_params_y->taps == 2);
assert(!scaled);
if (subpel_x_qn && subpel_y_qn) {
- av1_convolve_2d_sr_c(src, src_stride, dst, dst_stride, w, h,
- filter_params_x, filter_params_y, subpel_x_qn,
- subpel_y_qn, conv_params);
+ av1_convolve_2d_sr_intrabc_c(src, src_stride, dst, dst_stride, w, h,
+ filter_params_x, filter_params_y,
+ subpel_x_qn, subpel_y_qn, conv_params);
return;
} else if (subpel_x_qn) {
- av1_convolve_x_sr_c(src, src_stride, dst, dst_stride, w, h,
- filter_params_x, subpel_x_qn, conv_params);
+ av1_convolve_x_sr_intrabc_c(src, src_stride, dst, dst_stride, w, h,
+ filter_params_x, subpel_x_qn, conv_params);
return;
} else if (subpel_y_qn) {
- av1_convolve_y_sr_c(src, src_stride, dst, dst_stride, w, h,
- filter_params_y, subpel_y_qn);
+ av1_convolve_y_sr_intrabc_c(src, src_stride, dst, dst_stride, w, h,
+ filter_params_y, subpel_y_qn);
return;
}
}
@@ -640,7 +758,8 @@
for (int k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
im_block[y * im_stride + x] =
ROUND_POWER_OF_TWO(sum, conv_params->round_0);
}
@@ -657,7 +776,8 @@
for (int k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
int32_t res = ROUND_POWER_OF_TWO(sum, conv_params->round_1) -
((1 << (offset_bits - conv_params->round_1)) +
(1 << (offset_bits - conv_params->round_1 - 1)));
@@ -694,7 +814,8 @@
for (k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_horiz[y * src_stride + x - fo_horiz + k];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
(void)bd;
im_block[y * im_stride + x] =
(int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
@@ -712,7 +833,8 @@
for (k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_vert[(y - fo_vert + k) * im_stride + x];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
if (conv_params->do_average) {
int32_t tmp = dst16[y * dst16_stride + x];
@@ -899,7 +1021,8 @@
for (int k = 0; k < filter_params_x->taps; ++k) {
sum += x_filter[k] * src_x[k - fo_horiz];
}
- assert(0 <= sum && sum < (1 << (bd + FILTER_BITS + 1)));
+ assert(filter_params_x->taps > 8 ||
+ (0 <= sum && sum < (1 << (bd + FILTER_BITS + 1))));
im_block[y * im_stride + x] =
(int16_t)ROUND_POWER_OF_TWO(sum, conv_params->round_0);
}
@@ -921,7 +1044,8 @@
for (int k = 0; k < filter_params_y->taps; ++k) {
sum += y_filter[k] * src_y[(k - fo_vert) * im_stride];
}
- assert(0 <= sum && sum < (1 << (offset_bits + 2)));
+ assert(filter_params_y->taps > 8 ||
+ (0 <= sum && sum < (1 << (offset_bits + 2))));
CONV_BUF_TYPE res = ROUND_POWER_OF_TWO(sum, conv_params->round_1);
if (conv_params->is_compound) {
if (conv_params->do_average) {
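Because intrabc always lands on the half-pel position (subpel_*_qn == 8), the {64, 64} tap pair makes every specialized path above collapse into rounded averaging of two neighboring samples: ROUND_POWER_OF_TWO(64 * a + 64 * b, 7) == (a + b + 1) >> 1. A self-contained sketch mirroring av1_convolve_x_sr_intrabc_c, with clip_pixel re-implemented locally for self-containment:

#include <stdint.h>

static uint8_t clip_pixel_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Half-pel horizontal filter with intrabc's {64, 64} taps: the full 2-tap
// convolution reduces to a rounded average of horizontally adjacent pixels.
static void intrabc_half_pel_x(const uint8_t *src, int src_stride,
                               uint8_t *dst, int dst_stride, int w, int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      dst[x] = clip_pixel_u8((src[x] + src[x + 1] + 1) >> 1);
    }
    src += src_stride;
    dst += dst_stride;
  }
}

The 2-D path does the same thing twice with an intermediate offset of 1 << bd so the staging buffer stays non-negative; the offsets cancel in the final ROUND_POWER_OF_TWO(sum, 2) - ((1 << bd) + (1 << (bd - 1))) step, leaving a rounded average of the four source pixels.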
diff --git a/av1/common/entropymode.c b/av1/common/entropymode.c
index 49fc551..8381c1f 100644
--- a/av1/common/entropymode.c
+++ b/av1/common/entropymode.c
@@ -1066,7 +1066,7 @@
RefCntBuffer *const buf = get_ref_frame_buf(cm, i);
if (buf != NULL) buf->frame_context = *cm->fc;
}
- for (int i = 0; i < FRAME_BUFFERS; ++i)
+ for (int i = 0; i < cm->buffer_pool->num_frame_bufs; ++i)
cm->buffer_pool->frame_bufs[i].frame_context = *cm->fc;
}
}
diff --git a/av1/common/entropymode.h b/av1/common/entropymode.h
index d1b0df2..09cd6bd 100644
--- a/av1/common/entropymode.h
+++ b/av1/common/entropymode.h
@@ -190,11 +190,11 @@
void av1_setup_past_independence(struct AV1Common *cm);
// Returns (int)ceil(log2(n)).
-// NOTE: This implementation only works for n <= 2^30.
static INLINE int av1_ceil_log2(int n) {
if (n < 2) return 0;
- int i = 1, p = 2;
- while (p < n) {
+ int i = 1;
+ unsigned int p = 2;
+ while (p < (unsigned int)n) {
i++;
p = p << 1;
}
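Switching p to unsigned lets the running power of two reach 2^31 without signed overflow (with a signed p, the shift past 1 << 30 is undefined behavior), so the NOTE about n <= 2^30 can be dropped: the function is now correct for every positive int n. A standalone copy with a spot check past the old limit:

#include <assert.h>

// Standalone copy of the fixed av1_ceil_log2 above; returns (int)ceil(log2(n)).
static int ceil_log2(int n) {
  if (n < 2) return 0;
  int i = 1;
  unsigned int p = 2;
  while (p < (unsigned int)n) {
    i++;
    p = p << 1;
  }
  return i;
}

int main(void) {
  assert(ceil_log2(1) == 0);
  assert(ceil_log2(2) == 1);
  assert(ceil_log2(3) == 2);
  assert(ceil_log2((1 << 30) + 1) == 31);  // Beyond the old 2^30 limit.
  return 0;
}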
diff --git a/av1/common/enums.h b/av1/common/enums.h
index 1b952c4..fb4d756 100644
--- a/av1/common/enums.h
+++ b/av1/common/enums.h
@@ -558,8 +558,16 @@
// REF_FRAMES for the cm->ref_frame_map array, 1 scratch frame for the new
// frame in cm->cur_frame, INTER_REFS_PER_FRAME for scaled references on the
// encoder in the cpi->scaled_ref_buf array.
+// The encoder uses FRAME_BUFFERS only in GOOD and REALTIME encoding modes.
+// The decoder also uses FRAME_BUFFERS.
#define FRAME_BUFFERS (REF_FRAMES + 1 + INTER_REFS_PER_FRAME)
+// During allintra encoding, one reference frame buffer is free to be used again
+// only after another frame buffer is stored as the reference frame. Hence, it
+// is necessary and sufficient to maintain only two reference frame buffers in
+// this case.
+#define FRAME_BUFFERS_ALLINTRA 2
+
#define FWD_RF_OFFSET(ref) (ref - LAST_FRAME)
#define BWD_RF_OFFSET(ref) (ref - BWDREF_FRAME)
@@ -610,7 +618,7 @@
} RestorationType;
/*!\cond */
-// Picture prediction structures (0-12 are predefined) in scalability metadata.
+// Picture prediction structures (0-13 are predefined) in scalability metadata.
enum {
SCALABILITY_L1T2 = 0,
SCALABILITY_L1T3 = 1,
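With AV1's REF_FRAMES == 8 and INTER_REFS_PER_FRAME == 7, FRAME_BUFFERS works out to 16; the new FRAME_BUFFERS_ALLINTRA cuts the all-intra pool to 2. This is also why the entropymode.c hunk earlier switches its loop bound from the compile-time FRAME_BUFFERS to the pool's runtime num_frame_bufs. A hedged sketch of the sizing choice (num_frame_bufs_for_mode is a hypothetical helper, not libaom API):

#define REF_FRAMES 8             // AV1 reference frame map size.
#define INTER_REFS_PER_FRAME 7   // LAST..ALTREF references per frame.
#define FRAME_BUFFERS (REF_FRAMES + 1 + INTER_REFS_PER_FRAME)  // 16
#define FRAME_BUFFERS_ALLINTRA 2

// Hypothetical helper: all-intra keeps just the buffer being coded plus the
// one most recently stored as the reference, so two buffers suffice.
static int num_frame_bufs_for_mode(int is_allintra) {
  return is_allintra ? FRAME_BUFFERS_ALLINTRA : FRAME_BUFFERS;
}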
diff --git a/av1/common/mv.h b/av1/common/mv.h
index a61287b..6828834 100644
--- a/av1/common/mv.h
+++ b/av1/common/mv.h
@@ -94,11 +94,9 @@
// Bits of precision used for the model
#define WARPEDMODEL_PREC_BITS 16
-#define WARPEDMODEL_ROW3HOMO_PREC_BITS 16
#define WARPEDMODEL_TRANS_CLAMP (128 << WARPEDMODEL_PREC_BITS)
#define WARPEDMODEL_NONDIAGAFFINE_CLAMP (1 << (WARPEDMODEL_PREC_BITS - 3))
-#define WARPEDMODEL_ROW3HOMO_CLAMP (1 << (WARPEDMODEL_PREC_BITS - 2))
// Bits of subpel precision for warped interpolation
#define WARPEDPIXEL_PREC_BITS 6
@@ -108,26 +106,18 @@
#define WARPEDDIFF_PREC_BITS (WARPEDMODEL_PREC_BITS - WARPEDPIXEL_PREC_BITS)
-// Number of types used for global motion (must be >= 3 and <= TRANS_TYPES)
-// The following can be useful:
-// GLOBAL_TRANS_TYPES 3 - up to rotation-zoom
-// GLOBAL_TRANS_TYPES 4 - up to affine
-// GLOBAL_TRANS_TYPES 6 - up to hor/ver trapezoids
-// GLOBAL_TRANS_TYPES 7 - up to full homography
-#define GLOBAL_TRANS_TYPES 4
-
typedef struct {
int global_warp_allowed;
int local_warp_allowed;
} WarpTypesAllowed;
// The order of values in the wmmat matrix below is best described
-// by the homography:
+// by the affine transformation:
// [x' (m2 m3 m0 [x
// z . y' = m4 m5 m1 * y
-// 1] m6 m7 1) 1]
+// 1] 0 0 1) 1]
typedef struct {
- int32_t wmmat[6];
+ int32_t wmmat[MAX_PARAMDIM];
int16_t alpha, beta, gamma, delta;
TransformationType wmtype;
int8_t invalid;
@@ -184,19 +174,11 @@
#define GM_ALPHA_PREC_DIFF (WARPEDMODEL_PREC_BITS - GM_ALPHA_PREC_BITS)
#define GM_ALPHA_DECODE_FACTOR (1 << GM_ALPHA_PREC_DIFF)
-#define GM_ROW3HOMO_PREC_BITS 16
-#define GM_ABS_ROW3HOMO_BITS 11
-#define GM_ROW3HOMO_PREC_DIFF \
- (WARPEDMODEL_ROW3HOMO_PREC_BITS - GM_ROW3HOMO_PREC_BITS)
-#define GM_ROW3HOMO_DECODE_FACTOR (1 << GM_ROW3HOMO_PREC_DIFF)
-
#define GM_TRANS_MAX (1 << GM_ABS_TRANS_BITS)
#define GM_ALPHA_MAX (1 << GM_ABS_ALPHA_BITS)
-#define GM_ROW3HOMO_MAX (1 << GM_ABS_ROW3HOMO_BITS)
#define GM_TRANS_MIN -GM_TRANS_MAX
#define GM_ALPHA_MIN -GM_ALPHA_MAX
-#define GM_ROW3HOMO_MIN -GM_ROW3HOMO_MAX
static INLINE int block_center_x(int mi_col, BLOCK_SIZE bs) {
const int bw = block_size_wide[bs];
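After this cleanup wmmat carries only the six affine parameters (MAX_PARAMDIM presumably now equals 6), ordered as the comment shows: translation in m0/m1 and the 2x2 matrix in m2..m5, all in Q(WARPEDMODEL_PREC_BITS) fixed point. A hedged sketch applying the model to an integer pixel position:

#include <stdint.h>

#define WARPEDMODEL_PREC_BITS 16

// Hedged sketch: evaluate the affine model from the comment above at pixel
// position (x, y). wmmat = {m0, m1, m2, m3, m4, m5}:
//   x' = m2*x + m3*y + m0,  y' = m4*x + m5*y + m1   (results in Q16).
static void warp_affine_point(const int32_t wmmat[6], int x, int y,
                              int64_t *xq16, int64_t *yq16) {
  *xq16 = (int64_t)wmmat[2] * x + (int64_t)wmmat[3] * y + wmmat[0];
  *yq16 = (int64_t)wmmat[4] * x + (int64_t)wmmat[5] * y + wmmat[1];
}

The identity model is {0, 0, 1 << WARPEDMODEL_PREC_BITS, 0, 0, 1 << WARPEDMODEL_PREC_BITS}, mapping (x, y) to (x << 16, y << 16), i.e. the same position in Q16.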
diff --git a/av1/common/quant_common.c b/av1/common/quant_common.c
index e96d71a..b097628 100644
--- a/av1/common/quant_common.c
+++ b/av1/common/quant_common.c
@@ -415,21 +415,12 @@
121, 122, 130, 130, 140, 140, 150, 151, 163, 164, 176, 177, 190, 191,
204, 206, 222, 224, 230, 232, 242,
/* Size 4x8 */
- 32, 42, 75, 91, 33, 42, 69, 86, 37, 58, 84, 91, 49, 71, 103, 110, 65,
- 84, 125, 128, 80, 97, 142, 152, 91, 100, 145, 178, 104, 112, 146, 190,
- /* Size 8x4 */
32, 33, 37, 49, 65, 80, 91, 104, 42, 42, 58, 71, 84, 97, 100, 112, 75,
69, 84, 103, 125, 142, 145, 146, 91, 86, 91, 110, 128, 152, 178, 190,
+ /* Size 8x4 */
+ 32, 42, 75, 91, 33, 42, 69, 86, 37, 58, 84, 91, 49, 71, 103, 110, 65,
+ 84, 125, 128, 80, 97, 142, 152, 91, 100, 145, 178, 104, 112, 146, 190,
/* Size 8x16 */
- 32, 32, 36, 53, 65, 87, 93, 99, 31, 33, 34, 49, 59, 78, 86, 93, 32, 34,
- 36, 50, 59, 77, 82, 89, 34, 37, 42, 54, 63, 79, 80, 88, 36, 38, 48, 60,
- 68, 84, 86, 90, 44, 43, 53, 71, 79, 95, 94, 97, 48, 46, 56, 76, 85, 102,
- 105, 105, 58, 54, 63, 87, 98, 116, 112, 115, 65, 58, 68, 92, 105, 124,
- 122, 124, 79, 70, 79, 104, 118, 141, 135, 135, 82, 72, 81, 106, 121,
- 144, 149, 146, 91, 80, 88, 106, 130, 148, 162, 159, 97, 86, 94, 107,
- 128, 157, 167, 171, 103, 93, 98, 114, 131, 150, 174, 186, 110, 100, 101,
- 117, 138, 161, 183, 193, 118, 107, 105, 118, 136, 157, 182, 203,
- /* Size 16x8 */
32, 31, 32, 34, 36, 44, 48, 58, 65, 79, 82, 91, 97, 103, 110, 118, 32,
33, 34, 37, 38, 43, 46, 54, 58, 70, 72, 80, 86, 93, 100, 107, 36, 34,
36, 42, 48, 53, 56, 63, 68, 79, 81, 88, 94, 98, 101, 105, 53, 49, 50,
@@ -438,40 +429,16 @@
79, 84, 95, 102, 116, 124, 141, 144, 148, 157, 150, 161, 157, 93, 86,
82, 80, 86, 94, 105, 112, 122, 135, 149, 162, 167, 174, 183, 182, 99,
93, 89, 88, 90, 97, 105, 115, 124, 135, 146, 159, 171, 186, 193, 203,
+ /* Size 16x8 */
+ 32, 32, 36, 53, 65, 87, 93, 99, 31, 33, 34, 49, 59, 78, 86, 93, 32, 34,
+ 36, 50, 59, 77, 82, 89, 34, 37, 42, 54, 63, 79, 80, 88, 36, 38, 48, 60,
+ 68, 84, 86, 90, 44, 43, 53, 71, 79, 95, 94, 97, 48, 46, 56, 76, 85, 102,
+ 105, 105, 58, 54, 63, 87, 98, 116, 112, 115, 65, 58, 68, 92, 105, 124,
+ 122, 124, 79, 70, 79, 104, 118, 141, 135, 135, 82, 72, 81, 106, 121,
+ 144, 149, 146, 91, 80, 88, 106, 130, 148, 162, 159, 97, 86, 94, 107,
+ 128, 157, 167, 171, 103, 93, 98, 114, 131, 150, 174, 186, 110, 100, 101,
+ 117, 138, 161, 183, 193, 118, 107, 105, 118, 136, 157, 182, 203,
/* Size 16x32 */
- 32, 31, 32, 34, 36, 44, 53, 59, 65, 79, 87, 90, 93, 96, 99, 102, 31, 32,
- 32, 34, 35, 42, 51, 56, 62, 75, 82, 85, 88, 91, 94, 97, 31, 32, 33, 33,
- 34, 41, 49, 54, 59, 72, 78, 82, 86, 90, 93, 97, 31, 32, 33, 34, 35, 41,
- 49, 54, 59, 71, 78, 81, 84, 87, 90, 93, 32, 32, 34, 35, 36, 42, 50, 54,
- 59, 71, 77, 80, 82, 86, 89, 93, 32, 33, 35, 37, 38, 42, 49, 53, 58, 69,
- 75, 78, 82, 86, 89, 92, 34, 34, 37, 39, 42, 48, 54, 58, 63, 73, 79, 78,
- 80, 83, 88, 92, 35, 34, 37, 41, 45, 50, 57, 61, 65, 76, 82, 83, 84, 84,
- 87, 90, 36, 34, 38, 43, 48, 54, 60, 64, 68, 78, 84, 87, 86, 89, 90, 90,
- 39, 37, 40, 45, 50, 58, 65, 69, 73, 84, 89, 89, 91, 91, 93, 96, 44, 41,
- 43, 48, 53, 63, 71, 75, 79, 90, 95, 93, 94, 95, 97, 97, 46, 43, 44, 49,
- 55, 65, 73, 78, 82, 93, 98, 100, 98, 100, 99, 103, 48, 45, 46, 51, 56,
- 67, 76, 80, 85, 96, 102, 102, 105, 102, 105, 104, 53, 49, 50, 54, 60,
- 71, 82, 87, 92, 103, 109, 107, 107, 110, 107, 111, 58, 54, 54, 58, 63,
- 75, 87, 92, 98, 110, 116, 115, 112, 111, 115, 112, 61, 57, 56, 60, 66,
- 77, 89, 95, 101, 114, 120, 118, 119, 118, 116, 120, 65, 60, 58, 63, 68,
- 79, 92, 98, 105, 118, 124, 123, 122, 123, 124, 121, 71, 65, 63, 68, 73,
- 84, 97, 103, 111, 125, 132, 132, 130, 128, 127, 130, 79, 72, 70, 74, 79,
- 90, 104, 110, 118, 133, 141, 136, 135, 135, 135, 131, 81, 74, 71, 75,
- 80, 91, 105, 112, 119, 135, 142, 140, 140, 138, 139, 142, 82, 75, 72,
- 76, 81, 92, 106, 113, 121, 136, 144, 151, 149, 149, 146, 143, 88, 80,
- 77, 80, 85, 97, 108, 115, 126, 142, 149, 153, 153, 152, 152, 154, 91,
- 83, 80, 81, 88, 100, 106, 114, 130, 142, 148, 155, 162, 160, 159, 155,
- 94, 85, 83, 82, 91, 100, 105, 118, 131, 137, 153, 160, 165, 167, 166,
- 168, 97, 88, 86, 85, 94, 100, 107, 123, 128, 140, 157, 161, 167, 173,
- 171, 169, 100, 91, 89, 87, 97, 100, 111, 121, 127, 145, 152, 164, 173,
- 178, 182, 181, 103, 94, 93, 90, 98, 101, 114, 120, 131, 144, 150, 170,
- 174, 180, 186, 183, 107, 97, 96, 93, 100, 104, 117, 119, 136, 142, 155,
- 168, 177, 187, 191, 198, 110, 101, 100, 97, 101, 108, 117, 123, 138,
- 141, 161, 165, 183, 188, 193, 200, 114, 104, 104, 100, 103, 112, 117,
- 127, 137, 146, 159, 167, 185, 190, 201, 206, 118, 108, 107, 103, 105,
- 115, 118, 131, 136, 151, 157, 172, 182, 197, 203, 208, 122, 111, 111,
- 107, 107, 119, 119, 136, 136, 156, 156, 178, 179, 203, 204, 217,
- /* Size 32x16 */
32, 31, 31, 31, 32, 32, 34, 35, 36, 39, 44, 46, 48, 53, 58, 61, 65, 71,
79, 81, 82, 88, 91, 94, 97, 100, 103, 107, 110, 114, 118, 122, 31, 32,
32, 32, 32, 33, 34, 34, 34, 37, 41, 43, 45, 49, 54, 57, 60, 65, 72, 74,
@@ -504,34 +471,50 @@
152, 159, 166, 171, 182, 186, 191, 193, 201, 203, 204, 102, 97, 97, 93,
93, 92, 92, 90, 90, 96, 97, 103, 104, 111, 112, 120, 121, 130, 131, 142,
143, 154, 155, 168, 169, 181, 183, 198, 200, 206, 208, 217,
+ /* Size 32x16 */
+ 32, 31, 32, 34, 36, 44, 53, 59, 65, 79, 87, 90, 93, 96, 99, 102, 31, 32,
+ 32, 34, 35, 42, 51, 56, 62, 75, 82, 85, 88, 91, 94, 97, 31, 32, 33, 33,
+ 34, 41, 49, 54, 59, 72, 78, 82, 86, 90, 93, 97, 31, 32, 33, 34, 35, 41,
+ 49, 54, 59, 71, 78, 81, 84, 87, 90, 93, 32, 32, 34, 35, 36, 42, 50, 54,
+ 59, 71, 77, 80, 82, 86, 89, 93, 32, 33, 35, 37, 38, 42, 49, 53, 58, 69,
+ 75, 78, 82, 86, 89, 92, 34, 34, 37, 39, 42, 48, 54, 58, 63, 73, 79, 78,
+ 80, 83, 88, 92, 35, 34, 37, 41, 45, 50, 57, 61, 65, 76, 82, 83, 84, 84,
+ 87, 90, 36, 34, 38, 43, 48, 54, 60, 64, 68, 78, 84, 87, 86, 89, 90, 90,
+ 39, 37, 40, 45, 50, 58, 65, 69, 73, 84, 89, 89, 91, 91, 93, 96, 44, 41,
+ 43, 48, 53, 63, 71, 75, 79, 90, 95, 93, 94, 95, 97, 97, 46, 43, 44, 49,
+ 55, 65, 73, 78, 82, 93, 98, 100, 98, 100, 99, 103, 48, 45, 46, 51, 56,
+ 67, 76, 80, 85, 96, 102, 102, 105, 102, 105, 104, 53, 49, 50, 54, 60,
+ 71, 82, 87, 92, 103, 109, 107, 107, 110, 107, 111, 58, 54, 54, 58, 63,
+ 75, 87, 92, 98, 110, 116, 115, 112, 111, 115, 112, 61, 57, 56, 60, 66,
+ 77, 89, 95, 101, 114, 120, 118, 119, 118, 116, 120, 65, 60, 58, 63, 68,
+ 79, 92, 98, 105, 118, 124, 123, 122, 123, 124, 121, 71, 65, 63, 68, 73,
+ 84, 97, 103, 111, 125, 132, 132, 130, 128, 127, 130, 79, 72, 70, 74, 79,
+ 90, 104, 110, 118, 133, 141, 136, 135, 135, 135, 131, 81, 74, 71, 75,
+ 80, 91, 105, 112, 119, 135, 142, 140, 140, 138, 139, 142, 82, 75, 72,
+ 76, 81, 92, 106, 113, 121, 136, 144, 151, 149, 149, 146, 143, 88, 80,
+ 77, 80, 85, 97, 108, 115, 126, 142, 149, 153, 153, 152, 152, 154, 91,
+ 83, 80, 81, 88, 100, 106, 114, 130, 142, 148, 155, 162, 160, 159, 155,
+ 94, 85, 83, 82, 91, 100, 105, 118, 131, 137, 153, 160, 165, 167, 166,
+ 168, 97, 88, 86, 85, 94, 100, 107, 123, 128, 140, 157, 161, 167, 173,
+ 171, 169, 100, 91, 89, 87, 97, 100, 111, 121, 127, 145, 152, 164, 173,
+ 178, 182, 181, 103, 94, 93, 90, 98, 101, 114, 120, 131, 144, 150, 170,
+ 174, 180, 186, 183, 107, 97, 96, 93, 100, 104, 117, 119, 136, 142, 155,
+ 168, 177, 187, 191, 198, 110, 101, 100, 97, 101, 108, 117, 123, 138,
+ 141, 161, 165, 183, 188, 193, 200, 114, 104, 104, 100, 103, 112, 117,
+ 127, 137, 146, 159, 167, 185, 190, 201, 206, 118, 108, 107, 103, 105,
+ 115, 118, 131, 136, 151, 157, 172, 182, 197, 203, 208, 122, 111, 111,
+ 107, 107, 119, 119, 136, 136, 156, 156, 178, 179, 203, 204, 217,
/* Size 4x16 */
- 31, 44, 79, 96, 32, 41, 72, 90, 32, 42, 71, 86, 34, 48, 73, 83, 34, 54,
- 78, 89, 41, 63, 90, 95, 45, 67, 96, 102, 54, 75, 110, 111, 60, 79, 118,
- 123, 72, 90, 133, 135, 75, 92, 136, 149, 83, 100, 142, 160, 88, 100,
- 140, 173, 94, 101, 144, 180, 101, 108, 141, 188, 108, 115, 151, 197,
- /* Size 16x4 */
31, 32, 32, 34, 34, 41, 45, 54, 60, 72, 75, 83, 88, 94, 101, 108, 44,
41, 42, 48, 54, 63, 67, 75, 79, 90, 92, 100, 100, 101, 108, 115, 79, 72,
71, 73, 78, 90, 96, 110, 118, 133, 136, 142, 140, 144, 141, 151, 96, 90,
86, 83, 89, 95, 102, 111, 123, 135, 149, 160, 173, 180, 188, 197,
+ /* Size 16x4 */
+ 31, 44, 79, 96, 32, 41, 72, 90, 32, 42, 71, 86, 34, 48, 73, 83, 34, 54,
+ 78, 89, 41, 63, 90, 95, 45, 67, 96, 102, 54, 75, 110, 111, 60, 79, 118,
+ 123, 72, 90, 133, 135, 75, 92, 136, 149, 83, 100, 142, 160, 88, 100,
+ 140, 173, 94, 101, 144, 180, 101, 108, 141, 188, 108, 115, 151, 197,
/* Size 8x32 */
- 32, 32, 36, 53, 65, 87, 93, 99, 31, 32, 35, 51, 62, 82, 88, 94, 31, 33,
- 34, 49, 59, 78, 86, 93, 31, 33, 35, 49, 59, 78, 84, 90, 32, 34, 36, 50,
- 59, 77, 82, 89, 32, 35, 38, 49, 58, 75, 82, 89, 34, 37, 42, 54, 63, 79,
- 80, 88, 35, 37, 45, 57, 65, 82, 84, 87, 36, 38, 48, 60, 68, 84, 86, 90,
- 39, 40, 50, 65, 73, 89, 91, 93, 44, 43, 53, 71, 79, 95, 94, 97, 46, 44,
- 55, 73, 82, 98, 98, 99, 48, 46, 56, 76, 85, 102, 105, 105, 53, 50, 60,
- 82, 92, 109, 107, 107, 58, 54, 63, 87, 98, 116, 112, 115, 61, 56, 66,
- 89, 101, 120, 119, 116, 65, 58, 68, 92, 105, 124, 122, 124, 71, 63, 73,
- 97, 111, 132, 130, 127, 79, 70, 79, 104, 118, 141, 135, 135, 81, 71, 80,
- 105, 119, 142, 140, 139, 82, 72, 81, 106, 121, 144, 149, 146, 88, 77,
- 85, 108, 126, 149, 153, 152, 91, 80, 88, 106, 130, 148, 162, 159, 94,
- 83, 91, 105, 131, 153, 165, 166, 97, 86, 94, 107, 128, 157, 167, 171,
- 100, 89, 97, 111, 127, 152, 173, 182, 103, 93, 98, 114, 131, 150, 174,
- 186, 107, 96, 100, 117, 136, 155, 177, 191, 110, 100, 101, 117, 138,
- 161, 183, 193, 114, 104, 103, 117, 137, 159, 185, 201, 118, 107, 105,
- 118, 136, 157, 182, 203, 122, 111, 107, 119, 136, 156, 179, 204,
- /* Size 32x8 */
32, 31, 31, 31, 32, 32, 34, 35, 36, 39, 44, 46, 48, 53, 58, 61, 65, 71,
79, 81, 82, 88, 91, 94, 97, 100, 103, 107, 110, 114, 118, 122, 32, 32,
33, 33, 34, 35, 37, 37, 38, 40, 43, 44, 46, 50, 54, 56, 58, 63, 70, 71,
@@ -547,7 +530,24 @@
84, 86, 91, 94, 98, 105, 107, 112, 119, 122, 130, 135, 140, 149, 153,
162, 165, 167, 173, 174, 177, 183, 185, 182, 179, 99, 94, 93, 90, 89,
89, 88, 87, 90, 93, 97, 99, 105, 107, 115, 116, 124, 127, 135, 139, 146,
- 152, 159, 166, 171, 182, 186, 191, 193, 201, 203, 204 },
+ 152, 159, 166, 171, 182, 186, 191, 193, 201, 203, 204,
+ /* Size 32x8 */
+ 32, 32, 36, 53, 65, 87, 93, 99, 31, 32, 35, 51, 62, 82, 88, 94, 31, 33,
+ 34, 49, 59, 78, 86, 93, 31, 33, 35, 49, 59, 78, 84, 90, 32, 34, 36, 50,
+ 59, 77, 82, 89, 32, 35, 38, 49, 58, 75, 82, 89, 34, 37, 42, 54, 63, 79,
+ 80, 88, 35, 37, 45, 57, 65, 82, 84, 87, 36, 38, 48, 60, 68, 84, 86, 90,
+ 39, 40, 50, 65, 73, 89, 91, 93, 44, 43, 53, 71, 79, 95, 94, 97, 46, 44,
+ 55, 73, 82, 98, 98, 99, 48, 46, 56, 76, 85, 102, 105, 105, 53, 50, 60,
+ 82, 92, 109, 107, 107, 58, 54, 63, 87, 98, 116, 112, 115, 61, 56, 66,
+ 89, 101, 120, 119, 116, 65, 58, 68, 92, 105, 124, 122, 124, 71, 63, 73,
+ 97, 111, 132, 130, 127, 79, 70, 79, 104, 118, 141, 135, 135, 81, 71, 80,
+ 105, 119, 142, 140, 139, 82, 72, 81, 106, 121, 144, 149, 146, 88, 77,
+ 85, 108, 126, 149, 153, 152, 91, 80, 88, 106, 130, 148, 162, 159, 94,
+ 83, 91, 105, 131, 153, 165, 166, 97, 86, 94, 107, 128, 157, 167, 171,
+ 100, 89, 97, 111, 127, 152, 173, 182, 103, 93, 98, 114, 131, 150, 174,
+ 186, 107, 96, 100, 117, 136, 155, 177, 191, 110, 100, 101, 117, 138,
+ 161, 183, 193, 114, 104, 103, 117, 137, 159, 185, 201, 118, 107, 105,
+ 118, 136, 157, 182, 203, 122, 111, 107, 119, 136, 156, 179, 204 },
{ /* Chroma */
/* Size 4x4 */
35, 46, 57, 66, 46, 60, 69, 71, 57, 69, 90, 90, 66, 71, 90, 109,
@@ -633,21 +633,12 @@
77, 78, 82, 82, 86, 87, 92, 92, 96, 97, 102, 102, 107, 107, 112, 113,
115, 115, 118,
/* Size 4x8 */
- 31, 47, 60, 66, 40, 45, 54, 61, 46, 56, 64, 64, 48, 61, 75, 73, 54, 65,
- 85, 82, 61, 69, 92, 92, 64, 68, 90, 102, 68, 71, 87, 105,
- /* Size 8x4 */
31, 40, 46, 48, 54, 61, 64, 68, 47, 45, 56, 61, 65, 69, 68, 71, 60, 54,
64, 75, 85, 92, 90, 87, 66, 61, 64, 73, 82, 92, 102, 105,
+ /* Size 8x4 */
+ 31, 47, 60, 66, 40, 45, 54, 61, 46, 56, 64, 64, 48, 61, 75, 73, 54, 65,
+ 85, 82, 61, 69, 92, 92, 64, 68, 90, 102, 68, 71, 87, 105,
/* Size 8x16 */
- 32, 37, 48, 52, 57, 66, 68, 71, 30, 40, 46, 48, 52, 60, 63, 66, 33, 43,
- 47, 47, 51, 59, 60, 63, 42, 47, 50, 50, 53, 60, 59, 62, 49, 48, 53, 54,
- 57, 62, 62, 62, 49, 46, 53, 61, 64, 69, 66, 66, 50, 46, 54, 64, 67, 73,
- 72, 70, 54, 49, 55, 68, 73, 80, 76, 75, 57, 50, 56, 70, 76, 84, 80, 79,
- 63, 55, 60, 75, 82, 92, 87, 84, 64, 56, 61, 75, 83, 93, 93, 89, 68, 59,
- 64, 74, 86, 94, 98, 94, 70, 62, 66, 73, 83, 96, 99, 98, 72, 64, 66, 75,
- 83, 92, 101, 104, 74, 67, 66, 74, 84, 94, 103, 106, 76, 69, 67, 73, 82,
- 91, 101, 109,
- /* Size 16x8 */
32, 30, 33, 42, 49, 49, 50, 54, 57, 63, 64, 68, 70, 72, 74, 76, 37, 40,
43, 47, 48, 46, 46, 49, 50, 55, 56, 59, 62, 64, 67, 69, 48, 46, 47, 50,
53, 53, 54, 55, 56, 60, 61, 64, 66, 66, 66, 67, 52, 48, 47, 50, 54, 61,
@@ -656,37 +647,16 @@
93, 94, 96, 92, 94, 91, 68, 63, 60, 59, 62, 66, 72, 76, 80, 87, 93, 98,
99, 101, 103, 101, 71, 66, 63, 62, 62, 66, 70, 75, 79, 84, 89, 94, 98,
104, 106, 109,
+ /* Size 16x8 */
+ 32, 37, 48, 52, 57, 66, 68, 71, 30, 40, 46, 48, 52, 60, 63, 66, 33, 43,
+ 47, 47, 51, 59, 60, 63, 42, 47, 50, 50, 53, 60, 59, 62, 49, 48, 53, 54,
+ 57, 62, 62, 62, 49, 46, 53, 61, 64, 69, 66, 66, 50, 46, 54, 64, 67, 73,
+ 72, 70, 54, 49, 55, 68, 73, 80, 76, 75, 57, 50, 56, 70, 76, 84, 80, 79,
+ 63, 55, 60, 75, 82, 92, 87, 84, 64, 56, 61, 75, 83, 93, 93, 89, 68, 59,
+ 64, 74, 86, 94, 98, 94, 70, 62, 66, 73, 83, 96, 99, 98, 72, 64, 66, 75,
+ 83, 92, 101, 104, 74, 67, 66, 74, 84, 94, 103, 106, 76, 69, 67, 73, 82,
+ 91, 101, 109,
/* Size 16x32 */
- 32, 31, 37, 42, 48, 49, 52, 54, 57, 63, 66, 67, 68, 69, 71, 72, 31, 31,
- 38, 42, 47, 47, 50, 52, 54, 60, 63, 64, 65, 66, 67, 68, 30, 32, 40, 42,
- 46, 45, 48, 50, 52, 57, 60, 62, 63, 65, 66, 68, 32, 34, 41, 44, 46, 45,
- 48, 49, 51, 57, 59, 61, 62, 63, 64, 65, 33, 36, 43, 45, 47, 46, 47, 49,
- 51, 56, 59, 60, 60, 62, 63, 65, 37, 40, 47, 47, 47, 45, 47, 48, 50, 54,
- 57, 58, 60, 61, 62, 63, 42, 43, 47, 48, 50, 49, 50, 52, 53, 57, 60, 58,
- 59, 60, 62, 63, 45, 44, 47, 49, 51, 51, 52, 54, 55, 59, 61, 61, 61, 60,
- 61, 61, 49, 46, 48, 50, 53, 53, 54, 55, 57, 60, 62, 63, 62, 63, 62, 62,
- 48, 46, 47, 50, 53, 56, 57, 59, 60, 64, 66, 65, 65, 64, 64, 65, 49, 45,
- 46, 49, 53, 58, 61, 62, 64, 67, 69, 67, 66, 66, 66, 65, 49, 46, 46, 49,
- 53, 59, 62, 64, 65, 69, 71, 70, 68, 68, 67, 68, 50, 46, 46, 50, 54, 59,
- 64, 65, 67, 71, 73, 72, 72, 70, 70, 69, 52, 48, 47, 50, 54, 61, 66, 68,
- 71, 75, 77, 74, 73, 73, 71, 72, 54, 50, 49, 52, 55, 62, 68, 71, 73, 78,
- 80, 78, 76, 74, 75, 73, 55, 51, 49, 52, 56, 63, 69, 72, 75, 80, 82, 80,
- 79, 78, 76, 77, 57, 52, 50, 53, 56, 64, 70, 73, 76, 82, 84, 82, 80, 80,
- 79, 77, 60, 54, 52, 55, 58, 65, 72, 75, 79, 85, 88, 86, 84, 82, 81, 81,
- 63, 57, 55, 58, 60, 67, 75, 78, 82, 89, 92, 88, 87, 85, 84, 81, 64, 58,
- 55, 58, 61, 68, 75, 78, 82, 89, 92, 90, 89, 87, 86, 86, 64, 59, 56, 58,
- 61, 68, 75, 79, 83, 90, 93, 95, 93, 91, 89, 87, 67, 61, 58, 60, 63, 69,
- 76, 79, 85, 92, 95, 96, 94, 92, 91, 91, 68, 62, 59, 60, 64, 71, 74, 78,
- 86, 91, 94, 96, 98, 96, 94, 91, 69, 62, 60, 60, 65, 70, 72, 79, 85, 88,
- 95, 98, 99, 98, 97, 96, 70, 63, 62, 60, 66, 69, 73, 81, 83, 89, 96, 97,
- 99, 101, 98, 97, 71, 64, 63, 61, 67, 68, 74, 79, 82, 90, 93, 98, 102,
- 102, 102, 101, 72, 65, 64, 62, 66, 68, 75, 78, 83, 89, 92, 100, 101,
- 103, 104, 102, 73, 66, 65, 63, 66, 69, 75, 76, 84, 87, 93, 98, 102, 105,
- 106, 107, 74, 67, 67, 64, 66, 70, 74, 77, 84, 86, 94, 96, 103, 105, 106,
- 107, 75, 68, 68, 65, 66, 71, 74, 78, 83, 87, 93, 96, 103, 105, 109, 109,
- 76, 69, 69, 66, 67, 72, 73, 80, 82, 88, 91, 97, 101, 107, 109, 110, 77,
- 70, 70, 67, 67, 73, 73, 81, 81, 90, 90, 99, 99, 108, 108, 113,
- /* Size 32x16 */
32, 31, 30, 32, 33, 37, 42, 45, 49, 48, 49, 49, 50, 52, 54, 55, 57, 60,
63, 64, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 31, 31, 32, 34,
36, 40, 43, 44, 46, 46, 45, 46, 46, 48, 50, 51, 52, 54, 57, 58, 59, 61,
@@ -716,33 +686,47 @@
76, 79, 81, 84, 86, 89, 91, 94, 97, 98, 102, 104, 106, 106, 109, 109,
108, 72, 68, 68, 65, 65, 63, 63, 61, 62, 65, 65, 68, 69, 72, 73, 77, 77,
81, 81, 86, 87, 91, 91, 96, 97, 101, 102, 107, 107, 109, 110, 113,
+ /* Size 32x16 */
+ 32, 31, 37, 42, 48, 49, 52, 54, 57, 63, 66, 67, 68, 69, 71, 72, 31, 31,
+ 38, 42, 47, 47, 50, 52, 54, 60, 63, 64, 65, 66, 67, 68, 30, 32, 40, 42,
+ 46, 45, 48, 50, 52, 57, 60, 62, 63, 65, 66, 68, 32, 34, 41, 44, 46, 45,
+ 48, 49, 51, 57, 59, 61, 62, 63, 64, 65, 33, 36, 43, 45, 47, 46, 47, 49,
+ 51, 56, 59, 60, 60, 62, 63, 65, 37, 40, 47, 47, 47, 45, 47, 48, 50, 54,
+ 57, 58, 60, 61, 62, 63, 42, 43, 47, 48, 50, 49, 50, 52, 53, 57, 60, 58,
+ 59, 60, 62, 63, 45, 44, 47, 49, 51, 51, 52, 54, 55, 59, 61, 61, 61, 60,
+ 61, 61, 49, 46, 48, 50, 53, 53, 54, 55, 57, 60, 62, 63, 62, 63, 62, 62,
+ 48, 46, 47, 50, 53, 56, 57, 59, 60, 64, 66, 65, 65, 64, 64, 65, 49, 45,
+ 46, 49, 53, 58, 61, 62, 64, 67, 69, 67, 66, 66, 66, 65, 49, 46, 46, 49,
+ 53, 59, 62, 64, 65, 69, 71, 70, 68, 68, 67, 68, 50, 46, 46, 50, 54, 59,
+ 64, 65, 67, 71, 73, 72, 72, 70, 70, 69, 52, 48, 47, 50, 54, 61, 66, 68,
+ 71, 75, 77, 74, 73, 73, 71, 72, 54, 50, 49, 52, 55, 62, 68, 71, 73, 78,
+ 80, 78, 76, 74, 75, 73, 55, 51, 49, 52, 56, 63, 69, 72, 75, 80, 82, 80,
+ 79, 78, 76, 77, 57, 52, 50, 53, 56, 64, 70, 73, 76, 82, 84, 82, 80, 80,
+ 79, 77, 60, 54, 52, 55, 58, 65, 72, 75, 79, 85, 88, 86, 84, 82, 81, 81,
+ 63, 57, 55, 58, 60, 67, 75, 78, 82, 89, 92, 88, 87, 85, 84, 81, 64, 58,
+ 55, 58, 61, 68, 75, 78, 82, 89, 92, 90, 89, 87, 86, 86, 64, 59, 56, 58,
+ 61, 68, 75, 79, 83, 90, 93, 95, 93, 91, 89, 87, 67, 61, 58, 60, 63, 69,
+ 76, 79, 85, 92, 95, 96, 94, 92, 91, 91, 68, 62, 59, 60, 64, 71, 74, 78,
+ 86, 91, 94, 96, 98, 96, 94, 91, 69, 62, 60, 60, 65, 70, 72, 79, 85, 88,
+ 95, 98, 99, 98, 97, 96, 70, 63, 62, 60, 66, 69, 73, 81, 83, 89, 96, 97,
+ 99, 101, 98, 97, 71, 64, 63, 61, 67, 68, 74, 79, 82, 90, 93, 98, 102,
+ 102, 102, 101, 72, 65, 64, 62, 66, 68, 75, 78, 83, 89, 92, 100, 101,
+ 103, 104, 102, 73, 66, 65, 63, 66, 69, 75, 76, 84, 87, 93, 98, 102, 105,
+ 106, 107, 74, 67, 67, 64, 66, 70, 74, 77, 84, 86, 94, 96, 103, 105, 106,
+ 107, 75, 68, 68, 65, 66, 71, 74, 78, 83, 87, 93, 96, 103, 105, 109, 109,
+ 76, 69, 69, 66, 67, 72, 73, 80, 82, 88, 91, 97, 101, 107, 109, 110, 77,
+ 70, 70, 67, 67, 73, 73, 81, 81, 90, 90, 99, 99, 108, 108, 113,
/* Size 4x16 */
- 31, 49, 63, 69, 32, 45, 57, 65, 36, 46, 56, 62, 43, 49, 57, 60, 46, 53,
- 60, 63, 45, 58, 67, 66, 46, 59, 71, 70, 50, 62, 78, 74, 52, 64, 82, 80,
- 57, 67, 89, 85, 59, 68, 90, 91, 62, 71, 91, 96, 63, 69, 89, 101, 65, 68,
- 89, 103, 67, 70, 86, 105, 69, 72, 88, 107,
- /* Size 16x4 */
31, 32, 36, 43, 46, 45, 46, 50, 52, 57, 59, 62, 63, 65, 67, 69, 49, 45,
46, 49, 53, 58, 59, 62, 64, 67, 68, 71, 69, 68, 70, 72, 63, 57, 56, 57,
60, 67, 71, 78, 82, 89, 90, 91, 89, 89, 86, 88, 69, 65, 62, 60, 63, 66,
70, 74, 80, 85, 91, 96, 101, 103, 105, 107,
+ /* Size 16x4 */
+ 31, 49, 63, 69, 32, 45, 57, 65, 36, 46, 56, 62, 43, 49, 57, 60, 46, 53,
+ 60, 63, 45, 58, 67, 66, 46, 59, 71, 70, 50, 62, 78, 74, 52, 64, 82, 80,
+ 57, 67, 89, 85, 59, 68, 90, 91, 62, 71, 91, 96, 63, 69, 89, 101, 65, 68,
+ 89, 103, 67, 70, 86, 105, 69, 72, 88, 107,
/* Size 8x32 */
- 32, 37, 48, 52, 57, 66, 68, 71, 31, 38, 47, 50, 54, 63, 65, 67, 30, 40,
- 46, 48, 52, 60, 63, 66, 32, 41, 46, 48, 51, 59, 62, 64, 33, 43, 47, 47,
- 51, 59, 60, 63, 37, 47, 47, 47, 50, 57, 60, 62, 42, 47, 50, 50, 53, 60,
- 59, 62, 45, 47, 51, 52, 55, 61, 61, 61, 49, 48, 53, 54, 57, 62, 62, 62,
- 48, 47, 53, 57, 60, 66, 65, 64, 49, 46, 53, 61, 64, 69, 66, 66, 49, 46,
- 53, 62, 65, 71, 68, 67, 50, 46, 54, 64, 67, 73, 72, 70, 52, 47, 54, 66,
- 71, 77, 73, 71, 54, 49, 55, 68, 73, 80, 76, 75, 55, 49, 56, 69, 75, 82,
- 79, 76, 57, 50, 56, 70, 76, 84, 80, 79, 60, 52, 58, 72, 79, 88, 84, 81,
- 63, 55, 60, 75, 82, 92, 87, 84, 64, 55, 61, 75, 82, 92, 89, 86, 64, 56,
- 61, 75, 83, 93, 93, 89, 67, 58, 63, 76, 85, 95, 94, 91, 68, 59, 64, 74,
- 86, 94, 98, 94, 69, 60, 65, 72, 85, 95, 99, 97, 70, 62, 66, 73, 83, 96,
- 99, 98, 71, 63, 67, 74, 82, 93, 102, 102, 72, 64, 66, 75, 83, 92, 101,
- 104, 73, 65, 66, 75, 84, 93, 102, 106, 74, 67, 66, 74, 84, 94, 103, 106,
- 75, 68, 66, 74, 83, 93, 103, 109, 76, 69, 67, 73, 82, 91, 101, 109, 77,
- 70, 67, 73, 81, 90, 99, 108,
- /* Size 32x8 */
32, 31, 30, 32, 33, 37, 42, 45, 49, 48, 49, 49, 50, 52, 54, 55, 57, 60,
63, 64, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 37, 38, 40, 41,
43, 47, 47, 47, 48, 47, 46, 46, 46, 47, 49, 49, 50, 52, 55, 55, 56, 58,
@@ -757,7 +741,23 @@
59, 61, 62, 65, 66, 68, 72, 73, 76, 79, 80, 84, 87, 89, 93, 94, 98, 99,
99, 102, 101, 102, 103, 103, 101, 99, 71, 67, 66, 64, 63, 62, 62, 61,
62, 64, 66, 67, 70, 71, 75, 76, 79, 81, 84, 86, 89, 91, 94, 97, 98, 102,
- 104, 106, 106, 109, 109, 108 },
+ 104, 106, 106, 109, 109, 108,
+ /* Size 32x8 */
+ 32, 37, 48, 52, 57, 66, 68, 71, 31, 38, 47, 50, 54, 63, 65, 67, 30, 40,
+ 46, 48, 52, 60, 63, 66, 32, 41, 46, 48, 51, 59, 62, 64, 33, 43, 47, 47,
+ 51, 59, 60, 63, 37, 47, 47, 47, 50, 57, 60, 62, 42, 47, 50, 50, 53, 60,
+ 59, 62, 45, 47, 51, 52, 55, 61, 61, 61, 49, 48, 53, 54, 57, 62, 62, 62,
+ 48, 47, 53, 57, 60, 66, 65, 64, 49, 46, 53, 61, 64, 69, 66, 66, 49, 46,
+ 53, 62, 65, 71, 68, 67, 50, 46, 54, 64, 67, 73, 72, 70, 52, 47, 54, 66,
+ 71, 77, 73, 71, 54, 49, 55, 68, 73, 80, 76, 75, 55, 49, 56, 69, 75, 82,
+ 79, 76, 57, 50, 56, 70, 76, 84, 80, 79, 60, 52, 58, 72, 79, 88, 84, 81,
+ 63, 55, 60, 75, 82, 92, 87, 84, 64, 55, 61, 75, 82, 92, 89, 86, 64, 56,
+ 61, 75, 83, 93, 93, 89, 67, 58, 63, 76, 85, 95, 94, 91, 68, 59, 64, 74,
+ 86, 94, 98, 94, 69, 60, 65, 72, 85, 95, 99, 97, 70, 62, 66, 73, 83, 96,
+ 99, 98, 71, 63, 67, 74, 82, 93, 102, 102, 72, 64, 66, 75, 83, 92, 101,
+ 104, 73, 65, 66, 75, 84, 93, 102, 106, 74, 67, 66, 74, 84, 94, 103, 106,
+ 75, 68, 66, 74, 83, 93, 103, 109, 76, 69, 67, 73, 82, 91, 101, 109, 77,
+ 70, 67, 73, 81, 90, 99, 108 },
},
{
{ /* Luma */
@@ -851,21 +851,12 @@
121, 129, 130, 139, 140, 151, 151, 162, 162, 175, 176, 187, 188, 203,
204, 210, 211, 219,
/* Size 4x8 */
- 32, 42, 69, 88, 33, 42, 64, 83, 36, 56, 77, 88, 46, 67, 93, 105, 60, 79,
- 112, 122, 75, 92, 130, 144, 86, 95, 136, 167, 98, 105, 136, 177,
- /* Size 8x4 */
32, 33, 36, 46, 60, 75, 86, 98, 42, 42, 56, 67, 79, 92, 95, 105, 69, 64,
77, 93, 112, 130, 136, 136, 88, 83, 88, 105, 122, 144, 167, 177,
+ /* Size 8x4 */
+ 32, 42, 69, 88, 33, 42, 64, 83, 36, 56, 77, 88, 46, 67, 93, 105, 60, 79,
+ 112, 122, 75, 92, 130, 144, 86, 95, 136, 167, 98, 105, 136, 177,
/* Size 8x16 */
- 32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 44, 60, 72, 84, 90, 32, 34,
- 36, 45, 59, 71, 80, 87, 32, 35, 40, 47, 60, 71, 78, 85, 36, 37, 48, 56,
- 68, 78, 83, 87, 39, 40, 50, 60, 73, 84, 91, 94, 47, 45, 56, 69, 84, 95,
- 101, 101, 53, 50, 60, 75, 92, 103, 108, 110, 61, 56, 65, 81, 100, 113,
- 116, 118, 71, 64, 73, 89, 111, 125, 129, 129, 79, 70, 79, 95, 118, 133,
- 142, 138, 86, 76, 84, 100, 124, 140, 153, 150, 92, 82, 89, 101, 121,
- 148, 157, 161, 98, 88, 93, 108, 124, 141, 163, 174, 104, 94, 95, 110,
- 129, 151, 171, 181, 110, 100, 98, 111, 127, 147, 169, 188,
- /* Size 16x8 */
32, 31, 32, 32, 36, 39, 47, 53, 61, 71, 79, 86, 92, 98, 104, 110, 32,
32, 34, 35, 37, 40, 45, 50, 56, 64, 70, 76, 82, 88, 94, 100, 36, 35, 36,
40, 48, 50, 56, 60, 65, 73, 79, 84, 89, 93, 95, 98, 47, 44, 45, 47, 56,
@@ -874,40 +865,16 @@
95, 103, 113, 125, 133, 140, 148, 141, 151, 147, 90, 84, 80, 78, 83, 91,
101, 108, 116, 129, 142, 153, 157, 163, 171, 169, 96, 90, 87, 85, 87,
94, 101, 110, 118, 129, 138, 150, 161, 174, 181, 188,
+ /* Size 16x8 */
+ 32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 44, 60, 72, 84, 90, 32, 34,
+ 36, 45, 59, 71, 80, 87, 32, 35, 40, 47, 60, 71, 78, 85, 36, 37, 48, 56,
+ 68, 78, 83, 87, 39, 40, 50, 60, 73, 84, 91, 94, 47, 45, 56, 69, 84, 95,
+ 101, 101, 53, 50, 60, 75, 92, 103, 108, 110, 61, 56, 65, 81, 100, 113,
+ 116, 118, 71, 64, 73, 89, 111, 125, 129, 129, 79, 70, 79, 95, 118, 133,
+ 142, 138, 86, 76, 84, 100, 124, 140, 153, 150, 92, 82, 89, 101, 121,
+ 148, 157, 161, 98, 88, 93, 108, 124, 141, 163, 174, 104, 94, 95, 110,
+ 129, 151, 171, 181, 110, 100, 98, 111, 127, 147, 169, 188,
/* Size 16x32 */
- 32, 31, 32, 32, 36, 44, 47, 53, 65, 73, 79, 87, 90, 93, 96, 99, 31, 32,
- 32, 33, 35, 42, 45, 51, 62, 69, 75, 83, 86, 88, 91, 94, 31, 32, 32, 33,
- 35, 41, 44, 49, 60, 67, 72, 80, 84, 87, 90, 94, 31, 32, 33, 33, 35, 41,
- 44, 49, 59, 66, 71, 79, 82, 84, 87, 90, 32, 32, 34, 34, 36, 42, 45, 50,
- 59, 65, 71, 78, 80, 83, 87, 90, 32, 33, 35, 36, 38, 42, 45, 49, 58, 64,
- 69, 76, 80, 83, 86, 88, 32, 33, 35, 36, 40, 44, 47, 51, 60, 66, 71, 76,
- 78, 81, 85, 89, 34, 34, 36, 38, 42, 48, 50, 54, 63, 69, 73, 80, 82, 81,
- 84, 86, 36, 34, 37, 40, 48, 54, 56, 60, 68, 74, 78, 84, 83, 86, 87, 87,
- 38, 36, 39, 41, 49, 56, 58, 63, 71, 77, 81, 86, 88, 88, 90, 93, 39, 37,
- 40, 42, 50, 58, 60, 65, 73, 79, 84, 90, 91, 92, 94, 93, 44, 41, 42, 45,
- 53, 63, 66, 71, 79, 85, 90, 96, 94, 96, 96, 99, 47, 44, 45, 47, 56, 66,
- 69, 75, 84, 90, 95, 99, 101, 98, 101, 99, 49, 46, 47, 48, 57, 67, 71,
- 77, 86, 93, 97, 103, 103, 105, 102, 106, 53, 49, 50, 51, 60, 71, 75, 82,
- 92, 99, 103, 111, 108, 107, 110, 107, 58, 54, 54, 55, 63, 75, 79, 87,
- 98, 105, 110, 114, 114, 113, 111, 115, 61, 56, 56, 57, 65, 77, 81, 89,
- 100, 107, 113, 118, 116, 117, 118, 116, 65, 60, 59, 60, 68, 79, 84, 92,
- 105, 112, 118, 126, 124, 122, 121, 124, 71, 65, 64, 65, 73, 84, 89, 97,
- 111, 119, 125, 130, 129, 129, 129, 125, 76, 69, 68, 69, 76, 88, 92, 101,
- 115, 123, 130, 134, 134, 131, 132, 135, 79, 72, 70, 71, 79, 90, 95, 104,
- 118, 127, 133, 143, 142, 141, 138, 136, 82, 75, 73, 74, 81, 92, 97, 106,
- 121, 130, 136, 146, 145, 144, 144, 145, 86, 78, 76, 77, 84, 95, 100,
- 109, 124, 133, 140, 147, 153, 151, 150, 146, 89, 81, 79, 78, 87, 95, 99,
- 112, 124, 130, 145, 152, 156, 157, 156, 158, 92, 84, 82, 80, 89, 95,
- 101, 116, 121, 132, 148, 151, 157, 163, 161, 159, 95, 86, 85, 83, 92,
- 95, 105, 114, 120, 136, 143, 155, 163, 167, 171, 170, 98, 89, 88, 85,
- 93, 95, 108, 113, 124, 136, 141, 160, 163, 169, 174, 171, 101, 92, 91,
- 88, 94, 98, 110, 112, 128, 133, 146, 158, 166, 175, 179, 185, 104, 95,
- 94, 91, 95, 101, 110, 115, 129, 132, 151, 154, 171, 175, 181, 186, 107,
- 98, 97, 94, 96, 105, 110, 119, 128, 136, 149, 156, 173, 177, 188, 192,
- 110, 101, 100, 97, 98, 108, 111, 123, 127, 141, 147, 161, 169, 183, 188,
- 193, 114, 104, 104, 100, 100, 111, 111, 126, 127, 145, 145, 166, 166,
- 189, 190, 201,
- /* Size 32x16 */
32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 61, 65,
71, 76, 79, 82, 86, 89, 92, 95, 98, 101, 104, 107, 110, 114, 31, 32, 32,
32, 32, 33, 33, 34, 34, 36, 37, 41, 44, 46, 49, 54, 56, 60, 65, 69, 72,
@@ -940,34 +907,50 @@
188, 188, 190, 99, 94, 94, 90, 90, 88, 89, 86, 87, 93, 93, 99, 99, 106,
107, 115, 116, 124, 125, 135, 136, 145, 146, 158, 159, 170, 171, 185,
186, 192, 193, 201,
+ /* Size 32x16 */
+ 32, 31, 32, 32, 36, 44, 47, 53, 65, 73, 79, 87, 90, 93, 96, 99, 31, 32,
+ 32, 33, 35, 42, 45, 51, 62, 69, 75, 83, 86, 88, 91, 94, 31, 32, 32, 33,
+ 35, 41, 44, 49, 60, 67, 72, 80, 84, 87, 90, 94, 31, 32, 33, 33, 35, 41,
+ 44, 49, 59, 66, 71, 79, 82, 84, 87, 90, 32, 32, 34, 34, 36, 42, 45, 50,
+ 59, 65, 71, 78, 80, 83, 87, 90, 32, 33, 35, 36, 38, 42, 45, 49, 58, 64,
+ 69, 76, 80, 83, 86, 88, 32, 33, 35, 36, 40, 44, 47, 51, 60, 66, 71, 76,
+ 78, 81, 85, 89, 34, 34, 36, 38, 42, 48, 50, 54, 63, 69, 73, 80, 82, 81,
+ 84, 86, 36, 34, 37, 40, 48, 54, 56, 60, 68, 74, 78, 84, 83, 86, 87, 87,
+ 38, 36, 39, 41, 49, 56, 58, 63, 71, 77, 81, 86, 88, 88, 90, 93, 39, 37,
+ 40, 42, 50, 58, 60, 65, 73, 79, 84, 90, 91, 92, 94, 93, 44, 41, 42, 45,
+ 53, 63, 66, 71, 79, 85, 90, 96, 94, 96, 96, 99, 47, 44, 45, 47, 56, 66,
+ 69, 75, 84, 90, 95, 99, 101, 98, 101, 99, 49, 46, 47, 48, 57, 67, 71,
+ 77, 86, 93, 97, 103, 103, 105, 102, 106, 53, 49, 50, 51, 60, 71, 75, 82,
+ 92, 99, 103, 111, 108, 107, 110, 107, 58, 54, 54, 55, 63, 75, 79, 87,
+ 98, 105, 110, 114, 114, 113, 111, 115, 61, 56, 56, 57, 65, 77, 81, 89,
+ 100, 107, 113, 118, 116, 117, 118, 116, 65, 60, 59, 60, 68, 79, 84, 92,
+ 105, 112, 118, 126, 124, 122, 121, 124, 71, 65, 64, 65, 73, 84, 89, 97,
+ 111, 119, 125, 130, 129, 129, 129, 125, 76, 69, 68, 69, 76, 88, 92, 101,
+ 115, 123, 130, 134, 134, 131, 132, 135, 79, 72, 70, 71, 79, 90, 95, 104,
+ 118, 127, 133, 143, 142, 141, 138, 136, 82, 75, 73, 74, 81, 92, 97, 106,
+ 121, 130, 136, 146, 145, 144, 144, 145, 86, 78, 76, 77, 84, 95, 100,
+ 109, 124, 133, 140, 147, 153, 151, 150, 146, 89, 81, 79, 78, 87, 95, 99,
+ 112, 124, 130, 145, 152, 156, 157, 156, 158, 92, 84, 82, 80, 89, 95,
+ 101, 116, 121, 132, 148, 151, 157, 163, 161, 159, 95, 86, 85, 83, 92,
+ 95, 105, 114, 120, 136, 143, 155, 163, 167, 171, 170, 98, 89, 88, 85,
+ 93, 95, 108, 113, 124, 136, 141, 160, 163, 169, 174, 171, 101, 92, 91,
+ 88, 94, 98, 110, 112, 128, 133, 146, 158, 166, 175, 179, 185, 104, 95,
+ 94, 91, 95, 101, 110, 115, 129, 132, 151, 154, 171, 175, 181, 186, 107,
+ 98, 97, 94, 96, 105, 110, 119, 128, 136, 149, 156, 173, 177, 188, 192,
+ 110, 101, 100, 97, 98, 108, 111, 123, 127, 141, 147, 161, 169, 183, 188,
+ 193, 114, 104, 104, 100, 100, 111, 111, 126, 127, 145, 145, 166, 166,
+ 189, 190, 201,
/* Size 4x16 */
- 31, 44, 73, 93, 32, 41, 67, 87, 32, 42, 65, 83, 33, 44, 66, 81, 34, 54,
- 74, 86, 37, 58, 79, 92, 44, 66, 90, 98, 49, 71, 99, 107, 56, 77, 107,
- 117, 65, 84, 119, 129, 72, 90, 127, 141, 78, 95, 133, 151, 84, 95, 132,
- 163, 89, 95, 136, 169, 95, 101, 132, 175, 101, 108, 141, 183,
- /* Size 16x4 */
31, 32, 32, 33, 34, 37, 44, 49, 56, 65, 72, 78, 84, 89, 95, 101, 44, 41,
42, 44, 54, 58, 66, 71, 77, 84, 90, 95, 95, 95, 101, 108, 73, 67, 65,
66, 74, 79, 90, 99, 107, 119, 127, 133, 132, 136, 132, 141, 93, 87, 83,
81, 86, 92, 98, 107, 117, 129, 141, 151, 163, 169, 175, 183,
+ /* Size 16x4 */
+ 31, 44, 73, 93, 32, 41, 67, 87, 32, 42, 65, 83, 33, 44, 66, 81, 34, 54,
+ 74, 86, 37, 58, 79, 92, 44, 66, 90, 98, 49, 71, 99, 107, 56, 77, 107,
+ 117, 65, 84, 119, 129, 72, 90, 127, 141, 78, 95, 133, 151, 84, 95, 132,
+ 163, 89, 95, 136, 169, 95, 101, 132, 175, 101, 108, 141, 183,
/* Size 8x32 */
- 32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 45, 62, 75, 86, 91, 31, 32,
- 35, 44, 60, 72, 84, 90, 31, 33, 35, 44, 59, 71, 82, 87, 32, 34, 36, 45,
- 59, 71, 80, 87, 32, 35, 38, 45, 58, 69, 80, 86, 32, 35, 40, 47, 60, 71,
- 78, 85, 34, 36, 42, 50, 63, 73, 82, 84, 36, 37, 48, 56, 68, 78, 83, 87,
- 38, 39, 49, 58, 71, 81, 88, 90, 39, 40, 50, 60, 73, 84, 91, 94, 44, 42,
- 53, 66, 79, 90, 94, 96, 47, 45, 56, 69, 84, 95, 101, 101, 49, 47, 57,
- 71, 86, 97, 103, 102, 53, 50, 60, 75, 92, 103, 108, 110, 58, 54, 63, 79,
- 98, 110, 114, 111, 61, 56, 65, 81, 100, 113, 116, 118, 65, 59, 68, 84,
- 105, 118, 124, 121, 71, 64, 73, 89, 111, 125, 129, 129, 76, 68, 76, 92,
- 115, 130, 134, 132, 79, 70, 79, 95, 118, 133, 142, 138, 82, 73, 81, 97,
- 121, 136, 145, 144, 86, 76, 84, 100, 124, 140, 153, 150, 89, 79, 87, 99,
- 124, 145, 156, 156, 92, 82, 89, 101, 121, 148, 157, 161, 95, 85, 92,
- 105, 120, 143, 163, 171, 98, 88, 93, 108, 124, 141, 163, 174, 101, 91,
- 94, 110, 128, 146, 166, 179, 104, 94, 95, 110, 129, 151, 171, 181, 107,
- 97, 96, 110, 128, 149, 173, 188, 110, 100, 98, 111, 127, 147, 169, 188,
- 114, 104, 100, 111, 127, 145, 166, 190,
- /* Size 32x8 */
32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 61, 65,
71, 76, 79, 82, 86, 89, 92, 95, 98, 101, 104, 107, 110, 114, 32, 32, 32,
33, 34, 35, 35, 36, 37, 39, 40, 42, 45, 47, 50, 54, 56, 59, 64, 68, 70,
@@ -983,7 +966,24 @@
101, 103, 108, 114, 116, 124, 129, 134, 142, 145, 153, 156, 157, 163,
163, 166, 171, 173, 169, 166, 96, 91, 90, 87, 87, 86, 85, 84, 87, 90,
94, 96, 101, 102, 110, 111, 118, 121, 129, 132, 138, 144, 150, 156, 161,
- 171, 174, 179, 181, 188, 188, 190 },
+ 171, 174, 179, 181, 188, 188, 190,
+ /* Size 32x8 */
+ 32, 32, 36, 47, 65, 79, 90, 96, 31, 32, 35, 45, 62, 75, 86, 91, 31, 32,
+ 35, 44, 60, 72, 84, 90, 31, 33, 35, 44, 59, 71, 82, 87, 32, 34, 36, 45,
+ 59, 71, 80, 87, 32, 35, 38, 45, 58, 69, 80, 86, 32, 35, 40, 47, 60, 71,
+ 78, 85, 34, 36, 42, 50, 63, 73, 82, 84, 36, 37, 48, 56, 68, 78, 83, 87,
+ 38, 39, 49, 58, 71, 81, 88, 90, 39, 40, 50, 60, 73, 84, 91, 94, 44, 42,
+ 53, 66, 79, 90, 94, 96, 47, 45, 56, 69, 84, 95, 101, 101, 49, 47, 57,
+ 71, 86, 97, 103, 102, 53, 50, 60, 75, 92, 103, 108, 110, 58, 54, 63, 79,
+ 98, 110, 114, 111, 61, 56, 65, 81, 100, 113, 116, 118, 65, 59, 68, 84,
+ 105, 118, 124, 121, 71, 64, 73, 89, 111, 125, 129, 129, 76, 68, 76, 92,
+ 115, 130, 134, 132, 79, 70, 79, 95, 118, 133, 142, 138, 82, 73, 81, 97,
+ 121, 136, 145, 144, 86, 76, 84, 100, 124, 140, 153, 150, 89, 79, 87, 99,
+ 124, 145, 156, 156, 92, 82, 89, 101, 121, 148, 157, 161, 95, 85, 92,
+ 105, 120, 143, 163, 171, 98, 88, 93, 108, 124, 141, 163, 174, 101, 91,
+ 94, 110, 128, 146, 166, 179, 104, 94, 95, 110, 129, 151, 171, 181, 107,
+ 97, 96, 110, 128, 149, 173, 188, 110, 100, 98, 111, 127, 147, 169, 188,
+ 114, 104, 100, 111, 127, 145, 166, 190 },
{ /* Chroma */
/* Size 4x4 */
33, 45, 56, 64, 45, 58, 66, 69, 56, 66, 86, 87, 64, 69, 87, 105,
@@ -1068,21 +1068,12 @@
71, 71, 68, 68, 66, 66, 64, 64, 68, 68, 71, 71, 75, 75, 79, 79, 83, 84,
88, 89, 93, 93, 98, 98, 102, 103, 108, 108, 110, 110, 113,
/* Size 4x8 */
- 31, 47, 57, 65, 40, 45, 52, 61, 46, 55, 61, 63, 47, 60, 70, 72, 52, 64,
- 79, 81, 59, 68, 87, 90, 63, 66, 88, 99, 66, 69, 85, 102,
- /* Size 8x4 */
31, 40, 46, 47, 52, 59, 63, 66, 47, 45, 55, 60, 64, 68, 66, 69, 57, 52,
61, 70, 79, 87, 88, 85, 65, 61, 63, 72, 81, 90, 99, 102,
+ /* Size 8x4 */
+ 31, 47, 57, 65, 40, 45, 52, 61, 46, 55, 61, 63, 47, 60, 70, 72, 52, 64,
+ 79, 81, 59, 68, 87, 90, 63, 66, 88, 99, 66, 69, 85, 102,
/* Size 8x16 */
- 32, 35, 48, 50, 57, 63, 68, 70, 30, 38, 46, 46, 52, 58, 63, 65, 33, 41,
- 47, 46, 51, 56, 60, 63, 39, 46, 48, 47, 51, 55, 58, 61, 49, 48, 53, 54,
- 57, 60, 61, 61, 48, 46, 53, 56, 60, 64, 65, 65, 50, 46, 54, 61, 66, 70,
- 71, 69, 52, 47, 54, 63, 71, 75, 75, 74, 55, 49, 56, 65, 74, 79, 79, 78,
- 60, 53, 58, 68, 79, 85, 85, 82, 63, 55, 60, 70, 82, 89, 91, 87, 66, 58,
- 62, 72, 84, 91, 95, 91, 68, 60, 64, 71, 81, 94, 97, 96, 70, 62, 65, 73,
- 81, 89, 98, 101, 72, 65, 65, 72, 82, 92, 100, 103, 74, 67, 65, 71, 79,
- 89, 98, 105,
- /* Size 16x8 */
32, 30, 33, 39, 49, 48, 50, 52, 55, 60, 63, 66, 68, 70, 72, 74, 35, 38,
41, 46, 48, 46, 46, 47, 49, 53, 55, 58, 60, 62, 65, 67, 48, 46, 47, 48,
53, 53, 54, 54, 56, 58, 60, 62, 64, 65, 65, 65, 50, 46, 46, 47, 54, 56,
@@ -1091,37 +1082,16 @@
89, 91, 94, 89, 92, 89, 68, 63, 60, 58, 61, 65, 71, 75, 79, 85, 91, 95,
97, 98, 100, 98, 70, 65, 63, 61, 61, 65, 69, 74, 78, 82, 87, 91, 96,
101, 103, 105,
+ /* Size 16x8 */
+ 32, 35, 48, 50, 57, 63, 68, 70, 30, 38, 46, 46, 52, 58, 63, 65, 33, 41,
+ 47, 46, 51, 56, 60, 63, 39, 46, 48, 47, 51, 55, 58, 61, 49, 48, 53, 54,
+ 57, 60, 61, 61, 48, 46, 53, 56, 60, 64, 65, 65, 50, 46, 54, 61, 66, 70,
+ 71, 69, 52, 47, 54, 63, 71, 75, 75, 74, 55, 49, 56, 65, 74, 79, 79, 78,
+ 60, 53, 58, 68, 79, 85, 85, 82, 63, 55, 60, 70, 82, 89, 91, 87, 66, 58,
+ 62, 72, 84, 91, 95, 91, 68, 60, 64, 71, 81, 94, 97, 96, 70, 62, 65, 73,
+ 81, 89, 98, 101, 72, 65, 65, 72, 82, 92, 100, 103, 74, 67, 65, 71, 79,
+ 89, 98, 105,
/* Size 16x32 */
- 32, 31, 35, 38, 48, 49, 50, 52, 57, 61, 63, 67, 68, 69, 70, 71, 31, 31,
- 37, 40, 47, 47, 48, 50, 54, 57, 60, 63, 64, 65, 66, 67, 30, 32, 38, 40,
- 46, 45, 46, 48, 52, 55, 58, 61, 63, 64, 65, 67, 31, 33, 38, 41, 46, 45,
- 46, 48, 52, 55, 57, 60, 61, 62, 63, 64, 33, 36, 41, 44, 47, 46, 46, 47,
- 51, 54, 56, 59, 60, 61, 63, 64, 37, 40, 45, 47, 47, 45, 46, 47, 50, 52,
- 54, 57, 59, 61, 62, 62, 39, 41, 46, 47, 48, 47, 47, 48, 51, 54, 55, 57,
- 58, 59, 61, 62, 42, 43, 46, 48, 50, 49, 50, 50, 53, 56, 57, 60, 60, 59,
- 60, 60, 49, 46, 48, 49, 53, 53, 54, 54, 57, 59, 60, 63, 61, 62, 61, 61,
- 48, 46, 47, 48, 53, 55, 55, 56, 58, 61, 62, 64, 64, 63, 63, 64, 48, 46,
- 46, 48, 53, 56, 56, 57, 60, 62, 64, 66, 65, 65, 65, 64, 49, 45, 45, 47,
- 53, 58, 59, 61, 64, 66, 67, 69, 67, 67, 66, 67, 50, 46, 46, 48, 54, 59,
- 61, 63, 66, 68, 70, 71, 71, 68, 69, 67, 51, 47, 47, 48, 54, 60, 61, 64,
- 68, 70, 71, 73, 72, 72, 70, 71, 52, 48, 47, 48, 54, 61, 63, 66, 71, 73,
- 75, 77, 75, 73, 74, 71, 54, 50, 49, 50, 55, 62, 65, 68, 73, 76, 78, 79,
- 78, 76, 74, 75, 55, 51, 49, 50, 56, 63, 65, 69, 74, 77, 79, 81, 79, 78,
- 78, 75, 57, 52, 50, 51, 56, 64, 66, 70, 76, 79, 82, 85, 83, 81, 79, 79,
- 60, 54, 53, 53, 58, 65, 68, 72, 79, 82, 85, 87, 85, 84, 82, 80, 62, 56,
- 54, 55, 60, 66, 69, 74, 81, 84, 87, 88, 87, 85, 84, 84, 63, 57, 55, 56,
- 60, 67, 70, 75, 82, 86, 89, 92, 91, 89, 87, 84, 64, 59, 56, 57, 61, 68,
- 71, 75, 83, 87, 90, 93, 92, 90, 89, 89, 66, 60, 58, 58, 62, 69, 72, 76,
- 84, 88, 91, 94, 95, 93, 91, 89, 67, 61, 59, 58, 63, 68, 71, 78, 83, 86,
- 93, 96, 96, 96, 94, 94, 68, 62, 60, 59, 64, 67, 71, 79, 81, 86, 94, 95,
- 97, 98, 96, 94, 69, 63, 61, 60, 65, 66, 72, 77, 80, 88, 91, 96, 99, 99,
- 100, 98, 70, 64, 62, 60, 65, 66, 73, 76, 81, 87, 89, 97, 98, 100, 101,
- 99, 71, 65, 64, 61, 65, 67, 73, 74, 82, 85, 90, 95, 99, 102, 103, 104,
- 72, 65, 65, 62, 65, 68, 72, 75, 82, 83, 92, 93, 100, 102, 103, 104, 73,
- 66, 66, 63, 65, 69, 72, 76, 81, 85, 90, 93, 100, 102, 105, 106, 74, 67,
- 67, 64, 65, 70, 71, 77, 79, 86, 89, 94, 98, 103, 105, 106, 75, 68, 68,
- 65, 65, 71, 71, 78, 78, 87, 87, 96, 96, 105, 105, 109,
- /* Size 32x16 */
32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 55, 57,
60, 62, 63, 64, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 31, 31, 32, 33,
36, 40, 41, 43, 46, 46, 46, 45, 46, 47, 48, 50, 51, 52, 54, 56, 57, 59,
@@ -1151,33 +1121,47 @@
79, 82, 84, 87, 89, 91, 94, 96, 100, 101, 103, 103, 105, 105, 105, 71,
67, 67, 64, 64, 62, 62, 60, 61, 64, 64, 67, 67, 71, 71, 75, 75, 79, 80,
84, 84, 89, 89, 94, 94, 98, 99, 104, 104, 106, 106, 109,
+ /* Size 32x16 */
+ 32, 31, 35, 38, 48, 49, 50, 52, 57, 61, 63, 67, 68, 69, 70, 71, 31, 31,
+ 37, 40, 47, 47, 48, 50, 54, 57, 60, 63, 64, 65, 66, 67, 30, 32, 38, 40,
+ 46, 45, 46, 48, 52, 55, 58, 61, 63, 64, 65, 67, 31, 33, 38, 41, 46, 45,
+ 46, 48, 52, 55, 57, 60, 61, 62, 63, 64, 33, 36, 41, 44, 47, 46, 46, 47,
+ 51, 54, 56, 59, 60, 61, 63, 64, 37, 40, 45, 47, 47, 45, 46, 47, 50, 52,
+ 54, 57, 59, 61, 62, 62, 39, 41, 46, 47, 48, 47, 47, 48, 51, 54, 55, 57,
+ 58, 59, 61, 62, 42, 43, 46, 48, 50, 49, 50, 50, 53, 56, 57, 60, 60, 59,
+ 60, 60, 49, 46, 48, 49, 53, 53, 54, 54, 57, 59, 60, 63, 61, 62, 61, 61,
+ 48, 46, 47, 48, 53, 55, 55, 56, 58, 61, 62, 64, 64, 63, 63, 64, 48, 46,
+ 46, 48, 53, 56, 56, 57, 60, 62, 64, 66, 65, 65, 65, 64, 49, 45, 45, 47,
+ 53, 58, 59, 61, 64, 66, 67, 69, 67, 67, 66, 67, 50, 46, 46, 48, 54, 59,
+ 61, 63, 66, 68, 70, 71, 71, 68, 69, 67, 51, 47, 47, 48, 54, 60, 61, 64,
+ 68, 70, 71, 73, 72, 72, 70, 71, 52, 48, 47, 48, 54, 61, 63, 66, 71, 73,
+ 75, 77, 75, 73, 74, 71, 54, 50, 49, 50, 55, 62, 65, 68, 73, 76, 78, 79,
+ 78, 76, 74, 75, 55, 51, 49, 50, 56, 63, 65, 69, 74, 77, 79, 81, 79, 78,
+ 78, 75, 57, 52, 50, 51, 56, 64, 66, 70, 76, 79, 82, 85, 83, 81, 79, 79,
+ 60, 54, 53, 53, 58, 65, 68, 72, 79, 82, 85, 87, 85, 84, 82, 80, 62, 56,
+ 54, 55, 60, 66, 69, 74, 81, 84, 87, 88, 87, 85, 84, 84, 63, 57, 55, 56,
+ 60, 67, 70, 75, 82, 86, 89, 92, 91, 89, 87, 84, 64, 59, 56, 57, 61, 68,
+ 71, 75, 83, 87, 90, 93, 92, 90, 89, 89, 66, 60, 58, 58, 62, 69, 72, 76,
+ 84, 88, 91, 94, 95, 93, 91, 89, 67, 61, 59, 58, 63, 68, 71, 78, 83, 86,
+ 93, 96, 96, 96, 94, 94, 68, 62, 60, 59, 64, 67, 71, 79, 81, 86, 94, 95,
+ 97, 98, 96, 94, 69, 63, 61, 60, 65, 66, 72, 77, 80, 88, 91, 96, 99, 99,
+ 100, 98, 70, 64, 62, 60, 65, 66, 73, 76, 81, 87, 89, 97, 98, 100, 101,
+ 99, 71, 65, 64, 61, 65, 67, 73, 74, 82, 85, 90, 95, 99, 102, 103, 104,
+ 72, 65, 65, 62, 65, 68, 72, 75, 82, 83, 92, 93, 100, 102, 103, 104, 73,
+ 66, 66, 63, 65, 69, 72, 76, 81, 85, 90, 93, 100, 102, 105, 106, 74, 67,
+ 67, 64, 65, 70, 71, 77, 79, 86, 89, 94, 98, 103, 105, 106, 75, 68, 68,
+ 65, 65, 71, 71, 78, 78, 87, 87, 96, 96, 105, 105, 109,
/* Size 4x16 */
- 31, 49, 61, 69, 32, 45, 55, 64, 36, 46, 54, 61, 41, 47, 54, 59, 46, 53,
- 59, 62, 46, 56, 62, 65, 46, 59, 68, 68, 48, 61, 73, 73, 51, 63, 77, 78,
- 54, 65, 82, 84, 57, 67, 86, 89, 60, 69, 88, 93, 62, 67, 86, 98, 64, 66,
- 87, 100, 65, 68, 83, 102, 67, 70, 86, 103,
- /* Size 16x4 */
31, 32, 36, 41, 46, 46, 46, 48, 51, 54, 57, 60, 62, 64, 65, 67, 49, 45,
46, 47, 53, 56, 59, 61, 63, 65, 67, 69, 67, 66, 68, 70, 61, 55, 54, 54,
59, 62, 68, 73, 77, 82, 86, 88, 86, 87, 83, 86, 69, 64, 61, 59, 62, 65,
68, 73, 78, 84, 89, 93, 98, 100, 102, 103,
+ /* Size 16x4 */
+ 31, 49, 61, 69, 32, 45, 55, 64, 36, 46, 54, 61, 41, 47, 54, 59, 46, 53,
+ 59, 62, 46, 56, 62, 65, 46, 59, 68, 68, 48, 61, 73, 73, 51, 63, 77, 78,
+ 54, 65, 82, 84, 57, 67, 86, 89, 60, 69, 88, 93, 62, 67, 86, 98, 64, 66,
+ 87, 100, 65, 68, 83, 102, 67, 70, 86, 103,
/* Size 8x32 */
- 32, 35, 48, 50, 57, 63, 68, 70, 31, 37, 47, 48, 54, 60, 64, 66, 30, 38,
- 46, 46, 52, 58, 63, 65, 31, 38, 46, 46, 52, 57, 61, 63, 33, 41, 47, 46,
- 51, 56, 60, 63, 37, 45, 47, 46, 50, 54, 59, 62, 39, 46, 48, 47, 51, 55,
- 58, 61, 42, 46, 50, 50, 53, 57, 60, 60, 49, 48, 53, 54, 57, 60, 61, 61,
- 48, 47, 53, 55, 58, 62, 64, 63, 48, 46, 53, 56, 60, 64, 65, 65, 49, 45,
- 53, 59, 64, 67, 67, 66, 50, 46, 54, 61, 66, 70, 71, 69, 51, 47, 54, 61,
- 68, 71, 72, 70, 52, 47, 54, 63, 71, 75, 75, 74, 54, 49, 55, 65, 73, 78,
- 78, 74, 55, 49, 56, 65, 74, 79, 79, 78, 57, 50, 56, 66, 76, 82, 83, 79,
- 60, 53, 58, 68, 79, 85, 85, 82, 62, 54, 60, 69, 81, 87, 87, 84, 63, 55,
- 60, 70, 82, 89, 91, 87, 64, 56, 61, 71, 83, 90, 92, 89, 66, 58, 62, 72,
- 84, 91, 95, 91, 67, 59, 63, 71, 83, 93, 96, 94, 68, 60, 64, 71, 81, 94,
- 97, 96, 69, 61, 65, 72, 80, 91, 99, 100, 70, 62, 65, 73, 81, 89, 98,
- 101, 71, 64, 65, 73, 82, 90, 99, 103, 72, 65, 65, 72, 82, 92, 100, 103,
- 73, 66, 65, 72, 81, 90, 100, 105, 74, 67, 65, 71, 79, 89, 98, 105, 75,
- 68, 65, 71, 78, 87, 96, 105,
- /* Size 32x8 */
32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 55, 57,
60, 62, 63, 64, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 35, 37, 38, 38,
41, 45, 46, 46, 48, 47, 46, 45, 46, 47, 47, 49, 49, 50, 53, 54, 55, 56,
@@ -1192,7 +1176,23 @@
58, 60, 61, 64, 65, 67, 71, 72, 75, 78, 79, 83, 85, 87, 91, 92, 95, 96,
97, 99, 98, 99, 100, 100, 98, 96, 70, 66, 65, 63, 63, 62, 61, 60, 61,
63, 65, 66, 69, 70, 74, 74, 78, 79, 82, 84, 87, 89, 91, 94, 96, 100,
- 101, 103, 103, 105, 105, 105 },
+ 101, 103, 103, 105, 105, 105,
+ /* Size 32x8 */
+ 32, 35, 48, 50, 57, 63, 68, 70, 31, 37, 47, 48, 54, 60, 64, 66, 30, 38,
+ 46, 46, 52, 58, 63, 65, 31, 38, 46, 46, 52, 57, 61, 63, 33, 41, 47, 46,
+ 51, 56, 60, 63, 37, 45, 47, 46, 50, 54, 59, 62, 39, 46, 48, 47, 51, 55,
+ 58, 61, 42, 46, 50, 50, 53, 57, 60, 60, 49, 48, 53, 54, 57, 60, 61, 61,
+ 48, 47, 53, 55, 58, 62, 64, 63, 48, 46, 53, 56, 60, 64, 65, 65, 49, 45,
+ 53, 59, 64, 67, 67, 66, 50, 46, 54, 61, 66, 70, 71, 69, 51, 47, 54, 61,
+ 68, 71, 72, 70, 52, 47, 54, 63, 71, 75, 75, 74, 54, 49, 55, 65, 73, 78,
+ 78, 74, 55, 49, 56, 65, 74, 79, 79, 78, 57, 50, 56, 66, 76, 82, 83, 79,
+ 60, 53, 58, 68, 79, 85, 85, 82, 62, 54, 60, 69, 81, 87, 87, 84, 63, 55,
+ 60, 70, 82, 89, 91, 87, 64, 56, 61, 71, 83, 90, 92, 89, 66, 58, 62, 72,
+ 84, 91, 95, 91, 67, 59, 63, 71, 83, 93, 96, 94, 68, 60, 64, 71, 81, 94,
+ 97, 96, 69, 61, 65, 72, 80, 91, 99, 100, 70, 62, 65, 73, 81, 89, 98,
+ 101, 71, 64, 65, 73, 82, 90, 99, 103, 72, 65, 65, 72, 82, 92, 100, 103,
+ 73, 66, 65, 72, 81, 90, 100, 105, 74, 67, 65, 71, 79, 89, 98, 105, 75,
+ 68, 65, 71, 78, 87, 96, 105 },
},
{
{ /* Luma */
@@ -1284,21 +1284,12 @@
101, 97, 97, 95, 95, 93, 93, 99, 99, 105, 105, 112, 112, 120, 120, 129,
129, 139, 140, 149, 149, 161, 161, 172, 172, 185, 186, 191, 192, 199,
/* Size 4x8 */
- 32, 38, 62, 86, 32, 40, 58, 80, 34, 51, 68, 85, 44, 61, 85, 101, 54, 69,
- 98, 117, 72, 84, 118, 136, 82, 89, 129, 157, 92, 98, 127, 165,
- /* Size 8x4 */
32, 32, 34, 44, 54, 72, 82, 92, 38, 40, 51, 61, 69, 84, 89, 98, 62, 58,
68, 85, 98, 118, 129, 127, 86, 80, 85, 101, 117, 136, 157, 165,
+ /* Size 8x4 */
+ 32, 38, 62, 86, 32, 40, 58, 80, 34, 51, 68, 85, 44, 61, 85, 101, 54, 69,
+ 98, 117, 72, 84, 118, 136, 82, 89, 129, 157, 92, 98, 127, 165,
/* Size 8x16 */
- 32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 41, 54, 73, 81, 88, 32, 33,
- 36, 42, 53, 71, 78, 84, 32, 34, 38, 42, 52, 69, 76, 82, 34, 36, 44, 50,
- 59, 75, 81, 84, 39, 39, 50, 58, 68, 84, 88, 90, 44, 42, 53, 63, 74, 90,
- 97, 97, 49, 46, 57, 67, 81, 97, 104, 105, 57, 53, 63, 74, 90, 108, 111,
- 113, 65, 59, 68, 79, 97, 118, 123, 122, 71, 64, 73, 84, 102, 125, 135,
- 131, 81, 72, 80, 91, 110, 135, 145, 141, 87, 77, 85, 96, 114, 140, 148,
- 151, 92, 83, 88, 102, 117, 133, 153, 163, 98, 88, 89, 103, 121, 141,
- 160, 169, 103, 94, 92, 103, 119, 137, 158, 175,
- /* Size 16x8 */
32, 31, 32, 32, 34, 39, 44, 49, 57, 65, 71, 81, 87, 92, 98, 103, 32, 32,
33, 34, 36, 39, 42, 46, 53, 59, 64, 72, 77, 83, 88, 94, 36, 35, 36, 38,
44, 50, 53, 57, 63, 68, 73, 80, 85, 88, 89, 92, 44, 41, 42, 42, 50, 58,
@@ -1307,39 +1298,16 @@
97, 108, 118, 125, 135, 140, 133, 141, 137, 88, 81, 78, 76, 81, 88, 97,
104, 111, 123, 135, 145, 148, 153, 160, 158, 93, 88, 84, 82, 84, 90, 97,
105, 113, 122, 131, 141, 151, 163, 169, 175,
+ /* Size 16x8 */
+ 32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 41, 54, 73, 81, 88, 32, 33,
+ 36, 42, 53, 71, 78, 84, 32, 34, 38, 42, 52, 69, 76, 82, 34, 36, 44, 50,
+ 59, 75, 81, 84, 39, 39, 50, 58, 68, 84, 88, 90, 44, 42, 53, 63, 74, 90,
+ 97, 97, 49, 46, 57, 67, 81, 97, 104, 105, 57, 53, 63, 74, 90, 108, 111,
+ 113, 65, 59, 68, 79, 97, 118, 123, 122, 71, 64, 73, 84, 102, 125, 135,
+ 131, 81, 72, 80, 91, 110, 135, 145, 141, 87, 77, 85, 96, 114, 140, 148,
+ 151, 92, 83, 88, 102, 117, 133, 153, 163, 98, 88, 89, 103, 121, 141,
+ 160, 169, 103, 94, 92, 103, 119, 137, 158, 175,
/* Size 16x32 */
- 32, 31, 32, 32, 36, 39, 44, 53, 58, 65, 79, 81, 88, 90, 93, 96, 31, 32,
- 32, 32, 35, 38, 42, 51, 55, 62, 75, 77, 83, 86, 88, 91, 31, 32, 32, 32,
- 35, 38, 41, 50, 54, 60, 73, 75, 81, 84, 88, 91, 31, 32, 32, 33, 34, 37,
- 41, 49, 53, 59, 72, 74, 79, 82, 84, 87, 32, 32, 33, 34, 36, 39, 42, 50,
- 53, 59, 71, 72, 78, 81, 84, 87, 32, 32, 34, 34, 37, 40, 42, 49, 53, 58,
- 70, 71, 77, 80, 83, 85, 32, 33, 34, 35, 38, 40, 42, 49, 52, 58, 69, 70,
- 76, 78, 82, 86, 34, 34, 35, 37, 42, 45, 48, 54, 57, 63, 73, 75, 79, 79,
- 81, 83, 34, 34, 36, 37, 44, 47, 50, 56, 59, 65, 75, 77, 81, 83, 84, 84,
- 36, 34, 37, 38, 48, 51, 54, 60, 63, 68, 78, 80, 85, 85, 86, 89, 39, 37,
- 39, 40, 50, 54, 58, 65, 68, 73, 84, 85, 88, 89, 90, 89, 40, 38, 40, 41,
- 51, 55, 59, 67, 70, 75, 85, 87, 91, 92, 92, 95, 44, 41, 42, 43, 53, 58,
- 63, 71, 74, 79, 90, 91, 97, 94, 97, 95, 47, 44, 45, 46, 56, 61, 66, 75,
- 79, 85, 95, 97, 99, 101, 98, 102, 49, 46, 46, 47, 57, 62, 67, 77, 81,
- 86, 97, 99, 104, 102, 105, 102, 53, 49, 50, 50, 60, 65, 71, 82, 86, 92,
- 103, 105, 109, 108, 106, 110, 57, 53, 53, 53, 63, 68, 74, 86, 90, 97,
- 108, 110, 111, 112, 113, 110, 59, 54, 54, 54, 64, 69, 75, 87, 91, 98,
- 111, 112, 119, 117, 115, 118, 65, 60, 59, 58, 68, 73, 79, 92, 97, 105,
- 118, 119, 123, 123, 122, 119, 69, 63, 62, 62, 71, 76, 83, 96, 100, 109,
- 122, 124, 127, 125, 125, 128, 71, 65, 64, 63, 73, 78, 84, 97, 102, 111,
- 125, 127, 135, 134, 131, 129, 79, 72, 71, 70, 79, 84, 90, 104, 109, 118,
- 133, 135, 137, 136, 136, 137, 81, 74, 72, 71, 80, 85, 91, 105, 110, 120,
- 135, 137, 145, 143, 141, 138, 82, 75, 73, 72, 81, 86, 92, 106, 111, 121,
- 136, 139, 147, 148, 147, 149, 87, 79, 77, 76, 85, 90, 96, 110, 114, 125,
- 140, 143, 148, 154, 151, 149, 90, 82, 80, 78, 87, 89, 99, 108, 113, 129,
- 135, 146, 153, 157, 160, 159, 92, 84, 83, 81, 88, 90, 102, 106, 117,
- 128, 133, 150, 153, 158, 163, 160, 95, 87, 85, 83, 88, 92, 103, 105,
- 120, 125, 137, 148, 155, 164, 168, 173, 98, 89, 88, 85, 89, 95, 103,
- 108, 121, 124, 141, 144, 160, 164, 169, 174, 100, 92, 91, 88, 90, 98,
- 103, 111, 120, 127, 139, 146, 161, 165, 175, 179, 103, 94, 94, 90, 92,
- 101, 103, 114, 119, 131, 137, 150, 158, 170, 175, 180, 106, 97, 97, 93,
- 93, 104, 104, 118, 118, 135, 135, 154, 155, 175, 176, 187,
- /* Size 32x16 */
32, 31, 31, 31, 32, 32, 32, 34, 34, 36, 39, 40, 44, 47, 49, 53, 57, 59,
65, 69, 71, 79, 81, 82, 87, 90, 92, 95, 98, 100, 103, 106, 31, 32, 32,
32, 32, 32, 33, 34, 34, 34, 37, 38, 41, 44, 46, 49, 53, 54, 60, 63, 65,
@@ -1371,34 +1339,49 @@
136, 141, 147, 151, 160, 163, 168, 169, 175, 175, 176, 96, 91, 91, 87,
87, 85, 86, 83, 84, 89, 89, 95, 95, 102, 102, 110, 110, 118, 119, 128,
129, 137, 138, 149, 149, 159, 160, 173, 174, 179, 180, 187,
+ /* Size 32x16 */
+ 32, 31, 32, 32, 36, 39, 44, 53, 58, 65, 79, 81, 88, 90, 93, 96, 31, 32,
+ 32, 32, 35, 38, 42, 51, 55, 62, 75, 77, 83, 86, 88, 91, 31, 32, 32, 32,
+ 35, 38, 41, 50, 54, 60, 73, 75, 81, 84, 88, 91, 31, 32, 32, 33, 34, 37,
+ 41, 49, 53, 59, 72, 74, 79, 82, 84, 87, 32, 32, 33, 34, 36, 39, 42, 50,
+ 53, 59, 71, 72, 78, 81, 84, 87, 32, 32, 34, 34, 37, 40, 42, 49, 53, 58,
+ 70, 71, 77, 80, 83, 85, 32, 33, 34, 35, 38, 40, 42, 49, 52, 58, 69, 70,
+ 76, 78, 82, 86, 34, 34, 35, 37, 42, 45, 48, 54, 57, 63, 73, 75, 79, 79,
+ 81, 83, 34, 34, 36, 37, 44, 47, 50, 56, 59, 65, 75, 77, 81, 83, 84, 84,
+ 36, 34, 37, 38, 48, 51, 54, 60, 63, 68, 78, 80, 85, 85, 86, 89, 39, 37,
+ 39, 40, 50, 54, 58, 65, 68, 73, 84, 85, 88, 89, 90, 89, 40, 38, 40, 41,
+ 51, 55, 59, 67, 70, 75, 85, 87, 91, 92, 92, 95, 44, 41, 42, 43, 53, 58,
+ 63, 71, 74, 79, 90, 91, 97, 94, 97, 95, 47, 44, 45, 46, 56, 61, 66, 75,
+ 79, 85, 95, 97, 99, 101, 98, 102, 49, 46, 46, 47, 57, 62, 67, 77, 81,
+ 86, 97, 99, 104, 102, 105, 102, 53, 49, 50, 50, 60, 65, 71, 82, 86, 92,
+ 103, 105, 109, 108, 106, 110, 57, 53, 53, 53, 63, 68, 74, 86, 90, 97,
+ 108, 110, 111, 112, 113, 110, 59, 54, 54, 54, 64, 69, 75, 87, 91, 98,
+ 111, 112, 119, 117, 115, 118, 65, 60, 59, 58, 68, 73, 79, 92, 97, 105,
+ 118, 119, 123, 123, 122, 119, 69, 63, 62, 62, 71, 76, 83, 96, 100, 109,
+ 122, 124, 127, 125, 125, 128, 71, 65, 64, 63, 73, 78, 84, 97, 102, 111,
+ 125, 127, 135, 134, 131, 129, 79, 72, 71, 70, 79, 84, 90, 104, 109, 118,
+ 133, 135, 137, 136, 136, 137, 81, 74, 72, 71, 80, 85, 91, 105, 110, 120,
+ 135, 137, 145, 143, 141, 138, 82, 75, 73, 72, 81, 86, 92, 106, 111, 121,
+ 136, 139, 147, 148, 147, 149, 87, 79, 77, 76, 85, 90, 96, 110, 114, 125,
+ 140, 143, 148, 154, 151, 149, 90, 82, 80, 78, 87, 89, 99, 108, 113, 129,
+ 135, 146, 153, 157, 160, 159, 92, 84, 83, 81, 88, 90, 102, 106, 117,
+ 128, 133, 150, 153, 158, 163, 160, 95, 87, 85, 83, 88, 92, 103, 105,
+ 120, 125, 137, 148, 155, 164, 168, 173, 98, 89, 88, 85, 89, 95, 103,
+ 108, 121, 124, 141, 144, 160, 164, 169, 174, 100, 92, 91, 88, 90, 98,
+ 103, 111, 120, 127, 139, 146, 161, 165, 175, 179, 103, 94, 94, 90, 92,
+ 101, 103, 114, 119, 131, 137, 150, 158, 170, 175, 180, 106, 97, 97, 93,
+ 93, 104, 104, 118, 118, 135, 135, 154, 155, 175, 176, 187,
/* Size 4x16 */
- 31, 39, 65, 90, 32, 38, 60, 84, 32, 39, 59, 81, 33, 40, 58, 78, 34, 47,
- 65, 83, 37, 54, 73, 89, 41, 58, 79, 94, 46, 62, 86, 102, 53, 68, 97,
- 112, 60, 73, 105, 123, 65, 78, 111, 134, 74, 85, 120, 143, 79, 90, 125,
- 154, 84, 90, 128, 158, 89, 95, 124, 164, 94, 101, 131, 170,
- /* Size 16x4 */
31, 32, 32, 33, 34, 37, 41, 46, 53, 60, 65, 74, 79, 84, 89, 94, 39, 38,
39, 40, 47, 54, 58, 62, 68, 73, 78, 85, 90, 90, 95, 101, 65, 60, 59, 58,
65, 73, 79, 86, 97, 105, 111, 120, 125, 128, 124, 131, 90, 84, 81, 78,
83, 89, 94, 102, 112, 123, 134, 143, 154, 158, 164, 170,
+ /* Size 16x4 */
+ 31, 39, 65, 90, 32, 38, 60, 84, 32, 39, 59, 81, 33, 40, 58, 78, 34, 47,
+ 65, 83, 37, 54, 73, 89, 41, 58, 79, 94, 46, 62, 86, 102, 53, 68, 97,
+ 112, 60, 73, 105, 123, 65, 78, 111, 134, 74, 85, 120, 143, 79, 90, 125,
+ 154, 84, 90, 128, 158, 89, 95, 124, 164, 94, 101, 131, 170,
/* Size 8x32 */
- 32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 42, 55, 75, 83, 88, 31, 32,
- 35, 41, 54, 73, 81, 88, 31, 32, 34, 41, 53, 72, 79, 84, 32, 33, 36, 42,
- 53, 71, 78, 84, 32, 34, 37, 42, 53, 70, 77, 83, 32, 34, 38, 42, 52, 69,
- 76, 82, 34, 35, 42, 48, 57, 73, 79, 81, 34, 36, 44, 50, 59, 75, 81, 84,
- 36, 37, 48, 54, 63, 78, 85, 86, 39, 39, 50, 58, 68, 84, 88, 90, 40, 40,
- 51, 59, 70, 85, 91, 92, 44, 42, 53, 63, 74, 90, 97, 97, 47, 45, 56, 66,
- 79, 95, 99, 98, 49, 46, 57, 67, 81, 97, 104, 105, 53, 50, 60, 71, 86,
- 103, 109, 106, 57, 53, 63, 74, 90, 108, 111, 113, 59, 54, 64, 75, 91,
- 111, 119, 115, 65, 59, 68, 79, 97, 118, 123, 122, 69, 62, 71, 83, 100,
- 122, 127, 125, 71, 64, 73, 84, 102, 125, 135, 131, 79, 71, 79, 90, 109,
- 133, 137, 136, 81, 72, 80, 91, 110, 135, 145, 141, 82, 73, 81, 92, 111,
- 136, 147, 147, 87, 77, 85, 96, 114, 140, 148, 151, 90, 80, 87, 99, 113,
- 135, 153, 160, 92, 83, 88, 102, 117, 133, 153, 163, 95, 85, 88, 103,
- 120, 137, 155, 168, 98, 88, 89, 103, 121, 141, 160, 169, 100, 91, 90,
- 103, 120, 139, 161, 175, 103, 94, 92, 103, 119, 137, 158, 175, 106, 97,
- 93, 104, 118, 135, 155, 176,
- /* Size 32x8 */
32, 31, 31, 31, 32, 32, 32, 34, 34, 36, 39, 40, 44, 47, 49, 53, 57, 59,
65, 69, 71, 79, 81, 82, 87, 90, 92, 95, 98, 100, 103, 106, 32, 32, 32,
32, 33, 34, 34, 35, 36, 37, 39, 40, 42, 45, 46, 50, 53, 54, 59, 62, 64,
@@ -1414,7 +1397,24 @@
99, 104, 109, 111, 119, 123, 127, 135, 137, 145, 147, 148, 153, 153,
155, 160, 161, 158, 155, 93, 88, 88, 84, 84, 83, 82, 81, 84, 86, 90, 92,
97, 98, 105, 106, 113, 115, 122, 125, 131, 136, 141, 147, 151, 160, 163,
- 168, 169, 175, 175, 176 },
+ 168, 169, 175, 175, 176,
+ /* Size 32x8 */
+ 32, 32, 36, 44, 58, 79, 88, 93, 31, 32, 35, 42, 55, 75, 83, 88, 31, 32,
+ 35, 41, 54, 73, 81, 88, 31, 32, 34, 41, 53, 72, 79, 84, 32, 33, 36, 42,
+ 53, 71, 78, 84, 32, 34, 37, 42, 53, 70, 77, 83, 32, 34, 38, 42, 52, 69,
+ 76, 82, 34, 35, 42, 48, 57, 73, 79, 81, 34, 36, 44, 50, 59, 75, 81, 84,
+ 36, 37, 48, 54, 63, 78, 85, 86, 39, 39, 50, 58, 68, 84, 88, 90, 40, 40,
+ 51, 59, 70, 85, 91, 92, 44, 42, 53, 63, 74, 90, 97, 97, 47, 45, 56, 66,
+ 79, 95, 99, 98, 49, 46, 57, 67, 81, 97, 104, 105, 53, 50, 60, 71, 86,
+ 103, 109, 106, 57, 53, 63, 74, 90, 108, 111, 113, 59, 54, 64, 75, 91,
+ 111, 119, 115, 65, 59, 68, 79, 97, 118, 123, 122, 69, 62, 71, 83, 100,
+ 122, 127, 125, 71, 64, 73, 84, 102, 125, 135, 131, 79, 71, 79, 90, 109,
+ 133, 137, 136, 81, 72, 80, 91, 110, 135, 145, 141, 82, 73, 81, 92, 111,
+ 136, 147, 147, 87, 77, 85, 96, 114, 140, 148, 151, 90, 80, 87, 99, 113,
+ 135, 153, 160, 92, 83, 88, 102, 117, 133, 153, 163, 95, 85, 88, 103,
+ 120, 137, 155, 168, 98, 88, 89, 103, 121, 141, 160, 169, 100, 91, 90,
+ 103, 120, 139, 161, 175, 103, 94, 92, 103, 119, 137, 158, 175, 106, 97,
+ 93, 104, 118, 135, 155, 176 },
{ /* Chroma */
/* Size 4x4 */
32, 45, 53, 63, 45, 55, 62, 67, 53, 62, 80, 84, 63, 67, 84, 101,
@@ -1499,21 +1499,12 @@
62, 66, 66, 69, 69, 72, 73, 76, 77, 81, 81, 85, 85, 89, 90, 94, 94, 99,
99, 104, 104, 106, 106, 108,
/* Size 4x8 */
- 31, 47, 54, 64, 38, 46, 50, 60, 46, 53, 57, 62, 46, 56, 66, 71, 50, 59,
- 74, 79, 57, 64, 82, 88, 61, 65, 85, 97, 65, 67, 82, 99,
- /* Size 8x4 */
31, 38, 46, 46, 50, 57, 61, 65, 47, 46, 53, 56, 59, 64, 65, 67, 54, 50,
57, 66, 74, 82, 85, 82, 64, 60, 62, 71, 79, 88, 97, 99,
+ /* Size 8x4 */
+ 31, 47, 54, 64, 38, 46, 50, 60, 46, 53, 57, 62, 46, 56, 66, 71, 50, 59,
+ 74, 79, 57, 64, 82, 88, 61, 65, 85, 97, 65, 67, 82, 99,
/* Size 8x16 */
- 32, 34, 48, 49, 54, 63, 67, 69, 31, 36, 46, 46, 50, 58, 62, 65, 33, 40,
- 47, 46, 49, 56, 59, 62, 37, 44, 47, 45, 48, 54, 57, 60, 44, 46, 51, 51,
- 53, 59, 60, 61, 48, 46, 53, 56, 58, 64, 64, 64, 49, 45, 53, 58, 62, 67,
- 70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 54, 49, 55, 62, 70, 77, 77, 76,
- 57, 51, 56, 64, 73, 82, 83, 81, 60, 53, 58, 65, 75, 85, 89, 85, 64, 57,
- 61, 68, 78, 89, 93, 89, 66, 59, 63, 69, 79, 91, 94, 93, 68, 61, 63, 71,
- 79, 87, 96, 98, 70, 63, 63, 70, 80, 89, 97, 100, 72, 65, 63, 69, 77, 86,
- 95, 102,
- /* Size 16x8 */
32, 31, 33, 37, 44, 48, 49, 51, 54, 57, 60, 64, 66, 68, 70, 72, 34, 36,
40, 44, 46, 46, 45, 47, 49, 51, 53, 57, 59, 61, 63, 65, 48, 46, 47, 47,
51, 53, 53, 54, 55, 56, 58, 61, 63, 63, 63, 63, 49, 46, 46, 45, 51, 56,
@@ -1522,37 +1513,16 @@
85, 89, 91, 87, 89, 86, 67, 62, 59, 57, 60, 64, 70, 73, 77, 83, 89, 93,
94, 96, 97, 95, 69, 65, 62, 60, 61, 64, 68, 72, 76, 81, 85, 89, 93, 98,
100, 102,
+ /* Size 16x8 */
+ 32, 34, 48, 49, 54, 63, 67, 69, 31, 36, 46, 46, 50, 58, 62, 65, 33, 40,
+ 47, 46, 49, 56, 59, 62, 37, 44, 47, 45, 48, 54, 57, 60, 44, 46, 51, 51,
+ 53, 59, 60, 61, 48, 46, 53, 56, 58, 64, 64, 64, 49, 45, 53, 58, 62, 67,
+ 70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 54, 49, 55, 62, 70, 77, 77, 76,
+ 57, 51, 56, 64, 73, 82, 83, 81, 60, 53, 58, 65, 75, 85, 89, 85, 64, 57,
+ 61, 68, 78, 89, 93, 89, 66, 59, 63, 69, 79, 91, 94, 93, 68, 61, 63, 71,
+ 79, 87, 96, 98, 70, 63, 63, 70, 80, 89, 97, 100, 72, 65, 63, 69, 77, 86,
+ 95, 102,
/* Size 16x32 */
- 32, 31, 34, 37, 48, 48, 49, 52, 54, 57, 63, 64, 67, 68, 69, 69, 31, 31,
- 35, 38, 47, 47, 47, 50, 51, 54, 60, 61, 63, 64, 65, 66, 31, 32, 36, 39,
- 46, 46, 46, 48, 50, 53, 58, 59, 62, 63, 65, 66, 30, 32, 36, 40, 46, 45,
- 45, 48, 49, 52, 57, 58, 60, 61, 62, 63, 33, 36, 40, 43, 47, 46, 46, 47,
- 49, 51, 56, 57, 59, 60, 62, 63, 35, 38, 42, 45, 47, 46, 45, 47, 48, 50,
- 55, 56, 58, 60, 61, 61, 37, 40, 44, 47, 47, 46, 45, 47, 48, 50, 54, 55,
- 57, 58, 60, 61, 42, 43, 45, 47, 50, 50, 49, 50, 51, 53, 57, 58, 59, 58,
- 59, 59, 44, 44, 46, 47, 51, 51, 51, 52, 53, 54, 59, 59, 60, 61, 61, 60,
- 49, 46, 47, 48, 53, 53, 53, 54, 55, 57, 60, 61, 63, 62, 62, 63, 48, 46,
- 46, 47, 53, 54, 56, 57, 58, 60, 64, 64, 64, 64, 64, 63, 48, 45, 46, 46,
- 53, 55, 56, 58, 59, 61, 65, 65, 66, 66, 65, 66, 49, 45, 45, 46, 53, 56,
- 58, 61, 62, 64, 67, 68, 70, 67, 68, 66, 50, 46, 46, 46, 54, 56, 59, 63,
- 65, 66, 70, 71, 70, 71, 68, 70, 51, 47, 47, 47, 54, 57, 60, 64, 65, 68,
- 71, 72, 73, 71, 72, 70, 52, 48, 47, 47, 54, 57, 61, 66, 68, 71, 75, 75,
- 76, 75, 73, 73, 54, 49, 49, 48, 55, 58, 62, 68, 70, 73, 77, 78, 77, 77,
- 76, 74, 54, 50, 49, 49, 55, 59, 62, 68, 70, 74, 78, 79, 81, 79, 77, 78,
- 57, 52, 51, 50, 56, 60, 64, 70, 73, 76, 82, 82, 83, 82, 81, 78, 59, 54,
- 52, 52, 58, 61, 65, 72, 74, 78, 84, 85, 85, 83, 82, 82, 60, 54, 53, 52,
- 58, 62, 65, 72, 75, 79, 85, 86, 89, 87, 85, 82, 63, 57, 56, 55, 60, 64,
- 67, 75, 77, 82, 89, 90, 90, 88, 87, 86, 64, 58, 57, 55, 61, 64, 68, 75,
- 78, 82, 89, 90, 93, 91, 89, 87, 64, 59, 57, 56, 61, 65, 68, 75, 78, 83,
- 90, 91, 94, 93, 92, 91, 66, 60, 59, 57, 63, 66, 69, 77, 79, 84, 91, 93,
- 94, 95, 93, 91, 67, 61, 60, 58, 63, 65, 70, 75, 78, 85, 88, 93, 96, 97,
- 97, 95, 68, 62, 61, 59, 63, 64, 71, 74, 79, 84, 87, 94, 96, 97, 98, 96,
- 69, 63, 62, 60, 63, 65, 71, 72, 80, 82, 88, 93, 96, 99, 100, 101, 70,
- 64, 63, 60, 63, 66, 70, 73, 80, 81, 89, 90, 97, 99, 100, 101, 71, 65,
- 64, 61, 63, 67, 70, 74, 78, 82, 88, 90, 97, 99, 102, 103, 72, 65, 65,
- 62, 63, 68, 69, 75, 77, 83, 86, 92, 95, 100, 102, 103, 73, 66, 66, 63,
- 63, 69, 69, 76, 76, 84, 84, 93, 93, 101, 101, 105,
- /* Size 32x16 */
32, 31, 31, 30, 33, 35, 37, 42, 44, 49, 48, 48, 49, 50, 51, 52, 54, 54,
57, 59, 60, 63, 64, 64, 66, 67, 68, 69, 70, 71, 72, 73, 31, 31, 32, 32,
36, 38, 40, 43, 44, 46, 46, 45, 45, 46, 47, 48, 49, 50, 52, 54, 54, 57,
@@ -1582,33 +1552,47 @@
82, 85, 87, 89, 92, 93, 97, 98, 100, 100, 102, 102, 101, 69, 66, 66, 63,
63, 61, 61, 59, 60, 63, 63, 66, 66, 70, 70, 73, 74, 78, 78, 82, 82, 86,
87, 91, 91, 95, 96, 101, 101, 103, 103, 105,
+ /* Size 32x16 */
+ 32, 31, 34, 37, 48, 48, 49, 52, 54, 57, 63, 64, 67, 68, 69, 69, 31, 31,
+ 35, 38, 47, 47, 47, 50, 51, 54, 60, 61, 63, 64, 65, 66, 31, 32, 36, 39,
+ 46, 46, 46, 48, 50, 53, 58, 59, 62, 63, 65, 66, 30, 32, 36, 40, 46, 45,
+ 45, 48, 49, 52, 57, 58, 60, 61, 62, 63, 33, 36, 40, 43, 47, 46, 46, 47,
+ 49, 51, 56, 57, 59, 60, 62, 63, 35, 38, 42, 45, 47, 46, 45, 47, 48, 50,
+ 55, 56, 58, 60, 61, 61, 37, 40, 44, 47, 47, 46, 45, 47, 48, 50, 54, 55,
+ 57, 58, 60, 61, 42, 43, 45, 47, 50, 50, 49, 50, 51, 53, 57, 58, 59, 58,
+ 59, 59, 44, 44, 46, 47, 51, 51, 51, 52, 53, 54, 59, 59, 60, 61, 61, 60,
+ 49, 46, 47, 48, 53, 53, 53, 54, 55, 57, 60, 61, 63, 62, 62, 63, 48, 46,
+ 46, 47, 53, 54, 56, 57, 58, 60, 64, 64, 64, 64, 64, 63, 48, 45, 46, 46,
+ 53, 55, 56, 58, 59, 61, 65, 65, 66, 66, 65, 66, 49, 45, 45, 46, 53, 56,
+ 58, 61, 62, 64, 67, 68, 70, 67, 68, 66, 50, 46, 46, 46, 54, 56, 59, 63,
+ 65, 66, 70, 71, 70, 71, 68, 70, 51, 47, 47, 47, 54, 57, 60, 64, 65, 68,
+ 71, 72, 73, 71, 72, 70, 52, 48, 47, 47, 54, 57, 61, 66, 68, 71, 75, 75,
+ 76, 75, 73, 73, 54, 49, 49, 48, 55, 58, 62, 68, 70, 73, 77, 78, 77, 77,
+ 76, 74, 54, 50, 49, 49, 55, 59, 62, 68, 70, 74, 78, 79, 81, 79, 77, 78,
+ 57, 52, 51, 50, 56, 60, 64, 70, 73, 76, 82, 82, 83, 82, 81, 78, 59, 54,
+ 52, 52, 58, 61, 65, 72, 74, 78, 84, 85, 85, 83, 82, 82, 60, 54, 53, 52,
+ 58, 62, 65, 72, 75, 79, 85, 86, 89, 87, 85, 82, 63, 57, 56, 55, 60, 64,
+ 67, 75, 77, 82, 89, 90, 90, 88, 87, 86, 64, 58, 57, 55, 61, 64, 68, 75,
+ 78, 82, 89, 90, 93, 91, 89, 87, 64, 59, 57, 56, 61, 65, 68, 75, 78, 83,
+ 90, 91, 94, 93, 92, 91, 66, 60, 59, 57, 63, 66, 69, 77, 79, 84, 91, 93,
+ 94, 95, 93, 91, 67, 61, 60, 58, 63, 65, 70, 75, 78, 85, 88, 93, 96, 97,
+ 97, 95, 68, 62, 61, 59, 63, 64, 71, 74, 79, 84, 87, 94, 96, 97, 98, 96,
+ 69, 63, 62, 60, 63, 65, 71, 72, 80, 82, 88, 93, 96, 99, 100, 101, 70,
+ 64, 63, 60, 63, 66, 70, 73, 80, 81, 89, 90, 97, 99, 100, 101, 71, 65,
+ 64, 61, 63, 67, 70, 74, 78, 82, 88, 90, 97, 99, 102, 103, 72, 65, 65,
+ 62, 63, 68, 69, 75, 77, 83, 86, 92, 95, 100, 102, 103, 73, 66, 66, 63,
+ 63, 69, 69, 76, 76, 84, 84, 93, 93, 101, 101, 105,
/* Size 4x16 */
- 31, 48, 57, 68, 32, 46, 53, 63, 36, 46, 51, 60, 40, 46, 50, 58, 44, 51,
- 54, 61, 46, 54, 60, 64, 45, 56, 64, 67, 47, 57, 68, 71, 49, 58, 73, 77,
- 52, 60, 76, 82, 54, 62, 79, 87, 58, 64, 82, 91, 60, 66, 84, 95, 62, 64,
- 84, 97, 64, 66, 81, 99, 65, 68, 83, 100,
- /* Size 16x4 */
31, 32, 36, 40, 44, 46, 45, 47, 49, 52, 54, 58, 60, 62, 64, 65, 48, 46,
46, 46, 51, 54, 56, 57, 58, 60, 62, 64, 66, 64, 66, 68, 57, 53, 51, 50,
54, 60, 64, 68, 73, 76, 79, 82, 84, 84, 81, 83, 68, 63, 60, 58, 61, 64,
67, 71, 77, 82, 87, 91, 95, 97, 99, 100,
+ /* Size 16x4 */
+ 31, 48, 57, 68, 32, 46, 53, 63, 36, 46, 51, 60, 40, 46, 50, 58, 44, 51,
+ 54, 61, 46, 54, 60, 64, 45, 56, 64, 67, 47, 57, 68, 71, 49, 58, 73, 77,
+ 52, 60, 76, 82, 54, 62, 79, 87, 58, 64, 82, 91, 60, 66, 84, 95, 62, 64,
+ 84, 97, 64, 66, 81, 99, 65, 68, 83, 100,
/* Size 8x32 */
- 32, 34, 48, 49, 54, 63, 67, 69, 31, 35, 47, 47, 51, 60, 63, 65, 31, 36,
- 46, 46, 50, 58, 62, 65, 30, 36, 46, 45, 49, 57, 60, 62, 33, 40, 47, 46,
- 49, 56, 59, 62, 35, 42, 47, 45, 48, 55, 58, 61, 37, 44, 47, 45, 48, 54,
- 57, 60, 42, 45, 50, 49, 51, 57, 59, 59, 44, 46, 51, 51, 53, 59, 60, 61,
- 49, 47, 53, 53, 55, 60, 63, 62, 48, 46, 53, 56, 58, 64, 64, 64, 48, 46,
- 53, 56, 59, 65, 66, 65, 49, 45, 53, 58, 62, 67, 70, 68, 50, 46, 54, 59,
- 65, 70, 70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 52, 47, 54, 61, 68, 75,
- 76, 73, 54, 49, 55, 62, 70, 77, 77, 76, 54, 49, 55, 62, 70, 78, 81, 77,
- 57, 51, 56, 64, 73, 82, 83, 81, 59, 52, 58, 65, 74, 84, 85, 82, 60, 53,
- 58, 65, 75, 85, 89, 85, 63, 56, 60, 67, 77, 89, 90, 87, 64, 57, 61, 68,
- 78, 89, 93, 89, 64, 57, 61, 68, 78, 90, 94, 92, 66, 59, 63, 69, 79, 91,
- 94, 93, 67, 60, 63, 70, 78, 88, 96, 97, 68, 61, 63, 71, 79, 87, 96, 98,
- 69, 62, 63, 71, 80, 88, 96, 100, 70, 63, 63, 70, 80, 89, 97, 100, 71,
- 64, 63, 70, 78, 88, 97, 102, 72, 65, 63, 69, 77, 86, 95, 102, 73, 66,
- 63, 69, 76, 84, 93, 101,
- /* Size 32x8 */
32, 31, 31, 30, 33, 35, 37, 42, 44, 49, 48, 48, 49, 50, 51, 52, 54, 54,
57, 59, 60, 63, 64, 64, 66, 67, 68, 69, 70, 71, 72, 73, 34, 35, 36, 36,
40, 42, 44, 45, 46, 47, 46, 46, 45, 46, 47, 47, 49, 49, 51, 52, 53, 56,
@@ -1623,7 +1607,23 @@
57, 59, 60, 63, 64, 66, 70, 70, 73, 76, 77, 81, 83, 85, 89, 90, 93, 94,
94, 96, 96, 96, 97, 97, 95, 93, 69, 65, 65, 62, 62, 61, 60, 59, 61, 62,
64, 65, 68, 68, 72, 73, 76, 77, 81, 82, 85, 87, 89, 92, 93, 97, 98, 100,
- 100, 102, 102, 101 },
+ 100, 102, 102, 101,
+ /* Size 32x8 */
+ 32, 34, 48, 49, 54, 63, 67, 69, 31, 35, 47, 47, 51, 60, 63, 65, 31, 36,
+ 46, 46, 50, 58, 62, 65, 30, 36, 46, 45, 49, 57, 60, 62, 33, 40, 47, 46,
+ 49, 56, 59, 62, 35, 42, 47, 45, 48, 55, 58, 61, 37, 44, 47, 45, 48, 54,
+ 57, 60, 42, 45, 50, 49, 51, 57, 59, 59, 44, 46, 51, 51, 53, 59, 60, 61,
+ 49, 47, 53, 53, 55, 60, 63, 62, 48, 46, 53, 56, 58, 64, 64, 64, 48, 46,
+ 53, 56, 59, 65, 66, 65, 49, 45, 53, 58, 62, 67, 70, 68, 50, 46, 54, 59,
+ 65, 70, 70, 68, 51, 47, 54, 60, 65, 71, 73, 72, 52, 47, 54, 61, 68, 75,
+ 76, 73, 54, 49, 55, 62, 70, 77, 77, 76, 54, 49, 55, 62, 70, 78, 81, 77,
+ 57, 51, 56, 64, 73, 82, 83, 81, 59, 52, 58, 65, 74, 84, 85, 82, 60, 53,
+ 58, 65, 75, 85, 89, 85, 63, 56, 60, 67, 77, 89, 90, 87, 64, 57, 61, 68,
+ 78, 89, 93, 89, 64, 57, 61, 68, 78, 90, 94, 92, 66, 59, 63, 69, 79, 91,
+ 94, 93, 67, 60, 63, 70, 78, 88, 96, 97, 68, 61, 63, 71, 79, 87, 96, 98,
+ 69, 62, 63, 71, 80, 88, 96, 100, 70, 63, 63, 70, 80, 89, 97, 100, 71,
+ 64, 63, 70, 78, 88, 97, 102, 72, 65, 63, 69, 77, 86, 95, 102, 73, 66,
+ 63, 69, 76, 84, 93, 101 },
},
{
{ /* Luma */
@@ -1714,21 +1714,12 @@
89, 89, 86, 86, 92, 92, 97, 98, 104, 104, 111, 111, 119, 119, 128, 129,
137, 137, 147, 148, 157, 158, 169, 170, 174, 175, 181,
/* Size 4x8 */
- 32, 35, 59, 83, 32, 36, 57, 78, 34, 47, 65, 82, 41, 53, 78, 97, 51, 61,
- 92, 111, 65, 73, 108, 129, 75, 81, 117, 148, 86, 92, 119, 154,
- /* Size 8x4 */
32, 32, 34, 41, 51, 65, 75, 86, 35, 36, 47, 53, 61, 73, 81, 92, 59, 57,
65, 78, 92, 108, 117, 119, 83, 78, 82, 97, 111, 129, 148, 154,
+ /* Size 8x4 */
+ 32, 35, 59, 83, 32, 36, 57, 78, 34, 47, 65, 82, 41, 53, 78, 97, 51, 61,
+ 92, 111, 65, 73, 108, 129, 75, 81, 117, 148, 86, 92, 119, 154,
/* Size 8x16 */
- 32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 34, 41, 50, 61, 76, 85, 31, 33,
- 35, 42, 49, 59, 73, 81, 32, 34, 37, 42, 49, 58, 71, 79, 34, 35, 41, 48,
- 54, 63, 76, 81, 36, 36, 46, 54, 60, 68, 80, 87, 41, 40, 49, 60, 67, 76,
- 88, 93, 47, 44, 53, 66, 75, 84, 97, 101, 53, 50, 57, 71, 82, 92, 106,
- 108, 58, 54, 61, 75, 87, 98, 112, 116, 65, 59, 66, 79, 92, 105, 120,
- 124, 74, 67, 73, 86, 100, 113, 131, 134, 82, 73, 79, 92, 105, 120, 139,
- 142, 87, 78, 83, 96, 110, 125, 144, 153, 92, 83, 84, 97, 114, 132, 150,
- 157, 97, 88, 86, 97, 111, 128, 147, 163,
- /* Size 16x8 */
32, 31, 31, 32, 34, 36, 41, 47, 53, 58, 65, 74, 82, 87, 92, 97, 31, 32,
33, 34, 35, 36, 40, 44, 50, 54, 59, 67, 73, 78, 83, 88, 35, 34, 35, 37,
41, 46, 49, 53, 57, 61, 66, 73, 79, 83, 84, 86, 44, 41, 42, 42, 48, 54,
@@ -1737,39 +1728,16 @@
98, 105, 113, 120, 125, 132, 128, 82, 76, 73, 71, 76, 80, 88, 97, 106,
112, 120, 131, 139, 144, 150, 147, 90, 85, 81, 79, 81, 87, 93, 101, 108,
116, 124, 134, 142, 153, 157, 163,
+ /* Size 16x8 */
+ 32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 34, 41, 50, 61, 76, 85, 31, 33,
+ 35, 42, 49, 59, 73, 81, 32, 34, 37, 42, 49, 58, 71, 79, 34, 35, 41, 48,
+ 54, 63, 76, 81, 36, 36, 46, 54, 60, 68, 80, 87, 41, 40, 49, 60, 67, 76,
+ 88, 93, 47, 44, 53, 66, 75, 84, 97, 101, 53, 50, 57, 71, 82, 92, 106,
+ 108, 58, 54, 61, 75, 87, 98, 112, 116, 65, 59, 66, 79, 92, 105, 120,
+ 124, 74, 67, 73, 86, 100, 113, 131, 134, 82, 73, 79, 92, 105, 120, 139,
+ 142, 87, 78, 83, 96, 110, 125, 144, 153, 92, 83, 84, 97, 114, 132, 150,
+ 157, 97, 88, 86, 97, 111, 128, 147, 163,
/* Size 16x32 */
- 32, 31, 31, 32, 35, 36, 44, 47, 53, 62, 65, 79, 82, 88, 90, 93, 31, 32,
- 32, 32, 35, 35, 42, 45, 51, 59, 62, 75, 78, 83, 86, 88, 31, 32, 32, 32,
- 34, 35, 41, 45, 50, 58, 61, 74, 76, 82, 85, 88, 31, 32, 32, 33, 34, 34,
- 41, 44, 49, 57, 59, 72, 74, 79, 82, 84, 31, 32, 33, 34, 35, 36, 42, 44,
- 49, 57, 59, 71, 73, 79, 81, 84, 32, 32, 33, 34, 36, 36, 42, 45, 50, 57,
- 59, 71, 73, 78, 80, 82, 32, 33, 34, 35, 37, 38, 42, 45, 49, 56, 58, 69,
- 71, 76, 79, 83, 32, 33, 34, 36, 39, 40, 44, 47, 51, 58, 60, 71, 73, 76,
- 78, 80, 34, 34, 35, 37, 41, 42, 48, 50, 54, 61, 63, 73, 76, 81, 81, 80,
- 35, 34, 36, 38, 45, 47, 52, 55, 59, 65, 67, 77, 79, 82, 83, 86, 36, 34,
- 36, 38, 46, 48, 54, 56, 60, 66, 68, 78, 80, 85, 87, 86, 39, 37, 39, 40,
- 48, 50, 58, 60, 65, 71, 73, 84, 86, 89, 88, 91, 41, 39, 40, 41, 49, 51,
- 60, 62, 67, 74, 76, 86, 88, 91, 93, 91, 44, 41, 42, 43, 51, 53, 63, 66,
- 71, 78, 79, 90, 92, 97, 94, 97, 47, 44, 44, 45, 53, 56, 66, 69, 75, 82,
- 84, 95, 97, 98, 101, 98, 48, 45, 45, 46, 54, 56, 67, 70, 76, 83, 85, 96,
- 98, 104, 101, 105, 53, 49, 50, 50, 57, 60, 71, 75, 82, 90, 92, 103, 106,
- 107, 108, 105, 55, 51, 51, 51, 59, 61, 72, 77, 84, 92, 94, 106, 108,
- 111, 110, 112, 58, 54, 54, 54, 61, 63, 75, 79, 87, 95, 98, 110, 112,
- 117, 116, 113, 63, 58, 58, 57, 65, 67, 78, 83, 91, 100, 103, 116, 118,
- 119, 119, 121, 65, 60, 59, 58, 66, 68, 79, 84, 92, 102, 105, 118, 120,
- 127, 124, 122, 71, 65, 64, 63, 71, 73, 84, 89, 97, 108, 111, 125, 127,
- 129, 129, 130, 74, 68, 67, 66, 73, 75, 86, 91, 100, 110, 113, 128, 131,
- 135, 134, 130, 79, 72, 71, 70, 77, 79, 90, 95, 104, 115, 118, 133, 136,
- 140, 139, 140, 82, 75, 73, 72, 79, 81, 92, 97, 105, 117, 120, 136, 139,
- 145, 142, 140, 82, 75, 74, 72, 79, 81, 92, 97, 106, 117, 121, 136, 139,
- 148, 150, 149, 87, 79, 78, 76, 83, 85, 96, 100, 110, 120, 125, 141, 144,
- 148, 153, 150, 89, 82, 81, 78, 83, 87, 97, 99, 113, 118, 128, 139, 145,
- 153, 157, 161, 92, 84, 83, 80, 84, 89, 97, 101, 114, 116, 132, 135, 150,
- 153, 157, 162, 94, 86, 85, 82, 85, 92, 97, 104, 112, 119, 130, 136, 151,
- 154, 163, 166, 97, 88, 88, 85, 86, 94, 97, 107, 111, 123, 128, 140, 147,
- 159, 163, 167, 99, 91, 91, 87, 87, 97, 97, 110, 110, 126, 126, 144, 144,
- 163, 163, 173,
- /* Size 32x16 */
32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 53, 55,
58, 63, 65, 71, 74, 79, 82, 82, 87, 89, 92, 94, 97, 99, 31, 32, 32, 32,
32, 32, 33, 33, 34, 34, 34, 37, 39, 41, 44, 45, 49, 51, 54, 58, 60, 65,
@@ -1801,34 +1769,49 @@
157, 157, 163, 163, 163, 93, 88, 88, 84, 84, 82, 83, 80, 80, 86, 86, 91,
91, 97, 98, 105, 105, 112, 113, 121, 122, 130, 130, 140, 140, 149, 150,
161, 162, 166, 167, 173,
+ /* Size 32x16 */
+ 32, 31, 31, 32, 35, 36, 44, 47, 53, 62, 65, 79, 82, 88, 90, 93, 31, 32,
+ 32, 32, 35, 35, 42, 45, 51, 59, 62, 75, 78, 83, 86, 88, 31, 32, 32, 32,
+ 34, 35, 41, 45, 50, 58, 61, 74, 76, 82, 85, 88, 31, 32, 32, 33, 34, 34,
+ 41, 44, 49, 57, 59, 72, 74, 79, 82, 84, 31, 32, 33, 34, 35, 36, 42, 44,
+ 49, 57, 59, 71, 73, 79, 81, 84, 32, 32, 33, 34, 36, 36, 42, 45, 50, 57,
+ 59, 71, 73, 78, 80, 82, 32, 33, 34, 35, 37, 38, 42, 45, 49, 56, 58, 69,
+ 71, 76, 79, 83, 32, 33, 34, 36, 39, 40, 44, 47, 51, 58, 60, 71, 73, 76,
+ 78, 80, 34, 34, 35, 37, 41, 42, 48, 50, 54, 61, 63, 73, 76, 81, 81, 80,
+ 35, 34, 36, 38, 45, 47, 52, 55, 59, 65, 67, 77, 79, 82, 83, 86, 36, 34,
+ 36, 38, 46, 48, 54, 56, 60, 66, 68, 78, 80, 85, 87, 86, 39, 37, 39, 40,
+ 48, 50, 58, 60, 65, 71, 73, 84, 86, 89, 88, 91, 41, 39, 40, 41, 49, 51,
+ 60, 62, 67, 74, 76, 86, 88, 91, 93, 91, 44, 41, 42, 43, 51, 53, 63, 66,
+ 71, 78, 79, 90, 92, 97, 94, 97, 47, 44, 44, 45, 53, 56, 66, 69, 75, 82,
+ 84, 95, 97, 98, 101, 98, 48, 45, 45, 46, 54, 56, 67, 70, 76, 83, 85, 96,
+ 98, 104, 101, 105, 53, 49, 50, 50, 57, 60, 71, 75, 82, 90, 92, 103, 106,
+ 107, 108, 105, 55, 51, 51, 51, 59, 61, 72, 77, 84, 92, 94, 106, 108,
+ 111, 110, 112, 58, 54, 54, 54, 61, 63, 75, 79, 87, 95, 98, 110, 112,
+ 117, 116, 113, 63, 58, 58, 57, 65, 67, 78, 83, 91, 100, 103, 116, 118,
+ 119, 119, 121, 65, 60, 59, 58, 66, 68, 79, 84, 92, 102, 105, 118, 120,
+ 127, 124, 122, 71, 65, 64, 63, 71, 73, 84, 89, 97, 108, 111, 125, 127,
+ 129, 129, 130, 74, 68, 67, 66, 73, 75, 86, 91, 100, 110, 113, 128, 131,
+ 135, 134, 130, 79, 72, 71, 70, 77, 79, 90, 95, 104, 115, 118, 133, 136,
+ 140, 139, 140, 82, 75, 73, 72, 79, 81, 92, 97, 105, 117, 120, 136, 139,
+ 145, 142, 140, 82, 75, 74, 72, 79, 81, 92, 97, 106, 117, 121, 136, 139,
+ 148, 150, 149, 87, 79, 78, 76, 83, 85, 96, 100, 110, 120, 125, 141, 144,
+ 148, 153, 150, 89, 82, 81, 78, 83, 87, 97, 99, 113, 118, 128, 139, 145,
+ 153, 157, 161, 92, 84, 83, 80, 84, 89, 97, 101, 114, 116, 132, 135, 150,
+ 153, 157, 162, 94, 86, 85, 82, 85, 92, 97, 104, 112, 119, 130, 136, 151,
+ 154, 163, 166, 97, 88, 88, 85, 86, 94, 97, 107, 111, 123, 128, 140, 147,
+ 159, 163, 167, 99, 91, 91, 87, 87, 97, 97, 110, 110, 126, 126, 144, 144,
+ 163, 163, 173,
/* Size 4x16 */
- 31, 36, 62, 88, 32, 35, 58, 82, 32, 36, 57, 79, 33, 38, 56, 76, 34, 42,
- 61, 81, 34, 48, 66, 85, 39, 51, 74, 91, 44, 56, 82, 98, 49, 60, 90, 107,
- 54, 63, 95, 117, 60, 68, 102, 127, 68, 75, 110, 135, 75, 81, 117, 145,
- 79, 85, 120, 148, 84, 89, 116, 153, 88, 94, 123, 159,
- /* Size 16x4 */
31, 32, 32, 33, 34, 34, 39, 44, 49, 54, 60, 68, 75, 79, 84, 88, 36, 35,
36, 38, 42, 48, 51, 56, 60, 63, 68, 75, 81, 85, 89, 94, 62, 58, 57, 56,
61, 66, 74, 82, 90, 95, 102, 110, 117, 120, 116, 123, 88, 82, 79, 76,
81, 85, 91, 98, 107, 117, 127, 135, 145, 148, 153, 159,
+ /* Size 16x4 */
+ 31, 36, 62, 88, 32, 35, 58, 82, 32, 36, 57, 79, 33, 38, 56, 76, 34, 42,
+ 61, 81, 34, 48, 66, 85, 39, 51, 74, 91, 44, 56, 82, 98, 49, 60, 90, 107,
+ 54, 63, 95, 117, 60, 68, 102, 127, 68, 75, 110, 135, 75, 81, 117, 145,
+ 79, 85, 120, 148, 84, 89, 116, 153, 88, 94, 123, 159,
/* Size 8x32 */
- 32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 35, 42, 51, 62, 78, 86, 31, 32,
- 34, 41, 50, 61, 76, 85, 31, 32, 34, 41, 49, 59, 74, 82, 31, 33, 35, 42,
- 49, 59, 73, 81, 32, 33, 36, 42, 50, 59, 73, 80, 32, 34, 37, 42, 49, 58,
- 71, 79, 32, 34, 39, 44, 51, 60, 73, 78, 34, 35, 41, 48, 54, 63, 76, 81,
- 35, 36, 45, 52, 59, 67, 79, 83, 36, 36, 46, 54, 60, 68, 80, 87, 39, 39,
- 48, 58, 65, 73, 86, 88, 41, 40, 49, 60, 67, 76, 88, 93, 44, 42, 51, 63,
- 71, 79, 92, 94, 47, 44, 53, 66, 75, 84, 97, 101, 48, 45, 54, 67, 76, 85,
- 98, 101, 53, 50, 57, 71, 82, 92, 106, 108, 55, 51, 59, 72, 84, 94, 108,
- 110, 58, 54, 61, 75, 87, 98, 112, 116, 63, 58, 65, 78, 91, 103, 118,
- 119, 65, 59, 66, 79, 92, 105, 120, 124, 71, 64, 71, 84, 97, 111, 127,
- 129, 74, 67, 73, 86, 100, 113, 131, 134, 79, 71, 77, 90, 104, 118, 136,
- 139, 82, 73, 79, 92, 105, 120, 139, 142, 82, 74, 79, 92, 106, 121, 139,
- 150, 87, 78, 83, 96, 110, 125, 144, 153, 89, 81, 83, 97, 113, 128, 145,
- 157, 92, 83, 84, 97, 114, 132, 150, 157, 94, 85, 85, 97, 112, 130, 151,
- 163, 97, 88, 86, 97, 111, 128, 147, 163, 99, 91, 87, 97, 110, 126, 144,
- 163,
- /* Size 32x8 */
32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 53, 55,
58, 63, 65, 71, 74, 79, 82, 82, 87, 89, 92, 94, 97, 99, 31, 32, 32, 32,
33, 33, 34, 34, 35, 36, 36, 39, 40, 42, 44, 45, 50, 51, 54, 58, 59, 64,
@@ -1844,7 +1827,24 @@
108, 112, 118, 120, 127, 131, 136, 139, 139, 144, 145, 150, 151, 147,
144, 90, 86, 85, 82, 81, 80, 79, 78, 81, 83, 87, 88, 93, 94, 101, 101,
108, 110, 116, 119, 124, 129, 134, 139, 142, 150, 153, 157, 157, 163,
- 163, 163 },
+ 163, 163,
+ /* Size 32x8 */
+ 32, 31, 35, 44, 53, 65, 82, 90, 31, 32, 35, 42, 51, 62, 78, 86, 31, 32,
+ 34, 41, 50, 61, 76, 85, 31, 32, 34, 41, 49, 59, 74, 82, 31, 33, 35, 42,
+ 49, 59, 73, 81, 32, 33, 36, 42, 50, 59, 73, 80, 32, 34, 37, 42, 49, 58,
+ 71, 79, 32, 34, 39, 44, 51, 60, 73, 78, 34, 35, 41, 48, 54, 63, 76, 81,
+ 35, 36, 45, 52, 59, 67, 79, 83, 36, 36, 46, 54, 60, 68, 80, 87, 39, 39,
+ 48, 58, 65, 73, 86, 88, 41, 40, 49, 60, 67, 76, 88, 93, 44, 42, 51, 63,
+ 71, 79, 92, 94, 47, 44, 53, 66, 75, 84, 97, 101, 48, 45, 54, 67, 76, 85,
+ 98, 101, 53, 50, 57, 71, 82, 92, 106, 108, 55, 51, 59, 72, 84, 94, 108,
+ 110, 58, 54, 61, 75, 87, 98, 112, 116, 63, 58, 65, 78, 91, 103, 118,
+ 119, 65, 59, 66, 79, 92, 105, 120, 124, 71, 64, 71, 84, 97, 111, 127,
+ 129, 74, 67, 73, 86, 100, 113, 131, 134, 79, 71, 77, 90, 104, 118, 136,
+ 139, 82, 73, 79, 92, 105, 120, 139, 142, 82, 74, 79, 92, 106, 121, 139,
+ 150, 87, 78, 83, 96, 110, 125, 144, 153, 89, 81, 83, 97, 113, 128, 145,
+ 157, 92, 83, 84, 97, 114, 132, 150, 157, 94, 85, 85, 97, 112, 130, 151,
+ 163, 97, 88, 86, 97, 111, 128, 147, 163, 99, 91, 87, 97, 110, 126, 144,
+ 163 },
{ /* Chroma */
/* Size 4x4 */
32, 45, 51, 61, 45, 54, 59, 65, 51, 59, 75, 81, 61, 65, 81, 97,
@@ -1929,21 +1929,12 @@
70, 70, 74, 74, 78, 78, 82, 82, 86, 86, 91, 91, 95, 95, 100, 100, 101,
101, 104,
/* Size 4x8 */
- 31, 47, 53, 63, 36, 47, 50, 59, 46, 52, 55, 61, 45, 53, 63, 70, 49, 55,
- 71, 77, 54, 58, 77, 86, 59, 61, 81, 94, 63, 65, 80, 95,
- /* Size 8x4 */
31, 36, 46, 45, 49, 54, 59, 63, 47, 47, 52, 53, 55, 58, 61, 65, 53, 50,
55, 63, 71, 77, 81, 80, 63, 59, 61, 70, 77, 86, 94, 95,
+ /* Size 8x4 */
+ 31, 47, 53, 63, 36, 47, 50, 59, 46, 52, 55, 61, 45, 53, 63, 70, 49, 55,
+ 71, 77, 54, 58, 77, 86, 59, 61, 81, 94, 63, 65, 80, 95,
/* Size 8x16 */
- 32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 46, 49, 53, 60, 64, 33, 37,
- 46, 45, 47, 51, 57, 61, 37, 43, 47, 45, 47, 50, 55, 59, 42, 44, 49, 49,
- 50, 53, 58, 60, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46, 51, 57, 59, 61,
- 66, 67, 50, 46, 52, 59, 63, 66, 71, 71, 52, 47, 53, 61, 66, 71, 75, 74,
- 54, 49, 54, 62, 68, 73, 79, 79, 57, 51, 55, 64, 70, 76, 83, 83, 61, 55,
- 58, 66, 73, 80, 87, 87, 64, 57, 60, 68, 75, 83, 91, 91, 66, 59, 61, 69,
- 77, 84, 93, 95, 68, 61, 61, 68, 77, 86, 94, 97, 70, 63, 61, 67, 75, 83,
- 92, 98,
- /* Size 16x8 */
32, 31, 33, 37, 42, 49, 48, 50, 52, 54, 57, 61, 64, 66, 68, 70, 33, 34,
37, 43, 44, 47, 46, 46, 47, 49, 51, 55, 57, 59, 61, 63, 45, 45, 46, 47,
49, 52, 51, 52, 53, 54, 55, 58, 60, 61, 61, 61, 49, 46, 45, 45, 49, 53,
@@ -1952,37 +1943,16 @@
76, 80, 83, 84, 86, 83, 64, 60, 57, 55, 58, 61, 66, 71, 75, 79, 83, 87,
91, 93, 94, 92, 68, 64, 61, 59, 60, 63, 67, 71, 74, 79, 83, 87, 91, 95,
97, 98,
+ /* Size 16x8 */
+ 32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 46, 49, 53, 60, 64, 33, 37,
+ 46, 45, 47, 51, 57, 61, 37, 43, 47, 45, 47, 50, 55, 59, 42, 44, 49, 49,
+ 50, 53, 58, 60, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46, 51, 57, 59, 61,
+ 66, 67, 50, 46, 52, 59, 63, 66, 71, 71, 52, 47, 53, 61, 66, 71, 75, 74,
+ 54, 49, 54, 62, 68, 73, 79, 79, 57, 51, 55, 64, 70, 76, 83, 83, 61, 55,
+ 58, 66, 73, 80, 87, 87, 64, 57, 60, 68, 75, 83, 91, 91, 66, 59, 61, 69,
+ 77, 84, 93, 95, 68, 61, 61, 68, 77, 86, 94, 97, 70, 63, 61, 67, 75, 83,
+ 92, 98,
/* Size 16x32 */
- 32, 31, 33, 37, 45, 48, 49, 50, 52, 56, 57, 63, 64, 67, 68, 68, 31, 31,
- 34, 38, 45, 47, 47, 48, 50, 53, 54, 60, 61, 63, 64, 65, 31, 32, 34, 39,
- 45, 46, 46, 47, 49, 52, 53, 59, 60, 62, 64, 65, 30, 32, 35, 40, 44, 46,
- 45, 46, 48, 51, 52, 57, 58, 60, 61, 62, 33, 35, 37, 42, 46, 47, 45, 46,
- 47, 50, 51, 56, 57, 60, 61, 62, 33, 36, 38, 43, 46, 47, 46, 46, 47, 50,
- 51, 56, 57, 59, 60, 60, 37, 40, 43, 47, 47, 47, 45, 46, 47, 49, 50, 54,
- 55, 57, 59, 61, 39, 41, 43, 47, 48, 48, 47, 47, 48, 50, 51, 55, 56, 57,
- 58, 59, 42, 43, 44, 47, 49, 50, 49, 50, 50, 53, 53, 57, 58, 60, 60, 59,
- 47, 46, 46, 48, 51, 52, 53, 53, 53, 55, 56, 60, 61, 61, 61, 62, 49, 46,
- 47, 48, 52, 53, 53, 54, 54, 56, 57, 60, 61, 63, 63, 62, 48, 46, 46, 47,
- 51, 53, 56, 56, 57, 59, 60, 64, 64, 65, 64, 65, 48, 45, 46, 46, 51, 53,
- 57, 57, 59, 61, 61, 65, 66, 66, 67, 65, 49, 45, 45, 46, 51, 53, 58, 59,
- 61, 63, 64, 67, 68, 70, 67, 68, 50, 46, 46, 46, 52, 54, 59, 61, 63, 65,
- 66, 70, 71, 70, 71, 68, 50, 46, 46, 46, 52, 54, 59, 61, 64, 66, 67, 71,
- 71, 73, 71, 72, 52, 48, 47, 47, 53, 54, 61, 63, 66, 70, 71, 75, 75, 75,
- 74, 72, 53, 49, 48, 48, 53, 55, 61, 64, 67, 71, 72, 76, 77, 77, 75, 76,
- 54, 50, 49, 49, 54, 55, 62, 65, 68, 72, 73, 78, 79, 80, 79, 76, 56, 51,
- 51, 50, 55, 56, 63, 66, 70, 74, 76, 81, 82, 81, 80, 80, 57, 52, 51, 50,
- 55, 56, 64, 66, 70, 75, 76, 82, 83, 85, 83, 80, 60, 54, 54, 52, 57, 58,
- 65, 68, 72, 77, 79, 85, 86, 86, 85, 84, 61, 56, 55, 53, 58, 59, 66, 69,
- 73, 79, 80, 86, 87, 89, 87, 84, 63, 57, 56, 55, 59, 60, 67, 70, 75, 80,
- 82, 89, 90, 91, 89, 89, 64, 58, 57, 56, 60, 61, 68, 71, 75, 81, 83, 90,
- 91, 93, 91, 89, 64, 59, 58, 56, 60, 61, 68, 71, 75, 81, 83, 90, 91, 94,
- 94, 93, 66, 60, 59, 57, 61, 63, 69, 72, 77, 82, 84, 92, 93, 94, 95, 93,
- 67, 61, 60, 58, 61, 63, 69, 70, 78, 80, 85, 90, 93, 96, 97, 97, 68, 62,
- 61, 59, 61, 64, 68, 71, 77, 79, 86, 88, 94, 96, 97, 98, 69, 63, 62, 59,
- 61, 65, 68, 72, 76, 80, 85, 88, 94, 95, 99, 99, 70, 63, 63, 60, 61, 66,
- 67, 73, 75, 81, 83, 89, 92, 97, 98, 99, 70, 64, 64, 61, 61, 67, 67, 74,
- 74, 82, 82, 90, 90, 98, 98, 102,
- /* Size 32x16 */
32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 52, 53,
54, 56, 57, 60, 61, 63, 64, 64, 66, 67, 68, 69, 70, 70, 31, 31, 32, 32,
35, 36, 40, 41, 43, 46, 46, 46, 45, 45, 46, 46, 48, 49, 50, 51, 52, 54,
@@ -2012,33 +1982,47 @@
83, 85, 87, 89, 91, 94, 95, 97, 97, 99, 98, 98, 68, 65, 65, 62, 62, 60,
61, 59, 59, 62, 62, 65, 65, 68, 68, 72, 72, 76, 76, 80, 80, 84, 84, 89,
89, 93, 93, 97, 98, 99, 99, 102,
+ /* Size 32x16 */
+ 32, 31, 33, 37, 45, 48, 49, 50, 52, 56, 57, 63, 64, 67, 68, 68, 31, 31,
+ 34, 38, 45, 47, 47, 48, 50, 53, 54, 60, 61, 63, 64, 65, 31, 32, 34, 39,
+ 45, 46, 46, 47, 49, 52, 53, 59, 60, 62, 64, 65, 30, 32, 35, 40, 44, 46,
+ 45, 46, 48, 51, 52, 57, 58, 60, 61, 62, 33, 35, 37, 42, 46, 47, 45, 46,
+ 47, 50, 51, 56, 57, 60, 61, 62, 33, 36, 38, 43, 46, 47, 46, 46, 47, 50,
+ 51, 56, 57, 59, 60, 60, 37, 40, 43, 47, 47, 47, 45, 46, 47, 49, 50, 54,
+ 55, 57, 59, 61, 39, 41, 43, 47, 48, 48, 47, 47, 48, 50, 51, 55, 56, 57,
+ 58, 59, 42, 43, 44, 47, 49, 50, 49, 50, 50, 53, 53, 57, 58, 60, 60, 59,
+ 47, 46, 46, 48, 51, 52, 53, 53, 53, 55, 56, 60, 61, 61, 61, 62, 49, 46,
+ 47, 48, 52, 53, 53, 54, 54, 56, 57, 60, 61, 63, 63, 62, 48, 46, 46, 47,
+ 51, 53, 56, 56, 57, 59, 60, 64, 64, 65, 64, 65, 48, 45, 46, 46, 51, 53,
+ 57, 57, 59, 61, 61, 65, 66, 66, 67, 65, 49, 45, 45, 46, 51, 53, 58, 59,
+ 61, 63, 64, 67, 68, 70, 67, 68, 50, 46, 46, 46, 52, 54, 59, 61, 63, 65,
+ 66, 70, 71, 70, 71, 68, 50, 46, 46, 46, 52, 54, 59, 61, 64, 66, 67, 71,
+ 71, 73, 71, 72, 52, 48, 47, 47, 53, 54, 61, 63, 66, 70, 71, 75, 75, 75,
+ 74, 72, 53, 49, 48, 48, 53, 55, 61, 64, 67, 71, 72, 76, 77, 77, 75, 76,
+ 54, 50, 49, 49, 54, 55, 62, 65, 68, 72, 73, 78, 79, 80, 79, 76, 56, 51,
+ 51, 50, 55, 56, 63, 66, 70, 74, 76, 81, 82, 81, 80, 80, 57, 52, 51, 50,
+ 55, 56, 64, 66, 70, 75, 76, 82, 83, 85, 83, 80, 60, 54, 54, 52, 57, 58,
+ 65, 68, 72, 77, 79, 85, 86, 86, 85, 84, 61, 56, 55, 53, 58, 59, 66, 69,
+ 73, 79, 80, 86, 87, 89, 87, 84, 63, 57, 56, 55, 59, 60, 67, 70, 75, 80,
+ 82, 89, 90, 91, 89, 89, 64, 58, 57, 56, 60, 61, 68, 71, 75, 81, 83, 90,
+ 91, 93, 91, 89, 64, 59, 58, 56, 60, 61, 68, 71, 75, 81, 83, 90, 91, 94,
+ 94, 93, 66, 60, 59, 57, 61, 63, 69, 72, 77, 82, 84, 92, 93, 94, 95, 93,
+ 67, 61, 60, 58, 61, 63, 69, 70, 78, 80, 85, 90, 93, 96, 97, 97, 68, 62,
+ 61, 59, 61, 64, 68, 71, 77, 79, 86, 88, 94, 96, 97, 98, 69, 63, 62, 59,
+ 61, 65, 68, 72, 76, 80, 85, 88, 94, 95, 99, 99, 70, 63, 63, 60, 61, 66,
+ 67, 73, 75, 81, 83, 89, 92, 97, 98, 99, 70, 64, 64, 61, 61, 67, 67, 74,
+ 74, 82, 82, 90, 90, 98, 98, 102,
/* Size 4x16 */
- 31, 48, 56, 67, 32, 46, 52, 62, 35, 47, 50, 60, 40, 47, 49, 57, 43, 50,
- 53, 60, 46, 53, 56, 63, 45, 53, 61, 66, 46, 54, 65, 70, 48, 54, 70, 75,
- 50, 55, 72, 80, 52, 56, 75, 85, 56, 59, 79, 89, 58, 61, 81, 93, 60, 63,
- 82, 94, 62, 64, 79, 96, 63, 66, 81, 97,
- /* Size 16x4 */
31, 32, 35, 40, 43, 46, 45, 46, 48, 50, 52, 56, 58, 60, 62, 63, 48, 46,
47, 47, 50, 53, 53, 54, 54, 55, 56, 59, 61, 63, 64, 66, 56, 52, 50, 49,
53, 56, 61, 65, 70, 72, 75, 79, 81, 82, 79, 81, 67, 62, 60, 57, 60, 63,
66, 70, 75, 80, 85, 89, 93, 94, 96, 97,
+ /* Size 16x4 */
+ 31, 48, 56, 67, 32, 46, 52, 62, 35, 47, 50, 60, 40, 47, 49, 57, 43, 50,
+ 53, 60, 46, 53, 56, 63, 45, 53, 61, 66, 46, 54, 65, 70, 48, 54, 70, 75,
+ 50, 55, 72, 80, 52, 56, 75, 85, 56, 59, 79, 89, 58, 61, 81, 93, 60, 63,
+ 82, 94, 62, 64, 79, 96, 63, 66, 81, 97,
/* Size 8x32 */
- 32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 47, 50, 54, 61, 64, 31, 34,
- 45, 46, 49, 53, 60, 64, 30, 35, 44, 45, 48, 52, 58, 61, 33, 37, 46, 45,
- 47, 51, 57, 61, 33, 38, 46, 46, 47, 51, 57, 60, 37, 43, 47, 45, 47, 50,
- 55, 59, 39, 43, 48, 47, 48, 51, 56, 58, 42, 44, 49, 49, 50, 53, 58, 60,
- 47, 46, 51, 53, 53, 56, 61, 61, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46,
- 51, 56, 57, 60, 64, 64, 48, 46, 51, 57, 59, 61, 66, 67, 49, 45, 51, 58,
- 61, 64, 68, 67, 50, 46, 52, 59, 63, 66, 71, 71, 50, 46, 52, 59, 64, 67,
- 71, 71, 52, 47, 53, 61, 66, 71, 75, 74, 53, 48, 53, 61, 67, 72, 77, 75,
- 54, 49, 54, 62, 68, 73, 79, 79, 56, 51, 55, 63, 70, 76, 82, 80, 57, 51,
- 55, 64, 70, 76, 83, 83, 60, 54, 57, 65, 72, 79, 86, 85, 61, 55, 58, 66,
- 73, 80, 87, 87, 63, 56, 59, 67, 75, 82, 90, 89, 64, 57, 60, 68, 75, 83,
- 91, 91, 64, 58, 60, 68, 75, 83, 91, 94, 66, 59, 61, 69, 77, 84, 93, 95,
- 67, 60, 61, 69, 78, 85, 93, 97, 68, 61, 61, 68, 77, 86, 94, 97, 69, 62,
- 61, 68, 76, 85, 94, 99, 70, 63, 61, 67, 75, 83, 92, 98, 70, 64, 61, 67,
- 74, 82, 90, 98,
- /* Size 32x8 */
32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 52, 53,
54, 56, 57, 60, 61, 63, 64, 64, 66, 67, 68, 69, 70, 70, 33, 34, 34, 35,
37, 38, 43, 43, 44, 46, 47, 46, 46, 45, 46, 46, 47, 48, 49, 51, 51, 54,
@@ -2053,7 +2037,23 @@
55, 56, 58, 61, 61, 64, 66, 68, 71, 71, 75, 77, 79, 82, 83, 86, 87, 90,
91, 91, 93, 93, 94, 94, 92, 90, 68, 64, 64, 61, 61, 60, 59, 58, 60, 61,
63, 64, 67, 67, 71, 71, 74, 75, 79, 80, 83, 85, 87, 89, 91, 94, 95, 97,
- 97, 99, 98, 98 },
+ 97, 99, 98, 98,
+ /* Size 32x8 */
+ 32, 33, 45, 49, 52, 57, 64, 68, 31, 34, 45, 47, 50, 54, 61, 64, 31, 34,
+ 45, 46, 49, 53, 60, 64, 30, 35, 44, 45, 48, 52, 58, 61, 33, 37, 46, 45,
+ 47, 51, 57, 61, 33, 38, 46, 46, 47, 51, 57, 60, 37, 43, 47, 45, 47, 50,
+ 55, 59, 39, 43, 48, 47, 48, 51, 56, 58, 42, 44, 49, 49, 50, 53, 58, 60,
+ 47, 46, 51, 53, 53, 56, 61, 61, 49, 47, 52, 53, 54, 57, 61, 63, 48, 46,
+ 51, 56, 57, 60, 64, 64, 48, 46, 51, 57, 59, 61, 66, 67, 49, 45, 51, 58,
+ 61, 64, 68, 67, 50, 46, 52, 59, 63, 66, 71, 71, 50, 46, 52, 59, 64, 67,
+ 71, 71, 52, 47, 53, 61, 66, 71, 75, 74, 53, 48, 53, 61, 67, 72, 77, 75,
+ 54, 49, 54, 62, 68, 73, 79, 79, 56, 51, 55, 63, 70, 76, 82, 80, 57, 51,
+ 55, 64, 70, 76, 83, 83, 60, 54, 57, 65, 72, 79, 86, 85, 61, 55, 58, 66,
+ 73, 80, 87, 87, 63, 56, 59, 67, 75, 82, 90, 89, 64, 57, 60, 68, 75, 83,
+ 91, 91, 64, 58, 60, 68, 75, 83, 91, 94, 66, 59, 61, 69, 77, 84, 93, 95,
+ 67, 60, 61, 69, 78, 85, 93, 97, 68, 61, 61, 68, 77, 86, 94, 97, 69, 62,
+ 61, 68, 76, 85, 94, 99, 70, 63, 61, 67, 75, 83, 92, 98, 70, 64, 61, 67,
+ 74, 82, 90, 98 },
},
{
{ /* Luma */
@@ -2142,21 +2142,12 @@
84, 84, 83, 83, 80, 81, 86, 86, 91, 91, 96, 97, 103, 103, 110, 110, 118,
119, 126, 126, 135, 136, 144, 144, 155, 155, 159, 159, 164,
/* Size 4x8 */
- 32, 35, 51, 77, 32, 36, 50, 72, 34, 42, 54, 75, 38, 51, 67, 87, 48, 59,
- 80, 103, 60, 68, 92, 119, 72, 79, 104, 135, 81, 86, 112, 144,
- /* Size 8x4 */
32, 32, 34, 38, 48, 60, 72, 81, 35, 36, 42, 51, 59, 68, 79, 86, 51, 50,
54, 67, 80, 92, 104, 112, 77, 72, 75, 87, 103, 119, 135, 144,
+ /* Size 8x4 */
+ 32, 35, 51, 77, 32, 36, 50, 72, 34, 42, 54, 75, 38, 51, 67, 87, 48, 59,
+ 80, 103, 60, 68, 92, 119, 72, 79, 104, 135, 81, 86, 112, 144,
/* Size 8x16 */
- 32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 61, 74, 82, 31, 32,
- 34, 38, 47, 59, 71, 79, 32, 33, 36, 40, 48, 58, 69, 77, 33, 34, 38, 44,
- 52, 62, 72, 78, 36, 35, 42, 51, 58, 68, 78, 84, 39, 38, 44, 54, 63, 73,
- 84, 89, 44, 41, 46, 59, 69, 79, 90, 96, 48, 45, 50, 62, 74, 85, 96, 103,
- 53, 49, 53, 66, 79, 92, 103, 111, 58, 54, 57, 70, 84, 98, 110, 118, 66,
- 60, 63, 75, 90, 106, 119, 126, 74, 67, 69, 81, 97, 113, 128, 134, 81,
- 73, 75, 86, 102, 120, 135, 143, 86, 78, 78, 90, 106, 124, 140, 147, 91,
- 82, 80, 90, 103, 119, 137, 151,
- /* Size 16x8 */
32, 31, 31, 32, 33, 36, 39, 44, 48, 53, 58, 66, 74, 81, 86, 91, 31, 32,
32, 33, 34, 35, 38, 41, 45, 49, 54, 60, 67, 73, 78, 82, 33, 33, 34, 36,
38, 42, 44, 46, 50, 53, 57, 63, 69, 75, 78, 80, 40, 39, 38, 40, 44, 51,
@@ -2165,38 +2156,16 @@
92, 98, 106, 113, 120, 124, 119, 79, 74, 71, 69, 72, 78, 84, 90, 96,
103, 110, 119, 128, 135, 140, 137, 87, 82, 79, 77, 78, 84, 89, 96, 103,
111, 118, 126, 134, 143, 147, 151,
+ /* Size 16x8 */
+ 32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 61, 74, 82, 31, 32,
+ 34, 38, 47, 59, 71, 79, 32, 33, 36, 40, 48, 58, 69, 77, 33, 34, 38, 44,
+ 52, 62, 72, 78, 36, 35, 42, 51, 58, 68, 78, 84, 39, 38, 44, 54, 63, 73,
+ 84, 89, 44, 41, 46, 59, 69, 79, 90, 96, 48, 45, 50, 62, 74, 85, 96, 103,
+ 53, 49, 53, 66, 79, 92, 103, 111, 58, 54, 57, 70, 84, 98, 110, 118, 66,
+ 60, 63, 75, 90, 106, 119, 126, 74, 67, 69, 81, 97, 113, 128, 134, 81,
+ 73, 75, 86, 102, 120, 135, 143, 86, 78, 78, 90, 106, 124, 140, 147, 91,
+ 82, 80, 90, 103, 119, 137, 151,
/* Size 16x32 */
- 32, 31, 31, 32, 33, 36, 40, 44, 51, 53, 65, 66, 79, 81, 87, 90, 31, 32,
- 32, 32, 33, 35, 39, 42, 49, 51, 62, 63, 75, 77, 83, 85, 31, 32, 32, 32,
- 33, 35, 39, 42, 49, 51, 61, 62, 74, 76, 82, 85, 31, 32, 32, 33, 33, 34,
- 38, 41, 47, 49, 59, 60, 72, 74, 79, 81, 31, 32, 32, 33, 34, 35, 38, 41,
- 47, 49, 59, 60, 71, 73, 79, 81, 32, 32, 33, 34, 35, 36, 39, 42, 48, 50,
- 59, 60, 71, 72, 78, 80, 32, 32, 33, 35, 36, 37, 40, 42, 48, 49, 58, 59,
- 69, 71, 77, 80, 32, 33, 33, 35, 36, 38, 41, 42, 48, 49, 58, 59, 69, 70,
- 75, 77, 33, 33, 34, 36, 38, 41, 44, 46, 52, 53, 62, 63, 72, 74, 78, 78,
- 34, 34, 34, 37, 39, 42, 45, 48, 53, 54, 63, 64, 73, 75, 80, 83, 36, 34,
- 35, 38, 42, 48, 51, 54, 58, 60, 68, 69, 78, 80, 84, 83, 36, 35, 35, 38,
- 42, 48, 51, 54, 59, 60, 68, 69, 79, 80, 85, 87, 39, 37, 38, 40, 44, 50,
- 54, 58, 63, 65, 73, 74, 84, 85, 89, 88, 40, 38, 39, 41, 45, 51, 56, 59,
- 65, 67, 75, 76, 85, 87, 90, 93, 44, 41, 41, 43, 46, 53, 59, 63, 69, 71,
- 79, 80, 90, 91, 96, 93, 46, 43, 43, 44, 48, 55, 60, 65, 72, 73, 82, 83,
- 93, 94, 97, 100, 48, 45, 45, 46, 50, 56, 62, 67, 74, 76, 85, 86, 96, 98,
- 103, 100, 52, 48, 48, 49, 52, 59, 65, 70, 78, 80, 90, 91, 101, 103, 105,
- 107, 53, 49, 49, 50, 53, 60, 66, 71, 79, 82, 92, 93, 103, 105, 111, 107,
- 58, 53, 53, 53, 57, 63, 69, 74, 83, 86, 97, 98, 109, 111, 113, 115, 58,
- 54, 54, 54, 57, 63, 70, 75, 84, 87, 98, 99, 110, 112, 118, 115, 65, 60,
- 59, 58, 62, 68, 74, 79, 89, 92, 105, 106, 118, 119, 122, 123, 66, 61,
- 60, 59, 63, 69, 75, 80, 90, 93, 106, 107, 119, 121, 126, 123, 71, 65,
- 65, 63, 67, 73, 79, 84, 94, 97, 111, 112, 125, 127, 131, 132, 74, 68,
- 67, 66, 69, 75, 81, 86, 97, 100, 113, 115, 128, 130, 134, 132, 79, 72,
- 72, 70, 73, 79, 85, 90, 101, 104, 118, 119, 133, 135, 141, 140, 81, 74,
- 73, 71, 75, 80, 86, 91, 102, 105, 120, 121, 135, 137, 143, 140, 82, 75,
- 74, 72, 75, 81, 87, 92, 103, 106, 121, 122, 136, 139, 147, 151, 86, 78,
- 78, 75, 78, 84, 90, 95, 106, 109, 124, 125, 140, 142, 147, 151, 88, 81,
- 80, 77, 80, 86, 90, 98, 105, 112, 122, 127, 140, 144, 152, 155, 91, 83,
- 82, 79, 80, 88, 90, 100, 103, 114, 119, 130, 137, 148, 151, 155, 93, 85,
- 85, 81, 81, 90, 90, 102, 103, 117, 117, 134, 134, 151, 152, 160,
- /* Size 32x16 */
32, 31, 31, 31, 31, 32, 32, 32, 33, 34, 36, 36, 39, 40, 44, 46, 48, 52,
53, 58, 58, 65, 66, 71, 74, 79, 81, 82, 86, 88, 91, 93, 31, 32, 32, 32,
32, 32, 32, 33, 33, 34, 34, 35, 37, 38, 41, 43, 45, 48, 49, 53, 54, 60,
@@ -2228,33 +2197,48 @@
152, 90, 85, 85, 81, 81, 80, 80, 77, 78, 83, 83, 87, 88, 93, 93, 100,
100, 107, 107, 115, 115, 123, 123, 132, 132, 140, 140, 151, 151, 155,
155, 160,
+ /* Size 32x16 */
+ 32, 31, 31, 32, 33, 36, 40, 44, 51, 53, 65, 66, 79, 81, 87, 90, 31, 32,
+ 32, 32, 33, 35, 39, 42, 49, 51, 62, 63, 75, 77, 83, 85, 31, 32, 32, 32,
+ 33, 35, 39, 42, 49, 51, 61, 62, 74, 76, 82, 85, 31, 32, 32, 33, 33, 34,
+ 38, 41, 47, 49, 59, 60, 72, 74, 79, 81, 31, 32, 32, 33, 34, 35, 38, 41,
+ 47, 49, 59, 60, 71, 73, 79, 81, 32, 32, 33, 34, 35, 36, 39, 42, 48, 50,
+ 59, 60, 71, 72, 78, 80, 32, 32, 33, 35, 36, 37, 40, 42, 48, 49, 58, 59,
+ 69, 71, 77, 80, 32, 33, 33, 35, 36, 38, 41, 42, 48, 49, 58, 59, 69, 70,
+ 75, 77, 33, 33, 34, 36, 38, 41, 44, 46, 52, 53, 62, 63, 72, 74, 78, 78,
+ 34, 34, 34, 37, 39, 42, 45, 48, 53, 54, 63, 64, 73, 75, 80, 83, 36, 34,
+ 35, 38, 42, 48, 51, 54, 58, 60, 68, 69, 78, 80, 84, 83, 36, 35, 35, 38,
+ 42, 48, 51, 54, 59, 60, 68, 69, 79, 80, 85, 87, 39, 37, 38, 40, 44, 50,
+ 54, 58, 63, 65, 73, 74, 84, 85, 89, 88, 40, 38, 39, 41, 45, 51, 56, 59,
+ 65, 67, 75, 76, 85, 87, 90, 93, 44, 41, 41, 43, 46, 53, 59, 63, 69, 71,
+ 79, 80, 90, 91, 96, 93, 46, 43, 43, 44, 48, 55, 60, 65, 72, 73, 82, 83,
+ 93, 94, 97, 100, 48, 45, 45, 46, 50, 56, 62, 67, 74, 76, 85, 86, 96, 98,
+ 103, 100, 52, 48, 48, 49, 52, 59, 65, 70, 78, 80, 90, 91, 101, 103, 105,
+ 107, 53, 49, 49, 50, 53, 60, 66, 71, 79, 82, 92, 93, 103, 105, 111, 107,
+ 58, 53, 53, 53, 57, 63, 69, 74, 83, 86, 97, 98, 109, 111, 113, 115, 58,
+ 54, 54, 54, 57, 63, 70, 75, 84, 87, 98, 99, 110, 112, 118, 115, 65, 60,
+ 59, 58, 62, 68, 74, 79, 89, 92, 105, 106, 118, 119, 122, 123, 66, 61,
+ 60, 59, 63, 69, 75, 80, 90, 93, 106, 107, 119, 121, 126, 123, 71, 65,
+ 65, 63, 67, 73, 79, 84, 94, 97, 111, 112, 125, 127, 131, 132, 74, 68,
+ 67, 66, 69, 75, 81, 86, 97, 100, 113, 115, 128, 130, 134, 132, 79, 72,
+ 72, 70, 73, 79, 85, 90, 101, 104, 118, 119, 133, 135, 141, 140, 81, 74,
+ 73, 71, 75, 80, 86, 91, 102, 105, 120, 121, 135, 137, 143, 140, 82, 75,
+ 74, 72, 75, 81, 87, 92, 103, 106, 121, 122, 136, 139, 147, 151, 86, 78,
+ 78, 75, 78, 84, 90, 95, 106, 109, 124, 125, 140, 142, 147, 151, 88, 81,
+ 80, 77, 80, 86, 90, 98, 105, 112, 122, 127, 140, 144, 152, 155, 91, 83,
+ 82, 79, 80, 88, 90, 100, 103, 114, 119, 130, 137, 148, 151, 155, 93, 85,
+ 85, 81, 81, 90, 90, 102, 103, 117, 117, 134, 134, 151, 152, 160,
/* Size 4x16 */
- 31, 36, 53, 81, 32, 35, 51, 76, 32, 35, 49, 73, 32, 37, 49, 71, 33, 41,
- 53, 74, 34, 48, 60, 80, 37, 50, 65, 85, 41, 53, 71, 91, 45, 56, 76, 98,
- 49, 60, 82, 105, 54, 63, 87, 112, 61, 69, 93, 121, 68, 75, 100, 130, 74,
- 80, 105, 137, 78, 84, 109, 142, 83, 88, 114, 148,
- /* Size 16x4 */
31, 32, 32, 32, 33, 34, 37, 41, 45, 49, 54, 61, 68, 74, 78, 83, 36, 35,
35, 37, 41, 48, 50, 53, 56, 60, 63, 69, 75, 80, 84, 88, 53, 51, 49, 49,
53, 60, 65, 71, 76, 82, 87, 93, 100, 105, 109, 114, 81, 76, 73, 71, 74,
80, 85, 91, 98, 105, 112, 121, 130, 137, 142, 148,
+ /* Size 16x4 */
+ 31, 36, 53, 81, 32, 35, 51, 76, 32, 35, 49, 73, 32, 37, 49, 71, 33, 41,
+ 53, 74, 34, 48, 60, 80, 37, 50, 65, 85, 41, 53, 71, 91, 45, 56, 76, 98,
+ 49, 60, 82, 105, 54, 63, 87, 112, 61, 69, 93, 121, 68, 75, 100, 130, 74,
+ 80, 105, 137, 78, 84, 109, 142, 83, 88, 114, 148,
/* Size 8x32 */
- 32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 62, 75, 83, 31, 32,
- 33, 39, 49, 61, 74, 82, 31, 32, 33, 38, 47, 59, 72, 79, 31, 32, 34, 38,
- 47, 59, 71, 79, 32, 33, 35, 39, 48, 59, 71, 78, 32, 33, 36, 40, 48, 58,
- 69, 77, 32, 33, 36, 41, 48, 58, 69, 75, 33, 34, 38, 44, 52, 62, 72, 78,
- 34, 34, 39, 45, 53, 63, 73, 80, 36, 35, 42, 51, 58, 68, 78, 84, 36, 35,
- 42, 51, 59, 68, 79, 85, 39, 38, 44, 54, 63, 73, 84, 89, 40, 39, 45, 56,
- 65, 75, 85, 90, 44, 41, 46, 59, 69, 79, 90, 96, 46, 43, 48, 60, 72, 82,
- 93, 97, 48, 45, 50, 62, 74, 85, 96, 103, 52, 48, 52, 65, 78, 90, 101,
- 105, 53, 49, 53, 66, 79, 92, 103, 111, 58, 53, 57, 69, 83, 97, 109, 113,
- 58, 54, 57, 70, 84, 98, 110, 118, 65, 59, 62, 74, 89, 105, 118, 122, 66,
- 60, 63, 75, 90, 106, 119, 126, 71, 65, 67, 79, 94, 111, 125, 131, 74,
- 67, 69, 81, 97, 113, 128, 134, 79, 72, 73, 85, 101, 118, 133, 141, 81,
- 73, 75, 86, 102, 120, 135, 143, 82, 74, 75, 87, 103, 121, 136, 147, 86,
- 78, 78, 90, 106, 124, 140, 147, 88, 80, 80, 90, 105, 122, 140, 152, 91,
- 82, 80, 90, 103, 119, 137, 151, 93, 85, 81, 90, 103, 117, 134, 152,
- /* Size 32x8 */
32, 31, 31, 31, 31, 32, 32, 32, 33, 34, 36, 36, 39, 40, 44, 46, 48, 52,
53, 58, 58, 65, 66, 71, 74, 79, 81, 82, 86, 88, 91, 93, 31, 32, 32, 32,
32, 33, 33, 33, 34, 34, 35, 35, 38, 39, 41, 43, 45, 48, 49, 53, 54, 59,
@@ -2270,7 +2254,23 @@
103, 109, 110, 118, 119, 125, 128, 133, 135, 136, 140, 140, 137, 134,
87, 83, 82, 79, 79, 78, 77, 75, 78, 80, 84, 85, 89, 90, 96, 97, 103,
105, 111, 113, 118, 122, 126, 131, 134, 141, 143, 147, 147, 152, 151,
- 152 },
+ 152,
+ /* Size 32x8 */
+ 32, 31, 33, 40, 51, 65, 79, 87, 31, 32, 33, 39, 49, 62, 75, 83, 31, 32,
+ 33, 39, 49, 61, 74, 82, 31, 32, 33, 38, 47, 59, 72, 79, 31, 32, 34, 38,
+ 47, 59, 71, 79, 32, 33, 35, 39, 48, 59, 71, 78, 32, 33, 36, 40, 48, 58,
+ 69, 77, 32, 33, 36, 41, 48, 58, 69, 75, 33, 34, 38, 44, 52, 62, 72, 78,
+ 34, 34, 39, 45, 53, 63, 73, 80, 36, 35, 42, 51, 58, 68, 78, 84, 36, 35,
+ 42, 51, 59, 68, 79, 85, 39, 38, 44, 54, 63, 73, 84, 89, 40, 39, 45, 56,
+ 65, 75, 85, 90, 44, 41, 46, 59, 69, 79, 90, 96, 46, 43, 48, 60, 72, 82,
+ 93, 97, 48, 45, 50, 62, 74, 85, 96, 103, 52, 48, 52, 65, 78, 90, 101,
+ 105, 53, 49, 53, 66, 79, 92, 103, 111, 58, 53, 57, 69, 83, 97, 109, 113,
+ 58, 54, 57, 70, 84, 98, 110, 118, 65, 59, 62, 74, 89, 105, 118, 122, 66,
+ 60, 63, 75, 90, 106, 119, 126, 71, 65, 67, 79, 94, 111, 125, 131, 74,
+ 67, 69, 81, 97, 113, 128, 134, 79, 72, 73, 85, 101, 118, 133, 141, 81,
+ 73, 75, 86, 102, 120, 135, 143, 82, 74, 75, 87, 103, 121, 136, 147, 86,
+ 78, 78, 90, 106, 124, 140, 147, 88, 80, 80, 90, 105, 122, 140, 152, 91,
+ 82, 80, 90, 103, 119, 137, 151, 93, 85, 81, 90, 103, 117, 134, 152 },
{ /* Chroma */
/* Size 4x4 */
32, 46, 49, 58, 46, 53, 55, 62, 49, 55, 70, 78, 58, 62, 78, 91,
@@ -2354,21 +2354,12 @@
98, 97, 69, 65, 65, 62, 62, 61, 61, 58, 59, 62, 62, 65, 65, 68, 68, 71,
71, 75, 75, 79, 79, 83, 83, 87, 87, 91, 91, 96, 96, 97, 97, 99,
/* Size 4x8 */
- 31, 47, 50, 61, 36, 47, 47, 57, 43, 50, 50, 58, 45, 53, 58, 65, 47, 54,
- 66, 74, 52, 56, 70, 82, 57, 60, 75, 90, 61, 63, 77, 93,
- /* Size 8x4 */
31, 36, 43, 45, 47, 52, 57, 61, 47, 47, 50, 53, 54, 56, 60, 63, 50, 47,
50, 58, 66, 70, 75, 77, 61, 57, 58, 65, 74, 82, 90, 93,
+ /* Size 8x4 */
+ 31, 47, 50, 61, 36, 47, 47, 57, 43, 50, 50, 58, 45, 53, 58, 65, 47, 54,
+ 66, 74, 52, 56, 70, 82, 57, 60, 75, 90, 61, 63, 77, 93,
/* Size 8x16 */
- 32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 59, 63, 31, 35,
- 43, 46, 47, 51, 57, 60, 35, 39, 46, 46, 47, 50, 55, 58, 41, 43, 48, 49,
- 49, 52, 57, 59, 49, 47, 50, 53, 54, 57, 60, 62, 48, 46, 49, 54, 57, 60,
- 64, 65, 49, 45, 48, 56, 61, 64, 67, 69, 50, 46, 49, 57, 63, 67, 71, 73,
- 52, 48, 50, 58, 65, 71, 75, 77, 54, 50, 51, 59, 67, 73, 78, 81, 57, 52,
- 53, 61, 69, 77, 82, 85, 61, 55, 56, 63, 72, 80, 86, 88, 64, 58, 58, 65,
- 73, 82, 89, 92, 66, 59, 59, 66, 75, 84, 91, 94, 68, 61, 59, 65, 72, 81,
- 89, 95,
- /* Size 16x8 */
32, 31, 31, 35, 41, 49, 48, 49, 50, 52, 54, 57, 61, 64, 66, 68, 32, 33,
35, 39, 43, 47, 46, 45, 46, 48, 50, 52, 55, 58, 59, 61, 40, 41, 43, 46,
48, 50, 49, 48, 49, 50, 51, 53, 56, 58, 59, 59, 49, 47, 46, 46, 49, 53,
@@ -2377,37 +2368,16 @@
73, 77, 80, 82, 84, 81, 63, 59, 57, 55, 57, 60, 64, 67, 71, 75, 78, 82,
86, 89, 91, 89, 67, 63, 60, 58, 59, 62, 65, 69, 73, 77, 81, 85, 88, 92,
94, 95,
+ /* Size 16x8 */
+ 32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 59, 63, 31, 35,
+ 43, 46, 47, 51, 57, 60, 35, 39, 46, 46, 47, 50, 55, 58, 41, 43, 48, 49,
+ 49, 52, 57, 59, 49, 47, 50, 53, 54, 57, 60, 62, 48, 46, 49, 54, 57, 60,
+ 64, 65, 49, 45, 48, 56, 61, 64, 67, 69, 50, 46, 49, 57, 63, 67, 71, 73,
+ 52, 48, 50, 58, 65, 71, 75, 77, 54, 50, 51, 59, 67, 73, 78, 81, 57, 52,
+ 53, 61, 69, 77, 82, 85, 61, 55, 56, 63, 72, 80, 86, 88, 64, 58, 58, 65,
+ 73, 82, 89, 92, 66, 59, 59, 66, 75, 84, 91, 94, 68, 61, 59, 65, 72, 81,
+ 89, 95,
/* Size 16x32 */
- 32, 31, 32, 37, 40, 48, 49, 49, 51, 52, 57, 58, 63, 64, 67, 67, 31, 31,
- 33, 38, 41, 47, 47, 47, 49, 50, 54, 55, 60, 61, 63, 64, 31, 31, 33, 38,
- 41, 47, 47, 47, 49, 49, 54, 54, 59, 60, 63, 64, 30, 32, 33, 40, 42, 46,
- 45, 45, 47, 48, 52, 52, 57, 58, 60, 61, 31, 33, 35, 41, 43, 46, 46, 45,
- 47, 48, 51, 52, 57, 57, 60, 61, 33, 36, 37, 43, 44, 47, 46, 46, 47, 47,
- 51, 52, 56, 57, 59, 60, 35, 38, 39, 45, 46, 47, 46, 45, 47, 47, 50, 51,
- 55, 56, 58, 60, 37, 40, 41, 47, 47, 47, 46, 45, 46, 47, 50, 50, 54, 55,
- 57, 58, 41, 42, 43, 47, 48, 49, 49, 48, 49, 50, 52, 53, 57, 57, 59, 58,
- 42, 43, 43, 47, 48, 50, 49, 49, 50, 50, 53, 54, 57, 58, 60, 61, 49, 46,
- 47, 48, 50, 53, 53, 53, 54, 54, 57, 57, 60, 61, 62, 61, 49, 46, 47, 48,
- 50, 53, 53, 54, 54, 55, 57, 57, 61, 61, 63, 64, 48, 46, 46, 47, 49, 53,
- 54, 56, 57, 57, 60, 60, 64, 64, 65, 64, 48, 45, 46, 46, 49, 53, 55, 56,
- 58, 58, 61, 61, 65, 65, 66, 67, 49, 45, 45, 46, 48, 53, 56, 58, 61, 61,
- 64, 64, 67, 68, 69, 67, 49, 46, 46, 46, 49, 53, 57, 59, 62, 62, 65, 66,
- 69, 69, 70, 70, 50, 46, 46, 46, 49, 54, 57, 59, 63, 64, 67, 67, 71, 71,
- 73, 71, 51, 47, 47, 47, 49, 54, 58, 61, 64, 66, 69, 70, 73, 74, 74, 74,
- 52, 48, 48, 47, 50, 54, 58, 61, 65, 66, 71, 71, 75, 75, 77, 74, 54, 50,
- 49, 48, 51, 55, 59, 62, 67, 68, 73, 73, 77, 78, 78, 78, 54, 50, 50, 49,
- 51, 55, 59, 62, 67, 68, 73, 74, 78, 78, 81, 78, 57, 52, 52, 50, 52, 56,
- 60, 64, 69, 70, 76, 77, 82, 82, 83, 82, 57, 52, 52, 51, 53, 57, 61, 64,
- 69, 71, 77, 77, 82, 83, 85, 82, 60, 54, 54, 52, 55, 58, 62, 65, 71, 72,
- 79, 79, 85, 86, 87, 86, 61, 56, 55, 53, 56, 59, 63, 66, 72, 73, 80, 81,
- 86, 87, 88, 86, 63, 57, 57, 55, 57, 60, 64, 67, 73, 75, 82, 82, 89, 90,
- 92, 90, 64, 58, 58, 55, 58, 61, 65, 68, 73, 75, 82, 83, 89, 90, 92, 90,
- 64, 59, 58, 56, 58, 61, 65, 68, 74, 75, 83, 83, 90, 91, 94, 95, 66, 60,
- 59, 57, 59, 62, 66, 69, 75, 76, 84, 85, 91, 92, 94, 95, 67, 61, 60, 58,
- 59, 63, 66, 70, 74, 77, 82, 85, 91, 93, 96, 96, 68, 62, 61, 58, 59, 64,
- 65, 71, 72, 78, 81, 86, 89, 94, 95, 96, 68, 62, 62, 59, 59, 65, 65, 71,
- 71, 79, 79, 87, 87, 95, 95, 98,
- /* Size 32x16 */
32, 31, 31, 30, 31, 33, 35, 37, 41, 42, 49, 49, 48, 48, 49, 49, 50, 51,
52, 54, 54, 57, 57, 60, 61, 63, 64, 64, 66, 67, 68, 68, 31, 31, 31, 32,
33, 36, 38, 40, 42, 43, 46, 46, 46, 45, 45, 46, 46, 47, 48, 50, 50, 52,
@@ -2437,33 +2407,47 @@
81, 83, 85, 87, 88, 92, 92, 94, 94, 96, 95, 95, 67, 64, 64, 61, 61, 60,
60, 58, 58, 61, 61, 64, 64, 67, 67, 70, 71, 74, 74, 78, 78, 82, 82, 86,
86, 90, 90, 95, 95, 96, 96, 98,
+ /* Size 32x16 */
+ 32, 31, 32, 37, 40, 48, 49, 49, 51, 52, 57, 58, 63, 64, 67, 67, 31, 31,
+ 33, 38, 41, 47, 47, 47, 49, 50, 54, 55, 60, 61, 63, 64, 31, 31, 33, 38,
+ 41, 47, 47, 47, 49, 49, 54, 54, 59, 60, 63, 64, 30, 32, 33, 40, 42, 46,
+ 45, 45, 47, 48, 52, 52, 57, 58, 60, 61, 31, 33, 35, 41, 43, 46, 46, 45,
+ 47, 48, 51, 52, 57, 57, 60, 61, 33, 36, 37, 43, 44, 47, 46, 46, 47, 47,
+ 51, 52, 56, 57, 59, 60, 35, 38, 39, 45, 46, 47, 46, 45, 47, 47, 50, 51,
+ 55, 56, 58, 60, 37, 40, 41, 47, 47, 47, 46, 45, 46, 47, 50, 50, 54, 55,
+ 57, 58, 41, 42, 43, 47, 48, 49, 49, 48, 49, 50, 52, 53, 57, 57, 59, 58,
+ 42, 43, 43, 47, 48, 50, 49, 49, 50, 50, 53, 54, 57, 58, 60, 61, 49, 46,
+ 47, 48, 50, 53, 53, 53, 54, 54, 57, 57, 60, 61, 62, 61, 49, 46, 47, 48,
+ 50, 53, 53, 54, 54, 55, 57, 57, 61, 61, 63, 64, 48, 46, 46, 47, 49, 53,
+ 54, 56, 57, 57, 60, 60, 64, 64, 65, 64, 48, 45, 46, 46, 49, 53, 55, 56,
+ 58, 58, 61, 61, 65, 65, 66, 67, 49, 45, 45, 46, 48, 53, 56, 58, 61, 61,
+ 64, 64, 67, 68, 69, 67, 49, 46, 46, 46, 49, 53, 57, 59, 62, 62, 65, 66,
+ 69, 69, 70, 70, 50, 46, 46, 46, 49, 54, 57, 59, 63, 64, 67, 67, 71, 71,
+ 73, 71, 51, 47, 47, 47, 49, 54, 58, 61, 64, 66, 69, 70, 73, 74, 74, 74,
+ 52, 48, 48, 47, 50, 54, 58, 61, 65, 66, 71, 71, 75, 75, 77, 74, 54, 50,
+ 49, 48, 51, 55, 59, 62, 67, 68, 73, 73, 77, 78, 78, 78, 54, 50, 50, 49,
+ 51, 55, 59, 62, 67, 68, 73, 74, 78, 78, 81, 78, 57, 52, 52, 50, 52, 56,
+ 60, 64, 69, 70, 76, 77, 82, 82, 83, 82, 57, 52, 52, 51, 53, 57, 61, 64,
+ 69, 71, 77, 77, 82, 83, 85, 82, 60, 54, 54, 52, 55, 58, 62, 65, 71, 72,
+ 79, 79, 85, 86, 87, 86, 61, 56, 55, 53, 56, 59, 63, 66, 72, 73, 80, 81,
+ 86, 87, 88, 86, 63, 57, 57, 55, 57, 60, 64, 67, 73, 75, 82, 82, 89, 90,
+ 92, 90, 64, 58, 58, 55, 58, 61, 65, 68, 73, 75, 82, 83, 89, 90, 92, 90,
+ 64, 59, 58, 56, 58, 61, 65, 68, 74, 75, 83, 83, 90, 91, 94, 95, 66, 60,
+ 59, 57, 59, 62, 66, 69, 75, 76, 84, 85, 91, 92, 94, 95, 67, 61, 60, 58,
+ 59, 63, 66, 70, 74, 77, 82, 85, 91, 93, 96, 96, 68, 62, 61, 58, 59, 64,
+ 65, 71, 72, 78, 81, 86, 89, 94, 95, 96, 68, 62, 62, 59, 59, 65, 65, 71,
+ 71, 79, 79, 87, 87, 95, 95, 98,
/* Size 4x16 */
- 31, 48, 52, 64, 31, 47, 49, 60, 33, 46, 48, 57, 38, 47, 47, 56, 42, 49,
- 50, 57, 46, 53, 54, 61, 46, 53, 57, 64, 45, 53, 61, 68, 46, 54, 64, 71,
- 48, 54, 66, 75, 50, 55, 68, 78, 52, 57, 71, 83, 56, 59, 73, 87, 58, 61,
- 75, 90, 60, 62, 76, 92, 62, 64, 78, 94,
- /* Size 16x4 */
31, 31, 33, 38, 42, 46, 46, 45, 46, 48, 50, 52, 56, 58, 60, 62, 48, 47,
46, 47, 49, 53, 53, 53, 54, 54, 55, 57, 59, 61, 62, 64, 52, 49, 48, 47,
50, 54, 57, 61, 64, 66, 68, 71, 73, 75, 76, 78, 64, 60, 57, 56, 57, 61,
64, 68, 71, 75, 78, 83, 87, 90, 92, 94,
+ /* Size 16x4 */
+ 31, 48, 52, 64, 31, 47, 49, 60, 33, 46, 48, 57, 38, 47, 47, 56, 42, 49,
+ 50, 57, 46, 53, 54, 61, 46, 53, 57, 64, 45, 53, 61, 68, 46, 54, 64, 71,
+ 48, 54, 66, 75, 50, 55, 68, 78, 52, 57, 71, 83, 56, 59, 73, 87, 58, 61,
+ 75, 90, 60, 62, 76, 92, 62, 64, 78, 94,
/* Size 8x32 */
- 32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 60, 63, 31, 33,
- 41, 47, 49, 54, 59, 63, 30, 33, 42, 45, 47, 52, 57, 60, 31, 35, 43, 46,
- 47, 51, 57, 60, 33, 37, 44, 46, 47, 51, 56, 59, 35, 39, 46, 46, 47, 50,
- 55, 58, 37, 41, 47, 46, 46, 50, 54, 57, 41, 43, 48, 49, 49, 52, 57, 59,
- 42, 43, 48, 49, 50, 53, 57, 60, 49, 47, 50, 53, 54, 57, 60, 62, 49, 47,
- 50, 53, 54, 57, 61, 63, 48, 46, 49, 54, 57, 60, 64, 65, 48, 46, 49, 55,
- 58, 61, 65, 66, 49, 45, 48, 56, 61, 64, 67, 69, 49, 46, 49, 57, 62, 65,
- 69, 70, 50, 46, 49, 57, 63, 67, 71, 73, 51, 47, 49, 58, 64, 69, 73, 74,
- 52, 48, 50, 58, 65, 71, 75, 77, 54, 49, 51, 59, 67, 73, 77, 78, 54, 50,
- 51, 59, 67, 73, 78, 81, 57, 52, 52, 60, 69, 76, 82, 83, 57, 52, 53, 61,
- 69, 77, 82, 85, 60, 54, 55, 62, 71, 79, 85, 87, 61, 55, 56, 63, 72, 80,
- 86, 88, 63, 57, 57, 64, 73, 82, 89, 92, 64, 58, 58, 65, 73, 82, 89, 92,
- 64, 58, 58, 65, 74, 83, 90, 94, 66, 59, 59, 66, 75, 84, 91, 94, 67, 60,
- 59, 66, 74, 82, 91, 96, 68, 61, 59, 65, 72, 81, 89, 95, 68, 62, 59, 65,
- 71, 79, 87, 95,
- /* Size 32x8 */
32, 31, 31, 30, 31, 33, 35, 37, 41, 42, 49, 49, 48, 48, 49, 49, 50, 51,
52, 54, 54, 57, 57, 60, 61, 63, 64, 64, 66, 67, 68, 68, 32, 33, 33, 33,
35, 37, 39, 41, 43, 43, 47, 47, 46, 46, 45, 46, 46, 47, 48, 49, 50, 52,
@@ -2478,7 +2462,23 @@
55, 54, 57, 57, 60, 61, 64, 65, 67, 69, 71, 73, 75, 77, 78, 82, 82, 85,
86, 89, 89, 90, 91, 91, 89, 87, 67, 63, 63, 60, 60, 59, 58, 57, 59, 60,
62, 63, 65, 66, 69, 70, 73, 74, 77, 78, 81, 83, 85, 87, 88, 92, 92, 94,
- 94, 96, 95, 95 },
+ 94, 96, 95, 95,
+ /* Size 32x8 */
+ 32, 32, 40, 49, 51, 57, 63, 67, 31, 33, 41, 47, 49, 54, 60, 63, 31, 33,
+ 41, 47, 49, 54, 59, 63, 30, 33, 42, 45, 47, 52, 57, 60, 31, 35, 43, 46,
+ 47, 51, 57, 60, 33, 37, 44, 46, 47, 51, 56, 59, 35, 39, 46, 46, 47, 50,
+ 55, 58, 37, 41, 47, 46, 46, 50, 54, 57, 41, 43, 48, 49, 49, 52, 57, 59,
+ 42, 43, 48, 49, 50, 53, 57, 60, 49, 47, 50, 53, 54, 57, 60, 62, 49, 47,
+ 50, 53, 54, 57, 61, 63, 48, 46, 49, 54, 57, 60, 64, 65, 48, 46, 49, 55,
+ 58, 61, 65, 66, 49, 45, 48, 56, 61, 64, 67, 69, 49, 46, 49, 57, 62, 65,
+ 69, 70, 50, 46, 49, 57, 63, 67, 71, 73, 51, 47, 49, 58, 64, 69, 73, 74,
+ 52, 48, 50, 58, 65, 71, 75, 77, 54, 49, 51, 59, 67, 73, 77, 78, 54, 50,
+ 51, 59, 67, 73, 78, 81, 57, 52, 52, 60, 69, 76, 82, 83, 57, 52, 53, 61,
+ 69, 77, 82, 85, 60, 54, 55, 62, 71, 79, 85, 87, 61, 55, 56, 63, 72, 80,
+ 86, 88, 63, 57, 57, 64, 73, 82, 89, 92, 64, 58, 58, 65, 73, 82, 89, 92,
+ 64, 58, 58, 65, 74, 83, 90, 94, 66, 59, 59, 66, 75, 84, 91, 94, 67, 60,
+ 59, 66, 74, 82, 91, 96, 68, 61, 59, 65, 72, 81, 89, 95, 68, 62, 59, 65,
+ 71, 79, 87, 95 },
},
{
{ /* Luma */
@@ -2566,21 +2566,12 @@
79, 79, 77, 77, 75, 75, 80, 80, 84, 84, 90, 90, 96, 96, 102, 102, 109,
109, 116, 116, 124, 124, 132, 132, 141, 141, 144, 144, 149,
/* Size 4x8 */
- 32, 35, 51, 75, 32, 36, 50, 71, 34, 42, 54, 73, 37, 50, 65, 84, 45, 56,
- 76, 96, 54, 63, 87, 110, 65, 73, 97, 125, 75, 81, 106, 136,
- /* Size 8x4 */
32, 32, 34, 37, 45, 54, 65, 75, 35, 36, 42, 50, 56, 63, 73, 81, 51, 50,
54, 65, 76, 87, 97, 106, 75, 71, 73, 84, 96, 110, 125, 136,
+ /* Size 8x4 */
+ 32, 35, 51, 75, 32, 36, 50, 71, 34, 42, 54, 73, 37, 50, 65, 84, 45, 56,
+ 76, 96, 54, 63, 87, 110, 65, 73, 97, 125, 75, 81, 106, 136,
/* Size 8x16 */
- 32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
- 33, 34, 41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 33, 35, 38,
- 42, 49, 58, 69, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34, 38, 48, 54, 60,
- 68, 78, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71, 79, 90,
- 48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 58, 54,
- 54, 63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 71, 65, 63,
- 73, 84, 97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 82, 75, 72, 81,
- 92, 106, 121, 136,
- /* Size 16x8 */
32, 31, 31, 32, 32, 34, 36, 39, 44, 48, 53, 58, 65, 71, 79, 82, 31, 32,
32, 32, 33, 34, 34, 37, 41, 45, 49, 54, 60, 65, 72, 75, 32, 32, 33, 34,
35, 37, 38, 40, 43, 46, 50, 54, 58, 63, 70, 72, 36, 35, 34, 36, 38, 42,
@@ -2589,38 +2580,16 @@
82, 87, 92, 97, 104, 106, 65, 62, 59, 59, 58, 63, 68, 73, 79, 85, 92,
98, 105, 111, 118, 121, 79, 75, 72, 71, 69, 73, 78, 84, 90, 96, 103,
110, 118, 125, 133, 136,
+ /* Size 16x8 */
+ 32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
+ 33, 34, 41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 33, 35, 38,
+ 42, 49, 58, 69, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34, 38, 48, 54, 60,
+ 68, 78, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71, 79, 90,
+ 48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 58, 54,
+ 54, 63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 71, 65, 63,
+ 73, 84, 97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 82, 75, 72, 81,
+ 92, 106, 121, 136,
/* Size 16x32 */
- 32, 31, 31, 32, 32, 36, 36, 44, 44, 53, 53, 65, 65, 79, 79, 87, 31, 32,
- 32, 32, 32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 32,
- 32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 33, 33, 34,
- 34, 41, 41, 49, 49, 59, 59, 72, 72, 78, 31, 32, 32, 33, 33, 34, 34, 41,
- 41, 49, 49, 59, 59, 72, 72, 78, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50,
- 50, 59, 59, 71, 71, 77, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50, 50, 59,
- 59, 71, 71, 77, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69,
- 69, 75, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69, 69, 75,
- 34, 34, 34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 34, 34,
- 34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 36, 34, 34, 38,
- 38, 48, 48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 36, 34, 34, 38, 38, 48,
- 48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 39, 37, 37, 40, 40, 50, 50, 58,
- 58, 65, 65, 73, 73, 84, 84, 89, 39, 37, 37, 40, 40, 50, 50, 58, 58, 65,
- 65, 73, 73, 84, 84, 89, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79,
- 79, 90, 90, 95, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79, 79, 90,
- 90, 95, 48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102,
- 48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102, 53, 49,
- 49, 50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 53, 49, 49,
- 50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 58, 54, 54, 54,
- 54, 63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 58, 54, 54, 54, 54,
- 63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 65, 60, 60, 58, 58, 68,
- 68, 79, 79, 92, 92, 105, 105, 118, 118, 124, 65, 60, 60, 58, 58, 68, 68,
- 79, 79, 92, 92, 105, 105, 118, 118, 124, 71, 65, 65, 63, 63, 73, 73, 84,
- 84, 97, 97, 111, 111, 125, 125, 132, 71, 65, 65, 63, 63, 73, 73, 84, 84,
- 97, 97, 111, 111, 125, 125, 132, 79, 72, 72, 70, 70, 79, 79, 90, 90,
- 104, 104, 118, 118, 133, 133, 141, 79, 72, 72, 70, 70, 79, 79, 90, 90,
- 104, 104, 118, 118, 133, 133, 141, 82, 75, 75, 72, 72, 81, 81, 92, 92,
- 106, 106, 121, 121, 136, 136, 144, 82, 75, 75, 72, 72, 81, 81, 92, 92,
- 106, 106, 121, 121, 136, 136, 144, 87, 79, 79, 76, 76, 84, 84, 96, 96,
- 109, 109, 124, 124, 141, 141, 149,
- /* Size 32x16 */
32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 44, 48,
48, 53, 53, 58, 58, 65, 65, 71, 71, 79, 79, 82, 82, 87, 31, 32, 32, 32,
32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 41, 45, 45, 49, 49, 54,
@@ -2651,33 +2620,48 @@
125, 125, 133, 133, 136, 136, 141, 87, 82, 82, 78, 78, 77, 77, 75, 75,
79, 79, 84, 84, 89, 89, 95, 95, 102, 102, 109, 109, 116, 116, 124, 124,
132, 132, 141, 141, 144, 144, 149,
+ /* Size 32x16 */
+ 32, 31, 31, 32, 32, 36, 36, 44, 44, 53, 53, 65, 65, 79, 79, 87, 31, 32,
+ 32, 32, 32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 32,
+ 32, 35, 35, 42, 42, 51, 51, 62, 62, 75, 75, 82, 31, 32, 32, 33, 33, 34,
+ 34, 41, 41, 49, 49, 59, 59, 72, 72, 78, 31, 32, 32, 33, 33, 34, 34, 41,
+ 41, 49, 49, 59, 59, 72, 72, 78, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50,
+ 50, 59, 59, 71, 71, 77, 32, 32, 32, 34, 34, 36, 36, 42, 42, 50, 50, 59,
+ 59, 71, 71, 77, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69,
+ 69, 75, 32, 33, 33, 35, 35, 38, 38, 42, 42, 49, 49, 58, 58, 69, 69, 75,
+ 34, 34, 34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 34, 34,
+ 34, 37, 37, 42, 42, 48, 48, 54, 54, 63, 63, 73, 73, 79, 36, 34, 34, 38,
+ 38, 48, 48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 36, 34, 34, 38, 38, 48,
+ 48, 54, 54, 60, 60, 68, 68, 78, 78, 84, 39, 37, 37, 40, 40, 50, 50, 58,
+ 58, 65, 65, 73, 73, 84, 84, 89, 39, 37, 37, 40, 40, 50, 50, 58, 58, 65,
+ 65, 73, 73, 84, 84, 89, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79,
+ 79, 90, 90, 95, 44, 41, 41, 43, 43, 53, 53, 63, 63, 71, 71, 79, 79, 90,
+ 90, 95, 48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102,
+ 48, 45, 45, 46, 46, 56, 56, 67, 67, 76, 76, 85, 85, 96, 96, 102, 53, 49,
+ 49, 50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 53, 49, 49,
+ 50, 50, 60, 60, 71, 71, 82, 82, 92, 92, 103, 103, 109, 58, 54, 54, 54,
+ 54, 63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 58, 54, 54, 54, 54,
+ 63, 63, 75, 75, 87, 87, 98, 98, 110, 110, 116, 65, 60, 60, 58, 58, 68,
+ 68, 79, 79, 92, 92, 105, 105, 118, 118, 124, 65, 60, 60, 58, 58, 68, 68,
+ 79, 79, 92, 92, 105, 105, 118, 118, 124, 71, 65, 65, 63, 63, 73, 73, 84,
+ 84, 97, 97, 111, 111, 125, 125, 132, 71, 65, 65, 63, 63, 73, 73, 84, 84,
+ 97, 97, 111, 111, 125, 125, 132, 79, 72, 72, 70, 70, 79, 79, 90, 90,
+ 104, 104, 118, 118, 133, 133, 141, 79, 72, 72, 70, 70, 79, 79, 90, 90,
+ 104, 104, 118, 118, 133, 133, 141, 82, 75, 75, 72, 72, 81, 81, 92, 92,
+ 106, 106, 121, 121, 136, 136, 144, 82, 75, 75, 72, 72, 81, 81, 92, 92,
+ 106, 106, 121, 121, 136, 136, 144, 87, 79, 79, 76, 76, 84, 84, 96, 96,
+ 109, 109, 124, 124, 141, 141, 149,
/* Size 4x16 */
- 31, 36, 53, 79, 32, 35, 51, 75, 32, 34, 49, 72, 32, 36, 50, 71, 33, 38,
- 49, 69, 34, 42, 54, 73, 34, 48, 60, 78, 37, 50, 65, 84, 41, 53, 71, 90,
- 45, 56, 76, 96, 49, 60, 82, 103, 54, 63, 87, 110, 60, 68, 92, 118, 65,
- 73, 97, 125, 72, 79, 104, 133, 75, 81, 106, 136,
- /* Size 16x4 */
31, 32, 32, 32, 33, 34, 34, 37, 41, 45, 49, 54, 60, 65, 72, 75, 36, 35,
34, 36, 38, 42, 48, 50, 53, 56, 60, 63, 68, 73, 79, 81, 53, 51, 49, 50,
49, 54, 60, 65, 71, 76, 82, 87, 92, 97, 104, 106, 79, 75, 72, 71, 69,
73, 78, 84, 90, 96, 103, 110, 118, 125, 133, 136,
+ /* Size 16x4 */
+ 31, 36, 53, 79, 32, 35, 51, 75, 32, 34, 49, 72, 32, 36, 50, 71, 33, 38,
+ 49, 69, 34, 42, 54, 73, 34, 48, 60, 78, 37, 50, 65, 84, 41, 53, 71, 90,
+ 45, 56, 76, 96, 49, 60, 82, 103, 54, 63, 87, 110, 60, 68, 92, 118, 65,
+ 73, 97, 125, 72, 79, 104, 133, 75, 81, 106, 136,
/* Size 8x32 */
- 32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
- 32, 35, 42, 51, 62, 75, 31, 32, 33, 34, 41, 49, 59, 72, 31, 32, 33, 34,
- 41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 32, 34, 36, 42, 50,
- 59, 71, 32, 33, 35, 38, 42, 49, 58, 69, 32, 33, 35, 38, 42, 49, 58, 69,
- 34, 34, 37, 42, 48, 54, 63, 73, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34,
- 38, 48, 54, 60, 68, 78, 36, 34, 38, 48, 54, 60, 68, 78, 39, 37, 40, 50,
- 58, 65, 73, 84, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71,
- 79, 90, 44, 41, 43, 53, 63, 71, 79, 90, 48, 45, 46, 56, 67, 76, 85, 96,
- 48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 53, 49,
- 50, 60, 71, 82, 92, 103, 58, 54, 54, 63, 75, 87, 98, 110, 58, 54, 54,
- 63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 65, 60, 58, 68,
- 79, 92, 105, 118, 71, 65, 63, 73, 84, 97, 111, 125, 71, 65, 63, 73, 84,
- 97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 79, 72, 70, 79, 90,
- 104, 118, 133, 82, 75, 72, 81, 92, 106, 121, 136, 82, 75, 72, 81, 92,
- 106, 121, 136, 87, 79, 76, 84, 96, 109, 124, 141,
- /* Size 32x8 */
32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 44, 48,
48, 53, 53, 58, 58, 65, 65, 71, 71, 79, 79, 82, 82, 87, 31, 32, 32, 32,
32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 41, 45, 45, 49, 49, 54,
@@ -2692,7 +2676,23 @@
59, 59, 58, 58, 63, 63, 68, 68, 73, 73, 79, 79, 85, 85, 92, 92, 98, 98,
105, 105, 111, 111, 118, 118, 121, 121, 124, 79, 75, 75, 72, 72, 71, 71,
69, 69, 73, 73, 78, 78, 84, 84, 90, 90, 96, 96, 103, 103, 110, 110, 118,
- 118, 125, 125, 133, 133, 136, 136, 141 },
+ 118, 125, 125, 133, 133, 136, 136, 141,
+ /* Size 32x8 */
+ 32, 31, 32, 36, 44, 53, 65, 79, 31, 32, 32, 35, 42, 51, 62, 75, 31, 32,
+ 32, 35, 42, 51, 62, 75, 31, 32, 33, 34, 41, 49, 59, 72, 31, 32, 33, 34,
+ 41, 49, 59, 72, 32, 32, 34, 36, 42, 50, 59, 71, 32, 32, 34, 36, 42, 50,
+ 59, 71, 32, 33, 35, 38, 42, 49, 58, 69, 32, 33, 35, 38, 42, 49, 58, 69,
+ 34, 34, 37, 42, 48, 54, 63, 73, 34, 34, 37, 42, 48, 54, 63, 73, 36, 34,
+ 38, 48, 54, 60, 68, 78, 36, 34, 38, 48, 54, 60, 68, 78, 39, 37, 40, 50,
+ 58, 65, 73, 84, 39, 37, 40, 50, 58, 65, 73, 84, 44, 41, 43, 53, 63, 71,
+ 79, 90, 44, 41, 43, 53, 63, 71, 79, 90, 48, 45, 46, 56, 67, 76, 85, 96,
+ 48, 45, 46, 56, 67, 76, 85, 96, 53, 49, 50, 60, 71, 82, 92, 103, 53, 49,
+ 50, 60, 71, 82, 92, 103, 58, 54, 54, 63, 75, 87, 98, 110, 58, 54, 54,
+ 63, 75, 87, 98, 110, 65, 60, 58, 68, 79, 92, 105, 118, 65, 60, 58, 68,
+ 79, 92, 105, 118, 71, 65, 63, 73, 84, 97, 111, 125, 71, 65, 63, 73, 84,
+ 97, 111, 125, 79, 72, 70, 79, 90, 104, 118, 133, 79, 72, 70, 79, 90,
+ 104, 118, 133, 82, 75, 72, 81, 92, 106, 121, 136, 82, 75, 72, 81, 92,
+ 106, 121, 136, 87, 79, 76, 84, 96, 109, 124, 141 },
{ /* Chroma */
/* Size 4x4 */
32, 46, 47, 57, 46, 53, 54, 60, 47, 54, 66, 75, 57, 60, 75, 89,
@@ -2776,21 +2776,12 @@
91, 93, 67, 63, 63, 60, 60, 59, 59, 57, 57, 60, 60, 62, 62, 66, 66, 69,
69, 72, 72, 76, 76, 80, 80, 84, 84, 88, 88, 92, 92, 93, 93, 95,
/* Size 4x8 */
- 31, 47, 50, 60, 36, 47, 47, 56, 43, 50, 50, 57, 46, 53, 57, 64, 46, 54,
- 64, 71, 50, 55, 68, 78, 54, 58, 72, 85, 59, 61, 75, 90,
- /* Size 8x4 */
31, 36, 43, 46, 46, 50, 54, 59, 47, 47, 50, 53, 54, 55, 58, 61, 50, 47,
50, 57, 64, 68, 72, 75, 60, 56, 57, 64, 71, 78, 85, 90,
+ /* Size 8x4 */
+ 31, 47, 50, 60, 36, 47, 47, 56, 43, 50, 50, 57, 46, 53, 57, 64, 46, 54,
+ 64, 71, 50, 55, 68, 78, 54, 58, 72, 85, 59, 61, 75, 90,
/* Size 8x16 */
- 32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 30, 32,
- 40, 46, 45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 37, 40, 47, 47,
- 45, 47, 50, 54, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46, 48, 53, 53, 54,
- 57, 60, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61, 64, 67,
- 50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 54, 50,
- 49, 55, 62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 60, 54, 52, 58,
- 65, 72, 79, 85, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59, 56, 61, 68, 75,
- 83, 90,
- /* Size 16x8 */
32, 31, 30, 33, 37, 42, 49, 48, 49, 50, 52, 54, 57, 60, 63, 64, 31, 31,
32, 36, 40, 43, 46, 46, 45, 46, 48, 50, 52, 54, 57, 59, 37, 38, 40, 43,
47, 47, 48, 47, 46, 46, 47, 49, 50, 52, 55, 56, 48, 47, 46, 47, 47, 50,
@@ -2799,37 +2790,16 @@
66, 68, 70, 72, 75, 75, 57, 54, 52, 51, 50, 53, 57, 60, 64, 67, 71, 73,
76, 79, 82, 83, 63, 60, 57, 56, 54, 57, 60, 64, 67, 71, 75, 78, 82, 85,
89, 90,
+ /* Size 16x8 */
+ 32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 30, 32,
+ 40, 46, 45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 37, 40, 47, 47,
+ 45, 47, 50, 54, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46, 48, 53, 53, 54,
+ 57, 60, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61, 64, 67,
+ 50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 54, 50,
+ 49, 55, 62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 60, 54, 52, 58,
+ 65, 72, 79, 85, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59, 56, 61, 68, 75,
+ 83, 90,
/* Size 16x32 */
- 32, 31, 31, 37, 37, 48, 48, 49, 49, 52, 52, 57, 57, 63, 63, 66, 31, 31,
- 31, 38, 38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 31, 31, 31, 38,
- 38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 30, 32, 32, 40, 40, 46,
- 46, 45, 45, 48, 48, 52, 52, 57, 57, 60, 30, 32, 32, 40, 40, 46, 46, 45,
- 45, 48, 48, 52, 52, 57, 57, 60, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47,
- 47, 51, 51, 56, 56, 59, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47, 47, 51,
- 51, 56, 56, 59, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54,
- 54, 57, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54, 54, 57,
- 42, 43, 43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 42, 43,
- 43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 49, 46, 46, 48,
- 48, 53, 53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 49, 46, 46, 48, 48, 53,
- 53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 48, 46, 46, 47, 47, 53, 53, 56,
- 56, 57, 57, 60, 60, 64, 64, 66, 48, 46, 46, 47, 47, 53, 53, 56, 56, 57,
- 57, 60, 60, 64, 64, 66, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64,
- 64, 67, 67, 69, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64, 64, 67,
- 67, 69, 50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73,
- 50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73, 52, 48,
- 48, 47, 47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 52, 48, 48, 47,
- 47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 54, 50, 50, 49, 49, 55,
- 55, 62, 62, 68, 68, 73, 73, 78, 78, 80, 54, 50, 50, 49, 49, 55, 55, 62,
- 62, 68, 68, 73, 73, 78, 78, 80, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70,
- 70, 76, 76, 82, 82, 84, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70, 70, 76,
- 76, 82, 82, 84, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85,
- 85, 88, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85, 85, 88,
- 63, 57, 57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 63, 57,
- 57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 64, 59, 59, 56,
- 56, 61, 61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 64, 59, 59, 56, 56, 61,
- 61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 66, 60, 60, 57, 57, 63, 63, 69,
- 69, 77, 77, 84, 84, 92, 92, 95,
- /* Size 32x16 */
32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 49, 50,
50, 52, 52, 54, 54, 57, 57, 60, 60, 63, 63, 64, 64, 66, 31, 31, 31, 32,
32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 46, 46, 48, 48, 50,
@@ -2859,33 +2829,47 @@
75, 78, 78, 82, 82, 85, 85, 89, 89, 90, 90, 92, 66, 63, 63, 60, 60, 59,
59, 57, 57, 60, 60, 62, 62, 66, 66, 69, 69, 73, 73, 77, 77, 80, 80, 84,
84, 88, 88, 92, 92, 93, 93, 95,
+ /* Size 32x16 */
+ 32, 31, 31, 37, 37, 48, 48, 49, 49, 52, 52, 57, 57, 63, 63, 66, 31, 31,
+ 31, 38, 38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 31, 31, 31, 38,
+ 38, 47, 47, 47, 47, 50, 50, 54, 54, 60, 60, 63, 30, 32, 32, 40, 40, 46,
+ 46, 45, 45, 48, 48, 52, 52, 57, 57, 60, 30, 32, 32, 40, 40, 46, 46, 45,
+ 45, 48, 48, 52, 52, 57, 57, 60, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47,
+ 47, 51, 51, 56, 56, 59, 33, 36, 36, 43, 43, 47, 47, 46, 46, 47, 47, 51,
+ 51, 56, 56, 59, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54,
+ 54, 57, 37, 40, 40, 47, 47, 47, 47, 45, 45, 47, 47, 50, 50, 54, 54, 57,
+ 42, 43, 43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 42, 43,
+ 43, 47, 47, 50, 50, 49, 49, 50, 50, 53, 53, 57, 57, 60, 49, 46, 46, 48,
+ 48, 53, 53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 49, 46, 46, 48, 48, 53,
+ 53, 53, 53, 54, 54, 57, 57, 60, 60, 62, 48, 46, 46, 47, 47, 53, 53, 56,
+ 56, 57, 57, 60, 60, 64, 64, 66, 48, 46, 46, 47, 47, 53, 53, 56, 56, 57,
+ 57, 60, 60, 64, 64, 66, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64,
+ 64, 67, 67, 69, 49, 45, 45, 46, 46, 53, 53, 58, 58, 61, 61, 64, 64, 67,
+ 67, 69, 50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73,
+ 50, 46, 46, 46, 46, 54, 54, 59, 59, 64, 64, 67, 67, 71, 71, 73, 52, 48,
+ 48, 47, 47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 52, 48, 48, 47,
+ 47, 54, 54, 61, 61, 66, 66, 71, 71, 75, 75, 77, 54, 50, 50, 49, 49, 55,
+ 55, 62, 62, 68, 68, 73, 73, 78, 78, 80, 54, 50, 50, 49, 49, 55, 55, 62,
+ 62, 68, 68, 73, 73, 78, 78, 80, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70,
+ 70, 76, 76, 82, 82, 84, 57, 52, 52, 50, 50, 56, 56, 64, 64, 70, 70, 76,
+ 76, 82, 82, 84, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85,
+ 85, 88, 60, 54, 54, 52, 52, 58, 58, 65, 65, 72, 72, 79, 79, 85, 85, 88,
+ 63, 57, 57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 63, 57,
+ 57, 55, 55, 60, 60, 67, 67, 75, 75, 82, 82, 89, 89, 92, 64, 59, 59, 56,
+ 56, 61, 61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 64, 59, 59, 56, 56, 61,
+ 61, 68, 68, 75, 75, 83, 83, 90, 90, 93, 66, 60, 60, 57, 57, 63, 63, 69,
+ 69, 77, 77, 84, 84, 92, 92, 95,
/* Size 4x16 */
- 31, 48, 52, 63, 31, 47, 50, 60, 32, 46, 48, 57, 36, 47, 47, 56, 40, 47,
- 47, 54, 43, 50, 50, 57, 46, 53, 54, 60, 46, 53, 57, 64, 45, 53, 61, 67,
- 46, 54, 64, 71, 48, 54, 66, 75, 50, 55, 68, 78, 52, 56, 70, 82, 54, 58,
- 72, 85, 57, 60, 75, 89, 59, 61, 75, 90,
- /* Size 16x4 */
31, 31, 32, 36, 40, 43, 46, 46, 45, 46, 48, 50, 52, 54, 57, 59, 48, 47,
46, 47, 47, 50, 53, 53, 53, 54, 54, 55, 56, 58, 60, 61, 52, 50, 48, 47,
47, 50, 54, 57, 61, 64, 66, 68, 70, 72, 75, 75, 63, 60, 57, 56, 54, 57,
60, 64, 67, 71, 75, 78, 82, 85, 89, 90,
+ /* Size 16x4 */
+ 31, 48, 52, 63, 31, 47, 50, 60, 32, 46, 48, 57, 36, 47, 47, 56, 40, 47,
+ 47, 54, 43, 50, 50, 57, 46, 53, 54, 60, 46, 53, 57, 64, 45, 53, 61, 67,
+ 46, 54, 64, 71, 48, 54, 66, 75, 50, 55, 68, 78, 52, 56, 70, 82, 54, 58,
+ 72, 85, 57, 60, 75, 89, 59, 61, 75, 90,
/* Size 8x32 */
- 32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 31, 31,
- 38, 47, 47, 50, 54, 60, 30, 32, 40, 46, 45, 48, 52, 57, 30, 32, 40, 46,
- 45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 33, 36, 43, 47, 46, 47,
- 51, 56, 37, 40, 47, 47, 45, 47, 50, 54, 37, 40, 47, 47, 45, 47, 50, 54,
- 42, 43, 47, 50, 49, 50, 53, 57, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46,
- 48, 53, 53, 54, 57, 60, 49, 46, 48, 53, 53, 54, 57, 60, 48, 46, 47, 53,
- 56, 57, 60, 64, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61,
- 64, 67, 49, 45, 46, 53, 58, 61, 64, 67, 50, 46, 46, 54, 59, 64, 67, 71,
- 50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 52, 48,
- 47, 54, 61, 66, 71, 75, 54, 50, 49, 55, 62, 68, 73, 78, 54, 50, 49, 55,
- 62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 57, 52, 50, 56, 64, 70,
- 76, 82, 60, 54, 52, 58, 65, 72, 79, 85, 60, 54, 52, 58, 65, 72, 79, 85,
- 63, 57, 55, 60, 67, 75, 82, 89, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59,
- 56, 61, 68, 75, 83, 90, 64, 59, 56, 61, 68, 75, 83, 90, 66, 60, 57, 63,
- 69, 77, 84, 92,
- /* Size 32x8 */
32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 49, 50,
50, 52, 52, 54, 54, 57, 57, 60, 60, 63, 63, 64, 64, 66, 31, 31, 31, 32,
32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 46, 46, 48, 48, 50,
@@ -2900,7 +2884,23 @@
51, 50, 50, 53, 53, 57, 57, 60, 60, 64, 64, 67, 67, 71, 71, 73, 73, 76,
76, 79, 79, 82, 82, 83, 83, 84, 63, 60, 60, 57, 57, 56, 56, 54, 54, 57,
57, 60, 60, 64, 64, 67, 67, 71, 71, 75, 75, 78, 78, 82, 82, 85, 85, 89,
- 89, 90, 90, 92 },
+ 89, 90, 90, 92,
+ /* Size 32x8 */
+ 32, 31, 37, 48, 49, 52, 57, 63, 31, 31, 38, 47, 47, 50, 54, 60, 31, 31,
+ 38, 47, 47, 50, 54, 60, 30, 32, 40, 46, 45, 48, 52, 57, 30, 32, 40, 46,
+ 45, 48, 52, 57, 33, 36, 43, 47, 46, 47, 51, 56, 33, 36, 43, 47, 46, 47,
+ 51, 56, 37, 40, 47, 47, 45, 47, 50, 54, 37, 40, 47, 47, 45, 47, 50, 54,
+ 42, 43, 47, 50, 49, 50, 53, 57, 42, 43, 47, 50, 49, 50, 53, 57, 49, 46,
+ 48, 53, 53, 54, 57, 60, 49, 46, 48, 53, 53, 54, 57, 60, 48, 46, 47, 53,
+ 56, 57, 60, 64, 48, 46, 47, 53, 56, 57, 60, 64, 49, 45, 46, 53, 58, 61,
+ 64, 67, 49, 45, 46, 53, 58, 61, 64, 67, 50, 46, 46, 54, 59, 64, 67, 71,
+ 50, 46, 46, 54, 59, 64, 67, 71, 52, 48, 47, 54, 61, 66, 71, 75, 52, 48,
+ 47, 54, 61, 66, 71, 75, 54, 50, 49, 55, 62, 68, 73, 78, 54, 50, 49, 55,
+ 62, 68, 73, 78, 57, 52, 50, 56, 64, 70, 76, 82, 57, 52, 50, 56, 64, 70,
+ 76, 82, 60, 54, 52, 58, 65, 72, 79, 85, 60, 54, 52, 58, 65, 72, 79, 85,
+ 63, 57, 55, 60, 67, 75, 82, 89, 63, 57, 55, 60, 67, 75, 82, 89, 64, 59,
+ 56, 61, 68, 75, 83, 90, 64, 59, 56, 61, 68, 75, 83, 90, 66, 60, 57, 63,
+ 69, 77, 84, 92 },
},
{
{ /* Luma */
@@ -2988,21 +2988,12 @@
84, 86, 90, 91, 96, 96, 103, 104, 108, 110, 114, 118, 120, 125, 126,
134, 134,
/* Size 4x8 */
- 32, 34, 43, 62, 32, 34, 42, 59, 33, 37, 44, 58, 35, 43, 54, 68, 41, 48,
- 64, 79, 49, 54, 71, 91, 57, 60, 78, 101, 66, 68, 86, 111,
- /* Size 8x4 */
32, 32, 33, 35, 41, 49, 57, 66, 34, 34, 37, 43, 48, 54, 60, 68, 43, 42,
44, 54, 64, 71, 78, 86, 62, 59, 58, 68, 79, 91, 101, 111,
+ /* Size 8x4 */
+ 32, 34, 43, 62, 32, 34, 42, 59, 33, 37, 44, 58, 35, 43, 54, 68, 41, 48,
+ 64, 79, 49, 54, 71, 91, 57, 60, 78, 101, 66, 68, 86, 111,
/* Size 8x16 */
- 32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 59, 69, 31, 32,
- 33, 34, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50, 57, 65, 32, 33, 35, 38,
- 42, 49, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59,
- 65, 73, 38, 36, 40, 49, 56, 63, 69, 77, 41, 39, 41, 51, 60, 67, 74, 81,
- 44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 53, 49,
- 50, 60, 71, 82, 90, 99, 58, 54, 54, 63, 75, 87, 95, 105, 65, 60, 58, 68,
- 79, 92, 102, 112, 71, 65, 63, 73, 84, 97, 108, 119, 79, 72, 70, 79, 90,
- 104, 115, 127,
- /* Size 16x8 */
32, 31, 31, 32, 32, 34, 35, 38, 41, 44, 48, 53, 58, 65, 71, 79, 31, 32,
32, 32, 33, 34, 34, 36, 39, 42, 45, 49, 54, 60, 65, 72, 32, 32, 33, 34,
35, 37, 38, 40, 41, 43, 46, 50, 54, 58, 63, 70, 36, 35, 34, 36, 38, 42,
@@ -3011,37 +3002,16 @@
76, 82, 87, 92, 97, 104, 62, 59, 57, 57, 56, 61, 65, 69, 74, 79, 83, 90,
95, 102, 108, 115, 73, 69, 66, 65, 64, 69, 73, 77, 81, 86, 91, 99, 105,
112, 119, 127,
+ /* Size 16x8 */
+ 32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 59, 69, 31, 32,
+ 33, 34, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50, 57, 65, 32, 33, 35, 38,
+ 42, 49, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59,
+ 65, 73, 38, 36, 40, 49, 56, 63, 69, 77, 41, 39, 41, 51, 60, 67, 74, 81,
+ 44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 53, 49,
+ 50, 60, 71, 82, 90, 99, 58, 54, 54, 63, 75, 87, 95, 105, 65, 60, 58, 68,
+ 79, 92, 102, 112, 71, 65, 63, 73, 84, 97, 108, 119, 79, 72, 70, 79, 90,
+ 104, 115, 127,
/* Size 16x32 */
- 32, 31, 31, 32, 32, 34, 36, 38, 44, 44, 53, 53, 62, 65, 73, 79, 31, 32,
- 32, 32, 32, 34, 35, 37, 42, 43, 51, 51, 60, 62, 70, 75, 31, 32, 32, 32,
- 32, 34, 35, 37, 42, 43, 51, 51, 59, 62, 69, 75, 31, 32, 32, 32, 32, 33,
- 35, 36, 41, 42, 50, 50, 58, 60, 67, 73, 31, 32, 32, 32, 33, 33, 34, 36,
- 41, 41, 49, 49, 57, 59, 66, 72, 31, 32, 32, 33, 33, 34, 35, 37, 41, 42,
- 49, 49, 57, 59, 66, 71, 32, 32, 32, 33, 34, 35, 36, 38, 42, 43, 50, 50,
- 57, 59, 65, 71, 32, 32, 32, 34, 34, 35, 37, 38, 42, 43, 49, 49, 56, 59,
- 65, 70, 32, 32, 33, 34, 35, 37, 38, 39, 42, 43, 49, 49, 56, 58, 64, 69,
- 32, 33, 33, 34, 35, 37, 39, 40, 43, 44, 50, 50, 56, 58, 64, 69, 34, 34,
- 34, 36, 37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 34, 34, 34, 36,
- 37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 35, 34, 34, 37, 38, 42,
- 47, 48, 52, 53, 59, 59, 65, 67, 73, 77, 36, 35, 34, 37, 38, 43, 48, 49,
- 54, 54, 60, 60, 66, 68, 74, 78, 38, 36, 36, 38, 40, 44, 49, 51, 56, 57,
- 63, 63, 69, 71, 77, 81, 39, 38, 37, 40, 40, 45, 50, 52, 58, 58, 65, 65,
- 71, 73, 79, 84, 41, 39, 39, 41, 41, 46, 51, 54, 60, 60, 67, 67, 74, 76,
- 81, 86, 44, 41, 41, 42, 43, 48, 53, 56, 63, 64, 71, 71, 78, 79, 85, 90,
- 44, 42, 42, 43, 43, 48, 54, 56, 64, 64, 72, 72, 79, 81, 86, 91, 48, 45,
- 45, 46, 46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 48, 45, 45, 46,
- 46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 53, 49, 49, 49, 49, 54,
- 59, 62, 71, 71, 81, 81, 89, 91, 98, 103, 53, 50, 49, 50, 50, 54, 60, 63,
- 71, 72, 82, 82, 90, 92, 99, 103, 57, 53, 52, 52, 52, 57, 62, 65, 74, 75,
- 85, 85, 94, 96, 103, 108, 58, 54, 54, 54, 54, 58, 63, 67, 75, 76, 87,
- 87, 95, 98, 105, 110, 61, 57, 57, 56, 56, 60, 66, 69, 77, 78, 89, 89,
- 98, 101, 108, 114, 65, 60, 60, 59, 58, 63, 68, 71, 79, 80, 92, 92, 102,
- 105, 112, 118, 67, 62, 61, 60, 60, 64, 69, 72, 81, 82, 94, 94, 103, 106,
- 114, 120, 71, 66, 65, 64, 63, 68, 73, 76, 84, 85, 97, 97, 108, 111, 119,
- 125, 72, 66, 66, 64, 64, 68, 73, 76, 85, 86, 98, 98, 108, 111, 119, 125,
- 79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
- 79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 35, 36, 38, 39, 41, 44,
44, 48, 48, 53, 53, 57, 58, 61, 65, 67, 71, 72, 79, 79, 31, 32, 32, 32,
32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 36, 38, 39, 41, 42, 45, 45, 49,
@@ -3072,33 +3042,47 @@
127, 127, 79, 75, 75, 73, 72, 71, 71, 70, 69, 69, 73, 73, 77, 78, 81,
84, 86, 90, 91, 96, 96, 103, 103, 108, 110, 114, 118, 120, 125, 125,
133, 133,
+ /* Size 32x16 */
+ 32, 31, 31, 32, 32, 34, 36, 38, 44, 44, 53, 53, 62, 65, 73, 79, 31, 32,
+ 32, 32, 32, 34, 35, 37, 42, 43, 51, 51, 60, 62, 70, 75, 31, 32, 32, 32,
+ 32, 34, 35, 37, 42, 43, 51, 51, 59, 62, 69, 75, 31, 32, 32, 32, 32, 33,
+ 35, 36, 41, 42, 50, 50, 58, 60, 67, 73, 31, 32, 32, 32, 33, 33, 34, 36,
+ 41, 41, 49, 49, 57, 59, 66, 72, 31, 32, 32, 33, 33, 34, 35, 37, 41, 42,
+ 49, 49, 57, 59, 66, 71, 32, 32, 32, 33, 34, 35, 36, 38, 42, 43, 50, 50,
+ 57, 59, 65, 71, 32, 32, 32, 34, 34, 35, 37, 38, 42, 43, 49, 49, 56, 59,
+ 65, 70, 32, 32, 33, 34, 35, 37, 38, 39, 42, 43, 49, 49, 56, 58, 64, 69,
+ 32, 33, 33, 34, 35, 37, 39, 40, 43, 44, 50, 50, 56, 58, 64, 69, 34, 34,
+ 34, 36, 37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 34, 34, 34, 36,
+ 37, 39, 42, 44, 48, 48, 54, 54, 61, 63, 69, 73, 35, 34, 34, 37, 38, 42,
+ 47, 48, 52, 53, 59, 59, 65, 67, 73, 77, 36, 35, 34, 37, 38, 43, 48, 49,
+ 54, 54, 60, 60, 66, 68, 74, 78, 38, 36, 36, 38, 40, 44, 49, 51, 56, 57,
+ 63, 63, 69, 71, 77, 81, 39, 38, 37, 40, 40, 45, 50, 52, 58, 58, 65, 65,
+ 71, 73, 79, 84, 41, 39, 39, 41, 41, 46, 51, 54, 60, 60, 67, 67, 74, 76,
+ 81, 86, 44, 41, 41, 42, 43, 48, 53, 56, 63, 64, 71, 71, 78, 79, 85, 90,
+ 44, 42, 42, 43, 43, 48, 54, 56, 64, 64, 72, 72, 79, 81, 86, 91, 48, 45,
+ 45, 46, 46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 48, 45, 45, 46,
+ 46, 51, 56, 59, 67, 67, 76, 76, 83, 85, 91, 96, 53, 49, 49, 49, 49, 54,
+ 59, 62, 71, 71, 81, 81, 89, 91, 98, 103, 53, 50, 49, 50, 50, 54, 60, 63,
+ 71, 72, 82, 82, 90, 92, 99, 103, 57, 53, 52, 52, 52, 57, 62, 65, 74, 75,
+ 85, 85, 94, 96, 103, 108, 58, 54, 54, 54, 54, 58, 63, 67, 75, 76, 87,
+ 87, 95, 98, 105, 110, 61, 57, 57, 56, 56, 60, 66, 69, 77, 78, 89, 89,
+ 98, 101, 108, 114, 65, 60, 60, 59, 58, 63, 68, 71, 79, 80, 92, 92, 102,
+ 105, 112, 118, 67, 62, 61, 60, 60, 64, 69, 72, 81, 82, 94, 94, 103, 106,
+ 114, 120, 71, 66, 65, 64, 63, 68, 73, 76, 84, 85, 97, 97, 108, 111, 119,
+ 125, 72, 66, 66, 64, 64, 68, 73, 76, 85, 86, 98, 98, 108, 111, 119, 125,
+ 79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
+ 79, 73, 72, 71, 70, 74, 79, 82, 90, 91, 104, 104, 115, 118, 127, 133,
/* Size 4x16 */
- 31, 34, 44, 65, 32, 34, 43, 62, 32, 33, 41, 59, 32, 35, 43, 59, 32, 37,
- 43, 58, 34, 39, 48, 63, 34, 42, 53, 67, 36, 44, 57, 71, 39, 46, 60, 76,
- 42, 48, 64, 81, 45, 51, 67, 85, 50, 54, 72, 92, 54, 58, 76, 98, 60, 63,
- 80, 105, 66, 68, 85, 111, 73, 74, 91, 118,
- /* Size 16x4 */
31, 32, 32, 32, 32, 34, 34, 36, 39, 42, 45, 50, 54, 60, 66, 73, 34, 34,
33, 35, 37, 39, 42, 44, 46, 48, 51, 54, 58, 63, 68, 74, 44, 43, 41, 43,
43, 48, 53, 57, 60, 64, 67, 72, 76, 80, 85, 91, 65, 62, 59, 59, 58, 63,
67, 71, 76, 81, 85, 92, 98, 105, 111, 118,
+ /* Size 16x4 */
+ 31, 34, 44, 65, 32, 34, 43, 62, 32, 33, 41, 59, 32, 35, 43, 59, 32, 37,
+ 43, 58, 34, 39, 48, 63, 34, 42, 53, 67, 36, 44, 57, 71, 39, 46, 60, 76,
+ 42, 48, 64, 81, 45, 51, 67, 85, 50, 54, 72, 92, 54, 58, 76, 98, 60, 63,
+ 80, 105, 66, 68, 85, 111, 73, 74, 91, 118,
/* Size 8x32 */
- 32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 60, 70, 31, 32,
- 32, 35, 42, 51, 59, 69, 31, 32, 32, 35, 41, 50, 58, 67, 31, 32, 33, 34,
- 41, 49, 57, 66, 31, 32, 33, 35, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50,
- 57, 65, 32, 32, 34, 37, 42, 49, 56, 65, 32, 33, 35, 38, 42, 49, 56, 64,
- 32, 33, 35, 39, 43, 50, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 34, 34,
- 37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59, 65, 73, 36, 34, 38, 48,
- 54, 60, 66, 74, 38, 36, 40, 49, 56, 63, 69, 77, 39, 37, 40, 50, 58, 65,
- 71, 79, 41, 39, 41, 51, 60, 67, 74, 81, 44, 41, 43, 53, 63, 71, 78, 85,
- 44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 48, 45,
- 46, 56, 67, 76, 83, 91, 53, 49, 49, 59, 71, 81, 89, 98, 53, 49, 50, 60,
- 71, 82, 90, 99, 57, 52, 52, 62, 74, 85, 94, 103, 58, 54, 54, 63, 75, 87,
- 95, 105, 61, 57, 56, 66, 77, 89, 98, 108, 65, 60, 58, 68, 79, 92, 102,
- 112, 67, 61, 60, 69, 81, 94, 103, 114, 71, 65, 63, 73, 84, 97, 108, 119,
- 72, 66, 64, 73, 85, 98, 108, 119, 79, 72, 70, 79, 90, 104, 115, 127, 79,
- 72, 70, 79, 90, 104, 115, 127,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 35, 36, 38, 39, 41, 44,
44, 48, 48, 53, 53, 57, 58, 61, 65, 67, 71, 72, 79, 79, 31, 32, 32, 32,
32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 36, 37, 39, 41, 42, 45, 45, 49,
@@ -3113,7 +3097,23 @@
57, 57, 56, 56, 56, 61, 61, 65, 66, 69, 71, 74, 78, 79, 83, 83, 89, 90,
94, 95, 98, 102, 103, 108, 108, 115, 115, 73, 70, 69, 67, 66, 66, 65,
65, 64, 64, 69, 69, 73, 74, 77, 79, 81, 85, 86, 91, 91, 98, 99, 103,
- 105, 108, 112, 114, 119, 119, 127, 127 },
+ 105, 108, 112, 114, 119, 119, 127, 127,
+ /* Size 32x8 */
+ 32, 31, 32, 36, 44, 53, 62, 73, 31, 32, 32, 35, 42, 51, 60, 70, 31, 32,
+ 32, 35, 42, 51, 59, 69, 31, 32, 32, 35, 41, 50, 58, 67, 31, 32, 33, 34,
+ 41, 49, 57, 66, 31, 32, 33, 35, 41, 49, 57, 66, 32, 32, 34, 36, 42, 50,
+ 57, 65, 32, 32, 34, 37, 42, 49, 56, 65, 32, 33, 35, 38, 42, 49, 56, 64,
+ 32, 33, 35, 39, 43, 50, 56, 64, 34, 34, 37, 42, 48, 54, 61, 69, 34, 34,
+ 37, 42, 48, 54, 61, 69, 35, 34, 38, 47, 52, 59, 65, 73, 36, 34, 38, 48,
+ 54, 60, 66, 74, 38, 36, 40, 49, 56, 63, 69, 77, 39, 37, 40, 50, 58, 65,
+ 71, 79, 41, 39, 41, 51, 60, 67, 74, 81, 44, 41, 43, 53, 63, 71, 78, 85,
+ 44, 42, 43, 54, 64, 72, 79, 86, 48, 45, 46, 56, 67, 76, 83, 91, 48, 45,
+ 46, 56, 67, 76, 83, 91, 53, 49, 49, 59, 71, 81, 89, 98, 53, 49, 50, 60,
+ 71, 82, 90, 99, 57, 52, 52, 62, 74, 85, 94, 103, 58, 54, 54, 63, 75, 87,
+ 95, 105, 61, 57, 56, 66, 77, 89, 98, 108, 65, 60, 58, 68, 79, 92, 102,
+ 112, 67, 61, 60, 69, 81, 94, 103, 114, 71, 65, 63, 73, 84, 97, 108, 119,
+ 72, 66, 64, 73, 85, 98, 108, 119, 79, 72, 70, 79, 90, 104, 115, 127, 79,
+ 72, 70, 79, 90, 104, 115, 127 },
{ /* Chroma */
/* Size 4x4 */
31, 42, 47, 53, 42, 48, 50, 54, 47, 50, 61, 67, 53, 54, 67, 78,
@@ -3197,21 +3197,12 @@
89, 89, 63, 60, 60, 58, 57, 57, 56, 55, 54, 55, 57, 57, 60, 60, 62, 63,
65, 67, 68, 71, 71, 74, 75, 77, 78, 80, 82, 83, 85, 85, 89, 89,
/* Size 4x8 */
- 31, 42, 47, 54, 33, 44, 45, 51, 40, 47, 46, 50, 47, 50, 54, 57, 45, 49,
- 59, 64, 48, 50, 61, 70, 51, 52, 63, 75, 55, 55, 66, 79,
- /* Size 8x4 */
31, 33, 40, 47, 45, 48, 51, 55, 42, 44, 47, 50, 49, 50, 52, 55, 47, 45,
46, 54, 59, 61, 63, 66, 54, 51, 50, 57, 64, 70, 75, 79,
+ /* Size 8x4 */
+ 31, 42, 47, 54, 33, 44, 45, 51, 40, 47, 46, 50, 47, 50, 54, 57, 45, 49,
+ 59, 64, 48, 50, 61, 70, 51, 52, 63, 75, 55, 55, 66, 79,
/* Size 8x16 */
- 32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 53, 57, 30, 32,
- 40, 46, 45, 48, 51, 55, 33, 36, 43, 47, 46, 47, 50, 54, 37, 40, 47, 47,
- 45, 47, 49, 52, 42, 43, 47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53,
- 55, 58, 48, 46, 47, 53, 55, 56, 58, 61, 48, 45, 46, 53, 57, 59, 61, 63,
- 49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 52, 48,
- 47, 54, 61, 66, 70, 73, 54, 50, 49, 55, 62, 68, 72, 76, 57, 52, 50, 56,
- 64, 70, 75, 79, 60, 54, 52, 58, 65, 72, 77, 82, 63, 57, 55, 60, 67, 75,
- 80, 86,
- /* Size 16x8 */
32, 31, 30, 33, 37, 42, 47, 48, 48, 49, 50, 52, 54, 57, 60, 63, 31, 31,
32, 36, 40, 43, 46, 46, 45, 45, 46, 48, 50, 52, 54, 57, 37, 38, 40, 43,
47, 47, 48, 47, 46, 46, 46, 47, 49, 50, 52, 55, 48, 47, 46, 47, 47, 50,
@@ -3220,37 +3211,16 @@
64, 66, 68, 70, 72, 75, 56, 53, 51, 50, 49, 53, 55, 58, 61, 64, 66, 70,
72, 75, 77, 80, 61, 57, 55, 54, 52, 56, 58, 61, 63, 66, 69, 73, 76, 79,
82, 86,
+ /* Size 16x8 */
+ 32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 53, 57, 30, 32,
+ 40, 46, 45, 48, 51, 55, 33, 36, 43, 47, 46, 47, 50, 54, 37, 40, 47, 47,
+ 45, 47, 49, 52, 42, 43, 47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53,
+ 55, 58, 48, 46, 47, 53, 55, 56, 58, 61, 48, 45, 46, 53, 57, 59, 61, 63,
+ 49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 52, 48,
+ 47, 54, 61, 66, 70, 73, 54, 50, 49, 55, 62, 68, 72, 76, 57, 52, 50, 56,
+ 64, 70, 75, 79, 60, 54, 52, 58, 65, 72, 77, 82, 63, 57, 55, 60, 67, 75,
+ 80, 86,
/* Size 16x32 */
- 32, 31, 31, 35, 37, 42, 48, 48, 49, 49, 52, 52, 56, 57, 61, 63, 31, 31,
- 31, 36, 38, 42, 47, 47, 47, 47, 50, 50, 54, 54, 58, 60, 31, 31, 31, 36,
- 38, 42, 47, 47, 47, 47, 50, 50, 53, 54, 57, 60, 30, 32, 32, 37, 39, 42,
- 46, 46, 46, 46, 48, 48, 52, 52, 56, 58, 30, 32, 32, 37, 40, 42, 46, 46,
- 45, 45, 48, 48, 51, 52, 55, 57, 32, 33, 34, 39, 41, 44, 46, 46, 45, 45,
- 48, 48, 51, 51, 54, 57, 33, 35, 36, 40, 43, 45, 47, 46, 46, 46, 47, 47,
- 50, 51, 54, 56, 34, 37, 37, 42, 44, 45, 47, 47, 45, 46, 47, 47, 50, 51,
- 53, 55, 37, 40, 40, 45, 47, 47, 47, 47, 45, 46, 47, 47, 49, 50, 52, 54,
- 37, 40, 40, 45, 47, 47, 48, 47, 46, 46, 47, 47, 49, 50, 53, 55, 42, 43,
- 43, 46, 47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 42, 43, 43, 46,
- 47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 47, 46, 46, 47, 48, 50,
- 52, 52, 53, 53, 53, 53, 55, 56, 58, 60, 49, 47, 46, 47, 48, 50, 53, 53,
- 53, 54, 54, 54, 56, 57, 59, 60, 48, 46, 46, 47, 47, 50, 53, 53, 55, 55,
- 56, 56, 58, 58, 61, 62, 48, 46, 46, 46, 47, 50, 53, 54, 56, 56, 57, 57,
- 59, 60, 62, 64, 48, 46, 45, 46, 46, 49, 53, 54, 57, 57, 59, 59, 61, 61,
- 63, 65, 49, 45, 45, 45, 46, 49, 53, 55, 58, 59, 61, 61, 63, 64, 66, 67,
- 49, 46, 45, 46, 46, 49, 53, 55, 58, 59, 62, 62, 64, 64, 66, 68, 50, 47,
- 46, 46, 46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 50, 47, 46, 46,
- 46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 52, 48, 48, 47, 47, 50,
- 54, 56, 61, 61, 66, 66, 69, 70, 72, 74, 52, 48, 48, 47, 47, 50, 54, 56,
- 61, 61, 66, 66, 70, 71, 73, 75, 53, 50, 49, 48, 48, 51, 55, 57, 62, 62,
- 68, 68, 71, 72, 75, 77, 54, 50, 50, 49, 49, 52, 55, 57, 62, 63, 68, 68,
- 72, 73, 76, 78, 55, 51, 51, 50, 49, 52, 56, 58, 63, 63, 69, 69, 74, 75,
- 78, 80, 57, 52, 52, 51, 50, 53, 56, 58, 64, 64, 70, 70, 75, 76, 79, 82,
- 58, 53, 53, 51, 51, 54, 57, 59, 64, 65, 71, 71, 76, 77, 80, 83, 60, 55,
- 54, 53, 52, 55, 58, 60, 65, 66, 72, 72, 77, 79, 82, 85, 60, 55, 55, 53,
- 53, 55, 59, 60, 65, 66, 73, 73, 78, 79, 83, 85, 63, 58, 57, 56, 55, 58,
- 60, 62, 67, 68, 75, 75, 80, 82, 86, 89, 63, 58, 57, 56, 55, 58, 60, 62,
- 67, 68, 75, 75, 80, 82, 86, 89,
- /* Size 32x16 */
32, 31, 31, 30, 30, 32, 33, 34, 37, 37, 42, 42, 47, 49, 48, 48, 48, 49,
49, 50, 50, 52, 52, 53, 54, 55, 57, 58, 60, 60, 63, 63, 31, 31, 31, 32,
32, 33, 35, 37, 40, 40, 43, 43, 46, 47, 46, 46, 46, 45, 46, 47, 47, 48,
@@ -3280,33 +3250,47 @@
69, 72, 73, 75, 76, 78, 79, 80, 82, 83, 86, 86, 63, 60, 60, 58, 57, 57,
56, 55, 54, 55, 57, 57, 60, 60, 62, 64, 65, 67, 68, 71, 71, 74, 75, 77,
78, 80, 82, 83, 85, 85, 89, 89,
+ /* Size 32x16 */
+ 32, 31, 31, 35, 37, 42, 48, 48, 49, 49, 52, 52, 56, 57, 61, 63, 31, 31,
+ 31, 36, 38, 42, 47, 47, 47, 47, 50, 50, 54, 54, 58, 60, 31, 31, 31, 36,
+ 38, 42, 47, 47, 47, 47, 50, 50, 53, 54, 57, 60, 30, 32, 32, 37, 39, 42,
+ 46, 46, 46, 46, 48, 48, 52, 52, 56, 58, 30, 32, 32, 37, 40, 42, 46, 46,
+ 45, 45, 48, 48, 51, 52, 55, 57, 32, 33, 34, 39, 41, 44, 46, 46, 45, 45,
+ 48, 48, 51, 51, 54, 57, 33, 35, 36, 40, 43, 45, 47, 46, 46, 46, 47, 47,
+ 50, 51, 54, 56, 34, 37, 37, 42, 44, 45, 47, 47, 45, 46, 47, 47, 50, 51,
+ 53, 55, 37, 40, 40, 45, 47, 47, 47, 47, 45, 46, 47, 47, 49, 50, 52, 54,
+ 37, 40, 40, 45, 47, 47, 48, 47, 46, 46, 47, 47, 49, 50, 53, 55, 42, 43,
+ 43, 46, 47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 42, 43, 43, 46,
+ 47, 48, 50, 50, 49, 49, 50, 50, 53, 53, 56, 57, 47, 46, 46, 47, 48, 50,
+ 52, 52, 53, 53, 53, 53, 55, 56, 58, 60, 49, 47, 46, 47, 48, 50, 53, 53,
+ 53, 54, 54, 54, 56, 57, 59, 60, 48, 46, 46, 47, 47, 50, 53, 53, 55, 55,
+ 56, 56, 58, 58, 61, 62, 48, 46, 46, 46, 47, 50, 53, 54, 56, 56, 57, 57,
+ 59, 60, 62, 64, 48, 46, 45, 46, 46, 49, 53, 54, 57, 57, 59, 59, 61, 61,
+ 63, 65, 49, 45, 45, 45, 46, 49, 53, 55, 58, 59, 61, 61, 63, 64, 66, 67,
+ 49, 46, 45, 46, 46, 49, 53, 55, 58, 59, 62, 62, 64, 64, 66, 68, 50, 47,
+ 46, 46, 46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 50, 47, 46, 46,
+ 46, 50, 54, 55, 59, 60, 64, 64, 66, 67, 69, 71, 52, 48, 48, 47, 47, 50,
+ 54, 56, 61, 61, 66, 66, 69, 70, 72, 74, 52, 48, 48, 47, 47, 50, 54, 56,
+ 61, 61, 66, 66, 70, 71, 73, 75, 53, 50, 49, 48, 48, 51, 55, 57, 62, 62,
+ 68, 68, 71, 72, 75, 77, 54, 50, 50, 49, 49, 52, 55, 57, 62, 63, 68, 68,
+ 72, 73, 76, 78, 55, 51, 51, 50, 49, 52, 56, 58, 63, 63, 69, 69, 74, 75,
+ 78, 80, 57, 52, 52, 51, 50, 53, 56, 58, 64, 64, 70, 70, 75, 76, 79, 82,
+ 58, 53, 53, 51, 51, 54, 57, 59, 64, 65, 71, 71, 76, 77, 80, 83, 60, 55,
+ 54, 53, 52, 55, 58, 60, 65, 66, 72, 72, 77, 79, 82, 85, 60, 55, 55, 53,
+ 53, 55, 59, 60, 65, 66, 73, 73, 78, 79, 83, 85, 63, 58, 57, 56, 55, 58,
+ 60, 62, 67, 68, 75, 75, 80, 82, 86, 89, 63, 58, 57, 56, 55, 58, 60, 62,
+ 67, 68, 75, 75, 80, 82, 86, 89,
/* Size 4x16 */
- 31, 42, 49, 57, 31, 42, 47, 54, 32, 42, 45, 52, 35, 45, 46, 51, 40, 47,
- 46, 50, 43, 48, 49, 53, 46, 50, 53, 56, 46, 50, 55, 58, 46, 49, 57, 61,
- 46, 49, 59, 64, 47, 50, 60, 67, 48, 50, 61, 71, 50, 52, 63, 73, 52, 53,
- 64, 76, 55, 55, 66, 79, 58, 58, 68, 82,
- /* Size 16x4 */
31, 31, 32, 35, 40, 43, 46, 46, 46, 46, 47, 48, 50, 52, 55, 58, 42, 42,
42, 45, 47, 48, 50, 50, 49, 49, 50, 50, 52, 53, 55, 58, 49, 47, 45, 46,
46, 49, 53, 55, 57, 59, 60, 61, 63, 64, 66, 68, 57, 54, 52, 51, 50, 53,
56, 58, 61, 64, 67, 71, 73, 76, 79, 82,
+ /* Size 16x4 */
+ 31, 42, 49, 57, 31, 42, 47, 54, 32, 42, 45, 52, 35, 45, 46, 51, 40, 47,
+ 46, 50, 43, 48, 49, 53, 46, 50, 53, 56, 46, 50, 55, 58, 46, 49, 57, 61,
+ 46, 49, 59, 64, 47, 50, 60, 67, 48, 50, 61, 71, 50, 52, 63, 73, 52, 53,
+ 64, 76, 55, 55, 66, 79, 58, 58, 68, 82,
/* Size 8x32 */
- 32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 54, 58, 31, 31,
- 38, 47, 47, 50, 53, 57, 30, 32, 39, 46, 46, 48, 52, 56, 30, 32, 40, 46,
- 45, 48, 51, 55, 32, 34, 41, 46, 45, 48, 51, 54, 33, 36, 43, 47, 46, 47,
- 50, 54, 34, 37, 44, 47, 45, 47, 50, 53, 37, 40, 47, 47, 45, 47, 49, 52,
- 37, 40, 47, 48, 46, 47, 49, 53, 42, 43, 47, 50, 49, 50, 53, 56, 42, 43,
- 47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53, 55, 58, 49, 46, 48, 53,
- 53, 54, 56, 59, 48, 46, 47, 53, 55, 56, 58, 61, 48, 46, 47, 53, 56, 57,
- 59, 62, 48, 45, 46, 53, 57, 59, 61, 63, 49, 45, 46, 53, 58, 61, 63, 66,
- 49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 50, 46,
- 46, 54, 59, 64, 66, 69, 52, 48, 47, 54, 61, 66, 69, 72, 52, 48, 47, 54,
- 61, 66, 70, 73, 53, 49, 48, 55, 62, 68, 71, 75, 54, 50, 49, 55, 62, 68,
- 72, 76, 55, 51, 49, 56, 63, 69, 74, 78, 57, 52, 50, 56, 64, 70, 75, 79,
- 58, 53, 51, 57, 64, 71, 76, 80, 60, 54, 52, 58, 65, 72, 77, 82, 60, 55,
- 53, 59, 65, 73, 78, 83, 63, 57, 55, 60, 67, 75, 80, 86, 63, 57, 55, 60,
- 67, 75, 80, 86,
- /* Size 32x8 */
32, 31, 31, 30, 30, 32, 33, 34, 37, 37, 42, 42, 47, 49, 48, 48, 48, 49,
49, 50, 50, 52, 52, 53, 54, 55, 57, 58, 60, 60, 63, 63, 31, 31, 31, 32,
32, 34, 36, 37, 40, 40, 43, 43, 46, 46, 46, 46, 45, 45, 45, 46, 46, 48,
@@ -3321,7 +3305,23 @@
50, 50, 49, 49, 53, 53, 55, 56, 58, 59, 61, 63, 64, 66, 66, 69, 70, 71,
72, 74, 75, 76, 77, 78, 80, 80, 61, 58, 57, 56, 55, 54, 54, 53, 52, 53,
56, 56, 58, 59, 61, 62, 63, 66, 66, 69, 69, 72, 73, 75, 76, 78, 79, 80,
- 82, 83, 86, 86 },
+ 82, 83, 86, 86,
+ /* Size 32x8 */
+ 32, 31, 37, 48, 49, 52, 56, 61, 31, 31, 38, 47, 47, 50, 54, 58, 31, 31,
+ 38, 47, 47, 50, 53, 57, 30, 32, 39, 46, 46, 48, 52, 56, 30, 32, 40, 46,
+ 45, 48, 51, 55, 32, 34, 41, 46, 45, 48, 51, 54, 33, 36, 43, 47, 46, 47,
+ 50, 54, 34, 37, 44, 47, 45, 47, 50, 53, 37, 40, 47, 47, 45, 47, 49, 52,
+ 37, 40, 47, 48, 46, 47, 49, 53, 42, 43, 47, 50, 49, 50, 53, 56, 42, 43,
+ 47, 50, 49, 50, 53, 56, 47, 46, 48, 52, 53, 53, 55, 58, 49, 46, 48, 53,
+ 53, 54, 56, 59, 48, 46, 47, 53, 55, 56, 58, 61, 48, 46, 47, 53, 56, 57,
+ 59, 62, 48, 45, 46, 53, 57, 59, 61, 63, 49, 45, 46, 53, 58, 61, 63, 66,
+ 49, 45, 46, 53, 58, 62, 64, 66, 50, 46, 46, 54, 59, 64, 66, 69, 50, 46,
+ 46, 54, 59, 64, 66, 69, 52, 48, 47, 54, 61, 66, 69, 72, 52, 48, 47, 54,
+ 61, 66, 70, 73, 53, 49, 48, 55, 62, 68, 71, 75, 54, 50, 49, 55, 62, 68,
+ 72, 76, 55, 51, 49, 56, 63, 69, 74, 78, 57, 52, 50, 56, 64, 70, 75, 79,
+ 58, 53, 51, 57, 64, 71, 76, 80, 60, 54, 52, 58, 65, 72, 77, 82, 60, 55,
+ 53, 59, 65, 73, 78, 83, 63, 57, 55, 60, 67, 75, 80, 86, 63, 57, 55, 60,
+ 67, 75, 80, 86 },
},
{
{ /* Luma */
@@ -3408,21 +3408,12 @@
69, 72, 72, 76, 77, 79, 83, 83, 88, 89, 92, 96, 96, 101, 102, 105, 109,
109, 114,
/* Size 4x8 */
- 32, 32, 42, 56, 32, 33, 41, 53, 32, 35, 42, 52, 34, 37, 50, 59, 38, 40,
- 58, 68, 44, 45, 66, 78, 50, 50, 71, 86, 61, 58, 79, 97,
- /* Size 8x4 */
32, 32, 32, 34, 38, 44, 50, 61, 32, 33, 35, 37, 40, 45, 50, 58, 42, 41,
42, 50, 58, 66, 71, 79, 56, 53, 52, 59, 68, 78, 86, 97,
+ /* Size 8x4 */
+ 32, 32, 42, 56, 32, 33, 41, 53, 32, 35, 42, 52, 34, 37, 50, 59, 38, 40,
+ 58, 68, 44, 45, 66, 78, 50, 50, 71, 86, 61, 58, 79, 97,
/* Size 8x16 */
- 32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 51, 62, 31, 32,
- 33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42, 49, 59, 32, 32, 34, 36,
- 39, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 34, 34, 37, 41, 44, 48,
- 54, 63, 36, 34, 38, 46, 50, 54, 60, 68, 38, 37, 40, 47, 52, 57, 64, 72,
- 41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 48, 45,
- 46, 54, 60, 67, 76, 85, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60,
- 67, 74, 86, 97, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60, 58, 66, 72, 79,
- 92, 105,
- /* Size 16x8 */
32, 31, 31, 31, 32, 32, 34, 36, 38, 41, 44, 48, 53, 57, 61, 65, 31, 32,
32, 32, 32, 33, 34, 34, 37, 39, 41, 45, 49, 53, 56, 60, 32, 32, 33, 34,
34, 35, 37, 38, 40, 41, 43, 46, 50, 53, 56, 58, 35, 35, 34, 35, 36, 37,
@@ -3431,37 +3422,16 @@
63, 67, 71, 74, 77, 79, 53, 51, 49, 49, 49, 49, 54, 60, 64, 67, 71, 76,
82, 86, 89, 92, 65, 62, 59, 59, 58, 58, 63, 68, 72, 76, 79, 85, 92, 97,
100, 105,
+ /* Size 16x8 */
+ 32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 51, 62, 31, 32,
+ 33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42, 49, 59, 32, 32, 34, 36,
+ 39, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 34, 34, 37, 41, 44, 48,
+ 54, 63, 36, 34, 38, 46, 50, 54, 60, 68, 38, 37, 40, 47, 52, 57, 64, 72,
+ 41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 48, 45,
+ 46, 54, 60, 67, 76, 85, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60,
+ 67, 74, 86, 97, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60, 58, 66, 72, 79,
+ 92, 105,
/* Size 16x32 */
- 32, 31, 31, 31, 32, 32, 35, 36, 39, 44, 44, 51, 53, 58, 65, 65, 31, 32,
- 32, 32, 32, 32, 35, 35, 38, 42, 42, 49, 52, 56, 63, 63, 31, 32, 32, 32,
- 32, 32, 35, 35, 38, 42, 42, 49, 51, 55, 62, 62, 31, 32, 32, 32, 32, 32,
- 34, 35, 37, 41, 41, 48, 50, 54, 61, 61, 31, 32, 32, 32, 33, 33, 34, 34,
- 37, 41, 41, 47, 49, 53, 59, 59, 31, 32, 32, 32, 33, 33, 34, 34, 37, 41,
- 41, 47, 49, 53, 59, 59, 31, 32, 32, 33, 34, 34, 35, 36, 38, 42, 42, 48,
- 49, 53, 59, 59, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 48, 50, 53,
- 59, 59, 32, 32, 32, 33, 34, 34, 36, 37, 39, 42, 42, 48, 49, 53, 58, 58,
- 32, 32, 33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 32, 32,
- 33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 33, 33, 33, 35,
- 36, 36, 40, 41, 43, 46, 46, 52, 53, 56, 62, 62, 34, 34, 34, 35, 37, 37,
- 41, 42, 44, 48, 48, 53, 54, 57, 63, 63, 34, 34, 34, 35, 37, 37, 43, 44,
- 46, 50, 50, 55, 56, 59, 65, 65, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54,
- 54, 58, 60, 63, 68, 68, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54, 54, 58,
- 60, 63, 68, 68, 38, 37, 37, 38, 40, 40, 47, 50, 52, 57, 57, 62, 64, 67,
- 72, 72, 39, 38, 37, 39, 40, 40, 48, 50, 53, 58, 58, 63, 65, 68, 73, 73,
- 41, 39, 39, 40, 41, 41, 49, 51, 54, 60, 60, 66, 67, 70, 76, 76, 44, 41,
- 41, 42, 43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 44, 41, 41, 42,
- 43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 47, 44, 44, 44, 45, 45,
- 53, 56, 59, 66, 66, 73, 75, 78, 84, 84, 48, 45, 45, 45, 46, 46, 54, 56,
- 60, 67, 67, 74, 76, 79, 85, 85, 50, 47, 46, 47, 47, 47, 55, 58, 61, 68,
- 68, 76, 78, 82, 88, 88, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79,
- 82, 86, 92, 92, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79, 82, 86,
- 92, 92, 57, 54, 53, 53, 53, 53, 60, 63, 67, 74, 74, 83, 86, 90, 97, 97,
- 58, 55, 54, 54, 54, 54, 61, 63, 68, 75, 75, 84, 87, 91, 98, 98, 61, 57,
- 56, 56, 56, 56, 63, 65, 69, 77, 77, 86, 89, 93, 100, 100, 65, 61, 60,
- 59, 58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 65, 61, 60, 59,
- 58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 70, 65, 64, 63, 62,
- 62, 70, 72, 76, 83, 83, 93, 96, 101, 109, 109,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 39,
41, 44, 44, 47, 48, 50, 53, 53, 57, 58, 61, 65, 65, 70, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 37, 38, 39, 41, 41, 44,
@@ -3491,33 +3461,47 @@
79, 84, 85, 88, 92, 92, 97, 98, 100, 105, 105, 109, 65, 63, 62, 61, 59,
59, 59, 59, 58, 58, 58, 62, 63, 65, 68, 68, 72, 73, 76, 79, 79, 84, 85,
88, 92, 92, 97, 98, 100, 105, 105, 109,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 32, 32, 35, 36, 39, 44, 44, 51, 53, 58, 65, 65, 31, 32,
+ 32, 32, 32, 32, 35, 35, 38, 42, 42, 49, 52, 56, 63, 63, 31, 32, 32, 32,
+ 32, 32, 35, 35, 38, 42, 42, 49, 51, 55, 62, 62, 31, 32, 32, 32, 32, 32,
+ 34, 35, 37, 41, 41, 48, 50, 54, 61, 61, 31, 32, 32, 32, 33, 33, 34, 34,
+ 37, 41, 41, 47, 49, 53, 59, 59, 31, 32, 32, 32, 33, 33, 34, 34, 37, 41,
+ 41, 47, 49, 53, 59, 59, 31, 32, 32, 33, 34, 34, 35, 36, 38, 42, 42, 48,
+ 49, 53, 59, 59, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 48, 50, 53,
+ 59, 59, 32, 32, 32, 33, 34, 34, 36, 37, 39, 42, 42, 48, 49, 53, 58, 58,
+ 32, 32, 33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 32, 32,
+ 33, 34, 35, 35, 37, 38, 40, 42, 42, 48, 49, 52, 58, 58, 33, 33, 33, 35,
+ 36, 36, 40, 41, 43, 46, 46, 52, 53, 56, 62, 62, 34, 34, 34, 35, 37, 37,
+ 41, 42, 44, 48, 48, 53, 54, 57, 63, 63, 34, 34, 34, 35, 37, 37, 43, 44,
+ 46, 50, 50, 55, 56, 59, 65, 65, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54,
+ 54, 58, 60, 63, 68, 68, 36, 35, 34, 36, 38, 38, 46, 48, 50, 54, 54, 58,
+ 60, 63, 68, 68, 38, 37, 37, 38, 40, 40, 47, 50, 52, 57, 57, 62, 64, 67,
+ 72, 72, 39, 38, 37, 39, 40, 40, 48, 50, 53, 58, 58, 63, 65, 68, 73, 73,
+ 41, 39, 39, 40, 41, 41, 49, 51, 54, 60, 60, 66, 67, 70, 76, 76, 44, 41,
+ 41, 42, 43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 44, 41, 41, 42,
+ 43, 43, 51, 53, 57, 63, 63, 69, 71, 74, 79, 79, 47, 44, 44, 44, 45, 45,
+ 53, 56, 59, 66, 66, 73, 75, 78, 84, 84, 48, 45, 45, 45, 46, 46, 54, 56,
+ 60, 67, 67, 74, 76, 79, 85, 85, 50, 47, 46, 47, 47, 47, 55, 58, 61, 68,
+ 68, 76, 78, 82, 88, 88, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79,
+ 82, 86, 92, 92, 53, 50, 49, 50, 50, 50, 57, 60, 64, 71, 71, 79, 82, 86,
+ 92, 92, 57, 54, 53, 53, 53, 53, 60, 63, 67, 74, 74, 83, 86, 90, 97, 97,
+ 58, 55, 54, 54, 54, 54, 61, 63, 68, 75, 75, 84, 87, 91, 98, 98, 61, 57,
+ 56, 56, 56, 56, 63, 65, 69, 77, 77, 86, 89, 93, 100, 100, 65, 61, 60,
+ 59, 58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 65, 61, 60, 59,
+ 58, 58, 66, 68, 72, 79, 79, 89, 92, 97, 105, 105, 70, 65, 64, 63, 62,
+ 62, 70, 72, 76, 83, 83, 93, 96, 101, 109, 109,
/* Size 4x16 */
- 31, 32, 44, 58, 32, 32, 42, 55, 32, 33, 41, 53, 32, 34, 42, 53, 32, 34,
- 42, 53, 32, 35, 42, 52, 34, 37, 48, 57, 35, 38, 54, 63, 37, 40, 57, 67,
- 39, 41, 60, 70, 41, 43, 63, 74, 45, 46, 67, 79, 50, 50, 71, 86, 54, 53,
- 74, 90, 57, 56, 77, 93, 61, 58, 79, 97,
- /* Size 16x4 */
31, 32, 32, 32, 32, 32, 34, 35, 37, 39, 41, 45, 50, 54, 57, 61, 32, 32,
33, 34, 34, 35, 37, 38, 40, 41, 43, 46, 50, 53, 56, 58, 44, 42, 41, 42,
42, 42, 48, 54, 57, 60, 63, 67, 71, 74, 77, 79, 58, 55, 53, 53, 53, 52,
57, 63, 67, 70, 74, 79, 86, 90, 93, 97,
+ /* Size 16x4 */
+ 31, 32, 44, 58, 32, 32, 42, 55, 32, 33, 41, 53, 32, 34, 42, 53, 32, 34,
+ 42, 53, 32, 35, 42, 52, 34, 37, 48, 57, 35, 38, 54, 63, 37, 40, 57, 67,
+ 39, 41, 60, 70, 41, 43, 63, 74, 45, 46, 67, 79, 50, 50, 71, 86, 54, 53,
+ 74, 90, 57, 56, 77, 93, 61, 58, 79, 97,
/* Size 8x32 */
- 32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 52, 63, 31, 32,
- 32, 35, 38, 42, 51, 62, 31, 32, 32, 34, 37, 41, 50, 61, 31, 32, 33, 34,
- 37, 41, 49, 59, 31, 32, 33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42,
- 49, 59, 32, 32, 34, 36, 38, 42, 50, 59, 32, 32, 34, 36, 39, 42, 49, 58,
- 32, 33, 35, 37, 40, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 33, 33,
- 36, 40, 43, 46, 53, 62, 34, 34, 37, 41, 44, 48, 54, 63, 34, 34, 37, 43,
- 46, 50, 56, 65, 36, 34, 38, 46, 50, 54, 60, 68, 36, 34, 38, 46, 50, 54,
- 60, 68, 38, 37, 40, 47, 52, 57, 64, 72, 39, 37, 40, 48, 53, 58, 65, 73,
- 41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 44, 41,
- 43, 51, 57, 63, 71, 79, 47, 44, 45, 53, 59, 66, 75, 84, 48, 45, 46, 54,
- 60, 67, 76, 85, 50, 46, 47, 55, 61, 68, 78, 88, 53, 49, 50, 57, 64, 71,
- 82, 92, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60, 67, 74, 86, 97,
- 58, 54, 54, 61, 68, 75, 87, 98, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60,
- 58, 66, 72, 79, 92, 105, 65, 60, 58, 66, 72, 79, 92, 105, 70, 64, 62,
- 70, 76, 83, 96, 109,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 39,
41, 44, 44, 47, 48, 50, 53, 53, 57, 58, 61, 65, 65, 70, 31, 32, 32, 32,
32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 37, 37, 39, 41, 41, 44,
@@ -3532,7 +3516,23 @@
49, 50, 49, 49, 49, 53, 54, 56, 60, 60, 64, 65, 67, 71, 71, 75, 76, 78,
82, 82, 86, 87, 89, 92, 92, 96, 65, 63, 62, 61, 59, 59, 59, 59, 58, 58,
58, 62, 63, 65, 68, 68, 72, 73, 76, 79, 79, 84, 85, 88, 92, 92, 97, 98,
- 100, 105, 105, 109 },
+ 100, 105, 105, 109,
+ /* Size 32x8 */
+ 32, 31, 32, 35, 39, 44, 53, 65, 31, 32, 32, 35, 38, 42, 52, 63, 31, 32,
+ 32, 35, 38, 42, 51, 62, 31, 32, 32, 34, 37, 41, 50, 61, 31, 32, 33, 34,
+ 37, 41, 49, 59, 31, 32, 33, 34, 37, 41, 49, 59, 31, 32, 34, 35, 38, 42,
+ 49, 59, 32, 32, 34, 36, 38, 42, 50, 59, 32, 32, 34, 36, 39, 42, 49, 58,
+ 32, 33, 35, 37, 40, 42, 49, 58, 32, 33, 35, 37, 40, 42, 49, 58, 33, 33,
+ 36, 40, 43, 46, 53, 62, 34, 34, 37, 41, 44, 48, 54, 63, 34, 34, 37, 43,
+ 46, 50, 56, 65, 36, 34, 38, 46, 50, 54, 60, 68, 36, 34, 38, 46, 50, 54,
+ 60, 68, 38, 37, 40, 47, 52, 57, 64, 72, 39, 37, 40, 48, 53, 58, 65, 73,
+ 41, 39, 41, 49, 54, 60, 67, 76, 44, 41, 43, 51, 57, 63, 71, 79, 44, 41,
+ 43, 51, 57, 63, 71, 79, 47, 44, 45, 53, 59, 66, 75, 84, 48, 45, 46, 54,
+ 60, 67, 76, 85, 50, 46, 47, 55, 61, 68, 78, 88, 53, 49, 50, 57, 64, 71,
+ 82, 92, 53, 49, 50, 57, 64, 71, 82, 92, 57, 53, 53, 60, 67, 74, 86, 97,
+ 58, 54, 54, 61, 68, 75, 87, 98, 61, 56, 56, 63, 69, 77, 89, 100, 65, 60,
+ 58, 66, 72, 79, 92, 105, 65, 60, 58, 66, 72, 79, 92, 105, 70, 64, 62,
+ 70, 76, 83, 96, 109 },
{ /* Chroma */
/* Size 4x4 */
31, 41, 46, 51, 41, 48, 48, 51, 46, 48, 58, 62, 51, 51, 62, 71,
@@ -3616,21 +3616,12 @@
76, 78, 59, 57, 56, 55, 54, 54, 53, 53, 52, 51, 51, 54, 55, 56, 58, 58,
60, 61, 63, 65, 65, 67, 68, 70, 72, 72, 74, 75, 76, 78, 78, 80,
/* Size 4x8 */
- 31, 38, 47, 52, 32, 40, 45, 49, 39, 47, 45, 48, 44, 47, 51, 53, 46, 47,
- 56, 58, 47, 46, 59, 64, 48, 47, 61, 68, 53, 50, 64, 73,
- /* Size 8x4 */
31, 32, 39, 44, 46, 47, 48, 53, 38, 40, 47, 47, 47, 46, 47, 50, 47, 45,
45, 51, 56, 59, 61, 64, 52, 49, 48, 53, 58, 64, 68, 73,
+ /* Size 8x4 */
+ 31, 38, 47, 52, 32, 40, 45, 49, 39, 47, 45, 48, 44, 47, 51, 53, 46, 47,
+ 56, 58, 47, 46, 59, 64, 48, 47, 61, 68, 53, 50, 64, 73,
/* Size 8x16 */
- 32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 54, 30, 32,
- 40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45, 47, 51, 35, 37, 44, 46,
- 46, 45, 47, 51, 37, 40, 47, 47, 47, 45, 47, 50, 42, 43, 47, 49, 50, 49,
- 50, 53, 49, 46, 48, 52, 53, 53, 54, 57, 48, 46, 47, 51, 54, 55, 57, 59,
- 48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 50, 46,
- 46, 52, 56, 59, 64, 67, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54,
- 58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52, 50, 55, 59, 64,
- 70, 76,
- /* Size 16x8 */
32, 31, 30, 33, 35, 37, 42, 49, 48, 48, 49, 50, 52, 54, 55, 57, 31, 31,
32, 35, 37, 40, 43, 46, 46, 45, 45, 46, 48, 49, 51, 52, 37, 38, 40, 42,
44, 47, 47, 48, 47, 46, 46, 46, 47, 48, 49, 50, 45, 45, 44, 46, 46, 47,
@@ -3639,37 +3630,16 @@
58, 59, 61, 62, 63, 64, 52, 50, 48, 47, 47, 47, 50, 54, 57, 59, 61, 64,
66, 68, 69, 70, 57, 54, 52, 51, 51, 50, 53, 57, 59, 61, 64, 67, 71, 73,
74, 76,
+ /* Size 16x8 */
+ 32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 54, 30, 32,
+ 40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45, 47, 51, 35, 37, 44, 46,
+ 46, 45, 47, 51, 37, 40, 47, 47, 47, 45, 47, 50, 42, 43, 47, 49, 50, 49,
+ 50, 53, 49, 46, 48, 52, 53, 53, 54, 57, 48, 46, 47, 51, 54, 55, 57, 59,
+ 48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 50, 46,
+ 46, 52, 56, 59, 64, 67, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54,
+ 58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52, 50, 55, 59, 64,
+ 70, 76,
/* Size 16x32 */
- 32, 31, 31, 33, 37, 37, 45, 48, 48, 49, 49, 51, 52, 54, 57, 57, 31, 31,
- 31, 34, 38, 38, 45, 47, 47, 47, 47, 50, 50, 52, 55, 55, 31, 31, 31, 34,
- 38, 38, 45, 47, 47, 47, 47, 49, 50, 51, 54, 54, 31, 31, 32, 34, 39, 39,
- 45, 46, 46, 46, 46, 48, 49, 51, 53, 53, 30, 32, 32, 35, 40, 40, 44, 46,
- 45, 45, 45, 47, 48, 49, 52, 52, 30, 32, 32, 35, 40, 40, 44, 46, 45, 45,
- 45, 47, 48, 49, 52, 52, 33, 34, 35, 37, 42, 42, 46, 47, 46, 45, 45, 47,
- 47, 49, 51, 51, 33, 35, 36, 38, 43, 43, 46, 47, 46, 46, 46, 47, 47, 49,
- 51, 51, 35, 37, 37, 40, 44, 44, 46, 47, 46, 45, 45, 47, 47, 48, 51, 51,
- 37, 39, 40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 37, 39,
- 40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 41, 42, 42, 44,
- 47, 47, 49, 49, 49, 48, 48, 49, 50, 51, 52, 52, 42, 42, 43, 44, 47, 47,
- 49, 50, 50, 49, 49, 50, 50, 51, 53, 53, 44, 44, 44, 45, 47, 47, 50, 51,
- 51, 51, 51, 52, 52, 53, 54, 54, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53,
- 53, 54, 54, 55, 57, 57, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53, 53, 54,
- 54, 55, 57, 57, 48, 46, 46, 46, 47, 47, 51, 53, 54, 55, 55, 56, 57, 58,
- 59, 59, 48, 46, 46, 46, 47, 47, 51, 53, 54, 56, 56, 57, 57, 58, 60, 60,
- 48, 46, 45, 46, 46, 46, 51, 53, 54, 57, 57, 58, 59, 60, 61, 61, 49, 46,
- 45, 45, 46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 49, 46, 45, 45,
- 46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 50, 47, 46, 46, 46, 46,
- 52, 54, 56, 59, 59, 62, 63, 64, 66, 66, 50, 47, 46, 46, 46, 46, 52, 54,
- 56, 59, 59, 63, 64, 65, 67, 67, 51, 48, 47, 47, 47, 47, 52, 54, 56, 60,
- 60, 64, 65, 66, 68, 68, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65,
- 66, 68, 71, 71, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65, 66, 68,
- 71, 71, 54, 50, 49, 49, 48, 48, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73,
- 54, 51, 50, 49, 49, 49, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73, 55, 51,
- 51, 50, 49, 49, 54, 56, 58, 63, 63, 68, 69, 71, 74, 74, 57, 53, 52, 51,
- 50, 50, 55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 57, 53, 52, 51, 50, 50,
- 55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 59, 55, 54, 53, 52, 52, 57, 58,
- 61, 65, 65, 70, 72, 74, 78, 78,
- /* Size 32x16 */
32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 48, 48,
48, 49, 49, 50, 50, 51, 52, 52, 54, 54, 55, 57, 57, 59, 31, 31, 31, 31,
32, 32, 34, 35, 37, 39, 39, 42, 42, 44, 47, 47, 46, 46, 46, 46, 46, 47,
@@ -3699,33 +3669,47 @@
64, 66, 67, 68, 71, 71, 73, 73, 74, 76, 76, 78, 57, 55, 54, 53, 52, 52,
51, 51, 51, 50, 50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 66, 67, 68,
71, 71, 73, 73, 74, 76, 76, 78,
+ /* Size 32x16 */
+ 32, 31, 31, 33, 37, 37, 45, 48, 48, 49, 49, 51, 52, 54, 57, 57, 31, 31,
+ 31, 34, 38, 38, 45, 47, 47, 47, 47, 50, 50, 52, 55, 55, 31, 31, 31, 34,
+ 38, 38, 45, 47, 47, 47, 47, 49, 50, 51, 54, 54, 31, 31, 32, 34, 39, 39,
+ 45, 46, 46, 46, 46, 48, 49, 51, 53, 53, 30, 32, 32, 35, 40, 40, 44, 46,
+ 45, 45, 45, 47, 48, 49, 52, 52, 30, 32, 32, 35, 40, 40, 44, 46, 45, 45,
+ 45, 47, 48, 49, 52, 52, 33, 34, 35, 37, 42, 42, 46, 47, 46, 45, 45, 47,
+ 47, 49, 51, 51, 33, 35, 36, 38, 43, 43, 46, 47, 46, 46, 46, 47, 47, 49,
+ 51, 51, 35, 37, 37, 40, 44, 44, 46, 47, 46, 45, 45, 47, 47, 48, 51, 51,
+ 37, 39, 40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 37, 39,
+ 40, 43, 47, 47, 47, 47, 47, 45, 45, 46, 47, 48, 50, 50, 41, 42, 42, 44,
+ 47, 47, 49, 49, 49, 48, 48, 49, 50, 51, 52, 52, 42, 42, 43, 44, 47, 47,
+ 49, 50, 50, 49, 49, 50, 50, 51, 53, 53, 44, 44, 44, 45, 47, 47, 50, 51,
+ 51, 51, 51, 52, 52, 53, 54, 54, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53,
+ 53, 54, 54, 55, 57, 57, 49, 47, 46, 47, 48, 48, 52, 53, 53, 53, 53, 54,
+ 54, 55, 57, 57, 48, 46, 46, 46, 47, 47, 51, 53, 54, 55, 55, 56, 57, 58,
+ 59, 59, 48, 46, 46, 46, 47, 47, 51, 53, 54, 56, 56, 57, 57, 58, 60, 60,
+ 48, 46, 45, 46, 46, 46, 51, 53, 54, 57, 57, 58, 59, 60, 61, 61, 49, 46,
+ 45, 45, 46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 49, 46, 45, 45,
+ 46, 46, 51, 53, 55, 58, 58, 61, 61, 62, 64, 64, 50, 47, 46, 46, 46, 46,
+ 52, 54, 56, 59, 59, 62, 63, 64, 66, 66, 50, 47, 46, 46, 46, 46, 52, 54,
+ 56, 59, 59, 63, 64, 65, 67, 67, 51, 48, 47, 47, 47, 47, 52, 54, 56, 60,
+ 60, 64, 65, 66, 68, 68, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65,
+ 66, 68, 71, 71, 52, 48, 48, 47, 47, 47, 53, 54, 57, 61, 61, 65, 66, 68,
+ 71, 71, 54, 50, 49, 49, 48, 48, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73,
+ 54, 51, 50, 49, 49, 49, 54, 55, 58, 62, 62, 67, 68, 70, 73, 73, 55, 51,
+ 51, 50, 49, 49, 54, 56, 58, 63, 63, 68, 69, 71, 74, 74, 57, 53, 52, 51,
+ 50, 50, 55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 57, 53, 52, 51, 50, 50,
+ 55, 56, 59, 64, 64, 69, 70, 73, 76, 76, 59, 55, 54, 53, 52, 52, 57, 58,
+ 61, 65, 65, 70, 72, 74, 78, 78,
/* Size 4x16 */
- 31, 37, 49, 54, 31, 38, 47, 51, 32, 40, 45, 49, 34, 42, 45, 49, 37, 44,
- 45, 48, 39, 47, 45, 48, 42, 47, 49, 51, 47, 48, 53, 55, 46, 47, 55, 58,
- 46, 46, 57, 60, 46, 46, 58, 62, 47, 46, 59, 65, 48, 47, 61, 68, 50, 48,
- 62, 70, 51, 49, 63, 71, 53, 50, 64, 73,
- /* Size 16x4 */
31, 31, 32, 34, 37, 39, 42, 47, 46, 46, 46, 47, 48, 50, 51, 53, 37, 38,
40, 42, 44, 47, 47, 48, 47, 46, 46, 46, 47, 48, 49, 50, 49, 47, 45, 45,
45, 45, 49, 53, 55, 57, 58, 59, 61, 62, 63, 64, 54, 51, 49, 49, 48, 48,
51, 55, 58, 60, 62, 65, 68, 70, 71, 73,
+ /* Size 16x4 */
+ 31, 37, 49, 54, 31, 38, 47, 51, 32, 40, 45, 49, 34, 42, 45, 49, 37, 44,
+ 45, 48, 39, 47, 45, 48, 42, 47, 49, 51, 47, 48, 53, 55, 46, 47, 55, 58,
+ 46, 46, 57, 60, 46, 46, 58, 62, 47, 46, 59, 65, 48, 47, 61, 68, 50, 48,
+ 62, 70, 51, 49, 63, 71, 53, 50, 64, 73,
/* Size 8x32 */
- 32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 55, 31, 31,
- 38, 45, 47, 47, 50, 54, 31, 32, 39, 45, 46, 46, 49, 53, 30, 32, 40, 44,
- 45, 45, 48, 52, 30, 32, 40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45,
- 47, 51, 33, 36, 43, 46, 46, 46, 47, 51, 35, 37, 44, 46, 46, 45, 47, 51,
- 37, 40, 47, 47, 47, 45, 47, 50, 37, 40, 47, 47, 47, 45, 47, 50, 41, 42,
- 47, 49, 49, 48, 50, 52, 42, 43, 47, 49, 50, 49, 50, 53, 44, 44, 47, 50,
- 51, 51, 52, 54, 49, 46, 48, 52, 53, 53, 54, 57, 49, 46, 48, 52, 53, 53,
- 54, 57, 48, 46, 47, 51, 54, 55, 57, 59, 48, 46, 47, 51, 54, 56, 57, 60,
- 48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 49, 45,
- 46, 51, 55, 58, 61, 64, 50, 46, 46, 52, 56, 59, 63, 66, 50, 46, 46, 52,
- 56, 59, 64, 67, 51, 47, 47, 52, 56, 60, 65, 68, 52, 48, 47, 53, 57, 61,
- 66, 71, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54, 58, 62, 68, 73,
- 54, 50, 49, 54, 58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52,
- 50, 55, 59, 64, 70, 76, 57, 52, 50, 55, 59, 64, 70, 76, 59, 54, 52, 57,
- 61, 65, 72, 78,
- /* Size 32x8 */
32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 48, 48,
48, 49, 49, 50, 50, 51, 52, 52, 54, 54, 55, 57, 57, 59, 31, 31, 31, 32,
32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 46, 46, 45, 45, 45, 46,
@@ -3740,7 +3724,23 @@
47, 47, 47, 47, 47, 50, 50, 52, 54, 54, 57, 57, 59, 61, 61, 63, 64, 65,
66, 66, 68, 68, 69, 70, 70, 72, 57, 55, 54, 53, 52, 52, 51, 51, 51, 50,
50, 52, 53, 54, 57, 57, 59, 60, 61, 64, 64, 66, 67, 68, 71, 71, 73, 73,
- 74, 76, 76, 78 },
+ 74, 76, 76, 78,
+ /* Size 32x8 */
+ 32, 31, 37, 45, 48, 49, 52, 57, 31, 31, 38, 45, 47, 47, 50, 55, 31, 31,
+ 38, 45, 47, 47, 50, 54, 31, 32, 39, 45, 46, 46, 49, 53, 30, 32, 40, 44,
+ 45, 45, 48, 52, 30, 32, 40, 44, 45, 45, 48, 52, 33, 35, 42, 46, 46, 45,
+ 47, 51, 33, 36, 43, 46, 46, 46, 47, 51, 35, 37, 44, 46, 46, 45, 47, 51,
+ 37, 40, 47, 47, 47, 45, 47, 50, 37, 40, 47, 47, 47, 45, 47, 50, 41, 42,
+ 47, 49, 49, 48, 50, 52, 42, 43, 47, 49, 50, 49, 50, 53, 44, 44, 47, 50,
+ 51, 51, 52, 54, 49, 46, 48, 52, 53, 53, 54, 57, 49, 46, 48, 52, 53, 53,
+ 54, 57, 48, 46, 47, 51, 54, 55, 57, 59, 48, 46, 47, 51, 54, 56, 57, 60,
+ 48, 45, 46, 51, 54, 57, 59, 61, 49, 45, 46, 51, 55, 58, 61, 64, 49, 45,
+ 46, 51, 55, 58, 61, 64, 50, 46, 46, 52, 56, 59, 63, 66, 50, 46, 46, 52,
+ 56, 59, 64, 67, 51, 47, 47, 52, 56, 60, 65, 68, 52, 48, 47, 53, 57, 61,
+ 66, 71, 52, 48, 47, 53, 57, 61, 66, 71, 54, 49, 48, 54, 58, 62, 68, 73,
+ 54, 50, 49, 54, 58, 62, 68, 73, 55, 51, 49, 54, 58, 63, 69, 74, 57, 52,
+ 50, 55, 59, 64, 70, 76, 57, 52, 50, 55, 59, 64, 70, 76, 59, 54, 52, 57,
+ 61, 65, 72, 78 },
},
{
{ /* Luma */
@@ -3826,21 +3826,12 @@
92, 92, 59, 57, 56, 56, 54, 54, 54, 54, 54, 54, 53, 53, 55, 58, 58, 61,
64, 64, 67, 69, 69, 73, 75, 76, 79, 80, 81, 86, 87, 88, 92, 92,
/* Size 4x8 */
- 32, 32, 37, 52, 32, 33, 36, 49, 32, 34, 38, 49, 34, 37, 44, 54, 35, 38,
- 49, 60, 40, 42, 55, 69, 46, 46, 59, 76, 52, 51, 64, 83,
- /* Size 8x4 */
32, 32, 32, 34, 35, 40, 46, 52, 32, 33, 34, 37, 38, 42, 46, 51, 37, 36,
38, 44, 49, 55, 59, 64, 52, 49, 49, 54, 60, 69, 76, 83,
+ /* Size 8x4 */
+ 32, 32, 37, 52, 32, 33, 36, 49, 32, 34, 38, 49, 34, 37, 44, 54, 35, 38,
+ 49, 60, 40, 42, 55, 69, 46, 46, 59, 76, 52, 51, 64, 83,
/* Size 8x16 */
- 32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32,
- 32, 33, 35, 41, 44, 49, 31, 32, 33, 33, 35, 41, 44, 49, 32, 32, 34, 34,
- 36, 42, 45, 50, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44,
- 47, 51, 34, 34, 36, 38, 42, 48, 50, 54, 36, 34, 37, 40, 48, 54, 56, 60,
- 38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 44, 41,
- 42, 45, 53, 63, 66, 71, 47, 44, 45, 47, 56, 66, 69, 75, 49, 46, 47, 48,
- 57, 67, 71, 77, 53, 49, 50, 51, 60, 71, 75, 82, 58, 54, 54, 55, 63, 75,
- 79, 87,
- /* Size 16x8 */
32, 31, 31, 31, 32, 32, 32, 34, 36, 38, 39, 44, 47, 49, 53, 58, 31, 32,
32, 32, 32, 33, 33, 34, 34, 36, 37, 41, 44, 46, 49, 54, 32, 32, 32, 33,
34, 35, 35, 36, 37, 39, 40, 42, 45, 47, 50, 54, 32, 33, 33, 33, 34, 36,
@@ -3849,37 +3840,16 @@
58, 63, 66, 67, 71, 75, 47, 45, 44, 44, 45, 45, 47, 50, 56, 58, 60, 66,
69, 71, 75, 79, 53, 51, 49, 49, 50, 49, 51, 54, 60, 63, 65, 71, 75, 77,
82, 87,
+ /* Size 16x8 */
+ 32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32,
+ 32, 33, 35, 41, 44, 49, 31, 32, 33, 33, 35, 41, 44, 49, 32, 32, 34, 34,
+ 36, 42, 45, 50, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44,
+ 47, 51, 34, 34, 36, 38, 42, 48, 50, 54, 36, 34, 37, 40, 48, 54, 56, 60,
+ 38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 44, 41,
+ 42, 45, 53, 63, 66, 71, 47, 44, 45, 47, 56, 66, 69, 75, 49, 46, 47, 48,
+ 57, 67, 71, 77, 53, 49, 50, 51, 60, 71, 75, 82, 58, 54, 54, 55, 63, 75,
+ 79, 87,
/* Size 16x32 */
- 32, 31, 31, 31, 32, 32, 32, 35, 36, 38, 44, 44, 47, 53, 53, 59, 31, 32,
- 32, 32, 32, 32, 33, 35, 35, 37, 43, 43, 46, 52, 52, 57, 31, 32, 32, 32,
- 32, 32, 33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32,
- 33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32, 33, 34,
- 35, 36, 41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 32, 33, 33, 34, 34, 36,
- 41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 33, 33, 33, 35, 35, 36, 41, 41,
- 44, 49, 49, 54, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 45, 49,
- 49, 54, 32, 32, 32, 33, 34, 34, 34, 36, 36, 38, 42, 42, 45, 50, 50, 54,
- 32, 32, 32, 33, 34, 34, 35, 37, 37, 38, 42, 42, 45, 49, 49, 54, 32, 32,
- 33, 33, 35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 32, 33, 33,
- 35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 33, 33, 33, 35, 36,
- 36, 39, 40, 41, 44, 44, 47, 51, 51, 55, 34, 34, 34, 34, 36, 37, 38, 42,
- 42, 44, 48, 48, 50, 54, 54, 58, 34, 34, 34, 34, 36, 37, 38, 42, 42, 44,
- 48, 48, 50, 54, 54, 58, 35, 34, 34, 34, 37, 37, 39, 44, 45, 46, 50, 50,
- 53, 57, 57, 61, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60,
- 60, 64, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60, 60, 64,
- 38, 37, 36, 37, 39, 40, 41, 48, 49, 51, 56, 56, 58, 63, 63, 67, 39, 38,
- 37, 38, 40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 39, 38, 37, 38,
- 40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 42, 40, 40, 40, 42, 42,
- 44, 51, 52, 55, 61, 61, 64, 69, 69, 73, 44, 42, 41, 41, 42, 43, 45, 52,
- 53, 56, 63, 63, 66, 71, 71, 75, 44, 42, 41, 41, 43, 43, 45, 52, 54, 56,
- 63, 63, 66, 72, 72, 76, 47, 45, 44, 44, 45, 45, 47, 54, 56, 58, 66, 66,
- 69, 75, 75, 79, 48, 46, 45, 45, 46, 46, 48, 55, 56, 59, 67, 67, 70, 76,
- 76, 80, 49, 47, 46, 46, 47, 47, 48, 56, 57, 60, 67, 67, 71, 77, 77, 81,
- 53, 50, 49, 49, 49, 49, 51, 58, 59, 62, 71, 71, 74, 81, 81, 86, 53, 51,
- 49, 49, 50, 50, 51, 59, 60, 63, 71, 71, 75, 82, 82, 87, 55, 52, 51, 51,
- 51, 51, 53, 60, 61, 64, 72, 72, 76, 83, 83, 88, 58, 55, 54, 54, 54, 54,
- 55, 62, 63, 67, 75, 75, 79, 87, 87, 92, 58, 55, 54, 54, 54, 54, 55, 62,
- 63, 67, 75, 75, 79, 87, 87, 92,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 35, 36, 36,
38, 39, 39, 42, 44, 44, 47, 48, 49, 53, 53, 55, 58, 58, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 37, 38, 38, 40,
@@ -3909,33 +3879,47 @@
65, 69, 71, 72, 75, 76, 77, 81, 82, 83, 87, 87, 59, 57, 56, 56, 54, 54,
54, 54, 54, 54, 53, 53, 55, 58, 58, 61, 64, 64, 67, 69, 69, 73, 75, 76,
79, 80, 81, 86, 87, 88, 92, 92,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 32, 32, 32, 35, 36, 38, 44, 44, 47, 53, 53, 59, 31, 32,
+ 32, 32, 32, 32, 33, 35, 35, 37, 43, 43, 46, 52, 52, 57, 31, 32, 32, 32,
+ 32, 32, 33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32,
+ 33, 35, 35, 37, 42, 42, 45, 51, 51, 56, 31, 32, 32, 32, 32, 32, 33, 34,
+ 35, 36, 41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 32, 33, 33, 34, 34, 36,
+ 41, 41, 44, 49, 49, 54, 31, 32, 32, 32, 33, 33, 33, 35, 35, 36, 41, 41,
+ 44, 49, 49, 54, 32, 32, 32, 32, 33, 34, 34, 36, 36, 38, 42, 42, 45, 49,
+ 49, 54, 32, 32, 32, 33, 34, 34, 34, 36, 36, 38, 42, 42, 45, 50, 50, 54,
+ 32, 32, 32, 33, 34, 34, 35, 37, 37, 38, 42, 42, 45, 49, 49, 54, 32, 32,
+ 33, 33, 35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 32, 33, 33,
+ 35, 35, 36, 38, 38, 39, 42, 42, 45, 49, 49, 53, 32, 33, 33, 33, 35, 36,
+ 36, 39, 40, 41, 44, 44, 47, 51, 51, 55, 34, 34, 34, 34, 36, 37, 38, 42,
+ 42, 44, 48, 48, 50, 54, 54, 58, 34, 34, 34, 34, 36, 37, 38, 42, 42, 44,
+ 48, 48, 50, 54, 54, 58, 35, 34, 34, 34, 37, 37, 39, 44, 45, 46, 50, 50,
+ 53, 57, 57, 61, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60,
+ 60, 64, 36, 35, 34, 35, 37, 38, 40, 47, 48, 49, 54, 54, 56, 60, 60, 64,
+ 38, 37, 36, 37, 39, 40, 41, 48, 49, 51, 56, 56, 58, 63, 63, 67, 39, 38,
+ 37, 38, 40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 39, 38, 37, 38,
+ 40, 40, 42, 49, 50, 52, 58, 58, 60, 65, 65, 69, 42, 40, 40, 40, 42, 42,
+ 44, 51, 52, 55, 61, 61, 64, 69, 69, 73, 44, 42, 41, 41, 42, 43, 45, 52,
+ 53, 56, 63, 63, 66, 71, 71, 75, 44, 42, 41, 41, 43, 43, 45, 52, 54, 56,
+ 63, 63, 66, 72, 72, 76, 47, 45, 44, 44, 45, 45, 47, 54, 56, 58, 66, 66,
+ 69, 75, 75, 79, 48, 46, 45, 45, 46, 46, 48, 55, 56, 59, 67, 67, 70, 76,
+ 76, 80, 49, 47, 46, 46, 47, 47, 48, 56, 57, 60, 67, 67, 71, 77, 77, 81,
+ 53, 50, 49, 49, 49, 49, 51, 58, 59, 62, 71, 71, 74, 81, 81, 86, 53, 51,
+ 49, 49, 50, 50, 51, 59, 60, 63, 71, 71, 75, 82, 82, 87, 55, 52, 51, 51,
+ 51, 51, 53, 60, 61, 64, 72, 72, 76, 83, 83, 88, 58, 55, 54, 54, 54, 54,
+ 55, 62, 63, 67, 75, 75, 79, 87, 87, 92, 58, 55, 54, 54, 54, 54, 55, 62,
+ 63, 67, 75, 75, 79, 87, 87, 92,
/* Size 4x16 */
- 31, 32, 38, 53, 32, 32, 37, 51, 32, 32, 36, 49, 32, 33, 36, 49, 32, 34,
- 38, 50, 32, 35, 39, 49, 33, 36, 41, 51, 34, 37, 44, 54, 35, 38, 49, 60,
- 37, 40, 51, 63, 38, 40, 52, 65, 42, 43, 56, 71, 45, 45, 58, 75, 47, 47,
- 60, 77, 51, 50, 63, 82, 55, 54, 67, 87,
- /* Size 16x4 */
31, 32, 32, 32, 32, 32, 33, 34, 35, 37, 38, 42, 45, 47, 51, 55, 32, 32,
32, 33, 34, 35, 36, 37, 38, 40, 40, 43, 45, 47, 50, 54, 38, 37, 36, 36,
38, 39, 41, 44, 49, 51, 52, 56, 58, 60, 63, 67, 53, 51, 49, 49, 50, 49,
51, 54, 60, 63, 65, 71, 75, 77, 82, 87,
+ /* Size 16x4 */
+ 31, 32, 38, 53, 32, 32, 37, 51, 32, 32, 36, 49, 32, 33, 36, 49, 32, 34,
+ 38, 50, 32, 35, 39, 49, 33, 36, 41, 51, 34, 37, 44, 54, 35, 38, 49, 60,
+ 37, 40, 51, 63, 38, 40, 52, 65, 42, 43, 56, 71, 45, 45, 58, 75, 47, 47,
+ 60, 77, 51, 50, 63, 82, 55, 54, 67, 87,
/* Size 8x32 */
- 32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 43, 46, 52, 31, 32,
- 32, 33, 35, 42, 45, 51, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32, 32, 33,
- 35, 41, 44, 49, 31, 32, 32, 33, 34, 41, 44, 49, 31, 32, 33, 33, 35, 41,
- 44, 49, 32, 32, 33, 34, 36, 42, 45, 49, 32, 32, 34, 34, 36, 42, 45, 50,
- 32, 32, 34, 35, 37, 42, 45, 49, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33,
- 35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44, 47, 51, 34, 34, 36, 38,
- 42, 48, 50, 54, 34, 34, 36, 38, 42, 48, 50, 54, 35, 34, 37, 39, 45, 50,
- 53, 57, 36, 34, 37, 40, 48, 54, 56, 60, 36, 34, 37, 40, 48, 54, 56, 60,
- 38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 39, 37,
- 40, 42, 50, 58, 60, 65, 42, 40, 42, 44, 52, 61, 64, 69, 44, 41, 42, 45,
- 53, 63, 66, 71, 44, 41, 43, 45, 54, 63, 66, 72, 47, 44, 45, 47, 56, 66,
- 69, 75, 48, 45, 46, 48, 56, 67, 70, 76, 49, 46, 47, 48, 57, 67, 71, 77,
- 53, 49, 49, 51, 59, 71, 74, 81, 53, 49, 50, 51, 60, 71, 75, 82, 55, 51,
- 51, 53, 61, 72, 76, 83, 58, 54, 54, 55, 63, 75, 79, 87, 58, 54, 54, 55,
- 63, 75, 79, 87,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 35, 36, 36,
38, 39, 39, 42, 44, 44, 47, 48, 49, 53, 53, 55, 58, 58, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 36, 37, 37, 40,
@@ -3950,7 +3934,23 @@
44, 45, 45, 45, 45, 45, 47, 50, 50, 53, 56, 56, 58, 60, 60, 64, 66, 66,
69, 70, 71, 74, 75, 76, 79, 79, 53, 52, 51, 51, 49, 49, 49, 49, 50, 49,
49, 49, 51, 54, 54, 57, 60, 60, 63, 65, 65, 69, 71, 72, 75, 76, 77, 81,
- 82, 83, 87, 87 },
+ 82, 83, 87, 87,
+ /* Size 32x8 */
+ 32, 31, 32, 32, 36, 44, 47, 53, 31, 32, 32, 33, 35, 43, 46, 52, 31, 32,
+ 32, 33, 35, 42, 45, 51, 31, 32, 32, 33, 35, 42, 45, 51, 31, 32, 32, 33,
+ 35, 41, 44, 49, 31, 32, 32, 33, 34, 41, 44, 49, 31, 32, 33, 33, 35, 41,
+ 44, 49, 32, 32, 33, 34, 36, 42, 45, 49, 32, 32, 34, 34, 36, 42, 45, 50,
+ 32, 32, 34, 35, 37, 42, 45, 49, 32, 33, 35, 36, 38, 42, 45, 49, 32, 33,
+ 35, 36, 38, 42, 45, 49, 32, 33, 35, 36, 40, 44, 47, 51, 34, 34, 36, 38,
+ 42, 48, 50, 54, 34, 34, 36, 38, 42, 48, 50, 54, 35, 34, 37, 39, 45, 50,
+ 53, 57, 36, 34, 37, 40, 48, 54, 56, 60, 36, 34, 37, 40, 48, 54, 56, 60,
+ 38, 36, 39, 41, 49, 56, 58, 63, 39, 37, 40, 42, 50, 58, 60, 65, 39, 37,
+ 40, 42, 50, 58, 60, 65, 42, 40, 42, 44, 52, 61, 64, 69, 44, 41, 42, 45,
+ 53, 63, 66, 71, 44, 41, 43, 45, 54, 63, 66, 72, 47, 44, 45, 47, 56, 66,
+ 69, 75, 48, 45, 46, 48, 56, 67, 70, 76, 49, 46, 47, 48, 57, 67, 71, 77,
+ 53, 49, 49, 51, 59, 71, 74, 81, 53, 49, 50, 51, 60, 71, 75, 82, 55, 51,
+ 51, 53, 61, 72, 76, 83, 58, 54, 54, 55, 63, 75, 79, 87, 58, 54, 54, 55,
+ 63, 75, 79, 87 },
{ /* Chroma */
/* Size 4x4 */
31, 38, 47, 49, 38, 47, 46, 46, 47, 46, 54, 57, 49, 46, 57, 66,
@@ -4034,21 +4034,12 @@
71, 71, 54, 53, 52, 52, 50, 49, 49, 49, 49, 49, 48, 48, 49, 52, 52, 53,
55, 55, 57, 58, 58, 61, 62, 63, 64, 65, 66, 68, 68, 69, 71, 71,
/* Size 4x8 */
- 31, 38, 47, 50, 31, 40, 46, 48, 36, 44, 47, 47, 42, 47, 50, 50, 47, 48,
- 53, 54, 46, 46, 54, 60, 48, 46, 55, 64, 50, 48, 56, 67,
- /* Size 8x4 */
31, 31, 36, 42, 47, 46, 48, 50, 38, 40, 44, 47, 48, 46, 46, 48, 47, 46,
47, 50, 53, 54, 55, 56, 50, 48, 47, 50, 54, 60, 64, 67,
+ /* Size 8x4 */
+ 31, 38, 47, 50, 31, 40, 46, 48, 36, 44, 47, 47, 42, 47, 50, 50, 47, 48,
+ 53, 54, 46, 46, 54, 60, 48, 46, 55, 64, 50, 48, 56, 67,
/* Size 8x16 */
- 32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32,
- 38, 40, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45, 46, 48, 33, 36, 41, 44,
- 47, 46, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47,
- 47, 48, 42, 43, 46, 48, 50, 49, 50, 50, 49, 46, 48, 49, 53, 53, 54, 54,
- 48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 49, 45,
- 45, 47, 53, 58, 59, 61, 50, 46, 46, 48, 54, 59, 61, 63, 51, 47, 47, 48,
- 54, 60, 61, 64, 52, 48, 47, 48, 54, 61, 63, 66, 54, 50, 49, 50, 55, 62,
- 65, 68,
- /* Size 16x8 */
32, 31, 30, 31, 33, 37, 39, 42, 49, 48, 48, 49, 50, 51, 52, 54, 31, 31,
32, 33, 36, 40, 41, 43, 46, 46, 46, 45, 46, 47, 48, 50, 35, 37, 38, 38,
41, 45, 46, 46, 48, 47, 46, 45, 46, 47, 47, 49, 38, 40, 40, 41, 44, 47,
@@ -4057,37 +4048,16 @@
56, 58, 59, 60, 61, 62, 50, 48, 46, 46, 46, 46, 47, 50, 54, 55, 56, 59,
61, 61, 63, 65, 52, 50, 48, 48, 47, 47, 48, 50, 54, 56, 57, 61, 63, 64,
66, 68,
+ /* Size 16x8 */
+ 32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32,
+ 38, 40, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45, 46, 48, 33, 36, 41, 44,
+ 47, 46, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47,
+ 47, 48, 42, 43, 46, 48, 50, 49, 50, 50, 49, 46, 48, 49, 53, 53, 54, 54,
+ 48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 49, 45,
+ 45, 47, 53, 58, 59, 61, 50, 46, 46, 48, 54, 59, 61, 63, 51, 47, 47, 48,
+ 54, 60, 61, 64, 52, 48, 47, 48, 54, 61, 63, 66, 54, 50, 49, 50, 55, 62,
+ 65, 68,
/* Size 16x32 */
- 32, 31, 31, 31, 35, 37, 38, 47, 48, 48, 49, 49, 50, 52, 52, 54, 31, 31,
- 31, 32, 36, 38, 39, 46, 47, 47, 48, 48, 49, 50, 50, 53, 31, 31, 31, 32,
- 37, 38, 40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 31, 31, 31, 32, 37, 38,
- 40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 30, 31, 32, 32, 38, 39, 40, 45,
- 46, 46, 45, 45, 46, 48, 48, 50, 30, 31, 32, 33, 38, 40, 41, 45, 46, 46,
- 45, 45, 46, 48, 48, 50, 31, 32, 33, 33, 38, 40, 41, 45, 46, 46, 45, 45,
- 46, 48, 48, 50, 33, 35, 35, 36, 41, 43, 43, 46, 47, 46, 45, 45, 46, 47,
- 47, 49, 33, 35, 36, 36, 41, 43, 44, 46, 47, 46, 46, 46, 46, 47, 47, 49,
- 34, 36, 37, 37, 42, 44, 45, 47, 47, 47, 45, 45, 46, 47, 47, 49, 37, 39,
- 40, 41, 45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 37, 39, 40, 41,
- 45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 39, 40, 41, 42, 46, 47,
- 47, 48, 48, 48, 47, 47, 47, 48, 48, 50, 42, 42, 43, 43, 46, 47, 48, 50,
- 50, 50, 49, 49, 50, 50, 50, 52, 42, 42, 43, 43, 46, 47, 48, 50, 50, 50,
- 49, 49, 50, 50, 50, 52, 45, 45, 44, 45, 47, 47, 48, 51, 51, 51, 51, 51,
- 52, 52, 52, 54, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54,
- 54, 55, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54, 54, 55,
- 48, 47, 46, 46, 47, 47, 48, 52, 53, 53, 55, 55, 55, 56, 56, 57, 48, 46,
- 46, 46, 46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 48, 46, 46, 46,
- 46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 49, 46, 45, 45, 46, 46,
- 47, 52, 53, 54, 57, 57, 58, 60, 60, 61, 49, 46, 45, 45, 45, 46, 47, 52,
- 53, 55, 58, 58, 59, 61, 61, 62, 49, 46, 45, 45, 46, 46, 47, 52, 53, 55,
- 58, 58, 60, 61, 61, 63, 50, 47, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59,
- 61, 63, 63, 65, 50, 48, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59, 61, 64,
- 64, 65, 51, 48, 47, 47, 47, 47, 48, 53, 54, 55, 60, 60, 61, 64, 64, 66,
- 52, 49, 48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 52, 49,
- 48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 53, 50, 48, 48,
- 48, 48, 49, 54, 54, 56, 61, 61, 63, 67, 67, 69, 54, 51, 50, 50, 49, 49,
- 50, 55, 55, 57, 62, 62, 65, 68, 68, 71, 54, 51, 50, 50, 49, 49, 50, 55,
- 55, 57, 62, 62, 65, 68, 68, 71,
- /* Size 32x16 */
32, 31, 31, 31, 30, 30, 31, 33, 33, 34, 37, 37, 39, 42, 42, 45, 49, 49,
48, 48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 54, 54, 31, 31, 31, 31,
31, 31, 32, 35, 35, 36, 39, 39, 40, 42, 42, 45, 47, 47, 47, 46, 46, 46,
@@ -4117,33 +4087,47 @@
57, 60, 61, 61, 63, 64, 64, 66, 66, 67, 68, 68, 54, 53, 52, 52, 50, 50,
50, 49, 49, 49, 48, 48, 50, 52, 52, 54, 55, 55, 57, 59, 59, 61, 62, 63,
65, 65, 66, 68, 68, 69, 71, 71,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 35, 37, 38, 47, 48, 48, 49, 49, 50, 52, 52, 54, 31, 31,
+ 31, 32, 36, 38, 39, 46, 47, 47, 48, 48, 49, 50, 50, 53, 31, 31, 31, 32,
+ 37, 38, 40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 31, 31, 31, 32, 37, 38,
+ 40, 46, 47, 47, 47, 47, 48, 50, 50, 52, 30, 31, 32, 32, 38, 39, 40, 45,
+ 46, 46, 45, 45, 46, 48, 48, 50, 30, 31, 32, 33, 38, 40, 41, 45, 46, 46,
+ 45, 45, 46, 48, 48, 50, 31, 32, 33, 33, 38, 40, 41, 45, 46, 46, 45, 45,
+ 46, 48, 48, 50, 33, 35, 35, 36, 41, 43, 43, 46, 47, 46, 45, 45, 46, 47,
+ 47, 49, 33, 35, 36, 36, 41, 43, 44, 46, 47, 46, 46, 46, 46, 47, 47, 49,
+ 34, 36, 37, 37, 42, 44, 45, 47, 47, 47, 45, 45, 46, 47, 47, 49, 37, 39,
+ 40, 41, 45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 37, 39, 40, 41,
+ 45, 47, 47, 47, 47, 47, 45, 45, 46, 47, 47, 48, 39, 40, 41, 42, 46, 47,
+ 47, 48, 48, 48, 47, 47, 47, 48, 48, 50, 42, 42, 43, 43, 46, 47, 48, 50,
+ 50, 50, 49, 49, 50, 50, 50, 52, 42, 42, 43, 43, 46, 47, 48, 50, 50, 50,
+ 49, 49, 50, 50, 50, 52, 45, 45, 44, 45, 47, 47, 48, 51, 51, 51, 51, 51,
+ 52, 52, 52, 54, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54,
+ 54, 55, 49, 47, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 54, 54, 54, 55,
+ 48, 47, 46, 46, 47, 47, 48, 52, 53, 53, 55, 55, 55, 56, 56, 57, 48, 46,
+ 46, 46, 46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 48, 46, 46, 46,
+ 46, 47, 48, 52, 53, 54, 56, 56, 56, 57, 57, 59, 49, 46, 45, 45, 46, 46,
+ 47, 52, 53, 54, 57, 57, 58, 60, 60, 61, 49, 46, 45, 45, 45, 46, 47, 52,
+ 53, 55, 58, 58, 59, 61, 61, 62, 49, 46, 45, 45, 46, 46, 47, 52, 53, 55,
+ 58, 58, 60, 61, 61, 63, 50, 47, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59,
+ 61, 63, 63, 65, 50, 48, 46, 46, 46, 46, 48, 53, 54, 55, 59, 59, 61, 64,
+ 64, 65, 51, 48, 47, 47, 47, 47, 48, 53, 54, 55, 60, 60, 61, 64, 64, 66,
+ 52, 49, 48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 52, 49,
+ 48, 48, 47, 47, 48, 53, 54, 56, 61, 61, 63, 66, 66, 68, 53, 50, 48, 48,
+ 48, 48, 49, 54, 54, 56, 61, 61, 63, 67, 67, 69, 54, 51, 50, 50, 49, 49,
+ 50, 55, 55, 57, 62, 62, 65, 68, 68, 71, 54, 51, 50, 50, 49, 49, 50, 55,
+ 55, 57, 62, 62, 65, 68, 68, 71,
/* Size 4x16 */
- 31, 37, 48, 52, 31, 38, 47, 50, 31, 39, 46, 48, 32, 40, 46, 48, 35, 43,
- 46, 47, 39, 47, 47, 47, 40, 47, 48, 48, 42, 47, 50, 50, 47, 48, 53, 54,
- 47, 47, 53, 56, 46, 47, 54, 57, 46, 46, 55, 61, 47, 46, 55, 63, 48, 47,
- 55, 64, 49, 47, 56, 66, 51, 49, 57, 68,
- /* Size 16x4 */
31, 31, 31, 32, 35, 39, 40, 42, 47, 47, 46, 46, 47, 48, 49, 51, 37, 38,
39, 40, 43, 47, 47, 47, 48, 47, 47, 46, 46, 47, 47, 49, 48, 47, 46, 46,
46, 47, 48, 50, 53, 53, 54, 55, 55, 55, 56, 57, 52, 50, 48, 48, 47, 47,
48, 50, 54, 56, 57, 61, 63, 64, 66, 68,
+ /* Size 16x4 */
+ 31, 37, 48, 52, 31, 38, 47, 50, 31, 39, 46, 48, 32, 40, 46, 48, 35, 43,
+ 46, 47, 39, 47, 47, 47, 40, 47, 48, 48, 42, 47, 50, 50, 47, 48, 53, 54,
+ 47, 47, 53, 56, 46, 47, 54, 57, 46, 46, 55, 61, 47, 46, 55, 63, 48, 47,
+ 55, 64, 49, 47, 56, 66, 51, 49, 57, 68,
/* Size 8x32 */
- 32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 36, 39, 47, 48, 49, 50, 31, 31,
- 37, 40, 47, 47, 48, 50, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32, 38, 40,
- 46, 45, 46, 48, 30, 32, 38, 41, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45,
- 46, 48, 33, 35, 41, 43, 47, 45, 46, 47, 33, 36, 41, 44, 47, 46, 46, 47,
- 34, 37, 42, 45, 47, 45, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 37, 40,
- 45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47, 47, 48, 42, 43, 46, 48,
- 50, 49, 50, 50, 42, 43, 46, 48, 50, 49, 50, 50, 45, 44, 47, 48, 51, 51,
- 52, 52, 49, 46, 48, 49, 53, 53, 54, 54, 49, 46, 48, 49, 53, 53, 54, 54,
- 48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 48, 46,
- 46, 48, 53, 56, 56, 57, 49, 45, 46, 47, 53, 57, 58, 60, 49, 45, 45, 47,
- 53, 58, 59, 61, 49, 45, 46, 47, 53, 58, 60, 61, 50, 46, 46, 48, 54, 59,
- 61, 63, 50, 46, 46, 48, 54, 59, 61, 64, 51, 47, 47, 48, 54, 60, 61, 64,
- 52, 48, 47, 48, 54, 61, 63, 66, 52, 48, 47, 48, 54, 61, 63, 66, 53, 48,
- 48, 49, 54, 61, 63, 67, 54, 50, 49, 50, 55, 62, 65, 68, 54, 50, 49, 50,
- 55, 62, 65, 68,
- /* Size 32x8 */
32, 31, 31, 31, 30, 30, 31, 33, 33, 34, 37, 37, 39, 42, 42, 45, 49, 49,
48, 48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 54, 54, 31, 31, 31, 31,
32, 32, 33, 35, 36, 37, 40, 40, 41, 43, 43, 44, 46, 46, 46, 46, 46, 45,
@@ -4158,7 +4142,23 @@
46, 46, 46, 46, 46, 46, 47, 50, 50, 52, 54, 54, 55, 56, 56, 58, 59, 60,
61, 61, 61, 63, 63, 63, 65, 65, 52, 50, 50, 50, 48, 48, 48, 47, 47, 47,
47, 47, 48, 50, 50, 52, 54, 54, 56, 57, 57, 60, 61, 61, 63, 64, 64, 66,
- 66, 67, 68, 68 },
+ 66, 67, 68, 68,
+ /* Size 32x8 */
+ 32, 31, 35, 38, 48, 49, 50, 52, 31, 31, 36, 39, 47, 48, 49, 50, 31, 31,
+ 37, 40, 47, 47, 48, 50, 31, 31, 37, 40, 47, 47, 48, 50, 30, 32, 38, 40,
+ 46, 45, 46, 48, 30, 32, 38, 41, 46, 45, 46, 48, 31, 33, 38, 41, 46, 45,
+ 46, 48, 33, 35, 41, 43, 47, 45, 46, 47, 33, 36, 41, 44, 47, 46, 46, 47,
+ 34, 37, 42, 45, 47, 45, 46, 47, 37, 40, 45, 47, 47, 45, 46, 47, 37, 40,
+ 45, 47, 47, 45, 46, 47, 39, 41, 46, 47, 48, 47, 47, 48, 42, 43, 46, 48,
+ 50, 49, 50, 50, 42, 43, 46, 48, 50, 49, 50, 50, 45, 44, 47, 48, 51, 51,
+ 52, 52, 49, 46, 48, 49, 53, 53, 54, 54, 49, 46, 48, 49, 53, 53, 54, 54,
+ 48, 46, 47, 48, 53, 55, 55, 56, 48, 46, 46, 48, 53, 56, 56, 57, 48, 46,
+ 46, 48, 53, 56, 56, 57, 49, 45, 46, 47, 53, 57, 58, 60, 49, 45, 45, 47,
+ 53, 58, 59, 61, 49, 45, 46, 47, 53, 58, 60, 61, 50, 46, 46, 48, 54, 59,
+ 61, 63, 50, 46, 46, 48, 54, 59, 61, 64, 51, 47, 47, 48, 54, 60, 61, 64,
+ 52, 48, 47, 48, 54, 61, 63, 66, 52, 48, 47, 48, 54, 61, 63, 66, 53, 48,
+ 48, 49, 54, 61, 63, 67, 54, 50, 49, 50, 55, 62, 65, 68, 54, 50, 49, 50,
+ 55, 62, 65, 68 },
},
{
{ /* Luma */
@@ -4244,21 +4244,12 @@
71, 74, 51, 50, 49, 49, 48, 47, 47, 47, 48, 48, 48, 48, 48, 48, 50, 53,
53, 54, 57, 58, 58, 61, 63, 63, 66, 69, 69, 70, 73, 74, 74, 77,
/* Size 4x8 */
- 31, 32, 35, 43, 32, 33, 34, 41, 32, 34, 36, 42, 32, 35, 38, 42, 34, 37,
- 43, 49, 37, 40, 49, 56, 42, 43, 53, 63, 46, 46, 56, 67,
- /* Size 8x4 */
31, 32, 32, 32, 34, 37, 42, 46, 32, 33, 34, 35, 37, 40, 43, 46, 35, 34,
36, 38, 43, 49, 53, 56, 43, 41, 42, 42, 49, 56, 63, 67,
+ /* Size 8x4 */
+ 31, 32, 35, 43, 32, 33, 34, 41, 32, 34, 36, 42, 32, 35, 38, 42, 34, 37,
+ 43, 49, 37, 40, 49, 56, 42, 43, 53, 63, 46, 46, 56, 67,
/* Size 8x16 */
- 32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32,
- 32, 32, 34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 33, 34,
- 35, 36, 42, 44, 32, 32, 33, 34, 36, 36, 42, 45, 32, 33, 34, 35, 37, 38,
- 42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42, 48, 50,
- 35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 39, 37,
- 39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51, 60, 62, 44, 41, 42, 43,
- 51, 53, 63, 66, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45, 45, 46, 54, 56,
- 67, 70,
- /* Size 16x8 */
32, 31, 31, 31, 31, 32, 32, 32, 34, 35, 36, 39, 41, 44, 47, 48, 31, 32,
32, 32, 32, 32, 33, 33, 34, 34, 34, 37, 39, 41, 44, 45, 31, 32, 32, 32,
33, 33, 34, 34, 35, 36, 36, 39, 40, 42, 44, 45, 32, 32, 32, 33, 34, 34,
@@ -4267,37 +4258,16 @@
48, 50, 51, 53, 56, 56, 44, 42, 41, 41, 42, 42, 42, 44, 48, 52, 54, 58,
60, 63, 66, 67, 47, 45, 45, 44, 44, 45, 45, 47, 50, 55, 56, 60, 62, 66,
69, 70,
+ /* Size 16x8 */
+ 32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32,
+ 32, 32, 34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 33, 34,
+ 35, 36, 42, 44, 32, 32, 33, 34, 36, 36, 42, 45, 32, 33, 34, 35, 37, 38,
+ 42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42, 48, 50,
+ 35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 39, 37,
+ 39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51, 60, 62, 44, 41, 42, 43,
+ 51, 53, 63, 66, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45, 45, 46, 54, 56,
+ 67, 70,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 32, 32, 32, 35, 36, 36, 40, 44, 44, 47, 53, 31, 31,
- 32, 32, 32, 32, 32, 33, 35, 35, 35, 39, 43, 43, 46, 52, 31, 32, 32, 32,
- 32, 32, 32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32,
- 32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32, 32, 33,
- 34, 35, 35, 39, 41, 41, 45, 50, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34,
- 34, 38, 41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 38,
- 41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 35, 35, 38, 41, 41,
- 44, 49, 31, 32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 39, 42, 42, 44, 49,
- 32, 32, 32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32,
- 32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32, 32, 32,
- 33, 35, 35, 35, 37, 37, 37, 40, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35,
- 35, 36, 37, 38, 38, 41, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35, 35, 36,
- 37, 38, 38, 41, 42, 42, 45, 49, 32, 33, 33, 33, 34, 36, 36, 36, 39, 40,
- 40, 42, 44, 44, 47, 51, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45,
- 48, 48, 50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45, 48, 48,
- 50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 42, 43, 43, 46, 49, 49, 51, 55,
- 35, 35, 34, 34, 36, 38, 38, 39, 45, 47, 47, 50, 52, 52, 55, 59, 36, 35,
- 34, 34, 36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 36, 35, 34, 34,
- 36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 38, 37, 36, 36, 37, 40,
- 40, 41, 47, 49, 49, 53, 56, 56, 58, 63, 39, 38, 37, 37, 39, 40, 40, 42,
- 48, 50, 50, 54, 58, 58, 60, 65, 39, 38, 37, 37, 39, 40, 40, 42, 48, 50,
- 50, 54, 58, 58, 60, 65, 41, 40, 39, 39, 40, 41, 41, 43, 49, 51, 51, 56,
- 60, 60, 62, 67, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63,
- 66, 71, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63, 66, 71,
- 44, 43, 42, 42, 42, 43, 43, 45, 51, 54, 54, 59, 64, 64, 67, 72, 47, 45,
- 44, 44, 44, 45, 45, 47, 53, 56, 56, 61, 66, 66, 69, 75, 48, 46, 45, 45,
- 45, 46, 46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 48, 46, 45, 45, 45, 46,
- 46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 51, 49, 47, 47, 48, 48, 48, 50,
- 56, 58, 58, 64, 69, 69, 73, 79,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34,
35, 36, 36, 38, 39, 39, 41, 44, 44, 44, 47, 48, 48, 51, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 35, 35, 35, 37,
@@ -4327,33 +4297,47 @@
56, 58, 60, 60, 62, 66, 66, 67, 69, 70, 70, 73, 53, 52, 51, 51, 50, 49,
49, 49, 49, 50, 50, 49, 49, 49, 51, 54, 54, 55, 59, 60, 60, 63, 65, 65,
67, 71, 71, 72, 75, 76, 76, 79,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 32, 32, 32, 35, 36, 36, 40, 44, 44, 47, 53, 31, 31,
+ 32, 32, 32, 32, 32, 33, 35, 35, 35, 39, 43, 43, 46, 52, 31, 32, 32, 32,
+ 32, 32, 32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32,
+ 32, 33, 35, 35, 35, 39, 42, 42, 45, 51, 31, 32, 32, 32, 32, 32, 32, 33,
+ 34, 35, 35, 39, 41, 41, 45, 50, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34,
+ 34, 38, 41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 38,
+ 41, 41, 44, 49, 31, 32, 32, 32, 32, 33, 33, 33, 34, 35, 35, 38, 41, 41,
+ 44, 49, 31, 32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 39, 42, 42, 44, 49,
+ 32, 32, 32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32,
+ 32, 32, 33, 34, 34, 34, 36, 36, 36, 39, 42, 42, 45, 50, 32, 32, 32, 32,
+ 33, 35, 35, 35, 37, 37, 37, 40, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35,
+ 35, 36, 37, 38, 38, 41, 42, 42, 45, 49, 32, 32, 33, 33, 34, 35, 35, 36,
+ 37, 38, 38, 41, 42, 42, 45, 49, 32, 33, 33, 33, 34, 36, 36, 36, 39, 40,
+ 40, 42, 44, 44, 47, 51, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45,
+ 48, 48, 50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 41, 42, 42, 45, 48, 48,
+ 50, 54, 34, 34, 34, 34, 35, 37, 37, 38, 42, 43, 43, 46, 49, 49, 51, 55,
+ 35, 35, 34, 34, 36, 38, 38, 39, 45, 47, 47, 50, 52, 52, 55, 59, 36, 35,
+ 34, 34, 36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 36, 35, 34, 34,
+ 36, 38, 38, 40, 46, 48, 48, 51, 54, 54, 56, 60, 38, 37, 36, 36, 37, 40,
+ 40, 41, 47, 49, 49, 53, 56, 56, 58, 63, 39, 38, 37, 37, 39, 40, 40, 42,
+ 48, 50, 50, 54, 58, 58, 60, 65, 39, 38, 37, 37, 39, 40, 40, 42, 48, 50,
+ 50, 54, 58, 58, 60, 65, 41, 40, 39, 39, 40, 41, 41, 43, 49, 51, 51, 56,
+ 60, 60, 62, 67, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63,
+ 66, 71, 44, 42, 41, 41, 42, 43, 43, 45, 51, 53, 53, 59, 63, 63, 66, 71,
+ 44, 43, 42, 42, 42, 43, 43, 45, 51, 54, 54, 59, 64, 64, 67, 72, 47, 45,
+ 44, 44, 44, 45, 45, 47, 53, 56, 56, 61, 66, 66, 69, 75, 48, 46, 45, 45,
+ 45, 46, 46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 48, 46, 45, 45, 45, 46,
+ 46, 48, 54, 56, 56, 62, 67, 67, 70, 76, 51, 49, 47, 47, 48, 48, 48, 50,
+ 56, 58, 58, 64, 69, 69, 73, 79,
/* Size 4x16 */
- 31, 32, 36, 44, 32, 32, 35, 42, 32, 32, 35, 41, 32, 33, 34, 41, 32, 34,
- 36, 42, 32, 34, 36, 42, 32, 35, 38, 42, 33, 36, 40, 44, 34, 37, 42, 48,
- 35, 38, 47, 52, 35, 38, 48, 54, 38, 40, 50, 58, 40, 41, 51, 60, 42, 43,
- 53, 63, 45, 45, 56, 66, 46, 46, 56, 67,
- /* Size 16x4 */
31, 32, 32, 32, 32, 32, 32, 33, 34, 35, 35, 38, 40, 42, 45, 46, 32, 32,
32, 33, 34, 34, 35, 36, 37, 38, 38, 40, 41, 43, 45, 46, 36, 35, 35, 34,
36, 36, 38, 40, 42, 47, 48, 50, 51, 53, 56, 56, 44, 42, 41, 41, 42, 42,
42, 44, 48, 52, 54, 58, 60, 63, 66, 67,
+ /* Size 16x4 */
+ 31, 32, 36, 44, 32, 32, 35, 42, 32, 32, 35, 41, 32, 33, 34, 41, 32, 34,
+ 36, 42, 32, 34, 36, 42, 32, 35, 38, 42, 33, 36, 40, 44, 34, 37, 42, 48,
+ 35, 38, 47, 52, 35, 38, 48, 54, 38, 40, 50, 58, 40, 41, 51, 60, 42, 43,
+ 53, 63, 45, 45, 56, 66, 46, 46, 56, 67,
/* Size 8x32 */
- 32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 43, 46, 31, 32,
- 32, 32, 35, 35, 42, 45, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32, 32, 32,
- 34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 32, 33, 34, 34,
- 41, 44, 31, 32, 32, 33, 34, 35, 41, 44, 31, 32, 33, 34, 35, 36, 42, 44,
- 32, 32, 33, 34, 36, 36, 42, 45, 32, 32, 33, 34, 36, 36, 42, 45, 32, 32,
- 33, 35, 37, 37, 42, 45, 32, 33, 34, 35, 37, 38, 42, 45, 32, 33, 34, 35,
- 37, 38, 42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42,
- 48, 50, 34, 34, 35, 37, 41, 42, 48, 50, 34, 34, 35, 37, 42, 43, 49, 51,
- 35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 36, 34,
- 36, 38, 46, 48, 54, 56, 38, 36, 37, 40, 47, 49, 56, 58, 39, 37, 39, 40,
- 48, 50, 58, 60, 39, 37, 39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51,
- 60, 62, 44, 41, 42, 43, 51, 53, 63, 66, 44, 41, 42, 43, 51, 53, 63, 66,
- 44, 42, 42, 43, 51, 54, 64, 67, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45,
- 45, 46, 54, 56, 67, 70, 48, 45, 45, 46, 54, 56, 67, 70, 51, 47, 48, 48,
- 56, 58, 69, 73,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 34, 34,
35, 36, 36, 38, 39, 39, 41, 44, 44, 44, 47, 48, 48, 51, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 34, 36,
@@ -4368,7 +4352,23 @@
41, 41, 42, 42, 42, 42, 42, 42, 44, 48, 48, 49, 52, 54, 54, 56, 58, 58,
60, 63, 63, 64, 66, 67, 67, 69, 47, 46, 45, 45, 45, 44, 44, 44, 44, 45,
45, 45, 45, 45, 47, 50, 50, 51, 55, 56, 56, 58, 60, 60, 62, 66, 66, 67,
- 69, 70, 70, 73 },
+ 69, 70, 70, 73,
+ /* Size 32x8 */
+ 32, 31, 31, 32, 35, 36, 44, 47, 31, 32, 32, 32, 35, 35, 43, 46, 31, 32,
+ 32, 32, 35, 35, 42, 45, 31, 32, 32, 32, 35, 35, 42, 45, 31, 32, 32, 32,
+ 34, 35, 41, 45, 31, 32, 32, 33, 34, 34, 41, 44, 31, 32, 32, 33, 34, 34,
+ 41, 44, 31, 32, 32, 33, 34, 35, 41, 44, 31, 32, 33, 34, 35, 36, 42, 44,
+ 32, 32, 33, 34, 36, 36, 42, 45, 32, 32, 33, 34, 36, 36, 42, 45, 32, 32,
+ 33, 35, 37, 37, 42, 45, 32, 33, 34, 35, 37, 38, 42, 45, 32, 33, 34, 35,
+ 37, 38, 42, 45, 32, 33, 34, 36, 39, 40, 44, 47, 34, 34, 35, 37, 41, 42,
+ 48, 50, 34, 34, 35, 37, 41, 42, 48, 50, 34, 34, 35, 37, 42, 43, 49, 51,
+ 35, 34, 36, 38, 45, 47, 52, 55, 36, 34, 36, 38, 46, 48, 54, 56, 36, 34,
+ 36, 38, 46, 48, 54, 56, 38, 36, 37, 40, 47, 49, 56, 58, 39, 37, 39, 40,
+ 48, 50, 58, 60, 39, 37, 39, 40, 48, 50, 58, 60, 41, 39, 40, 41, 49, 51,
+ 60, 62, 44, 41, 42, 43, 51, 53, 63, 66, 44, 41, 42, 43, 51, 53, 63, 66,
+ 44, 42, 42, 43, 51, 54, 64, 67, 47, 44, 44, 45, 53, 56, 66, 69, 48, 45,
+ 45, 46, 54, 56, 67, 70, 48, 45, 45, 46, 54, 56, 67, 70, 51, 47, 48, 48,
+ 56, 58, 69, 73 },
{ /* Chroma */
/* Size 4x4 */
31, 37, 47, 47, 37, 44, 47, 45, 47, 47, 53, 53, 47, 45, 53, 59,
@@ -4452,21 +4452,12 @@
61, 63, 51, 50, 49, 49, 48, 47, 47, 47, 47, 47, 47, 47, 46, 46, 48, 50,
50, 51, 53, 54, 54, 56, 57, 57, 58, 60, 60, 61, 62, 63, 63, 64,
/* Size 4x8 */
- 31, 38, 47, 48, 31, 40, 46, 45, 35, 43, 47, 46, 39, 47, 47, 45, 43, 47,
- 50, 50, 47, 47, 53, 55, 46, 46, 53, 58, 48, 46, 54, 59,
- /* Size 8x4 */
31, 31, 35, 39, 43, 47, 46, 48, 38, 40, 43, 47, 47, 47, 46, 46, 47, 46,
47, 47, 50, 53, 53, 54, 48, 45, 46, 45, 50, 55, 58, 59,
+ /* Size 8x4 */
+ 31, 38, 47, 48, 31, 40, 46, 45, 35, 43, 47, 46, 39, 47, 47, 45, 43, 47,
+ 50, 50, 47, 47, 53, 55, 46, 46, 53, 58, 48, 46, 54, 59,
/* Size 8x16 */
- 32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32,
- 34, 39, 45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 33, 35, 37, 42,
- 46, 47, 45, 46, 33, 36, 38, 43, 46, 47, 46, 46, 37, 40, 43, 47, 47, 47,
- 45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50, 49, 50,
- 47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 48, 46,
- 46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53, 57, 57, 49, 45, 45, 46,
- 51, 53, 58, 59, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54,
- 59, 61,
- /* Size 16x8 */
32, 31, 31, 30, 33, 33, 37, 39, 42, 47, 49, 48, 48, 49, 50, 50, 31, 31,
32, 32, 35, 36, 40, 41, 43, 46, 46, 46, 45, 45, 46, 46, 33, 34, 34, 35,
37, 38, 43, 43, 44, 46, 47, 46, 46, 45, 46, 46, 37, 38, 39, 40, 42, 43,
@@ -4475,37 +4466,16 @@
53, 53, 53, 53, 54, 54, 49, 47, 46, 45, 45, 46, 45, 47, 49, 53, 53, 56,
57, 58, 59, 59, 50, 48, 47, 46, 46, 46, 46, 47, 50, 53, 54, 56, 57, 59,
61, 61,
+ /* Size 16x8 */
+ 32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32,
+ 34, 39, 45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 33, 35, 37, 42,
+ 46, 47, 45, 46, 33, 36, 38, 43, 46, 47, 46, 46, 37, 40, 43, 47, 47, 47,
+ 45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50, 49, 50,
+ 47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 48, 46,
+ 46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53, 57, 57, 49, 45, 45, 46,
+ 51, 53, 58, 59, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54,
+ 59, 61,
/* Size 16x32 */
- 32, 31, 31, 31, 33, 37, 37, 38, 45, 48, 48, 49, 49, 49, 50, 52, 31, 31,
- 31, 31, 33, 38, 38, 39, 45, 47, 47, 48, 48, 48, 49, 51, 31, 31, 31, 31,
- 34, 38, 38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 31, 31, 34, 38,
- 38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 32, 32, 34, 39, 39, 40,
- 45, 46, 46, 46, 46, 46, 47, 49, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46,
- 46, 45, 45, 45, 46, 48, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46, 46, 45,
- 45, 45, 46, 48, 31, 32, 33, 33, 35, 40, 40, 41, 45, 46, 46, 45, 45, 45,
- 46, 48, 33, 34, 35, 35, 37, 42, 42, 43, 46, 47, 47, 46, 45, 45, 46, 47,
- 33, 35, 36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 33, 35,
- 36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 35, 37, 38, 38,
- 41, 45, 45, 46, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47,
- 47, 47, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47, 47, 47,
- 47, 47, 47, 46, 45, 45, 46, 47, 39, 40, 41, 41, 43, 47, 47, 47, 48, 48,
- 48, 47, 47, 47, 47, 48, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49,
- 49, 49, 50, 50, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49, 49, 49,
- 50, 50, 43, 43, 43, 43, 45, 47, 47, 48, 50, 50, 50, 50, 50, 50, 50, 51,
- 47, 46, 46, 46, 46, 48, 48, 48, 51, 52, 52, 52, 53, 53, 53, 53, 49, 47,
- 46, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 49, 47, 46, 46,
- 47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 48, 47, 46, 46, 46, 47,
- 47, 48, 52, 53, 53, 54, 55, 55, 55, 56, 48, 47, 46, 46, 46, 47, 47, 48,
- 51, 53, 53, 54, 56, 56, 56, 57, 48, 47, 46, 46, 46, 47, 47, 48, 51, 53,
- 53, 54, 56, 56, 56, 57, 48, 47, 45, 45, 46, 46, 46, 47, 51, 53, 53, 55,
- 57, 57, 57, 59, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58,
- 59, 61, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58, 59, 61,
- 49, 47, 45, 45, 45, 46, 46, 47, 52, 53, 53, 56, 58, 58, 60, 62, 50, 48,
- 46, 46, 46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 63, 50, 48, 46, 46,
- 46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 50, 48, 46, 46, 46, 46,
- 46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 51, 49, 47, 47, 47, 47, 47, 48,
- 52, 54, 54, 58, 60, 60, 62, 65,
- /* Size 32x16 */
32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 42, 43,
47, 49, 49, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51, 31, 31, 31, 31,
31, 31, 31, 32, 34, 35, 35, 37, 39, 39, 40, 42, 42, 43, 46, 47, 47, 47,
@@ -4535,33 +4505,47 @@
54, 55, 56, 56, 57, 59, 59, 60, 61, 61, 61, 62, 52, 51, 50, 50, 49, 48,
48, 48, 47, 47, 47, 47, 47, 47, 48, 50, 50, 51, 53, 54, 54, 56, 57, 57,
59, 61, 61, 62, 63, 64, 64, 65,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 33, 37, 37, 38, 45, 48, 48, 49, 49, 49, 50, 52, 31, 31,
+ 31, 31, 33, 38, 38, 39, 45, 47, 47, 48, 48, 48, 49, 51, 31, 31, 31, 31,
+ 34, 38, 38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 31, 31, 34, 38,
+ 38, 40, 45, 47, 47, 47, 47, 47, 48, 50, 31, 31, 32, 32, 34, 39, 39, 40,
+ 45, 46, 46, 46, 46, 46, 47, 49, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46,
+ 46, 45, 45, 45, 46, 48, 30, 31, 32, 32, 35, 40, 40, 41, 44, 46, 46, 45,
+ 45, 45, 46, 48, 31, 32, 33, 33, 35, 40, 40, 41, 45, 46, 46, 45, 45, 45,
+ 46, 48, 33, 34, 35, 35, 37, 42, 42, 43, 46, 47, 47, 46, 45, 45, 46, 47,
+ 33, 35, 36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 33, 35,
+ 36, 36, 38, 43, 43, 44, 46, 47, 47, 46, 46, 46, 46, 47, 35, 37, 38, 38,
+ 41, 45, 45, 46, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47,
+ 47, 47, 47, 47, 47, 46, 45, 45, 46, 47, 37, 39, 40, 40, 43, 47, 47, 47,
+ 47, 47, 47, 46, 45, 45, 46, 47, 39, 40, 41, 41, 43, 47, 47, 47, 48, 48,
+ 48, 47, 47, 47, 47, 48, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49,
+ 49, 49, 50, 50, 42, 42, 43, 43, 44, 47, 47, 48, 49, 50, 50, 49, 49, 49,
+ 50, 50, 43, 43, 43, 43, 45, 47, 47, 48, 50, 50, 50, 50, 50, 50, 50, 51,
+ 47, 46, 46, 46, 46, 48, 48, 48, 51, 52, 52, 52, 53, 53, 53, 53, 49, 47,
+ 46, 46, 47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 49, 47, 46, 46,
+ 47, 48, 48, 49, 52, 53, 53, 53, 53, 53, 54, 54, 48, 47, 46, 46, 46, 47,
+ 47, 48, 52, 53, 53, 54, 55, 55, 55, 56, 48, 47, 46, 46, 46, 47, 47, 48,
+ 51, 53, 53, 54, 56, 56, 56, 57, 48, 47, 46, 46, 46, 47, 47, 48, 51, 53,
+ 53, 54, 56, 56, 56, 57, 48, 47, 45, 45, 46, 46, 46, 47, 51, 53, 53, 55,
+ 57, 57, 57, 59, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58,
+ 59, 61, 49, 46, 45, 45, 45, 46, 46, 47, 51, 53, 53, 56, 58, 58, 59, 61,
+ 49, 47, 45, 45, 45, 46, 46, 47, 52, 53, 53, 56, 58, 58, 60, 62, 50, 48,
+ 46, 46, 46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 63, 50, 48, 46, 46,
+ 46, 46, 46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 50, 48, 46, 46, 46, 46,
+ 46, 48, 52, 54, 54, 57, 59, 59, 61, 64, 51, 49, 47, 47, 47, 47, 47, 48,
+ 52, 54, 54, 58, 60, 60, 62, 65,
/* Size 4x16 */
- 31, 37, 48, 49, 31, 38, 47, 47, 31, 39, 46, 46, 31, 40, 46, 45, 34, 42,
- 47, 45, 35, 43, 47, 46, 39, 47, 47, 45, 40, 47, 48, 47, 42, 47, 50, 49,
- 46, 48, 52, 53, 47, 48, 53, 53, 47, 47, 53, 56, 47, 46, 53, 57, 46, 46,
- 53, 58, 48, 46, 54, 59, 48, 46, 54, 59,
- /* Size 16x4 */
31, 31, 31, 31, 34, 35, 39, 40, 42, 46, 47, 47, 47, 46, 48, 48, 37, 38,
39, 40, 42, 43, 47, 47, 47, 48, 48, 47, 46, 46, 46, 46, 48, 47, 46, 46,
47, 47, 47, 48, 50, 52, 53, 53, 53, 53, 54, 54, 49, 47, 46, 45, 45, 46,
45, 47, 49, 53, 53, 56, 57, 58, 59, 59,
+ /* Size 16x4 */
+ 31, 37, 48, 49, 31, 38, 47, 47, 31, 39, 46, 46, 31, 40, 46, 45, 34, 42,
+ 47, 45, 35, 43, 47, 46, 39, 47, 47, 45, 40, 47, 48, 47, 42, 47, 50, 49,
+ 46, 48, 52, 53, 47, 48, 53, 53, 47, 47, 53, 56, 47, 46, 53, 57, 46, 46,
+ 53, 58, 48, 46, 54, 59, 48, 46, 54, 59,
/* Size 8x32 */
- 32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 33, 38, 45, 47, 48, 49, 31, 31,
- 34, 38, 45, 47, 47, 48, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32, 34, 39,
- 45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 30, 32, 35, 40, 44, 46,
- 45, 46, 31, 33, 35, 40, 45, 46, 45, 46, 33, 35, 37, 42, 46, 47, 45, 46,
- 33, 36, 38, 43, 46, 47, 46, 46, 33, 36, 38, 43, 46, 47, 46, 46, 35, 38,
- 41, 45, 47, 47, 45, 46, 37, 40, 43, 47, 47, 47, 45, 46, 37, 40, 43, 47,
- 47, 47, 45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50,
- 49, 50, 42, 43, 44, 47, 49, 50, 49, 50, 43, 43, 45, 47, 50, 50, 50, 50,
- 47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 49, 46,
- 47, 48, 52, 53, 53, 54, 48, 46, 46, 47, 52, 53, 55, 55, 48, 46, 46, 47,
- 51, 53, 56, 56, 48, 46, 46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53,
- 57, 57, 49, 45, 45, 46, 51, 53, 58, 59, 49, 45, 45, 46, 51, 53, 58, 59,
- 49, 45, 45, 46, 52, 53, 58, 60, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46,
- 46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54, 59, 61, 51, 47, 47, 47,
- 52, 54, 60, 62,
- /* Size 32x8 */
32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 42, 43,
47, 49, 49, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 51, 31, 31, 31, 31,
32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 43, 43, 46, 46, 46, 46,
@@ -4576,7 +4560,23 @@
45, 45, 45, 46, 46, 45, 45, 45, 47, 49, 49, 50, 53, 53, 53, 55, 56, 56,
57, 58, 58, 58, 59, 59, 59, 60, 50, 49, 48, 48, 47, 46, 46, 46, 46, 46,
46, 46, 46, 46, 47, 50, 50, 50, 53, 54, 54, 55, 56, 56, 57, 59, 59, 60,
- 61, 61, 61, 62 },
+ 61, 61, 61, 62,
+ /* Size 32x8 */
+ 32, 31, 33, 37, 45, 48, 49, 50, 31, 31, 33, 38, 45, 47, 48, 49, 31, 31,
+ 34, 38, 45, 47, 47, 48, 31, 31, 34, 38, 45, 47, 47, 48, 31, 32, 34, 39,
+ 45, 46, 46, 47, 30, 32, 35, 40, 44, 46, 45, 46, 30, 32, 35, 40, 44, 46,
+ 45, 46, 31, 33, 35, 40, 45, 46, 45, 46, 33, 35, 37, 42, 46, 47, 45, 46,
+ 33, 36, 38, 43, 46, 47, 46, 46, 33, 36, 38, 43, 46, 47, 46, 46, 35, 38,
+ 41, 45, 47, 47, 45, 46, 37, 40, 43, 47, 47, 47, 45, 46, 37, 40, 43, 47,
+ 47, 47, 45, 46, 39, 41, 43, 47, 48, 48, 47, 47, 42, 43, 44, 47, 49, 50,
+ 49, 50, 42, 43, 44, 47, 49, 50, 49, 50, 43, 43, 45, 47, 50, 50, 50, 50,
+ 47, 46, 46, 48, 51, 52, 53, 53, 49, 46, 47, 48, 52, 53, 53, 54, 49, 46,
+ 47, 48, 52, 53, 53, 54, 48, 46, 46, 47, 52, 53, 55, 55, 48, 46, 46, 47,
+ 51, 53, 56, 56, 48, 46, 46, 47, 51, 53, 56, 56, 48, 45, 46, 46, 51, 53,
+ 57, 57, 49, 45, 45, 46, 51, 53, 58, 59, 49, 45, 45, 46, 51, 53, 58, 59,
+ 49, 45, 45, 46, 52, 53, 58, 60, 50, 46, 46, 46, 52, 54, 59, 61, 50, 46,
+ 46, 46, 52, 54, 59, 61, 50, 46, 46, 46, 52, 54, 59, 61, 51, 47, 47, 47,
+ 52, 54, 60, 62 },
},
{
{ /* Luma */
@@ -4662,21 +4662,12 @@
63, 63, 44, 43, 42, 42, 42, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42,
42, 45, 47, 47, 47, 50, 54, 54, 54, 56, 58, 58, 58, 60, 63, 63,
/* Size 4x8 */
- 31, 32, 34, 39, 32, 32, 34, 38, 32, 33, 34, 38, 32, 33, 36, 40, 33, 34,
- 38, 42, 34, 36, 41, 47, 37, 38, 44, 52, 40, 40, 46, 56,
- /* Size 8x4 */
31, 32, 32, 32, 33, 34, 37, 40, 32, 32, 33, 33, 34, 36, 38, 40, 34, 34,
34, 36, 38, 41, 44, 46, 39, 38, 38, 40, 42, 47, 52, 56,
+ /* Size 8x4 */
+ 31, 32, 34, 39, 32, 32, 34, 38, 32, 33, 34, 38, 32, 33, 36, 40, 33, 34,
+ 38, 42, 34, 36, 41, 47, 37, 38, 44, 52, 40, 40, 46, 56,
/* Size 8x16 */
- 32, 31, 31, 32, 32, 36, 36, 44, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32,
- 32, 32, 32, 35, 35, 42, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33,
- 33, 34, 34, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36,
- 36, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38, 38, 42,
- 34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 36, 34,
- 34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 39, 37, 37, 40,
- 40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 44, 41, 41, 43, 43, 53,
- 53, 63,
- /* Size 16x8 */
32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 34, 36, 36, 39, 39, 44, 31, 32,
32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 31, 32, 32, 32,
32, 32, 32, 33, 33, 34, 34, 34, 34, 37, 37, 41, 32, 32, 32, 33, 33, 34,
@@ -4685,37 +4676,16 @@
42, 48, 48, 50, 50, 53, 36, 35, 35, 34, 34, 36, 36, 38, 38, 42, 42, 48,
48, 50, 50, 53, 44, 42, 42, 41, 41, 42, 42, 42, 42, 48, 48, 54, 54, 58,
58, 63,
+ /* Size 16x8 */
+ 32, 31, 31, 32, 32, 36, 36, 44, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32,
+ 32, 32, 32, 35, 35, 42, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33,
+ 33, 34, 34, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36,
+ 36, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38, 38, 42,
+ 34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 36, 34,
+ 34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 39, 37, 37, 40,
+ 40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 44, 41, 41, 43, 43, 53,
+ 53, 63,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 36, 36, 39, 44, 44, 31, 31,
- 31, 31, 31, 32, 32, 32, 32, 34, 35, 35, 35, 39, 43, 43, 31, 32, 32, 32,
- 32, 32, 32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32,
- 32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32,
- 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32, 32, 34,
- 35, 35, 35, 38, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34,
- 34, 37, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37,
- 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37, 41, 41,
- 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 38, 41, 41, 32, 32,
- 32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32,
- 32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33,
- 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33, 34, 34,
- 34, 36, 37, 37, 37, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37,
- 38, 38, 38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38,
- 38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38, 38, 40,
- 42, 42, 33, 33, 33, 33, 33, 34, 36, 36, 36, 38, 40, 40, 40, 42, 45, 45,
- 34, 34, 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34,
- 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34, 34, 34,
- 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 35, 34, 34, 34, 34, 36,
- 37, 37, 37, 41, 45, 45, 45, 47, 50, 50, 36, 35, 34, 34, 34, 36, 38, 38,
- 38, 43, 48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43,
- 48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43, 48, 48,
- 48, 51, 54, 54, 37, 37, 36, 36, 36, 38, 39, 39, 39, 44, 49, 49, 49, 52,
- 56, 56, 39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58,
- 39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 39, 38,
- 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 41, 40, 39, 39,
- 39, 40, 42, 42, 42, 46, 52, 52, 52, 56, 60, 60, 44, 42, 41, 41, 41, 42,
- 43, 43, 43, 48, 53, 53, 53, 58, 63, 63, 44, 42, 41, 41, 41, 42, 43, 43,
- 43, 48, 53, 53, 53, 58, 63, 63,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
34, 34, 34, 35, 36, 36, 36, 37, 39, 39, 39, 41, 44, 44, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34,
@@ -4745,33 +4715,47 @@
48, 50, 54, 54, 54, 56, 58, 58, 58, 60, 63, 63, 44, 43, 42, 42, 42, 41,
41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 45, 48, 48, 48, 50, 54, 54,
54, 56, 58, 58, 58, 60, 63, 63,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 36, 36, 39, 44, 44, 31, 31,
+ 31, 31, 31, 32, 32, 32, 32, 34, 35, 35, 35, 39, 43, 43, 31, 32, 32, 32,
+ 32, 32, 32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32,
+ 32, 34, 35, 35, 35, 38, 42, 42, 31, 32, 32, 32, 32, 32, 32, 32, 32, 34,
+ 35, 35, 35, 38, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34,
+ 34, 37, 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37,
+ 41, 41, 31, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 37, 41, 41,
+ 31, 32, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 38, 41, 41, 32, 32,
+ 32, 32, 32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32,
+ 32, 33, 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33,
+ 34, 34, 34, 35, 36, 36, 36, 39, 42, 42, 32, 32, 32, 32, 32, 33, 34, 34,
+ 34, 36, 37, 37, 37, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37,
+ 38, 38, 38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38,
+ 38, 40, 42, 42, 32, 32, 33, 33, 33, 34, 35, 35, 35, 37, 38, 38, 38, 40,
+ 42, 42, 33, 33, 33, 33, 33, 34, 36, 36, 36, 38, 40, 40, 40, 42, 45, 45,
+ 34, 34, 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34,
+ 34, 34, 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 34, 34, 34, 34,
+ 34, 35, 37, 37, 37, 39, 42, 42, 42, 45, 48, 48, 35, 34, 34, 34, 34, 36,
+ 37, 37, 37, 41, 45, 45, 45, 47, 50, 50, 36, 35, 34, 34, 34, 36, 38, 38,
+ 38, 43, 48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43,
+ 48, 48, 48, 51, 54, 54, 36, 35, 34, 34, 34, 36, 38, 38, 38, 43, 48, 48,
+ 48, 51, 54, 54, 37, 37, 36, 36, 36, 38, 39, 39, 39, 44, 49, 49, 49, 52,
+ 56, 56, 39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58,
+ 39, 38, 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 39, 38,
+ 37, 37, 37, 39, 40, 40, 40, 45, 50, 50, 50, 54, 58, 58, 41, 40, 39, 39,
+ 39, 40, 42, 42, 42, 46, 52, 52, 52, 56, 60, 60, 44, 42, 41, 41, 41, 42,
+ 43, 43, 43, 48, 53, 53, 53, 58, 63, 63, 44, 42, 41, 41, 41, 42, 43, 43,
+ 43, 48, 53, 53, 53, 58, 63, 63,
/* Size 4x16 */
- 31, 32, 34, 39, 32, 32, 34, 38, 32, 32, 34, 38, 32, 32, 33, 37, 32, 32,
- 33, 37, 32, 33, 35, 39, 32, 33, 35, 39, 32, 34, 37, 40, 32, 34, 37, 40,
- 34, 35, 39, 45, 34, 35, 39, 45, 35, 36, 43, 51, 35, 36, 43, 51, 38, 39,
- 45, 54, 38, 39, 45, 54, 42, 42, 48, 58,
- /* Size 16x4 */
31, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 35, 35, 38, 38, 42, 32, 32,
32, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 39, 39, 42, 34, 34, 34, 33,
33, 35, 35, 37, 37, 39, 39, 43, 43, 45, 45, 48, 39, 38, 38, 37, 37, 39,
39, 40, 40, 45, 45, 51, 51, 54, 54, 58,
+ /* Size 16x4 */
+ 31, 32, 34, 39, 32, 32, 34, 38, 32, 32, 34, 38, 32, 32, 33, 37, 32, 32,
+ 33, 37, 32, 33, 35, 39, 32, 33, 35, 39, 32, 34, 37, 40, 32, 34, 37, 40,
+ 34, 35, 39, 45, 34, 35, 39, 45, 35, 36, 43, 51, 35, 36, 43, 51, 38, 39,
+ 45, 54, 38, 39, 45, 54, 42, 42, 48, 58,
/* Size 8x32 */
- 32, 31, 31, 32, 32, 36, 36, 44, 31, 31, 31, 32, 32, 35, 35, 43, 31, 32,
- 32, 32, 32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32, 32, 32,
- 32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 41, 31, 32, 32, 33, 33, 34,
- 34, 41, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33, 33, 34, 34, 41,
- 31, 32, 32, 33, 33, 35, 35, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32,
- 32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34,
- 34, 37, 37, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38,
- 38, 42, 32, 33, 33, 35, 35, 38, 38, 42, 33, 33, 33, 36, 36, 40, 40, 45,
- 34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 34, 34,
- 34, 37, 37, 42, 42, 48, 35, 34, 34, 37, 37, 45, 45, 50, 36, 34, 34, 38,
- 38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48,
- 48, 54, 37, 36, 36, 39, 39, 49, 49, 56, 39, 37, 37, 40, 40, 50, 50, 58,
- 39, 37, 37, 40, 40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 41, 39,
- 39, 42, 42, 52, 52, 60, 44, 41, 41, 43, 43, 53, 53, 63, 44, 41, 41, 43,
- 43, 53, 53, 63,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
34, 34, 34, 35, 36, 36, 36, 37, 39, 39, 39, 41, 44, 44, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
@@ -4786,7 +4770,23 @@
34, 34, 34, 35, 36, 36, 36, 37, 38, 38, 38, 40, 42, 42, 42, 45, 48, 48,
48, 49, 50, 50, 50, 52, 53, 53, 44, 43, 42, 42, 42, 41, 41, 41, 41, 41,
42, 42, 42, 42, 42, 42, 42, 45, 48, 48, 48, 50, 54, 54, 54, 56, 58, 58,
- 58, 60, 63, 63 },
+ 58, 60, 63, 63,
+ /* Size 32x8 */
+ 32, 31, 31, 32, 32, 36, 36, 44, 31, 31, 31, 32, 32, 35, 35, 43, 31, 32,
+ 32, 32, 32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 42, 31, 32, 32, 32,
+ 32, 35, 35, 42, 31, 32, 32, 32, 32, 35, 35, 41, 31, 32, 32, 33, 33, 34,
+ 34, 41, 31, 32, 32, 33, 33, 34, 34, 41, 31, 32, 32, 33, 33, 34, 34, 41,
+ 31, 32, 32, 33, 33, 35, 35, 41, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32,
+ 32, 34, 34, 36, 36, 42, 32, 32, 32, 34, 34, 36, 36, 42, 32, 32, 32, 34,
+ 34, 37, 37, 42, 32, 33, 33, 35, 35, 38, 38, 42, 32, 33, 33, 35, 35, 38,
+ 38, 42, 32, 33, 33, 35, 35, 38, 38, 42, 33, 33, 33, 36, 36, 40, 40, 45,
+ 34, 34, 34, 37, 37, 42, 42, 48, 34, 34, 34, 37, 37, 42, 42, 48, 34, 34,
+ 34, 37, 37, 42, 42, 48, 35, 34, 34, 37, 37, 45, 45, 50, 36, 34, 34, 38,
+ 38, 48, 48, 54, 36, 34, 34, 38, 38, 48, 48, 54, 36, 34, 34, 38, 38, 48,
+ 48, 54, 37, 36, 36, 39, 39, 49, 49, 56, 39, 37, 37, 40, 40, 50, 50, 58,
+ 39, 37, 37, 40, 40, 50, 50, 58, 39, 37, 37, 40, 40, 50, 50, 58, 41, 39,
+ 39, 42, 42, 52, 52, 60, 44, 41, 41, 43, 43, 53, 53, 63, 44, 41, 41, 43,
+ 43, 53, 53, 63 },
{ /* Chroma */
/* Size 4x4 */
31, 34, 42, 47, 34, 39, 45, 46, 42, 45, 48, 49, 47, 46, 49, 54,
@@ -4870,21 +4870,12 @@
58, 58, 49, 48, 47, 47, 47, 46, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
45, 47, 49, 49, 49, 51, 53, 53, 53, 54, 55, 55, 55, 57, 58, 58,
/* Size 4x8 */
- 31, 34, 42, 48, 31, 35, 42, 46, 33, 37, 44, 46, 36, 41, 46, 46, 40, 44,
- 48, 48, 45, 46, 49, 51, 47, 47, 50, 54, 47, 46, 49, 55,
- /* Size 8x4 */
31, 31, 33, 36, 40, 45, 47, 47, 34, 35, 37, 41, 44, 46, 47, 46, 42, 42,
44, 46, 48, 49, 50, 49, 48, 46, 46, 46, 48, 51, 54, 55,
+ /* Size 8x4 */
+ 31, 34, 42, 48, 31, 35, 42, 46, 33, 37, 44, 46, 36, 41, 46, 46, 40, 44,
+ 48, 48, 45, 46, 49, 51, 47, 47, 50, 54, 47, 46, 49, 55,
/* Size 8x16 */
- 32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31,
- 31, 38, 38, 47, 47, 47, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40,
- 40, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47,
- 47, 46, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45,
- 42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 49, 46,
- 46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 48, 46, 46, 47,
- 47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 49, 45, 45, 46, 46, 53,
- 53, 58,
- /* Size 16x8 */
32, 31, 31, 30, 30, 33, 33, 37, 37, 42, 42, 49, 49, 48, 48, 49, 31, 31,
31, 32, 32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 31, 31, 31, 32,
32, 36, 36, 40, 40, 43, 43, 46, 46, 46, 46, 45, 37, 38, 38, 40, 40, 43,
@@ -4893,37 +4884,16 @@
50, 53, 53, 53, 53, 53, 48, 47, 47, 46, 46, 47, 47, 47, 47, 50, 50, 53,
53, 53, 53, 53, 49, 47, 47, 45, 45, 46, 46, 45, 45, 49, 49, 53, 53, 56,
56, 58,
+ /* Size 16x8 */
+ 32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31,
+ 31, 38, 38, 47, 47, 47, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40,
+ 40, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47,
+ 47, 46, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45,
+ 42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 49, 46,
+ 46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 48, 46, 46, 47,
+ 47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 49, 45, 45, 46, 46, 53,
+ 53, 58,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 33, 37, 37, 37, 42, 48, 48, 48, 48, 49, 49, 31, 31,
- 31, 31, 31, 34, 37, 37, 37, 42, 47, 47, 47, 48, 48, 48, 31, 31, 31, 31,
- 31, 34, 38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34,
- 38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34, 38, 38,
- 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 32, 32, 32, 35, 39, 39, 39, 42,
- 46, 46, 46, 46, 46, 46, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46,
- 46, 45, 45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45,
- 45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45, 45, 45,
- 32, 33, 34, 34, 34, 37, 41, 41, 41, 44, 46, 46, 46, 46, 45, 45, 33, 34,
- 36, 36, 36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36,
- 36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36, 36, 39,
- 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 35, 36, 38, 38, 38, 41, 45, 45,
- 45, 46, 47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47,
- 47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47,
- 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47, 47, 46,
- 45, 45, 39, 40, 41, 41, 41, 44, 47, 47, 47, 48, 49, 49, 49, 48, 47, 47,
- 42, 42, 43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42,
- 43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42, 43, 43,
- 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 45, 45, 44, 44, 44, 46,
- 47, 47, 47, 49, 51, 51, 51, 51, 51, 51, 49, 48, 46, 46, 46, 47, 48, 48,
- 48, 50, 53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50,
- 53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50, 53, 53,
- 53, 53, 53, 53, 48, 47, 46, 46, 46, 47, 47, 47, 47, 50, 53, 53, 53, 54,
- 54, 54, 48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56,
- 48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47,
- 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47, 45, 45,
- 45, 46, 46, 46, 46, 49, 53, 53, 53, 55, 57, 57, 49, 47, 45, 45, 45, 45,
- 46, 46, 46, 49, 53, 53, 53, 56, 58, 58, 49, 47, 45, 45, 45, 45, 46, 46,
- 46, 49, 53, 53, 53, 56, 58, 58,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 33, 33, 35, 37, 37, 37, 39,
42, 42, 42, 45, 49, 49, 49, 48, 48, 48, 48, 48, 49, 49, 31, 31, 31, 31,
31, 31, 31, 31, 31, 33, 34, 34, 34, 36, 38, 38, 38, 40, 42, 42, 42, 45,
@@ -4953,33 +4923,47 @@
49, 51, 53, 53, 53, 54, 56, 56, 56, 57, 58, 58, 49, 48, 47, 47, 47, 46,
45, 45, 45, 45, 46, 46, 46, 45, 45, 45, 45, 47, 49, 49, 49, 51, 53, 53,
53, 54, 56, 56, 56, 57, 58, 58,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 33, 37, 37, 37, 42, 48, 48, 48, 48, 49, 49, 31, 31,
+ 31, 31, 31, 34, 37, 37, 37, 42, 47, 47, 47, 48, 48, 48, 31, 31, 31, 31,
+ 31, 34, 38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34,
+ 38, 38, 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 31, 31, 31, 34, 38, 38,
+ 38, 42, 47, 47, 47, 47, 47, 47, 31, 31, 32, 32, 32, 35, 39, 39, 39, 42,
+ 46, 46, 46, 46, 46, 46, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46,
+ 46, 45, 45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45,
+ 45, 45, 30, 31, 32, 32, 32, 35, 40, 40, 40, 42, 46, 46, 46, 45, 45, 45,
+ 32, 33, 34, 34, 34, 37, 41, 41, 41, 44, 46, 46, 46, 46, 45, 45, 33, 34,
+ 36, 36, 36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36,
+ 36, 39, 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 33, 34, 36, 36, 36, 39,
+ 43, 43, 43, 45, 47, 47, 47, 46, 46, 46, 35, 36, 38, 38, 38, 41, 45, 45,
+ 45, 46, 47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47,
+ 47, 47, 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47,
+ 47, 46, 45, 45, 37, 38, 40, 40, 40, 43, 47, 47, 47, 47, 47, 47, 47, 46,
+ 45, 45, 39, 40, 41, 41, 41, 44, 47, 47, 47, 48, 49, 49, 49, 48, 47, 47,
+ 42, 42, 43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42,
+ 43, 43, 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 42, 42, 43, 43,
+ 43, 45, 47, 47, 47, 48, 50, 50, 50, 50, 49, 49, 45, 45, 44, 44, 44, 46,
+ 47, 47, 47, 49, 51, 51, 51, 51, 51, 51, 49, 48, 46, 46, 46, 47, 48, 48,
+ 48, 50, 53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50,
+ 53, 53, 53, 53, 53, 53, 49, 48, 46, 46, 46, 47, 48, 48, 48, 50, 53, 53,
+ 53, 53, 53, 53, 48, 47, 46, 46, 46, 47, 47, 47, 47, 50, 53, 53, 53, 54,
+ 54, 54, 48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56,
+ 48, 47, 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47,
+ 46, 46, 46, 46, 47, 47, 47, 50, 53, 53, 53, 54, 56, 56, 48, 47, 45, 45,
+ 45, 46, 46, 46, 46, 49, 53, 53, 53, 55, 57, 57, 49, 47, 45, 45, 45, 45,
+ 46, 46, 46, 49, 53, 53, 53, 56, 58, 58, 49, 47, 45, 45, 45, 45, 46, 46,
+ 46, 49, 53, 53, 53, 56, 58, 58,
/* Size 4x16 */
- 31, 33, 42, 48, 31, 34, 42, 47, 31, 34, 42, 47, 31, 35, 42, 45, 31, 35,
- 42, 45, 34, 39, 45, 46, 34, 39, 45, 46, 38, 43, 47, 46, 38, 43, 47, 46,
- 42, 45, 48, 50, 42, 45, 48, 50, 48, 47, 50, 53, 48, 47, 50, 53, 47, 46,
- 50, 54, 47, 46, 50, 54, 47, 45, 49, 56,
- /* Size 16x4 */
31, 31, 31, 31, 31, 34, 34, 38, 38, 42, 42, 48, 48, 47, 47, 47, 33, 34,
34, 35, 35, 39, 39, 43, 43, 45, 45, 47, 47, 46, 46, 45, 42, 42, 42, 42,
42, 45, 45, 47, 47, 48, 48, 50, 50, 50, 50, 49, 48, 47, 47, 45, 45, 46,
46, 46, 46, 50, 50, 53, 53, 54, 54, 56,
+ /* Size 16x4 */
+ 31, 33, 42, 48, 31, 34, 42, 47, 31, 34, 42, 47, 31, 35, 42, 45, 31, 35,
+ 42, 45, 34, 39, 45, 46, 34, 39, 45, 46, 38, 43, 47, 46, 38, 43, 47, 46,
+ 42, 45, 48, 50, 42, 45, 48, 50, 48, 47, 50, 53, 48, 47, 50, 53, 47, 46,
+ 50, 54, 47, 46, 50, 54, 47, 45, 49, 56,
/* Size 8x32 */
- 32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 37, 37, 47, 47, 48, 31, 31,
- 31, 38, 38, 47, 47, 47, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31, 31, 38,
- 38, 47, 47, 47, 31, 32, 32, 39, 39, 46, 46, 46, 30, 32, 32, 40, 40, 46,
- 46, 45, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40, 40, 46, 46, 45,
- 32, 34, 34, 41, 41, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36,
- 36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47, 47, 46, 35, 38, 38, 45,
- 45, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47,
- 47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 39, 41, 41, 47, 47, 49, 49, 47,
- 42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 42, 43,
- 43, 47, 47, 50, 50, 49, 45, 44, 44, 47, 47, 51, 51, 51, 49, 46, 46, 48,
- 48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53,
- 53, 53, 48, 46, 46, 47, 47, 53, 53, 54, 48, 46, 46, 47, 47, 53, 53, 56,
- 48, 46, 46, 47, 47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 48, 45,
- 45, 46, 46, 53, 53, 57, 49, 45, 45, 46, 46, 53, 53, 58, 49, 45, 45, 46,
- 46, 53, 53, 58,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 30, 30, 30, 32, 33, 33, 33, 35, 37, 37, 37, 39,
42, 42, 42, 45, 49, 49, 49, 48, 48, 48, 48, 48, 49, 49, 31, 31, 31, 31,
31, 32, 32, 32, 32, 34, 36, 36, 36, 38, 40, 40, 40, 41, 43, 43, 43, 44,
@@ -4994,7 +4978,23 @@
46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 49, 50, 50, 50, 51, 53, 53,
53, 53, 53, 53, 53, 53, 53, 53, 49, 48, 47, 47, 47, 46, 45, 45, 45, 45,
46, 46, 46, 45, 45, 45, 45, 47, 49, 49, 49, 51, 53, 53, 53, 54, 56, 56,
- 56, 57, 58, 58 },
+ 56, 57, 58, 58,
+ /* Size 32x8 */
+ 32, 31, 31, 37, 37, 48, 48, 49, 31, 31, 31, 37, 37, 47, 47, 48, 31, 31,
+ 31, 38, 38, 47, 47, 47, 31, 31, 31, 38, 38, 47, 47, 47, 31, 31, 31, 38,
+ 38, 47, 47, 47, 31, 32, 32, 39, 39, 46, 46, 46, 30, 32, 32, 40, 40, 46,
+ 46, 45, 30, 32, 32, 40, 40, 46, 46, 45, 30, 32, 32, 40, 40, 46, 46, 45,
+ 32, 34, 34, 41, 41, 46, 46, 45, 33, 36, 36, 43, 43, 47, 47, 46, 33, 36,
+ 36, 43, 43, 47, 47, 46, 33, 36, 36, 43, 43, 47, 47, 46, 35, 38, 38, 45,
+ 45, 47, 47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 37, 40, 40, 47, 47, 47,
+ 47, 45, 37, 40, 40, 47, 47, 47, 47, 45, 39, 41, 41, 47, 47, 49, 49, 47,
+ 42, 43, 43, 47, 47, 50, 50, 49, 42, 43, 43, 47, 47, 50, 50, 49, 42, 43,
+ 43, 47, 47, 50, 50, 49, 45, 44, 44, 47, 47, 51, 51, 51, 49, 46, 46, 48,
+ 48, 53, 53, 53, 49, 46, 46, 48, 48, 53, 53, 53, 49, 46, 46, 48, 48, 53,
+ 53, 53, 48, 46, 46, 47, 47, 53, 53, 54, 48, 46, 46, 47, 47, 53, 53, 56,
+ 48, 46, 46, 47, 47, 53, 53, 56, 48, 46, 46, 47, 47, 53, 53, 56, 48, 45,
+ 45, 46, 46, 53, 53, 57, 49, 45, 45, 46, 46, 53, 53, 58, 49, 45, 45, 46,
+ 46, 53, 53, 58 },
},
{
{ /* Luma */
@@ -5080,21 +5080,12 @@
48, 49, 37, 37, 36, 36, 36, 36, 36, 35, 35, 35, 35, 36, 37, 37, 37, 37,
38, 39, 39, 39, 39, 41, 42, 43, 43, 43, 45, 48, 49, 49, 49, 50,
/* Size 4x8 */
- 31, 31, 32, 35, 32, 32, 32, 35, 32, 32, 33, 34, 32, 32, 34, 36, 32, 33,
- 35, 38, 33, 33, 36, 40, 34, 34, 37, 42, 35, 34, 38, 48,
- /* Size 8x4 */
31, 32, 32, 32, 32, 33, 34, 35, 31, 32, 32, 32, 33, 33, 34, 34, 32, 32,
33, 34, 35, 36, 37, 38, 35, 35, 34, 36, 38, 40, 42, 48,
+ /* Size 8x4 */
+ 31, 31, 32, 35, 32, 32, 32, 35, 32, 32, 33, 34, 32, 32, 34, 36, 32, 33,
+ 35, 38, 33, 33, 36, 40, 34, 34, 37, 42, 35, 34, 38, 48,
/* Size 8x16 */
- 32, 31, 31, 31, 32, 32, 35, 36, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32,
- 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32,
- 33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 33, 34, 34,
- 35, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 37,
- 32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 33, 33,
- 33, 35, 36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35,
- 37, 37, 43, 44, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38,
- 46, 48,
- /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 34, 34, 36, 36, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32, 32, 32,
32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32, 32, 32,
@@ -5103,37 +5094,16 @@
35, 36, 37, 37, 38, 38, 35, 35, 35, 34, 34, 34, 35, 36, 36, 37, 37, 40,
41, 43, 46, 46, 36, 35, 35, 35, 34, 34, 36, 36, 37, 38, 38, 41, 42, 44,
48, 48,
+ /* Size 16x8 */
+ 32, 31, 31, 31, 32, 32, 35, 36, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32,
+ 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32,
+ 33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 33, 34, 34,
+ 35, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 37,
+ 32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 33, 33,
+ 33, 35, 36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35,
+ 37, 37, 43, 44, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38,
+ 46, 48,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31,
- 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
- 34, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35,
- 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
- 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 33, 34, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
- 33, 33, 34, 34, 34, 34, 35, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34,
- 34, 34, 34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34,
- 34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35,
- 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 37,
- 37, 37, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 36, 37, 38, 38, 38,
- 32, 32, 32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32,
- 32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32, 32, 33,
- 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 33, 33, 33, 33, 33,
- 34, 35, 36, 36, 36, 37, 39, 40, 40, 40, 33, 33, 33, 33, 33, 33, 35, 36,
- 36, 36, 36, 38, 40, 41, 41, 41, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37,
- 37, 39, 41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39,
- 41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39, 41, 42,
- 42, 42, 34, 34, 34, 34, 34, 34, 35, 37, 37, 37, 37, 40, 43, 44, 44, 44,
- 35, 35, 34, 34, 34, 34, 36, 37, 38, 38, 38, 41, 45, 47, 47, 47, 36, 35,
- 35, 34, 34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34,
- 34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34, 34, 34,
- 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 37, 36, 36, 36, 36, 36, 37, 38,
- 39, 39, 39, 42, 46, 49, 49, 49,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 36, 36, 37, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
@@ -5163,33 +5133,47 @@
38, 40, 41, 42, 42, 42, 44, 47, 48, 48, 48, 49, 36, 35, 35, 35, 35, 35,
35, 35, 34, 34, 34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 41, 42,
42, 42, 44, 47, 48, 48, 48, 49,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31,
+ 31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 33, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+ 34, 35, 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 35,
+ 35, 35, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34,
+ 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 34, 31, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 33, 34, 35, 35, 35, 35, 31, 32, 32, 32, 32, 32,
+ 33, 33, 34, 34, 34, 34, 35, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34,
+ 34, 34, 34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34,
+ 34, 35, 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35,
+ 36, 36, 36, 36, 32, 32, 32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 37,
+ 37, 37, 32, 32, 32, 33, 33, 33, 33, 34, 35, 35, 35, 36, 37, 38, 38, 38,
+ 32, 32, 32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32,
+ 32, 33, 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 32, 32, 33,
+ 33, 33, 34, 35, 35, 35, 35, 36, 37, 38, 38, 38, 32, 33, 33, 33, 33, 33,
+ 34, 35, 36, 36, 36, 37, 39, 40, 40, 40, 33, 33, 33, 33, 33, 33, 35, 36,
+ 36, 36, 36, 38, 40, 41, 41, 41, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37,
+ 37, 39, 41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39,
+ 41, 42, 42, 42, 34, 34, 34, 34, 34, 34, 35, 36, 37, 37, 37, 39, 41, 42,
+ 42, 42, 34, 34, 34, 34, 34, 34, 35, 37, 37, 37, 37, 40, 43, 44, 44, 44,
+ 35, 35, 34, 34, 34, 34, 36, 37, 38, 38, 38, 41, 45, 47, 47, 47, 36, 35,
+ 35, 34, 34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34,
+ 34, 34, 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 36, 35, 35, 34, 34, 34,
+ 36, 37, 38, 38, 38, 42, 46, 48, 48, 48, 37, 36, 36, 36, 36, 36, 37, 38,
+ 39, 39, 39, 42, 46, 49, 49, 49,
/* Size 4x16 */
- 31, 31, 32, 36, 31, 32, 32, 35, 32, 32, 32, 35, 32, 32, 32, 35, 32, 32,
- 33, 34, 32, 32, 33, 34, 32, 32, 34, 36, 32, 32, 34, 36, 32, 32, 34, 37,
- 32, 33, 35, 38, 32, 33, 35, 38, 33, 33, 36, 41, 34, 34, 37, 42, 34, 34,
- 37, 44, 35, 34, 38, 48, 35, 34, 38, 48,
- /* Size 16x4 */
31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 35, 35, 31, 32,
32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 32, 32, 32, 32,
33, 33, 34, 34, 34, 35, 35, 36, 37, 37, 38, 38, 36, 35, 35, 35, 34, 34,
36, 36, 37, 38, 38, 41, 42, 44, 48, 48,
+ /* Size 16x4 */
+ 31, 31, 32, 36, 31, 32, 32, 35, 32, 32, 32, 35, 32, 32, 32, 35, 32, 32,
+ 33, 34, 32, 32, 33, 34, 32, 32, 34, 36, 32, 32, 34, 36, 32, 32, 34, 37,
+ 32, 33, 35, 38, 32, 33, 35, 38, 33, 33, 36, 41, 34, 34, 37, 42, 34, 34,
+ 37, 44, 35, 34, 38, 48, 35, 34, 38, 48,
/* Size 8x32 */
- 32, 31, 31, 31, 32, 32, 35, 36, 31, 31, 31, 32, 32, 32, 35, 35, 31, 32,
- 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32,
- 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32,
- 34, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32, 33, 33, 34, 34,
- 31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32,
- 32, 33, 33, 33, 35, 35, 31, 32, 32, 33, 34, 34, 35, 36, 32, 32, 32, 33,
- 34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34,
- 36, 36, 32, 32, 32, 33, 34, 34, 36, 37, 32, 32, 33, 33, 35, 35, 37, 38,
- 32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 32, 32,
- 33, 34, 35, 35, 37, 38, 32, 33, 33, 34, 36, 36, 39, 40, 33, 33, 33, 35,
- 36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37,
- 41, 42, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37, 43, 44,
- 35, 34, 34, 36, 38, 38, 45, 47, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35,
- 34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38, 46, 48, 37, 36, 36, 37,
- 39, 39, 46, 49,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
32, 32, 32, 32, 33, 34, 34, 34, 34, 35, 36, 36, 36, 37, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
@@ -5204,7 +5188,23 @@
34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 37, 37, 37, 37, 39, 40, 41,
41, 41, 43, 45, 46, 46, 46, 46, 36, 35, 35, 35, 35, 35, 35, 35, 34, 34,
34, 35, 36, 36, 36, 36, 37, 38, 38, 38, 38, 40, 41, 42, 42, 42, 44, 47,
- 48, 48, 48, 49 },
+ 48, 48, 48, 49,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 32, 32, 35, 36, 31, 31, 31, 32, 32, 32, 35, 35, 31, 32,
+ 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32,
+ 32, 32, 35, 35, 31, 32, 32, 32, 32, 32, 35, 35, 31, 32, 32, 32, 32, 32,
+ 34, 35, 31, 32, 32, 32, 32, 32, 34, 35, 31, 32, 32, 32, 33, 33, 34, 34,
+ 31, 32, 32, 32, 33, 33, 34, 34, 31, 32, 32, 32, 33, 33, 34, 34, 31, 32,
+ 32, 33, 33, 33, 35, 35, 31, 32, 32, 33, 34, 34, 35, 36, 32, 32, 32, 33,
+ 34, 34, 36, 36, 32, 32, 32, 33, 34, 34, 36, 36, 32, 32, 32, 33, 34, 34,
+ 36, 36, 32, 32, 32, 33, 34, 34, 36, 37, 32, 32, 33, 33, 35, 35, 37, 38,
+ 32, 32, 33, 34, 35, 35, 37, 38, 32, 32, 33, 34, 35, 35, 37, 38, 32, 32,
+ 33, 34, 35, 35, 37, 38, 32, 33, 33, 34, 36, 36, 39, 40, 33, 33, 33, 35,
+ 36, 36, 40, 41, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37,
+ 41, 42, 34, 34, 34, 35, 37, 37, 41, 42, 34, 34, 34, 35, 37, 37, 43, 44,
+ 35, 34, 34, 36, 38, 38, 45, 47, 36, 35, 34, 36, 38, 38, 46, 48, 36, 35,
+ 34, 36, 38, 38, 46, 48, 36, 35, 34, 36, 38, 38, 46, 48, 37, 36, 36, 37,
+ 39, 39, 46, 49 },
{ /* Chroma */
/* Size 4x4 */
31, 32, 38, 46, 32, 34, 41, 46, 38, 41, 47, 47, 46, 46, 47, 52,
@@ -5288,21 +5288,12 @@
53, 53, 49, 48, 47, 47, 47, 47, 47, 46, 46, 46, 46, 46, 46, 47, 47, 47,
47, 47, 47, 47, 47, 48, 49, 50, 50, 50, 51, 52, 53, 53, 53, 53,
/* Size 4x8 */
- 31, 31, 37, 48, 31, 31, 38, 47, 31, 32, 40, 46, 34, 36, 43, 47, 37, 39,
- 46, 47, 39, 41, 47, 48, 42, 43, 47, 50, 48, 46, 48, 53,
- /* Size 8x4 */
31, 31, 31, 34, 37, 39, 42, 48, 31, 31, 32, 36, 39, 41, 43, 46, 37, 38,
40, 43, 46, 47, 47, 48, 48, 47, 46, 47, 47, 48, 50, 53,
+ /* Size 8x4 */
+ 31, 31, 37, 48, 31, 31, 38, 47, 31, 32, 40, 46, 34, 36, 43, 47, 37, 39,
+ 46, 47, 39, 41, 47, 48, 42, 43, 47, 50, 48, 46, 48, 53,
/* Size 8x16 */
- 32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31,
- 31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39, 45, 46, 30, 32, 32, 35,
- 40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 33, 34, 35, 37, 42, 42,
- 46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 35, 37, 37, 40, 44, 44, 46, 47,
- 37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 41, 42,
- 42, 44, 47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45,
- 47, 47, 50, 51, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48,
- 52, 53,
- /* Size 16x8 */
32, 31, 31, 31, 30, 30, 33, 33, 35, 37, 37, 41, 42, 44, 49, 49, 31, 31,
31, 31, 32, 32, 34, 35, 37, 39, 39, 42, 42, 44, 47, 47, 31, 31, 31, 32,
32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 33, 34, 34, 34, 35, 35,
@@ -5311,37 +5302,16 @@
47, 47, 47, 47, 48, 48, 45, 45, 45, 45, 44, 44, 46, 46, 46, 47, 47, 49,
49, 50, 52, 52, 48, 47, 47, 46, 46, 46, 47, 47, 47, 47, 47, 49, 50, 51,
53, 53,
+ /* Size 16x8 */
+ 32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31,
+ 31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39, 45, 46, 30, 32, 32, 35,
+ 40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 33, 34, 35, 37, 42, 42,
+ 46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 35, 37, 37, 40, 44, 44, 46, 47,
+ 37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 41, 42,
+ 42, 44, 47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45,
+ 47, 47, 50, 51, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48,
+ 52, 53,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 40, 45, 48, 48, 48, 31, 31,
- 31, 31, 31, 31, 33, 36, 37, 37, 37, 41, 45, 48, 48, 48, 31, 31, 31, 31,
- 31, 31, 34, 36, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31,
- 34, 37, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37,
- 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37, 38, 38,
- 38, 41, 45, 47, 47, 47, 31, 31, 31, 32, 32, 32, 34, 37, 39, 39, 39, 41,
- 45, 46, 46, 46, 30, 31, 31, 32, 32, 32, 34, 38, 39, 39, 39, 42, 44, 46,
- 46, 46, 30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46,
- 30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 30, 31,
- 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 31, 32, 33, 33,
- 33, 33, 36, 39, 41, 41, 41, 43, 45, 46, 46, 46, 33, 34, 34, 35, 35, 35,
- 37, 40, 42, 42, 42, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41,
- 43, 43, 43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43,
- 43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43, 43, 44,
- 46, 47, 47, 47, 35, 36, 37, 37, 37, 37, 40, 43, 44, 44, 44, 45, 46, 47,
- 47, 47, 36, 37, 38, 39, 39, 39, 42, 44, 46, 46, 46, 47, 47, 47, 47, 47,
- 37, 38, 39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38,
- 39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38, 39, 40,
- 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 39, 39, 40, 41, 41, 41,
- 43, 46, 47, 47, 47, 48, 48, 48, 48, 48, 41, 41, 42, 42, 42, 42, 44, 46,
- 47, 47, 47, 48, 49, 49, 49, 49, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47,
- 47, 48, 49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48,
- 49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48, 49, 50,
- 50, 50, 44, 44, 44, 44, 44, 44, 45, 47, 47, 47, 47, 49, 50, 51, 51, 51,
- 47, 46, 46, 46, 46, 46, 46, 47, 48, 48, 48, 49, 51, 52, 52, 52, 49, 48,
- 47, 46, 46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46,
- 46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46,
- 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46, 47, 47,
- 47, 47, 47, 49, 52, 53, 53, 53,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 35, 36,
37, 37, 37, 39, 41, 42, 42, 42, 44, 47, 49, 49, 49, 49, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 36, 37, 38, 38, 38, 39,
@@ -5371,33 +5341,47 @@
47, 48, 49, 50, 50, 50, 51, 52, 53, 53, 53, 53, 48, 48, 47, 47, 47, 47,
46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50,
50, 50, 51, 52, 53, 53, 53, 53,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 40, 45, 48, 48, 48, 31, 31,
+ 31, 31, 31, 31, 33, 36, 37, 37, 37, 41, 45, 48, 48, 48, 31, 31, 31, 31,
+ 31, 31, 34, 36, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31,
+ 34, 37, 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37,
+ 38, 38, 38, 41, 45, 47, 47, 47, 31, 31, 31, 31, 31, 31, 34, 37, 38, 38,
+ 38, 41, 45, 47, 47, 47, 31, 31, 31, 32, 32, 32, 34, 37, 39, 39, 39, 41,
+ 45, 46, 46, 46, 30, 31, 31, 32, 32, 32, 34, 38, 39, 39, 39, 42, 44, 46,
+ 46, 46, 30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46,
+ 30, 31, 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 30, 31,
+ 32, 32, 32, 32, 35, 38, 40, 40, 40, 42, 44, 46, 46, 46, 31, 32, 33, 33,
+ 33, 33, 36, 39, 41, 41, 41, 43, 45, 46, 46, 46, 33, 34, 34, 35, 35, 35,
+ 37, 40, 42, 42, 42, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41,
+ 43, 43, 43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43,
+ 43, 44, 46, 47, 47, 47, 33, 34, 35, 36, 36, 36, 38, 41, 43, 43, 43, 44,
+ 46, 47, 47, 47, 35, 36, 37, 37, 37, 37, 40, 43, 44, 44, 44, 45, 46, 47,
+ 47, 47, 36, 37, 38, 39, 39, 39, 42, 44, 46, 46, 46, 47, 47, 47, 47, 47,
+ 37, 38, 39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38,
+ 39, 40, 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 37, 38, 39, 40,
+ 40, 40, 43, 45, 47, 47, 47, 47, 47, 47, 47, 47, 39, 39, 40, 41, 41, 41,
+ 43, 46, 47, 47, 47, 48, 48, 48, 48, 48, 41, 41, 42, 42, 42, 42, 44, 46,
+ 47, 47, 47, 48, 49, 49, 49, 49, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47,
+ 47, 48, 49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48,
+ 49, 50, 50, 50, 42, 42, 42, 43, 43, 43, 44, 46, 47, 47, 47, 48, 49, 50,
+ 50, 50, 44, 44, 44, 44, 44, 44, 45, 47, 47, 47, 47, 49, 50, 51, 51, 51,
+ 47, 46, 46, 46, 46, 46, 46, 47, 48, 48, 48, 49, 51, 52, 52, 52, 49, 48,
+ 47, 46, 46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46,
+ 46, 46, 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46,
+ 47, 48, 48, 48, 48, 50, 52, 53, 53, 53, 49, 48, 47, 46, 46, 46, 47, 47,
+ 47, 47, 47, 49, 52, 53, 53, 53,
/* Size 4x16 */
- 31, 31, 37, 48, 31, 31, 38, 47, 31, 31, 38, 47, 31, 32, 39, 46, 31, 32,
- 40, 46, 31, 32, 40, 46, 34, 35, 42, 47, 34, 36, 43, 47, 36, 37, 44, 47,
- 38, 40, 47, 47, 38, 40, 47, 47, 41, 42, 47, 49, 42, 43, 47, 50, 44, 44,
- 47, 51, 48, 46, 48, 53, 48, 46, 48, 53,
- /* Size 16x4 */
31, 31, 31, 31, 31, 31, 34, 34, 36, 38, 38, 41, 42, 44, 48, 48, 31, 31,
31, 32, 32, 32, 35, 36, 37, 40, 40, 42, 43, 44, 46, 46, 37, 38, 38, 39,
40, 40, 42, 43, 44, 47, 47, 47, 47, 47, 48, 48, 48, 47, 47, 46, 46, 46,
47, 47, 47, 47, 47, 49, 50, 51, 53, 53,
+ /* Size 16x4 */
+ 31, 31, 37, 48, 31, 31, 38, 47, 31, 31, 38, 47, 31, 32, 39, 46, 31, 32,
+ 40, 46, 31, 32, 40, 46, 34, 35, 42, 47, 34, 36, 43, 47, 36, 37, 44, 47,
+ 38, 40, 47, 47, 38, 40, 47, 47, 41, 42, 47, 49, 42, 43, 47, 50, 44, 44,
+ 47, 51, 48, 46, 48, 53, 48, 46, 48, 53,
/* Size 8x32 */
- 32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 33, 37, 37, 45, 48, 31, 31,
- 31, 34, 38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 31, 34,
- 38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39,
- 45, 46, 30, 31, 32, 34, 39, 39, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46,
- 30, 32, 32, 35, 40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 31, 33,
- 33, 36, 41, 41, 45, 46, 33, 34, 35, 37, 42, 42, 46, 47, 33, 35, 36, 38,
- 43, 43, 46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 33, 35, 36, 38, 43, 43,
- 46, 47, 35, 37, 37, 40, 44, 44, 46, 47, 36, 38, 39, 42, 46, 46, 47, 47,
- 37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 37, 39,
- 40, 43, 47, 47, 47, 47, 39, 40, 41, 43, 47, 47, 48, 48, 41, 42, 42, 44,
- 47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 42, 42, 43, 44, 47, 47,
- 49, 50, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45, 47, 47, 50, 51,
- 47, 46, 46, 46, 48, 48, 51, 52, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47,
- 46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47,
- 47, 47, 52, 53,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 35, 36,
37, 37, 37, 39, 41, 42, 42, 42, 44, 47, 49, 49, 49, 49, 31, 31, 31, 31,
31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 37, 38, 39, 39, 39, 40,
@@ -5412,7 +5396,23 @@
45, 44, 44, 44, 44, 45, 46, 46, 46, 46, 46, 47, 47, 47, 47, 48, 49, 49,
49, 49, 50, 51, 52, 52, 52, 52, 48, 48, 47, 47, 47, 47, 46, 46, 46, 46,
46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50, 50, 50, 51, 52,
- 53, 53, 53, 53 },
+ 53, 53, 53, 53,
+ /* Size 32x8 */
+ 32, 31, 31, 33, 37, 37, 45, 48, 31, 31, 31, 33, 37, 37, 45, 48, 31, 31,
+ 31, 34, 38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 31, 34,
+ 38, 38, 45, 47, 31, 31, 31, 34, 38, 38, 45, 47, 31, 31, 32, 34, 39, 39,
+ 45, 46, 30, 31, 32, 34, 39, 39, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46,
+ 30, 32, 32, 35, 40, 40, 44, 46, 30, 32, 32, 35, 40, 40, 44, 46, 31, 33,
+ 33, 36, 41, 41, 45, 46, 33, 34, 35, 37, 42, 42, 46, 47, 33, 35, 36, 38,
+ 43, 43, 46, 47, 33, 35, 36, 38, 43, 43, 46, 47, 33, 35, 36, 38, 43, 43,
+ 46, 47, 35, 37, 37, 40, 44, 44, 46, 47, 36, 38, 39, 42, 46, 46, 47, 47,
+ 37, 39, 40, 43, 47, 47, 47, 47, 37, 39, 40, 43, 47, 47, 47, 47, 37, 39,
+ 40, 43, 47, 47, 47, 47, 39, 40, 41, 43, 47, 47, 48, 48, 41, 42, 42, 44,
+ 47, 47, 49, 49, 42, 42, 43, 44, 47, 47, 49, 50, 42, 42, 43, 44, 47, 47,
+ 49, 50, 42, 42, 43, 44, 47, 47, 49, 50, 44, 44, 44, 45, 47, 47, 50, 51,
+ 47, 46, 46, 46, 48, 48, 51, 52, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47,
+ 46, 47, 48, 48, 52, 53, 49, 47, 46, 47, 48, 48, 52, 53, 49, 47, 46, 47,
+ 47, 47, 52, 53 },
},
{
{ /* Luma */
@@ -5498,21 +5498,12 @@
39, 39, 34, 34, 34, 34, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 34, 34,
35, 35, 35, 35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39,
/* Size 4x8 */
- 31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32,
- 33, 34, 32, 32, 34, 34, 32, 33, 34, 35, 33, 33, 35, 36,
- /* Size 8x4 */
31, 31, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 33, 33, 32, 32,
32, 32, 33, 34, 34, 35, 32, 32, 32, 33, 34, 34, 35, 36,
+ /* Size 8x4 */
+ 31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32,
+ 33, 34, 32, 32, 34, 34, 32, 33, 34, 35, 33, 33, 35, 36,
/* Size 8x16 */
- 32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 33, 31, 32,
- 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
- 32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33,
- 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 34, 34, 34,
- 32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
- 32, 32, 33, 35, 35, 35, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33,
- 34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 34, 34, 34, 34, 35, 37,
- 37, 38,
- /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 31, 31,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32,
@@ -5521,37 +5512,16 @@
34, 35, 35, 35, 36, 37, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35,
35, 35, 36, 37, 32, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 35, 36, 36,
36, 38,
+ /* Size 16x8 */
+ 32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 33, 31, 32,
+ 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+ 32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33,
+ 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 34, 34, 34,
+ 32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
+ 32, 32, 33, 35, 35, 35, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33,
+ 34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 34, 34, 34, 34, 35, 37,
+ 37, 38,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 31, 31,
- 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 31, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34,
- 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 31, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33,
- 33, 33, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
- 33, 33, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34,
- 34, 35, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35,
- 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32,
- 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32,
- 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32, 32, 32,
- 32, 33, 33, 34, 34, 34, 34, 34, 35, 35, 32, 32, 32, 32, 32, 32, 32, 33,
- 33, 34, 35, 35, 35, 35, 35, 36, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34,
- 35, 35, 35, 35, 36, 36, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35,
- 35, 35, 36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35,
- 36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37,
- 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37, 32, 33,
- 33, 33, 33, 33, 33, 33, 34, 35, 36, 36, 36, 36, 36, 38, 33, 33, 33, 33,
- 33, 33, 33, 34, 34, 35, 36, 36, 36, 36, 37, 38, 34, 34, 34, 34, 34, 34,
- 34, 34, 35, 36, 37, 37, 37, 37, 38, 39, 34, 34, 34, 34, 34, 34, 34, 34,
- 35, 36, 37, 37, 37, 37, 38, 39,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 31,
31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -5581,33 +5551,47 @@
34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 38, 34, 34, 34, 34, 34, 34,
34, 34, 34, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36,
37, 37, 37, 37, 38, 38, 39, 39,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 34, 31, 31,
+ 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 31, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 33, 34, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34,
+ 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 31, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+ 33, 33, 33, 33, 33, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+ 33, 33, 34, 34, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 34, 34,
+ 34, 35, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35,
+ 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32,
+ 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32,
+ 32, 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 32, 32, 32, 32, 32, 32,
+ 32, 33, 33, 34, 34, 34, 34, 34, 35, 35, 32, 32, 32, 32, 32, 32, 32, 33,
+ 33, 34, 35, 35, 35, 35, 35, 36, 32, 32, 32, 32, 33, 33, 33, 33, 33, 34,
+ 35, 35, 35, 35, 36, 36, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35,
+ 35, 35, 36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35,
+ 36, 37, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37,
+ 32, 32, 32, 33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 36, 37, 32, 33,
+ 33, 33, 33, 33, 33, 33, 34, 35, 36, 36, 36, 36, 36, 38, 33, 33, 33, 33,
+ 33, 33, 33, 34, 34, 35, 36, 36, 36, 36, 37, 38, 34, 34, 34, 34, 34, 34,
+ 34, 34, 35, 36, 37, 37, 37, 37, 38, 39, 34, 34, 34, 34, 34, 34, 34, 34,
+ 35, 36, 37, 37, 37, 37, 38, 39,
/* Size 4x16 */
- 31, 31, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
- 32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 33, 33, 32, 32, 33, 34,
- 32, 32, 33, 34, 32, 32, 33, 34, 32, 32, 34, 35, 32, 33, 34, 35, 32, 33,
- 34, 35, 33, 33, 35, 36, 34, 34, 36, 37,
- /* Size 16x4 */
31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 32, 32, 32, 32,
32, 32, 32, 33, 33, 33, 33, 34, 34, 34, 35, 36, 32, 32, 32, 32, 32, 33,
33, 33, 34, 34, 34, 35, 35, 35, 36, 37,
+ /* Size 16x4 */
+ 31, 31, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+ 32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 33, 33, 32, 32, 33, 34,
+ 32, 32, 33, 34, 32, 32, 33, 34, 32, 32, 34, 35, 32, 33, 34, 35, 32, 33,
+ 34, 35, 33, 33, 35, 36, 34, 34, 36, 37,
/* Size 8x32 */
- 32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 33, 31, 31,
- 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
- 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32,
- 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33,
- 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32,
- 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32,
- 32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 33,
- 33, 34, 31, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34,
- 32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
- 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 35, 32, 32, 32, 32,
- 33, 35, 35, 35, 32, 32, 33, 33, 33, 35, 35, 36, 32, 32, 33, 33, 34, 35,
- 35, 36, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33, 34, 35, 35, 36,
- 32, 32, 33, 33, 34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 33, 33,
- 33, 33, 34, 36, 36, 37, 34, 34, 34, 34, 35, 37, 37, 38, 34, 34, 34, 34,
- 35, 37, 37, 38,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 34, 34, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -5622,7 +5606,23 @@
32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35,
35, 35, 35, 35, 36, 36, 37, 37, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 36, 36,
- 36, 37, 38, 38 },
+ 36, 37, 38, 38,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 33, 31, 31,
+ 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+ 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32,
+ 32, 33, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 32, 32, 33,
+ 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32,
+ 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32,
+ 32, 33, 33, 33, 31, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 33, 33,
+ 33, 34, 31, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34,
+ 32, 32, 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 34, 32, 32,
+ 32, 32, 33, 34, 34, 34, 32, 32, 32, 32, 33, 34, 34, 35, 32, 32, 32, 32,
+ 33, 35, 35, 35, 32, 32, 33, 33, 33, 35, 35, 36, 32, 32, 33, 33, 34, 35,
+ 35, 36, 32, 32, 33, 33, 34, 35, 35, 36, 32, 32, 33, 33, 34, 35, 35, 36,
+ 32, 32, 33, 33, 34, 35, 35, 36, 32, 33, 33, 33, 34, 36, 36, 36, 33, 33,
+ 33, 33, 34, 36, 36, 37, 34, 34, 34, 34, 35, 37, 37, 38, 34, 34, 34, 34,
+ 35, 37, 37, 38 },
{ /* Chroma */
/* Size 4x4 */
31, 31, 34, 38, 31, 32, 35, 40, 34, 35, 39, 43, 38, 40, 43, 47,
@@ -5706,21 +5706,12 @@
48, 48, 41, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 43, 43,
44, 45, 45, 45, 45, 45, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48,
/* Size 4x8 */
- 31, 31, 35, 37, 31, 31, 36, 38, 31, 32, 37, 39, 31, 32, 37, 40, 34, 36,
- 40, 43, 35, 37, 42, 44, 38, 40, 45, 47, 41, 42, 45, 47,
- /* Size 8x4 */
31, 31, 31, 31, 34, 35, 38, 41, 31, 31, 32, 32, 36, 37, 40, 42, 35, 36,
37, 37, 40, 42, 45, 45, 37, 38, 39, 40, 43, 44, 47, 47,
+ /* Size 8x4 */
+ 31, 31, 35, 37, 31, 31, 36, 38, 31, 32, 37, 39, 31, 32, 37, 40, 34, 36,
+ 40, 43, 35, 37, 42, 44, 38, 40, 45, 47, 41, 42, 45, 47,
/* Size 8x16 */
- 32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 38, 38, 39, 31, 31,
- 31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32,
- 34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40,
- 40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 33, 34, 35, 35, 37, 42, 42, 43,
- 33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 35, 37,
- 38, 38, 41, 45, 45, 46, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40,
- 43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 42, 42, 43, 43, 44, 47,
- 47, 48,
- /* Size 16x8 */
32, 31, 31, 31, 31, 30, 30, 31, 33, 33, 33, 35, 37, 37, 39, 42, 31, 31,
31, 31, 31, 31, 31, 32, 34, 35, 35, 37, 39, 39, 40, 42, 31, 31, 31, 31,
32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 31, 31, 31, 31, 32, 32,
@@ -5729,37 +5720,16 @@
43, 45, 47, 47, 47, 47, 37, 38, 38, 38, 39, 40, 40, 40, 42, 43, 43, 45,
47, 47, 47, 47, 38, 39, 40, 40, 40, 41, 41, 41, 43, 44, 44, 46, 47, 47,
47, 48,
+ /* Size 16x8 */
+ 32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 38, 38, 39, 31, 31,
+ 31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32,
+ 34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40,
+ 40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 33, 34, 35, 35, 37, 42, 42, 43,
+ 33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 35, 37,
+ 38, 38, 41, 45, 45, 46, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40,
+ 43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 42, 42, 43, 43, 44, 47,
+ 47, 48,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 38, 42, 31, 31,
- 31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 39, 42, 31, 31, 31, 31,
- 31, 31, 31, 32, 33, 35, 38, 38, 38, 38, 39, 42, 31, 31, 31, 31, 31, 31,
- 31, 32, 34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32,
- 34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36,
- 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38,
- 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38, 38, 38,
- 40, 42, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 39, 39, 39, 39, 40, 42,
- 30, 31, 31, 32, 32, 32, 32, 32, 34, 37, 39, 39, 39, 39, 40, 42, 30, 31,
- 31, 32, 32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32,
- 32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32,
- 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32, 32, 33,
- 35, 37, 40, 40, 40, 40, 41, 42, 31, 31, 32, 32, 33, 33, 33, 33, 35, 38,
- 40, 40, 40, 40, 41, 43, 32, 32, 33, 33, 34, 34, 34, 34, 36, 39, 41, 41,
- 41, 41, 42, 44, 33, 33, 34, 35, 35, 35, 35, 35, 37, 40, 42, 42, 42, 42,
- 43, 44, 33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45,
- 33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34,
- 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34, 35, 35,
- 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 34, 35, 36, 37, 37, 37,
- 37, 37, 39, 42, 44, 44, 44, 44, 45, 45, 35, 36, 37, 38, 38, 38, 38, 39,
- 41, 43, 45, 45, 45, 45, 46, 46, 36, 37, 38, 39, 39, 39, 39, 40, 42, 44,
- 47, 47, 47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47,
- 47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47,
- 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47,
- 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47, 39, 39,
- 40, 41, 41, 41, 41, 42, 43, 45, 47, 47, 47, 47, 47, 48, 40, 41, 41, 42,
- 42, 42, 42, 42, 44, 45, 47, 47, 47, 47, 47, 48, 42, 42, 42, 43, 43, 43,
- 43, 43, 44, 46, 47, 47, 47, 47, 48, 48, 42, 42, 42, 43, 43, 43, 43, 43,
- 44, 46, 47, 47, 47, 47, 48, 48,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 32, 33, 33,
33, 33, 33, 34, 35, 36, 37, 37, 37, 37, 39, 40, 42, 42, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 34, 35,
@@ -5789,33 +5759,47 @@
44, 45, 46, 47, 47, 47, 47, 47, 47, 47, 48, 48, 42, 42, 42, 42, 42, 42,
42, 42, 42, 42, 42, 42, 42, 42, 43, 44, 44, 45, 45, 45, 45, 45, 46, 47,
47, 47, 47, 47, 48, 48, 48, 48,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 38, 42, 31, 31,
+ 31, 31, 31, 31, 31, 31, 33, 35, 37, 37, 37, 37, 39, 42, 31, 31, 31, 31,
+ 31, 31, 31, 32, 33, 35, 38, 38, 38, 38, 39, 42, 31, 31, 31, 31, 31, 31,
+ 31, 32, 34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32,
+ 34, 36, 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36,
+ 38, 38, 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38,
+ 38, 38, 40, 42, 31, 31, 31, 31, 31, 31, 31, 32, 34, 36, 38, 38, 38, 38,
+ 40, 42, 31, 31, 31, 31, 32, 32, 32, 32, 34, 36, 39, 39, 39, 39, 40, 42,
+ 30, 31, 31, 32, 32, 32, 32, 32, 34, 37, 39, 39, 39, 39, 40, 42, 30, 31,
+ 31, 32, 32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32,
+ 32, 32, 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32,
+ 32, 33, 35, 37, 40, 40, 40, 40, 41, 42, 30, 31, 31, 32, 32, 32, 32, 33,
+ 35, 37, 40, 40, 40, 40, 41, 42, 31, 31, 32, 32, 33, 33, 33, 33, 35, 38,
+ 40, 40, 40, 40, 41, 43, 32, 32, 33, 33, 34, 34, 34, 34, 36, 39, 41, 41,
+ 41, 41, 42, 44, 33, 33, 34, 35, 35, 35, 35, 35, 37, 40, 42, 42, 42, 42,
+ 43, 44, 33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45,
+ 33, 34, 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34,
+ 35, 35, 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 33, 34, 35, 35,
+ 36, 36, 36, 36, 38, 40, 43, 43, 43, 43, 44, 45, 34, 35, 36, 37, 37, 37,
+ 37, 37, 39, 42, 44, 44, 44, 44, 45, 45, 35, 36, 37, 38, 38, 38, 38, 39,
+ 41, 43, 45, 45, 45, 45, 46, 46, 36, 37, 38, 39, 39, 39, 39, 40, 42, 44,
+ 47, 47, 47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47,
+ 47, 47, 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47,
+ 47, 47, 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47,
+ 37, 38, 39, 40, 40, 40, 40, 41, 43, 45, 47, 47, 47, 47, 47, 47, 39, 39,
+ 40, 41, 41, 41, 41, 42, 43, 45, 47, 47, 47, 47, 47, 48, 40, 41, 41, 42,
+ 42, 42, 42, 42, 44, 45, 47, 47, 47, 47, 47, 48, 42, 42, 42, 43, 43, 43,
+ 43, 43, 44, 46, 47, 47, 47, 47, 48, 48, 42, 42, 42, 43, 43, 43, 43, 43,
+ 44, 46, 47, 47, 47, 47, 48, 48,
/* Size 4x16 */
- 31, 31, 35, 37, 31, 31, 35, 38, 31, 31, 36, 38, 31, 31, 36, 38, 31, 32,
- 36, 39, 31, 32, 37, 40, 31, 32, 37, 40, 31, 33, 38, 40, 33, 35, 40, 42,
- 34, 36, 40, 43, 34, 36, 40, 43, 36, 38, 43, 45, 38, 40, 45, 47, 38, 40,
- 45, 47, 39, 41, 45, 47, 42, 43, 46, 47,
- /* Size 16x4 */
31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 34, 36, 38, 38, 39, 42, 31, 31,
31, 31, 32, 32, 32, 33, 35, 36, 36, 38, 40, 40, 41, 43, 35, 35, 36, 36,
36, 37, 37, 38, 40, 40, 40, 43, 45, 45, 45, 46, 37, 38, 38, 38, 39, 40,
40, 40, 42, 43, 43, 45, 47, 47, 47, 47,
+ /* Size 16x4 */
+ 31, 31, 35, 37, 31, 31, 35, 38, 31, 31, 36, 38, 31, 31, 36, 38, 31, 32,
+ 36, 39, 31, 32, 37, 40, 31, 32, 37, 40, 31, 33, 38, 40, 33, 35, 40, 42,
+ 34, 36, 40, 43, 34, 36, 40, 43, 36, 38, 43, 45, 38, 40, 45, 47, 38, 40,
+ 45, 47, 39, 41, 45, 47, 42, 43, 46, 47,
/* Size 8x32 */
- 32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 37, 37, 39, 31, 31,
- 31, 31, 33, 38, 38, 39, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31,
- 34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38,
- 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32, 34, 39, 39, 40,
- 30, 31, 32, 32, 34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31,
- 32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32,
- 35, 40, 40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 32, 33, 34, 34, 36, 41,
- 41, 42, 33, 34, 35, 35, 37, 42, 42, 43, 33, 35, 36, 36, 38, 43, 43, 44,
- 33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 33, 35,
- 36, 36, 38, 43, 43, 44, 34, 36, 37, 37, 39, 44, 44, 45, 35, 37, 38, 38,
- 41, 45, 45, 46, 36, 38, 39, 39, 42, 47, 47, 47, 37, 39, 40, 40, 43, 47,
- 47, 47, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40, 43, 47, 47, 47,
- 37, 39, 40, 40, 43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 40, 41,
- 42, 42, 44, 47, 47, 47, 42, 42, 43, 43, 44, 47, 47, 48, 42, 42, 43, 43,
- 44, 47, 47, 48,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 32, 33, 33,
33, 33, 33, 34, 35, 36, 37, 37, 37, 37, 39, 40, 42, 42, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 35, 35, 35, 36,
@@ -5830,7 +5814,23 @@
38, 38, 39, 39, 40, 40, 40, 40, 40, 41, 42, 43, 43, 43, 43, 44, 45, 47,
47, 47, 47, 47, 47, 47, 47, 47, 38, 39, 39, 40, 40, 40, 40, 40, 40, 40,
41, 41, 41, 41, 41, 42, 43, 44, 44, 44, 44, 45, 46, 47, 47, 47, 47, 47,
- 47, 47, 48, 48 },
+ 47, 47, 48, 48,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 33, 37, 37, 38, 31, 31, 31, 31, 33, 37, 37, 39, 31, 31,
+ 31, 31, 33, 38, 38, 39, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31,
+ 34, 38, 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 31, 31, 34, 38,
+ 38, 40, 31, 31, 31, 31, 34, 38, 38, 40, 31, 31, 32, 32, 34, 39, 39, 40,
+ 30, 31, 32, 32, 34, 39, 39, 40, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31,
+ 32, 32, 35, 40, 40, 41, 30, 31, 32, 32, 35, 40, 40, 41, 30, 31, 32, 32,
+ 35, 40, 40, 41, 31, 32, 33, 33, 35, 40, 40, 41, 32, 33, 34, 34, 36, 41,
+ 41, 42, 33, 34, 35, 35, 37, 42, 42, 43, 33, 35, 36, 36, 38, 43, 43, 44,
+ 33, 35, 36, 36, 38, 43, 43, 44, 33, 35, 36, 36, 38, 43, 43, 44, 33, 35,
+ 36, 36, 38, 43, 43, 44, 34, 36, 37, 37, 39, 44, 44, 45, 35, 37, 38, 38,
+ 41, 45, 45, 46, 36, 38, 39, 39, 42, 47, 47, 47, 37, 39, 40, 40, 43, 47,
+ 47, 47, 37, 39, 40, 40, 43, 47, 47, 47, 37, 39, 40, 40, 43, 47, 47, 47,
+ 37, 39, 40, 40, 43, 47, 47, 47, 39, 40, 41, 41, 43, 47, 47, 47, 40, 41,
+ 42, 42, 44, 47, 47, 47, 42, 42, 43, 43, 44, 47, 47, 48, 42, 42, 43, 43,
+ 44, 47, 47, 48 },
},
{
{ /* Luma */
@@ -5916,21 +5916,12 @@
33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
/* Size 4x8 */
- 31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
- 32, 32, 31, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
- /* Size 8x4 */
31, 31, 31, 31, 31, 31, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+ /* Size 8x4 */
+ 31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+ 32, 32, 31, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
/* Size 8x16 */
- 32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
- 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
- 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
- 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
- 32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32,
- 33, 34,
- /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
@@ -5939,37 +5930,16 @@
32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34,
34, 34,
+ /* Size 16x8 */
+ 32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
+ 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+ 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+ 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+ 32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32,
+ 33, 34,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
- 33, 33, 33, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
- 34, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 33, 33, 33, 34, 34,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -5999,33 +5969,47 @@
32, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 34, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33,
34, 34, 34, 34, 34, 34, 34, 34,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 33, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33,
+ 33, 33, 33, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33,
+ 34, 34, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 33, 33, 33, 34, 34, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 33, 33, 33, 34, 34,
/* Size 4x16 */
- 31, 31, 31, 32, 31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
- 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
- 31, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32,
- 32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
- /* Size 16x4 */
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
+ /* Size 16x4 */
+ 31, 31, 31, 32, 31, 31, 31, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+ 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+ 31, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32,
+ 32, 33, 32, 32, 32, 33, 32, 32, 32, 33,
/* Size 8x32 */
- 32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
- 31, 31, 31, 31, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
- 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
- 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
- 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
- 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
- 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
- 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
- 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
- 32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34,
- 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32,
- 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32,
- 32, 32, 33, 34,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -6040,7 +6024,23 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34,
- 34, 34, 34, 34 },
+ 34, 34, 34, 34,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 31, 31, 31, 32, 31, 31, 31, 31, 31, 31, 32, 32, 31, 31,
+ 31, 31, 31, 31, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+ 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+ 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+ 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+ 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32,
+ 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32,
+ 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32,
+ 31, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 32, 31, 32,
+ 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 32, 32, 32, 33, 31, 32, 32, 32,
+ 32, 32, 33, 33, 31, 32, 32, 32, 32, 32, 33, 33, 31, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34,
+ 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32,
+ 32, 32, 32, 32, 33, 34, 32, 32, 32, 32, 32, 32, 33, 34, 32, 32, 32, 32,
+ 32, 32, 33, 34 },
{ /* Chroma */
/* Size 4x4 */
31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 32, 35, 34, 35, 35, 39,
@@ -6124,21 +6124,12 @@
39, 40, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 36, 36,
36, 36, 36, 36, 36, 37, 37, 38, 39, 40, 40, 40, 40, 40, 40, 40,
/* Size 4x8 */
- 31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32,
- 32, 36, 31, 33, 33, 37, 34, 36, 36, 40, 34, 36, 36, 40,
- /* Size 8x4 */
31, 31, 31, 31, 31, 31, 34, 34, 31, 31, 31, 32, 32, 33, 36, 36, 31, 31,
31, 32, 32, 33, 36, 36, 34, 35, 35, 36, 36, 37, 40, 40,
+ /* Size 8x4 */
+ 31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32,
+ 32, 36, 31, 33, 33, 37, 34, 36, 36, 40, 34, 36, 36, 40,
/* Size 8x16 */
- 32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31,
- 31, 31, 31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31,
- 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32,
- 34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32, 35, 38,
- 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 31, 32,
- 33, 33, 33, 33, 36, 39, 33, 34, 34, 35, 35, 35, 37, 40, 33, 34, 35, 36,
- 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36,
- 38, 41,
- /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 31, 33, 33, 33, 33, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 34, 34, 34, 34, 31, 31, 31, 31,
31, 31, 31, 31, 32, 32, 32, 33, 34, 35, 35, 35, 31, 31, 31, 31, 31, 31,
@@ -6147,37 +6138,16 @@
32, 33, 35, 36, 36, 36, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35, 36,
37, 38, 38, 38, 35, 36, 36, 37, 37, 37, 37, 38, 38, 38, 38, 39, 40, 41,
41, 41,
+ /* Size 16x8 */
+ 32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31,
+ 31, 31, 31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31,
+ 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32,
+ 34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32, 35, 38,
+ 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 31, 32,
+ 33, 33, 33, 33, 36, 39, 33, 34, 34, 35, 35, 35, 37, 40, 33, 34, 35, 36,
+ 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36,
+ 38, 41,
/* Size 16x32 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 36, 37, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 32, 33, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 32, 34, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33,
- 34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35,
- 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 32,
- 32, 32, 32, 32, 32, 33, 34, 36, 37, 39, 31, 31, 31, 31, 31, 32, 32, 32,
- 32, 32, 32, 33, 34, 36, 37, 39, 30, 31, 31, 31, 31, 32, 32, 32, 32, 32,
- 32, 33, 34, 36, 38, 39, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
- 35, 36, 38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36,
- 38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40,
- 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 31, 31, 31, 32, 32, 33,
- 33, 33, 33, 33, 33, 34, 35, 37, 38, 40, 31, 32, 32, 33, 33, 33, 33, 33,
- 33, 33, 33, 35, 36, 37, 39, 41, 32, 32, 33, 33, 34, 34, 34, 34, 34, 34,
- 34, 35, 37, 38, 40, 41, 33, 33, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36,
- 37, 39, 40, 42, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40,
- 41, 43, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43,
- 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34,
- 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35,
- 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35, 35, 36,
- 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 34, 34, 35, 35, 36, 36, 36, 36,
- 36, 36, 36, 38, 39, 40, 42, 44,
- /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
30, 30, 30, 31, 31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6207,33 +6177,47 @@
38, 38, 39, 40, 40, 41, 41, 41, 41, 41, 41, 42, 37, 37, 37, 38, 38, 38,
38, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41,
42, 43, 43, 43, 43, 43, 43, 44,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 35, 37, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 36, 37, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 32, 33, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 32, 34, 35, 36, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33,
+ 34, 35, 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35,
+ 37, 38, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 33, 34, 35, 37, 38, 31, 31, 31, 31, 31, 32,
+ 32, 32, 32, 32, 32, 33, 34, 36, 37, 39, 31, 31, 31, 31, 31, 32, 32, 32,
+ 32, 32, 32, 33, 34, 36, 37, 39, 30, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+ 32, 33, 34, 36, 38, 39, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33,
+ 35, 36, 38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36,
+ 38, 40, 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40,
+ 30, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 30, 31, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 33, 35, 36, 38, 40, 31, 31, 31, 32, 32, 33,
+ 33, 33, 33, 33, 33, 34, 35, 37, 38, 40, 31, 32, 32, 33, 33, 33, 33, 33,
+ 33, 33, 33, 35, 36, 37, 39, 41, 32, 32, 33, 33, 34, 34, 34, 34, 34, 34,
+ 34, 35, 37, 38, 40, 41, 33, 33, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36,
+ 37, 39, 40, 42, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40,
+ 41, 43, 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43,
+ 33, 34, 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34,
+ 34, 35, 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35,
+ 35, 36, 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 33, 34, 34, 35, 35, 36,
+ 36, 36, 36, 36, 36, 37, 38, 40, 41, 43, 34, 34, 35, 35, 36, 36, 36, 36,
+ 36, 36, 36, 38, 39, 40, 42, 44,
/* Size 4x16 */
- 31, 31, 31, 34, 31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 31,
- 31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32, 32, 36, 31, 32, 32, 36,
- 31, 32, 32, 36, 31, 32, 32, 36, 32, 33, 33, 37, 33, 35, 35, 39, 34, 36,
- 36, 40, 34, 36, 36, 40, 34, 36, 36, 40,
- /* Size 16x4 */
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 33, 34, 34, 34, 31, 31,
31, 31, 31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 31, 31, 31, 31,
31, 31, 32, 32, 32, 32, 32, 33, 35, 36, 36, 36, 34, 34, 35, 35, 35, 35,
36, 36, 36, 36, 36, 37, 39, 40, 40, 40,
+ /* Size 16x4 */
+ 31, 31, 31, 34, 31, 31, 31, 34, 31, 31, 31, 35, 31, 31, 31, 35, 31, 31,
+ 31, 35, 31, 31, 31, 35, 31, 32, 32, 36, 31, 32, 32, 36, 31, 32, 32, 36,
+ 31, 32, 32, 36, 31, 32, 32, 36, 32, 33, 33, 37, 33, 35, 35, 39, 34, 36,
+ 36, 40, 34, 36, 36, 40, 34, 36, 36, 40,
/* Size 8x32 */
- 32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 35, 31, 31,
- 31, 31, 31, 31, 33, 36, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31, 31, 31,
- 31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31,
- 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37,
- 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31,
- 31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32, 34, 37, 31, 31, 31, 32,
- 32, 32, 34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32,
- 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38,
- 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31,
- 32, 32, 32, 32, 35, 38, 31, 31, 32, 33, 33, 33, 35, 38, 31, 32, 33, 33,
- 33, 33, 36, 39, 32, 33, 34, 34, 34, 34, 37, 40, 33, 34, 34, 35, 35, 35,
- 37, 40, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41,
- 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34,
- 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 34, 35, 36, 36,
- 36, 36, 39, 42,
- /* Size 32x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
30, 30, 30, 31, 31, 32, 33, 33, 33, 33, 33, 33, 33, 34, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6248,7 +6232,23 @@
34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35, 36, 37,
37, 38, 38, 38, 38, 38, 38, 39, 35, 35, 36, 36, 36, 37, 37, 37, 37, 37,
37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 39, 40, 40, 41, 41, 41,
- 41, 41, 41, 42 },
+ 41, 41, 41, 42,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 31, 31, 33, 35, 31, 31, 31, 31, 31, 31, 33, 35, 31, 31,
+ 31, 31, 31, 31, 33, 36, 31, 31, 31, 31, 31, 31, 33, 36, 31, 31, 31, 31,
+ 31, 31, 34, 36, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31,
+ 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37,
+ 31, 31, 31, 31, 31, 31, 34, 37, 31, 31, 31, 31, 31, 31, 34, 37, 31, 31,
+ 31, 31, 31, 31, 34, 37, 31, 31, 31, 32, 32, 32, 34, 37, 31, 31, 31, 32,
+ 32, 32, 34, 37, 30, 31, 31, 32, 32, 32, 34, 38, 30, 31, 32, 32, 32, 32,
+ 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38,
+ 30, 31, 32, 32, 32, 32, 35, 38, 30, 31, 32, 32, 32, 32, 35, 38, 30, 31,
+ 32, 32, 32, 32, 35, 38, 31, 31, 32, 33, 33, 33, 35, 38, 31, 32, 33, 33,
+ 33, 33, 36, 39, 32, 33, 34, 34, 34, 34, 37, 40, 33, 34, 34, 35, 35, 35,
+ 37, 40, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41,
+ 33, 34, 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 33, 34,
+ 35, 36, 36, 36, 38, 41, 33, 34, 35, 36, 36, 36, 38, 41, 34, 35, 36, 36,
+ 36, 36, 39, 42 },
},
{
{ /* Luma */
@@ -6334,22 +6334,13 @@
32, 32, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
/* Size 4x8 */
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
- 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
- /* Size 8x4 */
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+ /* Size 8x4 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+ 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
/* Size 8x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 32,
- 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32,
- /* Size 16x8 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
@@ -6357,37 +6348,16 @@
32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32,
- /* Size 16x32 */
+ /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
- 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 32x16 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 31, 31, 31, 32,
+ 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+ 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32,
+ /* Size 16x32 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6417,35 +6387,49 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 32x16 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
+ 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32,
/* Size 4x16 */
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 31, 32,
- 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
- 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
- 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
- /* Size 16x4 */
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 16x4 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 31, 32,
+ 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
+ 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31, 32,
+ 32, 32, 31, 32, 32, 32, 31, 32, 32, 32,
/* Size 8x32 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
- 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
- 32, 32, 32, 32,
- /* Size 32x8 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -6458,6 +6442,22 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+ 32, 32, 31, 31, 31, 32, 32, 32, 32, 32, 31, 31, 31, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+ 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
+ 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32, 32, 32, 31, 31, 32, 32,
32, 32, 32, 32 },
{ /* Chroma */
/* Size 4x4 */
@@ -6542,21 +6542,12 @@
32, 32, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
/* Size 4x8 */
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 30, 31, 32, 32,
- /* Size 8x4 */
31, 31, 31, 31, 31, 31, 31, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 32, 32, 31, 31, 31, 31, 31, 31, 32, 32,
+ /* Size 8x4 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 30, 31, 32, 32,
/* Size 8x16 */
- 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31,
- 31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 32, 32,
- 32, 32,
- /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6565,37 +6556,16 @@
31, 31, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
32, 32,
- /* Size 16x32 */
+ /* Size 16x8 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
- 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 31,
- 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31,
- 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31,
- 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31, 31, 31,
- 32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 32x16 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31,
+ 31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 32, 32,
+ 32, 32,
+ /* Size 16x32 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6625,17 +6595,7 @@
31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 4x16 */
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 31, 31,
- 32, 32, 31, 31, 32, 32, 30, 31, 32, 32,
- /* Size 16x4 */
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
- /* Size 8x32 */
+ /* Size 32x16 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6646,12 +6606,36 @@
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
- 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32,
- 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32,
- 30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31,
- 31, 31, 32, 32, 32, 32, 30, 31, 31, 31, 32, 32, 32, 32, 30, 31, 31, 31,
- 32, 32, 32, 32,
- /* Size 32x8 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32,
+ 30, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 31,
+ 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31,
+ 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31,
+ 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 31, 31, 31, 31, 31, 31,
+ 32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 4x16 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 32, 32, 32, 32,
+ /* Size 16x4 */
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 31, 31,
+ 32, 32, 31, 31, 32, 32, 30, 31, 32, 32,
+ /* Size 8x32 */
32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
@@ -6666,6 +6650,22 @@
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32,
32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32,
+ /* Size 32x8 */
+ 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
+ 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32,
+ 32, 32, 31, 31, 31, 31, 31, 32, 32, 32, 31, 31, 31, 31, 31, 32, 32, 32,
+ 30, 31, 31, 31, 31, 32, 32, 32, 30, 31, 31, 31, 31, 32, 32, 32, 30, 31,
+ 31, 31, 32, 32, 32, 32, 30, 31, 31, 31, 32, 32, 32, 32, 30, 31, 31, 31,
32, 32, 32, 32 },
},
};
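
As an aside for readers of the tables above: each /* Size NxM */ block is a
row-major matrix of per-coefficient quantization-matrix weights. Below is a
minimal C sketch, assuming libaom's usual 5-bit fixed-point convention
(AOM_QM_BITS == 5, so a weight of 32 is unity), of how one such weight could
scale a dequantization step. The helper and constant names here are
hypothetical illustrations, not the library's API.

#include <stdint.h>

#define QM_BITS 5 /* assumed weight precision: a weight of 32 means 1.0x */

/* Hypothetical helper: apply one row-major matrix weight to a dequant
 * step, with round-to-nearest on the fixed-point shift. A weight of 32
 * leaves the step unchanged; 47 scales it by ~1.47x; 31 by ~0.97x. */
static int32_t apply_qm_weight(int32_t dq_step, uint8_t weight) {
  return (int32_t)(((int64_t)dq_step * weight + (1 << (QM_BITS - 1))) >>
                   QM_BITS);
}

int main(void) {
  /* Example: step 100 with weight 47 -> (100 * 47 + 16) >> 5 == 147. */
  return apply_qm_weight(100, 47) == 147 ? 0 : 1;
}

Under this convention, the 30-48 range seen in the weight tables above
corresponds to per-coefficient scale factors of roughly 0.94x to 1.5x.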
@@ -6748,20 +6748,12 @@
4, 4, 8, 9, 9, 9, 9, 9, 9, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6,
6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4,
/* Size 4x8 */
- 32, 24, 14, 11, 31, 24, 15, 12, 28, 18, 12, 11, 21, 14, 10, 9, 16, 12,
- 8, 8, 13, 11, 7, 7, 11, 10, 7, 6, 10, 9, 7, 5,
- /* Size 8x4 */
32, 31, 28, 21, 16, 13, 11, 10, 24, 24, 18, 14, 12, 11, 10, 9, 14, 15,
12, 10, 8, 7, 7, 7, 11, 12, 11, 9, 8, 7, 6, 5,
+ /* Size 8x4 */
+ 32, 24, 14, 11, 31, 24, 15, 12, 28, 18, 12, 11, 21, 14, 10, 9, 16, 12,
+ 8, 8, 13, 11, 7, 7, 11, 10, 7, 6, 10, 9, 7, 5,
/* Size 8x16 */
- 32, 32, 28, 19, 16, 12, 11, 10, 33, 31, 30, 21, 17, 13, 12, 11, 32, 30,
- 28, 20, 17, 13, 12, 12, 30, 28, 24, 19, 16, 13, 13, 12, 28, 27, 21, 17,
- 15, 12, 12, 11, 23, 24, 19, 14, 13, 11, 11, 11, 21, 22, 18, 13, 12, 10,
- 10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 16, 18, 15, 11, 10, 8, 8, 8, 13,
- 15, 13, 10, 9, 7, 8, 8, 12, 14, 13, 10, 8, 7, 7, 7, 11, 13, 12, 10, 8,
- 7, 6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 6, 9, 10,
- 10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5,
- /* Size 16x8 */
32, 33, 32, 30, 28, 23, 21, 18, 16, 13, 12, 11, 11, 10, 9, 9, 32, 31,
30, 28, 27, 24, 22, 19, 18, 15, 14, 13, 12, 11, 10, 10, 28, 30, 28, 24,
21, 19, 18, 16, 15, 13, 13, 12, 11, 10, 10, 10, 19, 21, 20, 19, 17, 14,
@@ -6769,34 +6761,15 @@
9, 8, 8, 8, 8, 7, 8, 12, 13, 13, 13, 12, 11, 10, 9, 8, 7, 7, 7, 7, 7, 6,
7, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 12,
12, 11, 11, 10, 9, 8, 8, 7, 6, 6, 6, 5, 5,
+ /* Size 16x8 */
+ 32, 32, 28, 19, 16, 12, 11, 10, 33, 31, 30, 21, 17, 13, 12, 11, 32, 30,
+ 28, 20, 17, 13, 12, 12, 30, 28, 24, 19, 16, 13, 13, 12, 28, 27, 21, 17,
+ 15, 12, 12, 11, 23, 24, 19, 14, 13, 11, 11, 11, 21, 22, 18, 13, 12, 10,
+ 10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 16, 18, 15, 11, 10, 8, 8, 8, 13,
+ 15, 13, 10, 9, 7, 8, 8, 12, 14, 13, 10, 8, 7, 7, 7, 11, 13, 12, 10, 8,
+ 7, 6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 6, 9, 10,
+ 10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5,
/* Size 16x32 */
- 32, 33, 32, 30, 28, 23, 19, 17, 16, 13, 12, 11, 11, 11, 10, 10, 33, 32,
- 32, 30, 29, 24, 20, 18, 17, 14, 12, 12, 12, 11, 11, 11, 33, 32, 31, 31,
- 30, 25, 21, 19, 17, 14, 13, 12, 12, 11, 11, 11, 33, 32, 31, 30, 29, 25,
- 21, 19, 17, 14, 13, 13, 12, 12, 11, 11, 32, 32, 30, 29, 28, 24, 20, 19,
- 17, 14, 13, 13, 12, 12, 12, 11, 32, 31, 29, 28, 27, 24, 21, 19, 18, 15,
- 14, 13, 12, 12, 12, 11, 30, 30, 28, 26, 24, 21, 19, 18, 16, 14, 13, 13,
- 13, 12, 12, 11, 29, 30, 28, 25, 23, 20, 18, 17, 16, 13, 12, 12, 12, 12,
- 12, 11, 28, 30, 27, 24, 21, 19, 17, 16, 15, 13, 12, 12, 12, 12, 11, 11,
- 26, 28, 26, 23, 20, 18, 16, 15, 14, 12, 12, 12, 11, 11, 11, 11, 23, 25,
- 24, 21, 19, 16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 11, 22, 24, 23, 21,
- 19, 16, 14, 13, 12, 11, 10, 10, 10, 10, 10, 10, 21, 23, 22, 20, 18, 15,
- 13, 13, 12, 11, 10, 10, 10, 10, 10, 10, 19, 21, 20, 19, 17, 14, 12, 12,
- 11, 10, 9, 10, 10, 9, 10, 9, 18, 19, 19, 18, 16, 14, 12, 11, 10, 9, 9,
- 9, 9, 9, 9, 9, 17, 18, 18, 17, 16, 13, 12, 11, 10, 9, 9, 9, 9, 9, 9, 9,
- 16, 17, 18, 16, 15, 13, 11, 10, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 15,
- 14, 12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 10, 9,
- 9, 8, 7, 8, 8, 8, 8, 8, 13, 14, 14, 14, 13, 11, 10, 9, 9, 8, 7, 7, 7, 7,
- 7, 7, 12, 14, 14, 13, 13, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13,
- 13, 12, 11, 9, 9, 8, 7, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 10, 10, 9,
- 8, 7, 7, 7, 6, 6, 6, 7, 11, 12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 6, 6, 6,
- 6, 6, 11, 12, 12, 12, 11, 10, 10, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 12,
- 12, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9,
- 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 5,
- 5, 5, 9, 10, 10, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 10, 10, 10,
- 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 9, 10, 10, 10, 9, 9, 8, 8, 7, 7,
- 6, 6, 5, 5, 5, 8, 9, 9, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 5,
- /* Size 32x16 */
32, 33, 33, 33, 32, 32, 30, 29, 28, 26, 23, 22, 21, 19, 18, 17, 16, 14,
13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 33, 32, 32, 32, 32,
31, 30, 30, 30, 28, 25, 24, 23, 21, 19, 18, 17, 16, 14, 14, 14, 13, 12,
@@ -6824,32 +6797,44 @@
8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 10, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 5,
5, 5, 5, 5,
+ /* Size 32x16 */
+ 32, 33, 32, 30, 28, 23, 19, 17, 16, 13, 12, 11, 11, 11, 10, 10, 33, 32,
+ 32, 30, 29, 24, 20, 18, 17, 14, 12, 12, 12, 11, 11, 11, 33, 32, 31, 31,
+ 30, 25, 21, 19, 17, 14, 13, 12, 12, 11, 11, 11, 33, 32, 31, 30, 29, 25,
+ 21, 19, 17, 14, 13, 13, 12, 12, 11, 11, 32, 32, 30, 29, 28, 24, 20, 19,
+ 17, 14, 13, 13, 12, 12, 12, 11, 32, 31, 29, 28, 27, 24, 21, 19, 18, 15,
+ 14, 13, 12, 12, 12, 11, 30, 30, 28, 26, 24, 21, 19, 18, 16, 14, 13, 13,
+ 13, 12, 12, 11, 29, 30, 28, 25, 23, 20, 18, 17, 16, 13, 12, 12, 12, 12,
+ 12, 11, 28, 30, 27, 24, 21, 19, 17, 16, 15, 13, 12, 12, 12, 12, 11, 11,
+ 26, 28, 26, 23, 20, 18, 16, 15, 14, 12, 12, 12, 11, 11, 11, 11, 23, 25,
+ 24, 21, 19, 16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 11, 22, 24, 23, 21,
+ 19, 16, 14, 13, 12, 11, 10, 10, 10, 10, 10, 10, 21, 23, 22, 20, 18, 15,
+ 13, 13, 12, 11, 10, 10, 10, 10, 10, 10, 19, 21, 20, 19, 17, 14, 12, 12,
+ 11, 10, 9, 10, 10, 9, 10, 9, 18, 19, 19, 18, 16, 14, 12, 11, 10, 9, 9,
+ 9, 9, 9, 9, 9, 17, 18, 18, 17, 16, 13, 12, 11, 10, 9, 9, 9, 9, 9, 9, 9,
+ 16, 17, 18, 16, 15, 13, 11, 10, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 15,
+ 14, 12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 10, 9,
+ 9, 8, 7, 8, 8, 8, 8, 8, 13, 14, 14, 14, 13, 11, 10, 9, 9, 8, 7, 7, 7, 7,
+ 7, 7, 12, 14, 14, 13, 13, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13,
+ 13, 12, 11, 9, 9, 8, 7, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 10, 10, 9,
+ 8, 7, 7, 7, 6, 6, 6, 7, 11, 12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 6, 6, 6,
+ 6, 6, 11, 12, 12, 12, 11, 10, 10, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 12,
+ 12, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9,
+ 8, 7, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 5,
+ 5, 5, 9, 10, 10, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 10, 10, 10,
+ 10, 9, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 9, 9, 10, 10, 10, 9, 9, 8, 8, 7, 7,
+ 6, 6, 5, 5, 5, 8, 9, 9, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 5,
/* Size 4x16 */
- 33, 23, 13, 11, 32, 25, 14, 11, 32, 24, 14, 12, 30, 21, 14, 12, 30, 19,
- 13, 12, 25, 16, 11, 11, 23, 15, 11, 10, 19, 14, 9, 9, 17, 13, 9, 8, 14,
- 11, 8, 8, 14, 11, 8, 7, 12, 10, 7, 6, 12, 10, 7, 6, 11, 10, 7, 6, 10, 9,
- 7, 5, 9, 9, 7, 5,
- /* Size 16x4 */
33, 32, 32, 30, 30, 25, 23, 19, 17, 14, 14, 12, 12, 11, 10, 9, 23, 25,
24, 21, 19, 16, 15, 14, 13, 11, 11, 10, 10, 10, 9, 9, 13, 14, 14, 14,
13, 11, 11, 9, 9, 8, 8, 7, 7, 7, 7, 7, 11, 11, 12, 12, 12, 11, 10, 9, 8,
8, 7, 6, 6, 6, 5, 5,
+ /* Size 16x4 */
+ 33, 23, 13, 11, 32, 25, 14, 11, 32, 24, 14, 12, 30, 21, 14, 12, 30, 19,
+ 13, 12, 25, 16, 11, 11, 23, 15, 11, 10, 19, 14, 9, 9, 17, 13, 9, 8, 14,
+ 11, 8, 8, 14, 11, 8, 7, 12, 10, 7, 6, 12, 10, 7, 6, 11, 10, 7, 6, 10, 9,
+ 7, 5, 9, 9, 7, 5,
/* Size 8x32 */
- 32, 32, 28, 19, 16, 12, 11, 10, 33, 32, 29, 20, 17, 12, 12, 11, 33, 31,
- 30, 21, 17, 13, 12, 11, 33, 31, 29, 21, 17, 13, 12, 11, 32, 30, 28, 20,
- 17, 13, 12, 12, 32, 29, 27, 21, 18, 14, 12, 12, 30, 28, 24, 19, 16, 13,
- 13, 12, 29, 28, 23, 18, 16, 12, 12, 12, 28, 27, 21, 17, 15, 12, 12, 11,
- 26, 26, 20, 16, 14, 12, 11, 11, 23, 24, 19, 14, 13, 11, 11, 11, 22, 23,
- 19, 14, 12, 10, 10, 10, 21, 22, 18, 13, 12, 10, 10, 10, 19, 20, 17, 12,
- 11, 9, 10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 17, 18, 16, 12, 10, 9, 9, 9,
- 16, 18, 15, 11, 10, 8, 8, 8, 14, 16, 14, 11, 9, 8, 8, 8, 13, 15, 13, 10,
- 9, 7, 8, 8, 13, 14, 13, 10, 9, 7, 7, 7, 12, 14, 13, 10, 8, 7, 7, 7, 12,
- 13, 12, 9, 8, 7, 7, 7, 11, 13, 12, 10, 8, 7, 6, 6, 11, 12, 11, 10, 8, 7,
- 6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11, 10,
- 9, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9, 7, 6, 6, 5, 9,
- 10, 10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 8, 9, 10, 9, 8, 7, 6,
- 5,
- /* Size 32x8 */
32, 33, 33, 33, 32, 32, 30, 29, 28, 26, 23, 22, 21, 19, 18, 17, 16, 14,
13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9, 8, 32, 32, 31, 31, 30,
29, 28, 28, 27, 26, 24, 23, 22, 20, 19, 18, 18, 16, 15, 14, 14, 13, 13,
@@ -6863,7 +6848,22 @@
11, 12, 12, 12, 12, 12, 13, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8,
7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 10, 11, 11, 11, 12, 12, 12, 12,
11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5,
- 5, 5 },
+ 5, 5,
+ /* Size 32x8 */
+ 32, 32, 28, 19, 16, 12, 11, 10, 33, 32, 29, 20, 17, 12, 12, 11, 33, 31,
+ 30, 21, 17, 13, 12, 11, 33, 31, 29, 21, 17, 13, 12, 11, 32, 30, 28, 20,
+ 17, 13, 12, 12, 32, 29, 27, 21, 18, 14, 12, 12, 30, 28, 24, 19, 16, 13,
+ 13, 12, 29, 28, 23, 18, 16, 12, 12, 12, 28, 27, 21, 17, 15, 12, 12, 11,
+ 26, 26, 20, 16, 14, 12, 11, 11, 23, 24, 19, 14, 13, 11, 11, 11, 22, 23,
+ 19, 14, 12, 10, 10, 10, 21, 22, 18, 13, 12, 10, 10, 10, 19, 20, 17, 12,
+ 11, 9, 10, 10, 18, 19, 16, 12, 10, 9, 9, 9, 17, 18, 16, 12, 10, 9, 9, 9,
+ 16, 18, 15, 11, 10, 8, 8, 8, 14, 16, 14, 11, 9, 8, 8, 8, 13, 15, 13, 10,
+ 9, 7, 8, 8, 13, 14, 13, 10, 9, 7, 7, 7, 12, 14, 13, 10, 8, 7, 7, 7, 12,
+ 13, 12, 9, 8, 7, 7, 7, 11, 13, 12, 10, 8, 7, 6, 6, 11, 12, 11, 10, 8, 7,
+ 6, 6, 11, 12, 11, 10, 8, 7, 6, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11, 10,
+ 9, 8, 7, 6, 6, 10, 11, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9, 7, 6, 6, 5, 9,
+ 10, 10, 9, 7, 6, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 8, 9, 10, 9, 8, 7, 6,
+ 5 },
{ /* Chroma */
/* Size 4x4 */
29, 22, 18, 16, 22, 17, 15, 14, 18, 15, 11, 11, 16, 14, 11, 9,
@@ -6947,21 +6947,12 @@
15, 15, 15, 15, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11,
11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
/* Size 4x8 */
- 33, 22, 17, 16, 26, 23, 19, 17, 22, 18, 16, 16, 21, 17, 14, 14, 19, 16,
- 12, 12, 17, 15, 11, 11, 16, 15, 11, 10, 15, 14, 12, 10,
- /* Size 8x4 */
33, 26, 22, 21, 19, 17, 16, 15, 22, 23, 18, 17, 16, 15, 15, 14, 17, 19,
16, 14, 12, 11, 11, 12, 16, 17, 16, 14, 12, 11, 10, 10,
+ /* Size 8x4 */
+ 33, 22, 17, 16, 26, 23, 19, 17, 22, 18, 16, 16, 21, 17, 14, 14, 19, 16,
+ 12, 12, 17, 15, 11, 11, 16, 15, 11, 10, 15, 14, 12, 10,
/* Size 8x16 */
- 32, 28, 21, 20, 18, 16, 15, 14, 34, 26, 22, 21, 20, 17, 16, 16, 31, 24,
- 22, 22, 20, 17, 17, 16, 24, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19,
- 18, 17, 17, 17, 21, 22, 19, 17, 16, 15, 16, 16, 20, 22, 19, 16, 15, 14,
- 14, 15, 19, 21, 19, 15, 14, 13, 13, 14, 18, 20, 18, 15, 13, 12, 13, 13,
- 16, 19, 17, 14, 12, 11, 12, 12, 16, 18, 17, 14, 12, 11, 11, 12, 15, 17,
- 16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14,
- 12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 13, 15, 15, 14, 12, 11,
- 10, 9,
- /* Size 16x8 */
32, 34, 31, 24, 21, 21, 20, 19, 18, 16, 16, 15, 15, 14, 14, 13, 28, 26,
24, 22, 21, 22, 22, 21, 20, 19, 18, 17, 17, 16, 15, 15, 21, 22, 22, 20,
19, 19, 19, 19, 18, 17, 17, 16, 16, 16, 16, 15, 20, 21, 22, 20, 19, 17,
@@ -6970,37 +6961,16 @@
11, 11, 11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 14, 13, 13, 12, 11, 10,
10, 10, 10, 10, 14, 16, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 10, 10,
10, 9,
+ /* Size 16x8 */
+ 32, 28, 21, 20, 18, 16, 15, 14, 34, 26, 22, 21, 20, 17, 16, 16, 31, 24,
+ 22, 22, 20, 17, 17, 16, 24, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19,
+ 18, 17, 17, 17, 21, 22, 19, 17, 16, 15, 16, 16, 20, 22, 19, 16, 15, 14,
+ 14, 15, 19, 21, 19, 15, 14, 13, 13, 14, 18, 20, 18, 15, 13, 12, 13, 13,
+ 16, 19, 17, 14, 12, 11, 12, 12, 16, 18, 17, 14, 12, 11, 11, 12, 15, 17,
+ 16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14,
+ 12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 13, 15, 15, 14, 12, 11,
+ 10, 9,
/* Size 16x32 */
- 32, 33, 28, 24, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 14, 14, 33, 33,
- 27, 24, 22, 22, 20, 20, 19, 17, 16, 16, 16, 16, 15, 15, 34, 32, 26, 24,
- 22, 23, 21, 20, 20, 18, 17, 17, 16, 16, 16, 15, 32, 30, 25, 23, 22, 23,
- 21, 21, 20, 18, 17, 17, 17, 16, 16, 16, 31, 28, 24, 23, 22, 22, 22, 21,
- 20, 18, 17, 17, 17, 17, 16, 16, 28, 26, 22, 22, 22, 23, 22, 21, 20, 19,
- 18, 18, 17, 17, 17, 16, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 17, 18,
- 17, 17, 17, 16, 23, 23, 22, 21, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17,
- 17, 17, 21, 22, 21, 20, 19, 19, 19, 19, 18, 17, 17, 16, 17, 16, 17, 17,
- 21, 22, 22, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 16, 21, 23,
- 22, 21, 19, 18, 17, 17, 16, 15, 15, 15, 16, 16, 16, 16, 21, 22, 22, 21,
- 19, 17, 17, 16, 16, 15, 14, 15, 15, 15, 15, 15, 20, 22, 22, 20, 19, 17,
- 16, 16, 15, 14, 14, 14, 14, 15, 15, 15, 20, 21, 22, 20, 19, 17, 16, 15,
- 14, 14, 13, 14, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 15, 14, 14, 13,
- 13, 13, 13, 14, 14, 14, 19, 20, 21, 20, 18, 16, 15, 14, 14, 13, 12, 13,
- 13, 13, 13, 13, 18, 20, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 13, 13,
- 13, 13, 17, 19, 20, 19, 18, 16, 14, 14, 13, 12, 12, 12, 12, 12, 13, 13,
- 16, 18, 19, 18, 17, 15, 14, 13, 12, 12, 11, 12, 12, 12, 12, 13, 16, 18,
- 19, 18, 17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 16, 17, 18, 18,
- 17, 15, 14, 13, 12, 11, 11, 11, 11, 11, 12, 12, 15, 17, 18, 17, 16, 15,
- 13, 13, 12, 11, 11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 14, 14, 13,
- 12, 11, 11, 11, 10, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 12, 12,
- 11, 10, 10, 10, 11, 11, 15, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11,
- 10, 10, 10, 11, 14, 16, 16, 17, 15, 15, 14, 13, 12, 11, 11, 10, 10, 10,
- 10, 10, 14, 16, 16, 17, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10,
- 14, 16, 16, 16, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10, 14, 15,
- 15, 16, 16, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16,
- 16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 16, 15, 14,
- 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 15, 15, 14, 14, 13,
- 13, 11, 11, 10, 10, 9, 9, 9,
- /* Size 32x16 */
32, 33, 34, 32, 31, 28, 24, 23, 21, 21, 21, 21, 20, 20, 19, 19, 18, 17,
16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 33, 33, 32, 30,
28, 26, 24, 23, 22, 22, 23, 22, 22, 21, 20, 20, 20, 19, 18, 18, 17, 17,
@@ -7030,33 +7000,47 @@
12, 11, 11, 11, 10, 10, 10, 10, 10, 9, 9, 9, 14, 15, 15, 16, 16, 16, 16,
17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11,
10, 10, 10, 10, 9, 9, 9,
+ /* Size 32x16 */
+ 32, 33, 28, 24, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 14, 14, 33, 33,
+ 27, 24, 22, 22, 20, 20, 19, 17, 16, 16, 16, 16, 15, 15, 34, 32, 26, 24,
+ 22, 23, 21, 20, 20, 18, 17, 17, 16, 16, 16, 15, 32, 30, 25, 23, 22, 23,
+ 21, 21, 20, 18, 17, 17, 17, 16, 16, 16, 31, 28, 24, 23, 22, 22, 22, 21,
+ 20, 18, 17, 17, 17, 17, 16, 16, 28, 26, 22, 22, 22, 23, 22, 21, 20, 19,
+ 18, 18, 17, 17, 17, 16, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 17, 18,
+ 17, 17, 17, 16, 23, 23, 22, 21, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17,
+ 17, 17, 21, 22, 21, 20, 19, 19, 19, 19, 18, 17, 17, 16, 17, 16, 17, 17,
+ 21, 22, 22, 20, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 16, 21, 23,
+ 22, 21, 19, 18, 17, 17, 16, 15, 15, 15, 16, 16, 16, 16, 21, 22, 22, 21,
+ 19, 17, 17, 16, 16, 15, 14, 15, 15, 15, 15, 15, 20, 22, 22, 20, 19, 17,
+ 16, 16, 15, 14, 14, 14, 14, 15, 15, 15, 20, 21, 22, 20, 19, 17, 16, 15,
+ 14, 14, 13, 14, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 15, 14, 14, 13,
+ 13, 13, 13, 14, 14, 14, 19, 20, 21, 20, 18, 16, 15, 14, 14, 13, 12, 13,
+ 13, 13, 13, 13, 18, 20, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 13, 13,
+ 13, 13, 17, 19, 20, 19, 18, 16, 14, 14, 13, 12, 12, 12, 12, 12, 13, 13,
+ 16, 18, 19, 18, 17, 15, 14, 13, 12, 12, 11, 12, 12, 12, 12, 13, 16, 18,
+ 19, 18, 17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 16, 17, 18, 18,
+ 17, 15, 14, 13, 12, 11, 11, 11, 11, 11, 12, 12, 15, 17, 18, 17, 16, 15,
+ 13, 13, 12, 11, 11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 14, 14, 13,
+ 12, 11, 11, 11, 10, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 12, 12,
+ 11, 10, 10, 10, 11, 11, 15, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11,
+ 10, 10, 10, 11, 14, 16, 16, 17, 15, 15, 14, 13, 12, 11, 11, 10, 10, 10,
+ 10, 10, 14, 16, 16, 17, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10,
+ 14, 16, 16, 16, 16, 15, 14, 13, 12, 12, 11, 10, 10, 10, 10, 10, 14, 15,
+ 15, 16, 16, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16,
+ 16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 16, 15, 14,
+ 14, 13, 12, 12, 11, 11, 10, 10, 9, 9, 13, 15, 15, 15, 15, 14, 14, 13,
+ 13, 11, 11, 10, 10, 9, 9, 9,
/* Size 4x16 */
- 33, 21, 16, 15, 32, 23, 18, 16, 28, 22, 18, 17, 24, 21, 18, 17, 22, 19,
- 17, 16, 23, 18, 15, 16, 22, 17, 14, 15, 20, 17, 13, 14, 20, 16, 12, 13,
- 18, 15, 12, 12, 17, 15, 11, 11, 17, 14, 11, 11, 16, 15, 12, 10, 16, 15,
- 12, 10, 15, 15, 12, 10, 15, 14, 12, 10,
- /* Size 16x4 */
33, 32, 28, 24, 22, 23, 22, 20, 20, 18, 17, 17, 16, 16, 15, 15, 21, 23,
22, 21, 19, 18, 17, 17, 16, 15, 15, 14, 15, 15, 15, 14, 16, 18, 18, 18,
17, 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 12, 15, 16, 17, 17, 16, 16,
15, 14, 13, 12, 11, 11, 10, 10, 10, 10,
+ /* Size 16x4 */
+ 33, 21, 16, 15, 32, 23, 18, 16, 28, 22, 18, 17, 24, 21, 18, 17, 22, 19,
+ 17, 16, 23, 18, 15, 16, 22, 17, 14, 15, 20, 17, 13, 14, 20, 16, 12, 13,
+ 18, 15, 12, 12, 17, 15, 11, 11, 17, 14, 11, 11, 16, 15, 12, 10, 16, 15,
+ 12, 10, 15, 15, 12, 10, 15, 14, 12, 10,
/* Size 8x32 */
- 32, 28, 21, 20, 18, 16, 15, 14, 33, 27, 22, 20, 19, 16, 16, 15, 34, 26,
- 22, 21, 20, 17, 16, 16, 32, 25, 22, 21, 20, 17, 17, 16, 31, 24, 22, 22,
- 20, 17, 17, 16, 28, 22, 22, 22, 20, 18, 17, 17, 24, 22, 20, 20, 19, 17,
- 17, 17, 23, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
- 21, 22, 19, 18, 17, 16, 16, 16, 21, 22, 19, 17, 16, 15, 16, 16, 21, 22,
- 19, 17, 16, 14, 15, 15, 20, 22, 19, 16, 15, 14, 14, 15, 20, 22, 19, 16,
- 14, 13, 14, 14, 19, 21, 19, 15, 14, 13, 13, 14, 19, 21, 18, 15, 14, 12,
- 13, 13, 18, 20, 18, 15, 13, 12, 13, 13, 17, 20, 18, 14, 13, 12, 12, 13,
- 16, 19, 17, 14, 12, 11, 12, 12, 16, 19, 17, 14, 12, 11, 12, 12, 16, 18,
- 17, 14, 12, 11, 11, 12, 15, 18, 16, 13, 12, 11, 11, 11, 15, 17, 16, 14,
- 12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11,
- 10, 10, 14, 16, 15, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10,
- 14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 14, 15,
- 16, 14, 12, 11, 10, 9, 13, 15, 15, 14, 12, 11, 10, 9, 13, 15, 15, 14,
- 13, 11, 10, 9,
- /* Size 32x8 */
32, 33, 34, 32, 31, 28, 24, 23, 21, 21, 21, 21, 20, 20, 19, 19, 18, 17,
16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 28, 27, 26, 25,
24, 22, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 20, 19, 19, 18, 18,
@@ -7071,7 +7055,23 @@
17, 17, 17, 16, 16, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 14, 15, 16, 16, 16, 17, 17, 17, 17, 16,
16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10,
- 10, 9, 9, 9 },
+ 10, 9, 9, 9,
+ /* Size 32x8 */
+ 32, 28, 21, 20, 18, 16, 15, 14, 33, 27, 22, 20, 19, 16, 16, 15, 34, 26,
+ 22, 21, 20, 17, 16, 16, 32, 25, 22, 21, 20, 17, 17, 16, 31, 24, 22, 22,
+ 20, 17, 17, 16, 28, 22, 22, 22, 20, 18, 17, 17, 24, 22, 20, 20, 19, 17,
+ 17, 17, 23, 22, 20, 20, 19, 17, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
+ 21, 22, 19, 18, 17, 16, 16, 16, 21, 22, 19, 17, 16, 15, 16, 16, 21, 22,
+ 19, 17, 16, 14, 15, 15, 20, 22, 19, 16, 15, 14, 14, 15, 20, 22, 19, 16,
+ 14, 13, 14, 14, 19, 21, 19, 15, 14, 13, 13, 14, 19, 21, 18, 15, 14, 12,
+ 13, 13, 18, 20, 18, 15, 13, 12, 13, 13, 17, 20, 18, 14, 13, 12, 12, 13,
+ 16, 19, 17, 14, 12, 11, 12, 12, 16, 19, 17, 14, 12, 11, 12, 12, 16, 18,
+ 17, 14, 12, 11, 11, 12, 15, 18, 16, 13, 12, 11, 11, 11, 15, 17, 16, 14,
+ 12, 11, 10, 11, 15, 17, 16, 14, 12, 11, 10, 11, 15, 17, 16, 14, 12, 11,
+ 10, 10, 14, 16, 15, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10,
+ 14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 12, 11, 10, 10, 14, 15,
+ 16, 14, 12, 11, 10, 9, 13, 15, 15, 14, 12, 11, 10, 9, 13, 15, 15, 14,
+ 13, 11, 10, 9 },
},
{
{ /* Luma */
@@ -7152,20 +7152,12 @@
5, 5, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7,
7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5,
/* Size 4x8 */
- 32, 24, 15, 12, 31, 24, 16, 12, 28, 18, 13, 12, 22, 15, 11, 10, 17, 13,
- 9, 8, 14, 11, 8, 7, 12, 11, 8, 6, 10, 10, 8, 6,
- /* Size 8x4 */
32, 31, 28, 22, 17, 14, 12, 10, 24, 24, 18, 15, 13, 11, 11, 10, 15, 16,
13, 11, 9, 8, 8, 8, 12, 12, 12, 10, 8, 7, 6, 6,
+ /* Size 8x4 */
+ 32, 24, 15, 12, 31, 24, 16, 12, 28, 18, 13, 12, 22, 15, 11, 10, 17, 13,
+ 9, 8, 14, 11, 8, 7, 12, 11, 8, 6, 10, 10, 8, 6,
/* Size 8x16 */
- 32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 32, 30,
- 28, 23, 17, 14, 13, 12, 32, 29, 26, 22, 17, 14, 13, 12, 28, 28, 21, 18,
- 15, 13, 12, 12, 26, 26, 20, 17, 14, 12, 11, 11, 22, 23, 18, 15, 12, 11,
- 10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 17, 18, 16, 13, 10, 9, 9, 9, 14,
- 16, 14, 12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7, 12, 13, 12, 10, 8,
- 7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11,
- 11, 9, 8, 7, 6, 6, 9, 10, 10, 9, 8, 7, 6, 5,
- /* Size 16x8 */
32, 33, 32, 32, 28, 26, 22, 19, 17, 14, 13, 12, 11, 10, 10, 9, 32, 32,
30, 29, 28, 26, 23, 20, 18, 16, 15, 13, 12, 12, 11, 10, 28, 29, 28, 26,
21, 20, 18, 17, 16, 14, 13, 12, 12, 11, 11, 10, 22, 23, 23, 22, 18, 17,
@@ -7173,35 +7165,15 @@
9, 9, 8, 8, 8, 8, 8, 13, 14, 14, 14, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7,
7, 7, 11, 12, 13, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 6, 6, 6, 11, 11, 12,
12, 12, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 5,
+ /* Size 16x8 */
+ 32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 32, 30,
+ 28, 23, 17, 14, 13, 12, 32, 29, 26, 22, 17, 14, 13, 12, 28, 28, 21, 18,
+ 15, 13, 12, 12, 26, 26, 20, 17, 14, 12, 11, 11, 22, 23, 18, 15, 12, 11,
+ 10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 17, 18, 16, 13, 10, 9, 9, 9, 14,
+ 16, 14, 12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7, 12, 13, 12, 10, 8,
+ 7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 10, 12, 11, 9, 8, 7, 6, 6, 10, 11,
+ 11, 9, 8, 7, 6, 6, 9, 10, 10, 9, 8, 7, 6, 5,
/* Size 16x32 */
- 32, 33, 32, 32, 28, 23, 22, 19, 16, 14, 13, 12, 11, 11, 11, 10, 33, 32,
- 32, 31, 29, 24, 23, 20, 17, 15, 14, 12, 12, 12, 11, 11, 33, 32, 32, 31,
- 29, 25, 23, 21, 17, 15, 14, 13, 12, 12, 11, 11, 33, 32, 31, 31, 29, 25,
- 23, 21, 17, 16, 14, 13, 12, 12, 12, 11, 32, 32, 30, 30, 28, 24, 23, 20,
- 17, 16, 14, 13, 13, 12, 12, 11, 32, 31, 29, 28, 27, 24, 23, 21, 18, 16,
- 15, 13, 13, 12, 12, 12, 32, 31, 29, 28, 26, 23, 22, 20, 17, 16, 14, 13,
- 13, 13, 12, 12, 30, 30, 28, 27, 24, 21, 20, 19, 16, 15, 14, 13, 12, 13,
- 12, 12, 28, 30, 28, 26, 21, 19, 18, 17, 15, 14, 13, 12, 12, 12, 12, 12,
- 27, 28, 26, 25, 21, 18, 18, 16, 14, 13, 13, 12, 12, 12, 11, 11, 26, 28,
- 26, 24, 20, 18, 17, 16, 14, 13, 12, 11, 11, 11, 11, 11, 23, 25, 24, 23,
- 19, 16, 16, 14, 13, 12, 11, 11, 11, 11, 11, 10, 22, 23, 23, 22, 18, 16,
- 15, 14, 12, 11, 11, 10, 10, 10, 10, 10, 21, 22, 22, 21, 18, 15, 14, 13,
- 12, 11, 11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 14, 14, 12, 11, 10,
- 10, 9, 9, 10, 9, 10, 18, 19, 19, 19, 16, 14, 13, 12, 10, 10, 9, 9, 9, 9,
- 9, 9, 17, 18, 18, 18, 16, 13, 13, 12, 10, 10, 9, 9, 9, 9, 9, 9, 16, 17,
- 17, 17, 15, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 12,
- 12, 11, 9, 9, 8, 8, 8, 8, 8, 8, 13, 15, 15, 15, 13, 12, 11, 10, 9, 8, 8,
- 8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 11, 10, 9, 8, 8, 7, 7, 7, 7, 8,
- 12, 14, 14, 14, 13, 11, 11, 10, 8, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13,
- 12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 10, 9, 8,
- 8, 7, 7, 7, 7, 7, 6, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 6, 6,
- 6, 11, 12, 12, 12, 11, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6, 10, 12, 12,
- 12, 11, 11, 9, 9, 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 9, 9,
- 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6,
- 6, 6, 10, 10, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 9, 10, 10,
- 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 6, 5, 5, 9, 10, 10, 10, 10, 9, 9, 8, 8,
- 7, 7, 6, 6, 5, 5, 5,
- /* Size 32x16 */
32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 17, 16,
14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 33, 32, 32, 32,
32, 31, 31, 30, 30, 28, 28, 25, 23, 22, 21, 19, 18, 17, 16, 15, 14, 14,
@@ -7229,32 +7201,45 @@
11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5,
5, 10, 11, 11, 11, 11, 12, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 8,
8, 8, 8, 7, 7, 6, 6, 6, 6, 6, 6, 5, 5, 5,
+ /* Size 32x16 */
+ 32, 33, 32, 32, 28, 23, 22, 19, 16, 14, 13, 12, 11, 11, 11, 10, 33, 32,
+ 32, 31, 29, 24, 23, 20, 17, 15, 14, 12, 12, 12, 11, 11, 33, 32, 32, 31,
+ 29, 25, 23, 21, 17, 15, 14, 13, 12, 12, 11, 11, 33, 32, 31, 31, 29, 25,
+ 23, 21, 17, 16, 14, 13, 12, 12, 12, 11, 32, 32, 30, 30, 28, 24, 23, 20,
+ 17, 16, 14, 13, 13, 12, 12, 11, 32, 31, 29, 28, 27, 24, 23, 21, 18, 16,
+ 15, 13, 13, 12, 12, 12, 32, 31, 29, 28, 26, 23, 22, 20, 17, 16, 14, 13,
+ 13, 13, 12, 12, 30, 30, 28, 27, 24, 21, 20, 19, 16, 15, 14, 13, 12, 13,
+ 12, 12, 28, 30, 28, 26, 21, 19, 18, 17, 15, 14, 13, 12, 12, 12, 12, 12,
+ 27, 28, 26, 25, 21, 18, 18, 16, 14, 13, 13, 12, 12, 12, 11, 11, 26, 28,
+ 26, 24, 20, 18, 17, 16, 14, 13, 12, 11, 11, 11, 11, 11, 23, 25, 24, 23,
+ 19, 16, 16, 14, 13, 12, 11, 11, 11, 11, 11, 10, 22, 23, 23, 22, 18, 16,
+ 15, 14, 12, 11, 11, 10, 10, 10, 10, 10, 21, 22, 22, 21, 18, 15, 14, 13,
+ 12, 11, 11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 14, 14, 12, 11, 10,
+ 10, 9, 9, 10, 9, 10, 18, 19, 19, 19, 16, 14, 13, 12, 10, 10, 9, 9, 9, 9,
+ 9, 9, 17, 18, 18, 18, 16, 13, 13, 12, 10, 10, 9, 9, 9, 9, 9, 9, 16, 17,
+ 17, 17, 15, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 12,
+ 12, 11, 9, 9, 8, 8, 8, 8, 8, 8, 13, 15, 15, 15, 13, 12, 11, 10, 9, 8, 8,
+ 8, 8, 8, 8, 8, 13, 14, 15, 14, 13, 11, 11, 10, 9, 8, 8, 7, 7, 7, 7, 8,
+ 12, 14, 14, 14, 13, 11, 11, 10, 8, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13,
+ 12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 10, 9, 8,
+ 8, 7, 7, 7, 7, 7, 6, 11, 12, 12, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 6, 6,
+ 6, 11, 12, 12, 12, 11, 11, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6, 10, 12, 12,
+ 12, 11, 11, 9, 9, 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 9, 9,
+ 8, 8, 7, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6,
+ 6, 6, 10, 10, 11, 11, 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5, 5, 9, 10, 10,
+ 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 6, 5, 5, 9, 10, 10, 10, 10, 9, 9, 8, 8,
+ 7, 7, 6, 6, 5, 5, 5,
/* Size 4x16 */
- 33, 23, 14, 11, 32, 25, 15, 12, 32, 24, 16, 12, 31, 23, 16, 13, 30, 19,
- 14, 12, 28, 18, 13, 11, 23, 16, 11, 10, 21, 14, 10, 10, 18, 13, 10, 9,
- 16, 12, 9, 8, 14, 11, 8, 7, 13, 11, 8, 7, 12, 11, 8, 6, 12, 11, 8, 6,
- 11, 10, 8, 6, 10, 9, 7, 6,
- /* Size 16x4 */
33, 32, 32, 31, 30, 28, 23, 21, 18, 16, 14, 13, 12, 12, 11, 10, 23, 25,
24, 23, 19, 18, 16, 14, 13, 12, 11, 11, 11, 11, 10, 9, 14, 15, 16, 16,
14, 13, 11, 10, 10, 9, 8, 8, 8, 8, 8, 7, 11, 12, 12, 13, 12, 11, 10, 10,
9, 8, 7, 7, 6, 6, 6, 6,
+ /* Size 16x4 */
+ 33, 23, 14, 11, 32, 25, 15, 12, 32, 24, 16, 12, 31, 23, 16, 13, 30, 19,
+ 14, 12, 28, 18, 13, 11, 23, 16, 11, 10, 21, 14, 10, 10, 18, 13, 10, 9,
+ 16, 12, 9, 8, 14, 11, 8, 7, 13, 11, 8, 7, 12, 11, 8, 6, 12, 11, 8, 6,
+ 11, 10, 8, 6, 10, 9, 7, 6,
/* Size 8x32 */
- 32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 33, 32,
- 29, 23, 17, 14, 12, 11, 33, 31, 29, 23, 17, 14, 12, 12, 32, 30, 28, 23,
- 17, 14, 13, 12, 32, 29, 27, 23, 18, 15, 13, 12, 32, 29, 26, 22, 17, 14,
- 13, 12, 30, 28, 24, 20, 16, 14, 12, 12, 28, 28, 21, 18, 15, 13, 12, 12,
- 27, 26, 21, 18, 14, 13, 12, 11, 26, 26, 20, 17, 14, 12, 11, 11, 23, 24,
- 19, 16, 13, 11, 11, 11, 22, 23, 18, 15, 12, 11, 10, 10, 21, 22, 18, 14,
- 12, 11, 10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 18, 19, 16, 13, 10, 9, 9,
- 9, 17, 18, 16, 13, 10, 9, 9, 9, 16, 17, 15, 12, 10, 9, 8, 8, 14, 16, 14,
- 12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7,
- 12, 14, 13, 11, 8, 8, 7, 7, 12, 13, 12, 10, 8, 7, 7, 7, 12, 13, 12, 10,
- 8, 7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 11, 12, 11, 10, 9, 7, 6, 6, 10,
- 12, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7,
- 6, 6, 10, 11, 11, 9, 8, 7, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9,
- 8, 7, 6, 5,
- /* Size 32x8 */
32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 17, 16,
14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 32, 32, 32, 31,
30, 29, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 18, 17, 16, 15, 15, 14,
@@ -7268,7 +7253,22 @@
7, 7, 7, 7, 7, 11, 12, 12, 12, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10,
9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 11, 11, 11, 12,
12, 12, 12, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 6,
- 6, 6, 6, 6, 5, 5, 5 },
+ 6, 6, 6, 6, 5, 5, 5,
+ /* Size 32x8 */
+ 32, 32, 28, 22, 16, 13, 11, 11, 33, 32, 29, 23, 17, 14, 12, 11, 33, 32,
+ 29, 23, 17, 14, 12, 11, 33, 31, 29, 23, 17, 14, 12, 12, 32, 30, 28, 23,
+ 17, 14, 13, 12, 32, 29, 27, 23, 18, 15, 13, 12, 32, 29, 26, 22, 17, 14,
+ 13, 12, 30, 28, 24, 20, 16, 14, 12, 12, 28, 28, 21, 18, 15, 13, 12, 12,
+ 27, 26, 21, 18, 14, 13, 12, 11, 26, 26, 20, 17, 14, 12, 11, 11, 23, 24,
+ 19, 16, 13, 11, 11, 11, 22, 23, 18, 15, 12, 11, 10, 10, 21, 22, 18, 14,
+ 12, 11, 10, 10, 19, 20, 17, 14, 11, 10, 9, 9, 18, 19, 16, 13, 10, 9, 9,
+ 9, 17, 18, 16, 13, 10, 9, 9, 9, 16, 17, 15, 12, 10, 9, 8, 8, 14, 16, 14,
+ 12, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 8, 8, 13, 15, 13, 11, 9, 8, 7, 7,
+ 12, 14, 13, 11, 8, 8, 7, 7, 12, 13, 12, 10, 8, 7, 7, 7, 12, 13, 12, 10,
+ 8, 7, 7, 7, 11, 12, 12, 10, 8, 7, 7, 6, 11, 12, 11, 10, 9, 7, 6, 6, 10,
+ 12, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7, 6, 6, 10, 11, 11, 9, 8, 7,
+ 6, 6, 10, 11, 11, 9, 8, 7, 6, 5, 9, 10, 10, 9, 8, 7, 6, 5, 9, 10, 10, 9,
+ 8, 7, 6, 5 },
{ /* Chroma */
/* Size 4x4 */
31, 23, 18, 16, 23, 18, 16, 15, 18, 16, 12, 12, 16, 15, 12, 10,
@@ -7352,21 +7352,12 @@
14, 14, 14, 15, 15, 16, 16, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12,
12, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
/* Size 4x8 */
- 33, 22, 18, 16, 26, 23, 20, 17, 22, 19, 17, 16, 22, 17, 15, 14, 20, 16,
- 13, 13, 17, 15, 12, 11, 16, 16, 12, 10, 16, 15, 12, 10,
- /* Size 8x4 */
33, 26, 22, 22, 20, 17, 16, 16, 22, 23, 19, 17, 16, 15, 16, 15, 18, 20,
17, 15, 13, 12, 12, 12, 16, 17, 16, 14, 13, 11, 10, 10,
+ /* Size 8x4 */
+ 33, 22, 18, 16, 26, 23, 20, 17, 22, 19, 17, 16, 22, 17, 15, 14, 20, 16,
+ 13, 13, 17, 15, 12, 11, 16, 16, 12, 10, 16, 15, 12, 10,
/* Size 8x16 */
- 32, 29, 21, 20, 18, 16, 15, 15, 34, 27, 22, 22, 20, 18, 16, 16, 31, 25,
- 22, 22, 20, 18, 17, 16, 26, 22, 21, 22, 20, 19, 18, 17, 21, 21, 19, 19,
- 18, 17, 17, 17, 21, 22, 19, 18, 17, 16, 16, 16, 20, 22, 19, 17, 16, 15,
- 14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 18, 16, 14, 13, 13, 13,
- 17, 19, 18, 15, 13, 12, 12, 12, 16, 19, 17, 15, 12, 12, 11, 12, 16, 18,
- 17, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11, 11, 11, 15, 17, 16, 14,
- 13, 12, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 13, 12,
- 10, 10,
- /* Size 16x8 */
32, 34, 31, 26, 21, 21, 20, 20, 19, 17, 16, 16, 15, 15, 14, 14, 29, 27,
25, 22, 21, 22, 22, 22, 21, 19, 19, 18, 17, 17, 16, 15, 21, 22, 22, 21,
19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 20, 22, 22, 22, 19, 18,
@@ -7375,37 +7366,16 @@
12, 11, 11, 12, 11, 12, 15, 16, 17, 18, 17, 16, 14, 14, 13, 12, 11, 11,
11, 10, 10, 10, 15, 16, 16, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11, 10,
10, 10,
+ /* Size 16x8 */
+ 32, 29, 21, 20, 18, 16, 15, 15, 34, 27, 22, 22, 20, 18, 16, 16, 31, 25,
+ 22, 22, 20, 18, 17, 16, 26, 22, 21, 22, 20, 19, 18, 17, 21, 21, 19, 19,
+ 18, 17, 17, 17, 21, 22, 19, 18, 17, 16, 16, 16, 20, 22, 19, 17, 16, 15,
+ 14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 18, 16, 14, 13, 13, 13,
+ 17, 19, 18, 15, 13, 12, 12, 12, 16, 19, 17, 15, 12, 12, 11, 12, 16, 18,
+ 17, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11, 11, 11, 15, 17, 16, 14,
+ 13, 12, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 15, 16, 14, 13, 12,
+ 10, 10,
/* Size 16x32 */
- 32, 33, 29, 27, 21, 21, 20, 20, 18, 17, 16, 15, 15, 15, 15, 14, 33, 33,
- 28, 26, 22, 22, 21, 20, 19, 18, 17, 16, 16, 16, 16, 15, 34, 32, 27, 26,
- 22, 23, 22, 21, 20, 19, 18, 17, 16, 16, 16, 15, 33, 31, 27, 25, 22, 23,
- 22, 21, 20, 19, 18, 17, 17, 17, 16, 16, 31, 28, 25, 23, 22, 22, 22, 22,
- 20, 19, 18, 17, 17, 17, 16, 16, 28, 26, 23, 22, 22, 23, 22, 22, 20, 20,
- 19, 18, 17, 17, 17, 17, 26, 25, 22, 22, 21, 22, 22, 21, 20, 19, 19, 18,
- 18, 17, 17, 17, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 18, 17, 17, 17,
- 17, 17, 21, 22, 21, 21, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 17, 17,
- 21, 22, 22, 21, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 22,
- 22, 21, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22,
- 19, 18, 17, 17, 16, 16, 15, 15, 15, 15, 16, 15, 20, 22, 22, 21, 19, 17,
- 17, 16, 16, 15, 15, 14, 14, 15, 15, 15, 20, 22, 22, 21, 19, 17, 17, 16,
- 15, 15, 14, 14, 14, 14, 15, 14, 20, 21, 22, 21, 19, 17, 16, 16, 14, 14,
- 14, 13, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 16, 15, 14, 13, 13, 13,
- 13, 13, 14, 14, 19, 20, 21, 20, 18, 16, 16, 15, 14, 13, 13, 13, 13, 13,
- 13, 14, 18, 20, 20, 20, 18, 16, 16, 15, 13, 13, 12, 12, 12, 13, 13, 13,
- 17, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 13, 17, 18,
- 19, 19, 17, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 16, 18, 19, 18,
- 17, 15, 15, 14, 12, 12, 12, 11, 11, 12, 12, 12, 16, 17, 18, 18, 17, 15,
- 14, 14, 12, 12, 11, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 15, 14, 13,
- 12, 12, 11, 11, 11, 11, 11, 12, 15, 17, 17, 18, 16, 15, 14, 13, 12, 12,
- 11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 13, 12, 11, 11,
- 11, 10, 11, 11, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 11, 11, 10, 10,
- 10, 10, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 12, 11, 10, 10, 10, 10,
- 14, 16, 16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16,
- 16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16, 16, 16,
- 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 15,
- 14, 13, 13, 12, 12, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 14, 14, 13,
- 13, 12, 12, 11, 11, 10, 10, 9,
- /* Size 32x16 */
32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 19, 18,
17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 33, 33, 32, 31,
28, 26, 25, 24, 22, 22, 22, 23, 22, 22, 21, 20, 20, 20, 19, 18, 18, 17,
@@ -7435,33 +7405,47 @@
12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 14, 15, 15, 16, 16, 17,
17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11,
11, 10, 10, 10, 10, 10, 10, 9,
+ /* Size 32x16 */
+ 32, 33, 29, 27, 21, 21, 20, 20, 18, 17, 16, 15, 15, 15, 15, 14, 33, 33,
+ 28, 26, 22, 22, 21, 20, 19, 18, 17, 16, 16, 16, 16, 15, 34, 32, 27, 26,
+ 22, 23, 22, 21, 20, 19, 18, 17, 16, 16, 16, 15, 33, 31, 27, 25, 22, 23,
+ 22, 21, 20, 19, 18, 17, 17, 17, 16, 16, 31, 28, 25, 23, 22, 22, 22, 22,
+ 20, 19, 18, 17, 17, 17, 16, 16, 28, 26, 23, 22, 22, 23, 22, 22, 20, 20,
+ 19, 18, 17, 17, 17, 17, 26, 25, 22, 22, 21, 22, 22, 21, 20, 19, 19, 18,
+ 18, 17, 17, 17, 24, 24, 22, 21, 20, 21, 20, 20, 19, 18, 18, 17, 17, 17,
+ 17, 17, 21, 22, 21, 21, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 17, 17,
+ 21, 22, 22, 21, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 22,
+ 22, 21, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22,
+ 19, 18, 17, 17, 16, 16, 15, 15, 15, 15, 16, 15, 20, 22, 22, 21, 19, 17,
+ 17, 16, 16, 15, 15, 14, 14, 15, 15, 15, 20, 22, 22, 21, 19, 17, 17, 16,
+ 15, 15, 14, 14, 14, 14, 15, 14, 20, 21, 22, 21, 19, 17, 16, 16, 14, 14,
+ 14, 13, 14, 14, 14, 14, 19, 20, 21, 20, 19, 17, 16, 15, 14, 13, 13, 13,
+ 13, 13, 14, 14, 19, 20, 21, 20, 18, 16, 16, 15, 14, 13, 13, 13, 13, 13,
+ 13, 14, 18, 20, 20, 20, 18, 16, 16, 15, 13, 13, 12, 12, 12, 13, 13, 13,
+ 17, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 13, 17, 18,
+ 19, 19, 17, 16, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 16, 18, 19, 18,
+ 17, 15, 15, 14, 12, 12, 12, 11, 11, 12, 12, 12, 16, 17, 18, 18, 17, 15,
+ 14, 14, 12, 12, 11, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 15, 14, 13,
+ 12, 12, 11, 11, 11, 11, 11, 12, 15, 17, 17, 18, 16, 15, 14, 13, 12, 12,
+ 11, 11, 11, 11, 11, 11, 15, 17, 17, 17, 16, 15, 14, 13, 13, 12, 11, 11,
+ 11, 10, 11, 11, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 11, 11, 10, 10,
+ 10, 10, 15, 16, 17, 17, 16, 16, 14, 13, 13, 12, 12, 11, 10, 10, 10, 10,
+ 14, 16, 16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16,
+ 16, 17, 16, 15, 14, 14, 12, 12, 11, 11, 10, 10, 10, 10, 14, 16, 16, 16,
+ 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 15,
+ 14, 13, 13, 12, 12, 11, 10, 10, 10, 10, 14, 15, 15, 16, 16, 14, 14, 13,
+ 13, 12, 12, 11, 11, 10, 10, 9,
/* Size 4x16 */
- 33, 21, 17, 15, 32, 23, 19, 16, 28, 22, 19, 17, 25, 22, 19, 17, 22, 19,
- 17, 17, 22, 18, 17, 16, 22, 17, 15, 15, 21, 17, 14, 14, 20, 16, 13, 13,
- 19, 16, 12, 12, 18, 15, 12, 12, 17, 15, 12, 11, 17, 15, 12, 10, 16, 16,
- 12, 10, 16, 15, 12, 10, 15, 15, 12, 10,
- /* Size 16x4 */
33, 32, 28, 25, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 15, 21, 23,
22, 22, 19, 18, 17, 17, 16, 16, 15, 15, 15, 16, 15, 15, 17, 19, 19, 19,
17, 17, 15, 14, 13, 12, 12, 12, 12, 12, 12, 12, 15, 16, 17, 17, 17, 16,
15, 14, 13, 12, 12, 11, 10, 10, 10, 10,
+ /* Size 16x4 */
+ 33, 21, 17, 15, 32, 23, 19, 16, 28, 22, 19, 17, 25, 22, 19, 17, 22, 19,
+ 17, 17, 22, 18, 17, 16, 22, 17, 15, 15, 21, 17, 14, 14, 20, 16, 13, 13,
+ 19, 16, 12, 12, 18, 15, 12, 12, 17, 15, 12, 11, 17, 15, 12, 10, 16, 16,
+ 12, 10, 16, 15, 12, 10, 15, 15, 12, 10,
/* Size 8x32 */
- 32, 29, 21, 20, 18, 16, 15, 15, 33, 28, 22, 21, 19, 17, 16, 16, 34, 27,
- 22, 22, 20, 18, 16, 16, 33, 27, 22, 22, 20, 18, 17, 16, 31, 25, 22, 22,
- 20, 18, 17, 16, 28, 23, 22, 22, 20, 19, 17, 17, 26, 22, 21, 22, 20, 19,
- 18, 17, 24, 22, 20, 20, 19, 18, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
- 21, 22, 19, 19, 18, 17, 16, 16, 21, 22, 19, 18, 17, 16, 16, 16, 21, 23,
- 19, 17, 16, 15, 15, 16, 20, 22, 19, 17, 16, 15, 14, 15, 20, 22, 19, 17,
- 15, 14, 14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 19, 16, 14, 13,
- 13, 14, 19, 21, 18, 16, 14, 13, 13, 13, 18, 20, 18, 16, 13, 12, 12, 13,
- 17, 19, 18, 15, 13, 12, 12, 12, 17, 19, 17, 15, 13, 12, 12, 12, 16, 19,
- 17, 15, 12, 12, 11, 12, 16, 18, 17, 14, 12, 11, 11, 12, 16, 18, 17, 14,
- 12, 11, 11, 11, 15, 17, 16, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11,
- 11, 11, 15, 17, 16, 14, 13, 11, 10, 10, 15, 17, 16, 14, 13, 12, 10, 10,
- 14, 16, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 16,
- 16, 14, 13, 11, 10, 10, 14, 15, 16, 14, 13, 12, 10, 10, 14, 15, 16, 14,
- 13, 12, 11, 10,
- /* Size 32x8 */
32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 19, 18,
17, 17, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 14, 14, 29, 28, 27, 27,
25, 23, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 21, 20, 19, 19, 19, 18,
@@ -7476,7 +7460,23 @@
18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11,
11, 10, 10, 10, 10, 10, 10, 11, 15, 16, 16, 16, 16, 17, 17, 17, 17, 16,
16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10,
- 10, 10, 10, 10 },
+ 10, 10, 10, 10,
+ /* Size 32x8 */
+ 32, 29, 21, 20, 18, 16, 15, 15, 33, 28, 22, 21, 19, 17, 16, 16, 34, 27,
+ 22, 22, 20, 18, 16, 16, 33, 27, 22, 22, 20, 18, 17, 16, 31, 25, 22, 22,
+ 20, 18, 17, 16, 28, 23, 22, 22, 20, 19, 17, 17, 26, 22, 21, 22, 20, 19,
+ 18, 17, 24, 22, 20, 20, 19, 18, 17, 17, 21, 21, 19, 19, 18, 17, 17, 17,
+ 21, 22, 19, 19, 18, 17, 16, 16, 21, 22, 19, 18, 17, 16, 16, 16, 21, 23,
+ 19, 17, 16, 15, 15, 16, 20, 22, 19, 17, 16, 15, 14, 15, 20, 22, 19, 17,
+ 15, 14, 14, 15, 20, 22, 19, 16, 14, 14, 14, 14, 19, 21, 19, 16, 14, 13,
+ 13, 14, 19, 21, 18, 16, 14, 13, 13, 13, 18, 20, 18, 16, 13, 12, 12, 13,
+ 17, 19, 18, 15, 13, 12, 12, 12, 17, 19, 17, 15, 13, 12, 12, 12, 16, 19,
+ 17, 15, 12, 12, 11, 12, 16, 18, 17, 14, 12, 11, 11, 12, 16, 18, 17, 14,
+ 12, 11, 11, 11, 15, 17, 16, 14, 12, 11, 11, 11, 15, 17, 16, 14, 13, 11,
+ 11, 11, 15, 17, 16, 14, 13, 11, 10, 10, 15, 17, 16, 14, 13, 12, 10, 10,
+ 14, 16, 16, 14, 12, 11, 10, 10, 14, 16, 16, 14, 12, 11, 10, 10, 14, 16,
+ 16, 14, 13, 11, 10, 10, 14, 15, 16, 14, 13, 12, 10, 10, 14, 15, 16, 14,
+ 13, 12, 11, 10 },
},
{
{ /* Luma */
@@ -7558,20 +7558,12 @@
11, 11, 11, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 8, 8, 7, 7, 7, 7, 6, 6,
6, 6, 6, 6, 5, 5, 5,
/* Size 4x8 */
- 32, 27, 17, 12, 32, 26, 18, 13, 30, 20, 15, 12, 23, 17, 12, 10, 19, 15,
- 10, 9, 14, 12, 9, 8, 12, 12, 8, 7, 11, 10, 8, 6,
- /* Size 8x4 */
32, 32, 30, 23, 19, 14, 12, 11, 27, 26, 20, 17, 15, 12, 12, 10, 17, 18,
15, 12, 10, 9, 8, 8, 12, 13, 12, 10, 9, 8, 7, 6,
+ /* Size 8x4 */
+ 32, 27, 17, 12, 32, 26, 18, 13, 30, 20, 15, 12, 23, 17, 12, 10, 19, 15,
+ 10, 9, 14, 12, 9, 8, 12, 12, 8, 7, 11, 10, 8, 6,
/* Size 8x16 */
- 32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 25, 19, 14, 13, 12, 32, 31,
- 28, 24, 19, 14, 13, 12, 32, 30, 27, 24, 20, 15, 13, 12, 30, 28, 23, 20,
- 17, 14, 13, 12, 26, 26, 20, 18, 15, 12, 12, 11, 23, 24, 19, 16, 14, 11,
- 11, 11, 21, 22, 18, 15, 13, 11, 10, 10, 18, 19, 16, 14, 11, 9, 9, 9, 16,
- 17, 15, 13, 11, 9, 8, 8, 14, 16, 14, 12, 10, 8, 8, 8, 13, 14, 13, 11, 9,
- 8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 12, 12, 10, 9, 8, 7, 6, 10, 12,
- 12, 10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
- /* Size 16x8 */
32, 33, 32, 32, 30, 26, 23, 21, 18, 16, 14, 13, 12, 11, 10, 10, 32, 32,
31, 30, 28, 26, 24, 22, 19, 17, 16, 14, 13, 12, 12, 11, 28, 29, 28, 27,
23, 20, 19, 18, 16, 15, 14, 13, 12, 12, 12, 11, 23, 25, 24, 24, 20, 18,
@@ -7579,35 +7571,15 @@
11, 11, 10, 9, 9, 9, 8, 9, 13, 14, 14, 15, 14, 12, 11, 11, 9, 9, 8, 8,
7, 8, 7, 7, 12, 13, 13, 13, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 6, 6, 11,
12, 12, 12, 12, 11, 11, 10, 9, 8, 8, 7, 7, 6, 6, 6,
+ /* Size 16x8 */
+ 32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 25, 19, 14, 13, 12, 32, 31,
+ 28, 24, 19, 14, 13, 12, 32, 30, 27, 24, 20, 15, 13, 12, 30, 28, 23, 20,
+ 17, 14, 13, 12, 26, 26, 20, 18, 15, 12, 12, 11, 23, 24, 19, 16, 14, 11,
+ 11, 11, 21, 22, 18, 15, 13, 11, 10, 10, 18, 19, 16, 14, 11, 9, 9, 9, 16,
+ 17, 15, 13, 11, 9, 8, 8, 14, 16, 14, 12, 10, 8, 8, 8, 13, 14, 13, 11, 9,
+ 8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 12, 12, 10, 9, 8, 7, 6, 10, 12,
+ 12, 10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
/* Size 16x32 */
- 32, 33, 32, 32, 28, 26, 23, 19, 18, 16, 13, 13, 12, 11, 11, 11, 33, 32,
- 32, 32, 29, 27, 24, 20, 19, 17, 14, 13, 12, 12, 12, 11, 33, 32, 32, 32,
- 29, 27, 25, 20, 19, 17, 14, 14, 13, 12, 12, 11, 33, 32, 32, 31, 30, 28,
- 25, 21, 19, 17, 14, 14, 13, 12, 12, 12, 32, 32, 31, 30, 28, 26, 24, 20,
- 19, 17, 14, 14, 13, 13, 12, 12, 32, 32, 30, 30, 28, 26, 24, 21, 19, 18,
- 15, 14, 13, 13, 12, 12, 32, 31, 30, 29, 27, 26, 24, 21, 20, 18, 15, 15,
- 13, 13, 12, 12, 30, 30, 29, 28, 24, 23, 21, 19, 18, 16, 14, 14, 13, 13,
- 13, 12, 30, 30, 28, 28, 23, 22, 20, 18, 17, 16, 14, 13, 13, 12, 12, 12,
- 28, 30, 28, 27, 21, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 12, 26, 28,
- 26, 26, 20, 19, 18, 16, 15, 14, 12, 12, 12, 12, 11, 12, 26, 27, 26, 25,
- 20, 19, 17, 15, 15, 14, 12, 12, 11, 11, 11, 11, 23, 25, 24, 24, 19, 18,
- 16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 22, 23, 23, 22, 18, 17, 16, 14,
- 13, 12, 11, 11, 10, 10, 10, 10, 21, 22, 22, 22, 18, 17, 15, 13, 13, 12,
- 11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 16, 14, 12, 12, 11, 10, 10,
- 9, 9, 10, 9, 18, 19, 19, 19, 16, 15, 14, 12, 11, 11, 9, 9, 9, 9, 9, 9,
- 17, 19, 19, 19, 16, 15, 14, 12, 11, 10, 9, 9, 9, 9, 9, 9, 16, 17, 17,
- 18, 15, 14, 13, 11, 11, 10, 9, 9, 8, 8, 8, 9, 15, 16, 17, 17, 14, 13,
- 12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 13, 12, 11, 10, 9,
- 8, 8, 8, 8, 8, 8, 13, 14, 14, 15, 13, 12, 11, 10, 9, 9, 8, 8, 7, 8, 8,
- 7, 13, 14, 14, 14, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 7, 12, 14, 14,
- 14, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 11,
- 9, 9, 8, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 12, 10, 9, 9, 8, 8, 7, 7,
- 7, 6, 6, 11, 12, 12, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 6, 6, 6, 11, 12,
- 12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 7, 6, 6, 6, 10, 12, 12, 12, 12, 11,
- 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 10, 9, 9, 8, 7,
- 7, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6,
- 10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5,
- /* Size 32x16 */
32, 33, 33, 33, 32, 32, 32, 30, 30, 28, 26, 26, 23, 22, 21, 19, 18, 17,
16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 33, 32, 32, 32,
32, 32, 31, 30, 30, 30, 28, 27, 25, 23, 22, 21, 19, 19, 17, 16, 16, 14,
@@ -7635,32 +7607,45 @@
12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6, 6, 6, 6,
6, 6, 6, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 10, 10, 9,
9, 9, 9, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 5,
+ /* Size 32x16 */
+ 32, 33, 32, 32, 28, 26, 23, 19, 18, 16, 13, 13, 12, 11, 11, 11, 33, 32,
+ 32, 32, 29, 27, 24, 20, 19, 17, 14, 13, 12, 12, 12, 11, 33, 32, 32, 32,
+ 29, 27, 25, 20, 19, 17, 14, 14, 13, 12, 12, 11, 33, 32, 32, 31, 30, 28,
+ 25, 21, 19, 17, 14, 14, 13, 12, 12, 12, 32, 32, 31, 30, 28, 26, 24, 20,
+ 19, 17, 14, 14, 13, 13, 12, 12, 32, 32, 30, 30, 28, 26, 24, 21, 19, 18,
+ 15, 14, 13, 13, 12, 12, 32, 31, 30, 29, 27, 26, 24, 21, 20, 18, 15, 15,
+ 13, 13, 12, 12, 30, 30, 29, 28, 24, 23, 21, 19, 18, 16, 14, 14, 13, 13,
+ 13, 12, 30, 30, 28, 28, 23, 22, 20, 18, 17, 16, 14, 13, 13, 12, 12, 12,
+ 28, 30, 28, 27, 21, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 12, 26, 28,
+ 26, 26, 20, 19, 18, 16, 15, 14, 12, 12, 12, 12, 11, 12, 26, 27, 26, 25,
+ 20, 19, 17, 15, 15, 14, 12, 12, 11, 11, 11, 11, 23, 25, 24, 24, 19, 18,
+ 16, 14, 14, 13, 11, 11, 11, 11, 11, 11, 22, 23, 23, 22, 18, 17, 16, 14,
+ 13, 12, 11, 11, 10, 10, 10, 10, 21, 22, 22, 22, 18, 17, 15, 13, 13, 12,
+ 11, 10, 10, 10, 10, 10, 19, 21, 20, 20, 17, 16, 14, 12, 12, 11, 10, 10,
+ 9, 9, 10, 9, 18, 19, 19, 19, 16, 15, 14, 12, 11, 11, 9, 9, 9, 9, 9, 9,
+ 17, 19, 19, 19, 16, 15, 14, 12, 11, 10, 9, 9, 9, 9, 9, 9, 16, 17, 17,
+ 18, 15, 14, 13, 11, 11, 10, 9, 9, 8, 8, 8, 9, 15, 16, 17, 17, 14, 13,
+ 12, 11, 10, 9, 8, 8, 8, 8, 8, 8, 14, 16, 16, 16, 14, 13, 12, 11, 10, 9,
+ 8, 8, 8, 8, 8, 8, 13, 14, 14, 15, 13, 12, 11, 10, 9, 9, 8, 8, 7, 8, 8,
+ 7, 13, 14, 14, 14, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 7, 12, 14, 14,
+ 14, 13, 12, 11, 10, 9, 8, 8, 7, 7, 7, 7, 7, 12, 13, 13, 13, 12, 11, 11,
+ 9, 9, 8, 7, 7, 7, 7, 7, 7, 11, 12, 13, 13, 12, 12, 10, 9, 9, 8, 8, 7, 7,
+ 7, 6, 6, 11, 12, 12, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 6, 6, 6, 11, 12,
+ 12, 12, 12, 11, 10, 10, 9, 8, 7, 7, 7, 6, 6, 6, 10, 12, 12, 12, 12, 11,
+ 10, 9, 8, 8, 7, 7, 6, 6, 6, 6, 10, 11, 11, 12, 11, 10, 10, 9, 9, 8, 7,
+ 7, 6, 6, 6, 6, 10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 6, 6,
+ 10, 11, 11, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 6, 6, 5,
/* Size 4x16 */
- 33, 26, 16, 11, 32, 27, 17, 12, 32, 26, 17, 13, 31, 26, 18, 13, 30, 22,
- 16, 12, 28, 19, 14, 12, 25, 18, 13, 11, 22, 17, 12, 10, 19, 15, 11, 9,
- 17, 14, 10, 8, 16, 13, 9, 8, 14, 12, 9, 7, 13, 11, 8, 7, 12, 11, 8, 6,
- 12, 11, 8, 6, 11, 10, 8, 6,
- /* Size 16x4 */
33, 32, 32, 31, 30, 28, 25, 22, 19, 17, 16, 14, 13, 12, 12, 11, 26, 27,
26, 26, 22, 19, 18, 17, 15, 14, 13, 12, 11, 11, 11, 10, 16, 17, 17, 18,
16, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 11, 12, 13, 13, 12, 12, 11,
10, 9, 8, 8, 7, 7, 6, 6, 6,
+ /* Size 16x4 */
+ 33, 26, 16, 11, 32, 27, 17, 12, 32, 26, 17, 13, 31, 26, 18, 13, 30, 22,
+ 16, 12, 28, 19, 14, 12, 25, 18, 13, 11, 22, 17, 12, 10, 19, 15, 11, 9,
+ 17, 14, 10, 8, 16, 13, 9, 8, 14, 12, 9, 7, 13, 11, 8, 7, 12, 11, 8, 6,
+ 12, 11, 8, 6, 11, 10, 8, 6,
/* Size 8x32 */
- 32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 24, 19, 14, 12, 12, 33, 32,
- 29, 25, 19, 14, 13, 12, 33, 32, 30, 25, 19, 14, 13, 12, 32, 31, 28, 24,
- 19, 14, 13, 12, 32, 30, 28, 24, 19, 15, 13, 12, 32, 30, 27, 24, 20, 15,
- 13, 12, 30, 29, 24, 21, 18, 14, 13, 13, 30, 28, 23, 20, 17, 14, 13, 12,
- 28, 28, 21, 19, 16, 13, 12, 12, 26, 26, 20, 18, 15, 12, 12, 11, 26, 26,
- 20, 17, 15, 12, 11, 11, 23, 24, 19, 16, 14, 11, 11, 11, 22, 23, 18, 16,
- 13, 11, 10, 10, 21, 22, 18, 15, 13, 11, 10, 10, 19, 20, 17, 14, 12, 10,
- 9, 10, 18, 19, 16, 14, 11, 9, 9, 9, 17, 19, 16, 14, 11, 9, 9, 9, 16, 17,
- 15, 13, 11, 9, 8, 8, 15, 17, 14, 12, 10, 8, 8, 8, 14, 16, 14, 12, 10, 8,
- 8, 8, 13, 14, 13, 11, 9, 8, 7, 8, 13, 14, 13, 11, 9, 8, 7, 7, 12, 14,
- 13, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 13, 12, 10, 9, 8, 7,
- 6, 11, 12, 12, 10, 9, 8, 7, 6, 11, 12, 12, 10, 9, 7, 7, 6, 10, 12, 12,
- 10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
- 10, 11, 11, 10, 9, 8, 7, 6,
- /* Size 32x8 */
32, 33, 33, 33, 32, 32, 32, 30, 30, 28, 26, 26, 23, 22, 21, 19, 18, 17,
16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 10, 10, 32, 32, 32, 32,
31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 19, 17, 17, 16, 14,
@@ -7674,7 +7659,22 @@
8, 8, 7, 7, 7, 7, 8, 12, 12, 13, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11,
10, 10, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 7, 11, 12, 12,
12, 12, 12, 12, 13, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7,
- 7, 7, 6, 6, 6, 6, 6, 6, 6 },
+ 7, 7, 6, 6, 6, 6, 6, 6, 6,
+ /* Size 32x8 */
+ 32, 32, 28, 23, 18, 13, 12, 11, 33, 32, 29, 24, 19, 14, 12, 12, 33, 32,
+ 29, 25, 19, 14, 13, 12, 33, 32, 30, 25, 19, 14, 13, 12, 32, 31, 28, 24,
+ 19, 14, 13, 12, 32, 30, 28, 24, 19, 15, 13, 12, 32, 30, 27, 24, 20, 15,
+ 13, 12, 30, 29, 24, 21, 18, 14, 13, 13, 30, 28, 23, 20, 17, 14, 13, 12,
+ 28, 28, 21, 19, 16, 13, 12, 12, 26, 26, 20, 18, 15, 12, 12, 11, 26, 26,
+ 20, 17, 15, 12, 11, 11, 23, 24, 19, 16, 14, 11, 11, 11, 22, 23, 18, 16,
+ 13, 11, 10, 10, 21, 22, 18, 15, 13, 11, 10, 10, 19, 20, 17, 14, 12, 10,
+ 9, 10, 18, 19, 16, 14, 11, 9, 9, 9, 17, 19, 16, 14, 11, 9, 9, 9, 16, 17,
+ 15, 13, 11, 9, 8, 8, 15, 17, 14, 12, 10, 8, 8, 8, 14, 16, 14, 12, 10, 8,
+ 8, 8, 13, 14, 13, 11, 9, 8, 7, 8, 13, 14, 13, 11, 9, 8, 7, 7, 12, 14,
+ 13, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 7, 7, 7, 11, 13, 12, 10, 9, 8, 7,
+ 6, 11, 12, 12, 10, 9, 8, 7, 6, 11, 12, 12, 10, 9, 7, 7, 6, 10, 12, 12,
+ 10, 8, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6, 10, 11, 11, 10, 9, 7, 6, 6,
+ 10, 11, 11, 10, 9, 8, 7, 6 },
{ /* Chroma */
/* Size 4x4 */
32, 23, 19, 16, 23, 19, 17, 15, 19, 17, 13, 12, 16, 15, 12, 10,
@@ -7758,21 +7758,12 @@
10, 10, 14, 15, 15, 16, 16, 16, 16, 17, 17, 16, 16, 15, 15, 14, 14, 13,
13, 13, 13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 9,
/* Size 4x8 */
- 33, 22, 19, 16, 27, 22, 20, 17, 22, 19, 18, 17, 22, 18, 16, 14, 20, 17,
- 14, 13, 18, 16, 12, 12, 17, 16, 12, 11, 16, 15, 12, 10,
- /* Size 8x4 */
33, 27, 22, 22, 20, 18, 17, 16, 22, 22, 19, 18, 17, 16, 16, 15, 19, 20,
18, 16, 14, 12, 12, 12, 16, 17, 17, 14, 13, 12, 11, 10,
+ /* Size 8x4 */
+ 33, 22, 19, 16, 27, 22, 20, 17, 22, 19, 18, 17, 22, 18, 16, 14, 20, 17,
+ 14, 13, 18, 16, 12, 12, 17, 16, 12, 11, 16, 15, 12, 10,
/* Size 8x16 */
- 32, 30, 21, 21, 19, 16, 15, 15, 33, 28, 22, 22, 20, 18, 17, 16, 31, 26,
- 22, 22, 21, 18, 17, 17, 28, 23, 22, 23, 21, 19, 18, 17, 23, 22, 20, 20,
- 19, 17, 17, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 23, 19, 18, 17, 15,
- 15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 13, 13, 13,
- 18, 20, 18, 16, 14, 12, 12, 13, 17, 19, 18, 16, 14, 12, 12, 12, 16, 18,
- 17, 15, 13, 12, 11, 12, 16, 17, 16, 15, 13, 11, 11, 11, 15, 17, 16, 14,
- 13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12,
- 11, 10,
- /* Size 16x8 */
32, 33, 31, 28, 23, 21, 21, 20, 19, 18, 17, 16, 16, 15, 15, 14, 30, 28,
26, 23, 22, 22, 23, 22, 21, 20, 19, 18, 17, 17, 16, 16, 21, 22, 22, 22,
20, 19, 19, 19, 19, 18, 18, 17, 16, 16, 16, 16, 21, 22, 22, 23, 20, 18,
@@ -7781,37 +7772,16 @@
12, 12, 11, 12, 12, 12, 15, 17, 17, 18, 17, 16, 15, 14, 13, 12, 12, 11,
11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 10,
10, 10,
+ /* Size 16x8 */
+ 32, 30, 21, 21, 19, 16, 15, 15, 33, 28, 22, 22, 20, 18, 17, 16, 31, 26,
+ 22, 22, 21, 18, 17, 17, 28, 23, 22, 23, 21, 19, 18, 17, 23, 22, 20, 20,
+ 19, 17, 17, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 23, 19, 18, 17, 15,
+ 15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 13, 13, 13,
+ 18, 20, 18, 16, 14, 12, 12, 13, 17, 19, 18, 16, 14, 12, 12, 12, 16, 18,
+ 17, 15, 13, 12, 11, 12, 16, 17, 16, 15, 13, 11, 11, 11, 15, 17, 16, 14,
+ 13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12,
+ 11, 10,
/* Size 16x32 */
- 32, 33, 30, 28, 21, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 15, 33, 33,
- 29, 27, 22, 22, 22, 20, 20, 19, 17, 17, 16, 16, 16, 16, 33, 32, 28, 26,
- 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 16, 34, 32, 28, 26, 22, 23,
- 23, 21, 21, 20, 18, 18, 17, 17, 17, 16, 31, 28, 26, 24, 22, 22, 22, 22,
- 21, 20, 18, 18, 17, 17, 17, 16, 29, 27, 24, 23, 22, 22, 23, 22, 21, 20,
- 19, 18, 18, 17, 17, 17, 28, 26, 23, 22, 22, 22, 23, 22, 21, 20, 19, 19,
- 18, 18, 17, 17, 24, 24, 23, 22, 20, 20, 21, 20, 20, 19, 18, 18, 17, 18,
- 17, 17, 23, 23, 22, 22, 20, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17, 17,
- 21, 22, 22, 21, 19, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 16, 21, 22,
- 22, 22, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 16, 16, 21, 23, 22, 22,
- 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22, 19, 18,
- 18, 17, 17, 16, 15, 15, 15, 15, 15, 16, 20, 22, 22, 22, 19, 18, 17, 16,
- 16, 16, 15, 14, 15, 14, 15, 15, 20, 22, 22, 22, 19, 18, 17, 16, 16, 15,
- 14, 14, 14, 14, 14, 15, 20, 21, 22, 22, 19, 18, 17, 16, 15, 14, 14, 14,
- 13, 14, 14, 14, 19, 21, 21, 21, 19, 18, 17, 15, 15, 14, 13, 13, 13, 13,
- 13, 14, 19, 20, 21, 21, 19, 17, 17, 15, 15, 14, 13, 13, 13, 13, 13, 13,
- 18, 20, 20, 20, 18, 17, 16, 15, 14, 13, 12, 12, 12, 12, 13, 13, 17, 19,
- 20, 20, 18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 17, 19, 19, 20,
- 18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 16,
- 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 16, 18, 18, 19, 17, 16, 15, 14,
- 13, 12, 12, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 16, 15, 14, 13, 12,
- 11, 11, 11, 11, 11, 11, 16, 17, 17, 18, 16, 16, 15, 13, 13, 12, 11, 11,
- 11, 11, 11, 11, 15, 17, 17, 18, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11,
- 11, 11, 15, 17, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 11, 10, 11,
- 15, 16, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 10, 15, 16,
- 16, 17, 16, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17,
- 16, 15, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17, 16, 15,
- 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 16, 16, 15, 15, 13,
- 13, 12, 12, 11, 11, 10, 10, 10,
- /* Size 32x16 */
32, 33, 33, 34, 31, 29, 28, 24, 23, 21, 21, 21, 21, 20, 20, 20, 19, 19,
18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 33, 33, 32, 32,
28, 27, 26, 24, 23, 22, 22, 23, 23, 22, 22, 21, 21, 20, 20, 19, 19, 18,
@@ -7841,33 +7811,47 @@
12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 10, 10, 15, 16, 16, 16, 16, 17,
17, 17, 17, 16, 16, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 12, 11,
11, 11, 11, 10, 10, 10, 10, 10,
+ /* Size 32x16 */
+ 32, 33, 30, 28, 21, 21, 21, 20, 19, 18, 16, 16, 15, 15, 15, 15, 33, 33,
+ 29, 27, 22, 22, 22, 20, 20, 19, 17, 17, 16, 16, 16, 16, 33, 32, 28, 26,
+ 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 16, 16, 34, 32, 28, 26, 22, 23,
+ 23, 21, 21, 20, 18, 18, 17, 17, 17, 16, 31, 28, 26, 24, 22, 22, 22, 22,
+ 21, 20, 18, 18, 17, 17, 17, 16, 29, 27, 24, 23, 22, 22, 23, 22, 21, 20,
+ 19, 18, 18, 17, 17, 17, 28, 26, 23, 22, 22, 22, 23, 22, 21, 20, 19, 19,
+ 18, 18, 17, 17, 24, 24, 23, 22, 20, 20, 21, 20, 20, 19, 18, 18, 17, 18,
+ 17, 17, 23, 23, 22, 22, 20, 20, 20, 20, 19, 19, 17, 17, 17, 17, 17, 17,
+ 21, 22, 22, 21, 19, 19, 19, 19, 19, 18, 17, 17, 16, 17, 17, 16, 21, 22,
+ 22, 22, 19, 19, 18, 18, 18, 17, 16, 16, 16, 16, 16, 16, 21, 23, 22, 22,
+ 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 16, 21, 23, 23, 22, 19, 18,
+ 18, 17, 17, 16, 15, 15, 15, 15, 15, 16, 20, 22, 22, 22, 19, 18, 17, 16,
+ 16, 16, 15, 14, 15, 14, 15, 15, 20, 22, 22, 22, 19, 18, 17, 16, 16, 15,
+ 14, 14, 14, 14, 14, 15, 20, 21, 22, 22, 19, 18, 17, 16, 15, 14, 14, 14,
+ 13, 14, 14, 14, 19, 21, 21, 21, 19, 18, 17, 15, 15, 14, 13, 13, 13, 13,
+ 13, 14, 19, 20, 21, 21, 19, 17, 17, 15, 15, 14, 13, 13, 13, 13, 13, 13,
+ 18, 20, 20, 20, 18, 17, 16, 15, 14, 13, 12, 12, 12, 12, 13, 13, 17, 19,
+ 20, 20, 18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 17, 19, 19, 20,
+ 18, 17, 16, 14, 14, 13, 12, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 16,
+ 15, 14, 13, 12, 12, 11, 11, 12, 12, 12, 16, 18, 18, 19, 17, 16, 15, 14,
+ 13, 12, 12, 11, 11, 11, 12, 12, 16, 17, 18, 18, 17, 16, 15, 14, 13, 12,
+ 11, 11, 11, 11, 11, 11, 16, 17, 17, 18, 16, 16, 15, 13, 13, 12, 11, 11,
+ 11, 11, 11, 11, 15, 17, 17, 18, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11,
+ 11, 11, 15, 17, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 11, 10, 11,
+ 15, 16, 17, 17, 16, 16, 14, 14, 13, 12, 12, 11, 11, 10, 10, 10, 15, 16,
+ 16, 17, 16, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17,
+ 16, 15, 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 17, 16, 15,
+ 15, 14, 13, 12, 12, 11, 11, 10, 10, 10, 14, 16, 16, 16, 16, 15, 15, 13,
+ 13, 12, 12, 11, 11, 10, 10, 10,
/* Size 4x16 */
- 33, 21, 18, 15, 32, 22, 19, 16, 28, 22, 20, 17, 26, 22, 20, 18, 23, 20,
- 19, 17, 22, 19, 17, 16, 23, 18, 16, 15, 22, 18, 15, 14, 21, 18, 14, 13,
- 20, 17, 13, 12, 19, 17, 13, 12, 18, 16, 12, 11, 17, 16, 12, 11, 17, 16,
- 12, 11, 16, 16, 13, 10, 16, 15, 12, 10,
- /* Size 16x4 */
33, 32, 28, 26, 23, 22, 23, 22, 21, 20, 19, 18, 17, 17, 16, 16, 21, 22,
22, 22, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 15, 18, 19, 20, 20,
19, 17, 16, 15, 14, 13, 13, 12, 12, 12, 13, 12, 15, 16, 17, 18, 17, 16,
15, 14, 13, 12, 12, 11, 11, 11, 10, 10,
+ /* Size 16x4 */
+ 33, 21, 18, 15, 32, 22, 19, 16, 28, 22, 20, 17, 26, 22, 20, 18, 23, 20,
+ 19, 17, 22, 19, 17, 16, 23, 18, 16, 15, 22, 18, 15, 14, 21, 18, 14, 13,
+ 20, 17, 13, 12, 19, 17, 13, 12, 18, 16, 12, 11, 17, 16, 12, 11, 17, 16,
+ 12, 11, 16, 16, 13, 10, 16, 15, 12, 10,
/* Size 8x32 */
- 32, 30, 21, 21, 19, 16, 15, 15, 33, 29, 22, 22, 20, 17, 16, 16, 33, 28,
- 22, 22, 20, 18, 17, 16, 34, 28, 22, 23, 21, 18, 17, 17, 31, 26, 22, 22,
- 21, 18, 17, 17, 29, 24, 22, 23, 21, 19, 18, 17, 28, 23, 22, 23, 21, 19,
- 18, 17, 24, 23, 20, 21, 20, 18, 17, 17, 23, 22, 20, 20, 19, 17, 17, 17,
- 21, 22, 19, 19, 19, 17, 16, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 22,
- 19, 18, 17, 16, 16, 16, 21, 23, 19, 18, 17, 15, 15, 15, 20, 22, 19, 17,
- 16, 15, 15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 20, 22, 19, 17, 15, 14,
- 13, 14, 19, 21, 19, 17, 15, 13, 13, 13, 19, 21, 19, 17, 15, 13, 13, 13,
- 18, 20, 18, 16, 14, 12, 12, 13, 17, 20, 18, 16, 14, 12, 12, 12, 17, 19,
- 18, 16, 14, 12, 12, 12, 16, 18, 17, 15, 13, 12, 11, 12, 16, 18, 17, 15,
- 13, 12, 11, 12, 16, 18, 17, 15, 13, 11, 11, 11, 16, 17, 16, 15, 13, 11,
- 11, 11, 15, 17, 16, 15, 13, 12, 11, 11, 15, 17, 16, 14, 13, 12, 11, 10,
- 15, 17, 16, 14, 13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16,
- 16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15,
- 13, 12, 11, 10,
- /* Size 32x8 */
32, 33, 33, 34, 31, 29, 28, 24, 23, 21, 21, 21, 21, 20, 20, 20, 19, 19,
18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 14, 30, 29, 28, 28,
26, 24, 23, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19, 18,
@@ -7882,7 +7866,23 @@
18, 17, 17, 16, 16, 16, 15, 15, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 17, 17, 17, 17,
16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10,
- 10, 10, 10, 10 },
+ 10, 10, 10, 10,
+ /* Size 32x8 */
+ 32, 30, 21, 21, 19, 16, 15, 15, 33, 29, 22, 22, 20, 17, 16, 16, 33, 28,
+ 22, 22, 20, 18, 17, 16, 34, 28, 22, 23, 21, 18, 17, 17, 31, 26, 22, 22,
+ 21, 18, 17, 17, 29, 24, 22, 23, 21, 19, 18, 17, 28, 23, 22, 23, 21, 19,
+ 18, 17, 24, 23, 20, 21, 20, 18, 17, 17, 23, 22, 20, 20, 19, 17, 17, 17,
+ 21, 22, 19, 19, 19, 17, 16, 17, 21, 22, 19, 18, 18, 16, 16, 16, 21, 22,
+ 19, 18, 17, 16, 16, 16, 21, 23, 19, 18, 17, 15, 15, 15, 20, 22, 19, 17,
+ 16, 15, 15, 15, 20, 22, 19, 17, 16, 14, 14, 14, 20, 22, 19, 17, 15, 14,
+ 13, 14, 19, 21, 19, 17, 15, 13, 13, 13, 19, 21, 19, 17, 15, 13, 13, 13,
+ 18, 20, 18, 16, 14, 12, 12, 13, 17, 20, 18, 16, 14, 12, 12, 12, 17, 19,
+ 18, 16, 14, 12, 12, 12, 16, 18, 17, 15, 13, 12, 11, 12, 16, 18, 17, 15,
+ 13, 12, 11, 12, 16, 18, 17, 15, 13, 11, 11, 11, 16, 17, 16, 15, 13, 11,
+ 11, 11, 15, 17, 16, 15, 13, 12, 11, 11, 15, 17, 16, 14, 13, 12, 11, 10,
+ 15, 17, 16, 14, 13, 12, 11, 10, 15, 16, 16, 15, 13, 12, 11, 10, 14, 16,
+ 16, 15, 13, 12, 11, 10, 14, 16, 16, 15, 13, 12, 11, 10, 14, 16, 16, 15,
+ 13, 12, 11, 10 },
},
{
{ /* Luma */
@@ -7964,20 +7964,12 @@
7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 10, 11, 11, 11, 11, 12, 12, 12, 12, 11,
11, 11, 10, 10, 10, 9, 9, 9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6,
/* Size 4x8 */
- 32, 29, 17, 12, 32, 28, 18, 13, 30, 22, 16, 12, 25, 19, 13, 11, 20, 17,
- 11, 9, 16, 14, 9, 8, 14, 13, 9, 7, 12, 11, 9, 7,
- /* Size 8x4 */
32, 32, 30, 25, 20, 16, 14, 12, 29, 28, 22, 19, 17, 14, 13, 11, 17, 18,
16, 13, 11, 9, 9, 9, 12, 13, 12, 11, 9, 8, 7, 7,
+ /* Size 8x4 */
+ 32, 29, 17, 12, 32, 28, 18, 13, 30, 22, 16, 12, 25, 19, 13, 11, 20, 17,
+ 11, 9, 16, 14, 9, 8, 14, 13, 9, 7, 12, 11, 9, 7,
/* Size 8x16 */
- 32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 30, 25, 20, 17, 13, 12, 33, 31,
- 29, 24, 21, 17, 14, 13, 32, 30, 28, 24, 21, 18, 14, 13, 30, 29, 25, 21,
- 19, 16, 13, 13, 28, 28, 22, 19, 17, 15, 13, 12, 25, 26, 21, 17, 15, 13,
- 12, 11, 22, 23, 19, 16, 14, 12, 11, 10, 19, 20, 18, 14, 12, 11, 10, 9,
- 18, 19, 17, 14, 12, 10, 9, 9, 16, 17, 16, 13, 11, 10, 9, 8, 14, 15, 14,
- 12, 10, 9, 8, 8, 12, 14, 13, 11, 10, 9, 7, 7, 12, 13, 12, 11, 9, 8, 7,
- 7, 11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6,
- /* Size 16x8 */
32, 33, 33, 32, 30, 28, 25, 22, 19, 18, 16, 14, 12, 12, 11, 11, 33, 32,
31, 30, 29, 28, 26, 23, 20, 19, 17, 15, 14, 13, 12, 12, 29, 30, 29, 28,
25, 22, 21, 19, 18, 17, 16, 14, 13, 12, 12, 12, 23, 25, 24, 24, 21, 19,
@@ -7985,35 +7977,15 @@
12, 12, 11, 10, 10, 9, 9, 9, 16, 17, 17, 18, 16, 15, 13, 12, 11, 10, 10,
9, 9, 8, 8, 8, 12, 13, 14, 14, 13, 13, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7,
11, 12, 13, 13, 13, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6,
+ /* Size 16x8 */
+ 32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 30, 25, 20, 17, 13, 12, 33, 31,
+ 29, 24, 21, 17, 14, 13, 32, 30, 28, 24, 21, 18, 14, 13, 30, 29, 25, 21,
+ 19, 16, 13, 13, 28, 28, 22, 19, 17, 15, 13, 12, 25, 26, 21, 17, 15, 13,
+ 12, 11, 22, 23, 19, 16, 14, 12, 11, 10, 19, 20, 18, 14, 12, 11, 10, 9,
+ 18, 19, 17, 14, 12, 10, 9, 9, 16, 17, 16, 13, 11, 10, 9, 8, 14, 15, 14,
+ 12, 10, 9, 8, 8, 12, 14, 13, 11, 10, 9, 7, 7, 12, 13, 12, 11, 9, 8, 7,
+ 7, 11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6,
/* Size 16x32 */
- 32, 33, 33, 32, 29, 28, 23, 22, 19, 17, 16, 13, 12, 12, 11, 11, 33, 32,
- 32, 32, 29, 29, 24, 23, 20, 17, 17, 14, 13, 12, 12, 12, 33, 32, 32, 32,
- 30, 29, 25, 23, 20, 18, 17, 14, 13, 12, 12, 12, 33, 32, 32, 31, 30, 30,
- 25, 23, 21, 18, 17, 14, 14, 13, 12, 12, 33, 32, 31, 30, 29, 28, 24, 23,
- 21, 18, 17, 14, 14, 13, 13, 12, 32, 32, 31, 30, 28, 28, 24, 23, 20, 18,
- 17, 14, 14, 13, 13, 12, 32, 31, 30, 29, 28, 27, 24, 23, 21, 18, 18, 15,
- 14, 13, 13, 12, 32, 31, 30, 28, 26, 26, 23, 22, 20, 18, 17, 14, 14, 13,
- 13, 13, 30, 30, 29, 28, 25, 24, 21, 20, 19, 17, 16, 14, 13, 13, 13, 13,
- 29, 30, 28, 27, 23, 22, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 28, 30,
- 28, 27, 22, 21, 19, 18, 17, 16, 15, 13, 13, 12, 12, 12, 26, 28, 26, 26,
- 21, 20, 18, 17, 16, 14, 14, 12, 12, 12, 12, 11, 25, 26, 26, 25, 21, 20,
- 17, 17, 15, 14, 13, 12, 12, 11, 11, 11, 23, 25, 24, 24, 20, 19, 16, 16,
- 14, 13, 13, 11, 11, 11, 11, 11, 22, 23, 23, 23, 19, 18, 16, 15, 14, 12,
- 12, 11, 11, 10, 10, 10, 21, 23, 23, 22, 19, 18, 15, 15, 13, 12, 12, 11,
- 10, 10, 10, 10, 19, 21, 20, 20, 18, 17, 14, 14, 12, 11, 11, 10, 10, 10,
- 9, 10, 19, 20, 20, 20, 17, 17, 14, 13, 12, 11, 11, 10, 9, 9, 9, 9, 18,
- 19, 19, 19, 17, 16, 14, 13, 12, 11, 10, 9, 9, 9, 9, 9, 16, 18, 18, 18,
- 16, 15, 13, 12, 11, 10, 10, 9, 9, 9, 9, 8, 16, 17, 17, 18, 16, 15, 13,
- 12, 11, 10, 10, 9, 9, 8, 8, 8, 14, 16, 16, 16, 14, 14, 12, 12, 11, 9, 9,
- 8, 8, 8, 8, 8, 14, 15, 15, 16, 14, 14, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8,
- 13, 14, 14, 15, 13, 13, 11, 11, 10, 9, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14,
- 13, 13, 11, 11, 10, 9, 9, 8, 7, 7, 7, 7, 12, 14, 14, 14, 13, 13, 11, 11,
- 10, 9, 8, 8, 7, 7, 7, 7, 12, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7,
- 7, 7, 7, 12, 12, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 6, 11, 12,
- 12, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6, 11, 12, 12, 12, 12, 11,
- 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 11, 12, 12, 12, 12, 11, 11, 10, 9, 8, 8,
- 7, 7, 6, 6, 6, 10, 11, 11, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 6, 6, 6,
- /* Size 32x16 */
32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 19, 19,
18, 16, 16, 14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 10, 33, 32, 32, 32,
32, 32, 31, 31, 30, 30, 30, 28, 26, 25, 23, 23, 21, 20, 19, 18, 17, 16,
@@ -8041,32 +8013,45 @@
12, 12, 13, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8,
8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 11, 12, 12, 12, 12, 12, 12, 13, 13, 12,
12, 11, 11, 11, 10, 10, 10, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6,
+ /* Size 32x16 */
+ 32, 33, 33, 32, 29, 28, 23, 22, 19, 17, 16, 13, 12, 12, 11, 11, 33, 32,
+ 32, 32, 29, 29, 24, 23, 20, 17, 17, 14, 13, 12, 12, 12, 33, 32, 32, 32,
+ 30, 29, 25, 23, 20, 18, 17, 14, 13, 12, 12, 12, 33, 32, 32, 31, 30, 30,
+ 25, 23, 21, 18, 17, 14, 14, 13, 12, 12, 33, 32, 31, 30, 29, 28, 24, 23,
+ 21, 18, 17, 14, 14, 13, 13, 12, 32, 32, 31, 30, 28, 28, 24, 23, 20, 18,
+ 17, 14, 14, 13, 13, 12, 32, 31, 30, 29, 28, 27, 24, 23, 21, 18, 18, 15,
+ 14, 13, 13, 12, 32, 31, 30, 28, 26, 26, 23, 22, 20, 18, 17, 14, 14, 13,
+ 13, 13, 30, 30, 29, 28, 25, 24, 21, 20, 19, 17, 16, 14, 13, 13, 13, 13,
+ 29, 30, 28, 27, 23, 22, 20, 19, 17, 16, 15, 13, 13, 12, 12, 12, 28, 30,
+ 28, 27, 22, 21, 19, 18, 17, 16, 15, 13, 13, 12, 12, 12, 26, 28, 26, 26,
+ 21, 20, 18, 17, 16, 14, 14, 12, 12, 12, 12, 11, 25, 26, 26, 25, 21, 20,
+ 17, 17, 15, 14, 13, 12, 12, 11, 11, 11, 23, 25, 24, 24, 20, 19, 16, 16,
+ 14, 13, 13, 11, 11, 11, 11, 11, 22, 23, 23, 23, 19, 18, 16, 15, 14, 12,
+ 12, 11, 11, 10, 10, 10, 21, 23, 23, 22, 19, 18, 15, 15, 13, 12, 12, 11,
+ 10, 10, 10, 10, 19, 21, 20, 20, 18, 17, 14, 14, 12, 11, 11, 10, 10, 10,
+ 9, 10, 19, 20, 20, 20, 17, 17, 14, 13, 12, 11, 11, 10, 9, 9, 9, 9, 18,
+ 19, 19, 19, 17, 16, 14, 13, 12, 11, 10, 9, 9, 9, 9, 9, 16, 18, 18, 18,
+ 16, 15, 13, 12, 11, 10, 10, 9, 9, 9, 9, 8, 16, 17, 17, 18, 16, 15, 13,
+ 12, 11, 10, 10, 9, 9, 8, 8, 8, 14, 16, 16, 16, 14, 14, 12, 12, 11, 9, 9,
+ 8, 8, 8, 8, 8, 14, 15, 15, 16, 14, 14, 12, 11, 10, 9, 9, 8, 8, 8, 8, 8,
+ 13, 14, 14, 15, 13, 13, 11, 11, 10, 9, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14,
+ 13, 13, 11, 11, 10, 9, 9, 8, 7, 7, 7, 7, 12, 14, 14, 14, 13, 13, 11, 11,
+ 10, 9, 8, 8, 7, 7, 7, 7, 12, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7,
+ 7, 7, 7, 12, 12, 13, 13, 12, 12, 11, 10, 9, 9, 8, 7, 7, 7, 7, 6, 11, 12,
+ 12, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7, 6, 11, 12, 12, 12, 12, 11,
+ 11, 10, 9, 9, 8, 8, 7, 7, 6, 6, 11, 12, 12, 12, 12, 11, 11, 10, 9, 8, 8,
+ 7, 7, 6, 6, 6, 10, 11, 11, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 6, 6, 6,
/* Size 4x16 */
- 33, 28, 17, 12, 32, 29, 18, 12, 32, 28, 18, 13, 31, 27, 18, 13, 30, 24,
- 17, 13, 30, 21, 16, 12, 26, 20, 14, 11, 23, 18, 12, 10, 21, 17, 11, 10,
- 19, 16, 11, 9, 17, 15, 10, 8, 15, 14, 9, 8, 14, 13, 9, 7, 13, 12, 9, 7,
- 12, 12, 9, 7, 12, 11, 8, 6,
- /* Size 16x4 */
33, 32, 32, 31, 30, 30, 26, 23, 21, 19, 17, 15, 14, 13, 12, 12, 28, 29,
28, 27, 24, 21, 20, 18, 17, 16, 15, 14, 13, 12, 12, 11, 17, 18, 18, 18,
17, 16, 14, 12, 11, 11, 10, 9, 9, 9, 9, 8, 12, 12, 13, 13, 13, 12, 11,
10, 10, 9, 8, 8, 7, 7, 7, 6,
+ /* Size 16x4 */
+ 33, 28, 17, 12, 32, 29, 18, 12, 32, 28, 18, 13, 31, 27, 18, 13, 30, 24,
+ 17, 13, 30, 21, 16, 12, 26, 20, 14, 11, 23, 18, 12, 10, 21, 17, 11, 10,
+ 19, 16, 11, 9, 17, 15, 10, 8, 15, 14, 9, 8, 14, 13, 9, 7, 13, 12, 9, 7,
+ 12, 12, 9, 7, 12, 11, 8, 6,
/* Size 8x32 */
- 32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 29, 24, 20, 17, 13, 12, 33, 32,
- 30, 25, 20, 17, 13, 12, 33, 32, 30, 25, 21, 17, 14, 12, 33, 31, 29, 24,
- 21, 17, 14, 13, 32, 31, 28, 24, 20, 17, 14, 13, 32, 30, 28, 24, 21, 18,
- 14, 13, 32, 30, 26, 23, 20, 17, 14, 13, 30, 29, 25, 21, 19, 16, 13, 13,
- 29, 28, 23, 20, 17, 15, 13, 12, 28, 28, 22, 19, 17, 15, 13, 12, 26, 26,
- 21, 18, 16, 14, 12, 12, 25, 26, 21, 17, 15, 13, 12, 11, 23, 24, 20, 16,
- 14, 13, 11, 11, 22, 23, 19, 16, 14, 12, 11, 10, 21, 23, 19, 15, 13, 12,
- 10, 10, 19, 20, 18, 14, 12, 11, 10, 9, 19, 20, 17, 14, 12, 11, 9, 9, 18,
- 19, 17, 14, 12, 10, 9, 9, 16, 18, 16, 13, 11, 10, 9, 9, 16, 17, 16, 13,
- 11, 10, 9, 8, 14, 16, 14, 12, 11, 9, 8, 8, 14, 15, 14, 12, 10, 9, 8, 8,
- 13, 14, 13, 11, 10, 9, 8, 7, 12, 14, 13, 11, 10, 9, 7, 7, 12, 14, 13,
- 11, 10, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7,
- 11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6, 11, 12, 12, 11,
- 9, 8, 7, 6, 10, 11, 12, 11, 9, 8, 7, 6,
- /* Size 32x8 */
32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 19, 19,
18, 16, 16, 14, 14, 13, 12, 12, 12, 12, 11, 11, 11, 10, 33, 32, 32, 32,
31, 31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 23, 20, 20, 19, 18, 17, 16,
@@ -8080,7 +8065,22 @@
9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 12, 13, 13, 14, 14, 14, 14, 14, 13, 13,
13, 12, 12, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7,
11, 12, 12, 12, 13, 13, 13, 13, 13, 12, 12, 12, 11, 11, 10, 10, 9, 9, 9,
- 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6 },
+ 9, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6,
+ /* Size 32x8 */
+ 32, 33, 29, 23, 19, 16, 12, 11, 33, 32, 29, 24, 20, 17, 13, 12, 33, 32,
+ 30, 25, 20, 17, 13, 12, 33, 32, 30, 25, 21, 17, 14, 12, 33, 31, 29, 24,
+ 21, 17, 14, 13, 32, 31, 28, 24, 20, 17, 14, 13, 32, 30, 28, 24, 21, 18,
+ 14, 13, 32, 30, 26, 23, 20, 17, 14, 13, 30, 29, 25, 21, 19, 16, 13, 13,
+ 29, 28, 23, 20, 17, 15, 13, 12, 28, 28, 22, 19, 17, 15, 13, 12, 26, 26,
+ 21, 18, 16, 14, 12, 12, 25, 26, 21, 17, 15, 13, 12, 11, 23, 24, 20, 16,
+ 14, 13, 11, 11, 22, 23, 19, 16, 14, 12, 11, 10, 21, 23, 19, 15, 13, 12,
+ 10, 10, 19, 20, 18, 14, 12, 11, 10, 9, 19, 20, 17, 14, 12, 11, 9, 9, 18,
+ 19, 17, 14, 12, 10, 9, 9, 16, 18, 16, 13, 11, 10, 9, 9, 16, 17, 16, 13,
+ 11, 10, 9, 8, 14, 16, 14, 12, 11, 9, 8, 8, 14, 15, 14, 12, 10, 9, 8, 8,
+ 13, 14, 13, 11, 10, 9, 8, 7, 12, 14, 13, 11, 10, 9, 7, 7, 12, 14, 13,
+ 11, 10, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7, 12, 13, 12, 11, 9, 8, 7, 7,
+ 11, 12, 12, 11, 9, 8, 7, 7, 11, 12, 12, 11, 9, 8, 7, 6, 11, 12, 12, 11,
+ 9, 8, 7, 6, 10, 11, 12, 11, 9, 8, 7, 6 },
{ /* Chroma */
/* Size 4x4 */
32, 23, 20, 17, 23, 19, 17, 16, 20, 17, 14, 13, 17, 16, 13, 11,
@@ -8164,21 +8164,12 @@
10, 10, 14, 15, 15, 16, 16, 17, 17, 17, 17, 16, 16, 15, 15, 15, 15, 14,
14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10,
/* Size 4x8 */
- 33, 22, 19, 16, 28, 22, 20, 17, 22, 20, 19, 17, 23, 19, 16, 15, 21, 19,
- 14, 13, 19, 18, 13, 12, 17, 17, 13, 11, 16, 16, 13, 11,
- /* Size 8x4 */
33, 28, 22, 23, 21, 19, 17, 16, 22, 22, 20, 19, 19, 18, 17, 16, 19, 20,
19, 16, 14, 13, 13, 13, 16, 17, 17, 15, 13, 12, 11, 11,
+ /* Size 8x4 */
+ 33, 22, 19, 16, 28, 22, 20, 17, 22, 20, 19, 17, 23, 19, 16, 15, 21, 19,
+ 14, 13, 19, 18, 13, 12, 17, 17, 13, 11, 16, 16, 13, 11,
/* Size 8x16 */
- 32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 21, 19, 17, 16, 31, 28,
- 22, 23, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20, 19, 17, 24, 23, 21, 21,
- 20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22, 20, 18, 17, 17,
- 16, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 19, 17, 16, 14, 14, 14,
- 19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 12, 17, 19,
- 18, 16, 14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15,
- 13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 16, 17, 15, 14, 12,
- 11, 10,
- /* Size 16x8 */
32, 33, 31, 28, 24, 21, 21, 20, 20, 19, 18, 17, 16, 16, 15, 15, 31, 30,
28, 24, 23, 22, 22, 22, 22, 21, 20, 19, 18, 17, 17, 16, 23, 23, 22, 22,
21, 20, 20, 20, 19, 19, 19, 18, 17, 17, 17, 17, 21, 22, 23, 23, 21, 19,
@@ -8187,37 +8178,16 @@
13, 13, 12, 12, 12, 12, 16, 17, 18, 19, 18, 17, 16, 14, 14, 13, 12, 12,
11, 11, 11, 11, 15, 16, 17, 17, 17, 16, 15, 14, 14, 13, 12, 12, 11, 11,
11, 10,
+ /* Size 16x8 */
+ 32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 21, 19, 17, 16, 31, 28,
+ 22, 23, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20, 19, 17, 24, 23, 21, 21,
+ 20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22, 20, 18, 17, 17,
+ 16, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 19, 17, 16, 14, 14, 14,
+ 19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 12, 17, 19,
+ 18, 16, 14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15,
+ 13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 16, 17, 15, 14, 12,
+ 11, 10,
/* Size 16x32 */
- 32, 33, 31, 28, 23, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 15, 33, 33,
- 30, 27, 23, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 16, 33, 32, 30, 26,
- 23, 22, 22, 22, 21, 20, 19, 17, 17, 17, 16, 16, 34, 32, 29, 26, 23, 22,
- 23, 22, 21, 20, 20, 18, 18, 17, 17, 17, 31, 29, 28, 24, 22, 22, 23, 22,
- 22, 20, 20, 18, 18, 17, 17, 17, 31, 28, 27, 24, 22, 22, 22, 22, 22, 20,
- 20, 18, 18, 17, 17, 17, 28, 26, 24, 22, 22, 22, 23, 22, 22, 21, 20, 19,
- 19, 18, 17, 17, 26, 25, 24, 22, 21, 21, 22, 22, 21, 20, 20, 19, 18, 18,
- 18, 17, 24, 24, 23, 22, 21, 20, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17,
- 22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 18, 17, 17, 17, 17, 17, 21, 22,
- 22, 21, 20, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 17, 21, 22, 22, 22,
- 20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 23, 22, 22, 20, 19,
- 18, 18, 17, 17, 17, 16, 16, 16, 15, 16, 21, 23, 23, 22, 20, 19, 18, 17,
- 17, 16, 16, 15, 15, 15, 15, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16,
- 16, 15, 14, 15, 14, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16, 15, 14,
- 14, 14, 14, 14, 20, 21, 22, 22, 19, 19, 17, 16, 16, 15, 14, 14, 14, 14,
- 14, 14, 19, 21, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 14, 13,
- 19, 20, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 13, 13, 18, 20,
- 20, 20, 19, 18, 16, 16, 15, 14, 13, 13, 12, 13, 13, 13, 18, 20, 20, 20,
- 19, 18, 16, 16, 15, 14, 13, 12, 12, 12, 12, 13, 17, 19, 19, 20, 18, 18,
- 16, 15, 14, 13, 13, 12, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 15,
- 14, 13, 13, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 17, 15, 15, 14, 13,
- 12, 12, 11, 11, 12, 12, 16, 18, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11,
- 11, 11, 11, 12, 16, 17, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11, 11, 11,
- 11, 11, 16, 17, 17, 18, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 11,
- 15, 17, 17, 18, 17, 16, 15, 15, 13, 13, 12, 11, 11, 11, 11, 11, 15, 17,
- 17, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 15, 16, 17, 17,
- 17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 16,
- 15, 14, 14, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 15, 15, 14,
- 14, 12, 12, 11, 11, 10, 10, 10,
- /* Size 32x16 */
32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 20, 19,
19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 33, 33, 32, 32,
29, 28, 26, 25, 24, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20, 20, 20, 19,
@@ -8247,33 +8217,47 @@
12, 12, 12, 12, 11, 11, 11, 11, 11, 10, 10, 10, 15, 16, 16, 17, 17, 17,
17, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12,
12, 11, 11, 11, 10, 10, 10, 10,
+ /* Size 32x16 */
+ 32, 33, 31, 28, 23, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 15, 33, 33,
+ 30, 27, 23, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 16, 33, 32, 30, 26,
+ 23, 22, 22, 22, 21, 20, 19, 17, 17, 17, 16, 16, 34, 32, 29, 26, 23, 22,
+ 23, 22, 21, 20, 20, 18, 18, 17, 17, 17, 31, 29, 28, 24, 22, 22, 23, 22,
+ 22, 20, 20, 18, 18, 17, 17, 17, 31, 28, 27, 24, 22, 22, 22, 22, 22, 20,
+ 20, 18, 18, 17, 17, 17, 28, 26, 24, 22, 22, 22, 23, 22, 22, 21, 20, 19,
+ 19, 18, 17, 17, 26, 25, 24, 22, 21, 21, 22, 22, 21, 20, 20, 19, 18, 18,
+ 18, 17, 24, 24, 23, 22, 21, 20, 21, 20, 20, 19, 19, 18, 18, 17, 17, 17,
+ 22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 18, 17, 17, 17, 17, 17, 21, 22,
+ 22, 21, 20, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 17, 21, 22, 22, 22,
+ 20, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 16, 21, 23, 22, 22, 20, 19,
+ 18, 18, 17, 17, 17, 16, 16, 16, 15, 16, 21, 23, 23, 22, 20, 19, 18, 17,
+ 17, 16, 16, 15, 15, 15, 15, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16,
+ 16, 15, 14, 15, 14, 15, 20, 22, 22, 22, 20, 19, 17, 17, 16, 16, 15, 14,
+ 14, 14, 14, 14, 20, 21, 22, 22, 19, 19, 17, 16, 16, 15, 14, 14, 14, 14,
+ 14, 14, 19, 21, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 14, 13,
+ 19, 20, 21, 21, 19, 19, 17, 16, 15, 14, 14, 13, 13, 13, 13, 13, 18, 20,
+ 20, 20, 19, 18, 16, 16, 15, 14, 13, 13, 12, 13, 13, 13, 18, 20, 20, 20,
+ 19, 18, 16, 16, 15, 14, 13, 12, 12, 12, 12, 13, 17, 19, 19, 20, 18, 18,
+ 16, 15, 14, 13, 13, 12, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 15,
+ 14, 13, 13, 12, 12, 12, 12, 12, 16, 18, 18, 19, 17, 17, 15, 15, 14, 13,
+ 12, 12, 11, 11, 12, 12, 16, 18, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11,
+ 11, 11, 11, 12, 16, 17, 18, 18, 17, 17, 15, 14, 14, 13, 12, 11, 11, 11,
+ 11, 11, 16, 17, 17, 18, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 11,
+ 15, 17, 17, 18, 17, 16, 15, 15, 13, 13, 12, 11, 11, 11, 11, 11, 15, 17,
+ 17, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 11, 10, 15, 16, 17, 17,
+ 17, 16, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 16,
+ 15, 14, 14, 13, 12, 12, 11, 11, 10, 10, 15, 16, 16, 17, 17, 15, 15, 14,
+ 14, 12, 12, 11, 11, 10, 10, 10,
/* Size 4x16 */
- 33, 21, 18, 15, 32, 22, 20, 17, 29, 22, 20, 17, 26, 22, 21, 18, 24, 20,
- 19, 17, 22, 19, 18, 16, 23, 19, 17, 16, 22, 19, 16, 15, 21, 19, 15, 14,
- 20, 19, 14, 13, 20, 18, 14, 12, 18, 17, 13, 12, 18, 17, 13, 11, 17, 16,
- 12, 11, 17, 16, 13, 11, 16, 16, 13, 11,
- /* Size 16x4 */
33, 32, 29, 26, 24, 22, 23, 22, 21, 20, 20, 18, 18, 17, 17, 16, 21, 22,
22, 22, 20, 19, 19, 19, 19, 19, 18, 17, 17, 16, 16, 16, 18, 20, 20, 21,
19, 18, 17, 16, 15, 14, 14, 13, 13, 12, 13, 13, 15, 17, 17, 18, 17, 16,
16, 15, 14, 13, 12, 12, 11, 11, 11, 11,
+ /* Size 16x4 */
+ 33, 21, 18, 15, 32, 22, 20, 17, 29, 22, 20, 17, 26, 22, 21, 18, 24, 20,
+ 19, 17, 22, 19, 18, 16, 23, 19, 17, 16, 22, 19, 16, 15, 21, 19, 15, 14,
+ 20, 19, 14, 13, 20, 18, 14, 12, 18, 17, 13, 12, 18, 17, 13, 11, 17, 16,
+ 12, 11, 17, 16, 13, 11, 16, 16, 13, 11,
/* Size 8x32 */
- 32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 20, 19, 17, 16, 33, 30,
- 23, 22, 21, 19, 17, 16, 34, 29, 23, 23, 21, 20, 18, 17, 31, 28, 22, 23,
- 22, 20, 18, 17, 31, 27, 22, 22, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20,
- 19, 17, 26, 24, 21, 22, 21, 20, 18, 18, 24, 23, 21, 21, 20, 19, 18, 17,
- 22, 22, 20, 19, 19, 18, 17, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22,
- 20, 18, 18, 17, 16, 16, 21, 22, 20, 18, 17, 17, 16, 15, 21, 23, 20, 18,
- 17, 16, 15, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 20, 17, 16, 15,
- 14, 14, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 14, 13, 14,
- 19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 13, 18, 20,
- 19, 16, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 17, 19, 18, 16,
- 14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 12, 16, 18, 17, 15, 14, 12,
- 11, 11, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15, 13, 12, 11, 11,
- 15, 17, 17, 15, 13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 17,
- 17, 15, 13, 12, 11, 10, 15, 16, 17, 15, 14, 12, 11, 10, 15, 16, 17, 15,
- 14, 12, 11, 10,
- /* Size 32x8 */
32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 20, 19,
19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 15, 31, 30, 30, 29,
28, 27, 24, 24, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20, 19,
@@ -8288,7 +8272,23 @@
19, 18, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11,
11, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17, 17, 18, 17, 17,
16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11,
- 11, 10, 10, 10 },
+ 11, 10, 10, 10,
+ /* Size 32x8 */
+ 32, 31, 23, 21, 20, 18, 16, 15, 33, 30, 23, 22, 20, 19, 17, 16, 33, 30,
+ 23, 22, 21, 19, 17, 16, 34, 29, 23, 23, 21, 20, 18, 17, 31, 28, 22, 23,
+ 22, 20, 18, 17, 31, 27, 22, 22, 22, 20, 18, 17, 28, 24, 22, 23, 22, 20,
+ 19, 17, 26, 24, 21, 22, 21, 20, 18, 18, 24, 23, 21, 21, 20, 19, 18, 17,
+ 22, 22, 20, 19, 19, 18, 17, 17, 21, 22, 20, 19, 19, 18, 17, 16, 21, 22,
+ 20, 18, 18, 17, 16, 16, 21, 22, 20, 18, 17, 17, 16, 15, 21, 23, 20, 18,
+ 17, 16, 15, 15, 20, 22, 20, 17, 16, 16, 14, 14, 20, 22, 20, 17, 16, 15,
+ 14, 14, 20, 22, 19, 17, 16, 14, 14, 14, 19, 21, 19, 17, 15, 14, 13, 14,
+ 19, 21, 19, 17, 15, 14, 13, 13, 18, 20, 19, 16, 15, 13, 12, 13, 18, 20,
+ 19, 16, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 17, 19, 18, 16,
+ 14, 13, 12, 12, 16, 18, 17, 15, 14, 12, 11, 12, 16, 18, 17, 15, 14, 12,
+ 11, 11, 16, 18, 17, 15, 14, 12, 11, 11, 16, 17, 17, 15, 13, 12, 11, 11,
+ 15, 17, 17, 15, 13, 12, 11, 11, 15, 17, 17, 15, 13, 12, 11, 11, 15, 17,
+ 17, 15, 13, 12, 11, 10, 15, 16, 17, 15, 14, 12, 11, 10, 15, 16, 17, 15,
+ 14, 12, 11, 10 },
},
{
{ /* Luma */
@@ -8371,20 +8371,12 @@
7, 7, 7, 7, 7, 6, 6, 11, 12, 12, 12, 12, 12, 12, 13, 13, 12, 12, 11, 11,
11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6,
/* Size 4x8 */
- 32, 29, 20, 13, 32, 28, 20, 14, 30, 24, 19, 14, 27, 20, 15, 12, 21, 17,
- 13, 10, 17, 15, 11, 9, 14, 13, 10, 8, 13, 12, 9, 7,
- /* Size 8x4 */
32, 32, 30, 27, 21, 17, 14, 13, 29, 28, 24, 20, 17, 15, 13, 12, 20, 20,
19, 15, 13, 11, 10, 9, 13, 14, 14, 12, 10, 9, 8, 7,
+ /* Size 8x4 */
+ 32, 29, 20, 13, 32, 28, 20, 14, 30, 24, 19, 14, 27, 20, 15, 12, 21, 17,
+ 13, 10, 17, 15, 11, 9, 14, 13, 10, 8, 13, 12, 9, 7,
/* Size 8x16 */
- 32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
- 30, 27, 22, 17, 14, 13, 32, 31, 28, 26, 21, 18, 15, 13, 31, 30, 27, 23,
- 20, 17, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 26, 27, 23, 19, 16, 14,
- 12, 12, 23, 25, 22, 17, 15, 13, 11, 11, 21, 23, 20, 17, 14, 12, 11, 10,
- 19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 10, 9, 9, 16, 17, 16,
- 14, 11, 10, 9, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14, 14, 12, 10, 9, 8,
- 7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12, 13, 11, 10, 9, 7, 7,
- /* Size 16x8 */
32, 33, 33, 32, 31, 28, 26, 23, 21, 19, 18, 16, 14, 13, 12, 11, 33, 32,
32, 31, 30, 29, 27, 25, 23, 21, 19, 17, 15, 14, 13, 12, 31, 31, 30, 28,
27, 24, 23, 22, 20, 19, 18, 16, 15, 14, 13, 13, 26, 26, 27, 26, 23, 20,
@@ -8392,36 +8384,15 @@
14, 13, 12, 11, 11, 10, 10, 10, 16, 17, 17, 18, 17, 15, 14, 13, 12, 11,
10, 10, 9, 9, 8, 9, 13, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8,
7, 7, 12, 12, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, 7, 7, 7,
+ /* Size 16x8 */
+ 32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
+ 30, 27, 22, 17, 14, 13, 32, 31, 28, 26, 21, 18, 15, 13, 31, 30, 27, 23,
+ 20, 17, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 26, 27, 23, 19, 16, 14,
+ 12, 12, 23, 25, 22, 17, 15, 13, 11, 11, 21, 23, 20, 17, 14, 12, 11, 10,
+ 19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 10, 9, 9, 16, 17, 16,
+ 14, 11, 10, 9, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14, 14, 12, 10, 9, 8,
+ 7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12, 13, 11, 10, 9, 7, 7,
/* Size 16x32 */
- 32, 33, 33, 32, 31, 28, 26, 23, 20, 19, 16, 16, 13, 13, 12, 11, 33, 32,
- 32, 32, 31, 29, 26, 24, 21, 20, 17, 16, 14, 13, 12, 12, 33, 32, 32, 32,
- 31, 29, 26, 24, 21, 20, 17, 17, 14, 13, 12, 12, 33, 32, 32, 31, 31, 30,
- 27, 25, 22, 21, 17, 17, 14, 14, 13, 13, 33, 32, 32, 31, 30, 29, 27, 25,
- 22, 21, 17, 17, 14, 14, 13, 13, 32, 32, 31, 30, 29, 28, 26, 24, 21, 20,
- 17, 17, 14, 14, 13, 13, 32, 32, 31, 29, 28, 28, 26, 24, 21, 21, 18, 17,
- 15, 14, 13, 13, 32, 31, 31, 29, 28, 27, 25, 24, 21, 21, 18, 17, 15, 15,
- 14, 13, 31, 31, 30, 28, 27, 25, 23, 22, 20, 19, 17, 16, 14, 14, 13, 13,
- 30, 30, 30, 28, 26, 24, 23, 21, 19, 19, 16, 16, 14, 14, 13, 12, 28, 30,
- 29, 27, 24, 21, 20, 19, 18, 17, 15, 15, 13, 13, 12, 12, 28, 29, 29, 27,
- 24, 21, 20, 19, 17, 17, 15, 15, 13, 13, 12, 12, 26, 28, 27, 26, 23, 20,
- 19, 18, 16, 16, 14, 14, 12, 12, 12, 12, 26, 27, 26, 25, 23, 20, 18, 17,
- 16, 15, 14, 13, 12, 12, 11, 11, 23, 25, 25, 24, 22, 19, 17, 16, 15, 14,
- 13, 13, 11, 11, 11, 11, 22, 24, 24, 23, 21, 19, 17, 16, 14, 14, 12, 12,
- 11, 11, 11, 10, 21, 23, 23, 22, 20, 18, 17, 15, 14, 13, 12, 12, 11, 10,
- 10, 10, 20, 21, 21, 21, 20, 17, 16, 15, 13, 13, 11, 11, 10, 10, 10, 10,
- 19, 21, 21, 20, 19, 17, 16, 14, 13, 12, 11, 11, 10, 10, 9, 10, 18, 19,
- 19, 19, 18, 16, 15, 14, 12, 12, 11, 10, 9, 9, 9, 9, 18, 19, 19, 19, 18,
- 16, 15, 14, 12, 12, 10, 10, 9, 9, 9, 9, 16, 17, 17, 18, 17, 15, 14, 13,
- 12, 11, 10, 10, 9, 9, 8, 8, 16, 17, 17, 17, 16, 15, 14, 13, 11, 11, 10,
- 10, 9, 8, 8, 8, 14, 16, 16, 16, 15, 14, 13, 12, 11, 11, 9, 9, 8, 8, 8,
- 8, 14, 15, 15, 16, 15, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 13, 14, 14,
- 15, 14, 13, 12, 11, 10, 10, 9, 9, 8, 8, 7, 7, 13, 14, 14, 14, 14, 13,
- 12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14, 14, 13, 12, 11, 10,
- 10, 8, 8, 8, 7, 7, 7, 12, 13, 13, 14, 13, 12, 11, 11, 10, 9, 8, 8, 7, 7,
- 7, 7, 12, 13, 13, 13, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 7, 11, 12,
- 12, 13, 13, 12, 11, 10, 10, 9, 9, 8, 7, 7, 7, 7, 11, 12, 12, 13, 13, 11,
- 11, 10, 10, 9, 9, 8, 8, 7, 7, 6,
- /* Size 32x16 */
32, 33, 33, 33, 33, 32, 32, 32, 31, 30, 28, 28, 26, 26, 23, 22, 21, 20,
19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 11, 11, 33, 32, 32, 32,
32, 32, 32, 31, 31, 30, 30, 29, 28, 27, 25, 24, 23, 21, 21, 19, 19, 17,
@@ -8450,32 +8421,46 @@
11, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 11, 12,
12, 13, 13, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 10, 10, 10, 10, 9,
9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 6,
+ /* Size 32x16 */
+ 32, 33, 33, 32, 31, 28, 26, 23, 20, 19, 16, 16, 13, 13, 12, 11, 33, 32,
+ 32, 32, 31, 29, 26, 24, 21, 20, 17, 16, 14, 13, 12, 12, 33, 32, 32, 32,
+ 31, 29, 26, 24, 21, 20, 17, 17, 14, 13, 12, 12, 33, 32, 32, 31, 31, 30,
+ 27, 25, 22, 21, 17, 17, 14, 14, 13, 13, 33, 32, 32, 31, 30, 29, 27, 25,
+ 22, 21, 17, 17, 14, 14, 13, 13, 32, 32, 31, 30, 29, 28, 26, 24, 21, 20,
+ 17, 17, 14, 14, 13, 13, 32, 32, 31, 29, 28, 28, 26, 24, 21, 21, 18, 17,
+ 15, 14, 13, 13, 32, 31, 31, 29, 28, 27, 25, 24, 21, 21, 18, 17, 15, 15,
+ 14, 13, 31, 31, 30, 28, 27, 25, 23, 22, 20, 19, 17, 16, 14, 14, 13, 13,
+ 30, 30, 30, 28, 26, 24, 23, 21, 19, 19, 16, 16, 14, 14, 13, 12, 28, 30,
+ 29, 27, 24, 21, 20, 19, 18, 17, 15, 15, 13, 13, 12, 12, 28, 29, 29, 27,
+ 24, 21, 20, 19, 17, 17, 15, 15, 13, 13, 12, 12, 26, 28, 27, 26, 23, 20,
+ 19, 18, 16, 16, 14, 14, 12, 12, 12, 12, 26, 27, 26, 25, 23, 20, 18, 17,
+ 16, 15, 14, 13, 12, 12, 11, 11, 23, 25, 25, 24, 22, 19, 17, 16, 15, 14,
+ 13, 13, 11, 11, 11, 11, 22, 24, 24, 23, 21, 19, 17, 16, 14, 14, 12, 12,
+ 11, 11, 11, 10, 21, 23, 23, 22, 20, 18, 17, 15, 14, 13, 12, 12, 11, 10,
+ 10, 10, 20, 21, 21, 21, 20, 17, 16, 15, 13, 13, 11, 11, 10, 10, 10, 10,
+ 19, 21, 21, 20, 19, 17, 16, 14, 13, 12, 11, 11, 10, 10, 9, 10, 18, 19,
+ 19, 19, 18, 16, 15, 14, 12, 12, 11, 10, 9, 9, 9, 9, 18, 19, 19, 19, 18,
+ 16, 15, 14, 12, 12, 10, 10, 9, 9, 9, 9, 16, 17, 17, 18, 17, 15, 14, 13,
+ 12, 11, 10, 10, 9, 9, 8, 8, 16, 17, 17, 17, 16, 15, 14, 13, 11, 11, 10,
+ 10, 9, 8, 8, 8, 14, 16, 16, 16, 15, 14, 13, 12, 11, 11, 9, 9, 8, 8, 8,
+ 8, 14, 15, 15, 16, 15, 14, 13, 12, 11, 10, 9, 9, 8, 8, 8, 8, 13, 14, 14,
+ 15, 14, 13, 12, 11, 10, 10, 9, 9, 8, 8, 7, 7, 13, 14, 14, 14, 14, 13,
+ 12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 12, 14, 14, 14, 14, 13, 12, 11, 10,
+ 10, 8, 8, 8, 7, 7, 7, 12, 13, 13, 14, 13, 12, 11, 11, 10, 9, 8, 8, 7, 7,
+ 7, 7, 12, 13, 13, 13, 13, 12, 11, 10, 10, 9, 8, 8, 7, 7, 7, 7, 11, 12,
+ 12, 13, 13, 12, 11, 10, 10, 9, 9, 8, 7, 7, 7, 7, 11, 12, 12, 13, 13, 11,
+ 11, 10, 10, 9, 9, 8, 8, 7, 7, 6,
/* Size 4x16 */
- 33, 28, 19, 13, 32, 29, 20, 13, 32, 29, 21, 14, 32, 28, 21, 14, 31, 25,
- 19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11, 23, 18, 13, 10,
- 21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 8, 15, 14, 10, 8, 14, 13, 10,
- 7, 13, 12, 9, 7, 12, 12, 9, 7,
- /* Size 16x4 */
33, 32, 32, 32, 31, 30, 28, 25, 23, 21, 19, 17, 15, 14, 13, 12, 28, 29,
29, 28, 25, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 12, 19, 20, 21, 21,
19, 17, 16, 14, 13, 12, 12, 11, 10, 10, 9, 9, 13, 13, 14, 14, 14, 13,
12, 11, 10, 10, 9, 8, 8, 7, 7, 7,
+ /* Size 16x4 */
+ 33, 28, 19, 13, 32, 29, 20, 13, 32, 29, 21, 14, 32, 28, 21, 14, 31, 25,
+ 19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11, 23, 18, 13, 10,
+ 21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 8, 15, 14, 10, 8, 14, 13, 10,
+ 7, 13, 12, 9, 7, 12, 12, 9, 7,
/* Size 8x32 */
- 32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
- 31, 26, 21, 17, 14, 12, 33, 32, 31, 27, 22, 17, 14, 13, 33, 32, 30, 27,
- 22, 17, 14, 13, 32, 31, 29, 26, 21, 17, 14, 13, 32, 31, 28, 26, 21, 18,
- 15, 13, 32, 31, 28, 25, 21, 18, 15, 14, 31, 30, 27, 23, 20, 17, 14, 13,
- 30, 30, 26, 23, 19, 16, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 28, 29,
- 24, 20, 17, 15, 13, 12, 26, 27, 23, 19, 16, 14, 12, 12, 26, 26, 23, 18,
- 16, 14, 12, 11, 23, 25, 22, 17, 15, 13, 11, 11, 22, 24, 21, 17, 14, 12,
- 11, 11, 21, 23, 20, 17, 14, 12, 11, 10, 20, 21, 20, 16, 13, 11, 10, 10,
- 19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 11, 9, 9, 18, 19, 18,
- 15, 12, 10, 9, 9, 16, 17, 17, 14, 12, 10, 9, 8, 16, 17, 16, 14, 11, 10,
- 9, 8, 14, 16, 15, 13, 11, 9, 8, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14,
- 14, 12, 10, 9, 8, 7, 13, 14, 14, 12, 10, 9, 8, 7, 12, 14, 14, 12, 10, 8,
- 8, 7, 12, 13, 13, 11, 10, 8, 7, 7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12,
- 13, 11, 10, 9, 7, 7, 11, 12, 13, 11, 10, 9, 8, 7,
- /* Size 32x8 */
32, 33, 33, 33, 33, 32, 32, 32, 31, 30, 28, 28, 26, 26, 23, 22, 21, 20,
19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 11, 11, 33, 32, 32, 32,
32, 31, 31, 31, 30, 30, 29, 29, 27, 26, 25, 24, 23, 21, 21, 19, 19, 17,
@@ -8489,7 +8474,22 @@
10, 10, 10, 9, 9, 9, 9, 8, 8, 8, 9, 9, 13, 14, 14, 14, 14, 14, 15, 15,
14, 14, 13, 13, 12, 12, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8,
7, 7, 7, 8, 12, 12, 12, 13, 13, 13, 13, 14, 13, 13, 12, 12, 12, 11, 11,
- 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7 },
+ 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7,
+ /* Size 32x8 */
+ 32, 33, 31, 26, 20, 16, 13, 12, 33, 32, 31, 26, 21, 17, 14, 12, 33, 32,
+ 31, 26, 21, 17, 14, 12, 33, 32, 31, 27, 22, 17, 14, 13, 33, 32, 30, 27,
+ 22, 17, 14, 13, 32, 31, 29, 26, 21, 17, 14, 13, 32, 31, 28, 26, 21, 18,
+ 15, 13, 32, 31, 28, 25, 21, 18, 15, 14, 31, 30, 27, 23, 20, 17, 14, 13,
+ 30, 30, 26, 23, 19, 16, 14, 13, 28, 29, 24, 20, 18, 15, 13, 12, 28, 29,
+ 24, 20, 17, 15, 13, 12, 26, 27, 23, 19, 16, 14, 12, 12, 26, 26, 23, 18,
+ 16, 14, 12, 11, 23, 25, 22, 17, 15, 13, 11, 11, 22, 24, 21, 17, 14, 12,
+ 11, 11, 21, 23, 20, 17, 14, 12, 11, 10, 20, 21, 20, 16, 13, 11, 10, 10,
+ 19, 21, 19, 16, 13, 11, 10, 9, 18, 19, 18, 15, 12, 11, 9, 9, 18, 19, 18,
+ 15, 12, 10, 9, 9, 16, 17, 17, 14, 12, 10, 9, 8, 16, 17, 16, 14, 11, 10,
+ 9, 8, 14, 16, 15, 13, 11, 9, 8, 8, 14, 15, 15, 13, 11, 9, 8, 8, 13, 14,
+ 14, 12, 10, 9, 8, 7, 13, 14, 14, 12, 10, 9, 8, 7, 12, 14, 14, 12, 10, 8,
+ 8, 7, 12, 13, 13, 11, 10, 8, 7, 7, 12, 13, 13, 11, 10, 8, 7, 7, 11, 12,
+ 13, 11, 10, 9, 7, 7, 11, 12, 13, 11, 10, 9, 8, 7 },
{ /* Chroma */
/* Size 4x4 */
32, 22, 21, 18, 22, 19, 19, 17, 21, 19, 15, 13, 18, 17, 13, 11,
@@ -8573,21 +8573,12 @@
10, 11, 15, 16, 16, 17, 17, 17, 17, 18, 17, 17, 17, 16, 16, 15, 15, 14,
14, 14, 14, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 10,
/* Size 4x8 */
- 33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 23, 19, 18, 16, 22, 19,
- 16, 14, 20, 18, 15, 12, 18, 17, 14, 11, 17, 16, 13, 11,
- /* Size 8x4 */
33, 28, 24, 23, 22, 20, 18, 17, 22, 22, 20, 19, 19, 18, 17, 16, 20, 22,
20, 18, 16, 15, 14, 13, 17, 18, 18, 16, 14, 12, 11, 11,
+ /* Size 8x4 */
+ 33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 23, 19, 18, 16, 22, 19,
+ 16, 14, 20, 18, 15, 12, 18, 17, 14, 11, 17, 16, 13, 11,
/* Size 8x16 */
- 32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 29,
- 24, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21,
- 21, 20, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22, 21, 19, 18, 17,
- 16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 20, 22, 21, 18, 16, 15, 14, 14,
- 20, 21, 20, 18, 16, 14, 14, 13, 19, 20, 20, 17, 15, 14, 13, 13, 18, 20,
- 19, 17, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 16, 18, 18, 16,
- 14, 12, 12, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13,
- 12, 11,
- /* Size 16x8 */
32, 33, 33, 29, 25, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 15, 32, 31,
29, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 17, 26, 25, 24, 22,
21, 20, 21, 21, 21, 20, 20, 19, 18, 18, 17, 17, 21, 22, 22, 22, 21, 19,
@@ -8596,37 +8587,16 @@
14, 13, 13, 12, 12, 13, 16, 17, 18, 19, 18, 17, 16, 15, 14, 14, 13, 12,
12, 12, 11, 12, 15, 16, 17, 18, 17, 17, 16, 15, 14, 13, 13, 12, 12, 11,
11, 11,
+ /* Size 16x8 */
+ 32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 29,
+ 24, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21,
+ 21, 20, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22, 21, 19, 18, 17,
+ 16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 20, 22, 21, 18, 16, 15, 14, 14,
+ 20, 21, 20, 18, 16, 14, 14, 13, 19, 20, 20, 17, 15, 14, 13, 13, 18, 20,
+ 19, 17, 15, 13, 12, 12, 17, 19, 18, 16, 14, 13, 12, 12, 16, 18, 18, 16,
+ 14, 12, 12, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13,
+ 12, 11,
/* Size 16x32 */
- 32, 33, 32, 28, 26, 21, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 33, 33,
- 31, 27, 25, 22, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 33, 33, 31, 27,
- 25, 22, 22, 22, 21, 21, 19, 19, 17, 17, 16, 16, 34, 32, 31, 26, 24, 22,
- 23, 23, 22, 21, 20, 20, 18, 18, 17, 17, 33, 31, 29, 25, 24, 22, 22, 23,
- 22, 21, 20, 20, 18, 18, 17, 17, 31, 28, 28, 24, 23, 22, 22, 22, 22, 22,
- 20, 20, 18, 18, 17, 17, 29, 27, 26, 23, 22, 22, 22, 23, 22, 22, 20, 20,
- 19, 18, 18, 17, 28, 26, 25, 22, 22, 22, 22, 23, 22, 22, 20, 20, 19, 19,
- 18, 18, 25, 24, 24, 22, 21, 21, 21, 21, 21, 20, 20, 19, 18, 18, 17, 18,
- 24, 24, 24, 22, 21, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 21, 22,
- 22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 22, 21,
- 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 21, 22, 22, 22, 21, 19,
- 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 21, 23, 22, 22, 21, 19, 19, 18,
- 18, 18, 17, 17, 16, 16, 16, 15, 21, 23, 23, 22, 21, 19, 18, 18, 17, 17,
- 16, 16, 15, 15, 15, 15, 21, 22, 22, 22, 21, 19, 18, 17, 17, 17, 16, 16,
- 15, 15, 15, 15, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14,
- 14, 14, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 14,
- 20, 21, 21, 22, 20, 19, 18, 17, 16, 16, 14, 14, 14, 14, 13, 14, 19, 20,
- 21, 21, 20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 19, 20, 20, 21,
- 20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 18, 20, 20, 20, 20, 18,
- 17, 16, 15, 15, 13, 13, 12, 12, 12, 12, 18, 20, 20, 20, 19, 18, 17, 16,
- 15, 14, 13, 13, 12, 12, 12, 12, 17, 19, 19, 20, 19, 18, 17, 16, 14, 14,
- 13, 13, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 16, 14, 14, 13, 13,
- 12, 12, 12, 12, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11,
- 11, 11, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11, 11, 11,
- 16, 17, 18, 18, 18, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11, 11, 16, 17,
- 17, 18, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18,
- 17, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18, 17, 16,
- 16, 14, 14, 13, 13, 12, 12, 11, 11, 11, 15, 17, 17, 17, 17, 16, 16, 14,
- 14, 13, 13, 12, 12, 11, 11, 10,
- /* Size 32x16 */
32, 33, 33, 34, 33, 31, 29, 28, 25, 24, 21, 21, 21, 21, 21, 21, 20, 20,
20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 33, 33, 33, 32,
31, 28, 27, 26, 24, 24, 22, 22, 22, 23, 23, 22, 22, 22, 21, 20, 20, 20,
@@ -8656,33 +8626,47 @@
13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 15, 16, 16, 17, 17, 17,
17, 18, 18, 17, 17, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12,
12, 11, 11, 11, 11, 11, 11, 10,
+ /* Size 32x16 */
+ 32, 33, 32, 28, 26, 21, 21, 21, 20, 20, 18, 18, 16, 16, 15, 15, 33, 33,
+ 31, 27, 25, 22, 22, 22, 21, 20, 19, 19, 17, 17, 16, 16, 33, 33, 31, 27,
+ 25, 22, 22, 22, 21, 21, 19, 19, 17, 17, 16, 16, 34, 32, 31, 26, 24, 22,
+ 23, 23, 22, 21, 20, 20, 18, 18, 17, 17, 33, 31, 29, 25, 24, 22, 22, 23,
+ 22, 21, 20, 20, 18, 18, 17, 17, 31, 28, 28, 24, 23, 22, 22, 22, 22, 22,
+ 20, 20, 18, 18, 17, 17, 29, 27, 26, 23, 22, 22, 22, 23, 22, 22, 20, 20,
+ 19, 18, 18, 17, 28, 26, 25, 22, 22, 22, 22, 23, 22, 22, 20, 20, 19, 19,
+ 18, 18, 25, 24, 24, 22, 21, 21, 21, 21, 21, 20, 20, 19, 18, 18, 17, 18,
+ 24, 24, 24, 22, 21, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 17, 21, 22,
+ 22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 22, 21,
+ 20, 19, 19, 19, 19, 19, 18, 18, 17, 17, 16, 16, 21, 22, 22, 22, 21, 19,
+ 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 21, 23, 22, 22, 21, 19, 19, 18,
+ 18, 18, 17, 17, 16, 16, 16, 15, 21, 23, 23, 22, 21, 19, 18, 18, 17, 17,
+ 16, 16, 15, 15, 15, 15, 21, 22, 22, 22, 21, 19, 18, 17, 17, 17, 16, 16,
+ 15, 15, 15, 15, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14,
+ 14, 14, 20, 22, 22, 22, 21, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 14,
+ 20, 21, 21, 22, 20, 19, 18, 17, 16, 16, 14, 14, 14, 14, 13, 14, 19, 20,
+ 21, 21, 20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 19, 20, 20, 21,
+ 20, 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 13, 18, 20, 20, 20, 20, 18,
+ 17, 16, 15, 15, 13, 13, 12, 12, 12, 12, 18, 20, 20, 20, 19, 18, 17, 16,
+ 15, 14, 13, 13, 12, 12, 12, 12, 17, 19, 19, 20, 19, 18, 17, 16, 14, 14,
+ 13, 13, 12, 12, 12, 12, 17, 18, 19, 19, 18, 17, 16, 16, 14, 14, 13, 13,
+ 12, 12, 12, 12, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11,
+ 11, 11, 16, 18, 18, 19, 18, 17, 16, 15, 14, 14, 12, 12, 12, 11, 11, 11,
+ 16, 17, 18, 18, 18, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11, 11, 16, 17,
+ 17, 18, 17, 17, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18,
+ 17, 16, 16, 15, 14, 13, 12, 12, 11, 11, 11, 11, 15, 17, 17, 18, 17, 16,
+ 16, 14, 14, 13, 13, 12, 12, 11, 11, 11, 15, 17, 17, 17, 17, 16, 16, 14,
+ 14, 13, 13, 12, 12, 11, 11, 10,
/* Size 4x16 */
- 33, 21, 20, 16, 33, 22, 21, 17, 31, 22, 21, 18, 27, 22, 22, 18, 24, 21,
- 20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15, 22, 19, 16, 14,
- 21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 14, 12, 18, 17, 14, 12, 18, 17,
- 14, 11, 17, 17, 13, 11, 17, 16, 13, 11,
- /* Size 16x4 */
33, 33, 31, 27, 24, 22, 22, 23, 22, 21, 20, 20, 18, 18, 17, 17, 21, 22,
22, 22, 21, 19, 19, 19, 19, 19, 19, 18, 17, 17, 17, 16, 20, 21, 21, 22,
20, 19, 18, 17, 16, 16, 15, 14, 14, 14, 13, 13, 16, 17, 18, 18, 18, 17,
16, 15, 14, 14, 13, 12, 12, 11, 11, 11,
+ /* Size 16x4 */
+ 33, 21, 20, 16, 33, 22, 21, 17, 31, 22, 21, 18, 27, 22, 22, 18, 24, 21,
+ 20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15, 22, 19, 16, 14,
+ 21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 14, 12, 18, 17, 14, 12, 18, 17,
+ 14, 11, 17, 17, 13, 11, 17, 16, 13, 11,
/* Size 8x32 */
- 32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 31,
- 25, 22, 21, 19, 17, 16, 34, 31, 24, 23, 22, 20, 18, 17, 33, 29, 24, 22,
- 22, 20, 18, 17, 31, 28, 23, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20,
- 19, 18, 28, 25, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21, 21, 20, 18, 17,
- 24, 24, 21, 21, 20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22,
- 20, 19, 19, 18, 17, 16, 21, 22, 21, 19, 18, 17, 16, 16, 21, 22, 21, 19,
- 18, 17, 16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 21, 22, 21, 18, 17, 16,
- 15, 15, 20, 22, 21, 18, 16, 15, 14, 14, 20, 22, 21, 18, 16, 15, 14, 14,
- 20, 21, 20, 18, 16, 14, 14, 13, 19, 21, 20, 17, 15, 14, 13, 13, 19, 20,
- 20, 17, 15, 14, 13, 13, 18, 20, 20, 17, 15, 13, 12, 12, 18, 20, 19, 17,
- 15, 13, 12, 12, 17, 19, 19, 17, 14, 13, 12, 12, 17, 19, 18, 16, 14, 13,
- 12, 12, 16, 18, 18, 16, 14, 12, 12, 11, 16, 18, 18, 16, 14, 12, 12, 11,
- 16, 18, 18, 16, 14, 12, 11, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17,
- 17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13, 12, 11, 15, 17, 17, 16,
- 14, 13, 12, 11,
- /* Size 32x8 */
32, 33, 33, 34, 33, 31, 29, 28, 25, 24, 21, 21, 21, 21, 21, 21, 20, 20,
20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 32, 31, 31, 31,
29, 28, 26, 25, 24, 24, 22, 22, 22, 22, 23, 22, 22, 22, 21, 21, 20, 20,
@@ -8697,7 +8681,23 @@
19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 13, 13, 12, 12, 12,
12, 12, 12, 11, 11, 11, 12, 12, 15, 16, 16, 17, 17, 17, 18, 18, 17, 17,
17, 16, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11,
- 11, 11, 11, 11 },
+ 11, 11, 11, 11,
+ /* Size 32x8 */
+ 32, 32, 26, 21, 20, 18, 16, 15, 33, 31, 25, 22, 21, 19, 17, 16, 33, 31,
+ 25, 22, 21, 19, 17, 16, 34, 31, 24, 23, 22, 20, 18, 17, 33, 29, 24, 22,
+ 22, 20, 18, 17, 31, 28, 23, 22, 22, 20, 18, 17, 29, 26, 22, 22, 22, 20,
+ 19, 18, 28, 25, 22, 22, 22, 20, 19, 18, 25, 24, 21, 21, 21, 20, 18, 17,
+ 24, 24, 21, 21, 20, 19, 18, 17, 21, 22, 20, 19, 19, 18, 17, 17, 21, 22,
+ 20, 19, 19, 18, 17, 16, 21, 22, 21, 19, 18, 17, 16, 16, 21, 22, 21, 19,
+ 18, 17, 16, 16, 21, 23, 21, 18, 17, 16, 15, 15, 21, 22, 21, 18, 17, 16,
+ 15, 15, 20, 22, 21, 18, 16, 15, 14, 14, 20, 22, 21, 18, 16, 15, 14, 14,
+ 20, 21, 20, 18, 16, 14, 14, 13, 19, 21, 20, 17, 15, 14, 13, 13, 19, 20,
+ 20, 17, 15, 14, 13, 13, 18, 20, 20, 17, 15, 13, 12, 12, 18, 20, 19, 17,
+ 15, 13, 12, 12, 17, 19, 19, 17, 14, 13, 12, 12, 17, 19, 18, 16, 14, 13,
+ 12, 12, 16, 18, 18, 16, 14, 12, 12, 11, 16, 18, 18, 16, 14, 12, 12, 11,
+ 16, 18, 18, 16, 14, 12, 11, 11, 16, 17, 17, 16, 14, 12, 11, 11, 15, 17,
+ 17, 16, 14, 12, 11, 11, 15, 17, 17, 16, 14, 13, 12, 11, 15, 17, 17, 16,
+ 14, 13, 12, 11 },
},
{
{ /* Luma */
@@ -8781,12 +8781,20 @@
12, 12, 13, 13, 13, 13, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10,
9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7,
/* Size 4x8 */
- 32, 29, 20, 14, 32, 28, 20, 14, 30, 24, 19, 14, 28, 20, 16, 12, 23, 18,
- 13, 11, 19, 16, 12, 9, 16, 14, 11, 8, 14, 13, 10, 8,
- /* Size 8x4 */
32, 32, 30, 28, 23, 19, 16, 14, 29, 28, 24, 20, 18, 16, 14, 13, 20, 20,
19, 16, 13, 12, 11, 10, 14, 14, 14, 12, 11, 9, 8, 8,
+ /* Size 8x4 */
+ 32, 29, 20, 14, 32, 28, 20, 14, 30, 24, 19, 14, 28, 20, 16, 12, 23, 18,
+ 13, 11, 19, 16, 12, 9, 16, 14, 11, 8, 14, 13, 10, 8,
/* Size 8x16 */
+ 32, 33, 33, 32, 32, 30, 28, 26, 23, 21, 19, 18, 16, 14, 13, 12, 33, 32,
+ 32, 32, 31, 30, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 32, 32, 31, 30,
+ 29, 28, 27, 26, 24, 22, 20, 19, 18, 16, 15, 14, 28, 29, 30, 28, 27, 24,
+ 21, 20, 19, 18, 17, 16, 15, 14, 13, 13, 23, 24, 25, 24, 24, 21, 19, 18,
+ 16, 15, 14, 14, 13, 12, 11, 11, 19, 20, 21, 20, 21, 19, 17, 16, 14, 13,
+ 12, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 16, 15, 14, 13, 12, 11, 10,
+ 10, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8, 8,
+ /* Size 16x8 */
32, 33, 32, 28, 23, 19, 16, 13, 33, 32, 32, 29, 24, 20, 17, 14, 33, 32,
31, 30, 25, 21, 17, 14, 32, 32, 30, 28, 24, 20, 17, 14, 32, 31, 29, 27,
24, 21, 18, 15, 30, 30, 28, 24, 21, 19, 16, 14, 28, 30, 27, 21, 19, 17,
@@ -8795,44 +8803,7 @@
19, 16, 14, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9, 14, 16, 16, 14,
12, 11, 9, 8, 13, 14, 15, 13, 11, 10, 9, 8, 12, 14, 14, 13, 11, 10, 8,
8,
- /* Size 16x8 */
- 32, 33, 33, 32, 32, 30, 28, 26, 23, 21, 19, 18, 16, 14, 13, 12, 33, 32,
- 32, 32, 31, 30, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 32, 32, 31, 30,
- 29, 28, 27, 26, 24, 22, 20, 19, 18, 16, 15, 14, 28, 29, 30, 28, 27, 24,
- 21, 20, 19, 18, 17, 16, 15, 14, 13, 13, 23, 24, 25, 24, 24, 21, 19, 18,
- 16, 15, 14, 14, 13, 12, 11, 11, 19, 20, 21, 20, 21, 19, 17, 16, 14, 13,
- 12, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 16, 15, 14, 13, 12, 11, 10,
- 10, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8, 8,
/* Size 16x32 */
- 32, 33, 33, 32, 32, 28, 28, 23, 23, 19, 19, 16, 16, 13, 13, 12, 33, 32,
- 32, 32, 32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 32,
- 32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 31, 31, 30,
- 30, 25, 25, 21, 21, 17, 17, 14, 14, 13, 33, 32, 32, 31, 31, 30, 30, 25,
- 25, 21, 21, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20,
- 20, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20, 20, 17,
- 17, 14, 14, 13, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15,
- 15, 14, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15, 15, 14,
- 30, 30, 30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 30, 30,
- 30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 28, 30, 30, 27,
- 27, 21, 21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 28, 30, 30, 27, 27, 21,
- 21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 26, 28, 28, 26, 26, 20, 20, 18,
- 18, 16, 16, 14, 14, 12, 12, 12, 26, 28, 28, 26, 26, 20, 20, 18, 18, 16,
- 16, 14, 14, 12, 12, 12, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13,
- 13, 11, 11, 11, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13, 13, 11,
- 11, 11, 21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10,
- 21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10, 19, 21,
- 21, 20, 20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 19, 21, 21, 20,
- 20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 18, 19, 19, 19, 19, 16,
- 16, 14, 14, 12, 12, 10, 10, 9, 9, 9, 18, 19, 19, 19, 19, 16, 16, 14, 14,
- 12, 12, 10, 10, 9, 9, 9, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10,
- 10, 9, 9, 8, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10, 10, 9, 9,
- 8, 14, 16, 16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 14, 16,
- 16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 13, 14, 14, 15, 15,
- 13, 13, 11, 11, 10, 10, 9, 9, 8, 8, 7, 13, 14, 14, 15, 15, 13, 13, 11,
- 11, 10, 10, 9, 9, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10,
- 8, 8, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10, 8, 8, 8, 8,
- 7, 12, 13, 13, 13, 13, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 7,
- /* Size 32x16 */
32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 23, 21,
21, 19, 19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 33, 32, 32, 32,
32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 25, 23, 23, 21, 21, 19,
@@ -8861,32 +8832,46 @@
15, 14, 14, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8,
8, 8, 8, 7, 12, 12, 12, 13, 13, 13, 13, 14, 14, 13, 13, 12, 12, 12, 12,
11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 7, 7,
+ /* Size 32x16 */
+ 32, 33, 33, 32, 32, 28, 28, 23, 23, 19, 19, 16, 16, 13, 13, 12, 33, 32,
+ 32, 32, 32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 32,
+ 32, 29, 29, 24, 24, 20, 20, 17, 17, 14, 14, 12, 33, 32, 32, 31, 31, 30,
+ 30, 25, 25, 21, 21, 17, 17, 14, 14, 13, 33, 32, 32, 31, 31, 30, 30, 25,
+ 25, 21, 21, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20,
+ 20, 17, 17, 14, 14, 13, 32, 32, 32, 30, 30, 28, 28, 24, 24, 20, 20, 17,
+ 17, 14, 14, 13, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15,
+ 15, 14, 32, 31, 31, 29, 29, 27, 27, 24, 24, 21, 21, 18, 18, 15, 15, 14,
+ 30, 30, 30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 30, 30,
+ 30, 28, 28, 24, 24, 21, 21, 19, 19, 16, 16, 14, 14, 13, 28, 30, 30, 27,
+ 27, 21, 21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 28, 30, 30, 27, 27, 21,
+ 21, 19, 19, 17, 17, 15, 15, 13, 13, 12, 26, 28, 28, 26, 26, 20, 20, 18,
+ 18, 16, 16, 14, 14, 12, 12, 12, 26, 28, 28, 26, 26, 20, 20, 18, 18, 16,
+ 16, 14, 14, 12, 12, 12, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13,
+ 13, 11, 11, 11, 23, 25, 25, 24, 24, 19, 19, 16, 16, 14, 14, 13, 13, 11,
+ 11, 11, 21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10,
+ 21, 23, 23, 22, 22, 18, 18, 15, 15, 13, 13, 12, 12, 11, 11, 10, 19, 21,
+ 21, 20, 20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 19, 21, 21, 20,
+ 20, 17, 17, 14, 14, 12, 12, 11, 11, 10, 10, 9, 18, 19, 19, 19, 19, 16,
+ 16, 14, 14, 12, 12, 10, 10, 9, 9, 9, 18, 19, 19, 19, 19, 16, 16, 14, 14,
+ 12, 12, 10, 10, 9, 9, 9, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10,
+ 10, 9, 9, 8, 16, 17, 17, 18, 18, 15, 15, 13, 13, 11, 11, 10, 10, 9, 9,
+ 8, 14, 16, 16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 14, 16,
+ 16, 16, 16, 14, 14, 12, 12, 11, 11, 9, 9, 8, 8, 8, 13, 14, 14, 15, 15,
+ 13, 13, 11, 11, 10, 10, 9, 9, 8, 8, 7, 13, 14, 14, 15, 15, 13, 13, 11,
+ 11, 10, 10, 9, 9, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10,
+ 8, 8, 8, 8, 7, 12, 14, 14, 14, 14, 13, 13, 11, 11, 10, 10, 8, 8, 8, 8,
+ 7, 12, 13, 13, 13, 13, 12, 12, 11, 11, 9, 9, 8, 8, 7, 7, 7,
/* Size 4x16 */
- 33, 28, 19, 13, 32, 29, 20, 14, 32, 30, 21, 14, 32, 28, 20, 14, 31, 27,
- 21, 15, 30, 24, 19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11,
- 23, 18, 13, 11, 21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 9, 16, 14,
- 11, 8, 14, 13, 10, 8, 14, 13, 10, 8,
- /* Size 16x4 */
33, 32, 32, 32, 31, 30, 30, 28, 25, 23, 21, 19, 17, 16, 14, 14, 28, 29,
30, 28, 27, 24, 21, 20, 19, 18, 17, 16, 15, 14, 13, 13, 19, 20, 21, 20,
21, 19, 17, 16, 14, 13, 12, 12, 11, 11, 10, 10, 13, 14, 14, 14, 15, 14,
13, 12, 11, 11, 10, 9, 9, 8, 8, 8,
+ /* Size 16x4 */
+ 33, 28, 19, 13, 32, 29, 20, 14, 32, 30, 21, 14, 32, 28, 20, 14, 31, 27,
+ 21, 15, 30, 24, 19, 14, 30, 21, 17, 13, 28, 20, 16, 12, 25, 19, 14, 11,
+ 23, 18, 13, 11, 21, 17, 12, 10, 19, 16, 12, 9, 17, 15, 11, 9, 16, 14,
+ 11, 8, 14, 13, 10, 8, 14, 13, 10, 8,
/* Size 8x32 */
- 32, 33, 32, 28, 23, 19, 16, 13, 33, 32, 32, 29, 24, 20, 17, 14, 33, 32,
- 32, 29, 24, 20, 17, 14, 33, 32, 31, 30, 25, 21, 17, 14, 33, 32, 31, 30,
- 25, 21, 17, 14, 32, 32, 30, 28, 24, 20, 17, 14, 32, 32, 30, 28, 24, 20,
- 17, 14, 32, 31, 29, 27, 24, 21, 18, 15, 32, 31, 29, 27, 24, 21, 18, 15,
- 30, 30, 28, 24, 21, 19, 16, 14, 30, 30, 28, 24, 21, 19, 16, 14, 28, 30,
- 27, 21, 19, 17, 15, 13, 28, 30, 27, 21, 19, 17, 15, 13, 26, 28, 26, 20,
- 18, 16, 14, 12, 26, 28, 26, 20, 18, 16, 14, 12, 23, 25, 24, 19, 16, 14,
- 13, 11, 23, 25, 24, 19, 16, 14, 13, 11, 21, 23, 22, 18, 15, 13, 12, 11,
- 21, 23, 22, 18, 15, 13, 12, 11, 19, 21, 20, 17, 14, 12, 11, 10, 19, 21,
- 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 10, 9, 18, 19, 19, 16,
- 14, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9, 16, 17, 18, 15, 13, 11,
- 10, 9, 14, 16, 16, 14, 12, 11, 9, 8, 14, 16, 16, 14, 12, 11, 9, 8, 13,
- 14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10, 9, 8, 12, 14, 14, 13,
- 11, 10, 8, 8, 12, 14, 14, 13, 11, 10, 8, 8, 12, 13, 13, 12, 11, 9, 8, 7,
- /* Size 32x8 */
32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 23, 21,
21, 19, 19, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12, 33, 32, 32, 32,
32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 25, 23, 23, 21, 21, 19,
@@ -8900,7 +8885,23 @@
12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 9, 16, 17, 17, 17, 17, 17,
17, 18, 18, 16, 16, 15, 15, 14, 14, 13, 13, 12, 12, 11, 11, 10, 10, 10,
10, 9, 9, 9, 9, 8, 8, 8, 13, 14, 14, 14, 14, 14, 14, 15, 15, 14, 14, 13,
- 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 7 },
+ 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 7,
+ /* Size 32x8 */
+ 32, 33, 32, 28, 23, 19, 16, 13, 33, 32, 32, 29, 24, 20, 17, 14, 33, 32,
+ 32, 29, 24, 20, 17, 14, 33, 32, 31, 30, 25, 21, 17, 14, 33, 32, 31, 30,
+ 25, 21, 17, 14, 32, 32, 30, 28, 24, 20, 17, 14, 32, 32, 30, 28, 24, 20,
+ 17, 14, 32, 31, 29, 27, 24, 21, 18, 15, 32, 31, 29, 27, 24, 21, 18, 15,
+ 30, 30, 28, 24, 21, 19, 16, 14, 30, 30, 28, 24, 21, 19, 16, 14, 28, 30,
+ 27, 21, 19, 17, 15, 13, 28, 30, 27, 21, 19, 17, 15, 13, 26, 28, 26, 20,
+ 18, 16, 14, 12, 26, 28, 26, 20, 18, 16, 14, 12, 23, 25, 24, 19, 16, 14,
+ 13, 11, 23, 25, 24, 19, 16, 14, 13, 11, 21, 23, 22, 18, 15, 13, 12, 11,
+ 21, 23, 22, 18, 15, 13, 12, 11, 19, 21, 20, 17, 14, 12, 11, 10, 19, 21,
+ 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 10, 9, 18, 19, 19, 16,
+ 14, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9, 16, 17, 18, 15, 13, 11,
+ 10, 9, 14, 16, 16, 14, 12, 11, 9, 8, 14, 16, 16, 14, 12, 11, 9, 8, 13,
+ 14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10, 9, 8, 12, 14, 14, 13,
+ 11, 10, 8, 8, 12, 14, 14, 13, 11, 10, 8, 8, 12, 13, 13, 12, 11, 9, 8,
+ 7 },
{ /* Chroma */
/* Size 4x4 */
32, 22, 22, 18, 22, 19, 19, 17, 22, 19, 16, 14, 18, 17, 14, 12,
@@ -8984,21 +8985,12 @@
11, 11, 15, 16, 16, 17, 17, 17, 17, 18, 18, 17, 17, 17, 17, 16, 16, 15,
15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11,
/* Size 4x8 */
- 33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 22, 19, 18, 16, 22, 19,
- 16, 14, 20, 19, 15, 13, 19, 18, 14, 12, 17, 17, 14, 11,
- /* Size 8x4 */
33, 28, 24, 22, 22, 20, 19, 17, 22, 22, 20, 19, 19, 19, 18, 17, 20, 22,
20, 18, 16, 15, 14, 14, 17, 18, 18, 16, 14, 13, 12, 11,
+ /* Size 8x4 */
+ 33, 22, 20, 17, 28, 22, 22, 18, 24, 20, 20, 18, 22, 19, 18, 16, 22, 19,
+ 16, 14, 20, 19, 15, 13, 19, 18, 14, 12, 17, 17, 14, 11,
/* Size 8x16 */
- 32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 34, 32,
- 26, 22, 23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 28, 26, 22, 22,
- 23, 22, 20, 19, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22, 21, 19, 19, 19,
- 18, 17, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17, 16, 15,
- 20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 19, 20,
- 21, 19, 17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 17, 19, 20, 18,
- 16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17, 18, 17, 15, 14,
- 12, 11,
- /* Size 16x8 */
32, 33, 34, 31, 28, 24, 21, 21, 21, 20, 20, 19, 18, 17, 16, 16, 33, 33,
32, 28, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 28, 27, 26, 24,
22, 22, 21, 22, 22, 22, 22, 21, 20, 20, 19, 18, 21, 22, 22, 22, 22, 20,
@@ -9007,37 +8999,16 @@
16, 15, 15, 14, 14, 14, 18, 19, 20, 20, 20, 19, 18, 17, 16, 15, 14, 14,
13, 13, 12, 12, 16, 17, 18, 18, 19, 18, 17, 16, 15, 14, 14, 13, 12, 12,
12, 11,
+ /* Size 16x8 */
+ 32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 34, 32,
+ 26, 22, 23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 28, 26, 22, 22,
+ 23, 22, 20, 19, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22, 21, 19, 19, 19,
+ 18, 17, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17, 16, 15,
+ 20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 19, 20,
+ 21, 19, 17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 17, 19, 20, 18,
+ 16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17, 18, 17, 15, 14,
+ 12, 11,
/* Size 16x32 */
- 32, 33, 33, 28, 28, 21, 21, 21, 21, 20, 20, 18, 18, 16, 16, 16, 33, 33,
- 33, 27, 27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 33, 33, 33, 27,
- 27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 34, 32, 32, 26, 26, 22,
- 22, 23, 23, 21, 21, 20, 20, 18, 18, 17, 34, 32, 32, 26, 26, 22, 22, 23,
- 23, 21, 21, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22,
- 22, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22, 22, 20,
- 20, 18, 18, 17, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19,
- 19, 18, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19, 19, 18,
- 24, 24, 24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 24, 24,
- 24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 21, 22, 22, 21,
- 21, 19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 21, 21, 19,
- 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 22, 22, 19, 19, 18,
- 18, 18, 18, 17, 17, 16, 16, 16, 21, 22, 22, 22, 22, 19, 19, 18, 18, 18,
- 18, 17, 17, 16, 16, 16, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16,
- 16, 15, 15, 15, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16, 16, 15,
- 15, 15, 20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14,
- 20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14, 20, 21,
- 21, 22, 22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 20, 21, 21, 22,
- 22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 19, 20, 20, 21, 21, 19,
- 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 19, 20, 20, 21, 21, 19, 19, 17,
- 17, 15, 15, 14, 14, 13, 13, 13, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15,
- 15, 13, 13, 12, 12, 12, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15, 15, 13,
- 13, 12, 12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12,
- 12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12,
- 16, 18, 18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 18,
- 18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 17, 17, 18,
- 18, 17, 17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 17,
- 17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 16, 16, 15,
- 15, 13, 13, 12, 12, 11, 11, 11,
- /* Size 32x16 */
32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 21, 20,
20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 33, 33, 33, 32,
32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20,
@@ -9067,33 +9038,47 @@
14, 13, 13, 12, 12, 12, 12, 12, 12, 11, 11, 11, 16, 16, 16, 17, 17, 17,
17, 18, 18, 17, 17, 17, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 13, 12,
12, 12, 12, 11, 11, 11, 11, 11,
+ /* Size 32x16 */
+ 32, 33, 33, 28, 28, 21, 21, 21, 21, 20, 20, 18, 18, 16, 16, 16, 33, 33,
+ 33, 27, 27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 33, 33, 33, 27,
+ 27, 22, 22, 22, 22, 20, 20, 19, 19, 17, 17, 16, 34, 32, 32, 26, 26, 22,
+ 22, 23, 23, 21, 21, 20, 20, 18, 18, 17, 34, 32, 32, 26, 26, 22, 22, 23,
+ 23, 21, 21, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22,
+ 22, 20, 20, 18, 18, 17, 31, 28, 28, 24, 24, 22, 22, 22, 22, 22, 22, 20,
+ 20, 18, 18, 17, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19,
+ 19, 18, 28, 26, 26, 22, 22, 22, 22, 23, 23, 22, 22, 20, 20, 19, 19, 18,
+ 24, 24, 24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 24, 24,
+ 24, 22, 22, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 17, 21, 22, 22, 21,
+ 21, 19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 21, 21, 19,
+ 19, 19, 19, 19, 19, 18, 18, 17, 17, 17, 21, 22, 22, 22, 22, 19, 19, 18,
+ 18, 18, 18, 17, 17, 16, 16, 16, 21, 22, 22, 22, 22, 19, 19, 18, 18, 18,
+ 18, 17, 17, 16, 16, 16, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16,
+ 16, 15, 15, 15, 21, 23, 23, 22, 22, 19, 19, 18, 18, 17, 17, 16, 16, 15,
+ 15, 15, 20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14,
+ 20, 22, 22, 22, 22, 19, 19, 17, 17, 16, 16, 15, 15, 14, 14, 14, 20, 21,
+ 21, 22, 22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 20, 21, 21, 22,
+ 22, 19, 19, 17, 17, 16, 16, 14, 14, 14, 14, 13, 19, 20, 20, 21, 21, 19,
+ 19, 17, 17, 15, 15, 14, 14, 13, 13, 13, 19, 20, 20, 21, 21, 19, 19, 17,
+ 17, 15, 15, 14, 14, 13, 13, 13, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15,
+ 15, 13, 13, 12, 12, 12, 18, 20, 20, 20, 20, 18, 18, 16, 16, 15, 15, 13,
+ 13, 12, 12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12,
+ 12, 12, 17, 19, 19, 20, 20, 18, 18, 16, 16, 14, 14, 13, 13, 12, 12, 12,
+ 16, 18, 18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 18,
+ 18, 19, 19, 17, 17, 15, 15, 14, 14, 12, 12, 12, 12, 11, 16, 17, 17, 18,
+ 18, 17, 17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 17,
+ 17, 15, 15, 14, 14, 12, 12, 11, 11, 11, 16, 17, 17, 18, 18, 16, 16, 15,
+ 15, 13, 13, 12, 12, 11, 11, 11,
/* Size 4x16 */
- 33, 21, 20, 16, 33, 22, 20, 17, 32, 22, 21, 18, 28, 22, 22, 18, 26, 22,
- 22, 19, 24, 20, 20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15,
- 22, 19, 16, 14, 21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 15, 12, 19, 18,
- 14, 12, 18, 17, 14, 12, 17, 17, 14, 11,
- /* Size 16x4 */
33, 33, 32, 28, 26, 24, 22, 22, 23, 22, 21, 20, 20, 19, 18, 17, 21, 22,
22, 22, 22, 20, 19, 19, 19, 19, 19, 19, 18, 18, 17, 17, 20, 20, 21, 22,
22, 20, 19, 18, 17, 16, 16, 15, 15, 14, 14, 14, 16, 17, 18, 18, 19, 18,
17, 16, 15, 14, 14, 13, 12, 12, 12, 11,
+ /* Size 16x4 */
+ 33, 21, 20, 16, 33, 22, 20, 17, 32, 22, 21, 18, 28, 22, 22, 18, 26, 22,
+ 22, 19, 24, 20, 20, 18, 22, 19, 19, 17, 22, 19, 18, 16, 23, 19, 17, 15,
+ 22, 19, 16, 14, 21, 19, 16, 14, 20, 19, 15, 13, 20, 18, 15, 12, 19, 18,
+ 14, 12, 18, 17, 14, 12, 17, 17, 14, 11,
/* Size 8x32 */
- 32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 33, 33,
- 27, 22, 22, 20, 19, 17, 34, 32, 26, 22, 23, 21, 20, 18, 34, 32, 26, 22,
- 23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 31, 28, 24, 22, 22, 22,
- 20, 18, 28, 26, 22, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 20, 19,
- 24, 24, 22, 20, 21, 20, 19, 18, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22,
- 21, 19, 19, 19, 18, 17, 21, 22, 21, 19, 19, 19, 18, 17, 21, 22, 22, 19,
- 18, 18, 17, 16, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17,
- 16, 15, 21, 23, 22, 19, 18, 17, 16, 15, 20, 22, 22, 19, 17, 16, 15, 14,
- 20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 20, 21,
- 22, 19, 17, 16, 14, 14, 19, 20, 21, 19, 17, 15, 14, 13, 19, 20, 21, 19,
- 17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 18, 20, 20, 18, 16, 15,
- 13, 12, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19, 20, 18, 16, 14, 13, 12,
- 16, 18, 19, 17, 15, 14, 12, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17,
- 18, 17, 15, 14, 12, 11, 16, 17, 18, 17, 15, 14, 12, 11, 16, 17, 18, 16,
- 15, 13, 12, 11,
- /* Size 32x8 */
32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 21, 20,
20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 16, 33, 33, 33, 32,
32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22, 21, 21, 20,
@@ -9108,7 +9093,23 @@
20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13,
13, 13, 13, 12, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 18,
18, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13, 13, 12, 12, 12, 12, 12,
- 12, 11, 11, 11 },
+ 12, 11, 11, 11,
+ /* Size 32x8 */
+ 32, 33, 28, 21, 21, 20, 18, 16, 33, 33, 27, 22, 22, 20, 19, 17, 33, 33,
+ 27, 22, 22, 20, 19, 17, 34, 32, 26, 22, 23, 21, 20, 18, 34, 32, 26, 22,
+ 23, 21, 20, 18, 31, 28, 24, 22, 22, 22, 20, 18, 31, 28, 24, 22, 22, 22,
+ 20, 18, 28, 26, 22, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 20, 19,
+ 24, 24, 22, 20, 21, 20, 19, 18, 24, 24, 22, 20, 21, 20, 19, 18, 21, 22,
+ 21, 19, 19, 19, 18, 17, 21, 22, 21, 19, 19, 19, 18, 17, 21, 22, 22, 19,
+ 18, 18, 17, 16, 21, 22, 22, 19, 18, 18, 17, 16, 21, 23, 22, 19, 18, 17,
+ 16, 15, 21, 23, 22, 19, 18, 17, 16, 15, 20, 22, 22, 19, 17, 16, 15, 14,
+ 20, 22, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19, 17, 16, 14, 14, 20, 21,
+ 22, 19, 17, 16, 14, 14, 19, 20, 21, 19, 17, 15, 14, 13, 19, 20, 21, 19,
+ 17, 15, 14, 13, 18, 20, 20, 18, 16, 15, 13, 12, 18, 20, 20, 18, 16, 15,
+ 13, 12, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19, 20, 18, 16, 14, 13, 12,
+ 16, 18, 19, 17, 15, 14, 12, 12, 16, 18, 19, 17, 15, 14, 12, 12, 16, 17,
+ 18, 17, 15, 14, 12, 11, 16, 17, 18, 17, 15, 14, 12, 11, 16, 17, 18, 16,
+ 15, 13, 12, 11 },
},
{
{ /* Luma */
@@ -9194,21 +9195,12 @@
14, 15, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 11, 11, 10, 10, 9,
9, 9, 9, 9, 8, 8, 8, 8,
/* Size 4x8 */
- 32, 30, 24, 17, 32, 30, 24, 17, 31, 28, 23, 18, 29, 24, 19, 15, 25, 21,
- 16, 13, 21, 19, 14, 11, 18, 17, 13, 10, 16, 15, 12, 9,
- /* Size 8x4 */
32, 32, 31, 29, 25, 21, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 24, 24,
23, 19, 16, 14, 13, 12, 17, 17, 18, 15, 13, 11, 10, 9,
+ /* Size 8x4 */
+ 32, 30, 24, 17, 32, 30, 24, 17, 31, 28, 23, 18, 29, 24, 19, 15, 25, 21,
+ 16, 13, 21, 19, 14, 11, 18, 17, 13, 10, 16, 15, 12, 9,
/* Size 8x16 */
- 32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
- 31, 30, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20, 18, 16, 32, 31, 29, 27,
- 24, 21, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17,
- 16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 25, 26, 25, 20, 17, 15, 14, 13,
- 23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 19, 21,
- 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 11, 10, 16, 17, 18, 15,
- 13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 13, 14, 15, 13, 11, 10, 9,
- 8,
- /* Size 16x8 */
32, 33, 33, 32, 32, 30, 29, 27, 25, 23, 21, 19, 18, 16, 14, 13, 33, 32,
32, 32, 31, 30, 30, 28, 26, 24, 23, 21, 19, 17, 16, 14, 32, 32, 31, 30,
29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 16, 15, 28, 29, 30, 28, 27, 24,
@@ -9217,37 +9209,16 @@
13, 12, 12, 11, 11, 10, 17, 17, 18, 18, 18, 17, 16, 15, 14, 13, 12, 11,
11, 10, 9, 9, 14, 15, 16, 16, 16, 15, 14, 13, 13, 12, 11, 10, 10, 9, 9,
8,
+ /* Size 16x8 */
+ 32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
+ 31, 30, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20, 18, 16, 32, 31, 29, 27,
+ 24, 21, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17,
+ 16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 25, 26, 25, 20, 17, 15, 14, 13,
+ 23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 19, 21,
+ 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12, 11, 10, 16, 17, 18, 15,
+ 13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 13, 14, 15, 13, 11, 10, 9,
+ 8,
/* Size 16x32 */
- 32, 33, 33, 32, 32, 30, 28, 27, 23, 23, 19, 19, 17, 16, 14, 13, 33, 32,
- 32, 32, 32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32,
- 32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32, 32, 31,
- 29, 28, 25, 24, 20, 20, 18, 17, 15, 14, 33, 32, 32, 32, 31, 31, 30, 28,
- 25, 25, 21, 21, 18, 17, 16, 14, 33, 32, 32, 31, 31, 30, 29, 28, 25, 24,
- 21, 21, 18, 17, 16, 14, 32, 32, 32, 31, 30, 29, 28, 27, 24, 24, 20, 20,
- 18, 17, 16, 14, 32, 32, 32, 30, 30, 29, 28, 27, 24, 24, 21, 21, 18, 17,
- 16, 15, 32, 32, 31, 30, 29, 28, 27, 26, 24, 24, 21, 21, 18, 18, 16, 15,
- 32, 31, 31, 30, 29, 28, 26, 26, 24, 23, 20, 20, 18, 18, 16, 15, 30, 30,
- 30, 28, 28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 30, 30, 30, 28,
- 28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 29, 30, 30, 28, 27, 24,
- 22, 21, 20, 19, 17, 17, 16, 15, 14, 13, 28, 29, 30, 28, 27, 24, 21, 21,
- 19, 19, 17, 17, 16, 15, 14, 13, 27, 28, 28, 27, 26, 23, 21, 20, 18, 18,
- 16, 16, 15, 14, 13, 13, 26, 27, 28, 26, 26, 23, 20, 20, 18, 18, 16, 16,
- 14, 14, 13, 12, 25, 26, 26, 25, 25, 22, 20, 19, 17, 17, 15, 15, 14, 13,
- 13, 12, 23, 25, 25, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11,
- 23, 24, 24, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11, 21, 23,
- 23, 22, 22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 21, 23, 23, 22,
- 22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 19, 21, 21, 21, 21, 19,
- 17, 17, 14, 14, 13, 13, 12, 11, 10, 10, 19, 20, 21, 20, 20, 19, 17, 16,
- 14, 14, 12, 12, 11, 11, 10, 10, 18, 19, 20, 20, 20, 18, 17, 16, 14, 14,
- 12, 12, 11, 11, 10, 9, 18, 19, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12,
- 11, 10, 10, 9, 17, 18, 18, 18, 18, 17, 16, 15, 13, 13, 12, 12, 10, 10,
- 9, 9, 16, 17, 17, 17, 18, 16, 15, 14, 13, 13, 11, 11, 10, 10, 9, 9, 15,
- 17, 17, 17, 17, 16, 15, 14, 13, 12, 11, 11, 10, 10, 9, 9, 14, 16, 16,
- 16, 16, 15, 14, 13, 12, 12, 11, 11, 9, 9, 9, 8, 14, 16, 16, 16, 16, 15,
- 14, 13, 12, 12, 10, 10, 9, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11,
- 11, 10, 10, 9, 9, 8, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 10,
- 9, 9, 8, 8,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 29, 28, 27, 26, 25, 23,
23, 21, 21, 19, 19, 18, 18, 17, 16, 15, 14, 14, 13, 13, 33, 32, 32, 32,
32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 28, 27, 26, 25, 24, 23, 23, 21,
@@ -9277,33 +9248,47 @@
10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15,
14, 14, 13, 13, 13, 12, 12, 11, 11, 11, 11, 10, 10, 9, 9, 9, 9, 9, 8, 8,
8, 8,
+ /* Size 32x16 */
+ 32, 33, 33, 32, 32, 30, 28, 27, 23, 23, 19, 19, 17, 16, 14, 13, 33, 32,
+ 32, 32, 32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32,
+ 32, 30, 29, 28, 24, 24, 20, 20, 17, 17, 15, 14, 33, 32, 32, 32, 32, 31,
+ 29, 28, 25, 24, 20, 20, 18, 17, 15, 14, 33, 32, 32, 32, 31, 31, 30, 28,
+ 25, 25, 21, 21, 18, 17, 16, 14, 33, 32, 32, 31, 31, 30, 29, 28, 25, 24,
+ 21, 21, 18, 17, 16, 14, 32, 32, 32, 31, 30, 29, 28, 27, 24, 24, 20, 20,
+ 18, 17, 16, 14, 32, 32, 32, 30, 30, 29, 28, 27, 24, 24, 21, 21, 18, 17,
+ 16, 15, 32, 32, 31, 30, 29, 28, 27, 26, 24, 24, 21, 21, 18, 18, 16, 15,
+ 32, 31, 31, 30, 29, 28, 26, 26, 24, 23, 20, 20, 18, 18, 16, 15, 30, 30,
+ 30, 28, 28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 30, 30, 30, 28,
+ 28, 26, 24, 23, 21, 21, 19, 19, 17, 16, 15, 14, 29, 30, 30, 28, 27, 24,
+ 22, 21, 20, 19, 17, 17, 16, 15, 14, 13, 28, 29, 30, 28, 27, 24, 21, 21,
+ 19, 19, 17, 17, 16, 15, 14, 13, 27, 28, 28, 27, 26, 23, 21, 20, 18, 18,
+ 16, 16, 15, 14, 13, 13, 26, 27, 28, 26, 26, 23, 20, 20, 18, 18, 16, 16,
+ 14, 14, 13, 12, 25, 26, 26, 25, 25, 22, 20, 19, 17, 17, 15, 15, 14, 13,
+ 13, 12, 23, 25, 25, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11,
+ 23, 24, 24, 24, 24, 21, 19, 18, 16, 16, 14, 14, 13, 13, 12, 11, 21, 23,
+ 23, 22, 22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 21, 23, 23, 22,
+ 22, 20, 18, 17, 15, 15, 13, 13, 12, 12, 11, 11, 19, 21, 21, 21, 21, 19,
+ 17, 17, 14, 14, 13, 13, 12, 11, 10, 10, 19, 20, 21, 20, 20, 19, 17, 16,
+ 14, 14, 12, 12, 11, 11, 10, 10, 18, 19, 20, 20, 20, 18, 17, 16, 14, 14,
+ 12, 12, 11, 11, 10, 9, 18, 19, 19, 19, 19, 18, 16, 15, 14, 13, 12, 12,
+ 11, 10, 10, 9, 17, 18, 18, 18, 18, 17, 16, 15, 13, 13, 12, 12, 10, 10,
+ 9, 9, 16, 17, 17, 17, 18, 16, 15, 14, 13, 13, 11, 11, 10, 10, 9, 9, 15,
+ 17, 17, 17, 17, 16, 15, 14, 13, 12, 11, 11, 10, 10, 9, 9, 14, 16, 16,
+ 16, 16, 15, 14, 13, 12, 12, 11, 11, 9, 9, 9, 8, 14, 16, 16, 16, 16, 15,
+ 14, 13, 12, 12, 10, 10, 9, 9, 9, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11,
+ 11, 10, 10, 9, 9, 8, 8, 13, 14, 14, 14, 15, 14, 13, 12, 11, 11, 10, 10,
+ 9, 9, 8, 8,
/* Size 4x16 */
- 33, 30, 23, 16, 32, 30, 24, 17, 32, 31, 25, 17, 32, 29, 24, 17, 32, 28,
- 24, 18, 30, 26, 21, 16, 30, 24, 19, 15, 28, 23, 18, 14, 26, 22, 17, 13,
- 24, 21, 16, 13, 23, 20, 15, 12, 20, 19, 14, 11, 19, 18, 13, 10, 17, 16,
- 13, 10, 16, 15, 12, 9, 14, 14, 11, 9,
- /* Size 16x4 */
33, 32, 32, 32, 32, 30, 30, 28, 26, 24, 23, 20, 19, 17, 16, 14, 30, 30,
31, 29, 28, 26, 24, 23, 22, 21, 20, 19, 18, 16, 15, 14, 23, 24, 25, 24,
24, 21, 19, 18, 17, 16, 15, 14, 13, 13, 12, 11, 16, 17, 17, 17, 18, 16,
15, 14, 13, 13, 12, 11, 10, 10, 9, 9,
+ /* Size 16x4 */
+ 33, 30, 23, 16, 32, 30, 24, 17, 32, 31, 25, 17, 32, 29, 24, 17, 32, 28,
+ 24, 18, 30, 26, 21, 16, 30, 24, 19, 15, 28, 23, 18, 14, 26, 22, 17, 13,
+ 24, 21, 16, 13, 23, 20, 15, 12, 20, 19, 14, 11, 19, 18, 13, 10, 17, 16,
+ 13, 10, 16, 15, 12, 9, 14, 14, 11, 9,
/* Size 8x32 */
- 32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
- 32, 29, 24, 20, 17, 15, 33, 32, 32, 29, 25, 20, 18, 15, 33, 32, 31, 30,
- 25, 21, 18, 16, 33, 32, 31, 29, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20,
- 18, 16, 32, 32, 30, 28, 24, 21, 18, 16, 32, 31, 29, 27, 24, 21, 18, 16,
- 32, 31, 29, 26, 24, 20, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 30, 30,
- 28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17, 16, 14, 28, 30, 27, 21,
- 19, 17, 16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 26, 28, 26, 20, 18, 16,
- 14, 13, 25, 26, 25, 20, 17, 15, 14, 13, 23, 25, 24, 19, 16, 14, 13, 12,
- 23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 21, 23,
- 22, 18, 15, 13, 12, 11, 19, 21, 21, 17, 14, 13, 12, 10, 19, 21, 20, 17,
- 14, 12, 11, 10, 18, 20, 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12,
- 11, 10, 17, 18, 18, 16, 13, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9,
- 15, 17, 17, 15, 13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 14, 16, 16,
- 14, 12, 10, 9, 9, 13, 14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10,
- 9, 8,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 29, 28, 27, 26, 25, 23,
23, 21, 21, 19, 19, 18, 18, 17, 16, 15, 14, 14, 13, 13, 33, 32, 32, 32,
32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 24, 23, 23, 21,
@@ -9318,7 +9303,23 @@
18, 18, 18, 18, 17, 17, 16, 16, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11,
11, 10, 10, 10, 9, 9, 9, 9, 14, 15, 15, 15, 16, 16, 16, 16, 16, 16, 15,
15, 14, 14, 13, 13, 13, 12, 12, 11, 11, 10, 10, 10, 10, 9, 9, 9, 9, 9,
- 8, 8 },
+ 8, 8,
+ /* Size 32x8 */
+ 32, 33, 32, 28, 23, 19, 17, 14, 33, 32, 32, 29, 24, 20, 17, 15, 33, 32,
+ 32, 29, 24, 20, 17, 15, 33, 32, 32, 29, 25, 20, 18, 15, 33, 32, 31, 30,
+ 25, 21, 18, 16, 33, 32, 31, 29, 25, 21, 18, 16, 32, 32, 30, 28, 24, 20,
+ 18, 16, 32, 32, 30, 28, 24, 21, 18, 16, 32, 31, 29, 27, 24, 21, 18, 16,
+ 32, 31, 29, 26, 24, 20, 18, 16, 30, 30, 28, 24, 21, 19, 17, 15, 30, 30,
+ 28, 24, 21, 19, 17, 15, 29, 30, 27, 22, 20, 17, 16, 14, 28, 30, 27, 21,
+ 19, 17, 16, 14, 27, 28, 26, 21, 18, 16, 15, 13, 26, 28, 26, 20, 18, 16,
+ 14, 13, 25, 26, 25, 20, 17, 15, 14, 13, 23, 25, 24, 19, 16, 14, 13, 12,
+ 23, 24, 24, 19, 16, 14, 13, 12, 21, 23, 22, 18, 15, 13, 12, 11, 21, 23,
+ 22, 18, 15, 13, 12, 11, 19, 21, 21, 17, 14, 13, 12, 10, 19, 21, 20, 17,
+ 14, 12, 11, 10, 18, 20, 20, 17, 14, 12, 11, 10, 18, 19, 19, 16, 14, 12,
+ 11, 10, 17, 18, 18, 16, 13, 12, 10, 9, 16, 17, 18, 15, 13, 11, 10, 9,
+ 15, 17, 17, 15, 13, 11, 10, 9, 14, 16, 16, 14, 12, 11, 9, 9, 14, 16, 16,
+ 14, 12, 10, 9, 9, 13, 14, 15, 13, 11, 10, 9, 8, 13, 14, 15, 13, 11, 10,
+ 9, 8 },
{ /* Chroma */
/* Size 4x4 */
33, 24, 22, 19, 24, 21, 20, 19, 22, 20, 17, 15, 19, 19, 15, 13,
@@ -9402,21 +9403,12 @@
12, 12, 16, 17, 17, 18, 18, 18, 18, 19, 19, 19, 18, 18, 17, 17, 17, 16,
16, 15, 15, 14, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 12, 12,
/* Size 4x8 */
- 33, 24, 22, 19, 31, 23, 23, 20, 26, 22, 22, 20, 22, 20, 19, 18, 23, 21,
- 17, 16, 21, 20, 17, 15, 20, 20, 16, 14, 19, 19, 16, 13,
- /* Size 8x4 */
33, 31, 26, 22, 23, 21, 20, 19, 24, 23, 22, 20, 21, 20, 20, 19, 22, 23,
22, 19, 17, 17, 16, 16, 19, 20, 20, 18, 16, 15, 14, 13,
+ /* Size 8x4 */
+ 33, 24, 22, 19, 31, 23, 23, 20, 26, 22, 22, 20, 22, 20, 19, 18, 23, 21,
+ 17, 16, 21, 20, 17, 15, 20, 20, 16, 14, 19, 19, 16, 13,
/* Size 8x16 */
- 32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 34, 32,
- 26, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22, 20, 19, 28, 26, 22, 22,
- 23, 22, 21, 20, 24, 24, 22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19,
- 19, 18, 21, 22, 22, 19, 19, 18, 18, 17, 21, 23, 22, 19, 18, 17, 17, 16,
- 21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 21,
- 22, 19, 17, 16, 15, 14, 19, 20, 21, 19, 17, 15, 14, 13, 18, 20, 20, 18,
- 16, 15, 14, 13, 17, 19, 20, 18, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14,
- 13, 12,
- /* Size 16x8 */
32, 33, 34, 31, 28, 24, 22, 21, 21, 21, 20, 20, 19, 18, 17, 16, 33, 33,
32, 28, 26, 24, 22, 22, 23, 23, 22, 21, 20, 20, 19, 18, 28, 27, 26, 24,
22, 22, 21, 22, 22, 22, 22, 22, 21, 20, 20, 19, 21, 22, 22, 22, 22, 20,
@@ -9425,37 +9417,16 @@
16, 16, 15, 15, 14, 14, 18, 19, 20, 20, 21, 19, 19, 18, 17, 16, 16, 15,
14, 14, 13, 13, 17, 18, 19, 19, 20, 18, 18, 17, 16, 16, 15, 14, 13, 13,
12, 12,
+ /* Size 16x8 */
+ 32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 34, 32,
+ 26, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22, 20, 19, 28, 26, 22, 22,
+ 23, 22, 21, 20, 24, 24, 22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19,
+ 19, 18, 21, 22, 22, 19, 19, 18, 18, 17, 21, 23, 22, 19, 18, 17, 17, 16,
+ 21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 21,
+ 22, 19, 17, 16, 15, 14, 19, 20, 21, 19, 17, 15, 14, 13, 18, 20, 20, 18,
+ 16, 15, 14, 13, 17, 19, 20, 18, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14,
+ 13, 12,
/* Size 16x32 */
- 32, 33, 33, 29, 28, 24, 21, 21, 21, 21, 20, 20, 18, 18, 17, 16, 33, 33,
- 33, 28, 27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 33, 33, 33, 28,
- 27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 34, 32, 32, 28, 26, 24,
- 22, 22, 22, 22, 21, 21, 20, 20, 18, 18, 34, 32, 32, 28, 26, 24, 22, 22,
- 23, 23, 21, 21, 20, 20, 19, 18, 32, 31, 30, 26, 25, 23, 22, 22, 23, 23,
- 21, 21, 20, 20, 19, 18, 31, 29, 28, 26, 24, 23, 22, 22, 22, 22, 22, 22,
- 20, 20, 19, 18, 30, 28, 28, 24, 23, 23, 22, 22, 23, 22, 22, 22, 20, 20,
- 19, 19, 28, 26, 26, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 20, 20, 19,
- 28, 26, 26, 23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 24, 24,
- 24, 22, 22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 24, 24, 24, 22,
- 22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 22, 22, 22, 22, 21, 20,
- 20, 20, 19, 19, 19, 19, 19, 18, 18, 17, 21, 22, 22, 22, 21, 20, 19, 19,
- 19, 19, 19, 19, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 19, 19,
- 18, 18, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18,
- 17, 17, 17, 16, 21, 22, 23, 22, 22, 21, 19, 19, 18, 18, 17, 17, 17, 17,
- 16, 16, 21, 23, 23, 23, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15,
- 21, 22, 23, 22, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15, 20, 22,
- 22, 22, 22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 22, 22, 22,
- 22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 21, 21, 22, 22, 20,
- 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 20, 21, 21, 22, 22, 20, 19, 18,
- 17, 17, 16, 16, 15, 14, 14, 14, 19, 20, 21, 21, 21, 20, 19, 18, 17, 17,
- 15, 15, 14, 14, 14, 13, 19, 20, 20, 21, 21, 20, 19, 18, 17, 16, 15, 15,
- 14, 14, 13, 13, 19, 20, 20, 20, 21, 20, 18, 18, 16, 16, 15, 15, 14, 14,
- 13, 13, 18, 20, 20, 20, 20, 19, 18, 18, 16, 16, 15, 15, 14, 13, 13, 12,
- 18, 19, 19, 20, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 13, 12, 17, 19,
- 19, 19, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 12, 12, 17, 19, 19, 19,
- 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 16, 18, 18, 18, 19, 18,
- 17, 17, 15, 15, 14, 14, 13, 12, 12, 12, 16, 18, 18, 18, 19, 18, 17, 17,
- 15, 15, 14, 14, 13, 12, 12, 12,
- /* Size 32x16 */
32, 33, 33, 34, 34, 32, 31, 30, 28, 28, 24, 24, 22, 21, 21, 21, 21, 21,
21, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 16, 16, 33, 33, 33, 32,
32, 31, 29, 28, 26, 26, 24, 24, 22, 22, 22, 22, 22, 23, 22, 22, 22, 21,
@@ -9485,33 +9456,47 @@
15, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 16, 17, 17, 18, 18, 18,
18, 19, 19, 19, 18, 18, 17, 17, 17, 16, 16, 15, 15, 14, 14, 14, 14, 13,
13, 13, 12, 12, 12, 12, 12, 12,
+ /* Size 32x16 */
+ 32, 33, 33, 29, 28, 24, 21, 21, 21, 21, 20, 20, 18, 18, 17, 16, 33, 33,
+ 33, 28, 27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 33, 33, 33, 28,
+ 27, 24, 22, 22, 22, 22, 20, 20, 19, 19, 18, 17, 34, 32, 32, 28, 26, 24,
+ 22, 22, 22, 22, 21, 21, 20, 20, 18, 18, 34, 32, 32, 28, 26, 24, 22, 22,
+ 23, 23, 21, 21, 20, 20, 19, 18, 32, 31, 30, 26, 25, 23, 22, 22, 23, 23,
+ 21, 21, 20, 20, 19, 18, 31, 29, 28, 26, 24, 23, 22, 22, 22, 22, 22, 22,
+ 20, 20, 19, 18, 30, 28, 28, 24, 23, 23, 22, 22, 23, 22, 22, 22, 20, 20,
+ 19, 19, 28, 26, 26, 23, 22, 22, 22, 22, 23, 22, 22, 22, 21, 20, 20, 19,
+ 28, 26, 26, 23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 24, 24,
+ 24, 22, 22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 24, 24, 24, 22,
+ 22, 21, 20, 20, 21, 21, 20, 20, 19, 19, 18, 18, 22, 22, 22, 22, 21, 20,
+ 20, 20, 19, 19, 19, 19, 19, 18, 18, 17, 21, 22, 22, 22, 21, 20, 19, 19,
+ 19, 19, 19, 19, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 19, 19,
+ 18, 18, 18, 18, 17, 17, 21, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18,
+ 17, 17, 17, 16, 21, 22, 23, 22, 22, 21, 19, 19, 18, 18, 17, 17, 17, 17,
+ 16, 16, 21, 23, 23, 23, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15,
+ 21, 22, 23, 22, 22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15, 20, 22,
+ 22, 22, 22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 22, 22, 22,
+ 22, 20, 19, 19, 17, 17, 16, 16, 16, 15, 15, 14, 20, 21, 21, 22, 22, 20,
+ 19, 18, 17, 17, 16, 16, 15, 15, 14, 14, 20, 21, 21, 22, 22, 20, 19, 18,
+ 17, 17, 16, 16, 15, 14, 14, 14, 19, 20, 21, 21, 21, 20, 19, 18, 17, 17,
+ 15, 15, 14, 14, 14, 13, 19, 20, 20, 21, 21, 20, 19, 18, 17, 16, 15, 15,
+ 14, 14, 13, 13, 19, 20, 20, 20, 21, 20, 18, 18, 16, 16, 15, 15, 14, 14,
+ 13, 13, 18, 20, 20, 20, 20, 19, 18, 18, 16, 16, 15, 15, 14, 13, 13, 12,
+ 18, 19, 19, 20, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 13, 12, 17, 19,
+ 19, 19, 20, 19, 18, 17, 16, 16, 14, 14, 13, 13, 12, 12, 17, 19, 19, 19,
+ 19, 19, 17, 17, 16, 16, 14, 14, 13, 13, 12, 12, 16, 18, 18, 18, 19, 18,
+ 17, 17, 15, 15, 14, 14, 13, 12, 12, 12, 16, 18, 18, 18, 19, 18, 17, 17,
+ 15, 15, 14, 14, 13, 12, 12, 12,
/* Size 4x16 */
- 33, 24, 21, 18, 33, 24, 22, 19, 32, 24, 23, 20, 29, 23, 22, 20, 26, 22,
- 22, 20, 24, 21, 21, 19, 22, 20, 19, 18, 22, 20, 19, 18, 22, 21, 18, 17,
- 22, 21, 17, 16, 22, 20, 17, 15, 21, 20, 17, 14, 20, 20, 16, 14, 20, 19,
- 16, 13, 19, 19, 16, 13, 18, 18, 15, 12,
- /* Size 16x4 */
33, 33, 32, 29, 26, 24, 22, 22, 22, 22, 22, 21, 20, 20, 19, 18, 24, 24,
24, 23, 22, 21, 20, 20, 21, 21, 20, 20, 20, 19, 19, 18, 21, 22, 23, 22,
22, 21, 19, 19, 18, 17, 17, 17, 16, 16, 16, 15, 18, 19, 20, 20, 20, 19,
18, 18, 17, 16, 15, 14, 14, 13, 13, 12,
+ /* Size 16x4 */
+ 33, 24, 21, 18, 33, 24, 22, 19, 32, 24, 23, 20, 29, 23, 22, 20, 26, 22,
+ 22, 20, 24, 21, 21, 19, 22, 20, 19, 18, 22, 20, 19, 18, 22, 21, 18, 17,
+ 22, 21, 17, 16, 22, 20, 17, 15, 21, 20, 17, 14, 20, 20, 16, 14, 20, 19,
+ 16, 13, 19, 19, 16, 13, 18, 18, 15, 12,
/* Size 8x32 */
- 32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 33, 33,
- 27, 22, 22, 20, 19, 18, 34, 32, 26, 22, 22, 21, 20, 18, 34, 32, 26, 22,
- 23, 21, 20, 19, 32, 30, 25, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22,
- 20, 19, 30, 28, 23, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 21, 20,
- 28, 26, 22, 21, 22, 22, 21, 19, 24, 24, 22, 20, 21, 20, 19, 18, 24, 24,
- 22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 19,
- 19, 19, 18, 17, 21, 22, 22, 19, 19, 18, 18, 17, 21, 22, 22, 19, 18, 18,
- 17, 17, 21, 23, 22, 19, 18, 17, 17, 16, 21, 23, 22, 19, 18, 17, 16, 16,
- 21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 22,
- 22, 19, 17, 16, 16, 15, 20, 21, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19,
- 17, 16, 15, 14, 19, 21, 21, 19, 17, 15, 14, 14, 19, 20, 21, 19, 17, 15,
- 14, 13, 19, 20, 21, 18, 16, 15, 14, 13, 18, 20, 20, 18, 16, 15, 14, 13,
- 18, 19, 20, 18, 16, 14, 13, 13, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19,
- 19, 17, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 13, 12, 16, 18, 19, 17,
- 15, 14, 13, 12,
- /* Size 32x8 */
32, 33, 33, 34, 34, 32, 31, 30, 28, 28, 24, 24, 22, 21, 21, 21, 21, 21,
21, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 17, 16, 16, 33, 33, 33, 32,
32, 30, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 23, 23, 22, 22, 21,
@@ -9526,7 +9511,23 @@
20, 20, 21, 21, 19, 19, 19, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 14,
14, 14, 14, 13, 13, 13, 13, 13, 17, 18, 18, 18, 19, 19, 19, 19, 20, 19,
18, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 13, 13, 13, 13,
- 12, 12, 12, 12 },
+ 12, 12, 12, 12,
+ /* Size 32x8 */
+ 32, 33, 28, 21, 21, 20, 18, 17, 33, 33, 27, 22, 22, 20, 19, 18, 33, 33,
+ 27, 22, 22, 20, 19, 18, 34, 32, 26, 22, 22, 21, 20, 18, 34, 32, 26, 22,
+ 23, 21, 20, 19, 32, 30, 25, 22, 23, 21, 20, 19, 31, 28, 24, 22, 22, 22,
+ 20, 19, 30, 28, 23, 22, 23, 22, 20, 19, 28, 26, 22, 22, 23, 22, 21, 20,
+ 28, 26, 22, 21, 22, 22, 21, 19, 24, 24, 22, 20, 21, 20, 19, 18, 24, 24,
+ 22, 20, 21, 20, 19, 18, 22, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 19,
+ 19, 19, 18, 17, 21, 22, 22, 19, 19, 18, 18, 17, 21, 22, 22, 19, 18, 18,
+ 17, 17, 21, 23, 22, 19, 18, 17, 17, 16, 21, 23, 22, 19, 18, 17, 16, 16,
+ 21, 23, 22, 19, 18, 17, 16, 16, 20, 22, 22, 19, 17, 16, 16, 15, 20, 22,
+ 22, 19, 17, 16, 16, 15, 20, 21, 22, 19, 17, 16, 15, 14, 20, 21, 22, 19,
+ 17, 16, 15, 14, 19, 21, 21, 19, 17, 15, 14, 14, 19, 20, 21, 19, 17, 15,
+ 14, 13, 19, 20, 21, 18, 16, 15, 14, 13, 18, 20, 20, 18, 16, 15, 14, 13,
+ 18, 19, 20, 18, 16, 14, 13, 13, 17, 19, 20, 18, 16, 14, 13, 12, 17, 19,
+ 19, 17, 16, 14, 13, 12, 16, 18, 19, 17, 15, 14, 13, 12, 16, 18, 19, 17,
+ 15, 14, 13, 12 },
},
{
{ /* Luma */
@@ -9612,21 +9613,12 @@
10, 9, 15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 16, 15, 15, 14, 14,
13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 10, 10, 10, 9, 9, 9,
/* Size 4x8 */
- 32, 32, 24, 18, 32, 31, 25, 19, 32, 29, 24, 20, 30, 28, 20, 17, 27, 26,
- 18, 15, 23, 23, 16, 13, 20, 20, 14, 12, 17, 18, 13, 11,
- /* Size 8x4 */
32, 32, 32, 30, 27, 23, 20, 17, 32, 31, 29, 28, 26, 23, 20, 18, 24, 25,
24, 20, 18, 16, 14, 13, 18, 19, 20, 17, 15, 13, 12, 11,
+ /* Size 8x4 */
+ 32, 32, 24, 18, 32, 31, 25, 19, 32, 29, 24, 20, 30, 28, 20, 17, 27, 26,
+ 18, 15, 23, 23, 16, 13, 20, 20, 14, 12, 17, 18, 13, 11,
/* Size 8x16 */
- 32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 17, 33, 32,
- 31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24, 21, 17, 32, 32, 30, 28,
- 26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 30, 30, 28, 25, 23, 21,
- 19, 16, 28, 30, 27, 22, 20, 19, 17, 15, 27, 28, 26, 22, 20, 18, 16, 14,
- 25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 21, 23,
- 22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17,
- 15, 14, 12, 11, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17, 18, 16, 14, 13,
- 11, 10,
- /* Size 16x8 */
32, 33, 33, 33, 32, 32, 30, 28, 27, 25, 23, 21, 19, 18, 17, 16, 33, 32,
32, 32, 32, 31, 30, 30, 28, 26, 25, 23, 21, 19, 18, 17, 32, 32, 31, 30,
30, 29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 18, 29, 29, 30, 29, 28, 28,
@@ -9635,37 +9627,16 @@
16, 15, 14, 14, 13, 13, 19, 20, 21, 21, 21, 21, 19, 17, 16, 15, 14, 13,
12, 12, 12, 11, 16, 17, 17, 17, 18, 18, 16, 15, 14, 13, 13, 12, 11, 11,
10, 10,
+ /* Size 16x8 */
+ 32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 17, 33, 32,
+ 31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24, 21, 17, 32, 32, 30, 28,
+ 26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 30, 30, 28, 25, 23, 21,
+ 19, 16, 28, 30, 27, 22, 20, 19, 17, 15, 27, 28, 26, 22, 20, 18, 16, 14,
+ 25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 21, 23,
+ 22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17,
+ 15, 14, 12, 11, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17, 18, 16, 14, 13,
+ 11, 10,
/* Size 16x32 */
- 32, 33, 33, 33, 32, 32, 29, 28, 26, 23, 23, 20, 19, 18, 16, 16, 33, 32,
- 32, 32, 32, 32, 29, 29, 27, 24, 24, 21, 20, 18, 16, 16, 33, 32, 32, 32,
- 32, 32, 29, 29, 27, 24, 24, 21, 20, 19, 17, 17, 33, 32, 32, 32, 32, 32,
- 30, 29, 28, 25, 25, 21, 20, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30,
- 28, 25, 25, 22, 21, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30, 28, 25,
- 25, 22, 21, 19, 17, 17, 33, 32, 32, 31, 30, 30, 29, 28, 27, 24, 24, 21,
- 21, 19, 17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 21, 20, 19,
- 17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 26, 24, 24, 21, 21, 19, 18, 18,
- 32, 32, 31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 32, 32,
- 31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 31, 31, 31, 29,
- 28, 28, 26, 25, 24, 22, 22, 20, 19, 18, 17, 17, 30, 30, 30, 29, 28, 28,
- 25, 24, 23, 21, 21, 19, 19, 18, 16, 16, 30, 30, 30, 29, 28, 28, 24, 23,
- 22, 20, 20, 19, 18, 17, 16, 16, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19,
- 19, 18, 17, 16, 15, 15, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19, 19, 18,
- 17, 16, 15, 15, 27, 28, 28, 27, 26, 26, 22, 20, 20, 18, 18, 17, 16, 15,
- 14, 14, 26, 27, 28, 26, 26, 26, 21, 20, 19, 18, 18, 16, 16, 15, 14, 14,
- 25, 26, 26, 26, 25, 25, 21, 20, 19, 17, 17, 16, 15, 15, 13, 13, 23, 25,
- 25, 24, 24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 23, 25, 25, 24,
- 24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 22, 23, 23, 23, 23, 23,
- 19, 18, 17, 16, 16, 14, 14, 13, 12, 12, 21, 23, 23, 23, 22, 22, 19, 18,
- 17, 15, 15, 14, 13, 13, 12, 12, 20, 22, 22, 22, 22, 22, 19, 18, 17, 15,
- 15, 13, 13, 12, 12, 12, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13,
- 12, 12, 11, 11, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13, 12, 12,
- 11, 11, 18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11,
- 18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 10, 10, 17, 18,
- 18, 18, 18, 18, 16, 16, 15, 13, 13, 12, 12, 11, 10, 10, 16, 17, 17, 17,
- 18, 18, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 18,
- 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 15, 16, 16, 16, 17, 17, 15, 14,
- 13, 12, 12, 11, 11, 10, 9, 9,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 26,
25, 23, 23, 22, 21, 20, 19, 19, 18, 18, 17, 16, 16, 15, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 28, 27, 26, 25, 25, 23,
@@ -9695,33 +9666,47 @@
13, 12, 12, 12, 11, 11, 11, 10, 10, 10, 10, 9, 16, 16, 17, 17, 17, 17,
17, 17, 18, 18, 18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12,
11, 11, 11, 10, 10, 10, 10, 9,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 32, 32, 29, 28, 26, 23, 23, 20, 19, 18, 16, 16, 33, 32,
+ 32, 32, 32, 32, 29, 29, 27, 24, 24, 21, 20, 18, 16, 16, 33, 32, 32, 32,
+ 32, 32, 29, 29, 27, 24, 24, 21, 20, 19, 17, 17, 33, 32, 32, 32, 32, 32,
+ 30, 29, 28, 25, 25, 21, 20, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30,
+ 28, 25, 25, 22, 21, 19, 17, 17, 33, 32, 32, 32, 31, 31, 30, 30, 28, 25,
+ 25, 22, 21, 19, 17, 17, 33, 32, 32, 31, 30, 30, 29, 28, 27, 24, 24, 21,
+ 21, 19, 17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 21, 20, 19,
+ 17, 17, 32, 32, 32, 31, 30, 30, 28, 28, 26, 24, 24, 21, 21, 19, 18, 18,
+ 32, 32, 31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 32, 32,
+ 31, 30, 29, 29, 28, 27, 26, 24, 24, 21, 21, 20, 18, 18, 31, 31, 31, 29,
+ 28, 28, 26, 25, 24, 22, 22, 20, 19, 18, 17, 17, 30, 30, 30, 29, 28, 28,
+ 25, 24, 23, 21, 21, 19, 19, 18, 16, 16, 30, 30, 30, 29, 28, 28, 24, 23,
+ 22, 20, 20, 19, 18, 17, 16, 16, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19,
+ 19, 18, 17, 16, 15, 15, 28, 29, 30, 28, 27, 27, 22, 21, 20, 19, 19, 18,
+ 17, 16, 15, 15, 27, 28, 28, 27, 26, 26, 22, 20, 20, 18, 18, 17, 16, 15,
+ 14, 14, 26, 27, 28, 26, 26, 26, 21, 20, 19, 18, 18, 16, 16, 15, 14, 14,
+ 25, 26, 26, 26, 25, 25, 21, 20, 19, 17, 17, 16, 15, 15, 13, 13, 23, 25,
+ 25, 24, 24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 23, 25, 25, 24,
+ 24, 24, 20, 19, 18, 16, 16, 15, 14, 14, 13, 13, 22, 23, 23, 23, 23, 23,
+ 19, 18, 17, 16, 16, 14, 14, 13, 12, 12, 21, 23, 23, 23, 22, 22, 19, 18,
+ 17, 15, 15, 14, 13, 13, 12, 12, 20, 22, 22, 22, 22, 22, 19, 18, 17, 15,
+ 15, 13, 13, 12, 12, 12, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13,
+ 12, 12, 11, 11, 19, 20, 21, 20, 20, 20, 18, 17, 16, 14, 14, 13, 12, 12,
+ 11, 11, 18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 11, 11,
+ 18, 19, 19, 19, 19, 19, 17, 16, 15, 14, 14, 12, 12, 11, 10, 10, 17, 18,
+ 18, 18, 18, 18, 16, 16, 15, 13, 13, 12, 12, 11, 10, 10, 16, 17, 17, 17,
+ 18, 18, 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 16, 17, 17, 17, 18, 18,
+ 16, 15, 14, 13, 13, 12, 11, 11, 10, 10, 15, 16, 16, 16, 17, 17, 15, 14,
+ 13, 12, 12, 11, 11, 10, 9, 9,
/* Size 4x16 */
- 33, 32, 23, 18, 32, 32, 24, 19, 32, 31, 25, 19, 32, 30, 24, 19, 32, 30,
- 24, 19, 32, 29, 24, 20, 30, 28, 21, 18, 29, 27, 19, 16, 28, 26, 18, 15,
- 26, 25, 17, 15, 25, 24, 16, 14, 23, 22, 15, 13, 20, 20, 14, 12, 19, 19,
- 14, 11, 18, 18, 13, 11, 17, 18, 13, 11,
- /* Size 16x4 */
33, 32, 32, 32, 32, 32, 30, 29, 28, 26, 25, 23, 20, 19, 18, 17, 32, 32,
31, 30, 30, 29, 28, 27, 26, 25, 24, 22, 20, 19, 18, 18, 23, 24, 25, 24,
24, 24, 21, 19, 18, 17, 16, 15, 14, 14, 13, 13, 18, 19, 19, 19, 19, 20,
18, 16, 15, 15, 14, 13, 12, 11, 11, 11,
+ /* Size 16x4 */
+ 33, 32, 23, 18, 32, 32, 24, 19, 32, 31, 25, 19, 32, 30, 24, 19, 32, 30,
+ 24, 19, 32, 29, 24, 20, 30, 28, 21, 18, 29, 27, 19, 16, 28, 26, 18, 15,
+ 26, 25, 17, 15, 25, 24, 16, 14, 23, 22, 15, 13, 20, 20, 14, 12, 19, 19,
+ 14, 11, 18, 18, 13, 11, 17, 18, 13, 11,
/* Size 8x32 */
- 32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 16, 33, 32,
- 32, 29, 27, 24, 20, 17, 33, 32, 32, 30, 28, 25, 20, 17, 33, 32, 31, 30,
- 28, 25, 21, 17, 33, 32, 31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24,
- 21, 17, 32, 32, 30, 28, 27, 24, 20, 17, 32, 32, 30, 28, 26, 24, 21, 18,
- 32, 31, 29, 28, 26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 31, 31,
- 28, 26, 24, 22, 19, 17, 30, 30, 28, 25, 23, 21, 19, 16, 30, 30, 28, 24,
- 22, 20, 18, 16, 28, 30, 27, 22, 20, 19, 17, 15, 28, 30, 27, 22, 20, 19,
- 17, 15, 27, 28, 26, 22, 20, 18, 16, 14, 26, 28, 26, 21, 19, 18, 16, 14,
- 25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 23, 25,
- 24, 20, 18, 16, 14, 13, 22, 23, 23, 19, 17, 16, 14, 12, 21, 23, 22, 19,
- 17, 15, 13, 12, 20, 22, 22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14,
- 12, 11, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17, 15, 14, 12, 11,
- 18, 19, 19, 17, 15, 14, 12, 10, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17,
- 18, 16, 14, 13, 11, 10, 16, 17, 18, 16, 14, 13, 11, 10, 15, 16, 17, 15,
- 13, 12, 11, 9,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 26,
25, 23, 23, 22, 21, 20, 19, 19, 18, 18, 17, 16, 16, 15, 33, 32, 32, 32,
32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 28, 28, 26, 25, 25, 23,
@@ -9736,7 +9721,23 @@
21, 20, 21, 21, 21, 19, 19, 18, 17, 17, 16, 16, 15, 14, 14, 14, 13, 13,
12, 12, 12, 12, 12, 11, 11, 11, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18,
18, 17, 16, 16, 15, 15, 14, 14, 13, 13, 13, 12, 12, 12, 11, 11, 11, 10,
- 10, 10, 10, 9 },
+ 10, 10, 10, 9,
+ /* Size 32x8 */
+ 32, 33, 32, 29, 26, 23, 19, 16, 33, 32, 32, 29, 27, 24, 20, 16, 33, 32,
+ 32, 29, 27, 24, 20, 17, 33, 32, 32, 30, 28, 25, 20, 17, 33, 32, 31, 30,
+ 28, 25, 21, 17, 33, 32, 31, 30, 28, 25, 21, 17, 33, 32, 30, 29, 27, 24,
+ 21, 17, 32, 32, 30, 28, 27, 24, 20, 17, 32, 32, 30, 28, 26, 24, 21, 18,
+ 32, 31, 29, 28, 26, 24, 21, 18, 32, 31, 29, 28, 26, 24, 21, 18, 31, 31,
+ 28, 26, 24, 22, 19, 17, 30, 30, 28, 25, 23, 21, 19, 16, 30, 30, 28, 24,
+ 22, 20, 18, 16, 28, 30, 27, 22, 20, 19, 17, 15, 28, 30, 27, 22, 20, 19,
+ 17, 15, 27, 28, 26, 22, 20, 18, 16, 14, 26, 28, 26, 21, 19, 18, 16, 14,
+ 25, 26, 25, 21, 19, 17, 15, 13, 23, 25, 24, 20, 18, 16, 14, 13, 23, 25,
+ 24, 20, 18, 16, 14, 13, 22, 23, 23, 19, 17, 16, 14, 12, 21, 23, 22, 19,
+ 17, 15, 13, 12, 20, 22, 22, 19, 17, 15, 13, 12, 19, 21, 20, 18, 16, 14,
+ 12, 11, 19, 21, 20, 18, 16, 14, 12, 11, 18, 19, 19, 17, 15, 14, 12, 11,
+ 18, 19, 19, 17, 15, 14, 12, 10, 17, 18, 18, 16, 15, 13, 12, 10, 16, 17,
+ 18, 16, 14, 13, 11, 10, 16, 17, 18, 16, 14, 13, 11, 10, 15, 16, 17, 15,
+ 13, 12, 11, 9 },
{ /* Chroma */
/* Size 4x4 */
33, 25, 22, 20, 25, 21, 21, 20, 22, 21, 18, 17, 20, 20, 17, 14,
@@ -9820,21 +9821,12 @@
13, 13, 17, 18, 18, 19, 19, 19, 19, 19, 20, 20, 20, 19, 19, 18, 18, 18,
17, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 13, 13, 13, 13,
/* Size 4x8 */
- 33, 27, 22, 20, 32, 26, 23, 21, 26, 22, 23, 21, 23, 22, 20, 19, 22, 22,
- 18, 18, 22, 22, 17, 16, 21, 22, 17, 15, 19, 20, 16, 14,
- /* Size 8x4 */
33, 32, 26, 23, 22, 22, 21, 19, 27, 26, 22, 22, 22, 22, 22, 20, 22, 23,
23, 20, 18, 17, 17, 16, 20, 21, 21, 19, 18, 16, 15, 14,
+ /* Size 8x4 */
+ 33, 27, 22, 20, 32, 26, 23, 21, 26, 22, 23, 21, 23, 22, 20, 19, 22, 22,
+ 18, 18, 22, 22, 17, 16, 21, 22, 17, 15, 19, 20, 16, 14,
/* Size 8x16 */
- 32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 34, 32,
- 26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23, 22, 20, 29, 28, 23, 22,
- 22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 24, 24, 22, 21, 20, 21,
- 20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 22, 20, 19, 19, 18, 17,
- 21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 20, 22,
- 22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19,
- 18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20, 20, 19, 17, 16,
- 15, 13,
- /* Size 16x8 */
32, 33, 34, 31, 29, 28, 24, 21, 21, 21, 21, 20, 20, 19, 19, 18, 33, 33,
32, 29, 28, 26, 24, 22, 22, 23, 23, 22, 21, 21, 20, 20, 28, 27, 26, 24,
23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 23, 23, 23, 22, 22, 22,
@@ -9843,37 +9835,16 @@
18, 17, 17, 17, 16, 16, 20, 20, 21, 22, 22, 22, 20, 19, 18, 17, 17, 16,
16, 15, 15, 15, 18, 19, 20, 20, 20, 20, 19, 18, 17, 17, 16, 15, 14, 14,
14, 13,
+ /* Size 16x8 */
+ 32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 34, 32,
+ 26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23, 22, 20, 29, 28, 23, 22,
+ 22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 24, 24, 22, 21, 20, 21,
+ 20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 22, 20, 19, 19, 18, 17,
+ 21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 20, 22,
+ 22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19,
+ 18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20, 20, 19, 17, 16,
+ 15, 13,
/* Size 16x32 */
- 32, 33, 33, 31, 28, 28, 23, 21, 21, 21, 21, 20, 20, 19, 18, 18, 33, 33,
- 33, 30, 27, 27, 23, 22, 22, 22, 22, 20, 20, 20, 19, 19, 33, 33, 33, 30,
- 27, 27, 23, 22, 22, 22, 22, 21, 20, 20, 19, 19, 33, 33, 32, 30, 26, 26,
- 23, 22, 22, 22, 22, 21, 21, 20, 19, 19, 34, 32, 32, 29, 26, 26, 23, 22,
- 23, 23, 23, 22, 21, 21, 20, 20, 34, 32, 32, 29, 26, 26, 23, 22, 23, 23,
- 23, 22, 21, 21, 20, 20, 31, 30, 29, 28, 24, 24, 22, 22, 22, 23, 23, 22,
- 22, 21, 20, 20, 31, 29, 28, 27, 24, 24, 22, 22, 22, 22, 22, 22, 22, 21,
- 20, 20, 29, 28, 28, 26, 23, 23, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20,
- 28, 26, 26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 28, 26,
- 26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 25, 24, 24, 23,
- 22, 22, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 24, 24, 24, 23, 22, 22,
- 21, 20, 20, 21, 21, 20, 20, 20, 19, 19, 23, 23, 23, 23, 22, 22, 20, 20,
- 20, 20, 20, 20, 20, 19, 19, 19, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19,
- 19, 19, 19, 19, 18, 18, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19,
- 19, 19, 18, 18, 21, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 18,
- 17, 17, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18, 18, 17, 17,
- 21, 22, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 21, 22,
- 23, 23, 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 21, 22, 23, 23,
- 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 20, 22, 22, 22, 22, 22,
- 20, 19, 18, 17, 17, 17, 16, 16, 16, 16, 20, 22, 22, 22, 22, 22, 20, 19,
- 18, 17, 17, 16, 16, 16, 15, 15, 20, 21, 22, 22, 22, 22, 20, 19, 18, 17,
- 17, 16, 16, 16, 15, 15, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16,
- 16, 15, 14, 14, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16, 16, 15,
- 14, 14, 19, 20, 21, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14,
- 19, 20, 20, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14, 19, 20,
- 20, 20, 21, 21, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 18, 19, 20, 20,
- 20, 20, 19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 18, 19, 20, 20, 20, 20,
- 19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 17, 19, 19, 19, 20, 20, 18, 18,
- 17, 16, 16, 15, 14, 14, 13, 13,
- /* Size 32x16 */
32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 21, 21,
21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 33, 33, 33, 33,
32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
@@ -9903,33 +9874,47 @@
16, 16, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, 18, 19, 19, 19, 20, 20,
20, 20, 20, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15,
14, 14, 14, 14, 14, 13, 13, 13,
+ /* Size 32x16 */
+ 32, 33, 33, 31, 28, 28, 23, 21, 21, 21, 21, 20, 20, 19, 18, 18, 33, 33,
+ 33, 30, 27, 27, 23, 22, 22, 22, 22, 20, 20, 20, 19, 19, 33, 33, 33, 30,
+ 27, 27, 23, 22, 22, 22, 22, 21, 20, 20, 19, 19, 33, 33, 32, 30, 26, 26,
+ 23, 22, 22, 22, 22, 21, 21, 20, 19, 19, 34, 32, 32, 29, 26, 26, 23, 22,
+ 23, 23, 23, 22, 21, 21, 20, 20, 34, 32, 32, 29, 26, 26, 23, 22, 23, 23,
+ 23, 22, 21, 21, 20, 20, 31, 30, 29, 28, 24, 24, 22, 22, 22, 23, 23, 22,
+ 22, 21, 20, 20, 31, 29, 28, 27, 24, 24, 22, 22, 22, 22, 22, 22, 22, 21,
+ 20, 20, 29, 28, 28, 26, 23, 23, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20,
+ 28, 26, 26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 28, 26,
+ 26, 24, 22, 22, 22, 22, 22, 23, 23, 22, 22, 21, 20, 20, 25, 24, 24, 23,
+ 22, 22, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 24, 24, 24, 23, 22, 22,
+ 21, 20, 20, 21, 21, 20, 20, 20, 19, 19, 23, 23, 23, 23, 22, 22, 20, 20,
+ 20, 20, 20, 20, 20, 19, 19, 19, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19,
+ 19, 19, 19, 19, 18, 18, 21, 22, 22, 22, 21, 21, 20, 19, 19, 19, 19, 19,
+ 19, 19, 18, 18, 21, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 18,
+ 17, 17, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 18, 18, 18, 17, 17,
+ 21, 22, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 17, 21, 22,
+ 23, 23, 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 21, 22, 23, 23,
+ 22, 22, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 20, 22, 22, 22, 22, 22,
+ 20, 19, 18, 17, 17, 17, 16, 16, 16, 16, 20, 22, 22, 22, 22, 22, 20, 19,
+ 18, 17, 17, 16, 16, 16, 15, 15, 20, 21, 22, 22, 22, 22, 20, 19, 18, 17,
+ 17, 16, 16, 16, 15, 15, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16,
+ 16, 15, 14, 14, 20, 21, 21, 22, 22, 22, 19, 19, 18, 17, 17, 16, 16, 15,
+ 14, 14, 19, 20, 21, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14,
+ 19, 20, 20, 21, 21, 21, 19, 19, 18, 17, 17, 15, 15, 15, 14, 14, 19, 20,
+ 20, 20, 21, 21, 19, 18, 18, 16, 16, 15, 15, 14, 14, 14, 18, 19, 20, 20,
+ 20, 20, 19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 18, 19, 20, 20, 20, 20,
+ 19, 18, 17, 16, 16, 15, 15, 14, 13, 13, 17, 19, 19, 19, 20, 20, 18, 18,
+ 17, 16, 16, 15, 14, 14, 13, 13,
/* Size 4x16 */
- 33, 28, 21, 19, 33, 27, 22, 20, 32, 26, 23, 21, 30, 24, 23, 21, 28, 23,
- 23, 21, 26, 22, 23, 21, 24, 22, 21, 20, 22, 21, 19, 19, 22, 22, 19, 18,
- 22, 22, 18, 17, 22, 22, 18, 17, 22, 22, 17, 16, 21, 22, 17, 15, 20, 21,
- 17, 15, 20, 21, 16, 14, 19, 20, 16, 14,
- /* Size 16x4 */
33, 33, 32, 30, 28, 26, 24, 22, 22, 22, 22, 22, 21, 20, 20, 19, 28, 27,
26, 24, 23, 22, 22, 21, 22, 22, 22, 22, 22, 21, 21, 20, 21, 22, 23, 23,
23, 23, 21, 19, 19, 18, 18, 17, 17, 17, 16, 16, 19, 20, 21, 21, 21, 21,
20, 19, 18, 17, 17, 16, 15, 15, 14, 14,
+ /* Size 16x4 */
+ 33, 28, 21, 19, 33, 27, 22, 20, 32, 26, 23, 21, 30, 24, 23, 21, 28, 23,
+ 23, 21, 26, 22, 23, 21, 24, 22, 21, 20, 22, 21, 19, 19, 22, 22, 19, 18,
+ 22, 22, 18, 17, 22, 22, 18, 17, 22, 22, 17, 16, 21, 22, 17, 15, 20, 21,
+ 17, 15, 20, 21, 16, 14, 19, 20, 16, 14,
/* Size 8x32 */
- 32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 33, 33,
- 27, 23, 22, 22, 20, 19, 33, 32, 26, 23, 22, 22, 21, 19, 34, 32, 26, 23,
- 23, 23, 21, 20, 34, 32, 26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23,
- 22, 20, 31, 28, 24, 22, 22, 22, 22, 20, 29, 28, 23, 22, 22, 23, 22, 20,
- 28, 26, 22, 22, 22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 25, 24,
- 22, 21, 21, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 19, 23, 23, 22, 20,
- 20, 20, 20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 20, 19, 19,
- 19, 18, 21, 22, 22, 20, 19, 19, 18, 17, 21, 22, 22, 20, 19, 18, 18, 17,
- 21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 21, 23,
- 22, 20, 19, 18, 17, 16, 20, 22, 22, 20, 18, 17, 16, 16, 20, 22, 22, 20,
- 18, 17, 16, 15, 20, 22, 22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17,
- 16, 14, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19, 18, 17, 15, 14,
- 19, 20, 21, 19, 18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20,
- 20, 19, 17, 16, 15, 13, 18, 20, 20, 19, 17, 16, 15, 13, 17, 19, 20, 18,
- 17, 16, 14, 13,
- /* Size 32x8 */
32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 21, 21,
21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 18, 18, 17, 33, 33, 33, 32,
32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 22, 22, 23, 23, 23, 22,
@@ -9944,7 +9929,23 @@
22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16,
16, 16, 15, 15, 15, 15, 15, 14, 18, 19, 19, 19, 20, 20, 20, 20, 20, 20,
20, 20, 19, 19, 18, 18, 17, 17, 17, 16, 16, 16, 15, 15, 14, 14, 14, 14,
- 14, 13, 13, 13 },
+ 14, 13, 13, 13,
+ /* Size 32x8 */
+ 32, 33, 28, 23, 21, 21, 20, 18, 33, 33, 27, 23, 22, 22, 20, 19, 33, 33,
+ 27, 23, 22, 22, 20, 19, 33, 32, 26, 23, 22, 22, 21, 19, 34, 32, 26, 23,
+ 23, 23, 21, 20, 34, 32, 26, 23, 23, 23, 21, 20, 31, 29, 24, 22, 22, 23,
+ 22, 20, 31, 28, 24, 22, 22, 22, 22, 20, 29, 28, 23, 22, 22, 23, 22, 20,
+ 28, 26, 22, 22, 22, 23, 22, 20, 28, 26, 22, 22, 22, 23, 22, 20, 25, 24,
+ 22, 21, 21, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 19, 23, 23, 22, 20,
+ 20, 20, 20, 19, 21, 22, 21, 20, 19, 19, 19, 18, 21, 22, 21, 20, 19, 19,
+ 19, 18, 21, 22, 22, 20, 19, 19, 18, 17, 21, 22, 22, 20, 19, 18, 18, 17,
+ 21, 23, 22, 20, 19, 18, 17, 17, 21, 23, 22, 20, 19, 18, 17, 16, 21, 23,
+ 22, 20, 19, 18, 17, 16, 20, 22, 22, 20, 18, 17, 16, 16, 20, 22, 22, 20,
+ 18, 17, 16, 15, 20, 22, 22, 20, 18, 17, 16, 15, 20, 21, 22, 19, 18, 17,
+ 16, 14, 20, 21, 22, 19, 18, 17, 16, 14, 19, 21, 21, 19, 18, 17, 15, 14,
+ 19, 20, 21, 19, 18, 17, 15, 14, 19, 20, 21, 19, 18, 16, 15, 14, 18, 20,
+ 20, 19, 17, 16, 15, 13, 18, 20, 20, 19, 17, 16, 15, 13, 17, 19, 20, 18,
+ 17, 16, 14, 13 },
},
{
{ /* Luma */
@@ -10030,21 +10031,12 @@
11, 11, 17, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 18, 18, 17,
16, 16, 15, 15, 15, 14, 14, 13, 13, 13, 13, 12, 12, 12, 11, 11,
/* Size 4x8 */
- 32, 32, 28, 20, 32, 31, 28, 21, 32, 30, 27, 21, 30, 28, 23, 19, 29, 27,
- 21, 17, 26, 24, 19, 15, 22, 22, 17, 13, 20, 20, 16, 12,
- /* Size 8x4 */
32, 32, 32, 30, 29, 26, 22, 20, 32, 31, 30, 28, 27, 24, 22, 20, 28, 28,
27, 23, 21, 19, 17, 16, 20, 21, 21, 19, 17, 15, 13, 12,
+ /* Size 8x4 */
+ 32, 32, 28, 20, 32, 31, 28, 21, 32, 30, 27, 21, 30, 28, 23, 19, 29, 27,
+ 21, 17, 26, 24, 19, 15, 22, 22, 17, 13, 20, 20, 16, 12,
/* Size 8x16 */
- 32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32,
- 32, 31, 29, 25, 23, 21, 33, 32, 31, 31, 29, 25, 23, 21, 32, 32, 30, 30,
- 28, 24, 23, 20, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23,
- 22, 20, 30, 30, 28, 27, 24, 21, 20, 19, 28, 30, 28, 26, 21, 19, 18, 17,
- 27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 23, 25,
- 24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16, 15, 14, 21, 22, 22, 21,
- 18, 15, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 18, 19, 19, 19, 16, 14,
- 13, 12,
- /* Size 16x8 */
32, 33, 33, 33, 32, 32, 32, 30, 28, 27, 26, 23, 22, 21, 19, 18, 33, 32,
32, 32, 32, 31, 31, 30, 30, 28, 28, 25, 23, 22, 21, 19, 32, 32, 32, 31,
30, 29, 29, 28, 28, 26, 26, 24, 23, 22, 20, 19, 32, 31, 31, 31, 30, 28,
@@ -10053,37 +10045,16 @@
18, 16, 16, 15, 14, 14, 22, 23, 23, 23, 23, 23, 22, 20, 18, 18, 17, 16,
15, 14, 14, 13, 19, 20, 21, 21, 20, 21, 20, 19, 17, 16, 16, 14, 14, 13,
12, 12,
+ /* Size 16x8 */
+ 32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32,
+ 32, 31, 29, 25, 23, 21, 33, 32, 31, 31, 29, 25, 23, 21, 32, 32, 30, 30,
+ 28, 24, 23, 20, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23,
+ 22, 20, 30, 30, 28, 27, 24, 21, 20, 19, 28, 30, 28, 26, 21, 19, 18, 17,
+ 27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 23, 25,
+ 24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16, 15, 14, 21, 22, 22, 21,
+ 18, 15, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 18, 19, 19, 19, 16, 14,
+ 13, 12,
/* Size 16x32 */
- 32, 33, 33, 33, 32, 32, 32, 29, 28, 27, 23, 23, 22, 19, 19, 17, 33, 32,
- 32, 32, 32, 32, 31, 29, 29, 28, 24, 24, 22, 20, 20, 18, 33, 32, 32, 32,
- 32, 32, 31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32,
- 31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32, 31, 30,
- 29, 28, 25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 32, 31, 31, 30, 30, 28,
- 25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 31, 31, 31, 29, 29, 28, 25, 25,
- 23, 21, 21, 19, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 23, 21,
- 21, 19, 32, 32, 32, 31, 30, 30, 30, 28, 28, 27, 24, 24, 23, 20, 20, 19,
- 32, 32, 32, 31, 30, 30, 29, 28, 28, 27, 24, 24, 23, 21, 21, 19, 32, 32,
- 31, 31, 29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 32, 31, 31,
- 29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 31, 31, 31, 29, 28,
- 28, 26, 26, 25, 23, 23, 22, 20, 20, 19, 30, 30, 30, 30, 28, 28, 27, 24,
- 24, 23, 21, 21, 20, 19, 19, 18, 30, 30, 30, 30, 28, 28, 27, 24, 24, 23,
- 21, 21, 20, 19, 19, 18, 29, 30, 30, 30, 28, 28, 26, 23, 23, 22, 20, 20,
- 19, 18, 18, 17, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17,
- 17, 16, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17, 17, 16,
- 27, 28, 28, 28, 26, 26, 25, 21, 21, 20, 18, 18, 18, 16, 16, 15, 26, 27,
- 28, 27, 26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 26, 27, 28, 27,
- 26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 24, 26, 26, 26, 24, 24,
- 23, 20, 20, 19, 17, 17, 16, 15, 15, 14, 23, 24, 25, 25, 24, 24, 23, 20,
- 19, 18, 16, 16, 16, 14, 14, 14, 23, 24, 25, 25, 24, 24, 23, 20, 19, 18,
- 16, 16, 16, 14, 14, 13, 22, 23, 23, 23, 23, 23, 22, 19, 18, 18, 16, 16,
- 15, 14, 14, 13, 21, 22, 23, 23, 22, 22, 21, 19, 18, 17, 15, 15, 15, 13,
- 13, 13, 21, 22, 22, 22, 22, 22, 21, 18, 18, 17, 15, 15, 14, 13, 13, 13,
- 19, 20, 21, 21, 21, 21, 20, 18, 17, 17, 14, 14, 14, 13, 13, 12, 19, 20,
- 21, 21, 20, 20, 20, 17, 17, 16, 14, 14, 14, 12, 12, 12, 19, 20, 20, 20,
- 20, 20, 19, 17, 17, 16, 14, 14, 13, 12, 12, 12, 18, 19, 19, 19, 19, 19,
- 19, 17, 16, 15, 14, 14, 13, 12, 12, 11, 18, 19, 19, 19, 19, 19, 19, 17,
- 16, 15, 14, 14, 13, 12, 12, 11,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 29, 28, 28,
27, 26, 26, 24, 23, 23, 22, 21, 21, 19, 19, 19, 18, 18, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 28, 27, 27, 26,
@@ -10113,33 +10084,47 @@
16, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12, 17, 18, 18, 18, 19, 19,
19, 19, 19, 19, 19, 19, 19, 18, 18, 17, 16, 16, 15, 15, 15, 14, 14, 13,
13, 13, 13, 12, 12, 12, 11, 11,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 32, 32, 32, 29, 28, 27, 23, 23, 22, 19, 19, 17, 33, 32,
+ 32, 32, 32, 32, 31, 29, 29, 28, 24, 24, 22, 20, 20, 18, 33, 32, 32, 32,
+ 32, 32, 31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32,
+ 31, 29, 29, 28, 24, 24, 23, 20, 20, 18, 33, 32, 32, 32, 32, 32, 31, 30,
+ 29, 28, 25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 32, 31, 31, 30, 30, 28,
+ 25, 25, 23, 21, 21, 19, 33, 32, 32, 32, 31, 31, 31, 29, 29, 28, 25, 25,
+ 23, 21, 21, 19, 32, 32, 32, 32, 31, 30, 30, 28, 28, 27, 24, 24, 23, 21,
+ 21, 19, 32, 32, 32, 31, 30, 30, 30, 28, 28, 27, 24, 24, 23, 20, 20, 19,
+ 32, 32, 32, 31, 30, 30, 29, 28, 28, 27, 24, 24, 23, 21, 21, 19, 32, 32,
+ 31, 31, 29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 32, 31, 31,
+ 29, 29, 28, 27, 27, 26, 24, 24, 23, 21, 21, 19, 32, 31, 31, 31, 29, 28,
+ 28, 26, 26, 25, 23, 23, 22, 20, 20, 19, 30, 30, 30, 30, 28, 28, 27, 24,
+ 24, 23, 21, 21, 20, 19, 19, 18, 30, 30, 30, 30, 28, 28, 27, 24, 24, 23,
+ 21, 21, 20, 19, 19, 18, 29, 30, 30, 30, 28, 28, 26, 23, 23, 22, 20, 20,
+ 19, 18, 18, 17, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17,
+ 17, 16, 28, 29, 30, 29, 28, 27, 26, 22, 21, 21, 19, 19, 18, 17, 17, 16,
+ 27, 28, 28, 28, 26, 26, 25, 21, 21, 20, 18, 18, 18, 16, 16, 15, 26, 27,
+ 28, 27, 26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 26, 27, 28, 27,
+ 26, 26, 24, 21, 20, 20, 18, 18, 17, 16, 16, 15, 24, 26, 26, 26, 24, 24,
+ 23, 20, 20, 19, 17, 17, 16, 15, 15, 14, 23, 24, 25, 25, 24, 24, 23, 20,
+ 19, 18, 16, 16, 16, 14, 14, 14, 23, 24, 25, 25, 24, 24, 23, 20, 19, 18,
+ 16, 16, 16, 14, 14, 13, 22, 23, 23, 23, 23, 23, 22, 19, 18, 18, 16, 16,
+ 15, 14, 14, 13, 21, 22, 23, 23, 22, 22, 21, 19, 18, 17, 15, 15, 15, 13,
+ 13, 13, 21, 22, 22, 22, 22, 22, 21, 18, 18, 17, 15, 15, 14, 13, 13, 13,
+ 19, 20, 21, 21, 21, 21, 20, 18, 17, 17, 14, 14, 14, 13, 13, 12, 19, 20,
+ 21, 21, 20, 20, 20, 17, 17, 16, 14, 14, 14, 12, 12, 12, 19, 20, 20, 20,
+ 20, 20, 19, 17, 17, 16, 14, 14, 13, 12, 12, 12, 18, 19, 19, 19, 19, 19,
+ 19, 17, 16, 15, 14, 14, 13, 12, 12, 11, 18, 19, 19, 19, 19, 19, 19, 17,
+ 16, 15, 14, 14, 13, 12, 12, 11,
/* Size 4x16 */
- 33, 32, 27, 19, 32, 32, 28, 20, 32, 32, 28, 21, 32, 31, 28, 21, 32, 30,
- 27, 20, 32, 29, 26, 21, 31, 28, 25, 20, 30, 28, 23, 19, 29, 27, 21, 17,
- 28, 26, 20, 16, 27, 26, 20, 16, 24, 24, 18, 14, 23, 23, 18, 14, 22, 22,
- 17, 13, 20, 20, 16, 12, 19, 19, 15, 12,
- /* Size 16x4 */
33, 32, 32, 32, 32, 32, 31, 30, 29, 28, 27, 24, 23, 22, 20, 19, 32, 32,
32, 31, 30, 29, 28, 28, 27, 26, 26, 24, 23, 22, 20, 19, 27, 28, 28, 28,
27, 26, 25, 23, 21, 20, 20, 18, 18, 17, 16, 15, 19, 20, 21, 21, 20, 21,
20, 19, 17, 16, 16, 14, 14, 13, 12, 12,
+ /* Size 16x4 */
+ 33, 32, 27, 19, 32, 32, 28, 20, 32, 32, 28, 21, 32, 31, 28, 21, 32, 30,
+ 27, 20, 32, 29, 26, 21, 31, 28, 25, 20, 30, 28, 23, 19, 29, 27, 21, 17,
+ 28, 26, 20, 16, 27, 26, 20, 16, 24, 24, 18, 14, 23, 23, 18, 14, 22, 22,
+ 17, 13, 20, 20, 16, 12, 19, 19, 15, 12,
/* Size 8x32 */
- 32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 22, 20, 33, 32,
- 32, 31, 29, 24, 23, 20, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32, 32, 31,
- 29, 25, 23, 21, 33, 32, 32, 31, 30, 25, 23, 21, 33, 32, 31, 31, 29, 25,
- 23, 21, 32, 32, 31, 30, 28, 24, 23, 21, 32, 32, 30, 30, 28, 24, 23, 20,
- 32, 32, 30, 29, 28, 24, 23, 21, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31,
- 29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23, 22, 20, 30, 30, 28, 27,
- 24, 21, 20, 19, 30, 30, 28, 27, 24, 21, 20, 19, 29, 30, 28, 26, 23, 20,
- 19, 18, 28, 30, 28, 26, 21, 19, 18, 17, 28, 30, 28, 26, 21, 19, 18, 17,
- 27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 26, 28,
- 26, 24, 20, 18, 17, 16, 24, 26, 24, 23, 20, 17, 16, 15, 23, 25, 24, 23,
- 19, 16, 16, 14, 23, 25, 24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16,
- 15, 14, 21, 23, 22, 21, 18, 15, 15, 13, 21, 22, 22, 21, 18, 15, 14, 13,
- 19, 21, 21, 20, 17, 14, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 19, 20,
- 20, 19, 17, 14, 13, 12, 18, 19, 19, 19, 16, 14, 13, 12, 18, 19, 19, 19,
- 16, 14, 13, 12,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 29, 28, 28,
27, 26, 26, 24, 23, 23, 22, 21, 21, 19, 19, 19, 18, 18, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 28, 28, 28, 26,
@@ -10154,7 +10139,23 @@
23, 23, 23, 23, 23, 23, 22, 20, 20, 19, 18, 18, 18, 17, 17, 16, 16, 16,
15, 15, 14, 14, 14, 13, 13, 13, 19, 20, 20, 20, 21, 21, 21, 21, 20, 21,
21, 21, 20, 19, 19, 18, 17, 17, 16, 16, 16, 15, 14, 14, 14, 13, 13, 13,
- 12, 12, 12, 12 },
+ 12, 12, 12, 12,
+ /* Size 32x8 */
+ 32, 33, 32, 32, 28, 23, 22, 19, 33, 32, 32, 31, 29, 24, 22, 20, 33, 32,
+ 32, 31, 29, 24, 23, 20, 33, 32, 32, 31, 29, 24, 23, 20, 33, 32, 32, 31,
+ 29, 25, 23, 21, 33, 32, 32, 31, 30, 25, 23, 21, 33, 32, 31, 31, 29, 25,
+ 23, 21, 32, 32, 31, 30, 28, 24, 23, 21, 32, 32, 30, 30, 28, 24, 23, 20,
+ 32, 32, 30, 29, 28, 24, 23, 21, 32, 31, 29, 28, 27, 24, 23, 21, 32, 31,
+ 29, 28, 27, 24, 23, 21, 32, 31, 29, 28, 26, 23, 22, 20, 30, 30, 28, 27,
+ 24, 21, 20, 19, 30, 30, 28, 27, 24, 21, 20, 19, 29, 30, 28, 26, 23, 20,
+ 19, 18, 28, 30, 28, 26, 21, 19, 18, 17, 28, 30, 28, 26, 21, 19, 18, 17,
+ 27, 28, 26, 25, 21, 18, 18, 16, 26, 28, 26, 24, 20, 18, 17, 16, 26, 28,
+ 26, 24, 20, 18, 17, 16, 24, 26, 24, 23, 20, 17, 16, 15, 23, 25, 24, 23,
+ 19, 16, 16, 14, 23, 25, 24, 23, 19, 16, 16, 14, 22, 23, 23, 22, 18, 16,
+ 15, 14, 21, 23, 22, 21, 18, 15, 15, 13, 21, 22, 22, 21, 18, 15, 14, 13,
+ 19, 21, 21, 20, 17, 14, 14, 13, 19, 21, 20, 20, 17, 14, 14, 12, 19, 20,
+ 20, 19, 17, 14, 13, 12, 18, 19, 19, 19, 16, 14, 13, 12, 18, 19, 19, 19,
+ 16, 14, 13, 12 },
{ /* Chroma */
/* Size 4x4 */
33, 27, 22, 21, 27, 22, 22, 22, 22, 22, 19, 18, 21, 22, 18, 16,
@@ -10238,21 +10239,12 @@
14, 14, 19, 19, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 19,
19, 19, 18, 18, 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 14, 14,
/* Size 4x8 */
- 33, 27, 22, 20, 33, 26, 22, 21, 28, 23, 22, 22, 24, 22, 20, 20, 22, 21,
- 19, 19, 22, 22, 19, 17, 21, 22, 19, 16, 20, 21, 18, 15,
- /* Size 8x4 */
33, 33, 28, 24, 22, 22, 21, 20, 27, 26, 23, 22, 21, 22, 22, 21, 22, 22,
22, 20, 19, 19, 19, 18, 20, 21, 22, 20, 19, 17, 16, 15,
+ /* Size 8x4 */
+ 33, 27, 22, 20, 33, 26, 22, 21, 28, 23, 22, 22, 24, 22, 20, 20, 22, 21,
+ 19, 19, 22, 22, 19, 17, 21, 22, 19, 16, 20, 21, 18, 15,
/* Size 8x16 */
- 32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32,
- 27, 26, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23, 22, 21, 31, 28, 25, 23,
- 22, 22, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22,
- 22, 21, 24, 24, 22, 21, 20, 21, 20, 20, 21, 22, 21, 21, 19, 19, 19, 19,
- 21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 23,
- 23, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21,
- 19, 17, 17, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 20, 21, 20, 19, 17,
- 16, 15,
- /* Size 16x8 */
32, 33, 34, 33, 31, 28, 26, 24, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
32, 31, 28, 26, 25, 24, 22, 22, 22, 23, 22, 22, 21, 20, 29, 28, 27, 27,
25, 23, 22, 22, 21, 22, 22, 23, 22, 22, 22, 21, 27, 26, 26, 25, 23, 22,
@@ -10261,37 +10253,16 @@
18, 18, 17, 17, 17, 17, 20, 21, 22, 22, 22, 22, 22, 20, 19, 19, 18, 17,
17, 17, 16, 16, 20, 20, 21, 21, 22, 22, 21, 20, 19, 18, 18, 17, 16, 16,
16, 15,
+ /* Size 16x8 */
+ 32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32,
+ 27, 26, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23, 22, 21, 31, 28, 25, 23,
+ 22, 22, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22,
+ 22, 21, 24, 24, 22, 21, 20, 21, 20, 20, 21, 22, 21, 21, 19, 19, 19, 19,
+ 21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 23,
+ 23, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21,
+ 19, 17, 17, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 20, 21, 20, 19, 17,
+ 16, 15,
/* Size 16x32 */
- 32, 33, 33, 33, 29, 28, 27, 22, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
- 33, 32, 28, 27, 26, 22, 22, 22, 21, 21, 21, 20, 20, 19, 33, 33, 33, 32,
- 28, 27, 26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 33, 33, 33, 32, 28, 27,
- 26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 34, 33, 32, 32, 27, 26, 26, 23,
- 22, 22, 23, 23, 22, 21, 21, 20, 34, 33, 32, 31, 27, 26, 25, 23, 22, 22,
- 23, 23, 22, 21, 21, 20, 33, 32, 31, 31, 27, 26, 25, 23, 22, 22, 23, 23,
- 22, 21, 21, 20, 31, 29, 29, 28, 25, 24, 24, 22, 22, 22, 23, 23, 22, 22,
- 22, 21, 31, 29, 28, 28, 25, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21,
- 30, 28, 28, 28, 24, 23, 23, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26,
- 26, 25, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26, 26, 25,
- 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 26, 26, 25, 24, 22, 22,
- 22, 21, 21, 21, 22, 22, 22, 21, 21, 20, 24, 24, 24, 24, 22, 22, 21, 20,
- 20, 20, 21, 21, 20, 20, 20, 20, 24, 24, 24, 24, 22, 22, 21, 20, 20, 20,
- 21, 21, 20, 20, 20, 20, 23, 23, 23, 23, 22, 22, 21, 20, 20, 20, 20, 20,
- 20, 20, 20, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19,
- 19, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19,
- 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 18, 21, 22,
- 22, 22, 22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 22, 22,
- 22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 23, 23, 22, 22,
- 22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 21, 22, 23, 23, 23, 22, 22, 20,
- 19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 23, 23, 22, 22, 22, 20, 19, 19,
- 18, 18, 17, 17, 17, 16, 20, 22, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17,
- 17, 16, 16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16,
- 16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16, 16, 16,
- 20, 21, 21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 20, 21,
- 21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 19, 20, 21, 21,
- 21, 21, 21, 19, 19, 18, 17, 17, 16, 15, 15, 15, 19, 20, 20, 20, 21, 21,
- 20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 19, 20, 20, 20, 21, 21, 20, 19,
- 19, 18, 17, 17, 16, 15, 15, 14,
- /* Size 32x16 */
32, 33, 33, 33, 34, 34, 33, 31, 31, 30, 28, 28, 26, 24, 24, 23, 21, 21,
21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 33, 33, 33, 33,
33, 33, 32, 29, 29, 28, 26, 26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 22,
@@ -10321,33 +10292,47 @@
18, 17, 17, 17, 16, 16, 16, 16, 16, 15, 15, 15, 19, 19, 20, 20, 20, 20,
20, 21, 21, 21, 21, 21, 20, 20, 20, 19, 19, 19, 18, 17, 17, 17, 17, 16,
16, 16, 16, 15, 15, 15, 14, 14,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 29, 28, 27, 22, 21, 21, 21, 21, 20, 20, 20, 19, 33, 33,
+ 33, 32, 28, 27, 26, 22, 22, 22, 21, 21, 21, 20, 20, 19, 33, 33, 33, 32,
+ 28, 27, 26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 33, 33, 33, 32, 28, 27,
+ 26, 22, 22, 22, 22, 22, 21, 20, 20, 20, 34, 33, 32, 32, 27, 26, 26, 23,
+ 22, 22, 23, 23, 22, 21, 21, 20, 34, 33, 32, 31, 27, 26, 25, 23, 22, 22,
+ 23, 23, 22, 21, 21, 20, 33, 32, 31, 31, 27, 26, 25, 23, 22, 22, 23, 23,
+ 22, 21, 21, 20, 31, 29, 29, 28, 25, 24, 24, 22, 22, 22, 23, 23, 22, 22,
+ 22, 21, 31, 29, 28, 28, 25, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21,
+ 30, 28, 28, 28, 24, 23, 23, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26,
+ 26, 25, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 28, 26, 26, 25,
+ 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 22, 21, 26, 26, 25, 24, 22, 22,
+ 22, 21, 21, 21, 22, 22, 22, 21, 21, 20, 24, 24, 24, 24, 22, 22, 21, 20,
+ 20, 20, 21, 21, 20, 20, 20, 20, 24, 24, 24, 24, 22, 22, 21, 20, 20, 20,
+ 21, 21, 20, 20, 20, 20, 23, 23, 23, 23, 22, 22, 21, 20, 20, 20, 20, 20,
+ 20, 20, 20, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19,
+ 19, 19, 21, 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 19,
+ 21, 22, 22, 22, 22, 22, 21, 20, 19, 19, 19, 19, 19, 18, 18, 18, 21, 22,
+ 22, 22, 22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 22, 22,
+ 22, 22, 21, 20, 19, 19, 18, 18, 18, 18, 18, 17, 21, 22, 23, 23, 22, 22,
+ 22, 20, 19, 19, 18, 18, 18, 17, 17, 17, 21, 22, 23, 23, 23, 22, 22, 20,
+ 19, 19, 18, 18, 17, 17, 17, 17, 21, 22, 23, 23, 22, 22, 22, 20, 19, 19,
+ 18, 18, 17, 17, 17, 16, 20, 22, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17,
+ 17, 16, 16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16,
+ 16, 16, 20, 21, 22, 22, 22, 22, 21, 19, 19, 19, 17, 17, 17, 16, 16, 16,
+ 20, 21, 21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 20, 21,
+ 21, 21, 22, 22, 21, 19, 19, 18, 17, 17, 16, 16, 16, 15, 19, 20, 21, 21,
+ 21, 21, 21, 19, 19, 18, 17, 17, 16, 15, 15, 15, 19, 20, 20, 20, 21, 21,
+ 20, 19, 19, 18, 17, 17, 16, 15, 15, 14, 19, 20, 20, 20, 21, 21, 20, 19,
+ 19, 18, 17, 17, 16, 15, 15, 14,
/* Size 4x16 */
- 33, 28, 21, 20, 33, 27, 22, 20, 33, 26, 22, 21, 32, 26, 22, 21, 29, 24,
- 22, 22, 26, 22, 22, 22, 26, 22, 21, 21, 24, 22, 20, 20, 22, 21, 19, 19,
- 22, 22, 19, 18, 22, 22, 19, 18, 22, 22, 19, 17, 22, 22, 19, 16, 21, 22,
- 19, 16, 21, 22, 18, 16, 20, 21, 18, 15,
- /* Size 16x4 */
33, 33, 33, 32, 29, 26, 26, 24, 22, 22, 22, 22, 22, 21, 21, 20, 28, 27,
26, 26, 24, 22, 22, 22, 21, 22, 22, 22, 22, 22, 22, 21, 21, 22, 22, 22,
22, 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 18, 20, 20, 21, 21, 22, 22,
21, 20, 19, 18, 18, 17, 16, 16, 16, 15,
+ /* Size 16x4 */
+ 33, 28, 21, 20, 33, 27, 22, 20, 33, 26, 22, 21, 32, 26, 22, 21, 29, 24,
+ 22, 22, 26, 22, 22, 22, 26, 22, 21, 21, 24, 22, 20, 20, 22, 21, 19, 19,
+ 22, 22, 19, 18, 22, 22, 19, 18, 22, 22, 19, 17, 22, 22, 19, 16, 21, 22,
+ 19, 16, 21, 22, 18, 16, 20, 21, 18, 15,
/* Size 8x32 */
- 32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 21, 21, 20, 33, 33,
- 28, 26, 22, 22, 21, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32, 27, 26,
- 22, 23, 22, 21, 34, 32, 27, 25, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23,
- 22, 21, 31, 29, 25, 24, 22, 23, 22, 22, 31, 28, 25, 23, 22, 22, 22, 22,
- 30, 28, 24, 23, 22, 23, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 28, 26,
- 23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22, 22, 21, 24, 24, 22, 21,
- 20, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 20, 23, 23, 22, 21, 20, 20,
- 20, 20, 21, 22, 21, 21, 19, 19, 19, 19, 21, 22, 21, 21, 19, 19, 19, 19,
- 21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 22,
- 22, 21, 19, 18, 18, 18, 21, 23, 22, 22, 19, 18, 18, 17, 21, 23, 23, 22,
- 19, 18, 17, 17, 21, 23, 22, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17,
- 17, 16, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21, 19, 17, 17, 16,
- 20, 21, 22, 21, 19, 17, 16, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 21,
- 21, 21, 19, 17, 16, 15, 19, 20, 21, 20, 19, 17, 16, 15, 19, 20, 21, 20,
- 19, 17, 16, 15,
- /* Size 32x8 */
32, 33, 33, 33, 34, 34, 33, 31, 31, 30, 28, 28, 26, 24, 24, 23, 21, 21,
21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 33, 33, 33, 33,
32, 32, 31, 29, 28, 28, 26, 26, 25, 24, 24, 23, 22, 22, 22, 22, 22, 23,
@@ -10362,7 +10347,23 @@
22, 22, 22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 18, 18, 18, 17, 17,
17, 17, 17, 16, 16, 16, 16, 16, 20, 20, 20, 20, 21, 21, 21, 22, 22, 22,
22, 22, 21, 20, 20, 20, 19, 19, 18, 18, 18, 17, 17, 17, 16, 16, 16, 16,
- 16, 15, 15, 15 },
+ 16, 15, 15, 15,
+ /* Size 32x8 */
+ 32, 33, 29, 27, 21, 21, 20, 20, 33, 33, 28, 26, 22, 21, 21, 20, 33, 33,
+ 28, 26, 22, 22, 21, 20, 33, 33, 28, 26, 22, 22, 21, 20, 34, 32, 27, 26,
+ 22, 23, 22, 21, 34, 32, 27, 25, 22, 23, 22, 21, 33, 31, 27, 25, 22, 23,
+ 22, 21, 31, 29, 25, 24, 22, 23, 22, 22, 31, 28, 25, 23, 22, 22, 22, 22,
+ 30, 28, 24, 23, 22, 23, 22, 22, 28, 26, 23, 22, 22, 23, 22, 22, 28, 26,
+ 23, 22, 22, 23, 22, 22, 26, 25, 22, 22, 21, 22, 22, 21, 24, 24, 22, 21,
+ 20, 21, 20, 20, 24, 24, 22, 21, 20, 21, 20, 20, 23, 23, 22, 21, 20, 20,
+ 20, 20, 21, 22, 21, 21, 19, 19, 19, 19, 21, 22, 21, 21, 19, 19, 19, 19,
+ 21, 22, 22, 21, 19, 19, 19, 18, 21, 22, 22, 21, 19, 18, 18, 18, 21, 22,
+ 22, 21, 19, 18, 18, 18, 21, 23, 22, 22, 19, 18, 18, 17, 21, 23, 23, 22,
+ 19, 18, 17, 17, 21, 23, 22, 22, 19, 18, 17, 17, 20, 22, 22, 21, 19, 17,
+ 17, 16, 20, 22, 22, 21, 19, 17, 17, 16, 20, 22, 22, 21, 19, 17, 17, 16,
+ 20, 21, 22, 21, 19, 17, 16, 16, 20, 21, 22, 21, 19, 17, 16, 16, 19, 21,
+ 21, 21, 19, 17, 16, 15, 19, 20, 21, 20, 19, 17, 16, 15, 19, 20, 21, 20,
+ 19, 17, 16, 15 },
},
{
{ /* Luma */
@@ -10448,21 +10449,12 @@
14, 14, 20, 20, 21, 21, 21, 22, 22, 22, 21, 21, 21, 21, 21, 21, 20, 19,
19, 19, 18, 18, 18, 17, 16, 16, 16, 15, 15, 15, 14, 14, 14, 13,
/* Size 4x8 */
- 33, 32, 29, 24, 32, 31, 30, 25, 32, 30, 28, 24, 32, 29, 27, 24, 30, 28,
- 24, 21, 28, 26, 21, 18, 24, 24, 19, 16, 22, 22, 18, 15,
- /* Size 8x4 */
33, 32, 32, 32, 30, 28, 24, 22, 32, 31, 30, 29, 28, 26, 24, 22, 29, 30,
28, 27, 24, 21, 19, 18, 24, 25, 24, 24, 21, 18, 16, 15,
+ /* Size 8x4 */
+ 33, 32, 29, 24, 32, 31, 30, 25, 32, 30, 28, 24, 32, 29, 27, 24, 30, 28,
+ 24, 21, 28, 26, 21, 18, 24, 24, 19, 16, 22, 22, 18, 15,
/* Size 8x16 */
- 32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32,
- 32, 32, 30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 31, 30,
- 29, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27,
- 24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24, 21, 20,
- 29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 26, 28,
- 26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20, 17, 17, 23, 25, 24, 24,
- 20, 19, 16, 16, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23, 23, 22, 19, 18,
- 15, 15,
- /* Size 16x8 */
32, 33, 33, 33, 33, 32, 32, 32, 30, 29, 28, 26, 25, 23, 22, 21, 33, 32,
32, 32, 32, 32, 31, 31, 30, 30, 30, 28, 26, 25, 23, 23, 33, 32, 32, 32,
31, 31, 30, 30, 29, 28, 28, 26, 26, 24, 23, 23, 32, 32, 32, 31, 30, 30,
@@ -10471,37 +10463,16 @@
21, 20, 20, 19, 18, 18, 23, 24, 25, 25, 24, 24, 24, 23, 21, 20, 19, 18,
17, 16, 16, 15, 22, 23, 23, 23, 23, 23, 23, 22, 20, 19, 18, 17, 17, 16,
15, 15,
+ /* Size 16x8 */
+ 32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32,
+ 32, 32, 30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 31, 30,
+ 29, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27,
+ 24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24, 21, 20,
+ 29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 26, 28,
+ 26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20, 17, 17, 23, 25, 24, 24,
+ 20, 19, 16, 16, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23, 23, 22, 19, 18,
+ 15, 15,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 32, 32, 32, 29, 28, 28, 26, 23, 23, 22, 19, 33, 33,
- 32, 32, 32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 22, 20, 33, 32, 32, 32,
- 32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32,
- 32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32, 32, 31,
- 30, 29, 29, 26, 25, 25, 23, 20, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30,
- 30, 27, 25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 27,
- 25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 29, 29, 27, 25, 25,
- 23, 21, 33, 32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 26, 24, 24, 23, 21,
- 32, 32, 32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32,
- 32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32, 32, 32,
- 31, 29, 29, 29, 28, 28, 28, 26, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29,
- 29, 28, 28, 27, 27, 25, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29, 29, 28,
- 28, 27, 27, 25, 24, 24, 23, 21, 32, 31, 31, 31, 30, 28, 28, 28, 26, 26,
- 26, 24, 23, 23, 22, 20, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23,
- 21, 21, 20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23, 21, 21,
- 20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 24, 24, 24, 22, 21, 21, 20, 19,
- 29, 29, 30, 30, 28, 27, 27, 26, 23, 22, 22, 20, 20, 20, 19, 17, 28, 29,
- 30, 30, 28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 28, 29, 30, 30,
- 28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 27, 28, 28, 28, 28, 26,
- 26, 25, 22, 21, 21, 19, 18, 18, 18, 16, 26, 27, 28, 28, 26, 26, 26, 24,
- 21, 20, 20, 19, 18, 18, 17, 16, 26, 27, 28, 28, 26, 26, 26, 24, 21, 20,
- 20, 19, 18, 18, 17, 16, 25, 26, 26, 26, 26, 25, 25, 24, 21, 20, 20, 18,
- 17, 17, 17, 15, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16,
- 16, 14, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 16, 14,
- 23, 24, 24, 24, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 15, 14, 22, 23,
- 23, 23, 23, 23, 23, 22, 19, 18, 18, 17, 16, 16, 15, 14, 21, 22, 23, 23,
- 23, 22, 22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 21, 22, 23, 23, 23, 22,
- 22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 20, 21, 22, 22, 21, 21, 21, 20,
- 18, 18, 18, 16, 15, 15, 14, 13,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 30,
29, 28, 28, 27, 26, 26, 25, 23, 23, 23, 22, 21, 21, 20, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 29, 29, 29, 28,
@@ -10531,33 +10502,47 @@
18, 18, 17, 17, 17, 16, 16, 15, 15, 15, 15, 14, 19, 20, 20, 20, 20, 21,
21, 21, 21, 20, 20, 21, 21, 21, 20, 19, 19, 19, 17, 17, 17, 16, 16, 16,
15, 14, 14, 14, 14, 13, 13, 13,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 32, 32, 32, 29, 28, 28, 26, 23, 23, 22, 19, 33, 33,
+ 32, 32, 32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 22, 20, 33, 32, 32, 32,
+ 32, 32, 32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32,
+ 32, 31, 29, 29, 29, 26, 24, 24, 23, 20, 33, 32, 32, 32, 32, 32, 32, 31,
+ 30, 29, 29, 26, 25, 25, 23, 20, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30,
+ 30, 27, 25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 27,
+ 25, 25, 23, 21, 33, 32, 32, 32, 32, 31, 31, 31, 30, 29, 29, 27, 25, 25,
+ 23, 21, 33, 32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 26, 24, 24, 23, 21,
+ 32, 32, 32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32,
+ 32, 32, 31, 30, 30, 30, 28, 28, 28, 26, 24, 24, 23, 20, 32, 32, 32, 32,
+ 31, 29, 29, 29, 28, 28, 28, 26, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29,
+ 29, 28, 28, 27, 27, 25, 24, 24, 23, 21, 32, 32, 31, 31, 30, 29, 29, 28,
+ 28, 27, 27, 25, 24, 24, 23, 21, 32, 31, 31, 31, 30, 28, 28, 28, 26, 26,
+ 26, 24, 23, 23, 22, 20, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23,
+ 21, 21, 20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 25, 24, 24, 23, 21, 21,
+ 20, 19, 30, 30, 30, 30, 29, 28, 28, 27, 24, 24, 24, 22, 21, 21, 20, 19,
+ 29, 29, 30, 30, 28, 27, 27, 26, 23, 22, 22, 20, 20, 20, 19, 17, 28, 29,
+ 30, 30, 28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 28, 29, 30, 30,
+ 28, 27, 27, 26, 22, 21, 21, 20, 19, 19, 18, 17, 27, 28, 28, 28, 28, 26,
+ 26, 25, 22, 21, 21, 19, 18, 18, 18, 16, 26, 27, 28, 28, 26, 26, 26, 24,
+ 21, 20, 20, 19, 18, 18, 17, 16, 26, 27, 28, 28, 26, 26, 26, 24, 21, 20,
+ 20, 19, 18, 18, 17, 16, 25, 26, 26, 26, 26, 25, 25, 24, 21, 20, 20, 18,
+ 17, 17, 17, 15, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16,
+ 16, 14, 23, 24, 25, 25, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 16, 14,
+ 23, 24, 24, 24, 24, 24, 24, 23, 20, 19, 19, 17, 16, 16, 15, 14, 22, 23,
+ 23, 23, 23, 23, 23, 22, 19, 18, 18, 17, 16, 16, 15, 14, 21, 22, 23, 23,
+ 23, 22, 22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 21, 22, 23, 23, 23, 22,
+ 22, 21, 19, 18, 18, 17, 15, 15, 15, 13, 20, 21, 22, 22, 21, 21, 21, 20,
+ 18, 18, 18, 16, 15, 15, 14, 13,
/* Size 4x16 */
- 33, 32, 28, 23, 32, 32, 29, 24, 32, 32, 29, 25, 32, 31, 30, 25, 32, 30,
- 28, 24, 32, 30, 28, 24, 32, 29, 27, 24, 31, 28, 26, 23, 30, 28, 24, 21,
- 29, 27, 22, 20, 29, 27, 21, 19, 27, 26, 20, 18, 26, 25, 20, 17, 24, 24,
- 19, 16, 23, 23, 18, 16, 22, 22, 18, 15,
- /* Size 16x4 */
33, 32, 32, 32, 32, 32, 32, 31, 30, 29, 29, 27, 26, 24, 23, 22, 32, 32,
32, 31, 30, 30, 29, 28, 28, 27, 27, 26, 25, 24, 23, 22, 28, 29, 29, 30,
28, 28, 27, 26, 24, 22, 21, 20, 20, 19, 18, 18, 23, 24, 25, 25, 24, 24,
24, 23, 21, 20, 19, 18, 17, 16, 16, 15,
+ /* Size 16x4 */
+ 33, 32, 28, 23, 32, 32, 29, 24, 32, 32, 29, 25, 32, 31, 30, 25, 32, 30,
+ 28, 24, 32, 30, 28, 24, 32, 29, 27, 24, 31, 28, 26, 23, 30, 28, 24, 21,
+ 29, 27, 22, 20, 29, 27, 21, 19, 27, 26, 20, 18, 26, 25, 20, 17, 24, 24,
+ 19, 16, 23, 23, 18, 16, 22, 22, 18, 15,
/* Size 8x32 */
- 32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 22, 33, 32,
- 32, 32, 29, 29, 24, 23, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32, 32, 32,
- 30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 32, 31, 30, 30,
- 25, 23, 33, 32, 32, 31, 30, 29, 25, 23, 33, 32, 31, 30, 29, 28, 24, 23,
- 32, 32, 31, 30, 28, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 32,
- 31, 29, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27, 24, 23, 32, 31, 30, 29,
- 28, 27, 24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24,
- 21, 20, 30, 30, 29, 28, 25, 24, 21, 20, 30, 30, 29, 28, 24, 24, 21, 20,
- 29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 28, 30,
- 28, 27, 22, 21, 19, 18, 27, 28, 28, 26, 22, 21, 18, 18, 26, 28, 26, 26,
- 21, 20, 18, 17, 26, 28, 26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20,
- 17, 17, 23, 25, 24, 24, 20, 19, 16, 16, 23, 25, 24, 24, 20, 19, 16, 16,
- 23, 24, 24, 24, 20, 19, 16, 15, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23,
- 23, 22, 19, 18, 15, 15, 21, 23, 23, 22, 19, 18, 15, 15, 20, 22, 21, 21,
- 18, 18, 15, 14,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 30, 30,
29, 28, 28, 27, 26, 26, 25, 23, 23, 23, 22, 21, 21, 20, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 30, 28,
@@ -10572,7 +10557,23 @@
25, 25, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19, 18, 18, 18,
17, 16, 16, 16, 16, 15, 15, 15, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 23, 23, 22, 20, 20, 20, 19, 18, 18, 18, 17, 17, 17, 16, 16, 15,
- 15, 15, 15, 14 },
+ 15, 15, 15, 14,
+ /* Size 32x8 */
+ 32, 33, 33, 32, 29, 28, 23, 22, 33, 32, 32, 32, 29, 29, 24, 22, 33, 32,
+ 32, 32, 29, 29, 24, 23, 33, 32, 32, 32, 29, 29, 24, 23, 33, 32, 32, 32,
+ 30, 29, 25, 23, 33, 32, 32, 31, 30, 30, 25, 23, 33, 32, 32, 31, 30, 30,
+ 25, 23, 33, 32, 32, 31, 30, 29, 25, 23, 33, 32, 31, 30, 29, 28, 24, 23,
+ 32, 32, 31, 30, 28, 28, 24, 23, 32, 32, 31, 30, 28, 28, 24, 23, 32, 32,
+ 31, 29, 28, 28, 24, 23, 32, 31, 30, 29, 28, 27, 24, 23, 32, 31, 30, 29,
+ 28, 27, 24, 23, 32, 31, 30, 28, 26, 26, 23, 22, 30, 30, 29, 28, 25, 24,
+ 21, 20, 30, 30, 29, 28, 25, 24, 21, 20, 30, 30, 29, 28, 24, 24, 21, 20,
+ 29, 30, 28, 27, 23, 22, 20, 19, 28, 30, 28, 27, 22, 21, 19, 18, 28, 30,
+ 28, 27, 22, 21, 19, 18, 27, 28, 28, 26, 22, 21, 18, 18, 26, 28, 26, 26,
+ 21, 20, 18, 17, 26, 28, 26, 26, 21, 20, 18, 17, 25, 26, 26, 25, 21, 20,
+ 17, 17, 23, 25, 24, 24, 20, 19, 16, 16, 23, 25, 24, 24, 20, 19, 16, 16,
+ 23, 24, 24, 24, 20, 19, 16, 15, 22, 23, 23, 23, 19, 18, 16, 15, 21, 23,
+ 23, 22, 19, 18, 15, 15, 21, 23, 23, 22, 19, 18, 15, 15, 20, 22, 21, 21,
+ 18, 18, 15, 14 },
{ /* Chroma */
/* Size 4x4 */
33, 28, 22, 22, 28, 23, 22, 23, 22, 22, 19, 19, 22, 23, 19, 17,
@@ -10656,21 +10657,12 @@
17, 16, 20, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20,
20, 20, 19, 19, 19, 18, 18, 18, 18, 17, 17, 17, 17, 16, 16, 16,
/* Size 4x8 */
- 33, 27, 22, 21, 33, 26, 22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 24, 22,
- 20, 20, 22, 22, 19, 19, 22, 22, 19, 18, 21, 22, 19, 17,
- /* Size 8x4 */
33, 33, 29, 26, 24, 22, 22, 21, 27, 26, 24, 22, 22, 22, 22, 22, 22, 22,
22, 22, 20, 19, 19, 19, 21, 23, 22, 23, 20, 19, 18, 17,
+ /* Size 8x4 */
+ 33, 27, 22, 21, 33, 26, 22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 24, 22,
+ 20, 20, 22, 22, 19, 19, 22, 22, 19, 18, 21, 22, 19, 17,
/* Size 8x16 */
- 32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32,
- 30, 26, 23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24,
- 22, 22, 23, 22, 31, 28, 27, 24, 22, 22, 22, 22, 28, 26, 24, 22, 22, 22,
- 23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20, 21, 20,
- 22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
- 22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19, 18, 18, 21, 23, 23, 22,
- 20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19,
- 17, 17,
- /* Size 16x8 */
32, 33, 33, 34, 31, 31, 28, 26, 24, 22, 21, 21, 21, 21, 20, 20, 33, 33,
32, 32, 29, 28, 26, 25, 24, 22, 22, 22, 23, 23, 22, 22, 31, 30, 30, 29,
28, 27, 24, 24, 23, 22, 22, 22, 22, 23, 22, 22, 28, 27, 26, 26, 24, 24,
@@ -10679,37 +10671,16 @@
19, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 23, 22, 21, 19, 19, 18,
18, 18, 17, 17, 20, 21, 22, 22, 22, 22, 22, 22, 20, 19, 19, 18, 18, 17,
17, 17,
+ /* Size 16x8 */
+ 32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32,
+ 30, 26, 23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24,
+ 22, 22, 23, 22, 31, 28, 27, 24, 22, 22, 22, 22, 28, 26, 24, 22, 22, 22,
+ 23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20, 21, 20,
+ 22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
+ 22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19, 18, 18, 21, 23, 23, 22,
+ 20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19,
+ 17, 17,
/* Size 16x32 */
- 32, 33, 33, 33, 31, 28, 28, 27, 23, 21, 21, 21, 21, 21, 20, 20, 33, 33,
- 33, 33, 31, 27, 27, 26, 23, 22, 22, 21, 21, 21, 21, 20, 33, 33, 33, 33,
- 30, 27, 27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 33, 33, 30, 27,
- 27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 32, 32, 30, 26, 26, 26,
- 23, 22, 22, 22, 22, 22, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22,
- 22, 23, 23, 23, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22, 22, 23,
- 23, 23, 22, 21, 33, 32, 31, 31, 29, 26, 26, 25, 23, 22, 22, 23, 23, 23,
- 22, 21, 31, 30, 29, 29, 28, 24, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22,
- 31, 29, 28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 31, 29,
- 28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27,
- 25, 23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22,
- 22, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
- 22, 22, 22, 22, 23, 23, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 21, 21,
- 21, 22, 22, 22, 22, 21, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21,
- 21, 21, 20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21,
- 20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 20, 20, 20, 20, 20, 20, 20, 20,
- 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 21, 22,
- 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22,
- 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22,
- 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 21, 22, 22, 22, 22, 22, 22, 21,
- 20, 19, 19, 19, 18, 18, 18, 18, 21, 22, 22, 22, 22, 22, 22, 21, 20, 19,
- 19, 19, 18, 18, 18, 18, 21, 22, 23, 23, 22, 22, 22, 22, 20, 19, 19, 19,
- 18, 18, 18, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18,
- 17, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17,
- 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 20, 21,
- 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22,
- 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22,
- 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22, 22, 21,
- 20, 19, 19, 18, 17, 17, 17, 16,
- /* Size 32x16 */
32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 24, 24,
22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 33, 33, 33, 33,
33, 33, 33, 32, 30, 29, 29, 28, 26, 26, 26, 24, 24, 24, 22, 22, 22, 22,
@@ -10739,33 +10710,47 @@
19, 19, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 20, 20, 20, 20, 21, 21,
21, 21, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 19, 19, 19, 18, 18, 18,
17, 17, 17, 17, 16, 16, 16, 16,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 31, 28, 28, 27, 23, 21, 21, 21, 21, 21, 20, 20, 33, 33,
+ 33, 33, 31, 27, 27, 26, 23, 22, 22, 21, 21, 21, 21, 20, 33, 33, 33, 33,
+ 30, 27, 27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 33, 33, 30, 27,
+ 27, 26, 23, 22, 22, 22, 22, 22, 21, 20, 33, 33, 32, 32, 30, 26, 26, 26,
+ 23, 22, 22, 22, 22, 22, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22,
+ 22, 23, 23, 23, 22, 21, 34, 33, 32, 32, 29, 26, 26, 25, 23, 22, 22, 23,
+ 23, 23, 22, 21, 33, 32, 31, 31, 29, 26, 26, 25, 23, 22, 22, 23, 23, 23,
+ 22, 21, 31, 30, 29, 29, 28, 24, 24, 24, 22, 22, 22, 22, 23, 23, 22, 22,
+ 31, 29, 28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 31, 29,
+ 28, 28, 27, 24, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27,
+ 25, 23, 23, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22,
+ 22, 22, 22, 22, 22, 22, 23, 23, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
+ 22, 22, 22, 22, 23, 23, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 21, 21,
+ 21, 22, 22, 22, 22, 21, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21,
+ 21, 21, 20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 21, 20, 20, 21, 21, 21,
+ 20, 20, 24, 24, 24, 24, 23, 22, 22, 21, 20, 20, 20, 20, 20, 20, 20, 20,
+ 22, 22, 22, 22, 22, 21, 21, 21, 20, 20, 20, 20, 19, 19, 19, 19, 21, 22,
+ 22, 22, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22,
+ 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22,
+ 22, 21, 20, 19, 19, 19, 19, 19, 19, 18, 21, 22, 22, 22, 22, 22, 22, 21,
+ 20, 19, 19, 19, 18, 18, 18, 18, 21, 22, 22, 22, 22, 22, 22, 21, 20, 19,
+ 19, 19, 18, 18, 18, 18, 21, 22, 23, 23, 22, 22, 22, 22, 20, 19, 19, 19,
+ 18, 18, 18, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18,
+ 17, 17, 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17,
+ 21, 22, 23, 23, 23, 22, 22, 22, 20, 19, 19, 18, 18, 18, 17, 17, 20, 21,
+ 22, 22, 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22,
+ 22, 22, 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22,
+ 22, 21, 20, 19, 19, 18, 17, 17, 17, 16, 20, 21, 22, 22, 22, 22, 22, 21,
+ 20, 19, 19, 18, 17, 17, 17, 16,
/* Size 4x16 */
- 33, 28, 21, 21, 33, 27, 22, 22, 33, 26, 22, 22, 33, 26, 22, 23, 30, 24,
- 22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 26, 22, 21, 22, 24, 22, 20, 21,
- 22, 21, 20, 19, 22, 21, 19, 19, 22, 22, 19, 18, 22, 22, 19, 18, 22, 22,
- 19, 18, 21, 22, 19, 17, 21, 22, 19, 17,
- /* Size 16x4 */
33, 33, 33, 33, 30, 29, 26, 26, 24, 22, 22, 22, 22, 22, 21, 21, 28, 27,
26, 26, 24, 24, 22, 22, 22, 21, 21, 22, 22, 22, 22, 22, 21, 22, 22, 22,
22, 22, 22, 21, 20, 20, 19, 19, 19, 19, 19, 19, 21, 22, 22, 23, 23, 22,
23, 22, 21, 19, 19, 18, 18, 18, 17, 17,
+ /* Size 16x4 */
+ 33, 28, 21, 21, 33, 27, 22, 22, 33, 26, 22, 22, 33, 26, 22, 23, 30, 24,
+ 22, 23, 29, 24, 22, 22, 26, 22, 22, 23, 26, 22, 21, 22, 24, 22, 20, 21,
+ 22, 21, 20, 19, 22, 21, 19, 19, 22, 22, 19, 18, 22, 22, 19, 18, 22, 22,
+ 19, 18, 21, 22, 19, 17, 21, 22, 19, 17,
/* Size 8x32 */
- 32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 31, 27, 23, 22, 21, 21, 33, 33,
- 30, 27, 23, 22, 22, 21, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32, 30, 26,
- 23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 34, 32, 29, 26, 23, 22,
- 23, 22, 33, 31, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24, 22, 22, 23, 22,
- 31, 28, 27, 24, 22, 22, 22, 22, 31, 28, 27, 24, 22, 22, 22, 22, 29, 27,
- 25, 23, 22, 22, 23, 22, 28, 26, 24, 22, 22, 22, 23, 22, 28, 26, 24, 22,
- 22, 22, 23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20,
- 21, 20, 24, 24, 23, 22, 21, 20, 21, 20, 24, 24, 23, 22, 20, 20, 20, 20,
- 22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
- 22, 21, 20, 19, 19, 19, 21, 22, 22, 22, 20, 19, 19, 19, 21, 22, 22, 22,
- 20, 19, 18, 18, 21, 22, 22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19,
- 18, 18, 21, 23, 23, 22, 20, 19, 18, 17, 21, 23, 23, 22, 20, 19, 18, 17,
- 21, 23, 23, 22, 20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22,
- 22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22,
- 20, 19, 17, 17,
- /* Size 32x8 */
32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 24, 24,
22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 20, 20, 20, 20, 33, 33, 33, 33,
32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 24, 24, 22, 22, 22, 22,
@@ -10780,7 +10765,23 @@
23, 23, 23, 22, 22, 23, 23, 23, 22, 21, 21, 20, 19, 19, 19, 19, 18, 18,
18, 18, 18, 18, 17, 17, 17, 17, 20, 21, 21, 21, 22, 22, 22, 22, 22, 22,
22, 22, 22, 22, 22, 20, 20, 20, 19, 19, 19, 19, 18, 18, 18, 17, 17, 17,
- 17, 17, 17, 17 },
+ 17, 17, 17, 17,
+ /* Size 32x8 */
+ 32, 33, 31, 28, 23, 21, 21, 20, 33, 33, 31, 27, 23, 22, 21, 21, 33, 33,
+ 30, 27, 23, 22, 22, 21, 33, 33, 30, 27, 23, 22, 22, 21, 33, 32, 30, 26,
+ 23, 22, 22, 22, 34, 32, 29, 26, 23, 22, 23, 22, 34, 32, 29, 26, 23, 22,
+ 23, 22, 33, 31, 29, 26, 23, 22, 23, 22, 31, 29, 28, 24, 22, 22, 23, 22,
+ 31, 28, 27, 24, 22, 22, 22, 22, 31, 28, 27, 24, 22, 22, 22, 22, 29, 27,
+ 25, 23, 22, 22, 23, 22, 28, 26, 24, 22, 22, 22, 23, 22, 28, 26, 24, 22,
+ 22, 22, 23, 22, 26, 25, 24, 22, 21, 21, 22, 22, 24, 24, 23, 22, 21, 20,
+ 21, 20, 24, 24, 23, 22, 21, 20, 21, 20, 24, 24, 23, 22, 20, 20, 20, 20,
+ 22, 22, 22, 21, 20, 20, 19, 19, 21, 22, 22, 21, 20, 19, 19, 19, 21, 22,
+ 22, 21, 20, 19, 19, 19, 21, 22, 22, 22, 20, 19, 19, 19, 21, 22, 22, 22,
+ 20, 19, 18, 18, 21, 22, 22, 22, 20, 19, 18, 18, 21, 23, 22, 22, 20, 19,
+ 18, 18, 21, 23, 23, 22, 20, 19, 18, 17, 21, 23, 23, 22, 20, 19, 18, 17,
+ 21, 23, 23, 22, 20, 19, 18, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22,
+ 22, 22, 20, 19, 17, 17, 20, 22, 22, 22, 20, 19, 17, 17, 20, 22, 22, 22,
+ 20, 19, 17, 17 },
},
{
{ /* Luma */
@@ -10866,21 +10867,12 @@
16, 16, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 24, 24, 24, 24, 24, 24,
24, 23, 22, 22, 22, 20, 19, 19, 19, 18, 18, 18, 18, 17, 16, 16,
/* Size 4x8 */
- 33, 32, 30, 26, 32, 32, 30, 27, 32, 31, 30, 27, 32, 31, 28, 26, 31, 30,
- 27, 24, 30, 28, 25, 22, 28, 27, 23, 20, 26, 26, 22, 18,
- /* Size 8x4 */
33, 32, 32, 32, 31, 30, 28, 26, 32, 32, 31, 31, 30, 28, 27, 26, 30, 30,
30, 28, 27, 25, 23, 22, 26, 27, 27, 26, 24, 22, 20, 18,
+ /* Size 8x4 */
+ 33, 32, 30, 26, 32, 32, 30, 27, 32, 31, 30, 27, 32, 31, 28, 26, 31, 30,
+ 27, 24, 30, 28, 25, 22, 28, 27, 23, 20, 26, 26, 22, 18,
/* Size 8x16 */
- 32, 33, 33, 32, 32, 28, 28, 23, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32,
- 32, 32, 32, 29, 29, 24, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31,
- 31, 30, 30, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28,
- 28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27, 27, 24,
- 30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 28, 30,
- 30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 26, 28, 28, 26,
- 26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 23, 25, 25, 24, 24, 19,
- 19, 16,
- /* Size 16x8 */
32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 30, 28, 28, 26, 26, 23, 33, 32,
32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 33, 32, 32, 32,
32, 32, 32, 31, 31, 30, 30, 30, 30, 28, 28, 25, 32, 32, 32, 31, 31, 30,
@@ -10889,37 +10881,16 @@
24, 21, 21, 20, 20, 19, 28, 29, 29, 30, 30, 28, 28, 27, 27, 24, 24, 21,
21, 20, 20, 19, 23, 24, 24, 25, 25, 24, 24, 24, 24, 21, 21, 19, 19, 18,
18, 16,
+ /* Size 16x8 */
+ 32, 33, 33, 32, 32, 28, 28, 23, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32,
+ 32, 32, 32, 29, 29, 24, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31,
+ 31, 30, 30, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28,
+ 28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27, 27, 24,
+ 30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 28, 30,
+ 30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 26, 28, 28, 26,
+ 26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 23, 25, 25, 24, 24, 19,
+ 19, 16,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 28, 28, 26, 23, 23, 33, 33,
- 33, 33, 33, 32, 32, 32, 32, 30, 29, 29, 29, 26, 24, 24, 33, 32, 32, 32,
- 32, 32, 32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32,
- 32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32,
- 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32, 32, 30,
- 29, 29, 29, 27, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30,
- 30, 28, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28,
- 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28, 25, 25,
- 33, 32, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 27, 25, 25, 32, 32,
- 32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32,
- 32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31,
- 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31, 30, 30,
- 30, 28, 28, 28, 28, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28,
- 27, 27, 27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27,
- 27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27, 27, 26,
- 24, 24, 31, 31, 31, 31, 31, 30, 28, 28, 28, 27, 26, 26, 26, 24, 23, 23,
- 30, 30, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30,
- 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30, 30, 30,
- 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 29, 30, 30, 30, 30, 28,
- 28, 28, 28, 25, 23, 23, 23, 22, 20, 20, 28, 29, 30, 30, 30, 28, 27, 27,
- 27, 24, 21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24,
- 21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24, 21, 21,
- 21, 20, 19, 19, 28, 28, 28, 28, 28, 27, 26, 26, 26, 23, 21, 21, 21, 20,
- 18, 18, 26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18,
- 26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 26, 27,
- 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 25, 26, 26, 26,
- 26, 26, 24, 24, 24, 22, 20, 20, 20, 18, 17, 17, 23, 24, 25, 25, 25, 24,
- 24, 24, 24, 21, 19, 19, 19, 18, 16, 16, 23, 24, 25, 25, 25, 24, 24, 24,
- 24, 21, 19, 19, 19, 18, 16, 16,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
30, 30, 30, 29, 28, 28, 28, 28, 26, 26, 26, 25, 23, 23, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30,
@@ -10949,33 +10920,47 @@
21, 20, 19, 19, 19, 18, 18, 18, 18, 17, 16, 16, 23, 24, 24, 24, 24, 25,
25, 25, 25, 25, 24, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19,
19, 18, 18, 18, 18, 17, 16, 16,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 28, 28, 26, 23, 23, 33, 33,
+ 33, 33, 33, 32, 32, 32, 32, 30, 29, 29, 29, 26, 24, 24, 33, 32, 32, 32,
+ 32, 32, 32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32,
+ 32, 30, 29, 29, 29, 27, 24, 24, 33, 32, 32, 32, 32, 32, 32, 32, 32, 30,
+ 29, 29, 29, 27, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30,
+ 30, 28, 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28,
+ 25, 25, 33, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 28, 25, 25,
+ 33, 32, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 27, 25, 25, 32, 32,
+ 32, 32, 32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32,
+ 32, 31, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31,
+ 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 32, 32, 32, 32, 32, 31, 30, 30,
+ 30, 28, 28, 28, 28, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28,
+ 27, 27, 27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27,
+ 27, 26, 24, 24, 32, 32, 31, 31, 31, 30, 29, 29, 29, 28, 27, 27, 27, 26,
+ 24, 24, 31, 31, 31, 31, 31, 30, 28, 28, 28, 27, 26, 26, 26, 24, 23, 23,
+ 30, 30, 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30,
+ 30, 30, 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 30, 30, 30, 30,
+ 30, 29, 28, 28, 28, 26, 24, 24, 24, 23, 21, 21, 29, 30, 30, 30, 30, 28,
+ 28, 28, 28, 25, 23, 23, 23, 22, 20, 20, 28, 29, 30, 30, 30, 28, 27, 27,
+ 27, 24, 21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24,
+ 21, 21, 21, 20, 19, 19, 28, 29, 30, 30, 30, 28, 27, 27, 27, 24, 21, 21,
+ 21, 20, 19, 19, 28, 28, 28, 28, 28, 27, 26, 26, 26, 23, 21, 21, 21, 20,
+ 18, 18, 26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18,
+ 26, 27, 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 26, 27,
+ 28, 28, 28, 26, 26, 26, 26, 23, 20, 20, 20, 19, 18, 18, 25, 26, 26, 26,
+ 26, 26, 24, 24, 24, 22, 20, 20, 20, 18, 17, 17, 23, 24, 25, 25, 25, 24,
+ 24, 24, 24, 21, 19, 19, 19, 18, 16, 16, 23, 24, 25, 25, 25, 24, 24, 24,
+ 24, 21, 19, 19, 19, 18, 16, 16,
/* Size 4x16 */
- 33, 32, 30, 26, 32, 32, 30, 27, 32, 32, 30, 27, 32, 32, 31, 28, 32, 32,
- 31, 28, 32, 31, 29, 26, 32, 31, 29, 26, 32, 30, 28, 26, 32, 30, 28, 26,
- 30, 29, 26, 23, 30, 29, 26, 23, 29, 28, 24, 20, 29, 28, 24, 20, 27, 26,
- 23, 19, 27, 26, 23, 19, 24, 24, 21, 18,
- /* Size 16x4 */
33, 32, 32, 32, 32, 32, 32, 32, 32, 30, 30, 29, 29, 27, 27, 24, 32, 32,
32, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28, 26, 26, 24, 30, 30, 30, 31,
31, 29, 29, 28, 28, 26, 26, 24, 24, 23, 23, 21, 26, 27, 27, 28, 28, 26,
26, 26, 26, 23, 23, 20, 20, 19, 19, 18,
+ /* Size 16x4 */
+ 33, 32, 30, 26, 32, 32, 30, 27, 32, 32, 30, 27, 32, 32, 31, 28, 32, 32,
+ 31, 28, 32, 31, 29, 26, 32, 31, 29, 26, 32, 30, 28, 26, 32, 30, 28, 26,
+ 30, 29, 26, 23, 30, 29, 26, 23, 29, 28, 24, 20, 29, 28, 24, 20, 27, 26,
+ 23, 19, 27, 26, 23, 19, 24, 24, 21, 18,
/* Size 8x32 */
- 32, 33, 33, 32, 32, 28, 28, 23, 33, 33, 33, 32, 32, 29, 29, 24, 33, 32,
- 32, 32, 32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32, 32, 32,
- 32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 25, 33, 32, 32, 31, 31, 30,
- 30, 25, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31, 31, 30, 30, 25,
- 33, 32, 32, 31, 31, 29, 29, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32,
- 32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30,
- 30, 28, 28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27,
- 27, 24, 32, 31, 31, 29, 29, 27, 27, 24, 31, 31, 31, 28, 28, 26, 26, 23,
- 30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 30, 30,
- 30, 28, 28, 24, 24, 21, 29, 30, 30, 28, 28, 23, 23, 20, 28, 30, 30, 27,
- 27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21,
- 21, 19, 28, 28, 28, 26, 26, 21, 21, 18, 26, 28, 28, 26, 26, 20, 20, 18,
- 26, 28, 28, 26, 26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 25, 26,
- 26, 24, 24, 20, 20, 17, 23, 25, 25, 24, 24, 19, 19, 16, 23, 25, 25, 24,
- 24, 19, 19, 16,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
30, 30, 30, 29, 28, 28, 28, 28, 26, 26, 26, 25, 23, 23, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
@@ -10990,7 +10975,23 @@
30, 30, 30, 29, 28, 28, 28, 28, 27, 27, 27, 26, 24, 24, 24, 23, 21, 21,
21, 21, 20, 20, 20, 20, 19, 19, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25,
24, 24, 24, 24, 24, 24, 24, 23, 21, 21, 21, 20, 19, 19, 19, 18, 18, 18,
- 18, 17, 16, 16 },
+ 18, 17, 16, 16,
+ /* Size 32x8 */
+ 32, 33, 33, 32, 32, 28, 28, 23, 33, 33, 33, 32, 32, 29, 29, 24, 33, 32,
+ 32, 32, 32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 24, 33, 32, 32, 32,
+ 32, 29, 29, 24, 33, 32, 32, 32, 32, 29, 29, 25, 33, 32, 32, 31, 31, 30,
+ 30, 25, 33, 32, 32, 31, 31, 30, 30, 25, 33, 32, 32, 31, 31, 30, 30, 25,
+ 33, 32, 32, 31, 31, 29, 29, 25, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32,
+ 32, 30, 30, 28, 28, 24, 32, 32, 32, 30, 30, 28, 28, 24, 32, 32, 32, 30,
+ 30, 28, 28, 24, 32, 31, 31, 29, 29, 27, 27, 24, 32, 31, 31, 29, 29, 27,
+ 27, 24, 32, 31, 31, 29, 29, 27, 27, 24, 31, 31, 31, 28, 28, 26, 26, 23,
+ 30, 30, 30, 28, 28, 24, 24, 21, 30, 30, 30, 28, 28, 24, 24, 21, 30, 30,
+ 30, 28, 28, 24, 24, 21, 29, 30, 30, 28, 28, 23, 23, 20, 28, 30, 30, 27,
+ 27, 21, 21, 19, 28, 30, 30, 27, 27, 21, 21, 19, 28, 30, 30, 27, 27, 21,
+ 21, 19, 28, 28, 28, 26, 26, 21, 21, 18, 26, 28, 28, 26, 26, 20, 20, 18,
+ 26, 28, 28, 26, 26, 20, 20, 18, 26, 28, 28, 26, 26, 20, 20, 18, 25, 26,
+ 26, 24, 24, 20, 20, 17, 23, 25, 25, 24, 24, 19, 19, 16, 23, 25, 25, 24,
+ 24, 19, 19, 16 },
{ /* Chroma */
/* Size 4x4 */
33, 30, 24, 22, 30, 26, 23, 22, 24, 23, 21, 21, 22, 22, 21, 19,
@@ -11074,21 +11075,12 @@
18, 18, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
23, 22, 21, 21, 21, 20, 19, 19, 19, 19, 19, 19, 19, 18, 18, 18,
/* Size 4x8 */
- 33, 30, 24, 21, 33, 29, 24, 22, 31, 28, 23, 22, 28, 25, 22, 22, 26, 23,
- 21, 21, 23, 22, 21, 20, 22, 22, 20, 19, 22, 22, 21, 19,
- /* Size 8x4 */
33, 33, 31, 28, 26, 23, 22, 22, 30, 29, 28, 25, 23, 22, 22, 22, 24, 24,
23, 22, 21, 21, 20, 21, 21, 22, 22, 22, 21, 20, 19, 19,
+ /* Size 8x4 */
+ 33, 30, 24, 21, 33, 29, 24, 22, 31, 28, 23, 22, 28, 25, 22, 22, 26, 23,
+ 21, 21, 23, 22, 21, 20, 22, 22, 20, 19, 22, 22, 21, 19,
/* Size 8x16 */
- 32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33,
- 33, 27, 27, 22, 22, 22, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26,
- 26, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22,
- 22, 22, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23,
- 24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 21, 22,
- 22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 22,
- 22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19,
- 19, 18,
- /* Size 16x8 */
32, 33, 33, 34, 34, 31, 31, 28, 28, 24, 24, 21, 21, 21, 21, 21, 33, 33,
33, 32, 32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 33, 33, 33, 32,
32, 28, 28, 26, 26, 24, 24, 22, 22, 22, 22, 23, 28, 27, 27, 26, 26, 24,
@@ -11097,37 +11089,16 @@
20, 19, 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 20, 19,
19, 19, 19, 19, 21, 22, 22, 23, 23, 22, 22, 23, 23, 21, 21, 19, 19, 18,
18, 18,
+ /* Size 16x8 */
+ 32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33,
+ 33, 27, 27, 22, 22, 22, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26,
+ 26, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22,
+ 22, 22, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23,
+ 24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 21, 22,
+ 22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 22,
+ 22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19,
+ 19, 18,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 31, 28, 28, 28, 24, 21, 21, 21, 21, 21, 21, 33, 33,
- 33, 33, 33, 30, 28, 28, 28, 24, 22, 22, 22, 21, 21, 21, 33, 33, 33, 33,
- 33, 30, 27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30,
- 27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30, 27, 27,
- 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 32, 32, 32, 29, 26, 26, 26, 24,
- 22, 22, 22, 22, 22, 22, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22,
- 22, 23, 23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23,
- 23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23, 23, 23,
- 32, 31, 30, 30, 30, 28, 25, 25, 25, 23, 22, 22, 22, 22, 23, 23, 31, 30,
- 28, 28, 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28,
- 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28, 28, 26,
- 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27, 27, 25, 23, 23,
- 23, 22, 22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22,
- 22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22,
- 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22, 22, 22,
- 23, 23, 26, 26, 25, 25, 25, 23, 22, 22, 22, 21, 21, 21, 21, 21, 22, 22,
- 24, 24, 24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24,
- 24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24, 24, 24,
- 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 23, 23, 23, 23, 23, 22,
- 22, 22, 22, 21, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 21, 21,
- 21, 20, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20,
- 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20, 19, 19,
- 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19,
- 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18,
- 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22,
- 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23,
- 23, 22, 22, 22, 22, 21, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23, 23, 23,
- 22, 22, 22, 21, 19, 19, 19, 18, 18, 18, 21, 22, 23, 23, 23, 23, 22, 22,
- 22, 21, 19, 19, 19, 18, 18, 18,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 31, 31, 29, 28, 28, 28, 26,
24, 24, 24, 23, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33,
33, 33, 33, 33, 33, 31, 30, 30, 30, 28, 27, 27, 27, 26, 24, 24, 24, 23,
@@ -11157,33 +11128,47 @@
21, 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 21, 21, 22, 22, 22, 22,
23, 23, 23, 23, 22, 22, 22, 23, 23, 23, 23, 22, 21, 21, 21, 20, 19, 19,
19, 19, 18, 18, 18, 18, 18, 18,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 31, 28, 28, 28, 24, 21, 21, 21, 21, 21, 21, 33, 33,
+ 33, 33, 33, 30, 28, 28, 28, 24, 22, 22, 22, 21, 21, 21, 33, 33, 33, 33,
+ 33, 30, 27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30,
+ 27, 27, 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 30, 27, 27,
+ 27, 24, 22, 22, 22, 22, 22, 22, 33, 33, 32, 32, 32, 29, 26, 26, 26, 24,
+ 22, 22, 22, 22, 22, 22, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22,
+ 22, 23, 23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23,
+ 23, 23, 34, 33, 32, 32, 32, 29, 26, 26, 26, 24, 22, 22, 22, 23, 23, 23,
+ 32, 31, 30, 30, 30, 28, 25, 25, 25, 23, 22, 22, 22, 22, 23, 23, 31, 30,
+ 28, 28, 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28,
+ 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 31, 30, 28, 28, 28, 26,
+ 24, 24, 24, 23, 22, 22, 22, 22, 22, 22, 29, 28, 27, 27, 27, 25, 23, 23,
+ 23, 22, 22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22,
+ 22, 22, 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22,
+ 22, 22, 23, 23, 28, 27, 26, 26, 26, 24, 22, 22, 22, 22, 22, 22, 22, 22,
+ 23, 23, 26, 26, 25, 25, 25, 23, 22, 22, 22, 21, 21, 21, 21, 21, 22, 22,
+ 24, 24, 24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24,
+ 24, 24, 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 24, 24, 24, 24,
+ 24, 23, 22, 22, 22, 21, 20, 20, 20, 20, 21, 21, 23, 23, 23, 23, 23, 22,
+ 22, 22, 22, 21, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 21, 21,
+ 21, 20, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20,
+ 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 21, 21, 21, 20, 19, 19,
+ 19, 19, 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19,
+ 19, 19, 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18,
+ 21, 22, 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22,
+ 22, 22, 22, 22, 22, 22, 22, 20, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23,
+ 23, 22, 22, 22, 22, 21, 19, 19, 19, 19, 18, 18, 21, 22, 23, 23, 23, 23,
+ 22, 22, 22, 21, 19, 19, 19, 18, 18, 18, 21, 22, 23, 23, 23, 23, 22, 22,
+ 22, 21, 19, 19, 19, 18, 18, 18,
/* Size 4x16 */
- 33, 31, 24, 21, 33, 30, 24, 22, 33, 30, 24, 22, 33, 29, 24, 23, 33, 29,
- 24, 23, 30, 26, 23, 22, 30, 26, 23, 22, 27, 24, 22, 22, 27, 24, 22, 22,
- 24, 23, 21, 20, 24, 23, 21, 20, 21, 22, 20, 19, 21, 22, 20, 19, 22, 22,
- 20, 19, 22, 22, 20, 19, 22, 23, 21, 18,
- /* Size 16x4 */
33, 33, 33, 33, 33, 30, 30, 27, 27, 24, 24, 21, 21, 22, 22, 22, 31, 30,
30, 29, 29, 26, 26, 24, 24, 23, 23, 22, 22, 22, 22, 23, 24, 24, 24, 24,
24, 23, 23, 22, 22, 21, 21, 20, 20, 20, 20, 21, 21, 22, 22, 23, 23, 22,
22, 22, 22, 20, 20, 19, 19, 19, 19, 18,
+ /* Size 16x4 */
+ 33, 31, 24, 21, 33, 30, 24, 22, 33, 30, 24, 22, 33, 29, 24, 23, 33, 29,
+ 24, 23, 30, 26, 23, 22, 30, 26, 23, 22, 27, 24, 22, 22, 27, 24, 22, 22,
+ 24, 23, 21, 20, 24, 23, 21, 20, 21, 22, 20, 19, 21, 22, 20, 19, 22, 22,
+ 20, 19, 22, 22, 20, 19, 22, 23, 21, 18,
/* Size 8x32 */
- 32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 28, 28, 22, 22, 21, 33, 33,
- 33, 27, 27, 22, 22, 22, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33, 33, 27,
- 27, 22, 22, 22, 33, 32, 32, 26, 26, 22, 22, 22, 34, 32, 32, 26, 26, 22,
- 22, 23, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26, 26, 22, 22, 23,
- 32, 30, 30, 25, 25, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28,
- 28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22, 22, 22, 29, 27, 27, 23,
- 23, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22,
- 22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 26, 25, 25, 22, 22, 21, 21, 22,
- 24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 24, 24,
- 24, 22, 22, 20, 20, 21, 23, 23, 23, 22, 22, 20, 20, 20, 21, 22, 22, 21,
- 21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19,
- 19, 19, 21, 22, 22, 22, 22, 19, 19, 19, 21, 22, 22, 22, 22, 19, 19, 18,
- 21, 22, 22, 22, 22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23,
- 23, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19, 19, 18, 21, 23, 23, 22,
- 22, 19, 19, 18,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 34, 34, 34, 32, 31, 31, 31, 29, 28, 28, 28, 26,
24, 24, 24, 23, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 33, 33, 33, 33,
33, 32, 32, 32, 32, 30, 28, 28, 28, 27, 26, 26, 26, 25, 24, 24, 24, 23,
@@ -11198,7 +11183,23 @@
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20, 20, 20, 19, 19,
19, 19, 19, 19, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23,
22, 22, 22, 23, 23, 23, 23, 22, 21, 21, 21, 20, 19, 19, 19, 19, 18, 18,
- 18, 18, 18, 18 },
+ 18, 18, 18, 18,
+ /* Size 32x8 */
+ 32, 33, 33, 28, 28, 21, 21, 21, 33, 33, 33, 28, 28, 22, 22, 21, 33, 33,
+ 33, 27, 27, 22, 22, 22, 33, 33, 33, 27, 27, 22, 22, 22, 33, 33, 33, 27,
+ 27, 22, 22, 22, 33, 32, 32, 26, 26, 22, 22, 22, 34, 32, 32, 26, 26, 22,
+ 22, 23, 34, 32, 32, 26, 26, 22, 22, 23, 34, 32, 32, 26, 26, 22, 22, 23,
+ 32, 30, 30, 25, 25, 22, 22, 23, 31, 28, 28, 24, 24, 22, 22, 22, 31, 28,
+ 28, 24, 24, 22, 22, 22, 31, 28, 28, 24, 24, 22, 22, 22, 29, 27, 27, 23,
+ 23, 22, 22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 28, 26, 26, 22, 22, 22,
+ 22, 23, 28, 26, 26, 22, 22, 22, 22, 23, 26, 25, 25, 22, 22, 21, 21, 22,
+ 24, 24, 24, 22, 22, 20, 20, 21, 24, 24, 24, 22, 22, 20, 20, 21, 24, 24,
+ 24, 22, 22, 20, 20, 21, 23, 23, 23, 22, 22, 20, 20, 20, 21, 22, 22, 21,
+ 21, 19, 19, 19, 21, 22, 22, 21, 21, 19, 19, 19, 21, 22, 22, 21, 21, 19,
+ 19, 19, 21, 22, 22, 22, 22, 19, 19, 19, 21, 22, 22, 22, 22, 19, 19, 18,
+ 21, 22, 22, 22, 22, 19, 19, 18, 21, 22, 22, 22, 22, 19, 19, 18, 21, 23,
+ 23, 22, 22, 19, 19, 18, 21, 23, 23, 22, 22, 19, 19, 18, 21, 23, 23, 22,
+ 22, 19, 19, 18 },
},
{
{ /* Luma */
@@ -11284,21 +11285,12 @@
21, 21, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 28, 28, 28, 28, 28,
27, 26, 26, 26, 26, 25, 24, 24, 24, 24, 23, 21, 21, 21, 21, 20,
/* Size 4x8 */
- 33, 33, 32, 29, 32, 32, 32, 29, 32, 32, 31, 30, 32, 32, 30, 28, 32, 31,
- 29, 27, 31, 31, 28, 26, 30, 30, 28, 24, 29, 30, 27, 21,
- /* Size 8x4 */
33, 32, 32, 32, 32, 31, 30, 29, 33, 32, 32, 32, 31, 31, 30, 30, 32, 32,
31, 30, 29, 28, 28, 27, 29, 29, 30, 28, 27, 26, 24, 21,
+ /* Size 8x4 */
+ 33, 33, 32, 29, 32, 32, 32, 29, 32, 32, 31, 30, 32, 32, 30, 28, 32, 31,
+ 29, 27, 31, 31, 28, 26, 30, 30, 28, 24, 29, 30, 27, 21,
/* Size 8x16 */
- 32, 33, 33, 33, 32, 32, 29, 28, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32,
- 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32,
- 31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 31, 30, 30,
- 29, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28,
- 32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 31, 31,
- 31, 29, 28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29,
- 28, 28, 24, 23, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27,
- 22, 21,
- /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 30, 30, 28, 28, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32, 32, 32,
32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32, 32, 32,
@@ -11307,37 +11299,16 @@
29, 28, 28, 28, 27, 27, 29, 29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 26,
25, 24, 22, 22, 28, 29, 29, 29, 30, 30, 28, 28, 28, 27, 27, 25, 24, 23,
21, 21,
+ /* Size 16x8 */
+ 32, 33, 33, 33, 32, 32, 29, 28, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32,
+ 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32,
+ 31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 31, 30, 30,
+ 29, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28,
+ 32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 31, 31,
+ 31, 29, 28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29,
+ 28, 28, 24, 23, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27,
+ 22, 21,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33,
- 33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
- 30, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 29,
- 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
- 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 31, 30, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
- 31, 31, 30, 30, 30, 30, 29, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30,
- 30, 30, 30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30,
- 30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29,
- 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28,
- 28, 28, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 28, 28, 27, 27, 27,
- 32, 32, 32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32,
- 32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32, 32, 31,
- 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 31, 31, 31, 31, 31,
- 30, 29, 28, 28, 28, 28, 26, 26, 26, 26, 31, 31, 31, 31, 31, 31, 29, 28,
- 28, 28, 28, 27, 26, 25, 25, 25, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28,
- 28, 26, 25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26,
- 25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 25, 24,
- 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 24, 23, 23, 23,
- 29, 29, 30, 30, 30, 30, 28, 28, 27, 27, 27, 25, 23, 22, 22, 22, 28, 29,
- 29, 30, 30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30,
- 30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30, 30, 30,
- 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 28, 28, 28, 28, 28, 28, 27,
- 26, 26, 26, 24, 22, 21, 21, 21,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
@@ -11367,33 +11338,47 @@
27, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 28, 29, 29, 29, 29, 29,
29, 29, 30, 30, 30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 25, 24,
24, 24, 23, 22, 21, 21, 21, 21,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33,
+ 33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 31, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+ 30, 29, 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 29,
+ 29, 29, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30,
+ 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 30, 33, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 31, 30, 29, 29, 29, 29, 33, 32, 32, 32, 32, 32,
+ 31, 31, 30, 30, 30, 30, 29, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30,
+ 30, 30, 30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30,
+ 30, 29, 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29,
+ 28, 28, 28, 28, 32, 32, 32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28,
+ 28, 28, 32, 32, 32, 31, 31, 31, 31, 30, 29, 29, 29, 28, 28, 27, 27, 27,
+ 32, 32, 32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32,
+ 32, 31, 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 32, 32, 31,
+ 31, 31, 30, 29, 29, 29, 29, 28, 28, 27, 27, 27, 32, 31, 31, 31, 31, 31,
+ 30, 29, 28, 28, 28, 28, 26, 26, 26, 26, 31, 31, 31, 31, 31, 31, 29, 28,
+ 28, 28, 28, 27, 26, 25, 25, 25, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28,
+ 28, 26, 25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26,
+ 25, 24, 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 25, 24,
+ 24, 24, 30, 30, 30, 30, 30, 30, 29, 28, 28, 28, 28, 26, 24, 23, 23, 23,
+ 29, 29, 30, 30, 30, 30, 28, 28, 27, 27, 27, 25, 23, 22, 22, 22, 28, 29,
+ 29, 30, 30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30,
+ 30, 30, 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 29, 29, 30, 30, 30,
+ 28, 28, 27, 27, 27, 24, 22, 21, 21, 21, 28, 28, 28, 28, 28, 28, 28, 27,
+ 26, 26, 26, 24, 22, 21, 21, 21,
/* Size 4x16 */
- 33, 33, 32, 28, 33, 32, 32, 29, 32, 32, 32, 29, 32, 32, 32, 29, 32, 32,
- 31, 30, 32, 32, 31, 30, 32, 32, 30, 28, 32, 32, 30, 28, 32, 32, 30, 28,
- 32, 31, 29, 27, 32, 31, 29, 27, 31, 31, 28, 25, 30, 30, 28, 24, 30, 30,
- 28, 23, 29, 30, 27, 21, 29, 30, 27, 21,
- /* Size 16x4 */
33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 29, 29, 33, 32,
32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 32, 32, 32, 32,
31, 31, 30, 30, 30, 29, 29, 28, 28, 28, 27, 27, 28, 29, 29, 29, 30, 30,
28, 28, 28, 27, 27, 25, 24, 23, 21, 21,
+ /* Size 16x4 */
+ 33, 33, 32, 28, 33, 32, 32, 29, 32, 32, 32, 29, 32, 32, 32, 29, 32, 32,
+ 31, 30, 32, 32, 31, 30, 32, 32, 30, 28, 32, 32, 30, 28, 32, 32, 30, 28,
+ 32, 31, 29, 27, 32, 31, 29, 27, 31, 31, 28, 25, 30, 30, 28, 24, 30, 30,
+ 28, 23, 29, 30, 27, 21, 29, 30, 27, 21,
/* Size 8x32 */
- 32, 33, 33, 33, 32, 32, 29, 28, 33, 33, 33, 32, 32, 32, 29, 29, 33, 32,
- 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32,
- 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32,
- 30, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32, 31, 31, 30, 30,
- 33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32,
- 32, 31, 31, 31, 29, 29, 33, 32, 32, 31, 30, 30, 29, 28, 32, 32, 32, 31,
- 30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30,
- 28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 31, 31, 29, 29, 28, 27,
- 32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 32, 32,
- 31, 30, 29, 29, 28, 27, 32, 31, 31, 30, 28, 28, 26, 26, 31, 31, 31, 29,
- 28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28,
- 25, 24, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28, 24, 23,
- 29, 30, 30, 28, 27, 27, 23, 22, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29,
- 30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27, 22, 21, 28, 28, 28, 28,
- 26, 26, 22, 21,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
32, 32, 32, 32, 31, 30, 30, 30, 30, 29, 28, 28, 28, 28, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
@@ -11408,7 +11393,23 @@
30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 28, 26, 26, 25,
25, 25, 24, 23, 22, 22, 22, 22, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30,
30, 29, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 25, 24, 24, 24, 23, 22,
- 21, 21, 21, 21 },
+ 21, 21, 21, 21,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 32, 32, 29, 28, 33, 33, 33, 32, 32, 32, 29, 29, 33, 32,
+ 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32,
+ 32, 32, 29, 29, 33, 32, 32, 32, 32, 32, 29, 29, 33, 32, 32, 32, 32, 32,
+ 30, 29, 33, 32, 32, 32, 32, 32, 30, 29, 33, 32, 32, 32, 31, 31, 30, 30,
+ 33, 32, 32, 32, 31, 31, 30, 30, 33, 32, 32, 32, 31, 31, 30, 30, 33, 32,
+ 32, 31, 31, 31, 29, 29, 33, 32, 32, 31, 30, 30, 29, 28, 32, 32, 32, 31,
+ 30, 30, 28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 32, 31, 30, 30,
+ 28, 28, 32, 32, 32, 31, 30, 30, 28, 28, 32, 32, 31, 31, 29, 29, 28, 27,
+ 32, 32, 31, 30, 29, 29, 28, 27, 32, 32, 31, 30, 29, 29, 28, 27, 32, 32,
+ 31, 30, 29, 29, 28, 27, 32, 31, 31, 30, 28, 28, 26, 26, 31, 31, 31, 29,
+ 28, 28, 26, 25, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28,
+ 25, 24, 30, 30, 30, 29, 28, 28, 25, 24, 30, 30, 30, 29, 28, 28, 24, 23,
+ 29, 30, 30, 28, 27, 27, 23, 22, 28, 29, 30, 28, 27, 27, 22, 21, 28, 29,
+ 30, 28, 27, 27, 22, 21, 28, 29, 30, 28, 27, 27, 22, 21, 28, 28, 28, 28,
+ 26, 26, 22, 21 },
{ /* Chroma */
/* Size 4x4 */
33, 32, 27, 22, 32, 30, 25, 22, 27, 25, 22, 22, 22, 22, 22, 20,
@@ -11492,21 +11493,12 @@
19, 19, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 19,
/* Size 4x8 */
- 33, 33, 28, 21, 33, 33, 27, 22, 33, 32, 26, 22, 30, 28, 24, 22, 28, 26,
- 22, 22, 26, 25, 22, 21, 24, 24, 22, 20, 21, 22, 21, 19,
- /* Size 8x4 */
33, 33, 33, 30, 28, 26, 24, 21, 33, 33, 32, 28, 26, 25, 24, 22, 28, 27,
26, 24, 22, 22, 22, 21, 21, 22, 22, 22, 22, 21, 20, 19,
+ /* Size 8x4 */
+ 33, 33, 28, 21, 33, 33, 27, 22, 33, 32, 26, 22, 30, 28, 24, 22, 28, 26,
+ 22, 22, 26, 25, 22, 21, 24, 24, 22, 20, 21, 22, 21, 19,
/* Size 8x16 */
- 32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33,
- 33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29,
- 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 31, 30, 29, 28, 24, 24,
- 22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 29, 28, 28, 26, 23, 23, 22, 22,
- 28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 25, 24,
- 24, 23, 22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23,
- 22, 22, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21,
- 20, 19,
- /* Size 16x8 */
32, 33, 33, 33, 34, 34, 31, 31, 29, 28, 28, 25, 24, 23, 21, 21, 33, 33,
33, 33, 32, 32, 30, 29, 28, 26, 26, 24, 24, 23, 22, 22, 33, 33, 33, 32,
32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 31, 30, 30, 30, 29, 29,
@@ -11515,37 +11507,16 @@
22, 22, 22, 22, 21, 21, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 21,
21, 20, 20, 20, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 20, 20,
19, 19,
+ /* Size 16x8 */
+ 32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33,
+ 33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29,
+ 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 31, 30, 29, 28, 24, 24,
+ 22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 29, 28, 28, 26, 23, 23, 22, 22,
+ 28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 25, 24,
+ 24, 23, 22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23,
+ 22, 22, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21,
+ 20, 19,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 26, 23, 21, 21, 21, 33, 33,
- 33, 33, 33, 33, 31, 28, 28, 28, 28, 25, 23, 21, 21, 21, 33, 33, 33, 33,
- 33, 33, 30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33,
- 30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28,
- 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28, 27, 27,
- 27, 25, 23, 22, 22, 22, 33, 33, 33, 32, 32, 32, 30, 28, 26, 26, 26, 25,
- 23, 22, 22, 22, 34, 33, 33, 32, 32, 32, 30, 27, 26, 26, 26, 24, 23, 22,
- 22, 22, 34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22,
- 34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 34, 33,
- 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 33, 32, 31, 31,
- 31, 31, 28, 26, 25, 25, 25, 24, 23, 22, 22, 22, 31, 30, 30, 29, 29, 29,
- 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25,
- 24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24,
- 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24, 24, 23,
- 22, 22, 22, 22, 29, 28, 28, 28, 28, 28, 26, 24, 23, 23, 23, 23, 22, 22,
- 22, 22, 28, 28, 27, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
- 28, 27, 26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27,
- 26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27, 26, 26,
- 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 26, 26, 26, 25, 25, 25,
- 24, 22, 22, 22, 22, 21, 21, 21, 21, 21, 25, 25, 24, 24, 24, 24, 23, 22,
- 22, 22, 22, 21, 21, 21, 21, 21, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22,
- 22, 21, 21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21,
- 21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21, 21, 20,
- 20, 20, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 21, 20, 20, 20, 20,
- 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 21, 21,
- 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22,
- 22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22,
- 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22,
- 22, 22, 22, 21, 20, 19, 19, 19,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 29, 28,
28, 28, 28, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 28, 28, 27, 27, 27, 26,
@@ -11575,33 +11546,47 @@
22, 21, 21, 20, 20, 20, 20, 20, 19, 19, 19, 19, 21, 21, 22, 22, 22, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20,
20, 20, 20, 20, 19, 19, 19, 19,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 26, 23, 21, 21, 21, 33, 33,
+ 33, 33, 33, 33, 31, 28, 28, 28, 28, 25, 23, 21, 21, 21, 33, 33, 33, 33,
+ 33, 33, 30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33,
+ 30, 28, 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28,
+ 27, 27, 27, 25, 23, 22, 22, 22, 33, 33, 33, 33, 33, 33, 30, 28, 27, 27,
+ 27, 25, 23, 22, 22, 22, 33, 33, 33, 32, 32, 32, 30, 28, 26, 26, 26, 25,
+ 23, 22, 22, 22, 34, 33, 33, 32, 32, 32, 30, 27, 26, 26, 26, 24, 23, 22,
+ 22, 22, 34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22,
+ 34, 33, 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 34, 33,
+ 32, 32, 32, 32, 29, 27, 26, 26, 26, 24, 23, 22, 22, 22, 33, 32, 31, 31,
+ 31, 31, 28, 26, 25, 25, 25, 24, 23, 22, 22, 22, 31, 30, 30, 29, 29, 29,
+ 28, 26, 24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25,
+ 24, 24, 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24,
+ 24, 23, 22, 22, 22, 22, 31, 30, 29, 28, 28, 28, 27, 25, 24, 24, 24, 23,
+ 22, 22, 22, 22, 29, 28, 28, 28, 28, 28, 26, 24, 23, 23, 23, 23, 22, 22,
+ 22, 22, 28, 28, 27, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22,
+ 28, 27, 26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27,
+ 26, 26, 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 28, 27, 26, 26,
+ 26, 26, 24, 23, 22, 22, 22, 22, 22, 22, 22, 22, 26, 26, 26, 25, 25, 25,
+ 24, 22, 22, 22, 22, 21, 21, 21, 21, 21, 25, 25, 24, 24, 24, 24, 23, 22,
+ 22, 22, 22, 21, 21, 21, 21, 21, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22,
+ 22, 21, 21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21,
+ 21, 20, 20, 20, 24, 24, 24, 24, 24, 24, 23, 22, 22, 22, 22, 21, 21, 20,
+ 20, 20, 23, 23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 21, 20, 20, 20, 20,
+ 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, 21, 21,
+ 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22,
+ 22, 22, 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22,
+ 22, 21, 21, 21, 21, 20, 20, 19, 19, 19, 21, 21, 22, 22, 22, 22, 22, 22,
+ 22, 22, 22, 21, 20, 19, 19, 19,
/* Size 4x16 */
- 33, 33, 28, 21, 33, 33, 27, 22, 33, 33, 27, 22, 33, 32, 26, 22, 33, 32,
- 26, 22, 33, 32, 26, 22, 30, 29, 24, 22, 30, 28, 24, 22, 28, 28, 23, 22,
- 27, 26, 22, 22, 27, 26, 22, 22, 25, 24, 22, 21, 24, 24, 22, 20, 23, 23,
- 22, 20, 21, 22, 21, 19, 21, 22, 21, 19,
- /* Size 16x4 */
33, 33, 33, 33, 33, 33, 30, 30, 28, 27, 27, 25, 24, 23, 21, 21, 33, 33,
33, 32, 32, 32, 29, 28, 28, 26, 26, 24, 24, 23, 22, 22, 28, 27, 27, 26,
26, 26, 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 21, 22, 22, 22, 22, 22,
22, 22, 22, 22, 22, 21, 20, 20, 19, 19,
+ /* Size 16x4 */
+ 33, 33, 28, 21, 33, 33, 27, 22, 33, 33, 27, 22, 33, 32, 26, 22, 33, 32,
+ 26, 22, 33, 32, 26, 22, 30, 29, 24, 22, 30, 28, 24, 22, 28, 28, 23, 22,
+ 27, 26, 22, 22, 27, 26, 22, 22, 25, 24, 22, 21, 24, 24, 22, 20, 23, 23,
+ 22, 20, 21, 22, 21, 19, 21, 22, 21, 19,
/* Size 8x32 */
- 32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 31, 28, 28, 23, 21, 33, 33,
- 33, 30, 27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 33, 30,
- 27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26,
- 23, 22, 34, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22,
- 34, 32, 32, 29, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 33, 31,
- 31, 28, 25, 25, 23, 22, 31, 30, 29, 28, 24, 24, 22, 22, 31, 29, 28, 27,
- 24, 24, 22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 31, 29, 28, 27, 24, 24,
- 22, 22, 29, 28, 28, 26, 23, 23, 22, 22, 28, 27, 26, 24, 22, 22, 22, 22,
- 28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 28, 26,
- 26, 24, 22, 22, 22, 22, 26, 26, 25, 24, 22, 22, 21, 21, 25, 24, 24, 23,
- 22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 24, 24, 24, 23, 22, 22,
- 21, 20, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23, 22, 22, 20, 20,
- 22, 22, 22, 22, 21, 21, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22,
- 22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22,
- 22, 22, 20, 19,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 29, 28,
28, 28, 28, 26, 25, 24, 24, 24, 23, 22, 21, 21, 21, 21, 33, 33, 33, 33,
33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 28, 27, 26, 26, 26, 26,
@@ -11616,7 +11601,23 @@
23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 21,
21, 21, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 20, 20, 20, 20, 20,
- 19, 19, 19, 19 },
+ 19, 19, 19, 19,
+ /* Size 32x8 */
+ 32, 33, 33, 31, 28, 28, 23, 21, 33, 33, 33, 31, 28, 28, 23, 21, 33, 33,
+ 33, 30, 27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 33, 30,
+ 27, 27, 23, 22, 33, 33, 33, 30, 27, 27, 23, 22, 33, 33, 32, 30, 26, 26,
+ 23, 22, 34, 33, 32, 30, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22,
+ 34, 32, 32, 29, 26, 26, 23, 22, 34, 32, 32, 29, 26, 26, 23, 22, 33, 31,
+ 31, 28, 25, 25, 23, 22, 31, 30, 29, 28, 24, 24, 22, 22, 31, 29, 28, 27,
+ 24, 24, 22, 22, 31, 29, 28, 27, 24, 24, 22, 22, 31, 29, 28, 27, 24, 24,
+ 22, 22, 29, 28, 28, 26, 23, 23, 22, 22, 28, 27, 26, 24, 22, 22, 22, 22,
+ 28, 26, 26, 24, 22, 22, 22, 22, 28, 26, 26, 24, 22, 22, 22, 22, 28, 26,
+ 26, 24, 22, 22, 22, 22, 26, 26, 25, 24, 22, 22, 21, 21, 25, 24, 24, 23,
+ 22, 22, 21, 21, 24, 24, 24, 23, 22, 22, 21, 20, 24, 24, 24, 23, 22, 22,
+ 21, 20, 24, 24, 24, 23, 22, 22, 21, 20, 23, 23, 23, 23, 22, 22, 20, 20,
+ 22, 22, 22, 22, 21, 21, 20, 20, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22,
+ 22, 22, 21, 21, 20, 19, 21, 22, 22, 22, 21, 21, 20, 19, 21, 22, 22, 22,
+ 22, 22, 20, 19 },
},
{
{ /* Luma */
@@ -11702,21 +11703,12 @@
26, 26, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 30, 30,
29, 29, 29, 29, 29, 29, 28, 28, 28, 28, 28, 28, 27, 27, 26, 26,
/* Size 4x8 */
- 33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32,
- 31, 30, 32, 32, 30, 30, 32, 31, 30, 29, 31, 31, 29, 28,
- /* Size 8x4 */
33, 33, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 31, 31, 32, 32,
32, 32, 31, 30, 30, 29, 32, 32, 32, 31, 30, 30, 29, 28,
+ /* Size 8x4 */
+ 33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 32, 32,
+ 31, 30, 32, 32, 30, 30, 32, 31, 30, 29, 31, 31, 29, 28,
/* Size 8x16 */
- 32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 31, 33, 32,
- 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
- 32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31,
- 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 30, 30, 30,
- 32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
- 32, 32, 31, 29, 29, 29, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31,
- 30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28,
- 28, 27,
- /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 33, 33,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32,
@@ -11725,37 +11717,16 @@
30, 29, 29, 29, 28, 28, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 29,
29, 29, 28, 28, 32, 31, 31, 31, 31, 31, 31, 31, 30, 30, 30, 29, 28, 28,
28, 27,
+ /* Size 16x8 */
+ 32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 31, 33, 32,
+ 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+ 32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31,
+ 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 30, 30, 30,
+ 32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
+ 32, 32, 31, 29, 29, 29, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31,
+ 30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28,
+ 28, 27,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 33, 33,
- 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 33, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30,
- 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 33, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31,
- 31, 31, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
- 31, 31, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30,
- 30, 29, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29,
- 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32,
- 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32,
- 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32, 32, 32,
- 32, 31, 31, 30, 30, 30, 30, 30, 29, 29, 32, 32, 32, 32, 32, 32, 32, 31,
- 31, 30, 29, 29, 29, 29, 29, 28, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30,
- 29, 29, 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29,
- 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29,
- 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28,
- 32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28, 32, 31,
- 31, 31, 31, 31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 27, 31, 31, 31, 31,
- 31, 31, 31, 30, 30, 29, 28, 28, 28, 28, 28, 27, 30, 30, 30, 30, 30, 30,
- 30, 30, 29, 28, 28, 28, 28, 28, 27, 26, 30, 30, 30, 30, 30, 30, 30, 30,
- 29, 28, 28, 28, 28, 28, 27, 26,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 33,
33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -11785,33 +11756,47 @@
30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 27, 30, 30, 30, 30, 30, 30,
30, 30, 30, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 29, 29, 28, 28,
28, 28, 28, 28, 27, 27, 26, 26,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 30, 33, 33,
+ 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 33, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 31, 30, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30,
+ 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 33, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+ 31, 31, 31, 31, 31, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31,
+ 31, 31, 30, 30, 33, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30, 30, 30, 30,
+ 30, 29, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29,
+ 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32,
+ 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32,
+ 32, 32, 32, 31, 31, 31, 30, 30, 30, 30, 30, 29, 32, 32, 32, 32, 32, 32,
+ 32, 31, 31, 30, 30, 30, 30, 30, 29, 29, 32, 32, 32, 32, 32, 32, 32, 31,
+ 31, 30, 29, 29, 29, 29, 29, 28, 32, 32, 32, 32, 31, 31, 31, 31, 31, 30,
+ 29, 29, 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29,
+ 29, 29, 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29,
+ 28, 28, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28,
+ 32, 32, 32, 31, 31, 31, 31, 31, 30, 30, 29, 29, 29, 29, 28, 28, 32, 31,
+ 31, 31, 31, 31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 27, 31, 31, 31, 31,
+ 31, 31, 31, 30, 30, 29, 28, 28, 28, 28, 28, 27, 30, 30, 30, 30, 30, 30,
+ 30, 30, 29, 28, 28, 28, 28, 28, 27, 26, 30, 30, 30, 30, 30, 30, 30, 30,
+ 29, 28, 28, 28, 28, 28, 27, 26,
/* Size 4x16 */
- 33, 33, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
- 32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 31, 31, 32, 32, 31, 30,
- 32, 32, 31, 30, 32, 32, 31, 30, 32, 32, 30, 29, 32, 31, 30, 29, 32, 31,
- 30, 29, 31, 31, 29, 28, 30, 30, 28, 28,
- /* Size 16x4 */
33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 32, 32, 32, 32,
32, 32, 32, 31, 31, 31, 31, 30, 30, 30, 29, 28, 32, 32, 32, 32, 32, 31,
31, 31, 30, 30, 30, 29, 29, 29, 28, 28,
+ /* Size 16x4 */
+ 33, 33, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+ 32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32, 31, 31, 32, 32, 31, 30,
+ 32, 32, 31, 30, 32, 32, 31, 30, 32, 32, 30, 29, 32, 31, 30, 29, 32, 31,
+ 30, 29, 31, 31, 29, 28, 30, 30, 28, 28,
/* Size 8x32 */
- 32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 31, 33, 33,
- 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
- 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32,
- 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31,
- 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32,
- 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32,
- 32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 31,
- 31, 30, 33, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30,
- 32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
- 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 29, 32, 32, 32, 32,
- 31, 29, 29, 29, 32, 32, 31, 31, 31, 29, 29, 28, 32, 32, 31, 31, 30, 29,
- 29, 28, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31, 30, 29, 29, 28,
- 32, 32, 31, 31, 30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 31, 31,
- 31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28, 28, 27, 30, 30, 30, 30,
- 29, 28, 28, 27,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 30, 30, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -11826,7 +11811,23 @@
32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29,
29, 29, 29, 29, 28, 28, 28, 28, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31,
31, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 28, 28, 28, 28, 28,
- 28, 28, 27, 27 },
+ 28, 28, 27, 27,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 31, 33, 33,
+ 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+ 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32,
+ 32, 31, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 32, 32, 31,
+ 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32,
+ 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32,
+ 32, 31, 31, 31, 33, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 31, 31,
+ 31, 30, 33, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30,
+ 32, 32, 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 30, 32, 32,
+ 32, 32, 31, 30, 30, 30, 32, 32, 32, 32, 31, 30, 30, 29, 32, 32, 32, 32,
+ 31, 29, 29, 29, 32, 32, 31, 31, 31, 29, 29, 28, 32, 32, 31, 31, 30, 29,
+ 29, 28, 32, 32, 31, 31, 30, 29, 29, 28, 32, 32, 31, 31, 30, 29, 29, 28,
+ 32, 32, 31, 31, 30, 29, 29, 28, 32, 31, 31, 31, 30, 28, 28, 28, 31, 31,
+ 31, 31, 30, 28, 28, 28, 30, 30, 30, 30, 29, 28, 28, 27, 30, 30, 30, 30,
+ 29, 28, 28, 27 },
{ /* Chroma */
/* Size 4x4 */
33, 33, 30, 27, 33, 32, 29, 26, 30, 29, 26, 24, 27, 26, 24, 22,
@@ -11910,21 +11911,12 @@
21, 21, 25, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
23, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22, 21, 21, 21, 21,
/* Size 4x8 */
- 33, 33, 29, 28, 33, 33, 28, 27, 33, 32, 28, 26, 33, 32, 28, 26, 30, 28,
- 26, 24, 29, 28, 24, 23, 27, 26, 23, 22, 25, 24, 23, 22,
- /* Size 8x4 */
33, 33, 33, 33, 30, 29, 27, 25, 33, 33, 32, 32, 28, 28, 26, 24, 29, 28,
28, 28, 26, 24, 23, 23, 28, 27, 26, 26, 24, 23, 22, 22,
+ /* Size 8x4 */
+ 33, 33, 29, 28, 33, 33, 28, 27, 33, 32, 28, 26, 33, 32, 28, 26, 30, 28,
+ 26, 24, 29, 28, 24, 23, 27, 26, 23, 22, 25, 24, 23, 22,
/* Size 8x16 */
- 32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 27, 27, 26, 33, 33,
- 33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32,
- 30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26,
- 26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 31, 30, 29, 29, 28, 24, 24, 24,
- 31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 29, 28,
- 27, 27, 25, 23, 23, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26,
- 24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 24, 24, 24, 24, 23, 22,
- 22, 21,
- /* Size 16x8 */
32, 33, 33, 33, 33, 34, 34, 33, 31, 31, 31, 29, 28, 28, 26, 24, 33, 33,
33, 33, 33, 33, 33, 32, 30, 29, 29, 28, 26, 26, 26, 24, 33, 33, 33, 33,
32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 33, 33, 33, 33, 32, 32,
@@ -11933,37 +11925,16 @@
24, 23, 22, 22, 22, 22, 28, 27, 27, 27, 26, 26, 26, 26, 24, 24, 24, 23,
22, 22, 22, 22, 27, 26, 26, 26, 26, 25, 25, 25, 24, 23, 23, 22, 22, 22,
22, 21,
+ /* Size 16x8 */
+ 32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 27, 27, 26, 33, 33,
+ 33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32,
+ 30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26,
+ 26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 31, 30, 29, 29, 28, 24, 24, 24,
+ 31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 29, 28,
+ 27, 27, 25, 23, 23, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26,
+ 24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 24, 24, 24, 24, 23, 22,
+ 22, 21,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 27, 24, 33, 33,
- 33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 26, 24, 33, 33, 33, 33,
- 33, 33, 33, 32, 31, 29, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33,
- 33, 32, 30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32,
- 30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28,
- 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27,
- 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27, 27, 27,
- 26, 24, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24,
- 34, 33, 33, 32, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24, 34, 33,
- 33, 32, 32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32,
- 32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32,
- 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32, 32, 31,
- 29, 28, 26, 26, 26, 26, 25, 24, 33, 33, 32, 32, 31, 31, 31, 31, 29, 27,
- 26, 26, 26, 26, 25, 24, 32, 32, 31, 31, 30, 30, 30, 30, 28, 26, 25, 25,
- 25, 25, 24, 23, 31, 31, 30, 29, 29, 29, 29, 29, 28, 26, 24, 24, 24, 24,
- 24, 23, 31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23,
- 31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30,
- 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30, 29, 29,
- 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 30, 29, 28, 28, 28, 28,
- 28, 28, 26, 24, 23, 23, 23, 23, 23, 23, 29, 28, 28, 27, 27, 27, 27, 26,
- 25, 24, 23, 23, 23, 23, 22, 22, 28, 28, 27, 26, 26, 26, 26, 26, 24, 23,
- 22, 22, 22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
- 22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22,
- 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22,
- 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22, 26, 26,
- 26, 25, 25, 25, 25, 24, 24, 23, 22, 22, 22, 22, 22, 21, 26, 25, 25, 24,
- 24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 21, 24, 24, 24, 24, 24, 24,
- 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 24, 24, 24, 24, 24, 24, 24, 24,
- 23, 22, 22, 22, 22, 22, 21, 21,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 32, 31, 31,
31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 26, 26, 24, 24, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 30, 29,
@@ -11993,33 +11964,47 @@
23, 23, 22, 22, 22, 22, 22, 22, 22, 22, 21, 21, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 23, 23, 23, 23, 23, 23, 22, 22,
22, 22, 22, 22, 21, 21, 21, 21,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 27, 24, 33, 33,
+ 33, 33, 33, 33, 33, 33, 31, 29, 28, 28, 28, 28, 26, 24, 33, 33, 33, 33,
+ 33, 33, 33, 32, 31, 29, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33,
+ 33, 32, 30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32,
+ 30, 28, 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28,
+ 27, 27, 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27,
+ 27, 27, 26, 24, 33, 33, 33, 33, 33, 33, 33, 32, 30, 28, 27, 27, 27, 27,
+ 26, 24, 33, 33, 33, 33, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24,
+ 34, 33, 33, 32, 32, 32, 32, 32, 30, 28, 26, 26, 26, 26, 26, 24, 34, 33,
+ 33, 32, 32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32,
+ 32, 32, 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32,
+ 32, 31, 29, 28, 26, 26, 26, 26, 25, 24, 34, 33, 33, 32, 32, 32, 32, 31,
+ 29, 28, 26, 26, 26, 26, 25, 24, 33, 33, 32, 32, 31, 31, 31, 31, 29, 27,
+ 26, 26, 26, 26, 25, 24, 32, 32, 31, 31, 30, 30, 30, 30, 28, 26, 25, 25,
+ 25, 25, 24, 23, 31, 31, 30, 29, 29, 29, 29, 29, 28, 26, 24, 24, 24, 24,
+ 24, 23, 31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23,
+ 31, 30, 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30,
+ 29, 29, 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 31, 30, 29, 29,
+ 28, 28, 28, 28, 27, 26, 24, 24, 24, 24, 23, 23, 30, 29, 28, 28, 28, 28,
+ 28, 28, 26, 24, 23, 23, 23, 23, 23, 23, 29, 28, 28, 27, 27, 27, 27, 26,
+ 25, 24, 23, 23, 23, 23, 22, 22, 28, 28, 27, 26, 26, 26, 26, 26, 24, 23,
+ 22, 22, 22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
+ 22, 22, 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22,
+ 22, 22, 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22,
+ 28, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22, 22, 22, 22, 22, 26, 26,
+ 26, 25, 25, 25, 25, 24, 24, 23, 22, 22, 22, 22, 22, 21, 26, 25, 25, 24,
+ 24, 24, 24, 24, 23, 23, 22, 22, 22, 22, 22, 21, 24, 24, 24, 24, 24, 24,
+ 24, 24, 23, 22, 22, 22, 22, 22, 21, 21, 24, 24, 24, 24, 24, 24, 24, 24,
+ 23, 22, 22, 22, 22, 22, 21, 21,
/* Size 4x16 */
- 33, 33, 29, 28, 33, 33, 29, 27, 33, 33, 28, 27, 33, 33, 28, 27, 33, 32,
- 28, 26, 33, 32, 28, 26, 33, 32, 28, 26, 33, 31, 27, 26, 31, 29, 26, 24,
- 30, 28, 26, 24, 30, 28, 26, 24, 28, 27, 24, 23, 27, 26, 23, 22, 27, 26,
- 23, 22, 26, 25, 23, 22, 24, 24, 22, 22,
- /* Size 16x4 */
33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 30, 28, 27, 27, 26, 24, 33, 33,
33, 33, 32, 32, 32, 31, 29, 28, 28, 27, 26, 26, 25, 24, 29, 29, 28, 28,
28, 28, 28, 27, 26, 26, 26, 24, 23, 23, 23, 22, 28, 27, 27, 27, 26, 26,
26, 26, 24, 24, 24, 23, 22, 22, 22, 22,
+ /* Size 16x4 */
+ 33, 33, 29, 28, 33, 33, 29, 27, 33, 33, 28, 27, 33, 33, 28, 27, 33, 32,
+ 28, 26, 33, 32, 28, 26, 33, 32, 28, 26, 33, 31, 27, 26, 31, 29, 26, 24,
+ 30, 28, 26, 24, 30, 28, 26, 24, 28, 27, 24, 23, 27, 26, 23, 22, 27, 26,
+ 23, 22, 26, 25, 23, 22, 24, 24, 22, 22,
/* Size 8x32 */
- 32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 28, 28, 26, 33, 33,
- 33, 33, 31, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33,
- 30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27,
- 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32, 30, 26, 26, 26,
- 34, 33, 32, 32, 30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33,
- 32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32,
- 29, 26, 26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 32, 31, 30, 30, 28, 25,
- 25, 24, 31, 30, 29, 29, 28, 24, 24, 24, 31, 29, 28, 28, 27, 24, 24, 23,
- 31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 31, 29,
- 28, 28, 27, 24, 24, 23, 30, 28, 28, 28, 26, 23, 23, 23, 29, 28, 27, 27,
- 25, 23, 23, 22, 28, 27, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22,
- 22, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
- 28, 26, 26, 26, 24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 26, 25,
- 24, 24, 23, 22, 22, 22, 24, 24, 24, 24, 23, 22, 22, 21, 24, 24, 24, 24,
- 23, 22, 22, 21,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 32, 31, 31,
31, 31, 31, 30, 29, 28, 28, 28, 28, 28, 26, 26, 24, 24, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 29, 29, 29, 28,
@@ -12034,7 +12019,23 @@
27, 27, 26, 26, 26, 26, 26, 26, 26, 25, 24, 24, 24, 24, 24, 23, 23, 22,
22, 22, 22, 22, 22, 22, 22, 22, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26,
25, 25, 25, 25, 25, 24, 24, 23, 23, 23, 23, 23, 22, 22, 22, 22, 22, 22,
- 22, 22, 21, 21 },
+ 22, 22, 21, 21,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 31, 28, 28, 27, 33, 33, 33, 33, 31, 28, 28, 26, 33, 33,
+ 33, 33, 31, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33,
+ 30, 27, 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 33, 33, 30, 27,
+ 27, 26, 33, 33, 33, 33, 30, 27, 27, 26, 33, 33, 32, 32, 30, 26, 26, 26,
+ 34, 33, 32, 32, 30, 26, 26, 26, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33,
+ 32, 32, 29, 26, 26, 25, 34, 33, 32, 32, 29, 26, 26, 25, 34, 33, 32, 32,
+ 29, 26, 26, 25, 33, 32, 31, 31, 29, 26, 26, 25, 32, 31, 30, 30, 28, 25,
+ 25, 24, 31, 30, 29, 29, 28, 24, 24, 24, 31, 29, 28, 28, 27, 24, 24, 23,
+ 31, 29, 28, 28, 27, 24, 24, 23, 31, 29, 28, 28, 27, 24, 24, 23, 31, 29,
+ 28, 28, 27, 24, 24, 23, 30, 28, 28, 28, 26, 23, 23, 23, 29, 28, 27, 27,
+ 25, 23, 23, 22, 28, 27, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22,
+ 22, 22, 28, 26, 26, 26, 24, 22, 22, 22, 28, 26, 26, 26, 24, 22, 22, 22,
+ 28, 26, 26, 26, 24, 22, 22, 22, 26, 26, 25, 25, 24, 22, 22, 22, 26, 25,
+ 24, 24, 23, 22, 22, 22, 24, 24, 24, 24, 23, 22, 22, 21, 24, 24, 24, 24,
+ 23, 22, 22, 21 },
},
{
{ /* Luma */
@@ -12120,21 +12121,12 @@
31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31,
/* Size 4x8 */
- 33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
- 32, 32, 33, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
- /* Size 8x4 */
33, 33, 33, 33, 33, 33, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+ /* Size 8x4 */
+ 33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+ 32, 32, 33, 32, 32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
/* Size 8x16 */
- 32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
- 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
- 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
- 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
- 32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
- 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32,
- 31, 30,
- /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
@@ -12143,37 +12135,16 @@
32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 30,
30, 30,
+ /* Size 16x8 */
+ 32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
+ 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+ 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+ 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+ 32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 32, 32, 32, 32,
+ 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32,
+ 31, 30,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
- 31, 31, 31, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
- 30, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 31, 31, 31, 30, 30,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12203,33 +12174,47 @@
32, 31, 31, 31, 31, 30, 30, 30, 30, 30, 30, 30, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 31, 31, 31, 31, 31,
30, 30, 30, 30, 30, 30, 30, 30,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 31, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 31, 31, 31, 31, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31,
+ 31, 31, 31, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31,
+ 30, 30, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 31, 31, 31, 30, 30, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 31, 31, 31, 30, 30,
/* Size 4x16 */
- 33, 33, 33, 32, 33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
- 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
- 33, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32,
- 32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
- /* Size 16x4 */
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 31, 31, 31, 31, 31,
+ /* Size 16x4 */
+ 33, 33, 33, 32, 33, 33, 33, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+ 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+ 33, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 32, 32, 32, 31, 32, 32,
+ 32, 31, 32, 32, 32, 31, 32, 32, 32, 31,
/* Size 8x32 */
- 32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
- 33, 33, 33, 33, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
- 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
- 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
- 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
- 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
- 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
- 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
- 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
- 32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32,
- 31, 31, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30,
- 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32,
- 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32,
- 32, 32, 31, 30,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -12244,7 +12229,23 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31,
31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 31, 31, 31, 31, 30, 30, 30,
- 30, 30, 30, 30 },
+ 30, 30, 30, 30,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 33, 33, 33, 32, 33, 33, 33, 33, 33, 33, 32, 32, 33, 33,
+ 33, 33, 33, 33, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+ 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+ 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+ 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+ 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32,
+ 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32,
+ 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32,
+ 33, 32, 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 32, 33, 32,
+ 32, 32, 32, 32, 32, 32, 33, 32, 32, 32, 32, 32, 32, 31, 33, 32, 32, 32,
+ 32, 32, 31, 31, 33, 32, 32, 32, 32, 32, 31, 31, 33, 32, 32, 32, 32, 32,
+ 31, 31, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30,
+ 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32,
+ 32, 32, 32, 32, 31, 30, 32, 32, 32, 32, 32, 32, 31, 30, 32, 32, 32, 32,
+ 32, 32, 31, 30 },
{ /* Chroma */
/* Size 4x4 */
33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 32, 29, 30, 29, 29, 26,
@@ -12328,21 +12329,12 @@
26, 26, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 28, 28, 28,
28, 28, 28, 28, 28, 28, 28, 27, 26, 26, 26, 26, 26, 26, 26, 26,
/* Size 4x8 */
- 33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32,
- 32, 28, 33, 31, 31, 28, 30, 28, 28, 26, 30, 28, 28, 26,
- /* Size 8x4 */
33, 33, 33, 33, 33, 33, 30, 30, 33, 33, 33, 32, 32, 31, 28, 28, 33, 33,
33, 32, 32, 31, 28, 28, 30, 29, 29, 28, 28, 28, 26, 26,
+ /* Size 8x4 */
+ 33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32,
+ 32, 28, 33, 31, 31, 28, 30, 28, 28, 26, 30, 28, 28, 26,
/* Size 8x16 */
- 32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33,
- 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33,
- 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32,
- 30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32, 29, 27,
- 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 33, 32,
- 31, 31, 31, 31, 28, 26, 31, 30, 30, 29, 29, 29, 28, 26, 31, 30, 29, 28,
- 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28,
- 27, 25,
- /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 33, 31, 31, 31, 31, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 30, 30, 30, 30, 33, 33, 33, 33,
33, 33, 33, 33, 32, 32, 32, 31, 30, 29, 29, 29, 33, 33, 33, 33, 33, 33,
@@ -12351,37 +12343,16 @@
32, 31, 29, 28, 28, 28, 31, 31, 30, 30, 30, 30, 30, 30, 29, 29, 29, 28,
28, 27, 27, 27, 29, 28, 28, 28, 28, 28, 28, 27, 27, 27, 27, 26, 26, 25,
25, 25,
+ /* Size 16x8 */
+ 32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33,
+ 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33,
+ 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32,
+ 30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32, 29, 27,
+ 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 33, 32,
+ 31, 31, 31, 31, 28, 26, 31, 30, 30, 29, 29, 29, 28, 26, 31, 30, 29, 28,
+ 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28,
+ 27, 25,
/* Size 16x32 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 28, 28, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 32, 31, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 32, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31,
- 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29,
- 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 32,
- 32, 32, 32, 32, 32, 31, 30, 28, 28, 26, 33, 33, 33, 33, 33, 32, 32, 32,
- 32, 32, 32, 31, 30, 28, 28, 26, 34, 33, 33, 33, 33, 32, 32, 32, 32, 32,
- 32, 31, 30, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
- 29, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28,
- 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26,
- 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 33, 33, 33, 32, 32, 31,
- 31, 31, 31, 31, 31, 30, 29, 28, 27, 26, 33, 32, 32, 31, 31, 31, 31, 31,
- 31, 31, 31, 29, 28, 28, 26, 25, 32, 32, 31, 31, 30, 30, 30, 30, 30, 30,
- 30, 29, 28, 27, 26, 25, 31, 31, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28,
- 28, 26, 26, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26,
- 25, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24,
- 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30,
- 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29,
- 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29, 29, 28,
- 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 30, 30, 29, 29, 28, 28, 28, 28,
- 28, 28, 28, 27, 26, 26, 24, 23,
- /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
34, 34, 34, 33, 33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12411,33 +12382,47 @@
27, 27, 26, 26, 26, 25, 25, 25, 25, 25, 25, 24, 28, 28, 28, 27, 27, 27,
27, 27, 27, 27, 27, 27, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 25, 25,
24, 24, 24, 24, 24, 24, 24, 23,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 29, 28, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 28, 28, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 32, 31, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 32, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31,
+ 30, 29, 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29,
+ 28, 27, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 31, 30, 29, 28, 27, 33, 33, 33, 33, 33, 32,
+ 32, 32, 32, 32, 32, 31, 30, 28, 28, 26, 33, 33, 33, 33, 33, 32, 32, 32,
+ 32, 32, 32, 31, 30, 28, 28, 26, 34, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+ 32, 31, 30, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31,
+ 29, 28, 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28,
+ 27, 26, 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26,
+ 34, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 34, 33, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 31, 29, 28, 27, 26, 33, 33, 33, 32, 32, 31,
+ 31, 31, 31, 31, 31, 30, 29, 28, 27, 26, 33, 32, 32, 31, 31, 31, 31, 31,
+ 31, 31, 31, 29, 28, 28, 26, 25, 32, 32, 31, 31, 30, 30, 30, 30, 30, 30,
+ 30, 29, 28, 27, 26, 25, 31, 31, 30, 30, 30, 29, 29, 29, 29, 29, 29, 28,
+ 28, 26, 26, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26,
+ 25, 24, 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24,
+ 31, 30, 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30,
+ 30, 29, 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29,
+ 29, 28, 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 31, 30, 30, 29, 29, 28,
+ 28, 28, 28, 28, 28, 28, 27, 26, 25, 24, 30, 30, 29, 29, 28, 28, 28, 28,
+ 28, 28, 28, 27, 26, 26, 24, 23,
/* Size 4x16 */
- 33, 33, 33, 30, 33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 33,
- 33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32, 32, 28, 33, 32, 32, 28,
- 33, 32, 32, 28, 33, 32, 32, 28, 32, 31, 31, 28, 31, 29, 29, 26, 30, 28,
- 28, 26, 30, 28, 28, 26, 30, 28, 28, 26,
- /* Size 16x4 */
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 31, 30, 30, 30, 33, 33,
33, 33, 33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 33, 33, 33, 33,
33, 33, 32, 32, 32, 32, 32, 31, 29, 28, 28, 28, 30, 30, 29, 29, 29, 29,
28, 28, 28, 28, 28, 28, 26, 26, 26, 26,
+ /* Size 16x4 */
+ 33, 33, 33, 30, 33, 33, 33, 30, 33, 33, 33, 29, 33, 33, 33, 29, 33, 33,
+ 33, 29, 33, 33, 33, 29, 33, 32, 32, 28, 33, 32, 32, 28, 33, 32, 32, 28,
+ 33, 32, 32, 28, 33, 32, 32, 28, 32, 31, 31, 28, 31, 29, 29, 26, 30, 28,
+ 28, 26, 30, 28, 28, 26, 30, 28, 28, 26,
/* Size 8x32 */
- 32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 29, 33, 33,
- 33, 33, 33, 33, 31, 28, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33, 33, 33,
- 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33,
- 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28,
- 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33,
- 33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32, 30, 28, 33, 33, 33, 32,
- 32, 32, 30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32,
- 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27,
- 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33,
- 32, 32, 32, 32, 29, 27, 33, 33, 32, 31, 31, 31, 29, 27, 33, 32, 31, 31,
- 31, 31, 28, 26, 32, 31, 30, 30, 30, 30, 28, 26, 31, 30, 30, 29, 29, 29,
- 28, 26, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25,
- 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30,
- 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 30, 29, 28, 28,
- 28, 28, 26, 24,
- /* Size 32x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34,
34, 34, 34, 33, 33, 32, 31, 31, 31, 31, 31, 31, 31, 30, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12452,7 +12437,23 @@
30, 30, 30, 30, 30, 30, 30, 30, 30, 29, 29, 29, 29, 29, 29, 29, 28, 28,
28, 27, 27, 27, 27, 27, 27, 26, 29, 29, 28, 28, 28, 28, 28, 28, 28, 28,
28, 28, 28, 28, 27, 27, 27, 27, 27, 27, 27, 27, 26, 26, 26, 25, 25, 25,
- 25, 25, 25, 24 },
+ 25, 25, 25, 24,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 33, 33, 31, 29, 33, 33, 33, 33, 33, 33, 31, 29, 33, 33,
+ 33, 33, 33, 33, 31, 28, 33, 33, 33, 33, 33, 33, 31, 28, 33, 33, 33, 33,
+ 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33,
+ 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28,
+ 33, 33, 33, 33, 33, 33, 30, 28, 33, 33, 33, 33, 33, 33, 30, 28, 33, 33,
+ 33, 33, 33, 33, 30, 28, 33, 33, 33, 32, 32, 32, 30, 28, 33, 33, 33, 32,
+ 32, 32, 30, 28, 34, 33, 33, 32, 32, 32, 30, 27, 34, 33, 32, 32, 32, 32,
+ 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27,
+ 34, 33, 32, 32, 32, 32, 29, 27, 34, 33, 32, 32, 32, 32, 29, 27, 34, 33,
+ 32, 32, 32, 32, 29, 27, 33, 33, 32, 31, 31, 31, 29, 27, 33, 32, 31, 31,
+ 31, 31, 28, 26, 32, 31, 30, 30, 30, 30, 28, 26, 31, 30, 30, 29, 29, 29,
+ 28, 26, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25,
+ 31, 30, 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 31, 30,
+ 29, 28, 28, 28, 27, 25, 31, 30, 29, 28, 28, 28, 27, 25, 30, 29, 28, 28,
+ 28, 28, 26, 24 },
},
{
{ /* Luma */
@@ -12538,22 +12539,13 @@
32, 32, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
/* Size 4x8 */
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
- 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
- /* Size 8x4 */
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+ /* Size 8x4 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+ 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
/* Size 8x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 32,
- 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32,
- /* Size 16x8 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
@@ -12561,37 +12553,16 @@
32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32,
- /* Size 16x32 */
+ /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
- 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
- 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
- 32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 32x16 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 33, 33, 33, 32,
+ 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+ 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32,
+ /* Size 16x32 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12621,35 +12592,49 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 32x16 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33,
+ 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32, 32, 32, 32, 32,
/* Size 4x16 */
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 33, 32,
- 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
- 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
- 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
- /* Size 16x4 */
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 16x4 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 33, 32,
+ 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
+ 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32, 32, 32, 33, 32,
+ 32, 32, 33, 32, 32, 32, 33, 32, 32, 32,
/* Size 8x32 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
- 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
- 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
- 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
- 32, 32, 32, 32,
- /* Size 32x8 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
@@ -12662,6 +12647,22 @@
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+ 32, 32, 33, 33, 33, 32, 32, 32, 32, 32, 33, 33, 33, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+ 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
+ 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32,
+ 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32,
+ 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 33, 32, 32, 32, 32, 32, 32, 33, 33, 32, 32,
32, 32, 32, 32 },
{ /* Chroma */
/* Size 4x4 */
@@ -12746,21 +12747,12 @@
32, 32, 34, 34, 34, 34, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
/* Size 4x8 */
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 34, 33, 32, 32,
- /* Size 8x4 */
33, 33, 33, 33, 33, 33, 33, 34, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 32, 32, 33, 33, 33, 33, 33, 33, 32, 32,
+ /* Size 8x4 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 34, 33, 32, 32,
/* Size 8x16 */
- 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33,
- 33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 32, 32,
- 32, 32,
- /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12769,37 +12761,16 @@
33, 33, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
32, 32,
- /* Size 16x32 */
+ /* Size 16x8 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
- 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
- 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
- 34, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 33,
- 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33,
- 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33,
- 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33, 33, 33,
- 32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 32x16 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33,
+ 33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 32, 32,
+ 32, 32,
+ /* Size 16x32 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12829,17 +12800,7 @@
33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32,
- /* Size 4x16 */
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 33, 33,
- 32, 32, 33, 33, 32, 32, 34, 33, 32, 32,
- /* Size 16x4 */
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
- /* Size 8x32 */
+ /* Size 32x16 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12850,12 +12811,36 @@
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
- 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32,
- 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32,
- 34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33,
- 33, 33, 32, 32, 32, 32, 34, 33, 33, 33, 32, 32, 32, 32, 34, 33, 33, 33,
- 32, 32, 32, 32,
- /* Size 32x8 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32,
+ 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+ 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32,
+ 34, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 33,
+ 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33,
+ 33, 33, 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33,
+ 33, 33, 32, 32, 32, 32, 32, 32, 32, 32, 34, 34, 33, 33, 33, 33, 33, 33,
+ 32, 32, 32, 32, 32, 32, 32, 32,
+ /* Size 4x16 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 32, 32, 32, 32,
+ /* Size 16x4 */
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 33, 33,
+ 32, 32, 33, 33, 32, 32, 34, 33, 32, 32,
+ /* Size 8x32 */
32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
@@ -12870,6 +12855,22 @@
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32,
32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 32, 32,
+ 32, 32, 32, 32,
+ /* Size 32x8 */
+ 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
+ 33, 33, 33, 33, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32,
+ 32, 32, 33, 33, 33, 33, 33, 32, 32, 32, 33, 33, 33, 33, 33, 32, 32, 32,
+ 34, 33, 33, 33, 33, 32, 32, 32, 34, 33, 33, 33, 33, 32, 32, 32, 34, 33,
+ 33, 33, 32, 32, 32, 32, 34, 33, 33, 33, 32, 32, 32, 32, 34, 33, 33, 33,
32, 32, 32, 32 },
},
-};
+};
\ No newline at end of file
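
Note on the hunks above: within each quantizer-matrix entry, the data blocks are swapped between every pair of transposed sizes (4x16 with 16x4, 8x32 with 32x8, 16x32 with 32x16) while the /* Size ... */ labels stay put, so each label now fronts the block previously stored under its transpose. That is consistent with a row/column storage-convention fix rather than new coefficient values. A minimal sketch of the relationship, assuming each W x H block is the row-major transpose of its H x W counterpart (an assumption; the patch itself only moves data, and the element type below is illustrative):

    #include <stdint.h>

    /* Illustrative only: map a row-major w x h table onto its transpose. */
    static void transpose_table(const int16_t *src, int w, int h, int16_t *dst) {
      for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
          dst[c * h + r] = src[r * w + c];  /* (r, c) in src -> (c, r) in dst */
    }
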
diff --git a/av1/common/reconinter.c b/av1/common/reconinter.c
index 11fd257..602fab7 100644
--- a/av1/common/reconinter.c
+++ b/av1/common/reconinter.c
@@ -30,11 +30,11 @@
// This function will determine whether or not to create a warped
// prediction.
-int av1_allow_warp(const MB_MODE_INFO *const mbmi,
- const WarpTypesAllowed *const warp_types,
- const WarpedMotionParams *const gm_params,
- int build_for_obmc, const struct scale_factors *const sf,
- WarpedMotionParams *final_warp_params) {
+static int allow_warp(const MB_MODE_INFO *const mbmi,
+ const WarpTypesAllowed *const warp_types,
+ const WarpedMotionParams *const gm_params,
+ int build_for_obmc, const struct scale_factors *const sf,
+ WarpedMotionParams *final_warp_params) {
// Note: As per the spec, we must test the fixed point scales here, which are
// at a higher precision (1 << 14) than the xs and ys in subpel_params (that
// have 1 << 10 precision).
@@ -65,9 +65,9 @@
if (xd->cur_frame_force_integer_mv) return;
- if (av1_allow_warp(mi, warp_types, &xd->global_motion[mi->ref_frame[ref]], 0,
- inter_pred_params->scale_factors,
- &inter_pred_params->warp_params)) {
+ if (allow_warp(mi, warp_types, &xd->global_motion[mi->ref_frame[ref]], 0,
+ inter_pred_params->scale_factors,
+ &inter_pred_params->warp_params)) {
inter_pred_params->mode = WARP_PRED;
}
}
@@ -819,11 +819,11 @@
#if DISABLE_CHROMA_U8X8_OBMC
case BLOCK_4X4:
case BLOCK_8X4:
- case BLOCK_4X8: return 1; break;
+ case BLOCK_4X8: return 1;
#else
case BLOCK_4X4:
case BLOCK_8X4:
- case BLOCK_4X8: return dir == 0; break;
+ case BLOCK_4X8: return dir == 0;
#endif
default: return 0;
}
@@ -832,8 +832,6 @@
void av1_modify_neighbor_predictor_for_obmc(MB_MODE_INFO *mbmi) {
mbmi->ref_frame[1] = NONE_FRAME;
mbmi->interinter_comp.type = COMPOUND_AVERAGE;
-
- return;
}
struct obmc_inter_pred_ctxt {
diff --git a/av1/common/reconinter.h b/av1/common/reconinter.h
index da7b84a..0b93d3b 100644
--- a/av1/common/reconinter.h
+++ b/av1/common/reconinter.h
@@ -482,12 +482,6 @@
const uint8_t *inter_pred, int inter_stride,
const uint8_t *intra_pred, int intra_stride);
-int av1_allow_warp(const MB_MODE_INFO *const mbmi,
- const WarpTypesAllowed *const warp_types,
- const WarpedMotionParams *const gm_params,
- int build_for_obmc, const struct scale_factors *const sf,
- WarpedMotionParams *final_warp_params);
-
#ifdef __cplusplus
} // extern "C"
#endif
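
Taken together, the reconinter.c and reconinter.h hunks demote av1_allow_warp() to a file-local helper: the definition is renamed allow_warp and given internal linkage, its single in-file caller is updated, and the prototype is removed from the header, shrinking the library's exported symbol surface. A minimal sketch of the pattern with hypothetical names (not the real reconinter API):

    /* foo.c: the helper is referenced only in this translation unit, so
     * internal linkage keeps it out of the exported symbol table. */
    static int helper(int x) { return x + 1; }  /* hypothetical */

    int public_entry(int x) {  /* still declared in foo.h */
      return helper(x);
    }
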
diff --git a/av1/common/reconintra.c b/av1/common/reconintra.c
index d5f806e..3704b8a 100644
--- a/av1/common/reconintra.c
+++ b/av1/common/reconintra.c
@@ -1683,7 +1683,7 @@
#endif
CFL_CTX *const cfl = &xd->cfl;
CFL_PRED_TYPE pred_plane = get_cfl_pred_type(plane);
- if (cfl->dc_pred_is_cached[pred_plane] == 0) {
+ if (!cfl->dc_pred_is_cached[pred_plane]) {
av1_predict_intra_block(xd, seq_params->sb_size,
seq_params->enable_intra_edge_filter, pd->width,
pd->height, tx_size, mode, angle_delta,
@@ -1691,7 +1691,7 @@
dst, dst_stride, blk_col, blk_row, plane);
if (cfl->use_dc_pred_cache) {
cfl_store_dc_pred(xd, dst, pred_plane, tx_size_wide[tx_size]);
- cfl->dc_pred_is_cached[pred_plane] = 1;
+ cfl->dc_pred_is_cached[pred_plane] = true;
}
} else {
cfl_load_dc_pred(xd, dst, dst_stride, tx_size, pred_plane);
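
The reconintra.c hunks read and write the per-plane dc_pred_is_cached flags as booleans instead of 0/1 integers. The surrounding logic is a compute-once cache: run the intra prediction the first time through, store the DC prediction, and reload it on later calls. A minimal sketch of that pattern, assuming one bool flag per prediction plane (simplified names, not the real CFL_CTX layout):

    #include <stdbool.h>

    typedef struct {
      bool cached[2];   /* one flag per chroma prediction plane */
      int dc_value[2];  /* stand-in for the stored DC prediction */
    } DcPredCache;

    static int get_dc_pred(DcPredCache *c, int plane, int (*compute)(void)) {
      if (!c->cached[plane]) {
        c->dc_value[plane] = compute();  /* first use: compute and store */
        c->cached[plane] = true;
      }
      return c->dc_value[plane];  /* later uses: serve from the cache */
    }
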
diff --git a/av1/common/resize.c b/av1/common/resize.c
index 242930c..f4bfcd0 100644
--- a/av1/common/resize.c
+++ b/av1/common/resize.c
@@ -20,6 +20,7 @@
#include "config/aom_config.h"
#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_ports/mem.h"
#include "aom_scale/aom_scale.h"
#include "av1/common/common.h"
@@ -1369,7 +1370,7 @@
AV1_COMMON *cm, YV12_BUFFER_CONFIG *unscaled, YV12_BUFFER_CONFIG *scaled,
const InterpFilter filter, const int phase, const bool use_optimized_scaler,
const bool for_psnr, const int border_in_pixels,
- const bool alloc_y_buffer_8bit) {
+ const int num_pyramid_levels) {
// If scaling is performed for the sole purpose of calculating PSNR, then our
// target dimensions are superres upscaled width/height. Otherwise our target
// dimensions are coded width/height.
@@ -1389,7 +1390,7 @@
scaled, scaled_width, scaled_height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
border_in_pixels, cm->features.byte_alignment, NULL, NULL, NULL,
- alloc_y_buffer_8bit, 0))
+ num_pyramid_levels, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate scaled buffer");
@@ -1421,9 +1422,8 @@
}
#endif
return scaled;
- } else {
- return unscaled;
}
+ return unscaled;
}
// Calculates the scaled dimension given the original dimension and the scale
@@ -1481,7 +1481,8 @@
// TODO(afergs): Look for in-place upscaling
// TODO(afergs): aom_ vs av1_ functions? Which can I use?
// Upscale decoded image.
-void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool) {
+void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool,
+ int num_pyramid_levels) {
const int num_planes = av1_num_planes(cm);
if (!av1_superres_scaled(cm)) return;
const SequenceHeader *const seq_params = cm->seq_params;
@@ -1496,7 +1497,7 @@
if (aom_alloc_frame_buffer(
&copy_buffer, aligned_width, cm->height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
- AOM_BORDER_IN_PIXELS, byte_alignment, 0))
+ AOM_BORDER_IN_PIXELS, byte_alignment, 0, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate copy buffer for superres upscaling");
@@ -1528,7 +1529,8 @@
frame_to_show, cm->superres_upscaled_width,
cm->superres_upscaled_height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
- AOM_BORDER_IN_PIXELS, byte_alignment, fb, cb, cb_priv, 0, 0)) {
+ AOM_BORDER_IN_PIXELS, byte_alignment, fb, cb, cb_priv,
+ num_pyramid_levels, 0)) {
unlock_buffer_pool(pool);
aom_internal_error(
cm->error, AOM_CODEC_MEM_ERROR,
@@ -1545,7 +1547,7 @@
frame_to_show, cm->superres_upscaled_width,
cm->superres_upscaled_height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
- AOM_BORDER_IN_PIXELS, byte_alignment, 0))
+ AOM_BORDER_IN_PIXELS, byte_alignment, num_pyramid_levels, 0))
aom_internal_error(
cm->error, AOM_CODEC_MEM_ERROR,
"Failed to reallocate current frame buffer for superres upscaling");
diff --git a/av1/common/resize.h b/av1/common/resize.h
index 4e8ee0f..5927d8e 100644
--- a/av1/common/resize.h
+++ b/av1/common/resize.h
@@ -75,7 +75,7 @@
AV1_COMMON *cm, YV12_BUFFER_CONFIG *unscaled, YV12_BUFFER_CONFIG *scaled,
const InterpFilter filter, const int phase, const bool use_optimized_scaler,
const bool for_psnr, const int border_in_pixels,
- const bool alloc_y_buffer_8bit);
+ const int num_pyramid_levels);
void av1_resize_and_extend_frame_nonnormative(const YV12_BUFFER_CONFIG *src,
YV12_BUFFER_CONFIG *dst, int bd,
@@ -95,7 +95,8 @@
// denominator.
void av1_calculate_unscaled_superres_size(int *width, int *height, int denom);
-void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool);
+void av1_superres_upscale(AV1_COMMON *cm, BufferPool *const pool,
+ int num_pyramid_levels);
// Returns 1 if a superres upscaled frame is scaled and 0 otherwise.
static INLINE int av1_superres_scaled(const AV1_COMMON *cm) {
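
The resize.c and resize.h hunks replace the bool alloc_y_buffer_8bit parameter with an int num_pyramid_levels and thread it through av1_superres_upscale() into the frame-buffer allocations, letting callers state how many image-pyramid levels to allocate with each frame; the new corner_detect.h include suggests the pyramid feeds the flow-estimation code. A hypothetical call site after the change, assuming 0 requests no pyramid:

    /* Illustrative only: a decode-side superres upscale that needs no
     * multi-scale pyramid would pass 0 for the new parameter. */
    av1_superres_upscale(cm, pool, /*num_pyramid_levels=*/0);
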
diff --git a/av1/common/scan.c b/av1/common/scan.c
index b86068d..0943579 100644
--- a/av1/common/scan.c
+++ b/av1/common/scan.c
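
As with the quantizer tables earlier in the patch, the scan.c hunks swap the stored contents between each transposed pair of tables (default_scan_4x8 with default_scan_8x4, and the mcol_ with the mrow_ variants of the same size), which again points to a row/column convention fix rather than new scan orders. Whatever the convention, every table must remain a permutation of the coefficient indices; a small self-contained check of that invariant, with an assumed helper name:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical sanity check (not part of the patch): true iff scan[]
     * visits each coefficient index 0..n-1 exactly once. */
    static bool scan_is_permutation(const int16_t *scan, int n) {
      bool seen[512] = { false };  /* largest table in this file is 16x32 = 512 */
      if (n < 1 || n > 512) return false;
      for (int i = 0; i < n; ++i) {
        if (scan[i] < 0 || scan[i] >= n || seen[scan[i]]) return false;
        seen[scan[i]] = true;
      }
      return true;
    }
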
@@ -14,112 +14,91 @@
#include "av1/common/common_data.h"
#include "av1/common/scan.h"
-DECLARE_ALIGNED(16, static const int16_t,
- default_scan_4x4[16]) = { 0, 1, 4, 8, 5, 2, 3, 6,
- 9, 12, 13, 10, 7, 11, 14, 15 };
-
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x4[16]) = {
- 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
+DECLARE_ALIGNED(16, static const int16_t, default_scan_4x4[16]) = {
+ 0, 4, 1, 2, 5, 8, 12, 9, 6, 3, 7, 10, 13, 14, 11, 15,
};
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x4[16]) = {
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x4[16]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
};
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x4[16]) = {
+ 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
+};
+
DECLARE_ALIGNED(16, static const int16_t, default_scan_4x8[32]) = {
- 0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16, 11, 14,
- 17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 27, 30, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x8[32]) = {
- 0, 4, 8, 12, 16, 20, 24, 28, 1, 5, 9, 13, 17, 21, 25, 29,
- 2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x8[32]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, default_scan_8x4[32]) = {
0, 8, 1, 16, 9, 2, 24, 17, 10, 3, 25, 18, 11, 4, 26, 19,
12, 5, 27, 20, 13, 6, 28, 21, 14, 7, 29, 22, 15, 30, 23, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x4[32]) = {
- 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
- 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x4[32]) = {
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x8[32]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_4x16[64]) = {
- 0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16, 11, 14,
- 17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 32, 27, 30,
- 33, 36, 31, 34, 37, 40, 35, 38, 41, 44, 39, 42, 45, 48, 43, 46,
- 49, 52, 47, 50, 53, 56, 51, 54, 57, 60, 55, 58, 61, 59, 62, 63,
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x8[32]) = {
+ 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
+ 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_16x4[64]) = {
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x4[32]) = {
+ 0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16, 11, 14,
+ 17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 27, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x4[32]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x4[32]) = {
+ 0, 4, 8, 12, 16, 20, 24, 28, 1, 5, 9, 13, 17, 21, 25, 29,
+ 2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_4x16[64]) = {
0, 16, 1, 32, 17, 2, 48, 33, 18, 3, 49, 34, 19, 4, 50, 35,
20, 5, 51, 36, 21, 6, 52, 37, 22, 7, 53, 38, 23, 8, 54, 39,
24, 9, 55, 40, 25, 10, 56, 41, 26, 11, 57, 42, 27, 12, 58, 43,
28, 13, 59, 44, 29, 14, 60, 45, 30, 15, 61, 46, 31, 62, 47, 63,
};
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x4[64]) = {
+ 0, 1, 4, 2, 5, 8, 3, 6, 9, 12, 7, 10, 13, 16, 11, 14,
+ 17, 20, 15, 18, 21, 24, 19, 22, 25, 28, 23, 26, 29, 32, 27, 30,
+ 33, 36, 31, 34, 37, 40, 35, 38, 41, 44, 39, 42, 45, 48, 43, 46,
+ 49, 52, 47, 50, 53, 56, 51, 54, 57, 60, 55, 58, 61, 59, 62, 63,
+};
+
DECLARE_ALIGNED(16, static const int16_t, mrow_scan_4x16[64]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
- 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
- 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x4[64]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
- 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
- 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x16[64]) = {
- 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
- 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61,
- 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62,
- 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x4[64]) = {
0, 16, 32, 48, 1, 17, 33, 49, 2, 18, 34, 50, 3, 19, 35, 51,
4, 20, 36, 52, 5, 21, 37, 53, 6, 22, 38, 54, 7, 23, 39, 55,
8, 24, 40, 56, 9, 25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59,
12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_8x32[256]) = {
- 0, 1, 8, 2, 9, 16, 3, 10, 17, 24, 4, 11, 18, 25, 32,
- 5, 12, 19, 26, 33, 40, 6, 13, 20, 27, 34, 41, 48, 7, 14,
- 21, 28, 35, 42, 49, 56, 15, 22, 29, 36, 43, 50, 57, 64, 23,
- 30, 37, 44, 51, 58, 65, 72, 31, 38, 45, 52, 59, 66, 73, 80,
- 39, 46, 53, 60, 67, 74, 81, 88, 47, 54, 61, 68, 75, 82, 89,
- 96, 55, 62, 69, 76, 83, 90, 97, 104, 63, 70, 77, 84, 91, 98,
- 105, 112, 71, 78, 85, 92, 99, 106, 113, 120, 79, 86, 93, 100, 107,
- 114, 121, 128, 87, 94, 101, 108, 115, 122, 129, 136, 95, 102, 109, 116,
- 123, 130, 137, 144, 103, 110, 117, 124, 131, 138, 145, 152, 111, 118, 125,
- 132, 139, 146, 153, 160, 119, 126, 133, 140, 147, 154, 161, 168, 127, 134,
- 141, 148, 155, 162, 169, 176, 135, 142, 149, 156, 163, 170, 177, 184, 143,
- 150, 157, 164, 171, 178, 185, 192, 151, 158, 165, 172, 179, 186, 193, 200,
- 159, 166, 173, 180, 187, 194, 201, 208, 167, 174, 181, 188, 195, 202, 209,
- 216, 175, 182, 189, 196, 203, 210, 217, 224, 183, 190, 197, 204, 211, 218,
- 225, 232, 191, 198, 205, 212, 219, 226, 233, 240, 199, 206, 213, 220, 227,
- 234, 241, 248, 207, 214, 221, 228, 235, 242, 249, 215, 222, 229, 236, 243,
- 250, 223, 230, 237, 244, 251, 231, 238, 245, 252, 239, 246, 253, 247, 254,
- 255,
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x4[64]) = {
+ 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
+ 1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61,
+ 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62,
+ 3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_32x8[256]) = {
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_4x16[64]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x4[64]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x32[256]) = {
0, 32, 1, 64, 33, 2, 96, 65, 34, 3, 128, 97, 66, 35, 4,
160, 129, 98, 67, 36, 5, 192, 161, 130, 99, 68, 37, 6, 224, 193,
162, 131, 100, 69, 38, 7, 225, 194, 163, 132, 101, 70, 39, 8, 226,
@@ -140,49 +119,47 @@
255,
};
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x32[256]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x8[256]) = {
+ 0, 1, 8, 2, 9, 16, 3, 10, 17, 24, 4, 11, 18, 25, 32,
+ 5, 12, 19, 26, 33, 40, 6, 13, 20, 27, 34, 41, 48, 7, 14,
+ 21, 28, 35, 42, 49, 56, 15, 22, 29, 36, 43, 50, 57, 64, 23,
+ 30, 37, 44, 51, 58, 65, 72, 31, 38, 45, 52, 59, 66, 73, 80,
+ 39, 46, 53, 60, 67, 74, 81, 88, 47, 54, 61, 68, 75, 82, 89,
+ 96, 55, 62, 69, 76, 83, 90, 97, 104, 63, 70, 77, 84, 91, 98,
+ 105, 112, 71, 78, 85, 92, 99, 106, 113, 120, 79, 86, 93, 100, 107,
+ 114, 121, 128, 87, 94, 101, 108, 115, 122, 129, 136, 95, 102, 109, 116,
+ 123, 130, 137, 144, 103, 110, 117, 124, 131, 138, 145, 152, 111, 118, 125,
+ 132, 139, 146, 153, 160, 119, 126, 133, 140, 147, 154, 161, 168, 127, 134,
+ 141, 148, 155, 162, 169, 176, 135, 142, 149, 156, 163, 170, 177, 184, 143,
+ 150, 157, 164, 171, 178, 185, 192, 151, 158, 165, 172, 179, 186, 193, 200,
+ 159, 166, 173, 180, 187, 194, 201, 208, 167, 174, 181, 188, 195, 202, 209,
+ 216, 175, 182, 189, 196, 203, 210, 217, 224, 183, 190, 197, 204, 211, 218,
+ 225, 232, 191, 198, 205, 212, 219, 226, 233, 240, 199, 206, 213, 220, 227,
+ 234, 241, 248, 207, 214, 221, 228, 235, 242, 249, 215, 222, 229, 236, 243,
+ 250, 223, 230, 237, 244, 251, 231, 238, 245, 252, 239, 246, 253, 247, 254,
255,
};
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x32[256]) = {
+ 0, 32, 64, 96, 128, 160, 192, 224, 1, 33, 65, 97, 129, 161, 193, 225,
+ 2, 34, 66, 98, 130, 162, 194, 226, 3, 35, 67, 99, 131, 163, 195, 227,
+ 4, 36, 68, 100, 132, 164, 196, 228, 5, 37, 69, 101, 133, 165, 197, 229,
+ 6, 38, 70, 102, 134, 166, 198, 230, 7, 39, 71, 103, 135, 167, 199, 231,
+ 8, 40, 72, 104, 136, 168, 200, 232, 9, 41, 73, 105, 137, 169, 201, 233,
+ 10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
+ 12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
+ 14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
+ 16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
+ 18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
+ 20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
+ 22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
+ 24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
+ 26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
+ 28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
+ 30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
+};
+
DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x8[256]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x32[256]) = {
0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112,
120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232,
240, 248, 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97,
@@ -203,47 +180,81 @@
255,
};
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x32[256]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255,
+};
+
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x8[256]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 1, 33, 65, 97, 129, 161, 193, 225,
- 2, 34, 66, 98, 130, 162, 194, 226, 3, 35, 67, 99, 131, 163, 195, 227,
- 4, 36, 68, 100, 132, 164, 196, 228, 5, 37, 69, 101, 133, 165, 197, 229,
- 6, 38, 70, 102, 134, 166, 198, 230, 7, 39, 71, 103, 135, 167, 199, 231,
- 8, 40, 72, 104, 136, 168, 200, 232, 9, 41, 73, 105, 137, 169, 201, 233,
- 10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
- 12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
- 14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
- 16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
- 18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
- 20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
- 22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
- 24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
- 26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
- 28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
- 30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255,
};
DECLARE_ALIGNED(16, static const int16_t, default_scan_8x8[64]) = {
- 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5,
- 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13, 6, 7, 14, 21, 28,
- 35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
- 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+ 0, 8, 1, 2, 9, 16, 24, 17, 10, 3, 4, 11, 18, 25, 32, 40,
+ 33, 26, 19, 12, 5, 6, 13, 20, 27, 34, 41, 48, 56, 49, 42, 35,
+ 28, 21, 14, 7, 15, 22, 29, 36, 43, 50, 57, 58, 51, 44, 37, 30,
+ 23, 31, 38, 45, 52, 59, 60, 53, 46, 39, 47, 54, 61, 62, 55, 63,
};
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x8[64]) = {
- 0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, 25, 33, 41, 49, 57,
- 2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
- 4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
- 6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x8[64]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
};
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x8[64]) = {
+ 0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, 25, 33, 41, 49, 57,
+ 2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
+ 4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
+ 6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
+};
+
DECLARE_ALIGNED(16, static const int16_t, default_scan_8x16[128]) = {
+ 0, 16, 1, 32, 17, 2, 48, 33, 18, 3, 64, 49, 34, 19, 4, 80,
+ 65, 50, 35, 20, 5, 96, 81, 66, 51, 36, 21, 6, 112, 97, 82, 67,
+ 52, 37, 22, 7, 113, 98, 83, 68, 53, 38, 23, 8, 114, 99, 84, 69,
+ 54, 39, 24, 9, 115, 100, 85, 70, 55, 40, 25, 10, 116, 101, 86, 71,
+ 56, 41, 26, 11, 117, 102, 87, 72, 57, 42, 27, 12, 118, 103, 88, 73,
+ 58, 43, 28, 13, 119, 104, 89, 74, 59, 44, 29, 14, 120, 105, 90, 75,
+ 60, 45, 30, 15, 121, 106, 91, 76, 61, 46, 31, 122, 107, 92, 77, 62,
+ 47, 123, 108, 93, 78, 63, 124, 109, 94, 79, 125, 110, 95, 126, 111, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, default_scan_16x8[128]) = {
0, 1, 8, 2, 9, 16, 3, 10, 17, 24, 4, 11, 18, 25, 32,
5, 12, 19, 26, 33, 40, 6, 13, 20, 27, 34, 41, 48, 7, 14,
21, 28, 35, 42, 49, 56, 15, 22, 29, 36, 43, 50, 57, 64, 23,
@@ -255,29 +266,31 @@
117, 124, 111, 118, 125, 119, 126, 127,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_16x8[128]) = {
- 0, 16, 1, 32, 17, 2, 48, 33, 18, 3, 64, 49, 34, 19, 4, 80,
- 65, 50, 35, 20, 5, 96, 81, 66, 51, 36, 21, 6, 112, 97, 82, 67,
- 52, 37, 22, 7, 113, 98, 83, 68, 53, 38, 23, 8, 114, 99, 84, 69,
- 54, 39, 24, 9, 115, 100, 85, 70, 55, 40, 25, 10, 116, 101, 86, 71,
- 56, 41, 26, 11, 117, 102, 87, 72, 57, 42, 27, 12, 118, 103, 88, 73,
- 58, 43, 28, 13, 119, 104, 89, 74, 59, 44, 29, 14, 120, 105, 90, 75,
- 60, 45, 30, 15, 121, 106, 91, 76, 61, 46, 31, 122, 107, 92, 77, 62,
- 47, 123, 108, 93, 78, 63, 124, 109, 94, 79, 125, 110, 95, 126, 111, 127,
-};
-
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_8x16[128]) = {
- 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120,
- 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121,
- 2, 10, 18, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98, 106, 114, 122,
- 3, 11, 19, 27, 35, 43, 51, 59, 67, 75, 83, 91, 99, 107, 115, 123,
- 4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124,
- 5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93, 101, 109, 117, 125,
- 6, 14, 22, 30, 38, 46, 54, 62, 70, 78, 86, 94, 102, 110, 118, 126,
- 7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, 95, 103, 111, 119, 127,
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127,
};
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x8[128]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x16[128]) = {
0, 16, 32, 48, 64, 80, 96, 112, 1, 17, 33, 49, 65, 81, 97, 113,
2, 18, 34, 50, 66, 82, 98, 114, 3, 19, 35, 51, 67, 83, 99, 115,
4, 20, 36, 52, 68, 84, 100, 116, 5, 21, 37, 53, 69, 85, 101, 117,
@@ -288,69 +301,18 @@
14, 30, 46, 62, 78, 94, 110, 126, 15, 31, 47, 63, 79, 95, 111, 127,
};
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_8x16[128]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127,
-};
-
DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x8[128]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127,
+ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120,
+ 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121,
+ 2, 10, 18, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98, 106, 114, 122,
+ 3, 11, 19, 27, 35, 43, 51, 59, 67, 75, 83, 91, 99, 107, 115, 123,
+ 4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124,
+ 5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93, 101, 109, 117, 125,
+ 6, 14, 22, 30, 38, 46, 54, 62, 70, 78, 86, 94, 102, 110, 118, 126,
+ 7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, 95, 103, 111, 119, 127,
};
DECLARE_ALIGNED(16, static const int16_t, default_scan_16x32[512]) = {
- 0, 1, 16, 2, 17, 32, 3, 18, 33, 48, 4, 19, 34, 49, 64,
- 5, 20, 35, 50, 65, 80, 6, 21, 36, 51, 66, 81, 96, 7, 22,
- 37, 52, 67, 82, 97, 112, 8, 23, 38, 53, 68, 83, 98, 113, 128,
- 9, 24, 39, 54, 69, 84, 99, 114, 129, 144, 10, 25, 40, 55, 70,
- 85, 100, 115, 130, 145, 160, 11, 26, 41, 56, 71, 86, 101, 116, 131,
- 146, 161, 176, 12, 27, 42, 57, 72, 87, 102, 117, 132, 147, 162, 177,
- 192, 13, 28, 43, 58, 73, 88, 103, 118, 133, 148, 163, 178, 193, 208,
- 14, 29, 44, 59, 74, 89, 104, 119, 134, 149, 164, 179, 194, 209, 224,
- 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225,
- 240, 31, 46, 61, 76, 91, 106, 121, 136, 151, 166, 181, 196, 211, 226,
- 241, 256, 47, 62, 77, 92, 107, 122, 137, 152, 167, 182, 197, 212, 227,
- 242, 257, 272, 63, 78, 93, 108, 123, 138, 153, 168, 183, 198, 213, 228,
- 243, 258, 273, 288, 79, 94, 109, 124, 139, 154, 169, 184, 199, 214, 229,
- 244, 259, 274, 289, 304, 95, 110, 125, 140, 155, 170, 185, 200, 215, 230,
- 245, 260, 275, 290, 305, 320, 111, 126, 141, 156, 171, 186, 201, 216, 231,
- 246, 261, 276, 291, 306, 321, 336, 127, 142, 157, 172, 187, 202, 217, 232,
- 247, 262, 277, 292, 307, 322, 337, 352, 143, 158, 173, 188, 203, 218, 233,
- 248, 263, 278, 293, 308, 323, 338, 353, 368, 159, 174, 189, 204, 219, 234,
- 249, 264, 279, 294, 309, 324, 339, 354, 369, 384, 175, 190, 205, 220, 235,
- 250, 265, 280, 295, 310, 325, 340, 355, 370, 385, 400, 191, 206, 221, 236,
- 251, 266, 281, 296, 311, 326, 341, 356, 371, 386, 401, 416, 207, 222, 237,
- 252, 267, 282, 297, 312, 327, 342, 357, 372, 387, 402, 417, 432, 223, 238,
- 253, 268, 283, 298, 313, 328, 343, 358, 373, 388, 403, 418, 433, 448, 239,
- 254, 269, 284, 299, 314, 329, 344, 359, 374, 389, 404, 419, 434, 449, 464,
- 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465,
- 480, 271, 286, 301, 316, 331, 346, 361, 376, 391, 406, 421, 436, 451, 466,
- 481, 496, 287, 302, 317, 332, 347, 362, 377, 392, 407, 422, 437, 452, 467,
- 482, 497, 303, 318, 333, 348, 363, 378, 393, 408, 423, 438, 453, 468, 483,
- 498, 319, 334, 349, 364, 379, 394, 409, 424, 439, 454, 469, 484, 499, 335,
- 350, 365, 380, 395, 410, 425, 440, 455, 470, 485, 500, 351, 366, 381, 396,
- 411, 426, 441, 456, 471, 486, 501, 367, 382, 397, 412, 427, 442, 457, 472,
- 487, 502, 383, 398, 413, 428, 443, 458, 473, 488, 503, 399, 414, 429, 444,
- 459, 474, 489, 504, 415, 430, 445, 460, 475, 490, 505, 431, 446, 461, 476,
- 491, 506, 447, 462, 477, 492, 507, 463, 478, 493, 508, 479, 494, 509, 495,
- 510, 511,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, default_scan_32x16[512]) = {
0, 32, 1, 64, 33, 2, 96, 65, 34, 3, 128, 97, 66, 35, 4,
160, 129, 98, 67, 36, 5, 192, 161, 130, 99, 68, 37, 6, 224, 193,
162, 131, 100, 69, 38, 7, 256, 225, 194, 163, 132, 101, 70, 39, 8,
@@ -388,7 +350,156 @@
479, 511,
};
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x16[512]) = {
+ 0, 1, 16, 2, 17, 32, 3, 18, 33, 48, 4, 19, 34, 49, 64,
+ 5, 20, 35, 50, 65, 80, 6, 21, 36, 51, 66, 81, 96, 7, 22,
+ 37, 52, 67, 82, 97, 112, 8, 23, 38, 53, 68, 83, 98, 113, 128,
+ 9, 24, 39, 54, 69, 84, 99, 114, 129, 144, 10, 25, 40, 55, 70,
+ 85, 100, 115, 130, 145, 160, 11, 26, 41, 56, 71, 86, 101, 116, 131,
+ 146, 161, 176, 12, 27, 42, 57, 72, 87, 102, 117, 132, 147, 162, 177,
+ 192, 13, 28, 43, 58, 73, 88, 103, 118, 133, 148, 163, 178, 193, 208,
+ 14, 29, 44, 59, 74, 89, 104, 119, 134, 149, 164, 179, 194, 209, 224,
+ 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225,
+ 240, 31, 46, 61, 76, 91, 106, 121, 136, 151, 166, 181, 196, 211, 226,
+ 241, 256, 47, 62, 77, 92, 107, 122, 137, 152, 167, 182, 197, 212, 227,
+ 242, 257, 272, 63, 78, 93, 108, 123, 138, 153, 168, 183, 198, 213, 228,
+ 243, 258, 273, 288, 79, 94, 109, 124, 139, 154, 169, 184, 199, 214, 229,
+ 244, 259, 274, 289, 304, 95, 110, 125, 140, 155, 170, 185, 200, 215, 230,
+ 245, 260, 275, 290, 305, 320, 111, 126, 141, 156, 171, 186, 201, 216, 231,
+ 246, 261, 276, 291, 306, 321, 336, 127, 142, 157, 172, 187, 202, 217, 232,
+ 247, 262, 277, 292, 307, 322, 337, 352, 143, 158, 173, 188, 203, 218, 233,
+ 248, 263, 278, 293, 308, 323, 338, 353, 368, 159, 174, 189, 204, 219, 234,
+ 249, 264, 279, 294, 309, 324, 339, 354, 369, 384, 175, 190, 205, 220, 235,
+ 250, 265, 280, 295, 310, 325, 340, 355, 370, 385, 400, 191, 206, 221, 236,
+ 251, 266, 281, 296, 311, 326, 341, 356, 371, 386, 401, 416, 207, 222, 237,
+ 252, 267, 282, 297, 312, 327, 342, 357, 372, 387, 402, 417, 432, 223, 238,
+ 253, 268, 283, 298, 313, 328, 343, 358, 373, 388, 403, 418, 433, 448, 239,
+ 254, 269, 284, 299, 314, 329, 344, 359, 374, 389, 404, 419, 434, 449, 464,
+ 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465,
+ 480, 271, 286, 301, 316, 331, 346, 361, 376, 391, 406, 421, 436, 451, 466,
+ 481, 496, 287, 302, 317, 332, 347, 362, 377, 392, 407, 422, 437, 452, 467,
+ 482, 497, 303, 318, 333, 348, 363, 378, 393, 408, 423, 438, 453, 468, 483,
+ 498, 319, 334, 349, 364, 379, 394, 409, 424, 439, 454, 469, 484, 499, 335,
+ 350, 365, 380, 395, 410, 425, 440, 455, 470, 485, 500, 351, 366, 381, 396,
+ 411, 426, 441, 456, 471, 486, 501, 367, 382, 397, 412, 427, 442, 457, 472,
+ 487, 502, 383, 398, 413, 428, 443, 458, 473, 488, 503, 399, 414, 429, 444,
+ 459, 474, 489, 504, 415, 430, 445, 460, 475, 490, 505, 431, 446, 461, 476,
+ 491, 506, 447, 462, 477, 492, 507, 463, 478, 493, 508, 479, 494, 509, 495,
+ 510, 511,
+};
+
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x32[512]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+ 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+ 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+ 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+ 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+ 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+ 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+ 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+ 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+ 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+ 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+ 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+ 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+ 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+ 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+ 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+ 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+ 510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x16[512]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+ 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+ 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+ 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+ 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+ 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+ 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+ 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+ 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+ 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+ 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+ 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+ 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+ 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+ 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+ 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+ 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+ 510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x32[512]) = {
+ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
+ 1, 33, 65, 97, 129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
+ 2, 34, 66, 98, 130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
+ 3, 35, 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+ 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
+ 5, 37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
+ 6, 38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
+ 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
+ 8, 40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
+ 9, 41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
+ 10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+ 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
+ 12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
+ 13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
+ 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
+ 15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
+ 16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
+ 17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+ 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
+ 19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
+ 20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
+ 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
+ 22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
+ 23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
+ 24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+ 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
+ 26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
+ 27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
+ 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
+ 29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
+ 30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
+ 31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x16[512]) = {
0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224,
240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464,
480, 496, 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193,
@@ -426,158 +537,28 @@
495, 511,
};
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x16[512]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
- 1, 33, 65, 97, 129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
- 2, 34, 66, 98, 130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
- 3, 35, 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
- 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
- 5, 37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
- 6, 38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
- 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
- 8, 40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
- 9, 41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
- 10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
- 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
- 12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
- 13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
- 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
- 15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
- 16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
- 17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
- 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
- 19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
- 20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
- 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
- 22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
- 23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
- 24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
- 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
- 26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
- 27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
- 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
- 29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
- 30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
- 31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x32[512]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
- 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
- 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
- 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
- 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
- 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
- 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
- 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
- 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
- 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
- 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
- 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
- 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
- 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
- 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
- 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
- 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
- 510, 511,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x16[512]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
- 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
- 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
- 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
- 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
- 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
- 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
- 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
- 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
- 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
- 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
- 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
- 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
- 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
- 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
- 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
- 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
- 510, 511,
-};
-
DECLARE_ALIGNED(16, static const int16_t, default_scan_16x16[256]) = {
- 0, 1, 16, 32, 17, 2, 3, 18, 33, 48, 64, 49, 34, 19, 4,
- 5, 20, 35, 50, 65, 80, 96, 81, 66, 51, 36, 21, 6, 7, 22,
- 37, 52, 67, 82, 97, 112, 128, 113, 98, 83, 68, 53, 38, 23, 8,
- 9, 24, 39, 54, 69, 84, 99, 114, 129, 144, 160, 145, 130, 115, 100,
- 85, 70, 55, 40, 25, 10, 11, 26, 41, 56, 71, 86, 101, 116, 131,
- 146, 161, 176, 192, 177, 162, 147, 132, 117, 102, 87, 72, 57, 42, 27,
- 12, 13, 28, 43, 58, 73, 88, 103, 118, 133, 148, 163, 178, 193, 208,
- 224, 209, 194, 179, 164, 149, 134, 119, 104, 89, 74, 59, 44, 29, 14,
- 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225,
- 240, 241, 226, 211, 196, 181, 166, 151, 136, 121, 106, 91, 76, 61, 46,
- 31, 47, 62, 77, 92, 107, 122, 137, 152, 167, 182, 197, 212, 227, 242,
- 243, 228, 213, 198, 183, 168, 153, 138, 123, 108, 93, 78, 63, 79, 94,
- 109, 124, 139, 154, 169, 184, 199, 214, 229, 244, 245, 230, 215, 200, 185,
- 170, 155, 140, 125, 110, 95, 111, 126, 141, 156, 171, 186, 201, 216, 231,
- 246, 247, 232, 217, 202, 187, 172, 157, 142, 127, 143, 158, 173, 188, 203,
- 218, 233, 248, 249, 234, 219, 204, 189, 174, 159, 175, 190, 205, 220, 235,
- 250, 251, 236, 221, 206, 191, 207, 222, 237, 252, 253, 238, 223, 239, 254,
- 255
+ 0, 16, 1, 2, 17, 32, 48, 33, 18, 3, 4, 19, 34, 49, 64,
+ 80, 65, 50, 35, 20, 5, 6, 21, 36, 51, 66, 81, 96, 112, 97,
+ 82, 67, 52, 37, 22, 7, 8, 23, 38, 53, 68, 83, 98, 113, 128,
+ 144, 129, 114, 99, 84, 69, 54, 39, 24, 9, 10, 25, 40, 55, 70,
+ 85, 100, 115, 130, 145, 160, 176, 161, 146, 131, 116, 101, 86, 71, 56,
+ 41, 26, 11, 12, 27, 42, 57, 72, 87, 102, 117, 132, 147, 162, 177,
+ 192, 208, 193, 178, 163, 148, 133, 118, 103, 88, 73, 58, 43, 28, 13,
+ 14, 29, 44, 59, 74, 89, 104, 119, 134, 149, 164, 179, 194, 209, 224,
+ 240, 225, 210, 195, 180, 165, 150, 135, 120, 105, 90, 75, 60, 45, 30,
+ 15, 31, 46, 61, 76, 91, 106, 121, 136, 151, 166, 181, 196, 211, 226,
+ 241, 242, 227, 212, 197, 182, 167, 152, 137, 122, 107, 92, 77, 62, 47,
+ 63, 78, 93, 108, 123, 138, 153, 168, 183, 198, 213, 228, 243, 244, 229,
+ 214, 199, 184, 169, 154, 139, 124, 109, 94, 79, 95, 110, 125, 140, 155,
+ 170, 185, 200, 215, 230, 245, 246, 231, 216, 201, 186, 171, 156, 141, 126,
+ 111, 127, 142, 157, 172, 187, 202, 217, 232, 247, 248, 233, 218, 203, 188,
+ 173, 158, 143, 159, 174, 189, 204, 219, 234, 249, 250, 235, 220, 205, 190,
+ 175, 191, 206, 221, 236, 251, 252, 237, 222, 207, 223, 238, 253, 254, 239,
+ 255,
};
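+/* Reviewer note: default_scan_* is the up-right diagonal (zig-zag) order.
+ * The replacement bodies for the square sizes read as the entry-by-entry
+ * transposes of the removed ones (e.g. the 16x16 order now opens
+ * 0, 16, 1, ... instead of 0, 1, 16, ...), consistent with a transposed
+ * coefficient layout. */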
DECLARE_ALIGNED(16, static const int16_t, mcol_scan_16x16[256]) = {
- 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240,
- 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193, 209, 225, 241,
- 2, 18, 34, 50, 66, 82, 98, 114, 130, 146, 162, 178, 194, 210, 226, 242,
- 3, 19, 35, 51, 67, 83, 99, 115, 131, 147, 163, 179, 195, 211, 227, 243,
- 4, 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
- 5, 21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
- 6, 22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
- 7, 23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
- 8, 24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
- 9, 25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
- 10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
- 11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
- 12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
- 13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
- 14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
- 15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x16[256]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
@@ -598,84 +579,26 @@
255,
};
-DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x32[1024]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416,
- 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864,
- 896, 928, 960, 992, 1, 33, 65, 97, 129, 161, 193, 225, 257, 289,
- 321, 353, 385, 417, 449, 481, 513, 545, 577, 609, 641, 673, 705, 737,
- 769, 801, 833, 865, 897, 929, 961, 993, 2, 34, 66, 98, 130, 162,
- 194, 226, 258, 290, 322, 354, 386, 418, 450, 482, 514, 546, 578, 610,
- 642, 674, 706, 738, 770, 802, 834, 866, 898, 930, 962, 994, 3, 35,
- 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
- 515, 547, 579, 611, 643, 675, 707, 739, 771, 803, 835, 867, 899, 931,
- 963, 995, 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356,
- 388, 420, 452, 484, 516, 548, 580, 612, 644, 676, 708, 740, 772, 804,
- 836, 868, 900, 932, 964, 996, 5, 37, 69, 101, 133, 165, 197, 229,
- 261, 293, 325, 357, 389, 421, 453, 485, 517, 549, 581, 613, 645, 677,
- 709, 741, 773, 805, 837, 869, 901, 933, 965, 997, 6, 38, 70, 102,
- 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486, 518, 550,
- 582, 614, 646, 678, 710, 742, 774, 806, 838, 870, 902, 934, 966, 998,
- 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423,
- 455, 487, 519, 551, 583, 615, 647, 679, 711, 743, 775, 807, 839, 871,
- 903, 935, 967, 999, 8, 40, 72, 104, 136, 168, 200, 232, 264, 296,
- 328, 360, 392, 424, 456, 488, 520, 552, 584, 616, 648, 680, 712, 744,
- 776, 808, 840, 872, 904, 936, 968, 1000, 9, 41, 73, 105, 137, 169,
- 201, 233, 265, 297, 329, 361, 393, 425, 457, 489, 521, 553, 585, 617,
- 649, 681, 713, 745, 777, 809, 841, 873, 905, 937, 969, 1001, 10, 42,
- 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
- 522, 554, 586, 618, 650, 682, 714, 746, 778, 810, 842, 874, 906, 938,
- 970, 1002, 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363,
- 395, 427, 459, 491, 523, 555, 587, 619, 651, 683, 715, 747, 779, 811,
- 843, 875, 907, 939, 971, 1003, 12, 44, 76, 108, 140, 172, 204, 236,
- 268, 300, 332, 364, 396, 428, 460, 492, 524, 556, 588, 620, 652, 684,
- 716, 748, 780, 812, 844, 876, 908, 940, 972, 1004, 13, 45, 77, 109,
- 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493, 525, 557,
- 589, 621, 653, 685, 717, 749, 781, 813, 845, 877, 909, 941, 973, 1005,
- 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430,
- 462, 494, 526, 558, 590, 622, 654, 686, 718, 750, 782, 814, 846, 878,
- 910, 942, 974, 1006, 15, 47, 79, 111, 143, 175, 207, 239, 271, 303,
- 335, 367, 399, 431, 463, 495, 527, 559, 591, 623, 655, 687, 719, 751,
- 783, 815, 847, 879, 911, 943, 975, 1007, 16, 48, 80, 112, 144, 176,
- 208, 240, 272, 304, 336, 368, 400, 432, 464, 496, 528, 560, 592, 624,
- 656, 688, 720, 752, 784, 816, 848, 880, 912, 944, 976, 1008, 17, 49,
- 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
- 529, 561, 593, 625, 657, 689, 721, 753, 785, 817, 849, 881, 913, 945,
- 977, 1009, 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370,
- 402, 434, 466, 498, 530, 562, 594, 626, 658, 690, 722, 754, 786, 818,
- 850, 882, 914, 946, 978, 1010, 19, 51, 83, 115, 147, 179, 211, 243,
- 275, 307, 339, 371, 403, 435, 467, 499, 531, 563, 595, 627, 659, 691,
- 723, 755, 787, 819, 851, 883, 915, 947, 979, 1011, 20, 52, 84, 116,
- 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500, 532, 564,
- 596, 628, 660, 692, 724, 756, 788, 820, 852, 884, 916, 948, 980, 1012,
- 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437,
- 469, 501, 533, 565, 597, 629, 661, 693, 725, 757, 789, 821, 853, 885,
- 917, 949, 981, 1013, 22, 54, 86, 118, 150, 182, 214, 246, 278, 310,
- 342, 374, 406, 438, 470, 502, 534, 566, 598, 630, 662, 694, 726, 758,
- 790, 822, 854, 886, 918, 950, 982, 1014, 23, 55, 87, 119, 151, 183,
- 215, 247, 279, 311, 343, 375, 407, 439, 471, 503, 535, 567, 599, 631,
- 663, 695, 727, 759, 791, 823, 855, 887, 919, 951, 983, 1015, 24, 56,
- 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
- 536, 568, 600, 632, 664, 696, 728, 760, 792, 824, 856, 888, 920, 952,
- 984, 1016, 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377,
- 409, 441, 473, 505, 537, 569, 601, 633, 665, 697, 729, 761, 793, 825,
- 857, 889, 921, 953, 985, 1017, 26, 58, 90, 122, 154, 186, 218, 250,
- 282, 314, 346, 378, 410, 442, 474, 506, 538, 570, 602, 634, 666, 698,
- 730, 762, 794, 826, 858, 890, 922, 954, 986, 1018, 27, 59, 91, 123,
- 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507, 539, 571,
- 603, 635, 667, 699, 731, 763, 795, 827, 859, 891, 923, 955, 987, 1019,
- 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444,
- 476, 508, 540, 572, 604, 636, 668, 700, 732, 764, 796, 828, 860, 892,
- 924, 956, 988, 1020, 29, 61, 93, 125, 157, 189, 221, 253, 285, 317,
- 349, 381, 413, 445, 477, 509, 541, 573, 605, 637, 669, 701, 733, 765,
- 797, 829, 861, 893, 925, 957, 989, 1021, 30, 62, 94, 126, 158, 190,
- 222, 254, 286, 318, 350, 382, 414, 446, 478, 510, 542, 574, 606, 638,
- 670, 702, 734, 766, 798, 830, 862, 894, 926, 958, 990, 1022, 31, 63,
- 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
- 543, 575, 607, 639, 671, 703, 735, 767, 799, 831, 863, 895, 927, 959,
- 991, 1023,
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_16x16[256]) = {
+ 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240,
+ 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193, 209, 225, 241,
+ 2, 18, 34, 50, 66, 82, 98, 114, 130, 146, 162, 178, 194, 210, 226, 242,
+ 3, 19, 35, 51, 67, 83, 99, 115, 131, 147, 163, 179, 195, 211, 227, 243,
+ 4, 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
+ 5, 21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
+ 6, 22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
+ 7, 23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+ 8, 24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
+ 9, 25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
+ 10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
+ 11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
+ 12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
+ 13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
+ 14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
+ 15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
};
-DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x32[1024]) = {
+DECLARE_ALIGNED(16, static const int16_t, mcol_scan_32x32[1024]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
@@ -757,194 +680,250 @@
1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
};
-DECLARE_ALIGNED(16, static const int16_t, default_scan_32x32[1024]) = {
- 0, 1, 32, 64, 33, 2, 3, 34, 65, 96, 128, 97, 66,
- 35, 4, 5, 36, 67, 98, 129, 160, 192, 161, 130, 99, 68,
- 37, 6, 7, 38, 69, 100, 131, 162, 193, 224, 256, 225, 194,
- 163, 132, 101, 70, 39, 8, 9, 40, 71, 102, 133, 164, 195,
- 226, 257, 288, 320, 289, 258, 227, 196, 165, 134, 103, 72, 41,
- 10, 11, 42, 73, 104, 135, 166, 197, 228, 259, 290, 321, 352,
- 384, 353, 322, 291, 260, 229, 198, 167, 136, 105, 74, 43, 12,
- 13, 44, 75, 106, 137, 168, 199, 230, 261, 292, 323, 354, 385,
- 416, 448, 417, 386, 355, 324, 293, 262, 231, 200, 169, 138, 107,
- 76, 45, 14, 15, 46, 77, 108, 139, 170, 201, 232, 263, 294,
- 325, 356, 387, 418, 449, 480, 512, 481, 450, 419, 388, 357, 326,
- 295, 264, 233, 202, 171, 140, 109, 78, 47, 16, 17, 48, 79,
- 110, 141, 172, 203, 234, 265, 296, 327, 358, 389, 420, 451, 482,
- 513, 544, 576, 545, 514, 483, 452, 421, 390, 359, 328, 297, 266,
- 235, 204, 173, 142, 111, 80, 49, 18, 19, 50, 81, 112, 143,
- 174, 205, 236, 267, 298, 329, 360, 391, 422, 453, 484, 515, 546,
- 577, 608, 640, 609, 578, 547, 516, 485, 454, 423, 392, 361, 330,
- 299, 268, 237, 206, 175, 144, 113, 82, 51, 20, 21, 52, 83,
- 114, 145, 176, 207, 238, 269, 300, 331, 362, 393, 424, 455, 486,
- 517, 548, 579, 610, 641, 672, 704, 673, 642, 611, 580, 549, 518,
- 487, 456, 425, 394, 363, 332, 301, 270, 239, 208, 177, 146, 115,
- 84, 53, 22, 23, 54, 85, 116, 147, 178, 209, 240, 271, 302,
- 333, 364, 395, 426, 457, 488, 519, 550, 581, 612, 643, 674, 705,
- 736, 768, 737, 706, 675, 644, 613, 582, 551, 520, 489, 458, 427,
- 396, 365, 334, 303, 272, 241, 210, 179, 148, 117, 86, 55, 24,
- 25, 56, 87, 118, 149, 180, 211, 242, 273, 304, 335, 366, 397,
- 428, 459, 490, 521, 552, 583, 614, 645, 676, 707, 738, 769, 800,
- 832, 801, 770, 739, 708, 677, 646, 615, 584, 553, 522, 491, 460,
- 429, 398, 367, 336, 305, 274, 243, 212, 181, 150, 119, 88, 57,
- 26, 27, 58, 89, 120, 151, 182, 213, 244, 275, 306, 337, 368,
- 399, 430, 461, 492, 523, 554, 585, 616, 647, 678, 709, 740, 771,
- 802, 833, 864, 896, 865, 834, 803, 772, 741, 710, 679, 648, 617,
- 586, 555, 524, 493, 462, 431, 400, 369, 338, 307, 276, 245, 214,
- 183, 152, 121, 90, 59, 28, 29, 60, 91, 122, 153, 184, 215,
- 246, 277, 308, 339, 370, 401, 432, 463, 494, 525, 556, 587, 618,
- 649, 680, 711, 742, 773, 804, 835, 866, 897, 928, 960, 929, 898,
- 867, 836, 805, 774, 743, 712, 681, 650, 619, 588, 557, 526, 495,
- 464, 433, 402, 371, 340, 309, 278, 247, 216, 185, 154, 123, 92,
- 61, 30, 31, 62, 93, 124, 155, 186, 217, 248, 279, 310, 341,
- 372, 403, 434, 465, 496, 527, 558, 589, 620, 651, 682, 713, 744,
- 775, 806, 837, 868, 899, 930, 961, 992, 993, 962, 931, 900, 869,
- 838, 807, 776, 745, 714, 683, 652, 621, 590, 559, 528, 497, 466,
- 435, 404, 373, 342, 311, 280, 249, 218, 187, 156, 125, 94, 63,
- 95, 126, 157, 188, 219, 250, 281, 312, 343, 374, 405, 436, 467,
- 498, 529, 560, 591, 622, 653, 684, 715, 746, 777, 808, 839, 870,
- 901, 932, 963, 994, 995, 964, 933, 902, 871, 840, 809, 778, 747,
- 716, 685, 654, 623, 592, 561, 530, 499, 468, 437, 406, 375, 344,
- 313, 282, 251, 220, 189, 158, 127, 159, 190, 221, 252, 283, 314,
- 345, 376, 407, 438, 469, 500, 531, 562, 593, 624, 655, 686, 717,
- 748, 779, 810, 841, 872, 903, 934, 965, 996, 997, 966, 935, 904,
- 873, 842, 811, 780, 749, 718, 687, 656, 625, 594, 563, 532, 501,
- 470, 439, 408, 377, 346, 315, 284, 253, 222, 191, 223, 254, 285,
- 316, 347, 378, 409, 440, 471, 502, 533, 564, 595, 626, 657, 688,
- 719, 750, 781, 812, 843, 874, 905, 936, 967, 998, 999, 968, 937,
- 906, 875, 844, 813, 782, 751, 720, 689, 658, 627, 596, 565, 534,
- 503, 472, 441, 410, 379, 348, 317, 286, 255, 287, 318, 349, 380,
- 411, 442, 473, 504, 535, 566, 597, 628, 659, 690, 721, 752, 783,
- 814, 845, 876, 907, 938, 969, 1000, 1001, 970, 939, 908, 877, 846,
- 815, 784, 753, 722, 691, 660, 629, 598, 567, 536, 505, 474, 443,
- 412, 381, 350, 319, 351, 382, 413, 444, 475, 506, 537, 568, 599,
- 630, 661, 692, 723, 754, 785, 816, 847, 878, 909, 940, 971, 1002,
- 1003, 972, 941, 910, 879, 848, 817, 786, 755, 724, 693, 662, 631,
- 600, 569, 538, 507, 476, 445, 414, 383, 415, 446, 477, 508, 539,
- 570, 601, 632, 663, 694, 725, 756, 787, 818, 849, 880, 911, 942,
- 973, 1004, 1005, 974, 943, 912, 881, 850, 819, 788, 757, 726, 695,
- 664, 633, 602, 571, 540, 509, 478, 447, 479, 510, 541, 572, 603,
- 634, 665, 696, 727, 758, 789, 820, 851, 882, 913, 944, 975, 1006,
- 1007, 976, 945, 914, 883, 852, 821, 790, 759, 728, 697, 666, 635,
- 604, 573, 542, 511, 543, 574, 605, 636, 667, 698, 729, 760, 791,
- 822, 853, 884, 915, 946, 977, 1008, 1009, 978, 947, 916, 885, 854,
- 823, 792, 761, 730, 699, 668, 637, 606, 575, 607, 638, 669, 700,
- 731, 762, 793, 824, 855, 886, 917, 948, 979, 1010, 1011, 980, 949,
- 918, 887, 856, 825, 794, 763, 732, 701, 670, 639, 671, 702, 733,
- 764, 795, 826, 857, 888, 919, 950, 981, 1012, 1013, 982, 951, 920,
- 889, 858, 827, 796, 765, 734, 703, 735, 766, 797, 828, 859, 890,
- 921, 952, 983, 1014, 1015, 984, 953, 922, 891, 860, 829, 798, 767,
- 799, 830, 861, 892, 923, 954, 985, 1016, 1017, 986, 955, 924, 893,
- 862, 831, 863, 894, 925, 956, 987, 1018, 1019, 988, 957, 926, 895,
- 927, 958, 989, 1020, 1021, 990, 959, 991, 1022, 1023
+DECLARE_ALIGNED(16, static const int16_t, mrow_scan_32x32[1024]) = {
+ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416,
+ 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864,
+ 896, 928, 960, 992, 1, 33, 65, 97, 129, 161, 193, 225, 257, 289,
+ 321, 353, 385, 417, 449, 481, 513, 545, 577, 609, 641, 673, 705, 737,
+ 769, 801, 833, 865, 897, 929, 961, 993, 2, 34, 66, 98, 130, 162,
+ 194, 226, 258, 290, 322, 354, 386, 418, 450, 482, 514, 546, 578, 610,
+ 642, 674, 706, 738, 770, 802, 834, 866, 898, 930, 962, 994, 3, 35,
+ 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+ 515, 547, 579, 611, 643, 675, 707, 739, 771, 803, 835, 867, 899, 931,
+ 963, 995, 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356,
+ 388, 420, 452, 484, 516, 548, 580, 612, 644, 676, 708, 740, 772, 804,
+ 836, 868, 900, 932, 964, 996, 5, 37, 69, 101, 133, 165, 197, 229,
+ 261, 293, 325, 357, 389, 421, 453, 485, 517, 549, 581, 613, 645, 677,
+ 709, 741, 773, 805, 837, 869, 901, 933, 965, 997, 6, 38, 70, 102,
+ 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486, 518, 550,
+ 582, 614, 646, 678, 710, 742, 774, 806, 838, 870, 902, 934, 966, 998,
+ 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423,
+ 455, 487, 519, 551, 583, 615, 647, 679, 711, 743, 775, 807, 839, 871,
+ 903, 935, 967, 999, 8, 40, 72, 104, 136, 168, 200, 232, 264, 296,
+ 328, 360, 392, 424, 456, 488, 520, 552, 584, 616, 648, 680, 712, 744,
+ 776, 808, 840, 872, 904, 936, 968, 1000, 9, 41, 73, 105, 137, 169,
+ 201, 233, 265, 297, 329, 361, 393, 425, 457, 489, 521, 553, 585, 617,
+ 649, 681, 713, 745, 777, 809, 841, 873, 905, 937, 969, 1001, 10, 42,
+ 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+ 522, 554, 586, 618, 650, 682, 714, 746, 778, 810, 842, 874, 906, 938,
+ 970, 1002, 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363,
+ 395, 427, 459, 491, 523, 555, 587, 619, 651, 683, 715, 747, 779, 811,
+ 843, 875, 907, 939, 971, 1003, 12, 44, 76, 108, 140, 172, 204, 236,
+ 268, 300, 332, 364, 396, 428, 460, 492, 524, 556, 588, 620, 652, 684,
+ 716, 748, 780, 812, 844, 876, 908, 940, 972, 1004, 13, 45, 77, 109,
+ 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493, 525, 557,
+ 589, 621, 653, 685, 717, 749, 781, 813, 845, 877, 909, 941, 973, 1005,
+ 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430,
+ 462, 494, 526, 558, 590, 622, 654, 686, 718, 750, 782, 814, 846, 878,
+ 910, 942, 974, 1006, 15, 47, 79, 111, 143, 175, 207, 239, 271, 303,
+ 335, 367, 399, 431, 463, 495, 527, 559, 591, 623, 655, 687, 719, 751,
+ 783, 815, 847, 879, 911, 943, 975, 1007, 16, 48, 80, 112, 144, 176,
+ 208, 240, 272, 304, 336, 368, 400, 432, 464, 496, 528, 560, 592, 624,
+ 656, 688, 720, 752, 784, 816, 848, 880, 912, 944, 976, 1008, 17, 49,
+ 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+ 529, 561, 593, 625, 657, 689, 721, 753, 785, 817, 849, 881, 913, 945,
+ 977, 1009, 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370,
+ 402, 434, 466, 498, 530, 562, 594, 626, 658, 690, 722, 754, 786, 818,
+ 850, 882, 914, 946, 978, 1010, 19, 51, 83, 115, 147, 179, 211, 243,
+ 275, 307, 339, 371, 403, 435, 467, 499, 531, 563, 595, 627, 659, 691,
+ 723, 755, 787, 819, 851, 883, 915, 947, 979, 1011, 20, 52, 84, 116,
+ 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500, 532, 564,
+ 596, 628, 660, 692, 724, 756, 788, 820, 852, 884, 916, 948, 980, 1012,
+ 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437,
+ 469, 501, 533, 565, 597, 629, 661, 693, 725, 757, 789, 821, 853, 885,
+ 917, 949, 981, 1013, 22, 54, 86, 118, 150, 182, 214, 246, 278, 310,
+ 342, 374, 406, 438, 470, 502, 534, 566, 598, 630, 662, 694, 726, 758,
+ 790, 822, 854, 886, 918, 950, 982, 1014, 23, 55, 87, 119, 151, 183,
+ 215, 247, 279, 311, 343, 375, 407, 439, 471, 503, 535, 567, 599, 631,
+ 663, 695, 727, 759, 791, 823, 855, 887, 919, 951, 983, 1015, 24, 56,
+ 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+ 536, 568, 600, 632, 664, 696, 728, 760, 792, 824, 856, 888, 920, 952,
+ 984, 1016, 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377,
+ 409, 441, 473, 505, 537, 569, 601, 633, 665, 697, 729, 761, 793, 825,
+ 857, 889, 921, 953, 985, 1017, 26, 58, 90, 122, 154, 186, 218, 250,
+ 282, 314, 346, 378, 410, 442, 474, 506, 538, 570, 602, 634, 666, 698,
+ 730, 762, 794, 826, 858, 890, 922, 954, 986, 1018, 27, 59, 91, 123,
+ 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507, 539, 571,
+ 603, 635, 667, 699, 731, 763, 795, 827, 859, 891, 923, 955, 987, 1019,
+ 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444,
+ 476, 508, 540, 572, 604, 636, 668, 700, 732, 764, 796, 828, 860, 892,
+ 924, 956, 988, 1020, 29, 61, 93, 125, 157, 189, 221, 253, 285, 317,
+ 349, 381, 413, 445, 477, 509, 541, 573, 605, 637, 669, 701, 733, 765,
+ 797, 829, 861, 893, 925, 957, 989, 1021, 30, 62, 94, 126, 158, 190,
+ 222, 254, 286, 318, 350, 382, 414, 446, 478, 510, 542, 574, 606, 638,
+ 670, 702, 734, 766, 798, 830, 862, 894, 926, 958, 990, 1022, 31, 63,
+ 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+ 543, 575, 607, 639, 671, 703, 735, 767, 799, 831, 863, 895, 927, 959,
+ 991, 1023,
};
-DECLARE_ALIGNED(16, static const int16_t,
- av1_default_iscan_4x4[16]) = { 0, 1, 5, 6, 2, 4, 7, 12,
- 3, 8, 11, 13, 9, 10, 14, 15 };
+DECLARE_ALIGNED(16, static const int16_t, default_scan_32x32[1024]) = {
+ 0, 32, 1, 2, 33, 64, 96, 65, 34, 3, 4, 35, 66,
+ 97, 128, 160, 129, 98, 67, 36, 5, 6, 37, 68, 99, 130,
+ 161, 192, 224, 193, 162, 131, 100, 69, 38, 7, 8, 39, 70,
+ 101, 132, 163, 194, 225, 256, 288, 257, 226, 195, 164, 133, 102,
+ 71, 40, 9, 10, 41, 72, 103, 134, 165, 196, 227, 258, 289,
+ 320, 352, 321, 290, 259, 228, 197, 166, 135, 104, 73, 42, 11,
+ 12, 43, 74, 105, 136, 167, 198, 229, 260, 291, 322, 353, 384,
+ 416, 385, 354, 323, 292, 261, 230, 199, 168, 137, 106, 75, 44,
+ 13, 14, 45, 76, 107, 138, 169, 200, 231, 262, 293, 324, 355,
+ 386, 417, 448, 480, 449, 418, 387, 356, 325, 294, 263, 232, 201,
+ 170, 139, 108, 77, 46, 15, 16, 47, 78, 109, 140, 171, 202,
+ 233, 264, 295, 326, 357, 388, 419, 450, 481, 512, 544, 513, 482,
+ 451, 420, 389, 358, 327, 296, 265, 234, 203, 172, 141, 110, 79,
+ 48, 17, 18, 49, 80, 111, 142, 173, 204, 235, 266, 297, 328,
+ 359, 390, 421, 452, 483, 514, 545, 576, 608, 577, 546, 515, 484,
+ 453, 422, 391, 360, 329, 298, 267, 236, 205, 174, 143, 112, 81,
+ 50, 19, 20, 51, 82, 113, 144, 175, 206, 237, 268, 299, 330,
+ 361, 392, 423, 454, 485, 516, 547, 578, 609, 640, 672, 641, 610,
+ 579, 548, 517, 486, 455, 424, 393, 362, 331, 300, 269, 238, 207,
+ 176, 145, 114, 83, 52, 21, 22, 53, 84, 115, 146, 177, 208,
+ 239, 270, 301, 332, 363, 394, 425, 456, 487, 518, 549, 580, 611,
+ 642, 673, 704, 736, 705, 674, 643, 612, 581, 550, 519, 488, 457,
+ 426, 395, 364, 333, 302, 271, 240, 209, 178, 147, 116, 85, 54,
+ 23, 24, 55, 86, 117, 148, 179, 210, 241, 272, 303, 334, 365,
+ 396, 427, 458, 489, 520, 551, 582, 613, 644, 675, 706, 737, 768,
+ 800, 769, 738, 707, 676, 645, 614, 583, 552, 521, 490, 459, 428,
+ 397, 366, 335, 304, 273, 242, 211, 180, 149, 118, 87, 56, 25,
+ 26, 57, 88, 119, 150, 181, 212, 243, 274, 305, 336, 367, 398,
+ 429, 460, 491, 522, 553, 584, 615, 646, 677, 708, 739, 770, 801,
+ 832, 864, 833, 802, 771, 740, 709, 678, 647, 616, 585, 554, 523,
+ 492, 461, 430, 399, 368, 337, 306, 275, 244, 213, 182, 151, 120,
+ 89, 58, 27, 28, 59, 90, 121, 152, 183, 214, 245, 276, 307,
+ 338, 369, 400, 431, 462, 493, 524, 555, 586, 617, 648, 679, 710,
+ 741, 772, 803, 834, 865, 896, 928, 897, 866, 835, 804, 773, 742,
+ 711, 680, 649, 618, 587, 556, 525, 494, 463, 432, 401, 370, 339,
+ 308, 277, 246, 215, 184, 153, 122, 91, 60, 29, 30, 61, 92,
+ 123, 154, 185, 216, 247, 278, 309, 340, 371, 402, 433, 464, 495,
+ 526, 557, 588, 619, 650, 681, 712, 743, 774, 805, 836, 867, 898,
+ 929, 960, 992, 961, 930, 899, 868, 837, 806, 775, 744, 713, 682,
+ 651, 620, 589, 558, 527, 496, 465, 434, 403, 372, 341, 310, 279,
+ 248, 217, 186, 155, 124, 93, 62, 31, 63, 94, 125, 156, 187,
+ 218, 249, 280, 311, 342, 373, 404, 435, 466, 497, 528, 559, 590,
+ 621, 652, 683, 714, 745, 776, 807, 838, 869, 900, 931, 962, 993,
+ 994, 963, 932, 901, 870, 839, 808, 777, 746, 715, 684, 653, 622,
+ 591, 560, 529, 498, 467, 436, 405, 374, 343, 312, 281, 250, 219,
+ 188, 157, 126, 95, 127, 158, 189, 220, 251, 282, 313, 344, 375,
+ 406, 437, 468, 499, 530, 561, 592, 623, 654, 685, 716, 747, 778,
+ 809, 840, 871, 902, 933, 964, 995, 996, 965, 934, 903, 872, 841,
+ 810, 779, 748, 717, 686, 655, 624, 593, 562, 531, 500, 469, 438,
+ 407, 376, 345, 314, 283, 252, 221, 190, 159, 191, 222, 253, 284,
+ 315, 346, 377, 408, 439, 470, 501, 532, 563, 594, 625, 656, 687,
+ 718, 749, 780, 811, 842, 873, 904, 935, 966, 997, 998, 967, 936,
+ 905, 874, 843, 812, 781, 750, 719, 688, 657, 626, 595, 564, 533,
+ 502, 471, 440, 409, 378, 347, 316, 285, 254, 223, 255, 286, 317,
+ 348, 379, 410, 441, 472, 503, 534, 565, 596, 627, 658, 689, 720,
+ 751, 782, 813, 844, 875, 906, 937, 968, 999, 1000, 969, 938, 907,
+ 876, 845, 814, 783, 752, 721, 690, 659, 628, 597, 566, 535, 504,
+ 473, 442, 411, 380, 349, 318, 287, 319, 350, 381, 412, 443, 474,
+ 505, 536, 567, 598, 629, 660, 691, 722, 753, 784, 815, 846, 877,
+ 908, 939, 970, 1001, 1002, 971, 940, 909, 878, 847, 816, 785, 754,
+ 723, 692, 661, 630, 599, 568, 537, 506, 475, 444, 413, 382, 351,
+ 383, 414, 445, 476, 507, 538, 569, 600, 631, 662, 693, 724, 755,
+ 786, 817, 848, 879, 910, 941, 972, 1003, 1004, 973, 942, 911, 880,
+ 849, 818, 787, 756, 725, 694, 663, 632, 601, 570, 539, 508, 477,
+ 446, 415, 447, 478, 509, 540, 571, 602, 633, 664, 695, 726, 757,
+ 788, 819, 850, 881, 912, 943, 974, 1005, 1006, 975, 944, 913, 882,
+ 851, 820, 789, 758, 727, 696, 665, 634, 603, 572, 541, 510, 479,
+ 511, 542, 573, 604, 635, 666, 697, 728, 759, 790, 821, 852, 883,
+ 914, 945, 976, 1007, 1008, 977, 946, 915, 884, 853, 822, 791, 760,
+ 729, 698, 667, 636, 605, 574, 543, 575, 606, 637, 668, 699, 730,
+ 761, 792, 823, 854, 885, 916, 947, 978, 1009, 1010, 979, 948, 917,
+ 886, 855, 824, 793, 762, 731, 700, 669, 638, 607, 639, 670, 701,
+ 732, 763, 794, 825, 856, 887, 918, 949, 980, 1011, 1012, 981, 950,
+ 919, 888, 857, 826, 795, 764, 733, 702, 671, 703, 734, 765, 796,
+ 827, 858, 889, 920, 951, 982, 1013, 1014, 983, 952, 921, 890, 859,
+ 828, 797, 766, 735, 767, 798, 829, 860, 891, 922, 953, 984, 1015,
+ 1016, 985, 954, 923, 892, 861, 830, 799, 831, 862, 893, 924, 955,
+ 986, 1017, 1018, 987, 956, 925, 894, 863, 895, 926, 957, 988, 1019,
+ 1020, 989, 958, 927, 959, 990, 1021, 1022, 991, 1023,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x4[16]) = {
+ 0, 2, 3, 9, 1, 4, 8, 10, 5, 7, 11, 14, 6, 12, 13, 15,
+};
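+/* Reviewer note: the av1_*_iscan_* tables are the inverses of the scan
+ * tables, i.e. iscan[pos] is the scan index at which coefficient position
+ * pos is coded, so scan[iscan[pos]] == pos.  Inverting the 4x4 table above
+ * yields the scan order 0, 4, 1, 2, 5, 8, 12, 9, 6, 3, 7, 10, 13, 14,
+ * 11, 15. */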
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x4[16]) = {
- 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x4[16]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x4[16]) = {
+ 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x8[32]) = {
- 0, 1, 3, 6, 2, 4, 7, 10, 5, 8, 11, 14, 9, 12, 15, 18,
- 13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 29, 25, 28, 30, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x8[32]) = {
- 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
- 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x8[32]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x4[32]) = {
0, 2, 5, 9, 13, 17, 21, 25, 1, 4, 8, 12, 16, 20, 24, 28,
3, 7, 11, 15, 19, 23, 27, 30, 6, 10, 14, 18, 22, 26, 29, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x4[32]) = {
- 0, 4, 8, 12, 16, 20, 24, 28, 1, 5, 9, 13, 17, 21, 25, 29,
- 2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x4[32]) = {
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x8[32]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x16[64]) = {
- 0, 1, 3, 6, 2, 4, 7, 10, 5, 8, 11, 14, 9, 12, 15, 18,
- 13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 30, 25, 28, 31, 34,
- 29, 32, 35, 38, 33, 36, 39, 42, 37, 40, 43, 46, 41, 44, 47, 50,
- 45, 48, 51, 54, 49, 52, 55, 58, 53, 56, 59, 61, 57, 60, 62, 63,
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x8[32]) = {
+ 0, 4, 8, 12, 16, 20, 24, 28, 1, 5, 9, 13, 17, 21, 25, 29,
+ 2, 6, 10, 14, 18, 22, 26, 30, 3, 7, 11, 15, 19, 23, 27, 31,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x4[64]) = {
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x4[32]) = {
+ 0, 1, 3, 6, 2, 4, 7, 10, 5, 8, 11, 14, 9, 12, 15, 18,
+ 13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 29, 25, 28, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x4[32]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x4[32]) = {
+ 0, 8, 16, 24, 1, 9, 17, 25, 2, 10, 18, 26, 3, 11, 19, 27,
+ 4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_4x16[64]) = {
0, 2, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57,
1, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 62,
6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 61, 63,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x4[64]) = {
+ 0, 1, 3, 6, 2, 4, 7, 10, 5, 8, 11, 14, 9, 12, 15, 18,
+ 13, 16, 19, 22, 17, 20, 23, 26, 21, 24, 27, 30, 25, 28, 31, 34,
+ 29, 32, 35, 38, 33, 36, 39, 42, 37, 40, 43, 46, 41, 44, 47, 50,
+ 45, 48, 51, 54, 49, 52, 55, 58, 53, 56, 59, 61, 57, 60, 62, 63,
+};
+
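+/* Reviewer note: for the mrow_/mcol_ and the WxH/HxW rectangular pairs,
+ * this change exchanges the bodies within each pair rather than introducing
+ * new permutations (e.g. the removed av1_default_iscan_4x16 body reappears
+ * above as av1_default_iscan_16x4). */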
DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_4x16[64]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
- 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
- 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x4[64]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
- 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
- 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
- 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x16[64]) = {
- 0, 16, 32, 48, 1, 17, 33, 49, 2, 18, 34, 50, 3, 19, 35, 51,
- 4, 20, 36, 52, 5, 21, 37, 53, 6, 22, 38, 54, 7, 23, 39, 55,
- 8, 24, 40, 56, 9, 25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59,
- 12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x4[64]) = {
0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60,
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61,
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62,
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x32[256]) = {
- 0, 1, 3, 6, 10, 15, 21, 28, 2, 4, 7, 11, 16, 22, 29,
- 36, 5, 8, 12, 17, 23, 30, 37, 44, 9, 13, 18, 24, 31, 38,
- 45, 52, 14, 19, 25, 32, 39, 46, 53, 60, 20, 26, 33, 40, 47,
- 54, 61, 68, 27, 34, 41, 48, 55, 62, 69, 76, 35, 42, 49, 56,
- 63, 70, 77, 84, 43, 50, 57, 64, 71, 78, 85, 92, 51, 58, 65,
- 72, 79, 86, 93, 100, 59, 66, 73, 80, 87, 94, 101, 108, 67, 74,
- 81, 88, 95, 102, 109, 116, 75, 82, 89, 96, 103, 110, 117, 124, 83,
- 90, 97, 104, 111, 118, 125, 132, 91, 98, 105, 112, 119, 126, 133, 140,
- 99, 106, 113, 120, 127, 134, 141, 148, 107, 114, 121, 128, 135, 142, 149,
- 156, 115, 122, 129, 136, 143, 150, 157, 164, 123, 130, 137, 144, 151, 158,
- 165, 172, 131, 138, 145, 152, 159, 166, 173, 180, 139, 146, 153, 160, 167,
- 174, 181, 188, 147, 154, 161, 168, 175, 182, 189, 196, 155, 162, 169, 176,
- 183, 190, 197, 204, 163, 170, 177, 184, 191, 198, 205, 212, 171, 178, 185,
- 192, 199, 206, 213, 220, 179, 186, 193, 200, 207, 214, 221, 228, 187, 194,
- 201, 208, 215, 222, 229, 235, 195, 202, 209, 216, 223, 230, 236, 241, 203,
- 210, 217, 224, 231, 237, 242, 246, 211, 218, 225, 232, 238, 243, 247, 250,
- 219, 226, 233, 239, 244, 248, 251, 253, 227, 234, 240, 245, 249, 252, 254,
- 255,
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x4[64]) = {
+ 0, 16, 32, 48, 1, 17, 33, 49, 2, 18, 34, 50, 3, 19, 35, 51,
+ 4, 20, 36, 52, 5, 21, 37, 53, 6, 22, 38, 54, 7, 23, 39, 55,
+ 8, 24, 40, 56, 9, 25, 41, 57, 10, 26, 42, 58, 11, 27, 43, 59,
+ 12, 28, 44, 60, 13, 29, 45, 61, 14, 30, 46, 62, 15, 31, 47, 63,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x8[256]) = {
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_4x16[64]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x4[64]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
+ 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+ 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
+ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x32[256]) = {
0, 2, 5, 9, 14, 20, 27, 35, 43, 51, 59, 67, 75, 83, 91,
99, 107, 115, 123, 131, 139, 147, 155, 163, 171, 179, 187, 195, 203, 211,
219, 227, 1, 4, 8, 13, 19, 26, 34, 42, 50, 58, 66, 74, 82,
@@ -965,68 +944,28 @@
255,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x8[256]) = {
+ 0, 1, 3, 6, 10, 15, 21, 28, 2, 4, 7, 11, 16, 22, 29,
+ 36, 5, 8, 12, 17, 23, 30, 37, 44, 9, 13, 18, 24, 31, 38,
+ 45, 52, 14, 19, 25, 32, 39, 46, 53, 60, 20, 26, 33, 40, 47,
+ 54, 61, 68, 27, 34, 41, 48, 55, 62, 69, 76, 35, 42, 49, 56,
+ 63, 70, 77, 84, 43, 50, 57, 64, 71, 78, 85, 92, 51, 58, 65,
+ 72, 79, 86, 93, 100, 59, 66, 73, 80, 87, 94, 101, 108, 67, 74,
+ 81, 88, 95, 102, 109, 116, 75, 82, 89, 96, 103, 110, 117, 124, 83,
+ 90, 97, 104, 111, 118, 125, 132, 91, 98, 105, 112, 119, 126, 133, 140,
+ 99, 106, 113, 120, 127, 134, 141, 148, 107, 114, 121, 128, 135, 142, 149,
+ 156, 115, 122, 129, 136, 143, 150, 157, 164, 123, 130, 137, 144, 151, 158,
+ 165, 172, 131, 138, 145, 152, 159, 166, 173, 180, 139, 146, 153, 160, 167,
+ 174, 181, 188, 147, 154, 161, 168, 175, 182, 189, 196, 155, 162, 169, 176,
+ 183, 190, 197, 204, 163, 170, 177, 184, 191, 198, 205, 212, 171, 178, 185,
+ 192, 199, 206, 213, 220, 179, 186, 193, 200, 207, 214, 221, 228, 187, 194,
+ 201, 208, 215, 222, 229, 235, 195, 202, 209, 216, 223, 230, 236, 241, 203,
+ 210, 217, 224, 231, 237, 242, 246, 211, 218, 225, 232, 238, 243, 247, 250,
+ 219, 226, 233, 239, 244, 248, 251, 253, 227, 234, 240, 245, 249, 252, 254,
+ 255,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x32[256]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x8[256]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x32[256]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 1, 33, 65, 97, 129, 161, 193, 225,
- 2, 34, 66, 98, 130, 162, 194, 226, 3, 35, 67, 99, 131, 163, 195, 227,
- 4, 36, 68, 100, 132, 164, 196, 228, 5, 37, 69, 101, 133, 165, 197, 229,
- 6, 38, 70, 102, 134, 166, 198, 230, 7, 39, 71, 103, 135, 167, 199, 231,
- 8, 40, 72, 104, 136, 168, 200, 232, 9, 41, 73, 105, 137, 169, 201, 233,
- 10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
- 12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
- 14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
- 16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
- 18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
- 20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
- 22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
- 24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
- 26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
- 28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
- 30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x8[256]) = {
0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112,
120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232,
240, 248, 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97,
@@ -1047,39 +986,89 @@
255,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x8[64]) = {
- 0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, 25, 33, 41, 49, 57,
- 2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
- 4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
- 6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x8[256]) = {
+ 0, 32, 64, 96, 128, 160, 192, 224, 1, 33, 65, 97, 129, 161, 193, 225,
+ 2, 34, 66, 98, 130, 162, 194, 226, 3, 35, 67, 99, 131, 163, 195, 227,
+ 4, 36, 68, 100, 132, 164, 196, 228, 5, 37, 69, 101, 133, 165, 197, 229,
+ 6, 38, 70, 102, 134, 166, 198, 230, 7, 39, 71, 103, 135, 167, 199, 231,
+ 8, 40, 72, 104, 136, 168, 200, 232, 9, 41, 73, 105, 137, 169, 201, 233,
+ 10, 42, 74, 106, 138, 170, 202, 234, 11, 43, 75, 107, 139, 171, 203, 235,
+ 12, 44, 76, 108, 140, 172, 204, 236, 13, 45, 77, 109, 141, 173, 205, 237,
+ 14, 46, 78, 110, 142, 174, 206, 238, 15, 47, 79, 111, 143, 175, 207, 239,
+ 16, 48, 80, 112, 144, 176, 208, 240, 17, 49, 81, 113, 145, 177, 209, 241,
+ 18, 50, 82, 114, 146, 178, 210, 242, 19, 51, 83, 115, 147, 179, 211, 243,
+ 20, 52, 84, 116, 148, 180, 212, 244, 21, 53, 85, 117, 149, 181, 213, 245,
+ 22, 54, 86, 118, 150, 182, 214, 246, 23, 55, 87, 119, 151, 183, 215, 247,
+ 24, 56, 88, 120, 152, 184, 216, 248, 25, 57, 89, 121, 153, 185, 217, 249,
+ 26, 58, 90, 122, 154, 186, 218, 250, 27, 59, 91, 123, 155, 187, 219, 251,
+ 28, 60, 92, 124, 156, 188, 220, 252, 29, 61, 93, 125, 157, 189, 221, 253,
+ 30, 62, 94, 126, 158, 190, 222, 254, 31, 63, 95, 127, 159, 191, 223, 255,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x8[64]) = {
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x32[256]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x8[256]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x8[64]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x8[64]) = {
+ 0, 8, 16, 24, 32, 40, 48, 56, 1, 9, 17, 25, 33, 41, 49, 57,
+ 2, 10, 18, 26, 34, 42, 50, 58, 3, 11, 19, 27, 35, 43, 51, 59,
+ 4, 12, 20, 28, 36, 44, 52, 60, 5, 13, 21, 29, 37, 45, 53, 61,
+ 6, 14, 22, 30, 38, 46, 54, 62, 7, 15, 23, 31, 39, 47, 55, 63,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x8[64]) = {
- 0, 1, 5, 6, 14, 15, 27, 28, 2, 4, 7, 13, 16, 26, 29, 42,
- 3, 8, 12, 17, 25, 30, 41, 43, 9, 11, 18, 24, 31, 40, 44, 53,
- 10, 19, 23, 32, 39, 45, 52, 54, 20, 22, 33, 38, 46, 51, 55, 60,
- 21, 34, 37, 47, 50, 56, 59, 61, 35, 36, 48, 49, 57, 58, 62, 63
+ 0, 2, 3, 9, 10, 20, 21, 35, 1, 4, 8, 11, 19, 22, 34, 36,
+ 5, 7, 12, 18, 23, 33, 37, 48, 6, 13, 17, 24, 32, 38, 47, 49,
+ 14, 16, 25, 31, 39, 46, 50, 57, 15, 26, 30, 40, 45, 51, 56, 58,
+ 27, 29, 41, 44, 52, 55, 59, 62, 28, 42, 43, 53, 54, 60, 61, 63,
};
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_8x16[128]) = {
- 0, 1, 3, 6, 10, 15, 21, 28, 2, 4, 7, 11, 16, 22, 29, 36,
- 5, 8, 12, 17, 23, 30, 37, 44, 9, 13, 18, 24, 31, 38, 45, 52,
- 14, 19, 25, 32, 39, 46, 53, 60, 20, 26, 33, 40, 47, 54, 61, 68,
- 27, 34, 41, 48, 55, 62, 69, 76, 35, 42, 49, 56, 63, 70, 77, 84,
- 43, 50, 57, 64, 71, 78, 85, 92, 51, 58, 65, 72, 79, 86, 93, 100,
- 59, 66, 73, 80, 87, 94, 101, 107, 67, 74, 81, 88, 95, 102, 108, 113,
- 75, 82, 89, 96, 103, 109, 114, 118, 83, 90, 97, 104, 110, 115, 119, 122,
- 91, 98, 105, 111, 116, 120, 123, 125, 99, 106, 112, 117, 121, 124, 126, 127,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x8[128]) = {
0, 2, 5, 9, 14, 20, 27, 35, 43, 51, 59, 67, 75, 83, 91, 99,
1, 4, 8, 13, 19, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98, 106,
3, 7, 12, 18, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 112,
@@ -1090,18 +1079,42 @@
28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 107, 113, 118, 122, 125, 127,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x8[128]) = {
+ 0, 1, 3, 6, 10, 15, 21, 28, 2, 4, 7, 11, 16, 22, 29, 36,
+ 5, 8, 12, 17, 23, 30, 37, 44, 9, 13, 18, 24, 31, 38, 45, 52,
+ 14, 19, 25, 32, 39, 46, 53, 60, 20, 26, 33, 40, 47, 54, 61, 68,
+ 27, 34, 41, 48, 55, 62, 69, 76, 35, 42, 49, 56, 63, 70, 77, 84,
+ 43, 50, 57, 64, 71, 78, 85, 92, 51, 58, 65, 72, 79, 86, 93, 100,
+ 59, 66, 73, 80, 87, 94, 101, 107, 67, 74, 81, 88, 95, 102, 108, 113,
+ 75, 82, 89, 96, 103, 109, 114, 118, 83, 90, 97, 104, 110, 115, 119, 122,
+ 91, 98, 105, 111, 116, 120, 123, 125, 99, 106, 112, 117, 121, 124, 126, 127,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_8x16[128]) = {
- 0, 16, 32, 48, 64, 80, 96, 112, 1, 17, 33, 49, 65, 81, 97, 113,
- 2, 18, 34, 50, 66, 82, 98, 114, 3, 19, 35, 51, 67, 83, 99, 115,
- 4, 20, 36, 52, 68, 84, 100, 116, 5, 21, 37, 53, 69, 85, 101, 117,
- 6, 22, 38, 54, 70, 86, 102, 118, 7, 23, 39, 55, 71, 87, 103, 119,
- 8, 24, 40, 56, 72, 88, 104, 120, 9, 25, 41, 57, 73, 89, 105, 121,
- 10, 26, 42, 58, 74, 90, 106, 122, 11, 27, 43, 59, 75, 91, 107, 123,
- 12, 28, 44, 60, 76, 92, 108, 124, 13, 29, 45, 61, 77, 93, 109, 125,
- 14, 30, 46, 62, 78, 94, 110, 126, 15, 31, 47, 63, 79, 95, 111, 127,
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127,
};
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x8[128]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x16[128]) = {
0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120,
1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121,
2, 10, 18, 26, 34, 42, 50, 58, 66, 74, 82, 90, 98, 106, 114, 122,
@@ -1112,69 +1125,18 @@
7, 15, 23, 31, 39, 47, 55, 63, 71, 79, 87, 95, 103, 111, 119, 127,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_8x16[128]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127,
-};
-
DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x8[128]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127,
+ 0, 16, 32, 48, 64, 80, 96, 112, 1, 17, 33, 49, 65, 81, 97, 113,
+ 2, 18, 34, 50, 66, 82, 98, 114, 3, 19, 35, 51, 67, 83, 99, 115,
+ 4, 20, 36, 52, 68, 84, 100, 116, 5, 21, 37, 53, 69, 85, 101, 117,
+ 6, 22, 38, 54, 70, 86, 102, 118, 7, 23, 39, 55, 71, 87, 103, 119,
+ 8, 24, 40, 56, 72, 88, 104, 120, 9, 25, 41, 57, 73, 89, 105, 121,
+ 10, 26, 42, 58, 74, 90, 106, 122, 11, 27, 43, 59, 75, 91, 107, 123,
+ 12, 28, 44, 60, 76, 92, 108, 124, 13, 29, 45, 61, 77, 93, 109, 125,
+ 14, 30, 46, 62, 78, 94, 110, 126, 15, 31, 47, 63, 79, 95, 111, 127,
};
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x32[512]) = {
- 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105,
- 120, 2, 4, 7, 11, 16, 22, 29, 37, 46, 56, 67, 79, 92, 106,
- 121, 136, 5, 8, 12, 17, 23, 30, 38, 47, 57, 68, 80, 93, 107,
- 122, 137, 152, 9, 13, 18, 24, 31, 39, 48, 58, 69, 81, 94, 108,
- 123, 138, 153, 168, 14, 19, 25, 32, 40, 49, 59, 70, 82, 95, 109,
- 124, 139, 154, 169, 184, 20, 26, 33, 41, 50, 60, 71, 83, 96, 110,
- 125, 140, 155, 170, 185, 200, 27, 34, 42, 51, 61, 72, 84, 97, 111,
- 126, 141, 156, 171, 186, 201, 216, 35, 43, 52, 62, 73, 85, 98, 112,
- 127, 142, 157, 172, 187, 202, 217, 232, 44, 53, 63, 74, 86, 99, 113,
- 128, 143, 158, 173, 188, 203, 218, 233, 248, 54, 64, 75, 87, 100, 114,
- 129, 144, 159, 174, 189, 204, 219, 234, 249, 264, 65, 76, 88, 101, 115,
- 130, 145, 160, 175, 190, 205, 220, 235, 250, 265, 280, 77, 89, 102, 116,
- 131, 146, 161, 176, 191, 206, 221, 236, 251, 266, 281, 296, 90, 103, 117,
- 132, 147, 162, 177, 192, 207, 222, 237, 252, 267, 282, 297, 312, 104, 118,
- 133, 148, 163, 178, 193, 208, 223, 238, 253, 268, 283, 298, 313, 328, 119,
- 134, 149, 164, 179, 194, 209, 224, 239, 254, 269, 284, 299, 314, 329, 344,
- 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345,
- 360, 151, 166, 181, 196, 211, 226, 241, 256, 271, 286, 301, 316, 331, 346,
- 361, 376, 167, 182, 197, 212, 227, 242, 257, 272, 287, 302, 317, 332, 347,
- 362, 377, 392, 183, 198, 213, 228, 243, 258, 273, 288, 303, 318, 333, 348,
- 363, 378, 393, 407, 199, 214, 229, 244, 259, 274, 289, 304, 319, 334, 349,
- 364, 379, 394, 408, 421, 215, 230, 245, 260, 275, 290, 305, 320, 335, 350,
- 365, 380, 395, 409, 422, 434, 231, 246, 261, 276, 291, 306, 321, 336, 351,
- 366, 381, 396, 410, 423, 435, 446, 247, 262, 277, 292, 307, 322, 337, 352,
- 367, 382, 397, 411, 424, 436, 447, 457, 263, 278, 293, 308, 323, 338, 353,
- 368, 383, 398, 412, 425, 437, 448, 458, 467, 279, 294, 309, 324, 339, 354,
- 369, 384, 399, 413, 426, 438, 449, 459, 468, 476, 295, 310, 325, 340, 355,
- 370, 385, 400, 414, 427, 439, 450, 460, 469, 477, 484, 311, 326, 341, 356,
- 371, 386, 401, 415, 428, 440, 451, 461, 470, 478, 485, 491, 327, 342, 357,
- 372, 387, 402, 416, 429, 441, 452, 462, 471, 479, 486, 492, 497, 343, 358,
- 373, 388, 403, 417, 430, 442, 453, 463, 472, 480, 487, 493, 498, 502, 359,
- 374, 389, 404, 418, 431, 443, 454, 464, 473, 481, 488, 494, 499, 503, 506,
- 375, 390, 405, 419, 432, 444, 455, 465, 474, 482, 489, 495, 500, 504, 507,
- 509, 391, 406, 420, 433, 445, 456, 466, 475, 483, 490, 496, 501, 505, 508,
- 510, 511,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x16[512]) = {
0, 2, 5, 9, 14, 20, 27, 35, 44, 54, 65, 77, 90, 104, 119,
135, 151, 167, 183, 199, 215, 231, 247, 263, 279, 295, 311, 327, 343, 359,
375, 391, 1, 4, 8, 13, 19, 26, 34, 43, 53, 64, 76, 89, 103,
@@ -1212,42 +1174,121 @@
509, 511,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x16[512]) = {
+ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91, 105,
+ 120, 2, 4, 7, 11, 16, 22, 29, 37, 46, 56, 67, 79, 92, 106,
+ 121, 136, 5, 8, 12, 17, 23, 30, 38, 47, 57, 68, 80, 93, 107,
+ 122, 137, 152, 9, 13, 18, 24, 31, 39, 48, 58, 69, 81, 94, 108,
+ 123, 138, 153, 168, 14, 19, 25, 32, 40, 49, 59, 70, 82, 95, 109,
+ 124, 139, 154, 169, 184, 20, 26, 33, 41, 50, 60, 71, 83, 96, 110,
+ 125, 140, 155, 170, 185, 200, 27, 34, 42, 51, 61, 72, 84, 97, 111,
+ 126, 141, 156, 171, 186, 201, 216, 35, 43, 52, 62, 73, 85, 98, 112,
+ 127, 142, 157, 172, 187, 202, 217, 232, 44, 53, 63, 74, 86, 99, 113,
+ 128, 143, 158, 173, 188, 203, 218, 233, 248, 54, 64, 75, 87, 100, 114,
+ 129, 144, 159, 174, 189, 204, 219, 234, 249, 264, 65, 76, 88, 101, 115,
+ 130, 145, 160, 175, 190, 205, 220, 235, 250, 265, 280, 77, 89, 102, 116,
+ 131, 146, 161, 176, 191, 206, 221, 236, 251, 266, 281, 296, 90, 103, 117,
+ 132, 147, 162, 177, 192, 207, 222, 237, 252, 267, 282, 297, 312, 104, 118,
+ 133, 148, 163, 178, 193, 208, 223, 238, 253, 268, 283, 298, 313, 328, 119,
+ 134, 149, 164, 179, 194, 209, 224, 239, 254, 269, 284, 299, 314, 329, 344,
+ 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345,
+ 360, 151, 166, 181, 196, 211, 226, 241, 256, 271, 286, 301, 316, 331, 346,
+ 361, 376, 167, 182, 197, 212, 227, 242, 257, 272, 287, 302, 317, 332, 347,
+ 362, 377, 392, 183, 198, 213, 228, 243, 258, 273, 288, 303, 318, 333, 348,
+ 363, 378, 393, 407, 199, 214, 229, 244, 259, 274, 289, 304, 319, 334, 349,
+ 364, 379, 394, 408, 421, 215, 230, 245, 260, 275, 290, 305, 320, 335, 350,
+ 365, 380, 395, 409, 422, 434, 231, 246, 261, 276, 291, 306, 321, 336, 351,
+ 366, 381, 396, 410, 423, 435, 446, 247, 262, 277, 292, 307, 322, 337, 352,
+ 367, 382, 397, 411, 424, 436, 447, 457, 263, 278, 293, 308, 323, 338, 353,
+ 368, 383, 398, 412, 425, 437, 448, 458, 467, 279, 294, 309, 324, 339, 354,
+ 369, 384, 399, 413, 426, 438, 449, 459, 468, 476, 295, 310, 325, 340, 355,
+ 370, 385, 400, 414, 427, 439, 450, 460, 469, 477, 484, 311, 326, 341, 356,
+ 371, 386, 401, 415, 428, 440, 451, 461, 470, 478, 485, 491, 327, 342, 357,
+ 372, 387, 402, 416, 429, 441, 452, 462, 471, 479, 486, 492, 497, 343, 358,
+ 373, 388, 403, 417, 430, 442, 453, 463, 472, 480, 487, 493, 498, 502, 359,
+ 374, 389, 404, 418, 431, 443, 454, 464, 473, 481, 488, 494, 499, 503, 506,
+ 375, 390, 405, 419, 432, 444, 455, 465, 474, 482, 489, 495, 500, 504, 507,
+ 509, 391, 406, 420, 433, 445, 456, 466, 475, 483, 490, 496, 501, 505, 508,
+ 510, 511,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x32[512]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
- 1, 33, 65, 97, 129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
- 2, 34, 66, 98, 130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
- 3, 35, 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
- 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
- 5, 37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
- 6, 38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
- 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
- 8, 40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
- 9, 41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
- 10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
- 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
- 12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
- 13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
- 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
- 15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
- 16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
- 17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
- 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
- 19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
- 20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
- 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
- 22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
- 23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
- 24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
- 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
- 26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
- 27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
- 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
- 29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
- 30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
- 31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+ 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+ 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+ 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+ 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+ 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+ 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+ 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+ 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+ 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+ 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+ 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+ 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+ 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+ 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+ 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+ 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+ 510, 511,
};
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x16[512]) = {
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
+ 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
+ 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
+ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
+ 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
+ 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
+ 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
+ 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
+ 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
+ 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
+ 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
+ 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
+ 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
+ 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
+ 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
+ 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
+ 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
+ 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
+ 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
+ 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
+ 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
+ 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
+ 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
+ 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
+ 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
+ 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
+ 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
+ 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
+ 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
+ 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
+ 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
+ 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
+ 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
+ 510, 511,
+};
+
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x32[512]) = {
0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224,
240, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464,
480, 496, 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193,
@@ -1285,102 +1326,42 @@
495, 511,
};
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x32[512]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
- 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
- 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
- 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
- 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
- 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
- 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
- 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
- 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
- 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
- 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
- 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
- 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
- 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
- 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
- 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
- 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
- 510, 511,
-};
-
DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x16[512]) = {
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
- 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
- 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
- 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
- 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
- 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
- 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
- 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
- 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
- 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
- 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
- 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
- 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209,
- 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224,
- 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
- 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254,
- 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269,
- 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284,
- 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
- 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
- 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329,
- 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344,
- 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359,
- 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374,
- 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
- 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404,
- 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419,
- 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434,
- 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449,
- 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464,
- 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
- 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494,
- 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509,
- 510, 511,
+ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448, 480,
+ 1, 33, 65, 97, 129, 161, 193, 225, 257, 289, 321, 353, 385, 417, 449, 481,
+ 2, 34, 66, 98, 130, 162, 194, 226, 258, 290, 322, 354, 386, 418, 450, 482,
+ 3, 35, 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+ 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356, 388, 420, 452, 484,
+ 5, 37, 69, 101, 133, 165, 197, 229, 261, 293, 325, 357, 389, 421, 453, 485,
+ 6, 38, 70, 102, 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486,
+ 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423, 455, 487,
+ 8, 40, 72, 104, 136, 168, 200, 232, 264, 296, 328, 360, 392, 424, 456, 488,
+ 9, 41, 73, 105, 137, 169, 201, 233, 265, 297, 329, 361, 393, 425, 457, 489,
+ 10, 42, 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+ 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363, 395, 427, 459, 491,
+ 12, 44, 76, 108, 140, 172, 204, 236, 268, 300, 332, 364, 396, 428, 460, 492,
+ 13, 45, 77, 109, 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493,
+ 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430, 462, 494,
+ 15, 47, 79, 111, 143, 175, 207, 239, 271, 303, 335, 367, 399, 431, 463, 495,
+ 16, 48, 80, 112, 144, 176, 208, 240, 272, 304, 336, 368, 400, 432, 464, 496,
+ 17, 49, 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+ 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370, 402, 434, 466, 498,
+ 19, 51, 83, 115, 147, 179, 211, 243, 275, 307, 339, 371, 403, 435, 467, 499,
+ 20, 52, 84, 116, 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500,
+ 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437, 469, 501,
+ 22, 54, 86, 118, 150, 182, 214, 246, 278, 310, 342, 374, 406, 438, 470, 502,
+ 23, 55, 87, 119, 151, 183, 215, 247, 279, 311, 343, 375, 407, 439, 471, 503,
+ 24, 56, 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+ 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377, 409, 441, 473, 505,
+ 26, 58, 90, 122, 154, 186, 218, 250, 282, 314, 346, 378, 410, 442, 474, 506,
+ 27, 59, 91, 123, 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507,
+ 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444, 476, 508,
+ 29, 61, 93, 125, 157, 189, 221, 253, 285, 317, 349, 381, 413, 445, 477, 509,
+ 30, 62, 94, 126, 158, 190, 222, 254, 286, 318, 350, 382, 414, 446, 478, 510,
+ 31, 63, 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
};
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_16x16[256]) = {
- 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240,
- 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193, 209, 225, 241,
- 2, 18, 34, 50, 66, 82, 98, 114, 130, 146, 162, 178, 194, 210, 226, 242,
- 3, 19, 35, 51, 67, 83, 99, 115, 131, 147, 163, 179, 195, 211, 227, 243,
- 4, 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
- 5, 21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
- 6, 22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
- 7, 23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
- 8, 24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
- 9, 25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
- 10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
- 11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
- 12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
- 13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
- 14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
- 15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x16[256]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
@@ -1401,105 +1382,47 @@
255,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_16x16[256]) = {
+ 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240,
+ 1, 17, 33, 49, 65, 81, 97, 113, 129, 145, 161, 177, 193, 209, 225, 241,
+ 2, 18, 34, 50, 66, 82, 98, 114, 130, 146, 162, 178, 194, 210, 226, 242,
+ 3, 19, 35, 51, 67, 83, 99, 115, 131, 147, 163, 179, 195, 211, 227, 243,
+ 4, 20, 36, 52, 68, 84, 100, 116, 132, 148, 164, 180, 196, 212, 228, 244,
+ 5, 21, 37, 53, 69, 85, 101, 117, 133, 149, 165, 181, 197, 213, 229, 245,
+ 6, 22, 38, 54, 70, 86, 102, 118, 134, 150, 166, 182, 198, 214, 230, 246,
+ 7, 23, 39, 55, 71, 87, 103, 119, 135, 151, 167, 183, 199, 215, 231, 247,
+ 8, 24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248,
+ 9, 25, 41, 57, 73, 89, 105, 121, 137, 153, 169, 185, 201, 217, 233, 249,
+ 10, 26, 42, 58, 74, 90, 106, 122, 138, 154, 170, 186, 202, 218, 234, 250,
+ 11, 27, 43, 59, 75, 91, 107, 123, 139, 155, 171, 187, 203, 219, 235, 251,
+ 12, 28, 44, 60, 76, 92, 108, 124, 140, 156, 172, 188, 204, 220, 236, 252,
+ 13, 29, 45, 61, 77, 93, 109, 125, 141, 157, 173, 189, 205, 221, 237, 253,
+ 14, 30, 46, 62, 78, 94, 110, 126, 142, 158, 174, 190, 206, 222, 238, 254,
+ 15, 31, 47, 63, 79, 95, 111, 127, 143, 159, 175, 191, 207, 223, 239, 255,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_16x16[256]) = {
- 0, 1, 5, 6, 14, 15, 27, 28, 44, 45, 65, 66, 90, 91, 119,
- 120, 2, 4, 7, 13, 16, 26, 29, 43, 46, 64, 67, 89, 92, 118,
- 121, 150, 3, 8, 12, 17, 25, 30, 42, 47, 63, 68, 88, 93, 117,
- 122, 149, 151, 9, 11, 18, 24, 31, 41, 48, 62, 69, 87, 94, 116,
- 123, 148, 152, 177, 10, 19, 23, 32, 40, 49, 61, 70, 86, 95, 115,
- 124, 147, 153, 176, 178, 20, 22, 33, 39, 50, 60, 71, 85, 96, 114,
- 125, 146, 154, 175, 179, 200, 21, 34, 38, 51, 59, 72, 84, 97, 113,
- 126, 145, 155, 174, 180, 199, 201, 35, 37, 52, 58, 73, 83, 98, 112,
- 127, 144, 156, 173, 181, 198, 202, 219, 36, 53, 57, 74, 82, 99, 111,
- 128, 143, 157, 172, 182, 197, 203, 218, 220, 54, 56, 75, 81, 100, 110,
- 129, 142, 158, 171, 183, 196, 204, 217, 221, 234, 55, 76, 80, 101, 109,
- 130, 141, 159, 170, 184, 195, 205, 216, 222, 233, 235, 77, 79, 102, 108,
- 131, 140, 160, 169, 185, 194, 206, 215, 223, 232, 236, 245, 78, 103, 107,
- 132, 139, 161, 168, 186, 193, 207, 214, 224, 231, 237, 244, 246, 104, 106,
- 133, 138, 162, 167, 187, 192, 208, 213, 225, 230, 238, 243, 247, 252, 105,
- 134, 137, 163, 166, 188, 191, 209, 212, 226, 229, 239, 242, 248, 251, 253,
- 135, 136, 164, 165, 189, 190, 210, 211, 227, 228, 240, 241, 249, 250, 254,
- 255
+ 0, 2, 3, 9, 10, 20, 21, 35, 36, 54, 55, 77, 78, 104, 105,
+ 135, 1, 4, 8, 11, 19, 22, 34, 37, 53, 56, 76, 79, 103, 106,
+ 134, 136, 5, 7, 12, 18, 23, 33, 38, 52, 57, 75, 80, 102, 107,
+ 133, 137, 164, 6, 13, 17, 24, 32, 39, 51, 58, 74, 81, 101, 108,
+ 132, 138, 163, 165, 14, 16, 25, 31, 40, 50, 59, 73, 82, 100, 109,
+ 131, 139, 162, 166, 189, 15, 26, 30, 41, 49, 60, 72, 83, 99, 110,
+ 130, 140, 161, 167, 188, 190, 27, 29, 42, 48, 61, 71, 84, 98, 111,
+ 129, 141, 160, 168, 187, 191, 210, 28, 43, 47, 62, 70, 85, 97, 112,
+ 128, 142, 159, 169, 186, 192, 209, 211, 44, 46, 63, 69, 86, 96, 113,
+ 127, 143, 158, 170, 185, 193, 208, 212, 227, 45, 64, 68, 87, 95, 114,
+ 126, 144, 157, 171, 184, 194, 207, 213, 226, 228, 65, 67, 88, 94, 115,
+ 125, 145, 156, 172, 183, 195, 206, 214, 225, 229, 240, 66, 89, 93, 116,
+ 124, 146, 155, 173, 182, 196, 205, 215, 224, 230, 239, 241, 90, 92, 117,
+ 123, 147, 154, 174, 181, 197, 204, 216, 223, 231, 238, 242, 249, 91, 118,
+ 122, 148, 153, 175, 180, 198, 203, 217, 222, 232, 237, 243, 248, 250, 119,
+ 121, 149, 152, 176, 179, 199, 202, 218, 221, 233, 236, 244, 247, 251, 254,
+ 120, 150, 151, 177, 178, 200, 201, 219, 220, 234, 235, 245, 246, 252, 253,
+ 255,
};
DECLARE_ALIGNED(16, static const int16_t, av1_mcol_iscan_32x32[1024]) = {
- 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416,
- 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864,
- 896, 928, 960, 992, 1, 33, 65, 97, 129, 161, 193, 225, 257, 289,
- 321, 353, 385, 417, 449, 481, 513, 545, 577, 609, 641, 673, 705, 737,
- 769, 801, 833, 865, 897, 929, 961, 993, 2, 34, 66, 98, 130, 162,
- 194, 226, 258, 290, 322, 354, 386, 418, 450, 482, 514, 546, 578, 610,
- 642, 674, 706, 738, 770, 802, 834, 866, 898, 930, 962, 994, 3, 35,
- 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
- 515, 547, 579, 611, 643, 675, 707, 739, 771, 803, 835, 867, 899, 931,
- 963, 995, 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356,
- 388, 420, 452, 484, 516, 548, 580, 612, 644, 676, 708, 740, 772, 804,
- 836, 868, 900, 932, 964, 996, 5, 37, 69, 101, 133, 165, 197, 229,
- 261, 293, 325, 357, 389, 421, 453, 485, 517, 549, 581, 613, 645, 677,
- 709, 741, 773, 805, 837, 869, 901, 933, 965, 997, 6, 38, 70, 102,
- 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486, 518, 550,
- 582, 614, 646, 678, 710, 742, 774, 806, 838, 870, 902, 934, 966, 998,
- 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423,
- 455, 487, 519, 551, 583, 615, 647, 679, 711, 743, 775, 807, 839, 871,
- 903, 935, 967, 999, 8, 40, 72, 104, 136, 168, 200, 232, 264, 296,
- 328, 360, 392, 424, 456, 488, 520, 552, 584, 616, 648, 680, 712, 744,
- 776, 808, 840, 872, 904, 936, 968, 1000, 9, 41, 73, 105, 137, 169,
- 201, 233, 265, 297, 329, 361, 393, 425, 457, 489, 521, 553, 585, 617,
- 649, 681, 713, 745, 777, 809, 841, 873, 905, 937, 969, 1001, 10, 42,
- 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
- 522, 554, 586, 618, 650, 682, 714, 746, 778, 810, 842, 874, 906, 938,
- 970, 1002, 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363,
- 395, 427, 459, 491, 523, 555, 587, 619, 651, 683, 715, 747, 779, 811,
- 843, 875, 907, 939, 971, 1003, 12, 44, 76, 108, 140, 172, 204, 236,
- 268, 300, 332, 364, 396, 428, 460, 492, 524, 556, 588, 620, 652, 684,
- 716, 748, 780, 812, 844, 876, 908, 940, 972, 1004, 13, 45, 77, 109,
- 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493, 525, 557,
- 589, 621, 653, 685, 717, 749, 781, 813, 845, 877, 909, 941, 973, 1005,
- 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430,
- 462, 494, 526, 558, 590, 622, 654, 686, 718, 750, 782, 814, 846, 878,
- 910, 942, 974, 1006, 15, 47, 79, 111, 143, 175, 207, 239, 271, 303,
- 335, 367, 399, 431, 463, 495, 527, 559, 591, 623, 655, 687, 719, 751,
- 783, 815, 847, 879, 911, 943, 975, 1007, 16, 48, 80, 112, 144, 176,
- 208, 240, 272, 304, 336, 368, 400, 432, 464, 496, 528, 560, 592, 624,
- 656, 688, 720, 752, 784, 816, 848, 880, 912, 944, 976, 1008, 17, 49,
- 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
- 529, 561, 593, 625, 657, 689, 721, 753, 785, 817, 849, 881, 913, 945,
- 977, 1009, 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370,
- 402, 434, 466, 498, 530, 562, 594, 626, 658, 690, 722, 754, 786, 818,
- 850, 882, 914, 946, 978, 1010, 19, 51, 83, 115, 147, 179, 211, 243,
- 275, 307, 339, 371, 403, 435, 467, 499, 531, 563, 595, 627, 659, 691,
- 723, 755, 787, 819, 851, 883, 915, 947, 979, 1011, 20, 52, 84, 116,
- 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500, 532, 564,
- 596, 628, 660, 692, 724, 756, 788, 820, 852, 884, 916, 948, 980, 1012,
- 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437,
- 469, 501, 533, 565, 597, 629, 661, 693, 725, 757, 789, 821, 853, 885,
- 917, 949, 981, 1013, 22, 54, 86, 118, 150, 182, 214, 246, 278, 310,
- 342, 374, 406, 438, 470, 502, 534, 566, 598, 630, 662, 694, 726, 758,
- 790, 822, 854, 886, 918, 950, 982, 1014, 23, 55, 87, 119, 151, 183,
- 215, 247, 279, 311, 343, 375, 407, 439, 471, 503, 535, 567, 599, 631,
- 663, 695, 727, 759, 791, 823, 855, 887, 919, 951, 983, 1015, 24, 56,
- 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
- 536, 568, 600, 632, 664, 696, 728, 760, 792, 824, 856, 888, 920, 952,
- 984, 1016, 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377,
- 409, 441, 473, 505, 537, 569, 601, 633, 665, 697, 729, 761, 793, 825,
- 857, 889, 921, 953, 985, 1017, 26, 58, 90, 122, 154, 186, 218, 250,
- 282, 314, 346, 378, 410, 442, 474, 506, 538, 570, 602, 634, 666, 698,
- 730, 762, 794, 826, 858, 890, 922, 954, 986, 1018, 27, 59, 91, 123,
- 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507, 539, 571,
- 603, 635, 667, 699, 731, 763, 795, 827, 859, 891, 923, 955, 987, 1019,
- 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444,
- 476, 508, 540, 572, 604, 636, 668, 700, 732, 764, 796, 828, 860, 892,
- 924, 956, 988, 1020, 29, 61, 93, 125, 157, 189, 221, 253, 285, 317,
- 349, 381, 413, 445, 477, 509, 541, 573, 605, 637, 669, 701, 733, 765,
- 797, 829, 861, 893, 925, 957, 989, 1021, 30, 62, 94, 126, 158, 190,
- 222, 254, 286, 318, 350, 382, 414, 446, 478, 510, 542, 574, 606, 638,
- 670, 702, 734, 766, 798, 830, 862, 894, 926, 958, 990, 1022, 31, 63,
- 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
- 543, 575, 607, 639, 671, 703, 735, 767, 799, 831, 863, 895, 927, 959,
- 991, 1023,
-};
-
-DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x32[1024]) = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
@@ -1581,86 +1504,163 @@
1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
};
+DECLARE_ALIGNED(16, static const int16_t, av1_mrow_iscan_32x32[1024]) = {
+ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416,
+ 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800, 832, 864,
+ 896, 928, 960, 992, 1, 33, 65, 97, 129, 161, 193, 225, 257, 289,
+ 321, 353, 385, 417, 449, 481, 513, 545, 577, 609, 641, 673, 705, 737,
+ 769, 801, 833, 865, 897, 929, 961, 993, 2, 34, 66, 98, 130, 162,
+ 194, 226, 258, 290, 322, 354, 386, 418, 450, 482, 514, 546, 578, 610,
+ 642, 674, 706, 738, 770, 802, 834, 866, 898, 930, 962, 994, 3, 35,
+ 67, 99, 131, 163, 195, 227, 259, 291, 323, 355, 387, 419, 451, 483,
+ 515, 547, 579, 611, 643, 675, 707, 739, 771, 803, 835, 867, 899, 931,
+ 963, 995, 4, 36, 68, 100, 132, 164, 196, 228, 260, 292, 324, 356,
+ 388, 420, 452, 484, 516, 548, 580, 612, 644, 676, 708, 740, 772, 804,
+ 836, 868, 900, 932, 964, 996, 5, 37, 69, 101, 133, 165, 197, 229,
+ 261, 293, 325, 357, 389, 421, 453, 485, 517, 549, 581, 613, 645, 677,
+ 709, 741, 773, 805, 837, 869, 901, 933, 965, 997, 6, 38, 70, 102,
+ 134, 166, 198, 230, 262, 294, 326, 358, 390, 422, 454, 486, 518, 550,
+ 582, 614, 646, 678, 710, 742, 774, 806, 838, 870, 902, 934, 966, 998,
+ 7, 39, 71, 103, 135, 167, 199, 231, 263, 295, 327, 359, 391, 423,
+ 455, 487, 519, 551, 583, 615, 647, 679, 711, 743, 775, 807, 839, 871,
+ 903, 935, 967, 999, 8, 40, 72, 104, 136, 168, 200, 232, 264, 296,
+ 328, 360, 392, 424, 456, 488, 520, 552, 584, 616, 648, 680, 712, 744,
+ 776, 808, 840, 872, 904, 936, 968, 1000, 9, 41, 73, 105, 137, 169,
+ 201, 233, 265, 297, 329, 361, 393, 425, 457, 489, 521, 553, 585, 617,
+ 649, 681, 713, 745, 777, 809, 841, 873, 905, 937, 969, 1001, 10, 42,
+ 74, 106, 138, 170, 202, 234, 266, 298, 330, 362, 394, 426, 458, 490,
+ 522, 554, 586, 618, 650, 682, 714, 746, 778, 810, 842, 874, 906, 938,
+ 970, 1002, 11, 43, 75, 107, 139, 171, 203, 235, 267, 299, 331, 363,
+ 395, 427, 459, 491, 523, 555, 587, 619, 651, 683, 715, 747, 779, 811,
+ 843, 875, 907, 939, 971, 1003, 12, 44, 76, 108, 140, 172, 204, 236,
+ 268, 300, 332, 364, 396, 428, 460, 492, 524, 556, 588, 620, 652, 684,
+ 716, 748, 780, 812, 844, 876, 908, 940, 972, 1004, 13, 45, 77, 109,
+ 141, 173, 205, 237, 269, 301, 333, 365, 397, 429, 461, 493, 525, 557,
+ 589, 621, 653, 685, 717, 749, 781, 813, 845, 877, 909, 941, 973, 1005,
+ 14, 46, 78, 110, 142, 174, 206, 238, 270, 302, 334, 366, 398, 430,
+ 462, 494, 526, 558, 590, 622, 654, 686, 718, 750, 782, 814, 846, 878,
+ 910, 942, 974, 1006, 15, 47, 79, 111, 143, 175, 207, 239, 271, 303,
+ 335, 367, 399, 431, 463, 495, 527, 559, 591, 623, 655, 687, 719, 751,
+ 783, 815, 847, 879, 911, 943, 975, 1007, 16, 48, 80, 112, 144, 176,
+ 208, 240, 272, 304, 336, 368, 400, 432, 464, 496, 528, 560, 592, 624,
+ 656, 688, 720, 752, 784, 816, 848, 880, 912, 944, 976, 1008, 17, 49,
+ 81, 113, 145, 177, 209, 241, 273, 305, 337, 369, 401, 433, 465, 497,
+ 529, 561, 593, 625, 657, 689, 721, 753, 785, 817, 849, 881, 913, 945,
+ 977, 1009, 18, 50, 82, 114, 146, 178, 210, 242, 274, 306, 338, 370,
+ 402, 434, 466, 498, 530, 562, 594, 626, 658, 690, 722, 754, 786, 818,
+ 850, 882, 914, 946, 978, 1010, 19, 51, 83, 115, 147, 179, 211, 243,
+ 275, 307, 339, 371, 403, 435, 467, 499, 531, 563, 595, 627, 659, 691,
+ 723, 755, 787, 819, 851, 883, 915, 947, 979, 1011, 20, 52, 84, 116,
+ 148, 180, 212, 244, 276, 308, 340, 372, 404, 436, 468, 500, 532, 564,
+ 596, 628, 660, 692, 724, 756, 788, 820, 852, 884, 916, 948, 980, 1012,
+ 21, 53, 85, 117, 149, 181, 213, 245, 277, 309, 341, 373, 405, 437,
+ 469, 501, 533, 565, 597, 629, 661, 693, 725, 757, 789, 821, 853, 885,
+ 917, 949, 981, 1013, 22, 54, 86, 118, 150, 182, 214, 246, 278, 310,
+ 342, 374, 406, 438, 470, 502, 534, 566, 598, 630, 662, 694, 726, 758,
+ 790, 822, 854, 886, 918, 950, 982, 1014, 23, 55, 87, 119, 151, 183,
+ 215, 247, 279, 311, 343, 375, 407, 439, 471, 503, 535, 567, 599, 631,
+ 663, 695, 727, 759, 791, 823, 855, 887, 919, 951, 983, 1015, 24, 56,
+ 88, 120, 152, 184, 216, 248, 280, 312, 344, 376, 408, 440, 472, 504,
+ 536, 568, 600, 632, 664, 696, 728, 760, 792, 824, 856, 888, 920, 952,
+ 984, 1016, 25, 57, 89, 121, 153, 185, 217, 249, 281, 313, 345, 377,
+ 409, 441, 473, 505, 537, 569, 601, 633, 665, 697, 729, 761, 793, 825,
+ 857, 889, 921, 953, 985, 1017, 26, 58, 90, 122, 154, 186, 218, 250,
+ 282, 314, 346, 378, 410, 442, 474, 506, 538, 570, 602, 634, 666, 698,
+ 730, 762, 794, 826, 858, 890, 922, 954, 986, 1018, 27, 59, 91, 123,
+ 155, 187, 219, 251, 283, 315, 347, 379, 411, 443, 475, 507, 539, 571,
+ 603, 635, 667, 699, 731, 763, 795, 827, 859, 891, 923, 955, 987, 1019,
+ 28, 60, 92, 124, 156, 188, 220, 252, 284, 316, 348, 380, 412, 444,
+ 476, 508, 540, 572, 604, 636, 668, 700, 732, 764, 796, 828, 860, 892,
+ 924, 956, 988, 1020, 29, 61, 93, 125, 157, 189, 221, 253, 285, 317,
+ 349, 381, 413, 445, 477, 509, 541, 573, 605, 637, 669, 701, 733, 765,
+ 797, 829, 861, 893, 925, 957, 989, 1021, 30, 62, 94, 126, 158, 190,
+ 222, 254, 286, 318, 350, 382, 414, 446, 478, 510, 542, 574, 606, 638,
+ 670, 702, 734, 766, 798, 830, 862, 894, 926, 958, 990, 1022, 31, 63,
+ 95, 127, 159, 191, 223, 255, 287, 319, 351, 383, 415, 447, 479, 511,
+ 543, 575, 607, 639, 671, 703, 735, 767, 799, 831, 863, 895, 927, 959,
+ 991, 1023,
+};
+
DECLARE_ALIGNED(16, static const int16_t, av1_default_iscan_32x32[1024]) = {
- 0, 1, 5, 6, 14, 15, 27, 28, 44, 45, 65, 66, 90,
- 91, 119, 120, 152, 153, 189, 190, 230, 231, 275, 276, 324, 325,
- 377, 378, 434, 435, 495, 496, 2, 4, 7, 13, 16, 26, 29,
- 43, 46, 64, 67, 89, 92, 118, 121, 151, 154, 188, 191, 229,
- 232, 274, 277, 323, 326, 376, 379, 433, 436, 494, 497, 558, 3,
- 8, 12, 17, 25, 30, 42, 47, 63, 68, 88, 93, 117, 122,
- 150, 155, 187, 192, 228, 233, 273, 278, 322, 327, 375, 380, 432,
- 437, 493, 498, 557, 559, 9, 11, 18, 24, 31, 41, 48, 62,
- 69, 87, 94, 116, 123, 149, 156, 186, 193, 227, 234, 272, 279,
- 321, 328, 374, 381, 431, 438, 492, 499, 556, 560, 617, 10, 19,
- 23, 32, 40, 49, 61, 70, 86, 95, 115, 124, 148, 157, 185,
- 194, 226, 235, 271, 280, 320, 329, 373, 382, 430, 439, 491, 500,
- 555, 561, 616, 618, 20, 22, 33, 39, 50, 60, 71, 85, 96,
- 114, 125, 147, 158, 184, 195, 225, 236, 270, 281, 319, 330, 372,
- 383, 429, 440, 490, 501, 554, 562, 615, 619, 672, 21, 34, 38,
- 51, 59, 72, 84, 97, 113, 126, 146, 159, 183, 196, 224, 237,
- 269, 282, 318, 331, 371, 384, 428, 441, 489, 502, 553, 563, 614,
- 620, 671, 673, 35, 37, 52, 58, 73, 83, 98, 112, 127, 145,
- 160, 182, 197, 223, 238, 268, 283, 317, 332, 370, 385, 427, 442,
- 488, 503, 552, 564, 613, 621, 670, 674, 723, 36, 53, 57, 74,
- 82, 99, 111, 128, 144, 161, 181, 198, 222, 239, 267, 284, 316,
- 333, 369, 386, 426, 443, 487, 504, 551, 565, 612, 622, 669, 675,
- 722, 724, 54, 56, 75, 81, 100, 110, 129, 143, 162, 180, 199,
- 221, 240, 266, 285, 315, 334, 368, 387, 425, 444, 486, 505, 550,
- 566, 611, 623, 668, 676, 721, 725, 770, 55, 76, 80, 101, 109,
- 130, 142, 163, 179, 200, 220, 241, 265, 286, 314, 335, 367, 388,
- 424, 445, 485, 506, 549, 567, 610, 624, 667, 677, 720, 726, 769,
- 771, 77, 79, 102, 108, 131, 141, 164, 178, 201, 219, 242, 264,
- 287, 313, 336, 366, 389, 423, 446, 484, 507, 548, 568, 609, 625,
- 666, 678, 719, 727, 768, 772, 813, 78, 103, 107, 132, 140, 165,
- 177, 202, 218, 243, 263, 288, 312, 337, 365, 390, 422, 447, 483,
- 508, 547, 569, 608, 626, 665, 679, 718, 728, 767, 773, 812, 814,
- 104, 106, 133, 139, 166, 176, 203, 217, 244, 262, 289, 311, 338,
- 364, 391, 421, 448, 482, 509, 546, 570, 607, 627, 664, 680, 717,
- 729, 766, 774, 811, 815, 852, 105, 134, 138, 167, 175, 204, 216,
- 245, 261, 290, 310, 339, 363, 392, 420, 449, 481, 510, 545, 571,
- 606, 628, 663, 681, 716, 730, 765, 775, 810, 816, 851, 853, 135,
- 137, 168, 174, 205, 215, 246, 260, 291, 309, 340, 362, 393, 419,
- 450, 480, 511, 544, 572, 605, 629, 662, 682, 715, 731, 764, 776,
- 809, 817, 850, 854, 887, 136, 169, 173, 206, 214, 247, 259, 292,
- 308, 341, 361, 394, 418, 451, 479, 512, 543, 573, 604, 630, 661,
- 683, 714, 732, 763, 777, 808, 818, 849, 855, 886, 888, 170, 172,
- 207, 213, 248, 258, 293, 307, 342, 360, 395, 417, 452, 478, 513,
- 542, 574, 603, 631, 660, 684, 713, 733, 762, 778, 807, 819, 848,
- 856, 885, 889, 918, 171, 208, 212, 249, 257, 294, 306, 343, 359,
- 396, 416, 453, 477, 514, 541, 575, 602, 632, 659, 685, 712, 734,
- 761, 779, 806, 820, 847, 857, 884, 890, 917, 919, 209, 211, 250,
- 256, 295, 305, 344, 358, 397, 415, 454, 476, 515, 540, 576, 601,
- 633, 658, 686, 711, 735, 760, 780, 805, 821, 846, 858, 883, 891,
- 916, 920, 945, 210, 251, 255, 296, 304, 345, 357, 398, 414, 455,
- 475, 516, 539, 577, 600, 634, 657, 687, 710, 736, 759, 781, 804,
- 822, 845, 859, 882, 892, 915, 921, 944, 946, 252, 254, 297, 303,
- 346, 356, 399, 413, 456, 474, 517, 538, 578, 599, 635, 656, 688,
- 709, 737, 758, 782, 803, 823, 844, 860, 881, 893, 914, 922, 943,
- 947, 968, 253, 298, 302, 347, 355, 400, 412, 457, 473, 518, 537,
- 579, 598, 636, 655, 689, 708, 738, 757, 783, 802, 824, 843, 861,
- 880, 894, 913, 923, 942, 948, 967, 969, 299, 301, 348, 354, 401,
- 411, 458, 472, 519, 536, 580, 597, 637, 654, 690, 707, 739, 756,
- 784, 801, 825, 842, 862, 879, 895, 912, 924, 941, 949, 966, 970,
- 987, 300, 349, 353, 402, 410, 459, 471, 520, 535, 581, 596, 638,
- 653, 691, 706, 740, 755, 785, 800, 826, 841, 863, 878, 896, 911,
- 925, 940, 950, 965, 971, 986, 988, 350, 352, 403, 409, 460, 470,
- 521, 534, 582, 595, 639, 652, 692, 705, 741, 754, 786, 799, 827,
- 840, 864, 877, 897, 910, 926, 939, 951, 964, 972, 985, 989, 1002,
- 351, 404, 408, 461, 469, 522, 533, 583, 594, 640, 651, 693, 704,
- 742, 753, 787, 798, 828, 839, 865, 876, 898, 909, 927, 938, 952,
- 963, 973, 984, 990, 1001, 1003, 405, 407, 462, 468, 523, 532, 584,
- 593, 641, 650, 694, 703, 743, 752, 788, 797, 829, 838, 866, 875,
- 899, 908, 928, 937, 953, 962, 974, 983, 991, 1000, 1004, 1013, 406,
- 463, 467, 524, 531, 585, 592, 642, 649, 695, 702, 744, 751, 789,
- 796, 830, 837, 867, 874, 900, 907, 929, 936, 954, 961, 975, 982,
- 992, 999, 1005, 1012, 1014, 464, 466, 525, 530, 586, 591, 643, 648,
- 696, 701, 745, 750, 790, 795, 831, 836, 868, 873, 901, 906, 930,
- 935, 955, 960, 976, 981, 993, 998, 1006, 1011, 1015, 1020, 465, 526,
- 529, 587, 590, 644, 647, 697, 700, 746, 749, 791, 794, 832, 835,
- 869, 872, 902, 905, 931, 934, 956, 959, 977, 980, 994, 997, 1007,
- 1010, 1016, 1019, 1021, 527, 528, 588, 589, 645, 646, 698, 699, 747,
- 748, 792, 793, 833, 834, 870, 871, 903, 904, 932, 933, 957, 958,
- 978, 979, 995, 996, 1008, 1009, 1017, 1018, 1022, 1023
+ 0, 2, 3, 9, 10, 20, 21, 35, 36, 54, 55, 77, 78,
+ 104, 105, 135, 136, 170, 171, 209, 210, 252, 253, 299, 300, 350,
+ 351, 405, 406, 464, 465, 527, 1, 4, 8, 11, 19, 22, 34,
+ 37, 53, 56, 76, 79, 103, 106, 134, 137, 169, 172, 208, 211,
+ 251, 254, 298, 301, 349, 352, 404, 407, 463, 466, 526, 528, 5,
+ 7, 12, 18, 23, 33, 38, 52, 57, 75, 80, 102, 107, 133,
+ 138, 168, 173, 207, 212, 250, 255, 297, 302, 348, 353, 403, 408,
+ 462, 467, 525, 529, 588, 6, 13, 17, 24, 32, 39, 51, 58,
+ 74, 81, 101, 108, 132, 139, 167, 174, 206, 213, 249, 256, 296,
+ 303, 347, 354, 402, 409, 461, 468, 524, 530, 587, 589, 14, 16,
+ 25, 31, 40, 50, 59, 73, 82, 100, 109, 131, 140, 166, 175,
+ 205, 214, 248, 257, 295, 304, 346, 355, 401, 410, 460, 469, 523,
+ 531, 586, 590, 645, 15, 26, 30, 41, 49, 60, 72, 83, 99,
+ 110, 130, 141, 165, 176, 204, 215, 247, 258, 294, 305, 345, 356,
+ 400, 411, 459, 470, 522, 532, 585, 591, 644, 646, 27, 29, 42,
+ 48, 61, 71, 84, 98, 111, 129, 142, 164, 177, 203, 216, 246,
+ 259, 293, 306, 344, 357, 399, 412, 458, 471, 521, 533, 584, 592,
+ 643, 647, 698, 28, 43, 47, 62, 70, 85, 97, 112, 128, 143,
+ 163, 178, 202, 217, 245, 260, 292, 307, 343, 358, 398, 413, 457,
+ 472, 520, 534, 583, 593, 642, 648, 697, 699, 44, 46, 63, 69,
+ 86, 96, 113, 127, 144, 162, 179, 201, 218, 244, 261, 291, 308,
+ 342, 359, 397, 414, 456, 473, 519, 535, 582, 594, 641, 649, 696,
+ 700, 747, 45, 64, 68, 87, 95, 114, 126, 145, 161, 180, 200,
+ 219, 243, 262, 290, 309, 341, 360, 396, 415, 455, 474, 518, 536,
+ 581, 595, 640, 650, 695, 701, 746, 748, 65, 67, 88, 94, 115,
+ 125, 146, 160, 181, 199, 220, 242, 263, 289, 310, 340, 361, 395,
+ 416, 454, 475, 517, 537, 580, 596, 639, 651, 694, 702, 745, 749,
+ 792, 66, 89, 93, 116, 124, 147, 159, 182, 198, 221, 241, 264,
+ 288, 311, 339, 362, 394, 417, 453, 476, 516, 538, 579, 597, 638,
+ 652, 693, 703, 744, 750, 791, 793, 90, 92, 117, 123, 148, 158,
+ 183, 197, 222, 240, 265, 287, 312, 338, 363, 393, 418, 452, 477,
+ 515, 539, 578, 598, 637, 653, 692, 704, 743, 751, 790, 794, 833,
+ 91, 118, 122, 149, 157, 184, 196, 223, 239, 266, 286, 313, 337,
+ 364, 392, 419, 451, 478, 514, 540, 577, 599, 636, 654, 691, 705,
+ 742, 752, 789, 795, 832, 834, 119, 121, 150, 156, 185, 195, 224,
+ 238, 267, 285, 314, 336, 365, 391, 420, 450, 479, 513, 541, 576,
+ 600, 635, 655, 690, 706, 741, 753, 788, 796, 831, 835, 870, 120,
+ 151, 155, 186, 194, 225, 237, 268, 284, 315, 335, 366, 390, 421,
+ 449, 480, 512, 542, 575, 601, 634, 656, 689, 707, 740, 754, 787,
+ 797, 830, 836, 869, 871, 152, 154, 187, 193, 226, 236, 269, 283,
+ 316, 334, 367, 389, 422, 448, 481, 511, 543, 574, 602, 633, 657,
+ 688, 708, 739, 755, 786, 798, 829, 837, 868, 872, 903, 153, 188,
+ 192, 227, 235, 270, 282, 317, 333, 368, 388, 423, 447, 482, 510,
+ 544, 573, 603, 632, 658, 687, 709, 738, 756, 785, 799, 828, 838,
+ 867, 873, 902, 904, 189, 191, 228, 234, 271, 281, 318, 332, 369,
+ 387, 424, 446, 483, 509, 545, 572, 604, 631, 659, 686, 710, 737,
+ 757, 784, 800, 827, 839, 866, 874, 901, 905, 932, 190, 229, 233,
+ 272, 280, 319, 331, 370, 386, 425, 445, 484, 508, 546, 571, 605,
+ 630, 660, 685, 711, 736, 758, 783, 801, 826, 840, 865, 875, 900,
+ 906, 931, 933, 230, 232, 273, 279, 320, 330, 371, 385, 426, 444,
+ 485, 507, 547, 570, 606, 629, 661, 684, 712, 735, 759, 782, 802,
+ 825, 841, 864, 876, 899, 907, 930, 934, 957, 231, 274, 278, 321,
+ 329, 372, 384, 427, 443, 486, 506, 548, 569, 607, 628, 662, 683,
+ 713, 734, 760, 781, 803, 824, 842, 863, 877, 898, 908, 929, 935,
+ 956, 958, 275, 277, 322, 328, 373, 383, 428, 442, 487, 505, 549,
+ 568, 608, 627, 663, 682, 714, 733, 761, 780, 804, 823, 843, 862,
+ 878, 897, 909, 928, 936, 955, 959, 978, 276, 323, 327, 374, 382,
+ 429, 441, 488, 504, 550, 567, 609, 626, 664, 681, 715, 732, 762,
+ 779, 805, 822, 844, 861, 879, 896, 910, 927, 937, 954, 960, 977,
+ 979, 324, 326, 375, 381, 430, 440, 489, 503, 551, 566, 610, 625,
+ 665, 680, 716, 731, 763, 778, 806, 821, 845, 860, 880, 895, 911,
+ 926, 938, 953, 961, 976, 980, 995, 325, 376, 380, 431, 439, 490,
+ 502, 552, 565, 611, 624, 666, 679, 717, 730, 764, 777, 807, 820,
+ 846, 859, 881, 894, 912, 925, 939, 952, 962, 975, 981, 994, 996,
+ 377, 379, 432, 438, 491, 501, 553, 564, 612, 623, 667, 678, 718,
+ 729, 765, 776, 808, 819, 847, 858, 882, 893, 913, 924, 940, 951,
+ 963, 974, 982, 993, 997, 1008, 378, 433, 437, 492, 500, 554, 563,
+ 613, 622, 668, 677, 719, 728, 766, 775, 809, 818, 848, 857, 883,
+ 892, 914, 923, 941, 950, 964, 973, 983, 992, 998, 1007, 1009, 434,
+ 436, 493, 499, 555, 562, 614, 621, 669, 676, 720, 727, 767, 774,
+ 810, 817, 849, 856, 884, 891, 915, 922, 942, 949, 965, 972, 984,
+ 991, 999, 1006, 1010, 1017, 435, 494, 498, 556, 561, 615, 620, 670,
+ 675, 721, 726, 768, 773, 811, 816, 850, 855, 885, 890, 916, 921,
+ 943, 948, 966, 971, 985, 990, 1000, 1005, 1011, 1016, 1018, 495, 497,
+ 557, 560, 616, 619, 671, 674, 722, 725, 769, 772, 812, 815, 851,
+ 854, 886, 889, 917, 920, 944, 947, 967, 970, 986, 989, 1001, 1004,
+ 1012, 1015, 1019, 1022, 496, 558, 559, 617, 618, 672, 673, 723, 724,
+ 770, 771, 813, 814, 852, 853, 887, 888, 918, 919, 945, 946, 968,
+ 969, 987, 988, 1002, 1003, 1013, 1014, 1020, 1021, 1023,
};
const SCAN_ORDER av1_scan_orders[TX_SIZES_ALL][TX_TYPES] = {
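
The iscan tables being shuffled above are inverse scans: if scan[i] is the i-th
position visited, then iscan[scan[i]] == i, and av1_scan_orders pairs each scan
with its iscan. Read against the removed lines, this scan.c diff is one
transposition pass: every rectangular body is reattached to the name with the
dimensions and the mrow/mcol role swapped, the square mrow/mcol bodies trade
places, and the square default (zigzag) bodies are replaced by their own
transposes. Below is a minimal sketch of how such a body can be generated,
assuming a column scan over an 8x16 block stored row-major; the raster
convention is exactly what this change re-orients, so treat it as an
assumption rather than libaom's definition. With these parameters it prints
the body kept as context above, which the patch moves from
av1_mcol_iscan_16x8 to av1_mrow_iscan_8x16:

    #include <stdint.h>
    #include <stdio.h>

    #define ROWS 8
    #define COLS 16

    int main(void) {
      int16_t scan[ROWS * COLS];   /* scan index -> raster position */
      int16_t iscan[ROWS * COLS];  /* raster position -> scan index */
      int i = 0;
      /* Column scan: walk down each column in turn. */
      for (int c = 0; c < COLS; ++c)
        for (int r = 0; r < ROWS; ++r) scan[i++] = (int16_t)(r * COLS + c);
      /* An iscan table is simply the inverse permutation of its scan. */
      for (i = 0; i < ROWS * COLS; ++i) iscan[scan[i]] = (int16_t)i;
      for (i = 0; i < ROWS * COLS; ++i)
        printf("%d%s", iscan[i], (i + 1) % COLS ? ", " : ",\n");
      return 0;
    }
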
diff --git a/av1/common/txb_common.c b/av1/common/txb_common.c
index 4eef319..bf2bc36 100644
--- a/av1/common/txb_common.c
+++ b/av1/common/txb_common.c
@@ -12,90 +12,6 @@
#include "av1/common/av1_common_int.h"
#include "av1/common/txb_common.h"
-const int8_t av1_coeff_band_4x4[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
- 8, 9, 10, 11, 12, 13, 14, 15 };
-
-const int8_t av1_coeff_band_8x8[64] = {
- 0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 2, 2, 3, 3, 4, 4,
- 7, 7, 8, 8, 9, 9, 10, 10, 7, 7, 8, 8, 9, 9, 10, 10,
- 11, 11, 12, 12, 13, 13, 14, 14, 11, 11, 12, 12, 13, 13, 14, 14,
- 15, 15, 16, 16, 17, 17, 18, 18, 15, 15, 16, 16, 17, 17, 18, 18,
-};
-
-const int8_t av1_coeff_band_16x16[256] = {
- 0, 1, 4, 4, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 2, 3, 4,
- 4, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 5, 5, 6, 6, 7, 7,
- 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 5, 5, 6, 6, 7, 7, 7, 7, 8,
- 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12,
- 13, 13, 13, 13, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13,
- 13, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 10, 10,
- 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15,
- 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15,
- 16, 16, 16, 16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16,
- 16, 17, 17, 17, 17, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17,
- 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18,
- 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18, 18, 18, 18,
- 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 18, 18, 18, 18, 19, 19, 19,
- 19, 20, 20, 20, 20, 21, 21, 21, 21,
-};
-
-const int8_t av1_coeff_band_32x32[1024] = {
- 0, 1, 4, 4, 7, 7, 7, 7, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11,
- 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 2, 3, 4, 4, 7, 7,
- 7, 7, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12,
- 12, 12, 12, 12, 12, 12, 12, 5, 5, 6, 6, 7, 7, 7, 7, 10, 10, 10, 10,
- 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12,
- 12, 5, 5, 6, 6, 7, 7, 7, 7, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11,
- 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 8, 8, 8, 8, 9,
- 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
- 12, 12, 12, 12, 12, 12, 12, 12, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10,
- 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,
- 12, 12, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11,
- 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 8, 8, 8, 8,
- 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11,
- 11, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14,
- 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16,
- 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14,
- 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13,
- 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15,
- 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14,
- 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16,
- 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14,
- 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13,
- 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15,
- 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13,
- 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16,
- 16, 16, 16, 16, 16, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14,
- 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 17,
- 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19,
- 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17,
- 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20,
- 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18,
- 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20,
- 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19,
- 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17,
- 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20,
- 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18,
- 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20,
- 20, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19,
- 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 17, 17, 17, 17, 17,
- 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19,
- 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
- 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24,
- 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23,
- 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21,
- 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23,
- 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22,
- 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24,
- 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22,
- 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21,
- 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23,
- 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22,
- 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24,
- 24, 24, 24, 24, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22,
- 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24,
-};
-
// The ctx offset table when TX is TX_CLASS_2D.
// TX col and row indices are clamped to 4
@@ -184,34 +100,54 @@
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
-const int8_t av1_nz_map_ctx_offset_8x4[32] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 16, 16, 6, 21, 21, 21, 21, 21,
- 16, 16, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
+const int8_t av1_nz_map_ctx_offset_4x8[32] = {
+ 0, 11, 6, 6, 21, 21, 21, 21, 11, 11, 6, 21, 21, 21, 21, 21,
+ 11, 11, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21,
};
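
The comment kept above ("TX col and row indices are clamped to 4") is the key
to this group of tables: for TX_CLASS_2D every per-position offset is a lookup
into a 5x5 base table with both coordinates clamped to 4, so each
av1_nz_map_ctx_offset_* array is that lookup unrolled over the block. What
follows is a hypothetical reconstruction of the expansion, not libaom's actual
generator: the base5x5 values are reverse-read from the new 4x8 table just
above, and the traversal orientation is an assumption (re-orienting it is what
this patch does).

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative base table, chosen to match the new 4x8 values above;
     * not the spec's Coeff_Base_Ctx_Offset. */
    static const int8_t base5x5[5][5] = {
      { 0, 11, 6, 6, 21 },    { 11, 11, 6, 21, 21 },  { 11, 11, 21, 21, 21 },
      { 11, 11, 21, 21, 21 }, { 11, 11, 21, 21, 21 },
    };

    int main(void) {
      enum { DIM0 = 4, DIM1 = 8 };  /* assumed orientation of the 4x8 block */
      for (int i = 0; i < DIM0; ++i)
        for (int j = 0; j < DIM1; ++j) {
          /* Clamp both indices to 4 before the lookup. */
          const int ci = i < 4 ? i : 4, cj = j < 4 ? j : 4;
          printf("%d, ", base5x5[ci][cj]);
        }
      printf("\n");
      return 0;
    }

Run as-is, this prints the 32 entries of the new av1_nz_map_ctx_offset_4x8 in
order; the remaining tables in this group follow the same clamp-and-lookup
pattern with base values that differ by block shape (the new 32x16 table, for
instance, uses 16 where these use 11).
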
const int8_t av1_nz_map_ctx_offset_8x16[128] = {
- 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21,
- 21, 21, 21, 21, 21, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
-};
-
-const int8_t av1_nz_map_ctx_offset_16x8[128] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
+ 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 6,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
const int8_t av1_nz_map_ctx_offset_16x32[512] = {
- 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
- 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21, 21, 21, 21,
+ 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 6, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x16[512] = {
+ 0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 6, 6, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 6, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
@@ -239,41 +175,68 @@
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
-const int8_t av1_nz_map_ctx_offset_32x16[512] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6, 21, 21, 21,
+const int8_t av1_nz_map_ctx_offset_32x64[1024] = {
+ 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 6, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
+ 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
+ 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
+ 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
+ 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11,
+ 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
-const int8_t av1_nz_map_ctx_offset_32x64[1024] = {
- 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
- 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
- 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
- 11, 11, 11, 11, 11, 11, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+const int8_t av1_nz_map_ctx_offset_64x32[1024] = {
+ 0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16, 16, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
@@ -326,79 +289,39 @@
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
-const int8_t av1_nz_map_ctx_offset_64x32[1024] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16,
- 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
-};
-
const int8_t av1_nz_map_ctx_offset_4x16[64] = {
- 0, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21, 21, 6, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 11, 11, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
const int8_t av1_nz_map_ctx_offset_16x4[64] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 16, 16, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 0, 16, 16, 16, 16, 16, 16, 16, 6, 6, 21, 21, 6, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
};
const int8_t av1_nz_map_ctx_offset_8x32[256] = {
- 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21,
+ 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 6, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21,
+};
+
+const int8_t av1_nz_map_ctx_offset_32x8[256] = {
+ 0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 6, 6, 21,
21, 21, 21, 21, 21, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
@@ -414,33 +337,16 @@
21, 21, 21, 21, 21, 21, 21, 21, 21,
};
-const int8_t av1_nz_map_ctx_offset_32x8[256] = {
- 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 6, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
- 21, 21, 21, 21, 21, 21, 21, 21, 21,
-};
-
const int8_t *av1_nz_map_ctx_offset[19] = {
av1_nz_map_ctx_offset_4x4, // TX_4x4
av1_nz_map_ctx_offset_8x8, // TX_8x8
av1_nz_map_ctx_offset_16x16, // TX_16x16
av1_nz_map_ctx_offset_32x32, // TX_32x32
- av1_nz_map_ctx_offset_32x32, // TX_32x32
- av1_nz_map_ctx_offset_4x16, // TX_4x8
- av1_nz_map_ctx_offset_8x4, // TX_8x4
- av1_nz_map_ctx_offset_8x32, // TX_8x16
- av1_nz_map_ctx_offset_16x8, // TX_16x8
+ av1_nz_map_ctx_offset_32x32, // TX_64x64
+ av1_nz_map_ctx_offset_4x8, // TX_4x8
+ av1_nz_map_ctx_offset_16x4, // TX_8x4
+ av1_nz_map_ctx_offset_8x16, // TX_8x16
+ av1_nz_map_ctx_offset_32x8, // TX_16x8
av1_nz_map_ctx_offset_16x32, // TX_16x32
av1_nz_map_ctx_offset_32x16, // TX_32x16
av1_nz_map_ctx_offset_32x64, // TX_32x64
@@ -449,8 +355,8 @@
av1_nz_map_ctx_offset_16x4, // TX_16x4
av1_nz_map_ctx_offset_8x32, // TX_8x32
av1_nz_map_ctx_offset_32x8, // TX_32x8
- av1_nz_map_ctx_offset_16x32, // TX_16x64
- av1_nz_map_ctx_offset_64x32, // TX_64x16
+ av1_nz_map_ctx_offset_32x64, // TX_16x64
+ av1_nz_map_ctx_offset_32x16, // TX_64x16
};
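
A minimal sketch of how the dispatch table above is consumed: the per-position
offset is simply added to the context derived from the neighboring coefficient
magnitudes, mirroring get_lower_levels_ctx_2d() in txb_common.h below (the
helper name here is hypothetical):

  static INLINE int nz_map_ctx_sketch(int mag_ctx, TX_SIZE tx_size,
                                      int coeff_idx) {
    // coeff_idx is in raster order over the transform block.
    return mag_ctx + av1_nz_map_ctx_offset[tx_size][coeff_idx];
  }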
const int16_t av1_eob_group_start[12] = { 0, 1, 2, 3, 5, 9,
diff --git a/av1/common/txb_common.h b/av1/common/txb_common.h
index 40fcffc..9628090 100644
--- a/av1/common/txb_common.h
+++ b/av1/common/txb_common.h
@@ -17,14 +17,6 @@
extern const int16_t av1_eob_group_start[12];
extern const int16_t av1_eob_offset_bits[12];
-extern const int8_t av1_coeff_band_4x4[16];
-
-extern const int8_t av1_coeff_band_8x8[64];
-
-extern const int8_t av1_coeff_band_16x16[256];
-
-extern const int8_t av1_coeff_band_32x32[1024];
-
extern const int8_t *av1_nz_map_ctx_offset[TX_SIZES_ALL];
typedef struct txb_ctx {
@@ -55,9 +47,9 @@
TX_CLASS_HORIZ, // H_FLIPADST
};
-static INLINE int get_txb_bwl(TX_SIZE tx_size) {
+static INLINE int get_txb_bhl(TX_SIZE tx_size) {
tx_size = av1_get_adjusted_tx_size(tx_size);
- return tx_size_wide_log2[tx_size];
+ return tx_size_high_log2[tx_size];
}
static INLINE int get_txb_wide(TX_SIZE tx_size) {
@@ -70,22 +62,22 @@
return tx_size_high[tx_size];
}
-static INLINE uint8_t *set_levels(uint8_t *const levels_buf, const int width) {
- return levels_buf + TX_PAD_TOP * (width + TX_PAD_HOR);
+static INLINE uint8_t *set_levels(uint8_t *const levels_buf, const int height) {
+ return levels_buf + TX_PAD_TOP * (height + TX_PAD_HOR);
}
-static INLINE int get_padded_idx(const int idx, const int bwl) {
- return idx + ((idx >> bwl) << TX_PAD_HOR_LOG2);
+static INLINE int get_padded_idx(const int idx, const int bhl) {
+ return idx + ((idx >> bhl) << TX_PAD_HOR_LOG2);
}
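
A worked example of the padded indexing, assuming libaom's TX_PAD_HOR_LOG2 of
2 (i.e. four padding entries appended to each line of levels): with bhl = 3
there are 8 levels per line, so idx = 10 has crossed one full line and

  get_padded_idx(10, 3) == 10 + ((10 >> 3) << 2) == 14

that is, every completed line of 8 levels skips 4 padding slots.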
static INLINE int get_br_ctx_2d(const uint8_t *const levels,
const int c, // raster order
- const int bwl) {
+ const int bhl) {
assert(c > 0);
- const int row = c >> bwl;
- const int col = c - (row << bwl);
- const int stride = (1 << bwl) + TX_PAD_HOR;
- const int pos = row * stride + col;
+ const int col = c >> bhl;
+ const int row = c - (col << bhl);
+ const int stride = (1 << bhl) + TX_PAD_HOR;
+ const int pos = col * stride + row;
int mag = AOMMIN(levels[pos + 1], MAX_BASE_BR_RANGE) +
AOMMIN(levels[pos + stride], MAX_BASE_BR_RANGE) +
AOMMIN(levels[pos + 1 + stride], MAX_BASE_BR_RANGE);
@@ -96,10 +88,10 @@
}
static AOM_FORCE_INLINE int get_br_ctx_eob(const int c, // raster order
- const int bwl,
+ const int bhl,
const TX_CLASS tx_class) {
- const int row = c >> bwl;
- const int col = c - (row << bwl);
+ const int col = c >> bhl;
+ const int row = c - (col << bhl);
if (c == 0) return 0;
if ((tx_class == TX_CLASS_2D && row < 2 && col < 2) ||
(tx_class == TX_CLASS_HORIZ && col == 0) ||
@@ -110,11 +102,11 @@
static AOM_FORCE_INLINE int get_br_ctx(const uint8_t *const levels,
const int c, // raster order
- const int bwl, const TX_CLASS tx_class) {
- const int row = c >> bwl;
- const int col = c - (row << bwl);
- const int stride = (1 << bwl) + TX_PAD_HOR;
- const int pos = row * stride + col;
+ const int bhl, const TX_CLASS tx_class) {
+ const int col = c >> bhl;
+ const int row = c - (col << bhl);
+ const int stride = (1 << bhl) + TX_PAD_HOR;
+ const int pos = col * stride + row;
int mag = levels[pos + 1];
mag += levels[pos + stride];
switch (tx_class) {
@@ -125,13 +117,13 @@
if ((row < 2) && (col < 2)) return mag + 7;
break;
case TX_CLASS_HORIZ:
- mag += levels[pos + 2];
+ mag += levels[pos + (stride << 1)];
mag = AOMMIN((mag + 1) >> 1, 6);
if (c == 0) return mag;
if (col == 0) return mag + 7;
break;
case TX_CLASS_VERT:
- mag += levels[pos + (stride << 1)];
+ mag += levels[pos + 2];
mag = AOMMIN((mag + 1) >> 1, 6);
if (c == 0) return mag;
if (row == 0) return mag + 7;
@@ -156,25 +148,25 @@
};
static AOM_FORCE_INLINE int get_nz_mag(const uint8_t *const levels,
- const int bwl, const TX_CLASS tx_class) {
+ const int bhl, const TX_CLASS tx_class) {
int mag;
  // Note: AOMMIN(level, 3) is useless for the decoder since level < 3.
- mag = clip_max3[levels[1]]; // { 0, 1 }
- mag += clip_max3[levels[(1 << bwl) + TX_PAD_HOR]]; // { 1, 0 }
+ mag = clip_max3[levels[(1 << bhl) + TX_PAD_HOR]]; // { 0, 1 }
+ mag += clip_max3[levels[1]]; // { 1, 0 }
if (tx_class == TX_CLASS_2D) {
- mag += clip_max3[levels[(1 << bwl) + TX_PAD_HOR + 1]]; // { 1, 1 }
- mag += clip_max3[levels[2]]; // { 0, 2 }
- mag += clip_max3[levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)]]; // { 2, 0 }
+ mag += clip_max3[levels[(1 << bhl) + TX_PAD_HOR + 1]]; // { 1, 1 }
+ mag += clip_max3[levels[(2 << bhl) + (2 << TX_PAD_HOR_LOG2)]]; // { 0, 2 }
+ mag += clip_max3[levels[2]]; // { 2, 0 }
} else if (tx_class == TX_CLASS_VERT) {
- mag += clip_max3[levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)]]; // { 2, 0 }
- mag += clip_max3[levels[(3 << bwl) + (3 << TX_PAD_HOR_LOG2)]]; // { 3, 0 }
- mag += clip_max3[levels[(4 << bwl) + (4 << TX_PAD_HOR_LOG2)]]; // { 4, 0 }
+ mag += clip_max3[levels[2]]; // { 2, 0 }
+ mag += clip_max3[levels[3]]; // { 3, 0 }
+ mag += clip_max3[levels[4]]; // { 4, 0 }
} else {
- mag += clip_max3[levels[2]]; // { 0, 2 }
- mag += clip_max3[levels[3]]; // { 0, 3 }
- mag += clip_max3[levels[4]]; // { 0, 4 }
+ mag += clip_max3[levels[(2 << bhl) + (2 << TX_PAD_HOR_LOG2)]]; // { 0, 2 }
+ mag += clip_max3[levels[(3 << bhl) + (3 << TX_PAD_HOR_LOG2)]]; // { 0, 3 }
+ mag += clip_max3[levels[(4 << bhl) + (4 << TX_PAD_HOR_LOG2)]]; // { 0, 4 }
}
return mag;
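
To make the { row, col } comments concrete: with the levels buffer now laid
out along the height-first (bhl) axis, the addressing used in get_br_ctx_2d()
above works out to

  stride = (1 << bhl) + TX_PAD_HOR;  // one padded line of levels
  pos    = col * stride + row;
  levels[pos + 1]       // the { 1, 0 } neighbor, one step along the same line
  levels[pos + stride]  // the { 0, 1 } neighbor, one line over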
@@ -197,7 +189,7 @@
static AOM_FORCE_INLINE int get_nz_map_ctx_from_stats(
const int stats,
const int coeff_idx, // raster order
- const int bwl, const TX_SIZE tx_size, const TX_CLASS tx_class) {
+ const int bhl, const TX_SIZE tx_size, const TX_CLASS tx_class) {
  // tx_class == 0 (TX_CLASS_2D)
if ((tx_class | coeff_idx) == 0) return 0;
int ctx = (stats + 1) >> 1;
@@ -218,12 +210,12 @@
return ctx + av1_nz_map_ctx_offset[tx_size][coeff_idx];
}
case TX_CLASS_HORIZ: {
- const int row = coeff_idx >> bwl;
- const int col = coeff_idx - (row << bwl);
+ const int col = coeff_idx >> bhl;
return ctx + nz_map_ctx_offset_1d[col];
}
case TX_CLASS_VERT: {
- const int row = coeff_idx >> bwl;
+ const int col = coeff_idx >> bhl;
+ const int row = coeff_idx - (col << bhl);
return ctx + nz_map_ctx_offset_1d[row];
}
default: break;
@@ -234,49 +226,49 @@
typedef aom_cdf_prob (*base_cdf_arr)[CDF_SIZE(4)];
typedef aom_cdf_prob (*br_cdf_arr)[CDF_SIZE(BR_CDF_SIZE)];
-static INLINE int get_lower_levels_ctx_eob(int bwl, int height, int scan_idx) {
+static INLINE int get_lower_levels_ctx_eob(int bhl, int width, int scan_idx) {
if (scan_idx == 0) return 0;
- if (scan_idx <= (height << bwl) / 8) return 1;
- if (scan_idx <= (height << bwl) / 4) return 2;
+ if (scan_idx <= (width << bhl) / 8) return 1;
+ if (scan_idx <= (width << bhl) / 4) return 2;
return 3;
}
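
A worked example of these end-of-block buckets: for a 16x16 transform, bhl is
4 and width is 16, so width << bhl is 256 and

  get_lower_levels_ctx_eob(4, 16, 0)  == 0
  get_lower_levels_ctx_eob(4, 16, 32) == 1  // 256 / 8 == 32
  get_lower_levels_ctx_eob(4, 16, 64) == 2  // 256 / 4 == 64
  get_lower_levels_ctx_eob(4, 16, 65) == 3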
static INLINE int get_lower_levels_ctx_2d(const uint8_t *levels, int coeff_idx,
- int bwl, TX_SIZE tx_size) {
+ int bhl, TX_SIZE tx_size) {
assert(coeff_idx > 0);
int mag;
  // Note: AOMMIN(level, 3) is useless for the decoder since level < 3.
- levels = levels + get_padded_idx(coeff_idx, bwl);
- mag = AOMMIN(levels[1], 3); // { 0, 1 }
- mag += AOMMIN(levels[(1 << bwl) + TX_PAD_HOR], 3); // { 1, 0 }
- mag += AOMMIN(levels[(1 << bwl) + TX_PAD_HOR + 1], 3); // { 1, 1 }
- mag += AOMMIN(levels[2], 3); // { 0, 2 }
- mag += AOMMIN(levels[(2 << bwl) + (2 << TX_PAD_HOR_LOG2)], 3); // { 2, 0 }
+ levels = levels + get_padded_idx(coeff_idx, bhl);
+ mag = AOMMIN(levels[(1 << bhl) + TX_PAD_HOR], 3); // { 0, 1 }
+ mag += AOMMIN(levels[1], 3); // { 1, 0 }
+ mag += AOMMIN(levels[(1 << bhl) + TX_PAD_HOR + 1], 3); // { 1, 1 }
+ mag += AOMMIN(levels[(2 << bhl) + (2 << TX_PAD_HOR_LOG2)], 3); // { 0, 2 }
+ mag += AOMMIN(levels[2], 3); // { 2, 0 }
const int ctx = AOMMIN((mag + 1) >> 1, 4);
return ctx + av1_nz_map_ctx_offset[tx_size][coeff_idx];
}
static AOM_FORCE_INLINE int get_lower_levels_ctx(const uint8_t *levels,
- int coeff_idx, int bwl,
+ int coeff_idx, int bhl,
TX_SIZE tx_size,
TX_CLASS tx_class) {
const int stats =
- get_nz_mag(levels + get_padded_idx(coeff_idx, bwl), bwl, tx_class);
- return get_nz_map_ctx_from_stats(stats, coeff_idx, bwl, tx_size, tx_class);
+ get_nz_mag(levels + get_padded_idx(coeff_idx, bhl), bhl, tx_class);
+ return get_nz_map_ctx_from_stats(stats, coeff_idx, bhl, tx_size, tx_class);
}
static INLINE int get_lower_levels_ctx_general(int is_last, int scan_idx,
- int bwl, int height,
+ int bhl, int width,
const uint8_t *levels,
int coeff_idx, TX_SIZE tx_size,
TX_CLASS tx_class) {
if (is_last) {
if (scan_idx == 0) return 0;
- if (scan_idx <= (height << bwl) >> 3) return 1;
- if (scan_idx <= (height << bwl) >> 2) return 2;
+ if (scan_idx <= (width << bhl) >> 3) return 1;
+ if (scan_idx <= (width << bhl) >> 2) return 2;
return 3;
}
- return get_lower_levels_ctx(levels, coeff_idx, bwl, tx_size, tx_class);
+ return get_lower_levels_ctx(levels, coeff_idx, bhl, tx_size, tx_class);
}
static INLINE void set_dc_sign(int *cul_level, int dc_val) {
diff --git a/av1/common/warped_motion.c b/av1/common/warped_motion.c
index 4e5966e..83f410e 100644
--- a/av1/common/warped_motion.c
+++ b/av1/common/warped_motion.c
@@ -27,7 +27,6 @@
// We need an extra 2 taps to fit this in, for a total of 8 taps.
/* clang-format off */
const int16_t av1_warped_filter[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8] = {
-#if WARPEDPIXEL_PREC_BITS == 6
// [-1, 0)
{ 0, 0, 127, 1, 0, 0, 0, 0 }, { 0, - 1, 127, 2, 0, 0, 0, 0 },
{ 1, - 3, 127, 4, - 1, 0, 0, 0 }, { 1, - 4, 126, 6, - 2, 1, 0, 0 },
@@ -131,63 +130,6 @@
{ 0, 0, 0, - 1, 4, 127, - 3, 1 }, { 0, 0, 0, 0, 2, 127, - 1, 0 },
// dummy (replicate row index 191)
{ 0, 0, 0, 0, 2, 127, - 1, 0 },
-
-#elif WARPEDPIXEL_PREC_BITS == 5
- // [-1, 0)
- {0, 0, 127, 1, 0, 0, 0, 0}, {1, -3, 127, 4, -1, 0, 0, 0},
- {1, -5, 126, 8, -3, 1, 0, 0}, {1, -7, 124, 13, -4, 1, 0, 0},
- {2, -9, 122, 18, -6, 1, 0, 0}, {2, -11, 120, 22, -7, 2, 0, 0},
- {3, -13, 117, 27, -8, 2, 0, 0}, {3, -14, 114, 32, -10, 3, 0, 0},
- {3, -15, 111, 37, -11, 3, 0, 0}, {3, -16, 108, 42, -12, 3, 0, 0},
- {4, -17, 104, 47, -13, 3, 0, 0}, {4, -17, 100, 52, -14, 3, 0, 0},
- {4, -18, 96, 58, -15, 3, 0, 0}, {4, -18, 91, 63, -16, 4, 0, 0},
- {4, -18, 87, 68, -17, 4, 0, 0}, {4, -18, 82, 73, -17, 4, 0, 0},
- {4, -18, 78, 78, -18, 4, 0, 0}, {4, -17, 73, 82, -18, 4, 0, 0},
- {4, -17, 68, 87, -18, 4, 0, 0}, {4, -16, 63, 91, -18, 4, 0, 0},
- {3, -15, 58, 96, -18, 4, 0, 0}, {3, -14, 52, 100, -17, 4, 0, 0},
- {3, -13, 47, 104, -17, 4, 0, 0}, {3, -12, 42, 108, -16, 3, 0, 0},
- {3, -11, 37, 111, -15, 3, 0, 0}, {3, -10, 32, 114, -14, 3, 0, 0},
- {2, -8, 27, 117, -13, 3, 0, 0}, {2, -7, 22, 120, -11, 2, 0, 0},
- {1, -6, 18, 122, -9, 2, 0, 0}, {1, -4, 13, 124, -7, 1, 0, 0},
- {1, -3, 8, 126, -5, 1, 0, 0}, {0, -1, 4, 127, -3, 1, 0, 0},
- // [0, 1)
- { 0, 0, 0, 127, 1, 0, 0, 0}, { 0, 1, -3, 127, 4, -2, 1, 0},
- { 0, 2, -6, 126, 8, -3, 1, 0}, {-1, 3, -8, 125, 13, -5, 2, -1},
- {-1, 4, -11, 123, 18, -7, 3, -1}, {-1, 4, -13, 121, 23, -8, 3, -1},
- {-1, 5, -15, 119, 27, -10, 4, -1}, {-2, 6, -17, 116, 33, -12, 5, -1},
- {-2, 6, -18, 113, 38, -13, 5, -1}, {-2, 7, -19, 110, 43, -15, 6, -2},
- {-2, 7, -20, 106, 49, -16, 6, -2}, {-2, 7, -21, 102, 54, -17, 7, -2},
- {-2, 8, -22, 98, 59, -18, 7, -2}, {-2, 8, -22, 94, 64, -19, 7, -2},
- {-2, 8, -22, 89, 69, -20, 8, -2}, {-2, 8, -21, 84, 74, -21, 8, -2},
- {-2, 8, -21, 79, 79, -21, 8, -2}, {-2, 8, -21, 74, 84, -21, 8, -2},
- {-2, 8, -20, 69, 89, -22, 8, -2}, {-2, 7, -19, 64, 94, -22, 8, -2},
- {-2, 7, -18, 59, 98, -22, 8, -2}, {-2, 7, -17, 54, 102, -21, 7, -2},
- {-2, 6, -16, 49, 106, -20, 7, -2}, {-2, 6, -15, 43, 110, -19, 7, -2},
- {-1, 5, -13, 38, 113, -18, 6, -2}, {-1, 5, -12, 33, 116, -17, 6, -2},
- {-1, 4, -10, 27, 119, -15, 5, -1}, {-1, 3, -8, 23, 121, -13, 4, -1},
- {-1, 3, -7, 18, 123, -11, 4, -1}, {-1, 2, -5, 13, 125, -8, 3, -1},
- { 0, 1, -3, 8, 126, -6, 2, 0}, { 0, 1, -2, 4, 127, -3, 1, 0},
- // [1, 2)
- {0, 0, 0, 1, 127, 0, 0, 0}, {0, 0, 1, -3, 127, 4, -1, 0},
- {0, 0, 1, -5, 126, 8, -3, 1}, {0, 0, 1, -7, 124, 13, -4, 1},
- {0, 0, 2, -9, 122, 18, -6, 1}, {0, 0, 2, -11, 120, 22, -7, 2},
- {0, 0, 3, -13, 117, 27, -8, 2}, {0, 0, 3, -14, 114, 32, -10, 3},
- {0, 0, 3, -15, 111, 37, -11, 3}, {0, 0, 3, -16, 108, 42, -12, 3},
- {0, 0, 4, -17, 104, 47, -13, 3}, {0, 0, 4, -17, 100, 52, -14, 3},
- {0, 0, 4, -18, 96, 58, -15, 3}, {0, 0, 4, -18, 91, 63, -16, 4},
- {0, 0, 4, -18, 87, 68, -17, 4}, {0, 0, 4, -18, 82, 73, -17, 4},
- {0, 0, 4, -18, 78, 78, -18, 4}, {0, 0, 4, -17, 73, 82, -18, 4},
- {0, 0, 4, -17, 68, 87, -18, 4}, {0, 0, 4, -16, 63, 91, -18, 4},
- {0, 0, 3, -15, 58, 96, -18, 4}, {0, 0, 3, -14, 52, 100, -17, 4},
- {0, 0, 3, -13, 47, 104, -17, 4}, {0, 0, 3, -12, 42, 108, -16, 3},
- {0, 0, 3, -11, 37, 111, -15, 3}, {0, 0, 3, -10, 32, 114, -14, 3},
- {0, 0, 2, -8, 27, 117, -13, 3}, {0, 0, 2, -7, 22, 120, -11, 2},
- {0, 0, 1, -6, 18, 122, -9, 2}, {0, 0, 1, -4, 13, 124, -7, 1},
- {0, 0, 1, -3, 8, 126, -5, 1}, {0, 0, 0, -1, 4, 127, -3, 1},
- // dummy (replicate row index 95)
- {0, 0, 0, -1, 4, 127, -3, 1},
-
-#endif // WARPEDPIXEL_PREC_BITS == 6
};
/* clang-format on */
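
One invariant of the remaining table that a quick self-check can exercise:
every 8-tap row sums to 128, the unit gain at 7-bit filter precision (e.g.
0 - 1 + 127 + 2 + 0 + 0 + 0 + 0 == 128 in the second row). A sketch, not part
of the patch:

  for (int i = 0; i < WARPEDPIXEL_PREC_SHIFTS * 3 + 1; ++i) {
    int sum = 0;
    for (int t = 0; t < 8; ++t) sum += av1_warped_filter[i][t];
    assert(sum == 128);  // unit DC gain
  }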
diff --git a/av1/common/x86/av1_inv_txfm_avx2.c b/av1/common/x86/av1_inv_txfm_avx2.c
index 7993707..0afd42b 100644
--- a/av1/common/x86/av1_inv_txfm_avx2.c
+++ b/av1/common/x86/av1_inv_txfm_avx2.c
@@ -1641,9 +1641,9 @@
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div16 = txfm_size_col >> 4;
- const int buf_size_nonzero_w_div16 = (eobx + 16) >> 4;
+ const int buf_size_nonzero_w = ((eobx + 16) >> 4) << 4;
const int buf_size_nonzero_h_div16 = (eoby + 16) >> 4;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -1660,16 +1660,10 @@
const __m256i scale0 = _mm256_set1_epi16(1 << (15 + shift[0]));
for (int i = 0; i < buf_size_nonzero_h_div16; i++) {
__m256i buf0[64];
- const int32_t *input_row = input + (i << 4) * input_stride;
- for (int j = 0; j < buf_size_nonzero_w_div16; ++j) {
- __m256i *buf0_cur = buf0 + j * 16;
- const int32_t *input_cur = input_row + j * 16;
- load_buffer_32bit_to_16bit_w16_avx2(input_cur, input_stride, buf0_cur,
- 16);
- transpose_16bit_16x16_avx2(buf0_cur, buf0_cur);
- }
+ load_buffer_32bit_to_16bit_w16_avx2(input + 16 * i, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_avx2(buf0, buf0, input_stride); // rect special code
+ round_shift_avx2(buf0, buf0, buf_size_nonzero_w); // rect special code
}
row_txfm(buf0, buf0);
for (int j = 0; j < txfm_size_col; ++j) {
@@ -1778,15 +1772,20 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int col_max = AOMMIN(32, txfm_size_col);
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
__m256i buf[32];
- for (int i = 0; i < input_stride; i += 16) {
- iidentity_row_16xn_avx2(buf, input + i, input_stride, shift[0], row_max,
- txw_idx, rect_type);
- iidentity_col_16xn_avx2(output + i, stride, buf, shift[1], row_max,
- txh_idx);
+
+ for (int i = 0; i < (col_max >> 4); ++i) {
+ for (int j = 0; j < (row_max >> 4); j++) {
+ iidentity_row_16xn_avx2(buf, input + j * 16 + i * 16 * input_stride,
+ row_max, shift[0], 16, txw_idx, rect_type);
+ transpose_16bit_16x16_avx2(buf, buf);
+ iidentity_col_16xn_avx2(output + i * 16 + j * 16 * stride, stride, buf,
+ shift[1], 16, txh_idx);
+ }
}
}
@@ -1800,9 +1799,10 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int txfm_size_col_notzero = AOMMIN(32, txfm_size_col);
- const int input_stride = txfm_size_col_notzero;
+ const int txfm_size_row_notzero = AOMMIN(32, txfm_size_row);
+ const int input_stride = txfm_size_row_notzero;
const int buf_size_w_div16 = (eobx + 16) >> 4;
+ const int buf_size_h_div16 = (eoby + 16) >> 4;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_y = lowbd_txfm_all_1d_zeros_idx[eoby];
@@ -1815,8 +1815,13 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_w_div16; i++) {
__m256i buf0[64];
- iidentity_row_16xn_avx2(buf0, input + (i << 4), input_stride, shift[0],
- eoby + 1, txw_idx, rect_type);
+ for (int j = 0; j < buf_size_h_div16; j++) {
+ __m256i *buf0_cur = buf0 + j * 16;
+ const int32_t *input_cur = input + i * 16 * input_stride + j * 16;
+ iidentity_row_16xn_avx2(buf0_cur, input_cur, input_stride, shift[0], 16,
+ txw_idx, rect_type);
+ transpose_16bit_16x16_avx2(buf0_cur, buf0_cur);
+ }
col_txfm(buf0, buf0);
__m256i mshift = _mm256_set1_epi16(1 << (15 + shift[1]));
int k = ud_flip ? (txfm_size_row - 1) : 0;
@@ -1841,7 +1846,8 @@
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div16 = txfm_size_col >> 4;
const int buf_size_h_div16 = (eoby + 16) >> 4;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int buf_size_nonzero_w = ((eobx + 8) >> 3) << 3;
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -1854,15 +1860,10 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_h_div16; i++) {
__m256i buf0[64];
- const int32_t *input_row = input + i * input_stride * 16;
- for (int j = 0; j < AOMMIN(4, buf_size_w_div16); ++j) {
- __m256i *buf0_cur = buf0 + j * 16;
- load_buffer_32bit_to_16bit_w16_avx2(input_row + j * 16, input_stride,
- buf0_cur, 16);
- transpose_16bit_16x16_avx2(buf0_cur, buf0_cur);
- }
+ load_buffer_32bit_to_16bit_w16_avx2(input + i * 16, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_avx2(buf0, buf0, input_stride); // rect special code
+ round_shift_avx2(buf0, buf0, buf_size_nonzero_w); // rect special code
}
row_txfm(buf0, buf0);
round_shift_16bit_w16_avx2(buf0, txfm_size_col, shift[0]);
@@ -1886,6 +1887,285 @@
}
}
+static const transform_1d_ssse3 lowbd_txfm_all_1d_zeros_8x8_arr[2][2] = {
+ { av1_idct8_low1_ssse3, av1_idct8_sse2 },
+ { av1_iadst8_low1_ssse3, av1_iadst8_sse2 }
+};
+
+static INLINE void load_buffer_avx2(const int32_t *in, int stride,
+ __m128i *out) {
+ const __m256i a = _mm256_load_si256((const __m256i *)in);
+ const __m256i b = _mm256_load_si256((const __m256i *)(in + stride * 1));
+ const __m256i c = _mm256_load_si256((const __m256i *)(in + stride * 2));
+ const __m256i d = _mm256_load_si256((const __m256i *)(in + stride * 3));
+ const __m256i e = _mm256_load_si256((const __m256i *)(in + stride * 4));
+ const __m256i f = _mm256_load_si256((const __m256i *)(in + stride * 5));
+ const __m256i g = _mm256_load_si256((const __m256i *)(in + stride * 6));
+ const __m256i h = _mm256_load_si256((const __m256i *)(in + stride * 7));
+
+ // a0 a1 a2 a3 b0 b1 b2 b3 a4 a5 a6 a7 b4 b5 b6 b7
+ const __m256i ab_16bit = _mm256_packs_epi32(a, b);
+ // c0 c1 c2 c3 d0 d1 d2 d3 c4 c5 c6 c7 d4 d5 d6 d7
+ const __m256i cd_16bit = _mm256_packs_epi32(c, d);
+ // e0 e1 e2 e3 f0 f1 f2 f3 e4 e5 e6 e7 f4 f5 f6 f7
+ const __m256i ef_16bit = _mm256_packs_epi32(e, f);
+ // g0 g1 g2 g3 h0 h1 h2 h3 g4 g5 g6 g7 h4 h5 h6 h7
+ const __m256i gh_16bit = _mm256_packs_epi32(g, h);
+
+ // a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7
+ const __m256i ab = _mm256_permute4x64_epi64(ab_16bit, 0xd8);
+ // c0 c1 c2 c3 c4 c5 c6 c7 d0 d1 d2 d3 d4 d5 d6 d7
+ const __m256i cd = _mm256_permute4x64_epi64(cd_16bit, 0xd8);
+ // e0 e1 e2 e3 e4 e5 e6 e7 f0 f1 f2 f3 f4 f5 f6 f7
+ const __m256i ef = _mm256_permute4x64_epi64(ef_16bit, 0xd8);
+ // g0 g1 g2 g3 g4 g5 g6 g7 h0 h1 h2 h3 h4 h5 h6 h7
+ const __m256i gh = _mm256_permute4x64_epi64(gh_16bit, 0xd8);
+
+ out[0] = _mm256_castsi256_si128(ab);
+ out[1] = _mm256_extractf128_si256(ab, 1);
+ out[2] = _mm256_castsi256_si128(cd);
+ out[3] = _mm256_extractf128_si256(cd, 1);
+ out[4] = _mm256_castsi256_si128(ef);
+ out[5] = _mm256_extractf128_si256(ef, 1);
+ out[6] = _mm256_castsi256_si128(gh);
+ out[7] = _mm256_extractf128_si256(gh, 1);
+}
+
+static INLINE void round_and_transpose_avx2(const __m128i *const in,
+ __m128i *const out, int bit,
+ int *lr_flip) {
+ __m256i buf_temp[4];
+ const __m256i scale = _mm256_set1_epi16(1 << (15 + bit));
+ int j = *lr_flip ? 7 : 0;
+ const int step = *lr_flip ? -1 : 1;
+
+ // 70 71 72 73 74 75 76 77 | 30 31 32 33 34 35 36 37
+ buf_temp[0] = _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]),
+ in[j + 4 * step], 1);
+ j += step;
+ // 60 61 62 63 64 65 66 67 | 20 21 22 23 24 25 26 27
+ buf_temp[1] = _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]),
+ in[j + 4 * step], 1);
+ j += step;
+ // 50 51 52 53 54 55 56 57 | 10 11 12 13 14 15 16 17
+ buf_temp[2] = _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]),
+ in[j + 4 * step], 1);
+ j += step;
+ // 40 41 42 43 44 45 46 47 | 00 01 02 03 04 05 06 07
+ buf_temp[3] = _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]),
+ in[j + 4 * step], 1);
+
+ // 70 71 72 73 74 75 76 77 | 30 31 32 33 34 35 36 37
+ buf_temp[0] = _mm256_mulhrs_epi16(buf_temp[0], scale);
+ // 60 61 62 63 64 65 66 67 | 20 21 22 23 24 25 26 27
+ buf_temp[1] = _mm256_mulhrs_epi16(buf_temp[1], scale);
+ // 50 51 52 53 54 55 56 57 | 10 11 12 13 14 15 16 17
+ buf_temp[2] = _mm256_mulhrs_epi16(buf_temp[2], scale);
+ // 40 41 42 43 44 45 46 47 | 00 01 02 03 04 05 06 07
+ buf_temp[3] = _mm256_mulhrs_epi16(buf_temp[3], scale);
+
+ // 70 60 71 61 72 62 73 63 | 30 20 31 21 32 22 33 23
+ const __m256i unpcklo0 = _mm256_unpacklo_epi16(buf_temp[0], buf_temp[1]);
+ // 74 64 75 65 76 66 77 67 | 34 24 35 25 36 26 37 27
+ const __m256i unpckhi0 = _mm256_unpackhi_epi16(buf_temp[0], buf_temp[1]);
+ // 50 40 51 41 52 42 53 43 | 10 00 11 01 12 02 13 03
+ const __m256i unpcklo1 = _mm256_unpacklo_epi16(buf_temp[2], buf_temp[3]);
+ // 54 44 55 45 56 46 57 47 | 14 04 15 05 16 06 17 07
+ const __m256i unpckhi1 = _mm256_unpackhi_epi16(buf_temp[2], buf_temp[3]);
+
+ // 70 60 50 40 71 61 51 41 | 30 20 10 00 31 21 11 01
+ const __m256i unpcklo00 = _mm256_unpacklo_epi32(unpcklo0, unpcklo1);
+ // 72 62 52 42 73 63 53 43 | 32 22 12 02 33 23 13 03
+ const __m256i unpckhi00 = _mm256_unpackhi_epi32(unpcklo0, unpcklo1);
+ // 74 64 54 44 75 65 55 45 | 34 24 14 04 35 25 15 05
+ const __m256i unpcklo01 = _mm256_unpacklo_epi32(unpckhi0, unpckhi1);
+ // 76 66 56 46 77 67 57 47 | 36 26 16 06 37 27 17 07
+ const __m256i unpckhi01 = _mm256_unpackhi_epi32(unpckhi0, unpckhi1);
+
+ // 70 60 50 40 30 20 10 00 | 71 61 51 41 31 21 11 01
+ const __m256i reg_00 = _mm256_permute4x64_epi64(unpcklo00, 0xd8);
+ // 72 62 52 42 32 22 12 02 | 73 63 53 43 33 23 13 03
+ const __m256i reg_01 = _mm256_permute4x64_epi64(unpckhi00, 0xd8);
+ // 74 64 54 44 34 24 14 04 | 75 65 55 45 35 25 15 05
+ const __m256i reg_10 = _mm256_permute4x64_epi64(unpcklo01, 0xd8);
+ // 76 66 56 46 36 26 16 06 | 77 67 57 47 37 27 17 07
+ const __m256i reg_11 = _mm256_permute4x64_epi64(unpckhi01, 0xd8);
+
+ // 70 60 50 40 30 20 10 00
+ out[0] = _mm256_castsi256_si128(reg_00);
+ // 71 61 51 41 31 21 11 01
+ out[1] = _mm256_extracti128_si256(reg_00, 1);
+ // 72 62 52 42 32 22 12 02
+ out[2] = _mm256_castsi256_si128(reg_01);
+ // 73 63 53 43 33 23 13 03
+ out[3] = _mm256_extracti128_si256(reg_01, 1);
+ // 74 64 54 44 34 24 14 04
+ out[4] = _mm256_castsi256_si128(reg_10);
+ // 75 65 55 45 35 25 15 05
+ out[5] = _mm256_extracti128_si256(reg_10, 1);
+ // 76 66 56 46 36 26 16 06
+ out[6] = _mm256_castsi256_si128(reg_11);
+ // 77 67 57 47 37 27 17 07
+ out[7] = _mm256_extracti128_si256(reg_11, 1);
+}
+
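
The scale constant 1 << (15 + bit) above folds the rounding right-shift into
_mm256_mulhrs_epi16, which computes (a * b + (1 << 14)) >> 15 per 16-bit lane;
with b = 1 << (15 + bit) and bit < 0 this equals (a + (1 << (-bit - 1))) >> -bit.
A scalar reference of that identity (hypothetical helper):

  static INLINE int16_t round_shift_scalar_ref(int16_t a, int bit) {
    assert(bit < 0);
    return (int16_t)((a + (1 << (-bit - 1))) >> -bit);
  }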
+static INLINE void round_shift_lowbd_write_buffer_avx2(__m128i *in, int bit,
+ uint8_t *output,
+ int stride, int flipud) {
+ __m256i in_256[4], v_256[4];
+ int j = flipud ? 7 : 0;
+ const int step = flipud ? -1 : 1;
+ const __m256i scale = _mm256_set1_epi16(1 << (15 + bit));
+ const __m256i zero = _mm256_setzero_si256();
+ // in[0], in[1]
+ in_256[0] =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]), in[j + step], 1);
+ j += 2 * step;
+ // in[2], in[3]
+ in_256[1] =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]), in[j + step], 1);
+ j += 2 * step;
+ // in[4], in[5]
+ in_256[2] =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]), in[j + step], 1);
+ j += 2 * step;
+ // in[6], in[7]
+ in_256[3] =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(in[j]), in[j + step], 1);
+
+ // i00 i01 i02 i03 i04 i05 i06 i07 i10 i11 i12 i13 i14 i15 i16 i17
+ in_256[0] = _mm256_mulhrs_epi16(in_256[0], scale);
+ // i20 i21 i22 i23 i24 i25 i26 i27 i30 i31 i32 i33 i34 i35 i36 i37
+ in_256[1] = _mm256_mulhrs_epi16(in_256[1], scale);
+ // i40 i41 i42 i43 i44 i45 i46 i47 i50 i51 i52 i53 i54 i55 i56 i57
+ in_256[2] = _mm256_mulhrs_epi16(in_256[2], scale);
+ // i60 i61 i62 i63 i64 i65 i66 i67 i70 i71 i72 i73 i74 i75 i76 i77
+ in_256[3] = _mm256_mulhrs_epi16(in_256[3], scale);
+
+ const __m128i v0 = _mm_loadl_epi64((__m128i const *)(output));
+ const __m128i v1 = _mm_loadl_epi64((__m128i const *)(output + stride));
+ const __m128i v2 = _mm_loadl_epi64((__m128i const *)(output + 2 * stride));
+ const __m128i v3 = _mm_loadl_epi64((__m128i const *)(output + 3 * stride));
+ const __m128i v4 = _mm_loadl_epi64((__m128i const *)(output + 4 * stride));
+ const __m128i v5 = _mm_loadl_epi64((__m128i const *)(output + 5 * stride));
+ const __m128i v6 = _mm_loadl_epi64((__m128i const *)(output + 6 * stride));
+ const __m128i v7 = _mm_loadl_epi64((__m128i const *)(output + 7 * stride));
+
+ v_256[0] = _mm256_inserti128_si256(_mm256_castsi128_si256(v0), v1, 1);
+ v_256[1] = _mm256_inserti128_si256(_mm256_castsi128_si256(v2), v3, 1);
+ v_256[2] = _mm256_inserti128_si256(_mm256_castsi128_si256(v4), v5, 1);
+ v_256[3] = _mm256_inserti128_si256(_mm256_castsi128_si256(v6), v7, 1);
+
+ const __m256i unpcklo0 = _mm256_unpacklo_epi8(v_256[0], zero);
+ const __m256i unpcklo1 = _mm256_unpacklo_epi8(v_256[1], zero);
+ const __m256i unpcklo2 = _mm256_unpacklo_epi8(v_256[2], zero);
+ const __m256i unpcklo3 = _mm256_unpacklo_epi8(v_256[3], zero);
+ // 00 01 10 11
+ const __m256i x0 = _mm256_adds_epi16(in_256[0], unpcklo0);
+ // 20 21 30 31
+ const __m256i x1 = _mm256_adds_epi16(in_256[1], unpcklo1);
+ // 40 41 50 51
+ const __m256i x2 = _mm256_adds_epi16(in_256[2], unpcklo2);
+ // 60 61 70 71
+ const __m256i x3 = _mm256_adds_epi16(in_256[3], unpcklo3);
+
+ // 00 01 20 21 10 11 30 31
+ const __m256i res_0123 = _mm256_packus_epi16(x0, x1);
+ // 40 41 60 61 50 51 70 71
+ const __m256i res_4567 = _mm256_packus_epi16(x2, x3);
+
+ // 00 01 20 21
+ const __m128i res_02 = _mm256_castsi256_si128(res_0123);
+ // 10 11 30 31
+ const __m128i res_13 = _mm256_extracti128_si256(res_0123, 1);
+ // 40 41 60 61
+ const __m128i res_46 = _mm256_castsi256_si128(res_4567);
+ // 50 51 70 71
+ const __m128i res_57 = _mm256_extracti128_si256(res_4567, 1);
+
+ // 00 01
+ _mm_storel_epi64((__m128i *)(output), res_02);
+ // 10 11
+ _mm_storel_epi64((__m128i *)(output + stride), res_13);
+ // 20 21
+ _mm_storel_epi64((__m128i *)(output + 2 * stride),
+ _mm_unpackhi_epi64(res_02, res_02));
+ // 30 31
+ _mm_storel_epi64((__m128i *)(output + 3 * stride),
+ _mm_unpackhi_epi64(res_13, res_13));
+ // 40 41
+ _mm_storel_epi64((__m128i *)(output + 4 * stride), res_46);
+ // 50 51
+ _mm_storel_epi64((__m128i *)(output + 5 * stride), res_57);
+ // 60 61
+ _mm_storel_epi64((__m128i *)(output + 6 * stride),
+ _mm_unpackhi_epi64(res_46, res_46));
+ // 70 71
+ _mm_storel_epi64((__m128i *)(output + 7 * stride),
+ _mm_unpackhi_epi64(res_57, res_57));
+}
+
+// The AVX2 implementation has an advantage when multiple operations are
+// combined.
+static INLINE void lowbd_inv_txfm2d_8x8_no_identity_avx2(
+ const int32_t *input, uint8_t *output, int stride, TX_TYPE tx_type,
+ TX_SIZE tx_size, int eob) {
+ __m128i buf1[8];
+ const int input_stride = 8;
+ const int8_t *shift = av1_inv_txfm_shift_ls[tx_size];
+ assert(hitx_1d_tab[tx_type] < 2);
+ assert(vitx_1d_tab[tx_type] < 2);
+ const transform_1d_ssse3 row_txfm =
+ lowbd_txfm_all_1d_zeros_8x8_arr[hitx_1d_tab[tx_type]][eob != 1];
+ const transform_1d_ssse3 col_txfm =
+ lowbd_txfm_all_1d_zeros_8x8_arr[vitx_1d_tab[tx_type]][eob != 1];
+
+ assert(col_txfm != NULL);
+ assert(row_txfm != NULL);
+ int ud_flip, lr_flip;
+ get_flip_cfg(tx_type, &ud_flip, &lr_flip);
+
+ __m128i buf0[8];
+ __m128i *buf0_cur = buf0;
+ load_buffer_avx2(input, input_stride, buf0_cur);
+ row_txfm(buf0, buf0);
+
+ assert(shift[0] < 0);
+ __m128i *_buf1 = buf1;
+ round_and_transpose_avx2(buf0, _buf1, shift[0], &lr_flip);
+ assert(shift[1] < 0);
+ col_txfm(buf1, buf1);
+ round_shift_lowbd_write_buffer_avx2(buf1, shift[1], output, stride, ud_flip);
+}
+
+// AVX2 implementation of the 8x8 inverse transform. It was observed that
+// coding an AVX2 path for tx_types with an identity transform in either
+// direction has no advantage.
+static void lowbd_inv_txfm2d_add_8x8_avx2(const int32_t *input, uint8_t *output,
+ int stride, TX_TYPE tx_type,
+ TX_SIZE tx_size, int eob) {
+ switch (tx_type) {
+ case IDTX:
+ av1_lowbd_inv_txfm2d_add_idtx_ssse3(input, output, stride, tx_size);
+
+ case V_DCT:
+ case V_ADST:
+ case V_FLIPADST:
+ av1_lowbd_inv_txfm2d_add_h_identity_ssse3(input, output, stride, tx_type,
+ tx_size, eob);
+ break;
+ case H_DCT:
+ case H_ADST:
+ case H_FLIPADST:
+ av1_lowbd_inv_txfm2d_add_v_identity_ssse3(input, output, stride, tx_type,
+ tx_size, eob);
+ break;
+ default:
+ lowbd_inv_txfm2d_8x8_no_identity_avx2(input, output, stride, tx_type,
+ tx_size, eob);
+ }
+}
+
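
For illustration, invoking the dispatcher above on a fully coded 8x8 DCT_DCT
block could look like this (variable names hypothetical; an eob of 64 means
all 64 coefficients may be nonzero):

  lowbd_inv_txfm2d_add_8x8_avx2(coeffs, dst, dst_stride, DCT_DCT, TX_8X8, 64);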
 // for 32x32, 32x64, 64x32, 64x64, 16x32, 32x16, 64x16, 16x64
static INLINE void lowbd_inv_txfm2d_add_universe_avx2(
const int32_t *input, uint8_t *output, int stride, TX_TYPE tx_type,
@@ -1931,7 +2211,6 @@
int eob) {
switch (tx_size) {
case TX_4X4:
- case TX_8X8:
case TX_4X8:
case TX_8X4:
case TX_8X16:
@@ -1943,6 +2222,10 @@
av1_lowbd_inv_txfm2d_add_ssse3(input, output, stride, tx_type, tx_size,
eob);
break;
+ case TX_8X8:
+ lowbd_inv_txfm2d_add_8x8_avx2(input, output, stride, tx_type, tx_size,
+ eob);
+ break;
case TX_16X16:
case TX_32X32:
case TX_64X64:
diff --git a/av1/common/x86/av1_inv_txfm_ssse3.c b/av1/common/x86/av1_inv_txfm_ssse3.c
index 738cc98..79a6064 100644
--- a/av1/common/x86/av1_inv_txfm_ssse3.c
+++ b/av1/common/x86/av1_inv_txfm_ssse3.c
@@ -76,7 +76,7 @@
btf_16_adds_subs_out_sse2(output[1], output[2], x[1], x[2]);
}
-static void idct8_low1_ssse3(const __m128i *input, __m128i *output) {
+void av1_idct8_low1_ssse3(const __m128i *input, __m128i *output) {
const int32_t *cospi = cospi_arr(INV_COS_BIT);
// stage 1
@@ -99,7 +99,7 @@
output[4] = x[0];
}
-static void idct8_sse2(const __m128i *input, __m128i *output) {
+void av1_idct8_sse2(const __m128i *input, __m128i *output) {
const int8_t cos_bit = INV_COS_BIT;
const int32_t *cospi = cospi_arr(INV_COS_BIT);
const __m128i __rounding = _mm_set1_epi32(1 << (INV_COS_BIT - 1));
@@ -1698,7 +1698,7 @@
}
}
-static void iadst8_low1_ssse3(const __m128i *input, __m128i *output) {
+void av1_iadst8_low1_ssse3(const __m128i *input, __m128i *output) {
const int8_t cos_bit = INV_COS_BIT;
const int32_t *cospi = cospi_arr(INV_COS_BIT);
const __m128i __zero = _mm_setzero_si128();
@@ -1744,7 +1744,7 @@
output[7] = _mm_subs_epi16(__zero, x[1]);
}
-static void iadst8_sse2(const __m128i *input, __m128i *output) {
+void av1_iadst8_sse2(const __m128i *input, __m128i *output) {
const int8_t cos_bit = INV_COS_BIT;
const int32_t *cospi = cospi_arr(INV_COS_BIT);
const __m128i __zero = _mm_setzero_si128();
@@ -2269,7 +2269,7 @@
static const transform_1d_ssse3
lowbd_txfm_all_1d_w8_arr[TX_SIZES][ITX_TYPES_1D] = {
{ idct4_sse2, iadst4_sse2, iidentity4_ssse3 },
- { idct8_sse2, iadst8_sse2, iidentity8_sse2 },
+ { av1_idct8_sse2, av1_iadst8_sse2, iidentity8_sse2 },
{ idct16_sse2, iadst16_sse2, iidentity16_ssse3 },
{ idct32_sse2, NULL, NULL },
{ idct64_low32_ssse3, NULL, NULL },
@@ -2284,8 +2284,8 @@
{ iadst4_sse2, iadst4_sse2, NULL, NULL },
{ iidentity4_ssse3, iidentity4_ssse3, NULL, NULL },
},
- { { idct8_low1_ssse3, idct8_sse2, NULL, NULL },
- { iadst8_low1_ssse3, iadst8_sse2, NULL, NULL },
+ { { av1_idct8_low1_ssse3, av1_idct8_sse2, NULL, NULL },
+ { av1_iadst8_low1_ssse3, av1_iadst8_sse2, NULL, NULL },
{ iidentity8_sse2, iidentity8_sse2, NULL, NULL } },
{
{ idct16_low1_ssse3, idct16_low8_ssse3, idct16_sse2, NULL },
@@ -2382,24 +2382,27 @@
}
}
-static INLINE void lowbd_inv_txfm2d_add_idtx_ssse3(const int32_t *input,
- uint8_t *output, int stride,
- TX_SIZE tx_size) {
+void av1_lowbd_inv_txfm2d_add_idtx_ssse3(const int32_t *input, uint8_t *output,
+ int stride, TX_SIZE tx_size) {
const int8_t *shift = av1_inv_txfm_shift_ls[tx_size];
const int txw_idx = get_txw_idx(tx_size);
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int col_max = AOMMIN(32, txfm_size_col);
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
- __m128i buf[32];
- for (int i = 0; i < (input_stride >> 3); ++i) {
- iidentity_row_8xn_ssse3(buf, input + 8 * i, input_stride, shift[0], row_max,
- txw_idx, rect_type);
- iidentity_col_8xn_ssse3(output + 8 * i, stride, buf, shift[1], row_max,
- txh_idx);
+ for (int i = 0; i < (col_max >> 3); ++i) {
+ for (int j = 0; j < (row_max >> 3); j++) {
+ __m128i buf[8];
+ iidentity_row_8xn_ssse3(buf, input + j * 8 + i * 8 * input_stride,
+ row_max, shift[0], 8, txw_idx, rect_type);
+ transpose_16bit_8x8(buf, buf);
+ iidentity_col_8xn_ssse3(output + i * 8 + j * 8 * stride, stride, buf,
+ shift[1], 8, txh_idx);
+ }
}
}
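
A worked example of the new 8x8 tiling: for a 32x16 IDTX block, col_max is 32,
row_max is 16 and input_stride is 16, so the tile at (i, j), with i in 0..3
and j in 0..1, is read from

  input + j * 8 + i * 8 * 16       // height-first source layout

transposed in registers, and written to

  output + i * 8 + j * 8 * stride  // row-major destination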
@@ -2424,8 +2427,7 @@
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
- load_buffer_32bit_to_16bit_w4(input, txfm_size_col, buf, txfm_size_row);
- transpose_16bit_4x4(buf, buf);
+ load_buffer_32bit_to_16bit_w4(input, txfm_size_row, buf, txfm_size_col);
row_txfm(buf, buf);
if (lr_flip) {
__m128i temp[4];
@@ -2481,9 +2483,9 @@
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div8 = txfm_size_col >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = ((eobx + 8) >> 3) << 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -2499,14 +2501,10 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
__m128i buf0[64];
- const int32_t *input_row = input + i * input_stride * 8;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- __m128i *buf0_cur = buf0 + j * 8;
- load_buffer_32bit_to_16bit(input_row + j * 8, input_stride, buf0_cur, 8);
- transpose_16bit_8x8(buf0_cur, buf0_cur);
- }
+ load_buffer_32bit_to_16bit(input + 8 * i, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_ssse3(buf0, buf0, input_stride); // rect special code
+ round_shift_ssse3(buf0, buf0, buf_size_nonzero_w); // rect special code
}
row_txfm(buf0, buf0);
round_shift_16bit_ssse3(buf0, txfm_size_col, shift[0]);
@@ -2540,9 +2538,10 @@
}
}
-static INLINE void lowbd_inv_txfm2d_add_h_identity_ssse3(
- const int32_t *input, uint8_t *output, int stride, TX_TYPE tx_type,
- TX_SIZE tx_size, int eob) {
+void av1_lowbd_inv_txfm2d_add_h_identity_ssse3(const int32_t *input,
+ uint8_t *output, int stride,
+ TX_TYPE tx_type, TX_SIZE tx_size,
+ int eob) {
const int8_t *shift = av1_inv_txfm_shift_ls[tx_size];
int eobx, eoby;
get_eobx_eoby_scan_h_identity(&eobx, &eoby, tx_size, eob);
@@ -2551,7 +2550,8 @@
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div8 = (eobx + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int buf_size_h_div8 = (eoby + 8) >> 3;
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eoby];
@@ -2565,8 +2565,13 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_w_div8; i++) {
__m128i buf0[64];
- iidentity_row_8xn_ssse3(buf0, input + 8 * i, input_stride, shift[0],
- eoby + 1, txw_idx, rect_type);
+ for (int j = 0; j < buf_size_h_div8; j++) {
+ __m128i *buf0_cur = buf0 + j * 8;
+ const int32_t *input_cur = input + i * 8 * input_stride + j * 8;
+ iidentity_row_8xn_ssse3(buf0_cur, input_cur, input_stride, shift[0], 8,
+ txw_idx, rect_type);
+ transpose_16bit_8x8(buf0_cur, buf0_cur);
+ }
col_txfm(buf0, buf0);
__m128i mshift = _mm_set1_epi16(1 << (15 + shift[1]));
int k = ud_flip ? (txfm_size_row - 1) : 0;
@@ -2582,9 +2587,10 @@
}
}
-static INLINE void lowbd_inv_txfm2d_add_v_identity_ssse3(
- const int32_t *input, uint8_t *output, int stride, TX_TYPE tx_type,
- TX_SIZE tx_size, int eob) {
+void av1_lowbd_inv_txfm2d_add_v_identity_ssse3(const int32_t *input,
+ uint8_t *output, int stride,
+ TX_TYPE tx_type, TX_SIZE tx_size,
+ int eob) {
__m128i buf1[64];
int eobx, eoby;
get_eobx_eoby_scan_v_identity(&eobx, &eoby, tx_size, eob);
@@ -2594,8 +2600,9 @@
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div8 = txfm_size_col >> 3;
+ const int buf_size_nonzero_w = ((eobx + 8) >> 3) << 3;
const int buf_size_h_div8 = (eoby + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -2607,14 +2614,10 @@
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
for (int i = 0; i < buf_size_h_div8; i++) {
__m128i buf0[64];
- const int32_t *input_row = input + i * input_stride * 8;
- for (int j = 0; j < AOMMIN(4, buf_size_w_div8); ++j) {
- __m128i *buf0_cur = buf0 + j * 8;
- load_buffer_32bit_to_16bit(input_row + j * 8, input_stride, buf0_cur, 8);
- transpose_16bit_8x8(buf0_cur, buf0_cur);
- }
+ load_buffer_32bit_to_16bit(input + i * 8, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_ssse3(buf0, buf0, input_stride); // rect special code
+ round_shift_ssse3(buf0, buf0, buf_size_nonzero_w); // rect special code
}
row_txfm(buf0, buf0);
round_shift_16bit_ssse3(buf0, txfm_size_col, shift[0]);
@@ -2648,19 +2651,19 @@
tx_size, eob);
break;
case IDTX:
- lowbd_inv_txfm2d_add_idtx_ssse3(input, output, stride, tx_size);
+ av1_lowbd_inv_txfm2d_add_idtx_ssse3(input, output, stride, tx_size);
break;
case V_DCT:
case V_ADST:
case V_FLIPADST:
- lowbd_inv_txfm2d_add_h_identity_ssse3(input, output, stride, tx_type,
- tx_size, eob);
+ av1_lowbd_inv_txfm2d_add_h_identity_ssse3(input, output, stride, tx_type,
+ tx_size, eob);
break;
case H_DCT:
case H_ADST:
case H_FLIPADST:
- lowbd_inv_txfm2d_add_v_identity_ssse3(input, output, stride, tx_type,
- tx_size, eob);
+ av1_lowbd_inv_txfm2d_add_v_identity_ssse3(input, output, stride, tx_type,
+ tx_size, eob);
break;
default:
lowbd_inv_txfm2d_add_no_identity_ssse3(input, output, stride, tx_type,
@@ -2690,8 +2693,7 @@
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
- load_buffer_32bit_to_16bit_w4(input, txfm_size_col, buf, txfm_size_row);
- transpose_16bit_4x8(buf, buf);
+ load_buffer_32bit_to_16bit(input, txfm_size_row, buf, txfm_size_col);
round_shift_ssse3(buf, buf, txfm_size_col); // rect special code
row_txfm(buf, buf);
// round_shift_16bit_ssse3(buf, txfm_size_col, shift[0]);// shift[0] is 0
@@ -2728,8 +2730,7 @@
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
- load_buffer_32bit_to_16bit(input, txfm_size_col, buf, txfm_size_row);
- transpose_16bit_8x4(buf, buf);
+ load_buffer_32bit_to_16bit_w4(input, txfm_size_row, buf, txfm_size_col);
round_shift_ssse3(buf, buf, txfm_size_col); // rect special code
row_txfm(buf, buf);
// round_shift_16bit_ssse3(buf, txfm_size_col, shift[0]); // shift[0] is 0
@@ -2769,11 +2770,10 @@
const int row_one_loop = 8;
for (int i = 0; i < 2; ++i) {
- const int32_t *input_cur = input + i * txfm_size_col * row_one_loop;
+ const int32_t *input_cur = input + i * row_one_loop;
__m128i *buf_cur = buf + i * row_one_loop;
- load_buffer_32bit_to_16bit_w4(input_cur, txfm_size_col, buf_cur,
- row_one_loop);
- transpose_16bit_4x8(buf_cur, buf_cur);
+ load_buffer_32bit_to_16bit(input_cur, txfm_size_row, buf_cur,
+ txfm_size_col);
if (row_txfm == iidentity4_ssse3) {
const __m128i scale = pair_set_epi16(NewSqrt2, 3 << (NewSqrt2Bits - 1));
const __m128i ones = _mm_set1_epi16(1);
@@ -2826,13 +2826,7 @@
int ud_flip, lr_flip;
get_flip_cfg(tx_type, &ud_flip, &lr_flip);
const int row_one_loop = 8;
- for (int i = 0; i < buf_size_w_div8; ++i) {
- const int32_t *input_cur = input + i * row_one_loop;
- __m128i *buf_cur = buf + i * row_one_loop;
- load_buffer_32bit_to_16bit(input_cur, txfm_size_col, buf_cur,
- txfm_size_row);
- transpose_16bit_8x4(buf_cur, buf_cur);
- }
+ load_buffer_32bit_to_16bit_w4(input, txfm_size_row, buf, txfm_size_col);
if (row_txfm == iidentity16_ssse3) {
const __m128i scale = pair_set_epi16(2 * NewSqrt2, 3 << (NewSqrt2Bits - 1));
const __m128i ones = _mm_set1_epi16(1);
diff --git a/av1/common/x86/av1_inv_txfm_ssse3.h b/av1/common/x86/av1_inv_txfm_ssse3.h
index b85bc9d..1873d01 100644
--- a/av1/common/x86/av1_inv_txfm_ssse3.h
+++ b/av1/common/x86/av1_inv_txfm_ssse3.h
@@ -19,7 +19,6 @@
#include "aom/aom_integer.h"
#include "aom_dsp/x86/transpose_sse2.h"
-#include "aom_dsp/x86/txfm_common_sse2.h"
#ifdef __cplusplus
extern "C" {
@@ -215,7 +214,7 @@
eob -= 1;
const int txfm_size_row = tx_size_high[tx_size];
const int eoby_max = AOMMIN(32, txfm_size_row) - 1;
- *eobx = eob / (eoby_max + 1);
+ *eobx = eob_fill[eob / (eoby_max + 1)];
*eoby = (eob >= eoby_max) ? eoby_max : eob_fill[eob];
}
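The eobx fix routes the quotient through eob_fill, which (as its existing use for eoby on the next line suggests) rounds a coefficient index up to the end of its zero-fill group. Previously a partially filled group could undercount the nonzero width:

    // Sketch, assuming eob_fill rounds up within a group (e.g. 3 -> 7):
    int eobx_old = eob / (eoby_max + 1);            // may truncate the group
    int eobx_new = eob_fill[eob / (eoby_max + 1)];  // covers the whole group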
@@ -224,6 +223,23 @@
void av1_lowbd_inv_txfm2d_add_ssse3(const int32_t *input, uint8_t *output,
int stride, TX_TYPE tx_type,
TX_SIZE tx_size, int eob);
+
+void av1_lowbd_inv_txfm2d_add_idtx_ssse3(const int32_t *input, uint8_t *output,
+ int stride, TX_SIZE tx_size);
+
+void av1_lowbd_inv_txfm2d_add_h_identity_ssse3(const int32_t *input,
+ uint8_t *output, int stride,
+ TX_TYPE tx_type, TX_SIZE tx_size,
+ int eob);
+void av1_lowbd_inv_txfm2d_add_v_identity_ssse3(const int32_t *input,
+ uint8_t *output, int stride,
+ TX_TYPE tx_type, TX_SIZE tx_size,
+ int eob);
+
+void av1_iadst8_low1_ssse3(const __m128i *input, __m128i *output);
+
+void av1_idct8_low1_ssse3(const __m128i *input, __m128i *output);
+
#ifdef __cplusplus
} // extern "C"
#endif
diff --git a/av1/common/x86/av1_txfm_sse2.h b/av1/common/x86/av1_txfm_sse2.h
index b67bf54..129721c 100644
--- a/av1/common/x86/av1_txfm_sse2.h
+++ b/av1/common/x86/av1_txfm_sse2.h
@@ -307,6 +307,10 @@
typedef void (*transform_1d_sse2)(const __m128i *input, __m128i *output,
int8_t cos_bit);
+void av1_iadst8_sse2(const __m128i *input, __m128i *output);
+
+void av1_idct8_sse2(const __m128i *input, __m128i *output);
+
typedef struct {
transform_1d_sse2 col, row; // vertical and horizontal
} transform_2d_sse2;
diff --git a/av1/common/x86/av1_txfm_sse4.c b/av1/common/x86/av1_txfm_sse4.c
index 65ccd19..1894efd 100644
--- a/av1/common/x86/av1_txfm_sse4.c
+++ b/av1/common/x86/av1_txfm_sse4.c
@@ -14,6 +14,7 @@
#include "av1/common/av1_txfm.h"
#include "av1/common/x86/av1_txfm_sse4.h"
+// This function assumes `arr` is 16-byte aligned.
void av1_round_shift_array_sse4_1(int32_t *arr, int size, int bit) {
__m128i *const vec = (__m128i *)arr;
const int vec_size = size >> 2;
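The new comment documents a real constraint: casting arr to __m128i * and dereferencing it compiles to aligned 16-byte accesses, so an unaligned buffer would fault. A conforming call might look like this (a sketch; aom_memalign/aom_free are libaom's allocators):

    #include "aom_mem/aom_mem.h"

    void av1_round_shift_array_sse4_1(int32_t *arr, int size, int bit);

    static void round_shift_example(void) {
      int32_t *arr = (int32_t *)aom_memalign(16, 64 * sizeof(*arr));
      if (!arr) return;
      // ... fill arr with transform output ...
      av1_round_shift_array_sse4_1(arr, 64, 2);  // consumed 4 lanes at a time
      aom_free(arr);
    }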
diff --git a/av1/common/x86/av1_txfm_sse4.h b/av1/common/x86/av1_txfm_sse4.h
index 6cad821..387dfd6 100644
--- a/av1/common/x86/av1_txfm_sse4.h
+++ b/av1/common/x86/av1_txfm_sse4.h
@@ -25,7 +25,7 @@
return _mm_srai_epi32(tmp, bit);
}
-static INLINE void av1_round_shift_array_32_sse4_1(__m128i *input,
+static INLINE void av1_round_shift_array_32_sse4_1(const __m128i *input,
__m128i *output,
const int size,
const int bit) {
@@ -42,7 +42,7 @@
}
}
-static INLINE void av1_round_shift_rect_array_32_sse4_1(__m128i *input,
+static INLINE void av1_round_shift_rect_array_32_sse4_1(const __m128i *input,
__m128i *output,
const int size,
const int bit,
diff --git a/av1/common/x86/convolve_avx2.c b/av1/common/x86/convolve_avx2.c
index 30de982..3862bbe 100644
--- a/av1/common/x86/convolve_avx2.c
+++ b/av1/common/x86/convolve_avx2.c
@@ -714,32 +714,32 @@
(__m128i *)(&src_ptr[i * src_stride + src_stride]))),
0x20);
// row0 0..7 row1 0..7
- const __m256i s_16l = _mm256_unpacklo_epi8(data, v_zero);
+ const __m256i s_16lo = _mm256_unpacklo_epi8(data, v_zero);
// row0 8..F row1 8..F
- const __m256i s_16h = _mm256_unpackhi_epi8(data, v_zero);
+ const __m256i s_16hi = _mm256_unpackhi_epi8(data, v_zero);
// row0 00 00 01 01 .. 03 03 row1 00 00 01 01 .. 03 03
- const __m256i s_ll = _mm256_unpacklo_epi16(s_16l, s_16l);
+ const __m256i s_lolo = _mm256_unpacklo_epi16(s_16lo, s_16lo);
// row0 04 04 .. 07 07 row1 04 04 .. 07 07
- const __m256i s_lh = _mm256_unpackhi_epi16(s_16l, s_16l);
+ const __m256i s_lohi = _mm256_unpackhi_epi16(s_16lo, s_16lo);
// row0 08 08 09 09 .. 0B 0B row1 08 08 09 09 .. 0B 0B
- const __m256i s_hl = _mm256_unpacklo_epi16(s_16h, s_16h);
+ const __m256i s_hilo = _mm256_unpacklo_epi16(s_16hi, s_16hi);
// row0 0C 0C .. 0F 0F row1 0C 0C .. 0F 0F
- const __m256i s_hh = _mm256_unpackhi_epi16(s_16h, s_16h);
+ const __m256i s_hihi = _mm256_unpackhi_epi16(s_16hi, s_16hi);
// 00 01 01 02 02 03 03 04 10 11 11 12 12 13 13 14
- s[0] = _mm256_alignr_epi8(s_lh, s_ll, 2);
+ s[0] = _mm256_alignr_epi8(s_lohi, s_lolo, 2);
// 02 03 03 04 04 05 05 06 12 13 13 14 14 15 15 16
- s[1] = _mm256_alignr_epi8(s_lh, s_ll, 10);
+ s[1] = _mm256_alignr_epi8(s_lohi, s_lolo, 10);
// 04 05 05 06 06 07 07 08 14 15 15 16 16 17 17 18
- s[2] = _mm256_alignr_epi8(s_hl, s_lh, 2);
+ s[2] = _mm256_alignr_epi8(s_hilo, s_lohi, 2);
// 06 07 07 08 08 09 09 0A 16 17 17 18 18 19 19 1A
- s[3] = _mm256_alignr_epi8(s_hl, s_lh, 10);
+ s[3] = _mm256_alignr_epi8(s_hilo, s_lohi, 10);
// 08 09 09 0A 0A 0B 0B 0C 18 19 19 1A 1A 1B 1B 1C
- s[4] = _mm256_alignr_epi8(s_hh, s_hl, 2);
+ s[4] = _mm256_alignr_epi8(s_hihi, s_hilo, 2);
// 0A 0B 0B 0C 0C 0D 0D 0E 1A 1B 1B 1C 1C 1D 1D 1E
- s[5] = _mm256_alignr_epi8(s_hh, s_hl, 10);
+ s[5] = _mm256_alignr_epi8(s_hihi, s_hilo, 10);
const __m256i res_lo = convolve_12taps(s, coeffs);
@@ -784,26 +784,26 @@
(__m128i *)(&src_ptr[i * src_stride + j + 4]))),
0x20);
// row0 0..7 4..B
- const __m256i s_16l = _mm256_unpacklo_epi8(data, v_zero);
+ const __m256i s_16lo = _mm256_unpacklo_epi8(data, v_zero);
// row0 8..F C..13
- const __m256i s_16h = _mm256_unpackhi_epi8(data, v_zero);
+ const __m256i s_16hi = _mm256_unpackhi_epi8(data, v_zero);
// row0 00 00 01 01 .. 03 03 04 04 05 05 .. 07 07
- const __m256i s_ll = _mm256_unpacklo_epi16(s_16l, s_16l);
+ const __m256i s_lolo = _mm256_unpacklo_epi16(s_16lo, s_16lo);
// row0 04 04 .. 07 07 08 08 .. 0B 0B
- const __m256i s_lh = _mm256_unpackhi_epi16(s_16l, s_16l);
+ const __m256i s_lohi = _mm256_unpackhi_epi16(s_16lo, s_16lo);
// row0 08 08 09 09 .. 0B 0B 0C 0C 0D 0D .. 0F 0F
- const __m256i s_hl = _mm256_unpacklo_epi16(s_16h, s_16h);
+ const __m256i s_hilo = _mm256_unpacklo_epi16(s_16hi, s_16hi);
// row0 0C 0C 0D 0D .. 0F 0F 10 10 11 11 .. 13 13
- const __m256i s_hh = _mm256_unpackhi_epi16(s_16h, s_16h);
+ const __m256i s_hihi = _mm256_unpackhi_epi16(s_16hi, s_16hi);
- s[0] = _mm256_alignr_epi8(s_lh, s_ll, 2);
- s[1] = _mm256_alignr_epi8(s_lh, s_ll, 10);
- s[2] = _mm256_alignr_epi8(s_hl, s_lh, 2);
- s[3] = _mm256_alignr_epi8(s_hl, s_lh, 10);
- s[4] = _mm256_alignr_epi8(s_hh, s_hl, 2);
- s[5] = _mm256_alignr_epi8(s_hh, s_hl, 10);
+ s[0] = _mm256_alignr_epi8(s_lohi, s_lolo, 2);
+ s[1] = _mm256_alignr_epi8(s_lohi, s_lolo, 10);
+ s[2] = _mm256_alignr_epi8(s_hilo, s_lohi, 2);
+ s[3] = _mm256_alignr_epi8(s_hilo, s_lohi, 10);
+ s[4] = _mm256_alignr_epi8(s_hihi, s_hilo, 2);
+ s[5] = _mm256_alignr_epi8(s_hihi, s_hilo, 10);
const __m256i res_lo = convolve_12taps(s, coeffs);
diff --git a/av1/common/x86/highbd_inv_txfm_avx2.c b/av1/common/x86/highbd_inv_txfm_avx2.c
index 0798c6d..cbfe561 100644
--- a/av1/common/x86/highbd_inv_txfm_avx2.c
+++ b/av1/common/x86/highbd_inv_txfm_avx2.c
@@ -231,11 +231,10 @@
out[7] = _mm256_permute2f128_si256(x0, x1, 0x31);
}
-static void load_buffer_32x32(const int32_t *coeff, __m256i *in,
- int input_stiride, int size) {
- int i;
- for (i = 0; i < size; ++i) {
- in[i] = _mm256_loadu_si256((const __m256i *)(coeff + i * input_stiride));
+static INLINE void load_buffer_32bit_input(const int32_t *in, int stride,
+ __m256i *out, int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ out[i] = _mm256_loadu_si256((const __m256i *)(in + i * stride));
}
}
@@ -4119,9 +4118,9 @@
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
const int buf_size_w_div8 = txfm_size_col >> 3;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
const int fun_idx_y = lowbd_txfm_all_1d_zeros_idx[eoby];
@@ -4138,16 +4137,11 @@
// 1st stage: column transform
for (int i = 0; i < buf_size_nonzero_h_div8; i++) {
__m256i buf0[64];
- const int32_t *input_row = input + i * input_stride * 8;
- for (int j = 0; j < buf_size_nonzero_w_div8; ++j) {
- __m256i *buf0_cur = buf0 + j * 8;
- load_buffer_32x32(input_row + j * 8, buf0_cur, input_stride, 8);
-
- transpose_8x8_avx2(&buf0_cur[0], &buf0_cur[0]);
- }
+ load_buffer_32bit_input(input + i * 8, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- round_shift_rect_array_32_avx2(buf0, buf0, buf_size_nonzero_w_div8 << 3,
- 0, NewInvSqrt2);
+ round_shift_rect_array_32_avx2(buf0, buf0, buf_size_nonzero_w, 0,
+ NewInvSqrt2);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
diff --git a/av1/common/x86/highbd_inv_txfm_sse4.c b/av1/common/x86/highbd_inv_txfm_sse4.c
index de3af3a..4ff6a90 100644
--- a/av1/common/x86/highbd_inv_txfm_sse4.c
+++ b/av1/common/x86/highbd_inv_txfm_sse4.c
@@ -161,8 +161,6 @@
op[3] = _mm_srai_epi32(op[3], UNIT_QUANT_SHIFT);
for (int i = 0; i < 2; ++i) {
- transpose_32bit_4x4(op, op);
-
__m128i a1 = op[0];
__m128i c1 = op[1];
__m128i d1 = op[2];
@@ -180,6 +178,9 @@
op[1] = b1;
op[2] = c1;
op[3] = d1;
+ if (i == 0) {
+ transpose_32bit_4x4(op, op);
+ }
}
// Convert to int16_t. The C code checks that we are in range.
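This moves the 4x4 inverse WHT onto the same one-transpose-between-passes pattern used elsewhere in the patch: instead of transposing at the top of both iterations, the butterflies run first and a single transpose separates the two 1-D passes. In outline:

    // Sketch of the new pass structure:
    for (int i = 0; i < 2; ++i) {
      // ... 1-D butterflies on op[0..3] ...
      if (i == 0) transpose_32bit_4x4(op, op);  // the only transpose
    }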
@@ -468,15 +469,10 @@
// Stage 0
// Stage 1
// Stage 2
- v0 = _mm_unpacklo_epi32(in[0], in[1]);
- v1 = _mm_unpackhi_epi32(in[0], in[1]);
- v2 = _mm_unpacklo_epi32(in[2], in[3]);
- v3 = _mm_unpackhi_epi32(in[2], in[3]);
-
- u0 = _mm_unpacklo_epi64(v0, v2);
- u1 = _mm_unpackhi_epi64(v0, v2);
- u2 = _mm_unpacklo_epi64(v1, v3);
- u3 = _mm_unpackhi_epi64(v1, v3);
+ u0 = in[0];
+ u1 = in[1];
+ u2 = in[2];
+ u3 = in[3];
x = _mm_mullo_epi32(u0, cospi32);
y = _mm_mullo_epi32(u2, cospi32);
@@ -529,19 +525,13 @@
__m128i s0, s1, s2, s3, s4, s5, s6, s7;
__m128i x0, x1, x2, x3;
__m128i u0, u1, u2, u3;
- __m128i v0, v1, v2, v3;
__m128i u0_low, u1_low, u2_low, u3_low;
__m128i u0_high, u1_high, u2_high, u3_high;
- v0 = _mm_unpacklo_epi32(in[0], in[1]);
- v1 = _mm_unpackhi_epi32(in[0], in[1]);
- v2 = _mm_unpacklo_epi32(in[2], in[3]);
- v3 = _mm_unpackhi_epi32(in[2], in[3]);
-
- x0 = _mm_unpacklo_epi64(v0, v2);
- x1 = _mm_unpackhi_epi64(v0, v2);
- x2 = _mm_unpacklo_epi64(v1, v3);
- x3 = _mm_unpackhi_epi64(v1, v3);
+ x0 = in[0];
+ x1 = in[1];
+ x2 = in[2];
+ x3 = in[3];
s0 = _mm_mullo_epi32(x0, sinpi1);
s1 = _mm_mullo_epi32(x0, sinpi2);
@@ -697,7 +687,6 @@
static void iidentity4_sse4_1(__m128i *in, __m128i *out, int bit, int do_cols,
int bd, int out_shift) {
(void)bit;
- __m128i v[4];
__m128i zero = _mm_setzero_si128();
__m128i fact = _mm_set1_epi32(NewSqrt2);
__m128i offset = _mm_set1_epi32(1 << (NewSqrt2Bits - 1));
@@ -728,17 +717,6 @@
round_shift_4x4(out, out_shift);
highbd_clamp_epi32_sse4_1(out, out, &clamp_lo, &clamp_hi, 4);
}
-
- // Transpose for 4x4
- v[0] = _mm_unpacklo_epi32(out[0], out[1]);
- v[1] = _mm_unpackhi_epi32(out[0], out[1]);
- v[2] = _mm_unpacklo_epi32(out[2], out[3]);
- v[3] = _mm_unpackhi_epi32(out[2], out[3]);
-
- out[0] = _mm_unpacklo_epi64(v[0], v[2]);
- out[1] = _mm_unpackhi_epi64(v[0], v[2]);
- out[2] = _mm_unpacklo_epi64(v[1], v[3]);
- out[3] = _mm_unpackhi_epi64(v[1], v[3]);
}
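With iidentity4_sse4_1 (and the 4-point idct/iadst hunks above) no longer transposing their own output, the 2-D driver below inserts exactly one transpose_32bit_4x4 between the row and column passes. Each case in the switch now follows this shape, where row_txfm/col_txfm stand in for the per-type kernels:

    // Sketch of the restructured 4x4 flow:
    load_buffer_4x4(input, in);
    row_txfm(in, in, INV_COS_BIT, 0, bd, 0);  // no internal transpose
    transpose_32bit_4x4(in, in);              // the only transpose
    col_txfm(in, in, INV_COS_BIT, 1, bd, 0);
    write_buffer_4x4(in, output, stride, lr_flip, ud_flip, -shift[1], bd);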
void av1_inv_txfm2d_add_4x4_sse4_1(const int32_t *input, uint16_t *output,
int stride, TX_TYPE tx_type, int bd) {
@@ -749,96 +727,112 @@
case DCT_DCT:
load_buffer_4x4(input, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_DCT:
load_buffer_4x4(input, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case DCT_ADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_ADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case FLIPADST_DCT:
load_buffer_4x4(input, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case DCT_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 1, -shift[1], bd);
break;
case ADST_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_ADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case IDTX:
load_buffer_4x4(input, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_DCT:
load_buffer_4x4(input, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case H_DCT:
load_buffer_4x4(input, in);
idct4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_ADST:
load_buffer_4x4(input, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case H_ADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 0, -shift[1], bd);
break;
case V_FLIPADST:
load_buffer_4x4(input, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 0, 1, -shift[1], bd);
break;
case H_FLIPADST:
load_buffer_4x4(input, in);
iadst4x4_sse4_1(in, in, INV_COS_BIT, 0, bd, 0);
+ transpose_32bit_4x4(in, in);
iidentity4_sse4_1(in, in, INV_COS_BIT, 1, bd, 0);
write_buffer_4x4(in, output, stride, 1, 0, -shift[1], bd);
break;
@@ -1408,75 +1402,66 @@
switch (tx_type) {
case DCT_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case DCT_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case ADST_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 0, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);
break;
case FLIPADST_DCT:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 1, -shift[1], bd);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 1, -shift[1], bd);
break;
case DCT_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- idct8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 0, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ idct8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 0, -shift[1], bd);
break;
case ADST_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 0, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 0, -shift[1], bd);
break;
case FLIPADST_FLIPADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 1, 1, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 1, 1, -shift[1], bd);
break;
case FLIPADST_ADST:
load_buffer_8x8(input, in);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 0, bd, -shift[0]);
- transpose_8x8(in, out);
- iadst8x8_sse4_1(out, in, INV_COS_BIT, 1, bd, 0);
- write_buffer_8x8(in, output, stride, 0, 1, -shift[1], bd);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);
+ transpose_8x8(out, in);
+ iadst8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);
+ write_buffer_8x8(out, output, stride, 0, 1, -shift[1], bd);
break;
default: assert(0);
}
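The 8x8 cases get the same treatment via buffer ping-pong: the old code transposed before each pass (two transpose_8x8 calls per block), while the rewritten flow keeps one:

    // Rewritten 8x8 dataflow (DCT_DCT shown; the other pairs match in shape):
    load_buffer_8x8(input, in);
    idct8x8_sse4_1(in, out, INV_COS_BIT, 0, bd, -shift[0]);  // rows -> out
    transpose_8x8(out, in);                                  // one transpose
    idct8x8_sse4_1(in, out, INV_COS_BIT, 1, bd, 0);          // cols -> out
    write_buffer_8x8(out, output, stride, 0, 0, -shift[1], bd);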
@@ -5251,9 +5236,11 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
- const int buf_size_w_div4 = input_stride >> 2;
+ const int buf_size_w = AOMMIN(32, txfm_size_col);
+ const int buf_size_w_div4 = buf_size_w >> 2;
const int buf_size_h_div8 = (eoby + 8) >> 3;
+ const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eoby];
const transform_1d_sse4_1 row_txfm =
@@ -5265,13 +5252,9 @@
for (int i = 0; i < (buf_size_h_div8 << 1); ++i) {
__m128i buf0[16];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < buf_size_w_div4; ++j) {
- __m128i *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0, buf_size_w);
if (rect_type == 1 || rect_type == -1) {
- av1_round_shift_rect_array_32_sse4_1(buf0, buf0, input_stride, 0,
+ av1_round_shift_rect_array_32_sse4_1(buf0, buf0, buf_size_w, 0,
NewInvSqrt2);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
@@ -5279,10 +5262,13 @@
__m128i *_buf1 = buf1 + i * 4;
for (int j = 0; j < buf_size_w_div4; ++j) {
- _buf1[j * txfm_size_row + 0] = buf0[j * 4 + 0];
- _buf1[j * txfm_size_row + 1] = buf0[j * 4 + 1];
- _buf1[j * txfm_size_row + 2] = buf0[j * 4 + 2];
- _buf1[j * txfm_size_row + 3] = buf0[j * 4 + 3];
+ __m128i *buf0_cur = buf0 + j * 4;
+ TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
+ buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
+ _buf1[j * txfm_size_row + 0] = buf0_cur[0];
+ _buf1[j * txfm_size_row + 1] = buf0_cur[1];
+ _buf1[j * txfm_size_row + 2] = buf0_cur[2];
+ _buf1[j * txfm_size_row + 3] = buf0_cur[3];
}
}
for (int i = 0; i < buf_size_w_div4; i++) {
@@ -5313,10 +5299,11 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
- const int buf_size_w_div8 = input_stride >> 2;
+ const int buf_size_w_div4 = AOMMIN(32, txfm_size_col) >> 2;
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_nonzero_w = buf_size_nonzero_w_div8 << 3;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx = lowbd_txfm_all_1d_zeros_idx[eobx];
const transform_1d_sse4_1 row_txfm =
@@ -5328,32 +5315,26 @@
for (int i = 0; i < (row_max >> 2); ++i) {
__m128i buf0[16];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < (buf_size_nonzero_w_div8 << 1); ++j) {
- __m128i *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
-
- TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
- buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- av1_round_shift_rect_array_32_sse4_1(
- buf0, buf0, (buf_size_nonzero_w_div8 << 3), 0, NewInvSqrt2);
+ av1_round_shift_rect_array_32_sse4_1(buf0, buf0, buf_size_nonzero_w, 0,
+ NewInvSqrt2);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
__m128i *_buf1 = buf1 + i * 4;
if (lr_flip) {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(buf0[4 * j + 3], buf0[4 * j + 2], buf0[4 * j + 1],
buf0[4 * j],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 0],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 1],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 2],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 3]);
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 0],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 1],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 2],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 3]);
}
} else {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(
buf0[j * 4 + 0], buf0[j * 4 + 1], buf0[j * 4 + 2], buf0[j * 4 + 3],
_buf1[j * txfm_size_row + 0], _buf1[j * txfm_size_row + 1],
@@ -5361,7 +5342,7 @@
}
}
}
- for (int i = 0; i < buf_size_w_div8; i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
@@ -5390,8 +5371,10 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int input_stride = AOMMIN(32, txfm_size_col);
const int row_max = AOMMIN(32, txfm_size_row);
+ const int input_stride = row_max;
+ const int buf_size_w = AOMMIN(32, txfm_size_col);
+ const int buf_size_w_div4 = buf_size_w >> 2;
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const transform_1d_sse4_1 row_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
@@ -5400,26 +5383,25 @@
for (int i = 0; i < (row_max >> 2); ++i) {
__m128i buf0[32];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < (input_stride >> 2); ++j) {
- __m128i *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0, buf_size_w);
if (rect_type == 1 || rect_type == -1) {
- av1_round_shift_rect_array_32_sse4_1(buf0, buf0, input_stride, 0,
+ av1_round_shift_rect_array_32_sse4_1(buf0, buf0, buf_size_w, 0,
NewInvSqrt2);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
__m128i *_buf1 = buf1 + i * 4;
- for (int j = 0; j < (input_stride >> 2); ++j) {
- _buf1[j * txfm_size_row + 0] = buf0[j * 4 + 0];
- _buf1[j * txfm_size_row + 1] = buf0[j * 4 + 1];
- _buf1[j * txfm_size_row + 2] = buf0[j * 4 + 2];
- _buf1[j * txfm_size_row + 3] = buf0[j * 4 + 3];
+ for (int j = 0; j < buf_size_w_div4; ++j) {
+ __m128i *buf0_cur = buf0 + j * 4;
+ TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
+ buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
+ _buf1[j * txfm_size_row + 0] = buf0_cur[0];
+ _buf1[j * txfm_size_row + 1] = buf0_cur[1];
+ _buf1[j * txfm_size_row + 2] = buf0_cur[2];
+ _buf1[j * txfm_size_row + 3] = buf0_cur[3];
}
}
- for (int i = 0; i < (input_stride >> 2); i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
@@ -5450,10 +5432,10 @@
const int txh_idx = get_txh_idx(tx_size);
const int txfm_size_col = tx_size_wide[tx_size];
const int txfm_size_row = tx_size_high[tx_size];
- const int buf_size_w_div8 = txfm_size_col >> 2;
- const int buf_size_nonzero_w_div8 = (eobx + 8) >> 3;
+ const int buf_size_w_div4 = txfm_size_col >> 2;
+ const int buf_size_nonzero_w = (eobx + 8) >> 3 << 3;
const int buf_size_nonzero_h_div8 = (eoby + 8) >> 3;
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
const int rect_type = get_rect_tx_log_ratio(txfm_size_col, txfm_size_row);
const int fun_idx_x = lowbd_txfm_all_1d_zeros_idx[eobx];
@@ -5471,32 +5453,26 @@
// 1st stage: column transform
for (int i = 0; i < buf_size_nonzero_h_div8 << 1; i++) {
__m128i buf0[64];
- const int32_t *input_row = input + i * input_stride * 4;
- for (int j = 0; j < buf_size_nonzero_w_div8 << 1; ++j) {
- __m128i *buf0_cur = buf0 + j * 4;
- load_buffer_32bit_input(input_row + j * 4, input_stride, buf0_cur, 4);
-
- TRANSPOSE_4X4(buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3],
- buf0_cur[0], buf0_cur[1], buf0_cur[2], buf0_cur[3]);
- }
+ load_buffer_32bit_input(input + i * 4, input_stride, buf0,
+ buf_size_nonzero_w);
if (rect_type == 1 || rect_type == -1) {
- av1_round_shift_rect_array_32_sse4_1(
- buf0, buf0, buf_size_nonzero_w_div8 << 3, 0, NewInvSqrt2);
+ av1_round_shift_rect_array_32_sse4_1(buf0, buf0, buf_size_nonzero_w, 0,
+ NewInvSqrt2);
}
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
__m128i *_buf1 = buf1 + i * 4;
if (lr_flip) {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(buf0[4 * j + 3], buf0[4 * j + 2], buf0[4 * j + 1],
buf0[4 * j],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 0],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 1],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 2],
- _buf1[txfm_size_row * (buf_size_w_div8 - 1 - j) + 3]);
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 0],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 1],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 2],
+ _buf1[txfm_size_row * (buf_size_w_div4 - 1 - j) + 3]);
}
} else {
- for (int j = 0; j < buf_size_w_div8; ++j) {
+ for (int j = 0; j < buf_size_w_div4; ++j) {
TRANSPOSE_4X4(
buf0[j * 4 + 0], buf0[j * 4 + 1], buf0[j * 4 + 2], buf0[j * 4 + 3],
_buf1[j * txfm_size_row + 0], _buf1[j * txfm_size_row + 1],
@@ -5505,7 +5481,7 @@
}
}
// 2nd stage: column transform
- for (int i = 0; i < buf_size_w_div8; i++) {
+ for (int i = 0; i < buf_size_w_div4; i++) {
col_txfm(buf1 + i * txfm_size_row, buf1 + i * txfm_size_row, INV_COS_BIT, 1,
bd, 0);
@@ -5539,7 +5515,7 @@
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
const transform_1d_sse4_1 col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][1];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
assert(col_txfm != NULL);
assert(row_txfm != NULL);
@@ -5548,9 +5524,8 @@
// 1st stage: column transform
__m128i buf0[8];
- const int32_t *input_row = input;
- __m128i *buf0_cur = buf0;
- load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_row);
+ load_buffer_32bit_input(input, input_stride, buf0, txfm_size_col);
+ load_buffer_32bit_input(input + 4, input_stride, buf0 + 4, txfm_size_col);
av1_round_shift_rect_array_32_sse4_1(buf0, buf0, txfm_size_row, 0,
NewInvSqrt2);
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
@@ -5606,12 +5581,7 @@
const int32_t *input_row = input;
load_buffer_32bit_input(input_row, 4, buf0, txfm_size_col);
- TRANSPOSE_4X4(buf0[0], buf0[2], buf0[4], buf0[6], buf1[0], buf1[1], buf1[2],
- buf1[3]);
- TRANSPOSE_4X4(buf0[1], buf0[3], buf0[5], buf0[7], buf1[4], buf1[5], buf1[6],
- buf1[7]);
-
- av1_round_shift_rect_array_32_sse4_1(buf1, buf0, txfm_size_col, 0,
+ av1_round_shift_rect_array_32_sse4_1(buf0, buf0, txfm_size_col, 0,
NewInvSqrt2);
row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
@@ -5625,8 +5595,9 @@
// 2nd stage: column transform
for (int i = 0; i < 2; i++) {
- col_txfm(buf1_ptr + i * txfm_size_row, buf1_ptr + i * txfm_size_row,
- INV_COS_BIT, 1, bd, 0);
+ __m128i *buf1_cur = buf1_ptr + i * txfm_size_row;
+ transpose_32bit_4x4(buf1_cur, buf1_cur);
+ col_txfm(buf1_cur, buf1_cur, INV_COS_BIT, 1, bd, 0);
}
av1_round_shift_array_32_sse4_1(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
// write to buffer
@@ -5650,7 +5621,7 @@
highbd_txfm_all_1d_zeros_w8_arr[txw_idx][hitx_1d_tab[tx_type]][0];
const transform_1d_sse4_1 col_txfm =
highbd_txfm_all_1d_zeros_w8_arr[txh_idx][vitx_1d_tab[tx_type]][2];
- const int input_stride = AOMMIN(32, txfm_size_col);
+ const int input_stride = AOMMIN(32, txfm_size_row);
assert(col_txfm != NULL);
assert(row_txfm != NULL);
@@ -5659,11 +5630,11 @@
// 1st stage: column transform
__m128i buf0[16];
- const int32_t *input_row = input;
- __m128i *buf0_cur = buf0;
- load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_row);
for (int i = 0; i < (txfm_size_row >> 2); i++) {
- row_txfm(buf0 + (i << 2), buf0 + (i << 2), INV_COS_BIT, 0, bd, -shift[0]);
+ const int32_t *input_row = input + i * 4;
+ __m128i *buf0_cur = buf0 + i * 4;
+ load_buffer_32bit_input(input_row, input_stride, buf0_cur, txfm_size_col);
+ row_txfm(buf0_cur, buf0_cur, INV_COS_BIT, 0, bd, -shift[0]);
}
if (lr_flip) {
@@ -5717,11 +5688,7 @@
const int32_t *input_row = input;
load_buffer_32bit_input(input_row, 4, buf0, txfm_size_col);
- for (int j = 0; j < buf_size_w_div8; j++) {
- TRANSPOSE_4X4(buf0[j], buf0[j + 4], buf0[j + 8], buf0[j + 12], buf1[4 * j],
- buf1[4 * j + 1], buf1[4 * j + 2], buf1[4 * j + 3]);
- }
- row_txfm(buf1, buf0, INV_COS_BIT, 0, bd, -shift[0]);
+ row_txfm(buf0, buf0, INV_COS_BIT, 0, bd, -shift[0]);
__m128i *buf1_ptr;
if (lr_flip) {
@@ -5733,8 +5700,9 @@
// 2nd stage: column transform
for (int i = 0; i < buf_size_w_div8; i++) {
- col_txfm(buf1_ptr + i * txfm_size_row, buf1_ptr + i * txfm_size_row,
- INV_COS_BIT, 1, bd, 0);
+ __m128i *buf1_cur = buf1_ptr + i * txfm_size_row;
+ transpose_32bit_4x4(buf1_cur, buf1_cur);
+ col_txfm(buf1_cur, buf1_cur, INV_COS_BIT, 1, bd, 0);
}
av1_round_shift_array_32_sse4_1(buf1_ptr, buf1_ptr, txfm_size_col, -shift[1]);
diff --git a/av1/common/x86/warp_plane_sse4.c b/av1/common/x86/warp_plane_sse4.c
index e35b557..4c05555 100644
--- a/av1/common/x86/warp_plane_sse4.c
+++ b/av1/common/x86/warp_plane_sse4.c
@@ -33,7 +33,6 @@
/* clang-format off */
DECLARE_ALIGNED(8, const int8_t,
av1_filter_8bit[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8]) = {
-#if WARPEDPIXEL_PREC_BITS == 6
// [-1, 0)
{ 0, 127, 0, 0, 0, 1, 0, 0}, { 0, 127, 0, 0, -1, 2, 0, 0},
{ 1, 127, -1, 0, -3, 4, 0, 0}, { 1, 126, -2, 0, -4, 6, 1, 0},
@@ -135,62 +134,6 @@
{ 0, 0, 4, -3, 0, -1, 127, 1}, { 0, 0, 2, -1, 0, 0, 127, 0},
// dummy (replicate row index 191)
{ 0, 0, 2, -1, 0, 0, 127, 0},
-
-#else
- // [-1, 0)
- { 0, 127, 0, 0, 0, 1, 0, 0}, { 1, 127, -1, 0, -3, 4, 0, 0},
- { 1, 126, -3, 0, -5, 8, 1, 0}, { 1, 124, -4, 0, -7, 13, 1, 0},
- { 2, 122, -6, 0, -9, 18, 1, 0}, { 2, 120, -7, 0, -11, 22, 2, 0},
- { 3, 117, -8, 0, -13, 27, 2, 0}, { 3, 114, -10, 0, -14, 32, 3, 0},
- { 3, 111, -11, 0, -15, 37, 3, 0}, { 3, 108, -12, 0, -16, 42, 3, 0},
- { 4, 104, -13, 0, -17, 47, 3, 0}, { 4, 100, -14, 0, -17, 52, 3, 0},
- { 4, 96, -15, 0, -18, 58, 3, 0}, { 4, 91, -16, 0, -18, 63, 4, 0},
- { 4, 87, -17, 0, -18, 68, 4, 0}, { 4, 82, -17, 0, -18, 73, 4, 0},
- { 4, 78, -18, 0, -18, 78, 4, 0}, { 4, 73, -18, 0, -17, 82, 4, 0},
- { 4, 68, -18, 0, -17, 87, 4, 0}, { 4, 63, -18, 0, -16, 91, 4, 0},
- { 3, 58, -18, 0, -15, 96, 4, 0}, { 3, 52, -17, 0, -14, 100, 4, 0},
- { 3, 47, -17, 0, -13, 104, 4, 0}, { 3, 42, -16, 0, -12, 108, 3, 0},
- { 3, 37, -15, 0, -11, 111, 3, 0}, { 3, 32, -14, 0, -10, 114, 3, 0},
- { 2, 27, -13, 0, -8, 117, 3, 0}, { 2, 22, -11, 0, -7, 120, 2, 0},
- { 1, 18, -9, 0, -6, 122, 2, 0}, { 1, 13, -7, 0, -4, 124, 1, 0},
- { 1, 8, -5, 0, -3, 126, 1, 0}, { 0, 4, -3, 0, -1, 127, 1, 0},
- // [0, 1)
- { 0, 0, 1, 0, 0, 127, 0, 0}, { 0, -3, 4, 1, 1, 127, -2, 0},
- { 0, -6, 8, 1, 2, 126, -3, 0}, {-1, -8, 13, 2, 3, 125, -5, -1},
- {-1, -11, 18, 3, 4, 123, -7, -1}, {-1, -13, 23, 3, 4, 121, -8, -1},
- {-1, -15, 27, 4, 5, 119, -10, -1}, {-2, -17, 33, 5, 6, 116, -12, -1},
- {-2, -18, 38, 5, 6, 113, -13, -1}, {-2, -19, 43, 6, 7, 110, -15, -2},
- {-2, -20, 49, 6, 7, 106, -16, -2}, {-2, -21, 54, 7, 7, 102, -17, -2},
- {-2, -22, 59, 7, 8, 98, -18, -2}, {-2, -22, 64, 7, 8, 94, -19, -2},
- {-2, -22, 69, 8, 8, 89, -20, -2}, {-2, -21, 74, 8, 8, 84, -21, -2},
- {-2, -21, 79, 8, 8, 79, -21, -2}, {-2, -21, 84, 8, 8, 74, -21, -2},
- {-2, -20, 89, 8, 8, 69, -22, -2}, {-2, -19, 94, 8, 7, 64, -22, -2},
- {-2, -18, 98, 8, 7, 59, -22, -2}, {-2, -17, 102, 7, 7, 54, -21, -2},
- {-2, -16, 106, 7, 6, 49, -20, -2}, {-2, -15, 110, 7, 6, 43, -19, -2},
- {-1, -13, 113, 6, 5, 38, -18, -2}, {-1, -12, 116, 6, 5, 33, -17, -2},
- {-1, -10, 119, 5, 4, 27, -15, -1}, {-1, -8, 121, 4, 3, 23, -13, -1},
- {-1, -7, 123, 4, 3, 18, -11, -1}, {-1, -5, 125, 3, 2, 13, -8, -1},
- { 0, -3, 126, 2, 1, 8, -6, 0}, { 0, -2, 127, 1, 1, 4, -3, 0},
- // [1, 2)
- { 0, 0, 127, 0, 0, 1, 0, 0}, { 0, 1, 127, -1, 0, -3, 4, 0},
- { 0, 1, 126, -3, 0, -5, 8, 1}, { 0, 1, 124, -4, 0, -7, 13, 1},
- { 0, 2, 122, -6, 0, -9, 18, 1}, { 0, 2, 120, -7, 0, -11, 22, 2},
- { 0, 3, 117, -8, 0, -13, 27, 2}, { 0, 3, 114, -10, 0, -14, 32, 3},
- { 0, 3, 111, -11, 0, -15, 37, 3}, { 0, 3, 108, -12, 0, -16, 42, 3},
- { 0, 4, 104, -13, 0, -17, 47, 3}, { 0, 4, 100, -14, 0, -17, 52, 3},
- { 0, 4, 96, -15, 0, -18, 58, 3}, { 0, 4, 91, -16, 0, -18, 63, 4},
- { 0, 4, 87, -17, 0, -18, 68, 4}, { 0, 4, 82, -17, 0, -18, 73, 4},
- { 0, 4, 78, -18, 0, -18, 78, 4}, { 0, 4, 73, -18, 0, -17, 82, 4},
- { 0, 4, 68, -18, 0, -17, 87, 4}, { 0, 4, 63, -18, 0, -16, 91, 4},
- { 0, 3, 58, -18, 0, -15, 96, 4}, { 0, 3, 52, -17, 0, -14, 100, 4},
- { 0, 3, 47, -17, 0, -13, 104, 4}, { 0, 3, 42, -16, 0, -12, 108, 3},
- { 0, 3, 37, -15, 0, -11, 111, 3}, { 0, 3, 32, -14, 0, -10, 114, 3},
- { 0, 2, 27, -13, 0, -8, 117, 3}, { 0, 2, 22, -11, 0, -7, 120, 2},
- { 0, 1, 18, -9, 0, -6, 122, 2}, { 0, 1, 13, -7, 0, -4, 124, 1},
- { 0, 1, 8, -5, 0, -3, 126, 1}, { 0, 0, 4, -3, 0, -1, 127, 1},
- // dummy (replicate row index 95)
- { 0, 0, 4, -3, 0, -1, 127, 1},
-#endif // WARPEDPIXEL_PREC_BITS == 6
};
/* clang-format on */
diff --git a/av1/decoder/decodeframe.c b/av1/decoder/decodeframe.c
index 53275ea..5b76de8 100644
--- a/av1/decoder/decodeframe.c
+++ b/av1/decoder/decodeframe.c
@@ -10,6 +10,7 @@
*/
#include <assert.h>
+#include <stdbool.h>
#include <stddef.h>
#include "config/aom_config.h"
@@ -4325,10 +4326,9 @@
trans_dec_factor;
}
- if (params->wmtype <= AFFINE) {
- int good_shear_params = av1_get_shear_params(params);
- if (!good_shear_params) return 0;
- }
+ assert(params->wmtype <= AFFINE);
+ int good_shear_params = av1_get_shear_params(params);
+ if (!good_shear_params) return 0;
return 1;
}
@@ -4434,7 +4434,7 @@
lock_buffer_pool(cm->buffer_pool);
reset_ref_frame_map(cm);
assert(cm->cur_frame->ref_count == 1);
- for (i = 0; i < FRAME_BUFFERS; ++i) {
+ for (i = 0; i < cm->buffer_pool->num_frame_bufs; ++i) {
// Reset all unreferenced frame buffers. We can also reset cm->cur_frame
// because we are the sole owner of cm->cur_frame.
if (frame_bufs[i].ref_count > 0 && &frame_bufs[i] != cm->cur_frame) {
@@ -5128,7 +5128,7 @@
if (!av1_superres_scaled(cm)) return;
assert(!cm->features.all_lossless);
- av1_superres_upscale(cm, pool);
+ av1_superres_upscale(cm, pool, 0);
}
uint32_t av1_decode_frame_headers_and_setup(AV1Decoder *pbi,
@@ -5218,7 +5218,7 @@
if (cm->rst_info[0].frame_restoration_type != RESTORE_NONE ||
cm->rst_info[1].frame_restoration_type != RESTORE_NONE ||
cm->rst_info[2].frame_restoration_type != RESTORE_NONE) {
- av1_alloc_restoration_buffers(cm);
+ av1_alloc_restoration_buffers(cm, /*is_sgr_enabled =*/true);
}
const int use_highbd = cm->seq_params->use_highbitdepth;
diff --git a/av1/decoder/decodetxb.c b/av1/decoder/decodetxb.c
index 0ec1487..dd5aa62 100644
--- a/av1/decoder/decodetxb.c
+++ b/av1/decoder/decodetxb.c
@@ -61,17 +61,17 @@
static INLINE void read_coeffs_reverse_2d(aom_reader *r, TX_SIZE tx_size,
int start_si, int end_si,
- const int16_t *scan, int bwl,
+ const int16_t *scan, int bhl,
uint8_t *levels,
base_cdf_arr base_cdf,
br_cdf_arr br_cdf) {
for (int c = end_si; c >= start_si; --c) {
const int pos = scan[c];
- const int coeff_ctx = get_lower_levels_ctx_2d(levels, pos, bwl, tx_size);
+ const int coeff_ctx = get_lower_levels_ctx_2d(levels, pos, bhl, tx_size);
const int nsymbs = 4;
int level = aom_read_symbol(r, base_cdf[coeff_ctx], nsymbs, ACCT_STR);
if (level > NUM_BASE_LEVELS) {
- const int br_ctx = get_br_ctx_2d(levels, pos, bwl);
+ const int br_ctx = get_br_ctx_2d(levels, pos, bhl);
aom_cdf_prob *cdf = br_cdf[br_ctx];
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
@@ -79,23 +79,23 @@
if (k < BR_CDF_SIZE - 1) break;
}
}
- levels[get_padded_idx(pos, bwl)] = level;
+ levels[get_padded_idx(pos, bhl)] = level;
}
}
static INLINE void read_coeffs_reverse(aom_reader *r, TX_SIZE tx_size,
TX_CLASS tx_class, int start_si,
- int end_si, const int16_t *scan, int bwl,
+ int end_si, const int16_t *scan, int bhl,
uint8_t *levels, base_cdf_arr base_cdf,
br_cdf_arr br_cdf) {
for (int c = end_si; c >= start_si; --c) {
const int pos = scan[c];
const int coeff_ctx =
- get_lower_levels_ctx(levels, pos, bwl, tx_size, tx_class);
+ get_lower_levels_ctx(levels, pos, bhl, tx_size, tx_class);
const int nsymbs = 4;
int level = aom_read_symbol(r, base_cdf[coeff_ctx], nsymbs, ACCT_STR);
if (level > NUM_BASE_LEVELS) {
- const int br_ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int br_ctx = get_br_ctx(levels, pos, bhl, tx_class);
aom_cdf_prob *cdf = br_cdf[br_ctx];
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
@@ -103,7 +103,7 @@
if (k < BR_CDF_SIZE - 1) break;
}
}
- levels[get_padded_idx(pos, bwl)] = level;
+ levels[get_padded_idx(pos, bhl)] = level;
}
}
@@ -123,13 +123,13 @@
const int16_t *const dequant = pd->seg_dequant_QTX[mbmi->segment_id];
tran_low_t *const tcoeffs = dcb->dqcoeff_block[plane] + dcb->cb_offset[plane];
const int shift = av1_get_tx_scale(tx_size);
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
int cul_level = 0;
int dc_val = 0;
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
const int all_zero = aom_read_symbol(
r, ec_ctx->txb_skip_cdf[txs_ctx][txb_ctx->txb_skip_ctx], 2, ACCT_STR);
eob_info *eob_data = dcb->eob_data[plane] + dcb->txb_offset[plane];
@@ -238,7 +238,7 @@
if (*eob > 1) {
memset(levels_buf, 0,
sizeof(*levels_buf) *
- ((width + TX_PAD_HOR) * (height + TX_PAD_VER) + TX_PAD_END));
+ ((height + TX_PAD_HOR) * (width + TX_PAD_VER) + TX_PAD_END));
}
{
@@ -246,13 +246,13 @@
// TODO(angiebird): Put this into a function
const int c = *eob - 1;
const int pos = scan[c];
- const int coeff_ctx = get_lower_levels_ctx_eob(bwl, height, c);
+ const int coeff_ctx = get_lower_levels_ctx_eob(bhl, width, c);
const int nsymbs = 3;
aom_cdf_prob *cdf =
ec_ctx->coeff_base_eob_cdf[txs_ctx][plane_type][coeff_ctx];
int level = aom_read_symbol(r, cdf, nsymbs, ACCT_STR) + 1;
if (level > NUM_BASE_LEVELS) {
- const int br_ctx = get_br_ctx_eob(pos, bwl, tx_class);
+ const int br_ctx = get_br_ctx_eob(pos, bhl, tx_class);
cdf = ec_ctx->coeff_br_cdf[AOMMIN(txs_ctx, TX_32X32)][plane_type][br_ctx];
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
const int k = aom_read_symbol(r, cdf, BR_CDF_SIZE, ACCT_STR);
@@ -260,19 +260,19 @@
if (k < BR_CDF_SIZE - 1) break;
}
}
- levels[get_padded_idx(pos, bwl)] = level;
+ levels[get_padded_idx(pos, bhl)] = level;
}
if (*eob > 1) {
base_cdf_arr base_cdf = ec_ctx->coeff_base_cdf[txs_ctx][plane_type];
br_cdf_arr br_cdf =
ec_ctx->coeff_br_cdf[AOMMIN(txs_ctx, TX_32X32)][plane_type];
if (tx_class == TX_CLASS_2D) {
- read_coeffs_reverse_2d(r, tx_size, 1, *eob - 1 - 1, scan, bwl, levels,
+ read_coeffs_reverse_2d(r, tx_size, 1, *eob - 1 - 1, scan, bhl, levels,
base_cdf, br_cdf);
- read_coeffs_reverse(r, tx_size, tx_class, 0, 0, scan, bwl, levels,
+ read_coeffs_reverse(r, tx_size, tx_class, 0, 0, scan, bhl, levels,
base_cdf, br_cdf);
} else {
- read_coeffs_reverse(r, tx_size, tx_class, 0, *eob - 1 - 1, scan, bwl,
+ read_coeffs_reverse(r, tx_size, tx_class, 0, *eob - 1 - 1, scan, bhl,
levels, base_cdf, br_cdf);
}
}
@@ -280,7 +280,7 @@
for (int c = 0; c < *eob; ++c) {
const int pos = scan[c];
uint8_t sign;
- tran_low_t level = levels[get_padded_idx(pos, bwl)];
+ tran_low_t level = levels[get_padded_idx(pos, bhl)];
if (level) {
*max_scan_line = AOMMAX(*max_scan_line, pos);
if (c == 0) {
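The bwl -> bhl switch mirrors the transposed coefficient layout on the decoder side: the padded levels buffer is now laid out with runs of transform-height elements (set_levels(levels_buf, height)), and the memset size swaps which dimension carries TX_PAD_HOR versus TX_PAD_VER. A scalar sketch of the padded indexing, assuming get_padded_idx keeps its existing form:

    // Sketch: TX_PAD_HOR pad bytes follow each run of (1 << bhl) levels.
    static int padded_idx_model(int pos, int bhl) {
      return pos + (pos >> bhl) * TX_PAD_HOR;
    }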
diff --git a/av1/decoder/obu.c b/av1/decoder/obu.c
index d589f00..b687cf9 100644
--- a/av1/decoder/obu.c
+++ b/av1/decoder/obu.c
@@ -396,7 +396,7 @@
cm->seq_params->subsampling_y,
(cm->seq_params->use_highbitdepth &&
(cm->seq_params->bit_depth > AOM_BITS_8)),
- 0, cm->features.byte_alignment, 0))
+ 0, cm->features.byte_alignment, 0, 0))
aom_internal_error(&pbi->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate the tile list output buffer");
}
diff --git a/av1/encoder/allintra_vis.c b/av1/encoder/allintra_vis.c
index cfc3270..236b296 100644
--- a/av1/encoder/allintra_vis.c
+++ b/av1/encoder/allintra_vis.c
@@ -9,6 +9,8 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
+#include <assert.h>
+
#include "config/aom_config.h"
#if CONFIG_TFLITE
@@ -35,11 +37,29 @@
// "compute_num_ai_workers()".
cpi->weber_bsize = BLOCK_8X8;
- if (cpi->mb_weber_stats) return;
+ if (cpi->oxcf.enable_rate_guide_deltaq) {
+ if (cpi->mb_weber_stats && cpi->prep_rate_estimates &&
+ cpi->ext_rate_distribution)
+ return;
+ } else {
+ if (cpi->mb_weber_stats) return;
+ }
CHECK_MEM_ERROR(cm, cpi->mb_weber_stats,
aom_calloc(cpi->frame_info.mi_rows * cpi->frame_info.mi_cols,
sizeof(*cpi->mb_weber_stats)));
+
+ if (cpi->oxcf.enable_rate_guide_deltaq) {
+ CHECK_MEM_ERROR(
+ cm, cpi->prep_rate_estimates,
+ aom_calloc(cpi->frame_info.mi_rows * cpi->frame_info.mi_cols,
+ sizeof(*cpi->prep_rate_estimates)));
+
+ CHECK_MEM_ERROR(
+ cm, cpi->ext_rate_distribution,
+ aom_calloc(cpi->frame_info.mi_rows * cpi->frame_info.mi_cols,
+ sizeof(*cpi->ext_rate_distribution)));
+ }
}
static int64_t get_satd(AV1_COMP *const cpi, BLOCK_SIZE bsize, int mi_row,
@@ -197,6 +217,20 @@
return sb_wiener_var;
}
+static int rate_estimator(const tran_low_t *qcoeff, int eob, TX_SIZE tx_size) {
+ const SCAN_ORDER *const scan_order = &av1_scan_orders[tx_size][DCT_DCT];
+
+ assert((1 << num_pels_log2_lookup[txsize_to_bsize[tx_size]]) >= eob);
+ int rate_cost = 1;
+
+ for (int idx = 0; idx < eob; ++idx) {
+ int abs_level = abs(qcoeff[scan_order->scan[idx]]);
+ rate_cost += (int)(log1p(abs_level) / log(2.0)) + 1 + (abs_level > 0);
+ }
+
+ return (rate_cost << AV1_PROB_COST_SHIFT);
+}
+
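rate_estimator is a coarse entropy proxy: every scanned coefficient costs floor(log2(1 + |level|)) + 1 "bits", nonzero ones cost one more, and the total is shifted into the encoder's fixed-point rate units by AV1_PROB_COST_SHIFT. For example:

    // Worked example: levels in scan order {4, 0, -2}, eob = 3.
    //   |4|: floor(log2(5)) + 1 + 1 = 4
    //   |0|: floor(log2(1)) + 1 + 0 = 1
    //   |2|: floor(log2(3)) + 1 + 1 = 3
    // rate_cost = 1 + 4 + 1 + 3 = 9  ->  9 << AV1_PROB_COST_SHIFT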
void av1_calc_mb_wiener_var_row(AV1_COMP *const cpi, MACROBLOCK *x,
MACROBLOCKD *xd, const int mi_row,
int16_t *src_diff, tran_low_t *coeff,
@@ -216,22 +250,36 @@
const int coeff_count = block_size * block_size;
const int mb_step = mi_size_wide[bsize];
const BitDepthInfo bd_info = get_bit_depth_info(xd);
- const AV1EncRowMultiThreadInfo *const enc_row_mt = &cpi->mt_info.enc_row_mt;
- // We allocate cpi->tile_data (of size 1) when we call this function in
- // multithreaded mode, so cpi->tile_data may be a null pointer when we call
- // this function in single-threaded mode.
- AV1EncRowMultiThreadSync *const row_mt_sync =
- cpi->tile_data ? &cpi->tile_data[0].row_mt_sync : NULL;
+ const AV1EncAllIntraMultiThreadInfo *const intra_mt = &cpi->mt_info.intra_mt;
+ AV1EncRowMultiThreadSync *const intra_row_mt_sync =
+ &cpi->ppi->intra_row_mt_sync;
const int mi_cols = cm->mi_params.mi_cols;
const int mt_thread_id = mi_row / mb_step;
// TODO(chengchen): test different unit step size
const int mt_unit_step = mi_size_wide[BLOCK_64X64];
const int mt_unit_cols = (mi_cols + (mt_unit_step >> 1)) / mt_unit_step;
int mt_unit_col = 0;
+ const int is_high_bitdepth = is_cur_buf_hbd(xd);
+
+ // We use a scratch buffer to store the prediction.
+ // The stride is the max block size (128).
+ uint8_t *pred_buffer;
+ const int dst_buffer_stride = 128;
+ const int buf_width = 128;
+ const int buf_height = 128;
+ const size_t buf_size = (buf_width * buf_height * sizeof(*pred_buffer))
+ << is_high_bitdepth;
+ CHECK_MEM_ERROR(cm, pred_buffer, aom_memalign(32, buf_size));
+ uint8_t *dst_buffer = pred_buffer;
+ if (is_high_bitdepth) {
+ uint16_t *pred_buffer_16 = (uint16_t *)pred_buffer;
+ dst_buffer = CONVERT_TO_BYTEPTR(pred_buffer_16);
+ }
for (int mi_col = 0; mi_col < mi_cols; mi_col += mb_step) {
if (mi_col % mt_unit_step == 0) {
- enc_row_mt->sync_read_ptr(row_mt_sync, mt_thread_id, mt_unit_col);
+ intra_mt->intra_sync_read_ptr(intra_row_mt_sync, mt_thread_id,
+ mt_unit_col);
}
PREDICTION_MODE best_mode = DC_PRED;
@@ -241,24 +289,32 @@
set_mode_info_offsets(&cpi->common.mi_params, &cpi->mbmi_ext_info, x, xd,
mi_row, mi_col);
set_mi_row_col(xd, &xd->tile, mi_row, mi_height, mi_col, mi_width,
- cm->mi_params.mi_rows, cm->mi_params.mi_cols);
+ AOMMIN(mi_row + mi_height, cm->mi_params.mi_rows),
+ AOMMIN(mi_col + mi_width, cm->mi_params.mi_cols));
set_plane_n4(xd, mi_size_wide[bsize], mi_size_high[bsize],
av1_num_planes(cm));
xd->mi[0]->bsize = bsize;
xd->mi[0]->motion_mode = SIMPLE_TRANSLATION;
- av1_setup_dst_planes(xd->plane, bsize, &cm->cur_frame->buf, mi_row, mi_col,
- 0, av1_num_planes(cm));
- int dst_buffer_stride = xd->plane[0].dst.stride;
- uint8_t *dst_buffer = xd->plane[0].dst.buf;
+ // Set above and left mbmi to NULL as they are not available in the
+ // preprocessing stage.
+ // They are used to determine intra edge filter types in intra prediction.
+ if (xd->up_available) {
+ xd->above_mbmi = NULL;
+ }
+ if (xd->left_available) {
+ xd->left_mbmi = NULL;
+ }
uint8_t *mb_buffer =
buffer + mi_row * MI_SIZE * buf_stride + mi_col * MI_SIZE;
for (PREDICTION_MODE mode = INTRA_MODE_START; mode < INTRA_MODE_END;
++mode) {
- av1_predict_intra_block(xd, cm->seq_params->sb_size,
- cm->seq_params->enable_intra_edge_filter,
- block_size, block_size, tx_size, mode, 0, 0,
- FILTER_INTRA_MODES, dst_buffer, dst_buffer_stride,
- dst_buffer, dst_buffer_stride, 0, 0, 0);
+ // TODO(chengchen): Here we use src instead of the reconstructed frame as
+ // the intra predictor to make the single- and multi-threaded versions
+ // match. Ideally we want to use the reconstructed frame.
+ av1_predict_intra_block(
+ xd, cm->seq_params->sb_size, cm->seq_params->enable_intra_edge_filter,
+ block_size, block_size, tx_size, mode, 0, 0, FILTER_INTRA_MODES,
+ mb_buffer, buf_stride, dst_buffer, dst_buffer_stride, 0, 0, 0);
av1_subtract_block(bd_info, block_size, block_size, src_diff, block_size,
mb_buffer, buf_stride, dst_buffer, dst_buffer_stride);
av1_quick_txfm(0, tx_size, bd_info, src_diff, block_size, coeff);
@@ -272,7 +328,7 @@
av1_predict_intra_block(
xd, cm->seq_params->sb_size, cm->seq_params->enable_intra_edge_filter,
block_size, block_size, tx_size, best_mode, 0, 0, FILTER_INTRA_MODES,
- dst_buffer, dst_buffer_stride, dst_buffer, dst_buffer_stride, 0, 0, 0);
+ mb_buffer, buf_stride, dst_buffer, dst_buffer_stride, 0, 0, 0);
av1_subtract_block(bd_info, block_size, block_size, src_diff, block_size,
mb_buffer, buf_stride, dst_buffer, dst_buffer_stride);
av1_quick_txfm(0, tx_size, bd_info, src_diff, block_size, coeff);
@@ -295,6 +351,13 @@
av1_quantize_fp_facade(coeff, pix_num, p, qcoeff, dqcoeff, &eob, scan_order,
&quant_param);
#endif // CONFIG_AV1_HIGHBITDEPTH
+
+ if (cpi->oxcf.enable_rate_guide_deltaq) {
+ const int rate_cost = rate_estimator(qcoeff, eob, tx_size);
+ cpi->prep_rate_estimates[(mi_row / mb_step) * cpi->frame_info.mi_cols +
+ (mi_col / mb_step)] = rate_cost;
+ }
+
av1_inverse_transform_block(xd, dqcoeff, 0, DCT_DCT, tx_size, dst_buffer,
dst_buffer_stride, eob, 0);
WeberStats *weber_stats =
@@ -364,13 +427,14 @@
if ((mi_col + mb_step) % mt_unit_step == 0 ||
(mi_col + mb_step) >= mi_cols) {
- enc_row_mt->sync_write_ptr(row_mt_sync, mt_thread_id, mt_unit_col,
- mt_unit_cols);
+ intra_mt->intra_sync_write_ptr(intra_row_mt_sync, mt_thread_id,
+ mt_unit_col, mt_unit_cols);
++mt_unit_col;
}
}
// Set the pointer to null since mbmi is only allocated inside this function.
xd->mi = NULL;
+ aom_free(pred_buffer);
}
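Condensed, the scratch-buffer lifetime introduced above is: allocate one 128x128 prediction buffer per row pass (doubled in bytes for high bit depth), wrap it with CONVERT_TO_BYTEPTR when the pixels are 16-bit, predict into it instead of the frame's reconstruction planes, and free it on exit:

    // Sketch of the pattern (fragment; cm/xd as in the function above):
    const int is_hbd = is_cur_buf_hbd(xd);
    uint8_t *pred;
    const size_t sz = (128 * 128 * sizeof(*pred)) << is_hbd;
    CHECK_MEM_ERROR(cm, pred, aom_memalign(32, sz));
    uint8_t *dst = is_hbd ? CONVERT_TO_BYTEPTR((uint16_t *)pred) : pred;
    // ... av1_predict_intra_block(..., dst, /*stride=*/128, ...) ...
    aom_free(pred);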
static void calc_mb_wiener_var(AV1_COMP *const cpi, double *sum_rec_distortion,
@@ -440,6 +504,57 @@
}
}
+static void ext_rate_guided_quantization(AV1_COMP *cpi) {
+ // Calculation uses 8x8.
+ const int mb_step = mi_size_wide[cpi->weber_bsize];
+ // Accumulate to 16x16, step size is in the unit of mi.
+ const int block_step = 4;
+
+ const char *filename = cpi->oxcf.rate_distribution_info;
+ FILE *pfile = fopen(filename, "r");
+ if (pfile == NULL) {
+ assert(pfile != NULL);
+ return;
+ }
+
+ double ext_rate_sum = 0.0;
+ for (int row = 0; row < cpi->frame_info.mi_rows; row += block_step) {
+ for (int col = 0; col < cpi->frame_info.mi_cols; col += block_step) {
+ float val;
+ const int fields_converted = fscanf(pfile, "%f", &val);
+ if (fields_converted != 1) {
+ assert(fields_converted == 1);
+ fclose(pfile);
+ return;
+ }
+ ext_rate_sum += val;
+ cpi->ext_rate_distribution[(row / mb_step) * cpi->frame_info.mi_cols +
+ (col / mb_step)] = val;
+ }
+ }
+ fclose(pfile);
+
+ int uniform_rate_sum = 0;
+ for (int row = 0; row < cpi->frame_info.mi_rows; row += block_step) {
+ for (int col = 0; col < cpi->frame_info.mi_cols; col += block_step) {
+ int rate_sum = 0;
+ for (int r = 0; r < block_step; r += mb_step) {
+ for (int c = 0; c < block_step; c += mb_step) {
+ const int mi_row = row + r;
+ const int mi_col = col + c;
+ rate_sum += cpi->prep_rate_estimates[(mi_row / mb_step) *
+ cpi->frame_info.mi_cols +
+ (mi_col / mb_step)];
+ }
+ }
+ uniform_rate_sum += rate_sum;
+ }
+ }
+
+ const double scale = uniform_rate_sum / ext_rate_sum;
+ cpi->ext_rate_scale = scale;
+}
+
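ext_rate_guided_quantization reads one float per 16x16 block (block_step is 4 mi units) in raster order from the file named by rate_distribution_info, then sets ext_rate_scale = sum(prep_rate_estimates) / sum(external rates) so the external numbers are comparable to the encoder's own uniform-Q estimates. A hypothetical generator for such a file (write_rate_info, rates, rows16, and cols16 are illustrative names, not libaom API):

    #include <stdio.h>

    static int write_rate_info(const char *path, const float *rates,
                               int rows16, int cols16) {
      FILE *f = fopen(path, "w");
      if (!f) return -1;
      for (int r = 0; r < rows16; ++r) {
        for (int c = 0; c < cols16; ++c)
          fprintf(f, "%f ", rates[r * cols16 + c]);
        fprintf(f, "\n");
      }
      return fclose(f);
    }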
void av1_set_mb_wiener_variance(AV1_COMP *cpi) {
AV1_COMMON *const cm = &cpi->common;
const SequenceHeader *const seq_params = cm->seq_params;
@@ -447,7 +562,7 @@
&cm->cur_frame->buf, cm->width, cm->height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels, cm->features.byte_alignment, NULL, NULL,
- NULL, cpi->oxcf.tool_cfg.enable_global_motion, 0))
+ NULL, cpi->image_pyramid_levels, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate frame buffer");
cpi->norm_wiener_variance = 0;
@@ -468,15 +583,16 @@
MultiThreadInfo *const mt_info = &cpi->mt_info;
const int num_workers =
AOMMIN(mt_info->num_mod_workers[MOD_AI], mt_info->num_workers);
- AV1EncRowMultiThreadInfo *const enc_row_mt = &mt_info->enc_row_mt;
- enc_row_mt->sync_read_ptr = av1_row_mt_sync_read_dummy;
- enc_row_mt->sync_write_ptr = av1_row_mt_sync_write_dummy;
+ AV1EncAllIntraMultiThreadInfo *const intra_mt = &mt_info->intra_mt;
+ intra_mt->intra_sync_read_ptr = av1_row_mt_sync_read_dummy;
+ intra_mt->intra_sync_write_ptr = av1_row_mt_sync_write_dummy;
// Calculate differential contrast for each block for the entire image.
- // TODO(aomedia:3376): Remove " && 0" when there are no data races in
- // av1_calc_mb_wiener_var_mt(). See also bug aomedia:3380.
- if (num_workers > 1 && 0) {
- enc_row_mt->sync_read_ptr = av1_row_mt_sync_read;
- enc_row_mt->sync_write_ptr = av1_row_mt_sync_write;
+ // TODO(chengchen): properly accumulate the distortion and rate in
+ // av1_calc_mb_wiener_var_mt(). Until then, call calc_mb_wiener_var() if
+ // auto_intra_tools_off is true.
+ if (num_workers > 1 && !cpi->oxcf.intra_mode_cfg.auto_intra_tools_off) {
+ intra_mt->intra_sync_read_ptr = av1_row_mt_sync_read;
+ intra_mt->intra_sync_write_ptr = av1_row_mt_sync_write;
av1_calc_mb_wiener_var_mt(cpi, num_workers, &sum_rec_distortion,
&sum_est_rate);
} else {
@@ -486,6 +602,9 @@
// Determine whether to turn off several intra coding tools.
automatic_intra_tools_off(cpi, sum_rec_distortion, sum_est_rate);
+ // Read external rate distribution and use it to guide delta quantization
+ if (cpi->oxcf.enable_rate_guide_deltaq) ext_rate_guided_quantization(cpi);
+
const BLOCK_SIZE norm_block_size = cm->seq_params->sb_size;
cpi->norm_wiener_variance = estimate_wiener_var_norm(cpi, norm_block_size);
const int norm_step = mi_size_wide[norm_block_size];
@@ -530,8 +649,67 @@
aom_free_frame_buffer(&cm->cur_frame->buf);
}
+static int get_rate_guided_quantizer(AV1_COMP *const cpi, BLOCK_SIZE bsize,
+ int mi_row, int mi_col) {
+ // Calculation uses 8x8.
+ const int mb_step = mi_size_wide[cpi->weber_bsize];
+ // Accumulate to 16x16
+ const int block_step = mi_size_wide[BLOCK_16X16];
+ double sb_rate_hific = 0.0;
+ double sb_rate_uniform = 0.0;
+ for (int row = mi_row; row < mi_row + mi_size_wide[bsize];
+ row += block_step) {
+ for (int col = mi_col; col < mi_col + mi_size_high[bsize];
+ col += block_step) {
+ sb_rate_hific +=
+ cpi->ext_rate_distribution[(row / mb_step) * cpi->frame_info.mi_cols +
+ (col / mb_step)];
+
+ for (int r = 0; r < block_step; r += mb_step) {
+ for (int c = 0; c < block_step; c += mb_step) {
+ const int this_row = row + r;
+ const int this_col = col + c;
+ sb_rate_uniform +=
+ cpi->prep_rate_estimates[(this_row / mb_step) *
+ cpi->frame_info.mi_cols +
+ (this_col / mb_step)];
+ }
+ }
+ }
+ }
+ sb_rate_hific *= cpi->ext_rate_scale;
+
+ const double weight = 1.0;
+ const double rate_diff =
+ weight * (sb_rate_hific - sb_rate_uniform) / sb_rate_uniform;
+ double scale = pow(2, rate_diff);
+
+ scale = scale * scale;
+ double min_max_scale = AOMMAX(1.0, get_max_scale(cpi, bsize, mi_row, mi_col));
+ scale = 1.0 / AOMMIN(1.0 / scale, min_max_scale);
+
+ AV1_COMMON *const cm = &cpi->common;
+ const int base_qindex = cm->quant_params.base_qindex;
+ int offset =
+ av1_get_deltaq_offset(cm->seq_params->bit_depth, base_qindex, scale);
+ const DeltaQInfo *const delta_q_info = &cm->delta_q_info;
+ const int max_offset = delta_q_info->delta_q_res * 10;
+ offset = AOMMIN(offset, max_offset - 1);
+ offset = AOMMAX(offset, -max_offset + 1);
+ int qindex = cm->quant_params.base_qindex + offset;
+ qindex = AOMMIN(qindex, MAXQ);
+ qindex = AOMMAX(qindex, MINQ);
+ if (base_qindex > MINQ) qindex = AOMMAX(qindex, MINQ + 1);
+
+ return qindex;
+}
+
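Note: get_rate_guided_quantizer() turns the relative rate request into a
quantizer-step ratio and then a clamped qindex offset. A worked example with
hypothetical per-superblock rates (av1_get_deltaq_offset() is the existing
libaom helper; only the surrounding arithmetic is sketched here):

  #include <math.h>
  #include <stdio.h>

  int main(void) {
    const double sb_rate_hific = 1200.0;    /* scaled external estimate */
    const double sb_rate_uniform = 1000.0;  /* encoder's own estimate */
    const double rate_diff =
        (sb_rate_hific - sb_rate_uniform) / sb_rate_uniform;
    double scale = pow(2, rate_diff);
    scale = scale * scale;  /* i.e. 4^rate_diff, ~1.32 for a +20% request */
    printf("scale = %f\n", scale);
    /* av1_get_deltaq_offset() then converts this quantizer-step ratio into
     * a qindex offset, clamped to +/-(delta_q_res * 10 - 1) as above. */
    return 0;
  }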
int av1_get_sbq_perceptual_ai(AV1_COMP *const cpi, BLOCK_SIZE bsize, int mi_row,
int mi_col) {
+ if (cpi->oxcf.enable_rate_guide_deltaq) {
+ return get_rate_guided_quantizer(cpi, bsize, mi_row, mi_col);
+ }
+
AV1_COMMON *const cm = &cpi->common;
const int base_qindex = cm->quant_params.base_qindex;
int sb_wiener_var = get_var_perceptual_ai(cpi, bsize, mi_row, mi_col);
diff --git a/av1/encoder/aq_cyclicrefresh.c b/av1/encoder/aq_cyclicrefresh.c
index 616d52f..be51ba1 100644
--- a/av1/encoder/aq_cyclicrefresh.c
+++ b/av1/encoder/aq_cyclicrefresh.c
@@ -313,6 +313,7 @@
if (cr->sb_index >= sbs_in_frame) cr->sb_index = 0;
assert(cr->sb_index < sbs_in_frame);
i = cr->sb_index;
+ cr->last_sb_index = cr->sb_index;
cr->target_num_seg_blocks = 0;
do {
int sum_map = 0;
@@ -330,13 +331,22 @@
if (cr->use_block_sad_scene_det && cpi->rc.frames_since_key > 30 &&
cr->counter_encode_maxq_scene_change > 30 &&
cpi->src_sad_blk_64x64 != NULL &&
- cpi->svc.number_temporal_layers == 1 &&
cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1) {
sb_sad = cpi->src_sad_blk_64x64[sb_col_index + sb_cols * sb_row_index];
int scale = (cm->width * cm->height < 640 * 360) ? 6 : 8;
int scale_low = 2;
thresh_sad = (scale * 64 * 64);
thresh_sad_low = (scale_low * 64 * 64);
+ // For temporal layers: the base temporal layer (temporal_layer_id = 0)
+ // has larger frame separation (2 or 4 frames apart), so use larger sad
+ // thresholds to compensate for larger frame sad. The larger thresholds
+ // also increase the amount of refresh, which is needed for the base
+ // temporal layer.
+ if (cpi->svc.number_temporal_layers > 1 &&
+ cpi->svc.temporal_layer_id == 0) {
+ thresh_sad <<= 4;
+ thresh_sad_low <<= 2;
+ }
}
// cr_map only needed at 8x8 blocks.
for (y = 0; y < ymis; y += 2) {
@@ -384,18 +394,23 @@
const PRIMARY_RATE_CONTROL *const p_rc = &cpi->ppi->p_rc;
const AV1_COMMON *const cm = &cpi->common;
CYCLIC_REFRESH *const cr = cpi->cyclic_refresh;
- int num4x4bl = cm->mi_params.MBs << 4;
- int target_refresh = 0;
- double weight_segment_target = 0;
- double weight_segment = 0;
- int qp_thresh = AOMMIN(20, rc->best_quality << 1);
- if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN)
- qp_thresh = AOMMIN(35, rc->best_quality << 1);
- int qp_max_thresh = 118 * MAXQ >> 7;
+ SVC *const svc = &cpi->svc;
+ const int qp_thresh = AOMMAX(16, rc->best_quality + 4);
+ const int qp_max_thresh = 118 * MAXQ >> 7;
const int scene_change_detected = is_scene_change_detected(cpi);
+ const int is_screen_content =
+ (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN);
+
+ // A scene change or key frame marks the start of a cyclic refresh cycle.
+ const int frames_since_scene_change =
+ (cpi->ppi->use_svc || !is_screen_content)
+ ? cpi->rc.frames_since_key
+ : AOMMIN(cpi->rc.frames_since_key,
+ cr->counter_encode_maxq_scene_change);
// Cases to reset the cyclic refresh adjustment parameters.
- if (frame_is_intra_only(cm) || scene_change_detected) {
+ if (frame_is_intra_only(cm) || scene_change_detected ||
+ cpi->ppi->rtc_ref.bias_recovery_frame) {
// Reset adaptive elements for intra only frames and scene changes.
cr->percent_refresh_adjustment = 5;
cr->rate_ratio_qdelta_adjustment = 0.25;
@@ -414,20 +429,22 @@
// should we enable cyclic refresh on this frame.
cr->apply_cyclic_refresh = 1;
if (frame_is_intra_only(cm) || is_lossless_requested(&cpi->oxcf.rc_cfg) ||
- scene_change_detected || cpi->svc.temporal_layer_id > 0 ||
+ scene_change_detected || svc->temporal_layer_id > 0 ||
+ svc->prev_number_spatial_layers != svc->number_spatial_layers ||
p_rc->avg_frame_qindex[INTER_FRAME] < qp_thresh ||
- (cpi->svc.number_spatial_layers > 1 &&
- cpi->svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame) ||
- (rc->frames_since_key > 20 &&
+ (svc->number_spatial_layers > 1 &&
+ svc->layer_context[svc->temporal_layer_id].is_key_frame) ||
+ (frames_since_scene_change > 20 &&
p_rc->avg_frame_qindex[INTER_FRAME] > qp_max_thresh) ||
(rc->avg_frame_low_motion && rc->avg_frame_low_motion < 30 &&
- rc->frames_since_key > 40)) {
+ frames_since_scene_change > 40) ||
+ cpi->ppi->rtc_ref.bias_recovery_frame) {
cr->apply_cyclic_refresh = 0;
return;
}
// Increase the amount of refresh for #temporal_layers > 2
- if (cpi->svc.number_temporal_layers > 2)
+ if (svc->number_temporal_layers > 2)
cr->percent_refresh = 15;
else
cr->percent_refresh = 10 + cr->percent_refresh_adjustment;
@@ -442,24 +459,46 @@
cr->motion_thresh = 32;
cr->rate_boost_fac =
(cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN) ? 10 : 15;
- // Use larger delta-qp (increase rate_ratio_qdelta) for first few (~4)
- // periods of the refresh cycle, after a key frame.
- // Account for larger interval on base layer for temporal layers.
- if (cr->percent_refresh > 0 &&
- rc->frames_since_key <
- (4 * cpi->svc.number_temporal_layers) * (100 / cr->percent_refresh)) {
- cr->rate_ratio_qdelta = 3.0 + cr->rate_ratio_qdelta_adjustment;
+
+ // Use larger delta-qp (increase rate_ratio_qdelta) for first few
+ // refresh cycles after a key frame (svc) or scene change (non svc).
+ // For non svc screen content, after a scene change gradually reduce
+  // this boost and suppress it further if either of the previous two
+ // frames overshot.
+ if (cr->percent_refresh > 0) {
+ if (cpi->ppi->use_svc || !is_screen_content) {
+ if (frames_since_scene_change <
+ ((4 * svc->number_temporal_layers) * (100 / cr->percent_refresh))) {
+ cr->rate_ratio_qdelta = 3.0 + cr->rate_ratio_qdelta_adjustment;
+ } else {
+ cr->rate_ratio_qdelta = 2.25 + cr->rate_ratio_qdelta_adjustment;
+ }
+ } else {
+ double distance_from_sc_factor =
+ AOMMIN(0.75, (int)(frames_since_scene_change / 10) * 0.1);
+ cr->rate_ratio_qdelta =
+ 3.0 + cr->rate_ratio_qdelta_adjustment - distance_from_sc_factor;
+ if ((frames_since_scene_change < 10) &&
+ ((cpi->rc.rc_1_frame < 0) || (cpi->rc.rc_2_frame < 0))) {
+ cr->rate_ratio_qdelta -= 0.25;
+ }
+ }
} else {
cr->rate_ratio_qdelta = 2.25 + cr->rate_ratio_qdelta_adjustment;
}
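Note: the non-svc screen-content branch above decays the boost in 0.1 steps per
10 frames and bottoms out at 2.25 plus the adjustment, matching the steady-state
value on the other path. A small sketch of that decay (hypothetical helper, not
part of the patch):

  /* Decay of the qdelta boost after a scene change; the integer division
   * steps the factor down 0.1 every 10 frames, capped at 0.75. */
  double boost_after_scene_change(int frames_since_scene_change,
                                  double adjustment) {
    double factor = (frames_since_scene_change / 10) * 0.1;
    if (factor > 0.75) factor = 0.75;
    return 3.0 + adjustment - factor;  /* decays from 3.0 toward 2.25 */
  }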
// Adjust some parameters for low resolutions.
if (cm->width * cm->height <= 352 * 288) {
- if (rc->avg_frame_bandwidth < 3000) {
- cr->motion_thresh = 16;
+ if (cpi->svc.number_temporal_layers > 1) {
+ cr->motion_thresh = 32;
cr->rate_boost_fac = 13;
} else {
- cr->max_qdelta_perc = 50;
- cr->rate_ratio_qdelta = AOMMAX(cr->rate_ratio_qdelta, 2.0);
+ if (rc->avg_frame_bandwidth < 3000) {
+ cr->motion_thresh = 16;
+ cr->rate_boost_fac = 13;
+ } else {
+ cr->max_qdelta_perc = 50;
+ cr->rate_ratio_qdelta = AOMMAX(cr->rate_ratio_qdelta, 2.0);
+ }
}
}
if (cpi->oxcf.rc_cfg.mode == AOM_VBR) {
@@ -474,25 +513,10 @@
cr->rate_ratio_qdelta = 1.0;
}
}
- // Weight for segment prior to encoding: take the average of the target
- // number for the frame to be encoded and the actual from the previous frame.
- // Use the target if its less. To be used for setting the base qp for the
- // frame in av1_rc_regulate_q.
- target_refresh =
- cr->percent_refresh * cm->mi_params.mi_rows * cm->mi_params.mi_cols / 100;
- weight_segment_target = (double)(target_refresh) / num4x4bl;
- weight_segment = (double)((target_refresh + cr->actual_num_seg1_blocks +
- cr->actual_num_seg2_blocks) >>
- 1) /
- num4x4bl;
- if (weight_segment_target < 7 * weight_segment / 8)
- weight_segment = weight_segment_target;
- cr->weight_segment = weight_segment;
if (rc->rtc_external_ratectrl) {
cr->actual_num_seg1_blocks = cr->percent_refresh * cm->mi_params.mi_rows *
cm->mi_params.mi_cols / 100;
cr->actual_num_seg2_blocks = 0;
- cr->weight_segment = (double)(cr->actual_num_seg1_blocks) / num4x4bl;
}
}
@@ -508,9 +532,13 @@
const int layer_depth = AOMMIN(gf_group->layer_depth[cpi->gf_frame_index], 6);
const FRAME_TYPE frame_type = cm->current_frame.frame_type;
+ // Set resolution_change flag: for svc only set it when the
+ // number of spatial layers has not changed.
const int resolution_change =
- cm->prev_frame && (cm->width != cm->prev_frame->width ||
- cm->height != cm->prev_frame->height);
+ cm->prev_frame &&
+ (cm->width != cm->prev_frame->width ||
+ cm->height != cm->prev_frame->height) &&
+ cpi->svc.prev_number_spatial_layers == cpi->svc.number_spatial_layers;
if (resolution_change) av1_cyclic_refresh_reset_resize(cpi);
if (!cr->apply_cyclic_refresh) {
@@ -518,9 +546,13 @@
unsigned char *const seg_map = cpi->enc_seg.map;
memset(seg_map, 0, cm->mi_params.mi_rows * cm->mi_params.mi_cols);
av1_disable_segmentation(&cm->seg);
- if (cm->current_frame.frame_type == KEY_FRAME || scene_change_detected) {
+ if (frame_is_intra_only(cm) || scene_change_detected ||
+ cpi->ppi->rtc_ref.bias_recovery_frame) {
cr->sb_index = 0;
+ cr->last_sb_index = 0;
cr->counter_encode_maxq_scene_change = 0;
+ cr->actual_num_seg1_blocks = 0;
+ cr->actual_num_seg2_blocks = 0;
}
return;
} else {
@@ -600,6 +632,7 @@
CYCLIC_REFRESH *const cr = cpi->cyclic_refresh;
memset(cr->map, 0, cm->mi_params.mi_rows * cm->mi_params.mi_cols);
cr->sb_index = 0;
+ cr->last_sb_index = 0;
cpi->refresh_frame.golden_frame = true;
cr->apply_cyclic_refresh = 0;
cr->counter_encode_maxq_scene_change = 0;
@@ -610,6 +643,7 @@
int av1_cyclic_refresh_disable_lf_cdef(AV1_COMP *const cpi) {
CYCLIC_REFRESH *const cr = cpi->cyclic_refresh;
 // TODO(marpan): Tune these conditions, add QP dependence.
+ if (cpi->sf.rt_sf.skip_lf_screen > 1 && !cpi->rc.high_source_sad) return 1;
if (cpi->rc.frames_since_key > 30 && cr->percent_refresh > 0 &&
cr->counter_encode_maxq_scene_change > 300 / cr->percent_refresh &&
cpi->rc.frame_source_sad < 1000)
diff --git a/av1/encoder/aq_cyclicrefresh.h b/av1/encoder/aq_cyclicrefresh.h
index 3353c5a..10974f0 100644
--- a/av1/encoder/aq_cyclicrefresh.h
+++ b/av1/encoder/aq_cyclicrefresh.h
@@ -54,6 +54,10 @@
*/
int sb_index;
/*!
+ * Superblock index of the cyclic refresh scan from the last frame.
+ */
+ int last_sb_index;
+ /*!
* Controls how long block will need to wait to be refreshed again, in
* excess of the cycle time, i.e., in the case of all zero motion, block
* will be refreshed every (100/percent_refresh + time_for_refresh) frames.
@@ -113,7 +117,6 @@
/*!\cond */
int qindex_delta[3];
- double weight_segment;
int apply_cyclic_refresh;
int skip_over4x4;
int counter_encode_maxq_scene_change;
@@ -226,7 +229,7 @@
/*!\brief Initialize counters used for cyclic refresh.
*
- * Initializes cyclic refresh counters cnt_zeromv, actual_num_seg1_blocks and
+ * Initializes cyclic refresh counters actual_num_seg1_blocks and
* actual_num_seg2_blocks.
*
* \ingroup cyclic_refresh
@@ -235,14 +238,14 @@
*
* \param[in] x Pointer to MACROBLOCK structure
*
- * \remark Update the \c x->cnt_zeromv, the \c x->actual_num_seg1_blocks and
- * the \c x->actual_num_seg1_blocks.
+ * \remark Update the \c x->actual_num_seg1_blocks and the
+ * \c x->actual_num_seg2_blocks.
*/
void av1_init_cyclic_refresh_counters(MACROBLOCK *const x);
/*!\brief Accumulate cyclic refresh counters.
*
- * Accumulates cyclic refresh counters cnt_zeromv, actual_num_seg1_blocks and
+ * Accumulates cyclic refresh counters actual_num_seg1_blocks and
 * actual_num_seg2_blocks from MACROBLOCK structure to CYCLIC_REFRESH structure.
*
* \ingroup cyclic_refresh
@@ -252,9 +255,8 @@
* \param[in] cyclic_refresh Pointer to CYCLIC_REFRESH structure
* \param[in] x Pointer to MACROBLOCK structure
*
- * \remark Update the \c cyclic_refresh->cnt_zeromv, the \c
- * cyclic_refresh->actual_num_seg1_blocks and the \c
- * cyclic_refresh->actual_num_seg1_blocks.
+ * \remark Update the \c cyclic_refresh->actual_num_seg1_blocks and the
+ * \c cyclic_refresh->actual_num_seg2_blocks.
*/
void av1_accumulate_cyclic_refresh_counters(
CYCLIC_REFRESH *const cyclic_refresh, const MACROBLOCK *const x);
diff --git a/av1/encoder/aq_variance.c b/av1/encoder/aq_variance.c
index d53d2c9..086928a 100644
--- a/av1/encoder/aq_variance.c
+++ b/av1/encoder/aq_variance.c
@@ -118,18 +118,16 @@
for (i = 0; i < bh; i += 4) {
for (j = 0; j < bw; j += 4) {
if (is_cur_buf_hbd(xd)) {
- var +=
- log(1.0 + cpi->ppi->fn_ptr[BLOCK_4X4].vf(
- x->plane[0].src.buf + i * x->plane[0].src.stride + j,
- x->plane[0].src.stride,
- CONVERT_TO_BYTEPTR(av1_highbd_all_zeros), 0, &sse) /
- 16.0);
+ var += log1p(cpi->ppi->fn_ptr[BLOCK_4X4].vf(
+ x->plane[0].src.buf + i * x->plane[0].src.stride + j,
+ x->plane[0].src.stride,
+ CONVERT_TO_BYTEPTR(av1_highbd_all_zeros), 0, &sse) /
+ 16.0);
} else {
- var +=
- log(1.0 + cpi->ppi->fn_ptr[BLOCK_4X4].vf(
- x->plane[0].src.buf + i * x->plane[0].src.stride + j,
- x->plane[0].src.stride, av1_all_zeros, 0, &sse) /
- 16.0);
+ var += log1p(cpi->ppi->fn_ptr[BLOCK_4X4].vf(
+ x->plane[0].src.buf + i * x->plane[0].src.stride + j,
+ x->plane[0].src.stride, av1_all_zeros, 0, &sse) /
+ 16.0);
}
}
}
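Note: the log(1.0 + x) -> log1p(x) rewrites are behavior-preserving and slightly
more accurate for small x, since log1p() avoids the rounding in forming 1.0 + x.
A standalone illustration:

  #include <math.h>
  #include <stdio.h>

  int main(void) {
    const double x = 1e-17;
    /* 1.0 + x rounds to exactly 1.0 in double, so log() returns 0. */
    printf("log(1+x) = %g\n", log(1.0 + x));
    /* log1p() keeps the low-order bits: result is ~1e-17. */
    printf("log1p(x) = %g\n", log1p(x));
    return 0;
  }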
@@ -184,9 +182,9 @@
return (unsigned int)((uint64_t)var * 256) >> num_pels_log2_lookup[bs];
}
-double av1_log_block_wavelet_energy(MACROBLOCK *x, BLOCK_SIZE bs) {
+static double log_block_wavelet_energy(MACROBLOCK *x, BLOCK_SIZE bs) {
unsigned int haar_sad = haar_ac_energy(x, bs);
- return log(haar_sad + 1.0);
+ return log1p(haar_sad);
}
int av1_block_wavelet_energy_level(const AV1_COMP *cpi, MACROBLOCK *x,
@@ -195,7 +193,7 @@
energy_midpoint = (is_stat_consumption_stage_twopass(cpi))
? cpi->twopass_frame.frame_avg_haar_energy
: DEFAULT_E_MIDPOINT;
- energy = av1_log_block_wavelet_energy(x, bs) - energy_midpoint;
+ energy = log_block_wavelet_energy(x, bs) - energy_midpoint;
return clamp((int)round(energy), ENERGY_MIN, ENERGY_MAX);
}
diff --git a/av1/encoder/arm/crc32/hash_crc32.c b/av1/encoder/arm/crc32/hash_crc32.c
index dd8685d..771496c 100644
--- a/av1/encoder/arm/crc32/hash_crc32.c
+++ b/av1/encoder/arm/crc32/hash_crc32.c
@@ -13,6 +13,8 @@
#include <stddef.h>
#include <arm_acle.h>
+#include "config/aom_config.h"
+
#define CRC_LOOP(op, crc, type, buf, len) \
while ((len) >= sizeof(type)) { \
(crc) = op((crc), *(type *)(buf)); \
@@ -37,7 +39,7 @@
const uint8_t *buf = p;
uint32_t crc = 0xFFFFFFFF;
-#if !defined(__aarch64__)
+#if !AOM_ARCH_AARCH64
// Align input to 8-byte boundary (only necessary for 32-bit builds.)
while (len && ((uintptr_t)buf & 7)) {
crc = __crc32cb(crc, *buf++);
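Note: replacing defined(__aarch64__) with AOM_ARCH_AARCH64 moves the
architecture test into the generated config header; presumably this keeps
compilers that spell the target differently (for example MSVC's arm64 target,
which defines _M_ARM64 rather than __aarch64__) on the 64-bit path. The guard
pattern, assuming the header defines the macro to 0 or 1:

  #include "config/aom_config.h"  /* generated; defines AOM_ARCH_AARCH64 */

  #if AOM_ARCH_AARCH64
  /* 64-bit Arm path: 8-byte CRC ops, across-vector reductions. */
  #else
  /* 32-bit Arm fallback: align the buffer, use narrower operations. */
  #endif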
diff --git a/av1/encoder/arm/neon/av1_error_neon.c b/av1/encoder/arm/neon/av1_error_neon.c
index 124c1fd..7d24c7d 100644
--- a/av1/encoder/arm/neon/av1_error_neon.c
+++ b/av1/encoder/arm/neon/av1_error_neon.c
@@ -11,6 +11,8 @@
#include <arm_neon.h>
#include <assert.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/aom_dsp_common.h"
#include "aom_dsp/arm/mem_neon.h"
@@ -48,7 +50,7 @@
block_size -= 8;
} while (block_size != 0);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
*ssz = vaddvq_s64(sqcoeff);
return vaddvq_s64(error);
#else
diff --git a/av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c b/av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c
index 8a282b3..ee8b115 100644
--- a/av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c
+++ b/av1/encoder/arm/neon/av1_fwd_txfm2d_neon.c
@@ -24,7 +24,7 @@
static INLINE void transpose_16bit_4x4(const int16x8_t *const in,
int16x8_t *const out) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const int16x8_t a0 = vzip1q_s16(in[0], in[1]);
const int16x8_t a1 = vzip1q_s16(in[2], in[3]);
#else
@@ -45,7 +45,7 @@
static INLINE void transpose_16bit_4x8(const int16x8_t *const in,
int16x8_t *const out) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const int16x8_t a0 = vzip1q_s16(in[0], in[1]);
const int16x8_t a1 = vzip1q_s16(in[2], in[3]);
const int16x8_t a2 = vzip1q_s16(in[4], in[5]);
@@ -67,7 +67,7 @@
const int32x4x2_t b13 =
vzipq_s32(vreinterpretq_s32_s16(a2), vreinterpretq_s32_s16(a3));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
out[0] = vreinterpretq_s16_s64(vzip1q_s64(vreinterpretq_s64_s32(b02.val[0]),
vreinterpretq_s64_s32(b13.val[0])));
out[1] = vreinterpretq_s16_s64(vzip2q_s64(vreinterpretq_s64_s32(b02.val[0]),
@@ -100,7 +100,7 @@
const int32x4_t zeros = vdupq_n_s32(0);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
out[0] = vreinterpretq_s16_s64(vzip1q_s64(vreinterpretq_s64_s32(b01.val[0]),
vreinterpretq_s64_s32(zeros)));
out[1] = vreinterpretq_s16_s64(vzip2q_s64(vreinterpretq_s64_s32(b01.val[0]),
@@ -149,7 +149,7 @@
const int32x4x2_t b37 = vzipq_s32(vreinterpretq_s32_s16(a26.val[1]),
vreinterpretq_s32_s16(a37.val[1]));
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
out[0] = vreinterpretq_s16_s64(vzip1q_s64(vreinterpretq_s64_s32(b04.val[0]),
vreinterpretq_s64_s32(b15.val[0])));
out[1] = vreinterpretq_s16_s64(vzip2q_s64(vreinterpretq_s64_s32(b04.val[0]),
@@ -254,6 +254,16 @@
vst1q_s32((b + 4), vmovl_s16(vget_high_s16(a)));
}
+static INLINE void store_output_32bit_w8(int32_t *const out,
+ const int32x4_t *const in1,
+ const int32x4_t *const in2,
+ const int stride, const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ vst1q_s32(out + stride * i, in1[i]);
+ vst1q_s32(out + stride * i + 4, in2[i]);
+ }
+}
+
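Note: store_output_32bit_w8() writes each 8-wide row straight to its strided
destination, which is what lets the transpose_32_4x4x2() register shuffles be
deleted further down in this file. A hedged caller sketch (names hypothetical):

  #include <arm_neon.h>
  #include <stdint.h>

  /* After the row transform leaves row i split across bufA[i] (lanes 0..3)
   * and bufB[i] (lanes 4..7), the rows go out with plain strided stores. */
  static void store_rows_w8(int32_t *output, const int32x4_t *bufA,
                            const int32x4_t *bufB, int stride, int rows) {
    for (int i = 0; i < rows; ++i) {
      vst1q_s32(output + stride * i, bufA[i]);
      vst1q_s32(output + stride * i + 4, bufB[i]);
    }
  }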
static INLINE void store_rect_16bit_to_32bit_w4(
const int16x8_t a, int32_t *const b, const int16x4_t *v_newsqrt2,
const int32x4_t *v_newsqrt2bits) {
@@ -2329,8 +2339,7 @@
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_4x4(buf, buf);
- store_buffer_16bit_to_32bit_w4(buf, output, width, height);
+ store_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_4x8_neon(const int16_t *input, int32_t *output,
@@ -2371,8 +2380,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x4(buf, buf);
- store_rect_buffer_16bit_to_32bit_w4(buf, output, width, height);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_4x16_neon(const int16_t *input, int32_t *output,
@@ -2415,8 +2423,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x4(buf, buf);
- store_buffer_16bit_to_32bit_w4(buf, output + 8 * width * i, width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2456,8 +2463,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output, width, height);
+ store_rect_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_8x8_neon(const int16_t *input, int32_t *output,
@@ -2496,8 +2502,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output, width, height);
+ store_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_8x16_neon(const int16_t *input, int32_t *output,
@@ -2540,8 +2545,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, 8);
}
}
@@ -2587,8 +2591,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2632,10 +2635,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_4x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output, width, height);
- transpose_16bit_4x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8, width, height);
+ store_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_16x8_neon(const int16_t *input, int32_t *output,
@@ -2678,10 +2678,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output, width, height);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8, width, height);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_16x16_neon(const int16_t *input, int32_t *output,
@@ -2727,11 +2724,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2781,12 +2774,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf0, height, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8,
- width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_16x32_c(input, output, stride, tx_type, bd);
@@ -2836,18 +2824,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf, width, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- height);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- height);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, height);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, height);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x16_c(input, output, stride, tx_type, bd);
@@ -2898,18 +2875,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit_vector(buf, width, &v_shift2);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8,
- width, 8);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_rect_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, 8);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_rect_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x16_c(input, output, stride, tx_type, bd);
@@ -2959,17 +2925,7 @@
}
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- 8);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, 8);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x32_c(input, output, stride, tx_type, bd);
@@ -3009,13 +2965,10 @@
int16x8_t *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit(buf, width, shift[2]);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < 4; ++j) {
- int16x8_t *buf8 = buf + 8 * j;
- transpose_16bit_8x8(buf8, buf8);
- store_buffer_16bit_to_32bit_w8(buf8, output8 + 8 * j, 32, 8);
- }
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, 16, 32);
}
+ // Zero out the bottom 16x32 area.
+ memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
void av1_lowbd_fwd_txfm2d_16x64_neon(const int16_t *input, int32_t *output,
@@ -3051,15 +3004,8 @@
int16x8_t *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row, NULL);
round_shift_16bit(buf, width, shift[2]);
- int32_t *output8 = output + 8 * width * i;
- for (int j = 0; j < width_div8; ++j) {
- int16x8_t *buf8 = buf + 8 * j;
- transpose_16bit_8x8(buf8, buf8);
- store_buffer_16bit_to_32bit_w8(buf8, output8 + 8 * j, width, 8);
- }
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, 32, 16);
}
- // Zero out the bottom 16x32 area.
- memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
#define TRANSPOSE_4X4_L32(x0, x1, x2, x3, y0, y1, y2, y3) \
@@ -3074,17 +3020,6 @@
y3 = y23.val[1]; \
} while (0)
-static INLINE void transpose_32_4x4x2(int stride, const int32x4_t *inputA,
- const int32x4_t *inputB,
- int32x4_t *output) {
- TRANSPOSE_4X4_L32(inputA[0], inputA[2], inputA[1], inputA[3],
- output[0 * stride], output[1 * stride], output[2 * stride],
- output[3 * stride]);
- TRANSPOSE_4X4_L32(inputB[0], inputB[2], inputB[1], inputB[3],
- output[4 * stride], output[5 * stride], output[6 * stride],
- output[7 * stride]);
-}
-
static void av1_fdct32_new_neon(int32x4_t *input, int32x4_t *output,
int cos_bit, const int stride,
const int8_t *stage_range) {
@@ -4259,11 +4194,7 @@
av1_round_shift_array_32_neon(bufA, bufA, 32);
av1_round_shift_array_32_neon(bufB, bufB, 32);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < width_div8; ++j) {
- int32x4_t *out = (int32x4_t *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
static void av1_lowbd_fwd_txfm2d_64x32_neon(const int16_t *input,
@@ -4306,11 +4237,7 @@
av1_round_shift_rect_array_32_neon(bufA, bufA, 32);
av1_round_shift_rect_array_32_neon(bufB, bufB, 32);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < width_div8; ++j) {
- int32x4_t *out = (int32x4_t *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
@@ -4356,11 +4283,7 @@
av1_round_shift_rect_array_32_neon(bufA, bufA, 32);
av1_round_shift_rect_array_32_neon(bufB, bufB, 32);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < (32 / 4); ++j) {
- int32x4_t *out = (int32x4_t *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
diff --git a/av1/encoder/arm/neon/av1_highbd_quantize_neon.c b/av1/encoder/arm/neon/av1_highbd_quantize_neon.c
index 197eae0..11d3def 100644
--- a/av1/encoder/arm/neon/av1_highbd_quantize_neon.c
+++ b/av1/encoder/arm/neon/av1_highbd_quantize_neon.c
@@ -11,6 +11,8 @@
#include <arm_neon.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/arm/mem_neon.h"
#include "av1/common/quant_common.h"
@@ -65,7 +67,7 @@
}
static INLINE uint16_t get_max_eob(int16x8_t v_eobmax) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
return (uint16_t)vmaxvq_s16(v_eobmax);
#else
const int16x4_t v_eobmax_3210 =
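Note: get_max_eob() reduces the eight packed eob candidates to their maximum.
On AArch64 a single across-vector instruction does it; on 32-bit Arm the
fallback folds pairwise. A sketch of both paths (assumes config/aom_config.h
is included, as elsewhere in this patch):

  #include <arm_neon.h>
  #include <stdint.h>

  static int16_t max_lane_s16(int16x8_t v) {
  #if AOM_ARCH_AARCH64
    return vmaxvq_s16(v);  /* across-vector max, one instruction */
  #else
    int16x4_t m = vmax_s16(vget_low_s16(v), vget_high_s16(v));
    m = vpmax_s16(m, m);  /* fold 4 -> 2 */
    m = vpmax_s16(m, m);  /* fold 2 -> 1 */
    return vget_lane_s16(m, 0);
  #endif
  }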
diff --git a/av1/encoder/arm/neon/av1_k_means_neon.c b/av1/encoder/arm/neon/av1_k_means_neon.c
new file mode 100644
index 0000000..d13cc65
--- /dev/null
+++ b/av1/encoder/arm/neon/av1_k_means_neon.c
@@ -0,0 +1,115 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All Rights Reserved.
+ *
+ * Use of this source code is governed by a BSD-style license
+ * that can be found in the LICENSE file in the root of the source
+ * tree. An additional intellectual property rights grant can be found
+ * in the file PATENTS. All contributing project authors may
+ * be found in the AUTHORS file in the root of the source tree.
+ */
+
+#include <arm_neon.h>
+
+#include "aom_dsp/arm/sum_neon.h"
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+static int32x4_t k_means_multiply_add_neon(const int16x8_t a) {
+ const int32x4_t l = vmull_s16(vget_low_s16(a), vget_low_s16(a));
+ const int32x4_t h = vmull_s16(vget_high_s16(a), vget_high_s16(a));
+#if AOM_ARCH_AARCH64
+ return vpaddq_s32(l, h);
+#else
+ const int32x2_t dl = vpadd_s32(vget_low_s32(l), vget_high_s32(l));
+ const int32x2_t dh = vpadd_s32(vget_low_s32(h), vget_high_s32(h));
+ return vcombine_s32(dl, dh);
+#endif
+}
+
+void av1_calc_indices_dim1_neon(const int16_t *data, const int16_t *centroids,
+ uint8_t *indices, int64_t *total_dist, int n,
+ int k) {
+ int64x2_t sum = vdupq_n_s64(0);
+ int16x8_t cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ cents[j] = vdupq_n_s16(centroids[j]);
+ }
+
+ for (int i = 0; i < n; i += 8) {
+ const int16x8_t in = vld1q_s16(data);
+ uint16x8_t ind = vdupq_n_u16(0);
+ // Compute the distance to the first centroid.
+ int16x8_t dist_min = vabdq_s16(in, cents[0]);
+
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ const int16x8_t dist = vabdq_s16(in, cents[j]);
+ // Compare to the minimal one.
+ const uint16x8_t cmp = vcgtq_s16(dist_min, dist);
+ dist_min = vminq_s16(dist_min, dist);
+ const uint16x8_t ind1 = vdupq_n_u16(j);
+ ind = vbslq_u16(cmp, ind1, ind);
+ }
+ if (total_dist) {
+ // Square, convert to 32 bit and add together.
+ const int32x4_t l =
+ vmull_s16(vget_low_s16(dist_min), vget_low_s16(dist_min));
+ const int32x4_t sum32_tmp =
+ vmlal_s16(l, vget_high_s16(dist_min), vget_high_s16(dist_min));
+ // Pairwise sum, convert to 64 bit and add to sum.
+ sum = vpadalq_s32(sum, sum32_tmp);
+ }
+ vst1_u8(indices, vmovn_u16(ind));
+ indices += 8;
+ data += 8;
+ }
+ if (total_dist) {
+ *total_dist = horizontal_add_s64x2(sum);
+ }
+}
+
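Note: a scalar reference for what the dim-1 kernel above computes may help when
reading the vector code: per sample, the index of the nearest centroid (ties
keep the lower index, matching the strict vcgtq_s16 compare), plus an optional
total squared distance. Sketch only; the NEON path additionally assumes n is a
multiple of 8:

  #include <stdint.h>
  #include <stdlib.h>

  static void calc_indices_dim1_ref(const int16_t *data,
                                    const int16_t *centroids,
                                    uint8_t *indices, int64_t *total_dist,
                                    int n, int k) {
    int64_t sum = 0;
    for (int i = 0; i < n; ++i) {
      int best = 0;
      int best_d = abs(data[i] - centroids[0]);
      for (int j = 1; j < k; ++j) {
        const int d = abs(data[i] - centroids[j]);
        if (d < best_d) {  /* strict compare keeps the lower index on ties */
          best_d = d;
          best = j;
        }
      }
      indices[i] = (uint8_t)best;
      sum += (int64_t)best_d * best_d;
    }
    if (total_dist) *total_dist = sum;
  }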
+void av1_calc_indices_dim2_neon(const int16_t *data, const int16_t *centroids,
+ uint8_t *indices, int64_t *total_dist, int n,
+ int k) {
+ int64x2_t sum = vdupq_n_s64(0);
+ uint32x4_t ind[2];
+ int16x8_t cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ const int16_t cx = centroids[2 * j], cy = centroids[2 * j + 1];
+ const int16_t cxcy[8] = { cx, cy, cx, cy, cx, cy, cx, cy };
+ cents[j] = vld1q_s16(cxcy);
+ }
+
+ for (int i = 0; i < n; i += 8) {
+ for (int l = 0; l < 2; ++l) {
+ const int16x8_t in = vld1q_s16(data);
+ ind[l] = vdupq_n_u32(0);
+ // Compute the distance to the first centroid.
+ int16x8_t d1 = vsubq_s16(in, cents[0]);
+ int32x4_t dist_min = k_means_multiply_add_neon(d1);
+
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ d1 = vsubq_s16(in, cents[j]);
+ const int32x4_t dist = k_means_multiply_add_neon(d1);
+ // Compare to the minimal one.
+ const uint32x4_t cmp = vcgtq_s32(dist_min, dist);
+ dist_min = vminq_s32(dist_min, dist);
+ const uint32x4_t ind1 = vdupq_n_u32(j);
+ ind[l] = vbslq_u32(cmp, ind1, ind[l]);
+ }
+ if (total_dist) {
+ // Pairwise sum, convert to 64 bit and add to sum.
+ sum = vpadalq_s32(sum, dist_min);
+ }
+ data += 8;
+ }
+ // Cast to 8 bit and store.
+ vst1_u8(indices,
+ vmovn_u16(vcombine_u16(vmovn_u32(ind[0]), vmovn_u32(ind[1]))));
+ indices += 8;
+ }
+ if (total_dist) {
+ *total_dist = horizontal_add_s64x2(sum);
+ }
+}
diff --git a/av1/encoder/arm/neon/av1_temporal_denoiser_neon.c b/av1/encoder/arm/neon/av1_temporal_denoiser_neon.c
index 3528105..18cd0ce 100644
--- a/av1/encoder/arm/neon/av1_temporal_denoiser_neon.c
+++ b/av1/encoder/arm/neon/av1_temporal_denoiser_neon.c
@@ -24,7 +24,7 @@
// Compute the sum of all pixel differences of this MB.
static INLINE int horizontal_add_s8x16(const int8x16_t v_sum_diff_total) {
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
return vaddlvq_s8(v_sum_diff_total);
#else
const int16x8_t fe_dc_ba_98_76_54_32_10 = vpaddlq_s8(v_sum_diff_total);
diff --git a/av1/encoder/arm/neon/encodetxb_neon.c b/av1/encoder/arm/neon/encodetxb_neon.c
index 9bb822a..ee93608 100644
--- a/av1/encoder/arm/neon/encodetxb_neon.c
+++ b/av1/encoder/arm/neon/encodetxb_neon.c
@@ -13,31 +13,33 @@
#include <assert.h>
#include <math.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/arm/mem_neon.h"
#include "av1/common/txb_common.h"
#include "av1/encoder/encodetxb.h"
void av1_txb_init_levels_neon(const tran_low_t *const coeff, const int width,
const int height, uint8_t *const levels) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
memset(levels - TX_PAD_TOP * stride, 0,
sizeof(*levels) * TX_PAD_TOP * stride);
- memset(levels + stride * height, 0,
+ memset(levels + stride * width, 0,
sizeof(*levels) * (TX_PAD_BOTTOM * stride + TX_PAD_END));
const int32x4_t zeros = vdupq_n_s32(0);
int i = 0;
uint8_t *ls = levels;
const tran_low_t *cf = coeff;
- if (width == 4) {
+ if (height == 4) {
do {
const int32x4_t coeffA = vld1q_s32(cf);
- const int32x4_t coeffB = vld1q_s32(cf + width);
+ const int32x4_t coeffB = vld1q_s32(cf + height);
const int16x8_t coeffAB =
vcombine_s16(vqmovn_s32(coeffA), vqmovn_s32(coeffB));
const int16x8_t absAB = vqabsq_s16(coeffAB);
const int8x8_t absABs = vqmovn_s16(absAB);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const int8x16_t absAB8 =
vcombine_s8(absABs, vreinterpret_s8_s32(vget_low_s32(zeros)));
const uint8x16_t lsAB =
@@ -50,10 +52,10 @@
#endif
vst1q_u8(ls, lsAB);
ls += (stride << 1);
- cf += (width << 1);
+ cf += (height << 1);
i += 2;
- } while (i < height);
- } else if (width == 8) {
+ } while (i < width);
+ } else if (height == 8) {
do {
const int32x4_t coeffA = vld1q_s32(cf);
const int32x4_t coeffB = vld1q_s32(cf + 4);
@@ -64,9 +66,9 @@
vqmovn_s16(absAB), vreinterpret_s8_s32(vget_low_s32(zeros))));
vst1q_u8(ls, absAB8);
ls += stride;
- cf += width;
+ cf += height;
i += 1;
- } while (i < height);
+ } while (i < width);
} else {
do {
int j = 0;
@@ -86,18 +88,18 @@
vst1q_u8((ls + j), absABCD);
j += 16;
cf += 16;
- } while (j < width);
- *(int32_t *)(ls + width) = 0;
+ } while (j < height);
+ *(int32_t *)(ls + height) = 0;
ls += stride;
i += 1;
- } while (i < height);
+ } while (i < width);
}
}
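Note: the width/height swaps throughout this file track a coefficient-layout
change: coefficients now arrive in transposed order, so each padded line of
levels[] spans the transform height and there is one line per column. An
indexing sketch under that reading (TX_PAD_HOR assumed to be 4, as in
txb_common.h):

  #include <stddef.h>

  #define TX_PAD_HOR 4  /* assumption for illustration */

  /* Element (row, col) of the coefficient block maps to the column-major
   * position col * stride + row in levels[]. */
  static size_t levels_index(int row, int col, int tx_height) {
    const size_t stride = (size_t)tx_height + TX_PAD_HOR;
    return (size_t)col * stride + (size_t)row;
  }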
// get_4_nz_map_contexts_2d coefficients:
static const DECLARE_ALIGNED(16, uint8_t, c_4_po_2d[2][16]) = {
{ 0, 1, 6, 6, 1, 6, 6, 21, 6, 6, 21, 21, 6, 21, 21, 21 },
- { 0, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21, 21, 6, 21, 21, 21 }
+ { 0, 16, 16, 16, 16, 16, 16, 16, 6, 6, 21, 21, 6, 21, 21, 21 }
};
// get_4_nz_map_contexts_hor coefficients:
@@ -108,7 +110,7 @@
/* clang-format on */
// get_4_nz_map_contexts_hor coefficients:
-static const DECLARE_ALIGNED(16, uint8_t, c_4_po_ver[16]) = {
+static const DECLARE_ALIGNED(16, uint8_t, c_4_po_hor[16]) = {
SIG_COEF_CONTEXTS_2D + 0, SIG_COEF_CONTEXTS_2D + 0,
SIG_COEF_CONTEXTS_2D + 0, SIG_COEF_CONTEXTS_2D + 0,
SIG_COEF_CONTEXTS_2D + 5, SIG_COEF_CONTEXTS_2D + 5,
@@ -120,25 +122,25 @@
};
// get_8_coeff_contexts_2d coefficients:
-// if (height == 8)
+// if (width == 8)
static const DECLARE_ALIGNED(16, uint8_t, c_8_po_2d_8[2][16]) = {
{ 0, 1, 6, 6, 21, 21, 21, 21, 1, 6, 6, 21, 21, 21, 21, 21 },
{ 6, 6, 21, 21, 21, 21, 21, 21, 6, 21, 21, 21, 21, 21, 21, 21 }
};
-// if (height < 8)
+// if (width < 8)
static const DECLARE_ALIGNED(16, uint8_t, c_8_po_2d_l[2][16]) = {
- { 0, 16, 6, 6, 21, 21, 21, 21, 16, 16, 6, 21, 21, 21, 21, 21 },
- { 16, 16, 21, 21, 21, 21, 21, 21, 16, 16, 21, 21, 21, 21, 21, 21 }
+ { 0, 11, 6, 6, 21, 21, 21, 21, 11, 11, 6, 21, 21, 21, 21, 21 },
+ { 11, 11, 21, 21, 21, 21, 21, 21, 11, 11, 21, 21, 21, 21, 21, 21 }
};
-// if (height > 8)
+// if (width > 8)
static const DECLARE_ALIGNED(16, uint8_t, c_8_po_2d_g[2][16]) = {
- { 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11 },
+ { 0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16 },
{ 6, 6, 21, 21, 21, 21, 21, 21, 6, 21, 21, 21, 21, 21, 21, 21 }
};
// get_8_coeff_contexts_ver coefficients:
-static const DECLARE_ALIGNED(16, uint8_t, c_8_po_hor[16]) = {
+static const DECLARE_ALIGNED(16, uint8_t, c_8_po_ver[16]) = {
SIG_COEF_CONTEXTS_2D + 0, SIG_COEF_CONTEXTS_2D + 5,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
@@ -158,22 +160,22 @@
{ 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 }
};
-// real_width > real_height
+// real_width < real_height
static const DECLARE_ALIGNED(16, uint8_t, c_16_po_2d_g[3][16]) = {
- { 0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 },
- { 16, 16, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 },
- { 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 }
+ { 0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 },
+ { 11, 11, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 },
+ { 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 }
};
-// real_width < real_height
+// real_width > real_height
static const DECLARE_ALIGNED(16, uint8_t, c_16_po_2d_l[3][16]) = {
- { 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11 },
+ { 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16 },
{ 6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 },
{ 6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21 }
};
// get_16n_coeff_contexts_ver coefficients:
-static const DECLARE_ALIGNED(16, uint8_t, c_16_po_hor[16]) = {
+static const DECLARE_ALIGNED(16, uint8_t, c_16_po_ver[16]) = {
SIG_COEF_CONTEXTS_2D + 0, SIG_COEF_CONTEXTS_2D + 5,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
@@ -188,7 +190,7 @@
static INLINE uint8x16_t load_8bit_4x4_to_1_reg(const uint8_t *const src,
const int byte_stride) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
uint32x4_t v_data = vld1q_u32((uint32_t *)src);
v_data = vld1q_lane_u32((uint32_t *)(src + 1 * byte_stride), v_data, 1);
v_data = vld1q_lane_u32((uint32_t *)(src + 2 * byte_stride), v_data, 2);
@@ -202,7 +204,7 @@
static INLINE uint8x16_t load_8bit_8x2_to_1_reg(const uint8_t *const src,
const int byte_stride) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
uint64x2_t v_data = vld1q_u64((uint64_t *)src);
v_data = vld1q_lane_u64((uint64_t *)(src + 1 * byte_stride), v_data, 1);
@@ -273,22 +275,22 @@
}
static INLINE void get_4_nz_map_contexts_2d(const uint8_t *levels,
- const int height,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *const coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
const uint8x16_t pos_to_offset_large = vdupq_n_u8(21);
uint8x16_t pos_to_offset =
- vld1q_u8((height == 4) ? c_4_po_2d[0] : c_4_po_2d[1]);
+ vld1q_u8((width == 4) ? c_4_po_2d[0] : c_4_po_2d[1]);
uint8x16_t count;
uint8x16_t level[5];
uint8_t *cc = coeff_contexts;
- assert(!(height % 4));
+ assert(!(width % 4));
- int row = height;
+ int col = width;
do {
load_levels_4x4x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -297,14 +299,14 @@
pos_to_offset = pos_to_offset_large;
levels += 4 * stride;
cc += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
coeff_contexts[0] = 0;
}
-static INLINE void get_4_nz_map_contexts_hor(const uint8_t *levels,
- const int height,
+static INLINE void get_4_nz_map_contexts_ver(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
@@ -315,9 +317,9 @@
uint8x16_t count;
uint8x16_t level[5];
- assert(!(height % 4));
+ assert(!(width % 4));
- int row = height;
+ int col = width;
do {
load_levels_4x4x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -325,25 +327,25 @@
vst1q_u8(coeff_contexts, count);
levels += 4 * stride;
coeff_contexts += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
}
-static INLINE void get_4_nz_map_contexts_ver(const uint8_t *levels,
- const int height,
+static INLINE void get_4_nz_map_contexts_hor(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
const uint8x16_t pos_to_offset_large = vdupq_n_u8(SIG_COEF_CONTEXTS_2D + 10);
- uint8x16_t pos_to_offset = vld1q_u8(c_4_po_ver);
+ uint8x16_t pos_to_offset = vld1q_u8(c_4_po_hor);
uint8x16_t count;
uint8x16_t level[5];
- assert(!(height % 4));
+ assert(!(width % 4));
- int row = height;
+ int col = width;
do {
load_levels_4x4x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -352,12 +354,12 @@
pos_to_offset = pos_to_offset_large;
levels += 4 * stride;
coeff_contexts += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
}
static INLINE void get_8_coeff_contexts_2d(const uint8_t *levels,
- const int height,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
@@ -366,12 +368,12 @@
uint8x16_t level[5];
uint8x16_t pos_to_offset[3];
- assert(!(height % 2));
+ assert(!(width % 2));
- if (height == 8) {
+ if (width == 8) {
pos_to_offset[0] = vld1q_u8(c_8_po_2d_8[0]);
pos_to_offset[1] = vld1q_u8(c_8_po_2d_8[1]);
- } else if (height < 8) {
+ } else if (width < 8) {
pos_to_offset[0] = vld1q_u8(c_8_po_2d_l[0]);
pos_to_offset[1] = vld1q_u8(c_8_po_2d_l[1]);
} else {
@@ -380,7 +382,7 @@
}
pos_to_offset[2] = vdupq_n_u8(21);
- int row = height;
+ int col = width;
do {
load_levels_8x2x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -390,26 +392,26 @@
pos_to_offset[1] = pos_to_offset[2];
levels += 2 * stride;
cc += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
coeff_contexts[0] = 0;
}
-static INLINE void get_8_coeff_contexts_hor(const uint8_t *levels,
- const int height,
+static INLINE void get_8_coeff_contexts_ver(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
- const uint8x16_t pos_to_offset = vld1q_u8(c_8_po_hor);
+ const uint8x16_t pos_to_offset = vld1q_u8(c_8_po_ver);
uint8x16_t count;
uint8x16_t level[5];
- assert(!(height % 2));
+ assert(!(width % 2));
- int row = height;
+ int col = width;
do {
load_levels_8x2x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -417,12 +419,12 @@
vst1q_u8(coeff_contexts, count);
levels += 2 * stride;
coeff_contexts += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
}
-static INLINE void get_8_coeff_contexts_ver(const uint8_t *levels,
- const int height,
+static INLINE void get_8_coeff_contexts_hor(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
@@ -434,9 +436,9 @@
uint8x16_t count;
uint8x16_t level[5];
- assert(!(height % 2));
+ assert(!(width % 2));
- int row = height;
+ int col = width;
do {
load_levels_8x2x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -445,8 +447,8 @@
pos_to_offset = pos_to_offset_large;
levels += 2 * stride;
coeff_contexts += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
}
static INLINE void get_16n_coeff_contexts_2d(const uint8_t *levels,
@@ -455,15 +457,15 @@
const int width, const int height,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
uint8_t *cc = coeff_contexts;
- int row = height;
+ int col = width;
uint8x16_t pos_to_offset[5];
uint8x16_t pos_to_offset_large[3];
uint8x16_t count;
uint8x16_t level[5];
- assert(!(width % 16));
+ assert(!(height % 16));
pos_to_offset_large[2] = vdupq_n_u8(21);
if (real_width == real_height) {
@@ -473,22 +475,22 @@
pos_to_offset[3] = vld1q_u8(c_16_po_2d_e[3]);
pos_to_offset[4] = pos_to_offset_large[0] = pos_to_offset_large[1] =
pos_to_offset_large[2];
- } else if (real_width > real_height) {
+ } else if (real_width < real_height) {
pos_to_offset[0] = vld1q_u8(c_16_po_2d_g[0]);
pos_to_offset[1] = vld1q_u8(c_16_po_2d_g[1]);
pos_to_offset[2] = pos_to_offset[3] = pos_to_offset[4] =
vld1q_u8(c_16_po_2d_g[2]);
pos_to_offset_large[0] = pos_to_offset_large[1] = pos_to_offset_large[2];
- } else { // real_width < real_height
+ } else { // real_width > real_height
pos_to_offset[0] = pos_to_offset[1] = vld1q_u8(c_16_po_2d_l[0]);
pos_to_offset[2] = vld1q_u8(c_16_po_2d_l[1]);
pos_to_offset[3] = vld1q_u8(c_16_po_2d_l[2]);
pos_to_offset[4] = pos_to_offset_large[2];
- pos_to_offset_large[0] = pos_to_offset_large[1] = vdupq_n_u8(11);
+ pos_to_offset_large[0] = pos_to_offset_large[1] = vdupq_n_u8(16);
}
do {
- int w = width;
+ int h = height;
do {
load_levels_16x1x5(levels, stride, offsets, level);
@@ -497,9 +499,9 @@
vst1q_u8(cc, count);
levels += 16;
cc += 16;
- w -= 16;
+ h -= 16;
pos_to_offset[0] = pos_to_offset_large[0];
- } while (w);
+ } while (h);
pos_to_offset[0] = pos_to_offset[1];
pos_to_offset[1] = pos_to_offset[2];
@@ -508,29 +510,29 @@
pos_to_offset_large[0] = pos_to_offset_large[1];
pos_to_offset_large[1] = pos_to_offset_large[2];
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
coeff_contexts[0] = 0;
}
-static INLINE void get_16n_coeff_contexts_hor(const uint8_t *levels,
+static INLINE void get_16n_coeff_contexts_ver(const uint8_t *levels,
const int width, const int height,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
const uint8x16_t pos_to_offset_large = vdupq_n_u8(SIG_COEF_CONTEXTS_2D + 10);
uint8x16_t count;
uint8x16_t level[5];
- assert(!(width % 16));
+ assert(!(height % 16));
- int row = height;
+ int col = width;
do {
- uint8x16_t pos_to_offset = vld1q_u8(c_16_po_hor);
+ uint8x16_t pos_to_offset = vld1q_u8(c_16_po_ver);
- int w = width;
+ int h = height;
do {
load_levels_16x1x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -539,32 +541,32 @@
pos_to_offset = pos_to_offset_large;
levels += 16;
coeff_contexts += 16;
- w -= 16;
- } while (w);
+ h -= 16;
+ } while (h);
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
}
-static INLINE void get_16n_coeff_contexts_ver(const uint8_t *levels,
+static INLINE void get_16n_coeff_contexts_hor(const uint8_t *levels,
const int width, const int height,
const ptrdiff_t *const offsets,
uint8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
uint8x16_t pos_to_offset[3];
uint8x16_t count;
uint8x16_t level[5];
- assert(!(width % 16));
+ assert(!(height % 16));
pos_to_offset[0] = vdupq_n_u8(SIG_COEF_CONTEXTS_2D + 0);
pos_to_offset[1] = vdupq_n_u8(SIG_COEF_CONTEXTS_2D + 5);
pos_to_offset[2] = vdupq_n_u8(SIG_COEF_CONTEXTS_2D + 10);
- int row = height;
+ int col = width;
do {
- int w = width;
+ int h = height;
do {
load_levels_16x1x5(levels, stride, offsets, level);
count = get_coeff_contexts_kernel(level);
@@ -572,13 +574,13 @@
vst1q_u8(coeff_contexts, count);
levels += 16;
coeff_contexts += 16;
- w -= 16;
- } while (w);
+ h -= 16;
+ } while (h);
pos_to_offset[0] = pos_to_offset[1];
pos_to_offset[1] = pos_to_offset[2];
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
}
// Note: levels[] must be in the range [0, 127], inclusive.
@@ -599,7 +601,7 @@
const int real_height = tx_size_high[tx_size];
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
ptrdiff_t offsets[3];
/* coeff_contexts must be 16 byte aligned. */
@@ -610,43 +612,43 @@
offsets[1] = 1 * stride + 1;
offsets[2] = 2 * stride + 0;
- if (width == 4) {
- get_4_nz_map_contexts_2d(levels, height, offsets, coefficients);
- } else if (width == 8) {
- get_8_coeff_contexts_2d(levels, height, offsets, coefficients);
+ if (height == 4) {
+ get_4_nz_map_contexts_2d(levels, width, offsets, coefficients);
+ } else if (height == 8) {
+ get_8_coeff_contexts_2d(levels, width, offsets, coefficients);
} else {
get_16n_coeff_contexts_2d(levels, real_width, real_height, width, height,
offsets, coefficients);
}
} else if (tx_class == TX_CLASS_HORIZ) {
- offsets[0] = 2;
- offsets[1] = 3;
- offsets[2] = 4;
- if (width == 4) {
- get_4_nz_map_contexts_hor(levels, height, offsets, coefficients);
- } else if (width == 8) {
- get_8_coeff_contexts_hor(levels, height, offsets, coefficients);
+ offsets[0] = 2 * stride;
+ offsets[1] = 3 * stride;
+ offsets[2] = 4 * stride;
+ if (height == 4) {
+ get_4_nz_map_contexts_hor(levels, width, offsets, coefficients);
+ } else if (height == 8) {
+ get_8_coeff_contexts_hor(levels, width, offsets, coefficients);
} else {
get_16n_coeff_contexts_hor(levels, width, height, offsets, coefficients);
}
} else { // TX_CLASS_VERT
- offsets[0] = 2 * stride;
- offsets[1] = 3 * stride;
- offsets[2] = 4 * stride;
- if (width == 4) {
- get_4_nz_map_contexts_ver(levels, height, offsets, coefficients);
- } else if (width == 8) {
- get_8_coeff_contexts_ver(levels, height, offsets, coefficients);
+ offsets[0] = 2;
+ offsets[1] = 3;
+ offsets[2] = 4;
+ if (height == 4) {
+ get_4_nz_map_contexts_ver(levels, width, offsets, coefficients);
+ } else if (height == 8) {
+ get_8_coeff_contexts_ver(levels, width, offsets, coefficients);
} else {
get_16n_coeff_contexts_ver(levels, width, height, offsets, coefficients);
}
}
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int pos = scan[last_idx];
- if (last_idx <= (height << bwl) / 8)
+ if (last_idx <= (width << bhl) / 8)
coeff_contexts[pos] = 1;
- else if (last_idx <= (height << bwl) / 4)
+ else if (last_idx <= (width << bhl) / 4)
coeff_contexts[pos] = 2;
else
coeff_contexts[pos] = 3;
diff --git a/av1/encoder/arm/neon/highbd_fwd_txfm_neon.c b/av1/encoder/arm/neon/highbd_fwd_txfm_neon.c
index 273712a..15d375a 100644
--- a/av1/encoder/arm/neon/highbd_fwd_txfm_neon.c
+++ b/av1/encoder/arm/neon/highbd_fwd_txfm_neon.c
@@ -19,6 +19,14 @@
#include "config/av1_rtcd.h"
#include "config/aom_config.h"
+static INLINE void store_output_w4(int32_t *const out,
+ const int32x4_t *const in, const int stride,
+ const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ vst1q_s32(out + i * stride, in[i]);
+ }
+}
+
static INLINE int32x4_t half_btf_neon(const int32_t *w0, const int32x4_t *n0,
const int32_t *w1, const int32x4_t *n1,
const int32x4_t v_bit) {
@@ -39,7 +47,7 @@
return x;
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
#define TRANSPOSE_4X4(x0, x1, x2, x3, y0, y1, y2, y3) \
do { \
int32x4x2_t swap_low = vtrnq_s32(x0, x1); \
@@ -71,7 +79,11 @@
y3 = vextq_s32(swap_low.val[1], \
vextq_s32(swap_high.val[1], swap_high.val[1], 2), 2); \
} while (0)
-#endif // (__aarch64__)
+#endif // AOM_ARCH_AARCH64
+
+static INLINE void transpose_4x4(const int32x4_t *in, int32x4_t *out) {
+ TRANSPOSE_4X4(in[0], in[1], in[2], in[3], out[0], out[1], out[2], out[3]);
+}
static INLINE void transpose_8x8(const int32x4_t *in, int32x4_t *out) {
TRANSPOSE_4X4(in[0], in[2], in[4], in[6], out[0], out[2], out[4], out[6]);
@@ -215,7 +227,10 @@
u3 = vrshlq_s32(v2, v_bit);
- TRANSPOSE_4X4(u0, u1, u2, u3, out[0], out[1], out[2], out[3]);
+ out[0] = u0;
+ out[1] = u1;
+ out[2] = u2;
+ out[3] = u3;
}
static INLINE void write_buffer_4x4(int32x4_t *res, int32_t *output) {
@@ -237,7 +252,6 @@
int32x4_t t;
int32x4_t s0, s1, s2, s3, s7;
int32x4_t x0, x1, x2, x3;
- int32x4_t u0, u1, u2, u3;
int idx = 0 * num_col;
s0 = vmulq_s32(in[idx], sinpi1);
@@ -261,12 +275,10 @@
s3 = vaddq_s32(t, x3);
const int32x4_t v_bit = vdupq_n_s32(-bit);
- u0 = vrshlq_s32(s0, v_bit);
- u1 = vrshlq_s32(s1, v_bit);
- u2 = vrshlq_s32(s2, v_bit);
- u3 = vrshlq_s32(s3, v_bit);
-
- TRANSPOSE_4X4(u0, u1, u2, u3, out[0], out[1], out[2], out[3]);
+ out[0] = vrshlq_s32(s0, v_bit);
+ out[1] = vrshlq_s32(s1, v_bit);
+ out[2] = vrshlq_s32(s2, v_bit);
+ out[3] = vrshlq_s32(s3, v_bit);
}
static void idtx4x4_neon(int32x4_t *in, int32x4_t *out, int bit, int col_num) {
(void)bit;
@@ -278,8 +290,6 @@
a_low = vmulq_s32(in[i * col_num], fact);
out[i] = vrshrq_n_s32(a_low, NewSqrt2Bits);
}
-
- TRANSPOSE_4X4(out[0], out[1], out[2], out[3], out[0], out[1], out[2], out[3]);
}
void av1_fwd_txfm2d_4x4_neon(const int16_t *input, int32_t *coeff,
int input_stride, TX_TYPE tx_type, int bd) {
@@ -292,96 +302,112 @@
case DCT_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fdct4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fdct4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fdct4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case DCT_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fdct4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_DCT:
load_buffer_4x4(input, in, input_stride, 1, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fdct4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case DCT_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, &v_shift0);
fdct4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_4x4(input, in, input_stride, 1, 1, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_ADST:
load_buffer_4x4(input, in, input_stride, 1, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case IDTX:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
idtx4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fdct4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fdct4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, &v_shift0);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_FLIPADST:
load_buffer_4x4(input, in, input_stride, 1, 0, &v_shift0);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, &v_shift0);
idtx4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_4x4(in, in);
fadst4x4_neon(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
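
Throughout this hunk the transpose moves out of the 1D kernels (the TRANSPOSE_4X4 deletions above) and becomes an explicit transpose_4x4 between the column and row passes; no transpose follows the row pass, so coefficients land in the column-major layout adopted across this patch. A hedged sketch of what a 4x4 int32 NEON transpose of this shape can look like (the actual transpose_4x4 helper may differ):

    #include <arm_neon.h>

    static inline void transpose_4x4_s32(const int32x4_t in[4], int32x4_t out[4]) {
      // vtrnq interleaves pairs of rows; recombining the halves finishes the
      // transpose, so out[c] holds column c of the input.
      const int32x4x2_t p01 = vtrnq_s32(in[0], in[1]);
      const int32x4x2_t p23 = vtrnq_s32(in[2], in[3]);
      out[0] = vcombine_s32(vget_low_s32(p01.val[0]), vget_low_s32(p23.val[0]));
      out[1] = vcombine_s32(vget_low_s32(p01.val[1]), vget_low_s32(p23.val[1]));
      out[2] = vcombine_s32(vget_high_s32(p01.val[0]), vget_high_s32(p23.val[0]));
      out[3] = vcombine_s32(vget_high_s32(p01.val[1]), vget_high_s32(p23.val[1]));
    }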
@@ -827,8 +853,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fdct8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -836,8 +861,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fdct8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case DCT_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -845,8 +869,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -854,8 +877,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_DCT:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -863,8 +885,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fdct8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case DCT_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -872,8 +893,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_8x8(input, in, stride, 1, 1, shift[0]);
@@ -881,8 +901,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -890,8 +909,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_ADST:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -899,8 +917,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case IDTX:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -908,8 +925,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
idtx8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -917,8 +933,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
idtx8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -926,8 +941,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fdct8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -935,8 +949,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
idtx8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -944,8 +957,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_FLIPADST:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -953,8 +965,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
idtx8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -962,8 +973,7 @@
col_txfm_8x8_rounding(out, &v_shift1);
transpose_8x8(out, in);
fadst8x8_neon(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
default: assert(0);
}
@@ -1628,8 +1638,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fdct16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1637,8 +1646,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fdct16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case DCT_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1646,8 +1654,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1655,8 +1662,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_DCT:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1664,8 +1670,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fdct16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case DCT_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1673,8 +1678,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_16x16(input, in, stride, 1, 1, shift[0]);
@@ -1682,8 +1686,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1691,8 +1694,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_ADST:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1700,8 +1702,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case IDTX:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1709,8 +1710,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
idtx16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1718,8 +1718,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
idtx16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1727,8 +1726,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fdct16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1736,8 +1734,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
idtx16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1745,8 +1742,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_FLIPADST:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1754,8 +1750,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
idtx16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1763,8 +1758,7 @@
col_txfm_16x16_rounding(out, &v_shift);
transpose_16x16(out, in);
fadst16x16_neon(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
default: assert(0);
}
@@ -2356,37 +2350,30 @@
cospi = cospi_arr(cos_bit);
for (col = 0; col < col_num; col++) {
// stage 0;
- int32_t stage_idx = 0;
int j;
for (j = 0; j < 4; ++j) {
buf0[j] = input[j * col_num + col];
}
// stage 1
- stage_idx++;
buf1[0] = buf0[3];
buf1[1] = buf0[0];
buf1[2] = buf0[1];
buf1[3] = buf0[2];
// stage 2
- stage_idx++;
-
btf_32_neon_type0(cospi[8], cospi[56], buf1[0], buf1[1], buf0[0], buf0[1],
v_cos_bit);
btf_32_neon_type0(cospi[40], cospi[24], buf1[2], buf1[3], buf0[2], buf0[3],
v_cos_bit);
// stage 3
- stage_idx++;
buf1[0] = vaddq_s32(buf0[0], buf0[2]);
buf1[2] = vsubq_s32(buf0[0], buf0[2]);
buf1[1] = vaddq_s32(buf0[1], buf0[3]);
buf1[3] = vsubq_s32(buf0[1], buf0[3]);
// stage 4
- stage_idx++;
-
cospi = cospi_arr(cos_bit);
buf0[0] = buf1[0];
buf0[1] = buf1[1];
@@ -2395,7 +2382,6 @@
v_cos_bit);
// stage 5
- stage_idx++;
buf1[0] = buf0[0];
buf1[1] = vnegq_s32(buf0[2]);
buf1[2] = buf0[3];
@@ -3375,9 +3361,9 @@
}
for (int i = 0; i < 2; i++) {
- transpose_8x8(out + i * 16, in);
- av1_round_shift_rect_array_32_neon(in, in, 16, -shift[2], NewSqrt2);
- write_buffer_16x8(in, coeff + i * 8, 16);
+ av1_round_shift_rect_array_32_neon(out + i * 16, in, 16, -shift[2],
+ NewSqrt2);
+ write_buffer_8x8(in, coeff + i * 64);
}
}
@@ -3403,9 +3389,8 @@
for (int i = 0; i < 2; i++) {
row_txfm(out + i * 16, out, bit, 2);
- transpose_8x8(out, in);
- av1_round_shift_rect_array_32_neon(in, in, 16, -shift[2], NewSqrt2);
- write_buffer_8x8(in, coeff + i * 64);
+ av1_round_shift_rect_array_32_neon(out, out, 16, -shift[2], NewSqrt2);
+ write_buffer_16x8(out, coeff + i * 8, 16);
}
}
@@ -3456,7 +3441,9 @@
// row transform
for (int i = 0; i < txfm_size_col; i++) {
- row_txfm(in + i, outcoeff128 + i * txfm_size_col, bitrow, txfm_size_col);
+ int32x4_t tmp[4];
+ row_txfm(in + i, tmp, bitrow, txfm_size_row >> 2);
+ store_output_w4(coeff + i * 4, tmp, txfm_size_row, txfm_size_col);
}
}
#endif
@@ -3483,16 +3470,16 @@
const int32x4_t v_shift0 = vdupq_n_s32(shift[0]);
load_buffer_16x4(input, in, stride, ud_flip, lr_flip, &v_shift0);
- for (int i = 0; i < txfm_size_row; i++) {
- col_txfm(in + i * txfm_size_row, outcoeff128 + i * txfm_size_row, bitcol,
- 1);
+ for (int i = 0; i < (txfm_size_col >> 2); i++) {
+ int32x4_t *cur_in = &in[i * txfm_size_row];
+ col_txfm(cur_in, cur_in, bitcol, 1);
+ transpose_4x4(cur_in, cur_in);
}
const int32x4_t v_shift1 = vdupq_n_s32(shift[1]);
- col_txfm_8x8_rounding(outcoeff128, &v_shift1);
+ col_txfm_8x8_rounding(in, &v_shift1);
// row transform
- row_txfm(outcoeff128, in, bitrow, 1);
- transpose_8nx8n(in, outcoeff128, txfm_size_row, txfm_size_col);
+ row_txfm(in, outcoeff128, bitrow, 1);
}
void av1_fwd_txfm2d_16x32_neon(const int16_t *input, int32_t *coeff, int stride,
@@ -3524,9 +3511,7 @@
// row transform
row_txfm(outcoef128, in, bitrow, 8);
- transpose_8nx8n(in, outcoef128, 32, 16);
- av1_round_shift_rect_array_32_neon(outcoef128, outcoef128, 128, -shift[2],
- NewSqrt2);
+ av1_round_shift_rect_array_32_neon(in, outcoef128, 128, -shift[2], NewSqrt2);
}
void av1_fwd_txfm2d_32x64_neon(const int16_t *input, int32_t *coeff, int stride,
@@ -3562,9 +3547,10 @@
for (int i = 0; i < num_row; i++) {
av1_fdct32_new_neon((outcoef128 + i), (in + i), bitrow, num_row);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
- av1_round_shift_rect_array_32_neon(outcoef128, outcoef128, 512, -shift[2],
- NewSqrt2);
+ for (int i = 0; i < txfm_size_col; i++) {
+ av1_round_shift_rect_array_32_neon(in + i * 16, outcoef128 + i * 8, 8,
+ -shift[2], NewSqrt2);
+ }
}
void av1_fwd_txfm2d_64x32_neon(const int16_t *input, int32_t *coeff, int stride,
@@ -3609,9 +3595,7 @@
for (int i = 0; i < num_row; i++) {
av1_fdct64_new_neon((outcoef128 + i), (in + i), bitrow, num_row, num_row);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col >> 1);
- av1_round_shift_rect_array_32_neon(outcoef128, outcoef128, 512 >> 1,
- -shift[2], NewSqrt2);
+ av1_round_shift_rect_array_32_neon(in, outcoef128, 512, -shift[2], NewSqrt2);
(void)bd;
}
@@ -3639,9 +3623,7 @@
for (int i = 0; i < 4; i++) {
row_txfm((outcoef128 + i), (in + i), bitrow, 4);
}
- transpose_8nx8n(in, outcoef128, 16, 32);
- av1_round_shift_rect_array_32_neon(outcoef128, outcoef128, 128, -shift[2],
- NewSqrt2);
+ av1_round_shift_rect_array_32_neon(in, outcoef128, 128, -shift[2], NewSqrt2);
(void)bd;
}
@@ -3677,9 +3659,8 @@
// row transform
for (int i = 0; i < txfm_size_col; i += 2) {
- row_txfm((outcoef128 + i), (in + i), bitrow, txfm_size_col);
+ row_txfm((outcoef128 + i), (outcoef128 + i), bitrow, txfm_size_col);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
(void)bd;
}
@@ -3711,9 +3692,8 @@
// row transform
for (int i = 0; i < num_col; i++) {
- row_txfm((outcoef128 + i), (in + i), bitrow, num_col);
+ row_txfm((outcoef128 + i), (outcoef128 + i), bitrow, num_col);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
(void)bd;
}
#endif
@@ -3721,7 +3701,6 @@
void av1_fwd_txfm2d_4x8_neon(const int16_t *input, int32_t *coeff, int stride,
TX_TYPE tx_type, int bd) {
int32x4_t in[8];
- int32x4_t *outcoeff128 = (int32x4_t *)coeff;
const int8_t *shift = av1_fwd_txfm_shift_ls[TX_4X8];
const int txw_idx = get_txw_idx(TX_4X8);
const int txh_idx = get_txh_idx(TX_4X8);
@@ -3739,13 +3718,15 @@
col_txfm(in, in, bitcol, 1);
int32x4_t v_shift1 = vdupq_n_s32(shift[1]);
col_txfm_4x8_rounding(in, &v_shift1);
- transpose_8nx8n(in, outcoeff128, txfm_size_col, txfm_size_row);
for (int i = 0; i < 2; i++) {
- row_txfm(outcoeff128 + i, in + i * txfm_size_col, bitrow, 2);
+ int32x4_t *cur_in = &in[i * 4];
+ transpose_4x4(cur_in, cur_in);
+ row_txfm(cur_in, cur_in, bitrow, 1);
+ av1_round_shift_rect_array_32_neon(cur_in, cur_in, txfm_size_col, -shift[2],
+ NewSqrt2);
+ store_output_w4(coeff + i * 4, cur_in, txfm_size_row, 4);
}
- av1_round_shift_rect_array_32_neon(in, outcoeff128, txfm_size_row, -shift[2],
- NewSqrt2);
(void)bd;
}
@@ -3768,16 +3749,17 @@
int32x4_t v_shift0 = vdupq_n_s32(shift[0]);
load_buffer_8x4(input, in, stride, ud_flip, lr_flip, &v_shift0);
for (int i = 0; i < 2; i++) {
- col_txfm(in + i * txfm_size_row, in + i * txfm_size_row, bitcol, 1);
+ int32x4_t *cur_in = &in[i * txfm_size_row];
+ col_txfm(cur_in, cur_in, bitcol, 1);
+ transpose_4x4(cur_in, cur_in);
}
int32x4_t v_shift1 = vdupq_n_s32(shift[1]);
col_txfm_4x8_rounding(in, &v_shift1);
// row transform
row_txfm(in, outcoeff128, bitrow, 1);
- av1_round_shift_rect_array_32_neon(outcoeff128, in, txfm_size_col, -shift[2],
- NewSqrt2);
- transpose_8nx8n(in, outcoeff128, txfm_size_row, txfm_size_col);
+ av1_round_shift_rect_array_32_neon(outcoeff128, outcoeff128, txfm_size_col,
+ -shift[2], NewSqrt2);
(void)bd;
}
@@ -3820,9 +3802,7 @@
col_txfm_16x16_rounding(outcoeff128 + 192, &v_shift);
transpose_8nx8n(outcoeff128, in, txfm_size_col, 32);
- fdct16x16_neon(in, in, bitrow, 8);
- transpose_8nx8n(in, outcoeff128, 32, txfm_size_col);
- memset(coeff + txfm_size_col * 32, 0, txfm_size_col * 32 * sizeof(*coeff));
+ fdct16x16_neon(in, outcoeff128, bitrow, 8);
(void)bd;
}
@@ -3861,9 +3841,9 @@
transpose_8nx8n(outcoeff128, in, txfm_size_col, txfm_size_row);
for (int i = 0; i < 4; i++) {
- av1_fdct64_new_neon(in + i, in + i, bitrow, 4, 4);
+ av1_fdct64_new_neon(in + i, outcoeff128 + i, bitrow, 4, 4);
}
- transpose_8nx8n(in, outcoeff128, txfm_size_row, 32);
+ memset(coeff + txfm_size_row * 32, 0, txfm_size_row * 32 * sizeof(*coeff));
(void)bd;
}
#endif
@@ -3906,9 +3886,9 @@
static INLINE TxfmFuncNEON fwd_txfm_type_to_func(TXFM_TYPE txfm_type) {
switch (txfm_type) {
- case TXFM_TYPE_DCT32: return fdct32_new_neon; break;
- case TXFM_TYPE_DCT64: return fdct64_new_neon; break;
- case TXFM_TYPE_IDENTITY32: return idtx32x32_neon; break;
+ case TXFM_TYPE_DCT32: return fdct32_new_neon;
+ case TXFM_TYPE_DCT64: return fdct64_new_neon;
+ case TXFM_TYPE_IDENTITY32: return idtx32x32_neon;
default: assert(0);
}
return NULL;
@@ -3994,8 +3974,7 @@
}
txfm2d_size_128 = (col_num >> 1) * (txfm_size >> 1);
- av1_round_shift_array_32_neon(out_128, buf_128, txfm2d_size_128, -shift[2]);
- transpose_8nx8n(buf_128, out_128, 32, 32);
+ av1_round_shift_array_32_neon(out_128, out_128, txfm2d_size_128, -shift[2]);
}
static INLINE void fwd_txfm2d_neon(const int16_t *input, int32_t *output,
@@ -4024,8 +4003,7 @@
av1_round_shift_array_32_neon(buf_128, out_128, txfm2d_size_128, -shift[1]);
transpose_32(txfm_size, out_128, buf_128);
txfm_func_row(buf_128, out_128, cos_bit_row, stage_range_row);
- av1_round_shift_array_32_neon(out_128, buf_128, txfm2d_size_128, -shift[2]);
- transpose_32(txfm_size, buf_128, out_128);
+ av1_round_shift_array_32_neon(out_128, out_128, txfm2d_size_128, -shift[2]);
}
void av1_fwd_txfm2d_32x32_neon(const int16_t *input, int32_t *output,
diff --git a/av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c b/av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c
index 0ad1131..6cf835a 100644
--- a/av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c
+++ b/av1/encoder/arm/neon/hybrid_fwd_txfm_neon.c
@@ -66,18 +66,8 @@
a1 = vsub_s16(a1, c1);
d1 = vadd_s16(d1, b1);
- x[0] = vcombine_s16(a1, c1);
- x[1] = vcombine_s16(d1, b1);
-
- transpose4x4(x, s);
-
- vst1q_s32(&output[0], vshll_n_s16(s[0], UNIT_QUANT_SHIFT));
- vst1q_s32(&output[4], vshll_n_s16(s[1], UNIT_QUANT_SHIFT));
- vst1q_s32(&output[8], vshll_n_s16(s[2], UNIT_QUANT_SHIFT));
- vst1q_s32(&output[12], vshll_n_s16(s[3], UNIT_QUANT_SHIFT));
-}
-
-void av1_highbd_fwht4x4_neon(const int16_t *input, tran_low_t *output,
- int stride) {
- av1_fwht4x4_neon(input, output, stride);
+ vst1q_s32(&output[0], vshll_n_s16(a1, UNIT_QUANT_SHIFT));
+ vst1q_s32(&output[4], vshll_n_s16(c1, UNIT_QUANT_SHIFT));
+ vst1q_s32(&output[8], vshll_n_s16(d1, UNIT_QUANT_SHIFT));
+ vst1q_s32(&output[12], vshll_n_s16(b1, UNIT_QUANT_SHIFT));
}
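
The fwht4x4 epilogue above drops the 16-bit transpose and widens each result vector straight into 32-bit coefficients; vshll_n_s16 widens and shifts left in a single instruction. A minimal sketch of that store idiom (assuming UNIT_QUANT_SHIFT, which is 2 in libaom):

    static inline void store_widened(int32_t *out, int16x4_t v) {
      vst1q_s32(out, vshll_n_s16(v, 2));  // (int32_t)v[i] << 2
    }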
diff --git a/av1/encoder/arm/neon/ml_neon.c b/av1/encoder/arm/neon/ml_neon.c
index fcff3a9..be6ddfd 100644
--- a/av1/encoder/arm/neon/ml_neon.c
+++ b/av1/encoder/arm/neon/ml_neon.c
@@ -13,6 +13,7 @@
#include <assert.h>
#include <arm_neon.h>
+#include "config/aom_config.h"
#include "config/av1_rtcd.h"
#include "av1/encoder/ml.h"
@@ -46,7 +47,7 @@
vadd = vmlaq_f32(vadd, inputs_h, weights_h);
vadd = vmlaq_f32(vadd, inputs_l, weights_l);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
total += vaddvq_f32(vadd);
#else
float32x2_t vadd_lo = vadd_f32(vget_low_f32(vadd), vget_high_f32(vadd));
@@ -80,7 +81,7 @@
j -= 8;
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
total += vaddvq_f32(vadd);
#else
@@ -98,7 +99,7 @@
const float *layer_bias,
float *const output_nodes) {
float total = *layer_bias;
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const float32x4_t v_inputs = vld1q_f32(inputs);
const float32x4_t v_weights = vld1q_f32(weights);
const float32x4_t vadd = vmulq_f32(v_inputs, v_weights);
@@ -126,7 +127,7 @@
vadd = vmlaq_f32(vadd, v_inputs, v_weights);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
total += vaddvq_f32(vadd);
#else
float32x2_t vadd_lo = vadd_f32(vget_low_f32(vadd), vget_high_f32(vadd));
@@ -159,7 +160,7 @@
}
}
for (int i = 0; i < 2; i++)
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
mul0[i] = vpaddq_f32(mul0[i], mul1[i]);
const float32x4_t hh = vpaddq_f32(mul0[0], mul0[1]);
#else
@@ -197,7 +198,7 @@
}
}
for (int i = 0; i < 4; i++)
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
mul0[i] = vpaddq_f32(mul0[i], mul1[i]);
const float32x4_t hh0 = vpaddq_f32(mul0[0], mul0[1]);
const float32x4_t hh1 = vpaddq_f32(mul0[2], mul0[3]);
@@ -239,7 +240,7 @@
add[i] = vmlaq_f32(add[i], inputs_h, weight_h);
}
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const float32x4_t hadd_h = vpaddq_f32(add[2], add[3]);
const float32x4_t hadd_l = vpaddq_f32(add[0], add[1]);
const float32x4_t haddhadd = vpaddq_f32(hadd_l, hadd_h);
diff --git a/av1/encoder/arm/neon/picksrt_neon.c b/av1/encoder/arm/neon/picksrt_neon.c
index a1e7765..1346d6b 100644
--- a/av1/encoder/arm/neon/picksrt_neon.c
+++ b/av1/encoder/arm/neon/picksrt_neon.c
@@ -141,10 +141,10 @@
}
sum64 = vpaddlq_u32(err0);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
err += vaddvq_u64(sum64);
#else
err += vget_lane_u64(vadd_u64(vget_low_u64(sum64), vget_high_u64(sum64)), 0);
-#endif // __aarch64__
+#endif // AOM_ARCH_AARCH64
return err;
}
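
The repeated __aarch64__ to AOM_ARCH_AARCH64 changes in these files swap the compiler-defined macro for the aom_config.h one, which is always defined to 0 or 1 (hence the new config/aom_config.h includes). The idiom being guarded is the across-vector reduction that only AArch64 has as a single instruction:

    #include <arm_neon.h>
    #include "config/aom_config.h"

    static inline uint64_t horizontal_add_u64x2(uint64x2_t v) {
    #if AOM_ARCH_AARCH64
      return vaddvq_u64(v);  // single across-vector add on AArch64
    #else
      // AArch32 fallback: add the two halves, then extract lane 0.
      return vget_lane_u64(vadd_u64(vget_low_u64(v), vget_high_u64(v)), 0);
    #endif
    }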
diff --git a/av1/encoder/arm/neon/quantize_neon.c b/av1/encoder/arm/neon/quantize_neon.c
index dbfbeef..c3b57ce 100644
--- a/av1/encoder/arm/neon/quantize_neon.c
+++ b/av1/encoder/arm/neon/quantize_neon.c
@@ -14,6 +14,8 @@
#include <assert.h>
#include <math.h>
+#include "config/aom_config.h"
+
#include "aom_dsp/arm/mem_neon.h"
#include "aom_dsp/arm/sum_neon.h"
#include "aom_mem/aom_mem.h"
@@ -26,7 +28,7 @@
#include "av1/encoder/rd.h"
static INLINE uint16_t get_max_eob(int16x8_t v_eobmax) {
-#ifdef __aarch64__
+#if AOM_ARCH_AARCH64
return (uint16_t)vmaxvq_s16(v_eobmax);
#else
const int16x4_t v_eobmax_3210 =
diff --git a/av1/encoder/arm/neon/rdopt_neon.c b/av1/encoder/arm/neon/rdopt_neon.c
index 25df6b4..7d3bd4c 100644
--- a/av1/encoder/arm/neon/rdopt_neon.c
+++ b/av1/encoder/arm/neon/rdopt_neon.c
@@ -14,6 +14,7 @@
#include <arm_neon.h>
#include "av1/encoder/rdopt.h"
+#include "config/aom_config.h"
#include "config/av1_rtcd.h"
// Process horizontal and vertical correlations in a 4x4 block of pixels.
@@ -97,7 +98,7 @@
v_x_sum = vpadalq_s32(v_x_sum, x_sum_32);
v_x2_sum = vpadalq_s32(v_x2_sum, x2_sum_32);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
xy_sum = vaddvq_s64(v_xy_sum);
xz_sum = vaddvq_s64(v_xz_sum);
x2_sum = vaddvq_s64(v_x2_sum);
@@ -160,7 +161,7 @@
v_y2_sum = vmlal_s16(v_y2_sum, v_y_hi, v_y_hi);
const int32x4_t v_y_sum_a = vpadalq_s16(v_y_sum, v_y);
const int64x2_t v_xy_sum2 = vpaddlq_s32(v_xy_sum_a);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
const int64x2_t v_y2_sum_a = vpaddlq_s32(v_y2_sum);
xy_sum += vaddvq_s64(v_xy_sum2);
const int32_t y = vaddvq_s32(v_y_sum_a);
@@ -278,7 +279,7 @@
v_x_sum_a = vpadalq_s16(v_x_sum_a, v_y);
v_x_sum_a = vpadalq_s16(v_x_sum_a, v_w);
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
xy_sum += vaddvq_s64(vpaddlq_s32(v_xy_sum_a));
xz_sum += vaddvq_s64(vpaddlq_s32(v_xz_sum_a));
x_sum += vaddvq_s32(v_x_sum_a);
@@ -398,7 +399,7 @@
v_x2_firstrow = vmlal_s16(v_x2_firstrow, v_diff_lo, v_diff_lo);
v_x2_firstrow = vmlal_s16(v_x2_firstrow, v_diff_hi, v_diff_hi);
}
-#if defined(__aarch64__)
+#if AOM_ARCH_AARCH64
x_firstrow += vaddvq_s32(v_x_firstrow);
x2_firstrow += vaddvq_s32(v_x2_firstrow);
#else
diff --git a/av1/encoder/arm/neon/reconinter_enc_neon.c b/av1/encoder/arm/neon/reconinter_enc_neon.c
new file mode 100644
index 0000000..e5975b0
--- /dev/null
+++ b/av1/encoder/arm/neon/reconinter_enc_neon.c
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <arm_neon.h>
+#include <assert.h>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom_dsp/arm/mem_neon.h"
+
+#include "av1/encoder/reconinter_enc.h"
+
+void aom_upsampled_pred_neon(MACROBLOCKD *xd, const AV1_COMMON *const cm,
+ int mi_row, int mi_col, const MV *const mv,
+ uint8_t *comp_pred, int width, int height,
+ int subpel_x_q3, int subpel_y_q3,
+ const uint8_t *ref, int ref_stride,
+ int subpel_search) {
+ // expect xd == NULL only in tests
+ if (xd != NULL) {
+ const MB_MODE_INFO *mi = xd->mi[0];
+ const int ref_num = 0;
+ const int is_intrabc = is_intrabc_block(mi);
+ const struct scale_factors *const sf =
+ is_intrabc ? &cm->sf_identity : xd->block_ref_scale_factors[ref_num];
+ const int is_scaled = av1_is_scaled(sf);
+
+ if (is_scaled) {
+ int plane = 0;
+ const int mi_x = mi_col * MI_SIZE;
+ const int mi_y = mi_row * MI_SIZE;
+ const struct macroblockd_plane *const pd = &xd->plane[plane];
+ const struct buf_2d *const dst_buf = &pd->dst;
+ const struct buf_2d *const pre_buf =
+ is_intrabc ? dst_buf : &pd->pre[ref_num];
+
+ InterPredParams inter_pred_params;
+ inter_pred_params.conv_params = get_conv_params(0, plane, xd->bd);
+ const int_interpfilters filters =
+ av1_broadcast_interp_filter(EIGHTTAP_REGULAR);
+ av1_init_inter_params(
+ &inter_pred_params, width, height, mi_y >> pd->subsampling_y,
+ mi_x >> pd->subsampling_x, pd->subsampling_x, pd->subsampling_y,
+ xd->bd, is_cur_buf_hbd(xd), is_intrabc, sf, pre_buf, filters);
+ av1_enc_build_one_inter_predictor(comp_pred, width, mv,
+ &inter_pred_params);
+ return;
+ }
+ }
+
+ const InterpFilterParams *filter_params = av1_get_filter(subpel_search);
+
+ if (!subpel_x_q3 && !subpel_y_q3) {
+ if (width > 8) {
+ assert(width % 16 == 0);
+ int i = height;
+ do {
+ int j = 0;
+ do {
+ uint8x16_t r = vld1q_u8(ref + j);
+ vst1q_u8(comp_pred + j, r);
+ j += 16;
+ } while (j < width);
+ ref += ref_stride;
+ comp_pred += width;
+ } while (--i != 0);
+ } else if (width == 8) {
+ int i = height;
+ do {
+ uint8x8_t r = vld1_u8(ref);
+ vst1_u8(comp_pred, r);
+ ref += ref_stride;
+ comp_pred += width;
+ } while (--i != 0);
+ } else {
+ assert(width == 4);
+ int i = height / 2;
+ do {
+ uint8x8_t r = load_unaligned_u8(ref, ref_stride);
+ vst1_u8(comp_pred, r);
+ ref += 2 * ref_stride;
+ comp_pred += 2 * width;
+ } while (--i != 0);
+ }
+ } else if (!subpel_y_q3) {
+ const int16_t *const filter_x =
+ av1_get_interp_filter_subpel_kernel(filter_params, subpel_x_q3 << 1);
+ aom_convolve8_horiz_neon(ref, ref_stride, comp_pred, width, filter_x, 16,
+ NULL, -1, width, height);
+ } else if (!subpel_x_q3) {
+ const int16_t *const filter_y =
+ av1_get_interp_filter_subpel_kernel(filter_params, subpel_y_q3 << 1);
+ aom_convolve8_vert_neon(ref, ref_stride, comp_pred, width, NULL, -1,
+ filter_y, 16, width, height);
+ } else {
+ DECLARE_ALIGNED(16, uint8_t,
+ im_block[((MAX_SB_SIZE * 2 + 16) + 16) * MAX_SB_SIZE]);
+
+ const int16_t *const filter_x =
+ av1_get_interp_filter_subpel_kernel(filter_params, subpel_x_q3 << 1);
+ const int16_t *const filter_y =
+ av1_get_interp_filter_subpel_kernel(filter_params, subpel_y_q3 << 1);
+
+ const int im_stride = MAX_SB_SIZE;
+ const int im_height = (((height - 1) * 8 + subpel_y_q3) >> 3) + SUBPEL_TAPS;
+
+ const int ref_vert_offset = ref_stride * ((SUBPEL_TAPS >> 1) - 1);
+ const int im_vert_offset = im_stride * ((filter_params->taps >> 1) - 1);
+
+ assert(im_height <= (MAX_SB_SIZE * 2 + 16) + 16);
+ aom_convolve8_horiz_neon(ref - ref_vert_offset, ref_stride, im_block,
+ MAX_SB_SIZE, filter_x, 16, NULL, -1, width,
+ im_height);
+ aom_convolve8_vert_neon(im_block + im_vert_offset, MAX_SB_SIZE, comp_pred,
+ width, NULL, -1, filter_y, 16, width, height);
+ }
+}
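
A hedged note on the two-pass subpel path above: the horizontal pass starts (SUBPEL_TAPS / 2 - 1) rows above the block so the vertical pass has the taps it needs, and the intermediate height rounds the last fractional row up before adding the tap context:

    // SUBPEL_TAPS is 8 in libaom.
    static inline int upsampled_im_height(int height, int subpel_y_q3) {
      return (((height - 1) * 8 + subpel_y_q3) >> 3) + 8;
    }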
+
+void aom_comp_avg_upsampled_pred_neon(MACROBLOCKD *xd,
+ const AV1_COMMON *const cm, int mi_row,
+ int mi_col, const MV *const mv,
+ uint8_t *comp_pred, const uint8_t *pred,
+ int width, int height, int subpel_x_q3,
+ int subpel_y_q3, const uint8_t *ref,
+ int ref_stride, int subpel_search) {
+ aom_upsampled_pred_neon(xd, cm, mi_row, mi_col, mv, comp_pred, width, height,
+ subpel_x_q3, subpel_y_q3, ref, ref_stride,
+ subpel_search);
+
+ aom_comp_avg_pred_neon(comp_pred, pred, width, height, comp_pred, width);
+}
diff --git a/av1/encoder/arm/neon/temporal_filter_neon.c b/av1/encoder/arm/neon/temporal_filter_neon.c
index cae44f9..163768b 100644
--- a/av1/encoder/arm/neon/temporal_filter_neon.c
+++ b/av1/encoder/arm/neon/temporal_filter_neon.c
@@ -11,16 +11,18 @@
#include <arm_neon.h>
+#include "config/aom_config.h"
#include "config/av1_rtcd.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/temporal_filter.h"
+#include "aom_dsp/mathutils.h"
#include "aom_dsp/arm/mem_neon.h"
#include "aom_dsp/arm/sum_neon.h"
// For the squared error buffer, add padding for 4 samples.
#define SSE_STRIDE (BW + 4)
-#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
// clang-format off
@@ -57,16 +59,18 @@
} while (i < block_height);
}
-static INLINE uint8x16_t load_and_pad(uint8_t *src, const uint32_t col,
+static INLINE uint8x16_t load_and_pad(const uint8_t *src, const uint32_t col,
const uint32_t block_width) {
uint8x8_t s = vld1_u8(src);
if (col == 0) {
- s[0] = s[2];
- s[1] = s[2];
+ const uint8_t lane2 = vget_lane_u8(s, 2);
+ s = vset_lane_u8(lane2, s, 0);
+ s = vset_lane_u8(lane2, s, 1);
} else if (col >= block_width - 4) {
- s[6] = s[5];
- s[7] = s[5];
+ const uint8_t lane5 = vget_lane_u8(s, 5);
+ s = vset_lane_u8(lane5, s, 6);
+ s = vset_lane_u8(lane5, s, 7);
}
return vcombine_u8(s, s);
}
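
The load_and_pad rewrites here (and in the u16 variant below) replace the s[0] = s[2] subscript form, a GCC/Clang vector extension, with the portable lane intrinsics, which take compile-time lane indices. The idiom in isolation:

    static inline uint8x8_t pad_left_edge(uint8x8_t s) {
      // Duplicate lane 2 into lanes 0 and 1, as the col == 0 branch does.
      const uint8_t lane2 = vget_lane_u8(s, 2);
      s = vset_lane_u8(lane2, s, 0);
      return vset_lane_u8(lane2, s, 1);
    }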
@@ -74,10 +78,10 @@
static void apply_temporal_filter(
const uint8_t *frame, const unsigned int stride, const uint32_t block_width,
const uint32_t block_height, const int *subblock_mses,
- unsigned int *accumulator, uint16_t *count, uint8_t *frame_abs_diff,
- uint32_t *luma_sse_sum, const double inv_num_ref_pixels,
+ unsigned int *accumulator, uint16_t *count, const uint8_t *frame_abs_diff,
+ const uint32_t *luma_sse_sum, const double inv_num_ref_pixels,
const double decay_factor, const double inv_factor,
- const double weight_factor, double *d_factor) {
+ const double weight_factor, const double *d_factor, int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -87,11 +91,11 @@
// Traverse 4 columns at a time - first and last two columns need padding.
for (uint32_t col = 0; col < block_width; col += 4) {
uint8x16_t vsrc[5][2];
- uint8_t *src = frame_abs_diff + col;
+ const uint8_t *src = frame_abs_diff + col;
// Load, pad (for first and last two columns) and mask 3 rows from the top.
for (int i = 2; i < 5; i++) {
- uint8x16_t s = load_and_pad(src, col, block_width);
+ const uint8x16_t s = load_and_pad(src, col, block_width);
vsrc[i][0] = vandq_u8(s, vmask.val[0]);
vsrc[i][1] = vandq_u8(s, vmask.val[1]);
src += SSE_STRIDE;
@@ -142,29 +146,54 @@
}
// Perform filtering.
- for (unsigned int i = 0, k = 0; i < block_height; i++) {
- for (unsigned int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame[i * stride + j];
- uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
+ if (tf_wgt_calc_lvl == 0) {
+ for (unsigned int i = 0, k = 0; i < block_height; i++) {
+ for (unsigned int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame[i * stride + j];
+ const uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
- // Compute filter weight.
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
- accumulator[k] += weight * pixel_value;
- count[k] += weight;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx =
+ (i >= block_height / 2) * 2 + (j >= block_width / 2);
+ const double block_error = (double)subblock_mses[subblock_idx];
+ const double combined_error =
+ weight_factor * window_error + block_error * inv_factor;
+ // Compute filter weight.
+ double scaled_error =
+ combined_error * d_factor[subblock_idx] * decay_factor;
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ }
+ }
+ } else {
+ for (unsigned int i = 0, k = 0; i < block_height; i++) {
+ for (unsigned int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame[i * stride + j];
+ const uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
+
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx =
+ (i >= block_height / 2) * 2 + (j >= block_width / 2);
+ const double block_error = (double)subblock_mses[subblock_idx];
+ const double combined_error =
+ weight_factor * window_error + block_error * inv_factor;
+ // Compute filter weight.
+ double scaled_error =
+ combined_error * d_factor[subblock_idx] * decay_factor;
+ scaled_error = AOMMIN(scaled_error, 7);
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ const int weight = iroundpf(fweight);
+ accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ }
}
}
}
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
// When using vld1q_u16_x4 compilers may insert an alignment hint of 256 bits.
DECLARE_ALIGNED(32, static const uint16_t, kSlidingWindowMask[]) = {
@@ -205,16 +234,18 @@
} while (i < block_height);
}
-static INLINE uint16x8_t load_and_pad(uint16_t *src, const uint32_t col,
+static INLINE uint16x8_t load_and_pad(const uint16_t *src, const uint32_t col,
const uint32_t block_width) {
uint16x8_t s = vld1q_u16(src);
if (col == 0) {
- s[0] = s[2];
- s[1] = s[2];
+ const uint16_t lane2 = vgetq_lane_u16(s, 2);
+ s = vsetq_lane_u16(lane2, s, 0);
+ s = vsetq_lane_u16(lane2, s, 1);
} else if (col >= block_width - 4) {
- s[6] = s[5];
- s[7] = s[5];
+ const uint16_t lane5 = vgetq_lane_u16(s, 5);
+ s = vsetq_lane_u16(lane5, s, 6);
+ s = vsetq_lane_u16(lane5, s, 7);
}
return s;
}
@@ -222,10 +253,10 @@
static void apply_temporal_filter(
const uint8_t *frame, const unsigned int stride, const uint32_t block_width,
const uint32_t block_height, const int *subblock_mses,
- unsigned int *accumulator, uint16_t *count, uint16_t *frame_sse,
- uint32_t *luma_sse_sum, const double inv_num_ref_pixels,
+ unsigned int *accumulator, uint16_t *count, const uint16_t *frame_sse,
+ const uint32_t *luma_sse_sum, const double inv_num_ref_pixels,
const double decay_factor, const double inv_factor,
- const double weight_factor, double *d_factor) {
+ const double weight_factor, const double *d_factor, int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -235,7 +266,7 @@
// Traverse 4 columns at a time - first and last two columns need padding.
for (uint32_t col = 0; col < block_width; col += 4) {
uint16x8_t vsrc[5];
- uint16_t *src = frame_sse + col;
+ const uint16_t *src = frame_sse + col;
// Load and pad (for first and last two columns) 3 rows from the top.
for (int i = 2; i < 5; i++) {
@@ -273,36 +304,62 @@
}
// Perform filtering.
- for (unsigned int i = 0, k = 0; i < block_height; i++) {
- for (unsigned int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame[i * stride + j];
- uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
+ if (tf_wgt_calc_lvl == 0) {
+ for (unsigned int i = 0, k = 0; i < block_height; i++) {
+ for (unsigned int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame[i * stride + j];
+ const uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
- // Compute filter weight.
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
- accumulator[k] += weight * pixel_value;
- count[k] += weight;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx =
+ (i >= block_height / 2) * 2 + (j >= block_width / 2);
+ const double block_error = (double)subblock_mses[subblock_idx];
+ const double combined_error =
+ weight_factor * window_error + block_error * inv_factor;
+ // Compute filter weight.
+ double scaled_error =
+ combined_error * d_factor[subblock_idx] * decay_factor;
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ }
+ }
+ } else {
+ for (unsigned int i = 0, k = 0; i < block_height; i++) {
+ for (unsigned int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame[i * stride + j];
+ const uint32_t diff_sse = acc_5x5_neon[i][j] + luma_sse_sum[i * BW + j];
+
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx =
+ (i >= block_height / 2) * 2 + (j >= block_width / 2);
+ const double block_error = (double)subblock_mses[subblock_idx];
+ const double combined_error =
+ weight_factor * window_error + block_error * inv_factor;
+ // Compute filter weight.
+ double scaled_error =
+ combined_error * d_factor[subblock_idx] * decay_factor;
+ scaled_error = AOMMIN(scaled_error, 7);
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ const int weight = iroundpf(fweight);
+ accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ }
}
}
}
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
void av1_apply_temporal_filter_neon(
const YV12_BUFFER_CONFIG *frame_to_filter, const MACROBLOCKD *mbd,
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
const int is_high_bitdepth = frame_to_filter->flags & YV12_FLAG_HIGHBITDEPTH;
assert(block_size == BLOCK_32X32 && "Only support 32x32 block with Neon!");
assert(TF_WINDOW_LENGTH == 5 && "Only support window length 5 with Neon!");
@@ -336,11 +393,11 @@
double s_decay = pow((double)filter_strength / TF_STRENGTH_THRESHOLD, 2);
s_decay = CLIP(s_decay, 1e-5, 1);
double d_factor[4] = { 0 };
-#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
uint8_t frame_abs_diff[SSE_STRIDE * BH] = { 0 };
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
uint16_t frame_sse[SSE_STRIDE * BH] = { 0 };
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
uint32_t luma_sse_sum[BW * BH] = { 0 };
for (int subblock_idx = 0; subblock_idx < 4; subblock_idx++) {
@@ -379,7 +436,7 @@
// search is only done on Y-plane, so the information from Y-plane
// will be more accurate. The luma sse sum is reused in both chroma
// planes.
-#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+#if AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
if (plane == AOM_PLANE_U) {
for (unsigned int i = 0; i < plane_h; i++) {
for (unsigned int j = 0; j < plane_w; j++) {
@@ -403,8 +460,8 @@
subblock_mses, accum + plane_offset,
count + plane_offset, frame_abs_diff, luma_sse_sum,
inv_num_ref_pixels, decay_factor, inv_factor,
- weight_factor, d_factor);
-#else // !(defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD))
+ weight_factor, d_factor, tf_wgt_calc_lvl);
+#else // !(AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD))
if (plane == AOM_PLANE_U) {
for (unsigned int i = 0; i < plane_h; i++) {
for (unsigned int j = 0; j < plane_w; j++) {
@@ -422,11 +479,12 @@
get_squared_error(ref, frame_stride, pred + plane_offset, plane_w, plane_w,
plane_h, frame_sse, SSE_STRIDE);
- apply_temporal_filter(
- pred + plane_offset, plane_w, plane_w, plane_h, subblock_mses,
- accum + plane_offset, count + plane_offset, frame_sse, luma_sse_sum,
- inv_num_ref_pixels, decay_factor, inv_factor, weight_factor, d_factor);
-#endif // defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
+ apply_temporal_filter(pred + plane_offset, plane_w, plane_w, plane_h,
+ subblock_mses, accum + plane_offset,
+ count + plane_offset, frame_sse, luma_sse_sum,
+ inv_num_ref_pixels, decay_factor, inv_factor,
+ weight_factor, d_factor, tf_wgt_calc_lvl);
+#endif // AOM_ARCH_AARCH64 && defined(__ARM_FEATURE_DOTPROD)
plane_offset += plane_h * plane_w;
}
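
The new tf_wgt_calc_lvl parameter threaded through above selects between the exact double-precision weight and a faster float approximation; both clamp scaled_error to 7 first. The two paths side by side (approx_exp comes from aom_dsp/mathutils.h; iroundpf is libaom's round-float-to-int helper; TF_WEIGHT_SCALE from the temporal filter):

    #include <math.h>

    static inline int tf_weight(double scaled_error, int tf_wgt_calc_lvl) {
      if (tf_wgt_calc_lvl == 0)
        return (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
      // Level 1: float approximation of exp() plus round-to-nearest int.
      return iroundpf(approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE);
    }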
diff --git a/av1/encoder/av1_fwd_txfm2d.c b/av1/encoder/av1_fwd_txfm2d.c
index bcb829d..12a9535 100644
--- a/av1/encoder/av1_fwd_txfm2d.c
+++ b/av1/encoder/av1_fwd_txfm2d.c
@@ -105,19 +105,24 @@
}
}
+ DECLARE_ALIGNED(16, int32_t, row_buffer[MAX_TX_SIZE]);
+
// Rows
for (r = 0; r < txfm_size_row; ++r) {
- txfm_func_row(buf + r * txfm_size_col, output + r * txfm_size_col,
- cos_bit_row, stage_range_row);
- av1_round_shift_array(output + r * txfm_size_col, txfm_size_col, -shift[2]);
+ txfm_func_row(buf + r * txfm_size_col, row_buffer, cos_bit_row,
+ stage_range_row);
+ av1_round_shift_array(row_buffer, txfm_size_col, -shift[2]);
if (abs(rect_type) == 1) {
// Multiply everything by Sqrt2 if the transform is rectangular and the
// size difference is a factor of 2.
for (c = 0; c < txfm_size_col; ++c) {
- output[r * txfm_size_col + c] = round_shift(
- (int64_t)output[r * txfm_size_col + c] * NewSqrt2, NewSqrt2Bits);
+ row_buffer[c] =
+ round_shift((int64_t)row_buffer[c] * NewSqrt2, NewSqrt2Bits);
}
}
+ for (c = 0; c < txfm_size_col; ++c) {
+ output[c * txfm_size_row + r] = row_buffer[c];
+ }
}
}
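
The C reference change above mirrors the NEON one: row results are staged in row_buffer and then written transposed, output[c * txfm_size_row + r], so the coefficient buffer comes out column-major and no separate output transpose pass is needed. The store step in isolation:

    static inline void store_row_transposed(int32_t *output, const int32_t *row,
                                            int r, int rows, int cols) {
      for (int c = 0; c < cols; ++c) output[c * rows + r] = row[c];
    }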
@@ -241,14 +246,14 @@
fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
// Zero out top-right 32x32 area.
- for (int row = 0; row < 32; ++row) {
- memset(output + row * 64 + 32, 0, 32 * sizeof(*output));
+ for (int col = 0; col < 32; ++col) {
+ memset(output + col * 64 + 32, 0, 32 * sizeof(*output));
}
// Zero out the bottom 64x32 area.
memset(output + 32 * 64, 0, 32 * 64 * sizeof(*output));
// Re-pack non-zero coeffs in the first 32x32 indices.
- for (int row = 1; row < 32; ++row) {
- memcpy(output + row * 32, output + row * 64, 32 * sizeof(*output));
+ for (int col = 1; col < 32; ++col) {
+ memcpy(output + col * 32, output + col * 64, 32 * sizeof(*output));
}
}
@@ -258,9 +263,14 @@
TXFM_2D_FLIP_CFG cfg;
av1_get_fwd_txfm_cfg(tx_type, TX_32X64, &cfg);
fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
- // Zero out the bottom 32x32 area.
- memset(output + 32 * 32, 0, 32 * 32 * sizeof(*output));
- // Note: no repacking needed here.
+ // Zero out right 32x32 area.
+ for (int col = 0; col < 32; ++col) {
+ memset(output + col * 64 + 32, 0, 32 * sizeof(*output));
+ }
+ // Re-pack non-zero coeffs in the first 32x32 indices.
+ for (int col = 1; col < 32; ++col) {
+ memcpy(output + col * 32, output + col * 64, 32 * sizeof(*output));
+ }
}
void av1_fwd_txfm2d_64x32_c(const int16_t *input, int32_t *output, int stride,
@@ -269,15 +279,9 @@
TXFM_2D_FLIP_CFG cfg;
av1_get_fwd_txfm_cfg(tx_type, TX_64X32, &cfg);
fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
-
- // Zero out right 32x32 area.
- for (int row = 0; row < 32; ++row) {
- memset(output + row * 64 + 32, 0, 32 * sizeof(*output));
- }
- // Re-pack non-zero coeffs in the first 32x32 indices.
- for (int row = 1; row < 32; ++row) {
- memcpy(output + row * 32, output + row * 64, 32 * sizeof(*output));
- }
+ // Zero out the bottom 32x32 area.
+ memset(output + 32 * 32, 0, 32 * 32 * sizeof(*output));
+ // Note: no repacking needed here.
}
void av1_fwd_txfm2d_16x64_c(const int16_t *input, int32_t *output, int stride,
@@ -286,17 +290,6 @@
TXFM_2D_FLIP_CFG cfg;
av1_get_fwd_txfm_cfg(tx_type, TX_16X64, &cfg);
fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
- // Zero out the bottom 16x32 area.
- memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
- // Note: no repacking needed here.
-}
-
-void av1_fwd_txfm2d_64x16_c(const int16_t *input, int32_t *output, int stride,
- TX_TYPE tx_type, int bd) {
- int32_t txfm_buf[64 * 16];
- TXFM_2D_FLIP_CFG cfg;
- av1_get_fwd_txfm_cfg(tx_type, TX_64X16, &cfg);
- fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
// Zero out right 32x16 area.
for (int row = 0; row < 16; ++row) {
memset(output + row * 64 + 32, 0, 32 * sizeof(*output));
@@ -307,6 +300,17 @@
}
}
+void av1_fwd_txfm2d_64x16_c(const int16_t *input, int32_t *output, int stride,
+ TX_TYPE tx_type, int bd) {
+ int32_t txfm_buf[64 * 16];
+ TXFM_2D_FLIP_CFG cfg;
+ av1_get_fwd_txfm_cfg(tx_type, TX_64X16, &cfg);
+ fwd_txfm2d_c(input, output, stride, &cfg, txfm_buf, bd);
+ // Zero out the bottom 16x32 area.
+ memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
+ // Note: no repacking needed here.
+}
+
static const int8_t fwd_shift_4x4[3] = { 2, 0, 0 };
static const int8_t fwd_shift_8x8[3] = { 2, -1, 0 };
static const int8_t fwd_shift_16x16[3] = { 2, -2, 0 };
@@ -369,16 +373,6 @@
static const int8_t fidtx16_range_mult2[1] = { 3 };
static const int8_t fidtx32_range_mult2[1] = { 4 };
-#if 0
-const int8_t fwd_idtx_range_row[MAX_TXWH_IDX /*txw_idx*/]
- [MAX_TXWH_IDX /*txh_idx*/] = { { 2, 4, 5, 0, 0 },
- { 3, 4, 5, 6, 0 },
- { 4, 5, 6, 7, 8 },
- { 0, 5, 6, 7, 8 },
- { 0, 0, 7, 8,
- 9 } };
-#endif
-
static const int8_t *fwd_txfm_range_mult2_list[TXFM_TYPES] = {
fdct4_range_mult2, fdct8_range_mult2, fdct16_range_mult2,
fdct32_range_mult2, fdct64_range_mult2, fadst4_range_mult2,
@@ -390,22 +384,20 @@
av1_zero(cfg->stage_range_col);
av1_zero(cfg->stage_range_row);
- const int8_t *range_mult2_col = fwd_txfm_range_mult2_list[cfg->txfm_type_col];
- if (cfg->txfm_type_col != TXFM_TYPE_INVALID) {
- int stage_num_col = cfg->stage_num_col;
- for (int i = 0; i < stage_num_col; ++i)
- cfg->stage_range_col[i] = (range_mult2_col[i] + 1) >> 1;
- }
+ const int8_t *const range_mult2_col =
+ fwd_txfm_range_mult2_list[cfg->txfm_type_col];
+ const int stage_num_col = cfg->stage_num_col;
+ // i < MAX_TXFM_STAGE_NUM will quiet -Wstringop-overflow.
+ for (int i = 0; i < stage_num_col && i < MAX_TXFM_STAGE_NUM; ++i)
+ cfg->stage_range_col[i] = (range_mult2_col[i] + 1) >> 1;
- if (cfg->txfm_type_row != TXFM_TYPE_INVALID) {
- int stage_num_row = cfg->stage_num_row;
- const int8_t *range_mult2_row =
- fwd_txfm_range_mult2_list[cfg->txfm_type_row];
- for (int i = 0; i < stage_num_row; ++i) {
- cfg->stage_range_row[i] =
- (range_mult2_col[cfg->stage_num_col - 1] + range_mult2_row[i] + 1) >>
- 1;
- }
+ const int8_t *const range_mult2_row =
+ fwd_txfm_range_mult2_list[cfg->txfm_type_row];
+ const int stage_num_row = cfg->stage_num_row;
+ // i < MAX_TXFM_STAGE_NUM will quiet -Wstringop-overflow.
+ for (int i = 0; i < stage_num_row && i < MAX_TXFM_STAGE_NUM; ++i) {
+ cfg->stage_range_row[i] =
+ (range_mult2_col[stage_num_col - 1] + range_mult2_row[i] + 1) >> 1;
}
}
@@ -422,7 +414,9 @@
cfg->cos_bit_col = av1_fwd_cos_bit_col[txw_idx][txh_idx];
cfg->cos_bit_row = av1_fwd_cos_bit_row[txw_idx][txh_idx];
cfg->txfm_type_col = av1_txfm_type_ls[txh_idx][tx_type_1d_col];
+ assert(cfg->txfm_type_col != TXFM_TYPE_INVALID);
cfg->txfm_type_row = av1_txfm_type_ls[txw_idx][tx_type_1d_row];
+ assert(cfg->txfm_type_row != TXFM_TYPE_INVALID);
cfg->stage_num_col = av1_txfm_stage_num_list[cfg->txfm_type_col];
cfg->stage_num_row = av1_txfm_stage_num_list[cfg->txfm_type_row];
set_fwd_txfm_non_scale_range(cfg);
diff --git a/av1/encoder/av1_quantize.c b/av1/encoder/av1_quantize.c
index 97652cf..1aad473 100644
--- a/av1/encoder/av1_quantize.c
+++ b/av1/encoder/av1_quantize.c
@@ -673,15 +673,38 @@
}
}
+static INLINE bool deltaq_params_have_changed(
+ const DeltaQuantParams *prev_deltaq_params,
+ const CommonQuantParams *quant_params) {
+ return (prev_deltaq_params->y_dc_delta_q != quant_params->y_dc_delta_q ||
+ prev_deltaq_params->u_dc_delta_q != quant_params->u_dc_delta_q ||
+ prev_deltaq_params->v_dc_delta_q != quant_params->v_dc_delta_q ||
+ prev_deltaq_params->u_ac_delta_q != quant_params->u_ac_delta_q ||
+ prev_deltaq_params->v_ac_delta_q != quant_params->v_ac_delta_q);
+}
+
void av1_init_quantizer(EncQuantDequantParams *const enc_quant_dequant_params,
const CommonQuantParams *quant_params,
aom_bit_depth_t bit_depth) {
+ DeltaQuantParams *const prev_deltaq_params =
+ &enc_quant_dequant_params->prev_deltaq_params;
+
+ // Re-initialize the quantizer only if any of the dc/ac deltaq parameters
+ // change.
+ if (!deltaq_params_have_changed(prev_deltaq_params, quant_params)) return;
QUANTS *const quants = &enc_quant_dequant_params->quants;
Dequants *const dequants = &enc_quant_dequant_params->dequants;
av1_build_quantizer(bit_depth, quant_params->y_dc_delta_q,
quant_params->u_dc_delta_q, quant_params->u_ac_delta_q,
quant_params->v_dc_delta_q, quant_params->v_ac_delta_q,
quants, dequants);
+
+ // Record the state of deltaq parameters.
+ prev_deltaq_params->y_dc_delta_q = quant_params->y_dc_delta_q;
+ prev_deltaq_params->u_dc_delta_q = quant_params->u_dc_delta_q;
+ prev_deltaq_params->v_dc_delta_q = quant_params->v_dc_delta_q;
+ prev_deltaq_params->u_ac_delta_q = quant_params->u_ac_delta_q;
+ prev_deltaq_params->v_ac_delta_q = quant_params->v_ac_delta_q;
}
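
The early return added above turns av1_init_quantizer into a cached operation: the full quantizer tables are rebuilt only when one of the five deltaq values differs from the recorded state. The pattern, reduced to a hypothetical two-field struct:

    typedef struct { int a, b; } Params;

    static void maybe_rebuild(Params *prev, const Params *cur) {
      if (prev->a == cur->a && prev->b == cur->b) return;  // nothing changed
      /* ... expensive rebuild using *cur ... */
      *prev = *cur;  // record state for the next call
    }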
void av1_set_q_index(const EncQuantDequantParams *enc_quant_dequant_params,
diff --git a/av1/encoder/av1_quantize.h b/av1/encoder/av1_quantize.h
index 701e4cf..0409733 100644
--- a/av1/encoder/av1_quantize.h
+++ b/av1/encoder/av1_quantize.h
@@ -81,11 +81,24 @@
v_dequant_QTX[QINDEX_RANGE][8]); // 8: SIMD width
} Dequants;
+// The DeltaQuantParams structure holds the dc/ac deltaq parameters.
+typedef struct {
+ int y_dc_delta_q;
+ int u_dc_delta_q;
+ int u_ac_delta_q;
+ int v_dc_delta_q;
+ int v_ac_delta_q;
+} DeltaQuantParams;
+
typedef struct {
// Quantization parameters for internal quantizer setup.
QUANTS quants;
// Dequantization parameters for internal quantizer setup.
Dequants dequants;
+ // Deltaq parameters to track the state of the dc/ac deltaq parameters in
+ // cm->quant_params. It is used to decide whether the quantizer tables need
+ // to be re-initialized.
+ DeltaQuantParams prev_deltaq_params;
} EncQuantDequantParams;
struct AV1_COMP;
diff --git a/av1/encoder/av1_temporal_denoiser.c b/av1/encoder/av1_temporal_denoiser.c
index 87ae763..3012df6 100644
--- a/av1/encoder/av1_temporal_denoiser.c
+++ b/av1/encoder/av1_temporal_denoiser.c
@@ -489,7 +489,7 @@
&denoiser->running_avg_y[fb_idx], cm->width, cm->height,
cm->seq_params->subsampling_x, cm->seq_params->subsampling_y,
cm->seq_params->use_highbitdepth, AOM_BORDER_IN_PIXELS,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
if (fail) {
av1_denoiser_free(denoiser);
return 1;
@@ -577,7 +577,7 @@
fail = aom_alloc_frame_buffer(
&denoiser->running_avg_y[i + denoiser->num_ref_frames * layer],
denoise_width, denoise_height, ssx, ssy, use_highbitdepth, border,
- legacy_byte_alignment, 0);
+ legacy_byte_alignment, 0, 0);
if (fail) {
av1_denoiser_free(denoiser);
return 1;
@@ -589,7 +589,7 @@
fail = aom_alloc_frame_buffer(
&denoiser->mc_running_avg_y[layer], denoise_width, denoise_height, ssx,
- ssy, use_highbitdepth, border, legacy_byte_alignment, 0);
+ ssy, use_highbitdepth, border, legacy_byte_alignment, 0, 0);
if (fail) {
av1_denoiser_free(denoiser);
return 1;
@@ -600,7 +600,7 @@
// layer.
fail = aom_alloc_frame_buffer(&denoiser->last_source, width, height, ssx, ssy,
use_highbitdepth, border, legacy_byte_alignment,
- 0);
+ 0, 0);
if (fail) {
av1_denoiser_free(denoiser);
return 1;
diff --git a/av1/encoder/bitstream.c b/av1/encoder/bitstream.c
index 4f85307..39aa027 100644
--- a/av1/encoder/bitstream.c
+++ b/av1/encoder/bitstream.c
@@ -2795,7 +2795,7 @@
}
// Check whether all references are distinct frames.
- const RefCntBuffer *seen_bufs[FRAME_BUFFERS] = { NULL };
+ const RefCntBuffer *seen_bufs[INTER_REFS_PER_FRAME] = { NULL };
int num_refs = 0;
for (int ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ++ref_frame) {
const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
diff --git a/av1/encoder/block.h b/av1/encoder/block.h
index 4185798..360b9d4 100644
--- a/av1/encoder/block.h
+++ b/av1/encoder/block.h
@@ -42,6 +42,35 @@
/*! Maximum value taken by transform type probabilities */
#define MAX_TX_TYPE_PROB 1024
+
+//! Compute color sensitivity index for given plane
+#define COLOR_SENS_IDX(plane) ((plane)-1)
+
+//! Enable timer statistics of mode search in non-rd
+#define COLLECT_NONRD_PICK_MODE_STAT 0
+
+/*!\cond */
+#if COLLECT_NONRD_PICK_MODE_STAT
+#include "aom_ports/aom_timer.h"
+
+typedef struct _mode_search_stat_nonrd {
+ int32_t num_blocks[BLOCK_SIZES];
+ int64_t total_block_times[BLOCK_SIZES];
+ int32_t num_searches[BLOCK_SIZES][MB_MODE_COUNT];
+ int32_t num_nonskipped_searches[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t search_times[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t nonskipped_search_times[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t ms_time[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t ifs_time[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t model_rd_time[BLOCK_SIZES][MB_MODE_COUNT];
+ int64_t txfm_time[BLOCK_SIZES][MB_MODE_COUNT];
+ struct aom_usec_timer timer1;
+ struct aom_usec_timer timer2;
+ struct aom_usec_timer bsize_timer;
+} mode_search_stat_nonrd;
+#endif // COLLECT_NONRD_PICK_MODE_STAT
+/*!\endcond */
+
/*! \brief Superblock level encoder info
*
* SuperblockEnc stores superblock level information used by the encoder for
@@ -1286,11 +1315,13 @@
* Used in REALTIME coding mode to enhance the visual quality at the boundary
* of moving color objects.
*/
- uint8_t color_sensitivity_sb[2];
+ uint8_t color_sensitivity_sb[MAX_MB_PLANE - 1];
//! Color sensitivity flag for the superblock for golden reference.
- uint8_t color_sensitivity_sb_g[2];
+ uint8_t color_sensitivity_sb_g[MAX_MB_PLANE - 1];
+ //! Color sensitivity flag for the superblock for altref reference.
+ uint8_t color_sensitivity_sb_alt[MAX_MB_PLANE - 1];
//! Color sensitivity flag for the coding block.
- uint8_t color_sensitivity[2];
+ uint8_t color_sensitivity[MAX_MB_PLANE - 1];
/**@}*/
/*****************************************************************************
@@ -1326,6 +1357,15 @@
/*! \brief A hash to make sure av1_set_offsets is called */
SetOffsetsLoc last_set_offsets_loc;
#endif // NDEBUG
+
+#if COLLECT_NONRD_PICK_MODE_STAT
+ mode_search_stat_nonrd ms_stat_nonrd;
+#endif // COLLECT_NONRD_PICK_MODE_STAT
+
+  /*!\brief Number of pixels in the current thread that choose palette mode
+   * in the fast encoding stage for screen content tool determination.
+ */
+ int palette_pixels;
} MACROBLOCK;
#undef SINGLE_REF_MODES
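
A short usage sketch of the new indexing convention: the color sensitivity arrays hold one flag per chroma plane, and COLOR_SENS_IDX maps the AOM plane ids (AOM_PLANE_U == 1, AOM_PLANE_V == 2) onto indices 0 and 1:

    // Sketch only: mark both chroma planes of the current block as color
    // sensitive for a MACROBLOCK *x.
    for (int plane = AOM_PLANE_U; plane <= AOM_PLANE_V; ++plane) {
      x->color_sensitivity[COLOR_SENS_IDX(plane)] = 1;
    }
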
diff --git a/av1/encoder/cnn.c b/av1/encoder/cnn.c
index 639922f..28e1f71 100644
--- a/av1/encoder/cnn.c
+++ b/av1/encoder/cnn.c
@@ -1193,40 +1193,3 @@
aom_free(input_);
return success;
}
-
-// Assume output already has proper allocation
-// Assume input image buffers all have same resolution and strides
-bool av1_cnn_predict_img(uint8_t **dgd, int width, int height, int stride,
- const CNN_CONFIG *cnn_config,
- const CNN_THREAD_DATA *thread_data, float **output,
- int out_stride) {
- int out_width = 0, out_height = 0, out_channels = 0;
- av1_find_cnn_output_size(width, height, cnn_config, &out_width, &out_height,
- &out_channels);
- const int output_chs[1] = { out_channels };
- const int output_strides[1] = { out_stride };
- CNN_MULTI_OUT output_struct = { .output_channels = output_chs,
- .output_strides = output_strides,
- .output_buffer = output };
- return av1_cnn_predict_img_multi_out(dgd, width, height, stride, cnn_config,
- thread_data, &output_struct);
-}
-
-// Assume output already has proper allocation
-// Assume input image buffers all have same resolution and strides
-bool av1_cnn_predict_img_highbd(uint16_t **dgd, int width, int height,
- int stride, const CNN_CONFIG *cnn_config,
- const CNN_THREAD_DATA *thread_data,
- int bit_depth, float **output, int out_stride) {
- int out_width = 0, out_height = 0, out_channels = 0;
- av1_find_cnn_output_size(width, height, cnn_config, &out_width, &out_height,
- &out_channels);
- const int output_chs[1] = { out_channels };
- const int output_strides[1] = { out_stride };
- CNN_MULTI_OUT output_struct = { .output_channels = output_chs,
- .output_strides = output_strides,
- .output_buffer = output };
- return av1_cnn_predict_img_multi_out_highbd(dgd, width, height, stride,
- cnn_config, thread_data,
- bit_depth, &output_struct);
-}
diff --git a/av1/encoder/cnn.h b/av1/encoder/cnn.h
index 1a6c03a..df6401f 100644
--- a/av1/encoder/cnn.h
+++ b/av1/encoder/cnn.h
@@ -9,8 +9,8 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
-#ifndef AOM_AV1_COMMON_CNN_H_
-#define AOM_AV1_COMMON_CNN_H_
+#ifndef AOM_AV1_ENCODER_CNN_H_
+#define AOM_AV1_ENCODER_CNN_H_
#ifdef __cplusplus
extern "C" {
@@ -184,20 +184,8 @@
const CNN_CONFIG *cnn_config,
const CNN_THREAD_DATA *thread_data,
int bit_depth, CNN_MULTI_OUT *output);
-
-// Prediction functions from set of input image buffers. This function only
-// supports a single output.
-bool av1_cnn_predict_img(uint8_t **dgd, int width, int height, int stride,
- const CNN_CONFIG *cnn_config,
- const CNN_THREAD_DATA *thread_data, float **output,
- int out_stride);
-bool av1_cnn_predict_img_highbd(uint16_t **dgd, int width, int height,
- int stride, const CNN_CONFIG *cnn_config,
- const CNN_THREAD_DATA *thread_data,
- int bit_depth, float **output, int out_stride);
-
#ifdef __cplusplus
} // extern "C"
#endif
-#endif // AOM_AV1_COMMON_CNN_H_
+#endif // AOM_AV1_ENCODER_CNN_H_
diff --git a/av1/encoder/compound_type.c b/av1/encoder/compound_type.c
index 39c505d..1992f23 100644
--- a/av1/encoder/compound_type.c
+++ b/av1/encoder/compound_type.c
@@ -1023,12 +1023,15 @@
const BLOCK_SIZE bsize,
int64_t ref_skip_rd, int mode_rate) {
int eval_txfm = 1;
+ const int txfm_rd_gate_level =
+ get_txfm_rd_gate_level(cpi->sf.inter_sf.txfm_rd_gate_level, bsize,
+ TX_SEARCH_DEFAULT, /*eval_motion_mode=*/0);
// Check if the mode is good enough based on skip rd
- if (cpi->sf.inter_sf.txfm_rd_gate_level) {
+ if (txfm_rd_gate_level) {
int64_t sse_y = compute_sse_plane(x, xd, PLANE_TYPE_Y, bsize);
int64_t skip_rd = RDCOST(x->rdmult, mode_rate, (sse_y << 4));
- eval_txfm = check_txfm_eval(x, bsize, ref_skip_rd, skip_rd,
- cpi->sf.inter_sf.txfm_rd_gate_level, 1);
+ eval_txfm =
+ check_txfm_eval(x, bsize, ref_skip_rd, skip_rd, txfm_rd_gate_level, 1);
}
return eval_txfm;
}
@@ -1104,9 +1107,12 @@
// Check if the mode is good enough based on skip rd
// TODO(nithya): Handle wedge_newmv_search if extending for lower speed
// setting
- if (cpi->sf.inter_sf.txfm_rd_gate_level) {
+ const int txfm_rd_gate_level =
+ get_txfm_rd_gate_level(cpi->sf.inter_sf.txfm_rd_gate_level, bsize,
+ TX_SEARCH_DEFAULT, /*eval_motion_mode=*/0);
+ if (txfm_rd_gate_level) {
int eval_txfm = check_txfm_eval(x, bsize, ref_skip_rd, skip_rd_cur,
- cpi->sf.inter_sf.txfm_rd_gate_level, 1);
+ txfm_rd_gate_level, 1);
if (!eval_txfm) {
*comp_model_rd_cur = INT64_MAX;
return INT64_MAX;
@@ -1300,9 +1306,18 @@
int64_t mode_rd = RDCOST(x->rdmult, rs2 + rd_stats->rate, 0);
if (mode_rd >= ref_best_rd) continue;
+  // Derive the flags to enable/disable the MV refinement process.
+ const int enable_fast_compound_mode_search =
+ cpi->sf.inter_sf.enable_fast_compound_mode_search;
+ const bool skip_mv_refinement_for_avg_distwtd =
+ enable_fast_compound_mode_search == 3 ||
+ (enable_fast_compound_mode_search == 2 && (this_mode != NEW_NEWMV));
+ const bool skip_mv_refinement_for_diffwtd =
+ (!enable_fast_compound_mode_search && cur_type == COMPOUND_DIFFWTD);
+
// Case COMPOUND_AVERAGE and COMPOUND_DISTWTD
if (cur_type < COMPOUND_WEDGE) {
- if (cpi->sf.inter_sf.enable_fast_compound_mode_search == 2) {
+ if (skip_mv_refinement_for_avg_distwtd) {
int rate_sum;
uint8_t tmp_skip_txfm_sb;
int64_t dist_sum, tmp_skip_sse_sb;
@@ -1514,8 +1529,7 @@
mbmi->mv[1] = tmp_mv[1];
tmp_rate_mv = best_rate_mv;
rs2 = best_rs2;
- } else if (!cpi->sf.inter_sf.enable_fast_compound_mode_search &&
- cur_type == COMPOUND_DIFFWTD) {
+ } else if (skip_mv_refinement_for_diffwtd) {
int_mv tmp_mv[2];
int best_mask_index = 0;
rs2 += get_interinter_compound_mask_rate(&x->mode_costs, mbmi);
@@ -1597,20 +1611,24 @@
mbmi->mv[1] = tmp_mv[1];
} else {
// Handle masked compound types
- // Factors to control gating of compound type selection based on best
- // approximate rd so far
- const int max_comp_type_rd_threshold_mul =
- comp_type_rd_threshold_mul[cpi->sf.inter_sf
- .prune_comp_type_by_comp_avg];
- const int max_comp_type_rd_threshold_div =
- comp_type_rd_threshold_div[cpi->sf.inter_sf
- .prune_comp_type_by_comp_avg];
- // Evaluate COMPOUND_WEDGE / COMPOUND_DIFFWTD if approximated cost is
- // within threshold
- int64_t approx_rd = ((*rd / max_comp_type_rd_threshold_div) *
- max_comp_type_rd_threshold_mul);
+ bool eval_masked_comp_type = true;
+ if (*rd != INT64_MAX) {
+ // Factors to control gating of compound type selection based on best
+ // approximate rd so far
+ const int max_comp_type_rd_threshold_mul =
+ comp_type_rd_threshold_mul[cpi->sf.inter_sf
+ .prune_comp_type_by_comp_avg];
+ const int max_comp_type_rd_threshold_div =
+ comp_type_rd_threshold_div[cpi->sf.inter_sf
+ .prune_comp_type_by_comp_avg];
+ // Evaluate COMPOUND_WEDGE / COMPOUND_DIFFWTD if approximated cost is
+ // within threshold
+ const int64_t approx_rd = ((*rd / max_comp_type_rd_threshold_div) *
+ max_comp_type_rd_threshold_mul);
+ if (approx_rd >= ref_best_rd) eval_masked_comp_type = false;
+ }
- if (approx_rd < ref_best_rd) {
+ if (eval_masked_comp_type) {
const int64_t tmp_rd_thresh = AOMMIN(*rd, rd_thresh);
best_rd_cur = masked_compound_type_rd(
cpi, x, cur_mv, bsize, this_mode, &rs2, *rate_mv, orig_dst,
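
The new eval_masked_comp_type guard matters because *rd holds the INT64_MAX sentinel until some compound mode has produced a real rd cost, and scaling that sentinel can overflow. A minimal illustration (the 8 and 11 here are illustrative, not the actual threshold-table contents):

    #include <stdint.h>
    // Signed overflow is undefined behavior in C: with rd == INT64_MAX and
    // any mul > div, the old expression could wrap instead of gating.
    const int64_t rd = INT64_MAX;
    const int64_t approx_rd = (rd / 8) * 11;  // exceeds INT64_MAX
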
diff --git a/av1/encoder/context_tree.c b/av1/encoder/context_tree.c
index f328745..2bd2d7f 100644
--- a/av1/encoder/context_tree.c
+++ b/av1/encoder/context_tree.c
@@ -12,6 +12,7 @@
#include "av1/encoder/context_tree.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/rd.h"
+#include <assert.h>
void av1_copy_tree_context(PICK_MODE_CONTEXT *dst_ctx,
PICK_MODE_CONTEXT *src_ctx) {
@@ -150,36 +151,11 @@
}
PC_TREE *av1_alloc_pc_tree_node(BLOCK_SIZE bsize) {
- PC_TREE *pc_tree = NULL;
- struct aom_internal_error_info error;
-
- AOM_CHECK_MEM_ERROR(&error, pc_tree, aom_calloc(1, sizeof(*pc_tree)));
+ PC_TREE *pc_tree = aom_calloc(1, sizeof(*pc_tree));
+ if (pc_tree == NULL) return NULL;
pc_tree->partitioning = PARTITION_NONE;
pc_tree->block_size = bsize;
- pc_tree->index = 0;
-
- pc_tree->none = NULL;
- for (int i = 0; i < 2; ++i) {
- pc_tree->horizontal[i] = NULL;
- pc_tree->vertical[i] = NULL;
- }
-
-#if !CONFIG_REALTIME_ONLY
- for (int i = 0; i < 3; ++i) {
- pc_tree->horizontala[i] = NULL;
- pc_tree->horizontalb[i] = NULL;
- pc_tree->verticala[i] = NULL;
- pc_tree->verticalb[i] = NULL;
- }
- for (int i = 0; i < 4; ++i) {
- pc_tree->horizontal4[i] = NULL;
- pc_tree->vertical4[i] = NULL;
- }
-#endif
- for (int i = 0; i < 4; ++i) {
- pc_tree->split[i] = NULL;
- }
return pc_tree;
}
@@ -191,9 +167,45 @@
} while (0)
void av1_free_pc_tree_recursive(PC_TREE *pc_tree, int num_planes, int keep_best,
- int keep_none) {
+ int keep_none,
+ PARTITION_SEARCH_TYPE partition_search_type) {
if (pc_tree == NULL) return;
+ // Avoid freeing of extended partitions as they are not supported when
+ // partition_search_type is VAR_BASED_PARTITION.
+ if (partition_search_type == VAR_BASED_PARTITION && !keep_best &&
+ !keep_none) {
+ FREE_PMC_NODE(pc_tree->none);
+
+ for (int i = 0; i < 2; ++i) {
+ FREE_PMC_NODE(pc_tree->horizontal[i]);
+ FREE_PMC_NODE(pc_tree->vertical[i]);
+ }
+
+#if !defined(NDEBUG) && !CONFIG_REALTIME_ONLY
+ for (int i = 0; i < 3; ++i) {
+ assert(pc_tree->horizontala[i] == NULL);
+ assert(pc_tree->horizontalb[i] == NULL);
+ assert(pc_tree->verticala[i] == NULL);
+ assert(pc_tree->verticalb[i] == NULL);
+ }
+ for (int i = 0; i < 4; ++i) {
+ assert(pc_tree->horizontal4[i] == NULL);
+ assert(pc_tree->vertical4[i] == NULL);
+ }
+#endif
+
+ for (int i = 0; i < 4; ++i) {
+ if (pc_tree->split[i] != NULL) {
+ av1_free_pc_tree_recursive(pc_tree->split[i], num_planes, 0, 0,
+ partition_search_type);
+ pc_tree->split[i] = NULL;
+ }
+ }
+ aom_free(pc_tree);
+ return;
+ }
+
const PARTITION_TYPE partition = pc_tree->partitioning;
if (!keep_none && (!keep_best || (partition != PARTITION_NONE)))
@@ -226,7 +238,8 @@
if (!keep_best || (partition != PARTITION_SPLIT)) {
for (int i = 0; i < 4; ++i) {
if (pc_tree->split[i] != NULL) {
- av1_free_pc_tree_recursive(pc_tree->split[i], num_planes, 0, 0);
+ av1_free_pc_tree_recursive(pc_tree->split[i], num_planes, 0, 0,
+ partition_search_type);
pc_tree->split[i] = NULL;
}
}
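
A sketch of the caller contract that follows from these changes: av1_alloc_pc_tree_node() now reports allocation failure by returning NULL rather than aborting internally, and every free passes the active partition search type (the error handling shown is illustrative):

    PC_TREE *pc_root = av1_alloc_pc_tree_node(sb_size);
    if (pc_root == NULL) {
      aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
                         "Failed to allocate PC_TREE");
    }
    // ... partition search ...
    av1_free_pc_tree_recursive(pc_root, av1_num_planes(cm), /*keep_best=*/0,
                               /*keep_none=*/0,
                               cpi->sf.part_sf.partition_search_type);
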
diff --git a/av1/encoder/context_tree.h b/av1/encoder/context_tree.h
index 413535d..78f2076 100644
--- a/av1/encoder/context_tree.h
+++ b/av1/encoder/context_tree.h
@@ -16,6 +16,7 @@
#include "av1/common/blockd.h"
#include "av1/encoder/block.h"
+#include "av1/encoder/speed_features.h"
#ifdef __cplusplus
extern "C" {
@@ -107,7 +108,8 @@
PC_TREE *av1_alloc_pc_tree_node(BLOCK_SIZE bsize);
void av1_free_pc_tree_recursive(PC_TREE *tree, int num_planes, int keep_best,
- int keep_none);
+ int keep_none,
+ PARTITION_SEARCH_TYPE partition_search_type);
PICK_MODE_CONTEXT *av1_alloc_pmc(const struct AV1_COMP *const cpi,
BLOCK_SIZE bsize,
diff --git a/av1/encoder/dwt.c b/av1/encoder/dwt.c
index 5dfbcb6..2fab99d 100644
--- a/av1/encoder/dwt.c
+++ b/av1/encoder/dwt.c
@@ -114,7 +114,7 @@
dyadic_analyze_53_uint8_input(4, 8, 8, input, stride, output, 8, 2, hbd);
}
-int av1_haar_ac_sad(const tran_low_t *output, int bw, int bh, int stride) {
+static int haar_ac_sad(const tran_low_t *output, int bw, int bh, int stride) {
int acsad = 0;
for (int r = 0; r < bh; ++r)
@@ -124,35 +124,12 @@
return acsad;
}
-uint64_t av1_dct_ac_sad(tran_low_t *output, int bw, int bh, int stride) {
- uint64_t acsad = 0;
-
- for (int r = 0; r < bh; ++r)
- for (int c = 0; c < bw; ++c) {
- if (r > 0 || c > 0) acsad += abs(output[r * stride + c]);
- }
-
- return acsad;
-}
-
-uint32_t av1_variance(uint8_t *input, int bw, int bh, int stride) {
- int sum = 0;
- uint32_t sse = 0;
-
- for (int r = 0; r < bh; ++r)
- for (int c = 0; c < bw; ++c) {
- sum += input[r * stride + c];
- sse += input[r * stride + c] * input[r * stride + c];
- }
- return sse - (uint32_t)(((int64_t)sum * sum) / (bw * bh));
-}
-
static int haar_ac_sad_8x8_uint8_input(const uint8_t *input, int stride,
int hbd) {
tran_low_t output[64];
av1_fdwt8x8_uint8_input_c(input, output, stride, hbd);
- return av1_haar_ac_sad(output, 8, 8, 8);
+ return haar_ac_sad(output, 8, 8, 8);
}
int64_t av1_haar_ac_sad_mxn_uint8_input(const uint8_t *input, int stride,
diff --git a/av1/encoder/encode_strategy.c b/av1/encoder/encode_strategy.c
index f4c1ba3..90279b0 100644
--- a/av1/encoder/encode_strategy.c
+++ b/av1/encoder/encode_strategy.c
@@ -717,7 +717,7 @@
// to av1_encode() except that tpl is not performed.
static int denoise_and_encode(AV1_COMP *const cpi, uint8_t *const dest,
EncodeFrameInput *const frame_input,
- EncodeFrameParams *const frame_params,
+ const EncodeFrameParams *const frame_params,
EncodeFrameResults *const frame_results) {
#if CONFIG_COLLECT_COMPONENT_TIMING
if (cpi->oxcf.pass == 2) start_timing(cpi, denoise_and_encode_time);
@@ -744,9 +744,10 @@
!frame_params->show_existing_frame &&
!is_lossless_requested(&oxcf->rc_cfg);
if (allow_kf_filtering) {
- const double y_noise_level = av1_estimate_noise_from_single_plane(
- frame_input->source, 0, cm->seq_params->bit_depth,
- NOISE_ESTIMATION_EDGE_THRESHOLD);
+ double y_noise_level = 0.0;
+ av1_estimate_noise_level(
+ frame_input->source, &y_noise_level, AOM_PLANE_Y, AOM_PLANE_Y,
+ cm->seq_params->bit_depth, NOISE_ESTIMATION_EDGE_THRESHOLD);
apply_filtering = y_noise_level > 0;
} else {
apply_filtering = 0;
@@ -786,6 +787,8 @@
tf_buf, &frame_diff, q_index, cm->seq_params->bit_depth);
if (show_existing_alt_ref) {
cpi->common.showable_frame |= 1;
+ } else {
+ cpi->common.showable_frame = 0;
}
}
if (gf_group->frame_type[cpi->gf_frame_index] != KEY_FRAME) {
@@ -801,7 +804,7 @@
oxcf->frm_dim_cfg.height, cm->seq_params->subsampling_x,
cm->seq_params->subsampling_y, cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels, cm->features.byte_alignment, NULL, NULL,
- NULL, cpi->oxcf.tool_cfg.enable_global_motion, 0);
+ NULL, cpi->image_pyramid_levels, 0);
if (ret)
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate tf_buf_second_arf");
@@ -860,10 +863,7 @@
if (gf_group->size > MAX_LENGTH_TPL_FRAME_STATS) {
allow_tpl = 0;
}
- if (frame_params->frame_type == KEY_FRAME) {
- // TODO(angiebird): handle disable_filtered_key_tpl properly
- allow_tpl = allow_tpl && !cpi->sf.tpl_sf.disable_filtered_key_tpl;
- } else {
+ if (frame_params->frame_type != KEY_FRAME) {
// In rare case, it's possible to have non ARF/GF update_type here.
// We should set allow_tpl to zero in the situation
allow_tpl =
@@ -908,8 +908,7 @@
if (apply_filtering && is_psnr_calc_enabled(cpi)) {
cpi->source = av1_realloc_and_scale_if_required(
cm, source_buffer, &cpi->scaled_source, cm->features.interp_filter, 0,
- false, true, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ false, true, cpi->oxcf.border_in_pixels, cpi->image_pyramid_levels);
cpi->unscaled_source = source_buffer;
}
#if CONFIG_COLLECT_COMPONENT_TIMING
@@ -996,18 +995,30 @@
#if !CONFIG_REALTIME_ONLY
if (cpi->use_ducky_encode &&
cpi->ducky_encode_info.frame_info.gop_mode == DUCKY_ENCODE_GOP_MODE_RCL) {
- int valid_rf_idx = 0;
for (int rf = LAST_FRAME; rf < REF_FRAMES; ++rf) {
if (cpi->ppi->gf_group.ref_frame_list[gf_index][rf] != INVALID_IDX) {
remapped_ref_idx[rf - LAST_FRAME] =
cpi->ppi->gf_group.ref_frame_list[gf_index][rf];
+ }
+ }
+
+ int valid_rf_idx = 0;
+ static const int ref_frame_type_order[REF_FRAMES - LAST_FRAME] = {
+ GOLDEN_FRAME, ALTREF_FRAME, LAST_FRAME, BWDREF_FRAME,
+ ALTREF2_FRAME, LAST2_FRAME, LAST3_FRAME
+ };
+ for (int i = 0; i < REF_FRAMES - LAST_FRAME; i++) {
+ int rf = ref_frame_type_order[i];
+ if (remapped_ref_idx[rf - LAST_FRAME] != INVALID_IDX) {
valid_rf_idx = remapped_ref_idx[rf - LAST_FRAME];
+ break;
}
}
for (int i = 0; i < REF_FRAMES; ++i) {
- if (remapped_ref_idx[i] == INVALID_IDX)
+ if (remapped_ref_idx[i] == INVALID_IDX) {
remapped_ref_idx[i] = valid_rf_idx;
+ }
}
return;
@@ -1351,6 +1362,35 @@
}
frame_params.show_existing_frame &= allow_show_existing(cpi, *frame_flags);
+ // Special handling to reset 'show_existing_frame' in case of dropped
+ // frames.
+ if (oxcf->rc_cfg.drop_frames_water_mark &&
+ (gf_group->update_type[cpi->gf_frame_index] == OVERLAY_UPDATE ||
+ gf_group->update_type[cpi->gf_frame_index] == INTNL_OVERLAY_UPDATE)) {
+ // During the encode of an OVERLAY_UPDATE/INTNL_OVERLAY_UPDATE frame, loop
+ // over the gf group to check if the corresponding
+ // ARF_UPDATE/INTNL_ARF_UPDATE frame was dropped.
+ int cur_disp_idx = gf_group->display_idx[cpi->gf_frame_index];
+ for (int idx = 0; idx < cpi->gf_frame_index; idx++) {
+ if (cur_disp_idx == gf_group->display_idx[idx]) {
+ assert(IMPLIES(
+ gf_group->update_type[cpi->gf_frame_index] == OVERLAY_UPDATE,
+ gf_group->update_type[idx] == ARF_UPDATE));
+ assert(IMPLIES(gf_group->update_type[cpi->gf_frame_index] ==
+ INTNL_OVERLAY_UPDATE,
+ gf_group->update_type[idx] == INTNL_ARF_UPDATE));
+ // Reset show_existing_frame and set cpi->is_dropped_frame to true if
+ // the frame was dropped during its first encode.
+ if (gf_group->is_frame_dropped[idx]) {
+ frame_params.show_existing_frame = 0;
+ assert(!cpi->is_dropped_frame);
+ cpi->is_dropped_frame = true;
+ }
+ break;
+ }
+ }
+ }
+
// Reset show_existing_alt_ref decision to 0 after it is used.
if (gf_group->update_type[cpi->gf_frame_index] == OVERLAY_UPDATE) {
cpi->ppi->show_existing_alt_ref = 0;
@@ -1387,7 +1427,8 @@
// Source may be changed if temporal filtered later.
frame_input.source = &source->img;
- if (cpi->ppi->use_svc && last_source != NULL)
+ if ((cpi->ppi->use_svc || cpi->rc.prev_frame_is_dropped) &&
+ last_source != NULL)
av1_svc_set_last_source(cpi, &frame_input, &last_source->img);
else
frame_input.last_source = last_source != NULL ? &last_source->img : NULL;
@@ -1680,13 +1721,15 @@
is_frame_droppable(&cpi->ppi->rtc_ref, &ext_flags->refresh_frame);
}
- // For SVC: keep track of the (unscaled) source corresponding to the
- // refresh of LAST reference (base temporal layer- TL0). Copy only for the
+ // For SVC, or when frame-dropper is enabled:
+ // keep track of the (unscaled) source corresponding to the refresh of LAST
+ // reference (base temporal layer - TL0). Copy only for the
// top spatial enhancement layer so all spatial layers of the next
// superframe have last_source to be aligned with previous TL0 superframe.
// Avoid cases where resolution changes for unscaled source (top spatial
- // layer).
- if (cpi->ppi->use_svc &&
+  // layer). Only needs to be done for frames that are encoded (size > 0).
+ if (*size > 0 &&
+ (cpi->ppi->use_svc || cpi->oxcf.rc_cfg.drop_frames_water_mark > 0) &&
cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1 &&
cpi->svc.temporal_layer_id == 0 &&
cpi->unscaled_source->y_width == cpi->svc.source_last_TL0.y_width &&
diff --git a/av1/encoder/encodeframe.c b/av1/encoder/encodeframe.c
index 6700669..50f046d 100644
--- a/av1/encoder/encodeframe.c
+++ b/av1/encoder/encodeframe.c
@@ -81,7 +81,7 @@
// purposes of activity masking.
// Eventually this should be replaced by custom no-reference routines,
// which will be faster.
-const uint8_t AV1_VAR_OFFS[MAX_SB_SIZE] = {
+static const uint8_t AV1_VAR_OFFS[MAX_SB_SIZE] = {
128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128,
@@ -93,6 +93,7 @@
128, 128, 128, 128, 128, 128, 128, 128
};
+#if CONFIG_AV1_HIGHBITDEPTH
static const uint16_t AV1_HIGH_VAR_OFFS_8[MAX_SB_SIZE] = {
128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128,
@@ -145,8 +146,31 @@
128 * 16, 128 * 16, 128 * 16, 128 * 16, 128 * 16, 128 * 16, 128 * 16,
128 * 16, 128 * 16
};
+#endif // CONFIG_AV1_HIGHBITDEPTH
/*!\endcond */
+// For the given bit depth, returns a constant array used to assist the
+// calculation of source block variance, which will then be used to decide
+// adaptive quantizers.
+static const uint8_t *get_var_offs(int use_hbd, int bd) {
+#if CONFIG_AV1_HIGHBITDEPTH
+ if (use_hbd) {
+ assert(bd == 8 || bd == 10 || bd == 12);
+ const int off_index = (bd - 8) >> 1;
+ static const uint16_t *high_var_offs[3] = { AV1_HIGH_VAR_OFFS_8,
+ AV1_HIGH_VAR_OFFS_10,
+ AV1_HIGH_VAR_OFFS_12 };
+ return CONVERT_TO_BYTEPTR(high_var_offs[off_index]);
+ }
+#else
+ (void)use_hbd;
+ (void)bd;
+ assert(!use_hbd);
+#endif
+ assert(bd == 8);
+ return AV1_VAR_OFFS;
+}
+
void av1_init_rtc_counters(MACROBLOCK *const x) {
av1_init_cyclic_refresh_counters(x);
x->cnt_zeromv = 0;
@@ -167,21 +191,9 @@
const int subsampling_y = xd->plane[plane].subsampling_y;
const BLOCK_SIZE plane_bsize =
get_plane_block_size(bsize, subsampling_x, subsampling_y);
- unsigned int var, sse;
- if (use_hbd) {
- const int bd = xd->bd;
- assert(bd == 8 || bd == 10 || bd == 12);
- const int off_index = (bd - 8) >> 1;
- static const uint16_t *high_var_offs[3] = { AV1_HIGH_VAR_OFFS_8,
- AV1_HIGH_VAR_OFFS_10,
- AV1_HIGH_VAR_OFFS_12 };
- var = cpi->ppi->fn_ptr[plane_bsize].vf(
- ref->buf, ref->stride, CONVERT_TO_BYTEPTR(high_var_offs[off_index]), 0,
- &sse);
- } else {
- var = cpi->ppi->fn_ptr[plane_bsize].vf(ref->buf, ref->stride, AV1_VAR_OFFS,
- 0, &sse);
- }
+ unsigned int sse;
+ const unsigned int var = cpi->ppi->fn_ptr[plane_bsize].vf(
+ ref->buf, ref->stride, get_var_offs(use_hbd, xd->bd), 0, &sse);
return ROUND_POWER_OF_TWO(var, num_pels_log2_lookup[plane_bsize]);
}
@@ -247,7 +259,7 @@
const int sb_row = mi_row >> cm->seq_params->mib_size_log2;
const int sb_col = mi_col >> cm->seq_params->mib_size_log2;
const int sb_cols =
- CEIL_POWER_OF_TWO(cm->mi_params.mi_cols, MAX_MIB_SIZE_LOG2);
+ CEIL_POWER_OF_TWO(cm->mi_params.mi_cols, cm->seq_params->mib_size_log2);
const int sb_index = sb_row * sb_cols + sb_col;
current_qindex =
cpi->ducky_encode_info.frame_info.superblock_encode_qindex[sb_index];
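
This bound fix matters when the sequence uses 64x64 superblocks, since MAX_MIB_SIZE_LOG2 is the 128x128 value and the old expression under-counted superblock columns, mis-addressing superblock_encode_qindex. A worked example in mi units (each mi unit is 4x4 pixels):

    // CEIL_POWER_OF_TWO(n, k) computes ceil(n / 2^k). A 64x64 superblock
    // spans 16 mi units, so mib_size_log2 == 4, while MAX_MIB_SIZE_LOG2 == 5
    // assumes 128x128. For a 1280-pixel-wide frame (mi_cols == 320):
    //   ceil(320 / 32) == 10  // stale bound: too few superblock columns
    //   ceil(320 / 16) == 20  // matches the actual 1280 / 64 columns
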
@@ -599,7 +611,7 @@
}
// TODO(jingning): revisit this function.
- if (cpi->oxcf.algo_cfg.enable_tpl_model && 0) {
+ if (cpi->oxcf.algo_cfg.enable_tpl_model && (0)) {
adjust_rdmult_tpl_model(cpi, x, mi_row, mi_col);
}
}
@@ -778,7 +790,8 @@
PC_TREE *const pc_root = av1_alloc_pc_tree_node(sb_size);
av1_rd_use_partition(cpi, td, tile_data, mi, tp, mi_row, mi_col, sb_size,
&dummy_rate, &dummy_dist, 1, pc_root);
- av1_free_pc_tree_recursive(pc_root, num_planes, 0, 0);
+ av1_free_pc_tree_recursive(pc_root, num_planes, 0, 0,
+ sf->part_sf.partition_search_type);
#if CONFIG_COLLECT_COMPONENT_TIMING
end_timing(cpi, rd_use_partition_time);
#endif
@@ -793,7 +806,8 @@
PC_TREE *const pc_root = av1_alloc_pc_tree_node(sb_size);
av1_rd_use_partition(cpi, td, tile_data, mi, tp, mi_row, mi_col, sb_size,
&dummy_rate, &dummy_dist, 1, pc_root);
- av1_free_pc_tree_recursive(pc_root, num_planes, 0, 0);
+ av1_free_pc_tree_recursive(pc_root, num_planes, 0, 0,
+ sf->part_sf.partition_search_type);
} else {
SB_FIRST_PASS_STATS *sb_org_stats = NULL;
@@ -1132,12 +1146,10 @@
av1_set_cost_upd_freq(cpi, td, tile_info, mi_row, mi_col);
// Reset color coding related parameters
- x->color_sensitivity_sb[0] = 0;
- x->color_sensitivity_sb[1] = 0;
- x->color_sensitivity_sb_g[0] = 0;
- x->color_sensitivity_sb_g[1] = 0;
- x->color_sensitivity[0] = 0;
- x->color_sensitivity[1] = 0;
+ av1_zero(x->color_sensitivity_sb);
+ av1_zero(x->color_sensitivity_sb_g);
+ av1_zero(x->color_sensitivity_sb_alt);
+ av1_zero(x->color_sensitivity);
x->content_state_sb.source_sad_nonrd = kMedSad;
x->content_state_sb.source_sad_rd = kMedSad;
x->content_state_sb.lighting_change = 0;
@@ -1419,9 +1431,11 @@
cpi->td.mb.e_mbd.tile_ctx = &this_tile->tctx;
cpi->td.mb.tile_pb_ctx = &this_tile->tctx;
av1_init_rtc_counters(&cpi->td.mb);
+ cpi->td.mb.palette_pixels = 0;
av1_encode_tile(cpi, &cpi->td, tile_row, tile_col);
if (!frame_is_intra_only(&cpi->common))
av1_accumulate_rtc_counters(cpi, &cpi->td.mb);
+ cpi->palette_pixel_num += cpi->td.mb.palette_pixels;
cpi->intrabc_used |= cpi->td.intrabc_used;
cpi->deltaq_used |= cpi->td.deltaq_used;
}
@@ -1857,11 +1871,12 @@
// base_qindex
cm->delta_q_info.delta_q_present_flag &= quant_params->base_qindex > 0;
cm->delta_q_info.delta_lf_present_flag &= quant_params->base_qindex > 0;
- } else {
+ } else if (cpi->cyclic_refresh->apply_cyclic_refresh ||
+ cpi->svc.number_temporal_layers == 1) {
cpi->cyclic_refresh->actual_num_seg1_blocks = 0;
cpi->cyclic_refresh->actual_num_seg2_blocks = 0;
- cpi->rc.cnt_zeromv = 0;
}
+ cpi->rc.cnt_zeromv = 0;
av1_frame_init_quantizer(cpi);
init_encode_frame_mb_context(cpi);
@@ -1946,7 +1961,8 @@
? av1_alloc_pc_tree_node(cm->seq_params->sb_size)
: NULL;
encode_tiles(cpi);
- av1_free_pc_tree_recursive(td->rt_pc_root, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(td->rt_pc_root, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
}
}
@@ -2215,7 +2231,6 @@
AV1_COMMON *const cm = &cpi->common;
CurrentFrame *const current_frame = &cm->current_frame;
FeatureFlags *const features = &cm->features;
- const int num_planes = av1_num_planes(cm);
RD_COUNTS *const rdc = &cpi->td.rd_counts;
const AV1EncoderConfig *const oxcf = &cpi->oxcf;
// Indicates whether or not to use a default reduced set for ext-tx
@@ -2244,13 +2259,27 @@
cpi->ref_frame_flags);
av1_setup_frame_sign_bias(cm);
+ // If global motion is enabled, then every buffer which is used as either
+ // a source or a ref frame should have an image pyramid allocated.
+ // Check here so that issues can be caught early in debug mode
+#if !defined(NDEBUG) && !CONFIG_REALTIME_ONLY
+ if (cpi->image_pyramid_levels > 0) {
+ assert(cpi->source->y_pyramid);
+ for (int ref_frame = LAST_FRAME; ref_frame <= ALTREF_FRAME; ++ref_frame) {
+ const RefCntBuffer *const buf = get_ref_frame_buf(cm, ref_frame);
+ if (buf != NULL) {
+ assert(buf->buf.y_pyramid);
+ }
+ }
+ }
+#endif // !defined(NDEBUG) && !CONFIG_REALTIME_ONLY
+
#if CONFIG_MISMATCH_DEBUG
- mismatch_reset_frame(num_planes);
-#else
- (void)num_planes;
+ mismatch_reset_frame(av1_num_planes(cm));
#endif
rdc->newmv_or_intra_blocks = 0;
+ cpi->palette_pixel_num = 0;
if (cpi->sf.hl_sf.frame_parameter_update ||
cpi->sf.rt_sf.use_comp_ref_nonrd) {
diff --git a/av1/encoder/encodeframe_utils.c b/av1/encoder/encodeframe_utils.c
index c478ef6..29d7fe4 100644
--- a/av1/encoder/encodeframe_utils.c
+++ b/av1/encoder/encodeframe_utils.c
@@ -31,8 +31,19 @@
const int num_brows = (mi_size_high[bsize] + num_mi_h - 1) / num_mi_h;
int row, col;
double num_of_mi = 0.0;
- double geom_mean_of_scale = 0.0;
+ double geom_mean_of_scale = 1.0;
+ // To avoid overflow of 'geom_mean_of_scale', bsize_base must be at least
+ // BLOCK_8X8.
+ //
+ // For bsize=BLOCK_128X128 and bsize_base=BLOCK_8X8, the loop below would
+ // iterate 256 times. Considering the maximum value of
+ // cpi->ssim_rdmult_scaling_factors (see av1_set_mb_ssim_rdmult_scaling()),
+ // geom_mean_of_scale can go up to 4.8323^256, which is within DBL_MAX
+ // (maximum value a double data type can hold). If bsize_base is modified to
+ // BLOCK_4X4 (minimum possible block size), geom_mean_of_scale can go up
+ // to 4.8323^1024 and exceed DBL_MAX, resulting in data overflow.
+ assert(bsize_base >= BLOCK_8X8);
assert(cpi->oxcf.tune_cfg.tuning == AOM_TUNE_SSIM);
for (row = mi_row / num_mi_w;
@@ -41,17 +52,36 @@
col < num_cols && col < mi_col / num_mi_h + num_bcols; ++col) {
const int index = row * num_cols + col;
assert(cpi->ssim_rdmult_scaling_factors[index] != 0.0);
- geom_mean_of_scale += log(cpi->ssim_rdmult_scaling_factors[index]);
+ geom_mean_of_scale *= cpi->ssim_rdmult_scaling_factors[index];
num_of_mi += 1.0;
}
}
- geom_mean_of_scale = exp(geom_mean_of_scale / num_of_mi);
+ geom_mean_of_scale = pow(geom_mean_of_scale, (1.0 / num_of_mi));
*rdmult = (int)((double)(*rdmult) * geom_mean_of_scale + 0.5);
*rdmult = AOMMAX(*rdmult, 0);
av1_set_error_per_bit(errorperbit, *rdmult);
}
+#if CONFIG_SALIENCY_MAP
+void av1_set_saliency_map_vmaf_rdmult(const AV1_COMP *const cpi,
+ int *errorperbit, const BLOCK_SIZE bsize,
+ const int mi_row, const int mi_col,
+ int *const rdmult) {
+ const AV1_COMMON *const cm = &cpi->common;
+ const int num_mi_w = mi_size_wide[bsize];
+ const int num_mi_h = mi_size_high[bsize];
+ const int num_cols = (cm->mi_params.mi_cols + num_mi_w - 1) / num_mi_w;
+
+ *rdmult =
+ (int)(*rdmult * cpi->sm_scaling_factor[(mi_row / num_mi_h) * num_cols +
+ (mi_col / num_mi_w)]);
+
+ *rdmult = AOMMAX(*rdmult, 0);
+ av1_set_error_per_bit(errorperbit, *rdmult);
+}
+#endif
+
// TODO(angiebird): Move these function to tpl_model.c
#if !CONFIG_REALTIME_ONLY
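
Both forms compute the same geometric mean of the per-block scaling factors, exp((1/n) * sum(log x_i)) == pow(prod(x_i), 1/n); the product form trades n log() calls for a single pow(), and the comment added above bounds the product below DBL_MAX so the multiplication cannot overflow. A standalone check of the equivalence:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
      const double factors[4] = { 0.9, 1.1, 1.3, 0.8 };
      double sum_log = 0.0, prod = 1.0;
      for (int i = 0; i < 4; ++i) {
        sum_log += log(factors[i]);
        prod *= factors[i];
      }
      // The two forms agree up to floating-point rounding.
      printf("exp form: %.15f\n", exp(sum_log / 4.0));
      printf("pow form: %.15f\n", pow(prod, 1.0 / 4.0));
      return 0;
    }
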
// Return the end column for the current superblock, in unit of TPL blocks.
@@ -193,6 +223,9 @@
for (dir = 0; dir < 2; ++dir) {
const int ctx = av1_get_pred_context_switchable_interp(xd, dir);
InterpFilter filter = av1_extract_interp_filter(mbmi->interp_filters, dir);
+
+ // Only allow the 3 valid SWITCHABLE_FILTERS.
+ assert(filter < SWITCHABLE_FILTERS);
++counts->switchable_interp[ctx][filter];
}
}
@@ -306,8 +339,7 @@
}
// Count zero motion vector.
- if (!dry_run && cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ &&
- !frame_is_intra_only(cm)) {
+ if (!dry_run && !frame_is_intra_only(cm)) {
const MV mv = mi->mv[0].as_mv;
if (is_inter_block(mi) && mi->ref_frame[0] == LAST_FRAME &&
abs(mv.row) < 8 && abs(mv.col) < 8) {
@@ -369,9 +401,12 @@
}
#endif
if (!frame_is_intra_only(cm)) {
- if (cm->features.interp_filter == SWITCHABLE &&
- mi_addr->motion_mode != WARPED_CAUSAL &&
- !is_nontrans_global_motion(xd, xd->mi[0])) {
+ if (is_inter_block(mi) && cm->features.interp_filter == SWITCHABLE) {
+ // When the frame interp filter is SWITCHABLE, several cases that always
+ // use the default type (EIGHTTAP_REGULAR) are described in
+ // av1_is_interp_needed(). Here, we should keep the counts for all
+ // applicable blocks, so the frame filter resetting decision in
+ // fix_interp_filter() is made correctly.
update_filter_type_count(td->counts, xd, mi_addr);
}
}
diff --git a/av1/encoder/encodeframe_utils.h b/av1/encoder/encodeframe_utils.h
index 29350d7..24a36c5 100644
--- a/av1/encoder/encodeframe_utils.h
+++ b/av1/encoder/encodeframe_utils.h
@@ -368,6 +368,13 @@
const BLOCK_SIZE bsize, const int mi_row,
const int mi_col, int *const rdmult);
+#if CONFIG_SALIENCY_MAP
+void av1_set_saliency_map_vmaf_rdmult(const AV1_COMP *const cpi,
+ int *errorperbit, const BLOCK_SIZE bsize,
+ const int mi_row, const int mi_col,
+ int *const rdmult);
+#endif
+
void av1_update_state(const AV1_COMP *const cpi, ThreadData *td,
const PICK_MODE_CONTEXT *const ctx, int mi_row,
int mi_col, BLOCK_SIZE bsize, RUN_TYPE dry_run);
diff --git a/av1/encoder/encodemb.c b/av1/encoder/encodemb.c
index 8dee801..78efa0c 100644
--- a/av1/encoder/encodemb.c
+++ b/av1/encoder/encodemb.c
@@ -403,10 +403,7 @@
l = &args->tl[blk_row];
TX_TYPE tx_type = DCT_DCT;
- const int blk_skip_idx =
- (cpi->sf.rt_sf.use_nonrd_pick_mode && is_inter_block(mbmi))
- ? blk_row * bw / 4 + blk_col / 2
- : blk_row * bw + blk_col;
+ const int blk_skip_idx = blk_row * bw + blk_col;
if (!is_blk_skip(x->txfm_search_info.blk_skip, plane, blk_skip_idx) &&
!mbmi->skip_mode) {
tx_type = av1_get_tx_type(xd, pd->plane_type, blk_row, blk_col, tx_size,
@@ -556,6 +553,13 @@
// 4x4=0, 8x8=2, 16x16=4, 32x32=6, 64x64=8
// transform size varies per plane, look it up in a common way.
const TX_SIZE tx_size = av1_get_tx_size(plane, xd);
+ const BLOCK_SIZE tx_bsize = txsize_to_bsize[tx_size];
+ // Call visit() directly with zero offsets if the current block size is the
+ // same as the transform block size.
+ if (plane_bsize == tx_bsize) {
+ visit(plane, 0, 0, 0, plane_bsize, tx_size, arg);
+ return;
+ }
const uint8_t txw_unit = tx_size_wide_unit[tx_size];
const uint8_t txh_unit = tx_size_high_unit[tx_size];
const int step = txw_unit * txh_unit;
@@ -588,6 +592,8 @@
}
}
}
+ // Check if visit() is invoked at least once.
+ assert(i >= 1);
}
typedef struct encode_block_pass1_args {
diff --git a/av1/encoder/encoder.c b/av1/encoder/encoder.c
index 0b12ff0..d5d7dcc 100644
--- a/av1/encoder/encoder.c
+++ b/av1/encoder/encoder.c
@@ -27,6 +27,7 @@
#include "aom_dsp/noise_util.h"
#include "aom_dsp/noise_model.h"
#endif
+#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_dsp/psnr.h"
#if CONFIG_INTERNAL_STATS
#include "aom_dsp/ssim.h"
@@ -75,6 +76,9 @@
#include "av1/encoder/rc_utils.h"
#include "av1/encoder/rd.h"
#include "av1/encoder/rdopt.h"
+#if CONFIG_SALIENCY_MAP
+#include "av1/encoder/saliency_map.h"
+#endif
#include "av1/encoder/segmentation.h"
#include "av1/encoder/speed_features.h"
#include "av1/encoder/superres_scale.h"
@@ -125,6 +129,14 @@
*hr = 1;
*hs = 2;
break;
+ case AOME_TWOTHREE:
+ *hr = 2;
+ *hs = 3;
+ break;
+ case AOME_ONETHREE:
+ *hr = 1;
+ *hs = 3;
+ break;
default:
*hr = 1;
*hs = 1;
@@ -363,9 +375,7 @@
static void set_bitstream_level_tier(AV1_PRIMARY *const ppi, int width,
int height, double init_framerate) {
SequenceHeader *const seq_params = &ppi->seq_params;
-#if CONFIG_CWG_C013
const AV1LevelParams *const level_params = &ppi->level_params;
-#endif
// TODO(any): This is a placeholder function that only addresses dimensions
// and max display sample rates.
// Need to add checks for max bit rate, max decoded luma sample rate, header
@@ -435,7 +445,15 @@
#endif
for (int i = 0; i < MAX_NUM_OPERATING_POINTS; ++i) {
- seq_params->seq_level_idx[i] = level;
+ assert(is_valid_seq_level_idx(level_params->target_seq_level_idx[i]) ||
+ level_params->target_seq_level_idx[i] == SEQ_LEVEL_KEEP_STATS);
+ // If a higher target level is specified, it is then used rather than the
+ // inferred one from resolution and framerate.
+ seq_params->seq_level_idx[i] =
+ level_params->target_seq_level_idx[i] < SEQ_LEVELS &&
+ level_params->target_seq_level_idx[i] > level
+ ? level_params->target_seq_level_idx[i]
+ : level;
// Set the maximum parameters for bitrate and buffer size for this profile,
// level, and tier
seq_params->op_params[i].bitrate = av1_max_level_bitrate(
@@ -650,6 +668,9 @@
resize_pending_params->width = 0;
resize_pending_params->height = 0;
+ // Setup identity scale factor
+ av1_setup_scale_factors_for_frame(&cm->sf_identity, 1, 1, 1, 1);
+
init_buffer_indices(&cpi->force_intpel_info, cm->remapped_ref_idx);
av1_noise_estimate_init(&cpi->noise_estimate, cm->width, cm->height);
@@ -799,7 +820,11 @@
if (has_no_stats_stage(cpi) && (rc_cfg->mode == AOM_Q)) {
p_rc->baseline_gf_interval = FIXED_GF_INTERVAL;
- } else {
+ } else if (!is_one_pass_rt_params(cpi) ||
+ cm->current_frame.frame_number == 0) {
+ // For rtc mode: logic for setting the baseline_gf_interval is done
+ // in av1_get_one_pass_rt_params(), and it should not be reset here in
+ // change_config(), unless after init_config (first frame).
p_rc->baseline_gf_interval = (MIN_GF_INTERVAL + MAX_GF_INTERVAL) / 2;
}
@@ -859,6 +884,14 @@
rc->worst_quality = rc_cfg->worst_allowed_q;
rc->best_quality = rc_cfg->best_allowed_q;
+ // If lossless has been requested make sure average Q accumulators are reset.
+ if (is_lossless_requested(&cpi->oxcf.rc_cfg)) {
+ int i;
+ for (i = 0; i < FRAME_TYPES; ++i) {
+ p_rc->avg_frame_qindex[i] = 0;
+ }
+ }
+
features->interp_filter =
oxcf->tile_cfg.enable_large_scale_tile ? EIGHTTAP_REGULAR : SWITCHABLE;
features->switchable_motion_mode = is_switchable_motion_mode_allowed(
@@ -906,6 +939,18 @@
if (lap_lag_in_frames != -1) {
cpi->oxcf.gf_cfg.lag_in_frames = lap_lag_in_frames;
}
+
+#if CONFIG_REALTIME_ONLY
+ assert(!oxcf->tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels = 0;
+#else
+ if (oxcf->tool_cfg.enable_global_motion) {
+ cpi->image_pyramid_levels =
+ global_motion_pyr_levels[default_global_motion_method];
+ } else {
+ cpi->image_pyramid_levels = 0;
+ }
+#endif // CONFIG_REALTIME_ONLY
}
static INLINE void init_frame_info(FRAME_INFO *frame_info,
@@ -928,11 +973,10 @@
frame_index_set->show_frame_count = 0;
}
-static INLINE void update_frame_index_set(FRAME_INDEX_SET *frame_index_set,
- int is_show_frame) {
- if (is_show_frame) {
- frame_index_set->show_frame_count++;
- }
+static INLINE void update_counters_for_show_frame(AV1_COMP *const cpi) {
+ assert(cpi->common.show_frame);
+ cpi->frame_index_set.show_frame_count++;
+ cpi->common.current_frame.frame_number++;
}
AV1_PRIMARY *av1_create_primary_compressor(
@@ -1366,6 +1410,8 @@
init_frame_index_set(&cpi->frame_index_set);
cm->current_frame.frame_number = 0;
+ cpi->rc.frame_number_encoded = 0;
+ cpi->rc.prev_frame_is_dropped = 0;
cm->current_frame_id = -1;
cpi->tile_data = NULL;
cpi->last_show_frame_buf = NULL;
@@ -1446,6 +1492,7 @@
cpi->mb_weber_stats = NULL;
cpi->mb_delta_q = NULL;
+ cpi->palette_pixel_num = 0;
{
const int bsize = BLOCK_16X16;
@@ -1499,15 +1546,41 @@
}
#endif
+#if CONFIG_SALIENCY_MAP
+ {
+ CHECK_MEM_ERROR(cm, cpi->saliency_map,
+ (uint8_t *)aom_calloc(cm->height * cm->width,
+ sizeof(*cpi->saliency_map)));
+ // Buffer initialization based on MIN_MIB_SIZE_LOG2 to ensure that
+    // the cpi->sm_scaling_factor buffer is allocated large enough, since the
+    // actual superblock size to be used is not yet known.
+ const int min_mi_w_sb = (1 << MIN_MIB_SIZE_LOG2);
+ const int min_mi_h_sb = (1 << MIN_MIB_SIZE_LOG2);
+ const int max_sb_cols =
+ (cm->mi_params.mi_cols + min_mi_w_sb - 1) / min_mi_w_sb;
+ const int max_sb_rows =
+ (cm->mi_params.mi_rows + min_mi_h_sb - 1) / min_mi_h_sb;
+ CHECK_MEM_ERROR(cm, cpi->sm_scaling_factor,
+ (double *)aom_calloc(max_sb_rows * max_sb_cols,
+ sizeof(*cpi->sm_scaling_factor)));
+ }
+#endif
+
#if CONFIG_COLLECT_PARTITION_STATS
av1_zero(cpi->partition_stats);
#endif // CONFIG_COLLECT_PARTITION_STATS
- /* av1_init_quantizer() is first called here. Add check in
- * av1_frame_init_quantizer() so that av1_init_quantizer is only
- * called later when needed. This will avoid unnecessary calls of
- * av1_init_quantizer() for every frame.
- */
+ // Initialize the members of DeltaQuantParams with INT_MAX to ensure that
+ // the quantizer tables are correctly initialized using the default deltaq
+ // parameters when av1_init_quantizer is called for the first time.
+ DeltaQuantParams *const prev_deltaq_params =
+ &cpi->enc_quant_dequant_params.prev_deltaq_params;
+ prev_deltaq_params->y_dc_delta_q = INT_MAX;
+ prev_deltaq_params->u_dc_delta_q = INT_MAX;
+ prev_deltaq_params->v_dc_delta_q = INT_MAX;
+ prev_deltaq_params->u_ac_delta_q = INT_MAX;
+ prev_deltaq_params->v_ac_delta_q = INT_MAX;
+
av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
cm->seq_params->bit_depth);
av1_qm_init(&cm->quant_params, av1_num_planes(cm));
@@ -1550,40 +1623,6 @@
}
}
-// Deallocate allocated thread_data.
-static AOM_INLINE void free_thread_data(AV1_PRIMARY *ppi) {
- PrimaryMultiThreadInfo *const p_mt_info = &ppi->p_mt_info;
- for (int t = 1; t < p_mt_info->num_workers; ++t) {
- EncWorkerData *const thread_data = &p_mt_info->tile_thr_data[t];
- thread_data->td = thread_data->original_td;
- aom_free(thread_data->td->tctx);
- aom_free(thread_data->td->palette_buffer);
- aom_free(thread_data->td->tmp_conv_dst);
- release_compound_type_rd_buffers(&thread_data->td->comp_rd_buffer);
- for (int j = 0; j < 2; ++j) {
- aom_free(thread_data->td->tmp_pred_bufs[j]);
- }
- aom_free(thread_data->td->pixel_gradient_info);
- aom_free(thread_data->td->src_var_info_of_4x4_sub_blocks);
- release_obmc_buffers(&thread_data->td->obmc_buffer);
- aom_free(thread_data->td->vt64x64);
-
- for (int x = 0; x < 2; x++) {
- for (int y = 0; y < 2; y++) {
- aom_free(thread_data->td->hash_value_buffer[x][y]);
- thread_data->td->hash_value_buffer[x][y] = NULL;
- }
- }
- aom_free(thread_data->td->counts);
- av1_free_pmc(thread_data->td->firstpass_ctx,
- ppi->seq_params.monochrome ? 1 : MAX_MB_PLANE);
- thread_data->td->firstpass_ctx = NULL;
- av1_free_shared_coeff_buffer(&thread_data->td->shared_coeff_buf);
- av1_free_sms_tree(thread_data->td);
- aom_free(thread_data->td);
- }
-}
-
void av1_remove_primary_compressor(AV1_PRIMARY *ppi) {
if (!ppi) return;
#if !CONFIG_REALTIME_ONLY
@@ -1648,7 +1687,12 @@
av1_denoiser_free(&(cpi->denoiser));
#endif
- aom_free(cm->error);
+ if (cm->error) {
+ // Help detect use after free of the error detail string.
+ memset(cm->error->detail, 'A', sizeof(cm->error->detail) - 1);
+ cm->error->detail[sizeof(cm->error->detail) - 1] = '\0';
+ aom_free(cm->error);
+ }
aom_free(cpi->td.tctx);
MultiThreadInfo *const mt_info = &cpi->mt_info;
#if CONFIG_MULTITHREAD
@@ -2040,7 +2084,7 @@
}
#ifndef NDEBUG
BufferPool *const pool = cm->buffer_pool;
- for (i = 0; i < FRAME_BUFFERS; ++i) {
+ for (i = 0; i < pool->num_frame_bufs; ++i) {
assert(pool->frame_bufs[i].ref_count == 0);
}
#endif
@@ -2153,7 +2197,6 @@
}
#endif
}
-
if (is_stat_consumption_stage(cpi)) {
av1_set_target_rate(cpi, cm->width, cm->height);
}
@@ -2183,7 +2226,7 @@
&cm->cur_frame->buf, cm->width, cm->height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels, cm->features.byte_alignment, NULL, NULL,
- NULL, cpi->oxcf.tool_cfg.enable_global_motion, 0))
+ NULL, cpi->image_pyramid_levels, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate frame buffer");
@@ -2199,7 +2242,8 @@
for (int i = 0; i < num_planes; ++i)
cm->rst_info[i].frame_restoration_type = RESTORE_NONE;
- av1_alloc_restoration_buffers(cm);
+ const bool is_sgr_enabled = !cpi->sf.lpf_sf.disable_sgr_filter;
+ av1_alloc_restoration_buffers(cm, is_sgr_enabled);
// Store the allocated restoration buffers in MT object.
if (cpi->ppi->p_mt_info.num_workers > 1) {
av1_init_lr_mt_buffers(cpi);
@@ -2278,7 +2322,8 @@
cpi->sf.lpf_sf.cdef_pick_method, cpi->td.mb.rdmult,
cpi->sf.rt_sf.skip_cdef_sb, cpi->oxcf.tool_cfg.cdef_control,
use_screen_content_model,
- cpi->ppi->rtc_ref.non_reference_frame);
+ cpi->ppi->rtc_ref.non_reference_frame,
+ cpi->rc.rtc_external_ratectrl);
// Apply the filter
if ((skip_apply_postproc_filters & SKIP_APPLY_CDEF) == 0) {
@@ -2472,14 +2517,15 @@
av1_set_size_dependent_vars(cpi, &q, &bottom_index, &top_index);
av1_set_mv_search_params(cpi);
- if (cm->current_frame.frame_number == 0 && cpi->ppi->use_svc &&
+ if (cm->current_frame.frame_number == 0 &&
+ (cpi->ppi->use_svc || cpi->oxcf.rc_cfg.drop_frames_water_mark > 0) &&
cpi->svc.temporal_layer_id == 0) {
const SequenceHeader *seq_params = cm->seq_params;
if (aom_alloc_frame_buffer(
&cpi->svc.source_last_TL0, cpi->oxcf.frm_dim_cfg.width,
cpi->oxcf.frm_dim_cfg.height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0)) {
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0)) {
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate buffer for source_last_TL0");
}
@@ -2528,8 +2574,7 @@
cpi->source = av1_realloc_and_scale_if_required(
cm, unscaled, &cpi->scaled_source, filter_scaler, phase_scaler, true,
- false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ false, cpi->oxcf.border_in_pixels, cpi->image_pyramid_levels);
if (frame_is_intra_only(cm) || resize_pending != 0) {
memset(cpi->consec_zero_mv, 0,
((cm->mi_params.mi_rows * cm->mi_params.mi_cols) >> 2) *
@@ -2540,7 +2585,7 @@
cpi->last_source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_last_source, &cpi->scaled_last_source, filter_scaler,
phase_scaler, true, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels);
}
if (cpi->sf.rt_sf.use_temporal_noise_estimate) {
@@ -2595,9 +2640,8 @@
av1_set_quantizer(cm, q_cfg->qm_minlevel, q_cfg->qm_maxlevel, q,
q_cfg->enable_chroma_deltaq, q_cfg->enable_hdr_deltaq);
av1_set_speed_features_qindex_dependent(cpi, cpi->oxcf.speed);
- if ((q_cfg->deltaq_mode != NO_DELTA_Q) || q_cfg->enable_chroma_deltaq)
- av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
- cm->seq_params->bit_depth);
+ av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
+ cm->seq_params->bit_depth);
av1_set_variance_partition_thresholds(cpi, q, 0);
av1_setup_frame(cpi);
@@ -2610,9 +2654,8 @@
av1_set_quantizer(cm, q_cfg->qm_minlevel, q_cfg->qm_maxlevel, q,
q_cfg->enable_chroma_deltaq, q_cfg->enable_hdr_deltaq);
av1_set_speed_features_qindex_dependent(cpi, cpi->oxcf.speed);
- if (q_cfg->deltaq_mode != NO_DELTA_Q || q_cfg->enable_chroma_deltaq)
- av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
- cm->seq_params->bit_depth);
+ av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
+ cm->seq_params->bit_depth);
av1_set_variance_partition_thresholds(cpi, q, 0);
if (frame_is_intra_only(cm) || cm->features.error_resilient_mode ||
cm->features.primary_ref_frame == PRIMARY_REF_NONE)
@@ -2651,7 +2694,7 @@
&cpi->orig_source, cpi->oxcf.frm_dim_cfg.width,
cpi->oxcf.frm_dim_cfg.height, seq_params->subsampling_x,
seq_params->subsampling_y, seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0))
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate scaled buffer");
}
@@ -2725,7 +2768,6 @@
cpi->sf.interp_sf.adaptive_interp_filter_search)
cpi->interp_search_flags.interp_filter_search_mask =
av1_setup_interp_filter_search_mask(cpi);
- cpi->source->buf_8bit_valid = 0;
av1_setup_frame_size(cpi);
@@ -2800,8 +2842,7 @@
}
cpi->source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_source, &cpi->scaled_source, EIGHTTAP_REGULAR, 0,
- false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ false, false, cpi->oxcf.border_in_pixels, cpi->image_pyramid_levels);
#if CONFIG_TUNE_BUTTERAUGLI
if (oxcf->tune_cfg.tuning == AOM_TUNE_BUTTERAUGLI) {
@@ -2821,7 +2862,7 @@
cpi->last_source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_last_source, &cpi->scaled_last_source,
EIGHTTAP_REGULAR, 0, false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels);
}
int scale_references = 0;
@@ -2900,10 +2941,8 @@
av1_set_quantizer(cm, q_cfg->qm_minlevel, q_cfg->qm_maxlevel, q,
q_cfg->enable_chroma_deltaq, q_cfg->enable_hdr_deltaq);
av1_set_speed_features_qindex_dependent(cpi, oxcf->speed);
-
- if (q_cfg->deltaq_mode != NO_DELTA_Q || q_cfg->enable_chroma_deltaq)
- av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
- cm->seq_params->bit_depth);
+ av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
+ cm->seq_params->bit_depth);
av1_set_variance_partition_thresholds(cpi, q, 0);
@@ -3080,15 +3119,21 @@
film_grain_params->scaling_points_y[0][0] = 128;
film_grain_params->scaling_points_y[0][1] = 100;
- film_grain_params->num_cb_points = 1;
- film_grain_params->scaling_points_cb[0][0] = 128;
- film_grain_params->scaling_points_cb[0][1] = 100;
+ if (!cm->seq_params->monochrome) {
+ film_grain_params->num_cb_points = 1;
+ film_grain_params->scaling_points_cb[0][0] = 128;
+ film_grain_params->scaling_points_cb[0][1] = 100;
- film_grain_params->num_cr_points = 1;
- film_grain_params->scaling_points_cr[0][0] = 128;
- film_grain_params->scaling_points_cr[0][1] = 100;
+ film_grain_params->num_cr_points = 1;
+ film_grain_params->scaling_points_cr[0][0] = 128;
+ film_grain_params->scaling_points_cr[0][1] = 100;
+ } else {
+ film_grain_params->num_cb_points = 0;
+ film_grain_params->num_cr_points = 0;
+ }
film_grain_params->chroma_scaling_from_luma = 0;
+
film_grain_params->scaling_shift = 1;
film_grain_params->ar_coeff_lag = 0;
film_grain_params->ar_coeff_shift = 1;
@@ -3423,7 +3468,8 @@
// after 8 frames since last update if frame_source_sad > 0.
if (frame_is_intra_only(cm) || is_frame_resize_pending(cpi) ||
rc->high_source_sad || rc->frames_since_key < 30 ||
- cpi->cyclic_refresh->counter_encode_maxq_scene_change < 30 ||
+ (cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ &&
+ cpi->cyclic_refresh->counter_encode_maxq_scene_change < 30) ||
(cpi->frames_since_last_update > 8 && cpi->rc.frame_source_sad > 0))
return 0;
else
@@ -3494,7 +3540,7 @@
src, stride, hbd, num_8x8_rows, num_8x8_cols);
cpi->twopass_frame.frame_avg_haar_energy =
- log(((double)frame_avg_wavelet_energy / num_mbs) + 1.0);
+ log1p((double)frame_avg_wavelet_energy / num_mbs);
}
#endif
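
log1p(x) evaluates log(1 + x) without first rounding 1 + x in double precision, which matters when the average wavelet energy per macroblock is tiny. A standalone demonstration:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
      const double x = 1e-17;
      // 1.0 + x rounds to exactly 1.0 (double epsilon is ~2.2e-16), so the
      // naive form loses x entirely; log1p keeps it.
      printf("log(1 + x) = %.17g\n", log(1.0 + x));  // prints 0
      printf("log1p(x)   = %.17g\n", log1p(x));      // prints ~1e-17
      return 0;
    }
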
@@ -3632,8 +3678,7 @@
}
#endif // !CONFIG_REALTIME_ONLY
- ++current_frame->frame_number;
- update_frame_index_set(&cpi->frame_index_set, cm->show_frame);
+ update_counters_for_show_frame(cpi);
return AOM_CODEC_OK;
}
@@ -3685,15 +3730,32 @@
}
// For 1 pass CBR, check if we are dropping this frame.
- // Never drop on key frame.
+ // Never drop on key frame, or for frame whose base layer is key.
if (has_no_stats_stage(cpi) && oxcf->rc_cfg.mode == AOM_CBR &&
- current_frame->frame_type != KEY_FRAME) {
+ current_frame->frame_type != KEY_FRAME &&
+ !(cpi->ppi->use_svc &&
+ cpi->svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame)) {
+ FRAME_UPDATE_TYPE update_type =
+ cpi->ppi->gf_group.update_type[cpi->gf_frame_index];
+ (void)update_type;
+ assert(
+ IMPLIES(cpi->is_dropped_frame, (update_type == OVERLAY_UPDATE ||
+ update_type == INTNL_OVERLAY_UPDATE)));
if (av1_rc_drop_frame(cpi)) {
+ cpi->is_dropped_frame = true;
+ }
+ if (cpi->is_dropped_frame) {
av1_setup_frame_size(cpi);
av1_set_mv_search_params(cpi);
av1_rc_postencode_update_drop_frame(cpi);
release_scaled_references(cpi);
- cpi->is_dropped_frame = true;
+ cpi->ppi->gf_group.is_frame_dropped[cpi->gf_frame_index] = true;
+ // A dropped frame might not be shown but it always takes a slot in the gf
+ // group. Therefore, even when it is not shown, we still need to update
+ // the relevant frame counters.
+ if (cm->show_frame) {
+ update_counters_for_show_frame(cpi);
+ }
return AOM_CODEC_OK;
}
}
@@ -3701,11 +3763,26 @@
if (oxcf->tune_cfg.tuning == AOM_TUNE_SSIM) {
av1_set_mb_ssim_rdmult_scaling(cpi);
}
-
+#if CONFIG_SALIENCY_MAP
+ else if (oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_SALIENCY_MAP &&
+ !(cpi->source->flags & YV12_FLAG_HIGHBITDEPTH)) {
+ if (av1_set_saliency_map(cpi) == 0) {
+ return AOM_CODEC_MEM_ERROR;
+ }
+#if !CONFIG_REALTIME_ONLY
+ double motion_ratio = av1_setup_motion_ratio(cpi);
+#else
+ double motion_ratio = 1.0;
+#endif
+ if (av1_setup_sm_rdmult_scaling_factor(cpi, motion_ratio) == 0) {
+ return AOM_CODEC_MEM_ERROR;
+ }
+ }
+#endif
#if CONFIG_TUNE_VMAF
- if (oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_WITHOUT_PREPROCESSING ||
- oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_MAX_GAIN ||
- oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_NEG_MAX_GAIN) {
+ else if (oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_WITHOUT_PREPROCESSING ||
+ oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_MAX_GAIN ||
+ oxcf->tune_cfg.tuning == AOM_TUNE_VMAF_NEG_MAX_GAIN) {
av1_set_mb_vmaf_rdmult_scaling(cpi);
}
#endif
@@ -3787,6 +3864,15 @@
features->disable_cdf_update = 1;
}
+#if !CONFIG_REALTIME_ONLY
+ if (cpi->oxcf.tool_cfg.enable_global_motion && !frame_is_intra_only(cm)) {
+ // Flush any stale global motion information, which may be left over
+ // from a previous frame
+ aom_invalidate_pyramid(cpi->source->y_pyramid);
+ av1_invalidate_corner_list(cpi->source->corners);
+ }
+#endif // !CONFIG_REALTIME_ONLY
+
int largest_tile_id = 0;
if (av1_superres_in_recode_allowed(cpi)) {
if (encode_with_and_without_superres(cpi, size, dest, &largest_tile_id) !=
@@ -3875,18 +3961,17 @@
cpi->frames_since_last_update = 1;
}
+ if (cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1)
+ cpi->svc.prev_number_spatial_layers = cpi->svc.number_spatial_layers;
+
// Clear the one shot update flags for segmentation map and mode/ref loop
// filter deltas.
cm->seg.update_map = 0;
cm->seg.update_data = 0;
cm->lf.mode_ref_delta_update = 0;
- // A droppable frame might not be shown but it always
- // takes a space in the gf group. Therefore, even when
- // it is not shown, we still need update the count down.
if (cm->show_frame) {
- update_frame_index_set(&cpi->frame_index_set, cm->show_frame);
- ++current_frame->frame_number;
+ update_counters_for_show_frame(cpi);
}
#if CONFIG_COLLECT_COMPONENT_TIMING
@@ -4038,10 +4123,10 @@
// No noise synthesis if source is very clean.
// Uses a low edge threshold to focus on smooth areas.
// Increase output noise setting a little compared to measured value.
- cpi->oxcf.noise_level =
- (float)(av1_estimate_noise_from_single_plane(
- sd, 0, cm->seq_params->bit_depth, 16) -
- 0.1);
+ double y_noise_level = 0.0;
+ av1_estimate_noise_level(sd, &y_noise_level, AOM_PLANE_Y, AOM_PLANE_Y,
+ cm->seq_params->bit_depth, 16);
+ cpi->oxcf.noise_level = (float)(y_noise_level - 0.1);
cpi->oxcf.noise_level = (float)AOMMAX(0.0, cpi->oxcf.noise_level);
if (cpi->oxcf.noise_level > 0.0) {
cpi->oxcf.noise_level += (float)0.5;
@@ -4057,7 +4142,8 @@
#endif // CONFIG_DENOISE
if (av1_lookahead_push(cpi->ppi->lookahead, sd, time_stamp, end_time,
- use_highbitdepth, frame_flags)) {
+ use_highbitdepth, cpi->image_pyramid_levels,
+ frame_flags)) {
aom_internal_error(cm->error, AOM_CODEC_ERROR,
"av1_lookahead_push() failed");
res = -1;
@@ -4509,6 +4595,13 @@
}
#endif
+#if CONFIG_OUTPUT_FRAME_SIZE
+ FILE *f = fopen("frame_sizes.csv", "a");
+ fprintf(f, "%d,", 8 * (int)cpi_data->frame_size);
+ fprintf(f, "%d\n", cm->quant_params.base_qindex);
+ fclose(f);
+#endif // CONFIG_OUTPUT_FRAME_SIZE
+
if (!is_stat_generation_stage(cpi) && !cpi->is_dropped_frame) {
// Before calling refresh_reference_frames(), copy ppi->ref_frame_map_copy
// to cm->ref_frame_map for frame_parallel_level 2 frame in a parallel
@@ -4564,6 +4657,11 @@
av1_pop_third_pass_info(cpi->third_pass_ctx);
}
+ if (ppi->rtc_ref.set_ref_frame_config) {
+ av1_svc_update_buffer_slot_refreshed(cpi);
+ av1_svc_set_reference_was_previous(cpi);
+ }
+
if (ppi->use_svc) av1_save_layer_context(cpi);
// Note *size = 0 indicates a dropped frame for which psnr is not calculated
@@ -4701,6 +4799,9 @@
}
#endif
+  // Reset the flag to 0 after encoding.
+ cpi->rc.use_external_qp_one_pass = 0;
+
if (result == -1) {
cm->error->setjmp = 0;
// Returning -1 indicates no frame encoded; more input is required
@@ -4749,7 +4850,7 @@
RefCntBuffer *buf = get_ref_frame_buf(cm, ref_frame);
cpi->scaled_ref_buf[ref_frame - 1] = buf;
- for (int i = 0; i < FRAME_BUFFERS; ++i) {
+ for (int i = 0; i < cm->buffer_pool->num_frame_bufs; ++i) {
if (&cm->buffer_pool->frame_bufs[i] == buf) {
*ref_buffers_used_map |= (1 << i);
}
@@ -4764,7 +4865,7 @@
// corresponding to frames in a parallel encode set.
void av1_increment_scaled_ref_counts_fpmt(BufferPool *buffer_pool,
int ref_buffers_used_map) {
- for (int i = 0; i < FRAME_BUFFERS; ++i) {
+ for (int i = 0; i < buffer_pool->num_frame_bufs; ++i) {
if (ref_buffers_used_map & (1 << i)) {
++buffer_pool->frame_bufs[i].ref_count;
}
@@ -4787,7 +4888,7 @@
// corresponding to frames in a parallel encode set.
void av1_decrement_ref_counts_fpmt(BufferPool *buffer_pool,
int ref_buffers_used_map) {
- for (int i = 0; i < FRAME_BUFFERS; ++i) {
+ for (int i = 0; i < buffer_pool->num_frame_bufs; ++i) {
if (ref_buffers_used_map & (1 << i)) {
--buffer_pool->frame_bufs[i].ref_count;
}
@@ -5070,7 +5171,8 @@
AOM_SCALING_MODE vert_mode) {
int hr = 0, hs = 0, vr = 0, vs = 0;
- if (horiz_mode > AOME_ONETWO || vert_mode > AOME_ONETWO) return -1;
+ // Checks for invalid AOM_SCALING_MODE values.
+ if (horiz_mode > AOME_ONETHREE || vert_mode > AOME_ONETHREE) return -1;
Scale2Ratio(horiz_mode, &hr, &hs);
Scale2Ratio(vert_mode, &vr, &vs);
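
A usage sketch of the extended scaling modes through the public encoder control, consistent with the new AOME_TWOTHREE/AOME_ONETHREE cases in Scale2Ratio and the relaxed bound above (codec setup and error handling elided):

    #include "aom/aomcx.h"

    // Request that subsequent frames be encoded at 2/3 of the configured
    // width and height.
    aom_scaling_mode_t scale_mode = { AOME_TWOTHREE, AOME_TWOTHREE };
    aom_codec_control(&codec, AOME_SET_SCALEMODE, &scale_mode);
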
diff --git a/av1/encoder/encoder.h b/av1/encoder/encoder.h
index d13f08f..2965f9b 100644
--- a/av1/encoder/encoder.h
+++ b/av1/encoder/encoder.h
@@ -1071,6 +1071,15 @@
// CONFIG_PARTITION_SEARCH_ORDER.
const char *partition_info_path;
+ // The flag that indicates whether we use an external rate distribution to
+ // guide adaptive quantization. It requires --deltaq-mode=3. The rate
+ // distribution map file name is stored in |rate_distribution_info|.
+ unsigned int enable_rate_guide_deltaq;
+
+ // The input file of rate distribution information used in all intra mode
+ // to determine delta quantization.
+ const char *rate_distribution_info;
+
// Exit the encoder when it fails to encode to a given level.
int strict_level_conformance;
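A hedged sketch of wiring up the two new fields documented above through the
string-option interface. The option names come from this patch; the deltaq
mode value and the file path are illustrative:

#include "aom/aom_codec.h"

static aom_codec_err_t enable_rate_guide(aom_codec_ctx_t *codec) {
  aom_codec_err_t err = aom_codec_set_option(codec, "deltaq-mode", "3");
  if (err != AOM_CODEC_OK) return err;
  err = aom_codec_set_option(codec, "enable-rate-guide-deltaq", "1");
  if (err != AOM_CODEC_OK) return err;
  return aom_codec_set_option(codec, "rate-distribution-info", "rate_map.txt");
}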
@@ -1544,6 +1553,36 @@
} AV1EncRowMultiThreadInfo;
/*!
+ * \brief Encoder data related to multi-threading for allintra deltaq-mode=3
+ */
+typedef struct {
+#if CONFIG_MULTITHREAD
+ /*!
+ * Mutex lock used while dispatching jobs.
+ */
+ pthread_mutex_t *mutex_;
+ /*!
+ * Condition variable used to dispatch loopfilter jobs.
+ */
+ pthread_cond_t *cond_;
+#endif
+
+ /**
+ * \name Row synchronization related function pointers for all intra mode
+ */
+ /**@{*/
+ /*!
+ * Reader.
+ */
+ void (*intra_sync_read_ptr)(AV1EncRowMultiThreadSync *const, int, int);
+ /*!
+ * Writer.
+ */
+ void (*intra_sync_write_ptr)(AV1EncRowMultiThreadSync *const, int, int, int);
+ /**@}*/
+} AV1EncAllIntraMultiThreadInfo;
+
+/*!
* \brief Max number of recodes used to track the frame probabilities.
*/
#define NUM_RECODES_PER_FRAME 10
@@ -1619,6 +1658,11 @@
* Number of primary workers created for multi-threading.
*/
int p_num_workers;
+
+ /*!
+ * Tracks the number of workers in encode stage multi-threading.
+ */
+ int prev_num_enc_workers;
} PrimaryMultiThreadInfo;
/*!
@@ -1663,6 +1707,12 @@
AV1EncRowMultiThreadInfo enc_row_mt;
/*!
+ * Encoder multi-threading data for allintra mode in the preprocessing stage
+ * when --deltaq-mode=3.
+ */
+ AV1EncAllIntraMultiThreadInfo intra_mt;
+
+ /*!
* Tpl row multi-threading data.
*/
AV1TplRowMultiThreadInfo tpl_row_mt;
@@ -1950,11 +2000,6 @@
YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES];
/*!
- * Pointer to the source frame buffer.
- */
- unsigned char *src_buffer;
-
- /*!
* Holds the number of valid reference frames in past and future directions
* w.r.t. the current frame. num_ref_frames[i] stores the total number of
* valid reference frames in 'i' direction.
@@ -1976,18 +2021,6 @@
int segment_map_w; /*!< segment map width */
int segment_map_h; /*!< segment map height */
/**@}*/
-
- /*!
- * Holds the total number of corner points detected in the source frame.
- */
- int num_src_corners;
-
- /*!
- * Holds the x and y co-ordinates of the corner points detected in the source
- * frame. src_corners[i] holds the x co-ordinate and src_corners[i+1] holds
- * the y co-ordinate of the ith corner point detected.
- */
- int src_corners[2 * MAX_CORNERS];
} GlobalMotionInfo;
/*!
@@ -2405,6 +2438,23 @@
int non_reference_frame;
int ref_frame_comp[3];
int gld_idx_1layer;
+ /*!
+ * Frame number of the last frame that refreshed the buffer slot.
+ */
+ unsigned int buffer_time_index[REF_FRAMES];
+ /*!
+ * Spatial layer id of the last frame that refreshed the buffer slot.
+ */
+ unsigned char buffer_spatial_layer[REF_FRAMES];
+ /*!
+ * Flag to indicate whether the closest reference was the previous frame.
+ */
+ bool reference_was_previous_frame;
+ /*!
+ * Flag to indicate this frame is based on longer term reference only,
+ * for recovery from past loss, and it should be biased for improved coding.
+ */
+ bool bias_recovery_frame;
} RTC_REF;
/*!\endcond */
@@ -2751,6 +2801,12 @@
* Struct for the reference structure for RTC.
*/
RTC_REF rtc_ref;
+
+ /*!
+ * Struct for all intra mode row multi threading in the preprocess stage
+ * when --deltaq-mode=3.
+ */
+ AV1EncRowMultiThreadSync intra_row_mt_sync;
} AV1_PRIMARY;
/*!
@@ -3382,6 +3438,23 @@
WeberStats *mb_weber_stats;
/*!
+ * Buffer to store rate cost estimates for each macro block (8x8) in the
+ * preprocessing stage used in allintra mode.
+ */
+ int *prep_rate_estimates;
+
+ /*!
+ * Buffer to store rate cost estimates for each 16x16 block read
+ * from an external file, used in allintra mode.
+ */
+ double *ext_rate_distribution;
+
+ /*!
+ * The scale that equals sum_rate_uniform_quantizer / sum_ext_rate.
+ */
+ double ext_rate_scale;
+
+ /*!
* Buffer to store MB variance after Wiener filter.
*/
BLOCK_SIZE weber_bsize;
@@ -3462,6 +3535,30 @@
* Block level thresholds to force zeromv-skip at partition level.
*/
unsigned int zeromv_skip_thresh_exit_part[BLOCK_SIZES_ALL];
+
+ /*!
+ * Number of downsampling pyramid levels to allocate for each frame.
+ * This is currently only used for global motion.
+ */
+ int image_pyramid_levels;
+
+#if CONFIG_SALIENCY_MAP
+ /*!
+ * Pixel level saliency map for each frame.
+ */
+ uint8_t *saliency_map;
+
+ /*!
+ * Superblock level rdmult scaling factor driven by saliency map.
+ */
+ double *sm_scaling_factor;
+#endif
+
+ /*!
+ * Number of pixels that choose palette mode for luma in the
+ * fast encoding pass in av1_determine_sc_tools_with_encoding().
+ */
+ int palette_pixel_num;
} AV1_COMP;
/*!
@@ -3599,11 +3696,11 @@
* \ingroup high_level_algo
* This function receives the raw frame data from input.
*
- * \param[in] cpi Top-level encoder structure
- * \param[in] frame_flags Flags to decide how to encoding the frame
- * \param[in] sd Contain raw frame data
- * \param[in] time_stamp Time stamp of the frame
- * \param[in] end_time_stamp End time stamp
+ * \param[in] cpi Top-level encoder structure
+ * \param[in] frame_flags Flags to decide how to encode the frame
+ * \param[in,out] sd Contains raw frame data
+ * \param[in] time_stamp Time stamp of the frame
+ * \param[in] end_time_stamp End time stamp
*
* \return Returns a value to indicate if the frame data is received
* successfully.
@@ -4177,7 +4274,9 @@
}
if (use_loopfilter) return SKIP_APPLY_LOOPFILTER;
- return 0; // All post-processing stages disabled.
+ // If we reach here, all post-processing stages are disabled, so none need to
+ // be skipped.
+ return 0;
}
static INLINE void set_postproc_filter_default_params(AV1_COMMON *cm) {
diff --git a/av1/encoder/encoder_alloc.h b/av1/encoder/encoder_alloc.h
index f4c345f..7dd81bd 100644
--- a/av1/encoder/encoder_alloc.h
+++ b/av1/encoder/encoder_alloc.h
@@ -213,6 +213,11 @@
aom_free_frame_buffer(&cpi->butteraugli_info.resized_source);
#endif
+#if CONFIG_SALIENCY_MAP
+ aom_free(cpi->saliency_map);
+ aom_free(cpi->sm_scaling_factor);
+#endif
+
release_obmc_buffers(&cpi->td.mb.obmc_buffer);
if (cpi->td.mb.mv_costs) {
@@ -291,6 +296,7 @@
#endif
if (cpi->film_grain_table) {
aom_film_grain_table_free(cpi->film_grain_table);
+ aom_free(cpi->film_grain_table);
cpi->film_grain_table = NULL;
}
@@ -311,6 +317,14 @@
aom_free(cpi->mb_weber_stats);
cpi->mb_weber_stats = NULL;
+ if (cpi->oxcf.enable_rate_guide_deltaq) {
+ aom_free(cpi->prep_rate_estimates);
+ cpi->prep_rate_estimates = NULL;
+
+ aom_free(cpi->ext_rate_distribution);
+ cpi->ext_rate_distribution = NULL;
+ }
+
aom_free(cpi->mb_delta_q);
cpi->mb_delta_q = NULL;
}
@@ -379,7 +393,7 @@
cm->seq_params->subsampling_x, cm->seq_params->subsampling_y,
cm->seq_params->use_highbitdepth, AOM_BORDER_IN_PIXELS,
cm->features.byte_alignment, NULL, NULL, NULL,
- cpi->oxcf.tool_cfg.enable_global_motion, 0))
+ cpi->image_pyramid_levels, 0))
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to reallocate scaled source buffer");
assert(cpi->scaled_source.y_crop_width == scaled_width);
@@ -390,6 +404,40 @@
return &cpi->scaled_source;
}
+// Deallocate allocated thread_data.
+static AOM_INLINE void free_thread_data(AV1_PRIMARY *ppi) {
+ PrimaryMultiThreadInfo *const p_mt_info = &ppi->p_mt_info;
+ for (int t = 1; t < p_mt_info->num_workers; ++t) {
+ EncWorkerData *const thread_data = &p_mt_info->tile_thr_data[t];
+ thread_data->td = thread_data->original_td;
+ aom_free(thread_data->td->tctx);
+ aom_free(thread_data->td->palette_buffer);
+ aom_free(thread_data->td->tmp_conv_dst);
+ release_compound_type_rd_buffers(&thread_data->td->comp_rd_buffer);
+ for (int j = 0; j < 2; ++j) {
+ aom_free(thread_data->td->tmp_pred_bufs[j]);
+ }
+ aom_free(thread_data->td->pixel_gradient_info);
+ aom_free(thread_data->td->src_var_info_of_4x4_sub_blocks);
+ release_obmc_buffers(&thread_data->td->obmc_buffer);
+ aom_free(thread_data->td->vt64x64);
+
+ for (int x = 0; x < 2; x++) {
+ for (int y = 0; y < 2; y++) {
+ aom_free(thread_data->td->hash_value_buffer[x][y]);
+ thread_data->td->hash_value_buffer[x][y] = NULL;
+ }
+ }
+ aom_free(thread_data->td->counts);
+ av1_free_pmc(thread_data->td->firstpass_ctx,
+ ppi->seq_params.monochrome ? 1 : MAX_MB_PLANE);
+ thread_data->td->firstpass_ctx = NULL;
+ av1_free_shared_coeff_buffer(&thread_data->td->shared_coeff_buf);
+ av1_free_sms_tree(thread_data->td);
+ aom_free(thread_data->td);
+ }
+}
+
#ifdef __cplusplus
} // extern "C"
#endif
diff --git a/av1/encoder/encoder_utils.c b/av1/encoder/encoder_utils.c
index ad99ec6..bc136b1 100644
--- a/av1/encoder/encoder_utils.c
+++ b/av1/encoder/encoder_utils.c
@@ -701,7 +701,8 @@
RefCntBuffer *ref_fb = get_ref_frame_buf(cm, ref_frame);
if (aom_yv12_realloc_with_new_border(
&ref_fb->buf, AOM_BORDER_IN_PIXELS,
- cm->features.byte_alignment, num_planes) != 0) {
+ cm->features.byte_alignment, cpi->image_pyramid_levels,
+ num_planes) != 0) {
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate frame buffer");
}
@@ -802,10 +803,21 @@
? BLOCK_128X128
: BLOCK_64X64;
} else if (oxcf->mode == REALTIME) {
- if (oxcf->tune_cfg.content == AOM_CONTENT_SCREEN)
- return AOMMIN(width, height) >= 720 ? BLOCK_128X128 : BLOCK_64X64;
- else
+ if (oxcf->tune_cfg.content == AOM_CONTENT_SCREEN) {
+ const TileConfig *const tile_cfg = &oxcf->tile_cfg;
+ const int num_tiles =
+ (1 << tile_cfg->tile_columns) * (1 << tile_cfg->tile_rows);
+ // For multi-thread encode: if the number of (128x128) superblocks
+ // per tile is low, use 64X64 superblocks.
+ if (oxcf->row_mt == 1 && oxcf->max_threads >= 4 &&
+ oxcf->max_threads >= num_tiles && AOMMIN(width, height) > 720 &&
+ (width * height) / (128 * 128 * num_tiles) <= 38)
+ return BLOCK_64X64;
+ else
+ return AOMMIN(width, height) >= 720 ? BLOCK_128X128 : BLOCK_64X64;
+ } else {
return AOMMIN(width, height) > 720 ? BLOCK_128X128 : BLOCK_64X64;
+ }
}
// TODO(any): Possibly could improve this with a heuristic.
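A worked instance of the tile-based heuristic above, under illustrative
settings: a 1920x1080 screen-content encode with row_mt, 4 threads, and 2x2
tiles (tile_columns = tile_rows = 1 in log2 units):

#include <stdio.h>

int main(void) {
  const int width = 1920, height = 1080;
  const int tile_columns = 1, tile_rows = 1;  /* log2, as in oxcf */
  const int num_tiles = (1 << tile_columns) * (1 << tile_rows);  /* 4 */
  const int sbs_per_tile = (width * height) / (128 * 128 * num_tiles);
  /* 2073600 / 65536 = 31 <= 38, so 64x64 superblocks are selected. */
  printf("128x128 SBs per tile: %d -> %s\n", sbs_per_tile,
         sbs_per_tile <= 38 ? "BLOCK_64X64" : "BLOCK_128X128");
  return 0;
}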
@@ -825,6 +837,16 @@
if (!is_480p_or_lesser && is_1080p_or_lesser && oxcf->mode == GOOD &&
oxcf->row_mt == 1 && oxcf->max_threads > 1 && oxcf->speed >= 5)
return BLOCK_64X64;
+
+ // For allintra encode, since the maximum partition size is set to 32X32 for
+ // speed>=6, superblock size is set to 64X64 instead of 128X128. This
+ // improves the multithread performance due to reduction in top right delay
+ // and thread sync wastage. Currently, this setting is selectively enabled
+ // only for speed>=9 and resolutions less than 4k since cost update
+ // frequency is set to INTERNAL_COST_UPD_OFF in these cases.
+ const int is_4k_or_larger = AOMMIN(width, height) >= 2160;
+ if (oxcf->mode == ALLINTRA && oxcf->speed >= 9 && !is_4k_or_larger)
+ return BLOCK_64X64;
}
return BLOCK_128X128;
}
@@ -948,7 +970,13 @@
if (pass != 1) return;
const double psnr_diff = psnr[1].psnr[0] - psnr[0].psnr[0];
- const int is_sc_encoding_much_better = psnr_diff > STRICT_PSNR_DIFF_THRESH;
+ // Calculate the fraction of pixels choosing palette mode in mode decision.
+ const double palette_ratio =
+ (double)cpi->palette_pixel_num / (double)(cm->height * cm->width);
+ const int psnr_diff_is_large = (psnr_diff > STRICT_PSNR_DIFF_THRESH);
+ const int ratio_is_large =
+ ((palette_ratio >= 0.0001) && ((psnr_diff / palette_ratio) > 4));
+ const int is_sc_encoding_much_better = (psnr_diff_is_large || ratio_is_large);
if (is_sc_encoding_much_better) {
// Use screen content tools, if we get coding gain.
features->allow_screen_content_tools = 1;
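A worked instance of the relaxed decision above, with illustrative numbers:
even when the raw PSNR gap stays below STRICT_PSNR_DIFF_THRESH, a frame
where palette mode covers a meaningful share of pixels can still enable the
screen-content tools:

#include <stdio.h>

int main(void) {
  const int width = 1280, height = 720;
  const int palette_pixel_num = 92160;  /* pixels that chose palette mode */
  const double psnr_diff = 0.5;         /* small absolute PSNR gain */
  const double palette_ratio =
      (double)palette_pixel_num / (double)(width * height);  /* 0.1 */
  const int ratio_is_large =
      palette_ratio >= 0.0001 && (psnr_diff / palette_ratio) > 4;  /* 5 > 4 */
  printf("palette_ratio=%.4f -> %s\n", palette_ratio,
         ratio_is_large ? "enable SC tools" : "keep");
  return 0;
}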
@@ -1029,13 +1057,12 @@
cpi->source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_source, &cpi->scaled_source, cm->features.interp_filter,
- 0, false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ 0, false, false, cpi->oxcf.border_in_pixels, cpi->image_pyramid_levels);
if (cpi->unscaled_last_source != NULL) {
cpi->last_source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_last_source, &cpi->scaled_last_source,
cm->features.interp_filter, 0, false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels);
}
av1_setup_frame(cpi);
@@ -1061,9 +1088,8 @@
q_for_screen_content_quick_run,
q_cfg->enable_chroma_deltaq, q_cfg->enable_hdr_deltaq);
av1_set_speed_features_qindex_dependent(cpi, oxcf->speed);
- if (q_cfg->deltaq_mode != NO_DELTA_Q || q_cfg->enable_chroma_deltaq)
- av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
- cm->seq_params->bit_depth);
+ av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
+ cm->seq_params->bit_depth);
av1_set_variance_partition_thresholds(cpi, q_for_screen_content_quick_run,
0);
@@ -1101,7 +1127,7 @@
// Only one filter is used. So set the filter at frame level
for (int i = 0; i < SWITCHABLE_FILTERS; ++i) {
if (count[i]) {
- if (i == EIGHTTAP_REGULAR) *interp_filter = i;
+ *interp_filter = i;
break;
}
}
@@ -1151,7 +1177,8 @@
}
}
- fix_interp_filter(&cm->features.interp_filter, cpi->td.counts);
+ if (!frame_is_intra_only(cm))
+ fix_interp_filter(&cm->features.interp_filter, cpi->td.counts);
}
int av1_is_integer_mv(const YV12_BUFFER_CONFIG *cur_picture,
@@ -1307,10 +1334,22 @@
// Curve fitting with an exponential model on all 16x16 blocks from the
// midres dataset.
var = 67.035434 * (1 - exp(-0.0021489 * var)) + 17.492222;
+
+ // As per the above computation, var will be in the range of
+ // [17.492222, 84.527656], assuming the data type is of infinite
+ // precision. The following assert conservatively checks if var is in the
+ // range of [17.0, 85.0] to avoid any issues due to the precision of the
+ // relevant data type.
+ assert(var > 17.0 && var < 85.0);
cpi->ssim_rdmult_scaling_factors[index] = var;
log_sum += log(var);
}
}
+
+ // As log_sum holds the geometric mean, it will be in the range
+ // [17.492222, 84.527656]. Hence, in the below loop, the value of
+ // cpi->ssim_rdmult_scaling_factors[index] would be in the range
+ // [0.2069, 4.8323].
log_sum = exp(log_sum / (double)(num_rows * num_cols));
for (int row = 0; row < num_rows; ++row) {
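The ranges quoted in the new comments follow directly from the model; a
standalone check (illustrative, not library code):

#include <math.h>
#include <stdio.h>

int main(void) {
  /* The exponential fit is bounded below at var = 0 and above as var grows. */
  const double lo = 67.035434 * (1 - exp(-0.0021489 * 0.0)) + 17.492222;
  const double hi = 67.035434 + 17.492222; /* limit of the fit */
  printf("model range: [%.6f, %.6f]\n", lo, hi); /* [17.492222, 84.527656] */
  /* Dividing by the geometric mean of values in that range keeps every
   * scaling factor within [17.492222/84.527656, 84.527656/17.492222]. */
  printf("factor bounds: [%.4f, %.4f]\n", 17.492222 / 84.527656,
         84.527656 / 17.492222); /* [0.2069, 4.8323] */
  return 0;
}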
diff --git a/av1/encoder/encodetxb.c b/av1/encoder/encodetxb.c
index 4ea4f4c..602a6c4 100644
--- a/av1/encoder/encodetxb.c
+++ b/av1/encoder/encodetxb.c
@@ -220,32 +220,32 @@
}
static INLINE int get_nz_map_ctx(const uint8_t *const levels,
- const int coeff_idx, const int bwl,
- const int height, const int scan_idx,
+ const int coeff_idx, const int bhl,
+ const int width, const int scan_idx,
const int is_eob, const TX_SIZE tx_size,
const TX_CLASS tx_class) {
if (is_eob) {
if (scan_idx == 0) return 0;
- if (scan_idx <= (height << bwl) / 8) return 1;
- if (scan_idx <= (height << bwl) / 4) return 2;
+ if (scan_idx <= (width << bhl) / 8) return 1;
+ if (scan_idx <= (width << bhl) / 4) return 2;
return 3;
}
const int stats =
- get_nz_mag(levels + get_padded_idx(coeff_idx, bwl), bwl, tx_class);
- return get_nz_map_ctx_from_stats(stats, coeff_idx, bwl, tx_size, tx_class);
+ get_nz_mag(levels + get_padded_idx(coeff_idx, bhl), bhl, tx_class);
+ return get_nz_map_ctx_from_stats(stats, coeff_idx, bhl, tx_size, tx_class);
}
void av1_txb_init_levels_c(const tran_low_t *const coeff, const int width,
const int height, uint8_t *const levels) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
uint8_t *ls = levels;
- memset(levels + stride * height, 0,
+ memset(levels + stride * width, 0,
sizeof(*levels) * (TX_PAD_BOTTOM * stride + TX_PAD_END));
- for (int i = 0; i < height; i++) {
- for (int j = 0; j < width; j++) {
- *ls++ = (uint8_t)clamp(abs(coeff[i * width + j]), 0, INT8_MAX);
+ for (int i = 0; i < width; i++) {
+ for (int j = 0; j < height; j++) {
+ *ls++ = (uint8_t)clamp(abs(coeff[i * height + j]), 0, INT8_MAX);
}
for (int j = 0; j < TX_PAD_HOR; j++) {
*ls++ = 0;
@@ -257,11 +257,11 @@
const int16_t *const scan, const uint16_t eob,
const TX_SIZE tx_size, const TX_CLASS tx_class,
int8_t *const coeff_contexts) {
- const int bwl = get_txb_bwl(tx_size);
- const int height = get_txb_high(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
+ const int width = get_txb_wide(tx_size);
for (int i = 0; i < eob; ++i) {
const int pos = scan[i];
- coeff_contexts[pos] = get_nz_map_ctx(levels, pos, bwl, height, i,
+ coeff_contexts[pos] = get_nz_map_ctx(levels, pos, bhl, width, i,
i == eob - 1, tx_size, tx_class);
}
}
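The bwl-to-bhl renames in this file all stem from one layout change: the
levels scratch buffer is now stored transposed, so its padded stride derives
from the transform height instead of the width. A hedged standalone
illustration (PAD_HOR stands in for the library's TX_PAD_HOR constant):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAD_HOR 4 /* illustrative padding, mirroring TX_PAD_HOR */

static void init_levels_transposed(const int32_t *coeff, int width, int height,
                                   uint8_t *levels) {
  const int stride = height + PAD_HOR; /* was width + TX_PAD_HOR */
  for (int i = 0; i < width; i++) {
    for (int j = 0; j < height; j++) {
      const int v = abs((int)coeff[i * height + j]);
      levels[i * stride + j] = (uint8_t)(v > INT8_MAX ? INT8_MAX : v);
    }
    memset(levels + i * stride + height, 0, PAD_HOR); /* row padding */
  }
}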
@@ -344,7 +344,7 @@
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
const tran_low_t *tcoeff_txb =
cb_coef_buff->tcoeff[plane] + x->mbmi_ext_frame->cb_offset[plane_type];
const tran_low_t *tcoeff = tcoeff_txb + BLOCK_OFFSET(block);
@@ -354,7 +354,7 @@
DECLARE_ALIGNED(16, int8_t, coeff_contexts[MAX_TX_SQUARE]);
av1_get_nz_map_contexts(levels, scan, eob, tx_size, tx_class, coeff_contexts);
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
for (int c = eob - 1; c >= 0; --c) {
const int pos = scan[c];
const int coeff_ctx = coeff_contexts[pos];
@@ -373,7 +373,7 @@
if (level > NUM_BASE_LEVELS) {
// level is above 1.
const int base_range = level - 1 - NUM_BASE_LEVELS;
- const int br_ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int br_ctx = get_br_ctx(levels, pos, bhl, tx_class);
aom_cdf_prob *cdf =
ec_ctx->coeff_br_cdf[AOMMIN(txs_ctx, TX_32X32)][plane_type][br_ctx];
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
@@ -571,7 +571,7 @@
get_txb_ctx(plane_bsize, tx_size, plane,
pd->above_entropy_context + blk_col,
pd->left_entropy_context + blk_row, &txb_ctx);
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
const uint8_t allow_update_cdf = args->allow_update_cdf;
@@ -607,7 +607,7 @@
memcpy(tcoeff, qcoeff, sizeof(*tcoeff) * seg_eob);
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
av1_txb_init_levels(tcoeff, width, height, levels);
update_tx_type_count(cpi, cm, xd, blk_row, blk_col, plane, tx_size,
td->counts, allow_update_cdf);
@@ -663,7 +663,7 @@
}
if (level > NUM_BASE_LEVELS) {
const int base_range = level - 1 - NUM_BASE_LEVELS;
- const int br_ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int br_ctx = get_br_ctx(levels, pos, bhl, tx_class);
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
const int k = AOMMIN(base_range - idx, BR_CDF_SIZE - 1);
if (allow_update_cdf) {
@@ -735,7 +735,7 @@
pd->left_entropy_context + blk_row, &txb_ctx);
#if CONFIG_ENTROPY_STATS
const TX_SIZE txsize_ctx = get_txsize_entropy_ctx(tx_size);
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
int cdf_idx = cm->coef_cdf_category;
@@ -764,7 +764,7 @@
#if CONFIG_ENTROPY_STATS
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
av1_txb_init_levels(tcoeff, width, height, levels);
update_tx_type_count(cpi, cm, xd, blk_row, blk_col, plane, tx_size,
td->counts, 0 /*allow_update_cdf*/);
@@ -810,7 +810,7 @@
}
if (level > NUM_BASE_LEVELS) {
const int base_range = level - 1 - NUM_BASE_LEVELS;
- const int br_ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int br_ctx = get_br_ctx(levels, pos, bhl, tx_class);
for (int idx = 0; idx < COEFF_BASE_RANGE; idx += BR_CDF_SIZE - 1) {
const int k = AOMMIN(base_range - idx, BR_CDF_SIZE - 1);
for (int lps = 0; lps < BR_CDF_SIZE - 1; lps++) {
diff --git a/av1/encoder/ethread.c b/av1/encoder/ethread.c
index 4127c9a..2a00999 100644
--- a/av1/encoder/ethread.c
+++ b/av1/encoder/ethread.c
@@ -100,7 +100,6 @@
(void)row_mt_sync;
(void)r;
(void)c;
- return;
}
void av1_row_mt_sync_write_dummy(AV1EncRowMultiThreadSync *row_mt_sync, int r,
@@ -109,7 +108,6 @@
(void)r;
(void)c;
(void)cols;
- return;
}
void av1_row_mt_sync_read(AV1EncRowMultiThreadSync *row_mt_sync, int r, int c) {
@@ -586,7 +584,7 @@
launch_loop_filter_rows(cm, thread_data, enc_row_mt, mib_size_log2);
}
av1_free_pc_tree_recursive(thread_data->td->rt_pc_root, av1_num_planes(cm), 0,
- 0);
+ 0, cpi->sf.part_sf.partition_search_type);
return 1;
}
@@ -619,7 +617,7 @@
}
av1_free_pc_tree_recursive(thread_data->td->rt_pc_root, av1_num_planes(cm), 0,
- 0);
+ 0, cpi->sf.part_sf.partition_search_type);
return 1;
}
@@ -777,6 +775,7 @@
int num_workers = p_mt_info->num_workers;
int num_enc_workers = av1_get_num_mod_workers_for_alloc(p_mt_info, MOD_ENC);
+ assert(num_enc_workers <= num_workers);
for (int i = num_workers - 1; i >= 0; i--) {
EncWorkerData *const thread_data = &p_mt_info->tile_thr_data[i];
@@ -886,6 +885,10 @@
}
}
}
+
+ // Record the number of workers in encode stage multi-threading for which
+ // allocation is done.
+ p_mt_info->prev_num_enc_workers = num_enc_workers;
}
void av1_create_workers(AV1_PRIMARY *ppi, int num_workers) {
@@ -1261,7 +1264,7 @@
static AOM_INLINE void sync_enc_workers(MultiThreadInfo *const mt_info,
AV1_COMMON *const cm, int num_workers) {
const AVxWorkerInterface *const winterface = aom_get_worker_interface();
- int had_error = 0;
+ int had_error = mt_info->workers[0].had_error;
// Encoding ends.
for (int i = num_workers - 1; i > 0; i--) {
@@ -1284,6 +1287,7 @@
// Accumulate rtc counters.
if (!frame_is_intra_only(&cpi->common))
av1_accumulate_rtc_counters(cpi, &thread_data->td->mb);
+ cpi->palette_pixel_num += thread_data->td->mb.palette_pixels;
if (thread_data->td != &cpi->td) {
// Keep these conditional expressions in sync with the corresponding ones
// in prepare_enc_workers().
@@ -1381,6 +1385,8 @@
// Reset rtc counters.
av1_init_rtc_counters(&thread_data->td->mb);
+ thread_data->td->mb.palette_pixels = 0;
+
if (thread_data->td->counts != &cpi->counts) {
memcpy(thread_data->td->counts, &cpi->counts, sizeof(cpi->counts));
}
@@ -1827,7 +1833,6 @@
(void)tpl_mt_sync;
(void)r;
(void)c;
- return;
}
void av1_tpl_row_mt_sync_write_dummy(AV1TplRowMultiThreadSync *tpl_mt_sync,
@@ -1836,7 +1841,6 @@
(void)r;
(void)c;
(void)cols;
- return;
}
void av1_tpl_row_mt_sync_read(AV1TplRowMultiThreadSync *tpl_row_mt_sync, int r,
@@ -2007,6 +2011,7 @@
}
}
+#if CONFIG_BITRATE_ACCURACY
// Accumulate transform stats after tpl.
static void tpl_accumulate_txfm_stats(ThreadData *main_td,
const MultiThreadInfo *mt_info,
@@ -2022,6 +2027,7 @@
}
}
}
+#endif // CONFIG_BITRATE_ACCURACY
// Implements multi-threading for tpl.
void av1_mc_flow_dispenser_mt(AV1_COMP *cpi) {
@@ -2047,7 +2053,9 @@
prepare_tpl_workers(cpi, tpl_worker_hook, num_workers);
launch_workers(&cpi->mt_info, num_workers);
sync_enc_workers(&cpi->mt_info, cm, num_workers);
+#if CONFIG_BITRATE_ACCURACY
tpl_accumulate_txfm_stats(&cpi->td, &cpi->mt_info, num_workers);
+#endif // CONFIG_BITRATE_ACCURACY
}
// Deallocate memory for temporal filter multi-thread synchronization.
@@ -2223,7 +2231,7 @@
static AOM_INLINE void init_gm_thread_data(
const GlobalMotionInfo *gm_info, GlobalMotionThreadData *thread_data) {
for (int m = 0; m < RANSAC_NUM_MOTIONS; m++) {
- MotionModel motion_params = thread_data->params_by_motion[m];
+ MotionModel motion_params = thread_data->motion_models[m];
av1_zero(motion_params.params);
motion_params.num_inliers = 0;
}
@@ -2251,7 +2259,6 @@
while (1) {
int ref_buf_idx = -1;
- int ref_frame_idx = -1;
#if CONFIG_MULTITHREAD
pthread_mutex_lock(gm_mt_mutex_);
@@ -2265,11 +2272,6 @@
switch_direction(cpi, &ref_buf_idx, &cur_dir);
}
- // 'ref_frame_idx' holds the index of the current reference frame type in
- // gm_info->reference_frames. job_info->next_frame_to_process will be
- // incremented in get_next_gm_job() and hence subtracting by 1.
- ref_frame_idx = job_info->next_frame_to_process[cur_dir] - 1;
-
#if CONFIG_MULTITHREAD
pthread_mutex_unlock(gm_mt_mutex_);
#endif
@@ -2280,23 +2282,18 @@
// Compute global motion for the given ref_buf_idx.
av1_compute_gm_for_valid_ref_frames(
- cpi, gm_info->ref_buf, ref_buf_idx, gm_info->num_src_corners,
- gm_info->src_corners, gm_info->src_buffer,
- gm_thread_data->params_by_motion, gm_thread_data->segment_map,
- gm_info->segment_map_w, gm_info->segment_map_h);
+ cpi, gm_info->ref_buf, ref_buf_idx, gm_thread_data->motion_models,
+ gm_thread_data->segment_map, gm_info->segment_map_w,
+ gm_info->segment_map_h);
#if CONFIG_MULTITHREAD
pthread_mutex_lock(gm_mt_mutex_);
#endif
- assert(ref_frame_idx != -1);
// If global motion w.r.t. current ref frame is
// INVALID/TRANSLATION/IDENTITY, skip the evaluation of global motion w.r.t
- // the remaining ref frames in that direction. The below exit is disabled
- // when ref frame distance w.r.t. current frame is zero. E.g.:
- // source_alt_ref_frame w.r.t. ARF frames.
+ // the remaining ref frames in that direction.
if (cpi->sf.gm_sf.prune_ref_frame_for_gm_search &&
- gm_info->reference_frames[cur_dir][ref_frame_idx].distance != 0 &&
- cpi->common.global_motion[ref_buf_idx].wmtype != ROTZOOM)
+ cpi->common.global_motion[ref_buf_idx].wmtype <= TRANSLATION)
job_info->early_exit[cur_dir] = 1;
#if CONFIG_MULTITHREAD
@@ -2361,7 +2358,7 @@
aom_free(thread_data->segment_map);
for (int m = 0; m < RANSAC_NUM_MOTIONS; m++)
- aom_free(thread_data->params_by_motion[m].inliers);
+ aom_free(thread_data->motion_models[m].inliers);
}
aom_free(gm_sync_data->thread_data);
}
@@ -2390,8 +2387,8 @@
for (int m = 0; m < RANSAC_NUM_MOTIONS; m++) {
CHECK_MEM_ERROR(
- cm, thread_data->params_by_motion[m].inliers,
- aom_malloc(sizeof(*thread_data->params_by_motion[m].inliers) * 2 *
+ cm, thread_data->motion_models[m].inliers,
+ aom_malloc(sizeof(*thread_data->motion_models[m].inliers) * 2 *
MAX_CORNERS));
}
}
@@ -2420,64 +2417,16 @@
}
#endif // !CONFIG_REALTIME_ONLY
-// Allocate memory for row synchronization
-static void wiener_var_sync_mem_alloc(
- AV1EncRowMultiThreadSync *const row_mt_sync, AV1_COMMON *const cm,
- const int rows) {
-#if CONFIG_MULTITHREAD
- int i;
-
- CHECK_MEM_ERROR(cm, row_mt_sync->mutex_,
- aom_malloc(sizeof(*row_mt_sync->mutex_) * rows));
- if (row_mt_sync->mutex_) {
- for (i = 0; i < rows; ++i) {
- pthread_mutex_init(&row_mt_sync->mutex_[i], NULL);
- }
+static AOM_INLINE int get_next_job_allintra(
+ AV1EncRowMultiThreadSync *const row_mt_sync, const int mi_row_end,
+ int *current_mi_row, int mib_size) {
+ if (row_mt_sync->next_mi_row < mi_row_end) {
+ *current_mi_row = row_mt_sync->next_mi_row;
+ row_mt_sync->num_threads_working++;
+ row_mt_sync->next_mi_row += mib_size;
+ return 1;
}
-
- CHECK_MEM_ERROR(cm, row_mt_sync->cond_,
- aom_malloc(sizeof(*row_mt_sync->cond_) * rows));
- if (row_mt_sync->cond_) {
- for (i = 0; i < rows; ++i) {
- pthread_cond_init(&row_mt_sync->cond_[i], NULL);
- }
- }
-#endif // CONFIG_MULTITHREAD
-
- CHECK_MEM_ERROR(cm, row_mt_sync->num_finished_cols,
- aom_malloc(sizeof(*row_mt_sync->num_finished_cols) * rows));
-
- row_mt_sync->rows = rows;
- // Set up nsync.
- row_mt_sync->sync_range = 1;
-}
-
-// Deallocate row based multi-threading synchronization related mutex and data
-static void wiener_var_sync_mem_dealloc(AV1EncRowMultiThreadSync *row_mt_sync) {
- if (row_mt_sync != NULL) {
-#if CONFIG_MULTITHREAD
- int i;
-
- if (row_mt_sync->mutex_ != NULL) {
- for (i = 0; i < row_mt_sync->rows; ++i) {
- pthread_mutex_destroy(&row_mt_sync->mutex_[i]);
- }
- aom_free(row_mt_sync->mutex_);
- }
- if (row_mt_sync->cond_ != NULL) {
- for (i = 0; i < row_mt_sync->rows; ++i) {
- pthread_cond_destroy(&row_mt_sync->cond_[i]);
- }
- aom_free(row_mt_sync->cond_);
- }
-#endif // CONFIG_MULTITHREAD
- aom_free(row_mt_sync->num_finished_cols);
-
- // clear the structure as the source of this call may be dynamic change
- // in tiles in which case this call will be followed by an _alloc()
- // which may fail.
- av1_zero(*row_mt_sync);
- }
+ return 0;
}
static AOM_INLINE void prepare_wiener_var_workers(AV1_COMP *const cpi,
@@ -2518,7 +2467,8 @@
MACROBLOCKD *xd = &x->e_mbd;
const BLOCK_SIZE bsize = cpi->weber_bsize;
const int mb_step = mi_size_wide[bsize];
- AV1EncRowMultiThreadSync *const row_mt_sync = &cpi->tile_data[0].row_mt_sync;
+ AV1EncRowMultiThreadSync *const intra_row_mt_sync =
+ &cpi->ppi->intra_row_mt_sync;
AV1EncRowMultiThreadInfo *const enc_row_mt = &cpi->mt_info.enc_row_mt;
(void)enc_row_mt;
#if CONFIG_MULTITHREAD
@@ -2536,7 +2486,9 @@
#if CONFIG_MULTITHREAD
pthread_mutex_lock(enc_row_mt_mutex_);
#endif
- has_jobs = get_next_job(&cpi->tile_data[0], ¤t_mi_row, mb_step);
+ has_jobs =
+ get_next_job_allintra(intra_row_mt_sync, cpi->common.mi_params.mi_rows,
+ ¤t_mi_row, mb_step);
#if CONFIG_MULTITHREAD
pthread_mutex_unlock(enc_row_mt_mutex_);
#endif
@@ -2548,7 +2500,7 @@
#if CONFIG_MULTITHREAD
pthread_mutex_lock(enc_row_mt_mutex_);
#endif
- row_mt_sync->num_threads_working--;
+ intra_row_mt_sync->num_threads_working--;
#if CONFIG_MULTITHREAD
pthread_mutex_unlock(enc_row_mt_mutex_);
#endif
@@ -2569,31 +2521,24 @@
(void)sum_est_rate;
AV1_COMMON *const cm = &cpi->common;
MultiThreadInfo *const mt_info = &cpi->mt_info;
- const int tile_cols = 1;
- const int tile_rows = 1;
- if (cpi->tile_data != NULL) aom_free(cpi->tile_data);
- CHECK_MEM_ERROR(
- cm, cpi->tile_data,
- aom_memalign(32, tile_cols * tile_rows * sizeof(*cpi->tile_data)));
- cpi->allocated_tiles = tile_cols * tile_rows;
- cpi->tile_data->tile_info.mi_row_end = cm->mi_params.mi_rows;
- AV1EncRowMultiThreadSync *const row_mt_sync = &cpi->tile_data[0].row_mt_sync;
+ AV1EncRowMultiThreadSync *const intra_row_mt_sync =
+ &cpi->ppi->intra_row_mt_sync;
// TODO(chengchen): the memory usage could be improved.
const int mi_rows = cm->mi_params.mi_rows;
- wiener_var_sync_mem_alloc(row_mt_sync, cm, mi_rows);
+ row_mt_sync_mem_alloc(intra_row_mt_sync, cm, mi_rows);
- row_mt_sync->intrabc_extra_top_right_sb_delay = 0;
- row_mt_sync->num_threads_working = num_workers;
- row_mt_sync->next_mi_row = 0;
- memset(row_mt_sync->num_finished_cols, -1,
- sizeof(*row_mt_sync->num_finished_cols) * num_workers);
+ intra_row_mt_sync->intrabc_extra_top_right_sb_delay = 0;
+ intra_row_mt_sync->num_threads_working = num_workers;
+ intra_row_mt_sync->next_mi_row = 0;
+ memset(intra_row_mt_sync->num_finished_cols, -1,
+ sizeof(*intra_row_mt_sync->num_finished_cols) * mi_rows);
prepare_wiener_var_workers(cpi, cal_mb_wiener_var_hook, num_workers);
launch_workers(mt_info, num_workers);
sync_enc_workers(mt_info, cm, num_workers);
- wiener_var_sync_mem_dealloc(row_mt_sync);
+ row_mt_sync_mem_dealloc(intra_row_mt_sync);
}
// Compare and order tiles based on absolute sum of tx coeffs.
@@ -3073,6 +3018,9 @@
// Computes num_workers for all intra multi-threading.
static AOM_INLINE int compute_num_ai_workers(AV1_COMP *cpi) {
if (cpi->oxcf.max_threads <= 1) return 1;
+ // The multi-threading implementation of deltaq-mode = 3 in allintra
+ // mode is based on row multi-threading.
+ if (!cpi->oxcf.row_mt) return 1;
cpi->weber_bsize = BLOCK_8X8;
const BLOCK_SIZE bsize = cpi->weber_bsize;
const int mb_step = mi_size_wide[bsize];
diff --git a/av1/encoder/firstpass.c b/av1/encoder/firstpass.c
index 3dee644..1fad149 100644
--- a/av1/encoder/firstpass.c
+++ b/av1/encoder/firstpass.c
@@ -93,6 +93,8 @@
section->intra_error = 0.0;
section->frame_avg_wavelet_energy = 0.0;
section->coded_error = 0.0;
+ section->log_intra_error = 0.0;
+ section->log_coded_error = 0.0;
section->sr_coded_error = 0.0;
section->pcnt_inter = 0.0;
section->pcnt_motion = 0.0;
@@ -121,6 +123,8 @@
section->frame += frame->frame;
section->weight += frame->weight;
section->intra_error += frame->intra_error;
+ section->log_intra_error += log1p(frame->intra_error);
+ section->log_coded_error += log1p(frame->coded_error);
section->frame_avg_wavelet_energy += frame->frame_avg_wavelet_energy;
section->coded_error += frame->coded_error;
section->sr_coded_error += frame->sr_coded_error;
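The switch from log(x + 1.0) to log1p(x) in this file is a numerical
accuracy fix: for very small x the addition 1.0 + x discards the low-order
bits of x before the logarithm is taken. A quick demonstration:

#include <math.h>
#include <stdio.h>

int main(void) {
  const double x = 1e-17;
  printf("log(1 + x) = %.3e\n", log(1.0 + x)); /* 0: x is lost in 1.0 + x */
  printf("log1p(x)   = %.3e\n", log1p(x));     /* ~1e-17, correctly rounded */
  return 0;
}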
@@ -217,7 +221,6 @@
case BLOCK_8X16: return aom_highbd_8_mse8x16;
default: return aom_highbd_8_mse16x16;
}
- break;
case 10:
switch (bsize) {
case BLOCK_8X8: return aom_highbd_10_mse8x8;
@@ -225,7 +228,6 @@
case BLOCK_8X16: return aom_highbd_10_mse8x16;
default: return aom_highbd_10_mse16x16;
}
- break;
case 12:
switch (bsize) {
case BLOCK_8X8: return aom_highbd_12_mse8x8;
@@ -233,7 +235,6 @@
case BLOCK_8X16: return aom_highbd_12_mse8x16;
default: return aom_highbd_12_mse16x16;
}
- break;
}
}
@@ -276,7 +277,7 @@
cpi->is_screen_content_type && cpi->common.features.allow_intrabc;
FULLPEL_MOTION_SEARCH_PARAMS ms_params;
av1_make_default_fullpel_ms_params(&ms_params, cpi, x, bsize, ref_mv,
- first_pass_search_sites,
+ start_mv, first_pass_search_sites,
fine_search_interval);
av1_set_mv_search_method(&ms_params, first_pass_search_sites, NSTEP);
@@ -514,7 +515,7 @@
stats->image_data_start_row = unit_row;
}
- double log_intra = log(this_intra_error + 1.0);
+ double log_intra = log1p(this_intra_error);
if (log_intra < 10.0) {
stats->intra_factor += 1.0 + ((10.0 - log_intra) * 0.05);
} else {
@@ -707,6 +708,8 @@
// Compute the motion error of the 0,0 motion using the last source
// frame as the reference. Skip the further motion search on
// reconstructed frame if this error is small.
+ // TODO(chiyotsai): The unscaled last source might have different
+ // dimensions than the current source. See BUG=aomedia:3413
struct buf_2d unscaled_last_source_buf_2d;
unscaled_last_source_buf_2d.buf =
cpi->unscaled_last_source->y_buffer + src_yoffset;
@@ -734,44 +737,43 @@
mv = tmp_mv;
}
}
+ }
- // Motion search in 2nd reference frame.
- int gf_motion_error = motion_error;
- if ((current_frame->frame_number > 1) && golden_frame != NULL) {
- FULLPEL_MV tmp_mv = kZeroFullMv;
- // Assume 0,0 motion with no mv overhead.
- xd->plane[0].pre[0].buf = golden_frame->y_buffer + recon_yoffset;
- xd->plane[0].pre[0].stride = golden_frame->y_stride;
- gf_motion_error =
- get_prediction_error_bitdepth(is_high_bitdepth, bitdepth, bsize,
- &x->plane[0].src, &xd->plane[0].pre[0]);
- first_pass_motion_search(cpi, x, &kZeroMv, &tmp_mv, &gf_motion_error);
- }
- if (gf_motion_error < motion_error && gf_motion_error < this_intra_error) {
- ++stats->second_ref_count;
- }
- // In accumulating a score for the 2nd reference frame take the
- // best of the motion predicted score and the intra coded error
- // (just as will be done for) accumulation of "coded_error" for
- // the last frame.
- if ((current_frame->frame_number > 1) && golden_frame != NULL) {
- stats->sr_coded_error += AOMMIN(gf_motion_error, this_intra_error);
- } else {
- // TODO(chengchen): I believe logically this should also be changed to
- // stats->sr_coded_error += AOMMIN(gf_motion_error, this_intra_error).
- stats->sr_coded_error += motion_error;
- }
-
- // Reset to last frame as reference buffer.
- xd->plane[0].pre[0].buf = last_frame->y_buffer + recon_yoffset;
- if (av1_num_planes(&cpi->common) > 1) {
- xd->plane[1].pre[0].buf = last_frame->u_buffer + recon_uvoffset;
- xd->plane[2].pre[0].buf = last_frame->v_buffer + recon_uvoffset;
- }
+ // Motion search in 2nd reference frame.
+ int gf_motion_error = motion_error;
+ if ((current_frame->frame_number > 1) && golden_frame != NULL) {
+ FULLPEL_MV tmp_mv = kZeroFullMv;
+ // Assume 0,0 motion with no mv overhead.
+ xd->plane[0].pre[0].buf = golden_frame->y_buffer + recon_yoffset;
+ xd->plane[0].pre[0].stride = golden_frame->y_stride;
+ xd->plane[0].pre[0].width = golden_frame->y_width;
+ gf_motion_error =
+ get_prediction_error_bitdepth(is_high_bitdepth, bitdepth, bsize,
+ &x->plane[0].src, &xd->plane[0].pre[0]);
+ first_pass_motion_search(cpi, x, &kZeroMv, &tmp_mv, &gf_motion_error);
+ }
+ if (gf_motion_error < motion_error && gf_motion_error < this_intra_error) {
+ ++stats->second_ref_count;
+ }
+ // In accumulating a score for the 2nd reference frame take the
+ // best of the motion predicted score and the intra coded error
+ // (just as will be done for) accumulation of "coded_error" for
+ // the last frame.
+ if ((current_frame->frame_number > 1) && golden_frame != NULL) {
+ stats->sr_coded_error += AOMMIN(gf_motion_error, this_intra_error);
} else {
+ // TODO(chengchen): I believe logically this should also be changed to
+ // stats->sr_coded_error += AOMMIN(gf_motion_error, this_intra_error).
stats->sr_coded_error += motion_error;
}
+ // Reset to last frame as reference buffer.
+ xd->plane[0].pre[0].buf = last_frame->y_buffer + recon_yoffset;
+ if (av1_num_planes(&cpi->common) > 1) {
+ xd->plane[1].pre[0].buf = last_frame->u_buffer + recon_uvoffset;
+ xd->plane[2].pre[0].buf = last_frame->v_buffer + recon_uvoffset;
+ }
+
// Start by assuming that intra mode is best.
*best_mv = kZeroMv;
@@ -829,7 +831,8 @@
fps->sr_coded_error /= num_mbs_16x16;
fps->intra_error /= num_mbs_16x16;
fps->frame_avg_wavelet_energy /= num_mbs_16x16;
-
+ fps->log_coded_error = log1p(fps->coded_error);
+ fps->log_intra_error = log1p(fps->intra_error);
fps->MVr /= f_h;
fps->mvr_abs /= f_h;
fps->MVc /= f_w;
@@ -889,11 +892,13 @@
fps.pcnt_neutral = (double)stats->neutral_count / num_mbs;
fps.intra_skip_pct = (double)stats->intra_skip_count / num_mbs;
fps.inactive_zone_rows = (double)stats->image_data_start_row;
- fps.inactive_zone_cols = (double)0; // Placeholder: not currently supported.
+ fps.inactive_zone_cols = 0.0; // Placeholder: not currently supported.
fps.raw_error_stdev = raw_err_stdev;
fps.is_flash = 0;
- fps.noise_var = (double)0;
- fps.cor_coeff = (double)1.0;
+ fps.noise_var = 0.0;
+ fps.cor_coeff = 1.0;
+ fps.log_coded_error = 0.0;
+ fps.log_intra_error = 0.0;
if (stats->mv_count > 0) {
fps.MVr = (double)stats->sum_mvr / stats->mv_count;
@@ -1118,10 +1123,16 @@
AV1EncRowMultiThreadInfo *const enc_row_mt = &mt_info->enc_row_mt;
AV1EncRowMultiThreadSync *const row_mt_sync = &tile_data->row_mt_sync;
- const YV12_BUFFER_CONFIG *const last_frame =
- get_ref_frame_yv12_buf(cm, LAST_FRAME);
+ const YV12_BUFFER_CONFIG *last_frame =
+ av1_get_scaled_ref_frame(cpi, LAST_FRAME);
+ if (!last_frame) {
+ last_frame = get_ref_frame_yv12_buf(cm, LAST_FRAME);
+ }
const YV12_BUFFER_CONFIG *golden_frame =
- get_ref_frame_yv12_buf(cm, GOLDEN_FRAME);
+ av1_get_scaled_ref_frame(cpi, GOLDEN_FRAME);
+ if (!golden_frame) {
+ golden_frame = get_ref_frame_yv12_buf(cm, GOLDEN_FRAME);
+ }
YV12_BUFFER_CONFIG *const this_frame = &cm->cur_frame->buf;
PICK_MODE_CONTEXT *ctx = td->firstpass_ctx;
@@ -1249,6 +1260,9 @@
const int num_planes = av1_num_planes(cm);
MACROBLOCKD *const xd = &x->e_mbd;
const int qindex = find_fp_qindex(seq_params->bit_depth);
+ const int ref_frame_flags_backup = cpi->ref_frame_flags;
+ cpi->ref_frame_flags = av1_ref_frame_flag_list[LAST_FRAME] |
+ av1_ref_frame_flag_list[GOLDEN_FRAME];
// Detect if the key frame is screen content type.
if (frame_is_intra_only(cm)) {
@@ -1300,10 +1314,18 @@
av1_init_tile_data(cpi);
- const YV12_BUFFER_CONFIG *const last_frame =
- get_ref_frame_yv12_buf(cm, LAST_FRAME);
- const YV12_BUFFER_CONFIG *golden_frame =
- get_ref_frame_yv12_buf(cm, GOLDEN_FRAME);
+ const YV12_BUFFER_CONFIG *last_frame = NULL;
+ const YV12_BUFFER_CONFIG *golden_frame = NULL;
+ if (!frame_is_intra_only(cm)) {
+ av1_scale_references(cpi, EIGHTTAP_REGULAR, 0, 0);
+ last_frame = av1_is_scaled(get_ref_scale_factors_const(cm, LAST_FRAME))
+ ? av1_get_scaled_ref_frame(cpi, LAST_FRAME)
+ : get_ref_frame_yv12_buf(cm, LAST_FRAME);
+ golden_frame = av1_is_scaled(get_ref_scale_factors_const(cm, GOLDEN_FRAME))
+ ? av1_get_scaled_ref_frame(cpi, GOLDEN_FRAME)
+ : get_ref_frame_yv12_buf(cm, GOLDEN_FRAME);
+ }
+
YV12_BUFFER_CONFIG *const this_frame = &cm->cur_frame->buf;
// First pass code requires valid last and new frame buffers.
assert(this_frame != NULL);
@@ -1425,6 +1447,10 @@
/*do_print=*/0);
++current_frame->frame_number;
+ cpi->ref_frame_flags = ref_frame_flags_backup;
+ if (!frame_is_intra_only(cm)) {
+ release_scaled_references(cpi);
+ }
}
aom_codec_err_t av1_firstpass_info_init(FIRSTPASS_INFO *firstpass_info,
diff --git a/av1/encoder/firstpass.h b/av1/encoder/firstpass.h
index d5f750f..e18e9e4 100644
--- a/av1/encoder/firstpass.h
+++ b/av1/encoder/firstpass.h
@@ -12,6 +12,8 @@
#ifndef AOM_AV1_ENCODER_FIRSTPASS_H_
#define AOM_AV1_ENCODER_FIRSTPASS_H_
+#include <stdbool.h>
+
#include "av1/common/av1_common_int.h"
#include "av1/common/enums.h"
#include "av1/encoder/lookahead.h"
@@ -161,6 +163,14 @@
* Correlation coefficient with the previous frame
*/
double cor_coeff;
+ /*!
+ * log of intra_error
+ */
+ double log_intra_error;
+ /*!
+ * log of coded_error
+ */
+ double log_coded_error;
} FIRSTPASS_STATS;
// We want to keep one past stats for key frame detection
@@ -386,9 +396,9 @@
// 2 : frame occurs later in encode order in a given parallel encode set.
int frame_parallel_level[MAX_STATIC_GF_GROUP_LENGTH];
// Indicates whether a frame should act as non-reference frame.
- // 0 : frame is a reference frame.
- // 1 : frame is a non-reference frame.
- int is_frame_non_ref[MAX_STATIC_GF_GROUP_LENGTH];
+ bool is_frame_non_ref[MAX_STATIC_GF_GROUP_LENGTH];
+ // Indicates whether a frame is dropped.
+ bool is_frame_dropped[MAX_STATIC_GF_GROUP_LENGTH];
// Stores the display order hint of the frames not to be
// refreshed by the current frame.
@@ -454,7 +464,6 @@
int last_kfgroup_zeromotion_pct;
int extend_minq;
int extend_maxq;
- int extend_minq_fast;
/*!\endcond */
} TWO_PASS;
diff --git a/av1/encoder/global_motion.c b/av1/encoder/global_motion.c
index 9e84e53..bc5e186 100644
--- a/av1/encoder/global_motion.c
+++ b/av1/encoder/global_motion.c
@@ -37,7 +37,6 @@
static void convert_to_params(const double *params, int32_t *model) {
int i;
- int alpha_present = 0;
model[0] = (int32_t)floor(params[0] * (1 << GM_TRANS_PREC_BITS) + 0.5);
model[1] = (int32_t)floor(params[1] * (1 << GM_TRANS_PREC_BITS) + 0.5);
model[0] = (int32_t)clamp(model[0], GM_TRANS_MIN, GM_TRANS_MAX) *
@@ -50,22 +49,8 @@
model[i] = (int32_t)floor(params[i] * (1 << GM_ALPHA_PREC_BITS) + 0.5);
model[i] =
(int32_t)clamp(model[i] - diag_value, GM_ALPHA_MIN, GM_ALPHA_MAX);
- alpha_present |= (model[i] != 0);
model[i] = (model[i] + diag_value) * GM_ALPHA_DECODE_FACTOR;
}
- for (; i < 8; ++i) {
- model[i] = (int32_t)floor(params[i] * (1 << GM_ROW3HOMO_PREC_BITS) + 0.5);
- model[i] = (int32_t)clamp(model[i], GM_ROW3HOMO_MIN, GM_ROW3HOMO_MAX) *
- GM_ROW3HOMO_DECODE_FACTOR;
- alpha_present |= (model[i] != 0);
- }
-
- if (!alpha_present) {
- if (abs(model[0]) < MIN_TRANS_THRESH && abs(model[1]) < MIN_TRANS_THRESH) {
- model[0] = 0;
- model[1] = 0;
- }
- }
}
void av1_convert_model_to_params(const double *params,
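The quantization in convert_to_params follows the usual round-half-up
fixed-point pattern; a generic sketch under stated assumptions (the
precision and clamp arguments are illustrative, not libaom's actual GM_*
constants):

#include <math.h>
#include <stdint.h>

static int32_t quantize_param(double param, int prec_bits, int32_t min_v,
                              int32_t max_v) {
  int32_t q = (int32_t)floor(param * (1 << prec_bits) + 0.5);
  if (q < min_v) q = min_v;
  if (q > max_v) q = max_v;
  return q;
}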
@@ -80,11 +65,10 @@
// zero-centering.
static int32_t add_param_offset(int param_index, int32_t param_value,
int32_t offset) {
- const int scale_vals[3] = { GM_TRANS_PREC_DIFF, GM_ALPHA_PREC_DIFF,
- GM_ROW3HOMO_PREC_DIFF };
- const int clamp_vals[3] = { GM_TRANS_MAX, GM_ALPHA_MAX, GM_ROW3HOMO_MAX };
- // type of param: 0 - translation, 1 - affine, 2 - homography
- const int param_type = (param_index < 2 ? 0 : (param_index < 6 ? 1 : 2));
+ const int scale_vals[2] = { GM_TRANS_PREC_DIFF, GM_ALPHA_PREC_DIFF };
+ const int clamp_vals[2] = { GM_TRANS_MAX, GM_ALPHA_MAX };
+ // type of param: 0 - translation, 1 - affine
+ const int param_type = (param_index < 2 ? 0 : 1);
const int is_one_centered = (param_index == 2 || param_index == 5);
// Make parameter zero-centered and offset the shift that was done to make
@@ -206,8 +190,9 @@
int p_height, int p_stride, int subsampling_x,
int subsampling_y, int64_t best_error,
uint8_t *segment_map, int segment_map_stride) {
- if (wm->wmtype <= AFFINE)
- if (!av1_get_shear_params(wm)) return INT64_MAX;
+ force_wmtype(wm, wm->wmtype);
+ assert(wm->wmtype <= AFFINE);
+ if (!av1_get_shear_params(wm)) return INT64_MAX;
#if CONFIG_AV1_HIGHBITDEPTH
if (use_hbd)
return highbd_warp_error(wm, CONVERT_TO_SHORTPTR(ref), width, height,
@@ -224,8 +209,8 @@
}
// Factors used to calculate the thresholds for av1_warp_error
-static double thresh_factors[GM_REFINEMENT_COUNT] = { 1.25, 1.20, 1.15, 1.10,
- 1.05 };
+static double thresh_factors[GM_MAX_REFINEMENT_STEPS] = { 1.25, 1.20, 1.15,
+ 1.10, 1.05 };
static INLINE int64_t calc_approx_erroradv_threshold(
double scaling_factor, int64_t erroradv_threshold) {
@@ -258,6 +243,12 @@
dst + border * d_stride + border, border, border,
d_width - 2 * border, d_height - 2 * border, d_stride, 0,
0, best_frame_error, segment_map, segment_map_stride);
+
+ if (n_refinements == 0) {
+ wm->wmtype = get_wmtype(wm);
+ return best_error;
+ }
+
best_error = AOMMIN(best_error, best_frame_error);
step = 1 << (n_refinements - 1);
for (i = 0; i < n_refinements; i++, step >>= 1) {
@@ -324,7 +315,7 @@
}
#define FEAT_COUNT_TR 3
-#define SEG_COUNT_TR 0.40
+#define SEG_COUNT_TR 48
void av1_compute_feature_segmentation_map(uint8_t *segment_map, int width,
int height, int *inliers,
int num_inliers) {
@@ -349,6 +340,6 @@
// If this motion does not make up a large enough portion of the frame,
// use the unsegmented version of the error metric
- if (seg_count < (width * height * SEG_COUNT_TR))
+ if (seg_count < SEG_COUNT_TR)
memset(segment_map, 1, width * height * sizeof(*segment_map));
}
diff --git a/av1/encoder/global_motion.h b/av1/encoder/global_motion.h
index 4fa3253..cf1d0fd 100644
--- a/av1/encoder/global_motion.h
+++ b/av1/encoder/global_motion.h
@@ -22,7 +22,7 @@
#endif
#define RANSAC_NUM_MOTIONS 1
-#define GM_REFINEMENT_COUNT 5
+#define GM_MAX_REFINEMENT_STEPS 5
#define MAX_DIRECTIONS 2
// The structure holds a valid reference frame type and its temporal distance
@@ -34,9 +34,9 @@
typedef struct {
// Array of structure which holds the global motion parameters for a given
- // motion model. params_by_motion[i] holds the parameters for a given motion
+ // motion model. motion_models[i] holds the parameters for a given motion
// model for the ith ransac motion.
- MotionModel params_by_motion[RANSAC_NUM_MOTIONS];
+ MotionModel motion_models[RANSAC_NUM_MOTIONS];
// Pointer to hold inliers from motion model.
uint8_t *segment_map;
diff --git a/av1/encoder/global_motion_facade.c b/av1/encoder/global_motion_facade.c
index 0df070a..1a00cbb 100644
--- a/av1/encoder/global_motion_facade.c
+++ b/av1/encoder/global_motion_facade.c
@@ -13,10 +13,12 @@
#include "aom_dsp/flow_estimation/corner_detect.h"
#include "aom_dsp/flow_estimation/flow_estimation.h"
+#include "aom_dsp/pyramid.h"
#include "av1/common/warped_motion.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/ethread.h"
#include "av1/encoder/rdopt.h"
+#include "av1/encoder/global_motion_facade.h"
// Highest motion model to search.
#define GLOBAL_TRANS_TYPES_ENC 3
@@ -80,111 +82,90 @@
// different motion models and finds the best.
static AOM_INLINE void compute_global_motion_for_ref_frame(
AV1_COMP *cpi, YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES], int frame,
- int num_src_corners, int *src_corners, unsigned char *src_buffer,
- MotionModel *params_by_motion, uint8_t *segment_map,
- const int segment_map_w, const int segment_map_h,
- const WarpedMotionParams *ref_params) {
+ MotionModel *motion_models, uint8_t *segment_map, const int segment_map_w,
+ const int segment_map_h, const WarpedMotionParams *ref_params) {
ThreadData *const td = &cpi->td;
MACROBLOCK *const x = &td->mb;
AV1_COMMON *const cm = &cpi->common;
MACROBLOCKD *const xd = &x->e_mbd;
int i;
- int src_width = cpi->source->y_width;
- int src_height = cpi->source->y_height;
+ int src_width = cpi->source->y_crop_width;
+ int src_height = cpi->source->y_crop_height;
int src_stride = cpi->source->y_stride;
- // clang-format off
- static const double kIdentityParams[MAX_PARAMDIM - 1] = {
- 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0
- };
- // clang-format on
WarpedMotionParams tmp_wm_params;
const double *params_this_motion;
- int inliers_by_motion[RANSAC_NUM_MOTIONS];
assert(ref_buf[frame] != NULL);
TransformationType model;
+ int bit_depth = cpi->common.seq_params->bit_depth;
+ GlobalMotionMethod global_motion_method = default_global_motion_method;
+ int num_refinements = cpi->sf.gm_sf.num_refinement_steps;
- // TODO(sarahparker, debargha): Explore do_adaptive_gm_estimation = 1
- const int do_adaptive_gm_estimation = 0;
-
- const int ref_frame_dist = get_relative_dist(
- &cm->seq_params->order_hint_info, cm->current_frame.order_hint,
- cm->cur_frame->ref_order_hints[frame - LAST_FRAME]);
- const GlobalMotionEstimationType gm_estimation_type =
- cm->seq_params->order_hint_info.enable_order_hint &&
- abs(ref_frame_dist) <= 2 && do_adaptive_gm_estimation
- ? GLOBAL_MOTION_DISFLOW_BASED
- : GLOBAL_MOTION_FEATURE_BASED;
for (model = ROTZOOM; model < GLOBAL_TRANS_TYPES_ENC; ++model) {
- int64_t best_warp_error = INT64_MAX;
- // Initially set all params to identity.
- for (i = 0; i < RANSAC_NUM_MOTIONS; ++i) {
- memcpy(params_by_motion[i].params, kIdentityParams,
- (MAX_PARAMDIM - 1) * sizeof(*(params_by_motion[i].params)));
- params_by_motion[i].num_inliers = 0;
+ if (!aom_compute_global_motion(model, cpi->source, ref_buf[frame],
+ bit_depth, global_motion_method,
+ motion_models, RANSAC_NUM_MOTIONS)) {
+ continue;
}
- aom_compute_global_motion(model, src_buffer, src_width, src_height,
- src_stride, src_corners, num_src_corners,
- ref_buf[frame], cpi->common.seq_params->bit_depth,
- gm_estimation_type, inliers_by_motion,
- params_by_motion, RANSAC_NUM_MOTIONS);
- int64_t ref_frame_error = 0;
+ int64_t best_ref_frame_error = 0;
+ int64_t best_warp_error = INT64_MAX;
for (i = 0; i < RANSAC_NUM_MOTIONS; ++i) {
- if (inliers_by_motion[i] == 0) continue;
+ if (motion_models[i].num_inliers == 0) continue;
- params_this_motion = params_by_motion[i].params;
+ params_this_motion = motion_models[i].params;
av1_convert_model_to_params(params_this_motion, &tmp_wm_params);
- // Work around a bug in the AV1 specification
+ // Skip models that we won't use (IDENTITY or TRANSLATION)
+ //
+ // For IDENTITY type models, we don't need to evaluate anything because
+ // all the following logic is effectively comparing the estimated model
+ // to an identity model.
//
// For TRANSLATION type global motion models, gm_get_motion_vector() gives
// the wrong motion vector (see comments in that function for details).
// As translation-type models do not give much gain, we can avoid this bug
// by never choosing a TRANSLATION type model
- if (tmp_wm_params.wmtype == TRANSLATION) {
- continue;
- }
+ if (tmp_wm_params.wmtype <= TRANSLATION) continue;
- if (tmp_wm_params.wmtype != IDENTITY) {
- av1_compute_feature_segmentation_map(
- segment_map, segment_map_w, segment_map_h,
- params_by_motion[i].inliers, params_by_motion[i].num_inliers);
+ av1_compute_feature_segmentation_map(
+ segment_map, segment_map_w, segment_map_h, motion_models[i].inliers,
+ motion_models[i].num_inliers);
- ref_frame_error = av1_segmented_frame_error(
- is_cur_buf_hbd(xd), xd->bd, ref_buf[frame]->y_buffer,
- ref_buf[frame]->y_stride, cpi->source->y_buffer, src_width,
- src_height, src_stride, segment_map, segment_map_w);
+ int64_t ref_frame_error = av1_segmented_frame_error(
+ is_cur_buf_hbd(xd), xd->bd, ref_buf[frame]->y_buffer,
+ ref_buf[frame]->y_stride, cpi->source->y_buffer, src_width,
+ src_height, src_stride, segment_map, segment_map_w);
- const int64_t erroradv_threshold =
- calc_erroradv_threshold(ref_frame_error);
+ if (ref_frame_error == 0) continue;
- const int64_t warp_error = av1_refine_integerized_param(
- &tmp_wm_params, tmp_wm_params.wmtype, is_cur_buf_hbd(xd), xd->bd,
- ref_buf[frame]->y_buffer, ref_buf[frame]->y_width,
- ref_buf[frame]->y_height, ref_buf[frame]->y_stride,
- cpi->source->y_buffer, src_width, src_height, src_stride,
- GM_REFINEMENT_COUNT, best_warp_error, segment_map, segment_map_w,
- erroradv_threshold);
+ const int64_t erroradv_threshold =
+ calc_erroradv_threshold(ref_frame_error);
- // av1_refine_integerized_param() can return a TRANSLATION type model
- // even if its input is some other type, so we have to skip those too
- if (tmp_wm_params.wmtype == TRANSLATION) {
- continue;
- }
+ const int64_t warp_error = av1_refine_integerized_param(
+ &tmp_wm_params, tmp_wm_params.wmtype, is_cur_buf_hbd(xd), xd->bd,
+ ref_buf[frame]->y_buffer, ref_buf[frame]->y_crop_width,
+ ref_buf[frame]->y_crop_height, ref_buf[frame]->y_stride,
+ cpi->source->y_buffer, src_width, src_height, src_stride,
+ num_refinements, best_warp_error, segment_map, segment_map_w,
+ erroradv_threshold);
- if (warp_error < best_warp_error) {
- best_warp_error = warp_error;
- // Save the wm_params modified by
- // av1_refine_integerized_param() rather than motion index to
- // avoid rerunning refine() below.
- memcpy(&(cm->global_motion[frame]), &tmp_wm_params,
- sizeof(WarpedMotionParams));
- }
+ // av1_refine_integerized_param() can return a simpler model type than
+ // its input, so re-check model type here
+ if (tmp_wm_params.wmtype <= TRANSLATION) continue;
+
+ if (warp_error < best_warp_error) {
+ best_ref_frame_error = ref_frame_error;
+ best_warp_error = warp_error;
+ // Save the wm_params modified by
+ // av1_refine_integerized_param() rather than motion index to
+ // avoid rerunning refine() below.
+ memcpy(&(cm->global_motion[frame]), &tmp_wm_params,
+ sizeof(WarpedMotionParams));
}
}
- if (cm->global_motion[frame].wmtype <= AFFINE)
- if (!av1_get_shear_params(&cm->global_motion[frame]))
- cm->global_motion[frame] = default_warp_params;
+ assert(cm->global_motion[frame].wmtype <= AFFINE);
+ if (!av1_get_shear_params(&cm->global_motion[frame]))
+ cm->global_motion[frame] = default_warp_params;
#if 0
// We never choose translational models, so this code is disabled
@@ -202,12 +183,15 @@
if (cm->global_motion[frame].wmtype == IDENTITY) continue;
- if (ref_frame_error == 0) continue;
+ // Once we get here, best_ref_frame_error must be > 0. This is because
+ // of the logic above, which skips over any models which have
+ // ref_frame_error == 0
+ assert(best_ref_frame_error > 0);
// If the best error advantage found doesn't meet the threshold for
// this motion type, revert to IDENTITY.
if (!av1_is_enough_erroradvantage(
- (double)best_warp_error / ref_frame_error,
+ (double)best_warp_error / best_ref_frame_error,
gm_get_params_cost(&cm->global_motion[frame], ref_params,
cm->features.allow_high_precision_mv))) {
cm->global_motion[frame] = default_warp_params;
@@ -220,44 +204,37 @@
// Computes global motion for the given reference frame.
void av1_compute_gm_for_valid_ref_frames(
AV1_COMP *cpi, YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES], int frame,
- int num_src_corners, int *src_corners, unsigned char *src_buffer,
- MotionModel *params_by_motion, uint8_t *segment_map, int segment_map_w,
+ MotionModel *motion_models, uint8_t *segment_map, int segment_map_w,
int segment_map_h) {
AV1_COMMON *const cm = &cpi->common;
const WarpedMotionParams *ref_params =
cm->prev_frame ? &cm->prev_frame->global_motion[frame]
: &default_warp_params;
- compute_global_motion_for_ref_frame(
- cpi, ref_buf, frame, num_src_corners, src_corners, src_buffer,
- params_by_motion, segment_map, segment_map_w, segment_map_h, ref_params);
+ compute_global_motion_for_ref_frame(cpi, ref_buf, frame, motion_models,
+ segment_map, segment_map_w, segment_map_h,
+ ref_params);
}
// Loops over valid reference frames and computes global motion estimation.
static AOM_INLINE void compute_global_motion_for_references(
AV1_COMP *cpi, YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES],
FrameDistPair reference_frame[REF_FRAMES - 1], int num_ref_frames,
- int num_src_corners, int *src_corners, unsigned char *src_buffer,
- MotionModel *params_by_motion, uint8_t *segment_map,
- const int segment_map_w, const int segment_map_h) {
- // Computation of frame corners for the source frame will be done already.
- assert(num_src_corners != -1);
+ MotionModel *motion_models, uint8_t *segment_map, const int segment_map_w,
+ const int segment_map_h) {
AV1_COMMON *const cm = &cpi->common;
// Compute global motion w.r.t. reference frames starting from the nearest ref
// frame in a given direction.
for (int frame = 0; frame < num_ref_frames; frame++) {
int ref_frame = reference_frame[frame].frame;
- av1_compute_gm_for_valid_ref_frames(
- cpi, ref_buf, ref_frame, num_src_corners, src_corners, src_buffer,
- params_by_motion, segment_map, segment_map_w, segment_map_h);
+ av1_compute_gm_for_valid_ref_frames(cpi, ref_buf, ref_frame, motion_models,
+ segment_map, segment_map_w,
+ segment_map_h);
// If global motion w.r.t. current ref frame is
// INVALID/TRANSLATION/IDENTITY, skip the evaluation of global motion w.r.t
- // the remaining ref frames in that direction. The below exit is disabled
- // when ref frame distance w.r.t. current frame is zero. E.g.:
- // source_alt_ref_frame w.r.t. ARF frames.
+ // the remaining ref frames in that direction.
if (cpi->sf.gm_sf.prune_ref_frame_for_gm_search &&
- reference_frame[frame].distance != 0 &&
- cm->global_motion[ref_frame].wmtype != ROTZOOM)
+ cm->global_motion[ref_frame].wmtype <= TRANSLATION)
break;
}
}
@@ -306,6 +283,7 @@
case GM_REDUCED_REF_SEARCH_SKIP_L2_L3_ARF2:
return !(frame == LAST2_FRAME || frame == LAST3_FRAME ||
(frame == ALTREF2_FRAME));
+ case GM_SEARCH_CLOSEST_REFS_ONLY: return 1;
case GM_DISABLE_SEARCH: return 0;
default: assert(0);
}
@@ -325,6 +303,7 @@
int ref_pruning_enabled = is_frame_eligible_for_ref_pruning(
gf_group, cpi->sf.inter_sf.selective_ref_frame, 1, cpi->gf_frame_index);
int cur_frame_gm_disabled = 0;
+ int pyr_lvl = cm->cur_frame->pyramid_level;
if (cpi->sf.gm_sf.disable_gm_search_based_on_stats) {
cur_frame_gm_disabled = disable_gm_search_based_on_stats(cpi);
@@ -349,18 +328,25 @@
ref_pruning_enabled &&
prune_ref_by_selective_ref_frame(cpi, NULL, ref_frame,
cm->cur_frame->ref_display_order_hint);
+ int ref_pyr_lvl = buf->pyramid_level;
if (ref_buf[frame]->y_crop_width == cpi->source->y_crop_width &&
ref_buf[frame]->y_crop_height == cpi->source->y_crop_height &&
do_gm_search_logic(&cpi->sf, frame) && !prune_ref_frames &&
- !cur_frame_gm_disabled) {
+ ref_pyr_lvl <= pyr_lvl && !cur_frame_gm_disabled) {
assert(ref_buf[frame] != NULL);
const int relative_frame_dist = av1_encoder_get_relative_dist(
buf->display_order_hint, cm->cur_frame->display_order_hint);
// Populate past and future ref frames.
// reference_frames[0][] indicates past direction and
// reference_frames[1][] indicates future direction.
- if (relative_frame_dist <= 0) {
+ if (relative_frame_dist == 0) {
+ // Skip global motion estimation for frames at the same nominal instant.
+ // This will generally be either a "real" frame coded against a
+ // temporal filtered version, or a higher spatial layer coded against
+ // a lower spatial layer. In either case, the optimal motion model will
+ // be IDENTITY, so we don't need to search explicitly.
+ } else if (relative_frame_dist < 0) {
reference_frames[0][*num_past_ref_frames].distance =
abs(relative_frame_dist);
reference_frames[0][*num_past_ref_frames].frame = frame;
@@ -376,26 +362,26 @@
}
// Deallocates segment_map and inliers.
-static AOM_INLINE void dealloc_global_motion_data(MotionModel *params_by_motion,
+static AOM_INLINE void dealloc_global_motion_data(MotionModel *motion_models,
uint8_t *segment_map) {
aom_free(segment_map);
for (int m = 0; m < RANSAC_NUM_MOTIONS; m++) {
- aom_free(params_by_motion[m].inliers);
+ aom_free(motion_models[m].inliers);
}
}
// Allocates and initializes memory for segment_map and MotionModel.
-static AOM_INLINE bool alloc_global_motion_data(MotionModel *params_by_motion,
+static AOM_INLINE bool alloc_global_motion_data(MotionModel *motion_models,
uint8_t **segment_map,
const int segment_map_w,
const int segment_map_h) {
- av1_zero_array(params_by_motion, RANSAC_NUM_MOTIONS);
+ av1_zero_array(motion_models, RANSAC_NUM_MOTIONS);
for (int m = 0; m < RANSAC_NUM_MOTIONS; m++) {
- params_by_motion[m].inliers =
- aom_malloc(sizeof(*(params_by_motion[m].inliers)) * 2 * MAX_CORNERS);
- if (!params_by_motion[m].inliers) {
- dealloc_global_motion_data(params_by_motion, NULL);
+ motion_models[m].inliers =
+ aom_malloc(sizeof(*(motion_models[m].inliers)) * 2 * MAX_CORNERS);
+ if (!motion_models[m].inliers) {
+ dealloc_global_motion_data(motion_models, NULL);
return false;
}
}
@@ -403,7 +389,7 @@
*segment_map = (uint8_t *)aom_calloc(segment_map_w * segment_map_h,
sizeof(*segment_map));
if (!*segment_map) {
- dealloc_global_motion_data(params_by_motion, NULL);
+ dealloc_global_motion_data(motion_models, NULL);
return false;
}
return true;
@@ -414,18 +400,10 @@
GlobalMotionInfo *const gm_info = &cpi->gm_info;
YV12_BUFFER_CONFIG *source = cpi->source;
- gm_info->src_buffer = source->y_buffer;
- if (source->flags & YV12_FLAG_HIGHBITDEPTH) {
- // The source buffer is 16-bit, so we need to convert to 8 bits for the
- // following code. We cache the result until the source frame is released.
- gm_info->src_buffer =
- av1_downconvert_frame(source, cpi->common.seq_params->bit_depth);
- }
-
gm_info->segment_map_w =
- (source->y_width + WARP_ERROR_BLOCK) >> WARP_ERROR_BLOCK_LOG;
+ (source->y_crop_width + WARP_ERROR_BLOCK - 1) >> WARP_ERROR_BLOCK_LOG;
gm_info->segment_map_h =
- (source->y_height + WARP_ERROR_BLOCK) >> WARP_ERROR_BLOCK_LOG;
+ (source->y_crop_height + WARP_ERROR_BLOCK - 1) >> WARP_ERROR_BLOCK_LOG;
memset(gm_info->reference_frames, -1,
sizeof(gm_info->reference_frames[0][0]) * MAX_DIRECTIONS *
@@ -445,24 +423,27 @@
qsort(gm_info->reference_frames[1], gm_info->num_ref_frames[1],
sizeof(gm_info->reference_frames[1][0]), compare_distance);
- gm_info->num_src_corners = -1;
- // If at least one valid reference frame exists in past/future directions,
- // compute interest points of source frame using FAST features.
- if (gm_info->num_ref_frames[0] > 0 || gm_info->num_ref_frames[1] > 0) {
- gm_info->num_src_corners = av1_fast_corner_detect(
- gm_info->src_buffer, source->y_width, source->y_height,
- source->y_stride, gm_info->src_corners, MAX_CORNERS);
+ if (cpi->sf.gm_sf.gm_search_type == GM_SEARCH_CLOSEST_REFS_ONLY) {
+ // Filter down to the nearest two ref frames.
+ // Prefer one past and one future ref over two past refs, even if
+ // the second past ref is closer
+ if (gm_info->num_ref_frames[1] > 0) {
+ gm_info->num_ref_frames[0] = AOMMIN(gm_info->num_ref_frames[0], 1);
+ gm_info->num_ref_frames[1] = AOMMIN(gm_info->num_ref_frames[1], 1);
+ } else {
+ gm_info->num_ref_frames[0] = AOMMIN(gm_info->num_ref_frames[0], 2);
+ }
}
}
// Computes global motion w.r.t. valid reference frames.
static AOM_INLINE void global_motion_estimation(AV1_COMP *cpi) {
GlobalMotionInfo *const gm_info = &cpi->gm_info;
- MotionModel params_by_motion[RANSAC_NUM_MOTIONS];
+ MotionModel motion_models[RANSAC_NUM_MOTIONS];
uint8_t *segment_map = NULL;
- alloc_global_motion_data(params_by_motion, &segment_map,
- gm_info->segment_map_w, gm_info->segment_map_h);
+ alloc_global_motion_data(motion_models, &segment_map, gm_info->segment_map_w,
+ gm_info->segment_map_h);
// Compute global motion w.r.t. past reference frames and future reference
// frames
@@ -470,12 +451,11 @@
if (gm_info->num_ref_frames[dir] > 0)
compute_global_motion_for_references(
cpi, gm_info->ref_buf, gm_info->reference_frames[dir],
- gm_info->num_ref_frames[dir], gm_info->num_src_corners,
- gm_info->src_corners, gm_info->src_buffer, params_by_motion,
- segment_map, gm_info->segment_map_w, gm_info->segment_map_h);
+ gm_info->num_ref_frames[dir], motion_models, segment_map,
+ gm_info->segment_map_w, gm_info->segment_map_h);
}
- dealloc_global_motion_data(params_by_motion, segment_map);
+ dealloc_global_motion_data(motion_models, segment_map);
}
// Global motion estimation for the current frame is computed. This computation
@@ -498,7 +478,6 @@
}
if (cpi->common.current_frame.frame_type == INTER_FRAME && cpi->source &&
- cpi->superres_mode == AOM_SUPERRES_NONE &&
cpi->oxcf.tool_cfg.enable_global_motion && !gm_info->search_done) {
setup_global_motion_info_params(cpi);
if (cpi->mt_info.num_workers > 1)
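The segment map sizing fix above is a standard ceiling division: the old
expression (w + WARP_ERROR_BLOCK) >> WARP_ERROR_BLOCK_LOG over-allocates one
extra block whenever the width is an exact multiple of the block size, while
(w + WARP_ERROR_BLOCK - 1) >> WARP_ERROR_BLOCK_LOG yields exactly
ceil(w / WARP_ERROR_BLOCK). A minimal sketch, with a block size of 32 assumed
purely for illustration (the real constants live in libaom's headers):

#include <stdio.h>

#define BLOCK 32    /* stand-in for WARP_ERROR_BLOCK (assumed value) */
#define BLOCK_LOG 5 /* stand-in for WARP_ERROR_BLOCK_LOG */

/* Old form: adds a full block, so 64 -> 3 blocks instead of 2. */
static int blocks_old(int w) { return (w + BLOCK) >> BLOCK_LOG; }
/* New form: ceil(w / BLOCK), so 64 -> 2 blocks, 65 -> 3 blocks. */
static int blocks_new(int w) { return (w + BLOCK - 1) >> BLOCK_LOG; }

int main(void) {
  printf("%d %d\n", blocks_old(64), blocks_new(64)); /* prints: 3 2 */
  printf("%d %d\n", blocks_old(65), blocks_new(65)); /* prints: 3 3 */
  return 0;
}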
diff --git a/av1/encoder/global_motion_facade.h b/av1/encoder/global_motion_facade.h
index 52df19d..dfdedf7 100644
--- a/av1/encoder/global_motion_facade.h
+++ b/av1/encoder/global_motion_facade.h
@@ -19,9 +19,8 @@
struct AV1_COMP;
void av1_compute_gm_for_valid_ref_frames(
- struct AV1_COMP *cpi, YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES], int frame,
- int num_src_corners, int *src_corners, unsigned char *src_buffer,
- MotionModel *params_by_motion, uint8_t *segment_map, int segment_map_w,
+ AV1_COMP *cpi, YV12_BUFFER_CONFIG *ref_buf[REF_FRAMES], int frame,
+ MotionModel *motion_models, uint8_t *segment_map, int segment_map_w,
int segment_map_h);
void av1_compute_global_motion_facade(struct AV1_COMP *cpi);
#ifdef __cplusplus
diff --git a/av1/encoder/gop_structure.c b/av1/encoder/gop_structure.c
index e0208c9..5078098 100644
--- a/av1/encoder/gop_structure.c
+++ b/av1/encoder/gop_structure.c
@@ -84,7 +84,7 @@
set_frame_parallel_level(&gf_group->frame_parallel_level[*frame_ind],
parallel_frame_count, max_parallel_frames);
// Set LF_UPDATE frames as non-reference frames.
- gf_group->is_frame_non_ref[*frame_ind] = 1;
+ gf_group->is_frame_non_ref[*frame_ind] = true;
}
set_src_offset(gf_group, first_frame_index, *cur_frame_idx, *frame_ind);
@@ -437,7 +437,7 @@
RATE_CONTROL *rc, FRAME_INFO *frame_info, int start, int end,
int *cur_frame_idx, int *frame_ind, int *parallel_frame_count,
int max_parallel_frames, int do_frame_parallel_encode,
- int *first_frame_index, int layer_depth) {
+ int *first_frame_index, int *cur_disp_idx, int layer_depth) {
const int num_frames_to_process = end - start;
// Either we are at the last level of the pyramid, or we don't have enough
@@ -449,6 +449,7 @@
gf_group->update_type[*frame_ind] = LF_UPDATE;
gf_group->arf_src_offset[*frame_ind] = 0;
gf_group->cur_frame_idx[*frame_ind] = *cur_frame_idx;
+ gf_group->display_idx[*frame_ind] = *cur_disp_idx;
gf_group->layer_depth[*frame_ind] = MAX_ARF_LAYERS;
gf_group->arf_boost[*frame_ind] =
av1_calc_arf_boost(twopass, twopass_frame, p_rc, frame_info, start,
@@ -462,11 +463,12 @@
set_frame_parallel_level(&gf_group->frame_parallel_level[*frame_ind],
parallel_frame_count, max_parallel_frames);
// Set LF_UPDATE frames as non-reference frames.
- gf_group->is_frame_non_ref[*frame_ind] = 1;
+ gf_group->is_frame_non_ref[*frame_ind] = true;
}
set_src_offset(gf_group, first_frame_index, *cur_frame_idx, *frame_ind);
++(*frame_ind);
++(*cur_frame_idx);
+ ++(*cur_disp_idx);
++start;
}
} else {
@@ -476,6 +478,8 @@
gf_group->update_type[*frame_ind] = INTNL_ARF_UPDATE;
gf_group->arf_src_offset[*frame_ind] = m - start;
gf_group->cur_frame_idx[*frame_ind] = *cur_frame_idx;
+ gf_group->display_idx[*frame_ind] =
+ *cur_disp_idx + gf_group->arf_src_offset[*frame_ind];
gf_group->layer_depth[*frame_ind] = layer_depth;
gf_group->frame_type[*frame_ind] = INTER_FRAME;
gf_group->refbuf_state[*frame_ind] = REFBUF_UPDATE;
@@ -499,15 +503,17 @@
++(*frame_ind);
// Frames displayed before this internal ARF.
- set_multi_layer_params(
- twopass, twopass_frame, gf_group, p_rc, rc, frame_info, start, m,
- cur_frame_idx, frame_ind, parallel_frame_count, max_parallel_frames,
- do_frame_parallel_encode, first_frame_index, layer_depth + 1);
+ set_multi_layer_params(twopass, twopass_frame, gf_group, p_rc, rc,
+ frame_info, start, m, cur_frame_idx, frame_ind,
+ parallel_frame_count, max_parallel_frames,
+ do_frame_parallel_encode, first_frame_index,
+ cur_disp_idx, layer_depth + 1);
// Overlay for internal ARF.
gf_group->update_type[*frame_ind] = INTNL_OVERLAY_UPDATE;
gf_group->arf_src_offset[*frame_ind] = 0;
gf_group->cur_frame_idx[*frame_ind] = *cur_frame_idx;
+ gf_group->display_idx[*frame_ind] = *cur_disp_idx;
gf_group->arf_boost[*frame_ind] = 0;
gf_group->layer_depth[*frame_ind] = layer_depth;
gf_group->frame_type[*frame_ind] = INTER_FRAME;
@@ -516,12 +522,14 @@
set_src_offset(gf_group, first_frame_index, *cur_frame_idx, *frame_ind);
++(*frame_ind);
++(*cur_frame_idx);
+ ++(*cur_disp_idx);
// Frames displayed after this internal ARF.
- set_multi_layer_params(
- twopass, twopass_frame, gf_group, p_rc, rc, frame_info, m + 1, end,
- cur_frame_idx, frame_ind, parallel_frame_count, max_parallel_frames,
- do_frame_parallel_encode, first_frame_index, layer_depth + 1);
+ set_multi_layer_params(twopass, twopass_frame, gf_group, p_rc, rc,
+ frame_info, m + 1, end, cur_frame_idx, frame_ind,
+ parallel_frame_count, max_parallel_frames,
+ do_frame_parallel_encode, first_frame_index,
+ cur_disp_idx, layer_depth + 1);
}
}
@@ -540,22 +548,19 @@
? 0
: cpi->common.current_frame.frame_number;
- // Initialize gf_group->frame_parallel_level and gf_group->is_frame_non_ref to
- // 0.
- memset(
- gf_group->frame_parallel_level, 0,
- sizeof(gf_group->frame_parallel_level[0]) * MAX_STATIC_GF_GROUP_LENGTH);
- memset(gf_group->is_frame_non_ref, 0,
- sizeof(gf_group->is_frame_non_ref[0]) * MAX_STATIC_GF_GROUP_LENGTH);
- memset(gf_group->src_offset, 0,
- sizeof(gf_group->src_offset[0]) * MAX_STATIC_GF_GROUP_LENGTH);
+ // Initialize gf_group->frame_parallel_level, gf_group->is_frame_non_ref,
+ // gf_group->src_offset and gf_group->is_frame_dropped to 0.
+ memset(gf_group->frame_parallel_level, 0,
+ sizeof(gf_group->frame_parallel_level));
+ memset(gf_group->is_frame_non_ref, 0, sizeof(gf_group->is_frame_non_ref));
+ memset(gf_group->src_offset, 0, sizeof(gf_group->src_offset));
+ memset(gf_group->is_frame_dropped, 0, sizeof(gf_group->is_frame_dropped));
// Initialize gf_group->skip_frame_refresh and gf_group->skip_frame_as_ref
// with INVALID_IDX.
memset(gf_group->skip_frame_refresh, INVALID_IDX,
- sizeof(gf_group->skip_frame_refresh[0][0]) *
- MAX_STATIC_GF_GROUP_LENGTH * REF_FRAMES);
+ sizeof(gf_group->skip_frame_refresh));
memset(gf_group->skip_frame_as_ref, INVALID_IDX,
- sizeof(gf_group->skip_frame_as_ref[0]) * MAX_STATIC_GF_GROUP_LENGTH);
+ sizeof(gf_group->skip_frame_as_ref));
int kf_decomp = cpi->oxcf.kf_cfg.enable_keyframe_filtering > 1;
// This is a patch that fixes https://crbug.com/aomedia/3163
@@ -721,11 +726,12 @@
// Rest of the frames.
if (!is_multi_layer_configured)
- set_multi_layer_params(
- twopass, &cpi->twopass_frame, gf_group, p_rc, rc, frame_info,
- cur_frame_index, gf_interval, &cur_frame_index, &frame_index,
- ¶llel_frame_count, cpi->ppi->num_fp_contexts,
- do_frame_parallel_encode, &first_frame_index, use_altref + 1);
+ set_multi_layer_params(twopass, &cpi->twopass_frame, gf_group, p_rc, rc,
+ frame_info, cur_frame_index, gf_interval,
+ &cur_frame_index, &frame_index,
+ ¶llel_frame_count, cpi->ppi->num_fp_contexts,
+ do_frame_parallel_encode, &first_frame_index,
+ &cur_disp_index, use_altref + 1);
if (use_altref) {
gf_group->update_type[frame_index] = OVERLAY_UPDATE;
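The new cur_disp_idx parameter threads a running display-order counter through
the recursive GOP construction: each leaf frame consumes one display slot,
while an internal ARF is stamped with *cur_disp_idx + arf_src_offset, i.e. the
slot of the source frame it will eventually be displayed at, and its overlay
later receives that same slot. A stripped-down sketch of the bookkeeping, with
hypothetical names and most parameters of set_multi_layer_params omitted:

/* Hypothetical, minimal model of the display-index bookkeeping in
 * set_multi_layer_params(); update types and boosts are elided. */
static void assign_display_idx(int start, int end, int *disp_idx,
                               int *display_idx_out, int *frame_ind) {
  if (end - start <= 1) { /* leaf frames: one display slot each */
    for (int i = start; i < end; i++) {
      display_idx_out[(*frame_ind)++] = (*disp_idx)++;
    }
    return;
  }
  const int m = (start + end - 1) / 2; /* internal ARF position */
  /* The ARF is coded now but displayed at its source position. */
  display_idx_out[(*frame_ind)++] = *disp_idx + (m - start);
  assign_display_idx(start, m, disp_idx, display_idx_out, frame_ind);
  /* The overlay lands on the ARF's display slot. */
  display_idx_out[(*frame_ind)++] = (*disp_idx)++;
  assign_display_idx(m + 1, end, disp_idx, display_idx_out, frame_ind);
}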
diff --git a/av1/encoder/hybrid_fwd_txfm.c b/av1/encoder/hybrid_fwd_txfm.c
index eda5ddf..4c2f8d0 100644
--- a/av1/encoder/hybrid_fwd_txfm.c
+++ b/av1/encoder/hybrid_fwd_txfm.c
@@ -18,7 +18,9 @@
#include "av1/encoder/hybrid_fwd_txfm.h"
/* 4-point reversible, orthonormal Walsh-Hadamard in 3.5 adds, 0.5 shifts per
- pixel. */
+ pixel.
+ Shared for both high and low bit depth.
+ */
void av1_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride) {
int i;
tran_high_t a1, b1, c1, d1, e1;
@@ -40,21 +42,21 @@
a1 -= c1;
d1 += b1;
op[0] = (tran_low_t)a1;
- op[4] = (tran_low_t)c1;
- op[8] = (tran_low_t)d1;
- op[12] = (tran_low_t)b1;
+ op[1] = (tran_low_t)c1;
+ op[2] = (tran_low_t)d1;
+ op[3] = (tran_low_t)b1;
ip_pass0++;
- op++;
+ op += 4;
}
ip = output;
op = output;
for (i = 0; i < 4; i++) {
- a1 = ip[0];
- b1 = ip[1];
- c1 = ip[2];
- d1 = ip[3];
+ a1 = ip[4 * 0];
+ b1 = ip[4 * 1];
+ c1 = ip[4 * 2];
+ d1 = ip[4 * 3];
a1 += b1;
d1 -= c1;
@@ -63,21 +65,16 @@
c1 = e1 - c1;
a1 -= c1;
d1 += b1;
- op[0] = (tran_low_t)(a1 * UNIT_QUANT_FACTOR);
- op[1] = (tran_low_t)(c1 * UNIT_QUANT_FACTOR);
- op[2] = (tran_low_t)(d1 * UNIT_QUANT_FACTOR);
- op[3] = (tran_low_t)(b1 * UNIT_QUANT_FACTOR);
+ op[4 * 0] = (tran_low_t)(a1 * UNIT_QUANT_FACTOR);
+ op[4 * 1] = (tran_low_t)(c1 * UNIT_QUANT_FACTOR);
+ op[4 * 2] = (tran_low_t)(d1 * UNIT_QUANT_FACTOR);
+ op[4 * 3] = (tran_low_t)(b1 * UNIT_QUANT_FACTOR);
- ip += 4;
- op += 4;
+ ip++;
+ op++;
}
}
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output,
- int stride) {
- av1_fwht4x4_c(input, output, stride);
-}
-
static void highbd_fwd_txfm_4x4(const int16_t *src_diff, tran_low_t *coeff,
int diff_stride, TxfmParam *txfm_param) {
int32_t *dst_coeff = (int32_t *)coeff;
@@ -85,7 +82,7 @@
const int bd = txfm_param->bd;
if (txfm_param->lossless) {
assert(tx_type == DCT_DCT);
- av1_highbd_fwht4x4(src_diff, coeff, diff_stride);
+ av1_fwht4x4(src_diff, coeff, diff_stride);
return;
}
av1_fwd_txfm2d_4x4(src_diff, dst_coeff, diff_stride, tx_type, bd);
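The rewrite of av1_fwht4x4_c changes only where the intermediate results are
stored (the first pass now writes each transformed column contiguously and the
second pass reads with a stride of 4); the butterfly arithmetic is untouched,
which is what lets the high bitdepth wrapper av1_highbd_fwht4x4_c be deleted
so both paths share one implementation. For reference, a sketch of the 1-D
butterfly together with its exact integer inverse, which is why the comment
can call the transform reversible (helper names are illustrative; the real
code applies this per column, then per row, with UNIT_QUANT_FACTOR scaling):

/* Forward 4-point Walsh-Hadamard butterfly, as in av1_fwht4x4_c. */
static void fwht4(const int in[4], int out[4]) {
  int a = in[0], b = in[1], c = in[2], d = in[3];
  a += b;
  d -= c;
  const int e = (a - d) >> 1;
  b = e - b;
  c = e - c;
  a -= c;
  d += b;
  out[0] = a;
  out[1] = c;
  out[2] = d;
  out[3] = b;
}

/* Inverse: the same add/subtract network run backwards recovers the
 * input exactly in integer arithmetic, hence "reversible". */
static void iwht4(const int in[4], int out[4]) {
  int a = in[0], c = in[1], d = in[2], b = in[3];
  a += c;
  d -= b;
  const int e = (a - d) >> 1;
  b = e - b;
  c = e - c;
  out[0] = a - b;
  out[1] = b;
  out[2] = c;
  out[3] = d + c;
}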
diff --git a/av1/encoder/interp_search.c b/av1/encoder/interp_search.c
index 2b7eb91..247fa3e 100644
--- a/av1/encoder/interp_search.c
+++ b/av1/encoder/interp_search.c
@@ -682,6 +682,7 @@
*rd = args->interp_filter_stats[match_found_idx].rd;
x->pred_sse[ref_frame] =
args->interp_filter_stats[match_found_idx].pred_sse;
+ *skip_build_pred = 0;
return 0;
}
diff --git a/av1/encoder/interp_search.h b/av1/encoder/interp_search.h
index 8eba483..bce494e 100644
--- a/av1/encoder/interp_search.h
+++ b/av1/encoder/interp_search.h
@@ -126,6 +126,11 @@
FULLPEL_MV start_mv_stack[(MAX_REF_MV_SEARCH - 1) * 2];
/*!
+ * Stack to store ref_mv_idx of NEWMV mode.
+ */
+ uint8_t ref_mv_idx_stack[(MAX_REF_MV_SEARCH - 1) * 2];
+
+ /*!
* Count of mvs in start mv stack.
*/
int start_mv_cnt;
diff --git a/av1/encoder/intra_mode_search.c b/av1/encoder/intra_mode_search.c
index d863910..3b5dd75 100644
--- a/av1/encoder/intra_mode_search.c
+++ b/av1/encoder/intra_mode_search.c
@@ -10,6 +10,7 @@
*/
#include "av1/common/av1_common_int.h"
+#include "av1/common/cfl.h"
#include "av1/common/reconintra.h"
#include "av1/encoder/intra_mode_search.h"
@@ -149,7 +150,7 @@
x->plane[0].src.buf + i * x->plane[0].src.stride + j,
x->plane[0].src.stride, is_hbd);
block_4x4_var_info->var = src_var;
- log_src_var = log(1.0 + src_var / 16.0);
+ log_src_var = log1p(src_var / 16.0);
block_4x4_var_info->log_var = log_src_var;
} else {
// When source variance is already calculated and available for
@@ -157,7 +158,7 @@
// available, then retrieve from buffer. Else, calculate the same and
// store to the buffer.
if (log_src_var < 0) {
- log_src_var = log(1.0 + src_var / 16.0);
+ log_src_var = log1p(src_var / 16.0);
block_4x4_var_info->log_var = log_src_var;
}
}
@@ -167,7 +168,7 @@
cpi->ppi->fn_ptr[BLOCK_4X4].vf,
xd->plane[0].dst.buf + i * xd->plane[0].dst.stride + j,
xd->plane[0].dst.stride, is_hbd);
- *avg_log_recon_variance += log(1.0 + recon_var / 16.0);
+ *avg_log_recon_variance += log1p(recon_var / 16.0);
}
}
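The log(1.0 + x) to log1p(x) substitutions in this hunk are a
numerical-accuracy fix rather than a behavior change: for variances much
smaller than 1, adding 1.0 first discards most of x's significant bits before
the logarithm is taken, whereas log1p evaluates log(1 + x) accurately near
zero. A quick illustration:

#include <math.h>
#include <stdio.h>

int main(void) {
  const double x = 1e-16; /* e.g. a tiny src_var / 16.0 */
  /* 1.0 + 1e-16 rounds to exactly 1.0 in double precision... */
  printf("log(1+x) = %.17g\n", log(1.0 + x)); /* 0 */
  /* ...but log1p keeps the leading-order term, log1p(x) ~= x. */
  printf("log1p(x) = %.17g\n", log1p(x));     /* ~1e-16 */
  return 0;
}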
@@ -640,6 +641,12 @@
return est_best_cfl_idx;
}
+static AOM_INLINE void set_invalid_cfl_parameters(
+ uint8_t *best_cfl_alpha_idx, int8_t *best_cfl_alpha_signs) {
+ *best_cfl_alpha_idx = 0;
+ *best_cfl_alpha_signs = 0;
+}
+
static void cfl_pick_plane_rd(const AV1_COMP *const cpi, MACROBLOCK *x,
int plane, TX_SIZE tx_size, int cfl_search_range,
RD_STATS cfl_rd_arr[CFL_MAGS_SIZE],
@@ -717,28 +724,44 @@
av1_invalid_rd_stats(best_rd_stats);
// As the dc pred data is the same for different values of alpha, enable the
- // caching of dc pred data.
- xd->cfl.use_dc_pred_cache = 1;
+ // caching of dc pred data. Call clear_cfl_dc_pred_cache_flags() before
+ // returning to avoid the unintentional usage of cached dc pred data.
+ xd->cfl.use_dc_pred_cache = true;
// Evaluate alpha parameter of each chroma plane.
est_best_cfl_idx_u =
cfl_pick_plane_parameter(cpi, x, 1, tx_size, cfl_search_range);
est_best_cfl_idx_v =
cfl_pick_plane_parameter(cpi, x, 2, tx_size, cfl_search_range);
- // For cfl_search_range=1, further refinement of alpha is not enabled. Hence
- // CfL index=0 for both the chroma planes implies invalid CfL mode.
- if (cfl_search_range == 1 && est_best_cfl_idx_u == CFL_INDEX_ZERO &&
- est_best_cfl_idx_v == CFL_INDEX_ZERO) {
- // Set invalid CfL parameters here as CfL mode is invalid.
- *best_cfl_alpha_idx = 0;
- *best_cfl_alpha_signs = 0;
+ if (cfl_search_range == 1) {
+ // For cfl_search_range=1, further refinement of alpha is not enabled. Hence
+ // CfL index=0 for both the chroma planes implies invalid CfL mode.
+ if (est_best_cfl_idx_u == CFL_INDEX_ZERO &&
+ est_best_cfl_idx_v == CFL_INDEX_ZERO) {
+ set_invalid_cfl_parameters(best_cfl_alpha_idx, best_cfl_alpha_signs);
+ clear_cfl_dc_pred_cache_flags(&xd->cfl);
+ return 0;
+ }
- // Clear the following flags to avoid the unintentional usage of cached dc
- // pred data.
- xd->cfl.use_dc_pred_cache = 0;
- xd->cfl.dc_pred_is_cached[0] = 0;
- xd->cfl.dc_pred_is_cached[1] = 0;
- return 0;
+ int cfl_alpha_u, cfl_alpha_v;
+ CFL_SIGN_TYPE cfl_sign_u, cfl_sign_v;
+ const MB_MODE_INFO *mbmi = xd->mi[0];
+ cfl_idx_to_sign_and_alpha(est_best_cfl_idx_u, &cfl_sign_u, &cfl_alpha_u);
+ cfl_idx_to_sign_and_alpha(est_best_cfl_idx_v, &cfl_sign_v, &cfl_alpha_v);
+ const int joint_sign = cfl_sign_u * CFL_SIGNS + cfl_sign_v - 1;
+ // Compute alpha and mode signaling rate.
+ const int rate_overhead =
+ mode_costs->cfl_cost[joint_sign][CFL_PRED_U][cfl_alpha_u] +
+ mode_costs->cfl_cost[joint_sign][CFL_PRED_V][cfl_alpha_v] +
+ mode_costs
+ ->intra_uv_mode_cost[is_cfl_allowed(xd)][mbmi->mode][UV_CFL_PRED];
+ // Skip the CfL mode evaluation if the RD cost derived using the rate needed
+ // to signal the CfL mode and alpha parameter exceeds the ref_best_rd.
+ if (RDCOST(x->rdmult, rate_overhead, 0) > ref_best_rd) {
+ set_invalid_cfl_parameters(best_cfl_alpha_idx, best_cfl_alpha_signs);
+ clear_cfl_dc_pred_cache_flags(&xd->cfl);
+ return 0;
+ }
}
// Compute the rd cost of each chroma plane using the alpha parameters which
@@ -748,11 +771,7 @@
cfl_pick_plane_rd(cpi, x, 2, tx_size, cfl_search_range, cfl_rd_arr_v,
est_best_cfl_idx_v);
- // Clear the following flags to avoid the unintentional usage of cached dc
- // pred data.
- xd->cfl.use_dc_pred_cache = 0;
- xd->cfl.dc_pred_is_cached[0] = 0;
- xd->cfl.dc_pred_is_cached[1] = 0;
+ clear_cfl_dc_pred_cache_flags(&xd->cfl);
for (int ui = 0; ui < CFL_MAGS_SIZE; ++ui) {
if (cfl_rd_arr_u[ui].rate == INT_MAX) continue;
@@ -789,8 +808,7 @@
av1_invalid_rd_stats(best_rd_stats);
// Set invalid CFL parameters here since the rdcost is not better than
// ref_best_rd.
- *best_cfl_alpha_idx = 0;
- *best_cfl_alpha_signs = 0;
+ set_invalid_cfl_parameters(best_cfl_alpha_idx, best_cfl_alpha_signs);
return 0;
}
return 1;
@@ -850,12 +868,20 @@
}
IntraModeSearchState intra_search_state;
init_intra_mode_search_state(&intra_search_state);
+ const CFL_ALLOWED_TYPE cfl_allowed = is_cfl_allowed(xd);
// Search through all non-palette modes.
for (int mode_idx = 0; mode_idx < UV_INTRA_MODES; ++mode_idx) {
int this_rate;
RD_STATS tokenonly_rd_stats;
UV_PREDICTION_MODE mode = uv_rd_search_mode_order[mode_idx];
+
+ // Skip the current mode evaluation if the RD cost derived using the mode
+ // signaling rate exceeds the best_rd so far.
+ const int mode_rate =
+ mode_costs->intra_uv_mode_cost[cfl_allowed][mbmi->mode][mode];
+ if (RDCOST(x->rdmult, mode_rate, 0) > best_rd) continue;
+
const int is_diagonal_mode = av1_is_diagonal_mode(get_uv_mode(mode));
const int is_directional_mode = av1_is_directional_mode(get_uv_mode(mode));
@@ -885,7 +911,7 @@
const SPEED_FEATURES *sf = &cpi->sf;
mbmi->angle_delta[PLANE_TYPE_UV] = 0;
if (mode == UV_CFL_PRED) {
- if (!is_cfl_allowed(xd) || !intra_mode_cfg->enable_cfl_intra) continue;
+ if (!cfl_allowed || !intra_mode_cfg->enable_cfl_intra) continue;
assert(!is_directional_mode);
const TX_SIZE uv_tx_size = av1_get_tx_size(AOM_PLANE_U, xd);
if (!cfl_rd_pick_alpha(x, cpi, uv_tx_size, best_rd,
@@ -916,7 +942,7 @@
// Search through angle delta
const int rate_overhead =
- mode_costs->intra_uv_mode_cost[is_cfl_allowed(xd)][mbmi->mode][mode];
+ mode_costs->intra_uv_mode_cost[cfl_allowed][mbmi->mode][mode];
if (!rd_pick_intra_angle_sbuv(cpi, x, bsize, rate_overhead, best_rd,
&this_rate, &tokenonly_rd_stats))
continue;
@@ -932,7 +958,7 @@
}
}
const int mode_cost =
- mode_costs->intra_uv_mode_cost[is_cfl_allowed(xd)][mbmi->mode][mode];
+ mode_costs->intra_uv_mode_cost[cfl_allowed][mbmi->mode][mode];
this_rate = tokenonly_rd_stats.rate +
intra_mode_info_cost_uv(cpi, x, mbmi, bsize, mode_cost);
this_rd = RDCOST(x->rdmult, this_rate, tokenonly_rd_stats.dist);
@@ -956,8 +982,7 @@
uint8_t *best_palette_color_map = x->palette_buffer->best_palette_color_map;
av1_rd_pick_palette_intra_sbuv(
cpi, x,
- mode_costs
- ->intra_uv_mode_cost[is_cfl_allowed(xd)][mbmi->mode][UV_DC_PRED],
+ mode_costs->intra_uv_mode_cost[cfl_allowed][mbmi->mode][UV_DC_PRED],
best_palette_color_map, &best_mbmi, &best_rd, rate, rate_tokenonly,
distortion, skippable);
}
@@ -1143,9 +1168,13 @@
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mbmi = xd->mi[0];
RD_STATS rd_stats;
- // In order to improve txfm search avoid rd based breakouts during winner
- // mode evaluation. Hence passing ref_best_rd as a maximum value
- av1_pick_uniform_tx_size_type_yrd(cpi, x, &rd_stats, bsize, INT64_MAX);
+ // In order to improve txfm search, avoid rd based breakouts during winner
+ // mode evaluation. Hence passing ref_best_rd as INT64_MAX by default when the
+ // speed feature use_rd_based_breakout_for_intra_tx_search is disabled.
+ int64_t ref_best_rd = cpi->sf.tx_sf.use_rd_based_breakout_for_intra_tx_search
+ ? *best_rd
+ : INT64_MAX;
+ av1_pick_uniform_tx_size_type_yrd(cpi, x, &rd_stats, bsize, ref_best_rd);
if (rd_stats.rate == INT_MAX) return 0;
int this_rate_tokenonly = rd_stats.rate;
if (!xd->lossless[mbmi->segment_id] && block_signals_txsize(mbmi->bsize)) {
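Both early exits added in this file lean on the same monotonicity argument: RD
cost never decreases as rate grows, so if the rate needed merely to signal a
mode, taken with zero distortion, already pushes the cost past the best found
so far, the full evaluation cannot win. A sketch of the predicate, where
rdcost is a simplified stand-in for libaom's RDCOST macro (the real macro
applies fixed-point shifts to both terms):

#include <stdint.h>

/* Simplified stand-in for RDCOST(rdmult, rate, dist). */
static int64_t rdcost(int rdmult, int rate, int64_t dist) {
  return (int64_t)rate * rdmult + dist;
}

/* Returns 1 if a mode whose signaling alone costs `mode_rate` bits can
 * be pruned against the current best RD cost: even with dist == 0 its
 * total cost is already worse, and distortion can only add to it. */
static int prune_mode_by_signaling_rate(int rdmult, int mode_rate,
                                        int64_t best_rd) {
  return rdcost(rdmult, mode_rate, 0) > best_rd;
}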
diff --git a/av1/encoder/k_means_template.h b/av1/encoder/k_means_template.h
index 31ffdcf..4be2038 100644
--- a/av1/encoder/k_means_template.h
+++ b/av1/encoder/k_means_template.h
@@ -123,6 +123,10 @@
l = (l == 1) ? 0 : 1;
RENAME(calc_centroids)(data, meta_centroids[l], meta_indices[prev_l], n, k);
+ if (!memcmp(meta_centroids[l], meta_centroids[prev_l],
+ sizeof(centroids[0]) * k * AV1_K_MEANS_DIM)) {
+ break;
+ }
#if AV1_K_MEANS_DIM == 1
av1_calc_indices_dim1(data, meta_centroids[l], meta_indices[l], &this_dist,
n, k);
@@ -135,9 +139,6 @@
best_l = prev_l;
break;
}
- if (!memcmp(meta_centroids[l], meta_centroids[prev_l],
- sizeof(centroids[0]) * k * AV1_K_MEANS_DIM))
- break;
}
if (i == max_itr) best_l = l;
if (best_l != 0) {
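Hoisting the memcmp above the index computation means convergence is now
detected immediately after calc_centroids: if the new centroids are
bit-identical to the previous ones, the assignments and distortion cannot
change, so the extra indices pass was pure waste. A self-contained 1-D sketch
of the loop shape, with hypothetical helper names in place of the RENAME()'d
template functions:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static void calc_indices_1d(const int *data, const int *centroids,
                            uint8_t *indices, int n, int k) {
  for (int i = 0; i < n; i++) {
    int best = 0, best_d = abs(data[i] - centroids[0]);
    for (int j = 1; j < k; j++) {
      const int d = abs(data[i] - centroids[j]);
      if (d < best_d) best_d = d, best = j;
    }
    indices[i] = (uint8_t)best;
  }
}

static void calc_centroids_1d(const int *data, int *centroids,
                              const uint8_t *indices, int n, int k) {
  for (int j = 0; j < k; j++) {
    int64_t sum = 0, cnt = 0;
    for (int i = 0; i < n; i++)
      if (indices[i] == j) sum += data[i], cnt++;
    if (cnt) centroids[j] = (int)(sum / cnt);
  }
}

static void kmeans_1d(const int *data, int *centroids, uint8_t *indices,
                      int n, int k, int max_itr) {
  int prev[16]; /* assumes k <= 16 for this sketch */
  calc_indices_1d(data, centroids, indices, n, k);
  for (int i = 0; i < max_itr; i++) {
    memcpy(prev, centroids, sizeof(*centroids) * k);
    calc_centroids_1d(data, centroids, indices, n, k);
    /* Early out, as in the diff: identical centroids cannot produce
     * different assignments, so skip the index pass entirely. */
    if (!memcmp(centroids, prev, sizeof(*centroids) * k)) break;
    calc_indices_1d(data, centroids, indices, n, k);
  }
}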
diff --git a/av1/encoder/level.c b/av1/encoder/level.c
index 5741659..5d5fe9c 100644
--- a/av1/encoder/level.c
+++ b/av1/encoder/level.c
@@ -522,9 +522,10 @@
}
#define MAX_TIME 1e16
-double time_next_buffer_is_free(int num_decoded_frame, int decoder_buffer_delay,
- const FRAME_BUFFER *frame_buffer_pool,
- double current_time) {
+static double time_next_buffer_is_free(int num_decoded_frame,
+ int decoder_buffer_delay,
+ const FRAME_BUFFER *frame_buffer_pool,
+ double current_time) {
if (num_decoded_frame == 0) {
return (double)decoder_buffer_delay / 90000.0;
}
@@ -1243,7 +1244,8 @@
AOMMAX(level_spec->max_decode_rate, decoded_samples);
level_spec->max_tile_rate = AOMMAX(level_spec->max_tile_rate, tiles);
level_stats->max_bitrate =
- AOMMAX(level_stats->max_bitrate, (int)encoded_size_in_bytes * 8);
+ AOMMAX(level_stats->max_bitrate,
+ (int)AOMMIN(encoded_size_in_bytes * 8, (size_t)INT_MAX));
}
void av1_update_level_info(AV1_COMP *cpi, size_t size, int64_t ts_start,
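The max_bitrate change guards the size_t-to-int cast: encoded_size_in_bytes * 8
is computed in size_t, and converting a value above INT_MAX to int is
implementation-defined, so an enormous frame could previously record a garbage
(possibly negative) bitrate. Clamping to INT_MAX first makes the statistic
saturate instead. The pattern in isolation:

#include <limits.h>
#include <stddef.h>

/* Saturating size_t -> int conversion, as used for max_bitrate. */
static int bits_as_int_saturated(size_t encoded_size_in_bytes) {
  const size_t bits = encoded_size_in_bytes * 8;
  return (int)(bits < (size_t)INT_MAX ? bits : (size_t)INT_MAX);
}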
diff --git a/av1/encoder/lookahead.c b/av1/encoder/lookahead.c
index 10fbb77..9ef9b88 100644
--- a/av1/encoder/lookahead.c
+++ b/av1/encoder/lookahead.c
@@ -46,7 +46,7 @@
unsigned int width, unsigned int height, unsigned int subsampling_x,
unsigned int subsampling_y, int use_highbitdepth, unsigned int depth,
const int border_in_pixels, int byte_alignment, int num_lap_buffers,
- bool is_all_intra, int enable_global_motion) {
+ bool is_all_intra, int num_pyramid_levels) {
int lag_in_frames = AOMMAX(1, depth);
// For all-intra frame encoding, previous source frames are not required.
@@ -82,7 +82,7 @@
if (aom_realloc_frame_buffer(
&ctx->buf[i].img, width, height, subsampling_x, subsampling_y,
use_highbitdepth, border_in_pixels, byte_alignment, NULL, NULL,
- NULL, enable_global_motion, 0)) {
+ NULL, num_pyramid_levels, 0)) {
goto fail;
}
}
@@ -100,7 +100,7 @@
int av1_lookahead_push(struct lookahead_ctx *ctx, const YV12_BUFFER_CONFIG *src,
int64_t ts_start, int64_t ts_end, int use_highbitdepth,
- aom_enc_frame_flags_t flags) {
+ int num_pyramid_levels, aom_enc_frame_flags_t flags) {
int width = src->y_crop_width;
int height = src->y_crop_height;
int uv_width = src->uv_crop_width;
@@ -134,7 +134,7 @@
memset(&new_img, 0, sizeof(new_img));
if (aom_alloc_frame_buffer(&new_img, width, height, subsampling_x,
subsampling_y, use_highbitdepth,
- AOM_BORDER_IN_PIXELS, 0, 0))
+ AOM_BORDER_IN_PIXELS, 0, num_pyramid_levels, 0))
return 1;
aom_free_frame_buffer(&buf->img);
buf->img = new_img;
diff --git a/av1/encoder/lookahead.h b/av1/encoder/lookahead.h
index bd7cae4..c0e6d22 100644
--- a/av1/encoder/lookahead.h
+++ b/av1/encoder/lookahead.h
@@ -70,7 +70,7 @@
unsigned int width, unsigned int height, unsigned int subsampling_x,
unsigned int subsampling_y, int use_highbitdepth, unsigned int depth,
const int border_in_pixels, int byte_alignment, int num_lap_buffers,
- bool is_all_intra, int enable_global_motion);
+ bool is_all_intra, int num_pyramid_levels);
/**\brief Destroys the lookahead stage
*/
@@ -90,11 +90,13 @@
* \param[in] ts_start Timestamp for the start of this frame
* \param[in] ts_end Timestamp for the end of this frame
* \param[in] use_highbitdepth Tell if HBD is used
+ * \param[in] num_pyramid_levels Number of pyramid levels to allocate
+ for each frame buffer
* \param[in] flags Flags set on this frame
*/
int av1_lookahead_push(struct lookahead_ctx *ctx, const YV12_BUFFER_CONFIG *src,
int64_t ts_start, int64_t ts_end, int use_highbitdepth,
- aom_enc_frame_flags_t flags);
+ int num_pyramid_levels, aom_enc_frame_flags_t flags);
/**\brief Get the next source buffer to encode
*
diff --git a/av1/encoder/mcomp.c b/av1/encoder/mcomp.c
index 8fd1ab1..cc39c81 100644
--- a/av1/encoder/mcomp.c
+++ b/av1/encoder/mcomp.c
@@ -94,10 +94,12 @@
void av1_make_default_fullpel_ms_params(
FULLPEL_MOTION_SEARCH_PARAMS *ms_params, const struct AV1_COMP *cpi,
- MACROBLOCK *x, BLOCK_SIZE bsize, const MV *ref_mv,
+ MACROBLOCK *x, BLOCK_SIZE bsize, const MV *ref_mv, FULLPEL_MV start_mv,
const search_site_config search_sites[NUM_DISTINCT_SEARCH_METHODS],
int fine_search_interval) {
const MV_SPEED_FEATURES *mv_sf = &cpi->sf.mv_sf;
+ const int is_key_frame =
+ cpi->ppi->gf_group.update_type[cpi->gf_frame_index] == KF_UPDATE;
// High level params
ms_params->bsize = bsize;
@@ -129,19 +131,6 @@
av1_set_mv_search_method(ms_params, search_sites, search_method);
- const int use_downsampled_sad =
- mv_sf->use_downsampled_sad && block_size_high[bsize] >= 16;
- if (use_downsampled_sad) {
- ms_params->sdf = ms_params->vfp->sdsf;
- ms_params->sdx4df = ms_params->vfp->sdsx4df;
- // Skip version of sadx3 is not is not available yet
- ms_params->sdx3df = ms_params->vfp->sdsx4df;
- } else {
- ms_params->sdf = ms_params->vfp->sdf;
- ms_params->sdx4df = ms_params->vfp->sdx4df;
- ms_params->sdx3df = ms_params->vfp->sdx3df;
- }
-
ms_params->mesh_patterns[0] = mv_sf->mesh_patterns;
ms_params->mesh_patterns[1] = mv_sf->intrabc_mesh_patterns;
ms_params->force_mesh_thresh = mv_sf->exhaustive_searches_thresh;
@@ -161,6 +150,47 @@
// Mvcost params
init_mv_cost_params(&ms_params->mv_cost_params, x->mv_costs, ref_mv,
x->errorperbit, x->sadperbit);
+
+ ms_params->sdf = ms_params->vfp->sdf;
+ ms_params->sdx4df = ms_params->vfp->sdx4df;
+ ms_params->sdx3df = ms_params->vfp->sdx3df;
+
+ if (mv_sf->use_downsampled_sad == 2 && block_size_high[bsize] >= 16) {
+ ms_params->sdf = ms_params->vfp->sdsf;
+ ms_params->sdx4df = ms_params->vfp->sdsx4df;
+ // Skip version of sadx3 is not available yet
+ ms_params->sdx3df = ms_params->vfp->sdsx4df;
+ } else if (mv_sf->use_downsampled_sad == 1 && block_size_high[bsize] >= 16 &&
+ !is_key_frame) {
+ FULLPEL_MV start_mv_clamped = start_mv;
+ // adjust start_mv to make sure it is within MV range
+ clamp_fullmv(&start_mv_clamped, &ms_params->mv_limits);
+
+ const struct buf_2d *const ref = ms_params->ms_buffers.ref;
+ const int ref_stride = ref->stride;
+ const uint8_t *best_address = get_buf_from_fullmv(ref, &start_mv_clamped);
+ const struct buf_2d *const src = ms_params->ms_buffers.src;
+ const uint8_t *src_buf = src->buf;
+ const int src_stride = src->stride;
+
+ unsigned int start_mv_sad_even_rows, start_mv_sad_odd_rows;
+ start_mv_sad_even_rows =
+ ms_params->vfp->sdsf(src_buf, src_stride, best_address, ref_stride);
+ start_mv_sad_odd_rows =
+ ms_params->vfp->sdsf(src_buf + src_stride, src_stride,
+ best_address + ref_stride, ref_stride);
+
+ // If the absolute difference between the even-row and odd-row
+ // pred-to-src SADs is small, skip every other row in the SAD computation.
+ const int odd_to_even_diff_sad =
+ abs((int)start_mv_sad_even_rows - (int)start_mv_sad_odd_rows);
+ const int mult_thresh = 4;
+ if (odd_to_even_diff_sad * mult_thresh < (int)start_mv_sad_even_rows) {
+ ms_params->sdf = ms_params->vfp->sdsf;
+ ms_params->sdx4df = ms_params->vfp->sdsx4df;
+ ms_params->sdx3df = ms_params->vfp->sdsx4df;
+ }
+ }
}
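The new use_downsampled_sad == 1 path is content-adaptive where level 2 is
unconditional: it computes the skip-every-other-row SAD at the start MV twice,
once anchored on even rows and once on odd rows, and only switches to the
downsampled SAD functions when the two agree closely, meaning the block is
vertically smooth enough that half the rows carry nearly the same information.
The decision in isolation (sad_skip_fn stands in for the vfp->sdsf pointer):

#include <stdint.h>
#include <stdlib.h>

/* Stand-in for ms_params->vfp->sdsf: SAD over every other row. */
typedef unsigned int (*sad_skip_fn)(const uint8_t *src, int src_stride,
                                    const uint8_t *ref, int ref_stride);

/* Returns 1 if downsampled SAD is safe for this block, per the even/odd
 * row agreement test in av1_make_default_fullpel_ms_params. */
static int should_downsample_sad(sad_skip_fn sad_skip_rows,
                                 const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride) {
  const unsigned int sad_even =
      sad_skip_rows(src, src_stride, ref, ref_stride);
  const unsigned int sad_odd =
      sad_skip_rows(src + src_stride, src_stride, ref + ref_stride,
                    ref_stride);
  const int diff = abs((int)sad_even - (int)sad_odd);
  const int mult_thresh = 4;
  return diff * mult_thresh < (int)sad_even;
}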
void av1_set_ms_to_intra_mode(FULLPEL_MOTION_SEARCH_PARAMS *ms_params,
@@ -228,6 +258,9 @@
if (mv_limits->col_max > col_max) mv_limits->col_max = col_max;
if (mv_limits->row_min < row_min) mv_limits->row_min = row_min;
if (mv_limits->row_max > row_max) mv_limits->row_max = row_max;
+
+ mv_limits->col_max = AOMMAX(mv_limits->col_min, mv_limits->col_max);
+ mv_limits->row_max = AOMMAX(mv_limits->row_min, mv_limits->row_max);
}
int av1_init_search_range(int size) {
@@ -649,6 +682,14 @@
cfg->num_search_steps = MAX_PATTERN_SCALES;
}
+const av1_init_search_site_config
+ av1_init_motion_compensation[NUM_DISTINCT_SEARCH_METHODS] = {
+ av1_init_dsmotion_compensation, av1_init_motion_compensation_nstep,
+ av1_init_motion_compensation_nstep, av1_init_dsmotion_compensation,
+ av1_init_motion_compensation_hex, av1_init_motion_compensation_bigdia,
+ av1_init_motion_compensation_square
+ };
+
// Checks whether the mv is within range of the mv_limits
static INLINE int check_bounds(const FullMvLimits *mv_limits, int row, int col,
int range) {
@@ -1312,88 +1353,76 @@
do_init_search, cost_list, best_mv);
}
-static int diamond_search_sad(FULLPEL_MV start_mv,
+static int diamond_search_sad(FULLPEL_MV start_mv, unsigned int start_mv_sad,
const FULLPEL_MOTION_SEARCH_PARAMS *ms_params,
const int search_step, int *num00,
FULLPEL_MV *best_mv, FULLPEL_MV *second_best_mv) {
+#define UPDATE_SEARCH_STEP \
+ do { \
+ if (best_site != 0) { \
+ tmp_second_best_mv = *best_mv; \
+ best_mv->row += site[best_site].mv.row; \
+ best_mv->col += site[best_site].mv.col; \
+ best_address += site[best_site].offset; \
+ is_off_center = 1; \
+ } \
+ \
+ if (is_off_center == 0) num_center_steps++; \
+ \
+ if (best_site == 0 && step > 2) { \
+ int next_step_size = cfg->radius[step - 1]; \
+ while (next_step_size == cfg->radius[step] && step > 2) { \
+ num_center_steps++; \
+ --step; \
+ next_step_size = cfg->radius[step - 1]; \
+ } \
+ } \
+ } while (0)
+
const struct buf_2d *const src = ms_params->ms_buffers.src;
const struct buf_2d *const ref = ms_params->ms_buffers.ref;
+ const uint8_t *src_buf = src->buf;
+ const int src_stride = src->stride;
const int ref_stride = ref->stride;
- const uint8_t *best_address;
- const uint8_t *mask = ms_params->ms_buffers.mask;
- const uint8_t *second_pred = ms_params->ms_buffers.second_pred;
const MV_COST_PARAMS *mv_cost_params = &ms_params->mv_cost_params;
const search_site_config *cfg = ms_params->search_sites;
- unsigned int bestsad = INT_MAX;
- int best_site = 0;
int is_off_center = 0;
-
- clamp_fullmv(&start_mv, &ms_params->mv_limits);
+ // Number of times that we have stayed in the middle. This is used to skip
+ // search steps in the future if diamond_search_sad is called again.
+ int num_center_steps = 0;
// search_step determines the length of the initial step and hence the number
// of iterations.
const int tot_steps = cfg->num_search_steps - search_step;
+ FULLPEL_MV tmp_second_best_mv;
+ if (second_best_mv) {
+ tmp_second_best_mv = *second_best_mv;
+ }
- *num00 = 0;
*best_mv = start_mv;
// Check the starting position
- best_address = get_buf_from_fullmv(ref, &start_mv);
- bestsad = get_mvpred_compound_sad(ms_params, src, best_address, ref_stride);
- bestsad += mvsad_err_cost_(best_mv, &ms_params->mv_cost_params);
+ const uint8_t *best_address = get_buf_from_fullmv(ref, &start_mv);
+ unsigned int bestsad = start_mv_sad;
- int next_step_size = tot_steps > 2 ? cfg->radius[tot_steps - 2] : 1;
- for (int step = tot_steps - 1; step >= 0; --step) {
- const search_site *site = cfg->site[step];
- best_site = 0;
- if (step > 0) next_step_size = cfg->radius[step - 1];
+ // TODO([email protected]): Implement 4 points search for msdf&sdaf
+ if (ms_params->ms_buffers.second_pred) {
+ for (int step = tot_steps - 1; step >= 0; --step) {
+ const search_site *site = cfg->site[step];
+ const int num_searches = cfg->searches_per_step[step];
+ int best_site = 0;
- int all_in = 1, j;
- // Trap illegal vectors
- all_in &= best_mv->row + site[1].mv.row >= ms_params->mv_limits.row_min;
- all_in &= best_mv->row + site[2].mv.row <= ms_params->mv_limits.row_max;
- all_in &= best_mv->col + site[3].mv.col >= ms_params->mv_limits.col_min;
- all_in &= best_mv->col + site[4].mv.col <= ms_params->mv_limits.col_max;
-
- // TODO(anyone): Implement 4 points search for msdf&sdaf
- if (all_in && !mask && !second_pred) {
- const uint8_t *src_buf = src->buf;
- const int src_stride = src->stride;
- for (int idx = 1; idx <= cfg->searches_per_step[step]; idx += 4) {
- unsigned char const *block_offset[4];
- unsigned int sads[4];
-
- for (j = 0; j < 4; j++)
- block_offset[j] = site[idx + j].offset + best_address;
-
- ms_params->sdx4df(src_buf, src_stride, block_offset, ref_stride, sads);
- for (j = 0; j < 4; j++) {
- if (sads[j] < bestsad) {
- const FULLPEL_MV this_mv = { best_mv->row + site[idx + j].mv.row,
- best_mv->col + site[idx + j].mv.col };
- unsigned int thissad =
- sads[j] + mvsad_err_cost_(&this_mv, mv_cost_params);
- if (thissad < bestsad) {
- bestsad = thissad;
- best_site = idx + j;
- }
- }
- }
- }
- } else {
- for (int idx = 1; idx <= cfg->searches_per_step[step]; idx++) {
+ for (int idx = 1; idx <= num_searches; idx++) {
const FULLPEL_MV this_mv = { best_mv->row + site[idx].mv.row,
best_mv->col + site[idx].mv.col };
if (av1_is_fullmv_in_range(&ms_params->mv_limits, this_mv)) {
const uint8_t *const check_here = site[idx].offset + best_address;
- unsigned int thissad;
-
- thissad =
+ unsigned int thissad =
get_mvpred_compound_sad(ms_params, src, check_here, ref_stride);
if (thissad < bestsad) {
@@ -1405,47 +1434,112 @@
}
}
}
+ UPDATE_SEARCH_STEP;
}
+ } else {
+ for (int step = tot_steps - 1; step >= 0; --step) {
+ const search_site *site = cfg->site[step];
+ const int num_searches = cfg->searches_per_step[step];
+ int best_site = 0;
- if (best_site != 0) {
- if (second_best_mv) {
- *second_best_mv = *best_mv;
+ int all_in = 1;
+ // Trap illegal vectors
+ all_in &= best_mv->row + site[1].mv.row >= ms_params->mv_limits.row_min;
+ all_in &= best_mv->row + site[2].mv.row <= ms_params->mv_limits.row_max;
+ all_in &= best_mv->col + site[3].mv.col >= ms_params->mv_limits.col_min;
+ all_in &= best_mv->col + site[4].mv.col <= ms_params->mv_limits.col_max;
+
+ if (all_in) {
+ for (int idx = 1; idx <= num_searches; idx += 4) {
+ unsigned char const *block_offset[4];
+ unsigned int sads[4];
+
+ for (int j = 0; j < 4; j++)
+ block_offset[j] = site[idx + j].offset + best_address;
+
+ ms_params->sdx4df(src_buf, src_stride, block_offset, ref_stride,
+ sads);
+ for (int j = 0; j < 4; j++) {
+ if (sads[j] < bestsad) {
+ const FULLPEL_MV this_mv = { best_mv->row + site[idx + j].mv.row,
+ best_mv->col +
+ site[idx + j].mv.col };
+ unsigned int thissad =
+ sads[j] + mvsad_err_cost_(&this_mv, mv_cost_params);
+ if (thissad < bestsad) {
+ bestsad = thissad;
+ best_site = idx + j;
+ }
+ }
+ }
+ }
+ } else {
+ for (int idx = 1; idx <= num_searches; idx++) {
+ const FULLPEL_MV this_mv = { best_mv->row + site[idx].mv.row,
+ best_mv->col + site[idx].mv.col };
+
+ if (av1_is_fullmv_in_range(&ms_params->mv_limits, this_mv)) {
+ const uint8_t *const check_here = site[idx].offset + best_address;
+ unsigned int thissad =
+ get_mvpred_sad(ms_params, src, check_here, ref_stride);
+
+ if (thissad < bestsad) {
+ thissad += mvsad_err_cost_(&this_mv, mv_cost_params);
+ if (thissad < bestsad) {
+ bestsad = thissad;
+ best_site = idx;
+ }
+ }
+ }
+ }
}
- best_mv->row += site[best_site].mv.row;
- best_mv->col += site[best_site].mv.col;
- best_address += site[best_site].offset;
- is_off_center = 1;
- }
-
- if (is_off_center == 0) (*num00)++;
-
- if (best_site == 0) {
- while (next_step_size == cfg->radius[step] && step > 2) {
- ++(*num00);
- --step;
- next_step_size = cfg->radius[step - 1];
- }
+ UPDATE_SEARCH_STEP;
}
}
+ *num00 = num_center_steps;
+ if (second_best_mv) {
+ *second_best_mv = tmp_second_best_mv;
+ }
+
return bestsad;
+
+#undef UPDATE_SEARCH_STEP
}
-/* do_refine: If last step (1-away) of n-step search doesn't pick the center
- point as the best match, we will do a final 1-away diamond
- refining search */
-static int full_pixel_diamond(const FULLPEL_MV start_mv,
+static INLINE unsigned int get_start_mvpred_sad_cost(
+ const FULLPEL_MOTION_SEARCH_PARAMS *ms_params, FULLPEL_MV start_mv) {
+ const struct buf_2d *const src = ms_params->ms_buffers.src;
+ const struct buf_2d *const ref = ms_params->ms_buffers.ref;
+ const uint8_t *best_address = get_buf_from_fullmv(ref, &start_mv);
+
+ unsigned int start_mv_sad =
+ mvsad_err_cost_(&start_mv, &ms_params->mv_cost_params);
+
+ if (ms_params->ms_buffers.second_pred)
+ start_mv_sad +=
+ get_mvpred_compound_sad(ms_params, src, best_address, ref->stride);
+ else
+ start_mv_sad += get_mvpred_sad(ms_params, src, best_address, ref->stride);
+
+ return start_mv_sad;
+}
+
+static int full_pixel_diamond(FULLPEL_MV start_mv,
const FULLPEL_MOTION_SEARCH_PARAMS *ms_params,
const int step_param, int *cost_list,
FULLPEL_MV *best_mv, FULLPEL_MV *second_best_mv) {
const search_site_config *cfg = ms_params->search_sites;
int thissme, n, num00 = 0;
- int bestsme = diamond_search_sad(start_mv, ms_params, step_param, &n, best_mv,
- second_best_mv);
- if (bestsme < INT_MAX) {
- bestsme = get_mvpred_compound_var_cost(ms_params, best_mv);
- }
+ // Clamp start mv and calculate the cost
+ clamp_fullmv(&start_mv, &ms_params->mv_limits);
+ unsigned int start_mv_sad = get_start_mvpred_sad_cost(ms_params, start_mv);
+
+ diamond_search_sad(start_mv, start_mv_sad, ms_params, step_param, &n, best_mv,
+ second_best_mv);
+
+ int bestsme = get_mvpred_compound_var_cost(ms_params, best_mv);
// If there won't be more n-step search, check to see if refining search is
// needed.
@@ -1453,23 +1547,23 @@
while (n < further_steps) {
++n;
+ // TODO([email protected]): There is another bug here where the second
+ // best mv gets incorrectly overwritten. Fix it later.
+ FULLPEL_MV tmp_best_mv;
+ diamond_search_sad(start_mv, start_mv_sad, ms_params, step_param + n,
+ &num00, &tmp_best_mv, second_best_mv);
+
+ thissme = get_mvpred_compound_var_cost(ms_params, &tmp_best_mv);
+
+ if (thissme < bestsme) {
+ bestsme = thissme;
+ *best_mv = tmp_best_mv;
+ }
+
if (num00) {
- num00--;
- } else {
- // TODO([email protected]): There is another bug here where the second
- // best mv gets incorrectly overwritten. Fix it later.
- FULLPEL_MV tmp_best_mv;
- thissme = diamond_search_sad(start_mv, ms_params, step_param + n, &num00,
- &tmp_best_mv, second_best_mv);
-
- if (thissme < INT_MAX) {
- thissme = get_mvpred_compound_var_cost(ms_params, &tmp_best_mv);
- }
-
- if (thissme < bestsme) {
- bestsme = thissme;
- *best_mv = tmp_best_mv;
- }
+ // Advance the loop by num00 steps
+ n += num00;
+ num00 = 0;
}
}
@@ -1575,6 +1669,12 @@
int range = mesh_patterns[0].range;
int baseline_interval_divisor;
+ // TODO([email protected]): Currently exhaustive search calls single ref
+ // version of sad and variance function. We still need to check the
+ // performance when compound ref exhaustive search is enabled.
+ assert(!ms_params->ms_buffers.second_pred &&
+ "Mesh search does not support compound mode!");
+
*best_mv = start_mv;
// Trap illegal values for interval and range for this function.
@@ -1772,7 +1872,8 @@
// Should we allow a follow on exhaustive search?
if (!run_mesh_search &&
- ((search_method == NSTEP) || (search_method == NSTEP_8PT))) {
+ ((search_method == NSTEP) || (search_method == NSTEP_8PT)) &&
+ !ms_params->ms_buffers.second_pred) {
int exhaustive_thr = ms_params->force_mesh_thresh;
exhaustive_thr >>=
10 - (mi_size_wide_log2[bsize] + mi_size_high_log2[bsize]);
@@ -2018,16 +2119,15 @@
}
if (xd->bd != 8) {
- unsigned int sad;
best_int_mv->as_fullmv = kZeroFullMv;
- sad = cpi->ppi->fn_ptr[bsize].sdf(x->plane[0].src.buf, src_stride,
- xd->plane[0].pre[0].buf, ref_stride);
+ best_sad = cpi->ppi->fn_ptr[bsize].sdf(x->plane[0].src.buf, src_stride,
+ xd->plane[0].pre[0].buf, ref_stride);
if (scaled_ref_frame) {
int i;
for (i = 0; i < MAX_MB_PLANE; i++) xd->plane[i].pre[0] = backup_yv12[i];
}
- return sad;
+ return best_sad;
}
// Set up prediction 1-D reference set
@@ -2055,6 +2155,19 @@
best_sad =
cpi->ppi->fn_ptr[bsize].sdf(src_buf, src_stride, ref_buf, ref_stride);
+ // Evaluate zero MV if found MV is non-zero.
+ if (best_int_mv->as_int != 0) {
+ tmp_sad = cpi->ppi->fn_ptr[bsize].sdf(x->plane[0].src.buf, src_stride,
+ xd->plane[0].pre[0].buf, ref_stride);
+
+ if (tmp_sad < best_sad) {
+ best_int_mv->as_fullmv = kZeroFullMv;
+ this_mv = best_int_mv->as_fullmv;
+ ref_buf = xd->plane[0].pre[0].buf;
+ best_sad = tmp_sad;
+ }
+ }
+
{
const uint8_t *const pos[4] = {
ref_buf - ref_stride,
@@ -3225,13 +3338,111 @@
}
// Refines MV in a small range
+
+// Macros to build bitmasks which help us avoid redundant computations
+//
+// To explain the idea here, imagine that on the first iteration of the
+// loop below, we step rightwards. Then, on the second iteration, the neighbors
+// to consider are:
+// . . .
+// 0 1 .
+// . . .
+// Where 0 is the initial search point, 1 is the best candidate found in the
+// first iteration, and the dots are the other neighbors of point 1.
+//
+// Naively, we would now need to scan all 8 neighbors of point 1 (point 0 and
+// the seven points marked with dots), and compare them to see where to move
+// next. However, we already evaluated 5 of those 8 neighbors in the last
+// iteration, and decided that they are worse than point 1. So we don't need
+// to re-consider these points. We only really need to consider the three
+// points which are adjacent to point 1 but *not* to point 0.
+//
+// As the algorithm goes on, there are other ways that redundant evaluations
+// can happen, if the search path curls back around on itself.
+//
+// To avoid all possible redundancies, we'd have to build a set containing
+// every point we have already checked, and this would be quite expensive.
+//
+// So instead, we apply a 95%-effective solution with a much lower overhead:
+// we prune out the points which were considered during the previous
+// iteration, but we don't worry about any prior iteration. This can be done
+// as follows:
+//
+// We build a static table, called neighbor_mask, which answers the question
+// "if we moved in direction X last time, which neighbors are new, and which
+// were scanned last iteration?"
+// Then we can query this table to quickly determine which points we need to
+// evaluate, and which we can skip.
+//
+// To query the table, the logic is simply:
+// neighbor_mask[i] & (1 << j) == "if we moved in direction i last iteration,
+// do we need to scan neighbor j this iteration?"
+#define NEIGHBOR_MASK_DIA(left, down, right, up) \
+ (left | (down << 1) | (right << 2) | (up << 3))
+
+#define NEIGHBOR_MASK_SQR(left, down, right, up, down_left, down_right, \
+ up_left, up_right) \
+ (left | (down << 1) | (right << 2) | (up << 3) | (down_left << 4) | \
+ (down_right << 5) | (up_left << 6) | (up_right << 7))
+
+static const warp_search_config warp_search_info[WARP_SEARCH_METHODS] = {
+ // WARP_SEARCH_DIAMOND
+ {
+ .num_neighbors = 4,
+ .neighbors = { { 0, -1 }, { 1, 0 }, { 0, 1 }, { -1, 0 } },
+ .neighbor_mask = {
+ // If we stepped left last time, consider all points except right
+ NEIGHBOR_MASK_DIA(1, 1, 0, 1),
+ // If we stepped down last time, consider all points except up
+ NEIGHBOR_MASK_DIA(1, 1, 1, 0),
+ // Stepped right last time
+ NEIGHBOR_MASK_DIA(0, 1, 1, 1),
+ // Stepped up last time
+ NEIGHBOR_MASK_DIA(1, 0, 1, 1),
+ },
+ },
+ // WARP_SEARCH_SQUARE
+ {
+ .num_neighbors = 8,
+ .neighbors = { { 0, -1 }, { 1, 0 }, { 0, 1 }, { -1, 0 },
+ { 1, -1 }, { 1, 1 }, { -1, -1 }, { -1, 1 } },
+ .neighbor_mask = {
+ // If we stepped left last time, then we only need to consider 3 points:
+ // left, down+left, up+left
+ NEIGHBOR_MASK_SQR(1, 0, 0, 0, 1, 0, 1, 0),
+ // If we stepped down last time, then we only need to consider 3 points:
+ // down, down+left, down+right
+ NEIGHBOR_MASK_SQR(0, 1, 0, 0, 1, 1, 0, 0),
+ // Stepped right last time
+ NEIGHBOR_MASK_SQR(0, 0, 1, 0, 0, 1, 0, 1),
+ // Stepped up last time
+ NEIGHBOR_MASK_SQR(0, 0, 0, 1, 0, 0, 1, 1),
+
+ // If we stepped down+left last time, then we need to consider 5 points:
+ // left, down, down+left, down+right, up+left
+ NEIGHBOR_MASK_SQR(1, 1, 0, 0, 1, 1, 1, 0),
+ // Stepped down+right last time
+ NEIGHBOR_MASK_SQR(0, 1, 1, 0, 1, 1, 0, 1),
+ // Stepped up+left last time
+ NEIGHBOR_MASK_SQR(1, 0, 0, 1, 1, 0, 1, 1),
+ // Stepped up+right last time
+ NEIGHBOR_MASK_SQR(0, 0, 1, 1, 0, 1, 1, 1),
+ },
+ },
+};
+
unsigned int av1_refine_warped_mv(MACROBLOCKD *xd, const AV1_COMMON *const cm,
const SUBPEL_MOTION_SEARCH_PARAMS *ms_params,
BLOCK_SIZE bsize, const int *pts0,
- const int *pts_inref0, int total_samples) {
+ const int *pts_inref0, int total_samples,
+ WARP_SEARCH_METHOD search_method,
+ int num_iterations) {
MB_MODE_INFO *mbmi = xd->mi[0];
- static const MV neighbors[8] = { { 0, -1 }, { 1, 0 }, { 0, 1 }, { -1, 0 },
- { 0, -2 }, { 2, 0 }, { 0, 2 }, { -2, 0 } };
+
+ const MV *neighbors = warp_search_info[search_method].neighbors;
+ const int num_neighbors = warp_search_info[search_method].num_neighbors;
+ const uint8_t *neighbor_mask = warp_search_info[search_method].neighbor_mask;
+
MV *best_mv = &mbmi->mv[0].as_mv;
WarpedMotionParams best_wm_params = mbmi->wm_params;
@@ -3239,7 +3450,7 @@
unsigned int bestmse;
const SubpelMvLimits *mv_limits = &ms_params->mv_limits;
- const int start = ms_params->allow_hp ? 0 : 4;
+ const int mv_shift = ms_params->allow_hp ? 0 : 1;
// Calculate the center position's error
assert(av1_is_subpelmv_in_range(mv_limits, *best_mv));
@@ -3249,14 +3460,22 @@
int pts[SAMPLES_ARRAY_SIZE], pts_inref[SAMPLES_ARRAY_SIZE];
const int mi_row = xd->mi_row;
const int mi_col = xd->mi_col;
- for (int ite = 0; ite < 2; ++ite) {
+
+ // First step always scans all neighbors
+ uint8_t valid_neighbors = UINT8_MAX;
+
+ for (int ite = 0; ite < num_iterations; ++ite) {
int best_idx = -1;
- for (int idx = start; idx < start + 4; ++idx) {
+ for (int idx = 0; idx < num_neighbors; ++idx) {
+ if ((valid_neighbors & (1 << idx)) == 0) {
+ continue;
+ }
+
unsigned int thismse;
- MV this_mv = { best_mv->row + neighbors[idx].row,
- best_mv->col + neighbors[idx].col };
+ MV this_mv = { best_mv->row + neighbors[idx].row * (1 << mv_shift),
+ best_mv->col + neighbors[idx].col * (1 << mv_shift) };
if (av1_is_subpelmv_in_range(mv_limits, this_mv)) {
memcpy(pts, pts0, total_samples * 2 * sizeof(*pts0));
memcpy(pts_inref, pts_inref0, total_samples * 2 * sizeof(*pts_inref0));
@@ -3283,8 +3502,9 @@
if (best_idx == -1) break;
if (best_idx >= 0) {
- best_mv->row += neighbors[best_idx].row;
- best_mv->col += neighbors[best_idx].col;
+ best_mv->row += neighbors[best_idx].row * (1 << mv_shift);
+ best_mv->col += neighbors[best_idx].col * (1 << mv_shift);
+ valid_neighbors = neighbor_mask[best_idx];
}
}
@@ -3292,6 +3512,7 @@
mbmi->num_proj_ref = best_num_proj_ref;
return bestmse;
}
+
#endif // !CONFIG_REALTIME_ONLY
// =============================================================================
// Subpixel Motion Search: OBMC
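Tying this file's pieces together, the neighbor_mask tables above are consumed
by a loop of the following shape inside av1_refine_warped_mv: each iteration
scans only the neighbors whose bit survives from the previous step's mask, and
a step in direction i replaces the mask with neighbor_mask[i]. A hedged sketch
with a hypothetical eval_error standing in for the warp-model refit and MSE
computation:

#include <stdint.h>

typedef struct { int row, col; } MV_SKETCH;

/* One refinement iteration. Returns the chosen direction, or -1 when no
 * neighbor improves on the current best (i.e. the search converged). */
static int refine_step(const MV_SKETCH *neighbors, int num_neighbors,
                       const uint8_t *neighbor_mask, uint8_t *valid,
                       MV_SKETCH *best_mv,
                       unsigned int (*eval_error)(const MV_SKETCH *),
                       unsigned int *best_err) {
  int best_idx = -1;
  for (int idx = 0; idx < num_neighbors; ++idx) {
    if ((*valid & (1 << idx)) == 0) continue; /* scanned last time */
    const MV_SKETCH cand = { best_mv->row + neighbors[idx].row,
                             best_mv->col + neighbors[idx].col };
    const unsigned int err = eval_error(&cand);
    if (err < *best_err) {
      *best_err = err;
      best_idx = idx;
    }
  }
  if (best_idx >= 0) {
    best_mv->row += neighbors[best_idx].row;
    best_mv->col += neighbors[best_idx].col;
    *valid = neighbor_mask[best_idx]; /* prune the next scan */
  }
  return best_idx;
}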
diff --git a/av1/encoder/mcomp.h b/av1/encoder/mcomp.h
index 1e8bbab..6b9af07 100644
--- a/av1/encoder/mcomp.h
+++ b/av1/encoder/mcomp.h
@@ -144,7 +144,7 @@
void av1_make_default_fullpel_ms_params(
FULLPEL_MOTION_SEARCH_PARAMS *ms_params, const struct AV1_COMP *cpi,
- MACROBLOCK *x, BLOCK_SIZE bsize, const MV *ref_mv,
+ MACROBLOCK *x, BLOCK_SIZE bsize, const MV *ref_mv, FULLPEL_MV start_mv,
const search_site_config search_sites[NUM_DISTINCT_SEARCH_METHODS],
int fine_search_interval);
@@ -176,14 +176,9 @@
typedef void (*av1_init_search_site_config)(search_site_config *cfg, int stride,
int level);
-/*! Array of function pointer used to set the motion search config. */
-static const av1_init_search_site_config
- av1_init_motion_compensation[NUM_DISTINCT_SEARCH_METHODS] = {
- av1_init_dsmotion_compensation, av1_init_motion_compensation_nstep,
- av1_init_motion_compensation_nstep, av1_init_dsmotion_compensation,
- av1_init_motion_compensation_hex, av1_init_motion_compensation_bigdia,
- av1_init_motion_compensation_square
- };
+/*! Array of function pointers used to set the motion search config. */
+extern const av1_init_search_site_config
+ av1_init_motion_compensation[NUM_DISTINCT_SEARCH_METHODS];
// Array indicating which search methods share the same candidates
// but differ in the number of search steps.
@@ -344,7 +339,9 @@
unsigned int av1_refine_warped_mv(MACROBLOCKD *xd, const AV1_COMMON *const cm,
const SUBPEL_MOTION_SEARCH_PARAMS *ms_params,
BLOCK_SIZE bsize, const int *pts0,
- const int *pts_inref0, int total_samples);
+ const int *pts_inref0, int total_samples,
+ WARP_SEARCH_METHOD search_method,
+ int num_iterations);
static INLINE void av1_set_fractional_mv(int_mv *fractional_best_mv) {
for (int z = 0; z < 3; z++) {
@@ -356,14 +353,13 @@
const FullMvLimits *mv_limits,
const MV *ref_mv) {
const int max_mv = GET_MV_SUBPEL(MAX_FULL_PEL_VAL);
- const int minc =
- AOMMAX(GET_MV_SUBPEL(mv_limits->col_min), ref_mv->col - max_mv);
- const int maxc =
- AOMMIN(GET_MV_SUBPEL(mv_limits->col_max), ref_mv->col + max_mv);
- const int minr =
- AOMMAX(GET_MV_SUBPEL(mv_limits->row_min), ref_mv->row - max_mv);
- const int maxr =
- AOMMIN(GET_MV_SUBPEL(mv_limits->row_max), ref_mv->row + max_mv);
+ int minc = AOMMAX(GET_MV_SUBPEL(mv_limits->col_min), ref_mv->col - max_mv);
+ int maxc = AOMMIN(GET_MV_SUBPEL(mv_limits->col_max), ref_mv->col + max_mv);
+ int minr = AOMMAX(GET_MV_SUBPEL(mv_limits->row_min), ref_mv->row - max_mv);
+ int maxr = AOMMIN(GET_MV_SUBPEL(mv_limits->row_max), ref_mv->row + max_mv);
+
+ maxc = AOMMAX(minc, maxc);
+ maxr = AOMMAX(minr, maxr);
subpel_limits->col_min = AOMMAX(MV_LOW + 1, minc);
subpel_limits->col_max = AOMMIN(MV_UPP - 1, maxc);
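Moving av1_init_motion_compensation out of the header is the usual cure for a
static const array defined in a header: every translation unit including it
otherwise carries its own private copy of the table, bloating the binary and
tripping unused-variable warnings in files that never reference it. The header
keeps a single extern declaration and exactly one .c file owns the definition:

/* mcomp.h - declaration only; no per-includer storage. */
extern const av1_init_search_site_config
    av1_init_motion_compensation[NUM_DISTINCT_SEARCH_METHODS];

/* mcomp.c - the single definition shared by the whole library. */
const av1_init_search_site_config
    av1_init_motion_compensation[NUM_DISTINCT_SEARCH_METHODS] = {
      av1_init_dsmotion_compensation,     av1_init_motion_compensation_nstep,
      av1_init_motion_compensation_nstep, av1_init_dsmotion_compensation,
      av1_init_motion_compensation_hex,   av1_init_motion_compensation_bigdia,
      av1_init_motion_compensation_square
    };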
diff --git a/av1/encoder/mcomp_structs.h b/av1/encoder/mcomp_structs.h
index 3fc1ab8..06660cf 100644
--- a/av1/encoder/mcomp_structs.h
+++ b/av1/encoder/mcomp_structs.h
@@ -22,6 +22,12 @@
#define MAX_FULL_PEL_VAL ((1 << (MAX_MVSEARCH_STEPS - 1)) - 1)
// Maximum size of the first step in full pel units
#define MAX_FIRST_STEP (1 << (MAX_MVSEARCH_STEPS - 1))
+// Maximum number of neighbors to scan per iteration during
+// WARPED_CAUSAL refinement.
+// Note: The elements of warp_search_config.neighbor_mask must be at least
+// MAX_WARP_SEARCH_NEIGHBORS bits wide, so the type may need to be
+// widened if this value is increased.
+#define MAX_WARP_SEARCH_NEIGHBORS 8
#define SEARCH_RANGE_8P 3
#define SEARCH_GRID_STRIDE_8P (2 * SEARCH_RANGE_8P + 1)
@@ -82,4 +88,22 @@
NUM_DISTINCT_SEARCH_METHODS = SQUARE + 1,
} UENUM1BYTE(SEARCH_METHODS);
+typedef struct warp_search_config {
+ int num_neighbors;
+ MV neighbors[MAX_WARP_SEARCH_NEIGHBORS];
+ // Bitmask which is used to prune the search neighbors at one iteration
+ // based on which direction we chose in the previous iteration.
+ // See comments in av1_refine_warped_mv for details.
+ uint8_t neighbor_mask[MAX_WARP_SEARCH_NEIGHBORS];
+} warp_search_config;
+
+// Methods for refining WARPED_CAUSAL motion vectors
+enum {
+ // Search 4 adjacent points in a diamond shape at each iteration
+ WARP_SEARCH_DIAMOND,
+ // Search 8 adjacent points in a square at each iteration
+ WARP_SEARCH_SQUARE,
+ WARP_SEARCH_METHODS
+} UENUM1BYTE(WARP_SEARCH_METHOD);
+
#endif // AOM_AV1_ENCODER_MCOMP_STRUCTS_H_
diff --git a/av1/encoder/ml.c b/av1/encoder/ml.c
index 5078fb1..94cd56c 100644
--- a/av1/encoder/ml.c
+++ b/av1/encoder/ml.c
@@ -13,6 +13,7 @@
#include <math.h>
#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/mathutils.h"
#include "av1/encoder/ml.h"
void av1_nn_output_prec_reduce(float *const output, int num_output) {
@@ -155,22 +156,6 @@
for (int i = 0; i < n; i++) output[i] /= sum_out;
}
-static AOM_INLINE float approx_exp(float y) {
-#define A ((1 << 23) / 0.69314718056f) // (1 << 23) / ln(2)
-#define B \
- 127 // Offset for the exponent according to IEEE floating point standard.
-#define C 60801 // Magic number controls the accuracy of approximation
- union {
- float as_float;
- int32_t as_int32;
- } container;
- container.as_int32 = ((int32_t)(y * A)) + ((B << 23) - C);
- return container.as_float;
-#undef A
-#undef B
-#undef C
-}
-
void av1_nn_fast_softmax_16_c(const float *input, float *output) {
const int kNumClasses = 16;
float max_input = input[0];
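The deleted approx_exp is not gone; per the new include it now lives in
aom_dsp/mathutils.h so other callers can share it. The trick itself appears to
be Schraudolph's classic exponential approximation: writing y*A + (B << 23) - C
straight into the bit pattern of an IEEE-754 float lands y/ln(2) in the
exponent field, so the float's value approximates e^y; C is an empirical bias
that tunes the accuracy. A sketch comparing it against expf:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Schraudolph-style exp approximation, mirroring the deleted helper.
 * A = (1 << 23) / ln(2) scales y into the exponent at bit 23;
 * B = 127 is the single-precision exponent bias;
 * C = 60801 is an empirical correction controlling the accuracy. */
static float approx_exp_sketch(float y) {
  union {
    float as_float;
    int32_t as_int32;
  } u;
  u.as_int32 = (int32_t)(y * ((1 << 23) / 0.69314718056f)) +
               ((127 << 23) - 60801);
  return u.as_float;
}

int main(void) {
  for (float y = -2.0f; y <= 2.0f; y += 1.0f)
    printf("y=%+.0f approx=%f expf=%f\n", y, approx_exp_sketch(y), expf(y));
  return 0;
}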
diff --git a/av1/encoder/motion_search_facade.c b/av1/encoder/motion_search_facade.c
index 30e1b73..b771b05 100644
--- a/av1/encoder/motion_search_facade.c
+++ b/av1/encoder/motion_search_facade.c
@@ -192,23 +192,42 @@
// Check difference between mvs in the stack and candidate mv.
for (int stack_idx = 0; stack_idx < stack_size; stack_idx++) {
- FULLPEL_MV *fmv_stack = &args->start_mv_stack[stack_idx];
- const int row = abs(fmv_stack->row - fmv_cand->as_fullmv.row);
- const int col = abs(fmv_stack->col - fmv_cand->as_fullmv.col);
+ const uint8_t this_ref_mv_idx = args->ref_mv_idx_stack[stack_idx];
+ const FULLPEL_MV *fmv_stack = &args->start_mv_stack[stack_idx];
+ const int this_newmv_valid =
+ args->single_newmv_valid[this_ref_mv_idx][ref];
+ const int row_diff = abs(fmv_stack->row - fmv_cand->as_fullmv.row);
+ const int col_diff = abs(fmv_stack->col - fmv_cand->as_fullmv.col);
- if (row <= 1 && col <= 1) {
- skip_cand_mv = 1;
- break;
+ if (!this_newmv_valid) continue;
+
+ if (cpi->sf.mv_sf.skip_fullpel_search_using_startmv >= 2) {
+ // Prunes the current start_mv candidate, if the absolute mv
+ // difference of both row and column are <= 1.
+ if (row_diff <= 1 && col_diff <= 1) {
+ skip_cand_mv = 1;
+ break;
+ }
+ } else if (cpi->sf.mv_sf.skip_fullpel_search_using_startmv >= 1) {
+ // Prunes the current start_mv candidate, if the sum of the absolute
+ // mv difference of row and column is <= 1.
+ if (row_diff + col_diff <= 1) {
+ skip_cand_mv = 1;
+ break;
+ }
}
}
if (skip_cand_mv) {
+ // Ensure at least one full-pel motion search is not pruned.
+ assert(mbmi->ref_mv_idx != 0);
// Mark the candidate mv as invalid so that motion search gets skipped.
cand[cand_idx].fmv.as_int = INVALID_MV;
} else {
- // Store start mv candidate of full-pel search in the mv stack (except
- // last ref_mv_idx).
+ // Store start_mv candidate and corresponding ref_mv_idx of full-pel
+ // search in the mv stack (except last ref_mv_idx).
if (mbmi->ref_mv_idx != MAX_REF_MV_SEARCH - 1) {
args->start_mv_stack[args->start_mv_cnt] = fmv_cand->as_fullmv;
+ args->ref_mv_idx_stack[args->start_mv_cnt] = mbmi->ref_mv_idx;
args->start_mv_cnt++;
assert(args->start_mv_cnt <= (MAX_REF_MV_SEARCH - 1) * 2);
}
@@ -246,8 +265,6 @@
// Allow more mesh searches for screen content type on the ARF.
const int fine_search_interval = use_fine_search_interval(cpi);
FULLPEL_MOTION_SEARCH_PARAMS full_ms_params;
- av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize, &ref_mv,
- src_search_site_cfg, fine_search_interval);
switch (mbmi->motion_mode) {
case SIMPLE_TRANSLATION: {
@@ -259,7 +276,11 @@
if (smv.as_int == INVALID_MV) continue;
- int thissme =
+ av1_make_default_fullpel_ms_params(
+ &full_ms_params, cpi, x, bsize, &ref_mv, smv.as_fullmv,
+ src_search_site_cfg, fine_search_interval);
+
+ const int thissme =
av1_full_pixel_search(smv.as_fullmv, &full_ms_params, step_param,
cond_cost_list(cpi, cost_list), &this_best_mv,
&this_second_best_mv);
@@ -275,6 +296,10 @@
}
} break;
case OBMC_CAUSAL:
+ av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize,
+ &ref_mv, start_mv, src_search_site_cfg,
+ fine_search_interval);
+
bestsme = av1_obmc_full_pixel_search(start_mv, &full_ms_params,
step_param, &best_mv->as_fullmv);
break;
@@ -496,7 +521,7 @@
int av1_joint_motion_search(const AV1_COMP *cpi, MACROBLOCK *x,
BLOCK_SIZE bsize, int_mv *cur_mv,
const uint8_t *mask, int mask_stride, int *rate_mv,
- int allow_second_mv) {
+ int allow_second_mv, int joint_me_num_refine_iter) {
const AV1_COMMON *const cm = &cpi->common;
const int num_planes = av1_num_planes(cm);
const int pw = block_size_wide[bsize];
@@ -536,7 +561,7 @@
// Allow joint search multiple times iteratively for each reference frame
// and break out of the search loop if it couldn't find a better mv.
- for (ite = 0; ite < 4; ite++) {
+ for (ite = 0; ite < (2 * joint_me_num_refine_iter); ite++) {
struct buf_2d ref_yv12[2];
int bestsme = INT_MAX;
int id = ite % 2; // Even iterations search in the first reference frame,
@@ -599,16 +624,16 @@
const SEARCH_METHODS search_method = cpi->sf.mv_sf.search_method;
const search_site_config *src_search_sites =
av1_get_search_site_config(cpi, x, search_method);
+ // Use the mv result from the single mode as mv predictor.
+ const FULLPEL_MV start_fullmv = get_fullmv_from_mv(&cur_mv[id].as_mv);
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize,
- &ref_mv[id].as_mv, src_search_sites,
+ &ref_mv[id].as_mv, start_fullmv,
+ src_search_sites,
/*fine_search_interval=*/0);
av1_set_ms_compound_refs(&full_ms_params.ms_buffers, second_pred, mask,
mask_stride, id);
- // Use the mv result from the single mode as mv predictor.
- const FULLPEL_MV start_fullmv = get_fullmv_from_mv(&cur_mv[id].as_mv);
-
// Small-range full-pixel motion search.
if (!cpi->sf.mv_sf.disable_extensive_joint_motion_search &&
mbmi->interinter_comp.type != COMPOUND_WEDGE) {
@@ -737,7 +762,11 @@
}
const int mi_row = xd->mi_row;
const int mi_col = xd->mi_col;
- av1_setup_pre_planes(xd, ref_idx, scaled_ref_frame, mi_row, mi_col, NULL,
+ // The index below needs to be 0 instead of ref_idx since we assume the
+ // 0th slot to be used for subsequent searches. Note that the ref_idx
+ // reference buffer has been copied to the 0th slot in the code above.
+ // Now we need to swap the reference frame for the 0th slot.
+ av1_setup_pre_planes(xd, 0, scaled_ref_frame, mi_row, mi_col, NULL,
num_planes);
}
@@ -749,24 +778,24 @@
const SEARCH_METHODS search_method = cpi->sf.mv_sf.search_method;
const search_site_config *src_search_sites =
av1_get_search_site_config(cpi, x, search_method);
+ // Use the mv result from the single mode as mv predictor.
+ const FULLPEL_MV start_fullmv = get_fullmv_from_mv(this_mv);
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize,
- &ref_mv.as_mv, src_search_sites,
+ &ref_mv.as_mv, start_fullmv,
+ src_search_sites,
/*fine_search_interval=*/0);
av1_set_ms_compound_refs(&full_ms_params.ms_buffers, second_pred, mask,
mask_stride, ref_idx);
- // Use the mv result from the single mode as mv predictor.
- const FULLPEL_MV start_fullmv = get_fullmv_from_mv(this_mv);
-
// Small-range full-pixel motion search.
bestsme = av1_full_pixel_search(start_fullmv, &full_ms_params, 5, NULL,
&best_mv.as_fullmv, NULL);
if (scaled_ref_frame) {
- // Swap back the original buffers for subpel motion search.
+ // Swap back the original buffers of the 0th slot for subpel motion search.
for (int i = 0; i < num_planes; i++) {
- xd->plane[i].pre[ref_idx] = backup_yv12[i];
+ xd->plane[i].pre[0] = backup_yv12[i];
}
}
@@ -883,8 +912,13 @@
av1_compound_single_motion_search_interinter(cpi, x, bsize, tmp_mv, mask,
mask_stride, rate_mv, which);
} else if (which == 2) {
+ const int joint_me_num_refine_iter =
+ cpi->sf.inter_sf.enable_fast_compound_mode_search == 2
+ ? REDUCED_JOINT_ME_REFINE_ITER
+ : NUM_JOINT_ME_REFINE_ITER;
av1_joint_motion_search(cpi, x, bsize, tmp_mv, mask, mask_stride, rate_mv,
- !cpi->sf.mv_sf.disable_second_mv);
+ !cpi->sf.mv_sf.disable_second_mv,
+ joint_me_num_refine_iter);
}
}
@@ -971,7 +1005,8 @@
const search_site_config *src_search_sites =
av1_get_search_site_config(cpi, x, search_method);
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize, &ref_mv,
- src_search_sites, fine_search_interval);
+ start_mv, src_search_sites,
+ fine_search_interval);
var = av1_full_pixel_search(start_mv, &full_ms_params, step_param,
cond_cost_list(cpi, cost_list),
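With the new parameter, the joint search loop runs 2 * joint_me_num_refine_iter iterations: NUM_JOINT_ME_REFINE_ITER (2) preserves the previous fixed bound of four, while REDUCED_JOINT_ME_REFINE_ITER (1) halves the work on the fast compound search path. A sketch of the alternation, with the per-reference refinement stubbed out:

/* Stub for the per-reference refinement; in the real code this is the
 * small-range full-pel plus subpel search for one reference. Returns
 * nonzero if the MV for that reference improved. */
static int refine_one_ref(int ref_id) {
  (void)ref_id;
  return 0;
}

/* Even iterations refine the first reference, odd iterations the second;
 * the caller-supplied iteration budget bounds the total work. */
static void joint_refine(int joint_me_num_refine_iter) {
  for (int ite = 0; ite < 2 * joint_me_num_refine_iter; ++ite) {
    const int id = ite % 2;
    if (!refine_one_ref(id)) break; /* Stop once a pass fails to improve. */
  }
}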
diff --git a/av1/encoder/motion_search_facade.h b/av1/encoder/motion_search_facade.h
index 4d76287..d2996bc 100644
--- a/av1/encoder/motion_search_facade.h
+++ b/av1/encoder/motion_search_facade.h
@@ -18,6 +18,8 @@
extern "C" {
#endif
+#define NUM_JOINT_ME_REFINE_ITER 2
+#define REDUCED_JOINT_ME_REFINE_ITER 1
// TODO(any): rename this struct to something else. There is already another
// struct called inter_modes_info, which makes this terribly confusing.
typedef struct {
@@ -38,7 +40,7 @@
int av1_joint_motion_search(const AV1_COMP *cpi, MACROBLOCK *x,
BLOCK_SIZE bsize, int_mv *cur_mv,
const uint8_t *mask, int mask_stride, int *rate_mv,
- int allow_second_mv);
+ int allow_second_mv, int joint_me_num_refine_iter);
int av1_interinter_compound_motion_search(const AV1_COMP *const cpi,
MACROBLOCK *x,
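The start-MV pruning hunk earlier in motion_search_facade.c distinguishes two levels of skip_fullpel_search_using_startmv: level 2 prunes a candidate whose row and column each differ by at most 1 from an already-searched start MV (Chebyshev distance <= 1), while level 1 prunes only when the sum of the differences is at most 1 (Manhattan distance <= 1). A hedged sketch of that decision, with a stand-in MV type:

#include <stdlib.h>

typedef struct { int row, col; } FULLPEL_MV_SKETCH; /* Stand-in type. */

/* Decide whether a candidate start MV duplicates a previously searched one,
 * under the two pruning levels described above. */
static int prune_start_mv(const FULLPEL_MV_SKETCH *prev,
                          const FULLPEL_MV_SKETCH *cand, int prune_level) {
  const int row_diff = abs(prev->row - cand->row);
  const int col_diff = abs(prev->col - cand->col);
  if (prune_level >= 2) {
    /* Aggressive: prune anywhere in the 3x3 neighborhood of prev. */
    return row_diff <= 1 && col_diff <= 1;
  } else if (prune_level >= 1) {
    /* Conservative: prune only immediate horizontal/vertical neighbors. */
    return row_diff + col_diff <= 1;
  }
  return 0;
}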
diff --git a/av1/encoder/nonrd_opt.c b/av1/encoder/nonrd_opt.c
new file mode 100644
index 0000000..651ca43
--- /dev/null
+++ b/av1/encoder/nonrd_opt.c
@@ -0,0 +1,933 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "config/aom_dsp_rtcd.h"
+
+#include "av1/common/reconinter.h"
+
+#include "av1/encoder/encodemv.h"
+#include "av1/encoder/nonrd_opt.h"
+#include "av1/encoder/rdopt.h"
+
+static const SCAN_ORDER av1_fast_idtx_scan_order_16x16 = {
+ av1_fast_idtx_scan_16x16, av1_fast_idtx_iscan_16x16
+};
+
+#define DECLARE_BLOCK_YRD_BUFFERS() \
+ DECLARE_ALIGNED(64, tran_low_t, dqcoeff_buf[16 * 16]); \
+ DECLARE_ALIGNED(64, tran_low_t, qcoeff_buf[16 * 16]); \
+ DECLARE_ALIGNED(64, tran_low_t, coeff_buf[16 * 16]); \
+ uint16_t eob[1];
+
+#define DECLARE_BLOCK_YRD_VARS() \
+ /* When is_tx_8x8_dual_applicable is true, we compute the txfm for the \
+ * entire bsize and write macroblock_plane::coeff. So low_coeff is kept \
+ * as a non-const so we can reassign it to macroblock_plane::coeff. */ \
+ int16_t *low_coeff = (int16_t *)coeff_buf; \
+ int16_t *const low_qcoeff = (int16_t *)qcoeff_buf; \
+ int16_t *const low_dqcoeff = (int16_t *)dqcoeff_buf; \
+ const int diff_stride = bw;
+
+#define DECLARE_LOOP_VARS_BLOCK_YRD() \
+ const int16_t *src_diff = &p->src_diff[(r * diff_stride + c) << 2];
+
+static AOM_FORCE_INLINE void update_yrd_loop_vars(
+ MACROBLOCK *x, int *skippable, int step, int ncoeffs,
+ int16_t *const low_coeff, int16_t *const low_qcoeff,
+ int16_t *const low_dqcoeff, RD_STATS *this_rdc, int *eob_cost,
+ int tx_blk_id) {
+ const int is_txfm_skip = (ncoeffs == 0);
+ *skippable &= is_txfm_skip;
+ x->txfm_search_info.blk_skip[tx_blk_id] = is_txfm_skip;
+ *eob_cost += get_msb(ncoeffs + 1);
+ if (ncoeffs == 1)
+ this_rdc->rate += (int)abs(low_qcoeff[0]);
+ else if (ncoeffs > 1)
+ this_rdc->rate += aom_satd_lp(low_qcoeff, step << 4);
+
+ this_rdc->dist += av1_block_error_lp(low_coeff, low_dqcoeff, step << 4) >> 2;
+}
+
+static INLINE void aom_process_hadamard_lp_8x16(MACROBLOCK *x,
+ int max_blocks_high,
+ int max_blocks_wide,
+ int num_4x4_w, int step,
+ int block_step) {
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ const int bw = 4 * num_4x4_w;
+ const int num_4x4 = AOMMIN(num_4x4_w, max_blocks_wide);
+ int block = 0;
+
+ for (int r = 0; r < max_blocks_high; r += block_step) {
+ for (int c = 0; c < num_4x4; c += 2 * block_step) {
+ const int16_t *src_diff = &p->src_diff[(r * bw + c) << 2];
+ int16_t *low_coeff = (int16_t *)p->coeff + BLOCK_OFFSET(block);
+ aom_hadamard_lp_8x8_dual(src_diff, (ptrdiff_t)bw, low_coeff);
+ block += 2 * step;
+ }
+ }
+}
+
+#if CONFIG_AV1_HIGHBITDEPTH
+#define DECLARE_BLOCK_YRD_HBD_VARS() \
+ tran_low_t *const coeff = coeff_buf; \
+ tran_low_t *const qcoeff = qcoeff_buf; \
+ tran_low_t *const dqcoeff = dqcoeff_buf;
+
+static AOM_FORCE_INLINE void update_yrd_loop_vars_hbd(
+ MACROBLOCK *x, int *skippable, int step, int ncoeffs,
+ tran_low_t *const coeff, tran_low_t *const qcoeff,
+ tran_low_t *const dqcoeff, RD_STATS *this_rdc, int *eob_cost,
+ int tx_blk_id) {
+ const MACROBLOCKD *xd = &x->e_mbd;
+ const int is_txfm_skip = (ncoeffs == 0);
+ *skippable &= is_txfm_skip;
+ x->txfm_search_info.blk_skip[tx_blk_id] = is_txfm_skip;
+ *eob_cost += get_msb(ncoeffs + 1);
+
+ int64_t dummy;
+ if (ncoeffs == 1)
+ this_rdc->rate += (int)abs(qcoeff[0]);
+ else if (ncoeffs > 1)
+ this_rdc->rate += aom_satd(qcoeff, step << 4);
+ this_rdc->dist +=
+ av1_highbd_block_error(coeff, dqcoeff, step << 4, &dummy, xd->bd) >> 2;
+}
+#endif
+
+/*!\brief Calculates RD Cost using Hadamard transform.
+ *
+ * \ingroup nonrd_mode_search
+ * \callgraph
+ * \callergraph
+ * Calculates RD Cost using Hadamard transform. For low bit depth this function
+ * uses a low-precision set of functions (16-bit), and 32-bit for high bit depth.
+ * \param[in] x Pointer to structure holding all the data for
+ the current macroblock
+ * \param[in] this_rdc Pointer to calculated RD Cost
+ * \param[in] skippable Pointer to a flag indicating possible tx skip
+ * \param[in] bsize Current block size
+ * \param[in] tx_size Transform size
+ * \param[in] is_inter_mode Flag to indicate inter mode
+ *
+ * \remark Nothing is returned. Instead, the calculated RD cost is placed in
+ * \c this_rdc. The \c skippable flag is set if there are no non-zero quantized
+ * coefficients for the Hadamard transform.
+ */
+void av1_block_yrd(MACROBLOCK *x, RD_STATS *this_rdc, int *skippable,
+ BLOCK_SIZE bsize, TX_SIZE tx_size) {
+ MACROBLOCKD *xd = &x->e_mbd;
+ const struct macroblockd_plane *pd = &xd->plane[AOM_PLANE_Y];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ assert(bsize < BLOCK_SIZES_ALL);
+ const int num_4x4_w = mi_size_wide[bsize];
+ const int num_4x4_h = mi_size_high[bsize];
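+ // Number of 4x4 sub-blocks covered by one transform block (1, 4 or 16).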
+ const int step = 1 << (tx_size << 1);
+ const int block_step = (1 << tx_size);
+ const int row_step = step * num_4x4_w >> tx_size;
+ int block = 0;
+ const int max_blocks_wide =
+ num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 : xd->mb_to_right_edge >> 5);
+ const int max_blocks_high =
+ num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 : xd->mb_to_bottom_edge >> 5);
+ int eob_cost = 0;
+ const int bw = 4 * num_4x4_w;
+ const int bh = 4 * num_4x4_h;
+ const int use_hbd = is_cur_buf_hbd(xd);
+ int num_blk_skip_w = num_4x4_w;
+
+#if CONFIG_AV1_HIGHBITDEPTH
+ if (use_hbd) {
+ aom_highbd_subtract_block(bh, bw, p->src_diff, bw, p->src.buf,
+ p->src.stride, pd->dst.buf, pd->dst.stride);
+ } else {
+ aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
+ pd->dst.buf, pd->dst.stride);
+ }
+#else
+ aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
+ pd->dst.buf, pd->dst.stride);
+#endif
+
+ // Keep the intermediate value on the stack here. Writing directly to
+ // skippable causes speed regression due to load-and-store issues in
+ // update_yrd_loop_vars.
+ int temp_skippable = 1;
+ this_rdc->dist = 0;
+ this_rdc->rate = 0;
+ // For block sizes 8x16 or above, the Hadamard txfm of two adjacent 8x8
+ // blocks can be done in one function call. Hence the Hadamard txfm call is
+ // abstracted here for those cases.
+ int is_tx_8x8_dual_applicable =
+ (tx_size == TX_8X8 && block_size_wide[bsize] >= 16 &&
+ block_size_high[bsize] >= 8);
+
+#if CONFIG_AV1_HIGHBITDEPTH
+ // As of now, the dual implementation of the Hadamard txfm is only available
+ // for low bitdepth.
+ if (use_hbd) is_tx_8x8_dual_applicable = 0;
+#endif
+
+ if (is_tx_8x8_dual_applicable) {
+ aom_process_hadamard_lp_8x16(x, max_blocks_high, max_blocks_wide, num_4x4_w,
+ step, block_step);
+ }
+
+ const SCAN_ORDER *const scan_order = &av1_scan_orders[tx_size][DCT_DCT];
+ DECLARE_BLOCK_YRD_BUFFERS()
+ DECLARE_BLOCK_YRD_VARS()
+#if CONFIG_AV1_HIGHBITDEPTH
+ DECLARE_BLOCK_YRD_HBD_VARS()
+#else
+ (void)use_hbd;
+#endif
+
+ // Keep track of the row and column of the blocks we use so that we know
+ // if we are in the unrestricted motion border.
+ for (int r = 0; r < max_blocks_high; r += block_step) {
+ for (int c = 0, s = 0; c < max_blocks_wide; c += block_step, s += step) {
+ DECLARE_LOOP_VARS_BLOCK_YRD()
+
+ switch (tx_size) {
+#if CONFIG_AV1_HIGHBITDEPTH
+ case TX_16X16:
+ if (use_hbd) {
+ aom_hadamard_16x16(src_diff, diff_stride, coeff);
+ av1_quantize_fp(coeff, 16 * 16, p->zbin_QTX, p->round_fp_QTX,
+ p->quant_fp_QTX, p->quant_shift_QTX, qcoeff,
+ dqcoeff, p->dequant_QTX, eob,
+ // default_scan_fp_16x16_transpose and
+ // av1_default_iscan_fp_16x16_transpose have to be
+ // used together.
+ default_scan_fp_16x16_transpose,
+ av1_default_iscan_fp_16x16_transpose);
+ } else {
+ aom_hadamard_lp_16x16(src_diff, diff_stride, low_coeff);
+ av1_quantize_lp(low_coeff, 16 * 16, p->round_fp_QTX,
+ p->quant_fp_QTX, low_qcoeff, low_dqcoeff,
+ p->dequant_QTX, eob,
+ // default_scan_lp_16x16_transpose and
+ // av1_default_iscan_lp_16x16_transpose have to be
+ // used together.
+ default_scan_lp_16x16_transpose,
+ av1_default_iscan_lp_16x16_transpose);
+ }
+ break;
+ case TX_8X8:
+ if (use_hbd) {
+ aom_hadamard_8x8(src_diff, diff_stride, coeff);
+ av1_quantize_fp(
+ coeff, 8 * 8, p->zbin_QTX, p->round_fp_QTX, p->quant_fp_QTX,
+ p->quant_shift_QTX, qcoeff, dqcoeff, p->dequant_QTX, eob,
+ default_scan_8x8_transpose, av1_default_iscan_8x8_transpose);
+ } else {
+ if (is_tx_8x8_dual_applicable) {
+ // The coeffs are pre-computed for the whole block, so re-assign
+ // low_coeff to the appropriate location.
+ const int block_offset = BLOCK_OFFSET(block + s);
+ low_coeff = (int16_t *)p->coeff + block_offset;
+ } else {
+ aom_hadamard_lp_8x8(src_diff, diff_stride, low_coeff);
+ }
+ av1_quantize_lp(
+ low_coeff, 8 * 8, p->round_fp_QTX, p->quant_fp_QTX, low_qcoeff,
+ low_dqcoeff, p->dequant_QTX, eob,
+ // default_scan_8x8_transpose and
+ // av1_default_iscan_8x8_transpose have to be used together.
+ default_scan_8x8_transpose, av1_default_iscan_8x8_transpose);
+ }
+ break;
+ default:
+ assert(tx_size == TX_4X4);
+ // In the tx_size=4x4 case, aom_fdct4x4 and aom_fdct4x4_lp generate the
+ // normal coefficient order, so we don't need to change the scan
+ // order here.
+ if (use_hbd) {
+ aom_fdct4x4(src_diff, coeff, diff_stride);
+ av1_quantize_fp(coeff, 4 * 4, p->zbin_QTX, p->round_fp_QTX,
+ p->quant_fp_QTX, p->quant_shift_QTX, qcoeff,
+ dqcoeff, p->dequant_QTX, eob, scan_order->scan,
+ scan_order->iscan);
+ } else {
+ aom_fdct4x4_lp(src_diff, low_coeff, diff_stride);
+ av1_quantize_lp(low_coeff, 4 * 4, p->round_fp_QTX, p->quant_fp_QTX,
+ low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
+ scan_order->scan, scan_order->iscan);
+ }
+ break;
+#else
+ case TX_16X16:
+ aom_hadamard_lp_16x16(src_diff, diff_stride, low_coeff);
+ av1_quantize_lp(low_coeff, 16 * 16, p->round_fp_QTX, p->quant_fp_QTX,
+ low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
+ default_scan_lp_16x16_transpose,
+ av1_default_iscan_lp_16x16_transpose);
+ break;
+ case TX_8X8:
+ if (is_tx_8x8_dual_applicable) {
+ // The coeffs are pre-computed for the whole block, so re-assign
+ // low_coeff to the appropriate location.
+ const int block_offset = BLOCK_OFFSET(block + s);
+ low_coeff = (int16_t *)p->coeff + block_offset;
+ } else {
+ aom_hadamard_lp_8x8(src_diff, diff_stride, low_coeff);
+ }
+ av1_quantize_lp(low_coeff, 8 * 8, p->round_fp_QTX, p->quant_fp_QTX,
+ low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
+ default_scan_8x8_transpose,
+ av1_default_iscan_8x8_transpose);
+ break;
+ default:
+ aom_fdct4x4_lp(src_diff, low_coeff, diff_stride);
+ av1_quantize_lp(low_coeff, 4 * 4, p->round_fp_QTX, p->quant_fp_QTX,
+ low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
+ scan_order->scan, scan_order->iscan);
+ break;
+#endif
+ }
+ assert(*eob <= 1024);
+#if CONFIG_AV1_HIGHBITDEPTH
+ if (use_hbd)
+ update_yrd_loop_vars_hbd(x, &temp_skippable, step, *eob, coeff, qcoeff,
+ dqcoeff, this_rdc, &eob_cost,
+ r * num_blk_skip_w + c);
+ else
+#endif
+ update_yrd_loop_vars(x, &temp_skippable, step, *eob, low_coeff,
+ low_qcoeff, low_dqcoeff, this_rdc, &eob_cost,
+ r * num_blk_skip_w + c);
+ }
+ block += row_step;
+ }
+
+ this_rdc->skip_txfm = *skippable = temp_skippable;
+ if (this_rdc->sse < INT64_MAX) {
+ this_rdc->sse = (this_rdc->sse << 6) >> 2;
+ if (temp_skippable) {
+ this_rdc->dist = this_rdc->sse;
+ return;
+ }
+ }
+
+ // If skippable is set, rate gets clobbered later.
+ this_rdc->rate <<= (2 + AV1_PROB_COST_SHIFT);
+ this_rdc->rate += (eob_cost << AV1_PROB_COST_SHIFT);
+}
+
+// Explicitly enumerate the cases so the compiler can generate SIMD for the
+// function. According to the disassembler, gcc generates SSE code for each of
+// the possible block sizes. The hottest case is tx_width 16, which takes up
+// about 8% of the self cycles of av1_nonrd_pick_inter_mode_sb. Since
+// av1_nonrd_pick_inter_mode_sb takes up about 3% of total encoding time, the
+// potential gain from an AVX2 optimization is only 3% * 8% = 0.24% of total
+// encoding time.
+static AOM_INLINE void scale_square_buf_vals(int16_t *dst, int tx_width,
+ const int16_t *src,
+ int src_stride) {
+#define DO_SCALING \
+ do { \
+ for (int idy = 0; idy < tx_width; ++idy) { \
+ for (int idx = 0; idx < tx_width; ++idx) { \
+ dst[idy * tx_width + idx] = src[idy * src_stride + idx] * 8; \
+ } \
+ } \
+ } while (0)
+
+ if (tx_width == 4) {
+ DO_SCALING;
+ } else if (tx_width == 8) {
+ DO_SCALING;
+ } else if (tx_width == 16) {
+ DO_SCALING;
+ } else {
+ assert(0);
+ }
+
+#undef DO_SCALING
+}
+
+/*!\brief Calculates RD Cost when the block uses Identity transform.
+ * Note that this function is only for low bit depth encoding, since it
+ * is called in real-time mode for now, which sets high bit depth to 0:
+ * -DCONFIG_AV1_HIGHBITDEPTH=0
+ *
+ * \ingroup nonrd_mode_search
+ * \callgraph
+ * \callergraph
+ * Calculates RD Cost. For low bit depth this function uses a low-precision
+ * set of functions (16-bit), and 32-bit for high bit depth.
+ * \param[in] x Pointer to structure holding all the data for
+ the current macroblock
+ * \param[in] pred_buf Pointer to the prediction buffer
+ * \param[in] pred_stride Stride for the prediction buffer
+ * \param[in] this_rdc Pointer to calculated RD Cost
+ * \param[in] skippable Pointer to a flag indicating possible tx skip
+ * \param[in] bsize Current block size
+ * \param[in] tx_size Transform size
+ *
+ * \remark Nothing is returned. Instead, calculated RD cost is placed to
+ * \c this_rdc. \c skippable flag is set if all coefficients are zero.
+ */
+void av1_block_yrd_idtx(MACROBLOCK *x, const uint8_t *const pred_buf,
+ int pred_stride, RD_STATS *this_rdc, int *skippable,
+ BLOCK_SIZE bsize, TX_SIZE tx_size) {
+ MACROBLOCKD *xd = &x->e_mbd;
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ assert(bsize < BLOCK_SIZES_ALL);
+ const int num_4x4_w = mi_size_wide[bsize];
+ const int num_4x4_h = mi_size_high[bsize];
+ const int step = 1 << (tx_size << 1);
+ const int block_step = (1 << tx_size);
+ const int max_blocks_wide =
+ num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 : xd->mb_to_right_edge >> 5);
+ const int max_blocks_high =
+ num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 : xd->mb_to_bottom_edge >> 5);
+ int eob_cost = 0;
+ const int bw = 4 * num_4x4_w;
+ const int bh = 4 * num_4x4_h;
+ const int num_blk_skip_w = num_4x4_w;
+ // Keep the intermediate value on the stack here. Writing directly to
+ // skippable causes speed regression due to load-and-store issues in
+ // update_yrd_loop_vars.
+ int temp_skippable = 1;
+ int tx_wd = 0;
+ const SCAN_ORDER *scan_order = NULL;
+ switch (tx_size) {
+ case TX_64X64:
+ assert(0); // Not implemented
+ break;
+ case TX_32X32:
+ assert(0); // Not used
+ break;
+ case TX_16X16:
+ scan_order = &av1_fast_idtx_scan_order_16x16;
+ tx_wd = 16;
+ break;
+ case TX_8X8:
+ scan_order = &av1_fast_idtx_scan_order_8x8;
+ tx_wd = 8;
+ break;
+ default:
+ assert(tx_size == TX_4X4);
+ scan_order = &av1_fast_idtx_scan_order_4x4;
+ tx_wd = 4;
+ break;
+ }
+ assert(scan_order != NULL);
+
+ this_rdc->dist = 0;
+ this_rdc->rate = 0;
+ aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
+ pred_buf, pred_stride);
+ // Keep track of the row and column of the blocks we use so that we know
+ // if we are in the unrestricted motion border.
+ DECLARE_BLOCK_YRD_BUFFERS()
+ DECLARE_BLOCK_YRD_VARS()
+ for (int r = 0; r < max_blocks_high; r += block_step) {
+ for (int c = 0, s = 0; c < max_blocks_wide; c += block_step, s += step) {
+ DECLARE_LOOP_VARS_BLOCK_YRD()
+ scale_square_buf_vals(low_coeff, tx_wd, src_diff, diff_stride);
+ av1_quantize_lp(low_coeff, tx_wd * tx_wd, p->round_fp_QTX,
+ p->quant_fp_QTX, low_qcoeff, low_dqcoeff, p->dequant_QTX,
+ eob, scan_order->scan, scan_order->iscan);
+ assert(*eob <= 1024);
+ update_yrd_loop_vars(x, &temp_skippable, step, *eob, low_coeff,
+ low_qcoeff, low_dqcoeff, this_rdc, &eob_cost,
+ r * num_blk_skip_w + c);
+ }
+ }
+ this_rdc->skip_txfm = *skippable = temp_skippable;
+ if (this_rdc->sse < INT64_MAX) {
+ this_rdc->sse = (this_rdc->sse << 6) >> 2;
+ if (temp_skippable) {
+ this_rdc->dist = this_rdc->sse;
+ return;
+ }
+ }
+ // If skippable is set, rate gets clobbered later.
+ this_rdc->rate <<= (2 + AV1_PROB_COST_SHIFT);
+ this_rdc->rate += (eob_cost << AV1_PROB_COST_SHIFT);
+}
+
+int64_t av1_model_rd_for_sb_uv(AV1_COMP *cpi, BLOCK_SIZE plane_bsize,
+ MACROBLOCK *x, MACROBLOCKD *xd,
+ RD_STATS *this_rdc, int start_plane,
+ int stop_plane) {
+ // Note our transform coeffs are 8 times an orthogonal transform.
+ // Hence the quantizer step is also scaled by 8. To get the effective
+ // quantizer we need to divide by 8 before sending to the modeling function.
+ unsigned int sse;
+ int rate;
+ int64_t dist;
+ int plane;
+ int64_t tot_sse = 0;
+
+ this_rdc->rate = 0;
+ this_rdc->dist = 0;
+ this_rdc->skip_txfm = 0;
+
+ for (plane = start_plane; plane <= stop_plane; ++plane) {
+ struct macroblock_plane *const p = &x->plane[plane];
+ struct macroblockd_plane *const pd = &xd->plane[plane];
+ const uint32_t dc_quant = p->dequant_QTX[0];
+ const uint32_t ac_quant = p->dequant_QTX[1];
+ const BLOCK_SIZE bs = plane_bsize;
+ unsigned int var;
+ if (!x->color_sensitivity[COLOR_SENS_IDX(plane)]) continue;
+
+ var = cpi->ppi->fn_ptr[bs].vf(p->src.buf, p->src.stride, pd->dst.buf,
+ pd->dst.stride, &sse);
+ assert(sse >= var);
+ tot_sse += sse;
+
+ av1_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bs],
+ dc_quant >> 3, &rate, &dist);
+
+ this_rdc->rate += rate >> 1;
+ this_rdc->dist += dist << 3;
+
+ av1_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bs], ac_quant >> 3,
+ &rate, &dist);
+
+ this_rdc->rate += rate;
+ this_rdc->dist += dist << 4;
+ }
+
+ if (this_rdc->rate == 0) {
+ this_rdc->skip_txfm = 1;
+ }
+
+ if (RDCOST(x->rdmult, this_rdc->rate, this_rdc->dist) >=
+ RDCOST(x->rdmult, 0, tot_sse << 4)) {
+ this_rdc->rate = 0;
+ this_rdc->dist = tot_sse << 4;
+ this_rdc->skip_txfm = 1;
+ }
+
+ return tot_sse;
+}
+
+static void compute_intra_yprediction(const AV1_COMMON *cm,
+ PREDICTION_MODE mode, BLOCK_SIZE bsize,
+ MACROBLOCK *x, MACROBLOCKD *xd) {
+ const SequenceHeader *seq_params = cm->seq_params;
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ uint8_t *const src_buf_base = p->src.buf;
+ uint8_t *const dst_buf_base = pd->dst.buf;
+ const int src_stride = p->src.stride;
+ const int dst_stride = pd->dst.stride;
+ int plane = 0;
+ int row, col;
+ // Block and transform sizes, in number of 4x4 blocks log 2 ("*_b"):
+ // 4x4=0, 8x8=2, 16x16=4, 32x32=6, 64x64=8.
+ // The transform size varies per plane; look it up in a common way.
+ const TX_SIZE tx_size = max_txsize_lookup[bsize];
+ const BLOCK_SIZE plane_bsize =
+ get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
+ // If mb_to_right_edge is < 0 we are in a situation in which
+ // the current block size extends into the UMV and we won't
+ // visit the sub blocks that are wholly within the UMV.
+ const int max_blocks_wide = max_block_wide(xd, plane_bsize, plane);
+ const int max_blocks_high = max_block_high(xd, plane_bsize, plane);
+ // Keep track of the row and column of the blocks we use so that we know
+ // if we are in the unrestricted motion border.
+ for (row = 0; row < max_blocks_high; row += (1 << tx_size)) {
+ // Skip visiting the sub blocks that are wholly within the UMV.
+ for (col = 0; col < max_blocks_wide; col += (1 << tx_size)) {
+ p->src.buf = &src_buf_base[4 * (row * (int64_t)src_stride + col)];
+ pd->dst.buf = &dst_buf_base[4 * (row * (int64_t)dst_stride + col)];
+ av1_predict_intra_block(
+ xd, seq_params->sb_size, seq_params->enable_intra_edge_filter,
+ block_size_wide[bsize], block_size_high[bsize], tx_size, mode, 0, 0,
+ FILTER_INTRA_MODES, pd->dst.buf, dst_stride, pd->dst.buf, dst_stride,
+ 0, 0, plane);
+ }
+ }
+ p->src.buf = src_buf_base;
+ pd->dst.buf = dst_buf_base;
+}
+
+// Checks whether the intra mode needs to be pruned based on the
+// 'intra_y_mode_bsize_mask_nrd' and 'prune_hv_pred_modes_using_src_sad'
+// speed features.
+static INLINE bool is_prune_intra_mode(
+ AV1_COMP *cpi, int mode_index, int force_intra_check, BLOCK_SIZE bsize,
+ uint8_t segment_id, SOURCE_SAD source_sad_nonrd,
+ uint8_t color_sensitivity[MAX_MB_PLANE - 1]) {
+ const PREDICTION_MODE this_mode = intra_mode_list[mode_index];
+ if (mode_index > 2 || force_intra_check == 0) {
+ if (!((1 << this_mode) & cpi->sf.rt_sf.intra_y_mode_bsize_mask_nrd[bsize]))
+ return true;
+
+ if (this_mode == DC_PRED) return false;
+
+ if (!cpi->sf.rt_sf.prune_hv_pred_modes_using_src_sad) return false;
+
+ const bool has_color_sensitivity =
+ color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] &&
+ color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)];
+ if (has_color_sensitivity &&
+ (cpi->rc.frame_source_sad > 1.1 * cpi->rc.avg_source_sad ||
+ cyclic_refresh_segment_id_boosted(segment_id) ||
+ source_sad_nonrd > kMedSad))
+ return false;
+
+ return true;
+ }
+ return false;
+}
+
+/*!\brief Estimation of RD cost of an intra mode for Non-RD optimized case.
+ *
+ * \ingroup nonrd_mode_search
+ * \callgraph
+ * \callergraph
+ * Calculates RD Cost for an intra mode for a single TX block using Hadamard
+ * transform.
+ * \param[in] plane Color plane
+ * \param[in] block Index of a TX block in a prediction block
+ * \param[in] row Row of a current TX block
+ * \param[in] col Column of a current TX block
+ * \param[in] plane_bsize Block size of a current prediction block
+ * \param[in] tx_size Transform size
+ * \param[in] arg Pointer to a structure that holds parameters
+ * for intra mode search
+ *
+ * \remark Nothing is returned. Instead, the rate and distortion of this TX
+ * block are accumulated in \c args->rdc
+ */
+void av1_estimate_block_intra(int plane, int block, int row, int col,
+ BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
+ void *arg) {
+ struct estimate_block_intra_args *const args = arg;
+ AV1_COMP *const cpi = args->cpi;
+ AV1_COMMON *const cm = &cpi->common;
+ MACROBLOCK *const x = args->x;
+ MACROBLOCKD *const xd = &x->e_mbd;
+ struct macroblock_plane *const p = &x->plane[plane];
+ struct macroblockd_plane *const pd = &xd->plane[plane];
+ const BLOCK_SIZE bsize_tx = txsize_to_bsize[tx_size];
+ uint8_t *const src_buf_base = p->src.buf;
+ uint8_t *const dst_buf_base = pd->dst.buf;
+ const int64_t src_stride = p->src.stride;
+ const int64_t dst_stride = pd->dst.stride;
+
+ (void)block;
+
+ av1_predict_intra_block_facade(cm, xd, plane, col, row, tx_size);
+
+ if (args->prune_mode_based_on_sad) {
+ unsigned int this_sad = cpi->ppi->fn_ptr[plane_bsize].sdf(
+ p->src.buf, p->src.stride, pd->dst.buf, pd->dst.stride);
+ const unsigned int sad_threshold =
+ args->best_sad != UINT_MAX ? args->best_sad + (args->best_sad >> 4)
+ : UINT_MAX;
+ // Skip the evaluation of current mode if its SAD is more than a threshold.
+ if (this_sad > sad_threshold) {
+ // For the current mode, set rate and distortion to maximum possible
+ // values and return.
+ // Note: args->rdc->rate is checked in av1_nonrd_pick_intra_mode() to skip
+ // the evaluation of the current mode.
+ args->rdc->rate = INT_MAX;
+ args->rdc->dist = INT64_MAX;
+ return;
+ }
+ if (this_sad < args->best_sad) {
+ args->best_sad = this_sad;
+ }
+ }
+
+ RD_STATS this_rdc;
+ av1_invalid_rd_stats(&this_rdc);
+
+ p->src.buf = &src_buf_base[4 * (row * src_stride + col)];
+ pd->dst.buf = &dst_buf_base[4 * (row * dst_stride + col)];
+
+ if (plane == 0) {
+ av1_block_yrd(x, &this_rdc, &args->skippable, bsize_tx,
+ AOMMIN(tx_size, TX_16X16));
+ } else {
+ av1_model_rd_for_sb_uv(cpi, bsize_tx, x, xd, &this_rdc, plane, plane);
+ }
+
+ p->src.buf = src_buf_base;
+ pd->dst.buf = dst_buf_base;
+ assert(args->rdc->rate != INT_MAX && args->rdc->dist != INT64_MAX);
+ args->rdc->rate += this_rdc.rate;
+ args->rdc->dist += this_rdc.dist;
+}
+
+/*!\brief Estimates best intra mode for inter mode search
+ *
+ * \ingroup nonrd_mode_search
+ * \callgraph
+ * \callergraph
+ *
+ * Using heuristics based on the best inter mode, block size, and other
+ * criteria, decides whether to check intra modes. If so, estimates and selects
+ * the best intra mode from a reduced set of intra modes (at most 4 checked)
+ *
+ * \param[in] cpi Top-level encoder structure
+ * \param[in] x Pointer to structure holding all the
+ * data for the current macroblock
+ * \param[in] bsize Current block size
+ * \param[in] best_early_term Flag, indicating that TX for the
+ * best inter mode was skipped
+ * \param[in] ref_cost_intra Cost of signalling intra mode
+ * \param[in] reuse_prediction Flag, indicating prediction re-use
+ * \param[in] orig_dst Original destination buffer
+ * \param[in] tmp_buffers Pointer to a temporary buffers for
+ * prediction re-use
+ * \param[out] this_mode_pred Pointer to store prediction buffer
+ * for prediction re-use
+ * \param[in] best_rdc Pointer to RD cost for the best
+ * selected intra mode
+ * \param[in] best_pickmode Pointer to a structure containing
+ * best mode picked so far
+ * \param[in] ctx Pointer to structure holding coding
+ * contexts and modes for the block
+ *
+ * \remark Nothing is returned. Instead, calculated RD cost is placed to
+ * \c best_rdc and best selected mode is placed to \c best_pickmode
+ *
+ */
+void av1_estimate_intra_mode(AV1_COMP *cpi, MACROBLOCK *x, BLOCK_SIZE bsize,
+ int best_early_term, unsigned int ref_cost_intra,
+ int reuse_prediction, struct buf_2d *orig_dst,
+ PRED_BUFFER *tmp_buffers,
+ PRED_BUFFER **this_mode_pred, RD_STATS *best_rdc,
+ BEST_PICKMODE *best_pickmode,
+ PICK_MODE_CONTEXT *ctx) {
+ AV1_COMMON *const cm = &cpi->common;
+ MACROBLOCKD *const xd = &x->e_mbd;
+ MB_MODE_INFO *const mi = xd->mi[0];
+ const TxfmSearchParams *txfm_params = &x->txfm_search_params;
+ const unsigned char segment_id = mi->segment_id;
+ const int *const rd_threshes = cpi->rd.threshes[segment_id][bsize];
+ const int *const rd_thresh_freq_fact = x->thresh_freq_fact[bsize];
+ const bool is_screen_content =
+ cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN;
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
+ const REAL_TIME_SPEED_FEATURES *const rt_sf = &cpi->sf.rt_sf;
+
+ const CommonQuantParams *quant_params = &cm->quant_params;
+
+ RD_STATS this_rdc;
+
+ int intra_cost_penalty = av1_get_intra_cost_penalty(
+ quant_params->base_qindex, quant_params->y_dc_delta_q,
+ cm->seq_params->bit_depth);
+ int64_t inter_mode_thresh =
+ RDCOST(x->rdmult, ref_cost_intra + intra_cost_penalty, 0);
+ int perform_intra_pred = rt_sf->check_intra_pred_nonrd;
+ int force_intra_check = 0;
+ // For spatial enhancement layers: turn off intra prediction if the
+ // previous spatial layer (used as golden ref) is not chosen as the best
+ // reference. Only do this for temporal enhancement layers on non-key frames.
+ if (cpi->svc.spatial_layer_id > 0 &&
+ best_pickmode->best_ref_frame != GOLDEN_FRAME &&
+ cpi->svc.temporal_layer_id > 0 &&
+ !cpi->svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame)
+ perform_intra_pred = 0;
+
+ int do_early_exit_rdthresh = 1;
+
+ uint32_t spatial_var_thresh = 50;
+ int motion_thresh = 32;
+ // Adjust thresholds to make intra mode more likely to be tested if the
+ // other references (golden, alt) are skipped/not checked. For now, always
+ // adjust for svc mode.
+ if (cpi->ppi->use_svc || (rt_sf->use_nonrd_altref_frame == 0 &&
+ rt_sf->nonrd_prune_ref_frame_search > 0)) {
+ spatial_var_thresh = 150;
+ motion_thresh = 0;
+ }
+
+ // Some adjustments to checking intra mode based on source variance.
+ if (x->source_variance < spatial_var_thresh) {
+ // If the best inter mode has large motion or a non-LAST ref, reduce the
+ // intra cost penalty so intra mode is more likely to be tested.
+ if (best_rdc->rdcost != INT64_MAX &&
+ (best_pickmode->best_ref_frame != LAST_FRAME ||
+ abs(mi->mv[0].as_mv.row) >= motion_thresh ||
+ abs(mi->mv[0].as_mv.col) >= motion_thresh)) {
+ intra_cost_penalty = intra_cost_penalty >> 2;
+ inter_mode_thresh =
+ RDCOST(x->rdmult, ref_cost_intra + intra_cost_penalty, 0);
+ do_early_exit_rdthresh = 0;
+ }
+ if ((x->source_variance < AOMMAX(50, (spatial_var_thresh >> 1)) &&
+ x->content_state_sb.source_sad_nonrd >= kHighSad) ||
+ (is_screen_content && x->source_variance < 50 &&
+ ((bsize >= BLOCK_32X32 &&
+ x->content_state_sb.source_sad_nonrd != kZeroSad) ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)))
+ force_intra_check = 1;
+ // For big blocks it is worth checking intra (since only DC will be
+ // checked), even if best_early_term is set.
+ if (bsize >= BLOCK_32X32) best_early_term = 0;
+ } else if (rt_sf->source_metrics_sb_nonrd &&
+ x->content_state_sb.source_sad_nonrd <= kLowSad) {
+ perform_intra_pred = 0;
+ }
+
+ if (best_rdc->skip_txfm && best_pickmode->best_mode_initial_skip_flag) {
+ if (rt_sf->skip_intra_pred == 1 && best_pickmode->best_mode != NEWMV)
+ perform_intra_pred = 0;
+ else if (rt_sf->skip_intra_pred == 2)
+ perform_intra_pred = 0;
+ }
+
+ if (!(best_rdc->rdcost == INT64_MAX || force_intra_check ||
+ (perform_intra_pred && !best_early_term &&
+ bsize <= cpi->sf.part_sf.max_intra_bsize))) {
+ return;
+ }
+
+ // Early exit based on RD cost calculated using known rate. When
+ // is_screen_content is true, more bias is given to intra modes; hence, a
+ // more conservative threshold is used for the early exit in that case.
+ const int64_t known_rd = is_screen_content
+ ? CALC_BIASED_RDCOST(inter_mode_thresh)
+ : inter_mode_thresh;
+ if (known_rd > best_rdc->rdcost) return;
+
+ struct estimate_block_intra_args args;
+ init_estimate_block_intra_args(&args, cpi, x);
+ TX_SIZE intra_tx_size = AOMMIN(
+ AOMMIN(max_txsize_lookup[bsize],
+ tx_mode_to_biggest_tx_size[txfm_params->tx_mode_search_type]),
+ TX_16X16);
+ if (is_screen_content && cpi->rc.high_source_sad &&
+ x->source_variance > spatial_var_thresh && bsize <= BLOCK_16X16)
+ intra_tx_size = TX_4X4;
+
+ PRED_BUFFER *const best_pred = best_pickmode->best_pred;
+ if (reuse_prediction && best_pred != NULL) {
+ const int bh = block_size_high[bsize];
+ const int bw = block_size_wide[bsize];
+ if (best_pred->data == orig_dst->buf) {
+ *this_mode_pred = &tmp_buffers[get_pred_buffer(tmp_buffers, 3)];
+ aom_convolve_copy(best_pred->data, best_pred->stride,
+ (*this_mode_pred)->data, (*this_mode_pred)->stride, bw,
+ bh);
+ best_pickmode->best_pred = *this_mode_pred;
+ }
+ }
+ pd->dst = *orig_dst;
+
+ for (int midx = 0; midx < RTC_INTRA_MODES; ++midx) {
+ const PREDICTION_MODE this_mode = intra_mode_list[midx];
+ const THR_MODES mode_index = mode_idx[INTRA_FRAME][mode_offset(this_mode)];
+ const int64_t mode_rd_thresh = rd_threshes[mode_index];
+
+ if (is_prune_intra_mode(cpi, midx, force_intra_check, bsize, segment_id,
+ x->content_state_sb.source_sad_nonrd,
+ x->color_sensitivity))
+ continue;
+
+ if (is_screen_content && rt_sf->source_metrics_sb_nonrd) {
+ // For spatially flat blocks with zero motion only check
+ // DC mode.
+ if (x->content_state_sb.source_sad_nonrd == kZeroSad &&
+ x->source_variance == 0 && this_mode != DC_PRED)
+ continue;
+ // Only test Intra for big blocks if spatial_variance is small.
+ else if (bsize > BLOCK_32X32 && x->source_variance > 50)
+ continue;
+ }
+
+ if (rd_less_than_thresh(best_rdc->rdcost, mode_rd_thresh,
+ rd_thresh_freq_fact[mode_index]) &&
+ (do_early_exit_rdthresh || this_mode == SMOOTH_PRED)) {
+ continue;
+ }
+ const BLOCK_SIZE uv_bsize =
+ get_plane_block_size(bsize, xd->plane[AOM_PLANE_U].subsampling_x,
+ xd->plane[AOM_PLANE_U].subsampling_y);
+
+ mi->mode = this_mode;
+ mi->ref_frame[0] = INTRA_FRAME;
+ mi->ref_frame[1] = NONE_FRAME;
+
+ av1_invalid_rd_stats(&this_rdc);
+ args.mode = this_mode;
+ args.skippable = 1;
+ args.rdc = &this_rdc;
+ mi->tx_size = intra_tx_size;
+ compute_intra_yprediction(cm, this_mode, bsize, x, xd);
+ // Look into selecting tx_size here, based on prediction residual.
+ av1_block_yrd(x, &this_rdc, &args.skippable, bsize, mi->tx_size);
+ // TODO(kyslov@) Need to account for skippable
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)]) {
+ av1_foreach_transformed_block_in_plane(xd, uv_bsize, AOM_PLANE_U,
+ av1_estimate_block_intra, &args);
+ }
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]) {
+ av1_foreach_transformed_block_in_plane(xd, uv_bsize, AOM_PLANE_V,
+ av1_estimate_block_intra, &args);
+ }
+
+ int mode_cost = 0;
+ if (av1_is_directional_mode(this_mode) && av1_use_angle_delta(bsize)) {
+ mode_cost +=
+ x->mode_costs.angle_delta_cost[this_mode - V_PRED]
+ [MAX_ANGLE_DELTA +
+ mi->angle_delta[PLANE_TYPE_Y]];
+ }
+ if (this_mode == DC_PRED && av1_filter_intra_allowed_bsize(cm, bsize)) {
+ mode_cost += x->mode_costs.filter_intra_cost[bsize][0];
+ }
+ this_rdc.rate += ref_cost_intra;
+ this_rdc.rate += intra_cost_penalty;
+ this_rdc.rate += mode_cost;
+ this_rdc.rdcost = RDCOST(x->rdmult, this_rdc.rate, this_rdc.dist);
+
+ if (is_screen_content && rt_sf->source_metrics_sb_nonrd) {
+ // For blocks with low spatial variance and color SAD,
+ // favor intra modes, but only on scene/slide changes.
+ if (cpi->rc.high_source_sad && x->source_variance < 800 &&
+ (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]))
+ this_rdc.rdcost = CALC_BIASED_RDCOST(this_rdc.rdcost);
+ // Otherwise bias against intra for blocks with zero
+ // motion and no color, on non-scene/slide changes.
+ else if (!cpi->rc.high_source_sad && x->source_variance > 0 &&
+ x->content_state_sb.source_sad_nonrd == kZeroSad &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 0)
+ this_rdc.rdcost = (3 * this_rdc.rdcost) >> 1;
+ }
+
+ if (this_rdc.rdcost < best_rdc->rdcost) {
+ *best_rdc = this_rdc;
+ best_pickmode->best_mode = this_mode;
+ best_pickmode->best_tx_size = mi->tx_size;
+ best_pickmode->best_ref_frame = INTRA_FRAME;
+ best_pickmode->best_second_ref_frame = NONE;
+ best_pickmode->best_mode_skip_txfm = this_rdc.skip_txfm;
+ mi->uv_mode = this_mode;
+ mi->mv[0].as_int = INVALID_MV;
+ mi->mv[1].as_int = INVALID_MV;
+ if (!this_rdc.skip_txfm)
+ memset(ctx->blk_skip, 0,
+ sizeof(x->txfm_search_info.blk_skip[0]) * ctx->num_4x4_blk);
+ }
+ }
+ if (best_pickmode->best_ref_frame == INTRA_FRAME)
+ memset(ctx->blk_skip, 0,
+ sizeof(x->txfm_search_info.blk_skip[0]) * ctx->num_4x4_blk);
+ mi->tx_size = best_pickmode->best_tx_size;
+}
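Note that av1_block_yrd_idtx above never calls a transform kernel: for IDTX the forward transform is the identity up to a gain of 8, matching the 8x scaling of the orthogonal transforms noted in av1_model_rd_for_sb_uv, so scale_square_buf_vals simply copies the residual multiplied by 8. A minimal equivalent, without the per-size case enumeration used above to coax the compiler into SIMD:

#include <stdint.h>

/* Identity "transform" for IDTX: copy the residual scaled by 8 so the
 * coefficients match the 8x gain of the orthogonal transforms and the same
 * quantizers apply. */
static void fast_idtx_sketch(int16_t *dst, int tx_width, const int16_t *src,
                             int src_stride) {
  for (int idy = 0; idy < tx_width; ++idy)
    for (int idx = 0; idx < tx_width; ++idx)
      dst[idy * tx_width + idx] = src[idy * src_stride + idx] * 8;
}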
diff --git a/av1/encoder/nonrd_opt.h b/av1/encoder/nonrd_opt.h
index 0d0db81..7948c78 100644
--- a/av1/encoder/nonrd_opt.h
+++ b/av1/encoder/nonrd_opt.h
@@ -13,10 +13,104 @@
#define AOM_AV1_ENCODER_NONRD_OPT_H_
#include "av1/encoder/rdopt_utils.h"
+#include "av1/encoder/rdopt.h"
#define RTC_INTER_MODES (4)
#define RTC_INTRA_MODES (4)
#define RTC_MODES (AOMMAX(RTC_INTER_MODES, RTC_INTRA_MODES))
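+// Scales an RD cost by 7/8, i.e. biases the comparison in favor of the mode
+// whose cost is being scaled.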
+#define CALC_BIASED_RDCOST(rdcost) (7 * (rdcost) >> 3)
+#define NUM_COMP_INTER_MODES_RT (6)
+#define NUM_INTER_MODES 12
+#define CAP_TX_SIZE_FOR_BSIZE_GT32(tx_mode_search_type, bsize) \
+ (((tx_mode_search_type) != ONLY_4X4 && (bsize) > BLOCK_32X32) ? true : false)
+#define TX_SIZE_FOR_BSIZE_GT32 (TX_16X16)
+#define FILTER_SEARCH_SIZE 2
+#if !CONFIG_REALTIME_ONLY
+#define MOTION_MODE_SEARCH_SIZE 2
+#endif
+
+extern int g_pick_inter_mode_cnt;
+/*!\cond */
+typedef struct {
+ uint8_t *data;
+ int stride;
+ int in_use;
+} PRED_BUFFER;
+
+typedef struct {
+ PRED_BUFFER *best_pred;
+ PREDICTION_MODE best_mode;
+ TX_SIZE best_tx_size;
+ TX_TYPE tx_type;
+ MV_REFERENCE_FRAME best_ref_frame;
+ MV_REFERENCE_FRAME best_second_ref_frame;
+ uint8_t best_mode_skip_txfm;
+ uint8_t best_mode_initial_skip_flag;
+ int_interpfilters best_pred_filter;
+ MOTION_MODE best_motion_mode;
+ WarpedMotionParams wm_params;
+ int num_proj_ref;
+ PALETTE_MODE_INFO pmi;
+ int64_t best_sse;
+} BEST_PICKMODE;
+
+typedef struct {
+ MV_REFERENCE_FRAME ref_frame;
+ PREDICTION_MODE pred_mode;
+} REF_MODE;
+
+typedef struct {
+ MV_REFERENCE_FRAME ref_frame[2];
+ PREDICTION_MODE pred_mode;
+} COMP_REF_MODE;
+
+struct estimate_block_intra_args {
+ AV1_COMP *cpi;
+ MACROBLOCK *x;
+ PREDICTION_MODE mode;
+ int skippable;
+ RD_STATS *rdc;
+ unsigned int best_sad;
+ bool prune_mode_based_on_sad;
+};
+/*!\endcond */
+
+/*!\brief Structure to store parameters and statistics used in non-rd inter mode
+ * evaluation.
+ */
+typedef struct {
+ //! Structure to hold best inter mode data
+ BEST_PICKMODE best_pickmode;
+ //! Structure to hold the RD cost of the current mode
+ RD_STATS this_rdc;
+ //! Pointer to the RD Cost for the best mode found so far
+ RD_STATS best_rdc;
+ //! Distortion of chroma planes for all modes and reference frames
+ int64_t uv_dist[RTC_INTER_MODES][REF_FRAMES];
+ //! Buffer to hold predicted block for all reference frames and planes
+ struct buf_2d yv12_mb[REF_FRAMES][MAX_MB_PLANE];
+ //! Array to hold variance of all modes and reference frames
+ unsigned int vars[RTC_INTER_MODES][REF_FRAMES];
+ //! Array to hold ref cost of single reference mode for all ref frames
+ unsigned int ref_costs_single[REF_FRAMES];
+ //! Array to hold motion vector for all modes and reference frames
+ int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES];
+ //! Array to hold best mv for all modes and reference frames
+ int_mv frame_mv_best[MB_MODE_COUNT][REF_FRAMES];
+ //! Array to hold inter mode cost of single ref mode for all ref frames
+ int single_inter_mode_costs[RTC_INTER_MODES][REF_FRAMES];
+ //! Array to hold the use-reference-frame mask for each reference frame
+ int use_ref_frame_mask[REF_FRAMES];
+ //! Array to hold flags of evaluated modes for each reference frame
+ uint8_t mode_checked[MB_MODE_COUNT][REF_FRAMES];
+} InterModeSearchStateNonrd;
+
+static const uint8_t b_width_log2_lookup[BLOCK_SIZES] = { 0, 0, 1, 1, 1, 2,
+ 2, 2, 3, 3, 3, 4,
+ 4, 4, 5, 5 };
+static const uint8_t b_height_log2_lookup[BLOCK_SIZES] = { 0, 1, 0, 1, 2, 1,
+ 2, 3, 2, 3, 4, 3,
+ 4, 5, 4, 5 };
static const PREDICTION_MODE intra_mode_list[] = { DC_PRED, V_PRED, H_PRED,
SMOOTH_PRED };
@@ -35,6 +129,266 @@
{ THR_NEARESTA, THR_NEARA, THR_GLOBALA, THR_NEWA },
};
+// GLOBALMV in the set below is in fact ZEROMV as we don't do global ME in RT
+// mode
+static const REF_MODE ref_mode_set[NUM_INTER_MODES] = {
+ { LAST_FRAME, NEARESTMV }, { LAST_FRAME, NEARMV },
+ { LAST_FRAME, GLOBALMV }, { LAST_FRAME, NEWMV },
+ { GOLDEN_FRAME, NEARESTMV }, { GOLDEN_FRAME, NEARMV },
+ { GOLDEN_FRAME, GLOBALMV }, { GOLDEN_FRAME, NEWMV },
+ { ALTREF_FRAME, NEARESTMV }, { ALTREF_FRAME, NEARMV },
+ { ALTREF_FRAME, GLOBALMV }, { ALTREF_FRAME, NEWMV },
+};
+
+static const COMP_REF_MODE comp_ref_mode_set[NUM_COMP_INTER_MODES_RT] = {
+ { { LAST_FRAME, GOLDEN_FRAME }, GLOBAL_GLOBALMV },
+ { { LAST_FRAME, GOLDEN_FRAME }, NEAREST_NEARESTMV },
+ { { LAST_FRAME, LAST2_FRAME }, GLOBAL_GLOBALMV },
+ { { LAST_FRAME, LAST2_FRAME }, NEAREST_NEARESTMV },
+ { { LAST_FRAME, ALTREF_FRAME }, GLOBAL_GLOBALMV },
+ { { LAST_FRAME, ALTREF_FRAME }, NEAREST_NEARESTMV },
+};
+
+static const int_interpfilters filters_ref_set[9] = {
+ [0].as_filters = { EIGHTTAP_REGULAR, EIGHTTAP_REGULAR },
+ [1].as_filters = { EIGHTTAP_SMOOTH, EIGHTTAP_SMOOTH },
+ [2].as_filters = { EIGHTTAP_REGULAR, EIGHTTAP_SMOOTH },
+ [3].as_filters = { EIGHTTAP_SMOOTH, EIGHTTAP_REGULAR },
+ [4].as_filters = { MULTITAP_SHARP, MULTITAP_SHARP },
+ [5].as_filters = { EIGHTTAP_REGULAR, MULTITAP_SHARP },
+ [6].as_filters = { MULTITAP_SHARP, EIGHTTAP_REGULAR },
+ [7].as_filters = { EIGHTTAP_SMOOTH, MULTITAP_SHARP },
+ [8].as_filters = { MULTITAP_SHARP, EIGHTTAP_SMOOTH }
+};
+
+enum {
+ // INTER_ALL = (1 << NEARESTMV) | (1 << NEARMV) | (1 << NEWMV),
+ INTER_NEAREST = (1 << NEARESTMV),
+ INTER_NEAREST_NEW = (1 << NEARESTMV) | (1 << NEWMV),
+ INTER_NEAREST_NEAR = (1 << NEARESTMV) | (1 << NEARMV),
+ INTER_NEAR_NEW = (1 << NEARMV) | (1 << NEWMV),
+};
+
+// The original scan order (default_scan_8x8) is modified according to the
+// extra transpose in the Hadamard C implementations, i.e.,
+// aom_hadamard_lp_8x8_c and aom_hadamard_8x8_c.
+DECLARE_ALIGNED(16, static const int16_t, default_scan_8x8_transpose[64]) = {
+ 0, 8, 1, 2, 9, 16, 24, 17, 10, 3, 4, 11, 18, 25, 32, 40,
+ 33, 26, 19, 12, 5, 6, 13, 20, 27, 34, 41, 48, 56, 49, 42, 35,
+ 28, 21, 14, 7, 15, 22, 29, 36, 43, 50, 57, 58, 51, 44, 37, 30,
+ 23, 31, 38, 45, 52, 59, 60, 53, 46, 39, 47, 54, 61, 62, 55, 63
+};
+
+// The original scan order (av1_default_iscan_8x8) is modified to match
+// the Hadamard AVX2 implementations, i.e., aom_hadamard_lp_8x8_avx2 and
+// aom_hadamard_8x8_avx2. Since the Hadamard AVX2 implementations modify the
+// order of coefficients such that the normal scan order is no longer
+// guaranteed to scan low coefficients first, we modify the scan order
+// accordingly.
+// Note that this one has to be used together with default_scan_8x8_transpose.
+DECLARE_ALIGNED(16, static const int16_t,
+ av1_default_iscan_8x8_transpose[64]) = {
+ 0, 2, 3, 9, 10, 20, 21, 35, 1, 4, 8, 11, 19, 22, 34, 36,
+ 5, 7, 12, 18, 23, 33, 37, 48, 6, 13, 17, 24, 32, 38, 47, 49,
+ 14, 16, 25, 31, 39, 46, 50, 57, 15, 26, 30, 40, 45, 51, 56, 58,
+ 27, 29, 41, 44, 52, 55, 59, 62, 28, 42, 43, 53, 54, 60, 61, 63
+};
+
+// The original scan order (default_scan_16x16) is modified according to the
+// extra transpose in the Hadamard C implementation in the lp case, i.e.,
+// aom_hadamard_lp_16x16_c.
+DECLARE_ALIGNED(16, static const int16_t,
+ default_scan_lp_16x16_transpose[256]) = {
+ 0, 8, 2, 4, 10, 16, 24, 18, 12, 6, 64, 14, 20, 26, 32,
+ 40, 34, 28, 22, 72, 66, 68, 74, 80, 30, 36, 42, 48, 56, 50,
+ 44, 38, 88, 82, 76, 70, 128, 78, 84, 90, 96, 46, 52, 58, 1,
+ 9, 3, 60, 54, 104, 98, 92, 86, 136, 130, 132, 138, 144, 94, 100,
+ 106, 112, 62, 5, 11, 17, 25, 19, 13, 7, 120, 114, 108, 102, 152,
+ 146, 140, 134, 192, 142, 148, 154, 160, 110, 116, 122, 65, 15, 21, 27,
+ 33, 41, 35, 29, 23, 73, 67, 124, 118, 168, 162, 156, 150, 200, 194,
+ 196, 202, 208, 158, 164, 170, 176, 126, 69, 75, 81, 31, 37, 43, 49,
+ 57, 51, 45, 39, 89, 83, 77, 71, 184, 178, 172, 166, 216, 210, 204,
+ 198, 206, 212, 218, 224, 174, 180, 186, 129, 79, 85, 91, 97, 47, 53,
+ 59, 61, 55, 105, 99, 93, 87, 137, 131, 188, 182, 232, 226, 220, 214,
+ 222, 228, 234, 240, 190, 133, 139, 145, 95, 101, 107, 113, 63, 121, 115,
+ 109, 103, 153, 147, 141, 135, 248, 242, 236, 230, 238, 244, 250, 193, 143,
+ 149, 155, 161, 111, 117, 123, 125, 119, 169, 163, 157, 151, 201, 195, 252,
+ 246, 254, 197, 203, 209, 159, 165, 171, 177, 127, 185, 179, 173, 167, 217,
+ 211, 205, 199, 207, 213, 219, 225, 175, 181, 187, 189, 183, 233, 227, 221,
+ 215, 223, 229, 235, 241, 191, 249, 243, 237, 231, 239, 245, 251, 253, 247,
+ 255
+};
+
+#if CONFIG_AV1_HIGHBITDEPTH
+// The original scan order (default_scan_16x16) is modified according to the
+// extra shift in the Hadamard C implementation in the fp case, i.e.,
+// aom_hadamard_16x16_c. Note that the 16x16 lp and fp Hadamard generate
+// different outputs, so we handle them separately.
+DECLARE_ALIGNED(16, static const int16_t,
+ default_scan_fp_16x16_transpose[256]) = {
+ 0, 4, 2, 8, 6, 16, 20, 18, 12, 10, 64, 14, 24, 22, 32,
+ 36, 34, 28, 26, 68, 66, 72, 70, 80, 30, 40, 38, 48, 52, 50,
+ 44, 42, 84, 82, 76, 74, 128, 78, 88, 86, 96, 46, 56, 54, 1,
+ 5, 3, 60, 58, 100, 98, 92, 90, 132, 130, 136, 134, 144, 94, 104,
+ 102, 112, 62, 9, 7, 17, 21, 19, 13, 11, 116, 114, 108, 106, 148,
+ 146, 140, 138, 192, 142, 152, 150, 160, 110, 120, 118, 65, 15, 25, 23,
+ 33, 37, 35, 29, 27, 69, 67, 124, 122, 164, 162, 156, 154, 196, 194,
+ 200, 198, 208, 158, 168, 166, 176, 126, 73, 71, 81, 31, 41, 39, 49,
+ 53, 51, 45, 43, 85, 83, 77, 75, 180, 178, 172, 170, 212, 210, 204,
+ 202, 206, 216, 214, 224, 174, 184, 182, 129, 79, 89, 87, 97, 47, 57,
+ 55, 61, 59, 101, 99, 93, 91, 133, 131, 188, 186, 228, 226, 220, 218,
+ 222, 232, 230, 240, 190, 137, 135, 145, 95, 105, 103, 113, 63, 117, 115,
+ 109, 107, 149, 147, 141, 139, 244, 242, 236, 234, 238, 248, 246, 193, 143,
+ 153, 151, 161, 111, 121, 119, 125, 123, 165, 163, 157, 155, 197, 195, 252,
+ 250, 254, 201, 199, 209, 159, 169, 167, 177, 127, 181, 179, 173, 171, 213,
+ 211, 205, 203, 207, 217, 215, 225, 175, 185, 183, 189, 187, 229, 227, 221,
+ 219, 223, 233, 231, 241, 191, 245, 243, 237, 235, 239, 249, 247, 253, 251,
+ 255
+};
+#endif
+
+// The original scan order (av1_default_iscan_16x16) is modified to match the
+// Hadamard AVX2 implementation, i.e., aom_hadamard_lp_16x16_avx2.
+// Since the Hadamard AVX2 implementation modifies the order of coefficients
+// such that the normal scan order is no longer guaranteed to scan low
+// coefficients first, we modify the scan order accordingly. Note that
+// this one has to be used together with default_scan_lp_16x16_transpose.
+DECLARE_ALIGNED(16, static const int16_t,
+ av1_default_iscan_lp_16x16_transpose[256]) = {
+ 0, 44, 2, 46, 3, 63, 9, 69, 1, 45, 4, 64, 8, 68, 11,
+ 87, 5, 65, 7, 67, 12, 88, 18, 94, 6, 66, 13, 89, 17, 93,
+ 24, 116, 14, 90, 16, 92, 25, 117, 31, 123, 15, 91, 26, 118, 30,
+ 122, 41, 148, 27, 119, 29, 121, 42, 149, 48, 152, 28, 120, 43, 150,
+ 47, 151, 62, 177, 10, 86, 20, 96, 21, 113, 35, 127, 19, 95, 22,
+ 114, 34, 126, 37, 144, 23, 115, 33, 125, 38, 145, 52, 156, 32, 124,
+ 39, 146, 51, 155, 58, 173, 40, 147, 50, 154, 59, 174, 73, 181, 49,
+ 153, 60, 175, 72, 180, 83, 198, 61, 176, 71, 179, 84, 199, 98, 202,
+ 70, 178, 85, 200, 97, 201, 112, 219, 36, 143, 54, 158, 55, 170, 77,
+ 185, 53, 157, 56, 171, 76, 184, 79, 194, 57, 172, 75, 183, 80, 195,
+ 102, 206, 74, 182, 81, 196, 101, 205, 108, 215, 82, 197, 100, 204, 109,
+ 216, 131, 223, 99, 203, 110, 217, 130, 222, 140, 232, 111, 218, 129, 221,
+ 141, 233, 160, 236, 128, 220, 142, 234, 159, 235, 169, 245, 78, 193, 104,
+ 208, 105, 212, 135, 227, 103, 207, 106, 213, 134, 226, 136, 228, 107, 214,
+ 133, 225, 137, 229, 164, 240, 132, 224, 138, 230, 163, 239, 165, 241, 139,
+ 231, 162, 238, 166, 242, 189, 249, 161, 237, 167, 243, 188, 248, 190, 250,
+ 168, 244, 187, 247, 191, 251, 210, 254, 186, 246, 192, 252, 209, 253, 211,
+ 255
+};
+
+#if CONFIG_AV1_HIGHBITDEPTH
+// The original scan order (av1_default_iscan_16x16) is modified to match the
+// Hadamard AVX2 implementation, i.e., aom_hadamard_16x16_avx2.
+// Since the Hadamard AVX2 implementation modifies the order of coefficients
+// such that the normal scan order is no longer guaranteed to scan low
+// coefficients first, we modify the scan order accordingly. Note that
+// this one has to be used together with default_scan_fp_16x16_transpose.
+DECLARE_ALIGNED(16, static const int16_t,
+ av1_default_iscan_fp_16x16_transpose[256]) = {
+ 0, 44, 2, 46, 1, 45, 4, 64, 3, 63, 9, 69, 8, 68, 11,
+ 87, 5, 65, 7, 67, 6, 66, 13, 89, 12, 88, 18, 94, 17, 93,
+ 24, 116, 14, 90, 16, 92, 15, 91, 26, 118, 25, 117, 31, 123, 30,
+ 122, 41, 148, 27, 119, 29, 121, 28, 120, 43, 150, 42, 149, 48, 152,
+ 47, 151, 62, 177, 10, 86, 20, 96, 19, 95, 22, 114, 21, 113, 35,
+ 127, 34, 126, 37, 144, 23, 115, 33, 125, 32, 124, 39, 146, 38, 145,
+ 52, 156, 51, 155, 58, 173, 40, 147, 50, 154, 49, 153, 60, 175, 59,
+ 174, 73, 181, 72, 180, 83, 198, 61, 176, 71, 179, 70, 178, 85, 200,
+ 84, 199, 98, 202, 97, 201, 112, 219, 36, 143, 54, 158, 53, 157, 56,
+ 171, 55, 170, 77, 185, 76, 184, 79, 194, 57, 172, 75, 183, 74, 182,
+ 81, 196, 80, 195, 102, 206, 101, 205, 108, 215, 82, 197, 100, 204, 99,
+ 203, 110, 217, 109, 216, 131, 223, 130, 222, 140, 232, 111, 218, 129, 221,
+ 128, 220, 142, 234, 141, 233, 160, 236, 159, 235, 169, 245, 78, 193, 104,
+ 208, 103, 207, 106, 213, 105, 212, 135, 227, 134, 226, 136, 228, 107, 214,
+ 133, 225, 132, 224, 138, 230, 137, 229, 164, 240, 163, 239, 165, 241, 139,
+ 231, 162, 238, 161, 237, 167, 243, 166, 242, 189, 249, 188, 248, 190, 250,
+ 168, 244, 187, 247, 186, 246, 192, 252, 191, 251, 210, 254, 209, 253, 211,
+ 255
+};
+#endif
+
+// For entropy coding, IDTX shares the scan orders of the other 2D transforms,
+// but the fastest way to calculate the IDTX transform (i.e., no transposes)
+// results in coefficients that are a transposition of the entropy coding
+// versions. These tables are used as a substitute for the scan order for the
+// faster version of IDTX.
+
+// Must be used together with av1_fast_idtx_iscan_4x4
+DECLARE_ALIGNED(16, static const int16_t,
+ av1_fast_idtx_scan_4x4[16]) = { 0, 1, 4, 8, 5, 2, 3, 6,
+ 9, 12, 13, 10, 7, 11, 14, 15 };
+
+// Must be used together with av1_fast_idtx_scan_4x4
+DECLARE_ALIGNED(16, static const int16_t,
+ av1_fast_idtx_iscan_4x4[16]) = { 0, 1, 5, 6, 2, 4, 7, 12,
+ 3, 8, 11, 13, 9, 10, 14, 15 };
+
+static const SCAN_ORDER av1_fast_idtx_scan_order_4x4 = {
+ av1_fast_idtx_scan_4x4, av1_fast_idtx_iscan_4x4
+};
+
+// Must be used together with av1_fast_idtx_iscan_8x8
+DECLARE_ALIGNED(16, static const int16_t, av1_fast_idtx_scan_8x8[64]) = {
+ 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5,
+ 12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13, 6, 7, 14, 21, 28,
+ 35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
+ 58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
+};
+
+// Must be used together with av1_fast_idtx_scan_8x8
+DECLARE_ALIGNED(16, static const int16_t, av1_fast_idtx_iscan_8x8[64]) = {
+ 0, 1, 5, 6, 14, 15, 27, 28, 2, 4, 7, 13, 16, 26, 29, 42,
+ 3, 8, 12, 17, 25, 30, 41, 43, 9, 11, 18, 24, 31, 40, 44, 53,
+ 10, 19, 23, 32, 39, 45, 52, 54, 20, 22, 33, 38, 46, 51, 55, 60,
+ 21, 34, 37, 47, 50, 56, 59, 61, 35, 36, 48, 49, 57, 58, 62, 63
+};
+
+static const SCAN_ORDER av1_fast_idtx_scan_order_8x8 = {
+ av1_fast_idtx_scan_8x8, av1_fast_idtx_iscan_8x8
+};
+
+// Must be used together with av1_fast_idtx_iscan_16x16
+DECLARE_ALIGNED(16, static const int16_t, av1_fast_idtx_scan_16x16[256]) = {
+ 0, 1, 16, 32, 17, 2, 3, 18, 33, 48, 64, 49, 34, 19, 4,
+ 5, 20, 35, 50, 65, 80, 96, 81, 66, 51, 36, 21, 6, 7, 22,
+ 37, 52, 67, 82, 97, 112, 128, 113, 98, 83, 68, 53, 38, 23, 8,
+ 9, 24, 39, 54, 69, 84, 99, 114, 129, 144, 160, 145, 130, 115, 100,
+ 85, 70, 55, 40, 25, 10, 11, 26, 41, 56, 71, 86, 101, 116, 131,
+ 146, 161, 176, 192, 177, 162, 147, 132, 117, 102, 87, 72, 57, 42, 27,
+ 12, 13, 28, 43, 58, 73, 88, 103, 118, 133, 148, 163, 178, 193, 208,
+ 224, 209, 194, 179, 164, 149, 134, 119, 104, 89, 74, 59, 44, 29, 14,
+ 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225,
+ 240, 241, 226, 211, 196, 181, 166, 151, 136, 121, 106, 91, 76, 61, 46,
+ 31, 47, 62, 77, 92, 107, 122, 137, 152, 167, 182, 197, 212, 227, 242,
+ 243, 228, 213, 198, 183, 168, 153, 138, 123, 108, 93, 78, 63, 79, 94,
+ 109, 124, 139, 154, 169, 184, 199, 214, 229, 244, 245, 230, 215, 200, 185,
+ 170, 155, 140, 125, 110, 95, 111, 126, 141, 156, 171, 186, 201, 216, 231,
+ 246, 247, 232, 217, 202, 187, 172, 157, 142, 127, 143, 158, 173, 188, 203,
+ 218, 233, 248, 249, 234, 219, 204, 189, 174, 159, 175, 190, 205, 220, 235,
+ 250, 251, 236, 221, 206, 191, 207, 222, 237, 252, 253, 238, 223, 239, 254,
+ 255
+};
+
+// Must be used together with av1_fast_idtx_scan_16x16
+DECLARE_ALIGNED(16, static const int16_t, av1_fast_idtx_iscan_16x16[256]) = {
+ 0, 1, 5, 6, 14, 15, 27, 28, 44, 45, 65, 66, 90, 91, 119,
+ 120, 2, 4, 7, 13, 16, 26, 29, 43, 46, 64, 67, 89, 92, 118,
+ 121, 150, 3, 8, 12, 17, 25, 30, 42, 47, 63, 68, 88, 93, 117,
+ 122, 149, 151, 9, 11, 18, 24, 31, 41, 48, 62, 69, 87, 94, 116,
+ 123, 148, 152, 177, 10, 19, 23, 32, 40, 49, 61, 70, 86, 95, 115,
+ 124, 147, 153, 176, 178, 20, 22, 33, 39, 50, 60, 71, 85, 96, 114,
+ 125, 146, 154, 175, 179, 200, 21, 34, 38, 51, 59, 72, 84, 97, 113,
+ 126, 145, 155, 174, 180, 199, 201, 35, 37, 52, 58, 73, 83, 98, 112,
+ 127, 144, 156, 173, 181, 198, 202, 219, 36, 53, 57, 74, 82, 99, 111,
+ 128, 143, 157, 172, 182, 197, 203, 218, 220, 54, 56, 75, 81, 100, 110,
+ 129, 142, 158, 171, 183, 196, 204, 217, 221, 234, 55, 76, 80, 101, 109,
+ 130, 141, 159, 170, 184, 195, 205, 216, 222, 233, 235, 77, 79, 102, 108,
+ 131, 140, 160, 169, 185, 194, 206, 215, 223, 232, 236, 245, 78, 103, 107,
+ 132, 139, 161, 168, 186, 193, 207, 214, 224, 231, 237, 244, 246, 104, 106,
+ 133, 138, 162, 167, 187, 192, 208, 213, 225, 230, 238, 243, 247, 252, 105,
+ 134, 137, 163, 166, 188, 191, 209, 212, 226, 229, 239, 242, 248, 251, 253,
+ 135, 136, 164, 165, 189, 190, 210, 211, 227, 228, 240, 241, 249, 250, 254,
+ 255
+};
+
// Indicates the blocks for which RD model should be based on special logic
static INLINE int get_model_rd_flag(const AV1_COMP *cpi, const MACROBLOCKD *xd,
BLOCK_SIZE bsize) {
@@ -59,9 +413,6 @@
* \param[in] ref_frame Reference frame for which to find
* ref MVs
* \param[in] frame_mv Predicted MVs for a block
- * \param[in] tile_data Pointer to struct holding adaptive
- * data/contexts/models for the tile
- * during encoding
* \param[in] yv12_mb Buffer to hold predicted block
* \param[in] bsize Current block size
* \param[in] force_skip_low_temp_var Flag indicating possible mode search
@@ -71,18 +422,19 @@
* \remark Nothing is returned. Instead, predicted MVs are placed into
* \c frame_mv array
*/
-static INLINE void find_predictors(
- AV1_COMP *cpi, MACROBLOCK *x, MV_REFERENCE_FRAME ref_frame,
- int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES], TileDataEnc *tile_data,
- struct buf_2d yv12_mb[8][MAX_MB_PLANE], BLOCK_SIZE bsize,
- int force_skip_low_temp_var, int skip_pred_mv) {
+static INLINE void find_predictors(AV1_COMP *cpi, MACROBLOCK *x,
+ MV_REFERENCE_FRAME ref_frame,
+ int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES],
+ struct buf_2d yv12_mb[8][MAX_MB_PLANE],
+ BLOCK_SIZE bsize,
+ int force_skip_low_temp_var,
+ int skip_pred_mv) {
AV1_COMMON *const cm = &cpi->common;
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mbmi = xd->mi[0];
MB_MODE_INFO_EXT *const mbmi_ext = &x->mbmi_ext;
const YV12_BUFFER_CONFIG *yv12 = get_ref_frame_yv12_buf(cm, ref_frame);
const int num_planes = av1_num_planes(cm);
- (void)tile_data;
x->pred_mv_sad[ref_frame] = INT_MAX;
x->pred_mv0_sad[ref_frame] = INT_MAX;
@@ -117,4 +469,99 @@
mbmi->num_proj_ref = 1;
}
+static INLINE void init_mbmi_nonrd(MB_MODE_INFO *mbmi,
+ PREDICTION_MODE pred_mode,
+ MV_REFERENCE_FRAME ref_frame0,
+ MV_REFERENCE_FRAME ref_frame1,
+ const AV1_COMMON *cm) {
+ PALETTE_MODE_INFO *const pmi = &mbmi->palette_mode_info;
+ mbmi->ref_mv_idx = 0;
+ mbmi->mode = pred_mode;
+ mbmi->uv_mode = UV_DC_PRED;
+ mbmi->ref_frame[0] = ref_frame0;
+ mbmi->ref_frame[1] = ref_frame1;
+ pmi->palette_size[PLANE_TYPE_Y] = 0;
+ pmi->palette_size[PLANE_TYPE_UV] = 0;
+ mbmi->filter_intra_mode_info.use_filter_intra = 0;
+ mbmi->mv[0].as_int = mbmi->mv[1].as_int = 0;
+ mbmi->motion_mode = SIMPLE_TRANSLATION;
+ mbmi->num_proj_ref = 1;
+ mbmi->interintra_mode = 0;
+ set_default_interp_filters(mbmi, cm->features.interp_filter);
+}
+
+static INLINE void init_estimate_block_intra_args(
+ struct estimate_block_intra_args *args, AV1_COMP *cpi, MACROBLOCK *x) {
+ args->cpi = cpi;
+ args->x = x;
+ args->mode = DC_PRED;
+ args->skippable = 1;
+ args->rdc = 0;
+ args->best_sad = UINT_MAX;
+ args->prune_mode_based_on_sad = false;
+}
+
+static INLINE int get_pred_buffer(PRED_BUFFER *p, int len) {
+ for (int buf_idx = 0; buf_idx < len; buf_idx++) {
+ if (!p[buf_idx].in_use) {
+ p[buf_idx].in_use = 1;
+ return buf_idx;
+ }
+ }
+ return -1;
+}
+
+static INLINE void free_pred_buffer(PRED_BUFFER *p) {
+ if (p != NULL) p->in_use = 0;
+}
+
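
get_pred_buffer and free_pred_buffer implement a small fixed-capacity pool
keyed on an in_use flag: acquire the first free slot, mark it busy, and clear
the flag to return it. A self-contained sketch of the same pattern, with
PRED_BUFFER trimmed to the fields the pool logic touches:

#include <assert.h>
#include <stddef.h>

typedef struct {
  unsigned char *data;
  int stride;
  int in_use;
} PredBuf;  // trimmed stand-in for PRED_BUFFER

static int acquire(PredBuf *p, int len) {
  for (int i = 0; i < len; i++) {
    if (!p[i].in_use) {
      p[i].in_use = 1;  // mark the first free slot busy
      return i;
    }
  }
  return -1;  // pool exhausted
}

static void release(PredBuf *p) {
  if (p != NULL) p->in_use = 0;  // slot becomes reusable
}

int main(void) {
  PredBuf pool[3] = { { NULL, 0, 0 }, { NULL, 0, 0 }, { NULL, 0, 0 } };
  const int idx = acquire(pool, 3);
  assert(idx == 0);
  release(&pool[idx]);
  assert(acquire(pool, 3) == 0);  // freed slot is handed out again
  return 0;
}
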
+#if CONFIG_INTERNAL_STATS
+static INLINE void store_coding_context_nonrd(MACROBLOCK *x,
+ PICK_MODE_CONTEXT *ctx,
+ int mode_index) {
+#else
+static INLINE void store_coding_context_nonrd(MACROBLOCK *x,
+ PICK_MODE_CONTEXT *ctx) {
+#endif // CONFIG_INTERNAL_STATS
+ MACROBLOCKD *const xd = &x->e_mbd;
+ TxfmSearchInfo *txfm_info = &x->txfm_search_info;
+
+ // Take a snapshot of the coding context so it can be
+ // restored if we decide to encode this way
+ ctx->rd_stats.skip_txfm = txfm_info->skip_txfm;
+
+ ctx->skippable = txfm_info->skip_txfm;
+#if CONFIG_INTERNAL_STATS
+ ctx->best_mode_index = mode_index;
+#endif // CONFIG_INTERNAL_STATS
+ ctx->mic = *xd->mi[0];
+ av1_copy_mbmi_ext_to_mbmi_ext_frame(&ctx->mbmi_ext_best, &x->mbmi_ext,
+ av1_ref_frame_type(xd->mi[0]->ref_frame));
+}
+
+void av1_block_yrd(MACROBLOCK *x, RD_STATS *this_rdc, int *skippable,
+ BLOCK_SIZE bsize, TX_SIZE tx_size);
+
+void av1_block_yrd_idtx(MACROBLOCK *x, const uint8_t *const pred_buf,
+ int pred_stride, RD_STATS *this_rdc, int *skippable,
+ BLOCK_SIZE bsize, TX_SIZE tx_size);
+
+int64_t av1_model_rd_for_sb_uv(AV1_COMP *cpi, BLOCK_SIZE plane_bsize,
+ MACROBLOCK *x, MACROBLOCKD *xd,
+ RD_STATS *this_rdc, int start_plane,
+ int stop_plane);
+
+void av1_estimate_block_intra(int plane, int block, int row, int col,
+ BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
+ void *arg);
+
+void av1_estimate_intra_mode(AV1_COMP *cpi, MACROBLOCK *x, BLOCK_SIZE bsize,
+ int best_early_term, unsigned int ref_cost_intra,
+ int reuse_prediction, struct buf_2d *orig_dst,
+ PRED_BUFFER *tmp_buffers,
+ PRED_BUFFER **this_mode_pred, RD_STATS *best_rdc,
+ BEST_PICKMODE *best_pickmode,
+ PICK_MODE_CONTEXT *ctx);
+
#endif // AOM_AV1_ENCODER_NONRD_OPT_H_
diff --git a/av1/encoder/nonrd_pickmode.c b/av1/encoder/nonrd_pickmode.c
index 4bad7a6..24a5264 100644
--- a/av1/encoder/nonrd_pickmode.c
+++ b/av1/encoder/nonrd_pickmode.c
@@ -15,265 +15,17 @@
#include <math.h>
#include <stdio.h>
-#include "config/aom_dsp_rtcd.h"
-#include "config/av1_rtcd.h"
-
-#include "aom_dsp/aom_dsp_common.h"
-#include "aom_dsp/txfm_common.h"
-#include "aom_ports/mem.h"
-
-#include "av1/common/blockd.h"
-#include "av1/common/mvref_common.h"
-#include "av1/common/pred_common.h"
#include "av1/common/reconinter.h"
#include "av1/common/reconintra.h"
#include "av1/encoder/encodemv.h"
-#include "av1/encoder/encoder.h"
#include "av1/encoder/intra_mode_search.h"
#include "av1/encoder/model_rd.h"
#include "av1/encoder/motion_search_facade.h"
#include "av1/encoder/nonrd_opt.h"
-#include "av1/encoder/rdopt.h"
#include "av1/encoder/reconinter_enc.h"
#include "av1/encoder/var_based_part.h"
-#define CALC_BIASED_RDCOST(rdcost) (7 * (rdcost) >> 3)
-extern int g_pick_inter_mode_cnt;
-/*!\cond */
-typedef struct {
- uint8_t *data;
- int stride;
- int in_use;
-} PRED_BUFFER;
-
-typedef struct {
- PRED_BUFFER *best_pred;
- PREDICTION_MODE best_mode;
- TX_SIZE best_tx_size;
- TX_TYPE tx_type;
- MV_REFERENCE_FRAME best_ref_frame;
- MV_REFERENCE_FRAME best_second_ref_frame;
- uint8_t best_mode_skip_txfm;
- uint8_t best_mode_initial_skip_flag;
- int_interpfilters best_pred_filter;
- MOTION_MODE best_motion_mode;
- WarpedMotionParams wm_params;
- int num_proj_ref;
- uint8_t blk_skip[MAX_MIB_SIZE * MAX_MIB_SIZE / 4];
- PALETTE_MODE_INFO pmi;
- int64_t best_sse;
-} BEST_PICKMODE;
-
-typedef struct {
- MV_REFERENCE_FRAME ref_frame;
- PREDICTION_MODE pred_mode;
-} REF_MODE;
-
-typedef struct {
- MV_REFERENCE_FRAME ref_frame[2];
- PREDICTION_MODE pred_mode;
-} COMP_REF_MODE;
-
-typedef struct {
- InterpFilter filter_x;
- InterpFilter filter_y;
-} INTER_FILTER;
-
-/*!\brief Structure to store parameters and statistics used in non-rd inter mode
- * evaluation.
- */
-typedef struct {
- BEST_PICKMODE best_pickmode;
- RD_STATS this_rdc;
- RD_STATS best_rdc;
- int64_t uv_dist[RTC_INTER_MODES][REF_FRAMES];
- struct buf_2d yv12_mb[REF_FRAMES][MAX_MB_PLANE];
- unsigned int vars[RTC_INTER_MODES][REF_FRAMES];
- unsigned int ref_costs_single[REF_FRAMES];
- int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES];
- int_mv frame_mv_best[MB_MODE_COUNT][REF_FRAMES];
- int single_inter_mode_costs[RTC_INTER_MODES][REF_FRAMES];
- int use_ref_frame_mask[REF_FRAMES];
- uint8_t mode_checked[MB_MODE_COUNT][REF_FRAMES];
-} InterModeSearchStateNonrd;
-/*!\endcond */
-
-#define NUM_COMP_INTER_MODES_RT (6)
-#define NUM_INTER_MODES 12
-
-// GLOBALMV in the set below is in fact ZEROMV as we don't do global ME in RT
-// mode
-static const REF_MODE ref_mode_set[NUM_INTER_MODES] = {
- { LAST_FRAME, NEARESTMV }, { LAST_FRAME, NEARMV },
- { LAST_FRAME, GLOBALMV }, { LAST_FRAME, NEWMV },
- { GOLDEN_FRAME, NEARESTMV }, { GOLDEN_FRAME, NEARMV },
- { GOLDEN_FRAME, GLOBALMV }, { GOLDEN_FRAME, NEWMV },
- { ALTREF_FRAME, NEARESTMV }, { ALTREF_FRAME, NEARMV },
- { ALTREF_FRAME, GLOBALMV }, { ALTREF_FRAME, NEWMV },
-};
-
-static const COMP_REF_MODE comp_ref_mode_set[NUM_COMP_INTER_MODES_RT] = {
- { { LAST_FRAME, GOLDEN_FRAME }, GLOBAL_GLOBALMV },
- { { LAST_FRAME, GOLDEN_FRAME }, NEAREST_NEARESTMV },
- { { LAST_FRAME, LAST2_FRAME }, GLOBAL_GLOBALMV },
- { { LAST_FRAME, LAST2_FRAME }, NEAREST_NEARESTMV },
- { { LAST_FRAME, ALTREF_FRAME }, GLOBAL_GLOBALMV },
- { { LAST_FRAME, ALTREF_FRAME }, NEAREST_NEARESTMV },
-};
-
-static const INTER_FILTER filters_ref_set[9] = {
- { EIGHTTAP_REGULAR, EIGHTTAP_REGULAR }, { EIGHTTAP_SMOOTH, EIGHTTAP_SMOOTH },
- { EIGHTTAP_REGULAR, EIGHTTAP_SMOOTH }, { EIGHTTAP_SMOOTH, EIGHTTAP_REGULAR },
- { MULTITAP_SHARP, MULTITAP_SHARP }, { EIGHTTAP_REGULAR, MULTITAP_SHARP },
- { MULTITAP_SHARP, EIGHTTAP_REGULAR }, { EIGHTTAP_SMOOTH, MULTITAP_SHARP },
- { MULTITAP_SHARP, EIGHTTAP_SMOOTH }
-};
-
-enum {
- // INTER_ALL = (1 << NEARESTMV) | (1 << NEARMV) | (1 << NEWMV),
- INTER_NEAREST = (1 << NEARESTMV),
- INTER_NEAREST_NEW = (1 << NEARESTMV) | (1 << NEWMV),
- INTER_NEAREST_NEAR = (1 << NEARESTMV) | (1 << NEARMV),
- INTER_NEAR_NEW = (1 << NEARMV) | (1 << NEWMV),
-};
-
-// The original scan order (default_scan_8x8) is modified according to the extra
-// transpose in hadamard c implementation, i.e., aom_hadamard_lp_8x8_c and
-// aom_hadamard_8x8_c.
-DECLARE_ALIGNED(16, static const int16_t, default_scan_8x8_transpose[64]) = {
- 0, 8, 1, 2, 9, 16, 24, 17, 10, 3, 4, 11, 18, 25, 32, 40,
- 33, 26, 19, 12, 5, 6, 13, 20, 27, 34, 41, 48, 56, 49, 42, 35,
- 28, 21, 14, 7, 15, 22, 29, 36, 43, 50, 57, 58, 51, 44, 37, 30,
- 23, 31, 38, 45, 52, 59, 60, 53, 46, 39, 47, 54, 61, 62, 55, 63
-};
-
-// The original scan order (av1_default_iscan_8x8) is modified to match
-// hadamard AVX2 implementation, i.e., aom_hadamard_lp_8x8_avx2 and
-// aom_hadamard_8x8_avx2. Since hadamard AVX2 implementation will modify the
-// order of coefficients, such that the normal scan order is no longer
-// guaranteed to scan low coefficients first, therefore we modify the scan order
-// accordingly.
-// Note that this one has to be used together with default_scan_8x8_transpose.
-DECLARE_ALIGNED(16, static const int16_t,
- av1_default_iscan_8x8_transpose[64]) = {
- 0, 2, 3, 9, 10, 20, 21, 35, 1, 4, 8, 11, 19, 22, 34, 36,
- 5, 7, 12, 18, 23, 33, 37, 48, 6, 13, 17, 24, 32, 38, 47, 49,
- 14, 16, 25, 31, 39, 46, 50, 57, 15, 26, 30, 40, 45, 51, 56, 58,
- 27, 29, 41, 44, 52, 55, 59, 62, 28, 42, 43, 53, 54, 60, 61, 63
-};
-
-// The original scan order (default_scan_16x16) is modified according to the
-// extra transpose in hadamard c implementation in lp case, i.e.,
-// aom_hadamard_lp_16x16_c.
-DECLARE_ALIGNED(16, static const int16_t,
- default_scan_lp_16x16_transpose[256]) = {
- 0, 8, 2, 4, 10, 16, 24, 18, 12, 6, 64, 14, 20, 26, 32,
- 40, 34, 28, 22, 72, 66, 68, 74, 80, 30, 36, 42, 48, 56, 50,
- 44, 38, 88, 82, 76, 70, 128, 78, 84, 90, 96, 46, 52, 58, 1,
- 9, 3, 60, 54, 104, 98, 92, 86, 136, 130, 132, 138, 144, 94, 100,
- 106, 112, 62, 5, 11, 17, 25, 19, 13, 7, 120, 114, 108, 102, 152,
- 146, 140, 134, 192, 142, 148, 154, 160, 110, 116, 122, 65, 15, 21, 27,
- 33, 41, 35, 29, 23, 73, 67, 124, 118, 168, 162, 156, 150, 200, 194,
- 196, 202, 208, 158, 164, 170, 176, 126, 69, 75, 81, 31, 37, 43, 49,
- 57, 51, 45, 39, 89, 83, 77, 71, 184, 178, 172, 166, 216, 210, 204,
- 198, 206, 212, 218, 224, 174, 180, 186, 129, 79, 85, 91, 97, 47, 53,
- 59, 61, 55, 105, 99, 93, 87, 137, 131, 188, 182, 232, 226, 220, 214,
- 222, 228, 234, 240, 190, 133, 139, 145, 95, 101, 107, 113, 63, 121, 115,
- 109, 103, 153, 147, 141, 135, 248, 242, 236, 230, 238, 244, 250, 193, 143,
- 149, 155, 161, 111, 117, 123, 125, 119, 169, 163, 157, 151, 201, 195, 252,
- 246, 254, 197, 203, 209, 159, 165, 171, 177, 127, 185, 179, 173, 167, 217,
- 211, 205, 199, 207, 213, 219, 225, 175, 181, 187, 189, 183, 233, 227, 221,
- 215, 223, 229, 235, 241, 191, 249, 243, 237, 231, 239, 245, 251, 253, 247,
- 255
-};
-
-#if CONFIG_AV1_HIGHBITDEPTH
-// The original scan order (default_scan_16x16) is modified according to the
-// extra shift in hadamard c implementation in fp case, i.e.,
-// aom_hadamard_16x16_c. Note that 16x16 lp and fp hadamard generate different
-// outputs, so we handle them separately.
-DECLARE_ALIGNED(16, static const int16_t,
- default_scan_fp_16x16_transpose[256]) = {
- 0, 4, 2, 8, 6, 16, 20, 18, 12, 10, 64, 14, 24, 22, 32,
- 36, 34, 28, 26, 68, 66, 72, 70, 80, 30, 40, 38, 48, 52, 50,
- 44, 42, 84, 82, 76, 74, 128, 78, 88, 86, 96, 46, 56, 54, 1,
- 5, 3, 60, 58, 100, 98, 92, 90, 132, 130, 136, 134, 144, 94, 104,
- 102, 112, 62, 9, 7, 17, 21, 19, 13, 11, 116, 114, 108, 106, 148,
- 146, 140, 138, 192, 142, 152, 150, 160, 110, 120, 118, 65, 15, 25, 23,
- 33, 37, 35, 29, 27, 69, 67, 124, 122, 164, 162, 156, 154, 196, 194,
- 200, 198, 208, 158, 168, 166, 176, 126, 73, 71, 81, 31, 41, 39, 49,
- 53, 51, 45, 43, 85, 83, 77, 75, 180, 178, 172, 170, 212, 210, 204,
- 202, 206, 216, 214, 224, 174, 184, 182, 129, 79, 89, 87, 97, 47, 57,
- 55, 61, 59, 101, 99, 93, 91, 133, 131, 188, 186, 228, 226, 220, 218,
- 222, 232, 230, 240, 190, 137, 135, 145, 95, 105, 103, 113, 63, 117, 115,
- 109, 107, 149, 147, 141, 139, 244, 242, 236, 234, 238, 248, 246, 193, 143,
- 153, 151, 161, 111, 121, 119, 125, 123, 165, 163, 157, 155, 197, 195, 252,
- 250, 254, 201, 199, 209, 159, 169, 167, 177, 127, 181, 179, 173, 171, 213,
- 211, 205, 203, 207, 217, 215, 225, 175, 185, 183, 189, 187, 229, 227, 221,
- 219, 223, 233, 231, 241, 191, 245, 243, 237, 235, 239, 249, 247, 253, 251,
- 255
-};
-#endif
-
-// The original scan order (av1_default_iscan_16x16) is modified to match
-// hadamard AVX2 implementation, i.e., aom_hadamard_lp_16x16_avx2.
-// Since hadamard AVX2 implementation will modify the order of coefficients,
-// such that the normal scan order is no longer guaranteed to scan low
-// coefficients first, therefore we modify the scan order accordingly. Note that
-// this one has to be used together with default_scan_lp_16x16_transpose.
-DECLARE_ALIGNED(16, static const int16_t,
- av1_default_iscan_lp_16x16_transpose[256]) = {
- 0, 44, 2, 46, 3, 63, 9, 69, 1, 45, 4, 64, 8, 68, 11,
- 87, 5, 65, 7, 67, 12, 88, 18, 94, 6, 66, 13, 89, 17, 93,
- 24, 116, 14, 90, 16, 92, 25, 117, 31, 123, 15, 91, 26, 118, 30,
- 122, 41, 148, 27, 119, 29, 121, 42, 149, 48, 152, 28, 120, 43, 150,
- 47, 151, 62, 177, 10, 86, 20, 96, 21, 113, 35, 127, 19, 95, 22,
- 114, 34, 126, 37, 144, 23, 115, 33, 125, 38, 145, 52, 156, 32, 124,
- 39, 146, 51, 155, 58, 173, 40, 147, 50, 154, 59, 174, 73, 181, 49,
- 153, 60, 175, 72, 180, 83, 198, 61, 176, 71, 179, 84, 199, 98, 202,
- 70, 178, 85, 200, 97, 201, 112, 219, 36, 143, 54, 158, 55, 170, 77,
- 185, 53, 157, 56, 171, 76, 184, 79, 194, 57, 172, 75, 183, 80, 195,
- 102, 206, 74, 182, 81, 196, 101, 205, 108, 215, 82, 197, 100, 204, 109,
- 216, 131, 223, 99, 203, 110, 217, 130, 222, 140, 232, 111, 218, 129, 221,
- 141, 233, 160, 236, 128, 220, 142, 234, 159, 235, 169, 245, 78, 193, 104,
- 208, 105, 212, 135, 227, 103, 207, 106, 213, 134, 226, 136, 228, 107, 214,
- 133, 225, 137, 229, 164, 240, 132, 224, 138, 230, 163, 239, 165, 241, 139,
- 231, 162, 238, 166, 242, 189, 249, 161, 237, 167, 243, 188, 248, 190, 250,
- 168, 244, 187, 247, 191, 251, 210, 254, 186, 246, 192, 252, 209, 253, 211,
- 255
-};
-
-#if CONFIG_AV1_HIGHBITDEPTH
-// The original scan order (av1_default_iscan_16x16) is modified to match
-// hadamard AVX2 implementation, i.e., aom_hadamard_16x16_avx2.
-// Since hadamard AVX2 implementation will modify the order of coefficients,
-// such that the normal scan order is no longer guaranteed to scan low
-// coefficients first, therefore we modify the scan order accordingly. Note that
-// this one has to be used together with default_scan_fp_16x16_transpose.
-DECLARE_ALIGNED(16, static const int16_t,
- av1_default_iscan_fp_16x16_transpose[256]) = {
- 0, 44, 2, 46, 1, 45, 4, 64, 3, 63, 9, 69, 8, 68, 11,
- 87, 5, 65, 7, 67, 6, 66, 13, 89, 12, 88, 18, 94, 17, 93,
- 24, 116, 14, 90, 16, 92, 15, 91, 26, 118, 25, 117, 31, 123, 30,
- 122, 41, 148, 27, 119, 29, 121, 28, 120, 43, 150, 42, 149, 48, 152,
- 47, 151, 62, 177, 10, 86, 20, 96, 19, 95, 22, 114, 21, 113, 35,
- 127, 34, 126, 37, 144, 23, 115, 33, 125, 32, 124, 39, 146, 38, 145,
- 52, 156, 51, 155, 58, 173, 40, 147, 50, 154, 49, 153, 60, 175, 59,
- 174, 73, 181, 72, 180, 83, 198, 61, 176, 71, 179, 70, 178, 85, 200,
- 84, 199, 98, 202, 97, 201, 112, 219, 36, 143, 54, 158, 53, 157, 56,
- 171, 55, 170, 77, 185, 76, 184, 79, 194, 57, 172, 75, 183, 74, 182,
- 81, 196, 80, 195, 102, 206, 101, 205, 108, 215, 82, 197, 100, 204, 99,
- 203, 110, 217, 109, 216, 131, 223, 130, 222, 140, 232, 111, 218, 129, 221,
- 128, 220, 142, 234, 141, 233, 160, 236, 159, 235, 169, 245, 78, 193, 104,
- 208, 103, 207, 106, 213, 105, 212, 135, 227, 134, 226, 136, 228, 107, 214,
- 133, 225, 132, 224, 138, 230, 137, 229, 164, 240, 163, 239, 165, 241, 139,
- 231, 162, 238, 161, 237, 167, 243, 166, 242, 189, 249, 188, 248, 190, 250,
- 168, 244, 187, 247, 186, 246, 192, 252, 191, 251, 210, 254, 209, 253, 211,
- 255
-};
-#endif
-
static INLINE int early_term_inter_search_with_sse(int early_term_idx,
BLOCK_SIZE bsize,
int64_t this_sse,
@@ -317,17 +69,44 @@
bp->best_pred = NULL;
bp->best_motion_mode = SIMPLE_TRANSLATION;
bp->num_proj_ref = 0;
- memset(&bp->wm_params, 0, sizeof(bp->wm_params));
- memset(&bp->blk_skip, 0, sizeof(bp->blk_skip));
- memset(&bp->pmi, 0, sizeof(bp->pmi));
+ av1_zero(bp->wm_params);
+ av1_zero(bp->pmi);
+}
+
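
av1_zero replaces the hand-written memsets; it derives both the address and
the size from the object itself, so the two can never drift apart the way a
raw memset's arguments can. A sketch assuming the usual shape of such a
helper (zero_object is a hypothetical stand-in):

#include <string.h>

// Assumed shape of the helper: both the address and the size come from
// `dest`, so the memset cannot be paired with the wrong object.
#define zero_object(dest) memset(&(dest), 0, sizeof(dest))

struct Params {
  int refs;
  double scale;
};

int main(void) {
  struct Params p;
  zero_object(p);  // equivalent to memset(&p, 0, sizeof(p))
  return p.refs;   // 0
}
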
+// Copy best inter mode parameters to best_pickmode
+static INLINE void update_search_state_nonrd(
+ InterModeSearchStateNonrd *search_state, MB_MODE_INFO *const mi,
+ TxfmSearchInfo *txfm_info, RD_STATS *nonskip_rdc, PICK_MODE_CONTEXT *ctx,
+ PREDICTION_MODE this_best_mode, const int64_t sse_y) {
+ BEST_PICKMODE *const best_pickmode = &search_state->best_pickmode;
+
+ best_pickmode->best_sse = sse_y;
+ best_pickmode->best_mode = this_best_mode;
+ best_pickmode->best_motion_mode = mi->motion_mode;
+ best_pickmode->wm_params = mi->wm_params;
+ best_pickmode->num_proj_ref = mi->num_proj_ref;
+ best_pickmode->best_pred_filter = mi->interp_filters;
+ best_pickmode->best_tx_size = mi->tx_size;
+ best_pickmode->best_ref_frame = mi->ref_frame[0];
+ best_pickmode->best_second_ref_frame = mi->ref_frame[1];
+ best_pickmode->best_mode_skip_txfm = search_state->this_rdc.skip_txfm;
+ best_pickmode->best_mode_initial_skip_flag =
+ (nonskip_rdc->rate == INT_MAX && search_state->this_rdc.skip_txfm);
+ if (!best_pickmode->best_mode_skip_txfm) {
+ memcpy(ctx->blk_skip, txfm_info->blk_skip,
+ sizeof(txfm_info->blk_skip[0]) * ctx->num_4x4_blk);
+ }
}
static INLINE int subpel_select(AV1_COMP *cpi, MACROBLOCK *x, BLOCK_SIZE bsize,
int_mv *mv, MV ref_mv, FULLPEL_MV start_mv,
bool fullpel_performed_well) {
const int frame_lowmotion = cpi->rc.avg_frame_low_motion;
+ const int reduce_mv_pel_precision_highmotion =
+ cpi->sf.rt_sf.reduce_mv_pel_precision_highmotion;
+
// Reduce MV precision for higher int MV value & frame-level motion
- if (cpi->sf.rt_sf.reduce_mv_pel_precision_highmotion >= 3) {
+ if (reduce_mv_pel_precision_highmotion >= 3) {
int mv_thresh = 4;
const int is_low_resoln =
(cpi->common.width * cpi->common.height <= 320 * 240);
@@ -337,10 +116,10 @@
if (abs(mv->as_fullmv.row) >= mv_thresh ||
abs(mv->as_fullmv.col) >= mv_thresh)
return HALF_PEL;
- } else if (cpi->sf.rt_sf.reduce_mv_pel_precision_highmotion >= 1) {
+ } else if (reduce_mv_pel_precision_highmotion >= 1) {
int mv_thresh;
const int th_vals[2][3] = { { 4, 8, 10 }, { 4, 6, 8 } };
- const int th_idx = cpi->sf.rt_sf.reduce_mv_pel_precision_highmotion - 1;
+ const int th_idx = reduce_mv_pel_precision_highmotion - 1;
assert(th_idx >= 0 && th_idx < 2);
if (frame_lowmotion > 0 && frame_lowmotion < 40)
mv_thresh = 12;
@@ -375,9 +154,9 @@
return cpi->sf.mv_sf.subpel_force_stop;
}
-static bool use_aggressive_subpel_search_method(
- MACROBLOCK *x, bool use_adaptive_subpel_search,
- const bool fullpel_performed_well) {
+static bool use_aggressive_subpel_search_method(MACROBLOCK *x,
+ bool use_adaptive_subpel_search,
+ bool fullpel_performed_well) {
if (!use_adaptive_subpel_search) return false;
const int qband = x->qindex >> (QINDEX_BITS - 2);
assert(qband < 4);
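
The qband computation folds the full quantizer range into four coarse bands:
with QINDEX_BITS == 8 (qindex spans [0, 255]), the shift by QINDEX_BITS - 2
maps 0-63 to band 0, 64-127 to band 1, and so on. A one-line check of the
banding:

#include <assert.h>

int main(void) {
  const int kQindexBits = 8;  // AV1 qindex spans [0, 255]
  assert((0 >> (kQindexBits - 2)) == 0);
  assert((63 >> (kQindexBits - 2)) == 0);
  assert((64 >> (kQindexBits - 2)) == 1);
  assert((255 >> (kQindexBits - 2)) == 3);  // four bands in total
  return 0;
}
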
@@ -437,11 +216,12 @@
av1_get_scaled_ref_frame(cpi, ref);
if (scaled_ref_frame) {
- int i;
+ int plane;
// Swap out the reference frame for a version that's been scaled to
// match the resolution of the current frame, allowing the existing
// motion search code to be used without additional modifications.
- for (i = 0; i < MAX_MB_PLANE; i++) backup_yv12[i] = xd->plane[i].pre[0];
+ for (plane = 0; plane < MAX_MB_PLANE; plane++)
+ backup_yv12[plane] = xd->plane[plane].pre[0];
av1_setup_pre_planes(xd, 0, scaled_ref_frame, mi_row, mi_col, NULL,
num_planes);
}
@@ -458,7 +238,7 @@
av1_get_search_site_config(cpi, x, search_method);
FULLPEL_MOTION_SEARCH_PARAMS full_ms_params;
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize, ¢er_mv,
- src_search_sites,
+ start_mv, src_search_sites,
/*fine_search_interval=*/0);
const unsigned int full_var_rd = av1_full_pixel_search(
@@ -505,8 +285,8 @@
}
if (scaled_ref_frame) {
- int i;
- for (i = 0; i < MAX_MB_PLANE; i++) xd->plane[i].pre[0] = backup_yv12[i];
+ for (int plane = 0; plane < MAX_MB_PLANE; plane++)
+ xd->plane[plane].pre[0] = backup_yv12[plane];
}
// The final MV can not be equal to the reference MV as this will trigger an
// assert later. This can happen if both NEAREST and NEAR modes were skipped.
@@ -550,6 +330,7 @@
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mi = xd->mi[0];
AV1_COMMON *cm = &cpi->common;
+ int_mv *this_ref_frm_newmv = &frame_mv[NEWMV][ref_frame];
if (ref_frame > LAST_FRAME && cpi->oxcf.rc_cfg.mode == AOM_CBR &&
gf_temporal_ref) {
int tmp_sad;
@@ -563,13 +344,13 @@
if (tmp_sad > x->pred_mv_sad[LAST_FRAME]) return -1;
- frame_mv[NEWMV][ref_frame].as_int = mi->mv[0].as_int;
+ this_ref_frm_newmv->as_int = mi->mv[0].as_int;
int_mv best_mv = mi->mv[0];
best_mv.as_mv.row >>= 3;
best_mv.as_mv.col >>= 3;
MV ref_mv = av1_get_ref_mv(x, 0).as_mv;
- frame_mv[NEWMV][ref_frame].as_mv.row >>= 3;
- frame_mv[NEWMV][ref_frame].as_mv.col >>= 3;
+ this_ref_frm_newmv->as_mv.row >>= 3;
+ this_ref_frm_newmv->as_mv.col >>= 3;
SUBPEL_MOTION_SEARCH_PARAMS ms_params;
av1_make_default_subpel_ms_params(&ms_params, cpi, x, bsize, &ref_mv, NULL);
@@ -584,17 +365,17 @@
cpi->mv_search_params.find_fractional_mv_step(
xd, cm, &ms_params, start_mv, &best_mv.as_mv, &dis,
&x->pred_sse[ref_frame], NULL);
- frame_mv[NEWMV][ref_frame].as_int = best_mv.as_int;
+ this_ref_frm_newmv->as_int = best_mv.as_int;
// When NEWMV is same as ref_mv from the drl, it is preferred to code the
// MV as NEARESTMV or NEARMV. In this case, NEWMV needs to be skipped to
// avoid an assert failure at a later stage. The scenario can occur if
// NEARESTMV was not evaluated for ALTREF.
- if (frame_mv[NEWMV][ref_frame].as_mv.col == ref_mv.col &&
- frame_mv[NEWMV][ref_frame].as_mv.row == ref_mv.row)
+ if (this_ref_frm_newmv->as_mv.col == ref_mv.col &&
+ this_ref_frm_newmv->as_mv.row == ref_mv.row)
return -1;
- *rate_mv = av1_mv_bit_cost(&frame_mv[NEWMV][ref_frame].as_mv, &ref_mv,
+ *rate_mv = av1_mv_bit_cost(&this_ref_frm_newmv->as_mv, &ref_mv,
x->mv_costs->nmv_joint_cost,
x->mv_costs->mv_cost_stack, MV_COST_WEIGHT);
} else if (!combined_motion_search(cpi, x, bsize, mi_row, mi_col,
@@ -643,7 +424,7 @@
if (x->txfm_search_params.tx_mode_search_type == TX_MODE_SELECT &&
cpi->sf.rt_sf.tx_size_level_based_on_qstep &&
cpi->sf.rt_sf.tx_size_level_based_on_qstep >= 2) {
- const int qstep = x->plane[0].dequant_QTX[1] >> (x->e_mbd.bd - 5);
+ const int qstep = x->plane[AOM_PLANE_Y].dequant_QTX[1] >> (x->e_mbd.bd - 5);
const unsigned int qstep_sq = qstep * qstep;
// If the sse is low for low source variance blocks, mark those as
// transform skip.
@@ -651,7 +432,8 @@
// low so that reliable early estimate of tx skip can be obtained
// through its comparison with sse.
if (sse < qstep_sq && x->source_variance < qstep_sq &&
- x->color_sensitivity[0] == 0 && x->color_sensitivity[1] == 0)
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 0)
*force_skip = 1;
}
}
@@ -676,7 +458,7 @@
const int mult[4] = { 8, 7, 6, 5 };
assert(qband < 4);
multiplier = mult[qband];
- const int qstep = x->plane[0].dequant_QTX[1] >> (xd->bd - 5);
+ const int qstep = x->plane[AOM_PLANE_Y].dequant_QTX[1] >> (xd->bd - 5);
const unsigned int qstep_sq = qstep * qstep;
var_thresh = qstep_sq * 2;
if (cpi->sf.rt_sf.tx_size_level_based_on_qstep >= 2) {
@@ -686,7 +468,8 @@
// low so that reliable early estimate of tx skip can be obtained
// through its comparison with sse.
if (sse < qstep_sq && x->source_variance < qstep_sq &&
- x->color_sensitivity[0] == 0 && x->color_sensitivity[1] == 0)
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 0)
*force_skip = 1;
// Further lower transform size based on aq mode only if residual
// variance is high.
@@ -719,13 +502,6 @@
return AOMMIN(tx_size, TX_16X16);
}
-static const uint8_t b_width_log2_lookup[BLOCK_SIZES] = { 0, 0, 1, 1, 1, 2,
- 2, 2, 3, 3, 3, 4,
- 4, 4, 5, 5 };
-static const uint8_t b_height_log2_lookup[BLOCK_SIZES] = { 0, 1, 0, 1, 2, 1,
- 2, 3, 2, 3, 4, 3,
- 4, 5, 4, 5 };
-
static void block_variance(const uint8_t *src, int src_stride,
const uint8_t *ref, int ref_stride, int w, int h,
unsigned int *sse, int *sum, int block_size,
@@ -740,11 +516,12 @@
// 32 samples respectively.
assert(w >= 32);
assert(h >= 8);
- for (int i = 0; i < h; i += block_size) {
- for (int j = 0; j < w; j += 32) {
- aom_get_var_sse_sum_8x8_quad(
- src + src_stride * i + j, src_stride, ref + ref_stride * i + j,
- ref_stride, &sse8x8[k], &sum8x8[k], sse, sum, &var8x8[k]);
+ for (int row = 0; row < h; row += block_size) {
+ for (int col = 0; col < w; col += 32) {
+ aom_get_var_sse_sum_8x8_quad(src + src_stride * row + col, src_stride,
+ ref + ref_stride * row + col, ref_stride,
+ &sse8x8[k], &sum8x8[k], sse, sum,
+ &var8x8[k]);
k += 4;
}
}
@@ -764,10 +541,10 @@
// least 16 and 32 samples respectively.
assert(w >= 32);
assert(h >= 16);
- for (int i = 0; i < h; i += block_size) {
- for (int j = 0; j < w; j += 32) {
- aom_get_var_sse_sum_16x16_dual(src + src_stride * i + j, src_stride,
- ref + ref_stride * i + j, ref_stride,
+ for (int row = 0; row < h; row += block_size) {
+ for (int col = 0; col < w; col += 32) {
+ aom_get_var_sse_sum_16x16_dual(src + src_stride * row + col, src_stride,
+ ref + ref_stride * row + col, ref_stride,
&sse16x16[k], sse, sum, &var16x16[k]);
k += 2;
}
@@ -781,14 +558,14 @@
const BLOCK_SIZE unit_size = txsize_to_bsize[tx_size];
const int nw = 1 << (bw - b_width_log2_lookup[unit_size]);
const int nh = 1 << (bh - b_height_log2_lookup[unit_size]);
- int i, j, k = 0;
+ int row, col, k = 0;
- for (i = 0; i < nh; i += 2) {
- for (j = 0; j < nw; j += 2) {
- sse_o[k] = sse_i[i * nw + j] + sse_i[i * nw + j + 1] +
- sse_i[(i + 1) * nw + j] + sse_i[(i + 1) * nw + j + 1];
- sum_o[k] = sum_i[i * nw + j] + sum_i[i * nw + j + 1] +
- sum_i[(i + 1) * nw + j] + sum_i[(i + 1) * nw + j + 1];
+ for (row = 0; row < nh; row += 2) {
+ for (col = 0; col < nw; col += 2) {
+ sse_o[k] = sse_i[row * nw + col] + sse_i[row * nw + col + 1] +
+ sse_i[(row + 1) * nw + col] + sse_i[(row + 1) * nw + col + 1];
+ sum_o[k] = sum_i[row * nw + col] + sum_i[row * nw + col + 1] +
+ sum_i[(row + 1) * nw + col] + sum_i[(row + 1) * nw + col + 1];
var_o[k] = sse_o[k] - (uint32_t)(((int64_t)sum_o[k] * sum_o[k]) >>
(b_width_log2_lookup[unit_size] +
b_height_log2_lookup[unit_size] + 6));
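
The shift above is sum^2/N in disguise: each merged output covers a 2x2 group
of unit blocks of (4 << b_width_log2) x (4 << b_height_log2) pixels, so
log2(N) = b_width_log2 + b_height_log2 + 6. A toy check that the shift
matches the exact division (sum * sum is never negative):

#include <assert.h>
#include <stdint.h>

int main(void) {
  // 8x8 unit blocks: b_width_log2 = b_height_log2 = 1, merged 2x2,
  // so N = 1 << (1 + 1 + 6) = 256 pixels.
  const int log2_n = 1 + 1 + 6;
  const int64_t sum = -1234, sse = 90000;
  const uint32_t var_shift = (uint32_t)(sse - ((sum * sum) >> log2_n));
  const uint32_t var_div =
      (uint32_t)(sse - (sum * sum) / ((int64_t)1 << log2_n));
  assert(var_shift == var_div);  // shift equals exact division: sum*sum >= 0
  return 0;
}
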
@@ -798,8 +575,7 @@
}
// Adjust the ac_thr according to speed, width, height and normalized sum
-static int ac_thr_factor(const int speed, const int width, const int height,
- const int norm_sum) {
+static int ac_thr_factor(int speed, int width, int height, int norm_sum) {
if (speed >= 8 && norm_sum < 5) {
if (width <= 640 && height <= 480)
return 4;
@@ -815,7 +591,7 @@
int mi_col, int *early_term, int num_blk, const unsigned int *sse_tx,
const unsigned int *var_tx, int sum, unsigned int var, unsigned int sse) {
AV1_COMMON *const cm = &cpi->common;
- struct macroblock_plane *const p = &x->plane[0];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
const uint32_t dc_quant = p->dequant_QTX[0];
const uint32_t ac_quant = p->dequant_QTX[1];
const int64_t dc_thr = dc_quant * dc_quant >> 6;
@@ -857,13 +633,13 @@
unsigned int var_uv[2];
unsigned int sse_uv[2];
// Transform skipping test in UV planes.
- for (int i = 1; i <= 2; i++) {
- int j = i - 1;
+ for (int plane = AOM_PLANE_U; plane <= AOM_PLANE_V; plane++) {
+ int j = plane - 1;
skip_uv[j] = 1;
- if (x->color_sensitivity[j]) {
+ if (x->color_sensitivity[COLOR_SENS_IDX(plane)]) {
skip_uv[j] = 0;
- struct macroblock_plane *const puv = &x->plane[i];
- struct macroblockd_plane *const puvd = &xd->plane[i];
+ struct macroblock_plane *const puv = &x->plane[plane];
+ struct macroblockd_plane *const puvd = &xd->plane[plane];
const BLOCK_SIZE uv_bsize = get_plane_block_size(
bsize, puvd->subsampling_x, puvd->subsampling_y);
// Adjust these thresholds for UV.
@@ -871,8 +647,8 @@
(puv->dequant_QTX[0] * puv->dequant_QTX[0]) >> 3;
const int64_t uv_ac_thr =
(puv->dequant_QTX[1] * puv->dequant_QTX[1]) >> 3;
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, i,
- i);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ plane, plane);
var_uv[j] = cpi->ppi->fn_ptr[uv_bsize].vf(puv->src.buf, puv->src.stride,
puvd->dst.buf,
puvd->dst.stride, &sse_uv[j]);
@@ -921,8 +697,8 @@
// Hence quantizer step is also 8 times. To get effective quantizer
// we need to divide by 8 before sending to modeling function.
unsigned int sse;
- struct macroblock_plane *const p = &x->plane[0];
- struct macroblockd_plane *const pd = &xd->plane[0];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
int test_skip = 1;
unsigned int var;
int sum;
@@ -1007,8 +783,8 @@
// Hence quantizer step is also 8 times. To get effective quantizer
// we need to divide by 8 before sending to modeling function.
unsigned int sse;
- struct macroblock_plane *const p = &x->plane[0];
- struct macroblockd_plane *const pd = &xd->plane[0];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
int test_skip = 1;
unsigned int var;
int sum;
@@ -1093,8 +869,8 @@
assert(bsize < BLOCK_SIZES_ALL);
- struct macroblock_plane *const p = &x->plane[0];
- struct macroblockd_plane *const pd = &xd->plane[0];
+ struct macroblock_plane *const p = &x->plane[AOM_PLANE_Y];
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
unsigned int sse;
int rate;
int64_t dist;
@@ -1113,7 +889,7 @@
model_rd_with_curvfit(cpi, x, bsize, AOM_PLANE_Y, sse, bwide * bhigh, &rate,
&dist);
} else {
- rate = INT_MAX; // this will be overwritten later with block_yrd
+ rate = INT_MAX; // this will be overwritten later with av1_block_yrd
dist = INT_MAX;
}
rd_stats->sse = sse;
@@ -1132,496 +908,7 @@
rd_stats->dist = dist;
}
-static INLINE void aom_process_hadamard_lp_8x16(MACROBLOCK *x,
- int max_blocks_high,
- int max_blocks_wide,
- int num_4x4_w, int step,
- int block_step) {
- struct macroblock_plane *const p = &x->plane[0];
- const int bw = 4 * num_4x4_w;
- const int num_4x4 = AOMMIN(num_4x4_w, max_blocks_wide);
- int block = 0;
-
- for (int r = 0; r < max_blocks_high; r += block_step) {
- for (int c = 0; c < num_4x4; c += 2 * block_step) {
- const int16_t *src_diff = &p->src_diff[(r * bw + c) << 2];
- int16_t *low_coeff = (int16_t *)p->coeff + BLOCK_OFFSET(block);
- aom_hadamard_lp_8x8_dual(src_diff, (ptrdiff_t)bw, low_coeff);
- block += 2 * step;
- }
- }
-}
-
-#define DECLARE_BLOCK_YRD_BUFFERS() \
- DECLARE_ALIGNED(64, tran_low_t, dqcoeff_buf[16 * 16]); \
- DECLARE_ALIGNED(64, tran_low_t, qcoeff_buf[16 * 16]); \
- DECLARE_ALIGNED(64, tran_low_t, coeff_buf[16 * 16]); \
- uint16_t eob[1];
-
-#define DECLARE_BLOCK_YRD_VARS() \
- /* When is_tx_8x8_dual_applicable is true, we compute the txfm for the \
- * entire bsize and write macroblock_plane::coeff. So low_coeff is kept \
- * as a non-const so we can reassign it to macroblock_plane::coeff. */ \
- int16_t *low_coeff = (int16_t *)coeff_buf; \
- int16_t *const low_qcoeff = (int16_t *)qcoeff_buf; \
- int16_t *const low_dqcoeff = (int16_t *)dqcoeff_buf; \
- const SCAN_ORDER *const scan_order = &av1_scan_orders[tx_size][DCT_DCT]; \
- const int diff_stride = bw;
-
-#define DECLARE_LOOP_VARS_BLOCK_YRD() \
- const int16_t *src_diff = &p->src_diff[(r * diff_stride + c) << 2];
-
-#if CONFIG_AV1_HIGHBITDEPTH
-#define DECLARE_BLOCK_YRD_HBD_VARS() \
- tran_low_t *const coeff = coeff_buf; \
- tran_low_t *const qcoeff = qcoeff_buf; \
- tran_low_t *const dqcoeff = dqcoeff_buf;
-
-static AOM_FORCE_INLINE void update_yrd_loop_vars_hbd(
- MACROBLOCK *x, int *skippable, const int step, const int ncoeffs,
- tran_low_t *const coeff, tran_low_t *const qcoeff,
- tran_low_t *const dqcoeff, RD_STATS *this_rdc, int *eob_cost,
- const int tx_blk_id) {
- const int is_txfm_skip = (ncoeffs == 0);
- *skippable &= is_txfm_skip;
- x->txfm_search_info.blk_skip[tx_blk_id] = is_txfm_skip;
- *eob_cost += get_msb(ncoeffs + 1);
-
- int64_t dummy;
- if (ncoeffs == 1)
- this_rdc->rate += (int)abs(qcoeff[0]);
- else if (ncoeffs > 1)
- this_rdc->rate += aom_satd(qcoeff, step << 4);
-
- this_rdc->dist += av1_block_error(coeff, dqcoeff, step << 4, &dummy) >> 2;
-}
-#endif
-static AOM_FORCE_INLINE void update_yrd_loop_vars(
- MACROBLOCK *x, int *skippable, const int step, const int ncoeffs,
- int16_t *const low_coeff, int16_t *const low_qcoeff,
- int16_t *const low_dqcoeff, RD_STATS *this_rdc, int *eob_cost,
- const int tx_blk_id) {
- const int is_txfm_skip = (ncoeffs == 0);
- *skippable &= is_txfm_skip;
- x->txfm_search_info.blk_skip[tx_blk_id] = is_txfm_skip;
- *eob_cost += get_msb(ncoeffs + 1);
- if (ncoeffs == 1)
- this_rdc->rate += (int)abs(low_qcoeff[0]);
- else if (ncoeffs > 1)
- this_rdc->rate += aom_satd_lp(low_qcoeff, step << 4);
-
- this_rdc->dist += av1_block_error_lp(low_coeff, low_dqcoeff, step << 4) >> 2;
-}
-
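
update_yrd_loop_vars accumulates cheap RD proxies: rate from the sum of
absolute quantized coefficients (what aom_satd_lp computes for the 16-bit
path) and distortion from the squared coeff/dqcoeff error scaled down by 4.
A standalone sketch of the two proxies, with hypothetical names:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Rate proxy: sum of absolute quantized coefficients.
static int satd_rate_proxy(const int16_t *qcoeff, int n) {
  int rate = 0;
  for (int i = 0; i < n; ++i) rate += abs(qcoeff[i]);
  return rate;
}

// Distortion proxy: squared error between transform coefficients and
// their dequantized reconstruction, with the same ">> 2" scaling used in
// update_yrd_loop_vars.
static int64_t block_error_proxy(const int16_t *coeff, const int16_t *dqcoeff,
                                 int n) {
  int64_t err = 0;
  for (int i = 0; i < n; ++i) {
    const int d = coeff[i] - dqcoeff[i];
    err += (int64_t)d * d;
  }
  return err >> 2;
}

int main(void) {
  const int16_t coeff[4] = { 40, -12, 3, 0 };
  const int16_t qcoeff[4] = { 5, -1, 0, 0 };
  const int16_t dqcoeff[4] = { 40, -8, 0, 0 };
  assert(satd_rate_proxy(qcoeff, 4) == 6);
  assert(block_error_proxy(coeff, dqcoeff, 4) == (16 + 9) >> 2);
  return 0;
}
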
-/*!\brief Calculates RD Cost using Hadamard transform.
- *
- * \ingroup nonrd_mode_search
- * \callgraph
- * \callergraph
- * Calculates RD Cost using Hadamard transform. For low bit depth this function
- * uses a low-precision (16-bit) set of functions; 32-bit is used for high bit
- * depth.
- * \param[in] x Pointer to structure holding all the data for
- the current macroblock
- * \param[in] this_rdc Pointer to calculated RD Cost
- * \param[in] skippable Pointer to a flag indicating possible tx skip
- * \param[in] bsize Current block size
- * \param[in] tx_size Transform size
- * \param[in] is_inter_mode Flag to indicate inter mode
- *
- * \remark Nothing is returned. Instead, calculated RD cost is placed to
- * \c this_rdc. \c skippable flag is set if there is no non-zero quantized
- * coefficients for Hadamard transform
- */
-static void block_yrd(MACROBLOCK *x, RD_STATS *this_rdc, int *skippable,
- const BLOCK_SIZE bsize, const TX_SIZE tx_size,
- const int is_inter_mode) {
- MACROBLOCKD *xd = &x->e_mbd;
- const struct macroblockd_plane *pd = &xd->plane[0];
- struct macroblock_plane *const p = &x->plane[0];
- assert(bsize < BLOCK_SIZES_ALL);
- const int num_4x4_w = mi_size_wide[bsize];
- const int num_4x4_h = mi_size_high[bsize];
- const int step = 1 << (tx_size << 1);
- const int block_step = (1 << tx_size);
- const int row_step = step * num_4x4_w >> tx_size;
- int block = 0;
- const int max_blocks_wide =
- num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 : xd->mb_to_right_edge >> 5);
- const int max_blocks_high =
- num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 : xd->mb_to_bottom_edge >> 5);
- int eob_cost = 0;
- const int bw = 4 * num_4x4_w;
- const int bh = 4 * num_4x4_h;
- const int use_hbd = is_cur_buf_hbd(xd);
- int num_blk_skip_w = num_4x4_w;
- int sh_blk_skip = 0;
- if (is_inter_mode) {
- num_blk_skip_w = num_4x4_w >> 1;
- sh_blk_skip = 1;
- }
-
-#if CONFIG_AV1_HIGHBITDEPTH
- if (use_hbd) {
- aom_highbd_subtract_block(bh, bw, p->src_diff, bw, p->src.buf,
- p->src.stride, pd->dst.buf, pd->dst.stride);
- } else {
- aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
- pd->dst.buf, pd->dst.stride);
- }
-#else
- aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
- pd->dst.buf, pd->dst.stride);
-#endif
-
- // Keep the intermediate value on the stack here. Writing directly to
- // skippable causes speed regression due to load-and-store issues in
- // update_yrd_loop_vars.
- int temp_skippable = 1;
- this_rdc->dist = 0;
- this_rdc->rate = 0;
- // For block sizes 8x16 or above, Hadamard txfm of two adjacent 8x8 blocks
- // can be done per function call. Hence the call of Hadamard txfm is
- // abstracted here for the specified cases.
- int is_tx_8x8_dual_applicable =
- (tx_size == TX_8X8 && block_size_wide[bsize] >= 16 &&
- block_size_high[bsize] >= 8);
-
-#if CONFIG_AV1_HIGHBITDEPTH
- // As of now, dual implementation of hadamard txfm is available for low
- // bitdepth.
- if (use_hbd) is_tx_8x8_dual_applicable = 0;
-#endif
-
- if (is_tx_8x8_dual_applicable) {
- aom_process_hadamard_lp_8x16(x, max_blocks_high, max_blocks_wide, num_4x4_w,
- step, block_step);
- }
-
- DECLARE_BLOCK_YRD_BUFFERS()
- DECLARE_BLOCK_YRD_VARS()
-#if CONFIG_AV1_HIGHBITDEPTH
- DECLARE_BLOCK_YRD_HBD_VARS()
-#else
- (void)use_hbd;
-#endif
-
- // Keep track of the row and column of the blocks we use so that we know
- // if we are in the unrestricted motion border.
- for (int r = 0; r < max_blocks_high; r += block_step) {
- for (int c = 0, s = 0; c < max_blocks_wide; c += block_step, s += step) {
- DECLARE_LOOP_VARS_BLOCK_YRD()
-
- switch (tx_size) {
-#if CONFIG_AV1_HIGHBITDEPTH
- case TX_16X16:
- if (use_hbd) {
- aom_hadamard_16x16(src_diff, diff_stride, coeff);
- av1_quantize_fp(coeff, 16 * 16, p->zbin_QTX, p->round_fp_QTX,
- p->quant_fp_QTX, p->quant_shift_QTX, qcoeff,
- dqcoeff, p->dequant_QTX, eob,
- // default_scan_fp_16x16_transpose and
- // av1_default_iscan_fp_16x16_transpose have to be
- // used together.
- default_scan_fp_16x16_transpose,
- av1_default_iscan_fp_16x16_transpose);
- } else {
- aom_hadamard_lp_16x16(src_diff, diff_stride, low_coeff);
- av1_quantize_lp(low_coeff, 16 * 16, p->round_fp_QTX,
- p->quant_fp_QTX, low_qcoeff, low_dqcoeff,
- p->dequant_QTX, eob,
- // default_scan_lp_16x16_transpose and
- // av1_default_iscan_lp_16x16_transpose have to be
- // used together.
- default_scan_lp_16x16_transpose,
- av1_default_iscan_lp_16x16_transpose);
- }
- break;
- case TX_8X8:
- if (use_hbd) {
- aom_hadamard_8x8(src_diff, diff_stride, coeff);
- av1_quantize_fp(
- coeff, 8 * 8, p->zbin_QTX, p->round_fp_QTX, p->quant_fp_QTX,
- p->quant_shift_QTX, qcoeff, dqcoeff, p->dequant_QTX, eob,
- default_scan_8x8_transpose, av1_default_iscan_8x8_transpose);
- } else {
- if (is_tx_8x8_dual_applicable) {
- // The coeffs are pre-computed for the whole block, so re-assign
- // low_coeff to the appropriate location.
- const int block_offset = BLOCK_OFFSET(block + s);
- low_coeff = (int16_t *)p->coeff + block_offset;
- } else {
- aom_hadamard_lp_8x8(src_diff, diff_stride, low_coeff);
- }
- av1_quantize_lp(
- low_coeff, 8 * 8, p->round_fp_QTX, p->quant_fp_QTX, low_qcoeff,
- low_dqcoeff, p->dequant_QTX, eob,
- // default_scan_8x8_transpose and
- // av1_default_iscan_8x8_transpose have to be used together.
- default_scan_8x8_transpose, av1_default_iscan_8x8_transpose);
- }
- break;
- default:
- assert(tx_size == TX_4X4);
- // In tx_size=4x4 case, aom_fdct4x4 and aom_fdct4x4_lp generate
- // normal coefficients order, so we don't need to change the scan
- // order here.
- if (use_hbd) {
- aom_fdct4x4(src_diff, coeff, diff_stride);
- av1_quantize_fp(coeff, 4 * 4, p->zbin_QTX, p->round_fp_QTX,
- p->quant_fp_QTX, p->quant_shift_QTX, qcoeff,
- dqcoeff, p->dequant_QTX, eob, scan_order->scan,
- scan_order->iscan);
- } else {
- aom_fdct4x4_lp(src_diff, low_coeff, diff_stride);
- av1_quantize_lp(low_coeff, 4 * 4, p->round_fp_QTX, p->quant_fp_QTX,
- low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
- scan_order->scan, scan_order->iscan);
- }
- break;
-#else
- case TX_16X16:
- aom_hadamard_lp_16x16(src_diff, diff_stride, low_coeff);
- av1_quantize_lp(low_coeff, 16 * 16, p->round_fp_QTX, p->quant_fp_QTX,
- low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
- default_scan_lp_16x16_transpose,
- av1_default_iscan_lp_16x16_transpose);
- break;
- case TX_8X8:
- if (is_tx_8x8_dual_applicable) {
- // The coeffs are pre-computed for the whole block, so re-assign
- // low_coeff to the appropriate location.
- const int block_offset = BLOCK_OFFSET(block + s);
- low_coeff = (int16_t *)p->coeff + block_offset;
- } else {
- aom_hadamard_lp_8x8(src_diff, diff_stride, low_coeff);
- }
- av1_quantize_lp(low_coeff, 8 * 8, p->round_fp_QTX, p->quant_fp_QTX,
- low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
- default_scan_8x8_transpose,
- av1_default_iscan_8x8_transpose);
- break;
- default:
- aom_fdct4x4_lp(src_diff, low_coeff, diff_stride);
- av1_quantize_lp(low_coeff, 4 * 4, p->round_fp_QTX, p->quant_fp_QTX,
- low_qcoeff, low_dqcoeff, p->dequant_QTX, eob,
- scan_order->scan, scan_order->iscan);
- break;
-#endif
- }
- assert(*eob <= 1024);
-#if CONFIG_AV1_HIGHBITDEPTH
- if (use_hbd)
- update_yrd_loop_vars_hbd(x, &temp_skippable, step, *eob, coeff, qcoeff,
- dqcoeff, this_rdc, &eob_cost,
- (r * num_blk_skip_w + c) >> sh_blk_skip);
- else
-#endif
- update_yrd_loop_vars(x, &temp_skippable, step, *eob, low_coeff,
- low_qcoeff, low_dqcoeff, this_rdc, &eob_cost,
- (r * num_blk_skip_w + c) >> sh_blk_skip);
- }
- block += row_step;
- }
-
- this_rdc->skip_txfm = *skippable = temp_skippable;
- if (this_rdc->sse < INT64_MAX) {
- this_rdc->sse = (this_rdc->sse << 6) >> 2;
- if (temp_skippable) {
- this_rdc->dist = 0;
- this_rdc->dist = this_rdc->sse;
- return;
- }
- }
-
- // If skippable is set, rate gets clobbered later.
- this_rdc->rate <<= (2 + AV1_PROB_COST_SHIFT);
- this_rdc->rate += (eob_cost << AV1_PROB_COST_SHIFT);
-}
-
-// Explicitly enumerate the cases so the compiler can generate SIMD for the
-// function. According to the disassembler, gcc generates SSE code for each of
-// the possible block sizes. The hottest case is tx_width 16, which takes up
-// about 8% of the self cycles of av1_nonrd_pick_inter_mode_sb. Since
-// av1_nonrd_pick_inter_mode_sb takes up about 3% of total encoding time, the
-// potential room for improvement from an AVX2 optimization is only
-// 3% * 8% = 0.24% of total encoding time.
-static AOM_INLINE void scale_square_buf_vals(int16_t *dst, const int tx_width,
- const int16_t *src,
- const int src_stride) {
-#define DO_SCALING \
- do { \
- for (int idy = 0; idy < tx_width; ++idy) { \
- for (int idx = 0; idx < tx_width; ++idx) { \
- dst[idy * tx_width + idx] = src[idy * src_stride + idx] * 8; \
- } \
- } \
- } while (0)
-
- if (tx_width == 4) {
- DO_SCALING;
- } else if (tx_width == 8) {
- DO_SCALING;
- } else if (tx_width == 16) {
- DO_SCALING;
- } else {
- assert(0);
- }
-
-#undef DO_SCALING
-}
-
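
The enumerated branches in scale_square_buf_vals give each loop nest a
compile-time trip count, which is what lets the compiler unroll and vectorize
without a hand-written intrinsics path. A minimal sketch of the pattern, with
hypothetical names (the factor of 8 follows the convention that transform
coefficients are 8 times an orthogonal transform):

#include <assert.h>
#include <stdint.h>

#define SCALE_LOOP(w)                                 \
  do {                                                \
    for (int r = 0; r < (w); ++r)                     \
      for (int c = 0; c < (w); ++c)                   \
        dst[r * (w) + c] = src[r * stride + c] * 8;   \
  } while (0)

// Each branch fixes the trip count at compile time, so the compiler can
// unroll and emit SIMD for that exact size; a single variable-bound loop
// usually vectorizes much worse.
static void scale_square(int16_t *dst, int w, const int16_t *src, int stride) {
  if (w == 4) SCALE_LOOP(4);
  else if (w == 8) SCALE_LOOP(8);
  else if (w == 16) SCALE_LOOP(16);
  else assert(0);
}

int main(void) {
  int16_t src[16] = { 1 }, dst[16];
  scale_square(dst, 4, src, 4);
  assert(dst[0] == 8);  // every output is 8x the input
  return 0;
}
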
-/*!\brief Calculates RD Cost when the block uses Identity transform.
- * Note that this function is only for low bit depth encoding, since it
- * is called in real-time mode for now, which sets high bit depth to 0:
- * -DCONFIG_AV1_HIGHBITDEPTH=0
- *
- * \ingroup nonrd_mode_search
- * \callgraph
- * \callergraph
- * Calculates RD Cost. Since this function is only used for low bit depth,
- * it relies on the low-precision (16-bit) set of functions.
- * \param[in] x Pointer to structure holding all the data for
- the current macroblock
- * \param[in] this_rdc Pointer to calculated RD Cost
- * \param[in] skippable Pointer to a flag indicating possible tx skip
- * \param[in] bsize Current block size
- * \param[in] tx_size Transform size
- *
- * \remark Nothing is returned. Instead, calculated RD cost is placed to
- * \c this_rdc. \c skippable flag is set if all coefficients are zero.
- */
-static void block_yrd_idtx(MACROBLOCK *x, RD_STATS *this_rdc, int *skippable,
- const BLOCK_SIZE bsize, const TX_SIZE tx_size) {
- MACROBLOCKD *xd = &x->e_mbd;
- const struct macroblockd_plane *pd = &xd->plane[0];
- struct macroblock_plane *const p = &x->plane[0];
- assert(bsize < BLOCK_SIZES_ALL);
- const int num_4x4_w = mi_size_wide[bsize];
- const int num_4x4_h = mi_size_high[bsize];
- const int step = 1 << (tx_size << 1);
- const int block_step = (1 << tx_size);
- const int max_blocks_wide =
- num_4x4_w + (xd->mb_to_right_edge >= 0 ? 0 : xd->mb_to_right_edge >> 5);
- const int max_blocks_high =
- num_4x4_h + (xd->mb_to_bottom_edge >= 0 ? 0 : xd->mb_to_bottom_edge >> 5);
- int eob_cost = 0;
- const int bw = 4 * num_4x4_w;
- const int bh = 4 * num_4x4_h;
- const int num_blk_skip_w = num_4x4_w >> 1;
- const int sh_blk_skip = 1;
- // Keep the intermediate value on the stack here. Writing directly to
- // skippable causes speed regression due to load-and-store issues in
- // update_yrd_loop_vars.
- int temp_skippable = 1;
- int tx_wd = 0;
- switch (tx_size) {
- case TX_64X64:
- assert(0); // Not implemented
- break;
- case TX_32X32:
- assert(0); // Not used
- break;
- case TX_16X16: tx_wd = 16; break;
- case TX_8X8: tx_wd = 8; break;
- default:
- assert(tx_size == TX_4X4);
- tx_wd = 4;
- break;
- }
- this_rdc->dist = 0;
- this_rdc->rate = 0;
- aom_subtract_block(bh, bw, p->src_diff, bw, p->src.buf, p->src.stride,
- pd->dst.buf, pd->dst.stride);
- // Keep track of the row and column of the blocks we use so that we know
- // if we are in the unrestricted motion border.
- DECLARE_BLOCK_YRD_BUFFERS()
- DECLARE_BLOCK_YRD_VARS()
- for (int r = 0; r < max_blocks_high; r += block_step) {
- for (int c = 0, s = 0; c < max_blocks_wide; c += block_step, s += step) {
- DECLARE_LOOP_VARS_BLOCK_YRD()
- scale_square_buf_vals(low_coeff, tx_wd, src_diff, diff_stride);
- av1_quantize_lp(low_coeff, tx_wd * tx_wd, p->round_fp_QTX,
- p->quant_fp_QTX, low_qcoeff, low_dqcoeff, p->dequant_QTX,
- eob, scan_order->scan, scan_order->iscan);
- assert(*eob <= 1024);
- update_yrd_loop_vars(x, &temp_skippable, step, *eob, low_coeff,
- low_qcoeff, low_dqcoeff, this_rdc, &eob_cost,
- (r * num_blk_skip_w + c) >> sh_blk_skip);
- }
- }
- this_rdc->skip_txfm = *skippable = temp_skippable;
- if (this_rdc->sse < INT64_MAX) {
- this_rdc->sse = (this_rdc->sse << 6) >> 2;
- if (temp_skippable) {
- this_rdc->dist = 0;
- this_rdc->dist = this_rdc->sse;
- return;
- }
- }
- // If skippable is set, rate gets clobbered later.
- this_rdc->rate <<= (2 + AV1_PROB_COST_SHIFT);
- this_rdc->rate += (eob_cost << AV1_PROB_COST_SHIFT);
-}
-
-static INLINE void init_mbmi(MB_MODE_INFO *mbmi, PREDICTION_MODE pred_mode,
- MV_REFERENCE_FRAME ref_frame0,
- MV_REFERENCE_FRAME ref_frame1,
- const AV1_COMMON *cm) {
- PALETTE_MODE_INFO *const pmi = &mbmi->palette_mode_info;
- mbmi->ref_mv_idx = 0;
- mbmi->mode = pred_mode;
- mbmi->uv_mode = UV_DC_PRED;
- mbmi->ref_frame[0] = ref_frame0;
- mbmi->ref_frame[1] = ref_frame1;
- pmi->palette_size[0] = 0;
- pmi->palette_size[1] = 0;
- mbmi->filter_intra_mode_info.use_filter_intra = 0;
- mbmi->mv[0].as_int = mbmi->mv[1].as_int = 0;
- mbmi->motion_mode = SIMPLE_TRANSLATION;
- mbmi->num_proj_ref = 1;
- mbmi->interintra_mode = 0;
- set_default_interp_filters(mbmi, cm->features.interp_filter);
-}
-
-#if CONFIG_INTERNAL_STATS
-static void store_coding_context(MACROBLOCK *x, PICK_MODE_CONTEXT *ctx,
- int mode_index) {
-#else
-static void store_coding_context(MACROBLOCK *x, PICK_MODE_CONTEXT *ctx) {
-#endif // CONFIG_INTERNAL_STATS
- MACROBLOCKD *const xd = &x->e_mbd;
- TxfmSearchInfo *txfm_info = &x->txfm_search_info;
-
- // Take a snapshot of the coding context so it can be
- // restored if we decide to encode this way
- ctx->rd_stats.skip_txfm = txfm_info->skip_txfm;
-
- ctx->skippable = txfm_info->skip_txfm;
-#if CONFIG_INTERNAL_STATS
- ctx->best_mode_index = mode_index;
-#endif // CONFIG_INTERNAL_STATS
- ctx->mic = *xd->mi[0];
- ctx->skippable = txfm_info->skip_txfm;
- av1_copy_mbmi_ext_to_mbmi_ext_frame(&ctx->mbmi_ext_best, &x->mbmi_ext,
- av1_ref_frame_type(xd->mi[0]->ref_frame));
-}
-
-static int get_pred_buffer(PRED_BUFFER *p, int len) {
- for (int i = 0; i < len; i++) {
- if (!p[i].in_use) {
- p[i].in_use = 1;
- return i;
- }
- }
- return -1;
-}
-
-static void free_pred_buffer(PRED_BUFFER *p) {
- if (p != NULL) p->in_use = 0;
-}
-
-static INLINE int get_drl_cost(const PREDICTION_MODE this_mode,
- const int ref_mv_idx,
+static INLINE int get_drl_cost(PREDICTION_MODE this_mode, int ref_mv_idx,
const MB_MODE_INFO_EXT *mbmi_ext,
const int (*const drl_mode_cost0)[2],
int8_t ref_frame_type) {
@@ -1739,132 +1026,6 @@
}
}
-static int64_t model_rd_for_sb_uv(AV1_COMP *cpi, BLOCK_SIZE plane_bsize,
- MACROBLOCK *x, MACROBLOCKD *xd,
- RD_STATS *this_rdc, int start_plane,
- int stop_plane) {
- // Note our transform coeffs are 8 times an orthogonal transform.
- // Hence quantizer step is also 8 times. To get effective quantizer
- // we need to divide by 8 before sending to modeling function.
- unsigned int sse;
- int rate;
- int64_t dist;
- int i;
- int64_t tot_sse = 0;
-
- this_rdc->rate = 0;
- this_rdc->dist = 0;
- this_rdc->skip_txfm = 0;
-
- for (i = start_plane; i <= stop_plane; ++i) {
- struct macroblock_plane *const p = &x->plane[i];
- struct macroblockd_plane *const pd = &xd->plane[i];
- const uint32_t dc_quant = p->dequant_QTX[0];
- const uint32_t ac_quant = p->dequant_QTX[1];
- const BLOCK_SIZE bs = plane_bsize;
- unsigned int var;
- if (!x->color_sensitivity[i - 1]) continue;
-
- var = cpi->ppi->fn_ptr[bs].vf(p->src.buf, p->src.stride, pd->dst.buf,
- pd->dst.stride, &sse);
- assert(sse >= var);
- tot_sse += sse;
-
- av1_model_rd_from_var_lapndz(sse - var, num_pels_log2_lookup[bs],
- dc_quant >> 3, &rate, &dist);
-
- this_rdc->rate += rate >> 1;
- this_rdc->dist += dist << 3;
-
- av1_model_rd_from_var_lapndz(var, num_pels_log2_lookup[bs], ac_quant >> 3,
- &rate, &dist);
-
- this_rdc->rate += rate;
- this_rdc->dist += dist << 4;
- }
-
- if (this_rdc->rate == 0) {
- this_rdc->skip_txfm = 1;
- }
-
- if (RDCOST(x->rdmult, this_rdc->rate, this_rdc->dist) >=
- RDCOST(x->rdmult, 0, tot_sse << 4)) {
- this_rdc->rate = 0;
- this_rdc->dist = tot_sse << 4;
- this_rdc->skip_txfm = 1;
- }
-
- return tot_sse;
-}
-
-/*!\cond */
-struct estimate_block_intra_args {
- AV1_COMP *cpi;
- MACROBLOCK *x;
- PREDICTION_MODE mode;
- int skippable;
- RD_STATS *rdc;
-};
-/*!\endcond */
-
-/*!\brief Estimation of RD cost of an intra mode for Non-RD optimized case.
- *
- * \ingroup nonrd_mode_search
- * \callgraph
- * \callergraph
- * Calculates RD Cost for an intra mode for a single TX block using Hadamard
- * transform.
- * \param[in] plane Color plane
- * \param[in] block Index of a TX block in a prediction block
- * \param[in] row Row of a current TX block
- * \param[in] col Column of a current TX block
- * \param[in] plane_bsize Block size of a current prediction block
- * \param[in] tx_size Transform size
- * \param[in] arg Pointer to a structure that holds parameters
- * for intra mode search
- *
- * \remark Nothing is returned. Instead, best mode and RD Cost of the best mode
- * are set in \c args->rdc and \c args->mode
- */
-static void estimate_block_intra(int plane, int block, int row, int col,
- BLOCK_SIZE plane_bsize, TX_SIZE tx_size,
- void *arg) {
- struct estimate_block_intra_args *const args = arg;
- AV1_COMP *const cpi = args->cpi;
- AV1_COMMON *const cm = &cpi->common;
- MACROBLOCK *const x = args->x;
- MACROBLOCKD *const xd = &x->e_mbd;
- struct macroblock_plane *const p = &x->plane[plane];
- struct macroblockd_plane *const pd = &xd->plane[plane];
- const BLOCK_SIZE bsize_tx = txsize_to_bsize[tx_size];
- uint8_t *const src_buf_base = p->src.buf;
- uint8_t *const dst_buf_base = pd->dst.buf;
- const int64_t src_stride = p->src.stride;
- const int64_t dst_stride = pd->dst.stride;
- RD_STATS this_rdc;
-
- (void)block;
- (void)plane_bsize;
-
- av1_predict_intra_block_facade(cm, xd, plane, col, row, tx_size);
- av1_invalid_rd_stats(&this_rdc);
-
- p->src.buf = &src_buf_base[4 * (row * src_stride + col)];
- pd->dst.buf = &dst_buf_base[4 * (row * dst_stride + col)];
-
- if (plane == 0) {
- block_yrd(x, &this_rdc, &args->skippable, bsize_tx,
- AOMMIN(tx_size, TX_16X16), 0);
- } else {
- model_rd_for_sb_uv(cpi, bsize_tx, x, xd, &this_rdc, plane, plane);
- }
-
- p->src.buf = src_buf_base;
- pd->dst.buf = dst_buf_base;
- args->rdc->rate += this_rdc.rate;
- args->rdc->dist += this_rdc.dist;
-}
-
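
estimate_block_intra is written as a per-TX-block callback: a driver walks
the transform blocks of a prediction block and invokes the callback with the
block's plane, index, and position, while the callback accumulates RD stats
through the arg pointer. A minimal sketch of that visitor pattern, with
hypothetical names:

#include <assert.h>

// Signature mirrors the shape of the callback above: the driver supplies
// plane/block/position, the callback pulls its state out of `arg`.
typedef void (*tx_block_visitor)(int plane, int block, int row, int col,
                                 void *arg);

static void foreach_tx_block(int plane, int rows, int cols,
                             tx_block_visitor visit, void *arg) {
  int block = 0;
  for (int r = 0; r < rows; ++r)
    for (int c = 0; c < cols; ++c) visit(plane, block++, r, c, arg);
}

static void count_blocks(int plane, int block, int row, int col, void *arg) {
  (void)plane; (void)block; (void)row; (void)col;
  ++*(int *)arg;  // stand-in for accumulating rate/dist into args->rdc
}

int main(void) {
  int visited = 0;
  foreach_tx_block(0, 2, 2, count_blocks, &visited);
  assert(visited == 4);
  return 0;
}
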
static INLINE void update_thresh_freq_fact(AV1_COMP *cpi, MACROBLOCK *x,
BLOCK_SIZE bsize,
MV_REFERENCE_FRAME ref_frame,
@@ -1930,7 +1091,7 @@
set_ref_ptrs(cm, xd, mi->ref_frame[0], NONE_FRAME);
mi->mv[0].as_int = 0;
mi->interp_filters = av1_broadcast_interp_filter(EIGHTTAP_REGULAR);
- xd->plane[0].pre[0] = yv12_mb[LAST_FRAME][0];
+ xd->plane[AOM_PLANE_Y].pre[0] = yv12_mb[LAST_FRAME][AOM_PLANE_Y];
av1_enc_build_inter_predictor_y(xd, mi_row, mi_col);
unsigned int var;
model_rd_for_sb_y(cpi, bsize, x, xd, &this_rdc, &var, 1, NULL);
@@ -1958,7 +1119,7 @@
[best_pickmode->best_ref_frame]
.as_int;
if (ctx_den->reuse_inter_pred) {
- xd->plane[0].pre[0] = yv12_mb[GOLDEN_FRAME][0];
+ xd->plane[AOM_PLANE_Y].pre[0] = yv12_mb[GOLDEN_FRAME][AOM_PLANE_Y];
av1_enc_build_inter_predictor_y(xd, mi_row, mi_col);
}
}
@@ -1972,8 +1133,6 @@
}
#endif // CONFIG_AV1_TEMPORAL_DENOISING
-#define FILTER_SEARCH_SIZE 2
-
/*!\brief Searches for the best interpolation filter
*
* \ingroup nonrd_mode_search
@@ -2006,7 +1165,7 @@
* \param[in] use_model_yrd_large Flag, indicating special logic to handle
* large blocks
* \param[in] best_sse Best sse so far.
- * \param[in] comp_pred Flag, indicating compound mode.
+ * \param[in]    is_single_pred           Flag, indicating single reference
+ *                                        mode.
*
* \remark Nothing is returned. Instead, calculated RD cost is placed to
* \c this_rdc and best filter is placed to \c mi->interp_filters. In case
@@ -2021,10 +1180,10 @@
PRED_BUFFER **this_mode_pred,
int *this_early_term, unsigned int *var,
int use_model_yrd_large, int64_t best_sse,
- int comp_pred) {
+ int is_single_pred) {
AV1_COMMON *const cm = &cpi->common;
MACROBLOCKD *const xd = &x->e_mbd;
- struct macroblockd_plane *const pd = &xd->plane[0];
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
MB_MODE_INFO *const mi = xd->mi[0];
const int bw = block_size_wide[bsize];
int dim_factor =
@@ -2040,38 +1199,43 @@
SubpelParams subpel_params;
// Initialize inter prediction params at mode level for single reference
// mode.
- if (!comp_pred)
+ if (is_single_pred)
init_inter_mode_params(&mi->mv[0].as_mv, inter_pred_params_sr,
&subpel_params, xd->block_ref_scale_factors[0],
pd->pre->width, pd->pre->height);
- for (int i = 0; i < FILTER_SEARCH_SIZE * FILTER_SEARCH_SIZE; ++i) {
+ for (int filter_idx = 0; filter_idx < FILTER_SEARCH_SIZE * FILTER_SEARCH_SIZE;
+ ++filter_idx) {
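+ // filter_idx enumerates the FILTER_SEARCH_SIZE x FILTER_SEARCH_SIZE
+ // (x_filter, y_filter) combinations packed in filters_ref_set; when dual
+ // filter is disabled, mismatched pairs are skipped below.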
int64_t cost;
if (cpi->sf.interp_sf.disable_dual_filter &&
- filters_ref_set[i].filter_x != filters_ref_set[i].filter_y)
+ filters_ref_set[filter_idx].as_filters.x_filter !=
+ filters_ref_set[filter_idx].as_filters.y_filter)
continue;
- mi->interp_filters.as_filters.x_filter = filters_ref_set[i].filter_x;
- mi->interp_filters.as_filters.y_filter = filters_ref_set[i].filter_y;
- if (!comp_pred)
+
+ mi->interp_filters.as_int = filters_ref_set[filter_idx].as_int;
+ if (is_single_pred)
av1_enc_build_inter_predictor_y_nonrd(xd, inter_pred_params_sr,
&subpel_params);
else
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0, 0);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, AOM_PLANE_Y);
unsigned int curr_var = UINT_MAX;
if (use_model_yrd_large)
model_skip_for_sb_y_large(cpi, bsize, mi_row, mi_col, x, xd,
- &pf_rd_stats[i], this_early_term, 1, best_sse,
- &curr_var, UINT_MAX);
+ &pf_rd_stats[filter_idx], this_early_term, 1,
+ best_sse, &curr_var, UINT_MAX);
else
- model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[i], &curr_var, 1, NULL);
- pf_rd_stats[i].rate += av1_get_switchable_rate(
+ model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[filter_idx], &curr_var,
+ 1, NULL);
+ pf_rd_stats[filter_idx].rate += av1_get_switchable_rate(
x, xd, cm->features.interp_filter, cm->seq_params->enable_dual_filter);
- cost = RDCOST(x->rdmult, pf_rd_stats[i].rate, pf_rd_stats[i].dist);
- pf_tx_size[i] = mi->tx_size;
+ cost = RDCOST(x->rdmult, pf_rd_stats[filter_idx].rate,
+ pf_rd_stats[filter_idx].dist);
+ pf_tx_size[filter_idx] = mi->tx_size;
if (cost < best_cost) {
*var = curr_var;
- best_filter_index = i;
+ best_filter_index = filter_idx;
best_cost = cost;
- best_skip = pf_rd_stats[i].skip_txfm;
+ best_skip = pf_rd_stats[filter_idx].skip_txfm;
best_early_term = *this_early_term;
if (reuse_inter_pred) {
if (*this_mode_pred != current_pred) {
@@ -2089,10 +1253,7 @@
if (reuse_inter_pred && *this_mode_pred != current_pred)
free_pred_buffer(current_pred);
- mi->interp_filters.as_filters.x_filter =
- filters_ref_set[best_filter_index].filter_x;
- mi->interp_filters.as_filters.y_filter =
- filters_ref_set[best_filter_index].filter_y;
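+ // interp_filters is a union: assigning as_int copies the x and y filters
+ // of the best candidate in a single store.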
+ mi->interp_filters.as_int = filters_ref_set[best_filter_index].as_int;
mi->tx_size = pf_tx_size[best_filter_index];
this_rdc->rate = pf_rd_stats[best_filter_index].rate;
this_rdc->dist = pf_rd_stats[best_filter_index].dist;
@@ -2103,15 +1264,15 @@
pd->dst.buf = (*this_mode_pred)->data;
pd->dst.stride = (*this_mode_pred)->stride;
} else if (best_filter_index < dim_factor * FILTER_SEARCH_SIZE - 1) {
- if (!comp_pred)
+ if (is_single_pred)
av1_enc_build_inter_predictor_y_nonrd(xd, inter_pred_params_sr,
&subpel_params);
else
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0, 0);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, AOM_PLANE_Y);
}
}
#if !CONFIG_REALTIME_ONLY
-#define MOTION_MODE_SEARCH_SIZE 2
static AOM_INLINE int is_warped_mode_allowed(const AV1_COMP *cpi,
MACROBLOCK *const x,
@@ -2199,25 +1360,28 @@
const MB_MODE_INFO base_mbmi = *mi;
MB_MODE_INFO best_mbmi;
- for (int i = 0; i < mode_search_size; ++i) {
+ for (int mode_index = 0; mode_index < mode_search_size; ++mode_index) {
int64_t cost = INT64_MAX;
- MOTION_MODE motion_mode = motion_modes[i];
+ MOTION_MODE motion_mode = motion_modes[mode_index];
*mi = base_mbmi;
mi->motion_mode = motion_mode;
if (motion_mode == SIMPLE_TRANSLATION) {
mi->interp_filters = av1_broadcast_interp_filter(EIGHTTAP_REGULAR);
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0, 0);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, AOM_PLANE_Y);
if (use_model_yrd_large)
model_skip_for_sb_y_large(cpi, bsize, mi_row, mi_col, x, xd,
- &pf_rd_stats[i], this_early_term, 1, best_sse,
- NULL, UINT_MAX);
+ &pf_rd_stats[mode_index], this_early_term, 1,
+ best_sse, NULL, UINT_MAX);
else
- model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[i], NULL, 1, NULL);
- pf_rd_stats[i].rate +=
+ model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[mode_index], NULL, 1,
+ NULL);
+ pf_rd_stats[mode_index].rate +=
av1_get_switchable_rate(x, xd, cm->features.interp_filter,
cm->seq_params->enable_dual_filter);
- cost = RDCOST(x->rdmult, pf_rd_stats[i].rate, pf_rd_stats[i].dist);
+ cost = RDCOST(x->rdmult, pf_rd_stats[mode_index].rate,
+ pf_rd_stats[mode_index].dist);
} else if (motion_mode == WARPED_CAUSAL) {
int pts[SAMPLES_ARRAY_SIZE], pts_inref[SAMPLES_ARRAY_SIZE];
const ModeCosts *mode_costs = &x->mode_costs;
@@ -2250,7 +1414,8 @@
// Refine MV in a small range.
av1_refine_warped_mv(xd, cm, &ms_params, bsize, pts0, pts_inref0,
- total_samples);
+ total_samples, cpi->sf.mv_sf.warp_search_method,
+ cpi->sf.mv_sf.warp_search_iters);
if (mi->mv[0].as_int == ref_mv.as_int) {
continue;
}
@@ -2269,26 +1434,28 @@
}
}
// Build the warped predictor
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0,
- av1_num_planes(cm) - 1);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, av1_num_planes(cm) - 1);
if (use_model_yrd_large)
model_skip_for_sb_y_large(cpi, bsize, mi_row, mi_col, x, xd,
- &pf_rd_stats[i], this_early_term, 1,
- best_sse, NULL, UINT_MAX);
+ &pf_rd_stats[mode_index], this_early_term,
+ 1, best_sse, NULL, UINT_MAX);
else
- model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[i], NULL, 1, NULL);
+ model_rd_for_sb_y(cpi, bsize, x, xd, &pf_rd_stats[mode_index], NULL,
+ 1, NULL);
- pf_rd_stats[i].rate +=
+ pf_rd_stats[mode_index].rate +=
mode_costs->motion_mode_cost[bsize][mi->motion_mode];
- cost = RDCOST(x->rdmult, pf_rd_stats[i].rate, pf_rd_stats[i].dist);
+ cost = RDCOST(x->rdmult, pf_rd_stats[mode_index].rate,
+ pf_rd_stats[mode_index].dist);
} else {
cost = INT64_MAX;
}
}
if (cost < best_cost) {
- best_mode_index = i;
+ best_mode_index = mode_index;
best_cost = cost;
- best_skip = pf_rd_stats[i].skip_txfm;
+ best_skip = pf_rd_stats[mode_index].skip_txfm;
best_early_term = *this_early_term;
best_mbmi = *mi;
}
@@ -2302,33 +1469,15 @@
this_rdc->skip_txfm = (best_skip || best_early_term);
*this_early_term = best_early_term;
if (best_mode_index < FILTER_SEARCH_SIZE - 1) {
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0, 0);
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, AOM_PLANE_Y);
}
}
#endif // !CONFIG_REALTIME_ONLY
-#define COLLECT_PICK_MODE_STAT 0
#define COLLECT_NON_SQR_STAT 0
-#if COLLECT_PICK_MODE_STAT
-#include "aom_ports/aom_timer.h"
-typedef struct _mode_search_stat {
- int32_t num_blocks[BLOCK_SIZES];
- int64_t total_block_times[BLOCK_SIZES];
- int32_t num_searches[BLOCK_SIZES][MB_MODE_COUNT];
- int32_t num_nonskipped_searches[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t search_times[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t nonskipped_search_times[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t ms_time[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t ifs_time[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t model_rd_time[BLOCK_SIZES][MB_MODE_COUNT];
- int64_t txfm_time[BLOCK_SIZES][MB_MODE_COUNT];
- struct aom_usec_timer timer1;
- struct aom_usec_timer timer2;
- struct aom_usec_timer bsize_timer;
-} mode_search_stat;
-
-static mode_search_stat ms_stat;
+#if COLLECT_NONRD_PICK_MODE_STAT
static AOM_INLINE void print_stage_time(const char *stage_name,
int64_t stage_time,
@@ -2337,9 +1486,9 @@
100 * stage_time / (float)total_time);
}
-static void print_time(const mode_search_stat *const ms_stat,
- const BLOCK_SIZE bsize, const int mi_rows,
- const int mi_cols, const int mi_row, const int mi_col) {
+static void print_time(const mode_search_stat_nonrd *const ms_stat,
+ BLOCK_SIZE bsize, int mi_rows, int mi_cols, int mi_row,
+ int mi_col) {
if ((mi_row + mi_size_high[bsize] >= mi_rows) &&
(mi_col + mi_size_wide[bsize] >= mi_cols)) {
int64_t total_time = 0l;
@@ -2396,47 +1545,22 @@
printf("Total time = %ld. Total blocks = %d\n", total_time, total_blocks);
}
}
-#endif // COLLECT_PICK_MODE_STAT
+#endif // COLLECT_NONRD_PICK_MODE_STAT
-static void compute_intra_yprediction(const AV1_COMMON *cm,
- PREDICTION_MODE mode, BLOCK_SIZE bsize,
- MACROBLOCK *x, MACROBLOCKD *xd) {
- const SequenceHeader *seq_params = cm->seq_params;
- struct macroblockd_plane *const pd = &xd->plane[0];
- struct macroblock_plane *const p = &x->plane[0];
- uint8_t *const src_buf_base = p->src.buf;
- uint8_t *const dst_buf_base = pd->dst.buf;
- const int src_stride = p->src.stride;
- const int dst_stride = pd->dst.stride;
- int plane = 0;
- int row, col;
- // block and transform sizes, in number of 4x4 blocks log 2 ("*_b")
- // 4x4=0, 8x8=2, 16x16=4, 32x32=6, 64x64=8
- // transform size varies per plane, look it up in a common way.
- const TX_SIZE tx_size = max_txsize_lookup[bsize];
- const BLOCK_SIZE plane_bsize =
- get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
- // If mb_to_right_edge is < 0 we are in a situation in which
- // the current block size extends into the UMV and we won't
- // visit the sub blocks that are wholly within the UMV.
- const int max_blocks_wide = max_block_wide(xd, plane_bsize, plane);
- const int max_blocks_high = max_block_high(xd, plane_bsize, plane);
- // Keep track of the row and column of the blocks we use so that we know
- // if we are in the unrestricted motion border.
- for (row = 0; row < max_blocks_high; row += (1 << tx_size)) {
- // Skip visiting the sub blocks that are wholly within the UMV.
- for (col = 0; col < max_blocks_wide; col += (1 << tx_size)) {
- p->src.buf = &src_buf_base[4 * (row * (int64_t)src_stride + col)];
- pd->dst.buf = &dst_buf_base[4 * (row * (int64_t)dst_stride + col)];
- av1_predict_intra_block(
- xd, seq_params->sb_size, seq_params->enable_intra_edge_filter,
- block_size_wide[bsize], block_size_high[bsize], tx_size, mode, 0, 0,
- FILTER_INTRA_MODES, pd->dst.buf, dst_stride, pd->dst.buf, dst_stride,
- 0, 0, plane);
- }
- }
- p->src.buf = src_buf_base;
- pd->dst.buf = dst_buf_base;
+static bool should_prune_intra_modes_using_neighbors(
+ const MACROBLOCKD *xd, bool enable_intra_mode_pruning_using_neighbors,
+ PREDICTION_MODE this_mode, PREDICTION_MODE above_mode,
+ PREDICTION_MODE left_mode) {
+ if (!enable_intra_mode_pruning_using_neighbors) return false;
+
+ // Avoid pruning of DC_PRED as it is the most probable mode to win as per the
+ // statistics generated for nonrd intra mode evaluations.
+ if (this_mode == DC_PRED) return false;
+
+ // Enable pruning for the current mode only if it is not the winner mode of
+ // either of the neighboring blocks (left/top).
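+ // For example, with above_mode == V_PRED and left_mode == DC_PRED, H_PRED
+ // and SMOOTH_PRED become pruning candidates here, while V_PRED and DC_PRED
+ // are retained.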
+ return xd->up_available && this_mode != above_mode && xd->left_available &&
+ this_mode != left_mode;
}
void av1_nonrd_pick_intra_mode(AV1_COMP *cpi, MACROBLOCK *x, RD_STATS *rd_cost,
@@ -2445,11 +1569,20 @@
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mi = xd->mi[0];
RD_STATS this_rdc, best_rdc;
- struct estimate_block_intra_args args = { cpi, x, DC_PRED, 1, 0 };
+ struct estimate_block_intra_args args;
+ init_estimate_block_intra_args(&args, cpi, x);
const TxfmSearchParams *txfm_params = &x->txfm_search_params;
- const TX_SIZE intra_tx_size =
+ mi->tx_size =
AOMMIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[txfm_params->tx_mode_search_type]);
+ assert(IMPLIES(xd->lossless[mi->segment_id], mi->tx_size == TX_4X4));
+ const BLOCK_SIZE tx_bsize = txsize_to_bsize[mi->tx_size];
+
+ // If the current block size is the same as the transform block size, enable
+ // mode pruning based on the best SAD so far.
+ if (cpi->sf.rt_sf.prune_intra_mode_using_best_sad_so_far && bsize == tx_bsize)
+ args.prune_mode_based_on_sad = true;
+
int *bmode_costs;
PREDICTION_MODE best_mode = DC_PRED;
const MB_MODE_INFO *above_mi = xd->above_mbmi;
@@ -2458,37 +1591,54 @@
const PREDICTION_MODE L = av1_left_block_mode(left_mi);
const int above_ctx = intra_mode_context[A];
const int left_ctx = intra_mode_context[L];
+ const unsigned int source_variance = x->source_variance;
bmode_costs = x->mode_costs.y_mode_costs[above_ctx][left_ctx];
av1_invalid_rd_stats(&best_rdc);
av1_invalid_rd_stats(&this_rdc);
- init_mbmi(mi, DC_PRED, INTRA_FRAME, NONE_FRAME, cm);
+ init_mbmi_nonrd(mi, DC_PRED, INTRA_FRAME, NONE_FRAME, cm);
mi->mv[0].as_int = mi->mv[1].as_int = INVALID_MV;
// Change the limit of this loop to add other intra prediction
// mode tests.
- for (int i = 0; i < 4; ++i) {
- PREDICTION_MODE this_mode = intra_mode_list[i];
+ for (int mode_index = 0; mode_index < RTC_INTRA_MODES; ++mode_index) {
+ PREDICTION_MODE this_mode = intra_mode_list[mode_index];
// As per the statistics generated for intra mode evaluation in the nonrd
// path, it is found that the probability of H_PRED mode being the winner is
- // very less when the best mode so far is V_PRED (out of DC_PRED and
- // V_PRED). If V_PRED is the winner mode out of DC_PRED and V_PRED, it could
- // imply the presence of a vertically dominant pattern. Hence, H_PRED mode
- // is not evaluated.
+ // very low when the best mode so far is V_PRED (out of DC_PRED and V_PRED).
+ // If V_PRED is the winner mode out of DC_PRED and V_PRED, it could imply
+ // the presence of a vertically dominant pattern. Hence, H_PRED mode is not
+ // evaluated.
if (cpi->sf.rt_sf.prune_h_pred_using_best_mode_so_far &&
this_mode == H_PRED && best_mode == V_PRED)
continue;
+ if (should_prune_intra_modes_using_neighbors(
+ xd, cpi->sf.rt_sf.enable_intra_mode_pruning_using_neighbors,
+ this_mode, A, L)) {
+ // Prune V_PRED and H_PRED if source variance of the block is less than
+ // or equal to 50. The source variance threshold is obtained empirically.
+ if ((this_mode == V_PRED || this_mode == H_PRED) && source_variance <= 50)
+ continue;
+
+ // As per the statistics, probability of SMOOTH_PRED being the winner is
+ // low when best mode so far is DC_PRED (out of DC_PRED, V_PRED and
+ // H_PRED). Hence, SMOOTH_PRED mode is not evaluated.
+ if (best_mode == DC_PRED && this_mode == SMOOTH_PRED) continue;
+ }
+
this_rdc.dist = this_rdc.rate = 0;
args.mode = this_mode;
args.skippable = 1;
args.rdc = &this_rdc;
- mi->tx_size = intra_tx_size;
mi->mode = this_mode;
- av1_foreach_transformed_block_in_plane(xd, bsize, 0, estimate_block_intra,
- &args);
+ av1_foreach_transformed_block_in_plane(xd, bsize, AOM_PLANE_Y,
+ av1_estimate_block_intra, &args);
+
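+ // A rate of INT_MAX here presumably indicates the mode was pruned inside
+ // av1_estimate_block_intra (e.g. by the SAD-based pruning enabled above).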
+ if (this_rdc.rate == INT_MAX) continue;
+
const int skip_ctx = av1_get_skip_txfm_context(xd);
if (args.skippable) {
this_rdc.rate = x->mode_costs.skip_txfm_cost[skip_ctx][1];
@@ -2513,10 +1663,19 @@
mi->uv_mode = UV_DC_PRED;
*rd_cost = best_rdc;
+ // For lossless: always force the skip flags off.
+ // Even though blk_skip is already set to 0 above in the rdcost comparison,
+ // do it here again in case the logic above changes.
+ if (is_lossless_requested(&cpi->oxcf.rc_cfg)) {
+ x->txfm_search_info.skip_txfm = 0;
+ memset(ctx->blk_skip, 0,
+ sizeof(x->txfm_search_info.blk_skip[0]) * ctx->num_4x4_blk);
+ }
+
#if CONFIG_INTERNAL_STATS
- store_coding_context(x, ctx, mi->mode);
+ store_coding_context_nonrd(x, ctx, mi->mode);
#else
- store_coding_context(x, ctx);
+ store_coding_context_nonrd(x, ctx);
#endif // CONFIG_INTERNAL_STATS
}
@@ -2588,18 +1747,26 @@
use_alt_ref_frame = 0;
}
- // Skip golden reference if color is set, on flat blocks with motion.
- // For screen: always skip golden (if color_sensitivity_sb_g is set)
+ // Skip golden/altref reference if color is set, on flat blocks with motion.
+ // For screen: always skip golden/alt (if color_sensitivity_sb_g/alt is set)
// except when x->nonrd_prune_ref_frame_search = 0. This latter flag
// may be set in the variance partition when golden is a much better
// reference than last, in which case it may not be worth skipping
- // golden completely.
- if (((cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
+ // golden/altref completely.
+ // Condition on use_last_ref_frame to make sure there remains at least one
+ // reference.
+ if (use_last_ref_frame &&
+ ((cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
x->nonrd_prune_ref_frame_search != 0) ||
- (x->source_variance < 500 &&
- x->content_state_sb.source_sad_nonrd > kLowSad)) &&
- (x->color_sensitivity_sb_g[0] == 1 || x->color_sensitivity_sb_g[1] == 1))
- use_golden_ref_frame = 0;
+ (x->source_variance < 200 &&
+ x->content_state_sb.source_sad_nonrd >= kLowSad))) {
+ if (x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)
+ use_golden_ref_frame = 0;
+ if (x->color_sensitivity_sb_alt[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity_sb_alt[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)
+ use_alt_ref_frame = 0;
+ }
// For non-screen: if golden and altref are not being selected as references
// (use_golden_ref_frame/use_alt_ref_frame = 0) check to allow golden back
@@ -2610,7 +1777,8 @@
(cpi->ref_frame_flags & AOM_LAST_FLAG) && !use_golden_ref_frame &&
!use_alt_ref_frame && x->pred_mv_sad[LAST_FRAME] != INT_MAX &&
x->nonrd_prune_ref_frame_search > 2 &&
- x->color_sensitivity_sb_g[0] == 0 && x->color_sensitivity_sb_g[1] == 0) {
+ x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_V)] == 0) {
int thr = (cm->width * cm->height >= 640 * 360) ? 100 : 150;
int pred = x->pred_mv_sad[LAST_FRAME] >>
(b_width_log2_lookup[bsize] + b_height_log2_lookup[bsize]);
@@ -2628,7 +1796,7 @@
x->content_state_sb.source_sad_nonrd < kHighSad) {
const int buffslot_golden =
cpi->ppi->rtc_ref.ref_idx[GOLDEN_FRAME - LAST_FRAME];
- if (cpi->svc.buffer_time_index[buffslot_golden] ==
+ if (cpi->ppi->rtc_ref.buffer_time_index[buffslot_golden] ==
cpi->svc.current_superframe)
use_golden_ref_frame = 1;
}
@@ -2643,289 +1811,6 @@
assert(use_last_ref_frame || use_golden_ref_frame || use_alt_ref_frame);
}
-// Checks whether Intra mode needs to be pruned based on
-// 'intra_y_mode_bsize_mask_nrd' and 'prune_hv_pred_modes_using_blksad'
-// speed features.
-static INLINE bool is_prune_intra_mode(AV1_COMP *cpi, int mode_index,
- int force_intra_check, BLOCK_SIZE bsize,
- uint8_t segment_id,
- SOURCE_SAD source_sad_nonrd,
- uint8_t color_sensitivity[2]) {
- const PREDICTION_MODE this_mode = intra_mode_list[mode_index];
- if (mode_index > 2 || force_intra_check == 0) {
- if (!((1 << this_mode) & cpi->sf.rt_sf.intra_y_mode_bsize_mask_nrd[bsize]))
- return true;
-
- if (this_mode == DC_PRED) return false;
-
- if (!cpi->sf.rt_sf.prune_hv_pred_modes_using_src_sad) return false;
-
- const bool has_color_sensitivity =
- color_sensitivity[0] && color_sensitivity[1];
- if (has_color_sensitivity &&
- (cpi->rc.frame_source_sad > 1.1 * cpi->rc.avg_source_sad ||
- cyclic_refresh_segment_id_boosted(segment_id) ||
- source_sad_nonrd > kMedSad))
- return false;
-
- return true;
- }
- return false;
-}
-
-/*!\brief Estimates best intra mode for inter mode search
- *
- * \ingroup nonrd_mode_search
- * \callgraph
- * \callergraph
- *
- * Using heuristics based on best inter mode, block size, and other decides
- * whether to check intra modes. If so, estimates and selects best intra mode
- * from the reduced set of intra modes (max 4 intra modes checked)
- *
- * \param[in] cpi Top-level encoder structure
- * \param[in] x Pointer to structure holding all the
- * data for the current macroblock
- * \param[in] bsize Current block size
- * \param[in] best_early_term Flag, indicating that TX for the
- * best inter mode was skipped
- * \param[in] ref_cost_intra Cost of signalling intra mode
- * \param[in] reuse_prediction Flag, indicating prediction re-use
- * \param[in] orig_dst Original destination buffer
- * \param[in] tmp_buffers Pointer to a temporary buffers for
- * prediction re-use
- * \param[out] this_mode_pred Pointer to store prediction buffer
- * for prediction re-use
- * \param[in] best_rdc Pointer to RD cost for the best
- * selected intra mode
- * \param[in] best_pickmode Pointer to a structure containing
- * best mode picked so far
- * \param[in] ctx Pointer to structure holding coding
- * contexts and modes for the block
- *
- * \remark Nothing is returned. Instead, calculated RD cost is placed to
- * \c best_rdc and best selected mode is placed to \c best_pickmode
- */
-static void estimate_intra_mode(
- AV1_COMP *cpi, MACROBLOCK *x, BLOCK_SIZE bsize, int best_early_term,
- unsigned int ref_cost_intra, int reuse_prediction, struct buf_2d *orig_dst,
- PRED_BUFFER *tmp_buffers, PRED_BUFFER **this_mode_pred, RD_STATS *best_rdc,
- BEST_PICKMODE *best_pickmode, PICK_MODE_CONTEXT *ctx) {
- AV1_COMMON *const cm = &cpi->common;
- MACROBLOCKD *const xd = &x->e_mbd;
- MB_MODE_INFO *const mi = xd->mi[0];
- const TxfmSearchParams *txfm_params = &x->txfm_search_params;
- const unsigned char segment_id = mi->segment_id;
- const int *const rd_threshes = cpi->rd.threshes[segment_id][bsize];
- const int *const rd_thresh_freq_fact = x->thresh_freq_fact[bsize];
- const bool is_screen_content =
- cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN;
- struct macroblockd_plane *const pd = &xd->plane[0];
-
- const CommonQuantParams *quant_params = &cm->quant_params;
-
- RD_STATS this_rdc;
-
- int intra_cost_penalty = av1_get_intra_cost_penalty(
- quant_params->base_qindex, quant_params->y_dc_delta_q,
- cm->seq_params->bit_depth);
- int64_t inter_mode_thresh =
- RDCOST(x->rdmult, ref_cost_intra + intra_cost_penalty, 0);
- int perform_intra_pred = cpi->sf.rt_sf.check_intra_pred_nonrd;
- int force_intra_check = 0;
- // For spatial enhancement layer: turn off intra prediction if the
- // previous spatial layer as golden ref is not chosen as best reference.
- // only do this for temporal enhancement layer and on non-key frames.
- if (cpi->svc.spatial_layer_id > 0 &&
- best_pickmode->best_ref_frame != GOLDEN_FRAME &&
- cpi->svc.temporal_layer_id > 0 &&
- !cpi->svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame)
- perform_intra_pred = 0;
-
- int do_early_exit_rdthresh = 1;
-
- uint32_t spatial_var_thresh = 50;
- int motion_thresh = 32;
- // Adjust thresholds to make intra mode likely tested if the other
- // references (golden, alt) are skipped/not checked. For now always
- // adjust for svc mode.
- if (cpi->ppi->use_svc || (cpi->sf.rt_sf.use_nonrd_altref_frame == 0 &&
- cpi->sf.rt_sf.nonrd_prune_ref_frame_search > 0)) {
- spatial_var_thresh = 150;
- motion_thresh = 0;
- }
-
- // Some adjustments to checking intra mode based on source variance.
- if (x->source_variance < spatial_var_thresh) {
- // If the best inter mode is large motion or non-LAST ref reduce intra cost
- // penalty, so intra mode is more likely tested.
- if (best_rdc->rdcost != INT64_MAX &&
- (best_pickmode->best_ref_frame != LAST_FRAME ||
- abs(mi->mv[0].as_mv.row) >= motion_thresh ||
- abs(mi->mv[0].as_mv.col) >= motion_thresh)) {
- intra_cost_penalty = intra_cost_penalty >> 2;
- inter_mode_thresh =
- RDCOST(x->rdmult, ref_cost_intra + intra_cost_penalty, 0);
- do_early_exit_rdthresh = 0;
- }
- if ((x->source_variance < AOMMAX(50, (spatial_var_thresh >> 1)) &&
- x->content_state_sb.source_sad_nonrd >= kHighSad) ||
- (is_screen_content && x->source_variance < 50 &&
- ((bsize >= BLOCK_32X32 &&
- x->content_state_sb.source_sad_nonrd != kZeroSad) ||
- x->color_sensitivity[0] == 1 || x->color_sensitivity[1] == 1)))
- force_intra_check = 1;
- // For big blocks worth checking intra (since only DC will be checked),
- // even if best_early_term is set.
- if (bsize >= BLOCK_32X32) best_early_term = 0;
- } else if (cpi->sf.rt_sf.source_metrics_sb_nonrd &&
- x->content_state_sb.source_sad_nonrd <= kLowSad) {
- perform_intra_pred = 0;
- }
-
- if (best_rdc->skip_txfm && best_pickmode->best_mode_initial_skip_flag) {
- if (cpi->sf.rt_sf.skip_intra_pred == 1 && best_pickmode->best_mode != NEWMV)
- perform_intra_pred = 0;
- else if (cpi->sf.rt_sf.skip_intra_pred == 2)
- perform_intra_pred = 0;
- }
-
- if (!(best_rdc->rdcost == INT64_MAX || force_intra_check ||
- (perform_intra_pred && !best_early_term &&
- bsize <= cpi->sf.part_sf.max_intra_bsize))) {
- return;
- }
-
- // Early exit based on RD cost calculated using known rate. When
- // is_screen_content is true, more bias is given to intra modes. Hence,
- // considered conservative threshold in early exit for the same.
- const int64_t known_rd = is_screen_content
- ? CALC_BIASED_RDCOST(inter_mode_thresh)
- : inter_mode_thresh;
- if (known_rd > best_rdc->rdcost) return;
-
- struct estimate_block_intra_args args = { cpi, x, DC_PRED, 1, 0 };
- TX_SIZE intra_tx_size = AOMMIN(
- AOMMIN(max_txsize_lookup[bsize],
- tx_mode_to_biggest_tx_size[txfm_params->tx_mode_search_type]),
- TX_16X16);
- if (is_screen_content && cpi->rc.high_source_sad &&
- x->source_variance > spatial_var_thresh && bsize <= BLOCK_16X16)
- intra_tx_size = TX_4X4;
-
- PRED_BUFFER *const best_pred = best_pickmode->best_pred;
- if (reuse_prediction && best_pred != NULL) {
- const int bh = block_size_high[bsize];
- const int bw = block_size_wide[bsize];
- if (best_pred->data == orig_dst->buf) {
- *this_mode_pred = &tmp_buffers[get_pred_buffer(tmp_buffers, 3)];
- aom_convolve_copy(best_pred->data, best_pred->stride,
- (*this_mode_pred)->data, (*this_mode_pred)->stride, bw,
- bh);
- best_pickmode->best_pred = *this_mode_pred;
- }
- }
- pd->dst = *orig_dst;
-
- for (int i = 0; i < 4; ++i) {
- const PREDICTION_MODE this_mode = intra_mode_list[i];
- const THR_MODES mode_index = mode_idx[INTRA_FRAME][mode_offset(this_mode)];
- const int64_t mode_rd_thresh = rd_threshes[mode_index];
-
- if (is_prune_intra_mode(cpi, i, force_intra_check, bsize, segment_id,
- x->content_state_sb.source_sad_nonrd,
- x->color_sensitivity))
- continue;
-
- if (is_screen_content && cpi->sf.rt_sf.source_metrics_sb_nonrd) {
- // For spatially flat blocks with zero motion only check
- // DC mode.
- if (x->content_state_sb.source_sad_nonrd == kZeroSad &&
- x->source_variance == 0 && this_mode != DC_PRED)
- continue;
- // Only test Intra for big blocks if spatial_variance is small.
- else if (bsize > BLOCK_32X32 && x->source_variance > 50)
- continue;
- }
-
- if (rd_less_than_thresh(best_rdc->rdcost, mode_rd_thresh,
- rd_thresh_freq_fact[mode_index]) &&
- (do_early_exit_rdthresh || this_mode == SMOOTH_PRED)) {
- continue;
- }
- const BLOCK_SIZE uv_bsize = get_plane_block_size(
- bsize, xd->plane[1].subsampling_x, xd->plane[1].subsampling_y);
-
- mi->mode = this_mode;
- mi->ref_frame[0] = INTRA_FRAME;
- mi->ref_frame[1] = NONE_FRAME;
-
- av1_invalid_rd_stats(&this_rdc);
- args.mode = this_mode;
- args.skippable = 1;
- args.rdc = &this_rdc;
- mi->tx_size = intra_tx_size;
- compute_intra_yprediction(cm, this_mode, bsize, x, xd);
- // Look into selecting tx_size here, based on prediction residual.
- block_yrd(x, &this_rdc, &args.skippable, bsize, mi->tx_size, 0);
- // TODO(kyslov@) Need to account for skippable
- if (x->color_sensitivity[0]) {
- av1_foreach_transformed_block_in_plane(xd, uv_bsize, 1,
- estimate_block_intra, &args);
- }
- if (x->color_sensitivity[1]) {
- av1_foreach_transformed_block_in_plane(xd, uv_bsize, 2,
- estimate_block_intra, &args);
- }
-
- int mode_cost = 0;
- if (av1_is_directional_mode(this_mode) && av1_use_angle_delta(bsize)) {
- mode_cost +=
- x->mode_costs.angle_delta_cost[this_mode - V_PRED]
- [MAX_ANGLE_DELTA +
- mi->angle_delta[PLANE_TYPE_Y]];
- }
- if (this_mode == DC_PRED && av1_filter_intra_allowed_bsize(cm, bsize)) {
- mode_cost += x->mode_costs.filter_intra_cost[bsize][0];
- }
- this_rdc.rate += ref_cost_intra;
- this_rdc.rate += intra_cost_penalty;
- this_rdc.rate += mode_cost;
- this_rdc.rdcost = RDCOST(x->rdmult, this_rdc.rate, this_rdc.dist);
-
- if (is_screen_content && cpi->sf.rt_sf.source_metrics_sb_nonrd) {
- // For blocks with low spatial variance and color sad,
- // favor the intra-modes, only on scene/slide change.
- if (cpi->rc.high_source_sad && x->source_variance < 800 &&
- (x->color_sensitivity[0] || x->color_sensitivity[1]))
- this_rdc.rdcost = CALC_BIASED_RDCOST(this_rdc.rdcost);
- // Otherwise bias against intra for blocks with zero
- // motion and no color, on non-scene/slide changes.
- else if (!cpi->rc.high_source_sad && x->source_variance > 0 &&
- x->content_state_sb.source_sad_nonrd == kZeroSad &&
- x->color_sensitivity[0] == 0 && x->color_sensitivity[1] == 0)
- this_rdc.rdcost = (3 * this_rdc.rdcost) >> 1;
- }
-
- if (this_rdc.rdcost < best_rdc->rdcost) {
- *best_rdc = this_rdc;
- best_pickmode->best_mode = this_mode;
- best_pickmode->best_tx_size = mi->tx_size;
- best_pickmode->best_ref_frame = INTRA_FRAME;
- best_pickmode->best_second_ref_frame = NONE;
- best_pickmode->best_mode_skip_txfm = this_rdc.skip_txfm;
- if (!this_rdc.skip_txfm) {
- memcpy(ctx->blk_skip, x->txfm_search_info.blk_skip,
- sizeof(x->txfm_search_info.blk_skip[0]) * ctx->num_4x4_blk);
- }
- mi->uv_mode = this_mode;
- mi->mv[0].as_int = INVALID_MV;
- mi->mv[1].as_int = INVALID_MV;
- }
- }
- mi->tx_size = best_pickmode->best_tx_size;
-}
-
static AOM_INLINE int is_filter_search_enabled_blk(
AV1_COMP *cpi, MACROBLOCK *x, int mi_row, int mi_col, BLOCK_SIZE bsize,
int segment_id, int cb_pred_filter_search, InterpFilter *filt_select) {
@@ -3043,10 +1928,9 @@
int shift = 3;
if (source_sad_nonrd >= kMedSad &&
cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN &&
- (int64_t) cpi->common.width * (int64_t) cpi->common.height >=
- (int64_t) 640 * 360) {
+ cpi->common.width * cpi->common.height >= 640 * 360)
shift = 4;
- } else if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
+ if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
cpi->rc.high_source_sad) {
shift = 6;
}
@@ -3062,26 +1946,28 @@
noise_level = av1_noise_estimate_extract_level(&cpi->noise_estimate);
if (noise_level == kLow && source_variance > thresh_spatial &&
cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN && norm_sad < 50) {
- x->color_sensitivity[0] = 0;
- x->color_sensitivity[1] = 0;
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] = 0;
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] = 0;
return;
}
const int num_planes = av1_num_planes(&cpi->common);
- for (int i = 1; i < num_planes; ++i) {
- if (x->color_sensitivity[i - 1] == 2 || source_variance < 50) {
- struct macroblock_plane *const p = &x->plane[i];
+
+ for (int plane = AOM_PLANE_U; plane < num_planes; ++plane) {
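+ // COLOR_SENS_IDX() is assumed to map a chroma plane to its slot in the
+ // two-entry color sensitivity arrays (AOM_PLANE_U -> 0, AOM_PLANE_V -> 1).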
+ if (x->color_sensitivity[COLOR_SENS_IDX(plane)] == 2 ||
+ source_variance < 50) {
+ struct macroblock_plane *const p = &x->plane[plane];
const BLOCK_SIZE bs =
get_plane_block_size(bsize, subsampling_x, subsampling_y);
const int uv_sad = cpi->ppi->fn_ptr[bs].sdf(
- p->src.buf, p->src.stride, yv12_mb[i].buf, yv12_mb[i].stride);
+ p->src.buf, p->src.stride, yv12_mb[plane].buf, yv12_mb[plane].stride);
const int norm_uv_sad =
uv_sad >> (b_width_log2_lookup[bs] + b_height_log2_lookup[bs]);
- x->color_sensitivity[i - 1] =
+ x->color_sensitivity[COLOR_SENS_IDX(plane)] =
uv_sad > (y_sad >> shift) && norm_uv_sad > 40;
if (source_variance < 50 && norm_uv_sad > 100)
- x->color_sensitivity[i - 1] = 1;
+ x->color_sensitivity[COLOR_SENS_IDX(plane)] = 1;
}
}
}
@@ -3115,8 +2001,8 @@
*ref_mv_idx = mbmi->ref_mv_idx + 1;
}
-static void set_compound_mode(MACROBLOCK *x, int ref_frame, int ref_frame2,
- int ref_mv_idx,
+static void set_compound_mode(MACROBLOCK *x, MV_REFERENCE_FRAME ref_frame,
+ MV_REFERENCE_FRAME ref_frame2, int ref_mv_idx,
int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES],
PREDICTION_MODE this_mode) {
MACROBLOCKD *const xd = &x->e_mbd;
@@ -3168,7 +2054,7 @@
}
static AOM_FORCE_INLINE void fill_single_inter_mode_costs(
- int (*single_inter_mode_costs)[REF_FRAMES], const int num_inter_modes,
+ int (*single_inter_mode_costs)[REF_FRAMES], int num_inter_modes,
const REF_MODE *reference_mode_set, const ModeCosts *mode_costs,
const int16_t *mode_context) {
bool ref_frame_used[REF_FRAMES] = { false };
@@ -3216,18 +2102,29 @@
PREDICTION_MODE *this_mode, MV_REFERENCE_FRAME *ref_frame,
MV_REFERENCE_FRAME *ref_frame2, int_mv frame_mv[MB_MODE_COUNT][REF_FRAMES],
const int *use_ref_frame_mask, int comp_index,
- bool comp_use_zero_zeromv_only, MV_REFERENCE_FRAME *last_comp_ref_frame) {
+ bool comp_use_zero_zeromv_only, MV_REFERENCE_FRAME *last_comp_ref_frame,
+ BLOCK_SIZE bsize) {
const MV_REFERENCE_FRAME *rf = comp_ref_mode_set[comp_index].ref_frame;
+ int skip_gf = 0;
+ int skip_alt = 0;
*this_mode = comp_ref_mode_set[comp_index].pred_mode;
*ref_frame = rf[0];
*ref_frame2 = rf[1];
assert(*ref_frame == LAST_FRAME);
assert(*this_mode == GLOBAL_GLOBALMV || *this_mode == NEAREST_NEARESTMV);
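+ // On low-variance blocks larger than 16x16, compound candidates pairing
+ // LAST with GOLDEN/ALTREF are dropped when the superblock color
+ // sensitivity is set for that reference (via skip_gf/skip_alt below).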
+ if (x->source_variance < 50 && bsize > BLOCK_16X16) {
+ if (x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)
+ skip_gf = 1;
+ if (x->color_sensitivity_sb_alt[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity_sb_alt[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)
+ skip_alt = 1;
+ }
if (comp_use_zero_zeromv_only && *this_mode != GLOBAL_GLOBALMV) {
return 0;
}
if (*ref_frame2 == GOLDEN_FRAME &&
- (cpi->sf.rt_sf.ref_frame_comp_nonrd[0] == 0 ||
+ (cpi->sf.rt_sf.ref_frame_comp_nonrd[0] == 0 || skip_gf ||
!(cpi->ref_frame_flags & AOM_GOLD_FLAG))) {
return 0;
} else if (*ref_frame2 == LAST2_FRAME &&
@@ -3235,7 +2132,7 @@
!(cpi->ref_frame_flags & AOM_LAST2_FLAG))) {
return 0;
} else if (*ref_frame2 == ALTREF_FRAME &&
- (cpi->sf.rt_sf.ref_frame_comp_nonrd[2] == 0 ||
+ (cpi->sf.rt_sf.ref_frame_comp_nonrd[2] == 0 || skip_alt ||
!(cpi->ref_frame_flags & AOM_ALT_FLAG))) {
return 0;
}
@@ -3313,16 +2210,15 @@
return false;
}
-// Function to setup parameters used for inter mode evaluation.
+// Function to set up parameters used for inter mode evaluation in non-rd.
static AOM_FORCE_INLINE void set_params_nonrd_pick_inter_mode(
AV1_COMP *cpi, MACROBLOCK *x, InterModeSearchStateNonrd *search_state,
- TileDataEnc *tile_data, PICK_MODE_CONTEXT *ctx, RD_STATS *rd_cost,
- int *force_skip_low_temp_var, int *skip_pred_mv, const int mi_row,
- const int mi_col, const int gf_temporal_ref, const unsigned char segment_id,
+ RD_STATS *rd_cost, int *force_skip_low_temp_var, int *skip_pred_mv,
+ int mi_row, int mi_col, int gf_temporal_ref, unsigned char segment_id,
BLOCK_SIZE bsize
#if CONFIG_AV1_TEMPORAL_DENOISING
,
- int denoise_svc_pickmode
+ PICK_MODE_CONTEXT *ctx, int denoise_svc_pickmode
#endif
) {
AV1_COMMON *const cm = &cpi->common;
@@ -3330,8 +2226,9 @@
TxfmSearchInfo *txfm_info = &x->txfm_search_info;
MB_MODE_INFO *const mi = xd->mi[0];
const ModeCosts *mode_costs = &x->mode_costs;
- (void)ctx;
+ // Initialize variance and distortion (chroma) for all modes and reference
+ // frames.
for (int idx = 0; idx < RTC_INTER_MODES; idx++) {
for (int ref = 0; ref < REF_FRAMES; ref++) {
search_state->vars[idx][ref] = UINT_MAX;
@@ -3339,23 +2236,26 @@
}
}
- x->color_sensitivity[0] = x->color_sensitivity_sb[0];
- x->color_sensitivity[1] = x->color_sensitivity_sb[1];
+ // Initialize color sensitivity from the superblock-level values.
+ av1_copy(x->color_sensitivity, x->color_sensitivity_sb);
+
init_best_pickmode(&search_state->best_pickmode);
+ // Estimate costs for single reference frames.
estimate_single_ref_frame_costs(cm, xd, mode_costs, segment_id, bsize,
search_state->ref_costs_single);
- memset(&search_state->mode_checked[0][0], 0, MB_MODE_COUNT * REF_FRAMES);
+ // Reset the flags that track which modes have been evaluated.
+ av1_zero(search_state->mode_checked);
txfm_info->skip_txfm = 0;
- // initialize mode decisions
+ // Initialize mode decisions
av1_invalid_rd_stats(&search_state->best_rdc);
av1_invalid_rd_stats(&search_state->this_rdc);
av1_invalid_rd_stats(rd_cost);
- for (int i = 0; i < REF_FRAMES; ++i) {
- x->warp_sample_info[i].num = -1;
+ for (int ref_idx = 0; ref_idx < REF_FRAMES; ++ref_idx) {
+ x->warp_sample_info[ref_idx].num = -1;
}
mi->bsize = bsize;
@@ -3371,25 +2271,28 @@
}
#endif
+ // Populate predicted motion vectors for LAST_FRAME.
if (cpi->ref_frame_flags & AOM_LAST_FLAG)
- find_predictors(cpi, x, LAST_FRAME, search_state->frame_mv, tile_data,
+ find_predictors(cpi, x, LAST_FRAME, search_state->frame_mv,
search_state->yv12_mb, bsize, *force_skip_low_temp_var,
x->force_zeromv_skip_for_blk);
+ // Update the mask of reference frames to be used.
get_ref_frame_use_mask(cpi, x, mi, mi_row, mi_col, bsize, gf_temporal_ref,
search_state->use_ref_frame_mask,
force_skip_low_temp_var);
- *skip_pred_mv =
- x->force_zeromv_skip_for_blk ||
- (x->nonrd_prune_ref_frame_search > 2 && x->color_sensitivity[0] != 2 &&
- x->color_sensitivity[1] != 2);
+ *skip_pred_mv = x->force_zeromv_skip_for_blk ||
+ (x->nonrd_prune_ref_frame_search > 2 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] != 2 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] != 2);
+ // Populate predicted motion vectors for the other single reference frames.
// Start at LAST_FRAME + 1.
for (MV_REFERENCE_FRAME ref_frame_iter = LAST_FRAME + 1;
ref_frame_iter <= ALTREF_FRAME; ++ref_frame_iter) {
if (search_state->use_ref_frame_mask[ref_frame_iter]) {
- find_predictors(cpi, x, ref_frame_iter, search_state->frame_mv, tile_data,
+ find_predictors(cpi, x, ref_frame_iter, search_state->frame_mv,
search_state->yv12_mb, bsize, *force_skip_low_temp_var,
*skip_pred_mv);
}
@@ -3400,52 +2303,60 @@
// speed features settings.
static AOM_FORCE_INLINE bool skip_inter_mode_nonrd(
AV1_COMP *cpi, MACROBLOCK *x, InterModeSearchStateNonrd *search_state,
- int64_t *thresh_sad_pred, int *force_mv_inter_layer, int *comp_pred,
+ int64_t *thresh_sad_pred, int *force_mv_inter_layer, int *is_single_pred,
PREDICTION_MODE *this_mode, MV_REFERENCE_FRAME *last_comp_ref_frame,
MV_REFERENCE_FRAME *ref_frame, MV_REFERENCE_FRAME *ref_frame2, int idx,
- int svc_mv_col, int svc_mv_row, int force_skip_low_temp_var,
- unsigned int sse_zeromv_norm, const int num_inter_modes,
- const unsigned char segment_id, BLOCK_SIZE bsize,
+ int_mv svc_mv, int force_skip_low_temp_var, unsigned int sse_zeromv_norm,
+ int num_inter_modes, unsigned char segment_id, BLOCK_SIZE bsize,
bool comp_use_zero_zeromv_only, bool check_globalmv) {
AV1_COMMON *const cm = &cpi->common;
const struct segmentation *const seg = &cm->seg;
const SVC *const svc = &cpi->svc;
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mi = xd->mi[0];
+ const REAL_TIME_SPEED_FEATURES *const rt_sf = &cpi->sf.rt_sf;
+ // Skip compound modes based on the reference frame mask and the mode type;
+ // for allowed compound modes, set up the ref-mv stack and reference frames.
if (idx >= num_inter_modes) {
const int comp_index = idx - num_inter_modes;
if (!setup_compound_params_from_comp_idx(
cpi, x, search_state->yv12_mb, this_mode, ref_frame, ref_frame2,
search_state->frame_mv, search_state->use_ref_frame_mask,
- comp_index, comp_use_zero_zeromv_only, last_comp_ref_frame)) {
+ comp_index, comp_use_zero_zeromv_only, last_comp_ref_frame,
+ bsize)) {
return true;
}
- *comp_pred = 1;
+ *is_single_pred = 0;
} else {
*this_mode = ref_mode_set[idx].pred_mode;
*ref_frame = ref_mode_set[idx].ref_frame;
*ref_frame2 = NONE_FRAME;
}
- if (!*comp_pred && search_state->mode_checked[*this_mode][*ref_frame]) {
+ // Skip a single reference mode whose mode-checked flag is already set.
+ if (*is_single_pred && search_state->mode_checked[*this_mode][*ref_frame]) {
return true;
}
+ // Skip GLOBALMV mode if the check_globalmv flag is not enabled.
if (!check_globalmv && *this_mode == GLOBALMV) {
return true;
}
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer1);
- ms_stat.num_searches[bsize][*this_mode]++;
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer1);
+ x->ms_stat_nonrd.num_searches[bsize][*this_mode]++;
#endif
mi->mode = *this_mode;
mi->ref_frame[0] = *ref_frame;
mi->ref_frame[1] = *ref_frame2;
+ // Skip the mode if its reference frame is disabled in use_ref_frame_mask.
if (!search_state->use_ref_frame_mask[*ref_frame]) return true;
+ // Skip certain modes and reference frames when the
+ // force_zeromv_skip_for_blk flag is true.
if (x->force_zeromv_skip_for_blk &&
((!(*this_mode == NEARESTMV &&
search_state->frame_mv[*this_mode][*ref_frame].as_int == 0) &&
@@ -3453,7 +2364,9 @@
*ref_frame != LAST_FRAME))
return true;
- if (cpi->sf.rt_sf.prune_compoundmode_with_singlemode_var && *comp_pred &&
+ // Skip compound mode based on variance of previously evaluated single
+ // reference modes.
+ if (rt_sf->prune_compoundmode_with_singlemode_var && !*is_single_pred &&
prune_compoundmode_with_singlemode_var(
*this_mode, *ref_frame, *ref_frame2, search_state->frame_mv,
search_state->mode_checked, search_state->vars,
@@ -3466,17 +2379,14 @@
((*ref_frame == LAST_FRAME && svc->skip_mvsearch_last) ||
(*ref_frame == GOLDEN_FRAME && svc->skip_mvsearch_gf) ||
(*ref_frame == ALTREF_FRAME && svc->skip_mvsearch_altref))) {
- // Only test mode if NEARESTMV/NEARMV is (svc_mv_col, svc_mv_row),
- // otherwise set NEWMV to (svc_mv_col, svc_mv_row).
+ // Only test mode if NEARESTMV/NEARMV is (svc_mv.as_mv.col, svc_mv.as_mv.row),
+ // otherwise set NEWMV to (svc_mv.as_mv.col, svc_mv.as_mv.row).
// Skip newmv and filter search.
*force_mv_inter_layer = 1;
if (*this_mode == NEWMV) {
- search_state->frame_mv[*this_mode][*ref_frame].as_mv.col = svc_mv_col;
- search_state->frame_mv[*this_mode][*ref_frame].as_mv.row = svc_mv_row;
- } else if (search_state->frame_mv[*this_mode][*ref_frame].as_mv.col !=
- svc_mv_col ||
- search_state->frame_mv[*this_mode][*ref_frame].as_mv.row !=
- svc_mv_row) {
+ search_state->frame_mv[*this_mode][*ref_frame] = svc_mv;
+ } else if (search_state->frame_mv[*this_mode][*ref_frame].as_int !=
+ svc_mv.as_int) {
return true;
}
}
@@ -3497,12 +2407,13 @@
// For the latter condition: the same condition should apply
// to newmv if (0, 0), so this latter condition is repeated
// below after search_new_mv.
- if (cpi->sf.rt_sf.source_metrics_sb_nonrd) {
+ if (rt_sf->source_metrics_sb_nonrd) {
if ((search_state->frame_mv[*this_mode][*ref_frame].as_int != 0 &&
x->content_state_sb.source_sad_nonrd == kZeroSad) ||
(search_state->frame_mv[*this_mode][*ref_frame].as_int == 0 &&
x->content_state_sb.source_sad_nonrd != kZeroSad &&
- ((x->color_sensitivity[0] == 0 && x->color_sensitivity[1] == 0) ||
+ ((x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 0) ||
cpi->rc.high_source_sad) &&
x->source_variance == 0))
return true;
@@ -3511,15 +2422,19 @@
if (*this_mode == NEWMV && x->source_variance < 100) return true;
// Skip non-LAST for color on flat blocks.
if (*ref_frame > LAST_FRAME && x->source_variance == 0 &&
- (x->color_sensitivity[0] == 1 || x->color_sensitivity[1] == 1))
+ (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 1 ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 1))
return true;
}
+ // Skip the mode based on block size, reference frame and other block
+ // properties.
if (skip_mode_by_bsize_and_ref_frame(
*this_mode, *ref_frame, bsize, x->nonrd_prune_ref_frame_search,
- sse_zeromv_norm, cpi->sf.rt_sf.nonrd_aggressive_skip))
+ sse_zeromv_norm, rt_sf->nonrd_aggressive_skip))
return true;
+ // Skip the mode based on low temporal variance and source sad.
if (skip_mode_by_low_temp(*this_mode, *ref_frame, bsize, x->content_state_sb,
search_state->frame_mv[*this_mode][*ref_frame],
force_skip_low_temp_var))
@@ -3530,7 +2445,7 @@
// end up unable to pick any mode.
if (!segfeature_active(seg, segment_id, SEG_LVL_REF_FRAME)) {
// Check for skipping GOLDEN and ALTREF based pred_mv_sad.
- if (cpi->sf.rt_sf.nonrd_prune_ref_frame_search > 0 &&
+ if (rt_sf->nonrd_prune_ref_frame_search > 0 &&
x->pred_mv_sad[*ref_frame] != INT_MAX && *ref_frame != LAST_FRAME) {
if ((int64_t)(x->pred_mv_sad[*ref_frame]) > *thresh_sad_pred) return true;
}
@@ -3541,19 +2456,607 @@
x->pred_mv1_sad[*ref_frame] > (x->pred_mv0_sad[*ref_frame] << 1))
return true;
- if (!*comp_pred) {
+ // Skip the single reference mode based on the RD threshold.
+ if (*is_single_pred) {
if (skip_mode_by_threshold(
*this_mode, *ref_frame,
search_state->frame_mv[*this_mode][*ref_frame],
cpi->rc.frames_since_golden, cpi->rd.threshes[segment_id][bsize],
x->thresh_freq_fact[bsize], search_state->best_rdc.rdcost,
search_state->best_pickmode.best_mode_skip_txfm,
- (cpi->sf.rt_sf.nonrd_aggressive_skip ? 1 : 0)))
+ (rt_sf->nonrd_aggressive_skip ? 1 : 0)))
return true;
}
return false;
}
+// Function to perform inter mode evaluation for non-rd.
+static AOM_FORCE_INLINE bool handle_inter_mode_nonrd(
+ AV1_COMP *cpi, MACROBLOCK *x, InterModeSearchStateNonrd *search_state,
+ PICK_MODE_CONTEXT *ctx, PRED_BUFFER **this_mode_pred,
+ PRED_BUFFER *tmp_buffer, InterPredParams inter_pred_params_sr,
+ int *best_early_term, unsigned int *sse_zeromv_norm, bool *check_globalmv,
+#if CONFIG_AV1_TEMPORAL_DENOISING
+ int64_t *zero_last_cost_orig, int denoise_svc_pickmode,
+#endif
+ int idx, int force_mv_inter_layer, int is_single_pred, int skip_pred_mv,
+ int gf_temporal_ref, int use_model_yrd_large, int filter_search_enabled_blk,
+ BLOCK_SIZE bsize, PREDICTION_MODE this_mode, InterpFilter filt_select,
+ int cb_pred_filter_search, int reuse_inter_pred) {
+ AV1_COMMON *const cm = &cpi->common;
+ MACROBLOCKD *const xd = &x->e_mbd;
+ MB_MODE_INFO *const mi = xd->mi[0];
+ const MB_MODE_INFO_EXT *const mbmi_ext = &x->mbmi_ext;
+ const int mi_row = xd->mi_row;
+ const int mi_col = xd->mi_col;
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
+ const int bw = block_size_wide[bsize];
+ const InterpFilter filter_ref = cm->features.interp_filter;
+ const InterpFilter default_interp_filter = EIGHTTAP_REGULAR;
+ TxfmSearchInfo *txfm_info = &x->txfm_search_info;
+ const ModeCosts *mode_costs = &x->mode_costs;
+ const REAL_TIME_SPEED_FEATURES *const rt_sf = &cpi->sf.rt_sf;
+ BEST_PICKMODE *const best_pickmode = &search_state->best_pickmode;
+
+ MV_REFERENCE_FRAME ref_frame = mi->ref_frame[0];
+ MV_REFERENCE_FRAME ref_frame2 = mi->ref_frame[1];
+ int_mv *const this_mv = &search_state->frame_mv[this_mode][ref_frame];
+ unsigned int var = UINT_MAX;
+ int this_early_term = 0;
+ int rate_mv = 0;
+ int is_skippable;
+ int skip_this_mv = 0;
+ unsigned int var_threshold = UINT_MAX;
+ PREDICTION_MODE this_best_mode;
+ RD_STATS nonskip_rdc;
+ av1_invalid_rd_stats(&nonskip_rdc);
+
+ if (this_mode == NEWMV && !force_mv_inter_layer) {
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer2);
+#endif
+ // Find the best motion vector for single/compound mode.
+ const bool skip_newmv = search_new_mv(
+ cpi, x, search_state->frame_mv, ref_frame, gf_temporal_ref, bsize,
+ mi_row, mi_col, &rate_mv, &search_state->best_rdc);
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer2);
+ x->ms_stat_nonrd.ms_time[bsize][this_mode] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer2);
+#endif
+ // Skip NEWMV mode:
+ // (i) for bsize smaller than 16x16,
+ // (ii) based on the SAD of the predicted MV w.r.t. LAST_FRAME,
+ // (iii) when the motion vector is the same as the reference MV.
+ if (skip_newmv) {
+ return true;
+ }
+ }
+
+ // Check whether the current motion vector matches one of the previously
+ // evaluated motion vectors.
+ for (PREDICTION_MODE inter_mv_mode = NEARESTMV; inter_mv_mode <= NEWMV;
+ inter_mv_mode++) {
+ if (inter_mv_mode == this_mode) continue;
+ if (is_single_pred &&
+ search_state->mode_checked[inter_mv_mode][ref_frame] &&
+ this_mv->as_int ==
+ search_state->frame_mv[inter_mv_mode][ref_frame].as_int) {
+ skip_this_mv = 1;
+ break;
+ }
+ }
+
+ // Skip a single reference mode if its motion vector duplicates one of the
+ // previously evaluated motion vectors.
+ if (skip_this_mv && is_single_pred) return true;
+
+ // For screen content: on spatially flat blocks with non-zero source motion,
+ // skip NEWMV when its motion vector is (0, 0) and color sensitivity is not
+ // set.
+ if (this_mode == NEWMV && cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
+ cpi->svc.spatial_layer_id == 0 && rt_sf->source_metrics_sb_nonrd) {
+ if (this_mv->as_int == 0 &&
+ x->content_state_sb.source_sad_nonrd != kZeroSad &&
+ ((x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] == 0 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] == 0) ||
+ cpi->rc.high_source_sad) &&
+ x->source_variance == 0)
+ return true;
+ }
+
+ mi->mode = this_mode;
+ mi->mv[0].as_int = this_mv->as_int;
+ mi->mv[1].as_int = 0;
+ if (!is_single_pred)
+ mi->mv[1].as_int = search_state->frame_mv[this_mode][ref_frame2].as_int;
+
+ // Set buffers to store predicted samples for reuse.
+ if (reuse_inter_pred) {
+ if (!*this_mode_pred) {
+ *this_mode_pred = &tmp_buffer[3];
+ } else {
+ *this_mode_pred = &tmp_buffer[get_pred_buffer(tmp_buffer, 3)];
+ pd->dst.buf = (*this_mode_pred)->data;
+ pd->dst.stride = bw;
+ }
+ }
+
+ if (idx == 0 && !skip_pred_mv) {
+ // Set color sensitivity on first tested mode only.
+ // Use y-sad already computed in find_predictors: take the sad with motion
+ // vector closest to 0; the uv-sad computed below in set_color_sensitivity
+ // is for zeromv.
+ // For screen: first check if the golden reference is being used; if so,
+ // force color_sensitivity on when the superblock-level color sensitivity
+ // (sb_g) is on.
+ if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
+ search_state->use_ref_frame_mask[GOLDEN_FRAME]) {
+ if (x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_U)] == 1)
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] = 1;
+ if (x->color_sensitivity_sb_g[COLOR_SENS_IDX(AOM_PLANE_V)] == 1)
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] = 1;
+ } else {
+ int y_sad = x->pred_mv0_sad[LAST_FRAME];
+ if (x->pred_mv1_sad[LAST_FRAME] != INT_MAX &&
+ (abs(search_state->frame_mv[NEARMV][LAST_FRAME].as_mv.col) +
+ abs(search_state->frame_mv[NEARMV][LAST_FRAME].as_mv.row)) <
+ (abs(search_state->frame_mv[NEARESTMV][LAST_FRAME].as_mv.col) +
+ abs(search_state->frame_mv[NEARESTMV][LAST_FRAME].as_mv.row)))
+ y_sad = x->pred_mv1_sad[LAST_FRAME];
+ set_color_sensitivity(cpi, x, bsize, y_sad, x->source_variance,
+ search_state->yv12_mb[LAST_FRAME]);
+ }
+ }
+
+ mi->motion_mode = SIMPLE_TRANSLATION;
+#if !CONFIG_REALTIME_ONLY
+ if (cpi->oxcf.motion_mode_cfg.allow_warped_motion) {
+ calc_num_proj_ref(cpi, x, mi);
+ }
+#endif
+ // Set the variance threshold for compound mode pruning.
+ if (rt_sf->prune_compoundmode_with_singlecompound_var && !is_single_pred &&
+ use_model_yrd_large) {
+ const PREDICTION_MODE single_mode0 = compound_ref0_mode(this_mode);
+ const PREDICTION_MODE single_mode1 = compound_ref1_mode(this_mode);
+ var_threshold =
+ AOMMIN(var_threshold,
+ search_state->vars[INTER_OFFSET(single_mode0)][ref_frame]);
+ var_threshold =
+ AOMMIN(var_threshold,
+ search_state->vars[INTER_OFFSET(single_mode1)][ref_frame2]);
+ }
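+ // var_threshold now holds the smaller of the two constituent single-mode
+ // variances; a compound candidate whose modeled variance exceeds it is
+ // pruned right after prediction below.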
+
+ // Decide the interpolation filter, build the prediction signal, get the SSE.
+ const bool is_mv_subpel =
+ (mi->mv[0].as_mv.row & 0x07) || (mi->mv[0].as_mv.col & 0x07);
+ const bool enable_filt_search_this_mode =
+ (filter_search_enabled_blk == 2)
+ ? true
+ : (filter_search_enabled_blk && !force_mv_inter_layer &&
+ is_single_pred &&
+ (ref_frame == LAST_FRAME || !x->nonrd_prune_ref_frame_search));
+ if (is_mv_subpel && enable_filt_search_this_mode) {
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer2);
+#endif
+ search_filter_ref(
+ cpi, x, &search_state->this_rdc, &inter_pred_params_sr, mi_row, mi_col,
+ tmp_buffer, bsize, reuse_inter_pred, this_mode_pred, &this_early_term,
+ &var, use_model_yrd_large, best_pickmode->best_sse, is_single_pred);
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer2);
+ x->ms_stat_nonrd.ifs_time[bsize][this_mode] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer2);
+#endif
+#if !CONFIG_REALTIME_ONLY
+ } else if (cpi->oxcf.motion_mode_cfg.allow_warped_motion &&
+ this_mode == NEWMV) {
+ // Find the best motion mode when the current mode is NEWMV.
+ search_motion_mode(cpi, x, &search_state->this_rdc, mi_row, mi_col, bsize,
+ &this_early_term, use_model_yrd_large, &rate_mv,
+ best_pickmode->best_sse);
+ if (this_mode == NEWMV) {
+ this_mv[0] = mi->mv[0];
+ }
+#endif
+ } else {
+ mi->interp_filters =
+ (filter_ref == SWITCHABLE)
+ ? av1_broadcast_interp_filter(default_interp_filter)
+ : av1_broadcast_interp_filter(filter_ref);
+ if (force_mv_inter_layer)
+ mi->interp_filters = av1_broadcast_interp_filter(EIGHTTAP_REGULAR);
+
+ // If it is sub-pel motion and cb_pred_filter_search is enabled, select
+ // the pre-decided filter.
+ if (is_mv_subpel && cb_pred_filter_search)
+ mi->interp_filters = av1_broadcast_interp_filter(filt_select);
+
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer2);
+#endif
+ if (is_single_pred) {
+ SubpelParams subpel_params;
+ // Initialize inter mode level params for single reference mode.
+ init_inter_mode_params(&mi->mv[0].as_mv, &inter_pred_params_sr,
+ &subpel_params, xd->block_ref_scale_factors[0],
+ pd->pre->width, pd->pre->height);
+ av1_enc_build_inter_predictor_y_nonrd(xd, &inter_pred_params_sr,
+ &subpel_params);
+ } else {
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_Y, AOM_PLANE_Y);
+ }
+
+ if (use_model_yrd_large) {
+ model_skip_for_sb_y_large(cpi, bsize, mi_row, mi_col, x, xd,
+ &search_state->this_rdc, &this_early_term, 0,
+ best_pickmode->best_sse, &var, var_threshold);
+ } else {
+ model_rd_for_sb_y(cpi, bsize, x, xd, &search_state->this_rdc, &var, 0,
+ &this_early_term);
+ }
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer2);
+ x->ms_stat_nonrd.model_rd_time[bsize][this_mode] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer2);
+#endif
+ }
+
+ // Update the variance for the single reference mode.
+ if (is_single_pred) {
+ search_state->vars[INTER_OFFSET(this_mode)][ref_frame] = var;
+ if (this_mv->as_int == 0) {
+ search_state->vars[INTER_OFFSET(GLOBALMV)][ref_frame] = var;
+ }
+ }
+ // Prune the compound mode based on the single mode variance threshold.
+ if (!is_single_pred && var > var_threshold) {
+ if (reuse_inter_pred) free_pred_buffer(*this_mode_pred);
+ return true;
+ }
+
+ if (ref_frame == LAST_FRAME && this_mv->as_int == 0) {
+ *sse_zeromv_norm = (unsigned int)(search_state->this_rdc.sse >>
+ (b_width_log2_lookup[bsize] +
+ b_height_log2_lookup[bsize]));
+ }
+
+ // Perform early termination based on sse.
+ if (rt_sf->sse_early_term_inter_search &&
+ early_term_inter_search_with_sse(rt_sf->sse_early_term_inter_search,
+ bsize, search_state->this_rdc.sse,
+ best_pickmode->best_sse, this_mode)) {
+ if (reuse_inter_pred) free_pred_buffer(*this_mode_pred);
+ return true;
+ }
+
+#if COLLECT_NONRD_PICK_MODE_STAT
+ x->ms_stat_nonrd.num_nonskipped_searches[bsize][this_mode]++;
+#endif
+
+ const int skip_ctx = av1_get_skip_txfm_context(xd);
+ const int skip_txfm_cost = mode_costs->skip_txfm_cost[skip_ctx][1];
+ const int no_skip_txfm_cost = mode_costs->skip_txfm_cost[skip_ctx][0];
+ const int64_t sse_y = search_state->this_rdc.sse;
+
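+ // On early termination, treat the block as skip: charge only the skip_txfm
+ // flag cost and approximate the distortion from the modeled SSE (scaled
+ // into the distortion domain used by RDCOST).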
+ if (this_early_term) {
+ search_state->this_rdc.skip_txfm = 1;
+ search_state->this_rdc.rate = skip_txfm_cost;
+ search_state->this_rdc.dist = search_state->this_rdc.sse << 4;
+ } else {
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer2);
+#endif
+ // Calculate the RD cost using the Hadamard transform.
+ av1_block_yrd(x, &search_state->this_rdc, &is_skippable, bsize,
+ mi->tx_size);
+ if (search_state->this_rdc.skip_txfm ||
+ RDCOST(x->rdmult, search_state->this_rdc.rate,
+ search_state->this_rdc.dist) >=
+ RDCOST(x->rdmult, 0, search_state->this_rdc.sse)) {
+ if (!search_state->this_rdc.skip_txfm) {
+ // Need to store "real" rdc for possible future use if UV rdc
+ // disallows tx skip
+ nonskip_rdc = search_state->this_rdc;
+ nonskip_rdc.rate += no_skip_txfm_cost;
+ }
+ search_state->this_rdc.rate = skip_txfm_cost;
+ search_state->this_rdc.skip_txfm = 1;
+ search_state->this_rdc.dist = search_state->this_rdc.sse;
+ } else {
+ search_state->this_rdc.rate += no_skip_txfm_cost;
+ }
+
+ // Populate predicted samples for chroma planes based on color sensitivity.
+ if ((x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)])) {
+ RD_STATS rdc_uv;
+ const BLOCK_SIZE uv_bsize =
+ get_plane_block_size(bsize, xd->plane[AOM_PLANE_U].subsampling_x,
+ xd->plane[AOM_PLANE_U].subsampling_y);
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)]) {
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_U, AOM_PLANE_U);
+ }
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]) {
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_V, AOM_PLANE_V);
+ }
+ // Compute sse for chroma planes.
+ const int64_t sse_uv = av1_model_rd_for_sb_uv(
+ cpi, uv_bsize, x, xd, &rdc_uv, AOM_PLANE_U, AOM_PLANE_V);
+ search_state->this_rdc.sse += sse_uv;
+ // Restore Y rdc if UV rdc disallows txfm skip
+ if (search_state->this_rdc.skip_txfm && !rdc_uv.skip_txfm &&
+ nonskip_rdc.rate != INT_MAX)
+ search_state->this_rdc = nonskip_rdc;
+ if (is_single_pred) {
+ search_state->uv_dist[INTER_OFFSET(this_mode)][ref_frame] = rdc_uv.dist;
+ }
+ search_state->this_rdc.rate += rdc_uv.rate;
+ search_state->this_rdc.dist += rdc_uv.dist;
+ search_state->this_rdc.skip_txfm =
+ search_state->this_rdc.skip_txfm && rdc_uv.skip_txfm;
+ }
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer2);
+ x->ms_stat_nonrd.txfm_time[bsize][this_mode] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer2);
+#endif
+ }
+
+ this_best_mode = this_mode;
+ // TODO(kyslov) account for UV prediction cost
+ search_state->this_rdc.rate += rate_mv;
+ if (!is_single_pred) {
+ const int16_t mode_ctx =
+ av1_mode_context_analyzer(mbmi_ext->mode_context, mi->ref_frame);
+ search_state->this_rdc.rate += cost_mv_ref(mode_costs, this_mode, mode_ctx);
+ } else {
+ // If the current mode has zeromv but is not GLOBALMV, compare the rate
+ // cost. If GLOBALMV is cheaper, use GLOBALMV instead.
+ if (this_mode != GLOBALMV &&
+ this_mv->as_int == search_state->frame_mv[GLOBALMV][ref_frame].as_int) {
+ if (is_globalmv_better(this_mode, ref_frame, rate_mv, mode_costs,
+ search_state->single_inter_mode_costs, mbmi_ext)) {
+ this_best_mode = GLOBALMV;
+ }
+ }
+
+ search_state->this_rdc.rate +=
+ search_state
+ ->single_inter_mode_costs[INTER_OFFSET(this_best_mode)][ref_frame];
+ }
+
+ if (is_single_pred && this_mv->as_int == 0 && var < UINT_MAX) {
+ search_state->vars[INTER_OFFSET(GLOBALMV)][ref_frame] = var;
+ }
+
+ search_state->this_rdc.rate += search_state->ref_costs_single[ref_frame];
+
+ search_state->this_rdc.rdcost = RDCOST(x->rdmult, search_state->this_rdc.rate,
+ search_state->this_rdc.dist);
+ if (cpi->oxcf.rc_cfg.mode == AOM_CBR && is_single_pred) {
+ newmv_diff_bias(xd, this_best_mode, &search_state->this_rdc, bsize,
+ search_state->frame_mv[this_best_mode][ref_frame].as_mv.row,
+ search_state->frame_mv[this_best_mode][ref_frame].as_mv.col,
+ cpi->speed, x->source_variance, x->content_state_sb);
+ }
+
+#if CONFIG_AV1_TEMPORAL_DENOISING
+ if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc_pickmode &&
+ cpi->denoiser.denoising_level > kDenLowLow) {
+ av1_denoiser_update_frame_stats(mi, sse_y, this_mode, ctx);
+ // Keep track of zero_last cost.
+ if (ref_frame == LAST_FRAME && this_mv->as_int == 0)
+ *zero_last_cost_orig = search_state->this_rdc.rdcost;
+ }
+#else
+ (void)(sse_y);
+#endif
+
+ search_state->mode_checked[this_mode][ref_frame] = 1;
+ search_state->mode_checked[this_best_mode][ref_frame] = 1;
+
+ if (*check_globalmv) {
+ int32_t abs_mv =
+ abs(search_state->frame_mv[this_best_mode][ref_frame].as_mv.row) +
+ abs(search_state->frame_mv[this_best_mode][ref_frame].as_mv.col);
+ // Early exit check: if the magnitude of this_best_mode's mv is small
+ // enough, we skip GLOBALMV check in the next loop iteration.
+ if (abs_mv < 2) {
+ *check_globalmv = false;
+ }
+ }
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer1);
+ x->ms_stat_nonrd.nonskipped_search_times[bsize][this_mode] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer1);
+#endif
+
+ // Copy best mode params to search state
+ if (search_state->this_rdc.rdcost < search_state->best_rdc.rdcost) {
+ search_state->best_rdc = search_state->this_rdc;
+ *best_early_term = this_early_term;
+ update_search_state_nonrd(search_state, mi, txfm_info, &nonskip_rdc, ctx,
+ this_best_mode, sse_y);
+
+ // This is needed for the compound modes.
+ search_state->frame_mv_best[this_best_mode][ref_frame].as_int =
+ search_state->frame_mv[this_best_mode][ref_frame].as_int;
+ if (ref_frame2 > NONE_FRAME) {
+ search_state->frame_mv_best[this_best_mode][ref_frame2].as_int =
+ search_state->frame_mv[this_best_mode][ref_frame2].as_int;
+ }
+
+ if (reuse_inter_pred) {
+ free_pred_buffer(best_pickmode->best_pred);
+ best_pickmode->best_pred = *this_mode_pred;
+ }
+ } else {
+ if (reuse_inter_pred) free_pred_buffer(*this_mode_pred);
+ }
+
+ if (*best_early_term && (idx > 0 || rt_sf->nonrd_aggressive_skip)) {
+ txfm_info->skip_txfm = 1;
+ return false;
+ }
+ return true;
+}
+
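/*
 * Editor's sketch (not part of the patch): the skip-vs-no-skip transform
 * decision in handle_inter_mode_nonrd() forces skip_txfm whenever coding the
 * residual is modelled to be no cheaper than signalling skip and paying the
 * full SSE as distortion. The rdcost() below only mirrors the shape of
 * libaom's RDCOST macro; the shift constants are illustrative assumptions,
 * not the library's exact values.
 */
#include <stdint.h>
#include <stdio.h>

#define RDDIV_BITS 7 /* assumed distortion scaling */
#define RATE_SHIFT 9 /* assumed probability-cost shift */

static int64_t rdcost(int rdmult, int rate, int64_t dist) {
  return (((int64_t)rate * rdmult) >> RATE_SHIFT) + (dist << RDDIV_BITS);
}

int main(void) {
  const int rdmult = 128;
  const int rate = 900;     /* modelled bits for coding the residual */
  const int skip_rate = 40; /* bits for the skip_txfm flag itself */
  const int64_t dist = 5000, sse = 5200;
  /* Same comparison as the patch: residual coding vs. rate-free skip. */
  const int skip_txfm = rdcost(rdmult, rate, dist) >= rdcost(rdmult, 0, sse);
  printf("skip_txfm=%d rate=%d dist=%lld\n", skip_txfm,
         skip_txfm ? skip_rate : rate, (long long)(skip_txfm ? sse : dist));
  return 0;
}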
+// Function to perform screen content mode evaluation for non-rd
+static AOM_FORCE_INLINE void handle_screen_content_mode_nonrd(
+ AV1_COMP *cpi, MACROBLOCK *x, InterModeSearchStateNonrd *search_state,
+ PRED_BUFFER *this_mode_pred, PICK_MODE_CONTEXT *ctx,
+ PRED_BUFFER *tmp_buffer, struct buf_2d *orig_dst, int skip_idtx_palette,
+ int try_palette, BLOCK_SIZE bsize, int reuse_inter_pred, int mi_col,
+ int mi_row) {
+ AV1_COMMON *const cm = &cpi->common;
+ const REAL_TIME_SPEED_FEATURES *const rt_sf = &cpi->sf.rt_sf;
+ MACROBLOCKD *const xd = &x->e_mbd;
+ MB_MODE_INFO *const mi = xd->mi[0];
+ struct macroblockd_plane *const pd = &xd->plane[0];
+ const int bw = block_size_wide[bsize];
+ const int bh = block_size_high[bsize];
+ TxfmSearchInfo *txfm_info = &x->txfm_search_info;
+ BEST_PICKMODE *const best_pickmode = &search_state->best_pickmode;
+
+ // TODO(marpan): Only allow for 8 bit-depth for now, re-enable for 10/12 bit
+ // when issue 3359 is fixed.
+ if (cm->seq_params->bit_depth == 8 &&
+ cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN && !skip_idtx_palette &&
+ !cpi->oxcf.txfm_cfg.use_inter_dct_only && !x->force_zeromv_skip_for_blk &&
+ is_inter_mode(best_pickmode->best_mode) &&
+ best_pickmode->best_pred != NULL &&
+ (!rt_sf->prune_idtx_nonrd ||
+ (rt_sf->prune_idtx_nonrd && bsize <= BLOCK_32X32 &&
+ best_pickmode->best_mode_skip_txfm != 1 && x->source_variance > 200))) {
+ RD_STATS idtx_rdc;
+ av1_init_rd_stats(&idtx_rdc);
+ int is_skippable;
+ this_mode_pred = &tmp_buffer[get_pred_buffer(tmp_buffer, 3)];
+ pd->dst.buf = this_mode_pred->data;
+ pd->dst.stride = bw;
+ const PRED_BUFFER *const best_pred = best_pickmode->best_pred;
+ av1_block_yrd_idtx(x, best_pred->data, best_pred->stride, &idtx_rdc,
+ &is_skippable, bsize, mi->tx_size);
+ int64_t idx_rdcost_y = RDCOST(x->rdmult, idtx_rdc.rate, idtx_rdc.dist);
+ int allow_idtx = 1;
+ // Incorporate color into rd cost.
+ if ((x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)])) {
+ RD_STATS rdc_uv;
+ const BLOCK_SIZE uv_bsize =
+ get_plane_block_size(bsize, xd->plane[AOM_PLANE_U].subsampling_x,
+ xd->plane[AOM_PLANE_U].subsampling_y);
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)]) {
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_U, AOM_PLANE_U);
+ }
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]) {
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
+ AOM_PLANE_V, AOM_PLANE_V);
+ }
+ av1_model_rd_for_sb_uv(cpi, uv_bsize, x, xd, &rdc_uv, AOM_PLANE_U,
+ AOM_PLANE_V);
+ idtx_rdc.rate += rdc_uv.rate;
+ idtx_rdc.dist += rdc_uv.dist;
+ idtx_rdc.skip_txfm = idtx_rdc.skip_txfm && rdc_uv.skip_txfm;
+ if (idx_rdcost_y == 0 && rdc_uv.dist > 0 && x->source_variance < 3000 &&
+ x->content_state_sb.source_sad_nonrd > kMedSad)
+ allow_idtx = 0;
+ }
+ int64_t idx_rdcost = RDCOST(x->rdmult, idtx_rdc.rate, idtx_rdc.dist);
+ if (allow_idtx && idx_rdcost < search_state->best_rdc.rdcost) {
+ best_pickmode->tx_type = IDTX;
+ search_state->best_rdc.rdcost = idx_rdcost;
+ best_pickmode->best_mode_skip_txfm = idtx_rdc.skip_txfm;
+ if (!idtx_rdc.skip_txfm) {
+ memcpy(ctx->blk_skip, txfm_info->blk_skip,
+ sizeof(txfm_info->blk_skip[0]) * ctx->num_4x4_blk);
+ }
+ xd->tx_type_map[0] = best_pickmode->tx_type;
+ memset(ctx->tx_type_map, best_pickmode->tx_type, ctx->num_4x4_blk);
+ memset(xd->tx_type_map, best_pickmode->tx_type, ctx->num_4x4_blk);
+ }
+ pd->dst = *orig_dst;
+ }
+
+ if (!try_palette) return;
+ const unsigned int intra_ref_frame_cost =
+ search_state->ref_costs_single[INTRA_FRAME];
+
+ if (!is_mode_intra(best_pickmode->best_mode)) {
+ PRED_BUFFER *const best_pred = best_pickmode->best_pred;
+ if (reuse_inter_pred && best_pred != NULL) {
+ if (best_pred->data == orig_dst->buf) {
+ this_mode_pred = &tmp_buffer[get_pred_buffer(tmp_buffer, 3)];
+ aom_convolve_copy(best_pred->data, best_pred->stride,
+ this_mode_pred->data, this_mode_pred->stride, bw, bh);
+ best_pickmode->best_pred = this_mode_pred;
+ }
+ }
+ pd->dst = *orig_dst;
+ }
+ // Search palette mode for Luma plane in inter frame.
+ av1_search_palette_mode_luma(cpi, x, bsize, intra_ref_frame_cost, ctx,
+ &search_state->this_rdc,
+ search_state->best_rdc.rdcost);
+ // Update best mode data in search_state
+ if (search_state->this_rdc.rdcost < search_state->best_rdc.rdcost) {
+ best_pickmode->pmi = mi->palette_mode_info;
+ best_pickmode->best_mode = DC_PRED;
+ mi->mv[0].as_int = INVALID_MV;
+ mi->mv[1].as_int = INVALID_MV;
+ best_pickmode->best_ref_frame = INTRA_FRAME;
+    best_pickmode->best_second_ref_frame = NONE_FRAME;
+ search_state->best_rdc.rate = search_state->this_rdc.rate;
+ search_state->best_rdc.dist = search_state->this_rdc.dist;
+ search_state->best_rdc.rdcost = search_state->this_rdc.rdcost;
+ best_pickmode->best_mode_skip_txfm = search_state->this_rdc.skip_txfm;
+ // Keep the skip_txfm off if the color_sensitivity is set.
+ if (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)])
+ search_state->this_rdc.skip_txfm = 0;
+ if (!search_state->this_rdc.skip_txfm) {
+ memcpy(ctx->blk_skip, txfm_info->blk_skip,
+ sizeof(txfm_info->blk_skip[0]) * ctx->num_4x4_blk);
+ }
+ if (xd->tx_type_map[0] != DCT_DCT)
+ av1_copy_array(ctx->tx_type_map, xd->tx_type_map, ctx->num_4x4_blk);
+ }
+}
+
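/*
 * Editor's sketch (not part of the patch): the IDTX acceptance gate in
 * handle_screen_content_mode_nonrd(), reduced to its inputs. IDTX replaces
 * the current best mode only when its combined luma+chroma RD cost beats the
 * best cost so far, and it is vetoed when the luma cost rounds to zero while
 * chroma still carries distortion on a flat, fast-moving block. The struct
 * and the rdcost() scaling are simplifications.
 */
#include <stdint.h>

typedef struct { int rate; int64_t dist; } rd_stats_t;

static int64_t rdcost(int rdmult, int rate, int64_t dist) {
  return (int64_t)rate * rdmult + (dist << 7); /* illustrative scaling */
}

/* Returns 1 when IDTX should be adopted as the new best mode. */
int accept_idtx(int rdmult, rd_stats_t y, rd_stats_t uv,
                unsigned source_variance, int source_sad_above_med,
                int64_t best_rdcost) {
  /* Veto from the patch: luma is "free", chroma is not, and the block is
   * flat and fast-moving. */
  if (rdcost(rdmult, y.rate, y.dist) == 0 && uv.dist > 0 &&
      source_variance < 3000 && source_sad_above_med)
    return 0;
  return rdcost(rdmult, y.rate + uv.rate, y.dist + uv.dist) < best_rdcost;
}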
+/*!\brief AV1 inter mode selection based on Non-RD optimized model.
+ *
+ * \ingroup nonrd_mode_search
+ * \callgraph
+ * Top level function for Non-RD optimized inter mode selection.
+ * This function loops over a subset of inter modes and selects the best one
+ * based on the calculated modelled RD cost. While deciding which modes to
+ * check, it applies heuristics based on previously checked modes, block
+ * residual variance, block size, and other factors to prune certain modes
+ * and reference frames. Single reference frame modes are always considered;
+ * a small set of compound modes may additionally be checked on larger
+ * blocks. Further heuristics are applied to decide if intra modes need to
+ * be checked.
+ *
+ * \param[in] cpi Top-level encoder structure
+ * \param[in] tile_data Pointer to struct holding adaptive
+ data/contexts/models for the tile during
+ encoding
+ * \param[in] x Pointer to structure holding all the data for
+ the current macroblock
+ * \param[in] rd_cost Struct to keep track of the RD information
+ * \param[in] bsize Current block size
+ * \param[in] ctx Structure to hold snapshot of coding context
+ during the mode picking process
+ *
+ * \remark Nothing is returned. Instead, the MB_MODE_INFO struct inside x
+ * is modified to store information about the best mode computed
+ * in this function. The rd_cost struct is also updated with the RD stats
+ * corresponding to the best mode found.
+ */
void av1_nonrd_pick_inter_mode_sb(AV1_COMP *cpi, TileDataEnc *tile_data,
MACROBLOCK *x, RD_STATS *rd_cost,
BLOCK_SIZE bsize, PICK_MODE_CONTEXT *ctx) {
@@ -3561,10 +3064,8 @@
SVC *const svc = &cpi->svc;
MACROBLOCKD *const xd = &x->e_mbd;
MB_MODE_INFO *const mi = xd->mi[0];
- struct macroblockd_plane *const pd = &xd->plane[0];
+ struct macroblockd_plane *const pd = &xd->plane[AOM_PLANE_Y];
const MB_MODE_INFO_EXT *const mbmi_ext = &x->mbmi_ext;
- const InterpFilter filter_ref = cm->features.interp_filter;
- const InterpFilter default_interp_filter = EIGHTTAP_REGULAR;
MV_REFERENCE_FRAME ref_frame, ref_frame2;
const unsigned char segment_id = mi->segment_id;
int best_early_term = 0;
@@ -3572,30 +3073,33 @@
unsigned int sse_zeromv_norm = UINT_MAX;
int skip_pred_mv = 0;
const int num_inter_modes = NUM_INTER_MODES;
- bool check_globalmv = cpi->sf.rt_sf.check_globalmv_on_single_ref;
+ const REAL_TIME_SPEED_FEATURES *const rt_sf = &cpi->sf.rt_sf;
+ bool check_globalmv = rt_sf->check_globalmv_on_single_ref;
PRED_BUFFER tmp_buffer[4];
- DECLARE_ALIGNED(16, uint8_t, pred_buf[3 * 128 * 128]);
+ DECLARE_ALIGNED(16, uint8_t, pred_buf[MAX_MB_PLANE * MAX_SB_SQUARE]);
PRED_BUFFER *this_mode_pred = NULL;
- const int reuse_inter_pred = cpi->sf.rt_sf.reuse_inter_pred_nonrd &&
- cm->seq_params->bit_depth == AOM_BITS_8;
+ const int reuse_inter_pred =
+ rt_sf->reuse_inter_pred_nonrd && cm->seq_params->bit_depth == AOM_BITS_8;
InterModeSearchStateNonrd search_state;
av1_zero(search_state.use_ref_frame_mask);
+ BEST_PICKMODE *const best_pickmode = &search_state.best_pickmode;
+ (void)tile_data;
const int bh = block_size_high[bsize];
const int bw = block_size_wide[bsize];
const int pixels_in_block = bh * bw;
- const int num_8x8_blocks = ctx->num_4x4_blk / 4;
struct buf_2d orig_dst = pd->dst;
const TxfmSearchParams *txfm_params = &x->txfm_search_params;
TxfmSearchInfo *txfm_info = &x->txfm_search_info;
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.bsize_timer);
+#if COLLECT_NONRD_PICK_MODE_STAT
+ // Mode statistics can be collected only when num_workers is 1
+ assert(cpi->mt_info.num_workers <= 1);
+ aom_usec_timer_start(&x->ms_stat_nonrd.bsize_timer);
#endif
int64_t thresh_sad_pred = INT64_MAX;
const int mi_row = xd->mi_row;
const int mi_col = xd->mi_col;
- int svc_mv_col = 0;
- int svc_mv_row = 0;
+ int_mv svc_mv = { .as_int = 0 };
int force_mv_inter_layer = 0;
bool comp_use_zero_zeromv_only = 0;
int tot_num_comp_modes = NUM_COMP_INTER_MODES_RT;
@@ -3609,10 +3113,10 @@
const ModeCosts *mode_costs = &x->mode_costs;
if (reuse_inter_pred) {
- for (int i = 0; i < 3; i++) {
- tmp_buffer[i].data = &pred_buf[pixels_in_block * i];
- tmp_buffer[i].stride = bw;
- tmp_buffer[i].in_use = 0;
+ for (int buf_idx = 0; buf_idx < 3; buf_idx++) {
+ tmp_buffer[buf_idx].data = &pred_buf[pixels_in_block * buf_idx];
+ tmp_buffer[buf_idx].stride = bw;
+ tmp_buffer[buf_idx].in_use = 0;
}
tmp_buffer[3].data = pd->dst.buf;
tmp_buffer[3].stride = pd->dst.stride;
@@ -3629,25 +3133,24 @@
if (cpi->ppi->use_svc && svc->spatial_layer_id > 0 &&
svc->downsample_filter_phase[svc->spatial_layer_id - 1] == 8 &&
cm->width * cm->height > 640 * 480) {
- svc_mv_col = -4;
- svc_mv_row = -4;
+ svc_mv.as_mv.row = -4;
+ svc_mv.as_mv.col = -4;
}
// Setup parameters used for inter mode evaluation.
set_params_nonrd_pick_inter_mode(
- cpi, x, &search_state, tile_data, ctx, rd_cost, &force_skip_low_temp_var,
- &skip_pred_mv, mi_row, mi_col, gf_temporal_ref, segment_id, bsize
+ cpi, x, &search_state, rd_cost, &force_skip_low_temp_var, &skip_pred_mv,
+ mi_row, mi_col, gf_temporal_ref, segment_id, bsize
#if CONFIG_AV1_TEMPORAL_DENOISING
,
- denoise_svc_pickmode
+ ctx, denoise_svc_pickmode
#endif
);
- if (cpi->sf.rt_sf.use_comp_ref_nonrd && is_comp_ref_allowed(bsize)) {
+ if (rt_sf->use_comp_ref_nonrd && is_comp_ref_allowed(bsize)) {
    // Only search compound if bsize > BLOCK_16X16.
if (bsize > BLOCK_16X16) {
- comp_use_zero_zeromv_only =
- cpi->sf.rt_sf.check_only_zero_zeromv_on_large_blocks;
+ comp_use_zero_zeromv_only = rt_sf->check_only_zero_zeromv_on_large_blocks;
} else {
tot_num_comp_modes = 0;
}
@@ -3658,7 +3161,7 @@
if (x->pred_mv_sad[LAST_FRAME] != INT_MAX) {
thresh_sad_pred = ((int64_t)x->pred_mv_sad[LAST_FRAME]) << 1;
// Increase threshold for less aggressive pruning.
- if (cpi->sf.rt_sf.nonrd_prune_ref_frame_search == 1)
+ if (rt_sf->nonrd_prune_ref_frame_search == 1)
thresh_sad_pred += (x->pred_mv_sad[LAST_FRAME] >> 2);
}
@@ -3679,10 +3182,10 @@
is_filter_search_enabled_blk(cpi, x, mi_row, mi_col, bsize, segment_id,
cb_pred_filter_search, &filt_select);
-#if COLLECT_PICK_MODE_STAT
- ms_stat.num_blocks[bsize]++;
+#if COLLECT_NONRD_PICK_MODE_STAT
+ x->ms_stat_nonrd.num_blocks[bsize]++;
#endif
- init_mbmi(mi, DC_PRED, NONE_FRAME, NONE_FRAME, cm);
+ init_mbmi_nonrd(mi, DC_PRED, NONE_FRAME, NONE_FRAME, cm);
mi->tx_size = AOMMIN(
AOMMIN(max_txsize_lookup[bsize],
tx_mode_to_biggest_tx_size[txfm_params->tx_mode_search_type]),
@@ -3707,456 +3210,71 @@
for (int idx = 0; idx < num_inter_modes + tot_num_comp_modes; ++idx) {
// If we are at the first compound mode, and the single modes already
// perform well, then end the search.
- if (cpi->sf.rt_sf.skip_compound_based_on_var && idx == num_inter_modes &&
+ if (rt_sf->skip_compound_based_on_var && idx == num_inter_modes &&
skip_comp_based_on_var(search_state.vars, bsize)) {
break;
}
- int rate_mv = 0;
- int is_skippable;
- int this_early_term = 0;
- int skip_this_mv = 0;
- int comp_pred = 0;
- unsigned int var = UINT_MAX;
+ int is_single_pred = 1;
PREDICTION_MODE this_mode;
- RD_STATS nonskip_rdc;
- av1_invalid_rd_stats(&nonskip_rdc);
- memset(txfm_info->blk_skip, 0,
- sizeof(txfm_info->blk_skip[0]) * num_8x8_blocks);
    // Check whether the inter mode can be skipped based on mode statistics
    // and speed feature settings.
- if (skip_inter_mode_nonrd(
- cpi, x, &search_state, &thresh_sad_pred, &force_mv_inter_layer,
- &comp_pred, &this_mode, &last_comp_ref_frame, &ref_frame,
- &ref_frame2, idx, svc_mv_col, svc_mv_row, force_skip_low_temp_var,
- sse_zeromv_norm, num_inter_modes, segment_id, bsize,
- comp_use_zero_zeromv_only, check_globalmv))
+ if (skip_inter_mode_nonrd(cpi, x, &search_state, &thresh_sad_pred,
+ &force_mv_inter_layer, &is_single_pred,
+ &this_mode, &last_comp_ref_frame, &ref_frame,
+ &ref_frame2, idx, svc_mv, force_skip_low_temp_var,
+ sse_zeromv_norm, num_inter_modes, segment_id,
+ bsize, comp_use_zero_zeromv_only, check_globalmv))
continue;
// Select prediction reference frames.
- for (int i = 0; i < MAX_MB_PLANE; i++) {
- xd->plane[i].pre[0] = search_state.yv12_mb[ref_frame][i];
- if (comp_pred) xd->plane[i].pre[1] = search_state.yv12_mb[ref_frame2][i];
+ for (int plane = 0; plane < MAX_MB_PLANE; plane++) {
+ xd->plane[plane].pre[0] = search_state.yv12_mb[ref_frame][plane];
+ if (!is_single_pred)
+ xd->plane[plane].pre[1] = search_state.yv12_mb[ref_frame2][plane];
}
mi->ref_frame[0] = ref_frame;
mi->ref_frame[1] = ref_frame2;
set_ref_ptrs(cm, xd, ref_frame, ref_frame2);
- if (this_mode == NEWMV && !force_mv_inter_layer) {
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer2);
-#endif
- const bool skip_newmv = search_new_mv(
- cpi, x, search_state.frame_mv, ref_frame, gf_temporal_ref, bsize,
- mi_row, mi_col, &rate_mv, &search_state.best_rdc);
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer2);
- ms_stat.ms_time[bsize][this_mode] +=
- aom_usec_timer_elapsed(&ms_stat.timer2);
-#endif
- if (skip_newmv) {
- continue;
- }
- }
-
- for (PREDICTION_MODE inter_mv_mode = NEARESTMV; inter_mv_mode <= NEWMV;
- inter_mv_mode++) {
- if (inter_mv_mode == this_mode) continue;
- if (!comp_pred && search_state.mode_checked[inter_mv_mode][ref_frame] &&
- search_state.frame_mv[this_mode][ref_frame].as_int ==
- search_state.frame_mv[inter_mv_mode][ref_frame].as_int) {
- skip_this_mv = 1;
- break;
- }
- }
-
- if (skip_this_mv && !comp_pred) continue;
-
- // For screen: for spatially flat blocks with non-zero motion,
- // skip newmv if the motion vector is (0, 0), and color is not set.
- if (this_mode == NEWMV &&
- cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
- cpi->svc.spatial_layer_id == 0 &&
- cpi->sf.rt_sf.source_metrics_sb_nonrd) {
- if (search_state.frame_mv[this_mode][ref_frame].as_int == 0 &&
- x->content_state_sb.source_sad_nonrd != kZeroSad &&
- ((x->color_sensitivity[0] == 0 && x->color_sensitivity[1] == 0) ||
- cpi->rc.high_source_sad) &&
- x->source_variance == 0)
- continue;
- }
-
- mi->mode = this_mode;
- mi->mv[0].as_int = search_state.frame_mv[this_mode][ref_frame].as_int;
- mi->mv[1].as_int = 0;
- if (comp_pred)
- mi->mv[1].as_int = search_state.frame_mv[this_mode][ref_frame2].as_int;
-
- if (reuse_inter_pred) {
- if (!this_mode_pred) {
- this_mode_pred = &tmp_buffer[3];
- } else {
- this_mode_pred = &tmp_buffer[get_pred_buffer(tmp_buffer, 3)];
- pd->dst.buf = this_mode_pred->data;
- pd->dst.stride = bw;
- }
- }
-
- if (idx == 0 && !skip_pred_mv) {
- // Set color sensitivity on first tested mode only.
- // Use y-sad already computed in find_predictors: take the sad with motion
- // vector closest to 0; the uv-sad computed below in set_color_sensitivity
- // is for zeromv.
- // For screen: first check if golden reference is being used, if so,
- // force color_sensitivity on if the color sensitivity for sb_g is on.
- if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
- search_state.use_ref_frame_mask[GOLDEN_FRAME]) {
- if (x->color_sensitivity_sb_g[0] == 1) x->color_sensitivity[0] = 1;
- if (x->color_sensitivity_sb_g[1] == 1) x->color_sensitivity[1] = 1;
- } else {
- int y_sad = x->pred_mv0_sad[LAST_FRAME];
- if (x->pred_mv1_sad[LAST_FRAME] != INT_MAX &&
- (abs(search_state.frame_mv[NEARMV][LAST_FRAME].as_mv.col) +
- abs(search_state.frame_mv[NEARMV][LAST_FRAME].as_mv.row)) <
- (abs(search_state.frame_mv[NEARESTMV][LAST_FRAME].as_mv.col) +
- abs(search_state.frame_mv[NEARESTMV][LAST_FRAME].as_mv.row)))
- y_sad = x->pred_mv1_sad[LAST_FRAME];
- set_color_sensitivity(cpi, x, bsize, y_sad, x->source_variance,
- search_state.yv12_mb[LAST_FRAME]);
- }
- }
- mi->motion_mode = SIMPLE_TRANSLATION;
-#if !CONFIG_REALTIME_ONLY
- if (cpi->oxcf.motion_mode_cfg.allow_warped_motion) {
- calc_num_proj_ref(cpi, x, mi);
- }
-#endif
- // set variance threshold for compound more pruning
- unsigned int var_threshold = UINT_MAX;
- if (cpi->sf.rt_sf.prune_compoundmode_with_singlecompound_var && comp_pred &&
- use_model_yrd_large) {
- const PREDICTION_MODE single_mode0 = compound_ref0_mode(this_mode);
- const PREDICTION_MODE single_mode1 = compound_ref1_mode(this_mode);
- var_threshold =
- AOMMIN(var_threshold,
- search_state.vars[INTER_OFFSET(single_mode0)][ref_frame]);
- var_threshold =
- AOMMIN(var_threshold,
- search_state.vars[INTER_OFFSET(single_mode1)][ref_frame2]);
- }
- // decide interpolation filter, build prediction signal, get sse
- const bool is_mv_subpel =
- (mi->mv[0].as_mv.row & 0x07) || (mi->mv[0].as_mv.col & 0x07);
- const bool enable_filt_search_this_mode =
- (filter_search_enabled_blk == 2)
- ? true
- : (filter_search_enabled_blk && !force_mv_inter_layer &&
- !comp_pred &&
- (ref_frame == LAST_FRAME || !x->nonrd_prune_ref_frame_search));
- if (is_mv_subpel && enable_filt_search_this_mode) {
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer2);
-#endif
- search_filter_ref(cpi, x, &search_state.this_rdc, &inter_pred_params_sr,
- mi_row, mi_col, tmp_buffer, bsize, reuse_inter_pred,
- &this_mode_pred, &this_early_term, &var,
- use_model_yrd_large,
- search_state.best_pickmode.best_sse, comp_pred);
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer2);
- ms_stat.ifs_time[bsize][this_mode] +=
- aom_usec_timer_elapsed(&ms_stat.timer2);
-#endif
-#if !CONFIG_REALTIME_ONLY
- } else if (cpi->oxcf.motion_mode_cfg.allow_warped_motion &&
- this_mode == NEWMV) {
- search_motion_mode(cpi, x, &search_state.this_rdc, mi_row, mi_col, bsize,
- &this_early_term, use_model_yrd_large, &rate_mv,
- search_state.best_pickmode.best_sse);
- if (this_mode == NEWMV) {
- search_state.frame_mv[this_mode][ref_frame] = mi->mv[0];
- }
-#endif
- } else {
- mi->interp_filters =
- (filter_ref == SWITCHABLE)
- ? av1_broadcast_interp_filter(default_interp_filter)
- : av1_broadcast_interp_filter(filter_ref);
- if (force_mv_inter_layer)
- mi->interp_filters = av1_broadcast_interp_filter(EIGHTTAP_REGULAR);
-
- // If it is sub-pel motion and cb_pred_filter_search is enabled, select
- // the pre-decided filter
- if (is_mv_subpel && cb_pred_filter_search)
- mi->interp_filters = av1_broadcast_interp_filter(filt_select);
-
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer2);
-#endif
- if (!comp_pred) {
- SubpelParams subpel_params;
- // Initialize inter mode level params for single reference mode.
- init_inter_mode_params(&mi->mv[0].as_mv, &inter_pred_params_sr,
- &subpel_params, xd->block_ref_scale_factors[0],
- pd->pre->width, pd->pre->height);
- av1_enc_build_inter_predictor_y_nonrd(xd, &inter_pred_params_sr,
- &subpel_params);
- } else {
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0,
- 0);
- }
-
- if (use_model_yrd_large) {
- model_skip_for_sb_y_large(cpi, bsize, mi_row, mi_col, x, xd,
- &search_state.this_rdc, &this_early_term, 0,
- search_state.best_pickmode.best_sse, &var,
- var_threshold);
- } else {
- model_rd_for_sb_y(cpi, bsize, x, xd, &search_state.this_rdc, &var, 0,
- &this_early_term);
- }
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer2);
- ms_stat.model_rd_time[bsize][this_mode] +=
- aom_usec_timer_elapsed(&ms_stat.timer2);
-#endif
- }
- // update variance for single mode
- if (!comp_pred) {
- search_state.vars[INTER_OFFSET(this_mode)][ref_frame] = var;
- if (search_state.frame_mv[this_mode][ref_frame].as_int == 0) {
- search_state.vars[INTER_OFFSET(GLOBALMV)][ref_frame] = var;
- }
- }
- // prune compound mode based on single mode var threshold
- if (comp_pred && var > var_threshold) {
- if (reuse_inter_pred) free_pred_buffer(this_mode_pred);
- continue;
- }
-
- if (ref_frame == LAST_FRAME &&
- search_state.frame_mv[this_mode][ref_frame].as_int == 0) {
- sse_zeromv_norm = (unsigned int)(search_state.this_rdc.sse >>
- (b_width_log2_lookup[bsize] +
- b_height_log2_lookup[bsize]));
- }
-
- if (cpi->sf.rt_sf.sse_early_term_inter_search &&
- early_term_inter_search_with_sse(
- cpi->sf.rt_sf.sse_early_term_inter_search, bsize,
- search_state.this_rdc.sse, search_state.best_pickmode.best_sse,
- this_mode)) {
- if (reuse_inter_pred) free_pred_buffer(this_mode_pred);
- continue;
- }
-
-#if COLLECT_PICK_MODE_STAT
- ms_stat.num_nonskipped_searches[bsize][this_mode]++;
-#endif
-
- const int skip_ctx = av1_get_skip_txfm_context(xd);
- const int skip_txfm_cost = mode_costs->skip_txfm_cost[skip_ctx][1];
- const int no_skip_txfm_cost = mode_costs->skip_txfm_cost[skip_ctx][0];
- const int64_t sse_y = search_state.this_rdc.sse;
- if (this_early_term) {
- search_state.this_rdc.skip_txfm = 1;
- search_state.this_rdc.rate = skip_txfm_cost;
- search_state.this_rdc.dist = search_state.this_rdc.sse << 4;
- } else {
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer2);
-#endif
- block_yrd(x, &search_state.this_rdc, &is_skippable, bsize, mi->tx_size,
- 1);
- if (search_state.this_rdc.skip_txfm ||
- RDCOST(x->rdmult, search_state.this_rdc.rate,
- search_state.this_rdc.dist) >=
- RDCOST(x->rdmult, 0, search_state.this_rdc.sse)) {
- if (!search_state.this_rdc.skip_txfm) {
- // Need to store "real" rdc for possible future use if UV rdc
- // disallows tx skip
- nonskip_rdc = search_state.this_rdc;
- nonskip_rdc.rate += no_skip_txfm_cost;
- }
- search_state.this_rdc.rate = skip_txfm_cost;
- search_state.this_rdc.skip_txfm = 1;
- search_state.this_rdc.dist = search_state.this_rdc.sse;
- } else {
- search_state.this_rdc.rate += no_skip_txfm_cost;
- }
- if ((x->color_sensitivity[0] || x->color_sensitivity[1])) {
- RD_STATS rdc_uv;
- const BLOCK_SIZE uv_bsize = get_plane_block_size(
- bsize, xd->plane[1].subsampling_x, xd->plane[1].subsampling_y);
- if (x->color_sensitivity[0]) {
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
- AOM_PLANE_U, AOM_PLANE_U);
- }
- if (x->color_sensitivity[1]) {
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize,
- AOM_PLANE_V, AOM_PLANE_V);
- }
- const int64_t sse_uv =
- model_rd_for_sb_uv(cpi, uv_bsize, x, xd, &rdc_uv, 1, 2);
- search_state.this_rdc.sse += sse_uv;
- // Restore Y rdc if UV rdc disallows txfm skip
- if (search_state.this_rdc.skip_txfm && !rdc_uv.skip_txfm &&
- nonskip_rdc.rate != INT_MAX)
- search_state.this_rdc = nonskip_rdc;
- if (!comp_pred) {
- search_state.uv_dist[INTER_OFFSET(this_mode)][ref_frame] =
- rdc_uv.dist;
- }
- search_state.this_rdc.rate += rdc_uv.rate;
- search_state.this_rdc.dist += rdc_uv.dist;
- search_state.this_rdc.skip_txfm =
- search_state.this_rdc.skip_txfm && rdc_uv.skip_txfm;
- }
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer2);
- ms_stat.txfm_time[bsize][this_mode] +=
- aom_usec_timer_elapsed(&ms_stat.timer2);
-#endif
- }
- PREDICTION_MODE this_best_mode = this_mode;
-
- // TODO(kyslov) account for UV prediction cost
- search_state.this_rdc.rate += rate_mv;
- if (comp_pred) {
- const int16_t mode_ctx =
- av1_mode_context_analyzer(mbmi_ext->mode_context, mi->ref_frame);
- search_state.this_rdc.rate +=
- cost_mv_ref(mode_costs, this_mode, mode_ctx);
- } else {
- // If the current mode has zeromv but is not GLOBALMV, compare the rate
- // cost. If GLOBALMV is cheaper, use GLOBALMV instead.
- if (this_mode != GLOBALMV &&
- search_state.frame_mv[this_mode][ref_frame].as_int ==
- search_state.frame_mv[GLOBALMV][ref_frame].as_int) {
- if (is_globalmv_better(this_mode, ref_frame, rate_mv, mode_costs,
- search_state.single_inter_mode_costs,
- mbmi_ext)) {
- this_best_mode = GLOBALMV;
- }
- }
-
- search_state.this_rdc.rate +=
- search_state
- .single_inter_mode_costs[INTER_OFFSET(this_best_mode)][ref_frame];
- }
-
- if (!comp_pred && search_state.frame_mv[this_mode][ref_frame].as_int == 0 &&
- var < UINT_MAX) {
- search_state.vars[INTER_OFFSET(GLOBALMV)][ref_frame] = var;
- }
-
- search_state.this_rdc.rate += search_state.ref_costs_single[ref_frame];
-
- search_state.this_rdc.rdcost = RDCOST(x->rdmult, search_state.this_rdc.rate,
- search_state.this_rdc.dist);
- if (cpi->oxcf.rc_cfg.mode == AOM_CBR && !comp_pred) {
- newmv_diff_bias(
- xd, this_best_mode, &search_state.this_rdc, bsize,
- search_state.frame_mv[this_best_mode][ref_frame].as_mv.row,
- search_state.frame_mv[this_best_mode][ref_frame].as_mv.col,
- cpi->speed, x->source_variance, x->content_state_sb);
- }
+ // Perform inter mode evaluation for non-rd
+ if (!handle_inter_mode_nonrd(
+ cpi, x, &search_state, ctx, &this_mode_pred, tmp_buffer,
+ inter_pred_params_sr, &best_early_term, &sse_zeromv_norm,
+ &check_globalmv,
#if CONFIG_AV1_TEMPORAL_DENOISING
- if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc_pickmode &&
- cpi->denoiser.denoising_level > kDenLowLow) {
- av1_denoiser_update_frame_stats(mi, sse_y, this_mode, ctx);
- // Keep track of zero_last cost.
- if (ref_frame == LAST_FRAME &&
- search_state.frame_mv[this_mode][ref_frame].as_int == 0)
- zero_last_cost_orig = search_state.this_rdc.rdcost;
- }
-#else
- (void)sse_y;
+ &zero_last_cost_orig, denoise_svc_pickmode,
#endif
-
- search_state.mode_checked[this_mode][ref_frame] = 1;
- search_state.mode_checked[this_best_mode][ref_frame] = 1;
-
- if (check_globalmv) {
- int32_t abs_mv =
- abs(search_state.frame_mv[this_best_mode][ref_frame].as_mv.row) +
- abs(search_state.frame_mv[this_best_mode][ref_frame].as_mv.col);
- // Early exit check: if the magnitude of this_best_mode's mv is small
- // enough, we skip GLOBALMV check in the next loop iteration.
- if (abs_mv < 2) {
- check_globalmv = false;
- }
- }
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer1);
- ms_stat.nonskipped_search_times[bsize][this_mode] +=
- aom_usec_timer_elapsed(&ms_stat.timer1);
-#endif
- if (search_state.this_rdc.rdcost < search_state.best_rdc.rdcost) {
- search_state.best_rdc = search_state.this_rdc;
- best_early_term = this_early_term;
- search_state.best_pickmode.best_sse = sse_y;
- search_state.best_pickmode.best_mode = this_best_mode;
- search_state.best_pickmode.best_motion_mode = mi->motion_mode;
- search_state.best_pickmode.wm_params = mi->wm_params;
- search_state.best_pickmode.num_proj_ref = mi->num_proj_ref;
- search_state.best_pickmode.best_pred_filter = mi->interp_filters;
- search_state.best_pickmode.best_tx_size = mi->tx_size;
- search_state.best_pickmode.best_ref_frame = ref_frame;
- search_state.best_pickmode.best_second_ref_frame = ref_frame2;
- search_state.best_pickmode.best_mode_skip_txfm =
- search_state.this_rdc.skip_txfm;
- search_state.best_pickmode.best_mode_initial_skip_flag =
- (nonskip_rdc.rate == INT_MAX && search_state.this_rdc.skip_txfm);
- if (!search_state.best_pickmode.best_mode_skip_txfm) {
- memcpy(search_state.best_pickmode.blk_skip, txfm_info->blk_skip,
- sizeof(txfm_info->blk_skip[0]) * num_8x8_blocks);
- }
-
- // This is needed for the compound modes.
- search_state.frame_mv_best[this_best_mode][ref_frame].as_int =
- search_state.frame_mv[this_best_mode][ref_frame].as_int;
- if (ref_frame2 > NONE_FRAME) {
- search_state.frame_mv_best[this_best_mode][ref_frame2].as_int =
- search_state.frame_mv[this_best_mode][ref_frame2].as_int;
- }
-
- if (reuse_inter_pred) {
- free_pred_buffer(search_state.best_pickmode.best_pred);
- search_state.best_pickmode.best_pred = this_mode_pred;
- }
- } else {
- if (reuse_inter_pred) free_pred_buffer(this_mode_pred);
- }
- if (best_early_term && (idx > 0 || cpi->sf.rt_sf.nonrd_aggressive_skip)) {
- txfm_info->skip_txfm = 1;
+ idx, force_mv_inter_layer, is_single_pred, skip_pred_mv,
+ gf_temporal_ref, use_model_yrd_large, filter_search_enabled_blk,
+ bsize, this_mode, filt_select, cb_pred_filter_search,
+ reuse_inter_pred)) {
break;
}
}
- mi->mode = search_state.best_pickmode.best_mode;
- mi->motion_mode = search_state.best_pickmode.best_motion_mode;
- mi->wm_params = search_state.best_pickmode.wm_params;
- mi->num_proj_ref = search_state.best_pickmode.num_proj_ref;
- mi->interp_filters = search_state.best_pickmode.best_pred_filter;
- mi->tx_size = search_state.best_pickmode.best_tx_size;
+ // Restore mode data of best inter mode
+ mi->mode = best_pickmode->best_mode;
+ mi->motion_mode = best_pickmode->best_motion_mode;
+ mi->wm_params = best_pickmode->wm_params;
+ mi->num_proj_ref = best_pickmode->num_proj_ref;
+ mi->interp_filters = best_pickmode->best_pred_filter;
+ mi->tx_size = best_pickmode->best_tx_size;
memset(mi->inter_tx_size, mi->tx_size, sizeof(mi->inter_tx_size));
- mi->ref_frame[0] = search_state.best_pickmode.best_ref_frame;
- mi->mv[0].as_int =
- search_state
- .frame_mv_best[search_state.best_pickmode.best_mode]
- [search_state.best_pickmode.best_ref_frame]
- .as_int;
+ mi->ref_frame[0] = best_pickmode->best_ref_frame;
+ mi->mv[0].as_int = search_state
+ .frame_mv_best[best_pickmode->best_mode]
+ [best_pickmode->best_ref_frame]
+ .as_int;
mi->mv[1].as_int = 0;
- if (search_state.best_pickmode.best_second_ref_frame > INTRA_FRAME) {
- mi->ref_frame[1] = search_state.best_pickmode.best_second_ref_frame;
- mi->mv[1].as_int =
- search_state
- .frame_mv_best[search_state.best_pickmode.best_mode]
- [search_state.best_pickmode.best_second_ref_frame]
- .as_int;
+ if (best_pickmode->best_second_ref_frame > INTRA_FRAME) {
+ mi->ref_frame[1] = best_pickmode->best_second_ref_frame;
+ mi->mv[1].as_int = search_state
+ .frame_mv_best[best_pickmode->best_mode]
+ [best_pickmode->best_second_ref_frame]
+ .as_int;
}
  // Perform intra prediction search if the best SAD is above a certain
  // threshold.
@@ -4164,118 +3282,75 @@
mi->angle_delta[PLANE_TYPE_UV] = 0;
mi->filter_intra_mode_info.use_filter_intra = 0;
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_start(&ms_stat.timer1);
- ms_stat.num_searches[bsize][DC_PRED]++;
- ms_stat.num_nonskipped_searches[bsize][DC_PRED]++;
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_start(&x->ms_stat_nonrd.timer1);
+ x->ms_stat_nonrd.num_searches[bsize][DC_PRED]++;
+ x->ms_stat_nonrd.num_nonskipped_searches[bsize][DC_PRED]++;
#endif
- if (!x->force_zeromv_skip_for_blk)
- estimate_intra_mode(cpi, x, bsize, best_early_term,
- search_state.ref_costs_single[INTRA_FRAME],
- reuse_inter_pred, &orig_dst, tmp_buffer,
- &this_mode_pred, &search_state.best_rdc,
- &search_state.best_pickmode, ctx);
-
- int skip_idtx_palette =
- (x->color_sensitivity[0] || x->color_sensitivity[1]) &&
+ int force_palette_test = 0;
+ if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
x->content_state_sb.source_sad_nonrd != kZeroSad &&
- !cpi->rc.high_source_sad;
-
- // Check for IDTX: based only on Y channel, so avoid when color_sensitivity
- // is set.
- if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN && !skip_idtx_palette &&
- !cpi->oxcf.txfm_cfg.use_inter_dct_only && !x->force_zeromv_skip_for_blk &&
- is_inter_mode(search_state.best_pickmode.best_mode) &&
- (!cpi->sf.rt_sf.prune_idtx_nonrd ||
- (cpi->sf.rt_sf.prune_idtx_nonrd && bsize <= BLOCK_32X32 &&
- search_state.best_pickmode.best_mode_skip_txfm != 1 &&
- x->source_variance > 200))) {
- RD_STATS idtx_rdc;
- av1_init_rd_stats(&idtx_rdc);
- int is_skippable;
- this_mode_pred = &tmp_buffer[get_pred_buffer(tmp_buffer, 3)];
- pd->dst.buf = this_mode_pred->data;
- pd->dst.stride = bw;
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, NULL, bsize, 0, 0);
- block_yrd_idtx(x, &idtx_rdc, &is_skippable, bsize, mi->tx_size);
- int64_t idx_rdcost = RDCOST(x->rdmult, idtx_rdc.rate, idtx_rdc.dist);
- if (idx_rdcost < search_state.best_rdc.rdcost) {
- // Keep the skip_txfm off if the color_sensitivity is set.
- if (x->color_sensitivity[0] || x->color_sensitivity[1])
- idtx_rdc.skip_txfm = 0;
- search_state.best_pickmode.tx_type = IDTX;
- search_state.best_rdc.rdcost = idx_rdcost;
- search_state.best_pickmode.best_mode_skip_txfm = idtx_rdc.skip_txfm;
- if (!idtx_rdc.skip_txfm) {
- memcpy(search_state.best_pickmode.blk_skip, txfm_info->blk_skip,
- sizeof(txfm_info->blk_skip[0]) * num_8x8_blocks);
- }
- xd->tx_type_map[0] = search_state.best_pickmode.tx_type;
- memset(ctx->tx_type_map, search_state.best_pickmode.tx_type,
- ctx->num_4x4_blk);
- memset(xd->tx_type_map, search_state.best_pickmode.tx_type,
- ctx->num_4x4_blk);
- }
- pd->dst = orig_dst;
+ bsize <= BLOCK_16X16) {
+ unsigned int thresh_sse = cpi->rc.high_source_sad ? 15000 : 250000;
+ unsigned int thresh_source_var = cpi->rc.high_source_sad ? 50 : 1000;
+ unsigned int best_sse_inter_motion =
+ (unsigned int)(search_state.best_rdc.sse >>
+ (b_width_log2_lookup[bsize] +
+ b_height_log2_lookup[bsize]));
+ if (best_sse_inter_motion > thresh_sse &&
+ x->source_variance > thresh_source_var)
+ force_palette_test = 1;
}
+ // Evaluate Intra modes in inter frame
+ if (!x->force_zeromv_skip_for_blk)
+ av1_estimate_intra_mode(cpi, x, bsize, best_early_term,
+ search_state.ref_costs_single[INTRA_FRAME],
+ reuse_inter_pred, &orig_dst, tmp_buffer,
+ &this_mode_pred, &search_state.best_rdc,
+ best_pickmode, ctx);
+
+ int skip_idtx_palette = (x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]) &&
+ x->content_state_sb.source_sad_nonrd != kZeroSad &&
+ !cpi->rc.high_source_sad;
+
int try_palette =
!skip_idtx_palette && cpi->oxcf.tool_cfg.enable_palette &&
av1_allow_palette(cpi->common.features.allow_screen_content_tools,
mi->bsize);
- try_palette = try_palette &&
- is_mode_intra(search_state.best_pickmode.best_mode) &&
- x->source_variance > 0 && !x->force_zeromv_skip_for_blk &&
- (cpi->rc.high_source_sad || x->source_variance > 500);
+ try_palette =
+ try_palette &&
+ (is_mode_intra(best_pickmode->best_mode) || force_palette_test) &&
+ x->source_variance > 0 && !x->force_zeromv_skip_for_blk &&
+ (cpi->rc.high_source_sad || x->source_variance > 500);
- if (try_palette) {
- const unsigned int intra_ref_frame_cost =
- search_state.ref_costs_single[INTRA_FRAME];
+ if (rt_sf->prune_palette_nonrd && bsize > BLOCK_16X16) try_palette = 0;
- av1_search_palette_mode_luma(cpi, x, bsize, intra_ref_frame_cost, ctx,
- &search_state.this_rdc,
- search_state.best_rdc.rdcost);
- if (search_state.this_rdc.rdcost < search_state.best_rdc.rdcost) {
- search_state.best_pickmode.pmi = mi->palette_mode_info;
- search_state.best_pickmode.best_mode = DC_PRED;
- mi->mv[0].as_int = 0;
- search_state.best_rdc.rate = search_state.this_rdc.rate;
- search_state.best_rdc.dist = search_state.this_rdc.dist;
- search_state.best_rdc.rdcost = search_state.this_rdc.rdcost;
- search_state.best_pickmode.best_mode_skip_txfm =
- search_state.this_rdc.skip_txfm;
- // Keep the skip_txfm off if the color_sensitivity is set.
- if (x->color_sensitivity[0] || x->color_sensitivity[1])
- search_state.this_rdc.skip_txfm = 0;
- if (!search_state.this_rdc.skip_txfm) {
- memcpy(ctx->blk_skip, txfm_info->blk_skip,
- sizeof(txfm_info->blk_skip[0]) * ctx->num_4x4_blk);
- }
- if (xd->tx_type_map[0] != DCT_DCT)
- av1_copy_array(ctx->tx_type_map, xd->tx_type_map, ctx->num_4x4_blk);
- }
- }
+ // Perform screen content mode evaluation for non-rd
+ handle_screen_content_mode_nonrd(
+ cpi, x, &search_state, this_mode_pred, ctx, tmp_buffer, &orig_dst,
+ skip_idtx_palette, try_palette, bsize, reuse_inter_pred, mi_col, mi_row);
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.timer1);
- ms_stat.nonskipped_search_times[bsize][DC_PRED] +=
- aom_usec_timer_elapsed(&ms_stat.timer1);
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.timer1);
+ x->ms_stat_nonrd.nonskipped_search_times[bsize][DC_PRED] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.timer1);
#endif
pd->dst = orig_dst;
- if (try_palette) mi->palette_mode_info = search_state.best_pickmode.pmi;
- mi->mode = search_state.best_pickmode.best_mode;
- mi->ref_frame[0] = search_state.best_pickmode.best_ref_frame;
- mi->ref_frame[1] = search_state.best_pickmode.best_second_ref_frame;
- txfm_info->skip_txfm = search_state.best_pickmode.best_mode_skip_txfm;
- if (!txfm_info->skip_txfm) {
- // For inter modes: copy blk_skip from best_pickmode, which is
- // defined for 8x8 blocks. If palette or intra mode was selected
- // as best then blk_skip is already copied into the ctx.
- if (search_state.best_pickmode.best_mode >= INTRA_MODE_END)
- memcpy(ctx->blk_skip, search_state.best_pickmode.blk_skip,
- sizeof(search_state.best_pickmode.blk_skip[0]) * num_8x8_blocks);
+ // Best mode is finalized. Restore the mode data to mbmi
+ if (try_palette) mi->palette_mode_info = best_pickmode->pmi;
+ mi->mode = best_pickmode->best_mode;
+ mi->ref_frame[0] = best_pickmode->best_ref_frame;
+ mi->ref_frame[1] = best_pickmode->best_second_ref_frame;
+ // For lossless: always force the skip flags off.
+ if (is_lossless_requested(&cpi->oxcf.rc_cfg)) {
+ txfm_info->skip_txfm = 0;
+ memset(ctx->blk_skip, 0, sizeof(ctx->blk_skip[0]) * ctx->num_4x4_blk);
+ } else {
+ txfm_info->skip_txfm = best_pickmode->best_mode_skip_txfm;
}
if (has_second_ref(mi)) {
mi->comp_group_idx = 0;
@@ -4287,8 +3362,9 @@
mi->interp_filters = av1_broadcast_interp_filter(SWITCHABLE_FILTERS);
}
- if (reuse_inter_pred && search_state.best_pickmode.best_pred != NULL) {
- PRED_BUFFER *const best_pred = search_state.best_pickmode.best_pred;
+ // Restore the predicted samples of best mode to final buffer
+ if (reuse_inter_pred && best_pickmode->best_pred != NULL) {
+ PRED_BUFFER *const best_pred = best_pickmode->best_pred;
if (best_pred->data != orig_dst.buf && is_inter_mode(mi->mode)) {
aom_convolve_copy(best_pred->data, best_pred->stride, pd->dst.buf,
pd->dst.stride, bw, bh);
@@ -4303,52 +3379,50 @@
ctx->sb_skip_denoising = 0;
av1_pickmode_ctx_den_update(
&ctx_den, zero_last_cost_orig, search_state.ref_costs_single,
- search_state.frame_mv, reuse_inter_pred, &search_state.best_pickmode);
+ search_state.frame_mv, reuse_inter_pred, best_pickmode);
av1_denoiser_denoise(cpi, x, mi_row, mi_col, bsize, ctx, &decision,
gf_temporal_ref);
if (denoise_recheck_zeromv)
recheck_zeromv_after_denoising(
cpi, mi, x, xd, decision, &ctx_den, search_state.yv12_mb,
- &search_state.best_rdc, &search_state.best_pickmode, bsize, mi_row,
- mi_col);
- search_state.best_pickmode.best_ref_frame = ctx_den.best_ref_frame;
+ &search_state.best_rdc, best_pickmode, bsize, mi_row, mi_col);
+ best_pickmode->best_ref_frame = ctx_den.best_ref_frame;
}
#endif
+ // Update the factors used for RD thresholding for all modes.
if (cpi->sf.inter_sf.adaptive_rd_thresh && !has_second_ref(mi)) {
THR_MODES best_mode_idx =
- mode_idx[search_state.best_pickmode.best_ref_frame]
- [mode_offset(mi->mode)];
- if (search_state.best_pickmode.best_ref_frame == INTRA_FRAME) {
+ mode_idx[best_pickmode->best_ref_frame][mode_offset(mi->mode)];
+ if (best_pickmode->best_ref_frame == INTRA_FRAME) {
// Only consider the modes that are included in the intra_mode_list.
int intra_modes = sizeof(intra_mode_list) / sizeof(PREDICTION_MODE);
- for (int i = 0; i < intra_modes; i++) {
+ for (int mode_index = 0; mode_index < intra_modes; mode_index++) {
update_thresh_freq_fact(cpi, x, bsize, INTRA_FRAME, best_mode_idx,
- intra_mode_list[i]);
+ intra_mode_list[mode_index]);
}
} else {
PREDICTION_MODE this_mode;
for (this_mode = NEARESTMV; this_mode <= NEWMV; ++this_mode) {
- update_thresh_freq_fact(cpi, x, bsize,
- search_state.best_pickmode.best_ref_frame,
+ update_thresh_freq_fact(cpi, x, bsize, best_pickmode->best_ref_frame,
best_mode_idx, this_mode);
}
}
}
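/*
 * Editor's sketch (not part of the patch): the adaptive-RD-threshold loop
 * above biases future pruning toward recent winners. The decay/boost scheme
 * below is a hypothetical illustration of the idea only; it is not
 * update_thresh_freq_fact()'s actual arithmetic.
 */
#define MAX_FACT 256 /* hypothetical cap */

/* The winning mode becomes cheaper to re-test; losers drift toward the cap
 * and become progressively easier to prune. */
void update_factor(int *fact, int is_best_mode) {
  if (is_best_mode)
    *fact >>= 1;
  else if (*fact < MAX_FACT)
    *fact += 4;
}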
#if CONFIG_INTERNAL_STATS
- store_coding_context(x, ctx, mi->mode);
+ store_coding_context_nonrd(x, ctx, mi->mode);
#else
- store_coding_context(x, ctx);
+ store_coding_context_nonrd(x, ctx);
#endif // CONFIG_INTERNAL_STATS
-#if COLLECT_PICK_MODE_STAT
- aom_usec_timer_mark(&ms_stat.bsize_timer);
- ms_stat.total_block_times[bsize] +=
- aom_usec_timer_elapsed(&ms_stat.bsize_timer);
- print_time(&ms_stat, bsize, cm->mi_params.mi_rows, cm->mi_params.mi_cols,
- mi_row, mi_col);
-#endif // COLLECT_PICK_MODE_STAT
+#if COLLECT_NONRD_PICK_MODE_STAT
+ aom_usec_timer_mark(&x->ms_stat_nonrd.bsize_timer);
+ x->ms_stat_nonrd.total_block_times[bsize] +=
+ aom_usec_timer_elapsed(&x->ms_stat_nonrd.bsize_timer);
+ print_time(&x->ms_stat_nonrd, bsize, cm->mi_params.mi_rows,
+ cm->mi_params.mi_cols, mi_row, mi_col);
+#endif // COLLECT_NONRD_PICK_MODE_STAT
*rd_cost = search_state.best_rdc;
}
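/*
 * Editor's sketch (not part of the patch): both sse_zeromv_norm and the new
 * force_palette_test gate normalize SSE by block area in log2 units so one
 * threshold works across block sizes. The lookup convention assumed here is
 * that b_width_log2/b_height_log2 count 4-sample units, so BLOCK_16X16 gives
 * bwl = bhl = 2.
 */
#include <stdint.h>
#include <stdio.h>

int main(void) {
  const int bwl = 2, bhl = 2;         /* assumed values for BLOCK_16X16 */
  const int64_t sse = 4400000;        /* raw SSE of the best inter mode */
  const unsigned norm = (unsigned)(sse >> (bwl + bhl));
  const unsigned thresh_sse = 250000; /* patch value when !high_source_sad */
  const unsigned source_variance = 1500, thresh_var = 1000;
  printf("norm=%u force_palette_test=%d\n", norm,
         norm > thresh_sse && source_variance > thresh_var);
  return 0;
}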
diff --git a/av1/encoder/palette.c b/av1/encoder/palette.c
index 9c3d407..b1a73e4 100644
--- a/av1/encoder/palette.c
+++ b/av1/encoder/palette.c
@@ -733,6 +733,9 @@
if (best_mbmi->palette_mode_info.palette_size[0] > 0) {
memcpy(color_map, best_palette_color_map,
block_width * block_height * sizeof(best_palette_color_map[0]));
+ // Gather the stats to determine whether to use screen content tools in
+ // function av1_determine_sc_tools_with_encoding().
+ x->palette_pixels += (block_width * block_height);
}
*mbmi = *best_mbmi;
}
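/*
 * Editor's sketch (not part of the patch): how the counter bumped above can
 * feed a frame-level screen-content decision. The 10% threshold and the
 * helper name are hypothetical; the real check lives in
 * av1_determine_sc_tools_with_encoding().
 */
#include <stdint.h>

/* Hypothetical gate: enable SC tools once enough pixels picked palette. */
int sc_tools_heuristic(uint64_t palette_pixels, uint64_t frame_pixels) {
  return palette_pixels * 10 > frame_pixels; /* more than 10% of the frame */
}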
diff --git a/av1/encoder/partition_search.c b/av1/encoder/partition_search.c
index 8d06bf5..96567dd 100644
--- a/av1/encoder/partition_search.c
+++ b/av1/encoder/partition_search.c
@@ -603,8 +603,7 @@
}
#if !CONFIG_REALTIME_ONLY
- const AV1_COMMON *const cm = &cpi->common;
- if (cm->delta_q_info.delta_q_present_flag &&
+ if (cpi->common.delta_q_info.delta_q_present_flag &&
!cpi->sf.rt_sf.use_nonrd_pick_mode) {
x->rdmult = av1_get_cb_rdmult(cpi, x, bsize, mi_row, mi_col);
}
@@ -614,15 +613,22 @@
av1_set_ssim_rdmult(cpi, &x->errorperbit, bsize, mi_row, mi_col,
&x->rdmult);
}
+#if CONFIG_SALIENCY_MAP
+ else if (cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_SALIENCY_MAP) {
+ av1_set_saliency_map_vmaf_rdmult(cpi, &x->errorperbit,
+ cpi->common.seq_params->sb_size, mi_row,
+ mi_col, &x->rdmult);
+ }
+#endif
#if CONFIG_TUNE_VMAF
- if (cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_WITHOUT_PREPROCESSING ||
- cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_MAX_GAIN ||
- cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_NEG_MAX_GAIN) {
+ else if (cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_WITHOUT_PREPROCESSING ||
+ cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_MAX_GAIN ||
+ cpi->oxcf.tune_cfg.tuning == AOM_TUNE_VMAF_NEG_MAX_GAIN) {
av1_set_vmaf_rdmult(cpi, x, bsize, mi_row, mi_col, &x->rdmult);
}
#endif
#if CONFIG_TUNE_BUTTERAUGLI
- if (cpi->oxcf.tune_cfg.tuning == AOM_TUNE_BUTTERAUGLI) {
+ else if (cpi->oxcf.tune_cfg.tuning == AOM_TUNE_BUTTERAUGLI) {
av1_set_butteraugli_rdmult(cpi, x, bsize, mi_row, mi_col, &x->rdmult);
}
#endif
@@ -1294,8 +1300,7 @@
}
if (inter_block && cm->features.interp_filter == SWITCHABLE &&
- mbmi->motion_mode != WARPED_CAUSAL &&
- !is_nontrans_global_motion(xd, mbmi)) {
+ av1_is_interp_needed(xd)) {
update_filter_type_cdf(xd, mbmi, cm->seq_params->enable_dual_filter);
}
if (inter_block &&
@@ -2301,7 +2306,8 @@
  // here. Check to see if skipping cdef is allowed.
const int allow_cdef_skipping =
cpi->rc.frames_since_key > 10 && !cpi->rc.high_source_sad &&
- !(x->color_sensitivity[0] || x->color_sensitivity[1]);
+ !(x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] ||
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)]);
// Find the corresponding 64x64 block. It'll be the 128x128 block if that's
// the block size.
@@ -2312,9 +2318,15 @@
get_mi_grid_idx(&cm->mi_params, mi_row_sb, mi_col_sb);
// Do not skip if intra or new mv is picked, or color sensitivity is set.
// Never skip on slide/scene change.
- mi_sb[0]->cdef_strength =
- mi_sb[0]->cdef_strength && allow_cdef_skipping &&
- !(mbmi->mode < INTRA_MODES || mbmi->mode == NEWMV);
+ if (cpi->sf.rt_sf.skip_cdef_sb >= 2) {
+ mi_sb[0]->cdef_strength =
+ mi_sb[0]->cdef_strength &&
+ (allow_cdef_skipping || x->source_variance == 0);
+ } else {
+ mi_sb[0]->cdef_strength =
+ mi_sb[0]->cdef_strength && allow_cdef_skipping &&
+ !(mbmi->mode < INTRA_MODES || mbmi->mode == NEWMV);
+ }
// Store in the pickmode context.
ctx->mic.cdef_strength = mi_sb[0]->cdef_strength;
}
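/*
 * Editor's sketch (not part of the patch): the two-tier CDEF skip rule added
 * above. At skip_cdef_sb >= 2, a zero-variance source block keeps the skip
 * even when the usual gating fails; otherwise skipping additionally requires
 * that neither an intra mode nor NEWMV was picked. Flags are simplified ints.
 */
int cdef_strength_after_block(int prev_strength, int allow_cdef_skipping,
                              int skip_cdef_sb, int source_variance_is_zero,
                              int picked_intra_or_newmv) {
  if (skip_cdef_sb >= 2)
    return prev_strength && (allow_cdef_skipping || source_variance_is_zero);
  return prev_strength && allow_cdef_skipping && !picked_intra_or_newmv;
}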
@@ -2538,6 +2550,7 @@
MACROBLOCK *const x = &td->mb;
MACROBLOCKD *const xd = &x->e_mbd;
const ModeCosts *mode_costs = &x->mode_costs;
+ const int num_planes = av1_num_planes(cm);
// Only square blocks from 8x8 to 128x128 are supported
assert(bsize >= BLOCK_8X8 && bsize <= BLOCK_128X128);
const int bs = mi_size_wide[bsize];
@@ -2547,7 +2560,7 @@
RD_STATS split_rdc, none_rdc;
av1_invalid_rd_stats(&split_rdc);
av1_invalid_rd_stats(&none_rdc);
- av1_save_context(x, &x_ctx, mi_row, mi_col, bsize, 3);
+ av1_save_context(x, &x_ctx, mi_row, mi_col, bsize, num_planes);
xd->above_txfm_context =
cm->above_contexts.txfm[tile_info->tile_row] + mi_col;
xd->left_txfm_context =
@@ -2562,7 +2575,7 @@
pc_tree->none);
none_rdc.rate += mode_costs->partition_cost[pl][PARTITION_NONE];
none_rdc.rdcost = RDCOST(x->rdmult, none_rdc.rate, none_rdc.dist);
- av1_restore_context(x, &x_ctx, mi_row, mi_col, bsize, 3);
+ av1_restore_context(x, &x_ctx, mi_row, mi_col, bsize, num_planes);
if (cpi->sf.rt_sf.nonrd_check_partition_merge_mode < 2 ||
none_rdc.skip_txfm != 1 || pc_tree->none->mic.mode == NEWMV) {
@@ -2608,7 +2621,7 @@
1, subsize, PARTITION_NONE, pc_tree->split[i]->none,
NULL);
}
- av1_restore_context(x, &x_ctx, mi_row, mi_col, bsize, 3);
+ av1_restore_context(x, &x_ctx, mi_row, mi_col, bsize, num_planes);
split_rdc.rdcost = RDCOST(x->rdmult, split_rdc.rate, split_rdc.dist);
}
}
@@ -2755,12 +2768,12 @@
frame_mv[i][j].as_int = INVALID_MV;
}
}
- x->color_sensitivity[0] = x->color_sensitivity_sb[0];
- x->color_sensitivity[1] = x->color_sensitivity_sb[1];
+ av1_copy(x->color_sensitivity, x->color_sensitivity_sb);
skip_pred_mv = (x->nonrd_prune_ref_frame_search > 2 &&
- x->color_sensitivity[0] != 2 && x->color_sensitivity[1] != 2);
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_U)] != 2 &&
+ x->color_sensitivity[COLOR_SENS_IDX(AOM_PLANE_V)] != 2);
- find_predictors(cpi, x, ref_frame, frame_mv, tile_data, yv12_mb, bsize,
+ find_predictors(cpi, x, ref_frame, frame_mv, yv12_mb, bsize,
force_skip_low_temp_var, skip_pred_mv);
int continue_merging = 1;
@@ -2776,8 +2789,8 @@
// calling find_predictors() again.
av1_set_offsets_without_segment_id(cpi, &tile_data->tile_info, x, mi_row,
mi_col, this_mi[0]->bsize);
- find_predictors(cpi, x, ref_frame, frame_mv, tile_data, yv12_mb,
- this_mi[0]->bsize, force_skip_low_temp_var, skip_pred_mv);
+ find_predictors(cpi, x, ref_frame, frame_mv, yv12_mb, this_mi[0]->bsize,
+ force_skip_low_temp_var, skip_pred_mv);
} else {
struct scale_factors *sf = get_ref_scale_factors(cm, ref_frame);
const int is_scaled = av1_is_scaled(sf);
@@ -4451,14 +4464,15 @@
static int read_partition_tree(AV1_COMP *const cpi, PC_TREE *const pc_tree,
const int config_id) {
+ const AV1_COMMON *const cm = &cpi->common;
const char *path = cpi->oxcf.partition_info_path;
char filename[256];
snprintf(filename, sizeof(filename), "%s/partition_tree_sb%d_c%d", path,
cpi->sb_counter, config_id);
FILE *pfile = fopen(filename, "r");
if (pfile == NULL) {
- printf("Can't find the file: %s\n", filename);
- exit(0);
+ aom_internal_error(cm->error, AOM_CODEC_ERROR, "Can't find input file: %s.",
+ filename);
}
int read_bsize;
@@ -4618,7 +4632,9 @@
best_rdc.dist = sum_subblock_dist;
best_rdc.rdcost = RDCOST(x->rdmult, best_rdc.rate, best_rdc.dist);
break;
- default: assert(0 && "invalid partition type."); exit(0);
+ default:
+ assert(0 && "invalid partition type.");
+ aom_internal_error(cm->error, AOM_CODEC_ERROR, "Invalid partition type.");
}
// Note: it is necessary to restore context information.
av1_restore_context(x, &x_ctx, mi_row, mi_col, bsize, num_planes);
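/*
 * Editor's sketch (not part of the patch): the exit(0) calls are replaced by
 * aom_internal_error() because a library must never terminate its caller's
 * process. The pattern is record-and-longjmp back to the API boundary; the
 * struct and function names below are simplified stand-ins, not libaom's.
 */
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

struct error_info { jmp_buf jmp; char detail[80]; };

/* Simplified stand-in for aom_internal_error(): record, then unwind. */
static void internal_error(struct error_info *e, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(e->detail, sizeof(e->detail), fmt, ap);
  va_end(ap);
  longjmp(e->jmp, 1);
}

static void deep_encoder_work(struct error_info *e) {
  internal_error(e, "Can't find input file: %s.", "partition_tree_sb0_c0");
}

int main(void) {
  struct error_info e;
  if (setjmp(e.jmp)) { /* API boundary: report instead of exiting */
    fprintf(stderr, "codec error: %s\n", e.detail);
    return 1;
  }
  deep_encoder_work(&e);
  return 0;
}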
@@ -4721,7 +4737,8 @@
update_partition_stats(&this_rdcost, &stats);
av1_ext_part_send_partition_stats(ext_part_controller, &stats);
if (!partition_decision.is_final_decision) {
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
}
} while (!partition_decision.is_final_decision);
@@ -4729,8 +4746,8 @@
set_cb_offsets(x->cb_offset, 0, 0);
encode_sb(cpi, td, tile_data, tp, mi_row, mi_col, OUTPUT_ENABLED, bsize,
pc_tree, NULL);
-
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
return true;
}
@@ -5003,7 +5020,8 @@
for (int i = 0; i < 4; ++i) {
if (pc_tree->split[i] != NULL) {
av1_free_pc_tree_recursive(pc_tree->split[i], av1_num_planes(cm), 0,
- 0);
+ 0,
+ cpi->sf.part_sf.partition_search_type);
pc_tree->split[i] = NULL;
}
}
@@ -5047,8 +5065,8 @@
set_cb_offsets(x->cb_offset, 0, 0);
encode_sb(cpi, td, tile_data, tp, mi_row, mi_col, OUTPUT_ENABLED, bsize,
pc_tree, NULL);
-
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
return true;
}
@@ -5058,6 +5076,7 @@
SIMPLE_MOTION_DATA_TREE *sms_root, int mi_row,
int mi_col, const BLOCK_SIZE bsize,
RD_STATS *best_rd_cost) {
+ AV1_COMMON *const cm = &cpi->common;
if (cpi->ext_part_controller.ready) {
bool valid_search = true;
const aom_ext_part_decision_mode_t decision_mode =
@@ -5073,13 +5092,13 @@
return false;
}
if (!valid_search) {
- assert(0 && "Invalid search from ML model, partition search failed.");
- exit(0);
+ aom_internal_error(
+ cm->error, AOM_CODEC_ERROR,
+ "Invalid search from ML model, partition search failed");
}
return true;
}
- AV1_COMMON *const cm = &cpi->common;
MACROBLOCK *const x = &td->mb;
int best_idx = 0;
int64_t min_rdcost = INT64_MAX;
@@ -5093,10 +5112,10 @@
CHECK_MEM_ERROR(cm, rdcost, aom_calloc(num_configs, sizeof(*rdcost)));
}
if (num_configs <= 0) {
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
if (rdcost != NULL) aom_free(rdcost);
- exit(0);
- return false;
+ aom_internal_error(cm->error, AOM_CODEC_ERROR, "Invalid configs.");
}
verify_write_partition_tree(cpi, pc_tree, bsize, i, mi_row, mi_col);
// Encode the block with the given partition tree. Get rdcost and encoding
@@ -5109,7 +5128,8 @@
best_idx = i;
*best_rd_cost = rdcost[i];
}
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
++i;
} while (i < num_configs);
@@ -5121,8 +5141,8 @@
set_cb_offsets(x->cb_offset, 0, 0);
encode_sb(cpi, td, tile_data, tp, mi_row, mi_col, OUTPUT_ENABLED, bsize,
pc_tree, NULL);
-
- av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, av1_num_planes(cm), 0, 0,
+ cpi->sf.part_sf.partition_search_type);
aom_free(rdcost);
++cpi->sb_counter;
@@ -5177,8 +5197,8 @@
max_var_4x4 = AOMMAX(max_var_4x4, var);
}
}
- *var_min = log(1.0 + min_var_4x4 / 16.0);
- *var_max = log(1.0 + max_var_4x4 / 16.0);
+ *var_min = log1p(min_var_4x4 / 16.0);
+ *var_max = log1p(max_var_4x4 / 16.0);
}
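/*
 * Editor's note (not part of the patch): log1p(x) is preferred over
 * log(1.0 + x) because it stays accurate when x is tiny; adding 1.0 first
 * rounds x away entirely. A quick demonstration:
 */
#include <math.h>
#include <stdio.h>

int main(void) {
  const double x = 1e-18; /* e.g. a near-zero normalized 4x4 variance */
  printf("log(1+x) = %.20g\n", log(1.0 + x)); /* prints 0: x is lost */
  printf("log1p(x) = %.20g\n", log1p(x));     /* ~1e-18: correct */
  return 0;
}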
static AOM_INLINE void set_sms_tree_partitioning(
@@ -5359,6 +5379,7 @@
// partition types and intra cnn output.
if (x->must_find_valid_partition) {
reset_part_limitations(cpi, &part_search_state);
+ av1_prune_partitions_by_max_min_bsize(&x->sb_enc, &part_search_state);
// Invalidate intra cnn output for key frames.
if (frame_is_intra_only(cm) && bsize == BLOCK_64X64) {
part_search_state.intra_part_info->quad_tree_idx = 0;
@@ -5372,27 +5393,41 @@
start_timing(cpi, none_partition_search_time);
#endif
- // Further pruning or in some cases reverse pruning when allintra is set
- // This code helps visual and in some cases metrics quality where the current
- // block comprises at least one very low variance sub-block and at least one
- // where the variance is much higher.
- //
- // The idea is that in such cases there is danger of ringing and other visual
- // artifacts from a high variance feature such as an edge into a very low
- // variance region.
- //
- // The approach taken is to force break down / split to a smaller block size
- // to try and separate out the low variance and well predicted blocks from the
- // more complex ones and to prevent propagation of ringing over a large
- // region.
- if ((cpi->oxcf.mode == ALLINTRA) && (bsize >= BLOCK_16X16)) {
- double var_min, var_max;
- log_sub_block_var(cpi, x, bsize, &var_min, &var_max);
+ if (cpi->oxcf.mode == ALLINTRA) {
+ const bool bsize_at_least_16x16 = (bsize >= BLOCK_16X16);
+ const bool prune_rect_part_using_4x4_var_deviation =
+ (cpi->sf.part_sf.prune_rect_part_using_4x4_var_deviation &&
+ !x->must_find_valid_partition);
- if ((var_min < 0.272) && ((var_max - var_min) > 3.0)) {
- part_search_state.partition_none_allowed = 0;
- part_search_state.terminate_partition_search = 0;
- part_search_state.do_square_split = 1;
+ if (bsize_at_least_16x16 || prune_rect_part_using_4x4_var_deviation) {
+ double var_min, var_max;
+ log_sub_block_var(cpi, x, bsize, &var_min, &var_max);
+
+ // Further pruning or in some cases reverse pruning when allintra is set.
+ // This code helps visual and in some cases metrics quality where the
+ // current block comprises at least one very low variance sub-block and at
+ // least one where the variance is much higher.
+ //
+ // The idea is that in such cases there is danger of ringing and other
+ // visual artifacts from a high variance feature such as an edge into a
+ // very low variance region.
+ //
+ // The approach taken is to force break down / split to a smaller block
+ // size to try and separate out the low variance and well predicted blocks
+ // from the more complex ones and to prevent propagation of ringing over a
+ // large region.
+ if (bsize_at_least_16x16 && (var_min < 0.272) &&
+ ((var_max - var_min) > 3.0)) {
+ part_search_state.partition_none_allowed = 0;
+ part_search_state.terminate_partition_search = 0;
+ part_search_state.do_square_split = 1;
+ } else if (prune_rect_part_using_4x4_var_deviation &&
+ (var_max - var_min < 3.0)) {
+ // Prune rectangular partitions if the variance deviation of 4x4
+ // sub-blocks within the block is less than a threshold (derived
+ // empirically).
+ part_search_state.do_rectangular_split = 0;
+ }
}
}
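
For a concrete feel of the thresholds above, a small sketch with assumed raw 4x4 variances on the log1p(var / 16.0) scale that log_sub_block_var() produces:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
      // Assumed 4x4 variances: a flat sub-block (raw var 2) next to a busy
      // one (raw var 700), mapped onto the log1p(var / 16.0) scale.
      const double var_min = log1p(2.0 / 16.0);    // ~0.118 < 0.272
      const double var_max = log1p(700.0 / 16.0);  // ~3.80
      if (var_min < 0.272 && (var_max - var_min) > 3.0)
        printf("force split: low- and high-variance content mixed\n");
      return 0;
    }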
@@ -5584,7 +5619,8 @@
encode_sb(cpi, td, tile_data, tp, mi_row, mi_col, run_type, bsize,
pc_tree, NULL);
// Dealloc the whole PC_TREE after a superblock is done.
- av1_free_pc_tree_recursive(pc_tree, num_planes, 0, 0);
+ av1_free_pc_tree_recursive(pc_tree, num_planes, 0, 0,
+ cpi->sf.part_sf.partition_search_type);
pc_tree_dealloc = 1;
} else if (should_do_dry_run_encode_for_current_block(
cm->seq_params->sb_size, x->sb_enc.max_partition_size,
@@ -5601,7 +5637,8 @@
// If the tree still exists (non-superblock), dealloc most nodes, only keep
// nodes for the best partition and PARTITION_NONE.
if (pc_tree_dealloc == 0)
- av1_free_pc_tree_recursive(pc_tree, num_planes, 1, 1);
+ av1_free_pc_tree_recursive(pc_tree, num_planes, 1, 1,
+ cpi->sf.part_sf.partition_search_type);
if (bsize == cm->seq_params->sb_size) {
assert(best_rdc.rate < INT_MAX);
@@ -5659,7 +5696,7 @@
float score[LABELS];
features[feature_idx] =
- (logf((float)(dc_q * dc_q) / 256.0f + 1.0f) - means[feature_idx]) /
+ (log1pf((float)(dc_q * dc_q) / 256.0f) - means[feature_idx]) /
sqrtf(vars[feature_idx]);
feature_idx++;
av1_setup_src_planes(x, cpi->source, mi_row, mi_col, 1, bsize);
@@ -5679,8 +5716,8 @@
cpi->ppi->fn_ptr[bsize].vf(src, src_stride, pred, pred_stride, &sse);
const float factor = (var == 0) ? 1.0f : (1.0f / (float)var);
- features[feature_idx] = (logf((float)var + 1.0f) - means[feature_idx]) /
- sqrtf(vars[feature_idx]);
+ features[feature_idx] =
+ (log1pf((float)var) - means[feature_idx]) / sqrtf(vars[feature_idx]);
feature_idx++;
for (i = 0; i < 4; ++i) {
const int x_idx = (i & 1) * bs / 2;
@@ -5735,7 +5772,7 @@
cm->seq_params->bit_depth);
int feature_idx = 0;
- features[feature_idx++] = logf((float)(dc_q * dc_q) / 256.0f + 1.0f);
+ features[feature_idx++] = log1pf((float)(dc_q * dc_q) / 256.0f);
av1_setup_src_planes(x, cpi->source, mi_row, mi_col, 1, bsize);
{
const int bs = block_size_wide[bsize];
@@ -5768,7 +5805,7 @@
cpi->fn_ptr[bsize].vf(src, src_stride, pred, pred_stride, &sse);
const float factor = (var == 0) ? 1.0f : (1.0f / (float)var);
- features[feature_idx++] = logf((float)var + 1.0f);
+ features[feature_idx++] = log1pf((float)var);
fprintf(f, "%f,%f,", features[0], features[1]);
for (i = 0; i < 4; ++i) {
diff --git a/av1/encoder/partition_strategy.c b/av1/encoder/partition_strategy.c
index 89c1a79..080587b 100644
--- a/av1/encoder/partition_strategy.c
+++ b/av1/encoder/partition_strategy.c
@@ -187,7 +187,7 @@
const int bit_depth = xd->bd;
const int dc_q =
av1_dc_quant_QTX(x->qindex, 0, bit_depth) >> (bit_depth - 8);
- part_info->log_q = logf(1.0f + (float)(dc_q * dc_q) / 256.0f);
+ part_info->log_q = log1pf((float)(dc_q * dc_q) / 256.0f);
part_info->log_q =
(part_info->log_q - av1_intra_mode_cnn_partition_mean[0]) /
av1_intra_mode_cnn_partition_std[0];
@@ -602,21 +602,21 @@
int f_idx = 0;
if (features_to_get & FEATURE_SMS_NONE_FLAG) {
for (int sub_idx = 0; sub_idx < 2; sub_idx++) {
- features[f_idx++] = logf(1.0f + sms_tree->sms_none_feat[sub_idx]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_none_feat[sub_idx]);
}
}
if (features_to_get & FEATURE_SMS_SPLIT_FLAG) {
for (int sub_idx = 0; sub_idx < SUB_PARTITIONS_SPLIT; sub_idx++) {
SIMPLE_MOTION_DATA_TREE *sub_tree = sms_tree->split[sub_idx];
- features[f_idx++] = logf(1.0f + sub_tree->sms_none_feat[0]);
- features[f_idx++] = logf(1.0f + sub_tree->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sub_tree->sms_none_feat[0]);
+ features[f_idx++] = log1pf((float)sub_tree->sms_none_feat[1]);
}
}
if (features_to_get & FEATURE_SMS_RECT_FLAG) {
for (int sub_idx = 0; sub_idx < 8; sub_idx++) {
- features[f_idx++] = logf(1.0f + sms_tree->sms_rect_feat[sub_idx]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_rect_feat[sub_idx]);
}
}
@@ -625,7 +625,7 @@
// Q_INDEX
const int dc_q = av1_dc_quant_QTX(x->qindex, 0, xd->bd) >> (xd->bd - 8);
- features[f_idx++] = logf(1.0f + (float)(dc_q * dc_q) / 256.0f);
+ features[f_idx++] = log1pf((float)(dc_q * dc_q) / 256.0f);
// Neighbor stuff
const int has_above = !!xd->above_mbmi;
@@ -742,9 +742,9 @@
FEATURE_SMS_PRUNE_PART_FLAG);
int f_idx = FEATURE_SIZE_SMS_PRUNE_PART;
- features[f_idx++] = logf(1.0f + (float)none_rdc->rate);
- features[f_idx++] = logf(1.0f + (float)none_rdc->dist);
- features[f_idx++] = logf(1.0f + (float)none_rdc->rdcost);
+ features[f_idx++] = log1pf((float)none_rdc->rate);
+ features[f_idx++] = log1pf((float)none_rdc->dist);
+ features[f_idx++] = log1pf((float)none_rdc->rdcost);
assert(f_idx == FEATURE_SIZE_SMS_TERM_NONE);
@@ -809,7 +809,7 @@
int f_idx = 0;
const int dc_q = av1_dc_quant_QTX(x->qindex, 0, xd->bd) >> (xd->bd - 8);
- const float log_q_sq = logf(1.0f + (float)(dc_q * dc_q) / 256.0f);
+ const float log_q_sq = log1pf((float)(dc_q * dc_q) / 256.0f);
// Perform full-pixel single motion search in Y plane of 16x16 mbs in the sb
float sum_mv_row_sq = 0;
@@ -845,7 +845,7 @@
const float mv_row = (float)(best_mv.as_mv.row / 8);
const float mv_col = (float)(best_mv.as_mv.col / 8);
- const float log_sse = logf(1.0f + (float)sse);
+ const float log_sse = log1pf((float)sse);
const float abs_mv_row = fabsf(mv_row);
const float abs_mv_col = fabsf(mv_col);
@@ -1056,8 +1056,8 @@
int f_idx = 0;
float features[FEATURES] = { 0.0f };
- features[f_idx++] = logf(1.0f + (float)dc_q / 4.0f);
- features[f_idx++] = logf(1.0f + (float)best_rd / bs / bs / 1024.0f);
+ features[f_idx++] = log1pf((float)dc_q / 4.0f);
+ features[f_idx++] = log1pf((float)best_rd / bs / bs / 1024.0f);
add_rd_feature(part_none_rd, best_rd, features, &f_idx);
add_rd_feature(part_split_rd, best_rd, features, &f_idx);
@@ -1075,17 +1075,17 @@
bsize, NULL,
FEATURE_SMS_PRUNE_PART_FLAG);
- features[f_idx++] = logf(1.0f + (float)sms_tree->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_none_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->split[0]->sms_none_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->split[1]->sms_none_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->split[2]->sms_none_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->split[3]->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->split[0]->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->split[1]->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->split[2]->sms_none_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->split[3]->sms_none_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->sms_rect_feat[1]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->sms_rect_feat[3]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->sms_rect_feat[5]);
- features[f_idx++] = logf(1.0f + (float)sms_tree->sms_rect_feat[7]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_rect_feat[1]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_rect_feat[3]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_rect_feat[5]);
+ features[f_idx++] = log1pf((float)sms_tree->sms_rect_feat[7]);
assert(f_idx == FEATURES);
diff --git a/av1/encoder/pass2_strategy.c b/av1/encoder/pass2_strategy.c
index d8b96c5..46bc6b0 100644
--- a/av1/encoder/pass2_strategy.c
+++ b/av1/encoder/pass2_strategy.c
@@ -20,6 +20,7 @@
#include <assert.h>
#include <stdint.h>
+#include "aom_mem/aom_mem.h"
#include "config/aom_config.h"
#include "config/aom_scale_rtcd.h"
@@ -513,12 +514,12 @@
gf_stats->gf_group_inactive_zone_rows += stats->inactive_zone_rows;
}
-void av1_accumulate_next_frame_stats(const FIRSTPASS_STATS *stats,
- const int flash_detected,
- const int frames_since_key,
- const int cur_idx,
- GF_GROUP_STATS *gf_stats, int f_w,
- int f_h) {
+static void accumulate_next_frame_stats(const FIRSTPASS_STATS *stats,
+ const int flash_detected,
+ const int frames_since_key,
+ const int cur_idx,
+ GF_GROUP_STATS *gf_stats, int f_w,
+ int f_h) {
accumulate_frame_motion_stats(stats, gf_stats, f_w, f_h);
// sum up the metric values of current gf group
gf_stats->avg_sr_coded_error += stats->sr_coded_error;
@@ -1034,9 +1035,9 @@
return 0;
}
-static int is_shorter_gf_interval_better(AV1_COMP *cpi,
- EncodeFrameParams *frame_params) {
- RATE_CONTROL *const rc = &cpi->rc;
+static int is_shorter_gf_interval_better(
+ AV1_COMP *cpi, const EncodeFrameParams *frame_params) {
+ const RATE_CONTROL *const rc = &cpi->rc;
PRIMARY_RATE_CONTROL *const p_rc = &cpi->ppi->p_rc;
int gop_length_decision_method = cpi->sf.tpl_sf.gop_length_decision_method;
int shorten_gf_interval;
@@ -1938,9 +1939,8 @@
flash_detected = detect_flash(twopass, &cpi->twopass_frame, 0);
// TODO(bohanli): remove redundant accumulations here, or unify
// this and the ones in define_gf_group
- av1_accumulate_next_frame_stats(&next_frame, flash_detected,
- rc->frames_since_key, i, &gf_stats, f_w,
- f_h);
+ accumulate_next_frame_stats(&next_frame, flash_detected,
+ rc->frames_since_key, i, &gf_stats, f_w, f_h);
cut_here = detect_gf_cut(cpi, i, cur_start, flash_detected,
active_max_gf_interval, active_min_gf_interval,
@@ -2044,8 +2044,9 @@
temp_accu_coeff *= stats[n].cor_coeff;
this_score +=
temp_accu_coeff *
- (1 - stats[n].noise_var /
- AOMMAX(regions[this_reg].avg_intra_err, 0.001));
+ sqrt(AOMMAX(0.5,
+ 1 - stats[n].noise_var /
+ AOMMAX(stats[n].intra_error, 0.001)));
count_f++;
}
// preceding frames
@@ -2055,8 +2056,9 @@
temp_accu_coeff *= stats[n].cor_coeff;
this_score +=
temp_accu_coeff *
- (1 - stats[n].noise_var /
- AOMMAX(regions[this_reg].avg_intra_err, 0.001));
+ sqrt(AOMMAX(0.5,
+ 1 - stats[n].noise_var /
+ AOMMAX(stats[n].intra_error, 0.001)));
}
if (this_score > best_score) {
@@ -2276,9 +2278,8 @@
flash_detected = detect_flash(twopass, &cpi->twopass_frame, 0);
// accumulate stats for next frame
- av1_accumulate_next_frame_stats(next_frame, flash_detected,
- rc->frames_since_key, i, gf_stats, f_w,
- f_h);
+ accumulate_next_frame_stats(next_frame, flash_detected,
+ rc->frames_since_key, i, gf_stats, f_w, f_h);
++i;
}
@@ -3410,13 +3411,13 @@
TWO_PASS_FRAME *twopass_frame = &cpi->twopass_frame;
// The multiplication by 256 reverses a scaling factor of (>> 8)
// applied when combining MB error values for the frame.
- twopass_frame->mb_av_energy = log((this_frame_ptr->intra_error) + 1.0);
+ twopass_frame->mb_av_energy = log1p(this_frame_ptr->intra_error);
const FIRSTPASS_STATS *const total_stats =
cpi->ppi->twopass.stats_buf_ctx->total_stats;
if (is_fp_wavelet_energy_invalid(total_stats) == 0) {
twopass_frame->frame_avg_haar_energy =
- log((this_frame_ptr->frame_avg_wavelet_energy) + 1.0);
+ log1p(this_frame_ptr->frame_avg_wavelet_energy);
}
// Set the frame content type flag.
@@ -3510,6 +3511,39 @@
}
}
+// Smooth out the noise variance so it is more stable
+// TODO(bohanli): Use a better low-pass filter than averaging
+static void smooth_filter_noise(FIRSTPASS_STATS *first_stats,
+ FIRSTPASS_STATS *last_stats) {
+ int len = (int)(last_stats - first_stats);
+ double *smooth_noise = aom_malloc(len * sizeof(*smooth_noise));
+ if (!smooth_noise) return;
+
+ for (int i = 0; i < len; i++) {
+ double total_noise = 0;
+ double total_wt = 0;
+ for (int j = -HALF_FILT_LEN; j <= HALF_FILT_LEN; j++) {
+ int idx = AOMMIN(AOMMAX(i + j, 0), len - 1);
+ if (first_stats[idx].is_flash) continue;
+
+ total_noise += first_stats[idx].noise_var;
+ total_wt += 1.0;
+ }
+ if (total_wt > 0.01) {
+ total_noise /= total_wt;
+ } else {
+ total_noise = first_stats[i].noise_var;
+ }
+ smooth_noise[i] = total_noise;
+ }
+
+ for (int i = 0; i < len; i++) {
+ first_stats[i].noise_var = smooth_noise[i];
+ }
+
+ aom_free(smooth_noise);
+}
+
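smooth_filter_noise() is a clamped-window moving average: indices past either end are clamped to the border frame, and flash frames are skipped. A self-contained sketch of the same averaging on a plain array, with an assumed window half-length of 2 standing in for HALF_FILT_LEN:

    #include <stdio.h>

    #define HALF_FILT_LEN 2  // assumed window half-length for this sketch

    // Clamped-window moving average over a plain array, mirroring the filter
    // above minus the is_flash skipping.
    static void smooth(const double *in, double *out, int len) {
      for (int i = 0; i < len; i++) {
        double total = 0.0;
        int count = 0;
        for (int j = -HALF_FILT_LEN; j <= HALF_FILT_LEN; j++) {
          int idx = i + j;
          if (idx < 0) idx = 0;
          if (idx > len - 1) idx = len - 1;
          total += in[idx];
          count++;
        }
        out[i] = total / count;
      }
    }

    int main(void) {
      const double noise[] = { 1.0, 9.0, 1.0, 1.0, 9.0, 1.0 };
      double smoothed[6];
      smooth(noise, smoothed, 6);
      for (int i = 0; i < 6; i++) printf("%.2f ", smoothed[i]);  // spikes damped
      printf("\n");
      return 0;
    }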
// Estimate the noise variance of each frame from the first pass stats
void av1_estimate_noise(FIRSTPASS_STATS *first_stats,
FIRSTPASS_STATS *last_stats) {
@@ -3597,6 +3631,8 @@
this_stats++) {
this_stats->noise_var = (first_stats + 2)->noise_var;
}
+
+ smooth_filter_noise(first_stats, last_stats);
}
// Estimate correlation coefficient of each frame with its previous frame.
@@ -3638,6 +3674,10 @@
frame_params->show_frame =
!(gf_group->update_type[cpi->gf_frame_index] == ARF_UPDATE ||
gf_group->update_type[cpi->gf_frame_index] == INTNL_ARF_UPDATE);
+ if (cpi->gf_frame_index == 0) {
+ av1_tf_info_reset(&cpi->ppi->tf_info);
+ av1_tf_info_filtering(&cpi->ppi->tf_info, cpi, gf_group);
+ }
return;
}
@@ -3678,17 +3718,6 @@
if (oxcf->rc_cfg.mode == AOM_Q)
rc->active_worst_quality = oxcf->rc_cfg.cq_level;
- FIRSTPASS_STATS this_frame;
- av1_zero(this_frame);
- // call above fn
- if (is_stat_consumption_stage(cpi)) {
- if (cpi->gf_frame_index < gf_group->size || rc->frames_to_key == 0) {
- process_first_pass_stats(cpi, &this_frame);
- update_total_stats = 1;
- }
- } else {
- rc->active_worst_quality = oxcf->rc_cfg.cq_level;
- }
if (cpi->gf_frame_index == gf_group->size) {
if (cpi->ppi->lap_enabled && cpi->ppi->p_rc.enable_scenecut_detection) {
@@ -3701,6 +3730,18 @@
}
}
+ FIRSTPASS_STATS this_frame;
+ av1_zero(this_frame);
+ // call above fn
+ if (is_stat_consumption_stage(cpi)) {
+ if (cpi->gf_frame_index < gf_group->size || rc->frames_to_key == 0) {
+ process_first_pass_stats(cpi, &this_frame);
+ update_total_stats = 1;
+ }
+ } else {
+ rc->active_worst_quality = oxcf->rc_cfg.cq_level;
+ }
+
// Keyframe and section processing.
FIRSTPASS_STATS this_frame_copy;
this_frame_copy = this_frame;
@@ -4160,33 +4201,50 @@
// If the rate control is drifting consider adjustment to min or maxq.
if ((rc_cfg->mode != AOM_Q) && !cpi->rc.is_src_frame_alt_ref) {
- int maxq_adj_limit;
int minq_adj_limit;
- maxq_adj_limit = rc->worst_quality - rc->active_worst_quality;
+ int maxq_adj_limit;
minq_adj_limit =
(rc_cfg->mode == AOM_CQ ? MINQ_ADJ_LIMIT_CQ : MINQ_ADJ_LIMIT);
- // Undershoot.
- if (p_rc->rate_error_estimate > rc_cfg->under_shoot_pct) {
- --twopass->extend_maxq;
- if (p_rc->rolling_target_bits >= p_rc->rolling_actual_bits)
- ++twopass->extend_minq;
- // Overshoot.
- } else if (p_rc->rate_error_estimate < -rc_cfg->over_shoot_pct) {
- --twopass->extend_minq;
- if (p_rc->rolling_target_bits < p_rc->rolling_actual_bits)
- ++twopass->extend_maxq;
+ maxq_adj_limit = rc->worst_quality - rc->active_worst_quality;
+
+ // Undershoot
+ if ((rc_cfg->under_shoot_pct < 100) &&
+ (p_rc->rolling_actual_bits < p_rc->rolling_target_bits)) {
+ int pct_error =
+ ((p_rc->rolling_target_bits - p_rc->rolling_actual_bits) * 100) /
+ p_rc->rolling_target_bits;
+
+ if ((pct_error >= rc_cfg->under_shoot_pct) &&
+ (p_rc->rate_error_estimate > 0)) {
+ twopass->extend_minq += 1;
+ }
+ twopass->extend_maxq -= 1;
+ // Overshoot
+ } else if ((rc_cfg->over_shoot_pct < 100) &&
+ (p_rc->rolling_actual_bits > p_rc->rolling_target_bits)) {
+ int pct_error =
+ ((p_rc->rolling_actual_bits - p_rc->rolling_target_bits) * 100) /
+ p_rc->rolling_target_bits;
+
+ pct_error = clamp(pct_error, 0, 100);
+ if ((pct_error >= rc_cfg->over_shoot_pct) &&
+ (p_rc->rate_error_estimate < 0)) {
+ twopass->extend_maxq += 1;
+ }
+ twopass->extend_minq -= 1;
} else {
// Adjustment for extreme local overshoot.
+ // Only applies when normal adjustment above is not used (e.g.
+ // when threshold is set to 100).
if (rc->projected_frame_size > (2 * rc->base_frame_target) &&
rc->projected_frame_size > (2 * rc->avg_frame_bandwidth))
++twopass->extend_maxq;
- // Unwind undershoot or overshoot adjustment.
- if (p_rc->rolling_target_bits < p_rc->rolling_actual_bits)
- --twopass->extend_minq;
+ // Unwind extreme overshoot adjustment.
else if (p_rc->rolling_target_bits > p_rc->rolling_actual_bits)
--twopass->extend_maxq;
}
- twopass->extend_minq = clamp(twopass->extend_minq, 0, minq_adj_limit);
+ twopass->extend_minq =
+ clamp(twopass->extend_minq, -minq_adj_limit, minq_adj_limit);
twopass->extend_maxq = clamp(twopass->extend_maxq, 0, maxq_adj_limit);
// If there is a big and unexpected undershoot then feed the extra
@@ -4200,19 +4258,6 @@
fast_extra_thresh - rc->projected_frame_size;
p_rc->vbr_bits_off_target_fast = AOMMIN(p_rc->vbr_bits_off_target_fast,
(4 * rc->avg_frame_bandwidth));
-
- // Fast adaptation of minQ if necessary to use up the extra bits.
- if (rc->avg_frame_bandwidth) {
- twopass->extend_minq_fast = (int)(p_rc->vbr_bits_off_target_fast * 8 /
- rc->avg_frame_bandwidth);
- }
- twopass->extend_minq_fast = AOMMIN(
- twopass->extend_minq_fast, minq_adj_limit - twopass->extend_minq);
- } else if (p_rc->vbr_bits_off_target_fast) {
- twopass->extend_minq_fast = AOMMIN(
- twopass->extend_minq_fast, minq_adj_limit - twopass->extend_minq);
- } else {
- twopass->extend_minq_fast = 0;
}
}
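
The reworked logic above keys undershoot/overshoot off a windowed percentage error rather than rate_error_estimate alone. A worked sketch with assumed rolling-window totals:

    #include <stdio.h>

    int main(void) {
      // Assumed rolling-window totals for illustration.
      const long long target = 1000000, actual = 900000;
      const int under_shoot_pct = 8;  // assumed config threshold (< 100)
      const int pct_error = (int)((target - actual) * 100 / target);  // 10
      // With pct_error >= under_shoot_pct and a positive rate_error_estimate,
      // extend_minq is bumped by 1; extend_maxq is relaxed by 1 regardless.
      printf("pct_error = %d (%s)\n", pct_error,
             pct_error >= under_shoot_pct ? "extend minq" : "no minq change");
      return 0;
    }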
@@ -4223,7 +4268,6 @@
p_rc->vbr_bits_off_target_fast;
cpi->ppi->p_rc.temp_extend_minq = twopass->extend_minq;
cpi->ppi->p_rc.temp_extend_maxq = twopass->extend_maxq;
- cpi->ppi->p_rc.temp_extend_minq_fast = twopass->extend_minq_fast;
}
#endif
}
diff --git a/av1/encoder/pass2_strategy.h b/av1/encoder/pass2_strategy.h
index a75be1a..e34454e 100644
--- a/av1/encoder/pass2_strategy.h
+++ b/av1/encoder/pass2_strategy.h
@@ -134,12 +134,6 @@
int *num_fpstats_used, int *num_fpstats_required,
int project_gfu_boost);
-void av1_accumulate_next_frame_stats(const FIRSTPASS_STATS *stats,
- const int flash_detected,
- const int frames_since_key,
- const int cur_idx,
- GF_GROUP_STATS *gf_stats, int f_w,
- int f_h);
// Identify stable and unstable regions from first pass stats.
// stats_start points to the first frame to analyze.
// |offset| is the offset from the current frame to the frame stats_start is
diff --git a/av1/encoder/pickcdef.c b/av1/encoder/pickcdef.c
index 22a4557..293dafa 100644
--- a/av1/encoder/pickcdef.c
+++ b/av1/encoder/pickcdef.c
@@ -638,7 +638,7 @@
const int nvfb = cdef_search_ctx->nvfb;
const int nhfb = cdef_search_ctx->nhfb;
cdef_search_ctx->sb_index =
- aom_malloc(nvfb * nhfb * sizeof(cdef_search_ctx->sb_index));
+ aom_malloc(nvfb * nhfb * sizeof(cdef_search_ctx->sb_index[0]));
cdef_search_ctx->sb_count = 0;
cdef_search_ctx->mse[0] =
aom_malloc(sizeof(**cdef_search_ctx->mse) * nvfb * nhfb);
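
The sb_index allocation fix above swaps sizeof(pointer) for sizeof(element), the classic malloc-sizing pitfall. A generic illustration (hypothetical buffer, not the libaom type):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
      int *index;
      // sizeof(index) is the pointer size (typically 8 on 64-bit), not the
      // element size, so sizing an allocation with it over- or under-allocates
      // depending on how the element width compares to the pointer width.
      printf("sizeof(index)    = %zu\n", sizeof(index));     // pointer size
      printf("sizeof(index[0]) = %zu\n", sizeof(index[0]));  // element size
      index = malloc(100 * sizeof(index[0]));                // the fixed form
      free(index);
      return 0;
    }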
@@ -728,8 +728,8 @@
#endif
}
-static void pick_cdef_from_qp(AV1_COMMON *const cm, int skip_cdef,
- int is_screen_content) {
+void av1_pick_cdef_from_qp(AV1_COMMON *const cm, int skip_cdef,
+ int is_screen_content) {
const int bd = cm->seq_params->bit_depth;
const int q =
av1_ac_quant_QTX(cm->quant_params.base_qindex, 0, bd) >> (bd - 8);
@@ -807,6 +807,8 @@
const int nvfb = (mi_params->mi_rows + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
const int nhfb = (mi_params->mi_cols + MI_SIZE_64X64 - 1) / MI_SIZE_64X64;
MB_MODE_INFO **mbmi = mi_params->mi_grid_base;
+ // mbmi is NULL when real-time rate control library is used.
+ if (!mbmi) return;
for (int r = 0; r < nvfb; ++r) {
for (int c = 0; c < nhfb; ++c) {
MB_MODE_INFO *current_mbmi = mbmi[MI_SIZE_64X64 * c];
@@ -820,7 +822,8 @@
const YV12_BUFFER_CONFIG *ref, AV1_COMMON *cm,
MACROBLOCKD *xd, CDEF_PICK_METHOD pick_method, int rdmult,
int skip_cdef_feature, CDEF_CONTROL cdef_control,
- const int is_screen_content, int non_reference_frame) {
+ const int is_screen_content, int non_reference_frame,
+ int rtc_ext_rc) {
assert(cdef_control != CDEF_NONE);
if (cdef_control == CDEF_REFERENCE && non_reference_frame) {
CdefInfo *const cdef_info = &cm->cdef_info;
@@ -831,8 +834,12 @@
return;
}
+ if (rtc_ext_rc) {
+ av1_pick_cdef_from_qp(cm, 0, 0);
+ return;
+ }
if (pick_method == CDEF_PICK_FROM_Q) {
- pick_cdef_from_qp(cm, skip_cdef_feature, is_screen_content);
+ av1_pick_cdef_from_qp(cm, skip_cdef_feature, is_screen_content);
return;
}
const CommonModeInfoParams *const mi_params = &cm->mi_params;
diff --git a/av1/encoder/pickcdef.h b/av1/encoder/pickcdef.h
index 548a740..bdd8233 100644
--- a/av1/encoder/pickcdef.h
+++ b/av1/encoder/pickcdef.h
@@ -235,6 +235,7 @@
* \param[in] is_screen_content Whether it is screen content type
* \param[in] non_reference_frame Indicates if current frame is
* non-reference
+ * \param[in]    rtc_ext_rc        Indicates if external RC is used for testing
*
* \remark Nothing is returned. Instead, optimal CDEF parameters are stored
* in the \c cdef_info structure of type \ref CdefInfo inside \c cm:
@@ -252,7 +253,22 @@
const YV12_BUFFER_CONFIG *ref, AV1_COMMON *cm,
MACROBLOCKD *xd, CDEF_PICK_METHOD pick_method, int rdmult,
int skip_cdef_feature, CDEF_CONTROL cdef_control,
- const int is_screen_content, int non_reference_frame);
+ const int is_screen_content, int non_reference_frame,
+ int rtc_ext_rc);
+
+/*!\brief AV1 CDEF level from QP
+ *
+ * \ingroup in_loop_cdef
+ *
+ * Calculates CDEF levels from frame QP. Only used for speed 7+ with RT mode.
+ *
+ * \param[in,out] cm Pointer to top level common structure
+ * \param[in] skip_cdef Flag to skip CDEF filtering
+ * \param[in] is_screen_content Flag indicating screen content
+ *
+ */
+void av1_pick_cdef_from_qp(AV1_COMMON *const cm, int skip_cdef,
+ int is_screen_content);
#ifdef __cplusplus
} // extern "C"
diff --git a/av1/encoder/picklpf.c b/av1/encoder/picklpf.c
index 90c3c1a..9084d3f 100644
--- a/av1/encoder/picklpf.c
+++ b/av1/encoder/picklpf.c
@@ -234,6 +234,17 @@
cpi->common.width * cpi->common.height > 352 * 288))
? 12034
: 6017;
+ // Increase strength on base TL0 for temporal layers, for low resolutions,
+ // based on frame source_sad.
+ if (cpi->svc.number_temporal_layers > 1 &&
+ cpi->svc.temporal_layer_id == 0 &&
+ cpi->common.width * cpi->common.height <= 352 * 288 &&
+ cpi->sf.rt_sf.use_nonrd_pick_mode) {
+ if (cpi->rc.frame_source_sad > 100000)
+ inter_frame_multiplier = inter_frame_multiplier << 1;
+ else if (cpi->rc.frame_source_sad > 50000)
+ inter_frame_multiplier = 3 * (inter_frame_multiplier >> 1);
+ }
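
The multiplier adjustments above use cheap integer arithmetic: a left shift doubles, and 3 * (x >> 1) gives roughly 1.5x. With the low-resolution base value from the surrounding code:

    #include <stdio.h>

    int main(void) {
      // Assuming the low-resolution base multiplier of 6017 from above.
      const int m = 6017;
      printf("doubled: %d\n", m << 1);        // 12034, for source_sad > 100000
      printf("~1.5x:   %d\n", 3 * (m >> 1));  // 9024, for source_sad > 50000
      return 0;
    }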
// These values were determined by linear fitting the result of the
// searched level for 8 bit depth:
// Keyframes: filt_guess = q * 0.06699 - 1.60817
diff --git a/av1/encoder/pickrst.c b/av1/encoder/pickrst.c
index dc599a8..7212469 100644
--- a/av1/encoder/pickrst.c
+++ b/av1/encoder/pickrst.c
@@ -32,10 +32,6 @@
#include "av1/encoder/picklpf.h"
#include "av1/encoder/pickrst.h"
-// When set to RESTORE_WIENER or RESTORE_SGRPROJ only those are allowed.
-// When set to RESTORE_TYPES we allow switchable.
-static const RestorationType force_restore_type = RESTORE_TYPES;
-
// Number of Wiener iterations
#define NUM_WIENER_ITERS 5
@@ -149,6 +145,11 @@
SgrprojInfo sgrproj;
WienerInfo wiener;
PixelRect tile_rect;
+
+ // Buffers used to hold dgd-avg and src-avg data respectively during SIMD
+ // call of Wiener filter.
+ int16_t *dgd_avg;
+ int16_t *src_avg;
} RestSearchCtxt;
static AOM_INLINE void rsc_on_tile(void *priv) {
@@ -938,10 +939,11 @@
if (cost_sgr < cost_none) rsc->sgrproj = rusi->sgrproj;
}
-void acc_stat_one_line(const uint8_t *dgd, const uint8_t *src, int dgd_stride,
- int h_start, int h_end, uint8_t avg,
- const int wiener_halfwin, const int wiener_win2,
- int32_t *M_int32, int32_t *H_int32, int count) {
+static void acc_stat_one_line(const uint8_t *dgd, const uint8_t *src,
+ int dgd_stride, int h_start, int h_end,
+ uint8_t avg, const int wiener_halfwin,
+ const int wiener_win2, int32_t *M_int32,
+ int32_t *H_int32, int count) {
int j, k, l;
int16_t Y[WIENER_WIN2];
@@ -969,9 +971,12 @@
}
void av1_compute_stats_c(int wiener_win, const uint8_t *dgd, const uint8_t *src,
- int h_start, int h_end, int v_start, int v_end,
- int dgd_stride, int src_stride, int64_t *M, int64_t *H,
+ int16_t *dgd_avg, int16_t *src_avg, int h_start,
+ int h_end, int v_start, int v_end, int dgd_stride,
+ int src_stride, int64_t *M, int64_t *H,
int use_downsampled_wiener_stats) {
+ (void)dgd_avg;
+ (void)src_avg;
int i, k, l;
const int wiener_win2 = wiener_win * wiener_win;
const int wiener_halfwin = (wiener_win >> 1);
@@ -1096,15 +1101,41 @@
b[i - 1] = c;
}
}
+
+ // b/278065963: The multiplies
+ // c / 256 * A[k * stride + j] / cd * 256
+ // and
+ // c / 256 * b[k] / cd * 256
+ // within Gaussian elimination can cause a signed integer overflow. Rework
+ // the multiplies so that larger scaling is used without significantly
+ // impacting the overall precision.
+ //
+ // Precision guidance:
+ // scale_threshold: Pick as high as possible.
+ // For max_abs_akj >= scale_threshold scenario:
+ // scaler_A: Pick as low as possible. Needed for A[(i + 1) * stride + j].
+ // scaler_c: Pick as low as possible while maintaining scaler_c >=
+ // (1 << 7). Needed for A[(i + 1) * stride + j] and b[i + 1].
+ int64_t max_abs_akj = 0;
+ for (int j = 0; j < n; j++) {
+ const int64_t abs_akj = llabs(A[k * stride + j]);
+ if (abs_akj > max_abs_akj) max_abs_akj = abs_akj;
+ }
+ const int scale_threshold = 1 << 22;
+ const int scaler_A = max_abs_akj < scale_threshold ? 1 : (1 << 5);
+ const int scaler_c = max_abs_akj < scale_threshold ? 1 : (1 << 7);
+ const int scaler = scaler_c * scaler_A;
+
// Forward elimination (convert A to row-echelon form)
for (int i = k; i < n - 1; i++) {
if (A[k * stride + k] == 0) return 0;
- const int64_t c = A[(i + 1) * stride + k];
+ const int64_t c = A[(i + 1) * stride + k] / scaler_c;
const int64_t cd = A[k * stride + k];
for (int j = 0; j < n; j++) {
- A[(i + 1) * stride + j] -= c / 256 * A[k * stride + j] / cd * 256;
+ A[(i + 1) * stride + j] -=
+ A[k * stride + j] / scaler_A * c / cd * scaler;
}
- b[i + 1] -= c / 256 * b[k] / cd * 256;
+ b[i + 1] -= c * b[k] / cd * scaler_c;
}
}
// Back-substitution
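
The reordering works because each operand is divided down before the large multiply, extending the overflow-safe range while the final multiply by scaler restores the magnitude. A standalone sketch using the patch's large-magnitude scalers, with illustrative values chosen for the example (not taken from the bug report):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
      // Illustrative magnitudes only: with both operands near 2^36, the old
      // form (c / 256) * a needs ~2^64 and overflows int64_t, while dividing
      // each operand down first keeps the intermediate near 2^60 and
      // restores the scale at the end.
      const int64_t a = (int64_t)1 << 36;   // stands in for A[k * stride + j]
      const int64_t c = (int64_t)1 << 36;   // stands in for A[(i + 1) * stride + k]
      const int64_t cd = (int64_t)1 << 30;  // pivot A[k * stride + k]
      const int scaler_A = 1 << 5, scaler_c = 1 << 7;
      const int scaler = scaler_A * scaler_c;
      const int64_t term = a / scaler_A * (c / scaler_c) / cd * scaler;
      printf("term = %" PRId64 "\n", term);  // 2^42, exact here: a * c / cd
      return 0;
    }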
@@ -1137,16 +1168,28 @@
A[jj] += Mc[i][j] * b[i] / WIENER_TAP_SCALE_FACTOR;
}
}
+
+ // b/274668506: This is the dual branch for the issue in b/272139363. The fix
+ // is similar. See comments in update_b_sep_sym() below.
+ int32_t max_b_l = 0;
+ for (int l = 0; l < wiener_win; ++l) {
+ const int32_t abs_b_l = abs(b[l]);
+ if (abs_b_l > max_b_l) max_b_l = abs_b_l;
+ }
+ const int scale_threshold = 128 * WIENER_TAP_SCALE_FACTOR;
+ const int scaler = max_b_l < scale_threshold ? 1 : 4;
+
for (i = 0; i < wiener_win; i++) {
for (j = 0; j < wiener_win; j++) {
int k, l;
for (k = 0; k < wiener_win; ++k) {
+ const int kk = wrap_index(k, wiener_win);
for (l = 0; l < wiener_win; ++l) {
- const int kk = wrap_index(k, wiener_win);
const int ll = wrap_index(l, wiener_win);
B[ll * wiener_halfwin1 + kk] +=
Hc[j * wiener_win + i][k * wiener_win2 + l] * b[i] /
- WIENER_TAP_SCALE_FACTOR * b[j] / WIENER_TAP_SCALE_FACTOR;
+ (scaler * WIENER_TAP_SCALE_FACTOR) * b[j] /
+ (WIENER_TAP_SCALE_FACTOR / scaler);
}
}
}
@@ -1197,16 +1240,43 @@
}
}
+ // b/272139363: The computation,
+ // Hc[i * wiener_win + j][k * wiener_win2 + l] * a[k] /
+ // WIENER_TAP_SCALE_FACTOR * a[l] / WIENER_TAP_SCALE_FACTOR;
+ // may generate a signed-integer-overflow. Conditionally scale the terms to
+ // avoid a potential overflow.
+ //
+ // Hc contains accumulated correlation statistics and it is desired to leave
+ // as much room as possible for Hc. It was experimentally observed that the
+ // primary issue manifests itself with the second, a[l], multiply. For
+ // max_a_l < WIENER_TAP_SCALE_FACTOR the first multiply with a[k] should not
+ // increase dynamic range and the second multiply should hence be safe.
+ // Thereafter a safe scale_threshold depends on the actual operational range
+ // of Hc. The largest scale_threshold is expected to depend on bit-depth
+ // (av1_compute_stats_highbd_c() scales highbd to 8-bit) and maximum
+ // restoration-unit size (256), leading up to 32-bit positive numbers in Hc.
+ // Noting that the caller, wiener_decompose_sep_sym(), initializes a[...]
+ // to a range smaller than 16 bits, the scale_threshold is set as below for
+ // convenience.
+ int32_t max_a_l = 0;
+ for (int l = 0; l < wiener_win; ++l) {
+ const int32_t abs_a_l = abs(a[l]);
+ if (abs_a_l > max_a_l) max_a_l = abs_a_l;
+ }
+ const int scale_threshold = 128 * WIENER_TAP_SCALE_FACTOR;
+ const int scaler = max_a_l < scale_threshold ? 1 : 4;
+
for (i = 0; i < wiener_win; i++) {
+ const int ii = wrap_index(i, wiener_win);
for (j = 0; j < wiener_win; j++) {
- const int ii = wrap_index(i, wiener_win);
const int jj = wrap_index(j, wiener_win);
int k, l;
for (k = 0; k < wiener_win; ++k) {
for (l = 0; l < wiener_win; ++l) {
B[jj * wiener_halfwin1 + ii] +=
Hc[i * wiener_win + j][k * wiener_win2 + l] * a[k] /
- WIENER_TAP_SCALE_FACTOR * a[l] / WIENER_TAP_SCALE_FACTOR;
+ (scaler * WIENER_TAP_SCALE_FACTOR) * a[l] /
+ (WIENER_TAP_SCALE_FACTOR / scaler);
}
}
}
@@ -1385,7 +1455,6 @@
return bits;
}
-#define USE_WIENER_REFINEMENT_SEARCH 1
static int64_t finer_tile_search_wiener(const RestSearchCtxt *rsc,
const RestorationTileLimits *limits,
const PixelRect *tile,
@@ -1393,7 +1462,10 @@
int wiener_win) {
const int plane_off = (WIENER_WIN - wiener_win) >> 1;
int64_t err = try_restoration_unit(rsc, limits, tile, rui);
-#if USE_WIENER_REFINEMENT_SEARCH
+
+ if (rsc->lpf_sf->disable_wiener_coeff_refine_search) return err;
+
+ // Refinement search around the wiener filter coefficients.
int64_t err2;
int tap_min[] = { WIENER_FILT_TAP0_MINV, WIENER_FILT_TAP1_MINV,
WIENER_FILT_TAP2_MINV };
@@ -1489,7 +1561,6 @@
}
}
// printf("err post = %"PRId64"\n", err);
-#endif // USE_WIENER_REFINEMENT_SEARCH
return err;
}
@@ -1549,21 +1620,24 @@
const AV1_COMMON *const cm = rsc->cm;
if (cm->seq_params->use_highbitdepth) {
// TODO(any) : Add support for use_downsampled_wiener_stats SF in HBD
- // functions
+ // functions. Optimize the intrinsics of the HBD design similarly to LBD
+ // (i.e., pre-calculate the d and s buffers and avoid most of the C
+ // operations).
av1_compute_stats_highbd(reduced_wiener_win, rsc->dgd_buffer,
rsc->src_buffer, limits->h_start, limits->h_end,
limits->v_start, limits->v_end, rsc->dgd_stride,
rsc->src_stride, M, H, cm->seq_params->bit_depth);
} else {
av1_compute_stats(reduced_wiener_win, rsc->dgd_buffer, rsc->src_buffer,
- limits->h_start, limits->h_end, limits->v_start,
- limits->v_end, rsc->dgd_stride, rsc->src_stride, M, H,
+ rsc->dgd_avg, rsc->src_avg, limits->h_start,
+ limits->h_end, limits->v_start, limits->v_end,
+ rsc->dgd_stride, rsc->src_stride, M, H,
rsc->lpf_sf->use_downsampled_wiener_stats);
}
#else
av1_compute_stats(reduced_wiener_win, rsc->dgd_buffer, rsc->src_buffer,
- limits->h_start, limits->h_end, limits->v_start,
- limits->v_end, rsc->dgd_stride, rsc->src_stride, M, H,
+ rsc->dgd_avg, rsc->src_avg, limits->h_start, limits->h_end,
+ limits->v_start, limits->v_end, rsc->dgd_stride,
+ rsc->src_stride, M, H,
rsc->lpf_sf->use_downsampled_wiener_stats);
#endif
@@ -1741,6 +1815,24 @@
return rsi->units_per_tile;
}
+static INLINE void av1_derive_flags_for_lr_processing(
+ const LOOP_FILTER_SPEED_FEATURES *lpf_sf, bool *disable_lr_filter) {
+ const bool is_wiener_disabled = lpf_sf->disable_wiener_filter;
+ const bool is_sgr_disabled = lpf_sf->disable_sgr_filter;
+
+ // Keep the RESTORE_NONE type enabled if either Wiener or Self-guided is
+ // enabled; disable it only when both are disabled.
+ disable_lr_filter[RESTORE_NONE] = (is_wiener_disabled && is_sgr_disabled);
+
+ disable_lr_filter[RESTORE_WIENER] = is_wiener_disabled;
+ disable_lr_filter[RESTORE_SGRPROJ] = is_sgr_disabled;
+
+ // Enable the switchable loop restoration type only if both Wiener and
+ // Self-guided are enabled.
+ disable_lr_filter[RESTORE_SWITCHABLE] =
+ (is_wiener_disabled || is_sgr_disabled);
+}
+
void av1_pick_filter_restoration(const YV12_BUFFER_CONFIG *src, AV1_COMP *cpi) {
AV1_COMMON *const cm = &cpi->common;
MACROBLOCK *const x = &cpi->td.mb;
@@ -1780,11 +1872,50 @@
"Failed to allocate trial restored frame buffer");
RestSearchCtxt rsc;
+
+ // The buffers 'src_avg' and 'dgd_avg' are used to compute the H and M
+ // buffers. They are needed only by the AVX2 SIMD path, so they are
+ // allocated only when the AVX2 variant of av1_compute_stats() is enabled.
+ // The required buffer size is derived from the maximum LRU width and height
+ // allowed for Wiener filtering (1.5 times RESTORATION_UNITSIZE_MAX, per
+ // foreach_rest_unit_in_tile()), with the width and height aligned to a
+ // multiple of 16 for the intrinsics.
+ rsc.dgd_avg = NULL;
+ rsc.src_avg = NULL;
+#if HAVE_AVX2
+ // The buffers below are used only during Wiener filter processing in the
+ // low bitdepth path, so allocate them only when the Wiener filter is
+ // enabled for that path.
+ if (!cpi->sf.lpf_sf.disable_wiener_filter &&
+ !cm->seq_params->use_highbitdepth) {
+ const int buf_size = sizeof(*rsc.dgd_avg) * 6 * RESTORATION_UNITSIZE_MAX *
+ RESTORATION_UNITSIZE_MAX;
+ CHECK_MEM_ERROR(cm, rsc.dgd_avg, (int16_t *)aom_memalign(32, buf_size));
+
+ // When LRU width isn't multiple of 16, the 256 bits load instruction used
+ // in AVX2 intrinsic can read data beyond valid LRU. Hence, in order to
+ // silence Valgrind warning this buffer is initialized with zero. Overhead
+ // due to this initialization is negligible since it is done at frame level.
+ memset(rsc.dgd_avg, 0, buf_size);
+ rsc.src_avg =
+ rsc.dgd_avg + 3 * RESTORATION_UNITSIZE_MAX * RESTORATION_UNITSIZE_MAX;
+ // Asserts the starting address of src_avg is always 32-bytes aligned.
+ assert(!((intptr_t)rsc.src_avg % 32));
+ }
+#endif
+
const int plane_start = AOM_PLANE_Y;
const int plane_end = num_planes > 1 ? AOM_PLANE_V : AOM_PLANE_Y;
+
+ // Derive the flags to enable/disable Loop restoration filters based on the
+ // speed features 'disable_wiener_filter' and 'disable_sgr_filter'.
+ bool disable_lr_filter[RESTORE_TYPES] = { false };
+ const LOOP_FILTER_SPEED_FEATURES *lpf_sf = &cpi->sf.lpf_sf;
+ av1_derive_flags_for_lr_processing(lpf_sf, disable_lr_filter);
+
for (int plane = plane_start; plane <= plane_end; ++plane) {
- init_rsc(src, &cpi->common, x, &cpi->sf.lpf_sf, plane, rusi,
- &cpi->trial_frame_rst, &rsc);
+ init_rsc(src, &cpi->common, x, lpf_sf, plane, rusi, &cpi->trial_frame_rst,
+ &rsc);
const int plane_ntiles = ntiles[plane > 0];
const RestorationType num_rtypes =
@@ -1794,16 +1925,16 @@
RestorationType best_rtype = RESTORE_NONE;
const int highbd = rsc.cm->seq_params->use_highbitdepth;
- if ((plane && !cpi->sf.lpf_sf.disable_loop_restoration_chroma) ||
- (!plane && !cpi->sf.lpf_sf.disable_loop_restoration_luma)) {
+ if ((plane && !lpf_sf->disable_loop_restoration_chroma) ||
+ (!plane && !lpf_sf->disable_loop_restoration_luma)) {
av1_extend_frame(rsc.dgd_buffer, rsc.plane_width, rsc.plane_height,
rsc.dgd_stride, RESTORATION_BORDER, RESTORATION_BORDER,
highbd);
for (RestorationType r = 0; r < num_rtypes; ++r) {
- if ((force_restore_type != RESTORE_TYPES) && (r != RESTORE_NONE) &&
- (r != force_restore_type))
- continue;
+ // Disable Loop restoration filter based on the flags set using speed
+ // feature 'disable_wiener_filter' and 'disable_sgr_filter'.
+ if (disable_lr_filter[r]) continue;
double cost = search_rest_type(&rsc, r);
@@ -1815,15 +1946,17 @@
}
cm->rst_info[plane].frame_restoration_type = best_rtype;
- if (force_restore_type != RESTORE_TYPES)
- assert(best_rtype == force_restore_type || best_rtype == RESTORE_NONE);
-
if (best_rtype != RESTORE_NONE) {
for (int u = 0; u < plane_ntiles; ++u) {
copy_unit_info(best_rtype, &rusi[u], &cm->rst_info[plane].unit_info[u]);
}
}
}
-
+#if HAVE_AVX2
+ if (!cpi->sf.lpf_sf.disable_wiener_filter &&
+ !cm->seq_params->use_highbitdepth) {
+ aom_free(rsc.dgd_avg);
+ }
+#endif
aom_free(rusi);
}
diff --git a/av1/encoder/random.h b/av1/encoder/random.h
index 0bca391..efe909b 100644
--- a/av1/encoder/random.h
+++ b/av1/encoder/random.h
@@ -12,14 +12,70 @@
#ifndef AOM_AV1_ENCODER_RANDOM_H_
#define AOM_AV1_ENCODER_RANDOM_H_
+#include <assert.h>
+#include <stdint.h>
+
#ifdef __cplusplus
extern "C" {
#endif
+// Advance the generator to its next state, and generate the next 32-bit output.
+// Note that the low bits of this output are comparatively low-quality, so users
+// of this function should ensure that the high bits factor through to their
+// outputs.
+static INLINE uint32_t lcg_next(uint32_t *state) {
+ *state = (uint32_t)(*state * 1103515245ULL + 12345);
+ return *state;
+}
+
// Generate a random number in the range [0, 32768).
-static INLINE unsigned int lcg_rand16(unsigned int *state) {
- *state = (unsigned int)(*state * 1103515245ULL + 12345);
- return *state / 65536 % 32768;
+static INLINE uint32_t lcg_rand16(uint32_t *state) {
+ return (lcg_next(state) / 65536) % 32768;
+}
+
+// Generate a random number in the range [0, n)
+// This is implemented as (rand() * n) / <range of RNG> rather than
+// rand() % n, for a few reasons: this implementation is faster and less
+// biased, and if n is a power of 2, it uses the higher-quality top bits from
+// the RNG output rather than the lower-quality bottom bits.
+static INLINE uint32_t lcg_randint(uint32_t *state, uint32_t n) {
+ uint64_t v = ((uint64_t)lcg_next(state) * n) >> 32;
+ return (uint32_t)v;
+}
+
+// Generate a random number in the range [lo, hi)
+static INLINE uint32_t lcg_randrange(uint32_t *state, uint32_t lo,
+ uint32_t hi) {
+ assert(lo < hi);
+ return lo + lcg_randint(state, hi - lo);
+}
+
+// Pick k distinct numbers from the set {0, ..., n-1}
+// All possible sets of k numbers, and all possible orderings of those numbers,
+// are equally likely.
+//
+// Note: The algorithm here uses resampling to avoid choosing repeated
+// values. This works well as long as n >> k, but can potentially lead to many
+// resampling attempts if n is equal to or only slightly larger than k.
+static INLINE void lcg_pick(int n, int k, int *out, unsigned int *seed) {
+ assert(0 <= k && k <= n);
+ for (int i = 0; i < k; i++) {
+ int v;
+
+ // Inner resampling loop
+ // We have to use a goto here because C does not have a multi-level continue
+ // statement
+ resample:
+ v = (int)lcg_randint(seed, n);
+ for (int j = 0; j < i; j++) {
+ if (v == out[j]) {
+ // Repeated v, resample
+ goto resample;
+ }
+ }
+
+ // New v, accept
+ out[i] = v;
+ }
}
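
A minimal usage sketch for the new helpers, assuming the aom build environment (the INLINE macro and include path) is available:

    #include <stdio.h>

    #include "av1/encoder/random.h"  // assumed include path for the header above

    int main(void) {
      uint32_t state = 0x02f6e2b1;  // arbitrary seed
      // Uniform in [1, 7), i.e. a die roll; lcg_randint() scales the 32-bit
      // output by n and keeps the top 32 bits, so the higher-quality high
      // bits of the LCG dominate the result.
      const uint32_t die = lcg_randrange(&state, 1, 7);
      // 4 distinct values from {0, ..., 15}; repeats are rejected by resampling.
      int picks[4];
      unsigned int seed = 1;
      lcg_pick(16, 4, picks, &seed);
      printf("die=%u picks=%d,%d,%d,%d\n", (unsigned)die, picks[0], picks[1],
             picks[2], picks[3]);
      return 0;
    }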
#ifdef __cplusplus
diff --git a/av1/encoder/ratectrl.c b/av1/encoder/ratectrl.c
index 9518480..fdf1495 100644
--- a/av1/encoder/ratectrl.c
+++ b/av1/encoder/ratectrl.c
@@ -174,35 +174,31 @@
return enumerator;
}
+static int get_init_ratio(double sse) { return (int)(300000 / sse); }
+
int av1_rc_bits_per_mb(const AV1_COMP *cpi, FRAME_TYPE frame_type, int qindex,
double correction_factor, int accurate_estimate) {
const AV1_COMMON *const cm = &cpi->common;
const int is_screen_content_type = cpi->is_screen_content_type;
const aom_bit_depth_t bit_depth = cm->seq_params->bit_depth;
const double q = av1_convert_qindex_to_q(qindex, bit_depth);
+ int enumerator = av1_get_bpmb_enumerator(frame_type, is_screen_content_type);
- const int min_dim = AOMMIN(cm->width, cm->height);
+ assert(correction_factor <= MAX_BPB_FACTOR &&
+ correction_factor >= MIN_BPB_FACTOR);
if (frame_type != KEY_FRAME && accurate_estimate) {
assert(cpi->rec_sse != UINT64_MAX);
const int mbs = cm->mi_params.MBs;
- const int res = (min_dim < 480) ? 0 : ((min_dim < 720) ? 1 : 2);
- const double sse_over_q2 = (double)(cpi->rec_sse << BPER_MB_NORMBITS) /
- ((double)q * q) / (double)mbs;
- const double coef[3][2] = {
- { 0.535, 3000.0 }, // < 480
- { 0.590, 3000.0 }, // < 720
- { 0.485, 1000.0 } // 720
- };
- int bits = (int)(coef[res][0] * sse_over_q2 + coef[res][1]);
- return (int)(bits * correction_factor);
+ const double sse_sqrt =
+ (double)((int)sqrt((double)(cpi->rec_sse)) << BPER_MB_NORMBITS) /
+ (double)mbs;
+ const int ratio = (cpi->rc.bit_est_ratio == 0) ? get_init_ratio(sse_sqrt)
+ : cpi->rc.bit_est_ratio;
+ // Clamp the enumerator to lower the q fluctuations.
+ enumerator = AOMMIN(AOMMAX((int)(ratio * sse_sqrt), 20000), 170000);
}
- const int enumerator =
- av1_get_bpmb_enumerator(frame_type, is_screen_content_type);
- assert(correction_factor <= MAX_BPB_FACTOR &&
- correction_factor >= MIN_BPB_FACTOR);
-
// q based adjustment to baseline enumerator
return (int)(enumerator * correction_factor / q);
}
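
The accurate-estimate path replaces the piecewise sse_over_q2 fit with bits/MB proportional to ratio * sqrt(SSE) / q, where ratio is a running estimate (see the bit_est_ratio update later in this file). A worked sketch with assumed numbers, taking BPER_MB_NORMBITS as 9:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
      // Assumed values for illustration.
      const double rec_sse = 4.0e6;  // reconstruction SSE of the last frame
      const int mbs = 3600;          // 1280x720 in 16x16 macroblocks
      const double q = 40.0;         // quantizer step from qindex
      const int ratio = 150;         // stands in for rc.bit_est_ratio
      const double correction_factor = 1.0;
      const double sse_sqrt = (double)((int)sqrt(rec_sse) << 9) / (double)mbs;
      int enumerator = (int)(ratio * sse_sqrt);
      // Clamp as in the patch to damp q fluctuations.
      if (enumerator < 20000) enumerator = 20000;
      if (enumerator > 170000) enumerator = 170000;
      printf("bits/mb ~= %d\n", (int)(enumerator * correction_factor / q));
      return 0;
    }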
@@ -262,7 +258,8 @@
// Update the buffer level for higher temporal layers, given the encoded current
// temporal layer.
-static void update_layer_buffer_level(SVC *svc, int encoded_frame_size) {
+static void update_layer_buffer_level(SVC *svc, int encoded_frame_size,
+ bool is_screen) {
const int current_temporal_layer = svc->temporal_layer_id;
for (int i = current_temporal_layer + 1; i < svc->number_temporal_layers;
++i) {
@@ -276,6 +273,15 @@
lp_rc->bits_off_target =
AOMMIN(lp_rc->bits_off_target, lp_rc->maximum_buffer_size);
lp_rc->buffer_level = lp_rc->bits_off_target;
+
+ // For screen-content mode: don't let the buffer level go below a threshold,
+ // given here as -rc->maximum_buffer_size, to allow the buffer to come back
+ // up sooner after a slide change with big overshoot.
+ if (is_screen) {
+ lp_rc->bits_off_target =
+ AOMMAX(lp_rc->bits_off_target, -lp_rc->maximum_buffer_size);
+ lp_rc->buffer_level = lp_rc->bits_off_target;
+ }
}
}
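
In leaky-bucket terms, a slide change can overshoot by several times the bucket capacity; without a floor, the layer would then need that many extra frames of undershoot before its buffer level recovers. A sketch with assumed numbers:

    #include <stdio.h>

    int main(void) {
      // Assumed values: a 2 Mbit bucket and a slide-change frame that leaves
      // the layer 9 Mbits in debt.
      long long bits_off_target = -9000000;
      const long long maximum_buffer_size = 2000000;
      if (bits_off_target < -maximum_buffer_size)
        bits_off_target = -maximum_buffer_size;  // screen-content floor
      printf("clamped bits_off_target = %lld\n", bits_off_target);  // -2000000
      return 0;
    }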
// Update the buffer level: leaky bucket model.
@@ -302,7 +308,8 @@
p_rc->buffer_level = p_rc->bits_off_target;
if (cpi->ppi->use_svc)
- update_layer_buffer_level(&cpi->svc, encoded_frame_size);
+ update_layer_buffer_level(&cpi->svc, encoded_frame_size,
+ cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN);
#if CONFIG_FPMT_TEST
/* The variable temp_buffer_level is introduced for quality
@@ -430,6 +437,7 @@
rc->resize_count = 0;
rc->rtc_external_ratectrl = 0;
rc->frame_level_fast_extra_bits = 0;
+ rc->use_external_qp_one_pass = 0;
}
int av1_rc_drop_frame(AV1_COMP *cpi) {
@@ -483,14 +491,38 @@
const RATE_CONTROL *const rc = &cpi->rc;
const PRIMARY_RATE_CONTROL *const p_rc = &cpi->ppi->p_rc;
const AV1_COMMON *const cm = &cpi->common;
+ const SVC *const svc = &cpi->svc;
const RefreshFrameInfo *const refresh_frame = &cpi->refresh_frame;
- const int max_delta_down = (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN)
- ? AOMMIN(8, AOMMAX(1, rc->q_1_frame / 16))
- : AOMMIN(16, AOMMAX(1, rc->q_1_frame / 8));
- const int max_delta_up = 20;
+ int max_delta_down;
+ int max_delta_up = 20;
const int change_avg_frame_bandwidth =
abs(rc->avg_frame_bandwidth - rc->prev_avg_frame_bandwidth) >
0.1 * (rc->avg_frame_bandwidth);
+
+ // Set the maximum adjustment down for Q for this frame.
+ if (cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ &&
+ cpi->cyclic_refresh->apply_cyclic_refresh) {
+ // For static screen type content limit the Q drop till the start of the
+ // next refresh cycle.
+ if (cpi->is_screen_content_type &&
+ (cpi->cyclic_refresh->sb_index > cpi->cyclic_refresh->last_sb_index)) {
+ max_delta_down = AOMMIN(8, AOMMAX(1, rc->q_1_frame / 32));
+ } else {
+ max_delta_down = AOMMIN(16, AOMMAX(1, rc->q_1_frame / 8));
+ }
+ if (!cpi->ppi->use_svc && cpi->is_screen_content_type) {
+ // Link max_delta_up to max_delta_down and buffer status.
+ if (p_rc->buffer_level > p_rc->optimal_buffer_level) {
+ max_delta_up = AOMMAX(4, max_delta_down);
+ } else {
+ max_delta_up = AOMMAX(8, max_delta_down);
+ }
+ }
+ } else {
+ max_delta_down = (cpi->is_screen_content_type)
+ ? AOMMIN(8, AOMMAX(1, rc->q_1_frame / 16))
+ : AOMMIN(16, AOMMAX(1, rc->q_1_frame / 8));
+ }
// If resolution changes or avg_frame_bandwidth significantly changed,
// then set this flag to indicate change in target bits per macroblock.
const int change_target_bits_mb =
@@ -498,13 +530,20 @@
(width != cm->prev_frame->width || height != cm->prev_frame->height ||
change_avg_frame_bandwidth);
// Apply some control/clamp to QP under certain conditions.
- if (cm->current_frame.frame_type != KEY_FRAME && !cpi->ppi->use_svc &&
- rc->frames_since_key > 1 && !change_target_bits_mb &&
+ // Delay the use of the clamping for svc until after num_temporal_layers,
+ // to make sure the Qs have been set for each temporal layer.
+ if (!frame_is_intra_only(cm) && rc->frames_since_key > 1 &&
+ (!cpi->ppi->use_svc ||
+ svc->current_superframe > (unsigned int)svc->number_temporal_layers) &&
+ !change_target_bits_mb && !cpi->rc.rtc_external_ratectrl &&
(!cpi->oxcf.rc_cfg.gf_cbr_boost_pct ||
!(refresh_frame->alt_ref_frame || refresh_frame->golden_frame))) {
- // Make sure q is between oscillating Qs to prevent resonance.
+ // If in the previous two frames we have seen both overshoot and undershoot
+ // clamp Q between the two. Check for rc->q_1/2_frame > 0 in case they have
+ // not been set due to dropped frames.
if (rc->rc_1_frame * rc->rc_2_frame == -1 &&
- rc->q_1_frame != rc->q_2_frame) {
+ rc->q_1_frame != rc->q_2_frame && rc->q_1_frame > 0 &&
+ rc->q_2_frame > 0) {
int qclamp = clamp(q, AOMMIN(rc->q_1_frame, rc->q_2_frame),
AOMMAX(rc->q_1_frame, rc->q_2_frame));
// If the previous frame had overshoot and the current q needs to
@@ -518,7 +557,7 @@
// Adjust Q base on source content change from scene detection.
if (cpi->sf.rt_sf.check_scene_detection && rc->prev_avg_source_sad > 0 &&
rc->frames_since_key > 10 && rc->frame_source_sad > 0 &&
- !cpi->ppi->use_svc) {
+ !cpi->rc.rtc_external_ratectrl) {
const int bit_depth = cm->seq_params->bit_depth;
double delta =
(double)rc->avg_source_sad / (double)rc->prev_avg_source_sad - 1.0;
@@ -542,15 +581,42 @@
// Limit the decrease in Q from previous frame.
if (rc->q_1_frame - q > max_delta_down) q = rc->q_1_frame - max_delta_down;
// Limit the increase in Q from previous frame.
- else if (q - rc->q_1_frame > max_delta_up &&
- cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN)
+ else if (q - rc->q_1_frame > max_delta_up)
q = rc->q_1_frame + max_delta_up;
}
- // For single spatial layer: if resolution has increased push q closer
+ // Adjustment for temporal layers.
+ if (svc->number_temporal_layers > 1 && svc->spatial_layer_id == 0 &&
+ !change_target_bits_mb && !cpi->rc.rtc_external_ratectrl &&
+ cpi->oxcf.resize_cfg.resize_mode != RESIZE_DYNAMIC) {
+ if (svc->temporal_layer_id > 0) {
+ // Constrain enhancement relative to the previous base TL0.
+ // Get base temporal layer TL0.
+ const int layer = LAYER_IDS_TO_IDX(0, 0, svc->number_temporal_layers);
+ LAYER_CONTEXT *lc = &svc->layer_context[layer];
+ // lc->rc.avg_frame_bandwidth and lc->p_rc.last_q correspond to the
+ // last TL0 frame.
+ if (rc->avg_frame_bandwidth < lc->rc.avg_frame_bandwidth &&
+ q < lc->p_rc.last_q[INTER_FRAME] - 4)
+ q = lc->p_rc.last_q[INTER_FRAME] - 4;
+ } else if (cpi->svc.temporal_layer_id == 0 &&
+ p_rc->buffer_level > (p_rc->optimal_buffer_level >> 2) &&
+ rc->frame_source_sad < 100000) {
+ // Push base TL0 Q down if buffer is stable and frame_source_sad
+ // is below threshold.
+ int delta = (svc->number_temporal_layers == 2) ? 4 : 10;
+ q = q - delta;
+ }
+ }
+ // For non-svc (single layer): if resolution has increased push q closer
// to the active_worst to avoid excess overshoot.
- if (cpi->svc.number_spatial_layers <= 1 && cm->prev_frame &&
+ if (!cpi->ppi->use_svc && cm->prev_frame &&
(width * height > 1.5 * cm->prev_frame->width * cm->prev_frame->height))
q = (q + active_worst_quality) >> 1;
+ // For single layer RPS: Bias Q based on distance of closest reference.
+ if (cpi->ppi->rtc_ref.bias_recovery_frame) {
+ const int min_dist = av1_svc_get_min_ref_dist(cpi);
+ q = q - AOMMIN(min_dist, 20);
+ }
return AOMMAX(AOMMIN(q, cpi->rc.worst_quality), cpi->rc.best_quality);
}
@@ -709,7 +775,7 @@
// recorded as INTRA only key frames.
if ((cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ) &&
(cpi->cyclic_refresh->counter_encode_maxq_scene_change == 0) &&
- (cm->current_frame.frame_type != KEY_FRAME) && (!cpi->ppi->use_svc)) {
+ !frame_is_intra_only(cm) && !cpi->ppi->use_svc) {
cpi->rc.q_2_frame = cm->quant_params.base_qindex;
cpi->rc.q_1_frame = cm->quant_params.base_qindex;
cpi->rc.rc_2_frame = 0;
@@ -762,8 +828,7 @@
// Adjustment to delta Q and number of blocks updated in cyclic refresh
// based on over or under shoot of target in current frame.
- if (cyclic_refresh_active && (cpi->rc.this_frame_target > 0) &&
- !cpi->ppi->use_svc) {
+ if (cyclic_refresh_active && cpi->rc.this_frame_target > 0) {
CYCLIC_REFRESH *const cr = cpi->cyclic_refresh;
if (correction_factor > 1.25) {
cr->percent_refresh_adjustment =
@@ -1012,19 +1077,27 @@
int layer = LAYER_IDS_TO_IDX(0, 0, svc->number_temporal_layers);
const LAYER_CONTEXT *lc = &svc->layer_context[layer];
const PRIMARY_RATE_CONTROL *const lp_rc = &lc->p_rc;
- avg_qindex_key = lp_rc->avg_frame_qindex[KEY_FRAME];
- if (svc->temporal_layer_id == 0)
- avg_qindex_key =
- AOMMIN(lp_rc->avg_frame_qindex[KEY_FRAME], lp_rc->last_q[KEY_FRAME]);
+ avg_qindex_key =
+ AOMMIN(lp_rc->avg_frame_qindex[KEY_FRAME], lp_rc->last_q[KEY_FRAME]);
}
ambient_qp = (cm->current_frame.frame_number < num_frames_weight_key)
? AOMMIN(p_rc->avg_frame_qindex[INTER_FRAME], avg_qindex_key)
: p_rc->avg_frame_qindex[INTER_FRAME];
- active_worst_quality = AOMMIN(rc->worst_quality, ambient_qp * 5 / 4);
+ ambient_qp = AOMMIN(rc->worst_quality, ambient_qp);
+
if (p_rc->buffer_level > p_rc->optimal_buffer_level) {
// Adjust down.
- // Maximum limit for down adjustment, ~30%.
- int max_adjustment_down = active_worst_quality / 3;
+ int max_adjustment_down; // Maximum adjustment down for Q
+
+ if (cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ && !cpi->ppi->use_svc &&
+ (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN)) {
+ active_worst_quality = AOMMIN(rc->worst_quality, ambient_qp);
+ max_adjustment_down = AOMMIN(4, active_worst_quality / 16);
+ } else {
+ active_worst_quality = AOMMIN(rc->worst_quality, ambient_qp * 5 / 4);
+ max_adjustment_down = active_worst_quality / 3;
+ }
+
if (max_adjustment_down) {
buff_lvl_step =
((p_rc->maximum_buffer_size - p_rc->optimal_buffer_level) /
@@ -1036,6 +1109,7 @@
}
} else if (p_rc->buffer_level > critical_level) {
// Adjust up from ambient Q.
+ active_worst_quality = AOMMIN(rc->worst_quality, ambient_qp);
if (critical_level) {
buff_lvl_step = (p_rc->optimal_buffer_level - critical_level);
if (buff_lvl_step) {
@@ -1043,7 +1117,7 @@
(p_rc->optimal_buffer_level - p_rc->buffer_level) /
buff_lvl_step);
}
- active_worst_quality = ambient_qp + adjustment;
+ active_worst_quality += adjustment;
}
} else {
// Set to worst_quality if buffer is below critical level.
@@ -1204,15 +1278,6 @@
q = *top_index;
}
- // Special case: we force the first few frames to use low q such that
- // these frames are encoded at a high quality, which provides good
- // references for following frames.
- if (current_frame->frame_type != KEY_FRAME && !cpi->ppi->use_svc &&
- current_frame->frame_number >= 10 && current_frame->frame_number <= 15) {
- q = AOMMIN(p_rc->last_kf_qindex + 108, AOMMAX(5, q - 9));
- q = AOMMAX(q, rc->best_quality);
- }
-
assert(*top_index <= rc->worst_quality && *top_index >= rc->best_quality);
assert(*bottom_index <= rc->worst_quality &&
*bottom_index >= rc->best_quality);
@@ -1607,9 +1672,6 @@
const int simulate_parallel_frame =
cpi->ppi->gf_group.frame_parallel_level[cpi->gf_frame_index] > 0 &&
cpi->ppi->fpmt_unit_test_cfg == PARALLEL_SIMULATION_ENCODE;
- int extend_minq_fast = simulate_parallel_frame
- ? p_rc->temp_extend_minq_fast
- : cpi->ppi->twopass.extend_minq_fast;
int extend_minq = simulate_parallel_frame ? p_rc->temp_extend_minq
: cpi->ppi->twopass.extend_minq;
int extend_maxq = simulate_parallel_frame ? p_rc->temp_extend_maxq
@@ -1623,21 +1685,18 @@
(refresh_frame->golden_frame || is_intrl_arf_boost ||
refresh_frame->alt_ref_frame))) {
#if CONFIG_FPMT_TEST
- active_best_quality -= (extend_minq + extend_minq_fast);
+ active_best_quality -= extend_minq;
active_worst_quality += (extend_maxq / 2);
#else
- active_best_quality -=
- (cpi->ppi->twopass.extend_minq + cpi->ppi->twopass.extend_minq_fast);
+ active_best_quality -= cpi->ppi->twopass.extend_minq / 4;
active_worst_quality += (cpi->ppi->twopass.extend_maxq / 2);
#endif
} else {
#if CONFIG_FPMT_TEST
- active_best_quality -= (extend_minq + extend_minq_fast) / 2;
+ active_best_quality -= extend_minq / 2;
active_worst_quality += extend_maxq;
#else
- active_best_quality -=
- (cpi->ppi->twopass.extend_minq + cpi->ppi->twopass.extend_minq_fast) /
- 2;
+ active_best_quality -= cpi->ppi->twopass.extend_minq / 4;
active_worst_quality += cpi->ppi->twopass.extend_maxq;
#endif
}
@@ -1860,6 +1919,8 @@
get_active_best_quality(cpi, active_worst_quality, cq_level, gf_index);
}
+ if (cq_level > 0) active_best_quality = AOMMAX(1, active_best_quality);
+
*top_index = active_worst_quality;
*bottom_index = active_best_quality;
@@ -2048,7 +2109,8 @@
pre_y += (pre_ystride << 6) - (sb_cols << 6);
}
assert(num_samples > 0);
- if (num_samples > 0) cpi->rec_sse = fsse;
+ // Ensure rec_sse > 0
+ if (num_samples > 0) cpi->rec_sse = fsse > 0 ? fsse : 1;
}
int av1_rc_pick_q_and_bounds(AV1_COMP *cpi, int width, int height, int gf_index,
@@ -2168,6 +2230,19 @@
// Post encode loop adjustment of Q prediction.
av1_rc_update_rate_correction_factors(cpi, 0, cm->width, cm->height);
+ // Update bit estimation ratio.
+ if (cm->current_frame.frame_type != KEY_FRAME &&
+ cpi->sf.hl_sf.accurate_bit_estimate) {
+ const double q = av1_convert_qindex_to_q(cm->quant_params.base_qindex,
+ cm->seq_params->bit_depth);
+ const int this_bit_est_ratio =
+ (int)(rc->projected_frame_size * q / sqrt((double)cpi->rec_sse));
+ cpi->rc.bit_est_ratio =
+ cpi->rc.bit_est_ratio == 0
+ ? this_bit_est_ratio
+ : (7 * cpi->rc.bit_est_ratio + this_bit_est_ratio) / 8;
+ }
+
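bit_est_ratio is tracked with a first-order IIR (alpha = 1/8): each frame moves the running value one eighth of the way toward the new observation, damping per-frame noise while still tracking content changes. A tiny sketch with assumed values:

    #include <stdio.h>

    int main(void) {
      int ratio = 160;             // running bit_est_ratio (assumed)
      const int observation = 96;  // assumed projected_size * q / sqrt(rec_sse)
      ratio = (7 * ratio + observation) / 8;
      printf("updated ratio = %d\n", ratio);  // 152: 1/8 of the gap closed
      return 0;
    }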
// Keep a record of last Q and ambient average Q.
if (current_frame->frame_type == KEY_FRAME) {
p_rc->last_q[KEY_FRAME] = qindex;
@@ -2266,6 +2341,8 @@
rc->frame_num_last_gf_refresh = current_frame->frame_number;
rc->prev_coded_width = cm->width;
rc->prev_coded_height = cm->height;
+ rc->frame_number_encoded++;
+ rc->prev_frame_is_dropped = 0;
// if (current_frame->frame_number == 1 && cm->show_frame)
/*
rc->this_frame_target =
@@ -2286,6 +2363,11 @@
cpi->rc.prev_avg_frame_bandwidth = cpi->rc.avg_frame_bandwidth;
cpi->rc.prev_coded_width = cpi->common.width;
cpi->rc.prev_coded_height = cpi->common.height;
+ cpi->rc.prev_frame_is_dropped = 1;
+ // On a scene/slide change for dropped frame: reset the avg_source_sad to 0,
+ // otherwise the avg_source_sad can get too large and subsequent frames
+ // may miss the scene/slide detection.
+ if (cpi->rc.high_source_sad) cpi->rc.avg_source_sad = 0;
}
int av1_find_qindex(double desired_q, aom_bit_depth_t bit_depth,
@@ -2754,6 +2836,9 @@
ExtRefreshFrameFlagsInfo *const ext_refresh_frame_flags =
&ext_flags->refresh_frame;
RTC_REF *const rtc_ref = &cpi->ppi->rtc_ref;
+ unsigned int frame_number = (cpi->oxcf.rc_cfg.drop_frames_water_mark)
+ ? rc->frame_number_encoded
+ : cm->current_frame.frame_number;
unsigned int lag_alt = 4;
int last_idx = 0;
int last_idx_refresh = 0;
@@ -2799,19 +2884,16 @@
ext_flags->ref_frame_flags ^= AOM_LAST2_FLAG;
const int sh = 6;
// Moving index slot for last: 0 - (sh - 1).
- if (cm->current_frame.frame_number > 1)
- last_idx = ((cm->current_frame.frame_number - 1) % sh);
+ if (frame_number > 1) last_idx = ((frame_number - 1) % sh);
// Moving index for refresh of last: one ahead for next frame.
- last_idx_refresh = (cm->current_frame.frame_number % sh);
+ last_idx_refresh = (frame_number % sh);
gld_idx = 6;
// Moving index for alt_ref, lag behind LAST by lag_alt frames.
- if (cm->current_frame.frame_number > lag_alt)
- alt_ref_idx = ((cm->current_frame.frame_number - lag_alt) % sh);
+ if (frame_number > lag_alt) alt_ref_idx = ((frame_number - lag_alt) % sh);
if (cpi->sf.rt_sf.ref_frame_comp_nonrd[1]) {
// Moving index for LAST2, lag behind LAST by 2 frames.
- if (cm->current_frame.frame_number > 2)
- last2_idx = ((cm->current_frame.frame_number - 2) % sh);
+ if (frame_number > 2) last2_idx = ((frame_number - 2) % sh);
}
rtc_ref->ref_idx[0] = last_idx; // LAST
rtc_ref->ref_idx[1] = last_idx_refresh; // LAST2 (for refresh of last).
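As a worked example of the moving index scheme, take sh = 6, lag_alt = 4, and
frame_number = 10: last_idx = (10 - 1) % 6 = 3, last_idx_refresh = 10 % 6 = 4,
and alt_ref_idx = (10 - 4) % 6 = 0, so LAST is read from slot 3, the next
frame's LAST is refreshed into slot 4, and ALTREF lags LAST by 4 frames.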
@@ -2926,6 +3008,13 @@
int light_change = 0;
// Flag to check light change or not.
const int check_light_change = 0;
+  // TODO(marpan): There seems to be some difference along the bottom border
+  // when using source_last_tl0 for last_source (used for temporal layers or
+  // when the previous frame is dropped).
+  // Remove this border parameter when the issue is resolved: the difference
+  // is that non-zero SAD exists along the bottom border even though the
+  // source is static.
+ const int border =
+ rc->prev_frame_is_dropped || cpi->svc.number_temporal_layers > 1;
// Store blkwise SAD for later use
if (width == cm->render_width && height == cm->render_height) {
if (cpi->src_sad_blk_64x64 == NULL) {
@@ -2934,7 +3023,8 @@
sizeof(*cpi->src_sad_blk_64x64)));
}
}
- for (int sbi_row = 0; sbi_row < sb_rows; ++sbi_row) {
+ // Avoid bottom and right border.
+ for (int sbi_row = 0; sbi_row < sb_rows - border; ++sbi_row) {
for (int sbi_col = 0; sbi_col < sb_cols; ++sbi_col) {
tmp_sad = cpi->ppi->fn_ptr[bsize].sdf(src_y, src_ystride, last_src_y,
last_src_ystride);
@@ -3068,19 +3158,21 @@
if (qindex <= 120 * p_rc->last_q[INTER_FRAME] / 100)
p_rc->rate_correction_factors[INTER_NORMAL] *= 1.5;
}
- // Apply the same rate control reset to all temporal layers.
- for (int tl = 0; tl < svc->number_temporal_layers; tl++) {
- LAYER_CONTEXT *lc = NULL;
- lc = &svc->layer_context[svc->spatial_layer_id *
- svc->number_temporal_layers +
- tl];
- lc->rc.resize_state = rc->resize_state;
- lc->p_rc.buffer_level = lc->p_rc.optimal_buffer_level;
- lc->p_rc.bits_off_target = lc->p_rc.optimal_buffer_level;
- lc->p_rc.rate_correction_factors[INTER_NORMAL] =
- p_rc->rate_correction_factors[INTER_NORMAL];
- lc->p_rc.avg_frame_qindex[INTER_FRAME] =
- p_rc->avg_frame_qindex[INTER_FRAME];
+ if (svc->number_temporal_layers > 1) {
+ // Apply the same rate control reset to all temporal layers.
+ for (int tl = 0; tl < svc->number_temporal_layers; tl++) {
+ LAYER_CONTEXT *lc = NULL;
+ lc = &svc->layer_context[svc->spatial_layer_id *
+ svc->number_temporal_layers +
+ tl];
+ lc->rc.resize_state = rc->resize_state;
+ lc->p_rc.buffer_level = lc->p_rc.optimal_buffer_level;
+ lc->p_rc.bits_off_target = lc->p_rc.optimal_buffer_level;
+ lc->p_rc.rate_correction_factors[INTER_NORMAL] =
+ p_rc->rate_correction_factors[INTER_NORMAL];
+ lc->p_rc.avg_frame_qindex[INTER_FRAME] =
+ p_rc->avg_frame_qindex[INTER_FRAME];
+ }
}
}
@@ -3205,6 +3297,25 @@
return 0;
}
+// Returns true if this frame is a recovery frame, for 1 layer RPS,
+// in which case some boost may be applied (QP, adjusted speed features, etc.).
+// A recovery frame here means a frame whose closest reference suddenly
+// switched from the previous frame to one much further away.
+// TODO(marpan): Consider adding on/off flag to SVC_REF_FRAME_CONFIG to
+// allow more control for applications.
+static bool set_flag_rps_bias_recovery_frame(const AV1_COMP *const cpi) {
+ if (cpi->ppi->rtc_ref.set_ref_frame_config &&
+ cpi->svc.number_temporal_layers == 1 &&
+ cpi->svc.number_spatial_layers == 1 &&
+ cpi->ppi->rtc_ref.reference_was_previous_frame) {
+ int min_dist = av1_svc_get_min_ref_dist(cpi);
+ // Only consider boost for this frame if its closest reference is further
+ // than x frames away, using x = 4 for now.
+ if (min_dist != INT_MAX && min_dist > 4) return true;
+ }
+ return false;
+}
+
void av1_get_one_pass_rt_params(AV1_COMP *cpi, FRAME_TYPE *const frame_type,
const EncodeFrameInput *frame_input,
unsigned int frame_flags) {
@@ -3219,12 +3330,11 @@
const int layer =
LAYER_IDS_TO_IDX(svc->spatial_layer_id, svc->temporal_layer_id,
svc->number_temporal_layers);
- // Turn this on to explicitly set the reference structure rather than
- // relying on internal/default structure.
if (cpi->ppi->use_svc) {
av1_update_temporal_layer_framerate(cpi);
av1_restore_layer_context(cpi);
}
+ cpi->ppi->rtc_ref.bias_recovery_frame = set_flag_rps_bias_recovery_frame(cpi);
// Set frame type.
if (set_key_frame(cpi, frame_flags)) {
*frame_type = KEY_FRAME;
@@ -3240,6 +3350,7 @@
av1_svc_reset_temporal_layers(cpi, 1);
svc->layer_context[layer].is_key_frame = 1;
}
+ rc->frame_number_encoded = 0;
} else {
*frame_type = INTER_FRAME;
gf_group->update_type[cpi->gf_frame_index] = LF_UPDATE;
diff --git a/av1/encoder/ratectrl.h b/av1/encoder/ratectrl.h
index 114778d..4fb1179 100644
--- a/av1/encoder/ratectrl.h
+++ b/av1/encoder/ratectrl.h
@@ -204,6 +204,13 @@
int decimation_factor;
int decimation_count;
+ int prev_frame_is_dropped;
+
+ /*!
+ * Frame number for encoded frames (non-dropped).
+   * Used for setting the rtc reference structure.
+ */
+ unsigned int frame_number_encoded;
/*!\endcond */
/*!
@@ -261,6 +268,15 @@
int prev_coded_width;
int prev_coded_height;
+
+ // The ratio used for inter frames in bit estimation.
+ // TODO(yunqing): if golden frame is treated differently (e.g. gf_cbr_boost_
+  // pct > THR), consider adding bit_est_ratio_g for golden frames.
+ int bit_est_ratio;
+
+ // Whether to use a fixed qp for the frame, bypassing internal rate control.
+  // This flag is reset to 0 after every frame.
+ int use_external_qp_one_pass;
/*!\endcond */
} RATE_CONTROL;
@@ -461,11 +477,6 @@
*/
int temp_extend_maxq;
- /*!
- * Temporary variable used in simulating the delayed update of
- * extend_minq_fast.
- */
- int temp_extend_minq_fast;
#endif
/*!
* Proposed minimum allowed Q different layers in a coding pyramid
diff --git a/av1/encoder/rd.h b/av1/encoder/rd.h
index b1eb154..b38d9ca 100644
--- a/av1/encoder/rd.h
+++ b/av1/encoder/rd.h
@@ -56,6 +56,16 @@
// Factor to weigh the rate for switchable interp filters.
#define SWITCHABLE_INTERP_RATE_FACTOR 1
+// Macros for common video resolutions: width x height.
+// For example, 720p represents a video resolution of 1280x720 pixels.
+// The products are parenthesized so the macros expand safely in expressions.
+#define RESOLUTION_288P (352 * 288)
+#define RESOLUTION_360P (640 * 360)
+#define RESOLUTION_480P (640 * 480)
+#define RESOLUTION_720P (1280 * 720)
+#define RESOLUTION_1080P (1920 * 1080)
+#define RESOLUTION_1440P (2560 * 1440)
+#define RESOLUTION_4K (3840 * 2160)
+
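Because these macros expand to pixel counts (width times height), the typical
use is a comparison against a frame's width * height product. A minimal usage
sketch (is_at_most_720p is a hypothetical helper, not part of the patch):

  static int is_at_most_720p(int width, int height) {
    // RESOLUTION_720P expands to (1280 * 720) pixels.
    return width * height <= RESOLUTION_720P;
  }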
#define RTC_REFS 4
static const MV_REFERENCE_FRAME real_time_ref_combos[RTC_REFS][2] = {
{ LAST_FRAME, NONE_FRAME },
diff --git a/av1/encoder/rdopt.c b/av1/encoder/rdopt.c
index c25db61..8620087 100644
--- a/av1/encoder/rdopt.c
+++ b/av1/encoder/rdopt.c
@@ -1321,6 +1321,10 @@
const int mi_row = xd->mi_row;
const int mi_col = xd->mi_col;
int mode_index_start, mode_index_end;
+ const int txfm_rd_gate_level =
+ get_txfm_rd_gate_level(cpi->sf.inter_sf.txfm_rd_gate_level, bsize,
+ TX_SEARCH_MOTION_MODE, eval_motion_mode);
+
// Modify the start and end index according to speed features. For example,
// if SIMPLE_TRANSLATION has already been searched according to
// the motion_mode_for_winner_cand speed feature, update the mode_index_start
@@ -1429,7 +1433,8 @@
// Refine MV in a small range.
av1_refine_warped_mv(xd, cm, &ms_params, bsize, pts0, pts_inref0,
- total_samples);
+ total_samples, cpi->sf.mv_sf.warp_search_method,
+ cpi->sf.mv_sf.warp_search_iters);
if (mv0.as_int != mbmi->mv[0].as_int) {
// Keep the refined MV and WM parameters.
@@ -1523,7 +1528,7 @@
if (rd_stats->rdcost < *best_est_rd) {
*best_est_rd = rd_stats->rdcost;
assert(sse_y >= 0);
- ref_skip_rd[1] = cpi->sf.inter_sf.txfm_rd_gate_level
+ ref_skip_rd[1] = txfm_rd_gate_level
? RDCOST(x->rdmult, mode_rate, (sse_y << 4))
: INT64_MAX;
}
@@ -1545,14 +1550,14 @@
// Perform full transform search
int64_t skip_rd = INT64_MAX;
int64_t skip_rdy = INT64_MAX;
- if (cpi->sf.inter_sf.txfm_rd_gate_level) {
+ if (txfm_rd_gate_level) {
// Check if the mode is good enough based on skip RD
int64_t sse_y = INT64_MAX;
int64_t curr_sse = get_sse(cpi, x, &sse_y);
skip_rd = RDCOST(x->rdmult, rd_stats->rate, curr_sse);
skip_rdy = RDCOST(x->rdmult, rd_stats->rate, (sse_y << 4));
int eval_txfm = check_txfm_eval(x, bsize, ref_skip_rd[0], skip_rd,
- cpi->sf.inter_sf.txfm_rd_gate_level, 0);
+ txfm_rd_gate_level, 0);
if (!eval_txfm) continue;
}
@@ -1635,18 +1640,22 @@
static int64_t skip_mode_rd(RD_STATS *rd_stats, const AV1_COMP *const cpi,
MACROBLOCK *const x, BLOCK_SIZE bsize,
- const BUFFER_SET *const orig_dst) {
+ const BUFFER_SET *const orig_dst, int64_t best_rd) {
assert(bsize < BLOCK_SIZES_ALL);
const AV1_COMMON *cm = &cpi->common;
const int num_planes = av1_num_planes(cm);
MACROBLOCKD *const xd = &x->e_mbd;
const int mi_row = xd->mi_row;
const int mi_col = xd->mi_col;
- av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, orig_dst, bsize, 0,
- av1_num_planes(cm) - 1);
-
int64_t total_sse = 0;
+ int64_t this_rd = INT64_MAX;
+ const int skip_mode_ctx = av1_get_skip_mode_context(xd);
+ rd_stats->rate = x->mode_costs.skip_mode_cost[skip_mode_ctx][1];
+
for (int plane = 0; plane < num_planes; ++plane) {
+ // Call av1_enc_build_inter_predictor() for one plane at a time.
+ av1_enc_build_inter_predictor(cm, xd, mi_row, mi_col, orig_dst, bsize,
+ plane, plane);
const struct macroblock_plane *const p = &x->plane[plane];
const struct macroblockd_plane *const pd = &xd->plane[plane];
const BLOCK_SIZE plane_bsize =
@@ -1658,11 +1667,14 @@
int64_t sse = aom_sum_squares_2d_i16(p->src_diff, bw, bw, bh) << 4;
sse >>= ((cpi->frame_info.bit_depth - 8) * 2);
total_sse += sse;
+      // When the current rd cost exceeds the best rd, skip evaluation of the
+      // remaining planes.
+ this_rd = RDCOST(x->rdmult, rd_stats->rate, total_sse);
+ if (this_rd > best_rd) break;
}
- const int skip_mode_ctx = av1_get_skip_mode_context(xd);
+
rd_stats->dist = rd_stats->sse = total_sse;
- rd_stats->rate = x->mode_costs.skip_mode_cost[skip_mode_ctx][1];
- rd_stats->rdcost = RDCOST(x->rdmult, rd_stats->rate, rd_stats->dist);
+ rd_stats->rdcost = this_rd;
restore_dst_buf(xd, *orig_dst, num_planes);
return 0;
@@ -1670,6 +1682,10 @@
// Check NEARESTMV, NEARMV, GLOBALMV ref mvs for duplicate and skip the relevant
// mode
+// Note(rachelbarker): This speed feature currently does not interact correctly
+// with global motion. The issue is that, when global motion is used, GLOBALMV
+// produces a different prediction to NEARESTMV/NEARMV even if the motion
+// vectors are the same. Thus GLOBALMV should not be pruned in this case.
static INLINE int check_repeat_ref_mv(const MB_MODE_INFO_EXT *mbmi_ext,
int ref_idx,
const MV_REFERENCE_FRAME *ref_frame,
@@ -1748,9 +1764,16 @@
// population
static INLINE int skip_nearest_near_mv_using_refmv_weight(
const MACROBLOCK *const x, const PREDICTION_MODE this_mode,
- const int8_t ref_frame_type) {
+ const int8_t ref_frame_type, PREDICTION_MODE best_mode) {
if (this_mode != NEARESTMV && this_mode != NEARMV) return 0;
+ // Do not skip the mode if the current block has not yet obtained a valid
+ // inter mode.
+ if (!is_inter_mode(best_mode)) return 0;
+ const MACROBLOCKD *xd = &x->e_mbd;
+  // Do not skip the mode if either the top or the left neighboring block is
+  // not available.
+ if (!xd->left_available || !xd->up_available) return 0;
const MB_MODE_INFO_EXT *const mbmi_ext = &x->mbmi_ext;
const uint16_t *const ref_mv_weight = mbmi_ext->weight[ref_frame_type];
const int ref_mv_count =
@@ -2482,15 +2505,18 @@
const int is_comp_pred = has_second_ref(mbmi);
const MV_REFERENCE_FRAME *refs = mbmi->ref_frame;
- // Check that the global mv is the same as ZEROMV
- assert(mbmi->mv[0].as_int == 0);
- assert(IMPLIES(is_comp_pred, mbmi->mv[0].as_int == 0));
- assert(xd->global_motion[refs[0]].wmtype == TRANSLATION ||
- xd->global_motion[refs[0]].wmtype == IDENTITY);
-
- // Don't prune if we have invalid data
for (int idx = 0; idx < 1 + is_comp_pred; idx++) {
- assert(mbmi->mv[0].as_int == 0);
+ if (xd->global_motion[refs[idx]].wmtype != IDENTITY) {
+ // Pruning logic only works for IDENTITY type models
+ // Note: In theory we could apply similar logic for TRANSLATION
+ // type models, but we do not code these due to a spec bug
+ // (see comments in gm_get_motion_vector() in av1/common/mv.h)
+ assert(xd->global_motion[refs[idx]].wmtype != TRANSLATION);
+ return 0;
+ }
+
+ // Don't prune if we have invalid data
+ assert(mbmi->mv[idx].as_int == 0);
if (args->best_single_sse_in_refs[refs[idx]] == INT32_MAX) {
return 0;
}
@@ -2940,7 +2966,6 @@
continue;
if (cpi->sf.gm_sf.prune_zero_mv_with_sse &&
- cpi->sf.gm_sf.gm_search_type == GM_DISABLE_SEARCH &&
(this_mode == GLOBALMV || this_mode == GLOBAL_GLOBALMV)) {
if (prune_zero_mv_with_sse(cpi->ppi->fn_ptr, x, bsize, args,
cpi->sf.gm_sf.prune_zero_mv_with_sse)) {
@@ -3165,8 +3190,10 @@
FULLPEL_MOTION_SEARCH_PARAMS fullms_params;
const search_site_config *lookahead_search_sites =
cpi->mv_search_params.search_site_cfg[SS_CFG_LOOKAHEAD];
+ const FULLPEL_MV start_mv = get_fullmv_from_mv(&dv_ref.as_mv);
av1_make_default_fullpel_ms_params(&fullms_params, cpi, x, bsize,
- &dv_ref.as_mv, lookahead_search_sites,
+ &dv_ref.as_mv, start_mv,
+ lookahead_search_sites,
/*fine_search_interval=*/0);
const IntraBCMVCosts *const dv_costs = x->dv_costs;
av1_set_ms_to_intra_mode(&fullms_params, dv_costs);
@@ -3213,7 +3240,6 @@
}
const int step_param = cpi->mv_search_params.mv_step_param;
- const FULLPEL_MV start_mv = get_fullmv_from_mv(&dv_ref.as_mv);
IntraBCHashInfo *intrabc_hash_info = &x->intrabc_hash_info;
int_mv best_mv, best_hash_mv;
@@ -3446,9 +3472,6 @@
orig_dst.stride[i] = xd->plane[i].dst.stride;
}
- // Obtain the rdcost for skip_mode.
- skip_mode_rd(&skip_mode_rd_stats, cpi, x, bsize, &orig_dst);
-
// Compare the use of skip_mode with the best intra/inter mode obtained.
const int skip_mode_ctx = av1_get_skip_mode_context(xd);
int64_t best_intra_inter_mode_cost = INT64_MAX;
@@ -3462,6 +3485,10 @@
av1_rd_cost_update(x->rdmult, rd_cost);
}
+ // Obtain the rdcost for skip_mode.
+ skip_mode_rd(&skip_mode_rd_stats, cpi, x, bsize, &orig_dst,
+ best_intra_inter_mode_cost);
+
if (skip_mode_rd_stats.rdcost <= best_intra_inter_mode_cost &&
(!xd->lossless[mbmi->segment_id] || skip_mode_rd_stats.dist == 0)) {
assert(mode_index != THR_INVALID);
@@ -5029,8 +5056,14 @@
if (sf->inter_sf.prune_nearest_near_mv_using_refmv_weight && !comp_pred) {
const int8_t ref_frame_type = av1_ref_frame_type(ref_frames);
- if (skip_nearest_near_mv_using_refmv_weight(x, this_mode, ref_frame_type))
+ if (skip_nearest_near_mv_using_refmv_weight(
+ x, this_mode, ref_frame_type,
+ args->search_state->best_mbmode.mode)) {
+ // Ensure the mode is pruned only when the current block has obtained a
+ // valid inter mode.
+ assert(is_inter_mode(args->search_state->best_mbmode.mode));
return 1;
+ }
}
if (sf->rt_sf.prune_inter_modes_with_golden_ref &&
@@ -5169,13 +5202,15 @@
RD_STATS rd_stats_uv;
const int mode_rate = inter_modes_info->mode_rate_arr[data_idx];
int64_t skip_rd = INT64_MAX;
- if (cpi->sf.inter_sf.txfm_rd_gate_level) {
+ const int txfm_rd_gate_level = get_txfm_rd_gate_level(
+ cpi->sf.inter_sf.txfm_rd_gate_level, bsize, TX_SEARCH_DEFAULT,
+ /*eval_motion_mode=*/0);
+ if (txfm_rd_gate_level) {
// Check if the mode is good enough based on skip RD
int64_t curr_sse = inter_modes_info->sse_arr[data_idx];
skip_rd = RDCOST(x->rdmult, mode_rate, curr_sse);
- int eval_txfm =
- check_txfm_eval(x, bsize, search_state->best_skip_rd[0], skip_rd,
- cpi->sf.inter_sf.txfm_rd_gate_level, 0);
+ int eval_txfm = check_txfm_eval(x, bsize, search_state->best_skip_rd[0],
+ skip_rd, txfm_rd_gate_level, 0);
if (!eval_txfm) continue;
}
@@ -5695,6 +5730,7 @@
interintra_modes,
{ { { 0 }, { { 0 } }, { 0 }, 0, 0, 0, 0 } },
{ { 0, 0 } },
+ { 0 },
0,
0,
-1,
diff --git a/av1/encoder/rdopt.h b/av1/encoder/rdopt.h
index 78a23d6..efb797e 100644
--- a/av1/encoder/rdopt.h
+++ b/av1/encoder/rdopt.h
@@ -105,7 +105,7 @@
* based on calculated modelled RD cost. Only 4 intra modes are checked as
* specified in \c intra_mode_list. When calculating RD cost Hadamard transform
 * of residual is used to calculate rate. Estimation of RD cost is performed
- * in \c estimate_block_intra which is called from this function
+ * in \c av1_estimate_block_intra which is called from this function
*
* \param[in] cpi Top-level encoder structure
* \param[in] x Pointer to structure holding all the data for
diff --git a/av1/encoder/rdopt_utils.h b/av1/encoder/rdopt_utils.h
index 91823d8..1c5b3db 100644
--- a/av1/encoder/rdopt_utils.h
+++ b/av1/encoder/rdopt_utils.h
@@ -23,6 +23,7 @@
#endif
#define MAX_REF_MV_SEARCH 3
+#define MAX_TX_RD_GATE_LEVEL 5
#define INTER_INTRA_RD_THRESH_SCALE 9
#define INTER_INTRA_RD_THRESH_SHIFT 4
@@ -352,10 +353,12 @@
// Derive aggressiveness factor for gating the transform search
// Lower value indicates more aggressiveness. Be more conservative (high
// value) for (i) low quantizers (ii) regions where prediction is poor
- const int scale[5] = { INT_MAX, 4, 3, 2, 2 };
+ const int scale[MAX_TX_RD_GATE_LEVEL + 1] = { INT_MAX, 4, 3, 2, 2, 1 };
const int qslope = 2 * (!is_luma_only);
- const int level_to_qindex_map[5] = { 0, 0, 0, 80, 100 };
+ const int level_to_qindex_map[MAX_TX_RD_GATE_LEVEL + 1] = { 0, 0, 0,
+ 80, 100, 140 };
int aggr_factor = 4;
+ assert(level <= MAX_TX_RD_GATE_LEVEL);
const int pred_qindex_thresh = level_to_qindex_map[level];
if (!is_luma_only && level <= 2) {
aggr_factor = 4 * AOMMAX(1, ROUND_POWER_OF_TWO((MAXQ - x->qindex) * qslope,
@@ -374,7 +377,9 @@
// since best_skip_rd is computed after and skip_rd is computed (with 8-bit
// prediction signals blended for WEDGE/DIFFWTD rather than 16-bit) before
// interpolation filter search
- const int luma_mul[5] = { INT_MAX, 32, 29, 17, 17 };
+ const int luma_mul[MAX_TX_RD_GATE_LEVEL + 1] = {
+ INT_MAX, 32, 29, 17, 17, 17
+ };
int mul_factor = is_luma_only ? luma_mul[level] : 16;
int64_t rd_thresh =
(best_skip_rd == INT64_MAX)
@@ -767,6 +772,18 @@
USABLE_REF_MV_STACK_SIZE * sizeof(xd->ref_mv_stack[0][0]));
}
+// Get transform rd gate level for the given transform search case.
+static INLINE int get_txfm_rd_gate_level(
+ const int txfm_rd_gate_level[TX_SEARCH_CASES], BLOCK_SIZE bsize,
+ TX_SEARCH_CASE tx_search_case, int eval_motion_mode) {
+ assert(tx_search_case < TX_SEARCH_CASES);
+ if (tx_search_case == TX_SEARCH_MOTION_MODE && !eval_motion_mode &&
+ num_pels_log2_lookup[bsize] > 8)
+ return txfm_rd_gate_level[TX_SEARCH_MOTION_MODE];
+
+ return txfm_rd_gate_level[TX_SEARCH_DEFAULT];
+}
+
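In other words, the dedicated TX_SEARCH_MOTION_MODE gate level applies only
when eval_motion_mode is false and the block has more than 256 pixels
(num_pels_log2_lookup[bsize] > 8, i.e. larger than 16x16); every other case
falls back to the TX_SEARCH_DEFAULT level.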
#ifdef __cplusplus
} // extern "C"
#endif
diff --git a/av1/encoder/reconinter_enc.c b/av1/encoder/reconinter_enc.c
index ac7dc16..83e5d4f 100644
--- a/av1/encoder/reconinter_enc.c
+++ b/av1/encoder/reconinter_enc.c
@@ -515,23 +515,23 @@
}
}
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const AV1_COMMON *const cm,
- int mi_row, int mi_col, const MV *const mv,
- uint8_t *comp_pred, const uint8_t *pred,
- int width, int height, int subpel_x_q3,
- int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask,
- int mask_stride, int invert_mask,
- int subpel_search) {
+void aom_comp_mask_upsampled_pred(MACROBLOCKD *xd, const AV1_COMMON *const cm,
+ int mi_row, int mi_col, const MV *const mv,
+ uint8_t *comp_pred, const uint8_t *pred,
+ int width, int height, int subpel_x_q3,
+ int subpel_y_q3, const uint8_t *ref,
+ int ref_stride, const uint8_t *mask,
+ int mask_stride, int invert_mask,
+ int subpel_search) {
if (subpel_x_q3 | subpel_y_q3) {
- aom_upsampled_pred_c(xd, cm, mi_row, mi_col, mv, comp_pred, width, height,
- subpel_x_q3, subpel_y_q3, ref, ref_stride,
- subpel_search);
+ aom_upsampled_pred(xd, cm, mi_row, mi_col, mv, comp_pred, width, height,
+ subpel_x_q3, subpel_y_q3, ref, ref_stride,
+ subpel_search);
ref = comp_pred;
ref_stride = width;
}
- aom_comp_mask_pred_c(comp_pred, pred, width, height, ref, ref_stride, mask,
- mask_stride, invert_mask);
+ aom_comp_mask_pred(comp_pred, pred, width, height, ref, ref_stride, mask,
+ mask_stride, invert_mask);
}
void aom_dist_wtd_comp_avg_upsampled_pred_c(
diff --git a/av1/encoder/reconinter_enc.h b/av1/encoder/reconinter_enc.h
index e187a5f..16932f3 100644
--- a/av1/encoder/reconinter_enc.h
+++ b/av1/encoder/reconinter_enc.h
@@ -24,6 +24,15 @@
extern "C" {
#endif
+void aom_comp_mask_upsampled_pred(MACROBLOCKD *xd, const AV1_COMMON *const cm,
+ int mi_row, int mi_col, const MV *const mv,
+ uint8_t *comp_pred, const uint8_t *pred,
+ int width, int height, int subpel_x_q3,
+ int subpel_y_q3, const uint8_t *ref,
+ int ref_stride, const uint8_t *mask,
+ int mask_stride, int invert_mask,
+ int subpel_search);
+
void aom_highbd_comp_mask_upsampled_pred(
MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred8, const uint8_t *pred8, int width,
diff --git a/av1/encoder/saliency_map.c b/av1/encoder/saliency_map.c
new file mode 100644
index 0000000..3376846
--- /dev/null
+++ b/av1/encoder/saliency_map.c
@@ -0,0 +1,1414 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+#include <assert.h>
+#include <float.h>
+#include <string.h>
+
+#include "av1/encoder/encoder.h"
+#include "av1/encoder/encoder_utils.h"
+#include "av1/encoder/firstpass.h"
+#include "av1/encoder/rdopt.h"
+#include "av1/encoder/saliency_map.h"
+
+// The Gabor filter is generated by setting the parameters as:
+// ksize = 9
+// sigma = 1
+// theta = y*np.pi/4, where y \in {0, 1, 2, 3}, i.e., 0, 45, 90, 135 degrees
+// lambda1 = 1
+// gamma = 0.8
+// phi = 0
+static const double kGaborFilter[4][9][9] = { // [angle: 0, 45, 90, 135
+ // degree][ksize][ksize]
+ { { 2.0047323e-06, 6.6387620e-05, 8.0876675e-04, 3.6246411e-03, 5.9760227e-03,
+ 3.6246411e-03, 8.0876675e-04, 6.6387620e-05, 2.0047323e-06 },
+ { 1.8831115e-05, 6.2360091e-04, 7.5970138e-03, 3.4047455e-02, 5.6134764e-02,
+ 3.4047455e-02, 7.5970138e-03, 6.2360091e-04, 1.8831115e-05 },
+ { 9.3271126e-05, 3.0887155e-03, 3.7628256e-02, 1.6863814e-01, 2.7803731e-01,
+ 1.6863814e-01, 3.7628256e-02, 3.0887155e-03, 9.3271126e-05 },
+ { 2.4359586e-04, 8.0667874e-03, 9.8273583e-02, 4.4043165e-01, 7.2614902e-01,
+ 4.4043165e-01, 9.8273583e-02, 8.0667874e-03, 2.4359586e-04 },
+ { 3.3546262e-04, 1.1108996e-02, 1.3533528e-01, 6.0653067e-01, 1.0000000e+00,
+ 6.0653067e-01, 1.3533528e-01, 1.1108996e-02, 3.3546262e-04 },
+ { 2.4359586e-04, 8.0667874e-03, 9.8273583e-02, 4.4043165e-01, 7.2614902e-01,
+ 4.4043165e-01, 9.8273583e-02, 8.0667874e-03, 2.4359586e-04 },
+ { 9.3271126e-05, 3.0887155e-03, 3.7628256e-02, 1.6863814e-01, 2.7803731e-01,
+ 1.6863814e-01, 3.7628256e-02, 3.0887155e-03, 9.3271126e-05 },
+ { 1.8831115e-05, 6.2360091e-04, 7.5970138e-03, 3.4047455e-02, 5.6134764e-02,
+ 3.4047455e-02, 7.5970138e-03, 6.2360091e-04, 1.8831115e-05 },
+ { 2.0047323e-06, 6.6387620e-05, 8.0876675e-04, 3.6246411e-03, 5.9760227e-03,
+ 3.6246411e-03, 8.0876675e-04, 6.6387620e-05, 2.0047323e-06 } },
+
+ { { -6.2165498e-08, 3.8760313e-06, 3.0079011e-06, -4.4602581e-04,
+ 6.6981313e-04, 1.3962291e-03, -9.9486928e-04, -8.1631159e-05,
+ 3.5712848e-05 },
+ { 3.8760313e-06, 5.7044272e-06, -1.6041942e-03, 4.5687673e-03,
+ 1.8061366e-02, -2.4406660e-02, -3.7979286e-03, 3.1511115e-03,
+ -8.1631159e-05 },
+ { 3.0079011e-06, -1.6041942e-03, 8.6645801e-03, 6.4960226e-02,
+ -1.6647682e-01, -4.9129307e-02, 7.7304743e-02, -3.7979286e-03,
+ -9.9486928e-04 },
+ { -4.4602581e-04, 4.5687673e-03, 6.4960226e-02, -3.1572008e-01,
+ -1.7670043e-01, 5.2729243e-01, -4.9129307e-02, -2.4406660e-02,
+ 1.3962291e-03 },
+ { 6.6981313e-04, 1.8061366e-02, -1.6647682e-01, -1.7670043e-01,
+ 1.0000000e+00, -1.7670043e-01, -1.6647682e-01, 1.8061366e-02,
+ 6.6981313e-04 },
+ { 1.3962291e-03, -2.4406660e-02, -4.9129307e-02, 5.2729243e-01,
+ -1.7670043e-01, -3.1572008e-01, 6.4960226e-02, 4.5687673e-03,
+ -4.4602581e-04 },
+ { -9.9486928e-04, -3.7979286e-03, 7.7304743e-02, -4.9129307e-02,
+ -1.6647682e-01, 6.4960226e-02, 8.6645801e-03, -1.6041942e-03,
+ 3.0079011e-06 },
+ { -8.1631159e-05, 3.1511115e-03, -3.7979286e-03, -2.4406660e-02,
+ 1.8061366e-02, 4.5687673e-03, -1.6041942e-03, 5.7044272e-06,
+ 3.8760313e-06 },
+ { 3.5712848e-05, -8.1631159e-05, -9.9486928e-04, 1.3962291e-03,
+ 6.6981313e-04, -4.4602581e-04, 3.0079011e-06, 3.8760313e-06,
+ -6.2165498e-08 } },
+
+ { { 2.0047323e-06, 1.8831115e-05, 9.3271126e-05, 2.4359586e-04, 3.3546262e-04,
+ 2.4359586e-04, 9.3271126e-05, 1.8831115e-05, 2.0047323e-06 },
+ { 6.6387620e-05, 6.2360091e-04, 3.0887155e-03, 8.0667874e-03, 1.1108996e-02,
+ 8.0667874e-03, 3.0887155e-03, 6.2360091e-04, 6.6387620e-05 },
+ { 8.0876675e-04, 7.5970138e-03, 3.7628256e-02, 9.8273583e-02, 1.3533528e-01,
+ 9.8273583e-02, 3.7628256e-02, 7.5970138e-03, 8.0876675e-04 },
+ { 3.6246411e-03, 3.4047455e-02, 1.6863814e-01, 4.4043165e-01, 6.0653067e-01,
+ 4.4043165e-01, 1.6863814e-01, 3.4047455e-02, 3.6246411e-03 },
+ { 5.9760227e-03, 5.6134764e-02, 2.7803731e-01, 7.2614902e-01, 1.0000000e+00,
+ 7.2614902e-01, 2.7803731e-01, 5.6134764e-02, 5.9760227e-03 },
+ { 3.6246411e-03, 3.4047455e-02, 1.6863814e-01, 4.4043165e-01, 6.0653067e-01,
+ 4.4043165e-01, 1.6863814e-01, 3.4047455e-02, 3.6246411e-03 },
+ { 8.0876675e-04, 7.5970138e-03, 3.7628256e-02, 9.8273583e-02, 1.3533528e-01,
+ 9.8273583e-02, 3.7628256e-02, 7.5970138e-03, 8.0876675e-04 },
+ { 6.6387620e-05, 6.2360091e-04, 3.0887155e-03, 8.0667874e-03, 1.1108996e-02,
+ 8.0667874e-03, 3.0887155e-03, 6.2360091e-04, 6.6387620e-05 },
+ { 2.0047323e-06, 1.8831115e-05, 9.3271126e-05, 2.4359586e-04, 3.3546262e-04,
+ 2.4359586e-04, 9.3271126e-05, 1.8831115e-05, 2.0047323e-06 } },
+
+ { { 3.5712848e-05, -8.1631159e-05, -9.9486928e-04, 1.3962291e-03,
+ 6.6981313e-04, -4.4602581e-04, 3.0079011e-06, 3.8760313e-06,
+ -6.2165498e-08 },
+ { -8.1631159e-05, 3.1511115e-03, -3.7979286e-03, -2.4406660e-02,
+ 1.8061366e-02, 4.5687673e-03, -1.6041942e-03, 5.7044272e-06,
+ 3.8760313e-06 },
+ { -9.9486928e-04, -3.7979286e-03, 7.7304743e-02, -4.9129307e-02,
+ -1.6647682e-01, 6.4960226e-02, 8.6645801e-03, -1.6041942e-03,
+ 3.0079011e-06 },
+ { 1.3962291e-03, -2.4406660e-02, -4.9129307e-02, 5.2729243e-01,
+ -1.7670043e-01, -3.1572008e-01, 6.4960226e-02, 4.5687673e-03,
+ -4.4602581e-04 },
+ { 6.6981313e-04, 1.8061366e-02, -1.6647682e-01, -1.7670043e-01,
+ 1.0000000e+00, -1.7670043e-01, -1.6647682e-01, 1.8061366e-02,
+ 6.6981313e-04 },
+ { -4.4602581e-04, 4.5687673e-03, 6.4960226e-02, -3.1572008e-01,
+ -1.7670043e-01, 5.2729243e-01, -4.9129307e-02, -2.4406660e-02,
+ 1.3962291e-03 },
+ { 3.0079011e-06, -1.6041942e-03, 8.6645801e-03, 6.4960226e-02,
+ -1.6647682e-01, -4.9129307e-02, 7.7304743e-02, -3.7979286e-03,
+ -9.9486928e-04 },
+ { 3.8760313e-06, 5.7044272e-06, -1.6041942e-03, 4.5687673e-03,
+ 1.8061366e-02, -2.4406660e-02, -3.7979286e-03, 3.1511115e-03,
+ -8.1631159e-05 },
+ { -6.2165498e-08, 3.8760313e-06, 3.0079011e-06, -4.4602581e-04,
+ 6.6981313e-04, 1.3962291e-03, -9.9486928e-04, -8.1631159e-05,
+ 3.5712848e-05 } }
+};
+
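For reference, the table values above match the standard real Gabor kernel
g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambda
+ phi), with x' = x*cos(theta) + y*sin(theta) and y' = -x*sin(theta) +
y*cos(theta). As a spot check for theta = 0: the entry one sample right of
center is exp(-0.5) ~= 0.6065, and one sample above center is
exp(-0.8^2/2) ~= 0.7261, matching the first table.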
+// This function extracts the red/green/blue channels and calculates the
+// intensity = (r+g+b)/3. Note that it only handles the 8-bit case for now.
+// TODO(linzhen): add high bitdepth support.
+static void get_color_intensity(const YV12_BUFFER_CONFIG *src,
+ int subsampling_x, int subsampling_y,
+ double *cr, double *cg, double *cb,
+ double *intensity) {
+ const uint8_t *y = src->buffers[0];
+ const uint8_t *u = src->buffers[1];
+ const uint8_t *v = src->buffers[2];
+
+ const int y_height = src->crop_heights[0];
+ const int y_width = src->crop_widths[0];
+ const int y_stride = src->strides[0];
+ const int c_stride = src->strides[1];
+
+ for (int i = 0; i < y_height; ++i) {
+ for (int j = 0; j < y_width; ++j) {
+ cr[i * y_width + j] =
+ fclamp((double)y[i * y_stride + j] +
+ 1.370 * (double)(v[(i >> subsampling_y) * c_stride +
+ (j >> subsampling_x)] -
+ 128),
+ 0, 255);
+ cg[i * y_width + j] =
+ fclamp((double)y[i * y_stride + j] -
+ 0.698 * (double)(u[(i >> subsampling_y) * c_stride +
+ (j >> subsampling_x)] -
+ 128) -
+ 0.337 * (double)(v[(i >> subsampling_y) * c_stride +
+ (j >> subsampling_x)] -
+ 128),
+ 0, 255);
+ cb[i * y_width + j] =
+ fclamp((double)y[i * y_stride + j] +
+ 1.732 * (double)(u[(i >> subsampling_y) * c_stride +
+ (j >> subsampling_x)] -
+ 128),
+ 0, 255);
+
+ intensity[i * y_width + j] =
+ (cr[i * y_width + j] + cg[i * y_width + j] + cb[i * y_width + j]) /
+ 3.0;
+ assert(intensity[i * y_width + j] >= 0 &&
+ intensity[i * y_width + j] <= 255);
+
+ intensity[i * y_width + j] /= 256;
+ cr[i * y_width + j] /= 256;
+ cg[i * y_width + j] /= 256;
+ cb[i * y_width + j] /= 256;
+ }
+ }
+}
+
+static INLINE double convolve_map(const double *filter, const double *map,
+ const int size) {
+ double result = 0;
+ for (int i = 0; i < size; ++i) {
+ result += filter[i] * map[i]; // symmetric filter is used
+ }
+ return result;
+}
+
+// This function decimates the map by half and applies a Gaussian filter to
+// the downsampled map.
+static INLINE void decimate_map(const double *map, int height, int width,
+ int stride, double *downsampled_map) {
+ const int new_width = width / 2;
+ const int window_size = 5;
+ const double gaussian_filter[25] = {
+ 1. / 256, 1.0 / 64, 3. / 128, 1. / 64, 1. / 256, 1. / 64, 1. / 16,
+ 3. / 32, 1. / 16, 1. / 64, 3. / 128, 3. / 32, 9. / 64, 3. / 32,
+ 3. / 128, 1. / 64, 1. / 16, 3. / 32, 1. / 16, 1. / 64, 1. / 256,
+ 1. / 64, 3. / 128, 1. / 64, 1. / 256
+ };
+
+ double map_region[25];
+ for (int y = 0; y < height - 1; y += 2) {
+ for (int x = 0; x < width - 1; x += 2) {
+ int i = 0;
+ for (int yy = y - window_size / 2; yy <= y + window_size / 2; ++yy) {
+ for (int xx = x - window_size / 2; xx <= x + window_size / 2; ++xx) {
+ int yvalue = clamp(yy, 0, height - 1);
+ int xvalue = clamp(xx, 0, width - 1);
+ map_region[i++] = map[yvalue * stride + xvalue];
+ }
+ }
+ downsampled_map[(y / 2) * new_width + (x / 2)] =
+ convolve_map(gaussian_filter, map_region, window_size * window_size);
+ }
+ }
+}
+
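The 5x5 gaussian_filter above is the separable binomial kernel: each entry is
b[i] * b[j] / 256 with b = {1, 4, 6, 4, 1}, i.e. the outer product of the
5-tap filter {1, 4, 6, 4, 1} / 16 with itself, so the weights sum to 1 and
the decimation preserves the overall intensity level.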
+// This function upscales the map from in_level size to out_level size.
+// Note that the map at "level - 1" is a 2x upscale of the map at "level".
+static INLINE int upscale_map(const double *input, int in_level, int out_level,
+ int height[9], int width[9], double *output) {
+ for (int level = in_level; level > out_level; level--) {
+ const int cur_width = width[level];
+ const int cur_height = height[level];
+ const int cur_stride = width[level];
+
+ double *original = (level == in_level) ? (double *)input : output;
+
+ assert(level > 0);
+
+ const int h_upscale = height[level - 1];
+ const int w_upscale = width[level - 1];
+ const int s_upscale = width[level - 1];
+
+ double *upscale = aom_malloc(h_upscale * w_upscale * sizeof(*upscale));
+
+ if (!upscale) {
+ return 0;
+ }
+
+ for (int i = 0; i < h_upscale; ++i) {
+ for (int j = 0; j < w_upscale; ++j) {
+ const int ii = clamp((i >> 1), 0, cur_height - 1);
+ const int jj = clamp((j >> 1), 0, cur_width - 1);
+ upscale[j + i * s_upscale] = (double)original[jj + ii * cur_stride];
+ }
+ }
+ memcpy(output, upscale, h_upscale * w_upscale * sizeof(double));
+ aom_free(upscale);
+ }
+
+ return 1;
+}
+
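The upscaling above is nearest-neighbor: each output pixel (i, j) at the
finer level reads from (i >> 1, j >> 1) at the coarser level, so every coarse
pixel is duplicated into a 2x2 block per level of upscaling.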
+// This function calculates the differences between a fine scale c and a
+// coarser scale s yielding the feature maps. c \in {2, 3, 4}, and s = c +
+// delta, where delta \in {3, 4}.
+static int center_surround_diff(const double *input[9], int height[9],
+ int width[9], saliency_feature_map *output[6]) {
+ int j = 0;
+ for (int k = 2; k < 5; ++k) {
+ int cur_height = height[k];
+ int cur_width = width[k];
+
+ if (upscale_map(input[k + 3], k + 3, k, height, width, output[j]->buf) ==
+ 0) {
+ return 0;
+ }
+
+ for (int r = 0; r < cur_height; ++r) {
+ for (int c = 0; c < cur_width; ++c) {
+ output[j]->buf[r * cur_width + c] =
+ fabs((double)(input[k][r * cur_width + c] -
+ output[j]->buf[r * cur_width + c]));
+ }
+ }
+
+ if (upscale_map(input[k + 4], k + 4, k, height, width,
+ output[j + 1]->buf) == 0) {
+ return 0;
+ }
+
+ for (int r = 0; r < cur_height; ++r) {
+ for (int c = 0; c < cur_width; ++c) {
+ output[j + 1]->buf[r * cur_width + c] =
+ fabs(input[k][r * cur_width + c] -
+ output[j + 1]->buf[r * cur_width + c]);
+ }
+ }
+
+ j += 2;
+ }
+ return 1;
+}
+
+// For color channels, the differences are calculated based on "color
+// double-opponency". For example, the RG feature map is constructed between a
+// fine scale c of the R-G component and a coarser scale s of the G-R
+// component.
+static int center_surround_diff_rgb(const double *input_1[9],
+ const double *input_2[9], int height[9],
+ int width[9],
+ saliency_feature_map *output[6]) {
+ int j = 0;
+ for (int k = 2; k < 5; ++k) {
+ int cur_height = height[k];
+ int cur_width = width[k];
+
+ if (upscale_map(input_2[k + 3], k + 3, k, height, width, output[j]->buf) ==
+ 0) {
+ return 0;
+ }
+
+ for (int r = 0; r < cur_height; ++r) {
+ for (int c = 0; c < cur_width; ++c) {
+ output[j]->buf[r * cur_width + c] =
+ fabs((double)(input_1[k][r * cur_width + c] -
+ output[j]->buf[r * cur_width + c]));
+ }
+ }
+
+ if (upscale_map(input_2[k + 4], k + 4, k, height, width,
+ output[j + 1]->buf) == 0) {
+ return 0;
+ }
+
+ for (int r = 0; r < cur_height; ++r) {
+ for (int c = 0; c < cur_width; ++c) {
+ output[j + 1]->buf[r * cur_width + c] =
+ fabs(input_1[k][r * cur_width + c] -
+ output[j + 1]->buf[r * cur_width + c]);
+ }
+ }
+
+ j += 2;
+ }
+ return 1;
+}
+
+// This function generates Gaussian pyramid images with indices from 0 to
+// 8, and constructs the feature maps by calculating the center-surround
+// differences.
+static int gaussian_pyramid(const double *src, int width[9], int height[9],
+ saliency_feature_map *dst[6]) {
+ double *gaussian_map[9]; // scale = 9
+ gaussian_map[0] =
+ (double *)aom_malloc(width[0] * height[0] * sizeof(*gaussian_map[0]));
+ if (!gaussian_map[0]) {
+ return 0;
+ }
+
+ memcpy(gaussian_map[0], src, width[0] * height[0] * sizeof(double));
+
+ for (int i = 1; i < 9; ++i) {
+ int stride = width[i - 1];
+ int new_width = width[i];
+ int new_height = height[i];
+
+ gaussian_map[i] =
+ (double *)aom_malloc(new_width * new_height * sizeof(*gaussian_map[i]));
+
+ if (!gaussian_map[i]) {
+ for (int l = 0; l < i; ++l) {
+ aom_free(gaussian_map[l]);
+ }
+ return 0;
+ }
+
+ memset(gaussian_map[i], 0, new_width * new_height * sizeof(double));
+
+ decimate_map(gaussian_map[i - 1], height[i - 1], width[i - 1], stride,
+ gaussian_map[i]);
+ }
+
+ if (center_surround_diff((const double **)gaussian_map, height, width, dst) ==
+ 0) {
+ for (int l = 0; l < 9; ++l) {
+ aom_free(gaussian_map[l]);
+ }
+ return 0;
+ }
+
+ for (int i = 0; i < 9; ++i) {
+ aom_free(gaussian_map[i]);
+ }
+ return 1;
+}
+
+static int gaussian_pyramid_rgb(double *src_1, double *src_2, int width[9],
+ int height[9], saliency_feature_map *dst[6]) {
+  double *gaussian_map[2][9] = { { NULL } };  // scale = 9
+ double *src[2];
+
+ src[0] = src_1;
+ src[1] = src_2;
+
+ for (int k = 0; k < 2; ++k) {
+ gaussian_map[k][0] = (double *)aom_malloc(width[0] * height[0] *
+ sizeof(*gaussian_map[k][0]));
+ if (!gaussian_map[k][0]) {
+      // Free everything allocated so far; unallocated entries are NULL and
+      // aom_free(NULL) is a no-op.
+      for (int l = 0; l < 2; ++l) {
+        for (int m = 0; m < 9; ++m) {
+          aom_free(gaussian_map[l][m]);
+        }
+      }
+ return 0;
+ }
+ memcpy(gaussian_map[k][0], src[k], width[0] * height[0] * sizeof(double));
+
+ for (int i = 1; i < 9; ++i) {
+ int stride = width[i - 1];
+ int new_width = width[i];
+ int new_height = height[i];
+
+ gaussian_map[k][i] = (double *)aom_malloc(new_width * new_height *
+ sizeof(*gaussian_map[k][i]));
+ if (!gaussian_map[k][i]) {
+        // Free everything allocated so far; unallocated entries are NULL.
+        for (int l = 0; l < 2; ++l) {
+          for (int m = 0; m < 9; ++m) {
+            aom_free(gaussian_map[l][m]);
+          }
+        }
+ return 0;
+ }
+ memset(gaussian_map[k][i], 0, new_width * new_height * sizeof(double));
+ decimate_map(gaussian_map[k][i - 1], height[i - 1], width[i - 1], stride,
+ gaussian_map[k][i]);
+ }
+ }
+
+ if (center_surround_diff_rgb((const double **)gaussian_map[0],
+ (const double **)gaussian_map[1], height, width,
+ dst) == 0) {
+ for (int l = 0; l < 2; ++l) {
+ for (int i = 0; i < 9; ++i) {
+ aom_free(gaussian_map[l][i]);
+ }
+ }
+ return 0;
+ }
+
+ for (int l = 0; l < 2; ++l) {
+ for (int i = 0; i < 9; ++i) {
+ aom_free(gaussian_map[l][i]);
+ }
+ }
+ return 1;
+}
+
+static int get_feature_map_intensity(double *intensity, int width[9],
+ int height[9],
+ saliency_feature_map *i_map[6]) {
+ if (gaussian_pyramid(intensity, width, height, i_map) == 0) {
+ return 0;
+ }
+ return 1;
+}
+
+static int get_feature_map_rgb(double *cr, double *cg, double *cb, int width[9],
+ int height[9], saliency_feature_map *rg_map[6],
+ saliency_feature_map *by_map[6]) {
+ double *rg_mat = aom_malloc(height[0] * width[0] * sizeof(*rg_mat));
+ double *by_mat = aom_malloc(height[0] * width[0] * sizeof(*by_mat));
+ double *gr_mat = aom_malloc(height[0] * width[0] * sizeof(*gr_mat));
+ double *yb_mat = aom_malloc(height[0] * width[0] * sizeof(*yb_mat));
+
+ if (!rg_mat || !by_mat || !gr_mat || !yb_mat) {
+ aom_free(rg_mat);
+ aom_free(by_mat);
+ aom_free(gr_mat);
+ aom_free(yb_mat);
+ return 0;
+ }
+
+ double r, g, b, y;
+ for (int i = 0; i < height[0]; ++i) {
+ for (int j = 0; j < width[0]; ++j) {
+ r = AOMMAX(0, cr[i * width[0] + j] -
+ (cg[i * width[0] + j] + cb[i * width[0] + j]) / 2);
+ g = AOMMAX(0, cg[i * width[0] + j] -
+ (cr[i * width[0] + j] + cb[i * width[0] + j]) / 2);
+ b = AOMMAX(0, cb[i * width[0] + j] -
+ (cr[i * width[0] + j] + cg[i * width[0] + j]) / 2);
+ y = AOMMAX(0, (cr[i * width[0] + j] + cg[i * width[0] + j]) / 2 -
+ fabs(cr[i * width[0] + j] - cg[i * width[0] + j]) / 2 -
+ cb[i * width[0] + j]);
+
+ rg_mat[i * width[0] + j] = r - g;
+ by_mat[i * width[0] + j] = b - y;
+ gr_mat[i * width[0] + j] = g - r;
+ yb_mat[i * width[0] + j] = y - b;
+ }
+ }
+
+ if (gaussian_pyramid_rgb(rg_mat, gr_mat, width, height, rg_map) == 0 ||
+ gaussian_pyramid_rgb(by_mat, yb_mat, width, height, by_map) == 0) {
+ aom_free(rg_mat);
+ aom_free(by_mat);
+ aom_free(gr_mat);
+ aom_free(yb_mat);
+ return 0;
+ }
+
+ aom_free(rg_mat);
+ aom_free(by_mat);
+ aom_free(gr_mat);
+ aom_free(yb_mat);
+ return 1;
+}
+
+static INLINE void filter2d(const double *input, const double kernel[9][9],
+ int width, int height, double *output) {
+ const int window_size = 9;
+ double map_section[81];
+ for (int y = 0; y <= height - 1; ++y) {
+ for (int x = 0; x <= width - 1; ++x) {
+ int i = 0;
+ for (int yy = y - window_size / 2; yy <= y + window_size / 2; ++yy) {
+ for (int xx = x - window_size / 2; xx <= x + window_size / 2; ++xx) {
+ int yvalue = clamp(yy, 0, height - 1);
+ int xvalue = clamp(xx, 0, width - 1);
+ map_section[i++] = input[yvalue * width + xvalue];
+ }
+ }
+
+ output[y * width + x] = 0;
+ for (int k = 0; k < window_size; ++k) {
+ for (int l = 0; l < window_size; ++l) {
+ output[y * width + x] +=
+ kernel[k][l] * map_section[k * window_size + l];
+ }
+ }
+ }
+ }
+}
+
+static int get_feature_map_orientation(const double *intensity, int width[9],
+ int height[9],
+ saliency_feature_map *dst[24]) {
+ double *gaussian_map[9];
+
+ gaussian_map[0] =
+ (double *)aom_malloc(width[0] * height[0] * sizeof(*gaussian_map[0]));
+ if (!gaussian_map[0]) {
+ return 0;
+ }
+ memcpy(gaussian_map[0], intensity, width[0] * height[0] * sizeof(double));
+
+ for (int i = 1; i < 9; ++i) {
+ int stride = width[i - 1];
+ int new_width = width[i];
+ int new_height = height[i];
+
+ gaussian_map[i] =
+ (double *)aom_malloc(new_width * new_height * sizeof(*gaussian_map[i]));
+ if (!gaussian_map[i]) {
+ for (int l = 0; l < i; ++l) {
+ aom_free(gaussian_map[l]);
+ }
+ return 0;
+ }
+ memset(gaussian_map[i], 0, new_width * new_height * sizeof(double));
+ decimate_map(gaussian_map[i - 1], height[i - 1], width[i - 1], stride,
+ gaussian_map[i]);
+ }
+
+  // NULL-initialized so the error paths below can free all entries safely.
+  double *tempGaborOutput[4][9] = { { NULL } };  //[angle: 0, 45, 90, 135
+                                                 // degree][filter_size]
+
+ for (int i = 2; i < 9; ++i) {
+ const int cur_height = height[i];
+ const int cur_width = width[i];
+ for (int j = 0; j < 4; ++j) {
+ tempGaborOutput[j][i] = (double *)aom_malloc(
+ cur_height * cur_width * sizeof(*tempGaborOutput[j][i]));
+ if (!tempGaborOutput[j][i]) {
+ for (int l = 0; l < 9; ++l) {
+ aom_free(gaussian_map[l]);
+ }
+ for (int h = 0; h < 4; ++h) {
+ for (int g = 2; g < 9; ++g) {
+ aom_free(tempGaborOutput[h][g]);
+ }
+ }
+ return 0;
+ }
+ filter2d(gaussian_map[i], kGaborFilter[j], cur_width, cur_height,
+ tempGaborOutput[j][i]);
+ }
+ }
+
+ for (int i = 0; i < 9; ++i) {
+ aom_free(gaussian_map[i]);
+ }
+
+ saliency_feature_map
+ *tmp[4][6]; //[angle: 0, 45, 90, 135 degree][filter_size]
+
+ for (int i = 0; i < 6; ++i) {
+ for (int j = 0; j < 4; ++j) {
+ tmp[j][i] = dst[j * 6 + i];
+ }
+ }
+
+ for (int j = 0; j < 4; ++j) {
+ if (center_surround_diff((const double **)tempGaborOutput[j], height, width,
+ tmp[j]) == 0) {
+ for (int h = 0; h < 4; ++h) {
+ for (int g = 2; g < 9; ++g) {
+ aom_free(tempGaborOutput[h][g]);
+ }
+ }
+ return 0;
+ }
+ }
+
+ for (int i = 2; i < 9; ++i) {
+ for (int j = 0; j < 4; ++j) {
+ aom_free(tempGaborOutput[j][i]);
+ }
+ }
+
+ return 1;
+}
+
+static INLINE void find_min_max(const saliency_feature_map *input,
+ double *max_value, double *min_value) {
+ assert(input && input->buf);
+ *min_value = DBL_MAX;
+ *max_value = 0.0;
+
+ for (int i = 0; i < input->height; ++i) {
+ for (int j = 0; j < input->width; ++j) {
+ assert(input->buf[i * input->width + j] >= 0.0);
+ *min_value = fmin(input->buf[i * input->width + j], *min_value);
+ *max_value = fmax(input->buf[i * input->width + j], *max_value);
+ }
+ }
+}
+
+static INLINE double average_local_max(const saliency_feature_map *input,
+ int stepsize) {
+ int numlocal = 0;
+ double lmaxmean = 0, lmax = 0, dummy = 0;
+ saliency_feature_map local_map;
+ local_map.height = stepsize;
+ local_map.width = stepsize;
+ local_map.buf =
+ (double *)aom_malloc(stepsize * stepsize * sizeof(*local_map.buf));
+
+ if (!local_map.buf) {
+ return -1;
+ }
+
+ for (int y = 0; y < input->height - stepsize; y += stepsize) {
+ for (int x = 0; x < input->width - stepsize; x += stepsize) {
+ for (int i = 0; i < stepsize; ++i) {
+ for (int j = 0; j < stepsize; ++j) {
+ local_map.buf[i * stepsize + j] =
+ input->buf[(y + i) * input->width + x + j];
+ }
+ }
+
+ find_min_max(&local_map, &lmax, &dummy);
+ lmaxmean += lmax;
+ numlocal++;
+ }
+ }
+
+ aom_free(local_map.buf);
+
+  return (numlocal > 0) ? lmaxmean / numlocal : -1;
+}
+
+// Linearly normalize the values in the map to [0,1].
+static void minmax_normalize(saliency_feature_map *input) {
+ double max_value, min_value;
+ find_min_max(input, &max_value, &min_value);
+
+ for (int i = 0; i < input->height; ++i) {
+ for (int j = 0; j < input->width; ++j) {
+ if (max_value != min_value) {
+ input->buf[i * input->width + j] =
+ input->buf[i * input->width + j] / (max_value - min_value) +
+ min_value / (min_value - max_value);
+ } else {
+ input->buf[i * input->width + j] -= min_value;
+ }
+ }
+ }
+}
+
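The two-term expression above is an expanded form of the usual min-max
normalization, since x / (max - min) + min / (min - max) =
(x - min) / (max - min), which maps min to 0 and max to 1.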
+// This function promotes meaningful “activation spots” in the map and
+// suppresses homogeneous areas.
+static int normalization_operator(saliency_feature_map *input, int stepsize) {
+ minmax_normalize(input);
+ double lmaxmean = average_local_max(input, stepsize);
+ if (lmaxmean < 0) {
+ return 0;
+ }
+ double normCoeff = (1 - lmaxmean) * (1 - lmaxmean);
+
+ for (int i = 0; i < input->height; ++i) {
+ for (int j = 0; j < input->width; ++j) {
+ input->buf[i * input->width + j] *= normCoeff;
+ }
+ }
+
+ return 1;
+}
+
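After min-max normalization the global peak of the map is 1, so the factor
(1 - lmaxmean)^2 is large when the local maxima average well below the global
peak, i.e. when the map has one dominant activation spot; maps with many
comparable peaks have lmaxmean close to 1 and are scaled down. This is the
map normalization operator N(.) from the Itti-Koch saliency model.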
+// Normalize the values in feature maps to [0,1], and then upscale all maps to
+// the original frame size.
+static int normalize_fm(saliency_feature_map *input[6], int width[9],
+ int height[9], int num_fm,
+ saliency_feature_map *output[6]) {
+ // Feature maps (FM) are generated by function "center_surround_diff()". The
+ // difference is between a fine scale c and a coarser scale s, where c \in {2,
+ // 3, 4}, and s = c + delta, where delta \in {3, 4}, and the FM size is scale
+ // c. Specifically, i=0: c=2 and s=5, i=1: c=2 and s=6, i=2: c=3 and s=6, i=3:
+ // c=3 and s=7, i=4: c=4 and s=7, i=5: c=4 and s=8.
+ for (int i = 0; i < num_fm; ++i) {
+    if (normalization_operator(input[i], 8) == 0) {
+ return 0;
+ }
+
+ // Upscale FM to original frame size
+ if (upscale_map(input[i]->buf, (i / 2) + 2, 0, height, width,
+ output[i]->buf) == 0) {
+ return 0;
+ }
+ }
+ return 1;
+}
+
+// Combine feature maps with the same category (intensity, color, or
+// orientation) into one conspicuity map.
+static int normalized_map(saliency_feature_map *input[6], int width[9],
+ int height[9], saliency_feature_map *output) {
+ int num_fm = 6;
+
+ saliency_feature_map *n_input[6];
+ for (int i = 0; i < 6; ++i) {
+ n_input[i] = (saliency_feature_map *)aom_malloc(sizeof(*n_input[i]));
+    if (!n_input[i]) {
+      for (int l = 0; l < i; ++l) {
+        aom_free(n_input[l]->buf);
+        aom_free(n_input[l]);
+      }
+      return 0;
+    }
+ n_input[i]->buf =
+ (double *)aom_malloc(width[0] * height[0] * sizeof(*n_input[i]->buf));
+ if (!n_input[i]->buf) {
+      aom_free(n_input[i]);
+      for (int l = 0; l < i; ++l) {
+        aom_free(n_input[l]->buf);
+        aom_free(n_input[l]);
+      }
+ return 0;
+ }
+ n_input[i]->height = height[0];
+ n_input[i]->width = width[0];
+ }
+
+ if (normalize_fm(input, width, height, num_fm, n_input) == 0) {
+ for (int i = 0; i < num_fm; ++i) {
+ aom_free(n_input[i]->buf);
+ aom_free(n_input[i]);
+ }
+ return 0;
+ }
+
+ // Add up all normalized feature maps with the same category into one map.
+ for (int i = 0; i < num_fm; ++i) {
+ for (int r = 0; r < height[0]; ++r) {
+ for (int c = 0; c < width[0]; ++c) {
+ output->buf[r * width[0] + c] += n_input[i]->buf[r * width[0] + c];
+ }
+ }
+ }
+
+ for (int i = 0; i < num_fm; ++i) {
+ aom_free(n_input[i]->buf);
+ aom_free(n_input[i]);
+ }
+
+  normalization_operator(output, 8);
+ return 1;
+}
+
+static int normalized_map_rgb(saliency_feature_map *rg_map[6],
+ saliency_feature_map *by_map[6], int width[9],
+ int height[9], saliency_feature_map *output) {
+ saliency_feature_map *color_cm[2]; // 0: color_cm_rg, 1: color_cm_by
+ for (int i = 0; i < 2; ++i) {
+ color_cm[i] = aom_malloc(sizeof(*color_cm[i]));
+    if (!color_cm[i]) {
+      for (int l = 0; l < i; ++l) {
+        aom_free(color_cm[l]->buf);
+        aom_free(color_cm[l]);
+      }
+      return 0;
+    }
+ color_cm[i]->buf =
+ (double *)aom_malloc(width[0] * height[0] * sizeof(*color_cm[i]->buf));
+ if (!color_cm[i]->buf) {
+      for (int l = 0; l < i; ++l) {
+        aom_free(color_cm[l]->buf);
+        aom_free(color_cm[l]);
+      }
+ aom_free(color_cm[i]);
+ return 0;
+ }
+
+ color_cm[i]->width = width[0];
+ color_cm[i]->height = height[0];
+ memset(color_cm[i]->buf, 0,
+ width[0] * height[0] * sizeof(*color_cm[i]->buf));
+ }
+
+ if (normalized_map(rg_map, width, height, color_cm[0]) == 0 ||
+ normalized_map(by_map, width, height, color_cm[1]) == 0) {
+ for (int i = 0; i < 2; ++i) {
+ aom_free(color_cm[i]->buf);
+ aom_free(color_cm[i]);
+ }
+ return 0;
+ }
+
+ for (int r = 0; r < height[0]; ++r) {
+ for (int c = 0; c < width[0]; ++c) {
+ output->buf[r * width[0] + c] = color_cm[0]->buf[r * width[0] + c] +
+ color_cm[1]->buf[r * width[0] + c];
+ }
+ }
+
+ for (int i = 0; i < 2; ++i) {
+ aom_free(color_cm[i]->buf);
+ aom_free(color_cm[i]);
+ }
+
+  normalization_operator(output, 8);
+ return 1;
+}
+
+static int normalized_map_orientation(saliency_feature_map *orientation_map[24],
+ int width[9], int height[9],
+ saliency_feature_map *output) {
+ int num_fms_per_angle = 6;
+
+ saliency_feature_map *ofm[4][6];
+ for (int i = 0; i < num_fms_per_angle; ++i) {
+ for (int j = 0; j < 4; ++j) {
+ ofm[j][i] = orientation_map[j * num_fms_per_angle + i];
+ }
+ }
+
+ // extract conspicuity map for each angle
+ saliency_feature_map *nofm = aom_malloc(sizeof(*nofm));
+ if (!nofm) {
+ return 0;
+ }
+ nofm->buf = (double *)aom_malloc(width[0] * height[0] * sizeof(*nofm->buf));
+ if (!nofm->buf) {
+ aom_free(nofm);
+ return 0;
+ }
+ nofm->height = height[0];
+ nofm->width = width[0];
+
+ for (int i = 0; i < 4; ++i) {
+ memset(nofm->buf, 0, width[0] * height[0] * sizeof(*nofm->buf));
+ if (normalized_map(ofm[i], width, height, nofm) == 0) {
+ aom_free(nofm->buf);
+ aom_free(nofm);
+ return 0;
+ }
+
+ for (int r = 0; r < height[0]; ++r) {
+ for (int c = 0; c < width[0]; ++c) {
+ output->buf[r * width[0] + c] += nofm->buf[r * width[0] + c];
+ }
+ }
+ }
+
+ aom_free(nofm->buf);
+ aom_free(nofm);
+
+  normalization_operator(output, 8);
+ return 1;
+}
+
+// Set pixel level saliency mask based on Itti-Koch algorithm
+int av1_set_saliency_map(AV1_COMP *cpi) {
+ AV1_COMMON *const cm = &cpi->common;
+
+ int frm_width = cm->width;
+ int frm_height = cm->height;
+
+ int pyr_height[9];
+ int pyr_width[9];
+
+ pyr_height[0] = frm_height;
+ pyr_width[0] = frm_width;
+
+ for (int i = 1; i < 9; ++i) {
+ pyr_width[i] = pyr_width[i - 1] / 2;
+ pyr_height[i] = pyr_height[i - 1] / 2;
+ }
+
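For example, with a 1280x720 source the nine pyramid levels are 1280x720,
640x360, 320x180, 160x90, 80x45, 40x22, 20x11, 10x5, and 5x2 (integer
division halves each dimension per level).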
+ double *cr = aom_malloc(frm_width * frm_height * sizeof(*cr));
+ double *cg = aom_malloc(frm_width * frm_height * sizeof(*cg));
+ double *cb = aom_malloc(frm_width * frm_height * sizeof(*cb));
+ double *intensity = aom_malloc(frm_width * frm_height * sizeof(*intensity));
+
+ if (!cr || !cg || !cb || !intensity) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ return 0;
+ }
+
+ // Extract red / green / blue channels and intensity component
+ get_color_intensity(cpi->source, cm->seq_params->subsampling_x,
+ cm->seq_params->subsampling_y, cr, cg, cb, intensity);
+
+ // Feature Map Extraction
+ // intensity map
+ saliency_feature_map *i_map[6];
+ for (int i = 0; i < 6; ++i) {
+ int cur_height = pyr_height[(i / 2) + 2];
+ int cur_width = pyr_width[(i / 2) + 2];
+
+ i_map[i] = (saliency_feature_map *)aom_malloc(sizeof(*i_map[i]));
+ if (!i_map[i]) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+      for (int l = 0; l < i; ++l) {
+        aom_free(i_map[l]->buf);
+        aom_free(i_map[l]);
+      }
+ return 0;
+ }
+ i_map[i]->buf =
+ (double *)aom_malloc(cur_height * cur_width * sizeof(*i_map[i]->buf));
+ if (!i_map[i]->buf) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < i; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(i_map[l]);
+ }
+ return 0;
+ }
+ i_map[i]->height = cur_height;
+ i_map[i]->width = cur_width;
+ }
+
+ if (get_feature_map_intensity(intensity, pyr_width, pyr_height, i_map) == 0) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(i_map[l]);
+ }
+ return 0;
+ }
+
+ // RGB map
+ saliency_feature_map *rg_map[6], *by_map[6];
+ for (int i = 0; i < 6; ++i) {
+ int cur_height = pyr_height[(i / 2) + 2];
+ int cur_width = pyr_width[(i / 2) + 2];
+ rg_map[i] = (saliency_feature_map *)aom_malloc(sizeof(*rg_map[i]));
+ by_map[i] = (saliency_feature_map *)aom_malloc(sizeof(*by_map[i]));
+ if (!rg_map[i] || !by_map[i]) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+      for (int l = 0; l < 6; ++l) {
+        aom_free(i_map[l]->buf);
+        aom_free(i_map[l]);
+      }
+      // Only rg_map/by_map entries before index i are fully allocated here.
+      for (int l = 0; l < i; ++l) {
+        aom_free(rg_map[l]->buf);
+        aom_free(by_map[l]->buf);
+        aom_free(rg_map[l]);
+        aom_free(by_map[l]);
+      }
+      aom_free(rg_map[i]);
+      aom_free(by_map[i]);
+ return 0;
+ }
+ rg_map[i]->buf =
+ (double *)aom_malloc(cur_height * cur_width * sizeof(*rg_map[i]->buf));
+ by_map[i]->buf =
+ (double *)aom_malloc(cur_height * cur_width * sizeof(*by_map[i]->buf));
+ if (!by_map[i]->buf || !rg_map[i]->buf) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(i_map[l]);
+ }
+ for (int l = 0; l < i; ++l) {
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+      // One of the two buffers may still have been allocated; aom_free(NULL)
+      // is a no-op, so free both unconditionally.
+      aom_free(rg_map[i]->buf);
+      aom_free(by_map[i]->buf);
+      aom_free(rg_map[i]);
+      aom_free(by_map[i]);
+      return 0;
+ }
+ rg_map[i]->height = cur_height;
+ rg_map[i]->width = cur_width;
+ by_map[i]->height = cur_height;
+ by_map[i]->width = cur_width;
+ }
+
+ if (get_feature_map_rgb(cr, cg, cb, pyr_width, pyr_height, rg_map, by_map) ==
+ 0) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+ return 0;
+ }
+
+ // Orientation map
+ saliency_feature_map *orientation_map[24];
+ for (int i = 0; i < 24; ++i) {
+ int cur_height = pyr_height[((i % 6) / 2) + 2];
+ int cur_width = pyr_width[((i % 6) / 2) + 2];
+
+ orientation_map[i] =
+ (saliency_feature_map *)aom_malloc(sizeof(*orientation_map[i]));
+ if (!orientation_map[i]) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+      for (int h = 0; h < i; ++h) {
+        aom_free(orientation_map[h]->buf);
+        aom_free(orientation_map[h]);
+      }
+ return 0;
+ }
+
+ orientation_map[i]->buf = (double *)aom_malloc(
+ cur_height * cur_width * sizeof(*orientation_map[i]->buf));
+ if (!orientation_map[i]->buf) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+
+      for (int h = 0; h < i; ++h) {
+        aom_free(orientation_map[h]->buf);
+        aom_free(orientation_map[h]);
+      }
+      aom_free(orientation_map[i]);
+ return 0;
+ }
+
+ orientation_map[i]->height = cur_height;
+ orientation_map[i]->width = cur_width;
+ }
+
+ if (get_feature_map_orientation(intensity, pyr_width, pyr_height,
+ orientation_map) == 0) {
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+ for (int h = 0; h < 24; ++h) {
+ aom_free(orientation_map[h]->buf);
+ aom_free(orientation_map[h]);
+ }
+ return 0;
+ }
+
+ aom_free(cr);
+ aom_free(cg);
+ aom_free(cb);
+ aom_free(intensity);
+
+ saliency_feature_map
+ *normalized_maps[3]; // 0: intensity, 1: color, 2: orientation
+
+ for (int i = 0; i < 3; ++i) {
+ normalized_maps[i] = aom_malloc(sizeof(*normalized_maps[i]));
+ if (!normalized_maps[i]) {
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+
+ for (int h = 0; h < 24; ++h) {
+ aom_free(orientation_map[h]->buf);
+ aom_free(orientation_map[h]);
+ }
+
+      for (int l = 0; l < i; ++l) {
+        aom_free(normalized_maps[l]->buf);
+        aom_free(normalized_maps[l]);
+      }
+ return 0;
+ }
+ normalized_maps[i]->buf = (double *)aom_malloc(
+ frm_width * frm_height * sizeof(*normalized_maps[i]->buf));
+ if (!normalized_maps[i]->buf) {
+ for (int l = 0; l < 6; ++l) {
+ aom_free(i_map[l]->buf);
+ aom_free(rg_map[l]->buf);
+ aom_free(by_map[l]->buf);
+ aom_free(i_map[l]);
+ aom_free(rg_map[l]);
+ aom_free(by_map[l]);
+ }
+ for (int h = 0; h < 24; ++h) {
+ aom_free(orientation_map[h]->buf);
+ aom_free(orientation_map[h]);
+ }
+ for (int l = 0; l < i; ++l) {
+ aom_free(normalized_maps[l]->buf);
+ aom_free(normalized_maps[l]);
+ }
+ return 0;
+ }
+ normalized_maps[i]->width = frm_width;
+ normalized_maps[i]->height = frm_height;
+ memset(normalized_maps[i]->buf, 0,
+ frm_width * frm_height * sizeof(*normalized_maps[i]->buf));
+ }
+
+ // Conspicuity map generation
+ if (normalized_map(i_map, pyr_width, pyr_height, normalized_maps[0]) == 0 ||
+ normalized_map_rgb(rg_map, by_map, pyr_width, pyr_height,
+ normalized_maps[1]) == 0 ||
+ normalized_map_orientation(orientation_map, pyr_width, pyr_height,
+ normalized_maps[2]) == 0) {
+ for (int i = 0; i < 6; ++i) {
+ aom_free(i_map[i]->buf);
+ aom_free(rg_map[i]->buf);
+ aom_free(by_map[i]->buf);
+ aom_free(i_map[i]);
+ aom_free(rg_map[i]);
+ aom_free(by_map[i]);
+ }
+
+ for (int i = 0; i < 24; ++i) {
+ aom_free(orientation_map[i]->buf);
+ aom_free(orientation_map[i]);
+ }
+
+ for (int i = 0; i < 3; ++i) {
+ aom_free(normalized_maps[i]->buf);
+ aom_free(normalized_maps[i]);
+ }
+ return 0;
+ }
+
+ for (int i = 0; i < 6; ++i) {
+ aom_free(i_map[i]->buf);
+ aom_free(rg_map[i]->buf);
+ aom_free(by_map[i]->buf);
+ aom_free(i_map[i]);
+ aom_free(rg_map[i]);
+ aom_free(by_map[i]);
+ }
+
+ for (int i = 0; i < 24; ++i) {
+ aom_free(orientation_map[i]->buf);
+ aom_free(orientation_map[i]);
+ }
+
+ // Pixel level saliency map
+ saliency_feature_map *combined_saliency_map =
+ aom_malloc(sizeof(*combined_saliency_map));
+ if (!combined_saliency_map) {
+ for (int i = 0; i < 3; ++i) {
+ aom_free(normalized_maps[i]->buf);
+ aom_free(normalized_maps[i]);
+ }
+ return 0;
+ }
+
+ combined_saliency_map->buf = (double *)aom_malloc(
+ frm_width * frm_height * sizeof(*combined_saliency_map->buf));
+ if (!combined_saliency_map->buf) {
+ for (int i = 0; i < 3; ++i) {
+ aom_free(normalized_maps[i]->buf);
+ aom_free(normalized_maps[i]);
+ }
+
+ aom_free(combined_saliency_map);
+ return 0;
+ }
+ combined_saliency_map->height = frm_height;
+ combined_saliency_map->width = frm_width;
+
+ double w_intensity, w_color, w_orient;
+
+ w_intensity = w_color = w_orient = (double)1 / 3;
+
+ for (int r = 0; r < frm_height; ++r) {
+ for (int c = 0; c < frm_width; ++c) {
+ combined_saliency_map->buf[r * frm_width + c] =
+ (w_intensity * normalized_maps[0]->buf[r * frm_width + c] +
+ w_color * normalized_maps[1]->buf[r * frm_width + c] +
+ w_orient * normalized_maps[2]->buf[r * frm_width + c]);
+ }
+ }
+
+ for (int r = 0; r < frm_height; ++r) {
+ for (int c = 0; c < frm_width; ++c) {
+ int index = r * frm_width + c;
+ cpi->saliency_map[index] =
+ (uint8_t)(combined_saliency_map->buf[index] * 255);
+ }
+ }
+
+ for (int i = 0; i < 3; ++i) {
+ aom_free(normalized_maps[i]->buf);
+ aom_free(normalized_maps[i]);
+ }
+
+ aom_free(combined_saliency_map->buf);
+ aom_free(combined_saliency_map);
+
+ return 1;
+}
+
+// Set superblock level saliency mask for rdmult scaling
+int av1_setup_sm_rdmult_scaling_factor(AV1_COMP *cpi, double motion_ratio) {
+ AV1_COMMON *cm = &cpi->common;
+
+ saliency_feature_map *sb_saliency_map =
+ aom_malloc(sizeof(saliency_feature_map));
+
+ if (sb_saliency_map == NULL) {
+ return 0;
+ }
+
+ const int bsize = cm->seq_params->sb_size;
+ const int num_mi_w = mi_size_wide[bsize];
+ const int num_mi_h = mi_size_high[bsize];
+ const int block_width = block_size_wide[bsize];
+ const int block_height = block_size_high[bsize];
+ const int num_sb_cols = (cm->mi_params.mi_cols + num_mi_w - 1) / num_mi_w;
+ const int num_sb_rows = (cm->mi_params.mi_rows + num_mi_h - 1) / num_mi_h;
+
+ sb_saliency_map->height = num_sb_rows;
+ sb_saliency_map->width = num_sb_cols;
+ sb_saliency_map->buf = (double *)aom_malloc(num_sb_rows * num_sb_cols *
+ sizeof(*sb_saliency_map->buf));
+
+ if (sb_saliency_map->buf == NULL) {
+ aom_free(sb_saliency_map);
+ return 0;
+ }
+
+ for (int row = 0; row < num_sb_rows; ++row) {
+ for (int col = 0; col < num_sb_cols; ++col) {
+ const int index = row * num_sb_cols + col;
+ double total_pixel = 0;
+ double total_weight = 0;
+
+ for (int i = 0; i < block_height; i++) {
+ for (int j = 0; j < block_width; j++) {
+ if ((row * block_height + i) >= cpi->common.height ||
+ (col * block_width + j) >= cpi->common.width)
+ continue;
+ total_pixel++;
+ total_weight +=
+ cpi->saliency_map[(row * block_height + i) * cpi->common.width +
+ col * block_width + j];
+ }
+ }
+
+ assert(total_pixel > 0);
+
+ // Calculate the superblock level saliency map from pixel level saliency
+ // map
+ sb_saliency_map->buf[index] = total_weight / total_pixel;
+
+ // Further lower the superblock saliency score for boundary superblocks.
+ if (row < 1 || row > num_sb_rows - 2 || col < 1 ||
+ col > num_sb_cols - 2) {
+ sb_saliency_map->buf[index] /= 5;
+ }
+ }
+ }
+
+ // superblock level saliency map finalization
+ minmax_normalize(sb_saliency_map);
+
+ double log_sum = 0.0;
+ double sum = 0.0;
+ int block_count = 0;
+
+ // Calculate the average superblock sm_scaling_factor for a frame, to be used
+ // for clamping later.
+ for (int row = 0; row < num_sb_rows; ++row) {
+ for (int col = 0; col < num_sb_cols; ++col) {
+ const int index = row * num_sb_cols + col;
+ const double saliency = sb_saliency_map->buf[index];
+
+ cpi->sm_scaling_factor[index] = 1 - saliency;
+ sum += cpi->sm_scaling_factor[index];
+ block_count++;
+ }
+ }
+ assert(block_count > 0);
+ sum /= block_count;
+
+ // Calculate the geometric mean of superblock sm_scaling_factor for a frame,
+ // to be used for normalization.
+ for (int row = 0; row < num_sb_rows; ++row) {
+ for (int col = 0; col < num_sb_cols; ++col) {
+ const int index = row * num_sb_cols + col;
+ log_sum += log(fmax(cpi->sm_scaling_factor[index], 0.001));
+ cpi->sm_scaling_factor[index] =
+ fmax(cpi->sm_scaling_factor[index], 0.8 * sum);
+ }
+ }
+
+ log_sum = exp(log_sum / block_count);
+
+ // Normalize the sm_scaling_factor by geometric mean.
+ for (int row = 0; row < num_sb_rows; ++row) {
+ for (int col = 0; col < num_sb_cols; ++col) {
+ const int index = row * num_sb_cols + col;
+ assert(log_sum > 0);
+ cpi->sm_scaling_factor[index] /= log_sum;
+
+ // Modulate the sm_scaling_factor by frame basis motion factor
+ cpi->sm_scaling_factor[index] =
+ cpi->sm_scaling_factor[index] * motion_ratio;
+ }
+ }
+
+ aom_free(sb_saliency_map->buf);
+ aom_free(sb_saliency_map);
+ return 1;
+}
+
+// av1_setup_motion_ratio() is only enabled when CONFIG_REALTIME_ONLY is 0,
+// because the computations need to access the first pass stats which are
+// only available when CONFIG_REALTIME_ONLY is equal to 0.
+#if !CONFIG_REALTIME_ONLY
+// Set motion_ratio to reflect the amount of motion between two consecutive
+// frames. motion_ratio is used to set up the saliency_map based rdmult
+// scaling factor, i.e., the less motion there is, the more bits will be
+// spent on this frame, and vice versa.
+double av1_setup_motion_ratio(AV1_COMP *cpi) {
+ AV1_COMMON *cm = &cpi->common;
+ int frames_since_key =
+ cm->current_frame.display_order_hint - cpi->rc.frames_since_key;
+ const FIRSTPASS_STATS *cur_stats = av1_firstpass_info_peek(
+ &cpi->ppi->twopass.firstpass_info, frames_since_key);
+ assert(cur_stats != NULL);
+ assert(cpi->ppi->twopass.firstpass_info.total_stats.count > 0);
+
+ const double avg_intra_error =
+ exp(cpi->ppi->twopass.firstpass_info.total_stats.log_intra_error /
+ cpi->ppi->twopass.firstpass_info.total_stats.count);
+ const double avg_inter_error =
+ exp(cpi->ppi->twopass.firstpass_info.total_stats.log_coded_error /
+ cpi->ppi->twopass.firstpass_info.total_stats.count);
+
+ double inter_error = cur_stats->coded_error;
+ double error_stdev = 0;
+ const double avg_error =
+ cpi->ppi->twopass.firstpass_info.total_stats.intra_error /
+ cpi->ppi->twopass.firstpass_info.total_stats.count;
+ for (int i = 0; i < cpi->ppi->twopass.firstpass_info.total_stats.count; i++) {
+ const FIRSTPASS_STATS *stats =
+ &cpi->ppi->twopass.firstpass_info.stats_buf[i];
+ error_stdev +=
+ (stats->intra_error - avg_error) * (stats->intra_error - avg_error);
+ }
+ error_stdev =
+ sqrt(error_stdev / cpi->ppi->twopass.firstpass_info.total_stats.count);
+
+ double motion_ratio = 1;
+ if (error_stdev / fmax(avg_intra_error, 1) > 0.1) {
+ motion_ratio = inter_error / fmax(1, avg_inter_error);
+ motion_ratio = AOMMIN(motion_ratio, 1.5);
+ motion_ratio = AOMMAX(motion_ratio, 0.8);
+ }
+
+ return motion_ratio;
+}
+#endif // !CONFIG_REALTIME_ONLY
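
Stripped of the allocation and cleanup code, the scaling-factor math in av1_setup_sm_rdmult_scaling_factor() above reduces to a short chain. The following is a minimal standalone sketch of just that math, assuming min-max-normalized superblock saliency values in [0, 1]; the function and argument names are illustrative, not part of the patch:

#include <math.h>

// Sketch of the normalization chain: factor = 1 - saliency, clamp low
// factors to 0.8x the frame mean, normalize by the pre-clamp geometric
// mean, then modulate by the frame-level motion ratio (clamped to
// [0.8, 1.5] by av1_setup_motion_ratio()).
static void sm_scaling_sketch(const double *saliency, double *factor,
                              int num_sb, double motion_ratio) {
  double sum = 0.0, log_sum = 0.0;
  for (int i = 0; i < num_sb; ++i) {
    factor[i] = 1.0 - saliency[i];
    sum += factor[i];
  }
  sum /= num_sb;  // arithmetic mean, used as a clamping floor
  for (int i = 0; i < num_sb; ++i) {
    log_sum += log(fmax(factor[i], 0.001));  // log taken before clamping
    factor[i] = fmax(factor[i], 0.8 * sum);
  }
  const double geo_mean = exp(log_sum / num_sb);
  for (int i = 0; i < num_sb; ++i)
    factor[i] = factor[i] / geo_mean * motion_ratio;
}
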
diff --git a/av1/encoder/saliency_map.h b/av1/encoder/saliency_map.h
new file mode 100644
index 0000000..0d27f83
--- /dev/null
+++ b/av1/encoder/saliency_map.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#ifndef AOM_AV1_ENCODER_SALIENCY_MAP_H_
+#define AOM_AV1_ENCODER_SALIENCY_MAP_H_
+#include "av1/encoder/encoder.h"
+
+typedef struct saliency_feature_map {
+ double *buf; // stores values of the map in 1D array
+ int height;
+ int width;
+} saliency_feature_map;
+
+int av1_set_saliency_map(AV1_COMP *cpi);
+#if !CONFIG_REALTIME_ONLY
+double av1_setup_motion_ratio(AV1_COMP *cpi);
+#endif
+int av1_setup_sm_rdmult_scaling_factor(AV1_COMP *cpi, double motion_ratio);
+
+#endif // AOM_AV1_ENCODER_SALIENCY_MAP_H_
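
Taken together, the three declarations imply a pipeline: pixel-level saliency first, then an optional two-pass motion factor, then the superblock scaling factors. Below is a hedged sketch of that wiring, inferred only from the data dependencies visible in this patch (run_saliency_rdmult_setup is an illustrative name; the encoder's actual call site is not part of this diff):

// av1_set_saliency_map() must run first, since
// av1_setup_sm_rdmult_scaling_factor() reads cpi->saliency_map.
static int run_saliency_rdmult_setup(AV1_COMP *cpi) {
  if (!av1_set_saliency_map(cpi)) return 0;
  double motion_ratio = 1.0;
#if !CONFIG_REALTIME_ONLY
  motion_ratio = av1_setup_motion_ratio(cpi);
#endif
  return av1_setup_sm_rdmult_scaling_factor(cpi, motion_ratio);
}
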
diff --git a/av1/encoder/speed_features.c b/av1/encoder/speed_features.c
index 9fbfbd7..19723a4 100644
--- a/av1/encoder/speed_features.c
+++ b/av1/encoder/speed_features.c
@@ -168,6 +168,14 @@
return frame_is_kf_gf_arf(cpi);
}
+// Set transform rd gate level for all transform search cases.
+static AOM_INLINE void set_txfm_rd_gate_level(
+ int txfm_rd_gate_level[TX_SEARCH_CASES], int level) {
+ assert(level <= MAX_TX_RD_GATE_LEVEL);
+ for (int idx = 0; idx < TX_SEARCH_CASES; idx++)
+ txfm_rd_gate_level[idx] = level;
+}
+
static void set_allintra_speed_feature_framesize_dependent(
const AV1_COMP *const cpi, SPEED_FEATURES *const sf, int speed) {
const AV1_COMMON *const cm = &cpi->common;
@@ -206,7 +214,7 @@
if (is_720p_or_larger) {
// TODO([email protected]): make this speed feature adaptive based on
// current block's vertical texture instead of hardcoded with resolution
- sf->mv_sf.use_downsampled_sad = 1;
+ sf->mv_sf.use_downsampled_sad = 2;
}
if (speed >= 1) {
@@ -309,6 +317,11 @@
if (speed >= 9) {
// TODO(kyslov): add more speed features to control speed/quality
if (!is_4k_or_larger) {
+ // In av1_select_sb_size(), superblock size is set to 64x64 only for
+ // resolutions less than 4k in speed>=9, to improve the multithread
+ // performance. If cost update levels are set to INTERNAL_COST_UPD_OFF
+ // for resolutions >= 4k, the SB size setting can be modified for these
+ // resolutions as well.
sf->inter_sf.coeff_cost_upd_level = INTERNAL_COST_UPD_OFF;
sf->inter_sf.mode_cost_upd_level = INTERNAL_COST_UPD_OFF;
}
@@ -422,6 +435,7 @@
sf->tx_sf.adaptive_txb_search_level = 2;
sf->tx_sf.tx_type_search.use_skip_flag_prediction = 2;
+ sf->tx_sf.use_rd_based_breakout_for_intra_tx_search = true;
// TODO(any): evaluate if these lpf features can be moved to speed 2.
  // For screen content, "prune_sgr_based_on_wiener = 2" causes large quality
@@ -478,7 +492,9 @@
sf->intra_sf.chroma_intra_pruning_with_hog = 3;
sf->lpf_sf.use_coarse_filter_level_search = 0;
- sf->lpf_sf.disable_lr_filter = 1;
+ // Disable Wiener and Self-guided Loop restoration filters.
+ sf->lpf_sf.disable_wiener_filter = true;
+ sf->lpf_sf.disable_sgr_filter = true;
sf->mv_sf.prune_mesh_search = PRUNE_MESH_SEARCH_LVL_2;
@@ -497,6 +513,7 @@
sf->part_sf.prune_rectangular_split_based_on_qidx =
allow_screen_content_tools ? 0 : 2;
+ sf->part_sf.prune_rect_part_using_4x4_var_deviation = true;
sf->part_sf.prune_sub_8x8_partition_level =
allow_screen_content_tools ? 0 : 1;
sf->part_sf.prune_part4_search = 3;
@@ -555,6 +572,8 @@
sf->rt_sf.var_part_split_threshold_shift = 9;
sf->rt_sf.vbp_prune_16x16_split_using_min_max_sub_blk_var = true;
sf->rt_sf.prune_h_pred_using_best_mode_so_far = true;
+ sf->rt_sf.enable_intra_mode_pruning_using_neighbors = true;
+ sf->rt_sf.prune_intra_mode_using_best_sad_so_far = true;
}
// As the speed feature prune_chroma_modes_using_luma_winner already
@@ -576,12 +595,21 @@
const int is_1080p_or_larger = AOMMIN(cm->width, cm->height) >= 1080;
const int is_4k_or_larger = AOMMIN(cm->width, cm->height) >= 2160;
const bool use_hbd = cpi->oxcf.use_highbitdepth;
+ // Speed features applicable for temporal filtering and tpl modules may be
+ // changed based on frame type at places where the sf is applied (Example :
+ // use_downsampled_sad). This is because temporal filtering and tpl modules
+ // are called before this function (except for the first key frame).
+ // TODO([email protected]): For the speed features applicable to temporal
+ // filtering and tpl modules, modify the sf initialization appropriately
+ // before calling the modules.
const int boosted = frame_is_boosted(cpi);
const int is_boosted_arf2_bwd_type =
boosted ||
cpi->ppi->gf_group.update_type[cpi->gf_frame_index] == INTNL_ARF_UPDATE;
const int is_lf_frame =
cpi->ppi->gf_group.update_type[cpi->gf_frame_index] == LF_UPDATE;
+ const int allow_screen_content_tools =
+ cm->features.allow_screen_content_tools;
if (is_480p_or_larger) {
sf->part_sf.use_square_partition_only_threshold = BLOCK_128X128;
@@ -612,7 +640,7 @@
if (is_720p_or_larger) {
// TODO([email protected]): make this speed feature adaptive based on
// current block's vertical texture instead of hardcoded with resolution
- sf->mv_sf.use_downsampled_sad = 1;
+ sf->mv_sf.use_downsampled_sad = 2;
}
if (!is_720p_or_larger) {
@@ -766,6 +794,8 @@
if (is_480p_or_larger) {
sf->tx_sf.tx_type_search.prune_tx_type_using_stats = 2;
+ } else {
+ sf->mv_sf.skip_fullpel_search_using_startmv = boosted ? 0 : 1;
}
sf->inter_sf.disable_interinter_wedge_var_thresh = UINT_MAX;
@@ -799,14 +829,16 @@
sf->inter_sf.skip_newmv_in_drl = 4;
sf->inter_sf.prune_comp_ref_frames = 1;
+ sf->mv_sf.skip_fullpel_search_using_startmv = boosted ? 0 : 1;
if (!is_720p_or_larger) {
sf->inter_sf.mv_cost_upd_level = INTERNAL_COST_UPD_SBROW_SET;
+ sf->inter_sf.prune_nearest_near_mv_using_refmv_weight =
+ (boosted || allow_screen_content_tools) ? 0 : 1;
+ sf->mv_sf.use_downsampled_sad = 1;
}
if (!is_480p_or_larger) {
- sf->tx_sf.tx_type_search.fast_inter_tx_type_prob_thresh =
- boosted ? INT_MAX : 250;
sf->part_sf.partition_search_breakout_dist_thr = (1 << 26);
}
@@ -821,6 +853,10 @@
sf->tx_sf.tx_type_search.winner_mode_tx_type_pruning = 4;
sf->inter_sf.prune_nearmv_using_neighbors = PRUNE_NEARMV_LEVEL3;
sf->inter_sf.prune_comp_ref_frames = 2;
+ sf->inter_sf.prune_nearest_near_mv_using_refmv_weight =
+ (boosted || allow_screen_content_tools) ? 0 : 1;
+ sf->mv_sf.skip_fullpel_search_using_startmv = boosted ? 0 : 2;
+
if (is_720p_or_larger) {
sf->part_sf.auto_max_partition_based_on_simple_motion = NOT_IN_USE;
} else if (is_480p_or_larger) {
@@ -855,12 +891,13 @@
}
if (!is_720p_or_larger) {
- sf->tx_sf.tx_type_search.fast_inter_tx_type_prob_thresh = 150;
+ sf->tx_sf.tx_type_search.fast_inter_tx_type_prob_thresh =
+ is_boosted_arf2_bwd_type ? 450 : 150;
}
sf->lpf_sf.cdef_pick_method = CDEF_FAST_SEARCH_LVL4;
- if (!is_480p_or_larger) sf->hl_sf.num_frames_used_in_tf = 3;
+ sf->hl_sf.recode_tolerance = 55;
}
}
@@ -881,7 +918,10 @@
}
// Speed 0 for all speed features that give neutral coding performance change.
- sf->gm_sf.gm_search_type = GM_REDUCED_REF_SEARCH_SKIP_L2_L3;
+ sf->gm_sf.gm_search_type = boosted ? GM_REDUCED_REF_SEARCH_SKIP_L2_L3_ARF2
+ : GM_SEARCH_CLOSEST_REFS_ONLY;
+ sf->gm_sf.prune_ref_frame_for_gm_search = boosted ? 0 : 1;
+ sf->gm_sf.disable_gm_search_based_on_stats = 1;
sf->part_sf.less_rectangular_check_level = 1;
sf->part_sf.ml_prune_partition = 1;
@@ -932,9 +972,6 @@
sf->hl_sf.superres_auto_search_type = SUPERRES_AUTO_DUAL;
if (speed >= 1) {
- sf->gm_sf.gm_search_type = GM_REDUCED_REF_SEARCH_SKIP_L2_L3_ARF2;
- sf->gm_sf.prune_ref_frame_for_gm_search = boosted ? 0 : 1;
-
sf->part_sf.intra_cnn_based_part_prune_level =
allow_screen_content_tools ? 0 : 2;
sf->part_sf.simple_motion_search_early_term_none = 1;
@@ -990,7 +1027,7 @@
sf->fp_sf.skip_motion_search_threshold = 25;
- sf->gm_sf.disable_gm_search_based_on_stats = 1;
+ sf->gm_sf.num_refinement_steps = 2;
sf->part_sf.reuse_best_prediction_for_part_ab =
!frame_is_intra_only(&cpi->common);
@@ -1012,10 +1049,9 @@
sf->inter_sf.prune_comp_type_by_comp_avg = 2;
sf->inter_sf.selective_ref_frame = 3;
sf->inter_sf.use_dist_wtd_comp_flag = DIST_WTD_COMP_DISABLED;
- // Enable fast search only for COMPOUND_DIFFWTD type.
sf->inter_sf.enable_fast_compound_mode_search = 1;
sf->inter_sf.reuse_mask_search_results = 1;
- sf->inter_sf.txfm_rd_gate_level = boosted ? 0 : 1;
+ set_txfm_rd_gate_level(sf->inter_sf.txfm_rd_gate_level, boosted ? 0 : 1);
sf->inter_sf.inter_mode_txfm_breakout = boosted ? 0 : 1;
sf->inter_sf.alt_ref_search_fp = 1;
@@ -1047,8 +1083,9 @@
if (speed >= 3) {
sf->hl_sf.high_precision_mv_usage = CURRENT_Q;
- sf->gm_sf.gm_search_type = GM_DISABLE_SEARCH;
+ sf->gm_sf.prune_ref_frame_for_gm_search = 1;
sf->gm_sf.prune_zero_mv_with_sse = 1;
+ sf->gm_sf.num_refinement_steps = 0;
sf->part_sf.less_rectangular_check_level = 2;
sf->part_sf.simple_motion_search_prune_agg =
@@ -1074,10 +1111,9 @@
sf->inter_sf.prune_inter_modes_based_on_tpl = boosted ? 0 : 1;
sf->inter_sf.prune_comp_search_by_single_result = boosted ? 4 : 2;
sf->inter_sf.selective_ref_frame = 5;
- sf->inter_sf.skip_repeated_ref_mv = 1;
sf->inter_sf.reuse_compound_type_decision = 1;
- sf->inter_sf.txfm_rd_gate_level =
- boosted ? 0 : (is_boosted_arf2_bwd_type ? 1 : 2);
+ set_txfm_rd_gate_level(sf->inter_sf.txfm_rd_gate_level,
+ boosted ? 0 : (is_boosted_arf2_bwd_type ? 1 : 2));
sf->inter_sf.inter_mode_txfm_breakout = boosted ? 0 : 2;
sf->interp_sf.adaptive_interp_filter_search = 2;
@@ -1121,10 +1157,10 @@
}
if (speed >= 4) {
- sf->gm_sf.prune_zero_mv_with_sse = 2;
-
sf->mv_sf.subpel_search_method = SUBPEL_TREE_PRUNED_MORE;
+ sf->gm_sf.prune_zero_mv_with_sse = 2;
+
sf->part_sf.simple_motion_search_prune_agg =
allow_screen_content_tools ? SIMPLE_AGG_LVL0 : SIMPLE_AGG_LVL2;
sf->part_sf.simple_motion_search_reduce_search_steps = 4;
@@ -1135,7 +1171,8 @@
: 1;
sf->inter_sf.alt_ref_search_fp = 2;
- sf->inter_sf.txfm_rd_gate_level = boosted ? 0 : 3;
+ sf->inter_sf.txfm_rd_gate_level[TX_SEARCH_DEFAULT] = boosted ? 0 : 3;
+ sf->inter_sf.txfm_rd_gate_level[TX_SEARCH_MOTION_MODE] = boosted ? 0 : 5;
sf->inter_sf.prune_inter_modes_based_on_tpl = boosted ? 0 : 2;
sf->inter_sf.prune_ext_comp_using_neighbors = 2;
@@ -1175,8 +1212,12 @@
}
if (speed >= 5) {
+ sf->hl_sf.weight_calc_level_in_tf = 1;
+
sf->fp_sf.reduce_mv_step_param = 4;
+ sf->gm_sf.gm_search_type = GM_DISABLE_SEARCH;
+
sf->part_sf.simple_motion_search_prune_agg =
allow_screen_content_tools ? SIMPLE_AGG_LVL0 : SIMPLE_AGG_LVL3;
sf->part_sf.ext_partition_eval_thresh =
@@ -1185,9 +1226,10 @@
(allow_screen_content_tools || frame_is_intra_only(&cpi->common)) ? 0
: 2;
+ sf->mv_sf.warp_search_method = WARP_SEARCH_DIAMOND;
+
sf->inter_sf.prune_inter_modes_if_skippable = 1;
- sf->inter_sf.txfm_rd_gate_level = boosted ? 0 : 4;
- // Enable fast search for all valid compound modes.
+ sf->inter_sf.txfm_rd_gate_level[TX_SEARCH_DEFAULT] = boosted ? 0 : 4;
sf->inter_sf.enable_fast_compound_mode_search = 2;
sf->intra_sf.chroma_intra_pruning_with_hog = 3;
@@ -1197,7 +1239,9 @@
frame_is_intra_only(&cpi->common) ? MULTI_WINNER_MODE_FAST
: MULTI_WINNER_MODE_OFF;
- sf->lpf_sf.disable_lr_filter = 1;
+ // Disable Self-guided Loop restoration filter.
+ sf->lpf_sf.disable_sgr_filter = true;
+ sf->lpf_sf.disable_wiener_coeff_refine_search = true;
sf->tpl_sf.prune_starting_mv = 3;
sf->tpl_sf.use_y_only_rate_distortion = 1;
@@ -1212,7 +1256,8 @@
if (speed >= 6) {
sf->hl_sf.disable_extra_sc_testing = 1;
sf->hl_sf.second_alt_ref_filtering = 0;
- sf->hl_sf.recode_tolerance = 55;
+ sf->hl_sf.adjust_num_frames_for_arf_filtering =
+ allow_screen_content_tools ? 0 : 1;
sf->inter_sf.prune_inter_modes_based_on_tpl = boosted ? 0 : 3;
sf->inter_sf.selective_ref_frame = 6;
@@ -1236,10 +1281,8 @@
sf->mv_sf.simple_motion_subpel_force_stop = FULL_PEL;
sf->mv_sf.use_bsize_dependent_search_method = 1;
- sf->mv_sf.skip_fullpel_search_using_startmv = boosted ? 0 : 1;
sf->tpl_sf.gop_length_decision_method = 3;
- sf->tpl_sf.disable_filtered_key_tpl = 1;
sf->rd_sf.perform_coeff_opt = is_boosted_arf2_bwd_type ? 6 : 8;
@@ -1300,6 +1343,10 @@
sf->rt_sf.use_adaptive_subpel_search = false;
}
if (speed >= 10) {
+ // TODO([email protected]): To be conservative, disable
+ // sf->rt_sf.estimate_motion_for_var_based_partition = 3 for speed 10/qvga
+ // for now. May enable it in the future.
+ sf->rt_sf.estimate_motion_for_var_based_partition = 0;
sf->rt_sf.skip_intra_pred = 2;
sf->rt_sf.hybrid_intra_pickmode = 3;
sf->rt_sf.reduce_mv_pel_precision_lowcomplex = 1;
@@ -1352,12 +1399,6 @@
if (speed == 7) {
sf->rt_sf.nonrd_check_partition_merge_mode = 2;
}
- if (speed >= 8) {
- sf->rt_sf.estimate_motion_for_var_based_partition = 1;
- }
- if (speed >= 9) {
- sf->rt_sf.estimate_motion_for_var_based_partition = 0;
- }
}
if (!is_720p_or_larger) {
if (speed >= 9) {
@@ -1399,18 +1440,22 @@
// For SVC: for greater than 2 temporal layers, use better mv search on
// base temporal layers, and only on base spatial layer if highest
// resolution is above 640x360.
- if (cpi->svc.number_temporal_layers > 2 &&
+ if (cpi->svc.number_temporal_layers >= 2 &&
cpi->svc.temporal_layer_id == 0 &&
(cpi->svc.spatial_layer_id == 0 ||
cpi->oxcf.frm_dim_cfg.width * cpi->oxcf.frm_dim_cfg.height <=
640 * 360)) {
sf->mv_sf.search_method = NSTEP;
- sf->mv_sf.subpel_search_method = SUBPEL_TREE;
- sf->rt_sf.fullpel_search_step_param = 6;
+ sf->mv_sf.subpel_search_method = SUBPEL_TREE_PRUNED;
+ sf->rt_sf.fullpel_search_step_param = 10;
sf->rt_sf.reduce_mv_pel_precision_highmotion = 0;
+ if (cm->width * cm->height <= 352 * 288)
+ sf->rt_sf.nonrd_prune_ref_frame_search = 2;
+ sf->rt_sf.force_large_partition_blocks_intra = 0;
}
if (speed >= 8) {
- sf->rt_sf.disable_cdf_update_non_reference_frame = true;
+ if (cpi->svc.number_temporal_layers > 2)
+ sf->rt_sf.disable_cdf_update_non_reference_frame = true;
sf->rt_sf.reduce_mv_pel_precision_highmotion = 3;
if (rtc_ref->non_reference_frame) {
sf->rt_sf.nonrd_aggressive_skip = 1;
@@ -1422,6 +1467,8 @@
sf->rt_sf.check_only_zero_zeromv_on_large_blocks = false;
else
sf->rt_sf.check_only_zero_zeromv_on_large_blocks = true;
+ sf->rt_sf.frame_level_mode_cost_update = false;
+
// Compound mode enabling.
if (rtc_ref->ref_frame_comp[0] || rtc_ref->ref_frame_comp[1] ||
rtc_ref->ref_frame_comp[2]) {
@@ -1439,6 +1486,20 @@
if (cpi->svc.number_spatial_layers > 1 ||
cpi->svc.number_temporal_layers > 1)
sf->hl_sf.accurate_bit_estimate = 0;
+
+ // TODO([email protected]): test to see if
+ // estimate_motion_for_var_based_partition == 2 helps here.
+ if (sf->rt_sf.estimate_motion_for_var_based_partition == 2)
+ sf->rt_sf.estimate_motion_for_var_based_partition = 1;
+ if (speed >= 9) sf->rt_sf.estimate_motion_for_var_based_partition = 0;
+
+ // For single layers RPS: bias/adjustment for recovery frame.
+ if (cpi->ppi->rtc_ref.bias_recovery_frame) {
+ sf->mv_sf.search_method = NSTEP;
+ sf->mv_sf.subpel_search_method = SUBPEL_TREE;
+ sf->rt_sf.fullpel_search_step_param = 8;
+ sf->rt_sf.nonrd_aggressive_skip = 0;
+ }
}
// Screen settings.
if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN) {
@@ -1446,6 +1507,7 @@
if (speed >= 7) {
sf->rt_sf.reduce_mv_pel_precision_highmotion = 1;
sf->mv_sf.use_bsize_dependent_search_method = 0;
+ sf->rt_sf.skip_cdef_sb = 1;
}
if (speed >= 8) {
sf->rt_sf.nonrd_check_partition_merge_mode = 3;
@@ -1469,13 +1531,18 @@
if (speed >= 10) {
if (cm->width * cm->height > 1920 * 1080)
sf->part_sf.disable_8x8_part_based_on_qidx = 1;
- sf->rt_sf.set_zeromv_skip_based_on_source_sad = 2;
sf->rt_sf.screen_content_cdef_filter_qindex_thresh = 80;
sf->rt_sf.part_early_exit_zeromv = 1;
sf->rt_sf.nonrd_aggressive_skip = 1;
}
+ if (speed >= 11) {
+ sf->rt_sf.skip_lf_screen = 2;
+ sf->rt_sf.skip_cdef_sb = 2;
+ sf->rt_sf.part_early_exit_zeromv = 2;
+ sf->rt_sf.prune_palette_nonrd = 1;
+ sf->rt_sf.set_zeromv_skip_based_on_source_sad = 2;
+ }
sf->rt_sf.use_nonrd_altref_frame = 0;
- sf->rt_sf.skip_cdef_sb = 1;
sf->rt_sf.use_rtc_tf = 0;
sf->rt_sf.use_comp_ref_nonrd = 0;
sf->rt_sf.source_metrics_sb_nonrd = 1;
@@ -1497,6 +1564,18 @@
}
sf->rt_sf.partition_direct_merging = 0;
sf->hl_sf.accurate_bit_estimate = 0;
+
+ // "sf->rt_sf.estimate_motion_for_var_based_partition = 2" doesn't work well
+    // for screen content.
+ if (sf->rt_sf.estimate_motion_for_var_based_partition == 2)
+ sf->rt_sf.estimate_motion_for_var_based_partition = 1;
+ if (speed >= 9) sf->rt_sf.estimate_motion_for_var_based_partition = 0;
+ }
+ if (is_lossless_requested(&cpi->oxcf.rc_cfg)) {
+ sf->rt_sf.use_rtc_tf = 0;
+ // TODO(aomedia:3412): The setting accurate_bit_estimate = 0
+ // can be removed once it's fixed for lossless mode.
+ sf->hl_sf.accurate_bit_estimate = 0;
}
}
@@ -1532,7 +1611,9 @@
sf->intra_sf.dv_cost_upd_level = INTERNAL_COST_UPD_OFF;
sf->tx_sf.model_based_prune_tx_search_level = 0;
sf->lpf_sf.dual_sgr_penalty_level = 1;
- sf->lpf_sf.disable_lr_filter = 1;
+ // Disable Wiener and Self-guided Loop restoration filters.
+ sf->lpf_sf.disable_wiener_filter = true;
+ sf->lpf_sf.disable_sgr_filter = true;
sf->rt_sf.skip_interp_filter_search = 1;
sf->intra_sf.prune_palette_search_level = 2;
sf->intra_sf.prune_luma_palette_size_search_level = 2;
@@ -1557,7 +1638,7 @@
sf->inter_sf.disable_interintra_wedge_var_thresh = UINT_MAX;
sf->inter_sf.selective_ref_frame = 4;
sf->inter_sf.alt_ref_search_fp = 2;
- sf->inter_sf.txfm_rd_gate_level = boosted ? 0 : 4;
+ set_txfm_rd_gate_level(sf->inter_sf.txfm_rd_gate_level, boosted ? 0 : 4);
sf->inter_sf.limit_txfm_eval_per_mode = 3;
sf->inter_sf.adaptive_rd_thresh = 4;
@@ -1628,7 +1709,7 @@
sf->winner_mode_sf.winner_mode_ifs = 1;
sf->rt_sf.check_intra_pred_nonrd = 1;
- sf->rt_sf.estimate_motion_for_var_based_partition = 1;
+ sf->rt_sf.estimate_motion_for_var_based_partition = 2;
sf->rt_sf.hybrid_intra_pickmode = 1;
sf->rt_sf.use_comp_ref_nonrd = 0;
sf->rt_sf.ref_frame_comp_nonrd[0] = 0;
@@ -1754,7 +1835,6 @@
if (speed >= 8) {
sf->rt_sf.sse_early_term_inter_search = EARLY_TERM_IDX_2;
sf->intra_sf.intra_pruning_with_hog = 1;
- sf->rt_sf.estimate_motion_for_var_based_partition = 1;
sf->rt_sf.short_circuit_low_temp_var = 1;
sf->rt_sf.use_nonrd_altref_frame = 0;
sf->rt_sf.nonrd_prune_ref_frame_search = 2;
@@ -1768,7 +1848,7 @@
}
if (speed >= 9) {
sf->rt_sf.sse_early_term_inter_search = EARLY_TERM_IDX_3;
- sf->rt_sf.estimate_motion_for_var_based_partition = 0;
+ sf->rt_sf.estimate_motion_for_var_based_partition = 3;
sf->rt_sf.prefer_large_partition_blocks = 3;
sf->rt_sf.skip_intra_pred = 2;
sf->rt_sf.var_part_split_threshold_shift = 9;
@@ -1799,8 +1879,9 @@
hl_sf->superres_auto_search_type = SUPERRES_AUTO_ALL;
hl_sf->disable_extra_sc_testing = 0;
hl_sf->second_alt_ref_filtering = 1;
- hl_sf->num_frames_used_in_tf = INT_MAX;
+ hl_sf->adjust_num_frames_for_arf_filtering = 0;
hl_sf->accurate_bit_estimate = 0;
+ hl_sf->weight_calc_level_in_tf = 0;
}
static AOM_INLINE void init_fp_sf(FIRST_PASS_SPEED_FEATURES *fp_sf) {
@@ -1818,7 +1899,6 @@
tpl_sf->skip_alike_starting_mv = 0;
tpl_sf->subpel_force_stop = EIGHTH_PEL;
tpl_sf->search_method = NSTEP;
- tpl_sf->disable_filtered_key_tpl = 0;
tpl_sf->prune_ref_frames_in_tpl = 0;
tpl_sf->allow_compound_pred = 1;
tpl_sf->use_y_only_rate_distortion = 0;
@@ -1829,6 +1909,7 @@
gm_sf->prune_ref_frame_for_gm_search = 0;
gm_sf->prune_zero_mv_with_sse = 0;
gm_sf->disable_gm_search_based_on_stats = 0;
+ gm_sf->num_refinement_steps = GM_MAX_REFINEMENT_STEPS;
}
static AOM_INLINE void init_part_sf(PARTITION_SPEED_FEATURES *part_sf) {
@@ -1864,6 +1945,7 @@
part_sf->rect_partition_eval_thresh = BLOCK_128X128;
part_sf->prune_ext_part_using_split_info = 0;
part_sf->prune_rectangular_split_based_on_qidx = 0;
+ part_sf->prune_rect_part_using_4x4_var_deviation = false;
part_sf->early_term_after_none_split = 0;
part_sf->ml_predict_breakout_level = 0;
part_sf->prune_sub_8x8_partition_level = 0;
@@ -1894,6 +1976,8 @@
mv_sf->disable_extensive_joint_motion_search = 0;
mv_sf->disable_second_mv = 0;
mv_sf->skip_fullpel_search_using_startmv = 0;
+ mv_sf->warp_search_method = WARP_SEARCH_SQUARE;
+ mv_sf->warp_search_iters = 8;
}
static AOM_INLINE void init_inter_sf(INTER_MODE_SPEED_FEATURES *inter_sf) {
@@ -1934,7 +2018,6 @@
inter_sf->prune_ref_mv_idx_search = 0;
inter_sf->prune_warped_prob_thresh = 0;
inter_sf->reuse_compound_type_decision = 0;
- inter_sf->txfm_rd_gate_level = 0;
inter_sf->prune_inter_modes_if_skippable = 0;
inter_sf->disable_masked_comp = 0;
inter_sf->enable_fast_compound_mode_search = 0;
@@ -1944,6 +2027,7 @@
inter_sf->limit_inter_mode_cands = 0;
inter_sf->limit_txfm_eval_per_mode = 0;
inter_sf->skip_arf_compound = 0;
+ set_txfm_rd_gate_level(inter_sf->txfm_rd_gate_level, 0);
}
static AOM_INLINE void init_interp_sf(INTERP_FILTER_SPEED_FEATURES *interp_sf) {
@@ -2001,6 +2085,7 @@
tx_sf->refine_fast_tx_search_results = 1;
tx_sf->prune_tx_size_level = 0;
tx_sf->prune_intra_tx_depths_using_nn = false;
+ tx_sf->use_rd_based_breakout_for_intra_tx_search = false;
}
static AOM_INLINE void init_rd_sf(RD_CALC_SPEED_FEATURES *rd_sf,
@@ -2058,7 +2143,10 @@
lpf_sf->cdef_pick_method = CDEF_FULL_SEARCH;
// Set decoder side speed feature to use less dual sgr modes
lpf_sf->dual_sgr_penalty_level = 0;
- lpf_sf->disable_lr_filter = 0;
+ // Enable Wiener and Self-guided Loop restoration filters by default.
+ lpf_sf->disable_wiener_filter = false;
+ lpf_sf->disable_sgr_filter = false;
+ lpf_sf->disable_wiener_coeff_refine_search = false;
lpf_sf->use_downsampled_wiener_stats = 0;
}
@@ -2106,6 +2194,7 @@
rt_sf->gf_refresh_based_on_qp = 0;
rt_sf->use_rtc_tf = 0;
rt_sf->prune_idtx_nonrd = 0;
+ rt_sf->prune_palette_nonrd = 0;
rt_sf->part_early_exit_zeromv = 0;
rt_sf->sse_early_term_inter_search = EARLY_TERM_DISABLED;
rt_sf->skip_lf_screen = 0;
@@ -2117,6 +2206,8 @@
rt_sf->prune_compoundmode_with_singlecompound_var = false;
rt_sf->frame_level_mode_cost_update = false;
rt_sf->prune_h_pred_using_best_mode_so_far = false;
+ rt_sf->enable_intra_mode_pruning_using_neighbors = false;
+ rt_sf->prune_intra_mode_using_best_sad_so_far = false;
rt_sf->check_only_zero_zeromv_on_large_blocks = false;
rt_sf->disable_cdf_update_non_reference_frame = false;
rt_sf->prune_compoundmode_with_singlemode_var = false;
@@ -2128,23 +2219,22 @@
rt_sf->check_globalmv_on_single_ref = true;
}
+static fractional_mv_step_fp
+ *const fractional_mv_search[SUBPEL_SEARCH_METHODS] = {
+ av1_find_best_sub_pixel_tree, // SUBPEL_TREE = 0
+ av1_find_best_sub_pixel_tree_pruned, // SUBPEL_TREE_PRUNED = 1
+ av1_find_best_sub_pixel_tree_pruned_more // SUBPEL_TREE_PRUNED_MORE = 2
+ };
+
// Populate appropriate sub-pel search method based on speed feature and user
// specified settings
static void set_subpel_search_method(
MotionVectorSearchParams *mv_search_params,
unsigned int motion_vector_unit_test,
- SUBPEL_SEARCH_METHODS subpel_search_method) {
- if (subpel_search_method == SUBPEL_TREE) {
- mv_search_params->find_fractional_mv_step = av1_find_best_sub_pixel_tree;
- } else if (subpel_search_method == SUBPEL_TREE_PRUNED) {
- mv_search_params->find_fractional_mv_step =
- av1_find_best_sub_pixel_tree_pruned;
- } else if (subpel_search_method == SUBPEL_TREE_PRUNED_MORE) {
- mv_search_params->find_fractional_mv_step =
- av1_find_best_sub_pixel_tree_pruned_more;
- } else {
- assert(0);
- }
+ SUBPEL_SEARCH_METHOD subpel_search_method) {
+ assert(subpel_search_method <= SUBPEL_TREE_PRUNED_MORE);
+ mv_search_params->find_fractional_mv_step =
+ fractional_mv_search[subpel_search_method];
// This is only used in motion vector unit test.
if (motion_vector_unit_test == 1)
@@ -2232,12 +2322,30 @@
sf->winner_mode_sf.tx_size_search_level = 3;
}
+ if (cpi->mt_info.num_workers > 1) {
+ // Loop restoration stage is conditionally disabled for speed 5, 6 when
+ // num_workers > 1. Since av1_pick_filter_restoration() is not
+ // multi-threaded, enabling the Loop restoration stage will cause an
+    // increase in encode time (3% to 7% increase depending on frame
+    // resolution).
+ // TODO(aomedia:3446): Implement multi-threading of
+ // av1_pick_filter_restoration() and enable Wiener filter for speed 5, 6
+ // similar to single thread encoding path.
+ if (speed >= 5) {
+ sf->lpf_sf.disable_sgr_filter = true;
+ sf->lpf_sf.disable_wiener_filter = true;
+ }
+ }
+
if (!cpi->ppi->seq_params_locked) {
cpi->common.seq_params->order_hint_info.enable_dist_wtd_comp &=
(sf->inter_sf.use_dist_wtd_comp_flag != DIST_WTD_COMP_DISABLED);
cpi->common.seq_params->enable_dual_filter &=
!sf->interp_sf.disable_dual_filter;
- cpi->common.seq_params->enable_restoration &= !sf->lpf_sf.disable_lr_filter;
+    // Set the flag 'enable_restoration' if one of the Loop restoration filters
+ // (i.e., Wiener or Self-guided) is enabled.
+ cpi->common.seq_params->enable_restoration &=
+ (!sf->lpf_sf.disable_wiener_filter || !sf->lpf_sf.disable_sgr_filter);
cpi->common.seq_params->enable_interintra_compound &=
(sf->inter_sf.disable_interintra_wedge_var_thresh != UINT_MAX);
@@ -2469,6 +2577,17 @@
}
}
+ if (speed == 5) {
+ if (!(frame_is_intra_only(&cpi->common) ||
+ cm->features.allow_screen_content_tools)) {
+ const int qindex[2] = { 256, 128 };
+ // Set the sf value as 3 for low resolution and
+ // for higher resolutions with low quantizers.
+ if (cm->quant_params.base_qindex < qindex[is_480p_or_larger])
+ sf->tx_sf.tx_type_search.winner_mode_tx_type_pruning = 3;
+ }
+ }
+
set_subpel_search_method(&cpi->mv_search_params,
cpi->oxcf.unit_test_cfg.motion_vector_unit_test,
sf->mv_sf.subpel_search_method);
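
The scalar-to-array change for txfm_rd_gate_level is what lets the later speed levels gate motion-mode transform search harder than the default search, which the old scalar field could not express. For a non-boosted, non-arf2 frame, the per-case levels accumulate across speed steps as follows (constants collected from the hunks above; the helper name is illustrative and assumes the TX_SEARCH_CASE enum and set_txfm_rd_gate_level() from this patch are in scope):

static void txfm_rd_gate_example(int gate[TX_SEARCH_CASES], int speed) {
  if (speed >= 2) set_txfm_rd_gate_level(gate, 1);  // all cases: 1
  if (speed >= 3) set_txfm_rd_gate_level(gate, 2);  // all cases: 2
  if (speed >= 4) {
    gate[TX_SEARCH_DEFAULT] = 3;      // per-case overrides start here
    gate[TX_SEARCH_MOTION_MODE] = 5;  // motion mode rd gated harder
  }
  if (speed >= 5) {
    gate[TX_SEARCH_DEFAULT] = 4;  // MOTION_MODE keeps 5 from speed 4
  }
}
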
diff --git a/av1/encoder/speed_features.h b/av1/encoder/speed_features.h
index 00c21e5..27a07c5 100644
--- a/av1/encoder/speed_features.h
+++ b/av1/encoder/speed_features.h
@@ -35,6 +35,11 @@
GM_FULL_SEARCH,
GM_REDUCED_REF_SEARCH_SKIP_L2_L3,
GM_REDUCED_REF_SEARCH_SKIP_L2_L3_ARF2,
+
+ // Same as GM_REDUCED_REF_SEARCH_SKIP_L2_L3_ARF2 but with extra filtering
+ // to keep at most two ref frames
+ GM_SEARCH_CLOSEST_REFS_ONLY,
+
GM_DISABLE_SEARCH
} UENUM1BYTE(GM_SEARCH_TYPE);
@@ -134,7 +139,8 @@
SUBPEL_TREE = 0,
SUBPEL_TREE_PRUNED = 1, // Prunes 1/2-pel searches
SUBPEL_TREE_PRUNED_MORE = 2, // Prunes 1/2-pel searches more aggressively
-} UENUM1BYTE(SUBPEL_SEARCH_METHODS);
+ SUBPEL_SEARCH_METHODS
+} UENUM1BYTE(SUBPEL_SEARCH_METHOD);
enum {
// Try the full image with different values.
@@ -233,6 +239,16 @@
PRUNE_NEARMV_MAX = PRUNE_NEARMV_LEVEL3,
} UENUM1BYTE(PRUNE_NEARMV_LEVEL);
+enum {
+ // Default Transform search case - used in evaluation of compound type mode
+ // and best inter candidates
+ TX_SEARCH_DEFAULT = 0,
+ // Transform search in motion mode rd
+ TX_SEARCH_MOTION_MODE,
+ // All transform search cases
+ TX_SEARCH_CASES
+} UENUM1BYTE(TX_SEARCH_CASE);
+
typedef struct {
TX_TYPE_PRUNE_MODE prune_2d_txfm_mode;
int fast_intra_tx_type_search;
@@ -431,10 +447,10 @@
int second_alt_ref_filtering;
/*!
- * Number of frames to be used in temporal filtering controlled based on noise
- * levels and arf-q.
+ * The number of frames to be used during temporal filtering of an ARF frame
+   * is adjusted based on the noise level of the current frame.
*/
- int num_frames_used_in_tf;
+ int adjust_num_frames_for_arf_filtering;
/*!
* Decide the bit estimation approach used in qindex decision.
@@ -442,6 +458,13 @@
* 1: estimate bits more accurately based on the frame complexity.
*/
int accurate_bit_estimate;
+
+ /*!
+ * Decide the approach for weight calculation during temporal filtering.
+ * 0: Calculate weight using exp()
+ * 1: Calculate weight using a lookup table that approximates exp().
+ */
+ int weight_calc_level_in_tf;
} HIGH_LEVEL_SPEED_FEATURES;
/*!
@@ -505,9 +528,6 @@
// Prune starting mvs in TPL based on sad scores.
int prune_starting_mv;
- // Not run TPL for filtered Key frame.
- int disable_filtered_key_tpl;
-
// Prune reference frames in TPL.
int prune_ref_frames_in_tpl;
@@ -536,6 +556,9 @@
// Disable global motion estimation based on stats of previous frames in the
// GF group
int disable_gm_search_based_on_stats;
+
+ // Number of refinement steps to apply after initial model generation
+ int num_refinement_steps;
} GLOBAL_MOTION_SPEED_FEATURES;
typedef struct PARTITION_SPEED_FEATURES {
@@ -649,6 +672,18 @@
// 2 : prune all block size based on qindex
int prune_rectangular_split_based_on_qidx;
+ // Prune rectangular partitions based on 4x4 sub-block variance
+ // false : no pruning
+ // true : prune rectangular partitions based on 4x4 sub-block variance
+ // deviation
+ //
+ // For allintra encode, this speed feature reduces instruction count by 6.4%
+ // for speed=6 with coding performance change less than 0.24%. For AVIF image
+ // encode, this speed feature reduces encode time by 8.14% for speed 6 on a
+ // typical image dataset with coding performance change less than 0.16%. This
+ // speed feature is not applicable to speed >= 7.
+ bool prune_rect_part_using_4x4_var_deviation;
+
// Terminate partition search for child partition,
// when NONE and SPLIT partition rd_costs are INT64_MAX.
int early_term_after_none_split;
@@ -746,7 +781,7 @@
// logarithmic search that keeps stepping at 1/2 pixel units until
// you stop getting a gain, and then goes on to 1/4 and repeats
// the same process. Along the way it skips many diagonals.
- SUBPEL_SEARCH_METHODS subpel_search_method;
+ SUBPEL_SEARCH_METHOD subpel_search_method;
// Maximum number of steps in logarithmic subpel search before giving up.
int subpel_iters_per_step;
@@ -788,7 +823,16 @@
int full_pixel_search_level;
// Whether to downsample the rows in sad calculation during motion search.
- // This is only active when there are at least 16 rows.
+ // This is only active when there are at least 16 rows. When this sf is
+ // active, if there is a large discrepancy in the SAD values for the final
+ // motion vector between skipping vs not skipping, motion search is redone
+ // with skip row features off.
+ // 0: Disabled (do not downsample rows)
+ // 1: Skip SAD calculation of odd rows if the SAD deviation of the even and
+ // odd rows for the starting MV is small. Redo motion search with sf off
+ // when SAD deviation is high for the final motion vector.
+ // 2: Skip SAD calculation of odd rows. SAD deviation is not tested for the
+ // start MV and tested only for the final MV.
int use_downsampled_sad;
// Enable/disable extensive joint motion search.
@@ -801,7 +845,17 @@
int disable_second_mv;
// Skips full pixel search based on start mv of prior ref_mv_idx.
+ // 0: Disabled
+  // 1: Skips the full pixel search up to 4 neighbor full-pel MV positions.
+  // 2: Skips the full pixel search up to 8 neighbor full-pel MV positions.
int skip_fullpel_search_using_startmv;
+
+ // Method to use for refining WARPED_CAUSAL motion vectors
+ // TODO(rachelbarker): Can this be unified with OBMC in some way?
+ WARP_SEARCH_METHOD warp_search_method;
+
+ // Maximum number of iterations in WARPED_CAUSAL refinement search
+ int warp_search_iters;
} MV_SPEED_FEATURES;
typedef struct INTER_MODE_SPEED_FEATURES {
@@ -813,8 +867,11 @@
// 2: used with static rd model
int inter_mode_rd_model_estimation;
- // Bypass transform search based on skip rd
- int txfm_rd_gate_level;
+ // Bypass transform search based on skip rd at following stages
+ // i. Compound type mode search
+ // ii. Motion mode search (mode evaluation and winner motion mode stage)
+ // iii. Transform search for best inter candidates
+ int txfm_rd_gate_level[TX_SEARCH_CASES];
// Limit the inter mode tested in the RD loop
int reduce_inter_modes;
@@ -927,14 +984,9 @@
int prune_comp_using_best_single_mode_ref;
// Skip NEARESTMV and NEARMV using weight computed in ref mv list population
- // This speed feature sometimes leads to severe visual artifacts for
- // the overlay frame. It makes inter RD mode search skip NEARESTMV
- // and NEARMV, and no valid inter mode is evaluated when the NEWMV mode
- // is also early terminated due to the constraint that it does not handle
- // zero mv difference. In this cases, intra modes will be chosen, leading
- // to bad prediction and flickering artifacts.
- // Turn off this feature for now. Be careful to check visual quality if
- // anyone is going to turn it on.
+ //
+ // Pruning is enabled only when both the top and left neighbor blocks are
+ // available and when the current block already has a valid inter prediction.
int prune_nearest_near_mv_using_refmv_weight;
// Based on previous ref_mv_idx search result, prune the following search.
@@ -999,7 +1051,20 @@
// Enable/disable masked compound.
int disable_masked_comp;
- // Enable/disable the fast compound mode search.
+  // Enable/disable MV refinement for the compound modes corresponding to the
+  // compound types COMPOUND_AVERAGE, COMPOUND_DISTWTD (currently, this
+  // compound type is disabled for speeds >= 2 using the sf
+  // 'use_dist_wtd_comp_flag') and COMPOUND_DIFFWTD, based on availability.
+  // Levels 0 to 3 indicate increasing aggressiveness in disabling MV
+  // refinement.
+  // 0: MV refinement is enabled, and the NEW_NEWMV mode uses two iterations
+  // of refinement in av1_joint_motion_search().
+  // 1: MV refinement is disabled for COMPOUND_DIFFWTD and enabled for
+  // COMPOUND_AVERAGE & COMPOUND_DISTWTD.
+  // 2: MV refinement is enabled for COMPOUND_AVERAGE & COMPOUND_DISTWTD for
+  // the NEW_NEWMV mode, with one iteration of refinement in
+  // av1_joint_motion_search(); MV refinement is disabled for the other
+  // compound type modes.
+  // 3: MV refinement is disabled.
int enable_fast_compound_mode_search;
// Reuse masked compound type search results
@@ -1236,6 +1301,21 @@
// encode time by 4.65%, 9.16% and 10.45% for speed 6, 7 and 8 on a typical
// image dataset with coding performance change less than 0.19%.
bool prune_intra_tx_depths_using_nn;
+
+ // Enable/disable early breakout during transform search of intra modes, by
+ // using the minimum rd cost possible. By using this approach, the rd
+ // evaluation of applicable transform blocks (in the current block) can be
+ // avoided as
+ // 1) best_rd evolves during the search in choose_tx_size_type_from_rd()
+ // 2) appropriate ref_best_rd is passed in intra_block_yrd()
+ //
+ // For allintra encode, this speed feature reduces instruction count
+ // by 1.11%, 1.08%, 1.02% and 0.93% for speed 3, 6, 7 and 8 with coding
+ // performance change less than 0.02%. For AVIF image encode, this speed
+ // feature reduces encode time by 0.93%, 1.46%, 1.07%, 0.84%, 0.99% and 0.73%
+ // for speed 3, 4, 5, 6, 7 and 8 on a typical image dataset with coding
+ // performance change less than 0.004%.
+ bool use_rd_based_breakout_for_intra_tx_search;
} TX_SPEED_FEATURES;
typedef struct RD_CALC_SPEED_FEATURES {
@@ -1377,8 +1457,14 @@
// Reduce the wiener filter win size for luma
int reduce_wiener_window_size;
- // Disable loop restoration filter
- int disable_lr_filter;
+ // Flag to disable Wiener Loop restoration filter.
+ bool disable_wiener_filter;
+
+ // Flag to disable Self-guided Loop restoration filter.
+ bool disable_sgr_filter;
+
+ // Disable the refinement search around the wiener filter coefficients.
+ bool disable_wiener_coeff_refine_search;
// Whether to downsample the rows in computation of wiener stats.
int use_downsampled_wiener_stats;
@@ -1395,7 +1481,11 @@
// Skipping aggressiveness increases from level 1 to 2.
int skip_intra_pred;
- // Perform coarse ME before calculating variance in variance-based partition
+ // Estimate motion before calculating variance in variance-based partition
+ // 0 - Only use zero MV
+ // 1 - perform coarse ME
+ // 2 - perform coarse ME, and also use neighbours' MVs
+ // 3 - use neighbours' MVs without performing coarse ME
int estimate_motion_for_var_based_partition;
// For nonrd_use_partition: mode of extra check of leaf partition
@@ -1486,8 +1576,8 @@
// Bit mask to enable or disable intra modes for each prediction block size
// separately, for nonrd_pickmode. Currently, the sf is not respected when
- // 'force_intra_check' is true in 'estimate_intra_mode()' function. Also, H
- // and V pred modes allowed through this sf can be further pruned when
+ // 'force_intra_check' is true in 'av1_estimate_intra_mode()' function. Also,
+ // H and V pred modes allowed through this sf can be further pruned when
//'prune_hv_pred_modes_using_src_sad' sf is true.
int intra_y_mode_bsize_mask_nrd[BLOCK_SIZES];
@@ -1502,9 +1592,13 @@
// Skips mode checks more aggressively in nonRD mode
int nonrd_aggressive_skip;
- // Skip cdef on 64x64 blocks when NEWMV or INTRA is not picked or color
- // sensitivity is off. When color sensitivity is on for a superblock, all
- // 64x64 blocks within will not skip.
+  // Skip cdef on 64x64 blocks.
+ // 0: disabled
+ // 1: skip when NEWMV or INTRA is not picked or color sensitivity is off.
+ // When color sensitivity is on for a superblock, all 64x64 blocks within
+ // will not skip.
+ // 2: more aggressive mode where skip is done for all frames where
+  // rc->high_source_sad = 0 (no slide changes), and color sensitivity off.
int skip_cdef_sb;
// Forces larger partition blocks in variance based partitioning for intra
@@ -1565,6 +1659,7 @@
// Temporal filtering
// The value can be 1 or 2, which indicates the threshold to use.
+ // Must be off for lossless mode.
int use_rtc_tf;
// Prune the use of the identity transform in nonrd_pickmode,
@@ -1573,8 +1668,15 @@
// already set.
int prune_idtx_nonrd;
+  // Prune the use of palette mode in nonrd pickmode.
+ int prune_palette_nonrd;
+
// Skip loopfilter, for static content after slide change
// or key frame, once quality has ramped up.
+ // 0: disabled
+ // 1: skip only after quality is ramped up.
+  // 2: aggressive mode, where skip is done for all frames where
+  // rc->high_source_sad = 0 (no slide changes).
int skip_lf_screen;
// For nonrd: early exit out of variance partition that sets the
@@ -1640,6 +1742,26 @@
// 0.08%.
bool prune_h_pred_using_best_mode_so_far;
+ // Enable pruning of intra mode evaluations in nonrd path based on source
+ // variance and best mode so far. The pruning logic is enabled only if the
+ // mode is not a winner mode of both the neighboring blocks (left/top).
+ //
+ // For allintra encode, this speed feature reduces instruction count by 3.96%
+ // for speed 9 with coding performance change less than 0.38%.
+ // For AVIF image encode, this speed feature reduces encode time by 3.46% for
+ // speed 9 on a typical image dataset with coding performance change less than
+ // -0.06%.
+ bool enable_intra_mode_pruning_using_neighbors;
+
+ // Prune intra mode evaluations in nonrd path based on best sad so far.
+ //
+ // For allintra encode, this speed feature reduces instruction count by 3.05%
+ // for speed 9 with coding performance change less than 0.24%.
+ // For AVIF image encode, this speed feature reduces encode time by 1.87% for
+ // speed 9 on a typical image dataset with coding performance change less than
+ // 0.16%.
+ bool prune_intra_mode_using_best_sad_so_far;
+
// If compound is enabled, and the current block size is \geq BLOCK_16X16,
// limit the compound modes to GLOBAL_GLOBALMV. This does not apply to the
// base layer of svc.
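
The SUBPEL_SEARCH_METHODS rename above follows the trailing-count enum idiom: the old type name becomes the final enumerator, so tables such as fractional_mv_search[] in speed_features.c are sized directly by the enum, and the new TX_SEARCH_CASE enum uses the same idiom for txfm_rd_gate_level[]. A generic sketch of the pattern (all names below are placeholders, not libaom identifiers):

typedef enum {
  METHOD_A = 0,
  METHOD_B = 1,
  METHOD_C = 2,
  NUM_METHODS  // trailing enumerator doubles as the table size (3)
} METHOD;

static int method_impl_a(void);
static int method_impl_b(void);
static int method_impl_c(void);

// Sized by the enum; adding a method before NUM_METHODS grows the table.
static int (*const method_table[NUM_METHODS])(void) = { method_impl_a,
                                                        method_impl_b,
                                                        method_impl_c };
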
diff --git a/av1/encoder/superres_scale.c b/av1/encoder/superres_scale.c
index f439e70..5e1e289 100644
--- a/av1/encoder/superres_scale.c
+++ b/av1/encoder/superres_scale.c
@@ -403,7 +403,7 @@
assert(!is_lossless_requested(&cpi->oxcf.rc_cfg));
assert(!cm->features.all_lossless);
- av1_superres_upscale(cm, NULL);
+ av1_superres_upscale(cm, NULL, cpi->image_pyramid_levels);
// If regular resizing is occurring the source will need to be downscaled to
// match the upscaled superres resolution. Otherwise the original source is
diff --git a/av1/encoder/svc_layercontext.c b/av1/encoder/svc_layercontext.c
index d31f55d..85678dc 100644
--- a/av1/encoder/svc_layercontext.c
+++ b/av1/encoder/svc_layercontext.c
@@ -8,6 +8,7 @@
* be found in the AUTHORS file in the root of the source tree.
*/
+#include <assert.h>
#include <math.h>
#include "av1/encoder/encoder.h"
@@ -84,6 +85,7 @@
bool av1_alloc_layer_context(AV1_COMP *cpi, int num_layers) {
SVC *const svc = &cpi->svc;
if (svc->layer_context == NULL || svc->num_allocated_layers < num_layers) {
+ assert(num_layers > 1);
aom_free(svc->layer_context);
svc->num_allocated_layers = 0;
svc->layer_context =
@@ -99,10 +101,13 @@
const int64_t target_bandwidth) {
const RATE_CONTROL *const rc = &cpi->rc;
const PRIMARY_RATE_CONTROL *const p_rc = &cpi->ppi->p_rc;
+ AV1_COMMON *const cm = &cpi->common;
SVC *const svc = &cpi->svc;
int layer = 0;
int64_t spatial_layer_target = 0;
float bitrate_alloc = 1.0;
+ const int mi_rows = cm->mi_params.mi_rows;
+ const int mi_cols = cm->mi_params.mi_cols;
for (int sl = 0; sl < svc->number_spatial_layers; ++sl) {
for (int tl = 0; tl < svc->number_temporal_layers; ++tl) {
layer = LAYER_IDS_TO_IDX(sl, tl, svc->number_temporal_layers);
@@ -116,7 +121,9 @@
RATE_CONTROL *const lrc = &lc->rc;
PRIMARY_RATE_CONTROL *const lp_rc = &lc->p_rc;
lc->spatial_layer_target_bandwidth = spatial_layer_target;
- bitrate_alloc = (float)lc->target_bandwidth / target_bandwidth;
+ if (target_bandwidth != 0) {
+ bitrate_alloc = (float)lc->target_bandwidth / target_bandwidth;
+ }
lp_rc->starting_buffer_level =
(int64_t)(p_rc->starting_buffer_level * bitrate_alloc);
lp_rc->optimal_buffer_level =
@@ -134,6 +141,24 @@
lrc->rtc_external_ratectrl = rc->rtc_external_ratectrl;
lrc->worst_quality = av1_quantizer_to_qindex(lc->max_q);
lrc->best_quality = av1_quantizer_to_qindex(lc->min_q);
+ if (rc->use_external_qp_one_pass) {
+ lrc->worst_quality = rc->worst_quality;
+ lrc->best_quality = rc->best_quality;
+ }
+ // Reset the cyclic refresh parameters, if needed (map is NULL),
+ // or number of spatial layers has changed.
+ // Cyclic refresh is only applied on base temporal layer.
+ if (svc->number_spatial_layers > 1 && tl == 0 &&
+ (lc->map == NULL ||
+ svc->prev_number_spatial_layers != svc->number_spatial_layers)) {
+ lc->sb_index = 0;
+ lc->actual_num_seg1_blocks = 0;
+ lc->actual_num_seg2_blocks = 0;
+ lc->counter_encode_maxq_scene_change = 0;
+ if (lc->map) aom_free(lc->map);
+ CHECK_MEM_ERROR(cm, lc->map,
+ aom_calloc(mi_rows * mi_cols, sizeof(*lc->map)));
+ }
}
}
}
@@ -178,8 +203,9 @@
static AOM_INLINE bool check_ref_is_low_spatial_res_super_frame(
int ref_frame, const SVC *svc, const RTC_REF *rtc_ref) {
int ref_frame_idx = rtc_ref->ref_idx[ref_frame - 1];
- return svc->buffer_time_index[ref_frame_idx] == svc->current_superframe &&
- svc->buffer_spatial_layer[ref_frame_idx] <= svc->spatial_layer_id - 1;
+ return rtc_ref->buffer_time_index[ref_frame_idx] == svc->current_superframe &&
+ rtc_ref->buffer_spatial_layer[ref_frame_idx] <=
+ svc->spatial_layer_id - 1;
}
void av1_restore_layer_context(AV1_COMP *const cpi) {
@@ -232,6 +258,32 @@
}
}
+void av1_svc_update_buffer_slot_refreshed(AV1_COMP *const cpi) {
+ SVC *const svc = &cpi->svc;
+ RTC_REF *const rtc_ref = &cpi->ppi->rtc_ref;
+ const unsigned int current_frame =
+ cpi->ppi->use_svc ? svc->current_superframe
+ : cpi->common.current_frame.frame_number;
+ // For any buffer slot that is refreshed, update it with
+ // the spatial_layer_id and the current_superframe.
+ if (cpi->common.current_frame.frame_type == KEY_FRAME) {
+ // All slots are refreshed on KEY.
+ for (unsigned int i = 0; i < REF_FRAMES; i++) {
+ rtc_ref->buffer_time_index[i] = current_frame;
+ rtc_ref->buffer_spatial_layer[i] = svc->spatial_layer_id;
+ }
+ } else if (rtc_ref->set_ref_frame_config) {
+ for (unsigned int i = 0; i < INTER_REFS_PER_FRAME; i++) {
+ const int ref_frame_map_idx = rtc_ref->ref_idx[i];
+ if (cpi->ppi->rtc_ref.refresh[ref_frame_map_idx]) {
+ rtc_ref->buffer_time_index[ref_frame_map_idx] = current_frame;
+ rtc_ref->buffer_spatial_layer[ref_frame_map_idx] =
+ svc->spatial_layer_id;
+ }
+ }
+ }
+}
+
void av1_save_layer_context(AV1_COMP *const cpi) {
SVC *const svc = &cpi->svc;
const AV1_COMMON *const cm = &cpi->common;
@@ -255,23 +307,7 @@
lc->actual_num_seg2_blocks = cr->actual_num_seg2_blocks;
lc->counter_encode_maxq_scene_change = cr->counter_encode_maxq_scene_change;
}
- // For any buffer slot that is refreshed, update it with
- // the spatial_layer_id and the current_superframe.
- if (cpi->common.current_frame.frame_type == KEY_FRAME) {
- // All slots are refreshed on KEY.
- for (unsigned int i = 0; i < REF_FRAMES; i++) {
- svc->buffer_time_index[i] = svc->current_superframe;
- svc->buffer_spatial_layer[i] = svc->spatial_layer_id;
- }
- } else if (cpi->ppi->rtc_ref.set_ref_frame_config) {
- for (unsigned int i = 0; i < INTER_REFS_PER_FRAME; i++) {
- int ref_frame_map_idx = cpi->ppi->rtc_ref.ref_idx[i];
- if (cpi->ppi->rtc_ref.refresh[ref_frame_map_idx]) {
- svc->buffer_time_index[ref_frame_map_idx] = svc->current_superframe;
- svc->buffer_spatial_layer[ref_frame_map_idx] = svc->spatial_layer_id;
- }
- }
- }
+ av1_svc_update_buffer_slot_refreshed(cpi);
for (unsigned int i = 0; i < REF_FRAMES; i++) {
if (frame_is_intra_only(cm) ||
cm->current_frame.refresh_frame_flags & (1 << i)) {
@@ -524,12 +560,24 @@
SVC *const svc = &cpi->svc;
for (int sl = 0; sl < svc->number_spatial_layers; ++sl) {
// Check for reset based on avg_frame_bandwidth for spatial layer sl.
+ // If avg_frame_bandwidth for top temporal layer is not set
+ // (because enhancement layer was inactive), use the base TL0
int layer = LAYER_IDS_TO_IDX(sl, svc->number_temporal_layers - 1,
svc->number_temporal_layers);
LAYER_CONTEXT *lc = &svc->layer_context[layer];
RATE_CONTROL *lrc = &lc->rc;
- if (lrc->avg_frame_bandwidth > (3 * lrc->prev_avg_frame_bandwidth >> 1) ||
- lrc->avg_frame_bandwidth < (lrc->prev_avg_frame_bandwidth >> 1)) {
+ int avg_frame_bandwidth = lrc->avg_frame_bandwidth;
+ int prev_avg_frame_bandwidth = lrc->prev_avg_frame_bandwidth;
+ if (avg_frame_bandwidth == 0 || prev_avg_frame_bandwidth == 0) {
+ // Use base TL0.
+ layer = LAYER_IDS_TO_IDX(sl, 0, svc->number_temporal_layers);
+ lc = &svc->layer_context[layer];
+ lrc = &lc->rc;
+ avg_frame_bandwidth = lrc->avg_frame_bandwidth;
+ prev_avg_frame_bandwidth = lrc->prev_avg_frame_bandwidth;
+ }
+ if (avg_frame_bandwidth > (3 * prev_avg_frame_bandwidth >> 1) ||
+ avg_frame_bandwidth < (prev_avg_frame_bandwidth >> 1)) {
// Reset for all temporal layers with spatial layer sl.
for (int tl = 0; tl < svc->number_temporal_layers; ++tl) {
int layer2 = LAYER_IDS_TO_IDX(sl, tl, svc->number_temporal_layers);
@@ -548,25 +596,76 @@
void av1_svc_set_last_source(AV1_COMP *const cpi, EncodeFrameInput *frame_input,
YV12_BUFFER_CONFIG *prev_source) {
- if (cpi->svc.spatial_layer_id == 0) {
- // For base spatial layer: if the LAST reference (index 0) is not
- // the previous (super)frame set the last_source to the source corresponding
- // to the last TL0, otherwise keep it at prev_source.
- frame_input->last_source = prev_source != NULL ? prev_source : NULL;
- if (cpi->svc.current_superframe > 0) {
- const int buffslot_last = cpi->ppi->rtc_ref.ref_idx[0];
- if (cpi->svc.buffer_time_index[buffslot_last] <
- cpi->svc.current_superframe - 1)
+ frame_input->last_source = prev_source != NULL ? prev_source : NULL;
+ if (!cpi->ppi->use_svc && cpi->rc.prev_frame_is_dropped &&
+ cpi->rc.frame_number_encoded > 0) {
+ frame_input->last_source = &cpi->svc.source_last_TL0;
+ } else {
+ RTC_REF *const rtc_ref = &cpi->ppi->rtc_ref;
+ if (cpi->svc.spatial_layer_id == 0) {
+ // For base spatial layer: if the LAST reference (index 0) is not
+ // the previous (super)frame set the last_source to the source
+ // corresponding to the last TL0, otherwise keep it at prev_source.
+ // Always use source_last_TL0 if previous base TL0 was dropped.
+ if (cpi->svc.current_superframe > 0) {
+ const int buffslot_last = rtc_ref->ref_idx[0];
+ // Check if previous frame was dropped on base TL0 layer.
+ const int layer =
+ LAYER_IDS_TO_IDX(0, 0, cpi->svc.number_temporal_layers);
+ LAYER_CONTEXT *lc = &cpi->svc.layer_context[layer];
+ RATE_CONTROL *lrc = &lc->rc;
+ if (lrc->prev_frame_is_dropped ||
+ rtc_ref->buffer_time_index[buffslot_last] <
+ cpi->svc.current_superframe - 1) {
+ frame_input->last_source = &cpi->svc.source_last_TL0;
+ }
+ }
+ } else if (cpi->svc.spatial_layer_id > 0) {
+ // For spatial enhancement layers: the previous source (prev_source)
+ // corresponds to the lower spatial layer (which is the same source so
+ // we can't use that), so always set the last_source to the source of the
+ // last TL0.
+ if (cpi->svc.current_superframe > 0)
frame_input->last_source = &cpi->svc.source_last_TL0;
+ else
+ frame_input->last_source = NULL;
}
- } else if (cpi->svc.spatial_layer_id > 0) {
- // For spatial enhancement layers: the previous source (prev_source)
- // corresponds to the lower spatial layer (which is the same source so
- // we can't use that), so always set the last_source to the source of the
- // last TL0.
- if (cpi->svc.current_superframe > 0)
- frame_input->last_source = &cpi->svc.source_last_TL0;
- else
- frame_input->last_source = NULL;
+ }
+}
+
+int av1_svc_get_min_ref_dist(const AV1_COMP *cpi) {
+ RTC_REF *const rtc_ref = &cpi->ppi->rtc_ref;
+ int min_dist = INT_MAX;
+ const unsigned int current_frame_num =
+ cpi->ppi->use_svc ? cpi->svc.current_superframe
+ : cpi->common.current_frame.frame_number;
+ for (unsigned int i = 0; i < INTER_REFS_PER_FRAME; i++) {
+ if (cpi->ppi->rtc_ref.reference[i]) {
+ const int ref_frame_map_idx = rtc_ref->ref_idx[i];
+ const int dist =
+ current_frame_num - rtc_ref->buffer_time_index[ref_frame_map_idx];
+ if (dist < min_dist) min_dist = dist;
+ }
+ }
+ return min_dist;
+}
+
+void av1_svc_set_reference_was_previous(AV1_COMP *cpi) {
+ RTC_REF *const rtc_ref = &cpi->ppi->rtc_ref;
+ // Check if the encoded frame had some reference that was the
+ // previous frame.
+ const unsigned int current_frame =
+ cpi->ppi->use_svc ? cpi->svc.current_superframe
+ : cpi->common.current_frame.frame_number;
+ rtc_ref->reference_was_previous_frame = true;
+ if (current_frame > 0) {
+ rtc_ref->reference_was_previous_frame = false;
+ for (unsigned int i = 0; i < INTER_REFS_PER_FRAME; i++) {
+ if (rtc_ref->reference[i]) {
+ const int ref_frame_map_idx = rtc_ref->ref_idx[i];
+ if (rtc_ref->buffer_time_index[ref_frame_map_idx] == current_frame - 1)
+ rtc_ref->reference_was_previous_frame = true;
+ }
+ }
}
}
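
The new av1_svc_get_min_ref_dist() reduces to a scan over the enabled references' buffer timestamps. A standalone sketch with illustrative names, mirroring the logic above:

#include <limits.h>

static int min_ref_dist(unsigned int current_frame,
                        const unsigned int *buffer_time_index,
                        const int *enabled, int num_refs) {
  int min_dist = INT_MAX;
  for (int i = 0; i < num_refs; i++) {
    if (enabled[i]) {
      const int dist = (int)(current_frame - buffer_time_index[i]);
      if (dist < min_dist) min_dist = dist;
    }
  }
  // e.g. frame 10 vs. slots {9, 6, 2} -> distances {1, 4, 8} -> returns 1;
  // a result of 1 is exactly the condition that
  // av1_svc_set_reference_was_previous() records as a boolean.
  return min_dist;
}
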
diff --git a/av1/encoder/svc_layercontext.h b/av1/encoder/svc_layercontext.h
index 5e983f6..3a6e0fc 100644
--- a/av1/encoder/svc_layercontext.h
+++ b/av1/encoder/svc_layercontext.h
@@ -29,7 +29,7 @@
RATE_CONTROL rc;
PRIMARY_RATE_CONTROL p_rc;
int framerate_factor;
- int64_t layer_target_bitrate;
+ int64_t layer_target_bitrate; // In bits per second.
int scaling_factor_num;
int scaling_factor_den;
int64_t target_bandwidth;
@@ -91,6 +91,7 @@
int temporal_layer_id;
int number_spatial_layers;
int number_temporal_layers;
+ int prev_number_spatial_layers;
int use_flexible_mode;
int ksvc_fixed_mode;
/*!\endcond */
@@ -98,8 +99,6 @@
/*!\cond */
double base_framerate;
unsigned int current_superframe;
- unsigned int buffer_time_index[REF_FRAMES];
- unsigned char buffer_spatial_layer[REF_FRAMES];
int skip_mvsearch_last;
int skip_mvsearch_gf;
int skip_mvsearch_altref;
@@ -114,11 +113,14 @@
/*!
* Layer context used for rate control in CBR mode.
+   * This is an array; the entry for spatial layer `sl` and temporal layer
+   * `tl` is at index sl * number_temporal_layers + tl.
*/
LAYER_CONTEXT *layer_context;
/*!
- * Number of layers allocated for layer_context.
+ * Number of layers allocated for layer_context. If nonzero, must be greater
+ * than or equal to number_spatial_layers * number_temporal_layers.
*/
int num_allocated_layers;
@@ -286,6 +288,11 @@
struct EncodeFrameInput *frame_input,
YV12_BUFFER_CONFIG *prev_source);
+void av1_svc_update_buffer_slot_refreshed(struct AV1_COMP *const cpi);
+
+int av1_svc_get_min_ref_dist(const struct AV1_COMP *cpi);
+
+void av1_svc_set_reference_was_previous(struct AV1_COMP *cpi);
#ifdef __cplusplus
} // extern "C"
#endif
diff --git a/av1/encoder/temporal_filter.c b/av1/encoder/temporal_filter.c
index ad1cc64..91a0c78 100644
--- a/av1/encoder/temporal_filter.c
+++ b/av1/encoder/temporal_filter.c
@@ -16,6 +16,7 @@
#include "config/aom_scale_rtcd.h"
#include "aom_dsp/aom_dsp_common.h"
+#include "aom_dsp/mathutils.h"
#include "aom_dsp/odintrin.h"
#include "aom_mem/aom_mem.h"
#include "aom_ports/aom_timer.h"
@@ -145,7 +146,7 @@
const int q = av1_get_q(cpi);
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, mb, block_size,
- &baseline_mv, search_site_cfg,
+ &baseline_mv, start_mv, search_site_cfg,
/*fine_search_interval=*/0);
av1_set_mv_search_method(&full_ms_params, search_site_cfg, search_method);
full_ms_params.run_mesh_search = 1;
@@ -204,7 +205,7 @@
mbd->plane[0].pre[0].buf = ref_frame->y_buffer + y_offset + offset;
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, mb,
subblock_size, &baseline_mv,
- search_site_cfg,
+ start_mv, search_site_cfg,
/*fine_search_interval=*/0);
av1_set_mv_search_method(&full_ms_params, search_site_cfg,
search_method);
@@ -549,6 +550,8 @@
* defined in libaom, converted from `qindex`
* \param[in] filter_strength Filtering strength. This value lies in range
* [0, 6] where 6 is the maximum strength.
+ * \param[in]   tf_wgt_calc_lvl Controls the weight calculation method used
+ * during temporal filtering.
* \param[out] pred Pointer to the well-built predictors
* \param[out] accum Pointer to the pixel-wise accumulator for
* filtering
@@ -563,7 +566,8 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
// Block information.
const int mb_height = block_size_high[block_size];
const int mb_width = block_size_wide[block_size];
@@ -693,7 +697,14 @@
double scaled_error =
combined_error * d_factor[subblock_idx] * decay_factor[plane];
scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ int weight;
+ if (tf_wgt_calc_lvl == 0) {
+ weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ } else {
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ weight = iroundpf(fweight);
+ }
const int idx = plane_offset + pred_idx; // Index with plane shift.
const int pred_value = is_high_bitdepth ? pred16[idx] : pred[idx];
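
The new tf_wgt_calc_lvl != 0 path swaps the libm exp() call for approx_exp()
from aom_dsp/mathutils.h (included above). A self-contained sketch of the
general technique; the (1 + x/64)^64 form below is illustrative, not the
library's actual approximation:

    /* Illustrative fast exponential for x in [-7, 0], the range enforced
     * by the AOMMIN(scaled_error, 7) clamp above: exp(x) ~ (1 + x/64)^64,
     * evaluated with six squarings. Relative error grows toward x = -7,
     * but the absolute error stays tiny since exp(-7) is about 1e-3. */
    static float fast_exp_neg(float x) {
      float y = 1.0f + x * (1.0f / 64.0f);
      y *= y; y *= y; y *= y;  /* y^8 */
      y *= y; y *= y; y *= y;  /* y^64 */
      return y;
    }
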
@@ -716,11 +727,12 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
av1_apply_temporal_filter_c(frame_to_filter, mbd, block_size, mb_row, mb_col,
num_planes, noise_levels, subblock_mvs,
- subblock_mses, q_factor, filter_strength, pred,
- accum, count);
+ subblock_mses, q_factor, filter_strength,
+ tf_wgt_calc_lvl, pred, accum, count);
}
#endif // CONFIG_AV1_HIGHBITDEPTH
/*!\brief Normalizes the accumulated filtering result to produce the filtered
@@ -809,6 +821,7 @@
const int mi_h = mi_size_high_log2[block_size];
const int mi_w = mi_size_wide_log2[block_size];
const int num_planes = av1_num_planes(&cpi->common);
+ const int weight_calc_level_in_tf = cpi->sf.hl_sf.weight_calc_level_in_tf;
uint32_t *accum = tf_data->accum;
uint16_t *count = tf_data->count;
uint8_t *pred = tf_data->pred;
@@ -865,27 +878,27 @@
av1_highbd_apply_temporal_filter(
frame_to_filter, mbd, block_size, mb_row, mb_col, num_planes,
noise_levels, subblock_mvs, subblock_mses, q_factor,
- filter_strength, pred, accum, count);
+ filter_strength, weight_calc_level_in_tf, pred, accum, count);
} else {
#endif // CONFIG_AV1_HIGHBITDEPTH
av1_apply_temporal_filter_c(
frame_to_filter, mbd, block_size, mb_row, mb_col, num_planes,
noise_levels, subblock_mvs, subblock_mses, q_factor,
- filter_strength, pred, accum, count);
+ filter_strength, weight_calc_level_in_tf, pred, accum, count);
#if CONFIG_AV1_HIGHBITDEPTH
}
#endif // CONFIG_AV1_HIGHBITDEPTH
} else { // for 8-bit
if (TF_BLOCK_SIZE == BLOCK_32X32 && TF_WINDOW_LENGTH == 5) {
- av1_apply_temporal_filter(frame_to_filter, mbd, block_size, mb_row,
- mb_col, num_planes, noise_levels,
- subblock_mvs, subblock_mses, q_factor,
- filter_strength, pred, accum, count);
+ av1_apply_temporal_filter(
+ frame_to_filter, mbd, block_size, mb_row, mb_col, num_planes,
+ noise_levels, subblock_mvs, subblock_mses, q_factor,
+ filter_strength, weight_calc_level_in_tf, pred, accum, count);
} else {
av1_apply_temporal_filter_c(
frame_to_filter, mbd, block_size, mb_row, mb_col, num_planes,
noise_levels, subblock_mvs, subblock_mses, q_factor,
- filter_strength, pred, accum, count);
+ filter_strength, weight_calc_level_in_tf, pred, accum, count);
}
}
}
@@ -995,11 +1008,9 @@
const YV12_BUFFER_CONFIG *to_filter_frame = &to_filter_buf->img;
const int num_planes = av1_num_planes(&cpi->common);
double *noise_levels = tf_ctx->noise_levels;
- for (int plane = 0; plane < num_planes; ++plane) {
- noise_levels[plane] = av1_estimate_noise_from_single_plane(
- to_filter_frame, plane, cpi->common.seq_params->bit_depth,
- NOISE_ESTIMATION_EDGE_THRESHOLD);
- }
+ av1_estimate_noise_level(to_filter_frame, noise_levels, AOM_PLANE_Y,
+ num_planes - 1, cpi->common.seq_params->bit_depth,
+ NOISE_ESTIMATION_EDGE_THRESHOLD);
// Get quantization factor.
const int q = av1_get_q(cpi);
// Get correlation estimates from first-pass;
@@ -1040,6 +1051,18 @@
adjust_num = 0;
} else if ((update_type == KF_UPDATE) && q <= 10) {
adjust_num = 0;
+ } else if (cpi->sf.hl_sf.adjust_num_frames_for_arf_filtering &&
+ update_type != KF_UPDATE) {
+    // Adjust the number of frames considered for filtering based on the
+    // noise level of the current frame. For a low-noise frame, use more
+    // frames so that the filtered frame provides better predictions for
+    // subsequent frames, and vice versa.
+ if (noise_levels[AOM_PLANE_Y] < 0.5)
+ adjust_num = 4;
+ else if (noise_levels[AOM_PLANE_Y] < 1.0)
+ adjust_num = 2;
+ else
+ adjust_num = 0;
}
num_frames = AOMMIN(num_frames + adjust_num, lookahead_depth);
@@ -1055,10 +1078,6 @@
num_frames = AOMMIN(num_frames, gfu_boost / 150);
num_frames += !(num_frames & 1); // Make the number odd.
- // Limit the number of frames if noise levels are low and high quantizers.
- if (noise_levels[AOM_PLANE_Y] < 1.9 && cpi->ppi->p_rc.arf_q > 40)
- num_frames = AOMMIN(num_frames, cpi->sf.hl_sf.num_frames_used_in_tf);
-
// Only use 2 neighbours for the second ARF.
if (update_type == INTNL_ARF_UPDATE) num_frames = AOMMIN(num_frames, 3);
if (AOMMIN(max_after, max_before) >= num_frames / 2) {
@@ -1108,21 +1127,50 @@
/*!\cond */
-// A constant number, sqrt(pi / 2), used for noise estimation.
-static const double SQRT_PI_BY_2 = 1.25331413732;
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height,
+ int width, int stride,
+ int edge_thresh) {
+ int64_t accum = 0;
+ int count = 0;
-double av1_estimate_noise_from_single_plane(const YV12_BUFFER_CONFIG *frame,
- const int plane,
- const int bit_depth,
- const int edge_thresh) {
- const int is_y_plane = (plane == 0);
- const int height = frame->crop_heights[is_y_plane ? 0 : 1];
- const int width = frame->crop_widths[is_y_plane ? 0 : 1];
- const int stride = frame->strides[is_y_plane ? 0 : 1];
- const uint8_t *src = frame->buffers[plane];
- const uint16_t *src16 = CONVERT_TO_SHORTPTR(src);
- const int is_high_bitdepth = is_frame_high_bitdepth(frame);
+ for (int i = 1; i < height - 1; ++i) {
+ for (int j = 1; j < width - 1; ++j) {
+      // Set up a small 3x3 matrix.
+ const int center_idx = i * stride + j;
+ int mat[3][3];
+ for (int ii = -1; ii <= 1; ++ii) {
+ for (int jj = -1; jj <= 1; ++jj) {
+ const int idx = center_idx + ii * stride + jj;
+ mat[ii + 1][jj + 1] = src[idx];
+ }
+ }
+      // Compute Sobel gradients.
+ const int Gx = (mat[0][0] - mat[0][2]) + (mat[2][0] - mat[2][2]) +
+ 2 * (mat[1][0] - mat[1][2]);
+ const int Gy = (mat[0][0] - mat[2][0]) + (mat[0][2] - mat[2][2]) +
+ 2 * (mat[0][1] - mat[2][1]);
+ const int Ga = ROUND_POWER_OF_TWO(abs(Gx) + abs(Gy), 0);
+ // Accumulate Laplacian.
+ if (Ga < edge_thresh) { // Only count smooth pixels.
+ const int v = 4 * mat[1][1] -
+ 2 * (mat[0][1] + mat[2][1] + mat[1][0] + mat[1][2]) +
+ (mat[0][0] + mat[0][2] + mat[2][0] + mat[2][2]);
+ accum += ROUND_POWER_OF_TWO(abs(v), 0);
+ ++count;
+ }
+ }
+ }
+ // Return -1.0 (unreliable estimation) if there are too few smooth pixels.
+ return (count < 16) ? -1.0 : (double)accum / (6 * count) * SQRT_PI_BY_2;
+}
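
For reference, the closing line is the classic Laplacian-operator noise
estimate: the 3x3 mask computed above has an L2 norm of 6, so its response
to i.i.d. Gaussian noise of standard deviation sigma is zero-mean with
standard deviation 6*sigma, and E|X| = sigma_X * sqrt(2/pi) for a zero-mean
Gaussian gives

    \hat{\sigma} = \sqrt{\frac{\pi}{2}} \cdot \frac{1}{6N}
                   \sum_{p\ \mathrm{smooth}} \bigl| (I * L)(p) \bigr|,
    \qquad L = \begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\
                               1 & -2 & 1 \end{pmatrix}

which is exactly accum / (6 * count) * SQRT_PI_BY_2 over the N smooth pixels.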
+
+#if CONFIG_AV1_HIGHBITDEPTH
+double av1_highbd_estimate_noise_from_single_plane(const uint16_t *src16,
+ int height, int width,
+ const int stride,
+ int bit_depth,
+ int edge_thresh) {
int64_t accum = 0;
int count = 0;
for (int i = 1; i < height - 1; ++i) {
@@ -1133,7 +1181,7 @@
for (int ii = -1; ii <= 1; ++ii) {
for (int jj = -1; jj <= 1; ++jj) {
const int idx = center_idx + ii * stride + jj;
- mat[ii + 1][jj + 1] = is_high_bitdepth ? src16[idx] : src[idx];
+ mat[ii + 1][jj + 1] = src16[idx];
}
}
// Compute sobel gradients.
@@ -1156,6 +1204,35 @@
// Return -1.0 (unreliable estimation) if there are too few smooth pixels.
return (count < 16) ? -1.0 : (double)accum / (6 * count) * SQRT_PI_BY_2;
}
+#endif
+
+void av1_estimate_noise_level(const YV12_BUFFER_CONFIG *frame,
+ double *noise_level, int plane_from, int plane_to,
+ int bit_depth, int edge_thresh) {
+ for (int plane = plane_from; plane <= plane_to; plane++) {
+ const bool is_uv_plane = (plane != AOM_PLANE_Y);
+ const int height = frame->crop_heights[is_uv_plane];
+ const int width = frame->crop_widths[is_uv_plane];
+ const int stride = frame->strides[is_uv_plane];
+ const uint8_t *src = frame->buffers[plane];
+
+#if CONFIG_AV1_HIGHBITDEPTH
+ const uint16_t *src16 = CONVERT_TO_SHORTPTR(src);
+ const int is_high_bitdepth = is_frame_high_bitdepth(frame);
+ if (is_high_bitdepth) {
+ noise_level[plane] = av1_highbd_estimate_noise_from_single_plane(
+ src16, height, width, stride, bit_depth, edge_thresh);
+ } else {
+ noise_level[plane] = av1_estimate_noise_from_single_plane(
+ src, height, width, stride, edge_thresh);
+ }
+#else
+ (void)bit_depth;
+ noise_level[plane] = av1_estimate_noise_from_single_plane(
+ src, height, width, stride, edge_thresh);
+#endif
+ }
+}
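
A minimal usage sketch of the new multi-plane wrapper; the wrapper function
and its argument values are illustrative, while the constants and types are
libaom's:

    /* Sketch: per-plane noise estimation with the new wrapper. */
    static void estimate_noise_example(const YV12_BUFFER_CONFIG *frame,
                                       int num_planes, int bit_depth) {
      double noise_levels[MAX_MB_PLANE] = { 0 };
      // Luma only:
      av1_estimate_noise_level(frame, noise_levels, AOM_PLANE_Y,
                               AOM_PLANE_Y, bit_depth,
                               NOISE_ESTIMATION_EDGE_THRESHOLD);
      // All planes, as the temporal-filter setup above does:
      av1_estimate_noise_level(frame, noise_levels, AOM_PLANE_Y,
                               num_planes - 1, bit_depth,
                               NOISE_ESTIMATION_EDGE_THRESHOLD);
    }
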
// Initializes the members of TemporalFilterCtx
// Inputs:
@@ -1293,7 +1370,7 @@
seq_params->subsampling_x, seq_params->subsampling_y,
seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
cm->features.byte_alignment, NULL, NULL, NULL,
- cpi->oxcf.tool_cfg.enable_global_motion, 0);
+ cpi->image_pyramid_levels, 0);
if (ret) {
aom_internal_error(cm->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate tf_info");
diff --git a/av1/encoder/temporal_filter.h b/av1/encoder/temporal_filter.h
index 725bd86..8aa4731 100644
--- a/av1/encoder/temporal_filter.h
+++ b/av1/encoder/temporal_filter.h
@@ -33,6 +33,9 @@
// Window size for temporal filtering.
#define TF_WINDOW_LENGTH 5
+// A constant number, sqrt(pi / 2), used for noise estimation.
+static const double SQRT_PI_BY_2 = 1.25331413732;
+
// Hyper-parameters used to compute filtering weight. These hyper-parameters can
// be tuned for a better performance.
// 0. A scale factor used in temporal filtering to raise the filter weight from
@@ -268,15 +271,15 @@
// Signal Processing, 2008, St Julians, Malta.
// Inputs:
// frame: Pointer to the frame to estimate noise level from.
-// plane: Index of the plane used for noise estimation. Commonly, 0 for
-// Y-plane, 1 for U-plane, and 2 for V-plane.
+// noise_level: Pointer to the array that receives the estimated noise level
+//              of each plane.
+// plane_from: Index of the first plane used for noise estimation.
+//             Commonly, 0 for Y-plane, 1 for U-plane, and 2 for V-plane.
+// plane_to: Index of the last plane used for noise estimation.
// bit_depth: Actual bit-depth instead of the encoding bit-depth of the frame.
-// Returns:
-// The estimated noise, or -1.0 if there are too few smooth pixels.
-double av1_estimate_noise_from_single_plane(const YV12_BUFFER_CONFIG *frame,
- const int plane,
- const int bit_depth,
- const int edge_thresh);
+// edge_thresh: Edge threshold; pixels whose gradient magnitude reaches it
+//              are treated as edges and excluded from the estimate.
+void av1_estimate_noise_level(const YV12_BUFFER_CONFIG *frame,
+ double *noise_level, int plane_from, int plane_to,
+ int bit_depth, int edge_thresh);
/*!\endcond */
/*!\brief Does temporal filter for a given macroblock row.
diff --git a/av1/encoder/tpl_model.c b/av1/encoder/tpl_model.c
index ef59c99..3aeb511 100644
--- a/av1/encoder/tpl_model.c
+++ b/av1/encoder/tpl_model.c
@@ -9,8 +9,9 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
-#include <stdint.h>
+#include <assert.h>
#include <float.h>
+#include <stdint.h>
#include "av1/encoder/thirdpass.h"
#include "config/aom_config.h"
@@ -57,6 +58,7 @@
sizeof(tpl_txfm_stats->abs_coeff_mean[0]) * tpl_txfm_stats->coeff_num);
}
+#if CONFIG_BITRATE_ACCURACY
void av1_accumulate_tpl_txfm_stats(const TplTxfmStats *sub_stats,
TplTxfmStats *accumulated_stats) {
accumulated_stats->txfm_block_count += sub_stats->txfm_block_count;
@@ -93,6 +95,7 @@
const int frame_index) {
tpl_data->txfm_stats_list[frame_index] = *tpl_txfm_stats;
}
+#endif // CONFIG_BITRATE_ACCURACY
static AOM_INLINE void get_quantize_error(const MACROBLOCK *x, int plane,
const tran_low_t *coeff,
@@ -190,7 +193,7 @@
&tpl_data->tpl_rec_pool[frame], width, height,
seq_params->subsampling_x, seq_params->subsampling_y,
seq_params->use_highbitdepth, tpl_data->border_in_pixels,
- byte_alignment, alloc_y_plane_only))
+ byte_alignment, 0, alloc_y_plane_only))
aom_internal_error(&ppi->error, AOM_CODEC_MEM_ERROR,
"Failed to allocate frame buffer");
}
@@ -217,8 +220,8 @@
int rate_cost = 1;
for (int idx = 0; idx < eob; ++idx) {
- int abs_level = abs(qcoeff[scan_order->scan[idx]]);
- rate_cost += (int)(log(abs_level + 1.0) / log(2.0)) + 1 + (abs_level > 0);
+ unsigned int abs_level = abs(qcoeff[scan_order->scan[idx]]);
+ rate_cost += get_msb(abs_level + 1) + 1 + (abs_level > 0);
}
return (rate_cost << AV1_PROB_COST_SHIFT);
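
The rewritten loop replaces two log() calls per coefficient with get_msb(),
whose contract is floor(log2(v)) for v > 0; aside from floating-point
rounding at exact powers of two, (int)(log(x + 1.0) / log(2.0)) and
get_msb(x + 1) agree. A portable sketch of that contract (get_msb itself is
assumed to be libaom's bit-scan helper):

    /* floor(log2(v)) for v > 0, the behavior assumed of get_msb(). */
    static int msb(unsigned int v) {
      int n = -1;
      while (v) {
        v >>= 1;
        ++n;
      }
      return n;  /* e.g. msb(1) = 0, msb(2) = msb(3) = 1, msb(8) = 3 */
    }
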
@@ -228,7 +231,7 @@
const MACROBLOCK *x, int16_t *src_diff, int diff_stride, uint8_t *src,
int src_stride, uint8_t *dst, int dst_stride, tran_low_t *coeff,
tran_low_t *qcoeff, tran_low_t *dqcoeff, int bw, int bh, TX_SIZE tx_size,
- int *rate_cost, int64_t *recon_error, int64_t *sse) {
+ int do_recon, int *rate_cost, int64_t *recon_error, int64_t *sse) {
const MACROBLOCKD *xd = &x->e_mbd;
const BitDepthInfo bd_info = get_bit_depth_info(xd);
uint16_t eob;
@@ -241,8 +244,9 @@
*rate_cost = rate_estimator(qcoeff, eob, tx_size);
- av1_inverse_transform_block(xd, dqcoeff, 0, DCT_DCT, tx_size, dst, dst_stride,
- eob, 0);
+ if (do_recon)
+ av1_inverse_transform_block(xd, dqcoeff, 0, DCT_DCT, tx_size, dst,
+ dst_stride, eob, 0);
}
static uint32_t motion_estimation(AV1_COMP *cpi, MACROBLOCK *x,
@@ -277,14 +281,21 @@
FULLPEL_MOTION_SEARCH_PARAMS full_ms_params;
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, x, bsize, ¢er_mv,
- search_site_cfg,
+ start_mv, search_site_cfg,
/*fine_search_interval=*/0);
av1_set_mv_search_method(&full_ms_params, search_site_cfg,
tpl_sf->search_method);
- av1_full_pixel_search(start_mv, &full_ms_params, step_param,
- cond_cost_list(cpi, cost_list), &best_mv->as_fullmv,
- NULL);
+ bestsme = av1_full_pixel_search(start_mv, &full_ms_params, step_param,
+ cond_cost_list(cpi, cost_list),
+ &best_mv->as_fullmv, NULL);
+
+ // When sub-pel motion search is skipped, populate sub-pel precision MV and
+ // return.
+ if (tpl_sf->subpel_force_stop == FULL_PEL) {
+ best_mv->as_mv = get_mv_from_fullmv(&best_mv->as_fullmv);
+ return bestsme;
+ }
SUBPEL_MOTION_SEARCH_PARAMS ms_params;
av1_make_default_subpel_ms_params(&ms_params, cpi, x, bsize, ¢er_mv,
@@ -337,13 +348,15 @@
tran_low_t *dqcoeff, AV1_COMMON *cm, MACROBLOCK *x,
const YV12_BUFFER_CONFIG *ref_frame_ptr[2], uint8_t *rec_buffer_pool[3],
const int rec_stride_pool[3], TX_SIZE tx_size, PREDICTION_MODE best_mode,
- int mi_row, int mi_col, int use_y_only_rate_distortion,
+ int mi_row, int mi_col, int use_y_only_rate_distortion, int do_recon,
TplTxfmStats *tpl_txfm_stats) {
const SequenceHeader *seq_params = cm->seq_params;
*rate_cost = 0;
*recon_error = 1;
*pred_error = 1;
+ (void)tpl_txfm_stats;
+
MACROBLOCKD *xd = &x->e_mbd;
int is_compound = (best_mode == NEW_NEWMV);
int num_planes = use_y_only_rate_distortion ? 1 : MAX_MB_PLANE;
@@ -423,12 +436,14 @@
src_buffer_pool[plane] + src_mb_offset, src_stride, dst_buffer,
dst_buffer_stride, coeff, qcoeff, dqcoeff, block_size_wide[bsize_plane],
block_size_high[bsize_plane], max_txsize_rect_lookup[bsize_plane],
- &this_rate, &this_recon_error, &sse);
+ do_recon, &this_rate, &this_recon_error, &sse);
+#if CONFIG_BITRATE_ACCURACY
if (plane == 0 && tpl_txfm_stats) {
// We only collect Y plane's transform coefficient
av1_record_tpl_txfm_block(tpl_txfm_stats, coeff);
}
+#endif // CONFIG_BITRATE_ACCURACY
*recon_error += this_recon_error;
*pred_error += sse;
@@ -443,6 +458,7 @@
TplDepStats *tpl_stats) {
AV1_COMMON *cm = &cpi->common;
const GF_GROUP *gf_group = &cpi->ppi->gf_group;
+ TPL_SPEED_FEATURES *tpl_sf = &cpi->sf.tpl_sf;
(void)gf_group;
@@ -471,7 +487,7 @@
mi_row * MI_SIZE * tpl_frame->rec_picture->y_stride + mi_col * MI_SIZE;
uint8_t *dst_buffer = tpl_frame->rec_picture->y_buffer + dst_mb_offset;
int dst_buffer_stride = tpl_frame->rec_picture->y_stride;
- int use_y_only_rate_distortion = cpi->sf.tpl_sf.use_y_only_rate_distortion;
+ int use_y_only_rate_distortion = tpl_sf->use_y_only_rate_distortion;
uint8_t *rec_buffer_pool[3] = {
tpl_frame->rec_picture->y_buffer,
@@ -550,7 +566,7 @@
// if cpi->sf.tpl_sf.prune_intra_modes is on, then search only DC_PRED,
// H_PRED, and V_PRED
const PREDICTION_MODE last_intra_mode =
- cpi->sf.tpl_sf.prune_intra_modes ? D45_PRED : INTRA_MODE_END;
+ tpl_sf->prune_intra_modes ? D45_PRED : INTRA_MODE_END;
const SequenceHeader *seq_params = cm->seq_params;
for (PREDICTION_MODE mode = INTRA_MODE_START; mode < last_intra_mode;
++mode) {
@@ -576,7 +592,7 @@
get_rate_distortion(&rate_cost, &recon_error, &pred_error, src_diff, coeff,
qcoeff, dqcoeff, cm, x, NULL, rec_buffer_pool,
rec_stride_pool, tx_size, best_mode, mi_row, mi_col,
- use_y_only_rate_distortion, NULL);
+ use_y_only_rate_distortion, 1 /*do_recon*/, NULL);
tpl_stats->intra_dist = recon_error << TPL_DEP_COST_SCALE_LOG2;
tpl_stats->intra_sse = pred_error << TPL_DEP_COST_SCALE_LOG2;
@@ -656,7 +672,7 @@
TplDepStats *ref_tpl_stats = &tpl_frame->tpl_stats_ptr[av1_tpl_ptr_pos(
mi_row - mi_height, mi_col, tpl_frame->stride, block_mis_log2)];
if (!is_alike_mv(ref_tpl_stats->mv[rf_idx], center_mvs, refmv_count,
- cpi->sf.tpl_sf.skip_alike_starting_mv)) {
+ tpl_sf->skip_alike_starting_mv)) {
center_mvs[refmv_count].mv.as_int = ref_tpl_stats->mv[rf_idx].as_int;
++refmv_count;
}
@@ -666,7 +682,7 @@
TplDepStats *ref_tpl_stats = &tpl_frame->tpl_stats_ptr[av1_tpl_ptr_pos(
mi_row, mi_col - mi_width, tpl_frame->stride, block_mis_log2)];
if (!is_alike_mv(ref_tpl_stats->mv[rf_idx], center_mvs, refmv_count,
- cpi->sf.tpl_sf.skip_alike_starting_mv)) {
+ tpl_sf->skip_alike_starting_mv)) {
center_mvs[refmv_count].mv.as_int = ref_tpl_stats->mv[rf_idx].as_int;
++refmv_count;
}
@@ -677,7 +693,7 @@
mi_row - mi_height, mi_col + mi_width, tpl_frame->stride,
block_mis_log2)];
if (!is_alike_mv(ref_tpl_stats->mv[rf_idx], center_mvs, refmv_count,
- cpi->sf.tpl_sf.skip_alike_starting_mv)) {
+ tpl_sf->skip_alike_starting_mv)) {
center_mvs[refmv_count].mv.as_int = ref_tpl_stats->mv[rf_idx].as_int;
++refmv_count;
}
@@ -696,13 +712,13 @@
rf_idx + LAST_FRAME);
if (tp_mv.as_int != INVALID_MV &&
!is_alike_mv(tp_mv, center_mvs + 1, refmv_count - 1,
- cpi->sf.tpl_sf.skip_alike_starting_mv)) {
+ tpl_sf->skip_alike_starting_mv)) {
center_mvs[0].mv = tp_mv;
}
}
// Prune starting mvs
- if (cpi->sf.tpl_sf.prune_starting_mv) {
+ if (tpl_sf->prune_starting_mv && refmv_count > 1) {
// Get each center mv's sad.
for (idx = 0; idx < refmv_count; ++idx) {
FULLPEL_MV mv = get_fullmv_from_mv(¢er_mvs[idx].mv.as_mv);
@@ -713,10 +729,9 @@
}
// Rank center_mv using sad.
- if (refmv_count > 1) {
- qsort(center_mvs, refmv_count, sizeof(center_mvs[0]), compare_sad);
- }
- refmv_count = AOMMIN(4 - cpi->sf.tpl_sf.prune_starting_mv, refmv_count);
+ qsort(center_mvs, refmv_count, sizeof(center_mvs[0]), compare_sad);
+
+ refmv_count = AOMMIN(4 - tpl_sf->prune_starting_mv, refmv_count);
// Further reduce number of refmv based on sad difference.
if (refmv_count > 1) {
int last_sad = center_mvs[refmv_count - 1].sad;
@@ -741,21 +756,31 @@
tpl_stats->mv[rf_idx].as_int = best_rfidx_mv.as_int;
single_mv[rf_idx] = best_rfidx_mv;
- struct buf_2d ref_buf = { NULL, ref_frame_ptr->y_buffer,
- ref_frame_ptr->y_width, ref_frame_ptr->y_height,
- ref_frame_ptr->y_stride };
- InterPredParams inter_pred_params;
- av1_init_inter_params(&inter_pred_params, bw, bh, mi_row * MI_SIZE,
- mi_col * MI_SIZE, 0, 0, xd->bd, is_cur_buf_hbd(xd), 0,
- &tpl_data->sf, &ref_buf, kernel);
- inter_pred_params.conv_params = get_conv_params(0, 0, xd->bd);
+ if (tpl_sf->subpel_force_stop != FULL_PEL) {
+ struct buf_2d ref_buf = { NULL, ref_frame_ptr->y_buffer,
+ ref_frame_ptr->y_width, ref_frame_ptr->y_height,
+ ref_frame_ptr->y_stride };
+ InterPredParams inter_pred_params;
+ av1_init_inter_params(&inter_pred_params, bw, bh, mi_row * MI_SIZE,
+ mi_col * MI_SIZE, 0, 0, xd->bd, is_cur_buf_hbd(xd),
+ 0, &tpl_data->sf, &ref_buf, kernel);
+ inter_pred_params.conv_params = get_conv_params(0, 0, xd->bd);
- av1_enc_build_one_inter_predictor(predictor, bw, &best_rfidx_mv.as_mv,
- &inter_pred_params);
+ av1_enc_build_one_inter_predictor(predictor, bw, &best_rfidx_mv.as_mv,
+ &inter_pred_params);
- inter_cost =
- tpl_get_satd_cost(bd_info, src_diff, bw, src_mb_buffer, src_stride,
- predictor, bw, coeff, bw, bh, tx_size);
+ inter_cost =
+ tpl_get_satd_cost(bd_info, src_diff, bw, src_mb_buffer, src_stride,
+ predictor, bw, coeff, bw, bh, tx_size);
+ } else {
+ const FULLPEL_MV best_fullmv = get_fullmv_from_mv(&best_rfidx_mv.as_mv);
+      // Since sub-pel motion search is not performed, use the prediction
+      // pixels directly from the reference block ref_mb.
+ inter_cost = tpl_get_satd_cost(
+ bd_info, src_diff, bw, src_mb_buffer, src_stride,
+ &ref_mb[best_fullmv.row * ref_stride + best_fullmv.col], ref_stride,
+ coeff, bw, bh, tx_size);
+ }
// Store inter cost for each ref frame
tpl_stats->pred_error[rf_idx] = AOMMAX(1, inter_cost);
@@ -782,7 +807,7 @@
int start_rf = 0;
int end_rf = 3;
- if (!cpi->sf.tpl_sf.allow_compound_pred) end_rf = 0;
+ if (!tpl_sf->allow_compound_pred) end_rf = 0;
if (cpi->third_pass_ctx &&
frame_offset < cpi->third_pass_ctx->frame_info_count &&
tpl_data->frame_idx < gf_group->size) {
@@ -802,10 +827,10 @@
break;
}
}
- if (!found || !cpi->sf.tpl_sf.allow_compound_pred) {
+ if (!found || !tpl_sf->allow_compound_pred) {
comp_ref_frames[2][0] = this_mi->ref_frame[0] - LAST_FRAME;
comp_ref_frames[2][1] = this_mi->ref_frame[1] - LAST_FRAME;
- if (!cpi->sf.tpl_sf.allow_compound_pred) {
+ if (!tpl_sf->allow_compound_pred) {
start_rf = 2;
end_rf = 3;
}
@@ -854,7 +879,8 @@
int_mv tmp_mv[2] = { single_mv[rf_idx0], single_mv[rf_idx1] };
int rate_mv;
av1_joint_motion_search(cpi, x, bsize, tmp_mv, NULL, 0, &rate_mv,
- !cpi->sf.mv_sf.disable_second_mv);
+ !cpi->sf.mv_sf.disable_second_mv,
+ NUM_JOINT_ME_REFINE_ITER);
for (int ref = 0; ref < 2; ++ref) {
struct buf_2d ref_buf = { NULL, ref_frame_ptr[ref]->y_buffer,
@@ -892,7 +918,7 @@
xd->mi[0]->ref_frame[1] = best_rf_idx1 + LAST_FRAME;
}
- if (best_inter_cost < INT32_MAX) {
+ if (best_inter_cost < INT32_MAX && is_inter_mode(best_mode)) {
xd->mi[0]->mv[0].as_int = best_mv[0].as_int;
xd->mi[0]->mv[1].as_int = best_mv[1].as_int;
const YV12_BUFFER_CONFIG *ref_frame_ptr[2] = {
@@ -907,7 +933,7 @@
get_rate_distortion(&rate_cost, &recon_error, &pred_error, src_diff, coeff,
qcoeff, dqcoeff, cm, x, ref_frame_ptr, rec_buffer_pool,
rec_stride_pool, tx_size, best_mode, mi_row, mi_col,
- use_y_only_rate_distortion, NULL);
+ use_y_only_rate_distortion, 0 /*do_recon*/, NULL);
tpl_stats->srcrf_rate = rate_cost;
}
@@ -935,7 +961,8 @@
get_rate_distortion(&rate_cost, &recon_error, &pred_error, src_diff, coeff,
qcoeff, dqcoeff, cm, x, ref_frame_ptr, rec_buffer_pool,
rec_stride_pool, tx_size, best_mode, mi_row, mi_col,
- use_y_only_rate_distortion, tpl_txfm_stats);
+ use_y_only_rate_distortion, 1 /*do_recon*/,
+ tpl_txfm_stats);
tpl_stats->recrf_dist = recon_error << TPL_DEP_COST_SCALE_LOG2;
tpl_stats->recrf_sse = pred_error << TPL_DEP_COST_SCALE_LOG2;
@@ -957,7 +984,7 @@
get_rate_distortion(&rate_cost, &recon_error, &pred_error, src_diff, coeff,
qcoeff, dqcoeff, cm, x, ref_frame_ptr, rec_buffer_pool,
rec_stride_pool, tx_size, best_mode, mi_row, mi_col,
- use_y_only_rate_distortion, NULL);
+ use_y_only_rate_distortion, 1 /*do_recon*/, NULL);
tpl_stats->cmp_recrf_dist[0] = recon_error << TPL_DEP_COST_SCALE_LOG2;
tpl_stats->cmp_recrf_rate[0] = rate_cost;
@@ -978,7 +1005,7 @@
get_rate_distortion(&rate_cost, &recon_error, &pred_error, src_diff, coeff,
qcoeff, dqcoeff, cm, x, ref_frame_ptr, rec_buffer_pool,
rec_stride_pool, tx_size, best_mode, mi_row, mi_col,
- use_y_only_rate_distortion, NULL);
+ use_y_only_rate_distortion, 1 /*do_recon*/, NULL);
tpl_stats->cmp_recrf_dist[1] = recon_error << TPL_DEP_COST_SCALE_LOG2;
tpl_stats->cmp_recrf_rate[1] = rate_cost;
@@ -1315,6 +1342,10 @@
// Initialize x->mbmi_ext when compound predictions are enabled.
if (cpi->sf.tpl_sf.allow_compound_pred) av1_zero(x->mbmi_ext);
+
+  // Set xd->mi to null since mbmi is allocated only within this function.
+ assert(xd->mi == &mbmi_ptr);
+ xd->mi = NULL;
}
// This function stores the motion estimation dependencies of all the blocks in
@@ -1756,8 +1787,10 @@
} else {
mc_flow_dispenser(cpi);
}
+#if CONFIG_BITRATE_ACCURACY
av1_tpl_txfm_stats_update_abs_coeff_mean(&cpi->td.tpl_txfm_stats);
av1_tpl_store_txfm_stats(tpl_data, &cpi->td.tpl_txfm_stats, frame_idx);
+#endif // CONFIG_BITRATE_ACCURACY
#if CONFIG_RATECTRL_LOG && CONFIG_THREE_PASS && CONFIG_BITRATE_ACCURACY
if (cpi->oxcf.pass == AOM_RC_THIRD_PASS) {
int frame_coding_idx =
@@ -2057,6 +2090,7 @@
RDCOST(tpl_frame->base_rdmult, this_stats->mc_dep_rate,
this_stats->mc_dep_dist);
double dist_scaled = (double)(this_stats->recrf_dist << RDDIV_BITS);
+ dist_scaled = AOMMAX(dist_scaled, 1);
intra_cost_base += log(dist_scaled) * cbcmp;
mc_dep_cost_base += log(dist_scaled + mc_dep_delta) * cbcmp;
cbcmp_base += cbcmp;
diff --git a/av1/encoder/tpl_model.h b/av1/encoder/tpl_model.h
index 71cc320..36c3ae0 100644
--- a/av1/encoder/tpl_model.h
+++ b/av1/encoder/tpl_model.h
@@ -485,6 +485,7 @@
*/
void av1_init_tpl_txfm_stats(TplTxfmStats *tpl_txfm_stats);
+#if CONFIG_BITRATE_ACCURACY
/*
*!\brief Accumulate TplTxfmStats
*
@@ -516,6 +517,7 @@
* \param[in] txfm_stats A structure for storing transform stats
*/
void av1_tpl_txfm_stats_update_abs_coeff_mean(TplTxfmStats *txfm_stats);
+#endif // CONFIG_BITRATE_ACCURACY
/*!\brief Estimate coefficient entropy using Laplace distribution
*
diff --git a/av1/encoder/tune_butteraugli.c b/av1/encoder/tune_butteraugli.c
index 2f057e1..8f59373 100644
--- a/av1/encoder/tune_butteraugli.c
+++ b/av1/encoder/tune_butteraugli.c
@@ -209,7 +209,7 @@
if (dst->buffer_alloc_sz == 0) {
aom_alloc_frame_buffer(
dst, width, height, ss_x, ss_y, cm->seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0);
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0);
}
av1_copy_and_extend_frame(cpi->source, dst);
@@ -218,7 +218,7 @@
aom_alloc_frame_buffer(
resized_dst, width / resize_factor, height / resize_factor, ss_x, ss_y,
cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
}
av1_resize_and_extend_frame_nonnormative(cpi->source, resized_dst, bit_depth,
av1_num_planes(cm));
@@ -241,7 +241,7 @@
aom_alloc_frame_buffer(
&resized_recon, width / resize_factor, height / resize_factor, ss_x, ss_y,
cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
copy_img(&cpi->common.cur_frame->buf, &resized_recon, width / resize_factor,
height / resize_factor);
@@ -264,13 +264,12 @@
cpi->source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_source, &cpi->scaled_source, cm->features.interp_filter,
- 0, false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ 0, false, false, cpi->oxcf.border_in_pixels, cpi->image_pyramid_levels);
if (cpi->unscaled_last_source != NULL) {
cpi->last_source = av1_realloc_and_scale_if_required(
cm, cpi->unscaled_last_source, &cpi->scaled_last_source,
cm->features.interp_filter, 0, false, false, cpi->oxcf.border_in_pixels,
- cpi->oxcf.tool_cfg.enable_global_motion);
+ cpi->image_pyramid_levels);
}
av1_setup_butteraugli_source(cpi);
@@ -299,9 +298,8 @@
av1_set_quantizer(cm, q_cfg->qm_minlevel, q_cfg->qm_maxlevel, q_index,
q_cfg->enable_chroma_deltaq, q_cfg->enable_hdr_deltaq);
av1_set_speed_features_qindex_dependent(cpi, oxcf->speed);
- if (q_cfg->deltaq_mode != NO_DELTA_Q || q_cfg->enable_chroma_deltaq)
- av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
- cm->seq_params->bit_depth);
+ av1_init_quantizer(&cpi->enc_quant_dequant_params, &cm->quant_params,
+ cm->seq_params->bit_depth);
av1_set_variance_partition_thresholds(cpi, q_index, 0);
av1_encode_frame(cpi);
diff --git a/av1/encoder/tune_vmaf.c b/av1/encoder/tune_vmaf.c
index 46260a6..9c7c112 100644
--- a/av1/encoder/tune_vmaf.c
+++ b/av1/encoder/tune_vmaf.c
@@ -63,7 +63,7 @@
// Do motion search.
// Only do full search on the entire block.
av1_make_default_fullpel_ms_params(&full_ms_params, cpi, mb, block_size,
- &baseline_mv, search_site_cfg,
+ &baseline_mv, *ref_mv, search_site_cfg,
/*fine_search_interval=*/0);
av1_set_mv_search_method(&full_ms_params, search_site_cfg, search_method);
av1_full_pixel_search(*ref_mv, &full_ms_params, step_param,
@@ -341,7 +341,7 @@
aom_alloc_frame_buffer(
&sharpened, width, height, source->subsampling_x, source->subsampling_y,
cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
const double baseline_variance = frame_average_variance(cpi, source);
double unsharp_amount;
@@ -393,7 +393,7 @@
aom_alloc_frame_buffer(
&blurred, width, height, source->subsampling_x, source->subsampling_y,
cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
gaussian_blur(bit_depth, source, &blurred);
unsharp(cpi, source, &blurred, source, best_frame_unsharp_amount);
@@ -413,11 +413,11 @@
aom_alloc_frame_buffer(
&source_extended, width, height, source->subsampling_x,
source->subsampling_y, cm->seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0);
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(
&blurred, width, height, source->subsampling_x, source->subsampling_y,
cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
av1_copy_and_extend_frame(source, &source_extended);
gaussian_blur(bit_depth, &source_extended, &blurred);
@@ -453,11 +453,11 @@
memset(&source_extended, 0, sizeof(source_extended));
aom_alloc_frame_buffer(
&blurred, width, height, ss_x, ss_y, cm->seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0);
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&source_extended, width, height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
av1_copy_and_extend_frame(source, &source_extended);
gaussian_blur(bit_depth, &source_extended, &blurred);
@@ -493,11 +493,11 @@
aom_alloc_frame_buffer(&source_block, block_w, block_h, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&blurred_block, block_w, block_h, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
for (int row = 0; row < num_rows; ++row) {
for (int col = 0; col < num_cols; ++col) {
@@ -620,7 +620,7 @@
aom_alloc_frame_buffer(
&resized_source, y_width / resize_factor, y_height / resize_factor, ss_x,
ss_y, cm->seq_params->use_highbitdepth, cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
av1_resize_and_extend_frame_nonnormative(cpi->source, &resized_source,
bit_depth, av1_num_planes(cm));
@@ -638,7 +638,7 @@
aom_alloc_frame_buffer(&blurred, resized_y_width, resized_y_height, ss_x,
ss_y, cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
gaussian_blur(bit_depth, &resized_source, &blurred);
YV12_BUFFER_CONFIG recon;
@@ -646,7 +646,7 @@
aom_alloc_frame_buffer(&recon, resized_y_width, resized_y_height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_yv12_copy_frame(&resized_source, &recon, 1);
VmafContext *vmaf_context;
@@ -825,15 +825,15 @@
aom_alloc_frame_buffer(&blurred_cur, y_width, y_height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&blurred_last, y_width, y_height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&blurred_next, y_width, y_height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
gaussian_blur(bit_depth, cur, &blurred_cur);
gaussian_blur(bit_depth, last, &blurred_last);
@@ -935,7 +935,8 @@
const double dvmaf = 26.11 * (1.0 - exp(-0.06 * motion));
const double dsse = dvmaf * approx_sse / approx_dvmaf;
- const double beta = approx_sse / (dsse + approx_sse);
+ // Clamping beta to address VQ issue (aomedia:3170).
+ const double beta = AOMMAX(approx_sse / (dsse + approx_sse), 0.5);
const int offset =
av1_get_deltaq_offset(cm->seq_params->bit_depth, current_qindex, beta);
int qindex = current_qindex + offset;
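
The clamp bounds how far the motion term can push the delta-q. Assuming
av1_get_deltaq_offset() scales the quantization step by 1/sqrt(beta), as its
use elsewhere in the delta-q code suggests, the step-size increase for
high-motion frames is now capped at sqrt(2):

    \beta = \max\!\left( \frac{\mathrm{SSE}}{\Delta\mathrm{SSE} +
            \mathrm{SSE}},\ \tfrac{1}{2} \right), \qquad
    q_{\mathrm{step}}' \approx \frac{q_{\mathrm{step}}}{\sqrt{\beta}}
            \le \sqrt{2}\, q_{\mathrm{step}}
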
@@ -1017,18 +1018,18 @@
aom_alloc_frame_buffer(&recon_sharpened, width, height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&src_sharpened, width, height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(&recon_blurred, width, height, ss_x, ss_y,
cm->seq_params->use_highbitdepth,
cpi->oxcf.border_in_pixels,
- cm->features.byte_alignment, 0);
+ cm->features.byte_alignment, 0, 0);
aom_alloc_frame_buffer(
&src_blurred, width, height, ss_x, ss_y, cm->seq_params->use_highbitdepth,
- cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0);
+ cpi->oxcf.border_in_pixels, cm->features.byte_alignment, 0, 0);
gaussian_blur(bit_depth, recon, &recon_blurred);
gaussian_blur(bit_depth, src, &src_blurred);
diff --git a/av1/encoder/tx_search.c b/av1/encoder/tx_search.c
index 74c9de2..d6217b7 100644
--- a/av1/encoder/tx_search.c
+++ b/av1/encoder/tx_search.c
@@ -2809,10 +2809,10 @@
int feature_idx = get_mean_dev_features(diff, diff_stride, bw, bh, features);
- features[feature_idx++] = logf(1.0f + (float)x->source_variance);
+ features[feature_idx++] = log1pf((float)x->source_variance);
const int dc_q = av1_dc_quant_QTX(x->qindex, 0, xd->bd) >> (xd->bd - 8);
- const float log_dc_q_square = logf(1.0f + (float)(dc_q * dc_q) / 256.0f);
+ const float log_dc_q_square = log1pf((float)(dc_q * dc_q) / 256.0f);
features[feature_idx++] = log_dc_q_square;
assert(feature_idx == NUM_INTRA_TX_SPLIT_FEATURES);
for (int i = 0; i < NUM_INTRA_TX_SPLIT_FEATURES; i++) {
@@ -2895,7 +2895,13 @@
#endif
RD_STATS this_rd_stats;
- rd[depth] = av1_uniform_txfm_yrd(cpi, x, &this_rd_stats, ref_best_rd, bs,
+ // When the speed feature use_rd_based_breakout_for_intra_tx_search is
+ // enabled, use the known minimum best_rd for early termination.
+ const int64_t rd_thresh =
+ cpi->sf.tx_sf.use_rd_based_breakout_for_intra_tx_search
+ ? AOMMIN(ref_best_rd, best_rd)
+ : ref_best_rd;
+ rd[depth] = av1_uniform_txfm_yrd(cpi, x, &this_rd_stats, rd_thresh, bs,
tx_size, FTXS_NONE, skip_trellis);
if (rd[depth] < best_rd) {
av1_copy_array(best_blk_skip, txfm_info->blk_skip, num_blks);
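
The breakout follows the usual monotone-threshold pattern: each depth is
searched against the tightest bound known so far, so later searches can
terminate as soon as they exceed it. A self-contained sketch with
hypothetical names (search_tx_size() stands in for av1_uniform_txfm_yrd()):

    #include <stdint.h>
    #define AOMMIN(a, b) ((a) < (b) ? (a) : (b))

    /* Stub standing in for the real search; a real implementation would
     * stop early once its running cost exceeds rd_thresh. */
    static int64_t search_tx_size(int depth, int64_t rd_thresh) {
      (void)rd_thresh;
      return 1000 - 100 * (int64_t)depth;  /* dummy costs for illustration */
    }

    static int64_t search_all_depths(int max_depth, int64_t ref_best_rd,
                                     int use_rd_based_breakout) {
      int64_t best_rd = INT64_MAX;
      for (int depth = 0; depth < max_depth; ++depth) {
        const int64_t rd_thresh = use_rd_based_breakout
                                      ? AOMMIN(ref_best_rd, best_rd)
                                      : ref_best_rd;
        const int64_t this_rd = search_tx_size(depth, rd_thresh);
        if (this_rd < best_rd) best_rd = this_rd;  /* tighten the bound */
      }
      return best_rd;
    }
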
diff --git a/av1/encoder/txb_rdopt.c b/av1/encoder/txb_rdopt.c
index 2f2b8fd..e551e8a 100644
--- a/av1/encoder/txb_rdopt.c
+++ b/av1/encoder/txb_rdopt.c
@@ -16,7 +16,7 @@
static INLINE void update_coeff_general(
int *accu_rate, int64_t *accu_dist, int si, int eob, TX_SIZE tx_size,
- TX_CLASS tx_class, int bwl, int height, int64_t rdmult, int shift,
+ TX_CLASS tx_class, int bhl, int width, int64_t rdmult, int shift,
int dc_sign_ctx, const int16_t *dequant, const int16_t *scan,
const LV_MAP_COEFF_COST *txb_costs, const tran_low_t *tcoeff,
tran_low_t *qcoeff, tran_low_t *dqcoeff, uint8_t *levels,
@@ -26,7 +26,7 @@
const tran_low_t qc = qcoeff[ci];
const int is_last = si == (eob - 1);
const int coeff_ctx = get_lower_levels_ctx_general(
- is_last, si, bwl, height, levels, ci, tx_size, tx_class);
+ is_last, si, bhl, width, levels, ci, tx_size, tx_class);
if (qc == 0) {
*accu_rate += txb_costs->base_cost[coeff_ctx][0];
} else {
@@ -38,7 +38,7 @@
const int64_t dist0 = get_coeff_dist(tqc, 0, shift, qmatrix, ci);
const int rate =
get_coeff_cost_general(is_last, ci, abs_qc, sign, coeff_ctx,
- dc_sign_ctx, txb_costs, bwl, tx_class, levels);
+ dc_sign_ctx, txb_costs, bhl, tx_class, levels);
const int64_t rd = RDCOST(rdmult, rate, dist);
tran_low_t qc_low, dqc_low;
@@ -55,14 +55,14 @@
dist_low = get_coeff_dist(tqc, dqc_low, shift, qmatrix, ci);
rate_low =
get_coeff_cost_general(is_last, ci, abs_qc_low, sign, coeff_ctx,
- dc_sign_ctx, txb_costs, bwl, tx_class, levels);
+ dc_sign_ctx, txb_costs, bhl, tx_class, levels);
}
rd_low = RDCOST(rdmult, rate_low, dist_low);
if (rd_low < rd) {
qcoeff[ci] = qc_low;
dqcoeff[ci] = dqc_low;
- levels[get_padded_idx(ci, bwl)] = AOMMIN(abs_qc_low, INT8_MAX);
+ levels[get_padded_idx(ci, bhl)] = AOMMIN(abs_qc_low, INT8_MAX);
*accu_rate += rate_low;
*accu_dist += dist_low - dist0;
} else {
@@ -74,7 +74,7 @@
static AOM_FORCE_INLINE void update_coeff_simple(
int *accu_rate, int si, int eob, TX_SIZE tx_size, TX_CLASS tx_class,
- int bwl, int64_t rdmult, int shift, const int16_t *dequant,
+ int bhl, int64_t rdmult, int shift, const int16_t *dequant,
const int16_t *scan, const LV_MAP_COEFF_COST *txb_costs,
const tran_low_t *tcoeff, tran_low_t *qcoeff, tran_low_t *dqcoeff,
uint8_t *levels, const qm_val_t *iqmatrix, const qm_val_t *qmatrix) {
@@ -87,7 +87,7 @@
const int ci = scan[si];
const tran_low_t qc = qcoeff[ci];
const int coeff_ctx =
- get_lower_levels_ctx(levels, ci, bwl, tx_size, tx_class);
+ get_lower_levels_ctx(levels, ci, bhl, tx_size, tx_class);
if (qc == 0) {
*accu_rate += txb_costs->base_cost[coeff_ctx][0];
} else {
@@ -96,7 +96,7 @@
const tran_low_t abs_dqc = abs(dqcoeff[ci]);
int rate_low = 0;
const int rate = get_two_coeff_cost_simple(
- ci, abs_qc, coeff_ctx, txb_costs, bwl, tx_class, levels, &rate_low);
+ ci, abs_qc, coeff_ctx, txb_costs, bhl, tx_class, levels, &rate_low);
if (abs_dqc < abs_tqc) {
*accu_rate += rate;
return;
@@ -115,7 +115,7 @@
const int sign = (qc < 0) ? 1 : 0;
qcoeff[ci] = (-sign ^ abs_qc_low) + sign;
dqcoeff[ci] = (-sign ^ abs_dqc_low) + sign;
- levels[get_padded_idx(ci, bwl)] = AOMMIN(abs_qc_low, INT8_MAX);
+ levels[get_padded_idx(ci, bhl)] = AOMMIN(abs_qc_low, INT8_MAX);
*accu_rate += rate_low;
} else {
*accu_rate += rate;
@@ -125,7 +125,7 @@
static AOM_FORCE_INLINE void update_coeff_eob(
int *accu_rate, int64_t *accu_dist, int *eob, int *nz_num, int *nz_ci,
- int si, TX_SIZE tx_size, TX_CLASS tx_class, int bwl, int height,
+ int si, TX_SIZE tx_size, TX_CLASS tx_class, int bhl, int width,
int dc_sign_ctx, int64_t rdmult, int shift, const int16_t *dequant,
const int16_t *scan, const LV_MAP_EOB_COST *txb_eob_costs,
const LV_MAP_COEFF_COST *txb_costs, const tran_low_t *tcoeff,
@@ -136,7 +136,7 @@
const int ci = scan[si];
const tran_low_t qc = qcoeff[ci];
const int coeff_ctx =
- get_lower_levels_ctx(levels, ci, bwl, tx_size, tx_class);
+ get_lower_levels_ctx(levels, ci, bhl, tx_size, tx_class);
if (qc == 0) {
*accu_rate += txb_costs->base_cost[coeff_ctx][0];
} else {
@@ -149,7 +149,7 @@
int64_t dist = get_coeff_dist(tqc, dqc, shift, qmatrix, ci) - dist0;
int rate =
get_coeff_cost_general(0, ci, abs_qc, sign, coeff_ctx, dc_sign_ctx,
- txb_costs, bwl, tx_class, levels);
+ txb_costs, bhl, tx_class, levels);
int64_t rd = RDCOST(rdmult, *accu_rate + rate, *accu_dist + dist);
tran_low_t qc_low, dqc_low;
@@ -169,18 +169,18 @@
dist_low = get_coeff_dist(tqc, dqc_low, shift, qmatrix, ci) - dist0;
rate_low =
get_coeff_cost_general(0, ci, abs_qc_low, sign, coeff_ctx,
- dc_sign_ctx, txb_costs, bwl, tx_class, levels);
+ dc_sign_ctx, txb_costs, bhl, tx_class, levels);
rd_low = RDCOST(rdmult, *accu_rate + rate_low, *accu_dist + dist_low);
}
int lower_level_new_eob = 0;
const int new_eob = si + 1;
- const int coeff_ctx_new_eob = get_lower_levels_ctx_eob(bwl, height, si);
+ const int coeff_ctx_new_eob = get_lower_levels_ctx_eob(bhl, width, si);
const int new_eob_cost =
get_eob_cost(new_eob, txb_eob_costs, txb_costs, tx_class);
int rate_coeff_eob =
new_eob_cost + get_coeff_cost_eob(ci, abs_qc, sign, coeff_ctx_new_eob,
- dc_sign_ctx, txb_costs, bwl,
+ dc_sign_ctx, txb_costs, bhl,
tx_class);
int64_t dist_new_eob = dist;
int64_t rd_new_eob = RDCOST(rdmult, rate_coeff_eob, dist_new_eob);
@@ -189,7 +189,7 @@
const int rate_coeff_eob_low =
new_eob_cost + get_coeff_cost_eob(ci, abs_qc_low, sign,
coeff_ctx_new_eob, dc_sign_ctx,
- txb_costs, bwl, tx_class);
+ txb_costs, bhl, tx_class);
const int64_t dist_new_eob_low = dist_low;
const int64_t rd_new_eob_low =
RDCOST(rdmult, rate_coeff_eob_low, dist_new_eob_low);
@@ -213,7 +213,7 @@
if (sharpness == 0 && rd_new_eob < rd) {
for (int ni = 0; ni < *nz_num; ++ni) {
int last_ci = nz_ci[ni];
- levels[get_padded_idx(last_ci, bwl)] = 0;
+ levels[get_padded_idx(last_ci, bhl)] = 0;
qcoeff[last_ci] = 0;
dqcoeff[last_ci] = 0;
}
@@ -230,7 +230,7 @@
if (lower_level) {
qcoeff[ci] = qc_low;
dqcoeff[ci] = dqc_low;
- levels[get_padded_idx(ci, bwl)] = AOMMIN(abs_qc_low, INT8_MAX);
+ levels[get_padded_idx(ci, bhl)] = AOMMIN(abs_qc_low, INT8_MAX);
}
if (qcoeff[ci]) {
nz_ci[*nz_num] = ci;
@@ -251,7 +251,7 @@
qcoeff[ci] = 0;
dqcoeff[ci] = 0;
// no need to set up levels because this is the last step
- // levels[get_padded_idx(ci, bwl)] = 0;
+ // levels[get_padded_idx(ci, bhl)] = 0;
}
*accu_rate = 0;
*eob = 0;
@@ -324,10 +324,10 @@
const TX_SIZE txs_ctx = get_txsize_entropy_ctx(tx_size);
const TX_CLASS tx_class = tx_type_to_class[tx_type];
const MB_MODE_INFO *mbmi = xd->mi[0];
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
- assert(width == (1 << bwl));
+ assert(height == (1 << bhl));
const int is_inter = is_inter_block(mbmi);
const LV_MAP_COEFF_COST *txb_costs =
&coeff_costs->coeff_costs[txs_ctx][plane_type];
@@ -344,7 +344,7 @@
rshift;
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
if (eob > 1) av1_txb_init_levels(qcoeff, width, height, levels);
@@ -365,16 +365,16 @@
int nz_ci[3] = { ci, 0, 0 };
if (abs_qc >= 2) {
update_coeff_general(&accu_rate, &accu_dist, si, eob, tx_size, tx_class,
- bwl, height, rdmult, shift, txb_ctx->dc_sign_ctx,
+ bhl, width, rdmult, shift, txb_ctx->dc_sign_ctx,
dequant, scan, txb_costs, tcoeff, qcoeff, dqcoeff,
levels, iqmatrix, qmatrix);
--si;
} else {
assert(abs_qc == 1);
- const int coeff_ctx = get_lower_levels_ctx_eob(bwl, height, si);
+ const int coeff_ctx = get_lower_levels_ctx_eob(bhl, width, si);
accu_rate +=
get_coeff_cost_eob(ci, abs_qc, sign, coeff_ctx, txb_ctx->dc_sign_ctx,
- txb_costs, bwl, tx_class);
+ txb_costs, bhl, tx_class);
const tran_low_t tqc = tcoeff[ci];
const tran_low_t dqc = dqcoeff[ci];
const int64_t dist = get_coeff_dist(tqc, dqc, shift, qmatrix, ci);
@@ -387,7 +387,7 @@
case tx_class_literal: \
for (; si >= 0 && nz_num <= max_nz_num; --si) { \
update_coeff_eob(&accu_rate, &accu_dist, &eob, &nz_num, nz_ci, si, \
- tx_size, tx_class_literal, bwl, height, \
+ tx_size, tx_class_literal, bhl, width, \
txb_ctx->dc_sign_ctx, rdmult, shift, dequant, scan, \
txb_eob_costs, txb_costs, tcoeff, qcoeff, dqcoeff, \
levels, sharpness, iqmatrix, qmatrix); \
@@ -409,7 +409,7 @@
#define UPDATE_COEFF_SIMPLE_CASE(tx_class_literal) \
case tx_class_literal: \
for (; si >= 1; --si) { \
- update_coeff_simple(&accu_rate, si, eob, tx_size, tx_class_literal, bwl, \
+ update_coeff_simple(&accu_rate, si, eob, tx_size, tx_class_literal, bhl, \
rdmult, shift, dequant, scan, txb_costs, tcoeff, \
qcoeff, dqcoeff, levels, iqmatrix, qmatrix); \
} \
@@ -427,7 +427,7 @@
// no need to update accu_dist because it's not used after this point
int64_t dummy_dist = 0;
update_coeff_general(&accu_rate, &dummy_dist, si, eob, tx_size, tx_class,
- bwl, height, rdmult, shift, txb_ctx->dc_sign_ctx,
+ bhl, width, rdmult, shift, txb_ctx->dc_sign_ctx,
dequant, scan, txb_costs, tcoeff, qcoeff, dqcoeff,
levels, iqmatrix, qmatrix);
}
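
The bwl-to-bhl rename across this file tracks the levels[] scratch buffer
switching from a width-derived stride to a height-derived one, i.e. a
transposed layout (note set_levels() now takes height and the assert checks
height == (1 << bhl)). An illustration of padded 2D indexing under such a
scheme; PAD is made up for the example, libaom has its own TX_PAD_*
constants:

    #define PAD 4 /* hypothetical; stands in for libaom's padding constant */

    /* Row `idx >> log2_stride` of the scratch buffer is followed by PAD
     * padding entries, so skip PAD extra slots per completed row. */
    static int padded_idx(int idx, int log2_stride) {
      return idx + (idx >> log2_stride) * PAD;
    }
    /* With bhl = log2(height) as the stride, scratch rows now run along
     * the transform block's height rather than its width. */
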
@@ -456,13 +456,13 @@
int reduced_tx_set_used) {
const tran_low_t *const qcoeff = p->qcoeff + BLOCK_OFFSET(block);
const int txb_skip_ctx = txb_ctx->txb_skip_ctx;
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
const SCAN_ORDER *const scan_order = get_scan(tx_size, tx_type);
const int16_t *const scan = scan_order->scan;
uint8_t levels_buf[TX_PAD_2D];
- uint8_t *const levels = set_levels(levels_buf, width);
+ uint8_t *const levels = set_levels(levels_buf, height);
DECLARE_ALIGNED(16, int8_t, coeff_contexts[MAX_TX_SQUARE]);
const int eob_multi_size = txsize_log2_minus4[tx_size];
const LV_MAP_EOB_COST *const eob_costs =
@@ -491,7 +491,7 @@
if (v) {
// sign bit cost
if (level > NUM_BASE_LEVELS) {
- const int ctx = get_br_ctx_eob(pos, bwl, tx_class);
+ const int ctx = get_br_ctx_eob(pos, bhl, tx_class);
cost += get_br_cost(level, lps_cost[ctx]);
}
if (c) {
@@ -515,7 +515,7 @@
// sign bit cost
cost += av1_cost_literal(1);
if (level > NUM_BASE_LEVELS) {
- const int ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int ctx = get_br_ctx(levels, pos, bhl, tx_class);
cost += get_br_cost(level, lps_cost[ctx]);
}
}
@@ -535,7 +535,7 @@
const int dc_sign_ctx = txb_ctx->dc_sign_ctx;
cost += coeff_costs->dc_sign_cost[dc_sign_ctx][sign01];
if (level > NUM_BASE_LEVELS) {
- const int ctx = get_br_ctx(levels, pos, bwl, tx_class);
+ const int ctx = get_br_ctx(levels, pos, bhl, tx_class);
cost += get_br_cost(level, lps_cost[ctx]);
}
}
diff --git a/av1/encoder/txb_rdopt_utils.h b/av1/encoder/txb_rdopt_utils.h
index d8158fd..b9f08aa 100644
--- a/av1/encoder/txb_rdopt_utils.h
+++ b/av1/encoder/txb_rdopt_utils.h
@@ -119,7 +119,7 @@
static AOM_FORCE_INLINE int get_two_coeff_cost_simple(
int ci, tran_low_t abs_qc, int coeff_ctx,
- const LV_MAP_COEFF_COST *txb_costs, int bwl, TX_CLASS tx_class,
+ const LV_MAP_COEFF_COST *txb_costs, int bhl, TX_CLASS tx_class,
const uint8_t *levels, int *cost_low) {
// this simple version assumes the coeff's scan_idx is not DC (scan_idx != 0)
// and not the last (scan_idx != eob - 1)
@@ -130,7 +130,7 @@
if (abs_qc) {
cost += av1_cost_literal(1);
if (abs_qc > NUM_BASE_LEVELS) {
- const int br_ctx = get_br_ctx(levels, ci, bwl, tx_class);
+ const int br_ctx = get_br_ctx(levels, ci, bhl, tx_class);
int brcost_diff = 0;
cost += get_br_cost_with_diff(abs_qc, txb_costs->lps_cost[br_ctx],
&brcost_diff);
@@ -145,7 +145,7 @@
static INLINE int get_coeff_cost_eob(int ci, tran_low_t abs_qc, int sign,
int coeff_ctx, int dc_sign_ctx,
const LV_MAP_COEFF_COST *txb_costs,
- int bwl, TX_CLASS tx_class) {
+ int bhl, TX_CLASS tx_class) {
int cost = 0;
cost += txb_costs->base_eob_cost[coeff_ctx][AOMMIN(abs_qc, 3) - 1];
if (abs_qc != 0) {
@@ -156,7 +156,7 @@
}
if (abs_qc > NUM_BASE_LEVELS) {
int br_ctx;
- br_ctx = get_br_ctx_eob(ci, bwl, tx_class);
+ br_ctx = get_br_ctx_eob(ci, bhl, tx_class);
cost += get_br_cost(abs_qc, txb_costs->lps_cost[br_ctx]);
}
}
@@ -167,7 +167,7 @@
int sign, int coeff_ctx,
int dc_sign_ctx,
const LV_MAP_COEFF_COST *txb_costs,
- int bwl, TX_CLASS tx_class,
+ int bhl, TX_CLASS tx_class,
const uint8_t *levels) {
int cost = 0;
if (is_last) {
@@ -184,9 +184,9 @@
if (abs_qc > NUM_BASE_LEVELS) {
int br_ctx;
if (is_last)
- br_ctx = get_br_ctx_eob(ci, bwl, tx_class);
+ br_ctx = get_br_ctx_eob(ci, bhl, tx_class);
else
- br_ctx = get_br_ctx(levels, ci, bwl, tx_class);
+ br_ctx = get_br_ctx(levels, ci, bhl, tx_class);
cost += get_br_cost(abs_qc, txb_costs->lps_cost[br_ctx]);
}
}
diff --git a/av1/encoder/var_based_part.c b/av1/encoder/var_based_part.c
index 3d47a28..5b8f598 100644
--- a/av1/encoder/var_based_part.c
+++ b/av1/encoder/var_based_part.c
@@ -30,8 +30,6 @@
#include "av1/encoder/var_based_part.h"
#include "av1/encoder/reconinter_enc.h"
-extern const uint8_t AV1_VAR_OFFS[];
-
// Possible values for the force_split variable while evaluating variance based
// partitioning.
enum {
@@ -50,49 +48,49 @@
static AOM_INLINE void tree_to_node(void *data, BLOCK_SIZE bsize,
variance_node *node) {
- int i;
node->part_variances = NULL;
switch (bsize) {
case BLOCK_128X128: {
VP128x128 *vt = (VP128x128 *)data;
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++)
- node->split[i] = &vt->split[i].part_variances.none;
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx].part_variances.none;
break;
}
case BLOCK_64X64: {
VP64x64 *vt = (VP64x64 *)data;
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++)
- node->split[i] = &vt->split[i].part_variances.none;
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx].part_variances.none;
break;
}
case BLOCK_32X32: {
VP32x32 *vt = (VP32x32 *)data;
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++)
- node->split[i] = &vt->split[i].part_variances.none;
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx].part_variances.none;
break;
}
case BLOCK_16X16: {
VP16x16 *vt = (VP16x16 *)data;
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++)
- node->split[i] = &vt->split[i].part_variances.none;
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx].part_variances.none;
break;
}
case BLOCK_8X8: {
VP8x8 *vt = (VP8x8 *)data;
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++)
- node->split[i] = &vt->split[i].part_variances.none;
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx].part_variances.none;
break;
}
default: {
VP4x4 *vt = (VP4x4 *)data;
assert(bsize == BLOCK_4X4);
node->part_variances = &vt->part_variances;
- for (i = 0; i < 4; i++) node->split[i] = &vt->split[i];
+ for (int split_idx = 0; split_idx < 4; split_idx++)
+ node->split[split_idx] = &vt->split[split_idx];
break;
}
}
@@ -217,12 +215,14 @@
if (mi_row + bs_height_check <= tile->mi_row_end &&
mi_col + bs_width_vert_check <= tile->mi_col_end) {
BLOCK_SIZE subsize = get_partition_subsize(bsize, PARTITION_VERT);
+ BLOCK_SIZE plane_bsize =
+ get_plane_block_size(subsize, xd->plane[AOM_PLANE_U].subsampling_x,
+ xd->plane[AOM_PLANE_U].subsampling_y);
get_variance(&vt.part_variances->vert[0]);
get_variance(&vt.part_variances->vert[1]);
if (vt.part_variances->vert[0].variance < threshold &&
vt.part_variances->vert[1].variance < threshold &&
- get_plane_block_size(subsize, xd->plane[1].subsampling_x,
- xd->plane[1].subsampling_y) < BLOCK_INVALID) {
+ plane_bsize < BLOCK_INVALID) {
set_block_size(cpi, mi_row, mi_col, subsize);
set_block_size(cpi, mi_row, mi_col + block_width / 2, subsize);
return 1;
@@ -232,12 +232,14 @@
if (mi_col + bs_width_check <= tile->mi_col_end &&
mi_row + bs_height_horiz_check <= tile->mi_row_end) {
BLOCK_SIZE subsize = get_partition_subsize(bsize, PARTITION_HORZ);
+ BLOCK_SIZE plane_bsize =
+ get_plane_block_size(subsize, xd->plane[AOM_PLANE_U].subsampling_x,
+ xd->plane[AOM_PLANE_U].subsampling_y);
get_variance(&vt.part_variances->horz[0]);
get_variance(&vt.part_variances->horz[1]);
if (vt.part_variances->horz[0].variance < threshold &&
vt.part_variances->horz[1].variance < threshold &&
- get_plane_block_size(subsize, xd->plane[1].subsampling_x,
- xd->plane[1].subsampling_y) < BLOCK_INVALID) {
+ plane_bsize < BLOCK_INVALID) {
set_block_size(cpi, mi_row, mi_col, subsize);
set_block_size(cpi, mi_row + block_height / 2, mi_col, subsize);
return 1;
@@ -251,9 +253,9 @@
static AOM_INLINE int all_blks_inside(int x16_idx, int y16_idx, int pixels_wide,
int pixels_high) {
int all_inside = 1;
- for (int k = 0; k < 4; k++) {
- all_inside &= ((x16_idx + ((k & 1) << 3)) < pixels_wide);
- all_inside &= ((y16_idx + ((k >> 1) << 3)) < pixels_high);
+ for (int idx = 0; idx < 4; idx++) {
+ all_inside &= ((x16_idx + GET_BLK_IDX_X(idx, 3)) < pixels_wide);
+ all_inside &= ((y16_idx + GET_BLK_IDX_Y(idx, 3)) < pixels_high);
}
return all_inside;
}
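
GET_BLK_IDX_X / GET_BLK_IDX_Y replace the repeated bit-twiddling for
addressing the four 8x8 sub-blocks of a 16x16 block. Presumed definitions,
matching the expressions they replace:

    /* Assumed expansions, mirroring ((idx & 1) << sft) and
     * ((idx >> 1) << sft): idx picks one of four sub-blocks in raster
     * order, and sft converts the 0/1 offset to pixels (3 for 8-pixel
     * steps). */
    #define GET_BLK_IDX_X(idx, sft) (((idx) & 1) << (sft))
    #define GET_BLK_IDX_Y(idx, sft) (((idx) >> 1) << (sft))
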
@@ -261,113 +263,116 @@
#if CONFIG_AV1_HIGHBITDEPTH
// TODO(yunqingwang): Perform average of four 8x8 blocks similar to lowbd
static AOM_INLINE void fill_variance_8x8avg_highbd(
- const uint8_t *s, int sp, const uint8_t *d, int dp, int x16_idx,
- int y16_idx, VP16x16 *vst, int pixels_wide, int pixels_high,
- int is_key_frame) {
- for (int k = 0; k < 4; k++) {
- const int x8_idx = x16_idx + ((k & 1) << 3);
- const int y8_idx = y16_idx + ((k >> 1) << 3);
+ const uint8_t *src_buf, int src_stride, const uint8_t *dst_buf,
+ int dst_stride, int x16_idx, int y16_idx, VP16x16 *vst, int pixels_wide,
+ int pixels_high) {
+ for (int idx = 0; idx < 4; idx++) {
+ const int x8_idx = x16_idx + GET_BLK_IDX_X(idx, 3);
+ const int y8_idx = y16_idx + GET_BLK_IDX_Y(idx, 3);
unsigned int sse = 0;
int sum = 0;
if (x8_idx < pixels_wide && y8_idx < pixels_high) {
- int s_avg;
- int d_avg = 128;
- s_avg = aom_highbd_avg_8x8(s + y8_idx * sp + x8_idx, sp);
- if (!is_key_frame)
- d_avg = aom_highbd_avg_8x8(d + y8_idx * dp + x8_idx, dp);
+ int src_avg = aom_highbd_avg_8x8(src_buf + y8_idx * src_stride + x8_idx,
+ src_stride);
+ int dst_avg = aom_highbd_avg_8x8(dst_buf + y8_idx * dst_stride + x8_idx,
+ dst_stride);
- sum = s_avg - d_avg;
+ sum = src_avg - dst_avg;
sse = sum * sum;
}
- fill_variance(sse, sum, 0, &vst->split[k].part_variances.none);
+ fill_variance(sse, sum, 0, &vst->split[idx].part_variances.none);
}
}
#endif
-static AOM_INLINE void fill_variance_8x8avg_lowbd(const uint8_t *s, int sp,
- const uint8_t *d, int dp,
- int x16_idx, int y16_idx,
- VP16x16 *vst, int pixels_wide,
- int pixels_high,
- int is_key_frame) {
+static AOM_INLINE void fill_variance_8x8avg_lowbd(
+ const uint8_t *src_buf, int src_stride, const uint8_t *dst_buf,
+ int dst_stride, int x16_idx, int y16_idx, VP16x16 *vst, int pixels_wide,
+ int pixels_high) {
unsigned int sse[4] = { 0 };
int sum[4] = { 0 };
- int d_avg[4] = { 128, 128, 128, 128 };
- int s_avg[4];
if (all_blks_inside(x16_idx, y16_idx, pixels_wide, pixels_high)) {
- aom_avg_8x8_quad(s, sp, x16_idx, y16_idx, s_avg);
- if (!is_key_frame) aom_avg_8x8_quad(d, dp, x16_idx, y16_idx, d_avg);
- for (int k = 0; k < 4; k++) {
- sum[k] = s_avg[k] - d_avg[k];
- sse[k] = sum[k] * sum[k];
+ int src_avg[4];
+ int dst_avg[4];
+ aom_avg_8x8_quad(src_buf, src_stride, x16_idx, y16_idx, src_avg);
+ aom_avg_8x8_quad(dst_buf, dst_stride, x16_idx, y16_idx, dst_avg);
+ for (int idx = 0; idx < 4; idx++) {
+ sum[idx] = src_avg[idx] - dst_avg[idx];
+ sse[idx] = sum[idx] * sum[idx];
}
} else {
- for (int k = 0; k < 4; k++) {
- const int x8_idx = x16_idx + ((k & 1) << 3);
- const int y8_idx = y16_idx + ((k >> 1) << 3);
+ for (int idx = 0; idx < 4; idx++) {
+ const int x8_idx = x16_idx + GET_BLK_IDX_X(idx, 3);
+ const int y8_idx = y16_idx + GET_BLK_IDX_Y(idx, 3);
if (x8_idx < pixels_wide && y8_idx < pixels_high) {
- s_avg[k] = aom_avg_8x8(s + y8_idx * sp + x8_idx, sp);
- if (!is_key_frame) d_avg[k] = aom_avg_8x8(d + y8_idx * dp + x8_idx, dp);
- sum[k] = s_avg[k] - d_avg[k];
- sse[k] = sum[k] * sum[k];
+ int src_avg =
+ aom_avg_8x8(src_buf + y8_idx * src_stride + x8_idx, src_stride);
+ int dst_avg =
+ aom_avg_8x8(dst_buf + y8_idx * dst_stride + x8_idx, dst_stride);
+ sum[idx] = src_avg - dst_avg;
+ sse[idx] = sum[idx] * sum[idx];
}
}
}
- for (int k = 0; k < 4; k++) {
- fill_variance(sse[k], sum[k], 0, &vst->split[k].part_variances.none);
+ for (int idx = 0; idx < 4; idx++) {
+ fill_variance(sse[idx], sum[idx], 0, &vst->split[idx].part_variances.none);
}
}
// Obtain parameters required to calculate variance (such as sum, sse, etc.)
// at 8x8 sub-block level for a given 16x16 block.
-static AOM_INLINE void fill_variance_8x8avg(const uint8_t *s, int sp,
- const uint8_t *d, int dp,
- int x16_idx, int y16_idx,
- VP16x16 *vst, int highbd_flag,
- int pixels_wide, int pixels_high,
- int is_key_frame) {
+// This function should only be called when is_key_frame is false, since the
+// sum is computed between the source and the reference frames.
+static AOM_INLINE void fill_variance_8x8avg(
+ const uint8_t *src_buf, int src_stride, const uint8_t *dst_buf,
+ int dst_stride, int x16_idx, int y16_idx, VP16x16 *vst, int highbd_flag,
+ int pixels_wide, int pixels_high) {
#if CONFIG_AV1_HIGHBITDEPTH
if (highbd_flag) {
- fill_variance_8x8avg_highbd(s, sp, d, dp, x16_idx, y16_idx, vst,
- pixels_wide, pixels_high, is_key_frame);
+ fill_variance_8x8avg_highbd(src_buf, src_stride, dst_buf, dst_stride,
+ x16_idx, y16_idx, vst, pixels_wide,
+ pixels_high);
return;
}
#else
(void)highbd_flag;
#endif // CONFIG_AV1_HIGHBITDEPTH
- fill_variance_8x8avg_lowbd(s, sp, d, dp, x16_idx, y16_idx, vst, pixels_wide,
- pixels_high, is_key_frame);
+ fill_variance_8x8avg_lowbd(src_buf, src_stride, dst_buf, dst_stride, x16_idx,
+ y16_idx, vst, pixels_wide, pixels_high);
}
-static int compute_minmax_8x8(const uint8_t *s, int sp, const uint8_t *d,
- int dp, int x16_idx, int y16_idx,
+static int compute_minmax_8x8(const uint8_t *src_buf, int src_stride,
+ const uint8_t *dst_buf, int dst_stride,
+ int x16_idx, int y16_idx,
#if CONFIG_AV1_HIGHBITDEPTH
int highbd_flag,
#endif
int pixels_wide, int pixels_high) {
- int k;
int minmax_max = 0;
int minmax_min = 255;
// Loop over the 4 8x8 subblocks.
- for (k = 0; k < 4; k++) {
- int x8_idx = x16_idx + ((k & 1) << 3);
- int y8_idx = y16_idx + ((k >> 1) << 3);
+ for (int idx = 0; idx < 4; idx++) {
+ const int x8_idx = x16_idx + GET_BLK_IDX_X(idx, 3);
+ const int y8_idx = y16_idx + GET_BLK_IDX_Y(idx, 3);
int min = 0;
int max = 0;
if (x8_idx < pixels_wide && y8_idx < pixels_high) {
#if CONFIG_AV1_HIGHBITDEPTH
if (highbd_flag & YV12_FLAG_HIGHBITDEPTH) {
- aom_highbd_minmax_8x8(s + y8_idx * sp + x8_idx, sp,
- d + y8_idx * dp + x8_idx, dp, &min, &max);
+ aom_highbd_minmax_8x8(
+ src_buf + y8_idx * src_stride + x8_idx, src_stride,
+ dst_buf + y8_idx * dst_stride + x8_idx, dst_stride, &min, &max);
} else {
- aom_minmax_8x8(s + y8_idx * sp + x8_idx, sp, d + y8_idx * dp + x8_idx,
- dp, &min, &max);
+ aom_minmax_8x8(src_buf + y8_idx * src_stride + x8_idx, src_stride,
+ dst_buf + y8_idx * dst_stride + x8_idx, dst_stride, &min,
+ &max);
}
#else
- aom_minmax_8x8(s + y8_idx * sp + x8_idx, sp, d + y8_idx * dp + x8_idx, dp,
- &min, &max);
+ aom_minmax_8x8(src_buf + y8_idx * src_stride + x8_idx, src_stride,
+ dst_buf + y8_idx * dst_stride + x8_idx, dst_stride, &min,
+ &max);
#endif
if ((max - min) > minmax_max) minmax_max = (max - min);
if ((max - min) < minmax_min) minmax_min = (max - min);
@@ -376,43 +381,42 @@
return (minmax_max - minmax_min);
}
-static AOM_INLINE void fill_variance_4x4avg(const uint8_t *s, int sp,
- const uint8_t *d, int dp,
- int x8_idx, int y8_idx, VP8x8 *vst,
+// Computes the average and variance of a 4x4 sub-block.
+// This function should only be called when is_key_frame is true, since the
+// sum is computed using the source frame only.
+static AOM_INLINE void fill_variance_4x4avg(const uint8_t *src_buf,
+ int src_stride, int x8_idx,
+ int y8_idx, VP8x8 *vst,
#if CONFIG_AV1_HIGHBITDEPTH
int highbd_flag,
#endif
int pixels_wide, int pixels_high,
- int is_key_frame,
int border_offset_4x4) {
- int k;
- for (k = 0; k < 4; k++) {
- int x4_idx = x8_idx + ((k & 1) << 2);
- int y4_idx = y8_idx + ((k >> 1) << 2);
+ for (int idx = 0; idx < 4; idx++) {
+ const int x4_idx = x8_idx + GET_BLK_IDX_X(idx, 2);
+ const int y4_idx = y8_idx + GET_BLK_IDX_Y(idx, 2);
unsigned int sse = 0;
int sum = 0;
if (x4_idx < pixels_wide - border_offset_4x4 &&
y4_idx < pixels_high - border_offset_4x4) {
- int s_avg;
- int d_avg = 128;
+ int src_avg;
+ int dst_avg = 128;
#if CONFIG_AV1_HIGHBITDEPTH
if (highbd_flag & YV12_FLAG_HIGHBITDEPTH) {
- s_avg = aom_highbd_avg_4x4(s + y4_idx * sp + x4_idx, sp);
- if (!is_key_frame)
- d_avg = aom_highbd_avg_4x4(d + y4_idx * dp + x4_idx, dp);
+ src_avg = aom_highbd_avg_4x4(src_buf + y4_idx * src_stride + x4_idx,
+ src_stride);
} else {
- s_avg = aom_avg_4x4(s + y4_idx * sp + x4_idx, sp);
- if (!is_key_frame) d_avg = aom_avg_4x4(d + y4_idx * dp + x4_idx, dp);
+ src_avg =
+ aom_avg_4x4(src_buf + y4_idx * src_stride + x4_idx, src_stride);
}
#else
- s_avg = aom_avg_4x4(s + y4_idx * sp + x4_idx, sp);
- if (!is_key_frame) d_avg = aom_avg_4x4(d + y4_idx * dp + x4_idx, dp);
+ src_avg = aom_avg_4x4(src_buf + y4_idx * src_stride + x4_idx, src_stride);
#endif
- sum = s_avg - d_avg;
+ sum = src_avg - dst_avg;
sse = sum * sum;
}
- fill_variance(sse, sum, 0, &vst->split[k].part_variances.none);
+ fill_variance(sse, sum, 0, &vst->split[idx].part_variances.none);
}
}
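
In the key-frame path above, each 4x4 source average is compared against a fixed mid-level of 128 (dst_avg), so the resulting sum/sse pair measures deviation from flat content rather than a temporal difference. A self-contained sketch of that convention, with a plain-C stand-in for the SIMD-accelerated aom_avg_4x4:

#include <stdio.h>

/* Plain-C stand-in for aom_avg_4x4: rounded mean of a 4x4 block. */
static int avg_4x4(const unsigned char *buf, int stride) {
  int total = 0;
  for (int r = 0; r < 4; r++)
    for (int c = 0; c < 4; c++) total += buf[r * stride + c];
  return (total + 8) >> 4;
}

int main(void) {
  unsigned char blk[16] = { 200, 200, 200, 200, 200, 200, 200, 200,
                            60,  60,  60,  60,  60,  60,  60,  60 };
  /* Key-frame convention from fill_variance_4x4avg: dst_avg is fixed
   * at mid-grey 128, so sum measures deviation from flat content. */
  const int src_avg = avg_4x4(blk, 4);
  const int dst_avg = 128;
  const int sum = src_avg - dst_avg;
  const unsigned int sse = (unsigned int)(sum * sum);
  printf("src_avg=%d sum=%d sse=%u\n", src_avg, sum, sse);
  return 0;
}
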
@@ -430,101 +434,137 @@
return threshold;
}
-static AOM_INLINE void tune_thresh_based_on_qindex_window(
- int qindex, int th, int win, int fac, int64_t thresholds[]) {
+// Tune thresholds more or less aggressively to prefer larger partitions.
+static AOM_INLINE void tune_thresh_based_on_qindex(
+ AV1_COMP *cpi, int64_t thresholds[], uint64_t block_sad, int current_qindex,
+ int num_pixels, bool is_segment_id_boosted, int source_sad_nonrd,
+ int lighting_change) {
double weight;
-
- if (qindex < th - win)
- weight = 1.0;
- else if (qindex > th + win)
- weight = 0.0;
- else
- weight = 1.0 - (qindex - th + win) / (2 * win);
- thresholds[1] =
- (int)((1 - weight) * (thresholds[1] << 1) + weight * thresholds[1]);
- thresholds[2] =
- (int)((1 - weight) * (thresholds[2] << 1) + weight * thresholds[2]);
- thresholds[3] =
- (int)((1 - weight) * (thresholds[3] << fac) + weight * thresholds[3]);
+ if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 3) {
+ const int win = 20;
+ if (current_qindex < QINDEX_LARGE_BLOCK_THR - win)
+ weight = 1.0;
+ else if (current_qindex > QINDEX_LARGE_BLOCK_THR + win)
+ weight = 0.0;
+ else
+ weight =
+ 1.0 - (current_qindex - QINDEX_LARGE_BLOCK_THR + win) / (2.0 * win);
+ if (num_pixels > RESOLUTION_480P) {
+ for (int i = 0; i < 4; i++) {
+ thresholds[i] <<= 1;
+ }
+ }
+ if (num_pixels <= RESOLUTION_288P) {
+ thresholds[3] = INT64_MAX;
+ if (is_segment_id_boosted == false) {
+ thresholds[1] <<= 2;
+ thresholds[2] <<= (source_sad_nonrd <= kLowSad) ? 5 : 4;
+ } else {
+ thresholds[1] <<= 1;
+ thresholds[2] <<= 3;
+ }
+ // Allow split to 8x8 for superblocks where part of the block has a
+ // moving boundary: permit superblocks with source_sad above a
+ // threshold, but avoid very large source_sad or high-motion content,
+ // so that not too many 8x8 blocks are forced within a superblock.
+ uint64_t avg_source_sad_thresh = 25000;
+ uint64_t block_sad_low = 25000;
+ uint64_t block_sad_high = 50000;
+ if (cpi->svc.temporal_layer_id == 0 &&
+ cpi->svc.number_temporal_layers > 1) {
+ // Increase the sad thresholds for base TL0, as reference/LAST is
+ // 2/4 frames behind (for 2/3 #TL).
+ avg_source_sad_thresh = 40000;
+ block_sad_high = 70000;
+ }
+ if (is_segment_id_boosted == false &&
+ cpi->rc.avg_source_sad < avg_source_sad_thresh &&
+ block_sad > block_sad_low && block_sad < block_sad_high &&
+ !lighting_change) {
+ thresholds[2] = (3 * thresholds[2]) >> 2;
+ thresholds[3] = thresholds[2] << 3;
+ }
+ // Condition the increase of partition thresholds on the segment
+ // and the content. Avoid the increase for superblocks which have
+ // high source sad, unless the whole frame has very high motion
+ // (i.e., cpi->rc.avg_source_sad is very large, in which case all blocks
+ // have high source sad).
+ } else if (num_pixels > RESOLUTION_480P && is_segment_id_boosted == false &&
+ (source_sad_nonrd != kHighSad ||
+ cpi->rc.avg_source_sad > 50000)) {
+ thresholds[0] = (3 * thresholds[0]) >> 1;
+ thresholds[3] = INT64_MAX;
+ if (current_qindex > QINDEX_LARGE_BLOCK_THR) {
+ thresholds[1] =
+ (int)((1 - weight) * (thresholds[1] << 1) + weight * thresholds[1]);
+ thresholds[2] =
+ (int)((1 - weight) * (thresholds[2] << 1) + weight * thresholds[2]);
+ }
+ } else if (current_qindex > QINDEX_LARGE_BLOCK_THR &&
+ is_segment_id_boosted == false &&
+ (source_sad_nonrd != kHighSad ||
+ cpi->rc.avg_source_sad > 50000)) {
+ thresholds[1] =
+ (int)((1 - weight) * (thresholds[1] << 2) + weight * thresholds[1]);
+ thresholds[2] =
+ (int)((1 - weight) * (thresholds[2] << 4) + weight * thresholds[2]);
+ thresholds[3] = INT64_MAX;
+ }
+ } else if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 2) {
+ thresholds[1] <<= (source_sad_nonrd <= kLowSad) ? 2 : 0;
+ thresholds[2] =
+ (source_sad_nonrd <= kLowSad) ? (3 * thresholds[2]) : thresholds[2];
+ } else if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 1) {
+ const int fac = (source_sad_nonrd <= kLowSad) ? 2 : 1;
+ if (current_qindex < QINDEX_LARGE_BLOCK_THR - 45)
+ weight = 1.0;
+ else if (current_qindex > QINDEX_LARGE_BLOCK_THR + 45)
+ weight = 0.0;
+ else
+ weight = 1.0 - (current_qindex - QINDEX_LARGE_BLOCK_THR + 45) / (2.0 * 45);
+ thresholds[1] =
+ (int)((1 - weight) * (thresholds[1] << 1) + weight * thresholds[1]);
+ thresholds[2] =
+ (int)((1 - weight) * (thresholds[2] << 1) + weight * thresholds[2]);
+ thresholds[3] =
+ (int)((1 - weight) * (thresholds[3] << fac) + weight * thresholds[3]);
+ }
+ if (cpi->sf.part_sf.disable_8x8_part_based_on_qidx && (current_qindex < 128))
+ thresholds[3] = INT64_MAX;
}
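
Each (int)((1 - weight) * (thresholds[i] << s) + weight * thresholds[i]) statement above is a linear interpolation between the unchanged threshold (weight 1, low qindex) and a left-shifted one (weight 0, high qindex), with the weight ramping across a +/-win window around QINDEX_LARGE_BLOCK_THR. A standalone sketch of that ramp; the constant value below is illustrative only, and the divide is done in floating point so the ramp is continuous:

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in; the real constant lives in the encoder headers. */
#define QINDEX_LARGE_BLOCK_THR 100

static int64_t blend_thresh(int qindex, int win, int shift, int64_t t) {
  double weight;
  if (qindex < QINDEX_LARGE_BLOCK_THR - win)
    weight = 1.0;
  else if (qindex > QINDEX_LARGE_BLOCK_THR + win)
    weight = 0.0;
  else
    weight = 1.0 - (qindex - QINDEX_LARGE_BLOCK_THR + win) / (2.0 * win);
  /* weight = 1 keeps t unchanged; weight = 0 yields t << shift. */
  return (int64_t)((1 - weight) * (double)(t << shift) + weight * (double)t);
}

int main(void) {
  const int64_t t = 1000;
  for (int q = 40; q <= 160; q += 30)
    printf("qindex=%3d -> threshold=%lld\n", q,
           (long long)blend_thresh(q, 45, 1, t));
  return 0;
}
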
-static AOM_INLINE void set_vbp_thresholds(AV1_COMP *cpi, int64_t thresholds[],
- int q, int content_lowsumdiff,
- int source_sad_nonrd,
- int source_sad_rd, int segment_id,
- uint64_t blk_sad,
- int lighting_change) {
- AV1_COMMON *const cm = &cpi->common;
- const int is_key_frame = frame_is_intra_only(cm);
- const int threshold_multiplier = is_key_frame ? 120 : 1;
- const int ac_q = av1_ac_quant_QTX(q, 0, cm->seq_params->bit_depth);
- int64_t threshold_base = (int64_t)(threshold_multiplier * ac_q);
- const int current_qindex = cm->quant_params.base_qindex;
- const int threshold_left_shift = cpi->sf.rt_sf.var_part_split_threshold_shift;
-
- if (is_key_frame) {
- if (cpi->sf.rt_sf.force_large_partition_blocks_intra) {
- const int shift_steps =
- threshold_left_shift - (cpi->oxcf.mode == ALLINTRA ? 7 : 8);
- assert(shift_steps >= 0);
- threshold_base <<= shift_steps;
- }
- thresholds[0] = threshold_base;
- thresholds[1] = threshold_base;
- if (cm->width * cm->height < 1280 * 720) {
- thresholds[2] = threshold_base / 3;
- thresholds[3] = threshold_base >> 1;
- } else {
- int shift_val = 2;
- if (cpi->sf.rt_sf.force_large_partition_blocks_intra) {
- shift_val = 0;
- }
-
- thresholds[2] = threshold_base >> shift_val;
- thresholds[3] = threshold_base >> shift_val;
- }
- thresholds[4] = threshold_base << 2;
- return;
+static void set_vbp_thresholds_key_frame(AV1_COMP *cpi, int64_t thresholds[],
+ int64_t threshold_base,
+ int threshold_left_shift,
+ int num_pixels) {
+ if (cpi->sf.rt_sf.force_large_partition_blocks_intra) {
+ const int shift_steps =
+ threshold_left_shift - (cpi->oxcf.mode == ALLINTRA ? 7 : 8);
+ assert(shift_steps >= 0);
+ threshold_base <<= shift_steps;
}
-
- // Increase partition thresholds for noisy content. Apply it only for
- // superblocks where sumdiff is low, as we assume the sumdiff of superblock
- // whose only change is due to noise will be low (i.e, noise will average
- // out over large block).
- if (cpi->noise_estimate.enabled && content_lowsumdiff &&
- (cm->width * cm->height > 640 * 480) &&
- cm->current_frame.frame_number > 60) {
- NOISE_LEVEL noise_level =
- av1_noise_estimate_extract_level(&cpi->noise_estimate);
- if (noise_level == kHigh)
- threshold_base = (5 * threshold_base) >> 1;
- else if (noise_level == kMedium &&
- !cpi->sf.rt_sf.prefer_large_partition_blocks)
- threshold_base = (5 * threshold_base) >> 2;
- }
- // TODO(kyslov) Enable var based partition adjusment on temporal denoising
-#if 0 // CONFIG_AV1_TEMPORAL_DENOISING
- if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi) &&
- cpi->oxcf.speed > 5 && cpi->denoiser.denoising_level >= kDenLow)
- threshold_base =
- av1_scale_part_thresh(threshold_base, cpi->denoiser.denoising_level,
- content_state, cpi->svc.temporal_layer_id);
- else
- threshold_base =
- scale_part_thresh_content(threshold_base, cpi->oxcf.speed, cm->width,
- cm->height, cpi->ppi->rtc_ref.non_reference_frame);
-#else
- // Increase base variance threshold based on content_state/sum_diff level.
- threshold_base = scale_part_thresh_content(
- threshold_base, cpi->oxcf.speed, cm->width, cm->height,
- cpi->ppi->rtc_ref.non_reference_frame);
-#endif
- thresholds[0] = threshold_base >> 1;
+ thresholds[0] = threshold_base;
thresholds[1] = threshold_base;
- thresholds[3] = threshold_base << threshold_left_shift;
- if (cm->width >= 1280 && cm->height >= 720)
- thresholds[3] = thresholds[3] << 1;
- if (cm->width * cm->height <= 352 * 288) {
+ if (num_pixels < RESOLUTION_720P) {
+ thresholds[2] = threshold_base / 3;
+ thresholds[3] = threshold_base >> 1;
+ } else {
+ int shift_val = 2;
+ if (cpi->sf.rt_sf.force_large_partition_blocks_intra) {
+ shift_val = 0;
+ }
+
+ thresholds[2] = threshold_base >> shift_val;
+ thresholds[3] = threshold_base >> shift_val;
+ }
+ thresholds[4] = threshold_base << 2;
+}
+
+static AOM_INLINE void tune_thresh_based_on_resolution(
+ AV1_COMP *cpi, int64_t thresholds[], int64_t threshold_base,
+ int current_qindex, int source_sad_rd, int num_pixels) {
+ if (num_pixels >= RESOLUTION_720P) thresholds[3] = thresholds[3] << 1;
+ if (num_pixels <= RESOLUTION_288P) {
const int qindex_thr[5][2] = {
{ 200, 220 }, { 140, 170 }, { 120, 150 }, { 200, 210 }, { 170, 220 },
};
@@ -563,85 +603,99 @@
qi_diff_high * (threshold_base << 3)) /
threshold_diff;
}
- } else if (cm->width < 1280 && cm->height < 720) {
+ } else if (num_pixels < RESOLUTION_720P) {
thresholds[2] = (5 * threshold_base) >> 2;
- } else if (cm->width < 1920 && cm->height < 1080) {
+ } else if (num_pixels < RESOLUTION_1080P) {
thresholds[2] = threshold_base << 1;
- } else if (cm->width < 2560 && cm->height < 1440) {
- thresholds[2] = (5 * threshold_base) >> 1;
} else {
- thresholds[2] = (7 * threshold_base) >> 1;
- }
- // Tune thresholds less or more aggressively to prefer larger partitions
- if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 3) {
- double weight;
- const int win = 20;
- if (current_qindex < QINDEX_LARGE_BLOCK_THR - win)
- weight = 1.0;
- else if (current_qindex > QINDEX_LARGE_BLOCK_THR + win)
- weight = 0.0;
- else
- weight =
- 1.0 - (current_qindex - QINDEX_LARGE_BLOCK_THR + win) / (2 * win);
- if (cm->width * cm->height > 640 * 480) {
- for (int i = 0; i < 4; i++) {
- thresholds[i] <<= 1;
- }
- }
- if (cm->width * cm->height <= 352 * 288) {
- thresholds[3] = INT64_MAX;
- if (segment_id == 0) {
- thresholds[1] <<= 2;
- thresholds[2] <<= (source_sad_nonrd <= kLowSad) ? 5 : 4;
+ // num_pixels >= RESOLUTION_1080P
+ if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN) {
+ if (num_pixels < RESOLUTION_1440P) {
+ thresholds[2] = (5 * threshold_base) >> 1;
} else {
- thresholds[1] <<= 1;
- thresholds[2] <<= 3;
+ thresholds[2] = (7 * threshold_base) >> 1;
}
- // Allow for split to 8x8 for superblocks where part of it has
- // moving boundary. So allow for sb with source_sad above threshold,
- // and avoid very large source_sad or high source content, to avoid
- // too many 8x8 within superblock.
- if (segment_id == 0 && cpi->rc.avg_source_sad < 25000 &&
- blk_sad > 25000 && blk_sad < 50000 && !lighting_change) {
- thresholds[2] = (3 * thresholds[2]) >> 2;
- thresholds[3] = thresholds[2] << 3;
+ } else {
+ if (cpi->oxcf.speed > 7) {
+ thresholds[2] = 6 * threshold_base;
+ } else {
+ thresholds[2] = 3 * threshold_base;
}
- // Condition the increase of partition thresholds on the segment
- // and the content. Avoid the increase for superblocks which have
- // high source sad, unless the whole frame has very high motion
- // (i.e, cpi->rc.avg_source_sad is very large, in which case all blocks
- // have high source sad).
- } else if (cm->width * cm->height > 640 * 480 && segment_id == 0 &&
- (source_sad_nonrd != kHighSad ||
- cpi->rc.avg_source_sad > 50000)) {
- thresholds[0] = (3 * thresholds[0]) >> 1;
- thresholds[3] = INT64_MAX;
- if (current_qindex > QINDEX_LARGE_BLOCK_THR) {
- thresholds[1] =
- (int)((1 - weight) * (thresholds[1] << 1) + weight * thresholds[1]);
- thresholds[2] =
- (int)((1 - weight) * (thresholds[2] << 1) + weight * thresholds[2]);
- }
- } else if (current_qindex > QINDEX_LARGE_BLOCK_THR && segment_id == 0 &&
- (source_sad_nonrd != kHighSad ||
- cpi->rc.avg_source_sad > 50000)) {
- thresholds[1] =
- (int)((1 - weight) * (thresholds[1] << 2) + weight * thresholds[1]);
- thresholds[2] =
- (int)((1 - weight) * (thresholds[2] << 4) + weight * thresholds[2]);
- thresholds[3] = INT64_MAX;
}
- } else if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 2) {
- thresholds[1] <<= (source_sad_nonrd <= kLowSad) ? 2 : 0;
- thresholds[2] =
- (source_sad_nonrd <= kLowSad) ? (3 * thresholds[2]) : thresholds[2];
- } else if (cpi->sf.rt_sf.prefer_large_partition_blocks >= 1) {
- const int fac = (source_sad_nonrd <= kLowSad) ? 2 : 1;
- tune_thresh_based_on_qindex_window(current_qindex, QINDEX_LARGE_BLOCK_THR,
- 45, fac, thresholds);
}
- if (cpi->sf.part_sf.disable_8x8_part_based_on_qidx && (current_qindex < 128))
- thresholds[3] = INT64_MAX;
+}
+
+// Increase partition thresholds for noisy content. Apply it only for
+// superblocks where sumdiff is low, as we assume the sumdiff of a superblock
+// whose only change is due to noise will be low (i.e., the noise will average
+// out over a large block).
+static AOM_INLINE int64_t tune_thresh_noisy_content(AV1_COMP *cpi,
+ int64_t threshold_base,
+ int content_lowsumdiff,
+ int num_pixels) {
+ AV1_COMMON *const cm = &cpi->common;
+ int64_t updated_thresh_base = threshold_base;
+ if (cpi->noise_estimate.enabled && content_lowsumdiff &&
+ num_pixels > RESOLUTION_480P && cm->current_frame.frame_number > 60) {
+ NOISE_LEVEL noise_level =
+ av1_noise_estimate_extract_level(&cpi->noise_estimate);
+ if (noise_level == kHigh)
+ updated_thresh_base = (5 * updated_thresh_base) >> 1;
+ else if (noise_level == kMedium &&
+ !cpi->sf.rt_sf.prefer_large_partition_blocks)
+ updated_thresh_base = (5 * updated_thresh_base) >> 2;
+ }
+ // TODO(kyslov) Enable var based partition adjustment on temporal denoising
+#if 0 // CONFIG_AV1_TEMPORAL_DENOISING
+ if (cpi->oxcf.noise_sensitivity > 0 && denoise_svc(cpi) &&
+ cpi->oxcf.speed > 5 && cpi->denoiser.denoising_level >= kDenLow)
+ updated_thresh_base =
+ av1_scale_part_thresh(updated_thresh_base, cpi->denoiser.denoising_level,
+ content_state, cpi->svc.temporal_layer_id);
+ else
+ updated_thresh_base =
+ scale_part_thresh_content(updated_thresh_base, cpi->oxcf.speed, cm->width,
+ cm->height, cpi->ppi->rtc_ref.non_reference_frame);
+#else
+ // Increase base variance threshold based on content_state/sum_diff level.
+ updated_thresh_base = scale_part_thresh_content(
+ updated_thresh_base, cpi->oxcf.speed, cm->width, cm->height,
+ cpi->ppi->rtc_ref.non_reference_frame);
+#endif
+ return updated_thresh_base;
+}
+
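
tune_thresh_noisy_content scales the base threshold by 5/2 for kHigh noise and by 5/4 for kMedium noise (the latter only when large partitions are not already preferred), using shift-based fixed-point arithmetic. A tiny sketch of just those factors, with illustrative stand-in enum values:

#include <stdint.h>
#include <stdio.h>

typedef enum { kLowNoise, kMediumNoise, kHighNoise } NoiseLevelSketch;

static int64_t scale_for_noise(int64_t base, NoiseLevelSketch level,
                               int prefer_large_partitions) {
  if (level == kHighNoise) return (5 * base) >> 1; /* x2.5 */
  if (level == kMediumNoise && !prefer_large_partitions)
    return (5 * base) >> 2;                        /* x1.25 */
  return base;
}

int main(void) {
  printf("%lld %lld %lld\n",
         (long long)scale_for_noise(1000, kHighNoise, 0),   /* 2500 */
         (long long)scale_for_noise(1000, kMediumNoise, 0), /* 1250 */
         (long long)scale_for_noise(1000, kLowNoise, 0));   /* 1000 */
  return 0;
}
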
+static AOM_INLINE void set_vbp_thresholds(
+ AV1_COMP *cpi, int64_t thresholds[], uint64_t blk_sad, int qindex,
+ int content_lowsumdiff, int source_sad_nonrd, int source_sad_rd,
+ bool is_segment_id_boosted, int lighting_change) {
+ AV1_COMMON *const cm = &cpi->common;
+ const int is_key_frame = frame_is_intra_only(cm);
+ const int threshold_multiplier = is_key_frame ? 120 : 1;
+ const int ac_q = av1_ac_quant_QTX(qindex, 0, cm->seq_params->bit_depth);
+ int64_t threshold_base = (int64_t)(threshold_multiplier * ac_q);
+ const int current_qindex = cm->quant_params.base_qindex;
+ const int threshold_left_shift = cpi->sf.rt_sf.var_part_split_threshold_shift;
+ const int num_pixels = cm->width * cm->height;
+
+ if (is_key_frame) {
+ set_vbp_thresholds_key_frame(cpi, thresholds, threshold_base,
+ threshold_left_shift, num_pixels);
+ return;
+ }
+
+ threshold_base = tune_thresh_noisy_content(cpi, threshold_base,
+ content_lowsumdiff, num_pixels);
+ thresholds[0] = threshold_base >> 1;
+ thresholds[1] = threshold_base;
+ thresholds[3] = threshold_base << threshold_left_shift;
+
+ tune_thresh_based_on_resolution(cpi, thresholds, threshold_base,
+ current_qindex, source_sad_rd, num_pixels);
+
+ tune_thresh_based_on_qindex(cpi, thresholds, blk_sad, current_qindex,
+ num_pixels, is_segment_id_boosted,
+ source_sad_nonrd, lighting_change);
}
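
After this refactor, threshold setup reads as a fixed pipeline: the key-frame branch delegates to set_vbp_thresholds_key_frame, while delta frames scale the base for noise, seed thresholds[0], [1] and [3], then apply the resolution and qindex tuning passes. A heavily condensed sketch of that control flow (the helper bodies here are stubs; the key-frame values mirror the sub-720p case in the diff):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Condensed stand-ins for the helpers factored out in this change; the
 * real versions take the full AV1_COMP context and many more inputs. */
static int64_t tune_noisy(int64_t base) { return base; }
static void tune_resolution(int64_t th[5], int64_t base) { th[2] = base << 1; }
static void tune_qindex(int64_t th[5]) { (void)th; }

static void set_thresholds(int64_t th[5], int64_t base, bool key_frame,
                           int left_shift) {
  if (key_frame) { /* mirrors set_vbp_thresholds_key_frame(), sub-720p case */
    th[0] = th[1] = base;
    th[2] = base / 3;
    th[3] = base >> 1;
    th[4] = base << 2;
    return;
  }
  base = tune_noisy(base);   /* tune_thresh_noisy_content() */
  th[0] = base >> 1;
  th[1] = base;
  th[3] = base << left_shift;
  tune_resolution(th, base); /* tune_thresh_based_on_resolution() */
  tune_qindex(th);           /* tune_thresh_based_on_qindex() */
}

int main(void) {
  int64_t th[5] = { 0 };
  set_thresholds(th, 1200, false, 2);
  for (int i = 0; i < 5; i++) printf("th[%d]=%lld\n", i, (long long)th[i]);
  return 0;
}
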
// Set temporal variance low flag for superblock 64x64.
@@ -654,42 +708,43 @@
if ((vt->part_variances).none.variance < (thresholds[0] >> 1))
part_info->variance_low[0] = 1;
} else if (xd->mi[0]->bsize == BLOCK_64X32) {
- for (int i = 0; i < 2; i++) {
- if (vt->part_variances.horz[i].variance < (thresholds[0] >> 2))
- part_info->variance_low[i + 1] = 1;
+ for (int part_idx = 0; part_idx < 2; part_idx++) {
+ if (vt->part_variances.horz[part_idx].variance < (thresholds[0] >> 2))
+ part_info->variance_low[part_idx + 1] = 1;
}
} else if (xd->mi[0]->bsize == BLOCK_32X64) {
- for (int i = 0; i < 2; i++) {
- if (vt->part_variances.vert[i].variance < (thresholds[0] >> 2))
- part_info->variance_low[i + 3] = 1;
+ for (int part_idx = 0; part_idx < 2; part_idx++) {
+ if (vt->part_variances.vert[part_idx].variance < (thresholds[0] >> 2))
+ part_info->variance_low[part_idx + 3] = 1;
}
} else {
static const int idx[4][2] = { { 0, 0 }, { 0, 8 }, { 8, 0 }, { 8, 8 } };
- for (int i = 0; i < 4; i++) {
- const int idx_str =
- mi_params->mi_stride * (mi_row + idx[i][0]) + mi_col + idx[i][1];
+ for (int lvl1_idx = 0; lvl1_idx < 4; lvl1_idx++) {
+ const int idx_str = mi_params->mi_stride * (mi_row + idx[lvl1_idx][0]) +
+ mi_col + idx[lvl1_idx][1];
MB_MODE_INFO **this_mi = mi_params->mi_grid_base + idx_str;
- if (mi_params->mi_cols <= mi_col + idx[i][1] ||
- mi_params->mi_rows <= mi_row + idx[i][0])
+ if (mi_params->mi_cols <= mi_col + idx[lvl1_idx][1] ||
+ mi_params->mi_rows <= mi_row + idx[lvl1_idx][0])
continue;
if (*this_mi == NULL) continue;
if ((*this_mi)->bsize == BLOCK_32X32) {
int64_t threshold_32x32 = (5 * thresholds[1]) >> 3;
- if (vt->split[i].part_variances.none.variance < threshold_32x32)
- part_info->variance_low[i + 5] = 1;
+ if (vt->split[lvl1_idx].part_variances.none.variance < threshold_32x32)
+ part_info->variance_low[lvl1_idx + 5] = 1;
} else {
// For 32x16 and 16x32 blocks, the flag is set on each 16x16 block
// inside.
if ((*this_mi)->bsize == BLOCK_16X16 ||
(*this_mi)->bsize == BLOCK_32X16 ||
(*this_mi)->bsize == BLOCK_16X32) {
- for (int j = 0; j < 4; j++) {
- if (vt->split[i].split[j].part_variances.none.variance <
- (thresholds[2] >> 8))
- part_info->variance_low[(i << 2) + j + 9] = 1;
+ for (int lvl2_idx = 0; lvl2_idx < 4; lvl2_idx++) {
+ if (vt->split[lvl1_idx]
+ .split[lvl2_idx]
+ .part_variances.none.variance < (thresholds[2] >> 8))
+ part_info->variance_low[(lvl1_idx << 2) + lvl2_idx + 9] = 1;
}
}
}
@@ -705,68 +760,74 @@
if (vt->part_variances.none.variance < (thresholds[0] >> 1))
part_info->variance_low[0] = 1;
} else if (xd->mi[0]->bsize == BLOCK_128X64) {
- for (int i = 0; i < 2; i++) {
- if (vt->part_variances.horz[i].variance < (thresholds[0] >> 2))
- part_info->variance_low[i + 1] = 1;
+ for (int part_idx = 0; part_idx < 2; part_idx++) {
+ if (vt->part_variances.horz[part_idx].variance < (thresholds[0] >> 2))
+ part_info->variance_low[part_idx + 1] = 1;
}
} else if (xd->mi[0]->bsize == BLOCK_64X128) {
- for (int i = 0; i < 2; i++) {
- if (vt->part_variances.vert[i].variance < (thresholds[0] >> 2))
- part_info->variance_low[i + 3] = 1;
+ for (int part_idx = 0; part_idx < 2; part_idx++) {
+ if (vt->part_variances.vert[part_idx].variance < (thresholds[0] >> 2))
+ part_info->variance_low[part_idx + 3] = 1;
}
} else {
static const int idx64[4][2] = {
{ 0, 0 }, { 0, 16 }, { 16, 0 }, { 16, 16 }
};
static const int idx32[4][2] = { { 0, 0 }, { 0, 8 }, { 8, 0 }, { 8, 8 } };
- for (int i = 0; i < 4; i++) {
- const int idx_str =
- mi_params->mi_stride * (mi_row + idx64[i][0]) + mi_col + idx64[i][1];
+ for (int lvl1_idx = 0; lvl1_idx < 4; lvl1_idx++) {
+ const int idx_str = mi_params->mi_stride * (mi_row + idx64[lvl1_idx][0]) +
+ mi_col + idx64[lvl1_idx][1];
MB_MODE_INFO **mi_64 = mi_params->mi_grid_base + idx_str;
if (*mi_64 == NULL) continue;
- if (mi_params->mi_cols <= mi_col + idx64[i][1] ||
- mi_params->mi_rows <= mi_row + idx64[i][0])
+ if (mi_params->mi_cols <= mi_col + idx64[lvl1_idx][1] ||
+ mi_params->mi_rows <= mi_row + idx64[lvl1_idx][0])
continue;
const int64_t threshold_64x64 = (5 * thresholds[1]) >> 3;
if ((*mi_64)->bsize == BLOCK_64X64) {
- if (vt->split[i].part_variances.none.variance < threshold_64x64)
- part_info->variance_low[5 + i] = 1;
+ if (vt->split[lvl1_idx].part_variances.none.variance < threshold_64x64)
+ part_info->variance_low[5 + lvl1_idx] = 1;
} else if ((*mi_64)->bsize == BLOCK_64X32) {
- for (int j = 0; j < 2; j++)
- if (vt->split[i].part_variances.horz[j].variance <
+ for (int part_idx = 0; part_idx < 2; part_idx++)
+ if (vt->split[lvl1_idx].part_variances.horz[part_idx].variance <
(threshold_64x64 >> 1))
- part_info->variance_low[9 + (i << 1) + j] = 1;
+ part_info->variance_low[9 + (lvl1_idx << 1) + part_idx] = 1;
} else if ((*mi_64)->bsize == BLOCK_32X64) {
- for (int j = 0; j < 2; j++)
- if (vt->split[i].part_variances.vert[j].variance <
+ for (int part_idx = 0; part_idx < 2; part_idx++)
+ if (vt->split[lvl1_idx].part_variances.vert[part_idx].variance <
(threshold_64x64 >> 1))
- part_info->variance_low[17 + (i << 1) + j] = 1;
+ part_info->variance_low[17 + (lvl1_idx << 1) + part_idx] = 1;
} else {
- for (int k = 0; k < 4; k++) {
- const int idx_str1 = mi_params->mi_stride * idx32[k][0] + idx32[k][1];
+ for (int lvl2_idx = 0; lvl2_idx < 4; lvl2_idx++) {
+ const int idx_str1 =
+ mi_params->mi_stride * idx32[lvl2_idx][0] + idx32[lvl2_idx][1];
MB_MODE_INFO **mi_32 = mi_params->mi_grid_base + idx_str + idx_str1;
if (*mi_32 == NULL) continue;
- if (mi_params->mi_cols <= mi_col + idx64[i][1] + idx32[k][1] ||
- mi_params->mi_rows <= mi_row + idx64[i][0] + idx32[k][0])
+ if (mi_params->mi_cols <=
+ mi_col + idx64[lvl1_idx][1] + idx32[lvl2_idx][1] ||
+ mi_params->mi_rows <=
+ mi_row + idx64[lvl1_idx][0] + idx32[lvl2_idx][0])
continue;
const int64_t threshold_32x32 = (5 * thresholds[2]) >> 3;
if ((*mi_32)->bsize == BLOCK_32X32) {
- if (vt->split[i].split[k].part_variances.none.variance <
- threshold_32x32)
- part_info->variance_low[25 + (i << 2) + k] = 1;
+ if (vt->split[lvl1_idx]
+ .split[lvl2_idx]
+ .part_variances.none.variance < threshold_32x32)
+ part_info->variance_low[25 + (lvl1_idx << 2) + lvl2_idx] = 1;
} else {
// For 32x16 and 16x32 blocks, the flag is set on each 16x16 block
// inside.
if ((*mi_32)->bsize == BLOCK_16X16 ||
(*mi_32)->bsize == BLOCK_32X16 ||
(*mi_32)->bsize == BLOCK_16X32) {
- for (int j = 0; j < 4; j++) {
- if (vt->split[i]
- .split[k]
- .split[j]
- .part_variances.none.variance < (thresholds[3] >> 8))
- part_info->variance_low[41 + (i << 4) + (k << 2) + j] = 1;
+ for (int lvl3_idx = 0; lvl3_idx < 4; lvl3_idx++) {
+ VPartVar *none_var = &vt->split[lvl1_idx]
+ .split[lvl2_idx]
+ .split[lvl3_idx]
+ .part_variances.none;
+ if (none_var->variance < (thresholds[3] >> 8))
+ part_info->variance_low[41 + (lvl1_idx << 4) +
+ (lvl2_idx << 2) + lvl3_idx] = 1;
}
}
}
@@ -779,14 +840,13 @@
static AOM_INLINE void set_low_temp_var_flag(
AV1_COMP *cpi, PartitionSearchInfo *part_info, MACROBLOCKD *xd,
VP128x128 *vt, int64_t thresholds[], MV_REFERENCE_FRAME ref_frame_partition,
- int mi_col, int mi_row) {
+ int mi_col, int mi_row, const bool is_small_sb) {
AV1_COMMON *const cm = &cpi->common;
// Check temporal variance for bsize >= 16x16, if LAST_FRAME was selected.
// If the temporal variance is small, set the variance_low flag
// for the block. The variance threshold can be adjusted; the
// higher it is, the more aggressive the flagging.
if (ref_frame_partition == LAST_FRAME) {
- const int is_small_sb = (cm->seq_params->sb_size == BLOCK_64X64);
if (is_small_sb)
set_low_temp_var_flag_64x64(&cm->mi_params, part_info, xd,
&(vt->split[0]), thresholds, mi_col, mi_row);
@@ -922,37 +982,48 @@
return force_skip_low_temp_var;
}
-void av1_set_variance_partition_thresholds(AV1_COMP *cpi, int q,
+void av1_set_variance_partition_thresholds(AV1_COMP *cpi, int qindex,
int content_lowsumdiff) {
SPEED_FEATURES *const sf = &cpi->sf;
if (sf->part_sf.partition_search_type != VAR_BASED_PARTITION) {
return;
} else {
- set_vbp_thresholds(cpi, cpi->vbp_info.thresholds, q, content_lowsumdiff, 0,
- 0, 0, 0, 0);
+ set_vbp_thresholds(cpi, cpi->vbp_info.thresholds, 0, qindex,
+ content_lowsumdiff, 0, 0, 0, 0);
// The threshold below is not changed locally.
- cpi->vbp_info.threshold_minmax = 15 + (q >> 3);
+ cpi->vbp_info.threshold_minmax = 15 + (qindex >> 3);
}
}
static AOM_INLINE void chroma_check(AV1_COMP *cpi, MACROBLOCK *x,
BLOCK_SIZE bsize, unsigned int y_sad,
- unsigned int y_sad_g, int is_key_frame,
- int zero_motion, unsigned int *uv_sad) {
- int i;
+ unsigned int y_sad_g,
+ unsigned int y_sad_alt, bool is_key_frame,
+ bool zero_motion, unsigned int *uv_sad) {
MACROBLOCKD *xd = &x->e_mbd;
const int source_sad_nonrd = x->content_state_sb.source_sad_nonrd;
int shift_upper_limit = 1;
int shift_lower_limit = 3;
+ int fac_uv = 6;
if (is_key_frame || cpi->oxcf.tool_cfg.enable_monochrome) return;
+ // Use lower threshold (more conservative in setting color flag) for
+ // higher resolutions non-screen, which tend to have more camera noise.
+ // Since this may be used to skip compound mode in nonrd pickmode, which
+ // is generally more effective for higher resolutions, it is better to be
+ // more conservative.
+ if (cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN) {
+ if (cpi->common.width * cpi->common.height >= RESOLUTION_1080P)
+ fac_uv = 3;
+ else
+ fac_uv = 5;
+ }
if (cpi->oxcf.tune_cfg.content == AOM_CONTENT_SCREEN &&
- cpi->rc.high_source_sad)
- shift_lower_limit = 5;
- else if (source_sad_nonrd >= kMedSad &&
- cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN &&
- (int64_t) cpi->common.width * (int64_t) cpi->common.height >=
- (int64_t) 640 * 360) {
+ cpi->rc.high_source_sad) {
+ shift_lower_limit = 7;
+ } else if (source_sad_nonrd >= kMedSad &&
+ cpi->oxcf.tune_cfg.content != AOM_CONTENT_SCREEN &&
+ cpi->common.width * cpi->common.height >= 640 * 360) {
shift_upper_limit = 2;
shift_lower_limit = source_sad_nonrd > kMedSad ? 5 : 4;
}
@@ -961,14 +1032,16 @@
const AV1_COMMON *const cm = &cpi->common;
const YV12_BUFFER_CONFIG *yv12 = get_ref_frame_yv12_buf(cm, LAST_FRAME);
const YV12_BUFFER_CONFIG *yv12_g = get_ref_frame_yv12_buf(cm, GOLDEN_FRAME);
+ const YV12_BUFFER_CONFIG *yv12_alt = get_ref_frame_yv12_buf(cm, ALTREF_FRAME);
const struct scale_factors *const sf =
get_ref_scale_factors_const(cm, LAST_FRAME);
struct buf_2d dst;
unsigned int uv_sad_g = 0;
+ unsigned int uv_sad_alt = 0;
- for (i = 1; i <= 2; ++i) {
- struct macroblock_plane *p = &x->plane[i];
- struct macroblockd_plane *pd = &xd->plane[i];
+ for (int plane = AOM_PLANE_U; plane < MAX_MB_PLANE; ++plane) {
+ struct macroblock_plane *p = &x->plane[plane];
+ struct macroblockd_plane *pd = &xd->plane[plane];
const BLOCK_SIZE bs =
get_plane_block_size(bsize, pd->subsampling_x, pd->subsampling_y);
@@ -976,57 +1049,70 @@
// For last:
if (zero_motion) {
if (mi->ref_frame[0] == LAST_FRAME) {
- uv_sad[i - 1] = cpi->ppi->fn_ptr[bs].sdf(
+ uv_sad[plane - 1] = cpi->ppi->fn_ptr[bs].sdf(
p->src.buf, p->src.stride, pd->pre[0].buf, pd->pre[0].stride);
} else {
- uint8_t *src = (i == 1) ? yv12->u_buffer : yv12->v_buffer;
+ uint8_t *src = (plane == 1) ? yv12->u_buffer : yv12->v_buffer;
setup_pred_plane(&dst, xd->mi[0]->bsize, src, yv12->uv_crop_width,
yv12->uv_crop_height, yv12->uv_stride, xd->mi_row,
- xd->mi_col, sf, xd->plane[i].subsampling_x,
- xd->plane[i].subsampling_y);
+ xd->mi_col, sf, xd->plane[plane].subsampling_x,
+ xd->plane[plane].subsampling_y);
- uv_sad[i - 1] = cpi->ppi->fn_ptr[bs].sdf(p->src.buf, p->src.stride,
- dst.buf, dst.stride);
+ uv_sad[plane - 1] = cpi->ppi->fn_ptr[bs].sdf(
+ p->src.buf, p->src.stride, dst.buf, dst.stride);
}
} else {
- uv_sad[i - 1] = cpi->ppi->fn_ptr[bs].sdf(p->src.buf, p->src.stride,
- pd->dst.buf, pd->dst.stride);
+ uv_sad[plane - 1] = cpi->ppi->fn_ptr[bs].sdf(
+ p->src.buf, p->src.stride, pd->dst.buf, pd->dst.stride);
}
// For golden:
if (y_sad_g != UINT_MAX) {
- uint8_t *src = (i == 1) ? yv12_g->u_buffer : yv12_g->v_buffer;
+ uint8_t *src = (plane == 1) ? yv12_g->u_buffer : yv12_g->v_buffer;
setup_pred_plane(&dst, xd->mi[0]->bsize, src, yv12_g->uv_crop_width,
yv12_g->uv_crop_height, yv12_g->uv_stride, xd->mi_row,
- xd->mi_col, sf, xd->plane[i].subsampling_x,
- xd->plane[i].subsampling_y);
+ xd->mi_col, sf, xd->plane[plane].subsampling_x,
+ xd->plane[plane].subsampling_y);
uv_sad_g = cpi->ppi->fn_ptr[bs].sdf(p->src.buf, p->src.stride, dst.buf,
dst.stride);
}
+
+ // For altref:
+ if (y_sad_alt != UINT_MAX) {
+ uint8_t *src = (plane == 1) ? yv12_alt->u_buffer : yv12_alt->v_buffer;
+ setup_pred_plane(&dst, xd->mi[0]->bsize, src, yv12_alt->uv_crop_width,
+ yv12_alt->uv_crop_height, yv12_alt->uv_stride,
+ xd->mi_row, xd->mi_col, sf,
+ xd->plane[plane].subsampling_x,
+ xd->plane[plane].subsampling_y);
+ uv_sad_alt = cpi->ppi->fn_ptr[bs].sdf(p->src.buf, p->src.stride,
+ dst.buf, dst.stride);
+ }
}
- if (uv_sad[i - 1] > (y_sad >> shift_upper_limit))
- x->color_sensitivity_sb[i - 1] = 1;
- else if (uv_sad[i - 1] < (y_sad >> shift_lower_limit))
- x->color_sensitivity_sb[i - 1] = 0;
+ if (uv_sad[plane - 1] > (y_sad >> shift_upper_limit))
+ x->color_sensitivity_sb[COLOR_SENS_IDX(plane)] = 1;
+ else if (uv_sad[plane - 1] < (y_sad >> shift_lower_limit))
+ x->color_sensitivity_sb[COLOR_SENS_IDX(plane)] = 0;
// Borderline case: to be refined at coding block level in nonrd_pickmode,
// for coding block size < sb_size.
else
- x->color_sensitivity_sb[i - 1] = 2;
+ x->color_sensitivity_sb[COLOR_SENS_IDX(plane)] = 2;
- x->color_sensitivity_sb_g[i - 1] = uv_sad_g > y_sad_g / 6;
+ x->color_sensitivity_sb_g[COLOR_SENS_IDX(plane)] =
+ uv_sad_g > y_sad_g / fac_uv;
+ x->color_sensitivity_sb_alt[COLOR_SENS_IDX(plane)] =
+ uv_sad_alt > y_sad_alt / fac_uv;
}
}
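
The per-plane decision in chroma_check is a tri-state comparison of chroma SAD against scaled luma SAD: above y_sad >> shift_upper_limit marks the plane color-sensitive (1), below y_sad >> shift_lower_limit marks it insensitive (0), and anything in between is left as borderline (2) for refinement in nonrd pickmode. A standalone sketch using the default shifts from the diff:

#include <stdio.h>

/* Tri-state decision from chroma_check(): 1 = color sensitive,
 * 0 = not sensitive, 2 = borderline (refined later per coding block). */
static int color_sensitivity(unsigned int uv_sad, unsigned int y_sad,
                             int shift_upper, int shift_lower) {
  if (uv_sad > (y_sad >> shift_upper)) return 1;
  if (uv_sad < (y_sad >> shift_lower)) return 0;
  return 2;
}

int main(void) {
  const unsigned int y_sad = 8000;
  /* Defaults from the diff: upper shift 1, lower shift 3. */
  printf("%d\n", color_sensitivity(5000, y_sad, 1, 3)); /* 1: > 4000  */
  printf("%d\n", color_sensitivity(500, y_sad, 1, 3));  /* 0: < 1000  */
  printf("%d\n", color_sensitivity(2000, y_sad, 1, 3)); /* 2: between */
  return 0;
}
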
static void fill_variance_tree_leaves(
- AV1_COMP *cpi, MACROBLOCK *x, VP128x128 *vt, VP16x16 *vt2,
- PART_EVAL_STATUS *force_split, int avg_16x16[][4], int maxvar_16x16[][4],
- int minvar_16x16[][4], int *variance4x4downsample, int64_t *thresholds,
- uint8_t *src, int src_stride, const uint8_t *dst, int dst_stride,
- bool is_key_frame) {
- AV1_COMMON *cm = &cpi->common;
+ AV1_COMP *cpi, MACROBLOCK *x, VP128x128 *vt, PART_EVAL_STATUS *force_split,
+ int avg_16x16[][4], int maxvar_16x16[][4], int minvar_16x16[][4],
+ int *variance4x4downsample, int64_t *thresholds, const uint8_t *src_buf,
+ int src_stride, const uint8_t *dst_buf, int dst_stride, bool is_key_frame,
+ const bool is_small_sb) {
MACROBLOCKD *xd = &x->e_mbd;
- const int is_small_sb = (cm->seq_params->sb_size == BLOCK_64X64);
const int num_64x64_blocks = is_small_sb ? 1 : 4;
// TODO(kyslov) Bring back compute_minmax_variance with content type detection
const int compute_minmax_variance = 0;
@@ -1034,6 +1120,8 @@
int pixels_wide = 128, pixels_high = 128;
int border_offset_4x4 = 0;
int temporal_denoising = cpi->sf.rt_sf.use_rtc_tf;
+ // The dst_buf pointer is not used when is_key_frame is true, so it must
+ // be NULL.
+ assert(IMPLIES(is_key_frame, dst_buf == NULL));
if (is_small_sb) {
pixels_wide = 64;
pixels_high = 64;
@@ -1049,121 +1137,236 @@
// data outside the superblock (while it is being modified by temporal filter).
// Temporal filtering is never done on key frames.
if (!is_key_frame && temporal_denoising) border_offset_4x4 = 4;
- for (int m = 0; m < num_64x64_blocks; m++) {
- const int x64_idx = ((m & 1) << 6);
- const int y64_idx = ((m >> 1) << 6);
- const int m2 = m << 2;
- force_split[m + 1] = PART_EVAL_ALL;
+ for (int blk64_idx = 0; blk64_idx < num_64x64_blocks; blk64_idx++) {
+ const int x64_idx = GET_BLK_IDX_X(blk64_idx, 6);
+ const int y64_idx = GET_BLK_IDX_Y(blk64_idx, 6);
+ const int blk64_scale_idx = blk64_idx << 2;
+ force_split[blk64_idx + 1] = PART_EVAL_ALL;
- for (int i = 0; i < 4; i++) {
- const int x32_idx = x64_idx + ((i & 1) << 5);
- const int y32_idx = y64_idx + ((i >> 1) << 5);
- const int i2 = (m2 + i) << 2;
- force_split[5 + m2 + i] = PART_EVAL_ALL;
- avg_16x16[m][i] = 0;
- maxvar_16x16[m][i] = 0;
- minvar_16x16[m][i] = INT_MAX;
- for (int j = 0; j < 4; j++) {
- const int x16_idx = x32_idx + ((j & 1) << 4);
- const int y16_idx = y32_idx + ((j >> 1) << 4);
- const int split_index = 21 + i2 + j;
- VP16x16 *vst = &vt->split[m].split[i].split[j];
+ for (int lvl1_idx = 0; lvl1_idx < 4; lvl1_idx++) {
+ const int x32_idx = x64_idx + GET_BLK_IDX_X(lvl1_idx, 5);
+ const int y32_idx = y64_idx + GET_BLK_IDX_Y(lvl1_idx, 5);
+ const int lvl1_scale_idx = (blk64_scale_idx + lvl1_idx) << 2;
+ force_split[5 + blk64_scale_idx + lvl1_idx] = PART_EVAL_ALL;
+ avg_16x16[blk64_idx][lvl1_idx] = 0;
+ maxvar_16x16[blk64_idx][lvl1_idx] = 0;
+ minvar_16x16[blk64_idx][lvl1_idx] = INT_MAX;
+ for (int lvl2_idx = 0; lvl2_idx < 4; lvl2_idx++) {
+ const int x16_idx = x32_idx + GET_BLK_IDX_X(lvl2_idx, 4);
+ const int y16_idx = y32_idx + GET_BLK_IDX_Y(lvl2_idx, 4);
+ const int split_index = 21 + lvl1_scale_idx + lvl2_idx;
+ VP16x16 *vst = &vt->split[blk64_idx].split[lvl1_idx].split[lvl2_idx];
force_split[split_index] = PART_EVAL_ALL;
- variance4x4downsample[i2 + j] = 0;
- if (!is_key_frame) {
- fill_variance_8x8avg(src, src_stride, dst, dst_stride, x16_idx,
- y16_idx, vst, is_cur_buf_hbd(xd), pixels_wide,
- pixels_high, is_key_frame);
+ variance4x4downsample[lvl1_scale_idx + lvl2_idx] = 0;
+ if (is_key_frame) {
+ force_split[split_index] = PART_EVAL_ALL;
+ // Go down to 4x4 down-sampling for variance.
+ variance4x4downsample[lvl1_scale_idx + lvl2_idx] = 1;
+ for (int lvl3_idx = 0; lvl3_idx < 4; lvl3_idx++) {
+ const int x8_idx = x16_idx + GET_BLK_IDX_X(lvl3_idx, 3);
+ const int y8_idx = y16_idx + GET_BLK_IDX_Y(lvl3_idx, 3);
+ VP8x8 *vst2 = &vst->split[lvl3_idx];
+ fill_variance_4x4avg(src_buf, src_stride, x8_idx, y8_idx, vst2,
+#if CONFIG_AV1_HIGHBITDEPTH
+ xd->cur_buf->flags,
+#endif
+ pixels_wide, pixels_high, border_offset_4x4);
+ }
+ } else {
+ fill_variance_8x8avg(src_buf, src_stride, dst_buf, dst_stride,
+ x16_idx, y16_idx, vst, is_cur_buf_hbd(xd),
+ pixels_wide, pixels_high);
- fill_variance_tree(&vt->split[m].split[i].split[j], BLOCK_16X16);
- get_variance(&vt->split[m].split[i].split[j].part_variances.none);
- avg_16x16[m][i] +=
- vt->split[m].split[i].split[j].part_variances.none.variance;
- if (vt->split[m].split[i].split[j].part_variances.none.variance <
- minvar_16x16[m][i])
- minvar_16x16[m][i] =
- vt->split[m].split[i].split[j].part_variances.none.variance;
- if (vt->split[m].split[i].split[j].part_variances.none.variance >
- maxvar_16x16[m][i])
- maxvar_16x16[m][i] =
- vt->split[m].split[i].split[j].part_variances.none.variance;
- if (vt->split[m].split[i].split[j].part_variances.none.variance >
- thresholds[3]) {
+ fill_variance_tree(vst, BLOCK_16X16);
+ VPartVar *none_var = &vt->split[blk64_idx]
+ .split[lvl1_idx]
+ .split[lvl2_idx]
+ .part_variances.none;
+ get_variance(none_var);
+ const int val_none_var = none_var->variance;
+ avg_16x16[blk64_idx][lvl1_idx] += val_none_var;
+ minvar_16x16[blk64_idx][lvl1_idx] =
+ AOMMIN(minvar_16x16[blk64_idx][lvl1_idx], val_none_var);
+ maxvar_16x16[blk64_idx][lvl1_idx] =
+ AOMMAX(maxvar_16x16[blk64_idx][lvl1_idx], val_none_var);
+ if (val_none_var > thresholds[3]) {
// 16X16 variance is above threshold for split, so force split to
// 8x8 for this 16x16 block (this also forces splits for upper
// levels).
force_split[split_index] = PART_EVAL_ONLY_SPLIT;
- force_split[5 + m2 + i] = PART_EVAL_ONLY_SPLIT;
- force_split[m + 1] = PART_EVAL_ONLY_SPLIT;
+ force_split[5 + blk64_scale_idx + lvl1_idx] = PART_EVAL_ONLY_SPLIT;
+ force_split[blk64_idx + 1] = PART_EVAL_ONLY_SPLIT;
force_split[0] = PART_EVAL_ONLY_SPLIT;
} else if (!cyclic_refresh_segment_id_boosted(segment_id) &&
- compute_minmax_variance &&
- vt->split[m]
- .split[i]
- .split[j]
- .part_variances.none.variance > thresholds[2]) {
+ compute_minmax_variance && val_none_var > thresholds[2]) {
// We have some nominal amount of 16x16 variance (based on average),
// compute the minmax over the 8x8 sub-blocks, and if above
// threshold, force split to 8x8 block for this 16x16 block.
- int minmax = compute_minmax_8x8(src, src_stride, dst, dst_stride,
- x16_idx, y16_idx,
+ int minmax = compute_minmax_8x8(src_buf, src_stride, dst_buf,
+ dst_stride, x16_idx, y16_idx,
#if CONFIG_AV1_HIGHBITDEPTH
xd->cur_buf->flags,
#endif
pixels_wide, pixels_high);
- int thresh_minmax = (int)cpi->vbp_info.threshold_minmax;
+ const int thresh_minmax = (int)cpi->vbp_info.threshold_minmax;
if (minmax > thresh_minmax) {
force_split[split_index] = PART_EVAL_ONLY_SPLIT;
- force_split[5 + m2 + i] = PART_EVAL_ONLY_SPLIT;
- force_split[m + 1] = PART_EVAL_ONLY_SPLIT;
+ force_split[5 + blk64_scale_idx + lvl1_idx] =
+ PART_EVAL_ONLY_SPLIT;
+ force_split[blk64_idx + 1] = PART_EVAL_ONLY_SPLIT;
force_split[0] = PART_EVAL_ONLY_SPLIT;
}
}
}
- if (is_key_frame) {
- force_split[split_index] = PART_EVAL_ALL;
- // Go down to 4x4 down-sampling for variance.
- variance4x4downsample[i2 + j] = 1;
- for (int k = 0; k < 4; k++) {
- int x8_idx = x16_idx + ((k & 1) << 3);
- int y8_idx = y16_idx + ((k >> 1) << 3);
- VP8x8 *vst2 = is_key_frame ? &vst->split[k] : &vt2[i2 + j].split[k];
- fill_variance_4x4avg(
- src, src_stride, dst, dst_stride, x8_idx, y8_idx, vst2,
-#if CONFIG_AV1_HIGHBITDEPTH
- xd->cur_buf->flags,
-#endif
- pixels_wide, pixels_high, is_key_frame, border_offset_4x4);
- }
- }
}
}
}
}
+static AOM_INLINE void set_ref_frame_for_partition(
+ AV1_COMP *cpi, MACROBLOCK *x, MACROBLOCKD *xd,
+ MV_REFERENCE_FRAME *ref_frame_partition, MB_MODE_INFO *mi,
+ unsigned int *y_sad, unsigned int *y_sad_g, unsigned int *y_sad_alt,
+ const YV12_BUFFER_CONFIG *yv12_g, const YV12_BUFFER_CONFIG *yv12_alt,
+ int mi_row, int mi_col, int num_planes) {
+ AV1_COMMON *const cm = &cpi->common;
+ const bool is_set_golden_ref_frame =
+ *y_sad_g < 0.9 * *y_sad && *y_sad_g < *y_sad_alt;
+ const bool is_set_altref_ref_frame =
+ *y_sad_alt < 0.9 * *y_sad && *y_sad_alt < *y_sad_g;
+
+ if (is_set_golden_ref_frame) {
+ av1_setup_pre_planes(xd, 0, yv12_g, mi_row, mi_col,
+ get_ref_scale_factors(cm, GOLDEN_FRAME), num_planes);
+ mi->ref_frame[0] = GOLDEN_FRAME;
+ mi->mv[0].as_int = 0;
+ *y_sad = *y_sad_g;
+ *ref_frame_partition = GOLDEN_FRAME;
+ x->nonrd_prune_ref_frame_search = 0;
+ } else if (is_set_altref_ref_frame) {
+ av1_setup_pre_planes(xd, 0, yv12_alt, mi_row, mi_col,
+ get_ref_scale_factors(cm, ALTREF_FRAME), num_planes);
+ mi->ref_frame[0] = ALTREF_FRAME;
+ mi->mv[0].as_int = 0;
+ *y_sad = *y_sad_alt;
+ *ref_frame_partition = ALTREF_FRAME;
+ x->nonrd_prune_ref_frame_search = 0;
+ } else {
+ *ref_frame_partition = LAST_FRAME;
+ x->nonrd_prune_ref_frame_search =
+ cpi->sf.rt_sf.nonrd_prune_ref_frame_search;
+ }
+}
+
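
The selection predicates above keep LAST unless GOLDEN or ALTREF beats it by more than 10% and also beats the other candidate. A minimal sketch of just that decision rule (the enum tags are illustrative stand-ins for the real MV_REFERENCE_FRAME values):

#include <stdio.h>

typedef enum { REF_LAST, REF_GOLDEN, REF_ALTREF } RefSketch;

/* Mirrors the is_set_golden/altref predicates: a non-LAST reference is
 * chosen only if its SAD is below 0.9 * LAST's and below the other
 * candidate's. */
static RefSketch pick_partition_ref(unsigned int sad_last,
                                    unsigned int sad_g,
                                    unsigned int sad_alt) {
  if (sad_g < 0.9 * sad_last && sad_g < sad_alt) return REF_GOLDEN;
  if (sad_alt < 0.9 * sad_last && sad_alt < sad_g) return REF_ALTREF;
  return REF_LAST;
}

int main(void) {
  printf("%d\n", pick_partition_ref(1000, 850, 900)); /* REF_GOLDEN (1) */
  printf("%d\n", pick_partition_ref(1000, 950, 800)); /* REF_ALTREF (2) */
  printf("%d\n", pick_partition_ref(1000, 950, 980)); /* REF_LAST   (0) */
  return 0;
}
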
+static AOM_FORCE_INLINE int mv_distance(const FULLPEL_MV *mv0,
+ const FULLPEL_MV *mv1) {
+ return abs(mv0->row - mv1->row) + abs(mv0->col - mv1->col);
+}
+
+static AOM_INLINE void evaluate_neighbour_mvs(AV1_COMP *cpi, MACROBLOCK *x,
+ unsigned int *y_sad,
+ bool is_small_sb,
+ int est_motion) {
+ const int source_sad_nonrd = x->content_state_sb.source_sad_nonrd;
+ // TODO([email protected]): test if this condition works with other
+ // speeds.
+ if (est_motion > 2 && source_sad_nonrd > kMedSad) return;
+
+ MACROBLOCKD *xd = &x->e_mbd;
+ BLOCK_SIZE bsize = is_small_sb ? BLOCK_64X64 : BLOCK_128X128;
+ MB_MODE_INFO *mi = xd->mi[0];
+
+ unsigned int above_y_sad = UINT_MAX;
+ unsigned int left_y_sad = UINT_MAX;
+ FULLPEL_MV above_mv = kZeroFullMv;
+ FULLPEL_MV left_mv = kZeroFullMv;
+ SubpelMvLimits subpel_mv_limits;
+ const MV dummy_mv = { 0, 0 };
+ av1_set_subpel_mv_search_range(&subpel_mv_limits, &x->mv_limits, &dummy_mv);
+
+ // Current best MV
+ FULLPEL_MV best_mv = get_fullmv_from_mv(&mi->mv[0].as_mv);
+ const int multi = (est_motion > 2 && source_sad_nonrd > kLowSad) ? 7 : 8;
+
+ if (xd->up_available) {
+ const MB_MODE_INFO *above_mbmi = xd->above_mbmi;
+ if (above_mbmi->mode >= INTRA_MODE_END &&
+ above_mbmi->ref_frame[0] == LAST_FRAME) {
+ MV temp = above_mbmi->mv[0].as_mv;
+ clamp_mv(&temp, &subpel_mv_limits);
+ above_mv = get_fullmv_from_mv(&temp);
+
+ if (mv_distance(&best_mv, &above_mv) > 0) {
+ uint8_t const *ref_buf =
+ get_buf_from_fullmv(&xd->plane[0].pre[0], &above_mv);
+ above_y_sad = cpi->ppi->fn_ptr[bsize].sdf(
+ x->plane[0].src.buf, x->plane[0].src.stride, ref_buf,
+ xd->plane[0].pre[0].stride);
+ }
+ }
+ }
+ if (xd->left_available) {
+ const MB_MODE_INFO *left_mbmi = xd->left_mbmi;
+ if (left_mbmi->mode >= INTRA_MODE_END &&
+ left_mbmi->ref_frame[0] == LAST_FRAME) {
+ MV temp = left_mbmi->mv[0].as_mv;
+ clamp_mv(&temp, &subpel_mv_limits);
+ left_mv = get_fullmv_from_mv(&temp);
+
+ if (mv_distance(&best_mv, &left_mv) > 0 &&
+ mv_distance(&above_mv, &left_mv) > 0) {
+ uint8_t const *ref_buf =
+ get_buf_from_fullmv(&xd->plane[0].pre[0], &left_mv);
+ left_y_sad = cpi->ppi->fn_ptr[bsize].sdf(
+ x->plane[0].src.buf, x->plane[0].src.stride, ref_buf,
+ xd->plane[0].pre[0].stride);
+ }
+ }
+ }
+
+ if (above_y_sad < ((multi * *y_sad) >> 3) && above_y_sad < left_y_sad) {
+ *y_sad = above_y_sad;
+ mi->mv[0].as_mv = get_mv_from_fullmv(&above_mv);
+ clamp_mv(&mi->mv[0].as_mv, &subpel_mv_limits);
+ }
+ if (left_y_sad < ((multi * *y_sad) >> 3) && left_y_sad < above_y_sad) {
+ *y_sad = left_y_sad;
+ mi->mv[0].as_mv = get_mv_from_fullmv(&left_mv);
+ clamp_mv(&mi->mv[0].as_mv, &subpel_mv_limits);
+ }
+}
+
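
A neighbour's MV replaces the current best in evaluate_neighbour_mvs only if its SAD falls below multi/8 of the current SAD: multi is 7 (require a ~12.5% improvement) when est_motion > 2 with non-low source SAD, and 8 (accept any improvement) otherwise. A sketch of the acceptance test:

#include <stdbool.h>
#include <stdio.h>

/* Acceptance rule from evaluate_neighbour_mvs(): the neighbour SAD must
 * be under (multi/8) of the current SAD, where multi is 7 or 8. */
static bool accept_neighbour_mv(unsigned int neigh_sad, unsigned int cur_sad,
                                int multi) {
  return neigh_sad < ((multi * cur_sad) >> 3);
}

int main(void) {
  /* multi = 7: require a ~12.5% improvement; multi = 8: any improvement. */
  printf("%d\n", accept_neighbour_mv(880, 1000, 7)); /* 0: 880 >= 875 */
  printf("%d\n", accept_neighbour_mv(860, 1000, 7)); /* 1: 860 < 875  */
  printf("%d\n", accept_neighbour_mv(999, 1000, 8)); /* 1: 999 < 1000 */
  return 0;
}
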
static void setup_planes(AV1_COMP *cpi, MACROBLOCK *x, unsigned int *y_sad,
unsigned int *y_sad_g, unsigned int *y_sad_alt,
unsigned int *y_sad_last,
MV_REFERENCE_FRAME *ref_frame_partition, int mi_row,
- int mi_col) {
+ int mi_col, bool is_small_sb, bool scaled_ref_last) {
AV1_COMMON *const cm = &cpi->common;
MACROBLOCKD *xd = &x->e_mbd;
const int num_planes = av1_num_planes(cm);
- const int is_small_sb = (cm->seq_params->sb_size == BLOCK_64X64);
BLOCK_SIZE bsize = is_small_sb ? BLOCK_64X64 : BLOCK_128X128;
MB_MODE_INFO *mi = xd->mi[0];
- const YV12_BUFFER_CONFIG *yv12 = get_ref_frame_yv12_buf(cm, LAST_FRAME);
+ const YV12_BUFFER_CONFIG *yv12 =
+ scaled_ref_last ? av1_get_scaled_ref_frame(cpi, LAST_FRAME)
+ : get_ref_frame_yv12_buf(cm, LAST_FRAME);
assert(yv12 != NULL);
const YV12_BUFFER_CONFIG *yv12_g = NULL;
const YV12_BUFFER_CONFIG *yv12_alt = NULL;
// Check if LAST is a reference. For spatial layers always use it as
- // reference scaling (golden or altref being lower resolution) is not
- // handled/check here.
+ // reference.
int use_last_ref = (cpi->ref_frame_flags & AOM_LAST_FLAG) ||
cpi->svc.number_spatial_layers > 1;
int use_golden_ref = cpi->ref_frame_flags & AOM_GOLD_FLAG;
int use_alt_ref = cpi->ppi->rtc_ref.set_ref_frame_config ||
- cpi->sf.rt_sf.use_nonrd_altref_frame;
+ cpi->sf.rt_sf.use_nonrd_altref_frame ||
+ (cpi->sf.rt_sf.use_comp_ref_nonrd &&
+ cpi->sf.rt_sf.ref_frame_comp_nonrd[2] == 1);
+ // On a resized frame (reference has different scale) only use
+ // LAST as reference for partitioning for now.
+ if (scaled_ref_last) {
+ use_golden_ref = 0;
+ use_alt_ref = 0;
+ }
// For 1 spatial layer: GOLDEN is another temporal reference.
// Check if it should be used as reference for partitioning.
@@ -1174,8 +1377,9 @@
av1_setup_pre_planes(xd, 0, yv12_g, mi_row, mi_col,
get_ref_scale_factors(cm, GOLDEN_FRAME), num_planes);
*y_sad_g = cpi->ppi->fn_ptr[bsize].sdf(
- x->plane[0].src.buf, x->plane[0].src.stride, xd->plane[0].pre[0].buf,
- xd->plane[0].pre[0].stride);
+ x->plane[AOM_PLANE_Y].src.buf, x->plane[AOM_PLANE_Y].src.stride,
+ xd->plane[AOM_PLANE_Y].pre[0].buf,
+ xd->plane[AOM_PLANE_Y].pre[0].stride);
}
}
@@ -1189,57 +1393,60 @@
av1_setup_pre_planes(xd, 0, yv12_alt, mi_row, mi_col,
get_ref_scale_factors(cm, ALTREF_FRAME), num_planes);
*y_sad_alt = cpi->ppi->fn_ptr[bsize].sdf(
- x->plane[0].src.buf, x->plane[0].src.stride, xd->plane[0].pre[0].buf,
- xd->plane[0].pre[0].stride);
+ x->plane[AOM_PLANE_Y].src.buf, x->plane[AOM_PLANE_Y].src.stride,
+ xd->plane[AOM_PLANE_Y].pre[0].buf,
+ xd->plane[AOM_PLANE_Y].pre[0].stride);
}
}
if (use_last_ref) {
- av1_setup_pre_planes(xd, 0, yv12, mi_row, mi_col,
- get_ref_scale_factors(cm, LAST_FRAME), num_planes);
+ const int source_sad_nonrd = x->content_state_sb.source_sad_nonrd;
+ av1_setup_pre_planes(
+ xd, 0, yv12, mi_row, mi_col,
+ scaled_ref_last ? NULL : get_ref_scale_factors(cm, LAST_FRAME),
+ num_planes);
mi->ref_frame[0] = LAST_FRAME;
mi->ref_frame[1] = NONE_FRAME;
mi->bsize = cm->seq_params->sb_size;
mi->mv[0].as_int = 0;
mi->interp_filters = av1_broadcast_interp_filter(BILINEAR);
- if (cpi->sf.rt_sf.estimate_motion_for_var_based_partition) {
+
+ int est_motion = cpi->sf.rt_sf.estimate_motion_for_var_based_partition;
+ // TODO(b/290596301): Look into adjusting this condition.
+ // There is a regression on color content when
+ // estimate_motion_for_var_based_partition = 3 and motion is high,
+ // so for now force it to 2 based on the superblock sad.
+ if (est_motion > 2 && source_sad_nonrd > kMedSad) est_motion = 2;
+
+ if (est_motion == 1 || est_motion == 2) {
if (xd->mb_to_right_edge >= 0 && xd->mb_to_bottom_edge >= 0) {
const MV dummy_mv = { 0, 0 };
*y_sad = av1_int_pro_motion_estimation(cpi, x, cm->seq_params->sb_size,
mi_row, mi_col, &dummy_mv);
}
}
+
if (*y_sad == UINT_MAX) {
*y_sad = cpi->ppi->fn_ptr[bsize].sdf(
- x->plane[0].src.buf, x->plane[0].src.stride, xd->plane[0].pre[0].buf,
- xd->plane[0].pre[0].stride);
+ x->plane[AOM_PLANE_Y].src.buf, x->plane[AOM_PLANE_Y].src.stride,
+ xd->plane[AOM_PLANE_Y].pre[0].buf,
+ xd->plane[AOM_PLANE_Y].pre[0].stride);
}
+
+ // Evaluate if neighbours' MVs give better predictions. Zero MV is tested
+ // already, so only non-zero MVs are tested here. The neighbour blocks
+ // considered are the first blocks above and to the left of this superblock.
+ if (est_motion >= 2 && (xd->up_available || xd->left_available))
+ evaluate_neighbour_mvs(cpi, x, y_sad, is_small_sb, est_motion);
+
*y_sad_last = *y_sad;
}
// Pick the ref frame for partitioning; use the golden or altref frame only
// if it has lower sad, with a bias to LAST by a factor of 0.9.
- if (*y_sad_g < 0.9 * *y_sad && *y_sad_g < *y_sad_alt) {
- av1_setup_pre_planes(xd, 0, yv12_g, mi_row, mi_col,
- get_ref_scale_factors(cm, GOLDEN_FRAME), num_planes);
- mi->ref_frame[0] = GOLDEN_FRAME;
- mi->mv[0].as_int = 0;
- *y_sad = *y_sad_g;
- *ref_frame_partition = GOLDEN_FRAME;
- x->nonrd_prune_ref_frame_search = 0;
- } else if (*y_sad_alt < 0.9 * *y_sad && *y_sad_alt < *y_sad_g) {
- av1_setup_pre_planes(xd, 0, yv12_alt, mi_row, mi_col,
- get_ref_scale_factors(cm, ALTREF_FRAME), num_planes);
- mi->ref_frame[0] = ALTREF_FRAME;
- mi->mv[0].as_int = 0;
- *y_sad = *y_sad_alt;
- *ref_frame_partition = ALTREF_FRAME;
- x->nonrd_prune_ref_frame_search = 0;
- } else {
- *ref_frame_partition = LAST_FRAME;
- x->nonrd_prune_ref_frame_search =
- cpi->sf.rt_sf.nonrd_prune_ref_frame_search;
- }
+ set_ref_frame_for_partition(cpi, x, xd, ref_frame_partition, mi, y_sad,
+ y_sad_g, y_sad_alt, yv12_g, yv12_alt, mi_row,
+ mi_col, num_planes);
// Only calculate the predictor for non-zero MV.
if (mi->mv[0].as_int != 0) {
@@ -1255,9 +1462,10 @@
static AOM_INLINE PART_EVAL_STATUS get_part_eval_based_on_sub_blk_var(
VP16x16 *var_16x16_info, int64_t threshold16) {
int max_8x8_var = 0, min_8x8_var = INT_MAX;
- for (int k = 0; k < 4; k++) {
- get_variance(&var_16x16_info->split[k].part_variances.none);
- int this_8x8_var = var_16x16_info->split[k].part_variances.none.variance;
+ for (int split_idx = 0; split_idx < 4; split_idx++) {
+ get_variance(&var_16x16_info->split[split_idx].part_variances.none);
+ int this_8x8_var =
+ var_16x16_info->split[split_idx].part_variances.none.variance;
max_8x8_var = AOMMAX(this_8x8_var, max_8x8_var);
min_8x8_var = AOMMIN(this_8x8_var, min_8x8_var);
}
@@ -1282,6 +1490,44 @@
return false;
}
+static AOM_INLINE bool set_force_zeromv_skip_for_sb(
+ AV1_COMP *cpi, MACROBLOCK *x, const TileInfo *const tile, VP16x16 *vt2,
+ VP128x128 *vt, unsigned int *uv_sad, int mi_row, int mi_col,
+ unsigned int y_sad, BLOCK_SIZE bsize) {
+ AV1_COMMON *const cm = &cpi->common;
+ if (!is_set_force_zeromv_skip_based_on_src_sad(
+ cpi->sf.rt_sf.set_zeromv_skip_based_on_source_sad,
+ x->content_state_sb.source_sad_nonrd))
+ return false;
+ const int block_width = mi_size_wide[cm->seq_params->sb_size];
+ const int block_height = mi_size_high[cm->seq_params->sb_size];
+ const unsigned int thresh_exit_part_y =
+ cpi->zeromv_skip_thresh_exit_part[bsize];
+ unsigned int thresh_exit_part_uv =
+ CALC_CHROMA_THRESH_FOR_ZEROMV_SKIP(thresh_exit_part_y);
+ // Be more aggressive with the UV threshold if source_sad >= kVeryLowSad,
+ // to suppress visual artifacts caused by the speed feature
+ // set_zeromv_skip_based_on_source_sad = 2. For now apply this only when
+ // part_early_exit_zeromv = 1.
+ if (x->content_state_sb.source_sad_nonrd >= kVeryLowSad &&
+ cpi->sf.rt_sf.part_early_exit_zeromv == 1)
+ thresh_exit_part_uv = thresh_exit_part_uv >> 3;
+ if (mi_col + block_width <= tile->mi_col_end &&
+ mi_row + block_height <= tile->mi_row_end && y_sad < thresh_exit_part_y &&
+ uv_sad[0] < thresh_exit_part_uv && uv_sad[1] < thresh_exit_part_uv) {
+ set_block_size(cpi, mi_row, mi_col, bsize);
+ x->force_zeromv_skip_for_sb = 1;
+ if (vt2) aom_free(vt2);
+ if (vt) aom_free(vt);
+ // Partition shape is set here at SB level.
+ // Exit needs to happen from av1_choose_var_based_partitioning().
+ return true;
+ } else if (x->content_state_sb.source_sad_nonrd == kZeroSad &&
+ cpi->sf.rt_sf.part_early_exit_zeromv >= 2)
+ x->force_zeromv_skip_for_sb = 2;
+ return false;
+}
+
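
The early exit above commits the whole superblock to zero-MV skip only when the block fits inside the tile and the luma and both chroma SADs are under their thresholds; the chroma threshold is derived from the luma one via CALC_CHROMA_THRESH_FOR_ZEROMV_SKIP, whose definition is not shown in this hunk. A sketch of the SAD gate alone, with purely illustrative threshold values:

#include <stdbool.h>
#include <stdio.h>

/* Gate from set_force_zeromv_skip_for_sb(): all three SADs must be under
 * their thresholds (thresh_uv is derived from thresh_y in the encoder;
 * the derivation macro is not shown here, so the values below are
 * purely illustrative). */
static bool zeromv_skip_ok(unsigned int y_sad, const unsigned int uv_sad[2],
                           unsigned int thresh_y, unsigned int thresh_uv) {
  return y_sad < thresh_y && uv_sad[0] < thresh_uv && uv_sad[1] < thresh_uv;
}

int main(void) {
  const unsigned int uv_ok[2] = { 90, 110 };
  const unsigned int uv_bad[2] = { 90, 600 };
  printf("%d\n", zeromv_skip_ok(300, uv_ok, 400, 200));  /* 1 */
  printf("%d\n", zeromv_skip_ok(300, uv_bad, 400, 200)); /* 0 */
  return 0;
}
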
int av1_choose_var_based_partitioning(AV1_COMP *cpi, const TileInfo *const tile,
ThreadData *td, MACROBLOCK *x, int mi_row,
int mi_col) {
@@ -1291,8 +1537,6 @@
AV1_COMMON *const cm = &cpi->common;
MACROBLOCKD *xd = &x->e_mbd;
const int64_t *const vbp_thresholds = cpi->vbp_info.thresholds;
-
- int i, j, k, m;
VP128x128 *vt;
VP16x16 *vt2 = NULL;
PART_EVAL_STATUS force_split[85];
@@ -1307,22 +1551,22 @@
int maxvar_16x16[4][4];
int minvar_16x16[4][4];
int64_t threshold_4x4avg;
- uint8_t *s;
- const uint8_t *d;
- int sp;
- int dp;
- unsigned int uv_sad[2];
+ const uint8_t *src_buf;
+ const uint8_t *dst_buf;
+ int dst_stride;
+ unsigned int uv_sad[MAX_MB_PLANE - 1];
NOISE_LEVEL noise_level = kLow;
- int zero_motion = 1;
+ bool is_zero_motion = true;
+ bool scaled_ref_last = false;
- int is_key_frame =
+ bool is_key_frame =
(frame_is_intra_only(cm) ||
(cpi->ppi->use_svc &&
cpi->svc.layer_context[cpi->svc.temporal_layer_id].is_key_frame));
assert(cm->seq_params->sb_size == BLOCK_64X64 ||
cm->seq_params->sb_size == BLOCK_128X128);
- const int is_small_sb = (cm->seq_params->sb_size == BLOCK_64X64);
+ const bool is_small_sb = (cm->seq_params->sb_size == BLOCK_64X64);
const int num_64x64_blocks = is_small_sb ? 1 : 4;
unsigned int y_sad = UINT_MAX;
@@ -1346,7 +1590,8 @@
int variance4x4downsample[64];
const int segment_id = xd->mi[0]->segment_id;
uint64_t blk_sad = 0;
- if (cpi->src_sad_blk_64x64 != NULL && !cpi->ppi->use_svc) {
+ if (cpi->src_sad_blk_64x64 != NULL &&
+ cpi->svc.spatial_layer_id == cpi->svc.number_spatial_layers - 1) {
const int sb_size_by_mb = (cm->seq_params->sb_size == BLOCK_128X128)
? (cm->seq_params->mib_size >> 1)
: cm->seq_params->mib_size;
@@ -1357,27 +1602,23 @@
blk_sad = cpi->src_sad_blk_64x64[sbi_col + sbi_row * sb_cols];
}
- if (cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ && cm->seg.enabled &&
- cyclic_refresh_segment_id_boosted(segment_id)) {
- const int q =
- av1_get_qindex(&cm->seg, segment_id, cm->quant_params.base_qindex);
- set_vbp_thresholds(cpi, thresholds, q, x->content_state_sb.low_sumdiff,
- x->content_state_sb.source_sad_nonrd,
- x->content_state_sb.source_sad_rd, 1, blk_sad,
- x->content_state_sb.lighting_change);
- } else {
- set_vbp_thresholds(cpi, thresholds, cm->quant_params.base_qindex,
- x->content_state_sb.low_sumdiff,
- x->content_state_sb.source_sad_nonrd,
- x->content_state_sb.source_sad_rd, 0, blk_sad,
- x->content_state_sb.lighting_change);
- }
+ const bool is_segment_id_boosted =
+ cpi->oxcf.q_cfg.aq_mode == CYCLIC_REFRESH_AQ && cm->seg.enabled &&
+ cyclic_refresh_segment_id_boosted(segment_id);
+ const int qindex =
+ is_segment_id_boosted
+ ? av1_get_qindex(&cm->seg, segment_id, cm->quant_params.base_qindex)
+ : cm->quant_params.base_qindex;
+ set_vbp_thresholds(
+ cpi, thresholds, blk_sad, qindex, x->content_state_sb.low_sumdiff,
+ x->content_state_sb.source_sad_nonrd, x->content_state_sb.source_sad_rd,
+ is_segment_id_boosted, x->content_state_sb.lighting_change);
  // For non-key frames, disable 4x4 average for low resolution when speed = 8.
threshold_4x4avg = INT64_MAX;
- s = x->plane[0].src.buf;
- sp = x->plane[0].src.stride;
+ src_buf = x->plane[AOM_PLANE_Y].src.buf;
+ int src_stride = x->plane[AOM_PLANE_Y].src.stride;
  // Index for force_split: 0 for the whole superblock, 1-4 for the 64x64
  // blocks, 5-20 for the 32x32 blocks, and 21-84 for the 16x16 blocks.
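
The index arithmetic behind this layout recurs throughout the loops below; the following standalone sketch reproduces just the force_split[] index math for the 32x32 and 16x16 entries (the helper names are illustrative, not libaom API):

#include <stdio.h>

static int force_split_idx_32x32(int blk64_idx, int lvl1_idx) {
  return 5 + (blk64_idx << 2) + lvl1_idx;  // 5..20
}

static int force_split_idx_16x16(int blk64_idx, int lvl1_idx, int lvl2_idx) {
  const int lvl1_scale_idx = ((blk64_idx << 2) + lvl1_idx) << 2;
  return 21 + lvl1_scale_idx + lvl2_idx;  // 21..84
}

int main(void) {
  // Last block at each level: prints "20 84".
  printf("%d %d\n", force_split_idx_32x32(3, 3),
         force_split_idx_16x16(3, 3, 3));
  return 0;
}
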
@@ -1385,50 +1626,51 @@
memset(x->part_search_info.variance_low, 0,
sizeof(x->part_search_info.variance_low));
- // Check if LAST frame is NULL or if the resolution of LAST is
- // different than the current frame resolution, and if so, treat this frame
+ // Check if LAST frame is NULL, and if so, treat this frame
// as a key frame, for the purpose of the superblock partitioning.
// LAST == NULL can happen in cases where enhancement spatial layers are
  // enabled dynamically and the only reference is the spatial one (GOLDEN).
- // TODO(marpan): Check se of scaled references for the different resoln.
+  // If the LAST frame has a different resolution, set the scaled_ref_last
+  // flag and check whether the scaled reference (ref_scaled) is NULL.
if (!frame_is_intra_only(cm)) {
- const YV12_BUFFER_CONFIG *const ref =
- get_ref_frame_yv12_buf(cm, LAST_FRAME);
- if (ref == NULL || ref->y_crop_height != cm->height ||
- ref->y_crop_width != cm->width) {
- is_key_frame = 1;
+ const YV12_BUFFER_CONFIG *ref = get_ref_frame_yv12_buf(cm, LAST_FRAME);
+ if (ref == NULL) {
+ is_key_frame = true;
+ } else if (ref->y_crop_height != cm->height ||
+ ref->y_crop_width != cm->width) {
+ scaled_ref_last = true;
+ const YV12_BUFFER_CONFIG *ref_scaled =
+ av1_get_scaled_ref_frame(cpi, LAST_FRAME);
+ if (ref_scaled == NULL) is_key_frame = true;
}
}
if (!is_key_frame) {
setup_planes(cpi, x, &y_sad, &y_sad_g, &y_sad_alt, &y_sad_last,
- &ref_frame_partition, mi_row, mi_col);
+ &ref_frame_partition, mi_row, mi_col, is_small_sb,
+ scaled_ref_last);
MB_MODE_INFO *mi = xd->mi[0];
// Use reference SB directly for zero mv.
if (mi->mv[0].as_int != 0) {
- d = xd->plane[0].dst.buf;
- dp = xd->plane[0].dst.stride;
- zero_motion = 0;
+ dst_buf = xd->plane[AOM_PLANE_Y].dst.buf;
+ dst_stride = xd->plane[AOM_PLANE_Y].dst.stride;
+ is_zero_motion = false;
} else {
- d = xd->plane[0].pre[0].buf;
- dp = xd->plane[0].pre[0].stride;
+ dst_buf = xd->plane[AOM_PLANE_Y].pre[0].buf;
+ dst_stride = xd->plane[AOM_PLANE_Y].pre[0].stride;
}
} else {
- d = AV1_VAR_OFFS;
- dp = 0;
+ dst_buf = NULL;
+ dst_stride = 0;
}
- uv_sad[0] = 0;
- uv_sad[1] = 0;
- chroma_check(cpi, x, bsize, y_sad_last, y_sad_g, is_key_frame, zero_motion,
- uv_sad);
+  // Check and set the color sensitivity of the SB.
+ av1_zero(uv_sad);
+ chroma_check(cpi, x, bsize, y_sad_last, y_sad_g, y_sad_alt, is_key_frame,
+ is_zero_motion, uv_sad);
x->force_zeromv_skip_for_sb = 0;
- const bool is_set_force_zeromv_skip =
- is_set_force_zeromv_skip_based_on_src_sad(
- cpi->sf.rt_sf.set_zeromv_skip_based_on_source_sad,
- x->content_state_sb.source_sad_nonrd);
// If the superblock is completely static (zero source sad) and
// the y_sad (relative to LAST ref) is very small, take the sb_size partition
@@ -1438,27 +1680,11 @@
// Condition on color uv_sad is also added.
if (!is_key_frame && cpi->sf.rt_sf.part_early_exit_zeromv &&
cpi->rc.frames_since_key > 30 && segment_id == CR_SEGMENT_ID_BASE &&
- is_set_force_zeromv_skip && ref_frame_partition == LAST_FRAME &&
- xd->mi[0]->mv[0].as_int == 0) {
- const int block_width = mi_size_wide[cm->seq_params->sb_size];
- const int block_height = mi_size_high[cm->seq_params->sb_size];
- const unsigned int thresh_exit_part_y =
- cpi->zeromv_skip_thresh_exit_part[bsize];
- const unsigned int thresh_exit_part_uv =
- CALC_CHROMA_THRESH_FOR_ZEROMV_SKIP(thresh_exit_part_y);
- if (mi_col + block_width <= tile->mi_col_end &&
- mi_row + block_height <= tile->mi_row_end &&
- y_sad < thresh_exit_part_y && uv_sad[0] < thresh_exit_part_uv &&
- uv_sad[1] < thresh_exit_part_uv) {
- set_block_size(cpi, mi_row, mi_col, bsize);
- x->force_zeromv_skip_for_sb = 1;
- if (vt2) aom_free(vt2);
- if (vt) aom_free(vt);
+ ref_frame_partition == LAST_FRAME && xd->mi[0]->mv[0].as_int == 0) {
+    // Exit here if the zero-mv skip flag is set at the SB level.
+ if (set_force_zeromv_skip_for_sb(cpi, x, tile, vt2, vt, uv_sad, mi_row,
+ mi_col, y_sad, bsize))
return 0;
- } else if (x->content_state_sb.source_sad_nonrd == kZeroSad &&
- cpi->sf.rt_sf.part_early_exit_zeromv >= 2) {
- x->force_zeromv_skip_for_sb = 2;
- }
}
if (cpi->noise_estimate.enabled)
@@ -1468,95 +1694,102 @@
CHECK_MEM_ERROR(cm, vt2, aom_malloc(sizeof(*vt2)));
// Fill in the entire tree of 8x8 (or 4x4 under some conditions) variances
// for splits.
- fill_variance_tree_leaves(cpi, x, vt, vt2, force_split, avg_16x16,
- maxvar_16x16, minvar_16x16, variance4x4downsample,
- thresholds, s, sp, d, dp, is_key_frame);
+ fill_variance_tree_leaves(cpi, x, vt, force_split, avg_16x16, maxvar_16x16,
+ minvar_16x16, variance4x4downsample, thresholds,
+ src_buf, src_stride, dst_buf, dst_stride,
+ is_key_frame, is_small_sb);
avg_64x64 = 0;
- for (m = 0; m < num_64x64_blocks; ++m) {
- max_var_32x32[m] = 0;
- min_var_32x32[m] = INT_MAX;
- const int m2 = m << 2;
- for (i = 0; i < 4; i++) {
- const int i2 = (m2 + i) << 2;
- for (j = 0; j < 4; j++) {
- const int split_index = 21 + i2 + j;
- if (variance4x4downsample[i2 + j] == 1) {
- VP16x16 *vtemp =
- (!is_key_frame) ? &vt2[i2 + j] : &vt->split[m].split[i].split[j];
- for (k = 0; k < 4; k++)
- fill_variance_tree(&vtemp->split[k], BLOCK_8X8);
- fill_variance_tree(vtemp, BLOCK_16X16);
- // If variance of this 16x16 block is above the threshold, force block
- // to split. This also forces a split on the upper levels.
- get_variance(&vtemp->part_variances.none);
- if (vtemp->part_variances.none.variance > thresholds[3]) {
- force_split[split_index] =
- cpi->sf.rt_sf.vbp_prune_16x16_split_using_min_max_sub_blk_var
- ? get_part_eval_based_on_sub_blk_var(vtemp, thresholds[3])
- : PART_EVAL_ONLY_SPLIT;
- force_split[5 + m2 + i] = PART_EVAL_ONLY_SPLIT;
- force_split[m + 1] = PART_EVAL_ONLY_SPLIT;
- force_split[0] = PART_EVAL_ONLY_SPLIT;
- }
+ for (int blk64_idx = 0; blk64_idx < num_64x64_blocks; ++blk64_idx) {
+ max_var_32x32[blk64_idx] = 0;
+ min_var_32x32[blk64_idx] = INT_MAX;
+ const int blk64_scale_idx = blk64_idx << 2;
+ for (int lvl1_idx = 0; lvl1_idx < 4; lvl1_idx++) {
+ const int lvl1_scale_idx = (blk64_scale_idx + lvl1_idx) << 2;
+ for (int lvl2_idx = 0; lvl2_idx < 4; lvl2_idx++) {
+ if (variance4x4downsample[lvl1_scale_idx + lvl2_idx] != 1) continue;
+ VP16x16 *vtemp =
+ (!is_key_frame)
+ ? &vt2[lvl1_scale_idx + lvl2_idx]
+ : &vt->split[blk64_idx].split[lvl1_idx].split[lvl2_idx];
+ for (int lvl3_idx = 0; lvl3_idx < 4; lvl3_idx++)
+ fill_variance_tree(&vtemp->split[lvl3_idx], BLOCK_8X8);
+ fill_variance_tree(vtemp, BLOCK_16X16);
+ // If variance of this 16x16 block is above the threshold, force block
+ // to split. This also forces a split on the upper levels.
+ get_variance(&vtemp->part_variances.none);
+ if (vtemp->part_variances.none.variance > thresholds[3]) {
+ const int split_index = 21 + lvl1_scale_idx + lvl2_idx;
+ force_split[split_index] =
+ cpi->sf.rt_sf.vbp_prune_16x16_split_using_min_max_sub_blk_var
+ ? get_part_eval_based_on_sub_blk_var(vtemp, thresholds[3])
+ : PART_EVAL_ONLY_SPLIT;
+ force_split[5 + blk64_scale_idx + lvl1_idx] = PART_EVAL_ONLY_SPLIT;
+ force_split[blk64_idx + 1] = PART_EVAL_ONLY_SPLIT;
+ force_split[0] = PART_EVAL_ONLY_SPLIT;
}
}
- fill_variance_tree(&vt->split[m].split[i], BLOCK_32X32);
+ fill_variance_tree(&vt->split[blk64_idx].split[lvl1_idx], BLOCK_32X32);
      // If the variance of this 32x32 block is above the threshold, or if it is
      // above some fraction of the average variance over the sub-16x16 blocks,
// then force this block to split. This also forces a split on the upper
// (64x64) level.
uint64_t frame_sad_thresh = 20000;
+ const int is_360p_or_smaller = cm->width * cm->height <= RESOLUTION_360P;
if (cpi->svc.number_temporal_layers > 2 &&
cpi->svc.temporal_layer_id == 0)
frame_sad_thresh = frame_sad_thresh << 1;
- if (force_split[5 + m2 + i] == PART_EVAL_ALL) {
- get_variance(&vt->split[m].split[i].part_variances.none);
- var_32x32 = vt->split[m].split[i].part_variances.none.variance;
- max_var_32x32[m] = AOMMAX(var_32x32, max_var_32x32[m]);
- min_var_32x32[m] = AOMMIN(var_32x32, min_var_32x32[m]);
- if (vt->split[m].split[i].part_variances.none.variance >
- thresholds[2] ||
- (!is_key_frame &&
- vt->split[m].split[i].part_variances.none.variance >
- (thresholds[2] >> 1) &&
- vt->split[m].split[i].part_variances.none.variance >
- (avg_16x16[m][i] >> 1))) {
- force_split[5 + m2 + i] = PART_EVAL_ONLY_SPLIT;
- force_split[m + 1] = PART_EVAL_ONLY_SPLIT;
+ if (force_split[5 + blk64_scale_idx + lvl1_idx] == PART_EVAL_ALL) {
+ get_variance(&vt->split[blk64_idx].split[lvl1_idx].part_variances.none);
+ var_32x32 =
+ vt->split[blk64_idx].split[lvl1_idx].part_variances.none.variance;
+ max_var_32x32[blk64_idx] = AOMMAX(var_32x32, max_var_32x32[blk64_idx]);
+ min_var_32x32[blk64_idx] = AOMMIN(var_32x32, min_var_32x32[blk64_idx]);
+ const int max_min_var_16X16_diff = (maxvar_16x16[blk64_idx][lvl1_idx] -
+ minvar_16x16[blk64_idx][lvl1_idx]);
+
+ if (var_32x32 > thresholds[2] ||
+ (!is_key_frame && var_32x32 > (thresholds[2] >> 1) &&
+ var_32x32 > (avg_16x16[blk64_idx][lvl1_idx] >> 1))) {
+ force_split[5 + blk64_scale_idx + lvl1_idx] = PART_EVAL_ONLY_SPLIT;
+ force_split[blk64_idx + 1] = PART_EVAL_ONLY_SPLIT;
force_split[0] = PART_EVAL_ONLY_SPLIT;
- } else if (!is_key_frame && (cm->width * cm->height <= 640 * 360) &&
- (((maxvar_16x16[m][i] - minvar_16x16[m][i]) >
- (thresholds[2] >> 1) &&
- maxvar_16x16[m][i] > thresholds[2]) ||
+ } else if (!is_key_frame && is_360p_or_smaller &&
+ ((max_min_var_16X16_diff > (thresholds[2] >> 1) &&
+ maxvar_16x16[blk64_idx][lvl1_idx] > thresholds[2]) ||
(cpi->sf.rt_sf.prefer_large_partition_blocks &&
x->content_state_sb.source_sad_nonrd > kLowSad &&
cpi->rc.frame_source_sad < frame_sad_thresh &&
- maxvar_16x16[m][i] > (thresholds[2] >> 4) &&
- maxvar_16x16[m][i] > (minvar_16x16[m][i] << 2)))) {
- force_split[5 + m2 + i] = PART_EVAL_ONLY_SPLIT;
- force_split[m + 1] = PART_EVAL_ONLY_SPLIT;
+ maxvar_16x16[blk64_idx][lvl1_idx] > (thresholds[2] >> 4) &&
+ maxvar_16x16[blk64_idx][lvl1_idx] >
+ (minvar_16x16[blk64_idx][lvl1_idx] << 2)))) {
+ force_split[5 + blk64_scale_idx + lvl1_idx] = PART_EVAL_ONLY_SPLIT;
+ force_split[blk64_idx + 1] = PART_EVAL_ONLY_SPLIT;
force_split[0] = PART_EVAL_ONLY_SPLIT;
}
}
}
- if (force_split[1 + m] == PART_EVAL_ALL) {
- fill_variance_tree(&vt->split[m], BLOCK_64X64);
- get_variance(&vt->split[m].part_variances.none);
- var_64x64 = vt->split[m].part_variances.none.variance;
+ if (force_split[1 + blk64_idx] == PART_EVAL_ALL) {
+ fill_variance_tree(&vt->split[blk64_idx], BLOCK_64X64);
+ get_variance(&vt->split[blk64_idx].part_variances.none);
+ var_64x64 = vt->split[blk64_idx].part_variances.none.variance;
max_var_64x64 = AOMMAX(var_64x64, max_var_64x64);
min_var_64x64 = AOMMIN(var_64x64, min_var_64x64);
      // If the difference of the max-min variances of the sub-blocks, or the
      // max variance of a sub-block, is above some threshold, then force this
      // block to split. Only check this for noise level >= medium, when the
      // encoder is in SVC, or if we already forced large blocks.
+ const int max_min_var_32x32_diff =
+ max_var_32x32[blk64_idx] - min_var_32x32[blk64_idx];
+ const int check_max_var = max_var_32x32[blk64_idx] > thresholds[1] >> 1;
+ const bool check_noise_lvl = noise_level >= kMedium ||
+ cpi->ppi->use_svc ||
+ cpi->sf.rt_sf.prefer_large_partition_blocks;
+ const int64_t set_threshold = 3 * (thresholds[1] >> 3);
- if (!is_key_frame &&
- (max_var_32x32[m] - min_var_32x32[m]) > 3 * (thresholds[1] >> 3) &&
- max_var_32x32[m] > thresholds[1] >> 1 &&
- (noise_level >= kMedium || cpi->ppi->use_svc ||
- cpi->sf.rt_sf.prefer_large_partition_blocks)) {
- force_split[1 + m] = PART_EVAL_ONLY_SPLIT;
+ if (!is_key_frame && max_min_var_32x32_diff > set_threshold &&
+ check_max_var && check_noise_lvl) {
+ force_split[1 + blk64_idx] = PART_EVAL_ONLY_SPLIT;
force_split[0] = PART_EVAL_ONLY_SPLIT;
}
avg_64x64 += var_64x64;
@@ -1567,8 +1800,8 @@
if (force_split[0] == PART_EVAL_ALL) {
fill_variance_tree(vt, BLOCK_128X128);
get_variance(&vt->part_variances.none);
- if (!is_key_frame &&
- vt->part_variances.none.variance > (9 * avg_64x64) >> 5)
+ const int set_avg_64x64 = (9 * avg_64x64) >> 5;
+ if (!is_key_frame && vt->part_variances.none.variance > set_avg_64x64)
force_split[0] = PART_EVAL_ONLY_SPLIT;
if (!is_key_frame &&
@@ -1580,51 +1813,51 @@
if (mi_col + 32 > tile->mi_col_end || mi_row + 32 > tile->mi_row_end ||
!set_vt_partitioning(cpi, xd, tile, vt, BLOCK_128X128, mi_row, mi_col,
thresholds[0], BLOCK_16X16, force_split[0])) {
- for (m = 0; m < num_64x64_blocks; ++m) {
- const int x64_idx = ((m & 1) << 4);
- const int y64_idx = ((m >> 1) << 4);
- const int m2 = m << 2;
+ for (int blk64_idx = 0; blk64_idx < num_64x64_blocks; ++blk64_idx) {
+ const int x64_idx = GET_BLK_IDX_X(blk64_idx, 4);
+ const int y64_idx = GET_BLK_IDX_Y(blk64_idx, 4);
+ const int blk64_scale_idx = blk64_idx << 2;
// Now go through the entire structure, splitting every block size until
// we get to one that's got a variance lower than our threshold.
- if (!set_vt_partitioning(cpi, xd, tile, &vt->split[m], BLOCK_64X64,
- mi_row + y64_idx, mi_col + x64_idx,
- thresholds[1], BLOCK_16X16,
- force_split[1 + m])) {
- for (i = 0; i < 4; ++i) {
- const int x32_idx = ((i & 1) << 3);
- const int y32_idx = ((i >> 1) << 3);
- const int i2 = (m2 + i) << 2;
- if (!set_vt_partitioning(cpi, xd, tile, &vt->split[m].split[i],
- BLOCK_32X32, (mi_row + y64_idx + y32_idx),
- (mi_col + x64_idx + x32_idx), thresholds[2],
- BLOCK_16X16, force_split[5 + m2 + i])) {
- for (j = 0; j < 4; ++j) {
- const int x16_idx = ((j & 1) << 2);
- const int y16_idx = ((j >> 1) << 2);
- const int split_index = 21 + i2 + j;
- // For inter frames: if variance4x4downsample[] == 1 for this
- // 16x16 block, then the variance is based on 4x4 down-sampling,
- // so use vt2 in set_vt_partioning(), otherwise use vt.
- VP16x16 *vtemp =
- (!is_key_frame && variance4x4downsample[i2 + j] == 1)
- ? &vt2[i2 + j]
- : &vt->split[m].split[i].split[j];
- if (!set_vt_partitioning(cpi, xd, tile, vtemp, BLOCK_16X16,
- mi_row + y64_idx + y32_idx + y16_idx,
- mi_col + x64_idx + x32_idx + x16_idx,
- thresholds[3], BLOCK_8X8,
- force_split[split_index])) {
- for (k = 0; k < 4; ++k) {
- const int x8_idx = (k & 1) << 1;
- const int y8_idx = (k >> 1) << 1;
- set_block_size(
- cpi, (mi_row + y64_idx + y32_idx + y16_idx + y8_idx),
- (mi_col + x64_idx + x32_idx + x16_idx + x8_idx),
- BLOCK_8X8);
- }
- }
- }
+ if (set_vt_partitioning(cpi, xd, tile, &vt->split[blk64_idx], BLOCK_64X64,
+ mi_row + y64_idx, mi_col + x64_idx, thresholds[1],
+ BLOCK_16X16, force_split[1 + blk64_idx]))
+ continue;
+ for (int lvl1_idx = 0; lvl1_idx < 4; ++lvl1_idx) {
+ const int x32_idx = GET_BLK_IDX_X(lvl1_idx, 3);
+ const int y32_idx = GET_BLK_IDX_Y(lvl1_idx, 3);
+ const int lvl1_scale_idx = (blk64_scale_idx + lvl1_idx) << 2;
+ if (set_vt_partitioning(
+ cpi, xd, tile, &vt->split[blk64_idx].split[lvl1_idx],
+ BLOCK_32X32, (mi_row + y64_idx + y32_idx),
+ (mi_col + x64_idx + x32_idx), thresholds[2], BLOCK_16X16,
+ force_split[5 + blk64_scale_idx + lvl1_idx]))
+ continue;
+ for (int lvl2_idx = 0; lvl2_idx < 4; ++lvl2_idx) {
+ const int x16_idx = GET_BLK_IDX_X(lvl2_idx, 2);
+ const int y16_idx = GET_BLK_IDX_Y(lvl2_idx, 2);
+ const int split_index = 21 + lvl1_scale_idx + lvl2_idx;
+ // For inter frames: if variance4x4downsample[] == 1 for this
+ // 16x16 block, then the variance is based on 4x4 down-sampling,
+        // so use vt2 in set_vt_partitioning(), otherwise use vt.
+ VP16x16 *vtemp =
+ (!is_key_frame &&
+ variance4x4downsample[lvl1_scale_idx + lvl2_idx] == 1)
+ ? &vt2[lvl1_scale_idx + lvl2_idx]
+ : &vt->split[blk64_idx].split[lvl1_idx].split[lvl2_idx];
+ if (set_vt_partitioning(cpi, xd, tile, vtemp, BLOCK_16X16,
+ mi_row + y64_idx + y32_idx + y16_idx,
+ mi_col + x64_idx + x32_idx + x16_idx,
+ thresholds[3], BLOCK_8X8,
+ force_split[split_index]))
+ continue;
+ for (int lvl3_idx = 0; lvl3_idx < 4; ++lvl3_idx) {
+ const int x8_idx = GET_BLK_IDX_X(lvl3_idx, 1);
+ const int y8_idx = GET_BLK_IDX_Y(lvl3_idx, 1);
+ set_block_size(cpi, (mi_row + y64_idx + y32_idx + y16_idx + y8_idx),
+ (mi_col + x64_idx + x32_idx + x16_idx + x8_idx),
+ BLOCK_8X8);
}
}
}
@@ -1633,7 +1866,7 @@
if (cpi->sf.rt_sf.short_circuit_low_temp_var) {
set_low_temp_var_flag(cpi, &x->part_search_info, xd, vt, thresholds,
- ref_frame_partition, mi_col, mi_row);
+ ref_frame_partition, mi_col, mi_row, is_small_sb);
}
if (vt2) aom_free(vt2);
diff --git a/av1/encoder/var_based_part.h b/av1/encoder/var_based_part.h
index 7febc0e..f912458 100644
--- a/av1/encoder/var_based_part.h
+++ b/av1/encoder/var_based_part.h
@@ -20,6 +20,10 @@
#include "av1/encoder/encoder.h"
+// Calculate the x and y block indices from the split level and index.
+#define GET_BLK_IDX_X(idx, level) (((idx) & (0x01)) << (level))
+#define GET_BLK_IDX_Y(idx, level) (((idx) >> (0x01)) << (level))
+
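
As a quick sanity check of the two macros: at split level 4 (the 64x64 sub-blocks of a 128x128 superblock, offsets in mi units), indices 0-3 map to (0,0), (16,0), (0,16), (16,16). A standalone sketch:

#include <stdio.h>

#define GET_BLK_IDX_X(idx, level) (((idx) & (0x01)) << (level))
#define GET_BLK_IDX_Y(idx, level) (((idx) >> (0x01)) << (level))

int main(void) {
  for (int idx = 0; idx < 4; ++idx)
    printf("idx %d -> (%d, %d)\n", idx, GET_BLK_IDX_X(idx, 4),
           GET_BLK_IDX_Y(idx, 4));
  return 0;
}
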
#ifdef __cplusplus
extern "C" {
#endif
diff --git a/av1/encoder/x86/av1_fwd_txfm2d_avx2.c b/av1/encoder/x86/av1_fwd_txfm2d_avx2.c
index b898fc6..b143df3 100644
--- a/av1/encoder/x86/av1_fwd_txfm2d_avx2.c
+++ b/av1/encoder/x86/av1_fwd_txfm2d_avx2.c
@@ -1430,34 +1430,15 @@
}
}
-static INLINE void transpose_32_8x8_avx2(int stride, const __m256i *inputA,
- __m256i *output) {
- __m256i temp0 = _mm256_unpacklo_epi32(inputA[0], inputA[2]);
- __m256i temp1 = _mm256_unpackhi_epi32(inputA[0], inputA[2]);
- __m256i temp2 = _mm256_unpacklo_epi32(inputA[1], inputA[3]);
- __m256i temp3 = _mm256_unpackhi_epi32(inputA[1], inputA[3]);
- __m256i temp4 = _mm256_unpacklo_epi32(inputA[4], inputA[6]);
- __m256i temp5 = _mm256_unpackhi_epi32(inputA[4], inputA[6]);
- __m256i temp6 = _mm256_unpacklo_epi32(inputA[5], inputA[7]);
- __m256i temp7 = _mm256_unpackhi_epi32(inputA[5], inputA[7]);
-
- __m256i t0 = _mm256_unpacklo_epi32(temp0, temp2);
- __m256i t1 = _mm256_unpackhi_epi32(temp0, temp2);
- __m256i t2 = _mm256_unpacklo_epi32(temp1, temp3);
- __m256i t3 = _mm256_unpackhi_epi32(temp1, temp3);
- __m256i t4 = _mm256_unpacklo_epi32(temp4, temp6);
- __m256i t5 = _mm256_unpackhi_epi32(temp4, temp6);
- __m256i t6 = _mm256_unpacklo_epi32(temp5, temp7);
- __m256i t7 = _mm256_unpackhi_epi32(temp5, temp7);
-
- output[0 * stride] = _mm256_permute2x128_si256(t0, t4, 0x20);
- output[1 * stride] = _mm256_permute2x128_si256(t1, t5, 0x20);
- output[2 * stride] = _mm256_permute2x128_si256(t2, t6, 0x20);
- output[3 * stride] = _mm256_permute2x128_si256(t3, t7, 0x20);
- output[4 * stride] = _mm256_permute2x128_si256(t0, t4, 0x31);
- output[5 * stride] = _mm256_permute2x128_si256(t1, t5, 0x31);
- output[6 * stride] = _mm256_permute2x128_si256(t2, t6, 0x31);
- output[7 * stride] = _mm256_permute2x128_si256(t3, t7, 0x31);
+static INLINE void store_output_32bit_w16(int32_t *const out,
+ const __m256i *const in1,
+ const __m256i *const in2,
+ const int stride,
+ const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ _mm256_store_si256((__m256i *)(out + stride * i), in1[i]);
+ _mm256_store_si256((__m256i *)(out + stride * i + 8), in2[i]);
+ }
}
// Store 8 16-bit values. Sign extend the values.
@@ -1779,83 +1760,14 @@
out[7] = _mm256_extractf128_si256(c3, 1);
}
-static INLINE void transpose_16bit_and_store_8x8(const __m128i *const in,
- int32_t *output) {
- // in[0]: 00 01 02 03 04 05 06 07
- // in[1]: 10 11 12 13 14 15 16 17
- // in[2]: 20 21 22 23 24 25 26 27
- // in[3]: 30 31 32 33 34 35 36 37
- // in[4]: 40 41 42 43 44 45 46 47
- // in[5]: 50 51 52 53 54 55 56 57
- // in[6]: 60 61 62 63 64 65 66 67
- // in[7]: 70 71 72 73 74 75 76 77
- // to:
- // s04: 00 01 02 03 04 05 06 07 | 40 41 42 43 44 45 46 47
- // s15: 10 11 12 13 14 15 16 17 | 50 51 52 53 54 55 56 57
- // s26: 20 21 22 23 24 25 26 27 | 60 61 62 63 64 65 66 67
- // s37: 30 31 32 33 34 35 36 37 | 70 71 72 73 74 75 76 77
- const __m256i s04 =
- _mm256_insertf128_si256(_mm256_castsi128_si256(in[0]), in[4], 0x1);
- const __m256i s15 =
- _mm256_insertf128_si256(_mm256_castsi128_si256(in[1]), in[5], 0x1);
- const __m256i s26 =
- _mm256_insertf128_si256(_mm256_castsi128_si256(in[2]), in[6], 0x1);
- const __m256i s37 =
- _mm256_insertf128_si256(_mm256_castsi128_si256(in[3]), in[7], 0x1);
-
- // a0: 00 10 01 11 02 12 03 13 | 40 50 41 51 42 52 43 53
- // a1: 04 14 05 15 06 16 07 17 | 44 54 45 55 46 56 47 57
- // a2: 20 30 21 31 22 32 23 33 | 60 70 61 71 62 72 63 73
- // a3: 24 34 25 35 26 36 27 37 | 64 74 65 75 66 76 67 77
- const __m256i a0 = _mm256_unpacklo_epi16(s04, s15);
- const __m256i a1 = _mm256_unpackhi_epi16(s04, s15);
- const __m256i a2 = _mm256_unpacklo_epi16(s26, s37);
- const __m256i a3 = _mm256_unpackhi_epi16(s26, s37);
-
- // Unpack 32 bit elements resulting in:
- // b0: 00 10 20 30 01 11 21 31 | 40 50 60 70 41 51 61 71
- // b1: 02 12 22 32 03 13 23 33 | 42 52 62 72 43 53 63 73
- // b2: 04 14 24 34 05 15 25 35 | 44 54 64 74 45 55 65 75
- // b2: 06 16 26 36 07 17 27 37 | 46 56 66 76 47 57 67 77
- const __m256i b0 = _mm256_unpacklo_epi32(a0, a2);
- const __m256i b1 = _mm256_unpackhi_epi32(a0, a2);
- const __m256i b2 = _mm256_unpacklo_epi32(a1, a3);
- const __m256i b3 = _mm256_unpackhi_epi32(a1, a3);
-
- // 00 10 20 30 40 50 60 70
- // 01 11 21 31 41 51 61 71
- // 02 12 22 32 42 52 62 72
- // 03 13 23 33 43 53 63 73
- // 04 14 24 34 44 54 64 74
- // 05 15 25 35 45 55 65 75
- // 06 16 26 36 46 56 66 76
- // 07 17 27 37 47 57 67 77
- const __m256i a_lo = _mm256_unpacklo_epi16(b0, b0);
- const __m256i a_hi = _mm256_unpackhi_epi16(b0, b0);
- const __m256i b_lo = _mm256_unpacklo_epi16(b1, b1);
- const __m256i b_hi = _mm256_unpackhi_epi16(b1, b1);
- const __m256i c_lo = _mm256_unpacklo_epi16(b2, b2);
- const __m256i c_hi = _mm256_unpackhi_epi16(b2, b2);
- const __m256i d_lo = _mm256_unpacklo_epi16(b3, b3);
- const __m256i d_hi = _mm256_unpackhi_epi16(b3, b3);
-
- const __m256i a_1 = _mm256_srai_epi32(a_lo, 16);
- const __m256i a_2 = _mm256_srai_epi32(a_hi, 16);
- const __m256i a_3 = _mm256_srai_epi32(b_lo, 16);
- const __m256i a_4 = _mm256_srai_epi32(b_hi, 16);
- const __m256i a_5 = _mm256_srai_epi32(c_lo, 16);
- const __m256i a_6 = _mm256_srai_epi32(c_hi, 16);
- const __m256i a_7 = _mm256_srai_epi32(d_lo, 16);
- const __m256i a_8 = _mm256_srai_epi32(d_hi, 16);
-
- _mm256_store_si256((__m256i *)output, a_1);
- _mm256_store_si256((__m256i *)(output + 8), a_2);
- _mm256_store_si256((__m256i *)(output + 16), a_3);
- _mm256_store_si256((__m256i *)(output + 24), a_4);
- _mm256_store_si256((__m256i *)(output + 32), a_5);
- _mm256_store_si256((__m256i *)(output + 40), a_6);
- _mm256_store_si256((__m256i *)(output + 48), a_7);
- _mm256_store_si256((__m256i *)(output + 56), a_8);
+static INLINE void store_buffer_16bit_to_32bit_w8_avx2(const __m128i *const in,
+ int32_t *const out,
+ const int stride,
+ const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ _mm256_store_si256((__m256i *)(out + i * stride),
+ _mm256_cvtepi16_epi32(in[i]));
+ }
}
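
The helper relies on _mm256_cvtepi16_epi32, which sign-extends eight 16-bit lanes to 32 bits before the store. A scalar model for reference, assuming (as in the AVX2 signature above) that the last two parameters are the row stride and row count:

#include <stdint.h>

static void store_16bit_to_32bit_w8_scalar(const int16_t in[][8], int32_t *out,
                                           int stride, int out_size) {
  for (int i = 0; i < out_size; ++i)
    for (int j = 0; j < 8; ++j)
      out[i * stride + j] = (int32_t)in[i][j];  // sign extension
}
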
static void av1_lowbd_fwd_txfm2d_8x8_avx2(const int16_t *input, int32_t *output,
@@ -1897,7 +1809,7 @@
  // The round and shift operation is avoided here as the shift bit is assumed
  // to always be zero.
assert(shift[2] == 0);
- transpose_16bit_and_store_8x8(buf, output);
+ store_buffer_16bit_to_32bit_w8_avx2(buf, output, 8, 8);
}
static void lowbd_fwd_txfm2d_16x16_avx2(const int16_t *input, int32_t *output,
@@ -1937,8 +1849,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- transpose_16bit_16x16_avx2(buf, buf);
- store_buffer_16bit_to_32bit_w16_avx2(buf, output + 16 * width * i, width, 16);
+ store_buffer_16bit_to_32bit_w16_avx2(buf, output + i * 16, height, width);
}
static void lowbd_fwd_txfm2d_32x32_avx2(const int16_t *input, int32_t *output,
@@ -1983,12 +1894,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- transpose_16bit_16x16_avx2(buf, buf);
- store_buffer_16bit_to_32bit_w16_avx2(buf, output + 16 * width * i, width,
- 16);
- transpose_16bit_16x16_avx2(buf + 16, buf + 16);
- store_buffer_16bit_to_32bit_w16_avx2(buf + 16, output + 16 * width * i + 16,
- width, 16);
+ store_buffer_16bit_to_32bit_w16_avx2(buf, output + i * 16, height, width);
}
}
@@ -2032,13 +1938,7 @@
fdct64_new_avx2(bufB, bufB, cos_bit_row);
round_shift_array_32_avx2(bufA, bufA, 32, -shift[2]);
round_shift_array_32_avx2(bufB, bufB, 32, -shift[2]);
-
- int32_t *output8 = output + 16 * 32 * i;
- for (int j = 0; j < 4; ++j) {
- __m256i *out = (__m256i *)(output8 + 8 * j);
- transpose_32_8x8_avx2(4, bufA + 8 * j, out);
- transpose_32_8x8_avx2(4, bufB + 8 * j, out + 8 * 4);
- }
+ store_output_32bit_w16(output + i * 16, bufA, bufB, 32, 32);
}
}
@@ -2081,9 +1981,8 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- transpose_16bit_16x16_avx2(buf, buf);
- store_rect_buffer_16bit_to_32bit_w16_avx2(buf, output + 16 * width * i,
- width, 16);
+ store_rect_buffer_16bit_to_32bit_w16_avx2(buf, output + i * 16, height,
+ width);
}
}
@@ -2126,11 +2025,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- transpose_16bit_16x16_avx2(buf, buf);
- store_rect_buffer_16bit_to_32bit_w16_avx2(buf, output, width, 16);
-
- transpose_16bit_16x16_avx2(buf + 16, buf + 16);
- store_rect_buffer_16bit_to_32bit_w16_avx2(buf + 16, output + 16, width, 16);
+ store_rect_buffer_16bit_to_32bit_w16_avx2(buf, output, height, width);
}
static void lowbd_fwd_txfm2d_64x32_avx2(const int16_t *input, int32_t *output,
@@ -2172,12 +2067,7 @@
round_shift_rect_array_32_avx2(bufA, bufA, 32, -shift[2], NewSqrt2);
round_shift_rect_array_32_avx2(bufB, bufB, 32, -shift[2], NewSqrt2);
- int32_t *output8 = output + 16 * 32 * i;
- for (int j = 0; j < 4; ++j) {
- __m256i *out = (__m256i *)(output8 + 8 * j);
- transpose_32_8x8_avx2(4, bufA + 8 * j, out);
- transpose_32_8x8_avx2(4, bufB + 8 * j, out + 8 * 4);
- }
+ store_output_32bit_w16(output + i * 16, bufA, bufB, 32, 32);
}
}
@@ -2222,12 +2112,7 @@
round_shift_rect_array_32_avx2(bufA, bufA, 32, -shift[2], NewSqrt2);
round_shift_rect_array_32_avx2(bufB, bufB, 32, -shift[2], NewSqrt2);
- int32_t *output8 = output + 16 * 32 * i;
- for (int j = 0; j < 4; ++j) {
- __m256i *out = (__m256i *)(output8 + 8 * j);
- transpose_32_8x8_avx2(4, bufA + 8 * j, out);
- transpose_32_8x8_avx2(4, bufB + 8 * j, out + 8 * 4);
- }
+ store_output_32bit_w16(output + i * 16, bufA, bufB, 32, 32);
}
}
@@ -2260,19 +2145,12 @@
}
}
- for (int i = 0; i < AOMMIN(4, height_div16); i++) {
+ for (int i = 0; i < AOMMIN(2, height_div16); i++) {
__m256i *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- int32_t *output16 = output + 16 * width * i;
- for (int j = 0; j < width_div16; ++j) {
- __m256i *buf16 = buf + 16 * j;
- transpose_16bit_16x16_avx2(buf16, buf16);
- store_buffer_16bit_to_32bit_w16_avx2(buf16, output16 + 16 * j, width, 16);
- }
+ store_buffer_16bit_to_32bit_w16_avx2(buf, output + width * i, 32, width);
}
- // Zero out the bottom 16x32 area.
- memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
static void lowbd_fwd_txfm2d_64x16_avx2(const int16_t *input, int32_t *output,
@@ -2308,13 +2186,10 @@
__m256i *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit_w16_avx2(buf, width, shift[2]);
- int32_t *output16 = output + 16 * 32 * i;
- for (int j = 0; j < 2; ++j) {
- __m256i *buf16 = buf + 16 * j;
- transpose_16bit_16x16_avx2(buf16, buf16);
- store_buffer_16bit_to_32bit_w16_avx2(buf16, output16 + 16 * j, 32, 16);
- }
+ store_buffer_16bit_to_32bit_w16_avx2(buf, output + 16 * i, 16, 32);
}
+ // Zero out the bottom 16x32 area.
+ memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
static INLINE void btf_16_avx2(__m256i *w0, __m256i *w1, __m256i *in0,
@@ -3054,8 +2929,7 @@
pack_reg(bufl, bufu, buf2);
row_txfm(buf2, buf2, cos_bit_row);
round_shift_16bit_w16_avx2(buf2, width, shift[2]);
- transpose_16bit_16x8_avx2(buf2, buf2);
- store_rect_buffer_16bit_to_32bit_w8_avx2(buf2, output, width, 8);
+ store_rect_buffer_16bit_to_32bit_w16_avx2(buf2, output, height, width);
}
static void lowbd_fwd_txfm2d_16x8_avx2(const int16_t *input, int32_t *output,
@@ -3099,10 +2973,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output, width, height);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8, width, height);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
static FwdTxfm2dFunc fwd_txfm2d_func_ls[TX_SIZES_ALL] = {
diff --git a/av1/encoder/x86/av1_fwd_txfm2d_sse4.c b/av1/encoder/x86/av1_fwd_txfm2d_sse4.c
index db554c4..825da8d 100644
--- a/av1/encoder/x86/av1_fwd_txfm2d_sse4.c
+++ b/av1/encoder/x86/av1_fwd_txfm2d_sse4.c
@@ -29,6 +29,16 @@
}
}
+static INLINE void store_output_32bit_w8(int32_t *const out,
+ const __m128i *const in1,
+ const __m128i *const in2,
+ const int stride, const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ _mm_store_si128((__m128i *)(out + stride * i), in1[i]);
+ _mm_store_si128((__m128i *)(out + stride * i + 4), in2[i]);
+ }
+}
+
typedef void (*TxfmFuncSSE2)(__m128i *input, __m128i *output,
const int8_t cos_bit, const int8_t *stage_range);
@@ -65,9 +75,9 @@
static INLINE TxfmFuncSSE2 fwd_txfm_type_to_func(TXFM_TYPE txfm_type) {
switch (txfm_type) {
- case TXFM_TYPE_DCT32: return fdct32_sse4_1; break;
- case TXFM_TYPE_DCT64: return fdct64_new_sse4_1; break;
- case TXFM_TYPE_IDENTITY32: return idtx32x32_sse4_1; break;
+ case TXFM_TYPE_DCT32: return fdct32_sse4_1;
+ case TXFM_TYPE_DCT64: return fdct64_new_sse4_1;
+ case TXFM_TYPE_IDENTITY32: return idtx32x32_sse4_1;
default: assert(0);
}
return NULL;
@@ -104,8 +114,7 @@
av1_round_shift_array_32_sse4_1(buf_128, out_128, txfm2d_size_128, -shift[1]);
transpose_32(txfm_size, out_128, buf_128);
txfm_func_row(buf_128, out_128, cos_bit_row, stage_range_row);
- av1_round_shift_array_32_sse4_1(out_128, buf_128, txfm2d_size_128, -shift[2]);
- transpose_32(txfm_size, buf_128, out_128);
+ av1_round_shift_array_32_sse4_1(out_128, out_128, txfm2d_size_128, -shift[2]);
}
static INLINE void fwd_txfm2d_64x64_sse4_1(const int16_t *input,
@@ -140,8 +149,7 @@
}
txfm2d_size_128 = (col_num >> 1) * (txfm_size >> 1);
- av1_round_shift_array_32_sse4_1(out_128, buf_128, txfm2d_size_128, -shift[2]);
- transpose_8nx8n(buf_128, out_128, 32, 32);
+ av1_round_shift_array_32_sse4_1(out_128, out_128, txfm2d_size_128, -shift[2]);
}
void av1_fwd_txfm2d_32x32_sse4_1(const int16_t *input, int32_t *output,
@@ -162,29 +170,6 @@
fwd_txfm2d_64x64_sse4_1(input, output, stride, &cfg, txfm_buf);
}
-static INLINE void transpose_32_4x4x2(int stride, const __m128i *inputA,
- const __m128i *inputB, __m128i *output) {
- __m128i temp0 = _mm_unpacklo_epi32(inputA[0], inputA[2]);
- __m128i temp1 = _mm_unpackhi_epi32(inputA[0], inputA[2]);
- __m128i temp2 = _mm_unpacklo_epi32(inputA[1], inputA[3]);
- __m128i temp3 = _mm_unpackhi_epi32(inputA[1], inputA[3]);
-
- output[0 * stride] = _mm_unpacklo_epi32(temp0, temp2);
- output[1 * stride] = _mm_unpackhi_epi32(temp0, temp2);
- output[2 * stride] = _mm_unpacklo_epi32(temp1, temp3);
- output[3 * stride] = _mm_unpackhi_epi32(temp1, temp3);
-
- temp0 = _mm_unpacklo_epi32(inputB[0], inputB[2]);
- temp1 = _mm_unpackhi_epi32(inputB[0], inputB[2]);
- temp2 = _mm_unpacklo_epi32(inputB[1], inputB[3]);
- temp3 = _mm_unpackhi_epi32(inputB[1], inputB[3]);
-
- output[4 * stride] = _mm_unpacklo_epi32(temp0, temp2);
- output[5 * stride] = _mm_unpackhi_epi32(temp0, temp2);
- output[6 * stride] = _mm_unpacklo_epi32(temp1, temp3);
- output[7 * stride] = _mm_unpackhi_epi32(temp1, temp3);
-}
-
static void lowbd_fwd_txfm2d_64x64_sse4_1(const int16_t *input, int32_t *output,
int stride, TX_TYPE tx_type, int bd) {
(void)bd;
@@ -225,11 +210,7 @@
av1_round_shift_array_32_sse4_1(bufA, bufA, 32, -shift[2]);
av1_round_shift_array_32_sse4_1(bufB, bufB, 32, -shift[2]);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < width_div8; ++j) {
- __m128i *out = (__m128i *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
@@ -272,11 +253,7 @@
av1_round_shift_rect_array_32_sse4_1(bufA, bufA, 32, -shift[2], NewSqrt2);
av1_round_shift_rect_array_32_sse4_1(bufB, bufB, 32, -shift[2], NewSqrt2);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < width_div8; ++j) {
- __m128i *out = (__m128i *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
@@ -321,11 +298,7 @@
av1_round_shift_rect_array_32_sse4_1(bufA, bufA, 32, -shift[2], NewSqrt2);
av1_round_shift_rect_array_32_sse4_1(bufB, bufB, 32, -shift[2], NewSqrt2);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < (32 / 4); ++j) {
- __m128i *out = (__m128i *)(output8 + 4 * j);
- transpose_32_4x4x2(8, bufA + 4 * j, bufB + 4 * j, out);
- }
+ store_output_32bit_w8(output + i * 8, bufA, bufB, 32, 32);
}
}
diff --git a/av1/encoder/x86/av1_fwd_txfm_sse2.c b/av1/encoder/x86/av1_fwd_txfm_sse2.c
index 748ef4d..a4def75 100644
--- a/av1/encoder/x86/av1_fwd_txfm_sse2.c
+++ b/av1/encoder/x86/av1_fwd_txfm_sse2.c
@@ -2006,8 +2006,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_4x4(buf, buf);
- store_buffer_16bit_to_32bit_w4(buf, output, width, height);
+ store_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_4x8_sse2(const int16_t *input, int32_t *output,
@@ -2045,8 +2044,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x4(buf, buf);
- store_rect_buffer_16bit_to_32bit_w4(buf, output, width, height);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_4x16_sse2(const int16_t *input, int32_t *output,
@@ -2086,8 +2084,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x4(buf, buf);
- store_buffer_16bit_to_32bit_w4(buf, output + 8 * width * i, width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2124,8 +2121,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output, width, height);
+ store_rect_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_8x8_sse2(const int16_t *input, int32_t *output,
@@ -2161,8 +2157,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output, width, height);
+ store_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
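
One pattern covers all of the forward-transform hunks in this file (and in the AVX2 file above): the explicit transpose_16bit_* steps are dropped and the store helpers are called with the last two arguments swapped from (width, height) to (height, width), folding the transpose into the store. A hypothetical scalar model of the net effect, not a libaom function:

#include <stdint.h>

// buf holds the row-transform output in row-major order (height rows of
// width values); storing with stride = height leaves out[] transposed.
static void store_transposed_sketch(const int16_t *buf, int width, int height,
                                    int32_t *out) {
  for (int r = 0; r < height; ++r)
    for (int c = 0; c < width; ++c)
      out[c * height + r] = (int32_t)buf[r * width + c];
}
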
void av1_lowbd_fwd_txfm2d_8x16_sse2(const int16_t *input, int32_t *output,
@@ -2202,8 +2197,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2246,8 +2240,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2288,10 +2281,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_4x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output, width, height);
- transpose_16bit_4x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8, width, height);
+ store_buffer_16bit_to_32bit_w4(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_16x8_sse2(const int16_t *input, int32_t *output,
@@ -2331,10 +2321,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output, width, height);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8, width, height);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output, height, width);
}
void av1_lowbd_fwd_txfm2d_16x16_sse2(const int16_t *input, int32_t *output,
@@ -2376,11 +2363,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
}
@@ -2427,12 +2410,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8,
- width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_16x32_c(input, output, stride, tx_type, bd);
@@ -2479,18 +2457,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- height);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- height);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, height);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, height);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x16_c(input, output, stride, tx_type, bd);
@@ -2538,18 +2505,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width,
- 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_rect_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8,
- width, 8);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_rect_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, 8);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_rect_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, 8);
+ store_rect_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x16_c(input, output, stride, tx_type, bd);
@@ -2599,17 +2555,7 @@
}
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- transpose_16bit_8x8(buf, buf);
- store_buffer_16bit_to_32bit_w8(buf, output + 8 * width * i, width, 8);
- transpose_16bit_8x8(buf + 8, buf + 8);
- store_buffer_16bit_to_32bit_w8(buf + 8, output + 8 * width * i + 8, width,
- 8);
- transpose_16bit_8x8(buf + 16, buf + 16);
- store_buffer_16bit_to_32bit_w8(buf + 16, output + 8 * width * i + 16,
- width, 8);
- transpose_16bit_8x8(buf + 24, buf + 24);
- store_buffer_16bit_to_32bit_w8(buf + 24, output + 8 * width * i + 24,
- width, 8);
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, height, width);
}
} else {
av1_fwd_txfm2d_32x32_c(input, output, stride, tx_type, bd);
@@ -2649,13 +2595,10 @@
__m128i *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- int32_t *output8 = output + 8 * 32 * i;
- for (int j = 0; j < 4; ++j) {
- __m128i *buf8 = buf + 8 * j;
- transpose_16bit_8x8(buf8, buf8);
- store_buffer_16bit_to_32bit_w8(buf8, output8 + 8 * j, 32, 8);
- }
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, 16, 32);
}
+ // Zero out the bottom 16x32 area.
+ memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
void av1_lowbd_fwd_txfm2d_16x64_sse2(const int16_t *input, int32_t *output,
@@ -2691,15 +2634,8 @@
__m128i *buf = buf1 + width * i;
row_txfm(buf, buf, cos_bit_row);
round_shift_16bit(buf, width, shift[2]);
- int32_t *output8 = output + 8 * width * i;
- for (int j = 0; j < width_div8; ++j) {
- __m128i *buf8 = buf + 8 * j;
- transpose_16bit_8x8(buf8, buf8);
- store_buffer_16bit_to_32bit_w8(buf8, output8 + 8 * j, width, 8);
- }
+ store_buffer_16bit_to_32bit_w8(buf, output + 8 * i, 32, 16);
}
- // Zero out the bottom 16x32 area.
- memset(output + 16 * 32, 0, 16 * 32 * sizeof(*output));
}
static FwdTxfm2dFunc fwd_txfm2d_func_ls[TX_SIZES_ALL] = {
diff --git a/av1/encoder/x86/av1_k_means_avx2.c b/av1/encoder/x86/av1_k_means_avx2.c
index a2db222..ad0b374 100644
--- a/av1/encoder/x86/av1_k_means_avx2.c
+++ b/av1/encoder/x86/av1_k_means_avx2.c
@@ -26,39 +26,44 @@
void av1_calc_indices_dim1_avx2(const int16_t *data, const int16_t *centroids,
uint8_t *indices, int64_t *total_dist, int n,
int k) {
- __m256i dist[PALETTE_MAX_SIZE];
const __m256i v_zero = _mm256_setzero_si256();
__m256i sum = _mm256_setzero_si256();
+ __m256i cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ cents[j] = _mm256_set1_epi16(centroids[j]);
+ }
for (int i = 0; i < n; i += 16) {
- __m256i ind = _mm256_loadu_si256((__m256i *)data);
- for (int j = 0; j < k; j++) {
- __m256i cent = _mm256_set1_epi16(centroids[j]);
- __m256i d1 = _mm256_sub_epi16(ind, cent);
- dist[j] = _mm256_abs_epi16(d1);
- }
+ const __m256i in = _mm256_loadu_si256((__m256i *)data);
+ __m256i ind = _mm256_setzero_si256();
+ // Compute the distance to the first centroid.
+ __m256i d1 = _mm256_sub_epi16(in, cents[0]);
+ __m256i dist_min = _mm256_abs_epi16(d1);
- ind = _mm256_setzero_si256();
- for (int j = 1; j < k; j++) {
- __m256i cmp = _mm256_cmpgt_epi16(dist[0], dist[j]);
- dist[0] = _mm256_min_epi16(dist[0], dist[j]);
- __m256i ind1 = _mm256_set1_epi16(j);
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ d1 = _mm256_sub_epi16(in, cents[j]);
+ const __m256i dist = _mm256_abs_epi16(d1);
+ // Compare to the minimal one.
+ const __m256i cmp = _mm256_cmpgt_epi16(dist_min, dist);
+ dist_min = _mm256_min_epi16(dist_min, dist);
+ const __m256i ind1 = _mm256_set1_epi16(j);
ind = _mm256_or_si256(_mm256_andnot_si256(cmp, ind),
_mm256_and_si256(cmp, ind1));
}
- __m256i p1 = _mm256_packus_epi16(ind, v_zero);
- __m256i px = _mm256_permute4x64_epi64(p1, 0x58);
- __m128i d1 = _mm256_extracti128_si256(px, 0);
+ const __m256i p1 = _mm256_packus_epi16(ind, v_zero);
+ const __m256i px = _mm256_permute4x64_epi64(p1, 0x58);
+ const __m128i d2 = _mm256_extracti128_si256(px, 0);
- _mm_storeu_si128((__m128i *)indices, d1);
+ _mm_storeu_si128((__m128i *)indices, d2);
if (total_dist) {
// Square, convert to 32 bit and add together.
- dist[0] = _mm256_madd_epi16(dist[0], dist[0]);
+ dist_min = _mm256_madd_epi16(dist_min, dist_min);
// Convert to 64 bit and add to sum.
- const __m256i dist1 = _mm256_unpacklo_epi32(dist[0], v_zero);
- const __m256i dist2 = _mm256_unpackhi_epi32(dist[0], v_zero);
+ const __m256i dist1 = _mm256_unpacklo_epi32(dist_min, v_zero);
+ const __m256i dist2 = _mm256_unpackhi_epi32(dist_min, v_zero);
sum = _mm256_add_epi64(sum, dist1);
sum = _mm256_add_epi64(sum, dist2);
}
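
For contrast with the vector loop above, a hedged scalar sketch of the dim1 computation it implements: the nearest 1-D centroid by absolute distance, with the strict '>' comparison keeping the lowest index on ties, and squared minimum distances accumulated into *total_dist.

#include <stdint.h>
#include <stdlib.h>

static void calc_indices_dim1_scalar(const int16_t *data,
                                     const int16_t *centroids,
                                     uint8_t *indices, int64_t *total_dist,
                                     int n, int k) {
  int64_t sum = 0;
  for (int i = 0; i < n; ++i) {
    int best = 0;
    int dist_min = abs(data[i] - centroids[0]);
    for (int j = 1; j < k; ++j) {
      const int dist = abs(data[i] - centroids[j]);
      if (dist_min > dist) {  // as in _mm256_cmpgt_epi16(dist_min, dist)
        dist_min = dist;
        best = j;
      }
    }
    indices[i] = (uint8_t)best;
    sum += (int64_t)dist_min * dist_min;
  }
  if (total_dist) *total_dist = sum;
}
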
@@ -74,46 +79,52 @@
void av1_calc_indices_dim2_avx2(const int16_t *data, const int16_t *centroids,
uint8_t *indices, int64_t *total_dist, int n,
int k) {
- __m256i dist[PALETTE_MAX_SIZE];
const __m256i v_zero = _mm256_setzero_si256();
+ const __m256i permute = _mm256_set_epi32(0, 0, 0, 0, 5, 1, 4, 0);
__m256i sum = _mm256_setzero_si256();
+ __m256i ind[2];
+ __m256i cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ const int16_t cx = centroids[2 * j], cy = centroids[2 * j + 1];
+ cents[j] = _mm256_set_epi16(cy, cx, cy, cx, cy, cx, cy, cx, cy, cx, cy, cx,
+ cy, cx, cy, cx);
+ }
- for (int i = 0; i < n; i += 8) {
- __m256i ind = _mm256_loadu_si256((__m256i *)data);
- for (int j = 0; j < k; j++) {
- const int16_t cx = centroids[2 * j], cy = centroids[2 * j + 1];
- const __m256i cent = _mm256_set_epi16(cy, cx, cy, cx, cy, cx, cy, cx, cy,
- cx, cy, cx, cy, cx, cy, cx);
- const __m256i d1 = _mm256_sub_epi16(ind, cent);
- dist[j] = _mm256_madd_epi16(d1, d1);
+ for (int i = 0; i < n; i += 16) {
+ for (int l = 0; l < 2; ++l) {
+ const __m256i in = _mm256_loadu_si256((__m256i *)data);
+ ind[l] = _mm256_setzero_si256();
+ // Compute the distance to the first centroid.
+ __m256i d1 = _mm256_sub_epi16(in, cents[0]);
+ __m256i dist_min = _mm256_madd_epi16(d1, d1);
+
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ d1 = _mm256_sub_epi16(in, cents[j]);
+ const __m256i dist = _mm256_madd_epi16(d1, d1);
+ // Compare to the minimal one.
+ const __m256i cmp = _mm256_cmpgt_epi32(dist_min, dist);
+ dist_min = _mm256_min_epi32(dist_min, dist);
+ const __m256i ind1 = _mm256_set1_epi32(j);
+ ind[l] = _mm256_or_si256(_mm256_andnot_si256(cmp, ind[l]),
+ _mm256_and_si256(cmp, ind1));
+ }
+ if (total_dist) {
+ // Convert to 64 bit and add to sum.
+ const __m256i dist1 = _mm256_unpacklo_epi32(dist_min, v_zero);
+ const __m256i dist2 = _mm256_unpackhi_epi32(dist_min, v_zero);
+ sum = _mm256_add_epi64(sum, dist1);
+ sum = _mm256_add_epi64(sum, dist2);
+ }
+ data += 16;
}
-
- ind = _mm256_setzero_si256();
- for (int j = 1; j < k; j++) {
- __m256i cmp = _mm256_cmpgt_epi32(dist[0], dist[j]);
- dist[0] = _mm256_min_epi32(dist[0], dist[j]);
- const __m256i ind1 = _mm256_set1_epi32(j);
- ind = _mm256_or_si256(_mm256_andnot_si256(cmp, ind),
- _mm256_and_si256(cmp, ind1));
- }
-
- __m256i p1 = _mm256_packus_epi32(ind, v_zero);
- __m256i px = _mm256_permute4x64_epi64(p1, 0x58);
- __m256i p2 = _mm256_packus_epi16(px, v_zero);
- __m128i d1 = _mm256_extracti128_si256(p2, 0);
-
- _mm_storel_epi64((__m128i *)indices, d1);
-
- if (total_dist) {
- // Convert to 64 bit and add to sum.
- const __m256i dist1 = _mm256_unpacklo_epi32(dist[0], v_zero);
- const __m256i dist2 = _mm256_unpackhi_epi32(dist[0], v_zero);
- sum = _mm256_add_epi64(sum, dist1);
- sum = _mm256_add_epi64(sum, dist2);
- }
-
- indices += 8;
- data += 16;
+ // Cast to 8 bit and store.
+ const __m256i d2 = _mm256_packus_epi32(ind[0], ind[1]);
+ const __m256i d3 = _mm256_packus_epi16(d2, v_zero);
+ const __m256i d4 = _mm256_permutevar8x32_epi32(d3, permute);
+ const __m128i d5 = _mm256_extracti128_si256(d4, 0);
+ _mm_storeu_si128((__m128i *)indices, d5);
+ indices += 16;
}
if (total_dist) {
*total_dist = k_means_horizontal_sum_avx2(sum);
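
And the matching scalar sketch for the dim2 variant, where each sample is an (x, y) pair and the distance is squared Euclidean, as in the madd-based loop above (illustrative only; assumes pixel-range inputs so dx * dx + dy * dy fits in 32 bits, as the vector code does):

#include <stdint.h>

static void calc_indices_dim2_scalar(const int16_t *data,
                                     const int16_t *centroids,
                                     uint8_t *indices, int64_t *total_dist,
                                     int n, int k) {
  int64_t sum = 0;
  for (int i = 0; i < n; ++i) {
    int best = 0;
    int32_t dist_min = INT32_MAX;
    for (int j = 0; j < k; ++j) {
      const int32_t dx = data[2 * i] - centroids[2 * j];
      const int32_t dy = data[2 * i + 1] - centroids[2 * j + 1];
      const int32_t dist = dx * dx + dy * dy;
      if (dist < dist_min) {  // strict '<' keeps the lowest index on ties
        dist_min = dist;
        best = j;
      }
    }
    indices[i] = (uint8_t)best;
    sum += dist_min;
  }
  if (total_dist) *total_dist = sum;
}
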
diff --git a/av1/encoder/x86/av1_k_means_sse2.c b/av1/encoder/x86/av1_k_means_sse2.c
index a284fa9..4338bf7 100644
--- a/av1/encoder/x86/av1_k_means_sse2.c
+++ b/av1/encoder/x86/av1_k_means_sse2.c
@@ -26,31 +26,37 @@
uint8_t *indices, int64_t *total_dist, int n,
int k) {
const __m128i v_zero = _mm_setzero_si128();
- __m128i dist[PALETTE_MAX_SIZE];
__m128i sum = _mm_setzero_si128();
+ __m128i cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ cents[j] = _mm_set1_epi16(centroids[j]);
+ }
for (int i = 0; i < n; i += 8) {
- __m128i in = _mm_loadu_si128((__m128i *)data);
- for (int j = 0; j < k; j++) {
- __m128i cent = _mm_set1_epi16(centroids[j]);
- __m128i d1 = _mm_sub_epi16(in, cent);
- __m128i d2 = _mm_sub_epi16(cent, in);
- dist[j] = _mm_max_epi16(d1, d2);
- }
-
+ const __m128i in = _mm_loadu_si128((__m128i *)data);
__m128i ind = _mm_setzero_si128();
- for (int j = 1; j < k; j++) {
- __m128i cmp = _mm_cmpgt_epi16(dist[0], dist[j]);
- dist[0] = _mm_min_epi16(dist[0], dist[j]);
- __m128i ind1 = _mm_set1_epi16(j);
+ // Compute the distance to the first centroid.
+ __m128i d1 = _mm_sub_epi16(in, cents[0]);
+ __m128i d2 = _mm_sub_epi16(cents[0], in);
+ __m128i dist_min = _mm_max_epi16(d1, d2);
+
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ d1 = _mm_sub_epi16(in, cents[j]);
+ d2 = _mm_sub_epi16(cents[j], in);
+ const __m128i dist = _mm_max_epi16(d1, d2);
+ // Compare to the minimal one.
+ const __m128i cmp = _mm_cmpgt_epi16(dist_min, dist);
+ dist_min = _mm_min_epi16(dist_min, dist);
+ const __m128i ind1 = _mm_set1_epi16(j);
ind = _mm_or_si128(_mm_andnot_si128(cmp, ind), _mm_and_si128(cmp, ind1));
}
if (total_dist) {
// Square, convert to 32 bit and add together.
- dist[0] = _mm_madd_epi16(dist[0], dist[0]);
+ dist_min = _mm_madd_epi16(dist_min, dist_min);
// Convert to 64 bit and add to sum.
- const __m128i dist1 = _mm_unpacklo_epi32(dist[0], v_zero);
- const __m128i dist2 = _mm_unpackhi_epi32(dist[0], v_zero);
+ const __m128i dist1 = _mm_unpacklo_epi32(dist_min, v_zero);
+ const __m128i dist2 = _mm_unpackhi_epi32(dist_min, v_zero);
sum = _mm_add_epi64(sum, dist1);
sum = _mm_add_epi64(sum, dist2);
}
@@ -68,45 +74,49 @@
uint8_t *indices, int64_t *total_dist, int n,
int k) {
const __m128i v_zero = _mm_setzero_si128();
- int l = 1;
- __m128i dist[PALETTE_MAX_SIZE];
- __m128i ind[2];
__m128i sum = _mm_setzero_si128();
+ __m128i ind[2];
+ __m128i cents[PALETTE_MAX_SIZE];
+ for (int j = 0; j < k; ++j) {
+ const int16_t cx = centroids[2 * j], cy = centroids[2 * j + 1];
+ cents[j] = _mm_set_epi16(cy, cx, cy, cx, cy, cx, cy, cx);
+ }
- for (int i = 0; i < n; i += 4) {
- l = (l == 0) ? 1 : 0;
- __m128i ind1 = _mm_loadu_si128((__m128i *)data);
- for (int j = 0; j < k; j++) {
- const int16_t cx = centroids[2 * j], cy = centroids[2 * j + 1];
- const __m128i cent = _mm_set_epi16(cy, cx, cy, cx, cy, cx, cy, cx);
- const __m128i d1 = _mm_sub_epi16(ind1, cent);
- dist[j] = _mm_madd_epi16(d1, d1);
- }
+ for (int i = 0; i < n; i += 8) {
+ for (int l = 0; l < 2; ++l) {
+ const __m128i in = _mm_loadu_si128((__m128i *)data);
+ ind[l] = _mm_setzero_si128();
+ // Compute the distance to the first centroid.
+ __m128i d1 = _mm_sub_epi16(in, cents[0]);
+ __m128i dist_min = _mm_madd_epi16(d1, d1);
- ind[l] = _mm_setzero_si128();
- for (int j = 1; j < k; j++) {
- __m128i cmp = _mm_cmpgt_epi32(dist[0], dist[j]);
- __m128i dist1 = _mm_andnot_si128(cmp, dist[0]);
- __m128i dist2 = _mm_and_si128(cmp, dist[j]);
- dist[0] = _mm_or_si128(dist1, dist2);
- ind1 = _mm_set1_epi32(j);
- ind[l] =
- _mm_or_si128(_mm_andnot_si128(cmp, ind[l]), _mm_and_si128(cmp, ind1));
+ for (int j = 1; j < k; ++j) {
+ // Compute the distance to the centroid.
+ d1 = _mm_sub_epi16(in, cents[j]);
+ const __m128i dist = _mm_madd_epi16(d1, d1);
+ // Compare to the minimal one.
+ const __m128i cmp = _mm_cmpgt_epi32(dist_min, dist);
+ const __m128i dist1 = _mm_andnot_si128(cmp, dist_min);
+ const __m128i dist2 = _mm_and_si128(cmp, dist);
+ dist_min = _mm_or_si128(dist1, dist2);
+ const __m128i ind1 = _mm_set1_epi32(j);
+ ind[l] = _mm_or_si128(_mm_andnot_si128(cmp, ind[l]),
+ _mm_and_si128(cmp, ind1));
+ }
+ if (total_dist) {
+ // Convert to 64 bit and add to sum.
+ const __m128i dist1 = _mm_unpacklo_epi32(dist_min, v_zero);
+ const __m128i dist2 = _mm_unpackhi_epi32(dist_min, v_zero);
+ sum = _mm_add_epi64(sum, dist1);
+ sum = _mm_add_epi64(sum, dist2);
+ }
+ data += 8;
}
- ind[l] = _mm_packus_epi16(ind[l], v_zero);
- if (total_dist) {
- // Convert to 64 bit and add to sum.
- const __m128i dist1 = _mm_unpacklo_epi32(dist[0], v_zero);
- const __m128i dist2 = _mm_unpackhi_epi32(dist[0], v_zero);
- sum = _mm_add_epi64(sum, dist1);
- sum = _mm_add_epi64(sum, dist2);
- }
- if (l == 1) {
- __m128i p2 = _mm_packus_epi16(_mm_unpacklo_epi64(ind[0], ind[1]), v_zero);
- _mm_storel_epi64((__m128i *)indices, p2);
- indices += 8;
- }
- data += 8;
+ // Cast to 8 bit and store.
+ const __m128i d2 = _mm_packus_epi16(ind[0], ind[1]);
+ const __m128i d3 = _mm_packus_epi16(d2, v_zero);
+ _mm_storel_epi64((__m128i *)indices, d3);
+ indices += 8;
}
if (total_dist) {
*total_dist = k_means_horizontal_sum_sse2(sum);
diff --git a/av1/encoder/x86/encodetxb_avx2.c b/av1/encoder/x86/encodetxb_avx2.c
index 30a4129..9627f75 100644
--- a/av1/encoder/x86/encodetxb_avx2.c
+++ b/av1/encoder/x86/encodetxb_avx2.c
@@ -23,11 +23,11 @@
void av1_txb_init_levels_avx2(const tran_low_t *const coeff, const int width,
const int height, uint8_t *const levels) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
const __m256i y_zeros = _mm256_setzero_si256();
const int32_t bottom_len = sizeof(*levels) * (TX_PAD_BOTTOM * stride);
- uint8_t *bottom_buf_end = levels + (height + TX_PAD_BOTTOM) * stride;
+ uint8_t *bottom_buf_end = levels + (width + TX_PAD_BOTTOM) * stride;
uint8_t *bottom_buf = bottom_buf_end - ((bottom_len + 31) & (~31));
do {
@@ -38,7 +38,7 @@
int i = 0;
uint8_t *ls = levels;
const tran_low_t *cf = coeff;
- if (width == 4) {
+ if (height == 4) {
do {
const __m256i c0 = yy_loadu_256(cf);
const __m256i c1 = yy_loadu_256(cf + 8);
@@ -50,8 +50,8 @@
ls += 32;
cf += 16;
i += 4;
- } while (i < height);
- } else if (width == 8) {
+ } while (i < width);
+ } else if (height == 8) {
do {
const __m256i coeffA = yy_loadu_256(cf);
const __m256i coeffB = yy_loadu_256(cf + 8);
@@ -67,18 +67,18 @@
const __m128i res0 = _mm256_castsi256_si128(res);
const __m128i res1 = _mm256_extracti128_si256(res, 1);
xx_storel_64(ls, res0);
- *(int32_t *)(ls + width) = 0;
+ *(int32_t *)(ls + height) = 0;
xx_storel_64(ls + stride, _mm_srli_si128(res0, 8));
- *(int32_t *)(ls + width + stride) = 0;
+ *(int32_t *)(ls + height + stride) = 0;
xx_storel_64(ls + stride * 2, res1);
- *(int32_t *)(ls + width + stride * 2) = 0;
+ *(int32_t *)(ls + height + stride * 2) = 0;
xx_storel_64(ls + stride * 3, _mm_srli_si128(res1, 8));
- *(int32_t *)(ls + width + stride * 3) = 0;
+ *(int32_t *)(ls + height + stride * 3) = 0;
cf += 32;
ls += stride << 2;
i += 4;
- } while (i < height);
- } else if (width == 16) {
+ } while (i < width);
+ } else if (height == 16) {
do {
const __m256i coeffA = yy_loadu_256(cf);
const __m256i coeffB = yy_loadu_256(cf + 8);
@@ -94,11 +94,11 @@
xx_storeu_128(ls, _mm256_castsi256_si128(res));
xx_storeu_128(ls + stride, _mm256_extracti128_si256(res, 1));
cf += 32;
- *(int32_t *)(ls + width) = 0;
- *(int32_t *)(ls + stride + width) = 0;
+ *(int32_t *)(ls + height) = 0;
+ *(int32_t *)(ls + stride + height) = 0;
ls += stride << 1;
i += 2;
- } while (i < height);
+ } while (i < width);
} else {
do {
const __m256i coeffA = yy_loadu_256(cf);
@@ -114,9 +114,9 @@
const __m256i res = _mm256_shuffle_epi32(res_, 0xd8);
yy_storeu_256(ls, res);
cf += 32;
- *(int32_t *)(ls + width) = 0;
+ *(int32_t *)(ls + height) = 0;
ls += stride;
i += 1;
- } while (i < height);
+ } while (i < width);
}
}
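
The width/height swap in this function follows the transposed coefficient order introduced by the transform changes: the fast dimension of the levels buffer is now `height` plus TX_PAD_HOR of padding, and TX_PAD_BOTTOM zeroed rows follow the `width` data rows. A hedged scalar model of that layout; the padding constants and the 127 clamp are assumptions for illustration, not restated from txb_common.h:

#include <stdint.h>
#include <string.h>

static void txb_init_levels_sketch(const int32_t *coeff, int width, int height,
                                   uint8_t *levels) {
  const int tx_pad_hor = 4, tx_pad_bottom = 4;  // illustrative values
  const int stride = height + tx_pad_hor;
  uint8_t *ls = levels;
  for (int i = 0; i < width; ++i) {
    for (int j = 0; j < height; ++j) {
      const int32_t c = *coeff++;
      const int32_t a = c < 0 ? -c : c;
      *ls++ = (uint8_t)(a > 127 ? 127 : a);  // saturate to the level range
    }
    for (int j = 0; j < tx_pad_hor; ++j) *ls++ = 0;  // horizontal padding
  }
  memset(ls, 0, (size_t)(tx_pad_bottom * stride));  // bottom padding rows
}
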
diff --git a/av1/encoder/x86/encodetxb_sse2.c b/av1/encoder/x86/encodetxb_sse2.c
index 394befb..d23a688 100644
--- a/av1/encoder/x86/encodetxb_sse2.c
+++ b/av1/encoder/x86/encodetxb_sse2.c
@@ -70,22 +70,22 @@
}
static INLINE void get_4_nz_map_contexts_2d(const uint8_t *levels,
- const int height,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *const coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
const __m128i pos_to_offset_large = _mm_set1_epi8(21);
__m128i pos_to_offset =
- (height == 4)
+ (width == 4)
? _mm_setr_epi8(0, 1, 6, 6, 1, 6, 6, 21, 6, 6, 21, 21, 6, 21, 21, 21)
- : _mm_setr_epi8(0, 11, 11, 11, 11, 11, 11, 11, 6, 6, 21, 21, 6, 21,
+ : _mm_setr_epi8(0, 16, 16, 16, 16, 16, 16, 16, 6, 6, 21, 21, 6, 21,
21, 21);
__m128i count;
__m128i level[5];
int8_t *cc = coeff_contexts;
- int row = height;
+ int col = width;
- assert(!(height % 4));
+ assert(!(width % 4));
do {
load_levels_4x4x5_sse2(levels, stride, offsets, level);
@@ -95,14 +95,14 @@
pos_to_offset = pos_to_offset_large;
levels += 4 * stride;
cc += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
coeff_contexts[0] = 0;
}
-static INLINE void get_4_nz_map_contexts_hor(const uint8_t *levels,
- const int height,
+static INLINE void get_4_nz_map_contexts_ver(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
@@ -117,9 +117,9 @@
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10);
__m128i count;
__m128i level[5];
- int row = height;
+ int col = width;
- assert(!(height % 4));
+ assert(!(width % 4));
do {
load_levels_4x4x5_sse2(levels, stride, offsets, level);
@@ -128,12 +128,12 @@
_mm_store_si128((__m128i *)coeff_contexts, count);
levels += 4 * stride;
coeff_contexts += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
}
-static INLINE void get_4_nz_map_contexts_ver(const uint8_t *levels,
- const int height,
+static INLINE void get_4_nz_map_contexts_hor(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
const int stride = 4 + TX_PAD_HOR;
@@ -149,9 +149,9 @@
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10);
__m128i count;
__m128i level[5];
- int row = height;
+ int col = width;
- assert(!(height % 4));
+ assert(!(width % 4));
do {
load_levels_4x4x5_sse2(levels, stride, offsets, level);
@@ -161,36 +161,36 @@
pos_to_offset = pos_to_offset_large;
levels += 4 * stride;
coeff_contexts += 16;
- row -= 4;
- } while (row);
+ col -= 4;
+ } while (col);
}
static INLINE void get_8_coeff_contexts_2d(const uint8_t *levels,
- const int height,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
int8_t *cc = coeff_contexts;
- int row = height;
+ int col = width;
__m128i count;
__m128i level[5];
__m128i pos_to_offset[3];
- assert(!(height % 2));
+ assert(!(width % 2));
- if (height == 8) {
+ if (width == 8) {
pos_to_offset[0] =
_mm_setr_epi8(0, 1, 6, 6, 21, 21, 21, 21, 1, 6, 6, 21, 21, 21, 21, 21);
pos_to_offset[1] = _mm_setr_epi8(6, 6, 21, 21, 21, 21, 21, 21, 6, 21, 21,
21, 21, 21, 21, 21);
- } else if (height < 8) {
- pos_to_offset[0] = _mm_setr_epi8(0, 16, 6, 6, 21, 21, 21, 21, 16, 16, 6, 21,
+ } else if (width < 8) {
+ pos_to_offset[0] = _mm_setr_epi8(0, 11, 6, 6, 21, 21, 21, 21, 11, 11, 6, 21,
21, 21, 21, 21);
- pos_to_offset[1] = _mm_setr_epi8(16, 16, 21, 21, 21, 21, 21, 21, 16, 16, 21,
+ pos_to_offset[1] = _mm_setr_epi8(11, 11, 21, 21, 21, 21, 21, 21, 11, 11, 21,
21, 21, 21, 21, 21);
} else {
- pos_to_offset[0] = _mm_setr_epi8(0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
- 11, 11, 11, 11, 11);
+ pos_to_offset[0] = _mm_setr_epi8(0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+ 16, 16, 16, 16, 16);
pos_to_offset[1] = _mm_setr_epi8(6, 6, 21, 21, 21, 21, 21, 21, 6, 21, 21,
21, 21, 21, 21, 21);
}
@@ -205,14 +205,14 @@
pos_to_offset[1] = pos_to_offset[2];
levels += 2 * stride;
cc += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
coeff_contexts[0] = 0;
}
-static INLINE void get_8_coeff_contexts_hor(const uint8_t *levels,
- const int height,
+static INLINE void get_8_coeff_contexts_ver(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
@@ -225,11 +225,11 @@
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10);
- int row = height;
+ int col = width;
__m128i count;
__m128i level[5];
- assert(!(height % 2));
+ assert(!(width % 2));
do {
load_levels_8x2x5_sse2(levels, stride, offsets, level);
@@ -238,12 +238,12 @@
_mm_store_si128((__m128i *)coeff_contexts, count);
levels += 2 * stride;
coeff_contexts += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
}
-static INLINE void get_8_coeff_contexts_ver(const uint8_t *levels,
- const int height,
+static INLINE void get_8_coeff_contexts_hor(const uint8_t *levels,
+ const int width,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
const int stride = 8 + TX_PAD_HOR;
@@ -257,11 +257,11 @@
SIG_COEF_CONTEXTS_2D + 5, SIG_COEF_CONTEXTS_2D + 5,
SIG_COEF_CONTEXTS_2D + 5, SIG_COEF_CONTEXTS_2D + 5,
SIG_COEF_CONTEXTS_2D + 5, SIG_COEF_CONTEXTS_2D + 5);
- int row = height;
+ int col = width;
__m128i count;
__m128i level[5];
- assert(!(height % 2));
+ assert(!(width % 2));
do {
load_levels_8x2x5_sse2(levels, stride, offsets, level);
@@ -271,8 +271,8 @@
pos_to_offset = pos_to_offset_large;
levels += 2 * stride;
coeff_contexts += 16;
- row -= 2;
- } while (row);
+ col -= 2;
+ } while (col);
}
static INLINE void get_16n_coeff_contexts_2d(const uint8_t *levels,
@@ -281,15 +281,15 @@
const int width, const int height,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
int8_t *cc = coeff_contexts;
- int row = height;
+ int col = width;
__m128i pos_to_offset[5];
__m128i pos_to_offset_large[3];
__m128i count;
__m128i level[5];
- assert(!(width % 16));
+ assert(!(height % 16));
pos_to_offset_large[2] = _mm_set1_epi8(21);
if (real_width == real_height) {
@@ -303,27 +303,27 @@
21, 21, 21, 21, 21);
pos_to_offset[4] = pos_to_offset_large[0] = pos_to_offset_large[1] =
pos_to_offset_large[2];
- } else if (real_width > real_height) {
- pos_to_offset[0] = _mm_setr_epi8(0, 16, 6, 6, 21, 21, 21, 21, 21, 21, 21,
+ } else if (real_width < real_height) {
+ pos_to_offset[0] = _mm_setr_epi8(0, 11, 6, 6, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21);
- pos_to_offset[1] = _mm_setr_epi8(16, 16, 6, 21, 21, 21, 21, 21, 21, 21, 21,
+ pos_to_offset[1] = _mm_setr_epi8(11, 11, 6, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21);
pos_to_offset[2] = pos_to_offset[3] = pos_to_offset[4] = _mm_setr_epi8(
- 16, 16, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21);
+ 11, 11, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21);
pos_to_offset_large[0] = pos_to_offset_large[1] = pos_to_offset_large[2];
- } else { // real_width < real_height
+ } else { // real_width > real_height
pos_to_offset[0] = pos_to_offset[1] = _mm_setr_epi8(
- 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11);
+ 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16);
pos_to_offset[2] = _mm_setr_epi8(6, 6, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21);
pos_to_offset[3] = _mm_setr_epi8(6, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 21, 21, 21);
pos_to_offset[4] = pos_to_offset_large[2];
- pos_to_offset_large[0] = pos_to_offset_large[1] = _mm_set1_epi8(11);
+ pos_to_offset_large[0] = pos_to_offset_large[1] = _mm_set1_epi8(16);
}
do {
- int w = width;
+ int h = height;
do {
load_levels_16x1x5_sse2(levels, stride, offsets, level);
@@ -332,9 +332,9 @@
_mm_store_si128((__m128i *)cc, count);
levels += 16;
cc += 16;
- w -= 16;
+ h -= 16;
pos_to_offset[0] = pos_to_offset_large[0];
- } while (w);
+ } while (h);
pos_to_offset[0] = pos_to_offset[1];
pos_to_offset[1] = pos_to_offset[2];
@@ -343,16 +343,16 @@
pos_to_offset_large[0] = pos_to_offset_large[1];
pos_to_offset_large[1] = pos_to_offset_large[2];
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
coeff_contexts[0] = 0;
}
-static INLINE void get_16n_coeff_contexts_hor(const uint8_t *levels,
+static INLINE void get_16n_coeff_contexts_ver(const uint8_t *levels,
const int width, const int height,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
const __m128i pos_to_offset_large =
_mm_setr_epi8(SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
@@ -364,9 +364,9 @@
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10);
__m128i count;
__m128i level[5];
- int row = height;
+ int col = width;
- assert(!(width % 16));
+ assert(!(height % 16));
do {
__m128i pos_to_offset =
@@ -378,7 +378,7 @@
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10,
SIG_COEF_CONTEXTS_2D + 10, SIG_COEF_CONTEXTS_2D + 10);
- int w = width;
+ int h = height;
do {
load_levels_16x1x5_sse2(levels, stride, offsets, level);
@@ -388,31 +388,31 @@
pos_to_offset = pos_to_offset_large;
levels += 16;
coeff_contexts += 16;
- w -= 16;
- } while (w);
+ h -= 16;
+ } while (h);
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
}
-static INLINE void get_16n_coeff_contexts_ver(const uint8_t *levels,
+static INLINE void get_16n_coeff_contexts_hor(const uint8_t *levels,
const int width, const int height,
const ptrdiff_t *const offsets,
int8_t *coeff_contexts) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
__m128i pos_to_offset[3];
__m128i count;
__m128i level[5];
- int row = height;
+ int col = width;
- assert(!(width % 16));
+ assert(!(height % 16));
pos_to_offset[0] = _mm_set1_epi8(SIG_COEF_CONTEXTS_2D + 0);
pos_to_offset[1] = _mm_set1_epi8(SIG_COEF_CONTEXTS_2D + 5);
pos_to_offset[2] = _mm_set1_epi8(SIG_COEF_CONTEXTS_2D + 10);
do {
- int w = width;
+ int h = height;
do {
load_levels_16x1x5_sse2(levels, stride, offsets, level);
@@ -421,13 +421,13 @@
_mm_store_si128((__m128i *)coeff_contexts, count);
levels += 16;
coeff_contexts += 16;
- w -= 16;
- } while (w);
+ h -= 16;
+ } while (h);
pos_to_offset[0] = pos_to_offset[1];
pos_to_offset[1] = pos_to_offset[2];
levels += TX_PAD_HOR;
- } while (--row);
+ } while (--col);
}
// Note: levels[] must be in the range [0, 127], inclusive.
@@ -446,7 +446,7 @@
const int real_height = tx_size_high[tx_size];
const int width = get_txb_wide(tx_size);
const int height = get_txb_high(tx_size);
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
ptrdiff_t offsets[3];
/* coeff_contexts must be 16 byte aligned. */
@@ -457,11 +457,11 @@
offsets[1] = 1 * stride + 1;
offsets[2] = 2 * stride + 0;
- if (width == 4) {
- get_4_nz_map_contexts_2d(levels, height, offsets, coeff_contexts);
- } else if (width == 8) {
- get_8_coeff_contexts_2d(levels, height, offsets, coeff_contexts);
- } else if (width == 16) {
+ if (height == 4) {
+ get_4_nz_map_contexts_2d(levels, width, offsets, coeff_contexts);
+ } else if (height == 8) {
+ get_8_coeff_contexts_2d(levels, width, offsets, coeff_contexts);
+ } else if (height == 16) {
get_16n_coeff_contexts_2d(levels, real_width, real_height, width, height,
offsets, coeff_contexts);
} else {
@@ -469,36 +469,36 @@
offsets, coeff_contexts);
}
} else if (tx_class == TX_CLASS_HORIZ) {
- offsets[0] = 2;
- offsets[1] = 3;
- offsets[2] = 4;
- if (width == 4) {
- get_4_nz_map_contexts_hor(levels, height, offsets, coeff_contexts);
- } else if (width == 8) {
- get_8_coeff_contexts_hor(levels, height, offsets, coeff_contexts);
+ offsets[0] = 2 * stride;
+ offsets[1] = 3 * stride;
+ offsets[2] = 4 * stride;
+ if (height == 4) {
+ get_4_nz_map_contexts_hor(levels, width, offsets, coeff_contexts);
+ } else if (height == 8) {
+ get_8_coeff_contexts_hor(levels, width, offsets, coeff_contexts);
} else {
get_16n_coeff_contexts_hor(levels, width, height, offsets,
coeff_contexts);
}
} else { // TX_CLASS_VERT
- offsets[0] = 2 * stride;
- offsets[1] = 3 * stride;
- offsets[2] = 4 * stride;
- if (width == 4) {
- get_4_nz_map_contexts_ver(levels, height, offsets, coeff_contexts);
- } else if (width == 8) {
- get_8_coeff_contexts_ver(levels, height, offsets, coeff_contexts);
+ offsets[0] = 2;
+ offsets[1] = 3;
+ offsets[2] = 4;
+ if (height == 4) {
+ get_4_nz_map_contexts_ver(levels, width, offsets, coeff_contexts);
+ } else if (height == 8) {
+ get_8_coeff_contexts_ver(levels, width, offsets, coeff_contexts);
} else {
get_16n_coeff_contexts_ver(levels, width, height, offsets,
coeff_contexts);
}
}
- const int bwl = get_txb_bwl(tx_size);
+ const int bhl = get_txb_bhl(tx_size);
const int pos = scan[last_idx];
- if (last_idx <= (height << bwl) / 8)
+ if (last_idx <= (width << bhl) / 8)
coeff_contexts[pos] = 1;
- else if (last_idx <= (height << bwl) / 4)
+ else if (last_idx <= (width << bhl) / 4)
coeff_contexts[pos] = 2;
else
coeff_contexts[pos] = 3;
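
The tail of the function above assigns a special context to the last nonzero coefficient based on how early it falls in scan order. A scalar sketch, under the assumption that get_txb_bhl() returns the log2 block height so that width << bhl is the coefficient count:

#include <stdint.h>

// Earlier last positions get smaller contexts.
static int8_t last_coeff_context(int last_idx, int width, int bhl) {
  const int n = width << bhl;       // total coefficients in the block
  if (last_idx <= n / 8) return 1;  // last coeff within the first eighth
  if (last_idx <= n / 4) return 2;  // within the first quarter
  return 3;                         // anywhere later in scan order
}
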
diff --git a/av1/encoder/x86/encodetxb_sse4.c b/av1/encoder/x86/encodetxb_sse4.c
index aeb57f2..72bd8e3 100644
--- a/av1/encoder/x86/encodetxb_sse4.c
+++ b/av1/encoder/x86/encodetxb_sse4.c
@@ -20,11 +20,11 @@
void av1_txb_init_levels_sse4_1(const tran_low_t *const coeff, const int width,
const int height, uint8_t *const levels) {
- const int stride = width + TX_PAD_HOR;
+ const int stride = height + TX_PAD_HOR;
const __m128i zeros = _mm_setzero_si128();
const int32_t bottom_len = sizeof(*levels) * (TX_PAD_BOTTOM * stride);
- uint8_t *bottom_buf = levels + stride * height;
+ uint8_t *bottom_buf = levels + stride * width;
uint8_t *bottom_buf_end = bottom_buf + bottom_len;
do {
_mm_storeu_si128((__m128i *)(bottom_buf), zeros);
@@ -34,7 +34,7 @@
int i = 0;
uint8_t *ls = levels;
const tran_low_t *cf = coeff;
- if (width == 4) {
+ if (height == 4) {
do {
const __m128i coeffA = xx_loadu_128(cf);
const __m128i coeffB = xx_loadu_128(cf + 4);
@@ -44,10 +44,10 @@
const __m128i lsAB = _mm_unpacklo_epi32(absAB8, zeros);
xx_storeu_128(ls, lsAB);
ls += (stride << 1);
- cf += (width << 1);
+ cf += (height << 1);
i += 2;
- } while (i < height);
- } else if (width == 8) {
+ } while (i < width);
+ } else if (height == 8) {
do {
const __m128i coeffA = xx_loadu_128(cf);
const __m128i coeffB = xx_loadu_128(cf + 4);
@@ -56,9 +56,9 @@
const __m128i absAB8 = _mm_packs_epi16(absAB, zeros);
xx_storeu_128(ls, absAB8);
ls += stride;
- cf += width;
+ cf += height;
i += 1;
- } while (i < height);
+ } while (i < width);
} else {
do {
int j = 0;
@@ -75,10 +75,10 @@
xx_storeu_128(ls + j, absABCD);
j += 16;
cf += 16;
- } while (j < width);
- *(int32_t *)(ls + width) = 0;
+ } while (j < height);
+ *(int32_t *)(ls + height) = 0;
ls += stride;
i += 1;
- } while (i < height);
+ } while (i < width);
}
}
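
The per-row narrowing in the SSE4.1 kernel above (32-bit coefficients to absolute values clipped to [0, 127]) can be sketched in isolation as follows; this is an illustration, not the library function:

#include <smmintrin.h>  // SSE4.1 (also provides the SSSE3 _mm_abs_epi16)
#include <stdint.h>

// Eight 32-bit coefficients to absolute values clipped to [0, 127].
// _mm_packs_epi16 saturates signed, and the inputs are non-negative after
// _mm_abs_epi16, so the output bytes land in [0, 127].
static void abs_clip_8_coeffs(const int32_t *coeff, uint8_t *dst) {
  const __m128i zero = _mm_setzero_si128();
  const __m128i a = _mm_loadu_si128((const __m128i *)coeff);
  const __m128i b = _mm_loadu_si128((const __m128i *)(coeff + 4));
  const __m128i ab16 = _mm_packs_epi32(a, b);
  const __m128i abs16 = _mm_abs_epi16(ab16);
  const __m128i out8 = _mm_packs_epi16(abs16, zero);
  _mm_storel_epi64((__m128i *)dst, out8);
}
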
diff --git a/av1/encoder/x86/error_intrin_sse2.c b/av1/encoder/x86/error_intrin_sse2.c
index e876db1..61f65c6 100644
--- a/av1/encoder/x86/error_intrin_sse2.c
+++ b/av1/encoder/x86/error_intrin_sse2.c
@@ -65,11 +65,11 @@
accum = reduce_sum_epi64(accum);
// Store the results.
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
return _mm_cvtsi128_si64(accum);
#else
int64_t result;
_mm_storel_epi64((__m128i *)&result, accum);
return result;
-#endif // ARCH_X86_64
+#endif // AOM_ARCH_X86_64
}
diff --git a/av1/encoder/x86/error_sse2.asm b/av1/encoder/x86/error_sse2.asm
index f4b4968..6407c10 100644
--- a/av1/encoder/x86/error_sse2.asm
+++ b/av1/encoder/x86/error_sse2.asm
@@ -75,7 +75,7 @@
movhlps m7, m6
paddq m4, m5
paddq m6, m7
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
movq rax, m4
movq [sszq], m6
%else
diff --git a/av1/encoder/x86/highbd_fwd_txfm_avx2.c b/av1/encoder/x86/highbd_fwd_txfm_avx2.c
index 1faa412..9cdf21f 100644
--- a/av1/encoder/x86/highbd_fwd_txfm_avx2.c
+++ b/av1/encoder/x86/highbd_fwd_txfm_avx2.c
@@ -561,8 +561,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fdct8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case ADST_DCT:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
@@ -572,8 +571,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fdct8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case DCT_ADST:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
@@ -583,8 +581,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case ADST_ADST:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
@@ -594,8 +591,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case FLIPADST_DCT:
load_buffer_8x8_avx2(input, in, stride, 1, 0, shift[0]);
@@ -605,8 +601,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fdct8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case DCT_FLIPADST:
load_buffer_8x8_avx2(input, in, stride, 0, 1, shift[0]);
@@ -616,8 +611,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case FLIPADST_FLIPADST:
load_buffer_8x8_avx2(input, in, stride, 1, 1, shift[0]);
@@ -627,8 +621,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case ADST_FLIPADST:
load_buffer_8x8_avx2(input, in, stride, 0, 1, shift[0]);
@@ -638,8 +631,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case FLIPADST_ADST:
load_buffer_8x8_avx2(input, in, stride, 1, 0, shift[0]);
@@ -649,26 +641,27 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case IDTX:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
idtx8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
col_txfm_8x8_rounding(out, -shift[1]);
- idtx8_avx2(out, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
+ idtx8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case V_DCT:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
fdct8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
col_txfm_8x8_rounding(out, -shift[1]);
- idtx8_avx2(out, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
+ idtx8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case H_DCT:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
@@ -678,17 +671,17 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fdct8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case V_ADST:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
fadst8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
col_txfm_8x8_rounding(out, -shift[1]);
- idtx8_avx2(out, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
+ idtx8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case H_ADST:
load_buffer_8x8_avx2(input, in, stride, 0, 0, shift[0]);
@@ -698,17 +691,17 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case V_FLIPADST:
load_buffer_8x8_avx2(input, in, stride, 1, 0, shift[0]);
fadst8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
col_txfm_8x8_rounding(out, -shift[1]);
- idtx8_avx2(out, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
+ idtx8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
case H_FLIPADST:
load_buffer_8x8_avx2(input, in, stride, 0, 1, shift[0]);
@@ -718,8 +711,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
fadst8_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_8x8_avx2(out, in, width_div8, width_div8);
- store_buffer_avx2(in, coeff, 8, 8);
+ store_buffer_avx2(out, coeff, 8, 8);
break;
default: assert(0);
}
@@ -1333,9 +1325,7 @@
fwd_txfm_transpose_8x8_avx2(out, in, 1, 2);
fwd_txfm_transpose_8x8_avx2(&out[8], &in[1], 1, 2);
row_txfm(in, out, bit, 2, 2);
- fwd_txfm_transpose_8x8_avx2(out, in, 2, 1);
- fwd_txfm_transpose_8x8_avx2(&out[1], &in[8], 2, 1);
- round_shift_rect_array_32_avx2(in, in, 16, -shift[2], NewSqrt2);
+ round_shift_rect_array_32_avx2(out, in, 16, -shift[2], NewSqrt2);
store_buffer_avx2(in, coeff, 8, 16);
(void)bd;
}
@@ -1394,10 +1384,8 @@
fwd_txfm_transpose_8x8_avx2(out, in, 2, 1);
fwd_txfm_transpose_8x8_avx2(&out[1], &in[8], 2, 1);
row_txfm(in, out, bit, 1, 1);
- fwd_txfm_transpose_8x8_avx2(out, in, 1, 2);
- fwd_txfm_transpose_8x8_avx2(&out[8], &in[1], 1, 2);
- round_shift_rect_array_32_avx2(in, in, 16, -shift[2], NewSqrt2);
- store_buffer_avx2(in, coeff, 8, 16);
+ round_shift_rect_array_32_avx2(out, out, 16, -shift[2], NewSqrt2);
+ store_buffer_avx2(out, coeff, 8, 16);
(void)bd;
}
void av1_fwd_txfm2d_16x16_avx2(const int16_t *input, int32_t *coeff, int stride,
@@ -1422,8 +1410,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fdct16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case ADST_DCT:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1434,8 +1421,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fdct16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case DCT_ADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1446,8 +1432,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case ADST_ADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1458,8 +1443,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case FLIPADST_DCT:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 1, 0);
@@ -1470,8 +1454,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fdct16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case DCT_FLIPADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 1);
@@ -1482,8 +1465,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case FLIPADST_FLIPADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 1, 1);
@@ -1494,8 +1476,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case ADST_FLIPADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 1);
@@ -1506,8 +1487,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case FLIPADST_ADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 1, 0);
@@ -1518,8 +1498,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case IDTX:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1527,9 +1506,10 @@
idtx16_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
round_shift_32_8xn_avx2(out, size, shift[1], width_div16);
- idtx16_avx2(out, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_16x16_avx2(out, in);
+ idtx16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case V_DCT:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1537,9 +1517,10 @@
fdct16_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
round_shift_32_8xn_avx2(out, size, shift[1], width_div16);
- idtx16_avx2(out, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_16x16_avx2(out, in);
+ idtx16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case H_DCT:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1550,8 +1531,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fdct16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case V_ADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1559,9 +1539,10 @@
fadst16_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
round_shift_32_8xn_avx2(out, size, shift[1], width_div16);
- idtx16_avx2(out, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_16x16_avx2(out, in);
+ idtx16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case H_ADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 0);
@@ -1572,8 +1553,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case V_FLIPADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 1, 0);
@@ -1581,9 +1561,10 @@
fadst16_avx2(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], width_div8,
width_div8);
round_shift_32_8xn_avx2(out, size, shift[1], width_div16);
- idtx16_avx2(out, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
+ fwd_txfm_transpose_16x16_avx2(out, in);
+ idtx16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
case H_FLIPADST:
load_buffer_16xn_avx2(input, in, stride, height, width_div8, 0, 1);
@@ -1594,8 +1575,7 @@
fwd_txfm_transpose_16x16_avx2(out, in);
fadst16_avx2(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], width_div8,
width_div8);
- fwd_txfm_transpose_16x16_avx2(out, in);
- store_buffer_avx2(in, coeff, 8, 32);
+ store_buffer_avx2(out, coeff, 8, 32);
break;
default: assert(0);
}
@@ -2091,15 +2071,7 @@
round_shift_32_8xn_avx2(&buf1[(i << 1) + 1], height, shift[2], width_div8);
}
- for (r = 0; r < height; r += 8) {
- for (c = 0; c < width_div8; c++) {
- fwd_txfm_transpose_8x8_avx2(&buf1[r * width_div8 + c],
- &buf0[c * 8 * width_div8 + (r >> 3)],
- width_div8, width_div8);
- }
- }
-
- store_buffer_avx2(buf0, output, 8, 128);
+ store_buffer_avx2(buf1, output, 8, 128);
}
static INLINE void fdct64_stage2_avx2(__m256i *x1, __m256i *x2,
__m256i *cospi_m32, __m256i *cospi_p32,
@@ -3156,12 +3128,5 @@
width_div16);
}
- for (r = 0; r < (height >> 1); r += 8) {
- for (c = 0; c < width_div16; c++) {
- fwd_txfm_transpose_8x8_avx2(&buf0[r * width_div16 + c],
- &buf1[c * 8 * width_div16 + (r >> 3)],
- width_div16, width_div16);
- }
- }
- store_buffer_avx2(buf1, output, 8, 128);
+ store_buffer_avx2(buf0, output, 8, 128);
}
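
All of the transform cases above follow one separable pattern: a column pass, an intermediate rounding shift, a transpose so the row kernel sees contiguous data, then a row pass. A scalar model of that flow, with placeholder 1D kernels, shift1 > 0 assumed, and the final coefficient-ordering convention left to the caller:

#include <stdint.h>
#include <string.h>

typedef void (*tx1d_fn)(const int32_t *in, int32_t *out);

static void transpose_8x8_scalar(const int32_t *in, int32_t *out) {
  for (int r = 0; r < 8; ++r)
    for (int c = 0; c < 8; ++c) out[c * 8 + r] = in[r * 8 + c];
}

static void round_shift_block(int32_t *buf, int n, int shift) {
  for (int i = 0; i < n; ++i)  // shift > 0 assumed
    buf[i] = (buf[i] + (1 << (shift - 1))) >> shift;
}

// Column pass, rounding, transpose, row pass. The transposes exist so the
// 1D kernels always operate on contiguous 8-element rows.
static void fwd_txfm2d_8x8_model(const int32_t *input, int32_t *coeff,
                                 tx1d_fn col_txfm, tx1d_fn row_txfm,
                                 int shift1) {
  int32_t a[64], b[64];
  transpose_8x8_scalar(input, a);
  for (int i = 0; i < 8; ++i) col_txfm(a + 8 * i, b + 8 * i);
  round_shift_block(b, 64, shift1);
  transpose_8x8_scalar(b, a);
  for (int i = 0; i < 8; ++i) row_txfm(a + 8 * i, b + 8 * i);
  memcpy(coeff, b, sizeof(b));  // output ordering left to the caller
}
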
diff --git a/av1/encoder/x86/highbd_fwd_txfm_sse4.c b/av1/encoder/x86/highbd_fwd_txfm_sse4.c
index 73f9b44..158b4ae 100644
--- a/av1/encoder/x86/highbd_fwd_txfm_sse4.c
+++ b/av1/encoder/x86/highbd_fwd_txfm_sse4.c
@@ -22,6 +22,13 @@
#include "config/aom_config.h"
#include "config/av1_rtcd.h"
+static INLINE void store_output_w4(int32_t *const out, const __m128i *const in,
+ const int stride, const int out_size) {
+ for (int i = 0; i < out_size; ++i) {
+ _mm_store_si128((__m128i *)(out + i * stride), in[i]);
+ }
+}
+
void av1_fwht4x4_sse4_1(const int16_t *input, tran_low_t *output, int stride) {
__m128i in[4];
in[0] = _mm_loadl_epi64((const __m128i *)(input + 0 * stride));
@@ -57,7 +64,9 @@
op[2] = d1;
op[3] = b1;
- transpose_32bit_4x4(op, op);
+ if (i == 0) {
+ transpose_32bit_4x4(op, op);
+ }
}
op[0] = _mm_slli_epi32(op[0], UNIT_QUANT_SHIFT);
@@ -71,11 +80,6 @@
_mm_storeu_si128((__m128i *)(output + 12), op[3]);
}
-void av1_highbd_fwht4x4_sse4_1(const int16_t *input, tran_low_t *output,
- int stride) {
- av1_fwht4x4_sse4_1(input, output, stride);
-}
-
static INLINE void load_buffer_4x4(const int16_t *input, __m128i *in,
int stride, int flipud, int fliplr,
int shift) {
@@ -160,16 +164,10 @@
// Note: shift[1] and shift[2] are zeros
- // Transpose 4x4 32-bit
- v0 = _mm_unpacklo_epi32(u0, u1);
- v1 = _mm_unpackhi_epi32(u0, u1);
- v2 = _mm_unpacklo_epi32(u2, u3);
- v3 = _mm_unpackhi_epi32(u2, u3);
-
- out[0] = _mm_unpacklo_epi64(v0, v2);
- out[1] = _mm_unpackhi_epi64(v0, v2);
- out[2] = _mm_unpacklo_epi64(v1, v3);
- out[3] = _mm_unpackhi_epi64(v1, v3);
+ out[0] = u0;
+ out[1] = u1;
+ out[2] = u2;
+ out[3] = u3;
}
static INLINE void write_buffer_4x4(__m128i *res, int32_t *output) {
@@ -191,7 +189,6 @@
__m128i s0, s1, s2, s3, s4, s5, s6, s7;
__m128i x0, x1, x2, x3;
__m128i u0, u1, u2, u3;
- __m128i v0, v1, v2, v3;
int idx = 0 * num_col;
s0 = _mm_mullo_epi32(in[idx], sinpi1);
@@ -232,39 +229,22 @@
u3 = _mm_add_epi32(s3, rnding);
u3 = _mm_srai_epi32(u3, bit);
- v0 = _mm_unpacklo_epi32(u0, u1);
- v1 = _mm_unpackhi_epi32(u0, u1);
- v2 = _mm_unpacklo_epi32(u2, u3);
- v3 = _mm_unpackhi_epi32(u2, u3);
-
- out[0] = _mm_unpacklo_epi64(v0, v2);
- out[1] = _mm_unpackhi_epi64(v0, v2);
- out[2] = _mm_unpacklo_epi64(v1, v3);
- out[3] = _mm_unpackhi_epi64(v1, v3);
+ out[0] = u0;
+ out[1] = u1;
+ out[2] = u2;
+ out[3] = u3;
}
static void idtx4x4_sse4_1(__m128i *in, __m128i *out, int bit, int col_num) {
(void)bit;
__m128i fact = _mm_set1_epi32(NewSqrt2);
__m128i offset = _mm_set1_epi32(1 << (NewSqrt2Bits - 1));
__m128i a_low;
- __m128i v[4];
for (int i = 0; i < 4; i++) {
a_low = _mm_mullo_epi32(in[i * col_num], fact);
a_low = _mm_add_epi32(a_low, offset);
out[i] = _mm_srai_epi32(a_low, NewSqrt2Bits);
}
-
- // Transpose for 4x4
- v[0] = _mm_unpacklo_epi32(out[0], out[1]);
- v[1] = _mm_unpackhi_epi32(out[0], out[1]);
- v[2] = _mm_unpacklo_epi32(out[2], out[3]);
- v[3] = _mm_unpackhi_epi32(out[2], out[3]);
-
- out[0] = _mm_unpacklo_epi64(v[0], v[2]);
- out[1] = _mm_unpackhi_epi64(v[0], v[2]);
- out[2] = _mm_unpacklo_epi64(v[1], v[3]);
- out[3] = _mm_unpackhi_epi64(v[1], v[3]);
}
void av1_fwd_txfm2d_4x4_sse4_1(const int16_t *input, int32_t *coeff,
int input_stride, TX_TYPE tx_type, int bd) {
@@ -277,96 +257,112 @@
case DCT_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case DCT_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_DCT:
load_buffer_4x4(input, in, input_stride, 1, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case DCT_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, shift[0]);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_4x4(input, in, input_stride, 1, 1, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case ADST_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case FLIPADST_ADST:
load_buffer_4x4(input, in, input_stride, 1, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case IDTX:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_DCT:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fdct4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_ADST:
load_buffer_4x4(input, in, input_stride, 0, 0, shift[0]);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_col[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case V_FLIPADST:
load_buffer_4x4(input, in, input_stride, 1, 0, shift[0]);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
case H_FLIPADST:
load_buffer_4x4(input, in, input_stride, 0, 1, shift[0]);
idtx4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
+ transpose_32bit_4x4(in, in);
fadst4x4_sse4_1(in, in, av1_fwd_cos_bit_row[txw_idx][txh_idx], 1);
write_buffer_4x4(in, coeff);
break;
@@ -911,8 +907,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fdct8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -920,8 +915,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fdct8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case DCT_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -929,8 +923,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -938,8 +931,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_DCT:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -947,8 +939,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fdct8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case DCT_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -956,8 +947,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_8x8(input, in, stride, 1, 1, shift[0]);
@@ -965,8 +955,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case ADST_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -974,8 +963,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case FLIPADST_ADST:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -983,8 +971,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case IDTX:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -992,8 +979,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
idtx8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -1001,8 +987,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
idtx8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_DCT:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -1010,8 +995,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fdct8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -1019,8 +1003,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
idtx8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_ADST:
load_buffer_8x8(input, in, stride, 0, 0, shift[0]);
@@ -1028,8 +1011,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case V_FLIPADST:
load_buffer_8x8(input, in, stride, 1, 0, shift[0]);
@@ -1037,8 +1019,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
idtx8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
case H_FLIPADST:
load_buffer_8x8(input, in, stride, 0, 1, shift[0]);
@@ -1046,8 +1027,7 @@
col_txfm_8x8_rounding(out, -shift[1]);
transpose_8x8(out, in);
fadst8x8_sse4_1(in, out, av1_fwd_cos_bit_col[txw_idx][txh_idx], 2);
- transpose_8x8(out, in);
- write_buffer_8x8(in, coeff);
+ write_buffer_8x8(out, coeff);
break;
default: assert(0);
}
@@ -1819,8 +1799,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
fdct16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1829,8 +1808,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
fdct16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case DCT_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1839,8 +1817,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1850,8 +1827,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_DCT:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1860,8 +1836,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
fdct16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case DCT_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1870,8 +1845,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_FLIPADST:
load_buffer_16x16(input, in, stride, 1, 1, shift[0]);
@@ -1881,8 +1855,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case ADST_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1892,8 +1865,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case FLIPADST_ADST:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1903,8 +1875,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case IDTX:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1912,8 +1883,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
idtx16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1921,8 +1891,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
idtx16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_DCT:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1930,8 +1899,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
fdct16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1940,8 +1908,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
idtx16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_ADST:
load_buffer_16x16(input, in, stride, 0, 0, shift[0]);
@@ -1950,8 +1917,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case V_FLIPADST:
load_buffer_16x16(input, in, stride, 1, 0, shift[0]);
@@ -1960,8 +1926,7 @@
col_txfm_16x16_rounding(out, -shift[1]);
transpose_16x16(out, in);
idtx16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx], col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
case H_FLIPADST:
load_buffer_16x16(input, in, stride, 0, 1, shift[0]);
@@ -1970,8 +1935,7 @@
transpose_16x16(out, in);
fadst16x16_sse4_1(in, out, av1_fwd_cos_bit_row[txw_idx][txh_idx],
col_num);
- transpose_16x16(out, in);
- write_buffer_16x16(in, coeff);
+ write_buffer_16x16(out, coeff);
break;
default: assert(0);
}
@@ -2218,11 +2182,10 @@
}
for (int i = 0; i < 2; i++) {
- transpose_8x8(out + i * 16, in);
- av1_round_shift_rect_array_32_sse4_1(in, in, 16, -shift[2], NewSqrt2);
- write_buffer_16x8(in, coeff + i * 8, 16);
+ av1_round_shift_rect_array_32_sse4_1(out + i * 16, in, 16, -shift[2],
+ NewSqrt2);
+ write_buffer_8x8(in, coeff + i * 64);
}
-
(void)bd;
}
@@ -2246,11 +2209,9 @@
for (int i = 0; i < 2; i++) {
row_txfm(out + i * 16, out, bit, 2);
- transpose_8x8(out, in);
- av1_round_shift_rect_array_32_sse4_1(in, in, 16, -shift[2], NewSqrt2);
- write_buffer_8x8(in, coeff + i * 64);
+ av1_round_shift_rect_array_32_sse4_1(out, out, 16, -shift[2], NewSqrt2);
+ write_buffer_16x8(out, coeff + i * 8, 16);
}
-
(void)bd;
}
@@ -2278,8 +2239,10 @@
transpose_8nx8n(outcoeff128, in, txfm_size_col, txfm_size_row);
// row transform
- for (int i = 0; i < txfm_size_col; i++) {
- row_txfm(in + i, outcoeff128 + i * txfm_size_col, bitrow, txfm_size_col);
+ for (int i = 0; i < 4; i++) {
+ __m128i tmp[4];
+ row_txfm(in + i, tmp, bitrow, txfm_size_row >> 2);
+ store_output_w4(coeff + i * 4, tmp, txfm_size_row, txfm_size_col);
}
(void)bd;
}
@@ -2304,15 +2267,15 @@
// col transform
load_buffer_16x4(input, in, stride, ud_flip, lr_flip, shift[0]);
- for (int i = 0; i < txfm_size_row; i++) {
- col_txfm(in + i * txfm_size_row, outcoeff128 + i * txfm_size_row, bitcol,
- 1);
+ for (int i = 0; i < (txfm_size_col >> 2); i++) {
+ __m128i *cur_in = &in[i * txfm_size_row];
+ col_txfm(cur_in, cur_in, bitcol, 1);
+ transpose_32bit_4x4(cur_in, cur_in);
}
- col_txfm_8x8_rounding(outcoeff128, -shift[1]);
+ col_txfm_8x8_rounding(in, -shift[1]);
// row transform
- row_txfm(outcoeff128, in, bitrow, 1);
- transpose_8nx8n(in, outcoeff128, txfm_size_row, txfm_size_col);
+ row_txfm(in, outcoeff128, bitrow, 1);
(void)bd;
}
@@ -2341,8 +2304,7 @@
// row transform
row_txfm(outcoef128, in, bitrow, 8);
- transpose_8nx8n(in, outcoef128, 32, 16);
- av1_round_shift_rect_array_32_sse4_1(outcoef128, outcoef128, 128, -shift[2],
+ av1_round_shift_rect_array_32_sse4_1(in, outcoef128, 128, -shift[2],
NewSqrt2);
(void)bd;
}
@@ -2376,9 +2338,10 @@
for (int i = 0; i < num_row; i++) {
av1_fdct32_sse4_1((outcoef128 + i), (in + i), bitrow, num_row);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
- av1_round_shift_rect_array_32_sse4_1(outcoef128, outcoef128, 512, -shift[2],
- NewSqrt2);
+ for (int i = 0; i < txfm_size_col; i++) {
+ av1_round_shift_rect_array_32_sse4_1(in + i * 16, outcoef128 + i * 8, 8,
+ -shift[2], NewSqrt2);
+ }
(void)bd;
}
@@ -2421,9 +2384,8 @@
for (int i = 0; i < num_row; i++) {
av1_fdct64_sse4_1((outcoef128 + i), (in + i), bitrow, num_row, num_row);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col >> 1);
- av1_round_shift_rect_array_32_sse4_1(outcoef128, outcoef128, 512 >> 1,
- -shift[2], NewSqrt2);
+ av1_round_shift_rect_array_32_sse4_1(in, outcoef128, 512, -shift[2],
+ NewSqrt2);
(void)bd;
}
@@ -2450,8 +2412,7 @@
for (int i = 0; i < 4; i++) {
row_txfm((outcoef128 + i), (in + i), bitrow, 4);
}
- transpose_8nx8n(in, outcoef128, 16, 32);
- av1_round_shift_rect_array_32_sse4_1(outcoef128, outcoef128, 128, -shift[2],
+ av1_round_shift_rect_array_32_sse4_1(in, outcoef128, 128, -shift[2],
NewSqrt2);
(void)bd;
}
@@ -2486,9 +2447,8 @@
// row transform
for (int i = 0; i < txfm_size_col; i += 2) {
- row_txfm((outcoef128 + i), (in + i), bitrow, txfm_size_col);
+ row_txfm((outcoef128 + i), (outcoef128 + i), bitrow, txfm_size_col);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
(void)bd;
}
@@ -2519,9 +2479,8 @@
// row transform
for (int i = 0; i < num_col; i++) {
- row_txfm((outcoef128 + i), (in + i), bitrow, num_col);
+ row_txfm((outcoef128 + i), (outcoef128 + i), bitrow, num_col);
}
- transpose_8nx8n(in, outcoef128, txfm_size_row, txfm_size_col);
(void)bd;
}
#endif
@@ -2529,7 +2488,6 @@
void av1_fwd_txfm2d_4x8_sse4_1(const int16_t *input, int32_t *coeff, int stride,
TX_TYPE tx_type, int bd) {
__m128i in[8];
- __m128i *outcoeff128 = (__m128i *)coeff;
const int8_t *shift = av1_fwd_txfm_shift_ls[TX_4X8];
const int txw_idx = get_txw_idx(TX_4X8);
const int txh_idx = get_txh_idx(TX_4X8);
@@ -2546,13 +2504,15 @@
load_buffer_4x8(input, in, stride, ud_flip, lr_flip, shift[0]);
col_txfm(in, in, bitcol, 1);
col_txfm_4x8_rounding(in, -shift[1]);
- transpose_8nx8n(in, outcoeff128, txfm_size_col, txfm_size_row);
for (int i = 0; i < 2; i++) {
- row_txfm(outcoeff128 + i, in + i * txfm_size_col, bitrow, 2);
+ __m128i *cur_in = &in[i * 4];
+ transpose_32bit_4x4(cur_in, cur_in);
+ row_txfm(cur_in, cur_in, bitrow, 1);
+ av1_round_shift_rect_array_32_sse4_1(cur_in, cur_in, txfm_size_col,
+ -shift[2], NewSqrt2);
+ store_output_w4(coeff + i * 4, cur_in, txfm_size_row, 4);
}
- av1_round_shift_rect_array_32_sse4_1(in, outcoeff128, txfm_size_row,
- -shift[2], NewSqrt2);
(void)bd;
}
@@ -2574,15 +2534,16 @@
// col transform

load_buffer_8x4(input, in, stride, ud_flip, lr_flip, shift[0]);
for (int i = 0; i < 2; i++) {
- col_txfm(in + i * txfm_size_row, in + i * txfm_size_row, bitcol, 1);
+ __m128i *cur_in = &in[i * txfm_size_row];
+ col_txfm(cur_in, cur_in, bitcol, 1);
+ transpose_32bit_4x4(cur_in, cur_in);
}
col_txfm_4x8_rounding(in, -shift[1]);
// row transform
row_txfm(in, outcoeff128, bitrow, 1);
- av1_round_shift_rect_array_32_sse4_1(outcoeff128, in, txfm_size_col,
+ av1_round_shift_rect_array_32_sse4_1(outcoeff128, outcoeff128, txfm_size_col,
-shift[2], NewSqrt2);
- transpose_8nx8n(in, outcoeff128, txfm_size_row, txfm_size_col);
(void)bd;
}
@@ -2623,9 +2584,7 @@
col_txfm_16x16_rounding(outcoeff128 + 192, -shift[1]);
transpose_8nx8n(outcoeff128, in, txfm_size_col, 32);
- fdct16x16_sse4_1(in, in, bitrow, 8);
- transpose_8nx8n(in, outcoeff128, 32, txfm_size_col);
- memset(coeff + txfm_size_col * 32, 0, txfm_size_col * 32 * sizeof(*coeff));
+ fdct16x16_sse4_1(in, outcoeff128, bitrow, 8);
(void)bd;
}
@@ -2662,9 +2621,9 @@
transpose_8nx8n(outcoeff128, in, txfm_size_col, txfm_size_row);
for (int i = 0; i < 4; i++) {
- av1_fdct64_sse4_1(in + i, in + i, bitrow, 4, 4);
+ av1_fdct64_sse4_1(in + i, outcoeff128 + i, bitrow, 4, 4);
}
- transpose_8nx8n(in, outcoeff128, txfm_size_row, 32);
+ memset(coeff + txfm_size_row * 32, 0, txfm_size_row * 32 * sizeof(*coeff));
(void)bd;
}
#endif
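
Several of the rectangular transforms above end with av1_round_shift_rect_array_32_sse4_1, which compensates for the 2:1 block shape by scaling with sqrt(2) in fixed point. A scalar sketch, assuming the usual libaom constants (NewSqrt2 = 5793, i.e. sqrt(2) in Q12):

#include <stdint.h>

#define NewSqrt2Bits 12  // assumed libaom value
#define NewSqrt2 5793    // round(sqrt(2) * 2^12)

static int32_t round_shift32(int64_t x, int s) {
  if (s <= 0) return (int32_t)(x << -s);
  return (int32_t)((x + (1LL << (s - 1))) >> s);
}

// Rounding shift by `bit`, then scale by sqrt(2) in Q12 so a rectangular
// transform keeps the same overall gain as a square one.
static void round_shift_rect_model(const int32_t *in, int32_t *out, int size,
                                   int bit) {
  for (int i = 0; i < size; ++i) {
    const int32_t r = round_shift32(in[i], bit);
    out[i] = round_shift32((int64_t)r * NewSqrt2, NewSqrt2Bits);
  }
}
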
diff --git a/av1/encoder/x86/highbd_temporal_filter_avx2.c b/av1/encoder/x86/highbd_temporal_filter_avx2.c
index 68509fa..ca448ca 100644
--- a/av1/encoder/x86/highbd_temporal_filter_avx2.c
+++ b/av1/encoder/x86/highbd_temporal_filter_avx2.c
@@ -13,6 +13,7 @@
#include <immintrin.h>
#include "config/av1_rtcd.h"
+#include "aom_dsp/mathutils.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/temporal_filter.h"
@@ -147,7 +148,8 @@
const int *subblock_mses, unsigned int *accumulator, uint16_t *count,
uint32_t *frame_sse, uint32_t *luma_sse_sum, int bd,
const double inv_num_ref_pixels, const double decay_factor,
- const double inv_factor, const double weight_factor, double *d_factor) {
+ const double inv_factor, const double weight_factor, double *d_factor,
+ int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -304,28 +306,61 @@
acc_5x5_sse[row][col + 3] = xx_mask_and_hadd(vsum, 3);
}
- for (int i = 0, k = 0; i < block_height; i++) {
- for (int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame2[i * stride2 + j];
- uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
+ double subblock_mses_scaled[4];
+ double d_factor_decayed[4];
+ for (int idx = 0; idx < 4; idx++) {
+ subblock_mses_scaled[idx] = subblock_mses[idx] * inv_factor;
+ d_factor_decayed[idx] = d_factor[idx] * decay_factor;
+ }
+ if (tf_wgt_calc_lvl == 0) {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
- // Scale down the difference for high bit depth input.
- diff_sse >>= ((bd - 8) * 2);
+ // Scale down the difference for high bit depth input.
+ diff_sse >>= ((bd - 8) * 2);
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
- count[k] += weight;
- accumulator[k] += weight * pixel_value;
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
+ }
+ } else {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
+
+ // Scale down the difference for high bit depth input.
+ diff_sse >>= ((bd - 8) * 2);
+
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
+
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ const int weight = iroundpf(fweight);
+
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
}
}
}
@@ -335,7 +370,8 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
const int is_high_bitdepth = frame_to_filter->flags & YV12_FLAG_HIGHBITDEPTH;
assert(block_size == BLOCK_32X32 && "Only support 32x32 block with sse2!");
assert(TF_WINDOW_LENGTH == 5 && "Only support window length 5 with sse2!");
@@ -424,7 +460,7 @@
ref, frame_stride, pred1 + plane_offset, plane_w, plane_w, plane_h,
subblock_mses, accum + plane_offset, count + plane_offset, frame_sse,
luma_sse_sum, mbd->bd, inv_num_ref_pixels, decay_factor, inv_factor,
- weight_factor, d_factor);
+ weight_factor, d_factor, tf_wgt_calc_lvl);
plane_offset += plane_h * plane_w;
}
}
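
The hunk above hoists the loop-invariant per-subblock products (subblock_mses[idx] * inv_factor and d_factor[idx] * decay_factor) out of the per-pixel loops, and splits the loop on tf_wgt_calc_lvl: level 0 keeps the exact exp() weight, other levels use the approx_exp()/iroundpf() fast path. A scalar sketch of the hoisted weight computation, with simplified names (not the libaom API):

#include <math.h>

#define TF_WEIGHT_SCALE_SKETCH 1000 /* stand-in for TF_WEIGHT_SCALE */

/* Per-block invariants are computed once; per pixel, only two multiplies
 * and one exp() remain. */
static int tf_pixel_weight_sketch(double window_error, int subblock_idx,
                                  const double *subblock_mses_scaled,
                                  const double *d_factor_decayed,
                                  double weight_factor) {
  const double combined_error =
      weight_factor * window_error + subblock_mses_scaled[subblock_idx];
  double scaled_error = combined_error * d_factor_decayed[subblock_idx];
  if (scaled_error > 7) scaled_error = 7;
  return (int)(exp(-scaled_error) * TF_WEIGHT_SCALE_SKETCH);
}
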
diff --git a/av1/encoder/x86/highbd_temporal_filter_sse2.c b/av1/encoder/x86/highbd_temporal_filter_sse2.c
index 1bfdaf7..2032847 100644
--- a/av1/encoder/x86/highbd_temporal_filter_sse2.c
+++ b/av1/encoder/x86/highbd_temporal_filter_sse2.c
@@ -13,6 +13,7 @@
#include <emmintrin.h>
#include "config/av1_rtcd.h"
+#include "aom_dsp/mathutils.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/temporal_filter.h"
@@ -95,7 +96,8 @@
const int *subblock_mses, unsigned int *accumulator, uint16_t *count,
uint32_t *frame_sse, uint32_t *luma_sse_sum, int bd,
const double inv_num_ref_pixels, const double decay_factor,
- const double inv_factor, const double weight_factor, double *d_factor) {
+ const double inv_factor, const double weight_factor, double *d_factor,
+ int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -179,28 +181,61 @@
}
}
- for (int i = 0, k = 0; i < block_height; i++) {
- for (int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame2[i * stride2 + j];
- uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
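+ // Hoist the per-subblock terms that are invariant across the pixel loops.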
+ double subblock_mses_scaled[4];
+ double d_factor_decayed[4];
+ for (int idx = 0; idx < 4; idx++) {
+ subblock_mses_scaled[idx] = subblock_mses[idx] * inv_factor;
+ d_factor_decayed[idx] = d_factor[idx] * decay_factor;
+ }
+ if (tf_wgt_calc_lvl == 0) {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
- // Scale down the difference for high bit depth input.
- diff_sse >>= ((bd - 8) * 2);
+ // Scale down the difference for high bit depth input.
+ diff_sse >>= ((bd - 8) * 2);
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
- count[k] += weight;
- accumulator[k] += weight * pixel_value;
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
+ }
+ } else {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
+
+ // Scale down the difference for high bit depth input.
+ diff_sse >>= ((bd - 8) * 2);
+
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
+
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ const int weight = iroundpf(fweight);
+
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
}
}
}
@@ -210,7 +245,8 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
const int is_high_bitdepth = frame_to_filter->flags & YV12_FLAG_HIGHBITDEPTH;
assert(block_size == BLOCK_32X32 && "Only support 32x32 block with sse2!");
assert(TF_WINDOW_LENGTH == 5 && "Only support window length 5 with sse2!");
@@ -299,7 +335,7 @@
ref, frame_stride, pred1 + plane_offset, plane_w, plane_w, plane_h,
subblock_mses, accum + plane_offset, count + plane_offset, frame_sse,
luma_sse_sum, mbd->bd, inv_num_ref_pixels, decay_factor, inv_factor,
- weight_factor, d_factor);
+ weight_factor, d_factor, tf_wgt_calc_lvl);
plane_offset += plane_h * plane_w;
}
}
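
Both temporal-filter files now include aom_dsp/mathutils.h for approx_exp() and iroundpf(), used when tf_wgt_calc_lvl != 0. Purely as an illustration, a Schraudolph-style bit trick is one common way such a fast exp() can be built; libaom's actual approx_exp() implementation may differ:

#include <stdint.h>
#include <string.h>

/* Sketch: exp(x) = 2^(x / ln 2); write the scaled argument into the
 * exponent field of an IEEE-754 float. Accurate to a few percent, which
 * is adequate for filter-weight computation. */
static float fast_exp_sketch(float x) {
  const int32_t i = (int32_t)(12102203.0f * x + 1064866805.0f);
  float f;
  memcpy(&f, &i, sizeof(f)); /* type-pun without breaking aliasing rules */
  return f;
}
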
diff --git a/av1/encoder/x86/pickrst_avx2.c b/av1/encoder/x86/pickrst_avx2.c
index 3452f73..6658ed3 100644
--- a/av1/encoder/x86/pickrst_avx2.c
+++ b/av1/encoder/x86/pickrst_avx2.c
@@ -19,179 +19,6 @@
#include "av1/common/restoration.h"
#include "av1/encoder/pickrst.h"
-static INLINE void acc_stat_avx2(int32_t *dst, const uint8_t *src,
- const __m128i *shuffle, const __m256i *kl) {
- const __m128i s = _mm_shuffle_epi8(xx_loadu_128(src), *shuffle);
- const __m256i d0 = _mm256_madd_epi16(*kl, _mm256_cvtepu8_epi16(s));
- const __m256i dst0 = yy_load_256(dst);
- const __m256i r0 = _mm256_add_epi32(dst0, d0);
- yy_store_256(dst, r0);
-}
-
-static INLINE void acc_stat_win7_one_line_avx2(
- const uint8_t *dgd, const uint8_t *src, int h_start, int h_end,
- int dgd_stride, const __m128i *shuffle, int32_t *sumX,
- int32_t sumY[WIENER_WIN][WIENER_WIN], int32_t M_int[WIENER_WIN][WIENER_WIN],
- int32_t H_int[WIENER_WIN2][WIENER_WIN * 8]) {
- int j, k, l;
- const int wiener_win = WIENER_WIN;
- // Main loop handles two pixels at a time
- // We can assume that h_start is even, since it will always be aligned to
- // a tile edge + some number of restoration units, and both of those will
- // be 64-pixel aligned.
- // However, at the edge of the image, h_end may be odd, so we need to handle
- // that case correctly.
- assert(h_start % 2 == 0);
- const int h_end_even = h_end & ~1;
- const int has_odd_pixel = h_end & 1;
- for (j = h_start; j < h_end_even; j += 2) {
- const uint8_t X1 = src[j];
- const uint8_t X2 = src[j + 1];
- *sumX += X1 + X2;
- const uint8_t *dgd_ij = dgd + j;
- for (k = 0; k < wiener_win; k++) {
- const uint8_t *dgd_ijk = dgd_ij + k * dgd_stride;
- for (l = 0; l < wiener_win; l++) {
- int32_t *H_ = &H_int[(l * wiener_win + k)][0];
- const uint8_t D1 = dgd_ijk[l];
- const uint8_t D2 = dgd_ijk[l + 1];
- sumY[k][l] += D1 + D2;
- M_int[k][l] += D1 * X1 + D2 * X2;
-
- const __m256i kl =
- _mm256_cvtepu8_epi16(_mm_set1_epi16(loadu_int16(dgd_ijk + l)));
- acc_stat_avx2(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 3 * 8, dgd_ij + 3 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 4 * 8, dgd_ij + 4 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 5 * 8, dgd_ij + 5 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 6 * 8, dgd_ij + 6 * dgd_stride, shuffle, &kl);
- }
- }
- }
- // If the width is odd, add in the final pixel
- if (has_odd_pixel) {
- const uint8_t X1 = src[j];
- *sumX += X1;
- const uint8_t *dgd_ij = dgd + j;
- for (k = 0; k < wiener_win; k++) {
- const uint8_t *dgd_ijk = dgd_ij + k * dgd_stride;
- for (l = 0; l < wiener_win; l++) {
- int32_t *H_ = &H_int[(l * wiener_win + k)][0];
- const uint8_t D1 = dgd_ijk[l];
- sumY[k][l] += D1;
- M_int[k][l] += D1 * X1;
-
- // The `acc_stat_avx2` function wants its input to have interleaved
- // copies of two pixels, but we only have one. However, the pixels
- // are (effectively) used as inputs to a multiply-accumulate.
- // So if we set the extra pixel slot to 0, then it is effectively
- // ignored.
- const __m256i kl = _mm256_cvtepu8_epi16(_mm_set1_epi16((int16_t)D1));
- acc_stat_avx2(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 3 * 8, dgd_ij + 3 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 4 * 8, dgd_ij + 4 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 5 * 8, dgd_ij + 5 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 6 * 8, dgd_ij + 6 * dgd_stride, shuffle, &kl);
- }
- }
- }
-}
-
-static INLINE void compute_stats_win7_opt_avx2(
- const uint8_t *dgd, const uint8_t *src, int h_start, int h_end, int v_start,
- int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H,
- int use_downsampled_wiener_stats) {
- int i, j, k, l, m, n;
- const int wiener_win = WIENER_WIN;
- const int pixel_count = (h_end - h_start) * (v_end - v_start);
- const int wiener_win2 = wiener_win * wiener_win;
- const int wiener_halfwin = (wiener_win >> 1);
- uint8_t avg = find_average(dgd, h_start, h_end, v_start, v_end, dgd_stride);
-
- int32_t M_int32[WIENER_WIN][WIENER_WIN] = { { 0 } };
- int64_t M_int64[WIENER_WIN][WIENER_WIN] = { { 0 } };
- int32_t M_int32_row[WIENER_WIN][WIENER_WIN] = { { 0 } };
-
- DECLARE_ALIGNED(32, int32_t,
- H_int32[WIENER_WIN2][WIENER_WIN * 8]) = { { 0 } };
- DECLARE_ALIGNED(32, int32_t,
- H_int32_row[WIENER_WIN2][WIENER_WIN * 8]) = { { 0 } };
- int64_t H_int64[WIENER_WIN2][WIENER_WIN * 8] = { { 0 } };
- int32_t sumY[WIENER_WIN][WIENER_WIN] = { { 0 } };
- int32_t sumX = 0;
- const uint8_t *dgd_win = dgd - wiener_halfwin * dgd_stride - wiener_halfwin;
- int downsample_factor =
- use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
- int32_t sumX_row = 0;
- int32_t sumY_row[WIENER_WIN][WIENER_WIN] = { { 0 } };
-
- const __m128i shuffle = xx_loadu_128(g_shuffle_stats_data);
- for (j = v_start; j < v_end; j += 64) {
- const int vert_end = AOMMIN(64, v_end - j) + j;
- for (i = j; i < vert_end; i = i + downsample_factor) {
- if (use_downsampled_wiener_stats &&
- (vert_end - i < WIENER_STATS_DOWNSAMPLE_FACTOR)) {
- downsample_factor = vert_end - i;
- }
- sumX_row = 0;
- memset(sumY_row, 0, sizeof(int32_t) * WIENER_WIN * WIENER_WIN);
- memset(M_int32_row, 0, sizeof(int32_t) * WIENER_WIN * WIENER_WIN);
- memset(H_int32_row, 0, sizeof(int32_t) * WIENER_WIN2 * (WIENER_WIN * 8));
- acc_stat_win7_one_line_avx2(
- dgd_win + i * dgd_stride, src + i * src_stride, h_start, h_end,
- dgd_stride, &shuffle, &sumX_row, sumY_row, M_int32_row, H_int32_row);
- sumX += sumX_row * downsample_factor;
-
- // Scale M matrix based on the downsampling factor
- for (k = 0; k < wiener_win; ++k) {
- for (l = 0; l < wiener_win; ++l) {
- sumY[k][l] += (sumY_row[k][l] * downsample_factor);
- M_int32[k][l] += (M_int32_row[k][l] * downsample_factor);
- }
- }
- // Scale H matrix based on the downsampling factor
- for (k = 0; k < WIENER_WIN2; ++k) {
- for (l = 0; l < WIENER_WIN * 8; ++l) {
- H_int32[k][l] += (H_int32_row[k][l] * downsample_factor);
- }
- }
- }
- for (k = 0; k < wiener_win; ++k) {
- for (l = 0; l < wiener_win; ++l) {
- M_int64[k][l] += M_int32[k][l];
- M_int32[k][l] = 0;
- }
- }
- for (k = 0; k < WIENER_WIN2; ++k) {
- for (l = 0; l < WIENER_WIN * 8; ++l) {
- H_int64[k][l] += H_int32[k][l];
- H_int32[k][l] = 0;
- }
- }
- }
-
- const int64_t avg_square_sum = (int64_t)avg * (int64_t)avg * pixel_count;
- for (k = 0; k < wiener_win; k++) {
- for (l = 0; l < wiener_win; l++) {
- const int32_t idx0 = l * wiener_win + k;
- M[idx0] =
- M_int64[k][l] + (avg_square_sum - (int64_t)avg * (sumX + sumY[k][l]));
- int64_t *H_ = H + idx0 * wiener_win2;
- int64_t *H_int_ = &H_int64[idx0][0];
- for (m = 0; m < wiener_win; m++) {
- for (n = 0; n < wiener_win; n++) {
- H_[m * wiener_win + n] = H_int_[n * 8 + m] + avg_square_sum -
- (int64_t)avg * (sumY[k][l] + sumY[n][m]);
- }
- }
- }
- }
-}
-
#if CONFIG_AV1_HIGHBITDEPTH
static INLINE void acc_stat_highbd_avx2(int64_t *dst, const uint16_t *dgd,
const __m256i *shuffle,
@@ -537,188 +364,1173 @@
}
#endif // CONFIG_AV1_HIGHBITDEPTH
-static INLINE void acc_stat_win5_one_line_avx2(
- const uint8_t *dgd, const uint8_t *src, int h_start, int h_end,
- int dgd_stride, const __m128i *shuffle, int32_t *sumX,
- int32_t sumY[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA],
- int32_t M_int[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA],
- int32_t H_int[WIENER_WIN2_CHROMA][WIENER_WIN_CHROMA * 8]) {
- int j, k, l;
- const int wiener_win = WIENER_WIN_CHROMA;
- // Main loop handles two pixels at a time
- // We can assume that h_start is even, since it will always be aligned to
- // a tile edge + some number of restoration units, and both of those will
- // be 64-pixel aligned.
- // However, at the edge of the image, h_end may be odd, so we need to handle
- // that case correctly.
- assert(h_start % 2 == 0);
- const int h_end_even = h_end & ~1;
- const int has_odd_pixel = h_end & 1;
- for (j = h_start; j < h_end_even; j += 2) {
- const uint8_t X1 = src[j];
- const uint8_t X2 = src[j + 1];
- *sumX += X1 + X2;
- const uint8_t *dgd_ij = dgd + j;
- for (k = 0; k < wiener_win; k++) {
- const uint8_t *dgd_ijk = dgd_ij + k * dgd_stride;
- for (l = 0; l < wiener_win; l++) {
- int32_t *H_ = &H_int[(l * wiener_win + k)][0];
- const uint8_t D1 = dgd_ijk[l];
- const uint8_t D2 = dgd_ijk[l + 1];
- sumY[k][l] += D1 + D2;
- M_int[k][l] += D1 * X1 + D2 * X2;
+static INLINE void madd_and_accum_avx2(__m256i src, __m256i dgd, __m256i *sum) {
+ *sum = _mm256_add_epi32(*sum, _mm256_madd_epi16(src, dgd));
+}
- const __m256i kl =
- _mm256_cvtepu8_epi16(_mm_set1_epi16(loadu_int16(dgd_ijk + l)));
- acc_stat_avx2(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 3 * 8, dgd_ij + 3 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 4 * 8, dgd_ij + 4 * dgd_stride, shuffle, &kl);
- }
- }
+static INLINE __m256i convert_and_add_avx2(__m256i src) {
+ const __m256i s0 = _mm256_cvtepi32_epi64(_mm256_castsi256_si128(src));
+ const __m256i s1 = _mm256_cvtepi32_epi64(_mm256_extracti128_si256(src, 1));
+ return _mm256_add_epi64(s0, s1);
+}
+
+static INLINE __m256i hadd_four_32_to_64_avx2(__m256i src0, __m256i src1,
+ __m256i *src2, __m256i *src3) {
+ // 00 01 10 11 02 03 12 13
+ const __m256i s_0 = _mm256_hadd_epi32(src0, src1);
+ // 20 21 30 31 22 23 32 33
+ const __m256i s_1 = _mm256_hadd_epi32(*src2, *src3);
+ // 00+01 10+11 20+21 30+31 02+03 12+13 22+23 32+33
+ const __m256i s_2 = _mm256_hadd_epi32(s_0, s_1);
+ return convert_and_add_avx2(s_2);
+}
+
+static INLINE __m128i add_64bit_lvl_avx2(__m256i src0, __m256i src1) {
+ // 00 10 02 12
+ const __m256i t0 = _mm256_unpacklo_epi64(src0, src1);
+ // 01 11 03 13
+ const __m256i t1 = _mm256_unpackhi_epi64(src0, src1);
+ // 00+01 10+11 02+03 12+13
+ const __m256i sum = _mm256_add_epi64(t0, t1);
+ // 00+01 10+11
+ const __m128i sum0 = _mm256_castsi256_si128(sum);
+ // 02+03 12+13
+ const __m128i sum1 = _mm256_extracti128_si256(sum, 1);
+ // 00+01+02+03 10+11+12+13
+ return _mm_add_epi64(sum0, sum1);
+}
+
+static INLINE __m128i convert_32_to_64_add_avx2(__m256i src0, __m256i src1) {
+ // 00 01 02 03
+ const __m256i s0 = convert_and_add_avx2(src0);
+ // 10 11 12 13
+ const __m256i s1 = convert_and_add_avx2(src1);
+ return add_64bit_lvl_avx2(s0, s1);
+}
+
+static INLINE int32_t calc_sum_of_register(__m256i src) {
+ const __m128i src_l = _mm256_castsi256_si128(src);
+ const __m128i src_h = _mm256_extracti128_si256(src, 1);
+ const __m128i sum = _mm_add_epi32(src_l, src_h);
+ const __m128i dst0 = _mm_add_epi32(sum, _mm_srli_si128(sum, 8));
+ const __m128i dst1 = _mm_add_epi32(dst0, _mm_srli_si128(dst0, 4));
+ return _mm_cvtsi128_si32(dst1);
+}
+
+static INLINE void transpose_64bit_4x4_avx2(const __m256i *const src,
+ __m256i *const dst) {
+ // Unpack 64 bit elements. Goes from:
+ // src[0]: 00 01 02 03
+ // src[1]: 10 11 12 13
+ // src[2]: 20 21 22 23
+ // src[3]: 30 31 32 33
+ // to:
+ // reg0: 00 10 02 12
+ // reg1: 20 30 22 32
+ // reg2: 01 11 03 13
+ // reg3: 21 31 23 33
+ const __m256i reg0 = _mm256_unpacklo_epi64(src[0], src[1]);
+ const __m256i reg1 = _mm256_unpacklo_epi64(src[2], src[3]);
+ const __m256i reg2 = _mm256_unpackhi_epi64(src[0], src[1]);
+ const __m256i reg3 = _mm256_unpackhi_epi64(src[2], src[3]);
+
+ // Unpack 64 bit elements resulting in:
+ // dst[0]: 00 10 20 30
+ // dst[1]: 01 11 21 31
+ // dst[2]: 02 12 22 32
+ // dst[3]: 03 13 23 33
+ dst[0] = _mm256_inserti128_si256(reg0, _mm256_castsi256_si128(reg1), 1);
+ dst[1] = _mm256_inserti128_si256(reg2, _mm256_castsi256_si128(reg3), 1);
+ dst[2] = _mm256_inserti128_si256(reg1, _mm256_extracti128_si256(reg0, 1), 0);
+ dst[3] = _mm256_inserti128_si256(reg3, _mm256_extracti128_si256(reg2, 1), 0);
+}
+
+// When we load 32 values of int8_t type and need fewer than 32 values for
+// processing, the below mask is used to zero out the extra values.
+static const int8_t mask_8bit[32] = {
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 16 bytes
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,   // 16 bytes
+};
+
+// When we load 16 values of int16_t type and need fewer than 16 values for
+// processing, the below mask is used to zero out the extra values.
+static const int16_t mask_16bit[32] = {
+  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 16 elements (32 bytes)
+  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,   // 16 elements (32 bytes)
+};
+
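
The tables above implement masked tail loads: a full vector is always loaded, then ANDed with a mask taken at an offset into the table so that lanes past the row width contribute zero. A minimal sketch of the idea for 16-bit data, mirroring how mask_16bit is indexed above (helper name hypothetical):

#include <immintrin.h>
#include <stdint.h>

static const int16_t kMask16[32] = {
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
};

/* rem = width % 16, assumed 1..15: the first rem lanes survive the AND. */
static __m256i masked_tail_load16(const int16_t *src, int rem) {
  const __m256i m = _mm256_loadu_si256((const __m256i *)(kMask16 + 16 - rem));
  const __m256i v = _mm256_loadu_si256((const __m256i *)src);
  return _mm256_and_si256(v, m);
}
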
+static INLINE uint8_t calc_dgd_buf_avg_avx2(const uint8_t *src, int32_t h_start,
+ int32_t h_end, int32_t v_start,
+ int32_t v_end, int32_t stride) {
+ const uint8_t *src_temp = src + v_start * stride + h_start;
+ const __m256i zero = _mm256_setzero_si256();
+ const int32_t width = h_end - h_start;
+ const int32_t height = v_end - v_start;
+ const int32_t wd_beyond_mul32 = width & 31;
+ const int32_t wd_mul32 = width - wd_beyond_mul32;
+ __m128i mask_low, mask_high;
+ __m256i ss = zero;
+
+ // When the width is not a multiple of 32, a full 32 values are still loaded
+ // and the extra (beyond required) data is zeroed out using the below mask.
+ if (wd_beyond_mul32 >= 16) {
+ mask_low = _mm_set1_epi8(-1);
+ mask_high = _mm_loadu_si128((__m128i *)(&mask_8bit[32 - wd_beyond_mul32]));
+ } else {
+ mask_low = _mm_loadu_si128((__m128i *)(&mask_8bit[16 - wd_beyond_mul32]));
+ mask_high = _mm_setzero_si128();
}
- // If the width is odd, add in the final pixel
- if (has_odd_pixel) {
- const uint8_t X1 = src[j];
- *sumX += X1;
- const uint8_t *dgd_ij = dgd + j;
- for (k = 0; k < wiener_win; k++) {
- const uint8_t *dgd_ijk = dgd_ij + k * dgd_stride;
- for (l = 0; l < wiener_win; l++) {
- int32_t *H_ = &H_int[(l * wiener_win + k)][0];
- const uint8_t D1 = dgd_ijk[l];
- sumY[k][l] += D1;
- M_int[k][l] += D1 * X1;
+ const __m256i mask =
+ _mm256_inserti128_si256(_mm256_castsi128_si256(mask_low), mask_high, 1);
- // The `acc_stat_avx2` function wants its input to have interleaved
- // copies of two pixels, but we only have one. However, the pixels
- // are (effectively) used as inputs to a multiply-accumulate.
- // So if we set the extra pixel slot to 0, then it is effectively
- // ignored.
- const __m256i kl = _mm256_cvtepu8_epi16(_mm_set1_epi16((int16_t)D1));
- acc_stat_avx2(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 3 * 8, dgd_ij + 3 * dgd_stride, shuffle, &kl);
- acc_stat_avx2(H_ + 4 * 8, dgd_ij + 4 * dgd_stride, shuffle, &kl);
- }
+ int32_t proc_ht = 0;
+ do {
+ // Process width in multiple of 32.
+ int32_t proc_wd = 0;
+ while (proc_wd < wd_mul32) {
+ const __m256i s_0 = _mm256_loadu_si256((__m256i *)(src_temp + proc_wd));
+ const __m256i sad_0 = _mm256_sad_epu8(s_0, zero);
+ ss = _mm256_add_epi32(ss, sad_0);
+ proc_wd += 32;
+ }
+
+ // Process the remaining width.
+ if (wd_beyond_mul32) {
+ const __m256i s_0 = _mm256_loadu_si256((__m256i *)(src_temp + proc_wd));
+ const __m256i s_m_0 = _mm256_and_si256(s_0, mask);
+ const __m256i sad_0 = _mm256_sad_epu8(s_m_0, zero);
+ ss = _mm256_add_epi32(ss, sad_0);
+ }
+ src_temp += stride;
+ proc_ht++;
+ } while (proc_ht < height);
+
+ const uint32_t sum = calc_sum_of_register(ss);
+ const uint8_t avg = sum / (width * height);
+ return avg;
+}
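
The averaging loop above relies on _mm256_sad_epu8 against zero to sum 32 unsigned bytes into four 64-bit partial sums in one instruction. A standalone sketch of that reduction (helper name hypothetical):

#include <immintrin.h>
#include <stdint.h>

static uint32_t sum32_u8_sketch(const uint8_t *p) {
  const __m256i v = _mm256_loadu_si256((const __m256i *)p);
  /* SAD against zero: |x - 0| summed per 8-byte group. */
  const __m256i s = _mm256_sad_epu8(v, _mm256_setzero_si256());
  const __m128i lo = _mm256_castsi256_si128(s);
  const __m128i hi = _mm256_extracti128_si256(s, 1);
  __m128i t = _mm_add_epi64(lo, hi);
  t = _mm_add_epi64(t, _mm_srli_si128(t, 8));
  return (uint32_t)_mm_cvtsi128_si32(t);
}
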
+
+// Fill the (src-avg) or (dgd-avg) buffers. Note that when n = (width % 16) is
+// not 0, (16 - n) more values than required are written.
+static INLINE void sub_avg_block_avx2(const uint8_t *src, int32_t src_stride,
+ uint8_t avg, int32_t width,
+ int32_t height, int16_t *dst,
+ int32_t dst_stride,
+ int use_downsampled_wiener_stats) {
+ const __m256i avg_reg = _mm256_set1_epi16(avg);
+
+ int32_t proc_ht = 0;
+ do {
+ int ds_factor =
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
+ if (use_downsampled_wiener_stats &&
+ (height - proc_ht < WIENER_STATS_DOWNSAMPLE_FACTOR)) {
+ ds_factor = height - proc_ht;
+ }
+
+ int32_t proc_wd = 0;
+ while (proc_wd < width) {
+ const __m128i s = _mm_loadu_si128((__m128i *)(src + proc_wd));
+ const __m256i ss = _mm256_cvtepu8_epi16(s);
+ const __m256i d = _mm256_sub_epi16(ss, avg_reg);
+ _mm256_storeu_si256((__m256i *)(dst + proc_wd), d);
+ proc_wd += 16;
+ }
+
+ src += ds_factor * src_stride;
+ dst += ds_factor * dst_stride;
+ proc_ht += ds_factor;
+ } while (proc_ht < height);
+}
+
+// Fills the lower-triangular elements of the H buffer from its
+// upper-triangular elements.
+static INLINE void fill_lower_triag_elements_avx2(const int32_t wiener_win2,
+ int64_t *const H) {
+ for (int32_t i = 0; i < wiener_win2 - 1; i += 4) {
+ __m256i in[4], out[4];
+
+ in[0] = _mm256_loadu_si256((__m256i *)(H + (i + 0) * wiener_win2 + i + 1));
+ in[1] = _mm256_loadu_si256((__m256i *)(H + (i + 1) * wiener_win2 + i + 1));
+ in[2] = _mm256_loadu_si256((__m256i *)(H + (i + 2) * wiener_win2 + i + 1));
+ in[3] = _mm256_loadu_si256((__m256i *)(H + (i + 3) * wiener_win2 + i + 1));
+
+ transpose_64bit_4x4_avx2(in, out);
+
+ _mm_storel_epi64((__m128i *)(H + (i + 1) * wiener_win2 + i),
+ _mm256_castsi256_si128(out[0]));
+ _mm_storeu_si128((__m128i *)(H + (i + 2) * wiener_win2 + i),
+ _mm256_castsi256_si128(out[1]));
+ _mm256_storeu_si256((__m256i *)(H + (i + 3) * wiener_win2 + i), out[2]);
+ _mm256_storeu_si256((__m256i *)(H + (i + 4) * wiener_win2 + i), out[3]);
+
+ for (int32_t j = i + 5; j < wiener_win2; j += 4) {
+ in[0] = _mm256_loadu_si256((__m256i *)(H + (i + 0) * wiener_win2 + j));
+ in[1] = _mm256_loadu_si256((__m256i *)(H + (i + 1) * wiener_win2 + j));
+ in[2] = _mm256_loadu_si256((__m256i *)(H + (i + 2) * wiener_win2 + j));
+ in[3] = _mm256_loadu_si256((__m256i *)(H + (i + 3) * wiener_win2 + j));
+
+ transpose_64bit_4x4_avx2(in, out);
+
+ _mm256_storeu_si256((__m256i *)(H + (j + 0) * wiener_win2 + i), out[0]);
+ _mm256_storeu_si256((__m256i *)(H + (j + 1) * wiener_win2 + i), out[1]);
+ _mm256_storeu_si256((__m256i *)(H + (j + 2) * wiener_win2 + i), out[2]);
+ _mm256_storeu_si256((__m256i *)(H + (j + 3) * wiener_win2 + i), out[3]);
}
}
}
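
For reference, the symmetric fill that fill_lower_triag_elements_avx2() performs with 4x4 64-bit transposes reduces to this scalar loop (a sketch, not the libaom API):

#include <stdint.h>

/* H is an n x n matrix stored row-major; mirror the upper triangle. */
static void fill_lower_triangle_sketch(int n, int64_t *H) {
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      H[j * n + i] = H[i * n + j];
}
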
-static INLINE void compute_stats_win5_opt_avx2(
- const uint8_t *dgd, const uint8_t *src, int h_start, int h_end, int v_start,
- int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H,
- int use_downsampled_wiener_stats) {
- int i, j, k, l, m, n;
- const int wiener_win = WIENER_WIN_CHROMA;
- const int pixel_count = (h_end - h_start) * (v_end - v_start);
- const int wiener_win2 = wiener_win * wiener_win;
- const int wiener_halfwin = (wiener_win >> 1);
- uint8_t avg = find_average(dgd, h_start, h_end, v_start, v_end, dgd_stride);
-
- int32_t M_int32[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA] = { { 0 } };
- int32_t M_int32_row[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA] = { { 0 } };
- int64_t M_int64[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA] = { { 0 } };
- DECLARE_ALIGNED(
- 32, int32_t,
- H_int32[WIENER_WIN2_CHROMA][WIENER_WIN_CHROMA * 8]) = { { 0 } };
- DECLARE_ALIGNED(
- 32, int32_t,
- H_int32_row[WIENER_WIN2_CHROMA][WIENER_WIN_CHROMA * 8]) = { { 0 } };
- int64_t H_int64[WIENER_WIN2_CHROMA][WIENER_WIN_CHROMA * 8] = { { 0 } };
- int32_t sumY[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA] = { { 0 } };
- int32_t sumX = 0;
- const uint8_t *dgd_win = dgd - wiener_halfwin * dgd_stride - wiener_halfwin;
- int downsample_factor =
- use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
- int32_t sumX_row = 0;
- int32_t sumY_row[WIENER_WIN_CHROMA][WIENER_WIN_CHROMA] = { { 0 } };
-
- const __m128i shuffle = xx_loadu_128(g_shuffle_stats_data);
- for (j = v_start; j < v_end; j += 64) {
- const int vert_end = AOMMIN(64, v_end - j) + j;
- for (i = j; i < vert_end; i = i + downsample_factor) {
- if (use_downsampled_wiener_stats &&
- (vert_end - i < WIENER_STATS_DOWNSAMPLE_FACTOR)) {
- downsample_factor = vert_end - i;
- }
- sumX_row = 0;
- memset(sumY_row, 0,
- sizeof(int32_t) * WIENER_WIN_CHROMA * WIENER_WIN_CHROMA);
- memset(M_int32_row, 0,
- sizeof(int32_t) * WIENER_WIN_CHROMA * WIENER_WIN_CHROMA);
- memset(H_int32_row, 0,
- sizeof(int32_t) * WIENER_WIN2_CHROMA * (WIENER_WIN_CHROMA * 8));
- acc_stat_win5_one_line_avx2(
- dgd_win + i * dgd_stride, src + i * src_stride, h_start, h_end,
- dgd_stride, &shuffle, &sumX_row, sumY_row, M_int32_row, H_int32_row);
- sumX += sumX_row * downsample_factor;
-
- // Scale M matrix based on the downsampling factor
- for (k = 0; k < wiener_win; ++k) {
- for (l = 0; l < wiener_win; ++l) {
- sumY[k][l] += (sumY_row[k][l] * downsample_factor);
- M_int32[k][l] += (M_int32_row[k][l] * downsample_factor);
- }
- }
- // Scale H matrix based on the downsampling factor
- for (k = 0; k < WIENER_WIN2_CHROMA; ++k) {
- for (l = 0; l < WIENER_WIN_CHROMA * 8; ++l) {
- H_int32[k][l] += (H_int32_row[k][l] * downsample_factor);
- }
- }
- }
- for (k = 0; k < wiener_win; ++k) {
- for (l = 0; l < wiener_win; ++l) {
- M_int64[k][l] += M_int32[k][l];
- M_int32[k][l] = 0;
- }
- }
- for (k = 0; k < WIENER_WIN2_CHROMA; ++k) {
- for (l = 0; l < WIENER_WIN_CHROMA * 8; ++l) {
- H_int64[k][l] += H_int32[k][l];
- H_int32[k][l] = 0;
- }
- }
+// Fill H buffer based on loop_count.
+#define INIT_H_VALUES(d, loop_count) \
+ for (int g = 0; g < (loop_count); g++) { \
+ const __m256i dgd0 = \
+ _mm256_loadu_si256((__m256i *)((d) + (g * d_stride))); \
+ madd_and_accum_avx2(dgd_mul_df, dgd0, &sum_h[g]); \
}
- const int64_t avg_square_sum = (int64_t)avg * (int64_t)avg * pixel_count;
- for (k = 0; k < wiener_win; k++) {
- for (l = 0; l < wiener_win; l++) {
- const int32_t idx0 = l * wiener_win + k;
- M[idx0] =
- M_int64[k][l] + (avg_square_sum - (int64_t)avg * (sumX + sumY[k][l]));
- int64_t *H_ = H + idx0 * wiener_win2;
- int64_t *H_int_ = &H_int64[idx0][0];
- for (m = 0; m < wiener_win; m++) {
- for (n = 0; n < wiener_win; n++) {
- H_[m * wiener_win + n] = H_int_[n * 8 + m] + avg_square_sum -
- (int64_t)avg * (sumY[k][l] + sumY[n][m]);
- }
- }
- }
+// Fill M & H buffer.
+#define INIT_MH_VALUES(d) \
+ for (int g = 0; g < wiener_win; g++) { \
+ const __m256i dgds_0 = \
+ _mm256_loadu_si256((__m256i *)((d) + (g * d_stride))); \
+ madd_and_accum_avx2(src_mul_df, dgds_0, &sum_m[g]); \
+ madd_and_accum_avx2(dgd_mul_df, dgds_0, &sum_h[g]); \
}
+
+// Derive the 'j' iteration index, update the dgd pointers appropriately, and
+// zero the sum_h accumulators.
+#define INITIALIZATION(wiener_window_sz) \
+ j = i / (wiener_window_sz); \
+ const int16_t *d_window = d + j; \
+ const int16_t *d_current_row = \
+ d + j + ((i % (wiener_window_sz)) * d_stride); \
+ int proc_ht = v_start; \
+ downsample_factor = \
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1; \
+ __m256i sum_h[wiener_window_sz]; \
+ memset(sum_h, 0, sizeof(sum_h));
+
+// Update the downsample factor appropriately.
+#define UPDATE_DOWNSAMPLE_FACTOR \
+ int proc_wd = 0; \
+ if (use_downsampled_wiener_stats && \
+ ((v_end - proc_ht) < WIENER_STATS_DOWNSAMPLE_FACTOR)) { \
+ downsample_factor = v_end - proc_ht; \
+ } \
+ const __m256i df_reg = _mm256_set1_epi16(downsample_factor);
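
UPDATE_DOWNSAMPLE_FACTOR implements the row subsampling used for downsampled Wiener stats: one row out of every WIENER_STATS_DOWNSAMPLE_FACTOR is visited and its contribution is scaled by the factor (via df_reg), with the factor shrunk near v_end so the tail rows are not over-weighted. A scalar sketch of the traversal (helper name hypothetical):

#include <stdint.h>

static int64_t downsampled_sum_sketch(const int16_t *p, int stride,
                                      int v_start, int v_end, int df0) {
  int64_t sum = 0;
  int df = df0;
  for (int r = v_start; r < v_end; r += df) {
    if (v_end - r < df0) df = v_end - r; /* partial tail group */
    sum += (int64_t)p[r * stride] * df;  /* weight the sampled row by df */
  }
  return sum;
}
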
+
+#define CALCULATE_REMAINING_H_WIN5 \
+ while (j < wiener_win) { \
+ d_window = d; \
+ d_current_row = d + (i / wiener_win) + ((i % wiener_win) * d_stride); \
+ const __m256i zero = _mm256_setzero_si256(); \
+ sum_h[0] = zero; \
+ sum_h[1] = zero; \
+ sum_h[2] = zero; \
+ sum_h[3] = zero; \
+ sum_h[4] = zero; \
+ \
+ proc_ht = v_start; \
+ downsample_factor = \
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1; \
+ do { \
+ UPDATE_DOWNSAMPLE_FACTOR; \
+ \
+ /* Process the amount of width multiple of 16.*/ \
+ while (proc_wd < wd_mul16) { \
+ const __m256i dgd = \
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd)); \
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg); \
+ INIT_H_VALUES(d_window + j + proc_wd, 5) \
+ \
+ proc_wd += 16; \
+ }; \
+ \
+ /* Process the remaining width here. */ \
+ if (wd_beyond_mul16) { \
+ const __m256i dgd = \
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd)); \
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask); \
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg); \
+ INIT_H_VALUES(d_window + j + proc_wd, 5) \
+ } \
+ proc_ht += downsample_factor; \
+ d_window += downsample_factor * d_stride; \
+ d_current_row += downsample_factor * d_stride; \
+ } while (proc_ht < v_end); \
+ const __m256i s_h0 = \
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]); \
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + (wiener_win * j)), \
+ s_h0); \
+ const __m256i s_m_h = convert_and_add_avx2(sum_h[4]); \
+ const __m128i s_m_h0 = add_64bit_lvl_avx2(s_m_h, s_m_h); \
+ _mm_storel_epi64( \
+ (__m128i *)(H + (i * wiener_win2) + (wiener_win * j) + 4), s_m_h0); \
+ j++; \
+ }
+
+#define CALCULATE_REMAINING_H_WIN7 \
+ while (j < wiener_win) { \
+ d_window = d; \
+ d_current_row = d + (i / wiener_win) + ((i % wiener_win) * d_stride); \
+ const __m256i zero = _mm256_setzero_si256(); \
+ sum_h[0] = zero; \
+ sum_h[1] = zero; \
+ sum_h[2] = zero; \
+ sum_h[3] = zero; \
+ sum_h[4] = zero; \
+ sum_h[5] = zero; \
+ sum_h[6] = zero; \
+ \
+ proc_ht = v_start; \
+ downsample_factor = \
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1; \
+ do { \
+ UPDATE_DOWNSAMPLE_FACTOR; \
+ \
+ /* Process the amount of width multiple of 16.*/ \
+ while (proc_wd < wd_mul16) { \
+ const __m256i dgd = \
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd)); \
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg); \
+ INIT_H_VALUES(d_window + j + proc_wd, 7) \
+ \
+ proc_wd += 16; \
+ }; \
+ \
+ /* Process the remaining width here. */ \
+ if (wd_beyond_mul16) { \
+ const __m256i dgd = \
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd)); \
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask); \
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg); \
+ INIT_H_VALUES(d_window + j + proc_wd, 7) \
+ } \
+ proc_ht += downsample_factor; \
+ d_window += downsample_factor * d_stride; \
+ d_current_row += downsample_factor * d_stride; \
+ } while (proc_ht < v_end); \
+ const __m256i s_h1 = \
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]); \
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + (wiener_win * j)), \
+ s_h1); \
+ const __m256i s_h2 = \
+ hadd_four_32_to_64_avx2(sum_h[4], sum_h[5], &sum_h[6], &sum_h[6]); \
+ _mm256_storeu_si256( \
+ (__m256i *)(H + (i * wiener_win2) + (wiener_win * j) + 4), s_h2); \
+ j++; \
+ }
+
+// The buffers H (auto-covariance) and M (cross-correlation) are used to
+// estimate the filter tap values required for wiener filtering. Here, the
+// buffer H is of size ((wiener_window_size^2)*(wiener_window_size^2)) and M is
+// of size (wiener_window_size*wiener_window_size). H is a symmetric matrix
+// where the values above the diagonal (upper triangle) are equal to the values
+// below the diagonal (lower triangle). The calculation of the elements/stats
+// of H (upper triangle) and M is done in steps as described below, where each
+// step fills specific values of H and M.
+// Once the upper-triangular elements of the H matrix are derived, they are
+// copied to the lower triangle using the function
+// fill_lower_triag_elements_avx2().
+// Example:
+// Wiener window size = WIENER_WIN_CHROMA (5)
+// M buffer = [M0 M1 M2 ---- M23 M24]
+// H buffer = Hxy (x-row, y-column)
+//            [H00  H01  H02  ---- H023  H024]
+//            [H10  H11  H12  ---- H123  H124]
+//            [H20  H21  H22  ---- H223  H224]
+//            [H30  H31  H32  ---- H323  H324]
+//            [H40  H41  H42  ---- H423  H424]
+//            [H50  H51  H52  ---- H523  H524]
+//            [H60  H61  H62  ---- H623  H624]
+//            ||
+//            ||
+//            [H230 H231 H232 ---- H2323 H2324]
+//            [H240 H241 H242 ---- H2423 H2424]
+// In Step 1, the whole M buffer (i.e., M0 to M24) and the first row of H
+// (i.e., H00 to H024) are filled. The remaining rows of the H buffer are
+// filled through steps 2 to 6.
+static void compute_stats_win5_avx2(const int16_t *const d, int32_t d_stride,
+ const int16_t *const s, int32_t s_stride,
+ int32_t width, int v_start, int v_end,
+ int64_t *const M, int64_t *const H,
+ int use_downsampled_wiener_stats) {
+ const int32_t wiener_win = WIENER_WIN_CHROMA;
+ const int32_t wiener_win2 = wiener_win * wiener_win;
+ // Amount of width which is beyond a multiple of 16. This remainder is
+ // handled separately so that only the required width is processed towards
+ // the end.
+ const int32_t wd_mul16 = width & ~15;
+ const int32_t wd_beyond_mul16 = width - wd_mul16;
+ const __m256i mask =
+ _mm256_loadu_si256((__m256i *)(&mask_16bit[16 - wd_beyond_mul16]));
+ int downsample_factor;
+
+ // Step 1: Full M (i.e., M0 to M24) and first-row H (i.e., H00 to H024)
+ // values are filled here. The loop over 'j' is executed for values 0 to 4
+ // (wiener_win - 1). When the loop executes for a specific 'j', 5 values of
+ // M and H are filled as shown below.
+ // j=0: M0-M4 and H00-H04, j=1: M5-M9 and H05-H09 are filled, etc.
+ int j = 0;
+ do {
+ const int16_t *s_t = s;
+ const int16_t *d_t = d;
+ __m256i sum_m[WIENER_WIN_CHROMA] = { _mm256_setzero_si256() };
+ __m256i sum_h[WIENER_WIN_CHROMA] = { _mm256_setzero_si256() };
+ downsample_factor =
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
+ int proc_ht = v_start;
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i src = _mm256_loadu_si256((__m256i *)(s_t + proc_wd));
+ const __m256i dgd = _mm256_loadu_si256((__m256i *)(d_t + proc_wd));
+ const __m256i src_mul_df = _mm256_mullo_epi16(src, df_reg);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_MH_VALUES(d_t + j + proc_wd)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i src = _mm256_loadu_si256((__m256i *)(s_t + proc_wd));
+ const __m256i dgd = _mm256_loadu_si256((__m256i *)(d_t + proc_wd));
+ const __m256i src_mask = _mm256_and_si256(src, mask);
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i src_mul_df = _mm256_mullo_epi16(src_mask, df_reg);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_MH_VALUES(d_t + j + proc_wd)
+ }
+ proc_ht += downsample_factor;
+ s_t += downsample_factor * s_stride;
+ d_t += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+
+ const __m256i s_m =
+ hadd_four_32_to_64_avx2(sum_m[0], sum_m[1], &sum_m[2], &sum_m[3]);
+ const __m128i s_m_h = convert_32_to_64_add_avx2(sum_m[4], sum_h[4]);
+ _mm256_storeu_si256((__m256i *)(M + wiener_win * j), s_m);
+ _mm_storel_epi64((__m128i *)&M[wiener_win * j + 4], s_m_h);
+
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + wiener_win * j), s_h);
+ _mm_storeh_epi64((__m128i *)&H[wiener_win * j + 4], s_m_h);
+ } while (++j < wiener_win);
+
+ // The below steps are designed to fill the remaining rows of the H buffer.
+ // The aim is to fill only the upper-triangle elements corresponding to each
+ // row; the lower-triangle elements are then copied from the upper-triangle
+ // elements. Also, as mentioned in Step 1, the core function is designed to
+ // fill 5 elements/stats/values of the H buffer at a time.
+ //
+ // Step 2: Here, the rows 1, 6, 11, 16 and 21 are filled. As we need to fill
+ // only upper-triangle elements, H10 from row1, and H60-H64 and H65 from
+ // row6, etc., need not be filled. As the core function processes 5 values,
+ // in the first iteration of 'j' only 4 values are to be filled, i.e.,
+ // H11-H14 from row1, H66-H69 from row6, etc.
+ for (int i = 1; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN_CHROMA)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (1 * d_stride), 4)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (1 * d_stride), 4)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN5
+ }
+
+ // Step 3: Here, the rows 2, 7, 12, 17 and 22 are filled. As we need to fill
+ // only upper-triangle elements, H20-H21 from row2, and H70-H74 and H75-H76
+ // from row7, etc., need not be filled. As the core function processes 5
+ // values, in the first iteration of 'j' only 3 values are to be filled,
+ // i.e., H22-H24 from row2, H77-H79 from row7, etc.
+ for (int i = 2; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN_CHROMA)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (2 * d_stride), 3)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (2 * d_stride), 3)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN5
+ }
+
+ // Step 4: Here, the rows 3, 8, 13, 18 and 23 are filled. As we need to fill
+ // only upper-triangle elements, H30-H32 from row3, and H80-H84 and H85-H87
+ // from row8, etc., need not be filled. As the core function processes 5
+ // values, in the first iteration of 'j' only 2 values are to be filled,
+ // i.e., H33-H34 from row3, H88-H89 from row8, etc.
+ for (int i = 3; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN_CHROMA)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (3 * d_stride), 2)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (3 * d_stride), 2)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m128i s_h = convert_32_to_64_add_avx2(sum_h[0], sum_h[1]);
+ _mm_storeu_si128((__m128i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN5
+ }
+
+ // Step 5: Here, the rows 4, 9, 14, 19 and 24 are filled. As we need to fill
+ // only upper-triangle elements, H40-H43 from row4, and H90-H94 and H95-H98
+ // from row9, etc., need not be filled. As the core function processes 5
+ // values, in the first iteration of 'j' only 1 value is to be filled, i.e.,
+ // H44 from row4, H99 from row9, etc.
+ for (int i = 4; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN_CHROMA)
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (4 * d_stride), 1)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (4 * d_stride), 1)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m128i s_h = convert_32_to_64_add_avx2(sum_h[0], sum_h[1]);
+ _mm_storeu_si128((__m128i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN5
+ }
+
+ // Step 6: Here, the rows 5, 10, 15 and 20 are filled. As we need to fill
+ // only upper-triangle elements, H50-H54 from row5, and H100-H104 and
+ // H105-H109 from row10, etc., need not be filled. The first iteration of 'j'
+ // fills H55-H59 from row5, H1010-H1014 from row10, etc.
+ for (int i = 5; i < wiener_win2; i += wiener_win) {
+ // Derive j'th iteration from where the H buffer filling needs to be
+ // started.
+ j = i / wiener_win;
+ int shift = 0;
+ do {
+ // Update the dgd pointers appropriately.
+ int proc_ht = v_start;
+ const int16_t *d_window = d + (i / wiener_win);
+ const int16_t *d_current_row =
+ d + (i / wiener_win) + ((i % wiener_win) * d_stride);
+ downsample_factor =
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
+ __m256i sum_h[WIENER_WIN_CHROMA] = { _mm256_setzero_si256() };
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + shift + proc_wd, 5)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + shift + proc_wd, 5)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + (wiener_win * j)),
+ s_h);
+ const __m256i s_m_h = convert_and_add_avx2(sum_h[4]);
+ const __m128i s_m_h0 = add_64bit_lvl_avx2(s_m_h, s_m_h);
+ _mm_storel_epi64(
+ (__m128i *)(H + (i * wiener_win2) + (wiener_win * j) + 4), s_m_h0);
+ shift++;
+ } while (++j < wiener_win);
+ }
+
+ fill_lower_triag_elements_avx2(wiener_win2, H);
+}
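
The statistics that compute_stats_win5_avx2() (and compute_stats_win7_avx2() below) accumulate reduce to the following scalar form: for each source pixel, M gathers window*src products and H gathers the pairwise window*window products, upper triangle only. A sketch with simplified index ordering (d and s are the avg-subtracted dgd/src buffers; not the exact libaom ordering):

#include <stdint.h>

static void stats_sketch(const int16_t *d, int d_stride, const int16_t *s,
                         int s_stride, int width, int height, int win,
                         int64_t *M, int64_t *H) {
  const int win2 = win * win; /* win <= 7, so win2 <= 49 */
  for (int r = 0; r < height; r++) {
    for (int c = 0; c < width; c++) {
      int16_t w[49];
      int idx = 0;
      for (int k = 0; k < win; k++)
        for (int l = 0; l < win; l++)
          w[idx++] = d[(r + k) * d_stride + c + l];
      const int64_t X = s[r * s_stride + c];
      for (int i = 0; i < win2; i++) {
        M[i] += w[i] * X;
        for (int j = i; j < win2; j++) /* upper triangle only */
          H[i * win2 + j] += (int64_t)w[i] * w[j];
      }
    }
  }
}
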
+
+// The buffers H (auto-covariance) and M (cross-correlation) are used to
+// estimate the filter tap values required for wiener filtering. Here, the
+// buffer H is of size ((wiener_window_size^2)*(wiener_window_size^2)) and M is
+// of size (wiener_window_size*wiener_window_size). H is a symmetric matrix
+// where the values above the diagonal (upper triangle) are equal to the values
+// below the diagonal (lower triangle). The calculation of the elements/stats
+// of H (upper triangle) and M is done in steps as described below, where each
+// step fills specific values of H and M.
+// Example:
+// Wiener window size = WIENER_WIN (7)
+// M buffer = [M0 M1 M2 ---- M47 M48]
+// H buffer = Hxy (x-row, y-column)
+//            [H00  H01  H02  ---- H047  H048]
+//            [H10  H11  H12  ---- H147  H148]
+//            [H20  H21  H22  ---- H247  H248]
+//            [H30  H31  H32  ---- H347  H348]
+//            [H40  H41  H42  ---- H447  H448]
+//            [H50  H51  H52  ---- H547  H548]
+//            [H60  H61  H62  ---- H647  H648]
+//            ||
+//            ||
+//            [H470 H471 H472 ---- H4747 H4748]
+//            [H480 H481 H482 ---- H4847 H4848]
+// In Step 1, the whole M buffer (i.e., M0 to M48) and the first row of H
+// (i.e., H00 to H048) are filled. The remaining rows of the H buffer are
+// filled through steps 2 to 8.
+static void compute_stats_win7_avx2(const int16_t *const d, int32_t d_stride,
+ const int16_t *const s, int32_t s_stride,
+ int32_t width, int v_start, int v_end,
+ int64_t *const M, int64_t *const H,
+ int use_downsampled_wiener_stats) {
+ const int32_t wiener_win = WIENER_WIN;
+ const int32_t wiener_win2 = wiener_win * wiener_win;
+ // Amount of width which is beyond a multiple of 16. This remainder is
+ // handled separately so that only the required width is processed towards
+ // the end.
+ const int32_t wd_mul16 = width & ~15;
+ const int32_t wd_beyond_mul16 = width - wd_mul16;
+ const __m256i mask =
+ _mm256_loadu_si256((__m256i *)(&mask_16bit[16 - wd_beyond_mul16]));
+ int downsample_factor;
+
+ // Step 1: Full M (i.e., M0 to M48) and first-row H (i.e., H00 to H048)
+ // values are filled here. The loop over 'j' is executed for values 0 to 6.
+ // When the loop executes for a specific 'j', 7 values of M and H are filled
+ // as shown below.
+ // j=0: M0-M6 and H00-H06, j=1: M7-M13 and H07-H013 are filled, etc.
+ int j = 0;
+ do {
+ const int16_t *s_t = s;
+ const int16_t *d_t = d;
+ __m256i sum_m[WIENER_WIN] = { _mm256_setzero_si256() };
+ __m256i sum_h[WIENER_WIN] = { _mm256_setzero_si256() };
+ downsample_factor =
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
+ int proc_ht = v_start;
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i src = _mm256_loadu_si256((__m256i *)(s_t + proc_wd));
+ const __m256i dgd = _mm256_loadu_si256((__m256i *)(d_t + proc_wd));
+ const __m256i src_mul_df = _mm256_mullo_epi16(src, df_reg);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_MH_VALUES(d_t + j + proc_wd)
+
+ proc_wd += 16;
+ }
+
+ if (wd_beyond_mul16) {
+ const __m256i src = _mm256_loadu_si256((__m256i *)(s_t + proc_wd));
+ const __m256i dgd = _mm256_loadu_si256((__m256i *)(d_t + proc_wd));
+ const __m256i src_mask = _mm256_and_si256(src, mask);
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i src_mul_df = _mm256_mullo_epi16(src_mask, df_reg);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_MH_VALUES(d_t + j + proc_wd)
+ }
+ proc_ht += downsample_factor;
+ s_t += downsample_factor * s_stride;
+ d_t += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+
+ const __m256i s_m0 =
+ hadd_four_32_to_64_avx2(sum_m[0], sum_m[1], &sum_m[2], &sum_m[3]);
+ const __m256i s_m1 =
+ hadd_four_32_to_64_avx2(sum_m[4], sum_m[5], &sum_m[6], &sum_m[6]);
+ _mm256_storeu_si256((__m256i *)(M + wiener_win * j + 0), s_m0);
+ _mm_storeu_si128((__m128i *)(M + wiener_win * j + 4),
+ _mm256_castsi256_si128(s_m1));
+ _mm_storel_epi64((__m128i *)&M[wiener_win * j + 6],
+ _mm256_extracti128_si256(s_m1, 1));
+
+ const __m256i sh_0 =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ const __m256i sh_1 =
+ hadd_four_32_to_64_avx2(sum_h[4], sum_h[5], &sum_h[6], &sum_h[6]);
+ _mm256_storeu_si256((__m256i *)(H + wiener_win * j + 0), sh_0);
+ _mm_storeu_si128((__m128i *)(H + wiener_win * j + 4),
+ _mm256_castsi256_si128(sh_1));
+ _mm_storel_epi64((__m128i *)&H[wiener_win * j + 6],
+ _mm256_extracti128_si256(sh_1, 1));
+ } while (++j < wiener_win);
+
+ // The below steps are designed to fill the remaining rows of the H buffer.
+ // The aim is to fill only the upper-triangle elements corresponding to each
+ // row; the lower-triangle elements are then copied from the upper-triangle
+ // elements. Also, as mentioned in Step 1, the core function is designed to
+ // fill 7 elements/stats/values of the H buffer at a time.
+ //
+ // Step 2: Here, the rows 1, 8, 15, 22, 29, 36 and 43 are filled. As we need
+ // to fill only upper-triangle elements, H10 from row1, and H80-H86 and H87
+ // from row8, etc., need not be filled. As the core function processes 7
+ // values, in the first iteration of 'j' only 6 values are to be filled,
+ // i.e., H11-H16 from row1 and H88-H813 from row8, etc.
+ for (int i = 1; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (1 * d_stride), 6)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (1 * d_stride), 6)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+ const __m128i s_h0 = convert_32_to_64_add_avx2(sum_h[4], sum_h[5]);
+ _mm_storeu_si128((__m128i *)(H + (i * wiener_win2) + i + 4), s_h0);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 3: Here, the rows 2, 9, 16, 23, 30, 37 and 44 are filled. As we need
+ // to fill only upper-triangle elements, H20-H21 from row2, and H90-H96 and
+ // H97-H98 from row9, etc., need not be filled. As the core function
+ // processes 7 values, in the first iteration of 'j' only 5 values are to be
+ // filled, i.e., H22-H26 from row2 and H99-H913 from row9, etc.
+ for (int i = 2; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN)
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (2 * d_stride), 5)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (2 * d_stride), 5)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+ const __m256i s_m_h = convert_and_add_avx2(sum_h[4]);
+ const __m128i s_m_h0 = add_64bit_lvl_avx2(s_m_h, s_m_h);
+ _mm_storel_epi64((__m128i *)(H + (i * wiener_win2) + i + 4), s_m_h0);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 4: Here, the rows 3, 10, 17, 24, 31, 38 and 45 are filled. As we need
+ // to fill only upper-triangle elements, H30-H32 from row3, and H100-H106 and
+ // H107-H109 from row10, etc., need not be filled. As the core function
+ // processes 7 values, in the first iteration of 'j' only 4 values are to be
+ // filled, i.e., H33-H36 from row3 and H1010-H1013 from row10, etc.
+ for (int i = 3; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and also derive the 'j'th iteration
+ // from where the H buffer filling needs to be started.
+ INITIALIZATION(WIENER_WIN)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the amount of width multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (3 * d_stride), 4)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (3 * d_stride), 4)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 5: Here, the rows 4, 11, 18, 25, 32, 39 and 46 are filled. As we need
+ // to fill only upper-triangle elements, H40-H43 from row4, and H110-H116 and
+ // H117-H1110 from row11, etc., need not be filled. As the core function
+ // processes 7 values, in the first iteration of 'j' only 3 values are to be
+ // filled, i.e., H44-H46 from row4 and H1111-H1113 from row11, etc.
+ for (int i = 4; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and derive the 'j' iteration at
+ // which the H buffer filling needs to start.
+ INITIALIZATION(WIENER_WIN)
+
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the width that is a multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (4 * d_stride), 3)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (4 * d_stride), 3)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 6: Here, the rows 5, 12, 19, 26, 33, 40 and 47 are filled. As we need
+ // to fill only upper-triangle elements, H50-H54 from row5, H120-H126 and
+ // H127-H1211 from row12, etc. need not be filled. As the core function
+ // processes 7 values, in the first iteration of 'j' only 2 values need to be
+ // filled, i.e., H55-H56 from row5 and H1212-H1213 from row12, etc.
+ for (int i = 5; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and derive the 'j' iteration at
+ // which the H buffer filling needs to start.
+ INITIALIZATION(WIENER_WIN)
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the width that is a multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (5 * d_stride), 2)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (5 * d_stride), 2)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + i), s_h);
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 7: Here, the rows 6, 13, 20, 27, 34, 41 and 48 are filled. As we need
+ // to fill only upper-triangle elements, H60-H65 from row6, H130-H136 and
+ // H137-H1312 from row13, etc. need not be filled. As the core function
+ // processes 7 values, in the first iteration of 'j' only 1 value needs to be
+ // filled, i.e., H66 from row6 and H1313 from row13, etc.
+ for (int i = 6; i < wiener_win2; i += wiener_win) {
+ // Update the dgd pointers appropriately and derive the 'j' iteration at
+ // which the H buffer filling needs to start.
+ INITIALIZATION(WIENER_WIN)
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the width that is a multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (6 * d_stride), 1)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + proc_wd + (6 * d_stride), 1)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+ const __m256i s_h =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ xx_storel_64(&H[(i * wiener_win2) + i], _mm256_castsi256_si128(s_h));
+
+ // process the remaining 'j' iterations.
+ j++;
+ CALCULATE_REMAINING_H_WIN7
+ }
+
+ // Step 8: Here, the rows 7, 14, 21, 28, 35 and 42 are filled. As we need
+ // to fill only upper-triangle elements, H70-H76 from row7, H140-H146 and
+ // H147-H1413 from row14, etc. need not be filled. The first iteration of
+ // 'j' fills H77-H713 from row7 and H1414-H1420 from row14, etc.
+ for (int i = 7; i < wiener_win2; i += wiener_win) {
+ // Derive the 'j' iteration at which the H buffer filling needs to
+ // start.
+ j = i / wiener_win;
+ int shift = 0;
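+ // 'shift' is the horizontal offset of the window column that corresponds
+ // to the current 7-wide group of H entries.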
+ do {
+ // Update the dgd pointers appropriately.
+ int proc_ht = v_start;
+ const int16_t *d_window = d + (i / WIENER_WIN);
+ const int16_t *d_current_row =
+ d + (i / WIENER_WIN) + ((i % WIENER_WIN) * d_stride);
+ downsample_factor =
+ use_downsampled_wiener_stats ? WIENER_STATS_DOWNSAMPLE_FACTOR : 1;
+ __m256i sum_h[WIENER_WIN] = { _mm256_setzero_si256() };
+ do {
+ UPDATE_DOWNSAMPLE_FACTOR
+
+ // Process the width that is a multiple of 16.
+ while (proc_wd < wd_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd, df_reg);
+ INIT_H_VALUES(d_window + shift + proc_wd, 7)
+
+ proc_wd += 16;
+ }
+
+ // Process the remaining width here.
+ if (wd_beyond_mul16) {
+ const __m256i dgd =
+ _mm256_loadu_si256((__m256i *)(d_current_row + proc_wd));
+ const __m256i dgd_mask = _mm256_and_si256(dgd, mask);
+ const __m256i dgd_mul_df = _mm256_mullo_epi16(dgd_mask, df_reg);
+ INIT_H_VALUES(d_window + shift + proc_wd, 7)
+ }
+ proc_ht += downsample_factor;
+ d_window += downsample_factor * d_stride;
+ d_current_row += downsample_factor * d_stride;
+ } while (proc_ht < v_end);
+
+ const __m256i sh_0 =
+ hadd_four_32_to_64_avx2(sum_h[0], sum_h[1], &sum_h[2], &sum_h[3]);
+ const __m256i sh_1 =
+ hadd_four_32_to_64_avx2(sum_h[4], sum_h[5], &sum_h[6], &sum_h[6]);
+ _mm256_storeu_si256((__m256i *)(H + (i * wiener_win2) + (wiener_win * j)),
+ sh_0);
+ _mm_storeu_si128(
+ (__m128i *)(H + (i * wiener_win2) + (wiener_win * j) + 4),
+ _mm256_castsi256_si128(sh_1));
+ _mm_storel_epi64((__m128i *)&H[(i * wiener_win2) + (wiener_win * j) + 6],
+ _mm256_extracti128_si256(sh_1, 1));
+ shift++;
+ } while (++j < wiener_win);
+ }
+
+ fill_lower_triag_elements_avx2(wiener_win2, H);
}
void av1_compute_stats_avx2(int wiener_win, const uint8_t *dgd,
- const uint8_t *src, int h_start, int h_end,
+ const uint8_t *src, int16_t *dgd_avg,
+ int16_t *src_avg, int h_start, int h_end,
int v_start, int v_end, int dgd_stride,
int src_stride, int64_t *M, int64_t *H,
int use_downsampled_wiener_stats) {
- if (wiener_win == WIENER_WIN) {
- compute_stats_win7_opt_avx2(dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M, H,
- use_downsampled_wiener_stats);
- } else if (wiener_win == WIENER_WIN_CHROMA) {
- compute_stats_win5_opt_avx2(dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M, H,
- use_downsampled_wiener_stats);
- } else {
- av1_compute_stats_c(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M, H,
+ if (wiener_win != WIENER_WIN && wiener_win != WIENER_WIN_CHROMA) {
+ // Currently, libaom supports Wiener filter processing only for window sizes
+ // WIENER_WIN_CHROMA(5) and WIENER_WIN(7). SIMD support is not available
+ // for any other window size, so invoke the C function instead.
+ av1_compute_stats_c(wiener_win, dgd, src, dgd_avg, src_avg, h_start, h_end,
+ v_start, v_end, dgd_stride, src_stride, M, H,
use_downsampled_wiener_stats);
+ return;
+ }
+
+ const int32_t wiener_halfwin = wiener_win >> 1;
+ const uint8_t avg =
+ calc_dgd_buf_avg_avx2(dgd, h_start, h_end, v_start, v_end, dgd_stride);
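+ // 'avg' is the mean of the degraded frame over the restoration unit; both
+ // the src and dgd buffers are centred on it below before M and H are
+ // accumulated.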
+ const int32_t width = h_end - h_start;
+ const int32_t height = v_end - v_start;
+ const int32_t d_stride = (width + 2 * wiener_halfwin + 15) & ~15;
+ const int32_t s_stride = (width + 15) & ~15;
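+ // Round the stats-buffer strides up to a multiple of 16 so each row can be
+ // processed with whole 16-lane (256-bit) int16 loads.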
+
+ // Based on the speed feature 'use_downsampled_wiener_stats', the (src-avg)
+ // buffer is computed either once per downsampling interval
+ // (WIENER_STATS_DOWNSAMPLE_FACTOR rows) or for each row.
+ sub_avg_block_avx2(src + v_start * src_stride + h_start, src_stride, avg,
+ width, height, src_avg, s_stride,
+ use_downsampled_wiener_stats);
+
+ // Compute the (dgd-avg) buffer here, which is used to fill the H buffer.
+ sub_avg_block_avx2(
+ dgd + (v_start - wiener_halfwin) * dgd_stride + h_start - wiener_halfwin,
+ dgd_stride, avg, width + 2 * wiener_halfwin, height + 2 * wiener_halfwin,
+ dgd_avg, d_stride, 0);
+ if (wiener_win == WIENER_WIN) {
+ compute_stats_win7_avx2(dgd_avg, d_stride, src_avg, s_stride, width,
+ v_start, v_end, M, H, use_downsampled_wiener_stats);
+ } else if (wiener_win == WIENER_WIN_CHROMA) {
+ compute_stats_win5_avx2(dgd_avg, d_stride, src_avg, s_stride, width,
+ v_start, v_end, M, H, use_downsampled_wiener_stats);
}
}
diff --git a/av1/encoder/x86/pickrst_sse4.c b/av1/encoder/x86/pickrst_sse4.c
index cdfcac9..50db305 100644
--- a/av1/encoder/x86/pickrst_sse4.c
+++ b/av1/encoder/x86/pickrst_sse4.c
@@ -11,6 +11,7 @@
#include <assert.h>
#include <emmintrin.h>
+#include "aom_dsp/x86/mem_sse2.h"
#include "aom_dsp/x86/synonyms.h"
#include "config/av1_rtcd.h"
@@ -62,7 +63,7 @@
M_int[k][l] += D1 * X1 + D2 * X2;
const __m128i kl =
- _mm_cvtepu8_epi16(_mm_set1_epi16(*((int16_t *)(dgd_ijk + l))));
+ _mm_cvtepu8_epi16(_mm_set1_epi16(loadu_int16(dgd_ijk + l)));
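+ // loadu_int16() (from aom_dsp/x86/mem_sse2.h, included above) avoids the
+ // unaligned type-punned dereference of the old code.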
acc_stat_sse41(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
acc_stat_sse41(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
acc_stat_sse41(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
@@ -265,7 +266,7 @@
// Load two u16 values from dgd as a single u32
// Then broadcast to 4x u32 slots of a 128
- const __m128i dgd_ijkl = _mm_set1_epi32(*((int *)(dgd_ijk + l)));
+ const __m128i dgd_ijkl = _mm_set1_epi32(loadu_int32(dgd_ijk + l));
// dgd_ijkl = [y x y x y x y x] as u16
acc_stat_highbd_sse41(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle,
@@ -414,7 +415,7 @@
// Load two u16 values from dgd as a single u32
// then broadcast to 4x u32 slots of a 128
- const __m128i dgd_ijkl = _mm_set1_epi32(*((int *)(dgd_ijk + l)));
+ const __m128i dgd_ijkl = _mm_set1_epi32(loadu_int32(dgd_ijk + l));
// dgd_ijkl = [y x y x y x y x] as u16
acc_stat_highbd_sse41(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle,
@@ -574,7 +575,7 @@
M_int[k][l] += D1 * X1 + D2 * X2;
const __m128i kl =
- _mm_cvtepu8_epi16(_mm_set1_epi16(*((int16_t *)(dgd_ijk + l))));
+ _mm_cvtepu8_epi16(_mm_set1_epi16(loadu_int16(dgd_ijk + l)));
acc_stat_sse41(H_ + 0 * 8, dgd_ij + 0 * dgd_stride, shuffle, &kl);
acc_stat_sse41(H_ + 1 * 8, dgd_ij + 1 * dgd_stride, shuffle, &kl);
acc_stat_sse41(H_ + 2 * 8, dgd_ij + 2 * dgd_stride, shuffle, &kl);
@@ -703,7 +704,8 @@
}
}
void av1_compute_stats_sse4_1(int wiener_win, const uint8_t *dgd,
- const uint8_t *src, int h_start, int h_end,
+ const uint8_t *src, int16_t *dgd_avg,
+ int16_t *src_avg, int h_start, int h_end,
int v_start, int v_end, int dgd_stride,
int src_stride, int64_t *M, int64_t *H,
int use_downsampled_wiener_stats) {
@@ -716,8 +718,8 @@
dgd_stride, src_stride, M, H,
use_downsampled_wiener_stats);
} else {
- av1_compute_stats_c(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M, H,
+ av1_compute_stats_c(wiener_win, dgd, src, dgd_avg, src_avg, h_start, h_end,
+ v_start, v_end, dgd_stride, src_stride, M, H,
use_downsampled_wiener_stats);
}
}
diff --git a/av1/encoder/x86/reconinter_enc_sse2.c b/av1/encoder/x86/reconinter_enc_sse2.c
index d33fec7..a492483 100644
--- a/av1/encoder/x86/reconinter_enc_sse2.c
+++ b/av1/encoder/x86/reconinter_enc_sse2.c
@@ -345,20 +345,3 @@
pred += 16;
}
}
-
-void aom_comp_mask_upsampled_pred_sse2(
- MACROBLOCKD *xd, const AV1_COMMON *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search) {
- if (subpel_x_q3 | subpel_y_q3) {
- aom_upsampled_pred(xd, cm, mi_row, mi_col, mv, comp_pred, width, height,
- subpel_x_q3, subpel_y_q3, ref, ref_stride,
- subpel_search);
- ref = comp_pred;
- ref_stride = width;
- }
- aom_comp_mask_pred(comp_pred, pred, width, height, ref, ref_stride, mask,
- mask_stride, invert_mask);
-}
diff --git a/av1/encoder/x86/temporal_filter_avx2.c b/av1/encoder/x86/temporal_filter_avx2.c
index a9c8004..752d6f3 100644
--- a/av1/encoder/x86/temporal_filter_avx2.c
+++ b/av1/encoder/x86/temporal_filter_avx2.c
@@ -30,6 +30,205 @@
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 11, 10, 11 }
};
+#define CALC_X_GRADIENT(AC, GI, DF, out) \
+ out = _mm256_abs_epi16( \
+ _mm256_add_epi16(_mm256_add_epi16(AC, GI), _mm256_slli_epi16(DF, 1)));
+
+#define CALC_Y_GRADIENT(AC, GI, BH, out) \
+ out = _mm256_abs_epi16( \
+ _mm256_add_epi16(_mm256_sub_epi16(AC, GI), _mm256_slli_epi16(BH, 1)));
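+// The macro arguments hold per-lane row sums/differences of the 3x3
+// neighbourhood (e.g. AC is A - C for the x gradient and A + C for the y
+// gradient), so each macro yields |g_x| or |g_y| for 16 pixels at once.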
+
+double av1_estimate_noise_from_single_plane_avx2(const uint8_t *src, int height,
+ int width, int stride,
+ int edge_thresh) {
+ int count = 0;
+ int64_t accum = 0;
+ // w32 is (width - 1) rounded down to a multiple of 32.
+ const int w32 = (width - 1) & ~0x1f;
+ const __m256i zero = _mm256_setzero_si256();
+ const __m256i edge_threshold = _mm256_set1_epi16(edge_thresh);
+ __m256i num_accumulator = zero;
+ __m256i sum_accumulator = zero;
+
+ // A | B | C
+ // D | E | F
+ // G | H | I
+ // g_x = (A - C) + (G - I) + 2*(D - F)
+ // g_y = (A + C) - (G + I) + 2*(B - H)
+ // v = 4*E - 2*(D+F+B+H) + (A+C+G+I)
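+ // Pixels are widened to 16 bits by unpacking against zero (two 16-lane
+ // groups per 32-pixel strip), so all gradient arithmetic below is done on
+ // 16-bit lanes.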
+
+ // Process the width in chunks of 32 pixels here.
+ for (int w = 1; w < w32; w += 32) {
+ int h = 1;
+ const int start_idx = h * stride + w;
+ const int stride_0 = start_idx - stride;
+
+ __m256i num_accum_row_lvl = zero;
+ const __m256i A = _mm256_loadu_si256((__m256i *)(&src[stride_0 - 1]));
+ const __m256i C = _mm256_loadu_si256((__m256i *)(&src[stride_0 + 1]));
+ const __m256i D = _mm256_loadu_si256((__m256i *)(&src[start_idx - 1]));
+ const __m256i F = _mm256_loadu_si256((__m256i *)(&src[start_idx + 1]));
+ __m256i B = _mm256_loadu_si256((__m256i *)(&src[stride_0]));
+ __m256i E = _mm256_loadu_si256((__m256i *)(&src[start_idx]));
+
+ const __m256i A_lo = _mm256_unpacklo_epi8(A, zero);
+ const __m256i A_hi = _mm256_unpackhi_epi8(A, zero);
+ const __m256i C_lo = _mm256_unpacklo_epi8(C, zero);
+ const __m256i C_hi = _mm256_unpackhi_epi8(C, zero);
+ const __m256i D_lo = _mm256_unpacklo_epi8(D, zero);
+ const __m256i D_hi = _mm256_unpackhi_epi8(D, zero);
+ const __m256i F_lo = _mm256_unpacklo_epi8(F, zero);
+ const __m256i F_hi = _mm256_unpackhi_epi8(F, zero);
+
+ __m256i sub_AC_lo = _mm256_sub_epi16(A_lo, C_lo);
+ __m256i sub_AC_hi = _mm256_sub_epi16(A_hi, C_hi);
+ __m256i sum_AC_lo = _mm256_add_epi16(A_lo, C_lo);
+ __m256i sum_AC_hi = _mm256_add_epi16(A_hi, C_hi);
+ __m256i sub_DF_lo = _mm256_sub_epi16(D_lo, F_lo);
+ __m256i sub_DF_hi = _mm256_sub_epi16(D_hi, F_hi);
+ __m256i sum_DF_lo = _mm256_add_epi16(D_lo, F_lo);
+ __m256i sum_DF_hi = _mm256_add_epi16(D_hi, F_hi);
+
+ for (; h < height - 1; h++) {
+ __m256i sum_GI_lo, sub_GI_lo, sum_GI_hi, sub_GI_hi, gx_lo, gy_lo, gx_hi,
+ gy_hi;
+ const int k = h * stride + w;
+ const __m256i G = _mm256_loadu_si256((__m256i *)(&src[k + stride - 1]));
+ const __m256i H = _mm256_loadu_si256((__m256i *)(&src[k + stride]));
+ const __m256i I = _mm256_loadu_si256((__m256i *)(&src[k + stride + 1]));
+
+ const __m256i B_lo = _mm256_unpacklo_epi8(B, zero);
+ const __m256i B_hi = _mm256_unpackhi_epi8(B, zero);
+ const __m256i G_lo = _mm256_unpacklo_epi8(G, zero);
+ const __m256i G_hi = _mm256_unpackhi_epi8(G, zero);
+ const __m256i I_lo = _mm256_unpacklo_epi8(I, zero);
+ const __m256i I_hi = _mm256_unpackhi_epi8(I, zero);
+ const __m256i H_lo = _mm256_unpacklo_epi8(H, zero);
+ const __m256i H_hi = _mm256_unpackhi_epi8(H, zero);
+
+ sub_GI_lo = _mm256_sub_epi16(G_lo, I_lo);
+ sub_GI_hi = _mm256_sub_epi16(G_hi, I_hi);
+ sum_GI_lo = _mm256_add_epi16(G_lo, I_lo);
+ sum_GI_hi = _mm256_add_epi16(G_hi, I_hi);
+ const __m256i sub_BH_lo = _mm256_sub_epi16(B_lo, H_lo);
+ const __m256i sub_BH_hi = _mm256_sub_epi16(B_hi, H_hi);
+
+ CALC_X_GRADIENT(sub_AC_lo, sub_GI_lo, sub_DF_lo, gx_lo)
+ CALC_Y_GRADIENT(sum_AC_lo, sum_GI_lo, sub_BH_lo, gy_lo)
+
+ const __m256i ga_lo = _mm256_add_epi16(gx_lo, gy_lo);
+
+ CALC_X_GRADIENT(sub_AC_hi, sub_GI_hi, sub_DF_hi, gx_hi)
+ CALC_Y_GRADIENT(sum_AC_hi, sum_GI_hi, sub_BH_hi, gy_hi)
+
+ const __m256i ga_hi = _mm256_add_epi16(gx_hi, gy_hi);
+
+ __m256i cmp_lo = _mm256_cmpgt_epi16(edge_threshold, ga_lo);
+ __m256i cmp_hi = _mm256_cmpgt_epi16(edge_threshold, ga_hi);
+ const __m256i comp_reg = _mm256_add_epi16(cmp_lo, cmp_hi);
+
+ // v = 4*E -2*(D+F+B+H) + (A+C+G+I)
+ if (_mm256_movemask_epi8(comp_reg) != 0) {
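+ // At least one lane in this strip is smooth (gradient sum below the edge
+ // threshold): compute the Laplacian for all lanes, then zero out the edge
+ // lanes using the 0/1 comparison masks.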
+ const __m256i sum_BH_lo = _mm256_add_epi16(B_lo, H_lo);
+ const __m256i sum_BH_hi = _mm256_add_epi16(B_hi, H_hi);
+
+ // 2*(D+F+B+H)
+ const __m256i sum_DFBH_lo =
+ _mm256_slli_epi16(_mm256_add_epi16(sum_DF_lo, sum_BH_lo), 1);
+ // (A+C+G+I)
+ const __m256i sum_ACGI_lo = _mm256_add_epi16(sum_AC_lo, sum_GI_lo);
+ const __m256i sum_DFBH_hi =
+ _mm256_slli_epi16(_mm256_add_epi16(sum_DF_hi, sum_BH_hi), 1);
+ const __m256i sum_ACGI_hi = _mm256_add_epi16(sum_AC_hi, sum_GI_hi);
+
+ // Convert E register values from 8bit to 16bit
+ const __m256i E_lo = _mm256_unpacklo_epi8(E, zero);
+ const __m256i E_hi = _mm256_unpackhi_epi8(E, zero);
+
+ // 4*E - 2*(D+F+B+H)+ (A+C+G+I)
+ const __m256i var_lo_0 = _mm256_abs_epi16(_mm256_add_epi16(
+ _mm256_sub_epi16(_mm256_slli_epi16(E_lo, 2), sum_DFBH_lo),
+ sum_ACGI_lo));
+ const __m256i var_hi_0 = _mm256_abs_epi16(_mm256_add_epi16(
+ _mm256_sub_epi16(_mm256_slli_epi16(E_hi, 2), sum_DFBH_hi),
+ sum_ACGI_hi));
+ cmp_lo = _mm256_srli_epi16(cmp_lo, 15);
+ cmp_hi = _mm256_srli_epi16(cmp_hi, 15);
+ const __m256i var_lo = _mm256_mullo_epi16(var_lo_0, cmp_lo);
+ const __m256i var_hi = _mm256_mullo_epi16(var_hi_0, cmp_hi);
+
+ num_accum_row_lvl = _mm256_add_epi16(num_accum_row_lvl, cmp_lo);
+ num_accum_row_lvl = _mm256_add_epi16(num_accum_row_lvl, cmp_hi);
+
+ sum_accumulator = _mm256_add_epi32(sum_accumulator,
+ _mm256_unpacklo_epi16(var_lo, zero));
+ sum_accumulator = _mm256_add_epi32(sum_accumulator,
+ _mm256_unpackhi_epi16(var_lo, zero));
+ sum_accumulator = _mm256_add_epi32(sum_accumulator,
+ _mm256_unpacklo_epi16(var_hi, zero));
+ sum_accumulator = _mm256_add_epi32(sum_accumulator,
+ _mm256_unpackhi_epi16(var_hi, zero));
+ }
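+ // Slide the 3x3 window down one row: the middle row's sums/differences
+ // become the new top row's, and the bottom row's become the new middle
+ // row's.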
+ sub_AC_lo = sub_DF_lo;
+ sub_AC_hi = sub_DF_hi;
+ sub_DF_lo = sub_GI_lo;
+ sub_DF_hi = sub_GI_hi;
+ sum_AC_lo = sum_DF_lo;
+ sum_AC_hi = sum_DF_hi;
+ sum_DF_lo = sum_GI_lo;
+ sum_DF_hi = sum_GI_hi;
+ B = E;
+ E = H;
+ }
+ const __m256i num_0 = _mm256_unpacklo_epi16(num_accum_row_lvl, zero);
+ const __m256i num_1 = _mm256_unpackhi_epi16(num_accum_row_lvl, zero);
+ num_accumulator =
+ _mm256_add_epi32(num_accumulator, _mm256_add_epi32(num_0, num_1));
+ }
+
+ // Process the remaining width here.
+ for (int h = 1; h < height - 1; ++h) {
+ for (int w = w32 + 1; w < width - 1; ++w) {
+ const int k = h * stride + w;
+
+ // Compute Sobel gradients.
+ const int g_x = (src[k - stride - 1] - src[k - stride + 1]) +
+ (src[k + stride - 1] - src[k + stride + 1]) +
+ 2 * (src[k - 1] - src[k + 1]);
+ const int g_y = (src[k - stride - 1] - src[k + stride - 1]) +
+ (src[k - stride + 1] - src[k + stride + 1]) +
+ 2 * (src[k - stride] - src[k + stride]);
+ const int ga = abs(g_x) + abs(g_y);
+
+ if (ga < edge_thresh) {
+ // Find Laplacian
+ const int v =
+ 4 * src[k] -
+ 2 * (src[k - 1] + src[k + 1] + src[k - stride] + src[k + stride]) +
+ (src[k - stride - 1] + src[k - stride + 1] + src[k + stride - 1] +
+ src[k + stride + 1]);
+ accum += abs(v);
+ ++count;
+ }
+ }
+ }
+
+ // s0 s1 n0 n1 s2 s3 n2 n3
+ __m256i sum_avx = _mm256_hadd_epi32(sum_accumulator, num_accumulator);
+ __m128i sum_avx_lo = _mm256_castsi256_si128(sum_avx);
+ __m128i sum_avx_hi = _mm256_extractf128_si256(sum_avx, 1);
+ // s0+s2 s1+s3 n0+n2 n1+n3
+ __m128i sum_avx_1 = _mm_add_epi32(sum_avx_lo, sum_avx_hi);
+ // s0+s2+s1+s3 n0+n2+n1+n3
+ __m128i result = _mm_add_epi32(_mm_srli_si128(sum_avx_1, 4), sum_avx_1);
+
+ accum += _mm_cvtsi128_si32(result);
+ count += _mm_extract_epi32(result, 2);
+
+ // If very few smooth pels, return -1 since the estimate is unreliable.
+ return (count < 16) ? -1.0 : (double)accum / (6 * count) * SQRT_PI_BY_2;
+}
+
static AOM_FORCE_INLINE void get_squared_error_16x16_avx2(
const uint8_t *frame1, const unsigned int stride, const uint8_t *frame2,
const unsigned int stride2, const int block_width, const int block_height,
@@ -127,13 +326,31 @@
return _mm_extract_epi32(v128a, 0);
}
+// AVX2 implementation of approx_exp()
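+// The constants implement Schraudolph's fast exp: scaling y by
+// (1 << 23) / ln(2) places it in the exponent field of an IEEE-754 float,
+// adding the bias B completes the bit pattern, and C is an empirically
+// chosen correction term. A scalar sketch of the same idea (illustrative
+// only):
+//   int32_t i = (int32_t)(y * A) + (B * (1 << 23) - C);
+//   float r;
+//   memcpy(&r, &i, sizeof(r));  // r ~= exp(y)
+// This mirrors the scalar approx_exp() used by the SSE2 path.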
+static AOM_INLINE __m256 approx_exp_avx2(__m256 y) {
+#define A ((1 << 23) / 0.69314718056f) // (1 << 23) / ln(2)
+#define B \
+ 127 // Offset for the exponent according to the IEEE floating-point standard.
+#define C 60801 // Magic number that controls the accuracy of the approximation
+ const __m256 multiplier = _mm256_set1_ps(A);
+ const __m256i offset = _mm256_set1_epi32(B * (1 << 23) - C);
+
+ y = _mm256_mul_ps(y, multiplier);
+ y = _mm256_castsi256_ps(_mm256_add_epi32(_mm256_cvttps_epi32(y), offset));
+ return y;
+#undef A
+#undef B
+#undef C
+}
+
static void apply_temporal_filter(
const uint8_t *frame1, const unsigned int stride, const uint8_t *frame2,
const unsigned int stride2, const int block_width, const int block_height,
const int *subblock_mses, unsigned int *accumulator, uint16_t *count,
uint16_t *frame_sse, uint32_t *luma_sse_sum,
const double inv_num_ref_pixels, const double decay_factor,
- const double inv_factor, const double weight_factor, double *d_factor) {
+ const double inv_factor, const double weight_factor, double *d_factor,
+ int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -192,25 +409,140 @@
}
}
- for (int i = 0, k = 0; i < block_height; i++) {
- for (int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame2[i * stride2 + j];
- uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
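+ // Hoist the per-subblock scaling out of the pixel loops: the scaled MSE
+ // and the d_factor * decay_factor product are invariant across all pixels
+ // of a subblock.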
+ double subblock_mses_scaled[4];
+ double d_factor_decayed[4];
+ for (int idx = 0; idx < 4; idx++) {
+ subblock_mses_scaled[idx] = subblock_mses[idx] * inv_factor;
+ d_factor_decayed[idx] = d_factor[idx] * decay_factor;
+ }
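+ // tf_wgt_calc_lvl == 0 keeps the exact exp() based weight computation;
+ // otherwise the weights are computed with the vectorized approx_exp()
+ // path below.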
+ if (tf_wgt_calc_lvl == 0) {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
- count[k] += weight;
- accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
+ }
+ } else {
+ __m256d subblock_mses_reg[4];
+ __m256d d_factor_mul_n_decay_qr_invs[4];
+ const __m256 zero = _mm256_set1_ps(0.0f);
+ const __m256 point_five = _mm256_set1_ps(0.5f);
+ const __m256 seven = _mm256_set1_ps(7.0f);
+ const __m256d inv_num_ref_pixel_256bit = _mm256_set1_pd(inv_num_ref_pixels);
+ const __m256d weight_factor_256bit = _mm256_set1_pd(weight_factor);
+ const __m256 tf_weight_scale = _mm256_set1_ps((float)TF_WEIGHT_SCALE);
+ // Maintain registers to hold mse and d_factor at subblock level.
+ subblock_mses_reg[0] = _mm256_set1_pd(subblock_mses_scaled[0]);
+ subblock_mses_reg[1] = _mm256_set1_pd(subblock_mses_scaled[1]);
+ subblock_mses_reg[2] = _mm256_set1_pd(subblock_mses_scaled[2]);
+ subblock_mses_reg[3] = _mm256_set1_pd(subblock_mses_scaled[3]);
+ d_factor_mul_n_decay_qr_invs[0] = _mm256_set1_pd(d_factor_decayed[0]);
+ d_factor_mul_n_decay_qr_invs[1] = _mm256_set1_pd(d_factor_decayed[1]);
+ d_factor_mul_n_decay_qr_invs[2] = _mm256_set1_pd(d_factor_decayed[2]);
+ d_factor_mul_n_decay_qr_invs[3] = _mm256_set1_pd(d_factor_decayed[3]);
+
+ for (int i = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ uint32_t *luma_sse_sum_temp = luma_sse_sum + i * BW;
+ for (int j = 0; j < block_width; j += 8) {
+ const __m256i acc_sse =
+ _mm256_lddqu_si256((__m256i *)(acc_5x5_sse[i] + j));
+ const __m256i luma_sse =
+ _mm256_lddqu_si256((__m256i *)((luma_sse_sum_temp + j)));
+
+ // uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
+ const __m256i diff_sse = _mm256_add_epi32(acc_sse, luma_sse);
+
+ const __m256d diff_sse_pd_1 =
+ _mm256_cvtepi32_pd(_mm256_castsi256_si128(diff_sse));
+ const __m256d diff_sse_pd_2 =
+ _mm256_cvtepi32_pd(_mm256_extracti128_si256(diff_sse, 1));
+
+ // const double window_error = diff_sse * inv_num_ref_pixels;
+ const __m256d window_error_1 =
+ _mm256_mul_pd(diff_sse_pd_1, inv_num_ref_pixel_256bit);
+ const __m256d window_error_2 =
+ _mm256_mul_pd(diff_sse_pd_2, inv_num_ref_pixel_256bit);
+
+ // const int subblock_idx = y_blk_raster_offset + (j >= block_width /
+ // 2);
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+ const __m256d blk_error = subblock_mses_reg[subblock_idx];
+
+ // const double combined_error =
+ // weight_factor *window_error + subblock_mses_scaled[subblock_idx];
+ const __m256d combined_error_1 = _mm256_add_pd(
+ _mm256_mul_pd(window_error_1, weight_factor_256bit), blk_error);
+
+ const __m256d combined_error_2 = _mm256_add_pd(
+ _mm256_mul_pd(window_error_2, weight_factor_256bit), blk_error);
+
+ // d_factor_decayed[subblock_idx]
+ const __m256d d_fact_mul_n_decay =
+ d_factor_mul_n_decay_qr_invs[subblock_idx];
+
+ // double scaled_error = combined_error *
+ // d_factor_decayed[subblock_idx];
+ const __m256d scaled_error_1 =
+ _mm256_mul_pd(combined_error_1, d_fact_mul_n_decay);
+ const __m256d scaled_error_2 =
+ _mm256_mul_pd(combined_error_2, d_fact_mul_n_decay);
+
+ const __m128 scaled_error_ps_1 = _mm256_cvtpd_ps(scaled_error_1);
+ const __m128 scaled_error_ps_2 = _mm256_cvtpd_ps(scaled_error_2);
+
+ const __m256 scaled_error_ps = _mm256_insertf128_ps(
+ _mm256_castps128_ps256(scaled_error_ps_1), scaled_error_ps_2, 0x1);
+
+ // scaled_error = AOMMIN(scaled_error, 7);
+ const __m256 scaled_diff_ps = _mm256_min_ps(scaled_error_ps, seven);
+ const __m256 minus_scaled_diff_ps = _mm256_sub_ps(zero, scaled_diff_ps);
+ // const int weight =
+ //(int)(approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE + 0.5f);
+ const __m256 exp_result = approx_exp_avx2(minus_scaled_diff_ps);
+ const __m256 scale_weight_exp_result =
+ _mm256_mul_ps(exp_result, tf_weight_scale);
+ const __m256 round_result =
+ _mm256_add_ps(scale_weight_exp_result, point_five);
+ __m256i weights_in_32bit = _mm256_cvttps_epi32(round_result);
+
+ __m128i weights_in_16bit =
+ _mm_packus_epi32(_mm256_castsi256_si128(weights_in_32bit),
+ _mm256_extractf128_si256(weights_in_32bit, 0x1));
+
+ // count[k] += weight;
+ // accumulator[k] += weight * pixel_value;
+ const int stride_idx = i * stride2 + j;
+ const __m128i count_array =
+ _mm_loadu_si128((__m128i *)(count + stride_idx));
+ _mm_storeu_si128((__m128i *)(count + stride_idx),
+ _mm_add_epi16(count_array, weights_in_16bit));
+
+ const __m256i accumulator_array =
+ _mm256_loadu_si256((__m256i *)(accumulator + stride_idx));
+ const __m128i pred_values =
+ _mm_loadl_epi64((__m128i *)(frame2 + stride_idx));
+
+ const __m256i pred_values_u32 = _mm256_cvtepu8_epi32(pred_values);
+ const __m256i mull_frame2_weight_u32 =
+ _mm256_mullo_epi32(pred_values_u32, weights_in_32bit);
+ _mm256_storeu_si256(
+ (__m256i *)(accumulator + stride_idx),
+ _mm256_add_epi32(accumulator_array, mull_frame2_weight_u32));
+ }
}
}
}
@@ -220,7 +552,8 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
const int is_high_bitdepth = frame_to_filter->flags & YV12_FLAG_HIGHBITDEPTH;
assert(block_size == BLOCK_32X32 && "Only support 32x32 block with avx2!");
assert(TF_WINDOW_LENGTH == 5 && "Only support window length 5 with avx2!");
@@ -308,7 +641,7 @@
plane_w, plane_h, subblock_mses, accum + plane_offset,
count + plane_offset, frame_sse, luma_sse_sum,
inv_num_ref_pixels, decay_factor, inv_factor,
- weight_factor, d_factor);
+ weight_factor, d_factor, tf_wgt_calc_lvl);
plane_offset += plane_h * plane_w;
}
}
diff --git a/av1/encoder/x86/temporal_filter_sse2.c b/av1/encoder/x86/temporal_filter_sse2.c
index 8be7164..842d3b1 100644
--- a/av1/encoder/x86/temporal_filter_sse2.c
+++ b/av1/encoder/x86/temporal_filter_sse2.c
@@ -13,6 +13,7 @@
#include <emmintrin.h>
#include "config/av1_rtcd.h"
+#include "aom_dsp/mathutils.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/temporal_filter.h"
@@ -107,7 +108,8 @@
const int *subblock_mses, unsigned int *accumulator, uint16_t *count,
uint16_t *frame_sse, uint32_t *luma_sse_sum,
const double inv_num_ref_pixels, const double decay_factor,
- const double inv_factor, const double weight_factor, double *d_factor) {
+ const double inv_factor, const double weight_factor, double *d_factor,
+ int tf_wgt_calc_lvl) {
assert(((block_width == 16) || (block_width == 32)) &&
((block_height == 16) || (block_height == 32)));
@@ -168,25 +170,52 @@
}
}
- for (int i = 0, k = 0; i < block_height; i++) {
- for (int j = 0; j < block_width; j++, k++) {
- const int pixel_value = frame2[i * stride2 + j];
- uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
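+ // As in the AVX2 path, hoist the per-subblock scaling out of the pixel
+ // loops since it is invariant across all pixels of a subblock.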
+ double subblock_mses_scaled[4];
+ double d_factor_decayed[4];
+ for (int idx = 0; idx < 4; idx++) {
+ subblock_mses_scaled[idx] = subblock_mses[idx] * inv_factor;
+ d_factor_decayed[idx] = d_factor[idx] * decay_factor;
+ }
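+ // tf_wgt_calc_lvl == 0 keeps the exact exp() based weights; otherwise use
+ // the faster approx_exp() with iroundpf() rounding.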
+ if (tf_wgt_calc_lvl == 0) {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
- const double window_error = diff_sse * inv_num_ref_pixels;
- const int subblock_idx =
- (i >= block_height / 2) * 2 + (j >= block_width / 2);
- const double block_error = (double)subblock_mses[subblock_idx];
- const double combined_error =
- weight_factor * window_error + block_error * inv_factor;
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
- double scaled_error =
- combined_error * d_factor[subblock_idx] * decay_factor;
- scaled_error = AOMMIN(scaled_error, 7);
- const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const int weight = (int)(exp(-scaled_error) * TF_WEIGHT_SCALE);
- count[k] += weight;
- accumulator[k] += weight * pixel_value;
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
+ }
+ } else {
+ for (int i = 0, k = 0; i < block_height; i++) {
+ const int y_blk_raster_offset = (i >= block_height / 2) * 2;
+ for (int j = 0; j < block_width; j++, k++) {
+ const int pixel_value = frame2[i * stride2 + j];
+ uint32_t diff_sse = acc_5x5_sse[i][j] + luma_sse_sum[i * BW + j];
+
+ const double window_error = diff_sse * inv_num_ref_pixels;
+ const int subblock_idx = y_blk_raster_offset + (j >= block_width / 2);
+ const double combined_error =
+ weight_factor * window_error + subblock_mses_scaled[subblock_idx];
+
+ double scaled_error = combined_error * d_factor_decayed[subblock_idx];
+ scaled_error = AOMMIN(scaled_error, 7);
+ const float fweight =
+ approx_exp((float)-scaled_error) * TF_WEIGHT_SCALE;
+ const int weight = iroundpf(fweight);
+ count[k] += weight;
+ accumulator[k] += weight * pixel_value;
+ }
}
}
}
@@ -196,7 +225,8 @@
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_levels, const MV *subblock_mvs,
const int *subblock_mses, const int q_factor, const int filter_strength,
- const uint8_t *pred, uint32_t *accum, uint16_t *count) {
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum,
+ uint16_t *count) {
const int is_high_bitdepth = frame_to_filter->flags & YV12_FLAG_HIGHBITDEPTH;
assert(block_size == BLOCK_32X32 && "Only support 32x32 block with sse2!");
assert(TF_WINDOW_LENGTH == 5 && "Only support window length 5 with sse2!");
@@ -284,7 +314,7 @@
plane_w, plane_h, subblock_mses, accum + plane_offset,
count + plane_offset, frame_sse, luma_sse_sum,
inv_num_ref_pixels, decay_factor, inv_factor,
- weight_factor, d_factor);
+ weight_factor, d_factor, tf_wgt_calc_lvl);
plane_offset += plane_h * plane_w;
}
}
diff --git a/av1/encoder/x86/wedge_utils_avx2.c b/av1/encoder/x86/wedge_utils_avx2.c
index bbc62d5..9cde860 100644
--- a/av1/encoder/x86/wedge_utils_avx2.c
+++ b/av1/encoder/x86/wedge_utils_avx2.c
@@ -72,7 +72,7 @@
__m128i v_acc_q_0 = _mm256_castsi256_si128(v_acc0_q);
__m128i v_acc_q_1 = _mm256_extracti128_si256(v_acc0_q, 1);
v_acc_q_0 = _mm_add_epi64(v_acc_q_0, v_acc_q_1);
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
csse = (uint64_t)_mm_extract_epi64(v_acc_q_0, 0);
#else
xx_storel_64(&csse, v_acc_q_0);
@@ -141,7 +141,7 @@
__m128i v_acc_q_1 = _mm256_extracti128_si256(v_acc_q, 1);
v_acc_q_0 = _mm_add_epi64(v_acc_q_0, v_acc_q_1);
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
acc = _mm_extract_epi64(v_acc_q_0, 0);
#else
xx_storel_64(&acc, v_acc_q_0);
diff --git a/av1/encoder/x86/wedge_utils_sse2.c b/av1/encoder/x86/wedge_utils_sse2.c
index e665b2e..d7ac222 100644
--- a/av1/encoder/x86/wedge_utils_sse2.c
+++ b/av1/encoder/x86/wedge_utils_sse2.c
@@ -85,7 +85,7 @@
v_acc0_q = _mm_add_epi64(v_acc0_q, _mm_srli_si128(v_acc0_q, 8));
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
csse = (uint64_t)_mm_cvtsi128_si64(v_acc0_q);
#else
xx_storel_64(&csse, v_acc0_q);
@@ -174,7 +174,7 @@
v_acc_q = _mm_add_epi64(v_acc_q, _mm_srli_si128(v_acc_q, 8));
-#if ARCH_X86_64
+#if AOM_ARCH_X86_64
acc = _mm_cvtsi128_si64(v_acc_q);
#else
xx_storel_64(&acc, v_acc_q);
diff --git a/av1/qmode_rc/ducky_encode.cc b/av1/qmode_rc/ducky_encode.cc
deleted file mode 100644
index bd4b766..0000000
--- a/av1/qmode_rc/ducky_encode.cc
+++ /dev/null
@@ -1,718 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-#include <stdlib.h>
-#include <string.h>
-#include <algorithm>
-#include <memory>
-#include <numeric>
-#include <vector>
-
-#include "av1/common/enums.h"
-#include "av1/encoder/rd.h"
-#include "config/aom_config.h"
-
-#include "aom/aom_encoder.h"
-
-#include "av1/av1_cx_iface.h"
-#include "av1/av1_iface_common.h"
-#include "av1/encoder/encoder.h"
-#include "av1/encoder/ethread.h"
-#include "av1/encoder/firstpass.h"
-#include "av1/encoder/temporal_filter.h"
-#include "av1/qmode_rc/ducky_encode.h"
-
-#include "common/tools_common.h"
-
-namespace aom {
-struct EncoderResource {
- STATS_BUFFER_CTX *stats_buf_ctx;
- FIRSTPASS_STATS *stats_buffer;
- aom_image_t img;
- AV1_PRIMARY *ppi;
- int lookahead_push_count;
- int encode_frame_count; // Use in second pass only
-};
-
-class DuckyEncode::EncodeImpl {
- public:
- VideoInfo video_info;
- int g_usage;
- int max_ref_frames;
- int speed;
- int base_qindex;
- BLOCK_SIZE sb_size;
- enum aom_rc_mode rc_end_usage;
- aom_rational64_t timestamp_ratio;
- std::vector<FIRSTPASS_STATS> stats_list;
- EncoderResource enc_resource;
- struct AvxInputContext input;
-};
-
-DuckyEncode::DuckyEncode(const VideoInfo &video_info, BLOCK_SIZE sb_size,
- int max_ref_frames, int speed, int base_qindex) {
- impl_ptr_ = std::unique_ptr<EncodeImpl>(new EncodeImpl());
- impl_ptr_->video_info = video_info;
- impl_ptr_->g_usage = GOOD;
- impl_ptr_->max_ref_frames = max_ref_frames;
- impl_ptr_->speed = speed;
- impl_ptr_->base_qindex = base_qindex;
- impl_ptr_->sb_size = sb_size;
- impl_ptr_->rc_end_usage = AOM_Q;
- // TODO(angiebird): Set timestamp_ratio properly
- // timestamp_ratio.den = cfg->g_timebase.den;
- // timestamp_ratio.num = (int64_t)cfg->g_timebase.num * TICKS_PER_SEC;
- impl_ptr_->timestamp_ratio = { 1, 1 };
- // TODO(angiebird): How to set ptsvol and duration?
- impl_ptr_->input.filename = impl_ptr_->video_info.file_path.c_str();
-}
-
-DuckyEncode::~DuckyEncode() {}
-
-static AV1EncoderConfig GetEncoderConfig(const VideoInfo &video_info,
- int g_usage, aom_enc_pass pass) {
- const aom_codec_iface *codec = aom_codec_av1_cx();
- aom_codec_enc_cfg_t cfg;
- aom_codec_enc_config_default(codec, &cfg, g_usage);
- cfg.g_w = video_info.frame_width;
- cfg.g_h = video_info.frame_height;
- cfg.g_pass = pass;
- // g_timebase is the inverse of frame_rate
- cfg.g_timebase.num = video_info.frame_rate.den;
- cfg.g_timebase.den = video_info.frame_rate.num;
- if (pass == AOM_RC_SECOND_PASS) {
- cfg.rc_twopass_stats_in.sz =
- (video_info.frame_count + 1) * sizeof(FIRSTPASS_STATS);
- }
- AV1EncoderConfig oxcf = av1_get_encoder_config(&cfg);
- // TODO(angiebird): Why didn't we init use_highbitdepth in
- // av1_get_encoder_config()?
- oxcf.use_highbitdepth = 0;
-
- // TODO(jingning): Change this to 35 when the baseline rate control
- // logic is in place.
- // Force maximum look ahead buffer to be 19. This will disable the use
- // of maximum 32 GOP length.
- oxcf.gf_cfg.lag_in_frames = 19;
-
- return oxcf;
-}
-
-static STATS_BUFFER_CTX *CreateStatsBufferCtx(int frame_count,
- FIRSTPASS_STATS **stats_buffer) {
- STATS_BUFFER_CTX *stats_buf_ctx = new STATS_BUFFER_CTX;
- // +2 is for total_stats and total_left_stats
- *stats_buffer = new FIRSTPASS_STATS[frame_count + 2];
- stats_buf_ctx->stats_in_start = *stats_buffer;
- stats_buf_ctx->stats_in_end = stats_buf_ctx->stats_in_start;
- stats_buf_ctx->stats_in_buf_end = stats_buf_ctx->stats_in_start + frame_count;
- stats_buf_ctx->total_stats = stats_buf_ctx->stats_in_buf_end;
- stats_buf_ctx->total_left_stats =
- stats_buf_ctx->stats_in_start + frame_count + 1;
- for (FIRSTPASS_STATS *buffer = stats_buf_ctx->stats_in_start;
- buffer <= stats_buf_ctx->total_left_stats; ++buffer) {
- av1_twopass_zero_stats(buffer);
- }
- return stats_buf_ctx;
-}
-
-static void DestroyStatsBufferCtx(STATS_BUFFER_CTX **stats_buf_context,
- FIRSTPASS_STATS **stats_buffer) {
- (*stats_buf_context)->stats_in_start = nullptr;
- (*stats_buf_context)->stats_in_end = nullptr;
- (*stats_buf_context)->stats_in_buf_end = nullptr;
- (*stats_buf_context)->total_stats = nullptr;
- (*stats_buf_context)->total_left_stats = nullptr;
- delete *stats_buf_context;
- *stats_buf_context = nullptr;
- delete[](*stats_buffer);
- *stats_buffer = nullptr;
-}
-
-static FIRSTPASS_STATS ComputeTotalStats(
- const std::vector<FIRSTPASS_STATS> &stats_list) {
- FIRSTPASS_STATS total_stats = {};
- for (size_t i = 0; i < stats_list.size(); ++i) {
- av1_accumulate_stats(&total_stats, &stats_list[i]);
- }
- return total_stats;
-}
-
-static bool FileIsY4m(const char detect[4]) {
- return memcmp(detect, "YUV4", 4) == 0;
-}
-
-static bool FourccIsIvf(const char detect[4]) {
- return memcmp(detect, "DKIF", 4) == 0;
-}
-
-static void OpenInputFile(struct AvxInputContext *input) {
- input->file = fopen(input->filename, "rb");
- /* For RAW input sources, these bytes will applied on the first frame
- * in read_frame().
- */
- input->detect.buf_read = fread(input->detect.buf, 1, 4, input->file);
- input->detect.position = 0;
- aom_chroma_sample_position_t const csp = AOM_CSP_UNKNOWN;
- if (input->detect.buf_read == 4 && FileIsY4m(input->detect.buf)) {
- if (y4m_input_open(&input->y4m, input->file, input->detect.buf, 4, csp,
- input->only_i420) >= 0) {
- input->file_type = FILE_TYPE_Y4M;
- input->width = input->y4m.pic_w;
- input->height = input->y4m.pic_h;
- input->pixel_aspect_ratio.numerator = input->y4m.par_n;
- input->pixel_aspect_ratio.denominator = input->y4m.par_d;
- input->framerate.numerator = input->y4m.fps_n;
- input->framerate.denominator = input->y4m.fps_d;
- input->fmt = input->y4m.aom_fmt;
- input->bit_depth = static_cast<aom_bit_depth_t>(input->y4m.bit_depth);
- input->color_range = input->y4m.color_range;
- } else
- fatal("Unsupported Y4M stream.");
- } else if (input->detect.buf_read == 4 && FourccIsIvf(input->detect.buf)) {
- fatal("IVF is not supported as input.");
- } else {
- input->file_type = FILE_TYPE_RAW;
- }
-}
-
-void DuckyEncode::InitEncoder(aom_enc_pass pass,
- const std::vector<FIRSTPASS_STATS> *stats_list) {
- EncoderResource enc_resource = {};
- enc_resource.lookahead_push_count = 0;
- OpenInputFile(&impl_ptr_->input);
- if (impl_ptr_->input.file_type != FILE_TYPE_Y4M) {
- aom_img_alloc(&enc_resource.img, impl_ptr_->video_info.img_fmt,
- impl_ptr_->video_info.frame_width,
- impl_ptr_->video_info.frame_height, /*align=*/1);
- }
- AV1EncoderConfig oxcf =
- GetEncoderConfig(impl_ptr_->video_info, impl_ptr_->g_usage, pass);
- oxcf.dec_model_cfg.decoder_model_info_present_flag = 0;
- oxcf.dec_model_cfg.display_model_info_present_flag = 0;
- oxcf.ref_frm_cfg.max_reference_frames = impl_ptr_->max_ref_frames;
- oxcf.speed = impl_ptr_->speed;
- if (impl_ptr_->sb_size == BLOCK_64X64)
- oxcf.tool_cfg.superblock_size = AOM_SUPERBLOCK_SIZE_64X64;
- else
- oxcf.tool_cfg.superblock_size = AOM_SUPERBLOCK_SIZE_128X128;
-
- av1_initialize_enc(impl_ptr_->g_usage, impl_ptr_->rc_end_usage);
- AV1_PRIMARY *ppi =
- av1_create_primary_compressor(nullptr,
- /*num_lap_buffers=*/0, &oxcf);
- enc_resource.ppi = ppi;
-
- assert(ppi != nullptr);
- // Turn off ppi->b_calculate_psnr to avoid calling generate_psnr_packet() in
- // av1_post_encode_updates().
- // TODO(angiebird): Modify generate_psnr_packet() to handle the case that
- // cpi->ppi->output_pkt_list = nullptr.
- ppi->b_calculate_psnr = 0;
-
- aom_codec_err_t res = AOM_CODEC_OK;
- (void)res;
- enc_resource.stats_buf_ctx = CreateStatsBufferCtx(
- impl_ptr_->video_info.frame_count, &enc_resource.stats_buffer);
- if (pass == AOM_RC_SECOND_PASS) {
- assert(stats_list != nullptr);
- std::copy(stats_list->begin(), stats_list->end(),
- enc_resource.stats_buffer);
- *enc_resource.stats_buf_ctx->total_stats = ComputeTotalStats(*stats_list);
- oxcf.twopass_stats_in.buf = enc_resource.stats_buffer;
- // We need +1 here because av1 encoder assumes
- // oxcf.twopass_stats_in.buf[video_info.frame_count] has the total_stats
- oxcf.twopass_stats_in.sz = (impl_ptr_->video_info.frame_count + 1) *
- sizeof(enc_resource.stats_buffer[0]);
- } else {
- assert(pass == AOM_RC_FIRST_PASS);
- // We don't use stats_list for AOM_RC_FIRST_PASS.
- assert(stats_list == nullptr);
- }
- ppi->twopass.stats_buf_ctx = enc_resource.stats_buf_ctx;
- BufferPool *buffer_pool = nullptr;
- res = av1_create_context_and_bufferpool(ppi, &ppi->cpi, &buffer_pool, &oxcf,
- ENCODE_STAGE, -1);
- // TODO(angiebird): Why didn't we set initial_dimensions in
- // av1_create_compressor()?
- ppi->cpi->initial_dimensions.width = oxcf.frm_dim_cfg.width;
- ppi->cpi->initial_dimensions.height = oxcf.frm_dim_cfg.height;
- // use_ducky_encode is the flag we use to change AV1 behavior
- // slightly based on DuckyEncode's need. We should minimize this kind of
- // change unless it's necessary.
- ppi->cpi->use_ducky_encode = 1;
- assert(res == AOM_CODEC_OK);
- assert(ppi->cpi != nullptr);
- assert(buffer_pool != nullptr);
- const AV1_COMP *cpi = ppi->cpi;
- SequenceHeader *seq_params = ppi->cpi->common.seq_params;
- set_sb_size(seq_params, impl_ptr_->sb_size);
- ppi->seq_params_locked = 1;
- assert(ppi->lookahead == nullptr);
-
- int lag_in_frames = cpi->oxcf.gf_cfg.lag_in_frames;
- ppi->lookahead = av1_lookahead_init(
- cpi->oxcf.frm_dim_cfg.width, cpi->oxcf.frm_dim_cfg.height,
- seq_params->subsampling_x, seq_params->subsampling_y,
- seq_params->use_highbitdepth, lag_in_frames, cpi->oxcf.border_in_pixels,
- cpi->common.features.byte_alignment,
- /*num_lap_buffers=*/0, /*is_all_intra=*/0,
- cpi->oxcf.tool_cfg.enable_global_motion);
-
- av1_tf_info_alloc(&cpi->ppi->tf_info, cpi);
- assert(ppi->lookahead != nullptr);
-
- impl_ptr_->enc_resource = enc_resource;
-}
-
-static void CloseInputFile(struct AvxInputContext *input) {
- fclose(input->file);
- if (input->file_type == FILE_TYPE_Y4M) y4m_input_close(&input->y4m);
-}
-
-void DuckyEncode::FreeEncoder() {
- EncoderResource *enc_resource = &impl_ptr_->enc_resource;
- CloseInputFile(&impl_ptr_->input);
- aom_img_free(&enc_resource->img);
- DestroyStatsBufferCtx(&enc_resource->stats_buf_ctx,
- &enc_resource->stats_buffer);
- BufferPool *buffer_pool = enc_resource->ppi->cpi->common.buffer_pool;
- av1_destroy_context_and_bufferpool(enc_resource->ppi->cpi, &buffer_pool);
- av1_remove_primary_compressor(enc_resource->ppi);
- enc_resource->ppi = nullptr;
-}
-
-static int ReadFrame(struct AvxInputContext *input_ctx, aom_image_t *img) {
- FILE *f = input_ctx->file;
- y4m_input *y4m = &input_ctx->y4m;
- int shortread = 0;
-
- if (input_ctx->file_type == FILE_TYPE_Y4M) {
- if (y4m_input_fetch_frame(y4m, f, img) < 1) return 0;
- } else {
- shortread = read_yuv_frame(input_ctx, img);
- }
-
- return !shortread;
-}
-
-std::vector<FIRSTPASS_STATS> DuckyEncode::ComputeFirstPassStats() {
- aom_enc_pass pass = AOM_RC_FIRST_PASS;
- InitEncoder(pass, nullptr);
- AV1_PRIMARY *ppi = impl_ptr_->enc_resource.ppi;
- EncoderResource *enc_resource = &impl_ptr_->enc_resource;
- struct lookahead_ctx *lookahead = ppi->lookahead;
- int frame_count = impl_ptr_->video_info.frame_count;
- aom_rational64_t timestamp_ratio = impl_ptr_->timestamp_ratio;
- // TODO(angiebird): Ideally, ComputeFirstPassStats() doesn't output
- // bitstream. Do we need bitstream buffer here?
- std::vector<uint8_t> buf(1000);
- std::vector<FIRSTPASS_STATS> stats_list;
- for (int i = 0; i < frame_count; ++i) {
- if (ReadFrame(&impl_ptr_->input, &impl_ptr_->enc_resource.img)) {
- // TODO(angiebird): Set ts_start/ts_end properly
- int64_t ts_start = enc_resource->lookahead_push_count;
- int64_t ts_end = ts_start + 1;
- YV12_BUFFER_CONFIG sd;
- image2yuvconfig(&enc_resource->img, &sd);
- av1_lookahead_push(lookahead, &sd, ts_start, ts_end,
- /*use_highbitdepth=*/0, /*flags=*/0);
- ++enc_resource->lookahead_push_count;
- AV1_COMP_DATA cpi_data = {};
- cpi_data.cx_data = buf.data();
- cpi_data.cx_data_sz = buf.size();
- cpi_data.frame_size = 0;
- cpi_data.flush = 1; // Makes av1_get_compressed_data process a frame
- cpi_data.ts_frame_start = ts_start;
- cpi_data.ts_frame_end = ts_end;
- cpi_data.pop_lookahead = 1;
- cpi_data.timestamp_ratio = &timestamp_ratio;
- // av1_get_compressed_data only generates first pass stats not
- // compresses data
- int res = av1_get_compressed_data(ppi->cpi, &cpi_data);
- (void)res;
- assert(res == static_cast<int>(AOM_CODEC_OK));
- stats_list.push_back(*(ppi->twopass.stats_buf_ctx->stats_in_end - 1));
- av1_post_encode_updates(ppi->cpi, &cpi_data);
- }
- }
- av1_end_first_pass(ppi->cpi);
-
- FreeEncoder();
- return stats_list;
-}
-
-void DuckyEncode::StartEncode(const std::vector<FIRSTPASS_STATS> &stats_list) {
- aom_enc_pass pass = AOM_RC_SECOND_PASS;
- impl_ptr_->stats_list = stats_list;
- InitEncoder(pass, &stats_list);
- write_temp_delimiter_ = true;
-}
-
-static void DuckyEncodeInfoSetGopStruct(AV1_PRIMARY *ppi,
- const GopStruct &gop_struct,
- const GopEncodeInfo &gop_encode_info) {
- GF_GROUP *gf_group = &ppi->gf_group;
- ppi->p_rc.baseline_gf_interval = gop_struct.show_frame_count;
- ppi->internal_altref_allowed = 1;
-
- gf_group->size = static_cast<int>(gop_struct.gop_frame_list.size());
- gf_group->max_layer_depth = 0;
-
- int i = 0;
- for (const auto &frame : gop_struct.gop_frame_list) {
- gf_group->update_type[i] = (int)frame.update_type;
- if (frame.update_type == GopFrameType::kRegularArf) gf_group->arf_index = i;
-
- gf_group->frame_type[i] = !frame.is_key_frame;
-
- gf_group->q_val[i] = gop_encode_info.param_list[i].q_index;
- gf_group->rdmult_val[i] = gop_encode_info.param_list[i].rdmult;
-
- gf_group->cur_frame_idx[i] = 0;
- gf_group->arf_src_offset[i] = frame.order_idx - frame.display_idx;
- gf_group->cur_frame_idx[i] = frame.display_idx;
- gf_group->src_offset[i] = 0;
-
- // TODO(jingning): Placeholder - update the arf boost.
- gf_group->arf_boost[i] = 500;
- gf_group->layer_depth[i] = frame.layer_depth;
- gf_group->max_layer_depth =
- AOMMAX(frame.layer_depth, gf_group->max_layer_depth);
- gf_group->refbuf_state[i] =
- frame.is_key_frame ? REFBUF_RESET : REFBUF_UPDATE;
-
- std::fill_n(gf_group->ref_frame_list[i], REF_FRAMES, -1);
- gf_group->update_ref_idx[i] = -1;
- for (int ref_idx = 0;
- ref_idx < static_cast<int>(frame.ref_frame_list.size()); ++ref_idx) {
- int ref_frame = static_cast<int>(frame.ref_frame_list[ref_idx].name);
- gf_group->ref_frame_list[i][ref_frame] =
- static_cast<int8_t>(frame.ref_frame_list[ref_idx].index);
- }
- gf_group->update_ref_idx[i] = frame.update_ref_idx;
- gf_group->primary_ref_idx[i] = frame.primary_ref_frame.index;
- ++i;
- }
- ppi->cpi->gf_frame_index = 0;
-}
-
-static void DuckyEncodeInfoSetEncodeFrameDecision(
- DuckyEncodeInfo *ducky_encode_info, const EncodeFrameDecision &decision) {
- DuckyEncodeFrameInfo *frame_info = &ducky_encode_info->frame_info;
- *frame_info = {};
- frame_info->qp_mode = static_cast<DUCKY_ENCODE_FRAME_MODE>(decision.qp_mode);
- frame_info->gop_mode = static_cast<DUCKY_ENCODE_GOP_MODE>(decision.gop_mode);
- frame_info->q_index = decision.parameters.q_index;
- frame_info->rdmult = decision.parameters.rdmult;
- const size_t num_superblocks =
- decision.parameters.superblock_encode_params.size();
- frame_info->delta_q_enabled = 0;
- if (num_superblocks > 1) {
- frame_info->delta_q_enabled = 1;
- frame_info->superblock_encode_qindex = new int[num_superblocks];
- frame_info->superblock_encode_rdmult = new int[num_superblocks];
- for (size_t i = 0; i < num_superblocks; ++i) {
- frame_info->superblock_encode_qindex[i] =
- decision.parameters.superblock_encode_params[i].q_index;
- frame_info->superblock_encode_rdmult[i] =
- decision.parameters.superblock_encode_params[i].rdmult;
- }
- }
-}
-
-static void DuckyEncodeInfoGetEncodeFrameResult(
- const DuckyEncodeInfo *ducky_encode_info, EncodeFrameResult *result) {
- const DuckyEncodeFrameResult &frame_result = ducky_encode_info->frame_result;
- result->global_order_idx = frame_result.global_order_idx;
- result->q_index = frame_result.q_index;
- result->rdmult = frame_result.rdmult;
- result->rate = frame_result.rate;
- result->dist = frame_result.dist;
- result->psnr = frame_result.psnr;
-}
-
-static void WriteObu(AV1_PRIMARY *ppi, AV1_COMP_DATA *cpi_data) {
- AV1_COMP *const cpi = ppi->cpi;
- uint32_t obu_header_size = 1;
- const uint32_t obu_payload_size = 0;
- const size_t length_field_size = aom_uleb_size_in_bytes(obu_payload_size);
-
- const size_t move_offset = obu_header_size + length_field_size;
- memmove(cpi_data->cx_data + move_offset, cpi_data->cx_data,
- cpi_data->frame_size);
- obu_header_size =
- av1_write_obu_header(&ppi->level_params, &cpi->frame_header_count,
- OBU_TEMPORAL_DELIMITER, 0, cpi_data->cx_data);
-
- // OBUs are preceded/succeeded by an unsigned leb128 coded integer.
- if (av1_write_uleb_obu_size(obu_header_size, obu_payload_size,
- cpi_data->cx_data) != AOM_CODEC_OK) {
- aom_internal_error(&ppi->error, AOM_CODEC_ERROR, NULL);
- }
-
- cpi_data->frame_size +=
- obu_header_size + obu_payload_size + length_field_size;
-}
-
-TplGopStats DuckyEncode::ObtainTplStats(const GopStruct gop_struct,
- bool rate_dist_present) {
- TplGopStats tpl_gop_stats;
-
- AV1_PRIMARY *ppi = impl_ptr_->enc_resource.ppi;
- const uint8_t block_mis_log2 = ppi->tpl_data.tpl_stats_block_mis_log2;
-
- for (size_t idx = 0; idx < gop_struct.gop_frame_list.size(); ++idx) {
- TplFrameStats tpl_frame_stats = {};
- tpl_frame_stats.rate_dist_present = rate_dist_present;
-
- TplDepFrame *tpl_frame = &ppi->tpl_data.tpl_frame[idx];
- if (gop_struct.gop_frame_list[idx].update_type == GopFrameType::kOverlay ||
- gop_struct.gop_frame_list[idx].update_type ==
- GopFrameType::kIntermediateOverlay) {
- tpl_gop_stats.frame_stats_list.push_back(tpl_frame_stats);
- continue;
- }
-
- int ref_frame_index_mapping[REF_FRAMES] = { 0 };
- const GopFrame &gop_frame = gop_struct.gop_frame_list[idx];
-
- for (auto &rf : gop_frame.ref_frame_list) {
- ref_frame_index_mapping[static_cast<int>(rf.name)] = rf.index;
- }
-
- const int mi_rows = tpl_frame->mi_rows;
- const int mi_cols = tpl_frame->mi_cols;
- const int tpl_frame_stride = tpl_frame->stride;
- tpl_frame_stats.frame_height = mi_rows * MI_SIZE;
- tpl_frame_stats.frame_width = mi_cols * MI_SIZE;
- tpl_frame_stats.min_block_size = (1 << block_mis_log2) * MI_SIZE;
-
- const int mi_step = 1 << block_mis_log2;
- for (int mi_row = 0; mi_row < mi_rows; mi_row += mi_step) {
- for (int mi_col = 0; mi_col < mi_cols; mi_col += mi_step) {
- int tpl_blk_pos = (mi_row >> block_mis_log2) * tpl_frame_stride +
- (mi_col >> block_mis_log2);
- TplDepStats *tpl_stats_ptr = &tpl_frame->tpl_stats_ptr[tpl_blk_pos];
-
- TplBlockStats block_stats;
- block_stats.row = mi_row * MI_SIZE;
- block_stats.col = mi_col * MI_SIZE;
- block_stats.height = (1 << block_mis_log2) * MI_SIZE;
- block_stats.width = (1 << block_mis_log2) * MI_SIZE;
-
- block_stats.inter_cost =
- RDCOST(tpl_frame->base_rdmult, tpl_stats_ptr->recrf_rate,
- tpl_stats_ptr->recrf_dist);
- block_stats.intra_cost =
- RDCOST(tpl_frame->base_rdmult, tpl_stats_ptr->intra_rate,
- tpl_stats_ptr->intra_dist);
-
- if (tpl_frame_stats.rate_dist_present) {
- block_stats.recrf_dist = tpl_stats_ptr->recrf_dist;
- block_stats.recrf_rate = tpl_stats_ptr->recrf_rate;
- block_stats.intra_pred_err = tpl_stats_ptr->intra_sse;
- block_stats.inter_pred_err = tpl_stats_ptr->recrf_sse;
- }
-
- block_stats.ref_frame_index = { -1, -1 };
-
- for (int i = 0; i < kBlockRefCount; ++i) {
- if (tpl_stats_ptr->ref_frame_index[i] >= 0) {
- block_stats.ref_frame_index[i] =
- ref_frame_index_mapping[tpl_stats_ptr->ref_frame_index[i] + 1];
- block_stats.mv[i] = {
- tpl_stats_ptr->mv[tpl_stats_ptr->ref_frame_index[i]].as_mv.row,
- tpl_stats_ptr->mv[tpl_stats_ptr->ref_frame_index[i]].as_mv.col, 3
- };
- }
- }
- tpl_frame_stats.block_stats_list.push_back(block_stats);
- }
- }
-
- tpl_gop_stats.frame_stats_list.push_back(tpl_frame_stats);
- }
-
- return tpl_gop_stats;
-}
-
-// Obtain TPL stats through ducky_encode.
-// TODO(jianj): Populate rate_dist_present flag through qmode_rc_encoder
-std::vector<TplGopStats> DuckyEncode::ComputeTplStats(
- const std::vector<FIRSTPASS_STATS> &stats_list,
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list) {
- StartEncode(stats_list);
- std::vector<TplGopStats> tpl_gop_stats_list;
- AV1_PRIMARY *ppi = impl_ptr_->enc_resource.ppi;
- const VideoInfo &video_info = impl_ptr_->video_info;
- write_temp_delimiter_ = true;
- AllocateBitstreamBuffer(video_info);
-
- // Go through each gop and encode each frame in the gop
- for (size_t i = 0; i < gop_list.size(); ++i) {
- const aom::GopStruct &gop_struct = gop_list[i];
- const aom::GopEncodeInfo &gop_encode_info = gop_encode_info_list[i];
-
- DuckyEncodeInfoSetGopStruct(ppi, gop_struct, gop_encode_info);
-
- aom::TplGopStats tpl_gop_stats;
- for (auto &frame_param : gop_encode_info.param_list) {
- // Encode the frame specified by frame_param.
- aom::EncodeFrameDecision frame_decision = { aom::EncodeFrameMode::kQindex,
- aom::EncodeGopMode::kGopRcl,
- frame_param };
- EncodeFrame(frame_decision);
- if (ppi->cpi->common.show_frame) pending_ctx_size_ = 0;
- write_temp_delimiter_ = ppi->cpi->common.show_frame;
- }
- // rate_dist_present is hard-coded to false here and still needs to be populated (see the TODO above).
- tpl_gop_stats = ObtainTplStats(gop_struct, 0);
- tpl_gop_stats_list.push_back(tpl_gop_stats);
- }
- EndEncode();
- return tpl_gop_stats_list;
-}
-
-std::vector<TplGopStats> DuckyEncode::ComputeTwoPassTplStats(
- const std::vector<FIRSTPASS_STATS> &stats_list,
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list,
- const GopEncodeInfoList &alt_gop_encode_info_list) {
- std::vector<TplGopStats> first_tpl_gop_stats_list =
- ComputeTplStats(stats_list, gop_list, gop_encode_info_list);
- const std::vector<TplGopStats> second_tpl_gop_stats_list =
- ComputeTplStats(stats_list, gop_list, alt_gop_encode_info_list);
- assert(first_tpl_gop_stats_list.size() == second_tpl_gop_stats_list.size());
-
- // Set alternate_block_stats_list in first_tpl_gop_stats_list
- // and return first_tpl_gop_stats_list
- for (size_t i = 0; i < first_tpl_gop_stats_list.size(); ++i) {
- for (size_t j = 0; j < first_tpl_gop_stats_list[i].frame_stats_list.size();
- ++j) {
- first_tpl_gop_stats_list[i]
- .frame_stats_list[j]
- .alternate_block_stats_list =
- second_tpl_gop_stats_list[i].frame_stats_list[j].block_stats_list;
- }
- }
- return first_tpl_gop_stats_list;
-}
-
-// Conduct final encoding process.
-std::vector<EncodeFrameResult> DuckyEncode::EncodeVideo(
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list) {
- AV1_PRIMARY *ppi = impl_ptr_->enc_resource.ppi;
- std::vector<EncodeFrameResult> encoded_frame_list;
- const VideoInfo &video_info = impl_ptr_->video_info;
-
- write_temp_delimiter_ = true;
- AllocateBitstreamBuffer(video_info);
-
- // Go through each gop and encode each frame in the gop
- for (size_t i = 0; i < gop_list.size(); ++i) {
- const aom::GopStruct &gop_struct = gop_list[i];
- const aom::GopEncodeInfo &gop_encode_info = gop_encode_info_list[i];
- DuckyEncodeInfoSetGopStruct(ppi, gop_struct, gop_encode_info);
-
- for (auto &frame_param : gop_encode_info.param_list) {
- aom::EncodeFrameDecision frame_decision = { aom::EncodeFrameMode::kQindex,
- aom::EncodeGopMode::kGopRcl,
- frame_param };
- EncodeFrameResult temp_result = EncodeFrame(frame_decision);
- if (ppi->cpi->common.show_frame) {
- bitstream_buf_.resize(pending_ctx_size_);
- EncodeFrameResult encode_frame_result = temp_result;
- encode_frame_result.bitstream_buf = bitstream_buf_;
- encoded_frame_list.push_back(encode_frame_result);
-
- AllocateBitstreamBuffer(video_info);
- }
- write_temp_delimiter_ = ppi->cpi->common.show_frame;
- }
- }
-
- return encoded_frame_list;
-}
-
-EncodeFrameResult DuckyEncode::EncodeFrame(
- const EncodeFrameDecision &decision) {
- EncodeFrameResult encode_frame_result = {};
- encode_frame_result.bitstream_buf = bitstream_buf_;
- AV1_PRIMARY *ppi = impl_ptr_->enc_resource.ppi;
- aom_image_t *img = &impl_ptr_->enc_resource.img;
- AV1_COMP *const cpi = ppi->cpi;
- struct lookahead_ctx *lookahead = ppi->lookahead;
-
- while (!av1_lookahead_full(lookahead)) {
- if (ReadFrame(&impl_ptr_->input, img)) {
- YV12_BUFFER_CONFIG sd;
- image2yuvconfig(img, &sd);
- int64_t ts_start = impl_ptr_->enc_resource.lookahead_push_count;
- int64_t ts_end = ts_start + 1;
- av1_lookahead_push(lookahead, &sd, ts_start, ts_end,
- /*use_highbitdepth=*/0, /*flags=*/0);
- ++impl_ptr_->enc_resource.lookahead_push_count;
- } else {
- break;
- }
- }
-
- AV1_COMP_DATA cpi_data = {};
- cpi_data.cx_data = bitstream_buf_.data() + pending_ctx_size_;
- cpi_data.cx_data_sz = bitstream_buf_.size() - pending_ctx_size_;
- cpi_data.frame_size = 0;
- cpi_data.flush = 1;
- // ts_frame_start and ts_frame_end are not as important since we are focusing
- // on q mode
- cpi_data.ts_frame_start = impl_ptr_->enc_resource.encode_frame_count;
- cpi_data.ts_frame_end = cpi_data.ts_frame_start + 1;
- cpi_data.pop_lookahead = 1;
- cpi_data.timestamp_ratio = &impl_ptr_->timestamp_ratio;
- ++impl_ptr_->enc_resource.encode_frame_count;
-
- av1_compute_num_workers_for_mt(cpi);
- av1_init_frame_mt(ppi, cpi);
-
- DuckyEncodeInfoSetEncodeFrameDecision(&cpi->ducky_encode_info, decision);
- const int status = av1_get_compressed_data(cpi, &cpi_data);
-
- if (write_temp_delimiter_) WriteObu(ppi, &cpi_data);
- (void)status;
- assert(status == static_cast<int>(AOM_CODEC_OK));
- DuckyEncodeInfoGetEncodeFrameResult(&cpi->ducky_encode_info,
- &encode_frame_result);
- av1_post_encode_updates(cpi, &cpi_data);
- if (cpi->common.show_frame) {
- // decrement frames_left counter
- ppi->frames_left = AOMMAX(0, ppi->frames_left - 1);
- }
-
- pending_ctx_size_ += cpi_data.frame_size;
-
- fprintf(stderr, "frame %d, qp = %d, size %d, PSNR %f\n",
- encode_frame_result.global_order_idx, encode_frame_result.q_index,
- encode_frame_result.rate, encode_frame_result.psnr);
- delete[] cpi->ducky_encode_info.frame_info.superblock_encode_qindex;
- delete[] cpi->ducky_encode_info.frame_info.superblock_encode_rdmult;
- return encode_frame_result;
-}
-
-void DuckyEncode::EndEncode() { FreeEncoder(); }
-
-void DuckyEncode::AllocateBitstreamBuffer(const VideoInfo &video_info) {
- pending_ctx_size_ = 0;
- // TODO(angiebird): Set the bitstream_buf size to a conservative upper bound.
- bitstream_buf_.assign(
- video_info.frame_width * video_info.frame_height * 3 * 8, 0);
-}
-} // namespace aom
diff --git a/av1/qmode_rc/ducky_encode.h b/av1/qmode_rc/ducky_encode.h
deleted file mode 100644
index 5dee2a5..0000000
--- a/av1/qmode_rc/ducky_encode.h
+++ /dev/null
@@ -1,117 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#ifndef AOM_AV1_QMODE_RC_DUCKY_ENCODE_H_
-#define AOM_AV1_QMODE_RC_DUCKY_ENCODE_H_
-
-#include <cstddef>
-#include <cstdint>
-#include <memory>
-#include <string>
-#include <vector>
-
-#include "aom/aom_encoder.h"
-#include "av1/encoder/firstpass.h"
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-
-namespace aom {
-struct VideoInfo {
- int frame_width;
- int frame_height;
- aom_rational_t frame_rate;
- aom_img_fmt_t img_fmt;
- int frame_count;
- std::string file_path;
-};
-
-struct EncodeFrameResult {
- std::vector<uint8_t> bitstream_buf;
- // TODO(angiebird): update global_coding_idx and global_order_idx properly.
- int global_coding_idx;
- int global_order_idx;
- int q_index;
- int rdmult;
- int rate;
- int64_t dist;
- double psnr;
-};
-
-enum class EncodeFrameMode {
- kNone, // Let native AV1 determine q index and rdmult
- kQindex, // DuckyEncode determines q index and AV1 determines rdmult
- kQindexRdmult, // DuckyEncode determines q index and rdmult
-};
-
-enum class EncodeGopMode {
- kNone, // native AV1 decides GOP
- kGopRcl, // rate control lib decides GOP
-};
-
-struct EncodeFrameDecision {
- EncodeFrameMode qp_mode;
- EncodeGopMode gop_mode;
- FrameEncodeParameters parameters;
-};
-
-using GopEncodeInfoList = std::vector<GopEncodeInfo>;
-
-// DuckyEncode is an experimental C++ encoder interface for two-pass mode.
-// This object can be used to do zero or more encode passes, where each encode
-// pass consists of:
-// - StartEncode()
-// - Zero or more calls to EncodeFrame()
-// - EndEncode()
-// Encode passes may not overlap, and any other sequence of these calls is
-// invalid.
-class DuckyEncode {
- public:
- explicit DuckyEncode(const VideoInfo &video_info, BLOCK_SIZE sb_size,
- int max_ref_frames, int speed, int base_qindex);
- ~DuckyEncode();
- std::vector<FIRSTPASS_STATS> ComputeFirstPassStats();
- void StartEncode(const std::vector<FIRSTPASS_STATS> &stats_list);
-
- TplGopStats ObtainTplStats(const GopStruct gop_struct,
- bool rate_dist_present);
-
- std::vector<TplGopStats> ComputeTplStats(
- const std::vector<FIRSTPASS_STATS> &stats_list,
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list);
-
- std::vector<TplGopStats> ComputeTwoPassTplStats(
- const std::vector<FIRSTPASS_STATS> &stats_list,
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list,
- const GopEncodeInfoList &alt_gop_encode_info_list);
-
- std::vector<EncodeFrameResult> EncodeVideo(
- const GopStructList &gop_list,
- const GopEncodeInfoList &gop_encode_info_list);
- EncodeFrameResult EncodeFrame(const EncodeFrameDecision &decision);
- void EndEncode();
- void AllocateBitstreamBuffer(const VideoInfo &video_info);
-
- private:
- void InitEncoder(aom_enc_pass pass,
- const std::vector<FIRSTPASS_STATS> *stats_list);
- void FreeEncoder();
-
- private:
- class EncodeImpl;
- std::unique_ptr<EncodeImpl> impl_ptr_;
- bool write_temp_delimiter_;
- std::vector<uint8_t> bitstream_buf_;
- size_t pending_ctx_size_;
-};
-} // namespace aom
-
-#endif // AOM_AV1_QMODE_RC_DUCKY_ENCODE_H_
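A hedged usage sketch of the pass protocol documented in the class comment above. It mirrors the per-frame loop in ComputeTplStats()/EncodeVideo(); the real callers also set up GOP structures and manage bitstream buffers, which this sketch omits.

#include <vector>

#include "av1/qmode_rc/ducky_encode.h"

// Illustrative only: runs one encode pass over precomputed frame parameters.
void SketchOnePass(aom::DuckyEncode &encoder,
                   const std::vector<FIRSTPASS_STATS> &stats_list,
                   const std::vector<aom::FrameEncodeParameters> &param_list) {
  encoder.StartEncode(stats_list);
  for (const auto &frame_param : param_list) {
    aom::EncodeFrameDecision decision = { aom::EncodeFrameMode::kQindex,
                                          aom::EncodeGopMode::kGopRcl,
                                          frame_param };
    encoder.EncodeFrame(decision);
  }
  encoder.EndEncode();
}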
diff --git a/av1/qmode_rc/ratectrl_qmode.cc b/av1/qmode_rc/ratectrl_qmode.cc
deleted file mode 100644
index 0a2892d..0000000
--- a/av1/qmode_rc/ratectrl_qmode.cc
+++ /dev/null
@@ -1,1552 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-#include "av1/qmode_rc/ratectrl_qmode.h"
-
-#include <algorithm>
-#include <cassert>
-#include <climits>
-#include <functional>
-#include <numeric>
-#include <sstream>
-#include <unordered_map>
-#include <unordered_set>
-#include <vector>
-
-#include "aom/aom_codec.h"
-#include "av1/encoder/pass2_strategy.h"
-#include "av1/encoder/tpl_model.h"
-
-namespace aom {
-
-// This is used before division to ensure that the divisor isn't zero or
-// too close to zero.
-static double ModifyDivisor(double divisor) {
- const double kEpsilon = 0.0000001;
- return (divisor < 0 ? std::min(divisor, -kEpsilon)
- : std::max(divisor, kEpsilon));
-}
-
-GopFrame GopFrameInvalid() {
- GopFrame gop_frame = {};
- gop_frame.is_valid = false;
- gop_frame.coding_idx = -1;
- gop_frame.order_idx = -1;
- return gop_frame;
-}
-
-void SetGopFrameByType(GopFrameType gop_frame_type, GopFrame *gop_frame) {
- gop_frame->update_type = gop_frame_type;
- switch (gop_frame_type) {
- case GopFrameType::kRegularKey:
- gop_frame->is_key_frame = 1;
- gop_frame->is_arf_frame = 0;
- gop_frame->is_show_frame = 1;
- gop_frame->is_golden_frame = 1;
- gop_frame->encode_ref_mode = EncodeRefMode::kRegular;
- break;
- case GopFrameType::kRegularGolden:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 0;
- gop_frame->is_show_frame = 1;
- gop_frame->is_golden_frame = 1;
- gop_frame->encode_ref_mode = EncodeRefMode::kRegular;
- break;
- case GopFrameType::kRegularArf:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 1;
- gop_frame->is_show_frame = 0;
- gop_frame->is_golden_frame = 1;
- gop_frame->encode_ref_mode = EncodeRefMode::kRegular;
- break;
- case GopFrameType::kIntermediateArf:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 1;
- gop_frame->is_show_frame = 0;
- gop_frame->is_golden_frame = gop_frame->layer_depth <= 2 ? 1 : 0;
- gop_frame->encode_ref_mode = EncodeRefMode::kRegular;
- break;
- case GopFrameType::kRegularLeaf:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 0;
- gop_frame->is_show_frame = 1;
- gop_frame->is_golden_frame = 0;
- gop_frame->encode_ref_mode = EncodeRefMode::kRegular;
- break;
- case GopFrameType::kIntermediateOverlay:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 0;
- gop_frame->is_show_frame = 1;
- gop_frame->is_golden_frame = 0;
- gop_frame->encode_ref_mode = EncodeRefMode::kShowExisting;
- break;
- case GopFrameType::kOverlay:
- gop_frame->is_key_frame = 0;
- gop_frame->is_arf_frame = 0;
- gop_frame->is_show_frame = 1;
- gop_frame->is_golden_frame = 0;
- gop_frame->encode_ref_mode = EncodeRefMode::kOverlay;
- break;
- }
-}
-
-GopFrame GopFrameBasic(int global_coding_idx_offset,
- int global_order_idx_offset, int coding_idx,
- int order_idx, int depth, int display_idx,
- GopFrameType gop_frame_type) {
- GopFrame gop_frame = {};
- gop_frame.is_valid = true;
- gop_frame.coding_idx = coding_idx;
- gop_frame.order_idx = order_idx;
- gop_frame.display_idx = display_idx;
- gop_frame.global_coding_idx = global_coding_idx_offset + coding_idx;
- gop_frame.global_order_idx = global_order_idx_offset + order_idx;
- gop_frame.layer_depth = depth + kLayerDepthOffset;
- gop_frame.colocated_ref_idx = -1;
- gop_frame.update_ref_idx = -1;
- SetGopFrameByType(gop_frame_type, &gop_frame);
- return gop_frame;
-}
-
-// This function creates gop frames with display-order indices from
-// order_start to order_end - 1. It recursively introduces intermediate ARFs
-// until the maximum depth is met or there are fewer than kMinIntervalToAddArf
-// regular frames between two ARFs. The remaining regular frames are then
-// added into the gop_struct.
-void ConstructGopMultiLayer(GopStruct *gop_struct,
- RefFrameManager *ref_frame_manager, int max_depth,
- int depth, int order_start, int order_end) {
- GopFrame gop_frame;
- int num_frames = order_end - order_start;
- const int global_coding_idx_offset = gop_struct->global_coding_idx_offset;
- const int global_order_idx_offset = gop_struct->global_order_idx_offset;
- // If there are fewer than kMinIntervalToAddArf frames, stop introducing ARFs
- if (depth < max_depth && num_frames >= kMinIntervalToAddArf) {
- int order_mid = (order_start + order_end) / 2;
- // intermediate ARF
- gop_frame = GopFrameBasic(
- global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct->gop_frame_list.size()), order_mid, depth,
- gop_struct->display_tracker, GopFrameType::kIntermediateArf);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct->gop_frame_list.push_back(gop_frame);
- ConstructGopMultiLayer(gop_struct, ref_frame_manager, max_depth, depth + 1,
- order_start, order_mid);
- // show existing intermediate ARF
- gop_frame =
- GopFrameBasic(global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct->gop_frame_list.size()),
- order_mid, max_depth, gop_struct->display_tracker,
- GopFrameType::kIntermediateOverlay);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct->gop_frame_list.push_back(gop_frame);
- ++gop_struct->display_tracker;
- ConstructGopMultiLayer(gop_struct, ref_frame_manager, max_depth, depth + 1,
- order_mid + 1, order_end);
- } else {
- // regular frame
- for (int i = order_start; i < order_end; ++i) {
- gop_frame = GopFrameBasic(
- global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct->gop_frame_list.size()), i, max_depth,
- gop_struct->display_tracker, GopFrameType::kRegularLeaf);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct->gop_frame_list.push_back(gop_frame);
- ++gop_struct->display_tracker;
- }
- }
-}
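A minimal standalone sketch of the recursion above, assuming a stand-in value of 3 for kMinIntervalToAddArf purely for illustration (the real constant is defined elsewhere in this library). For an 8-frame display range it prints the resulting coding order: an intermediate ARF at each midpoint, then the left half, the show-existing overlay, then the right half, with regular leaves at maximum depth.

#include <cstdio>

constexpr int kMinArfInterval = 3;  // hypothetical stand-in for illustration

void SketchGop(int depth, int max_depth, int order_start, int order_end) {
  const int num_frames = order_end - order_start;
  if (depth < max_depth && num_frames >= kMinArfInterval) {
    const int order_mid = (order_start + order_end) / 2;
    printf("depth %d: intermediate ARF at display %d\n", depth, order_mid);
    SketchGop(depth + 1, max_depth, order_start, order_mid);
    printf("depth %d: show-existing overlay at display %d\n", depth, order_mid);
    SketchGop(depth + 1, max_depth, order_mid + 1, order_end);
  } else {
    for (int i = order_start; i < order_end; ++i)
      printf("depth %d: regular leaf at display %d\n", depth, i);
  }
}

int main() {
  SketchGop(/*depth=*/1, /*max_depth=*/3, /*order_start=*/0, /*order_end=*/8);
  return 0;
}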
-
-GopStruct ConstructGop(RefFrameManager *ref_frame_manager, int show_frame_count,
- bool has_key_frame, int global_coding_idx_offset,
- int global_order_idx_offset) {
- GopStruct gop_struct;
- gop_struct.show_frame_count = show_frame_count;
- gop_struct.global_coding_idx_offset = global_coding_idx_offset;
- gop_struct.global_order_idx_offset = global_order_idx_offset;
- int order_start = 0;
- int order_end = show_frame_count - 1;
-
- // TODO(jingning): Re-enable the use of pyramid coding structure.
- bool has_arf_frame = show_frame_count > kMinIntervalToAddArf;
-
- gop_struct.display_tracker = 0;
-
- GopFrame gop_frame;
- if (has_key_frame) {
- const int key_frame_depth = -1;
- ref_frame_manager->Reset();
- gop_frame = GopFrameBasic(
- global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct.gop_frame_list.size()), order_start,
- key_frame_depth, gop_struct.display_tracker, GopFrameType::kRegularKey);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct.gop_frame_list.push_back(gop_frame);
- order_start++;
- ++gop_struct.display_tracker;
- }
-
- const int arf_depth = 0;
- if (has_arf_frame) {
- // Use a multi-layer pyramid coding structure.
- gop_frame = GopFrameBasic(
- global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct.gop_frame_list.size()), order_end,
- arf_depth, gop_struct.display_tracker, GopFrameType::kRegularArf);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct.gop_frame_list.push_back(gop_frame);
- ConstructGopMultiLayer(&gop_struct, ref_frame_manager,
- ref_frame_manager->MaxRefFrame() - 1, arf_depth + 1,
- order_start, order_end);
- // Overlay
- gop_frame =
- GopFrameBasic(global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct.gop_frame_list.size()),
- order_end, ref_frame_manager->MaxRefFrame() - 1,
- gop_struct.display_tracker, GopFrameType::kOverlay);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct.gop_frame_list.push_back(gop_frame);
- ++gop_struct.display_tracker;
- } else {
- // Use IPPP format.
- for (int i = order_start; i <= order_end; ++i) {
- gop_frame = GopFrameBasic(
- global_coding_idx_offset, global_order_idx_offset,
- static_cast<int>(gop_struct.gop_frame_list.size()), i, arf_depth + 1,
- gop_struct.display_tracker, GopFrameType::kRegularLeaf);
- ref_frame_manager->UpdateRefFrameTable(&gop_frame);
- gop_struct.gop_frame_list.push_back(gop_frame);
- ++gop_struct.display_tracker;
- }
- }
-
- return gop_struct;
-}
-
-Status AV1RateControlQMode::SetRcParam(const RateControlParam &rc_param) {
- std::ostringstream error_message;
- if (rc_param.max_gop_show_frame_count <
- std::max(4, rc_param.min_gop_show_frame_count)) {
- error_message << "max_gop_show_frame_count ("
- << rc_param.max_gop_show_frame_count
- << ") must be at least 4 and may not be less than "
- "min_gop_show_frame_count ("
- << rc_param.min_gop_show_frame_count << ")";
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- if (rc_param.ref_frame_table_size < 1 || rc_param.ref_frame_table_size > 8) {
- error_message << "ref_frame_table_size (" << rc_param.ref_frame_table_size
- << ") must be in the range [1, 8].";
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- if (rc_param.max_ref_frames < 1 || rc_param.max_ref_frames > 7) {
- error_message << "max_ref_frames (" << rc_param.max_ref_frames
- << ") must be in the range [1, 7].";
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- if (rc_param.base_q_index < 0 || rc_param.base_q_index > 255) {
- error_message << "base_q_index (" << rc_param.base_q_index
- << ") must be in the range [0, 255].";
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- if (rc_param.frame_width < 16 || rc_param.frame_width > 16384 ||
- rc_param.frame_height < 16 || rc_param.frame_height > 16384) {
- error_message << "frame_width (" << rc_param.frame_width
- << ") and frame_height (" << rc_param.frame_height
- << ") must be in the range [16, 16384].";
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- rc_param_ = rc_param;
- return { AOM_CODEC_OK, "" };
-}
-
-// Threshold for use of the lagging second reference frame. High second ref
-// usage may point to a transient event like a flash or occlusion rather than
-// a real scene cut.
-// We adapt the threshold based on the number of frames seen so far in this
-// key-frame group.
-static double GetSecondRefUsageThreshold(int frame_count_so_far) {
- const int adapt_upto = 32;
- const double min_second_ref_usage_thresh = 0.085;
- const double second_ref_usage_thresh_max_delta = 0.035;
- if (frame_count_so_far >= adapt_upto) {
- return min_second_ref_usage_thresh + second_ref_usage_thresh_max_delta;
- }
- return min_second_ref_usage_thresh +
- ((double)frame_count_so_far / (adapt_upto - 1)) *
- second_ref_usage_thresh_max_delta;
-}
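Concretely, the threshold ramps linearly from 0.085 at the start of a key-frame group to 0.12 once 32 frames have been seen. A minimal check of that arithmetic (a local restatement for the sketch, not the library function):

#include <cassert>
#include <cmath>

static double SecondRefUsageThreshold(int frame_count_so_far) {
  const int kAdaptUpto = 32;
  const double kMinThresh = 0.085;
  const double kMaxDelta = 0.035;
  if (frame_count_so_far >= kAdaptUpto) return kMinThresh + kMaxDelta;
  return kMinThresh +
         static_cast<double>(frame_count_so_far) / (kAdaptUpto - 1) * kMaxDelta;
}

int main() {
  assert(std::fabs(SecondRefUsageThreshold(0) - 0.085) < 1e-9);   // group start
  assert(std::fabs(SecondRefUsageThreshold(31) - 0.120) < 1e-9);  // fully adapted
  assert(std::fabs(SecondRefUsageThreshold(64) - 0.120) < 1e-9);  // clamped
  return 0;
}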
-
-// Slide show transition detection.
-// Tests for case where there is very low error either side of the current frame
-// but much higher just for this frame. This can help detect key frames in
-// slide shows even where the slides are pictures of different sizes.
-// Also requires that intra and inter errors are very similar to help eliminate
-// harmful false positives.
-// It will not help if the transition is a fade or other multi-frame effect.
-static bool DetectSlideTransition(const FIRSTPASS_STATS &this_frame,
- const FIRSTPASS_STATS &last_frame,
- const FIRSTPASS_STATS &next_frame) {
- // Intra / Inter threshold very low
- constexpr double kVeryLowII = 1.5;
- // For clean slide transitions we expect a sharp single-frame spike in error.
- constexpr double kErrorSpike = 5.0;
-
- // TODO(angiebird): Understand the meaning of these conditions.
- return (this_frame.intra_error < (this_frame.coded_error * kVeryLowII)) &&
- (this_frame.coded_error > (last_frame.coded_error * kErrorSpike)) &&
- (this_frame.coded_error > (next_frame.coded_error * kErrorSpike));
-}
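As a concrete instance of the two thresholds (illustrative numbers only): with last and next coded_error equal to 10, this coded_error equal to 60, and this intra_error equal to 80, both spike conditions hold (60 > 10 * 5.0) and the intra/inter condition holds (80 < 60 * 1.5), so the frame is flagged as a slide transition.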
-
-// Check if there is a significant intra/inter error change between the current
-// frame and its neighbor. If so, we should further test whether the current
-// frame should be a key frame.
-static bool DetectIntraInterErrorChange(const FIRSTPASS_STATS &this_stats,
- const FIRSTPASS_STATS &last_stats,
- const FIRSTPASS_STATS &next_stats) {
- // Minimum % intra coding observed in first pass (1.0 = 100%)
- constexpr double kMinIntraLevel = 0.25;
-  // Minimum ratio between the % of intra coding and inter coding in the first
-  // pass after discounting neutral blocks (discounting neutral blocks in this
-  // way helps catch scene cuts in clips with very flat areas or letterbox
-  // format clips with image padding).
- constexpr double kIntraVsInterRatio = 2.0;
-
- const double modified_pcnt_inter =
- this_stats.pcnt_inter - this_stats.pcnt_neutral;
- const double pcnt_intra_min =
- std::max(kMinIntraLevel, kIntraVsInterRatio * modified_pcnt_inter);
-
- // In real scene cuts there is almost always a sharp change in the intra
- // or inter error score.
- constexpr double kErrorChangeThreshold = 0.4;
- const double last_this_error_ratio =
- fabs(last_stats.coded_error - this_stats.coded_error) /
- ModifyDivisor(this_stats.coded_error);
-
- const double this_next_error_ratio =
- fabs(last_stats.intra_error - this_stats.intra_error) /
- ModifyDivisor(this_stats.intra_error);
-
- // Maximum threshold for the relative ratio of intra error score vs best
- // inter error score.
- constexpr double kThisIntraCodedErrorRatioMax = 1.9;
- const double this_intra_coded_error_ratio =
- this_stats.intra_error / ModifyDivisor(this_stats.coded_error);
-
-  // For real scene cuts we expect an improvement in the intra/inter error
-  // ratio in the next frame.
- constexpr double kNextIntraCodedErrorRatioMin = 3.5;
- const double next_intra_coded_error_ratio =
- next_stats.intra_error / ModifyDivisor(next_stats.coded_error);
-
- double pcnt_intra = 1.0 - this_stats.pcnt_inter;
- return pcnt_intra > pcnt_intra_min &&
- this_intra_coded_error_ratio < kThisIntraCodedErrorRatioMax &&
- (last_this_error_ratio > kErrorChangeThreshold ||
- this_next_error_ratio > kErrorChangeThreshold ||
- next_intra_coded_error_ratio > kNextIntraCodedErrorRatioMin);
-}
-
-// Check whether the candidate can be a key frame.
-// This is a rewrite of test_candidate_kf().
-static bool TestCandidateKey(const FirstpassInfo &first_pass_info,
- int candidate_key_idx, int frames_since_prev_key) {
- const auto &stats_list = first_pass_info.stats_list;
- const int stats_count = static_cast<int>(stats_list.size());
- if (candidate_key_idx + 1 >= stats_count || candidate_key_idx - 1 < 0) {
- return false;
- }
- const auto &last_stats = stats_list[candidate_key_idx - 1];
- const auto &this_stats = stats_list[candidate_key_idx];
- const auto &next_stats = stats_list[candidate_key_idx + 1];
-
- if (frames_since_prev_key < 3) return false;
- const double second_ref_usage_threshold =
- GetSecondRefUsageThreshold(frames_since_prev_key);
- if (this_stats.pcnt_second_ref >= second_ref_usage_threshold) return false;
- if (next_stats.pcnt_second_ref >= second_ref_usage_threshold) return false;
-
- // Hard threshold where the first pass chooses intra for almost all blocks.
- // In such a case even if the frame is not a scene cut coding a key frame
- // may be a good option.
- constexpr double kVeryLowInterThreshold = 0.05;
- if (this_stats.pcnt_inter < kVeryLowInterThreshold ||
- DetectSlideTransition(this_stats, last_stats, next_stats) ||
- DetectIntraInterErrorChange(this_stats, last_stats, next_stats)) {
- double boost_score = 0.0;
- double decay_accumulator = 1.0;
-
- // We do "-1" because the candidate key is not counted.
- int stats_after_this_stats = stats_count - candidate_key_idx - 1;
-
- // Number of frames required to test for scene cut detection
- constexpr int kSceneCutKeyTestIntervalMax = 16;
-
- // Make sure we have enough stats after the candidate key.
- const int frames_to_test_after_candidate_key =
- std::min(kSceneCutKeyTestIntervalMax, stats_after_this_stats);
-
- // Examine how well the key frame predicts subsequent frames.
- int i;
- for (i = 1; i <= frames_to_test_after_candidate_key; ++i) {
- // Get the next frame details
- const auto &stats = stats_list[candidate_key_idx + i];
-
- // Cumulative effect of decay in prediction quality.
- if (stats.pcnt_inter > 0.85) {
- decay_accumulator *= stats.pcnt_inter;
- } else {
- decay_accumulator *= (0.85 + stats.pcnt_inter) / 2.0;
- }
-
- constexpr double kBoostFactor = 12.5;
- double next_iiratio =
- (kBoostFactor * stats.intra_error / ModifyDivisor(stats.coded_error));
- next_iiratio = std::min(next_iiratio, 128.0);
- double boost_score_increment = decay_accumulator * next_iiratio;
-
- // Keep a running total.
- boost_score += boost_score_increment;
-
- // Test various breakout clauses.
- // TODO(any): Test of intra error should be normalized to an MB.
- // TODO(angiebird): Investigate the following questions.
-      // Question 1: next_iiratio = (intra_error / coded_error) * kBoostFactor.
-      // We know intra_error / coded_error >= 1 and kBoostFactor = 12.5,
-      // therefore (intra_error / coded_error) * kBoostFactor will always be
-      // greater than 1.5. Is "next_iiratio < 1.5" always false?
-      // Question 2: Similar to question 1, is "next_iiratio < 3.0" always
-      // false?
-      // Question 3: Why do we need to divide 200 by num_mbs_16x16?
- if ((stats.pcnt_inter < 0.05) || (next_iiratio < 1.5) ||
- (((stats.pcnt_inter - stats.pcnt_neutral) < 0.20) &&
- (next_iiratio < 3.0)) ||
- (boost_score_increment < 3.0) ||
- (stats.intra_error <
- (200.0 / static_cast<double>(first_pass_info.num_mbs_16x16)))) {
- break;
- }
- }
-
- // If there is tolerable prediction for at least the next 3 frames then
- // break out else discard this potential key frame and move on
- const int count_for_tolerable_prediction = 3;
- if (boost_score > 30.0 && (i > count_for_tolerable_prediction)) {
- return true;
- }
- }
- return false;
-}
-
-// Compute key frame location from first_pass_info.
-std::vector<int> GetKeyFrameList(const FirstpassInfo &first_pass_info) {
- std::vector<int> key_frame_list;
- key_frame_list.push_back(0); // The first frame is always a key frame
- int candidate_key_idx = 1;
- while (candidate_key_idx <
- static_cast<int>(first_pass_info.stats_list.size())) {
- const int frames_since_prev_key = candidate_key_idx - key_frame_list.back();
- // Check for a scene cut.
- const bool scenecut_detected = TestCandidateKey(
- first_pass_info, candidate_key_idx, frames_since_prev_key);
- if (scenecut_detected) {
- key_frame_list.push_back(candidate_key_idx);
- }
- ++candidate_key_idx;
- }
- return key_frame_list;
-}
-
-// initialize GF_GROUP_STATS
-static void InitGFStats(GF_GROUP_STATS *gf_stats) {
- gf_stats->gf_group_err = 0.0;
- gf_stats->gf_group_raw_error = 0.0;
- gf_stats->gf_group_skip_pct = 0.0;
- gf_stats->gf_group_inactive_zone_rows = 0.0;
-
- gf_stats->mv_ratio_accumulator = 0.0;
- gf_stats->decay_accumulator = 1.0;
- gf_stats->zero_motion_accumulator = 1.0;
- gf_stats->loop_decay_rate = 1.0;
- gf_stats->last_loop_decay_rate = 1.0;
- gf_stats->this_frame_mv_in_out = 0.0;
- gf_stats->mv_in_out_accumulator = 0.0;
- gf_stats->abs_mv_in_out_accumulator = 0.0;
-
- gf_stats->avg_sr_coded_error = 0.0;
- gf_stats->avg_pcnt_second_ref = 0.0;
- gf_stats->avg_new_mv_count = 0.0;
- gf_stats->avg_wavelet_energy = 0.0;
- gf_stats->avg_raw_err_stdev = 0.0;
- gf_stats->non_zero_stdev_count = 0;
-}
-
-static int FindRegionIndex(const std::vector<REGIONS> ®ions, int frame_idx) {
- for (int k = 0; k < static_cast<int>(regions.size()); k++) {
- if (regions[k].start <= frame_idx && regions[k].last >= frame_idx) {
- return k;
- }
- }
- return -1;
-}
-
-// This function detects a flash through the high relative pcnt_second_ref
-// score in the frame following a flash frame. The offset passed in should
-// reflect this.
-static bool DetectFlash(const std::vector<FIRSTPASS_STATS> &stats_list,
- int index) {
- int next_index = index + 1;
- if (next_index >= static_cast<int>(stats_list.size())) return false;
- const FIRSTPASS_STATS &next_frame = stats_list[next_index];
-
- // What we are looking for here is a situation where there is a
- // brief break in prediction (such as a flash) but subsequent frames
- // are reasonably well predicted by an earlier (pre flash) frame.
- // The recovery after a flash is indicated by a high pcnt_second_ref
- // compared to pcnt_inter.
- return next_frame.pcnt_second_ref > next_frame.pcnt_inter &&
- next_frame.pcnt_second_ref >= 0.5;
-}
-
-#define MIN_SHRINK_LEN 6
-
-// This function takes a suggested gop interval from cur_start to cur_last,
-// analyzes firstpass stats and region stats, and then returns a better gop
-// cut location.
-// TODO(b/231517281): Simplify the indices once we have a unit test.
-// We are using four indices here, order_index, cur_start, cur_last, and
-// frames_since_key. Ideally, only three indices are needed.
-// 1) start_index = order_index + cur_start
-// 2) end_index = order_index + cur_end
-// 3) key_index
-int FindBetterGopCut(const std::vector<FIRSTPASS_STATS> &stats_list,
- const std::vector<REGIONS> ®ions_list,
- int min_gop_show_frame_count, int max_gop_show_frame_count,
- int order_index, int cur_start, int cur_last,
- int frames_since_key) {
- // only try shrinking if interval smaller than active_max_gf_interval
- if (cur_last - cur_start > max_gop_show_frame_count ||
- cur_start >= cur_last) {
- return cur_last;
- }
- int num_regions = static_cast<int>(regions_list.size());
- int num_stats = static_cast<int>(stats_list.size());
- const int min_shrink_int = std::max(MIN_SHRINK_LEN, min_gop_show_frame_count);
-
- // find the region indices of where the first and last frame belong.
- int k_start = FindRegionIndex(regions_list, cur_start + frames_since_key);
- int k_last = FindRegionIndex(regions_list, cur_last + frames_since_key);
- if (cur_start + frames_since_key == 0) k_start = 0;
-
- int scenecut_idx = -1;
- // See if we have a scenecut in between
- for (int r = k_start + 1; r <= k_last; r++) {
- if (regions_list[r].type == SCENECUT_REGION &&
- regions_list[r].last - frames_since_key - cur_start >
- min_gop_show_frame_count) {
- scenecut_idx = r;
- break;
- }
- }
-
- // if the found scenecut is very close to the end, ignore it.
- if (scenecut_idx >= 0 &&
- regions_list[num_regions - 1].last - regions_list[scenecut_idx].last <
- 4) {
- scenecut_idx = -1;
- }
-
- if (scenecut_idx != -1) {
- // If we have a scenecut, then stop at it.
- // TODO(bohanli): add logic here to stop before the scenecut and for
- // the next gop start from the scenecut with GF
- int is_minor_sc =
- (regions_list[scenecut_idx].avg_cor_coeff *
- (1 - stats_list[order_index + regions_list[scenecut_idx].start -
- frames_since_key]
- .noise_var /
- regions_list[scenecut_idx].avg_intra_err) >
- 0.6);
- cur_last =
- regions_list[scenecut_idx].last - frames_since_key - !is_minor_sc;
- } else {
- int is_last_analysed =
- (k_last == num_regions - 1) &&
- (cur_last + frames_since_key == regions_list[k_last].last);
- int not_enough_regions =
- k_last - k_start <= 1 + (regions_list[k_start].type == SCENECUT_REGION);
- // if we are very close to the end, then do not shrink since it may
- // introduce intervals that are too short
- if (!(is_last_analysed && not_enough_regions)) {
- const double arf_length_factor = 0.1;
- double best_score = 0;
- int best_j = -1;
- const int first_frame = regions_list[0].start - frames_since_key;
- const int last_frame =
- regions_list[num_regions - 1].last - frames_since_key;
- // score of how much the arf helps the whole GOP
- double base_score = 0.0;
-      // Accumulate base_score over the frames up to the minimum shrink interval.
- for (int j = cur_start + 1; j < cur_start + min_shrink_int; j++) {
- if (order_index + j >= num_stats) break;
- base_score = (base_score + 1.0) * stats_list[order_index + j].cor_coeff;
- }
- int met_blending = 0; // Whether we have met blending areas before
-      int last_blending = 0;  // Whether the previous frame is blending
- for (int j = cur_start + min_shrink_int; j <= cur_last; j++) {
- if (order_index + j >= num_stats) break;
- base_score = (base_score + 1.0) * stats_list[order_index + j].cor_coeff;
- int this_reg = FindRegionIndex(regions_list, j + frames_since_key);
- if (this_reg < 0) continue;
- // A GOP should include at most 1 blending region.
- if (regions_list[this_reg].type == BLENDING_REGION) {
- last_blending = 1;
- if (met_blending) {
- break;
- } else {
- base_score = 0;
- continue;
- }
- } else {
- if (last_blending) met_blending = 1;
- last_blending = 0;
- }
-
- // Add the factor of how good the neighborhood is for this
- // candidate arf.
- double this_score = arf_length_factor * base_score;
- double temp_accu_coeff = 1.0;
- // following frames
- int count_f = 0;
- for (int n = j + 1; n <= j + 3 && n <= last_frame; n++) {
- if (order_index + n >= num_stats) break;
- temp_accu_coeff *= stats_list[order_index + n].cor_coeff;
- this_score +=
- temp_accu_coeff *
- (1 - stats_list[order_index + n].noise_var /
- AOMMAX(regions_list[this_reg].avg_intra_err, 0.001));
- count_f++;
- }
- // preceding frames
- temp_accu_coeff = 1.0;
- for (int n = j; n > j - 3 * 2 + count_f && n > first_frame; n--) {
- if (order_index + n < 0) break;
- temp_accu_coeff *= stats_list[order_index + n].cor_coeff;
- this_score +=
- temp_accu_coeff *
- (1 - stats_list[order_index + n].noise_var /
- AOMMAX(regions_list[this_reg].avg_intra_err, 0.001));
- }
-
- if (this_score > best_score) {
- best_score = this_score;
- best_j = j;
- }
- }
-
- // For blending areas, move one more frame in case we missed the
- // first blending frame.
- int best_reg = FindRegionIndex(regions_list, best_j + frames_since_key);
- if (best_reg < num_regions - 1 && best_reg > 0) {
- if (regions_list[best_reg - 1].type == BLENDING_REGION &&
- regions_list[best_reg + 1].type == BLENDING_REGION) {
- if (best_j + frames_since_key == regions_list[best_reg].start &&
- best_j + frames_since_key < regions_list[best_reg].last) {
- best_j += 1;
- } else if (best_j + frames_since_key == regions_list[best_reg].last &&
- best_j + frames_since_key > regions_list[best_reg].start) {
- best_j -= 1;
- }
- }
- }
-
- if (cur_last - best_j < 2) best_j = cur_last;
- if (best_j > 0 && best_score > 0.1) cur_last = best_j;
-      // If we cannot find anything, just cut at the original place.
- }
- }
-
- return cur_last;
-}
-
-// Function to test for a condition where a complex transition is followed
-// by a static section. For example in slide shows where there is a fade
-// between slides. This is to help with more optimal kf and gf positioning.
-static bool DetectTransitionToStill(
- const std::vector<FIRSTPASS_STATS> &stats_list, int next_stats_index,
- int min_gop_show_frame_count, int frame_interval, int still_interval,
- double loop_decay_rate, double last_decay_rate) {
- // Break clause to detect very still sections after motion
- // For example a static image after a fade or other transition
- // instead of a clean scene cut.
- if (frame_interval > min_gop_show_frame_count && loop_decay_rate >= 0.999 &&
- last_decay_rate < 0.9) {
- int stats_count = static_cast<int>(stats_list.size());
- int stats_left = stats_count - next_stats_index;
- if (stats_left >= still_interval) {
-      // Look ahead a few frames to see if the static condition persists...
- int j;
- for (j = 0; j < still_interval; ++j) {
- const FIRSTPASS_STATS &stats = stats_list[next_stats_index + j];
- if (stats.pcnt_inter - stats.pcnt_motion < 0.999) break;
- }
- // Only if it does do we signal a transition to still.
- return j == still_interval;
- }
- }
- return false;
-}
-
-static int DetectGopCut(const std::vector<FIRSTPASS_STATS> &stats_list,
- int start_idx, int candidate_cut_idx, int next_key_idx,
- int flash_detected, int min_gop_show_frame_count,
- int max_gop_show_frame_count, int frame_width,
- int frame_height, const GF_GROUP_STATS &gf_stats) {
- (void)max_gop_show_frame_count;
- const int candidate_gop_size = candidate_cut_idx - start_idx;
-
- if (!flash_detected) {
- // Break clause to detect very still sections after motion. For example,
- // a static image after a fade or other transition.
- if (DetectTransitionToStill(stats_list, start_idx, min_gop_show_frame_count,
- candidate_gop_size, 5, gf_stats.loop_decay_rate,
- gf_stats.last_loop_decay_rate)) {
- return 1;
- }
- const double arf_abs_zoom_thresh = 4.4;
- // Motion breakout threshold for loop below depends on image size.
- const double mv_ratio_accumulator_thresh =
- (frame_height + frame_width) / 4.0;
- // Some conditions to breakout after min interval.
- if (candidate_gop_size >= min_gop_show_frame_count &&
- // If possible don't break very close to a kf
- (next_key_idx - candidate_cut_idx >= min_gop_show_frame_count) &&
- (candidate_gop_size & 0x01) &&
- (gf_stats.mv_ratio_accumulator > mv_ratio_accumulator_thresh ||
- gf_stats.abs_mv_in_out_accumulator > arf_abs_zoom_thresh)) {
- return 1;
- }
- }
-
- // TODO(b/231489624): Check if we need this part.
-  // If almost totally static, we will not use the max GF length later,
- // so we can continue for more frames.
- // if ((candidate_gop_size >= active_max_gf_interval + 1) &&
- // !is_almost_static(gf_stats->zero_motion_accumulator,
- // twopass->kf_zeromotion_pct, cpi->ppi->lap_enabled)) {
- // return 0;
- // }
- return 0;
-}
-
-/*!\brief Determine the length of future GF groups.
- *
- * \ingroup gf_group_algo
- * This function decides the gf group length of future frames in batch
- *
- * \param[in] rc_param Rate control parameters
- * \param[in] stats_list List of first pass stats
- * \param[in] regions_list List of regions from av1_identify_regions
- * \param[in] order_index Index of current frame in stats_list
- * \param[in] frames_since_key Number of frames since the last key frame
- * \param[in] frames_to_key Number of frames to the next key frame
- *
- * \return Returns a vector of decided GF group lengths.
- */
-static std::vector<int> PartitionGopIntervals(
- const RateControlParam &rc_param,
- const std::vector<FIRSTPASS_STATS> &stats_list,
- const std::vector<REGIONS> ®ions_list, int order_index,
- int frames_since_key, int frames_to_key) {
- int i = 0;
- // If cpi->gf_state.arf_gf_boost_lst is 0, we are starting with a KF or GF.
- int cur_start = 0;
- // Each element is the last frame of the previous GOP. If there are n GOPs,
- // you need n + 1 cuts to find the durations. So cut_pos starts out with -1,
- // which is the last frame of the previous GOP.
- std::vector<int> cut_pos(1, -1);
- int cut_here = 0;
- GF_GROUP_STATS gf_stats;
- InitGFStats(&gf_stats);
- int num_stats = static_cast<int>(stats_list.size());
-
- while (i + order_index < num_stats) {
- // reaches next key frame, break here
- if (i >= frames_to_key - 1) {
- cut_here = 2;
- } else if (i - cur_start >= rc_param.max_gop_show_frame_count) {
- // reached maximum len, but nothing special yet (almost static)
- // let's look at the next interval
- cut_here = 2;
- } else {
- // Test for the case where there is a brief flash but the prediction
- // quality back to an earlier frame is then restored.
- const int gop_start_idx = cur_start + order_index;
- const int candidate_gop_cut_idx = i + order_index;
- const int next_key_idx = frames_to_key + order_index;
- const bool flash_detected =
- DetectFlash(stats_list, candidate_gop_cut_idx);
-
- // TODO(bohanli): remove redundant accumulations here, or unify
- // this and the ones in define_gf_group
- const FIRSTPASS_STATS *stats = &stats_list[candidate_gop_cut_idx];
- av1_accumulate_next_frame_stats(stats, flash_detected, frames_since_key,
- i, &gf_stats, rc_param.frame_width,
- rc_param.frame_height);
-
- // TODO(angiebird): Can we simplify this part? Looks like we are going to
- // change the gop cut index with FindBetterGopCut() anyway.
- cut_here = DetectGopCut(
- stats_list, gop_start_idx, candidate_gop_cut_idx, next_key_idx,
- flash_detected, rc_param.min_gop_show_frame_count,
- rc_param.max_gop_show_frame_count, rc_param.frame_width,
- rc_param.frame_height, gf_stats);
- }
-
- if (!cut_here) {
- ++i;
- continue;
- }
-
- // the current last frame in the gf group
- int original_last = cut_here > 1 ? i : i - 1;
- int cur_last = FindBetterGopCut(
- stats_list, regions_list, rc_param.min_gop_show_frame_count,
- rc_param.max_gop_show_frame_count, order_index, cur_start,
- original_last, frames_since_key);
-    // Record the (possibly shrunken) cut position.
- cut_pos.push_back(cur_last);
-
- // reset pointers to the shrunken location
- cur_start = cur_last;
- int cur_region_idx =
- FindRegionIndex(regions_list, cur_start + 1 + frames_since_key);
- if (cur_region_idx >= 0)
- if (regions_list[cur_region_idx].type == SCENECUT_REGION) cur_start++;
-
- // reset accumulators
- InitGFStats(&gf_stats);
- i = cur_last + 1;
-
- if (cut_here == 2 && i >= frames_to_key) break;
- }
-
- std::vector<int> gf_intervals;
- // save intervals
- for (size_t n = 1; n < cut_pos.size(); n++) {
- gf_intervals.push_back(cut_pos[n] - cut_pos[n - 1]);
- }
-
- return gf_intervals;
-}
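The bookkeeping at the end of the function deserves a concrete example: with n GOPs there are n + 1 cut positions (including the leading -1 sentinel), and adjacent differences recover the GOP lengths. A minimal sketch with hypothetical cut positions:

#include <cassert>
#include <vector>

int main() {
  // cut_pos[k] is the last display index of GOP k - 1; -1 is the sentinel.
  const std::vector<int> cut_pos = { -1, 15, 31, 47 };
  std::vector<int> gf_intervals;
  for (size_t n = 1; n < cut_pos.size(); ++n)
    gf_intervals.push_back(cut_pos[n] - cut_pos[n - 1]);
  assert(gf_intervals == (std::vector<int>{ 16, 16, 16 }));
  return 0;
}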
-
-StatusOr<GopStructList> AV1RateControlQMode::DetermineGopInfo(
- const FirstpassInfo &firstpass_info) {
- const int stats_size = static_cast<int>(firstpass_info.stats_list.size());
- GopStructList gop_list;
- RefFrameManager ref_frame_manager(rc_param_.ref_frame_table_size,
- rc_param_.max_ref_frames);
-
- // Make a copy of the first pass stats, and analyze them
- FirstpassInfo fp_info_copy = firstpass_info;
- av1_mark_flashes(fp_info_copy.stats_list.data(),
- fp_info_copy.stats_list.data() + stats_size);
- av1_estimate_noise(fp_info_copy.stats_list.data(),
- fp_info_copy.stats_list.data() + stats_size);
- av1_estimate_coeff(fp_info_copy.stats_list.data(),
- fp_info_copy.stats_list.data() + stats_size);
-
- int global_coding_idx_offset = 0;
- int global_order_idx_offset = 0;
- std::vector<int> key_frame_list = GetKeyFrameList(fp_info_copy);
- key_frame_list.push_back(stats_size); // a sentinel value
- for (size_t ki = 0; ki + 1 < key_frame_list.size(); ++ki) {
- int frames_to_key = key_frame_list[ki + 1] - key_frame_list[ki];
- int key_order_index = key_frame_list[ki]; // The key frame's display order
-
- std::vector<REGIONS> regions_list(MAX_FIRSTPASS_ANALYSIS_FRAMES);
- int total_regions = 0;
- av1_identify_regions(fp_info_copy.stats_list.data() + key_order_index,
- frames_to_key, 0, regions_list.data(), &total_regions);
- regions_list.resize(total_regions);
- std::vector<int> gf_intervals = PartitionGopIntervals(
- rc_param_, fp_info_copy.stats_list, regions_list, key_order_index,
- /*frames_since_key=*/0, frames_to_key);
- for (size_t gi = 0; gi < gf_intervals.size(); ++gi) {
- const bool has_key_frame = gi == 0;
- const int show_frame_count = gf_intervals[gi];
- GopStruct gop =
- ConstructGop(&ref_frame_manager, show_frame_count, has_key_frame,
- global_coding_idx_offset, global_order_idx_offset);
- assert(gop.show_frame_count == show_frame_count);
- global_coding_idx_offset += static_cast<int>(gop.gop_frame_list.size());
- global_order_idx_offset += gop.show_frame_count;
- gop_list.push_back(gop);
- }
- }
- return gop_list;
-}
-
-TplFrameDepStats CreateTplFrameDepStats(int frame_height, int frame_width,
- int min_block_size) {
- const int unit_rows = (frame_height + min_block_size - 1) / min_block_size;
- const int unit_cols = (frame_width + min_block_size - 1) / min_block_size;
- TplFrameDepStats frame_dep_stats;
- frame_dep_stats.unit_size = min_block_size;
- frame_dep_stats.unit_stats.resize(unit_rows);
- for (auto &row : frame_dep_stats.unit_stats) {
- row.resize(unit_cols);
- }
- return frame_dep_stats;
-}
-
-TplUnitDepStats TplBlockStatsToDepStats(const TplBlockStats &block_stats,
- int unit_count) {
- TplUnitDepStats dep_stats = {};
- dep_stats.intra_cost = block_stats.intra_cost * 1.0 / unit_count;
- dep_stats.inter_cost = block_stats.inter_cost * 1.0 / unit_count;
-  // In rare cases, inter_cost may be greater than intra_cost.
-  // If so, we clamp inter_cost so that inter_cost <= intra_cost, because that
-  // is required by GetPropagationFraction().
- dep_stats.inter_cost = std::min(dep_stats.intra_cost, dep_stats.inter_cost);
- dep_stats.mv = block_stats.mv;
- dep_stats.ref_frame_index = block_stats.ref_frame_index;
- return dep_stats;
-}
-
-namespace {
-Status ValidateBlockStats(const TplFrameStats &frame_stats,
- const TplBlockStats &block_stats,
- int min_block_size) {
- if (block_stats.col >= frame_stats.frame_width ||
- block_stats.row >= frame_stats.frame_height) {
- std::ostringstream error_message;
- error_message << "Block position (" << block_stats.col << ", "
- << block_stats.row
- << ") is out of range; frame dimensions are "
- << frame_stats.frame_width << " x "
- << frame_stats.frame_height;
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- if (block_stats.col % min_block_size != 0 ||
- block_stats.row % min_block_size != 0 ||
- block_stats.width % min_block_size != 0 ||
- block_stats.height % min_block_size != 0) {
- std::ostringstream error_message;
- error_message
- << "Invalid block position or dimension, must be a multiple of "
- << min_block_size << "; col = " << block_stats.col
- << ", row = " << block_stats.row << ", width = " << block_stats.width
- << ", height = " << block_stats.height;
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- return { AOM_CODEC_OK, "" };
-}
-
-Status ValidateTplStats(const GopStruct &gop_struct,
- const TplGopStats &tpl_gop_stats) {
- constexpr char kAdvice[] =
- "Do the current RateControlParam settings match those used to generate "
- "the TPL stats?";
- if (gop_struct.gop_frame_list.size() !=
- tpl_gop_stats.frame_stats_list.size()) {
- std::ostringstream error_message;
- error_message << "Frame count of GopStruct ("
- << gop_struct.gop_frame_list.size()
- << ") doesn't match frame count of TPL stats ("
- << tpl_gop_stats.frame_stats_list.size() << "). " << kAdvice;
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- for (int i = 0; i < static_cast<int>(gop_struct.gop_frame_list.size()); ++i) {
- const bool is_ref_frame = gop_struct.gop_frame_list[i].update_ref_idx >= 0;
- const bool has_tpl_stats =
- !tpl_gop_stats.frame_stats_list[i].block_stats_list.empty();
- if (is_ref_frame && !has_tpl_stats) {
- std::ostringstream error_message;
- error_message << "The frame with global_coding_idx "
- << gop_struct.gop_frame_list[i].global_coding_idx
- << " is a reference frame, but has no TPL stats. "
- << kAdvice;
- return { AOM_CODEC_INVALID_PARAM, error_message.str() };
- }
- }
- return { AOM_CODEC_OK, "" };
-}
-} // namespace
-
-StatusOr<TplFrameDepStats> CreateTplFrameDepStatsWithoutPropagation(
- const TplFrameStats &frame_stats) {
- if (frame_stats.block_stats_list.empty()) {
- return TplFrameDepStats();
- }
- const int min_block_size = frame_stats.min_block_size;
- const int unit_rows =
- (frame_stats.frame_height + min_block_size - 1) / min_block_size;
- const int unit_cols =
- (frame_stats.frame_width + min_block_size - 1) / min_block_size;
- TplFrameDepStats frame_dep_stats = CreateTplFrameDepStats(
- frame_stats.frame_height, frame_stats.frame_width, min_block_size);
- for (const TplBlockStats &block_stats : frame_stats.block_stats_list) {
- Status status =
- ValidateBlockStats(frame_stats, block_stats, min_block_size);
- if (!status.ok()) {
- return status;
- }
- const int block_unit_row = block_stats.row / min_block_size;
- const int block_unit_col = block_stats.col / min_block_size;
- // The block must start within the frame boundaries, but it may extend past
- // the right edge or bottom of the frame. Find the number of unit rows and
- // columns in the block which are fully within the frame.
- const int block_unit_rows = std::min(block_stats.height / min_block_size,
- unit_rows - block_unit_row);
- const int block_unit_cols = std::min(block_stats.width / min_block_size,
- unit_cols - block_unit_col);
- const int unit_count = block_unit_rows * block_unit_cols;
- TplUnitDepStats unit_stats =
- TplBlockStatsToDepStats(block_stats, unit_count);
- for (int r = 0; r < block_unit_rows; r++) {
- for (int c = 0; c < block_unit_cols; c++) {
- frame_dep_stats.unit_stats[block_unit_row + r][block_unit_col + c] =
- unit_stats;
- }
- }
- }
-
- frame_dep_stats.rdcost = TplFrameDepStatsAccumulateInterCost(frame_dep_stats);
-
- return frame_dep_stats;
-}
-
-int GetRefCodingIdxList(const TplUnitDepStats &unit_dep_stats,
- const RefFrameTable &ref_frame_table,
- int *ref_coding_idx_list) {
- int ref_frame_count = 0;
- for (int i = 0; i < kBlockRefCount; ++i) {
- ref_coding_idx_list[i] = -1;
- int ref_frame_index = unit_dep_stats.ref_frame_index[i];
- if (ref_frame_index != -1) {
- assert(ref_frame_index < static_cast<int>(ref_frame_table.size()));
- ref_coding_idx_list[i] = ref_frame_table[ref_frame_index].coding_idx;
- ref_frame_count++;
- }
- }
- return ref_frame_count;
-}
-
-int GetBlockOverlapArea(int r0, int c0, int r1, int c1, int size) {
- const int r_low = std::max(r0, r1);
- const int r_high = std::min(r0 + size, r1 + size);
- const int c_low = std::max(c0, c1);
- const int c_high = std::min(c0 + size, c1 + size);
- if (r_high >= r_low && c_high >= c_low) {
- return (r_high - r_low) * (c_high - c_low);
- }
- return 0;
-}
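A quick numeric check of the overlap computation (the helper below restates the same logic so the sketch is self-contained): two 16x16 units whose origins differ by (4, 4) overlap in a 12x12 region.

#include <algorithm>
#include <cassert>

static int Overlap(int r0, int c0, int r1, int c1, int size) {
  const int r_low = std::max(r0, r1);
  const int r_high = std::min(r0 + size, r1 + size);
  const int c_low = std::max(c0, c1);
  const int c_high = std::min(c0 + size, c1 + size);
  if (r_high >= r_low && c_high >= c_low)
    return (r_high - r_low) * (c_high - c_low);
  return 0;
}

int main() {
  assert(Overlap(0, 0, 4, 4, 16) == 144);  // 12 * 12 overlap
  assert(Overlap(0, 0, 16, 0, 16) == 0);   // edge-adjacent, no overlap
  assert(Overlap(0, 0, 20, 20, 16) == 0);  // fully disjoint
  return 0;
}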
-
-// TODO(angiebird): Merge TplFrameDepStatsAccumulateIntraCost and
-// TplFrameDepStatsAccumulate.
-double TplFrameDepStatsAccumulateIntraCost(
- const TplFrameDepStats &frame_dep_stats) {
- auto getIntraCost = [](double sum, const TplUnitDepStats &unit) {
- return sum + unit.intra_cost;
- };
- double sum = 0;
- for (const auto &row : frame_dep_stats.unit_stats) {
- sum = std::accumulate(row.begin(), row.end(), sum, getIntraCost);
- }
- return std::max(sum, 1.0);
-}
-
-double TplFrameDepStatsAccumulateInterCost(
- const TplFrameDepStats &frame_dep_stats) {
- auto getInterCost = [](double sum, const TplUnitDepStats &unit) {
- return sum + unit.inter_cost;
- };
- double sum = 0;
- for (const auto &row : frame_dep_stats.unit_stats) {
- sum = std::accumulate(row.begin(), row.end(), sum, getInterCost);
- }
- return std::max(sum, 1.0);
-}
-
-double TplFrameDepStatsAccumulate(const TplFrameDepStats &frame_dep_stats) {
- auto getOverallCost = [](double sum, const TplUnitDepStats &unit) {
- return sum + unit.propagation_cost + unit.intra_cost;
- };
- double sum = 0;
- for (const auto &row : frame_dep_stats.unit_stats) {
- sum = std::accumulate(row.begin(), row.end(), sum, getOverallCost);
- }
- return std::max(sum, 1.0);
-}
-
-// This is a generalization of GET_MV_RAWPEL that allows for an arbitrary
-// number of fractional bits.
-// TODO(angiebird): Add unit test to this function
-int GetFullpelValue(int subpel_value, int subpel_bits) {
- const int subpel_scale = (1 << subpel_bits);
- const int sign = subpel_value >= 0 ? 1 : -1;
- int fullpel_value = (abs(subpel_value) + subpel_scale / 2) >> subpel_bits;
- fullpel_value *= sign;
- return fullpel_value;
-}
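In the spirit of the TODO above, a minimal assertion-style sketch of the rounding behavior (round half away from zero at the given subpel precision); the helper restates the same logic so the sketch compiles on its own:

#include <cassert>
#include <cstdlib>

static int FullpelValue(int subpel_value, int subpel_bits) {
  const int subpel_scale = 1 << subpel_bits;
  const int sign = subpel_value >= 0 ? 1 : -1;
  return sign * ((abs(subpel_value) + subpel_scale / 2) >> subpel_bits);
}

int main() {
  assert(FullpelValue(8, 3) == 1);    // exactly 1.0 at 1/8-pel precision
  assert(FullpelValue(4, 3) == 1);    // 0.5 rounds away from zero
  assert(FullpelValue(3, 3) == 0);    // 0.375 rounds toward zero
  assert(FullpelValue(-4, 3) == -1);  // symmetric for negative values
  return 0;
}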
-
-double GetPropagationFraction(const TplUnitDepStats &unit_dep_stats) {
- assert(unit_dep_stats.intra_cost >= unit_dep_stats.inter_cost);
- return (unit_dep_stats.intra_cost - unit_dep_stats.inter_cost) /
- ModifyDivisor(unit_dep_stats.intra_cost);
-}
-
-void TplFrameDepStatsPropagate(int coding_idx,
- const RefFrameTable &ref_frame_table,
- TplGopDepStats *tpl_gop_dep_stats) {
- assert(!tpl_gop_dep_stats->frame_dep_stats_list.empty());
- TplFrameDepStats *frame_dep_stats =
- &tpl_gop_dep_stats->frame_dep_stats_list[coding_idx];
-
- if (frame_dep_stats->unit_stats.empty()) return;
-
- const int unit_size = frame_dep_stats->unit_size;
- const int frame_unit_rows =
- static_cast<int>(frame_dep_stats->unit_stats.size());
- const int frame_unit_cols =
- static_cast<int>(frame_dep_stats->unit_stats[0].size());
- for (int unit_row = 0; unit_row < frame_unit_rows; ++unit_row) {
- for (int unit_col = 0; unit_col < frame_unit_cols; ++unit_col) {
- TplUnitDepStats &unit_dep_stats =
- frame_dep_stats->unit_stats[unit_row][unit_col];
- int ref_coding_idx_list[kBlockRefCount] = { -1, -1 };
- int ref_frame_count = GetRefCodingIdxList(unit_dep_stats, ref_frame_table,
- ref_coding_idx_list);
- if (ref_frame_count == 0) continue;
- for (int i = 0; i < kBlockRefCount; ++i) {
- if (ref_coding_idx_list[i] == -1) continue;
- assert(
- ref_coding_idx_list[i] <
- static_cast<int>(tpl_gop_dep_stats->frame_dep_stats_list.size()));
- TplFrameDepStats &ref_frame_dep_stats =
- tpl_gop_dep_stats->frame_dep_stats_list[ref_coding_idx_list[i]];
- assert(!ref_frame_dep_stats.unit_stats.empty());
- const auto &mv = unit_dep_stats.mv[i];
- const int mv_row = GetFullpelValue(mv.row, mv.subpel_bits);
- const int mv_col = GetFullpelValue(mv.col, mv.subpel_bits);
- const int ref_pixel_r = unit_row * unit_size + mv_row;
- const int ref_pixel_c = unit_col * unit_size + mv_col;
- const int ref_unit_row_low =
- (unit_row * unit_size + mv_row) / unit_size;
- const int ref_unit_col_low =
- (unit_col * unit_size + mv_col) / unit_size;
-
- for (int j = 0; j < 2; ++j) {
- for (int k = 0; k < 2; ++k) {
- const int ref_unit_row = ref_unit_row_low + j;
- const int ref_unit_col = ref_unit_col_low + k;
- if (ref_unit_row >= 0 && ref_unit_row < frame_unit_rows &&
- ref_unit_col >= 0 && ref_unit_col < frame_unit_cols) {
- const int overlap_area = GetBlockOverlapArea(
- ref_pixel_r, ref_pixel_c, ref_unit_row * unit_size,
- ref_unit_col * unit_size, unit_size);
- const double overlap_ratio =
- overlap_area * 1.0 / (unit_size * unit_size);
- const double propagation_fraction =
- GetPropagationFraction(unit_dep_stats);
- const double propagation_ratio =
- 1.0 / ref_frame_count * overlap_ratio * propagation_fraction;
- TplUnitDepStats &ref_unit_stats =
- ref_frame_dep_stats.unit_stats[ref_unit_row][ref_unit_col];
- ref_unit_stats.propagation_cost +=
- (unit_dep_stats.intra_cost +
- unit_dep_stats.propagation_cost) *
- propagation_ratio;
- }
- }
- }
- }
- }
- }
-}
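To make the arithmetic of the inner loop concrete, a small sketch with hypothetical costs and a (4, 4) fullpel motion vector on 16x16 units: the four overlap ratios sum to one, so the entire amount (intra_cost + propagation_cost) * GetPropagationFraction(...) is distributed across the four touched reference units.

#include <cassert>
#include <cstdio>

int main() {
  const int unit_size = 16;
  // Hypothetical unit: intra_cost 100, inter_cost 40, no accumulated
  // propagation_cost, a single reference frame, fullpel mv of (4, 4).
  const double intra_cost = 100.0, inter_cost = 40.0, propagation_cost = 0.0;
  const double fraction = (intra_cost - inter_cost) / intra_cost;  // 0.6
  const int mv_row = 4, mv_col = 4;
  // Pixel overlap of the displaced unit with the four underlying ref units.
  const int areas[4] = { (unit_size - mv_row) * (unit_size - mv_col),  // 144
                         (unit_size - mv_row) * mv_col,                // 48
                         mv_row * (unit_size - mv_col),                // 48
                         mv_row * mv_col };                            // 16
  double total = 0.0;
  for (const int area : areas) {
    const double overlap_ratio = area / double(unit_size * unit_size);
    total += (intra_cost + propagation_cost) * fraction * overlap_ratio;
  }
  printf("propagated total = %f\n", total);  // 60.0: all of 100 * 0.6
  assert(total > 59.999 && total < 60.001);
  return 0;
}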
-
-std::vector<RefFrameTable> AV1RateControlQMode::GetRefFrameTableList(
- const GopStruct &gop_struct,
- const std::vector<LookaheadStats> &lookahead_stats,
- RefFrameTable ref_frame_table) {
- if (gop_struct.global_coding_idx_offset == 0) {
- // For the first GOP, ref_frame_table need not be initialized. This is fine,
- // because the first frame (a key frame) will fully initialize it.
- ref_frame_table.assign(rc_param_.ref_frame_table_size, GopFrameInvalid());
- } else {
- // It's not the first GOP, so ref_frame_table must be valid.
- assert(static_cast<int>(ref_frame_table.size()) ==
- rc_param_.ref_frame_table_size);
- assert(std::all_of(ref_frame_table.begin(), ref_frame_table.end(),
- std::mem_fn(&GopFrame::is_valid)));
- // Reset the frame processing order of the initial ref_frame_table.
- for (GopFrame &gop_frame : ref_frame_table) gop_frame.coding_idx = -1;
- }
-
- std::vector<RefFrameTable> ref_frame_table_list;
- ref_frame_table_list.push_back(ref_frame_table);
- for (const GopFrame &gop_frame : gop_struct.gop_frame_list) {
- if (gop_frame.is_key_frame) {
- ref_frame_table.assign(rc_param_.ref_frame_table_size, gop_frame);
- } else if (gop_frame.update_ref_idx != -1) {
- assert(gop_frame.update_ref_idx <
- static_cast<int>(ref_frame_table.size()));
- ref_frame_table[gop_frame.update_ref_idx] = gop_frame;
- }
- ref_frame_table_list.push_back(ref_frame_table);
- }
-
- int gop_size_offset = static_cast<int>(gop_struct.gop_frame_list.size());
-
- for (const auto &lookahead_stat : lookahead_stats) {
- for (GopFrame gop_frame : lookahead_stat.gop_struct->gop_frame_list) {
- if (gop_frame.is_key_frame) {
- ref_frame_table.assign(rc_param_.ref_frame_table_size, gop_frame);
- } else if (gop_frame.update_ref_idx != -1) {
- assert(gop_frame.update_ref_idx <
- static_cast<int>(ref_frame_table.size()));
- gop_frame.coding_idx += gop_size_offset;
- ref_frame_table[gop_frame.update_ref_idx] = gop_frame;
- }
- ref_frame_table_list.push_back(ref_frame_table);
- }
- gop_size_offset +=
- static_cast<int>(lookahead_stat.gop_struct->gop_frame_list.size());
- }
-
- return ref_frame_table_list;
-}
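A hedged usage sketch of the function above (variable names are hypothetical): for a GOP with n frames the returned list holds n + 1 snapshots, the table state before the first frame plus the state after each coded frame.

// Hypothetical usage; gop_struct and init_table are assumed to come from
// DetermineGopInfo() and the previous GOP's final snapshot, respectively.
// AV1RateControlQMode rc;
// std::vector<RefFrameTable> tables =
//     rc.GetRefFrameTableList(gop_struct, /*lookahead_stats=*/{}, init_table);
// assert(tables.size() == gop_struct.gop_frame_list.size() + 1);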
-
-StatusOr<TplGopDepStats> ComputeTplGopDepStats(
- const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const std::vector<RefFrameTable> &ref_frame_table_list) {
- std::vector<const TplFrameStats *> tpl_frame_stats_list_with_lookahead;
- for (const auto &tpl_frame_stats : tpl_gop_stats.frame_stats_list) {
- tpl_frame_stats_list_with_lookahead.push_back(&tpl_frame_stats);
- }
- for (const auto &lookahead_stat : lookahead_stats) {
- for (const auto &tpl_frame_stats :
- lookahead_stat.tpl_gop_stats->frame_stats_list) {
- tpl_frame_stats_list_with_lookahead.push_back(&tpl_frame_stats);
- }
- }
-
- const int frame_count =
- static_cast<int>(tpl_frame_stats_list_with_lookahead.size());
-
- // Create the struct to store TPL dependency stats
- TplGopDepStats tpl_gop_dep_stats;
-
- tpl_gop_dep_stats.frame_dep_stats_list.reserve(frame_count);
- for (int coding_idx = 0; coding_idx < frame_count; coding_idx++) {
- const StatusOr<TplFrameDepStats> tpl_frame_dep_stats =
- CreateTplFrameDepStatsWithoutPropagation(
- *tpl_frame_stats_list_with_lookahead[coding_idx]);
- if (!tpl_frame_dep_stats.ok()) {
- return tpl_frame_dep_stats.status();
- }
- tpl_gop_dep_stats.frame_dep_stats_list.push_back(
- std::move(*tpl_frame_dep_stats));
- }
-
- // Back propagation
- for (int coding_idx = frame_count - 1; coding_idx >= 0; coding_idx--) {
- auto &ref_frame_table = ref_frame_table_list[coding_idx];
- // TODO(angiebird): Handle/test the case where reference frame
- // is in the previous GOP
- TplFrameDepStatsPropagate(coding_idx, ref_frame_table, &tpl_gop_dep_stats);
- }
- return tpl_gop_dep_stats;
-}
-
-static std::vector<uint8_t> SetupDeltaQ(const TplFrameDepStats &frame_dep_stats,
- int frame_width, int frame_height,
- int base_qindex,
- double frame_importance) {
- // TODO(jianj): Add support for various superblock sizes.
- const int sb_size = 64;
- const int delta_q_res = 4;
- const int num_unit_per_sb = sb_size / frame_dep_stats.unit_size;
- const int sb_rows = (frame_height + sb_size - 1) / sb_size;
- const int sb_cols = (frame_width + sb_size - 1) / sb_size;
- const int unit_rows = (frame_height + frame_dep_stats.unit_size - 1) /
- frame_dep_stats.unit_size;
- const int unit_cols =
- (frame_width + frame_dep_stats.unit_size - 1) / frame_dep_stats.unit_size;
- std::vector<uint8_t> superblock_q_indices;
- // Calculate delta_q offset for each superblock.
- for (int sb_row = 0; sb_row < sb_rows; ++sb_row) {
- for (int sb_col = 0; sb_col < sb_cols; ++sb_col) {
- double intra_cost = 0;
- double mc_dep_cost = 0;
- const int unit_row_start = sb_row * num_unit_per_sb;
- const int unit_row_end =
- std::min((sb_row + 1) * num_unit_per_sb, unit_rows);
- const int unit_col_start = sb_col * num_unit_per_sb;
- const int unit_col_end =
- std::min((sb_col + 1) * num_unit_per_sb, unit_cols);
- // A simplified version of av1_get_q_for_deltaq_objective()
- for (int unit_row = unit_row_start; unit_row < unit_row_end; ++unit_row) {
- for (int unit_col = unit_col_start; unit_col < unit_col_end;
- ++unit_col) {
- const TplUnitDepStats &unit_dep_stat =
- frame_dep_stats.unit_stats[unit_row][unit_col];
- intra_cost += unit_dep_stat.intra_cost;
- mc_dep_cost += unit_dep_stat.propagation_cost;
- }
- }
-
- double beta = 1.0;
- if (mc_dep_cost > 0 && intra_cost > 0) {
- const double r0 = 1 / frame_importance;
- const double rk = intra_cost / mc_dep_cost;
- beta = r0 / rk;
- assert(beta > 0.0);
- }
- int offset = av1_get_deltaq_offset(AOM_BITS_8, base_qindex, beta);
- offset = std::min(offset, delta_q_res * 9 - 1);
- offset = std::max(offset, -delta_q_res * 9 + 1);
- int qindex = offset + base_qindex;
- qindex = std::min(qindex, MAXQ);
- qindex = std::max(qindex, MINQ);
- qindex = av1_adjust_q_from_delta_q_res(delta_q_res, base_qindex, qindex);
- superblock_q_indices.push_back(static_cast<uint8_t>(qindex));
- }
- }
-
- return superblock_q_indices;
-}
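The per-superblock beta above compares the frame-level inverse importance r0 against the superblock's own intra/propagated cost ratio rk. A standalone sketch of just that arithmetic follows (the mapping from beta to a q offset is left to av1_get_deltaq_offset()):

// Sketch under the assumption that beta > 1 means the superblock contributes
// more to future frames than average and should receive a lower (finer) q.
static double BetaSketch(double frame_importance, double intra_cost,
                         double mc_dep_cost) {
  const double r0 = 1.0 / frame_importance;
  const double rk = intra_cost / mc_dep_cost;
  return r0 / rk;
}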
-
-static std::unordered_map<int, double> FindKMeansClusterMap(
- const std::vector<uint8_t> &qindices,
- const std::vector<double> ¢roids) {
- std::unordered_map<int, double> cluster_map;
- for (const uint8_t qindex : qindices) {
- double nearest_centroid = *std::min_element(
- centroids.begin(), centroids.end(),
- [qindex](const double centroid_a, const double centroid_b) {
- return fabs(centroid_a - qindex) < fabs(centroid_b - qindex);
- });
- cluster_map.insert({ qindex, nearest_centroid });
- }
- return cluster_map;
-}
-
-namespace internal {
-
-std::unordered_map<int, int> KMeans(std::vector<uint8_t> qindices, int k) {
- std::vector<double> centroids;
-  // Initialize the centroids with the first k distinct qindices.
- std::unordered_set<int> qindices_set;
-
- for (const uint8_t qp : qindices) {
- if (!qindices_set.insert(qp).second) continue; // Already added.
- centroids.push_back(qp);
- if (static_cast<int>(centroids.size()) >= k) break;
- }
-
- std::unordered_map<int, double> intermediate_cluster_map;
- while (true) {
- // Find the closest centroid for each qindex
- intermediate_cluster_map = FindKMeansClusterMap(qindices, centroids);
- // For each cluster, calculate the new centroids
- std::unordered_map<double, std::vector<int>> centroid_to_qindices;
- for (const auto &qindex_centroid : intermediate_cluster_map) {
- centroid_to_qindices[qindex_centroid.second].push_back(
- qindex_centroid.first);
- }
- bool centroids_changed = false;
- std::vector<double> new_centroids;
- for (const auto &cluster : centroid_to_qindices) {
- double sum = 0.0;
- for (const int qindex : cluster.second) {
- sum += qindex;
- }
- double new_centroid = sum / cluster.second.size();
- new_centroids.push_back(new_centroid);
- if (new_centroid != cluster.first) centroids_changed = true;
- }
- if (!centroids_changed) break;
- centroids = new_centroids;
- }
- std::unordered_map<int, int> cluster_map;
- for (const auto &qindex_centroid : intermediate_cluster_map) {
- cluster_map.insert(
- { qindex_centroid.first, static_cast<int>(qindex_centroid.second) });
- }
- return cluster_map;
-}
-} // namespace internal
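A hypothetical usage sketch of internal::KMeans(): per-superblock q indices are clustered into at most k representative values, and the returned map sends each input q index to its cluster centroid (cast to int).

// Hypothetical example values; not taken from the library's tests.
// std::vector<uint8_t> qindices = { 30, 31, 29, 60, 61, 90 };
// std::unordered_map<int, int> m = aom::internal::KMeans(qindices, 3);
// Expected: m[29], m[30], m[31] map to ~30; m[60], m[61] to ~60; m[90] to 90.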
-
-static int GetRDMult(const GopFrame &gop_frame, int q_index) {
- // TODO(angiebird):
- // 1) Check if these rdmult rules are good in our use case.
- // 2) Support high-bit-depth mode
- if (gop_frame.is_golden_frame) {
-    // Assume ARF_UPDATE/GF_UPDATE share the same rdmult rule.
- return av1_compute_rd_mult_based_on_qindex(AOM_BITS_8, GF_UPDATE, q_index);
- } else if (gop_frame.is_key_frame) {
- return av1_compute_rd_mult_based_on_qindex(AOM_BITS_8, KF_UPDATE, q_index);
- } else {
-    // Assume LF_UPDATE/OVERLAY_UPDATE/INTNL_OVERLAY_UPDATE/INTNL_ARF_UPDATE
-    // share the same rdmult rule.
- return av1_compute_rd_mult_based_on_qindex(AOM_BITS_8, LF_UPDATE, q_index);
- }
-}
-
-StatusOr<GopEncodeInfo> AV1RateControlQMode::GetGopEncodeInfoWithNoStats(
- const GopStruct &gop_struct) {
- GopEncodeInfo gop_encode_info;
- const int frame_count = static_cast<int>(gop_struct.gop_frame_list.size());
- for (int i = 0; i < frame_count; i++) {
- FrameEncodeParameters param;
- const GopFrame &gop_frame = gop_struct.gop_frame_list[i];
- // Use constant QP for TPL pass encoding. Keep the functionality
- // that allows QP changes across sub-gop.
- param.q_index = rc_param_.base_q_index;
- param.rdmult = av1_compute_rd_mult_based_on_qindex(AOM_BITS_8, LF_UPDATE,
- rc_param_.base_q_index);
- // TODO(jingning): gop_frame is needed in two pass tpl later.
- (void)gop_frame;
-
- if (rc_param_.tpl_pass_index) {
- if (gop_frame.update_type == GopFrameType::kRegularGolden ||
- gop_frame.update_type == GopFrameType::kRegularKey ||
- gop_frame.update_type == GopFrameType::kRegularArf) {
- double qstep_ratio = 1 / 3.0;
- param.q_index = av1_get_q_index_from_qstep_ratio(
- rc_param_.base_q_index, qstep_ratio, AOM_BITS_8);
- if (rc_param_.base_q_index) param.q_index = AOMMAX(param.q_index, 1);
- }
- }
- gop_encode_info.param_list.push_back(param);
- }
- return gop_encode_info;
-}
-
-StatusOr<GopEncodeInfo> AV1RateControlQMode::GetGopEncodeInfoWithFp(
- const GopStruct &gop_struct,
- const FirstpassInfo &firstpass_info AOM_UNUSED) {
- // TODO(b/260859962): This is currently a placeholder. Should use the fp
- // stats to calculate frame-level qp.
- return GetGopEncodeInfoWithNoStats(gop_struct);
-}
-
-StatusOr<GopEncodeInfo> AV1RateControlQMode::GetGopEncodeInfoWithTpl(
- const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const RefFrameTable &ref_frame_table_snapshot_init) {
- const std::vector<RefFrameTable> ref_frame_table_list = GetRefFrameTableList(
- gop_struct, lookahead_stats, ref_frame_table_snapshot_init);
-
- GopEncodeInfo gop_encode_info;
- gop_encode_info.final_snapshot = ref_frame_table_list.back();
- StatusOr<TplGopDepStats> gop_dep_stats = ComputeTplGopDepStats(
- tpl_gop_stats, lookahead_stats, ref_frame_table_list);
- if (!gop_dep_stats.ok()) {
- return gop_dep_stats.status();
- }
- const int frame_count =
- static_cast<int>(tpl_gop_stats.frame_stats_list.size());
- const int active_worst_quality = rc_param_.base_q_index;
- int active_best_quality = rc_param_.base_q_index;
- for (int i = 0; i < frame_count; i++) {
- FrameEncodeParameters param;
- const GopFrame &gop_frame = gop_struct.gop_frame_list[i];
-
- if (gop_frame.update_type == GopFrameType::kOverlay ||
- gop_frame.update_type == GopFrameType::kIntermediateOverlay ||
- gop_frame.update_type == GopFrameType::kRegularLeaf) {
- param.q_index = rc_param_.base_q_index;
- } else if (gop_frame.update_type == GopFrameType::kRegularGolden ||
- gop_frame.update_type == GopFrameType::kRegularKey ||
- gop_frame.update_type == GopFrameType::kRegularArf) {
- const TplFrameDepStats &frame_dep_stats =
- gop_dep_stats->frame_dep_stats_list[i];
- const double cost_without_propagation =
- TplFrameDepStatsAccumulateIntraCost(frame_dep_stats);
- const double cost_with_propagation =
- TplFrameDepStatsAccumulate(frame_dep_stats);
- const double frame_importance =
- cost_with_propagation / cost_without_propagation;
- // Imitate the behavior of av1_tpl_get_qstep_ratio()
- const double qstep_ratio = sqrt(1 / frame_importance);
- param.q_index = av1_get_q_index_from_qstep_ratio(rc_param_.base_q_index,
- qstep_ratio, AOM_BITS_8);
- if (rc_param_.base_q_index) param.q_index = AOMMAX(param.q_index, 1);
- active_best_quality = param.q_index;
-
- if (rc_param_.max_distinct_q_indices_per_frame > 1) {
- std::vector<uint8_t> superblock_q_indices = SetupDeltaQ(
- frame_dep_stats, rc_param_.frame_width, rc_param_.frame_height,
- param.q_index, frame_importance);
- std::unordered_map<int, int> qindex_centroids = internal::KMeans(
- superblock_q_indices, rc_param_.max_distinct_q_indices_per_frame);
-        for (size_t sb_idx = 0; sb_idx < superblock_q_indices.size();
-             ++sb_idx) {
-          const int curr_sb_qindex =
-              qindex_centroids.find(superblock_q_indices[sb_idx])->second;
- const int delta_q_res = 4;
- const int adjusted_qindex =
- param.q_index +
- (curr_sb_qindex - param.q_index) / delta_q_res * delta_q_res;
- const int rd_mult = GetRDMult(gop_frame, adjusted_qindex);
- param.superblock_encode_params.push_back(
- { static_cast<uint8_t>(adjusted_qindex), rd_mult });
- }
- }
- } else {
- // Intermediate ARFs
- assert(gop_frame.layer_depth >= 1);
- const int depth_factor = 1 << (gop_frame.layer_depth - 1);
- param.q_index =
- (active_worst_quality * (depth_factor - 1) + active_best_quality) /
- depth_factor;
- }
- param.rdmult = GetRDMult(gop_frame, param.q_index);
- gop_encode_info.param_list.push_back(param);
- }
- return gop_encode_info;
-}
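A numeric sketch of the ARF/key q derivation above, under the stated imitation of av1_tpl_get_qstep_ratio(): frame importance is the propagated cost divided by the intra-only cost, and the q-step ratio is its inverse square root.

#include <cmath>

static double QstepRatioSketch(double cost_with_propagation,
                               double cost_without_propagation) {
  const double frame_importance =
      cost_with_propagation / cost_without_propagation;
  return std::sqrt(1.0 / frame_importance);
}
// e.g. importance 4.0 -> qstep ratio 0.5, i.e. frames whose blocks feed many
// future frames are coded with a finer quantizer than the base.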
-
-StatusOr<GopEncodeInfo> AV1RateControlQMode::GetTplPassGopEncodeInfo(
- const GopStruct &gop_struct, const FirstpassInfo &firstpass_info) {
- return GetGopEncodeInfoWithFp(gop_struct, firstpass_info);
-}
-
-StatusOr<GopEncodeInfo> AV1RateControlQMode::GetGopEncodeInfo(
- const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const FirstpassInfo &firstpass_info AOM_UNUSED,
- const RefFrameTable &ref_frame_table_snapshot_init) {
- // When TPL stats are not valid, use first pass stats.
- Status status = ValidateTplStats(gop_struct, tpl_gop_stats);
- if (!status.ok()) {
- return status;
- }
-
- for (const auto &lookahead_stat : lookahead_stats) {
- Status status = ValidateTplStats(*lookahead_stat.gop_struct,
- *lookahead_stat.tpl_gop_stats);
- if (!status.ok()) {
- return status;
- }
- }
-
- // TODO(b/260859962): Currently firstpass stats are used as an alternative,
- // but we could also combine it with tpl results in the future for more
- // stable qp determination.
- return GetGopEncodeInfoWithTpl(gop_struct, tpl_gop_stats, lookahead_stats,
- ref_frame_table_snapshot_init);
-}
-
-} // namespace aom
diff --git a/av1/qmode_rc/ratectrl_qmode.h b/av1/qmode_rc/ratectrl_qmode.h
deleted file mode 100644
index f60000e..0000000
--- a/av1/qmode_rc/ratectrl_qmode.h
+++ /dev/null
@@ -1,141 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#ifndef AOM_AV1_QMODE_RC_RATECTRL_QMODE_H_
-#define AOM_AV1_QMODE_RC_RATECTRL_QMODE_H_
-
-#include <deque>
-#include <queue>
-#include <unordered_map>
-#include <vector>
-#include "av1/encoder/firstpass.h"
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-#include "av1/qmode_rc/reference_manager.h"
-
-namespace aom {
-
-constexpr int kLayerDepthOffset = 1;
-constexpr int kMinIntervalToAddArf = 3;
-constexpr int kMinArfInterval = (kMinIntervalToAddArf + 1) / 2;
-
-struct TplUnitDepStats {
- double propagation_cost;
- double intra_cost;
- double inter_cost;
- std::array<MotionVector, kBlockRefCount> mv;
- std::array<int, kBlockRefCount> ref_frame_index;
-};
-
-struct TplFrameDepStats {
- int unit_size; // equivalent to min_block_size
- double rdcost; // overall rate-distortion cost
- std::vector<std::vector<TplUnitDepStats>> unit_stats;
-};
-
-struct TplGopDepStats {
- std::vector<TplFrameDepStats> frame_dep_stats_list;
-};
-
-GopFrame GopFrameInvalid();
-
-// Set up is_key_frame, is_arf_frame, is_show_frame, is_golden_frame and
-// encode_ref_mode in GopFrame based on gop_frame_type
-void SetGopFrameByType(GopFrameType gop_frame_type, GopFrame *gop_frame);
-
-GopFrame GopFrameBasic(int global_coding_idx_offset,
- int global_order_idx_offset, int coding_idx,
- int order_idx, int depth, int display_idx,
- GopFrameType gop_frame_type);
-
-GopStruct ConstructGop(RefFrameManager *ref_frame_manager, int show_frame_count,
- bool has_key_frame, int global_coding_idx_offset,
- int global_order_idx_offset);
-
-// Creates a TplFrameDepStats containing a 2D array of default-initialized
-// TplUnitDepStats, with dimensions of
-// ceil(frame_height / min_block_size) x ceil(frame_width / min_block_size).
-// i.e., there will be one entry for each square block of size min_block_size,
-// and blocks along the bottom or right edge of the frame may extend beyond the
-// edges of the frame.
-TplFrameDepStats CreateTplFrameDepStats(int frame_height, int frame_width,
- int min_block_size);
-
-TplUnitDepStats TplBlockStatsToDepStats(const TplBlockStats &block_stats,
- int unit_count);
-
-StatusOr<TplFrameDepStats> CreateTplFrameDepStatsWithoutPropagation(
- const TplFrameStats &frame_stats);
-
-std::vector<int> GetKeyFrameList(const FirstpassInfo &first_pass_info);
-
-double TplFrameDepStatsAccumulateIntraCost(
- const TplFrameDepStats &frame_dep_stats);
-
-double TplFrameDepStatsAccumulateInterCost(
- const TplFrameDepStats &frame_dep_stats);
-
-double TplFrameDepStatsAccumulate(const TplFrameDepStats &frame_dep_stats);
-
-void TplFrameDepStatsPropagate(int coding_idx,
- const RefFrameTable &ref_frame_table,
- TplGopDepStats *tpl_gop_dep_stats);
-
-int GetBlockOverlapArea(int r0, int c0, int r1, int c1, int size);
-
-namespace internal {
-std::unordered_map<int, int> KMeans(std::vector<uint8_t> qindices, int k);
-}
-
-StatusOr<TplGopDepStats> ComputeTplGopDepStats(
- const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const std::vector<RefFrameTable> &ref_frame_table_list);
-
-class AV1RateControlQMode : public AV1RateControlQModeInterface {
- public:
- Status SetRcParam(const RateControlParam &rc_param) override;
- StatusOr<GopStructList> DetermineGopInfo(
- const FirstpassInfo &firstpass_info) override;
- StatusOr<GopEncodeInfo> GetGopEncodeInfo(
- const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const FirstpassInfo &firstpass_info,
- const RefFrameTable &ref_frame_table_snapshot) override;
- StatusOr<GopEncodeInfo> GetTplPassGopEncodeInfo(
- const GopStruct &gop_struct,
- const FirstpassInfo &firstpass_info) override;
-
- // Public for testing only.
-  // Returns snapshots of the ref frame table before and after each frame in
-  // gop_struct. The returned list will have n+1 entries for n frames.
-  // If this is the first GOP, ref_frame_table is ignored and all refs are
-  // assumed invalid; otherwise ref_frame_table is used as the initial state.
- std::vector<RefFrameTable> GetRefFrameTableList(
- const GopStruct &gop_struct,
- const std::vector<LookaheadStats> &lookahead_stats,
- RefFrameTable ref_frame_table);
-
- private:
- RateControlParam rc_param_;
-
- // Private methods to determine GOP encode info with different stats
- StatusOr<GopEncodeInfo> GetGopEncodeInfoWithNoStats(
- const GopStruct &gop_struct);
- StatusOr<GopEncodeInfo> GetGopEncodeInfoWithFp(
- const GopStruct &gop_struct, const FirstpassInfo &firstpass_info);
- StatusOr<GopEncodeInfo> GetGopEncodeInfoWithTpl(
- const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const RefFrameTable &ref_frame_table_snapshot_init);
-};
-} // namespace aom
-
-#endif // AOM_AV1_QMODE_RC_RATECTRL_QMODE_H_
diff --git a/av1/qmode_rc/ratectrl_qmode_interface.cc b/av1/qmode_rc/ratectrl_qmode_interface.cc
deleted file mode 100644
index 1f03e0c..0000000
--- a/av1/qmode_rc/ratectrl_qmode_interface.cc
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-
-namespace aom {
-
-AV1RateControlQModeInterface::AV1RateControlQModeInterface() = default;
-AV1RateControlQModeInterface::~AV1RateControlQModeInterface() = default;
-
-} // namespace aom
diff --git a/av1/qmode_rc/ratectrl_qmode_interface.h b/av1/qmode_rc/ratectrl_qmode_interface.h
deleted file mode 100644
index a7fff4a..0000000
--- a/av1/qmode_rc/ratectrl_qmode_interface.h
+++ /dev/null
@@ -1,358 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#ifndef AOM_AV1_QMODE_RC_RATECTRL_QMODE_INTERFACE_H_
-#define AOM_AV1_QMODE_RC_RATECTRL_QMODE_INTERFACE_H_
-
-#include <array>
-#include <string>
-#include <vector>
-
-#include "aom/aom_codec.h"
-#include "av1/encoder/firstpass.h"
-
-namespace aom {
-
-constexpr int kBlockRefCount = 2;
-
-struct MotionVector {
- int row; // subpel row
- int col; // subpel col
- // TODO(b/241589513): Move this to TplFrameStats; it's wasteful to code it
- // separately for each block.
- int subpel_bits; // number of fractional bits used by row/col
-};
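The propagation code converts these subpel coordinates to full-pel with GetFullpelValue(); that helper is defined elsewhere, so the following is only a guess at its behavior (a truncating shift that drops the fractional bits, ignoring any rounding the real helper may apply):

// Assumption, not the library implementation.
static int FullpelValueSketch(int subpel_value, int subpel_bits) {
  return subpel_value >> subpel_bits;
}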
-
-enum class TplPassCount {
- kOneTplPass = 1,
- kTwoTplPasses = 2,
-};
-
-struct RateControlParam {
- // Range of allowed GOP sizes (number of displayed frames).
- int max_gop_show_frame_count;
- int min_gop_show_frame_count;
- // Number of reference frame buffers, i.e., size of the DPB.
- int ref_frame_table_size;
- // Maximum number of references a single frame may use.
- int max_ref_frames;
-
- int base_q_index;
-
- // If greater than 1, enables per-superblock q_index, and limits the number of
- // unique q_index values which may be used in a frame (each of which will have
- // its own unique rdmult value).
- int max_distinct_q_indices_per_frame;
-
- // If per-superblock q_index is enabled and this is greater than 1, enables
- // additional per-superblock scaling of lambda, and limits the number of
- // unique lambda scale values which may be used in a frame.
- int max_distinct_lambda_scales_per_frame;
-
- int frame_width;
- int frame_height;
-
- // Total number of TPL passes.
- TplPassCount tpl_pass_count = TplPassCount::kOneTplPass;
- // Current TPL pass number, 0 or 1 (for GetTplPassGopEncodeInfo).
- int tpl_pass_index = 0;
-};
-
-struct TplBlockStats {
- int16_t height; // Pixel height.
- int16_t width; // Pixel width.
- int16_t row; // Pixel row of the top left corner.
- int16_t col; // Pixel col of the top left corner.
- int64_t intra_cost; // Rd cost of the best intra mode.
- int64_t inter_cost; // Rd cost of the best inter mode.
-
- // Valid only if TplFrameStats::rate_dist_present is true:
- int64_t recrf_rate; // Bits when using recon as reference.
- int64_t recrf_dist; // Distortion when using recon as reference.
- int64_t intra_pred_err; // Prediction residual of the intra mode.
- int64_t inter_pred_err; // Prediction residual of the inter mode.
-
- std::array<MotionVector, kBlockRefCount> mv;
- std::array<int, kBlockRefCount> ref_frame_index;
-};
-
-// GOP frame type, used to facilitate setting up GopFrame
-// TODO(angiebird): Define names for forward key frame and
-// key frame with overlay
-enum class GopFrameType {
- kRegularKey, // High quality key frame without overlay
- kRegularLeaf, // Regular leaf frame
- kRegularGolden, // Regular golden frame
- kRegularArf, // High quality arf with strong filtering followed by an overlay
- // later
- kOverlay, // Overlay frame
- kIntermediateOverlay, // Intermediate overlay frame
- kIntermediateArf, // Good quality arf with weak or no filtering followed by a
- // show_existing later
-};
-
-enum class EncodeRefMode {
- kRegular,
- kOverlay,
- kShowExisting,
-};
-
-enum class ReferenceName {
- kNoneFrame = -1,
- kIntraFrame = 0,
- kLastFrame = 1,
- kLast2Frame = 2,
- kLast3Frame = 3,
- kGoldenFrame = 4,
- kBwdrefFrame = 5,
- kAltref2Frame = 6,
- kAltrefFrame = 7,
-};
-
-struct Status {
- aom_codec_err_t code;
- std::string message; // Empty if code == AOM_CODEC_OK.
- bool ok() const { return code == AOM_CODEC_OK; }
-};
-
-// A very simple imitation of absl::StatusOr. It is conceptually a union of a
-// Status struct and an object of type T. It models an object that is either a
-// usable object, or an error explaining why such an object is not present. A
-// StatusOr<T> may never hold a status with a code of AOM_CODEC_OK.
-template <typename T>
-class StatusOr {
- public:
- StatusOr(const T &value) : value_(value) {}
- StatusOr(T &&value) : value_(std::move(value)) {}
- StatusOr(Status status) : status_(std::move(status)) {
- assert(status_.code != AOM_CODEC_OK);
- }
-
- const Status &status() const { return status_; }
- bool ok() const { return status().ok(); }
-
- // operator* returns the value; it should only be called after checking that
- // ok() returns true.
- const T &operator*() const & { return value_; }
- T &operator*() & { return value_; }
- const T &&operator*() const && { return value_; }
- T &&operator*() && { return std::move(value_); }
-
- // sor->field is equivalent to (*sor).field.
- const T *operator->() const & { return &value_; }
- T *operator->() & { return &value_; }
-
- // value() is equivalent to operator*, but asserts that ok() is true.
- const T &value() const & {
- assert(ok());
- return value_;
- }
- T &value() & {
- assert(ok());
- return value_;
- }
- const T &&value() const && {
- assert(ok());
- return value_;
- }
- T &&value() && {
- assert(ok());
- return std::move(value_);
- }
-
- private:
- T value_; // This could be std::optional<T> if it were available.
- Status status_ = { AOM_CODEC_OK, "" };
-};
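An illustrative (non-library) use of StatusOr<T>, following the contract documented above; AOM_CODEC_INVALID_PARAM is a standard aom_codec_err_t value.

// Returns the value on success, or a Status explaining the failure.
static StatusOr<int> ParsePositiveSketch(int x) {
  if (x <= 0) return Status{ AOM_CODEC_INVALID_PARAM, "x must be positive" };
  return x;
}
// Caller side:
// StatusOr<int> r = ParsePositiveSketch(5);
// if (!r.ok()) { /* inspect r.status().message */ } else { /* use *r */ }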
-
-struct ReferenceFrame {
- int index; // Index of reference slot containing the reference frame
- ReferenceName name;
-};
-
-struct GopFrame {
- // basic info
- bool is_valid;
- int order_idx; // Index in display order in a GOP
- int coding_idx; // Index in coding order in a GOP
- int display_idx; // The number of displayed frames preceding this frame in
- // a GOP
-
- int global_order_idx; // Index in display order in the whole video chunk
- int global_coding_idx; // Index in coding order in the whole video chunk
-
-  bool is_key_frame;    // If this is a key frame, the reference buffers
-                        // must be reset
-  bool is_arf_frame;    // Is this a forward frame, i.e., a frame with
-                        // order_idx higher than the current display order
- bool is_show_frame; // Is this frame a show frame after coding
- bool is_golden_frame; // Is this a high quality frame
-
- GopFrameType update_type; // This is a redundant field. It is only used for
- // easy conversion in SW integration.
-
- // reference frame info
- EncodeRefMode encode_ref_mode;
- int colocated_ref_idx; // colocated_ref_idx == -1 when encode_ref_mode ==
- // EncodeRefMode::kRegular
- int update_ref_idx; // The reference index that this frame should be
- // updated to. update_ref_idx == -1 when this frame
- // will not serve as a reference frame
- std::vector<ReferenceFrame>
- ref_frame_list; // A list of available reference frames in priority order
- // for the current to-be-coded frame. The list size
-                      // should be less than or equal to ref_frame_table_size. The
- // reference frames with smaller indices are more likely
- // to be a good reference frame. Therefore, they should
- // be prioritized when the reference frame count is
- // limited. For example, if we plan to use 3 reference
- // frames, we should choose ref_frame_list[0],
- // ref_frame_list[1] and ref_frame_list[2].
- int layer_depth; // Layer depth in the GOP structure
- ReferenceFrame primary_ref_frame; // We will use the primary reference frame
-                                    // to update the current frame's initial
- // probability model
-};
-
-struct GopStruct {
- int show_frame_count;
- int global_coding_idx_offset;
- int global_order_idx_offset;
-  // TODO(jingning): This can be removed once the framework is up and running.
-  int display_tracker;  // Track the number of frames displayed preceding the
-                        // current coding frame.
- std::vector<GopFrame> gop_frame_list;
-};
-
-using GopStructList = std::vector<GopStruct>;
-
-struct SuperblockEncodeParameters {
- int q_index;
- int rdmult;
-};
-
-struct FrameEncodeParameters {
- // Base q_index for the frame.
- int q_index;
-
- // Frame level Lagrangian multiplier.
- int rdmult;
-
- // If max_distinct_q_indices_per_frame <= 1, this will be empty.
- // Otherwise:
- // - There must be one entry per 64x64 superblock, in row-major order
- // - There may be no more than max_distinct_q_indices_per_frame unique q_index
- // values
- // - All entries with the same q_index must have the same rdmult
- // (If it's desired to use different rdmult values with the same q_index, this
- // must be done with superblock_lambda_scales.)
- std::vector<SuperblockEncodeParameters> superblock_encode_params;
-
- // If max_distinct_q_indices_per_frame <= 1 or
- // max_distinct_lambda_scales_per_frame <= 1, this will be empty. Otherwise,
- // it will have one entry per 64x64 superblock, in row-major order, with no
- // more than max_distinct_lambda_scales_per_frame unique values. Each entry
- // should be multiplied by the rdmult in the corresponding superblock's entry
- // in superblock_encode_params.
- std::vector<float> superblock_lambda_scales;
-};
-
-struct FirstpassInfo {
- int num_mbs_16x16; // Count of 16x16 unit blocks in each frame.
- // FIRSTPASS_STATS's unit block size is 16x16
- std::vector<FIRSTPASS_STATS> stats_list;
-};
-
-// In general, the number of elements in RefFrameTable must always equal
-// ref_frame_table_size (as specified in RateControlParam), but see
-// GetGopEncodeInfo for the one exception.
-using RefFrameTable = std::vector<GopFrame>;
-
-struct GopEncodeInfo {
- std::vector<FrameEncodeParameters> param_list;
- RefFrameTable final_snapshot; // RefFrameTable snapshot after coding this GOP
-};
-
-struct TplFrameStats {
- int min_block_size;
- int frame_width;
- int frame_height;
- bool rate_dist_present; // True if recrf_rate and recrf_dist are populated.
- std::vector<TplBlockStats> block_stats_list;
- // Optional stats computed with different settings, should be empty unless
- // tpl_pass_count == kTwoTplPasses.
- std::vector<TplBlockStats> alternate_block_stats_list;
-};
-
-struct TplGopStats {
- std::vector<TplFrameStats> frame_stats_list;
-};
-
-// Structure and TPL stats for a single GOP, to be used for lookahead.
-struct LookaheadStats {
- const GopStruct *gop_struct; // Not owned, may not be nullptr.
- const TplGopStats *tpl_gop_stats; // Not owned, may not be nullptr.
-};
-
-class AV1RateControlQModeInterface {
- public:
- AV1RateControlQModeInterface();
- virtual ~AV1RateControlQModeInterface();
-
- virtual Status SetRcParam(const RateControlParam &rc_param) = 0;
- virtual StatusOr<GopStructList> DetermineGopInfo(
- const FirstpassInfo &firstpass_info) = 0;
-
- // Accepts GOP structure and TPL info from the encoder and returns q index and
- // rdmult for each frame. This should be called with consecutive GOPs as
- // returned by DetermineGopInfo.
- //
- // GOP structure and TPL info from zero or more subsequent GOPs may optionally
- // be passed in lookahead_stats.
- //
- // For the first GOP, a default-constructed RefFrameTable may be passed in as
- // ref_frame_table_snapshot_init; for subsequent GOPs, it should be the
- // final_snapshot returned on the previous call.
- //
- // TODO(b/260859962): Remove these once all callers and overrides are gone.
- virtual StatusOr<GopEncodeInfo> GetGopEncodeInfo(
- const GopStruct &gop_struct AOM_UNUSED,
- const TplGopStats &tpl_gop_stats AOM_UNUSED,
- const std::vector<LookaheadStats> &lookahead_stats AOM_UNUSED,
- const RefFrameTable &ref_frame_table_snapshot AOM_UNUSED) {
- return Status{ AOM_CODEC_UNSUP_FEATURE, "Deprecated" };
- }
- virtual StatusOr<GopEncodeInfo> GetTplPassGopEncodeInfo(
- const GopStruct &gop_struct AOM_UNUSED) {
- return Status{ AOM_CODEC_UNSUP_FEATURE, "Deprecated" };
- }
-
- // Extensions to the API to pass in the first pass info. There should be stats
- // for all frames starting from the first frame of the GOP and continuing to
- // the end of the sequence.
- // TODO(b/260859962): Make pure virtual once all derived classes implement it.
- virtual StatusOr<GopEncodeInfo> GetGopEncodeInfo(
- const GopStruct &gop_struct AOM_UNUSED,
- const TplGopStats &tpl_gop_stats AOM_UNUSED,
- const std::vector<LookaheadStats> &lookahead_stats AOM_UNUSED,
- const FirstpassInfo &firstpass_info AOM_UNUSED,
- const RefFrameTable &ref_frame_table_snapshot AOM_UNUSED) {
- return Status{ AOM_CODEC_UNSUP_FEATURE, "Not yet implemented" };
- }
- virtual StatusOr<GopEncodeInfo> GetTplPassGopEncodeInfo(
- const GopStruct &gop_struct AOM_UNUSED,
- const FirstpassInfo &firstpass_info AOM_UNUSED) {
- return Status{ AOM_CODEC_UNSUP_FEATURE, "Not yet implemented" };
- }
-};
-} // namespace aom
-
-#endif // AOM_AV1_QMODE_RC_RATECTRL_QMODE_INTERFACE_H_
diff --git a/av1/qmode_rc/reference_manager.cc b/av1/qmode_rc/reference_manager.cc
deleted file mode 100644
index eea7b7d..0000000
--- a/av1/qmode_rc/reference_manager.cc
+++ /dev/null
@@ -1,339 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include <algorithm>
-#include <set>
-#include <utility>
-#include <tuple>
-#include <vector>
-
-#include "av1/qmode_rc/reference_manager.h"
-#include "av1/qmode_rc/ratectrl_qmode.h"
-
-namespace aom {
-
-void RefFrameManager::Reset() {
- free_ref_idx_list_.clear();
- for (int i = 0; i < static_cast<int>(ref_frame_table_.size()); ++i) {
- free_ref_idx_list_.push_back(i);
- ref_frame_table_[i] = GopFrameInvalid();
- }
- forward_stack_.clear();
- backward_queue_.clear();
- last_queue_.clear();
-}
-
-int RefFrameManager::AllocateRefIdx() {
- if (free_ref_idx_list_.empty()) {
- size_t backward_size = backward_queue_.size();
- size_t last_size = last_queue_.size();
- if (last_size >= backward_size) {
- int ref_idx = last_queue_.front();
- last_queue_.pop_front();
- free_ref_idx_list_.push_back(ref_idx);
- } else {
- int ref_idx = backward_queue_.front();
- backward_queue_.pop_front();
- free_ref_idx_list_.push_back(ref_idx);
- }
- }
-
- int ref_idx = free_ref_idx_list_.front();
- free_ref_idx_list_.pop_front();
- return ref_idx;
-}
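A standalone sketch of the eviction rule above: when no slot is free, recycle from whichever of the last/backward pools is larger (preferring "last" on ties), always dropping the oldest entry at the front.

#include <deque>

static int EvictOldestSketch(std::deque<int> &last_q,
                             std::deque<int> &backward_q) {
  std::deque<int> &victim =
      (last_q.size() >= backward_q.size()) ? last_q : backward_q;
  const int ref_idx = victim.front();
  victim.pop_front();
  return ref_idx;
}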
-
-int RefFrameManager::GetRefFrameCountByType(
- RefUpdateType ref_update_type) const {
- size_t cnt = 0;
- switch (ref_update_type) {
- case RefUpdateType::kForward: cnt = forward_stack_.size(); break;
- case RefUpdateType::kBackward: cnt = backward_queue_.size(); break;
- case RefUpdateType::kLast: cnt = last_queue_.size(); break;
- case RefUpdateType::kNone: cnt = 0; break;
- }
- return static_cast<int>(cnt);
-}
-
-int RefFrameManager::GetRefFrameCount() const {
- return GetRefFrameCountByType(RefUpdateType::kForward) +
- GetRefFrameCountByType(RefUpdateType::kBackward) +
- GetRefFrameCountByType(RefUpdateType::kLast);
-}
-
-// TODO(angiebird): Add unit test.
-// Find the ref_idx corresponding to a ref_update_type.
-// Return -1 if no ref frame is found.
-// The priority_idx indicates closeness between the current frame and
-// the ref frame in display order.
-// For example, ref_update_type == kForward and priority_idx == 0 means
-// find the closest ref frame in forward_stack_.
-int RefFrameManager::GetRefFrameIdxByPriority(RefUpdateType ref_update_type,
- int priority_idx) const {
- if (ref_update_type == RefUpdateType::kForward) {
- int size = static_cast<int>(forward_stack_.size());
-    // When two or more forward reference frames can be used, first take
-    // the highest quality one as the ARF, then go from the nearest to
-    // the more distant ones in the forward reference frame list.
- if (priority_idx < size) {
- if (allow_two_fwd_frames_) {
- if (priority_idx == 0) return forward_stack_[0];
- return forward_stack_[size - priority_idx];
- }
-
- // Handle the special case where only one forward reference frame
- // can be used. In this setting, we prefer the nearest frame.
- return forward_stack_[size - 1 - priority_idx];
- }
- } else if (ref_update_type == RefUpdateType::kBackward) {
- int size = static_cast<int>(backward_queue_.size());
- if (priority_idx < size) {
- return backward_queue_[size - priority_idx - 1];
- }
- } else if (ref_update_type == RefUpdateType::kLast) {
- int size = static_cast<int>(last_queue_.size());
- if (priority_idx < size) {
- return last_queue_[size - priority_idx - 1];
- }
- }
- return -1;
-}
-
-// The priority_idx indicates closeness between the current frame and
-// the ref frame in display order.
-// For example, ref_update_type == kForward and priority_idx == 0 means
-// find the closest ref frame in forward_stack_.
-GopFrame RefFrameManager::GetRefFrameByPriority(RefUpdateType ref_update_type,
- int priority_idx) const {
- int ref_idx = GetRefFrameIdxByPriority(ref_update_type, priority_idx);
- if (ref_idx == -1) {
- return GopFrameInvalid();
- }
- assert(ref_frame_table_[ref_idx].update_ref_idx == ref_idx);
- return ref_frame_table_[ref_idx];
-}
-
-GopFrame RefFrameManager::GetRefFrameByIndex(int ref_idx) const {
- return ref_frame_table_[ref_idx];
-}
-
-ReferenceName get_ref_name(RefUpdateType ref_update_type, int priority_idx,
- const std::set<ReferenceName> &used_name_set) {
-// TODO(angiebird): Find a better way to assign name lists.
-// Maybe sort the names based on how frequently each name has been used in
-// the past?
- const std::vector<ReferenceName> forward_name_list{
- ReferenceName::kAltrefFrame, ReferenceName::kBwdrefFrame,
- ReferenceName::kAltref2Frame, ReferenceName::kGoldenFrame,
- ReferenceName::kLast3Frame, ReferenceName::kLast2Frame,
- ReferenceName::kLastFrame
- };
- const std::vector<ReferenceName> backward_name_list{
- ReferenceName::kGoldenFrame, ReferenceName::kLastFrame,
- ReferenceName::kLast2Frame, ReferenceName::kLast3Frame,
- ReferenceName::kBwdrefFrame, ReferenceName::kAltref2Frame,
- ReferenceName::kAltrefFrame
- };
- const std::vector<ReferenceName> last_name_list{
- ReferenceName::kLastFrame, ReferenceName::kLast2Frame,
- ReferenceName::kLast3Frame, ReferenceName::kGoldenFrame,
- ReferenceName::kBwdrefFrame, ReferenceName::kAltref2Frame,
- ReferenceName::kAltrefFrame
- };
-
- const std::vector<ReferenceName> *name_list = nullptr;
- switch (ref_update_type) {
- case RefUpdateType::kForward: name_list = &forward_name_list; break;
- case RefUpdateType::kBackward: name_list = &backward_name_list; break;
- case RefUpdateType::kLast: name_list = &last_name_list; break;
- case RefUpdateType::kNone: break;
- }
-
- if (name_list) {
- const int name_list_size = static_cast<int>(name_list->size());
- for (int idx = priority_idx; idx < name_list_size; ++idx) {
- ReferenceName ref_name = name_list->at(idx);
- bool not_used = used_name_set.find(ref_name) == used_name_set.end();
- if (not_used) return ref_name;
- }
- }
- return ReferenceName::kNoneFrame;
-}
-
-// Generate a list of available reference frames in priority order for the
-// current to-be-coded frame. The list size should be less than or equal to
-// the size of ref_frame_table_. The reference frames with smaller indices
-// are more likely to be good reference frames, so they should be prioritized
-// when the reference frame count is limited. For example, if we plan to use 3
-// reference frames, we should choose ref_frame_list[0], ref_frame_list[1] and
-// ref_frame_list[2].
-std::vector<ReferenceFrame> RefFrameManager::GetRefFrameListByPriority() const {
- constexpr int round_robin_size = 3;
- const std::vector<RefUpdateType> round_robin_list{ RefUpdateType::kForward,
- RefUpdateType::kBackward,
- RefUpdateType::kLast };
- std::vector<int> priority_idx_list(round_robin_size, 0);
- int available_ref_frames = GetRefFrameCount();
- std::vector<ReferenceFrame> ref_frame_list;
- int ref_frame_count = 0;
- int round_robin_idx = 0;
-
- std::set<ReferenceName> used_name_set;
- while (ref_frame_count < available_ref_frames &&
- ref_frame_count < max_ref_frames_) {
- const RefUpdateType ref_update_type = round_robin_list[round_robin_idx];
- int priority_idx = priority_idx_list[round_robin_idx];
- int ref_idx = GetRefFrameIdxByPriority(ref_update_type, priority_idx);
- if (ref_idx != -1) {
- const ReferenceName name =
- get_ref_name(ref_update_type, priority_idx, used_name_set);
- assert(name != ReferenceName::kNoneFrame);
- used_name_set.insert(name);
- ReferenceFrame ref_frame = { ref_idx, name };
- ref_frame_list.push_back(ref_frame);
- ++ref_frame_count;
- ++priority_idx_list[round_robin_idx];
- }
- round_robin_idx = (round_robin_idx + 1) % round_robin_size;
- }
- return ref_frame_list;
-}
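A standalone sketch of the round-robin interleaving used above, with the three pools generalized to a vector (forward, backward, last in the real code):

#include <cstddef>
#include <vector>

static std::vector<int> RoundRobinPickSketch(
    const std::vector<std::vector<int>> &pools, int max_refs) {
  std::vector<int> picked;
  if (pools.empty() || max_refs <= 0) return picked;
  std::vector<size_t> next(pools.size(), 0);
  size_t remaining = 0;
  for (const auto &p : pools) remaining += p.size();
  size_t pool = 0;
  // Visit the pools in turn, taking the next candidate from each non-empty
  // pool until the budget or the candidates run out.
  while (static_cast<int>(picked.size()) < max_refs && remaining > 0) {
    if (next[pool] < pools[pool].size()) {
      picked.push_back(pools[pool][next[pool]++]);
      --remaining;
    }
    pool = (pool + 1) % pools.size();
  }
  return picked;
}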
-
-void RefFrameManager::UpdateOrder(int global_order_idx) {
- cur_global_order_idx_ = global_order_idx;
- if (forward_stack_.empty()) {
- return;
- }
- int ref_idx = forward_stack_.back();
- const GopFrame &gf_frame = ref_frame_table_[ref_idx];
-
-  // If the frame currently being processed is an overlay / show-existing frame.
- if (gf_frame.global_order_idx == global_order_idx) {
- forward_stack_.pop_back();
- if (gf_frame.is_golden_frame) {
- // high quality frame
- backward_queue_.push_back(ref_idx);
- } else {
- last_queue_.push_back(ref_idx);
- }
- }
-}
-
-int RefFrameManager::ColocatedRefIdx(int global_order_idx) {
- if (forward_stack_.empty()) return -1;
- int ref_idx = forward_stack_.back();
- int arf_global_order_idx = ref_frame_table_[ref_idx].global_order_idx;
- if (arf_global_order_idx == global_order_idx) {
- return ref_idx;
- }
- return -1;
-}
-
-static RefUpdateType infer_ref_update_type(const GopFrame &gop_frame,
- int cur_global_order_idx) {
- if (gop_frame.global_order_idx > cur_global_order_idx) {
- return RefUpdateType::kForward;
- }
- if (gop_frame.is_golden_frame) {
- return RefUpdateType::kBackward;
- }
- if (gop_frame.encode_ref_mode == EncodeRefMode::kShowExisting ||
- gop_frame.encode_ref_mode == EncodeRefMode::kOverlay) {
- return RefUpdateType::kNone;
- }
- return RefUpdateType::kLast;
-}
-
-using PrimaryRefKey = std::tuple<int, // abs layer_depth delta
- bool, // is_key_frame differs
- bool, // is_golden_frame differs
- bool, // is_arf_frame differs
- bool, // is_show_frame differs
- bool, // encode_ref_mode differs
- int>; // abs order_idx delta
-
-// Generate PrimaryRefKey based on abs layer_depth delta,
-// frame flags and abs order_idx delta. These are the fields that will
-// be used to pick the primary reference frame for the probability model.
-static PrimaryRefKey get_primary_ref_key(const GopFrame &cur_frame,
- const GopFrame &ref_frame) {
- return std::make_tuple(abs(cur_frame.layer_depth - ref_frame.layer_depth),
- cur_frame.is_key_frame != ref_frame.is_key_frame,
- cur_frame.is_golden_frame != ref_frame.is_golden_frame,
- cur_frame.is_arf_frame != ref_frame.is_arf_frame,
- cur_frame.is_show_frame != ref_frame.is_show_frame,
- cur_frame.encode_ref_mode != ref_frame.encode_ref_mode,
- abs(cur_frame.order_idx - ref_frame.order_idx));
-}
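Because std::tuple compares lexicographically, the sort in GetPrimaryRefFrame() below prefers the smallest layer-depth delta first, with the later fields only breaking ties. A tiny self-contained check of that ordering:

#include <cassert>
#include <tuple>

static void PrimaryRefKeyOrderSketch() {
  // Layer-depth delta (the first field) dominates all later fields.
  const auto a = std::make_tuple(0, false, 7);
  const auto b = std::make_tuple(1, false, 0);
  assert(a < b);
}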
-
-// Pick primary_ref_idx for probability model.
-ReferenceFrame RefFrameManager::GetPrimaryRefFrame(
- const GopFrame &gop_frame) const {
- assert(gop_frame.is_valid);
- std::vector<std::pair<PrimaryRefKey, int>> candidate_list;
- for (auto &ref_frame_in_gop_frame : gop_frame.ref_frame_list) {
- const GopFrame &ref_frame = ref_frame_table_[ref_frame_in_gop_frame.index];
- if (ref_frame.is_valid) {
- assert(ref_frame_in_gop_frame.index == ref_frame.update_ref_idx);
- PrimaryRefKey key = get_primary_ref_key(gop_frame, ref_frame);
- std::pair<PrimaryRefKey, int> candidate = {
- key, ref_frame_in_gop_frame.index
- };
- candidate_list.push_back(candidate);
- }
- }
-
- std::sort(candidate_list.begin(), candidate_list.end());
-
- ReferenceFrame ref_frame = { -1, ReferenceName::kNoneFrame };
- assert(candidate_list.size() == gop_frame.ref_frame_list.size());
- if (!candidate_list.empty()) {
- int ref_idx = candidate_list[0].second;
- for (const auto &frame : gop_frame.ref_frame_list) {
- if (frame.index == ref_idx) {
- ref_frame = frame;
- }
- }
- }
- return ref_frame;
-}
-
-void RefFrameManager::UpdateRefFrameTable(GopFrame *gop_frame) {
- allow_two_fwd_frames_ =
- (max_ref_frames_ - !!GetRefFrameCountByType(RefUpdateType::kBackward) -
- !!GetRefFrameCountByType(RefUpdateType::kLast)) >= 2;
- gop_frame->ref_frame_list = GetRefFrameListByPriority();
- gop_frame->primary_ref_frame = GetPrimaryRefFrame(*gop_frame);
- gop_frame->colocated_ref_idx = ColocatedRefIdx(gop_frame->global_order_idx);
-
- if (gop_frame->is_show_frame) {
- UpdateOrder(gop_frame->global_order_idx);
- }
- // Call infer_ref_update_type() after UpdateOrder() so that
- // cur_global_order_idx_ is up-to-date
- RefUpdateType ref_update_type =
- infer_ref_update_type(*gop_frame, cur_global_order_idx_);
- if (ref_update_type == RefUpdateType::kNone) {
- gop_frame->update_ref_idx = -1;
- } else {
- const int ref_idx = AllocateRefIdx();
- gop_frame->update_ref_idx = ref_idx;
- switch (ref_update_type) {
- case RefUpdateType::kForward: forward_stack_.push_back(ref_idx); break;
- case RefUpdateType::kBackward: backward_queue_.push_back(ref_idx); break;
- case RefUpdateType::kLast: last_queue_.push_back(ref_idx); break;
- case RefUpdateType::kNone: break;
- }
- ref_frame_table_[ref_idx] = *gop_frame;
- }
-}
-
-} // namespace aom
diff --git a/av1/qmode_rc/reference_manager.h b/av1/qmode_rc/reference_manager.h
deleted file mode 100644
index 37b5038..0000000
--- a/av1/qmode_rc/reference_manager.h
+++ /dev/null
@@ -1,95 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#ifndef AOM_AV1_QMODE_RC_REFERENCE_MANAGER_H_
-#define AOM_AV1_QMODE_RC_REFERENCE_MANAGER_H_
-
-#include <deque>
-#include <iostream>
-#include <vector>
-
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-
-namespace aom {
-
-enum class RefUpdateType { kForward, kBackward, kLast, kNone };
-
-class RefFrameManager {
- public:
- explicit RefFrameManager(int ref_frame_table_size, int max_ref_frames)
- : ref_frame_table_(ref_frame_table_size),
- max_ref_frames_(max_ref_frames) {
-    // forward_max_size_ defines the max number of ARF frames that can exist
-    // at the same time. In other words, it's the max size of forward_stack_.
- // TODO(angiebird): Figure out if this number is optimal.
- forward_max_size_ = ref_frame_table_size - 2;
- cur_global_order_idx_ = 0;
- Reset();
- }
- ~RefFrameManager() = default;
-
- RefFrameManager(const RefFrameManager &) = delete;
- RefFrameManager &operator=(const RefFrameManager &) = delete;
-
- friend std::ostream &operator<<(std::ostream &os,
- const RefFrameManager &rfm) {
- os << "=" << std::endl;
- os << "forward: ";
- for (const auto &ref_idx : rfm.forward_stack_) {
- os << rfm.ref_frame_table_[ref_idx].order_idx << " ";
- }
- os << std::endl;
- os << "backward: ";
- for (const auto &ref_idx : rfm.backward_queue_) {
- os << rfm.ref_frame_table_[ref_idx].order_idx << " ";
- }
- os << std::endl;
- os << "last: ";
- for (const auto &ref_idx : rfm.last_queue_) {
- os << rfm.ref_frame_table_[ref_idx].order_idx << " ";
- }
- os << std::endl;
- return os;
- }
-
- void Reset();
- int AllocateRefIdx();
- int GetRefFrameCountByType(RefUpdateType ref_update_type) const;
- int GetRefFrameCount() const;
- std::vector<ReferenceFrame> GetRefFrameListByPriority() const;
- int GetRefFrameIdxByPriority(RefUpdateType ref_update_type,
- int priority_idx) const;
- GopFrame GetRefFrameByPriority(RefUpdateType ref_update_type,
- int priority_idx) const;
- GopFrame GetRefFrameByIndex(int ref_idx) const;
- void UpdateOrder(int global_order_idx);
- int ColocatedRefIdx(int global_order_idx);
- int ForwardMaxSize() const { return forward_max_size_; }
- int MaxRefFrame() const { return max_ref_frames_; }
- int CurGlobalOrderIdx() const { return cur_global_order_idx_; }
- void UpdateRefFrameTable(GopFrame *gop_frame);
- ReferenceFrame GetPrimaryRefFrame(const GopFrame &gop_frame) const;
-
- private:
- int forward_max_size_;
- int cur_global_order_idx_;
- RefFrameTable ref_frame_table_;
- int max_ref_frames_;
- bool allow_two_fwd_frames_;
- std::deque<int> free_ref_idx_list_;
- std::vector<int> forward_stack_;
- std::deque<int> backward_queue_;
- std::deque<int> last_queue_;
-};
-
-} // namespace aom
-
-#endif // AOM_AV1_QMODE_RC_REFERENCE_MANAGER_H_
diff --git a/av1/ratectrl_rtc.cc b/av1/ratectrl_rtc.cc
index 6cf53f0..a3ec6f6 100644
--- a/av1/ratectrl_rtc.cc
+++ b/av1/ratectrl_rtc.cc
@@ -19,6 +19,8 @@
#include "aom_mem/aom_mem.h"
#include "av1/encoder/encoder.h"
#include "av1/encoder/encoder_utils.h"
+#include "av1/encoder/pickcdef.h"
+#include "av1/encoder/picklpf.h"
#include "av1/encoder/ratectrl.h"
#include "av1/encoder/rc_utils.h"
#include "av1/encoder/svc_layercontext.h"
@@ -38,6 +40,7 @@
max_intra_bitrate_pct = 50;
max_inter_bitrate_pct = 0;
framerate = 30.0;
+ ss_number_layers = 1;
ts_number_layers = 1;
aq_mode = 0;
layer_target_bitrate[0] = static_cast<int>(target_bandwidth);
@@ -68,9 +71,7 @@
av1_zero(*rc_api->cpi_->ppi);
rc_api->cpi_->common.seq_params = &rc_api->cpi_->ppi->seq_params;
av1_zero(*rc_api->cpi_->common.seq_params);
- const int num_layers = cfg.ss_number_layers * cfg.ts_number_layers;
- if (!av1_alloc_layer_context(rc_api->cpi_, num_layers)) return nullptr;
- rc_api->InitRateControl(cfg);
+ if (!rc_api->InitRateControl(cfg)) return nullptr;
if (cfg.aq_mode) {
AV1_COMP *const cpi = rc_api->cpi_;
cpi->enc_seg.map = static_cast<uint8_t *>(aom_calloc(
@@ -110,7 +111,7 @@
}
}
-void AV1RateControlRTC::InitRateControl(const AV1RateControlRtcConfig &rc_cfg) {
+bool AV1RateControlRTC::InitRateControl(const AV1RateControlRtcConfig &rc_cfg) {
AV1_COMMON *cm = &cpi_->common;
AV1EncoderConfig *oxcf = &cpi_->oxcf;
RATE_CONTROL *const rc = &cpi_->rc;
@@ -126,13 +127,14 @@
oxcf->rc_cfg.drop_frames_water_mark = 0;
oxcf->tool_cfg.bit_depth = AOM_BITS_8;
oxcf->tool_cfg.superblock_size = AOM_SUPERBLOCK_SIZE_DYNAMIC;
+ oxcf->algo_cfg.loopfilter_control = LOOPFILTER_ALL;
cm->current_frame.frame_number = 0;
cpi_->ppi->p_rc.kf_boost = DEFAULT_KF_BOOST_RT;
for (auto &lvl_idx : oxcf->target_seq_level_idx) lvl_idx = SEQ_LEVEL_MAX;
memcpy(cpi_->ppi->level_params.target_seq_level_idx,
oxcf->target_seq_level_idx, sizeof(oxcf->target_seq_level_idx));
- UpdateRateControl(rc_cfg);
+ if (!UpdateRateControl(rc_cfg)) return false;
set_sb_size(cm->seq_params,
av1_select_sb_size(oxcf, cm->width, cm->height,
cpi_->svc.number_spatial_layers));
@@ -146,14 +148,24 @@
// Enable external rate control.
cpi_->rc.rtc_external_ratectrl = 1;
cpi_->sf.rt_sf.use_nonrd_pick_mode = 1;
+ return true;
}
-void AV1RateControlRTC::UpdateRateControl(
+bool AV1RateControlRTC::UpdateRateControl(
const AV1RateControlRtcConfig &rc_cfg) {
+ if (rc_cfg.ss_number_layers < 1 ||
+ rc_cfg.ss_number_layers > AOM_MAX_SS_LAYERS ||
+ rc_cfg.ts_number_layers < 1 ||
+ rc_cfg.ts_number_layers > AOM_MAX_TS_LAYERS) {
+ return false;
+ }
+ const int num_layers = rc_cfg.ss_number_layers * rc_cfg.ts_number_layers;
+ if (num_layers > 1 && !av1_alloc_layer_context(cpi_, num_layers)) {
+ return false;
+ }
AV1_COMMON *cm = &cpi_->common;
AV1EncoderConfig *oxcf = &cpi_->oxcf;
RATE_CONTROL *const rc = &cpi_->rc;
-
initial_width_ = rc_cfg.width;
initial_height_ = rc_cfg.height;
cm->width = rc_cfg.width;
@@ -180,35 +192,38 @@
cpi_->svc.number_temporal_layers = rc_cfg.ts_number_layers;
set_primary_rc_buffer_sizes(oxcf, cpi_->ppi);
enc_set_mb_mi(&cm->mi_params, cm->width, cm->height, BLOCK_8X8);
- int64_t target_bandwidth_svc = 0;
- for (int sl = 0; sl < cpi_->svc.number_spatial_layers; ++sl) {
- for (int tl = 0; tl < cpi_->svc.number_temporal_layers; ++tl) {
- const int layer =
- LAYER_IDS_TO_IDX(sl, tl, cpi_->svc.number_temporal_layers);
- LAYER_CONTEXT *lc = &cpi_->svc.layer_context[layer];
- RATE_CONTROL *const lrc = &lc->rc;
- lc->layer_target_bitrate = 1000 * rc_cfg.layer_target_bitrate[layer];
- lc->max_q = rc_cfg.max_quantizers[layer];
- lc->min_q = rc_cfg.min_quantizers[layer];
- lrc->worst_quality =
- av1_quantizer_to_qindex(rc_cfg.max_quantizers[layer]);
- lrc->best_quality = av1_quantizer_to_qindex(rc_cfg.min_quantizers[layer]);
- lc->scaling_factor_num = rc_cfg.scaling_factor_num[sl];
- lc->scaling_factor_den = rc_cfg.scaling_factor_den[sl];
- lc->framerate_factor = rc_cfg.ts_rate_decimator[tl];
- if (tl == cpi_->svc.number_temporal_layers - 1)
- target_bandwidth_svc += lc->layer_target_bitrate;
- }
- }
av1_new_framerate(cpi_, cpi_->framerate);
if (cpi_->svc.number_temporal_layers > 1 ||
cpi_->svc.number_spatial_layers > 1) {
+ int64_t target_bandwidth_svc = 0;
+ for (int sl = 0; sl < cpi_->svc.number_spatial_layers; ++sl) {
+ for (int tl = 0; tl < cpi_->svc.number_temporal_layers; ++tl) {
+ const int layer =
+ LAYER_IDS_TO_IDX(sl, tl, cpi_->svc.number_temporal_layers);
+ LAYER_CONTEXT *lc = &cpi_->svc.layer_context[layer];
+ RATE_CONTROL *const lrc = &lc->rc;
+ lc->layer_target_bitrate = 1000 * rc_cfg.layer_target_bitrate[layer];
+ lc->max_q = rc_cfg.max_quantizers[layer];
+ lc->min_q = rc_cfg.min_quantizers[layer];
+ lrc->worst_quality =
+ av1_quantizer_to_qindex(rc_cfg.max_quantizers[layer]);
+ lrc->best_quality =
+ av1_quantizer_to_qindex(rc_cfg.min_quantizers[layer]);
+ lc->scaling_factor_num = rc_cfg.scaling_factor_num[sl];
+ lc->scaling_factor_den = rc_cfg.scaling_factor_den[sl];
+ lc->framerate_factor = rc_cfg.ts_rate_decimator[tl];
+ if (tl == cpi_->svc.number_temporal_layers - 1)
+ target_bandwidth_svc += lc->layer_target_bitrate;
+ }
+ }
+
if (cm->current_frame.frame_number == 0) av1_init_layer_context(cpi_);
// This is needed to initialize external RC flag in layer context structure.
cpi_->rc.rtc_external_ratectrl = 1;
av1_update_layer_context_change_config(cpi_, target_bandwidth_svc);
}
check_reset_rc_flag(cpi_);
+ return true;
}
void AV1RateControlRTC::ComputeQP(const AV1FrameParamsRTC &frame_params) {
@@ -291,6 +306,27 @@
return cpi_->common.quant_params.base_qindex;
}
+AV1LoopfilterLevel AV1RateControlRTC::GetLoopfilterLevel() const {
+ av1_pick_filter_level(nullptr, cpi_, LPF_PICK_FROM_Q);
+ AV1LoopfilterLevel lpf_level;
+ lpf_level.filter_level[0] = cpi_->common.lf.filter_level[0];
+ lpf_level.filter_level[1] = cpi_->common.lf.filter_level[1];
+ lpf_level.filter_level_u = cpi_->common.lf.filter_level_u;
+ lpf_level.filter_level_v = cpi_->common.lf.filter_level_v;
+
+ return lpf_level;
+}
+
+AV1CdefInfo AV1RateControlRTC::GetCdefInfo() const {
+ av1_pick_cdef_from_qp(&cpi_->common, 0, 0);
+ AV1CdefInfo cdef_level;
+ cdef_level.cdef_strength_y = cpi_->common.cdef_info.cdef_strengths[0];
+ cdef_level.cdef_strength_uv = cpi_->common.cdef_info.cdef_uv_strengths[0];
+ cdef_level.damping = cpi_->common.cdef_info.cdef_damping;
+
+ return cdef_level;
+}
+
signed char *AV1RateControlRTC::GetCyclicRefreshMap() const {
return cpi_->cyclic_refresh->map;
}
@@ -301,6 +337,8 @@
void AV1RateControlRTC::PostEncodeUpdate(uint64_t encoded_frame_size) {
cpi_->common.current_frame.frame_number++;
+ if (cpi_->svc.spatial_layer_id == cpi_->svc.number_spatial_layers - 1)
+ cpi_->svc.prev_number_spatial_layers = cpi_->svc.number_spatial_layers;
av1_rc_postencode_update(cpi_, encoded_frame_size);
if (cpi_->svc.number_spatial_layers > 1 ||
cpi_->svc.number_temporal_layers > 1)
diff --git a/av1/ratectrl_rtc.h b/av1/ratectrl_rtc.h
index 9843803..e96e210 100644
--- a/av1/ratectrl_rtc.h
+++ b/av1/ratectrl_rtc.h
@@ -32,6 +32,8 @@
int width;
int height;
+  // Flag indicating whether the content is screen content.
+ bool is_screen;
// 0-63
int max_quantizer;
int min_quantizer;
@@ -63,15 +65,31 @@
int temporal_layer_id;
};
+struct AV1LoopfilterLevel {
+ int filter_level[2];
+ int filter_level_u;
+ int filter_level_v;
+};
+
+struct AV1CdefInfo {
+ int cdef_strength_y;
+ int cdef_strength_uv;
+ int damping;
+};
+
class AV1RateControlRTC {
public:
static std::unique_ptr<AV1RateControlRTC> Create(
const AV1RateControlRtcConfig &cfg);
~AV1RateControlRTC();
- void UpdateRateControl(const AV1RateControlRtcConfig &rc_cfg);
+ bool UpdateRateControl(const AV1RateControlRtcConfig &rc_cfg);
// GetQP() needs to be called after ComputeQP() to get the latest QP
int GetQP() const;
+ // GetLoopfilterLevel() needs to be called after ComputeQP()
+ AV1LoopfilterLevel GetLoopfilterLevel() const;
+ // GetCdefInfo() needs to be called after ComputeQP()
+ AV1CdefInfo GetCdefInfo() const;
signed char *GetCyclicRefreshMap() const;
int *GetDeltaQ() const;
void ComputeQP(const AV1FrameParamsRTC &frame_params);
@@ -80,7 +98,7 @@
private:
AV1RateControlRTC() = default;
- void InitRateControl(const AV1RateControlRtcConfig &cfg);
+ bool InitRateControl(const AV1RateControlRtcConfig &cfg);
AV1_COMP *cpi_;
int initial_width_;
int initial_height_;
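Taken together, the ratectrl_rtc changes give callers a fallible setup path plus
per-frame filter levels. A minimal caller-side sketch in C++; RateControlOneFrame
is illustrative, and it assumes the aom namespace used by this header and a
config/frame-params pair populated elsewhere (bitrate, framerate, layer settings):

    #include <cstdint>
    #include <memory>

    #include "av1/ratectrl_rtc.h"

    // Drive the RTC rate control library for one frame with the updated
    // interface; UpdateRateControl() now reports failure instead of
    // accepting a bad config silently.
    void RateControlOneFrame(const aom::AV1RateControlRtcConfig &cfg,
                             const aom::AV1FrameParamsRTC &frame_params,
                             uint64_t encoded_frame_size) {
      std::unique_ptr<aom::AV1RateControlRTC> rc =
          aom::AV1RateControlRTC::Create(cfg);
      if (!rc) return;
      if (!rc->UpdateRateControl(cfg)) return;  // new bool return
      rc->ComputeQP(frame_params);
      const int qp = rc->GetQP();  // valid only after ComputeQP()
      // New in this release: loopfilter and CDEF levels picked from the QP.
      const aom::AV1LoopfilterLevel lpf = rc->GetLoopfilterLevel();
      const aom::AV1CdefInfo cdef = rc->GetCdefInfo();
      // ... encode the frame externally using qp / lpf / cdef ...
      (void)qp; (void)lpf; (void)cdef;
      rc->PostEncodeUpdate(encoded_frame_size);
    }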
diff --git a/build/cmake/aom_config.c.template b/build/cmake/aom_config.c.template
index 62f0a10..93a6d8f 100644
--- a/build/cmake/aom_config.c.template
+++ b/build/cmake/aom_config.c.template
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) @year@, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/build/cmake/aom_config_defaults.cmake b/build/cmake/aom_config_defaults.cmake
index d63990b..5058022 100644
--- a/build/cmake/aom_config_defaults.cmake
+++ b/build/cmake/aom_config_defaults.cmake
@@ -23,10 +23,11 @@
set_aom_detect_var(INLINE "" "Sets INLINE value for current target.")
# CPUs.
-set_aom_detect_var(ARCH_ARM 0 "Enables ARM architecture.")
-set_aom_detect_var(ARCH_PPC 0 "Enables PPC architecture.")
-set_aom_detect_var(ARCH_X86 0 "Enables X86 architecture.")
-set_aom_detect_var(ARCH_X86_64 0 "Enables X86_64 architecture.")
+set_aom_detect_var(AOM_ARCH_AARCH64 0 "Enables AArch64 architecture.")
+set_aom_detect_var(AOM_ARCH_ARM 0 "Enables ARM architecture.")
+set_aom_detect_var(AOM_ARCH_PPC 0 "Enables PPC architecture.")
+set_aom_detect_var(AOM_ARCH_X86 0 "Enables X86 architecture.")
+set_aom_detect_var(AOM_ARCH_X86_64 0 "Enables X86_64 architecture.")
# ARM feature flags.
set_aom_detect_var(HAVE_NEON 0 "Enables NEON intrinsics optimizations.")
@@ -155,6 +156,11 @@
"AV1 experiment: Enable tensorflow lite library.")
set_aom_config_var(CONFIG_THREE_PASS 0
"AV1 experiment: Enable three-pass encoding.")
+set_aom_config_var(CONFIG_OUTPUT_FRAME_SIZE 0
+ "AV1 experiment: Output frame size information.")
+set_aom_config_var(
+ CONFIG_SALIENCY_MAP 0
+ "AV1 experiment: Enable saliency map based encoding tuning for VMAF.")
set_aom_config_var(CONFIG_CWG_C013 0
"AV1 experiment: Support for 7.x and 8.x levels.")
diff --git a/build/cmake/aom_configure.cmake b/build/cmake/aom_configure.cmake
index 427507f..aaef2c3 100644
--- a/build/cmake/aom_configure.cmake
+++ b/build/cmake/aom_configure.cmake
@@ -155,49 +155,61 @@
endif()
if(AOM_TARGET_CPU STREQUAL "x86" OR AOM_TARGET_CPU STREQUAL "x86_64")
- find_program(AS_EXECUTABLE yasm $ENV{YASM_PATH})
- if(NOT AS_EXECUTABLE OR ENABLE_NASM)
- unset(AS_EXECUTABLE CACHE)
- find_program(AS_EXECUTABLE nasm $ENV{NASM_PATH})
- if(AS_EXECUTABLE)
- test_nasm()
- endif()
+ find_program(CMAKE_ASM_NASM_COMPILER yasm $ENV{YASM_PATH})
+ if(NOT CMAKE_ASM_NASM_COMPILER OR ENABLE_NASM)
+ unset(CMAKE_ASM_NASM_COMPILER CACHE)
+ find_program(CMAKE_ASM_NASM_COMPILER nasm $ENV{NASM_PATH})
endif()
- if(NOT AS_EXECUTABLE)
+ include(CheckLanguage)
+ check_language(ASM_NASM)
+ if(CMAKE_ASM_NASM_COMPILER)
+ get_asm_obj_format("objformat")
+ unset(CMAKE_ASM_NASM_OBJECT_FORMAT)
+ set(CMAKE_ASM_NASM_OBJECT_FORMAT ${objformat})
+ enable_language(ASM_NASM)
+ if(CMAKE_ASM_NASM_COMPILER_ID STREQUAL "NASM")
+ test_nasm()
+ endif()
+ # Xcode requires building the objects manually, so pass the object format
+ # flag.
+ if(XCODE)
+ set(AOM_AS_FLAGS -f ${objformat} ${AOM_AS_FLAGS})
+ endif()
+ else()
message(
FATAL_ERROR
"Unable to find assembler. Install 'yasm' or 'nasm.' "
"To build without optimizations, add -DAOM_TARGET_CPU=generic to "
"your cmake command line.")
endif()
- get_asm_obj_format("objformat")
- set(AOM_AS_FLAGS -f ${objformat} ${AOM_AS_FLAGS})
string(STRIP "${AOM_AS_FLAGS}" AOM_AS_FLAGS)
elseif(AOM_TARGET_CPU MATCHES "arm")
if(AOM_TARGET_SYSTEM STREQUAL "Darwin")
- set(AS_EXECUTABLE as)
+ set(CMAKE_ASM_COMPILER as)
set(AOM_AS_FLAGS -arch ${AOM_TARGET_CPU} -isysroot ${CMAKE_OSX_SYSROOT})
elseif(AOM_TARGET_SYSTEM STREQUAL "Windows")
- if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE ${CMAKE_C_COMPILER} -c -mimplicit-it=always)
+ if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER ${CMAKE_C_COMPILER} -c -mimplicit-it=always)
endif()
else()
- if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE as)
+ if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER as)
endif()
endif()
- find_program(as_executable_found ${AS_EXECUTABLE})
- if(NOT as_executable_found)
+ include(CheckLanguage)
+ check_language(ASM)
+ if(NOT CMAKE_ASM_COMPILER)
message(
FATAL_ERROR
"Unable to find assembler and optimizations are enabled."
- "Searched for ${AS_EXECUTABLE}. Install it, add it to your path, or "
- "set the assembler directly by adding -DAS_EXECUTABLE=<assembler path> "
- "to your CMake command line."
+ "Searched for ${CMAKE_ASM_COMPILER}. Install it, add it to your path,"
+ "or set the assembler directly by adding "
+ "-DCMAKE_ASM_COMPILER=<assembler path> to your CMake command line."
"To build without optimizations, add -DAOM_TARGET_CPU=generic to your "
"cmake command line.")
endif()
+ enable_language(ASM)
string(STRIP "${AOM_AS_FLAGS}" AOM_AS_FLAGS)
endif()
@@ -230,6 +242,8 @@
# The default _WIN32_WINNT value in MinGW is 0x0502 (Windows XP with SP2). Set
# it to 0x0601 (Windows 7).
add_compiler_flag_if_supported("-D_WIN32_WINNT=0x0601")
+ # Quiet warnings related to fopen, printf, etc.
+ add_compiler_flag_if_supported("-D_CRT_SECURE_NO_WARNINGS")
endif()
#
@@ -288,6 +302,9 @@
# Test compiler flags.
if(MSVC)
+ # It isn't possible to specify C99 conformance for MSVC. MSVC doesn't support
+ # C++ standards modes earlier than C++14.
+ add_cxx_flag_if_supported("/std:c++14")
add_compiler_flag_if_supported("/W3")
# Disable MSVC warnings that suggest making code non-portable.
@@ -327,11 +344,13 @@
add_c_flag_if_supported("-Wimplicit-function-declaration")
add_compiler_flag_if_supported("-Wlogical-op")
add_compiler_flag_if_supported("-Wpointer-arith")
+ add_compiler_flag_if_supported("-Wshadow")
add_compiler_flag_if_supported("-Wshorten-64-to-32")
add_compiler_flag_if_supported("-Wsign-compare")
add_compiler_flag_if_supported("-Wstring-conversion")
add_compiler_flag_if_supported("-Wtype-limits")
add_compiler_flag_if_supported("-Wuninitialized")
+ add_compiler_flag_if_supported("-Wunreachable-code-aggressive")
add_compiler_flag_if_supported("-Wunused")
add_compiler_flag_if_supported("-Wvla")
add_cxx_flag_if_supported("-Wc++14-extensions")
@@ -357,9 +376,6 @@
add_compiler_flag_if_supported("-Wno-disabled-optimization")
endif()
- # Add -Wshadow only for C files to avoid massive gtest warning spam.
- add_c_flag_if_supported("-Wshadow")
-
# Add -Wundef only for C files to avoid massive gtest warning spam.
add_c_flag_if_supported("-Wundef")
@@ -428,6 +444,7 @@
message("--- Git missing, version will be read from CHANGELOG.")
endif()
+string(TIMESTAMP year "%Y")
configure_file("${AOM_ROOT}/build/cmake/aom_config.c.template"
"${AOM_CONFIG_DIR}/config/aom_config.c")
diff --git a/build/cmake/aom_install.cmake b/build/cmake/aom_install.cmake
index 3b52a68..b02c7b9 100644
--- a/build/cmake/aom_install.cmake
+++ b/build/cmake/aom_install.cmake
@@ -31,8 +31,8 @@
include("GNUInstallDirs")
set(AOM_PKG_CONFIG_FILE "${AOM_CONFIG_DIR}/aom.pc")
- # Create a dummy library target for creating aom.pc.
- create_dummy_source_file(aom_pc c AOM_PKG_CONFIG_SOURCES)
+ # Create a library target for creating aom.pc.
+ create_no_op_source_file(aom_pc c AOM_PKG_CONFIG_SOURCES)
add_library(aom_pc ${AOM_PKG_CONFIG_SOURCES})
# Setup a rule to generate aom.pc.
@@ -49,6 +49,7 @@
-DCONFIG_MULTITHREAD=${CONFIG_MULTITHREAD}
-DCONFIG_TUNE_VMAF=${CONFIG_TUNE_VMAF}
-DCONFIG_TUNE_BUTTERAUGLI=${CONFIG_TUNE_BUTTERAUGLI}
+ -DCONFIG_SALIENCY_MAP=${CONFIG_SALIENCY_MAP}
-DCONFIG_TFLITE=${CONFIG_TFLITE}
-DHAVE_PTHREAD_H=${HAVE_PTHREAD_H}
-P
diff --git a/build/cmake/aom_optimization.cmake b/build/cmake/aom_optimization.cmake
index 8d28711..6b0c55a 100644
--- a/build/cmake/aom_optimization.cmake
+++ b/build/cmake/aom_optimization.cmake
@@ -131,11 +131,12 @@
# Adds library target named $lib_name for ASM files in variable named by
# $asm_sources. Builds an output directory path from $lib_name. Links $lib_name
-# into the aom library target(s). Generates a dummy C file with a dummy function
-# to ensure that all cmake generators can determine the linker language, and
-# that build tools don't complain that an object exposes no symbols.
+# into the aom library target(s). Generates a C file with an unused no-op
+# function to ensure that all cmake generators can determine the linker
+# language, and that build tools don't complain that an object exposes no
+# symbols.
#
-# In shared library configs every step described above happens twice, and
+# In Xcode-based builds every step described above happens twice, and
# directory/target/object names are updated to include _shared and _static
# suffixes.
function(add_asm_library lib_name asm_sources)
@@ -143,49 +144,66 @@
return()
endif()
- list(APPEND asm_configs "static")
- if(BUILD_SHARED_LIBS)
- list(APPEND asm_configs "shared")
- endif()
-
- foreach(asm_config ${asm_configs})
- set(asm_lib_name ${lib_name}_${asm_config})
- set(asm_lib_obj_dir "${AOM_CONFIG_DIR}/asm_objects/${asm_lib_name}")
- if(NOT EXISTS "${asm_lib_obj_dir}")
- file(MAKE_DIRECTORY "${asm_lib_obj_dir}")
+ if(XCODE)
+ # CMake's Xcode generator does not output a build rule for Nasm files.
+ # Moreover, it makes Xcode believe Nasm files are of type "sourcecode"
+ # instead of "sourcecode.nasm", which prevents even the default rule from
+ # applying. That default rule is broken anyway, because it does not apply
+ # any of the flags specified for ASM_NASM. See
+ # https://discourse.cmake.org/t/building-nasm-files-with-xcode/7934
+ list(APPEND asm_configs "static")
+ if(BUILD_SHARED_LIBS)
+ list(APPEND asm_configs "shared")
endif()
- add_library(${asm_lib_name} STATIC ${${asm_sources}})
- set_property(TARGET ${asm_lib_name} PROPERTY FOLDER ${AOM_TARGET_CPU})
+ set(as_executable "${CMAKE_ASM_NASM_COMPILER}")
+ if(NOT as_executable)
+ set(as_executable "${CMAKE_ASM_COMPILER}")
+ endif()
- foreach(asm_source ${${asm_sources}})
- get_filename_component(asm_source_name "${asm_source}" NAME)
- set(asm_object "${asm_lib_obj_dir}/${asm_source_name}.o")
- add_custom_command(OUTPUT "${asm_object}"
- COMMAND ${AS_EXECUTABLE} ARGS ${AOM_AS_FLAGS}
- -I${AOM_ROOT}/ -I${AOM_CONFIG_DIR}/ -o
- "${asm_object}" "${asm_source}"
- DEPENDS "${asm_source}"
- COMMENT "Building ASM object ${asm_object}"
- WORKING_DIRECTORY "${AOM_CONFIG_DIR}"
- VERBATIM)
- if(BUILD_SHARED_LIBS AND "${asm_config}" STREQUAL "static")
- target_sources(aom_static PRIVATE "${asm_object}")
- else()
- target_sources(aom PRIVATE "${asm_object}")
+ foreach(asm_config ${asm_configs})
+ set(asm_lib_name ${lib_name}_${asm_config})
+ set(asm_lib_obj_dir "${AOM_CONFIG_DIR}/asm_objects/${asm_lib_name}")
+ if(NOT EXISTS "${asm_lib_obj_dir}")
+ file(MAKE_DIRECTORY "${asm_lib_obj_dir}")
endif()
- endforeach()
- # The above created a target containing only ASM sources. CMake needs help
- # here to determine the linker language. Add a dummy C file to force the
- # linker language to C. We don't bother with setting the LINKER_LANGUAGE
- # property on the library target because not all generators obey it (looking
- # at you, Xcode generator).
- add_dummy_source_file_to_target("${asm_lib_name}" "c")
+ foreach(asm_source ${${asm_sources}})
+ get_filename_component(asm_source_name "${asm_source}" NAME)
+ set(asm_object "${asm_lib_obj_dir}/${asm_source_name}.o")
+ add_custom_command(OUTPUT "${asm_object}"
+ COMMAND ${as_executable} ARGS ${AOM_AS_FLAGS}
+ -I${AOM_ROOT}/ -I${AOM_CONFIG_DIR}/ -o
+ "${asm_object}" "${asm_source}"
+ DEPENDS "${asm_source}"
+ COMMENT "Building ASM object ${asm_object}"
+ WORKING_DIRECTORY "${AOM_CONFIG_DIR}"
+ VERBATIM)
+ if(BUILD_SHARED_LIBS AND "${asm_config}" STREQUAL "static")
+ target_sources(aom_static PRIVATE "${asm_object}")
+ else()
+ target_sources(aom PRIVATE "${asm_object}")
+ endif()
+ endforeach()
+ endforeach()
+ else()
+ # For non-Xcode generators, CMake does not need extra help: the ASM_NASM /
+ # ASM language support enabled in aom_configure.cmake builds these sources
+ # directly.
+ set(asm_lib_name ${lib_name})
+
+ add_library(${asm_lib_name} OBJECT ${${asm_sources}})
+ target_include_directories(${asm_lib_name}
+ PRIVATE ${AOM_ROOT} ${AOM_CONFIG_DIR})
+ target_compile_options(${asm_lib_name} PRIVATE ${AOM_AS_FLAGS})
+ set_property(TARGET ${asm_lib_name} PROPERTY FOLDER ${AOM_TARGET_CPU})
+ if(BUILD_SHARED_LIBS)
+ target_sources(aom_static PRIVATE "$<TARGET_OBJECTS:${asm_lib_name}>")
+ endif()
+ target_sources(aom PRIVATE "$<TARGET_OBJECTS:${asm_lib_name}>")
# Add the new lib target to the global list of aom library targets.
list(APPEND AOM_LIB_TARGETS ${asm_lib_name})
- endforeach()
+ endif()
set(AOM_LIB_TARGETS ${AOM_LIB_TARGETS} PARENT_SCOPE)
endfunction()
@@ -194,7 +212,8 @@
# Currently checks only for presence of required object formats and support for
# the -Ox argument (multipass optimization).
function(test_nasm)
- execute_process(COMMAND ${AS_EXECUTABLE} -hf OUTPUT_VARIABLE nasm_helptext)
+ execute_process(COMMAND ${CMAKE_ASM_NASM_COMPILER} -hf
+ OUTPUT_VARIABLE nasm_helptext)
if(NOT "${nasm_helptext}" MATCHES "-Ox")
message(
diff --git a/build/cmake/cpu.cmake b/build/cmake/cpu.cmake
index 99ac38a..799a313 100644
--- a/build/cmake/cpu.cmake
+++ b/build/cmake/cpu.cmake
@@ -10,7 +10,10 @@
#
if("${AOM_TARGET_CPU}" MATCHES "^arm")
- set(ARCH_ARM 1)
+ set(AOM_ARCH_ARM 1)
+ if("${AOM_TARGET_CPU}" STREQUAL "arm64")
+ set(AOM_ARCH_AARCH64 1)
+ endif()
set(RTCD_ARCH_ARM "yes")
if(ENABLE_NEON)
@@ -34,7 +37,7 @@
endif()
elseif("${AOM_TARGET_CPU}" MATCHES "ppc")
- set(ARCH_PPC 1)
+ set(AOM_ARCH_PPC 1)
set(RTCD_ARCH_PPC "yes")
if(ENABLE_VSX)
@@ -46,10 +49,10 @@
endif()
elseif("${AOM_TARGET_CPU}" MATCHES "^x86")
if("${AOM_TARGET_CPU}" STREQUAL "x86")
- set(ARCH_X86 1)
+ set(AOM_ARCH_X86 1)
set(RTCD_ARCH_X86 "yes")
elseif("${AOM_TARGET_CPU}" STREQUAL "x86_64")
- set(ARCH_X86_64 1)
+ set(AOM_ARCH_X86_64 1)
set(RTCD_ARCH_X86_64 "yes")
endif()
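The cpu.cmake variables above feed the AOM_ARCH_* defines in aom_config.h (see
the config/arm diffs below), so code that previously tested the unprefixed
ARCH_* macros must switch to the prefixed names; note that AOM_ARCH_AARCH64 is
set alongside AOM_ARCH_ARM on arm64 targets. A minimal consumer sketch in C++;
AOM_ARCH_NAME is a hypothetical macro, and config/aom_config.h is assumed to be
on the include path:

    #include "config/aom_config.h"

    // The unprefixed ARCH_* macros no longer exist after this rename.
    #if AOM_ARCH_AARCH64
    #define AOM_ARCH_NAME "aarch64"
    #elif AOM_ARCH_ARM
    #define AOM_ARCH_NAME "arm"
    #elif AOM_ARCH_X86_64
    #define AOM_ARCH_NAME "x86_64"
    #elif AOM_ARCH_X86
    #define AOM_ARCH_NAME "x86"
    #elif AOM_ARCH_PPC
    #define AOM_ARCH_NAME "ppc"
    #else
    #define AOM_ARCH_NAME "generic"
    #endif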
diff --git a/build/cmake/toolchains/android.cmake b/build/cmake/toolchains/android.cmake
index f0b9fab..4d38c9a 100644
--- a/build/cmake/toolchains/android.cmake
+++ b/build/cmake/toolchains/android.cmake
@@ -45,11 +45,11 @@
endif()
if(ANDROID_ABI MATCHES "^arm")
- set(AS_EXECUTABLE as)
+ set(CMAKE_ASM_COMPILER as)
# No runtime cpu detect for arm targets.
set(CONFIG_RUNTIME_CPU_DETECT 0 CACHE STRING "")
elseif(ANDROID_ABI MATCHES "^x86")
- set(AS_EXECUTABLE yasm)
+ set(CMAKE_ASM_NASM_COMPILER yasm)
endif()
set(CMAKE_SYSTEM_NAME "Android")
diff --git a/build/cmake/toolchains/arm64-linux-gcc.cmake b/build/cmake/toolchains/arm64-linux-gcc.cmake
index 64e460b..133a96a 100644
--- a/build/cmake/toolchains/arm64-linux-gcc.cmake
+++ b/build/cmake/toolchains/arm64-linux-gcc.cmake
@@ -17,7 +17,8 @@
if("${CROSS}" STREQUAL "")
- # Default the cross compiler prefix to something known to work.
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
set(CROSS aarch64-linux-gnu-)
endif()
@@ -27,8 +28,8 @@
if(NOT CMAKE_CXX_COMPILER)
set(CMAKE_CXX_COMPILER ${CROSS}g++)
endif()
-if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE ${CROSS}as)
+if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER ${CROSS}as)
endif()
set(CMAKE_C_FLAGS_INIT "-march=armv8-a")
set(CMAKE_CXX_FLAGS_INIT "-march=armv8-a")
diff --git a/build/cmake/toolchains/arm64-mingw-gcc.cmake b/build/cmake/toolchains/arm64-mingw-gcc.cmake
index 5472ed4..7400423 100644
--- a/build/cmake/toolchains/arm64-mingw-gcc.cmake
+++ b/build/cmake/toolchains/arm64-mingw-gcc.cmake
@@ -17,6 +17,8 @@
set(CMAKE_SYSTEM_NAME "Windows")
if("${CROSS}" STREQUAL "")
+
+ # Default the cross compiler prefix to one used by MSYS2.
set(CROSS aarch64-w64-mingw32-)
endif()
diff --git a/build/cmake/toolchains/armv7-linux-gcc.cmake b/build/cmake/toolchains/armv7-linux-gcc.cmake
index 1201538..366e198 100644
--- a/build/cmake/toolchains/armv7-linux-gcc.cmake
+++ b/build/cmake/toolchains/armv7-linux-gcc.cmake
@@ -17,7 +17,8 @@
if("${CROSS}" STREQUAL "")
- # Default the cross compiler prefix to something known to work.
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
set(CROSS arm-linux-gnueabihf-)
endif()
@@ -31,8 +32,8 @@
if(NOT CMAKE_CXX_COMPILER)
set(CMAKE_CXX_COMPILER ${CROSS}g++)
endif()
-if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE ${CROSS}as)
+if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER ${CROSS}as)
endif()
set(CMAKE_C_FLAGS_INIT "-march=armv7-a -mfpu=vfpv3 \
${AOM_EXTRA_TOOLCHAIN_FLAGS}")
diff --git a/build/cmake/toolchains/armv7-mingw-gcc.cmake b/build/cmake/toolchains/armv7-mingw-gcc.cmake
index 8a92891..93f8c06 100644
--- a/build/cmake/toolchains/armv7-mingw-gcc.cmake
+++ b/build/cmake/toolchains/armv7-mingw-gcc.cmake
@@ -17,6 +17,8 @@
set(CMAKE_SYSTEM_NAME "Windows")
if("${CROSS}" STREQUAL "")
+
+ # Default the cross compiler prefix to one used by MSYS2.
set(CROSS armv7-w64-mingw32-)
endif()
diff --git a/build/cmake/toolchains/ppc-linux-gcc.cmake b/build/cmake/toolchains/ppc-linux-gcc.cmake
index ab0efea..3aa2652 100644
--- a/build/cmake/toolchains/ppc-linux-gcc.cmake
+++ b/build/cmake/toolchains/ppc-linux-gcc.cmake
@@ -17,8 +17,9 @@
if("${CROSS}" STREQUAL "")
- # Default the cross compiler prefix to something known to work.
- set(CROSS powerpc64le-unknown-linux-gnu-)
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
+ set(CROSS powerpc64le-linux-gnu-)
endif()
if(NOT CMAKE_C_COMPILER)
@@ -27,8 +28,8 @@
if(NOT CMAKE_CXX_COMPILER)
set(CMAKE_CXX_COMPILER ${CROSS}g++)
endif()
-if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE ${CROSS}as)
+if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER ${CROSS}as)
endif()
set(CMAKE_SYSTEM_PROCESSOR "ppc")
diff --git a/build/cmake/toolchains/riscv-linux-gcc.cmake b/build/cmake/toolchains/riscv-linux-gcc.cmake
index 21e7370..4133be6 100644
--- a/build/cmake/toolchains/riscv-linux-gcc.cmake
+++ b/build/cmake/toolchains/riscv-linux-gcc.cmake
@@ -17,8 +17,9 @@
if("${CROSS}" STREQUAL "")
- # Default the cross compiler prefix to something known to work.
- set(CROSS riscv64-unknown-linux-gnu-)
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
+ set(CROSS riscv64-linux-gnu-)
endif()
if(NOT CMAKE_C_COMPILER)
@@ -27,8 +28,8 @@
if(NOT CMAKE_CXX_COMPILER)
set(CMAKE_CXX_COMPILER ${CROSS}g++)
endif()
-if(NOT AS_EXECUTABLE)
- set(AS_EXECUTABLE ${CROSS}as)
+if(NOT CMAKE_ASM_COMPILER)
+ set(CMAKE_ASM_COMPILER ${CROSS}as)
endif()
set(CMAKE_SYSTEM_PROCESSOR "riscv")
diff --git a/build/cmake/toolchains/x86-mingw-gcc.cmake b/build/cmake/toolchains/x86-mingw-gcc.cmake
index f75728f..2208333 100644
--- a/build/cmake/toolchains/x86-mingw-gcc.cmake
+++ b/build/cmake/toolchains/x86-mingw-gcc.cmake
@@ -20,6 +20,9 @@
set(CMAKE_EXE_LINKER_FLAGS_INIT "-m32")
if("${CROSS}" STREQUAL "")
+
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
set(CROSS i686-w64-mingw32-)
endif()
diff --git a/build/cmake/toolchains/x86_64-mingw-gcc.cmake b/build/cmake/toolchains/x86_64-mingw-gcc.cmake
index 56e9b6e..978146a 100644
--- a/build/cmake/toolchains/x86_64-mingw-gcc.cmake
+++ b/build/cmake/toolchains/x86_64-mingw-gcc.cmake
@@ -17,6 +17,9 @@
set(CMAKE_SYSTEM_NAME "Windows")
if("${CROSS}" STREQUAL "")
+
+ # Default the cross compiler prefix to one used by Debian and other package
+ # management systems.
set(CROSS x86_64-w64-mingw32-)
endif()
diff --git a/build/cmake/util.cmake b/build/cmake/util.cmake
index 9b3da84..31de2e1 100644
--- a/build/cmake/util.cmake
+++ b/build/cmake/util.cmake
@@ -16,31 +16,32 @@
# Directory where generated sources will be written.
set(AOM_GEN_SRC_DIR "${AOM_CONFIG_DIR}/gen_src")
-# Creates dummy source file in $AOM_GEN_SRC_DIR named $basename.$extension and
-# returns the full path to the dummy source file via appending it to the list
-# variable referred to by $out_file_list_var parameter.
-macro(create_dummy_source_file basename extension out_file_list_var)
- set(dummy_source_file "${AOM_GEN_SRC_DIR}/${basename}_dummy.${extension}")
- file(WRITE "${dummy_source_file}"
+# Creates a no-op source file in $AOM_GEN_SRC_DIR named $basename.$extension and
+# returns the full path to the source file by appending it to the list variable
+# referred to by the $out_file_list_var parameter.
+macro(create_no_op_source_file basename extension out_file_list_var)
+ set(no_op_source_file "${AOM_GEN_SRC_DIR}/${basename}_no_op.${extension}")
+ file(WRITE "${no_op_source_file}"
"// Generated file. DO NOT EDIT!\n"
"// ${target_name} needs a ${extension} file to force link language, \n"
"// or to silence a harmless CMake warning: Ignore me.\n"
- "void aom_${target_name}_dummy_function(void) {}\n")
- list(APPEND "${out_file_list_var}" "${dummy_source_file}")
+ "void aom_${target_name}_no_op_function(void);\n"
+ "void aom_${target_name}_no_op_function(void) {}\n")
+ list(APPEND "${out_file_list_var}" "${no_op_source_file}")
endmacro()
-# Convenience function for adding a dummy source file to $target_name using
-# $extension as the file extension. Wraps create_dummy_source_file().
-function(add_dummy_source_file_to_target target_name extension)
- create_dummy_source_file("${target_name}" "${extension}"
- "dummy_source_file_list")
- target_sources(${target_name} PRIVATE ${dummy_source_file_list})
+# Convenience function for adding a no-op source file to $target_name using
+# $extension as the file extension. Wraps create_no_op_source_file().
+function(add_no_op_source_file_to_target target_name extension)
+ create_no_op_source_file("${target_name}" "${extension}"
+ "no_op_source_file_list")
+ target_sources(${target_name} PRIVATE ${no_op_source_file_list})
endfunction()
# Sets the value of the variable referenced by $feature to $value, and reports
# the change to the user via call to message(WARNING ...). $cause is expected to
# be a configuration variable that conflicts with $feature in some way. This
-# function is a noop if $feature is already set to $value.
+# function is a no-op if $feature is already set to $value.
function(change_config_and_warn feature value cause)
if(${feature} EQUAL ${value})
return()
@@ -100,7 +101,7 @@
# already been set via the CMake command line.
#
# The names of variables defaulted through this macro are added to
-# $AOM_CONFIG_VARS to facilitate build logging and diagnostics.
+# $AOM_DETECT_VARS to facilitate build logging and diagnostics.
macro(set_aom_detect_var name value helpstring)
unset(list_index)
list(FIND AOM_DETECT_VARS ${name} list_index)
diff --git a/common/tools_common.c b/common/tools_common.c
index 6b579e0..afe4619 100644
--- a/common/tools_common.c
+++ b/common/tools_common.c
@@ -26,15 +26,9 @@
#include "aom/aomdx.h"
#endif
-#if defined(_WIN32) || defined(__OS2__)
+#if defined(_WIN32)
#include <io.h>
#include <fcntl.h>
-
-#ifdef __OS2__
-#define _setmode setmode
-#define _fileno fileno
-#define _O_BINARY O_BINARY
-#endif
#endif
#define LOG_ERROR(label) \
@@ -50,7 +44,7 @@
FILE *set_binary_mode(FILE *stream) {
(void)stream;
-#if defined(_WIN32) || defined(__OS2__)
+#if defined(_WIN32)
_setmode(_fileno(stream), _O_BINARY);
#endif
return stream;
@@ -76,6 +70,21 @@
exit(EXIT_FAILURE);
}
+const char *image_format_to_string(aom_img_fmt_t fmt) {
+ switch (fmt) {
+ case AOM_IMG_FMT_I420: return "I420";
+ case AOM_IMG_FMT_I422: return "I422";
+ case AOM_IMG_FMT_I444: return "I444";
+ case AOM_IMG_FMT_YV12: return "YV12";
+ case AOM_IMG_FMT_NV12: return "NV12";
+ case AOM_IMG_FMT_YV1216: return "YV1216";
+ case AOM_IMG_FMT_I42016: return "I42016";
+ case AOM_IMG_FMT_I42216: return "I42216";
+ case AOM_IMG_FMT_I44416: return "I44416";
+ default: return "Other";
+ }
+}
+
int read_yuv_frame(struct AvxInputContext *input_ctx, aom_image_t *yuv_frame) {
FILE *f = input_ctx->file;
struct FileTypeDetectionBuffer *detect = &input_ctx->detect;
@@ -133,8 +142,8 @@
struct CodecInfo {
// Pointer to a function of zero arguments that returns an aom_codec_iface_t.
- aom_codec_iface_t *(*const interface)();
- char *short_name;
+ aom_codec_iface_t *(*interface)(void);
+ const char *short_name;
uint32_t fourcc;
};
@@ -300,7 +309,7 @@
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I42216:
case AOM_IMG_FMT_I44416: break;
- default: fatal("Unsupported image conversion"); break;
+ default: fatal("Unsupported image conversion");
}
for (plane = 0; plane < 3; plane++) {
int w = src->d_w;
@@ -336,7 +345,7 @@
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I444: break;
- default: fatal("Unsupported image conversion"); break;
+ default: fatal("Unsupported image conversion");
}
for (plane = 0; plane < 3; plane++) {
int w = src->d_w;
@@ -377,7 +386,7 @@
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I444: break;
- default: fatal("Unsupported image conversion"); break;
+ default: fatal("Unsupported image conversion");
}
for (plane = 0; plane < 3; plane++) {
int w = src->d_w;
@@ -411,7 +420,7 @@
case AOM_IMG_FMT_I42016:
case AOM_IMG_FMT_I42216:
case AOM_IMG_FMT_I44416: break;
- default: fatal("Unsupported image conversion"); break;
+ default: fatal("Unsupported image conversion");
}
for (plane = 0; plane < 3; plane++) {
int w = src->d_w;
@@ -444,7 +453,7 @@
case AOM_IMG_FMT_I420:
case AOM_IMG_FMT_I422:
case AOM_IMG_FMT_I444: break;
- default: fatal("Unsupported image conversion"); break;
+ default: fatal("Unsupported image conversion");
}
for (plane = 0; plane < 3; plane++) {
int w = src->d_w;
diff --git a/common/tools_common.h b/common/tools_common.h
index eeccbe4..b31371c 100644
--- a/common/tools_common.h
+++ b/common/tools_common.h
@@ -157,7 +157,7 @@
// The AOM library can support different encoders / decoders. These
// functions provide different ways to lookup / iterate through them.
// The return result may be NULL to indicate no codec was found.
-int get_aom_encoder_count();
+int get_aom_encoder_count(void);
aom_codec_iface_t *get_aom_encoder_by_index(int i);
aom_codec_iface_t *get_aom_encoder_by_short_name(const char *name);
// If the interface is unknown, returns NULL.
@@ -165,7 +165,7 @@
// If the interface is unknown, returns 0.
uint32_t get_fourcc_by_aom_encoder(aom_codec_iface_t *iface);
-int get_aom_decoder_count();
+int get_aom_decoder_count(void);
aom_codec_iface_t *get_aom_decoder_by_index(int i);
aom_codec_iface_t *get_aom_decoder_by_short_name(const char *name);
aom_codec_iface_t *get_aom_decoder_by_fourcc(uint32_t fourcc);
@@ -173,6 +173,8 @@
// If the interface is unknown, returns 0.
uint32_t get_fourcc_by_aom_decoder(aom_codec_iface_t *iface);
+const char *image_format_to_string(aom_img_fmt_t fmt);
+
int read_yuv_frame(struct AvxInputContext *input_ctx, aom_image_t *yuv_frame);
void aom_img_write(const aom_image_t *img, FILE *file);
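The new image_format_to_string() declared above maps an aom_img_fmt_t to a
printable name, returning "Other" for formats outside its switch. A small usage
sketch in C++; log_image_format is a hypothetical helper:

    #include <cstdio>

    #include "aom/aom_image.h"
    #include "common/tools_common.h"

    // Print an image's pixel format by name rather than as a raw enum value.
    static void log_image_format(const aom_image_t *img) {
      std::printf("Pixel format: %s, bit depth: %u\n",
                  image_format_to_string(img->fmt), img->bit_depth);
    }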
diff --git a/common/y4menc.c b/common/y4menc.c
index 7d32465..25086a9 100644
--- a/common/y4menc.c
+++ b/common/y4menc.c
@@ -28,7 +28,8 @@
// Return the Y4M name of the 8-bit colorspace, given the chroma position and
// image format.
-const char *colorspace8(aom_chroma_sample_position_t csp, aom_img_fmt_t fmt) {
+static const char *colorspace8(aom_chroma_sample_position_t csp,
+ aom_img_fmt_t fmt) {
switch (fmt) {
case AOM_IMG_FMT_I444: return "C444";
case AOM_IMG_FMT_I422: return "C422";
diff --git a/common/y4minput.c b/common/y4minput.c
index 2fc8379..1974d76 100644
--- a/common/y4minput.c
+++ b/common/y4minput.c
@@ -1202,6 +1202,7 @@
_img->fmt = _y4m->aom_fmt;
_img->w = _img->d_w = _y4m->pic_w;
_img->h = _img->d_h = _y4m->pic_h;
+ _img->bit_depth = _y4m->bit_depth;
_img->x_chroma_shift = _y4m->dst_c_dec_h >> 1;
_img->y_chroma_shift = _y4m->dst_c_dec_v >> 1;
_img->bps = _y4m->bps;
diff --git a/config/arm/config/aom_config.asm b/config/arm/config/aom_config.asm
index a6b5453..ce46e8b 100644
--- a/config/arm/config/aom_config.asm
+++ b/config/arm/config/aom_config.asm
@@ -8,10 +8,11 @@
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
-ARCH_ARM equ 1
-ARCH_PPC equ 0
-ARCH_X86 equ 0
-ARCH_X86_64 equ 0
+AOM_ARCH_AARCH64 equ 0
+AOM_ARCH_ARM equ 1
+AOM_ARCH_PPC equ 0
+AOM_ARCH_X86 equ 0
+AOM_ARCH_X86_64 equ 0
CONFIG_ACCOUNTING equ 0
CONFIG_ANALYZER equ 0
CONFIG_AV1_DECODER equ 1
@@ -47,6 +48,7 @@
CONFIG_NORMAL_TILE_MODE equ 1
CONFIG_OPTICAL_FLOW_API equ 0
CONFIG_OS_SUPPORT equ 1
+CONFIG_OUTPUT_FRAME_SIZE equ 0
CONFIG_PARTITION_SEARCH_ORDER equ 0
CONFIG_PIC equ 1
CONFIG_RATECTRL_LOG equ 0
@@ -55,6 +57,7 @@
CONFIG_REALTIME_ONLY equ 0
CONFIG_RT_ML_PARTITIONING equ 0
CONFIG_RUNTIME_CPU_DETECT equ 0
+CONFIG_SALIENCY_MAP equ 0
CONFIG_SHARED equ 0
CONFIG_SIZE_LIMIT equ 1
CONFIG_SPATIAL_RESAMPLING equ 1
diff --git a/config/arm/config/aom_config.c b/config/arm/config/aom_config.c
index 3fcba38..affe0d7 100644
--- a/config/arm/config/aom_config.c
+++ b/config/arm/config/aom_config.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/config/arm/config/aom_config.h b/config/arm/config/aom_config.h
index 9f2cfc1..6611944 100644
--- a/config/arm/config/aom_config.h
+++ b/config/arm/config/aom_config.h
@@ -10,10 +10,11 @@
*/
#ifndef AOM_CONFIG_H_
#define AOM_CONFIG_H_
-#define ARCH_ARM 1
-#define ARCH_PPC 0
-#define ARCH_X86 0
-#define ARCH_X86_64 0
+#define AOM_ARCH_AARCH64 0
+#define AOM_ARCH_ARM 1
+#define AOM_ARCH_PPC 0
+#define AOM_ARCH_X86 0
+#define AOM_ARCH_X86_64 0
#define CONFIG_ACCOUNTING 0
#define CONFIG_ANALYZER 0
#define CONFIG_AV1_DECODER 1
@@ -49,6 +50,7 @@
#define CONFIG_NORMAL_TILE_MODE 1
#define CONFIG_OPTICAL_FLOW_API 0
#define CONFIG_OS_SUPPORT 1
+#define CONFIG_OUTPUT_FRAME_SIZE 0
#define CONFIG_PARTITION_SEARCH_ORDER 0
#define CONFIG_PIC 1
#define CONFIG_RATECTRL_LOG 0
@@ -57,6 +59,7 @@
#define CONFIG_REALTIME_ONLY 0
#define CONFIG_RT_ML_PARTITIONING 0
#define CONFIG_RUNTIME_CPU_DETECT 0
+#define CONFIG_SALIENCY_MAP 0
#define CONFIG_SHARED 0
#define CONFIG_SIZE_LIMIT 1
#define CONFIG_SPATIAL_RESAMPLING 1
diff --git a/config/arm/config/aom_dsp_rtcd.h b/config/arm/config/aom_dsp_rtcd.h
index 7ae6636..ad77b04 100644
--- a/config/arm/config/aom_dsp_rtcd.h
+++ b/config/arm/config/aom_dsp_rtcd.h
@@ -14,8 +14,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
#ifdef __cplusplus
@@ -46,19 +46,26 @@
#define aom_blend_a64_vmask aom_blend_a64_vmask_neon
void aom_comp_avg_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride);
-#define aom_comp_avg_pred aom_comp_avg_pred_c
+void aom_comp_avg_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride);
+#define aom_comp_avg_pred aom_comp_avg_pred_neon
void aom_comp_mask_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
-#define aom_comp_mask_pred aom_comp_mask_pred_c
+void aom_comp_mask_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
+#define aom_comp_mask_pred aom_comp_mask_pred_neon
+
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v);
+#define aom_compute_flow_at_point aom_compute_flow_at_point_c
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h);
#define aom_convolve8 aom_convolve8_c
void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
-#define aom_convolve8_horiz aom_convolve8_horiz_c
+void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_horiz aom_convolve8_horiz_neon
void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
-#define aom_convolve8_vert aom_convolve8_vert_c
+void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_vert aom_convolve8_vert_neon
void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, int w, int h);
void aom_convolve_copy_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, int w, int h);
@@ -69,57 +76,72 @@
#define aom_dc_128_predictor_16x16 aom_dc_128_predictor_16x16_neon
void aom_dc_128_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x32 aom_dc_128_predictor_16x32_c
+void aom_dc_128_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x32 aom_dc_128_predictor_16x32_neon
void aom_dc_128_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x4 aom_dc_128_predictor_16x4_c
+void aom_dc_128_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x4 aom_dc_128_predictor_16x4_neon
void aom_dc_128_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x64 aom_dc_128_predictor_16x64_c
+void aom_dc_128_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x64 aom_dc_128_predictor_16x64_neon
void aom_dc_128_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x8 aom_dc_128_predictor_16x8_c
+void aom_dc_128_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x8 aom_dc_128_predictor_16x8_neon
void aom_dc_128_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x16 aom_dc_128_predictor_32x16_c
+void aom_dc_128_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x16 aom_dc_128_predictor_32x16_neon
void aom_dc_128_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_128_predictor_32x32 aom_dc_128_predictor_32x32_neon
void aom_dc_128_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x64 aom_dc_128_predictor_32x64_c
+void aom_dc_128_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x64 aom_dc_128_predictor_32x64_neon
void aom_dc_128_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x8 aom_dc_128_predictor_32x8_c
+void aom_dc_128_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x8 aom_dc_128_predictor_32x8_neon
void aom_dc_128_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_4x16 aom_dc_128_predictor_4x16_c
+void aom_dc_128_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x16 aom_dc_128_predictor_4x16_neon
void aom_dc_128_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_128_predictor_4x4 aom_dc_128_predictor_4x4_neon
void aom_dc_128_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_4x8 aom_dc_128_predictor_4x8_c
+void aom_dc_128_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x8 aom_dc_128_predictor_4x8_neon
void aom_dc_128_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x16 aom_dc_128_predictor_64x16_c
+void aom_dc_128_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x16 aom_dc_128_predictor_64x16_neon
void aom_dc_128_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x32 aom_dc_128_predictor_64x32_c
+void aom_dc_128_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x32 aom_dc_128_predictor_64x32_neon
void aom_dc_128_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x64 aom_dc_128_predictor_64x64_c
+void aom_dc_128_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x64 aom_dc_128_predictor_64x64_neon
void aom_dc_128_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x16 aom_dc_128_predictor_8x16_c
+void aom_dc_128_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x16 aom_dc_128_predictor_8x16_neon
void aom_dc_128_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x32 aom_dc_128_predictor_8x32_c
+void aom_dc_128_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x32 aom_dc_128_predictor_8x32_neon
void aom_dc_128_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x4 aom_dc_128_predictor_8x4_c
+void aom_dc_128_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x4 aom_dc_128_predictor_8x4_neon
void aom_dc_128_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -130,57 +152,72 @@
#define aom_dc_left_predictor_16x16 aom_dc_left_predictor_16x16_neon
void aom_dc_left_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x32 aom_dc_left_predictor_16x32_c
+void aom_dc_left_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x32 aom_dc_left_predictor_16x32_neon
void aom_dc_left_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x4 aom_dc_left_predictor_16x4_c
+void aom_dc_left_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x4 aom_dc_left_predictor_16x4_neon
void aom_dc_left_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x64 aom_dc_left_predictor_16x64_c
+void aom_dc_left_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x64 aom_dc_left_predictor_16x64_neon
void aom_dc_left_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x8 aom_dc_left_predictor_16x8_c
+void aom_dc_left_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x8 aom_dc_left_predictor_16x8_neon
void aom_dc_left_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x16 aom_dc_left_predictor_32x16_c
+void aom_dc_left_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x16 aom_dc_left_predictor_32x16_neon
void aom_dc_left_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_left_predictor_32x32 aom_dc_left_predictor_32x32_neon
void aom_dc_left_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x64 aom_dc_left_predictor_32x64_c
+void aom_dc_left_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x64 aom_dc_left_predictor_32x64_neon
void aom_dc_left_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x8 aom_dc_left_predictor_32x8_c
+void aom_dc_left_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x8 aom_dc_left_predictor_32x8_neon
void aom_dc_left_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_4x16 aom_dc_left_predictor_4x16_c
+void aom_dc_left_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x16 aom_dc_left_predictor_4x16_neon
void aom_dc_left_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_left_predictor_4x4 aom_dc_left_predictor_4x4_neon
void aom_dc_left_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_4x8 aom_dc_left_predictor_4x8_c
+void aom_dc_left_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x8 aom_dc_left_predictor_4x8_neon
void aom_dc_left_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x16 aom_dc_left_predictor_64x16_c
+void aom_dc_left_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x16 aom_dc_left_predictor_64x16_neon
void aom_dc_left_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x32 aom_dc_left_predictor_64x32_c
+void aom_dc_left_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x32 aom_dc_left_predictor_64x32_neon
void aom_dc_left_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x64 aom_dc_left_predictor_64x64_c
+void aom_dc_left_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x64 aom_dc_left_predictor_64x64_neon
void aom_dc_left_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x16 aom_dc_left_predictor_8x16_c
+void aom_dc_left_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x16 aom_dc_left_predictor_8x16_neon
void aom_dc_left_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x32 aom_dc_left_predictor_8x32_c
+void aom_dc_left_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x32 aom_dc_left_predictor_8x32_neon
void aom_dc_left_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x4 aom_dc_left_predictor_8x4_c
+void aom_dc_left_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x4 aom_dc_left_predictor_8x4_neon
void aom_dc_left_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -191,57 +228,72 @@
#define aom_dc_predictor_16x16 aom_dc_predictor_16x16_neon
void aom_dc_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x32 aom_dc_predictor_16x32_c
+void aom_dc_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x32 aom_dc_predictor_16x32_neon
void aom_dc_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x4 aom_dc_predictor_16x4_c
+void aom_dc_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x4 aom_dc_predictor_16x4_neon
void aom_dc_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x64 aom_dc_predictor_16x64_c
+void aom_dc_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x64 aom_dc_predictor_16x64_neon
void aom_dc_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x8 aom_dc_predictor_16x8_c
+void aom_dc_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x8 aom_dc_predictor_16x8_neon
void aom_dc_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x16 aom_dc_predictor_32x16_c
+void aom_dc_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x16 aom_dc_predictor_32x16_neon
void aom_dc_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_predictor_32x32 aom_dc_predictor_32x32_neon
void aom_dc_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x64 aom_dc_predictor_32x64_c
+void aom_dc_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x64 aom_dc_predictor_32x64_neon
void aom_dc_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x8 aom_dc_predictor_32x8_c
+void aom_dc_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x8 aom_dc_predictor_32x8_neon
void aom_dc_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_4x16 aom_dc_predictor_4x16_c
+void aom_dc_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x16 aom_dc_predictor_4x16_neon
void aom_dc_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_predictor_4x4 aom_dc_predictor_4x4_neon
void aom_dc_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_4x8 aom_dc_predictor_4x8_c
+void aom_dc_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x8 aom_dc_predictor_4x8_neon
void aom_dc_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x16 aom_dc_predictor_64x16_c
+void aom_dc_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x16 aom_dc_predictor_64x16_neon
void aom_dc_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x32 aom_dc_predictor_64x32_c
+void aom_dc_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x32 aom_dc_predictor_64x32_neon
void aom_dc_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x64 aom_dc_predictor_64x64_c
+void aom_dc_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x64 aom_dc_predictor_64x64_neon
void aom_dc_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x16 aom_dc_predictor_8x16_c
+void aom_dc_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x16 aom_dc_predictor_8x16_neon
void aom_dc_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x32 aom_dc_predictor_8x32_c
+void aom_dc_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x32 aom_dc_predictor_8x32_neon
void aom_dc_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x4 aom_dc_predictor_8x4_c
+void aom_dc_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x4 aom_dc_predictor_8x4_neon
void aom_dc_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -252,57 +304,72 @@
#define aom_dc_top_predictor_16x16 aom_dc_top_predictor_16x16_neon
void aom_dc_top_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x32 aom_dc_top_predictor_16x32_c
+void aom_dc_top_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x32 aom_dc_top_predictor_16x32_neon
void aom_dc_top_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x4 aom_dc_top_predictor_16x4_c
+void aom_dc_top_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x4 aom_dc_top_predictor_16x4_neon
void aom_dc_top_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x64 aom_dc_top_predictor_16x64_c
+void aom_dc_top_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x64 aom_dc_top_predictor_16x64_neon
void aom_dc_top_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x8 aom_dc_top_predictor_16x8_c
+void aom_dc_top_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x8 aom_dc_top_predictor_16x8_neon
void aom_dc_top_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x16 aom_dc_top_predictor_32x16_c
+void aom_dc_top_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x16 aom_dc_top_predictor_32x16_neon
void aom_dc_top_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_top_predictor_32x32 aom_dc_top_predictor_32x32_neon
void aom_dc_top_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x64 aom_dc_top_predictor_32x64_c
+void aom_dc_top_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x64 aom_dc_top_predictor_32x64_neon
void aom_dc_top_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x8 aom_dc_top_predictor_32x8_c
+void aom_dc_top_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x8 aom_dc_top_predictor_32x8_neon
void aom_dc_top_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_4x16 aom_dc_top_predictor_4x16_c
+void aom_dc_top_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x16 aom_dc_top_predictor_4x16_neon
void aom_dc_top_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_top_predictor_4x4 aom_dc_top_predictor_4x4_neon
void aom_dc_top_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_4x8 aom_dc_top_predictor_4x8_c
+void aom_dc_top_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x8 aom_dc_top_predictor_4x8_neon
void aom_dc_top_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x16 aom_dc_top_predictor_64x16_c
+void aom_dc_top_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x16 aom_dc_top_predictor_64x16_neon
void aom_dc_top_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x32 aom_dc_top_predictor_64x32_c
+void aom_dc_top_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x32 aom_dc_top_predictor_64x32_neon
void aom_dc_top_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x64 aom_dc_top_predictor_64x64_c
+void aom_dc_top_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x64 aom_dc_top_predictor_64x64_neon
void aom_dc_top_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x16 aom_dc_top_predictor_8x16_c
+void aom_dc_top_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x16 aom_dc_top_predictor_8x16_neon
void aom_dc_top_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x32 aom_dc_top_predictor_8x32_c
+void aom_dc_top_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x32 aom_dc_top_predictor_8x32_neon
void aom_dc_top_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x4 aom_dc_top_predictor_8x4_c
+void aom_dc_top_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x4 aom_dc_top_predictor_8x4_neon
void aom_dc_top_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
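The blocks above follow the generated-RTCD pattern used throughout this header: for a fixed NEON target, each #define rebinds the public symbol from the _c fallback to the _neon specialization, so dispatch costs nothing at run time. For orientation, a minimal sketch of what every aom_dc_top_predictor_WxH variant computes (the left argument is accepted but ignored; this is an illustration, not the libaom source):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch: DC_TOP prediction fills the block with the rounded average of
     * the row above it. w/h stand in for the WxH baked into each variant. */
    static void dc_top_predictor_sketch(uint8_t *dst, ptrdiff_t stride,
                                        const uint8_t *above,
                                        const uint8_t *left, int w, int h) {
      (void)left;  /* DC_TOP reads only the `above` row. */
      int sum = 0;
      for (int i = 0; i < w; ++i) sum += above[i];
      const uint8_t dc = (uint8_t)((sum + (w >> 1)) / w);  /* rounded mean */
      for (int r = 0; r < h; ++r, dst += stride) memset(dst, dc, (size_t)w);
    }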
@@ -451,10 +518,6 @@
void aom_fdct4x4_lp_neon(const int16_t *input, int16_t *output, int stride);
#define aom_fdct4x4_lp aom_fdct4x4_lp_neon
-void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_fdct8x8_neon(const int16_t *input, tran_low_t *output, int stride);
-#define aom_fdct8x8 aom_fdct8x8_neon
-
void aom_fft16x16_float_c(const float *input, float *temp, float *output);
#define aom_fft16x16_float aom_fft16x16_float_c
@@ -470,18 +533,6 @@
void aom_fft8x8_float_c(const float *input, float *temp, float *output);
#define aom_fft8x8_float aom_fft8x8_float_c
-void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get16x16var_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get16x16var aom_get16x16var_neon
-
-unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-unsigned int aom_get4x4sse_cs_neon(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-#define aom_get4x4sse_cs aom_get4x4sse_cs_neon
-
-void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get8x8var_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get8x8var aom_get8x8var_neon
-
void aom_get_blk_sse_sum_c(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
#define aom_get_blk_sse_sum aom_get_blk_sse_sum_c
@@ -489,7 +540,8 @@
#define aom_get_mb_ss aom_get_mb_ss_c
void aom_get_var_sse_sum_16x16_dual_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
-#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_c
+void aom_get_var_sse_sum_16x16_dual_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
+#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_neon
void aom_get_var_sse_sum_8x8_quad_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
void aom_get_var_sse_sum_8x8_quad_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
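The *_sse_sum helpers above hand back both the sum of squared differences and the raw sum for each sub-block, so callers can derive each variance without a second pass over the pixels. A minimal sketch of that derivation, assuming the standard identity var = sse - sum^2 / N for an N-pixel block (illustrative names, not libaom API):

    #include <stdint.h>

    /* Variance of an N-pixel block from its SSE and sum; for a WxH block,
     * N = W * H and log2_n = log2(N), so the divide becomes a shift. */
    static uint32_t variance_from_sse_sum(uint32_t sse, int sum, int log2_n) {
      const int64_t s = sum;  /* widen before squaring */
      return sse - (uint32_t)((s * s) >> log2_n);
    }
    /* e.g. one 16x16 sub-block: var = sse16x16[i] - (sum * sum) / 256 */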
@@ -500,57 +552,72 @@
#define aom_h_predictor_16x16 aom_h_predictor_16x16_neon
void aom_h_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x32 aom_h_predictor_16x32_c
+void aom_h_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x32 aom_h_predictor_16x32_neon
void aom_h_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x4 aom_h_predictor_16x4_c
+void aom_h_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x4 aom_h_predictor_16x4_neon
void aom_h_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x64 aom_h_predictor_16x64_c
+void aom_h_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x64 aom_h_predictor_16x64_neon
void aom_h_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x8 aom_h_predictor_16x8_c
+void aom_h_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x8 aom_h_predictor_16x8_neon
void aom_h_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x16 aom_h_predictor_32x16_c
+void aom_h_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x16 aom_h_predictor_32x16_neon
void aom_h_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_h_predictor_32x32 aom_h_predictor_32x32_neon
void aom_h_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x64 aom_h_predictor_32x64_c
+void aom_h_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x64 aom_h_predictor_32x64_neon
void aom_h_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x8 aom_h_predictor_32x8_c
+void aom_h_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x8 aom_h_predictor_32x8_neon
void aom_h_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_4x16 aom_h_predictor_4x16_c
+void aom_h_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x16 aom_h_predictor_4x16_neon
void aom_h_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_h_predictor_4x4 aom_h_predictor_4x4_neon
void aom_h_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_4x8 aom_h_predictor_4x8_c
+void aom_h_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x8 aom_h_predictor_4x8_neon
void aom_h_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x16 aom_h_predictor_64x16_c
+void aom_h_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x16 aom_h_predictor_64x16_neon
void aom_h_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x32 aom_h_predictor_64x32_c
+void aom_h_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x32 aom_h_predictor_64x32_neon
void aom_h_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x64 aom_h_predictor_64x64_c
+void aom_h_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x64 aom_h_predictor_64x64_neon
void aom_h_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x16 aom_h_predictor_8x16_c
+void aom_h_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x16 aom_h_predictor_8x16_neon
void aom_h_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x32 aom_h_predictor_8x32_c
+void aom_h_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x32 aom_h_predictor_8x32_neon
void aom_h_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x4 aom_h_predictor_8x4_c
+void aom_h_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x4 aom_h_predictor_8x4_neon
void aom_h_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
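All of the aom_h_predictor_WxH variants above specialize the same trivial operation: each output row is a solid run of the matching left-column sample, and the above row is unused. A sketch (not the libaom source):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void h_predictor_sketch(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *above, const uint8_t *left,
                                   int w, int h) {
      (void)above;  /* H prediction reads only the left column. */
      for (int r = 0; r < h; ++r, dst += stride)
        memset(dst, left[r], (size_t)w);  /* replicate left[r] across row r */
    }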
@@ -561,10 +628,12 @@
#define aom_hadamard_16x16 aom_hadamard_16x16_neon
void aom_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_hadamard_32x32 aom_hadamard_32x32_c
+void aom_hadamard_32x32_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_32x32 aom_hadamard_32x32_neon
void aom_hadamard_4x4_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_hadamard_4x4 aom_hadamard_4x4_c
+void aom_hadamard_4x4_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_4x4 aom_hadamard_4x4_neon
void aom_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
void aom_hadamard_8x8_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
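The Hadamard transforms above feed the encoder's cheap rate/distortion estimates. As a reference point, a plain 4x4 forward Hadamard is two passes of 4-point butterflies; this sketch deliberately ignores the specific coefficient ordering and tran_low_t output type of the real aom_hadamard_4x4:

    #include <stddef.h>
    #include <stdint.h>

    /* Unnormalized 4-point Hadamard butterfly: y = H4 * x. */
    static void hadamard4(const int16_t *x, int16_t *y) {
      const int16_t b0 = x[0] + x[1], b1 = x[0] - x[1];
      const int16_t b2 = x[2] + x[3], b3 = x[2] - x[3];
      y[0] = b0 + b2; y[1] = b1 + b3; y[2] = b0 - b2; y[3] = b1 - b3;
    }

    /* 2-D 4x4 Hadamard: transform rows, then columns of the row result.
     * int16_t is wide enough here for 8-bit residuals (|in| <= 255). */
    static void hadamard_4x4_sketch(const int16_t *src, ptrdiff_t stride,
                                    int16_t *coeff) {
      int16_t rows[16], t[4], u[4];
      for (int r = 0; r < 4; ++r) hadamard4(src + r * stride, rows + 4 * r);
      for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) t[r] = rows[4 * r + c];
        hadamard4(t, u);
        for (int r = 0; r < 4; ++r) coeff[4 * r + c] = u[r];
      }
    }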
@@ -648,12 +717,6 @@
uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
-
-void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
-
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_c
@@ -721,16 +784,20 @@
#define aom_highbd_10_masked_sub_pixel_variance8x8 aom_highbd_10_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_10_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse16x16 aom_highbd_10_mse16x16_c
+unsigned int aom_highbd_10_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x16 aom_highbd_10_mse16x16_neon
unsigned int aom_highbd_10_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse16x8 aom_highbd_10_mse16x8_c
+unsigned int aom_highbd_10_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x8 aom_highbd_10_mse16x8_neon
unsigned int aom_highbd_10_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse8x16 aom_highbd_10_mse8x16_c
+unsigned int aom_highbd_10_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x16 aom_highbd_10_mse8x16_neon
unsigned int aom_highbd_10_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse8x8 aom_highbd_10_mse8x8_c
+unsigned int aom_highbd_10_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x8 aom_highbd_10_mse8x8_neon
unsigned int aom_highbd_10_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
#define aom_highbd_10_obmc_sub_pixel_variance128x128 aom_highbd_10_obmc_sub_pixel_variance128x128_c
@@ -1012,12 +1079,12 @@
unsigned int aom_highbd_10_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_neon
-unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_neon
-unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_neon
unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1042,12 +1109,12 @@
unsigned int aom_highbd_10_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_neon
-unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_neon
-unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_neon
unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1065,8 +1132,8 @@
unsigned int aom_highbd_10_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_neon
-unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_neon
unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1081,8 +1148,8 @@
unsigned int aom_highbd_10_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_neon
-unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_neon
unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
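Note that the hunks above also flip the odd-sized variance prototypes from uint32_t *sse to unsigned int *sse. On the 32-/64-bit ABIs this header is generated for, the two spellings name the same 32-bit type, so the change only harmonizes these declarations with the rest of the family; a compile-time check makes that assumption explicit:

    #include <stdint.h>
    _Static_assert(sizeof(uint32_t) == sizeof(unsigned int),
                   "uint32_t and unsigned int interchangeable here");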
@@ -1159,12 +1226,6 @@
uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
-
-void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
-
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_c
@@ -1232,16 +1293,20 @@
#define aom_highbd_12_masked_sub_pixel_variance8x8 aom_highbd_12_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_12_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse16x16 aom_highbd_12_mse16x16_c
+unsigned int aom_highbd_12_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x16 aom_highbd_12_mse16x16_neon
unsigned int aom_highbd_12_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse16x8 aom_highbd_12_mse16x8_c
+unsigned int aom_highbd_12_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x8 aom_highbd_12_mse16x8_neon
unsigned int aom_highbd_12_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse8x16 aom_highbd_12_mse8x16_c
+unsigned int aom_highbd_12_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x16 aom_highbd_12_mse8x16_neon
unsigned int aom_highbd_12_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse8x8 aom_highbd_12_mse8x8_c
+unsigned int aom_highbd_12_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x8 aom_highbd_12_mse8x8_neon
unsigned int aom_highbd_12_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
#define aom_highbd_12_obmc_sub_pixel_variance128x128 aom_highbd_12_obmc_sub_pixel_variance128x128_c
@@ -1508,25 +1573,32 @@
#define aom_highbd_12_sub_pixel_variance8x8 aom_highbd_12_sub_pixel_variance8x8_c
unsigned int aom_highbd_12_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance128x128 aom_highbd_12_variance128x128_c
+unsigned int aom_highbd_12_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x128 aom_highbd_12_variance128x128_neon
unsigned int aom_highbd_12_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance128x64 aom_highbd_12_variance128x64_c
+unsigned int aom_highbd_12_variance128x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x64 aom_highbd_12_variance128x64_neon
unsigned int aom_highbd_12_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x16 aom_highbd_12_variance16x16_c
+unsigned int aom_highbd_12_variance16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x16 aom_highbd_12_variance16x16_neon
unsigned int aom_highbd_12_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_c
+unsigned int aom_highbd_12_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_neon
-unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_neon
-unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_c
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_neon
unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x8 aom_highbd_12_variance16x8_c
+unsigned int aom_highbd_12_variance16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x8 aom_highbd_12_variance16x8_neon
unsigned int aom_highbd_12_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance2x2 aom_highbd_12_variance2x2_c
@@ -1535,52 +1607,67 @@
#define aom_highbd_12_variance2x4 aom_highbd_12_variance2x4_c
unsigned int aom_highbd_12_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x16 aom_highbd_12_variance32x16_c
+unsigned int aom_highbd_12_variance32x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x16 aom_highbd_12_variance32x16_neon
unsigned int aom_highbd_12_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x32 aom_highbd_12_variance32x32_c
+unsigned int aom_highbd_12_variance32x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x32 aom_highbd_12_variance32x32_neon
unsigned int aom_highbd_12_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_c
+unsigned int aom_highbd_12_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_neon
-unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_c
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_neon
-unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_neon
unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance4x2 aom_highbd_12_variance4x2_c
unsigned int aom_highbd_12_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance4x4 aom_highbd_12_variance4x4_c
+unsigned int aom_highbd_12_variance4x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x4 aom_highbd_12_variance4x4_neon
unsigned int aom_highbd_12_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance4x8 aom_highbd_12_variance4x8_c
+unsigned int aom_highbd_12_variance4x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x8 aom_highbd_12_variance4x8_neon
unsigned int aom_highbd_12_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_c
+unsigned int aom_highbd_12_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_neon
-unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_c
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_neon
unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x32 aom_highbd_12_variance64x32_c
+unsigned int aom_highbd_12_variance64x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x32 aom_highbd_12_variance64x32_neon
unsigned int aom_highbd_12_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x64 aom_highbd_12_variance64x64_c
+unsigned int aom_highbd_12_variance64x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x64 aom_highbd_12_variance64x64_neon
unsigned int aom_highbd_12_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_c
+unsigned int aom_highbd_12_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_neon
-unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_c
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_neon
unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x4 aom_highbd_12_variance8x4_c
+unsigned int aom_highbd_12_variance8x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x4 aom_highbd_12_variance8x4_neon
unsigned int aom_highbd_12_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x8 aom_highbd_12_variance8x8_c
+unsigned int aom_highbd_12_variance8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x8 aom_highbd_12_variance8x8_neon
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128 aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c
@@ -1648,12 +1735,6 @@
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
-
-void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
-
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_c
@@ -1721,16 +1802,20 @@
#define aom_highbd_8_masked_sub_pixel_variance8x8 aom_highbd_8_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_8_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse16x16 aom_highbd_8_mse16x16_c
+unsigned int aom_highbd_8_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x16 aom_highbd_8_mse16x16_neon
unsigned int aom_highbd_8_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse16x8 aom_highbd_8_mse16x8_c
+unsigned int aom_highbd_8_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x8 aom_highbd_8_mse16x8_neon
unsigned int aom_highbd_8_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse8x16 aom_highbd_8_mse8x16_c
+unsigned int aom_highbd_8_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x16 aom_highbd_8_mse8x16_neon
unsigned int aom_highbd_8_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse8x8 aom_highbd_8_mse8x8_c
+unsigned int aom_highbd_8_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x8 aom_highbd_8_mse8x8_neon
uint32_t aom_highbd_8_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
#define aom_highbd_8_sub_pixel_avg_variance128x128 aom_highbd_8_sub_pixel_avg_variance128x128_c
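On the aom_highbd_*_mse* groups above: despite the uint8_t * signatures, high-bitdepth buffers carry uint16_t samples behind libaom's CONVERT_TO_BYTEPTR/CONVERT_TO_SHORTPTR convention, and as I read this header family the mse entry points accumulate the raw sum of squared differences (no normalization), returning the same value they store through *sse. A sketch of the 8-bit flavor under those assumptions:

    #include <stdint.h>

    static unsigned int mse_sketch(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, unsigned int *sse) {
      unsigned int acc = 0;
      for (int r = 0; r < h; ++r, src += src_stride, ref += ref_stride) {
        for (int c = 0; c < w; ++c) {
          const int d = src[c] - ref[c];
          acc += (unsigned int)(d * d);
        }
      }
      *sse = acc;  /* accumulated squared error */
      return acc;  /* returned unnormalized, per the assumed convention */
    }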
@@ -1865,25 +1950,32 @@
#define aom_highbd_8_sub_pixel_variance8x8 aom_highbd_8_sub_pixel_variance8x8_c
unsigned int aom_highbd_8_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance128x128 aom_highbd_8_variance128x128_c
+unsigned int aom_highbd_8_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x128 aom_highbd_8_variance128x128_neon
unsigned int aom_highbd_8_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance128x64 aom_highbd_8_variance128x64_c
+unsigned int aom_highbd_8_variance128x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x64 aom_highbd_8_variance128x64_neon
unsigned int aom_highbd_8_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x16 aom_highbd_8_variance16x16_c
+unsigned int aom_highbd_8_variance16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x16 aom_highbd_8_variance16x16_neon
unsigned int aom_highbd_8_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_c
+unsigned int aom_highbd_8_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_neon
-unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_neon
-unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_c
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_neon
unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x8 aom_highbd_8_variance16x8_c
+unsigned int aom_highbd_8_variance16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x8 aom_highbd_8_variance16x8_neon
unsigned int aom_highbd_8_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance2x2 aom_highbd_8_variance2x2_c
@@ -1892,59 +1984,75 @@
#define aom_highbd_8_variance2x4 aom_highbd_8_variance2x4_c
unsigned int aom_highbd_8_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x16 aom_highbd_8_variance32x16_c
+unsigned int aom_highbd_8_variance32x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x16 aom_highbd_8_variance32x16_neon
unsigned int aom_highbd_8_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x32 aom_highbd_8_variance32x32_c
+unsigned int aom_highbd_8_variance32x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x32 aom_highbd_8_variance32x32_neon
unsigned int aom_highbd_8_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_c
+unsigned int aom_highbd_8_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_neon
-unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_c
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_neon
-unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_neon
unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance4x2 aom_highbd_8_variance4x2_c
unsigned int aom_highbd_8_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance4x4 aom_highbd_8_variance4x4_c
+unsigned int aom_highbd_8_variance4x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x4 aom_highbd_8_variance4x4_neon
unsigned int aom_highbd_8_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance4x8 aom_highbd_8_variance4x8_c
+unsigned int aom_highbd_8_variance4x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x8 aom_highbd_8_variance4x8_neon
unsigned int aom_highbd_8_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_c
+unsigned int aom_highbd_8_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_neon
-unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_c
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_neon
unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x32 aom_highbd_8_variance64x32_c
+unsigned int aom_highbd_8_variance64x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x32 aom_highbd_8_variance64x32_neon
unsigned int aom_highbd_8_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x64 aom_highbd_8_variance64x64_c
+unsigned int aom_highbd_8_variance64x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x64 aom_highbd_8_variance64x64_neon
unsigned int aom_highbd_8_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_c
+unsigned int aom_highbd_8_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_neon
-unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_c
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_neon
unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x4 aom_highbd_8_variance8x4_c
+unsigned int aom_highbd_8_variance8x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x4 aom_highbd_8_variance8x4_neon
unsigned int aom_highbd_8_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x8 aom_highbd_8_variance8x8_c
+unsigned int aom_highbd_8_variance8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x8 aom_highbd_8_variance8x8_neon
unsigned int aom_highbd_avg_4x4_c(const uint8_t *, int p);
unsigned int aom_highbd_avg_4x4_neon(const uint8_t *, int p);
#define aom_highbd_avg_4x4 aom_highbd_avg_4x4_neon
unsigned int aom_highbd_avg_8x8_c(const uint8_t *, int p);
-#define aom_highbd_avg_8x8 aom_highbd_avg_8x8_c
+unsigned int aom_highbd_avg_8x8_neon(const uint8_t *, int p);
+#define aom_highbd_avg_8x8 aom_highbd_avg_8x8_neon
void aom_highbd_blend_a64_d16_mask_c(uint8_t *dst, uint32_t dst_stride, const CONV_BUF_TYPE *src0, uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subw, int subh, ConvolveParams *conv_params, const int bd);
#define aom_highbd_blend_a64_d16_mask aom_highbd_blend_a64_d16_mask_c
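aom_highbd_avg_8x8 above likewise gains a NEON binding. The operation is a rounded mean of an 8x8 block, with the usual caveat that the uint8_t pointer aliases uint16_t samples in the high-bitdepth build. A sketch under that assumption (the real code recovers the sample pointer via CONVERT_TO_SHORTPTR rather than a plain cast):

    #include <stdint.h>

    static unsigned int highbd_avg_8x8_sketch(const uint8_t *s8, int p) {
      /* Simplification: assume s8 directly addresses uint16_t samples. */
      const uint16_t *s = (const uint16_t *)s8;
      unsigned int sum = 0;
      for (int r = 0; r < 8; ++r, s += p)
        for (int c = 0; c < 8; ++c) sum += s[c];
      return (sum + 32) >> 6;  /* rounded divide by 64 */
    }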
@@ -1974,237 +2082,308 @@
#define aom_highbd_convolve_copy aom_highbd_convolve_copy_c
void aom_highbd_dc_128_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x16 aom_highbd_dc_128_predictor_16x16_c
+void aom_highbd_dc_128_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x16 aom_highbd_dc_128_predictor_16x16_neon
void aom_highbd_dc_128_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x32 aom_highbd_dc_128_predictor_16x32_c
+void aom_highbd_dc_128_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x32 aom_highbd_dc_128_predictor_16x32_neon
void aom_highbd_dc_128_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x4 aom_highbd_dc_128_predictor_16x4_c
+void aom_highbd_dc_128_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x4 aom_highbd_dc_128_predictor_16x4_neon
void aom_highbd_dc_128_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x64 aom_highbd_dc_128_predictor_16x64_c
+void aom_highbd_dc_128_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x64 aom_highbd_dc_128_predictor_16x64_neon
void aom_highbd_dc_128_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x8 aom_highbd_dc_128_predictor_16x8_c
+void aom_highbd_dc_128_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x8 aom_highbd_dc_128_predictor_16x8_neon
void aom_highbd_dc_128_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x16 aom_highbd_dc_128_predictor_32x16_c
+void aom_highbd_dc_128_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x16 aom_highbd_dc_128_predictor_32x16_neon
void aom_highbd_dc_128_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x32 aom_highbd_dc_128_predictor_32x32_c
+void aom_highbd_dc_128_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x32 aom_highbd_dc_128_predictor_32x32_neon
void aom_highbd_dc_128_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x64 aom_highbd_dc_128_predictor_32x64_c
+void aom_highbd_dc_128_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x64 aom_highbd_dc_128_predictor_32x64_neon
void aom_highbd_dc_128_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x8 aom_highbd_dc_128_predictor_32x8_c
+void aom_highbd_dc_128_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x8 aom_highbd_dc_128_predictor_32x8_neon
void aom_highbd_dc_128_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x16 aom_highbd_dc_128_predictor_4x16_c
+void aom_highbd_dc_128_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x16 aom_highbd_dc_128_predictor_4x16_neon
void aom_highbd_dc_128_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x4 aom_highbd_dc_128_predictor_4x4_c
+void aom_highbd_dc_128_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x4 aom_highbd_dc_128_predictor_4x4_neon
void aom_highbd_dc_128_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x8 aom_highbd_dc_128_predictor_4x8_c
+void aom_highbd_dc_128_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x8 aom_highbd_dc_128_predictor_4x8_neon
void aom_highbd_dc_128_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x16 aom_highbd_dc_128_predictor_64x16_c
+void aom_highbd_dc_128_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x16 aom_highbd_dc_128_predictor_64x16_neon
void aom_highbd_dc_128_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x32 aom_highbd_dc_128_predictor_64x32_c
+void aom_highbd_dc_128_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x32 aom_highbd_dc_128_predictor_64x32_neon
void aom_highbd_dc_128_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x64 aom_highbd_dc_128_predictor_64x64_c
+void aom_highbd_dc_128_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x64 aom_highbd_dc_128_predictor_64x64_neon
void aom_highbd_dc_128_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x16 aom_highbd_dc_128_predictor_8x16_c
+void aom_highbd_dc_128_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x16 aom_highbd_dc_128_predictor_8x16_neon
void aom_highbd_dc_128_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x32 aom_highbd_dc_128_predictor_8x32_c
+void aom_highbd_dc_128_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x32 aom_highbd_dc_128_predictor_8x32_neon
void aom_highbd_dc_128_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x4 aom_highbd_dc_128_predictor_8x4_c
+void aom_highbd_dc_128_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x4 aom_highbd_dc_128_predictor_8x4_neon
void aom_highbd_dc_128_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x8 aom_highbd_dc_128_predictor_8x8_c
+void aom_highbd_dc_128_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x8 aom_highbd_dc_128_predictor_8x8_neon
void aom_highbd_dc_left_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x16 aom_highbd_dc_left_predictor_16x16_c
+void aom_highbd_dc_left_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x16 aom_highbd_dc_left_predictor_16x16_neon
void aom_highbd_dc_left_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x32 aom_highbd_dc_left_predictor_16x32_c
+void aom_highbd_dc_left_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x32 aom_highbd_dc_left_predictor_16x32_neon
void aom_highbd_dc_left_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x4 aom_highbd_dc_left_predictor_16x4_c
+void aom_highbd_dc_left_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x4 aom_highbd_dc_left_predictor_16x4_neon
void aom_highbd_dc_left_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x64 aom_highbd_dc_left_predictor_16x64_c
+void aom_highbd_dc_left_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x64 aom_highbd_dc_left_predictor_16x64_neon
void aom_highbd_dc_left_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x8 aom_highbd_dc_left_predictor_16x8_c
+void aom_highbd_dc_left_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x8 aom_highbd_dc_left_predictor_16x8_neon
void aom_highbd_dc_left_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x16 aom_highbd_dc_left_predictor_32x16_c
+void aom_highbd_dc_left_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x16 aom_highbd_dc_left_predictor_32x16_neon
void aom_highbd_dc_left_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x32 aom_highbd_dc_left_predictor_32x32_c
+void aom_highbd_dc_left_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x32 aom_highbd_dc_left_predictor_32x32_neon
void aom_highbd_dc_left_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x64 aom_highbd_dc_left_predictor_32x64_c
+void aom_highbd_dc_left_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x64 aom_highbd_dc_left_predictor_32x64_neon
void aom_highbd_dc_left_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x8 aom_highbd_dc_left_predictor_32x8_c
+void aom_highbd_dc_left_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x8 aom_highbd_dc_left_predictor_32x8_neon
void aom_highbd_dc_left_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x16 aom_highbd_dc_left_predictor_4x16_c
+void aom_highbd_dc_left_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x16 aom_highbd_dc_left_predictor_4x16_neon
void aom_highbd_dc_left_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x4 aom_highbd_dc_left_predictor_4x4_c
+void aom_highbd_dc_left_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x4 aom_highbd_dc_left_predictor_4x4_neon
void aom_highbd_dc_left_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x8 aom_highbd_dc_left_predictor_4x8_c
+void aom_highbd_dc_left_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x8 aom_highbd_dc_left_predictor_4x8_neon
void aom_highbd_dc_left_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x16 aom_highbd_dc_left_predictor_64x16_c
+void aom_highbd_dc_left_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x16 aom_highbd_dc_left_predictor_64x16_neon
void aom_highbd_dc_left_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x32 aom_highbd_dc_left_predictor_64x32_c
+void aom_highbd_dc_left_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x32 aom_highbd_dc_left_predictor_64x32_neon
void aom_highbd_dc_left_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x64 aom_highbd_dc_left_predictor_64x64_c
+void aom_highbd_dc_left_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x64 aom_highbd_dc_left_predictor_64x64_neon
void aom_highbd_dc_left_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x16 aom_highbd_dc_left_predictor_8x16_c
+void aom_highbd_dc_left_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x16 aom_highbd_dc_left_predictor_8x16_neon
void aom_highbd_dc_left_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x32 aom_highbd_dc_left_predictor_8x32_c
+void aom_highbd_dc_left_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x32 aom_highbd_dc_left_predictor_8x32_neon
void aom_highbd_dc_left_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x4 aom_highbd_dc_left_predictor_8x4_c
+void aom_highbd_dc_left_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x4 aom_highbd_dc_left_predictor_8x4_neon
void aom_highbd_dc_left_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x8 aom_highbd_dc_left_predictor_8x8_c
+void aom_highbd_dc_left_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x8 aom_highbd_dc_left_predictor_8x8_neon
void aom_highbd_dc_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_16x16 aom_highbd_dc_predictor_16x16_neon
void aom_highbd_dc_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x32 aom_highbd_dc_predictor_16x32_c
+void aom_highbd_dc_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x32 aom_highbd_dc_predictor_16x32_neon
void aom_highbd_dc_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x4 aom_highbd_dc_predictor_16x4_c
+void aom_highbd_dc_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x4 aom_highbd_dc_predictor_16x4_neon
void aom_highbd_dc_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x64 aom_highbd_dc_predictor_16x64_c
+void aom_highbd_dc_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x64 aom_highbd_dc_predictor_16x64_neon
void aom_highbd_dc_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x8 aom_highbd_dc_predictor_16x8_c
+void aom_highbd_dc_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x8 aom_highbd_dc_predictor_16x8_neon
void aom_highbd_dc_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x16 aom_highbd_dc_predictor_32x16_c
+void aom_highbd_dc_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x16 aom_highbd_dc_predictor_32x16_neon
void aom_highbd_dc_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_32x32 aom_highbd_dc_predictor_32x32_neon
void aom_highbd_dc_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x64 aom_highbd_dc_predictor_32x64_c
+void aom_highbd_dc_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x64 aom_highbd_dc_predictor_32x64_neon
void aom_highbd_dc_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x8 aom_highbd_dc_predictor_32x8_c
+void aom_highbd_dc_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x8 aom_highbd_dc_predictor_32x8_neon
void aom_highbd_dc_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_4x16 aom_highbd_dc_predictor_4x16_c
+void aom_highbd_dc_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x16 aom_highbd_dc_predictor_4x16_neon
void aom_highbd_dc_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_4x4 aom_highbd_dc_predictor_4x4_neon
void aom_highbd_dc_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_4x8 aom_highbd_dc_predictor_4x8_c
+void aom_highbd_dc_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x8 aom_highbd_dc_predictor_4x8_neon
void aom_highbd_dc_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_64x16 aom_highbd_dc_predictor_64x16_c
+void aom_highbd_dc_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x16 aom_highbd_dc_predictor_64x16_neon
void aom_highbd_dc_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_64x32 aom_highbd_dc_predictor_64x32_c
+void aom_highbd_dc_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x32 aom_highbd_dc_predictor_64x32_neon
void aom_highbd_dc_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_64x64 aom_highbd_dc_predictor_64x64_neon
void aom_highbd_dc_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x16 aom_highbd_dc_predictor_8x16_c
+void aom_highbd_dc_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x16 aom_highbd_dc_predictor_8x16_neon
void aom_highbd_dc_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x32 aom_highbd_dc_predictor_8x32_c
+void aom_highbd_dc_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x32 aom_highbd_dc_predictor_8x32_neon
void aom_highbd_dc_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x4 aom_highbd_dc_predictor_8x4_c
+void aom_highbd_dc_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x4 aom_highbd_dc_predictor_8x4_neon
void aom_highbd_dc_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_8x8 aom_highbd_dc_predictor_8x8_neon
void aom_highbd_dc_top_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x16 aom_highbd_dc_top_predictor_16x16_c
+void aom_highbd_dc_top_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x16 aom_highbd_dc_top_predictor_16x16_neon
void aom_highbd_dc_top_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x32 aom_highbd_dc_top_predictor_16x32_c
+void aom_highbd_dc_top_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x32 aom_highbd_dc_top_predictor_16x32_neon
void aom_highbd_dc_top_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x4 aom_highbd_dc_top_predictor_16x4_c
+void aom_highbd_dc_top_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x4 aom_highbd_dc_top_predictor_16x4_neon
void aom_highbd_dc_top_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x64 aom_highbd_dc_top_predictor_16x64_c
+void aom_highbd_dc_top_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x64 aom_highbd_dc_top_predictor_16x64_neon
void aom_highbd_dc_top_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x8 aom_highbd_dc_top_predictor_16x8_c
+void aom_highbd_dc_top_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x8 aom_highbd_dc_top_predictor_16x8_neon
void aom_highbd_dc_top_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x16 aom_highbd_dc_top_predictor_32x16_c
+void aom_highbd_dc_top_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x16 aom_highbd_dc_top_predictor_32x16_neon
void aom_highbd_dc_top_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x32 aom_highbd_dc_top_predictor_32x32_c
+void aom_highbd_dc_top_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x32 aom_highbd_dc_top_predictor_32x32_neon
void aom_highbd_dc_top_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x64 aom_highbd_dc_top_predictor_32x64_c
+void aom_highbd_dc_top_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x64 aom_highbd_dc_top_predictor_32x64_neon
void aom_highbd_dc_top_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x8 aom_highbd_dc_top_predictor_32x8_c
+void aom_highbd_dc_top_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x8 aom_highbd_dc_top_predictor_32x8_neon
void aom_highbd_dc_top_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x16 aom_highbd_dc_top_predictor_4x16_c
+void aom_highbd_dc_top_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x16 aom_highbd_dc_top_predictor_4x16_neon
void aom_highbd_dc_top_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x4 aom_highbd_dc_top_predictor_4x4_c
+void aom_highbd_dc_top_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x4 aom_highbd_dc_top_predictor_4x4_neon
void aom_highbd_dc_top_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x8 aom_highbd_dc_top_predictor_4x8_c
+void aom_highbd_dc_top_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x8 aom_highbd_dc_top_predictor_4x8_neon
void aom_highbd_dc_top_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x16 aom_highbd_dc_top_predictor_64x16_c
+void aom_highbd_dc_top_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x16 aom_highbd_dc_top_predictor_64x16_neon
void aom_highbd_dc_top_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x32 aom_highbd_dc_top_predictor_64x32_c
+void aom_highbd_dc_top_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x32 aom_highbd_dc_top_predictor_64x32_neon
void aom_highbd_dc_top_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x64 aom_highbd_dc_top_predictor_64x64_c
+void aom_highbd_dc_top_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x64 aom_highbd_dc_top_predictor_64x64_neon
void aom_highbd_dc_top_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x16 aom_highbd_dc_top_predictor_8x16_c
+void aom_highbd_dc_top_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x16 aom_highbd_dc_top_predictor_8x16_neon
void aom_highbd_dc_top_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x32 aom_highbd_dc_top_predictor_8x32_c
+void aom_highbd_dc_top_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x32 aom_highbd_dc_top_predictor_8x32_neon
void aom_highbd_dc_top_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x4 aom_highbd_dc_top_predictor_8x4_c
+void aom_highbd_dc_top_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x4 aom_highbd_dc_top_predictor_8x4_neon
void aom_highbd_dc_top_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x8 aom_highbd_dc_top_predictor_8x8_c
+void aom_highbd_dc_top_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x8 aom_highbd_dc_top_predictor_8x8_neon
void aom_highbd_dist_wtd_comp_avg_pred_c(uint8_t *comp_pred8, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param);
#define aom_highbd_dist_wtd_comp_avg_pred aom_highbd_dist_wtd_comp_avg_pred_c
@@ -2275,74 +2454,93 @@
unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
-void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_c
-
void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_c
+void aom_highbd_h_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_neon
void aom_highbd_h_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x32 aom_highbd_h_predictor_16x32_c
+void aom_highbd_h_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x32 aom_highbd_h_predictor_16x32_neon
void aom_highbd_h_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x4 aom_highbd_h_predictor_16x4_c
+void aom_highbd_h_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x4 aom_highbd_h_predictor_16x4_neon
void aom_highbd_h_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x64 aom_highbd_h_predictor_16x64_c
+void aom_highbd_h_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x64 aom_highbd_h_predictor_16x64_neon
void aom_highbd_h_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x8 aom_highbd_h_predictor_16x8_c
+void aom_highbd_h_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x8 aom_highbd_h_predictor_16x8_neon
void aom_highbd_h_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x16 aom_highbd_h_predictor_32x16_c
+void aom_highbd_h_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x16 aom_highbd_h_predictor_32x16_neon
void aom_highbd_h_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x32 aom_highbd_h_predictor_32x32_c
+void aom_highbd_h_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x32 aom_highbd_h_predictor_32x32_neon
void aom_highbd_h_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x64 aom_highbd_h_predictor_32x64_c
+void aom_highbd_h_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x64 aom_highbd_h_predictor_32x64_neon
void aom_highbd_h_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x8 aom_highbd_h_predictor_32x8_c
+void aom_highbd_h_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x8 aom_highbd_h_predictor_32x8_neon
void aom_highbd_h_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x16 aom_highbd_h_predictor_4x16_c
+void aom_highbd_h_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x16 aom_highbd_h_predictor_4x16_neon
void aom_highbd_h_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x4 aom_highbd_h_predictor_4x4_c
+void aom_highbd_h_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x4 aom_highbd_h_predictor_4x4_neon
void aom_highbd_h_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x8 aom_highbd_h_predictor_4x8_c
+void aom_highbd_h_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x8 aom_highbd_h_predictor_4x8_neon
void aom_highbd_h_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x16 aom_highbd_h_predictor_64x16_c
+void aom_highbd_h_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x16 aom_highbd_h_predictor_64x16_neon
void aom_highbd_h_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x32 aom_highbd_h_predictor_64x32_c
+void aom_highbd_h_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x32 aom_highbd_h_predictor_64x32_neon
void aom_highbd_h_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x64 aom_highbd_h_predictor_64x64_c
+void aom_highbd_h_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x64 aom_highbd_h_predictor_64x64_neon
void aom_highbd_h_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x16 aom_highbd_h_predictor_8x16_c
+void aom_highbd_h_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x16 aom_highbd_h_predictor_8x16_neon
void aom_highbd_h_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x32 aom_highbd_h_predictor_8x32_c
+void aom_highbd_h_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x32 aom_highbd_h_predictor_8x32_neon
void aom_highbd_h_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x4 aom_highbd_h_predictor_8x4_c
+void aom_highbd_h_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x4 aom_highbd_h_predictor_8x4_neon
void aom_highbd_h_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x8 aom_highbd_h_predictor_8x8_c
+void aom_highbd_h_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x8 aom_highbd_h_predictor_8x8_neon
void aom_highbd_hadamard_16x16_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_16x16 aom_highbd_hadamard_16x16_c
+void aom_highbd_hadamard_16x16_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_16x16 aom_highbd_hadamard_16x16_neon
void aom_highbd_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_32x32 aom_highbd_hadamard_32x32_c
+void aom_highbd_hadamard_32x32_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_32x32 aom_highbd_hadamard_32x32_neon
void aom_highbd_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_8x8 aom_highbd_hadamard_8x8_c
+void aom_highbd_hadamard_8x8_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_8x8 aom_highbd_hadamard_8x8_neon
void aom_highbd_lpf_horizontal_14_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
void aom_highbd_lpf_horizontal_14_neon(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
@@ -2475,7 +2673,8 @@
#define aom_highbd_masked_sad8x8 aom_highbd_masked_sad8x8_c
void aom_highbd_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
-#define aom_highbd_minmax_8x8 aom_highbd_minmax_8x8_c
+void aom_highbd_minmax_8x8_neon(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
+#define aom_highbd_minmax_8x8 aom_highbd_minmax_8x8_neon
unsigned int aom_highbd_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
#define aom_highbd_obmc_sad128x128 aom_highbd_obmc_sad128x128_c
@@ -2776,7 +2975,8 @@
#define aom_highbd_quantize_b_adaptive aom_highbd_quantize_b_adaptive_neon
unsigned int aom_highbd_sad128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad128x128 aom_highbd_sad128x128_c
+unsigned int aom_highbd_sad128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x128 aom_highbd_sad128x128_neon
unsigned int aom_highbd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad128x128_avg aom_highbd_sad128x128_avg_c
@@ -2785,10 +2985,12 @@
#define aom_highbd_sad128x128x3d aom_highbd_sad128x128x3d_c
void aom_highbd_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad128x128x4d aom_highbd_sad128x128x4d_c
+void aom_highbd_sad128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x128x4d aom_highbd_sad128x128x4d_neon
unsigned int aom_highbd_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad128x64 aom_highbd_sad128x64_c
+unsigned int aom_highbd_sad128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x64 aom_highbd_sad128x64_neon
unsigned int aom_highbd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad128x64_avg aom_highbd_sad128x64_avg_c
@@ -2797,10 +2999,12 @@
#define aom_highbd_sad128x64x3d aom_highbd_sad128x64x3d_c
void aom_highbd_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad128x64x4d aom_highbd_sad128x64x4d_c
+void aom_highbd_sad128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x64x4d aom_highbd_sad128x64x4d_neon
unsigned int aom_highbd_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x16 aom_highbd_sad16x16_c
+unsigned int aom_highbd_sad16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x16 aom_highbd_sad16x16_neon
unsigned int aom_highbd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x16_avg aom_highbd_sad16x16_avg_c
@@ -2809,10 +3013,12 @@
#define aom_highbd_sad16x16x3d aom_highbd_sad16x16x3d_c
void aom_highbd_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x16x4d aom_highbd_sad16x16x4d_c
+void aom_highbd_sad16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x16x4d aom_highbd_sad16x16x4d_neon
unsigned int aom_highbd_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x32 aom_highbd_sad16x32_c
+unsigned int aom_highbd_sad16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x32 aom_highbd_sad16x32_neon
unsigned int aom_highbd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x32_avg aom_highbd_sad16x32_avg_c
@@ -2821,10 +3027,12 @@
#define aom_highbd_sad16x32x3d aom_highbd_sad16x32x3d_c
void aom_highbd_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x32x4d aom_highbd_sad16x32x4d_c
+void aom_highbd_sad16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x32x4d aom_highbd_sad16x32x4d_neon
unsigned int aom_highbd_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x4 aom_highbd_sad16x4_c
+unsigned int aom_highbd_sad16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x4 aom_highbd_sad16x4_neon
unsigned int aom_highbd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x4_avg aom_highbd_sad16x4_avg_c
@@ -2833,10 +3041,12 @@
#define aom_highbd_sad16x4x3d aom_highbd_sad16x4x3d_c
void aom_highbd_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x4x4d aom_highbd_sad16x4x4d_c
+void aom_highbd_sad16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x4x4d aom_highbd_sad16x4x4d_neon
unsigned int aom_highbd_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x64 aom_highbd_sad16x64_c
+unsigned int aom_highbd_sad16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x64 aom_highbd_sad16x64_neon
unsigned int aom_highbd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x64_avg aom_highbd_sad16x64_avg_c
@@ -2845,10 +3055,12 @@
#define aom_highbd_sad16x64x3d aom_highbd_sad16x64x3d_c
void aom_highbd_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x64x4d aom_highbd_sad16x64x4d_c
+void aom_highbd_sad16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x64x4d aom_highbd_sad16x64x4d_neon
unsigned int aom_highbd_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x8 aom_highbd_sad16x8_c
+unsigned int aom_highbd_sad16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x8 aom_highbd_sad16x8_neon
unsigned int aom_highbd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x8_avg aom_highbd_sad16x8_avg_c
@@ -2857,10 +3069,12 @@
#define aom_highbd_sad16x8x3d aom_highbd_sad16x8x3d_c
void aom_highbd_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x8x4d aom_highbd_sad16x8x4d_c
+void aom_highbd_sad16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x8x4d aom_highbd_sad16x8x4d_neon
unsigned int aom_highbd_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x16 aom_highbd_sad32x16_c
+unsigned int aom_highbd_sad32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x16 aom_highbd_sad32x16_neon
unsigned int aom_highbd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x16_avg aom_highbd_sad32x16_avg_c
@@ -2869,10 +3083,12 @@
#define aom_highbd_sad32x16x3d aom_highbd_sad32x16x3d_c
void aom_highbd_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x16x4d aom_highbd_sad32x16x4d_c
+void aom_highbd_sad32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x16x4d aom_highbd_sad32x16x4d_neon
unsigned int aom_highbd_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x32 aom_highbd_sad32x32_c
+unsigned int aom_highbd_sad32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x32 aom_highbd_sad32x32_neon
unsigned int aom_highbd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x32_avg aom_highbd_sad32x32_avg_c
@@ -2881,10 +3097,12 @@
#define aom_highbd_sad32x32x3d aom_highbd_sad32x32x3d_c
void aom_highbd_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x32x4d aom_highbd_sad32x32x4d_c
+void aom_highbd_sad32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x32x4d aom_highbd_sad32x32x4d_neon
unsigned int aom_highbd_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x64 aom_highbd_sad32x64_c
+unsigned int aom_highbd_sad32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x64 aom_highbd_sad32x64_neon
unsigned int aom_highbd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x64_avg aom_highbd_sad32x64_avg_c
@@ -2893,10 +3111,12 @@
#define aom_highbd_sad32x64x3d aom_highbd_sad32x64x3d_c
void aom_highbd_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x64x4d aom_highbd_sad32x64x4d_c
+void aom_highbd_sad32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x64x4d aom_highbd_sad32x64x4d_neon
unsigned int aom_highbd_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x8 aom_highbd_sad32x8_c
+unsigned int aom_highbd_sad32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x8 aom_highbd_sad32x8_neon
unsigned int aom_highbd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x8_avg aom_highbd_sad32x8_avg_c
@@ -2905,10 +3125,12 @@
#define aom_highbd_sad32x8x3d aom_highbd_sad32x8x3d_c
void aom_highbd_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x8x4d aom_highbd_sad32x8x4d_c
+void aom_highbd_sad32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x8x4d aom_highbd_sad32x8x4d_neon
unsigned int aom_highbd_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x16 aom_highbd_sad4x16_c
+unsigned int aom_highbd_sad4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x16 aom_highbd_sad4x16_neon
unsigned int aom_highbd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x16_avg aom_highbd_sad4x16_avg_c
@@ -2917,10 +3139,12 @@
#define aom_highbd_sad4x16x3d aom_highbd_sad4x16x3d_c
void aom_highbd_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x16x4d aom_highbd_sad4x16x4d_c
+void aom_highbd_sad4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x16x4d aom_highbd_sad4x16x4d_neon
unsigned int aom_highbd_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x4 aom_highbd_sad4x4_c
+unsigned int aom_highbd_sad4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x4 aom_highbd_sad4x4_neon
unsigned int aom_highbd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x4_avg aom_highbd_sad4x4_avg_c
@@ -2929,10 +3153,12 @@
#define aom_highbd_sad4x4x3d aom_highbd_sad4x4x3d_c
void aom_highbd_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x4x4d aom_highbd_sad4x4x4d_c
+void aom_highbd_sad4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x4x4d aom_highbd_sad4x4x4d_neon
unsigned int aom_highbd_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x8 aom_highbd_sad4x8_c
+unsigned int aom_highbd_sad4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x8 aom_highbd_sad4x8_neon
unsigned int aom_highbd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x8_avg aom_highbd_sad4x8_avg_c
@@ -2941,10 +3167,12 @@
#define aom_highbd_sad4x8x3d aom_highbd_sad4x8x3d_c
void aom_highbd_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x8x4d aom_highbd_sad4x8x4d_c
+void aom_highbd_sad4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x8x4d aom_highbd_sad4x8x4d_neon
unsigned int aom_highbd_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x128 aom_highbd_sad64x128_c
+unsigned int aom_highbd_sad64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x128 aom_highbd_sad64x128_neon
unsigned int aom_highbd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x128_avg aom_highbd_sad64x128_avg_c
@@ -2953,10 +3181,12 @@
#define aom_highbd_sad64x128x3d aom_highbd_sad64x128x3d_c
void aom_highbd_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x128x4d aom_highbd_sad64x128x4d_c
+void aom_highbd_sad64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x128x4d aom_highbd_sad64x128x4d_neon
unsigned int aom_highbd_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x16 aom_highbd_sad64x16_c
+unsigned int aom_highbd_sad64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x16 aom_highbd_sad64x16_neon
unsigned int aom_highbd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x16_avg aom_highbd_sad64x16_avg_c
@@ -2965,10 +3195,12 @@
#define aom_highbd_sad64x16x3d aom_highbd_sad64x16x3d_c
void aom_highbd_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x16x4d aom_highbd_sad64x16x4d_c
+void aom_highbd_sad64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x16x4d aom_highbd_sad64x16x4d_neon
unsigned int aom_highbd_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x32 aom_highbd_sad64x32_c
+unsigned int aom_highbd_sad64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x32 aom_highbd_sad64x32_neon
unsigned int aom_highbd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x32_avg aom_highbd_sad64x32_avg_c
@@ -2977,10 +3209,12 @@
#define aom_highbd_sad64x32x3d aom_highbd_sad64x32x3d_c
void aom_highbd_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x32x4d aom_highbd_sad64x32x4d_c
+void aom_highbd_sad64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x32x4d aom_highbd_sad64x32x4d_neon
unsigned int aom_highbd_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x64 aom_highbd_sad64x64_c
+unsigned int aom_highbd_sad64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x64 aom_highbd_sad64x64_neon
unsigned int aom_highbd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x64_avg aom_highbd_sad64x64_avg_c
@@ -2989,10 +3223,12 @@
#define aom_highbd_sad64x64x3d aom_highbd_sad64x64x3d_c
void aom_highbd_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x64x4d aom_highbd_sad64x64x4d_c
+void aom_highbd_sad64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x64x4d aom_highbd_sad64x64x4d_neon
unsigned int aom_highbd_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x16 aom_highbd_sad8x16_c
+unsigned int aom_highbd_sad8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x16 aom_highbd_sad8x16_neon
unsigned int aom_highbd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x16_avg aom_highbd_sad8x16_avg_c
@@ -3001,10 +3237,12 @@
#define aom_highbd_sad8x16x3d aom_highbd_sad8x16x3d_c
void aom_highbd_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x16x4d aom_highbd_sad8x16x4d_c
+void aom_highbd_sad8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x16x4d aom_highbd_sad8x16x4d_neon
unsigned int aom_highbd_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x32 aom_highbd_sad8x32_c
+unsigned int aom_highbd_sad8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x32 aom_highbd_sad8x32_neon
unsigned int aom_highbd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x32_avg aom_highbd_sad8x32_avg_c
@@ -3013,10 +3251,12 @@
#define aom_highbd_sad8x32x3d aom_highbd_sad8x32x3d_c
void aom_highbd_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x32x4d aom_highbd_sad8x32x4d_c
+void aom_highbd_sad8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x32x4d aom_highbd_sad8x32x4d_neon
unsigned int aom_highbd_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x4 aom_highbd_sad8x4_c
+unsigned int aom_highbd_sad8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x4 aom_highbd_sad8x4_neon
unsigned int aom_highbd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x4_avg aom_highbd_sad8x4_avg_c
@@ -3025,10 +3265,12 @@
#define aom_highbd_sad8x4x3d aom_highbd_sad8x4x3d_c
void aom_highbd_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x4x4d aom_highbd_sad8x4x4d_c
+void aom_highbd_sad8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x4x4d aom_highbd_sad8x4x4d_neon
unsigned int aom_highbd_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x8 aom_highbd_sad8x8_c
+unsigned int aom_highbd_sad8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x8 aom_highbd_sad8x8_neon
unsigned int aom_highbd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x8_avg aom_highbd_sad8x8_avg_c
@@ -3037,139 +3279,184 @@
#define aom_highbd_sad8x8x3d aom_highbd_sad8x8x3d_c
void aom_highbd_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x8x4d aom_highbd_sad8x8x4d_c
+void aom_highbd_sad8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x8x4d aom_highbd_sad8x8x4d_neon
unsigned int aom_highbd_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_128x128 aom_highbd_sad_skip_128x128_c
+unsigned int aom_highbd_sad_skip_128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_128x128 aom_highbd_sad_skip_128x128_neon
void aom_highbd_sad_skip_128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_128x128x4d aom_highbd_sad_skip_128x128x4d_c
+void aom_highbd_sad_skip_128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_128x128x4d aom_highbd_sad_skip_128x128x4d_neon
unsigned int aom_highbd_sad_skip_128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_128x64 aom_highbd_sad_skip_128x64_c
+unsigned int aom_highbd_sad_skip_128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_128x64 aom_highbd_sad_skip_128x64_neon
void aom_highbd_sad_skip_128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_128x64x4d aom_highbd_sad_skip_128x64x4d_c
+void aom_highbd_sad_skip_128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_128x64x4d aom_highbd_sad_skip_128x64x4d_neon
unsigned int aom_highbd_sad_skip_16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x16 aom_highbd_sad_skip_16x16_c
+unsigned int aom_highbd_sad_skip_16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x16 aom_highbd_sad_skip_16x16_neon
void aom_highbd_sad_skip_16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x16x4d aom_highbd_sad_skip_16x16x4d_c
+void aom_highbd_sad_skip_16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x16x4d aom_highbd_sad_skip_16x16x4d_neon
unsigned int aom_highbd_sad_skip_16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x32 aom_highbd_sad_skip_16x32_c
+unsigned int aom_highbd_sad_skip_16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x32 aom_highbd_sad_skip_16x32_neon
void aom_highbd_sad_skip_16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x32x4d aom_highbd_sad_skip_16x32x4d_c
+void aom_highbd_sad_skip_16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x32x4d aom_highbd_sad_skip_16x32x4d_neon
unsigned int aom_highbd_sad_skip_16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x4 aom_highbd_sad_skip_16x4_c
+unsigned int aom_highbd_sad_skip_16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x4 aom_highbd_sad_skip_16x4_neon
void aom_highbd_sad_skip_16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x4x4d aom_highbd_sad_skip_16x4x4d_c
+void aom_highbd_sad_skip_16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x4x4d aom_highbd_sad_skip_16x4x4d_neon
unsigned int aom_highbd_sad_skip_16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x64 aom_highbd_sad_skip_16x64_c
+unsigned int aom_highbd_sad_skip_16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x64 aom_highbd_sad_skip_16x64_neon
void aom_highbd_sad_skip_16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x64x4d aom_highbd_sad_skip_16x64x4d_c
+void aom_highbd_sad_skip_16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x64x4d aom_highbd_sad_skip_16x64x4d_neon
unsigned int aom_highbd_sad_skip_16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x8 aom_highbd_sad_skip_16x8_c
+unsigned int aom_highbd_sad_skip_16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x8 aom_highbd_sad_skip_16x8_neon
void aom_highbd_sad_skip_16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x8x4d aom_highbd_sad_skip_16x8x4d_c
+void aom_highbd_sad_skip_16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x8x4d aom_highbd_sad_skip_16x8x4d_neon
unsigned int aom_highbd_sad_skip_32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x16 aom_highbd_sad_skip_32x16_c
+unsigned int aom_highbd_sad_skip_32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x16 aom_highbd_sad_skip_32x16_neon
void aom_highbd_sad_skip_32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x16x4d aom_highbd_sad_skip_32x16x4d_c
+void aom_highbd_sad_skip_32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x16x4d aom_highbd_sad_skip_32x16x4d_neon
unsigned int aom_highbd_sad_skip_32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x32 aom_highbd_sad_skip_32x32_c
+unsigned int aom_highbd_sad_skip_32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x32 aom_highbd_sad_skip_32x32_neon
void aom_highbd_sad_skip_32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x32x4d aom_highbd_sad_skip_32x32x4d_c
+void aom_highbd_sad_skip_32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x32x4d aom_highbd_sad_skip_32x32x4d_neon
unsigned int aom_highbd_sad_skip_32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x64 aom_highbd_sad_skip_32x64_c
+unsigned int aom_highbd_sad_skip_32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x64 aom_highbd_sad_skip_32x64_neon
void aom_highbd_sad_skip_32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x64x4d aom_highbd_sad_skip_32x64x4d_c
+void aom_highbd_sad_skip_32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x64x4d aom_highbd_sad_skip_32x64x4d_neon
unsigned int aom_highbd_sad_skip_32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x8 aom_highbd_sad_skip_32x8_c
+unsigned int aom_highbd_sad_skip_32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x8 aom_highbd_sad_skip_32x8_neon
void aom_highbd_sad_skip_32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x8x4d aom_highbd_sad_skip_32x8x4d_c
+void aom_highbd_sad_skip_32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x8x4d aom_highbd_sad_skip_32x8x4d_neon
unsigned int aom_highbd_sad_skip_4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x16 aom_highbd_sad_skip_4x16_c
+unsigned int aom_highbd_sad_skip_4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x16 aom_highbd_sad_skip_4x16_neon
void aom_highbd_sad_skip_4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x16x4d aom_highbd_sad_skip_4x16x4d_c
+void aom_highbd_sad_skip_4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x16x4d aom_highbd_sad_skip_4x16x4d_neon
unsigned int aom_highbd_sad_skip_4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x4 aom_highbd_sad_skip_4x4_c
+unsigned int aom_highbd_sad_skip_4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x4 aom_highbd_sad_skip_4x4_neon
void aom_highbd_sad_skip_4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x4x4d aom_highbd_sad_skip_4x4x4d_c
+void aom_highbd_sad_skip_4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x4x4d aom_highbd_sad_skip_4x4x4d_neon
unsigned int aom_highbd_sad_skip_4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x8 aom_highbd_sad_skip_4x8_c
+unsigned int aom_highbd_sad_skip_4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x8 aom_highbd_sad_skip_4x8_neon
void aom_highbd_sad_skip_4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x8x4d aom_highbd_sad_skip_4x8x4d_c
+void aom_highbd_sad_skip_4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x8x4d aom_highbd_sad_skip_4x8x4d_neon
unsigned int aom_highbd_sad_skip_64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x128 aom_highbd_sad_skip_64x128_c
+unsigned int aom_highbd_sad_skip_64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x128 aom_highbd_sad_skip_64x128_neon
void aom_highbd_sad_skip_64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x128x4d aom_highbd_sad_skip_64x128x4d_c
+void aom_highbd_sad_skip_64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x128x4d aom_highbd_sad_skip_64x128x4d_neon
unsigned int aom_highbd_sad_skip_64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x16 aom_highbd_sad_skip_64x16_c
+unsigned int aom_highbd_sad_skip_64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x16 aom_highbd_sad_skip_64x16_neon
void aom_highbd_sad_skip_64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x16x4d aom_highbd_sad_skip_64x16x4d_c
+void aom_highbd_sad_skip_64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x16x4d aom_highbd_sad_skip_64x16x4d_neon
unsigned int aom_highbd_sad_skip_64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x32 aom_highbd_sad_skip_64x32_c
+unsigned int aom_highbd_sad_skip_64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x32 aom_highbd_sad_skip_64x32_neon
void aom_highbd_sad_skip_64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x32x4d aom_highbd_sad_skip_64x32x4d_c
+void aom_highbd_sad_skip_64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x32x4d aom_highbd_sad_skip_64x32x4d_neon
unsigned int aom_highbd_sad_skip_64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x64 aom_highbd_sad_skip_64x64_c
+unsigned int aom_highbd_sad_skip_64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x64 aom_highbd_sad_skip_64x64_neon
void aom_highbd_sad_skip_64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x64x4d aom_highbd_sad_skip_64x64x4d_c
+void aom_highbd_sad_skip_64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x64x4d aom_highbd_sad_skip_64x64x4d_neon
unsigned int aom_highbd_sad_skip_8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x16 aom_highbd_sad_skip_8x16_c
+unsigned int aom_highbd_sad_skip_8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x16 aom_highbd_sad_skip_8x16_neon
void aom_highbd_sad_skip_8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x16x4d aom_highbd_sad_skip_8x16x4d_c
+void aom_highbd_sad_skip_8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x16x4d aom_highbd_sad_skip_8x16x4d_neon
unsigned int aom_highbd_sad_skip_8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x32 aom_highbd_sad_skip_8x32_c
+unsigned int aom_highbd_sad_skip_8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x32 aom_highbd_sad_skip_8x32_neon
void aom_highbd_sad_skip_8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x32x4d aom_highbd_sad_skip_8x32x4d_c
+void aom_highbd_sad_skip_8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x32x4d aom_highbd_sad_skip_8x32x4d_neon
unsigned int aom_highbd_sad_skip_8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x4 aom_highbd_sad_skip_8x4_c
+unsigned int aom_highbd_sad_skip_8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x4 aom_highbd_sad_skip_8x4_neon
void aom_highbd_sad_skip_8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x4x4d aom_highbd_sad_skip_8x4x4d_c
+void aom_highbd_sad_skip_8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x4x4d aom_highbd_sad_skip_8x4x4d_neon
unsigned int aom_highbd_sad_skip_8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x8 aom_highbd_sad_skip_8x8_c
+unsigned int aom_highbd_sad_skip_8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x8 aom_highbd_sad_skip_8x8_neon
void aom_highbd_sad_skip_8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x8x4d aom_highbd_sad_skip_8x8x4d_c
+void aom_highbd_sad_skip_8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x8x4d aom_highbd_sad_skip_8x8x4d_neon
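/* A minimal sketch of the reduction the *_skip kernels above perform,
 * assuming the usual libaom convention: SAD is taken over every other row
 * and the result is doubled, trading accuracy for speed in coarse motion
 * search. Helper name is hypothetical. */
static unsigned int sketch_sad_skip_wxh(const uint8_t *src, int src_stride,
                                        const uint8_t *ref, int ref_stride,
                                        int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; y += 2) /* even rows only */
    for (int x = 0; x < w; x++)
      sad += (unsigned int)abs(src[y * src_stride + x] -
                               ref[y * ref_stride + x]);
  return 2 * sad; /* scale back to the full-height range */
}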
void aom_highbd_smooth_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_smooth_h_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
@@ -3610,205 +3897,272 @@
#define aom_lpf_vertical_8_quad aom_lpf_vertical_8_quad_neon
unsigned int aom_masked_sad128x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad128x128 aom_masked_sad128x128_c
+unsigned int aom_masked_sad128x128_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x128 aom_masked_sad128x128_neon
void aom_masked_sad128x128x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad128x128x4d aom_masked_sad128x128x4d_c
+void aom_masked_sad128x128x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad128x128x4d aom_masked_sad128x128x4d_neon
unsigned int aom_masked_sad128x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad128x64 aom_masked_sad128x64_c
+unsigned int aom_masked_sad128x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x64 aom_masked_sad128x64_neon
void aom_masked_sad128x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad128x64x4d aom_masked_sad128x64x4d_c
+void aom_masked_sad128x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad128x64x4d aom_masked_sad128x64x4d_neon
unsigned int aom_masked_sad16x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x16 aom_masked_sad16x16_c
+unsigned int aom_masked_sad16x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x16 aom_masked_sad16x16_neon
void aom_masked_sad16x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x16x4d aom_masked_sad16x16x4d_c
+void aom_masked_sad16x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x16x4d aom_masked_sad16x16x4d_neon
unsigned int aom_masked_sad16x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x32 aom_masked_sad16x32_c
+unsigned int aom_masked_sad16x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x32 aom_masked_sad16x32_neon
void aom_masked_sad16x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x32x4d aom_masked_sad16x32x4d_c
+void aom_masked_sad16x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x32x4d aom_masked_sad16x32x4d_neon
unsigned int aom_masked_sad16x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x4 aom_masked_sad16x4_c
+unsigned int aom_masked_sad16x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x4 aom_masked_sad16x4_neon
void aom_masked_sad16x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x4x4d aom_masked_sad16x4x4d_c
+void aom_masked_sad16x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x4x4d aom_masked_sad16x4x4d_neon
unsigned int aom_masked_sad16x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x64 aom_masked_sad16x64_c
+unsigned int aom_masked_sad16x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x64 aom_masked_sad16x64_neon
void aom_masked_sad16x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x64x4d aom_masked_sad16x64x4d_c
+void aom_masked_sad16x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x64x4d aom_masked_sad16x64x4d_neon
unsigned int aom_masked_sad16x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x8 aom_masked_sad16x8_c
+unsigned int aom_masked_sad16x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x8 aom_masked_sad16x8_neon
void aom_masked_sad16x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x8x4d aom_masked_sad16x8x4d_c
+void aom_masked_sad16x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x8x4d aom_masked_sad16x8x4d_neon
unsigned int aom_masked_sad32x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x16 aom_masked_sad32x16_c
+unsigned int aom_masked_sad32x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x16 aom_masked_sad32x16_neon
void aom_masked_sad32x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x16x4d aom_masked_sad32x16x4d_c
+void aom_masked_sad32x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x16x4d aom_masked_sad32x16x4d_neon
unsigned int aom_masked_sad32x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x32 aom_masked_sad32x32_c
+unsigned int aom_masked_sad32x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x32 aom_masked_sad32x32_neon
void aom_masked_sad32x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x32x4d aom_masked_sad32x32x4d_c
+void aom_masked_sad32x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x32x4d aom_masked_sad32x32x4d_neon
unsigned int aom_masked_sad32x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x64 aom_masked_sad32x64_c
+unsigned int aom_masked_sad32x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x64 aom_masked_sad32x64_neon
void aom_masked_sad32x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x64x4d aom_masked_sad32x64x4d_c
+void aom_masked_sad32x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x64x4d aom_masked_sad32x64x4d_neon
unsigned int aom_masked_sad32x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x8 aom_masked_sad32x8_c
+unsigned int aom_masked_sad32x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x8 aom_masked_sad32x8_neon
void aom_masked_sad32x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x8x4d aom_masked_sad32x8x4d_c
+void aom_masked_sad32x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x8x4d aom_masked_sad32x8x4d_neon
unsigned int aom_masked_sad4x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x16 aom_masked_sad4x16_c
+unsigned int aom_masked_sad4x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x16 aom_masked_sad4x16_neon
void aom_masked_sad4x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x16x4d aom_masked_sad4x16x4d_c
+void aom_masked_sad4x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x16x4d aom_masked_sad4x16x4d_neon
unsigned int aom_masked_sad4x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x4 aom_masked_sad4x4_c
+unsigned int aom_masked_sad4x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x4 aom_masked_sad4x4_neon
void aom_masked_sad4x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x4x4d aom_masked_sad4x4x4d_c
+void aom_masked_sad4x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x4x4d aom_masked_sad4x4x4d_neon
unsigned int aom_masked_sad4x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x8 aom_masked_sad4x8_c
+unsigned int aom_masked_sad4x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x8 aom_masked_sad4x8_neon
void aom_masked_sad4x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x8x4d aom_masked_sad4x8x4d_c
+void aom_masked_sad4x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x8x4d aom_masked_sad4x8x4d_neon
unsigned int aom_masked_sad64x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x128 aom_masked_sad64x128_c
+unsigned int aom_masked_sad64x128_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x128 aom_masked_sad64x128_neon
void aom_masked_sad64x128x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x128x4d aom_masked_sad64x128x4d_c
+void aom_masked_sad64x128x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x128x4d aom_masked_sad64x128x4d_neon
unsigned int aom_masked_sad64x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x16 aom_masked_sad64x16_c
+unsigned int aom_masked_sad64x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x16 aom_masked_sad64x16_neon
void aom_masked_sad64x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x16x4d aom_masked_sad64x16x4d_c
+void aom_masked_sad64x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x16x4d aom_masked_sad64x16x4d_neon
unsigned int aom_masked_sad64x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x32 aom_masked_sad64x32_c
+unsigned int aom_masked_sad64x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x32 aom_masked_sad64x32_neon
void aom_masked_sad64x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x32x4d aom_masked_sad64x32x4d_c
+void aom_masked_sad64x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x32x4d aom_masked_sad64x32x4d_neon
unsigned int aom_masked_sad64x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x64 aom_masked_sad64x64_c
+unsigned int aom_masked_sad64x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x64 aom_masked_sad64x64_neon
void aom_masked_sad64x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x64x4d aom_masked_sad64x64x4d_c
+void aom_masked_sad64x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x64x4d aom_masked_sad64x64x4d_neon
unsigned int aom_masked_sad8x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x16 aom_masked_sad8x16_c
+unsigned int aom_masked_sad8x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x16 aom_masked_sad8x16_neon
void aom_masked_sad8x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x16x4d aom_masked_sad8x16x4d_c
+void aom_masked_sad8x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x16x4d aom_masked_sad8x16x4d_neon
unsigned int aom_masked_sad8x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x32 aom_masked_sad8x32_c
+unsigned int aom_masked_sad8x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x32 aom_masked_sad8x32_neon
void aom_masked_sad8x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x32x4d aom_masked_sad8x32x4d_c
+void aom_masked_sad8x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x32x4d aom_masked_sad8x32x4d_neon
unsigned int aom_masked_sad8x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x4 aom_masked_sad8x4_c
+unsigned int aom_masked_sad8x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x4 aom_masked_sad8x4_neon
void aom_masked_sad8x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x4x4d aom_masked_sad8x4x4d_c
+void aom_masked_sad8x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x4x4d aom_masked_sad8x4x4d_neon
unsigned int aom_masked_sad8x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x8 aom_masked_sad8x8_c
+unsigned int aom_masked_sad8x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x8 aom_masked_sad8x8_neon
void aom_masked_sad8x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x8x4d aom_masked_sad8x8x4d_c
+void aom_masked_sad8x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x8x4d aom_masked_sad8x8x4d_neon
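/* A sketch of the masked SAD these kernels compute, assuming the
 * AOM_BLEND_A64 convention (6-bit mask weights in 0..64): ref and
 * second_pred are blended under msk (invert_mask swaps their roles), and
 * the blend is compared against src. second_pred is a contiguous wxh block.
 * Helper name is hypothetical. */
static unsigned int sketch_masked_sad_wxh(const uint8_t *src, int src_stride,
                                          const uint8_t *ref, int ref_stride,
                                          const uint8_t *second_pred,
                                          const uint8_t *msk, int msk_stride,
                                          int invert_mask, int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; y++)
    for (int x = 0; x < w; x++) {
      const int m = msk[y * msk_stride + x]; /* 0..64 */
      const int a = ref[y * ref_stride + x];
      const int b = second_pred[y * w + x];
      const int blend = invert_mask ? ((64 - m) * a + m * b + 32) >> 6
                                    : (m * a + (64 - m) * b + 32) >> 6;
      sad += (unsigned int)abs(src[y * src_stride + x] - blend);
    }
  return sad;
}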
unsigned int aom_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance128x128 aom_masked_sub_pixel_variance128x128_c
+unsigned int aom_masked_sub_pixel_variance128x128_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x128 aom_masked_sub_pixel_variance128x128_neon
unsigned int aom_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance128x64 aom_masked_sub_pixel_variance128x64_c
+unsigned int aom_masked_sub_pixel_variance128x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x64 aom_masked_sub_pixel_variance128x64_neon
unsigned int aom_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x16 aom_masked_sub_pixel_variance16x16_c
+unsigned int aom_masked_sub_pixel_variance16x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x16 aom_masked_sub_pixel_variance16x16_neon
unsigned int aom_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x32 aom_masked_sub_pixel_variance16x32_c
+unsigned int aom_masked_sub_pixel_variance16x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x32 aom_masked_sub_pixel_variance16x32_neon
unsigned int aom_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x4 aom_masked_sub_pixel_variance16x4_c
+unsigned int aom_masked_sub_pixel_variance16x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x4 aom_masked_sub_pixel_variance16x4_neon
unsigned int aom_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x64 aom_masked_sub_pixel_variance16x64_c
+unsigned int aom_masked_sub_pixel_variance16x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x64 aom_masked_sub_pixel_variance16x64_neon
unsigned int aom_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x8 aom_masked_sub_pixel_variance16x8_c
+unsigned int aom_masked_sub_pixel_variance16x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x8 aom_masked_sub_pixel_variance16x8_neon
unsigned int aom_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x16 aom_masked_sub_pixel_variance32x16_c
+unsigned int aom_masked_sub_pixel_variance32x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x16 aom_masked_sub_pixel_variance32x16_neon
unsigned int aom_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x32 aom_masked_sub_pixel_variance32x32_c
+unsigned int aom_masked_sub_pixel_variance32x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x32 aom_masked_sub_pixel_variance32x32_neon
unsigned int aom_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x64 aom_masked_sub_pixel_variance32x64_c
+unsigned int aom_masked_sub_pixel_variance32x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x64 aom_masked_sub_pixel_variance32x64_neon
unsigned int aom_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x8 aom_masked_sub_pixel_variance32x8_c
+unsigned int aom_masked_sub_pixel_variance32x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x8 aom_masked_sub_pixel_variance32x8_neon
unsigned int aom_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x16 aom_masked_sub_pixel_variance4x16_c
+unsigned int aom_masked_sub_pixel_variance4x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x16 aom_masked_sub_pixel_variance4x16_neon
unsigned int aom_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x4 aom_masked_sub_pixel_variance4x4_c
+unsigned int aom_masked_sub_pixel_variance4x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x4 aom_masked_sub_pixel_variance4x4_neon
unsigned int aom_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x8 aom_masked_sub_pixel_variance4x8_c
+unsigned int aom_masked_sub_pixel_variance4x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x8 aom_masked_sub_pixel_variance4x8_neon
unsigned int aom_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x128 aom_masked_sub_pixel_variance64x128_c
+unsigned int aom_masked_sub_pixel_variance64x128_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x128 aom_masked_sub_pixel_variance64x128_neon
unsigned int aom_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x16 aom_masked_sub_pixel_variance64x16_c
+unsigned int aom_masked_sub_pixel_variance64x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x16 aom_masked_sub_pixel_variance64x16_neon
unsigned int aom_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x32 aom_masked_sub_pixel_variance64x32_c
+unsigned int aom_masked_sub_pixel_variance64x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x32 aom_masked_sub_pixel_variance64x32_neon
unsigned int aom_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x64 aom_masked_sub_pixel_variance64x64_c
+unsigned int aom_masked_sub_pixel_variance64x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x64 aom_masked_sub_pixel_variance64x64_neon
unsigned int aom_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x16 aom_masked_sub_pixel_variance8x16_c
+unsigned int aom_masked_sub_pixel_variance8x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x16 aom_masked_sub_pixel_variance8x16_neon
unsigned int aom_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x32 aom_masked_sub_pixel_variance8x32_c
+unsigned int aom_masked_sub_pixel_variance8x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x32 aom_masked_sub_pixel_variance8x32_neon
unsigned int aom_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x4 aom_masked_sub_pixel_variance8x4_c
+unsigned int aom_masked_sub_pixel_variance8x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x4 aom_masked_sub_pixel_variance8x4_neon
unsigned int aom_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x8 aom_masked_sub_pixel_variance8x8_c
+unsigned int aom_masked_sub_pixel_variance8x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x8 aom_masked_sub_pixel_variance8x8_neon
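/* The masked sub-pixel variance kernels above bilinearly filter ref to the
 * (xoffset, yoffset) sub-pel position, blend with second_pred under the
 * mask as sketched earlier, and then apply the standard variance reduction
 * below (hypothetical helper, not the libaom source): */
static unsigned int sketch_variance_wxh(const uint8_t *a, int a_stride,
                                        const uint8_t *b, int b_stride,
                                        int w, int h, unsigned int *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int y = 0; y < h; y++)
    for (int x = 0; x < w; x++) {
      const int d = a[y * a_stride + x] - b[y * b_stride + x];
      sum += d;
      sq += (uint64_t)(d * d);
    }
  *sse = (unsigned int)sq;
  /* variance = sum of squares minus the squared mean term */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}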
void aom_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
-#define aom_minmax_8x8 aom_minmax_8x8_c
+void aom_minmax_8x8_neon(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
+#define aom_minmax_8x8 aom_minmax_8x8_neon
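/* aom_minmax_8x8 scans an 8x8 block pair and reports the smallest and
 * largest absolute sample difference; a minimal sketch of that reduction
 * (hypothetical helper; assumes <stdlib.h> for abs): */
static void sketch_minmax_8x8(const uint8_t *s, int p, const uint8_t *d,
                              int dp, int *min, int *max) {
  *min = 255;
  *max = 0;
  for (int y = 0; y < 8; y++)
    for (int x = 0; x < 8; x++) {
      const int diff = abs(s[y * p + x] - d[y * dp + x]);
      if (diff < *min) *min = diff;
      if (diff > *max) *max = diff;
    }
}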
unsigned int aom_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
unsigned int aom_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
@@ -3837,202 +4191,268 @@
#define aom_mse_wxh_16bit_highbd aom_mse_wxh_16bit_highbd_c
unsigned int aom_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad128x128 aom_obmc_sad128x128_c
+unsigned int aom_obmc_sad128x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x128 aom_obmc_sad128x128_neon
unsigned int aom_obmc_sad128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad128x64 aom_obmc_sad128x64_c
+unsigned int aom_obmc_sad128x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x64 aom_obmc_sad128x64_neon
unsigned int aom_obmc_sad16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x16 aom_obmc_sad16x16_c
+unsigned int aom_obmc_sad16x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x16 aom_obmc_sad16x16_neon
unsigned int aom_obmc_sad16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x32 aom_obmc_sad16x32_c
+unsigned int aom_obmc_sad16x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x32 aom_obmc_sad16x32_neon
unsigned int aom_obmc_sad16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x4 aom_obmc_sad16x4_c
+unsigned int aom_obmc_sad16x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x4 aom_obmc_sad16x4_neon
unsigned int aom_obmc_sad16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x64 aom_obmc_sad16x64_c
+unsigned int aom_obmc_sad16x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x64 aom_obmc_sad16x64_neon
unsigned int aom_obmc_sad16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x8 aom_obmc_sad16x8_c
+unsigned int aom_obmc_sad16x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x8 aom_obmc_sad16x8_neon
unsigned int aom_obmc_sad32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x16 aom_obmc_sad32x16_c
+unsigned int aom_obmc_sad32x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x16 aom_obmc_sad32x16_neon
unsigned int aom_obmc_sad32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x32 aom_obmc_sad32x32_c
+unsigned int aom_obmc_sad32x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x32 aom_obmc_sad32x32_neon
unsigned int aom_obmc_sad32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x64 aom_obmc_sad32x64_c
+unsigned int aom_obmc_sad32x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x64 aom_obmc_sad32x64_neon
unsigned int aom_obmc_sad32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x8 aom_obmc_sad32x8_c
+unsigned int aom_obmc_sad32x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x8 aom_obmc_sad32x8_neon
unsigned int aom_obmc_sad4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x16 aom_obmc_sad4x16_c
+unsigned int aom_obmc_sad4x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x16 aom_obmc_sad4x16_neon
unsigned int aom_obmc_sad4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x4 aom_obmc_sad4x4_c
+unsigned int aom_obmc_sad4x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x4 aom_obmc_sad4x4_neon
unsigned int aom_obmc_sad4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x8 aom_obmc_sad4x8_c
+unsigned int aom_obmc_sad4x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x8 aom_obmc_sad4x8_neon
unsigned int aom_obmc_sad64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x128 aom_obmc_sad64x128_c
+unsigned int aom_obmc_sad64x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x128 aom_obmc_sad64x128_neon
unsigned int aom_obmc_sad64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x16 aom_obmc_sad64x16_c
+unsigned int aom_obmc_sad64x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x16 aom_obmc_sad64x16_neon
unsigned int aom_obmc_sad64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x32 aom_obmc_sad64x32_c
+unsigned int aom_obmc_sad64x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x32 aom_obmc_sad64x32_neon
unsigned int aom_obmc_sad64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x64 aom_obmc_sad64x64_c
+unsigned int aom_obmc_sad64x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x64 aom_obmc_sad64x64_neon
unsigned int aom_obmc_sad8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x16 aom_obmc_sad8x16_c
+unsigned int aom_obmc_sad8x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x16 aom_obmc_sad8x16_neon
unsigned int aom_obmc_sad8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x32 aom_obmc_sad8x32_c
+unsigned int aom_obmc_sad8x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x32 aom_obmc_sad8x32_neon
unsigned int aom_obmc_sad8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x4 aom_obmc_sad8x4_c
+unsigned int aom_obmc_sad8x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x4 aom_obmc_sad8x4_neon
unsigned int aom_obmc_sad8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x8 aom_obmc_sad8x8_c
+unsigned int aom_obmc_sad8x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x8 aom_obmc_sad8x8_neon
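The aom_obmc_sad* kernels gaining NEON versions above compare a weighted source against a masked predictor rather than two plain pixel buffers. The following is a from-memory paraphrase of the C reference behavior, assuming the usual 12-bit mask precision; it is a sketch, not a verbatim copy of libaom's code:

#include <stdint.h>
#include <stdlib.h> /* abs() */

/* Round-shift by n with rounding to nearest (libaom-style macro). */
#define RPOT(v, n) (((v) + (1 << ((n) - 1))) >> (n))

static unsigned int obmc_sad_sketch(const uint8_t *pre, int pre_stride,
                                    const int32_t *wsrc, const int32_t *mask,
                                    int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++)
      /* wsrc and mask are pre-scaled; shift back by the mask precision. */
      sad += RPOT(abs(wsrc[x] - pre[x] * mask[x]), 12);
    pre += pre_stride;
    wsrc += w; /* wsrc/mask are densely packed at block width */
    mask += w;
  }
  return sad;
}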
unsigned int aom_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance128x128 aom_obmc_sub_pixel_variance128x128_c
+unsigned int aom_obmc_sub_pixel_variance128x128_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x128 aom_obmc_sub_pixel_variance128x128_neon
unsigned int aom_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance128x64 aom_obmc_sub_pixel_variance128x64_c
+unsigned int aom_obmc_sub_pixel_variance128x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x64 aom_obmc_sub_pixel_variance128x64_neon
unsigned int aom_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x16 aom_obmc_sub_pixel_variance16x16_c
+unsigned int aom_obmc_sub_pixel_variance16x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x16 aom_obmc_sub_pixel_variance16x16_neon
unsigned int aom_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x32 aom_obmc_sub_pixel_variance16x32_c
+unsigned int aom_obmc_sub_pixel_variance16x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x32 aom_obmc_sub_pixel_variance16x32_neon
unsigned int aom_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x4 aom_obmc_sub_pixel_variance16x4_c
+unsigned int aom_obmc_sub_pixel_variance16x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x4 aom_obmc_sub_pixel_variance16x4_neon
unsigned int aom_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x64 aom_obmc_sub_pixel_variance16x64_c
+unsigned int aom_obmc_sub_pixel_variance16x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x64 aom_obmc_sub_pixel_variance16x64_neon
unsigned int aom_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x8 aom_obmc_sub_pixel_variance16x8_c
+unsigned int aom_obmc_sub_pixel_variance16x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x8 aom_obmc_sub_pixel_variance16x8_neon
unsigned int aom_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x16 aom_obmc_sub_pixel_variance32x16_c
+unsigned int aom_obmc_sub_pixel_variance32x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x16 aom_obmc_sub_pixel_variance32x16_neon
unsigned int aom_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x32 aom_obmc_sub_pixel_variance32x32_c
+unsigned int aom_obmc_sub_pixel_variance32x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x32 aom_obmc_sub_pixel_variance32x32_neon
unsigned int aom_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x64 aom_obmc_sub_pixel_variance32x64_c
+unsigned int aom_obmc_sub_pixel_variance32x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x64 aom_obmc_sub_pixel_variance32x64_neon
unsigned int aom_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x8 aom_obmc_sub_pixel_variance32x8_c
+unsigned int aom_obmc_sub_pixel_variance32x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x8 aom_obmc_sub_pixel_variance32x8_neon
unsigned int aom_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x16 aom_obmc_sub_pixel_variance4x16_c
+unsigned int aom_obmc_sub_pixel_variance4x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x16 aom_obmc_sub_pixel_variance4x16_neon
unsigned int aom_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x4 aom_obmc_sub_pixel_variance4x4_c
+unsigned int aom_obmc_sub_pixel_variance4x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x4 aom_obmc_sub_pixel_variance4x4_neon
unsigned int aom_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x8 aom_obmc_sub_pixel_variance4x8_c
+unsigned int aom_obmc_sub_pixel_variance4x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x8 aom_obmc_sub_pixel_variance4x8_neon
unsigned int aom_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x128 aom_obmc_sub_pixel_variance64x128_c
+unsigned int aom_obmc_sub_pixel_variance64x128_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x128 aom_obmc_sub_pixel_variance64x128_neon
unsigned int aom_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x16 aom_obmc_sub_pixel_variance64x16_c
+unsigned int aom_obmc_sub_pixel_variance64x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x16 aom_obmc_sub_pixel_variance64x16_neon
unsigned int aom_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x32 aom_obmc_sub_pixel_variance64x32_c
+unsigned int aom_obmc_sub_pixel_variance64x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x32 aom_obmc_sub_pixel_variance64x32_neon
unsigned int aom_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x64 aom_obmc_sub_pixel_variance64x64_c
+unsigned int aom_obmc_sub_pixel_variance64x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x64 aom_obmc_sub_pixel_variance64x64_neon
unsigned int aom_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x16 aom_obmc_sub_pixel_variance8x16_c
+unsigned int aom_obmc_sub_pixel_variance8x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x16 aom_obmc_sub_pixel_variance8x16_neon
unsigned int aom_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x32 aom_obmc_sub_pixel_variance8x32_c
+unsigned int aom_obmc_sub_pixel_variance8x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x32 aom_obmc_sub_pixel_variance8x32_neon
unsigned int aom_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x4 aom_obmc_sub_pixel_variance8x4_c
+unsigned int aom_obmc_sub_pixel_variance8x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x4 aom_obmc_sub_pixel_variance8x4_neon
unsigned int aom_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x8 aom_obmc_sub_pixel_variance8x8_c
+unsigned int aom_obmc_sub_pixel_variance8x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x8 aom_obmc_sub_pixel_variance8x8_neon
unsigned int aom_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance128x128 aom_obmc_variance128x128_c
+unsigned int aom_obmc_variance128x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x128 aom_obmc_variance128x128_neon
unsigned int aom_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance128x64 aom_obmc_variance128x64_c
+unsigned int aom_obmc_variance128x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x64 aom_obmc_variance128x64_neon
unsigned int aom_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x16 aom_obmc_variance16x16_c
+unsigned int aom_obmc_variance16x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x16 aom_obmc_variance16x16_neon
unsigned int aom_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x32 aom_obmc_variance16x32_c
+unsigned int aom_obmc_variance16x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x32 aom_obmc_variance16x32_neon
unsigned int aom_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x4 aom_obmc_variance16x4_c
+unsigned int aom_obmc_variance16x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x4 aom_obmc_variance16x4_neon
unsigned int aom_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x64 aom_obmc_variance16x64_c
+unsigned int aom_obmc_variance16x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x64 aom_obmc_variance16x64_neon
unsigned int aom_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x8 aom_obmc_variance16x8_c
+unsigned int aom_obmc_variance16x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x8 aom_obmc_variance16x8_neon
unsigned int aom_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x16 aom_obmc_variance32x16_c
+unsigned int aom_obmc_variance32x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x16 aom_obmc_variance32x16_neon
unsigned int aom_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x32 aom_obmc_variance32x32_c
+unsigned int aom_obmc_variance32x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x32 aom_obmc_variance32x32_neon
unsigned int aom_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x64 aom_obmc_variance32x64_c
+unsigned int aom_obmc_variance32x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x64 aom_obmc_variance32x64_neon
unsigned int aom_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x8 aom_obmc_variance32x8_c
+unsigned int aom_obmc_variance32x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x8 aom_obmc_variance32x8_neon
unsigned int aom_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x16 aom_obmc_variance4x16_c
+unsigned int aom_obmc_variance4x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x16 aom_obmc_variance4x16_neon
unsigned int aom_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x4 aom_obmc_variance4x4_c
+unsigned int aom_obmc_variance4x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x4 aom_obmc_variance4x4_neon
unsigned int aom_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x8 aom_obmc_variance4x8_c
+unsigned int aom_obmc_variance4x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x8 aom_obmc_variance4x8_neon
unsigned int aom_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x128 aom_obmc_variance64x128_c
+unsigned int aom_obmc_variance64x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x128 aom_obmc_variance64x128_neon
unsigned int aom_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x16 aom_obmc_variance64x16_c
+unsigned int aom_obmc_variance64x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x16 aom_obmc_variance64x16_neon
unsigned int aom_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x32 aom_obmc_variance64x32_c
+unsigned int aom_obmc_variance64x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x32 aom_obmc_variance64x32_neon
unsigned int aom_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x64 aom_obmc_variance64x64_c
+unsigned int aom_obmc_variance64x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x64 aom_obmc_variance64x64_neon
unsigned int aom_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x16 aom_obmc_variance8x16_c
+unsigned int aom_obmc_variance8x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x16 aom_obmc_variance8x16_neon
unsigned int aom_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x32 aom_obmc_variance8x32_c
+unsigned int aom_obmc_variance8x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x32 aom_obmc_variance8x32_neon
unsigned int aom_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x4 aom_obmc_variance8x4_c
+unsigned int aom_obmc_variance8x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x4 aom_obmc_variance8x4_neon
unsigned int aom_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x8 aom_obmc_variance8x8_c
+unsigned int aom_obmc_variance8x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x8 aom_obmc_variance8x8_neon
void aom_paeth_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_paeth_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -4110,9 +4530,6 @@
void aom_paeth_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_neon
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-#define aom_pixel_scale aom_pixel_scale_c
-
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
void aom_quantize_b_neon(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
#define aom_quantize_b aom_quantize_b_neon
@@ -4143,15 +4560,13 @@
#define aom_sad128x128_avg aom_sad128x128_avg_neon
void aom_sad128x128x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad128x128x3d aom_sad128x128x3d_c
+void aom_sad128x128x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad128x128x3d aom_sad128x128x3d_neon
void aom_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x128x4d aom_sad128x128x4d_neon
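The x4d kernels being switched to NEON here evaluate one source block against four candidate references in a single call, which is cheaper than four independent SAD calls during motion search; as I understand it, the x3d variants keep the same four-entry signature but evaluate three candidates. A hedged sketch of the x4d semantics, with illustrative names only:

#include <stdint.h>
#include <stdlib.h> /* abs() */

static void sad_x4d_sketch(const uint8_t *src, int src_stride,
                           const uint8_t *const ref[4], int ref_stride,
                           int w, int h, uint32_t out[4]) {
  for (int k = 0; k < 4; k++) { /* one SAD per candidate reference */
    uint32_t sad = 0;
    const uint8_t *s = src;
    const uint8_t *r = ref[k];
    for (int y = 0; y < h; y++) {
      for (int x = 0; x < w; x++) sad += abs(s[x] - r[x]);
      s += src_stride;
      r += ref_stride;
    }
    out[k] = sad;
  }
}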
-void aom_sad128x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x128x4d_avg aom_sad128x128x4d_avg_c
-
unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad128x64 aom_sad128x64_neon
@@ -4161,18 +4576,13 @@
#define aom_sad128x64_avg aom_sad128x64_avg_neon
void aom_sad128x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad128x64x3d aom_sad128x64x3d_c
+void aom_sad128x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad128x64x3d aom_sad128x64x3d_neon
void aom_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x64x4d aom_sad128x64x4d_neon
-void aom_sad128x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x64x4d_avg aom_sad128x64x4d_avg_c
-
-unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad128xh aom_sad128xh_c
-
unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x16 aom_sad16x16_neon
@@ -4182,15 +4592,13 @@
#define aom_sad16x16_avg aom_sad16x16_avg_neon
void aom_sad16x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x16x3d aom_sad16x16x3d_c
+void aom_sad16x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x16x3d aom_sad16x16x3d_neon
void aom_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x16x4d aom_sad16x16x4d_neon
-void aom_sad16x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x16x4d_avg aom_sad16x16x4d_avg_c
-
unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x32 aom_sad16x32_neon
@@ -4200,15 +4608,13 @@
#define aom_sad16x32_avg aom_sad16x32_avg_neon
void aom_sad16x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x32x3d aom_sad16x32x3d_c
+void aom_sad16x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x32x3d aom_sad16x32x3d_neon
void aom_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x32x4d aom_sad16x32x4d_neon
-void aom_sad16x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x32x4d_avg aom_sad16x32x4d_avg_c
-
unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x4 aom_sad16x4_neon
@@ -4218,15 +4624,13 @@
#define aom_sad16x4_avg aom_sad16x4_avg_neon
void aom_sad16x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x4x3d aom_sad16x4x3d_c
+void aom_sad16x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x4x3d aom_sad16x4x3d_neon
void aom_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x4x4d aom_sad16x4x4d_neon
-void aom_sad16x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x4x4d_avg aom_sad16x4x4d_avg_c
-
unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x64 aom_sad16x64_neon
@@ -4236,15 +4640,13 @@
#define aom_sad16x64_avg aom_sad16x64_avg_neon
void aom_sad16x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x64x3d aom_sad16x64x3d_c
+void aom_sad16x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x64x3d aom_sad16x64x3d_neon
void aom_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x64x4d aom_sad16x64x4d_neon
-void aom_sad16x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x64x4d_avg aom_sad16x64x4d_avg_c
-
unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x8 aom_sad16x8_neon
@@ -4254,18 +4656,13 @@
#define aom_sad16x8_avg aom_sad16x8_avg_neon
void aom_sad16x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x8x3d aom_sad16x8x3d_c
+void aom_sad16x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x8x3d aom_sad16x8x3d_neon
void aom_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x8x4d aom_sad16x8x4d_neon
-void aom_sad16x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x8x4d_avg aom_sad16x8x4d_avg_c
-
-unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad16xh aom_sad16xh_c
-
unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x16 aom_sad32x16_neon
@@ -4275,15 +4672,13 @@
#define aom_sad32x16_avg aom_sad32x16_avg_neon
void aom_sad32x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x16x3d aom_sad32x16x3d_c
+void aom_sad32x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x16x3d aom_sad32x16x3d_neon
void aom_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x16x4d aom_sad32x16x4d_neon
-void aom_sad32x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x16x4d_avg aom_sad32x16x4d_avg_c
-
unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x32 aom_sad32x32_neon
@@ -4293,15 +4688,13 @@
#define aom_sad32x32_avg aom_sad32x32_avg_neon
void aom_sad32x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x32x3d aom_sad32x32x3d_c
+void aom_sad32x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x32x3d aom_sad32x32x3d_neon
void aom_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x32x4d aom_sad32x32x4d_neon
-void aom_sad32x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x32x4d_avg aom_sad32x32x4d_avg_c
-
unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x64 aom_sad32x64_neon
@@ -4311,15 +4704,13 @@
#define aom_sad32x64_avg aom_sad32x64_avg_neon
void aom_sad32x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x64x3d aom_sad32x64x3d_c
+void aom_sad32x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x64x3d aom_sad32x64x3d_neon
void aom_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x64x4d aom_sad32x64x4d_neon
-void aom_sad32x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x64x4d_avg aom_sad32x64x4d_avg_c
-
unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x8 aom_sad32x8_neon
@@ -4329,18 +4720,13 @@
#define aom_sad32x8_avg aom_sad32x8_avg_neon
void aom_sad32x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x8x3d aom_sad32x8x3d_c
+void aom_sad32x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x8x3d aom_sad32x8x3d_neon
void aom_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x8x4d aom_sad32x8x4d_neon
-void aom_sad32x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x8x4d_avg aom_sad32x8x4d_avg_c
-
-unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad32xh aom_sad32xh_c
-
unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x16 aom_sad4x16_neon
@@ -4350,15 +4736,13 @@
#define aom_sad4x16_avg aom_sad4x16_avg_neon
void aom_sad4x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x16x3d aom_sad4x16x3d_c
+void aom_sad4x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x16x3d aom_sad4x16x3d_neon
void aom_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x16x4d aom_sad4x16x4d_neon
-void aom_sad4x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x16x4d_avg aom_sad4x16x4d_avg_c
-
unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x4 aom_sad4x4_neon
@@ -4368,15 +4752,13 @@
#define aom_sad4x4_avg aom_sad4x4_avg_neon
void aom_sad4x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x4x3d aom_sad4x4x3d_c
+void aom_sad4x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x4x3d aom_sad4x4x3d_neon
void aom_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x4x4d aom_sad4x4x4d_neon
-void aom_sad4x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x4x4d_avg aom_sad4x4x4d_avg_c
-
unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x8 aom_sad4x8_neon
@@ -4386,18 +4768,13 @@
#define aom_sad4x8_avg aom_sad4x8_avg_neon
void aom_sad4x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x8x3d aom_sad4x8x3d_c
+void aom_sad4x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x8x3d aom_sad4x8x3d_neon
void aom_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x8x4d aom_sad4x8x4d_neon
-void aom_sad4x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x8x4d_avg aom_sad4x8x4d_avg_c
-
-unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad4xh aom_sad4xh_c
-
unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x128 aom_sad64x128_neon
@@ -4407,15 +4784,13 @@
#define aom_sad64x128_avg aom_sad64x128_avg_neon
void aom_sad64x128x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x128x3d aom_sad64x128x3d_c
+void aom_sad64x128x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x128x3d aom_sad64x128x3d_neon
void aom_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x128x4d aom_sad64x128x4d_neon
-void aom_sad64x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x128x4d_avg aom_sad64x128x4d_avg_c
-
unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x16 aom_sad64x16_neon
@@ -4425,15 +4800,13 @@
#define aom_sad64x16_avg aom_sad64x16_avg_neon
void aom_sad64x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x16x3d aom_sad64x16x3d_c
+void aom_sad64x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x16x3d aom_sad64x16x3d_neon
void aom_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x16x4d aom_sad64x16x4d_neon
-void aom_sad64x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x16x4d_avg aom_sad64x16x4d_avg_c
-
unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x32 aom_sad64x32_neon
@@ -4443,15 +4816,13 @@
#define aom_sad64x32_avg aom_sad64x32_avg_neon
void aom_sad64x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x32x3d aom_sad64x32x3d_c
+void aom_sad64x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x32x3d aom_sad64x32x3d_neon
void aom_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x32x4d aom_sad64x32x4d_neon
-void aom_sad64x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x32x4d_avg aom_sad64x32x4d_avg_c
-
unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x64 aom_sad64x64_neon
@@ -4461,18 +4832,13 @@
#define aom_sad64x64_avg aom_sad64x64_avg_neon
void aom_sad64x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x64x3d aom_sad64x64x3d_c
+void aom_sad64x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x64x3d aom_sad64x64x3d_neon
void aom_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x64x4d aom_sad64x64x4d_neon
-void aom_sad64x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x64x4d_avg aom_sad64x64x4d_avg_c
-
-unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad64xh aom_sad64xh_c
-
unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x16 aom_sad8x16_neon
@@ -4482,15 +4848,13 @@
#define aom_sad8x16_avg aom_sad8x16_avg_neon
void aom_sad8x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x16x3d aom_sad8x16x3d_c
+void aom_sad8x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x16x3d aom_sad8x16x3d_neon
void aom_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x16x4d aom_sad8x16x4d_neon
-void aom_sad8x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x16x4d_avg aom_sad8x16x4d_avg_c
-
unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x32 aom_sad8x32_neon
@@ -4500,15 +4864,13 @@
#define aom_sad8x32_avg aom_sad8x32_avg_neon
void aom_sad8x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x32x3d aom_sad8x32x3d_c
+void aom_sad8x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x32x3d aom_sad8x32x3d_neon
void aom_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x32x4d aom_sad8x32x4d_neon
-void aom_sad8x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x32x4d_avg aom_sad8x32x4d_avg_c
-
unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x4 aom_sad8x4_neon
@@ -4518,15 +4880,13 @@
#define aom_sad8x4_avg aom_sad8x4_avg_neon
void aom_sad8x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x4x3d aom_sad8x4x3d_c
+void aom_sad8x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x4x3d aom_sad8x4x3d_neon
void aom_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x4x4d aom_sad8x4x4d_neon
-void aom_sad8x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x4x4d_avg aom_sad8x4x4d_avg_c
-
unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x8 aom_sad8x8_neon
@@ -4536,18 +4896,13 @@
#define aom_sad8x8_avg aom_sad8x8_avg_neon
void aom_sad8x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x8x3d aom_sad8x8x3d_c
+void aom_sad8x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x8x3d aom_sad8x8x3d_neon
void aom_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x8x4d aom_sad8x8x4d_neon
-void aom_sad8x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x8x4d_avg aom_sad8x8x4d_avg_c
-
-unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad8xh aom_sad8xh_c
-
unsigned int aom_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad_skip_128x128 aom_sad_skip_128x128_neon
@@ -4581,10 +4936,12 @@
#define aom_sad_skip_16x32x4d aom_sad_skip_16x32x4d_neon
unsigned int aom_sad_skip_16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_16x4 aom_sad_skip_16x4_c
+unsigned int aom_sad_skip_16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_16x4 aom_sad_skip_16x4_neon
void aom_sad_skip_16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_16x4x4d aom_sad_skip_16x4x4d_c
+void aom_sad_skip_16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_16x4x4d aom_sad_skip_16x4x4d_neon
unsigned int aom_sad_skip_16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
@@ -4643,10 +5000,12 @@
#define aom_sad_skip_4x16x4d aom_sad_skip_4x16x4d_neon
unsigned int aom_sad_skip_4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_4x4 aom_sad_skip_4x4_c
+unsigned int aom_sad_skip_4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_4x4 aom_sad_skip_4x4_neon
void aom_sad_skip_4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_4x4x4d aom_sad_skip_4x4x4d_c
+void aom_sad_skip_4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_4x4x4d aom_sad_skip_4x4x4d_neon
unsigned int aom_sad_skip_4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
@@ -4705,10 +5064,12 @@
#define aom_sad_skip_8x32x4d aom_sad_skip_8x32x4d_neon
unsigned int aom_sad_skip_8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_8x4 aom_sad_skip_8x4_c
+unsigned int aom_sad_skip_8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_8x4 aom_sad_skip_8x4_neon
void aom_sad_skip_8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_8x4x4d aom_sad_skip_8x4x4d_c
+void aom_sad_skip_8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_8x4x4d aom_sad_skip_8x4x4d_neon
unsigned int aom_sad_skip_8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
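The aom_sad_skip_* family filled in above trades accuracy for speed during coarse motion search: the SAD is computed on every other row and the result is doubled to approximate the full-height value. A minimal sketch of that idea, assuming this even-row/double-result convention; names are illustrative:

#include <stdint.h>
#include <stdlib.h> /* abs() */

static unsigned int sad_skip_sketch(const uint8_t *src, int src_stride,
                                    const uint8_t *ref, int ref_stride,
                                    int w, int h) {
  unsigned int sad = 0;
  for (int y = 0; y < h; y += 2) { /* visit even rows only */
    for (int x = 0; x < w; x++) sad += abs(src[x] - ref[x]);
    src += 2 * src_stride;
    ref += 2 * ref_stride;
  }
  return 2 * sad; /* scale back toward the full-height SAD */
}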
@@ -5150,7 +5511,8 @@
#define aom_sum_squares_2d_i16 aom_sum_squares_2d_i16_neon
uint64_t aom_sum_squares_i16_c(const int16_t *src, uint32_t N);
-#define aom_sum_squares_i16 aom_sum_squares_i16_c
+uint64_t aom_sum_squares_i16_neon(const int16_t *src, uint32_t N);
+#define aom_sum_squares_i16 aom_sum_squares_i16_neon
uint64_t aom_sum_sse_2d_i16_c(const int16_t *src, int src_stride, int width, int height, int *sum);
uint64_t aom_sum_sse_2d_i16_neon(const int16_t *src, int src_stride, int width, int height, int *sum);
@@ -5161,67 +5523,84 @@
#define aom_v_predictor_16x16 aom_v_predictor_16x16_neon
void aom_v_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x32 aom_v_predictor_16x32_c
+void aom_v_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x32 aom_v_predictor_16x32_neon
void aom_v_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x4 aom_v_predictor_16x4_c
+void aom_v_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x4 aom_v_predictor_16x4_neon
void aom_v_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x64 aom_v_predictor_16x64_c
+void aom_v_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x64 aom_v_predictor_16x64_neon
void aom_v_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x8 aom_v_predictor_16x8_c
+void aom_v_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x8 aom_v_predictor_16x8_neon
void aom_v_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x16 aom_v_predictor_32x16_c
+void aom_v_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x16 aom_v_predictor_32x16_neon
void aom_v_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_32x32 aom_v_predictor_32x32_neon
void aom_v_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x64 aom_v_predictor_32x64_c
+void aom_v_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x64 aom_v_predictor_32x64_neon
void aom_v_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x8 aom_v_predictor_32x8_c
+void aom_v_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x8 aom_v_predictor_32x8_neon
void aom_v_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_4x16 aom_v_predictor_4x16_c
+void aom_v_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x16 aom_v_predictor_4x16_neon
void aom_v_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_4x4 aom_v_predictor_4x4_neon
void aom_v_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_4x8 aom_v_predictor_4x8_c
+void aom_v_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x8 aom_v_predictor_4x8_neon
void aom_v_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x16 aom_v_predictor_64x16_c
+void aom_v_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x16 aom_v_predictor_64x16_neon
void aom_v_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x32 aom_v_predictor_64x32_c
+void aom_v_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x32 aom_v_predictor_64x32_neon
void aom_v_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x64 aom_v_predictor_64x64_c
+void aom_v_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x64 aom_v_predictor_64x64_neon
void aom_v_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x16 aom_v_predictor_8x16_c
+void aom_v_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x16 aom_v_predictor_8x16_neon
void aom_v_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x32 aom_v_predictor_8x32_c
+void aom_v_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x32 aom_v_predictor_8x32_neon
void aom_v_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x4 aom_v_predictor_8x4_c
+void aom_v_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x4 aom_v_predictor_8x4_neon
void aom_v_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_8x8 aom_v_predictor_8x8_neon
uint64_t aom_var_2d_u16_c(uint8_t *src, int src_stride, int width, int height);
-#define aom_var_2d_u16 aom_var_2d_u16_c
+uint64_t aom_var_2d_u16_neon(uint8_t *src, int src_stride, int width, int height);
+#define aom_var_2d_u16 aom_var_2d_u16_neon
uint64_t aom_var_2d_u8_c(uint8_t *src, int src_stride, int width, int height);
-#define aom_var_2d_u8 aom_var_2d_u8_c
+uint64_t aom_var_2d_u8_neon(uint8_t *src, int src_stride, int width, int height);
+#define aom_var_2d_u8 aom_var_2d_u8_neon
unsigned int aom_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
unsigned int aom_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -5324,7 +5703,7 @@
int aom_vector_var_neon(const int16_t *ref, const int16_t *src, int bwl);
#define aom_vector_var aom_vector_var_neon
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2);
#define av1_compute_cross_correlation av1_compute_cross_correlation_c
void aom_dsp_rtcd(void);
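
Every hunk above follows the same mechanical pattern: a dispatch macro that previously pointed at the C reference implementation gains a matching _neon prototype and is re-pointed at it. Because these generated configs set CONFIG_RUNTIME_CPU_DETECT to 0 (see the aom_config.h hunk further down), the binding is resolved entirely at compile time rather than through a function-pointer table. A minimal, self-contained sketch of the pattern, assuming hypothetical names rather than the real libaom API:

    /* sketch.c: illustrative only; the my_sad_* names are hypothetical. */
    #include <stdint.h>
    #include <stdio.h>

    static unsigned int my_sad_c(const uint8_t *src, const uint8_t *ref,
                                 int n) {
      unsigned int sad = 0;
      for (int i = 0; i < n; i++)
        sad += (unsigned int)(src[i] > ref[i] ? src[i] - ref[i]
                                              : ref[i] - src[i]);
      return sad;
    }

    /* Stand-in for a NEON-intrinsics implementation; real code would use
     * uint8x16_t vectors instead of delegating to the scalar version. */
    static unsigned int my_sad_neon(const uint8_t *src, const uint8_t *ref,
                                    int n) {
      return my_sad_c(src, ref, n);
    }

    /* With CONFIG_RUNTIME_CPU_DETECT == 0 the dispatcher is a plain macro,
     * exactly like "#define aom_sad_skip_16x4 aom_sad_skip_16x4_neon". */
    #define my_sad my_sad_neon

    int main(void) {
      const uint8_t a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1};
      printf("%u\n", my_sad(a, b, 4));
      return 0;
    }

With runtime detection compiled out there is nothing to patch at startup, so the macro indirection has zero run-time cost; callers compile directly against the NEON symbol.
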
diff --git a/config/arm/config/aom_scale_rtcd.h b/config/arm/config/aom_scale_rtcd.h
index df4b96f..d296957 100644
--- a/config/arm/config/aom_scale_rtcd.h
+++ b/config/arm/config/aom_scale_rtcd.h
@@ -80,7 +80,7 @@
void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
-int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes);
+int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes);
#define aom_yv12_realloc_with_new_border aom_yv12_realloc_with_new_border_c
void aom_scale_rtcd(void);
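
The one substantive change to aom_scale_rtcd.h is the widened prototype: aom_yv12_realloc_with_new_border now takes num_pyramid_levels between byte_alignment and num_planes, so any out-of-tree caller must pass the extra count. A hedged sketch of the call-site change; the stub exists only so the snippet compiles standalone and mirrors nothing but the argument order:

    #include <stdio.h>

    struct yv12_buffer_config;  /* opaque for this sketch */

    static int realloc_with_new_border_stub(struct yv12_buffer_config *ybf,
                                            int new_border, int byte_alignment,
                                            int num_pyramid_levels,
                                            int num_planes) {
      (void)ybf;
      printf("border=%d align=%d pyramid_levels=%d planes=%d\n", new_border,
             byte_alignment, num_pyramid_levels, num_planes);
      return 0;
    }

    int main(void) {
      /* num_pyramid_levels now sits between byte_alignment and num_planes;
       * passing 0 when no pyramid storage is needed is an assumption. */
      return realloc_with_new_border_stub(NULL, 288, 32, /*pyramid=*/0, 3);
    }
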
diff --git a/config/arm/config/av1_rtcd.h b/config/arm/config/av1_rtcd.h
index 964bb72..1a3fa19 100644
--- a/config/arm/config/av1_rtcd.h
+++ b/config/arm/config/av1_rtcd.h
@@ -15,12 +15,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -80,14 +80,11 @@
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
int ref_stride, int subpel_search);
-#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_c
-
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_c
+void aom_comp_avg_upsampled_pred_neon(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+ const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
+ int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
+ int ref_stride, int subpel_search);
+#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_neon
void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
@@ -118,14 +115,17 @@
void aom_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search);
-#define aom_upsampled_pred aom_upsampled_pred_c
+void aom_upsampled_pred_neon(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+ const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
+ int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search);
+#define aom_upsampled_pred aom_upsampled_pred_neon
void av1_apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
void av1_apply_selfguided_restoration_neon(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
#define av1_apply_selfguided_restoration av1_apply_selfguided_restoration_neon
-void av1_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_apply_temporal_filter_neon(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_neon(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_apply_temporal_filter av1_apply_temporal_filter_neon
int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
@@ -150,10 +150,12 @@
#define av1_calc_frame_error av1_calc_frame_error_c
void av1_calc_indices_dim1_c(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
-#define av1_calc_indices_dim1 av1_calc_indices_dim1_c
+void av1_calc_indices_dim1_neon(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
+#define av1_calc_indices_dim1 av1_calc_indices_dim1_neon
void av1_calc_indices_dim2_c(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
-#define av1_calc_indices_dim2 av1_calc_indices_dim2_c
+void av1_calc_indices_dim2_neon(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
+#define av1_calc_indices_dim2 av1_calc_indices_dim2_neon
void av1_calc_proj_params_c( const uint8_t *src8, int width, int height, int src_stride, const uint8_t *dat8, int dat_stride, int32_t *flt0, int flt0_stride, int32_t *flt1, int flt1_stride, int64_t H[2][2], int64_t C[2], const sgr_params_type *params);
#define av1_calc_proj_params av1_calc_proj_params_c
@@ -179,7 +181,7 @@
bool av1_cnn_predict_c( const float **input, int in_width, int in_height, int in_stride, const CNN_CONFIG *cnn_config, const CNN_THREAD_DATA *thread_data, CNN_MULTI_OUT *output_struct);
#define av1_cnn_predict av1_cnn_predict_c
-void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
#define av1_compute_stats av1_compute_stats_c
void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
@@ -231,6 +233,9 @@
void av1_dr_prediction_z3_neon(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
#define av1_dr_prediction_z3 av1_dr_prediction_z3_neon
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height, int width, int stride, int edge_thresh);
+#define av1_estimate_noise_from_single_plane av1_estimate_noise_from_single_plane_c
+
void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
#define av1_filter_intra_edge av1_filter_intra_edge_c
@@ -332,7 +337,7 @@
void av1_get_nz_map_contexts_neon(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
#define av1_get_nz_map_contexts av1_get_nz_map_contexts_neon
-void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_c
int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
@@ -348,10 +353,12 @@
#define av1_highbd_convolve8_vert av1_highbd_convolve8_vert_c
void av1_highbd_convolve_2d_scale_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int x_step_qn, const int subpel_y_qn, const int y_step_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_2d_scale av1_highbd_convolve_2d_scale_c
+void av1_highbd_convolve_2d_scale_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int x_step_qn, const int subpel_y_qn, const int y_step_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_scale av1_highbd_convolve_2d_scale_neon
void av1_highbd_convolve_2d_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_2d_sr av1_highbd_convolve_2d_sr_c
+void av1_highbd_convolve_2d_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_sr av1_highbd_convolve_2d_sr_neon
void av1_highbd_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
#define av1_highbd_convolve_avg av1_highbd_convolve_avg_c
@@ -360,25 +367,32 @@
#define av1_highbd_convolve_copy av1_highbd_convolve_copy_c
void av1_highbd_convolve_horiz_rs_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd);
-#define av1_highbd_convolve_horiz_rs av1_highbd_convolve_horiz_rs_c
+void av1_highbd_convolve_horiz_rs_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd);
+#define av1_highbd_convolve_horiz_rs av1_highbd_convolve_horiz_rs_neon
void av1_highbd_convolve_x_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_x_sr av1_highbd_convolve_x_sr_c
+void av1_highbd_convolve_x_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_x_sr av1_highbd_convolve_x_sr_neon
void av1_highbd_convolve_y_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, int bd);
-#define av1_highbd_convolve_y_sr av1_highbd_convolve_y_sr_c
+void av1_highbd_convolve_y_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, int bd);
+#define av1_highbd_convolve_y_sr av1_highbd_convolve_y_sr_neon
void av1_highbd_dist_wtd_convolve_2d_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_2d av1_highbd_dist_wtd_convolve_2d_c
+void av1_highbd_dist_wtd_convolve_2d_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d av1_highbd_dist_wtd_convolve_2d_neon
void av1_highbd_dist_wtd_convolve_2d_copy_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_2d_copy av1_highbd_dist_wtd_convolve_2d_copy_c
+void av1_highbd_dist_wtd_convolve_2d_copy_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d_copy av1_highbd_dist_wtd_convolve_2d_copy_neon
void av1_highbd_dist_wtd_convolve_x_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_x av1_highbd_dist_wtd_convolve_x_c
+void av1_highbd_dist_wtd_convolve_x_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_x av1_highbd_dist_wtd_convolve_x_neon
void av1_highbd_dist_wtd_convolve_y_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_y av1_highbd_dist_wtd_convolve_y_c
+void av1_highbd_dist_wtd_convolve_y_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_y av1_highbd_dist_wtd_convolve_y_neon
void av1_highbd_dr_prediction_z1_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_above, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z1 av1_highbd_dr_prediction_z1_c
@@ -389,9 +403,8 @@
void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
-void av1_highbd_fwht4x4_neon(const int16_t *input, tran_low_t *output, int stride);
-#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_neon
+double av1_highbd_estimate_noise_from_single_plane_c(const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh);
+#define av1_highbd_estimate_noise_from_single_plane av1_highbd_estimate_noise_from_single_plane_c
void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
void av1_highbd_inv_txfm_add_neon(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
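
Beyond the alphabetized includes, this av1_rtcd.h hunk makes three kinds of changes: a batch of highbd convolve and upsampled-pred entries gains NEON bindings; av1_apply_temporal_filter and av1_compute_stats grow extra parameters (tf_wgt_calc_lvl, and the dgd_avg/src_avg scratch buffers, respectively); and av1_estimate_noise_from_single_plane joins the dispatch surface while aom_comp_mask_upsampled_pred and av1_highbd_fwht4x4 leave it. A standalone sketch of invoking the new noise estimator through its macro; the stub body and the edge_thresh value are placeholders, and in a real build the libaom definition is linked instead:

    #include <stdint.h>
    #include <stdio.h>

    double av1_estimate_noise_from_single_plane_c(const uint8_t *src,
                                                  int height, int width,
                                                  int stride, int edge_thresh);
    #define av1_estimate_noise_from_single_plane \
      av1_estimate_noise_from_single_plane_c

    /* Stand-in definition so this snippet runs on its own. */
    double av1_estimate_noise_from_single_plane_c(const uint8_t *src,
                                                  int height, int width,
                                                  int stride, int edge_thresh) {
      (void)src; (void)height; (void)width; (void)stride; (void)edge_thresh;
      return 0.0;  /* the real routine returns a noise-level estimate */
    }

    int main(void) {
      uint8_t plane[16 * 16] = {0};
      double sigma =
          av1_estimate_noise_from_single_plane(plane, 16, 16, 16, 50);
      printf("noise estimate: %f\n", sigma);
      return 0;
    }
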
diff --git a/config/arm64/config/aom_config.asm b/config/arm64/config/aom_config.asm
index a6b5453..9214692 100644
--- a/config/arm64/config/aom_config.asm
+++ b/config/arm64/config/aom_config.asm
@@ -8,10 +8,11 @@
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
-ARCH_ARM equ 1
-ARCH_PPC equ 0
-ARCH_X86 equ 0
-ARCH_X86_64 equ 0
+AOM_ARCH_AARCH64 equ 1
+AOM_ARCH_ARM equ 1
+AOM_ARCH_PPC equ 0
+AOM_ARCH_X86 equ 0
+AOM_ARCH_X86_64 equ 0
CONFIG_ACCOUNTING equ 0
CONFIG_ANALYZER equ 0
CONFIG_AV1_DECODER equ 1
@@ -47,6 +48,7 @@
CONFIG_NORMAL_TILE_MODE equ 1
CONFIG_OPTICAL_FLOW_API equ 0
CONFIG_OS_SUPPORT equ 1
+CONFIG_OUTPUT_FRAME_SIZE equ 0
CONFIG_PARTITION_SEARCH_ORDER equ 0
CONFIG_PIC equ 1
CONFIG_RATECTRL_LOG equ 0
@@ -55,6 +57,7 @@
CONFIG_REALTIME_ONLY equ 0
CONFIG_RT_ML_PARTITIONING equ 0
CONFIG_RUNTIME_CPU_DETECT equ 0
+CONFIG_SALIENCY_MAP equ 0
CONFIG_SHARED equ 0
CONFIG_SIZE_LIMIT equ 1
CONFIG_SPATIAL_RESAMPLING equ 1
diff --git a/config/arm64/config/aom_config.c b/config/arm64/config/aom_config.c
index a4d09f7..0a75709 100644
--- a/config/arm64/config/aom_config.c
+++ b/config/arm64/config/aom_config.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/config/arm64/config/aom_config.h b/config/arm64/config/aom_config.h
index 9f2cfc1..239527c 100644
--- a/config/arm64/config/aom_config.h
+++ b/config/arm64/config/aom_config.h
@@ -10,10 +10,11 @@
*/
#ifndef AOM_CONFIG_H_
#define AOM_CONFIG_H_
-#define ARCH_ARM 1
-#define ARCH_PPC 0
-#define ARCH_X86 0
-#define ARCH_X86_64 0
+#define AOM_ARCH_AARCH64 1
+#define AOM_ARCH_ARM 1
+#define AOM_ARCH_PPC 0
+#define AOM_ARCH_X86 0
+#define AOM_ARCH_X86_64 0
#define CONFIG_ACCOUNTING 0
#define CONFIG_ANALYZER 0
#define CONFIG_AV1_DECODER 1
@@ -49,6 +50,7 @@
#define CONFIG_NORMAL_TILE_MODE 1
#define CONFIG_OPTICAL_FLOW_API 0
#define CONFIG_OS_SUPPORT 1
+#define CONFIG_OUTPUT_FRAME_SIZE 0
#define CONFIG_PARTITION_SEARCH_ORDER 0
#define CONFIG_PIC 1
#define CONFIG_RATECTRL_LOG 0
@@ -57,6 +59,7 @@
#define CONFIG_REALTIME_ONLY 0
#define CONFIG_RT_ML_PARTITIONING 0
#define CONFIG_RUNTIME_CPU_DETECT 0
+#define CONFIG_SALIENCY_MAP 0
#define CONFIG_SHARED 0
#define CONFIG_SIZE_LIMIT 1
#define CONFIG_SPATIAL_RESAMPLING 1
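
The headline in these config hunks is the rename: the old ARCH_* symbols become AOM_ARCH_*, and AOM_ARCH_AARCH64 appears alongside AOM_ARCH_ARM so 64-bit Arm can be told apart from 32-bit. Two new feature flags, CONFIG_OUTPUT_FRAME_SIZE and CONFIG_SALIENCY_MAP, are also added, both off in this config. A self-contained sketch of consuming the renamed macros; the two defines simulate the generated header, and the compatibility shim is an assumption about downstream code, not something libaom ships:

    #include <stdio.h>

    /* Simulating config/arm64/config/aom_config.h for a standalone build. */
    #define AOM_ARCH_AARCH64 1
    #define AOM_ARCH_ARM 1

    /* Hypothetical shim so code still spelling the old ARCH_ARM keeps
     * compiling during a migration. */
    #ifndef ARCH_ARM
    #define ARCH_ARM AOM_ARCH_ARM
    #endif

    int main(void) {
    #if AOM_ARCH_AARCH64
      puts("AArch64 config: NEON is baseline");
    #endif
      return ARCH_ARM ? 0 : 1;
    }
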
diff --git a/config/arm64/config/aom_dsp_rtcd.h b/config/arm64/config/aom_dsp_rtcd.h
index 7ae6636..ad77b04 100644
--- a/config/arm64/config/aom_dsp_rtcd.h
+++ b/config/arm64/config/aom_dsp_rtcd.h
@@ -14,8 +14,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
#ifdef __cplusplus
@@ -46,19 +46,26 @@
#define aom_blend_a64_vmask aom_blend_a64_vmask_neon
void aom_comp_avg_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride);
-#define aom_comp_avg_pred aom_comp_avg_pred_c
+void aom_comp_avg_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride);
+#define aom_comp_avg_pred aom_comp_avg_pred_neon
void aom_comp_mask_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
-#define aom_comp_mask_pred aom_comp_mask_pred_c
+void aom_comp_mask_pred_neon(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
+#define aom_comp_mask_pred aom_comp_mask_pred_neon
+
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v);
+#define aom_compute_flow_at_point aom_compute_flow_at_point_c
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h);
#define aom_convolve8 aom_convolve8_c
void aom_convolve8_horiz_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
-#define aom_convolve8_horiz aom_convolve8_horiz_c
+void aom_convolve8_horiz_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_horiz aom_convolve8_horiz_neon
void aom_convolve8_vert_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
-#define aom_convolve8_vert aom_convolve8_vert_c
+void aom_convolve8_vert_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h);
+#define aom_convolve8_vert aom_convolve8_vert_neon
void aom_convolve_copy_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, int w, int h);
void aom_convolve_copy_neon(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, int w, int h);
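
Two additions stand out in this first arm64 hunk: aom_comp_avg_pred and aom_comp_mask_pred pick up NEON bindings, and aom_compute_flow_at_point enters the table with only a C implementation. A standalone sketch of driving the latter through its macro; the stub body is a placeholder for the real computation, and the assumption that *u and *v are refined in place is inferred from the double-pointer outputs, not confirmed here:

    #include <stdint.h>
    #include <stdio.h>

    void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref,
                                     int x, int y, int width, int height,
                                     int stride, double *u, double *v);
    #define aom_compute_flow_at_point aom_compute_flow_at_point_c

    /* Stand-in definition so the snippet compiles standalone. */
    void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref,
                                     int x, int y, int width, int height,
                                     int stride, double *u, double *v) {
      (void)src; (void)ref; (void)x; (void)y;
      (void)width; (void)height; (void)stride;
      (void)u; (void)v;  /* the real routine would refine *u and *v */
    }

    int main(void) {
      uint8_t src[64 * 64] = {0}, ref[64 * 64] = {0};
      double u = 0.0, v = 0.0;  /* initial flow guess at (32, 32) */
      aom_compute_flow_at_point(src, ref, 32, 32, 64, 64, 64, &u, &v);
      printf("flow: (%f, %f)\n", u, v);
      return 0;
    }
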
@@ -69,57 +76,72 @@
#define aom_dc_128_predictor_16x16 aom_dc_128_predictor_16x16_neon
void aom_dc_128_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x32 aom_dc_128_predictor_16x32_c
+void aom_dc_128_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x32 aom_dc_128_predictor_16x32_neon
void aom_dc_128_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x4 aom_dc_128_predictor_16x4_c
+void aom_dc_128_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x4 aom_dc_128_predictor_16x4_neon
void aom_dc_128_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x64 aom_dc_128_predictor_16x64_c
+void aom_dc_128_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x64 aom_dc_128_predictor_16x64_neon
void aom_dc_128_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_16x8 aom_dc_128_predictor_16x8_c
+void aom_dc_128_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_16x8 aom_dc_128_predictor_16x8_neon
void aom_dc_128_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x16 aom_dc_128_predictor_32x16_c
+void aom_dc_128_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x16 aom_dc_128_predictor_32x16_neon
void aom_dc_128_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_128_predictor_32x32 aom_dc_128_predictor_32x32_neon
void aom_dc_128_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x64 aom_dc_128_predictor_32x64_c
+void aom_dc_128_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x64 aom_dc_128_predictor_32x64_neon
void aom_dc_128_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_32x8 aom_dc_128_predictor_32x8_c
+void aom_dc_128_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_32x8 aom_dc_128_predictor_32x8_neon
void aom_dc_128_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_4x16 aom_dc_128_predictor_4x16_c
+void aom_dc_128_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x16 aom_dc_128_predictor_4x16_neon
void aom_dc_128_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_128_predictor_4x4 aom_dc_128_predictor_4x4_neon
void aom_dc_128_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_4x8 aom_dc_128_predictor_4x8_c
+void aom_dc_128_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_4x8 aom_dc_128_predictor_4x8_neon
void aom_dc_128_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x16 aom_dc_128_predictor_64x16_c
+void aom_dc_128_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x16 aom_dc_128_predictor_64x16_neon
void aom_dc_128_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x32 aom_dc_128_predictor_64x32_c
+void aom_dc_128_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x32 aom_dc_128_predictor_64x32_neon
void aom_dc_128_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_64x64 aom_dc_128_predictor_64x64_c
+void aom_dc_128_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_64x64 aom_dc_128_predictor_64x64_neon
void aom_dc_128_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x16 aom_dc_128_predictor_8x16_c
+void aom_dc_128_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x16 aom_dc_128_predictor_8x16_neon
void aom_dc_128_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x32 aom_dc_128_predictor_8x32_c
+void aom_dc_128_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x32 aom_dc_128_predictor_8x32_neon
void aom_dc_128_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_128_predictor_8x4 aom_dc_128_predictor_8x4_c
+void aom_dc_128_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_128_predictor_8x4 aom_dc_128_predictor_8x4_neon
void aom_dc_128_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_128_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -130,57 +152,72 @@
#define aom_dc_left_predictor_16x16 aom_dc_left_predictor_16x16_neon
void aom_dc_left_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x32 aom_dc_left_predictor_16x32_c
+void aom_dc_left_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x32 aom_dc_left_predictor_16x32_neon
void aom_dc_left_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x4 aom_dc_left_predictor_16x4_c
+void aom_dc_left_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x4 aom_dc_left_predictor_16x4_neon
void aom_dc_left_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x64 aom_dc_left_predictor_16x64_c
+void aom_dc_left_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x64 aom_dc_left_predictor_16x64_neon
void aom_dc_left_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_16x8 aom_dc_left_predictor_16x8_c
+void aom_dc_left_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_16x8 aom_dc_left_predictor_16x8_neon
void aom_dc_left_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x16 aom_dc_left_predictor_32x16_c
+void aom_dc_left_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x16 aom_dc_left_predictor_32x16_neon
void aom_dc_left_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_left_predictor_32x32 aom_dc_left_predictor_32x32_neon
void aom_dc_left_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x64 aom_dc_left_predictor_32x64_c
+void aom_dc_left_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x64 aom_dc_left_predictor_32x64_neon
void aom_dc_left_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_32x8 aom_dc_left_predictor_32x8_c
+void aom_dc_left_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_32x8 aom_dc_left_predictor_32x8_neon
void aom_dc_left_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_4x16 aom_dc_left_predictor_4x16_c
+void aom_dc_left_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x16 aom_dc_left_predictor_4x16_neon
void aom_dc_left_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_left_predictor_4x4 aom_dc_left_predictor_4x4_neon
void aom_dc_left_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_4x8 aom_dc_left_predictor_4x8_c
+void aom_dc_left_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_4x8 aom_dc_left_predictor_4x8_neon
void aom_dc_left_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x16 aom_dc_left_predictor_64x16_c
+void aom_dc_left_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x16 aom_dc_left_predictor_64x16_neon
void aom_dc_left_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x32 aom_dc_left_predictor_64x32_c
+void aom_dc_left_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x32 aom_dc_left_predictor_64x32_neon
void aom_dc_left_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_64x64 aom_dc_left_predictor_64x64_c
+void aom_dc_left_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_64x64 aom_dc_left_predictor_64x64_neon
void aom_dc_left_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x16 aom_dc_left_predictor_8x16_c
+void aom_dc_left_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x16 aom_dc_left_predictor_8x16_neon
void aom_dc_left_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x32 aom_dc_left_predictor_8x32_c
+void aom_dc_left_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x32 aom_dc_left_predictor_8x32_neon
void aom_dc_left_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_left_predictor_8x4 aom_dc_left_predictor_8x4_c
+void aom_dc_left_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_left_predictor_8x4 aom_dc_left_predictor_8x4_neon
void aom_dc_left_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_left_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -191,57 +228,72 @@
#define aom_dc_predictor_16x16 aom_dc_predictor_16x16_neon
void aom_dc_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x32 aom_dc_predictor_16x32_c
+void aom_dc_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x32 aom_dc_predictor_16x32_neon
void aom_dc_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x4 aom_dc_predictor_16x4_c
+void aom_dc_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x4 aom_dc_predictor_16x4_neon
void aom_dc_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x64 aom_dc_predictor_16x64_c
+void aom_dc_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x64 aom_dc_predictor_16x64_neon
void aom_dc_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_16x8 aom_dc_predictor_16x8_c
+void aom_dc_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_16x8 aom_dc_predictor_16x8_neon
void aom_dc_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x16 aom_dc_predictor_32x16_c
+void aom_dc_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x16 aom_dc_predictor_32x16_neon
void aom_dc_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_predictor_32x32 aom_dc_predictor_32x32_neon
void aom_dc_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x64 aom_dc_predictor_32x64_c
+void aom_dc_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x64 aom_dc_predictor_32x64_neon
void aom_dc_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_32x8 aom_dc_predictor_32x8_c
+void aom_dc_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_32x8 aom_dc_predictor_32x8_neon
void aom_dc_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_4x16 aom_dc_predictor_4x16_c
+void aom_dc_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x16 aom_dc_predictor_4x16_neon
void aom_dc_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_predictor_4x4 aom_dc_predictor_4x4_neon
void aom_dc_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_4x8 aom_dc_predictor_4x8_c
+void aom_dc_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_4x8 aom_dc_predictor_4x8_neon
void aom_dc_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x16 aom_dc_predictor_64x16_c
+void aom_dc_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x16 aom_dc_predictor_64x16_neon
void aom_dc_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x32 aom_dc_predictor_64x32_c
+void aom_dc_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x32 aom_dc_predictor_64x32_neon
void aom_dc_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_64x64 aom_dc_predictor_64x64_c
+void aom_dc_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_64x64 aom_dc_predictor_64x64_neon
void aom_dc_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x16 aom_dc_predictor_8x16_c
+void aom_dc_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x16 aom_dc_predictor_8x16_neon
void aom_dc_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x32 aom_dc_predictor_8x32_c
+void aom_dc_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x32 aom_dc_predictor_8x32_neon
void aom_dc_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_predictor_8x4 aom_dc_predictor_8x4_c
+void aom_dc_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_predictor_8x4 aom_dc_predictor_8x4_neon
void aom_dc_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -252,57 +304,72 @@
#define aom_dc_top_predictor_16x16 aom_dc_top_predictor_16x16_neon
void aom_dc_top_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x32 aom_dc_top_predictor_16x32_c
+void aom_dc_top_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x32 aom_dc_top_predictor_16x32_neon
void aom_dc_top_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x4 aom_dc_top_predictor_16x4_c
+void aom_dc_top_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x4 aom_dc_top_predictor_16x4_neon
void aom_dc_top_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x64 aom_dc_top_predictor_16x64_c
+void aom_dc_top_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x64 aom_dc_top_predictor_16x64_neon
void aom_dc_top_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_16x8 aom_dc_top_predictor_16x8_c
+void aom_dc_top_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_16x8 aom_dc_top_predictor_16x8_neon
void aom_dc_top_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x16 aom_dc_top_predictor_32x16_c
+void aom_dc_top_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x16 aom_dc_top_predictor_32x16_neon
void aom_dc_top_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_top_predictor_32x32 aom_dc_top_predictor_32x32_neon
void aom_dc_top_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x64 aom_dc_top_predictor_32x64_c
+void aom_dc_top_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x64 aom_dc_top_predictor_32x64_neon
void aom_dc_top_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_32x8 aom_dc_top_predictor_32x8_c
+void aom_dc_top_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_32x8 aom_dc_top_predictor_32x8_neon
void aom_dc_top_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_4x16 aom_dc_top_predictor_4x16_c
+void aom_dc_top_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x16 aom_dc_top_predictor_4x16_neon
void aom_dc_top_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_dc_top_predictor_4x4 aom_dc_top_predictor_4x4_neon
void aom_dc_top_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_4x8 aom_dc_top_predictor_4x8_c
+void aom_dc_top_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_4x8 aom_dc_top_predictor_4x8_neon
void aom_dc_top_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x16 aom_dc_top_predictor_64x16_c
+void aom_dc_top_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x16 aom_dc_top_predictor_64x16_neon
void aom_dc_top_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x32 aom_dc_top_predictor_64x32_c
+void aom_dc_top_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x32 aom_dc_top_predictor_64x32_neon
void aom_dc_top_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_64x64 aom_dc_top_predictor_64x64_c
+void aom_dc_top_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_64x64 aom_dc_top_predictor_64x64_neon
void aom_dc_top_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x16 aom_dc_top_predictor_8x16_c
+void aom_dc_top_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x16 aom_dc_top_predictor_8x16_neon
void aom_dc_top_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x32 aom_dc_top_predictor_8x32_c
+void aom_dc_top_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x32 aom_dc_top_predictor_8x32_neon
void aom_dc_top_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_dc_top_predictor_8x4 aom_dc_top_predictor_8x4_c
+void aom_dc_top_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_dc_top_predictor_8x4 aom_dc_top_predictor_8x4_neon
void aom_dc_top_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_dc_top_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -451,10 +518,6 @@
void aom_fdct4x4_lp_neon(const int16_t *input, int16_t *output, int stride);
#define aom_fdct4x4_lp aom_fdct4x4_lp_neon
-void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_fdct8x8_neon(const int16_t *input, tran_low_t *output, int stride);
-#define aom_fdct8x8 aom_fdct8x8_neon
-
void aom_fft16x16_float_c(const float *input, float *temp, float *output);
#define aom_fft16x16_float aom_fft16x16_float_c
@@ -470,18 +533,6 @@
void aom_fft8x8_float_c(const float *input, float *temp, float *output);
#define aom_fft8x8_float aom_fft8x8_float_c
-void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get16x16var_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get16x16var aom_get16x16var_neon
-
-unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-unsigned int aom_get4x4sse_cs_neon(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-#define aom_get4x4sse_cs aom_get4x4sse_cs_neon
-
-void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get8x8var_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get8x8var aom_get8x8var_neon
-
void aom_get_blk_sse_sum_c(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
#define aom_get_blk_sse_sum aom_get_blk_sse_sum_c
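
The two deletion hunks above drop aom_fdct8x8, aom_get16x16var, aom_get4x4sse_cs, and aom_get8x8var from the dispatch surface outright rather than remapping them. A caller that relied on the sse/sum outputs of the removed get-var helpers can recompute them directly; the following is a hedged stand-in grounded only in the removed prototypes' outputs, not the upstream replacement:

    #include <stdint.h>
    #include <stdio.h>

    /* Recompute what aom_get8x8var-style helpers returned: the sum of
     * squared differences (sse) and the signed sum of differences. */
    static void block_var_8x8(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              unsigned int *sse, int *sum) {
      unsigned int s2 = 0;
      int s = 0;
      for (int r = 0; r < 8; ++r) {
        for (int c = 0; c < 8; ++c) {
          const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
          s += d;
          s2 += (unsigned int)(d * d);
        }
      }
      *sse = s2;
      *sum = s;
    }

    int main(void) {
      uint8_t src[8 * 8], ref[8 * 8] = {0};
      for (int i = 0; i < 64; ++i) src[i] = (uint8_t)i;
      unsigned int sse;
      int sum;
      block_var_8x8(src, 8, ref, 8, &sse, &sum);
      /* variance = sse - sum*sum/64, the usual downstream computation */
      printf("sse=%u sum=%d var=%u\n", sse, sum,
             sse - (unsigned int)(((int64_t)sum * sum) / 64));
      return 0;
    }
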
@@ -489,7 +540,8 @@
#define aom_get_mb_ss aom_get_mb_ss_c
void aom_get_var_sse_sum_16x16_dual_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
-#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_c
+void aom_get_var_sse_sum_16x16_dual_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
+#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_neon
void aom_get_var_sse_sum_8x8_quad_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
void aom_get_var_sse_sum_8x8_quad_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
@@ -500,57 +552,72 @@
#define aom_h_predictor_16x16 aom_h_predictor_16x16_neon
void aom_h_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x32 aom_h_predictor_16x32_c
+void aom_h_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x32 aom_h_predictor_16x32_neon
void aom_h_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x4 aom_h_predictor_16x4_c
+void aom_h_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x4 aom_h_predictor_16x4_neon
void aom_h_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x64 aom_h_predictor_16x64_c
+void aom_h_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x64 aom_h_predictor_16x64_neon
void aom_h_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_16x8 aom_h_predictor_16x8_c
+void aom_h_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_16x8 aom_h_predictor_16x8_neon
void aom_h_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x16 aom_h_predictor_32x16_c
+void aom_h_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x16 aom_h_predictor_32x16_neon
void aom_h_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_h_predictor_32x32 aom_h_predictor_32x32_neon
void aom_h_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x64 aom_h_predictor_32x64_c
+void aom_h_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x64 aom_h_predictor_32x64_neon
void aom_h_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_32x8 aom_h_predictor_32x8_c
+void aom_h_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_32x8 aom_h_predictor_32x8_neon
void aom_h_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_4x16 aom_h_predictor_4x16_c
+void aom_h_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x16 aom_h_predictor_4x16_neon
void aom_h_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_h_predictor_4x4 aom_h_predictor_4x4_neon
void aom_h_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_4x8 aom_h_predictor_4x8_c
+void aom_h_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_4x8 aom_h_predictor_4x8_neon
void aom_h_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x16 aom_h_predictor_64x16_c
+void aom_h_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x16 aom_h_predictor_64x16_neon
void aom_h_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x32 aom_h_predictor_64x32_c
+void aom_h_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x32 aom_h_predictor_64x32_neon
void aom_h_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_64x64 aom_h_predictor_64x64_c
+void aom_h_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_64x64 aom_h_predictor_64x64_neon
void aom_h_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x16 aom_h_predictor_8x16_c
+void aom_h_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x16 aom_h_predictor_8x16_neon
void aom_h_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x32 aom_h_predictor_8x32_c
+void aom_h_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x32 aom_h_predictor_8x32_neon
void aom_h_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_h_predictor_8x4 aom_h_predictor_8x4_c
+void aom_h_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_h_predictor_8x4 aom_h_predictor_8x4_neon
void aom_h_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_h_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
@@ -561,10 +628,12 @@
#define aom_hadamard_16x16 aom_hadamard_16x16_neon
void aom_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_hadamard_32x32 aom_hadamard_32x32_c
+void aom_hadamard_32x32_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_32x32 aom_hadamard_32x32_neon
void aom_hadamard_4x4_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_hadamard_4x4 aom_hadamard_4x4_c
+void aom_hadamard_4x4_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_hadamard_4x4 aom_hadamard_4x4_neon
void aom_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
void aom_hadamard_8x8_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
@@ -648,12 +717,6 @@
uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
-
-void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
-
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_c
@@ -721,16 +784,20 @@
#define aom_highbd_10_masked_sub_pixel_variance8x8 aom_highbd_10_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_10_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse16x16 aom_highbd_10_mse16x16_c
+unsigned int aom_highbd_10_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x16 aom_highbd_10_mse16x16_neon
unsigned int aom_highbd_10_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse16x8 aom_highbd_10_mse16x8_c
+unsigned int aom_highbd_10_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse16x8 aom_highbd_10_mse16x8_neon
unsigned int aom_highbd_10_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse8x16 aom_highbd_10_mse8x16_c
+unsigned int aom_highbd_10_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x16 aom_highbd_10_mse8x16_neon
unsigned int aom_highbd_10_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_10_mse8x8 aom_highbd_10_mse8x8_c
+unsigned int aom_highbd_10_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_10_mse8x8 aom_highbd_10_mse8x8_neon
unsigned int aom_highbd_10_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
#define aom_highbd_10_obmc_sub_pixel_variance128x128 aom_highbd_10_obmc_sub_pixel_variance128x128_c
@@ -1012,12 +1079,12 @@
unsigned int aom_highbd_10_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_neon
-unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_neon
-unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_neon
unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
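
Alongside the NEON additions, these variance hunks respell `uint32_t *sse` as `unsigned int *sse` in the high-bitdepth prototypes. Assuming, as this generated header does, that unsigned int is 32 bits wide on every supported target, this only harmonizes the declarations with the rest of the file and changes no ABI. A one-line compile-time check of that assumption:

    #include <stdint.h>

    /* Assumption being checked: unsigned int and uint32_t have the same
     * width on the targets this header is generated for, so the sse
     * parameter change is type-spelling only. */
    _Static_assert(sizeof(unsigned int) == sizeof(uint32_t),
                   "uint32_t -> unsigned int must not change the ABI");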
@@ -1042,12 +1109,12 @@
unsigned int aom_highbd_10_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_neon
-unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_neon
-unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_neon
unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1065,8 +1132,8 @@
unsigned int aom_highbd_10_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_neon
-unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_neon
unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1081,8 +1148,8 @@
unsigned int aom_highbd_10_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_neon
-unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_neon
unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1159,12 +1226,6 @@
uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
-
-void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
-
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_c
@@ -1232,16 +1293,20 @@
#define aom_highbd_12_masked_sub_pixel_variance8x8 aom_highbd_12_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_12_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse16x16 aom_highbd_12_mse16x16_c
+unsigned int aom_highbd_12_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x16 aom_highbd_12_mse16x16_neon
unsigned int aom_highbd_12_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse16x8 aom_highbd_12_mse16x8_c
+unsigned int aom_highbd_12_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse16x8 aom_highbd_12_mse16x8_neon
unsigned int aom_highbd_12_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse8x16 aom_highbd_12_mse8x16_c
+unsigned int aom_highbd_12_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x16 aom_highbd_12_mse8x16_neon
unsigned int aom_highbd_12_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_12_mse8x8 aom_highbd_12_mse8x8_c
+unsigned int aom_highbd_12_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_12_mse8x8 aom_highbd_12_mse8x8_neon
unsigned int aom_highbd_12_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
#define aom_highbd_12_obmc_sub_pixel_variance128x128 aom_highbd_12_obmc_sub_pixel_variance128x128_c
@@ -1508,25 +1573,32 @@
#define aom_highbd_12_sub_pixel_variance8x8 aom_highbd_12_sub_pixel_variance8x8_c
unsigned int aom_highbd_12_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance128x128 aom_highbd_12_variance128x128_c
+unsigned int aom_highbd_12_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x128 aom_highbd_12_variance128x128_neon
unsigned int aom_highbd_12_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance128x64 aom_highbd_12_variance128x64_c
+unsigned int aom_highbd_12_variance128x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance128x64 aom_highbd_12_variance128x64_neon
unsigned int aom_highbd_12_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x16 aom_highbd_12_variance16x16_c
+unsigned int aom_highbd_12_variance16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x16 aom_highbd_12_variance16x16_neon
unsigned int aom_highbd_12_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_c
+unsigned int aom_highbd_12_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_neon
-unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_neon
-unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_c
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_neon
unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance16x8 aom_highbd_12_variance16x8_c
+unsigned int aom_highbd_12_variance16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance16x8 aom_highbd_12_variance16x8_neon
unsigned int aom_highbd_12_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance2x2 aom_highbd_12_variance2x2_c
@@ -1535,52 +1607,67 @@
#define aom_highbd_12_variance2x4 aom_highbd_12_variance2x4_c
unsigned int aom_highbd_12_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x16 aom_highbd_12_variance32x16_c
+unsigned int aom_highbd_12_variance32x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x16 aom_highbd_12_variance32x16_neon
unsigned int aom_highbd_12_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x32 aom_highbd_12_variance32x32_c
+unsigned int aom_highbd_12_variance32x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x32 aom_highbd_12_variance32x32_neon
unsigned int aom_highbd_12_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_c
+unsigned int aom_highbd_12_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_neon
-unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_c
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_neon
-unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_neon
unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance4x2 aom_highbd_12_variance4x2_c
unsigned int aom_highbd_12_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance4x4 aom_highbd_12_variance4x4_c
+unsigned int aom_highbd_12_variance4x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x4 aom_highbd_12_variance4x4_neon
unsigned int aom_highbd_12_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance4x8 aom_highbd_12_variance4x8_c
+unsigned int aom_highbd_12_variance4x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance4x8 aom_highbd_12_variance4x8_neon
unsigned int aom_highbd_12_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_c
+unsigned int aom_highbd_12_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_neon
-unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_c
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_neon
unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x32 aom_highbd_12_variance64x32_c
+unsigned int aom_highbd_12_variance64x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x32 aom_highbd_12_variance64x32_neon
unsigned int aom_highbd_12_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance64x64 aom_highbd_12_variance64x64_c
+unsigned int aom_highbd_12_variance64x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance64x64 aom_highbd_12_variance64x64_neon
unsigned int aom_highbd_12_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_c
+unsigned int aom_highbd_12_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_neon
-unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_c
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_neon
unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x4 aom_highbd_12_variance8x4_c
+unsigned int aom_highbd_12_variance8x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x4 aom_highbd_12_variance8x4_neon
unsigned int aom_highbd_12_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_12_variance8x8 aom_highbd_12_variance8x8_c
+unsigned int aom_highbd_12_variance8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_12_variance8x8 aom_highbd_12_variance8x8_neon
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128 aom_highbd_8_dist_wtd_sub_pixel_avg_variance128x128_c
@@ -1648,12 +1735,6 @@
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
-
-void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
-
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_c
@@ -1721,16 +1802,20 @@
#define aom_highbd_8_masked_sub_pixel_variance8x8 aom_highbd_8_masked_sub_pixel_variance8x8_c
unsigned int aom_highbd_8_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse16x16 aom_highbd_8_mse16x16_c
+unsigned int aom_highbd_8_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x16 aom_highbd_8_mse16x16_neon
unsigned int aom_highbd_8_mse16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse16x8 aom_highbd_8_mse16x8_c
+unsigned int aom_highbd_8_mse16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse16x8 aom_highbd_8_mse16x8_neon
unsigned int aom_highbd_8_mse8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse8x16 aom_highbd_8_mse8x16_c
+unsigned int aom_highbd_8_mse8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x16 aom_highbd_8_mse8x16_neon
unsigned int aom_highbd_8_mse8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
-#define aom_highbd_8_mse8x8 aom_highbd_8_mse8x8_c
+unsigned int aom_highbd_8_mse8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
+#define aom_highbd_8_mse8x8 aom_highbd_8_mse8x8_neon
uint32_t aom_highbd_8_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred);
#define aom_highbd_8_sub_pixel_avg_variance128x128 aom_highbd_8_sub_pixel_avg_variance128x128_c
@@ -1865,25 +1950,32 @@
#define aom_highbd_8_sub_pixel_variance8x8 aom_highbd_8_sub_pixel_variance8x8_c
unsigned int aom_highbd_8_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance128x128 aom_highbd_8_variance128x128_c
+unsigned int aom_highbd_8_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x128 aom_highbd_8_variance128x128_neon
unsigned int aom_highbd_8_variance128x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance128x64 aom_highbd_8_variance128x64_c
+unsigned int aom_highbd_8_variance128x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance128x64 aom_highbd_8_variance128x64_neon
unsigned int aom_highbd_8_variance16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x16 aom_highbd_8_variance16x16_c
+unsigned int aom_highbd_8_variance16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x16 aom_highbd_8_variance16x16_neon
unsigned int aom_highbd_8_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_c
+unsigned int aom_highbd_8_variance16x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_neon
-unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_neon
-unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_c
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_neon
unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance16x8 aom_highbd_8_variance16x8_c
+unsigned int aom_highbd_8_variance16x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance16x8 aom_highbd_8_variance16x8_neon
unsigned int aom_highbd_8_variance2x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance2x2 aom_highbd_8_variance2x2_c
@@ -1892,59 +1984,75 @@
#define aom_highbd_8_variance2x4 aom_highbd_8_variance2x4_c
unsigned int aom_highbd_8_variance32x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x16 aom_highbd_8_variance32x16_c
+unsigned int aom_highbd_8_variance32x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x16 aom_highbd_8_variance32x16_neon
unsigned int aom_highbd_8_variance32x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x32 aom_highbd_8_variance32x32_c
+unsigned int aom_highbd_8_variance32x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x32 aom_highbd_8_variance32x32_neon
unsigned int aom_highbd_8_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_c
+unsigned int aom_highbd_8_variance32x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_neon
-unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_c
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance32x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_neon
-unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance4x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_neon
unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance4x2 aom_highbd_8_variance4x2_c
unsigned int aom_highbd_8_variance4x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance4x4 aom_highbd_8_variance4x4_c
+unsigned int aom_highbd_8_variance4x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x4 aom_highbd_8_variance4x4_neon
unsigned int aom_highbd_8_variance4x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance4x8 aom_highbd_8_variance4x8_c
+unsigned int aom_highbd_8_variance4x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance4x8 aom_highbd_8_variance4x8_neon
unsigned int aom_highbd_8_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_c
+unsigned int aom_highbd_8_variance64x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_neon
-unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_c
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance64x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_neon
unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x32 aom_highbd_8_variance64x32_c
+unsigned int aom_highbd_8_variance64x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x32 aom_highbd_8_variance64x32_neon
unsigned int aom_highbd_8_variance64x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance64x64 aom_highbd_8_variance64x64_c
+unsigned int aom_highbd_8_variance64x64_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance64x64 aom_highbd_8_variance64x64_neon
unsigned int aom_highbd_8_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_c
+unsigned int aom_highbd_8_variance8x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_neon
-unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_c
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance8x32_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_neon
unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x4 aom_highbd_8_variance8x4_c
+unsigned int aom_highbd_8_variance8x4_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x4 aom_highbd_8_variance8x4_neon
unsigned int aom_highbd_8_variance8x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
-#define aom_highbd_8_variance8x8 aom_highbd_8_variance8x8_c
+unsigned int aom_highbd_8_variance8x8_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+#define aom_highbd_8_variance8x8 aom_highbd_8_variance8x8_neon
unsigned int aom_highbd_avg_4x4_c(const uint8_t *, int p);
unsigned int aom_highbd_avg_4x4_neon(const uint8_t *, int p);
#define aom_highbd_avg_4x4 aom_highbd_avg_4x4_neon
unsigned int aom_highbd_avg_8x8_c(const uint8_t *, int p);
-#define aom_highbd_avg_8x8 aom_highbd_avg_8x8_c
+unsigned int aom_highbd_avg_8x8_neon(const uint8_t *, int p);
+#define aom_highbd_avg_8x8 aom_highbd_avg_8x8_neon
void aom_highbd_blend_a64_d16_mask_c(uint8_t *dst, uint32_t dst_stride, const CONV_BUF_TYPE *src0, uint32_t src0_stride, const CONV_BUF_TYPE *src1, uint32_t src1_stride, const uint8_t *mask, uint32_t mask_stride, int w, int h, int subw, int subh, ConvolveParams *conv_params, const int bd);
#define aom_highbd_blend_a64_d16_mask aom_highbd_blend_a64_d16_mask_c
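
The remaining hunks switch the high-bitdepth DC-style predictors from the scalar _c paths to new _neon kernels. These modes vectorize trivially: DC_128, for example, fills the block with the mid-gray value 1 << (bd - 1), ignoring the above/left neighbors. A simplified, illustrative sketch of such a kernel (not libaom's actual implementation, which is specialized per block size):

    #include <arm_neon.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Fill an 8-wide high-bitdepth block of height h with 1 << (bd - 1).
     * Illustrative only; the above/left arguments of the real predictors
     * are unused in the DC_128 mode and omitted here. */
    static void hbd_dc_128_predictor_8xh(uint16_t *dst, ptrdiff_t stride,
                                         int h, int bd) {
      const uint16x8_t mid = vdupq_n_u16((uint16_t)(1 << (bd - 1)));
      for (int r = 0; r < h; ++r) {
        vst1q_u16(dst, mid);
        dst += stride;
      }
    }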
@@ -1974,237 +2082,308 @@
#define aom_highbd_convolve_copy aom_highbd_convolve_copy_c
void aom_highbd_dc_128_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x16 aom_highbd_dc_128_predictor_16x16_c
+void aom_highbd_dc_128_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x16 aom_highbd_dc_128_predictor_16x16_neon
void aom_highbd_dc_128_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x32 aom_highbd_dc_128_predictor_16x32_c
+void aom_highbd_dc_128_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x32 aom_highbd_dc_128_predictor_16x32_neon
void aom_highbd_dc_128_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x4 aom_highbd_dc_128_predictor_16x4_c
+void aom_highbd_dc_128_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x4 aom_highbd_dc_128_predictor_16x4_neon
void aom_highbd_dc_128_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x64 aom_highbd_dc_128_predictor_16x64_c
+void aom_highbd_dc_128_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x64 aom_highbd_dc_128_predictor_16x64_neon
void aom_highbd_dc_128_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_16x8 aom_highbd_dc_128_predictor_16x8_c
+void aom_highbd_dc_128_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_16x8 aom_highbd_dc_128_predictor_16x8_neon
void aom_highbd_dc_128_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x16 aom_highbd_dc_128_predictor_32x16_c
+void aom_highbd_dc_128_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x16 aom_highbd_dc_128_predictor_32x16_neon
void aom_highbd_dc_128_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x32 aom_highbd_dc_128_predictor_32x32_c
+void aom_highbd_dc_128_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x32 aom_highbd_dc_128_predictor_32x32_neon
void aom_highbd_dc_128_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x64 aom_highbd_dc_128_predictor_32x64_c
+void aom_highbd_dc_128_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x64 aom_highbd_dc_128_predictor_32x64_neon
void aom_highbd_dc_128_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_32x8 aom_highbd_dc_128_predictor_32x8_c
+void aom_highbd_dc_128_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_32x8 aom_highbd_dc_128_predictor_32x8_neon
void aom_highbd_dc_128_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x16 aom_highbd_dc_128_predictor_4x16_c
+void aom_highbd_dc_128_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x16 aom_highbd_dc_128_predictor_4x16_neon
void aom_highbd_dc_128_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x4 aom_highbd_dc_128_predictor_4x4_c
+void aom_highbd_dc_128_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x4 aom_highbd_dc_128_predictor_4x4_neon
void aom_highbd_dc_128_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_4x8 aom_highbd_dc_128_predictor_4x8_c
+void aom_highbd_dc_128_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_4x8 aom_highbd_dc_128_predictor_4x8_neon
void aom_highbd_dc_128_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x16 aom_highbd_dc_128_predictor_64x16_c
+void aom_highbd_dc_128_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x16 aom_highbd_dc_128_predictor_64x16_neon
void aom_highbd_dc_128_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x32 aom_highbd_dc_128_predictor_64x32_c
+void aom_highbd_dc_128_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x32 aom_highbd_dc_128_predictor_64x32_neon
void aom_highbd_dc_128_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_64x64 aom_highbd_dc_128_predictor_64x64_c
+void aom_highbd_dc_128_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_64x64 aom_highbd_dc_128_predictor_64x64_neon
void aom_highbd_dc_128_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x16 aom_highbd_dc_128_predictor_8x16_c
+void aom_highbd_dc_128_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x16 aom_highbd_dc_128_predictor_8x16_neon
void aom_highbd_dc_128_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x32 aom_highbd_dc_128_predictor_8x32_c
+void aom_highbd_dc_128_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x32 aom_highbd_dc_128_predictor_8x32_neon
void aom_highbd_dc_128_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x4 aom_highbd_dc_128_predictor_8x4_c
+void aom_highbd_dc_128_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x4 aom_highbd_dc_128_predictor_8x4_neon
void aom_highbd_dc_128_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_128_predictor_8x8 aom_highbd_dc_128_predictor_8x8_c
+void aom_highbd_dc_128_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_128_predictor_8x8 aom_highbd_dc_128_predictor_8x8_neon
void aom_highbd_dc_left_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x16 aom_highbd_dc_left_predictor_16x16_c
+void aom_highbd_dc_left_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x16 aom_highbd_dc_left_predictor_16x16_neon
void aom_highbd_dc_left_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x32 aom_highbd_dc_left_predictor_16x32_c
+void aom_highbd_dc_left_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x32 aom_highbd_dc_left_predictor_16x32_neon
void aom_highbd_dc_left_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x4 aom_highbd_dc_left_predictor_16x4_c
+void aom_highbd_dc_left_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x4 aom_highbd_dc_left_predictor_16x4_neon
void aom_highbd_dc_left_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x64 aom_highbd_dc_left_predictor_16x64_c
+void aom_highbd_dc_left_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x64 aom_highbd_dc_left_predictor_16x64_neon
void aom_highbd_dc_left_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_16x8 aom_highbd_dc_left_predictor_16x8_c
+void aom_highbd_dc_left_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_16x8 aom_highbd_dc_left_predictor_16x8_neon
void aom_highbd_dc_left_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x16 aom_highbd_dc_left_predictor_32x16_c
+void aom_highbd_dc_left_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x16 aom_highbd_dc_left_predictor_32x16_neon
void aom_highbd_dc_left_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x32 aom_highbd_dc_left_predictor_32x32_c
+void aom_highbd_dc_left_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x32 aom_highbd_dc_left_predictor_32x32_neon
void aom_highbd_dc_left_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x64 aom_highbd_dc_left_predictor_32x64_c
+void aom_highbd_dc_left_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x64 aom_highbd_dc_left_predictor_32x64_neon
void aom_highbd_dc_left_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_32x8 aom_highbd_dc_left_predictor_32x8_c
+void aom_highbd_dc_left_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_32x8 aom_highbd_dc_left_predictor_32x8_neon
void aom_highbd_dc_left_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x16 aom_highbd_dc_left_predictor_4x16_c
+void aom_highbd_dc_left_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x16 aom_highbd_dc_left_predictor_4x16_neon
void aom_highbd_dc_left_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x4 aom_highbd_dc_left_predictor_4x4_c
+void aom_highbd_dc_left_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x4 aom_highbd_dc_left_predictor_4x4_neon
void aom_highbd_dc_left_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_4x8 aom_highbd_dc_left_predictor_4x8_c
+void aom_highbd_dc_left_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_4x8 aom_highbd_dc_left_predictor_4x8_neon
void aom_highbd_dc_left_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x16 aom_highbd_dc_left_predictor_64x16_c
+void aom_highbd_dc_left_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x16 aom_highbd_dc_left_predictor_64x16_neon
void aom_highbd_dc_left_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x32 aom_highbd_dc_left_predictor_64x32_c
+void aom_highbd_dc_left_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x32 aom_highbd_dc_left_predictor_64x32_neon
void aom_highbd_dc_left_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_64x64 aom_highbd_dc_left_predictor_64x64_c
+void aom_highbd_dc_left_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_64x64 aom_highbd_dc_left_predictor_64x64_neon
void aom_highbd_dc_left_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x16 aom_highbd_dc_left_predictor_8x16_c
+void aom_highbd_dc_left_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x16 aom_highbd_dc_left_predictor_8x16_neon
void aom_highbd_dc_left_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x32 aom_highbd_dc_left_predictor_8x32_c
+void aom_highbd_dc_left_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x32 aom_highbd_dc_left_predictor_8x32_neon
void aom_highbd_dc_left_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x4 aom_highbd_dc_left_predictor_8x4_c
+void aom_highbd_dc_left_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x4 aom_highbd_dc_left_predictor_8x4_neon
void aom_highbd_dc_left_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_left_predictor_8x8 aom_highbd_dc_left_predictor_8x8_c
+void aom_highbd_dc_left_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_left_predictor_8x8 aom_highbd_dc_left_predictor_8x8_neon
void aom_highbd_dc_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_16x16 aom_highbd_dc_predictor_16x16_neon
void aom_highbd_dc_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x32 aom_highbd_dc_predictor_16x32_c
+void aom_highbd_dc_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x32 aom_highbd_dc_predictor_16x32_neon
void aom_highbd_dc_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x4 aom_highbd_dc_predictor_16x4_c
+void aom_highbd_dc_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x4 aom_highbd_dc_predictor_16x4_neon
void aom_highbd_dc_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x64 aom_highbd_dc_predictor_16x64_c
+void aom_highbd_dc_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x64 aom_highbd_dc_predictor_16x64_neon
void aom_highbd_dc_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_16x8 aom_highbd_dc_predictor_16x8_c
+void aom_highbd_dc_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_16x8 aom_highbd_dc_predictor_16x8_neon
void aom_highbd_dc_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x16 aom_highbd_dc_predictor_32x16_c
+void aom_highbd_dc_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x16 aom_highbd_dc_predictor_32x16_neon
void aom_highbd_dc_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_32x32 aom_highbd_dc_predictor_32x32_neon
void aom_highbd_dc_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x64 aom_highbd_dc_predictor_32x64_c
+void aom_highbd_dc_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x64 aom_highbd_dc_predictor_32x64_neon
void aom_highbd_dc_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_32x8 aom_highbd_dc_predictor_32x8_c
+void aom_highbd_dc_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_32x8 aom_highbd_dc_predictor_32x8_neon
void aom_highbd_dc_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_4x16 aom_highbd_dc_predictor_4x16_c
+void aom_highbd_dc_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x16 aom_highbd_dc_predictor_4x16_neon
void aom_highbd_dc_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_4x4 aom_highbd_dc_predictor_4x4_neon
void aom_highbd_dc_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_4x8 aom_highbd_dc_predictor_4x8_c
+void aom_highbd_dc_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_4x8 aom_highbd_dc_predictor_4x8_neon
void aom_highbd_dc_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_64x16 aom_highbd_dc_predictor_64x16_c
+void aom_highbd_dc_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x16 aom_highbd_dc_predictor_64x16_neon
void aom_highbd_dc_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_64x32 aom_highbd_dc_predictor_64x32_c
+void aom_highbd_dc_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_64x32 aom_highbd_dc_predictor_64x32_neon
void aom_highbd_dc_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_64x64 aom_highbd_dc_predictor_64x64_neon
void aom_highbd_dc_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x16 aom_highbd_dc_predictor_8x16_c
+void aom_highbd_dc_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x16 aom_highbd_dc_predictor_8x16_neon
void aom_highbd_dc_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x32 aom_highbd_dc_predictor_8x32_c
+void aom_highbd_dc_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x32 aom_highbd_dc_predictor_8x32_neon
void aom_highbd_dc_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_predictor_8x4 aom_highbd_dc_predictor_8x4_c
+void aom_highbd_dc_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_predictor_8x4 aom_highbd_dc_predictor_8x4_neon
void aom_highbd_dc_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_dc_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_dc_predictor_8x8 aom_highbd_dc_predictor_8x8_neon
void aom_highbd_dc_top_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x16 aom_highbd_dc_top_predictor_16x16_c
+void aom_highbd_dc_top_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x16 aom_highbd_dc_top_predictor_16x16_neon
void aom_highbd_dc_top_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x32 aom_highbd_dc_top_predictor_16x32_c
+void aom_highbd_dc_top_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x32 aom_highbd_dc_top_predictor_16x32_neon
void aom_highbd_dc_top_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x4 aom_highbd_dc_top_predictor_16x4_c
+void aom_highbd_dc_top_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x4 aom_highbd_dc_top_predictor_16x4_neon
void aom_highbd_dc_top_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x64 aom_highbd_dc_top_predictor_16x64_c
+void aom_highbd_dc_top_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x64 aom_highbd_dc_top_predictor_16x64_neon
void aom_highbd_dc_top_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_16x8 aom_highbd_dc_top_predictor_16x8_c
+void aom_highbd_dc_top_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_16x8 aom_highbd_dc_top_predictor_16x8_neon
void aom_highbd_dc_top_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x16 aom_highbd_dc_top_predictor_32x16_c
+void aom_highbd_dc_top_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x16 aom_highbd_dc_top_predictor_32x16_neon
void aom_highbd_dc_top_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x32 aom_highbd_dc_top_predictor_32x32_c
+void aom_highbd_dc_top_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x32 aom_highbd_dc_top_predictor_32x32_neon
void aom_highbd_dc_top_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x64 aom_highbd_dc_top_predictor_32x64_c
+void aom_highbd_dc_top_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x64 aom_highbd_dc_top_predictor_32x64_neon
void aom_highbd_dc_top_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_32x8 aom_highbd_dc_top_predictor_32x8_c
+void aom_highbd_dc_top_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_32x8 aom_highbd_dc_top_predictor_32x8_neon
void aom_highbd_dc_top_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x16 aom_highbd_dc_top_predictor_4x16_c
+void aom_highbd_dc_top_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x16 aom_highbd_dc_top_predictor_4x16_neon
void aom_highbd_dc_top_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x4 aom_highbd_dc_top_predictor_4x4_c
+void aom_highbd_dc_top_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x4 aom_highbd_dc_top_predictor_4x4_neon
void aom_highbd_dc_top_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_4x8 aom_highbd_dc_top_predictor_4x8_c
+void aom_highbd_dc_top_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_4x8 aom_highbd_dc_top_predictor_4x8_neon
void aom_highbd_dc_top_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x16 aom_highbd_dc_top_predictor_64x16_c
+void aom_highbd_dc_top_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x16 aom_highbd_dc_top_predictor_64x16_neon
void aom_highbd_dc_top_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x32 aom_highbd_dc_top_predictor_64x32_c
+void aom_highbd_dc_top_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x32 aom_highbd_dc_top_predictor_64x32_neon
void aom_highbd_dc_top_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_64x64 aom_highbd_dc_top_predictor_64x64_c
+void aom_highbd_dc_top_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_64x64 aom_highbd_dc_top_predictor_64x64_neon
void aom_highbd_dc_top_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x16 aom_highbd_dc_top_predictor_8x16_c
+void aom_highbd_dc_top_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x16 aom_highbd_dc_top_predictor_8x16_neon
void aom_highbd_dc_top_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x32 aom_highbd_dc_top_predictor_8x32_c
+void aom_highbd_dc_top_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x32 aom_highbd_dc_top_predictor_8x32_neon
void aom_highbd_dc_top_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x4 aom_highbd_dc_top_predictor_8x4_c
+void aom_highbd_dc_top_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x4 aom_highbd_dc_top_predictor_8x4_neon
void aom_highbd_dc_top_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_dc_top_predictor_8x8 aom_highbd_dc_top_predictor_8x8_c
+void aom_highbd_dc_top_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_dc_top_predictor_8x8 aom_highbd_dc_top_predictor_8x8_neon
void aom_highbd_dist_wtd_comp_avg_pred_c(uint8_t *comp_pred8, const uint8_t *pred8, int width, int height, const uint8_t *ref8, int ref_stride, const DIST_WTD_COMP_PARAMS *jcp_param);
#define aom_highbd_dist_wtd_comp_avg_pred aom_highbd_dist_wtd_comp_avg_pred_c
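[Editor's aside, not part of the generated header: the hunks above follow libaom's RTCD convention, where each generic symbol is a #define bound at configure time to the best available implementation, here the NEON variants. A minimal caller sketch under that assumption; the wrapper name, buffers, and include path are illustrative, and the call signature is taken from the declarations above.]

    #include <stddef.h>
    #include <stdint.h>
    #include "config/aom_dsp_rtcd.h" /* assumed include path for this generated header */

    /* Resolves at compile time to aom_highbd_dc_left_predictor_16x4_neon
     * through the macro mapping above; no function-pointer indirection. */
    static void predict_dc_left_16x4(uint16_t *dst, ptrdiff_t stride,
                                     const uint16_t *above,
                                     const uint16_t *left, int bd) {
      aom_highbd_dc_left_predictor_16x4(dst, stride, above, left, bd);
    }
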
@@ -2275,74 +2454,93 @@
unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
-void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_c
-
void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_c
+void aom_highbd_h_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_neon
void aom_highbd_h_predictor_16x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x32 aom_highbd_h_predictor_16x32_c
+void aom_highbd_h_predictor_16x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x32 aom_highbd_h_predictor_16x32_neon
void aom_highbd_h_predictor_16x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x4 aom_highbd_h_predictor_16x4_c
+void aom_highbd_h_predictor_16x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x4 aom_highbd_h_predictor_16x4_neon
void aom_highbd_h_predictor_16x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x64 aom_highbd_h_predictor_16x64_c
+void aom_highbd_h_predictor_16x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x64 aom_highbd_h_predictor_16x64_neon
void aom_highbd_h_predictor_16x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_16x8 aom_highbd_h_predictor_16x8_c
+void aom_highbd_h_predictor_16x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_16x8 aom_highbd_h_predictor_16x8_neon
void aom_highbd_h_predictor_32x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x16 aom_highbd_h_predictor_32x16_c
+void aom_highbd_h_predictor_32x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x16 aom_highbd_h_predictor_32x16_neon
void aom_highbd_h_predictor_32x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x32 aom_highbd_h_predictor_32x32_c
+void aom_highbd_h_predictor_32x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x32 aom_highbd_h_predictor_32x32_neon
void aom_highbd_h_predictor_32x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x64 aom_highbd_h_predictor_32x64_c
+void aom_highbd_h_predictor_32x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x64 aom_highbd_h_predictor_32x64_neon
void aom_highbd_h_predictor_32x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_32x8 aom_highbd_h_predictor_32x8_c
+void aom_highbd_h_predictor_32x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_32x8 aom_highbd_h_predictor_32x8_neon
void aom_highbd_h_predictor_4x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x16 aom_highbd_h_predictor_4x16_c
+void aom_highbd_h_predictor_4x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x16 aom_highbd_h_predictor_4x16_neon
void aom_highbd_h_predictor_4x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x4 aom_highbd_h_predictor_4x4_c
+void aom_highbd_h_predictor_4x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x4 aom_highbd_h_predictor_4x4_neon
void aom_highbd_h_predictor_4x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_4x8 aom_highbd_h_predictor_4x8_c
+void aom_highbd_h_predictor_4x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_4x8 aom_highbd_h_predictor_4x8_neon
void aom_highbd_h_predictor_64x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x16 aom_highbd_h_predictor_64x16_c
+void aom_highbd_h_predictor_64x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x16 aom_highbd_h_predictor_64x16_neon
void aom_highbd_h_predictor_64x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x32 aom_highbd_h_predictor_64x32_c
+void aom_highbd_h_predictor_64x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x32 aom_highbd_h_predictor_64x32_neon
void aom_highbd_h_predictor_64x64_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_64x64 aom_highbd_h_predictor_64x64_c
+void aom_highbd_h_predictor_64x64_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_64x64 aom_highbd_h_predictor_64x64_neon
void aom_highbd_h_predictor_8x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x16 aom_highbd_h_predictor_8x16_c
+void aom_highbd_h_predictor_8x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x16 aom_highbd_h_predictor_8x16_neon
void aom_highbd_h_predictor_8x32_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x32 aom_highbd_h_predictor_8x32_c
+void aom_highbd_h_predictor_8x32_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x32 aom_highbd_h_predictor_8x32_neon
void aom_highbd_h_predictor_8x4_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x4 aom_highbd_h_predictor_8x4_c
+void aom_highbd_h_predictor_8x4_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x4 aom_highbd_h_predictor_8x4_neon
void aom_highbd_h_predictor_8x8_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
-#define aom_highbd_h_predictor_8x8 aom_highbd_h_predictor_8x8_c
+void aom_highbd_h_predictor_8x8_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
+#define aom_highbd_h_predictor_8x8 aom_highbd_h_predictor_8x8_neon
void aom_highbd_hadamard_16x16_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_16x16 aom_highbd_hadamard_16x16_c
+void aom_highbd_hadamard_16x16_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_16x16 aom_highbd_hadamard_16x16_neon
void aom_highbd_hadamard_32x32_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_32x32 aom_highbd_hadamard_32x32_c
+void aom_highbd_hadamard_32x32_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_32x32 aom_highbd_hadamard_32x32_neon
void aom_highbd_hadamard_8x8_c(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
-#define aom_highbd_hadamard_8x8 aom_highbd_hadamard_8x8_c
+void aom_highbd_hadamard_8x8_neon(const int16_t *src_diff, ptrdiff_t src_stride, tran_low_t *coeff);
+#define aom_highbd_hadamard_8x8 aom_highbd_hadamard_8x8_neon
void aom_highbd_lpf_horizontal_14_c(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
void aom_highbd_lpf_horizontal_14_neon(uint16_t *s, int pitch, const uint8_t *blimit, const uint8_t *limit, const uint8_t *thresh, int bd);
@@ -2475,7 +2673,8 @@
#define aom_highbd_masked_sad8x8 aom_highbd_masked_sad8x8_c
void aom_highbd_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
-#define aom_highbd_minmax_8x8 aom_highbd_minmax_8x8_c
+void aom_highbd_minmax_8x8_neon(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
+#define aom_highbd_minmax_8x8 aom_highbd_minmax_8x8_neon
unsigned int aom_highbd_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
#define aom_highbd_obmc_sad128x128 aom_highbd_obmc_sad128x128_c
@@ -2776,7 +2975,8 @@
#define aom_highbd_quantize_b_adaptive aom_highbd_quantize_b_adaptive_neon
unsigned int aom_highbd_sad128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad128x128 aom_highbd_sad128x128_c
+unsigned int aom_highbd_sad128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x128 aom_highbd_sad128x128_neon
unsigned int aom_highbd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad128x128_avg aom_highbd_sad128x128_avg_c
@@ -2785,10 +2985,12 @@
#define aom_highbd_sad128x128x3d aom_highbd_sad128x128x3d_c
void aom_highbd_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad128x128x4d aom_highbd_sad128x128x4d_c
+void aom_highbd_sad128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x128x4d aom_highbd_sad128x128x4d_neon
unsigned int aom_highbd_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad128x64 aom_highbd_sad128x64_c
+unsigned int aom_highbd_sad128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad128x64 aom_highbd_sad128x64_neon
unsigned int aom_highbd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad128x64_avg aom_highbd_sad128x64_avg_c
@@ -2797,10 +2999,12 @@
#define aom_highbd_sad128x64x3d aom_highbd_sad128x64x3d_c
void aom_highbd_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad128x64x4d aom_highbd_sad128x64x4d_c
+void aom_highbd_sad128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad128x64x4d aom_highbd_sad128x64x4d_neon
unsigned int aom_highbd_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x16 aom_highbd_sad16x16_c
+unsigned int aom_highbd_sad16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x16 aom_highbd_sad16x16_neon
unsigned int aom_highbd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x16_avg aom_highbd_sad16x16_avg_c
@@ -2809,10 +3013,12 @@
#define aom_highbd_sad16x16x3d aom_highbd_sad16x16x3d_c
void aom_highbd_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x16x4d aom_highbd_sad16x16x4d_c
+void aom_highbd_sad16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x16x4d aom_highbd_sad16x16x4d_neon
unsigned int aom_highbd_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x32 aom_highbd_sad16x32_c
+unsigned int aom_highbd_sad16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x32 aom_highbd_sad16x32_neon
unsigned int aom_highbd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x32_avg aom_highbd_sad16x32_avg_c
@@ -2821,10 +3027,12 @@
#define aom_highbd_sad16x32x3d aom_highbd_sad16x32x3d_c
void aom_highbd_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x32x4d aom_highbd_sad16x32x4d_c
+void aom_highbd_sad16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x32x4d aom_highbd_sad16x32x4d_neon
unsigned int aom_highbd_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x4 aom_highbd_sad16x4_c
+unsigned int aom_highbd_sad16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x4 aom_highbd_sad16x4_neon
unsigned int aom_highbd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x4_avg aom_highbd_sad16x4_avg_c
@@ -2833,10 +3041,12 @@
#define aom_highbd_sad16x4x3d aom_highbd_sad16x4x3d_c
void aom_highbd_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x4x4d aom_highbd_sad16x4x4d_c
+void aom_highbd_sad16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x4x4d aom_highbd_sad16x4x4d_neon
unsigned int aom_highbd_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x64 aom_highbd_sad16x64_c
+unsigned int aom_highbd_sad16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x64 aom_highbd_sad16x64_neon
unsigned int aom_highbd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x64_avg aom_highbd_sad16x64_avg_c
@@ -2845,10 +3055,12 @@
#define aom_highbd_sad16x64x3d aom_highbd_sad16x64x3d_c
void aom_highbd_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x64x4d aom_highbd_sad16x64x4d_c
+void aom_highbd_sad16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x64x4d aom_highbd_sad16x64x4d_neon
unsigned int aom_highbd_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad16x8 aom_highbd_sad16x8_c
+unsigned int aom_highbd_sad16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad16x8 aom_highbd_sad16x8_neon
unsigned int aom_highbd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad16x8_avg aom_highbd_sad16x8_avg_c
@@ -2857,10 +3069,12 @@
#define aom_highbd_sad16x8x3d aom_highbd_sad16x8x3d_c
void aom_highbd_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad16x8x4d aom_highbd_sad16x8x4d_c
+void aom_highbd_sad16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad16x8x4d aom_highbd_sad16x8x4d_neon
unsigned int aom_highbd_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x16 aom_highbd_sad32x16_c
+unsigned int aom_highbd_sad32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x16 aom_highbd_sad32x16_neon
unsigned int aom_highbd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x16_avg aom_highbd_sad32x16_avg_c
@@ -2869,10 +3083,12 @@
#define aom_highbd_sad32x16x3d aom_highbd_sad32x16x3d_c
void aom_highbd_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x16x4d aom_highbd_sad32x16x4d_c
+void aom_highbd_sad32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x16x4d aom_highbd_sad32x16x4d_neon
unsigned int aom_highbd_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x32 aom_highbd_sad32x32_c
+unsigned int aom_highbd_sad32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x32 aom_highbd_sad32x32_neon
unsigned int aom_highbd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x32_avg aom_highbd_sad32x32_avg_c
@@ -2881,10 +3097,12 @@
#define aom_highbd_sad32x32x3d aom_highbd_sad32x32x3d_c
void aom_highbd_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x32x4d aom_highbd_sad32x32x4d_c
+void aom_highbd_sad32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x32x4d aom_highbd_sad32x32x4d_neon
unsigned int aom_highbd_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x64 aom_highbd_sad32x64_c
+unsigned int aom_highbd_sad32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x64 aom_highbd_sad32x64_neon
unsigned int aom_highbd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x64_avg aom_highbd_sad32x64_avg_c
@@ -2893,10 +3111,12 @@
#define aom_highbd_sad32x64x3d aom_highbd_sad32x64x3d_c
void aom_highbd_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x64x4d aom_highbd_sad32x64x4d_c
+void aom_highbd_sad32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x64x4d aom_highbd_sad32x64x4d_neon
unsigned int aom_highbd_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad32x8 aom_highbd_sad32x8_c
+unsigned int aom_highbd_sad32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad32x8 aom_highbd_sad32x8_neon
unsigned int aom_highbd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad32x8_avg aom_highbd_sad32x8_avg_c
@@ -2905,10 +3125,12 @@
#define aom_highbd_sad32x8x3d aom_highbd_sad32x8x3d_c
void aom_highbd_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad32x8x4d aom_highbd_sad32x8x4d_c
+void aom_highbd_sad32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad32x8x4d aom_highbd_sad32x8x4d_neon
unsigned int aom_highbd_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x16 aom_highbd_sad4x16_c
+unsigned int aom_highbd_sad4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x16 aom_highbd_sad4x16_neon
unsigned int aom_highbd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x16_avg aom_highbd_sad4x16_avg_c
@@ -2917,10 +3139,12 @@
#define aom_highbd_sad4x16x3d aom_highbd_sad4x16x3d_c
void aom_highbd_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x16x4d aom_highbd_sad4x16x4d_c
+void aom_highbd_sad4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x16x4d aom_highbd_sad4x16x4d_neon
unsigned int aom_highbd_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x4 aom_highbd_sad4x4_c
+unsigned int aom_highbd_sad4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x4 aom_highbd_sad4x4_neon
unsigned int aom_highbd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x4_avg aom_highbd_sad4x4_avg_c
@@ -2929,10 +3153,12 @@
#define aom_highbd_sad4x4x3d aom_highbd_sad4x4x3d_c
void aom_highbd_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x4x4d aom_highbd_sad4x4x4d_c
+void aom_highbd_sad4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x4x4d aom_highbd_sad4x4x4d_neon
unsigned int aom_highbd_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad4x8 aom_highbd_sad4x8_c
+unsigned int aom_highbd_sad4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad4x8 aom_highbd_sad4x8_neon
unsigned int aom_highbd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad4x8_avg aom_highbd_sad4x8_avg_c
@@ -2941,10 +3167,12 @@
#define aom_highbd_sad4x8x3d aom_highbd_sad4x8x3d_c
void aom_highbd_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad4x8x4d aom_highbd_sad4x8x4d_c
+void aom_highbd_sad4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad4x8x4d aom_highbd_sad4x8x4d_neon
unsigned int aom_highbd_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x128 aom_highbd_sad64x128_c
+unsigned int aom_highbd_sad64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x128 aom_highbd_sad64x128_neon
unsigned int aom_highbd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x128_avg aom_highbd_sad64x128_avg_c
@@ -2953,10 +3181,12 @@
#define aom_highbd_sad64x128x3d aom_highbd_sad64x128x3d_c
void aom_highbd_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x128x4d aom_highbd_sad64x128x4d_c
+void aom_highbd_sad64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x128x4d aom_highbd_sad64x128x4d_neon
unsigned int aom_highbd_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x16 aom_highbd_sad64x16_c
+unsigned int aom_highbd_sad64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x16 aom_highbd_sad64x16_neon
unsigned int aom_highbd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x16_avg aom_highbd_sad64x16_avg_c
@@ -2965,10 +3195,12 @@
#define aom_highbd_sad64x16x3d aom_highbd_sad64x16x3d_c
void aom_highbd_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x16x4d aom_highbd_sad64x16x4d_c
+void aom_highbd_sad64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x16x4d aom_highbd_sad64x16x4d_neon
unsigned int aom_highbd_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x32 aom_highbd_sad64x32_c
+unsigned int aom_highbd_sad64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x32 aom_highbd_sad64x32_neon
unsigned int aom_highbd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x32_avg aom_highbd_sad64x32_avg_c
@@ -2977,10 +3209,12 @@
#define aom_highbd_sad64x32x3d aom_highbd_sad64x32x3d_c
void aom_highbd_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x32x4d aom_highbd_sad64x32x4d_c
+void aom_highbd_sad64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x32x4d aom_highbd_sad64x32x4d_neon
unsigned int aom_highbd_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad64x64 aom_highbd_sad64x64_c
+unsigned int aom_highbd_sad64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad64x64 aom_highbd_sad64x64_neon
unsigned int aom_highbd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad64x64_avg aom_highbd_sad64x64_avg_c
@@ -2989,10 +3223,12 @@
#define aom_highbd_sad64x64x3d aom_highbd_sad64x64x3d_c
void aom_highbd_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad64x64x4d aom_highbd_sad64x64x4d_c
+void aom_highbd_sad64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad64x64x4d aom_highbd_sad64x64x4d_neon
unsigned int aom_highbd_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x16 aom_highbd_sad8x16_c
+unsigned int aom_highbd_sad8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x16 aom_highbd_sad8x16_neon
unsigned int aom_highbd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x16_avg aom_highbd_sad8x16_avg_c
@@ -3001,10 +3237,12 @@
#define aom_highbd_sad8x16x3d aom_highbd_sad8x16x3d_c
void aom_highbd_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x16x4d aom_highbd_sad8x16x4d_c
+void aom_highbd_sad8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x16x4d aom_highbd_sad8x16x4d_neon
unsigned int aom_highbd_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x32 aom_highbd_sad8x32_c
+unsigned int aom_highbd_sad8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x32 aom_highbd_sad8x32_neon
unsigned int aom_highbd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x32_avg aom_highbd_sad8x32_avg_c
@@ -3013,10 +3251,12 @@
#define aom_highbd_sad8x32x3d aom_highbd_sad8x32x3d_c
void aom_highbd_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x32x4d aom_highbd_sad8x32x4d_c
+void aom_highbd_sad8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x32x4d aom_highbd_sad8x32x4d_neon
unsigned int aom_highbd_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x4 aom_highbd_sad8x4_c
+unsigned int aom_highbd_sad8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x4 aom_highbd_sad8x4_neon
unsigned int aom_highbd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x4_avg aom_highbd_sad8x4_avg_c
@@ -3025,10 +3265,12 @@
#define aom_highbd_sad8x4x3d aom_highbd_sad8x4x3d_c
void aom_highbd_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x4x4d aom_highbd_sad8x4x4d_c
+void aom_highbd_sad8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x4x4d aom_highbd_sad8x4x4d_neon
unsigned int aom_highbd_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad8x8 aom_highbd_sad8x8_c
+unsigned int aom_highbd_sad8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad8x8 aom_highbd_sad8x8_neon
unsigned int aom_highbd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred);
#define aom_highbd_sad8x8_avg aom_highbd_sad8x8_avg_c
@@ -3037,139 +3279,184 @@
#define aom_highbd_sad8x8x3d aom_highbd_sad8x8x3d_c
void aom_highbd_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad8x8x4d aom_highbd_sad8x8x4d_c
+void aom_highbd_sad8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad8x8x4d aom_highbd_sad8x8x4d_neon
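[Editor's aside, illustrative only: the x4d SAD variants score one source block against four reference candidates per call, filling a four-entry SAD array, which is the batching that makes the NEON mappings above pay off in motion search. A hedged usage sketch; the wrapper name and buffers are hypothetical, and per libaom's high-bitdepth convention the uint8_t pointers wrap 16-bit sample buffers.]

    #include <stdint.h>
    #include "config/aom_dsp_rtcd.h" /* assumed include path for this generated header */

    /* Dispatches to aom_highbd_sad8x8x4d_neon via the macro above and
     * writes one SAD per reference candidate into sads[0..3]. */
    static void sad8x8_four_candidates(const uint8_t *src, int src_stride,
                                       const uint8_t *const refs[4],
                                       int ref_stride, uint32_t sads[4]) {
      aom_highbd_sad8x8x4d(src, src_stride, refs, ref_stride, sads);
    }
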
unsigned int aom_highbd_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_128x128 aom_highbd_sad_skip_128x128_c
+unsigned int aom_highbd_sad_skip_128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_128x128 aom_highbd_sad_skip_128x128_neon
void aom_highbd_sad_skip_128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_128x128x4d aom_highbd_sad_skip_128x128x4d_c
+void aom_highbd_sad_skip_128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_128x128x4d aom_highbd_sad_skip_128x128x4d_neon
unsigned int aom_highbd_sad_skip_128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_128x64 aom_highbd_sad_skip_128x64_c
+unsigned int aom_highbd_sad_skip_128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_128x64 aom_highbd_sad_skip_128x64_neon
void aom_highbd_sad_skip_128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_128x64x4d aom_highbd_sad_skip_128x64x4d_c
+void aom_highbd_sad_skip_128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_128x64x4d aom_highbd_sad_skip_128x64x4d_neon
unsigned int aom_highbd_sad_skip_16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x16 aom_highbd_sad_skip_16x16_c
+unsigned int aom_highbd_sad_skip_16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x16 aom_highbd_sad_skip_16x16_neon
void aom_highbd_sad_skip_16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x16x4d aom_highbd_sad_skip_16x16x4d_c
+void aom_highbd_sad_skip_16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x16x4d aom_highbd_sad_skip_16x16x4d_neon
unsigned int aom_highbd_sad_skip_16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x32 aom_highbd_sad_skip_16x32_c
+unsigned int aom_highbd_sad_skip_16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x32 aom_highbd_sad_skip_16x32_neon
void aom_highbd_sad_skip_16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x32x4d aom_highbd_sad_skip_16x32x4d_c
+void aom_highbd_sad_skip_16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x32x4d aom_highbd_sad_skip_16x32x4d_neon
unsigned int aom_highbd_sad_skip_16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x4 aom_highbd_sad_skip_16x4_c
+unsigned int aom_highbd_sad_skip_16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x4 aom_highbd_sad_skip_16x4_neon
void aom_highbd_sad_skip_16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x4x4d aom_highbd_sad_skip_16x4x4d_c
+void aom_highbd_sad_skip_16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x4x4d aom_highbd_sad_skip_16x4x4d_neon
unsigned int aom_highbd_sad_skip_16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x64 aom_highbd_sad_skip_16x64_c
+unsigned int aom_highbd_sad_skip_16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x64 aom_highbd_sad_skip_16x64_neon
void aom_highbd_sad_skip_16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x64x4d aom_highbd_sad_skip_16x64x4d_c
+void aom_highbd_sad_skip_16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x64x4d aom_highbd_sad_skip_16x64x4d_neon
unsigned int aom_highbd_sad_skip_16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_16x8 aom_highbd_sad_skip_16x8_c
+unsigned int aom_highbd_sad_skip_16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_16x8 aom_highbd_sad_skip_16x8_neon
void aom_highbd_sad_skip_16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_16x8x4d aom_highbd_sad_skip_16x8x4d_c
+void aom_highbd_sad_skip_16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_16x8x4d aom_highbd_sad_skip_16x8x4d_neon
unsigned int aom_highbd_sad_skip_32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x16 aom_highbd_sad_skip_32x16_c
+unsigned int aom_highbd_sad_skip_32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x16 aom_highbd_sad_skip_32x16_neon
void aom_highbd_sad_skip_32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x16x4d aom_highbd_sad_skip_32x16x4d_c
+void aom_highbd_sad_skip_32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x16x4d aom_highbd_sad_skip_32x16x4d_neon
unsigned int aom_highbd_sad_skip_32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x32 aom_highbd_sad_skip_32x32_c
+unsigned int aom_highbd_sad_skip_32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x32 aom_highbd_sad_skip_32x32_neon
void aom_highbd_sad_skip_32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x32x4d aom_highbd_sad_skip_32x32x4d_c
+void aom_highbd_sad_skip_32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x32x4d aom_highbd_sad_skip_32x32x4d_neon
unsigned int aom_highbd_sad_skip_32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x64 aom_highbd_sad_skip_32x64_c
+unsigned int aom_highbd_sad_skip_32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x64 aom_highbd_sad_skip_32x64_neon
void aom_highbd_sad_skip_32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x64x4d aom_highbd_sad_skip_32x64x4d_c
+void aom_highbd_sad_skip_32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x64x4d aom_highbd_sad_skip_32x64x4d_neon
unsigned int aom_highbd_sad_skip_32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_32x8 aom_highbd_sad_skip_32x8_c
+unsigned int aom_highbd_sad_skip_32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_32x8 aom_highbd_sad_skip_32x8_neon
void aom_highbd_sad_skip_32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_32x8x4d aom_highbd_sad_skip_32x8x4d_c
+void aom_highbd_sad_skip_32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_32x8x4d aom_highbd_sad_skip_32x8x4d_neon
unsigned int aom_highbd_sad_skip_4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x16 aom_highbd_sad_skip_4x16_c
+unsigned int aom_highbd_sad_skip_4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x16 aom_highbd_sad_skip_4x16_neon
void aom_highbd_sad_skip_4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x16x4d aom_highbd_sad_skip_4x16x4d_c
+void aom_highbd_sad_skip_4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x16x4d aom_highbd_sad_skip_4x16x4d_neon
unsigned int aom_highbd_sad_skip_4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x4 aom_highbd_sad_skip_4x4_c
+unsigned int aom_highbd_sad_skip_4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x4 aom_highbd_sad_skip_4x4_neon
void aom_highbd_sad_skip_4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x4x4d aom_highbd_sad_skip_4x4x4d_c
+void aom_highbd_sad_skip_4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x4x4d aom_highbd_sad_skip_4x4x4d_neon
unsigned int aom_highbd_sad_skip_4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_4x8 aom_highbd_sad_skip_4x8_c
+unsigned int aom_highbd_sad_skip_4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_4x8 aom_highbd_sad_skip_4x8_neon
void aom_highbd_sad_skip_4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_4x8x4d aom_highbd_sad_skip_4x8x4d_c
+void aom_highbd_sad_skip_4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_4x8x4d aom_highbd_sad_skip_4x8x4d_neon
unsigned int aom_highbd_sad_skip_64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x128 aom_highbd_sad_skip_64x128_c
+unsigned int aom_highbd_sad_skip_64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x128 aom_highbd_sad_skip_64x128_neon
void aom_highbd_sad_skip_64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x128x4d aom_highbd_sad_skip_64x128x4d_c
+void aom_highbd_sad_skip_64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x128x4d aom_highbd_sad_skip_64x128x4d_neon
unsigned int aom_highbd_sad_skip_64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x16 aom_highbd_sad_skip_64x16_c
+unsigned int aom_highbd_sad_skip_64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x16 aom_highbd_sad_skip_64x16_neon
void aom_highbd_sad_skip_64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x16x4d aom_highbd_sad_skip_64x16x4d_c
+void aom_highbd_sad_skip_64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x16x4d aom_highbd_sad_skip_64x16x4d_neon
unsigned int aom_highbd_sad_skip_64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x32 aom_highbd_sad_skip_64x32_c
+unsigned int aom_highbd_sad_skip_64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x32 aom_highbd_sad_skip_64x32_neon
void aom_highbd_sad_skip_64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x32x4d aom_highbd_sad_skip_64x32x4d_c
+void aom_highbd_sad_skip_64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x32x4d aom_highbd_sad_skip_64x32x4d_neon
unsigned int aom_highbd_sad_skip_64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_64x64 aom_highbd_sad_skip_64x64_c
+unsigned int aom_highbd_sad_skip_64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_64x64 aom_highbd_sad_skip_64x64_neon
void aom_highbd_sad_skip_64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_64x64x4d aom_highbd_sad_skip_64x64x4d_c
+void aom_highbd_sad_skip_64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_64x64x4d aom_highbd_sad_skip_64x64x4d_neon
unsigned int aom_highbd_sad_skip_8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x16 aom_highbd_sad_skip_8x16_c
+unsigned int aom_highbd_sad_skip_8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x16 aom_highbd_sad_skip_8x16_neon
void aom_highbd_sad_skip_8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x16x4d aom_highbd_sad_skip_8x16x4d_c
+void aom_highbd_sad_skip_8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x16x4d aom_highbd_sad_skip_8x16x4d_neon
unsigned int aom_highbd_sad_skip_8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x32 aom_highbd_sad_skip_8x32_c
+unsigned int aom_highbd_sad_skip_8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x32 aom_highbd_sad_skip_8x32_neon
void aom_highbd_sad_skip_8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x32x4d aom_highbd_sad_skip_8x32x4d_c
+void aom_highbd_sad_skip_8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x32x4d aom_highbd_sad_skip_8x32x4d_neon
unsigned int aom_highbd_sad_skip_8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x4 aom_highbd_sad_skip_8x4_c
+unsigned int aom_highbd_sad_skip_8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x4 aom_highbd_sad_skip_8x4_neon
void aom_highbd_sad_skip_8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x4x4d aom_highbd_sad_skip_8x4x4d_c
+void aom_highbd_sad_skip_8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x4x4d aom_highbd_sad_skip_8x4x4d_neon
unsigned int aom_highbd_sad_skip_8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_highbd_sad_skip_8x8 aom_highbd_sad_skip_8x8_c
+unsigned int aom_highbd_sad_skip_8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_highbd_sad_skip_8x8 aom_highbd_sad_skip_8x8_neon
void aom_highbd_sad_skip_8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
-#define aom_highbd_sad_skip_8x8x4d aom_highbd_sad_skip_8x8x4d_c
+void aom_highbd_sad_skip_8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array);
+#define aom_highbd_sad_skip_8x8x4d aom_highbd_sad_skip_8x8x4d_neon
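The _skip variants above trade accuracy for speed: they sample only every other row of the block and double the sum, and the x4d forms score four candidate references in one call. A minimal sketch of that reference behavior, assuming libaom's usual conventions (the names below are illustrative, not library API; the high-bitdepth kernels read 16-bit samples behind the uint8_t * casts):

    #include <stdint.h>
    #include <stdlib.h>

    /* Skip-row SAD: accumulate even rows only, then scale back up. */
    static unsigned int sad_skip_wxh(const uint16_t *src, int src_stride,
                                     const uint16_t *ref, int ref_stride,
                                     int w, int h) {
      unsigned int sad = 0;
      for (int y = 0; y < h; y += 2)
        for (int x = 0; x < w; ++x)
          sad += abs((int)src[y * src_stride + x] -
                     (int)ref[y * ref_stride + x]);
      return 2 * sad;  /* compensate for the skipped rows */
    }

    /* x4d form: one call scores four candidate reference blocks. */
    static void sad_skip_wxh_x4d(const uint16_t *src, int src_stride,
                                 const uint16_t *const ref[4], int ref_stride,
                                 int w, int h, uint32_t sad_array[4]) {
      for (int i = 0; i < 4; ++i)
        sad_array[i] = sad_skip_wxh(src, src_stride, ref[i], ref_stride, w, h);
    }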
void aom_highbd_smooth_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_smooth_h_predictor_16x16_neon(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
@@ -3610,205 +3897,272 @@
#define aom_lpf_vertical_8_quad aom_lpf_vertical_8_quad_neon
unsigned int aom_masked_sad128x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad128x128 aom_masked_sad128x128_c
+unsigned int aom_masked_sad128x128_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x128 aom_masked_sad128x128_neon
void aom_masked_sad128x128x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad128x128x4d aom_masked_sad128x128x4d_c
+void aom_masked_sad128x128x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad128x128x4d aom_masked_sad128x128x4d_neon
unsigned int aom_masked_sad128x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad128x64 aom_masked_sad128x64_c
+unsigned int aom_masked_sad128x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad128x64 aom_masked_sad128x64_neon
void aom_masked_sad128x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad128x64x4d aom_masked_sad128x64x4d_c
+void aom_masked_sad128x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad128x64x4d aom_masked_sad128x64x4d_neon
unsigned int aom_masked_sad16x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x16 aom_masked_sad16x16_c
+unsigned int aom_masked_sad16x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x16 aom_masked_sad16x16_neon
void aom_masked_sad16x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x16x4d aom_masked_sad16x16x4d_c
+void aom_masked_sad16x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x16x4d aom_masked_sad16x16x4d_neon
unsigned int aom_masked_sad16x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x32 aom_masked_sad16x32_c
+unsigned int aom_masked_sad16x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x32 aom_masked_sad16x32_neon
void aom_masked_sad16x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x32x4d aom_masked_sad16x32x4d_c
+void aom_masked_sad16x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x32x4d aom_masked_sad16x32x4d_neon
unsigned int aom_masked_sad16x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x4 aom_masked_sad16x4_c
+unsigned int aom_masked_sad16x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x4 aom_masked_sad16x4_neon
void aom_masked_sad16x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x4x4d aom_masked_sad16x4x4d_c
+void aom_masked_sad16x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x4x4d aom_masked_sad16x4x4d_neon
unsigned int aom_masked_sad16x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x64 aom_masked_sad16x64_c
+unsigned int aom_masked_sad16x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x64 aom_masked_sad16x64_neon
void aom_masked_sad16x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x64x4d aom_masked_sad16x64x4d_c
+void aom_masked_sad16x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x64x4d aom_masked_sad16x64x4d_neon
unsigned int aom_masked_sad16x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad16x8 aom_masked_sad16x8_c
+unsigned int aom_masked_sad16x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad16x8 aom_masked_sad16x8_neon
void aom_masked_sad16x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad16x8x4d aom_masked_sad16x8x4d_c
+void aom_masked_sad16x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad16x8x4d aom_masked_sad16x8x4d_neon
unsigned int aom_masked_sad32x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x16 aom_masked_sad32x16_c
+unsigned int aom_masked_sad32x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x16 aom_masked_sad32x16_neon
void aom_masked_sad32x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x16x4d aom_masked_sad32x16x4d_c
+void aom_masked_sad32x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x16x4d aom_masked_sad32x16x4d_neon
unsigned int aom_masked_sad32x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x32 aom_masked_sad32x32_c
+unsigned int aom_masked_sad32x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x32 aom_masked_sad32x32_neon
void aom_masked_sad32x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x32x4d aom_masked_sad32x32x4d_c
+void aom_masked_sad32x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x32x4d aom_masked_sad32x32x4d_neon
unsigned int aom_masked_sad32x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x64 aom_masked_sad32x64_c
+unsigned int aom_masked_sad32x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x64 aom_masked_sad32x64_neon
void aom_masked_sad32x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x64x4d aom_masked_sad32x64x4d_c
+void aom_masked_sad32x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x64x4d aom_masked_sad32x64x4d_neon
unsigned int aom_masked_sad32x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad32x8 aom_masked_sad32x8_c
+unsigned int aom_masked_sad32x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad32x8 aom_masked_sad32x8_neon
void aom_masked_sad32x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad32x8x4d aom_masked_sad32x8x4d_c
+void aom_masked_sad32x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad32x8x4d aom_masked_sad32x8x4d_neon
unsigned int aom_masked_sad4x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x16 aom_masked_sad4x16_c
+unsigned int aom_masked_sad4x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x16 aom_masked_sad4x16_neon
void aom_masked_sad4x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x16x4d aom_masked_sad4x16x4d_c
+void aom_masked_sad4x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x16x4d aom_masked_sad4x16x4d_neon
unsigned int aom_masked_sad4x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x4 aom_masked_sad4x4_c
+unsigned int aom_masked_sad4x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x4 aom_masked_sad4x4_neon
void aom_masked_sad4x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x4x4d aom_masked_sad4x4x4d_c
+void aom_masked_sad4x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x4x4d aom_masked_sad4x4x4d_neon
unsigned int aom_masked_sad4x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad4x8 aom_masked_sad4x8_c
+unsigned int aom_masked_sad4x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad4x8 aom_masked_sad4x8_neon
void aom_masked_sad4x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad4x8x4d aom_masked_sad4x8x4d_c
+void aom_masked_sad4x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad4x8x4d aom_masked_sad4x8x4d_neon
unsigned int aom_masked_sad64x128_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x128 aom_masked_sad64x128_c
+unsigned int aom_masked_sad64x128_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x128 aom_masked_sad64x128_neon
void aom_masked_sad64x128x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x128x4d aom_masked_sad64x128x4d_c
+void aom_masked_sad64x128x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x128x4d aom_masked_sad64x128x4d_neon
unsigned int aom_masked_sad64x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x16 aom_masked_sad64x16_c
+unsigned int aom_masked_sad64x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x16 aom_masked_sad64x16_neon
void aom_masked_sad64x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x16x4d aom_masked_sad64x16x4d_c
+void aom_masked_sad64x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x16x4d aom_masked_sad64x16x4d_neon
unsigned int aom_masked_sad64x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x32 aom_masked_sad64x32_c
+unsigned int aom_masked_sad64x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x32 aom_masked_sad64x32_neon
void aom_masked_sad64x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x32x4d aom_masked_sad64x32x4d_c
+void aom_masked_sad64x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x32x4d aom_masked_sad64x32x4d_neon
unsigned int aom_masked_sad64x64_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad64x64 aom_masked_sad64x64_c
+unsigned int aom_masked_sad64x64_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad64x64 aom_masked_sad64x64_neon
void aom_masked_sad64x64x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad64x64x4d aom_masked_sad64x64x4d_c
+void aom_masked_sad64x64x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad64x64x4d aom_masked_sad64x64x4d_neon
unsigned int aom_masked_sad8x16_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x16 aom_masked_sad8x16_c
+unsigned int aom_masked_sad8x16_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x16 aom_masked_sad8x16_neon
void aom_masked_sad8x16x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x16x4d aom_masked_sad8x16x4d_c
+void aom_masked_sad8x16x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x16x4d aom_masked_sad8x16x4d_neon
unsigned int aom_masked_sad8x32_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x32 aom_masked_sad8x32_c
+unsigned int aom_masked_sad8x32_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x32 aom_masked_sad8x32_neon
void aom_masked_sad8x32x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x32x4d aom_masked_sad8x32x4d_c
+void aom_masked_sad8x32x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x32x4d aom_masked_sad8x32x4d_neon
unsigned int aom_masked_sad8x4_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x4 aom_masked_sad8x4_c
+unsigned int aom_masked_sad8x4_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x4 aom_masked_sad8x4_neon
void aom_masked_sad8x4x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x4x4d aom_masked_sad8x4x4d_c
+void aom_masked_sad8x4x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x4x4d aom_masked_sad8x4x4d_neon
unsigned int aom_masked_sad8x8_c(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
-#define aom_masked_sad8x8 aom_masked_sad8x8_c
+unsigned int aom_masked_sad8x8_neon(const uint8_t *src, int src_stride, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask);
+#define aom_masked_sad8x8 aom_masked_sad8x8_neon
void aom_masked_sad8x8x4d_c(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
-#define aom_masked_sad8x8x4d aom_masked_sad8x8x4d_c
+void aom_masked_sad8x8x4d_neon(const uint8_t *src, int src_stride, const uint8_t *ref[4], int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned sads[4]);
+#define aom_masked_sad8x8x4d aom_masked_sad8x8x4d_neon
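For the masked kernels above, the prediction being scored is a per-pixel blend of ref and second_pred under a 0..64 weight mask (the library's a64 blend), with invert_mask swapping which operand takes the mask weight. A hedged sketch of that reference behavior (illustrative name; second_pred is assumed packed at block-width stride, as is usual in libaom):

    #include <stdint.h>
    #include <stdlib.h>

    static unsigned int masked_sad_wxh(const uint8_t *src, int src_stride,
                                       const uint8_t *ref, int ref_stride,
                                       const uint8_t *second_pred,
                                       const uint8_t *msk, int msk_stride,
                                       int invert_mask, int w, int h) {
      unsigned int sad = 0;
      for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
          const int m = msk[y * msk_stride + x];  /* weight in 0..64 */
          const int a = ref[y * ref_stride + x];
          const int b = second_pred[y * w + x];   /* block-width stride */
          /* a64 blend: pred = (m*a + (64-m)*b + 32) >> 6, operands
             swapped when invert_mask is set. */
          const int pred = invert_mask ? (m * b + (64 - m) * a + 32) >> 6
                                       : (m * a + (64 - m) * b + 32) >> 6;
          sad += abs(src[y * src_stride + x] - pred);
        }
      }
      return sad;
    }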
unsigned int aom_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance128x128 aom_masked_sub_pixel_variance128x128_c
+unsigned int aom_masked_sub_pixel_variance128x128_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x128 aom_masked_sub_pixel_variance128x128_neon
unsigned int aom_masked_sub_pixel_variance128x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance128x64 aom_masked_sub_pixel_variance128x64_c
+unsigned int aom_masked_sub_pixel_variance128x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance128x64 aom_masked_sub_pixel_variance128x64_neon
unsigned int aom_masked_sub_pixel_variance16x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x16 aom_masked_sub_pixel_variance16x16_c
+unsigned int aom_masked_sub_pixel_variance16x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x16 aom_masked_sub_pixel_variance16x16_neon
unsigned int aom_masked_sub_pixel_variance16x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x32 aom_masked_sub_pixel_variance16x32_c
+unsigned int aom_masked_sub_pixel_variance16x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x32 aom_masked_sub_pixel_variance16x32_neon
unsigned int aom_masked_sub_pixel_variance16x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x4 aom_masked_sub_pixel_variance16x4_c
+unsigned int aom_masked_sub_pixel_variance16x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x4 aom_masked_sub_pixel_variance16x4_neon
unsigned int aom_masked_sub_pixel_variance16x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x64 aom_masked_sub_pixel_variance16x64_c
+unsigned int aom_masked_sub_pixel_variance16x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x64 aom_masked_sub_pixel_variance16x64_neon
unsigned int aom_masked_sub_pixel_variance16x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance16x8 aom_masked_sub_pixel_variance16x8_c
+unsigned int aom_masked_sub_pixel_variance16x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance16x8 aom_masked_sub_pixel_variance16x8_neon
unsigned int aom_masked_sub_pixel_variance32x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x16 aom_masked_sub_pixel_variance32x16_c
+unsigned int aom_masked_sub_pixel_variance32x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x16 aom_masked_sub_pixel_variance32x16_neon
unsigned int aom_masked_sub_pixel_variance32x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x32 aom_masked_sub_pixel_variance32x32_c
+unsigned int aom_masked_sub_pixel_variance32x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x32 aom_masked_sub_pixel_variance32x32_neon
unsigned int aom_masked_sub_pixel_variance32x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x64 aom_masked_sub_pixel_variance32x64_c
+unsigned int aom_masked_sub_pixel_variance32x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x64 aom_masked_sub_pixel_variance32x64_neon
unsigned int aom_masked_sub_pixel_variance32x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance32x8 aom_masked_sub_pixel_variance32x8_c
+unsigned int aom_masked_sub_pixel_variance32x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance32x8 aom_masked_sub_pixel_variance32x8_neon
unsigned int aom_masked_sub_pixel_variance4x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x16 aom_masked_sub_pixel_variance4x16_c
+unsigned int aom_masked_sub_pixel_variance4x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x16 aom_masked_sub_pixel_variance4x16_neon
unsigned int aom_masked_sub_pixel_variance4x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x4 aom_masked_sub_pixel_variance4x4_c
+unsigned int aom_masked_sub_pixel_variance4x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x4 aom_masked_sub_pixel_variance4x4_neon
unsigned int aom_masked_sub_pixel_variance4x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance4x8 aom_masked_sub_pixel_variance4x8_c
+unsigned int aom_masked_sub_pixel_variance4x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance4x8 aom_masked_sub_pixel_variance4x8_neon
unsigned int aom_masked_sub_pixel_variance64x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x128 aom_masked_sub_pixel_variance64x128_c
+unsigned int aom_masked_sub_pixel_variance64x128_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x128 aom_masked_sub_pixel_variance64x128_neon
unsigned int aom_masked_sub_pixel_variance64x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x16 aom_masked_sub_pixel_variance64x16_c
+unsigned int aom_masked_sub_pixel_variance64x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x16 aom_masked_sub_pixel_variance64x16_neon
unsigned int aom_masked_sub_pixel_variance64x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x32 aom_masked_sub_pixel_variance64x32_c
+unsigned int aom_masked_sub_pixel_variance64x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x32 aom_masked_sub_pixel_variance64x32_neon
unsigned int aom_masked_sub_pixel_variance64x64_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance64x64 aom_masked_sub_pixel_variance64x64_c
+unsigned int aom_masked_sub_pixel_variance64x64_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance64x64 aom_masked_sub_pixel_variance64x64_neon
unsigned int aom_masked_sub_pixel_variance8x16_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x16 aom_masked_sub_pixel_variance8x16_c
+unsigned int aom_masked_sub_pixel_variance8x16_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x16 aom_masked_sub_pixel_variance8x16_neon
unsigned int aom_masked_sub_pixel_variance8x32_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x32 aom_masked_sub_pixel_variance8x32_c
+unsigned int aom_masked_sub_pixel_variance8x32_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x32 aom_masked_sub_pixel_variance8x32_neon
unsigned int aom_masked_sub_pixel_variance8x4_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x4 aom_masked_sub_pixel_variance8x4_c
+unsigned int aom_masked_sub_pixel_variance8x4_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x4 aom_masked_sub_pixel_variance8x4_neon
unsigned int aom_masked_sub_pixel_variance8x8_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
-#define aom_masked_sub_pixel_variance8x8 aom_masked_sub_pixel_variance8x8_c
+unsigned int aom_masked_sub_pixel_variance8x8_neon(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
+#define aom_masked_sub_pixel_variance8x8 aom_masked_sub_pixel_variance8x8_neon
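The masked sub-pixel variance kernels above follow the same outline: filter the reference bilinearly at the (xoffset, yoffset) sub-pel position, blend with second_pred under the mask, then measure variance against src. The closing step is common to every variance kernel in this header and is sketched below with an illustrative helper name:

    #include <stdint.h>

    /* Given the squared-error sum (SSE) and signed error sum over a
       w*h block, variance = SSE - sum^2 / (w*h); SSE is also returned
       through the out-parameter, matching the kernel signatures. */
    static unsigned int variance_from_sums(int64_t sse, int64_t sum,
                                           int w, int h,
                                           unsigned int *sse_out) {
      *sse_out = (unsigned int)sse;
      return (unsigned int)(sse - ((sum * sum) / (w * h)));
    }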
void aom_minmax_8x8_c(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
-#define aom_minmax_8x8 aom_minmax_8x8_c
+void aom_minmax_8x8_neon(const uint8_t *s, int p, const uint8_t *d, int dp, int *min, int *max);
+#define aom_minmax_8x8 aom_minmax_8x8_neon
unsigned int aom_mse16x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
unsigned int aom_mse16x16_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int recon_stride, unsigned int *sse);
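Entries such as aom_mse16x16 above already carried NEON bindings before this change; the hunks in this diff extend the same pattern to the remaining kernels. With a single extension enabled, the generated header binds each entry point straight to its _neon specialization via #define; when several extensions are compiled in, it instead declares a function pointer assigned at runtime from CPU flags. A hedged, self-contained sketch of that second shape (HAS_NEON and cpu_caps() are stand-ins for the aom_ports equivalents, and static replaces the header's RTCD_EXTERN):

    #include <stdint.h>

    #define HAS_NEON 0x01
    static int cpu_caps(void) { return HAS_NEON; }  /* stand-in probe */

    unsigned int aom_mse16x16_c(const uint8_t *src, int src_stride,
                                const uint8_t *ref, int ref_stride,
                                unsigned int *sse);
    unsigned int aom_mse16x16_neon(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   unsigned int *sse);

    /* Runtime-dispatched entry point: a pointer instead of a #define. */
    static unsigned int (*aom_mse16x16)(const uint8_t *, int,
                                        const uint8_t *, int,
                                        unsigned int *);

    static void setup_rtcd_internal(void) {
      aom_mse16x16 = aom_mse16x16_c;  /* portable fallback */
      if (cpu_caps() & HAS_NEON) aom_mse16x16 = aom_mse16x16_neon;
    }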
@@ -3837,202 +4191,268 @@
#define aom_mse_wxh_16bit_highbd aom_mse_wxh_16bit_highbd_c
unsigned int aom_obmc_sad128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad128x128 aom_obmc_sad128x128_c
+unsigned int aom_obmc_sad128x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x128 aom_obmc_sad128x128_neon
unsigned int aom_obmc_sad128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad128x64 aom_obmc_sad128x64_c
+unsigned int aom_obmc_sad128x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad128x64 aom_obmc_sad128x64_neon
unsigned int aom_obmc_sad16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x16 aom_obmc_sad16x16_c
+unsigned int aom_obmc_sad16x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x16 aom_obmc_sad16x16_neon
unsigned int aom_obmc_sad16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x32 aom_obmc_sad16x32_c
+unsigned int aom_obmc_sad16x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x32 aom_obmc_sad16x32_neon
unsigned int aom_obmc_sad16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x4 aom_obmc_sad16x4_c
+unsigned int aom_obmc_sad16x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x4 aom_obmc_sad16x4_neon
unsigned int aom_obmc_sad16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x64 aom_obmc_sad16x64_c
+unsigned int aom_obmc_sad16x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x64 aom_obmc_sad16x64_neon
unsigned int aom_obmc_sad16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad16x8 aom_obmc_sad16x8_c
+unsigned int aom_obmc_sad16x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad16x8 aom_obmc_sad16x8_neon
unsigned int aom_obmc_sad32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x16 aom_obmc_sad32x16_c
+unsigned int aom_obmc_sad32x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x16 aom_obmc_sad32x16_neon
unsigned int aom_obmc_sad32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x32 aom_obmc_sad32x32_c
+unsigned int aom_obmc_sad32x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x32 aom_obmc_sad32x32_neon
unsigned int aom_obmc_sad32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x64 aom_obmc_sad32x64_c
+unsigned int aom_obmc_sad32x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x64 aom_obmc_sad32x64_neon
unsigned int aom_obmc_sad32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad32x8 aom_obmc_sad32x8_c
+unsigned int aom_obmc_sad32x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad32x8 aom_obmc_sad32x8_neon
unsigned int aom_obmc_sad4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x16 aom_obmc_sad4x16_c
+unsigned int aom_obmc_sad4x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x16 aom_obmc_sad4x16_neon
unsigned int aom_obmc_sad4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x4 aom_obmc_sad4x4_c
+unsigned int aom_obmc_sad4x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x4 aom_obmc_sad4x4_neon
unsigned int aom_obmc_sad4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad4x8 aom_obmc_sad4x8_c
+unsigned int aom_obmc_sad4x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad4x8 aom_obmc_sad4x8_neon
unsigned int aom_obmc_sad64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x128 aom_obmc_sad64x128_c
+unsigned int aom_obmc_sad64x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x128 aom_obmc_sad64x128_neon
unsigned int aom_obmc_sad64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x16 aom_obmc_sad64x16_c
+unsigned int aom_obmc_sad64x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x16 aom_obmc_sad64x16_neon
unsigned int aom_obmc_sad64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x32 aom_obmc_sad64x32_c
+unsigned int aom_obmc_sad64x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x32 aom_obmc_sad64x32_neon
unsigned int aom_obmc_sad64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad64x64 aom_obmc_sad64x64_c
+unsigned int aom_obmc_sad64x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad64x64 aom_obmc_sad64x64_neon
unsigned int aom_obmc_sad8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x16 aom_obmc_sad8x16_c
+unsigned int aom_obmc_sad8x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x16 aom_obmc_sad8x16_neon
unsigned int aom_obmc_sad8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x32 aom_obmc_sad8x32_c
+unsigned int aom_obmc_sad8x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x32 aom_obmc_sad8x32_neon
unsigned int aom_obmc_sad8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x4 aom_obmc_sad8x4_c
+unsigned int aom_obmc_sad8x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x4 aom_obmc_sad8x4_neon
unsigned int aom_obmc_sad8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
-#define aom_obmc_sad8x8 aom_obmc_sad8x8_c
+unsigned int aom_obmc_sad8x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask);
+#define aom_obmc_sad8x8 aom_obmc_sad8x8_neon
unsigned int aom_obmc_sub_pixel_variance128x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance128x128 aom_obmc_sub_pixel_variance128x128_c
+unsigned int aom_obmc_sub_pixel_variance128x128_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x128 aom_obmc_sub_pixel_variance128x128_neon
unsigned int aom_obmc_sub_pixel_variance128x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance128x64 aom_obmc_sub_pixel_variance128x64_c
+unsigned int aom_obmc_sub_pixel_variance128x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance128x64 aom_obmc_sub_pixel_variance128x64_neon
unsigned int aom_obmc_sub_pixel_variance16x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x16 aom_obmc_sub_pixel_variance16x16_c
+unsigned int aom_obmc_sub_pixel_variance16x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x16 aom_obmc_sub_pixel_variance16x16_neon
unsigned int aom_obmc_sub_pixel_variance16x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x32 aom_obmc_sub_pixel_variance16x32_c
+unsigned int aom_obmc_sub_pixel_variance16x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x32 aom_obmc_sub_pixel_variance16x32_neon
unsigned int aom_obmc_sub_pixel_variance16x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x4 aom_obmc_sub_pixel_variance16x4_c
+unsigned int aom_obmc_sub_pixel_variance16x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x4 aom_obmc_sub_pixel_variance16x4_neon
unsigned int aom_obmc_sub_pixel_variance16x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x64 aom_obmc_sub_pixel_variance16x64_c
+unsigned int aom_obmc_sub_pixel_variance16x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x64 aom_obmc_sub_pixel_variance16x64_neon
unsigned int aom_obmc_sub_pixel_variance16x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance16x8 aom_obmc_sub_pixel_variance16x8_c
+unsigned int aom_obmc_sub_pixel_variance16x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance16x8 aom_obmc_sub_pixel_variance16x8_neon
unsigned int aom_obmc_sub_pixel_variance32x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x16 aom_obmc_sub_pixel_variance32x16_c
+unsigned int aom_obmc_sub_pixel_variance32x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x16 aom_obmc_sub_pixel_variance32x16_neon
unsigned int aom_obmc_sub_pixel_variance32x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x32 aom_obmc_sub_pixel_variance32x32_c
+unsigned int aom_obmc_sub_pixel_variance32x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x32 aom_obmc_sub_pixel_variance32x32_neon
unsigned int aom_obmc_sub_pixel_variance32x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x64 aom_obmc_sub_pixel_variance32x64_c
+unsigned int aom_obmc_sub_pixel_variance32x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x64 aom_obmc_sub_pixel_variance32x64_neon
unsigned int aom_obmc_sub_pixel_variance32x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance32x8 aom_obmc_sub_pixel_variance32x8_c
+unsigned int aom_obmc_sub_pixel_variance32x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance32x8 aom_obmc_sub_pixel_variance32x8_neon
unsigned int aom_obmc_sub_pixel_variance4x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x16 aom_obmc_sub_pixel_variance4x16_c
+unsigned int aom_obmc_sub_pixel_variance4x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x16 aom_obmc_sub_pixel_variance4x16_neon
unsigned int aom_obmc_sub_pixel_variance4x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x4 aom_obmc_sub_pixel_variance4x4_c
+unsigned int aom_obmc_sub_pixel_variance4x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x4 aom_obmc_sub_pixel_variance4x4_neon
unsigned int aom_obmc_sub_pixel_variance4x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance4x8 aom_obmc_sub_pixel_variance4x8_c
+unsigned int aom_obmc_sub_pixel_variance4x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance4x8 aom_obmc_sub_pixel_variance4x8_neon
unsigned int aom_obmc_sub_pixel_variance64x128_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x128 aom_obmc_sub_pixel_variance64x128_c
+unsigned int aom_obmc_sub_pixel_variance64x128_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x128 aom_obmc_sub_pixel_variance64x128_neon
unsigned int aom_obmc_sub_pixel_variance64x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x16 aom_obmc_sub_pixel_variance64x16_c
+unsigned int aom_obmc_sub_pixel_variance64x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x16 aom_obmc_sub_pixel_variance64x16_neon
unsigned int aom_obmc_sub_pixel_variance64x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x32 aom_obmc_sub_pixel_variance64x32_c
+unsigned int aom_obmc_sub_pixel_variance64x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x32 aom_obmc_sub_pixel_variance64x32_neon
unsigned int aom_obmc_sub_pixel_variance64x64_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance64x64 aom_obmc_sub_pixel_variance64x64_c
+unsigned int aom_obmc_sub_pixel_variance64x64_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance64x64 aom_obmc_sub_pixel_variance64x64_neon
unsigned int aom_obmc_sub_pixel_variance8x16_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x16 aom_obmc_sub_pixel_variance8x16_c
+unsigned int aom_obmc_sub_pixel_variance8x16_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x16 aom_obmc_sub_pixel_variance8x16_neon
unsigned int aom_obmc_sub_pixel_variance8x32_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x32 aom_obmc_sub_pixel_variance8x32_c
+unsigned int aom_obmc_sub_pixel_variance8x32_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x32 aom_obmc_sub_pixel_variance8x32_neon
unsigned int aom_obmc_sub_pixel_variance8x4_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x4 aom_obmc_sub_pixel_variance8x4_c
+unsigned int aom_obmc_sub_pixel_variance8x4_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x4 aom_obmc_sub_pixel_variance8x4_neon
unsigned int aom_obmc_sub_pixel_variance8x8_c(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_sub_pixel_variance8x8 aom_obmc_sub_pixel_variance8x8_c
+unsigned int aom_obmc_sub_pixel_variance8x8_neon(const uint8_t *pre, int pre_stride, int xoffset, int yoffset, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_sub_pixel_variance8x8 aom_obmc_sub_pixel_variance8x8_neon
unsigned int aom_obmc_variance128x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance128x128 aom_obmc_variance128x128_c
+unsigned int aom_obmc_variance128x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x128 aom_obmc_variance128x128_neon
unsigned int aom_obmc_variance128x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance128x64 aom_obmc_variance128x64_c
+unsigned int aom_obmc_variance128x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance128x64 aom_obmc_variance128x64_neon
unsigned int aom_obmc_variance16x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x16 aom_obmc_variance16x16_c
+unsigned int aom_obmc_variance16x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x16 aom_obmc_variance16x16_neon
unsigned int aom_obmc_variance16x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x32 aom_obmc_variance16x32_c
+unsigned int aom_obmc_variance16x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x32 aom_obmc_variance16x32_neon
unsigned int aom_obmc_variance16x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x4 aom_obmc_variance16x4_c
+unsigned int aom_obmc_variance16x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x4 aom_obmc_variance16x4_neon
unsigned int aom_obmc_variance16x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x64 aom_obmc_variance16x64_c
+unsigned int aom_obmc_variance16x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x64 aom_obmc_variance16x64_neon
unsigned int aom_obmc_variance16x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance16x8 aom_obmc_variance16x8_c
+unsigned int aom_obmc_variance16x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance16x8 aom_obmc_variance16x8_neon
unsigned int aom_obmc_variance32x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x16 aom_obmc_variance32x16_c
+unsigned int aom_obmc_variance32x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x16 aom_obmc_variance32x16_neon
unsigned int aom_obmc_variance32x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x32 aom_obmc_variance32x32_c
+unsigned int aom_obmc_variance32x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x32 aom_obmc_variance32x32_neon
unsigned int aom_obmc_variance32x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x64 aom_obmc_variance32x64_c
+unsigned int aom_obmc_variance32x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x64 aom_obmc_variance32x64_neon
unsigned int aom_obmc_variance32x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance32x8 aom_obmc_variance32x8_c
+unsigned int aom_obmc_variance32x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance32x8 aom_obmc_variance32x8_neon
unsigned int aom_obmc_variance4x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x16 aom_obmc_variance4x16_c
+unsigned int aom_obmc_variance4x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x16 aom_obmc_variance4x16_neon
unsigned int aom_obmc_variance4x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x4 aom_obmc_variance4x4_c
+unsigned int aom_obmc_variance4x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x4 aom_obmc_variance4x4_neon
unsigned int aom_obmc_variance4x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance4x8 aom_obmc_variance4x8_c
+unsigned int aom_obmc_variance4x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance4x8 aom_obmc_variance4x8_neon
unsigned int aom_obmc_variance64x128_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x128 aom_obmc_variance64x128_c
+unsigned int aom_obmc_variance64x128_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x128 aom_obmc_variance64x128_neon
unsigned int aom_obmc_variance64x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x16 aom_obmc_variance64x16_c
+unsigned int aom_obmc_variance64x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x16 aom_obmc_variance64x16_neon
unsigned int aom_obmc_variance64x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x32 aom_obmc_variance64x32_c
+unsigned int aom_obmc_variance64x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x32 aom_obmc_variance64x32_neon
unsigned int aom_obmc_variance64x64_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance64x64 aom_obmc_variance64x64_c
+unsigned int aom_obmc_variance64x64_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance64x64 aom_obmc_variance64x64_neon
unsigned int aom_obmc_variance8x16_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x16 aom_obmc_variance8x16_c
+unsigned int aom_obmc_variance8x16_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x16 aom_obmc_variance8x16_neon
unsigned int aom_obmc_variance8x32_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x32 aom_obmc_variance8x32_c
+unsigned int aom_obmc_variance8x32_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x32 aom_obmc_variance8x32_neon
unsigned int aom_obmc_variance8x4_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x4 aom_obmc_variance8x4_c
+unsigned int aom_obmc_variance8x4_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x4 aom_obmc_variance8x4_neon
unsigned int aom_obmc_variance8x8_c(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
-#define aom_obmc_variance8x8 aom_obmc_variance8x8_c
+unsigned int aom_obmc_variance8x8_neon(const uint8_t *pre, int pre_stride, const int32_t *wsrc, const int32_t *mask, unsigned int *sse);
+#define aom_obmc_variance8x8 aom_obmc_variance8x8_neon
void aom_paeth_predictor_16x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_paeth_predictor_16x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
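(The hunk above rebinds every OBMC — overlapped block motion compensation — SAD and variance size from the C fallback to a NEON kernel. As a hedged sketch of the contract these kernels share, following the shape of libaom's scalar fallback: `wsrc` is the pre-weighted source, `mask` holds per-pixel blend weights, and the 12-bit rounding shift is an assumption taken from the library's A64 blend precision.)

  #include <stdint.h>
  #include <stdlib.h>

  /* Rounding helper in the style used throughout aom_dsp. */
  #define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))

  /* Hedged sketch of the scalar OBMC SAD these NEON bindings supersede. */
  static unsigned int obmc_sad_sketch(const uint8_t *pre, int pre_stride,
                                      const int32_t *wsrc, const int32_t *mask,
                                      int width, int height) {
    unsigned int sad = 0;
    for (int y = 0; y < height; ++y) {
      for (int x = 0; x < width; ++x)
        sad += ROUND_POWER_OF_TWO(abs(wsrc[x] - pre[x] * mask[x]), 12);
      pre += pre_stride;
      wsrc += width;  /* wsrc and mask are stored densely at block width */
      mask += width;
    }
    return sad;
  }
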
@@ -4110,9 +4530,6 @@
void aom_paeth_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_neon
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-#define aom_pixel_scale aom_pixel_scale_c
-
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
void aom_quantize_b_neon(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
#define aom_quantize_b aom_quantize_b_neon
@@ -4143,15 +4560,13 @@
#define aom_sad128x128_avg aom_sad128x128_avg_neon
void aom_sad128x128x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad128x128x3d aom_sad128x128x3d_c
+void aom_sad128x128x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad128x128x3d aom_sad128x128x3d_neon
void aom_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad128x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x128x4d aom_sad128x128x4d_neon
-void aom_sad128x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x128x4d_avg aom_sad128x128x4d_avg_c
-
unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad128x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad128x64 aom_sad128x64_neon
@@ -4161,18 +4576,13 @@
#define aom_sad128x64_avg aom_sad128x64_avg_neon
void aom_sad128x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad128x64x3d aom_sad128x64x3d_c
+void aom_sad128x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad128x64x3d aom_sad128x64x3d_neon
void aom_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad128x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x64x4d aom_sad128x64x4d_neon
-void aom_sad128x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x64x4d_avg aom_sad128x64x4d_avg_c
-
-unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad128xh aom_sad128xh_c
-
unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x16 aom_sad16x16_neon
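(Two patterns repeat through the SAD hunks here: the `x3d` entry points gain NEON bindings, while the `-` blocks delete prototypes removed upstream in this release — the `x4d_avg` variants and the width-only `aom_sadWxh` helpers — rather than porting them. A hedged sketch of what an `x3d` kernel computes, using 16x16 as the example size: one SAD per candidate reference, with only the first three slots used; the 4-element arrays mirror the `x4d` prototype shape.)

  #include <stdint.h>
  #include <stdlib.h>

  /* Hedged sketch of the x3d contract: three SADs in one call. */
  static void sad16x16x3d_sketch(const uint8_t *src, int src_stride,
                                 const uint8_t *const ref[4], int ref_stride,
                                 uint32_t sad_array[4]) {
    for (int i = 0; i < 3; ++i) {
      uint32_t sad = 0;
      for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x)
          sad += abs(src[y * src_stride + x] - ref[i][y * ref_stride + x]);
      sad_array[i] = sad;
    }
  }
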
@@ -4182,15 +4592,13 @@
#define aom_sad16x16_avg aom_sad16x16_avg_neon
void aom_sad16x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x16x3d aom_sad16x16x3d_c
+void aom_sad16x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x16x3d aom_sad16x16x3d_neon
void aom_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x16x4d aom_sad16x16x4d_neon
-void aom_sad16x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x16x4d_avg aom_sad16x16x4d_avg_c
-
unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x32 aom_sad16x32_neon
@@ -4200,15 +4608,13 @@
#define aom_sad16x32_avg aom_sad16x32_avg_neon
void aom_sad16x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x32x3d aom_sad16x32x3d_c
+void aom_sad16x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x32x3d aom_sad16x32x3d_neon
void aom_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x32x4d aom_sad16x32x4d_neon
-void aom_sad16x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x32x4d_avg aom_sad16x32x4d_avg_c
-
unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x4 aom_sad16x4_neon
@@ -4218,15 +4624,13 @@
#define aom_sad16x4_avg aom_sad16x4_avg_neon
void aom_sad16x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x4x3d aom_sad16x4x3d_c
+void aom_sad16x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x4x3d aom_sad16x4x3d_neon
void aom_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x4x4d aom_sad16x4x4d_neon
-void aom_sad16x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x4x4d_avg aom_sad16x4x4d_avg_c
-
unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x64 aom_sad16x64_neon
@@ -4236,15 +4640,13 @@
#define aom_sad16x64_avg aom_sad16x64_avg_neon
void aom_sad16x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x64x3d aom_sad16x64x3d_c
+void aom_sad16x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x64x3d aom_sad16x64x3d_neon
void aom_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x64x4d aom_sad16x64x4d_neon
-void aom_sad16x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x64x4d_avg aom_sad16x64x4d_avg_c
-
unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x8 aom_sad16x8_neon
@@ -4254,18 +4656,13 @@
#define aom_sad16x8_avg aom_sad16x8_avg_neon
void aom_sad16x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad16x8x3d aom_sad16x8x3d_c
+void aom_sad16x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad16x8x3d aom_sad16x8x3d_neon
void aom_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad16x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x8x4d aom_sad16x8x4d_neon
-void aom_sad16x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x8x4d_avg aom_sad16x8x4d_avg_c
-
-unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad16xh aom_sad16xh_c
-
unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x16 aom_sad32x16_neon
@@ -4275,15 +4672,13 @@
#define aom_sad32x16_avg aom_sad32x16_avg_neon
void aom_sad32x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x16x3d aom_sad32x16x3d_c
+void aom_sad32x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x16x3d aom_sad32x16x3d_neon
void aom_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x16x4d aom_sad32x16x4d_neon
-void aom_sad32x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x16x4d_avg aom_sad32x16x4d_avg_c
-
unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x32 aom_sad32x32_neon
@@ -4293,15 +4688,13 @@
#define aom_sad32x32_avg aom_sad32x32_avg_neon
void aom_sad32x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x32x3d aom_sad32x32x3d_c
+void aom_sad32x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x32x3d aom_sad32x32x3d_neon
void aom_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x32x4d aom_sad32x32x4d_neon
-void aom_sad32x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x32x4d_avg aom_sad32x32x4d_avg_c
-
unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x64 aom_sad32x64_neon
@@ -4311,15 +4704,13 @@
#define aom_sad32x64_avg aom_sad32x64_avg_neon
void aom_sad32x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x64x3d aom_sad32x64x3d_c
+void aom_sad32x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x64x3d aom_sad32x64x3d_neon
void aom_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x64x4d aom_sad32x64x4d_neon
-void aom_sad32x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x64x4d_avg aom_sad32x64x4d_avg_c
-
unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x8 aom_sad32x8_neon
@@ -4329,18 +4720,13 @@
#define aom_sad32x8_avg aom_sad32x8_avg_neon
void aom_sad32x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad32x8x3d aom_sad32x8x3d_c
+void aom_sad32x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad32x8x3d aom_sad32x8x3d_neon
void aom_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad32x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x8x4d aom_sad32x8x4d_neon
-void aom_sad32x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x8x4d_avg aom_sad32x8x4d_avg_c
-
-unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad32xh aom_sad32xh_c
-
unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x16 aom_sad4x16_neon
@@ -4350,15 +4736,13 @@
#define aom_sad4x16_avg aom_sad4x16_avg_neon
void aom_sad4x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x16x3d aom_sad4x16x3d_c
+void aom_sad4x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x16x3d aom_sad4x16x3d_neon
void aom_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x16x4d aom_sad4x16x4d_neon
-void aom_sad4x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x16x4d_avg aom_sad4x16x4d_avg_c
-
unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x4 aom_sad4x4_neon
@@ -4368,15 +4752,13 @@
#define aom_sad4x4_avg aom_sad4x4_avg_neon
void aom_sad4x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x4x3d aom_sad4x4x3d_c
+void aom_sad4x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x4x3d aom_sad4x4x3d_neon
void aom_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x4x4d aom_sad4x4x4d_neon
-void aom_sad4x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x4x4d_avg aom_sad4x4x4d_avg_c
-
unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x8 aom_sad4x8_neon
@@ -4386,18 +4768,13 @@
#define aom_sad4x8_avg aom_sad4x8_avg_neon
void aom_sad4x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad4x8x3d aom_sad4x8x3d_c
+void aom_sad4x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad4x8x3d aom_sad4x8x3d_neon
void aom_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad4x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x8x4d aom_sad4x8x4d_neon
-void aom_sad4x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x8x4d_avg aom_sad4x8x4d_avg_c
-
-unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad4xh aom_sad4xh_c
-
unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x128 aom_sad64x128_neon
@@ -4407,15 +4784,13 @@
#define aom_sad64x128_avg aom_sad64x128_avg_neon
void aom_sad64x128x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x128x3d aom_sad64x128x3d_c
+void aom_sad64x128x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x128x3d aom_sad64x128x3d_neon
void aom_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x128x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x128x4d aom_sad64x128x4d_neon
-void aom_sad64x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x128x4d_avg aom_sad64x128x4d_avg_c
-
unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x16 aom_sad64x16_neon
@@ -4425,15 +4800,13 @@
#define aom_sad64x16_avg aom_sad64x16_avg_neon
void aom_sad64x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x16x3d aom_sad64x16x3d_c
+void aom_sad64x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x16x3d aom_sad64x16x3d_neon
void aom_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x16x4d aom_sad64x16x4d_neon
-void aom_sad64x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x16x4d_avg aom_sad64x16x4d_avg_c
-
unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x32 aom_sad64x32_neon
@@ -4443,15 +4816,13 @@
#define aom_sad64x32_avg aom_sad64x32_avg_neon
void aom_sad64x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x32x3d aom_sad64x32x3d_c
+void aom_sad64x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x32x3d aom_sad64x32x3d_neon
void aom_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x32x4d aom_sad64x32x4d_neon
-void aom_sad64x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x32x4d_avg aom_sad64x32x4d_avg_c
-
unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x64 aom_sad64x64_neon
@@ -4461,18 +4832,13 @@
#define aom_sad64x64_avg aom_sad64x64_avg_neon
void aom_sad64x64x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad64x64x3d aom_sad64x64x3d_c
+void aom_sad64x64x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad64x64x3d aom_sad64x64x3d_neon
void aom_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad64x64x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x64x4d aom_sad64x64x4d_neon
-void aom_sad64x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x64x4d_avg aom_sad64x64x4d_avg_c
-
-unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad64xh aom_sad64xh_c
-
unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x16_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x16 aom_sad8x16_neon
@@ -4482,15 +4848,13 @@
#define aom_sad8x16_avg aom_sad8x16_avg_neon
void aom_sad8x16x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x16x3d aom_sad8x16x3d_c
+void aom_sad8x16x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x16x3d aom_sad8x16x3d_neon
void aom_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x16x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x16x4d aom_sad8x16x4d_neon
-void aom_sad8x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x16x4d_avg aom_sad8x16x4d_avg_c
-
unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x32_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x32 aom_sad8x32_neon
@@ -4500,15 +4864,13 @@
#define aom_sad8x32_avg aom_sad8x32_avg_neon
void aom_sad8x32x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x32x3d aom_sad8x32x3d_c
+void aom_sad8x32x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x32x3d aom_sad8x32x3d_neon
void aom_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x32x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x32x4d aom_sad8x32x4d_neon
-void aom_sad8x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x32x4d_avg aom_sad8x32x4d_avg_c
-
unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x4 aom_sad8x4_neon
@@ -4518,15 +4880,13 @@
#define aom_sad8x4_avg aom_sad8x4_avg_neon
void aom_sad8x4x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x4x3d aom_sad8x4x3d_c
+void aom_sad8x4x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x4x3d aom_sad8x4x3d_neon
void aom_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x4x4d aom_sad8x4x4d_neon
-void aom_sad8x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x4x4d_avg aom_sad8x4x4d_avg_c
-
unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x8 aom_sad8x8_neon
@@ -4536,18 +4896,13 @@
#define aom_sad8x8_avg aom_sad8x8_avg_neon
void aom_sad8x8x3d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad8x8x3d aom_sad8x8x3d_c
+void aom_sad8x8x3d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad8x8x3d aom_sad8x8x3d_neon
void aom_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
void aom_sad8x8x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x8x4d aom_sad8x8x4d_neon
-void aom_sad8x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x8x4d_avg aom_sad8x8x4d_avg_c
-
-unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad8xh aom_sad8xh_c
-
unsigned int aom_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_128x128_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad_skip_128x128 aom_sad_skip_128x128_neon
@@ -4581,10 +4936,12 @@
#define aom_sad_skip_16x32x4d aom_sad_skip_16x32x4d_neon
unsigned int aom_sad_skip_16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_16x4 aom_sad_skip_16x4_c
+unsigned int aom_sad_skip_16x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_16x4 aom_sad_skip_16x4_neon
void aom_sad_skip_16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_16x4x4d aom_sad_skip_16x4x4d_c
+void aom_sad_skip_16x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_16x4x4d aom_sad_skip_16x4x4d_neon
unsigned int aom_sad_skip_16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_16x64_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
@@ -4643,10 +5000,12 @@
#define aom_sad_skip_4x16x4d aom_sad_skip_4x16x4d_neon
unsigned int aom_sad_skip_4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_4x4 aom_sad_skip_4x4_c
+unsigned int aom_sad_skip_4x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_4x4 aom_sad_skip_4x4_neon
void aom_sad_skip_4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_4x4x4d aom_sad_skip_4x4x4d_c
+void aom_sad_skip_4x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_4x4x4d aom_sad_skip_4x4x4d_neon
unsigned int aom_sad_skip_4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_4x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
@@ -4705,10 +5064,12 @@
#define aom_sad_skip_8x32x4d aom_sad_skip_8x32x4d_neon
unsigned int aom_sad_skip_8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
-#define aom_sad_skip_8x4 aom_sad_skip_8x4_c
+unsigned int aom_sad_skip_8x4_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
+#define aom_sad_skip_8x4 aom_sad_skip_8x4_neon
void aom_sad_skip_8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
-#define aom_sad_skip_8x4x4d aom_sad_skip_8x4x4d_c
+void aom_sad_skip_8x4x4d_neon(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
+#define aom_sad_skip_8x4x4d aom_sad_skip_8x4x4d_neon
unsigned int aom_sad_skip_8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_8x8_neon(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
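(The `aom_sad_skip_*` bindings above — including the newly NEON-routed 16x4, 4x4, and 8x4 sizes — follow libaom's subsampled-SAD convention: compare every other row and double the result so the score stays comparable with a full SAD. A minimal sketch for the 8x4 case:)

  #include <stdint.h>
  #include <stdlib.h>

  /* Hedged sketch of the skip-SAD contract: 2:1 vertical subsampling,
   * result doubled to compensate for the skipped rows. */
  static unsigned int sad_skip_8x4_sketch(const uint8_t *src, int src_stride,
                                          const uint8_t *ref, int ref_stride) {
    unsigned int sad = 0;
    for (int y = 0; y < 4; y += 2)
      for (int x = 0; x < 8; ++x)
        sad += abs(src[y * src_stride + x] - ref[y * ref_stride + x]);
    return 2 * sad;
  }
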
@@ -5150,7 +5511,8 @@
#define aom_sum_squares_2d_i16 aom_sum_squares_2d_i16_neon
uint64_t aom_sum_squares_i16_c(const int16_t *src, uint32_t N);
-#define aom_sum_squares_i16 aom_sum_squares_i16_c
+uint64_t aom_sum_squares_i16_neon(const int16_t *src, uint32_t N);
+#define aom_sum_squares_i16 aom_sum_squares_i16_neon
uint64_t aom_sum_sse_2d_i16_c(const int16_t *src, int src_stride, int width, int height, int *sum);
uint64_t aom_sum_sse_2d_i16_neon(const int16_t *src, int src_stride, int width, int height, int *sum);
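(aom_sum_squares_i16, rebound to NEON just above, has a one-line contract: the sum of src[i]^2 over N values. A sketch of the scalar equivalent, accumulating in 64 bits so no overflow is possible even for N near 2^32:)

  #include <stdint.h>

  /* Hedged sketch: sum of squares of n int16 values, 64-bit accumulator. */
  static uint64_t sum_squares_i16_sketch(const int16_t *src, uint32_t n) {
    uint64_t ss = 0;
    for (uint32_t i = 0; i < n; ++i)
      ss += (int64_t)src[i] * src[i];
    return ss;
  }
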
@@ -5161,67 +5523,84 @@
#define aom_v_predictor_16x16 aom_v_predictor_16x16_neon
void aom_v_predictor_16x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x32 aom_v_predictor_16x32_c
+void aom_v_predictor_16x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x32 aom_v_predictor_16x32_neon
void aom_v_predictor_16x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x4 aom_v_predictor_16x4_c
+void aom_v_predictor_16x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x4 aom_v_predictor_16x4_neon
void aom_v_predictor_16x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x64 aom_v_predictor_16x64_c
+void aom_v_predictor_16x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x64 aom_v_predictor_16x64_neon
void aom_v_predictor_16x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_16x8 aom_v_predictor_16x8_c
+void aom_v_predictor_16x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_16x8 aom_v_predictor_16x8_neon
void aom_v_predictor_32x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x16 aom_v_predictor_32x16_c
+void aom_v_predictor_32x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x16 aom_v_predictor_32x16_neon
void aom_v_predictor_32x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_32x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_32x32 aom_v_predictor_32x32_neon
void aom_v_predictor_32x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x64 aom_v_predictor_32x64_c
+void aom_v_predictor_32x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x64 aom_v_predictor_32x64_neon
void aom_v_predictor_32x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_32x8 aom_v_predictor_32x8_c
+void aom_v_predictor_32x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_32x8 aom_v_predictor_32x8_neon
void aom_v_predictor_4x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_4x16 aom_v_predictor_4x16_c
+void aom_v_predictor_4x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x16 aom_v_predictor_4x16_neon
void aom_v_predictor_4x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_4x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_4x4 aom_v_predictor_4x4_neon
void aom_v_predictor_4x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_4x8 aom_v_predictor_4x8_c
+void aom_v_predictor_4x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_4x8 aom_v_predictor_4x8_neon
void aom_v_predictor_64x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x16 aom_v_predictor_64x16_c
+void aom_v_predictor_64x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x16 aom_v_predictor_64x16_neon
void aom_v_predictor_64x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x32 aom_v_predictor_64x32_c
+void aom_v_predictor_64x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x32 aom_v_predictor_64x32_neon
void aom_v_predictor_64x64_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_64x64 aom_v_predictor_64x64_c
+void aom_v_predictor_64x64_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_64x64 aom_v_predictor_64x64_neon
void aom_v_predictor_8x16_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x16 aom_v_predictor_8x16_c
+void aom_v_predictor_8x16_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x16 aom_v_predictor_8x16_neon
void aom_v_predictor_8x32_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x32 aom_v_predictor_8x32_c
+void aom_v_predictor_8x32_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x32 aom_v_predictor_8x32_neon
void aom_v_predictor_8x4_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
-#define aom_v_predictor_8x4 aom_v_predictor_8x4_c
+void aom_v_predictor_8x4_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
+#define aom_v_predictor_8x4 aom_v_predictor_8x4_neon
void aom_v_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
void aom_v_predictor_8x8_neon(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_v_predictor_8x8 aom_v_predictor_8x8_neon
uint64_t aom_var_2d_u16_c(uint8_t *src, int src_stride, int width, int height);
-#define aom_var_2d_u16 aom_var_2d_u16_c
+uint64_t aom_var_2d_u16_neon(uint8_t *src, int src_stride, int width, int height);
+#define aom_var_2d_u16 aom_var_2d_u16_neon
uint64_t aom_var_2d_u8_c(uint8_t *src, int src_stride, int width, int height);
-#define aom_var_2d_u8 aom_var_2d_u8_c
+uint64_t aom_var_2d_u8_neon(uint8_t *src, int src_stride, int width, int height);
+#define aom_var_2d_u8 aom_var_2d_u8_neon
unsigned int aom_variance128x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
unsigned int aom_variance128x128_neon(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -5324,7 +5703,7 @@
int aom_vector_var_neon(const int16_t *ref, const int16_t *src, int bwl);
#define aom_vector_var aom_vector_var_neon
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2);
#define av1_compute_cross_correlation av1_compute_cross_correlation_c
void aom_dsp_rtcd(void);
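(Note: a minimal standalone sketch, not libaom source, of the RTCD aliasing
pattern visible throughout the generated header above. With
CONFIG_RUNTIME_CPU_DETECT=0 each generic symbol is a #define fixed at
configure time, which is why adding a NEON kernel only retargets the macro
and never touches call sites. All "demo_" names here are hypothetical.)

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical C fallback: the vertical predictor copies the row above
 * the block into every row of the block; the left column is unused. */
static void demo_v_predictor_4x4_c(uint8_t *dst, ptrdiff_t stride,
                                   const uint8_t *above,
                                   const uint8_t *left) {
  (void)left;
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) dst[r * stride + c] = above[c];
}

/* On this arm64 config the alias would point at a _neon kernel; the C
 * fallback stands in here so the sketch builds anywhere. */
#define demo_v_predictor_4x4 demo_v_predictor_4x4_c

int main(void) {
  uint8_t dst[16] = {0};
  const uint8_t above[4] = {10, 20, 30, 40};
  const uint8_t left[4] = {0};
  demo_v_predictor_4x4(dst, 4, above, left);
  for (int i = 0; i < 16; ++i)
    printf("%3d%c", dst[i], (i % 4 == 3) ? '\n' : ' ');
  return 0;
}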
diff --git a/config/arm64/config/aom_scale_rtcd.h b/config/arm64/config/aom_scale_rtcd.h
index df4b96f..d296957 100644
--- a/config/arm64/config/aom_scale_rtcd.h
+++ b/config/arm64/config/aom_scale_rtcd.h
@@ -80,7 +80,7 @@
void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
-int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes);
+int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes);
#define aom_yv12_realloc_with_new_border aom_yv12_realloc_with_new_border_c
void aom_scale_rtcd(void);
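(Note: the num_pyramid_levels argument added to
aom_yv12_realloc_with_new_border_c above plausibly sizes a downscaled
image pyramid alongside the frame buffer; that interpretation, and every
name below, is an assumption rather than libaom API. The sketch only
shows why such an allocation is cheap: halving each dimension per level
bounds the total by a geometric series, under 4/3 of one full plane.)

#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: bytes needed for num_pyramid_levels downscaled
 * copies of a w x h 8-bit plane, level 0 being full resolution. */
static size_t demo_pyramid_bytes(size_t w, size_t h, int num_pyramid_levels) {
  size_t total = 0;
  for (int l = 0; l < num_pyramid_levels; ++l) {
    total += w * h;
    w = (w + 1) / 2;
    h = (h + 1) / 2;
  }
  return total;
}

int main(void) {
  /* A 1920x1080 plane is 2073600 bytes; 4 levels total 2754000 bytes,
   * which stays under 4/3 of the single full-resolution plane. */
  printf("%zu of %zu bytes\n", demo_pyramid_bytes(1920, 1080, 4),
         (size_t)(1920 * 1080));
  return 0;
}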
diff --git a/config/arm64/config/av1_rtcd.h b/config/arm64/config/av1_rtcd.h
index 964bb72..1a3fa19 100644
--- a/config/arm64/config/av1_rtcd.h
+++ b/config/arm64/config/av1_rtcd.h
@@ -15,12 +15,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -80,14 +80,11 @@
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
int ref_stride, int subpel_search);
-#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_c
-
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_c
+void aom_comp_avg_upsampled_pred_neon(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+ const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
+ int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
+ int ref_stride, int subpel_search);
+#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_neon
void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
@@ -118,14 +115,17 @@
void aom_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search);
-#define aom_upsampled_pred aom_upsampled_pred_c
+void aom_upsampled_pred_neon(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+ const MV *const mv, uint8_t *comp_pred, int width, int height, int subpel_x_q3,
+ int subpel_y_q3, const uint8_t *ref, int ref_stride, int subpel_search);
+#define aom_upsampled_pred aom_upsampled_pred_neon
void av1_apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
void av1_apply_selfguided_restoration_neon(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
#define av1_apply_selfguided_restoration av1_apply_selfguided_restoration_neon
-void av1_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_apply_temporal_filter_neon(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_neon(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_apply_temporal_filter av1_apply_temporal_filter_neon
int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
@@ -150,10 +150,12 @@
#define av1_calc_frame_error av1_calc_frame_error_c
void av1_calc_indices_dim1_c(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
-#define av1_calc_indices_dim1 av1_calc_indices_dim1_c
+void av1_calc_indices_dim1_neon(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
+#define av1_calc_indices_dim1 av1_calc_indices_dim1_neon
void av1_calc_indices_dim2_c(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
-#define av1_calc_indices_dim2 av1_calc_indices_dim2_c
+void av1_calc_indices_dim2_neon(const int16_t *data, const int16_t *centroids, uint8_t *indices, int64_t *total_dist, int n, int k);
+#define av1_calc_indices_dim2 av1_calc_indices_dim2_neon
void av1_calc_proj_params_c( const uint8_t *src8, int width, int height, int src_stride, const uint8_t *dat8, int dat_stride, int32_t *flt0, int flt0_stride, int32_t *flt1, int flt1_stride, int64_t H[2][2], int64_t C[2], const sgr_params_type *params);
#define av1_calc_proj_params av1_calc_proj_params_c
@@ -179,7 +181,7 @@
bool av1_cnn_predict_c( const float **input, int in_width, int in_height, int in_stride, const CNN_CONFIG *cnn_config, const CNN_THREAD_DATA *thread_data, CNN_MULTI_OUT *output_struct);
#define av1_cnn_predict av1_cnn_predict_c
-void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
#define av1_compute_stats av1_compute_stats_c
void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
@@ -231,6 +233,9 @@
void av1_dr_prediction_z3_neon(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
#define av1_dr_prediction_z3 av1_dr_prediction_z3_neon
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height, int width, int stride, int edge_thresh);
+#define av1_estimate_noise_from_single_plane av1_estimate_noise_from_single_plane_c
+
void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
#define av1_filter_intra_edge av1_filter_intra_edge_c
@@ -332,7 +337,7 @@
void av1_get_nz_map_contexts_neon(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
#define av1_get_nz_map_contexts av1_get_nz_map_contexts_neon
-void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_c
int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
@@ -348,10 +353,12 @@
#define av1_highbd_convolve8_vert av1_highbd_convolve8_vert_c
void av1_highbd_convolve_2d_scale_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int x_step_qn, const int subpel_y_qn, const int y_step_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_2d_scale av1_highbd_convolve_2d_scale_c
+void av1_highbd_convolve_2d_scale_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int x_step_qn, const int subpel_y_qn, const int y_step_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_scale av1_highbd_convolve_2d_scale_neon
void av1_highbd_convolve_2d_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_2d_sr av1_highbd_convolve_2d_sr_c
+void av1_highbd_convolve_2d_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_2d_sr av1_highbd_convolve_2d_sr_neon
void av1_highbd_convolve_avg_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps);
#define av1_highbd_convolve_avg av1_highbd_convolve_avg_c
@@ -360,25 +367,32 @@
#define av1_highbd_convolve_copy av1_highbd_convolve_copy_c
void av1_highbd_convolve_horiz_rs_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd);
-#define av1_highbd_convolve_horiz_rs av1_highbd_convolve_horiz_rs_c
+void av1_highbd_convolve_horiz_rs_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const int16_t *x_filters, int x0_qn, int x_step_qn, int bd);
+#define av1_highbd_convolve_horiz_rs av1_highbd_convolve_horiz_rs_neon
void av1_highbd_convolve_x_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_convolve_x_sr av1_highbd_convolve_x_sr_c
+void av1_highbd_convolve_x_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_convolve_x_sr av1_highbd_convolve_x_sr_neon
void av1_highbd_convolve_y_sr_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, int bd);
-#define av1_highbd_convolve_y_sr av1_highbd_convolve_y_sr_c
+void av1_highbd_convolve_y_sr_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, int bd);
+#define av1_highbd_convolve_y_sr av1_highbd_convolve_y_sr_neon
void av1_highbd_dist_wtd_convolve_2d_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_2d av1_highbd_dist_wtd_convolve_2d_c
+void av1_highbd_dist_wtd_convolve_2d_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const InterpFilterParams *filter_params_y, const int subpel_x_qn, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d av1_highbd_dist_wtd_convolve_2d_neon
void av1_highbd_dist_wtd_convolve_2d_copy_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_2d_copy av1_highbd_dist_wtd_convolve_2d_copy_c
+void av1_highbd_dist_wtd_convolve_2d_copy_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_2d_copy av1_highbd_dist_wtd_convolve_2d_copy_neon
void av1_highbd_dist_wtd_convolve_x_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_x av1_highbd_dist_wtd_convolve_x_c
+void av1_highbd_dist_wtd_convolve_x_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_x, const int subpel_x_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_x av1_highbd_dist_wtd_convolve_x_neon
void av1_highbd_dist_wtd_convolve_y_c(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
-#define av1_highbd_dist_wtd_convolve_y av1_highbd_dist_wtd_convolve_y_c
+void av1_highbd_dist_wtd_convolve_y_neon(const uint16_t *src, int src_stride, uint16_t *dst, int dst_stride, int w, int h, const InterpFilterParams *filter_params_y, const int subpel_y_qn, ConvolveParams *conv_params, int bd);
+#define av1_highbd_dist_wtd_convolve_y av1_highbd_dist_wtd_convolve_y_neon
void av1_highbd_dr_prediction_z1_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_above, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z1 av1_highbd_dr_prediction_z1_c
@@ -389,9 +403,8 @@
void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
-void av1_highbd_fwht4x4_neon(const int16_t *input, tran_low_t *output, int stride);
-#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_neon
+double av1_highbd_estimate_noise_from_single_plane_c(const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh);
+#define av1_highbd_estimate_noise_from_single_plane av1_highbd_estimate_noise_from_single_plane_c
void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
void av1_highbd_inv_txfm_add_neon(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
diff --git a/config/config/aom_version.h b/config/config/aom_version.h
index 4586009..c3705db 100644
--- a/config/config/aom_version.h
+++ b/config/config/aom_version.h
@@ -10,10 +10,10 @@
*/
#define VERSION_MAJOR 3
-#define VERSION_MINOR 6
-#define VERSION_PATCH 1
-#define VERSION_EXTRA "226-g5cf4c68cb3"
+#define VERSION_MINOR 7
+#define VERSION_PATCH 0
+#define VERSION_EXTRA "273-g722272fc9"
#define VERSION_PACKED \
((VERSION_MAJOR << 16) | (VERSION_MINOR << 8) | (VERSION_PATCH))
-#define VERSION_STRING_NOSP "3.6.1-226-g5cf4c68cb3"
-#define VERSION_STRING " 3.6.1-226-g5cf4c68cb3"
+#define VERSION_STRING_NOSP "3.7.0-273-g722272fc9"
+#define VERSION_STRING " 3.7.0-273-g722272fc9"
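(Note: a standalone worked example of the VERSION_PACKED macro left
unchanged above: major in bits 16 and up, minor in bits 8-15, patch in
bits 0-7, so the new 3.7.0 packs to 0x030700.)

#include <stdio.h>

#define VERSION_MAJOR 3
#define VERSION_MINOR 7
#define VERSION_PATCH 0
#define VERSION_PACKED \
  ((VERSION_MAJOR << 16) | (VERSION_MINOR << 8) | (VERSION_PATCH))

int main(void) {
  /* Prints 0x030700 (198400); a single integer compare orders releases. */
  printf("0x%06x (%d)\n", VERSION_PACKED, VERSION_PACKED);
  return 0;
}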
diff --git a/config/riscv64/config/aom_config.asm b/config/riscv64/config/aom_config.asm
index b9c668a..02ff408 100644
--- a/config/riscv64/config/aom_config.asm
+++ b/config/riscv64/config/aom_config.asm
@@ -8,10 +8,11 @@
; Media Patent License 1.0 was not distributed with this source code in the
; PATENTS file, you can obtain it at www.aomedia.org/license/patent.
;
-ARCH_ARM equ 0
-ARCH_PPC equ 0
-ARCH_X86 equ 0
-ARCH_X86_64 equ 0
+AOM_ARCH_AARCH64 equ 0
+AOM_ARCH_ARM equ 0
+AOM_ARCH_PPC equ 0
+AOM_ARCH_X86 equ 0
+AOM_ARCH_X86_64 equ 0
CONFIG_ACCOUNTING equ 0
CONFIG_ANALYZER equ 0
CONFIG_AV1_DECODER equ 1
@@ -47,6 +48,7 @@
CONFIG_NORMAL_TILE_MODE equ 1
CONFIG_OPTICAL_FLOW_API equ 0
CONFIG_OS_SUPPORT equ 1
+CONFIG_OUTPUT_FRAME_SIZE equ 0
CONFIG_PARTITION_SEARCH_ORDER equ 0
CONFIG_PIC equ 1
CONFIG_RATECTRL_LOG equ 0
@@ -55,6 +57,7 @@
CONFIG_REALTIME_ONLY equ 0
CONFIG_RT_ML_PARTITIONING equ 0
CONFIG_RUNTIME_CPU_DETECT equ 0
+CONFIG_SALIENCY_MAP equ 0
CONFIG_SHARED equ 0
CONFIG_SIZE_LIMIT equ 1
CONFIG_SPATIAL_RESAMPLING equ 1
diff --git a/config/riscv64/config/aom_config.c b/config/riscv64/config/aom_config.c
index 14ddb81..07609ac 100644
--- a/config/riscv64/config/aom_config.c
+++ b/config/riscv64/config/aom_config.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/config/riscv64/config/aom_config.h b/config/riscv64/config/aom_config.h
index e629873..91b6249 100644
--- a/config/riscv64/config/aom_config.h
+++ b/config/riscv64/config/aom_config.h
@@ -10,10 +10,11 @@
*/
#ifndef AOM_CONFIG_H_
#define AOM_CONFIG_H_
-#define ARCH_ARM 0
-#define ARCH_PPC 0
-#define ARCH_X86 0
-#define ARCH_X86_64 0
+#define AOM_ARCH_AARCH64 0
+#define AOM_ARCH_ARM 0
+#define AOM_ARCH_PPC 0
+#define AOM_ARCH_X86 0
+#define AOM_ARCH_X86_64 0
#define CONFIG_ACCOUNTING 0
#define CONFIG_ANALYZER 0
#define CONFIG_AV1_DECODER 1
@@ -49,6 +50,7 @@
#define CONFIG_NORMAL_TILE_MODE 1
#define CONFIG_OPTICAL_FLOW_API 0
#define CONFIG_OS_SUPPORT 1
+#define CONFIG_OUTPUT_FRAME_SIZE 0
#define CONFIG_PARTITION_SEARCH_ORDER 0
#define CONFIG_PIC 1
#define CONFIG_RATECTRL_LOG 0
@@ -57,6 +59,7 @@
#define CONFIG_REALTIME_ONLY 0
#define CONFIG_RT_ML_PARTITIONING 0
#define CONFIG_RUNTIME_CPU_DETECT 0
+#define CONFIG_SALIENCY_MAP 0
#define CONFIG_SHARED 0
#define CONFIG_SIZE_LIMIT 1
#define CONFIG_SPATIAL_RESAMPLING 1
diff --git a/config/riscv64/config/aom_dsp_rtcd.h b/config/riscv64/config/aom_dsp_rtcd.h
index 4a9a683..e724d0d 100644
--- a/config/riscv64/config/aom_dsp_rtcd.h
+++ b/config/riscv64/config/aom_dsp_rtcd.h
@@ -14,8 +14,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
#ifdef __cplusplus
@@ -46,6 +46,9 @@
void aom_comp_mask_pred_c(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
#define aom_comp_mask_pred aom_comp_mask_pred_c
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v);
+#define aom_compute_flow_at_point aom_compute_flow_at_point_c
+
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h);
#define aom_convolve8 aom_convolve8_c
@@ -427,9 +430,6 @@
void aom_fdct4x4_lp_c(const int16_t *input, int16_t *output, int stride);
#define aom_fdct4x4_lp aom_fdct4x4_lp_c
-void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-#define aom_fdct8x8 aom_fdct8x8_c
-
void aom_fft16x16_float_c(const float *input, float *temp, float *output);
#define aom_fft16x16_float aom_fft16x16_float_c
@@ -445,15 +445,6 @@
void aom_fft8x8_float_c(const float *input, float *temp, float *output);
#define aom_fft8x8_float aom_fft8x8_float_c
-void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get16x16var aom_get16x16var_c
-
-unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-#define aom_get4x4sse_cs aom_get4x4sse_cs_c
-
-void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get8x8var aom_get8x8var_c
-
void aom_get_blk_sse_sum_c(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
#define aom_get_blk_sse_sum aom_get_blk_sse_sum_c
@@ -610,12 +601,6 @@
uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
-
-void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
-
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_c
@@ -970,10 +955,10 @@
unsigned int aom_highbd_10_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_c
-unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_c
-unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_c
unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -994,10 +979,10 @@
unsigned int aom_highbd_10_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_c
-unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_c
-unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_c
unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1012,7 +997,7 @@
unsigned int aom_highbd_10_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_c
-unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_c
unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1024,7 +1009,7 @@
unsigned int aom_highbd_10_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_c
-unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_c
unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1099,12 +1084,6 @@
uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
-
-void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
-
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_c
@@ -1459,10 +1438,10 @@
unsigned int aom_highbd_12_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_c
-unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
-unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_c
unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1483,10 +1462,10 @@
unsigned int aom_highbd_12_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_c
-unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_c
-unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1501,7 +1480,7 @@
unsigned int aom_highbd_12_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_c
-unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_c
unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1513,7 +1492,7 @@
unsigned int aom_highbd_12_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_c
-unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_c
unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1588,12 +1567,6 @@
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
-
-void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
-
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_c
@@ -1816,10 +1789,10 @@
unsigned int aom_highbd_8_variance16x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_c
-unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
-unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_c
unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1840,10 +1813,10 @@
unsigned int aom_highbd_8_variance32x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_c
-unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_c
-unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1858,7 +1831,7 @@
unsigned int aom_highbd_8_variance64x128_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_c
-unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_c
unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1870,7 +1843,7 @@
unsigned int aom_highbd_8_variance8x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_c
-unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_c
unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2209,9 +2182,6 @@
unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
-void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_c
-
void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_c
@@ -3874,9 +3844,6 @@
void aom_paeth_predictor_8x8_c(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_c
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-#define aom_pixel_scale aom_pixel_scale_c
-
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
#define aom_quantize_b aom_quantize_b_c
@@ -3907,9 +3874,6 @@
void aom_sad128x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x128x4d aom_sad128x128x4d_c
-void aom_sad128x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x128x4d_avg aom_sad128x128x4d_avg_c
-
unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad128x64 aom_sad128x64_c
@@ -3922,12 +3886,6 @@
void aom_sad128x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x64x4d aom_sad128x64x4d_c
-void aom_sad128x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x64x4d_avg aom_sad128x64x4d_avg_c
-
-unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad128xh aom_sad128xh_c
-
unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x16 aom_sad16x16_c
@@ -3940,9 +3898,6 @@
void aom_sad16x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x16x4d aom_sad16x16x4d_c
-void aom_sad16x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x16x4d_avg aom_sad16x16x4d_avg_c
-
unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x32 aom_sad16x32_c
@@ -3955,9 +3910,6 @@
void aom_sad16x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x32x4d aom_sad16x32x4d_c
-void aom_sad16x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x32x4d_avg aom_sad16x32x4d_avg_c
-
unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x4 aom_sad16x4_c
@@ -3970,9 +3922,6 @@
void aom_sad16x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x4x4d aom_sad16x4x4d_c
-void aom_sad16x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x4x4d_avg aom_sad16x4x4d_avg_c
-
unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x64 aom_sad16x64_c
@@ -3985,9 +3934,6 @@
void aom_sad16x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x64x4d aom_sad16x64x4d_c
-void aom_sad16x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x64x4d_avg aom_sad16x64x4d_avg_c
-
unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x8 aom_sad16x8_c
@@ -4000,12 +3946,6 @@
void aom_sad16x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x8x4d aom_sad16x8x4d_c
-void aom_sad16x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x8x4d_avg aom_sad16x8x4d_avg_c
-
-unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad16xh aom_sad16xh_c
-
unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x16 aom_sad32x16_c
@@ -4018,9 +3958,6 @@
void aom_sad32x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x16x4d aom_sad32x16x4d_c
-void aom_sad32x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x16x4d_avg aom_sad32x16x4d_avg_c
-
unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x32 aom_sad32x32_c
@@ -4033,9 +3970,6 @@
void aom_sad32x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x32x4d aom_sad32x32x4d_c
-void aom_sad32x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x32x4d_avg aom_sad32x32x4d_avg_c
-
unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x64 aom_sad32x64_c
@@ -4048,9 +3982,6 @@
void aom_sad32x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x64x4d aom_sad32x64x4d_c
-void aom_sad32x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x64x4d_avg aom_sad32x64x4d_avg_c
-
unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x8 aom_sad32x8_c
@@ -4063,12 +3994,6 @@
void aom_sad32x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x8x4d aom_sad32x8x4d_c
-void aom_sad32x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x8x4d_avg aom_sad32x8x4d_avg_c
-
-unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad32xh aom_sad32xh_c
-
unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x16 aom_sad4x16_c
@@ -4081,9 +4006,6 @@
void aom_sad4x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x16x4d aom_sad4x16x4d_c
-void aom_sad4x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x16x4d_avg aom_sad4x16x4d_avg_c
-
unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x4 aom_sad4x4_c
@@ -4096,9 +4018,6 @@
void aom_sad4x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x4x4d aom_sad4x4x4d_c
-void aom_sad4x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x4x4d_avg aom_sad4x4x4d_avg_c
-
unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x8 aom_sad4x8_c
@@ -4111,12 +4030,6 @@
void aom_sad4x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x8x4d aom_sad4x8x4d_c
-void aom_sad4x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x8x4d_avg aom_sad4x8x4d_avg_c
-
-unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad4xh aom_sad4xh_c
-
unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x128 aom_sad64x128_c
@@ -4129,9 +4042,6 @@
void aom_sad64x128x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x128x4d aom_sad64x128x4d_c
-void aom_sad64x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x128x4d_avg aom_sad64x128x4d_avg_c
-
unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x16 aom_sad64x16_c
@@ -4144,9 +4054,6 @@
void aom_sad64x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x16x4d aom_sad64x16x4d_c
-void aom_sad64x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x16x4d_avg aom_sad64x16x4d_avg_c
-
unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x32 aom_sad64x32_c
@@ -4159,9 +4066,6 @@
void aom_sad64x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x32x4d aom_sad64x32x4d_c
-void aom_sad64x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x32x4d_avg aom_sad64x32x4d_avg_c
-
unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x64 aom_sad64x64_c
@@ -4174,12 +4078,6 @@
void aom_sad64x64x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x64x4d aom_sad64x64x4d_c
-void aom_sad64x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x64x4d_avg aom_sad64x64x4d_avg_c
-
-unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad64xh aom_sad64xh_c
-
unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x16 aom_sad8x16_c
@@ -4192,9 +4090,6 @@
void aom_sad8x16x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x16x4d aom_sad8x16x4d_c
-void aom_sad8x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x16x4d_avg aom_sad8x16x4d_avg_c
-
unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x32 aom_sad8x32_c
@@ -4207,9 +4102,6 @@
void aom_sad8x32x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x32x4d aom_sad8x32x4d_c
-void aom_sad8x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x32x4d_avg aom_sad8x32x4d_avg_c
-
unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x4 aom_sad8x4_c
@@ -4222,9 +4114,6 @@
void aom_sad8x4x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x4x4d aom_sad8x4x4d_c
-void aom_sad8x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x4x4d_avg aom_sad8x4x4d_avg_c
-
unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x8 aom_sad8x8_c
@@ -4237,12 +4126,6 @@
void aom_sad8x8x4d_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x8x4d aom_sad8x8x4d_c
-void aom_sad8x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x8x4d_avg aom_sad8x8x4d_avg_c
-
-unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad8xh aom_sad8xh_c
-
unsigned int aom_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad_skip_128x128 aom_sad_skip_128x128_c
@@ -4846,7 +4729,7 @@
int aom_vector_var_c(const int16_t *ref, const int16_t *src, int bwl);
#define aom_vector_var aom_vector_var_c
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2);
#define av1_compute_cross_correlation av1_compute_cross_correlation_c
void aom_dsp_rtcd(void);
diff --git a/config/riscv64/config/aom_scale_rtcd.h b/config/riscv64/config/aom_scale_rtcd.h
index 69d50c9..733b2d9 100644
--- a/config/riscv64/config/aom_scale_rtcd.h
+++ b/config/riscv64/config/aom_scale_rtcd.h
@@ -80,7 +80,7 @@
void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
-int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes);
+int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes);
#define aom_yv12_realloc_with_new_border aom_yv12_realloc_with_new_border_c
void aom_scale_rtcd(void);
diff --git a/config/riscv64/config/av1_rtcd.h b/config/riscv64/config/av1_rtcd.h
index 01da23f..3d406ef 100644
--- a/config/riscv64/config/av1_rtcd.h
+++ b/config/riscv64/config/av1_rtcd.h
@@ -15,12 +15,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -82,13 +82,6 @@
int ref_stride, int subpel_search);
#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_c
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_c
-
void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
@@ -122,7 +115,7 @@
void av1_apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
#define av1_apply_selfguided_restoration av1_apply_selfguided_restoration_c
-void av1_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_apply_temporal_filter av1_apply_temporal_filter_c
int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
@@ -173,7 +166,7 @@
bool av1_cnn_predict_c( const float **input, int in_width, int in_height, int in_stride, const CNN_CONFIG *cnn_config, const CNN_THREAD_DATA *thread_data, CNN_MULTI_OUT *output_struct);
#define av1_cnn_predict av1_cnn_predict_c
-void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
#define av1_compute_stats av1_compute_stats_c
void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
@@ -215,6 +208,9 @@
void av1_dr_prediction_z3_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
#define av1_dr_prediction_z3 av1_dr_prediction_z3_c
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height, int width, int stride, int edge_thresh);
+#define av1_estimate_noise_from_single_plane av1_estimate_noise_from_single_plane_c
+
void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
#define av1_filter_intra_edge av1_filter_intra_edge_c
@@ -293,7 +289,7 @@
void av1_get_nz_map_contexts_c(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
#define av1_get_nz_map_contexts av1_get_nz_map_contexts_c
-void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_c
int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
@@ -350,8 +346,8 @@
void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
-#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_c
+double av1_highbd_estimate_noise_from_single_plane_c(const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh);
+#define av1_highbd_estimate_noise_from_single_plane av1_highbd_estimate_noise_from_single_plane_c
void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
#define av1_highbd_inv_txfm_add av1_highbd_inv_txfm_add_c
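Editor's note: two API shifts stand out in the av1_rtcd.h hunks above. First, av1_apply_temporal_filter and av1_highbd_apply_temporal_filter each gain an int tf_wgt_calc_lvl parameter, so every caller needs updating. Second, noise estimation becomes an RTCD hook (av1_estimate_noise_from_single_plane and its high-bitdepth twin), replacing the dropped av1_highbd_fwht4x4 slot in this header. A small sketch of the new noise hook, assuming plane describes a valid 8-bit luma plane; the edge_thresh value of 50 is an assumption modeled on the encoder's temporal-filter default, not something this diff specifies:

    #include <stdint.h>

    /* Declaration and alias as in the hunk above; note the header's
     * (height, width) parameter order. */
    double av1_estimate_noise_from_single_plane_c(const uint8_t *src,
                                                  int height, int width,
                                                  int stride, int edge_thresh);
    #define av1_estimate_noise_from_single_plane \
      av1_estimate_noise_from_single_plane_c

    /* Hypothetical helper: estimated noise level of one luma plane. */
    static double luma_noise(const uint8_t *plane, int width, int height,
                             int stride) {
      return av1_estimate_noise_from_single_plane(plane, height, width,
                                                  stride, /*edge_thresh=*/50);
    }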
diff --git a/config/x86/config/aom_config.asm b/config/x86/config/aom_config.asm
index 0a256ea..e01202e 100644
--- a/config/x86/config/aom_config.asm
+++ b/config/x86/config/aom_config.asm
@@ -1,7 +1,8 @@
-%define ARCH_ARM 0
-%define ARCH_PPC 0
-%define ARCH_X86 1
-%define ARCH_X86_64 0
+%define AOM_ARCH_AARCH64 0
+%define AOM_ARCH_ARM 0
+%define AOM_ARCH_PPC 0
+%define AOM_ARCH_X86 1
+%define AOM_ARCH_X86_64 0
%define CONFIG_ACCOUNTING 0
%define CONFIG_ANALYZER 0
%define CONFIG_AV1_DECODER 1
@@ -37,6 +38,7 @@
%define CONFIG_NORMAL_TILE_MODE 1
%define CONFIG_OPTICAL_FLOW_API 0
%define CONFIG_OS_SUPPORT 1
+%define CONFIG_OUTPUT_FRAME_SIZE 0
%define CONFIG_PARTITION_SEARCH_ORDER 0
%define CONFIG_PIC 1
%define CONFIG_RATECTRL_LOG 0
@@ -45,6 +47,7 @@
%define CONFIG_REALTIME_ONLY 0
%define CONFIG_RT_ML_PARTITIONING 0
%define CONFIG_RUNTIME_CPU_DETECT 0
+%define CONFIG_SALIENCY_MAP 0
%define CONFIG_SHARED 0
%define CONFIG_SIZE_LIMIT 1
%define CONFIG_SPATIAL_RESAMPLING 1
diff --git a/config/x86/config/aom_config.c b/config/x86/config/aom_config.c
index d81f6b9..9873194 100644
--- a/config/x86/config/aom_config.c
+++ b/config/x86/config/aom_config.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/config/x86/config/aom_config.h b/config/x86/config/aom_config.h
index 55e58ee..502262e 100644
--- a/config/x86/config/aom_config.h
+++ b/config/x86/config/aom_config.h
@@ -10,10 +10,11 @@
*/
#ifndef AOM_CONFIG_H_
#define AOM_CONFIG_H_
-#define ARCH_ARM 0
-#define ARCH_PPC 0
-#define ARCH_X86 1
-#define ARCH_X86_64 0
+#define AOM_ARCH_AARCH64 0
+#define AOM_ARCH_ARM 0
+#define AOM_ARCH_PPC 0
+#define AOM_ARCH_X86 1
+#define AOM_ARCH_X86_64 0
#define CONFIG_ACCOUNTING 0
#define CONFIG_ANALYZER 0
#define CONFIG_AV1_DECODER 1
@@ -49,6 +50,7 @@
#define CONFIG_NORMAL_TILE_MODE 1
#define CONFIG_OPTICAL_FLOW_API 0
#define CONFIG_OS_SUPPORT 1
+#define CONFIG_OUTPUT_FRAME_SIZE 0
#define CONFIG_PARTITION_SEARCH_ORDER 0
#define CONFIG_PIC 1
#define CONFIG_RATECTRL_LOG 0
@@ -57,6 +59,7 @@
#define CONFIG_REALTIME_ONLY 0
#define CONFIG_RT_ML_PARTITIONING 0
#define CONFIG_RUNTIME_CPU_DETECT 0
+#define CONFIG_SALIENCY_MAP 0
#define CONFIG_SHARED 0
#define CONFIG_SIZE_LIMIT 1
#define CONFIG_SPATIAL_RESAMPLING 1
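Editor's note: the aom_config.h hunks mirror the .asm ones. The unprefixed ARCH_* macros are replaced by AOM_ARCH_* names (with a new AOM_ARCH_AARCH64), and CONFIG_OUTPUT_FRAME_SIZE and CONFIG_SALIENCY_MAP are added, both 0 in this x86 configuration. Downstream code still guarding on the old names will see the macro as undefined and silently fall off its fast path, so guards need updating. A minimal sketch, assuming the generated header is on the include path; LOCAL_VEC_BYTES is a hypothetical name:

    #include "config/aom_config.h"

    /* Old-style ARCH_X86 guards no longer match anything this header
     * defines; only the AOM_-prefixed names exist now. */
    #if AOM_ARCH_X86 || AOM_ARCH_X86_64
    #  define LOCAL_VEC_BYTES 16
    #else
    #  define LOCAL_VEC_BYTES 8
    #endif

    #if CONFIG_SALIENCY_MAP
    /* Saliency-map tuning compiled in (0, i.e. disabled, in this config). */
    #endif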
diff --git a/config/x86/config/aom_dsp_rtcd.h b/config/x86/config/aom_dsp_rtcd.h
index a259b8f..4521b9d 100644
--- a/config/x86/config/aom_dsp_rtcd.h
+++ b/config/x86/config/aom_dsp_rtcd.h
@@ -14,8 +14,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
#ifdef __cplusplus
@@ -50,6 +50,9 @@
void aom_comp_mask_pred_ssse3(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
#define aom_comp_mask_pred aom_comp_mask_pred_ssse3
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v);
+#define aom_compute_flow_at_point aom_compute_flow_at_point_c
+
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h);
#define aom_convolve8 aom_convolve8_c
@@ -376,92 +379,92 @@
#define aom_dist_wtd_comp_avg_pred aom_dist_wtd_comp_avg_pred_ssse3
unsigned int aom_dist_wtd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad128x128_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad128x128_avg aom_dist_wtd_sad128x128_avg_ssse3
+unsigned int aom_dist_wtd_sad128x128_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x128_avg aom_dist_wtd_sad128x128_avg_sse2
unsigned int aom_dist_wtd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad128x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad128x64_avg aom_dist_wtd_sad128x64_avg_ssse3
+unsigned int aom_dist_wtd_sad128x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x64_avg aom_dist_wtd_sad128x64_avg_sse2
unsigned int aom_dist_wtd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x16_avg aom_dist_wtd_sad16x16_avg_ssse3
+unsigned int aom_dist_wtd_sad16x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x16_avg aom_dist_wtd_sad16x16_avg_sse2
unsigned int aom_dist_wtd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x32_avg aom_dist_wtd_sad16x32_avg_ssse3
+unsigned int aom_dist_wtd_sad16x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x32_avg aom_dist_wtd_sad16x32_avg_sse2
unsigned int aom_dist_wtd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x4_avg aom_dist_wtd_sad16x4_avg_ssse3
+unsigned int aom_dist_wtd_sad16x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x4_avg aom_dist_wtd_sad16x4_avg_sse2
unsigned int aom_dist_wtd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x64_avg aom_dist_wtd_sad16x64_avg_ssse3
+unsigned int aom_dist_wtd_sad16x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x64_avg aom_dist_wtd_sad16x64_avg_sse2
unsigned int aom_dist_wtd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x8_avg aom_dist_wtd_sad16x8_avg_ssse3
+unsigned int aom_dist_wtd_sad16x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x8_avg aom_dist_wtd_sad16x8_avg_sse2
unsigned int aom_dist_wtd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x16_avg aom_dist_wtd_sad32x16_avg_ssse3
+unsigned int aom_dist_wtd_sad32x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x16_avg aom_dist_wtd_sad32x16_avg_sse2
unsigned int aom_dist_wtd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x32_avg aom_dist_wtd_sad32x32_avg_ssse3
+unsigned int aom_dist_wtd_sad32x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x32_avg aom_dist_wtd_sad32x32_avg_sse2
unsigned int aom_dist_wtd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x64_avg aom_dist_wtd_sad32x64_avg_ssse3
+unsigned int aom_dist_wtd_sad32x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x64_avg aom_dist_wtd_sad32x64_avg_sse2
unsigned int aom_dist_wtd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x8_avg aom_dist_wtd_sad32x8_avg_ssse3
+unsigned int aom_dist_wtd_sad32x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x8_avg aom_dist_wtd_sad32x8_avg_sse2
unsigned int aom_dist_wtd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x16_avg aom_dist_wtd_sad4x16_avg_ssse3
+unsigned int aom_dist_wtd_sad4x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x16_avg aom_dist_wtd_sad4x16_avg_sse2
unsigned int aom_dist_wtd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x4_avg aom_dist_wtd_sad4x4_avg_ssse3
+unsigned int aom_dist_wtd_sad4x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x4_avg aom_dist_wtd_sad4x4_avg_sse2
unsigned int aom_dist_wtd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x8_avg aom_dist_wtd_sad4x8_avg_ssse3
+unsigned int aom_dist_wtd_sad4x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x8_avg aom_dist_wtd_sad4x8_avg_sse2
unsigned int aom_dist_wtd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x128_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x128_avg aom_dist_wtd_sad64x128_avg_ssse3
+unsigned int aom_dist_wtd_sad64x128_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x128_avg aom_dist_wtd_sad64x128_avg_sse2
unsigned int aom_dist_wtd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x16_avg aom_dist_wtd_sad64x16_avg_ssse3
+unsigned int aom_dist_wtd_sad64x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x16_avg aom_dist_wtd_sad64x16_avg_sse2
unsigned int aom_dist_wtd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x32_avg aom_dist_wtd_sad64x32_avg_ssse3
+unsigned int aom_dist_wtd_sad64x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x32_avg aom_dist_wtd_sad64x32_avg_sse2
unsigned int aom_dist_wtd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x64_avg aom_dist_wtd_sad64x64_avg_ssse3
+unsigned int aom_dist_wtd_sad64x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x64_avg aom_dist_wtd_sad64x64_avg_sse2
unsigned int aom_dist_wtd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x16_avg aom_dist_wtd_sad8x16_avg_ssse3
+unsigned int aom_dist_wtd_sad8x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x16_avg aom_dist_wtd_sad8x16_avg_sse2
unsigned int aom_dist_wtd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x32_avg aom_dist_wtd_sad8x32_avg_ssse3
+unsigned int aom_dist_wtd_sad8x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x32_avg aom_dist_wtd_sad8x32_avg_sse2
unsigned int aom_dist_wtd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x4_avg aom_dist_wtd_sad8x4_avg_ssse3
+unsigned int aom_dist_wtd_sad8x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x4_avg aom_dist_wtd_sad8x4_avg_sse2
unsigned int aom_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x8_avg aom_dist_wtd_sad8x8_avg_ssse3
+unsigned int aom_dist_wtd_sad8x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x8_avg aom_dist_wtd_sad8x8_avg_sse2
uint32_t aom_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
uint32_t aom_dist_wtd_sub_pixel_avg_variance128x128_ssse3(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
@@ -559,10 +562,6 @@
void aom_fdct4x4_lp_sse2(const int16_t *input, int16_t *output, int stride);
#define aom_fdct4x4_lp aom_fdct4x4_lp_sse2
-void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_fdct8x8_sse2(const int16_t *input, tran_low_t *output, int stride);
-#define aom_fdct8x8 aom_fdct8x8_sse2
-
void aom_fft16x16_float_c(const float *input, float *temp, float *output);
void aom_fft16x16_float_sse2(const float *input, float *temp, float *output);
#define aom_fft16x16_float aom_fft16x16_float_sse2
@@ -582,16 +581,6 @@
void aom_fft8x8_float_sse2(const float *input, float *temp, float *output);
#define aom_fft8x8_float aom_fft8x8_float_sse2
-void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get16x16var aom_get16x16var_c
-
-unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-#define aom_get4x4sse_cs aom_get4x4sse_cs_c
-
-void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get8x8var_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get8x8var aom_get8x8var_sse2
-
void aom_get_blk_sse_sum_c(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
void aom_get_blk_sse_sum_sse2(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
#define aom_get_blk_sse_sum aom_get_blk_sse_sum_sse2
@@ -601,7 +590,8 @@
#define aom_get_mb_ss aom_get_mb_ss_sse2
void aom_get_var_sse_sum_16x16_dual_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
-#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_c
+void aom_get_var_sse_sum_16x16_dual_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
+#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_sse2
void aom_get_var_sse_sum_8x8_quad_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
void aom_get_var_sse_sum_8x8_quad_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
@@ -777,12 +767,6 @@
uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
-
-void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
-
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_ssse3
@@ -1200,11 +1184,11 @@
unsigned int aom_highbd_10_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_sse2
-unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_c
-unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_sse2
unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1229,11 +1213,11 @@
unsigned int aom_highbd_10_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_sse2
-unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_sse2
-unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_c
unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1249,8 +1233,8 @@
unsigned int aom_highbd_10_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_sse2
-unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_sse2
unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1265,8 +1249,8 @@
unsigned int aom_highbd_10_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_sse2
-unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_sse2
unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1342,12 +1326,6 @@
uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
-
-void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
-
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_ssse3
@@ -1765,11 +1743,11 @@
unsigned int aom_highbd_12_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_sse2
-unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
-unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_sse2
unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1794,11 +1772,11 @@
unsigned int aom_highbd_12_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_sse2
-unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_sse2
-unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1814,8 +1792,8 @@
unsigned int aom_highbd_12_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_sse2
-unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_sse2
unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1830,8 +1808,8 @@
unsigned int aom_highbd_12_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_sse2
-unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_sse2
unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1907,12 +1885,6 @@
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
-
-void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
-
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_ssse3
@@ -2198,11 +2170,11 @@
unsigned int aom_highbd_8_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_sse2
-unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
-unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_sse2
unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2227,11 +2199,11 @@
unsigned int aom_highbd_8_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_sse2
-unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_sse2
-unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2247,8 +2219,8 @@
unsigned int aom_highbd_8_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_sse2
-unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_sse2
unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2263,8 +2235,8 @@
unsigned int aom_highbd_8_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_sse2
-unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_sse2
unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2649,10 +2621,6 @@
unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
-void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_highbd_fdct8x8_sse2(const int16_t *input, tran_low_t *output, int stride);
-#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_sse2
-
void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_h_predictor_16x16_sse2(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_sse2
@@ -4592,10 +4560,6 @@
void aom_paeth_predictor_8x8_ssse3(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_ssse3
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-void aom_pixel_scale_sse2(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-#define aom_pixel_scale aom_pixel_scale_sse2
-
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
void aom_quantize_b_sse2(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
#define aom_quantize_b aom_quantize_b_sse2
@@ -4634,10 +4598,6 @@
void aom_sad128x128x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x128x4d aom_sad128x128x4d_sse2
-void aom_sad128x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad128x128x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x128x4d_avg aom_sad128x128x4d_avg_sse2
-
unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad128x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad128x64 aom_sad128x64_sse2
@@ -4653,14 +4613,6 @@
void aom_sad128x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x64x4d aom_sad128x64x4d_sse2
-void aom_sad128x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad128x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x64x4d_avg aom_sad128x64x4d_avg_sse2
-
-unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad128xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad128xh aom_sad128xh_sse2
-
unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x16 aom_sad16x16_sse2
@@ -4676,10 +4628,6 @@
void aom_sad16x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x16x4d aom_sad16x16x4d_sse2
-void aom_sad16x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x16x4d_avg aom_sad16x16x4d_avg_sse2
-
unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x32 aom_sad16x32_sse2
@@ -4695,10 +4643,6 @@
void aom_sad16x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x32x4d aom_sad16x32x4d_sse2
-void aom_sad16x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x32x4d_avg aom_sad16x32x4d_avg_sse2
-
unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x4 aom_sad16x4_sse2
@@ -4714,10 +4658,6 @@
void aom_sad16x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x4x4d aom_sad16x4x4d_sse2
-void aom_sad16x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x4x4d_avg aom_sad16x4x4d_avg_sse2
-
unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x64 aom_sad16x64_sse2
@@ -4733,10 +4673,6 @@
void aom_sad16x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x64x4d aom_sad16x64x4d_sse2
-void aom_sad16x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x64x4d_avg aom_sad16x64x4d_avg_sse2
-
unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x8 aom_sad16x8_sse2
@@ -4752,14 +4688,6 @@
void aom_sad16x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x8x4d aom_sad16x8x4d_sse2
-void aom_sad16x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x8x4d_avg aom_sad16x8x4d_avg_sse2
-
-unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad16xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad16xh aom_sad16xh_sse2
-
unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x16 aom_sad32x16_sse2
@@ -4775,10 +4703,6 @@
void aom_sad32x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x16x4d aom_sad32x16x4d_sse2
-void aom_sad32x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x16x4d_avg aom_sad32x16x4d_avg_sse2
-
unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x32 aom_sad32x32_sse2
@@ -4794,10 +4718,6 @@
void aom_sad32x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x32x4d aom_sad32x32x4d_sse2
-void aom_sad32x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x32x4d_avg aom_sad32x32x4d_avg_sse2
-
unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x64 aom_sad32x64_sse2
@@ -4813,10 +4733,6 @@
void aom_sad32x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x64x4d aom_sad32x64x4d_sse2
-void aom_sad32x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x64x4d_avg aom_sad32x64x4d_avg_sse2
-
unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x8 aom_sad32x8_sse2
@@ -4832,14 +4748,6 @@
void aom_sad32x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x8x4d aom_sad32x8x4d_sse2
-void aom_sad32x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x8x4d_avg aom_sad32x8x4d_avg_sse2
-
-unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad32xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad32xh aom_sad32xh_sse2
-
unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x16 aom_sad4x16_sse2
@@ -4855,10 +4763,6 @@
void aom_sad4x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x16x4d aom_sad4x16x4d_sse2
-void aom_sad4x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x16x4d_avg aom_sad4x16x4d_avg_sse2
-
unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x4 aom_sad4x4_sse2
@@ -4874,10 +4778,6 @@
void aom_sad4x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x4x4d aom_sad4x4x4d_sse2
-void aom_sad4x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x4x4d_avg aom_sad4x4x4d_avg_sse2
-
unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x8 aom_sad4x8_sse2
@@ -4893,14 +4793,6 @@
void aom_sad4x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x8x4d aom_sad4x8x4d_sse2
-void aom_sad4x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x8x4d_avg aom_sad4x8x4d_avg_sse2
-
-unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad4xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad4xh aom_sad4xh_sse2
-
unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x128_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x128 aom_sad64x128_sse2
@@ -4916,10 +4808,6 @@
void aom_sad64x128x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x128x4d aom_sad64x128x4d_sse2
-void aom_sad64x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x128x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x128x4d_avg aom_sad64x128x4d_avg_sse2
-
unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x16 aom_sad64x16_sse2
@@ -4935,10 +4823,6 @@
void aom_sad64x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x16x4d aom_sad64x16x4d_sse2
-void aom_sad64x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x16x4d_avg aom_sad64x16x4d_avg_sse2
-
unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x32 aom_sad64x32_sse2
@@ -4954,10 +4838,6 @@
void aom_sad64x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x32x4d aom_sad64x32x4d_sse2
-void aom_sad64x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x32x4d_avg aom_sad64x32x4d_avg_sse2
-
unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x64 aom_sad64x64_sse2
@@ -4973,14 +4853,6 @@
void aom_sad64x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x64x4d aom_sad64x64x4d_sse2
-void aom_sad64x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x64x4d_avg aom_sad64x64x4d_avg_sse2
-
-unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad64xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad64xh aom_sad64xh_sse2
-
unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x16 aom_sad8x16_sse2
@@ -4996,10 +4868,6 @@
void aom_sad8x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x16x4d aom_sad8x16x4d_sse2
-void aom_sad8x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x16x4d_avg aom_sad8x16x4d_avg_sse2
-
unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x32 aom_sad8x32_sse2
@@ -5015,10 +4883,6 @@
void aom_sad8x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x32x4d aom_sad8x32x4d_sse2
-void aom_sad8x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x32x4d_avg aom_sad8x32x4d_avg_sse2
-
unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x4 aom_sad8x4_sse2
@@ -5034,10 +4898,6 @@
void aom_sad8x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x4x4d aom_sad8x4x4d_sse2
-void aom_sad8x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x4x4d_avg aom_sad8x4x4d_avg_sse2
-
unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x8 aom_sad8x8_sse2
@@ -5053,14 +4913,6 @@
void aom_sad8x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x8x4d aom_sad8x8x4d_sse2
-void aom_sad8x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x8x4d_avg aom_sad8x8x4d_avg_sse2
-
-unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad8xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad8xh aom_sad8xh_sse2
-
unsigned int aom_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_128x128_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad_skip_128x128 aom_sad_skip_128x128_sse2
@@ -5897,7 +5749,7 @@
int aom_vector_var_c(const int16_t *ref, const int16_t *src, int bwl);
#define aom_vector_var aom_vector_var_c
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2);
#define av1_compute_cross_correlation av1_compute_cross_correlation_c
void aom_dsp_rtcd(void);
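
A note on the dispatch pattern visible throughout this header: because these configs set CONFIG_RUNTIME_CPU_DETECT to 0, each aom_* entry point is a plain macro bound to one implementation at configure time rather than a runtime function pointer. The sketch below is illustrative only and not part of the diff; the buffer names and strides are made-up placeholders, but the macro and its signature are taken verbatim from the declarations above.

/* Illustrative sketch (not part of this change): with runtime CPU
 * detection disabled, the call below resolves directly to
 * aom_sad16x64x4d_sse2() through the #define above. */
#include "config/aom_dsp_rtcd.h"

static void sketch_sad4d(const uint8_t *src, int src_stride,
                         const uint8_t *const refs[4], int ref_stride) {
  uint32_t sad[4];  /* one SAD per candidate reference block */
  aom_sad16x64x4d(src, src_stride, refs, ref_stride, sad);
}
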
diff --git a/config/x86/config/aom_scale_rtcd.h b/config/x86/config/aom_scale_rtcd.h
index 28e903d..3b70fb4 100644
--- a/config/x86/config/aom_scale_rtcd.h
+++ b/config/x86/config/aom_scale_rtcd.h
@@ -80,7 +80,7 @@
void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
-int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes);
+int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes);
#define aom_yv12_realloc_with_new_border aom_yv12_realloc_with_new_border_c
void aom_scale_rtcd(void);
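
The aom_yv12_realloc_with_new_border prototype above gains a num_pyramid_levels argument, presumably for the image pyramids used by the global motion work mentioned in the release notes; that interpretation is an assumption. The sketch below only shows the new call shape; every numeric argument is a made-up placeholder.

/* Illustrative sketch (not part of this change): old call sites must now
 * pass a pyramid-level count between byte_alignment and num_planes. */
#include "config/aom_scale_rtcd.h"

static int sketch_realloc(struct yv12_buffer_config *buf) {
  return aom_yv12_realloc_with_new_border(buf, /*new_border=*/288,
                                          /*byte_alignment=*/32,
                                          /*num_pyramid_levels=*/0,
                                          /*num_planes=*/3);
}
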
diff --git a/config/x86/config/av1_rtcd.h b/config/x86/config/av1_rtcd.h
index ef17ccb..d05b1d5 100644
--- a/config/x86/config/av1_rtcd.h
+++ b/config/x86/config/av1_rtcd.h
@@ -15,12 +15,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -86,18 +86,6 @@
int ref_stride, int subpel_search);
#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_sse2
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-void aom_comp_mask_upsampled_pred_sse2(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_sse2
-
void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
@@ -148,8 +136,8 @@
void av1_apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
#define av1_apply_selfguided_restoration av1_apply_selfguided_restoration_c
-void av1_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_apply_temporal_filter_sse2(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_sse2(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_apply_temporal_filter av1_apply_temporal_filter_sse2
int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
@@ -206,7 +194,7 @@
bool av1_cnn_predict_c( const float **input, int in_width, int in_height, int in_stride, const CNN_CONFIG *cnn_config, const CNN_THREAD_DATA *thread_data, CNN_MULTI_OUT *output_struct);
#define av1_cnn_predict av1_cnn_predict_c
-void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
#define av1_compute_stats av1_compute_stats_c
void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
@@ -256,6 +244,9 @@
void av1_dr_prediction_z3_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
#define av1_dr_prediction_z3 av1_dr_prediction_z3_c
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height, int width, int stride, int edge_thresh);
+#define av1_estimate_noise_from_single_plane av1_estimate_noise_from_single_plane_c
+
void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
#define av1_filter_intra_edge av1_filter_intra_edge_c
@@ -335,8 +326,8 @@
void av1_get_nz_map_contexts_sse2(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
#define av1_get_nz_map_contexts av1_get_nz_map_contexts_sse2
-void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_highbd_apply_temporal_filter_sse2(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_sse2(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_sse2
int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
@@ -397,8 +388,8 @@
void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
-#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_c
+double av1_highbd_estimate_noise_from_single_plane_c(const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh);
+#define av1_highbd_estimate_noise_from_single_plane av1_highbd_estimate_noise_from_single_plane_c
void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
#define av1_highbd_inv_txfm_add av1_highbd_inv_txfm_add_c
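
This header also adds av1_estimate_noise_from_single_plane and its high-bitdepth counterpart. A minimal call sketch follows, illustrative only and not part of the diff; the edge threshold is a made-up placeholder, and the (height, width) argument order is copied from the declaration above.

/* Illustrative sketch (not part of this change): calling the new
 * single-plane noise estimator declared above. Note that height
 * precedes width in the prototype. */
#include "config/av1_rtcd.h"

static double sketch_noise(const uint8_t *y_plane, int w, int h,
                           int stride) {
  return av1_estimate_noise_from_single_plane(y_plane, h, w, stride,
                                              /*edge_thresh=*/50);
}
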
diff --git a/config/x86_64/config/aom_config.asm b/config/x86_64/config/aom_config.asm
index 81d4e24..dc45eaa 100644
--- a/config/x86_64/config/aom_config.asm
+++ b/config/x86_64/config/aom_config.asm
@@ -1,7 +1,8 @@
-%define ARCH_ARM 0
-%define ARCH_PPC 0
-%define ARCH_X86 0
-%define ARCH_X86_64 1
+%define AOM_ARCH_AARCH64 0
+%define AOM_ARCH_ARM 0
+%define AOM_ARCH_PPC 0
+%define AOM_ARCH_X86 0
+%define AOM_ARCH_X86_64 1
%define CONFIG_ACCOUNTING 0
%define CONFIG_ANALYZER 0
%define CONFIG_AV1_DECODER 1
@@ -37,6 +38,7 @@
%define CONFIG_NORMAL_TILE_MODE 1
%define CONFIG_OPTICAL_FLOW_API 0
%define CONFIG_OS_SUPPORT 1
+%define CONFIG_OUTPUT_FRAME_SIZE 0
%define CONFIG_PARTITION_SEARCH_ORDER 0
%define CONFIG_PIC 1
%define CONFIG_RATECTRL_LOG 0
@@ -45,6 +47,7 @@
%define CONFIG_REALTIME_ONLY 0
%define CONFIG_RT_ML_PARTITIONING 0
%define CONFIG_RUNTIME_CPU_DETECT 0
+%define CONFIG_SALIENCY_MAP 0
%define CONFIG_SHARED 0
%define CONFIG_SIZE_LIMIT 1
%define CONFIG_SPATIAL_RESAMPLING 1
diff --git a/config/x86_64/config/aom_config.c b/config/x86_64/config/aom_config.c
index 3801952..8a75212 100644
--- a/config/x86_64/config/aom_config.c
+++ b/config/x86_64/config/aom_config.c
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2016, Alliance for Open Media. All rights reserved
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
*
* This source code is subject to the terms of the BSD 2 Clause License and
* the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
diff --git a/config/x86_64/config/aom_config.h b/config/x86_64/config/aom_config.h
index dcaa0fc..bad7861 100644
--- a/config/x86_64/config/aom_config.h
+++ b/config/x86_64/config/aom_config.h
@@ -10,10 +10,11 @@
*/
#ifndef AOM_CONFIG_H_
#define AOM_CONFIG_H_
-#define ARCH_ARM 0
-#define ARCH_PPC 0
-#define ARCH_X86 0
-#define ARCH_X86_64 1
+#define AOM_ARCH_AARCH64 0
+#define AOM_ARCH_ARM 0
+#define AOM_ARCH_PPC 0
+#define AOM_ARCH_X86 0
+#define AOM_ARCH_X86_64 1
#define CONFIG_ACCOUNTING 0
#define CONFIG_ANALYZER 0
#define CONFIG_AV1_DECODER 1
@@ -49,6 +50,7 @@
#define CONFIG_NORMAL_TILE_MODE 1
#define CONFIG_OPTICAL_FLOW_API 0
#define CONFIG_OS_SUPPORT 1
+#define CONFIG_OUTPUT_FRAME_SIZE 0
#define CONFIG_PARTITION_SEARCH_ORDER 0
#define CONFIG_PIC 1
#define CONFIG_RATECTRL_LOG 0
@@ -57,6 +59,7 @@
#define CONFIG_REALTIME_ONLY 0
#define CONFIG_RT_ML_PARTITIONING 0
#define CONFIG_RUNTIME_CPU_DETECT 0
+#define CONFIG_SALIENCY_MAP 0
#define CONFIG_SHARED 0
#define CONFIG_SIZE_LIMIT 1
#define CONFIG_SPATIAL_RESAMPLING 1
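
The hunks above rename the architecture macros from ARCH_* to AOM_ARCH_* (and add AOM_ARCH_AARCH64), so any downstream code keying on the old names must be updated. The fragment below is an illustrative sketch, not part of the diff; SKETCH_SIMD_WIDTH is a hypothetical name used only to show the rename.

/* Illustrative sketch (not part of this change): conditional compilation
 * must switch to the AOM_ARCH_* spellings defined above. */
#include "config/aom_config.h"

#if AOM_ARCH_X86_64 /* was: #if ARCH_X86_64 */
#define SKETCH_SIMD_WIDTH 16
#else
#define SKETCH_SIMD_WIDTH 1
#endif
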
diff --git a/config/x86_64/config/aom_dsp_rtcd.h b/config/x86_64/config/aom_dsp_rtcd.h
index c4f99d6..cfb7380 100644
--- a/config/x86_64/config/aom_dsp_rtcd.h
+++ b/config/x86_64/config/aom_dsp_rtcd.h
@@ -14,8 +14,8 @@
#include "aom/aom_integer.h"
#include "aom_dsp/aom_dsp_common.h"
-#include "av1/common/enums.h"
#include "av1/common/blockd.h"
+#include "av1/common/enums.h"
#ifdef __cplusplus
@@ -50,6 +50,9 @@
void aom_comp_mask_pred_ssse3(uint8_t *comp_pred, const uint8_t *pred, int width, int height, const uint8_t *ref, int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask);
#define aom_comp_mask_pred aom_comp_mask_pred_ssse3
+void aom_compute_flow_at_point_c(const uint8_t *src, const uint8_t *ref, int x, int y, int width, int height, int stride, double *u, double *v);
+#define aom_compute_flow_at_point aom_compute_flow_at_point_c
+
void aom_convolve8_c(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const InterpKernel *filter, int x0_q4, int x_step_q4, int y0_q4, int y_step_q4, int w, int h);
#define aom_convolve8 aom_convolve8_c
@@ -376,92 +379,92 @@
#define aom_dist_wtd_comp_avg_pred aom_dist_wtd_comp_avg_pred_ssse3
unsigned int aom_dist_wtd_sad128x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad128x128_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad128x128_avg aom_dist_wtd_sad128x128_avg_ssse3
+unsigned int aom_dist_wtd_sad128x128_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x128_avg aom_dist_wtd_sad128x128_avg_sse2
unsigned int aom_dist_wtd_sad128x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad128x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad128x64_avg aom_dist_wtd_sad128x64_avg_ssse3
+unsigned int aom_dist_wtd_sad128x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad128x64_avg aom_dist_wtd_sad128x64_avg_sse2
unsigned int aom_dist_wtd_sad16x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x16_avg aom_dist_wtd_sad16x16_avg_ssse3
+unsigned int aom_dist_wtd_sad16x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x16_avg aom_dist_wtd_sad16x16_avg_sse2
unsigned int aom_dist_wtd_sad16x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x32_avg aom_dist_wtd_sad16x32_avg_ssse3
+unsigned int aom_dist_wtd_sad16x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x32_avg aom_dist_wtd_sad16x32_avg_sse2
unsigned int aom_dist_wtd_sad16x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x4_avg aom_dist_wtd_sad16x4_avg_ssse3
+unsigned int aom_dist_wtd_sad16x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x4_avg aom_dist_wtd_sad16x4_avg_sse2
unsigned int aom_dist_wtd_sad16x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x64_avg aom_dist_wtd_sad16x64_avg_ssse3
+unsigned int aom_dist_wtd_sad16x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x64_avg aom_dist_wtd_sad16x64_avg_sse2
unsigned int aom_dist_wtd_sad16x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad16x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad16x8_avg aom_dist_wtd_sad16x8_avg_ssse3
+unsigned int aom_dist_wtd_sad16x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad16x8_avg aom_dist_wtd_sad16x8_avg_sse2
unsigned int aom_dist_wtd_sad32x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x16_avg aom_dist_wtd_sad32x16_avg_ssse3
+unsigned int aom_dist_wtd_sad32x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x16_avg aom_dist_wtd_sad32x16_avg_sse2
unsigned int aom_dist_wtd_sad32x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x32_avg aom_dist_wtd_sad32x32_avg_ssse3
+unsigned int aom_dist_wtd_sad32x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x32_avg aom_dist_wtd_sad32x32_avg_sse2
unsigned int aom_dist_wtd_sad32x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x64_avg aom_dist_wtd_sad32x64_avg_ssse3
+unsigned int aom_dist_wtd_sad32x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x64_avg aom_dist_wtd_sad32x64_avg_sse2
unsigned int aom_dist_wtd_sad32x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad32x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad32x8_avg aom_dist_wtd_sad32x8_avg_ssse3
+unsigned int aom_dist_wtd_sad32x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad32x8_avg aom_dist_wtd_sad32x8_avg_sse2
unsigned int aom_dist_wtd_sad4x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x16_avg aom_dist_wtd_sad4x16_avg_ssse3
+unsigned int aom_dist_wtd_sad4x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x16_avg aom_dist_wtd_sad4x16_avg_sse2
unsigned int aom_dist_wtd_sad4x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x4_avg aom_dist_wtd_sad4x4_avg_ssse3
+unsigned int aom_dist_wtd_sad4x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x4_avg aom_dist_wtd_sad4x4_avg_sse2
unsigned int aom_dist_wtd_sad4x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad4x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad4x8_avg aom_dist_wtd_sad4x8_avg_ssse3
+unsigned int aom_dist_wtd_sad4x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad4x8_avg aom_dist_wtd_sad4x8_avg_sse2
unsigned int aom_dist_wtd_sad64x128_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x128_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x128_avg aom_dist_wtd_sad64x128_avg_ssse3
+unsigned int aom_dist_wtd_sad64x128_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x128_avg aom_dist_wtd_sad64x128_avg_sse2
unsigned int aom_dist_wtd_sad64x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x16_avg aom_dist_wtd_sad64x16_avg_ssse3
+unsigned int aom_dist_wtd_sad64x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x16_avg aom_dist_wtd_sad64x16_avg_sse2
unsigned int aom_dist_wtd_sad64x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x32_avg aom_dist_wtd_sad64x32_avg_ssse3
+unsigned int aom_dist_wtd_sad64x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x32_avg aom_dist_wtd_sad64x32_avg_sse2
unsigned int aom_dist_wtd_sad64x64_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad64x64_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad64x64_avg aom_dist_wtd_sad64x64_avg_ssse3
+unsigned int aom_dist_wtd_sad64x64_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad64x64_avg aom_dist_wtd_sad64x64_avg_sse2
unsigned int aom_dist_wtd_sad8x16_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x16_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x16_avg aom_dist_wtd_sad8x16_avg_ssse3
+unsigned int aom_dist_wtd_sad8x16_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x16_avg aom_dist_wtd_sad8x16_avg_sse2
unsigned int aom_dist_wtd_sad8x32_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x32_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x32_avg aom_dist_wtd_sad8x32_avg_ssse3
+unsigned int aom_dist_wtd_sad8x32_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x32_avg aom_dist_wtd_sad8x32_avg_sse2
unsigned int aom_dist_wtd_sad8x4_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x4_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x4_avg aom_dist_wtd_sad8x4_avg_ssse3
+unsigned int aom_dist_wtd_sad8x4_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x4_avg aom_dist_wtd_sad8x4_avg_sse2
unsigned int aom_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-unsigned int aom_dist_wtd_sad8x8_avg_ssse3(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
-#define aom_dist_wtd_sad8x8_avg aom_dist_wtd_sad8x8_avg_ssse3
+unsigned int aom_dist_wtd_sad8x8_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
+#define aom_dist_wtd_sad8x8_avg aom_dist_wtd_sad8x8_avg_sse2
uint32_t aom_dist_wtd_sub_pixel_avg_variance128x128_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
uint32_t aom_dist_wtd_sub_pixel_avg_variance128x128_ssse3(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS *jcp_param);
@@ -559,11 +562,6 @@
void aom_fdct4x4_lp_sse2(const int16_t *input, int16_t *output, int stride);
#define aom_fdct4x4_lp aom_fdct4x4_lp_sse2
-void aom_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_fdct8x8_sse2(const int16_t *input, tran_low_t *output, int stride);
-void aom_fdct8x8_ssse3(const int16_t *input, tran_low_t *output, int stride);
-#define aom_fdct8x8 aom_fdct8x8_ssse3
-
void aom_fft16x16_float_c(const float *input, float *temp, float *output);
void aom_fft16x16_float_sse2(const float *input, float *temp, float *output);
#define aom_fft16x16_float aom_fft16x16_float_sse2
@@ -583,16 +581,6 @@
void aom_fft8x8_float_sse2(const float *input, float *temp, float *output);
#define aom_fft8x8_float aom_fft8x8_float_sse2
-void aom_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get16x16var aom_get16x16var_c
-
-unsigned int aom_get4x4sse_cs_c(const unsigned char *src_ptr, int source_stride, const unsigned char *ref_ptr, int ref_stride);
-#define aom_get4x4sse_cs aom_get4x4sse_cs_c
-
-void aom_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-void aom_get8x8var_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_get8x8var aom_get8x8var_sse2
-
void aom_get_blk_sse_sum_c(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
void aom_get_blk_sse_sum_sse2(const int16_t *data, int stride, int bw, int bh, int *x_sum, int64_t *x2_sum);
#define aom_get_blk_sse_sum aom_get_blk_sse_sum_sse2
@@ -602,7 +590,8 @@
#define aom_get_mb_ss aom_get_mb_ss_sse2
void aom_get_var_sse_sum_16x16_dual_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
-#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_c
+void aom_get_var_sse_sum_16x16_dual_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse16x16, unsigned int *tot_sse, int *tot_sum, uint32_t *var16x16);
+#define aom_get_var_sse_sum_16x16_dual aom_get_var_sse_sum_16x16_dual_sse2
void aom_get_var_sse_sum_8x8_quad_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
void aom_get_var_sse_sum_8x8_quad_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse8x8, int *sum8x8, unsigned int *tot_sse, int *tot_sum, uint32_t *var8x8);
@@ -778,12 +767,6 @@
uint32_t aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_10_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_10_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get16x16var aom_highbd_10_get16x16var_c
-
-void aom_highbd_10_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_10_get8x8var aom_highbd_10_get8x8var_c
-
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_10_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_10_masked_sub_pixel_variance128x128 aom_highbd_10_masked_sub_pixel_variance128x128_ssse3
@@ -1201,11 +1184,11 @@
unsigned int aom_highbd_10_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x32 aom_highbd_10_variance16x32_sse2
-unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x4 aom_highbd_10_variance16x4_c
-unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance16x64 aom_highbd_10_variance16x64_sse2
unsigned int aom_highbd_10_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1230,11 +1213,11 @@
unsigned int aom_highbd_10_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x64 aom_highbd_10_variance32x64_sse2
-unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance32x8 aom_highbd_10_variance32x8_sse2
-unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance4x16 aom_highbd_10_variance4x16_c
unsigned int aom_highbd_10_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1250,8 +1233,8 @@
unsigned int aom_highbd_10_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x128 aom_highbd_10_variance64x128_sse2
-unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance64x16 aom_highbd_10_variance64x16_sse2
unsigned int aom_highbd_10_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1266,8 +1249,8 @@
unsigned int aom_highbd_10_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x16 aom_highbd_10_variance8x16_sse2
-unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_10_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_10_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_10_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_10_variance8x32 aom_highbd_10_variance8x32_sse2
unsigned int aom_highbd_10_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1343,12 +1326,6 @@
uint32_t aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_12_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_12_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get16x16var aom_highbd_12_get16x16var_c
-
-void aom_highbd_12_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_12_get8x8var aom_highbd_12_get8x8var_c
-
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_12_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_12_masked_sub_pixel_variance128x128 aom_highbd_12_masked_sub_pixel_variance128x128_ssse3
@@ -1766,11 +1743,11 @@
unsigned int aom_highbd_12_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x32 aom_highbd_12_variance16x32_sse2
-unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x4 aom_highbd_12_variance16x4_c
-unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance16x64 aom_highbd_12_variance16x64_sse2
unsigned int aom_highbd_12_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1795,11 +1772,11 @@
unsigned int aom_highbd_12_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x64 aom_highbd_12_variance32x64_sse2
-unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance32x8 aom_highbd_12_variance32x8_sse2
-unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance4x16 aom_highbd_12_variance4x16_c
unsigned int aom_highbd_12_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1815,8 +1792,8 @@
unsigned int aom_highbd_12_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x128 aom_highbd_12_variance64x128_sse2
-unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance64x16 aom_highbd_12_variance64x16_sse2
unsigned int aom_highbd_12_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1831,8 +1808,8 @@
unsigned int aom_highbd_12_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x16 aom_highbd_12_variance8x16_sse2
-unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_12_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_12_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_12_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_12_variance8x32 aom_highbd_12_variance8x32_sse2
unsigned int aom_highbd_12_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -1908,12 +1885,6 @@
uint32_t aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c(const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8 aom_highbd_8_dist_wtd_sub_pixel_avg_variance8x8_c
-void aom_highbd_8_get16x16var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get16x16var aom_highbd_8_get16x16var_c
-
-void aom_highbd_8_get8x8var_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse, int *sum);
-#define aom_highbd_8_get8x8var aom_highbd_8_get8x8var_c
-
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_c(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
unsigned int aom_highbd_8_masked_sub_pixel_variance128x128_ssse3(const uint8_t *src, int src_stride, int xoffset, int yoffset, const uint8_t *ref, int ref_stride, const uint8_t *second_pred, const uint8_t *msk, int msk_stride, int invert_mask, unsigned int *sse);
#define aom_highbd_8_masked_sub_pixel_variance128x128 aom_highbd_8_masked_sub_pixel_variance128x128_ssse3
@@ -2199,11 +2170,11 @@
unsigned int aom_highbd_8_variance16x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x32 aom_highbd_8_variance16x32_sse2
-unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x4 aom_highbd_8_variance16x4_c
-unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance16x64_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance16x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance16x64 aom_highbd_8_variance16x64_sse2
unsigned int aom_highbd_8_variance16x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2228,11 +2199,11 @@
unsigned int aom_highbd_8_variance32x64_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x64 aom_highbd_8_variance32x64_sse2
-unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance32x8_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance32x8_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance32x8 aom_highbd_8_variance32x8_sse2
-unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance4x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance4x16 aom_highbd_8_variance4x16_c
unsigned int aom_highbd_8_variance4x2_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2248,8 +2219,8 @@
unsigned int aom_highbd_8_variance64x128_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x128 aom_highbd_8_variance64x128_sse2
-unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance64x16_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance64x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance64x16 aom_highbd_8_variance64x16_sse2
unsigned int aom_highbd_8_variance64x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2264,8 +2235,8 @@
unsigned int aom_highbd_8_variance8x16_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x16 aom_highbd_8_variance8x16_sse2
-unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
-unsigned int aom_highbd_8_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse);
+unsigned int aom_highbd_8_variance8x32_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
+unsigned int aom_highbd_8_variance8x32_sse2(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
#define aom_highbd_8_variance8x32 aom_highbd_8_variance8x32_sse2
unsigned int aom_highbd_8_variance8x4_c(const uint8_t *src_ptr, int source_stride, const uint8_t *ref_ptr, int ref_stride, unsigned int *sse);
@@ -2650,10 +2621,6 @@
unsigned int aom_highbd_dist_wtd_sad8x8_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred, const DIST_WTD_COMP_PARAMS* jcp_param);
#define aom_highbd_dist_wtd_sad8x8_avg aom_highbd_dist_wtd_sad8x8_avg_c
-void aom_highbd_fdct8x8_c(const int16_t *input, tran_low_t *output, int stride);
-void aom_highbd_fdct8x8_sse2(const int16_t *input, tran_low_t *output, int stride);
-#define aom_highbd_fdct8x8 aom_highbd_fdct8x8_sse2
-
void aom_highbd_h_predictor_16x16_c(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
void aom_highbd_h_predictor_16x16_sse2(uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd);
#define aom_highbd_h_predictor_16x16 aom_highbd_h_predictor_16x16_sse2
@@ -4593,10 +4560,6 @@
void aom_paeth_predictor_8x8_ssse3(uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left);
#define aom_paeth_predictor_8x8 aom_paeth_predictor_8x8_ssse3
-void aom_pixel_scale_c(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-void aom_pixel_scale_sse2(const int16_t *src_diff, ptrdiff_t src_stride, int16_t *coeff, int log_scale, int h8, int w8);
-#define aom_pixel_scale aom_pixel_scale_sse2
-
void aom_quantize_b_c(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
void aom_quantize_b_sse2(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
void aom_quantize_b_ssse3(const tran_low_t *coeff_ptr, intptr_t n_coeffs, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan);
@@ -4637,10 +4600,6 @@
void aom_sad128x128x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x128x4d aom_sad128x128x4d_sse2
-void aom_sad128x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad128x128x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x128x4d_avg aom_sad128x128x4d_avg_sse2
-
unsigned int aom_sad128x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad128x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad128x64 aom_sad128x64_sse2
@@ -4656,14 +4615,6 @@
void aom_sad128x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad128x64x4d aom_sad128x64x4d_sse2
-void aom_sad128x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad128x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad128x64x4d_avg aom_sad128x64x4d_avg_sse2
-
-unsigned int aom_sad128xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad128xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad128xh aom_sad128xh_sse2
-
unsigned int aom_sad16x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x16 aom_sad16x16_sse2
@@ -4679,10 +4630,6 @@
void aom_sad16x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x16x4d aom_sad16x16x4d_sse2
-void aom_sad16x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x16x4d_avg aom_sad16x16x4d_avg_sse2
-
unsigned int aom_sad16x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x32 aom_sad16x32_sse2
@@ -4698,10 +4645,6 @@
void aom_sad16x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x32x4d aom_sad16x32x4d_sse2
-void aom_sad16x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x32x4d_avg aom_sad16x32x4d_avg_sse2
-
unsigned int aom_sad16x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x4 aom_sad16x4_sse2
@@ -4717,10 +4660,6 @@
void aom_sad16x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x4x4d aom_sad16x4x4d_sse2
-void aom_sad16x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x4x4d_avg aom_sad16x4x4d_avg_sse2
-
unsigned int aom_sad16x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x64 aom_sad16x64_sse2
@@ -4736,10 +4675,6 @@
void aom_sad16x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x64x4d aom_sad16x64x4d_sse2
-void aom_sad16x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x64x4d_avg aom_sad16x64x4d_avg_sse2
-
unsigned int aom_sad16x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad16x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad16x8 aom_sad16x8_sse2
@@ -4755,14 +4690,6 @@
void aom_sad16x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad16x8x4d aom_sad16x8x4d_sse2
-void aom_sad16x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad16x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad16x8x4d_avg aom_sad16x8x4d_avg_sse2
-
-unsigned int aom_sad16xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad16xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad16xh aom_sad16xh_sse2
-
unsigned int aom_sad32x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x16 aom_sad32x16_sse2
@@ -4778,10 +4705,6 @@
void aom_sad32x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x16x4d aom_sad32x16x4d_sse2
-void aom_sad32x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x16x4d_avg aom_sad32x16x4d_avg_sse2
-
unsigned int aom_sad32x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x32 aom_sad32x32_sse2
@@ -4797,10 +4720,6 @@
void aom_sad32x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x32x4d aom_sad32x32x4d_sse2
-void aom_sad32x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x32x4d_avg aom_sad32x32x4d_avg_sse2
-
unsigned int aom_sad32x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x64 aom_sad32x64_sse2
@@ -4816,10 +4735,6 @@
void aom_sad32x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x64x4d aom_sad32x64x4d_sse2
-void aom_sad32x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x64x4d_avg aom_sad32x64x4d_avg_sse2
-
unsigned int aom_sad32x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad32x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad32x8 aom_sad32x8_sse2
@@ -4835,14 +4750,6 @@
void aom_sad32x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad32x8x4d aom_sad32x8x4d_sse2
-void aom_sad32x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad32x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad32x8x4d_avg aom_sad32x8x4d_avg_sse2
-
-unsigned int aom_sad32xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad32xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad32xh aom_sad32xh_sse2
-
unsigned int aom_sad4x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x16 aom_sad4x16_sse2
@@ -4858,10 +4765,6 @@
void aom_sad4x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x16x4d aom_sad4x16x4d_sse2
-void aom_sad4x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x16x4d_avg aom_sad4x16x4d_avg_sse2
-
unsigned int aom_sad4x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x4 aom_sad4x4_sse2
@@ -4877,10 +4780,6 @@
void aom_sad4x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x4x4d aom_sad4x4x4d_sse2
-void aom_sad4x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x4x4d_avg aom_sad4x4x4d_avg_sse2
-
unsigned int aom_sad4x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad4x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad4x8 aom_sad4x8_sse2
@@ -4896,14 +4795,6 @@
void aom_sad4x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad4x8x4d aom_sad4x8x4d_sse2
-void aom_sad4x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad4x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad4x8x4d_avg aom_sad4x8x4d_avg_sse2
-
-unsigned int aom_sad4xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad4xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad4xh aom_sad4xh_sse2
-
unsigned int aom_sad64x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x128_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x128 aom_sad64x128_sse2
@@ -4919,10 +4810,6 @@
void aom_sad64x128x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x128x4d aom_sad64x128x4d_sse2
-void aom_sad64x128x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x128x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x128x4d_avg aom_sad64x128x4d_avg_sse2
-
unsigned int aom_sad64x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x16 aom_sad64x16_sse2
@@ -4938,10 +4825,6 @@
void aom_sad64x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x16x4d aom_sad64x16x4d_sse2
-void aom_sad64x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x16x4d_avg aom_sad64x16x4d_avg_sse2
-
unsigned int aom_sad64x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x32 aom_sad64x32_sse2
@@ -4957,10 +4840,6 @@
void aom_sad64x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x32x4d aom_sad64x32x4d_sse2
-void aom_sad64x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x32x4d_avg aom_sad64x32x4d_avg_sse2
-
unsigned int aom_sad64x64_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad64x64_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad64x64 aom_sad64x64_sse2
@@ -4976,14 +4855,6 @@
void aom_sad64x64x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad64x64x4d aom_sad64x64x4d_sse2
-void aom_sad64x64x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad64x64x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad64x64x4d_avg aom_sad64x64x4d_avg_sse2
-
-unsigned int aom_sad64xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad64xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad64xh aom_sad64xh_sse2
-
unsigned int aom_sad8x16_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x16_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x16 aom_sad8x16_sse2
@@ -4999,10 +4870,6 @@
void aom_sad8x16x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x16x4d aom_sad8x16x4d_sse2
-void aom_sad8x16x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x16x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x16x4d_avg aom_sad8x16x4d_avg_sse2
-
unsigned int aom_sad8x32_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x32_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x32 aom_sad8x32_sse2
@@ -5018,10 +4885,6 @@
void aom_sad8x32x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x32x4d aom_sad8x32x4d_sse2
-void aom_sad8x32x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x32x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x32x4d_avg aom_sad8x32x4d_avg_sse2
-
unsigned int aom_sad8x4_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x4_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x4 aom_sad8x4_sse2
@@ -5037,10 +4900,6 @@
void aom_sad8x4x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x4x4d aom_sad8x4x4d_sse2
-void aom_sad8x4x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x4x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x4x4d_avg aom_sad8x4x4d_avg_sse2
-
unsigned int aom_sad8x8_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad8x8_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad8x8 aom_sad8x8_sse2
@@ -5056,14 +4915,6 @@
void aom_sad8x8x4d_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, uint32_t sad_array[4]);
#define aom_sad8x8x4d aom_sad8x8x4d_sse2
-void aom_sad8x8x4d_avg_c(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-void aom_sad8x8x4d_avg_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[4], int ref_stride, const uint8_t *second_pred, uint32_t sad_array[4]);
-#define aom_sad8x8x4d_avg aom_sad8x8x4d_avg_sse2
-
-unsigned int aom_sad8xh_c(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-unsigned int aom_sad8xh_sse2(const uint8_t *a, int a_stride, const uint8_t *b, int b_stride, int width, int height);
-#define aom_sad8xh aom_sad8xh_sse2
-
unsigned int aom_sad_skip_128x128_c(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
unsigned int aom_sad_skip_128x128_sse2(const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride);
#define aom_sad_skip_128x128 aom_sad_skip_128x128_sse2
@@ -5901,7 +5752,7 @@
int aom_vector_var_c(const int16_t *ref, const int16_t *src, int bwl);
#define aom_vector_var aom_vector_var_c
-double av1_compute_cross_correlation_c(unsigned char *im1, int stride1, int x1, int y1, unsigned char *im2, int stride2, int x2, int y2);
+double av1_compute_cross_correlation_c(const unsigned char *frame1, int stride1, int x1, int y1, const unsigned char *frame2, int stride2, int x2, int y2);
#define av1_compute_cross_correlation av1_compute_cross_correlation_c
void aom_dsp_rtcd(void);
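The long run of uint32_t to unsigned int swaps above is a prototype harmonization of this generated header; on this x86_64 config the two types are identical and the kernels themselves are unchanged. For orientation, every one of these variance functions follows the same contract: fill *sse with the sum of squared differences over the block and return sse - sum*sum/(w*h). A plain-C sketch of that contract (the textbook definition, not the libaom source):

    #include <stdint.h>

    /* Reference block variance: writes the SSE through *sse and returns
     * sse - sum^2 / (w * h), matching the prototypes above. */
    static unsigned int variance_ref(const uint8_t *src, int src_stride,
                                     const uint8_t *ref, int ref_stride,
                                     int w, int h, unsigned int *sse) {
      int64_t sum = 0;
      uint64_t sse64 = 0;
      for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
          const int diff = src[x] - ref[x];
          sum += diff;
          sse64 += (uint64_t)(diff * diff);
        }
        src += src_stride;
        ref += ref_stride;
      }
      *sse = (unsigned int)sse64;
      return (unsigned int)(sse64 - (uint64_t)(sum * sum) / (w * h));
    }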
diff --git a/config/x86_64/config/aom_scale_rtcd.h b/config/x86_64/config/aom_scale_rtcd.h
index 28e903d..3b70fb4 100644
--- a/config/x86_64/config/aom_scale_rtcd.h
+++ b/config/x86_64/config/aom_scale_rtcd.h
@@ -80,7 +80,7 @@
void aom_yv12_partial_copy_y_c(const struct yv12_buffer_config *src_ybc, int hstart1, int hend1, int vstart1, int vend1, struct yv12_buffer_config *dst_ybc, int hstart2, int vstart2);
#define aom_yv12_partial_copy_y aom_yv12_partial_copy_y_c
-int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_planes);
+int aom_yv12_realloc_with_new_border_c(struct yv12_buffer_config *ybf, int new_border, int byte_alignment, int num_pyramid_levels, int num_planes);
#define aom_yv12_realloc_with_new_border aom_yv12_realloc_with_new_border_c
void aom_scale_rtcd(void);
diff --git a/config/x86_64/config/av1_rtcd.h b/config/x86_64/config/av1_rtcd.h
index 00a607d..c64a024 100644
--- a/config/x86_64/config/av1_rtcd.h
+++ b/config/x86_64/config/av1_rtcd.h
@@ -15,12 +15,12 @@
#include "aom/aom_integer.h"
#include "aom_dsp/odintrin.h"
#include "aom_dsp/txfm_common.h"
-#include "av1/common/common.h"
-#include "av1/common/enums.h"
-#include "av1/common/quant_common.h"
-#include "av1/common/filter.h"
-#include "av1/common/convolve.h"
#include "av1/common/av1_txfm.h"
+#include "av1/common/common.h"
+#include "av1/common/convolve.h"
+#include "av1/common/enums.h"
+#include "av1/common/filter.h"
+#include "av1/common/quant_common.h"
#include "av1/common/restoration.h"
struct macroblockd;
@@ -86,18 +86,6 @@
int ref_stride, int subpel_search);
#define aom_comp_avg_upsampled_pred aom_comp_avg_upsampled_pred_sse2
-void aom_comp_mask_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-void aom_comp_mask_upsampled_pred_sse2(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
- const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
- int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
- int ref_stride, const uint8_t *mask, int mask_stride, int invert_mask,
- int subpel_search);
-#define aom_comp_mask_upsampled_pred aom_comp_mask_upsampled_pred_sse2
-
void aom_dist_wtd_comp_avg_upsampled_pred_c(MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
const MV *const mv, uint8_t *comp_pred, const uint8_t *pred, int width,
int height, int subpel_x_q3, int subpel_y_q3, const uint8_t *ref,
@@ -148,8 +136,8 @@
void av1_apply_selfguided_restoration_c(const uint8_t *dat, int width, int height, int stride, int eps, const int *xqd, uint8_t *dst, int dst_stride, int32_t *tmpbuf, int bit_depth, int highbd);
#define av1_apply_selfguided_restoration av1_apply_selfguided_restoration_c
-void av1_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_apply_temporal_filter_sse2(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_apply_temporal_filter_sse2(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_apply_temporal_filter av1_apply_temporal_filter_sse2
int64_t av1_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz);
@@ -206,7 +194,7 @@
bool av1_cnn_predict_c( const float **input, int in_width, int in_height, int in_stride, const CNN_CONFIG *cnn_config, const CNN_THREAD_DATA *thread_data, CNN_MULTI_OUT *output_struct);
#define av1_cnn_predict av1_cnn_predict_c
-void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
+void av1_compute_stats_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int16_t *dgd_avg, int16_t *src_avg, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, int use_downsampled_wiener_stats);
#define av1_compute_stats av1_compute_stats_c
void av1_compute_stats_highbd_c(int wiener_win, const uint8_t *dgd8, const uint8_t *src8, int h_start, int h_end, int v_start, int v_end, int dgd_stride, int src_stride, int64_t *M, int64_t *H, aom_bit_depth_t bit_depth);
@@ -256,6 +244,9 @@
void av1_dr_prediction_z3_c(uint8_t *dst, ptrdiff_t stride, int bw, int bh, const uint8_t *above, const uint8_t *left, int upsample_left, int dx, int dy);
#define av1_dr_prediction_z3 av1_dr_prediction_z3_c
+double av1_estimate_noise_from_single_plane_c(const uint8_t *src, int height, int width, int stride, int edge_thresh);
+#define av1_estimate_noise_from_single_plane av1_estimate_noise_from_single_plane_c
+
void av1_filter_intra_edge_c(uint8_t *p, int sz, int strength);
#define av1_filter_intra_edge av1_filter_intra_edge_c
@@ -335,8 +326,8 @@
void av1_get_nz_map_contexts_sse2(const uint8_t *const levels, const int16_t *const scan, const uint16_t eob, const TX_SIZE tx_size, const TX_CLASS tx_class, int8_t *const coeff_contexts);
#define av1_get_nz_map_contexts av1_get_nz_map_contexts_sse2
-void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
-void av1_highbd_apply_temporal_filter_sse2(const struct yv12_buffer_config *ref_frame, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_c(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
+void av1_highbd_apply_temporal_filter_sse2(const struct yv12_buffer_config *frame_to_filter, const struct macroblockd *mbd, const BLOCK_SIZE block_size, const int mb_row, const int mb_col, const int num_planes, const double *noise_levels, const MV *subblock_mvs, const int *subblock_mses, const int q_factor, const int filter_strength, int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
#define av1_highbd_apply_temporal_filter av1_highbd_apply_temporal_filter_sse2
int64_t av1_highbd_block_error_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd);
@@ -400,8 +391,8 @@
void av1_highbd_dr_prediction_z3_c(uint16_t *dst, ptrdiff_t stride, int bw, int bh, const uint16_t *above, const uint16_t *left, int upsample_left, int dx, int dy, int bd);
#define av1_highbd_dr_prediction_z3 av1_highbd_dr_prediction_z3_c
-void av1_highbd_fwht4x4_c(const int16_t *input, tran_low_t *output, int stride);
-#define av1_highbd_fwht4x4 av1_highbd_fwht4x4_c
+double av1_highbd_estimate_noise_from_single_plane_c(const uint16_t *src, int height, int width, int stride, int bit_depth, int edge_thresh);
+#define av1_highbd_estimate_noise_from_single_plane av1_highbd_estimate_noise_from_single_plane_c
void av1_highbd_inv_txfm_add_c(const tran_low_t *input, uint8_t *dest, int stride, const TxfmParam *txfm_param);
#define av1_highbd_inv_txfm_add av1_highbd_inv_txfm_add_c
diff --git a/docs.cmake b/docs.cmake
index 0825ca4..0d8db92 100644
--- a/docs.cmake
+++ b/docs.cmake
@@ -100,7 +100,7 @@
"Scalable encoder loop.")
set(AOM_DOXYGEN_EXAMPLE_SOURCES ${AOM_DOXYGEN_EXAMPLE_SOURCES}
- "${AOM_ROOT}/examples/svc_encoder_rtc.c")
+ "${AOM_ROOT}/examples/svc_encoder_rtc.cc")
set(AOM_DOXYGEN_EXAMPLE_DESCRIPTIONS ${AOM_DOXYGEN_EXAMPLE_DESCRIPTIONS}
"Layered encoder for RTC.")
diff --git a/examples/encoder_util.h b/examples/encoder_util.h
index a6bb3fb..fa0e7d1 100644
--- a/examples/encoder_util.h
+++ b/examples/encoder_util.h
@@ -14,6 +14,10 @@
#ifndef AOM_EXAMPLES_ENCODER_UTIL_H_
#define AOM_EXAMPLES_ENCODER_UTIL_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+
#include "aom/aom_image.h"
// Returns mismatch location (?loc[0],?loc[1]) and the values at that location
@@ -30,4 +34,7 @@
int aom_compare_img(const aom_image_t *const img1,
const aom_image_t *const img2);
+#ifdef __cplusplus
+}
+#endif
#endif // AOM_EXAMPLES_ENCODER_UTIL_H_
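These guards matter because svc_encoder_rtc.c becomes a C++ file later in this patch while encoder_util remains a C translation unit; without extern "C" the C++ includer would emit mangled references to aom_compare_img and the other helpers here and fail at link time. A minimal sketch of the idiom, with a hypothetical header and function:

    /* util.h -- a C header that both C and C++ callers can include. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    int util_add(int a, int b); /* defined in a C translation unit */

    #ifdef __cplusplus
    } /* C linkage: C++ callers now link against the unmangled symbol */
    #endif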
diff --git a/examples/inspect.c b/examples/inspect.c
index 8e7213a..ed77b5d 100644
--- a/examples/inspect.c
+++ b/examples/inspect.c
@@ -509,7 +509,6 @@
int r, c, t, i;
if (compress && len == 1) {
die("Can't encode scalars as arrays when RLE compression is enabled.");
- return -1;
}
if (map) {
buf += snprintf(buf, MAX_BUFFER, " \"%sMap\": {", name);
diff --git a/examples/lightfield_bitstream_parsing.c b/examples/lightfield_bitstream_parsing.c
index 35b4ad0..05272ba 100644
--- a/examples/lightfield_bitstream_parsing.c
+++ b/examples/lightfield_bitstream_parsing.c
@@ -92,15 +92,14 @@
case AOM_IMG_FMT_I44416: return 48;
default: die("Invalid image format");
}
- return 0;
}
-void process_tile_list(const TILE_LIST_INFO *tiles, int num_tiles,
- aom_codec_pts_t tl_pts, unsigned char **frames,
- const size_t *frame_sizes, aom_codec_ctx_t *codec,
- unsigned char *tl_buf, AvxVideoWriter *writer,
- uint8_t output_frame_width_in_tiles_minus_1,
- uint8_t output_frame_height_in_tiles_minus_1) {
+static void process_tile_list(const TILE_LIST_INFO *tiles, int num_tiles,
+ aom_codec_pts_t tl_pts, unsigned char **frames,
+ const size_t *frame_sizes, aom_codec_ctx_t *codec,
+ unsigned char *tl_buf, AvxVideoWriter *writer,
+ uint8_t output_frame_width_in_tiles_minus_1,
+ uint8_t output_frame_height_in_tiles_minus_1) {
unsigned char *tl = tl_buf;
struct aom_write_bit_buffer wb = { tl, 0 };
unsigned char *saved_obu_size_loc = NULL;
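This hunk and the inspect.c hunk above both delete a dead return after die(); that stays warning-clean only because die() never returns, which the tools presumably advertise with a noreturn attribute. A self-contained sketch of the idiom (stand-in die(), not the tools' exact declaration):

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    #ifdef __GNUC__
    #define TOOLS_NORETURN __attribute__((noreturn))
    #else
    #define TOOLS_NORETURN
    #endif

    /* Stand-in for the tools' die(): report and exit, never return. */
    TOOLS_NORETURN static void die(const char *fmt, ...) {
      va_list ap;
      va_start(ap, fmt);
      vfprintf(stderr, fmt, ap);
      va_end(ap);
      fputc('\n', stderr);
      exit(EXIT_FAILURE);
    }

    /* No dummy return needed after the switch: the default arm cannot
     * fall through once die() is known to be noreturn. */
    static int bytes_per_sample(int high_bitdepth) {
      switch (high_bitdepth) {
        case 0: return 1;
        case 1: return 2;
        default: die("Invalid image format");
      }
    }

    int main(void) { return bytes_per_sample(1) - 2; }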
diff --git a/examples/svc_encoder_rtc.c b/examples/svc_encoder_rtc.cc
similarity index 82%
rename from examples/svc_encoder_rtc.c
rename to examples/svc_encoder_rtc.cc
index bceb7d2..1730f89 100644
--- a/examples/svc_encoder_rtc.c
+++ b/examples/svc_encoder_rtc.cc
@@ -12,15 +12,19 @@
// encoding scheme for RTC video applications.
#include <assert.h>
+#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
+#include "config/aom_config.h"
+
+#if CONFIG_AV1_DECODER
+#include "aom/aom_decoder.h"
+#endif
#include "aom/aom_encoder.h"
#include "aom/aomcx.h"
-#include "av1/common/enums.h"
-#include "av1/encoder/encoder.h"
#include "common/args.h"
#include "common/tools_common.h"
#include "common/video_writer.h"
@@ -39,6 +43,7 @@
int output_obu;
int decode;
int tune_content;
+ int show_psnr;
} AppInput;
typedef enum {
@@ -92,6 +97,8 @@
static const arg_def_t test_decode_arg =
ARG_DEF(NULL, "test-decode", 1,
"Attempt to test decoding the output when set to 1. Default is 1.");
+static const arg_def_t psnr_arg =
+ ARG_DEF(NULL, "psnr", -1, "Show PSNR in status line.");
static const struct arg_enum_list tune_content_enum[] = {
{ "default", AOM_CONTENT_DEFAULT },
{ "screen", AOM_CONTENT_SCREEN },
@@ -102,40 +109,27 @@
NULL, "tune-content", 1, "Tune content type", tune_content_enum);
#if CONFIG_AV1_HIGHBITDEPTH
-static const struct arg_enum_list bitdepth_enum[] = {
- { "8", AOM_BITS_8 }, { "10", AOM_BITS_10 }, { "12", AOM_BITS_12 }, { NULL, 0 }
-};
+static const struct arg_enum_list bitdepth_enum[] = { { "8", AOM_BITS_8 },
+ { "10", AOM_BITS_10 },
+ { NULL, 0 } };
static const arg_def_t bitdepth_arg = ARG_DEF_ENUM(
- "d", "bit-depth", 1, "Bit depth for codec 8, 10 or 12. ", bitdepth_enum);
+ "d", "bit-depth", 1, "Bit depth for codec 8 or 10. ", bitdepth_enum);
#endif // CONFIG_AV1_HIGHBITDEPTH
-static const arg_def_t *svc_args[] = { &frames_arg,
- &outputfile,
- &width_arg,
- &height_arg,
- &timebase_arg,
- &bitrate_arg,
- &spatial_layers_arg,
- &kf_dist_arg,
- &scale_factors_arg,
- &min_q_arg,
- &max_q_arg,
- &temporal_layers_arg,
- &layering_mode_arg,
- &threads_arg,
- &aqmode_arg,
+static const arg_def_t *svc_args[] = {
+ &frames_arg, &outputfile, &width_arg,
+ &height_arg, &timebase_arg, &bitrate_arg,
+ &spatial_layers_arg, &kf_dist_arg, &scale_factors_arg,
+ &min_q_arg, &max_q_arg, &temporal_layers_arg,
+ &layering_mode_arg, &threads_arg, &aqmode_arg,
#if CONFIG_AV1_HIGHBITDEPTH
- &bitdepth_arg,
+ &bitdepth_arg,
#endif
- &speed_arg,
- &bitrates_arg,
- &dropframe_thresh_arg,
- &error_resilient_arg,
- &output_obu_arg,
- &test_decode_arg,
- &tune_content_arg,
- NULL };
+ &speed_arg, &bitrates_arg, &dropframe_thresh_arg,
+ &error_resilient_arg, &output_obu_arg, &test_decode_arg,
+ &tune_content_arg, &psnr_arg, NULL,
+};
#define zero(Dest) memset(&(Dest), 0, sizeof(Dest))
@@ -202,7 +196,7 @@
input->framerate.numerator = input->y4m.fps_n;
input->framerate.denominator = input->y4m.fps_d;
input->fmt = input->y4m.aom_fmt;
- input->bit_depth = input->y4m.bit_depth;
+ input->bit_depth = static_cast<aom_bit_depth_t>(input->y4m.bit_depth);
} else {
fatal("Unsupported Y4M stream.");
}
@@ -252,10 +246,10 @@
(option1 == NULL && type == SCALE_FACTOR))
return AOM_CODEC_INVALID_PARAM;
- input_string = malloc(strlen(input));
- if (!input_string) die("Failed to allocate input string.");
- memcpy(input_string, input, strlen(input));
+ const size_t input_length = strlen(input);
+ input_string = reinterpret_cast<char *>(malloc(input_length + 1));
if (input_string == NULL) return AOM_CODEC_MEM_ERROR;
+ memcpy(input_string, input, input_length + 1);
token = strtok(input_string, delim); // NOLINT
for (i = 0; i < num_layers; ++i) {
if (token != NULL) {
@@ -263,12 +257,10 @@
if (res != AOM_CODEC_OK) break;
token = strtok(NULL, delim); // NOLINT
} else {
+ res = AOM_CODEC_INVALID_PARAM;
break;
}
}
- if (res == AOM_CODEC_OK && i != num_layers) {
- res = AOM_CODEC_INVALID_PARAM;
- }
free(input_string);
return res;
}
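The allocation fix above is worth spelling out: the old code malloc'd strlen(input) bytes and copied strlen(input) bytes, so the copy was never NUL-terminated and the strtok() that follows could read past the end of the buffer. The corrected idiom as a standalone plain-C sketch (the patch itself spells the cast with reinterpret_cast because the file is now C++):

    #include <stdlib.h>
    #include <string.h>

    /* Duplicate a string so strtok() can safely scribble on the copy. */
    static char *dup_for_tokenize(const char *input) {
      const size_t len = strlen(input);
      char *copy = malloc(len + 1); /* +1 for the terminating NUL */
      if (copy == NULL) return NULL;
      memcpy(copy, input, len + 1); /* the NUL comes along */
      return copy;
    }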
@@ -317,8 +309,8 @@
svc_params->number_temporal_layers = arg_parse_uint(&arg);
} else if (arg_match(&arg, &speed_arg, argi)) {
app_input->speed = arg_parse_uint(&arg);
- if (app_input->speed > 10) {
- aom_tools_warn("Mapping speed %d to speed 10.\n", app_input->speed);
+ if (app_input->speed > 11) {
+ aom_tools_warn("Mapping speed %d to speed 11.\n", app_input->speed);
}
} else if (arg_match(&arg, &aqmode_arg, argi)) {
app_input->aq_mode = arg_parse_uint(&arg);
@@ -330,16 +322,21 @@
enc_cfg->kf_min_dist = arg_parse_uint(&arg);
enc_cfg->kf_max_dist = enc_cfg->kf_min_dist;
} else if (arg_match(&arg, &scale_factors_arg, argi)) {
- parse_layer_options_from_string(svc_params, SCALE_FACTOR, arg.val,
- svc_params->scaling_factor_num,
- svc_params->scaling_factor_den);
+ aom_codec_err_t res = parse_layer_options_from_string(
+ svc_params, SCALE_FACTOR, arg.val, svc_params->scaling_factor_num,
+ svc_params->scaling_factor_den);
+ if (res != AOM_CODEC_OK) {
+ die("Failed to parse scale factors: %s\n",
+ aom_codec_err_to_string(res));
+ }
} else if (arg_match(&arg, &min_q_arg, argi)) {
enc_cfg->rc_min_quantizer = arg_parse_uint(&arg);
} else if (arg_match(&arg, &max_q_arg, argi)) {
enc_cfg->rc_max_quantizer = arg_parse_uint(&arg);
#if CONFIG_AV1_HIGHBITDEPTH
} else if (arg_match(&arg, &bitdepth_arg, argi)) {
- enc_cfg->g_bit_depth = arg_parse_enum_or_int(&arg);
+ enc_cfg->g_bit_depth =
+ static_cast<aom_bit_depth_t>(arg_parse_enum_or_int(&arg));
switch (enc_cfg->g_bit_depth) {
case AOM_BITS_8:
enc_cfg->g_input_bit_depth = 8;
@@ -347,15 +344,10 @@
break;
case AOM_BITS_10:
enc_cfg->g_input_bit_depth = 10;
- enc_cfg->g_profile = 2;
- break;
- case AOM_BITS_12:
- enc_cfg->g_input_bit_depth = 12;
- enc_cfg->g_profile = 2;
+ enc_cfg->g_profile = 0;
break;
default:
die("Error: Invalid bit depth selected (%d)\n", enc_cfg->g_bit_depth);
- break;
}
#endif // CONFIG_AV1_HIGHBITDEPTH
} else if (arg_match(&arg, &dropframe_thresh_arg, argi)) {
@@ -378,6 +370,8 @@
} else if (arg_match(&arg, &tune_content_arg, argi)) {
app_input->tune_content = arg_parse_enum_or_int(&arg);
printf("tune content %d\n", app_input->tune_content);
+ } else if (arg_match(&arg, &psnr_arg, argi)) {
+ app_input->show_psnr = 1;
} else {
++argj;
}
@@ -387,8 +381,11 @@
for (argi = argj = argv; (*argj = *argi); argi += arg.argv_step) {
arg.argv_step = 1;
if (arg_match(&arg, &bitrates_arg, argi)) {
- parse_layer_options_from_string(svc_params, BITRATE, arg.val,
- svc_params->layer_target_bitrate, NULL);
+ aom_codec_err_t res = parse_layer_options_from_string(
+ svc_params, BITRATE, arg.val, svc_params->layer_target_bitrate, NULL);
+ if (res != AOM_CODEC_OK) {
+ die("Failed to parse bitrates: %s\n", aom_codec_err_to_string(res));
+ }
} else {
++argj;
}
@@ -410,7 +407,7 @@
app_input->input_ctx.filename = argv[0];
free(argv);
- open_input_file(&app_input->input_ctx, 0);
+ open_input_file(&app_input->input_ctx, AOM_CSP_UNKNOWN);
if (app_input->input_ctx.file_type == FILE_TYPE_Y4M) {
enc_cfg->g_w = app_input->input_ctx.width;
enc_cfg->g_h = app_input->input_ctx.height;
@@ -432,10 +429,10 @@
enc_cfg->rc_target_bitrate, enc_cfg->kf_max_dist);
}
-static unsigned int mode_to_num_temporal_layers[11] = { 1, 2, 3, 3, 2, 1,
- 1, 3, 3, 3, 3 };
-static unsigned int mode_to_num_spatial_layers[11] = { 1, 1, 1, 1, 1, 2,
- 3, 2, 3, 3, 3 };
+static int mode_to_num_temporal_layers[11] = {
+ 1, 2, 3, 3, 2, 1, 1, 3, 3, 3, 3
+};
+static int mode_to_num_spatial_layers[11] = { 1, 1, 1, 1, 1, 2, 3, 2, 3, 3, 3 };
// For rate control encoding stats.
struct RateControlMetrics {
@@ -465,6 +462,10 @@
int layer_target_bitrate[AOM_MAX_LAYERS];
};
+static const int REF_FRAMES = 8;
+
+static const int INTER_REFS_PER_FRAME = 7;
+
// Reference frames used in this example encoder.
enum {
SVC_LAST_FRAME = 0,
@@ -502,9 +503,8 @@
// TODO(marpan): Update these metrics to account for multiple key frames
// in the stream.
static void set_rate_control_metrics(struct RateControlMetrics *rc,
- double framerate,
- unsigned int ss_number_layers,
- unsigned int ts_number_layers) {
+ double framerate, int ss_number_layers,
+ int ts_number_layers) {
int ts_rate_decimator[AOM_MAX_TS_LAYERS] = { 1 };
ts_rate_decimator[0] = 1;
if (ts_number_layers == 2) {
@@ -518,12 +518,12 @@
}
// Set the layer (cumulative) framerate and the target layer (non-cumulative)
// per-frame-bandwidth, for the rate control encoding stats below.
- for (unsigned int sl = 0; sl < ss_number_layers; ++sl) {
- unsigned int i = sl * ts_number_layers;
+ for (int sl = 0; sl < ss_number_layers; ++sl) {
+ int i = sl * ts_number_layers;
rc->layer_framerate[0] = framerate / ts_rate_decimator[0];
rc->layer_pfb[i] =
1000.0 * rc->layer_target_bitrate[i] / rc->layer_framerate[0];
- for (unsigned int tl = 0; tl < ts_number_layers; ++tl) {
+ for (int tl = 0; tl < ts_number_layers; ++tl) {
i = sl * ts_number_layers + tl;
if (tl > 0) {
rc->layer_framerate[tl] = framerate / ts_rate_decimator[tl];
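The budgeting in this loop is cumulative: rc->layer_target_bitrate[i] is the total rate of temporal layers 0..tl, and layer_pfb[i] divides it (in bits) by the cumulative layer framerate. A worked example with hypothetical numbers, using 30 fps and the {4, 2, 1} decimators that 3 temporal layers get:

    #include <stdio.h>

    int main(void) {
      const double framerate = 30.0;
      const int ts_rate_decimator[3] = { 4, 2, 1 };
      /* Hypothetical cumulative per-layer targets, in kbps. */
      const int layer_target_bitrate[3] = { 200, 400, 600 };
      for (int tl = 0; tl < 3; ++tl) {
        const double layer_fps = framerate / ts_rate_decimator[tl];
        const double layer_pfb = 1000.0 * layer_target_bitrate[tl] / layer_fps;
        printf("TL%d: %.1f fps, %.0f bits per frame\n", tl, layer_fps,
               layer_pfb);
      }
      return 0;
    }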
@@ -546,17 +546,16 @@
}
static void printout_rate_control_summary(struct RateControlMetrics *rc,
- int frame_cnt,
- unsigned int ss_number_layers,
- unsigned int ts_number_layers) {
+ int frame_cnt, int ss_number_layers,
+ int ts_number_layers) {
int tot_num_frames = 0;
double perc_fluctuation = 0.0;
printf("Total number of processed frames: %d\n\n", frame_cnt - 1);
- printf("Rate control layer stats for %u layer(s):\n\n", ts_number_layers);
- for (unsigned int sl = 0; sl < ss_number_layers; ++sl) {
+ printf("Rate control layer stats for %d layer(s):\n\n", ts_number_layers);
+ for (int sl = 0; sl < ss_number_layers; ++sl) {
tot_num_frames = 0;
- for (unsigned int tl = 0; tl < ts_number_layers; ++tl) {
- unsigned int i = sl * ts_number_layers + tl;
+ for (int tl = 0; tl < ts_number_layers; ++tl) {
+ int i = sl * ts_number_layers + tl;
const int num_dropped =
tl > 0 ? rc->layer_input_frames[tl] - rc->layer_enc_frames[tl]
: rc->layer_input_frames[tl] - rc->layer_enc_frames[tl] - 1;
@@ -568,7 +567,7 @@
rc->layer_avg_frame_size[i] / rc->layer_enc_frames[tl];
rc->layer_avg_rate_mismatch[i] =
100.0 * rc->layer_avg_rate_mismatch[i] / rc->layer_enc_frames[tl];
- printf("For layer#: %u %u \n", sl, tl);
+ printf("For layer#: %d %d \n", sl, tl);
printf("Bitrate (target vs actual): %d %f\n", rc->layer_target_bitrate[i],
rc->layer_encoding_bitrate[i]);
printf("Average frame size (target vs actual): %f %f\n", rc->layer_pfb[i],
@@ -637,10 +636,10 @@
ref_frame_config->reference[SVC_LAST_FRAME] = 1;
} else {
// Pattern of 2 references (ALTREF and GOLDEN) trailing
- // LAST by 4 and 8 frame, with some switching logic to
- // sometimes only predict from longer-term reference.
- // This is simple example to test RPS (reference picture selection)
- // as method to handle network packet loss.
+ // LAST by 4 and 8 frames, with some switching logic to
+ // sometimes only predict from the longer-term reference
+      // (golden here). This is a simple example to test RPS
+ // (reference picture selection).
int last_idx = 0;
int last_idx_refresh = 0;
int gld_idx = 0;
@@ -674,17 +673,20 @@
ref_frame_config->reference[SVC_LAST_FRAME] = 1;
ref_frame_config->reference[SVC_ALTREF_FRAME] = 1;
ref_frame_config->reference[SVC_GOLDEN_FRAME] = 1;
- // Switch to only ALTREF for frames 200 to 250.
- if (superframe_cnt >= 200 && superframe_cnt < 250) {
- ref_frame_config->reference[SVC_LAST_FRAME] = 0;
- ref_frame_config->reference[SVC_ALTREF_FRAME] = 1;
- ref_frame_config->reference[SVC_GOLDEN_FRAME] = 0;
- }
- // Switch to only GOLDEN for frames 400 to 450.
- if (superframe_cnt >= 400 && superframe_cnt < 450) {
+      // Switch to only GOLDEN every 200 frames.
+ if (superframe_cnt % 200 == 0 && superframe_cnt > 0) {
ref_frame_config->reference[SVC_LAST_FRAME] = 0;
ref_frame_config->reference[SVC_ALTREF_FRAME] = 0;
ref_frame_config->reference[SVC_GOLDEN_FRAME] = 1;
+          // Test if the long-term reference is LAST instead; this is just
+          // a renaming, but it checks that the encoder behaves the same
+          // whether the long-term slot is LAST or GOLDEN.
+ if (superframe_cnt % 400 == 0 && superframe_cnt > 0) {
+ ref_frame_config->ref_idx[SVC_LAST_FRAME] = gld_idx;
+ ref_frame_config->reference[SVC_LAST_FRAME] = 1;
+ ref_frame_config->reference[SVC_ALTREF_FRAME] = 0;
+ ref_frame_config->reference[SVC_GOLDEN_FRAME] = 0;
+ }
}
}
break;
@@ -692,16 +694,36 @@
// 2-temporal layer.
// 1 3 5
// 0 2 4
+ // Keep golden fixed at slot 3.
+ base_count = superframe_cnt >> 1;
+ ref_frame_config->ref_idx[SVC_GOLDEN_FRAME] = 3;
+      // Cyclically refresh slots 5, 6, 7 for the lagging altref.
+ lag_index = 5;
+ if (base_count > 0) {
+ lag_index = 5 + (base_count % 3);
+ if (superframe_cnt % 2 != 0) lag_index = 5 + ((base_count + 1) % 3);
+ }
+ // Set the altref slot to lag_index.
+ ref_frame_config->ref_idx[SVC_ALTREF_FRAME] = lag_index;
if (superframe_cnt % 2 == 0) {
layer_id->temporal_layer_id = 0;
// Update LAST on layer 0, reference LAST.
ref_frame_config->refresh[0] = 1;
ref_frame_config->reference[SVC_LAST_FRAME] = 1;
+      // Refresh lag_index slot, needed for the lagging golden.
+ ref_frame_config->refresh[lag_index] = 1;
+      // Refresh GOLDEN every 32 base layer frames.
+ if (base_count % 32 == 0) ref_frame_config->refresh[3] = 1;
} else {
layer_id->temporal_layer_id = 1;
- // No updates on layer 1, only reference LAST (TL0).
+ // No updates on layer 1, reference LAST (TL0).
ref_frame_config->reference[SVC_LAST_FRAME] = 1;
}
+ // Always reference golden and altref on TL0.
+ if (layer_id->temporal_layer_id == 0) {
+ ref_frame_config->reference[SVC_GOLDEN_FRAME] = 1;
+ ref_frame_config->reference[SVC_ALTREF_FRAME] = 1;
+ }
break;
case 2:
// 3-temporal layer:
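The slot schedule in the 2-temporal-layer case above is easiest to verify by printing a few frames: GOLDEN stays pinned to slot 3 while the lagging altref cycles through slots 5, 6 and 7, with odd (TL1) frames biased one step ahead of the base count. A standalone sketch using the same formulas as the hunk:

    #include <stdio.h>

    int main(void) {
      for (int superframe_cnt = 0; superframe_cnt < 8; ++superframe_cnt) {
        const int base_count = superframe_cnt >> 1;
        int lag_index = 5;
        if (base_count > 0) {
          lag_index = 5 + (base_count % 3);
          if (superframe_cnt % 2 != 0) lag_index = 5 + ((base_count + 1) % 3);
        }
        printf("frame %d: TL%d, altref slot %d, golden slot 3\n",
               superframe_cnt, superframe_cnt % 2, lag_index);
      }
      return 0;
    }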
@@ -781,8 +803,11 @@
// Every frame can reference GOLDEN AND ALTREF.
ref_frame_config->reference[SVC_GOLDEN_FRAME] = 1;
ref_frame_config->reference[SVC_ALTREF_FRAME] = 1;
- // Allow for compound prediction using LAST and ALTREF.
- if (speed >= 7) ref_frame_comp_pred->use_comp_pred[2] = 1;
+ // Allow for compound prediction for LAST-ALTREF and LAST-GOLDEN.
+ if (speed >= 7) {
+ ref_frame_comp_pred->use_comp_pred[2] = 1;
+ ref_frame_comp_pred->use_comp_pred[0] = 1;
+ }
break;
case 4:
// 3-temporal layer: but middle layer updates GF, so 2nd TL2 will
@@ -1108,13 +1133,14 @@
}
#if CONFIG_AV1_DECODER
-static void test_decode(aom_codec_ctx_t *encoder, aom_codec_ctx_t *decoder,
- const int frames_out, int *mismatch_seen) {
+// Returns whether there is a mismatch between the encoder's new frame and the
+// decoder's new frame.
+static int test_decode(aom_codec_ctx_t *encoder, aom_codec_ctx_t *decoder,
+ const int frames_out) {
aom_image_t enc_img, dec_img;
+ int mismatch = 0;
- if (*mismatch_seen) return;
-
- /* Get the internal reference frame */
+ /* Get the internal new frame */
AOM_CODEC_CONTROL_TYPECHECKED(encoder, AV1_GET_NEW_FRAME_IMAGE, &enc_img);
AOM_CODEC_CONTROL_TYPECHECKED(decoder, AV1_GET_NEW_FRAME_IMAGE, &dec_img);
@@ -1123,15 +1149,19 @@
(dec_img.fmt & AOM_IMG_FMT_HIGHBITDEPTH)) {
if (enc_img.fmt & AOM_IMG_FMT_HIGHBITDEPTH) {
aom_image_t enc_hbd_img;
- aom_img_alloc(&enc_hbd_img, enc_img.fmt - AOM_IMG_FMT_HIGHBITDEPTH,
- enc_img.d_w, enc_img.d_h, 16);
+ aom_img_alloc(
+ &enc_hbd_img,
+ static_cast<aom_img_fmt_t>(enc_img.fmt - AOM_IMG_FMT_HIGHBITDEPTH),
+ enc_img.d_w, enc_img.d_h, 16);
aom_img_truncate_16_to_8(&enc_hbd_img, &enc_img);
enc_img = enc_hbd_img;
}
if (dec_img.fmt & AOM_IMG_FMT_HIGHBITDEPTH) {
aom_image_t dec_hbd_img;
- aom_img_alloc(&dec_hbd_img, dec_img.fmt - AOM_IMG_FMT_HIGHBITDEPTH,
- dec_img.d_w, dec_img.d_h, 16);
+ aom_img_alloc(
+ &dec_hbd_img,
+ static_cast<aom_img_fmt_t>(dec_img.fmt - AOM_IMG_FMT_HIGHBITDEPTH),
+ dec_img.d_w, dec_img.d_h, 16);
aom_img_truncate_16_to_8(&dec_hbd_img, &dec_img);
dec_img = dec_hbd_img;
}
@@ -1149,22 +1179,47 @@
#else
aom_find_mismatch(&enc_img, &dec_img, y, u, v);
#endif
- decoder->err = 1;
- printf(
- "Encode/decode mismatch on frame %d at"
- " Y[%d, %d] {%d/%d},"
- " U[%d, %d] {%d/%d},"
- " V[%d, %d] {%d/%d}",
- frames_out, y[0], y[1], y[2], y[3], u[0], u[1], u[2], u[3], v[0], v[1],
- v[2], v[3]);
- *mismatch_seen = frames_out;
+ fprintf(stderr,
+ "Encode/decode mismatch on frame %d at"
+ " Y[%d, %d] {%d/%d},"
+ " U[%d, %d] {%d/%d},"
+ " V[%d, %d] {%d/%d}\n",
+ frames_out, y[0], y[1], y[2], y[3], u[0], u[1], u[2], u[3], v[0],
+ v[1], v[2], v[3]);
+ mismatch = 1;
}
aom_img_free(&enc_img);
aom_img_free(&dec_img);
+ return mismatch;
}
#endif // CONFIG_AV1_DECODER
+struct psnr_stats {
+ // The second element of these arrays is reserved for high bitdepth.
+ uint64_t psnr_sse_total[2];
+ uint64_t psnr_samples_total[2];
+ double psnr_totals[2][4];
+ int psnr_count[2];
+};
+
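+// show_psnr() below reports the overall and per-plane averages;
+// sse_to_psnr() follows the standard definition,
+// 10 * log10(peak^2 * samples / sse), typically capped for zero sse.
+// peak is 255.0 for the 8-bit case used here.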
+static void show_psnr(struct psnr_stats *psnr_stream, double peak) {
+ double ovpsnr;
+
+ if (!psnr_stream->psnr_count[0]) return;
+
+ fprintf(stderr, "\nPSNR (Overall/Avg/Y/U/V)");
+ ovpsnr = sse_to_psnr((double)psnr_stream->psnr_samples_total[0], peak,
+ (double)psnr_stream->psnr_sse_total[0]);
+ fprintf(stderr, " %.3f", ovpsnr);
+
+ for (int i = 0; i < 4; i++) {
+ fprintf(stderr, " %.3f",
+ psnr_stream->psnr_totals[0][i] / psnr_stream->psnr_count[0]);
+ }
+ fprintf(stderr, "\n");
+}
+
int main(int argc, const char **argv) {
AppInput app_input;
AvxVideoWriter *outfile[AOM_MAX_LAYERS] = { NULL };
@@ -1177,7 +1232,7 @@
int frame_avail;
int got_data = 0;
int flags = 0;
- unsigned i;
+ int i;
int pts = 0; // PTS starts at 0.
int frame_duration = 1; // 1 timebase tick per frame.
aom_svc_layer_id_t layer_id;
@@ -1192,7 +1247,6 @@
}
#endif
#if CONFIG_AV1_DECODER
- int mismatch_seen = 0;
aom_codec_ctx_t decoder;
#endif
@@ -1205,6 +1259,7 @@
double framerate = 30.0;
int use_svc_control = 1;
int set_err_resil_frame = 0;
+ int test_changing_bitrate = 0;
zero(rc.layer_target_bitrate);
memset(&layer_id, 0, sizeof(aom_svc_layer_id_t));
memset(&app_input, 0, sizeof(AppInput));
@@ -1214,18 +1269,21 @@
// spatial stream, using the scaling_mode control.
const int test_dynamic_scaling_single_layer = 0;
+ // Flag to test setting speed per layer.
+ const int test_speed_per_layer = 0;
+
/* Setup default input stream settings */
app_input.input_ctx.framerate.numerator = 30;
app_input.input_ctx.framerate.denominator = 1;
- app_input.input_ctx.only_i420 = 1;
- app_input.input_ctx.bit_depth = 0;
+ app_input.input_ctx.only_i420 = 0;
+ app_input.input_ctx.bit_depth = AOM_BITS_8;
app_input.speed = 7;
exec_name = argv[0];
// start with default encoder configuration
aom_codec_err_t res = aom_codec_enc_config_default(aom_codec_av1_cx(), &cfg,
AOM_USAGE_REALTIME);
- if (res) {
+ if (res != AOM_CODEC_OK) {
die("Failed to get config: %s\n", aom_codec_err_to_string(res));
}
@@ -1246,8 +1304,8 @@
parse_command_line(argc, argv, &app_input, &svc_params, &cfg);
- unsigned int ts_number_layers = svc_params.number_temporal_layers;
- unsigned int ss_number_layers = svc_params.number_spatial_layers;
+ int ts_number_layers = svc_params.number_temporal_layers;
+ int ss_number_layers = svc_params.number_spatial_layers;
unsigned int width = cfg.g_w;
unsigned int height = cfg.g_h;
@@ -1268,7 +1326,7 @@
}
}
- aom_codec_iface_t *encoder = get_aom_encoder_by_short_name("av1");
+ aom_codec_iface_t *encoder = aom_codec_av1_cx();
memcpy(&rc.layer_target_bitrate[0], &svc_params.layer_target_bitrate[0],
sizeof(svc_params.layer_target_bitrate));
@@ -1311,11 +1369,11 @@
info.time_base.numerator = cfg.g_timebase.num;
info.time_base.denominator = cfg.g_timebase.den;
// Open an output file for each stream.
- for (unsigned int sl = 0; sl < ss_number_layers; ++sl) {
- for (unsigned tl = 0; tl < ts_number_layers; ++tl) {
+ for (int sl = 0; sl < ss_number_layers; ++sl) {
+ for (int tl = 0; tl < ts_number_layers; ++tl) {
i = sl * ts_number_layers + tl;
char file_name[PATH_MAX];
- snprintf(file_name, sizeof(file_name), "%s_%u.av1",
+ snprintf(file_name, sizeof(file_name), "%s_%d.av1",
app_input.output_filename, i);
if (app_input.output_obu) {
obu_files[i] = fopen(file_name, "wb");
@@ -1339,14 +1397,16 @@
// Initialize codec.
aom_codec_ctx_t codec;
- if (aom_codec_enc_init(&codec, encoder, &cfg, 0))
- die("Failed to initialize encoder");
+ aom_codec_flags_t flag = 0;
+ flag |= cfg.g_input_bit_depth == AOM_BITS_8 ? 0 : AOM_CODEC_USE_HIGHBITDEPTH;
+ flag |= app_input.show_psnr ? AOM_CODEC_USE_PSNR : 0;
+ if (aom_codec_enc_init(&codec, encoder, &cfg, flag))
+ die_codec(&codec, "Failed to initialize encoder");
#if CONFIG_AV1_DECODER
if (app_input.decode) {
- if (aom_codec_dec_init(&decoder, get_aom_decoder_by_index(0), NULL, 0)) {
- die("Failed to initialize decoder");
- }
+ if (aom_codec_dec_init(&decoder, get_aom_decoder_by_index(0), NULL, 0))
+ die_codec(&decoder, "Failed to initialize decoder");
}
#endif
@@ -1374,9 +1434,10 @@
aom_codec_control(&codec, AV1E_SET_ENABLE_FILTER_INTRA, 0);
aom_codec_control(&codec, AV1E_SET_INTRA_DEFAULT_TX_ONLY, 1);
- aom_codec_control(&codec, AV1E_SET_TILE_COLUMNS,
- cfg.g_threads ? get_msb(cfg.g_threads) : 0);
- if (cfg.g_threads > 1) aom_codec_control(&codec, AV1E_SET_ROW_MT, 1);
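+ // AV1E_SET_TILE_COLUMNS takes log2 of the tile column count, so
+ // log2(threads) gives roughly one tile column per thread.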
+ if (cfg.g_threads > 1) {
+ aom_codec_control(&codec, AV1E_SET_TILE_COLUMNS,
+ (unsigned int)log2(cfg.g_threads));
+ }
aom_codec_control(&codec, AV1E_SET_TUNE_CONTENT, app_input.tune_content);
if (app_input.tune_content == AOM_CONTENT_SCREEN) {
@@ -1417,17 +1478,19 @@
max_intra_size_pct);
}
- for (unsigned int lx = 0; lx < ts_number_layers * ss_number_layers; lx++) {
+ for (int lx = 0; lx < ts_number_layers * ss_number_layers; lx++) {
cx_time_layer[lx] = 0;
frame_cnt_layer[lx] = 0;
}
frame_avail = 1;
+ struct psnr_stats psnr_stream;
+ memset(&psnr_stream, 0, sizeof(psnr_stream));
while (frame_avail || got_data) {
struct aom_usec_timer timer;
frame_avail = read_frame(&(app_input.input_ctx), &raw);
// Loop over spatial layers.
- for (unsigned int slx = 0; slx < ss_number_layers; slx++) {
+ for (int slx = 0; slx < ss_number_layers; slx++) {
aom_codec_iter_t iter = NULL;
const aom_codec_cx_pkt_t *pkt;
int layer = 0;
@@ -1448,6 +1511,24 @@
aom_codec_control(&codec, AV1E_SET_SVC_REF_FRAME_COMP_PRED,
&ref_frame_comp_pred);
}
+ // Set the speed per layer.
+ if (test_speed_per_layer) {
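+ // Use a slower speed on the lower layers and a faster speed on the
+ // upper ones, e.g. SL0/TL0 gets speed 6 and SL2/TL2 gets speed 10.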
+ int speed_per_layer = 10;
+ if (layer_id.spatial_layer_id == 0) {
+ if (layer_id.temporal_layer_id == 0) speed_per_layer = 6;
+ if (layer_id.temporal_layer_id == 1) speed_per_layer = 7;
+ if (layer_id.temporal_layer_id == 2) speed_per_layer = 8;
+ } else if (layer_id.spatial_layer_id == 1) {
+ if (layer_id.temporal_layer_id == 0) speed_per_layer = 7;
+ if (layer_id.temporal_layer_id == 1) speed_per_layer = 8;
+ if (layer_id.temporal_layer_id == 2) speed_per_layer = 9;
+ } else if (layer_id.spatial_layer_id == 2) {
+ if (layer_id.temporal_layer_id == 0) speed_per_layer = 8;
+ if (layer_id.temporal_layer_id == 1) speed_per_layer = 9;
+ if (layer_id.temporal_layer_id == 2) speed_per_layer = 10;
+ }
+ aom_codec_control(&codec, AOME_SET_CPUUSED, speed_per_layer);
+ }
} else {
// Only up to 3 temporal layers supported in fixed mode.
// Only need to set spatial and temporal layer_id: reference
@@ -1465,11 +1546,16 @@
aom_codec_control(&codec, AV1E_SET_SVC_LAYER_ID, &layer_id);
}
- if (set_err_resil_frame) {
+ if (set_err_resil_frame && cfg.g_error_resilient == 0) {
// Set error_resilient per frame: off/0 for base layer and
// on/1 for enhancement layer frames.
- int err_resil_mode =
- (layer_id.spatial_layer_id > 0 || layer_id.temporal_layer_id > 0);
+ // Note that this can only be done on the fly/per-frame/layer
+ // if the config error_resilience is off/0. See the logic for updating
+ // in set_encoder_config():
+ // tool_cfg->error_resilient_mode =
+ // cfg->g_error_resilient | extra_cfg->error_resilient_mode;
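+ // With cfg.g_error_resilient == 1 that OR would force error
+ // resilience on every frame, making the per-frame control a no-op.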
+ const int err_resil_mode =
+ layer_id.spatial_layer_id > 0 || layer_id.temporal_layer_id > 0;
aom_codec_control(&codec, AV1E_SET_ERROR_RESILIENT_MODE,
err_resil_mode);
}
@@ -1518,6 +1604,23 @@
}
}
+ // Change target_bitrate every other frame.
+ if (test_changing_bitrate && frame_cnt % 2 == 0) {
+ if (frame_cnt < 500)
+ cfg.rc_target_bitrate += 10;
+ else
+ cfg.rc_target_bitrate -= 10;
+ // Do a big increase and decrease.
+ if (frame_cnt == 100) cfg.rc_target_bitrate <<= 1;
+ if (frame_cnt == 600) cfg.rc_target_bitrate >>= 1;
+ if (cfg.rc_target_bitrate < 100) cfg.rc_target_bitrate = 100;
+ // Either call change_config (aom_codec_enc_config_set()), or bypass
+ // it with the new control below.
+ // res = aom_codec_enc_config_set(&codec, &cfg);
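+ // AV1E_SET_BITRATE_ONE_PASS_CBR updates the target bitrate directly
+ // and is intended for one-pass CBR rate control only.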
+ if (aom_codec_control(&codec, AV1E_SET_BITRATE_ONE_PASS_CBR,
+ cfg.rc_target_bitrate))
+ die_codec(&codec, "Failed to SET_BITRATE_ONE_PASS_CBR");
+ }
+
// Do the layer encode.
aom_usec_timer_start(&timer);
if (aom_codec_encode(&codec, frame_avail ? &raw : NULL, pts, 1, flags))
@@ -1529,38 +1632,42 @@
got_data = 0;
while ((pkt = aom_codec_get_cx_data(&codec, &iter))) {
- got_data = 1;
switch (pkt->kind) {
case AOM_CODEC_CX_FRAME_PKT:
- for (unsigned int sl = layer_id.spatial_layer_id;
- sl < ss_number_layers; ++sl) {
- for (unsigned tl = layer_id.temporal_layer_id;
- tl < ts_number_layers; ++tl) {
- unsigned int j = sl * ts_number_layers + tl;
+ for (int sl = layer_id.spatial_layer_id; sl < ss_number_layers;
+ ++sl) {
+ for (int tl = layer_id.temporal_layer_id; tl < ts_number_layers;
+ ++tl) {
+ int j = sl * ts_number_layers + tl;
if (app_input.output_obu) {
fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz,
obu_files[j]);
} else {
- aom_video_writer_write_frame(outfile[j], pkt->data.frame.buf,
- pkt->data.frame.sz, pts);
+ aom_video_writer_write_frame(
+ outfile[j],
+ reinterpret_cast<const uint8_t *>(pkt->data.frame.buf),
+ pkt->data.frame.sz, pts);
}
- if (sl == (unsigned int)layer_id.spatial_layer_id)
+ if (sl == layer_id.spatial_layer_id)
rc.layer_encoding_bitrate[j] += 8.0 * pkt->data.frame.sz;
}
}
+ got_data = 1;
// Write everything into the top layer.
if (app_input.output_obu) {
fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz,
total_layer_obu_file);
} else {
- aom_video_writer_write_frame(total_layer_file,
- pkt->data.frame.buf,
- pkt->data.frame.sz, pts);
+ aom_video_writer_write_frame(
+ total_layer_file,
+ reinterpret_cast<const uint8_t *>(pkt->data.frame.buf),
+ pkt->data.frame.sz, pts);
}
// Keep count of rate control stats per layer (for non-key).
if (!(pkt->data.frame.flags & AOM_FRAME_IS_KEY)) {
- unsigned int j = layer_id.spatial_layer_id * ts_number_layers +
- layer_id.temporal_layer_id;
+ int j = layer_id.spatial_layer_id * ts_number_layers +
+ layer_id.temporal_layer_id;
+ assert(j >= 0);
rc.layer_avg_frame_size[j] += 8.0 * pkt->data.frame.sz;
rc.layer_avg_rate_mismatch[j] +=
fabs(8.0 * pkt->data.frame.sz - rc.layer_pfb[j]) /
@@ -1601,24 +1708,43 @@
#if CONFIG_AV1_DECODER
if (app_input.decode) {
- if (aom_codec_decode(&decoder, pkt->data.frame.buf,
- (unsigned int)pkt->data.frame.sz, NULL))
- die_codec(&decoder, "Failed to decode frame.");
+ if (aom_codec_decode(
+ &decoder,
+ reinterpret_cast<const uint8_t *>(pkt->data.frame.buf),
+ pkt->data.frame.sz, NULL))
+ die_codec(&decoder, "Failed to decode frame");
}
#endif
break;
+ case AOM_CODEC_PSNR_PKT:
+ if (app_input.show_psnr) {
+ psnr_stream.psnr_sse_total[0] += pkt->data.psnr.sse[0];
+ psnr_stream.psnr_samples_total[0] += pkt->data.psnr.samples[0];
+ for (int plane = 0; plane < 4; plane++) {
+ psnr_stream.psnr_totals[0][plane] += pkt->data.psnr.psnr[plane];
+ }
+ psnr_stream.psnr_count[0]++;
+ }
+ break;
default: break;
}
}
#if CONFIG_AV1_DECODER
- if (app_input.decode) {
+ if (got_data && app_input.decode) {
// Don't look for a mismatch on the top spatial and top temporal layers,
// as they are non-reference frames.
if ((ss_number_layers > 1 || ts_number_layers > 1) &&
!(layer_id.temporal_layer_id > 0 &&
- layer_id.temporal_layer_id == (int)ts_number_layers - 1)) {
- test_decode(&codec, &decoder, frame_cnt, &mismatch_seen);
+ layer_id.temporal_layer_id == ts_number_layers - 1)) {
+ if (test_decode(&codec, &decoder, frame_cnt)) {
+#if CONFIG_INTERNAL_STATS
+ fprintf(stats_file, "First mismatch occurred in frame %d\n",
+ frame_cnt);
+ fclose(stats_file);
+#endif
+ fatal("Mismatch seen");
+ }
}
}
#endif
@@ -1632,8 +1758,8 @@
ts_number_layers);
printf("\n");
- for (unsigned int slx = 0; slx < ss_number_layers; slx++)
- for (unsigned int tlx = 0; tlx < ts_number_layers; tlx++) {
+ for (int slx = 0; slx < ss_number_layers; slx++)
+ for (int tlx = 0; tlx < ts_number_layers; tlx++) {
int lx = slx * ts_number_layers + tlx;
printf("Per layer encoding time/FPS stats for encoder: %d %d %d %f %f \n",
slx, tlx, frame_cnt_layer[lx],
@@ -1646,14 +1772,21 @@
frame_cnt, 1000 * (float)cx_time / (double)(frame_cnt * 1000000),
1000000 * (double)frame_cnt / (double)cx_time);
- if (aom_codec_destroy(&codec)) die_codec(&codec, "Failed to destroy codec");
+ if (app_input.show_psnr) {
+ show_psnr(&psnr_stream, 255.0);
+ }
+
+ if (aom_codec_destroy(&codec)) die_codec(&codec, "Failed to destroy encoder");
+
+#if CONFIG_AV1_DECODER
+ if (app_input.decode) {
+ if (aom_codec_destroy(&decoder))
+ die_codec(&decoder, "Failed to destroy decoder");
+ }
+#endif
#if CONFIG_INTERNAL_STATS
- if (mismatch_seen) {
- fprintf(stats_file, "First mismatch occurred in frame %d\n", mismatch_seen);
- } else {
- fprintf(stats_file, "No mismatch detected in recon buffers\n");
- }
+ fprintf(stats_file, "No mismatch detected in recon buffers\n");
fclose(stats_file);
#endif
diff --git a/libaom_blocklist.txt b/libaom_blocklist.txt
index 19851d3..06a721b 100644
--- a/libaom_blocklist.txt
+++ b/libaom_blocklist.txt
@@ -7,6 +7,7 @@
# libaom/av1/encoder/ratectrl.c: indirect call to assembly code on x86/x86_64 platform
fun:rc_scene_detection_onepass_rt
# libaom/av1/encoder/var_based_part.c: indirect call to assembly code on x86/x86_64 platform
+fun:evaluate_neighbour_mvs
fun:setup_planes
fun:chroma_check
# libaom/av1/encoder/rd.c: indirect call to assembly code on x86/x86_64 platform
diff --git a/test/acm_random.h b/test/acm_random.h
index bc38ba4..15e8c9c 100644
--- a/test/acm_random.h
+++ b/test/acm_random.h
@@ -59,12 +59,6 @@
return (value >> 19) & 0xfff;
}
- int16_t Rand9Signed() {
- // Use 9 bits: values between 255 (0x0FF) and -256 (0x100).
- const uint32_t value = random_.Generate(512);
- return static_cast<int16_t>(value) - 256;
- }
-
uint8_t Rand8() {
const uint32_t value =
random_.Generate(testing::internal::Random::kMaxRange);
diff --git a/test/aomenc.sh b/test/aomenc.sh
index ed98313..0bb9fba 100755
--- a/test/aomenc.sh
+++ b/test/aomenc.sh
@@ -40,12 +40,6 @@
fi
}
-aomenc_can_encode_av1() {
- if [ "$(av1_encode_available)" = "yes" ]; then
- echo yes
- fi
-}
-
# Utilities that echo aomenc input file parameters.
y4m_input_non_square_par() {
echo ""${Y4M_NOSQ_PAR_INPUT}""
diff --git a/test/av1_convolve_scale_test.cc b/test/av1_convolve_scale_test.cc
index 3f35025..c321de2 100644
--- a/test/av1_convolve_scale_test.cc
+++ b/test/av1_convolve_scale_test.cc
@@ -244,8 +244,10 @@
typedef tuple<int, int> BlockDimension;
struct BaseParams {
- BaseParams(BlockDimension dims, NTaps ntaps_x, NTaps ntaps_y, bool avg)
- : dims(dims), ntaps_x(ntaps_x), ntaps_y(ntaps_y), avg(avg) {}
+ BaseParams(BlockDimension dimensions, NTaps num_taps_x, NTaps num_taps_y,
+ bool average)
+ : dims(dimensions), ntaps_x(num_taps_x), ntaps_y(num_taps_y),
+ avg(average) {}
BlockDimension dims;
NTaps ntaps_x, ntaps_y;
@@ -455,11 +457,20 @@
TEST_P(LowBDConvolveScaleTest, DISABLED_Speed) { SpeedTest(); }
INSTANTIATE_TEST_SUITE_P(
+ C, LowBDConvolveScaleTest,
+ ::testing::Combine(::testing::Values(av1_convolve_2d_scale_c),
+ ::testing::ValuesIn(kBlockDim),
+ ::testing::ValuesIn(kNTaps), ::testing::ValuesIn(kNTaps),
+ ::testing::Bool()));
+
+#if HAVE_SSE4_1
+INSTANTIATE_TEST_SUITE_P(
SSE4_1, LowBDConvolveScaleTest,
::testing::Combine(::testing::Values(av1_convolve_2d_scale_sse4_1),
::testing::ValuesIn(kBlockDim),
::testing::ValuesIn(kNTaps), ::testing::ValuesIn(kNTaps),
::testing::Bool()));
+#endif // HAVE_SSE4_1
#if CONFIG_AV1_HIGHBITDEPTH
typedef void (*HighbdConvolveFunc)(const uint16_t *src, int src_stride,
@@ -522,10 +533,30 @@
TEST_P(HighBDConvolveScaleTest, DISABLED_Speed) { SpeedTest(); }
INSTANTIATE_TEST_SUITE_P(
+ C, HighBDConvolveScaleTest,
+ ::testing::Combine(::testing::Values(av1_highbd_convolve_2d_scale_c),
+ ::testing::ValuesIn(kBlockDim),
+ ::testing::ValuesIn(kNTaps), ::testing::ValuesIn(kNTaps),
+ ::testing::Bool(), ::testing::ValuesIn(kBDs)));
+
+#if HAVE_SSE4_1
+INSTANTIATE_TEST_SUITE_P(
SSE4_1, HighBDConvolveScaleTest,
::testing::Combine(::testing::Values(av1_highbd_convolve_2d_scale_sse4_1),
::testing::ValuesIn(kBlockDim),
::testing::ValuesIn(kNTaps), ::testing::ValuesIn(kNTaps),
::testing::Bool(), ::testing::ValuesIn(kBDs)));
+#endif // HAVE_SSE4_1
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, HighBDConvolveScaleTest,
+ ::testing::Combine(::testing::Values(av1_highbd_convolve_2d_scale_neon),
+ ::testing::ValuesIn(kBlockDim),
+ ::testing::ValuesIn(kNTaps), ::testing::ValuesIn(kNTaps),
+ ::testing::Bool(), ::testing::ValuesIn(kBDs)));
+
+#endif // HAVE_NEON
+
#endif // CONFIG_AV1_HIGHBITDEPTH
} // namespace
diff --git a/test/av1_convolve_test.cc b/test/av1_convolve_test.cc
index 12edfac..873960d 100644
--- a/test/av1_convolve_test.cc
+++ b/test/av1_convolve_test.cc
@@ -535,6 +535,11 @@
BuildHighbdParams(av1_highbd_convolve_x_sr_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, AV1ConvolveXHighbdTest,
+ BuildHighbdParams(av1_highbd_convolve_x_sr_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
////////////////////////////////////////////////////////
@@ -735,6 +740,11 @@
BuildHighbdParams(av1_highbd_convolve_y_sr_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, AV1ConvolveYHighbdTest,
+ BuildHighbdParams(av1_highbd_convolve_y_sr_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
//////////////////////////////////////////////////////////////
@@ -1072,6 +1082,11 @@
BuildHighbdParams(av1_highbd_convolve_2d_sr_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, AV1Convolve2DHighbdTest,
+ BuildHighbdParams(av1_highbd_convolve_2d_sr_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
//////////////////////////
@@ -1377,6 +1392,12 @@
BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_x_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1ConvolveXHighbdCompoundTest,
+ BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_x_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
////////////////////////////////////////////////
@@ -1451,6 +1472,12 @@
BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_y_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1ConvolveYHighbdCompoundTest,
+ BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_y_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
//////////////////////////////////////////////////////
@@ -1655,6 +1682,12 @@
BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_2d_copy_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1Convolve2DCopyHighbdCompoundTest,
+ BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_2d_copy_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
/////////////////////////////////////////////////
@@ -1846,6 +1879,12 @@
BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_2d_avx2));
#endif
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1Convolve2DHighbdCompoundTest,
+ BuildHighbdLumaParams(av1_highbd_dist_wtd_convolve_2d_neon));
+#endif
+
#endif // CONFIG_AV1_HIGHBITDEPTH
} // namespace
diff --git a/test/av1_fwd_txfm1d_test.cc b/test/av1_fwd_txfm1d_test.cc
index df504ea..885a6db 100644
--- a/test/av1_fwd_txfm1d_test.cc
+++ b/test/av1_fwd_txfm1d_test.cc
@@ -84,7 +84,7 @@
const int count_test_block = 5000;
if (fwd_txfm_func != nullptr) {
- for (int ti = 0; ti < count_test_block; ++ti) {
+ for (int i = 0; i < count_test_block; ++i) {
for (int ni = 0; ni < txfm_size; ++ni) {
input[ni] = rnd.Rand16() % input_base - rnd.Rand16() % input_base;
ref_input[ni] = static_cast<double>(input[ni]);
diff --git a/test/av1_fwd_txfm2d_test.cc b/test/av1_fwd_txfm2d_test.cc
index 525e0cc..7b84eb9 100644
--- a/test/av1_fwd_txfm2d_test.cc
+++ b/test/av1_fwd_txfm2d_test.cc
@@ -27,6 +27,7 @@
using libaom_test::bd;
using libaom_test::compute_avg_abs_error;
using libaom_test::input_base;
+using libaom_test::tx_type_name;
using libaom_test::TYPE_TXFM;
using std::vector;
@@ -99,7 +100,8 @@
actual_max_error = AOMMAX(actual_max_error, this_error);
}
EXPECT_GE(max_error_, actual_max_error)
- << "tx_size = " << tx_size_ << ", tx_type = " << tx_type_;
+ << "tx_w: " << tx_width_ << " tx_h: " << tx_height_
+ << ", tx_type = " << (int)tx_type_;
if (actual_max_error > max_error_) { // exit early.
break;
}
@@ -260,8 +262,8 @@
ACMRandom rnd(ACMRandom::DeterministicSeed());
for (int cnt = 0; cnt < 500; ++cnt) {
if (cnt == 0) {
- for (int r = 0; r < rows; ++r) {
- for (int c = 0; c < cols; ++c) {
+ for (int c = 0; c < cols; ++c) {
+ for (int r = 0; r < rows; ++r) {
input[r * input_stride + c] = (1 << bd) - 1;
}
}
@@ -278,14 +280,15 @@
param.bd = bd;
ref_func(input, ref_output, input_stride, (TX_TYPE)tx_type, bd);
target_func(input, output, input_stride, &param);
- const int check_rows = AOMMIN(32, rows);
- const int check_cols = AOMMIN(32, rows * cols / check_rows);
+ const int check_cols = AOMMIN(32, cols);
+ const int check_rows = AOMMIN(32, rows * cols / check_cols);
for (int r = 0; r < check_rows; ++r) {
for (int c = 0; c < check_cols; ++c) {
ASSERT_EQ(ref_output[r * check_cols + c],
output[r * check_cols + c])
<< "[" << r << "," << c << "] cnt:" << cnt
- << " tx_size: " << tx_size << " tx_type: " << tx_type;
+ << " tx_size: " << cols << "x" << rows
+ << " tx_type: " << tx_type_name[tx_type];
}
}
}
@@ -300,57 +303,55 @@
const int cols = tx_size_wide[tx_size];
const int num_loops = 1000000 / (rows * cols);
- for (int i = 0; i < 2; ++i) {
- const int bd = 8;
- for (int tx_type = 0; tx_type < TX_TYPES; ++tx_type) {
- if (libaom_test::IsTxSizeTypeValid(
- tx_size, static_cast<TX_TYPE>(tx_type)) == false) {
- continue;
+ const int bd = 8;
+ for (int tx_type = 0; tx_type < TX_TYPES; ++tx_type) {
+ if (libaom_test::IsTxSizeTypeValid(
+ tx_size, static_cast<TX_TYPE>(tx_type)) == false) {
+ continue;
+ }
+
+ FwdTxfm2dFunc ref_func = libaom_test::fwd_txfm_func_ls[tx_size];
+ if (ref_func != nullptr) {
+ DECLARE_ALIGNED(32, int16_t, input[64 * 64]) = { 0 };
+ DECLARE_ALIGNED(32, int32_t, output[64 * 64]);
+ DECLARE_ALIGNED(32, int32_t, ref_output[64 * 64]);
+ int input_stride = 64;
+ ACMRandom rnd(ACMRandom::DeterministicSeed());
+
+ for (int r = 0; r < rows; ++r) {
+ for (int c = 0; c < cols; ++c) {
+ input[r * input_stride + c] = rnd.Rand16() % (1 << bd);
+ }
}
- FwdTxfm2dFunc ref_func = libaom_test::fwd_txfm_func_ls[tx_size];
- if (ref_func != nullptr) {
- DECLARE_ALIGNED(32, int16_t, input[64 * 64]) = { 0 };
- DECLARE_ALIGNED(32, int32_t, output[64 * 64]);
- DECLARE_ALIGNED(32, int32_t, ref_output[64 * 64]);
- int input_stride = 64;
- ACMRandom rnd(ACMRandom::DeterministicSeed());
+ param.tx_type = (TX_TYPE)tx_type;
+ param.tx_size = (TX_SIZE)tx_size;
+ param.tx_set_type = EXT_TX_SET_ALL16;
+ param.bd = bd;
- for (int r = 0; r < rows; ++r) {
- for (int c = 0; c < cols; ++c) {
- input[r * input_stride + c] = rnd.Rand16() % (1 << bd);
- }
- }
+ aom_usec_timer ref_timer, test_timer;
- param.tx_type = (TX_TYPE)tx_type;
- param.tx_size = (TX_SIZE)tx_size;
- param.tx_set_type = EXT_TX_SET_ALL16;
- param.bd = bd;
-
- aom_usec_timer ref_timer, test_timer;
-
- aom_usec_timer_start(&ref_timer);
- for (int i = 0; i < num_loops; ++i) {
- ref_func(input, ref_output, input_stride, (TX_TYPE)tx_type, bd);
- }
- aom_usec_timer_mark(&ref_timer);
- const int elapsed_time_c =
- static_cast<int>(aom_usec_timer_elapsed(&ref_timer));
-
- aom_usec_timer_start(&test_timer);
- for (int i = 0; i < num_loops; ++i) {
- target_func(input, output, input_stride, &param);
- }
- aom_usec_timer_mark(&test_timer);
- const int elapsed_time_simd =
- static_cast<int>(aom_usec_timer_elapsed(&test_timer));
-
- printf(
- "txfm_size[%d] \t txfm_type[%d] \t c_time=%d \t simd_time=%d \t "
- "gain=%d \n",
- tx_size, tx_type, elapsed_time_c, elapsed_time_simd,
- (elapsed_time_c / elapsed_time_simd));
+ aom_usec_timer_start(&ref_timer);
+ for (int i = 0; i < num_loops; ++i) {
+ ref_func(input, ref_output, input_stride, (TX_TYPE)tx_type, bd);
}
+ aom_usec_timer_mark(&ref_timer);
+ const int elapsed_time_c =
+ static_cast<int>(aom_usec_timer_elapsed(&ref_timer));
+
+ aom_usec_timer_start(&test_timer);
+ for (int i = 0; i < num_loops; ++i) {
+ target_func(input, output, input_stride, &param);
+ }
+ aom_usec_timer_mark(&test_timer);
+ const int elapsed_time_simd =
+ static_cast<int>(aom_usec_timer_elapsed(&test_timer));
+
+ printf(
+ "txfm_size[%2dx%-2d] \t txfm_type[%d] \t c_time=%d \t"
+ "simd_time=%d \t gain=%d \n",
+ rows, cols, tx_type, elapsed_time_c, elapsed_time_simd,
+ (elapsed_time_c / elapsed_time_simd));
}
}
}
@@ -382,9 +383,9 @@
int stride = stride_list[i];
int array_size = stride * stride;
- for (int i = 0; i < array_size; i++) {
- src_diff[i] = 8;
- coeff[i] = 0;
+ for (int j = 0; j < array_size; j++) {
+ src_diff[j] = 8;
+ coeff[j] = 0;
}
av1_quick_txfm(/*use_hadamard=*/0, tx_size, bd_info, src_diff, stride,
@@ -392,9 +393,9 @@
double input_sse = 0;
double output_sse = 0;
- for (int i = 0; i < array_size; i++) {
- input_sse += pow(src_diff[i], 2);
- output_sse += pow(coeff[i], 2);
+ for (int j = 0; j < array_size; j++) {
+ input_sse += pow(src_diff[j], 2);
+ output_sse += pow(coeff[j], 2);
}
double scale = output_sse / input_sse;
@@ -418,9 +419,9 @@
int stride = stride_list[i];
int array_size = stride * stride;
- for (int i = 0; i < array_size; i++) {
- src_diff[i] = 8;
- coeff[i] = 0;
+ for (int j = 0; j < array_size; j++) {
+ src_diff[j] = 8;
+ coeff[j] = 0;
}
av1_quick_txfm(/*use_hadamard=*/1, tx_size, bd_info, src_diff, stride,
@@ -428,9 +429,9 @@
double input_sse = 0;
double output_sse = 0;
- for (int i = 0; i < array_size; i++) {
- input_sse += pow(src_diff[i], 2);
- output_sse += pow(coeff[i], 2);
+ for (int j = 0; j < array_size; j++) {
+ input_sse += pow(src_diff[j], 2);
+ output_sse += pow(coeff[j], 2);
}
double scale = output_sse / input_sse;
@@ -555,14 +556,15 @@
ref_func(input, ref_output, input_stride, (TX_TYPE)tx_type, bd);
target_func(input, output, input_stride, &param);
- const int check_rows = AOMMIN(32, rows);
- const int check_cols = AOMMIN(32, rows * cols / check_rows);
+ const int check_cols = AOMMIN(32, cols);
+ const int check_rows = AOMMIN(32, rows * cols / check_cols);
for (int r = 0; r < check_rows; ++r) {
for (int c = 0; c < check_cols; ++c) {
- ASSERT_EQ(ref_output[r * check_cols + c],
- output[r * check_cols + c])
+ ASSERT_EQ(ref_output[c * check_rows + r],
+ output[c * check_rows + r])
<< "[" << r << "," << c << "] cnt:" << cnt
- << " tx_size: " << tx_size << " tx_type: " << tx_type;
+ << " tx_size: " << cols << "x" << rows
+ << " tx_type: " << tx_type;
}
}
}
@@ -610,7 +612,7 @@
aom_usec_timer ref_timer, test_timer;
aom_usec_timer_start(&ref_timer);
- for (int i = 0; i < num_loops; ++i) {
+ for (int j = 0; j < num_loops; ++j) {
ref_func(input, ref_output, input_stride, (TX_TYPE)tx_type, bd);
}
aom_usec_timer_mark(&ref_timer);
@@ -618,7 +620,7 @@
static_cast<int>(aom_usec_timer_elapsed(&ref_timer));
aom_usec_timer_start(&test_timer);
- for (int i = 0; i < num_loops; ++i) {
+ for (int j = 0; j < num_loops; ++j) {
target_func(input, output, input_stride, &param);
}
aom_usec_timer_mark(&test_timer);
@@ -626,9 +628,9 @@
static_cast<int>(aom_usec_timer_elapsed(&test_timer));
printf(
- "txfm_size[%d] \t txfm_type[%d] \t c_time=%d \t simd_time=%d \t "
- "gain=%d \n",
- tx_size, tx_type, elapsed_time_c, elapsed_time_simd,
+ "txfm_size[%2dx%-2d] \t txfm_type[%d] \t c_time=%d \t"
+ "simd_time=%d \t gain=%d \n",
+ cols, rows, tx_type, elapsed_time_c, elapsed_time_simd,
(elapsed_time_c / elapsed_time_simd));
}
}
diff --git a/test/av1_highbd_iht_test.cc b/test/av1_highbd_iht_test.cc
index 07c6036..dae53ea 100644
--- a/test/av1_highbd_iht_test.cc
+++ b/test/av1_highbd_iht_test.cc
@@ -298,9 +298,8 @@
for (int r = 0; r < rows; ++r) {
for (int c = 0; c < cols; ++c) {
ASSERT_EQ(ref_output[r * stride + c], output[r * stride + c])
- << "[" << r << "," << c << "] " << cnt
- << " tx_size: " << static_cast<int>(tx_size_)
- << " bit_depth_: " << bit_depth_
+ << "[" << r << "," << c << "] " << cnt << " tx_size: " << cols
+ << "x" << rows << " bit_depth_: " << bit_depth_
<< " tx_type: " << tx_type_name[tx_type_] << " eob " << eob;
}
}
diff --git a/test/av1_horz_only_frame_superres_test.cc b/test/av1_horz_only_frame_superres_test.cc
index f503b63..28ee534 100644
--- a/test/av1_horz_only_frame_superres_test.cc
+++ b/test/av1_horz_only_frame_superres_test.cc
@@ -299,8 +299,13 @@
TEST_P(LowBDConvolveHorizRSTest, Correctness) { CorrectnessTest(); }
TEST_P(LowBDConvolveHorizRSTest, DISABLED_Speed) { SpeedTest(); }
+INSTANTIATE_TEST_SUITE_P(C, LowBDConvolveHorizRSTest,
+ ::testing::Values(av1_convolve_horiz_rs_c));
+
+#if HAVE_SSE4_1
INSTANTIATE_TEST_SUITE_P(SSE4_1, LowBDConvolveHorizRSTest,
::testing::Values(av1_convolve_horiz_rs_sse4_1));
+#endif
#if CONFIG_AV1_HIGHBITDEPTH
typedef void (*HighBDConvolveHorizRsFunc)(const uint16_t *src, int src_stride,
@@ -358,9 +363,24 @@
TEST_P(HighBDConvolveHorizRSTest, DISABLED_Speed) { SpeedTest(); }
INSTANTIATE_TEST_SUITE_P(
+ C, HighBDConvolveHorizRSTest,
+ ::testing::Combine(::testing::Values(av1_highbd_convolve_horiz_rs_c),
+ ::testing::ValuesIn(kBDs)));
+
+#if HAVE_SSE4_1
+INSTANTIATE_TEST_SUITE_P(
SSE4_1, HighBDConvolveHorizRSTest,
::testing::Combine(::testing::Values(av1_highbd_convolve_horiz_rs_sse4_1),
::testing::ValuesIn(kBDs)));
+#endif // HAVE_SSE4_1
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, HighBDConvolveHorizRSTest,
+ ::testing::Combine(::testing::Values(av1_highbd_convolve_horiz_rs_neon),
+ ::testing::ValuesIn(kBDs)));
+#endif // HAVE_NEON
+
#endif // CONFIG_AV1_HIGHBITDEPTH
} // namespace
diff --git a/test/av1_inv_txfm1d_test.cc b/test/av1_inv_txfm1d_test.cc
index ab8a6f8..e70b22a 100644
--- a/test/av1_inv_txfm1d_test.cc
+++ b/test/av1_inv_txfm1d_test.cc
@@ -73,31 +73,31 @@
const int max_error[] = { 6, 10, 19, 31, 40 };
ASSERT_EQ(NELEMENTS(max_error), TX_SIZES);
ASSERT_EQ(NELEMENTS(inv_txfm_func_ls), TX_SIZES);
- for (int k = 0; k < count_test_block; ++k) {
+ for (int i = 0; i < count_test_block; ++i) {
// choose a random transform to test
const TxSize tx_size = static_cast<TxSize>(rnd.Rand8() % TX_SIZES);
- const int tx_size_pix = txfm_size_ls[tx_size];
+ const int txfm_size = txfm_size_ls[tx_size];
const TxfmFunc inv_txfm_func = inv_txfm_func_ls[tx_size][0];
int32_t input[64];
- random_matrix(input, tx_size_pix, &rnd);
+ random_matrix(input, txfm_size, &rnd);
// 64x64 transform assumes last 32 values are zero.
memset(input + 32, 0, 32 * sizeof(input[0]));
int32_t ref_output[64];
memset(ref_output, 0, sizeof(ref_output));
- reference_idct_1d_int(input, ref_output, tx_size_pix);
+ reference_idct_1d_int(input, ref_output, txfm_size);
int32_t output[64];
memset(output, 0, sizeof(output));
inv_txfm_func(input, output, cos_bit, range_bit);
- for (int i = 0; i < tx_size_pix; ++i) {
- EXPECT_LE(abs(output[i] - ref_output[i]), max_error[tx_size])
- << "tx_size = " << tx_size << ", i = " << i
- << ", output[i] = " << output[i]
- << ", ref_output[i] = " << ref_output[i];
+ for (int ni = 0; ni < txfm_size; ++ni) {
+ EXPECT_LE(abs(output[ni] - ref_output[ni]), max_error[tx_size])
+ << "tx_size = " << tx_size << ", ni = " << ni
+ << ", output[ni] = " << output[ni]
+ << ", ref_output[ni] = " << ref_output[ni];
}
}
}
@@ -129,7 +129,7 @@
if (!fwd_txfm_func) continue;
const int count_test_block = 5000;
- for (int ci = 0; ci < count_test_block; ++ci) {
+ for (int i = 0; i < count_test_block; ++i) {
int32_t input[64];
int32_t output[64];
int32_t round_trip_output[64];
diff --git a/test/av1_inv_txfm2d_test.cc b/test/av1_inv_txfm2d_test.cc
index e13350a..dfa0481 100644
--- a/test/av1_inv_txfm2d_test.cc
+++ b/test/av1_inv_txfm2d_test.cc
@@ -30,6 +30,7 @@
using libaom_test::input_base;
using libaom_test::InvTxfm2dFunc;
using libaom_test::LbdInvTxfm2dFunc;
+using libaom_test::tx_type_name;
using ::testing::Combine;
using ::testing::Range;
@@ -42,25 +43,6 @@
namespace {
-static const char *tx_type_name[] = {
- "DCT_DCT",
- "ADST_DCT",
- "DCT_ADST",
- "ADST_ADST",
- "FLIPADST_DCT",
- "DCT_FLIPADST",
- "FLIPADST_FLIPADST",
- "ADST_FLIPADST",
- "FLIPADST_ADST",
- "IDTX",
- "V_DCT",
- "H_DCT",
- "V_ADST",
- "H_ADST",
- "V_FLIPADST",
- "H_FLIPADST",
-};
-
// AV1InvTxfm2dParam argument list:
// tx_type_, tx_size_, max_error_, max_avg_error_
typedef std::tuple<TxType, TxSize, int, double> AV1InvTxfm2dParam;
@@ -139,7 +121,8 @@
actual_max_error = AOMMAX(actual_max_error, this_error);
}
EXPECT_GE(max_error_, actual_max_error)
- << " tx_w: " << tx_w << " tx_h " << tx_h << " tx_type: " << tx_type_;
+ << " tx_w: " << tx_w << " tx_h " << tx_h
+ << " tx_type: " << tx_type_name[tx_type_];
if (actual_max_error > max_error_) { // exit early.
break;
}
@@ -149,7 +132,8 @@
avg_abs_error /= count;
EXPECT_GE(max_avg_error_, avg_abs_error)
- << " tx_w: " << tx_w << " tx_h " << tx_h << " tx_type: " << tx_type_;
+ << " tx_w: " << tx_w << " tx_h " << tx_h
+ << " tx_type: " << tx_type_name[tx_type_];
}
private:
@@ -345,9 +329,9 @@
printf(" ");
}
ASSERT_EQ(ref_value, output[r * stride + c])
- << "[" << r << "," << c << "] " << cnt
- << " tx_size: " << static_cast<int>(tx_size)
- << " tx_type: " << tx_type_name[tx_type] << " eob " << eob;
+ << "[" << r << "," << c << "] " << cnt << " tx_size: " << cols
+ << "x" << rows << " tx_type: " << tx_type_name[tx_type] << " eob "
+ << eob;
}
}
}
@@ -391,11 +375,12 @@
}
#if HAVE_SSSE3
-#if defined(_MSC_VER) || defined(__SSSE3__)
-#include "av1/common/x86/av1_inv_txfm_ssse3.h"
+extern "C" void av1_lowbd_inv_txfm2d_add_ssse3(const int32_t *input,
+ uint8_t *output, int stride,
+ TxType tx_type, TxSize tx_size,
+ int eob);
INSTANTIATE_TEST_SUITE_P(SSSE3, AV1LbdInvTxfm2d,
::testing::Values(av1_lowbd_inv_txfm2d_add_ssse3));
-#endif // _MSC_VER || __SSSE3__
#endif // HAVE_SSSE3
#if HAVE_AVX2
diff --git a/test/av1_k_means_test.cc b/test/av1_k_means_test.cc
index 221dd10..99f0fba 100644
--- a/test/av1_k_means_test.cc
+++ b/test/av1_k_means_test.cc
@@ -259,7 +259,7 @@
RunSpeedTest(GET_PARAM(0), GET_PARAM(1), 8);
}
-#if HAVE_AVX2 || HAVE_SSE2
+#if HAVE_SSE2 || HAVE_AVX2 || HAVE_NEON
const BLOCK_SIZE kValidBlockSize[] = { BLOCK_8X8, BLOCK_8X16, BLOCK_8X32,
BLOCK_16X8, BLOCK_16X16, BLOCK_16X32,
BLOCK_32X8, BLOCK_32X16, BLOCK_32X32,
@@ -267,6 +267,17 @@
BLOCK_16X64, BLOCK_64X16 };
#endif
+#if HAVE_SSE2
+INSTANTIATE_TEST_SUITE_P(
+ SSE2, AV1KmeansTest1,
+ ::testing::Combine(::testing::Values(&av1_calc_indices_dim1_sse2),
+ ::testing::ValuesIn(kValidBlockSize)));
+INSTANTIATE_TEST_SUITE_P(
+ SSE2, AV1KmeansTest2,
+ ::testing::Combine(::testing::Values(&av1_calc_indices_dim2_sse2),
+ ::testing::ValuesIn(kValidBlockSize)));
+#endif
+
#if HAVE_AVX2
INSTANTIATE_TEST_SUITE_P(
AVX2, AV1KmeansTest1,
@@ -278,15 +289,14 @@
::testing::ValuesIn(kValidBlockSize)));
#endif
-#if HAVE_SSE2
-
+#if HAVE_NEON
INSTANTIATE_TEST_SUITE_P(
- SSE2, AV1KmeansTest1,
- ::testing::Combine(::testing::Values(&av1_calc_indices_dim1_sse2),
+ NEON, AV1KmeansTest1,
+ ::testing::Combine(::testing::Values(&av1_calc_indices_dim1_neon),
::testing::ValuesIn(kValidBlockSize)));
INSTANTIATE_TEST_SUITE_P(
- SSE2, AV1KmeansTest2,
- ::testing::Combine(::testing::Values(&av1_calc_indices_dim2_sse2),
+ NEON, AV1KmeansTest2,
+ ::testing::Combine(::testing::Values(&av1_calc_indices_dim2_neon),
::testing::ValuesIn(kValidBlockSize)));
#endif
diff --git a/test/av1_nn_predict_test.cc b/test/av1_nn_predict_test.cc
index 7a3067d..48504c8 100644
--- a/test/av1_nn_predict_test.cc
+++ b/test/av1_nn_predict_test.cc
@@ -175,7 +175,7 @@
// These are all the neural network shapes observed executing in a few
// runs of the encoder. It also conveniently covers all the kernels
// implemented.
-static const NN_CONFIG shapes[] = {
+static const NN_CONFIG kShapes[] = {
{ 10, 16, 1, { 64 }, { 0 }, { 0 } }, { 12, 1, 1, { 12 }, { 0 }, { 0 } },
{ 12, 1, 1, { 24 }, { 0 }, { 0 } }, { 12, 1, 1, { 32 }, { 0 }, { 0 } },
{ 18, 4, 1, { 24 }, { 0 }, { 0 } }, { 18, 4, 1, { 32 }, { 0 }, { 0 } },
@@ -198,11 +198,12 @@
}
TEST_P(NnPredictTest, RandomValues) {
- RunNnPredictTest_all(shapes, sizeof(shapes) / sizeof(*shapes));
+ RunNnPredictTest_all(kShapes, sizeof(kShapes) / sizeof(kShapes[0]));
}
TEST_P(NnPredictTest, DISABLED_Speed) {
- RunNnPredictSpeedTest_all(shapes, sizeof(shapes) / sizeof(*shapes), 10000000);
+ RunNnPredictSpeedTest_all(kShapes, sizeof(kShapes) / sizeof(kShapes[0]),
+ 10000000);
}
#if HAVE_SSE3 && !CONFIG_EXCLUDE_SIMD_MISMATCH
diff --git a/test/av1_txfm_test.cc b/test/av1_txfm_test.cc
index f741e7c..77c0ec1 100644
--- a/test/av1_txfm_test.cc
+++ b/test/av1_txfm_test.cc
@@ -18,6 +18,25 @@
namespace libaom_test {
+const char *tx_type_name[] = {
+ "DCT_DCT",
+ "ADST_DCT",
+ "DCT_ADST",
+ "ADST_ADST",
+ "FLIPADST_DCT",
+ "DCT_FLIPADST",
+ "FLIPADST_FLIPADST",
+ "ADST_FLIPADST",
+ "FLIPADST_ADST",
+ "IDTX",
+ "V_DCT",
+ "H_DCT",
+ "V_ADST",
+ "H_ADST",
+ "V_FLIPADST",
+ "H_FLIPADST",
+};
+
int get_txfm1d_size(TX_SIZE tx_size) { return tx_size_wide[tx_size]; }
void get_txfm1d_type(TX_TYPE txfm2d_type, TYPE_TXFM *type0, TYPE_TXFM *type1) {
@@ -250,23 +269,25 @@
ASSERT_NE(temp_in, nullptr);
ASSERT_NE(temp_out, nullptr);
ASSERT_NE(out_interm, nullptr);
- const int stride = tx_width;
// Transform columns.
for (int c = 0; c < tx_width; ++c) {
for (int r = 0; r < tx_height; ++r) {
- temp_in[r] = in[r * stride + c];
+ temp_in[r] = in[r * tx_width + c];
}
reference_hybrid_1d(temp_in.get(), temp_out.get(), tx_height, type0);
for (int r = 0; r < tx_height; ++r) {
- out_interm[r * stride + c] = temp_out[r];
+ out_interm[r * tx_width + c] = temp_out[r];
}
}
// Transform rows.
for (int r = 0; r < tx_height; ++r) {
- reference_hybrid_1d(out_interm.get() + r * stride, out + r * stride,
+ reference_hybrid_1d(out_interm.get() + r * tx_width, temp_out.get(),
tx_width, type1);
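+    // Store the row-transform result transposed, so the reference output
+    // is laid out column-major like the functions under test.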
+ for (int c = 0; c < tx_width; ++c) {
+ out[c * tx_height + r] = temp_out[c];
+ }
}
// These transforms use an approximate 2D DCT transform, by only keeping the
@@ -275,48 +296,48 @@
// TODO(urvang): Refactor this code.
if (tx_width == 64 && tx_height == 64) { // tx_size == TX_64X64
// Zero out top-right 32x32 area.
- for (int row = 0; row < 32; ++row) {
- memset(out + row * 64 + 32, 0, 32 * sizeof(*out));
+ for (int col = 0; col < 32; ++col) {
+ memset(out + col * 64 + 32, 0, 32 * sizeof(*out));
}
// Zero out the bottom 64x32 area.
memset(out + 32 * 64, 0, 32 * 64 * sizeof(*out));
// Re-pack non-zero coeffs in the first 32x32 indices.
- for (int row = 1; row < 32; ++row) {
- memcpy(out + row * 32, out + row * 64, 32 * sizeof(*out));
+ for (int col = 1; col < 32; ++col) {
+ memcpy(out + col * 32, out + col * 64, 32 * sizeof(*out));
}
} else if (tx_width == 32 && tx_height == 64) { // tx_size == TX_32X64
+ // Zero out right 32x32 area.
+ for (int col = 0; col < 32; ++col) {
+ memset(out + col * 64 + 32, 0, 32 * sizeof(*out));
+ }
+ // Re-pack non-zero coeffs in the first 32x32 indices.
+ for (int col = 1; col < 32; ++col) {
+ memcpy(out + col * 32, out + col * 64, 32 * sizeof(*out));
+ }
+ } else if (tx_width == 64 && tx_height == 32) { // tx_size == TX_64X32
// Zero out the bottom 32x32 area.
memset(out + 32 * 32, 0, 32 * 32 * sizeof(*out));
// Note: no repacking needed here.
- } else if (tx_width == 64 && tx_height == 32) { // tx_size == TX_64X32
- // Zero out right 32x32 area.
- for (int row = 0; row < 32; ++row) {
- memset(out + row * 64 + 32, 0, 32 * sizeof(*out));
- }
- // Re-pack non-zero coeffs in the first 32x32 indices.
- for (int row = 1; row < 32; ++row) {
- memcpy(out + row * 32, out + row * 64, 32 * sizeof(*out));
- }
} else if (tx_width == 16 && tx_height == 64) { // tx_size == TX_16X64
- // Zero out the bottom 16x32 area.
- memset(out + 16 * 32, 0, 16 * 32 * sizeof(*out));
// Note: no repacking needed here.
- } else if (tx_width == 64 && tx_height == 16) { // tx_size == TX_64X16
// Zero out right 32x16 area.
- for (int row = 0; row < 16; ++row) {
- memset(out + row * 64 + 32, 0, 32 * sizeof(*out));
+ for (int col = 0; col < 16; ++col) {
+ memset(out + col * 64 + 32, 0, 32 * sizeof(*out));
}
// Re-pack non-zero coeffs in the first 32x16 indices.
- for (int row = 1; row < 16; ++row) {
- memcpy(out + row * 32, out + row * 64, 32 * sizeof(*out));
+ for (int col = 1; col < 16; ++col) {
+ memcpy(out + col * 32, out + col * 64, 32 * sizeof(*out));
}
+ } else if (tx_width == 64 && tx_height == 16) { // tx_size == TX_64X16
+ // Zero out the bottom 16x32 area.
+ memset(out + 16 * 32, 0, 16 * 32 * sizeof(*out));
}
// Apply appropriate scale.
const double amplify_factor = get_amplification_factor(tx_type, tx_size);
for (int c = 0; c < tx_width; ++c) {
for (int r = 0; r < tx_height; ++r) {
- out[r * stride + c] *= amplify_factor;
+ out[c * tx_height + r] *= amplify_factor;
}
}
}
diff --git a/test/av1_txfm_test.h b/test/av1_txfm_test.h
index 13a7e8a..d285e3d 100644
--- a/test/av1_txfm_test.h
+++ b/test/av1_txfm_test.h
@@ -29,6 +29,9 @@
#include "av1/common/enums.h"
namespace libaom_test {
+
+extern const char *tx_type_name[];
+
enum {
TYPE_DCT = 0,
TYPE_ADST,
diff --git a/test/avg_test.cc b/test/avg_test.cc
index 4e86f06..8865915 100644
--- a/test/avg_test.cc
+++ b/test/avg_test.cc
@@ -847,7 +847,13 @@
make_tuple(32, 32, 10, 15, 4, &aom_highbd_avg_4x4_neon),
make_tuple(16, 16, 12, 0, 4, &aom_highbd_avg_4x4_neon),
make_tuple(16, 16, 12, 5, 4, &aom_highbd_avg_4x4_neon),
- make_tuple(32, 32, 12, 15, 4, &aom_highbd_avg_4x4_neon)));
+ make_tuple(32, 32, 12, 15, 4, &aom_highbd_avg_4x4_neon),
+ make_tuple(16, 16, 10, 0, 8, &aom_highbd_avg_8x8_neon),
+ make_tuple(16, 16, 10, 5, 8, &aom_highbd_avg_8x8_neon),
+ make_tuple(32, 32, 10, 15, 8, &aom_highbd_avg_8x8_neon),
+ make_tuple(16, 16, 12, 0, 8, &aom_highbd_avg_8x8_neon),
+ make_tuple(16, 16, 12, 5, 8, &aom_highbd_avg_8x8_neon),
+ make_tuple(32, 32, 12, 15, 8, &aom_highbd_avg_8x8_neon)));
#endif // HAVE_NEON
#endif // CONFIG_AV1_HIGHBITDEPTH
diff --git a/test/avif_progressive_test.cc b/test/avif_progressive_test.cc
index 4a00a5a..2a28ca3 100644
--- a/test/avif_progressive_test.cc
+++ b/test/avif_progressive_test.cc
@@ -33,18 +33,22 @@
aom_image_t img;
EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I444, kWidth, kHeight, 1,
buffer.data()));
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
aom_codec_iface_t *iface = aom_codec_av1_cx();
aom_codec_enc_cfg_t cfg;
- const unsigned int usage = AOM_USAGE_GOOD_QUALITY;
- EXPECT_EQ(AOM_CODEC_OK, aom_codec_enc_config_default(iface, &cfg, usage));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_GOOD_QUALITY));
+ cfg.g_profile = 1;
cfg.g_w = kWidth;
cfg.g_h = kHeight;
- cfg.rc_end_usage = AOM_Q;
- cfg.g_profile = 1;
cfg.g_bit_depth = AOM_BITS_8;
cfg.g_input_bit_depth = 8;
cfg.g_lag_in_frames = 0;
+ cfg.rc_end_usage = AOM_Q;
cfg.rc_min_quantizer = 50;
cfg.rc_max_quantizer = 50;
aom_codec_ctx_t enc;
@@ -64,7 +68,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
aom_codec_iter_t iter = nullptr;
const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0x1f0011.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
@@ -85,7 +89,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, encode_flags));
iter = nullptr;
pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, 0u);
@@ -114,18 +118,22 @@
aom_image_t img;
EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I444, kWidth, kHeight, 1,
buffer.data()));
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
aom_codec_iface_t *iface = aom_codec_av1_cx();
aom_codec_enc_cfg_t cfg;
- const unsigned int usage = AOM_USAGE_GOOD_QUALITY;
- EXPECT_EQ(AOM_CODEC_OK, aom_codec_enc_config_default(iface, &cfg, usage));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_GOOD_QUALITY));
+ cfg.g_profile = 1;
cfg.g_w = kWidth;
cfg.g_h = kHeight;
- cfg.rc_end_usage = AOM_Q;
- cfg.g_profile = 1;
cfg.g_bit_depth = AOM_BITS_8;
cfg.g_input_bit_depth = 8;
cfg.g_lag_in_frames = 0;
+ cfg.rc_end_usage = AOM_Q;
cfg.rc_min_quantizer = 0;
cfg.rc_max_quantizer = 0;
aom_codec_ctx_t enc;
@@ -149,7 +157,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
aom_codec_iter_t iter = nullptr;
const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0x1f0011.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
@@ -165,7 +173,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, encode_flags));
iter = nullptr;
pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, 0u);
@@ -181,6 +189,9 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
}
+// This test reproduces bug aomedia:3382. Certain parameters such as width,
+// height, g_threads, usage, etc. were carefully chosen based on the
+// complicated logic of av1_select_sb_size() to cause an inconsistent sb_size.
TEST(AVIFProgressiveTest, DimensionChangeLargeImageMultiThread) {
constexpr int kWidth = 1920;
constexpr int kHeight = 1080;
@@ -233,7 +244,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
aom_codec_iter_t iter = nullptr;
const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0x1f0011.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
@@ -249,7 +260,7 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, encode_flags));
iter = nullptr;
pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, 0u);
diff --git a/test/comp_mask_pred_test.cc b/test/comp_mask_pred_test.cc
new file mode 100644
index 0000000..06c3192
--- /dev/null
+++ b/test/comp_mask_pred_test.cc
@@ -0,0 +1,716 @@
+/*
+ * Copyright (c) 2018, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <cstdlib>
+#include <new>
+#include <tuple>
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+
+#include "aom/aom_codec.h"
+#include "aom/aom_integer.h"
+#include "aom_dsp/variance.h"
+#include "aom_mem/aom_mem.h"
+#include "aom_ports/aom_timer.h"
+#include "aom_ports/mem.h"
+#include "av1/common/reconinter.h"
+#include "av1/encoder/reconinter_enc.h"
+#include "test/acm_random.h"
+#include "test/register_state_check.h"
+#include "test/util.h"
+#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
+
+namespace {
+typedef void (*comp_mask_pred_func)(uint8_t *comp_pred, const uint8_t *pred,
+ int width, int height, const uint8_t *ref,
+ int ref_stride, const uint8_t *mask,
+ int mask_stride, int invert_mask);
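+
+// A comp_mask_pred_func blends the two predictors with a 6-bit (0..64)
+// mask, roughly comp_pred[i] = (mask[i] * a + (64 - mask[i]) * b + 32) >> 6;
+// invert_mask swaps which input gets the mask weight.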
+
+typedef void (*comp_avg_pred_func)(uint8_t *comp_pred, const uint8_t *pred,
+ int width, int height, const uint8_t *ref,
+ int ref_stride);
+
+#if HAVE_SSSE3 || HAVE_SSE2 || HAVE_AVX2 || HAVE_NEON
+const BLOCK_SIZE kCompMaskPredParams[] = {
+ BLOCK_8X8, BLOCK_8X16, BLOCK_8X32, BLOCK_16X8, BLOCK_16X16,
+ BLOCK_16X32, BLOCK_32X8, BLOCK_32X16, BLOCK_32X32
+};
+#endif
+
+class AV1CompMaskPredBase : public ::testing::Test {
+ public:
+ ~AV1CompMaskPredBase();
+ void SetUp();
+
+ void TearDown();
+
+ protected:
+ bool CheckResult(int width, int height) {
+ for (int y = 0; y < height; ++y) {
+ for (int x = 0; x < width; ++x) {
+ const int idx = y * width + x;
+ if (comp_pred1_[idx] != comp_pred2_[idx]) {
+ printf("%dx%d mismatch @%d(%d,%d) ", width, height, idx, y, x);
+ printf("%d != %d ", comp_pred1_[idx], comp_pred2_[idx]);
+ return false;
+ }
+ }
+ }
+ return true;
+ }
+
+ libaom_test::ACMRandom rnd_;
+ uint8_t *comp_pred1_;
+ uint8_t *comp_pred2_;
+ uint8_t *pred_;
+ uint8_t *ref_buffer_;
+ uint8_t *ref_;
+};
+
+AV1CompMaskPredBase::~AV1CompMaskPredBase() {}
+
+void AV1CompMaskPredBase::SetUp() {
+ rnd_.Reset(libaom_test::ACMRandom::DeterministicSeed());
+ av1_init_wedge_masks();
+ comp_pred1_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(comp_pred1_, nullptr);
+ comp_pred2_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(comp_pred2_, nullptr);
+ pred_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(pred_, nullptr);
+ // The biggest block size is MAX_SB_SQUARE (128*128). However, for the
+ // convolution we need to access 3 bytes before and 4 bytes after (for an
+ // 8-tap filter) in both directions, so we need to allocate
+ // (128 + 7) * (128 + 7) = MAX_SB_SQUARE + (14 * MAX_SB_SIZE) + 49 bytes.
+ ref_buffer_ =
+ (uint8_t *)aom_memalign(16, MAX_SB_SQUARE + (14 * MAX_SB_SIZE) + 49);
+ ASSERT_NE(ref_buffer_, nullptr);
+ // Start of the actual block where the convolution will be computed
+ ref_ = ref_buffer_ + (3 * MAX_SB_SIZE + 3);
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand8();
+ }
+ for (int i = 0; i < MAX_SB_SQUARE + (14 * MAX_SB_SIZE) + 49; ++i) {
+ ref_buffer_[i] = rnd_.Rand8();
+ }
+}
+
+void AV1CompMaskPredBase::TearDown() {
+ aom_free(comp_pred1_);
+ aom_free(comp_pred2_);
+ aom_free(pred_);
+ aom_free(ref_buffer_);
+}
+
+typedef std::tuple<comp_mask_pred_func, BLOCK_SIZE> CompMaskPredParam;
+
+class AV1CompMaskPredTest
+ : public AV1CompMaskPredBase,
+ public ::testing::WithParamInterface<CompMaskPredParam> {
+ protected:
+ void RunCheckOutput(comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv);
+ void RunSpeedTest(comp_mask_pred_func test_impl, BLOCK_SIZE bsize);
+};
+
+void AV1CompMaskPredTest::RunCheckOutput(comp_mask_pred_func test_impl,
+ BLOCK_SIZE bsize, int inv) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int wedge_types = get_wedge_types_lookup(bsize);
+ for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
+ const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
+
+ aom_comp_mask_pred_c(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w,
+ inv);
+ test_impl(comp_pred2_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w, inv);
+
+ ASSERT_EQ(CheckResult(w, h), true)
+ << " wedge " << wedge_index << " inv " << inv;
+ }
+}
+
+void AV1CompMaskPredTest::RunSpeedTest(comp_mask_pred_func test_impl,
+ BLOCK_SIZE bsize) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int wedge_types = get_wedge_types_lookup(bsize);
+ int wedge_index = wedge_types / 2;
+ const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
+ const int num_loops = 1000000000 / (w + h);
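+ // Scale the iteration count inversely with block size so every block
+ // size runs for a roughly comparable amount of time.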
+
+ comp_mask_pred_func funcs[2] = { aom_comp_mask_pred_c, test_impl };
+ double elapsed_time[2] = { 0 };
+ for (int i = 0; i < 2; ++i) {
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ comp_mask_pred_func func = funcs[i];
+ for (int j = 0; j < num_loops; ++j) {
+ func(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w, 0);
+ }
+ aom_usec_timer_mark(&timer);
+ double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
+ elapsed_time[i] = 1000.0 * time / num_loops;
+ }
+ printf("compMask %3dx%-3d: %7.2f/%7.2fns", w, h, elapsed_time[0],
+ elapsed_time[1]);
+ printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
+}
+
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1CompMaskPredTest);
+
+TEST_P(AV1CompMaskPredTest, CheckOutput) {
+ // inv = 0, 1
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
+}
+
+TEST_P(AV1CompMaskPredTest, DISABLED_Speed) {
+ RunSpeedTest(GET_PARAM(0), GET_PARAM(1));
+}
+
+#if HAVE_SSSE3
+INSTANTIATE_TEST_SUITE_P(
+ SSSE3, AV1CompMaskPredTest,
+ ::testing::Combine(::testing::Values(&aom_comp_mask_pred_ssse3),
+ ::testing::ValuesIn(kCompMaskPredParams)));
+#endif
+
+#if HAVE_AVX2
+INSTANTIATE_TEST_SUITE_P(
+ AVX2, AV1CompMaskPredTest,
+ ::testing::Combine(::testing::Values(&aom_comp_mask_pred_avx2),
+ ::testing::ValuesIn(kCompMaskPredParams)));
+#endif
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1CompMaskPredTest,
+ ::testing::Combine(::testing::Values(&aom_comp_mask_pred_neon),
+ ::testing::ValuesIn(kCompMaskPredParams)));
+#endif
+
+#if HAVE_SSSE3 || HAVE_SSE2 || HAVE_AVX2 || HAVE_NEON
+const BLOCK_SIZE kValidBlockSize[] = {
+ BLOCK_4X4, BLOCK_8X8, BLOCK_8X16, BLOCK_8X32, BLOCK_16X8,
+ BLOCK_16X16, BLOCK_16X32, BLOCK_32X8, BLOCK_32X16, BLOCK_32X32,
+ BLOCK_32X64, BLOCK_64X32, BLOCK_64X64, BLOCK_64X128, BLOCK_128X64,
+ BLOCK_128X128, BLOCK_16X64, BLOCK_64X16
+};
+#endif
+
+typedef void (*upsampled_pred_func)(MACROBLOCKD *xd, const AV1_COMMON *const cm,
+ int mi_row, int mi_col, const MV *const mv,
+ uint8_t *comp_pred, int width, int height,
+ int subpel_x_q3, int subpel_y_q3,
+ const uint8_t *ref, int ref_stride,
+ int subpel_search);
+
+typedef std::tuple<upsampled_pred_func, BLOCK_SIZE> UpsampledPredParam;
+
+class AV1UpsampledPredTest
+ : public AV1CompMaskPredBase,
+ public ::testing::WithParamInterface<UpsampledPredParam> {
+ protected:
+ void RunCheckOutput(upsampled_pred_func test_impl, BLOCK_SIZE bsize);
+ void RunSpeedTest(upsampled_pred_func test_impl, BLOCK_SIZE bsize,
+ int havSub);
+};
+
+void AV1UpsampledPredTest::RunCheckOutput(upsampled_pred_func test_impl,
+ BLOCK_SIZE bsize) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ for (int subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
+ ++subpel_search) {
+ // loop through subx and suby
+ for (int sub = 0; sub < 8 * 8; ++sub) {
+ int subx = sub & 0x7;
+ int suby = (sub >> 3);
+
+ aom_upsampled_pred_c(nullptr, nullptr, 0, 0, nullptr, comp_pred1_, w, h,
+ subx, suby, ref_, MAX_SB_SIZE, subpel_search);
+
+ test_impl(nullptr, nullptr, 0, 0, nullptr, comp_pred2_, w, h, subx, suby,
+ ref_, MAX_SB_SIZE, subpel_search);
+ ASSERT_EQ(CheckResult(w, h), true)
+ << "sub (" << subx << "," << suby << ")";
+ }
+ }
+}
+
+void AV1UpsampledPredTest::RunSpeedTest(upsampled_pred_func test_impl,
+ BLOCK_SIZE bsize, int havSub) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int subx = havSub ? 3 : 0;
+ const int suby = havSub ? 4 : 0;
+
+ const int num_loops = 1000000000 / (w + h);
+ upsampled_pred_func funcs[2] = { aom_upsampled_pred_c, test_impl };
+ double elapsed_time[2] = { 0 };
+ int subpel_search = USE_8_TAPS; // set to USE_4_TAPS to test 4-tap filter.
+ for (int i = 0; i < 2; ++i) {
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ upsampled_pred_func func = funcs[i];
+ for (int j = 0; j < num_loops; ++j) {
+ func(nullptr, nullptr, 0, 0, nullptr, comp_pred1_, w, h, subx, suby, ref_,
+ MAX_SB_SIZE, subpel_search);
+ }
+ aom_usec_timer_mark(&timer);
+ double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
+ elapsed_time[i] = 1000.0 * time / num_loops;
+ }
+ printf("UpsampledPred[%d] %3dx%-3d:%7.2f/%7.2fns", havSub, w, h,
+ elapsed_time[0], elapsed_time[1]);
+ printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
+}
+
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1UpsampledPredTest);
+
+TEST_P(AV1UpsampledPredTest, CheckOutput) {
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1));
+}
+
+TEST_P(AV1UpsampledPredTest, DISABLED_Speed) {
+ RunSpeedTest(GET_PARAM(0), GET_PARAM(1), 1);
+}
+
+#if HAVE_SSE2
+INSTANTIATE_TEST_SUITE_P(
+ SSE2, AV1UpsampledPredTest,
+ ::testing::Combine(::testing::Values(&aom_upsampled_pred_sse2),
+ ::testing::ValuesIn(kValidBlockSize)));
+#endif
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1UpsampledPredTest,
+ ::testing::Combine(::testing::Values(&aom_upsampled_pred_neon),
+ ::testing::ValuesIn(kValidBlockSize)));
+#endif
+
+typedef std::tuple<comp_avg_pred_func, BLOCK_SIZE> CompAvgPredParam;
+
+class AV1CompAvgPredTest : public ::testing::TestWithParam<CompAvgPredParam> {
+ public:
+ ~AV1CompAvgPredTest();
+ void SetUp();
+
+ void TearDown();
+
+ protected:
+ void RunCheckOutput(comp_avg_pred_func test_impl, BLOCK_SIZE bsize);
+ void RunSpeedTest(comp_avg_pred_func test_impl, BLOCK_SIZE bsize);
+ bool CheckResult(int width, int height) {
+ for (int y = 0; y < height; ++y) {
+ for (int x = 0; x < width; ++x) {
+ const int idx = y * width + x;
+ if (comp_pred1_[idx] != comp_pred2_[idx]) {
+ printf("%dx%d mismatch @%d(%d,%d) ", width, height, idx, x, y);
+ printf("%d != %d ", comp_pred1_[idx], comp_pred2_[idx]);
+ return false;
+ }
+ }
+ }
+ return true;
+ }
+
+ libaom_test::ACMRandom rnd_;
+ uint8_t *comp_pred1_;
+ uint8_t *comp_pred2_;
+ uint8_t *pred_;
+ uint8_t *ref_;
+};
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1CompAvgPredTest);
+
+AV1CompAvgPredTest::~AV1CompAvgPredTest() {}
+
+void AV1CompAvgPredTest::SetUp() {
+ rnd_.Reset(libaom_test::ACMRandom::DeterministicSeed());
+
+ comp_pred1_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(comp_pred1_, nullptr);
+ comp_pred2_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(comp_pred2_, nullptr);
+ pred_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(pred_, nullptr);
+ ref_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
+ ASSERT_NE(ref_, nullptr);
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand8();
+ }
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ ref_[i] = rnd_.Rand8();
+ }
+}
+
+void AV1CompAvgPredTest::TearDown() {
+ aom_free(comp_pred1_);
+ aom_free(comp_pred2_);
+ aom_free(pred_);
+ aom_free(ref_);
+}
+
+void AV1CompAvgPredTest::RunCheckOutput(comp_avg_pred_func test_impl,
+ BLOCK_SIZE bsize) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ aom_comp_avg_pred_c(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE);
+ test_impl(comp_pred2_, pred_, w, h, ref_, MAX_SB_SIZE);
+
+ ASSERT_EQ(CheckResult(w, h), true);
+}
+
+void AV1CompAvgPredTest::RunSpeedTest(comp_avg_pred_func test_impl,
+ BLOCK_SIZE bsize) {
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int num_loops = 1000000000 / (w + h);
+
+ comp_avg_pred_func functions[2] = { aom_comp_avg_pred_c, test_impl };
+ double elapsed_time[2] = { 0.0 };
+ for (int i = 0; i < 2; ++i) {
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ comp_avg_pred_func func = functions[i];
+ for (int j = 0; j < num_loops; ++j) {
+ func(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE);
+ }
+ aom_usec_timer_mark(&timer);
+ const double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
+    elapsed_time[i] = 1000.0 * time / num_loops;
+  }
+  printf("compAvgPred %3dx%-3d: %7.2f/%7.2fns", w, h, elapsed_time[0],
+         elapsed_time[1]);
+ printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
+}
+
+TEST_P(AV1CompAvgPredTest, CheckOutput) {
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1));
+}
+
+TEST_P(AV1CompAvgPredTest, DISABLED_Speed) {
+ RunSpeedTest(GET_PARAM(0), GET_PARAM(1));
+}
+
+#if HAVE_AVX2
+INSTANTIATE_TEST_SUITE_P(
+ AVX2, AV1CompAvgPredTest,
+ ::testing::Combine(::testing::Values(&aom_comp_avg_pred_avx2),
+ ::testing::ValuesIn(kValidBlockSize)));
+#endif
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AV1CompAvgPredTest,
+ ::testing::Combine(::testing::Values(&aom_comp_avg_pred_neon),
+ ::testing::ValuesIn(kValidBlockSize)));
+#endif
+
+#if CONFIG_AV1_HIGHBITDEPTH
+class AV1HighbdCompMaskPredTestBase : public ::testing::Test {
+ public:
+ ~AV1HighbdCompMaskPredTestBase();
+ void SetUp();
+
+ void TearDown();
+
+ protected:
+ bool CheckResult(int width, int height) {
+ for (int y = 0; y < height; ++y) {
+ for (int x = 0; x < width; ++x) {
+ const int idx = y * width + x;
+ if (comp_pred1_[idx] != comp_pred2_[idx]) {
+ printf("%dx%d mismatch @%d(%d,%d) ", width, height, idx, y, x);
+ printf("%d != %d ", comp_pred1_[idx], comp_pred2_[idx]);
+ return false;
+ }
+ }
+ }
+ return true;
+ }
+
+ libaom_test::ACMRandom rnd_;
+ uint16_t *comp_pred1_;
+ uint16_t *comp_pred2_;
+ uint16_t *pred_;
+ uint16_t *ref_buffer_;
+ uint16_t *ref_;
+};
+
+AV1HighbdCompMaskPredTestBase::~AV1HighbdCompMaskPredTestBase() {}
+
+void AV1HighbdCompMaskPredTestBase::SetUp() {
+ rnd_.Reset(libaom_test::ACMRandom::DeterministicSeed());
+ av1_init_wedge_masks();
+
+ comp_pred1_ =
+ (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*comp_pred1_));
+ ASSERT_NE(comp_pred1_, nullptr);
+ comp_pred2_ =
+ (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*comp_pred2_));
+ ASSERT_NE(comp_pred2_, nullptr);
+ pred_ = (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*pred_));
+ ASSERT_NE(pred_, nullptr);
+  // The biggest block size is MAX_SB_SQUARE (128*128). However, the
+  // convolution needs to access 3 elements before and 4 elements after (for
+  // an 8-tap filter) in both directions, so we need to allocate
+  // (128 + 7) * (128 + 7) = (MAX_SB_SQUARE + (14 * MAX_SB_SIZE) + 49)
+  // elements, each of sizeof(*ref_buffer_) bytes.
+ ref_buffer_ = (uint16_t *)aom_memalign(
+ 16, (MAX_SB_SQUARE + (14 * MAX_SB_SIZE) + 49) * sizeof(*ref_buffer_));
+ ASSERT_NE(ref_buffer_, nullptr);
+ // Start of the actual block where the convolution will be computed
+ ref_ = ref_buffer_ + (3 * MAX_SB_SIZE + 3);
+}
+
+void AV1HighbdCompMaskPredTestBase::TearDown() {
+ aom_free(comp_pred1_);
+ aom_free(comp_pred2_);
+ aom_free(pred_);
+ aom_free(ref_buffer_);
+}
+
+typedef void (*highbd_comp_mask_pred_func)(uint8_t *comp_pred8,
+ const uint8_t *pred8, int width,
+ int height, const uint8_t *ref8,
+ int ref_stride, const uint8_t *mask,
+ int mask_stride, int invert_mask);
+
+typedef std::tuple<highbd_comp_mask_pred_func, BLOCK_SIZE, int>
+ HighbdCompMaskPredParam;
+
+class AV1HighbdCompMaskPredTest
+ : public AV1HighbdCompMaskPredTestBase,
+ public ::testing::WithParamInterface<HighbdCompMaskPredParam> {
+ public:
+ ~AV1HighbdCompMaskPredTest();
+
+ protected:
+  void RunCheckOutput(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize,
+                      int inv);
+  void RunSpeedTest(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize);
+};
+
+AV1HighbdCompMaskPredTest::~AV1HighbdCompMaskPredTest() {}
+
+void AV1HighbdCompMaskPredTest::RunCheckOutput(
+ highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv) {
+ int bd_ = GET_PARAM(2);
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int wedge_types = get_wedge_types_lookup(bsize);
+
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+ for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
+ ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+
+ for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
+ const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
+
+ aom_highbd_comp_mask_pred_c(
+ CONVERT_TO_BYTEPTR(comp_pred1_), CONVERT_TO_BYTEPTR(pred_), w, h,
+ CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, inv);
+
+ test_impl(CONVERT_TO_BYTEPTR(comp_pred2_), CONVERT_TO_BYTEPTR(pred_), w, h,
+ CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, inv);
+
+ ASSERT_EQ(CheckResult(w, h), true)
+ << " wedge " << wedge_index << " inv " << inv;
+ }
+}
+
+void AV1HighbdCompMaskPredTest::RunSpeedTest(
+ highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize) {
+ int bd_ = GET_PARAM(2);
+
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int wedge_types = get_wedge_types_lookup(bsize);
+ int wedge_index = wedge_types / 2;
+
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+ for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
+ ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+
+ const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
+ const int num_loops = 1000000000 / (w + h);
+
+ highbd_comp_mask_pred_func funcs[2] = { aom_highbd_comp_mask_pred_c,
+ test_impl };
+ double elapsed_time[2] = { 0 };
+ for (int i = 0; i < 2; ++i) {
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ highbd_comp_mask_pred_func func = funcs[i];
+ for (int j = 0; j < num_loops; ++j) {
+ func(CONVERT_TO_BYTEPTR(comp_pred1_), CONVERT_TO_BYTEPTR(pred_), w, h,
+ CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, 0);
+ }
+ aom_usec_timer_mark(&timer);
+ double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
+ elapsed_time[i] = 1000.0 * time / num_loops;
+ }
+ printf("compMask %3dx%-3d: %7.2f/%7.2fns", w, h, elapsed_time[0],
+ elapsed_time[1]);
+ printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
+}
+
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1HighbdCompMaskPredTest);
+
+TEST_P(AV1HighbdCompMaskPredTest, CheckOutput) {
+ // inv = 0, 1
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
+}
+
+TEST_P(AV1HighbdCompMaskPredTest, DISABLED_Speed) {
+ RunSpeedTest(GET_PARAM(0), GET_PARAM(1));
+}
+
+#if HAVE_AVX2
+INSTANTIATE_TEST_SUITE_P(
+ AVX2, AV1HighbdCompMaskPredTest,
+ ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_avx2),
+ ::testing::ValuesIn(kCompMaskPredParams),
+ ::testing::Range(8, 13, 2)));
+#endif
+
+#if HAVE_SSE2
+INSTANTIATE_TEST_SUITE_P(
+ SSE2, AV1HighbdCompMaskPredTest,
+ ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_sse2),
+ ::testing::ValuesIn(kCompMaskPredParams),
+ ::testing::Range(8, 13, 2)));
+#endif
+
+typedef void (*highbd_upsampled_pred_func)(
+ MACROBLOCKD *xd, const struct AV1Common *const cm, int mi_row, int mi_col,
+ const MV *const mv, uint8_t *comp_pred8, int width, int height,
+ int subpel_x_q3, int subpel_y_q3, const uint8_t *ref8, int ref_stride,
+ int bd, int subpel_search);
+
+typedef std::tuple<highbd_upsampled_pred_func, BLOCK_SIZE, int>
+ HighbdUpsampledPredParam;
+
+class AV1HighbdUpsampledPredTest
+ : public AV1HighbdCompMaskPredTestBase,
+ public ::testing::WithParamInterface<HighbdUpsampledPredParam> {
+ public:
+ ~AV1HighbdUpsampledPredTest();
+
+ protected:
+ void RunCheckOutput(highbd_upsampled_pred_func test_impl, BLOCK_SIZE bsize);
+ void RunSpeedTest(highbd_upsampled_pred_func test_impl, BLOCK_SIZE bsize,
+ int havSub);
+};
+
+AV1HighbdUpsampledPredTest::~AV1HighbdUpsampledPredTest() {}
+
+void AV1HighbdUpsampledPredTest::RunCheckOutput(
+ highbd_upsampled_pred_func test_impl, BLOCK_SIZE bsize) {
+ int bd_ = GET_PARAM(2);
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+ for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
+ ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+
+ for (int subpel_search = 1; subpel_search <= 2; ++subpel_search) {
+ // loop through subx and suby
+ for (int sub = 0; sub < 8 * 8; ++sub) {
+ int subx = sub & 0x7;
+ int suby = (sub >> 3);
+
+ aom_highbd_upsampled_pred_c(nullptr, nullptr, 0, 0, nullptr,
+ CONVERT_TO_BYTEPTR(comp_pred1_), w, h, subx,
+ suby, CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE,
+ bd_, subpel_search);
+
+ test_impl(nullptr, nullptr, 0, 0, nullptr,
+ CONVERT_TO_BYTEPTR(comp_pred2_), w, h, subx, suby,
+ CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, bd_, subpel_search);
+
+ ASSERT_EQ(CheckResult(w, h), true)
+ << "sub (" << subx << "," << suby << ")";
+ }
+ }
+}
+
+void AV1HighbdUpsampledPredTest::RunSpeedTest(
+ highbd_upsampled_pred_func test_impl, BLOCK_SIZE bsize, int havSub) {
+ int bd_ = GET_PARAM(2);
+ const int w = block_size_wide[bsize];
+ const int h = block_size_high[bsize];
+ const int subx = havSub ? 3 : 0;
+ const int suby = havSub ? 4 : 0;
+
+ for (int i = 0; i < MAX_SB_SQUARE; ++i) {
+ pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+ for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
+ ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
+ }
+
+ const int num_loops = 1000000000 / (w + h);
+ highbd_upsampled_pred_func funcs[2] = { &aom_highbd_upsampled_pred_c,
+ test_impl };
+ double elapsed_time[2] = { 0 };
+ for (int i = 0; i < 2; ++i) {
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ highbd_upsampled_pred_func func = funcs[i];
+ int subpel_search = 2; // set to 1 to test 4-tap filter.
+ for (int j = 0; j < num_loops; ++j) {
+ func(nullptr, nullptr, 0, 0, nullptr, CONVERT_TO_BYTEPTR(comp_pred1_), w,
+ h, subx, suby, CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, bd_,
+ subpel_search);
+ }
+ aom_usec_timer_mark(&timer);
+ double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
+ elapsed_time[i] = 1000.0 * time / num_loops;
+ }
+ printf("CompMaskUp[%d] %3dx%-3d:%7.2f/%7.2fns", havSub, w, h, elapsed_time[0],
+ elapsed_time[1]);
+ printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
+}
+
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1HighbdUpsampledPredTest);
+
+TEST_P(AV1HighbdUpsampledPredTest, CheckOutput) {
+ RunCheckOutput(GET_PARAM(0), GET_PARAM(1));
+}
+
+TEST_P(AV1HighbdUpsampledPredTest, DISABLED_Speed) {
+ RunSpeedTest(GET_PARAM(0), GET_PARAM(1), 1);
+}
+
+#if HAVE_SSE2
+INSTANTIATE_TEST_SUITE_P(
+ SSE2, AV1HighbdUpsampledPredTest,
+ ::testing::Combine(::testing::Values(&aom_highbd_upsampled_pred_sse2),
+ ::testing::ValuesIn(kValidBlockSize),
+ ::testing::Range(8, 13, 2)));
+#endif
+
+#endif // CONFIG_AV1_HIGHBITDEPTH
+} // namespace
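Note on the new comp_avg_pred_test.cc above: every suite follows the same verify-then-time pattern, i.e. run the C reference and the optimized candidate on identical inputs, compare the outputs element by element, then report nanoseconds per call by dividing elapsed time by the loop count. A standalone sketch of that harness, simplified for illustration and not part of the patch (std::chrono stands in for aom_usec_timer, and the rounding average mirrors what aom_comp_avg_pred_c computes):

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Reference path: average two 8-bit predictions with rounding, the same
    // arithmetic as aom_comp_avg_pred_c.
    static void comp_avg_c(uint8_t *dst, const uint8_t *pred, int w, int h,
                           const uint8_t *ref, int ref_stride) {
      for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) dst[x] = (pred[x] + ref[x] + 1) >> 1;
        dst += w;
        pred += w;
        ref += ref_stride;
      }
    }

    int main() {
      const int w = 32, h = 32, ref_stride = 128;
      std::vector<uint8_t> pred(w * h, 100), ref(ref_stride * h, 31);
      std::vector<uint8_t> out_ref(w * h), out_tst(w * h);

      // Correctness: a real test would call the SIMD candidate for out_tst;
      // the C path stands in here so the sketch is self-contained.
      comp_avg_c(out_ref.data(), pred.data(), w, h, ref.data(), ref_stride);
      comp_avg_c(out_tst.data(), pred.data(), w, h, ref.data(), ref_stride);
      for (int i = 0; i < w * h; ++i) {
        if (out_ref[i] != out_tst[i]) {
          std::printf("mismatch @%d(%d,%d)\n", i, i % w, i / w);
          return 1;
        }
      }

      // Timing, normalized the same way as the tests: total time / num_loops.
      const int num_loops = 1000000 / (w + h);  // far fewer loops than the tests
      const auto t0 = std::chrono::steady_clock::now();
      for (int j = 0; j < num_loops; ++j)
        comp_avg_c(out_ref.data(), pred.data(), w, h, ref.data(), ref_stride);
      const auto t1 = std::chrono::steady_clock::now();
      const double total_ns =
          std::chrono::duration<double, std::nano>(t1 - t0).count();
      std::printf("%dx%d: %.2f ns/call\n", w, h, total_ns / num_loops);
      return 0;
    }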
diff --git a/test/comp_mask_variance_test.cc b/test/comp_mask_variance_test.cc
deleted file mode 100644
index f958c5d..0000000
--- a/test/comp_mask_variance_test.cc
+++ /dev/null
@@ -1,589 +0,0 @@
-/*
- * Copyright (c) 2018, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include <cstdlib>
-#include <new>
-#include <tuple>
-
-#include "config/aom_config.h"
-#include "config/aom_dsp_rtcd.h"
-
-#include "aom/aom_codec.h"
-#include "aom/aom_integer.h"
-#include "aom_dsp/variance.h"
-#include "aom_mem/aom_mem.h"
-#include "aom_ports/aom_timer.h"
-#include "aom_ports/mem.h"
-#include "av1/common/reconinter.h"
-#include "av1/encoder/reconinter_enc.h"
-#include "test/acm_random.h"
-#include "test/register_state_check.h"
-#include "test/util.h"
-#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
-
-namespace AV1CompMaskVariance {
-typedef void (*comp_mask_pred_func)(uint8_t *comp_pred, const uint8_t *pred,
- int width, int height, const uint8_t *ref,
- int ref_stride, const uint8_t *mask,
- int mask_stride, int invert_mask);
-
-#if HAVE_SSSE3 || HAVE_SSE2 || HAVE_AVX2
-const BLOCK_SIZE kValidBlockSize[] = {
- BLOCK_8X8, BLOCK_8X16, BLOCK_8X32, BLOCK_16X8, BLOCK_16X16,
- BLOCK_16X32, BLOCK_32X8, BLOCK_32X16, BLOCK_32X32, BLOCK_32X64,
- BLOCK_64X32, BLOCK_64X64, BLOCK_64X128, BLOCK_128X64, BLOCK_128X128,
- BLOCK_16X64, BLOCK_64X16
-};
-#endif
-typedef std::tuple<comp_mask_pred_func, BLOCK_SIZE> CompMaskPredParam;
-
-class AV1CompMaskVarianceTest
- : public ::testing::TestWithParam<CompMaskPredParam> {
- public:
- ~AV1CompMaskVarianceTest();
- void SetUp();
-
- void TearDown();
-
- protected:
- void RunCheckOutput(comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv);
- void RunSpeedTest(comp_mask_pred_func test_impl, BLOCK_SIZE bsize);
- bool CheckResult(int width, int height) {
- for (int y = 0; y < height; ++y) {
- for (int x = 0; x < width; ++x) {
- const int idx = y * width + x;
- if (comp_pred1_[idx] != comp_pred2_[idx]) {
- printf("%dx%d mismatch @%d(%d,%d) ", width, height, idx, y, x);
- printf("%d != %d ", comp_pred1_[idx], comp_pred2_[idx]);
- return false;
- }
- }
- }
- return true;
- }
-
- libaom_test::ACMRandom rnd_;
- uint8_t *comp_pred1_;
- uint8_t *comp_pred2_;
- uint8_t *pred_;
- uint8_t *ref_buffer_;
- uint8_t *ref_;
-};
-GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1CompMaskVarianceTest);
-
-AV1CompMaskVarianceTest::~AV1CompMaskVarianceTest() {}
-
-void AV1CompMaskVarianceTest::SetUp() {
- rnd_.Reset(libaom_test::ACMRandom::DeterministicSeed());
- av1_init_wedge_masks();
- comp_pred1_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
- ASSERT_NE(comp_pred1_, nullptr);
- comp_pred2_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
- ASSERT_NE(comp_pred2_, nullptr);
- pred_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE);
- ASSERT_NE(pred_, nullptr);
- ref_buffer_ = (uint8_t *)aom_memalign(16, MAX_SB_SQUARE + (8 * MAX_SB_SIZE));
- ASSERT_NE(ref_buffer_, nullptr);
- ref_ = ref_buffer_ + (8 * MAX_SB_SIZE);
- for (int i = 0; i < MAX_SB_SQUARE; ++i) {
- pred_[i] = rnd_.Rand8();
- }
- for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
- ref_buffer_[i] = rnd_.Rand8();
- }
-}
-
-void AV1CompMaskVarianceTest::TearDown() {
- aom_free(comp_pred1_);
- aom_free(comp_pred2_);
- aom_free(pred_);
- aom_free(ref_buffer_);
-}
-
-void AV1CompMaskVarianceTest::RunCheckOutput(comp_mask_pred_func test_impl,
- BLOCK_SIZE bsize, int inv) {
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
- for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- aom_comp_mask_pred_c(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w,
- inv);
- test_impl(comp_pred2_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w, inv);
-
- ASSERT_EQ(CheckResult(w, h), true)
- << " wedge " << wedge_index << " inv " << inv;
- }
-}
-
-void AV1CompMaskVarianceTest::RunSpeedTest(comp_mask_pred_func test_impl,
- BLOCK_SIZE bsize) {
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
- int wedge_index = wedge_types / 2;
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
- const int num_loops = 1000000000 / (w + h);
-
- comp_mask_pred_func funcs[2] = { aom_comp_mask_pred_c, test_impl };
- double elapsed_time[2] = { 0 };
- for (int i = 0; i < 2; ++i) {
- aom_usec_timer timer;
- aom_usec_timer_start(&timer);
- comp_mask_pred_func func = funcs[i];
- for (int j = 0; j < num_loops; ++j) {
- func(comp_pred1_, pred_, w, h, ref_, MAX_SB_SIZE, mask, w, 0);
- }
- aom_usec_timer_mark(&timer);
- double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
- elapsed_time[i] = 1000.0 * time / num_loops;
- }
- printf("compMask %3dx%-3d: %7.2f/%7.2fns", w, h, elapsed_time[0],
- elapsed_time[1]);
- printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
-}
-
-TEST_P(AV1CompMaskVarianceTest, CheckOutput) {
- // inv = 0, 1
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-TEST_P(AV1CompMaskVarianceTest, DISABLED_Speed) {
- RunSpeedTest(GET_PARAM(0), GET_PARAM(1));
-}
-
-#if HAVE_SSSE3
-INSTANTIATE_TEST_SUITE_P(
- SSSE3, AV1CompMaskVarianceTest,
- ::testing::Combine(::testing::Values(&aom_comp_mask_pred_ssse3),
- ::testing::ValuesIn(kValidBlockSize)));
-#endif
-
-#if HAVE_AVX2
-INSTANTIATE_TEST_SUITE_P(
- AVX2, AV1CompMaskVarianceTest,
- ::testing::Combine(::testing::Values(&aom_comp_mask_pred_avx2),
- ::testing::ValuesIn(kValidBlockSize)));
-#endif
-
-#ifndef aom_comp_mask_pred
-// can't run this test if aom_comp_mask_pred is defined to aom_comp_mask_pred_c
-class AV1CompMaskUpVarianceTest : public AV1CompMaskVarianceTest {
- public:
- ~AV1CompMaskUpVarianceTest();
-
- protected:
- void RunCheckOutput(comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv);
- void RunSpeedTest(comp_mask_pred_func test_impl, BLOCK_SIZE bsize,
- int havSub);
-};
-
-AV1CompMaskUpVarianceTest::~AV1CompMaskUpVarianceTest() {}
-
-void AV1CompMaskUpVarianceTest::RunCheckOutput(comp_mask_pred_func test_impl,
- BLOCK_SIZE bsize, int inv) {
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
- int subpel_search;
- for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
- ++subpel_search) {
- // loop through subx and suby
- for (int sub = 0; sub < 8 * 8; ++sub) {
- int subx = sub & 0x7;
- int suby = (sub >> 3);
- for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
- const uint8_t *mask =
- av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- // ref
- aom_comp_mask_upsampled_pred_c(
- nullptr, nullptr, 0, 0, nullptr, comp_pred1_, pred_, w, h, subx,
- suby, ref_, MAX_SB_SIZE, mask, w, inv, subpel_search);
-
- aom_comp_mask_pred = test_impl; // test
- aom_comp_mask_upsampled_pred(nullptr, nullptr, 0, 0, nullptr,
- comp_pred2_, pred_, w, h, subx, suby, ref_,
- MAX_SB_SIZE, mask, w, inv, subpel_search);
- ASSERT_EQ(CheckResult(w, h), true)
- << " wedge " << wedge_index << " inv " << inv << "sub (" << subx
- << "," << suby << ")";
- }
- }
- }
-}
-
-void AV1CompMaskUpVarianceTest::RunSpeedTest(comp_mask_pred_func test_impl,
- BLOCK_SIZE bsize, int havSub) {
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int subx = havSub ? 3 : 0;
- const int suby = havSub ? 4 : 0;
- const int wedge_types = get_wedge_types_lookup(bsize);
- int wedge_index = wedge_types / 2;
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- const int num_loops = 1000000000 / (w + h);
- comp_mask_pred_func funcs[2] = { &aom_comp_mask_pred_c, test_impl };
- double elapsed_time[2] = { 0 };
- int subpel_search = USE_8_TAPS; // set to USE_4_TAPS to test 4-tap filter.
- for (int i = 0; i < 2; ++i) {
- aom_usec_timer timer;
- aom_usec_timer_start(&timer);
- aom_comp_mask_pred = funcs[i];
- for (int j = 0; j < num_loops; ++j) {
- aom_comp_mask_upsampled_pred(nullptr, nullptr, 0, 0, nullptr, comp_pred1_,
- pred_, w, h, subx, suby, ref_, MAX_SB_SIZE,
- mask, w, 0, subpel_search);
- }
- aom_usec_timer_mark(&timer);
- double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
- elapsed_time[i] = 1000.0 * time / num_loops;
- }
- printf("CompMaskUp[%d] %3dx%-3d:%7.2f/%7.2fns", havSub, w, h, elapsed_time[0],
- elapsed_time[1]);
- printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
-}
-
-TEST_P(AV1CompMaskUpVarianceTest, CheckOutput) {
- // inv mask = 0, 1
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-TEST_P(AV1CompMaskUpVarianceTest, DISABLED_Speed) {
- RunSpeedTest(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-#if HAVE_SSSE3
-INSTANTIATE_TEST_SUITE_P(
- SSSE3, AV1CompMaskUpVarianceTest,
- ::testing::Combine(::testing::Values(&aom_comp_mask_pred_ssse3),
- ::testing::ValuesIn(kValidBlockSize)));
-#endif
-
-#if HAVE_AVX2
-INSTANTIATE_TEST_SUITE_P(
- AVX2, AV1CompMaskUpVarianceTest,
- ::testing::Combine(::testing::Values(&aom_comp_mask_pred_avx2),
- ::testing::ValuesIn(kValidBlockSize)));
-#endif
-
-#endif // ifndef aom_comp_mask_pred
-
-#if CONFIG_AV1_HIGHBITDEPTH
-typedef void (*highbd_comp_mask_pred_func)(uint8_t *comp_pred8,
- const uint8_t *pred8, int width,
- int height, const uint8_t *ref8,
- int ref_stride, const uint8_t *mask,
- int mask_stride, int invert_mask);
-
-typedef std::tuple<highbd_comp_mask_pred_func, BLOCK_SIZE, int>
- HighbdCompMaskPredParam;
-
-class AV1HighbdCompMaskVarianceTest
- : public ::testing::TestWithParam<HighbdCompMaskPredParam> {
- public:
- ~AV1HighbdCompMaskVarianceTest();
- void SetUp();
-
- void TearDown();
-
- protected:
- void RunCheckOutput(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize,
- int inv);
- void RunSpeedTest(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize);
- bool CheckResult(int width, int height) {
- for (int y = 0; y < height; ++y) {
- for (int x = 0; x < width; ++x) {
- const int idx = y * width + x;
- if (comp_pred1_[idx] != comp_pred2_[idx]) {
- printf("%dx%d mismatch @%d(%d,%d) ", width, height, idx, y, x);
- printf("%d != %d ", comp_pred1_[idx], comp_pred2_[idx]);
- return false;
- }
- }
- }
- return true;
- }
-
- libaom_test::ACMRandom rnd_;
- uint16_t *comp_pred1_;
- uint16_t *comp_pred2_;
- uint16_t *pred_;
- uint16_t *ref_buffer_;
- uint16_t *ref_;
-};
-GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(AV1HighbdCompMaskVarianceTest);
-
-AV1HighbdCompMaskVarianceTest::~AV1HighbdCompMaskVarianceTest() {}
-
-void AV1HighbdCompMaskVarianceTest::SetUp() {
- rnd_.Reset(libaom_test::ACMRandom::DeterministicSeed());
- av1_init_wedge_masks();
-
- comp_pred1_ =
- (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*comp_pred1_));
- ASSERT_NE(comp_pred1_, nullptr);
- comp_pred2_ =
- (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*comp_pred2_));
- ASSERT_NE(comp_pred2_, nullptr);
- pred_ = (uint16_t *)aom_memalign(16, MAX_SB_SQUARE * sizeof(*pred_));
- ASSERT_NE(pred_, nullptr);
- ref_buffer_ = (uint16_t *)aom_memalign(
- 16, (MAX_SB_SQUARE + (8 * MAX_SB_SIZE)) * sizeof(*ref_buffer_));
- ASSERT_NE(ref_buffer_, nullptr);
- ref_ = ref_buffer_ + (8 * MAX_SB_SIZE);
-}
-
-void AV1HighbdCompMaskVarianceTest::TearDown() {
- aom_free(comp_pred1_);
- aom_free(comp_pred2_);
- aom_free(pred_);
- aom_free(ref_buffer_);
-}
-
-void AV1HighbdCompMaskVarianceTest::RunCheckOutput(
- highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv) {
- int bd_ = GET_PARAM(2);
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
-
- for (int i = 0; i < MAX_SB_SQUARE; ++i) {
- pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
- for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
- ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
-
- for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- aom_highbd_comp_mask_pred_c(
- CONVERT_TO_BYTEPTR(comp_pred1_), CONVERT_TO_BYTEPTR(pred_), w, h,
- CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, inv);
-
- test_impl(CONVERT_TO_BYTEPTR(comp_pred2_), CONVERT_TO_BYTEPTR(pred_), w, h,
- CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, inv);
-
- ASSERT_EQ(CheckResult(w, h), true)
- << " wedge " << wedge_index << " inv " << inv;
- }
-}
-
-void AV1HighbdCompMaskVarianceTest::RunSpeedTest(
- highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize) {
- int bd_ = GET_PARAM(2);
-
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
- int wedge_index = wedge_types / 2;
-
- for (int i = 0; i < MAX_SB_SQUARE; ++i) {
- pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
- for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
- ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
-
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
- const int num_loops = 1000000000 / (w + h);
-
- highbd_comp_mask_pred_func funcs[2] = { aom_highbd_comp_mask_pred_c,
- test_impl };
- double elapsed_time[2] = { 0 };
- for (int i = 0; i < 2; ++i) {
- aom_usec_timer timer;
- aom_usec_timer_start(&timer);
- highbd_comp_mask_pred_func func = funcs[i];
- for (int j = 0; j < num_loops; ++j) {
- func(CONVERT_TO_BYTEPTR(comp_pred1_), CONVERT_TO_BYTEPTR(pred_), w, h,
- CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE, mask, w, 0);
- }
- aom_usec_timer_mark(&timer);
- double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
- elapsed_time[i] = 1000.0 * time / num_loops;
- }
- printf("compMask %3dx%-3d: %7.2f/%7.2fns", w, h, elapsed_time[0],
- elapsed_time[1]);
- printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
-}
-
-TEST_P(AV1HighbdCompMaskVarianceTest, CheckOutput) {
- // inv = 0, 1
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-TEST_P(AV1HighbdCompMaskVarianceTest, DISABLED_Speed) {
- RunSpeedTest(GET_PARAM(0), GET_PARAM(1));
-}
-
-#if HAVE_AVX2
-INSTANTIATE_TEST_SUITE_P(
- AVX2, AV1HighbdCompMaskVarianceTest,
- ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_avx2),
- ::testing::ValuesIn(kValidBlockSize),
- ::testing::Range(8, 13, 2)));
-#endif
-
-#if HAVE_SSE2
-INSTANTIATE_TEST_SUITE_P(
- SSE2, AV1HighbdCompMaskVarianceTest,
- ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_sse2),
- ::testing::ValuesIn(kValidBlockSize),
- ::testing::Range(8, 13, 2)));
-#endif
-
-#ifndef aom_highbd_comp_mask_pred
-// can't run this test if aom_highbd_comp_mask_pred is defined to
-// aom_highbd_comp_mask_pred_c
-class AV1HighbdCompMaskUpVarianceTest : public AV1HighbdCompMaskVarianceTest {
- public:
- ~AV1HighbdCompMaskUpVarianceTest();
-
- protected:
- void RunCheckOutput(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize,
- int inv);
- void RunSpeedTest(highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize,
- int havSub);
-};
-
-AV1HighbdCompMaskUpVarianceTest::~AV1HighbdCompMaskUpVarianceTest() {}
-
-void AV1HighbdCompMaskUpVarianceTest::RunCheckOutput(
- highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int inv) {
- (void)test_impl;
- int bd_ = GET_PARAM(2);
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int wedge_types = get_wedge_types_lookup(bsize);
-
- for (int i = 0; i < MAX_SB_SQUARE; ++i) {
- pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
- for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
- ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
-
- int subpel_search;
- for (subpel_search = 1; subpel_search <= 2; ++subpel_search) {
- // loop through subx and suby
- for (int sub = 0; sub < 8 * 8; ++sub) {
- int subx = sub & 0x7;
- int suby = (sub >> 3);
- for (int wedge_index = 0; wedge_index < wedge_types; ++wedge_index) {
- const uint8_t *mask =
- av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- // ref
- aom_highbd_upsampled_pred_c(nullptr, nullptr, 0, 0, nullptr,
- CONVERT_TO_BYTEPTR(comp_pred1_), w, h, subx,
- suby, CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE,
- bd_, subpel_search);
-
- aom_highbd_comp_mask_pred_c(
- CONVERT_TO_BYTEPTR(comp_pred1_), CONVERT_TO_BYTEPTR(pred_), w, h,
- CONVERT_TO_BYTEPTR(comp_pred1_), w, mask, w, inv);
-
- // test
- aom_highbd_upsampled_pred(nullptr, nullptr, 0, 0, nullptr,
- CONVERT_TO_BYTEPTR(comp_pred2_), w, h, subx,
- suby, CONVERT_TO_BYTEPTR(ref_), MAX_SB_SIZE,
- bd_, subpel_search);
-
- aom_highbd_comp_mask_pred(
- CONVERT_TO_BYTEPTR(comp_pred2_), CONVERT_TO_BYTEPTR(pred_), w, h,
- CONVERT_TO_BYTEPTR(comp_pred2_), w, mask, w, inv);
-
- ASSERT_EQ(CheckResult(w, h), true)
- << " wedge " << wedge_index << " inv " << inv << "sub (" << subx
- << "," << suby << ")";
- }
- }
- }
-}
-
-void AV1HighbdCompMaskUpVarianceTest::RunSpeedTest(
- highbd_comp_mask_pred_func test_impl, BLOCK_SIZE bsize, int havSub) {
- int bd_ = GET_PARAM(2);
- const int w = block_size_wide[bsize];
- const int h = block_size_high[bsize];
- const int subx = havSub ? 3 : 0;
- const int suby = havSub ? 4 : 0;
- const int wedge_types = get_wedge_types_lookup(bsize);
- int wedge_index = wedge_types / 2;
- const uint8_t *mask = av1_get_contiguous_soft_mask(wedge_index, 1, bsize);
-
- for (int i = 0; i < MAX_SB_SQUARE; ++i) {
- pred_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
- for (int i = 0; i < MAX_SB_SQUARE + (8 * MAX_SB_SIZE); ++i) {
- ref_buffer_[i] = rnd_.Rand16() & ((1 << bd_) - 1);
- }
-
- const int num_loops = 1000000000 / (w + h);
- highbd_comp_mask_pred_func funcs[2] = { &aom_highbd_comp_mask_pred_c,
- test_impl };
- double elapsed_time[2] = { 0 };
- for (int i = 0; i < 2; ++i) {
- aom_usec_timer timer;
- aom_usec_timer_start(&timer);
- aom_highbd_comp_mask_pred = funcs[i];
- int subpel_search = 2; // set to 1 to test 4-tap filter.
- for (int j = 0; j < num_loops; ++j) {
- aom_highbd_comp_mask_upsampled_pred(
- nullptr, nullptr, 0, 0, nullptr, CONVERT_TO_BYTEPTR(comp_pred1_),
- CONVERT_TO_BYTEPTR(pred_), w, h, subx, suby, CONVERT_TO_BYTEPTR(ref_),
- MAX_SB_SIZE, mask, w, 0, bd_, subpel_search);
- }
- aom_usec_timer_mark(&timer);
- double time = static_cast<double>(aom_usec_timer_elapsed(&timer));
- elapsed_time[i] = 1000.0 * time / num_loops;
- }
- printf("CompMaskUp[%d] %3dx%-3d:%7.2f/%7.2fns", havSub, w, h, elapsed_time[0],
- elapsed_time[1]);
- printf("(%3.2f)\n", elapsed_time[0] / elapsed_time[1]);
-}
-
-TEST_P(AV1HighbdCompMaskUpVarianceTest, CheckOutput) {
- // inv mask = 0, 1
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 0);
- RunCheckOutput(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-TEST_P(AV1HighbdCompMaskUpVarianceTest, DISABLED_Speed) {
- RunSpeedTest(GET_PARAM(0), GET_PARAM(1), 1);
-}
-
-#if HAVE_AVX2
-INSTANTIATE_TEST_SUITE_P(
- AVX2, AV1HighbdCompMaskUpVarianceTest,
- ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_avx2),
- ::testing::ValuesIn(kValidBlockSize),
- ::testing::Range(8, 13, 2)));
-#endif
-
-#if HAVE_SSE2
-INSTANTIATE_TEST_SUITE_P(
- SSE2, AV1HighbdCompMaskUpVarianceTest,
- ::testing::Combine(::testing::Values(&aom_highbd_comp_mask_pred_sse2),
- ::testing::ValuesIn(kValidBlockSize),
- ::testing::Range(8, 13, 2)));
-#endif
-
-#endif // ifndef aom_highbd_comp_mask_pred
-#endif // CONFIG_AV1_HIGHBITDEPTH
-} // namespace AV1CompMaskVariance
diff --git a/test/convolve_test.cc b/test/convolve_test.cc
index d5232ee..8aed171 100644
--- a/test/convolve_test.cc
+++ b/test/convolve_test.cc
@@ -31,6 +31,10 @@
static const unsigned int kMaxDimension = MAX_SB_SIZE;
+static const int16_t kInvalidFilter[8] = {};
+static const int kNumFilterBanks = SWITCHABLE_FILTERS;
+static const int kNumFilters = 16;
+
typedef void (*ConvolveFunc)(const uint8_t *src, ptrdiff_t src_stride,
uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *filter_x, int filter_x_stride,
@@ -265,7 +269,7 @@
output_width, output_height);
}
-class ConvolveTest : public ::testing::TestWithParam<ConvolveParam> {
+class ConvolveTestBase : public ::testing::TestWithParam<ConvolveParam> {
public:
static void SetUpTestSuite() {
// Force input_ to be unaligned, output to be 16 byte aligned.
@@ -462,6 +466,202 @@
}
}
+ void MatchesReferenceSubpixelFilter() {
+ uint8_t *const in = input();
+ uint8_t *const out = output();
+ uint8_t *ref;
+ if (UUT_->use_highbd_ == 0) {
+ ref = ref8_;
+ } else {
+ ref = CONVERT_TO_BYTEPTR(ref16_);
+ }
+ int subpel_search;
+ for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
+ ++subpel_search) {
+ for (int filter_bank = 0; filter_bank < kNumFilterBanks; ++filter_bank) {
+ const InterpFilter filter = (InterpFilter)filter_bank;
+ const InterpKernel *filters =
+ (const InterpKernel *)av1_get_interp_filter_kernel(filter,
+ subpel_search);
+ for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
+ for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
+ wrapper_filter_block2d_8_c(in, kInputStride, filters[filter_x],
+ filters[filter_y], ref, kOutputStride,
+ Width(), Height());
+
+ if (filter_x && filter_y)
+ continue;
+ else if (filter_y)
+ UUT_->v8_(in, kInputStride, out, kOutputStride, kInvalidFilter,
+ 16, filters[filter_y], 16, Width(), Height());
+ else if (filter_x)
+ API_REGISTER_STATE_CHECK(UUT_->h8_(
+ in, kInputStride, out, kOutputStride, filters[filter_x], 16,
+ kInvalidFilter, 16, Width(), Height()));
+ else
+ continue;
+
+ CheckGuardBlocks();
+
+ for (int y = 0; y < Height(); ++y)
+ for (int x = 0; x < Width(); ++x)
+ ASSERT_EQ(lookup(ref, y * kOutputStride + x),
+ lookup(out, y * kOutputStride + x))
+ << "mismatch at (" << x << "," << y << "), "
+ << "filters (" << filter_bank << "," << filter_x << ","
+ << filter_y << ")";
+ }
+ }
+ }
+ }
+ }
+
+ void FilterExtremes() {
+ uint8_t *const in = input();
+ uint8_t *const out = output();
+ uint8_t *ref;
+ if (UUT_->use_highbd_ == 0) {
+ ref = ref8_;
+ } else {
+ ref = CONVERT_TO_BYTEPTR(ref16_);
+ }
+
+ // Populate ref and out with some random data
+ ::libaom_test::ACMRandom prng;
+ for (int y = 0; y < Height(); ++y) {
+ for (int x = 0; x < Width(); ++x) {
+ uint16_t r;
+ if (UUT_->use_highbd_ == 0 || UUT_->use_highbd_ == 8) {
+ r = prng.Rand8Extremes();
+ } else {
+ r = prng.Rand16() & mask_;
+ }
+ assign_val(out, y * kOutputStride + x, r);
+ assign_val(ref, y * kOutputStride + x, r);
+ }
+ }
+
+ for (int axis = 0; axis < 2; axis++) {
+ int seed_val = 0;
+ while (seed_val < 256) {
+ for (int y = 0; y < 8; ++y) {
+ for (int x = 0; x < 8; ++x) {
+ assign_val(in, y * kOutputStride + x - SUBPEL_TAPS / 2 + 1,
+ ((seed_val >> (axis ? y : x)) & 1) * mask_);
+ if (axis) seed_val++;
+ }
+ if (axis)
+ seed_val -= 8;
+ else
+ seed_val++;
+ }
+ if (axis) seed_val += 8;
+ int subpel_search;
+ for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
+ ++subpel_search) {
+ for (int filter_bank = 0; filter_bank < kNumFilterBanks;
+ ++filter_bank) {
+ const InterpFilter filter = (InterpFilter)filter_bank;
+ const InterpKernel *filters =
+ (const InterpKernel *)av1_get_interp_filter_kernel(
+ filter, subpel_search);
+ for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
+ for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
+ wrapper_filter_block2d_8_c(in, kInputStride, filters[filter_x],
+ filters[filter_y], ref,
+ kOutputStride, Width(), Height());
+ if (filter_x && filter_y)
+ continue;
+ else if (filter_y)
+ API_REGISTER_STATE_CHECK(UUT_->v8_(
+ in, kInputStride, out, kOutputStride, kInvalidFilter, 16,
+ filters[filter_y], 16, Width(), Height()));
+ else if (filter_x)
+ API_REGISTER_STATE_CHECK(UUT_->h8_(
+ in, kInputStride, out, kOutputStride, filters[filter_x],
+ 16, kInvalidFilter, 16, Width(), Height()));
+ else
+ continue;
+
+ for (int y = 0; y < Height(); ++y)
+ for (int x = 0; x < Width(); ++x)
+ ASSERT_EQ(lookup(ref, y * kOutputStride + x),
+ lookup(out, y * kOutputStride + x))
+ << "mismatch at (" << x << "," << y << "), "
+ << "filters (" << filter_bank << "," << filter_x << ","
+ << filter_y << ")";
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ void SpeedTest() {
+ uint8_t *const in = input();
+ uint8_t *const out = output();
+ uint8_t *ref;
+ if (UUT_->use_highbd_ == 0) {
+ ref = ref8_;
+ } else {
+ ref = CONVERT_TO_BYTEPTR(ref16_);
+ }
+
+ // Populate ref and out with some random data
+ ::libaom_test::ACMRandom prng;
+ for (int y = 0; y < Height(); ++y) {
+ for (int x = 0; x < Width(); ++x) {
+ uint16_t r;
+ if (UUT_->use_highbd_ == 0 || UUT_->use_highbd_ == 8) {
+ r = prng.Rand8Extremes();
+ } else {
+ r = prng.Rand16() & mask_;
+ }
+ assign_val(out, y * kOutputStride + x, r);
+ assign_val(ref, y * kOutputStride + x, r);
+ }
+ }
+
+ InterpFilter filter = (InterpFilter)1;
+ const InterpKernel *filters =
+ (const InterpKernel *)av1_get_interp_filter_kernel(filter, USE_8_TAPS);
+ wrapper_filter_average_block2d_8_c(in, kInputStride, filters[1], filters[1],
+ out, kOutputStride, Width(), Height());
+
+ aom_usec_timer timer;
+ int tests_num = 1000;
+
+ aom_usec_timer_start(&timer);
+ while (tests_num > 0) {
+ for (int filter_bank = 0; filter_bank < kNumFilterBanks; ++filter_bank) {
+ filter = (InterpFilter)filter_bank;
+ filters = (const InterpKernel *)av1_get_interp_filter_kernel(
+ filter, USE_8_TAPS);
+ for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
+ for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
+ if (filter_x && filter_y) continue;
+ if (filter_y)
+ API_REGISTER_STATE_CHECK(UUT_->v8_(
+ in, kInputStride, out, kOutputStride, kInvalidFilter, 16,
+ filters[filter_y], 16, Width(), Height()));
+ else if (filter_x)
+ API_REGISTER_STATE_CHECK(UUT_->h8_(
+ in, kInputStride, out, kOutputStride, filters[filter_x], 16,
+ kInvalidFilter, 16, Width(), Height()));
+ }
+ }
+ }
+ tests_num--;
+ }
+ aom_usec_timer_mark(&timer);
+
+ const int elapsed_time =
+ static_cast<int>(aom_usec_timer_elapsed(&timer) / 1000);
+ printf("%dx%d (bitdepth %d) time: %5d ms\n", Width(), Height(),
+ UUT_->use_highbd_, elapsed_time);
+ }
+
const ConvolveFunctions *UUT_;
static uint8_t *input_;
static uint8_t *ref8_;
@@ -474,21 +674,20 @@
int mask_;
};
-uint8_t *ConvolveTest::input_ = nullptr;
-uint8_t *ConvolveTest::ref8_ = nullptr;
-uint8_t *ConvolveTest::output_ = nullptr;
-uint8_t *ConvolveTest::output_ref_ = nullptr;
-uint16_t *ConvolveTest::input16_ = nullptr;
-uint16_t *ConvolveTest::ref16_ = nullptr;
-uint16_t *ConvolveTest::output16_ = nullptr;
-uint16_t *ConvolveTest::output16_ref_ = nullptr;
+uint8_t *ConvolveTestBase::input_ = nullptr;
+uint8_t *ConvolveTestBase::ref8_ = nullptr;
+uint8_t *ConvolveTestBase::output_ = nullptr;
+uint8_t *ConvolveTestBase::output_ref_ = nullptr;
+uint16_t *ConvolveTestBase::input16_ = nullptr;
+uint16_t *ConvolveTestBase::ref16_ = nullptr;
+uint16_t *ConvolveTestBase::output16_ = nullptr;
+uint16_t *ConvolveTestBase::output16_ref_ = nullptr;
-TEST_P(ConvolveTest, GuardBlocks) { CheckGuardBlocks(); }
+using LowbdConvolveTest = ConvolveTestBase;
-const int kNumFilterBanks = SWITCHABLE_FILTERS;
-const int kNumFilters = 16;
+TEST_P(LowbdConvolveTest, GuardBlocks) { CheckGuardBlocks(); }
-TEST(ConvolveTest, FiltersWontSaturateWhenAddedPairwise) {
+void FiltersWontSaturateWhenAddedPairwise() {
int subpel_search;
for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
++subpel_search) {
@@ -515,205 +714,17 @@
}
}
-const int16_t kInvalidFilter[8] = { 0 };
-
-TEST_P(ConvolveTest, MatchesReferenceSubpixelFilter) {
- uint8_t *const in = input();
- uint8_t *const out = output();
- uint8_t *ref;
- if (UUT_->use_highbd_ == 0) {
- ref = ref8_;
- } else {
- ref = CONVERT_TO_BYTEPTR(ref16_);
- }
- int subpel_search;
- for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
- ++subpel_search) {
- for (int filter_bank = 0; filter_bank < kNumFilterBanks; ++filter_bank) {
- const InterpFilter filter = (InterpFilter)filter_bank;
- const InterpKernel *filters =
- (const InterpKernel *)av1_get_interp_filter_kernel(filter,
- subpel_search);
- for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
- for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
- wrapper_filter_block2d_8_c(in, kInputStride, filters[filter_x],
- filters[filter_y], ref, kOutputStride,
- Width(), Height());
-
- if (filter_x && filter_y)
- continue;
- else if (filter_y)
- API_REGISTER_STATE_CHECK(
- UUT_->v8_(in, kInputStride, out, kOutputStride, kInvalidFilter,
- 16, filters[filter_y], 16, Width(), Height()));
- else if (filter_x)
- API_REGISTER_STATE_CHECK(UUT_->h8_(
- in, kInputStride, out, kOutputStride, filters[filter_x], 16,
- kInvalidFilter, 16, Width(), Height()));
- else
- continue;
-
- CheckGuardBlocks();
-
- for (int y = 0; y < Height(); ++y)
- for (int x = 0; x < Width(); ++x)
- ASSERT_EQ(lookup(ref, y * kOutputStride + x),
- lookup(out, y * kOutputStride + x))
- << "mismatch at (" << x << "," << y << "), "
- << "filters (" << filter_bank << "," << filter_x << ","
- << filter_y << ")";
- }
- }
- }
- }
+TEST(LowbdConvolveTest, FiltersWontSaturateWhenAddedPairwise) {
+ FiltersWontSaturateWhenAddedPairwise();
}
-TEST_P(ConvolveTest, FilterExtremes) {
- uint8_t *const in = input();
- uint8_t *const out = output();
- uint8_t *ref;
- if (UUT_->use_highbd_ == 0) {
- ref = ref8_;
- } else {
- ref = CONVERT_TO_BYTEPTR(ref16_);
- }
-
- // Populate ref and out with some random data
- ::libaom_test::ACMRandom prng;
- for (int y = 0; y < Height(); ++y) {
- for (int x = 0; x < Width(); ++x) {
- uint16_t r;
- if (UUT_->use_highbd_ == 0 || UUT_->use_highbd_ == 8) {
- r = prng.Rand8Extremes();
- } else {
- r = prng.Rand16() & mask_;
- }
- assign_val(out, y * kOutputStride + x, r);
- assign_val(ref, y * kOutputStride + x, r);
- }
- }
-
- for (int axis = 0; axis < 2; axis++) {
- int seed_val = 0;
- while (seed_val < 256) {
- for (int y = 0; y < 8; ++y) {
- for (int x = 0; x < 8; ++x) {
- assign_val(in, y * kOutputStride + x - SUBPEL_TAPS / 2 + 1,
- ((seed_val >> (axis ? y : x)) & 1) * mask_);
- if (axis) seed_val++;
- }
- if (axis)
- seed_val -= 8;
- else
- seed_val++;
- }
- if (axis) seed_val += 8;
- int subpel_search;
- for (subpel_search = USE_4_TAPS; subpel_search <= USE_8_TAPS;
- ++subpel_search) {
- for (int filter_bank = 0; filter_bank < kNumFilterBanks;
- ++filter_bank) {
- const InterpFilter filter = (InterpFilter)filter_bank;
- const InterpKernel *filters =
- (const InterpKernel *)av1_get_interp_filter_kernel(filter,
- subpel_search);
- for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
- for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
- wrapper_filter_block2d_8_c(in, kInputStride, filters[filter_x],
- filters[filter_y], ref, kOutputStride,
- Width(), Height());
- if (filter_x && filter_y)
- continue;
- else if (filter_y)
- API_REGISTER_STATE_CHECK(UUT_->v8_(
- in, kInputStride, out, kOutputStride, kInvalidFilter, 16,
- filters[filter_y], 16, Width(), Height()));
- else if (filter_x)
- API_REGISTER_STATE_CHECK(UUT_->h8_(
- in, kInputStride, out, kOutputStride, filters[filter_x], 16,
- kInvalidFilter, 16, Width(), Height()));
- else
- continue;
-
- for (int y = 0; y < Height(); ++y)
- for (int x = 0; x < Width(); ++x)
- ASSERT_EQ(lookup(ref, y * kOutputStride + x),
- lookup(out, y * kOutputStride + x))
- << "mismatch at (" << x << "," << y << "), "
- << "filters (" << filter_bank << "," << filter_x << ","
- << filter_y << ")";
- }
- }
- }
- }
- }
- }
+TEST_P(LowbdConvolveTest, MatchesReferenceSubpixelFilter) {
+ MatchesReferenceSubpixelFilter();
}
-TEST_P(ConvolveTest, DISABLED_Speed) {
- uint8_t *const in = input();
- uint8_t *const out = output();
- uint8_t *ref;
- if (UUT_->use_highbd_ == 0) {
- ref = ref8_;
- } else {
- ref = CONVERT_TO_BYTEPTR(ref16_);
- }
+TEST_P(LowbdConvolveTest, FilterExtremes) { FilterExtremes(); }
- // Populate ref and out with some random data
- ::libaom_test::ACMRandom prng;
- for (int y = 0; y < Height(); ++y) {
- for (int x = 0; x < Width(); ++x) {
- uint16_t r;
- if (UUT_->use_highbd_ == 0 || UUT_->use_highbd_ == 8) {
- r = prng.Rand8Extremes();
- } else {
- r = prng.Rand16() & mask_;
- }
- assign_val(out, y * kOutputStride + x, r);
- assign_val(ref, y * kOutputStride + x, r);
- }
- }
-
- const InterpFilter filter = (InterpFilter)1;
- const InterpKernel *filters =
- (const InterpKernel *)av1_get_interp_filter_kernel(filter, USE_8_TAPS);
- wrapper_filter_average_block2d_8_c(in, kInputStride, filters[1], filters[1],
- out, kOutputStride, Width(), Height());
-
- aom_usec_timer timer;
- int tests_num = 1000;
-
- aom_usec_timer_start(&timer);
- while (tests_num > 0) {
- for (int filter_bank = 0; filter_bank < kNumFilterBanks; ++filter_bank) {
- const InterpFilter filter = (InterpFilter)filter_bank;
- const InterpKernel *filters =
- (const InterpKernel *)av1_get_interp_filter_kernel(filter,
- USE_8_TAPS);
- for (int filter_x = 0; filter_x < kNumFilters; ++filter_x) {
- for (int filter_y = 0; filter_y < kNumFilters; ++filter_y) {
- if (filter_x && filter_y) continue;
- if (filter_y)
- API_REGISTER_STATE_CHECK(
- UUT_->v8_(in, kInputStride, out, kOutputStride, kInvalidFilter,
- 16, filters[filter_y], 16, Width(), Height()));
- else if (filter_x)
- API_REGISTER_STATE_CHECK(UUT_->h8_(
- in, kInputStride, out, kOutputStride, filters[filter_x], 16,
- kInvalidFilter, 16, Width(), Height()));
- }
- }
- }
- tests_num--;
- }
- aom_usec_timer_mark(&timer);
-
- const int elapsed_time =
- static_cast<int>(aom_usec_timer_elapsed(&timer) / 1000);
- printf("%dx%d (bitdepth %d) time: %5d ms\n", Width(), Height(),
- UUT_->use_highbd_, elapsed_time);
-}
+TEST_P(LowbdConvolveTest, DISABLED_Speed) { SpeedTest(); }
using std::make_tuple;
@@ -727,14 +738,14 @@
aom_highbd_##func(src, src_stride, dst, dst_stride, filter_x, \
filter_x_stride, filter_y, filter_y_stride, w, h, bd); \
}
-#if HAVE_SSE2 && ARCH_X86_64
+#if HAVE_SSE2 && AOM_ARCH_X86_64
WRAP(convolve8_horiz_sse2, 8)
WRAP(convolve8_vert_sse2, 8)
WRAP(convolve8_horiz_sse2, 10)
WRAP(convolve8_vert_sse2, 10)
WRAP(convolve8_horiz_sse2, 12)
WRAP(convolve8_vert_sse2, 12)
-#endif // HAVE_SSE2 && ARCH_X86_64
+#endif // HAVE_SSE2 && AOM_ARCH_X86_64
WRAP(convolve8_horiz_c, 8)
WRAP(convolve8_vert_c, 8)
@@ -758,25 +769,45 @@
#undef WRAP
#if CONFIG_AV1_HIGHBITDEPTH
+
+using HighbdConvolveTest = ConvolveTestBase;
+
+TEST_P(HighbdConvolveTest, GuardBlocks) { CheckGuardBlocks(); }
+
+TEST(HighbdConvolveTest, FiltersWontSaturateWhenAddedPairwise) {
+ FiltersWontSaturateWhenAddedPairwise();
+}
+
+TEST_P(HighbdConvolveTest, MatchesReferenceSubpixelFilter) {
+ MatchesReferenceSubpixelFilter();
+}
+
+TEST_P(HighbdConvolveTest, FilterExtremes) { FilterExtremes(); }
+
+TEST_P(HighbdConvolveTest, DISABLED_Speed) { SpeedTest(); }
+
const ConvolveFunctions wrap_convolve8_c(wrap_convolve8_horiz_c_8,
wrap_convolve8_vert_c_8, 8);
const ConvolveFunctions wrap_convolve10_c(wrap_convolve8_horiz_c_10,
wrap_convolve8_vert_c_10, 10);
const ConvolveFunctions wrap_convolve12_c(wrap_convolve8_horiz_c_12,
wrap_convolve8_vert_c_12, 12);
-const ConvolveParam kArrayConvolve_c[] = { ALL_SIZES(wrap_convolve8_c),
- ALL_SIZES(wrap_convolve10_c),
- ALL_SIZES(wrap_convolve12_c) };
-#else
+const ConvolveParam kArrayHighbdConvolve_c[] = { ALL_SIZES(wrap_convolve8_c),
+ ALL_SIZES(wrap_convolve10_c),
+ ALL_SIZES(wrap_convolve12_c) };
+
+INSTANTIATE_TEST_SUITE_P(C, HighbdConvolveTest,
+ ::testing::ValuesIn(kArrayHighbdConvolve_c));
+#endif // CONFIG_AV1_HIGHBITDEPTH
+
const ConvolveFunctions convolve8_c(aom_convolve8_horiz_c, aom_convolve8_vert_c,
0);
const ConvolveParam kArrayConvolve_c[] = { ALL_SIZES(convolve8_c) };
-#endif
-INSTANTIATE_TEST_SUITE_P(C, ConvolveTest,
+INSTANTIATE_TEST_SUITE_P(C, LowbdConvolveTest,
::testing::ValuesIn(kArrayConvolve_c));
-#if HAVE_SSE2 && ARCH_X86_64
+#if HAVE_SSE2 && AOM_ARCH_X86_64
#if CONFIG_AV1_HIGHBITDEPTH
const ConvolveFunctions wrap_convolve8_sse2(wrap_convolve8_horiz_sse2_8,
wrap_convolve8_vert_sse2_8, 8);
@@ -784,15 +815,19 @@
wrap_convolve8_vert_sse2_10, 10);
const ConvolveFunctions wrap_convolve12_sse2(wrap_convolve8_horiz_sse2_12,
wrap_convolve8_vert_sse2_12, 12);
-const ConvolveParam kArrayConvolve_sse2[] = { ALL_SIZES(wrap_convolve8_sse2),
- ALL_SIZES(wrap_convolve10_sse2),
- ALL_SIZES(wrap_convolve12_sse2) };
-#else
+const ConvolveParam kArrayHighbdConvolve_sse2[] = {
+ ALL_SIZES(wrap_convolve8_sse2), ALL_SIZES(wrap_convolve10_sse2),
+ ALL_SIZES(wrap_convolve12_sse2)
+};
+
+INSTANTIATE_TEST_SUITE_P(SSE2, HighbdConvolveTest,
+ ::testing::ValuesIn(kArrayHighbdConvolve_sse2));
+#endif
const ConvolveFunctions convolve8_sse2(aom_convolve8_horiz_sse2,
aom_convolve8_vert_sse2, 0);
const ConvolveParam kArrayConvolve_sse2[] = { ALL_SIZES(convolve8_sse2) };
-#endif
-INSTANTIATE_TEST_SUITE_P(SSE2, ConvolveTest,
+
+INSTANTIATE_TEST_SUITE_P(SSE2, LowbdConvolveTest,
::testing::ValuesIn(kArrayConvolve_sse2));
#endif
@@ -801,7 +836,8 @@
aom_convolve8_vert_ssse3, 0);
const ConvolveParam kArrayConvolve8_ssse3[] = { ALL_SIZES(convolve8_ssse3) };
-INSTANTIATE_TEST_SUITE_P(SSSE3, ConvolveTest,
+
+INSTANTIATE_TEST_SUITE_P(SSSE3, LowbdConvolveTest,
::testing::ValuesIn(kArrayConvolve8_ssse3));
#endif
@@ -813,18 +849,29 @@
wrap_convolve8_vert_avx2_10, 10);
const ConvolveFunctions wrap_convolve12_avx2(wrap_convolve8_horiz_avx2_12,
wrap_convolve8_vert_avx2_12, 12);
-const ConvolveParam kArray_Convolve8_avx2[] = {
+const ConvolveParam kArray_HighbdConvolve8_avx2[] = {
ALL_SIZES_64(wrap_convolve8_avx2), ALL_SIZES_64(wrap_convolve10_avx2),
ALL_SIZES_64(wrap_convolve12_avx2)
};
-#else
+
+INSTANTIATE_TEST_SUITE_P(AVX2, HighbdConvolveTest,
+ ::testing::ValuesIn(kArray_HighbdConvolve8_avx2));
+#endif
const ConvolveFunctions convolve8_avx2(aom_convolve8_horiz_avx2,
aom_convolve8_vert_avx2, 0);
const ConvolveParam kArray_Convolve8_avx2[] = { ALL_SIZES(convolve8_avx2) };
-#endif
-INSTANTIATE_TEST_SUITE_P(AVX2, ConvolveTest,
+INSTANTIATE_TEST_SUITE_P(AVX2, LowbdConvolveTest,
::testing::ValuesIn(kArray_Convolve8_avx2));
#endif // HAVE_AVX2
+#if HAVE_NEON
+const ConvolveFunctions convolve8_neon(aom_convolve8_horiz_neon,
+ aom_convolve8_vert_neon, 0);
+const ConvolveParam kArray_Convolve8_neon[] = { ALL_SIZES(convolve8_neon) };
+
+INSTANTIATE_TEST_SUITE_P(NEON, LowbdConvolveTest,
+ ::testing::ValuesIn(kArray_Convolve8_neon));
+#endif // HAVE_NEON
+
} // namespace
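The mechanical part of the convolve_test.cc change above is a common gtest refactor: the TEST_P bodies move into a TestWithParam base class, and each variant becomes a type alias of that base so the low- and high-bitdepth suites can be instantiated side by side instead of being switched with #if/#else. A minimal sketch of the pattern, with invented names, shown only to isolate the idea:

    #include "gtest/gtest.h"

    // Shared fixture: the parameterized test bodies live here.
    class ConvolveLikeTestBase : public ::testing::TestWithParam<int> {
     protected:
      void MatchesReference() { ASSERT_GE(GetParam(), 0); }
    };

    // Aliases give one fixture two distinct test-suite names.
    using LowbdLikeTest = ConvolveLikeTestBase;
    using HighbdLikeTest = ConvolveLikeTestBase;

    TEST_P(LowbdLikeTest, MatchesReference) { MatchesReference(); }
    TEST_P(HighbdLikeTest, MatchesReference) { MatchesReference(); }

    // Each alias gets its own parameter list, so both suites coexist in one
    // binary; previously the low- and high-bitdepth parameter arrays were
    // selected with #if/#else and only one suite could exist.
    INSTANTIATE_TEST_SUITE_P(C, LowbdLikeTest, ::testing::Values(0, 1));
    INSTANTIATE_TEST_SUITE_P(C, HighbdLikeTest, ::testing::Values(8, 10, 12));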
diff --git a/test/corner_match_test.cc b/test/corner_match_test.cc
index 673205a..93ca8ec 100644
--- a/test/corner_match_test.cc
+++ b/test/corner_match_test.cc
@@ -27,9 +27,9 @@
using libaom_test::ACMRandom;
-typedef double (*ComputeCrossCorrFunc)(unsigned char *im1, int stride1, int x1,
- int y1, unsigned char *im2, int stride2,
- int x2, int y2);
+typedef double (*ComputeCrossCorrFunc)(const unsigned char *im1, int stride1,
+ int x1, int y1, const unsigned char *im2,
+ int stride2, int x2, int y2);
using std::make_tuple;
using std::tuple;
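For context on the corner_match_test.cc hunk: a pointer to a function taking non-const buffers does not convert to a pointer type taking const buffers, so tightening the typedef forces every implementation named in the test parameters to adopt const in lockstep. A self-contained illustration with invented names:

    #include <cstdio>

    typedef double (*CrossCorrFunc)(const unsigned char *im1, int stride1,
                                    const unsigned char *im2, int stride2);

    static double corr_const(const unsigned char *im1, int stride1,
                             const unsigned char *im2, int stride2) {
      (void)stride1;
      (void)stride2;
      return (im1[0] == im2[0]) ? 1.0 : 0.0;
    }

    // static double corr_mutable(unsigned char *, int, unsigned char *, int);
    // CrossCorrFunc bad = corr_mutable;  // error: types differ in constness

    int main() {
      const unsigned char a = 1, b = 1;
      CrossCorrFunc good = corr_const;  // OK: signatures match exactly
      std::printf("%f\n", good(&a, 1, &b, 1));
      return 0;
    }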
diff --git a/test/cpu_used_firstpass_test.cc b/test/cpu_used_firstpass_test.cc
index c53db6e..cfffcd7 100644
--- a/test/cpu_used_firstpass_test.cc
+++ b/test/cpu_used_firstpass_test.cc
@@ -9,6 +9,8 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
+#include <cstdlib>
+
#include "test/codec_factory.h"
#include "test/encode_test_driver.h"
#include "test/i420_video_source.h"
@@ -84,7 +86,7 @@
first_pass_cpu_used_ = GET_PARAM(1);
if (first_pass_cpu_used_ == second_pass_cpu_used_) return;
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
- psnr_diff = abs(ref_psnr - GetAveragePsnr());
+ psnr_diff = std::abs(ref_psnr - GetAveragePsnr());
EXPECT_LT(psnr_diff, GetPsnrDiffThreshold())
<< "first pass cpu used = " << first_pass_cpu_used_
<< ", second pass cpu used = " << second_pass_cpu_used_;
diff --git a/test/datarate_test.cc b/test/datarate_test.cc
index 8fdc662..21b40d9 100644
--- a/test/datarate_test.cc
+++ b/test/datarate_test.cc
@@ -12,6 +12,7 @@
#include "config/aom_config.h"
#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
+#include "test/acm_random.h"
#include "test/codec_factory.h"
#include "test/datarate_test.h"
#include "test/encode_test_driver.h"
@@ -109,7 +110,7 @@
<< " The datarate for the file is lower than target by too much!";
ASSERT_LE(effective_datarate_, cfg_.rc_target_bitrate * 1.19)
<< " The datarate for the file is greater than target by too much!";
- ASSERT_LT(num_spikes_, 8);
+ ASSERT_LE(num_spikes_, 8);
ASSERT_LT(num_spikes_high_, 1);
}
@@ -347,7 +348,7 @@
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
ASSERT_GE(effective_datarate_, cfg_.rc_target_bitrate * 0.85)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_, cfg_.rc_target_bitrate * 1.31)
+ ASSERT_LE(effective_datarate_, cfg_.rc_target_bitrate * 1.40)
<< " The datarate for the file is greater than target by too much!";
if (last_drop > 0) {
ASSERT_LE(first_drop_, last_drop)
@@ -396,7 +397,11 @@
}
// Check basic rate targeting for CBR, for 444 input screen mode.
+#if defined(CONFIG_MAX_DECODE_PROFILE) && CONFIG_MAX_DECODE_PROFILE < 1
+TEST_P(DatarateTestLarge, DISABLED_BasicRateTargeting444CBRScreen) {
+#else
TEST_P(DatarateTestLarge, BasicRateTargeting444CBRScreen) {
+#endif
BasicRateTargeting444CBRScreenTest();
}
@@ -508,7 +513,11 @@
}
// Check basic rate targeting for CBR for 444 screen mode.
+#if defined(CONFIG_MAX_DECODE_PROFILE) && CONFIG_MAX_DECODE_PROFILE < 1
+TEST_P(DatarateTestRealtime, DISABLED_BasicRateTargeting444CBRScreen) {
+#else
TEST_P(DatarateTestRealtime, BasicRateTargeting444CBRScreen) {
+#endif
BasicRateTargeting444CBRScreenTest();
}
@@ -524,6 +533,68 @@
ChangingSpeedTest();
}
+class DatarateTestSetFrameQpRealtime
+ : public DatarateTest,
+ public ::testing::TestWithParam<const libaom_test::AV1CodecFactory *> {
+ public:
+ DatarateTestSetFrameQpRealtime() : DatarateTest(GetParam()), frame_(0) {}
+
+ protected:
+ virtual ~DatarateTestSetFrameQpRealtime() {}
+
+ virtual void SetUp() {
+ InitializeConfig(libaom_test::kRealTime);
+ ResetModel();
+ }
+
+ virtual void PreEncodeFrameHook(::libaom_test::VideoSource *video,
+ ::libaom_test::Encoder *encoder) {
+ set_cpu_used_ = 7;
+ DatarateTest::PreEncodeFrameHook(video, encoder);
+ frame_qp_ = rnd_.PseudoUniform(63);
+ encoder->Control(AV1E_SET_QUANTIZER_ONE_PASS, frame_qp_);
+ frame_++;
+ }
+
+ virtual void PostEncodeFrameHook(::libaom_test::Encoder *encoder) {
+ if (frame_ >= total_frames_) return;
+ int qp = 0;
+ encoder->Control(AOME_GET_LAST_QUANTIZER_64, &qp);
+ ASSERT_EQ(qp, frame_qp_);
+ }
+
+ protected:
+ int total_frames_;
+
+ private:
+ int frame_qp_;
+ int frame_;
+ libaom_test::ACMRandom rnd_;
+};
+
+TEST_P(DatarateTestSetFrameQpRealtime, SetFrameQpOnePass) {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_undershoot_pct = 20;
+  cfg_.rc_overshoot_pct = 20;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 50;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.rc_target_bitrate = 200;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 1;
+ cfg_.kf_max_dist = 9999;
+ cfg_.rc_dropframe_thresh = 0;
+
+ total_frames_ = 100;
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352, 288,
+ 30, 1, 0, 100);
+
+ ResetModel();
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+}
+
AV1_INSTANTIATE_TEST_SUITE(DatarateTestLarge,
::testing::Values(::libaom_test::kRealTime),
::testing::Range(5, 7), ::testing::Values(0, 3),
@@ -535,16 +606,21 @@
AV1_INSTANTIATE_TEST_SUITE(DatarateTestRealtime,
::testing::Values(::libaom_test::kRealTime),
- ::testing::Range(7, 11), ::testing::Values(0, 3),
+ ::testing::Range(7, 12), ::testing::Values(0, 3),
::testing::Values(0, 1));
AV1_INSTANTIATE_TEST_SUITE(DatarateTestFrameDropRealtime,
::testing::Values(::libaom_test::kRealTime),
- ::testing::Range(7, 11), ::testing::Values(0, 3));
+ ::testing::Range(7, 12), ::testing::Values(0, 3));
AV1_INSTANTIATE_TEST_SUITE(DatarateTestSpeedChangeRealtime,
::testing::Values(::libaom_test::kRealTime),
::testing::Values(0, 3));
+INSTANTIATE_TEST_SUITE_P(
+ AV1, DatarateTestSetFrameQpRealtime,
+ ::testing::Values(
+ static_cast<const libaom_test::CodecFactory *>(&libaom_test::kAV1)));
+
} // namespace
} // namespace datarate_test
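Note: the DatarateTestSetFrameQpRealtime suite added above pairs the new
AV1E_SET_QUANTIZER_ONE_PASS control with AOME_GET_LAST_QUANTIZER_64 to verify
that the requested QP is actually used. A minimal standalone sketch of the
same pattern, assuming an already-wrapped aom_image_t *img and omitting error
checks:

    // Force a fixed QP for the next frame in one-pass RT encoding, then read
    // back the QP the encoder used (both controls are public API).
    aom_codec_iface_t *iface = aom_codec_av1_cx();
    aom_codec_enc_cfg_t cfg;
    aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_REALTIME);
    cfg.g_w = 352;
    cfg.g_h = 288;
    cfg.rc_end_usage = AOM_CBR;
    cfg.g_lag_in_frames = 0;
    aom_codec_ctx_t enc;
    aom_codec_enc_init(&enc, iface, &cfg, 0);
    const int frame_qp = 30;  // valid range is [0, 63]
    aom_codec_control(&enc, AV1E_SET_QUANTIZER_ONE_PASS, frame_qp);
    aom_codec_encode(&enc, img, /*pts=*/0, /*duration=*/1, /*flags=*/0);
    int used_qp = 0;
    aom_codec_control(&enc, AOME_GET_LAST_QUANTIZER_64, &used_qp);
    // used_qp is expected to equal frame_qp, as the test above asserts.
    aom_codec_destroy(&enc);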
diff --git a/test/deltaq_mode_test.cc b/test/deltaq_mode_test.cc
index 0a5e5aa..5960d27 100644
--- a/test/deltaq_mode_test.cc
+++ b/test/deltaq_mode_test.cc
@@ -10,12 +10,14 @@
*/
#include <cstddef>
+#include <cstdint>
#include <vector>
#include "aom/aomcx.h"
#include "aom/aom_codec.h"
#include "aom/aom_encoder.h"
#include "aom/aom_image.h"
+#include "config/aom_config.h"
#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
namespace {
@@ -67,7 +69,7 @@
EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
aom_codec_iter_t iter = nullptr;
const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
- EXPECT_NE(pkt, nullptr);
+ ASSERT_NE(pkt, nullptr);
EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
// pkt->data.frame.flags is 0x1f0011.
EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
@@ -83,4 +85,125 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
}
+// The implementation of multi-threading for deltaq-mode=3 in allintra
+// mode is based on row multi-threading.
+// The test ensures that when row-mt is turned off, deltaq-mode=3 can
+// still encode and decode properly.
+TEST(DeltaqModeTest, DeltaqMode3MultiThreadNoRowMT) {
+ constexpr int kWidth = 1280;
+ constexpr int kHeight = 720;
+ // Dummy buffer of neutral gray samples.
+ constexpr size_t kBufferSize = kWidth * kHeight + kWidth * kHeight / 2;
+ std::vector<unsigned char> buffer(kBufferSize,
+ static_cast<unsigned char>(128));
+
+ aom_image_t img;
+ EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I420, kWidth, kHeight, 1,
+ buffer.data()));
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_GOOD_QUALITY),
+ AOM_CODEC_OK);
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_threads = 10;
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 0;
+ cfg.g_bit_depth = AOM_BITS_8;
+ cfg.g_input_bit_depth = 8;
+ cfg.g_lag_in_frames = 0;
+ cfg.rc_min_quantizer = 0;
+ cfg.rc_max_quantizer = 63;
+ cfg.g_pass = AOM_RC_ONE_PASS;
+ cfg.g_limit = 1;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_ROW_MT, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 6), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 14), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_DELTAQ_MODE, 3), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_set_option(&enc, "passes", "1"), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_STUDIO_RANGE),
+ AOM_CODEC_OK);
+
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, nullptr, 0, 1, 0));
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+}
+
+#if CONFIG_AV1_HIGHBITDEPTH
+// 10-bit version of the DeltaqMode3MultiThread test.
+TEST(DeltaqModeTest, DeltaqMode3MultiThreadHighbd) {
+ constexpr int kWidth = 1280;
+ constexpr int kHeight = 720;
+ // Dummy buffer of 10-bit neutral gray samples.
+ constexpr size_t kBufferSize = kWidth * kHeight + kWidth * kHeight / 2;
+ std::vector<uint16_t> buffer(kBufferSize, 512);
+
+ aom_image_t img;
+ EXPECT_EQ(&img,
+ aom_img_wrap(&img, AOM_IMG_FMT_I42016, kWidth, kHeight, 1,
+ reinterpret_cast<unsigned char *>(buffer.data())));
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_GOOD_QUALITY),
+ AOM_CODEC_OK);
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_threads = 10;
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 0;
+ cfg.g_bit_depth = AOM_BITS_10;
+ cfg.g_input_bit_depth = 10;
+ cfg.g_lag_in_frames = 0;
+ cfg.rc_min_quantizer = 0;
+ cfg.rc_max_quantizer = 63;
+ cfg.g_pass = AOM_RC_ONE_PASS;
+ cfg.g_limit = 1;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, AOM_CODEC_USE_HIGHBITDEPTH),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 6), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 14), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_DELTAQ_MODE, 3), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_set_option(&enc, "passes", "1"), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_STUDIO_RANGE),
+ AOM_CODEC_OK);
+
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, nullptr, 0, 1, 0));
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+}
+#endif // CONFIG_AV1_HIGHBITDEPTH
+
} // namespace
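Note: both new tests size their raw input as kWidth * kHeight +
kWidth * kHeight / 2 samples. That is the I420 (4:2:0) layout handed to
aom_img_wrap: one full-resolution luma plane plus two chroma planes
subsampled by two in each direction. A quick sketch of the arithmetic (the
helper name is illustrative):

    #include <cstddef>

    // 4:2:0 sample count: Y is w*h, U and V are (w/2)*(h/2) each.
    // For 1280x720: 921600 + 2 * 230400 = 1382400 = w*h + w*h/2.
    // The 10-bit variant stores the same sample count as uint16_t.
    size_t I420SampleCount(int w, int h) {
      return (size_t)w * h + 2 * ((size_t)(w / 2) * (h / 2));
    }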
diff --git a/test/dropframe_encode_test.cc b/test/dropframe_encode_test.cc
new file mode 100644
index 0000000..c7a801b
--- /dev/null
+++ b/test/dropframe_encode_test.cc
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2023, Alliance for Open Media. All rights reserved
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include "test/codec_factory.h"
+#include "test/encode_test_driver.h"
+#include "test/i420_video_source.h"
+#include "test/util.h"
+
+namespace {
+
+// Params: test mode, threads.
+class DropFrameEncodeTestLarge
+ : public ::libaom_test::CodecTestWith2Params<libaom_test::TestMode,
+ unsigned int>,
+ public ::libaom_test::EncoderTest {
+ protected:
+ DropFrameEncodeTestLarge()
+ : EncoderTest(GET_PARAM(0)), frame_number_(0), threads_(GET_PARAM(2)) {}
+
+ virtual void SetUp() { InitializeConfig(GET_PARAM(1)); }
+
+ virtual void PreEncodeFrameHook(::libaom_test::VideoSource *video,
+ ::libaom_test::Encoder *encoder) {
+ frame_number_ = video->frame();
+ if (frame_number_ == 0) {
+ encoder->Control(AOME_SET_CPUUSED, 1);
+ }
+ }
+
+ unsigned int frame_number_;
+ unsigned int threads_;
+};
+
+// Test to reproduce the assertion failure related to buf->display_idx in
+// init_gop_frames_for_tpl() and the segmentation fault reported in
+// aomedia:3372 while encoding with --drop-frame=1.
+TEST_P(DropFrameEncodeTestLarge, TestNoMisMatch) {
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.rc_buf_sz = 1;
+ cfg_.g_pass = AOM_RC_ONE_PASS;
+ cfg_.rc_dropframe_thresh = 1;
+ cfg_.g_threads = threads_;
+
+ ::libaom_test::I420VideoSource video("desktopqvga2.320_240.yuv", 320, 240, 30,
+ 1, 0, 100);
+
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+}
+
+AV1_INSTANTIATE_TEST_SUITE(DropFrameEncodeTestLarge,
+ ::testing::Values(::libaom_test::kOnePassGood),
+ ::testing::Values(1, 4));
+
+} // namespace
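Note: frame dropping in this test is driven entirely by ordinary config
fields; a minimal sketch with the real aom_codec_enc_cfg_t field names
(values mirror the test and are deliberately extreme to force drops):

    aom_codec_enc_cfg_t cfg;
    aom_codec_enc_config_default(aom_codec_av1_cx(), &cfg,
                                 AOM_USAGE_GOOD_QUALITY);
    cfg.rc_end_usage = AOM_CBR;   // frame dropping is a CBR mechanism
    cfg.rc_dropframe_thresh = 1;  // drop threshold, as a percent of the buffer
    cfg.rc_buf_sz = 1;            // 1 ms decoder buffer, so drops happen often
    // A dropped frame simply produces no output packet for that input frame.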
diff --git a/test/ducky_encode_test.cc b/test/ducky_encode_test.cc
deleted file mode 100644
index 7bbdc88..0000000
--- a/test/ducky_encode_test.cc
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include <array>
-#include <algorithm>
-#include <cerrno>
-#include <cstring>
-#include <fstream>
-#include <memory>
-#include <numeric>
-#include <string>
-#include <vector>
-
-#include "av1/encoder/encoder.h"
-#include "av1/qmode_rc/ducky_encode.h"
-#include "av1/qmode_rc/ratectrl_qmode.h"
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-#include "test/video_source.h"
-#include "third_party/googletest/src/googlemock/include/gmock/gmock.h"
-#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
-
-namespace aom {
-
-constexpr int kMaxRefFrames = 7;
-
-TEST(DuckyEncodeTest, ComputeFirstPassStats) {
- aom_rational_t frame_rate = { 30, 1 };
- VideoInfo video_info = { 352, 288,
- frame_rate, AOM_IMG_FMT_I420,
- 1, "bus_352x288_420_f20_b8.yuv" };
- video_info.file_path =
- libaom_test::GetDataPath() + "/" + video_info.file_path;
- DuckyEncode ducky_encode(video_info, BLOCK_64X64, kMaxRefFrames, 3, 128);
- std::vector<FIRSTPASS_STATS> frame_stats =
- ducky_encode.ComputeFirstPassStats();
- EXPECT_EQ(frame_stats.size(), static_cast<size_t>(video_info.frame_count));
- for (size_t i = 0; i < frame_stats.size(); ++i) {
- // FIRSTPASS_STATS's first element is frame
- EXPECT_EQ(frame_stats[i].frame, i);
- }
-}
-
-TEST(DuckyEncodeTest, EncodeFrame) {
- aom_rational_t frame_rate = { 30, 1 };
- VideoInfo video_info = { 352, 288,
- frame_rate, AOM_IMG_FMT_I420,
- 17, "bus_352x288_420_f20_b8.yuv" };
- video_info.file_path =
- libaom_test::GetDataPath() + "/" + video_info.file_path;
- DuckyEncode ducky_encode(video_info, BLOCK_64X64, kMaxRefFrames, 3, 128);
- std::vector<FIRSTPASS_STATS> frame_stats =
- ducky_encode.ComputeFirstPassStats();
- ducky_encode.StartEncode(frame_stats);
- // We set coding_frame_count to a arbitrary number that smaller than
- // 17 here.
- // TODO(angiebird): Set coding_frame_count properly, once the DuckyEncode can
- // provide proper information.
- int coding_frame_count = 5;
- EncodeFrameDecision decision = { aom::EncodeFrameMode::kNone,
- aom::EncodeGopMode::kNone,
- {} };
- for (int i = 0; i < coding_frame_count; ++i) {
- ducky_encode.AllocateBitstreamBuffer(video_info);
- EncodeFrameResult encode_frame_result = ducky_encode.EncodeFrame(decision);
- }
- ducky_encode.EndEncode();
-}
-
-TEST(DuckyEncodeTest, EncodeFrameWithQindex) {
- aom_rational_t frame_rate = { 30, 1 };
- VideoInfo video_info = { 352, 288,
- frame_rate, AOM_IMG_FMT_I420,
- 17, "bus_352x288_420_f20_b8.yuv" };
- video_info.file_path =
- libaom_test::GetDataPath() + "/" + video_info.file_path;
- DuckyEncode ducky_encode(video_info, BLOCK_64X64, kMaxRefFrames, 3, 128);
- std::vector<FIRSTPASS_STATS> frame_stats =
- ducky_encode.ComputeFirstPassStats();
- ducky_encode.StartEncode(frame_stats);
- // We set coding_frame_count to a arbitrary number that smaller than
- // 17 here.
- // TODO(angiebird): Set coding_frame_count properly, once the DuckyEncode can
- // provide proper information.
- int coding_frame_count = 5;
- int q_index = 0;
- EncodeFrameDecision decision = { aom::EncodeFrameMode::kQindex,
- aom::EncodeGopMode::kNone,
- { q_index, -1, {}, {} } };
- for (int i = 0; i < coding_frame_count; ++i) {
- ducky_encode.AllocateBitstreamBuffer(video_info);
- EncodeFrameResult encode_frame_result = ducky_encode.EncodeFrame(decision);
- EXPECT_EQ(encode_frame_result.dist, 0);
- }
- ducky_encode.EndEncode();
-}
-
-TEST(DuckyEncodeRCTest, EncodeVideoWithRC) {
- aom_rational_t frame_rate = { 30, 1 };
- const int frame_number = 35;
- const int frame_width = 352;
- const int frame_height = 288;
- VideoInfo video_info = { frame_width, frame_height,
- frame_rate, AOM_IMG_FMT_I420,
- frame_number, "bus_352x288_420_f20_b8.yuv" };
- video_info.file_path =
- libaom_test::GetDataPath() + "/" + video_info.file_path;
- DuckyEncode ducky_encode(video_info, BLOCK_64X64, kMaxRefFrames, 3, 128);
-
- AV1RateControlQMode qmode_rc;
- RateControlParam rc_param = {};
- rc_param.max_gop_show_frame_count = 16;
- rc_param.min_gop_show_frame_count = 4;
- rc_param.ref_frame_table_size = 5;
- rc_param.max_ref_frames = 3;
- rc_param.base_q_index = 45;
- rc_param.max_distinct_q_indices_per_frame = 8;
- rc_param.max_distinct_lambda_scales_per_frame = 1;
- rc_param.frame_width = frame_width;
- rc_param.frame_height = frame_height;
- rc_param.tpl_pass_count = TplPassCount::kOneTplPass;
- rc_param.tpl_pass_index = 0;
- const Status status = qmode_rc.SetRcParam(rc_param);
- ASSERT_TRUE(status.ok());
- FirstpassInfo firstpass_info;
- firstpass_info.stats_list = ducky_encode.ComputeFirstPassStats();
- constexpr int kBlockSize = 16;
- firstpass_info.num_mbs_16x16 = ((frame_width + kBlockSize - 1) / kBlockSize) *
- ((frame_height + kBlockSize - 1) / kBlockSize);
- const auto gop_info = qmode_rc.DetermineGopInfo(firstpass_info);
- ASSERT_TRUE(gop_info.status().ok());
- const GopStructList &gop_list = gop_info.value();
-
- std::vector<aom::GopEncodeInfo> tpl_pass_gop_encode_info_list;
- std::vector<aom::TplGopStats> tpl_gop_stats_list;
- for (const auto &gop_struct : gop_list) {
- const auto gop_encode_info =
- qmode_rc.GetTplPassGopEncodeInfo(gop_struct, firstpass_info);
- ASSERT_TRUE(gop_encode_info.status().ok());
- tpl_pass_gop_encode_info_list.push_back(std::move(*gop_encode_info));
- }
-
- tpl_gop_stats_list = ducky_encode.ComputeTplStats(
- firstpass_info.stats_list, gop_list, tpl_pass_gop_encode_info_list);
-
- std::vector<aom::GopEncodeInfo> final_pass_gop_encode_info_list;
- aom::RefFrameTable ref_frame_table;
- for (size_t i = 0; i < gop_list.size(); ++i) {
- const aom::GopStruct &gop_struct = gop_list[i];
- const aom::TplGopStats &tpl_gop_stats = tpl_gop_stats_list[i];
- std::vector<aom::LookaheadStats> lookahead_stats = {};
- for (size_t lookahead_index = 1;
- lookahead_index <= 1 && i + lookahead_index < gop_list.size();
- ++lookahead_index) {
- lookahead_stats.push_back({ &gop_list[i + lookahead_index],
- &tpl_gop_stats_list[i + lookahead_index] });
- }
- const auto gop_encode_info =
- qmode_rc.GetGopEncodeInfo(gop_struct, tpl_gop_stats, lookahead_stats,
- firstpass_info, ref_frame_table);
- ASSERT_TRUE(gop_encode_info.status().ok());
- ref_frame_table = gop_encode_info.value().final_snapshot;
- final_pass_gop_encode_info_list.push_back(std::move(*gop_encode_info));
- }
-
- ducky_encode.StartEncode(firstpass_info.stats_list);
- std::vector<aom::EncodeFrameResult> encoded_frames_list =
- ducky_encode.EncodeVideo(gop_list, final_pass_gop_encode_info_list);
- ducky_encode.EndEncode();
-
- EXPECT_THAT(encoded_frames_list,
- testing::Each(testing::Field(
- "psnr", &aom::EncodeFrameResult::psnr, testing::Gt(37))));
-}
-
-TEST(DuckyEncodeTest, EncodeFrameMode) {
- EXPECT_EQ(DUCKY_ENCODE_FRAME_MODE_NONE,
- static_cast<DUCKY_ENCODE_FRAME_MODE>(EncodeFrameMode::kNone));
- EXPECT_EQ(DUCKY_ENCODE_FRAME_MODE_QINDEX,
- static_cast<DUCKY_ENCODE_FRAME_MODE>(EncodeFrameMode::kQindex));
- EXPECT_EQ(
- DUCKY_ENCODE_FRAME_MODE_QINDEX_RDMULT,
- static_cast<DUCKY_ENCODE_FRAME_MODE>(EncodeFrameMode::kQindexRdmult));
-}
-
-} // namespace aom
diff --git a/test/ec_test.cc b/test/ec_test.cc
index c4b88e3..e0555b4 100644
--- a/test/ec_test.cc
+++ b/test/ec_test.cc
@@ -93,11 +93,8 @@
int dec_method;
unsigned int sym = data[j] + 1; // Initialize sym to an invalid value.
- if (CDF_SHIFT == 0) {
- dec_method = 3 + (rand() & 1);
- } else {
- dec_method = enc_method[j];
- }
+ dec_method = 3 + (rand() & 1);
+
switch (dec_method) {
case 3: {
sym = od_ec_decode_bool_q15(
@@ -128,30 +125,28 @@
}
}
od_ec_enc_reset(&enc);
- if (CDF_SHIFT == 0) {
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(24576));
- od_ec_enc_patch_initial_bits(&enc, 3, 2);
- EXPECT_FALSE(enc.error) << "od_ec_enc_patch_initial_bits() failed.\n";
- od_ec_enc_patch_initial_bits(&enc, 0, 5);
- EXPECT_TRUE(enc.error)
- << "od_ec_enc_patch_initial_bits() didn't fail when it should have.\n";
- od_ec_enc_reset(&enc);
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
- od_ec_encode_bool_q15(&enc, 1, OD_ICDF(32256));
- od_ec_encode_bool_q15(&enc, 0, OD_ICDF(24576));
- od_ec_enc_patch_initial_bits(&enc, 0, 2);
- EXPECT_FALSE(enc.error) << "od_ec_enc_patch_initial_bits() failed.\n";
- ptr = od_ec_enc_done(&enc, &ptr_sz);
- EXPECT_EQ(ptr_sz, 2u);
- EXPECT_EQ(ptr[0], 63)
- << "Got " << ptr[0]
- << " when expecting 63 for od_ec_enc_patch_initial_bits().\n";
- }
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(24576));
+ od_ec_enc_patch_initial_bits(&enc, 3, 2);
+ EXPECT_FALSE(enc.error) << "od_ec_enc_patch_initial_bits() failed.\n";
+ od_ec_enc_patch_initial_bits(&enc, 0, 5);
+ EXPECT_TRUE(enc.error)
+ << "od_ec_enc_patch_initial_bits() didn't fail when it should have.\n";
+ od_ec_enc_reset(&enc);
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));
+ od_ec_encode_bool_q15(&enc, 1, OD_ICDF(32256));
+ od_ec_encode_bool_q15(&enc, 0, OD_ICDF(24576));
+ od_ec_enc_patch_initial_bits(&enc, 0, 2);
+ EXPECT_FALSE(enc.error) << "od_ec_enc_patch_initial_bits() failed.\n";
+ ptr = od_ec_enc_done(&enc, &ptr_sz);
+ EXPECT_EQ(ptr_sz, 2u);
+ EXPECT_EQ(ptr[0], 63)
+ << "Got " << ptr[0]
+ << " when expecting 63 for od_ec_enc_patch_initial_bits().\n";
od_ec_enc_clear(&enc);
EXPECT_EQ(ret, 0);
}
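Note: the q15 bool API used throughout this test takes an inverse CDF in
Q15: OD_ICDF(16384) describes a 16384/32768 = 1/2 split, so either symbol is
equally likely. A minimal encode/decode round trip under that distribution,
as a sketch assuming an in-tree build where the internal entenc.h/entdec.h
headers are available:

    #include <stdint.h>
    #include "aom_dsp/entdec.h"
    #include "aom_dsp/entenc.h"

    static int BoolRoundTrip(void) {
      od_ec_enc enc;
      od_ec_enc_init(&enc, 1000);                      // 1000-byte buffer
      od_ec_encode_bool_q15(&enc, 0, OD_ICDF(16384));  // encode a 0 at p = 1/2
      uint32_t nbytes;
      unsigned char *buf = od_ec_enc_done(&enc, &nbytes);
      od_ec_dec dec;
      od_ec_dec_init(&dec, buf, nbytes);
      const int bit = od_ec_decode_bool_q15(&dec, OD_ICDF(16384));
      od_ec_enc_clear(&enc);  // frees the encoder's buffer
      return bit;             // expected: 0
    }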
diff --git a/test/encode_api_test.cc b/test/encode_api_test.cc
index 8303880..470bd06 100644
--- a/test/encode_api_test.cc
+++ b/test/encode_api_test.cc
@@ -106,6 +106,30 @@
EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
}
+TEST(EncodeAPI, MonochromeInProfiles) {
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ ASSERT_EQ(AOM_CODEC_OK, aom_codec_enc_config_default(iface, &cfg, kUsage));
+ cfg.g_w = 128;
+ cfg.g_h = 128;
+ cfg.monochrome = 1;
+ aom_codec_ctx_t enc;
+
+ // Test Profile 0
+ cfg.g_profile = 0;
+ ASSERT_EQ(AOM_CODEC_OK, aom_codec_enc_init(&enc, iface, &cfg, 0));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+
+ // Test Profile 1
+ cfg.g_profile = 1;
+ ASSERT_EQ(AOM_CODEC_INVALID_PARAM, aom_codec_enc_init(&enc, iface, &cfg, 0));
+
+ // Test Profile 2
+ cfg.g_profile = 2;
+ ASSERT_EQ(AOM_CODEC_OK, aom_codec_enc_init(&enc, iface, &cfg, 0));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+}
+
#if !CONFIG_REALTIME_ONLY
TEST(EncodeAPI, AllIntraMode) {
aom_codec_iface_t *iface = aom_codec_av1_cx();
diff --git a/test/encodetxb_test.cc b/test/encodetxb_test.cc
index c1b6709..0a58737 100644
--- a/test/encodetxb_test.cc
+++ b/test/encodetxb_test.cc
@@ -66,17 +66,17 @@
for (int tx_type = DCT_DCT; tx_type < TX_TYPES; ++tx_type) {
const TX_CLASS tx_class = tx_type_to_class[tx_type];
for (int tx_size = TX_4X4; tx_size < TX_SIZES_ALL; ++tx_size) {
- const int bwl = get_txb_bwl((TX_SIZE)tx_size);
+ const int bhl = get_txb_bhl((TX_SIZE)tx_size);
const int width = get_txb_wide((TX_SIZE)tx_size);
const int height = get_txb_high((TX_SIZE)tx_size);
const int real_width = tx_size_wide[tx_size];
const int real_height = tx_size_high[tx_size];
const int16_t *const scan = av1_scan_orders[tx_size][tx_type].scan;
- levels_ = set_levels(levels_buf_, width);
+ levels_ = set_levels(levels_buf_, height);
for (int i = 0; i < kNumTests && !result; ++i) {
for (int eob = 1; eob <= width * height && !result; ++eob) {
- InitDataWithEob(scan, bwl, eob);
+ InitDataWithEob(scan, bhl, eob);
av1_get_nz_map_contexts_c(levels_, scan, eob, (TX_SIZE)tx_size,
tx_class, coeff_contexts_ref_);
@@ -86,7 +86,7 @@
result = Compare(scan, eob);
EXPECT_EQ(result, 0)
- << " tx_class " << tx_class << " width " << real_width
+ << " tx_class " << (int)tx_class << " width " << real_width
<< " height " << real_height << " eob " << eob;
}
}
@@ -102,7 +102,7 @@
printf("Note: Only test the largest possible eob case!\n");
for (int tx_size = TX_4X4; tx_size < TX_SIZES_ALL; ++tx_size) {
- const int bwl = get_txb_bwl((TX_SIZE)tx_size);
+ const int bhl = get_txb_bhl((TX_SIZE)tx_size);
const int width = get_txb_wide((TX_SIZE)tx_size);
const int height = get_txb_high((TX_SIZE)tx_size);
const int real_width = tx_size_wide[tx_size];
@@ -113,8 +113,8 @@
const int eob = width * height;
const int numTests = kNumTests / (width * height);
- levels_ = set_levels(levels_buf_, width);
- InitDataWithEob(scan, bwl, eob);
+ levels_ = set_levels(levels_buf_, height);
+ InitDataWithEob(scan, bhl, eob);
aom_usec_timer_start(&timer_ref);
for (int i = 0; i < numTests; ++i) {
@@ -123,8 +123,8 @@
}
aom_usec_timer_mark(&timer_ref);
- levels_ = set_levels(levels_buf_, width);
- InitDataWithEob(scan, bwl, eob);
+ levels_ = set_levels(levels_buf_, height);
+ InitDataWithEob(scan, bhl, eob);
aom_usec_timer_start(&timer);
for (int i = 0; i < numTests; ++i) {
@@ -145,13 +145,13 @@
}
private:
- void InitDataWithEob(const int16_t *const scan, const int bwl,
+ void InitDataWithEob(const int16_t *const scan, const int bhl,
const int eob) {
memset(levels_buf_, 0, sizeof(levels_buf_));
memset(coeff_contexts_, 0, sizeof(*coeff_contexts_) * MAX_TX_SQUARE);
for (int c = 0; c < eob; ++c) {
- levels_[get_padded_idx(scan[c], bwl)] =
+ levels_[get_padded_idx(scan[c], bhl)] =
static_cast<uint8_t>(clamp(rnd_.Rand8(), 0, INT8_MAX));
coeff_contexts_[scan[c]] = static_cast<int8_t>(rnd_.Rand16() >> 1);
}
@@ -224,8 +224,8 @@
tran_low_t coeff[MAX_TX_SQUARE];
uint8_t levels_buf[2][TX_PAD_2D];
- uint8_t *const levels0 = set_levels(levels_buf[0], width);
- uint8_t *const levels1 = set_levels(levels_buf[1], width);
+ uint8_t *const levels0 = set_levels(levels_buf[0], height);
+ uint8_t *const levels1 = set_levels(levels_buf[1], height);
ACMRandom rnd(ACMRandom::DeterministicSeed());
for (int i = 0; i < width * height; i++) {
diff --git a/test/error_block_test.cc b/test/error_block_test.cc
index a6b442f..aadbb44 100644
--- a/test/error_block_test.cc
+++ b/test/error_block_test.cc
@@ -190,11 +190,10 @@
int64_t ssz;
int num_iters = 100000;
int64_t ref_ssz;
- int k;
const int msb = bit_depth_ + 8 - 1;
for (int i = 0; i < 9; ++i) {
block_size = 16 << (i % 9); // All block sizes from 4x4, 8x4 ..64x64
- for (k = 0; k < 9; k++) {
+ for (int k = 0; k < 9; k++) {
for (int j = 0; j < block_size; j++) {
if (k < 5) {
if (rnd(2)) {
@@ -221,7 +220,7 @@
aom_usec_timer ref_timer, test_timer;
aom_usec_timer_start(&ref_timer);
- for (int i = 0; i < num_iters; ++i) {
+ for (int iter = 0; iter < num_iters; ++iter) {
ref_error_block_op_(coeff, dqcoeff, block_size, &ref_ssz, bit_depth_);
}
aom_usec_timer_mark(&ref_timer);
@@ -229,7 +228,7 @@
static_cast<int>(aom_usec_timer_elapsed(&ref_timer));
aom_usec_timer_start(&test_timer);
- for (int i = 0; i < num_iters; ++i) {
+ for (int iter = 0; iter < num_iters; ++iter) {
error_block_op_(coeff, dqcoeff, block_size, &ssz, bit_depth_);
}
aom_usec_timer_mark(&test_timer);
diff --git a/test/ethread_test.cc b/test/ethread_test.cc
index 8e1d750..6b7fcce 100644
--- a/test/ethread_test.cc
+++ b/test/ethread_test.cc
@@ -261,6 +261,16 @@
encoder->Control(AOME_SET_ARNR_STRENGTH, 5);
encoder->Control(AV1E_SET_FRAME_PARALLEL_DECODING, 0);
encoder->Control(AV1E_SET_MAX_GF_INTERVAL, 4);
+ // In the row_mt_=0 case, single-thread (1 thread) output is compared
+ // with multi-thread (4 threads) output (as per line no:340).
+ // Currently, the loop restoration stage is conditionally disabled for
+ // speeds 5 and 6 when num_workers > 1. Because of this, the single-thread
+ // and multi-thread outputs cannot match, so this case alone is tested
+ // with LR disabled.
+ // TODO(aomedia:3446): Remove the constraint on this test case once the
+ // loop restoration state is the same in both the single-thread and
+ // multi-thread paths.
+ if (set_cpu_used_ >= 5 && row_mt_ == 0)
+ encoder->Control(AV1E_SET_ENABLE_RESTORATION, 0);
} else if (encoding_mode_ == ::libaom_test::kRealTime) {
encoder->Control(AOME_SET_ENABLEAUTOALTREF, 0);
encoder->Control(AV1E_SET_AQ_MODE, 3);
diff --git a/test/fft_test.cc b/test/fft_test.cc
index 7fce0f8..5443c99 100644
--- a/test/fft_test.cc
+++ b/test/fft_test.cc
@@ -82,7 +82,8 @@
};
std::ostream &operator<<(std::ostream &os, const FFTTestArg &test_arg) {
- return os << "fft_arg { n:" << test_arg.n << " fft:" << test_arg.fft << " }";
+ return os << "fft_arg { n:" << test_arg.n
+ << " fft:" << reinterpret_cast<const void *>(test_arg.fft) << " }";
}
class FFT2DTest : public ::testing::TestWithParam<FFTTestArg> {
@@ -146,7 +147,7 @@
FFTTestArg(16, aom_fft16x16_float_c),
FFTTestArg(32,
aom_fft32x32_float_c)));
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
#if HAVE_SSE2
INSTANTIATE_TEST_SUITE_P(
SSE2, FFT2DTest,
@@ -162,7 +163,7 @@
FFTTestArg(16, aom_fft16x16_float_avx2),
FFTTestArg(32, aom_fft32x32_float_avx2)));
#endif // HAVE_AVX2
-#endif // ARCH_X86 || ARCH_X86_64
+#endif // AOM_ARCH_X86 || AOM_ARCH_X86_64
struct IFFTTestArg {
int n;
@@ -171,8 +172,8 @@
};
std::ostream &operator<<(std::ostream &os, const IFFTTestArg &test_arg) {
- return os << "ifft_arg { n:" << test_arg.n << " fft:" << test_arg.ifft
- << " }";
+ return os << "ifft_arg { n:" << test_arg.n
+ << " fft:" << reinterpret_cast<const void *>(test_arg.ifft) << " }";
}
class IFFT2DTest : public ::testing::TestWithParam<IFFTTestArg> {
@@ -245,7 +246,7 @@
IFFTTestArg(8, aom_ifft8x8_float_c),
IFFTTestArg(16, aom_ifft16x16_float_c),
IFFTTestArg(32, aom_ifft32x32_float_c)));
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
#if HAVE_SSE2
INSTANTIATE_TEST_SUITE_P(
SSE2, IFFT2DTest,
@@ -262,6 +263,6 @@
IFFTTestArg(16, aom_ifft16x16_float_avx2),
IFFTTestArg(32, aom_ifft32x32_float_avx2)));
#endif // HAVE_AVX2
-#endif // ARCH_X86 || ARCH_X86_64
+#endif // AOM_ARCH_X86 || AOM_ARCH_X86_64
} // namespace
diff --git a/test/film_grain_table_test.cc b/test/film_grain_table_test.cc
index bf63903..f8937f1 100644
--- a/test/film_grain_table_test.cc
+++ b/test/film_grain_table_test.cc
@@ -14,6 +14,10 @@
#include "aom_dsp/grain_table.h"
#include "aom/internal/aom_codec_internal.h"
#include "av1/encoder/grain_test_vectors.h"
+#include "test/codec_factory.h"
+#include "test/encode_test_driver.h"
+#include "test/i420_video_source.h"
+#include "test/util.h"
#include "test/video_source.h"
void grain_equal(const aom_film_grain_t *expected,
@@ -267,3 +271,66 @@
EXPECT_EQ(0, remove(grain_file.c_str()));
}
+
+const ::libaom_test::TestMode kFilmGrainEncodeTestModes[] = {
+ ::libaom_test::kRealTime,
+#if !CONFIG_REALTIME_ONLY
+ ::libaom_test::kOnePassGood
+#endif
+};
+
+class FilmGrainEncodeTest
+ : public ::libaom_test::CodecTestWith2Params<bool, ::libaom_test::TestMode>,
+ public ::libaom_test::EncoderTest {
+ protected:
+ FilmGrainEncodeTest()
+ : EncoderTest(GET_PARAM(0)), test_monochrome_(GET_PARAM(1)),
+ test_mode_(GET_PARAM(2)) {}
+ ~FilmGrainEncodeTest() override = default;
+
+ void SetUp() override {
+ InitializeConfig(test_mode_);
+ cfg_.monochrome = test_monochrome_;
+ cfg_.rc_target_bitrate = 300;
+ cfg_.kf_max_dist = 0;
+ }
+
+ void PreEncodeFrameHook(::libaom_test::VideoSource *video,
+ ::libaom_test::Encoder *encoder) override {
+ if (video->frame() == 0) {
+ encoder->Control(AOME_SET_CPUUSED, 5);
+ encoder->Control(AV1E_SET_TUNE_CONTENT, AOM_CONTENT_FILM);
+ encoder->Control(AV1E_SET_DENOISE_NOISE_LEVEL, 1);
+ } else if (video->frame() == 1) {
+ cfg_.monochrome = 0;
+ encoder->Config(&cfg_);
+ } else {
+ cfg_.monochrome = test_monochrome_;
+ encoder->Config(&cfg_);
+ }
+ }
+
+ bool DoDecode() const override { return false; }
+
+ void DoTest() {
+ if (test_monochrome_ && test_mode_ == ::libaom_test::kRealTime) {
+ // TODO(bohanli): Running real-time mode with monochrome causes the
+ // encoder to crash. Check whether this is intended or a bug.
+ GTEST_SKIP();
+ }
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352,
+ 288, 30, 1, 0, 3);
+ cfg_.g_w = video.img()->d_w;
+ cfg_.g_h = video.img()->d_h;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ }
+
+ private:
+ bool test_monochrome_;
+ ::libaom_test::TestMode test_mode_;
+};
+
+TEST_P(FilmGrainEncodeTest, Test) { DoTest(); }
+
+AV1_INSTANTIATE_TEST_SUITE(FilmGrainEncodeTest, ::testing::Bool(),
+ ::testing::ValuesIn(kFilmGrainEncodeTestModes));
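Note: the controls this suite layers on top of the usual encode loop can be
set in isolation; a hedged sketch assuming an initialized aom_codec_ctx_t
enc (the control names are the real ones used above):

    // Tune for film content and enable grain estimation/denoising; a
    // denoise level > 0 turns the feature on (requires CONFIG_DENOISE).
    aom_codec_control(&enc, AV1E_SET_TUNE_CONTENT, AOM_CONTENT_FILM);
    aom_codec_control(&enc, AV1E_SET_DENOISE_NOISE_LEVEL, 1);
    // Monochrome is a config field rather than a control; the test toggles
    // it between frames via encoder->Config(&cfg_), i.e.
    // aom_codec_enc_config_set().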
diff --git a/test/firstpass_test.cc b/test/firstpass_test.cc
index 718fdab..1f4f3b7 100644
--- a/test/firstpass_test.cc
+++ b/test/firstpass_test.cc
@@ -76,11 +76,13 @@
EXPECT_EQ(firstpass_info.stats_count, FIRSTPASS_INFO_STATIC_BUF_SIZE);
EXPECT_EQ(firstpass_info.stats_count, firstpass_info.stats_buf_size);
- // Push the stats when the queue is full.
- FIRSTPASS_STATS stats;
- av1_zero(stats);
- aom_codec_err_t ret = av1_firstpass_info_push(&firstpass_info, &stats);
- EXPECT_EQ(ret, AOM_CODEC_ERROR);
+ {
+ // Push the stats when the queue is full.
+ FIRSTPASS_STATS stats;
+ av1_zero(stats);
+ aom_codec_err_t ret = av1_firstpass_info_push(&firstpass_info, &stats);
+ EXPECT_EQ(ret, AOM_CODEC_ERROR);
+ }
}
TEST(FirstpassTest, FirstpassInfoTotalStats) {
@@ -110,9 +112,11 @@
EXPECT_EQ(ret, AOM_CODEC_OK);
}
EXPECT_EQ(firstpass_info.cur_index, firstpass_info.start_index);
- aom_codec_err_t ret = av1_firstpass_info_pop(&firstpass_info);
- // We cannot pop when cur_index == start_index
- EXPECT_EQ(ret, AOM_CODEC_ERROR);
+ {
+ aom_codec_err_t ret = av1_firstpass_info_pop(&firstpass_info);
+ // We cannot pop when cur_index == start_index
+ EXPECT_EQ(ret, AOM_CODEC_ERROR);
+ }
int ref_frame_cnt = 0;
const int move_count = FIRSTPASS_INFO_STATIC_BUF_SIZE * 2 / 3;
for (int i = 0; i < move_count; ++i) {
diff --git a/test/forced_max_frame_width_height_test.cc b/test/forced_max_frame_width_height_test.cc
index 98d96fb..2e019b6 100644
--- a/test/forced_max_frame_width_height_test.cc
+++ b/test/forced_max_frame_width_height_test.cc
@@ -15,7 +15,9 @@
// encode two frames of increasing sizes. The second aom_codec_encode() should
// not crash or have memory errors.
+#include <algorithm>
#include <memory>
+#include <vector>
#include "aom/aomcx.h"
#include "aom/aom_encoder.h"
@@ -89,6 +91,114 @@
RunTest(AOM_USAGE_GOOD_QUALITY, /*lag_in_frames=*/1, "ssim");
}
+void FillImageGradient(aom_image_t *image, int bit_depth) {
+ assert(image->range == AOM_CR_FULL_RANGE);
+ for (int plane = 0; plane < 3; plane++) {
+ const int plane_width = aom_img_plane_width(image, plane);
+ const int plane_height = aom_img_plane_height(image, plane);
+ unsigned char *row = image->planes[plane];
+ const int stride = image->stride[plane];
+ for (int y = 0; y < plane_height; ++y) {
+ for (int x = 0; x < plane_width; ++x) {
+ const int value = (x + y) * ((1 << bit_depth) - 1) /
+ std::max(1, plane_width + plane_height - 2);
+ assert(value >= 0 && value <= (1 << bit_depth) - 1);
+ if (bit_depth > 8) {
+ reinterpret_cast<uint16_t *>(row)[x] = static_cast<uint16_t>(value);
+ } else {
+ row[x] = static_cast<unsigned char>(value);
+ }
+ }
+ row += stride;
+ }
+ }
+}
+
+// A test that reproduces bug aomedia:3348: Assertion
+// `ms_params->ms_buffers.ref->stride == ms_params->search_sites->stride'
+// failed.
+TEST(EncodeForcedMaxFrameWidthHeight, DISABLED_DimensionDecreasing) {
+ constexpr int kWidth = 128;
+ constexpr int kHeight = 128;
+ constexpr size_t kBufferSize = 3 * kWidth * kHeight;
+ std::vector<unsigned char> buffer(kBufferSize);
+
+ aom_image_t img;
+ EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I420, kWidth, kHeight, 1,
+ buffer.data()));
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+ FillImageGradient(&img, 8);
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_GOOD_QUALITY));
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 0;
+ cfg.g_bit_depth = AOM_BITS_8;
+ cfg.g_input_bit_depth = 8;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_forced_max_frame_width = kWidth;
+ cfg.g_forced_max_frame_height = kHeight;
+ cfg.g_lag_in_frames = 1;
+ cfg.rc_min_quantizer = 20;
+ cfg.rc_max_quantizer = 40;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_enc_init(&enc, iface, &cfg, 0));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 30));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_control(&enc, AOME_SET_CPUUSED, 6));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM));
+
+ // First frame
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_NE(pkt->data.frame.flags & AOM_FRAME_IS_KEY, 0u);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Second frame
+ constexpr int kWidthSmall = 64;
+ constexpr int kHeightSmall = 64;
+ EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I420, kWidthSmall,
+ kHeightSmall, 1, buffer.data()));
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+ FillImageGradient(&img, 8);
+ cfg.g_w = kWidthSmall;
+ cfg.g_h = kHeightSmall;
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_enc_config_set(&enc, &cfg));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, 0u);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, nullptr, 0, 1, 0));
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+}
+
#endif // !CONFIG_REALTIME_ONLY
TEST(EncodeForcedMaxFrameWidthHeight, RealtimeLag0TunePSNR) {
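Note: the pattern the disabled test exercises, declaring the largest frame
size up front and then shrinking, comes down to two config fields; a compact
sketch with real field names and illustrative values:

    aom_codec_enc_cfg_t cfg;
    aom_codec_enc_config_default(aom_codec_av1_cx(), &cfg,
                                 AOM_USAGE_GOOD_QUALITY);
    cfg.g_w = 128;  // size of the first frame
    cfg.g_h = 128;
    cfg.g_forced_max_frame_width = 128;   // upper bound for the whole stream
    cfg.g_forced_max_frame_height = 128;
    // ... aom_codec_enc_init() and encode the first frame ...
    // To encode a smaller frame next, update g_w/g_h and push the new
    // config with aom_codec_enc_config_set(&enc, &cfg) before the next
    // aom_codec_encode() call, exactly as the test does.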
diff --git a/test/frame_size_tests.cc b/test/frame_size_tests.cc
index 20aea31..b15be6e 100644
--- a/test/frame_size_tests.cc
+++ b/test/frame_size_tests.cc
@@ -48,7 +48,9 @@
};
#if CONFIG_SIZE_LIMIT
-TEST_F(AV1FrameSizeTests, TestInvalidSizes) {
+// TODO([email protected]): fails due to newer bounds checks (added in
+// ebc2714d71a834fc32a19eef0a81f51fbc47db01) that are hit before the assert
+// below.
+TEST_F(AV1FrameSizeTests, DISABLED_TestInvalidSizes) {
::libaom_test::RandomVideoSource video;
video.SetSize(DECODE_WIDTH_LIMIT + 16, DECODE_HEIGHT_LIMIT + 16);
@@ -57,7 +59,9 @@
ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
}
-TEST_F(AV1FrameSizeTests, LargeValidSizes) {
+// TODO([email protected]): similar to the above test; needs to be
+// updated for the new rejection case.
+TEST_F(AV1FrameSizeTests, DISABLED_LargeValidSizes) {
::libaom_test::RandomVideoSource video;
video.SetSize(DECODE_WIDTH_LIMIT, DECODE_HEIGHT_LIMIT);
diff --git a/test/function_equivalence_test.h b/test/function_equivalence_test.h
index f47800a..fc2a769 100644
--- a/test/function_equivalence_test.h
+++ b/test/function_equivalence_test.h
@@ -36,8 +36,8 @@
template <typename T>
struct FuncParam {
- FuncParam(T ref = nullptr, T tst = nullptr, int bit_depth = 0)
- : ref_func(ref), tst_func(tst), bit_depth(bit_depth) {}
+ FuncParam(T ref = nullptr, T tst = nullptr, int depth = 0)
+ : ref_func(ref), tst_func(tst), bit_depth(depth) {}
T ref_func;
T tst_func;
int bit_depth;
diff --git a/test/fwht4x4_test.cc b/test/fwht4x4_test.cc
index 8b8b4f2..9d27db8 100644
--- a/test/fwht4x4_test.cc
+++ b/test/fwht4x4_test.cc
@@ -113,9 +113,8 @@
ASSERT_NE(output_block, nullptr);
for (int i = 0; i < count_test_block; ++i) {
- int j, k;
- for (j = 0; j < height_; ++j) {
- for (k = 0; k < pitch_; ++k) {
+ for (int j = 0; j < height_; ++j) {
+ for (int k = 0; k < pitch_; ++k) {
int in_idx = j * stride + k;
int out_idx = j * pitch_ + k;
input_block[in_idx] =
@@ -131,7 +130,7 @@
aom_usec_timer c_timer_;
aom_usec_timer_start(&c_timer_);
- for (int i = 0; i < numIter; i++) {
+ for (int iter = 0; iter < numIter; iter++) {
API_REGISTER_STATE_CHECK(
fwd_txfm_c_(input_block, output_ref_block, stride));
}
@@ -140,7 +139,7 @@
aom_usec_timer simd_timer_;
aom_usec_timer_start(&simd_timer_);
- for (int i = 0; i < numIter; i++) {
+ for (int iter = 0; iter < numIter; iter++) {
API_REGISTER_STATE_CHECK(
fwd_txfm_(input_block, output_block, stride));
}
@@ -150,8 +149,8 @@
simd_sum_time += static_cast<int>(aom_usec_timer_elapsed(&simd_timer_));
// The minimum quant value is 4.
- for (j = 0; j < height_; ++j) {
- for (k = 0; k < pitch_; ++k) {
+ for (int j = 0; j < height_; ++j) {
+ for (int k = 0; k < pitch_; ++k) {
int out_idx = j * pitch_ + k;
ASSERT_EQ(output_block[out_idx], output_ref_block[out_idx])
<< "Error: not bit-exact result at index: " << out_idx
@@ -191,10 +190,10 @@
INSTANTIATE_TEST_SUITE_P(
C, Trans4x4WHT,
- ::testing::Values(make_tuple(&av1_highbd_fwht4x4_c, &iwht4x4_10_c, DCT_DCT,
+ ::testing::Values(make_tuple(&av1_fwht4x4_c, &iwht4x4_10_c, DCT_DCT,
AOM_BITS_10, 16,
static_cast<FdctFunc>(nullptr)),
- make_tuple(&av1_highbd_fwht4x4_c, &iwht4x4_12_c, DCT_DCT,
+ make_tuple(&av1_fwht4x4_c, &iwht4x4_12_c, DCT_DCT,
AOM_BITS_12, 16,
static_cast<FdctFunc>(nullptr))));
@@ -202,10 +201,10 @@
INSTANTIATE_TEST_SUITE_P(
SSE4_1, Trans4x4WHT,
- ::testing::Values(make_tuple(&av1_highbd_fwht4x4_sse4_1, &iwht4x4_10_sse4_1,
+ ::testing::Values(make_tuple(&av1_fwht4x4_sse4_1, &iwht4x4_10_sse4_1,
DCT_DCT, AOM_BITS_10, 16,
static_cast<FdctFunc>(nullptr)),
- make_tuple(&av1_highbd_fwht4x4_sse4_1, &iwht4x4_12_sse4_1,
+ make_tuple(&av1_fwht4x4_sse4_1, &iwht4x4_12_sse4_1,
DCT_DCT, AOM_BITS_12, 16,
static_cast<FdctFunc>(nullptr))));
@@ -215,12 +214,10 @@
INSTANTIATE_TEST_SUITE_P(
NEON, Trans4x4WHT,
- ::testing::Values(make_tuple(&av1_highbd_fwht4x4_neon, &iwht4x4_10_c,
- DCT_DCT, AOM_BITS_10, 16,
- &av1_highbd_fwht4x4_c),
- make_tuple(&av1_highbd_fwht4x4_neon, &iwht4x4_12_c,
- DCT_DCT, AOM_BITS_12, 16,
- &av1_highbd_fwht4x4_c)));
+ ::testing::Values(make_tuple(&av1_fwht4x4_neon, &iwht4x4_10_c, DCT_DCT,
+ AOM_BITS_10, 16, &av1_fwht4x4_c),
+ make_tuple(&av1_fwht4x4_neon, &iwht4x4_12_c, DCT_DCT,
+ AOM_BITS_12, 16, &av1_fwht4x4_c)));
#endif // HAVE_NEON
diff --git a/test/hadamard_test.cc b/test/hadamard_test.cc
index 0fe7f42..fc306e6 100644
--- a/test/hadamard_test.cc
+++ b/test/hadamard_test.cc
@@ -242,6 +242,12 @@
virtual void SetUp() { rnd_.Reset(ACMRandom::DeterministicSeed()); }
+ // The Rand() function generates values in the range [-((1 << BitDepth) - 1),
+ // (1 << BitDepth) - 1]. This is because the input to the Hadamard transform
+ // is the residual pixel, which is defined as 'source pixel - predicted
+ // pixel'. Source pixel and predicted pixel take values in the range
+ // [0, (1 << BitDepth) - 1] and thus the residual pixel ranges from
+ // -((1 << BitDepth) - 1) to ((1 << BitDepth) - 1).
virtual int16_t Rand() = 0;
void CompareReferenceRandom() {
@@ -259,9 +265,37 @@
for (int i = 0; i < block_size_; ++i) a[i] = Rand();
ReferenceHadamard(a, bw_, b_ref, bw_, bh_, shift_);
API_REGISTER_STATE_CHECK(h_func_(a, bw_, b));
+
+ // The order of the output is not important. Sort before checking.
+ std::sort(b, b + block_size_);
+ std::sort(b_ref, b_ref + block_size_);
EXPECT_EQ(memcmp(b, b_ref, sizeof(b)), 0);
}
+ void CompareReferenceExtreme() {
+ const int kMaxBlockSize = 32 * 32;
+ const int block_size = bw_ * bh_;
+ const int kBitDepth = 8;
+ DECLARE_ALIGNED(16, int16_t, a[kMaxBlockSize]);
+ DECLARE_ALIGNED(16, OutputType, b[kMaxBlockSize]);
+ memset(b, 0, sizeof(b));
+
+ OutputType b_ref[kMaxBlockSize];
+ memset(b_ref, 0, sizeof(b_ref));
+ for (int i = 0; i < 2; ++i) {
+ const int sign = (i == 0) ? 1 : -1;
+ for (int j = 0; j < block_size; ++j) a[j] = sign * ((1 << kBitDepth) - 1);
+
+ ReferenceHadamard(a, bw_, b_ref, bw_, bh_, shift_);
+ API_REGISTER_STATE_CHECK(h_func_(a, bw_, b));
+
+ // The order of the output is not important. Sort before checking.
+ std::sort(b, b + block_size);
+ std::sort(b_ref, b_ref + block_size);
+ EXPECT_EQ(memcmp(b, b_ref, sizeof(b)), 0);
+ }
+ }
+
void VaryStride() {
const int kMaxBlockSize = 32 * 32;
const int block_size_ = bw_ * bh_;
@@ -278,6 +312,10 @@
ReferenceHadamard(a, i, b_ref, bw_, bh_, shift_);
API_REGISTER_STATE_CHECK(h_func_(a, i, b));
+
+ // The order of the output is not important. Sort before checking.
+ std::sort(b, b + block_size_);
+ std::sort(b_ref, b_ref + block_size_);
EXPECT_EQ(0, memcmp(b, b_ref, sizeof(b)));
}
}
@@ -312,11 +350,20 @@
class HadamardLowbdTest : public HadamardTestBase<tran_low_t, HadamardFunc> {
public:
HadamardLowbdTest() : HadamardTestBase(GetParam(), /*do_shift=*/true) {}
- virtual int16_t Rand() { return rnd_.Rand9Signed(); }
+ // Use values between -255 (0xFF01) and 255 (0x00FF)
+ virtual int16_t Rand() {
+ int16_t src = rnd_.Rand8();
+ int16_t pred = rnd_.Rand8();
+ return src - pred;
+ }
};
TEST_P(HadamardLowbdTest, CompareReferenceRandom) { CompareReferenceRandom(); }
+TEST_P(HadamardLowbdTest, CompareReferenceExtreme) {
+ CompareReferenceExtreme();
+}
+
TEST_P(HadamardLowbdTest, VaryStride) { VaryStride(); }
TEST_P(HadamardLowbdTest, DISABLED_SpeedTest) { SpeedTest(1000000); }
@@ -349,15 +396,62 @@
#if HAVE_NEON
INSTANTIATE_TEST_SUITE_P(
NEON, HadamardLowbdTest,
- ::testing::Values(HadamardFuncWithSize(&aom_hadamard_8x8_neon, 8, 8),
- HadamardFuncWithSize(&aom_hadamard_16x16_neon, 16, 16)));
+ ::testing::Values(HadamardFuncWithSize(&aom_hadamard_4x4_neon, 4, 4),
+ HadamardFuncWithSize(&aom_hadamard_8x8_neon, 8, 8),
+ HadamardFuncWithSize(&aom_hadamard_16x16_neon, 16, 16),
+ HadamardFuncWithSize(&aom_hadamard_32x32_neon, 32, 32)));
#endif // HAVE_NEON
+#if CONFIG_AV1_HIGHBITDEPTH
+class HadamardHighbdTest : public HadamardTestBase<tran_low_t, HadamardFunc> {
+ protected:
+ HadamardHighbdTest() : HadamardTestBase(GetParam(), /*do_shift=*/true) {}
+ // Use values between -4095 (0xF001) and 4095 (0x0FFF)
+ virtual int16_t Rand() {
+ int16_t src = rnd_.Rand12();
+ int16_t pred = rnd_.Rand12();
+ return src - pred;
+ }
+};
+
+TEST_P(HadamardHighbdTest, CompareReferenceRandom) { CompareReferenceRandom(); }
+
+TEST_P(HadamardHighbdTest, VaryStride) { VaryStride(); }
+
+TEST_P(HadamardHighbdTest, DISABLED_Speed) {
+ SpeedTest(10);
+ SpeedTest(10000);
+ SpeedTest(10000000);
+}
+
+INSTANTIATE_TEST_SUITE_P(
+ C, HadamardHighbdTest,
+ ::testing::Values(
+ HadamardFuncWithSize(&aom_highbd_hadamard_8x8_c, 8, 8),
+ HadamardFuncWithSize(&aom_highbd_hadamard_16x16_c, 16, 16),
+ HadamardFuncWithSize(&aom_highbd_hadamard_32x32_c, 32, 32)));
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, HadamardHighbdTest,
+ ::testing::Values(
+ HadamardFuncWithSize(&aom_highbd_hadamard_8x8_neon, 8, 8),
+ HadamardFuncWithSize(&aom_highbd_hadamard_16x16_neon, 16, 16),
+ HadamardFuncWithSize(&aom_highbd_hadamard_32x32_neon, 32, 32)));
+#endif // HAVE_NEON
+
+#endif // CONFIG_AV1_HIGHBITDEPTH
+
// Tests for low precision
class HadamardLowbdLPTest : public HadamardTestBase<int16_t, HadamardLPFunc> {
public:
HadamardLowbdLPTest() : HadamardTestBase(GetParam(), /*do_shift=*/false) {}
- virtual int16_t Rand() { return rnd_.Rand9Signed(); }
+ // Use values between -255 (0xFF01) and 255 (0x00FF)
+ virtual int16_t Rand() {
+ int16_t src = rnd_.Rand8();
+ int16_t pred = rnd_.Rand8();
+ return src - pred;
+ }
};
TEST_P(HadamardLowbdLPTest, CompareReferenceRandom) {
@@ -402,7 +496,12 @@
public:
HadamardLowbdLP8x8DualTest()
: HadamardTestBase(GetParam(), /*do_shift=*/false) {}
- virtual int16_t Rand() { return rnd_.Rand9Signed(); }
+ // Use values between -255 (0xFF01) and 255 (0x00FF)
+ virtual int16_t Rand() {
+ int16_t src = rnd_.Rand8();
+ int16_t pred = rnd_.Rand8();
+ return src - pred;
+ }
};
TEST_P(HadamardLowbdLP8x8DualTest, CompareReferenceRandom) {
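Note: each of the Rand() overrides above constructs the residual the way the
codec does: as the difference of two n-bit samples, so values span
[-(2^n - 1), 2^n - 1]. A self-contained sketch of the construction, with
hypothetical names:

    #include <cstdint>
    #include <random>

    // src and pred are both uniform in [0, 255], so src - pred covers
    // [-255, 255]; with 12-bit samples the same construction covers
    // [-4095, 4095].
    int16_t RandomResidual(std::mt19937 &gen) {
      std::uniform_int_distribution<int> sample(0, 255);
      return static_cast<int16_t>(sample(gen) - sample(gen));
    }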
diff --git a/test/hbd_metrics_test.cc b/test/hbd_metrics_test.cc
index 6c9fe55..074213a 100644
--- a/test/hbd_metrics_test.cc
+++ b/test/hbd_metrics_test.cc
@@ -112,10 +112,10 @@
memset(&hbd_src, 0, sizeof(hbd_src));
memset(&hbd_dst, 0, sizeof(hbd_dst));
- aom_alloc_frame_buffer(&lbd_src, width, height, 1, 1, 0, 32, 16, 0);
- aom_alloc_frame_buffer(&lbd_dst, width, height, 1, 1, 0, 32, 16, 0);
- aom_alloc_frame_buffer(&hbd_src, width, height, 1, 1, 1, 32, 16, 0);
- aom_alloc_frame_buffer(&hbd_dst, width, height, 1, 1, 1, 32, 16, 0);
+ aom_alloc_frame_buffer(&lbd_src, width, height, 1, 1, 0, 32, 16, 0, 0);
+ aom_alloc_frame_buffer(&lbd_dst, width, height, 1, 1, 0, 32, 16, 0, 0);
+ aom_alloc_frame_buffer(&hbd_src, width, height, 1, 1, 1, 32, 16, 0, 0);
+ aom_alloc_frame_buffer(&hbd_dst, width, height, 1, 1, 1, 32, 16, 0, 0);
memset(lbd_src.buffer_alloc, kPixFiller, lbd_src.buffer_alloc_sz);
while (i < lbd_src.buffer_alloc_sz) {
diff --git a/test/horz_superres_test.cc b/test/horz_superres_test.cc
index 323aa93..cba29e9 100644
--- a/test/horz_superres_test.cc
+++ b/test/horz_superres_test.cc
@@ -53,12 +53,12 @@
const TestVideoParam kTestVideoVectors[] = {
{ "park_joy_90p_8_420.y4m", AOM_IMG_FMT_I420, AOM_BITS_8, 0, 5, 0, 25.3,
- 45.0 },
+ 44.7 },
#if CONFIG_AV1_HIGHBITDEPTH
{ "park_joy_90p_10_444.y4m", AOM_IMG_FMT_I44416, AOM_BITS_10, 1, 5, 0, 27.0,
- 47.9 },
+ 46.8 },
#endif
- { "screendata.y4m", AOM_IMG_FMT_I420, AOM_BITS_8, 0, 4, 1, 23.0, 56.0 },
+ { "screendata.y4m", AOM_IMG_FMT_I420, AOM_BITS_8, 0, 4, 1, 23.0, 52.5 },
// Image coding (single frame).
{ "niklas_1280_720_30.y4m", AOM_IMG_FMT_I420, AOM_BITS_8, 0, 1, 0, 32.0,
49.0 },
diff --git a/test/intrapred_test.cc b/test/intrapred_test.cc
index 3da9293..aced593 100644
--- a/test/intrapred_test.cc
+++ b/test/intrapred_test.cc
@@ -340,26 +340,11 @@
#if HAVE_NEON
const IntraPredFunc<IntraPred> LowbdIntraPredTestVectorNeon[] = {
- lowbd_entry(dc, 4, 4, neon), lowbd_entry(dc, 8, 8, neon),
- lowbd_entry(dc, 16, 16, neon), lowbd_entry(dc, 32, 32, neon),
-
- lowbd_entry(dc_top, 4, 4, neon), lowbd_entry(dc_top, 8, 8, neon),
- lowbd_entry(dc_top, 16, 16, neon), lowbd_entry(dc_top, 32, 32, neon),
-
- lowbd_entry(dc_left, 4, 4, neon), lowbd_entry(dc_left, 8, 8, neon),
- lowbd_entry(dc_left, 16, 16, neon), lowbd_entry(dc_left, 32, 32, neon),
-
- lowbd_entry(dc_128, 4, 4, neon), lowbd_entry(dc_128, 8, 8, neon),
- lowbd_entry(dc_128, 16, 16, neon), lowbd_entry(dc_128, 32, 32, neon),
-
- lowbd_entry(v, 4, 4, neon), lowbd_entry(v, 8, 8, neon),
- lowbd_entry(v, 16, 16, neon), lowbd_entry(v, 32, 32, neon),
-
- lowbd_entry(h, 4, 4, neon), lowbd_entry(h, 8, 8, neon),
- lowbd_entry(h, 16, 16, neon), lowbd_entry(h, 32, 32, neon),
-
- lowbd_intrapred(smooth, neon), lowbd_intrapred(smooth_v, neon),
- lowbd_intrapred(smooth_h, neon), lowbd_intrapred(paeth, neon),
+ lowbd_intrapred(dc, neon), lowbd_intrapred(dc_top, neon),
+ lowbd_intrapred(dc_left, neon), lowbd_intrapred(dc_128, neon),
+ lowbd_intrapred(v, neon), lowbd_intrapred(h, neon),
+ lowbd_intrapred(smooth, neon), lowbd_intrapred(smooth_v, neon),
+ lowbd_intrapred(smooth_h, neon), lowbd_intrapred(paeth, neon),
};
INSTANTIATE_TEST_SUITE_P(NEON, LowbdIntraPredTest,
@@ -416,13 +401,11 @@
#if CONFIG_AV1_HIGHBITDEPTH
#if HAVE_NEON
const IntraPredFunc<HighbdIntraPred> HighbdIntraPredTestVectorNeon[] = {
- highbd_entry(dc, 4, 4, neon, 8), highbd_entry(dc, 8, 8, neon, 8),
- highbd_entry(dc, 16, 16, neon, 8), highbd_entry(dc, 32, 32, neon, 8),
- highbd_entry(dc, 64, 64, neon, 8),
-
- highbd_intrapred(v, neon, 12), highbd_intrapred(paeth, neon, 12),
- highbd_intrapred(smooth, neon, 12), highbd_intrapred(smooth_v, neon, 12),
- highbd_intrapred(smooth_h, neon, 12),
+ highbd_intrapred(dc, neon, 12), highbd_intrapred(dc_top, neon, 12),
+ highbd_intrapred(dc_left, neon, 12), highbd_intrapred(dc_128, neon, 12),
+ highbd_intrapred(v, neon, 12), highbd_intrapred(h, neon, 12),
+ highbd_intrapred(paeth, neon, 12), highbd_intrapred(smooth, neon, 12),
+ highbd_intrapred(smooth_v, neon, 12), highbd_intrapred(smooth_h, neon, 12),
};
INSTANTIATE_TEST_SUITE_P(NEON, HighbdIntraPredTest,
diff --git a/test/invalid_file_test.cc b/test/invalid_file_test.cc
index 10a3bc4..63e15ca 100644
--- a/test/invalid_file_test.cc
+++ b/test/invalid_file_test.cc
@@ -133,10 +133,16 @@
{ 4, "invalid-oss-fuzz-9463.ivf", "invalid-oss-fuzz-9463.ivf.res.2" },
{ 1, "invalid-oss-fuzz-9720.ivf", nullptr },
{ 1, "invalid-oss-fuzz-10389.ivf", "invalid-oss-fuzz-10389.ivf.res.4" },
+#if !CHROMIUM && !CONFIG_SIZE_LIMIT || \
+ (CONFIG_SIZE_LIMIT && DECODE_WIDTH_LIMIT >= 5120 && \
+ DECODE_HEIGHT_LIMIT >= 180)
{ 1, "invalid-oss-fuzz-11523.ivf", "invalid-oss-fuzz-11523.ivf.res.2" },
+#endif
{ 4, "invalid-oss-fuzz-15363.ivf", nullptr },
{ 1, "invalid-oss-fuzz-16437.ivf", "invalid-oss-fuzz-16437.ivf.res.2" },
+#if CONFIG_MAX_DECODE_PROFILE >= 1
{ 1, "invalid-oss-fuzz-24706.ivf", nullptr },
+#endif
#if CONFIG_AV1_HIGHBITDEPTH
// These test vectors contain 10-bit or 12-bit video.
{ 1, "invalid-oss-fuzz-9288.ivf", nullptr },
diff --git a/test/level_test.cc b/test/level_test.cc
index 7ae1a75..cc79926 100644
--- a/test/level_test.cc
+++ b/test/level_test.cc
@@ -9,6 +9,7 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
#include <memory>
+#include <string>
#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
@@ -78,8 +79,8 @@
int level_[32];
};
-TEST_P(LevelTest, TestTargetLevelApi) {
- static aom_codec_iface_t *codec = aom_codec_av1_cx();
+TEST(LevelTest, TestTargetLevelApi) {
+ aom_codec_iface_t *codec = aom_codec_av1_cx();
aom_codec_ctx_t enc;
aom_codec_enc_cfg_t cfg;
EXPECT_EQ(AOM_CODEC_OK, aom_codec_enc_config_default(codec, &cfg, 0));
@@ -87,10 +88,10 @@
for (int operating_point = 0; operating_point <= 32; ++operating_point) {
for (int level = 0; level <= 32; ++level) {
const int target_level = operating_point * 100 + level;
- if ((level < (CONFIG_CWG_C013 ? 28 : 20) && level != 2 && level != 3 &&
- level != 6 && level != 7 && level != 10 && level != 11) ||
- level == kLevelMax || level == kLevelKeepStats ||
- operating_point > 31) {
+ if (operating_point <= 31 &&
+ ((level < (CONFIG_CWG_C013 ? 28 : 20) && level != 2 && level != 3 &&
+ level != 6 && level != 7 && level != 10 && level != 11) ||
+ level == kLevelMax || level == kLevelKeepStats)) {
EXPECT_EQ(AOM_CODEC_OK,
AOM_CODEC_CONTROL_TYPECHECKED(
&enc, AV1E_SET_TARGET_SEQ_LEVEL_IDX, target_level));
@@ -104,6 +105,23 @@
EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
}
+TEST(LevelTest, InvalidOperatingPointIndexErrorDetail) {
+ aom_codec_iface_t *codec = aom_codec_av1_cx();
+ aom_codec_ctx_t enc;
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(codec, &cfg, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_enc_init(&enc, codec, &cfg, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TARGET_SEQ_LEVEL_IDX, 3219),
+ AOM_CODEC_INVALID_PARAM);
+ EXPECT_EQ(aom_codec_error_detail(&enc),
+ std::string("Invalid operating point index: 32"));
+ EXPECT_EQ(aom_codec_set_option(&enc, "target-seq-level-idx", "3319"),
+ AOM_CODEC_INVALID_PARAM);
+ EXPECT_EQ(aom_codec_error_detail(&enc),
+ std::string("Invalid operating point index: 33"));
+ EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
+}
+
TEST_P(LevelTest, TestTargetLevel19) {
std::unique_ptr<libaom_test::VideoSource> video;
video.reset(new libaom_test::Y4mVideoSource("park_joy_90p_8_420.y4m", 0, 10));
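Note: the operand of AV1E_SET_TARGET_SEQ_LEVEL_IDX packs two fields,
target = operating_point * 100 + seq_level_idx, with operating points limited
to 0..31; the values 3219 and 3319 in the new test decode to the out-of-range
operating points 32 and 33. A small sketch of composing a valid value (the
control is real; the mapping comment reflects the AV1 level numbering):

    // seq_level_idx = (major - 2) * 4 + minor, e.g. 13 -> level 5.1.
    const int operating_point = 0;
    const int seq_level_idx = 13;
    const int target = operating_point * 100 + seq_level_idx;  // 13
    aom_codec_control(&enc, AV1E_SET_TARGET_SEQ_LEVEL_IDX, target);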
diff --git a/test/log2_test.cc b/test/log2_test.cc
index d7840c6..71cf8b2 100644
--- a/test/log2_test.cc
+++ b/test/log2_test.cc
@@ -9,6 +9,7 @@
* PATENTS file, you can obtain it at www.aomedia.org/license/patent.
*/
+#include <limits.h>
#include <math.h>
#include "aom_ports/bitops.h"
@@ -42,9 +43,9 @@
const int power_of_2 = 1 << exponent;
EXPECT_EQ(av1_ceil_log2(power_of_2 - 1), exponent);
EXPECT_EQ(av1_ceil_log2(power_of_2), exponent);
- // The current implementation of av1_ceil_log2 only works up to 2^30.
- if (exponent < 30) {
- EXPECT_EQ(av1_ceil_log2(power_of_2 + 1), exponent + 1);
- }
+ EXPECT_EQ(av1_ceil_log2(power_of_2 + 1), exponent + 1);
}
+
+ // INT_MAX = 2^31 - 1
+ EXPECT_EQ(av1_ceil_log2(INT_MAX), 31);
}
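Note: spelled out, the property the test now checks over the full int range,
using the same includes as the test above:

    #include <cassert>
    #include <climits>
    #include "aom_ports/bitops.h"

    void CeilLog2Examples() {
      assert(av1_ceil_log2(7) == 3);         // ceil(log2(7)) = 3
      assert(av1_ceil_log2(8) == 3);         // exact power of two
      assert(av1_ceil_log2(9) == 4);         // rounds up past a power of two
      assert(av1_ceil_log2(INT_MAX) == 31);  // INT_MAX = 2^31 - 1
    }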
diff --git a/test/lossless_test.cc b/test/lossless_test.cc
index c14bc06..ef4e19f 100644
--- a/test/lossless_test.cc
+++ b/test/lossless_test.cc
@@ -76,6 +76,11 @@
return AOM_CODEC_OK == res_dec;
}
+ void TestLosslessEncoding();
+ void TestLosslessEncodingVGALag0();
+ void TestLosslessEncoding444();
+ void TestLosslessEncodingCtrl();
+
private:
double psnr_;
unsigned int nframes_;
@@ -85,7 +90,7 @@
int base_qindex_;
};
-TEST_P(LosslessTestLarge, TestLossLessEncoding) {
+void LosslessTestLarge::TestLosslessEncoding() {
const aom_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
cfg_.rc_target_bitrate = 2000;
@@ -103,7 +108,24 @@
EXPECT_GE(psnr_lossless, kMaxPsnr);
}
-TEST_P(LosslessTestLarge, TestLossLessEncoding444) {
+void LosslessTestLarge::TestLosslessEncodingVGALag0() {
+ const aom_rational timebase = { 33333333, 1000000000 };
+ cfg_.g_timebase = timebase;
+ cfg_.rc_target_bitrate = 2000;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 0;
+
+ init_flags_ = AOM_CODEC_USE_PSNR;
+
+ libaom_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480,
+ timebase.den, timebase.num, 0, 30);
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ const double psnr_lossless = GetMinPsnr();
+ EXPECT_GE(psnr_lossless, kMaxPsnr);
+}
+
+void LosslessTestLarge::TestLosslessEncoding444() {
libaom_test::Y4mVideoSource video("rush_hour_444.y4m", 0, 5);
cfg_.g_profile = 1;
@@ -120,7 +142,7 @@
EXPECT_GE(psnr_lossless, kMaxPsnr);
}
-TEST_P(LosslessTestLarge, TestLossLessEncodingCtrl) {
+void LosslessTestLarge::TestLosslessEncodingCtrl() {
const aom_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
cfg_.rc_target_bitrate = 2000;
@@ -139,9 +161,23 @@
EXPECT_GE(psnr_lossless, kMaxPsnr);
}
+TEST_P(LosslessTestLarge, TestLosslessEncoding) { TestLosslessEncoding(); }
+
+TEST_P(LosslessTestLarge, TestLosslessEncodingVGALag0) {
+ TestLosslessEncodingVGALag0();
+}
+
+TEST_P(LosslessTestLarge, TestLosslessEncoding444) {
+ TestLosslessEncoding444();
+}
+
+TEST_P(LosslessTestLarge, TestLosslessEncodingCtrl) {
+ TestLosslessEncodingCtrl();
+}
+
class LosslessAllIntraTestLarge : public LosslessTestLarge {};
-TEST_P(LosslessAllIntraTestLarge, TestLossLessEncodingCtrl) {
+TEST_P(LosslessAllIntraTestLarge, TestLosslessEncodingCtrl) {
const aom_rational timebase = { 33333333, 1000000000 };
cfg_.g_timebase = timebase;
// Intentionally set Q > 0, to make sure control can be used to activate
@@ -158,6 +194,24 @@
EXPECT_GE(psnr_lossless, kMaxPsnr);
}
+using LosslessRealtimeTestLarge = LosslessTestLarge;
+
+TEST_P(LosslessRealtimeTestLarge, TestLosslessEncoding) {
+ TestLosslessEncoding();
+}
+
+TEST_P(LosslessRealtimeTestLarge, TestLosslessEncodingVGALag0) {
+ TestLosslessEncodingVGALag0();
+}
+
+TEST_P(LosslessRealtimeTestLarge, TestLosslessEncoding444) {
+ TestLosslessEncoding444();
+}
+
+TEST_P(LosslessRealtimeTestLarge, TestLosslessEncodingCtrl) {
+ TestLosslessEncodingCtrl();
+}
+
AV1_INSTANTIATE_TEST_SUITE(LosslessTestLarge,
::testing::Values(::libaom_test::kOnePassGood,
::libaom_test::kTwoPassGood),
@@ -168,4 +222,9 @@
::testing::Values(::libaom_test::kAllIntra),
::testing::Values(AOM_Q),
::testing::Values(6, 9)); // cpu_used
+
+AV1_INSTANTIATE_TEST_SUITE(LosslessRealtimeTestLarge,
+ ::testing::Values(::libaom_test::kRealTime),
+ ::testing::Values(AOM_Q, AOM_VBR, AOM_CBR, AOM_CQ),
+ ::testing::Range(6, 11)); // cpu_used
} // namespace
diff --git a/test/masked_sad_test.cc b/test/masked_sad_test.cc
index 91f7982..2ef3e4d 100644
--- a/test/masked_sad_test.cc
+++ b/test/masked_sad_test.cc
@@ -187,13 +187,30 @@
int msk_stride = MAX_SB_SIZE;
const int iters = run_times == 1 ? number_of_iterations : 1;
for (int i = 0; i < iters; ++i) {
+ if (run_times == 1 && i == 0) {
+ // The maximum accumulator value occurs when src=0 and
+ // ref/second_pred=255 (or vice versa, since we take the absolute
+ // difference). Check this case explicitly to ensure we do not overflow
+ // during accumulation.
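+ // For a 128x128 block this sums 16384 values of 255 (4,177,920 in total),
+ // stressing any narrow partial accumulators in the SIMD implementations.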
+ for (int j = 0; j < MAX_SB_SIZE * MAX_SB_SIZE; j++) {
+ src_ptr[j] = 0;
+ ref_ptr[j] = 255;
+ (ref_ptr + kBlockSize)[j] = 255;
+ (ref_ptr + 2 * kBlockSize)[j] = 255;
+ (ref_ptr + 3 * kBlockSize)[j] = 255;
+ second_pred_ptr[j] = 255;
+ }
+ } else {
+ for (int j = 0; j < MAX_SB_SIZE * MAX_SB_SIZE; j++) {
+ src_ptr[j] = rnd.Rand8();
+ ref_ptr[j] = rnd.Rand8();
+ (ref_ptr + kBlockSize)[j] = rnd.Rand8();
+ (ref_ptr + 2 * kBlockSize)[j] = rnd.Rand8();
+ (ref_ptr + 3 * kBlockSize)[j] = rnd.Rand8();
+ second_pred_ptr[j] = rnd.Rand8();
+ }
+ }
for (int j = 0; j < MAX_SB_SIZE * MAX_SB_SIZE; j++) {
- src_ptr[j] = rnd.Rand8();
- ref_ptr[j] = rnd.Rand8();
- (ref_ptr + kBlockSize)[j] = rnd.Rand8();
- (ref_ptr + 2 * kBlockSize)[j] = rnd.Rand8();
- (ref_ptr + 3 * kBlockSize)[j] = rnd.Rand8();
- second_pred_ptr[j] = rnd.Rand8();
msk_ptr[j] = ((rnd.Rand8() & 0x7f) > 64) ? rnd.Rand8() & 0x3f : 64;
assert(msk_ptr[j] <= 64);
}
@@ -505,4 +522,65 @@
#endif // CONFIG_AV1_HIGHBITDEPTH
#endif // HAVE_AVX2
+#if HAVE_NEON
+const MaskedSADParam msad_test[] = {
+ make_tuple(&aom_masked_sad4x4_neon, &aom_masked_sad4x4_c),
+ make_tuple(&aom_masked_sad4x8_neon, &aom_masked_sad4x8_c),
+ make_tuple(&aom_masked_sad8x4_neon, &aom_masked_sad8x4_c),
+ make_tuple(&aom_masked_sad8x8_neon, &aom_masked_sad8x8_c),
+ make_tuple(&aom_masked_sad8x16_neon, &aom_masked_sad8x16_c),
+ make_tuple(&aom_masked_sad16x8_neon, &aom_masked_sad16x8_c),
+ make_tuple(&aom_masked_sad16x16_neon, &aom_masked_sad16x16_c),
+ make_tuple(&aom_masked_sad16x32_neon, &aom_masked_sad16x32_c),
+ make_tuple(&aom_masked_sad32x16_neon, &aom_masked_sad32x16_c),
+ make_tuple(&aom_masked_sad32x32_neon, &aom_masked_sad32x32_c),
+ make_tuple(&aom_masked_sad32x64_neon, &aom_masked_sad32x64_c),
+ make_tuple(&aom_masked_sad64x32_neon, &aom_masked_sad64x32_c),
+ make_tuple(&aom_masked_sad64x64_neon, &aom_masked_sad64x64_c),
+ make_tuple(&aom_masked_sad64x128_neon, &aom_masked_sad64x128_c),
+ make_tuple(&aom_masked_sad128x64_neon, &aom_masked_sad128x64_c),
+ make_tuple(&aom_masked_sad128x128_neon, &aom_masked_sad128x128_c),
+#if !CONFIG_REALTIME_ONLY
+ make_tuple(&aom_masked_sad4x16_neon, &aom_masked_sad4x16_c),
+ make_tuple(&aom_masked_sad16x4_neon, &aom_masked_sad16x4_c),
+ make_tuple(&aom_masked_sad8x32_neon, &aom_masked_sad8x32_c),
+ make_tuple(&aom_masked_sad32x8_neon, &aom_masked_sad32x8_c),
+ make_tuple(&aom_masked_sad16x64_neon, &aom_masked_sad16x64_c),
+ make_tuple(&aom_masked_sad64x16_neon, &aom_masked_sad64x16_c),
+#endif
+};
+
+INSTANTIATE_TEST_SUITE_P(NEON, MaskedSADTest, ::testing::ValuesIn(msad_test));
+
+const MaskedSADx4Param msadx4_test[] = {
+ make_tuple(&aom_masked_sad4x4x4d_neon, &aom_masked_sad4x4x4d_c),
+ make_tuple(&aom_masked_sad4x8x4d_neon, &aom_masked_sad4x8x4d_c),
+ make_tuple(&aom_masked_sad8x4x4d_neon, &aom_masked_sad8x4x4d_c),
+ make_tuple(&aom_masked_sad8x8x4d_neon, &aom_masked_sad8x8x4d_c),
+ make_tuple(&aom_masked_sad8x16x4d_neon, &aom_masked_sad8x16x4d_c),
+ make_tuple(&aom_masked_sad16x8x4d_neon, &aom_masked_sad16x8x4d_c),
+ make_tuple(&aom_masked_sad16x16x4d_neon, &aom_masked_sad16x16x4d_c),
+ make_tuple(&aom_masked_sad16x32x4d_neon, &aom_masked_sad16x32x4d_c),
+ make_tuple(&aom_masked_sad32x16x4d_neon, &aom_masked_sad32x16x4d_c),
+ make_tuple(&aom_masked_sad32x32x4d_neon, &aom_masked_sad32x32x4d_c),
+ make_tuple(&aom_masked_sad32x64x4d_neon, &aom_masked_sad32x64x4d_c),
+ make_tuple(&aom_masked_sad64x32x4d_neon, &aom_masked_sad64x32x4d_c),
+ make_tuple(&aom_masked_sad64x64x4d_neon, &aom_masked_sad64x64x4d_c),
+ make_tuple(&aom_masked_sad64x128x4d_neon, &aom_masked_sad64x128x4d_c),
+ make_tuple(&aom_masked_sad128x64x4d_neon, &aom_masked_sad128x64x4d_c),
+ make_tuple(&aom_masked_sad128x128x4d_neon, &aom_masked_sad128x128x4d_c),
+#if !CONFIG_REALTIME_ONLY
+ make_tuple(&aom_masked_sad4x16x4d_neon, &aom_masked_sad4x16x4d_c),
+ make_tuple(&aom_masked_sad16x4x4d_neon, &aom_masked_sad16x4x4d_c),
+ make_tuple(&aom_masked_sad8x32x4d_neon, &aom_masked_sad8x32x4d_c),
+ make_tuple(&aom_masked_sad32x8x4d_neon, &aom_masked_sad32x8x4d_c),
+ make_tuple(&aom_masked_sad16x64x4d_neon, &aom_masked_sad16x64x4d_c),
+ make_tuple(&aom_masked_sad64x16x4d_neon, &aom_masked_sad64x16x4d_c),
+#endif
+};
+
+INSTANTIATE_TEST_SUITE_P(NEON, MaskedSADx4Test,
+ ::testing::ValuesIn(msadx4_test));
+#endif // HAVE_NEON
+
} // namespace
diff --git a/test/masked_variance_test.cc b/test/masked_variance_test.cc
index 4a4cb1a..e76403e 100644
--- a/test/masked_variance_test.cc
+++ b/test/masked_variance_test.cc
@@ -514,4 +514,59 @@
::testing::ValuesIn(hbd_sub_pel_var_test));
#endif // CONFIG_AV1_HIGHBITDEPTH
#endif // HAVE_SSSE3
+
+#if HAVE_NEON
+
+const MaskedSubPixelVarianceParam sub_pel_var_test[] = {
+ make_tuple(&aom_masked_sub_pixel_variance128x128_neon,
+ &aom_masked_sub_pixel_variance128x128_c),
+ make_tuple(&aom_masked_sub_pixel_variance128x64_neon,
+ &aom_masked_sub_pixel_variance128x64_c),
+ make_tuple(&aom_masked_sub_pixel_variance64x128_neon,
+ &aom_masked_sub_pixel_variance64x128_c),
+ make_tuple(&aom_masked_sub_pixel_variance64x64_neon,
+ &aom_masked_sub_pixel_variance64x64_c),
+ make_tuple(&aom_masked_sub_pixel_variance64x32_neon,
+ &aom_masked_sub_pixel_variance64x32_c),
+ make_tuple(&aom_masked_sub_pixel_variance32x64_neon,
+ &aom_masked_sub_pixel_variance32x64_c),
+ make_tuple(&aom_masked_sub_pixel_variance32x32_neon,
+ &aom_masked_sub_pixel_variance32x32_c),
+ make_tuple(&aom_masked_sub_pixel_variance32x16_neon,
+ &aom_masked_sub_pixel_variance32x16_c),
+ make_tuple(&aom_masked_sub_pixel_variance16x32_neon,
+ &aom_masked_sub_pixel_variance16x32_c),
+ make_tuple(&aom_masked_sub_pixel_variance16x16_neon,
+ &aom_masked_sub_pixel_variance16x16_c),
+ make_tuple(&aom_masked_sub_pixel_variance16x8_neon,
+ &aom_masked_sub_pixel_variance16x8_c),
+ make_tuple(&aom_masked_sub_pixel_variance8x16_neon,
+ &aom_masked_sub_pixel_variance8x16_c),
+ make_tuple(&aom_masked_sub_pixel_variance8x8_neon,
+ &aom_masked_sub_pixel_variance8x8_c),
+ make_tuple(&aom_masked_sub_pixel_variance8x4_neon,
+ &aom_masked_sub_pixel_variance8x4_c),
+ make_tuple(&aom_masked_sub_pixel_variance4x8_neon,
+ &aom_masked_sub_pixel_variance4x8_c),
+ make_tuple(&aom_masked_sub_pixel_variance4x4_neon,
+ &aom_masked_sub_pixel_variance4x4_c),
+#if !CONFIG_REALTIME_ONLY
+ make_tuple(&aom_masked_sub_pixel_variance64x16_neon,
+ &aom_masked_sub_pixel_variance64x16_c),
+ make_tuple(&aom_masked_sub_pixel_variance16x64_neon,
+ &aom_masked_sub_pixel_variance16x64_c),
+ make_tuple(&aom_masked_sub_pixel_variance32x8_neon,
+ &aom_masked_sub_pixel_variance32x8_c),
+ make_tuple(&aom_masked_sub_pixel_variance8x32_neon,
+ &aom_masked_sub_pixel_variance8x32_c),
+ make_tuple(&aom_masked_sub_pixel_variance16x4_neon,
+ &aom_masked_sub_pixel_variance16x4_c),
+ make_tuple(&aom_masked_sub_pixel_variance4x16_neon,
+ &aom_masked_sub_pixel_variance4x16_c),
+#endif
+};
+
+INSTANTIATE_TEST_SUITE_P(NEON_C_COMPARE, MaskedSubPixelVarianceTest,
+ ::testing::ValuesIn(sub_pel_var_test));
+#endif // HAVE_NEON
} // namespace
diff --git a/test/minmax_test.cc b/test/minmax_test.cc
new file mode 100644
index 0000000..cf67b7b
--- /dev/null
+++ b/test/minmax_test.cc
@@ -0,0 +1,244 @@
+/*
+ * Copyright (c) 2023 The WebM project authors. All Rights Reserved.
+ * Copyright (c) 2023, Alliance for Open Media. All Rights Reserved.
+ *
+ * This source code is subject to the terms of the BSD 2 Clause License and
+ * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+ * was not distributed with this source code in the LICENSE file, you can
+ * obtain it at www.aomedia.org/license/software. If the Alliance for Open
+ * Media Patent License 1.0 was not distributed with this source code in the
+ * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
+
+#include "config/aom_config.h"
+#include "config/aom_dsp_rtcd.h"
+#include "aom_ports/mem.h"
+#include "test/acm_random.h"
+#include "test/register_state_check.h"
+#include "test/util.h"
+
+namespace {
+
+using ::libaom_test::ACMRandom;
+
+typedef void (*MinMaxFunc)(const uint8_t *a, int a_stride, const uint8_t *b,
+ int b_stride, int *min, int *max);
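+
+// A MinMaxFunc finds the smallest and largest absolute difference between
+// two 8x8 blocks, where a_stride/b_stride give each block's row pitch.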
+
+class MinMaxTest : public ::testing::TestWithParam<MinMaxFunc> {
+ public:
+ virtual void SetUp() {
+ mm_func_ = GetParam();
+ rnd_.Reset(ACMRandom::DeterministicSeed());
+ }
+
+ protected:
+ MinMaxFunc mm_func_;
+ ACMRandom rnd_;
+};
+
+void reference_minmax(const uint8_t *a, int a_stride, const uint8_t *b,
+ int b_stride, int *min_ret, int *max_ret) {
+ int min = 255;
+ int max = 0;
+ for (int i = 0; i < 8; i++) {
+ for (int j = 0; j < 8; j++) {
+ const int diff = abs(a[i * a_stride + j] - b[i * b_stride + j]);
+ if (min > diff) min = diff;
+ if (max < diff) max = diff;
+ }
+ }
+
+ *min_ret = min;
+ *max_ret = max;
+}
+
+TEST_P(MinMaxTest, MinValue) {
+ for (int i = 0; i < 64; i++) {
+ uint8_t a[64], b[64];
+ memset(a, 0, sizeof(a));
+ memset(b, 255, sizeof(b));
+ b[i] = i; // Set a minimum difference of i.
+
+ int min, max;
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ EXPECT_EQ(255, max);
+ EXPECT_EQ(i, min);
+ }
+}
+
+TEST_P(MinMaxTest, MaxValue) {
+ for (int i = 0; i < 64; i++) {
+ uint8_t a[64], b[64];
+ memset(a, 0, sizeof(a));
+ memset(b, 0, sizeof(b));
+ b[i] = i; // Set a maximum difference of i.
+
+ int min, max;
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ EXPECT_EQ(i, max);
+ EXPECT_EQ(0, min);
+ }
+}
+
+TEST_P(MinMaxTest, CompareReference) {
+ uint8_t a[64], b[64];
+ for (int j = 0; j < 64; j++) {
+ a[j] = rnd_.Rand8();
+ b[j] = rnd_.Rand8();
+ }
+
+ int min_ref, max_ref, min, max;
+ reference_minmax(a, 8, b, 8, &min_ref, &max_ref);
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ EXPECT_EQ(max_ref, max);
+ EXPECT_EQ(min_ref, min);
+}
+
+TEST_P(MinMaxTest, CompareReferenceAndVaryStride) {
+ uint8_t a[8 * 64], b[8 * 64];
+ for (int i = 0; i < 8 * 64; i++) {
+ a[i] = rnd_.Rand8();
+ b[i] = rnd_.Rand8();
+ }
+ for (int a_stride = 8; a_stride <= 64; a_stride += 8) {
+ for (int b_stride = 8; b_stride <= 64; b_stride += 8) {
+ int min_ref, max_ref, min, max;
+ reference_minmax(a, a_stride, b, b_stride, &min_ref, &max_ref);
+ API_REGISTER_STATE_CHECK(mm_func_(a, a_stride, b, b_stride, &min, &max));
+ EXPECT_EQ(max_ref, max)
+ << "when a_stride = " << a_stride << " and b_stride = " << b_stride;
+ EXPECT_EQ(min_ref, min)
+ << "when a_stride = " << a_stride << " and b_stride = " << b_stride;
+ }
+ }
+}
+
+#if CONFIG_AV1_HIGHBITDEPTH
+
+using HBDMinMaxTest = MinMaxTest;
+
+void highbd_reference_minmax(const uint8_t *a, int a_stride, const uint8_t *b,
+ int b_stride, int *min_ret, int *max_ret) {
+ int min = 65535;
+ int max = 0;
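+ // High-bitdepth functions receive buffers as encoded uint8_t * handles;
+ // CONVERT_TO_SHORTPTR recovers the real uint16_t view of the data.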
+ const uint16_t *a_ptr = CONVERT_TO_SHORTPTR(a);
+ const uint16_t *b_ptr = CONVERT_TO_SHORTPTR(b);
+ for (int i = 0; i < 8; i++) {
+ for (int j = 0; j < 8; j++) {
+ const int diff = abs(a_ptr[i * a_stride + j] - b_ptr[i * b_stride + j]);
+ if (min > diff) min = diff;
+ if (max < diff) max = diff;
+ }
+ }
+
+ *min_ret = min;
+ *max_ret = max;
+}
+
+TEST_P(HBDMinMaxTest, MinValue) {
+ uint8_t *a = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ uint8_t *b = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ for (int i = 0; i < 64; i++) {
+ aom_memset16(CONVERT_TO_SHORTPTR(a), 0, 64);
+ aom_memset16(CONVERT_TO_SHORTPTR(b), 65535, 64);
+ CONVERT_TO_SHORTPTR(b)[i] = i; // Set a minimum difference of i.
+
+ int min, max;
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ EXPECT_EQ(65535, max);
+ EXPECT_EQ(i, min);
+ }
+ aom_free(CONVERT_TO_SHORTPTR(a));
+ aom_free(CONVERT_TO_SHORTPTR(b));
+}
+
+TEST_P(HBDMinMaxTest, MaxValue) {
+ uint8_t *a = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ uint8_t *b = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ for (int i = 0; i < 64; i++) {
+ aom_memset16(CONVERT_TO_SHORTPTR(a), 0, 64);
+ aom_memset16(CONVERT_TO_SHORTPTR(b), 0, 64);
+ CONVERT_TO_SHORTPTR(b)[i] = i; // Set a maximum difference of i.
+
+ int min, max;
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ EXPECT_EQ(i, max);
+ EXPECT_EQ(0, min);
+ }
+ aom_free(CONVERT_TO_SHORTPTR(a));
+ aom_free(CONVERT_TO_SHORTPTR(b));
+}
+
+TEST_P(HBDMinMaxTest, CompareReference) {
+ uint8_t *a = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ uint8_t *b = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc(64 * sizeof(uint16_t))));
+ for (int j = 0; j < 64; j++) {
+ CONVERT_TO_SHORTPTR(a)[j] = rnd_.Rand16();
+ CONVERT_TO_SHORTPTR(b)[j] = rnd_.Rand16();
+ }
+
+ int min_ref, max_ref, min, max;
+ highbd_reference_minmax(a, 8, b, 8, &min_ref, &max_ref);
+ API_REGISTER_STATE_CHECK(mm_func_(a, 8, b, 8, &min, &max));
+ aom_free(CONVERT_TO_SHORTPTR(a));
+ aom_free(CONVERT_TO_SHORTPTR(b));
+ EXPECT_EQ(max_ref, max);
+ EXPECT_EQ(min_ref, min);
+}
+
+TEST_P(HBDMinMaxTest, CompareReferenceAndVaryStride) {
+ uint8_t *a = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc((8 * 64) * sizeof(uint16_t))));
+ uint8_t *b = CONVERT_TO_BYTEPTR(
+ reinterpret_cast<uint16_t *>(aom_malloc((8 * 64) * sizeof(uint16_t))));
+ for (int i = 0; i < 8 * 64; i++) {
+ CONVERT_TO_SHORTPTR(a)[i] = rnd_.Rand16();
+ CONVERT_TO_SHORTPTR(b)[i] = rnd_.Rand16();
+ }
+ for (int a_stride = 8; a_stride <= 64; a_stride += 8) {
+ for (int b_stride = 8; b_stride <= 64; b_stride += 8) {
+ int min_ref, max_ref, min, max;
+ highbd_reference_minmax(a, a_stride, b, b_stride, &min_ref, &max_ref);
+ API_REGISTER_STATE_CHECK(mm_func_(a, a_stride, b, b_stride, &min, &max));
+ EXPECT_EQ(max_ref, max)
+ << "when a_stride = " << a_stride << " and b_stride = " << b_stride;
+ EXPECT_EQ(min_ref, min)
+ << "when a_stride = " << a_stride << " and b_stride = " << b_stride;
+ }
+ }
+ aom_free(CONVERT_TO_SHORTPTR(a));
+ aom_free(CONVERT_TO_SHORTPTR(b));
+}
+#endif // CONFIG_AV1_HIGHBITDEPTH
+
+INSTANTIATE_TEST_SUITE_P(C, MinMaxTest, ::testing::Values(&aom_minmax_8x8_c));
+#if CONFIG_AV1_HIGHBITDEPTH
+INSTANTIATE_TEST_SUITE_P(C, HBDMinMaxTest,
+ ::testing::Values(&aom_highbd_minmax_8x8_c));
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, HBDMinMaxTest,
+ ::testing::Values(&aom_highbd_minmax_8x8_neon));
+#endif
+#endif
+
+#if HAVE_SSE2
+INSTANTIATE_TEST_SUITE_P(SSE2, MinMaxTest,
+ ::testing::Values(&aom_minmax_8x8_sse2));
+#endif
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, MinMaxTest,
+ ::testing::Values(&aom_minmax_8x8_neon));
+#endif
+} // namespace
diff --git a/test/mock_ratectrl_qmode.h b/test/mock_ratectrl_qmode.h
deleted file mode 100644
index 9c9e6e8..0000000
--- a/test/mock_ratectrl_qmode.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#ifndef AOM_TEST_MOCK_RATECTRL_QMODE_H_
-#define AOM_TEST_MOCK_RATECTRL_QMODE_H_
-
-#include "av1/qmode_rc/ratectrl_qmode_interface.h"
-#include "third_party/googletest/src/googlemock/include/gmock/gmock.h"
-
-namespace aom {
-
-class MockRateControlQMode : public AV1RateControlQModeInterface {
- public:
- MOCK_METHOD(Status, SetRcParam, (const RateControlParam &rc_param),
- (override));
- MOCK_METHOD(StatusOr<GopStructList>, DetermineGopInfo,
- (const FirstpassInfo &firstpass_info), (override));
- MOCK_METHOD(StatusOr<GopEncodeInfo>, GetGopEncodeInfo,
- (const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const RefFrameTable &ref_frame_table_snapshot_init),
- (override));
- MOCK_METHOD(StatusOr<GopEncodeInfo>, GetGopEncodeInfo,
- (const GopStruct &gop_struct, const TplGopStats &tpl_gop_stats,
- const std::vector<LookaheadStats> &lookahead_stats,
- const FirstpassInfo &firstpass_info,
- const RefFrameTable &ref_frame_table_snapshot_init),
- (override));
- MOCK_METHOD(StatusOr<GopEncodeInfo>, GetTplPassGopEncodeInfo,
- (const GopStruct &gop_struct), (override));
- MOCK_METHOD(StatusOr<GopEncodeInfo>, GetTplPassGopEncodeInfo,
- (const GopStruct &gop_struct,
- const FirstpassInfo &firstpass_info),
- (override));
-};
-
-} // namespace aom
-
-#endif // AOM_TEST_MOCK_RATECTRL_QMODE_H_
diff --git a/test/noise_model_test.cc b/test/noise_model_test.cc
index e9cf9e2..650af79 100644
--- a/test/noise_model_test.cc
+++ b/test/noise_model_test.cc
@@ -36,7 +36,6 @@
return sigma * (u * sqrt(-2.0 * log(s) / s));
}
}
- return 0;
}
// Synthesizes noise using the auto-regressive filter of the given lag,
@@ -625,20 +624,20 @@
TYPED_TEST_P(NoiseModelUpdateTest, UpdateSuccessForWhiteRandomNoise) {
aom_noise_model_t &model = this->model_;
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
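+ // Local copies use lowercase names to avoid shadowing the fixture's
+ // kWidth/kHeight constants.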
+ const int width = this->kWidth;
+ const int height = this->kHeight;
const int shift = this->kBitDepth - 8;
- for (int y = 0; y < kHeight; ++y) {
- for (int x = 0; x < kWidth; ++x) {
- this->data_ptr_[0][y * kWidth + x] =
- int(64 + y + randn(&this->random_, 1)) << shift;
- this->denoised_ptr_[0][y * kWidth + x] = (64 + y) << shift;
+ for (int y = 0; y < height; ++y) {
+ for (int x = 0; x < width; ++x) {
+ this->data_ptr_[0][y * width + x] = int(64 + y + randn(&this->random_, 1))
+ << shift;
+ this->denoised_ptr_[0][y * width + x] = (64 + y) << shift;
// Make the chroma planes completely correlated with the Y plane
for (int c = 1; c < 3; ++c) {
- this->data_ptr_[c][y * kWidth + x] = this->data_ptr_[0][y * kWidth + x];
- this->denoised_ptr_[c][y * kWidth + x] =
- this->denoised_ptr_[0][y * kWidth + x];
+ this->data_ptr_[c][y * width + x] = this->data_ptr_[0][y * width + x];
+ this->denoised_ptr_[c][y * width + x] =
+ this->denoised_ptr_[0][y * width + x];
}
}
}
@@ -689,26 +688,26 @@
TYPED_TEST_P(NoiseModelUpdateTest, UpdateSuccessForScaledWhiteNoise) {
aom_noise_model_t &model = this->model_;
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
+ const int width = this->kWidth;
+ const int height = this->kHeight;
const double kCoeffEps = 0.055;
const double kLowStd = 1;
const double kHighStd = 4;
const int shift = this->kBitDepth - 8;
- for (int y = 0; y < kHeight; ++y) {
- for (int x = 0; x < kWidth; ++x) {
+ for (int y = 0; y < height; ++y) {
+ for (int x = 0; x < width; ++x) {
for (int c = 0; c < 3; ++c) {
// The image data is bimodal:
// Bottom half has low intensity and low noise strength
// Top half has high intensity and high noise strength
- const int avg = (y < kHeight / 2) ? 4 : 245;
- const double std = (y < kHeight / 2) ? kLowStd : kHighStd;
- this->data_ptr_[c][y * kWidth + x] =
+ const int avg = (y < height / 2) ? 4 : 245;
+ const double std = (y < height / 2) ? kLowStd : kHighStd;
+ this->data_ptr_[c][y * width + x] =
((uint8_t)std::min((int)255,
(int)(2 + avg + randn(&this->random_, std))))
<< shift;
- this->denoised_ptr_[c][y * kWidth + x] = (2 + avg) << shift;
+ this->denoised_ptr_[c][y * width + x] = (2 + avg) << shift;
}
}
}
@@ -766,8 +765,8 @@
TYPED_TEST_P(NoiseModelUpdateTest, UpdateSuccessForCorrelatedNoise) {
aom_noise_model_t &model = this->model_;
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
+ const int width = this->kWidth;
+ const int height = this->kHeight;
const int kNumCoeffs = 24;
const double kStd = 4;
const double kStdEps = 0.3;
@@ -797,16 +796,16 @@
const int shift = this->kBitDepth - 8;
for (int c = 0; c < 3; ++c) {
noise_synth(&this->random_, model.params.lag, model.n, model.coords,
- kCoeffs[c], this->noise_ptr_[c], kWidth, kHeight);
+ kCoeffs[c], this->noise_ptr_[c], width, height);
const int x_shift = c > 0 ? this->chroma_sub_[0] : 0;
const int y_shift = c > 0 ? this->chroma_sub_[1] : 0;
- for (int y = 0; y < (kHeight >> y_shift); ++y) {
- for (int x = 0; x < (kWidth >> x_shift); ++x) {
+ for (int y = 0; y < (height >> y_shift); ++y) {
+ for (int x = 0; x < (width >> x_shift); ++x) {
const uint8_t value = 64 + x / 2 + y / 4;
- this->data_ptr_[c][y * kWidth + x] =
- (uint8_t(value + this->noise_ptr_[c][y * kWidth + x] * kStd))
+ this->data_ptr_[c][y * width + x] =
+ (uint8_t(value + this->noise_ptr_[c][y * width + x] * kStd))
<< shift;
- this->denoised_ptr_[c][y * kWidth + x] = value << shift;
+ this->denoised_ptr_[c][y * width + x] = value << shift;
}
}
}
@@ -830,10 +829,10 @@
model.latest_state[c].eqns.x, kCoeffs[c], kNumCoeffs));
noise_synth(&this->random_, model.params.lag, model.n, model.coords,
- model.latest_state[c].eqns.x, &this->renoise_[0], kWidth,
- kHeight);
+ model.latest_state[c].eqns.x, &this->renoise_[0], width,
+ height);
- EXPECT_TRUE(aom_noise_data_validate(&this->renoise_[0], kWidth, kHeight));
+ EXPECT_TRUE(aom_noise_data_validate(&this->renoise_[0], width, height));
}
// Check fitted noise strength
@@ -850,15 +849,15 @@
TYPED_TEST_P(NoiseModelUpdateTest,
NoiseStrengthChangeSignalsDifferentNoiseType) {
aom_noise_model_t &model = this->model_;
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
- const int kBlockSize = this->kBlockSize;
+ const int width = this->kWidth;
+ const int height = this->kHeight;
+ const int block_size = this->kBlockSize;
// Create a gradient image with std = 2 uncorrelated noise
const double kStd = 2;
const int shift = this->kBitDepth - 8;
- for (int i = 0; i < kWidth * kHeight; ++i) {
- const uint8_t val = (i % kWidth) < kWidth / 2 ? 64 : 192;
+ for (int i = 0; i < width * height; ++i) {
+ const uint8_t val = (i % width) < width / 2 ? 64 : 192;
for (int c = 0; c < 3; ++c) {
this->noise_ptr_[c][i] = randn(&this->random_, 1);
this->data_ptr_[c][i] = ((uint8_t)(this->noise_ptr_[c][i] * kStd + val))
@@ -869,7 +868,7 @@
this->flat_blocks_.assign(this->flat_blocks_.size(), 1);
EXPECT_EQ(AOM_NOISE_STATUS_OK, this->NoiseModelUpdate());
- const int kNumBlocks = kWidth * kHeight / kBlockSize / kBlockSize;
+ const int kNumBlocks = width * height / block_size / block_size;
EXPECT_EQ(kNumBlocks, model.latest_state[0].strength_solver.num_equations);
EXPECT_EQ(kNumBlocks, model.latest_state[1].strength_solver.num_equations);
EXPECT_EQ(kNumBlocks, model.latest_state[2].strength_solver.num_equations);
@@ -878,8 +877,8 @@
EXPECT_EQ(kNumBlocks, model.combined_state[2].strength_solver.num_equations);
// Bump up noise by an insignificant amount
- for (int i = 0; i < kWidth * kHeight; ++i) {
- const uint8_t val = (i % kWidth) < kWidth / 2 ? 64 : 192;
+ for (int i = 0; i < width * height; ++i) {
+ const uint8_t val = (i % width) < width / 2 ? 64 : 192;
this->data_ptr_[0][i] =
((uint8_t)(this->noise_ptr_[0][i] * (kStd + 0.085) + val)) << shift;
}
@@ -899,9 +898,9 @@
// Bump up the noise strength on half the image for one channel by a
// significant amount.
- for (int i = 0; i < kWidth * kHeight; ++i) {
- const uint8_t val = (i % kWidth) < kWidth / 2 ? 64 : 128;
- if (i % kWidth < kWidth / 2) {
+ for (int i = 0; i < width * height; ++i) {
+ const uint8_t val = (i % width) < width / 2 ? 64 : 128;
+ if (i % width < width / 2) {
this->data_ptr_[0][i] =
((uint8_t)(randn(&this->random_, kStd + 0.5) + val)) << shift;
}
@@ -931,8 +930,8 @@
TYPED_TEST_P(NoiseModelUpdateTest, NoiseCoeffsSignalsDifferentNoiseType) {
aom_noise_model_t &model = this->model_;
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
+ const int width = this->kWidth;
+ const int height = this->kHeight;
const double kCoeffs[2][24] = {
{ 0.02884, -0.03356, 0.00633, 0.01757, 0.02849, -0.04620,
0.02833, -0.07178, 0.07076, -0.11603, -0.10413, -0.16571,
@@ -945,8 +944,8 @@
};
noise_synth(&this->random_, model.params.lag, model.n, model.coords,
- kCoeffs[0], this->noise_ptr_[0], kWidth, kHeight);
- for (int i = 0; i < kWidth * kHeight; ++i) {
+ kCoeffs[0], this->noise_ptr_[0], width, height);
+ for (int i = 0; i < width * height; ++i) {
this->data_ptr_[0][i] = (uint8_t)(128 + this->noise_ptr_[0][i]);
}
this->flat_blocks_.assign(this->flat_blocks_.size(), 1);
@@ -954,8 +953,8 @@
// Now try with the second set of AR coefficients
noise_synth(&this->random_, model.params.lag, model.n, model.coords,
- kCoeffs[1], this->noise_ptr_[0], kWidth, kHeight);
- for (int i = 0; i < kWidth * kHeight; ++i) {
+ kCoeffs[1], this->noise_ptr_[0], width, height);
+ for (int i = 0; i < width * height; ++i) {
this->data_ptr_[0][i] = (uint8_t)(128 + this->noise_ptr_[0][i]);
}
EXPECT_EQ(AOM_NOISE_STATUS_DIFFERENT_NOISE_TYPE, this->NoiseModelUpdate());
@@ -1313,9 +1312,9 @@
}
TYPED_TEST_P(WienerDenoiseTest, GradientTest) {
- const int kWidth = this->kWidth;
- const int kHeight = this->kHeight;
- const int kBlockSize = this->kBlockSize;
+ const int width = this->kWidth;
+ const int height = this->kHeight;
+ const int block_size = this->kBlockSize;
const uint8_t *const data_ptrs[3] = {
reinterpret_cast<uint8_t *>(&this->data_[0][0]),
reinterpret_cast<uint8_t *>(&this->data_[1][0]),
@@ -1327,34 +1326,33 @@
reinterpret_cast<uint8_t *>(&this->denoised_[2][0]),
};
const int ret = aom_wiener_denoise_2d(
- data_ptrs, denoised_ptrs, kWidth, kHeight, this->stride_,
- this->chroma_sub_, this->noise_psd_ptrs_, this->kBlockSize,
- this->kBitDepth, this->kUseHighBD);
+ data_ptrs, denoised_ptrs, width, height, this->stride_, this->chroma_sub_,
+ this->noise_psd_ptrs_, block_size, this->kBitDepth, this->kUseHighBD);
EXPECT_EQ(1, ret);
// Check the noise on the denoised image (from the analytical gradient)
// and make sure that it is less than what we added.
for (int c = 0; c < 3; ++c) {
- std::vector<double> measured_noise(kWidth * kHeight);
+ std::vector<double> measured_noise(width * height);
double var = 0;
const int shift = (c > 0);
- for (int x = 0; x < (kWidth >> shift); ++x) {
- for (int y = 0; y < (kHeight >> shift); ++y) {
+ for (int x = 0; x < (width >> shift); ++x) {
+ for (int y = 0; y < (height >> shift); ++y) {
const double diff = this->denoised_[c][y * this->stride_[c] + x] -
x * this->kScaleNoise;
var += diff * diff;
- measured_noise[y * kWidth + x] = diff;
+ measured_noise[y * width + x] = diff;
}
}
- var /= (kWidth * kHeight);
+ var /= (width * height);
const double std = sqrt(std::max(0.0, var));
EXPECT_LE(std, 1.25f * this->kScaleNoise);
if (c == 0) {
std::vector<float> measured_psd =
- get_noise_psd(&measured_noise[0], kWidth, kHeight, kBlockSize);
- std::vector<double> measured_psd_d(kBlockSize * kBlockSize);
- std::vector<double> noise_psd_d(kBlockSize * kBlockSize);
+ get_noise_psd(&measured_noise[0], width, height, block_size);
+ std::vector<double> measured_psd_d(block_size * block_size);
+ std::vector<double> noise_psd_d(block_size * block_size);
std::copy(measured_psd.begin(), measured_psd.end(),
measured_psd_d.begin());
std::copy(this->noise_psd_[0].begin(), this->noise_psd_[0].end(),
diff --git a/test/obmc_sad_test.cc b/test/obmc_sad_test.cc
index 9b70366..8d13ac1 100644
--- a/test/obmc_sad_test.cc
+++ b/test/obmc_sad_test.cc
@@ -147,6 +147,37 @@
::testing::ValuesIn(avx2_functions));
#endif // HAVE_AVX2
+#if HAVE_NEON
+const ObmcSadTest::ParamType neon_functions[] = {
+ TestFuncs(aom_obmc_sad128x128_c, aom_obmc_sad128x128_neon),
+ TestFuncs(aom_obmc_sad128x64_c, aom_obmc_sad128x64_neon),
+ TestFuncs(aom_obmc_sad64x128_c, aom_obmc_sad64x128_neon),
+ TestFuncs(aom_obmc_sad64x64_c, aom_obmc_sad64x64_neon),
+ TestFuncs(aom_obmc_sad64x32_c, aom_obmc_sad64x32_neon),
+ TestFuncs(aom_obmc_sad32x64_c, aom_obmc_sad32x64_neon),
+ TestFuncs(aom_obmc_sad32x32_c, aom_obmc_sad32x32_neon),
+ TestFuncs(aom_obmc_sad32x16_c, aom_obmc_sad32x16_neon),
+ TestFuncs(aom_obmc_sad16x32_c, aom_obmc_sad16x32_neon),
+ TestFuncs(aom_obmc_sad16x16_c, aom_obmc_sad16x16_neon),
+ TestFuncs(aom_obmc_sad16x8_c, aom_obmc_sad16x8_neon),
+ TestFuncs(aom_obmc_sad8x16_c, aom_obmc_sad8x16_neon),
+ TestFuncs(aom_obmc_sad8x8_c, aom_obmc_sad8x8_neon),
+ TestFuncs(aom_obmc_sad8x4_c, aom_obmc_sad8x4_neon),
+ TestFuncs(aom_obmc_sad4x8_c, aom_obmc_sad4x8_neon),
+ TestFuncs(aom_obmc_sad4x4_c, aom_obmc_sad4x4_neon),
+
+ TestFuncs(aom_obmc_sad64x16_c, aom_obmc_sad64x16_neon),
+ TestFuncs(aom_obmc_sad16x64_c, aom_obmc_sad16x64_neon),
+ TestFuncs(aom_obmc_sad32x8_c, aom_obmc_sad32x8_neon),
+ TestFuncs(aom_obmc_sad8x32_c, aom_obmc_sad8x32_neon),
+ TestFuncs(aom_obmc_sad16x4_c, aom_obmc_sad16x4_neon),
+ TestFuncs(aom_obmc_sad4x16_c, aom_obmc_sad4x16_neon),
+};
+
+INSTANTIATE_TEST_SUITE_P(NEON, ObmcSadTest,
+ ::testing::ValuesIn(neon_functions));
+#endif // HAVE_NEON
+
#if CONFIG_AV1_HIGHBITDEPTH
////////////////////////////////////////////////////////////////////////////////
// High bit-depth
diff --git a/test/obmc_variance_test.cc b/test/obmc_variance_test.cc
index 03b38f7..b2bf42a 100644
--- a/test/obmc_variance_test.cc
+++ b/test/obmc_variance_test.cc
@@ -127,8 +127,9 @@
const int elapsed_time_simd =
static_cast<int>(aom_usec_timer_elapsed(&test_timer));
- printf("c_time=%d \t simd_time=%d \t gain=%d \n", elapsed_time_c,
- elapsed_time_simd, (elapsed_time_c / elapsed_time_simd));
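+ // Compute the gain in floating point; integer division truncated any
+ // speedup below 2x to 1 (or to 0 when the SIMD path was slower).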
+ printf("c_time=%d \t simd_time=%d \t gain=%f \n", elapsed_time_c,
+ elapsed_time_simd,
+ static_cast<double>(elapsed_time_c) / elapsed_time_simd);
}
#if HAVE_SSE4_1
@@ -193,6 +194,37 @@
::testing::ValuesIn(avx2_functions));
#endif // HAVE_AVX2
+#if HAVE_NEON
+const ObmcVarianceTest::ParamType neon_functions[] = {
+ TestFuncs(aom_obmc_variance128x128_c, aom_obmc_variance128x128_neon),
+ TestFuncs(aom_obmc_variance128x64_c, aom_obmc_variance128x64_neon),
+ TestFuncs(aom_obmc_variance64x128_c, aom_obmc_variance64x128_neon),
+ TestFuncs(aom_obmc_variance64x64_c, aom_obmc_variance64x64_neon),
+ TestFuncs(aom_obmc_variance64x32_c, aom_obmc_variance64x32_neon),
+ TestFuncs(aom_obmc_variance32x64_c, aom_obmc_variance32x64_neon),
+ TestFuncs(aom_obmc_variance32x32_c, aom_obmc_variance32x32_neon),
+ TestFuncs(aom_obmc_variance32x16_c, aom_obmc_variance32x16_neon),
+ TestFuncs(aom_obmc_variance16x32_c, aom_obmc_variance16x32_neon),
+ TestFuncs(aom_obmc_variance16x16_c, aom_obmc_variance16x16_neon),
+ TestFuncs(aom_obmc_variance16x8_c, aom_obmc_variance16x8_neon),
+ TestFuncs(aom_obmc_variance8x16_c, aom_obmc_variance8x16_neon),
+ TestFuncs(aom_obmc_variance8x8_c, aom_obmc_variance8x8_neon),
+ TestFuncs(aom_obmc_variance8x4_c, aom_obmc_variance8x4_neon),
+ TestFuncs(aom_obmc_variance4x8_c, aom_obmc_variance4x8_neon),
+ TestFuncs(aom_obmc_variance4x4_c, aom_obmc_variance4x4_neon),
+
+ TestFuncs(aom_obmc_variance64x16_c, aom_obmc_variance64x16_neon),
+ TestFuncs(aom_obmc_variance16x64_c, aom_obmc_variance16x64_neon),
+ TestFuncs(aom_obmc_variance32x8_c, aom_obmc_variance32x8_neon),
+ TestFuncs(aom_obmc_variance8x32_c, aom_obmc_variance8x32_neon),
+ TestFuncs(aom_obmc_variance16x4_c, aom_obmc_variance16x4_neon),
+ TestFuncs(aom_obmc_variance4x16_c, aom_obmc_variance4x16_neon),
+};
+
+INSTANTIATE_TEST_SUITE_P(NEON, ObmcVarianceTest,
+ ::testing::ValuesIn(neon_functions));
+#endif // HAVE_NEON
+
////////////////////////////////////////////////////////////////////////////////
// High bit-depth
////////////////////////////////////////////////////////////////////////////////
diff --git a/test/quantize_func_test.cc b/test/quantize_func_test.cc
index 6f58898..04e8306 100644
--- a/test/quantize_func_test.cc
+++ b/test/quantize_func_test.cc
@@ -768,7 +768,7 @@
::testing::ValuesIn(kQParamArrayNEON));
#endif
-#if HAVE_SSSE3 && ARCH_X86_64
+#if HAVE_SSSE3 && AOM_ARCH_X86_64
INSTANTIATE_TEST_SUITE_P(
SSSE3, FullPrecisionQuantizeTest,
::testing::Values(
@@ -779,7 +779,7 @@
make_tuple(&aom_quantize_b_64x64_c, &aom_quantize_b_64x64_ssse3,
static_cast<TX_SIZE>(TX_64X64), TYPE_B, AOM_BITS_8)));
-#endif // HAVE_SSSE3 && ARCH_X86_64
+#endif // HAVE_SSSE3 && AOM_ARCH_X86_64
#if HAVE_AVX
INSTANTIATE_TEST_SUITE_P(
diff --git a/test/ratectrl_qmode_test.cc b/test/ratectrl_qmode_test.cc
deleted file mode 100644
index fa0c19a..0000000
--- a/test/ratectrl_qmode_test.cc
+++ /dev/null
@@ -1,1180 +0,0 @@
-/*
- * Copyright (c) 2022, Alliance for Open Media. All rights reserved
- *
- * This source code is subject to the terms of the BSD 2 Clause License and
- * the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
- * was not distributed with this source code in the LICENSE file, you can
- * obtain it at www.aomedia.org/license/software. If the Alliance for Open
- * Media Patent License 1.0 was not distributed with this source code in the
- * PATENTS file, you can obtain it at www.aomedia.org/license/patent.
- */
-
-#include "av1/qmode_rc/ratectrl_qmode.h"
-
-#include <algorithm>
-#include <array>
-#include <cerrno>
-#include <cstring>
-#include <fstream>
-#include <memory>
-#include <numeric>
-#include <random>
-#include <string>
-#include <unordered_set>
-#include <vector>
-
-#include "av1/qmode_rc/ducky_encode.h"
-#include "av1/qmode_rc/reference_manager.h"
-#include "test/mock_ratectrl_qmode.h"
-#include "test/video_source.h"
-#include "third_party/googletest/src/googlemock/include/gmock/gmock.h"
-#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
-
-namespace {
-
-using ::testing::HasSubstr;
-
-constexpr int kRefFrameTableSize = 7;
-constexpr int kFrameWidth = 352;
-constexpr int kFrameHeight = 288;
-constexpr int kFrameLimit = 250;
-
-MATCHER(IsOkStatus, "") {
- *result_listener << "with code " << arg.code
- << " and message: " << arg.message;
- return arg.ok();
-}
-
-// Reads a whitespace-delimited string from stream, and parses it as a double.
-// Returns an empty string if the entire string was successfully parsed as a
-// double, or an error messaage if not.
-std::string ReadDouble(std::istream &stream, double *value) {
- std::string word;
- stream >> word;
- if (word.empty()) {
- return "Unexpectedly reached end of input";
- }
- char *end;
- *value = std::strtod(word.c_str(), &end);
- if (*end != '\0') {
- return "Unexpected characters found: " + word;
- }
- return "";
-}
-
-void ReadFirstpassInfo(const std::string &filename,
- aom::FirstpassInfo *firstpass_info,
- const int frame_limit) {
- // These golden files are generated by the following command line:
- // ./aomenc --width=352 --height=288 --fps=30/1 --limit=250 --codec=av1
- // --cpu-used=3 --end-usage=q --cq-level=36 --threads=0 --profile=0
- // --lag-in-frames=35 --min-q=0 --max-q=63 --auto-alt-ref=1 --passes=2
- // --kf-max-dist=160 --kf-min-dist=0 --drop-frame=0
- // --static-thresh=0 --minsection-pct=0 --maxsection-pct=2000
- // --arnr-maxframes=7
- // --arnr-strength=5 --sharpness=0 --undershoot-pct=100 --overshoot-pct=100
- // --frame-parallel=0
- // --tile-columns=0 -o output.webm hantro_collage_w352h288.yuv
- // First pass stats are written out in av1_get_second_pass_params right after
- // calculate_gf_length.
- std::string path = libaom_test::GetDataPath() + "/" + filename;
- std::ifstream firstpass_stats_file(path);
- ASSERT_TRUE(firstpass_stats_file.good())
- << "Error opening " << path << ": " << std::strerror(errno);
- firstpass_info->num_mbs_16x16 =
- (kFrameWidth / 16 + 1) * (kFrameHeight / 16 + 1);
- std::string newline;
- int frame_number = 0;
- while (std::getline(firstpass_stats_file, newline) &&
- frame_number < frame_limit) {
- std::istringstream iss(newline);
- FIRSTPASS_STATS firstpass_stats_input = {};
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.frame), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.weight), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.intra_error), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.frame_avg_wavelet_energy),
- "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.coded_error), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.sr_coded_error), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.pcnt_inter), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.pcnt_motion), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.pcnt_second_ref), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.pcnt_neutral), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.intra_skip_pct), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.inactive_zone_rows), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.inactive_zone_cols), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.MVr), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.mvr_abs), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.MVc), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.mvc_abs), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.MVrv), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.MVcv), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.mv_in_out_count), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.new_mv_count), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.duration), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.count), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.raw_error_stdev), "");
- iss >> firstpass_stats_input.is_flash;
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.noise_var), "");
- ASSERT_EQ(ReadDouble(iss, &firstpass_stats_input.cor_coeff), "");
- ASSERT_TRUE(iss.eof()) << "Too many fields on line "
- << firstpass_info->stats_list.size() + 1 << "\n"
- << newline;
- firstpass_info->stats_list.push_back(firstpass_stats_input);
-
- frame_number++;
- }
-}
-} // namespace
-
-namespace aom {
-
-using ::testing::ElementsAre;
-using ::testing::Field;
-using ::testing::Return;
-
-constexpr double kErrorEpsilon = 0.000001;
-
-void TestGopDisplayOrder(const GopStruct &gop_struct) {
- // Test whether show frames' order indices are sequential
- int expected_order_idx = 0;
- int expected_show_frame_count = 0;
- for (const auto &gop_frame : gop_struct.gop_frame_list) {
- if (gop_frame.is_show_frame) {
- EXPECT_EQ(gop_frame.order_idx, expected_order_idx);
- expected_order_idx++;
- expected_show_frame_count++;
- }
- }
- EXPECT_EQ(gop_struct.show_frame_count, expected_show_frame_count);
-}
-
-void TestGopGlobalOrderIdx(const GopStruct &gop_struct,
- int global_order_idx_offset) {
- // Test whether show frames' global order indices are sequential
- EXPECT_EQ(gop_struct.global_order_idx_offset, global_order_idx_offset);
- int expected_global_order_idx = global_order_idx_offset;
- for (const auto &gop_frame : gop_struct.gop_frame_list) {
- if (gop_frame.is_show_frame) {
- EXPECT_EQ(gop_frame.global_order_idx, expected_global_order_idx);
- expected_global_order_idx++;
- }
- }
-}
-
-void TestGopGlobalCodingIdx(const GopStruct &gop_struct,
- int global_coding_idx_offset) {
- EXPECT_EQ(gop_struct.global_coding_idx_offset, global_coding_idx_offset);
- for (const auto &gop_frame : gop_struct.gop_frame_list) {
- EXPECT_EQ(gop_frame.global_coding_idx,
- global_coding_idx_offset + gop_frame.coding_idx);
- }
-}
-
-void TestColocatedShowFrame(const GopStruct &gop_struct) {
- // Test whether each non show frame has a colocated show frame
- int gop_size = static_cast<int>(gop_struct.gop_frame_list.size());
- for (int gop_idx = 0; gop_idx < gop_size; ++gop_idx) {
- auto &gop_frame = gop_struct.gop_frame_list[gop_idx];
- if (gop_frame.is_show_frame == 0) {
- bool found_colocated_ref_frame = false;
- for (int i = gop_idx + 1; i < gop_size; ++i) {
- auto &next_gop_frame = gop_struct.gop_frame_list[i];
- if (gop_frame.order_idx == next_gop_frame.order_idx) {
- found_colocated_ref_frame = true;
- EXPECT_EQ(gop_frame.update_ref_idx, next_gop_frame.colocated_ref_idx);
- EXPECT_TRUE(next_gop_frame.is_show_frame);
- }
- if (gop_frame.update_ref_idx == next_gop_frame.update_ref_idx) {
- break;
- }
- }
- EXPECT_TRUE(found_colocated_ref_frame);
- }
- }
-}
-
-void TestLayerDepth(const GopStruct &gop_struct, int max_layer_depth) {
- int gop_size = static_cast<int>(gop_struct.gop_frame_list.size());
- for (int gop_idx = 0; gop_idx < gop_size; ++gop_idx) {
- const auto &gop_frame = gop_struct.gop_frame_list[gop_idx];
- if (gop_frame.is_key_frame) {
- EXPECT_EQ(gop_frame.layer_depth, 0);
- }
-
- if (gop_frame.is_arf_frame) {
- EXPECT_LT(gop_frame.layer_depth, max_layer_depth);
- }
-
- if (!gop_frame.is_key_frame && !gop_frame.is_arf_frame) {
- EXPECT_EQ(gop_frame.layer_depth, max_layer_depth);
- }
- }
-}
-
-void TestArfInterval(const GopStruct &gop_struct) {
- std::vector<int> arf_order_idx_list;
- for (const auto &gop_frame : gop_struct.gop_frame_list) {
- if (gop_frame.is_arf_frame) {
- arf_order_idx_list.push_back(gop_frame.order_idx);
- }
- }
- std::sort(arf_order_idx_list.begin(), arf_order_idx_list.end());
- int arf_count = static_cast<int>(arf_order_idx_list.size());
- for (int i = 1; i < arf_count; ++i) {
- int arf_interval = arf_order_idx_list[i] - arf_order_idx_list[i - 1];
- EXPECT_GE(arf_interval, kMinArfInterval);
- }
-}
-
-class RateControlQModeTest : public ::testing::Test {
- protected:
- RateControlQModeTest() {
- rc_param_.max_gop_show_frame_count = 32;
- rc_param_.min_gop_show_frame_count = 4;
- rc_param_.ref_frame_table_size = 7;
- rc_param_.max_ref_frames = 7;
- rc_param_.base_q_index = 128;
- rc_param_.frame_height = kFrameHeight;
- rc_param_.frame_width = kFrameWidth;
- }
-
- RateControlParam rc_param_ = {};
-};
-
-TEST_F(RateControlQModeTest, ConstructGopARF) {
- int show_frame_count = 16;
- const bool has_key_frame = false;
- const int global_coding_idx_offset = 5;
- const int global_order_idx_offset = 20;
- RefFrameManager ref_frame_manager(kRefFrameTableSize, 7);
- GopStruct gop_struct =
- ConstructGop(&ref_frame_manager, show_frame_count, has_key_frame,
- global_coding_idx_offset, global_order_idx_offset);
- EXPECT_EQ(gop_struct.show_frame_count, show_frame_count);
- TestGopDisplayOrder(gop_struct);
- TestGopGlobalOrderIdx(gop_struct, global_order_idx_offset);
- TestGopGlobalCodingIdx(gop_struct, global_coding_idx_offset);
- TestColocatedShowFrame(gop_struct);
- const int max_layer_depth = ref_frame_manager.MaxRefFrame();
- TestLayerDepth(gop_struct, max_layer_depth);
- TestArfInterval(gop_struct);
-}
-
-TEST_F(RateControlQModeTest, ConstructGopKey) {
- const int show_frame_count = 16;
- const bool has_key_frame = true;
- const int global_coding_idx_offset = 10;
- const int global_order_idx_offset = 8;
- RefFrameManager ref_frame_manager(kRefFrameTableSize, 7);
- GopStruct gop_struct =
- ConstructGop(&ref_frame_manager, show_frame_count, has_key_frame,
- global_coding_idx_offset, global_order_idx_offset);
- EXPECT_EQ(gop_struct.show_frame_count, show_frame_count);
- TestGopDisplayOrder(gop_struct);
- TestGopGlobalOrderIdx(gop_struct, global_order_idx_offset);
- TestGopGlobalCodingIdx(gop_struct, global_coding_idx_offset);
- TestColocatedShowFrame(gop_struct);
- const int max_layer_depth = ref_frame_manager.MaxRefFrame();
- TestLayerDepth(gop_struct, max_layer_depth);
- TestArfInterval(gop_struct);
-}
-
-TEST_F(RateControlQModeTest, ConstructShortGop) {
- int show_frame_count = 2;
- const bool has_key_frame = false;
- const int global_coding_idx_offset = 5;
- const int global_order_idx_offset = 20;
- RefFrameManager ref_frame_manager(kRefFrameTableSize, 7);
- GopStruct gop_struct =
- ConstructGop(&ref_frame_manager, show_frame_count, has_key_frame,
- global_coding_idx_offset, global_order_idx_offset);
- EXPECT_EQ(gop_struct.show_frame_count, show_frame_count);
- TestGopDisplayOrder(gop_struct);
- TestGopGlobalOrderIdx(gop_struct, global_order_idx_offset);
- TestGopGlobalCodingIdx(gop_struct, global_coding_idx_offset);
- TestColocatedShowFrame(gop_struct);
- const int max_layer_depth = 1 + kLayerDepthOffset;
- TestLayerDepth(gop_struct, max_layer_depth);
- TestArfInterval(gop_struct);
-}
-
-static TplBlockStats CreateToyTplBlockStats(int h, int w, int r, int c,
- int intra_cost, int inter_cost) {
- TplBlockStats tpl_block_stats = {};
- tpl_block_stats.height = h;
- tpl_block_stats.width = w;
- tpl_block_stats.row = r;
- tpl_block_stats.col = c;
- tpl_block_stats.intra_cost = intra_cost;
- tpl_block_stats.inter_cost = inter_cost;
- tpl_block_stats.ref_frame_index = { -1, -1 };
- return tpl_block_stats;
-}
-
-static TplFrameStats CreateToyTplFrameStatsWithDiffSizes(int min_block_size,
- int max_block_size) {
- TplFrameStats frame_stats;
- const int max_h = max_block_size;
- const int max_w = max_h;
- const int count = max_block_size / min_block_size;
- frame_stats.min_block_size = min_block_size;
- frame_stats.frame_height = max_h * count;
- frame_stats.frame_width = max_w * count;
- frame_stats.rate_dist_present = false;
- for (int i = 0; i < count; ++i) {
- for (int j = 0; j < count; ++j) {
- int h = max_h >> i;
- int w = max_w >> j;
- for (int u = 0; u * h < max_h; ++u) {
- for (int v = 0; v * w < max_w; ++v) {
- int r = max_h * i + h * u;
- int c = max_w * j + w * v;
- int intra_cost = std::rand() % 16;
- TplBlockStats block_stats =
- CreateToyTplBlockStats(h, w, r, c, intra_cost, 0);
- frame_stats.block_stats_list.push_back(block_stats);
- }
- }
- }
- }
- return frame_stats;
-}
-
-static void AugmentTplFrameStatsWithRefFrames(
- TplFrameStats *tpl_frame_stats,
- const std::array<int, kBlockRefCount> &ref_frame_index) {
- for (auto &block_stats : tpl_frame_stats->block_stats_list) {
- block_stats.ref_frame_index = ref_frame_index;
- }
-}
-static void AugmentTplFrameStatsWithMotionVector(
- TplFrameStats *tpl_frame_stats,
- const std::array<MotionVector, kBlockRefCount> &mv) {
- for (auto &block_stats : tpl_frame_stats->block_stats_list) {
- block_stats.mv = mv;
- }
-}
-
-static RefFrameTable CreateToyRefFrameTable(int frame_count) {
- RefFrameTable ref_frame_table(kRefFrameTableSize);
- EXPECT_LE(frame_count, kRefFrameTableSize);
- for (int i = 0; i < frame_count; ++i) {
- ref_frame_table[i] =
- GopFrameBasic(0, 0, i, i, 0, 0, GopFrameType::kRegularLeaf);
- }
- for (int i = frame_count; i < kRefFrameTableSize; ++i) {
- ref_frame_table[i] = GopFrameInvalid();
- }
- return ref_frame_table;
-}
-
-static MotionVector CreateFullpelMv(int row, int col) {
- return { row, col, 0 };
-}
-
-double TplFrameStatsAccumulateIntraCost(const TplFrameStats &frame_stats) {
- double sum = 0;
- for (auto &block_stats : frame_stats.block_stats_list) {
- sum += block_stats.intra_cost;
- }
- return std::max(sum, 1.0);
-}
-
-TEST_F(RateControlQModeTest, CreateTplFrameDepStats) {
- TplFrameStats frame_stats = CreateToyTplFrameStatsWithDiffSizes(8, 16);
- StatusOr<TplFrameDepStats> frame_dep_stats =
- CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- ASSERT_THAT(frame_dep_stats.status(), IsOkStatus());
- EXPECT_EQ(frame_stats.min_block_size, frame_dep_stats->unit_size);
- const int unit_rows = static_cast<int>(frame_dep_stats->unit_stats.size());
- const int unit_cols = static_cast<int>(frame_dep_stats->unit_stats[0].size());
- EXPECT_EQ(frame_stats.frame_height, unit_rows * frame_dep_stats->unit_size);
- EXPECT_EQ(frame_stats.frame_width, unit_cols * frame_dep_stats->unit_size);
- const double intra_cost_sum =
- TplFrameDepStatsAccumulateIntraCost(*frame_dep_stats);
-
- const double expected_intra_cost_sum =
- TplFrameStatsAccumulateIntraCost(frame_stats);
- EXPECT_NEAR(intra_cost_sum, expected_intra_cost_sum, kErrorEpsilon);
-}
-
-TEST_F(RateControlQModeTest, BlockRowNotAMultipleOfMinBlockSizeError) {
- TplFrameStats frame_stats = CreateToyTplFrameStatsWithDiffSizes(8, 16);
- frame_stats.block_stats_list.back().row = 1;
- auto result = CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- EXPECT_FALSE(result.ok());
- EXPECT_THAT(result.status().message, HasSubstr("must be a multiple of 8"));
-}
-
-TEST_F(RateControlQModeTest, BlockPositionOutOfRangeError) {
- TplFrameStats frame_stats = CreateToyTplFrameStatsWithDiffSizes(8, 16);
- frame_stats.block_stats_list.back().row += 8;
- auto result = CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- EXPECT_FALSE(result.ok());
- EXPECT_THAT(result.status().message, HasSubstr("out of range"));
-}
-
-TEST_F(RateControlQModeTest, GetBlockOverlapArea) {
- const int size = 8;
- const int r0 = 8;
- const int c0 = 9;
- std::vector<int> r1 = { 8, 10, 16, 10, 8, 100 };
- std::vector<int> c1 = { 9, 12, 17, 5, 100, 9 };
- std::vector<int> ref_overlap = { 64, 30, 0, 24, 0, 0 };
- for (int i = 0; i < static_cast<int>(r1.size()); ++i) {
- const int overlap0 = GetBlockOverlapArea(r0, c0, r1[i], c1[i], size);
- const int overlap1 = GetBlockOverlapArea(r1[i], c1[i], r0, c0, size);
- EXPECT_EQ(overlap0, ref_overlap[i]);
- EXPECT_EQ(overlap1, ref_overlap[i]);
- }
-}
-
-TEST_F(RateControlQModeTest, TplBlockStatsToDepStats) {
- const int intra_cost = 100;
- const int inter_cost = 120;
- const int unit_count = 2;
- TplBlockStats block_stats =
- CreateToyTplBlockStats(8, 4, 0, 0, intra_cost, inter_cost);
- TplUnitDepStats unit_stats = TplBlockStatsToDepStats(block_stats, unit_count);
- double expected_intra_cost = intra_cost * 1.0 / unit_count;
- EXPECT_NEAR(unit_stats.intra_cost, expected_intra_cost, kErrorEpsilon);
- // When inter_cost >= intra_cost in block_stats, in unit_stats,
- // the inter_cost will be modified so that it's upper-bounded by intra_cost.
- EXPECT_LE(unit_stats.inter_cost, unit_stats.intra_cost);
-}
-
-TEST_F(RateControlQModeTest, TplFrameDepStatsPropagateSingleZeroMotion) {
- // cur frame with coding_idx 1 use ref frame with coding_idx 0
- const std::array<int, kBlockRefCount> ref_frame_index = { 0, -1 };
- TplFrameStats frame_stats = CreateToyTplFrameStatsWithDiffSizes(8, 16);
- AugmentTplFrameStatsWithRefFrames(&frame_stats, ref_frame_index);
-
- TplGopDepStats gop_dep_stats;
- const int frame_count = 2;
- // ref frame with coding_idx 0
- TplFrameDepStats frame_dep_stats0 =
- CreateTplFrameDepStats(frame_stats.frame_height, frame_stats.frame_width,
- frame_stats.min_block_size);
- gop_dep_stats.frame_dep_stats_list.push_back(frame_dep_stats0);
-
- // cur frame with coding_idx 1
- const StatusOr<TplFrameDepStats> frame_dep_stats1 =
- CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- ASSERT_THAT(frame_dep_stats1.status(), IsOkStatus());
- gop_dep_stats.frame_dep_stats_list.push_back(std::move(*frame_dep_stats1));
-
- const RefFrameTable ref_frame_table = CreateToyRefFrameTable(frame_count);
- TplFrameDepStatsPropagate(/*coding_idx=*/1, ref_frame_table, &gop_dep_stats);
-
- // cur frame with coding_idx 1
- const double expected_propagation_sum =
- TplFrameStatsAccumulateIntraCost(frame_stats);
-
- // ref frame with coding_idx 0
- const double propagation_sum =
- TplFrameDepStatsAccumulate(gop_dep_stats.frame_dep_stats_list[0]);
-
- // The propagation_sum between coding_idx 0 and coding_idx 1 should be equal
- // because every block in cur frame has zero motion, use ref frame with
- // coding_idx 0 for prediction, and ref frame itself is empty.
- EXPECT_NEAR(propagation_sum, expected_propagation_sum, kErrorEpsilon);
-}
-
-TEST_F(RateControlQModeTest, TplFrameDepStatsPropagateCompoundZeroMotion) {
- // cur frame with coding_idx 2 use two ref frames with coding_idx 0 and 1
- const std::array<int, kBlockRefCount> ref_frame_index = { 0, 1 };
- TplFrameStats frame_stats = CreateToyTplFrameStatsWithDiffSizes(8, 16);
- AugmentTplFrameStatsWithRefFrames(&frame_stats, ref_frame_index);
-
- TplGopDepStats gop_dep_stats;
- const int frame_count = 3;
- // ref frame with coding_idx 0
- const TplFrameDepStats frame_dep_stats0 =
- CreateTplFrameDepStats(frame_stats.frame_height, frame_stats.frame_width,
- frame_stats.min_block_size);
- gop_dep_stats.frame_dep_stats_list.push_back(frame_dep_stats0);
-
- // ref frame with coding_idx 1
- const TplFrameDepStats frame_dep_stats1 =
- CreateTplFrameDepStats(frame_stats.frame_height, frame_stats.frame_width,
- frame_stats.min_block_size);
- gop_dep_stats.frame_dep_stats_list.push_back(frame_dep_stats1);
-
- // cur frame with coding_idx 2
- const StatusOr<TplFrameDepStats> frame_dep_stats2 =
- CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- ASSERT_THAT(frame_dep_stats2.status(), IsOkStatus());
- gop_dep_stats.frame_dep_stats_list.push_back(std::move(*frame_dep_stats2));
-
- const RefFrameTable ref_frame_table = CreateToyRefFrameTable(frame_count);
- TplFrameDepStatsPropagate(/*coding_idx=*/2, ref_frame_table, &gop_dep_stats);
-
- // cur frame with coding_idx 1
- const double expected_ref_sum = TplFrameStatsAccumulateIntraCost(frame_stats);
-
- // ref frame with coding_idx 0
- const double cost_sum0 =
- TplFrameDepStatsAccumulate(gop_dep_stats.frame_dep_stats_list[0]);
- EXPECT_NEAR(cost_sum0, expected_ref_sum * 0.5, kErrorEpsilon);
-
- // ref frame with coding_idx 1
- const double cost_sum1 =
- TplFrameDepStatsAccumulate(gop_dep_stats.frame_dep_stats_list[1]);
- EXPECT_NEAR(cost_sum1, expected_ref_sum * 0.5, kErrorEpsilon);
-}
-
-TEST_F(RateControlQModeTest, TplFrameDepStatsPropagateSingleWithMotion) {
- // cur frame with coding_idx 1 use ref frame with coding_idx 0
- const std::array<int, kBlockRefCount> ref_frame_index = { 0, -1 };
- const int min_block_size = 8;
- TplFrameStats frame_stats =
- CreateToyTplFrameStatsWithDiffSizes(min_block_size, min_block_size * 2);
- AugmentTplFrameStatsWithRefFrames(&frame_stats, ref_frame_index);
-
- const int mv_row = min_block_size / 2;
- const int mv_col = min_block_size / 4;
- const double r_ratio = 1.0 / 2;
- const double c_ratio = 1.0 / 4;
- std::array<MotionVector, kBlockRefCount> mv;
- mv[0] = CreateFullpelMv(mv_row, mv_col);
- mv[1] = CreateFullpelMv(0, 0);
- AugmentTplFrameStatsWithMotionVector(&frame_stats, mv);
-
- TplGopDepStats gop_dep_stats;
- const int frame_count = 2;
- // ref frame with coding_idx 0
- gop_dep_stats.frame_dep_stats_list.push_back(
- CreateTplFrameDepStats(frame_stats.frame_height, frame_stats.frame_width,
- frame_stats.min_block_size));
-
- // cur frame with coding_idx 1
- const StatusOr<TplFrameDepStats> frame_dep_stats =
- CreateTplFrameDepStatsWithoutPropagation(frame_stats);
- ASSERT_THAT(frame_dep_stats.status(), IsOkStatus());
- gop_dep_stats.frame_dep_stats_list.push_back(std::move(*frame_dep_stats));
-
- const RefFrameTable ref_frame_table = CreateToyRefFrameTable(frame_count);
- TplFrameDepStatsPropagate(/*coding_idx=*/1, ref_frame_table, &gop_dep_stats);
-
- const auto &dep_stats0 = gop_dep_stats.frame_dep_stats_list[0];
- const auto &dep_stats1 = gop_dep_stats.frame_dep_stats_list[1];
- const int unit_rows = static_cast<int>(dep_stats0.unit_stats.size());
- const int unit_cols = static_cast<int>(dep_stats0.unit_stats[0].size());
- for (int r = 0; r < unit_rows; ++r) {
- for (int c = 0; c < unit_cols; ++c) {
- double ref_value = 0;
- ref_value += (1 - r_ratio) * (1 - c_ratio) *
- dep_stats1.unit_stats[r][c].intra_cost;
- if (r - 1 >= 0) {
- ref_value += r_ratio * (1 - c_ratio) *
- dep_stats1.unit_stats[r - 1][c].intra_cost;
- }
- if (c - 1 >= 0) {
- ref_value += (1 - r_ratio) * c_ratio *
- dep_stats1.unit_stats[r][c - 1].intra_cost;
- }
- if (r - 1 >= 0 && c - 1 >= 0) {
- ref_value +=
- r_ratio * c_ratio * dep_stats1.unit_stats[r - 1][c - 1].intra_cost;
- }
- EXPECT_NEAR(dep_stats0.unit_stats[r][c].propagation_cost, ref_value,
- kErrorEpsilon);
- }
- }
-}
-
-// TODO(jianj): Add tests for non empty lookahead stats.
-TEST_F(RateControlQModeTest, ComputeTplGopDepStats) {
- TplGopStats tpl_gop_stats;
- std::vector<RefFrameTable> ref_frame_table_list;
- GopStruct gop_struct;
- gop_struct.show_frame_count = 3;
- for (int i = 0; i < 3; i++) {
- // Use the previous frame as reference
- const std::array<int, kBlockRefCount> ref_frame_index = { i - 1, -1 };
- int min_block_size = 8;
- TplFrameStats frame_stats =
- CreateToyTplFrameStatsWithDiffSizes(min_block_size, min_block_size * 2);
- AugmentTplFrameStatsWithRefFrames(&frame_stats, ref_frame_index);
- tpl_gop_stats.frame_stats_list.push_back(frame_stats);
-
- ref_frame_table_list.push_back(CreateToyRefFrameTable(i));
- }
- const StatusOr<TplGopDepStats> gop_dep_stats =
- ComputeTplGopDepStats(tpl_gop_stats, {}, ref_frame_table_list);
- ASSERT_THAT(gop_dep_stats.status(), IsOkStatus());
-
- double expected_sum = 0;
- for (int i = 2; i >= 0; i--) {
- // Due to the linear propagation with zero motion, we can accumulate the
- // frame_stats intra_cost and use it as expected sum for dependency stats
- expected_sum +=
- TplFrameStatsAccumulateIntraCost(tpl_gop_stats.frame_stats_list[i]);
- const double sum =
- TplFrameDepStatsAccumulate(gop_dep_stats->frame_dep_stats_list[i]);
- EXPECT_NEAR(sum, expected_sum, kErrorEpsilon);
- break;
- }
-}
-
-TEST(RefFrameManagerTest, GetRefFrameCount) {
- const std::vector<int> order_idx_list = { 0, 4, 2, 1, 2, 3, 4 };
- const std::vector<GopFrameType> type_list = {
- GopFrameType::kRegularKey,
- GopFrameType::kRegularArf,
- GopFrameType::kIntermediateArf,
- GopFrameType::kRegularLeaf,
- GopFrameType::kIntermediateOverlay,
- GopFrameType::kRegularLeaf,
- GopFrameType::kOverlay
- };
- RefFrameManager ref_manager(kRefFrameTableSize, 7);
- int coding_idx = 0;
- const int first_leaf_idx = 3;
- EXPECT_EQ(type_list[first_leaf_idx], GopFrameType::kRegularLeaf);
- // Update the reference frame table until we see the first kRegularLeaf frame
- for (; coding_idx <= first_leaf_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
- EXPECT_EQ(ref_manager.GetRefFrameCount(), 4);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 2);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 1);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 1);
- EXPECT_EQ(ref_manager.CurGlobalOrderIdx(), 1);
-
- // Update the reference frame table until we see the first kShowExisting frame
- const int first_show_existing_idx = 4;
- EXPECT_EQ(type_list[first_show_existing_idx],
- GopFrameType::kIntermediateOverlay);
- for (; coding_idx <= first_show_existing_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
- EXPECT_EQ(ref_manager.GetRefFrameCount(), 4);
- EXPECT_EQ(ref_manager.CurGlobalOrderIdx(), 2);
- // After the first kShowExisting, the kIntermediateArf should be moved from
- // kForward to kBackward due to the cur_global_order_idx_ update
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 1);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 2);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 1);
-
- const int second_leaf_idx = 5;
- EXPECT_EQ(type_list[second_leaf_idx], GopFrameType::kRegularLeaf);
- for (; coding_idx <= second_leaf_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
- EXPECT_EQ(ref_manager.GetRefFrameCount(), 5);
- EXPECT_EQ(ref_manager.CurGlobalOrderIdx(), 3);
- // An additional kRegularLeaf frame is added to kLast
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 1);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 2);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 2);
-
- const int first_overlay_idx = 6;
- EXPECT_EQ(type_list[first_overlay_idx], GopFrameType::kOverlay);
- for (; coding_idx <= first_overlay_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
-
- EXPECT_EQ(ref_manager.GetRefFrameCount(), 5);
- EXPECT_EQ(ref_manager.CurGlobalOrderIdx(), 4);
- // After the kOverlay, the kRegularArf should be moved from
- // kForward to kBackward due to the cur_global_order_idx_ update
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 0);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 3);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 2);
-}
-
-void TestRefFrameManagerPriority(const RefFrameManager &ref_manager,
- RefUpdateType type) {
- int ref_count = ref_manager.GetRefFrameCountByType(type);
- int prev_global_order_idx = ref_manager.CurGlobalOrderIdx();
- // The lower the priority, the closer gop_frame.global_order_idx should be
- // to cur_global_order_idx_, with the exception of a base layer ARF.
- for (int priority = 0; priority < ref_count; ++priority) {
- GopFrame gop_frame = ref_manager.GetRefFrameByPriority(type, priority);
- EXPECT_EQ(gop_frame.is_valid, true);
- if (type == RefUpdateType::kForward) {
- if (priority == 0) continue;
- EXPECT_GE(gop_frame.global_order_idx, prev_global_order_idx);
- } else {
- EXPECT_LE(gop_frame.global_order_idx, prev_global_order_idx);
- }
- prev_global_order_idx = gop_frame.global_order_idx;
- }
- GopFrame gop_frame =
- ref_manager.GetRefFrameByPriority(RefUpdateType::kForward, ref_count);
- EXPECT_EQ(gop_frame.is_valid, false);
-}
-
-TEST(RefFrameManagerTest, GetRefFrameByPriority) {
- const std::vector<int> order_idx_list = { 0, 4, 2, 1, 2, 3, 4 };
- const std::vector<GopFrameType> type_list = {
- GopFrameType::kRegularKey,
- GopFrameType::kRegularArf,
- GopFrameType::kIntermediateArf,
- GopFrameType::kRegularLeaf,
- GopFrameType::kIntermediateOverlay,
- GopFrameType::kRegularLeaf,
- GopFrameType::kOverlay
- };
- RefFrameManager ref_manager(kRefFrameTableSize, 7);
- int coding_idx = 0;
- const int first_leaf_idx = 3;
- EXPECT_EQ(type_list[first_leaf_idx], GopFrameType::kRegularLeaf);
- // Update the reference frame table until we see the first kRegularLeaf frame
- for (; coding_idx <= first_leaf_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 2);
- TestRefFrameManagerPriority(ref_manager, RefUpdateType::kForward);
-
- const int first_overlay_idx = 6;
- EXPECT_EQ(type_list[first_overlay_idx], GopFrameType::kOverlay);
- for (; coding_idx <= first_overlay_idx; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
-
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 3);
- TestRefFrameManagerPriority(ref_manager, RefUpdateType::kBackward);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 2);
- TestRefFrameManagerPriority(ref_manager, RefUpdateType::kLast);
-}
-
-TEST(RefFrameManagerTest, GetRefFrameListByPriority) {
- const std::vector<int> order_idx_list = { 0, 4, 2, 1 };
- const int frame_count = static_cast<int>(order_idx_list.size());
- const std::vector<GopFrameType> type_list = { GopFrameType::kRegularKey,
- GopFrameType::kRegularArf,
- GopFrameType::kIntermediateArf,
- GopFrameType::kRegularLeaf };
- RefFrameManager ref_manager(kRefFrameTableSize, 7);
- for (int coding_idx = 0; coding_idx < frame_count; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx], 0, 0,
- type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
- EXPECT_EQ(ref_manager.GetRefFrameCount(), frame_count);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kForward), 2);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kBackward), 1);
- EXPECT_EQ(ref_manager.GetRefFrameCountByType(RefUpdateType::kLast), 1);
- std::vector<ReferenceFrame> ref_frame_list =
- ref_manager.GetRefFrameListByPriority();
- EXPECT_EQ(ref_frame_list.size(), order_idx_list.size());
- std::vector<int> expected_global_order_idx = { 4, 0, 1, 2 };
- std::vector<ReferenceName> expected_names = { ReferenceName::kAltrefFrame,
- ReferenceName::kGoldenFrame,
- ReferenceName::kLastFrame,
- ReferenceName::kBwdrefFrame };
- for (size_t i = 0; i < ref_frame_list.size(); ++i) {
- ReferenceFrame &ref_frame = ref_frame_list[i];
- GopFrame gop_frame = ref_manager.GetRefFrameByIndex(ref_frame.index);
- EXPECT_EQ(gop_frame.global_order_idx, expected_global_order_idx[i]);
- EXPECT_EQ(ref_frame.name, expected_names[i]);
- }
-}
-
-TEST(RefFrameManagerTest, GetPrimaryRefFrame) {
- const std::vector<int> order_idx_list = { 0, 4, 2, 1 };
- const int frame_count = static_cast<int>(order_idx_list.size());
- const std::vector<GopFrameType> type_list = { GopFrameType::kRegularKey,
- GopFrameType::kRegularArf,
- GopFrameType::kIntermediateArf,
- GopFrameType::kRegularLeaf };
- const std::vector<int> layer_depth_list = { 0, 2, 4, 6 };
- RefFrameManager ref_manager(kRefFrameTableSize, 7);
- for (int coding_idx = 0; coding_idx < frame_count; ++coding_idx) {
- GopFrame gop_frame =
- GopFrameBasic(0, 0, coding_idx, order_idx_list[coding_idx],
- layer_depth_list[coding_idx], 0, type_list[coding_idx]);
- ref_manager.UpdateRefFrameTable(&gop_frame);
- }
-
- for (int i = 0; i < frame_count; ++i) {
- // Test a frame that shares the same layer depth with a reference frame
- int layer_depth = layer_depth_list[i];
- // Set different frame type
- GopFrameType type = type_list[(i + 1) % frame_count];
- GopFrame gop_frame = GopFrameBasic(0, 0, 0, 0, layer_depth, 0, type);
- gop_frame.ref_frame_list = ref_manager.GetRefFrameListByPriority();
- ReferenceFrame ref_frame = ref_manager.GetPrimaryRefFrame(gop_frame);
- GopFrame primary_ref_frame =
- ref_manager.GetRefFrameByIndex(ref_frame.index);
- // GetPrimaryRefFrame should find the ref_frame with a matching layer depth
- // because that is its first priority
- EXPECT_EQ(primary_ref_frame.layer_depth, gop_frame.layer_depth);
- }
-
- const std::vector<int> mid_layer_depth_list = { 1, 3, 5 };
- for (int i = 0; i < 3; ++i) {
- // Test a frame that shares the same frame type with a reference frame
- GopFrameType type = type_list[i];
- // Let the frame layer_depth sit midway between two reference frames
- int layer_depth = mid_layer_depth_list[i];
- GopFrame gop_frame = GopFrameBasic(0, 0, 0, 0, layer_depth, 0, type);
- gop_frame.ref_frame_list = ref_manager.GetRefFrameListByPriority();
- ReferenceFrame ref_frame = ref_manager.GetPrimaryRefFrame(gop_frame);
- GopFrame primary_ref_frame =
- ref_manager.GetRefFrameByIndex(ref_frame.index);
- // GetPrimaryRefFrame should find the ref_frame with a matching frame type.
- // Here we use coding_idx to confirm that.
- EXPECT_EQ(primary_ref_frame.coding_idx, i);
- }
-}
-
-TEST_F(RateControlQModeTest, TestKeyframeDetection) {
- FirstpassInfo firstpass_info;
- const std::string kFirstpassStatsFile = "firstpass_stats";
- ASSERT_NO_FATAL_FAILURE(
- ReadFirstpassInfo(kFirstpassStatsFile, &firstpass_info, kFrameLimit));
- EXPECT_THAT(GetKeyFrameList(firstpass_info),
- ElementsAre(0, 30, 60, 90, 120, 150, 180, 210, 240));
-}
-
-MATCHER_P(GopFrameMatches, expected, "") {
-#define COMPARE_FIELD(FIELD) \
- do { \
- if (arg.FIELD != expected.FIELD) { \
- *result_listener << "where " #FIELD " is " << arg.FIELD \
- << " but should be " << expected.FIELD; \
- return false; \
- } \
- } while (0)
- COMPARE_FIELD(is_valid);
- COMPARE_FIELD(order_idx);
- COMPARE_FIELD(coding_idx);
- COMPARE_FIELD(global_order_idx);
- COMPARE_FIELD(global_coding_idx);
- COMPARE_FIELD(is_key_frame);
- COMPARE_FIELD(is_arf_frame);
- COMPARE_FIELD(is_show_frame);
- COMPARE_FIELD(is_golden_frame);
- COMPARE_FIELD(colocated_ref_idx);
- COMPARE_FIELD(update_ref_idx);
- COMPARE_FIELD(layer_depth);
-#undef COMPARE_FIELD
-
- return true;
-}
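GopFrameMatches wraps the field-by-field comparison in a gmock MATCHER_P so a mismatch reports the offending field by name (e.g. "where coding_idx is 3 but should be 5") instead of an opaque struct comparison. A usage sketch, assuming the frames and table are in scope:

// Compare a single frame, or a whole reference table element-wise.
EXPECT_THAT(actual_frame, GopFrameMatches(expected_frame));
EXPECT_THAT(actual_table,
            ElementsAre(GopFrameMatches(frame_a), GopFrameMatches(frame_b)));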
-
-// Helper for tests which need to set update_ref_idx, but for which the indices
-// and depth don't matter (other than to allow creating multiple GopFrames which
-// are distinguishable).
-GopFrame GopFrameUpdateRefIdx(int index, GopFrameType gop_frame_type,
- int update_ref_idx) {
- GopFrame frame =
- GopFrameBasic(0, 0, index, index, /*depth=*/0, 0, gop_frame_type);
- frame.update_ref_idx = update_ref_idx;
- return frame;
-}
-
-TEST_F(RateControlQModeTest, TestInvalidRateControlParam) {
- // Default constructed RateControlParam should not be valid.
- RateControlParam rc_param = {};
- EXPECT_NE(AV1RateControlQMode().SetRcParam(rc_param).code, AOM_CODEC_OK);
-}
-
-TEST_F(RateControlQModeTest, TestInvalidMaxGopShowFrameCount) {
- rc_param_.min_gop_show_frame_count = 2;
- rc_param_.max_gop_show_frame_count = 3;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("max_gop_show_frame_count (3) must be at least 4"));
-}
-
-TEST_F(RateControlQModeTest, TestInvalidMinGopShowFrameCount) {
- rc_param_.min_gop_show_frame_count = 9;
- rc_param_.max_gop_show_frame_count = 8;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("may not be less than min_gop_show_frame_count (9)"));
-}
-
-TEST_F(RateControlQModeTest, TestInvalidRefFrameTableSize) {
- rc_param_.ref_frame_table_size = 9;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("ref_frame_table_size (9) must be in the range"));
-}
-
-TEST_F(RateControlQModeTest, TestInvalidMaxRefFrames) {
- rc_param_.max_ref_frames = 8;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("max_ref_frames (8) must be in the range"));
-}
-
-TEST_F(RateControlQModeTest, TestInvalidBaseQIndex) {
- rc_param_.base_q_index = 256;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("base_q_index (256) must be in the range"));
-}
-
-TEST_F(RateControlQModeTest, TestInvalidFrameHeight) {
- rc_param_.frame_height = 15;
- Status status = AV1RateControlQMode().SetRcParam(rc_param_);
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("frame_height (15) must be in the range"));
-}
-
-TEST_F(RateControlQModeTest, TestGetRefFrameTableListFirstGop) {
- AV1RateControlQMode rc;
- rc_param_.ref_frame_table_size = 3;
- ASSERT_THAT(rc.SetRcParam(rc_param_), IsOkStatus());
-
- const auto invalid = GopFrameInvalid();
- const auto frame0 = GopFrameUpdateRefIdx(0, GopFrameType::kRegularKey, -1);
- const auto frame1 = GopFrameUpdateRefIdx(1, GopFrameType::kRegularLeaf, 2);
- const auto frame2 = GopFrameUpdateRefIdx(2, GopFrameType::kRegularLeaf, 0);
-
- const auto matches_invalid = GopFrameMatches(invalid);
- const auto matches_frame0 = GopFrameMatches(frame0);
- const auto matches_frame1 = GopFrameMatches(frame1);
- const auto matches_frame2 = GopFrameMatches(frame2);
-
- GopStruct gop_struct;
- gop_struct.global_coding_idx_offset = 0; // This is the first GOP.
- gop_struct.gop_frame_list = { frame0, frame1, frame2 };
- ASSERT_THAT(
- // For the first GOP only, GetRefFrameTableList can be passed a
- // default-constructed RefFrameTable (because it's all going to be
- // replaced by the key frame anyway).
- rc.GetRefFrameTableList(gop_struct, {}, RefFrameTable()),
- ElementsAre(
- ElementsAre(matches_invalid, matches_invalid, matches_invalid),
- ElementsAre(matches_frame0, matches_frame0, matches_frame0),
- ElementsAre(matches_frame0, matches_frame0, matches_frame1),
- ElementsAre(matches_frame2, matches_frame0, matches_frame1)));
-}
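The four expected rows trace the reference table as each frame is coded. A sketch of the update rule they imply (inferred from this test's expectations, not taken from the library source; it assumes RefFrameTable behaves like std::vector<GopFrame>, as its use in this file suggests):

// Key frames refresh every slot; other frames overwrite only the slot named
// by update_ref_idx, or nothing when it is -1.
RefFrameTable NextTable(RefFrameTable table, const GopFrame &frame) {
  if (frame.is_key_frame) {
    for (GopFrame &slot : table) slot = frame;
  } else if (frame.update_ref_idx >= 0) {
    table[frame.update_ref_idx] = frame;
  }
  return table;
}

Applying this rule to frame0 (key), frame1 (update_ref_idx 2), and frame2 (update_ref_idx 0) reproduces exactly the rows matched above.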
-
-TEST_F(RateControlQModeTest, TestGetRefFrameTableListNotFirstGop) {
- AV1RateControlQMode rc;
- rc_param_.ref_frame_table_size = 3;
- ASSERT_THAT(rc.SetRcParam(rc_param_), IsOkStatus());
-
- const auto previous = GopFrameUpdateRefIdx(0, GopFrameType::kRegularKey, -1);
- const auto frame0 = GopFrameUpdateRefIdx(5, GopFrameType::kRegularLeaf, 2);
- const auto frame1 = GopFrameUpdateRefIdx(6, GopFrameType::kRegularLeaf, -1);
- const auto frame2 = GopFrameUpdateRefIdx(7, GopFrameType::kRegularLeaf, 0);
-
- // Frames in the initial table should have coding_idx of -1
- // to prevent propagating TPL stats to already coded frames.
- auto previous_modified = previous;
- previous_modified.coding_idx = -1;
- const auto matches_previous = GopFrameMatches(previous_modified);
- const auto matches_frame0 = GopFrameMatches(frame0);
- const auto matches_frame2 = GopFrameMatches(frame2);
-
- GopStruct gop_struct;
- gop_struct.global_coding_idx_offset = 5; // This is not the first GOP.
- gop_struct.gop_frame_list = { frame0, frame1, frame2 };
- ASSERT_THAT(
- rc.GetRefFrameTableList(gop_struct, {}, RefFrameTable(3, previous)),
- ElementsAre(
- ElementsAre(matches_previous, matches_previous, matches_previous),
- ElementsAre(matches_previous, matches_previous, matches_frame0),
- ElementsAre(matches_previous, matches_previous, matches_frame0),
- ElementsAre(matches_frame2, matches_previous, matches_frame0)));
-}
-
-TEST_F(RateControlQModeTest, TestGopIntervals) {
- FirstpassInfo firstpass_info;
- ASSERT_NO_FATAL_FAILURE(
- ReadFirstpassInfo("firstpass_stats", &firstpass_info, kFrameLimit));
- AV1RateControlQMode rc;
- ASSERT_THAT(rc.SetRcParam(rc_param_), IsOkStatus());
-
- const auto gop_info = rc.DetermineGopInfo(firstpass_info);
- ASSERT_THAT(gop_info.status(), IsOkStatus());
- std::vector<int> gop_interval_list;
- std::transform(gop_info->begin(), gop_info->end(),
- std::back_inserter(gop_interval_list),
- [](GopStruct const &x) { return x.show_frame_count; });
- EXPECT_THAT(gop_interval_list,
- ElementsAre(21, 9, 30, 30, 16, 14, 21, 9, 30, 12, 16, 2, 30, 10));
-}
-
-// TODO(b/242892473): Add a test which passes lookahead GOPs.
-TEST_F(RateControlQModeTest, TestGetGopEncodeInfo) {
- FirstpassInfo firstpass_info;
- ASSERT_NO_FATAL_FAILURE(
- ReadFirstpassInfo("firstpass_stats", &firstpass_info, 50));
- AV1RateControlQMode rc;
- rc_param_.max_gop_show_frame_count = 16;
- rc_param_.max_ref_frames = 3;
- rc_param_.base_q_index = 117;
- ASSERT_THAT(rc.SetRcParam(rc_param_), IsOkStatus());
- const auto gop_info = rc.DetermineGopInfo(firstpass_info);
- ASSERT_THAT(gop_info.status(), IsOkStatus());
- const GopStructList &gop_list = *gop_info;
- const aom_rational_t frame_rate = { 30, 1 };
- const aom::VideoInfo input_video = {
- kFrameWidth, kFrameHeight,
- frame_rate, AOM_IMG_FMT_I420,
- 50, libaom_test::GetDataPath() + "/hantro_collage_w352h288.yuv"
- };
- DuckyEncode ducky_encode(input_video, BLOCK_64X64, rc_param_.max_ref_frames,
- 3, rc_param_.base_q_index);
-
- std::vector<aom::GopEncodeInfo> gop_encode_info_list;
- for (const auto &gop_struct : gop_list) {
- const auto gop_encode_info =
- rc.GetTplPassGopEncodeInfo(gop_struct, firstpass_info);
- ASSERT_TRUE(gop_encode_info.ok());
- gop_encode_info_list.push_back(gop_encode_info.value());
- }
-
- // Read TPL stats
- std::vector<TplGopStats> tpl_gop_list = ducky_encode.ComputeTplStats(
- firstpass_info.stats_list, gop_list, gop_encode_info_list);
-
- RefFrameTable ref_frame_table;
- int num_gop_skipped = 0;
- for (size_t gop_idx = 0; gop_idx < gop_list.size(); gop_idx++) {
- size_t tpl_gop_idx = gop_idx - num_gop_skipped;
- const auto gop_encode_info =
- rc.GetGopEncodeInfo(gop_list[gop_idx], tpl_gop_list[tpl_gop_idx], {},
- firstpass_info, ref_frame_table);
- ASSERT_THAT(gop_encode_info.status(), IsOkStatus());
- for (auto &frame_param : gop_encode_info->param_list) {
- EXPECT_LE(frame_param.q_index, rc_param_.base_q_index);
- }
- ref_frame_table = gop_encode_info->final_snapshot;
- for (auto &gop_frame : ref_frame_table) {
- EXPECT_LE(static_cast<int>(gop_frame.ref_frame_list.size()),
- rc_param_.max_ref_frames);
- }
- }
-}
-
-TEST_F(RateControlQModeTest, GetGopEncodeInfoWrongGopSize) {
- GopStruct gop_struct;
- gop_struct.gop_frame_list.assign(7, GopFrameInvalid());
- TplGopStats tpl_gop_stats;
- tpl_gop_stats.frame_stats_list.assign(
- 5, CreateToyTplFrameStatsWithDiffSizes(8, 8));
- AV1RateControlQMode rc;
- const Status status =
- rc.GetGopEncodeInfo(gop_struct, tpl_gop_stats, {}, {}, RefFrameTable())
- .status();
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("Frame count of GopStruct (7) doesn't match frame "
- "count of TPL stats (5)"));
-}
-
-TEST_F(RateControlQModeTest, GetGopEncodeInfoRefFrameMissingBlockStats) {
- GopStruct gop_struct;
- // Frames 0 and 2 are reference frames.
- gop_struct.gop_frame_list = {
- GopFrameUpdateRefIdx(0, GopFrameType::kRegularKey, 1),
- GopFrameUpdateRefIdx(1, GopFrameType::kRegularLeaf, -1),
- GopFrameUpdateRefIdx(2, GopFrameType::kRegularLeaf, 2),
- };
- gop_struct.show_frame_count = 3;
-
- // Only frame 0 has TPL block stats.
- TplGopStats tpl_gop_stats;
- tpl_gop_stats.frame_stats_list.assign(3, { 8, 176, 144, false, {}, {} });
- tpl_gop_stats.frame_stats_list[0] = CreateToyTplFrameStatsWithDiffSizes(8, 8);
-
- AV1RateControlQMode rc;
- const Status status =
- rc.GetGopEncodeInfo(gop_struct, tpl_gop_stats, {}, {}, RefFrameTable())
- .status();
- EXPECT_EQ(status.code, AOM_CODEC_INVALID_PARAM);
- EXPECT_THAT(status.message,
- HasSubstr("The frame with global_coding_idx 2 is a reference "
- "frame, but has no TPL stats"));
-}
-
- // MockRateControlQMode is provided for use by clients of libaom, but it is
- // not expected to be used in any real libaom tests.
-// This simple "toy" test exists solely to verify the integration of gmock into
-// the aom build.
-TEST_F(RateControlQModeTest, TestMock) {
- MockRateControlQMode mock_rc;
- EXPECT_CALL(mock_rc,
- DetermineGopInfo(Field(&FirstpassInfo::num_mbs_16x16, 1000)))
- .WillOnce(Return(aom::Status{ AOM_CODEC_ERROR, "message" }));
- FirstpassInfo firstpass_info = {};
- firstpass_info.num_mbs_16x16 = 1000;
- const auto result = mock_rc.DetermineGopInfo(firstpass_info);
- EXPECT_EQ(result.status().code, AOM_CODEC_ERROR);
- EXPECT_EQ(result.status().message, "message");
-}
-
-TEST_F(RateControlQModeTest, TestKMeans) {
- // The distance between the intended centroids is chosen so that each
- // cluster is far enough from the others.
- std::vector<int> centroids_ref = { 16, 48, 80, 112, 144, 176, 208, 240 };
- std::vector<uint8_t> random_input;
- const int num_sample_per_cluster = 10;
- const int num_clusters = 8;
- std::default_random_engine generator;
- for (const int centroid : centroids_ref) {
- // This keeps each cluster far enough from the others.
- std::uniform_int_distribution<int> distribution(centroid - 8, centroid + 8);
- for (int i = 0; i < num_sample_per_cluster; ++i) {
- const int random_sample = distribution(generator);
- random_input.push_back(static_cast<uint8_t>(random_sample));
- }
- }
- std::shuffle(random_input.begin(), random_input.end(), generator);
- std::unordered_map<int, int> kmeans_result =
- aom::internal::KMeans(random_input, num_clusters);
-
- std::unordered_set<int> found_centroids;
- for (const auto &result : kmeans_result) {
- found_centroids.insert(result.second);
- }
- // Verify there are num_clusters distinct centroids in the k-means result.
- EXPECT_EQ(static_cast<int>(found_centroids.size()), num_clusters);
-
- // Verify that for each data point, the assigned centroid is the closest one.
- for (const auto &result : kmeans_result) {
- const int distance_from_cluster_centroid =
- abs(result.first - result.second);
- for (const int centroid : found_centroids) {
- if (centroid == result.second) continue;
- const int distance_from_other_cluster_centroid =
- abs(result.first - centroid);
- EXPECT_LE(distance_from_cluster_centroid,
- distance_from_other_cluster_centroid);
- }
- }
-}
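The property the test checks (every sample maps to its nearest centroid) is the fixed point of Lloyd's iteration. For reference, a self-contained 1-D sketch of that iteration; this is illustrative only, not aom::internal::KMeans:

#include <cmath>
#include <cstdint>
#include <vector>

// Lloyd's iteration in 1-D: alternate nearest-centroid assignment and
// centroid recomputation. A fixed iteration count stands in for a proper
// convergence check to keep the sketch short.
std::vector<double> KMeans1D(const std::vector<uint8_t> &samples, int k,
                             int iters = 20) {
  std::vector<double> centroids(k);
  for (int i = 0; i < k; ++i) {
    centroids[i] = 255.0 * (i + 0.5) / k;  // spread initial centroids
  }
  for (int iter = 0; iter < iters; ++iter) {
    std::vector<double> sum(k, 0.0);
    std::vector<int> count(k, 0);
    for (const uint8_t s : samples) {
      int best = 0;  // index of the nearest centroid
      for (int i = 1; i < k; ++i) {
        if (std::fabs(s - centroids[i]) < std::fabs(s - centroids[best]))
          best = i;
      }
      sum[best] += s;
      ++count[best];
    }
    for (int i = 0; i < k; ++i) {
      if (count[i] > 0) centroids[i] = sum[i] / count[i];
    }
  }
  return centroids;
}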
-
-} // namespace aom
-
-int main(int argc, char **argv) {
- ::testing::InitGoogleTest(&argc, argv);
- std::srand(0);
- return RUN_ALL_TESTS();
-}
diff --git a/test/ratectrl_rtc_test.cc b/test/ratectrl_rtc_test.cc
index 7910b41..0d8d48f 100644
--- a/test/ratectrl_rtc_test.cc
+++ b/test/ratectrl_rtc_test.cc
@@ -16,8 +16,6 @@
#include "test/codec_factory.h"
#include "test/encode_test_driver.h"
#include "test/util.h"
-#include "test/y4m_video_source.h"
-#include "test/yuv_video_source.h"
#include "test/i420_video_source.h"
#include "third_party/googletest/src/googletest/include/gtest/gtest.h"
@@ -25,7 +23,11 @@
constexpr size_t kNumFrames = 450;
-constexpr int kTemporalId[4] = { 0, 2, 1, 2 };
+const int kTemporalId3Layer[4] = { 0, 2, 1, 2 };
+const int kTemporalId2Layer[2] = { 0, 1 };
+const int kTemporalRateAllocation3Layer[3] = { 50, 70, 100 };
+const int kTemporalRateAllocation2Layer[2] = { 60, 100 };
+const int kSpatialLayerBitrate[3] = { 200, 500, 900 };
// Parameter: aq mode: 0 and 3
class RcInterfaceTest : public ::libaom_test::EncoderTest,
@@ -33,7 +35,8 @@
public:
RcInterfaceTest()
: EncoderTest(GET_PARAM(0)), aq_mode_(GET_PARAM(1)), key_interval_(3000),
- encoder_exit_(false), layer_frame_cnt_(0) {
+ encoder_exit_(false), layer_frame_cnt_(0), superframe_cnt_(0),
+ dynamic_temporal_layers_(false), dynamic_spatial_layers_(false) {
memset(&svc_params_, 0, sizeof(svc_params_));
memset(&layer_id_, 0, sizeof(layer_id_));
}
@@ -63,7 +66,14 @@
if (use_svc) {
frame_params_.spatial_layer_id =
layer_frame_cnt_ % rc_cfg_.ss_number_layers;
- frame_params_.temporal_layer_id = kTemporalId[video->frame() % 4];
+ if (rc_cfg_.ts_number_layers == 3)
+ frame_params_.temporal_layer_id =
+ kTemporalId3Layer[superframe_cnt_ % 4];
+ else if (rc_cfg_.ts_number_layers == 2)
+ frame_params_.temporal_layer_id =
+ kTemporalId2Layer[superframe_cnt_ % 2];
+ else
+ frame_params_.temporal_layer_id = 0;
layer_id_.spatial_layer_id = frame_params_.spatial_layer_id;
layer_id_.temporal_layer_id = frame_params_.temporal_layer_id;
encoder->Control(AV1E_SET_SVC_LAYER_ID, &layer_id_);
@@ -72,6 +82,57 @@
frame_params_.frame_type =
layer_frame_cnt_ % key_int == 0 ? aom::kKeyFrame : aom::kInterFrame;
encoder_exit_ = video->frame() == kNumFrames;
+ frame_flags_ = 0;
+
+ if (dynamic_temporal_layers_) {
+ if (superframe_cnt_ == 100 && layer_id_.spatial_layer_id == 0) {
+ // Go down to 2 temporal layers.
+ SetConfigSvc(3, 2);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ } else if (superframe_cnt_ == 200 && layer_id_.spatial_layer_id == 0) {
+ // Go down to 1 temporal layer.
+ SetConfigSvc(3, 1);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ } else if (superframe_cnt_ == 300 && layer_id_.spatial_layer_id == 0) {
+ // Go back up to 3 temporal layers.
+ SetConfigSvc(3, 3);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ }
+ } else if (dynamic_spatial_layers_) {
+ // In this example the number of spatial layers is modified on the fly,
+ // so we go from (120p,240p,480p) to (240p,480p), etc.
+ if (superframe_cnt_ == 100 && layer_id_.spatial_layer_id == 0) {
+ // Change to 2 spatial layers (240p, 480p).
+ SetConfigSvc(2, 3);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ } else if (superframe_cnt_ == 200 && layer_id_.spatial_layer_id == 0) {
+ // Change to 1 spatial layer (480p).
+ SetConfigSvc(1, 3);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ } else if (superframe_cnt_ == 300 && layer_id_.spatial_layer_id == 0) {
+ // Go back to 3 spatial layers (120p, 240p, 480p).
+ SetConfigSvc(3, 3);
+ encoder->Control(AV1E_SET_SVC_PARAMS, &svc_params_);
+ // In the fixed SVC mode (which is what is used in this test):
+ // A key frame is required here on SL0 since 120p will try to predict
+ // from LAST, which was the 480p, so the decoder will throw an error
+ // (the reference must be smaller than 4x4). In the flexible mode
+ // (not used here) we can set the frame flags to predict off the 2x2
+ // reference instead.
+ frame_params_.frame_type = aom::kKeyFrame;
+ ASSERT_TRUE(rc_api_->UpdateRateControl(rc_cfg_));
+ }
+ }
+ // TODO(marpan): Add dynamic spatial layers based on the layer 0 bitrate.
+ // That is the actual usage in software, where the configured (#spatial,
+ // #temporal) layers are fixed, but the top layer is dropped or re-enabled
+ // based on bitrate. This requires the external RC to handle dropped
+ // (zero-size) frames.
}
void PostEncodeFrameHook(::libaom_test::Encoder *encoder) override {
@@ -79,10 +140,20 @@
return;
}
layer_frame_cnt_++;
+ if (layer_id_.spatial_layer_id == rc_cfg_.ss_number_layers - 1)
+ superframe_cnt_++;
int qp;
encoder->Control(AOME_GET_LAST_QUANTIZER, &qp);
rc_api_->ComputeQP(frame_params_);
ASSERT_EQ(rc_api_->GetQP(), qp);
+ int encoder_lpf_level;
+ encoder->Control(AOME_GET_LOOPFILTER_LEVEL, &encoder_lpf_level);
+ aom::AV1LoopfilterLevel loopfilter_level = rc_api_->GetLoopfilterLevel();
+ ASSERT_EQ(loopfilter_level.filter_level[0], encoder_lpf_level);
+ aom::AV1CdefInfo cdef_level = rc_api_->GetCdefInfo();
+ int cdef_y_strengths[16];
+ encoder->Control(AV1E_GET_LUMA_CDEF_STRENGTH, cdef_y_strengths);
+ ASSERT_EQ(cdef_level.cdef_strength_y, cdef_y_strengths[0]);
}
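The checks above pin the external RC's per-frame outputs (QP, loopfilter level, CDEF strength) to the encoder's ground truth. A hedged sketch of the client-side loop those calls imply; the type names are taken from this test's usage, and PostEncodeUpdate is an assumption about the feedback call rather than something exercised in this hook (requires "av1/ratectrl_rtc.h" and <memory>):

void DriveExternalRc(const aom::AV1RateControlRtcConfig &cfg,
                     aom::AV1FrameParamsRTC frame_params, int num_frames) {
  std::unique_ptr<aom::AV1RateControlRTC> rc =
      aom::AV1RateControlRTC::Create(cfg);
  for (int i = 0; i < num_frames; ++i) {
    rc->ComputeQP(frame_params);
    const int qp = rc->GetQP();  // hand to the encoder as this frame's QP
    const aom::AV1LoopfilterLevel lf = rc->GetLoopfilterLevel();
    const aom::AV1CdefInfo cdef = rc->GetCdefInfo();
    (void)qp;
    (void)lf;
    (void)cdef;
    // Encode the frame with qp/lf/cdef, then report its size back, e.g.:
    // rc->PostEncodeUpdate(encoded_frame_size);
  }
}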
void FramePktHook(const aom_codec_cx_pkt_t *pkt) override {
@@ -125,7 +196,7 @@
void RunSvc() {
key_interval_ = 10000;
- SetConfigSvc();
+ SetConfigSvc(3, 3);
rc_api_ = aom::AV1RateControlRTC::Create(rc_cfg_);
frame_params_.spatial_layer_id = 0;
frame_params_.temporal_layer_id = 0;
@@ -138,7 +209,35 @@
void RunSvcPeriodicKey() {
key_interval_ = 100;
- SetConfigSvc();
+ SetConfigSvc(3, 3);
+ rc_api_ = aom::AV1RateControlRTC::Create(rc_cfg_);
+ frame_params_.spatial_layer_id = 0;
+ frame_params_.temporal_layer_id = 0;
+
+ ::libaom_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30,
+ 1, 0, kNumFrames);
+
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ }
+
+ void RunSvcDynamicTemporal() {
+ dynamic_temporal_layers_ = true;
+ key_interval_ = 10000;
+ SetConfigSvc(3, 3);
+ rc_api_ = aom::AV1RateControlRTC::Create(rc_cfg_);
+ frame_params_.spatial_layer_id = 0;
+ frame_params_.temporal_layer_id = 0;
+
+ ::libaom_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30,
+ 1, 0, kNumFrames);
+
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ }
+
+ void RunSvcDynamicSpatial() {
+ dynamic_spatial_layers_ = true;
+ key_interval_ = 10000;
+ SetConfigSvc(3, 3);
rc_api_ = aom::AV1RateControlRTC::Create(rc_cfg_);
frame_params_.spatial_layer_id = 0;
frame_params_.temporal_layer_id = 0;
@@ -191,12 +290,11 @@
cfg_.kf_max_dist = key_interval_;
}
- void SetConfigSvc() {
+ void SetConfigSvc(int number_spatial_layers, int number_temporal_layers) {
rc_cfg_.width = 640;
rc_cfg_.height = 480;
- rc_cfg_.max_quantizer = 52;
+ rc_cfg_.max_quantizer = 56;
rc_cfg_.min_quantizer = 2;
- rc_cfg_.target_bandwidth = 1000;
rc_cfg_.buf_initial_sz = 600;
rc_cfg_.buf_optimal_sz = 600;
rc_cfg_.buf_sz = 1000;
@@ -204,85 +302,117 @@
rc_cfg_.overshoot_pct = 50;
rc_cfg_.max_intra_bitrate_pct = 1000;
rc_cfg_.framerate = 30.0;
- rc_cfg_.ss_number_layers = 3;
- rc_cfg_.ts_number_layers = 3;
rc_cfg_.aq_mode = aq_mode_;
-
- rc_cfg_.scaling_factor_num[0] = 1;
- rc_cfg_.scaling_factor_den[0] = 4;
- rc_cfg_.scaling_factor_num[1] = 2;
- rc_cfg_.scaling_factor_den[1] = 4;
- rc_cfg_.scaling_factor_num[2] = 4;
- rc_cfg_.scaling_factor_den[2] = 4;
-
- rc_cfg_.ts_rate_decimator[0] = 4;
- rc_cfg_.ts_rate_decimator[1] = 2;
- rc_cfg_.ts_rate_decimator[2] = 1;
-
- rc_cfg_.layer_target_bitrate[0] = 100;
- rc_cfg_.layer_target_bitrate[1] = 140;
- rc_cfg_.layer_target_bitrate[2] = 200;
- rc_cfg_.layer_target_bitrate[3] = 250;
- rc_cfg_.layer_target_bitrate[4] = 350;
- rc_cfg_.layer_target_bitrate[5] = 500;
- rc_cfg_.layer_target_bitrate[6] = 450;
- rc_cfg_.layer_target_bitrate[7] = 630;
- rc_cfg_.layer_target_bitrate[8] = 900;
-
- for (int sl = 0; sl < rc_cfg_.ss_number_layers; ++sl) {
- for (int tl = 0; tl < rc_cfg_.ts_number_layers; ++tl) {
- const int i = sl * rc_cfg_.ts_number_layers + tl;
- rc_cfg_.max_quantizers[i] = 56;
- rc_cfg_.min_quantizers[i] = 2;
- }
- }
+ rc_cfg_.ss_number_layers = number_spatial_layers;
+ rc_cfg_.ts_number_layers = number_temporal_layers;
// Encoder settings for ground truth.
cfg_.g_w = 640;
cfg_.g_h = 480;
- svc_params_.number_spatial_layers = 3;
- svc_params_.number_temporal_layers = 3;
- cfg_.g_timebase.num = 1;
- cfg_.g_timebase.den = 30;
- svc_params_.scaling_factor_num[0] = 72;
- svc_params_.scaling_factor_den[0] = 288;
- svc_params_.scaling_factor_num[1] = 144;
- svc_params_.scaling_factor_den[1] = 288;
- svc_params_.scaling_factor_num[2] = 288;
- svc_params_.scaling_factor_den[2] = 288;
- for (int i = 0; i < AOM_MAX_LAYERS; ++i) {
- svc_params_.max_quantizers[i] = 56;
- svc_params_.min_quantizers[i] = 2;
- }
- cfg_.rc_end_usage = AOM_CBR;
- cfg_.g_lag_in_frames = 0;
- cfg_.g_error_resilient = 0;
- // 3 temporal layers
- svc_params_.framerate_factor[0] = 4;
- svc_params_.framerate_factor[1] = 2;
- svc_params_.framerate_factor[2] = 1;
-
+ cfg_.rc_max_quantizer = 56;
+ cfg_.rc_min_quantizer = 2;
cfg_.rc_buf_initial_sz = 600;
cfg_.rc_buf_optimal_sz = 600;
cfg_.rc_buf_sz = 1000;
- cfg_.rc_min_quantizer = 2;
- cfg_.rc_max_quantizer = 56;
+ cfg_.rc_overshoot_pct = 50;
+ cfg_.rc_undershoot_pct = 50;
cfg_.g_threads = 1;
cfg_.kf_min_dist = key_interval_;
cfg_.kf_max_dist = key_interval_;
- cfg_.rc_target_bitrate = 1000;
- cfg_.rc_overshoot_pct = 50;
- cfg_.rc_undershoot_pct = 50;
+ cfg_.g_timebase.num = 1;
+ cfg_.g_timebase.den = 30;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 0;
+ svc_params_.number_spatial_layers = number_spatial_layers;
+ svc_params_.number_temporal_layers = number_temporal_layers;
- svc_params_.layer_target_bitrate[0] = 100;
- svc_params_.layer_target_bitrate[1] = 140;
- svc_params_.layer_target_bitrate[2] = 200;
- svc_params_.layer_target_bitrate[3] = 250;
- svc_params_.layer_target_bitrate[4] = 350;
- svc_params_.layer_target_bitrate[5] = 500;
- svc_params_.layer_target_bitrate[6] = 450;
- svc_params_.layer_target_bitrate[7] = 630;
- svc_params_.layer_target_bitrate[8] = 900;
+ // Scale factors.
+ if (number_spatial_layers == 3) {
+ rc_cfg_.scaling_factor_num[0] = 1;
+ rc_cfg_.scaling_factor_den[0] = 4;
+ rc_cfg_.scaling_factor_num[1] = 2;
+ rc_cfg_.scaling_factor_den[1] = 4;
+ rc_cfg_.scaling_factor_num[2] = 4;
+ rc_cfg_.scaling_factor_den[2] = 4;
+ svc_params_.scaling_factor_num[0] = 1;
+ svc_params_.scaling_factor_den[0] = 4;
+ svc_params_.scaling_factor_num[1] = 2;
+ svc_params_.scaling_factor_den[1] = 4;
+ svc_params_.scaling_factor_num[2] = 4;
+ svc_params_.scaling_factor_den[2] = 4;
+ } else if (number_spatial_layers == 2) {
+ rc_cfg_.scaling_factor_num[0] = 1;
+ rc_cfg_.scaling_factor_den[0] = 2;
+ rc_cfg_.scaling_factor_num[1] = 2;
+ rc_cfg_.scaling_factor_den[1] = 2;
+ svc_params_.scaling_factor_num[0] = 1;
+ svc_params_.scaling_factor_den[0] = 2;
+ svc_params_.scaling_factor_num[1] = 2;
+ svc_params_.scaling_factor_den[1] = 2;
+ } else if (number_spatial_layers == 1) {
+ rc_cfg_.scaling_factor_num[0] = 1;
+ rc_cfg_.scaling_factor_den[0] = 1;
+ svc_params_.scaling_factor_num[0] = 1;
+ svc_params_.scaling_factor_den[0] = 1;
+ }
+
+ // TS rate decimator.
+ if (number_temporal_layers == 3) {
+ rc_cfg_.ts_rate_decimator[0] = 4;
+ rc_cfg_.ts_rate_decimator[1] = 2;
+ rc_cfg_.ts_rate_decimator[2] = 1;
+ svc_params_.framerate_factor[0] = 4;
+ svc_params_.framerate_factor[1] = 2;
+ svc_params_.framerate_factor[2] = 1;
+ } else if (number_temporal_layers == 2) {
+ rc_cfg_.ts_rate_decimator[0] = 2;
+ rc_cfg_.ts_rate_decimator[1] = 1;
+ svc_params_.framerate_factor[0] = 2;
+ svc_params_.framerate_factor[1] = 1;
+ } else if (number_temporal_layers == 1) {
+ rc_cfg_.ts_rate_decimator[0] = 1;
+ svc_params_.framerate_factor[0] = 1;
+ }
+
+ // Bitrate.
+ rc_cfg_.target_bandwidth = 0;
+ cfg_.rc_target_bitrate = 0;
+ for (int sl = 0; sl < number_spatial_layers; sl++) {
+ int spatial_bitrate = 0;
+ if (number_spatial_layers <= 3)
+ spatial_bitrate = kSpatialLayerBitrate[sl];
+ for (int tl = 0; tl < number_temporal_layers; tl++) {
+ int layer = sl * number_temporal_layers + tl;
+ if (number_temporal_layers == 3) {
+ rc_cfg_.layer_target_bitrate[layer] =
+ kTemporalRateAllocation3Layer[tl] * spatial_bitrate / 100;
+ svc_params_.layer_target_bitrate[layer] =
+ kTemporalRateAllocation3Layer[tl] * spatial_bitrate / 100;
+ } else if (number_temporal_layers == 2) {
+ rc_cfg_.layer_target_bitrate[layer] =
+ kTemporalRateAllocation2Layer[tl] * spatial_bitrate / 100;
+ svc_params_.layer_target_bitrate[layer] =
+ kTemporalRateAllocation2Layer[tl] * spatial_bitrate / 100;
+ } else if (number_temporal_layers == 1) {
+ rc_cfg_.layer_target_bitrate[layer] = spatial_bitrate;
+ svc_params_.layer_target_bitrate[layer] = spatial_bitrate;
+ }
+ }
+ rc_cfg_.target_bandwidth += spatial_bitrate;
+ cfg_.rc_target_bitrate += spatial_bitrate;
+ }
+
+ // Layer min/max quantizer.
+ for (int sl = 0; sl < number_spatial_layers; ++sl) {
+ for (int tl = 0; tl < number_temporal_layers; ++tl) {
+ const int i = sl * number_temporal_layers + tl;
+ rc_cfg_.max_quantizers[i] = rc_cfg_.max_quantizer;
+ rc_cfg_.min_quantizers[i] = rc_cfg_.min_quantizer;
+ svc_params_.max_quantizers[i] = cfg_.rc_max_quantizer;
+ svc_params_.min_quantizers[i] = cfg_.rc_min_quantizer;
+ }
+ }
}
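As a worked example of the bitrate allocation above: with kSpatialLayerBitrate = { 200, 500, 900 } and the cumulative three-temporal-layer split { 50, 70, 100 }, spatial layer 0 gets per-layer targets of 100/140/200 kbps, layer 1 gets 250/350/500, and layer 2 gets 450/630/900. This reproduces the nine hardcoded layer_target_bitrate values the old SetConfigSvc set, and makes the computed target_bandwidth (200 + 500 + 900 = 1600) consistent with their sum rather than the previously hardcoded 1000.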
std::unique_ptr<aom::AV1RateControlRTC> rc_api_;
@@ -294,6 +424,9 @@
aom_svc_params_t svc_params_;
aom_svc_layer_id_t layer_id_;
int layer_frame_cnt_;
+ int superframe_cnt_;
+ bool dynamic_temporal_layers_;
+ bool dynamic_spatial_layers_;
};
TEST_P(RcInterfaceTest, OneLayer) { RunOneLayer(); }
@@ -304,6 +437,10 @@
TEST_P(RcInterfaceTest, SvcPeriodicKey) { RunSvcPeriodicKey(); }
+TEST_P(RcInterfaceTest, SvcDynamicTemporal) { RunSvcDynamicTemporal(); }
+
+TEST_P(RcInterfaceTest, SvcDynamicSpatial) { RunSvcDynamicSpatial(); }
+
AV1_INSTANTIATE_TEST_SUITE(RcInterfaceTest, ::testing::Values(0, 3));
} // namespace
diff --git a/test/register_state_check.h b/test/register_state_check.h
index 0a150c3..4aad814 100644
--- a/test/register_state_check.h
+++ b/test/register_state_check.h
@@ -26,7 +26,7 @@
// See platform implementations of RegisterStateCheck and
// RegisterStateCheckMMX for details.
-#if defined(_WIN64) && ARCH_X86_64
+#if defined(_WIN64) && AOM_ARCH_X86_64
#undef NOMINMAX
#define NOMINMAX
@@ -86,9 +86,9 @@
class RegisterStateCheck {};
} // namespace libaom_test
-#endif // _WIN64 && ARCH_X86_64
+#endif // _WIN64 && AOM_ARCH_X86_64
-#if (ARCH_X86 || ARCH_X86_64) && defined(__GNUC__)
+#if (AOM_ARCH_X86 || AOM_ARCH_X86_64) && defined(__GNUC__)
namespace libaom_test {
// Checks the FPU tag word pre/post execution to ensure emms has been called.
@@ -122,7 +122,7 @@
class RegisterStateCheckMMX {};
} // namespace libaom_test
-#endif // (ARCH_X86 || ARCH_X86_64) && defined(__GNUC__)
+#endif // (AOM_ARCH_X86 || AOM_ARCH_X86_64) && defined(__GNUC__)
#define API_REGISTER_STATE_CHECK(statement) \
do { \
diff --git a/test/resize_test.cc b/test/resize_test.cc
index e21f4bf..85a72da 100644
--- a/test/resize_test.cc
+++ b/test/resize_test.cc
@@ -96,11 +96,17 @@
};
void ScaleForFrameNumber(unsigned int frame, unsigned int initial_w,
- unsigned int initial_h, unsigned int *w,
- unsigned int *h, int flag_codec) {
+ unsigned int initial_h, int flag_codec,
+ bool change_start_resln, unsigned int *w,
+ unsigned int *h) {
if (frame < 10) {
- *w = initial_w;
- *h = initial_h;
+ if (change_start_resln) {
+ *w = initial_w / 4;
+ *h = initial_h / 4;
+ } else {
+ *w = initial_w;
+ *h = initial_h;
+ }
return;
}
if (frame < 20) {
@@ -179,15 +185,25 @@
limit_ = 150;
}
int flag_codec_;
+ bool change_start_resln_;
virtual ~ResizingVideoSource() {}
protected:
- virtual void Next() {
+ void Begin() override {
+ frame_ = 0;
+ unsigned int width;
+ unsigned int height;
+ ScaleForFrameNumber(frame_, kInitialWidth, kInitialHeight, flag_codec_,
+ change_start_resln_, &width, &height);
+ SetSize(width, height);
+ FillFrame();
+ }
+ void Next() override {
++frame_;
unsigned int width;
unsigned int height;
- ScaleForFrameNumber(frame_, kInitialWidth, kInitialHeight, &width, &height,
- flag_codec_);
+ ScaleForFrameNumber(frame_, kInitialWidth, kInitialHeight, flag_codec_,
+ change_start_resln_, &width, &height);
SetSize(width, height);
FillFrame();
}
@@ -225,6 +241,7 @@
TEST_P(ResizeTest, TestExternalResizeWorks) {
ResizingVideoSource video;
video.flag_codec_ = 0;
+ video.change_start_resln_ = false;
cfg_.g_lag_in_frames = 0;
// We use max(kInitialWidth, kInitialHeight) because during the test
// the width and height of the frame are swapped
@@ -240,8 +257,8 @@
const unsigned int frame = static_cast<unsigned>(info->pts);
unsigned int expected_w;
unsigned int expected_h;
- ScaleForFrameNumber(frame, kInitialWidth, kInitialHeight, &expected_w,
- &expected_h, 0);
+ ScaleForFrameNumber(frame, kInitialWidth, kInitialHeight, video.flag_codec_,
+ video.change_start_resln_, &expected_w, &expected_h);
EXPECT_EQ(expected_w, info->w)
<< "Frame " << frame << " had unexpected width";
EXPECT_EQ(expected_h, info->h)
@@ -596,23 +613,30 @@
mismatch_psnr_ = 0.0;
mismatch_nframes_ = 0;
DefaultConfig();
- ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ // Test external resizing with start resolution equal to
+ // 1. kInitialWidth and kInitialHeight
+ // 2. down-scaled kInitialWidth and kInitialHeight
+ for (int i = 0; i < 2; i++) {
+ video.change_start_resln_ = static_cast<bool>(i);
- // Check we decoded the same number of frames as we attempted to encode
- ASSERT_EQ(frame_info_list_.size(), video.limit());
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
- for (std::vector<FrameInfo>::const_iterator info = frame_info_list_.begin();
- info != frame_info_list_.end(); ++info) {
- const unsigned int frame = static_cast<unsigned>(info->pts);
- unsigned int expected_w;
- unsigned int expected_h;
- ScaleForFrameNumber(frame, kInitialWidth, kInitialHeight, &expected_w,
- &expected_h, 1);
- EXPECT_EQ(expected_w, info->w)
- << "Frame " << frame << " had unexpected width";
- EXPECT_EQ(expected_h, info->h)
- << "Frame " << frame << " had unexpected height";
- EXPECT_EQ(static_cast<unsigned int>(0), GetMismatchFrames());
+ // Check we decoded the same number of frames as we attempted to encode
+ ASSERT_EQ(frame_info_list_.size(), video.limit());
+ for (const auto &info : frame_info_list_) {
+ const unsigned int frame = static_cast<unsigned>(info.pts);
+ unsigned int expected_w;
+ unsigned int expected_h;
+ ScaleForFrameNumber(frame, kInitialWidth, kInitialHeight,
+ video.flag_codec_, video.change_start_resln_,
+ &expected_w, &expected_h);
+ EXPECT_EQ(expected_w, info.w)
+ << "Frame " << frame << " had unexpected width";
+ EXPECT_EQ(expected_h, info.h)
+ << "Frame " << frame << " had unexpected height";
+ EXPECT_EQ(static_cast<unsigned int>(0), GetMismatchFrames());
+ }
+ frame_info_list_.clear();
}
}
@@ -788,7 +812,8 @@
virtual ~ResizingCspVideoSource() {}
};
-#if (defined(DISABLE_TRELLISQ_SEARCH) && DISABLE_TRELLISQ_SEARCH)
+#if (defined(DISABLE_TRELLISQ_SEARCH) && DISABLE_TRELLISQ_SEARCH) || \
+ (defined(CONFIG_MAX_DECODE_PROFILE) && CONFIG_MAX_DECODE_PROFILE < 1)
TEST_P(ResizeCspTest, DISABLED_TestResizeCspWorks) {
#else
TEST_P(ResizeCspTest, TestResizeCspWorks) {
diff --git a/test/rt_end_to_end_test.cc b/test/rt_end_to_end_test.cc
index e5cc163..735d799 100644
--- a/test/rt_end_to_end_test.cc
+++ b/test/rt_end_to_end_test.cc
@@ -42,23 +42,23 @@
{ { 5, { { 0, 36.2 }, { 3, 36.7 } } },
{ 6, { { 0, 36.1 }, { 3, 36.48 } } },
{ 7, { { 0, 35.5 }, { 3, 36.0 } } },
- { 8, { { 0, 35.8 }, { 3, 36.48 } } },
+ { 8, { { 0, 35.8 }, { 3, 36.4 } } },
{ 9, { { 0, 35.5 }, { 3, 36.0 } } },
{ 10, { { 0, 35.3 }, { 3, 35.9 } } } } },
{ "niklas_1280_720_30.y4m",
- { { 5, { { 0, 34.4 }, { 3, 34.30 } } },
- { 6, { { 0, 34.2 }, { 3, 34.2 } } },
- { 7, { { 0, 33.5 }, { 3, 33.48 } } },
- { 8, { { 0, 33.48 }, { 3, 33.48 } } },
- { 9, { { 0, 33.4 }, { 3, 33.4 } } },
- { 10, { { 0, 33.2 }, { 3, 33.2 } } } } },
+ { { 5, { { 0, 34.4 }, { 3, 34.2 } } },
+ { 6, { { 0, 34.1 }, { 3, 34.0 } } },
+ { 7, { { 0, 33.5 }, { 3, 33.1 } } },
+ { 8, { { 0, 33.3 }, { 3, 33.3 } } },
+ { 9, { { 0, 33.3 }, { 3, 33.3 } } },
+ { 10, { { 0, 33.1 }, { 3, 33.1 } } } } },
{ "hantro_collage_w352h288_nv12.yuv",
- { { 5, { { 0, 34.4 }, { 3, 34.30 } } },
- { 6, { { 0, 34.2 }, { 3, 34.2 } } },
+ { { 5, { { 0, 34.4 }, { 3, 34.2 } } },
+ { 6, { { 0, 34.1 }, { 3, 34.1 } } },
{ 7, { { 0, 33.6 }, { 3, 33.6 } } },
- { 8, { { 0, 33.48 }, { 3, 33.48 } } },
- { 9, { { 0, 33.4 }, { 3, 33.4 } } },
- { 10, { { 0, 33.2 }, { 3, 33.2 } } } } } };
+ { 8, { { 0, 33.3 }, { 3, 33.3 } } },
+ { 9, { { 0, 33.3 }, { 3, 33.3 } } },
+ { 10, { { 0, 33.1 }, { 3, 33.1 } } } } } };
typedef struct {
const char *filename;
@@ -82,17 +82,17 @@
{ "hantro_collage_w352h288_nv12.yuv", 8, AOM_IMG_FMT_NV12, AOM_BITS_8, 0 },
};
-// Params: test video, speed, aq mode, threads, tile columns.
+// Params: test video, speed, aq mode, threads, tile columns, tile rows.
class RTEndToEndTest
- : public ::libaom_test::CodecTestWith5Params<TestVideoParam, int,
- unsigned int, int, int>,
+ : public ::libaom_test::CodecTestWith6Params<TestVideoParam, int,
+ unsigned int, int, int, int>,
public ::libaom_test::EncoderTest {
protected:
RTEndToEndTest()
: EncoderTest(GET_PARAM(0)), test_video_param_(GET_PARAM(1)),
cpu_used_(GET_PARAM(2)), psnr_(0.0), nframes_(0),
aq_mode_(GET_PARAM(3)), threads_(GET_PARAM(4)),
- tile_columns_(GET_PARAM(5)) {}
+ tile_columns_(GET_PARAM(5)), tile_rows_(GET_PARAM(6)) {}
virtual ~RTEndToEndTest() {}
@@ -128,6 +128,7 @@
encoder->Control(AV1E_SET_ENABLE_TPL_MODEL, 0);
encoder->Control(AV1E_SET_FRAME_PARALLEL_DECODING, 1);
encoder->Control(AV1E_SET_TILE_COLUMNS, tile_columns_);
+ encoder->Control(AV1E_SET_TILE_ROWS, tile_rows_);
encoder->Control(AOME_SET_CPUUSED, cpu_used_);
encoder->Control(AV1E_SET_TUNE_CONTENT, AOM_CONTENT_DEFAULT);
encoder->Control(AV1E_SET_AQ_MODE, aq_mode_);
@@ -183,6 +184,7 @@
unsigned int aq_mode_;
int threads_;
int tile_columns_;
+ int tile_rows_;
};
class RTEndToEndTestThreaded : public RTEndToEndTest {};
@@ -192,13 +194,15 @@
TEST_P(RTEndToEndTestThreaded, EndtoEndPSNRTest) { DoTest(); }
AV1_INSTANTIATE_TEST_SUITE(RTEndToEndTest, ::testing::ValuesIn(kTestVectors),
- ::testing::Range(5, 11),
+ ::testing::Range(5, 12),
::testing::Values<unsigned int>(0, 3),
- ::testing::Values(1), ::testing::Values(1));
+ ::testing::Values(1), ::testing::Values(1),
+ ::testing::Values(1));
AV1_INSTANTIATE_TEST_SUITE(RTEndToEndTestThreaded,
::testing::ValuesIn(kTestVectors),
- ::testing::Range(5, 11),
+ ::testing::Range(5, 12),
::testing::Values<unsigned int>(0, 3),
- ::testing::Range(2, 5), ::testing::Range(2, 5));
+ ::testing::Range(2, 6), ::testing::Range(1, 5),
+ ::testing::Range(1, 5));
} // namespace
diff --git a/test/sad_test.cc b/test/sad_test.cc
index 98c8f51..0a39ca6 100644
--- a/test/sad_test.cc
+++ b/test/sad_test.cc
@@ -481,42 +481,6 @@
}
};
-#if !CONFIG_REALTIME_ONLY
-class SADx4AvgTest : public ::testing::WithParamInterface<SadMxNx4AvgParam>,
- public SADTestBase {
- public:
- SADx4AvgTest() : SADTestBase(GET_PARAM(0), GET_PARAM(1), GET_PARAM(3)) {}
-
- protected:
- void SADs(unsigned int *results) {
- const uint8_t *references[] = { GetReference(0), GetReference(1),
- GetReference(2), GetReference(3) };
-
- API_REGISTER_STATE_CHECK(GET_PARAM(2)(source_data_, source_stride_,
- references, reference_stride_,
- second_pred_, results));
- }
-
- void CheckSADs() {
- unsigned int reference_sad, exp_sad[4];
-
- SADs(exp_sad);
- for (int block = 0; block < 4; ++block) {
- reference_sad = ReferenceSADavg(block);
-
- EXPECT_EQ(reference_sad, exp_sad[block]) << "block " << block;
- }
- }
-
- void SADForSpeedTest(unsigned int *results,
- const uint8_t *const *references) {
- GET_PARAM(2)
- (source_data_, source_stride_, references, reference_stride_, second_pred_,
- results);
- }
-};
-#endif // !CONFIG_REALTIME_ONLY
-
class SADTest : public ::testing::WithParamInterface<SadMxNParam>,
public SADTestBase {
public:
@@ -635,39 +599,6 @@
}
};
-class DistWtdSADTest : public ::testing::WithParamInterface<DistWtdSadMxhParam>,
- public SADTestBase {
- public:
- DistWtdSADTest() : SADTestBase(GET_PARAM(0), GET_PARAM(1), GET_PARAM(3)) {}
-
- protected:
- unsigned int SAD(int block_idx) {
- unsigned int ret;
- const uint8_t *const reference = GetReference(block_idx);
-
- API_REGISTER_STATE_CHECK(ret = GET_PARAM(2)(source_data_, source_stride_,
- reference, reference_stride_,
- GET_PARAM(0), GET_PARAM(1)));
- return ret;
- }
-
- void CheckSAD() {
- const unsigned int reference_sad = ReferenceSAD(0);
- const unsigned int exp_sad = SAD(0);
-
- ASSERT_EQ(reference_sad, exp_sad);
- }
-
- void SADForSpeedTest(unsigned int *results,
- const uint8_t *const *references) {
- GET_PARAM(2)
- (source_data_, source_stride_, references[0], reference_stride_, width_,
- height_);
- (void)results;
- }
-};
-GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(DistWtdSADTest);
-
class DistWtdSADavgTest
: public ::testing::WithParamInterface<DistWtdSadMxNAvgParam>,
public SADTestBase {
@@ -908,52 +839,6 @@
reference_stride_ = tmp_stride;
}
-TEST_P(DistWtdSADTest, MaxRef) {
- FillConstant(source_data_, source_stride_, 0);
- FillConstant(reference_data_, reference_stride_, mask_);
- CheckSAD();
-}
-
-TEST_P(DistWtdSADTest, MaxSrc) {
- FillConstant(source_data_, source_stride_, mask_);
- FillConstant(reference_data_, reference_stride_, 0);
- CheckSAD();
-}
-
-TEST_P(DistWtdSADTest, ShortRef) {
- const int tmp_stride = reference_stride_;
- reference_stride_ >>= 1;
- FillRandom(source_data_, source_stride_);
- FillRandom(reference_data_, reference_stride_);
- CheckSAD();
- reference_stride_ = tmp_stride;
-}
-
-TEST_P(DistWtdSADTest, UnalignedRef) {
- // The reference frame, but not the source frame, may be unaligned for
- // certain types of searches.
- const int tmp_stride = reference_stride_;
- reference_stride_ -= 1;
- FillRandom(source_data_, source_stride_);
- FillRandom(reference_data_, reference_stride_);
- CheckSAD();
- reference_stride_ = tmp_stride;
-}
-
-TEST_P(DistWtdSADTest, ShortSrc) {
- const int tmp_stride = source_stride_;
- source_stride_ >>= 1;
- int test_count = 2000;
- while (test_count > 0) {
- FillRandom(source_data_, source_stride_);
- FillRandom(reference_data_, reference_stride_);
- CheckSAD();
- if (testing::Test::HasFatalFailure()) break;
- test_count -= 1;
- }
- source_stride_ = tmp_stride;
-}
-
TEST_P(DistWtdSADavgTest, MaxRef) {
FillConstant(source_data_, source_stride_, 0);
FillConstant(reference_data_, reference_stride_, mask_);
@@ -1252,69 +1137,6 @@
using std::make_tuple;
-#if !CONFIG_REALTIME_ONLY
-TEST_P(SADx4AvgTest, DISABLED_Speed) {
- int tmp_stride = reference_stride_;
- reference_stride_ >>= 1;
- FillRandom(source_data_, source_stride_);
- FillRandom(GetReference(0), reference_stride_);
- FillRandom(GetReference(1), reference_stride_);
- FillRandom(GetReference(2), reference_stride_);
- FillRandom(GetReference(3), reference_stride_);
- FillRandom(second_pred_, width_);
- SpeedSAD();
- reference_stride_ = tmp_stride;
-}
-
-TEST_P(SADx4AvgTest, MaxRef) {
- FillConstant(source_data_, source_stride_, 0);
- FillConstant(GetReference(0), reference_stride_, mask_);
- FillConstant(GetReference(1), reference_stride_, mask_);
- FillConstant(GetReference(2), reference_stride_, mask_);
- FillConstant(GetReference(3), reference_stride_, mask_);
- FillConstant(second_pred_, width_, 0);
- CheckSADs();
-}
-
-TEST_P(SADx4AvgTest, MaxSrc) {
- FillConstant(source_data_, source_stride_, mask_);
- FillConstant(GetReference(0), reference_stride_, 0);
- FillConstant(GetReference(1), reference_stride_, 0);
- FillConstant(GetReference(2), reference_stride_, 0);
- FillConstant(GetReference(3), reference_stride_, 0);
- FillConstant(second_pred_, width_, 0);
- CheckSADs();
-}
-
-TEST_P(SADx4AvgTest, ShortRef) {
- int tmp_stride = reference_stride_;
- reference_stride_ >>= 1;
- FillRandom(source_data_, source_stride_);
- FillRandom(GetReference(0), reference_stride_);
- FillRandom(GetReference(1), reference_stride_);
- FillRandom(GetReference(2), reference_stride_);
- FillRandom(GetReference(3), reference_stride_);
- FillRandom(second_pred_, width_);
- CheckSADs();
- reference_stride_ = tmp_stride;
-}
-
-TEST_P(SADx4AvgTest, UnalignedRef) {
- // The reference frame, but not the source frame, may be unaligned for
- // certain types of searches.
- int tmp_stride = reference_stride_;
- reference_stride_ -= 1;
- FillRandom(source_data_, source_stride_);
- FillRandom(GetReference(0), reference_stride_);
- FillRandom(GetReference(1), reference_stride_);
- FillRandom(GetReference(2), reference_stride_);
- FillRandom(GetReference(3), reference_stride_);
- FillRandom(second_pred_, width_);
- CheckSADs();
- reference_stride_ = tmp_stride;
-}
-#endif // !CONFIG_REALTIME_ONLY
-
//------------------------------------------------------------------------------
// C functions
const SadMxNParam c_tests[] = {
@@ -1992,34 +1814,6 @@
INSTANTIATE_TEST_SUITE_P(C, SADSkipx4Test,
::testing::ValuesIn(skip_x4d_c_tests));
-#if !CONFIG_REALTIME_ONLY
-const SadMxNx4AvgParam x4d_avg_c_tests[] = {
- make_tuple(128, 128, &aom_sad128x128x4d_avg_c, -1),
- make_tuple(128, 64, &aom_sad128x64x4d_avg_c, -1),
- make_tuple(64, 128, &aom_sad64x128x4d_avg_c, -1),
- make_tuple(64, 64, &aom_sad64x64x4d_avg_c, -1),
- make_tuple(64, 32, &aom_sad64x32x4d_avg_c, -1),
- make_tuple(32, 64, &aom_sad32x64x4d_avg_c, -1),
- make_tuple(32, 32, &aom_sad32x32x4d_avg_c, -1),
- make_tuple(32, 16, &aom_sad32x16x4d_avg_c, -1),
- make_tuple(16, 32, &aom_sad16x32x4d_avg_c, -1),
- make_tuple(16, 16, &aom_sad16x16x4d_avg_c, -1),
- make_tuple(16, 8, &aom_sad16x8x4d_avg_c, -1),
- make_tuple(8, 16, &aom_sad8x16x4d_avg_c, -1),
- make_tuple(8, 8, &aom_sad8x8x4d_avg_c, -1),
- make_tuple(8, 4, &aom_sad8x4x4d_avg_c, -1),
- make_tuple(4, 8, &aom_sad4x8x4d_avg_c, -1),
- make_tuple(4, 4, &aom_sad4x4x4d_avg_c, -1),
- make_tuple(64, 16, &aom_sad64x16x4d_avg_c, -1),
- make_tuple(16, 64, &aom_sad16x64x4d_avg_c, -1),
- make_tuple(32, 8, &aom_sad32x8x4d_avg_c, -1),
- make_tuple(8, 32, &aom_sad8x32x4d_avg_c, -1),
- make_tuple(16, 4, &aom_sad16x4x4d_avg_c, -1),
- make_tuple(4, 16, &aom_sad4x16x4d_avg_c, -1),
-};
-INSTANTIATE_TEST_SUITE_P(C, SADx4AvgTest, ::testing::ValuesIn(x4d_avg_c_tests));
-#endif // !CONFIG_REALTIME_ONLY
-
//------------------------------------------------------------------------------
// ARM functions
#if HAVE_NEON
@@ -2040,6 +1834,56 @@
make_tuple(8, 4, &aom_sad8x4_neon, -1),
make_tuple(4, 8, &aom_sad4x8_neon, -1),
make_tuple(4, 4, &aom_sad4x4_neon, -1),
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(128, 128, &aom_highbd_sad128x128_neon, 8),
+ make_tuple(128, 64, &aom_highbd_sad128x64_neon, 8),
+ make_tuple(64, 128, &aom_highbd_sad64x128_neon, 8),
+ make_tuple(64, 64, &aom_highbd_sad64x64_neon, 8),
+ make_tuple(64, 32, &aom_highbd_sad64x32_neon, 8),
+ make_tuple(32, 64, &aom_highbd_sad32x64_neon, 8),
+ make_tuple(32, 32, &aom_highbd_sad32x32_neon, 8),
+ make_tuple(32, 16, &aom_highbd_sad32x16_neon, 8),
+ make_tuple(16, 32, &aom_highbd_sad16x32_neon, 8),
+ make_tuple(16, 16, &aom_highbd_sad16x16_neon, 8),
+ make_tuple(16, 8, &aom_highbd_sad16x8_neon, 8),
+ make_tuple(8, 16, &aom_highbd_sad8x16_neon, 8),
+ make_tuple(8, 8, &aom_highbd_sad8x8_neon, 8),
+ make_tuple(8, 4, &aom_highbd_sad8x4_neon, 8),
+ make_tuple(4, 8, &aom_highbd_sad4x8_neon, 8),
+ make_tuple(4, 4, &aom_highbd_sad4x4_neon, 8),
+ make_tuple(128, 128, &aom_highbd_sad128x128_neon, 10),
+ make_tuple(128, 64, &aom_highbd_sad128x64_neon, 10),
+ make_tuple(64, 128, &aom_highbd_sad64x128_neon, 10),
+ make_tuple(64, 64, &aom_highbd_sad64x64_neon, 10),
+ make_tuple(64, 32, &aom_highbd_sad64x32_neon, 10),
+ make_tuple(32, 64, &aom_highbd_sad32x64_neon, 10),
+ make_tuple(32, 32, &aom_highbd_sad32x32_neon, 10),
+ make_tuple(32, 16, &aom_highbd_sad32x16_neon, 10),
+ make_tuple(16, 32, &aom_highbd_sad16x32_neon, 10),
+ make_tuple(16, 16, &aom_highbd_sad16x16_neon, 10),
+ make_tuple(16, 8, &aom_highbd_sad16x8_neon, 10),
+ make_tuple(8, 16, &aom_highbd_sad8x16_neon, 10),
+ make_tuple(8, 8, &aom_highbd_sad8x8_neon, 10),
+ make_tuple(8, 4, &aom_highbd_sad8x4_neon, 10),
+ make_tuple(4, 8, &aom_highbd_sad4x8_neon, 10),
+ make_tuple(4, 4, &aom_highbd_sad4x4_neon, 10),
+ make_tuple(128, 128, &aom_highbd_sad128x128_neon, 12),
+ make_tuple(128, 64, &aom_highbd_sad128x64_neon, 12),
+ make_tuple(64, 128, &aom_highbd_sad64x128_neon, 12),
+ make_tuple(64, 64, &aom_highbd_sad64x64_neon, 12),
+ make_tuple(64, 32, &aom_highbd_sad64x32_neon, 12),
+ make_tuple(32, 64, &aom_highbd_sad32x64_neon, 12),
+ make_tuple(32, 32, &aom_highbd_sad32x32_neon, 12),
+ make_tuple(32, 16, &aom_highbd_sad32x16_neon, 12),
+ make_tuple(16, 32, &aom_highbd_sad16x32_neon, 12),
+ make_tuple(16, 16, &aom_highbd_sad16x16_neon, 12),
+ make_tuple(16, 8, &aom_highbd_sad16x8_neon, 12),
+ make_tuple(8, 16, &aom_highbd_sad8x16_neon, 12),
+ make_tuple(8, 8, &aom_highbd_sad8x8_neon, 12),
+ make_tuple(8, 4, &aom_highbd_sad8x4_neon, 12),
+ make_tuple(4, 8, &aom_highbd_sad4x8_neon, 12),
+ make_tuple(4, 4, &aom_highbd_sad4x4_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
#if !CONFIG_REALTIME_ONLY
make_tuple(64, 16, &aom_sad64x16_neon, -1),
make_tuple(32, 8, &aom_sad32x8_neon, -1),
@@ -2047,7 +1891,27 @@
make_tuple(16, 4, &aom_sad16x4_neon, -1),
make_tuple(8, 32, &aom_sad8x32_neon, -1),
make_tuple(4, 16, &aom_sad4x16_neon, -1),
-#endif
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(64, 16, &aom_highbd_sad64x16_neon, 8),
+ make_tuple(16, 64, &aom_highbd_sad16x64_neon, 8),
+ make_tuple(32, 8, &aom_highbd_sad32x8_neon, 8),
+ make_tuple(8, 32, &aom_highbd_sad8x32_neon, 8),
+ make_tuple(16, 4, &aom_highbd_sad16x4_neon, 8),
+ make_tuple(4, 16, &aom_highbd_sad4x16_neon, 8),
+ make_tuple(64, 16, &aom_highbd_sad64x16_neon, 10),
+ make_tuple(16, 64, &aom_highbd_sad16x64_neon, 10),
+ make_tuple(32, 8, &aom_highbd_sad32x8_neon, 10),
+ make_tuple(8, 32, &aom_highbd_sad8x32_neon, 10),
+ make_tuple(16, 4, &aom_highbd_sad16x4_neon, 10),
+ make_tuple(4, 16, &aom_highbd_sad4x16_neon, 10),
+ make_tuple(64, 16, &aom_highbd_sad64x16_neon, 12),
+ make_tuple(16, 64, &aom_highbd_sad16x64_neon, 12),
+ make_tuple(32, 8, &aom_highbd_sad32x8_neon, 12),
+ make_tuple(8, 32, &aom_highbd_sad8x32_neon, 12),
+ make_tuple(16, 4, &aom_highbd_sad16x4_neon, 12),
+ make_tuple(4, 16, &aom_highbd_sad4x16_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
+#endif // !CONFIG_REALTIME_ONLY
};
INSTANTIATE_TEST_SUITE_P(NEON, SADTest, ::testing::ValuesIn(neon_tests));
@@ -2068,6 +1932,56 @@
make_tuple(8, 4, &aom_sad8x4x4d_neon, -1),
make_tuple(4, 8, &aom_sad4x8x4d_neon, -1),
make_tuple(4, 4, &aom_sad4x4x4d_neon, -1),
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(128, 128, &aom_highbd_sad128x128x4d_neon, 8),
+ make_tuple(128, 64, &aom_highbd_sad128x64x4d_neon, 8),
+ make_tuple(64, 128, &aom_highbd_sad64x128x4d_neon, 8),
+ make_tuple(64, 64, &aom_highbd_sad64x64x4d_neon, 8),
+ make_tuple(64, 32, &aom_highbd_sad64x32x4d_neon, 8),
+ make_tuple(32, 64, &aom_highbd_sad32x64x4d_neon, 8),
+ make_tuple(32, 32, &aom_highbd_sad32x32x4d_neon, 8),
+ make_tuple(32, 16, &aom_highbd_sad32x16x4d_neon, 8),
+ make_tuple(16, 32, &aom_highbd_sad16x32x4d_neon, 8),
+ make_tuple(16, 16, &aom_highbd_sad16x16x4d_neon, 8),
+ make_tuple(16, 8, &aom_highbd_sad16x8x4d_neon, 8),
+ make_tuple(8, 16, &aom_highbd_sad8x16x4d_neon, 8),
+ make_tuple(8, 8, &aom_highbd_sad8x8x4d_neon, 8),
+ make_tuple(8, 4, &aom_highbd_sad8x4x4d_neon, 8),
+ make_tuple(4, 8, &aom_highbd_sad4x8x4d_neon, 8),
+ make_tuple(4, 4, &aom_highbd_sad4x4x4d_neon, 8),
+ make_tuple(128, 128, &aom_highbd_sad128x128x4d_neon, 10),
+ make_tuple(128, 64, &aom_highbd_sad128x64x4d_neon, 10),
+ make_tuple(64, 128, &aom_highbd_sad64x128x4d_neon, 10),
+ make_tuple(64, 64, &aom_highbd_sad64x64x4d_neon, 10),
+ make_tuple(64, 32, &aom_highbd_sad64x32x4d_neon, 10),
+ make_tuple(32, 64, &aom_highbd_sad32x64x4d_neon, 10),
+ make_tuple(32, 32, &aom_highbd_sad32x32x4d_neon, 10),
+ make_tuple(32, 16, &aom_highbd_sad32x16x4d_neon, 10),
+ make_tuple(16, 32, &aom_highbd_sad16x32x4d_neon, 10),
+ make_tuple(16, 16, &aom_highbd_sad16x16x4d_neon, 10),
+ make_tuple(16, 8, &aom_highbd_sad16x8x4d_neon, 10),
+ make_tuple(8, 16, &aom_highbd_sad8x16x4d_neon, 10),
+ make_tuple(8, 8, &aom_highbd_sad8x8x4d_neon, 10),
+ make_tuple(8, 4, &aom_highbd_sad8x4x4d_neon, 10),
+ make_tuple(4, 8, &aom_highbd_sad4x8x4d_neon, 10),
+ make_tuple(4, 4, &aom_highbd_sad4x4x4d_neon, 10),
+ make_tuple(128, 128, &aom_highbd_sad128x128x4d_neon, 12),
+ make_tuple(128, 64, &aom_highbd_sad128x64x4d_neon, 12),
+ make_tuple(64, 128, &aom_highbd_sad64x128x4d_neon, 12),
+ make_tuple(64, 64, &aom_highbd_sad64x64x4d_neon, 12),
+ make_tuple(64, 32, &aom_highbd_sad64x32x4d_neon, 12),
+ make_tuple(32, 64, &aom_highbd_sad32x64x4d_neon, 12),
+ make_tuple(32, 32, &aom_highbd_sad32x32x4d_neon, 12),
+ make_tuple(32, 16, &aom_highbd_sad32x16x4d_neon, 12),
+ make_tuple(16, 32, &aom_highbd_sad16x32x4d_neon, 12),
+ make_tuple(16, 16, &aom_highbd_sad16x16x4d_neon, 12),
+ make_tuple(16, 8, &aom_highbd_sad16x8x4d_neon, 12),
+ make_tuple(8, 16, &aom_highbd_sad8x16x4d_neon, 12),
+ make_tuple(8, 8, &aom_highbd_sad8x8x4d_neon, 12),
+ make_tuple(8, 4, &aom_highbd_sad8x4x4d_neon, 12),
+ make_tuple(4, 8, &aom_highbd_sad4x8x4d_neon, 12),
+ make_tuple(4, 4, &aom_highbd_sad4x4x4d_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
#if !CONFIG_REALTIME_ONLY
make_tuple(64, 16, &aom_sad64x16x4d_neon, -1),
make_tuple(32, 8, &aom_sad32x8x4d_neon, -1),
@@ -2075,7 +1989,27 @@
make_tuple(16, 4, &aom_sad16x4x4d_neon, -1),
make_tuple(8, 32, &aom_sad8x32x4d_neon, -1),
make_tuple(4, 16, &aom_sad4x16x4d_neon, -1),
-#endif
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(64, 16, &aom_highbd_sad64x16x4d_neon, 8),
+ make_tuple(16, 64, &aom_highbd_sad16x64x4d_neon, 8),
+ make_tuple(32, 8, &aom_highbd_sad32x8x4d_neon, 8),
+ make_tuple(8, 32, &aom_highbd_sad8x32x4d_neon, 8),
+ make_tuple(16, 4, &aom_highbd_sad16x4x4d_neon, 8),
+ make_tuple(4, 16, &aom_highbd_sad4x16x4d_neon, 8),
+ make_tuple(64, 16, &aom_highbd_sad64x16x4d_neon, 10),
+ make_tuple(16, 64, &aom_highbd_sad16x64x4d_neon, 10),
+ make_tuple(32, 8, &aom_highbd_sad32x8x4d_neon, 10),
+ make_tuple(8, 32, &aom_highbd_sad8x32x4d_neon, 10),
+ make_tuple(16, 4, &aom_highbd_sad16x4x4d_neon, 10),
+ make_tuple(4, 16, &aom_highbd_sad4x16x4d_neon, 10),
+ make_tuple(64, 16, &aom_highbd_sad64x16x4d_neon, 12),
+ make_tuple(16, 64, &aom_highbd_sad16x64x4d_neon, 12),
+ make_tuple(32, 8, &aom_highbd_sad32x8x4d_neon, 12),
+ make_tuple(8, 32, &aom_highbd_sad8x32x4d_neon, 12),
+ make_tuple(16, 4, &aom_highbd_sad16x4x4d_neon, 12),
+ make_tuple(4, 16, &aom_highbd_sad4x16x4d_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
+#endif // !CONFIG_REALTIME_ONLY
};
INSTANTIATE_TEST_SUITE_P(NEON, SADx4Test, ::testing::ValuesIn(x4d_neon_tests));
const SadSkipMxNParam skip_neon_tests[] = {
@@ -2092,14 +2026,87 @@
make_tuple(16, 8, &aom_sad_skip_16x8_neon, -1),
make_tuple(8, 16, &aom_sad_skip_8x16_neon, -1),
make_tuple(8, 8, &aom_sad_skip_8x8_neon, -1),
+ make_tuple(8, 4, &aom_sad_skip_8x4_neon, -1),
make_tuple(4, 8, &aom_sad_skip_4x8_neon, -1),
+ make_tuple(4, 4, &aom_sad_skip_4x4_neon, -1),
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128_neon, 8),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64_neon, 8),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128_neon, 8),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64_neon, 8),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32_neon, 8),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64_neon, 8),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32_neon, 8),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16_neon, 8),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32_neon, 8),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16_neon, 8),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8_neon, 8),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16_neon, 8),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8_neon, 8),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4_neon, 8),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8_neon, 8),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4_neon, 8),
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128_neon, 10),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64_neon, 10),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128_neon, 10),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64_neon, 10),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32_neon, 10),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64_neon, 10),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32_neon, 10),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16_neon, 10),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32_neon, 10),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16_neon, 10),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8_neon, 10),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16_neon, 10),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8_neon, 10),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4_neon, 10),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8_neon, 10),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4_neon, 10),
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128_neon, 12),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64_neon, 12),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128_neon, 12),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64_neon, 12),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32_neon, 12),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64_neon, 12),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32_neon, 12),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16_neon, 12),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32_neon, 12),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16_neon, 12),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8_neon, 12),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16_neon, 12),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8_neon, 12),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4_neon, 12),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8_neon, 12),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
#if !CONFIG_REALTIME_ONLY
make_tuple(64, 16, &aom_sad_skip_64x16_neon, -1),
make_tuple(32, 8, &aom_sad_skip_32x8_neon, -1),
make_tuple(16, 64, &aom_sad_skip_16x64_neon, -1),
+ make_tuple(16, 4, &aom_sad_skip_16x4_neon, -1),
make_tuple(8, 32, &aom_sad_skip_8x32_neon, -1),
make_tuple(4, 16, &aom_sad_skip_4x16_neon, -1),
-#endif
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16_neon, 8),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64_neon, 8),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8_neon, 8),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32_neon, 8),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4_neon, 8),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16_neon, 8),
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16_neon, 10),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64_neon, 10),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8_neon, 10),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32_neon, 10),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4_neon, 10),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16_neon, 10),
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16_neon, 12),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64_neon, 12),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8_neon, 12),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32_neon, 12),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4_neon, 12),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
+#endif // !CONFIG_REALTIME_ONLY
};
INSTANTIATE_TEST_SUITE_P(NEON, SADSkipTest,
::testing::ValuesIn(skip_neon_tests));
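The skip variants instantiated above subsample the block vertically: the SAD is computed over every other row and scaled back up, trading accuracy for speed during motion search. A reference-style sketch of that quantity, assuming the usual convention of doubling the half-height SAD (illustrative only, not the library's code):

    #include <cstdint>
    // Row-subsampled SAD with 2x compensation, as the SADSkip tests exercise.
    static uint32_t sad_skip_ref(int w, int h, const uint8_t *src,
                                 int src_stride, const uint8_t *ref,
                                 int ref_stride) {
      uint32_t sad = 0;
      for (int r = 0; r < h; r += 2) {  // even rows only
        for (int c = 0; c < w; ++c) {
          const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
          sad += d >= 0 ? d : -d;
        }
      }
      return 2 * sad;  // compensate for the skipped rows (assumed convention)
    }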
@@ -2116,16 +2123,89 @@
make_tuple(16, 32, &aom_sad_skip_16x32x4d_neon, -1),
make_tuple(16, 16, &aom_sad_skip_16x16x4d_neon, -1),
make_tuple(16, 8, &aom_sad_skip_16x8x4d_neon, -1),
- make_tuple(8, 8, &aom_sad_skip_8x8x4d_neon, -1),
make_tuple(8, 16, &aom_sad_skip_8x16x4d_neon, -1),
+ make_tuple(8, 8, &aom_sad_skip_8x8x4d_neon, -1),
+ make_tuple(8, 4, &aom_sad_skip_8x4x4d_neon, -1),
make_tuple(4, 8, &aom_sad_skip_4x8x4d_neon, -1),
+ make_tuple(4, 4, &aom_sad_skip_4x4x4d_neon, -1),
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128x4d_neon, 8),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64x4d_neon, 8),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128x4d_neon, 8),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64x4d_neon, 8),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32x4d_neon, 8),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64x4d_neon, 8),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32x4d_neon, 8),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16x4d_neon, 8),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32x4d_neon, 8),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16x4d_neon, 8),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8x4d_neon, 8),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16x4d_neon, 8),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8x4d_neon, 8),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4x4d_neon, 8),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8x4d_neon, 8),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4x4d_neon, 8),
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128x4d_neon, 10),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64x4d_neon, 10),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128x4d_neon, 10),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64x4d_neon, 10),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32x4d_neon, 10),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64x4d_neon, 10),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32x4d_neon, 10),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16x4d_neon, 10),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32x4d_neon, 10),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16x4d_neon, 10),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8x4d_neon, 10),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16x4d_neon, 10),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8x4d_neon, 10),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4x4d_neon, 10),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8x4d_neon, 10),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4x4d_neon, 10),
+ make_tuple(128, 128, &aom_highbd_sad_skip_128x128x4d_neon, 12),
+ make_tuple(128, 64, &aom_highbd_sad_skip_128x64x4d_neon, 12),
+ make_tuple(64, 128, &aom_highbd_sad_skip_64x128x4d_neon, 12),
+ make_tuple(64, 64, &aom_highbd_sad_skip_64x64x4d_neon, 12),
+ make_tuple(64, 32, &aom_highbd_sad_skip_64x32x4d_neon, 12),
+ make_tuple(32, 64, &aom_highbd_sad_skip_32x64x4d_neon, 12),
+ make_tuple(32, 32, &aom_highbd_sad_skip_32x32x4d_neon, 12),
+ make_tuple(32, 16, &aom_highbd_sad_skip_32x16x4d_neon, 12),
+ make_tuple(16, 32, &aom_highbd_sad_skip_16x32x4d_neon, 12),
+ make_tuple(16, 16, &aom_highbd_sad_skip_16x16x4d_neon, 12),
+ make_tuple(16, 8, &aom_highbd_sad_skip_16x8x4d_neon, 12),
+ make_tuple(8, 16, &aom_highbd_sad_skip_8x16x4d_neon, 12),
+ make_tuple(8, 8, &aom_highbd_sad_skip_8x8x4d_neon, 12),
+ make_tuple(8, 4, &aom_highbd_sad_skip_8x4x4d_neon, 12),
+ make_tuple(4, 8, &aom_highbd_sad_skip_4x8x4d_neon, 12),
+ make_tuple(4, 4, &aom_highbd_sad_skip_4x4x4d_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
#if !CONFIG_REALTIME_ONLY
make_tuple(64, 16, &aom_sad_skip_64x16x4d_neon, -1),
make_tuple(32, 8, &aom_sad_skip_32x8x4d_neon, -1),
make_tuple(16, 64, &aom_sad_skip_16x64x4d_neon, -1),
+ make_tuple(16, 4, &aom_sad_skip_16x4x4d_neon, -1),
make_tuple(8, 32, &aom_sad_skip_8x32x4d_neon, -1),
make_tuple(4, 16, &aom_sad_skip_4x16x4d_neon, -1),
-#endif
+#if CONFIG_AV1_HIGHBITDEPTH
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16x4d_neon, 8),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64x4d_neon, 8),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8x4d_neon, 8),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32x4d_neon, 8),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4x4d_neon, 8),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16x4d_neon, 8),
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16x4d_neon, 10),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64x4d_neon, 10),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8x4d_neon, 10),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32x4d_neon, 10),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4x4d_neon, 10),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16x4d_neon, 10),
+ make_tuple(64, 16, &aom_highbd_sad_skip_64x16x4d_neon, 12),
+ make_tuple(16, 64, &aom_highbd_sad_skip_16x64x4d_neon, 12),
+ make_tuple(32, 8, &aom_highbd_sad_skip_32x8x4d_neon, 12),
+ make_tuple(8, 32, &aom_highbd_sad_skip_8x32x4d_neon, 12),
+ make_tuple(16, 4, &aom_highbd_sad_skip_16x4x4d_neon, 12),
+ make_tuple(4, 16, &aom_highbd_sad_skip_4x16x4d_neon, 12),
+#endif // CONFIG_AV1_HIGHBITDEPTH
+#endif // !CONFIG_REALTIME_ONLY
};
INSTANTIATE_TEST_SUITE_P(NEON, SADSkipx4Test,
::testing::ValuesIn(skip_x4d_neon_tests));
@@ -2158,6 +2238,34 @@
};
INSTANTIATE_TEST_SUITE_P(NEON, SADavgTest, ::testing::ValuesIn(avg_neon_tests));
+const SadMxNx4Param x3d_neon_tests[] = {
+ make_tuple(128, 128, &aom_sad128x128x3d_neon, -1),
+ make_tuple(128, 64, &aom_sad128x64x3d_neon, -1),
+ make_tuple(64, 128, &aom_sad64x128x3d_neon, -1),
+ make_tuple(64, 64, &aom_sad64x64x3d_neon, -1),
+ make_tuple(64, 32, &aom_sad64x32x3d_neon, -1),
+ make_tuple(32, 64, &aom_sad32x64x3d_neon, -1),
+ make_tuple(32, 32, &aom_sad32x32x3d_neon, -1),
+ make_tuple(32, 16, &aom_sad32x16x3d_neon, -1),
+ make_tuple(16, 32, &aom_sad16x32x3d_neon, -1),
+ make_tuple(16, 16, &aom_sad16x16x3d_neon, -1),
+ make_tuple(16, 8, &aom_sad16x8x3d_neon, -1),
+ make_tuple(8, 16, &aom_sad8x16x3d_neon, -1),
+ make_tuple(8, 8, &aom_sad8x8x3d_neon, -1),
+ make_tuple(8, 4, &aom_sad8x4x3d_neon, -1),
+ make_tuple(4, 8, &aom_sad4x8x3d_neon, -1),
+ make_tuple(4, 4, &aom_sad4x4x3d_neon, -1),
+#if !CONFIG_REALTIME_ONLY
+ make_tuple(64, 16, &aom_sad64x16x3d_neon, -1),
+ make_tuple(32, 8, &aom_sad32x8x3d_neon, -1),
+ make_tuple(16, 64, &aom_sad16x64x3d_neon, -1),
+ make_tuple(16, 4, &aom_sad16x4x3d_neon, -1),
+ make_tuple(8, 32, &aom_sad8x32x3d_neon, -1),
+ make_tuple(4, 16, &aom_sad4x16x3d_neon, -1),
+#endif // !CONFIG_REALTIME_ONLY
+};
+INSTANTIATE_TEST_SUITE_P(NEON, SADx3Test, ::testing::ValuesIn(x3d_neon_tests));
+
#endif // HAVE_NEON
//------------------------------------------------------------------------------
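For orientation on the x4d/x3d tables above: these kernels evaluate one source block against four (or three) candidate reference blocks per call, writing one SAD per candidate. A hedged reference sketch, assuming the conventional libaom x4d prototype (source and reference pointers with strides, results in a small output array):

    #include <cstdint>
    // Illustrative reference for a w x h x4d SAD; not the library's code.
    static void sad_x4d_ref(int w, int h, const uint8_t *src, int src_stride,
                            const uint8_t *const ref[4], int ref_stride,
                            uint32_t res[4]) {
      for (int k = 0; k < 4; ++k) {  // one SAD per candidate reference
        uint32_t sad = 0;
        for (int r = 0; r < h; ++r) {
          for (int c = 0; c < w; ++c) {
            const int d = src[r * src_stride + c] - ref[k][r * ref_stride + c];
            sad += d >= 0 ? d : -d;
          }
        }
        res[k] = sad;
      }
    }

The x3d variants follow the same shape but fill only three results, which is why they can reuse the SadMxNx4Param test plumbing.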
@@ -2608,75 +2716,35 @@
INSTANTIATE_TEST_SUITE_P(SSE2, SADSkipx4Test,
::testing::ValuesIn(skip_x4d_sse2_tests));
+const DistWtdSadMxNAvgParam dist_wtd_avg_sse2_tests[] = {
+ make_tuple(128, 128, &aom_dist_wtd_sad128x128_avg_sse2, -1),
+ make_tuple(128, 64, &aom_dist_wtd_sad128x64_avg_sse2, -1),
+ make_tuple(64, 128, &aom_dist_wtd_sad64x128_avg_sse2, -1),
+ make_tuple(64, 64, &aom_dist_wtd_sad64x64_avg_sse2, -1),
+ make_tuple(64, 32, &aom_dist_wtd_sad64x32_avg_sse2, -1),
+ make_tuple(32, 64, &aom_dist_wtd_sad32x64_avg_sse2, -1),
+ make_tuple(32, 32, &aom_dist_wtd_sad32x32_avg_sse2, -1),
+ make_tuple(32, 16, &aom_dist_wtd_sad32x16_avg_sse2, -1),
+ make_tuple(16, 32, &aom_dist_wtd_sad16x32_avg_sse2, -1),
+ make_tuple(16, 16, &aom_dist_wtd_sad16x16_avg_sse2, -1),
+ make_tuple(16, 8, &aom_dist_wtd_sad16x8_avg_sse2, -1),
+ make_tuple(8, 16, &aom_dist_wtd_sad8x16_avg_sse2, -1),
+ make_tuple(8, 8, &aom_dist_wtd_sad8x8_avg_sse2, -1),
+ make_tuple(8, 4, &aom_dist_wtd_sad8x4_avg_sse2, -1),
+ make_tuple(4, 8, &aom_dist_wtd_sad4x8_avg_sse2, -1),
+ make_tuple(4, 4, &aom_dist_wtd_sad4x4_avg_sse2, -1),
#if !CONFIG_REALTIME_ONLY
-const SadMxNx4AvgParam x4d_avg_sse2_tests[] = {
- make_tuple(128, 128, &aom_sad128x128x4d_avg_sse2, -1),
- make_tuple(128, 64, &aom_sad128x64x4d_avg_sse2, -1),
- make_tuple(64, 128, &aom_sad64x128x4d_avg_sse2, -1),
- make_tuple(64, 64, &aom_sad64x64x4d_avg_sse2, -1),
- make_tuple(64, 32, &aom_sad64x32x4d_avg_sse2, -1),
- make_tuple(32, 64, &aom_sad32x64x4d_avg_sse2, -1),
- make_tuple(32, 32, &aom_sad32x32x4d_avg_sse2, -1),
- make_tuple(32, 16, &aom_sad32x16x4d_avg_sse2, -1),
- make_tuple(16, 32, &aom_sad16x32x4d_avg_sse2, -1),
- make_tuple(16, 16, &aom_sad16x16x4d_avg_sse2, -1),
- make_tuple(16, 8, &aom_sad16x8x4d_avg_sse2, -1),
- make_tuple(8, 16, &aom_sad8x16x4d_avg_sse2, -1),
- make_tuple(8, 8, &aom_sad8x8x4d_avg_sse2, -1),
- make_tuple(8, 4, &aom_sad8x4x4d_avg_sse2, -1),
- make_tuple(4, 8, &aom_sad4x8x4d_avg_sse2, -1),
- make_tuple(4, 4, &aom_sad4x4x4d_avg_sse2, -1),
- make_tuple(64, 16, &aom_sad64x16x4d_avg_sse2, -1),
- make_tuple(16, 64, &aom_sad16x64x4d_avg_sse2, -1),
- make_tuple(32, 8, &aom_sad32x8x4d_avg_sse2, -1),
- make_tuple(8, 32, &aom_sad8x32x4d_avg_sse2, -1),
- make_tuple(16, 4, &aom_sad16x4x4d_avg_sse2, -1),
- make_tuple(4, 16, &aom_sad4x16x4d_avg_sse2, -1),
-};
-INSTANTIATE_TEST_SUITE_P(SSE2, SADx4AvgTest,
- ::testing::ValuesIn(x4d_avg_sse2_tests));
-#endif // !CONFIG_REALTIME_ONLY
-#endif // HAVE_SSE2
-
-#if HAVE_SSSE3
-// Note: These are named sse2, but part of ssse3 file and only built and linked
-// when ssse3 is enabled.
-const DistWtdSadMxhParam dist_wtd_sad_sse2_tests[] = {
- make_tuple(4, 4, &aom_sad4xh_sse2, -1),
- make_tuple(4, 8, &aom_sad4xh_sse2, -1),
- make_tuple(8, 4, &aom_sad8xh_sse2, -1),
- make_tuple(8, 8, &aom_sad8xh_sse2, -1),
- make_tuple(8, 16, &aom_sad8xh_sse2, -1),
- make_tuple(16, 8, &aom_sad16xh_sse2, -1),
- make_tuple(16, 16, &aom_sad16xh_sse2, -1),
- make_tuple(16, 32, &aom_sad16xh_sse2, -1),
- make_tuple(32, 16, &aom_sad32xh_sse2, -1),
- make_tuple(32, 32, &aom_sad32xh_sse2, -1),
- make_tuple(32, 64, &aom_sad32xh_sse2, -1),
- make_tuple(64, 32, &aom_sad64xh_sse2, -1),
- make_tuple(64, 64, &aom_sad64xh_sse2, -1),
- make_tuple(128, 128, &aom_sad128xh_sse2, -1),
- make_tuple(128, 64, &aom_sad128xh_sse2, -1),
- make_tuple(64, 128, &aom_sad64xh_sse2, -1),
- make_tuple(4, 16, &aom_sad4xh_sse2, -1),
- make_tuple(16, 4, &aom_sad16xh_sse2, -1),
- make_tuple(8, 32, &aom_sad8xh_sse2, -1),
- make_tuple(32, 8, &aom_sad32xh_sse2, -1),
- make_tuple(16, 64, &aom_sad16xh_sse2, -1),
- make_tuple(64, 16, &aom_sad64xh_sse2, -1),
-#if !CONFIG_REALTIME_ONLY
- make_tuple(16, 64, &aom_sad16xh_sse2, -1),
- make_tuple(64, 16, &aom_sad64xh_sse2, -1),
- make_tuple(8, 32, &aom_sad8xh_sse2, -1),
- make_tuple(32, 8, &aom_sad32xh_sse2, -1),
- make_tuple(4, 16, &aom_sad4xh_sse2, -1),
- make_tuple(16, 4, &aom_sad16xh_sse2, -1),
+ make_tuple(64, 16, &aom_dist_wtd_sad64x16_avg_sse2, -1),
+ make_tuple(16, 64, &aom_dist_wtd_sad16x64_avg_sse2, -1),
+ make_tuple(32, 8, &aom_dist_wtd_sad32x8_avg_sse2, -1),
+ make_tuple(8, 32, &aom_dist_wtd_sad8x32_avg_sse2, -1),
+ make_tuple(16, 4, &aom_dist_wtd_sad16x4_avg_sse2, -1),
+ make_tuple(4, 16, &aom_dist_wtd_sad4x16_avg_sse2, -1),
#endif
};
-INSTANTIATE_TEST_SUITE_P(SSE2, DistWtdSADTest,
- ::testing::ValuesIn(dist_wtd_sad_sse2_tests));
-
-#endif // HAVE_SSSE3
+INSTANTIATE_TEST_SUITE_P(SSE2, DistWtdSADavgTest,
+ ::testing::ValuesIn(dist_wtd_avg_sse2_tests));
+#endif // HAVE_SSE2
#if HAVE_SSE3
// The only functions are x3 variants, and they do not have tests.
@@ -2713,35 +2781,6 @@
INSTANTIATE_TEST_SUITE_P(SSSE3, DistWtdCompAvgTest,
::testing::ValuesIn(dist_wtd_comp_avg_ssse3_tests));
-
-const DistWtdSadMxNAvgParam dist_wtd_avg_ssse3_tests[] = {
- make_tuple(128, 128, &aom_dist_wtd_sad128x128_avg_ssse3, -1),
- make_tuple(128, 64, &aom_dist_wtd_sad128x64_avg_ssse3, -1),
- make_tuple(64, 128, &aom_dist_wtd_sad64x128_avg_ssse3, -1),
- make_tuple(64, 64, &aom_dist_wtd_sad64x64_avg_ssse3, -1),
- make_tuple(64, 32, &aom_dist_wtd_sad64x32_avg_ssse3, -1),
- make_tuple(32, 64, &aom_dist_wtd_sad32x64_avg_ssse3, -1),
- make_tuple(32, 32, &aom_dist_wtd_sad32x32_avg_ssse3, -1),
- make_tuple(32, 16, &aom_dist_wtd_sad32x16_avg_ssse3, -1),
- make_tuple(16, 32, &aom_dist_wtd_sad16x32_avg_ssse3, -1),
- make_tuple(16, 16, &aom_dist_wtd_sad16x16_avg_ssse3, -1),
- make_tuple(16, 8, &aom_dist_wtd_sad16x8_avg_ssse3, -1),
- make_tuple(8, 16, &aom_dist_wtd_sad8x16_avg_ssse3, -1),
- make_tuple(8, 8, &aom_dist_wtd_sad8x8_avg_ssse3, -1),
- make_tuple(8, 4, &aom_dist_wtd_sad8x4_avg_ssse3, -1),
- make_tuple(4, 8, &aom_dist_wtd_sad4x8_avg_ssse3, -1),
- make_tuple(4, 4, &aom_dist_wtd_sad4x4_avg_ssse3, -1),
-#if !CONFIG_REALTIME_ONLY
- make_tuple(64, 16, &aom_dist_wtd_sad64x16_avg_ssse3, -1),
- make_tuple(16, 64, &aom_dist_wtd_sad16x64_avg_ssse3, -1),
- make_tuple(32, 8, &aom_dist_wtd_sad32x8_avg_ssse3, -1),
- make_tuple(8, 32, &aom_dist_wtd_sad8x32_avg_ssse3, -1),
- make_tuple(16, 4, &aom_dist_wtd_sad16x4_avg_ssse3, -1),
- make_tuple(4, 16, &aom_dist_wtd_sad4x16_avg_ssse3, -1),
-#endif
-};
-INSTANTIATE_TEST_SUITE_P(SSSE3, DistWtdSADavgTest,
- ::testing::ValuesIn(dist_wtd_avg_ssse3_tests));
#endif // HAVE_SSSE3
#if HAVE_SSE4_1
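The DistWtdSADavg instantiation moved above (from the SSSE3 to the SSE2 section) tests kernels that blend a second predictor into the reference with AV1's distance-weighted compound weights before taking the SAD. A sketch of that structure, assuming the spec's 4-bit weight pairs summing to 16 (e.g. {9,7}, {11,5}, {13,3}) and (a*w0 + b*w1 + 8) >> 4 rounding; which operand takes which weight is an assumption here:

    #include <cstdint>
    // Illustrative only: distance-weighted average, then SAD vs. the source.
    // second_pred is assumed packed with stride w; w0 + w1 == 16.
    static uint32_t dist_wtd_sad_avg_ref(int w, int h, const uint8_t *src,
                                         int src_stride, const uint8_t *ref,
                                         int ref_stride,
                                         const uint8_t *second_pred, int w0,
                                         int w1) {
      uint32_t sad = 0;
      for (int r = 0; r < h; ++r) {
        for (int c = 0; c < w; ++c) {
          const int comp = (w0 * second_pred[r * w + c] +
                            w1 * ref[r * ref_stride + c] + 8) >> 4;
          const int d = src[r * src_stride + c] - comp;
          sad += d >= 0 ? d : -d;
        }
      }
      return sad;
    }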
diff --git a/test/scan_test.cc b/test/scan_test.cc
index dee2ab5..571658e 100644
--- a/test/scan_test.cc
+++ b/test/scan_test.cc
@@ -15,10 +15,10 @@
#include "test/av1_txfm_test.h"
static int scan_test(const int16_t *scan, const int16_t *iscan, int si, int r,
- int c, int w) {
- if (iscan[r * w + c] != si || scan[si] != r * w + c) {
+ int c, int h) {
+ if (iscan[c * h + r] != si || scan[si] != c * h + r) {
printf("r %d c %d ref_iscan %d iscan %d ref_scan %d scan %d\n", r, c, si,
- iscan[r * w + c], r * w + c, scan[si]);
+ iscan[c * h + r], c * h + r, scan[si]);
return 1;
} else {
return 0;
@@ -37,7 +37,7 @@
for (int c = 0; c < w; ++c) {
int r = i - c;
if (r >= 0 && r < h) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
@@ -45,7 +45,7 @@
for (int r = 0; r < h; ++r) {
int c = i - r;
if (c >= 0 && c < w) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
@@ -57,7 +57,7 @@
for (int c = 0; c < w; ++c) {
int r = i - c;
if (r >= 0 && r < h) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
@@ -68,7 +68,7 @@
for (int r = 0; r < h; ++r) {
int c = i - r;
if (c >= 0 && c < w) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
@@ -77,7 +77,7 @@
int si = 0;
for (int r = 0; r < h; ++r) {
for (int c = 0; c < w; ++c) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
@@ -86,7 +86,7 @@
int si = 0;
for (int c = 0; c < w; ++c) {
for (int r = 0; r < h; ++r) {
- if (scan_test(scan, iscan, si, r, c, w)) return 1;
+ if (scan_test(scan, iscan, si, r, c, h)) return 1;
++si;
}
}
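The scan_test.cc change above swaps the linear index used by scan_test() from row-major (r * w + c) to column-major (c * h + r), which is why every call site now passes the block height instead of the width. The two mappings side by side, with a worked case: in a w=4, h=2 block, position (r=1, c=2) is index 1*4+2 = 6 row-major but 2*2+1 = 5 column-major.

    // The two linearizations the updated test distinguishes.
    static int row_major_idx(int r, int c, int w) { return r * w + c; }  // old
    static int col_major_idx(int r, int c, int h) { return c * h + r; }  // new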
diff --git a/test/sum_squares_test.cc b/test/sum_squares_test.cc
index 5c049a5..91f172d 100644
--- a/test/sum_squares_test.cc
+++ b/test/sum_squares_test.cc
@@ -238,6 +238,13 @@
#endif // HAVE_SSE2
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(NEON, SumSquares1DTest,
+ ::testing::Values(TestFuncs1D(
+ aom_sum_squares_i16_c, aom_sum_squares_i16_neon)));
+
+#endif // HAVE_NEON
+
typedef int64_t (*sse_func)(const uint8_t *a, int a_stride, const uint8_t *b,
int b_stride, int width, int height);
typedef libaom_test::FuncParam<sse_func> TestSSEFuncs;
@@ -708,6 +715,14 @@
#endif // HAVE_SSE2
+#if HAVE_NEON
+
+INSTANTIATE_TEST_SUITE_P(NEON, Lowbd2dVarTest,
+ ::testing::Values(TestFuncVar2D(&aom_var_2d_u8_c,
+ &aom_var_2d_u8_neon)));
+
+#endif // HAVE_NEON
+
class Highbd2dVarTest : public ::testing::TestWithParam<TestFuncVar2D> {
public:
virtual ~Highbd2dVarTest() {}
@@ -837,4 +852,12 @@
::testing::Values(TestFuncVar2D(&aom_var_2d_u16_c, &aom_var_2d_u16_avx2)));
#endif // HAVE_SSE2
+
+#if HAVE_NEON
+
+INSTANTIATE_TEST_SUITE_P(
+ NEON, Highbd2dVarTest,
+ ::testing::Values(TestFuncVar2D(&aom_var_2d_u16_c, &aom_var_2d_u16_neon)));
+
+#endif // HAVE_NEON
} // namespace
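The NEON instantiations added above only wire existing C kernels to their NEON counterparts; the quantities themselves are simple. A hedged sketch, assuming aom_sum_squares_i16 returns the sum of x[i]^2 over n values and the var_2d helpers return the unnormalized variance sum(x^2) - sum(x)^2 / N over a w x h block:

    #include <cstdint>
    // Illustrative references under the assumptions stated above.
    static uint64_t sum_squares_i16_ref(const int16_t *src, uint32_t n) {
      uint64_t ss = 0;
      for (uint32_t i = 0; i < n; ++i) ss += (int64_t)src[i] * src[i];
      return ss;
    }
    static uint64_t var_2d_u8_ref(const uint8_t *src, int stride, int w,
                                  int h) {
      uint64_t s = 0, ss = 0;
      for (int r = 0; r < h; ++r) {
        for (int c = 0; c < w; ++c) {
          const uint8_t v = src[r * stride + c];
          s += v;
          ss += (uint64_t)v * v;
        }
      }
      return ss - s * s / (w * h);  // unnormalized variance (assumption)
    }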
diff --git a/test/svc_datarate_test.cc b/test/svc_datarate_test.cc
index a5c3840..d99d6a3 100644
--- a/test/svc_datarate_test.cc
+++ b/test/svc_datarate_test.cc
@@ -92,6 +92,8 @@
screen_mode_ = 0;
rps_mode_ = 0;
rps_recovery_frame_ = 0;
+ user_define_frame_qp_ = 0;
+ set_speed_per_layer_ = false;
}
virtual void PreEncodeFrameHook(::libaom_test::VideoSource *video,
@@ -114,7 +116,15 @@
encoder->Control(AV1E_SET_ENABLE_TPL_MODEL, 0);
encoder->Control(AV1E_SET_DELTAQ_MODE, 0);
if (cfg_.g_threads > 1) {
- encoder->Control(AV1E_SET_TILE_COLUMNS, cfg_.g_threads >> 1);
+ if (cfg_.g_threads == 4) {
+ encoder->Control(AV1E_SET_TILE_COLUMNS, 2);
+ encoder->Control(AV1E_SET_TILE_ROWS, 2);
+ } else if (cfg_.g_threads == 8) {
+ encoder->Control(AV1E_SET_TILE_COLUMNS, 4);
+ encoder->Control(AV1E_SET_TILE_ROWS, 2);
+ } else {
+ encoder->Control(AV1E_SET_TILE_COLUMNS, cfg_.g_threads >> 1);
+ }
encoder->Control(AV1E_SET_ROW_MT, 1);
}
if (screen_mode_) {
@@ -163,6 +173,23 @@
encoder->Control(AV1E_SET_SVC_REF_FRAME_CONFIG, &ref_frame_config_);
encoder->Control(AV1E_SET_SVC_REF_FRAME_COMP_PRED, &ref_frame_comp_pred_);
}
+ if (set_speed_per_layer_) {
+ int speed_per_layer = 10;
+ if (layer_id_.spatial_layer_id == 0) {
+ // For base SL0,TL0: use the speed the test loops over.
+ if (layer_id_.temporal_layer_id == 1) speed_per_layer = 7;
+ if (layer_id_.temporal_layer_id == 2) speed_per_layer = 8;
+ } else if (layer_id_.spatial_layer_id == 1) {
+ if (layer_id_.temporal_layer_id == 0) speed_per_layer = 7;
+ if (layer_id_.temporal_layer_id == 1) speed_per_layer = 8;
+ if (layer_id_.temporal_layer_id == 2) speed_per_layer = 9;
+ } else if (layer_id_.spatial_layer_id == 2) {
+ if (layer_id_.temporal_layer_id == 0) speed_per_layer = 8;
+ if (layer_id_.temporal_layer_id == 1) speed_per_layer = 9;
+ if (layer_id_.temporal_layer_id == 2) speed_per_layer = 10;
+ }
+ encoder->Control(AOME_SET_CPUUSED, speed_per_layer);
+ }
if (set_frame_level_er_) {
int mode =
(layer_id_.spatial_layer_id > 0 || layer_id_.temporal_layer_id > 0);
@@ -193,6 +220,11 @@
}
layer_frame_cnt_++;
DatarateTest::PreEncodeFrameHook(video, encoder);
+
+ if (user_define_frame_qp_) {
+ frame_qp_ = rnd_.PseudoUniform(63);
+ encoder->Control(AV1E_SET_QUANTIZER_ONE_PASS, frame_qp_);
+ }
}
virtual void PostEncodeFrameHook(::libaom_test::Encoder *encoder) {
@@ -200,6 +232,14 @@
encoder->Control(AV1E_GET_NUM_OPERATING_POINTS, &num_operating_points);
ASSERT_EQ(num_operating_points,
number_temporal_layers_ * number_spatial_layers_);
+
+ if (user_define_frame_qp_) {
+ if (current_video_frame_ >= static_cast<unsigned int>(total_frame_))
+ return;
+ int qp;
+ encoder->Control(AOME_GET_LAST_QUANTIZER_64, &qp);
+ ASSERT_EQ(qp, frame_qp_);
+ }
}
virtual void FramePktHook(const aom_codec_cx_pkt_t *pkt) {
@@ -337,7 +377,41 @@
if (rps_mode)
ref_config_rps(ref_frame_config, frame_cnt, rps_recovery_frame);
}
- if (number_temporal_layers_ == 3 && number_spatial_layers_ == 1) {
+ if (number_temporal_layers_ == 2 && number_spatial_layers_ == 1) {
+ // 2-temporal layer.
+ // 1 3 5
+ // 0 2 4
+ // Keep golden fixed at slot 3.
+ base_count = frame_cnt >> 1;
+ ref_frame_config->ref_idx[3] = 3;
+ // Cyclically refresh slots 5, 6, 7, for lag alt ref.
+ lag_index = 5;
+ if (base_count > 0) {
+ lag_index = 5 + (base_count % 3);
+ if (frame_cnt % 2 != 0) lag_index = 5 + ((base_count + 1) % 3);
+ }
+ // Set the altref slot to lag_index.
+ ref_frame_config->ref_idx[6] = lag_index;
+ if (frame_cnt % 2 == 0) {
+ layer_id->temporal_layer_id = 0;
+ // Update LAST on layer 0, reference LAST.
+ ref_frame_config->refresh[0] = 1;
+ ref_frame_config->reference[0] = 1;
+ // Refresh lag_index slot, needed for lagging golden.
+ ref_frame_config->refresh[lag_index] = 1;
+ // Refresh GOLDEN every 32 base layer frames.
+ if (base_count % 32 == 0) ref_frame_config->refresh[3] = 1;
+ } else {
+ layer_id->temporal_layer_id = 1;
+ // No updates on layer 1, reference LAST (TL0).
+ ref_frame_config->reference[0] = 1;
+ }
+ // Always reference golden and altref on TL0.
+ if (layer_id->temporal_layer_id == 0) {
+ ref_frame_config->reference[3] = 1;
+ ref_frame_config->reference[6] = 1;
+ }
+ } else if (number_temporal_layers_ == 3 && number_spatial_layers_ == 1) {
// 3-layer:
// 1 3 5 7
// 2 6
@@ -627,7 +701,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Top temporal layers are non_reference, so exclude them from
@@ -637,6 +711,71 @@
EXPECT_EQ((int)GetMismatchFrames(), 150);
}
+ virtual void SetFrameQpSVC3TL1SLTest() {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_dropframe_thresh = 0;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 63;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 1;
+
+ user_define_frame_qp_ = 1;
+ total_frame_ = 300;
+
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352,
+ 288, 30, 1, 0, 300);
+ const int bitrate_array[2] = { 200, 550 };
+ cfg_.rc_target_bitrate = bitrate_array[GET_PARAM(4)];
+ ResetModel();
+ number_temporal_layers_ = 3;
+ target_layer_bitrate_[0] = 50 * cfg_.rc_target_bitrate / 100;
+ target_layer_bitrate_[1] = 70 * cfg_.rc_target_bitrate / 100;
+ target_layer_bitrate_[2] = cfg_.rc_target_bitrate;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ }
+
+ virtual void SetFrameQpSVC3TL3SLTest() {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_dropframe_thresh = 0;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 63;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 0;
+
+ user_define_frame_qp_ = 1;
+ total_frame_ = 300;
+
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352,
+ 288, 30, 1, 0, 300);
+ const int bitrate_array[2] = { 600, 1200 };
+ cfg_.rc_target_bitrate = bitrate_array[GET_PARAM(4)];
+ ResetModel();
+ number_temporal_layers_ = 3;
+ number_spatial_layers_ = 3;
+ // SL0
+ const int bitrate_sl0 = 1 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[0] = 50 * bitrate_sl0 / 100;
+ target_layer_bitrate_[1] = 70 * bitrate_sl0 / 100;
+ target_layer_bitrate_[2] = bitrate_sl0;
+ // SL1
+ const int bitrate_sl1 = 3 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[3] = 50 * bitrate_sl1 / 100;
+ target_layer_bitrate_[4] = 70 * bitrate_sl1 / 100;
+ target_layer_bitrate_[5] = bitrate_sl1;
+ // SL2
+ const int bitrate_sl2 = 4 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[6] = 50 * bitrate_sl2 / 100;
+ target_layer_bitrate_[7] = 70 * bitrate_sl2 / 100;
+ target_layer_bitrate_[8] = bitrate_sl2;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ }
+
virtual void BasicRateTargetingSVC3TL1SLScreenTest() {
cfg_.rc_buf_initial_sz = 500;
cfg_.rc_buf_optimal_sz = 500;
@@ -663,7 +802,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.50)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.5)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.7)
<< " The datarate for the file is greater than target by too much!";
}
// Top temporal layers are non_reference, so exclude them from
@@ -675,6 +814,44 @@
EXPECT_LE((int)GetMismatchFrames(), 30);
}
+ virtual void BasicRateTargetingSVC2TL1SLScreenDropFrameTest() {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_dropframe_thresh = 30;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 52;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 0;
+
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352,
+ 288, 30, 1, 0, 300);
+
+ const int bitrate_array[2] = { 60, 100 };
+ cfg_.rc_target_bitrate = bitrate_array[GET_PARAM(4)];
+ ResetModel();
+ screen_mode_ = 1;
+ number_temporal_layers_ = 2;
+ number_spatial_layers_ = 1;
+ target_layer_bitrate_[0] = 60 * cfg_.rc_target_bitrate / 100;
+ target_layer_bitrate_[1] = cfg_.rc_target_bitrate;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
+ ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.75)
+ << " The datarate for the file is lower than target by too much!";
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.5)
+ << " The datarate for the file is greater than target by too much!";
+ }
+ // Top temporal layers are non_reference, so exclude them from
+ // mismatch count, since loopfilter/cdef is not applied for these on
+ // encoder side, but is always applied on decoder.
+ // This means 150 = #frames(300) - #TL1_frames(150).
+ // We use LE for screen since loopfilter level can become very small
+ // or zero and then the frame is not a mismatch.
+ EXPECT_LE((int)GetMismatchFrames(), 150);
+ }
+
virtual void BasicRateTargetingSVC1TL3SLScreenTest() {
cfg_.rc_buf_initial_sz = 500;
cfg_.rc_buf_optimal_sz = 500;
@@ -810,7 +987,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.80)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
}
@@ -857,7 +1034,7 @@
for (int i = 0; i < number_temporal_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.50)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Only base spatial layer is decoded and there are no non-reference
@@ -905,7 +1082,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.585)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// All 3 spatial layers are decoded, starting at frame 0, so there are
@@ -938,7 +1115,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.80)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
}
@@ -1129,6 +1306,51 @@
}
}
+ virtual void BasicRateTargetingSVC3TL3SLMultiThreadSpeedPerLayerTest() {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_dropframe_thresh = 0;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 63;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+ cfg_.g_error_resilient = 0;
+ cfg_.g_threads = 2;
+ ::libaom_test::I420VideoSource video("niklas_640_480_30.yuv", 640, 480, 30,
+ 1, 0, 400);
+ cfg_.g_w = 640;
+ cfg_.g_h = 480;
+ const int bitrate_array[2] = { 600, 1200 };
+ cfg_.rc_target_bitrate = bitrate_array[GET_PARAM(4)];
+ ResetModel();
+ set_speed_per_layer_ = true;
+ number_temporal_layers_ = 3;
+ number_spatial_layers_ = 3;
+ // SL0
+ const int bitrate_sl0 = 1 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[0] = 50 * bitrate_sl0 / 100;
+ target_layer_bitrate_[1] = 70 * bitrate_sl0 / 100;
+ target_layer_bitrate_[2] = bitrate_sl0;
+ // SL1
+ const int bitrate_sl1 = 3 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[3] = 50 * bitrate_sl1 / 100;
+ target_layer_bitrate_[4] = 70 * bitrate_sl1 / 100;
+ target_layer_bitrate_[5] = bitrate_sl1;
+ // SL2
+ const int bitrate_sl2 = 4 * cfg_.rc_target_bitrate / 8;
+ target_layer_bitrate_[6] = 50 * bitrate_sl2 / 100;
+ target_layer_bitrate_[7] = 70 * bitrate_sl2 / 100;
+ target_layer_bitrate_[8] = bitrate_sl2;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
+ ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.70)
+ << " The datarate for the file is lower than target by too much!";
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.45)
+ << " The datarate for the file is greater than target by too much!";
+ }
+ }
+
virtual void BasicRateTargetingSVC3TL3SLHDMultiThread2Test() {
cfg_.rc_buf_initial_sz = 500;
cfg_.rc_buf_optimal_sz = 500;
@@ -1378,7 +1600,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1423,7 +1645,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1468,7 +1690,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1514,7 +1736,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1565,7 +1787,57 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
+ << " The datarate for the file is greater than target by too much!";
+ }
+ // Test that no mismatches have been found.
+ std::cout << " Decoded frames: " << GetDecodedFrames() << "\n";
+ std::cout << " Mismatch frames: " << GetMismatchFrames() << "\n";
+ EXPECT_EQ(300 - GetDecodedFrames(), drop_frames_);
+ EXPECT_EQ((int)GetMismatchFrames(), num_nonref);
+ }
+
+ virtual void BasicRateTargetingSVC2TL1SLDropSetEnhER0Test() {
+ cfg_.rc_buf_initial_sz = 500;
+ cfg_.rc_buf_optimal_sz = 500;
+ cfg_.rc_buf_sz = 1000;
+ cfg_.rc_dropframe_thresh = 0;
+ cfg_.rc_min_quantizer = 0;
+ cfg_.rc_max_quantizer = 63;
+ cfg_.rc_end_usage = AOM_CBR;
+ cfg_.g_lag_in_frames = 0;
+
+ ::libaom_test::I420VideoSource video("hantro_collage_w352h288.yuv", 352,
+ 288, 30, 1, 0, 300);
+ const int bitrate_array[2] = { 200, 550 };
+ cfg_.rc_target_bitrate = bitrate_array[GET_PARAM(4)];
+ ResetModel();
+
+ // Set error_resilience off.
+ cfg_.g_error_resilient = 0;
+
+ // Drop TL1 for part of the sequence: start at the first TL1 at
+ // frame 101 and end at frame 199. Frame 200 is TL0,
+ // so we can continue decoding without mismatch (since LAST is the
+ // only reference).
+ int n = 0;
+ int num_nonref = 300 / 2;
+ for (int i = 101; i < 200; i++) {
+ if (i % 2 != 0) {
+ drop_frames_list_[n] = i;
+ n++;
+ num_nonref -= 1;  // every dropped TL1 frame here is non-reference
+ }
+ }
+ drop_frames_ = n;
+ number_temporal_layers_ = 2;
+ target_layer_bitrate_[0] = 70 * cfg_.rc_target_bitrate / 100;
+ target_layer_bitrate_[1] = cfg_.rc_target_bitrate;
+ ASSERT_NO_FATAL_FAILURE(RunLoop(&video));
+ for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
+ ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
+ << " The datarate for the file is lower than target by too much!";
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1597,7 +1869,7 @@
// Drop TL1 and TL2: for part of sequence. Start at first TL2 at
// frame 101, and end at second TL2 at frame 199. Frame 200 is TL0,
// so we can continue decoding without mismatch (since LAST is the
- // only reference and error_resil = 1 on TL1/TL2 frames).
+ // only reference).
int n = 0;
int num_nonref = 300 / 2;
for (int i = 101; i < 200; i++) {
@@ -1616,7 +1888,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1645,7 +1917,7 @@
// Drop TL1 and TL2: for part of sequence. Start at first TL2 at
// frame 101, and end at second TL2 at frame 199. Frame 200 is TL0,
// so we can continue decoding without mismatch (since LAST is the
- // only reference and error_resil = 1 on TL1/TL2 frames).
+ // only reference).
// Drop here means drop whole superframe.
int n = 0;
int num_nonref = 300 / 2;
@@ -1679,7 +1951,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1822,7 +2094,7 @@
for (int i = 0; i < number_temporal_layers_ * number_spatial_layers_; i++) {
ASSERT_GE(effective_datarate_tl[i], target_layer_bitrate_[i] * 0.60)
<< " The datarate for the file is lower than target by too much!";
- ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.35)
+ ASSERT_LE(effective_datarate_tl[i], target_layer_bitrate_[i] * 1.60)
<< " The datarate for the file is greater than target by too much!";
}
// Test that no mismatches have been found.
@@ -1861,6 +2133,12 @@
int screen_mode_;
int rps_mode_;
int rps_recovery_frame_;
+
+ int user_define_frame_qp_;
+ int frame_qp_;
+ int total_frame_;
+ bool set_speed_per_layer_;
+ libaom_test::ACMRandom rnd_;
};
// Check basic rate targeting for CBR, for 3 temporal layers, 1 spatial.
@@ -1868,12 +2146,21 @@
BasicRateTargetingSVC3TL1SLTest();
}
+TEST_P(DatarateTestSVC, SetFrameQpSVC3TL1SL) { SetFrameQpSVC3TL1SLTest(); }
+
+TEST_P(DatarateTestSVC, SetFrameQpSVC3TL3SL) { SetFrameQpSVC3TL3SLTest(); }
+
// Check basic rate targeting for CBR, for 3 temporal layers, 1 spatial
// for screen mode.
TEST_P(DatarateTestSVC, BasicRateTargetingSVC3TL1SLScreen) {
BasicRateTargetingSVC3TL1SLScreenTest();
}
+// Check basic rate targeting for CBR, for 2 temporal layers, 1 spatial
+// for screen mode, with frame dropper on at low bitrates.
+TEST_P(DatarateTestSVC, BasicRateTargetingSVC2TL1SLScreenDropFrame) {
+ BasicRateTargetingSVC2TL1SLScreenDropFrameTest();
+}
// Check basic rate targeting for CBR, for 3 spatial layers, 1 temporal
// for screen mode.
TEST_P(DatarateTestSVC, BasicRateTargetingSVC1TL3SLScreen) {
@@ -1946,6 +2233,13 @@
}
// Check basic rate targeting for CBR, for 3 spatial, 3 temporal layers,
+// for 2 threads, 2 tile_columns, row-mt enabled, and different speed
+// per layer.
+TEST_P(DatarateTestSVC, BasicRateTargetingSVC3TL3SLMultiThreadSpeedPerLayer) {
+ BasicRateTargetingSVC3TL3SLMultiThreadSpeedPerLayerTest();
+}
+
+// Check basic rate targeting for CBR, for 3 spatial, 3 temporal layers,
// for 2 threads, 2 tile_columns, row-mt enabled.
TEST_P(DatarateTestSVC, BasicRateTargetingSVC3TL3SLHDMultiThread2) {
BasicRateTargetingSVC3TL3SLHDMultiThread2Test();
@@ -1970,7 +2264,11 @@
// Check basic rate targeting for CBR, for 3 spatial, 3 temporal layers,
// for 4:4:4 input.
+#if defined(CONFIG_MAX_DECODE_PROFILE) && CONFIG_MAX_DECODE_PROFILE < 1
+TEST_P(DatarateTestSVC, DISABLED_BasicRateTargeting444SVC3TL3SL) {
+#else
TEST_P(DatarateTestSVC, BasicRateTargeting444SVC3TL3SL) {
+#endif
BasicRateTargeting444SVC3TL3SLTest();
}
@@ -2019,6 +2317,15 @@
BasicRateTargetingSVC3TL1SLDropSetEnhFrameERTest();
}
+// Check basic rate targeting for CBR, for 2 temporal layers, 1 spatial layer,
+// with dropping a set of enhancement layers (TL 1) in the middle of the
+// sequence. Test that the error_resilient flag can be 0/off for all frames.
+// This allows for successful decoding after dropping a set of enhancement
+// layer frames in the sequence.
+TEST_P(DatarateTestSVC, BasicRateTargetingSVC2TL1SLDropSetEnhER0) {
+ BasicRateTargetingSVC2TL1SLDropSetEnhER0Test();
+}
+
// Check basic rate targeting for CBR, for 3 temporal layers, 1 spatial layer,
// with dropping set of enhancement layers (TL 1 and TL2) in middle of sequence.
// Test that the error_resilient flag can be 0/off for all frames.
@@ -2068,7 +2375,7 @@
AV1_INSTANTIATE_TEST_SUITE(DatarateTestSVC,
::testing::Values(::libaom_test::kRealTime),
- ::testing::Range(7, 11), ::testing::Values(0, 3),
+ ::testing::Range(7, 12), ::testing::Values(0, 3),
::testing::Values(0, 1));
} // namespace
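The new SetFrameQp tests above drive AV1E_SET_QUANTIZER_ONE_PASS, one of this release's new controls: a QP in [0, 63] is set before each frame and read back with AOME_GET_LAST_QUANTIZER_64 after encoding. A minimal usage sketch against the public encoder API (setup and error handling elided; ctx and img are assumed to be an initialized encoder and input frame):

    #include <assert.h>
    #include "aom/aomcx.h"
    // Mirrors what the test's Pre/PostEncodeFrameHook pair does per frame.
    static void encode_frame_with_qp(aom_codec_ctx_t *ctx, aom_image_t *img,
                                     int frame_index, int qp) {
      aom_codec_control(ctx, AV1E_SET_QUANTIZER_ONE_PASS, qp);  // qp in [0, 63]
      aom_codec_encode(ctx, img, frame_index, /*duration=*/1, /*flags=*/0);
      int used_qp = -1;
      aom_codec_control(ctx, AOME_GET_LAST_QUANTIZER_64, &used_qp);
      assert(used_qp == qp);  // honored in one-pass CBR, the tested setup
    }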
diff --git a/test/svc_encoder_rtc.sh b/test/svc_encoder_rtc.sh
new file mode 100644
index 0000000..735166d
--- /dev/null
+++ b/test/svc_encoder_rtc.sh
@@ -0,0 +1,85 @@
+#!/bin/sh
+## Copyright (c) 2023, Alliance for Open Media. All rights reserved
+##
+## This source code is subject to the terms of the BSD 2 Clause License and
+## the Alliance for Open Media Patent License 1.0. If the BSD 2 Clause License
+## was not distributed with this source code in the LICENSE file, you can
+## obtain it at www.aomedia.org/license/software. If the Alliance for Open
+## Media Patent License 1.0 was not distributed with this source code in the
+## PATENTS file, you can obtain it at www.aomedia.org/license/patent.
+##
+
+. $(dirname $0)/tools_common.sh
+
+# Environment check: $YUV_RAW_INPUT is required.
+svc_encoder_verify_environment() {
+ if [ ! -e "${YUV_RAW_INPUT}" ]; then
+ echo "Libaom test data must exist in LIBAOM_TEST_DATA_PATH."
+ return 1
+ fi
+}
+
+common_flags="-k 10000"
+common_flags="${common_flags} --max-q=63"
+common_flags="${common_flags} --error-resilient=0"
+
+# Runs svc_encoder_rtc with 1 spatial layer and 3 temporal layers.
+svc_encoder_s1_t3() {
+ local encoder="${LIBAOM_BIN_PATH}/svc_encoder_rtc${AOM_TEST_EXE_SUFFIX}"
+ local output_file="${AOM_TEST_OUTPUT_DIR}/svc_encoder_rtc"
+
+ if [ ! -x "${encoder}" ]; then
+ elog "${encoder} does not exist or is not executable."
+ return 1
+ fi
+
+ eval "${AOM_TEST_PREFIX}" "${encoder}" "${common_flags}" \
+ "--width=${YUV_RAW_INPUT_WIDTH}" \
+ "--height=${YUV_RAW_INPUT_HEIGHT}" \
+ "-lm 2" \
+ "--speed=8" \
+ "--target-bitrate=400" \
+ "--bitrates=220,300,400" \
+ "--spatial-layers=1" \
+ "--temporal-layers=3" \
+ "--timebase=1/30" \
+ "${YUV_RAW_INPUT}" \
+ "-o ${output_file}" \
+ ${devnull} || return 1
+
+ [ -e "${output_file}" ] || return 1
+}
+
+# Runs svc_encoder_rtc with 1 spatial layer and 2 temporal layers at
+# speed 10.
+svc_encoder_s1_t2() {
+ local encoder="${LIBAOM_BIN_PATH}/svc_encoder_rtc${AOM_TEST_EXE_SUFFIX}"
+ local output_file="${AOM_TEST_OUTPUT_DIR}/svc_encoder_rtc"
+
+ if [ ! -x "${encoder}" ]; then
+ elog "${encoder} does not exist or is not executable."
+ return 1
+ fi
+
+ eval "${AOM_TEST_PREFIX}" "${encoder}" "${common_flags}" \
+ "--width=${YUV_RAW_INPUT_WIDTH}" \
+ "--height=${YUV_RAW_INPUT_HEIGHT}" \
+ "-lm 1" \
+ "--speed=10" \
+ "--target-bitrate=400" \
+ "--bitrates=220,400" \
+ "--spatial-layers=1" \
+ "--temporal-layers=2" \
+ "--timebase=1/30" \
+ "${YUV_RAW_INPUT}" \
+ "-o ${output_file}" \
+ ${devnull} || return 1
+
+ [ -e "${output_file}" ] || return 1
+}
+
+if [ "$(av1_encode_available)" = "yes" ]; then
+ svc_encoder_rtc_tests="svc_encoder_s1_t3
+ svc_encoder_s1_t2"
+ run_tests svc_encoder_verify_environment "${svc_encoder_rtc_tests}"
+fi
diff --git a/test/temporal_filter_test.cc b/test/temporal_filter_test.cc
index 154fd5d..e689cd3 100644
--- a/test/temporal_filter_test.cc
+++ b/test/temporal_filter_test.cc
@@ -31,9 +31,7 @@
#include "test/function_equivalence_test.h"
using libaom_test::ACMRandom;
-using libaom_test::FunctionEquivalenceTest;
using ::testing::Combine;
-using ::testing::Range;
using ::testing::Values;
using ::testing::ValuesIn;
@@ -47,11 +45,11 @@
} ColorFormat;
static const char *color_fmt_str[] = { "I400", "I420", "I422", "I444" };
typedef void (*TemporalFilterFunc)(
- const YV12_BUFFER_CONFIG *ref_frame, const MACROBLOCKD *mbd,
+ const YV12_BUFFER_CONFIG *frame_to_filter, const MACROBLOCKD *mbd,
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_level, const MV *subblock_mvs,
- const int *subblock_mses, const int q_factor, const int filter_strenght,
- const uint8_t *pred, uint32_t *accum, uint16_t *count);
+ const int *subblock_mses, const int q_factor, const int filter_strength,
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
typedef libaom_test::FuncParam<TemporalFilterFunc> TemporalFilterFuncParam;
typedef std::tuple<TemporalFilterFuncParam, int> TemporalFilterWithParam;
@@ -62,6 +60,7 @@
virtual ~TemporalFilterTest() {}
virtual void SetUp() {
params_ = GET_PARAM(0);
+ tf_wgt_calc_lvl_ = GET_PARAM(1);
rnd_.Reset(ACMRandom::DeterministicSeed());
src1_ = reinterpret_cast<uint8_t *>(
aom_memalign(8, sizeof(uint8_t) * MAX_MB_PLANE * BH * BW));
@@ -121,6 +120,7 @@
protected:
TemporalFilterFuncParam params_;
+ int32_t tf_wgt_calc_lvl_;
uint8_t *src1_;
uint8_t *src2_;
ACMRandom rnd_;
@@ -131,8 +131,9 @@
ColorFormat color_fmt) {
aom_usec_timer ref_timer, test_timer;
const BLOCK_SIZE block_size = TF_BLOCK_SIZE;
- const int width = block_size_wide[block_size];
- const int height = block_size_high[block_size];
+ static_assert(block_size == BLOCK_32X32, "");
+ const int width = 32;
+ const int height = 32;
int num_planes = MAX_MB_PLANE;
int subsampling_x = 0;
int subsampling_y = 0;
@@ -173,25 +174,25 @@
memset(accumulator_mod, 0, 1024 * 3 * sizeof(accumulator_mod[0]));
memset(count_mod, 0, 1024 * 3 * sizeof(count_mod[0]));
- assert(width == 32 && height == 32);
+ static_assert(width == 32 && height == 32, "");
const MV subblock_mvs[4] = { { 0, 0 }, { 5, 5 }, { 7, 8 }, { 2, 10 } };
const int subblock_mses[4] = { 15, 16, 17, 18 };
const int q_factor = 12;
const int filter_strength = 5;
const int mb_row = 0;
const int mb_col = 0;
- std::unique_ptr<YV12_BUFFER_CONFIG> ref_frame(new (std::nothrow)
- YV12_BUFFER_CONFIG);
- ASSERT_NE(ref_frame, nullptr);
- ref_frame->y_crop_height = 360;
- ref_frame->y_crop_width = 540;
- ref_frame->heights[PLANE_TYPE_Y] = height;
- ref_frame->heights[PLANE_TYPE_UV] = height >> subsampling_y;
- ref_frame->strides[PLANE_TYPE_Y] = stride;
- ref_frame->strides[PLANE_TYPE_UV] = stride >> subsampling_x;
+ std::unique_ptr<YV12_BUFFER_CONFIG> frame_to_filter(new (std::nothrow)
+ YV12_BUFFER_CONFIG);
+ ASSERT_NE(frame_to_filter, nullptr);
+ frame_to_filter->y_crop_height = 360;
+ frame_to_filter->y_crop_width = 540;
+ frame_to_filter->heights[PLANE_TYPE_Y] = height;
+ frame_to_filter->heights[PLANE_TYPE_UV] = height >> subsampling_y;
+ frame_to_filter->strides[PLANE_TYPE_Y] = stride;
+ frame_to_filter->strides[PLANE_TYPE_UV] = stride >> subsampling_x;
DECLARE_ALIGNED(16, uint8_t, src[1024 * 3]);
- ref_frame->buffer_alloc = src;
- ref_frame->flags = 0; // Only support low bit-depth test.
+ frame_to_filter->buffer_alloc = src;
+ frame_to_filter->flags = 0; // Only support low bit-depth test.
memcpy(src, src1_, 1024 * 3 * sizeof(uint8_t));
std::unique_ptr<MACROBLOCKD> mbd(new (std::nothrow) MACROBLOCKD);
@@ -200,26 +201,28 @@
for (int plane = AOM_PLANE_Y; plane < num_planes; plane++) {
int plane_height = plane ? height >> subsampling_y : height;
int plane_stride = plane ? stride >> subsampling_x : stride;
- ref_frame->buffers[plane] =
- ref_frame->buffer_alloc + plane * plane_stride * plane_height;
+ frame_to_filter->buffers[plane] =
+ frame_to_filter->buffer_alloc + plane * plane_stride * plane_height;
mbd->plane[plane].subsampling_x = plane ? subsampling_x : 0;
mbd->plane[plane].subsampling_y = plane ? subsampling_y : 0;
}
- params_.ref_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses, q_factor,
- filter_strength, src2_, accumulator_ref, count_ref);
- params_.tst_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses, q_factor,
- filter_strength, src2_, accumulator_mod, count_mod);
+ params_.ref_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_, src2_,
+ accumulator_ref, count_ref);
+ params_.tst_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_, src2_,
+ accumulator_mod, count_mod);
if (run_times > 1) {
aom_usec_timer_start(&ref_timer);
for (int j = 0; j < run_times; j++) {
- params_.ref_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses,
- q_factor, filter_strength, src2_, accumulator_ref,
- count_ref);
+ params_.ref_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_, src2_,
+ accumulator_ref, count_ref);
}
aom_usec_timer_mark(&ref_timer);
const int elapsed_time_c =
@@ -227,10 +230,10 @@
aom_usec_timer_start(&test_timer);
for (int j = 0; j < run_times; j++) {
- params_.tst_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses,
- q_factor, filter_strength, src2_, accumulator_mod,
- count_mod);
+ params_.tst_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_, src2_,
+ accumulator_mod, count_mod);
}
aom_usec_timer_mark(&test_timer);
const int elapsed_time_simd =
@@ -286,7 +289,7 @@
&av1_apply_temporal_filter_c, &av1_apply_temporal_filter_avx2) };
INSTANTIATE_TEST_SUITE_P(AVX2, TemporalFilterTest,
Combine(ValuesIn(temporal_filter_test_avx2),
- Range(64, 65, 4)));
+ Values(0, 1)));
#endif // HAVE_AVX2
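Note on the parameterization change in the instantiation above and in the SSE2/NEON ones below: the second Combine() argument previously iterated the placeholder Range(64, 65, 4); it now enumerates the new tf_wgt_calc_lvl argument, so each C/SIMD function pair is exercised with both weight-calculation levels. A minimal sketch of how such a two-way parameterization behaves (names here are illustrative, not from the patch):

#include <tuple>
#include "gtest/gtest.h"

// One test instance is generated per (function-pair index, tf_wgt_calc_lvl)
// combination produced by Combine().
class WgtLvlSketchTest
    : public ::testing::TestWithParam<std::tuple<int, int>> {};

TEST_P(WgtLvlSketchTest, LevelIsZeroOrOne) {
  const int tf_wgt_calc_lvl = std::get<1>(GetParam());
  EXPECT_TRUE(tf_wgt_calc_lvl == 0 || tf_wgt_calc_lvl == 1);
}

INSTANTIATE_TEST_SUITE_P(Sketch, WgtLvlSketchTest,
                         ::testing::Combine(::testing::Values(42),
                                            ::testing::Values(0, 1)));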
#if HAVE_SSE2
@@ -294,7 +297,7 @@
&av1_apply_temporal_filter_c, &av1_apply_temporal_filter_sse2) };
INSTANTIATE_TEST_SUITE_P(SSE2, TemporalFilterTest,
Combine(ValuesIn(temporal_filter_test_sse2),
- Range(64, 65, 4)));
+ Values(0, 1)));
#endif // HAVE_SSE2
#if HAVE_NEON
@@ -302,17 +305,109 @@
&av1_apply_temporal_filter_c, &av1_apply_temporal_filter_neon) };
INSTANTIATE_TEST_SUITE_P(NEON, TemporalFilterTest,
Combine(ValuesIn(temporal_filter_test_neon),
- Range(64, 65, 4)));
+ Values(0, 1)));
#endif // HAVE_NEON
+typedef double (*EstimateNoiseFunc)(const uint8_t *src, int height, int width,
+ int stride, int edge_thresh);
+
+typedef std::tuple<EstimateNoiseFunc, EstimateNoiseFunc, int, int>
+ EstimateNoiseWithParam;
+
+class EstimateNoiseTest
+ : public ::testing::TestWithParam<EstimateNoiseWithParam> {
+ public:
+ virtual ~EstimateNoiseTest() {}
+ virtual void SetUp() {
+ ref_func = GET_PARAM(0);
+ tst_func = GET_PARAM(1);
+ width_ = GET_PARAM(2);
+ height_ = GET_PARAM(3);
+ rnd_.Reset(ACMRandom::DeterministicSeed());
+ src1_ = reinterpret_cast<uint8_t *>(
+ aom_memalign(8, sizeof(uint8_t) * width_ * height_));
+    ASSERT_NE(src1_, nullptr);
+    GenRandomData(width_ * height_);
+ }
+
+ virtual void TearDown() { aom_free(src1_); }
+
+ void RunTest(int run_times) {
+ stride_ = width_;
+
+ for (int i = 0; i < run_times; i++) {
+ double ref_out = ref_func(src1_, height_, width_, stride_,
+ NOISE_ESTIMATION_EDGE_THRESHOLD);
+
+ double tst_out = tst_func(src1_, height_, width_, stride_,
+ NOISE_ESTIMATION_EDGE_THRESHOLD);
+
+ EXPECT_EQ(ref_out, tst_out);
+ }
+ }
+
+ void SpeedTest(int run_times) {
+ stride_ = width_;
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ for (int i = 0; i < run_times; i++) {
+ ref_func(src1_, height_, width_, stride_,
+ NOISE_ESTIMATION_EDGE_THRESHOLD);
+ }
+ aom_usec_timer_mark(&timer);
+ const double time1 = static_cast<double>(aom_usec_timer_elapsed(&timer));
+ aom_usec_timer_start(&timer);
+ for (int i = 0; i < run_times; i++) {
+ tst_func(src1_, height_, width_, stride_,
+ NOISE_ESTIMATION_EDGE_THRESHOLD);
+ }
+ aom_usec_timer_mark(&timer);
+ const double time2 = static_cast<double>(aom_usec_timer_elapsed(&timer));
+
+ printf("(%3.2f)\n", time1 / time2);
+ }
+
+ void GenRandomData(int size) {
+ for (int ii = 0; ii < size; ii++) src1_[ii] = rnd_.Rand8();
+ }
+
+ protected:
+ EstimateNoiseFunc ref_func;
+ EstimateNoiseFunc tst_func;
+ ACMRandom rnd_;
+ uint8_t *src1_;
+ int width_;
+ int height_;
+ int stride_;
+};
+GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST(EstimateNoiseTest);
+
+TEST_P(EstimateNoiseTest, RandomValues) { RunTest(1); }
+
+TEST_P(EstimateNoiseTest, DISABLED_Speed) { SpeedTest(2000); }
+
+#if HAVE_AVX2
+// Widths and heights for which av1_estimate_noise_from_single_plane() will be
+// tested.
+const int kWidths[] = { 3840, 1920, 1280, 800, 640, 360, 357 };
+const int kHeights[] = { 2160, 1080, 720, 600, 480, 240, 237 };
+
+INSTANTIATE_TEST_SUITE_P(
+ AVX2, EstimateNoiseTest,
+ ::testing::Combine(
+ ::testing::Values(av1_estimate_noise_from_single_plane_c),
+ ::testing::Values(av1_estimate_noise_from_single_plane_avx2),
+ ::testing::ValuesIn(kWidths), ::testing::ValuesIn(kHeights)));
+#endif // HAVE_AVX2
+
#if CONFIG_AV1_HIGHBITDEPTH
typedef void (*HBDTemporalFilterFunc)(
- const YV12_BUFFER_CONFIG *ref_frame, const MACROBLOCKD *mbd,
+ const YV12_BUFFER_CONFIG *frame_to_filter, const MACROBLOCKD *mbd,
const BLOCK_SIZE block_size, const int mb_row, const int mb_col,
const int num_planes, const double *noise_level, const MV *subblock_mvs,
- const int *subblock_mses, const int q_factor, const int filter_strenght,
- const uint8_t *pred, uint32_t *accum, uint16_t *count);
+ const int *subblock_mses, const int q_factor, const int filter_strength,
+ int tf_wgt_calc_lvl, const uint8_t *pred, uint32_t *accum, uint16_t *count);
typedef libaom_test::FuncParam<HBDTemporalFilterFunc>
HBDTemporalFilterFuncParam;
@@ -324,6 +419,7 @@
virtual ~HBDTemporalFilterTest() {}
virtual void SetUp() {
params_ = GET_PARAM(0);
+ tf_wgt_calc_lvl_ = GET_PARAM(1);
rnd_.Reset(ACMRandom::DeterministicSeed());
src1_ = reinterpret_cast<uint16_t *>(
aom_memalign(16, sizeof(uint16_t) * MAX_MB_PLANE * BH * BW));
@@ -385,6 +481,7 @@
protected:
HBDTemporalFilterFuncParam params_;
+ int tf_wgt_calc_lvl_;
uint16_t *src1_;
uint16_t *src2_;
ACMRandom rnd_;
@@ -396,8 +493,9 @@
ColorFormat color_fmt) {
aom_usec_timer ref_timer, test_timer;
const BLOCK_SIZE block_size = TF_BLOCK_SIZE;
- const int width = block_size_wide[block_size];
- const int height = block_size_high[block_size];
+ static_assert(block_size == BLOCK_32X32, "");
+ const int width = 32;
+ const int height = 32;
int num_planes = MAX_MB_PLANE;
int subsampling_x = 0;
int subsampling_y = 0;
@@ -438,25 +536,26 @@
memset(accumulator_mod, 0, 1024 * 3 * sizeof(accumulator_mod[0]));
memset(count_mod, 0, 1024 * 3 * sizeof(count_mod[0]));
- assert(width == 32 && height == 32);
+ static_assert(width == 32 && height == 32, "");
const MV subblock_mvs[4] = { { 0, 0 }, { 5, 5 }, { 7, 8 }, { 2, 10 } };
const int subblock_mses[4] = { 15, 16, 17, 18 };
const int q_factor = 12;
const int filter_strength = 5;
const int mb_row = 0;
const int mb_col = 0;
- std::unique_ptr<YV12_BUFFER_CONFIG> ref_frame(new (std::nothrow)
- YV12_BUFFER_CONFIG);
- ASSERT_NE(ref_frame, nullptr);
- ref_frame->y_crop_height = 360;
- ref_frame->y_crop_width = 540;
- ref_frame->heights[PLANE_TYPE_Y] = height;
- ref_frame->heights[PLANE_TYPE_UV] = height >> subsampling_y;
- ref_frame->strides[PLANE_TYPE_Y] = stride;
- ref_frame->strides[PLANE_TYPE_UV] = stride >> subsampling_x;
+ std::unique_ptr<YV12_BUFFER_CONFIG> frame_to_filter(new (std::nothrow)
+ YV12_BUFFER_CONFIG);
+ ASSERT_NE(frame_to_filter, nullptr);
+ frame_to_filter->y_crop_height = 360;
+ frame_to_filter->y_crop_width = 540;
+ frame_to_filter->heights[PLANE_TYPE_Y] = height;
+ frame_to_filter->heights[PLANE_TYPE_UV] = height >> subsampling_y;
+ frame_to_filter->strides[PLANE_TYPE_Y] = stride;
+ frame_to_filter->strides[PLANE_TYPE_UV] = stride >> subsampling_x;
DECLARE_ALIGNED(16, uint16_t, src[1024 * 3]);
- ref_frame->buffer_alloc = CONVERT_TO_BYTEPTR(src);
- ref_frame->flags = YV12_FLAG_HIGHBITDEPTH; // Only Hihgbd bit-depth test.
+ frame_to_filter->buffer_alloc = CONVERT_TO_BYTEPTR(src);
+  frame_to_filter->flags =
+      YV12_FLAG_HIGHBITDEPTH;  // Only high bit-depth test.
memcpy(src, src1_, 1024 * 3 * sizeof(uint16_t));
std::unique_ptr<MACROBLOCKD> mbd(new (std::nothrow) MACROBLOCKD);
@@ -465,28 +564,28 @@
for (int plane = AOM_PLANE_Y; plane < num_planes; plane++) {
int plane_height = plane ? height >> subsampling_y : height;
int plane_stride = plane ? stride >> subsampling_x : stride;
- ref_frame->buffers[plane] =
- ref_frame->buffer_alloc + plane * plane_stride * plane_height;
+ frame_to_filter->buffers[plane] =
+ frame_to_filter->buffer_alloc + plane * plane_stride * plane_height;
mbd->plane[plane].subsampling_x = plane ? subsampling_x : 0;
mbd->plane[plane].subsampling_y = plane ? subsampling_y : 0;
}
- params_.ref_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses, q_factor,
- filter_strength, CONVERT_TO_BYTEPTR(src2_),
- accumulator_ref, count_ref);
- params_.tst_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses, q_factor,
- filter_strength, CONVERT_TO_BYTEPTR(src2_),
- accumulator_mod, count_mod);
+ params_.ref_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_,
+ CONVERT_TO_BYTEPTR(src2_), accumulator_ref, count_ref);
+ params_.tst_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_,
+ CONVERT_TO_BYTEPTR(src2_), accumulator_mod, count_mod);
if (run_times > 1) {
aom_usec_timer_start(&ref_timer);
for (int j = 0; j < run_times; j++) {
- params_.ref_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses,
- q_factor, filter_strength, CONVERT_TO_BYTEPTR(src2_),
- accumulator_ref, count_ref);
+ params_.ref_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_,
+ CONVERT_TO_BYTEPTR(src2_), accumulator_ref, count_ref);
}
aom_usec_timer_mark(&ref_timer);
const int elapsed_time_c =
@@ -494,10 +593,10 @@
aom_usec_timer_start(&test_timer);
for (int j = 0; j < run_times; j++) {
- params_.tst_func(ref_frame.get(), mbd.get(), block_size, mb_row, mb_col,
- num_planes, sigma, subblock_mvs, subblock_mses,
- q_factor, filter_strength, CONVERT_TO_BYTEPTR(src2_),
- accumulator_mod, count_mod);
+ params_.tst_func(frame_to_filter.get(), mbd.get(), block_size, mb_row,
+ mb_col, num_planes, sigma, subblock_mvs, subblock_mses,
+ q_factor, filter_strength, tf_wgt_calc_lvl_,
+ CONVERT_TO_BYTEPTR(src2_), accumulator_mod, count_mod);
}
aom_usec_timer_mark(&test_timer);
const int elapsed_time_simd =
@@ -554,7 +653,7 @@
};
INSTANTIATE_TEST_SUITE_P(SSE2, HBDTemporalFilterTest,
Combine(ValuesIn(HBDtemporal_filter_test_sse2),
- Range(64, 65, 4)));
+ Values(0, 1)));
#endif // HAVE_SSE2
#if HAVE_AVX2
HBDTemporalFilterFuncParam HBDtemporal_filter_test_avx2[] = {
@@ -563,7 +662,7 @@
};
INSTANTIATE_TEST_SUITE_P(AVX2, HBDTemporalFilterTest,
Combine(ValuesIn(HBDtemporal_filter_test_avx2),
- Range(64, 65, 4)));
+ Values(0, 1)));
#endif // HAVE_AVX2
#endif // CONFIG_AV1_HIGHBITDEPTH
} // namespace
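The EstimateNoiseTest suite added above checks av1_estimate_noise_from_single_plane_c against its AVX2 counterpart for bit-exact agreement across the listed frame sizes. As a rough sketch of the kind of computation being verified, assuming an Immerkaer-style estimator (the kernel, edge test, and constant below are illustrative, not copied from libaom):

#include <cstdint>
#include <cstdlib>

// Convolve the plane with a 3x3 Laplacian, skip pixels whose gradient exceeds
// edge_thresh, and scale the mean absolute response to a noise sigma.
static double estimate_noise_sketch(const uint8_t *src, int height, int width,
                                    int stride, int edge_thresh) {
  int64_t accum = 0;
  int64_t count = 0;
  for (int i = 1; i < height - 1; ++i) {
    for (int j = 1; j < width - 1; ++j) {
      const uint8_t *p = src + i * stride + j;
      // Reject likely edges so only flat-area noise is measured.
      const int gx = p[1] - p[-1];
      const int gy = p[stride] - p[-stride];
      if (abs(gx) + abs(gy) > edge_thresh) continue;
      const int lap = 4 * p[0] - 2 * (p[-1] + p[1] + p[-stride] + p[stride]) +
                      p[-stride - 1] + p[-stride + 1] + p[stride - 1] +
                      p[stride + 1];
      accum += abs(lap);
      ++count;
    }
  }
  if (count == 0) return -1.0;  // Not enough flat area to estimate from.
  // 0.2088857 is sqrt(pi / 2) / 6, the normalization for this kernel.
  return 0.2088857 * (double)accum / (double)count;
}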
diff --git a/test/test-data.sha1 b/test/test-data.sha1
index 3ac50a4..4bd0ddc 100644
--- a/test/test-data.sha1
+++ b/test/test-data.sha1
@@ -570,3 +570,4 @@
c7f336958e7af6162c20ddc84d67c7dfa9826910 *av1-1-b8-16-intra_only-intrabc-extreme-dv.ivf
36a4fcf07e645ed522cde5845dd9c6ab2b2d1502 *av1-1-b8-16-intra_only-intrabc-extreme-dv.ivf.md5
9f935d391fdf4a6f7c320355d45770d2e7d6095c *desktopqvga2.320_240.yuv
+4d1ad6d3070268ccb000d7fc3ae0f5a9447bfe82 *test_input_w1h1.yuv
diff --git a/test/test.cmake b/test/test.cmake
index a173246..672edb3 100644
--- a/test/test.cmake
+++ b/test/test.cmake
@@ -21,8 +21,16 @@
set(AOM_IDE_TEST_FOLDER "test")
set(AOM_IDE_TESTDATA_FOLDER "testdata")
+# Appends |src_list_name| to |AOM_TEST_SOURCE_VARS| at the caller's scope.
+# This collects all variables containing libaom test source files.
+function(add_to_libaom_test_srcs src_list_name)
+ list(APPEND AOM_TEST_SOURCE_VARS ${src_list_name})
+ set(AOM_TEST_SOURCE_VARS "${AOM_TEST_SOURCE_VARS}" PARENT_SCOPE)
+endfunction()
+
list(APPEND AOM_UNIT_TEST_WRAPPER_SOURCES "${AOM_GEN_SRC_DIR}/usage_exit.c"
"${AOM_ROOT}/test/test_libaom.cc")
+add_to_libaom_test_srcs(AOM_UNIT_TEST_WRAPPER_SOURCES)
list(APPEND AOM_UNIT_TEST_COMMON_SOURCES
"${AOM_ROOT}/test/acm_random.h"
@@ -41,6 +49,7 @@
"${AOM_ROOT}/test/transform_test_base.h"
"${AOM_ROOT}/test/util.h"
"${AOM_ROOT}/test/video_source.h")
+add_to_libaom_test_srcs(AOM_UNIT_TEST_COMMON_SOURCES)
list(APPEND AOM_UNIT_TEST_DECODER_SOURCES "${AOM_ROOT}/test/decode_api_test.cc"
"${AOM_ROOT}/test/decode_scalability_test.cc"
@@ -48,6 +57,7 @@
"${AOM_ROOT}/test/invalid_file_test.cc"
"${AOM_ROOT}/test/test_vector_test.cc"
"${AOM_ROOT}/test/ivf_video_source.h")
+add_to_libaom_test_srcs(AOM_UNIT_TEST_DECODER_SOURCES)
list(APPEND AOM_UNIT_TEST_ENCODER_SOURCES
"${AOM_ROOT}/test/active_map_test.cc"
@@ -60,6 +70,7 @@
"${AOM_ROOT}/test/datarate_test.cc"
"${AOM_ROOT}/test/datarate_test.h"
"${AOM_ROOT}/test/deltaq_mode_test.cc"
+ "${AOM_ROOT}/test/dropframe_encode_test.cc"
"${AOM_ROOT}/test/svc_datarate_test.cc"
"${AOM_ROOT}/test/encode_api_test.cc"
"${AOM_ROOT}/test/encode_small_width_height_test.cc"
@@ -86,9 +97,11 @@
"${AOM_ROOT}/test/y4m_video_source.h"
"${AOM_ROOT}/test/yuv_video_source.h"
"${AOM_ROOT}/test/time_stamp_test.cc")
+add_to_libaom_test_srcs(AOM_UNIT_TEST_ENCODER_SOURCES)
list(APPEND AOM_ENCODE_PERF_TEST_SOURCES "${AOM_ROOT}/test/encode_perf_test.cc")
list(APPEND AOM_UNIT_TEST_WEBM_SOURCES "${AOM_ROOT}/test/webm_video_source.h")
+add_to_libaom_test_srcs(AOM_UNIT_TEST_WEBM_SOURCES)
list(APPEND AOM_TEST_INTRA_PRED_SPEED_SOURCES "${AOM_GEN_SRC_DIR}/usage_exit.c"
"${AOM_ROOT}/test/test_intra_pred_speed.cc")
@@ -114,6 +127,7 @@
"${AOM_ROOT}/test/cpu_speed_test.cc"
"${AOM_ROOT}/test/cpu_used_firstpass_test.cc"
"${AOM_ROOT}/test/deltaq_mode_test.cc"
+ "${AOM_ROOT}/test/dropframe_encode_test.cc"
"${AOM_ROOT}/test/end_to_end_psnr_test.cc"
"${AOM_ROOT}/test/force_key_frame_test.cc"
"${AOM_ROOT}/test/gf_pyr_height_test.cc"
@@ -145,15 +159,19 @@
list(APPEND AOM_UNIT_TEST_COMMON_INTRIN_NEON
"${AOM_ROOT}/test/simd_cmp_neon.cc")
+ add_to_libaom_test_srcs(AOM_UNIT_TEST_COMMON_INTRIN_NEON)
list(APPEND AOM_UNIT_TEST_COMMON_INTRIN_SSE2
"${AOM_ROOT}/test/simd_cmp_sse2.cc")
+ add_to_libaom_test_srcs(AOM_UNIT_TEST_COMMON_INTRIN_SSE2)
list(APPEND AOM_UNIT_TEST_COMMON_INTRIN_SSSE3
"${AOM_ROOT}/test/simd_cmp_ssse3.cc")
+ add_to_libaom_test_srcs(AOM_UNIT_TEST_COMMON_INTRIN_SSSE3)
list(APPEND AOM_UNIT_TEST_COMMON_INTRIN_AVX2
"${AOM_ROOT}/test/simd_cmp_avx2.cc")
+ add_to_libaom_test_srcs(AOM_UNIT_TEST_COMMON_INTRIN_AVX2)
list(APPEND AOM_UNIT_TEST_ENCODER_SOURCES
"${AOM_ROOT}/test/arf_freq_test.cc"
@@ -173,7 +191,7 @@
"${AOM_ROOT}/test/blend_a64_mask_test.cc"
"${AOM_ROOT}/test/comp_avg_pred_test.cc"
"${AOM_ROOT}/test/comp_avg_pred_test.h"
- "${AOM_ROOT}/test/comp_mask_variance_test.cc"
+ "${AOM_ROOT}/test/comp_mask_pred_test.cc"
"${AOM_ROOT}/test/encodemb_test.cc"
"${AOM_ROOT}/test/encodetxb_test.cc"
"${AOM_ROOT}/test/end_to_end_qmpsnr_test.cc"
@@ -187,6 +205,7 @@
"${AOM_ROOT}/test/horver_correlation_test.cc"
"${AOM_ROOT}/test/masked_sad_test.cc"
"${AOM_ROOT}/test/masked_variance_test.cc"
+ "${AOM_ROOT}/test/minmax_test.cc"
"${AOM_ROOT}/test/motion_vector_test.cc"
"${AOM_ROOT}/test/mv_cost_test.cc"
"${AOM_ROOT}/test/noise_model_test.cc"
@@ -209,6 +228,7 @@
list(APPEND AOM_UNIT_TEST_ENCODER_INTRIN_SSE4_1
"${AOM_ROOT}/test/simd_cmp_sse4.cc")
+ add_to_libaom_test_srcs(AOM_UNIT_TEST_ENCODER_INTRIN_SSE4_1)
if(NOT CONFIG_REALTIME_ONLY)
list(APPEND AOM_UNIT_TEST_ENCODER_INTRIN_SSE4_1
@@ -334,6 +354,12 @@
endif()
+ if(HAVE_NEON)
+ list(APPEND AOM_UNIT_TEST_ENCODER_SOURCES
+ "${AOM_ROOT}/test/av1_convolve_scale_test.cc"
+ "${AOM_ROOT}/test/av1_horz_only_frame_superres_test.cc")
+ endif()
+
if(HAVE_SSE4_2 OR HAVE_ARM_CRC32)
list(APPEND AOM_UNIT_TEST_ENCODER_SOURCES "${AOM_ROOT}/test/hash_test.cc")
endif()
@@ -356,26 +382,20 @@
endif()
if(CONFIG_AV1_ENCODER AND ENABLE_TESTS)
- list(APPEND AOM_RC_INTERFACE_SOURCES
- "${AOM_ROOT}/test/encode_test_driver.cc"
- "${AOM_ROOT}/test/encode_test_driver.h"
+ list(APPEND AOM_RC_TEST_SOURCES "${AOM_ROOT}/test/codec_factory.h"
"${AOM_ROOT}/test/decode_test_driver.cc"
"${AOM_ROOT}/test/decode_test_driver.h"
- "${AOM_ROOT}/test/codec_factory.h"
- "${AOM_ROOT}/test/test_aom_rc_interface.cc"
+ "${AOM_ROOT}/test/encode_test_driver.cc"
+ "${AOM_ROOT}/test/encode_test_driver.h"
+ "${AOM_ROOT}/test/i420_video_source.h"
"${AOM_ROOT}/test/ratectrl_rtc_test.cc"
- "${AOM_ROOT}/common/y4minput.c"
- "${AOM_ROOT}/common/y4minput.h"
- "${AOM_ROOT}/test/y4m_video_source.h"
- "${AOM_ROOT}/test/yuv_video_source.h")
-
- list(APPEND AV1_RC_QMODE_SOURCES "${AOM_ROOT}/test/mock_ratectrl_qmode.h"
- "${AOM_ROOT}/test/ratectrl_qmode_test.cc"
- "${AOM_ROOT}/test/ducky_encode_test.cc"
- "${AOM_ROOT}/common/y4minput.c" "${AOM_ROOT}/common/y4minput.h"
- "${AOM_ROOT}/common/tools_common.c"
- "${AOM_ROOT}/common/tools_common.h"
- "${AOM_GEN_SRC_DIR}/usage_exit.c")
+ "${AOM_ROOT}/test/test_aom_rc.cc" "${AOM_ROOT}/test/util.h")
+ if(CONFIG_THREE_PASS)
+ # Add the dependencies of "${AOM_ROOT}/common/ivfdec.c".
+ list(APPEND AOM_RC_TEST_SOURCES "${AOM_ROOT}/common/tools_common.c"
+ "${AOM_ROOT}/common/tools_common.h"
+ "${AOM_GEN_SRC_DIR}/usage_exit.c")
+ endif()
endif()
if(ENABLE_TESTS)
@@ -575,65 +595,61 @@
endif()
endif()
- # Collect all variables containing libaom test source files.
- get_cmake_property(all_cmake_vars VARIABLES)
- foreach(var ${all_cmake_vars})
-
- # https://github.com/cheshirekow/cmake_format/issues/34
- # cmake-format: off
- if (("${var}" MATCHES "_TEST_" AND NOT
- "${var}" MATCHES
- "_DATA_\|_CMAKE_\|INTRA_PRED\|_COMPILED\|_HOSTING\|_PERF_\|CODER_")
- OR (CONFIG_AV1_ENCODER AND ENABLE_ENCODE_PERF_TESTS AND
- "${var}" MATCHES "_ENCODE_PERF_TEST_")
- OR (CONFIG_AV1_DECODER AND ENABLE_DECODE_PERF_TESTS AND
- "${var}" MATCHES "_DECODE_PERF_TEST_")
- OR (CONFIG_AV1_ENCODER AND "${var}" MATCHES "_TEST_ENCODER_")
- OR (CONFIG_AV1_DECODER AND "${var}" MATCHES "_TEST_DECODER_"))
- list(APPEND aom_test_source_vars ${var})
- endif()
- # cmake-format: on
- endforeach()
-
# Libaom_test_srcs.txt generation.
set(libaom_test_srcs_txt_file "${AOM_CONFIG_DIR}/libaom_test_srcs.txt")
file(WRITE "${libaom_test_srcs_txt_file}"
"# This file is generated. DO NOT EDIT.\n")
# Static source file list first.
- foreach(aom_test_source_var ${aom_test_source_vars})
+ list(SORT AOM_TEST_SOURCE_VARS)
+ foreach(aom_test_source_var ${AOM_TEST_SOURCE_VARS})
+ if("${aom_test_source_var}" STREQUAL "${last_aom_test_source_var}")
+ message(
+ FATAL_ERROR
+ "Duplicate AOM_TEST_SOURCE_VARS entry: ${aom_test_source_var}")
+ endif()
foreach(file ${${aom_test_source_var}})
if(NOT "${file}" MATCHES "${AOM_CONFIG_DIR}")
string(REPLACE "${AOM_ROOT}/" "" file "${file}")
file(APPEND "${libaom_test_srcs_txt_file}" "${file}\n")
endif()
endforeach()
+ set(last_aom_test_source_var ${aom_test_source_var})
+ endforeach()
+
+ # libaom_test_srcs.gni generation
+ set(libaom_test_srcs_gni_file "${AOM_CONFIG_DIR}/libaom_test_srcs.gni")
+ file(WRITE "${libaom_test_srcs_gni_file}"
+ "# This file is generated. DO NOT EDIT.\n")
+
+ foreach(aom_test_source_var ${AOM_TEST_SOURCE_VARS})
+ string(TOLOWER "${aom_test_source_var}" aom_test_source_var_lowercase)
+ file(APPEND "${libaom_test_srcs_gni_file}"
+ "\n${aom_test_source_var_lowercase} = [\n")
+
+ foreach(file ${${aom_test_source_var}})
+ if(NOT "${file}" MATCHES "${AOM_CONFIG_DIR}")
+ string(REPLACE "${AOM_ROOT}/" "//third_party/libaom/source/libaom/" file
+ "${file}")
+ file(APPEND "${libaom_test_srcs_gni_file}" " \"${file}\",\n")
+ endif()
+ endforeach()
+
+ file(APPEND "${libaom_test_srcs_gni_file}" "]\n")
endforeach()
# Set up test for rc interface
- if(CONFIG_AV1_RC_RTC
- AND CONFIG_AV1_ENCODER
- AND ENABLE_TESTS
- AND CONFIG_WEBM_IO
- AND NOT BUILD_SHARED_LIBS)
- add_executable(test_aom_rc_interface ${AOM_RC_INTERFACE_SOURCES})
- target_link_libraries(test_aom_rc_interface ${AOM_LIB_LINK_TYPE} aom
- aom_av1_rc aom_gtest webm)
- set_property(TARGET test_aom_rc_interface
- PROPERTY FOLDER ${AOM_IDE_TEST_FOLDER})
- list(APPEND AOM_APP_TARGETS test_aom_rc_interface)
- endif()
-
if(CONFIG_AV1_ENCODER
AND ENABLE_TESTS
+ AND CONFIG_WEBM_IO
AND NOT BUILD_SHARED_LIBS
AND NOT CONFIG_REALTIME_ONLY)
- add_executable(test_av1_rc_qmode ${AV1_RC_QMODE_SOURCES})
- target_link_libraries(test_av1_rc_qmode ${AOM_LIB_LINK_TYPE} aom
- av1_rc_qmode aom_gtest aom_gmock)
- set_property(TARGET test_av1_rc_qmode
- PROPERTY FOLDER ${AOM_IDE_TEST_FOLDER})
- list(APPEND AOM_APP_TARGETS test_av1_rc_qmode)
+ add_executable(test_aom_rc ${AOM_RC_TEST_SOURCES})
+ target_link_libraries(test_aom_rc ${AOM_LIB_LINK_TYPE} aom aom_av1_rc
+ aom_gtest aom_gmock webm)
+ set_property(TARGET test_aom_rc PROPERTY FOLDER ${AOM_IDE_TEST_FOLDER})
+ list(APPEND AOM_APP_TARGETS test_aom_rc)
endif()
+
set(AOM_APP_TARGETS ${AOM_APP_TARGETS} PARENT_SCOPE)
endfunction()
diff --git a/test/test_aom_rc_interface.cc b/test/test_aom_rc.cc
similarity index 100%
rename from test/test_aom_rc_interface.cc
rename to test/test_aom_rc.cc
diff --git a/test/test_data_util.cmake b/test/test_data_util.cmake
index b5d6fda..de7d153 100644
--- a/test/test_data_util.cmake
+++ b/test/test_data_util.cmake
@@ -38,8 +38,8 @@
"niklas_640_480_30.yuv"
"vase10x10.yuv"
"vase10x10_tiles.txt"
- "firstpass_stats"
- "bus_352x288_420_f20_b8.yuv")
+ "bus_352x288_420_f20_b8.yuv"
+ "test_input_w1h1.yuv")
if(ENABLE_DECODE_PERF_TESTS AND CONFIG_AV1_ENCODER)
list(APPEND AOM_TEST_DATA_FILE_NAMES "niklas_1280_720_30.yuv")
diff --git a/test/test_intra_pred_speed.cc b/test/test_intra_pred_speed.cc
index bf90d4a..d5c94be 100644
--- a/test/test_intra_pred_speed.cc
+++ b/test/test_intra_pred_speed.cc
@@ -468,12 +468,16 @@
aom_h_predictor_4x4_neon, aom_paeth_predictor_4x4_neon,
aom_smooth_predictor_4x4_neon, aom_smooth_v_predictor_4x4_neon,
aom_smooth_h_predictor_4x4_neon)
-INTRA_PRED_TEST(NEON, TX_4X8, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_4x8_neon,
+INTRA_PRED_TEST(NEON, TX_4X8, aom_dc_predictor_4x8_neon,
+ aom_dc_left_predictor_4x8_neon, aom_dc_top_predictor_4x8_neon,
+ aom_dc_128_predictor_4x8_neon, aom_v_predictor_4x8_neon,
+ aom_h_predictor_4x8_neon, aom_paeth_predictor_4x8_neon,
aom_smooth_predictor_4x8_neon, aom_smooth_v_predictor_4x8_neon,
aom_smooth_h_predictor_4x8_neon)
-INTRA_PRED_TEST(NEON, TX_4X16, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_4x16_neon,
+INTRA_PRED_TEST(NEON, TX_4X16, aom_dc_predictor_4x16_neon,
+ aom_dc_left_predictor_4x16_neon, aom_dc_top_predictor_4x16_neon,
+ aom_dc_128_predictor_4x16_neon, aom_v_predictor_4x16_neon,
+ aom_h_predictor_4x16_neon, aom_paeth_predictor_4x16_neon,
aom_smooth_predictor_4x16_neon,
aom_smooth_v_predictor_4x16_neon,
aom_smooth_h_predictor_4x16_neon)
@@ -555,17 +559,23 @@
aom_h_predictor_8x8_neon, aom_paeth_predictor_8x8_neon,
aom_smooth_predictor_8x8_neon, aom_smooth_v_predictor_8x8_neon,
aom_smooth_h_predictor_8x8_neon)
-INTRA_PRED_TEST(NEON, TX_8X4, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_8x4_neon,
+INTRA_PRED_TEST(NEON, TX_8X4, aom_dc_predictor_8x4_neon,
+ aom_dc_left_predictor_8x4_neon, aom_dc_top_predictor_8x4_neon,
+ aom_dc_128_predictor_8x4_neon, aom_v_predictor_8x4_neon,
+ aom_h_predictor_8x4_neon, aom_paeth_predictor_8x4_neon,
aom_smooth_predictor_8x4_neon, aom_smooth_v_predictor_8x4_neon,
aom_smooth_h_predictor_8x4_neon)
-INTRA_PRED_TEST(NEON, TX_8X16, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_8x16_neon,
+INTRA_PRED_TEST(NEON, TX_8X16, aom_dc_predictor_8x16_neon,
+ aom_dc_left_predictor_8x16_neon, aom_dc_top_predictor_8x16_neon,
+ aom_dc_128_predictor_8x16_neon, aom_v_predictor_8x16_neon,
+ aom_h_predictor_8x16_neon, aom_paeth_predictor_8x16_neon,
aom_smooth_predictor_8x16_neon,
aom_smooth_v_predictor_8x16_neon,
aom_smooth_h_predictor_8x16_neon)
-INTRA_PRED_TEST(NEON, TX_8X32, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_8x32_neon,
+INTRA_PRED_TEST(NEON, TX_8X32, aom_dc_predictor_8x32_neon,
+ aom_dc_left_predictor_8x32_neon, aom_dc_top_predictor_8x32_neon,
+ aom_dc_128_predictor_8x32_neon, aom_v_predictor_8x32_neon,
+ aom_h_predictor_8x32_neon, aom_paeth_predictor_8x32_neon,
aom_smooth_predictor_8x32_neon,
aom_smooth_v_predictor_8x32_neon,
aom_smooth_h_predictor_8x32_neon)
@@ -683,23 +693,33 @@
aom_smooth_predictor_16x16_neon,
aom_smooth_v_predictor_16x16_neon,
aom_smooth_h_predictor_16x16_neon)
-INTRA_PRED_TEST(NEON, TX_16X8, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_16x8_neon,
+INTRA_PRED_TEST(NEON, TX_16X8, aom_dc_predictor_16x8_neon,
+ aom_dc_left_predictor_16x8_neon, aom_dc_top_predictor_16x8_neon,
+ aom_dc_128_predictor_16x8_neon, aom_v_predictor_16x8_neon,
+ aom_h_predictor_16x8_neon, aom_paeth_predictor_16x8_neon,
aom_smooth_predictor_16x8_neon,
aom_smooth_v_predictor_16x8_neon,
aom_smooth_h_predictor_16x8_neon)
-INTRA_PRED_TEST(NEON, TX_16X32, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_16x32_neon,
+INTRA_PRED_TEST(NEON, TX_16X32, aom_dc_predictor_16x32_neon,
+ aom_dc_left_predictor_16x32_neon,
+ aom_dc_top_predictor_16x32_neon,
+ aom_dc_128_predictor_16x32_neon, aom_v_predictor_16x32_neon,
+ aom_h_predictor_16x32_neon, aom_paeth_predictor_16x32_neon,
aom_smooth_predictor_16x32_neon,
aom_smooth_v_predictor_16x32_neon,
aom_smooth_h_predictor_16x32_neon)
-INTRA_PRED_TEST(NEON, TX_16X4, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_16x4_neon,
+INTRA_PRED_TEST(NEON, TX_16X4, aom_dc_predictor_16x4_neon,
+ aom_dc_left_predictor_16x4_neon, aom_dc_top_predictor_16x4_neon,
+ aom_dc_128_predictor_16x4_neon, aom_v_predictor_16x4_neon,
+ aom_h_predictor_16x4_neon, aom_paeth_predictor_16x4_neon,
aom_smooth_predictor_16x4_neon,
aom_smooth_v_predictor_16x4_neon,
aom_smooth_h_predictor_16x4_neon)
-INTRA_PRED_TEST(NEON, TX_16X64, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_16x64_neon,
+INTRA_PRED_TEST(NEON, TX_16X64, aom_dc_predictor_16x64_neon,
+ aom_dc_left_predictor_16x64_neon,
+ aom_dc_top_predictor_16x64_neon,
+ aom_dc_128_predictor_16x64_neon, aom_v_predictor_16x64_neon,
+ aom_h_predictor_16x64_neon, aom_paeth_predictor_16x64_neon,
aom_smooth_predictor_16x64_neon,
aom_smooth_v_predictor_16x64_neon,
aom_smooth_h_predictor_16x64_neon)
@@ -808,18 +828,26 @@
aom_smooth_predictor_32x32_neon,
aom_smooth_v_predictor_32x32_neon,
aom_smooth_h_predictor_32x32_neon)
-INTRA_PRED_TEST(NEON, TX_32X16, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_32x16_neon,
+INTRA_PRED_TEST(NEON, TX_32X16, aom_dc_predictor_32x16_neon,
+ aom_dc_left_predictor_32x16_neon,
+ aom_dc_top_predictor_32x16_neon,
+ aom_dc_128_predictor_32x16_neon, aom_v_predictor_32x16_neon,
+ aom_h_predictor_32x16_neon, aom_paeth_predictor_32x16_neon,
aom_smooth_predictor_32x16_neon,
aom_smooth_v_predictor_32x16_neon,
aom_smooth_h_predictor_32x16_neon)
-INTRA_PRED_TEST(NEON, TX_32X64, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_32x64_neon,
+INTRA_PRED_TEST(NEON, TX_32X64, aom_dc_predictor_32x64_neon,
+ aom_dc_left_predictor_32x64_neon,
+ aom_dc_top_predictor_32x64_neon,
+ aom_dc_128_predictor_32x64_neon, aom_v_predictor_32x64_neon,
+ aom_h_predictor_32x64_neon, aom_paeth_predictor_32x64_neon,
aom_smooth_predictor_32x64_neon,
aom_smooth_v_predictor_32x64_neon,
aom_smooth_h_predictor_32x64_neon)
-INTRA_PRED_TEST(NEON, TX_32X8, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_32x8_neon,
+INTRA_PRED_TEST(NEON, TX_32X8, aom_dc_predictor_32x8_neon,
+ aom_dc_left_predictor_32x8_neon, aom_dc_top_predictor_32x8_neon,
+ aom_dc_128_predictor_32x8_neon, aom_v_predictor_32x8_neon,
+ aom_h_predictor_32x8_neon, aom_paeth_predictor_32x8_neon,
aom_smooth_predictor_32x8_neon,
aom_smooth_v_predictor_32x8_neon,
aom_smooth_h_predictor_32x8_neon)
@@ -905,18 +933,27 @@
#endif
#if HAVE_NEON
-INTRA_PRED_TEST(NEON, TX_64X64, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_64x64_neon,
+INTRA_PRED_TEST(NEON, TX_64X64, aom_dc_predictor_64x64_neon,
+ aom_dc_left_predictor_64x64_neon,
+ aom_dc_top_predictor_64x64_neon,
+ aom_dc_128_predictor_64x64_neon, aom_v_predictor_64x64_neon,
+ aom_h_predictor_64x64_neon, aom_paeth_predictor_64x64_neon,
aom_smooth_predictor_64x64_neon,
aom_smooth_v_predictor_64x64_neon,
aom_smooth_h_predictor_64x64_neon)
-INTRA_PRED_TEST(NEON, TX_64X32, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_64x32_neon,
+INTRA_PRED_TEST(NEON, TX_64X32, aom_dc_predictor_64x32_neon,
+ aom_dc_left_predictor_64x32_neon,
+ aom_dc_top_predictor_64x32_neon,
+ aom_dc_128_predictor_64x32_neon, aom_v_predictor_64x32_neon,
+ aom_h_predictor_64x32_neon, aom_paeth_predictor_64x32_neon,
aom_smooth_predictor_64x32_neon,
aom_smooth_v_predictor_64x32_neon,
aom_smooth_h_predictor_64x32_neon)
-INTRA_PRED_TEST(NEON, TX_64X16, nullptr, nullptr, nullptr, nullptr, nullptr,
- nullptr, aom_paeth_predictor_64x16_neon,
+INTRA_PRED_TEST(NEON, TX_64X16, aom_dc_predictor_64x16_neon,
+ aom_dc_left_predictor_64x16_neon,
+ aom_dc_top_predictor_64x16_neon,
+ aom_dc_128_predictor_64x16_neon, aom_v_predictor_64x16_neon,
+ aom_h_predictor_64x16_neon, aom_paeth_predictor_64x16_neon,
aom_smooth_predictor_64x16_neon,
aom_smooth_v_predictor_64x16_neon,
aom_smooth_h_predictor_64x16_neon)
@@ -1268,20 +1305,32 @@
nullptr, nullptr)
#endif
#if HAVE_NEON
-HIGHBD_INTRA_PRED_TEST(NEON, TX_4X4, aom_highbd_dc_predictor_4x4_neon, nullptr,
- nullptr, nullptr, aom_highbd_v_predictor_4x4_neon,
- nullptr, aom_highbd_paeth_predictor_4x4_neon,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_4X4, aom_highbd_dc_predictor_4x4_neon,
+ aom_highbd_dc_left_predictor_4x4_neon,
+ aom_highbd_dc_top_predictor_4x4_neon,
+ aom_highbd_dc_128_predictor_4x4_neon,
+ aom_highbd_v_predictor_4x4_neon,
+ aom_highbd_h_predictor_4x4_neon,
+ aom_highbd_paeth_predictor_4x4_neon,
aom_highbd_smooth_predictor_4x4_neon,
aom_highbd_smooth_v_predictor_4x4_neon,
aom_highbd_smooth_h_predictor_4x4_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_4X8, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_4x8_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_4X8, aom_highbd_dc_predictor_4x8_neon,
+ aom_highbd_dc_left_predictor_4x8_neon,
+ aom_highbd_dc_top_predictor_4x8_neon,
+ aom_highbd_dc_128_predictor_4x8_neon,
+ aom_highbd_v_predictor_4x8_neon,
+ aom_highbd_h_predictor_4x8_neon,
aom_highbd_paeth_predictor_4x8_neon,
aom_highbd_smooth_predictor_4x8_neon,
aom_highbd_smooth_v_predictor_4x8_neon,
aom_highbd_smooth_h_predictor_4x8_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_4X16, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_4x16_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_4X16, aom_highbd_dc_predictor_4x16_neon,
+ aom_highbd_dc_left_predictor_4x16_neon,
+ aom_highbd_dc_top_predictor_4x16_neon,
+ aom_highbd_dc_128_predictor_4x16_neon,
+ aom_highbd_v_predictor_4x16_neon,
+ aom_highbd_h_predictor_4x16_neon,
aom_highbd_paeth_predictor_4x16_neon,
aom_highbd_smooth_predictor_4x16_neon,
aom_highbd_smooth_v_predictor_4x16_neon,
@@ -1350,26 +1399,42 @@
#endif
#if HAVE_NEON
-HIGHBD_INTRA_PRED_TEST(NEON, TX_8X8, aom_highbd_dc_predictor_8x8_neon, nullptr,
- nullptr, nullptr, aom_highbd_v_predictor_8x8_neon,
- nullptr, aom_highbd_paeth_predictor_8x8_neon,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_8X8, aom_highbd_dc_predictor_8x8_neon,
+ aom_highbd_dc_left_predictor_8x8_neon,
+ aom_highbd_dc_top_predictor_8x8_neon,
+ aom_highbd_dc_128_predictor_8x8_neon,
+ aom_highbd_v_predictor_8x8_neon,
+ aom_highbd_h_predictor_8x8_neon,
+ aom_highbd_paeth_predictor_8x8_neon,
aom_highbd_smooth_predictor_8x8_neon,
aom_highbd_smooth_v_predictor_8x8_neon,
aom_highbd_smooth_h_predictor_8x8_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_8X4, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_8x4_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_8X4, aom_highbd_dc_predictor_8x4_neon,
+ aom_highbd_dc_left_predictor_8x4_neon,
+ aom_highbd_dc_top_predictor_8x4_neon,
+ aom_highbd_dc_128_predictor_8x4_neon,
+ aom_highbd_v_predictor_8x4_neon,
+ aom_highbd_h_predictor_8x4_neon,
aom_highbd_paeth_predictor_8x4_neon,
aom_highbd_smooth_predictor_8x4_neon,
aom_highbd_smooth_v_predictor_8x4_neon,
aom_highbd_smooth_h_predictor_8x4_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_8X16, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_8x16_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_8X16, aom_highbd_dc_predictor_8x16_neon,
+ aom_highbd_dc_left_predictor_8x16_neon,
+ aom_highbd_dc_top_predictor_8x16_neon,
+ aom_highbd_dc_128_predictor_8x16_neon,
+ aom_highbd_v_predictor_8x16_neon,
+ aom_highbd_h_predictor_8x16_neon,
aom_highbd_paeth_predictor_8x16_neon,
aom_highbd_smooth_predictor_8x16_neon,
aom_highbd_smooth_v_predictor_8x16_neon,
aom_highbd_smooth_h_predictor_8x16_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_8X32, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_8x32_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_8X32, aom_highbd_dc_predictor_8x32_neon,
+ aom_highbd_dc_left_predictor_8x32_neon,
+ aom_highbd_dc_top_predictor_8x32_neon,
+ aom_highbd_dc_128_predictor_8x32_neon,
+ aom_highbd_v_predictor_8x32_neon,
+ aom_highbd_h_predictor_8x32_neon,
aom_highbd_paeth_predictor_8x32_neon,
aom_highbd_smooth_predictor_8x32_neon,
aom_highbd_smooth_v_predictor_8x32_neon,
@@ -1457,32 +1522,51 @@
#if HAVE_NEON
HIGHBD_INTRA_PRED_TEST(NEON, TX_16X16, aom_highbd_dc_predictor_16x16_neon,
- nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_16x16_neon, nullptr,
+ aom_highbd_dc_left_predictor_16x16_neon,
+ aom_highbd_dc_top_predictor_16x16_neon,
+ aom_highbd_dc_128_predictor_16x16_neon,
+ aom_highbd_v_predictor_16x16_neon,
+ aom_highbd_h_predictor_16x16_neon,
aom_highbd_paeth_predictor_16x16_neon,
aom_highbd_smooth_predictor_16x16_neon,
aom_highbd_smooth_v_predictor_16x16_neon,
aom_highbd_smooth_h_predictor_16x16_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_16X8, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_16x8_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_16X8, aom_highbd_dc_predictor_16x8_neon,
+ aom_highbd_dc_left_predictor_16x8_neon,
+ aom_highbd_dc_top_predictor_16x8_neon,
+ aom_highbd_dc_128_predictor_16x8_neon,
+ aom_highbd_v_predictor_16x8_neon,
+ aom_highbd_h_predictor_16x8_neon,
aom_highbd_paeth_predictor_16x8_neon,
aom_highbd_smooth_predictor_16x8_neon,
aom_highbd_smooth_v_predictor_16x8_neon,
aom_highbd_smooth_h_predictor_16x8_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_16X32, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_16x32_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_16X32, aom_highbd_dc_predictor_16x32_neon,
+ aom_highbd_dc_left_predictor_16x32_neon,
+ aom_highbd_dc_top_predictor_16x32_neon,
+ aom_highbd_dc_128_predictor_16x32_neon,
+ aom_highbd_v_predictor_16x32_neon,
+ aom_highbd_h_predictor_16x32_neon,
aom_highbd_paeth_predictor_16x32_neon,
aom_highbd_smooth_predictor_16x32_neon,
aom_highbd_smooth_v_predictor_16x32_neon,
aom_highbd_smooth_h_predictor_16x32_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_16X4, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_16x4_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_16X4, aom_highbd_dc_predictor_16x4_neon,
+ aom_highbd_dc_left_predictor_16x4_neon,
+ aom_highbd_dc_top_predictor_16x4_neon,
+ aom_highbd_dc_128_predictor_16x4_neon,
+ aom_highbd_v_predictor_16x4_neon,
+ aom_highbd_h_predictor_16x4_neon,
aom_highbd_paeth_predictor_16x4_neon,
aom_highbd_smooth_predictor_16x4_neon,
aom_highbd_smooth_v_predictor_16x4_neon,
aom_highbd_smooth_h_predictor_16x4_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_16X64, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_16x64_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_16X64, aom_highbd_dc_predictor_16x64_neon,
+ aom_highbd_dc_left_predictor_16x64_neon,
+ aom_highbd_dc_top_predictor_16x64_neon,
+ aom_highbd_dc_128_predictor_16x64_neon,
+ aom_highbd_v_predictor_16x64_neon,
+ aom_highbd_h_predictor_16x64_neon,
aom_highbd_paeth_predictor_16x64_neon,
aom_highbd_smooth_predictor_16x64_neon,
aom_highbd_smooth_v_predictor_16x64_neon,
@@ -1553,26 +1637,41 @@
#if HAVE_NEON
HIGHBD_INTRA_PRED_TEST(NEON, TX_32X32, aom_highbd_dc_predictor_32x32_neon,
- nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_32x32_neon, nullptr,
+ aom_highbd_dc_left_predictor_32x32_neon,
+ aom_highbd_dc_top_predictor_32x32_neon,
+ aom_highbd_dc_128_predictor_32x32_neon,
+ aom_highbd_v_predictor_32x32_neon,
+ aom_highbd_h_predictor_32x32_neon,
aom_highbd_paeth_predictor_32x32_neon,
aom_highbd_smooth_predictor_32x32_neon,
aom_highbd_smooth_v_predictor_32x32_neon,
aom_highbd_smooth_h_predictor_32x32_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_32X16, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_32x16_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_32X16, aom_highbd_dc_predictor_32x16_neon,
+ aom_highbd_dc_left_predictor_32x16_neon,
+ aom_highbd_dc_top_predictor_32x16_neon,
+ aom_highbd_dc_128_predictor_32x16_neon,
+ aom_highbd_v_predictor_32x16_neon,
+ aom_highbd_h_predictor_32x16_neon,
aom_highbd_paeth_predictor_32x16_neon,
aom_highbd_smooth_predictor_32x16_neon,
aom_highbd_smooth_v_predictor_32x16_neon,
aom_highbd_smooth_h_predictor_32x16_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_32X64, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_32x64_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_32X64, aom_highbd_dc_predictor_32x64_neon,
+ aom_highbd_dc_left_predictor_32x64_neon,
+ aom_highbd_dc_top_predictor_32x64_neon,
+ aom_highbd_dc_128_predictor_32x64_neon,
+ aom_highbd_v_predictor_32x64_neon,
+ aom_highbd_h_predictor_32x64_neon,
aom_highbd_paeth_predictor_32x64_neon,
aom_highbd_smooth_predictor_32x64_neon,
aom_highbd_smooth_v_predictor_32x64_neon,
aom_highbd_smooth_h_predictor_32x64_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_32X8, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_32x8_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_32X8, aom_highbd_dc_predictor_32x8_neon,
+ aom_highbd_dc_left_predictor_32x8_neon,
+ aom_highbd_dc_top_predictor_32x8_neon,
+ aom_highbd_dc_128_predictor_32x8_neon,
+ aom_highbd_v_predictor_32x8_neon,
+ aom_highbd_h_predictor_32x8_neon,
aom_highbd_paeth_predictor_32x8_neon,
aom_highbd_smooth_predictor_32x8_neon,
aom_highbd_smooth_v_predictor_32x8_neon,
@@ -1606,20 +1705,31 @@
#if HAVE_NEON
HIGHBD_INTRA_PRED_TEST(NEON, TX_64X64, aom_highbd_dc_predictor_64x64_neon,
- nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_64x64_neon, nullptr,
+ aom_highbd_dc_left_predictor_64x64_neon,
+ aom_highbd_dc_top_predictor_64x64_neon,
+ aom_highbd_dc_128_predictor_64x64_neon,
+ aom_highbd_v_predictor_64x64_neon,
+ aom_highbd_h_predictor_64x64_neon,
aom_highbd_paeth_predictor_64x64_neon,
aom_highbd_smooth_predictor_64x64_neon,
aom_highbd_smooth_v_predictor_64x64_neon,
aom_highbd_smooth_h_predictor_64x64_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_64X32, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_64x32_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_64X32, aom_highbd_dc_predictor_64x32_neon,
+ aom_highbd_dc_left_predictor_64x32_neon,
+ aom_highbd_dc_top_predictor_64x32_neon,
+ aom_highbd_dc_128_predictor_64x32_neon,
+ aom_highbd_v_predictor_64x32_neon,
+ aom_highbd_h_predictor_64x32_neon,
aom_highbd_paeth_predictor_64x32_neon,
aom_highbd_smooth_predictor_64x32_neon,
aom_highbd_smooth_v_predictor_64x32_neon,
aom_highbd_smooth_h_predictor_64x32_neon)
-HIGHBD_INTRA_PRED_TEST(NEON, TX_64X16, nullptr, nullptr, nullptr, nullptr,
- aom_highbd_v_predictor_64x16_neon, nullptr,
+HIGHBD_INTRA_PRED_TEST(NEON, TX_64X16, aom_highbd_dc_predictor_64x16_neon,
+ aom_highbd_dc_left_predictor_64x16_neon,
+ aom_highbd_dc_top_predictor_64x16_neon,
+ aom_highbd_dc_128_predictor_64x16_neon,
+ aom_highbd_v_predictor_64x16_neon,
+ aom_highbd_h_predictor_64x16_neon,
aom_highbd_paeth_predictor_64x16_neon,
aom_highbd_smooth_predictor_64x16_neon,
aom_highbd_smooth_v_predictor_64x16_neon,
diff --git a/test/test_libaom.cc b/test/test_libaom.cc
index b55d762..6ffbbc5 100644
--- a/test/test_libaom.cc
+++ b/test/test_libaom.cc
@@ -17,7 +17,7 @@
#include "config/aom_config.h"
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
#include "aom_ports/x86.h"
#endif
extern "C" {
@@ -26,30 +26,30 @@
extern void aom_scale_rtcd();
}
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
static void append_negative_gtest_filter(const char *str) {
- std::string filter = ::testing::FLAGS_gtest_filter;
+ std::string flag_value = GTEST_FLAG_GET(filter);
// Negative patterns begin with one '-' followed by a ':' separated list.
- if (filter.find('-') == std::string::npos) filter += '-';
+ if (flag_value.find('-') == std::string::npos) flag_value += '-';
// OPT.* matches TEST() functions
// OPT/* matches TEST_P() functions
// OPT_* matches tests which have been manually sharded.
// We do not match OPT* because of SSE/SSE2 collisions.
const char *search_terminators = "./_";
for (size_t pos = 0; pos < strlen(search_terminators); ++pos) {
- filter += ":";
- filter += str;
- filter += search_terminators[pos];
- filter += "*";
+ flag_value += ":";
+ flag_value += str;
+ flag_value += search_terminators[pos];
+ flag_value += "*";
}
- ::testing::FLAGS_gtest_filter = filter;
+ GTEST_FLAG_SET(filter, flag_value);
}
-#endif // ARCH_X86 || ARCH_X86_64
+#endif // AOM_ARCH_X86 || AOM_ARCH_X86_64
int main(int argc, char **argv) {
::testing::InitGoogleTest(&argc, argv);
-#if ARCH_X86 || ARCH_X86_64
+#if AOM_ARCH_X86 || AOM_ARCH_X86_64
const int simd_caps = x86_simd_caps();
if (!(simd_caps & HAS_MMX)) append_negative_gtest_filter("MMX");
if (!(simd_caps & HAS_SSE)) append_negative_gtest_filter("SSE");
@@ -60,7 +60,7 @@
if (!(simd_caps & HAS_SSE4_2)) append_negative_gtest_filter("SSE4_2");
if (!(simd_caps & HAS_AVX)) append_negative_gtest_filter("AVX");
if (!(simd_caps & HAS_AVX2)) append_negative_gtest_filter("AVX2");
-#endif // ARCH_X86 || ARCH_X86_64
+#endif // AOM_ARCH_X86 || AOM_ARCH_X86_64
// Shared library builds don't support whitebox tests that exercise internal
// symbols.
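The test_libaom.cc hunks replace direct access to the private ::testing::FLAGS_gtest_filter variable with the public GTEST_FLAG_GET / GTEST_FLAG_SET accessors available in current GoogleTest. A condensed sketch of the effect: starting from a filter of "*", filtering out AVX2 yields "*-:AVX2.*:AVX2/*:AVX2_*".

#include <string>
#include "gtest/gtest.h"

// Append negative patterns for one ISA suffix, creating the '-' section on
// first use (mirrors append_negative_gtest_filter above).
static void drop_isa_tests(const char *isa) {
  std::string flag_value = GTEST_FLAG_GET(filter);
  if (flag_value.find('-') == std::string::npos) flag_value += '-';
  for (const char *t = "./_"; *t != '\0'; ++t) {
    flag_value += ':';
    flag_value += isa;
    flag_value += *t;
    flag_value += '*';
  }
  GTEST_FLAG_SET(filter, flag_value);
}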
diff --git a/test/tpl_model_test.cc b/test/tpl_model_test.cc
index 674f202..91eb5e9 100644
--- a/test/tpl_model_test.cc
+++ b/test/tpl_model_test.cc
@@ -202,6 +202,7 @@
}
}
+#if CONFIG_BITRATE_ACCURACY
TEST(TplModelTest, TxfmStatsAccumulateTest) {
TplTxfmStats sub_stats;
av1_init_tpl_txfm_stats(&sub_stats);
@@ -248,6 +249,7 @@
EXPECT_DOUBLE_EQ(stats2.abs_coeff_sum[i], 2 * stats1.abs_coeff_sum[i]);
}
}
+#endif // CONFIG_BITRATE_ACCURACY
TEST(TplModelTest, ComputeMVDifferenceTest) {
TplDepFrame tpl_frame_small;
@@ -418,7 +420,7 @@
double min_bits_diff = fabs(curr_estimate - bit_budget);
// Start at q = 254 because we already have an estimate for q = 255.
for (int q = 254; q >= 0; q--) {
- double curr_estimate = av1_vbr_rc_info_estimate_gop_bitrate(
+ curr_estimate = av1_vbr_rc_info_estimate_gop_bitrate(
q, bit_depth, update_type_scale_factors, frame_count, update_type_list,
qstep_ratio_list, stats_list, q_index_list, estimated_bitrate_byframe);
double bits_diff = fabs(curr_estimate - bit_budget);
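The last tpl_model_test.cc hunk removes a redeclaration: the loop previously introduced a second double curr_estimate that shadowed the outer variable of the same name, which -Wshadow flags and which left the outer variable stuck at its q = 255 estimate. A minimal sketch of the pattern, with hypothetical values:

#include <cstdio>

int main() {
  double curr_estimate = 255.0;
  for (int q = 254; q >= 253; q--) {
    double curr_estimate = (double)q;  // Shadows the outer variable.
    (void)curr_estimate;
  }
  // Prints 255.0: the loop never updated the outer variable. Dropping the
  // inner 'double', as the patch does, turns this into a plain assignment.
  printf("%f\n", curr_estimate);
  return 0;
}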
diff --git a/test/variance_test.cc b/test/variance_test.cc
index 25b8c8d..2863aea 100644
--- a/test/variance_test.cc
+++ b/test/variance_test.cc
@@ -42,6 +42,11 @@
uint32_t *sse8x8, int *sum8x8,
unsigned int *tot_sse, int *tot_sum,
uint32_t *var8x8);
+typedef void (*GetSseSum16x16DualFunc)(const uint8_t *a, int a_stride,
+ const uint8_t *b, int b_stride,
+ uint32_t *sse16x16,
+ unsigned int *tot_sse, int *tot_sum,
+ uint32_t *var16x16);
typedef unsigned int (*SubpixVarMxNFunc)(const uint8_t *a, int a_stride,
int xoffset, int yoffset,
const uint8_t *b, int b_stride,
@@ -51,8 +56,6 @@
const uint8_t *b, int b_stride,
uint32_t *sse,
const uint8_t *second_pred);
-typedef unsigned int (*Get4x4SseFunc)(const uint8_t *a, int a_stride,
- const uint8_t *b, int b_stride);
typedef unsigned int (*SumOfSquaresFunction)(const int16_t *src);
typedef unsigned int (*DistWtdSubpixAvgVarMxNFunc)(
const uint8_t *a, int a_stride, int xoffset, int yoffset, const uint8_t *b,
@@ -707,6 +710,12 @@
void MaxTestSseSum();
void SseSum_SpeedTest();
+ // SSE&SUM dual tests
+ void RefTestSseSumDual();
+ void MinTestSseSumDual();
+ void MaxTestSseSumDual();
+ void SseSum_SpeedTestDual();
+
// MSE/SSE tests
void RefTestMse();
void RefTestSse();
@@ -833,9 +842,11 @@
if (!use_high_bit_depth()) {
src_[j] = rnd_.Rand8();
ref_[j] = rnd_.Rand8();
+#if CONFIG_AV1_HIGHBITDEPTH
} else {
CONVERT_TO_SHORTPTR(src_)[j] = rnd_.Rand16() & mask();
CONVERT_TO_SHORTPTR(ref_)[j] = rnd_.Rand16() & mask();
+#endif // CONFIG_AV1_HIGHBITDEPTH
}
}
unsigned int sse;
@@ -872,14 +883,15 @@
const int stride = width();
int k = 0;
- for (int i = 0; i < height(); i += 8) {
- for (int j = 0; j < width(); j += 32) {
- API_REGISTER_STATE_CHECK(params_.func(
- src_ + stride * i + j, stride, ref_ + stride * i + j, stride,
- &sse1[k], &sum1[k], &sse_tot_simd, &sum_tot_simd, &var1[k]));
+ for (int row = 0; row < height(); row += 8) {
+ for (int col = 0; col < width(); col += 32) {
+ API_REGISTER_STATE_CHECK(params_.func(src_ + stride * row + col, stride,
+ ref_ + stride * row + col, stride,
+ &sse1[k], &sum1[k], &sse_tot_simd,
+ &sum_tot_simd, &var1[k]));
aom_get_var_sse_sum_8x8_quad_c(
- src_ + stride * i + j, stride, ref_ + stride * i + j, stride,
- &sse2[k], &sum2[k], &sse_tot_c, &sum_tot_c, &var2[k]);
+ src_ + stride * row + col, stride, ref_ + stride * row + col,
+ stride, &sse2[k], &sum2[k], &sse_tot_c, &sum_tot_c, &var2[k]);
k += 4;
}
}
@@ -976,12 +988,12 @@
ref_[j] = rnd_.Rand8();
}
- unsigned int sse1 = 0;
- unsigned int sse2 = 0;
- unsigned int var1 = 0;
- unsigned int var2 = 0;
- int sum1 = 0;
- int sum2 = 0;
+ unsigned int sse1[4] = { 0 };
+ unsigned int sse2[4] = { 0 };
+ unsigned int var1[4] = { 0 };
+ unsigned int var2[4] = { 0 };
+ int sum1[4] = { 0 };
+ int sum2[4] = { 0 };
unsigned int sse_tot_c = 0;
unsigned int sse_tot_simd = 0;
int sum_tot_c = 0;
@@ -994,8 +1006,8 @@
for (int i = 0; i < height(); i += 8) {
for (int j = 0; j < width(); j += 32) {
aom_get_var_sse_sum_8x8_quad_c(src_ + stride * i + j, stride,
- ref_ + stride * i + j, stride, &sse2,
- &sum2, &sse_tot_c, &sum_tot_c, &var2);
+ ref_ + stride * i + j, stride, sse2,
+ sum2, &sse_tot_c, &sum_tot_c, var2);
}
}
}
@@ -1008,7 +1020,7 @@
for (int i = 0; i < height(); i += 8) {
for (int j = 0; j < width(); j += 32) {
params_.func(src_ + stride * i + j, stride, ref_ + stride * i + j,
- stride, &sse1, &sum1, &sse_tot_simd, &sum_tot_simd, &var1);
+ stride, sse1, sum1, &sse_tot_simd, &sum_tot_simd, var1);
}
}
}
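The hunk below adds reference, extreme-value, and speed tests for the new aom_get_var_sse_sum_16x16_dual functions, which report per-16x16-block SSE and variance while accumulating running totals over two adjacent blocks. For a single block the expected relationship, sketched here with reference semantics only, is var = sse - sum^2 / 256:

#include <cstdint>

static uint32_t variance_16x16_sketch(const uint8_t *src, int src_stride,
                                      const uint8_t *ref, int ref_stride,
                                      uint32_t *sse_out, int *sum_out) {
  uint32_t sse = 0;
  int sum = 0;
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += d;
      sse += (uint32_t)(d * d);
    }
  }
  *sse_out = sse;
  *sum_out = sum;
  // var = sse - sum^2 / N, with N = 256 pixels in a 16x16 block.
  return sse - (uint32_t)(((int64_t)sum * sum) / 256);
}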
@@ -1022,6 +1034,171 @@
width(), height(), elapsed_time_ref, elapsed_time_simd,
elapsed_time_ref / elapsed_time_simd);
}
+
+template <typename GetSseSum16x16DualFuncType>
+void MainTestClass<GetSseSum16x16DualFuncType>::RefTestSseSumDual() {
+ for (int iter = 0; iter < 10; ++iter) {
+ for (int idx = 0; idx < block_size(); ++idx) {
+ src_[idx] = rnd_.Rand8();
+ ref_[idx] = rnd_.Rand8();
+ }
+ unsigned int sse1[64] = { 0 };
+ unsigned int sse2[64] = { 0 };
+ unsigned int var1[64] = { 0 };
+ unsigned int var2[64] = { 0 };
+ unsigned int sse_tot_c = 0;
+ unsigned int sse_tot_simd = 0;
+ int sum_tot_c = 0;
+ int sum_tot_simd = 0;
+ const int stride = width();
+ int k = 0;
+
+ for (int row = 0; row < height(); row += 16) {
+ for (int col = 0; col < width(); col += 32) {
+ API_REGISTER_STATE_CHECK(params_.func(
+ src_ + stride * row + col, stride, ref_ + stride * row + col,
+ stride, &sse1[k], &sse_tot_simd, &sum_tot_simd, &var1[k]));
+ aom_get_var_sse_sum_16x16_dual_c(
+ src_ + stride * row + col, stride, ref_ + stride * row + col,
+ stride, &sse2[k], &sse_tot_c, &sum_tot_c, &var2[k]);
+ k += 2;
+ }
+ }
+ EXPECT_EQ(sse_tot_c, sse_tot_simd);
+ EXPECT_EQ(sum_tot_c, sum_tot_simd);
+    for (int p = 0; p < 64; p++) {
+      EXPECT_EQ(sse1[p], sse2[p]);
+      EXPECT_EQ(var1[p], var2[p]);
+    }
+ }
+}
+
+template <typename GetSseSum16x16DualFuncType>
+void MainTestClass<GetSseSum16x16DualFuncType>::MinTestSseSumDual() {
+ memset(src_, 0, block_size());
+ memset(ref_, 255, block_size());
+ unsigned int sse1[64] = { 0 };
+ unsigned int sse2[64] = { 0 };
+ unsigned int var1[64] = { 0 };
+ unsigned int var2[64] = { 0 };
+ unsigned int sse_tot_c = 0;
+ unsigned int sse_tot_simd = 0;
+ int sum_tot_c = 0;
+ int sum_tot_simd = 0;
+ const int stride = width();
+ int k = 0;
+
+ for (int row = 0; row < height(); row += 16) {
+ for (int col = 0; col < width(); col += 32) {
+ API_REGISTER_STATE_CHECK(params_.func(
+ src_ + stride * row + col, stride, ref_ + stride * row + col, stride,
+ &sse1[k], &sse_tot_simd, &sum_tot_simd, &var1[k]));
+ aom_get_var_sse_sum_16x16_dual_c(
+ src_ + stride * row + col, stride, ref_ + stride * row + col, stride,
+ &sse2[k], &sse_tot_c, &sum_tot_c, &var2[k]);
+ k += 2;
+ }
+ }
+ EXPECT_EQ(sse_tot_simd, sse_tot_c);
+ EXPECT_EQ(sum_tot_simd, sum_tot_c);
+ for (int p = 0; p < 64; p++) {
+ EXPECT_EQ(sse1[p], sse2[p]);
+ EXPECT_EQ(var1[p], var2[p]);
+ }
+}
+
+template <typename GetSseSum16x16DualFuncType>
+void MainTestClass<GetSseSum16x16DualFuncType>::MaxTestSseSumDual() {
+ memset(src_, 255, block_size());
+ memset(ref_, 0, block_size());
+ unsigned int sse1[64] = { 0 };
+ unsigned int sse2[64] = { 0 };
+ unsigned int var1[64] = { 0 };
+ unsigned int var2[64] = { 0 };
+ unsigned int sse_tot_c = 0;
+ unsigned int sse_tot_simd = 0;
+ int sum_tot_c = 0;
+ int sum_tot_simd = 0;
+ const int stride = width();
+ int k = 0;
+
+ for (int row = 0; row < height(); row += 16) {
+ for (int col = 0; col < width(); col += 32) {
+ API_REGISTER_STATE_CHECK(params_.func(
+ src_ + stride * row + col, stride, ref_ + stride * row + col, stride,
+ &sse1[k], &sse_tot_simd, &sum_tot_simd, &var1[k]));
+ aom_get_var_sse_sum_16x16_dual_c(
+ src_ + stride * row + col, stride, ref_ + stride * row + col, stride,
+ &sse2[k], &sse_tot_c, &sum_tot_c, &var2[k]);
+ k += 2;
+ }
+ }
+ EXPECT_EQ(sse_tot_c, sse_tot_simd);
+ EXPECT_EQ(sum_tot_c, sum_tot_simd);
+
+ for (int p = 0; p < 64; p++) {
+ EXPECT_EQ(sse1[p], sse2[p]);
+ EXPECT_EQ(var1[p], var2[p]);
+ }
+}
+
+template <typename GetSseSum16x16DualFuncType>
+void MainTestClass<GetSseSum16x16DualFuncType>::SseSum_SpeedTestDual() {
+ const int loop_count = 1000000000 / block_size();
+ for (int idx = 0; idx < block_size(); ++idx) {
+ src_[idx] = rnd_.Rand8();
+ ref_[idx] = rnd_.Rand8();
+ }
+
+ unsigned int sse1[2] = { 0 };
+ unsigned int sse2[2] = { 0 };
+ unsigned int var1[2] = { 0 };
+ unsigned int var2[2] = { 0 };
+ unsigned int sse_tot_c = 0;
+ unsigned int sse_tot_simd = 0;
+ int sum_tot_c = 0;
+ int sum_tot_simd = 0;
+ const int stride = width();
+
+ aom_usec_timer timer;
+ aom_usec_timer_start(&timer);
+ for (int r = 0; r < loop_count; ++r) {
+ for (int row = 0; row < height(); row += 16) {
+ for (int col = 0; col < width(); col += 32) {
+ aom_get_var_sse_sum_16x16_dual_c(src_ + stride * row + col, stride,
+ ref_ + stride * row + col, stride,
+ sse2, &sse_tot_c, &sum_tot_c, var2);
+ }
+ }
+ }
+ aom_usec_timer_mark(&timer);
+ const double elapsed_time_ref =
+ static_cast<double>(aom_usec_timer_elapsed(&timer));
+
+ aom_usec_timer_start(&timer);
+ for (int r = 0; r < loop_count; ++r) {
+ for (int row = 0; row < height(); row += 16) {
+ for (int col = 0; col < width(); col += 32) {
+ params_.func(src_ + stride * row + col, stride,
+ ref_ + stride * row + col, stride, sse1, &sse_tot_simd,
+ &sum_tot_simd, var1);
+ }
+ }
+ }
+ aom_usec_timer_mark(&timer);
+ const double elapsed_time_simd =
+ static_cast<double>(aom_usec_timer_elapsed(&timer));
+
+ printf(
+ "aom_getvar_16x16_dual for block=%dx%d : ref_time=%lf \t simd_time=%lf "
+ "\t "
+ "gain=%lf \n",
+ width(), height(), elapsed_time_ref, elapsed_time_simd,
+ elapsed_time_ref / elapsed_time_simd);
+}
+
////////////////////////////////////////////////////////////////////////////////
// Tests related to MSE / SSE.
@@ -1029,14 +1206,21 @@
void MainTestClass<FunctionType>::RefTestMse() {
for (int i = 0; i < 10; ++i) {
for (int j = 0; j < block_size(); ++j) {
- src_[j] = rnd_.Rand8();
- ref_[j] = rnd_.Rand8();
+ if (!use_high_bit_depth()) {
+ src_[j] = rnd_.Rand8();
+ ref_[j] = rnd_.Rand8();
+#if CONFIG_AV1_HIGHBITDEPTH
+ } else {
+ CONVERT_TO_SHORTPTR(src_)[j] = rnd_.Rand16() & mask();
+ CONVERT_TO_SHORTPTR(ref_)[j] = rnd_.Rand16() & mask();
+#endif // CONFIG_AV1_HIGHBITDEPTH
+ }
}
unsigned int sse1, sse2;
const int stride = width();
API_REGISTER_STATE_CHECK(params_.func(src_, stride, ref_, stride, &sse1));
variance_ref(src_, ref_, params_.log2width, params_.log2height, stride,
- stride, &sse2, false, AOM_BITS_8);
+ stride, &sse2, use_high_bit_depth(), params_.bit_depth);
EXPECT_EQ(sse1, sse2);
}
}
@@ -1060,11 +1244,25 @@
template <typename FunctionType>
void MainTestClass<FunctionType>::MaxTestMse() {
- memset(src_, 255, block_size());
- memset(ref_, 0, block_size());
+ int max_value = (1 << params_.bit_depth) - 1;
+ if (!use_high_bit_depth()) {
+ memset(src_, max_value, block_size());
+ memset(ref_, 0, block_size());
+#if CONFIG_AV1_HIGHBITDEPTH
+ } else {
+ aom_memset16(CONVERT_TO_SHORTPTR(src_), max_value, block_size());
+ aom_memset16(CONVERT_TO_SHORTPTR(ref_), 0, block_size());
+#endif // CONFIG_AV1_HIGHBITDEPTH
+ }
unsigned int sse;
API_REGISTER_STATE_CHECK(params_.func(src_, width(), ref_, width(), &sse));
- const unsigned int expected = block_size() * 255 * 255;
+ unsigned int expected = (unsigned int)block_size() * max_value * max_value;
+ switch (params_.bit_depth) {
+ case AOM_BITS_12: expected = ROUND_POWER_OF_TWO(expected, 8); break;
+ case AOM_BITS_10: expected = ROUND_POWER_OF_TWO(expected, 4); break;
+ case AOM_BITS_8:
+ default: break;
+ }
EXPECT_EQ(expected, sse);
}
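MaxTestMse now covers high bit-depth inputs: the raw expectation block_size * max_value^2 is rounded down by the same shift the highbd MSE kernels apply (8 bits for 12-bit, 4 bits for 10-bit, per the switch above). A worked check for the 12-bit 16x16 case, with an illustrative helper:

#include <cstdint>

// expected_hbd_max_sse(256, 4095, 8) == 16769025, since
// 256 * 4095 * 4095 = 4292870400 and (4292870400 + 128) >> 8 = 16769025.
static unsigned int expected_hbd_max_sse(unsigned int block_size,
                                         unsigned int max_value, int shift) {
  const uint64_t raw = (uint64_t)block_size * max_value * max_value;
  if (shift == 0) return (unsigned int)raw;  // 8-bit case: no rounding shift.
  return (unsigned int)((raw + ((uint64_t)1 << (shift - 1))) >> shift);
}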
@@ -1496,10 +1694,10 @@
typedef MseWxHTestClass<MseWxH16bitFunc> MseWxHTest;
typedef Mse16xHTestClass<Mse16xH16bitFunc> Mse16xHTest;
-typedef MainTestClass<Get4x4SseFunc> AvxSseTest;
typedef MainTestClass<VarianceMxNFunc> AvxMseTest;
typedef MainTestClass<VarianceMxNFunc> AvxVarianceTest;
typedef MainTestClass<GetSseSum8x8QuadFunc> GetSseSum8x8QuadTest;
+typedef MainTestClass<GetSseSum16x16DualFunc> GetSseSum16x16DualTest;
typedef SubpelVarianceTest<SubpixVarMxNFunc> AvxSubpelVarianceTest;
typedef SubpelVarianceTest<SubpixAvgVarMxNFunc> AvxSubpelAvgVarianceTest;
typedef SubpelVarianceTest<DistWtdSubpixAvgVarMxNFunc>
@@ -1510,8 +1708,6 @@
typedef TestParams<MseWxH16bitFunc> MseWxHParams;
typedef TestParams<Mse16xH16bitFunc> Mse16xHParams;
-TEST_P(AvxSseTest, RefSse) { RefTestSse(); }
-TEST_P(AvxSseTest, MaxSse) { MaxTestSse(); }
TEST_P(MseWxHTest, RefMse) { RefMatchTestMse(); }
TEST_P(MseWxHTest, DISABLED_SpeedMse) { SpeedTest(); }
TEST_P(Mse16xHTest, RefMse) { RefMatchTestMse(); }
@@ -1528,6 +1724,10 @@
TEST_P(GetSseSum8x8QuadTest, MinSseSum) { MinTestSseSum(); }
TEST_P(GetSseSum8x8QuadTest, MaxMseSum) { MaxTestSseSum(); }
TEST_P(GetSseSum8x8QuadTest, DISABLED_Speed) { SseSum_SpeedTest(); }
+TEST_P(GetSseSum16x16DualTest, RefMseSum) { RefTestSseSumDual(); }
+TEST_P(GetSseSum16x16DualTest, MinSseSum) { MinTestSseSumDual(); }
+TEST_P(GetSseSum16x16DualTest, MaxMseSum) { MaxTestSseSumDual(); }
+TEST_P(GetSseSum16x16DualTest, DISABLED_Speed) { SseSum_SpeedTestDual(); }
TEST_P(SumOfSquaresTest, Const) { ConstTest(); }
TEST_P(SumOfSquaresTest, Ref) { RefTest(); }
TEST_P(AvxSubpelVarianceTest, Ref) { RefTest(); }
@@ -1558,11 +1758,6 @@
INSTANTIATE_TEST_SUITE_P(C, SumOfSquaresTest,
::testing::Values(aom_get_mb_ss_c));
-typedef TestParams<Get4x4SseFunc> SseParams;
-INSTANTIATE_TEST_SUITE_P(C, AvxSseTest,
- ::testing::Values(SseParams(2, 2,
- &aom_get4x4sse_cs_c)));
-
typedef TestParams<VarianceMxNFunc> MseParams;
INSTANTIATE_TEST_SUITE_P(C, AvxMseTest,
::testing::Values(MseParams(4, 4, &aom_mse16x16_c),
@@ -1610,6 +1805,17 @@
INSTANTIATE_TEST_SUITE_P(C, GetSseSum8x8QuadTest,
::testing::ValuesIn(kArrayGetSseSum8x8Quad_c));
+typedef TestParams<GetSseSum16x16DualFunc> GetSseSumParamsDual;
+const GetSseSumParamsDual kArrayGetSseSum16x16Dual_c[] = {
+ GetSseSumParamsDual(7, 7, &aom_get_var_sse_sum_16x16_dual_c, 0),
+ GetSseSumParamsDual(6, 6, &aom_get_var_sse_sum_16x16_dual_c, 0),
+ GetSseSumParamsDual(5, 5, &aom_get_var_sse_sum_16x16_dual_c, 0),
+ GetSseSumParamsDual(5, 4, &aom_get_var_sse_sum_16x16_dual_c, 0)
+};
+
+INSTANTIATE_TEST_SUITE_P(C, GetSseSum16x16DualTest,
+ ::testing::ValuesIn(kArrayGetSseSum16x16Dual_c));
+
typedef TestParams<SubpixVarMxNFunc> SubpelVarianceParams;
const SubpelVarianceParams kArraySubpelVariance_c[] = {
SubpelVarianceParams(7, 7, &aom_sub_pixel_variance128x128_c, 0),
@@ -1865,6 +2071,7 @@
TEST_P(MseHBDWxHTest, DISABLED_SpeedMse) { SpeedTest(); }
TEST_P(AvxHBDMseTest, RefMse) { RefTestMse(); }
TEST_P(AvxHBDMseTest, MaxMse) { MaxTestMse(); }
+TEST_P(AvxHBDMseTest, DISABLED_SpeedMse) { SpeedTest(); }
TEST_P(AvxHBDVarianceTest, Zero) { ZeroTest(); }
TEST_P(AvxHBDVarianceTest, Ref) { RefTest(); }
TEST_P(AvxHBDVarianceTest, RefStride) { RefStrideTest(); }
@@ -1882,22 +2089,37 @@
MseHBDWxHParams(2, 3, &aom_mse_wxh_16bit_highbd_c, 10),
MseHBDWxHParams(2, 2, &aom_mse_wxh_16bit_highbd_c, 10)));
-/* TODO(debargha): This test does not support the highbd version
INSTANTIATE_TEST_SUITE_P(
C, AvxHBDMseTest,
- ::testing::Values(make_tuple(4, 4, &aom_highbd_12_mse16x16_c),
- make_tuple(4, 4, &aom_highbd_12_mse16x8_c),
- make_tuple(4, 4, &aom_highbd_12_mse8x16_c),
- make_tuple(4, 4, &aom_highbd_12_mse8x8_c),
- make_tuple(4, 4, &aom_highbd_10_mse16x16_c),
- make_tuple(4, 4, &aom_highbd_10_mse16x8_c),
- make_tuple(4, 4, &aom_highbd_10_mse8x16_c),
- make_tuple(4, 4, &aom_highbd_10_mse8x8_c),
- make_tuple(4, 4, &aom_highbd_8_mse16x16_c),
- make_tuple(4, 4, &aom_highbd_8_mse16x8_c),
- make_tuple(4, 4, &aom_highbd_8_mse8x16_c),
- make_tuple(4, 4, &aom_highbd_8_mse8x8_c)));
-*/
+ ::testing::Values(MseParams(4, 4, &aom_highbd_12_mse16x16_c, 12),
+ MseParams(4, 3, &aom_highbd_12_mse16x8_c, 12),
+ MseParams(3, 4, &aom_highbd_12_mse8x16_c, 12),
+ MseParams(3, 3, &aom_highbd_12_mse8x8_c, 12),
+ MseParams(4, 4, &aom_highbd_10_mse16x16_c, 10),
+ MseParams(4, 3, &aom_highbd_10_mse16x8_c, 10),
+ MseParams(3, 4, &aom_highbd_10_mse8x16_c, 10),
+ MseParams(3, 3, &aom_highbd_10_mse8x8_c, 10),
+ MseParams(4, 4, &aom_highbd_8_mse16x16_c, 8),
+ MseParams(4, 3, &aom_highbd_8_mse16x8_c, 8),
+ MseParams(3, 4, &aom_highbd_8_mse8x16_c, 8),
+ MseParams(3, 3, &aom_highbd_8_mse8x8_c, 8)));
+
+#if HAVE_NEON
+INSTANTIATE_TEST_SUITE_P(
+ NEON, AvxHBDMseTest,
+ ::testing::Values(MseParams(4, 4, &aom_highbd_12_mse16x16_neon, 12),
+ MseParams(4, 3, &aom_highbd_12_mse16x8_neon, 12),
+ MseParams(3, 4, &aom_highbd_12_mse8x16_neon, 12),
+ MseParams(3, 3, &aom_highbd_12_mse8x8_neon, 12),
+ MseParams(4, 4, &aom_highbd_10_mse16x16_neon, 10),
+ MseParams(4, 3, &aom_highbd_10_mse16x8_neon, 10),
+ MseParams(3, 4, &aom_highbd_10_mse8x16_neon, 10),
+ MseParams(3, 3, &aom_highbd_10_mse8x8_neon, 10),
+ MseParams(4, 4, &aom_highbd_8_mse16x16_neon, 8),
+ MseParams(4, 3, &aom_highbd_8_mse16x8_neon, 8),
+ MseParams(3, 4, &aom_highbd_8_mse8x16_neon, 8),
+ MseParams(3, 3, &aom_highbd_8_mse8x8_neon, 8)));
+#endif // HAVE_NEON
const VarianceParams kArrayHBDVariance_c[] = {
VarianceParams(7, 7, &aom_highbd_12_variance128x128_c, 12),
@@ -2351,6 +2573,15 @@
INSTANTIATE_TEST_SUITE_P(SSE2, GetSseSum8x8QuadTest,
::testing::ValuesIn(kArrayGetSseSum8x8Quad_sse2));
+const GetSseSumParamsDual kArrayGetSseSum16x16Dual_sse2[] = {
+ GetSseSumParamsDual(7, 7, &aom_get_var_sse_sum_16x16_dual_sse2, 0),
+ GetSseSumParamsDual(6, 6, &aom_get_var_sse_sum_16x16_dual_sse2, 0),
+ GetSseSumParamsDual(5, 5, &aom_get_var_sse_sum_16x16_dual_sse2, 0),
+ GetSseSumParamsDual(5, 4, &aom_get_var_sse_sum_16x16_dual_sse2, 0)
+};
+INSTANTIATE_TEST_SUITE_P(SSE2, GetSseSum16x16DualTest,
+ ::testing::ValuesIn(kArrayGetSseSum16x16Dual_sse2));
+
const SubpelVarianceParams kArraySubpelVariance_sse2[] = {
SubpelVarianceParams(7, 7, &aom_sub_pixel_variance128x128_sse2, 0),
SubpelVarianceParams(7, 6, &aom_sub_pixel_variance128x64_sse2, 0),
@@ -2444,22 +2675,14 @@
12)));
#endif // HAVE_SSE4_1
-/* TODO(debargha): This test does not support the highbd version
INSTANTIATE_TEST_SUITE_P(
SSE2, AvxHBDMseTest,
- ::testing::Values(MseParams(4, 4, &aom_highbd_12_mse16x16_sse2),
- MseParams(4, 3, &aom_highbd_12_mse16x8_sse2),
- MseParams(3, 4, &aom_highbd_12_mse8x16_sse2),
- MseParams(3, 3, &aom_highbd_12_mse8x8_sse2),
- MseParams(4, 4, &aom_highbd_10_mse16x16_sse2),
- MseParams(4, 3, &aom_highbd_10_mse16x8_sse2),
- MseParams(3, 4, &aom_highbd_10_mse8x16_sse2),
- MseParams(3, 3, &aom_highbd_10_mse8x8_sse2),
- MseParams(4, 4, &aom_highbd_8_mse16x16_sse2),
- MseParams(4, 3, &aom_highbd_8_mse16x8_sse2),
- MseParams(3, 4, &aom_highbd_8_mse8x16_sse2),
- MseParams(3, 3, &aom_highbd_8_mse8x8_sse2)));
-*/
+ ::testing::Values(MseParams(4, 4, &aom_highbd_12_mse16x16_sse2, 12),
+ MseParams(3, 3, &aom_highbd_12_mse8x8_sse2, 12),
+ MseParams(4, 4, &aom_highbd_10_mse16x16_sse2, 10),
+ MseParams(3, 3, &aom_highbd_10_mse8x8_sse2, 10),
+ MseParams(4, 4, &aom_highbd_8_mse16x16_sse2, 8),
+ MseParams(3, 3, &aom_highbd_8_mse8x8_sse2, 8)));
const VarianceParams kArrayHBDVariance_sse2[] = {
VarianceParams(7, 7, &aom_highbd_12_variance128x128_sse2, 12),
@@ -2969,6 +3192,15 @@
INSTANTIATE_TEST_SUITE_P(AVX2, GetSseSum8x8QuadTest,
::testing::ValuesIn(kArrayGetSseSum8x8Quad_avx2));
+const GetSseSumParamsDual kArrayGetSseSum16x16Dual_avx2[] = {
+ GetSseSumParamsDual(7, 7, &aom_get_var_sse_sum_16x16_dual_avx2, 0),
+ GetSseSumParamsDual(6, 6, &aom_get_var_sse_sum_16x16_dual_avx2, 0),
+ GetSseSumParamsDual(5, 5, &aom_get_var_sse_sum_16x16_dual_avx2, 0),
+ GetSseSumParamsDual(5, 4, &aom_get_var_sse_sum_16x16_dual_avx2, 0)
+};
+INSTANTIATE_TEST_SUITE_P(AVX2, GetSseSum16x16DualTest,
+ ::testing::ValuesIn(kArrayGetSseSum16x16Dual_avx2));
+
const SubpelVarianceParams kArraySubpelVariance_avx2[] = {
SubpelVarianceParams(7, 7, &aom_sub_pixel_variance128x128_avx2, 0),
SubpelVarianceParams(7, 6, &aom_sub_pixel_variance128x64_avx2, 0),
@@ -3015,10 +3247,6 @@
MseWxHParams(2, 3, &aom_mse_wxh_16bit_neon, 8),
MseWxHParams(2, 2, &aom_mse_wxh_16bit_neon, 8)));
-INSTANTIATE_TEST_SUITE_P(NEON, AvxSseTest,
- ::testing::Values(SseParams(2, 2,
- &aom_get4x4sse_cs_neon)));
-
INSTANTIATE_TEST_SUITE_P(NEON, AvxMseTest,
::testing::Values(MseParams(3, 3, &aom_mse8x8_neon),
MseParams(3, 4, &aom_mse8x16_neon),
@@ -3114,6 +3342,35 @@
INSTANTIATE_TEST_SUITE_P(NEON, AvxSubpelAvgVarianceTest,
::testing::ValuesIn(kArraySubpelAvgVariance_neon));
+#if !CONFIG_REALTIME_ONLY
+const ObmcSubpelVarianceParams kArrayObmcSubpelVariance_neon[] = {
+ ObmcSubpelVarianceParams(7, 7, &aom_obmc_sub_pixel_variance128x128_neon, 0),
+ ObmcSubpelVarianceParams(7, 6, &aom_obmc_sub_pixel_variance128x64_neon, 0),
+ ObmcSubpelVarianceParams(6, 7, &aom_obmc_sub_pixel_variance64x128_neon, 0),
+ ObmcSubpelVarianceParams(6, 6, &aom_obmc_sub_pixel_variance64x64_neon, 0),
+ ObmcSubpelVarianceParams(6, 5, &aom_obmc_sub_pixel_variance64x32_neon, 0),
+ ObmcSubpelVarianceParams(5, 6, &aom_obmc_sub_pixel_variance32x64_neon, 0),
+ ObmcSubpelVarianceParams(5, 5, &aom_obmc_sub_pixel_variance32x32_neon, 0),
+ ObmcSubpelVarianceParams(5, 4, &aom_obmc_sub_pixel_variance32x16_neon, 0),
+ ObmcSubpelVarianceParams(4, 5, &aom_obmc_sub_pixel_variance16x32_neon, 0),
+ ObmcSubpelVarianceParams(4, 4, &aom_obmc_sub_pixel_variance16x16_neon, 0),
+ ObmcSubpelVarianceParams(4, 3, &aom_obmc_sub_pixel_variance16x8_neon, 0),
+ ObmcSubpelVarianceParams(3, 4, &aom_obmc_sub_pixel_variance8x16_neon, 0),
+ ObmcSubpelVarianceParams(3, 3, &aom_obmc_sub_pixel_variance8x8_neon, 0),
+ ObmcSubpelVarianceParams(3, 2, &aom_obmc_sub_pixel_variance8x4_neon, 0),
+ ObmcSubpelVarianceParams(2, 3, &aom_obmc_sub_pixel_variance4x8_neon, 0),
+ ObmcSubpelVarianceParams(2, 2, &aom_obmc_sub_pixel_variance4x4_neon, 0),
+ ObmcSubpelVarianceParams(6, 4, &aom_obmc_sub_pixel_variance64x16_neon, 0),
+ ObmcSubpelVarianceParams(4, 6, &aom_obmc_sub_pixel_variance16x64_neon, 0),
+ ObmcSubpelVarianceParams(5, 3, &aom_obmc_sub_pixel_variance32x8_neon, 0),
+ ObmcSubpelVarianceParams(3, 5, &aom_obmc_sub_pixel_variance8x32_neon, 0),
+ ObmcSubpelVarianceParams(4, 2, &aom_obmc_sub_pixel_variance16x4_neon, 0),
+ ObmcSubpelVarianceParams(2, 4, &aom_obmc_sub_pixel_variance4x16_neon, 0),
+};
+INSTANTIATE_TEST_SUITE_P(NEON, AvxObmcSubpelVarianceTest,
+ ::testing::ValuesIn(kArrayObmcSubpelVariance_neon));
+#endif  // !CONFIG_REALTIME_ONLY
+
const GetSseSumParams kArrayGetSseSum8x8Quad_neon[] = {
GetSseSumParams(7, 7, &aom_get_var_sse_sum_8x8_quad_neon, 0),
GetSseSumParams(6, 6, &aom_get_var_sse_sum_8x8_quad_neon, 0),
@@ -3123,8 +3380,33 @@
INSTANTIATE_TEST_SUITE_P(NEON, GetSseSum8x8QuadTest,
::testing::ValuesIn(kArrayGetSseSum8x8Quad_neon));
+const GetSseSumParamsDual kArrayGetSseSum16x16Dual_neon[] = {
+ GetSseSumParamsDual(7, 7, &aom_get_var_sse_sum_16x16_dual_neon, 0),
+ GetSseSumParamsDual(6, 6, &aom_get_var_sse_sum_16x16_dual_neon, 0),
+ GetSseSumParamsDual(5, 5, &aom_get_var_sse_sum_16x16_dual_neon, 0),
+ GetSseSumParamsDual(5, 4, &aom_get_var_sse_sum_16x16_dual_neon, 0)
+};
+INSTANTIATE_TEST_SUITE_P(NEON, GetSseSum16x16DualTest,
+ ::testing::ValuesIn(kArrayGetSseSum16x16Dual_neon));
+
#if CONFIG_AV1_HIGHBITDEPTH
const VarianceParams kArrayHBDVariance_neon[] = {
+ VarianceParams(7, 7, &aom_highbd_12_variance128x128_neon, 12),
+ VarianceParams(7, 6, &aom_highbd_12_variance128x64_neon, 12),
+ VarianceParams(6, 7, &aom_highbd_12_variance64x128_neon, 12),
+ VarianceParams(6, 6, &aom_highbd_12_variance64x64_neon, 12),
+ VarianceParams(6, 5, &aom_highbd_12_variance64x32_neon, 12),
+ VarianceParams(5, 6, &aom_highbd_12_variance32x64_neon, 12),
+ VarianceParams(5, 5, &aom_highbd_12_variance32x32_neon, 12),
+ VarianceParams(5, 4, &aom_highbd_12_variance32x16_neon, 12),
+ VarianceParams(4, 5, &aom_highbd_12_variance16x32_neon, 12),
+ VarianceParams(4, 4, &aom_highbd_12_variance16x16_neon, 12),
+ VarianceParams(4, 3, &aom_highbd_12_variance16x8_neon, 12),
+ VarianceParams(3, 4, &aom_highbd_12_variance8x16_neon, 12),
+ VarianceParams(3, 3, &aom_highbd_12_variance8x8_neon, 12),
+ VarianceParams(3, 2, &aom_highbd_12_variance8x4_neon, 12),
+ VarianceParams(2, 3, &aom_highbd_12_variance4x8_neon, 12),
+ VarianceParams(2, 2, &aom_highbd_12_variance4x4_neon, 12),
VarianceParams(7, 7, &aom_highbd_10_variance128x128_neon, 10),
VarianceParams(7, 6, &aom_highbd_10_variance128x64_neon, 10),
VarianceParams(6, 7, &aom_highbd_10_variance64x128_neon, 10),
@@ -3141,13 +3423,41 @@
VarianceParams(3, 2, &aom_highbd_10_variance8x4_neon, 10),
VarianceParams(2, 3, &aom_highbd_10_variance4x8_neon, 10),
VarianceParams(2, 2, &aom_highbd_10_variance4x4_neon, 10),
+ VarianceParams(7, 7, &aom_highbd_8_variance128x128_neon, 8),
+ VarianceParams(7, 6, &aom_highbd_8_variance128x64_neon, 8),
+ VarianceParams(6, 7, &aom_highbd_8_variance64x128_neon, 8),
+ VarianceParams(6, 6, &aom_highbd_8_variance64x64_neon, 8),
+ VarianceParams(6, 5, &aom_highbd_8_variance64x32_neon, 8),
+ VarianceParams(5, 6, &aom_highbd_8_variance32x64_neon, 8),
+ VarianceParams(5, 5, &aom_highbd_8_variance32x32_neon, 8),
+ VarianceParams(5, 4, &aom_highbd_8_variance32x16_neon, 8),
+ VarianceParams(4, 5, &aom_highbd_8_variance16x32_neon, 8),
+ VarianceParams(4, 4, &aom_highbd_8_variance16x16_neon, 8),
+ VarianceParams(4, 3, &aom_highbd_8_variance16x8_neon, 8),
+ VarianceParams(3, 4, &aom_highbd_8_variance8x16_neon, 8),
+ VarianceParams(3, 3, &aom_highbd_8_variance8x8_neon, 8),
+ VarianceParams(3, 2, &aom_highbd_8_variance8x4_neon, 8),
+ VarianceParams(2, 3, &aom_highbd_8_variance4x8_neon, 8),
+ VarianceParams(2, 2, &aom_highbd_8_variance4x4_neon, 8),
#if !CONFIG_REALTIME_ONLY
+ VarianceParams(6, 4, &aom_highbd_12_variance64x16_neon, 12),
+ VarianceParams(4, 6, &aom_highbd_12_variance16x64_neon, 12),
+ VarianceParams(5, 3, &aom_highbd_12_variance32x8_neon, 12),
+ VarianceParams(3, 5, &aom_highbd_12_variance8x32_neon, 12),
+ VarianceParams(4, 2, &aom_highbd_12_variance16x4_neon, 12),
+ VarianceParams(2, 4, &aom_highbd_12_variance4x16_neon, 12),
VarianceParams(6, 4, &aom_highbd_10_variance64x16_neon, 10),
VarianceParams(4, 6, &aom_highbd_10_variance16x64_neon, 10),
VarianceParams(5, 3, &aom_highbd_10_variance32x8_neon, 10),
VarianceParams(3, 5, &aom_highbd_10_variance8x32_neon, 10),
VarianceParams(4, 2, &aom_highbd_10_variance16x4_neon, 10),
VarianceParams(2, 4, &aom_highbd_10_variance4x16_neon, 10),
+ VarianceParams(6, 4, &aom_highbd_8_variance64x16_neon, 8),
+ VarianceParams(4, 6, &aom_highbd_8_variance16x64_neon, 8),
+ VarianceParams(5, 3, &aom_highbd_8_variance32x8_neon, 8),
+ VarianceParams(3, 5, &aom_highbd_8_variance8x32_neon, 8),
+ VarianceParams(4, 2, &aom_highbd_8_variance16x4_neon, 8),
+ VarianceParams(2, 4, &aom_highbd_8_variance4x16_neon, 8),
#endif
};
diff --git a/test/warp_filter_test_util.cc b/test/warp_filter_test_util.cc
index b4376d8..e42671e 100644
--- a/test/warp_filter_test_util.cc
+++ b/test/warp_filter_test_util.cc
@@ -185,7 +185,6 @@
const int is_delta_zero = GET_PARAM(4);
const int out_w = std::get<0>(params), out_h = std::get<1>(params);
const int num_iters = std::get<2>(params);
- int i, j, sub_x, sub_y;
const int bd = 8;
// The warp functions always write rows with widths that are multiples of 8.
@@ -209,7 +208,7 @@
ASSERT_NE(dstb, nullptr);
for (int i = 0; i < output_n; ++i) output[i] = output2[i] = rnd_.Rand8();
- for (i = 0; i < num_iters; ++i) {
+ for (int i = 0; i < num_iters; ++i) {
// Generate an input block and extend its borders horizontally
for (int r = 0; r < h; ++r)
for (int c = 0; c < w; ++c) input[r * stride + c] = rnd_.Rand8();
@@ -218,8 +217,8 @@
memset(input + r * stride + w, input[r * stride + (w - 1)], border);
}
const int use_no_round = rnd_.Rand8() & 1;
- for (sub_x = 0; sub_x < 2; ++sub_x)
- for (sub_y = 0; sub_y < 2; ++sub_y) {
+ for (int sub_x = 0; sub_x < 2; ++sub_x)
+ for (int sub_y = 0; sub_y < 2; ++sub_y) {
generate_warped_model(&rnd_, mat, &alpha, &beta, &gamma, &delta,
is_alpha_zero, is_beta_zero, is_gamma_zero,
is_delta_zero);
@@ -258,18 +257,18 @@
out_h, out_w, sub_x, sub_y, &conv_params, alpha, beta,
gamma, delta);
if (use_no_round) {
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(dsta[j], dstb[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
<< i;
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(output[j], output2[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
<< i;
} else {
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(output[j], output2[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
@@ -386,7 +385,6 @@
const int bd = std::get<3>(param);
const int num_iters = std::get<2>(param);
const int mask = (1 << bd) - 1;
- int i, j, sub_x, sub_y;
// The warp functions always write rows with widths that are multiples of 8.
// So to avoid a buffer overflow, we may need to pad rows to a multiple of 8.
@@ -409,7 +407,7 @@
ASSERT_NE(dstb, nullptr);
for (int i = 0; i < output_n; ++i) output[i] = output2[i] = rnd_.Rand16();
- for (i = 0; i < num_iters; ++i) {
+ for (int i = 0; i < num_iters; ++i) {
// Generate an input block and extend its borders horizontally
for (int r = 0; r < h; ++r)
for (int c = 0; c < w; ++c) input[r * stride + c] = rnd_.Rand16() & mask;
@@ -420,8 +418,8 @@
}
}
const int use_no_round = rnd_.Rand8() & 1;
- for (sub_x = 0; sub_x < 2; ++sub_x)
- for (sub_y = 0; sub_y < 2; ++sub_y) {
+ for (int sub_x = 0; sub_x < 2; ++sub_x)
+ for (int sub_y = 0; sub_y < 2; ++sub_y) {
generate_warped_model(&rnd_, mat, &alpha, &beta, &gamma, &delta,
is_alpha_zero, is_beta_zero, is_gamma_zero,
is_delta_zero);
@@ -464,18 +462,18 @@
beta, gamma, delta);
if (use_no_round) {
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(dsta[j], dstb[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
<< i;
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(output[j], output2[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
<< i;
} else {
- for (j = 0; j < out_w * out_h; ++j)
+ for (int j = 0; j < out_w * out_h; ++j)
ASSERT_EQ(output[j], output2[j])
<< "Pixel mismatch at index " << j << " = ("
<< (j % out_w) << ", " << (j / out_w) << ") on iteration "
diff --git a/test/wiener_test.cc b/test/wiener_test.cc
index d44dd92..8be6a64 100644
--- a/test/wiener_test.cc
+++ b/test/wiener_test.cc
@@ -35,11 +35,14 @@
// C implementation of the algorithm implemented by the SIMD code.
// This is a little more efficient than the version in av1_compute_stats_c().
static void compute_stats_win_opt_c(int wiener_win, const uint8_t *dgd,
- const uint8_t *src, int h_start, int h_end,
- int v_start, int v_end, int dgd_stride,
- int src_stride, int64_t *M, int64_t *H,
+ const uint8_t *src, int16_t *d, int16_t *s,
+ int h_start, int h_end, int v_start,
+ int v_end, int dgd_stride, int src_stride,
+ int64_t *M, int64_t *H,
int use_downsampled_wiener_stats) {
ASSERT_TRUE(wiener_win == WIENER_WIN || wiener_win == WIENER_WIN_CHROMA);
+ (void)d;
+ (void)s;
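+ // The d and s scratch buffers are there for the SIMD implementations of
+ // compute_stats; this C reference path does not need them, hence the
+ // (void) casts above.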
int i, j, k, l, m, n;
const int pixel_count = (h_end - h_start) * (v_end - v_start);
const int wiener_win2 = wiener_win * wiener_win;
@@ -156,23 +159,25 @@
}
void compute_stats_opt_c(int wiener_win, const uint8_t *dgd, const uint8_t *src,
- int h_start, int h_end, int v_start, int v_end,
- int dgd_stride, int src_stride, int64_t *M, int64_t *H,
+ int16_t *d, int16_t *s, int h_start, int h_end,
+ int v_start, int v_end, int dgd_stride, int src_stride,
+ int64_t *M, int64_t *H,
int use_downsampled_wiener_stats) {
if (wiener_win == WIENER_WIN || wiener_win == WIENER_WIN_CHROMA) {
- compute_stats_win_opt_c(wiener_win, dgd, src, h_start, h_end, v_start,
+ compute_stats_win_opt_c(wiener_win, dgd, src, d, s, h_start, h_end, v_start,
v_end, dgd_stride, src_stride, M, H,
use_downsampled_wiener_stats);
} else {
- av1_compute_stats_c(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M, H,
+ av1_compute_stats_c(wiener_win, dgd, src, d, s, h_start, h_end, v_start,
+ v_end, dgd_stride, src_stride, M, H,
use_downsampled_wiener_stats);
}
}
static const int kIterations = 100;
typedef void (*compute_stats_Func)(int wiener_win, const uint8_t *dgd,
- const uint8_t *src, int h_start, int h_end,
+ const uint8_t *src, int16_t *dgd_avg,
+ int16_t *src_avg, int h_start, int h_end,
int v_start, int v_end, int dgd_stride,
int src_stride, int64_t *M, int64_t *H,
int use_downsampled_wiener_stats);
@@ -192,11 +197,17 @@
dgd_buf = (uint8_t *)aom_memalign(
32, MAX_DATA_BLOCK * MAX_DATA_BLOCK * sizeof(*dgd_buf));
ASSERT_NE(dgd_buf, nullptr);
+ const int buf_size =
+ sizeof(*buf) * 6 * RESTORATION_UNITSIZE_MAX * RESTORATION_UNITSIZE_MAX;
+ buf = (int16_t *)aom_memalign(32, buf_size);
+ ASSERT_NE(buf, nullptr);
+ memset(buf, 0, buf_size);
target_func_ = GET_PARAM(0);
}
virtual void TearDown() {
aom_free(src_buf);
aom_free(dgd_buf);
+ aom_free(buf);
}
void RunWienerTest(const int32_t wiener_win, int32_t run_times);
void RunWienerTest_ExtremeValues(const int32_t wiener_win);
@@ -206,6 +217,7 @@
libaom_test::ACMRandom rng_;
uint8_t *src_buf;
uint8_t *dgd_buf;
+ int16_t *buf;
};
void WienerTest::RunWienerTest(const int32_t wiener_win, int32_t run_times) {
@@ -232,6 +244,9 @@
const int src_stride = MAX_DATA_BLOCK;
const int iters = run_times == 1 ? kIterations : 2;
const int max_value_downsample_stats = 1;
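+ // buf was allocated in SetUp() as 6 * RESTORATION_UNITSIZE_MAX *
+ // RESTORATION_UNITSIZE_MAX int16_t values; split it evenly between the dgd
+ // and src scratch averages passed to the compute_stats functions.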
+ int16_t *dgd_avg = buf;
+ int16_t *src_avg =
+ buf + (3 * RESTORATION_UNITSIZE_MAX * RESTORATION_UNITSIZE_MAX);
for (int iter = 0; iter < iters && !HasFatalFailure(); ++iter) {
for (int i = 0; i < MAX_DATA_BLOCK * MAX_DATA_BLOCK; ++i) {
@@ -246,16 +261,16 @@
aom_usec_timer timer;
aom_usec_timer_start(&timer);
for (int i = 0; i < run_times; ++i) {
- av1_compute_stats_c(wiener_win, dgd, src, h_start, h_end, v_start,
- v_end, dgd_stride, src_stride, M_ref, H_ref,
- use_downsampled_stats);
+ av1_compute_stats_c(wiener_win, dgd, src, dgd_avg, src_avg, h_start,
+ h_end, v_start, v_end, dgd_stride, src_stride,
+ M_ref, H_ref, use_downsampled_stats);
}
aom_usec_timer_mark(&timer);
const double time1 = static_cast<double>(aom_usec_timer_elapsed(&timer));
aom_usec_timer_start(&timer);
for (int i = 0; i < run_times; ++i) {
- target_func_(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M_test, H_test,
+ target_func_(wiener_win, dgd, src, dgd_avg, src_avg, h_start, h_end,
+ v_start, v_end, dgd_stride, src_stride, M_test, H_test,
use_downsampled_stats);
}
aom_usec_timer_mark(&timer);
@@ -302,6 +317,9 @@
const int src_stride = MAX_DATA_BLOCK;
const int iters = 1;
const int max_value_downsample_stats = 1;
+ int16_t *dgd_avg = buf;
+ int16_t *src_avg =
+ buf + (3 * RESTORATION_UNITSIZE_MAX * RESTORATION_UNITSIZE_MAX);
for (int iter = 0; iter < iters && !HasFatalFailure(); ++iter) {
for (int i = 0; i < MAX_DATA_BLOCK * MAX_DATA_BLOCK; ++i) {
@@ -313,12 +331,12 @@
for (int use_downsampled_stats = 0;
use_downsampled_stats <= max_value_downsample_stats;
use_downsampled_stats++) {
- av1_compute_stats_c(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M_ref, H_ref,
- use_downsampled_stats);
+ av1_compute_stats_c(wiener_win, dgd, src, dgd_avg, src_avg, h_start,
+ h_end, v_start, v_end, dgd_stride, src_stride, M_ref,
+ H_ref, use_downsampled_stats);
- target_func_(wiener_win, dgd, src, h_start, h_end, v_start, v_end,
- dgd_stride, src_stride, M_test, H_test,
+ target_func_(wiener_win, dgd, src, dgd_avg, src_avg, h_start, h_end,
+ v_start, v_end, dgd_stride, src_stride, M_test, H_test,
use_downsampled_stats);
int failed = 0;
@@ -710,5 +728,648 @@
::testing::Values(av1_compute_stats_highbd_avx2));
#endif // HAVE_AVX2
+// A test that reproduces b/274668506: signed integer overflow in
+// update_a_sep_sym().
+TEST(SearchWienerTest, 10bitSignedIntegerOverflowInUpdateASepSym) {
+ constexpr int kWidth = 427;
+ constexpr int kHeight = 1;
+ std::vector<uint16_t> buffer(3 * kWidth * kHeight);
+ // The values in the buffer alternate between 0 and 1023.
+ uint16_t value = 0;
+ for (size_t i = 0; i < buffer.size(); ++i) {
+ buffer[i] = value;
+ value = 1023 - value;
+ }
+ unsigned char *img_data = reinterpret_cast<unsigned char *>(buffer.data());
+
+ aom_image_t img;
+ EXPECT_EQ(
+ aom_img_wrap(&img, AOM_IMG_FMT_I44416, kWidth, kHeight, 1, img_data),
+ &img);
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_ALL_INTRA),
+ AOM_CODEC_OK);
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 1;
+ cfg.g_bit_depth = AOM_BITS_10;
+ cfg.g_input_bit_depth = 10;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_limit = 1;
+ cfg.g_lag_in_frames = 0;
+ cfg.kf_mode = AOM_KF_DISABLED;
+ cfg.kf_max_dist = 0;
+ cfg.g_threads = 61;
+ cfg.rc_min_quantizer = 2;
+ cfg.rc_max_quantizer = 20;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, AOM_CODEC_USE_HIGHBITDEPTH),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 11), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_ROW_MT, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_ROWS, 4), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 3), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_SKIP_POSTPROC_FILTERING, 1),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM),
+ AOM_CODEC_OK);
+
+ // Encode frame
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(aom_codec_encode(&enc, nullptr, 0, 1, 0), AOM_CODEC_OK);
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
+}
+
+// A test that reproduces b/281219978: signed integer overflow in
+// update_b_sep_sym().
+TEST(SearchWienerTest, 12bitSignedIntegerOverflowInUpdateBSepSym) {
+ constexpr int kWidth = 311;
+ constexpr int kHeight = 3;
+ static const uint16_t buffer[3 * kWidth * kHeight] = {
+ // Y plane:
+ 0, 0, 0, 2156, 2513, 2211, 4095, 4095, 0, 2538, 0, 0, 0, 0, 4095, 0, 258,
+ 941, 4095, 907, 0, 0, 2325, 2485, 2408, 4095, 1513, 0, 3644, 2080, 4095,
+ 4095, 0, 2135, 0, 2461, 4095, 0, 4095, 4095, 0, 1987, 0, 3629, 0, 4095,
+ 3918, 4095, 0, 4095, 4095, 4095, 0, 1065, 0, 2072, 3597, 102, 0, 534, 0, 0,
+ 0, 4095, 0, 0, 4095, 0, 4095, 0, 4095, 0, 3611, 0, 1139, 4095, 0, 0, 0, 0,
+ 0, 4095, 0, 0, 0, 0, 4095, 4095, 4095, 0, 0, 0, 3070, 3224, 0, 0, 4095,
+ 4051, 4095, 0, 4095, 3712, 0, 1465, 4095, 1699, 4095, 4095, 0, 0, 0, 3885,
+ 0, 4095, 0, 0, 4095, 1686, 4095, 4095, 4095, 4095, 1330, 0, 0, 0, 4095, 0,
+ 4095, 4095, 3919, 4095, 781, 2371, 2055, 4095, 912, 3710, 0, 2045, 0, 4095,
+ 4095, 4095, 1811, 0, 1298, 1115, 0, 3327, 0, 0, 4095, 0, 253, 2386, 4095,
+ 1791, 3657, 1444, 0, 4095, 1918, 4095, 4095, 0, 4095, 305, 1587, 0, 4095, 0,
+ 3759, 0, 0, 4095, 2387, 4095, 4095, 0, 0, 4095, 4095, 0, 1015, 4095, 0, 768,
+ 2598, 1667, 130, 4095, 0, 0, 435, 4095, 3683, 4095, 0, 4095, 4095, 1888,
+ 2828, 4095, 3349, 0, 4095, 4095, 4095, 4095, 0, 4095, 0, 0, 4095, 0, 2491,
+ 1598, 0, 0, 383, 3712, 4095, 0, 0, 4095, 760, 4095, 4095, 4095, 2030, 4095,
+ 0, 0, 3236, 0, 1040, 0, 0, 4095, 0, 0, 4095, 4095, 4095, 0, 0, 1043, 3897,
+ 2446, 233, 1589, 427, 4095, 4095, 4095, 4095, 0, 1656, 3786, 4095, 0, 840,
+ 4095, 4095, 1429, 4095, 0, 4095, 2734, 4095, 0, 2431, 1801, 278, 0, 4095, 0,
+ 4095, 0, 0, 420, 0, 0, 746, 0, 0, 3281, 3006, 4095, 4095, 0, 0, 0, 3605,
+ 4095, 4095, 0, 4095, 4095, 4095, 4095, 2660, 496, 4095, 0, 0, 0, 0, 4095, 0,
+ 1317, 4095, 4095, 510, 1919, 0, 3893, 0, 4095, 4095, 4095, 4095, 4095, 2071,
+ 2006, 0, 3316, 4095, 0, 0, 4095, 852, 2982, 0, 2073, 0, 2728, 1499, 4095,
+ 852, 361, 3137, 4095, 4095, 1502, 1575, 0, 4095, 0, 0, 0, 0, 1585, 4095, 0,
+ 4095, 0, 3188, 3244, 4095, 2958, 4095, 4095, 0, 4095, 4095, 4095, 1706,
+ 2896, 4095, 1788, 730, 1146, 4095, 0, 0, 4095, 0, 0, 0, 2791, 3613, 2175,
+ 2925, 0, 0, 0, 0, 0, 1279, 4095, 4095, 0, 4095, 0, 0, 2336, 0, 3462, 4095,
+ 0, 4095, 1997, 2328, 2860, 0, 4095, 4095, 3241, 4095, 4095, 4095, 4095,
+ 4095, 4095, 118, 0, 4095, 4095, 4095, 0, 3734, 0, 0, 0, 4095, 1952, 4095,
+ 413, 4095, 1183, 4095, 0, 4095, 0, 0, 4095, 4095, 4095, 3805, 0, 1398, 0,
+ 4095, 0, 0, 0, 4095, 4095, 4095, 2802, 3658, 4095, 4095, 0, 0, 0, 4095, 0,
+ 897, 0, 4095, 2163, 0, 0, 0, 4095, 1440, 2487, 4095, 4095, 0, 4095, 4095,
+ 4095, 2808, 0, 1999, 0, 0, 4095, 4095, 4095, 1563, 124, 2179, 754, 0, 0,
+ 2407, 2798, 0, 4095, 4095, 0, 0, 1929, 0, 0, 0, 1387, 4095, 4095, 0, 0,
+ 3911, 562, 4095, 0, 4095, 2639, 2673, 4095, 4095, 0, 0, 4095, 4095, 0, 4095,
+ 4095, 901, 0, 321, 3961, 4095, 0, 4095, 4095, 4095, 0, 0, 0, 0, 3035, 3713,
+ 3441, 0, 4095, 0, 0, 854, 1544, 3963, 1968, 4095, 0, 0, 0, 0, 2897, 4095, 0,
+ 4095, 4095, 0, 235, 1011, 4095, 0, 3452, 4095, 4095, 0, 0, 4095, 4095, 4095,
+ 4095, 4095, 3312, 0, 3064, 4095, 3981, 4095, 4095, 4095, 4095, 4095, 0, 791,
+ 3243, 4095, 799, 0, 0, 0, 523, 2117, 3776, 0, 4095, 3311, 0, 543, 4095,
+ 4095, 4095, 0, 0, 4095, 4095, 4095, 4095, 0, 0, 4095, 4095, 225, 0, 1195,
+ 3070, 1210, 4095, 0, 4095, 498, 782, 0, 0, 4095, 4095, 4095, 4095, 4095,
+ 1456, 4095, 3898, 1472, 4095, 4095, 0, 4095, 4026, 0, 0, 2354, 1554, 0,
+ 4095, 0, 2986, 0, 1053, 1228, 0, 0, 4095, 4095, 0, 0, 4095, 0, 0, 4095, 0,
+ 0, 0, 606, 0, 4095, 3563, 4095, 2016, 4095, 0, 0, 4095, 0, 4095, 4095, 4095,
+ 0, 0, 0, 929, 0, 0, 4095, 0, 3069, 4095, 0, 2687, 4095, 4095, 4095, 2015,
+ 4095, 4095, 4095, 0, 4095, 0, 0, 2860, 3668, 0, 0, 4095, 2523, 2104, 0, 0,
+ 3063, 4095, 3674, 4095, 0, 2762, 0, 4095, 2582, 3473, 930, 0, 1012, 108, 38,
+ 4095, 1148, 3568, 4036, 4095, 4095, 0, 1120, 1873, 3028, 4095, 515, 1902,
+ 4095, 0, 815, 4095, 1548, 0, 1073, 3919, 4095, 2374, 0, 3126, 4095, 2268, 0,
+ 0, 0, 4095, 425, 4095, 0, 0, 4095, 4095, 2710, 4095, 2067, 4095, 4095, 2201,
+ 4095, 4095, 0, 4095, 4095, 2933, 0, 417, 2801, 4095, 4095, 3274, 0, 2870,
+ 4095, 4095, 0, 0, 973, 0, 0, 3129, 4095, 0, 0, 0, 4095, 4095, 4095, 0, 242,
+ 4095, 0, 4095, 0, 0, 0, 0, 987, 0, 2426, 4045, 2780, 0, 4095, 3762, 3361,
+ 3095, 4095, 596, 1072, 4071, 4095, 4095, 0, 0, 81, 0, 1001, 1683, 4095,
+ 4095, 3105, 2673, 0, 3300, 104, 4030, 0, 2615, 4095, 4095, 0, 4095, 1830,
+ 3917, 4095, 4095, 4095, 0, 4095, 3637, 0, 4095, 4095, 3677, 4095, 4095, 0,
+ 880, 4095, 4095, 0, 2797, 0, 0, 0, 0, 3225, 4095, 4095, 1925, 2885, 1879, 0,
+ 0, 4095, 0, 0, 0, 2974, 559, 0, 0, 0, 699, 997, 1491, 423, 4012, 0, 2315,
+ 4095, 0, 0, 4095, 0, 836, 4095, 0, 4095, 0, 1752, 0, 0, 0, 4095, 4095, 0, 0,
+ 51, 4095, 350, 0, 2143, 2588, 0, 4095, 0, 4095, 0, 2757, 2370, 4095, 668,
+ 4095, 0, 4095, 0, 3652, 3890, 0, 4095, 0, 4095, 4095, 4095, 4095, 4095,
+ // U plane:
+ 4095, 4095, 1465, 0, 588, 4095, 0, 4095, 4095, 4095, 0, 2167, 4095, 4095,
+ 918, 3223, 4095, 4095, 0, 696, 4095, 4095, 0, 0, 594, 4095, 2935, 0, 0, 0,
+ 2036, 4095, 0, 2492, 4095, 4095, 0, 0, 0, 3883, 0, 4095, 483, 4095, 4095,
+ 324, 923, 0, 3079, 0, 4095, 4095, 810, 0, 3371, 4095, 4095, 0, 4095, 2756,
+ 0, 723, 0, 3338, 1084, 0, 4095, 4095, 3764, 0, 4095, 4095, 4095, 2323, 0,
+ 3693, 682, 0, 0, 909, 4095, 2348, 4095, 4095, 4095, 1509, 4095, 0, 4095,
+ 4095, 4095, 4095, 3977, 3652, 1580, 637, 4095, 0, 593, 4095, 1199, 1773,
+ 4095, 4095, 4095, 0, 3447, 0, 0, 4095, 3873, 0, 0, 2094, 0, 1195, 0, 3892,
+ 4095, 4095, 729, 4095, 0, 0, 4095, 449, 4095, 4095, 2900, 0, 4095, 0, 2114,
+ 4095, 4095, 4095, 1174, 995, 2933, 360, 0, 1970, 0, 4095, 1208, 0, 4095, 0,
+ 4095, 0, 4095, 4095, 0, 4095, 0, 0, 0, 1976, 0, 0, 921, 4095, 4095, 192,
+ 1006, 0, 0, 2725, 4095, 0, 2813, 0, 0, 2375, 4095, 1982, 0, 2725, 4095,
+ 1225, 3566, 4095, 0, 344, 863, 2747, 0, 4095, 4095, 1928, 4095, 4095, 0,
+ 3640, 0, 1744, 3191, 4095, 4095, 0, 4095, 4095, 4095, 0, 0, 748, 4095, 0,
+ 2609, 0, 0, 0, 0, 0, 3508, 4095, 4095, 2463, 0, 4095, 0, 4095, 4095, 4095,
+ 3175, 419, 2193, 0, 0, 4095, 0, 0, 4095, 4051, 2159, 4095, 4095, 2262, 379,
+ 4095, 0, 0, 3399, 4095, 4095, 4095, 3769, 2510, 4054, 3336, 730, 3968, 0, 0,
+ 3354, 0, 1822, 0, 4095, 0, 3847, 3823, 3262, 0, 0, 2936, 0, 4095, 4095,
+ 2120, 0, 3147, 0, 2838, 3480, 474, 1194, 4095, 4095, 2820, 4095, 0, 4095,
+ 1882, 4095, 1085, 0, 4095, 2234, 3371, 4095, 0, 4095, 0, 0, 0, 2586, 4095,
+ 4095, 4095, 4095, 0, 3818, 1401, 2273, 4095, 0, 4095, 0, 3907, 4095, 4095,
+ 694, 0, 4066, 4095, 0, 0, 4095, 2116, 4095, 4095, 4095, 4095, 4095, 0, 2821,
+ 29, 0, 0, 663, 1711, 652, 1271, 4095, 4095, 2401, 3726, 4095, 3453, 1803,
+ 3614, 0, 4095, 3439, 4095, 0, 4095, 0, 816, 0, 0, 4095, 4095, 2635, 0, 1918,
+ 0, 2663, 381, 0, 0, 3670, 0, 4095, 3065, 965, 4095, 4095, 4095, 2993, 4095,
+ 4095, 0, 4095, 973, 4095, 0, 4095, 4095, 0, 3071, 0, 2777, 4095, 4095, 0,
+ 3996, 4095, 1637, 0, 4095, 67, 3784, 0, 0, 4095, 2603, 579, 4095, 4095,
+ 2854, 4095, 3016, 0, 4095, 0, 0, 4095, 4095, 4095, 4095, 3998, 3023, 4095,
+ 4095, 0, 0, 0, 4095, 4095, 4095, 4095, 0, 0, 2623, 1308, 55, 4095, 0, 0,
+ 2554, 2311, 0, 4095, 4095, 4095, 1134, 2112, 0, 4095, 4095, 0, 4095, 0, 645,
+ 0, 0, 4095, 0, 909, 0, 0, 1719, 4095, 0, 3542, 0, 575, 0, 4095, 4095, 4095,
+ 3428, 1172, 481, 1521, 4095, 3199, 1265, 4095, 3518, 4017, 4095, 760, 2042,
+ 3986, 0, 4095, 42, 4095, 0, 4095, 4095, 4095, 4095, 2235, 346, 3865, 0,
+ 4095, 4095, 4095, 4095, 4095, 4095, 845, 4095, 0, 2826, 4095, 4095, 0, 0,
+ 335, 1614, 1465, 0, 4095, 4095, 0, 2771, 4095, 0, 2810, 4095, 4095, 0, 1254,
+ 4095, 2589, 4095, 4095, 2252, 0, 0, 0, 4095, 0, 73, 4095, 4095, 0, 1341, 0,
+ 0, 0, 0, 4095, 0, 0, 2645, 1985, 492, 914, 3996, 4095, 4095, 4095, 0, 2383,
+ 2556, 433, 0, 4095, 1094, 4095, 4095, 642, 4095, 1722, 0, 3460, 4095, 4095,
+ 4095, 4095, 4095, 0, 154, 4095, 92, 4095, 0, 0, 0, 4095, 0, 4095, 4095, 444,
+ 0, 2925, 0, 0, 0, 0, 1628, 0, 4095, 1731, 2418, 697, 4095, 0, 2513, 4095, 0,
+ 4095, 4095, 4095, 4095, 4095, 0, 2510, 4095, 3850, 0, 0, 4095, 2480, 4095,
+ 4095, 2661, 4095, 0, 4095, 0, 0, 4095, 4095, 847, 4095, 4095, 3257, 443, 0,
+ 67, 0, 0, 0, 4095, 0, 0, 3073, 4095, 0, 4095, 0, 4095, 0, 4095, 1224, 4095,
+ 4095, 4095, 0, 4095, 958, 0, 4095, 0, 2327, 684, 0, 0, 0, 0, 4095, 4095, 0,
+ 3693, 795, 4095, 0, 621, 1592, 2314, 4095, 0, 928, 1897, 4095, 4095, 0,
+ 4095, 0, 0, 4095, 2619, 4095, 0, 4095, 0, 0, 4095, 2485, 4095, 4095, 0, 435,
+ 4095, 1818, 4095, 4095, 0, 0, 0, 4095, 4095, 4095, 4095, 0, 1671, 4095,
+ 4095, 0, 2617, 0, 2572, 0, 0, 4095, 3471, 0, 0, 4095, 2719, 3979, 1307, 0,
+ 0, 0, 0, 1794, 642, 447, 913, 4095, 3927, 0, 2686, 0, 0, 4095, 0, 857, 0,
+ 4095, 4095, 567, 2385, 0, 0, 4095, 893, 0, 289, 0, 0, 0, 4095, 4095, 2566,
+ 0, 1913, 0, 2350, 1033, 2764, 0, 4095, 0, 4095, 0, 0, 0, 0, 4095, 3952,
+ 3969, 0, 3476, 0, 4095, 4095, 393, 0, 2613, 0, 0, 1422, 0, 3359, 491, 3263,
+ 4095, 4095, 0, 0, 4095, 697, 3601, 4095, 0, 4095, 4095, 0, 4095, 0, 0, 4095,
+ 0, 4095, 4095, 4095, 2506, 0, 0, 1403, 0, 3836, 3976, 0, 4095, 4095, 4095,
+ 2497, 4095, 4095, 4095, 4095, 0, 4095, 3317, 4095, 4095, 4095, 0, 0, 1131,
+ 0, 0, 0, 4095, 0, 0, 4095, 0, 0, 2988, 4095, 4095, 2711, 2487, 1335, 0, 0,
+ 0, 4095, 261, 4095, 86, 0, 0, 1138, 4095, 0, 0, 4095, 4095, 0, 0, 0, 334, 0,
+ 2395, 3297, 4095, 1698, 4095, 1791, 1341, 0, 3559, 0, 4095, 0, 2056, 3238,
+ 3310, 4095, 4095, 779, 2129, 2849, 4095, 2622, 1051, 0, 0, 1282, 4095, 1246,
+ 0, 0, 3696, 4095, 556, 0, 0, 3463, 2658, 3572, 4095, 3982, 4095, 4095, 0, 0,
+ 4053, 4095, 4095, 4095, 2162, 2567, 1621, 4095, 4095, 1522, 293, 4095, 0, 0,
+ 1976, 4095, 3089, 4095, 0, 0, 0, 0, 3650,
+ // V plane:
+ 0, 1892, 4095, 1995, 0, 0, 0, 2208, 1152, 1794, 4095, 4095, 89, 3333, 4095,
+ 2478, 4095, 2505, 4095, 0, 2664, 4095, 1984, 0, 1144, 4095, 0, 4095, 0,
+ 4095, 0, 0, 0, 2404, 1727, 4095, 4095, 0, 1326, 2033, 0, 4095, 0, 4095,
+ 3022, 0, 4095, 0, 1980, 4095, 0, 2284, 4095, 0, 3422, 0, 4095, 2171, 3155,
+ 4095, 0, 4095, 0, 636, 0, 0, 4095, 3264, 3862, 0, 2164, 0, 0, 3879, 3886, 0,
+ 225, 0, 0, 4095, 0, 1956, 523, 464, 738, 0, 1545, 0, 2829, 4095, 4095, 4095,
+ 799, 4095, 358, 4095, 0, 0, 953, 0, 0, 2081, 4095, 1604, 4095, 2086, 0, 954,
+ 0, 0, 2393, 2413, 4095, 4095, 0, 3583, 4095, 4095, 2995, 4095, 0, 4095,
+ 4095, 3501, 4095, 247, 4095, 0, 0, 0, 4095, 1303, 3382, 1059, 4095, 0, 543,
+ 1276, 1801, 0, 0, 0, 2928, 0, 4095, 3931, 70, 0, 0, 3992, 4095, 1278, 1930,
+ 4095, 0, 4095, 4095, 3894, 0, 0, 0, 0, 4095, 0, 0, 0, 0, 0, 0, 4095, 4095,
+ 4095, 1098, 4095, 2059, 0, 380, 3166, 0, 4095, 2215, 0, 0, 2846, 0, 0, 2614,
+ 528, 4095, 0, 4095, 2371, 0, 4095, 0, 0, 0, 0, 4095, 3133, 4095, 4095, 0,
+ 4095, 1283, 3821, 1772, 0, 0, 4095, 4095, 4095, 890, 3475, 4095, 4095, 133,
+ 3292, 1819, 4095, 4095, 4095, 0, 0, 4095, 702, 4095, 0, 0, 0, 4095, 0, 2137,
+ 4095, 4095, 4095, 0, 0, 0, 4095, 4095, 1555, 2435, 2778, 4095, 0, 4095,
+ 3825, 0, 3736, 3054, 0, 0, 4095, 4095, 4095, 0, 0, 0, 0, 371, 4095, 4095, 0,
+ 0, 1565, 4095, 2731, 4095, 0, 756, 925, 0, 0, 0, 4095, 775, 1379, 4095,
+ 1439, 0, 0, 0, 2680, 0, 0, 4095, 1280, 4095, 0, 0, 4095, 4095, 0, 3088, 0,
+ 4095, 4095, 4095, 0, 0, 1526, 4095, 2314, 4095, 4095, 0, 4095, 288, 0, 205,
+ 4095, 4095, 4095, 0, 1247, 2014, 0, 1530, 1985, 0, 0, 4095, 3195, 0, 4095,
+ 4, 2397, 4095, 4095, 4095, 0, 4095, 4095, 4095, 0, 0, 0, 0, 0, 4031, 928,
+ 4095, 0, 0, 4095, 4095, 4095, 1966, 4095, 2299, 1215, 4095, 0, 4095, 1335,
+ 0, 4095, 1991, 4095, 0, 4095, 114, 0, 0, 0, 2123, 2639, 4095, 3323, 4095,
+ 4095, 418, 209, 0, 0, 4095, 4095, 4095, 4095, 963, 0, 0, 0, 4095, 2505, 0,
+ 3627, 0, 311, 3748, 2047, 4095, 2791, 0, 3643, 1852, 0, 0, 4095, 0, 2179, 0,
+ 4095, 2678, 0, 0, 0, 2342, 4095, 4095, 0, 0, 4095, 0, 0, 0, 0, 1076, 0, 0,
+ 4095, 0, 2370, 0, 3530, 0, 0, 0, 0, 0, 4095, 0, 0, 0, 3474, 1201, 0, 379,
+ 699, 4095, 777, 4095, 0, 4095, 4095, 0, 1213, 1762, 4095, 4095, 4095, 0,
+ 4095, 1090, 1233, 0, 4095, 0, 4095, 0, 0, 0, 2845, 3385, 2718, 0, 0, 2975,
+ 3630, 0, 4095, 4095, 4095, 4095, 3261, 243, 0, 4095, 0, 0, 3836, 4095, 4095,
+ 4095, 963, 0, 0, 2526, 0, 4095, 4000, 4095, 2069, 0, 0, 4095, 0, 4095, 1421,
+ 0, 4095, 0, 4095, 4095, 0, 4095, 0, 4095, 4095, 1537, 4095, 3201, 0, 0,
+ 4095, 2719, 4095, 0, 4095, 4095, 4095, 0, 4095, 0, 4095, 2300, 0, 2876, 0,
+ 4095, 4095, 4095, 3235, 497, 635, 0, 1480, 4095, 0, 3067, 3979, 3741, 0,
+ 3059, 1214, 4095, 4095, 2197, 0, 4095, 4095, 2734, 0, 4095, 4095, 3364,
+ 2369, 4095, 303, 4095, 0, 4095, 4095, 3472, 1733, 4095, 4095, 4095, 0, 55,
+ 0, 10, 1378, 1169, 4095, 0, 0, 688, 3613, 0, 4095, 2832, 867, 4095, 4095,
+ 3514, 4095, 0, 4095, 4095, 2458, 3506, 0, 1920, 0, 1762, 1178, 2549, 4095,
+ 3967, 4095, 0, 2975, 1282, 0, 377, 846, 3434, 97, 0, 0, 1616, 3526, 136,
+ 1888, 0, 147, 334, 4095, 0, 4095, 0, 4095, 1106, 4095, 0, 4095, 3280, 4095,
+ 4095, 0, 2849, 3528, 0, 4095, 4095, 0, 2306, 0, 3412, 0, 4095, 4095, 4095,
+ 4048, 2273, 0, 4095, 4095, 4095, 0, 4095, 3031, 4095, 4095, 4095, 0, 3382,
+ 3812, 2315, 4095, 0, 0, 0, 432, 4095, 3606, 0, 4, 2847, 4095, 0, 4095, 0, 0,
+ 2616, 4095, 4095, 0, 4095, 0, 3394, 4095, 3976, 3119, 0, 0, 0, 0, 4046,
+ 4095, 4095, 3331, 4095, 2127, 0, 4095, 0, 0, 0, 4095, 4095, 4095, 0, 4095,
+ 4095, 4095, 0, 2068, 0, 0, 3882, 2967, 0, 1745, 4095, 2112, 478, 0, 4095, 0,
+ 199, 4095, 4095, 3542, 4095, 2634, 4095, 4095, 1235, 4095, 4095, 167, 1553,
+ 0, 4095, 2649, 0, 3383, 0, 4095, 2803, 4095, 0, 4095, 0, 785, 4095, 0, 4095,
+ 1743, 4095, 0, 3945, 0, 4095, 1894, 4095, 3973, 4095, 0, 0, 4095, 0, 0,
+ 4095, 318, 4095, 4095, 4095, 0, 261, 4095, 4095, 2125, 2690, 4095, 0, 4095,
+ 3863, 1740, 4095, 0, 2899, 1509, 0, 0, 0, 2780, 4095, 1897, 2104, 4095,
+ 1708, 284, 4095, 0, 4095, 3382, 4095, 4095, 483, 0, 0, 0, 3099, 0, 4095, 0,
+ 926, 4095, 2062, 1931, 2121, 0, 4095, 0, 2485, 1535, 4095, 4095, 3662, 4095,
+ 2419, 2487, 0, 4095, 4095, 4095, 0, 0, 4095, 0, 0, 2029, 0, 3008, 2338, 0,
+ 4095, 0, 3854, 0, 4095, 0, 0, 1315, 0, 0, 0, 0, 3492, 0, 1445, 0, 11, 4095,
+ 0, 0, 873, 0, 4095, 0, 4095, 2654, 3040, 0, 0, 0, 4095, 0, 68, 4095, 0, 0,
+ 990, 0, 828, 1015, 88, 3606, 0, 2875, 4095, 0, 3117, 411, 0, 0, 2859, 0, 0,
+ 4095, 3480, 25, 4095, 4095, 4095, 0, 0, 0, 4095, 4095, 4095, 4095, 1724, 0,
+ 0, 0, 3635, 1063, 3728, 4095, 4095, 2025, 3715, 0, 0, 0, 3722, 0, 1648, 0,
+ 4095, 3579, 0, 0, 0, 4095, 4095, 0, 4095
+ };
+ unsigned char *img_data =
+ reinterpret_cast<unsigned char *>(const_cast<uint16_t *>(buffer));
+
+ aom_image_t img;
+ EXPECT_EQ(
+ aom_img_wrap(&img, AOM_IMG_FMT_I44416, kWidth, kHeight, 1, img_data),
+ &img);
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_ALL_INTRA),
+ AOM_CODEC_OK);
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 2;
+ cfg.g_bit_depth = AOM_BITS_12;
+ cfg.g_input_bit_depth = 12;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_limit = 1;
+ cfg.g_lag_in_frames = 0;
+ cfg.kf_mode = AOM_KF_DISABLED;
+ cfg.kf_max_dist = 0;
+ cfg.g_threads = 34;
+ cfg.rc_min_quantizer = 8;
+ cfg.rc_max_quantizer = 20;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, AOM_CODEC_USE_HIGHBITDEPTH),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 14), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_ROW_MT, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_ROWS, 4), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_COLUMNS, 4), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_SKIP_POSTPROC_FILTERING, 1),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM),
+ AOM_CODEC_OK);
+
+ // Encode frame
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(aom_codec_encode(&enc, nullptr, 0, 1, 0), AOM_CODEC_OK);
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
+}
+
+// A test that reproduces b/272139363: signed integer overflow in
+// update_b_sep_sym().
+TEST(SearchWienerTest, 10bitSignedIntegerOverflowInUpdateBSepSym) {
+ constexpr int kWidth = 34;
+ constexpr int kHeight = 3;
+ static const uint16_t buffer[3 * kWidth * kHeight] = {
+ // Y plane:
+ 61, 765, 674, 188, 367, 944, 153, 275, 906, 433, 154, 51, 8, 855, 186, 154,
+ 392, 0, 634, 3, 690, 1023, 1023, 1023, 1023, 1023, 1023, 8, 1, 64, 426, 0,
+ 100, 344, 944, 816, 816, 33, 1023, 1023, 1023, 1023, 295, 1023, 1023, 1023,
+ 1023, 1023, 1023, 1015, 1023, 231, 1020, 254, 439, 439, 894, 439, 150, 1019,
+ 1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 385, 320, 575,
+ 682, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 511, 699, 987, 3, 140,
+ 661, 120, 33, 143, 0, 0, 0, 3, 40, 625, 585, 16, 579, 160, 867,
+ // U plane:
+ 739, 646, 13, 603, 7, 328, 91, 32, 488, 870, 330, 330, 330, 330, 330, 330,
+ 109, 330, 330, 330, 3, 545, 945, 249, 35, 561, 801, 32, 931, 639, 801, 91,
+ 1023, 827, 844, 948, 631, 894, 854, 601, 432, 504, 85, 1, 0, 0, 89, 89, 0,
+ 0, 0, 0, 0, 0, 432, 801, 382, 4, 0, 0, 2, 89, 89, 89, 89, 89, 89, 384, 0, 0,
+ 0, 0, 0, 0, 0, 1023, 1019, 1, 3, 691, 575, 691, 691, 691, 691, 691, 691,
+ 691, 691, 691, 691, 691, 84, 527, 4, 485, 8, 682, 698, 340, 1015, 706,
+ // V plane:
+ 49, 10, 28, 1023, 1023, 1023, 0, 32, 32, 872, 114, 1003, 1023, 57, 477, 999,
+ 1023, 309, 309, 309, 309, 309, 309, 309, 309, 309, 309, 309, 309, 309, 309,
+ 9, 418, 418, 418, 418, 418, 418, 0, 0, 0, 1023, 4, 5, 0, 0, 1023, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 64, 0, 155, 709, 3, 331, 807, 633, 1023,
+ 1018, 646, 886, 991, 692, 915, 294, 0, 35, 2, 0, 471, 643, 770, 346, 176,
+ 32, 329, 322, 302, 61, 765, 674, 188, 367, 944, 153, 275, 906, 433, 154
+ };
+ unsigned char *img_data =
+ reinterpret_cast<unsigned char *>(const_cast<uint16_t *>(buffer));
+
+ aom_image_t img;
+ EXPECT_EQ(&img, aom_img_wrap(&img, AOM_IMG_FMT_I44416, kWidth, kHeight, 1,
+ img_data));
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_ALL_INTRA));
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 1;
+ cfg.g_bit_depth = AOM_BITS_10;
+ cfg.g_input_bit_depth = 10;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_limit = 1;
+ cfg.g_lag_in_frames = 0;
+ cfg.kf_mode = AOM_KF_DISABLED;
+ cfg.kf_max_dist = 0;
+ cfg.rc_min_quantizer = 3;
+ cfg.rc_max_quantizer = 54;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_enc_init(&enc, iface, &cfg, AOM_CODEC_USE_HIGHBITDEPTH));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 28));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_control(&enc, AV1E_SET_TILE_COLUMNS, 3));
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_control(&enc, AOME_SET_CPUUSED, 0));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_control(&enc, AV1E_SET_SKIP_POSTPROC_FILTERING, 1));
+ EXPECT_EQ(AOM_CODEC_OK,
+ aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM));
+
+ // Encode frame
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, &img, 0, 1, 0));
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_encode(&enc, nullptr, 0, 1, 0));
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(AOM_CODEC_OK, aom_codec_destroy(&enc));
+}
+
+// A test that reproduces b/277121724: signed integer overflow in
+// update_b_sep_sym().
+TEST(SearchWienerTest, 8bitSignedIntegerOverflowInUpdateBSepSym) {
+ constexpr int kWidth = 198;
+ constexpr int kHeight = 3;
+ // 8-bit YUV 4:2:2
+ static const unsigned char buffer[2 * kWidth * kHeight] = {
+ // Y plane:
+ 35, 225, 56, 91, 8, 142, 137, 143, 224, 49, 217, 57, 202, 163, 159, 246,
+ 232, 134, 135, 14, 76, 101, 239, 88, 186, 159, 118, 23, 114, 20, 108, 41,
+ 72, 17, 58, 242, 45, 146, 230, 14, 135, 140, 34, 61, 189, 181, 222, 71, 98,
+ 221, 5, 199, 244, 85, 229, 163, 105, 87, 144, 105, 64, 150, 36, 233, 235, 1,
+ 179, 190, 50, 222, 176, 109, 166, 18, 80, 129, 45, 9, 218, 144, 234, 10,
+ 148, 117, 37, 10, 232, 139, 206, 92, 208, 247, 128, 79, 202, 79, 212, 89,
+ 185, 152, 206, 182, 83, 105, 21, 86, 150, 84, 21, 165, 34, 251, 174, 240,
+ 172, 155, 254, 85, 98, 25, 96, 78, 230, 253, 36, 19, 247, 155, 112, 216,
+ 166, 114, 229, 118, 197, 149, 186, 194, 128, 45, 219, 26, 36, 77, 110, 45,
+ 252, 238, 183, 161, 171, 96, 232, 108, 73, 61, 243, 58, 155, 38, 91, 209,
+ 187, 206, 16, 165, 236, 145, 69, 126, 102, 10, 4, 43, 191, 106, 193, 240,
+ 132, 226, 38, 78, 7, 152, 101, 255, 254, 39, 33, 86, 35, 247, 199, 179, 239,
+ 198, 165, 58, 190, 171, 226, 94, 158, 21, 190, 151, 75, 176, 11, 53, 199,
+ 87, 91, 1, 226, 20, 117, 96, 75, 192, 101, 200, 125, 106, 233, 176, 63, 204,
+ 114, 16, 31, 222, 15, 14, 71, 2, 25, 47, 100, 174, 26, 209, 138, 138, 211,
+ 147, 164, 204, 9, 104, 135, 250, 9, 201, 88, 218, 71, 251, 61, 199, 0, 34,
+ 59, 115, 228, 161, 100, 132, 50, 4, 117, 100, 191, 126, 53, 28, 193, 42,
+ 155, 206, 79, 80, 117, 11, 3, 253, 181, 181, 138, 239, 107, 142, 216, 57,
+ 202, 126, 229, 250, 60, 62, 150, 128, 95, 32, 251, 207, 236, 208, 247, 183,
+ 59, 19, 117, 40, 106, 87, 140, 57, 109, 190, 51, 105, 226, 116, 156, 3, 35,
+ 86, 255, 138, 52, 211, 245, 76, 83, 109, 113, 77, 106, 77, 18, 56, 235, 158,
+ 24, 53, 151, 104, 152, 21, 15, 46, 163, 144, 217, 168, 154, 44, 80, 25, 11,
+ 37, 100, 235, 145, 154, 113, 0, 140, 153, 80, 64, 19, 121, 185, 144, 43,
+ 206, 16, 16, 72, 189, 175, 231, 177, 40, 177, 206, 116, 4, 82, 43, 244, 237,
+ 22, 252, 71, 194, 106, 4, 112, 0, 108, 137, 126, 80, 122, 142, 43, 205, 22,
+ 209, 217, 165, 32, 208, 100, 70, 3, 120, 159, 203, 7, 233, 152, 37, 96, 212,
+ 177, 1, 133, 218, 161, 172, 202, 192, 186, 114, 150, 121, 177, 227, 175, 64,
+ 127, 153, 113, 91, 198, 0, 111, 227, 226, 218, 71, 62, 5, 43, 128, 27, 3,
+ 82, 5, 10, 68, 153, 215, 181, 138, 246, 224, 170, 1, 241, 191, 181, 151,
+ 167, 14, 80, 45, 4, 252, 29, 66, 125, 58, 225, 253, 255, 248, 224, 40, 24,
+ 236, 46, 11, 219, 154, 134, 12, 76, 72, 97, 239, 50, 39, 85, 182, 55, 219,
+ 19, 109, 81, 119, 125, 206, 159, 239, 67, 193, 180, 132, 80, 127, 2, 169,
+ 99, 53, 47, 5, 100, 174, 151, 124, 246, 202, 93, 82, 65, 53, 214, 238, 32,
+ 218, 15, 254, 153, 95, 79, 189, 67, 233, 47, 83, 48, 125, 144, 206, 82, 69,
+ 186, 112, 134, 244, 96, 21, 143, 187, 248, 8, 224, 161, 227, 185, 236, 6,
+ 175, 237, 169, 154, 89, 143, 106, 205, 26, 47, 155, 42, 28, 162, 7, 8, 45,
+ // U plane:
+ 55, 165, 203, 139, 152, 208, 36, 177, 61, 49, 129, 211, 140, 71, 253, 250,
+ 120, 167, 238, 67, 255, 223, 104, 32, 240, 179, 28, 41, 86, 84, 61, 243,
+ 169, 212, 201, 0, 9, 236, 89, 194, 204, 75, 228, 250, 27, 81, 137, 29, 255,
+ 131, 194, 241, 76, 133, 186, 135, 212, 197, 150, 145, 203, 96, 86, 231, 91,
+ 119, 197, 67, 226, 2, 118, 66, 181, 86, 219, 86, 132, 137, 156, 161, 221,
+ 18, 55, 170, 35, 206, 201, 193, 38, 63, 229, 29, 110, 96, 14, 135, 229, 99,
+ 106, 108, 167, 110, 50, 32, 144, 113, 48, 29, 57, 29, 20, 199, 145, 245, 9,
+ 183, 88, 174, 114, 237, 29, 40, 99, 117, 233, 6, 51, 227, 2, 28, 76, 149,
+ 190, 23, 240, 73, 113, 10, 73, 240, 105, 220, 129, 26, 144, 214, 34, 4, 24,
+ 219, 24, 156, 198, 214, 244, 143, 106, 255, 204, 93, 2, 88, 107, 211, 241,
+ 242, 86, 189, 219, 164, 132, 149, 32, 228, 219, 60, 202, 218, 189, 34, 250,
+ 160, 158, 36, 212, 212, 41, 233, 61, 92, 121, 170, 220, 192, 232, 255, 124,
+ 249, 231, 55, 196, 219, 196, 62, 238, 187, 76, 33, 138, 67, 82, 159, 169,
+ 196, 66, 196, 110, 194, 64, 35, 205, 64, 218, 12, 41, 188, 195, 244, 178,
+ 17, 80, 8, 149, 39, 110, 146, 164, 162, 215, 227, 107, 103, 47, 52, 95, 3,
+ 181, 90, 255, 80, 83, 206, 66, 153, 112, 72, 109, 235, 69, 105, 57, 75, 145,
+ 186, 16, 87, 73, 61, 98, 197, 237, 17, 32, 207, 220, 246, 188, 46, 73, 121,
+ 84, 252, 164, 111, 21, 98, 13, 170, 174, 170, 231, 77, 10, 113, 9, 217, 11,
+ // V plane:
+ 124, 94, 69, 212, 107, 223, 228, 96, 56, 2, 158, 49, 251, 217, 143, 107,
+ 113, 17, 84, 169, 208, 43, 28, 37, 176, 54, 235, 150, 135, 135, 221, 94, 50,
+ 131, 251, 78, 38, 254, 129, 200, 207, 55, 111, 110, 144, 109, 228, 65, 70,
+ 39, 170, 5, 208, 151, 87, 86, 255, 74, 155, 153, 250, 15, 35, 33, 201, 226,
+ 117, 119, 220, 238, 133, 229, 69, 122, 160, 114, 245, 182, 13, 65, 2, 228,
+ 205, 174, 128, 248, 4, 139, 178, 227, 204, 243, 249, 253, 119, 253, 107,
+ 234, 39, 15, 173, 47, 93, 12, 222, 238, 30, 121, 124, 167, 27, 40, 215, 84,
+ 172, 130, 66, 43, 165, 55, 225, 79, 84, 153, 59, 110, 64, 176, 54, 123, 82,
+ 128, 189, 150, 52, 202, 102, 133, 199, 197, 253, 180, 221, 127, 144, 124,
+ 255, 224, 52, 149, 88, 166, 39, 38, 78, 114, 44, 242, 233, 40, 132, 142,
+ 152, 213, 112, 244, 221, 7, 52, 206, 246, 51, 182, 160, 247, 154, 183, 209,
+ 81, 70, 56, 186, 63, 182, 2, 82, 202, 178, 233, 52, 198, 241, 175, 38, 165,
+ 9, 231, 150, 114, 43, 159, 200, 42, 173, 217, 25, 233, 214, 210, 50, 43,
+ 159, 231, 102, 241, 246, 77, 76, 115, 77, 81, 114, 194, 182, 236, 0, 236,
+ 198, 197, 180, 176, 148, 48, 177, 106, 180, 150, 158, 237, 130, 242, 109,
+ 174, 247, 57, 230, 184, 64, 245, 251, 123, 169, 122, 156, 125, 123, 104,
+ 238, 1, 235, 187, 53, 67, 38, 50, 139, 123, 149, 111, 72, 80, 17, 175, 186,
+ 98, 153, 247, 97, 218, 141, 38, 0, 171, 254, 180, 81, 233, 71, 156, 48, 14,
+ 62, 210, 161, 124, 203, 92
+ };
+ unsigned char *img_data = const_cast<unsigned char *>(buffer);
+
+ aom_image_t img;
+ EXPECT_EQ(aom_img_wrap(&img, AOM_IMG_FMT_I422, kWidth, kHeight, 1, img_data),
+ &img);
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_ALL_INTRA),
+ AOM_CODEC_OK);
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 2;
+ cfg.g_bit_depth = AOM_BITS_8;
+ cfg.g_input_bit_depth = 8;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_limit = 1;
+ cfg.g_lag_in_frames = 0;
+ cfg.kf_mode = AOM_KF_DISABLED;
+ cfg.kf_max_dist = 0;
+ cfg.g_threads = 43;
+ cfg.rc_min_quantizer = 30;
+ cfg.rc_max_quantizer = 50;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, 0), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 40), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_ROW_MT, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_ROWS, 4), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_COLUMNS, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 2), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_SKIP_POSTPROC_FILTERING, 1),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM),
+ AOM_CODEC_OK);
+
+ // Encode frame
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(aom_codec_encode(&enc, nullptr, 0, 1, 0), AOM_CODEC_OK);
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
+}
+
+// A test that reproduces b/259173819: signed integer overflow in
+// linsolve_wiener().
+TEST(SearchWienerTest, 10bitSignedIntegerOverflowInLinsolveWiener) {
+ constexpr int kWidth = 3;
+ constexpr int kHeight = 3;
+ static const uint16_t buffer[3 * kWidth * kHeight] = {
+ // Y plane:
+ 81, 81, 1023, 1020, 81, 1023, 81, 128, 0,
+ // U plane:
+ 273, 273, 273, 273, 273, 273, 273, 273, 273,
+ // V plane:
+ 273, 273, 273, 273, 273, 273, 516, 81, 81
+ };
+ unsigned char *img_data =
+ reinterpret_cast<unsigned char *>(const_cast<uint16_t *>(buffer));
+
+ aom_image_t img;
+ EXPECT_EQ(
+ aom_img_wrap(&img, AOM_IMG_FMT_I44416, kWidth, kHeight, 1, img_data),
+ &img);
+ img.cp = AOM_CICP_CP_UNSPECIFIED;
+ img.tc = AOM_CICP_TC_UNSPECIFIED;
+ img.mc = AOM_CICP_MC_UNSPECIFIED;
+ img.range = AOM_CR_FULL_RANGE;
+
+ aom_codec_iface_t *iface = aom_codec_av1_cx();
+ aom_codec_enc_cfg_t cfg;
+ EXPECT_EQ(aom_codec_enc_config_default(iface, &cfg, AOM_USAGE_ALL_INTRA),
+ AOM_CODEC_OK);
+ cfg.rc_end_usage = AOM_Q;
+ cfg.g_profile = 1;
+ cfg.g_bit_depth = AOM_BITS_10;
+ cfg.g_input_bit_depth = 10;
+ cfg.g_w = kWidth;
+ cfg.g_h = kHeight;
+ cfg.g_limit = 1;
+ cfg.g_lag_in_frames = 0;
+ cfg.kf_mode = AOM_KF_DISABLED;
+ cfg.kf_max_dist = 0;
+ cfg.g_threads = 21;
+ cfg.rc_min_quantizer = 16;
+ cfg.rc_max_quantizer = 54;
+ aom_codec_ctx_t enc;
+ EXPECT_EQ(aom_codec_enc_init(&enc, iface, &cfg, AOM_CODEC_USE_HIGHBITDEPTH),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CQ_LEVEL, 35), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_ROW_MT, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_ROWS, 2), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_TILE_COLUMNS, 5), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_CPUUSED, 1), AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_COLOR_RANGE, AOM_CR_FULL_RANGE),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AV1E_SET_SKIP_POSTPROC_FILTERING, 1),
+ AOM_CODEC_OK);
+ EXPECT_EQ(aom_codec_control(&enc, AOME_SET_TUNING, AOM_TUNE_SSIM),
+ AOM_CODEC_OK);
+
+ // Encode frame
+ EXPECT_EQ(aom_codec_encode(&enc, &img, 0, 1, 0), AOM_CODEC_OK);
+ aom_codec_iter_t iter = nullptr;
+ const aom_codec_cx_pkt_t *pkt = aom_codec_get_cx_data(&enc, &iter);
+ ASSERT_NE(pkt, nullptr);
+ EXPECT_EQ(pkt->kind, AOM_CODEC_CX_FRAME_PKT);
+ // pkt->data.frame.flags is 0x1f0011.
+ EXPECT_EQ(pkt->data.frame.flags & AOM_FRAME_IS_KEY, AOM_FRAME_IS_KEY);
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ // Flush encoder
+ EXPECT_EQ(aom_codec_encode(&enc, nullptr, 0, 1, 0), AOM_CODEC_OK);
+ iter = nullptr;
+ pkt = aom_codec_get_cx_data(&enc, &iter);
+ EXPECT_EQ(pkt, nullptr);
+
+ EXPECT_EQ(aom_codec_destroy(&enc), AOM_CODEC_OK);
+}
+
} // namespace wiener_highbd
#endif // CONFIG_AV1_HIGHBITDEPTH
diff --git a/third_party/fastfeat/README.libaom b/third_party/fastfeat/README.libaom
index ce7ce70..8aaee12 100644
--- a/third_party/fastfeat/README.libaom
+++ b/third_party/fastfeat/README.libaom
@@ -39,3 +39,5 @@
Convert tabs to spaces
Prefix global functions with "aom_"
Add error checking
+Add output argument to hold the scores of the detected features
+Add assertion and rewrite comparisons to appease the scan-build static analyzer
diff --git a/third_party/fastfeat/fast.c b/third_party/fastfeat/fast.c
index 30efde8..a684a33 100644
--- a/third_party/fastfeat/fast.c
+++ b/third_party/fastfeat/fast.c
@@ -33,20 +33,21 @@
#include "fast.h"
-xy* aom_fast9_detect_nonmax(const byte* im, int xsize, int ysize, int stride, int b, int* ret_num_corners)
+xy* aom_fast9_detect_nonmax(const byte* im, int xsize, int ysize, int stride, int b,
+ int** ret_scores, int* ret_num_corners)
{
- xy* corners;
- int num_corners;
- int* scores;
- xy* nonmax;
+ xy* corners;
+ int num_corners;
+ int* scores;
+ xy* nonmax;
- corners = aom_fast9_detect(im, xsize, ysize, stride, b, &num_corners);
- scores = aom_fast9_score(im, stride, corners, num_corners, b);
- nonmax = aom_nonmax_suppression(corners, scores, num_corners, ret_num_corners);
+ corners = aom_fast9_detect(im, xsize, ysize, stride, b, &num_corners);
+ scores = aom_fast9_score(im, stride, corners, num_corners, b);
+ nonmax = aom_nonmax_suppression(corners, scores, num_corners, ret_scores, ret_num_corners);
- free(corners);
- free(scores);
+ free(corners);
+ free(scores);
- return nonmax;
+ return nonmax;
}
// clang-format on
diff --git a/third_party/fastfeat/fast.h b/third_party/fastfeat/fast.h
index d7a9617..7fd199f 100644
--- a/third_party/fastfeat/fast.h
+++ b/third_party/fastfeat/fast.h
@@ -41,9 +41,11 @@
int* aom_fast9_score(const byte* i, int stride, xy* corners, int num_corners, int b);
-xy* aom_fast9_detect_nonmax(const byte* im, int xsize, int ysize, int stride, int b, int* ret_num_corners);
+xy* aom_fast9_detect_nonmax(const byte* im, int xsize, int ysize, int stride, int b,
+ int** ret_scores, int* ret_num_corners);
-xy* aom_nonmax_suppression(const xy* corners, const int* scores, int num_corners, int* ret_num_nonmax);
+xy* aom_nonmax_suppression(const xy* corners, const int* scores, int num_corners,
+ int** ret_scores, int* ret_num_nonmax);
#endif
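
For reference, a minimal sketch of how a caller would use the revised fastfeat
API after this change: the detector now also returns the per-corner scores
through the new ret_scores output argument, and the caller owns (and must free)
both returned arrays. The typedefs assumed below (byte as unsigned char, xy as
a struct of two ints), the threshold value, and the helper name are
illustrative assumptions, not taken from this diff.

    /* Sketch only; assumes fastfeat's usual typedefs and an image buffer
       that has already been filled. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "fast.h"

    static void detect_example(const byte *img, int w, int h) {
      int num_corners = 0;
      int *scores = NULL;  /* new output: score of each surviving corner */
      xy *corners = aom_fast9_detect_nonmax(img, w, h, /*stride=*/w,
                                            /*b=*/20, &scores, &num_corners);
      if (!corners) return;  /* failure or no corners; scores is NULL too */
      for (int i = 0; i < num_corners; i++)
        printf("corner (%d, %d) score %d\n", corners[i].x, corners[i].y,
               scores[i]);
      free(corners);  /* caller owns both arrays */
      free(scores);
    }
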
diff --git a/third_party/fastfeat/nonmax.c b/third_party/fastfeat/nonmax.c
index 39ec18c..cc0ada7 100644
--- a/third_party/fastfeat/nonmax.c
+++ b/third_party/fastfeat/nonmax.c
@@ -29,19 +29,22 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// clang-format off
+#include <assert.h>
#include <stdlib.h>
#include "fast.h"
#define Compare(X, Y) ((X)>=(Y))
-xy* aom_nonmax_suppression(const xy* corners, const int* scores, int num_corners, int* ret_num_nonmax)
+xy* aom_nonmax_suppression(const xy* corners, const int* scores, int num_corners,
+ int** ret_scores, int* ret_num_nonmax)
{
int num_nonmax=0;
int last_row;
int* row_start;
int i, j;
xy* ret_nonmax;
+ int* nonmax_scores;
const int sz = (int)num_corners;
/*Point above points (roughly) to the pixel above the one of interest, if there
@@ -49,6 +52,7 @@
int point_above = 0;
int point_below = 0;
+ *ret_scores = 0;
*ret_num_nonmax = 0;
if(!(corners && scores) || num_corners < 1)
{
@@ -61,6 +65,13 @@
return 0;
}
+ nonmax_scores = (int*)malloc(num_corners * sizeof(*nonmax_scores));
+ if (!nonmax_scores)
+ {
+ free(ret_nonmax);
+ return 0;
+ }
+
/* Find where each row begins
(the corners are output in raster scan order). A beginning of -1 signifies
that there are no corners on that row. */
@@ -69,6 +80,7 @@
if(!row_start)
{
free(ret_nonmax);
+ free(nonmax_scores);
return 0;
}
@@ -91,6 +103,7 @@
{
int score = scores[i];
xy pos = corners[i];
+ assert(pos.y <= last_row);
/*Check left */
if(i > 0)
@@ -103,55 +116,56 @@
continue;
/*Check above (if there is a valid row above)*/
- if(pos.y > 0)
- if (row_start[pos.y - 1] != -1)
+ if(pos.y > 0 && row_start[pos.y - 1] != -1)
+ {
+ /*Make sure that current point_above is one
+ row above.*/
+ if(corners[point_above].y < pos.y - 1)
+ point_above = row_start[pos.y-1];
+
+ /*Make point_above point to the first of the pixels above the current point,
+ if it exists.*/
+ for(; corners[point_above].y < pos.y && corners[point_above].x < pos.x - 1; point_above++)
+ {}
+
+
+ for(j=point_above; corners[j].y < pos.y && corners[j].x <= pos.x + 1; j++)
{
- /*Make sure that current point_above is one
- row above.*/
- if(corners[point_above].y < pos.y - 1)
- point_above = row_start[pos.y-1];
-
- /*Make point_above point to the first of the pixels above the current point,
- if it exists.*/
- for(; corners[point_above].y < pos.y && corners[point_above].x < pos.x - 1; point_above++)
- {}
-
-
- for(j=point_above; corners[j].y < pos.y && corners[j].x <= pos.x + 1; j++)
- {
- int x = corners[j].x;
- if( (x == pos.x - 1 || x ==pos.x || x == pos.x+1) && Compare(scores[j], score))
- goto cont;
- }
-
+ int x = corners[j].x;
+ if( (x == pos.x - 1 || x ==pos.x || x == pos.x+1) && Compare(scores[j], score))
+ goto cont;
}
+ }
+
/*Check below (if there is anything below)*/
- if(pos.y >= 0)
- if (pos.y != last_row && row_start[pos.y + 1] != -1 && point_below < sz) /*Nothing below*/
+ if (pos.y + 1 < last_row+1 && row_start[pos.y + 1] != -1 && point_below < sz) /*Nothing below*/
+ {
+ if(corners[point_below].y < pos.y + 1)
+ point_below = row_start[pos.y+1];
+
+ /* Make point below point to one of the pixels belowthe current point, if it
+ exists.*/
+ for(; point_below < sz && corners[point_below].y == pos.y+1 && corners[point_below].x < pos.x - 1; point_below++)
+ {}
+
+ for(j=point_below; j < sz && corners[j].y == pos.y+1 && corners[j].x <= pos.x + 1; j++)
{
- if(corners[point_below].y < pos.y + 1)
- point_below = row_start[pos.y+1];
-
- /* Make point below point to one of the pixels belowthe current point, if it
- exists.*/
- for(; point_below < sz && corners[point_below].y == pos.y+1 && corners[point_below].x < pos.x - 1; point_below++)
- {}
-
- for(j=point_below; j < sz && corners[j].y == pos.y+1 && corners[j].x <= pos.x + 1; j++)
- {
- int x = corners[j].x;
- if( (x == pos.x - 1 || x ==pos.x || x == pos.x+1) && Compare(scores[j],score))
- goto cont;
- }
+ int x = corners[j].x;
+ if( (x == pos.x - 1 || x ==pos.x || x == pos.x+1) && Compare(scores[j],score))
+ goto cont;
}
+ }
- ret_nonmax[num_nonmax++] = corners[i];
+ ret_nonmax[num_nonmax] = corners[i];
+ nonmax_scores[num_nonmax] = scores[i];
+ num_nonmax++;
cont:
;
}
free(row_start);
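+  /* Hand the score array to the caller, who owns it along with ret_nonmax. */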
+ *ret_scores = nonmax_scores;
*ret_num_nonmax = num_nonmax;
return ret_nonmax;
}
diff --git a/third_party/libwebm/AUTHORS.TXT b/third_party/libwebm/AUTHORS.TXT
index 9686ac1..59b648c 100644
--- a/third_party/libwebm/AUTHORS.TXT
+++ b/third_party/libwebm/AUTHORS.TXT
@@ -2,3 +2,4 @@
# Name or Organization <email address>
Google Inc.
+Elijah Cirioli <[email protected]>
diff --git a/third_party/libwebm/Android.mk b/third_party/libwebm/Android.mk
index 1185198..e6c17df 100644
--- a/third_party/libwebm/Android.mk
+++ b/third_party/libwebm/Android.mk
@@ -1,3 +1,5 @@
+# Ignore this file during non-NDK builds.
+ifdef NDK_ROOT
LOCAL_PATH:= $(call my-dir)
include $(CLEAR_VARS)
@@ -18,3 +20,4 @@
LOCAL_LICENSE_CONDITIONS := notice
LOCAL_NOTICE_FILE := $(LOCAL_PATH)/LICENSE.TXT $(LOCAL_PATH)/PATENTS.TXT
include $(BUILD_STATIC_LIBRARY)
+endif # NDK_ROOT
diff --git a/third_party/libwebm/README.libaom b/third_party/libwebm/README.libaom
index 325604c..ee350a5 100644
--- a/third_party/libwebm/README.libaom
+++ b/third_party/libwebm/README.libaom
@@ -1,7 +1,7 @@
URL: https://chromium.googlesource.com/webm/libwebm
-Version: ee0bab576c338c9807249b99588e352b7268cb62
+Version: 1930e3ca23b007f3ff11d98a570077be6201957e
License: BSD
-License File: LICENSE.txt
+License File: LICENSE.TXT
Description:
libwebm is used to handle WebM container I/O.
diff --git a/third_party/libwebm/mkvmuxer/mkvmuxer.cc b/third_party/libwebm/mkvmuxer/mkvmuxer.cc
index ae36531..faaf016 100644
--- a/third_party/libwebm/mkvmuxer/mkvmuxer.cc
+++ b/third_party/libwebm/mkvmuxer/mkvmuxer.cc
@@ -607,10 +607,10 @@
return true;
}
-uint64_t ContentEncoding::EncodingSize(uint64_t compresion_size,
+uint64_t ContentEncoding::EncodingSize(uint64_t compression_size,
uint64_t encryption_size) const {
// TODO(fgalligan): Add support for compression settings.
- if (compresion_size != 0)
+ if (compression_size != 0)
return 0;
uint64_t encoding_size = 0;
diff --git a/third_party/libwebm/mkvmuxer/mkvmuxer.h b/third_party/libwebm/mkvmuxer/mkvmuxer.h
index f2db377..8602d82 100644
--- a/third_party/libwebm/mkvmuxer/mkvmuxer.h
+++ b/third_party/libwebm/mkvmuxer/mkvmuxer.h
@@ -330,7 +330,7 @@
private:
// Returns the size in bytes for the encoding elements.
- uint64_t EncodingSize(uint64_t compresion_size,
+ uint64_t EncodingSize(uint64_t compression_size,
uint64_t encryption_size) const;
// Returns the size in bytes for the encryption elements.
@@ -1425,7 +1425,7 @@
bool Write(IMkvWriter* writer);
// We are going to put a cap on the number of Seek Entries.
- const static int32_t kSeekEntryCount = 5;
+ constexpr static int32_t kSeekEntryCount = 5;
private:
// Returns the maximum size in bytes of one seek entry.
@@ -1505,8 +1505,8 @@
kBeforeClusters = 0x1 // Position Cues before Clusters
};
- static const uint32_t kDefaultDocTypeVersion = 4;
- static const uint64_t kDefaultMaxClusterDuration = 30000000000ULL;
+ static constexpr uint32_t kDefaultDocTypeVersion = 4;
+ static constexpr uint64_t kDefaultMaxClusterDuration = 30000000000ULL;
Segment();
~Segment();
diff --git a/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc b/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc
index bd2f769..300b155 100644
--- a/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc
+++ b/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc
@@ -607,7 +607,7 @@
void GetVersion(int32* major, int32* minor, int32* build, int32* revision) {
*major = 0;
*minor = 3;
- *build = 0;
+ *build = 1;
*revision = 0;
}
diff --git a/third_party/libwebm/mkvparser/mkvparser.cc b/third_party/libwebm/mkvparser/mkvparser.cc
index de8884b..868afcb 100644
--- a/third_party/libwebm/mkvparser/mkvparser.cc
+++ b/third_party/libwebm/mkvparser/mkvparser.cc
@@ -55,7 +55,7 @@
void GetVersion(int& major, int& minor, int& build, int& revision) {
major = 1;
minor = 1;
- build = 0;
+ build = 1;
revision = 0;
}
@@ -298,7 +298,7 @@
if (status < 0)
return status;
- unsigned long long result = first_byte;
+ unsigned long long result = static_cast<unsigned long long>(first_byte);
++pos;
for (long i = 1; i < size; ++i) {
@@ -2432,7 +2432,7 @@
pos += size; // consume payload
}
- if ((m_pos < 0) || (m_track <= 0)) {
+ if ((m_pos < 0) || (m_track <= 0) || (m_block < 0) || (m_block > LONG_MAX)) {
return false;
}
diff --git a/third_party/libyuv/source/row_x86.asm b/third_party/libyuv/source/row_x86.asm
deleted file mode 100644
index 0cb326f..0000000
--- a/third_party/libyuv/source/row_x86.asm
+++ /dev/null
@@ -1,146 +0,0 @@
-;
-; Copyright 2012 The LibYuv Project Authors. All rights reserved.
-;
-; Use of this source code is governed by a BSD-style license
-; that can be found in the LICENSE file in the root of the source
-; tree. An additional intellectual property rights grant can be found
-; in the file PATENTS. All contributing project authors may
-; be found in the AUTHORS file in the root of the source tree.
-;
-
-%ifdef __YASM_VERSION_ID__
-%if __YASM_VERSION_ID__ < 01020000h
-%error AVX2 is supported only by yasm 1.2.0 or later.
-%endif
-%endif
-%include "x86inc.asm"
-
-SECTION .text
-
-; cglobal numeric constants are parameters, gpr regs, mm regs
-
-; void YUY2ToYRow_SSE2(const uint8* src_yuy2, uint8* dst_y, int pix)
-
-%macro YUY2TOYROW 2-3
-cglobal %1ToYRow%3, 3, 3, 3, src_yuy2, dst_y, pix
-%ifidn %1,YUY2
- pcmpeqb m2, m2, m2 ; generate mask 0x00ff00ff
- psrlw m2, m2, 8
-%endif
-
- ALIGN 4
-.convertloop:
- mov%2 m0, [src_yuy2q]
- mov%2 m1, [src_yuy2q + mmsize]
- lea src_yuy2q, [src_yuy2q + mmsize * 2]
-%ifidn %1,YUY2
- pand m0, m0, m2 ; YUY2 even bytes are Y
- pand m1, m1, m2
-%else
- psrlw m0, m0, 8 ; UYVY odd bytes are Y
- psrlw m1, m1, 8
-%endif
- packuswb m0, m0, m1
-%if cpuflag(AVX2)
- vpermq m0, m0, 0xd8
-%endif
- sub pixd, mmsize
- mov%2 [dst_yq], m0
- lea dst_yq, [dst_yq + mmsize]
- jg .convertloop
- REP_RET
-%endmacro
-
-; TODO(fbarchard): Remove MMX. Add SSSE3 pshufb version.
-INIT_MMX MMX
-YUY2TOYROW YUY2,a,
-YUY2TOYROW YUY2,u,_Unaligned
-YUY2TOYROW UYVY,a,
-YUY2TOYROW UYVY,u,_Unaligned
-INIT_XMM SSE2
-YUY2TOYROW YUY2,a,
-YUY2TOYROW YUY2,u,_Unaligned
-YUY2TOYROW UYVY,a,
-YUY2TOYROW UYVY,u,_Unaligned
-INIT_YMM AVX2
-YUY2TOYROW YUY2,a,
-YUY2TOYROW UYVY,a,
-
-; void SplitUVRow_SSE2(const uint8* src_uv, uint8* dst_u, uint8* dst_v, int pix)
-
-%macro SplitUVRow 1-2
-cglobal SplitUVRow%2, 4, 4, 5, src_uv, dst_u, dst_v, pix
- pcmpeqb m4, m4, m4 ; generate mask 0x00ff00ff
- psrlw m4, m4, 8
- sub dst_vq, dst_uq
-
- ALIGN 4
-.convertloop:
- mov%1 m0, [src_uvq]
- mov%1 m1, [src_uvq + mmsize]
- lea src_uvq, [src_uvq + mmsize * 2]
- psrlw m2, m0, 8 ; odd bytes
- psrlw m3, m1, 8
- pand m0, m0, m4 ; even bytes
- pand m1, m1, m4
- packuswb m0, m0, m1
- packuswb m2, m2, m3
-%if cpuflag(AVX2)
- vpermq m0, m0, 0xd8
- vpermq m2, m2, 0xd8
-%endif
- mov%1 [dst_uq], m0
- mov%1 [dst_uq + dst_vq], m2
- lea dst_uq, [dst_uq + mmsize]
- sub pixd, mmsize
- jg .convertloop
- REP_RET
-%endmacro
-
-INIT_MMX MMX
-SplitUVRow a,
-SplitUVRow u,_Unaligned
-INIT_XMM SSE2
-SplitUVRow a,
-SplitUVRow u,_Unaligned
-INIT_YMM AVX2
-SplitUVRow a,
-
-; void MergeUVRow_SSE2(const uint8* src_u, const uint8* src_v, uint8* dst_uv,
-; int width);
-
-%macro MergeUVRow_ 1-2
-cglobal MergeUVRow_%2, 4, 4, 3, src_u, src_v, dst_uv, pix
- sub src_vq, src_uq
-
- ALIGN 4
-.convertloop:
- mov%1 m0, [src_uq]
- mov%1 m1, [src_vq]
- lea src_uq, [src_uq + mmsize]
- punpcklbw m2, m0, m1 // first 8 UV pairs
- punpckhbw m0, m0, m1 // next 8 UV pairs
-%if cpuflag(AVX2)
- vperm2i128 m1, m2, m0, 0x20 // low 128 of ymm2 and low 128 of ymm0
- vperm2i128 m2, m2, m0, 0x31 // high 128 of ymm2 and high 128 of ymm0
- mov%1 [dst_uvq], m1
- mov%1 [dst_uvq + mmsize], m2
-%else
- mov%1 [dst_uvq], m2
- mov%1 [dst_uvq + mmsize], m0
-%endif
- lea dst_uvq, [dst_uvq + mmsize * 2]
- sub pixd, mmsize
- jg .convertloop
- REP_RET
-%endmacro
-
-INIT_MMX MMX
-MergeUVRow_ a,
-MergeUVRow_ u,_Unaligned
-INIT_XMM SSE2
-MergeUVRow_ a,
-MergeUVRow_ u,_Unaligned
-INIT_YMM AVX2
-MergeUVRow_ a,
-
diff --git a/third_party/libyuv/source/x86inc.asm b/third_party/libyuv/source/x86inc.asm
deleted file mode 100644
index cb5c32d..0000000
--- a/third_party/libyuv/source/x86inc.asm
+++ /dev/null
@@ -1,1136 +0,0 @@
-;*****************************************************************************
-;* x86inc.asm: x264asm abstraction layer
-;*****************************************************************************
-;* Copyright (C) 2005-2012 x264 project
-;*
-;* Authors: Loren Merritt <[email protected]>
-;* Anton Mitrofanov <[email protected]>
-;* Jason Garrett-Glaser <[email protected]>
-;* Henrik Gramner <[email protected]>
-;*
-;* Permission to use, copy, modify, and/or distribute this software for any
-;* purpose with or without fee is hereby granted, provided that the above
-;* copyright notice and this permission notice appear in all copies.
-;*
-;* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-;* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-;* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-;* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-;* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-;* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-;* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-;*****************************************************************************
-
-; This is a header file for the x264ASM assembly language, which uses
-; NASM/YASM syntax combined with a large number of macros to provide easy
-; abstraction between different calling conventions (x86_32, win64, linux64).
-; It also has various other useful features to simplify writing the kind of
-; DSP functions that are most often used in x264.
-
-; Unlike the rest of x264, this file is available under an ISC license, as it
-; has significant usefulness outside of x264 and we want it to be available
-; to the largest audience possible. Of course, if you modify it for your own
-; purposes to add a new feature, we strongly encourage contributing a patch
-; as this feature might be useful for others as well. Send patches or ideas
-; to [email protected] .
-
-; Local changes for libyuv:
-; remove %define program_name and references in labels
-; rename cpus to uppercase
-
-%define WIN64 0
-%define UNIX64 0
-%if ARCH_X86_64
- %ifidn __OUTPUT_FORMAT__,win32
- %define WIN64 1
- %elifidn __OUTPUT_FORMAT__,win64
- %define WIN64 1
- %else
- %define UNIX64 1
- %endif
-%endif
-
-%ifdef PREFIX
- %define mangle(x) _ %+ x
-%else
- %define mangle(x) x
-%endif
-
-; Name of the .rodata section.
-; Kludge: Something on OS X fails to align .rodata even given an align attribute,
-; so use a different read-only section.
-%macro SECTION_RODATA 0-1 16
- %ifidn __OUTPUT_FORMAT__,macho64
- SECTION .text align=%1
- %elifidn __OUTPUT_FORMAT__,macho
- SECTION .text align=%1
- fakegot:
- %elifidn __OUTPUT_FORMAT__,aout
- section .text
- %else
- SECTION .rodata align=%1
- %endif
-%endmacro
-
-; aout does not support align=
-%macro SECTION_TEXT 0-1 16
- %ifidn __OUTPUT_FORMAT__,aout
- SECTION .text
- %else
- SECTION .text align=%1
- %endif
-%endmacro
-
-%if WIN64
- %define PIC
-%elif ARCH_X86_64 == 0
-; x86_32 doesn't require PIC.
-; Some distros prefer shared objects to be PIC, but nothing breaks if
-; the code contains a few textrels, so we'll skip that complexity.
- %undef PIC
-%endif
-%ifdef PIC
- default rel
-%endif
-
-; Always use long nops (reduces 0x90 spam in disassembly on x86_32)
-CPU amdnop
-
-; Macros to eliminate most code duplication between x86_32 and x86_64:
-; Currently this works only for leaf functions which load all their arguments
-; into registers at the start, and make no other use of the stack. Luckily that
-; covers most of x264's asm.
-
-; PROLOGUE:
-; %1 = number of arguments. loads them from stack if needed.
-; %2 = number of registers used. pushes callee-saved regs if needed.
-; %3 = number of xmm registers used. pushes callee-saved xmm regs if needed.
-; %4 = list of names to define to registers
-; PROLOGUE can also be invoked by adding the same options to cglobal
-
-; e.g.
-; cglobal foo, 2,3,0, dst, src, tmp
-; declares a function (foo), taking two args (dst and src) and one local variable (tmp)
-
-; TODO Some functions can use some args directly from the stack. If they're the
-; last args then you can just not declare them, but if they're in the middle
-; we need more flexible macro.
-
-; RET:
-; Pops anything that was pushed by PROLOGUE, and returns.
-
-; REP_RET:
-; Same, but if it doesn't pop anything it becomes a 2-byte ret, for athlons
-; which are slow when a normal ret follows a branch.
-
-; registers:
-; rN and rNq are the native-size register holding function argument N
-; rNd, rNw, rNb are dword, word, and byte size
-; rNh is the high 8 bits of the word size
-; rNm is the original location of arg N (a register or on the stack), dword
-; rNmp is native size
-
-%macro DECLARE_REG 2-3
- %define r%1q %2
- %define r%1d %2d
- %define r%1w %2w
- %define r%1b %2b
- %define r%1h %2h
- %if %0 == 2
- %define r%1m %2d
- %define r%1mp %2
- %elif ARCH_X86_64 ; memory
- %define r%1m [rsp + stack_offset + %3]
- %define r%1mp qword r %+ %1m
- %else
- %define r%1m [esp + stack_offset + %3]
- %define r%1mp dword r %+ %1m
- %endif
- %define r%1 %2
-%endmacro
-
-%macro DECLARE_REG_SIZE 3
- %define r%1q r%1
- %define e%1q r%1
- %define r%1d e%1
- %define e%1d e%1
- %define r%1w %1
- %define e%1w %1
- %define r%1h %3
- %define e%1h %3
- %define r%1b %2
- %define e%1b %2
-%if ARCH_X86_64 == 0
- %define r%1 e%1
-%endif
-%endmacro
-
-DECLARE_REG_SIZE ax, al, ah
-DECLARE_REG_SIZE bx, bl, bh
-DECLARE_REG_SIZE cx, cl, ch
-DECLARE_REG_SIZE dx, dl, dh
-DECLARE_REG_SIZE si, sil, null
-DECLARE_REG_SIZE di, dil, null
-DECLARE_REG_SIZE bp, bpl, null
-
-; t# defines for when per-arch register allocation is more complex than just function arguments
-
-%macro DECLARE_REG_TMP 1-*
- %assign %%i 0
- %rep %0
- CAT_XDEFINE t, %%i, r%1
- %assign %%i %%i+1
- %rotate 1
- %endrep
-%endmacro
-
-%macro DECLARE_REG_TMP_SIZE 0-*
- %rep %0
- %define t%1q t%1 %+ q
- %define t%1d t%1 %+ d
- %define t%1w t%1 %+ w
- %define t%1h t%1 %+ h
- %define t%1b t%1 %+ b
- %rotate 1
- %endrep
-%endmacro
-
-DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
-
-%if ARCH_X86_64
- %define gprsize 8
-%else
- %define gprsize 4
-%endif
-
-%macro PUSH 1
- push %1
- %assign stack_offset stack_offset+gprsize
-%endmacro
-
-%macro POP 1
- pop %1
- %assign stack_offset stack_offset-gprsize
-%endmacro
-
-%macro PUSH_IF_USED 1-*
- %rep %0
- %if %1 < regs_used
- PUSH r%1
- %endif
- %rotate 1
- %endrep
-%endmacro
-
-%macro POP_IF_USED 1-*
- %rep %0
- %if %1 < regs_used
- pop r%1
- %endif
- %rotate 1
- %endrep
-%endmacro
-
-%macro LOAD_IF_USED 1-*
- %rep %0
- %if %1 < num_args
- mov r%1, r %+ %1 %+ mp
- %endif
- %rotate 1
- %endrep
-%endmacro
-
-%macro SUB 2
- sub %1, %2
- %ifidn %1, rsp
- %assign stack_offset stack_offset+(%2)
- %endif
-%endmacro
-
-%macro ADD 2
- add %1, %2
- %ifidn %1, rsp
- %assign stack_offset stack_offset-(%2)
- %endif
-%endmacro
-
-%macro movifnidn 2
- %ifnidn %1, %2
- mov %1, %2
- %endif
-%endmacro
-
-%macro movsxdifnidn 2
- %ifnidn %1, %2
- movsxd %1, %2
- %endif
-%endmacro
-
-%macro ASSERT 1
- %if (%1) == 0
- %error assert failed
- %endif
-%endmacro
-
-%macro DEFINE_ARGS 0-*
- %ifdef n_arg_names
- %assign %%i 0
- %rep n_arg_names
- CAT_UNDEF arg_name %+ %%i, q
- CAT_UNDEF arg_name %+ %%i, d
- CAT_UNDEF arg_name %+ %%i, w
- CAT_UNDEF arg_name %+ %%i, h
- CAT_UNDEF arg_name %+ %%i, b
- CAT_UNDEF arg_name %+ %%i, m
- CAT_UNDEF arg_name %+ %%i, mp
- CAT_UNDEF arg_name, %%i
- %assign %%i %%i+1
- %endrep
- %endif
-
- %xdefine %%stack_offset stack_offset
- %undef stack_offset ; so that the current value of stack_offset doesn't get baked in by xdefine
- %assign %%i 0
- %rep %0
- %xdefine %1q r %+ %%i %+ q
- %xdefine %1d r %+ %%i %+ d
- %xdefine %1w r %+ %%i %+ w
- %xdefine %1h r %+ %%i %+ h
- %xdefine %1b r %+ %%i %+ b
- %xdefine %1m r %+ %%i %+ m
- %xdefine %1mp r %+ %%i %+ mp
- CAT_XDEFINE arg_name, %%i, %1
- %assign %%i %%i+1
- %rotate 1
- %endrep
- %xdefine stack_offset %%stack_offset
- %assign n_arg_names %0
-%endmacro
-
-%if WIN64 ; Windows x64 ;=================================================
-
-DECLARE_REG 0, rcx
-DECLARE_REG 1, rdx
-DECLARE_REG 2, R8
-DECLARE_REG 3, R9
-DECLARE_REG 4, R10, 40
-DECLARE_REG 5, R11, 48
-DECLARE_REG 6, rax, 56
-DECLARE_REG 7, rdi, 64
-DECLARE_REG 8, rsi, 72
-DECLARE_REG 9, rbx, 80
-DECLARE_REG 10, rbp, 88
-DECLARE_REG 11, R12, 96
-DECLARE_REG 12, R13, 104
-DECLARE_REG 13, R14, 112
-DECLARE_REG 14, R15, 120
-
-%macro PROLOGUE 2-4+ 0 ; #args, #regs, #xmm_regs, arg_names...
- %assign num_args %1
- %assign regs_used %2
- ASSERT regs_used >= num_args
- ASSERT regs_used <= 15
- PUSH_IF_USED 7, 8, 9, 10, 11, 12, 13, 14
- %if mmsize == 8
- %assign xmm_regs_used 0
- %else
- WIN64_SPILL_XMM %3
- %endif
- LOAD_IF_USED 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
- DEFINE_ARGS %4
-%endmacro
-
-%macro WIN64_SPILL_XMM 1
- %assign xmm_regs_used %1
- ASSERT xmm_regs_used <= 16
- %if xmm_regs_used > 6
- SUB rsp, (xmm_regs_used-6)*16+16
- %assign %%i xmm_regs_used
- %rep (xmm_regs_used-6)
- %assign %%i %%i-1
- movdqa [rsp + (%%i-6)*16+(~stack_offset&8)], xmm %+ %%i
- %endrep
- %endif
-%endmacro
-
-%macro WIN64_RESTORE_XMM_INTERNAL 1
- %if xmm_regs_used > 6
- %assign %%i xmm_regs_used
- %rep (xmm_regs_used-6)
- %assign %%i %%i-1
- movdqa xmm %+ %%i, [%1 + (%%i-6)*16+(~stack_offset&8)]
- %endrep
- add %1, (xmm_regs_used-6)*16+16
- %endif
-%endmacro
-
-%macro WIN64_RESTORE_XMM 1
- WIN64_RESTORE_XMM_INTERNAL %1
- %assign stack_offset stack_offset-(xmm_regs_used-6)*16+16
- %assign xmm_regs_used 0
-%endmacro
-
-%define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32
-
-%macro RET 0
- WIN64_RESTORE_XMM_INTERNAL rsp
- POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
-%if mmsize == 32
- vzeroupper
-%endif
- ret
-%endmacro
-
-%elif ARCH_X86_64 ; *nix x64 ;=============================================
-
-DECLARE_REG 0, rdi
-DECLARE_REG 1, rsi
-DECLARE_REG 2, rdx
-DECLARE_REG 3, rcx
-DECLARE_REG 4, R8
-DECLARE_REG 5, R9
-DECLARE_REG 6, rax, 8
-DECLARE_REG 7, R10, 16
-DECLARE_REG 8, R11, 24
-DECLARE_REG 9, rbx, 32
-DECLARE_REG 10, rbp, 40
-DECLARE_REG 11, R12, 48
-DECLARE_REG 12, R13, 56
-DECLARE_REG 13, R14, 64
-DECLARE_REG 14, R15, 72
-
-%macro PROLOGUE 2-4+ ; #args, #regs, #xmm_regs, arg_names...
- %assign num_args %1
- %assign regs_used %2
- ASSERT regs_used >= num_args
- ASSERT regs_used <= 15
- PUSH_IF_USED 9, 10, 11, 12, 13, 14
- LOAD_IF_USED 6, 7, 8, 9, 10, 11, 12, 13, 14
- DEFINE_ARGS %4
-%endmacro
-
-%define has_epilogue regs_used > 9 || mmsize == 32
-
-%macro RET 0
- POP_IF_USED 14, 13, 12, 11, 10, 9
-%if mmsize == 32
- vzeroupper
-%endif
- ret
-%endmacro
-
-%else ; X86_32 ;==============================================================
-
-DECLARE_REG 0, eax, 4
-DECLARE_REG 1, ecx, 8
-DECLARE_REG 2, edx, 12
-DECLARE_REG 3, ebx, 16
-DECLARE_REG 4, esi, 20
-DECLARE_REG 5, edi, 24
-DECLARE_REG 6, ebp, 28
-%define rsp esp
-
-%macro DECLARE_ARG 1-*
- %rep %0
- %define r%1m [esp + stack_offset + 4*%1 + 4]
- %define r%1mp dword r%1m
- %rotate 1
- %endrep
-%endmacro
-
-DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14
-
-%macro PROLOGUE 2-4+ ; #args, #regs, #xmm_regs, arg_names...
- %assign num_args %1
- %assign regs_used %2
- %if regs_used > 7
- %assign regs_used 7
- %endif
- ASSERT regs_used >= num_args
- PUSH_IF_USED 3, 4, 5, 6
- LOAD_IF_USED 0, 1, 2, 3, 4, 5, 6
- DEFINE_ARGS %4
-%endmacro
-
-%define has_epilogue regs_used > 3 || mmsize == 32
-
-%macro RET 0
- POP_IF_USED 6, 5, 4, 3
-%if mmsize == 32
- vzeroupper
-%endif
- ret
-%endmacro
-
-%endif ;======================================================================
-
-%if WIN64 == 0
-%macro WIN64_SPILL_XMM 1
-%endmacro
-%macro WIN64_RESTORE_XMM 1
-%endmacro
-%endif
-
-%macro REP_RET 0
- %if has_epilogue
- RET
- %else
- rep ret
- %endif
-%endmacro
-
-%macro TAIL_CALL 2 ; callee, is_nonadjacent
- %if has_epilogue
- call %1
- RET
- %elif %2
- jmp %1
- %endif
-%endmacro
-
-;=============================================================================
-; arch-independent part
-;=============================================================================
-
-%assign function_align 16
-
-; Begin a function.
-; Applies any symbol mangling needed for C linkage, and sets up a define such that
-; subsequent uses of the function name automatically refer to the mangled version.
-; Appends cpuflags to the function name if cpuflags has been specified.
-%macro cglobal 1-2+ ; name, [PROLOGUE args]
-%if %0 == 1
- cglobal_internal %1 %+ SUFFIX
-%else
- cglobal_internal %1 %+ SUFFIX, %2
-%endif
-%endmacro
-%macro cglobal_internal 1-2+
- %ifndef cglobaled_%1
- %xdefine %1 mangle(%1)
- %xdefine %1.skip_prologue %1 %+ .skip_prologue
- CAT_XDEFINE cglobaled_, %1, 1
- %endif
- %xdefine current_function %1
- %ifidn __OUTPUT_FORMAT__,elf
- global %1:function hidden
- %else
- global %1
- %endif
- align function_align
- %1:
- RESET_MM_PERMUTATION ; not really needed, but makes disassembly somewhat nicer
- %assign stack_offset 0
- %if %0 > 1
- PROLOGUE %2
- %endif
-%endmacro
-
-%macro cextern 1
- %xdefine %1 mangle(%1)
- CAT_XDEFINE cglobaled_, %1, 1
- extern %1
-%endmacro
-
-; like cextern, but without the prefix
-%macro cextern_naked 1
- %xdefine %1 mangle(%1)
- CAT_XDEFINE cglobaled_, %1, 1
- extern %1
-%endmacro
-
-%macro const 2+
- %xdefine %1 mangle(%1)
- global %1
- %1: %2
-%endmacro
-
-; This is needed for ELF, otherwise the GNU linker assumes the stack is
-; executable by default.
-%ifidn __OUTPUT_FORMAT__,elf
-SECTION .note.GNU-stack noalloc noexec nowrite progbits
-%endif
-%ifidn __OUTPUT_FORMAT__,elf32
-section .note.GNU-stack noalloc noexec nowrite progbits
-%endif
-%ifidn __OUTPUT_FORMAT__,elf64
-section .note.GNU-stack noalloc noexec nowrite progbits
-%endif
-
-; cpuflags
-
-%assign cpuflags_MMX (1<<0)
-%assign cpuflags_MMX2 (1<<1) | cpuflags_MMX
-%assign cpuflags_3dnow (1<<2) | cpuflags_MMX
-%assign cpuflags_3dnow2 (1<<3) | cpuflags_3dnow
-%assign cpuflags_SSE (1<<4) | cpuflags_MMX2
-%assign cpuflags_SSE2 (1<<5) | cpuflags_SSE
-%assign cpuflags_SSE2slow (1<<6) | cpuflags_SSE2
-%assign cpuflags_SSE3 (1<<7) | cpuflags_SSE2
-%assign cpuflags_SSSE3 (1<<8) | cpuflags_SSE3
-%assign cpuflags_SSE4 (1<<9) | cpuflags_SSSE3
-%assign cpuflags_SSE42 (1<<10)| cpuflags_SSE4
-%assign cpuflags_AVX (1<<11)| cpuflags_SSE42
-%assign cpuflags_xop (1<<12)| cpuflags_AVX
-%assign cpuflags_fma4 (1<<13)| cpuflags_AVX
-%assign cpuflags_AVX2 (1<<14)| cpuflags_AVX
-%assign cpuflags_fma3 (1<<15)| cpuflags_AVX
-
-%assign cpuflags_cache32 (1<<16)
-%assign cpuflags_cache64 (1<<17)
-%assign cpuflags_slowctz (1<<18)
-%assign cpuflags_lzcnt (1<<19)
-%assign cpuflags_misalign (1<<20)
-%assign cpuflags_aligned (1<<21) ; not a cpu feature, but a function variant
-%assign cpuflags_atom (1<<22)
-%assign cpuflags_bmi1 (1<<23)
-%assign cpuflags_bmi2 (1<<24)|cpuflags_bmi1
-%assign cpuflags_tbm (1<<25)|cpuflags_bmi1
-
-%define cpuflag(x) ((cpuflags & (cpuflags_ %+ x)) == (cpuflags_ %+ x))
-%define notcpuflag(x) ((cpuflags & (cpuflags_ %+ x)) != (cpuflags_ %+ x))
-
-; Takes up to 2 cpuflags from the above list.
-; All subsequent functions (up to the next INIT_CPUFLAGS) is built for the specified cpu.
-; You shouldn't need to invoke this macro directly, it's a subroutine for INIT_MMX &co.
-%macro INIT_CPUFLAGS 0-2
- %if %0 >= 1
- %xdefine cpuname %1
- %assign cpuflags cpuflags_%1
- %if %0 >= 2
- %xdefine cpuname %1_%2
- %assign cpuflags cpuflags | cpuflags_%2
- %endif
- %xdefine SUFFIX _ %+ cpuname
- %if cpuflag(AVX)
- %assign AVX_enabled 1
- %endif
- %if mmsize == 16 && notcpuflag(SSE2)
- %define mova movaps
- %define movu movups
- %define movnta movntps
- %endif
- %if cpuflag(aligned)
- %define movu mova
- %elifidn %1, SSE3
- %define movu lddqu
- %endif
- %else
- %xdefine SUFFIX
- %undef cpuname
- %undef cpuflags
- %endif
-%endmacro
-
-; merge MMX and SSE*
-
-%macro CAT_XDEFINE 3
- %xdefine %1%2 %3
-%endmacro
-
-%macro CAT_UNDEF 2
- %undef %1%2
-%endmacro
-
-%macro INIT_MMX 0-1+
- %assign AVX_enabled 0
- %define RESET_MM_PERMUTATION INIT_MMX %1
- %define mmsize 8
- %define num_mmregs 8
- %define mova movq
- %define movu movq
- %define movh movd
- %define movnta movntq
- %assign %%i 0
- %rep 8
- CAT_XDEFINE m, %%i, mm %+ %%i
- CAT_XDEFINE nmm, %%i, %%i
- %assign %%i %%i+1
- %endrep
- %rep 8
- CAT_UNDEF m, %%i
- CAT_UNDEF nmm, %%i
- %assign %%i %%i+1
- %endrep
- INIT_CPUFLAGS %1
-%endmacro
-
-%macro INIT_XMM 0-1+
- %assign AVX_enabled 0
- %define RESET_MM_PERMUTATION INIT_XMM %1
- %define mmsize 16
- %define num_mmregs 8
- %if ARCH_X86_64
- %define num_mmregs 16
- %endif
- %define mova movdqa
- %define movu movdqu
- %define movh movq
- %define movnta movntdq
- %assign %%i 0
- %rep num_mmregs
- CAT_XDEFINE m, %%i, xmm %+ %%i
- CAT_XDEFINE nxmm, %%i, %%i
- %assign %%i %%i+1
- %endrep
- INIT_CPUFLAGS %1
-%endmacro
-
-%macro INIT_YMM 0-1+
- %assign AVX_enabled 1
- %define RESET_MM_PERMUTATION INIT_YMM %1
- %define mmsize 32
- %define num_mmregs 8
- %if ARCH_X86_64
- %define num_mmregs 16
- %endif
- %define mova vmovaps
- %define movu vmovups
- %undef movh
- %define movnta vmovntps
- %assign %%i 0
- %rep num_mmregs
- CAT_XDEFINE m, %%i, ymm %+ %%i
- CAT_XDEFINE nymm, %%i, %%i
- %assign %%i %%i+1
- %endrep
- INIT_CPUFLAGS %1
-%endmacro
-
-INIT_XMM
-
-; I often want to use macros that permute their arguments. e.g. there's no
-; efficient way to implement butterfly or transpose or dct without swapping some
-; arguments.
-;
-; I would like to not have to manually keep track of the permutations:
-; If I insert a permutation in the middle of a function, it should automatically
-; change everything that follows. For more complex macros I may also have multiple
-; implementations, e.g. the SSE2 and SSSE3 versions may have different permutations.
-;
-; Hence these macros. Insert a PERMUTE or some SWAPs at the end of a macro that
-; permutes its arguments. It's equivalent to exchanging the contents of the
-; registers, except that this way you exchange the register names instead, so it
-; doesn't cost any cycles.
-
-%macro PERMUTE 2-* ; takes a list of pairs to swap
-%rep %0/2
- %xdefine tmp%2 m%2
- %xdefine ntmp%2 nm%2
- %rotate 2
-%endrep
-%rep %0/2
- %xdefine m%1 tmp%2
- %xdefine nm%1 ntmp%2
- %undef tmp%2
- %undef ntmp%2
- %rotate 2
-%endrep
-%endmacro
-
-%macro SWAP 2-* ; swaps a single chain (sometimes more concise than pairs)
-%rep %0-1
-%ifdef m%1
- %xdefine tmp m%1
- %xdefine m%1 m%2
- %xdefine m%2 tmp
- CAT_XDEFINE n, m%1, %1
- CAT_XDEFINE n, m%2, %2
-%else
- ; If we were called as "SWAP m0,m1" rather than "SWAP 0,1" infer the original numbers here.
- ; Be careful using this mode in nested macros though, as in some cases there may be
- ; other copies of m# that have already been dereferenced and don't get updated correctly.
- %xdefine %%n1 n %+ %1
- %xdefine %%n2 n %+ %2
- %xdefine tmp m %+ %%n1
- CAT_XDEFINE m, %%n1, m %+ %%n2
- CAT_XDEFINE m, %%n2, tmp
- CAT_XDEFINE n, m %+ %%n1, %%n1
- CAT_XDEFINE n, m %+ %%n2, %%n2
-%endif
- %undef tmp
- %rotate 1
-%endrep
-%endmacro
-
-; If SAVE_MM_PERMUTATION is placed at the end of a function, then any later
-; calls to that function will automatically load the permutation, so values can
-; be returned in mmregs.
-%macro SAVE_MM_PERMUTATION 0-1
- %if %0
- %xdefine %%f %1_m
- %else
- %xdefine %%f current_function %+ _m
- %endif
- %assign %%i 0
- %rep num_mmregs
- CAT_XDEFINE %%f, %%i, m %+ %%i
- %assign %%i %%i+1
- %endrep
-%endmacro
-
-%macro LOAD_MM_PERMUTATION 1 ; name to load from
- %ifdef %1_m0
- %assign %%i 0
- %rep num_mmregs
- CAT_XDEFINE m, %%i, %1_m %+ %%i
- CAT_XDEFINE n, m %+ %%i, %%i
- %assign %%i %%i+1
- %endrep
- %endif
-%endmacro
-
-; Append cpuflags to the callee's name iff the appended name is known and the plain name isn't
-%macro call 1
- call_internal %1, %1 %+ SUFFIX
-%endmacro
-%macro call_internal 2
- %xdefine %%i %1
- %ifndef cglobaled_%1
- %ifdef cglobaled_%2
- %xdefine %%i %2
- %endif
- %endif
- call %%i
- LOAD_MM_PERMUTATION %%i
-%endmacro
-
-; Substitutions that reduce instruction size but are functionally equivalent
-%macro add 2
- %ifnum %2
- %if %2==128
- sub %1, -128
- %else
- add %1, %2
- %endif
- %else
- add %1, %2
- %endif
-%endmacro
-
-%macro sub 2
- %ifnum %2
- %if %2==128
- add %1, -128
- %else
- sub %1, %2
- %endif
- %else
- sub %1, %2
- %endif
-%endmacro
-
-;=============================================================================
-; AVX abstraction layer
-;=============================================================================
-
-%assign i 0
-%rep 16
- %if i < 8
- CAT_XDEFINE sizeofmm, i, 8
- %endif
- CAT_XDEFINE sizeofxmm, i, 16
- CAT_XDEFINE sizeofymm, i, 32
-%assign i i+1
-%endrep
-%undef i
-
-%macro CHECK_AVX_INSTR_EMU 3-*
- %xdefine %%opcode %1
- %xdefine %%dst %2
- %rep %0-2
- %ifidn %%dst, %3
- %error non-AVX emulation of ``%%opcode'' is not supported
- %endif
- %rotate 1
- %endrep
-%endmacro
-
-;%1 == instruction
-;%2 == 1 if float, 0 if int
-;%3 == 1 if 4-operand (xmm, xmm, xmm, imm), 0 if 2- or 3-operand (xmm, xmm, xmm)
-;%4 == number of operands given
-;%5+: operands
-%macro RUN_AVX_INSTR 6-7+
- %ifid %6
- %define %%sizeofreg sizeof%6
- %elifid %5
- %define %%sizeofreg sizeof%5
- %else
- %define %%sizeofreg mmsize
- %endif
- %if %%sizeofreg==32
- %if %4>=3
- v%1 %5, %6, %7
- %else
- v%1 %5, %6
- %endif
- %else
- %if %%sizeofreg==8
- %define %%regmov movq
- %elif %2
- %define %%regmov movaps
- %else
- %define %%regmov movdqa
- %endif
-
- %if %4>=3+%3
- %ifnidn %5, %6
- %if AVX_enabled && %%sizeofreg==16
- v%1 %5, %6, %7
- %else
- CHECK_AVX_INSTR_EMU {%1 %5, %6, %7}, %5, %7
- %%regmov %5, %6
- %1 %5, %7
- %endif
- %else
- %1 %5, %7
- %endif
- %elif %4>=3
- %1 %5, %6, %7
- %else
- %1 %5, %6
- %endif
- %endif
-%endmacro
-
-; 3arg AVX ops with a memory arg can only have it in src2,
-; whereas SSE emulation of 3arg prefers to have it in src1 (i.e. the mov).
-; So, if the op is symmetric and the wrong one is memory, swap them.
-%macro RUN_AVX_INSTR1 8
- %assign %%swap 0
- %if AVX_enabled
- %ifnid %6
- %assign %%swap 1
- %endif
- %elifnidn %5, %6
- %ifnid %7
- %assign %%swap 1
- %endif
- %endif
- %if %%swap && %3 == 0 && %8 == 1
- RUN_AVX_INSTR %1, %2, %3, %4, %5, %7, %6
- %else
- RUN_AVX_INSTR %1, %2, %3, %4, %5, %6, %7
- %endif
-%endmacro
-
-;%1 == instruction
-;%2 == 1 if float, 0 if int
-;%3 == 1 if 4-operand (xmm, xmm, xmm, imm), 0 if 2- or 3-operand (xmm, xmm, xmm)
-;%4 == 1 if symmetric (i.e. doesn't matter which src arg is which), 0 if not
-%macro AVX_INSTR 4
- %macro %1 2-9 fnord, fnord, fnord, %1, %2, %3, %4
- %ifidn %3, fnord
- RUN_AVX_INSTR %6, %7, %8, 2, %1, %2
- %elifidn %4, fnord
- RUN_AVX_INSTR1 %6, %7, %8, 3, %1, %2, %3, %9
- %elifidn %5, fnord
- RUN_AVX_INSTR %6, %7, %8, 4, %1, %2, %3, %4
- %else
- RUN_AVX_INSTR %6, %7, %8, 5, %1, %2, %3, %4, %5
- %endif
- %endmacro
-%endmacro
-
-AVX_INSTR addpd, 1, 0, 1
-AVX_INSTR addps, 1, 0, 1
-AVX_INSTR addsd, 1, 0, 1
-AVX_INSTR addss, 1, 0, 1
-AVX_INSTR addsubpd, 1, 0, 0
-AVX_INSTR addsubps, 1, 0, 0
-AVX_INSTR andpd, 1, 0, 1
-AVX_INSTR andps, 1, 0, 1
-AVX_INSTR andnpd, 1, 0, 0
-AVX_INSTR andnps, 1, 0, 0
-AVX_INSTR blendpd, 1, 0, 0
-AVX_INSTR blendps, 1, 0, 0
-AVX_INSTR blendvpd, 1, 0, 0
-AVX_INSTR blendvps, 1, 0, 0
-AVX_INSTR cmppd, 1, 0, 0
-AVX_INSTR cmpps, 1, 0, 0
-AVX_INSTR cmpsd, 1, 0, 0
-AVX_INSTR cmpss, 1, 0, 0
-AVX_INSTR cvtdq2ps, 1, 0, 0
-AVX_INSTR cvtps2dq, 1, 0, 0
-AVX_INSTR divpd, 1, 0, 0
-AVX_INSTR divps, 1, 0, 0
-AVX_INSTR divsd, 1, 0, 0
-AVX_INSTR divss, 1, 0, 0
-AVX_INSTR dppd, 1, 1, 0
-AVX_INSTR dpps, 1, 1, 0
-AVX_INSTR haddpd, 1, 0, 0
-AVX_INSTR haddps, 1, 0, 0
-AVX_INSTR hsubpd, 1, 0, 0
-AVX_INSTR hsubps, 1, 0, 0
-AVX_INSTR maxpd, 1, 0, 1
-AVX_INSTR maxps, 1, 0, 1
-AVX_INSTR maxsd, 1, 0, 1
-AVX_INSTR maxss, 1, 0, 1
-AVX_INSTR minpd, 1, 0, 1
-AVX_INSTR minps, 1, 0, 1
-AVX_INSTR minsd, 1, 0, 1
-AVX_INSTR minss, 1, 0, 1
-AVX_INSTR movhlps, 1, 0, 0
-AVX_INSTR movlhps, 1, 0, 0
-AVX_INSTR movsd, 1, 0, 0
-AVX_INSTR movss, 1, 0, 0
-AVX_INSTR mpsadbw, 0, 1, 0
-AVX_INSTR mulpd, 1, 0, 1
-AVX_INSTR mulps, 1, 0, 1
-AVX_INSTR mulsd, 1, 0, 1
-AVX_INSTR mulss, 1, 0, 1
-AVX_INSTR orpd, 1, 0, 1
-AVX_INSTR orps, 1, 0, 1
-AVX_INSTR pabsb, 0, 0, 0
-AVX_INSTR pabsw, 0, 0, 0
-AVX_INSTR pabsd, 0, 0, 0
-AVX_INSTR packsswb, 0, 0, 0
-AVX_INSTR packssdw, 0, 0, 0
-AVX_INSTR packuswb, 0, 0, 0
-AVX_INSTR packusdw, 0, 0, 0
-AVX_INSTR paddb, 0, 0, 1
-AVX_INSTR paddw, 0, 0, 1
-AVX_INSTR paddd, 0, 0, 1
-AVX_INSTR paddq, 0, 0, 1
-AVX_INSTR paddsb, 0, 0, 1
-AVX_INSTR paddsw, 0, 0, 1
-AVX_INSTR paddusb, 0, 0, 1
-AVX_INSTR paddusw, 0, 0, 1
-AVX_INSTR palignr, 0, 1, 0
-AVX_INSTR pand, 0, 0, 1
-AVX_INSTR pandn, 0, 0, 0
-AVX_INSTR pavgb, 0, 0, 1
-AVX_INSTR pavgw, 0, 0, 1
-AVX_INSTR pblendvb, 0, 0, 0
-AVX_INSTR pblendw, 0, 1, 0
-AVX_INSTR pcmpestri, 0, 0, 0
-AVX_INSTR pcmpestrm, 0, 0, 0
-AVX_INSTR pcmpistri, 0, 0, 0
-AVX_INSTR pcmpistrm, 0, 0, 0
-AVX_INSTR pcmpeqb, 0, 0, 1
-AVX_INSTR pcmpeqw, 0, 0, 1
-AVX_INSTR pcmpeqd, 0, 0, 1
-AVX_INSTR pcmpeqq, 0, 0, 1
-AVX_INSTR pcmpgtb, 0, 0, 0
-AVX_INSTR pcmpgtw, 0, 0, 0
-AVX_INSTR pcmpgtd, 0, 0, 0
-AVX_INSTR pcmpgtq, 0, 0, 0
-AVX_INSTR phaddw, 0, 0, 0
-AVX_INSTR phaddd, 0, 0, 0
-AVX_INSTR phaddsw, 0, 0, 0
-AVX_INSTR phsubw, 0, 0, 0
-AVX_INSTR phsubd, 0, 0, 0
-AVX_INSTR phsubsw, 0, 0, 0
-AVX_INSTR pmaddwd, 0, 0, 1
-AVX_INSTR pmaddubsw, 0, 0, 0
-AVX_INSTR pmaxsb, 0, 0, 1
-AVX_INSTR pmaxsw, 0, 0, 1
-AVX_INSTR pmaxsd, 0, 0, 1
-AVX_INSTR pmaxub, 0, 0, 1
-AVX_INSTR pmaxuw, 0, 0, 1
-AVX_INSTR pmaxud, 0, 0, 1
-AVX_INSTR pminsb, 0, 0, 1
-AVX_INSTR pminsw, 0, 0, 1
-AVX_INSTR pminsd, 0, 0, 1
-AVX_INSTR pminub, 0, 0, 1
-AVX_INSTR pminuw, 0, 0, 1
-AVX_INSTR pminud, 0, 0, 1
-AVX_INSTR pmovmskb, 0, 0, 0
-AVX_INSTR pmulhuw, 0, 0, 1
-AVX_INSTR pmulhrsw, 0, 0, 1
-AVX_INSTR pmulhw, 0, 0, 1
-AVX_INSTR pmullw, 0, 0, 1
-AVX_INSTR pmulld, 0, 0, 1
-AVX_INSTR pmuludq, 0, 0, 1
-AVX_INSTR pmuldq, 0, 0, 1
-AVX_INSTR por, 0, 0, 1
-AVX_INSTR psadbw, 0, 0, 1
-AVX_INSTR pshufb, 0, 0, 0
-AVX_INSTR pshufd, 0, 1, 0
-AVX_INSTR pshufhw, 0, 1, 0
-AVX_INSTR pshuflw, 0, 1, 0
-AVX_INSTR psignb, 0, 0, 0
-AVX_INSTR psignw, 0, 0, 0
-AVX_INSTR psignd, 0, 0, 0
-AVX_INSTR psllw, 0, 0, 0
-AVX_INSTR pslld, 0, 0, 0
-AVX_INSTR psllq, 0, 0, 0
-AVX_INSTR pslldq, 0, 0, 0
-AVX_INSTR psraw, 0, 0, 0
-AVX_INSTR psrad, 0, 0, 0
-AVX_INSTR psrlw, 0, 0, 0
-AVX_INSTR psrld, 0, 0, 0
-AVX_INSTR psrlq, 0, 0, 0
-AVX_INSTR psrldq, 0, 0, 0
-AVX_INSTR psubb, 0, 0, 0
-AVX_INSTR psubw, 0, 0, 0
-AVX_INSTR psubd, 0, 0, 0
-AVX_INSTR psubq, 0, 0, 0
-AVX_INSTR psubsb, 0, 0, 0
-AVX_INSTR psubsw, 0, 0, 0
-AVX_INSTR psubusb, 0, 0, 0
-AVX_INSTR psubusw, 0, 0, 0
-AVX_INSTR ptest, 0, 0, 0
-AVX_INSTR punpckhbw, 0, 0, 0
-AVX_INSTR punpckhwd, 0, 0, 0
-AVX_INSTR punpckhdq, 0, 0, 0
-AVX_INSTR punpckhqdq, 0, 0, 0
-AVX_INSTR punpcklbw, 0, 0, 0
-AVX_INSTR punpcklwd, 0, 0, 0
-AVX_INSTR punpckldq, 0, 0, 0
-AVX_INSTR punpcklqdq, 0, 0, 0
-AVX_INSTR pxor, 0, 0, 1
-AVX_INSTR shufps, 1, 1, 0
-AVX_INSTR subpd, 1, 0, 0
-AVX_INSTR subps, 1, 0, 0
-AVX_INSTR subsd, 1, 0, 0
-AVX_INSTR subss, 1, 0, 0
-AVX_INSTR unpckhpd, 1, 0, 0
-AVX_INSTR unpckhps, 1, 0, 0
-AVX_INSTR unpcklpd, 1, 0, 0
-AVX_INSTR unpcklps, 1, 0, 0
-AVX_INSTR xorpd, 1, 0, 1
-AVX_INSTR xorps, 1, 0, 1
-
-; 3DNow instructions, for sharing code between AVX, SSE and 3DN
-AVX_INSTR pfadd, 1, 0, 1
-AVX_INSTR pfsub, 1, 0, 0
-AVX_INSTR pfmul, 1, 0, 1
-
-; base-4 constants for shuffles
-%assign i 0
-%rep 256
- %assign j ((i>>6)&3)*1000 + ((i>>4)&3)*100 + ((i>>2)&3)*10 + (i&3)
- %if j < 10
- CAT_XDEFINE q000, j, i
- %elif j < 100
- CAT_XDEFINE q00, j, i
- %elif j < 1000
- CAT_XDEFINE q0, j, i
- %else
- CAT_XDEFINE q, j, i
- %endif
-%assign i i+1
-%endrep
-%undef i
-%undef j
-
-%macro FMA_INSTR 3
- %macro %1 4-7 %1, %2, %3
- %if cpuflag(xop)
- v%5 %1, %2, %3, %4
- %else
- %6 %1, %2, %3
- %7 %1, %4
- %endif
- %endmacro
-%endmacro
-
-FMA_INSTR pmacsdd, pmulld, paddd
-FMA_INSTR pmacsww, pmullw, paddw
-FMA_INSTR pmadcswd, pmaddwd, paddd
-
-; tzcnt is equivalent to "rep bsf" and is backwards-compatible with bsf.
-; This lets us use tzcnt without bumping the yasm version requirement yet.
-%define tzcnt rep bsf
diff --git a/third_party/x86inc/README.libaom b/third_party/x86inc/README.libaom
index 2f3e5c2..6b92358 100644
--- a/third_party/x86inc/README.libaom
+++ b/third_party/x86inc/README.libaom
@@ -16,3 +16,4 @@
Use .text instead of .rodata on macho to avoid broken tables in PIC mode.
Use .text with no alignment for aout.
Only use 'hidden' visibility with Chromium.
+Prefix ARCH_* with AOM_.
diff --git a/third_party/x86inc/x86inc.asm b/third_party/x86inc/x86inc.asm
index e48d644..b0421f5 100644
--- a/third_party/x86inc/x86inc.asm
+++ b/third_party/x86inc/x86inc.asm
@@ -45,7 +45,7 @@
%endif
%ifndef STACK_ALIGNMENT
- %if ARCH_X86_64
+ %if AOM_ARCH_X86_64
%define STACK_ALIGNMENT 16
%else
%define STACK_ALIGNMENT 4
@@ -54,7 +54,7 @@
%define WIN64 0
%define UNIX64 0
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%ifidn __OUTPUT_FORMAT__,win32
%define WIN64 1
%elifidn __OUTPUT_FORMAT__,win64
@@ -168,7 +168,7 @@
%endif
%endif
- %if ARCH_X86_64 == 0
+ %if AOM_ARCH_X86_64 == 0
%undef PIC
%endif
@@ -277,7 +277,7 @@
%if %0 == 2
%define r%1m %2d
%define r%1mp %2
- %elif ARCH_X86_64 ; memory
+ %elif AOM_ARCH_X86_64 ; memory
%define r%1m [rstk + stack_offset + %3]
%define r%1mp qword r %+ %1 %+ m
%else
@@ -298,7 +298,7 @@
%define e%1h %3
%define r%1b %2
%define e%1b %2
- %if ARCH_X86_64 == 0
+ %if AOM_ARCH_X86_64 == 0
%define r%1 e%1
%endif
%endmacro
@@ -335,14 +335,14 @@
DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
%define gprsize 8
%else
%define gprsize 4
%endif
%macro LEA 2
-%if ARCH_X86_64
+%if AOM_ARCH_X86_64
lea %1, [%2]
%elif PIC
call $+5 ; special-cased to not affect the RSB on most CPU:s
@@ -414,7 +414,7 @@
%endif
%endmacro
-%if ARCH_X86_64 == 0
+%if AOM_ARCH_X86_64 == 0
%define movsxd movifnidn
%endif
@@ -466,7 +466,7 @@
%endmacro
%define required_stack_alignment ((mmsize + 15) & ~15)
-%define vzeroupper_required (mmsize > 16 && (ARCH_X86_64 == 0 || xmm_regs_used > 16 || notcpuflag(avx512)))
+%define vzeroupper_required (mmsize > 16 && (AOM_ARCH_X86_64 == 0 || xmm_regs_used > 16 || notcpuflag(avx512)))
%define high_mm_regs (16*cpuflag(avx512))
%macro ALLOC_STACK 1-2 0 ; stack_size, n_xmm_regs (for win64 only)
@@ -521,13 +521,13 @@
; Reserve an additional register for storing the original stack pointer, but avoid using
; eax/rax for this purpose since it can potentially get overwritten as a return value.
%assign regs_used (regs_used + 1)
- %if ARCH_X86_64 && regs_used == 7
+ %if AOM_ARCH_X86_64 && regs_used == 7
%assign regs_used 8
- %elif ARCH_X86_64 == 0 && regs_used == 1
+ %elif AOM_ARCH_X86_64 == 0 && regs_used == 1
%assign regs_used 2
%endif
%endif
- %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
+ %if AOM_ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
%assign regs_used 5 + UNIX64 * 3
@@ -654,7 +654,7 @@
AUTO_REP_RET
%endmacro
-%elif ARCH_X86_64 ; *nix x64 ;=============================================
+%elif AOM_ARCH_X86_64 ; *nix x64 ;=============================================
DECLARE_REG 0, rdi
DECLARE_REG 1, rsi
@@ -1002,7 +1002,7 @@
%endif
%endif
- %if ARCH_X86_64 || cpuflag(sse2)
+ %if AOM_ARCH_X86_64 || cpuflag(sse2)
%ifdef __NASM_VER__
ALIGNMODE p6
%else
@@ -1039,7 +1039,7 @@
%endif
%assign num_mmregs 8
- %if ARCH_X86_64 && mmsize >= 16
+ %if AOM_ARCH_X86_64 && mmsize >= 16
%assign num_mmregs 16
%if cpuflag(avx512) || mmsize == 64
%assign num_mmregs 32
@@ -1064,7 +1064,7 @@
; Prefer registers 16-31 over 0-15 to avoid having to use vzeroupper
%macro AVX512_MM_PERMUTATION 0-1 0 ; start_reg
- %if ARCH_X86_64 && cpuflag(avx512)
+ %if AOM_ARCH_X86_64 && cpuflag(avx512)
%assign %%i %1
%rep 16-%1
%assign %%i_high %%i+16
diff --git a/tools/frame_size_variation_analyzer.py b/tools/frame_size_variation_analyzer.py
new file mode 100644
index 0000000..5c02319
--- /dev/null
+++ b/tools/frame_size_variation_analyzer.py
@@ -0,0 +1,74 @@
+# RTC frame size variation analyzer
+# Usage:
+# 1. Config with "-DCONFIG_OUTPUT_FRAME_SIZE=1".
+# 2. Build aomenc. Encode a file, and generate output file: frame_sizes.csv
+# 3. Run: python ./frame_size_variation_analyzer.py frame_sizes.csv target-bitrate fps
+#    Where target-bitrate is the bitrate in kbps and fps is frames per second.
+# Example: python ../aom/tools/frame_size_variation_analyzer.py frame_sizes.csv
+# 1000 30
+
+import numpy as np
+import csv
+import sys
+import matplotlib.pyplot as plt
+
+# Return the moving average of x over a sliding window of w samples.
+def moving_average(x, w):
+ return np.convolve(x, np.ones(w), 'valid') / w
+
+def frame_size_analysis(filename, target_br, fps):
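+  # Per-frame bit budget: target_br (kbps) * 1000 gives bits/sec; divide by fps.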
+ tbr = target_br * 1000 / fps
+
+ with open(filename, 'r') as infile:
+ raw_data = list(csv.reader(infile, delimiter=','))
+
+ data = np.array(raw_data).astype(float)
+ fsize = data[:, 0].astype(float) # frame size
+ qindex = data[:, 1].astype(float) # qindex
+
+ # Frame bit rate mismatch
+ mismatch = np.absolute(fsize - np.full(fsize.size, tbr))
+
+ # Count how many frames are more than 2.5x of frame target bit rate.
+ tbr_thr = tbr * 2.5
+ cnt = 0
+ idx = np.arange(fsize.size)
+ for i in idx:
+ if fsize[i] > tbr_thr:
+ cnt = cnt + 1
+
+ # Use the 15-frame moving window
+ win = 15
+ avg_fsize = moving_average(fsize, win)
+ win_mismatch = np.absolute(avg_fsize - np.full(avg_fsize.size, tbr))
+
+ print('[Target frame rate (bit)]:', "%.2f"%tbr)
+ print('[Average frame rate (bit)]:', "%.2f"%np.average(fsize))
+ print('[Frame rate standard deviation]:', "%.2f"%np.std(fsize))
+ print('[Max/min frame rate (bit)]:', "%.2f"%np.max(fsize), '/', "%.2f"%np.min(fsize))
+ print('[Average frame rate mismatch (bit)]:', "%.2f"%np.average(mismatch))
+ print('[Number of frames (frame rate > 2.5x of target frame rate)]:', cnt)
+ print(' Moving window size:', win)
+ print('[Moving average frame rate mismatch (bit)]:', "%.2f"%np.average(win_mismatch))
+ print('------------------------------')
+
+ figure, axis = plt.subplots(2)
+ x = np.arange(fsize.size)
+ axis[0].plot(x, fsize, color='blue')
+ axis[0].set_title("frame sizes")
+ axis[1].plot(x, qindex, color='blue')
+ axis[1].set_title("frame qindex")
+ plt.tight_layout()
+
+ # Save the plot
+ plotname = filename + '.png'
+ plt.savefig(plotname)
+ plt.show()
+
+if __name__ == '__main__':
+ if (len(sys.argv) < 4):
+ print(sys.argv[0], 'input_file, target_bitrate, fps')
+ sys.exit()
+ target_br = int(sys.argv[2])
+ fps = int(sys.argv[3])
+ frame_size_analysis(sys.argv[1], target_br, fps)