author: Zhao Zhili <zhilizhao@tencent.com> 2025-08-14 12:42:38 +0800
committer: Zhao Zhili <quink@noreply.code.ffmpeg.org> 2025-09-03 06:55:37 +0000
commit: 6ce02bcc3acbfcd48052fd21456722f4d35cda8e
tree: 9ad96ebb827ac56d7a7f13d88beac97f5655d215 /tests/ref/vsynth/vsynth2-mov-bpp16
parent: 2e924176036b415d5c9534a605f7f6a87cae423d
avcodec/aarch64/vvc: Optimize apply_bdof
Before this patch, prof_grad_filter calculates gh[0], gh[1], gv[0],
gv[1] and saves them to the stack. derive_bdof_vx_vy loads them from
the stack and calculates gh[0] + gh[1], gv[0] + gv[1].
apply_bdof_min_block loads them from the stack and calculates
gh[0] - gh[1], gv[0] - gv[1].

This patch adds bdof_grad_filter, which calculates gh[0] + gh[1],
gh[0] - gh[1], gv[0] + gv[1], gv[0] - gv[1], and saves them to the
stack, so derive_bdof_vx_vy and apply_bdof_min_block can use the
results directly. prof_grad_filter is kept for reuse by other
functions in the future.

Benchmark on rpi5 with gcc 12:

                            |      Before        |      After
--------------------------------------------------------------------
apply_bdof_8_8x16_c:        |   7431.4 ( 1.00x)  |   7371.7 ( 1.00x)
apply_bdof_8_8x16_neon:     |   1175.4 ( 6.32x)  |   1036.3 ( 7.11x)
apply_bdof_8_16x8_c:        |   7182.2 ( 1.00x)  |   7201.1 ( 1.00x)
apply_bdof_8_16x8_neon:     |   1021.7 ( 7.03x)  |    879.9 ( 8.18x)
apply_bdof_8_16x16_c:       |  14577.1 ( 1.00x)  |  14589.3 ( 1.00x)
apply_bdof_8_16x16_neon:    |   2012.8 ( 7.24x)  |   1743.3 ( 8.37x)
apply_bdof_10_8x16_c:       |   7292.4 ( 1.00x)  |   7308.5 ( 1.00x)
apply_bdof_10_8x16_neon:    |   1156.3 ( 6.31x)  |   1045.3 ( 6.99x)
apply_bdof_10_16x8_c:       |   7112.4 ( 1.00x)  |   7214.4 ( 1.00x)
apply_bdof_10_16x8_neon:    |   1007.6 ( 7.06x)  |    904.8 ( 7.97x)
apply_bdof_10_16x16_c:      |  14363.3 ( 1.00x)  |  14476.4 ( 1.00x)
apply_bdof_10_16x16_neon:   |   1986.9 ( 7.23x)  |   1783.1 ( 8.12x)
apply_bdof_12_8x16_c:       |   7433.3 ( 1.00x)  |   7374.7 ( 1.00x)
apply_bdof_12_8x16_neon:    |   1155.9 ( 6.43x)  |   1040.8 ( 7.09x)
apply_bdof_12_16x8_c:       |   7171.1 ( 1.00x)  |   7376.3 ( 1.00x)
apply_bdof_12_16x8_neon:    |   1010.8 ( 7.09x)  |    899.4 ( 8.20x)
apply_bdof_12_16x16_c:      |  14515.5 ( 1.00x)  |  14731.5 ( 1.00x)
apply_bdof_12_16x16_neon:   |   1988.4 ( 7.30x)  |   1785.2 ( 8.25x)
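
The idea of combining the gradients once, rather than in each consumer, can be sketched in scalar C. This is an illustration only, not the NEON code from the patch: the helper name grad_sum_diff, the flat array layout, and the element types (16-bit inputs, 32-bit outputs) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an FFmpeg function): combine two gradient
 * planes once, storing both the sum and the difference, so that later
 * stages can load the combined values instead of recomputing them.
 * The int16_t/int32_t types are assumptions chosen so the scalar
 * arithmetic cannot overflow in this sketch.
 */
static void grad_sum_diff(const int16_t *g0, const int16_t *g1,
                          int32_t *sum, int32_t *diff, size_t n)
{
    assert(g0 && g1 && sum && diff);

    for (size_t i = 0; i < n; i++) {
        sum[i]  = (int32_t)g0[i] + g1[i];  /* e.g. gh[0] + gh[1] */
        diff[i] = (int32_t)g0[i] - g1[i];  /* e.g. gh[0] - gh[1] */
    }
}
```

In this sketch the helper would be called once for the horizontal pair (gh[0], gh[1]) and once for the vertical pair (gv[0], gv[1]). A stage that needs the sums then reads sum[] and a stage that needs the differences reads diff[], with no further arithmetic on the raw gradients.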
Diffstat (limited to 'tests/ref/vsynth/vsynth2-mov-bpp16')
0 files changed, 0 insertions, 0 deletions