ffmpeg - Mirror of FFmpeg git repo

author	Rémi Denis-Courmont <remi@remlab.net>	2024-06-01 21:32:56 +0300
committer	Rémi Denis-Courmont <remi@remlab.net>	2024-06-04 17:40:41 +0300
commit	4e120fbbbd087c3acbad6ce2e8c7b1262a5c8632 (patch)
tree	b04252a83e826cf23cc0509fbe7118ccefd3f2c1 /libavfilter/textutils.c
parent	30797e4ff6c8c537471c386cd019a6a48a721f01 (diff)
download	ffmpeg-4e120fbbbd087c3acbad6ce2e8c7b1262a5c8632.tar.gz

lavc/vp8dsp: add R-V V vp7_idct_dc_add4y

As with idct_dc_add, most of the code is shared with, and replaces, the
previous VP8 function. To improve performance, we break down the 16x4
matrix into 4 rows, rather than 4 squares. Thus strided loads and
stores are avoided, and the 4 DC calculations are vectored.
Unfortunately this requires a vector gather to splat the DC values, but
overall this is still a win for performance:

T-Head C908:
vp7_idct_dc_add4y_c:       7.2
vp7_idct_dc_add4y_rvv_i32: 2.2
vp8_idct_dc_add4y_c:       6.2
vp8_idct_dc_add4y_rvv_i32: 2.2 (before)
vp8_idct_dc_add4y_rvv_i32: 1.7

SpacemiT X60:
vp7_idct_dc_add4y_c:       6.2
vp7_idct_dc_add4y_rvv_i32: 2.0
vp8_idct_dc_add4y_c:       5.5
vp8_idct_dc_add4y_rvv_i32: 2.5 (before)
vp8_idct_dc_add4y_rvv_i32: 1.7

I also tried to provision the DC values using indexed loads. It ends up
slower overall, especially for VP7, as we then have to compute 16 DC's
instead of just 4.
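
For illustration only, the difference between the two traversal orders can be sketched in scalar C. This is not the RVV assembly from the patch: the function names dc_add4y_squares and dc_add4y_rows and the helper clip_u8 are hypothetical, and the DC rounding (block[0] + 4) >> 3 is assumed to match the VP8 C reference (VP7 scales the DC differently). The splat array stands in for the vector gather that replicates each of the 4 DC values across 4 lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* "4 squares": one 4x4 block at a time. Each block touches 4 bytes on
 * each of 4 rows, which is a strided access pattern in vector code. */
static void dc_add4y_squares(uint8_t *dst, int16_t block[4][16],
                             ptrdiff_t stride)
{
    for (int b = 0; b < 4; b++) {
        /* Assumed VP8-style rounding; relies on arithmetic right shift. */
        int dc = (block[b][0] + 4) >> 3;

        block[b][0] = 0;
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++) {
                uint8_t *p = dst + y * stride + 4 * b + x;
                *p = clip_u8(*p + dc);
            }
    }
}

/* "4 rows": compute the 4 DC values together, replicate them over 16
 * lanes, then add them to each contiguous 16-pixel row. */
static void dc_add4y_rows(uint8_t *dst, int16_t block[4][16],
                          ptrdiff_t stride)
{
    int dc[4], splat[16];

    for (int b = 0; b < 4; b++) {      /* the 4 DC calculations */
        dc[b] = (block[b][0] + 4) >> 3;
        block[b][0] = 0;
    }
    for (int x = 0; x < 16; x++)       /* stands in for the vector gather */
        splat[x] = dc[x >> 2];

    for (int y = 0; y < 4; y++)        /* unit-stride row accesses */
        for (int x = 0; x < 16; x++) {
            uint8_t *p = dst + y * stride + x;
            *p = clip_u8(*p + splat[x]);
        }
}
```

Both functions write the same pixels and clear the same DC coefficients; only the order of memory accesses differs, which is what lets the row form use unit-stride loads and stores.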
