author: Rémi Denis-Courmont <remi@remlab.net> 2024-06-01 21:32:56 +0300
committer: Rémi Denis-Courmont <remi@remlab.net> 2024-06-04 17:40:41 +0300
commit: 4e120fbbbd087c3acbad6ce2e8c7b1262a5c8632
tree: b04252a83e826cf23cc0509fbe7118ccefd3f2c1 /libavfilter/textutils.c
parent: 30797e4ff6c8c537471c386cd019a6a48a721f01
lavc/vp8dsp: add R-V V vp7_idct_dc_add4y

As with idct_dc_add, most of the code is shared with, and replaces, the
previous VP8 function. To improve performance, we break down the 16x4
matrix into 4 rows, rather than 4 squares. Thus strided loads and
stores are avoided, and the 4 DC calculations are vectored.
Unfortunately this requires a vector gather to splat the DC values, but
overall this is still a win for performance:

T-Head C908:
vp7_idct_dc_add4y_c:       7.2
vp7_idct_dc_add4y_rvv_i32: 2.2
vp8_idct_dc_add4y_c:       6.2
vp8_idct_dc_add4y_rvv_i32: 2.2 (before)
vp8_idct_dc_add4y_rvv_i32: 1.7

SpacemiT X60:
vp7_idct_dc_add4y_c:       6.2
vp7_idct_dc_add4y_rvv_i32: 2.0
vp8_idct_dc_add4y_c:       5.5
vp8_idct_dc_add4y_rvv_i32: 2.5 (before)
vp8_idct_dc_add4y_rvv_i32: 1.7

I also tried to provision the DC values using indexed loads. It ends up
slower overall, especially for VP7, as we then have to compute 16 DC's
instead of just 4.
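
For readers unfamiliar with the routine, the row-wise decomposition can be modelled in scalar C roughly as follows. This is only an illustrative sketch, not the RVV assembly from this commit: the names `clip_u8` and `idct_dc_add4y_rows` are hypothetical, and the VP8-style rounding `(dc + 4) >> 3` is assumed (VP7 scales the DC differently). The four DC values are computed once, then each 16-pixel row is processed contiguously, with `dc[x / 4]` standing in for the gather that splats each DC across its 4 columns.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model: add the DC of four adjacent 4x4 blocks to a
 * 16x4 pixel area, walking it as 4 contiguous rows instead of 4 squares.
 */
static void idct_dc_add4y_rows(uint8_t *dst, int16_t block[4][16],
                               ptrdiff_t stride)
{
    int dc[4];

    /* Compute the 4 DC values up front (assumed VP8-style rounding). */
    for (int i = 0; i < 4; i++) {
        dc[i] = (block[i][0] + 4) >> 3;
        block[i][0] = 0; /* the coefficient is consumed */
    }

    /* Each row is one contiguous 16-byte load/add/store. */
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 16; x++)
            dst[x] = clip_u8(dst[x] + dc[x / 4]); /* DC splat per block */
        dst += stride;
    }
}
```

In a vector implementation, the inner loop would map to a single unit-stride load and store per row, which is where the avoided strided accesses come from.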
Diffstat (limited to 'libavfilter/textutils.c')
0 files changed, 0 insertions, 0 deletions