author | Rémi Denis-Courmont <remi@remlab.net> | 2024-07-11 22:01:25 +0300 |
---|---|---|
committer | Rémi Denis-Courmont <remi@remlab.net> | 2024-07-14 21:06:50 +0300 |
commit | c654e37254efb6de5d3f1355dd7936979dd0dca3 (patch) | |
tree | 7db84abc0077bcc19937c12e6fcc3b369026fd8f /libavcodec/arm/vp9lpf_16bpp_neon.S | |
parent | 80ddc727178837a5ea5e6e463f28fedf6c1051dd (diff) | |
lavc/h264dsp: R-V V high-depth h264_idct8_add
Unlike the 8-bit version, we need two iterations to process this within
128-bit vectors. This adds some extra complexity for pointer arithmetic
and counting down, which is unnecessary in the 8-bit variant.
Accordingly, the gains relative to C are just slightly better than half as
good with 128-bit vectors as with 256-bit ones.
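
To make the iteration count concrete, here is a rough back-of-the-envelope model, not the actual assembly logic. It assumes that one 8-coefficient row has to fit in a single vector register, that high-depth coefficients are 32 bits wide (as the `_i32` suffix in the benchmark names suggests), and that the 8-bit variant uses 16-bit coefficients. The helper name `passes_needed` is hypothetical, and register grouping (LMUL) is ignored.

```python
import math

def passes_needed(vlen_bits, elem_bits, row_len=8):
    """Simplified model: number of passes needed when each pass can hold
    only as many row elements as fit in one vector register."""
    elems_per_vector = vlen_bits // elem_bits
    return math.ceil(row_len / elems_per_vector)

# Assumed coefficient widths: 32 bits for high depth, 16 bits for 8-bit.
for vlen in (128, 256):
    for elem_bits in (16, 32):
        print(f"VLEN={vlen}, {elem_bits}-bit elements: "
              f"{passes_needed(vlen, elem_bits)} pass(es)")
```

Under these assumptions, only the combination of 128-bit vectors with 32-bit elements needs two passes, which matches the "2 iterations" and "single iteration" labels on the benchmark results.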
T-Head C908 (2 iterations):
h264_idct8_add_9bpp_c: 17.5
h264_idct8_add_9bpp_rvv_i32: 10.0
h264_idct8_add_10bpp_c: 17.5
h264_idct8_add_10bpp_rvv_i32: 9.7
h264_idct8_add_12bpp_c: 17.7
h264_idct8_add_12bpp_rvv_i32: 9.7
h264_idct8_add_14bpp_c: 17.7
h264_idct8_add_14bpp_rvv_i32: 9.7
SpacemiT X60 (single iteration):
h264_idct8_add_9bpp_c: 15.2
h264_idct8_add_9bpp_rvv_i32: 5.0
h264_idct8_add_10bpp_c: 15.2
h264_idct8_add_10bpp_rvv_i32: 5.0
h264_idct8_add_12bpp_c: 14.7
h264_idct8_add_12bpp_rvv_i32: 5.0
h264_idct8_add_14bpp_c: 14.7
h264_idct8_add_14bpp_rvv_i32: 4.7
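
The "slightly better than half as good" claim can be checked against the figures above with a short sketch. The timings are copied from the benchmark listing (units are not stated there; lower is assumed to be better), and the names `C908`, `X60` and `speedup` are only illustrative.

```python
# (C time, RVV time) per bit depth, copied from the benchmark listing.
C908 = {9: (17.5, 10.0), 10: (17.5, 9.7), 12: (17.7, 9.7), 14: (17.7, 9.7)}
X60 = {9: (15.2, 5.0), 10: (15.2, 5.0), 12: (14.7, 5.0), 14: (14.7, 4.7)}

def speedup(c_time, rvv_time):
    """Gain of the RVV version relative to the C reference."""
    return c_time / rvv_time

# Ratio of the two-iteration gain (C908) to the single-iteration gain (X60).
ratios = {}
for bpp in C908:
    gain_c908 = speedup(*C908[bpp])
    gain_x60 = speedup(*X60[bpp])
    ratios[bpp] = gain_c908 / gain_x60
    print(f"{bpp}bpp: C908 {gain_c908:.2f}x, X60 {gain_x60:.2f}x, "
          f"ratio {ratios[bpp]:.2f}")
```

For every bit depth the ratio falls between 0.5 and 0.65, so the gain measured on the C908 is a little more than half of the gain measured on the X60.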