author | Rémi Denis-Courmont <remi@remlab.net> | 2023-11-02 21:08:56 +0200 |
---|---|---|
committer | Rémi Denis-Courmont <remi@remlab.net> | 2023-11-06 22:09:30 +0200 |
commit | adc87a5f7c99c89364b89e51d7c42fcb36cf15b0 (patch) | |
tree | 3e2437f743423144c756461a7e66f0abbe0af9ed /tests/checkasm/float_dsp.c | |
parent | 02594c8c011ca2cdc20334e9bb812ec8f6f37cf3 (diff) | |
download | ffmpeg-adc87a5f7c99c89364b89e51d7c42fcb36cf15b0.tar.gz |
lavc/opusdsp: rewrite R-V V postfilter
This uses a more traditional approach, allowing processing of up to
period minus two elements per iteration. This also allows the algorithm
to work for any vector length.
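To illustrate where the "period minus two" bound comes from, here is a minimal scalar sketch in C. It is not the RISC-V assembly from this commit. It assumes the usual 5-tap form of the Opus comb postfilter, in which output sample i reads samples i-period-2 through i-period+2. The names tap, postfilter_scalar, postfilter_blocked and MAX_VL are hypothetical, and the caller is assumed to provide at least period+2 samples of history before y[0]. Because the newest sample read sits period-2 positions behind the one being written, a block of at most period-2 consecutive outputs never reads a value written within the same block.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_VL = 64 }; /* hypothetical upper bound on elements per vector */

/* One filter contribution for sample i: 5 taps centred at lag `period`. */
static float tap(const float *y, ptrdiff_t i, int period, const float g[3])
{
    return g[0] * y[i - period]
         + g[1] * (y[i - period - 1] + y[i - period + 1])
         + g[2] * (y[i - period - 2] + y[i - period + 2]);
}

/* Reference: strictly one sample per iteration. */
static void postfilter_scalar(float *y, int period, const float g[3], int len)
{
    for (int i = 0; i < len; i++)
        y[i] += tap(y, i, period, g);
}

/* Blocked variant: each block is computed entirely from samples that
 * precede it, then written back, as a vector unit would do with one
 * group of loads followed by one store. `vl` stands in for the number
 * of elements the hardware can handle at once. */
static void postfilter_blocked(float *y, int period, const float g[3],
                               int len, int vl)
{
    float tmp[MAX_VL];
    int max_chunk = period - 2; /* dependency distance of the filter */

    assert(period >= 3 && vl >= 1);
    if (max_chunk > vl)
        max_chunk = vl;
    if (max_chunk > MAX_VL)
        max_chunk = MAX_VL;

    for (int i = 0; i < len; i += max_chunk) {
        int n = len - i < max_chunk ? len - i : max_chunk;

        /* Highest index read is (i + n - 1) - period + 2 <= i - 1. */
        for (int j = 0; j < n; j++)
            tmp[j] = tap(y, i + j, period, g);
        for (int j = 0; j < n; j++)
            y[i + j] += tmp[j];
    }
}
```

With period 15 and vl 16, the blocked loop advances 13 samples at a time. With period 512 or 1022 it advances the full 16.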
As the T-Head C908 device under test can load 16 elements per loop
iteration, there is unsurprisingly a small performance drop when the
period is minimal and the parallelism is capped at 13 elements:
Before:
postfilter_15_c: 21222.2
postfilter_15_rvv_f32: 22007.7
postfilter_512_c: 20189.7
postfilter_512_rvv_f32: 22004.2
postfilter_1022_c: 20189.7
postfilter_1022_rvv_f32: 22004.2
After:
postfilter_15_c: 20189.5
postfilter_15_rvv_f32: 7057.2
postfilter_512_c: 20189.5
postfilter_512_rvv_f32: 5667.2
postfilter_1022_c: 20192.7
postfilter_1022_rvv_f32: 5667.2
Diffstat (limited to 'tests/checkasm/float_dsp.c')
0 files changed, 0 insertions, 0 deletions