path: root/libavcodec/jpeg2000dsp.h
author Martin Storsjö <martin@martin.st> 2019-02-01 00:12:46 +0200
committer Martin Storsjö <martin@martin.st> 2019-02-19 11:46:28 +0200
commit 7e42d5f0ab2aeac811fd01e122627c9198b13f01 (patch)
tree 7a04b33369318adf05a6a859f80519abff58b71d /libavcodec/jpeg2000dsp.h
parent 49f9c4272c4029b57ff300d908ba03c6332fc9c4 (diff)
aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
The previous version was a pretty exact translation of the arm
version. This version does some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).

This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.

Before:              Cortex A53    A72    A73
vp8_idct_add_neon:         79.7   67.5   65.0
After:
vp8_idct_add_neon:         67.7   64.8   66.7

Signed-off-by: Martin Storsjö <martin@martin.st>
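
For readers unfamiliar with the two instructions named in the commit message, the following is a rough scalar sketch in C of what one lane of a uaddw followed by a sqxtun computes: the unsigned 8-bit pixel is widened and added to a signed 16-bit residual, and the sum is then narrowed to 8 bits with unsigned saturation. The function names (widen_add_u8, sat_narrow_u8, add_residual_4x4) are hypothetical and do not appear in FFmpeg, and the sketch assumes the residual has already been inverse-transformed and rounded. It only models the arithmetic; it says nothing about how the actual assembly packs its vectors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one UADDW lane: widen an unsigned 8-bit
 * pixel to 16 bits and add it to a 16-bit residual (wrapping add). */
static int16_t widen_add_u8(int16_t residual, uint8_t pixel)
{
    return (int16_t)(uint16_t)((uint16_t)residual + (uint16_t)pixel);
}

/* Hypothetical scalar model of one SQXTUN lane: narrow a signed 16-bit
 * value to unsigned 8 bits, saturating to the range 0..255. */
static uint8_t sat_narrow_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Add a 4x4 residual block (assumed already transformed and rounded)
 * to the destination pixels, clamping each result to 8 bits. */
static void add_residual_4x4(uint8_t *dst, ptrdiff_t stride,
                             const int16_t residual[16])
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int16_t sum = widen_add_u8(residual[y * 4 + x], dst[x]);
            dst[x] = sat_narrow_u8(sum);
        }
        dst += stride;
    }
}
```

In this model, a 4x4 block is four rows of four pixels, so each row fills only half of an 8-lane 16-bit vector. That is consistent with the commit message's remark that working on half-filled vectors doubles the number of uaddw and sqxtun instructions from 2 to 4.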
Diffstat (limited to 'libavcodec/jpeg2000dsp.h')
0 files changed, 0 insertions, 0 deletions