author | Martin Storsjö <martin@martin.st> | 2019-02-01 00:12:46 +0200 |
---|---|---|
committer | Martin Storsjö <martin@martin.st> | 2019-02-19 11:46:28 +0200 |
commit | 7e42d5f0ab2aeac811fd01e122627c9198b13f01 (patch) | |
tree | 7a04b33369318adf05a6a859f80519abff58b71d /tests/ref/fate/filter-pixdesc-gray9le | |
parent | 49f9c4272c4029b57ff300d908ba03c6332fc9c4 (diff) | |
aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
The previous version was a pretty exact translation of the arm
version. This version does do some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).
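
As a rough scalar illustration of the two instructions named above: uaddw zero-extends unsigned 8-bit values and adds them to 16-bit lanes, and sqxtun narrows signed 16-bit lanes to unsigned 8-bit with saturation. In an "idct add" routine this pair adds the inverse transform output to the destination pixels and clamps the result to 0..255. The C below is only a per-lane model, not the NEON code from this commit; the helper add_residual_row and all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of uaddw: zero-extend the 8-bit pixel and add it to a 16-bit
 * lane, wrapping modulo 2^16 like the instruction does. */
static int16_t uaddw_lane(int16_t residual, uint8_t pixel)
{
    uint16_t sum = (uint16_t)((uint16_t)residual + pixel);
    /* Reinterpret the 16-bit pattern as signed without relying on an
     * implementation-defined conversion. */
    return sum >= 0x8000 ? (int16_t)((int)sum - 0x10000) : (int16_t)sum;
}

/* One lane of sqxtun: signed 16-bit input, saturated to unsigned 8 bits. */
static uint8_t sqxtun_lane(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical helper (not from the commit): add one row of residuals to
 * one row of destination pixels, clamping each result to 0..255. */
static void add_residual_row(uint8_t *dst, const int16_t *residual, size_t n)
{
    assert(dst != NULL && residual != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = sqxtun_lane(uaddw_lane(residual[i], dst[i]));
}
```

One reading of the "half filled" remark, stated here as an assumption: a VP8 block row is 4 pixels wide while these instructions handle 8 lanes at once, so issuing one uaddw and one sqxtun per row (4 of each) leaves half of every vector unused, whereas packing two rows into one vector needs only 2 of each but costs extra instructions to pack the data.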
This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.
Before:              Cortex A53    A72    A73
vp8_idct_add_neon:         79.7   67.5   65.0
After:
vp8_idct_add_neon:         67.7   64.8   66.7
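
To put the figures above in relative terms, the change per core can be computed as below. This is a small sketch; the function name percent_change is hypothetical, and the message does not state the unit of the measurements, so the code only assumes that lower is better (consistent with the A53 result being called a speedup).

```c
#include <assert.h>

/* Relative change from "before" to "after" in percent; a negative result
 * means the "after" figure is lower. */
static double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the numbers from the message, percent_change(79.7, 67.7) is about -15.1 for Cortex A53, percent_change(67.5, 64.8) is -4.0 for A72, and percent_change(65.0, 66.7) is about +2.6 for A73.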
Signed-off-by: Martin Storsjö <martin@martin.st>