field | value | date |
---|---|---|
author | Martin Storsjö <martin@martin.st> | 2017-02-25 00:20:25 +0200 |
committer | Martin Storsjö <martin@martin.st> | 2017-03-19 22:53:57 +0200 |
commit | 32e273c111d8700dde895b80741622afc285ad3c (patch) | |
tree | f4cc534efeb7ff5e6d476c60ddf33b5e61ffafc7 /libavcodec/arm/vorbisdsp_neon.S | |
parent | c1619318e540a214c730c6a300ebee0a4f450ba2 (diff) | |
download | ffmpeg-32e273c111d8700dde895b80741622afc285ad3c.tar.gz |
arm: vp9itxfm16: Avoid reloading the idct32 coefficients

Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.

The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.

function | | Cortex A7 | Cortex A8 | Cortex A9 | Cortex A53 |
---|---|---|---|---|---|
vp9_inv_dct_dct_32x32_sub4_add_10_neon | before | 22653.8 | 18268.4 | 19598.0 | 14079.0 |
vp9_inv_dct_dct_32x32_sub32_add_10_neon | before | 37699.0 | 38665.2 | 32542.3 | 24472.2 |
vp9_inv_dct_dct_32x32_sub4_add_10_neon | after | 22270.8 | 18159.3 | 19531.0 | 13865.0 |
vp9_inv_dct_dct_32x32_sub32_add_10_neon | after | 37523.3 | 37731.6 | 32181.7 | 24071.2 |

Signed-off-by: Martin Storsjö <martin@martin.st>
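
The before/after figures in the commit message can be turned into relative gains with a few lines of arithmetic. The following is a minimal sketch, not part of the commit: it assumes the figures are timings where lower is faster, and the names `CORES`, `BEFORE`, `AFTER`, `percent_reduction` and `reductions` are made up for illustration.

```python
# Figures copied verbatim from the commit message.
# Assumption: they are timings, so a lower value means faster code.
CORES = ("Cortex A7", "Cortex A8", "Cortex A9", "Cortex A53")

BEFORE = {
    "vp9_inv_dct_dct_32x32_sub4_add_10_neon": (22653.8, 18268.4, 19598.0, 14079.0),
    "vp9_inv_dct_dct_32x32_sub32_add_10_neon": (37699.0, 38665.2, 32542.3, 24472.2),
}

AFTER = {
    "vp9_inv_dct_dct_32x32_sub4_add_10_neon": (22270.8, 18159.3, 19531.0, 13865.0),
    "vp9_inv_dct_dct_32x32_sub32_add_10_neon": (37523.3, 37731.6, 32181.7, 24071.2),
}


def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


def reductions():
    """Map each function name to its per-core percentage reduction."""
    return {
        fn: {
            core: percent_reduction(b, a)
            for core, b, a in zip(CORES, BEFORE[fn], AFTER[fn])
        }
        for fn in BEFORE
    }


if __name__ == "__main__":
    for fn, per_core in reductions().items():
        print(fn)
        for core, pct in per_core.items():
            print(f"  {core}: {pct:.2f}% lower")
```

Under that assumption, every measured case improves, with reductions between roughly 0.3% and 2.4% depending on the core and the function.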
Diffstat (limited to 'libavcodec/arm/vorbisdsp_neon.S')
0 files changed, 0 insertions, 0 deletions