author     Martin Storsjö <martin@martin.st>  2017-02-25 00:20:25 +0200
committer  Martin Storsjö <martin@martin.st>  2017-03-19 22:53:57 +0200
commit     32e273c111d8700dde895b80741622afc285ad3c
tree       f4cc534efeb7ff5e6d476c60ddf33b5e61ffafc7
parent     c1619318e540a214c730c6a300ebee0a4f450ba2
arm: vp9itxfm16: Avoid reloading the idct32 coefficients

Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.

The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.

Before:                                  Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22653.8  18268.4  19598.0  14079.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37699.0  38665.2  32542.3  24472.2
After:
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22270.8  18159.3  19531.0  13865.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37523.3  37731.6  32181.7  24071.2

Signed-off-by: Martin Storsjö <martin@martin.st>
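The following is a small Python sketch, not part of the commit, that redoes two pieces of arithmetic implied by the message. First, it computes how many elements the named registers hold at each width. This assumes only that a NEON q register is 128 bits wide. Second, it computes the relative change in the benchmark figures copied from the message, where lower numbers are better. The helper names `lanes` and `reduction_pct` are hypothetical and chosen for illustration.

```python
# Hypothetical helpers illustrating the commit message's arithmetic.
# Assumption: a NEON q register is 128 bits wide.
Q_BITS = 128


def lanes(num_q_regs: int, elem_bits: int) -> int:
    """Number of elem_bits-wide elements that fit in num_q_regs q registers."""
    return num_q_regs * Q_BITS // elem_bits


# Benchmark figures copied from the commit message (lower is better).
CORES = ["A7", "A8", "A9", "A53"]
BEFORE = {
    "sub4": [22653.8, 18268.4, 19598.0, 14079.0],
    "sub32": [37699.0, 38665.2, 32542.3, 24472.2],
}
AFTER = {
    "sub4": [22270.8, 18159.3, 19531.0, 13865.0],
    "sub32": [37523.3, 37731.6, 32181.7, 24071.2],
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in the benchmark figure from before to after."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Capacity of the register ranges named in the message.
    print("q6-q7, 16-bit lanes:", lanes(2, 16))  # narrow idct32 coefficients
    print("q0-q3, 32-bit lanes:", lanes(4, 32))  # lengthened idct16 coefficients
    print("q0-q1, 16-bit lanes:", lanes(2, 16))  # idct16 squeezed in pass2
    for variant in ("sub4", "sub32"):
        for core, b, a in zip(CORES, BEFORE[variant], AFTER[variant]):
            print(f"{variant:5s} {core:3s} {reduction_pct(b, a):5.2f}% lower")
```

The capacities show why the squeeze works. A pair of q registers holds as many 16-bit elements as four q registers hold 32-bit elements. Narrowing the idct16 coefficients therefore lets them fit in q0-q1 during pass2. Every benchmark entry improves, and the reductions range from about 0.3% to 2.4%.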