author     Martin Storsjö <martin@martin.st>  2017-01-02 22:50:38 +0200
committer  Martin Storsjö <martin@martin.st>  2017-02-24 00:03:43 +0200
commit     402546a17233a8815307df9e14ff88cd70424537
tree       dc516174e45d64738491513815c37e7edeabf938 /libavcodec/aarch64
parent     575e31e931e4178e9f1e24407503c9b4ec0ef9ba
arm: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed q4-q7 onto the stack even though
it didn't clobber them. There are plenty of registers available, so all
the idct coefficients can be kept in registers, instead of reloading
different subsets of them at different stages of the transform.

The idct16 core transform avoids clobbering q4-q7 (it clobbers q2-q3
instead, so that the idct16 function doesn't need to back up and restore
q4-q7 at all), and the lanewise vmul needs its scalar operand in the
q0-q3 range. Therefore, while doing idct16, we move the coefficients
stored in q2-q3 into q4-q5.

Even with these coefficients kept in registers, we can still skip
pushing q7.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:     18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:     18470.3  16717.7  14173.6  11860.8

Signed-off-by: Martin Storsjö <martin@martin.st>
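The two register-allocation constraints this message relies on can be modelled in a few lines of C. This is only a sketch: the helper names lane_operand_ok, callee_saved_to_push and count_regs are hypothetical and not part of FFmpeg. The rules encoded here come from the ARMv7 NEON encoding, where a 16-bit by-scalar operand must come from d0-d7 (that is, q0-q3), and from the AAPCS, where d8-d15 (q4-q7) are callee-saved. Any register mask you pass in is illustrative, because the commit does not list every register idct32 touches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not FFmpeg code): can lane `lane` of q<qreg> be the
 * scalar operand of an ARMv7 NEON by-element multiply such as
 * "vmull.s16 q8, d16, d4[1]"?  A 16-bit scalar must live in d0-d7,
 * a 32-bit scalar in d0-d15. */
static bool lane_operand_ok(int qreg, int lane, int elem_bits)
{
    if (elem_bits != 16 && elem_bits != 32)
        return false;
    int lanes_per_d = 64 / elem_bits;            /* 4 for s16, 2 for s32 */
    if (qreg < 0 || qreg > 15 || lane < 0 || lane >= 2 * lanes_per_d)
        return false;
    int dreg = 2 * qreg + lane / lanes_per_d;    /* q<n> is d<2n>:d<2n+1> */
    return elem_bits == 16 ? dreg <= 7           /* q0-q3 only */
                           : dreg <= 15;         /* q0-q7 */
}

/* Under the AAPCS, d8-d15 (= q4-q7) are callee-saved, while q0-q3 and
 * q8-q15 may be clobbered freely.  Given a bitmask of the q registers a
 * function writes (bit n = q<n>), return the subset it must push/pop. */
static unsigned callee_saved_to_push(unsigned used_mask)
{
    assert(used_mask <= 0xFFFFu);                /* only q0-q15 exist */
    return used_mask & 0x00F0u;                  /* bits 4-7 = q4-q7 */
}

/* Number of q registers (16 bytes each) a vpush/vpop of `mask` moves. */
static int count_regs(unsigned mask)
{
    int n = 0;
    for (; mask; mask &= mask - 1)               /* clear lowest set bit */
        n++;
    return n;
}
```

For example, lane_operand_ok(2, 0, 16) is true because d4[0] is a valid 16-bit scalar, but lane_operand_ok(4, 0, 16) is false. That is why the coefficients have to come back into q0-q3 before the lanewise multiplies. If a function wrote only q0-q6 (a hypothetical mask, 0x7F), callee_saved_to_push would return q4-q6, and q7 would not need to be pushed.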
Diffstat (limited to 'libavcodec/aarch64')
0 files changed, 0 insertions, 0 deletions