author | Ben Avison <bavison@riscosopen.org> | 2014-07-11 00:14:28 +0100 |
---|---|---|
committer | Michael Niedermayer <michaelni@gmx.at> | 2014-07-13 15:17:04 +0200 |
commit | 42c1cc35b7623ce76c7b55c6bc100f135e17cd4f (patch) | |
tree | 01d287e68c23e68bfaa806103c05c9683d301459 /libavcodec/x86/sbrdsp.asm | |
parent | 276bef53406752b3ee9289c650bef2409cde6229 (diff) | |
download | ffmpeg-42c1cc35b7623ce76c7b55c6bc100f135e17cd4f.tar.gz |
armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)

The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3), much larger arrays
are used, with mdct_bits values of 8, 9 and 11. The old method does not
scale to these cases, so more integer registers are used with non-unrolled
versions of the loops (and with some stack spillage). The postrotation
filter loop is still unrolled by a factor of 2 to permit the
double-buffering of some VFP registers to facilitate overlap of
neighbouring iterations.
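
To give a rough sense of why a fixed amount of unrolling stops scaling, the sketch below derives loop sizes from mdct_bits. It assumes the size convention of the C reference ff_imdct_half_c (n = 1 << mdct_bits, with the pre-rotation loop running n/4 times and the post-rotation loop n/8 times). The struct and helper names are hypothetical and not part of FFmpeg, and the iteration counts of the VFP assembly itself may differ because of its unrolling.

```c
#include <assert.h>

/* Hypothetical helper type, not part of FFmpeg: sizes derived from
 * mdct_bits, following the n, n/2, n/4, n/8 convention assumed above. */
struct imdct_sizes {
    int n;   /* full transform size: 1 << mdct_bits */
    int n2;  /* n / 2 */
    int n4;  /* n / 4: pre-rotation iterations in the C reference */
    int n8;  /* n / 8: post-rotation iterations in the C reference */
};

/* Hypothetical helper: compute the derived sizes for one mdct_bits value. */
static struct imdct_sizes imdct_sizes_for(int mdct_bits)
{
    struct imdct_sizes s;

    /* Keep the shift well defined and n / 8 at least 1. */
    assert(mdct_bits >= 3 && mdct_bits < 31);

    s.n  = 1 << mdct_bits;
    s.n2 = s.n >> 1;
    s.n4 = s.n >> 2;
    s.n8 = s.n >> 3;
    return s;
}
```

Under these assumptions, mdct_bits == 6 gives 16 pre-rotation and 8 post-rotation iterations, against 512 and 256 for mdct_bits == 11, which is far too many to unroll with offsets fixed at assembly time.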
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

Function | Before mean | Before std dev | After mean | After std dev | Confidence | Change |
---|---|---|---|---|---|---|
aac_decode_frame | 2368.1 | 35.8 | 2117.2 | 35.3 | 100.0% | +11.8% |
ff_imdct_half_* | 457.5 | 22.4 | 251.2 | 16.2 | 100.0% | +82.1% |
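
The commit message does not say how the Change column was computed. One reading, offered here only as an assumption, is that it is the ratio of the "before" mean to the "after" mean, minus one, expressed as a percentage. The hypothetical helper below applies that formula to the rounded means from the table.

```c
#include <assert.h>

/* Hypothetical helper: percentage by which the "before" sample count
 * exceeds the "after" sample count. The formula is an assumption, not
 * something stated in the commit message. */
static double speedup_percent(double before_mean, double after_mean)
{
    assert(after_mean > 0.0);
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

Calling speedup_percent(2368.1, 2117.2) and speedup_percent(457.5, 251.2) yields roughly 11.85 and 82.1, which agrees with the reported +11.8% and +82.1% to within the rounding of the printed means.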
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>