path: root/libavcodec/arm/videodsp_init_arm.c
author: Ben Avison <bavison@riscosopen.org> 2014-07-11 00:12:31 +0100
committer: Martin Storsjö <martin@martin.st> 2014-07-18 01:34:08 +0300
commit: 5c22e8e4ad0852d61d5c4ba8d67d33fd72339497
tree: 0203e0320cec765670ccd3388987963262c6272d /libavcodec/arm/videodsp_init_arm.c
parent: 2d60444331fca1910510038dd3817bea885c2367
armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)

The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same example
AAC stream:

                    Before            After
                    Mean     StdDev   Mean     StdDev   Confidence  Change
aac_decode_frame    2368.1   35.8     2117.2   35.3     100.0%      +11.8%
ff_imdct_half_*     457.5    22.4     251.2    16.2     100.0%      +82.1%

Signed-off-by: Martin Storsjö <martin@martin.st>
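
The commit message does not say how the Change column was computed. The figures appear consistent with a speedup ratio, (before mean / after mean - 1), rather than a percentage reduction in samples. The sketch below checks that reading against the means in the table. The function name `relative_speedup` is hypothetical and the formula is an assumption. Because the published means are rounded to one decimal place, the recomputed values can differ from the table in the last digit.

```python
def relative_speedup(before_mean: float, after_mean: float) -> float:
    """Assumed formula: speedup in percent, (before / after - 1) * 100."""
    return (before_mean / after_mean - 1.0) * 100.0


# Mean gperftools sample counts copied from the table (Before, After).
measurements = {
    "aac_decode_frame": (2368.1, 2117.2),
    "ff_imdct_half_*": (457.5, 251.2),
}

# Recompute the Change column from the rounded means.
changes = {
    name: relative_speedup(before, after)
    for name, (before, after) in measurements.items()
}

for name, change in changes.items():
    print(f"{name:18s} {change:+.2f}%")
```

Under this reading, +82.1% for ff_imdct_half_* means the old routine collected roughly 1.82 times as many samples as the new one, which is a reduction of about 45% in samples for that function.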
Diffstat (limited to 'libavcodec/arm/videodsp_init_arm.c')
0 files changed, 0 insertions, 0 deletions