aboutsummaryrefslogtreecommitdiffstats
path: root/libavdevice
diff options
context:
space:
mode:
authorMartin Storsjö <martin@martin.st>2017-01-10 00:15:16 +0200
committerMichael Niedermayer <michael@niedermayer.cc>2017-01-14 21:13:32 +0100
commit8b11a89c06b94632d545f67ca508bd9c05c435ac (patch)
tree723636e32014c76b820b49b6d4c5b5c64c11837e /libavdevice
parent388f6e6715ff6f84efc858f69530867a235fe909 (diff)
downloadffmpeg-8b11a89c06b94632d545f67ca508bd9c05c435ac.tar.gz
aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
This work is sponsored by, and copyright, Google. Previously all subpartitions except the eob=1 (DC) case ran with the same runtime: vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 By skipping individual 8x16 or 8x32 pixel slices in the first pass, we reduce the runtime of these functions like this: vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 vp9_inv_dct_dct_16x16_sub2_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub4_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub8_add_neon: 1036.7 vp9_inv_dct_dct_16x16_sub12_add_neon: 1372.1 vp9_inv_dct_dct_16x16_sub16_add_neon: 1372.1 vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1 vp9_inv_dct_dct_32x32_sub2_add_neon: 5190.2 vp9_inv_dct_dct_32x32_sub4_add_neon: 5180.0 vp9_inv_dct_dct_32x32_sub8_add_neon: 5183.1 vp9_inv_dct_dct_32x32_sub12_add_neon: 6161.5 vp9_inv_dct_dct_32x32_sub16_add_neon: 6155.5 vp9_inv_dct_dct_32x32_sub20_add_neon: 7136.3 vp9_inv_dct_dct_32x32_sub24_add_neon: 7128.4 vp9_inv_dct_dct_32x32_sub28_add_neon: 8098.9 vp9_inv_dct_dct_32x32_sub32_add_neon: 8098.8 I.e. in general a very minor overhead for the full subpartition case due to the additional cmps, but a significant speedup for the cases when we only need to process a small part of the actual input data. This is cherrypicked from libav commits cad42fadcd2c2ae1b3676bb398844a1f521a2d7b and a0c443a3980dc22eb02b067ac4cb9ffa2f9b04d2. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Diffstat (limited to 'libavdevice')
0 files changed, 0 insertions, 0 deletions