diff options
author | Martin Storsjö <martin@martin.st> | 2016-11-14 12:32:26 +0200 |
---|---|---|
committer | Ronald S. Bultje <rsbultje@gmail.com> | 2016-11-15 15:10:03 -0500 |
commit | f43079e11cb445e6b70b149d9cdb829091ec2155 (patch) | |
tree | d09133185626819c3878d3b4320dce1d1b4fa547 /libavfilter/vf_vectorscope.c | |
parent | 1f7801c2bc93811df4daf09b839572bc4c8b3524 (diff) | |
download | ffmpeg-f43079e11cb445e6b70b149d9cdb829091ec2155.tar.gz |
aarch64: vp9: Add NEON itxfm routines
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.
The fact that aarch64 doesn't have the same d/q register
aliasing makes some of the macros quite a bit simpler as well.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
ARM AArch64
vp9_inv_adst_adst_4x4_add_neon: 90.0 87.7
vp9_inv_adst_adst_8x8_add_neon: 400.0 354.7
vp9_inv_adst_adst_16x16_add_neon: 2526.5 1827.2
vp9_inv_dct_dct_4x4_add_neon: 74.0 72.7
vp9_inv_dct_dct_8x8_add_neon: 271.0 256.7
vp9_inv_dct_dct_16x16_add_neon: 1960.7 1372.7
vp9_inv_dct_dct_32x32_add_neon: 11988.9 8088.3
vp9_inv_wht_wht_4x4_add_neon: 63.0 57.7
The speedup vs C code (2-4x) is smaller than in the 32 bit case,
mostly because the C code ends up significantly faster (around
1.6x faster, with GCC 5.4) when built for aarch64.
Examples of runtimes vs C on a Cortex A57 (for a slightly older version
of the patch):
A57 gcc-5.3 neon
vp9_inv_adst_adst_4x4_add_neon: 152.2 60.0
vp9_inv_adst_adst_8x8_add_neon: 948.2 288.0
vp9_inv_adst_adst_16x16_add_neon: 4830.4 1380.5
vp9_inv_dct_dct_4x4_add_neon: 153.0 58.6
vp9_inv_dct_dct_8x8_add_neon: 789.2 180.2
vp9_inv_dct_dct_16x16_add_neon: 3639.6 917.1
vp9_inv_dct_dct_32x32_add_neon: 20462.1 4985.0
vp9_inv_wht_wht_4x4_add_neon: 91.0 49.8
The asm is around factor 3-4 faster than C on the cortex-a57 and the asm
is around 30-50% faster on the a57 compared to the a53.
This is an adapted cherry-pick from libav commit
3c9546dfafcdfe8e7860aff9ebbf609318220f29.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Diffstat (limited to 'libavfilter/vf_vectorscope.c')
0 files changed, 0 insertions, 0 deletions