diff options
author | Martin Storsjö <martin@martin.st> | 2016-12-14 23:48:35 +0200 |
---|---|---|
committer | Martin Storsjö <martin@martin.st> | 2017-01-24 22:36:05 +0200 |
commit | 638eceed4752dbde9114718d53f199e07848f1c2 (patch) | |
tree | 0a2ceafb9a54299ad164c986c60ddd8b5d435c6f /libavcodec/aarch64/vp9dsp_init.h | |
parent | 48ad3fe1beee5c3eb453093defde503f354c8f1f (diff) | |
download | ffmpeg-638eceed4752dbde9114718d53f199e07848f1c2.tar.gz |
aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC
This work is sponsored by, and copyright, Google.
This has mostly got the same differences to the 8 bit version as
in the arm version. For the horizontal filters, we do 16 pixels
in parallel as well. For the 8 pixel wide vertical filters, we can
accumulate 4 rows before storing, just as in the 8 bit version.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
ARM AArch64
vp9_avg4_10bpp_neon: 35.7 30.7
vp9_avg8_10bpp_neon: 93.5 84.7
vp9_avg16_10bpp_neon: 324.4 296.6
vp9_avg32_10bpp_neon: 1236.5 1148.2
vp9_avg64_10bpp_neon: 4639.6 4571.1
vp9_avg_8tap_smooth_4h_10bpp_neon: 130.0 128.0
vp9_avg_8tap_smooth_4hv_10bpp_neon: 440.0 440.5
vp9_avg_8tap_smooth_4v_10bpp_neon: 114.0 105.5
vp9_avg_8tap_smooth_8h_10bpp_neon: 327.0 314.0
vp9_avg_8tap_smooth_8hv_10bpp_neon: 918.7 865.4
vp9_avg_8tap_smooth_8v_10bpp_neon: 330.0 300.2
vp9_avg_8tap_smooth_16h_10bpp_neon: 1187.5 1155.5
vp9_avg_8tap_smooth_16hv_10bpp_neon: 2663.1 2591.0
vp9_avg_8tap_smooth_16v_10bpp_neon: 1107.4 1078.3
vp9_avg_8tap_smooth_64h_10bpp_neon: 17754.6 17454.7
vp9_avg_8tap_smooth_64hv_10bpp_neon: 33285.2 33001.5
vp9_avg_8tap_smooth_64v_10bpp_neon: 16066.9 16048.6
vp9_put4_10bpp_neon: 25.5 21.7
vp9_put8_10bpp_neon: 56.0 52.0
vp9_put16_10bpp_neon/armv8: 183.0 163.1
vp9_put32_10bpp_neon/armv8: 678.6 563.1
vp9_put64_10bpp_neon/armv8: 2679.9 2195.8
vp9_put_8tap_smooth_4h_10bpp_neon: 120.0 118.0
vp9_put_8tap_smooth_4hv_10bpp_neon: 435.2 435.0
vp9_put_8tap_smooth_4v_10bpp_neon: 107.0 98.2
vp9_put_8tap_smooth_8h_10bpp_neon: 303.0 290.0
vp9_put_8tap_smooth_8hv_10bpp_neon: 893.7 828.7
vp9_put_8tap_smooth_8v_10bpp_neon: 305.5 263.5
vp9_put_8tap_smooth_16h_10bpp_neon: 1089.1 1059.2
vp9_put_8tap_smooth_16hv_10bpp_neon: 2578.8 2452.4
vp9_put_8tap_smooth_16v_10bpp_neon: 1009.5 933.5
vp9_put_8tap_smooth_64h_10bpp_neon: 16223.4 15918.6
vp9_put_8tap_smooth_64hv_10bpp_neon: 32153.0 31016.2
vp9_put_8tap_smooth_64v_10bpp_neon: 14516.5 13748.1
These are generally about as fast as the corresponding ARM
routines on the same CPU (at least on the A53), in most cases
marginally faster.
The speedup vs C code is around 4-9x.
Signed-off-by: Martin Storsjö <martin@martin.st>
Diffstat (limited to 'libavcodec/aarch64/vp9dsp_init.h')
-rw-r--r-- | libavcodec/aarch64/vp9dsp_init.h | 29 |
1 files changed, 29 insertions, 0 deletions
diff --git a/libavcodec/aarch64/vp9dsp_init.h b/libavcodec/aarch64/vp9dsp_init.h new file mode 100644 index 0000000000..9df1752c62 --- /dev/null +++ b/libavcodec/aarch64/vp9dsp_init.h @@ -0,0 +1,29 @@ +/* + * Copyright (c) 2017 Google Inc. + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifndef AVCODEC_AARCH64_VP9DSP_INIT_H +#define AVCODEC_AARCH64_VP9DSP_INIT_H + +#include "libavcodec/vp9dsp.h" + +void ff_vp9dsp_init_10bpp_aarch64(VP9DSPContext *dsp); +void ff_vp9dsp_init_12bpp_aarch64(VP9DSPContext *dsp); + +#endif /* AVCODEC_AARCH64_VP9DSP_INIT_H */ |