ffmpeg - Mirror of FFmpeg git repo

author	Martin Storsjö <martin@martin.st>	2017-01-05 12:52:06 +0200
committer	Martin Storsjö <martin@martin.st>	2017-01-24 22:36:11 +0200
commit	9f10cff61042dbc0c27efd2dea7f1d3da83eff1b (patch)
tree	44710314dd7bb298be9a2488881be33eb197c98e /libavcodec/hpeldsp.h
parent	ceb36b81781fc62814780bc3654ded53f239994b (diff)
download	ffmpeg-9f10cff61042dbc0c27efd2dea7f1d3da83eff1b.tar.gz

aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter

This work is sponsored by, and copyright, Google.

This is similar to the arm version, but due to the larger registers
on aarch64, we can do 8 pixels at a time for all filter sizes.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                             ARM  AArch64
vp9_loop_filter_h_4_8_10bpp_neon:          213.2    172.6
vp9_loop_filter_h_8_8_10bpp_neon:          281.2    244.2
vp9_loop_filter_h_16_8_10bpp_neon:         657.0    444.5
vp9_loop_filter_h_16_16_10bpp_neon:       1280.4    877.7
vp9_loop_filter_mix2_h_44_16_10bpp_neon:   397.7    358.0
vp9_loop_filter_mix2_h_48_16_10bpp_neon:   465.7    429.0
vp9_loop_filter_mix2_h_84_16_10bpp_neon:   465.7    428.0
vp9_loop_filter_mix2_h_88_16_10bpp_neon:   533.7    499.0
vp9_loop_filter_mix2_v_44_16_10bpp_neon:   271.5    244.0
vp9_loop_filter_mix2_v_48_16_10bpp_neon:   330.0    305.0
vp9_loop_filter_mix2_v_84_16_10bpp_neon:   329.0    306.0
vp9_loop_filter_mix2_v_88_16_10bpp_neon:   386.0    365.0
vp9_loop_filter_v_4_8_10bpp_neon:          150.0    115.2
vp9_loop_filter_v_8_8_10bpp_neon:          209.0    175.5
vp9_loop_filter_v_16_8_10bpp_neon:         492.7    345.2
vp9_loop_filter_v_16_16_10bpp_neon:        951.0    682.7

This is significantly faster than the ARM version in almost all
cases except for the mix2 functions.

Based on START_TIMER/STOP_TIMER wrapping around a few individual
functions, the speedup vs C code is around 2-3x.

Signed-off-by: Martin Storsjö <martin@martin.st>
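
To make the comparison in the commit message easier to read, the ARM/AArch64 ratio for each function can be computed from the quoted runtimes. The following is a minimal sketch and not part of the commit: the numbers are copied from the message above, while the names `RUNTIMES` and `speedups` are hypothetical helpers introduced here for illustration. The message does not state the unit of the runtimes, so only the ratios are meaningful.

```python
# Runtimes copied from the commit message, as (ARM, AArch64) pairs.
# The unit is not stated in the message; only the ratios are used here.
RUNTIMES = {
    "vp9_loop_filter_h_4_8_10bpp_neon": (213.2, 172.6),
    "vp9_loop_filter_h_8_8_10bpp_neon": (281.2, 244.2),
    "vp9_loop_filter_h_16_8_10bpp_neon": (657.0, 444.5),
    "vp9_loop_filter_h_16_16_10bpp_neon": (1280.4, 877.7),
    "vp9_loop_filter_mix2_h_44_16_10bpp_neon": (397.7, 358.0),
    "vp9_loop_filter_mix2_h_48_16_10bpp_neon": (465.7, 429.0),
    "vp9_loop_filter_mix2_h_84_16_10bpp_neon": (465.7, 428.0),
    "vp9_loop_filter_mix2_h_88_16_10bpp_neon": (533.7, 499.0),
    "vp9_loop_filter_mix2_v_44_16_10bpp_neon": (271.5, 244.0),
    "vp9_loop_filter_mix2_v_48_16_10bpp_neon": (330.0, 305.0),
    "vp9_loop_filter_mix2_v_84_16_10bpp_neon": (329.0, 306.0),
    "vp9_loop_filter_mix2_v_88_16_10bpp_neon": (386.0, 365.0),
    "vp9_loop_filter_v_4_8_10bpp_neon": (150.0, 115.2),
    "vp9_loop_filter_v_8_8_10bpp_neon": (209.0, 175.5),
    "vp9_loop_filter_v_16_8_10bpp_neon": (492.7, 345.2),
    "vp9_loop_filter_v_16_16_10bpp_neon": (951.0, 682.7),
}


def speedups(runtimes):
    """Return {function name: ARM runtime / AArch64 runtime}."""
    return {name: arm / a64 for name, (arm, a64) in runtimes.items()}


def main():
    ratios = speedups(RUNTIMES)
    # Print the functions with the largest AArch64 advantage first.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<42} {ratio:.2f}x")


if __name__ == "__main__":
    main()
```

Sorting by ratio puts the 16-wide filters at the top and the mix2 functions at the bottom, which matches the observation in the message that the mix2 functions gain the least.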

Diffstat (limited to 'libavcodec/hpeldsp.h')

0 files changed, 0 insertions, 0 deletions

