path: root/libavfilter/vf_gradfun.c
author: Sebastian Pop <spop@amazon.com> 2019-11-17 14:13:13 -0600
committer: Michael Niedermayer <michael@niedermayer.cc> 2019-12-17 23:41:47 +0100
commit: bd831912712e32b8e78d409bfa7ea7e668ce4b42
tree: 9c83daba41b45fcb56f0cd013d2600fbfab39785 /libavfilter/vf_gradfun.c
parent: e43d66dc67186a2ca9fefec4e6c189116a3029ba
swscale/aarch64: use multiply accumulate and increase vector factor to 4
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate and bumps the vectorization factor from 2 to 4.

The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:

    $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
    before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
    after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:

    $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
    before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
    after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
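To illustrate what "vectorization factor 4" could mean here, the following is a minimal scalar sketch in C, not the NEON assembly from the patch. It assumes the usual contract of an 8-bit to 15-bit horizontal scaler: each output is a sum of source pixels times filter coefficients, shifted right by 7 and clamped to 2^15 - 1. The function names `hscale_8_to_15_ref` and `hscale_8_to_15_x4` and the parameter layout are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the accumulated sum down by 7 bits and clamp to the 15-bit maximum.
 * Assumption: right shift of a negative value is arithmetic, as on common compilers. */
static int16_t scale_and_clamp(int32_t val)
{
    val >>= 7;
    return (int16_t)(val > 32767 ? 32767 : val);
}

/* Reference form: one output sample per outer iteration. */
static void hscale_8_to_15_ref(int16_t *dst, int dst_w, const uint8_t *src,
                               const int16_t *filter, const int32_t *filter_pos,
                               int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        int32_t val = 0;
        for (int j = 0; j < filter_size; j++)
            val += (int32_t)src[filter_pos[i] + j] * filter[filter_size * i + j];
        dst[i] = scale_and_clamp(val);
    }
}

/* Unrolled form: four output samples per outer iteration, each with its own
 * accumulator. The independent multiply-accumulate chains are what a SIMD
 * implementation can map onto vector lanes or interleave in the pipeline. */
static void hscale_8_to_15_x4(int16_t *dst, int dst_w, const uint8_t *src,
                              const int16_t *filter, const int32_t *filter_pos,
                              int filter_size)
{
    assert(dst_w % 4 == 0); /* simplification for this sketch: no tail loop */
    for (int i = 0; i < dst_w; i += 4) {
        int32_t acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filter_size; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int32_t)src[filter_pos[i + k] + j]
                        * filter[filter_size * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = scale_and_clamp(acc[k]);
    }
}
```

Both functions compute the same result; the unrolled version only changes how much independent work each iteration exposes. Whether that translates into the speedups reported above depends on the CPU and on how the real assembly schedules the loads and multiply-accumulate instructions.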
Diffstat (limited to 'libavfilter/vf_gradfun.c')
0 files changed, 0 insertions, 0 deletions