author: Niklas Haas <git@haasn.dev> 2025-07-29 13:27:52 +0200
committer: kierank <kieran@kunhya.com> 2025-08-03 22:13:51 +0000
commit: 7f00e24d70545dc303e43b62ab5ea5c743411411
tree: 377434555f0db9e89aa8842fae1f3af0485fd053
parent: 586a1cd088d29a911e3cb949260bf25199980b01
vf_bwdif: add AVX512 implementation
I also tried replacing some of the instructions by more elaborate ones using
masks, but I found no performance gain significant enough to be worth
maintaining two code paths, so this implementation merely replaces the
AVX2 implementation by drop-in AVX512 equivalents.

bwdif8_c:        6362.2 ( 1.00x)
bwdif8_sse2:     1004.9 ( 6.33x)
bwdif8_ssse3:     946.0 ( 6.73x)
bwdif8_avx2:      477.9 (13.31x)
bwdif8_avx512:    273.3 (23.28x)

bwdif10_c:       6341.5 ( 1.00x)
bwdif10_sse2:     872.4 ( 7.27x)
bwdif10_ssse3:    803.4 ( 7.89x)
bwdif10_avx2:     416.7 (15.22x)
bwdif10_avx512:   224.3 (28.27x)

Realtime test at 3840x2160 yuv420p:

avx2:
  frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93

avx512:
  frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93

The use of this function is gated behind avx512icl so that it doesn't
downclock on Skylake.
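
The gain of the AVX512 path relative to AVX2 (rather than relative to C) can be read off the figures in this message. A minimal sketch in Python, assuming the timings are "lower is better" and copying the numbers verbatim from above; the helper name `speedup` and the dictionary layout are hypothetical and not part of the commit:

```python
# Timings copied from the benchmark output in this commit message
# (assumed to be "lower is better"; the unit is whatever the tool reported).
TIMINGS = {
    "bwdif8": {"avx2": 477.9, "avx512": 273.3},
    "bwdif10": {"avx2": 416.7, "avx512": 224.3},
}

# Frame rates from the realtime test at 3840x2160 yuv420p (higher is better).
FPS = {"avx2": 3370, "avx512": 5077}


def speedup(baseline: float, candidate: float) -> float:
    """Return how many times faster `candidate` is than `baseline`.

    Both arguments are timings, so a smaller candidate gives a ratio > 1.
    """
    return baseline / candidate


for name, t in TIMINGS.items():
    ratio = speedup(t["avx2"], t["avx512"])
    print(f"{name}: avx512 is {ratio:.2f}x faster than avx2")

# For frame rates the ratio is inverted, since higher is better.
fps_ratio = FPS["avx512"] / FPS["avx2"]
print(f"realtime: {fps_ratio:.2f}x the avx2 frame rate")
```

With these inputs the per-function ratios come out at roughly 1.75x (8-bit) and 1.86x (10-bit), while the realtime test shows roughly 1.51x.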
Diffstat (limited to 'doc/APIchanges')
0 files changed, 0 insertions, 0 deletions