author | Niklas Haas <git@haasn.dev> | 2025-07-29 13:27:52 +0200 |
---|---|---|
committer | kierank <kieran@kunhya.com> | 2025-08-03 22:13:51 +0000 |
commit | 7f00e24d70545dc303e43b62ab5ea5c743411411 (patch) | |
tree | 377434555f0db9e89aa8842fae1f3af0485fd053 /doc/examples/show_metadata.c | |
parent | 586a1cd088d29a911e3cb949260bf25199980b01 (diff) | |
vf_bwdif: add AVX512 implementation
I also tried replacing some of the instructions with more elaborate ones
using masks, but I found no performance gain significant enough to be worth
maintaining two code paths, so this implementation is simply the AVX2
implementation with its instructions replaced by drop-in AVX512 equivalents.
bwdif8_c: 6362.2 ( 1.00x)
bwdif8_sse2: 1004.9 ( 6.33x)
bwdif8_ssse3: 946.0 ( 6.73x)
bwdif8_avx2: 477.9 (13.31x)
bwdif8_avx512: 273.3 (23.28x)
bwdif10_c: 6341.5 ( 1.00x)
bwdif10_sse2: 872.4 ( 7.27x)
bwdif10_ssse3: 803.4 ( 7.89x)
bwdif10_avx2: 416.7 (15.22x)
bwdif10_avx512: 224.3 (28.27x)
Realtime test at 3840x2160 yuv420p:
avx2: frame=20000 fps=3370 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed=67.4x elapsed=0:00:05.93
avx512: frame=20000 fps=5077 q=-0.0 Lsize=N/A time=00:06:40.00 bitrate=N/A speed= 102x elapsed=0:00:03.93
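As a reading aid, the ratios in parentheses above are the C reference timing divided by the timing of each SIMD version, and the realtime gain can be derived the same way from the two fps figures. A minimal sketch of that arithmetic follows; the helper names speedup() and fps_ratio() are hypothetical and not part of FFmpeg or checkasm.

```c
#include <assert.h>

/* Hypothetical helper: speedup of an optimized routine over a reference,
 * given two timings in the same unit (lower is faster). */
double speedup(double ref_time, double opt_time)
{
    assert(opt_time > 0.0);
    return ref_time / opt_time;
}

/* Hypothetical helper: throughput ratio of two runs, given their frame
 * rates (higher is faster). */
double fps_ratio(double fps_new, double fps_old)
{
    assert(fps_old > 0.0);
    return fps_new / fps_old;
}
```

For example, speedup(6362.2, 273.3) reproduces the 23.28x reported for bwdif8_avx512, and fps_ratio(5077, 3370) puts the realtime gain of avx512 over avx2 at roughly 1.5x.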
This function is only used when avx512icl is available, so that it does not
cause downclocking on Skylake.
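The gating described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this patch: the flag bits, the stub functions and select_filter() are all hypothetical names, standing in for the real CPU flag checks and assembly entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits, for illustration only. */
enum {
    CPU_AVX2      = 1 << 0,
    CPU_AVX512    = 1 << 1, /* baseline AVX512, as found on Skylake */
    CPU_AVX512ICL = 1 << 2  /* Ice Lake feature level */
};

typedef const char *(*filter_fn)(void);

/* Stubs standing in for the real filter implementations. */
const char *filter_c(void)      { return "c"; }
const char *filter_avx2(void)   { return "avx2"; }
const char *filter_avx512(void) { return "avx512"; }

/* Pick the fastest implementation the CPU flags allow. */
filter_fn select_filter(unsigned cpu_flags)
{
    filter_fn fn = filter_c;

    if (cpu_flags & CPU_AVX2)
        fn = filter_avx2;

    /* Gate on the ICL bit rather than on baseline AVX512, so a CPU that
     * only reports baseline AVX512 keeps using the AVX2 version. */
    if (cpu_flags & CPU_AVX512ICL)
        fn = filter_avx512;

    assert(fn != NULL);
    return fn;
}
```

With this layout, a CPU reporting CPU_AVX2 | CPU_AVX512 but not CPU_AVX512ICL still gets filter_avx2, which is the behaviour the gating is meant to achieve.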