author | Shreesh Adiga <16567adigashreesh@gmail.com> | 2025-02-03 22:03:30 +0530 |
---|---|---|
committer | James Almer <jamrial@gmail.com> | 2025-02-18 12:43:57 -0300 |
commit | e18f87ed9f9f61c980420b315dc8ecb308831bc5 (patch) | |
tree | f6b3a3455d9ce6e53f4578915f895dfb2a38266c /tests/ref/fate/filter-pixdesc-p016be | |
parent | 08e37fa0820c4286624b36e6a641c229e6aa2bb5 (diff) | |
download | ffmpeg-e18f87ed9f9f61c980420b315dc8ecb308831bc5.tar.gz |
swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422
The scalar loop is replaced with masked AVX512 instructions.
To extract Y from UYVY, vperm2b is used instead of a sequence
of AND and packuswb operations.
Instead of loading the vectors with interleaved lanes, as the
AVX2 version does, normal loads are used. After the packuswb
for U and V, an extra permute is applied to get the required
layout.
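For reference, the byte rearrangement described above can be sketched in scalar C. This is only an illustrative sketch, not the code from this commit or FFmpeg's C fallback: the function name `uyvy_row_to_planar` and its signature are hypothetical, and it assumes a single row with an even width and the usual UYVY byte order (U0 Y0 V0 Y1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not FFmpeg's actual implementation.
 * Splits one row of packed UYVY (U0 Y0 V0 Y1 U1 Y2 V1 Y3 ...) into
 * separate Y, U and V planes. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *ydst,
                               uint8_t *udst, uint8_t *vdst, size_t width)
{
    /* 4:2:2 shares one U/V pair between two horizontal pixels. */
    assert(width % 2 == 0);

    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0]; /* even bytes alternate U, V */
        ydst[2 * i + 0] = src[4 * i + 1]; /* odd bytes are all luma    */
        vdst[i]         = src[4 * i + 2];
        ydst[2 * i + 1] = src[4 * i + 3];
    }
}
```

A vectorized version performs the same selection many bytes at a time: a byte permute gathers the odd (Y) bytes directly, while the even bytes are packed and then permuted into separate U and V runs. Masked loads and stores let the final partial vector be handled without falling back to a loop like this one.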
AMD 7950x Zen 4 benchmark data:
uyvytoyuv422_c: 29105.0 ( 1.00x)
uyvytoyuv422_sse2: 3888.0 ( 7.49x)
uyvytoyuv422_avx: 3374.2 ( 8.63x)
uyvytoyuv422_avx2: 2649.8 (10.98x)
uyvytoyuv422_avx512icl: 1615.0 (18.02x)
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>