author     Shreesh Adiga <16567adigashreesh@gmail.com>  2025-02-03 22:03:30 +0530
committer  James Almer <jamrial@gmail.com>  2025-02-18 12:43:57 -0300
commit     e18f87ed9f9f61c980420b315dc8ecb308831bc5
tree       f6b3a3455d9ce6e53f4578915f895dfb2a38266c /tests/ref/vsynth/vsynth1-dnxhd-1080i-10bit
parent     08e37fa0820c4286624b36e6a641c229e6aa2bb5
swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422
The scalar loop is replaced with masked AVX512 instructions. To extract Y from UYVY, vperm2b is used instead of a sequence of AND and packuswb instructions. Instead of loading the vectors with interleaved lanes, as the AVX2 version does, a normal load is used. After packuswb, an extra permute is applied to U and V to get the required layout.

AMD 7950X (Zen 4) benchmark data:
uyvytoyuv422_c:          29105.0 ( 1.00x)
uyvytoyuv422_sse2:        3888.0 ( 7.49x)
uyvytoyuv422_avx:         3374.2 ( 8.63x)
uyvytoyuv422_avx2:        2649.8 (10.98x)
uyvytoyuv422_avx512icl:   1615.0 (18.02x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
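The data movement described in this commit message can be modeled in scalar C and checked against a plain reference conversion. This is a minimal sketch under stated assumptions, not the patch's assembly. All function names (`uyvy_to_yuv422_ref`, `permute2_bytes`, `packuswb_512`, `fix_lane_order`, `simd_models_match_reference`) are hypothetical. The byte permute assumes that what the message calls vperm2b behaves like an AVX-512 VBMI two-table byte permute (vpermt2b style). The qword order { 0, 2, 4, 6, 1, 3, 5, 7 } is one permutation that undoes packuswb's per-128-bit-lane interleaving. It is not necessarily the exact permute the commit uses, and the masked tail handling is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: split packed UYVY (U0 Y0 V0 Y1 ...) into
 * planar Y, U, V (4:2:2). width is in luma pixels and must be even. */
static void uyvy_to_yuv422_ref(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                               const uint8_t *src, int width, int height,
                               int lum_stride, int chrom_stride, int src_stride)
{
    assert(width % 2 == 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width / 2; x++) {
            udst[x]         = src[4 * x + 0];
            ydst[2 * x]     = src[4 * x + 1];
            vdst[x]         = src[4 * x + 2];
            ydst[2 * x + 1] = src[4 * x + 3];
        }
        ydst += lum_stride;
        udst += chrom_stride;
        vdst += chrom_stride;
        src  += src_stride;
    }
}

/* Model of a two-table byte permute (assumed vpermt2b-style): bit 6 of each
 * index picks table a or b, bits 0-5 pick the byte within that 64-byte table. */
static void permute2_bytes(uint8_t out[64], const uint8_t a[64],
                           const uint8_t b[64], const uint8_t idx[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = (idx[i] & 64) ? b[idx[i] & 63] : a[idx[i] & 63];
}

static uint8_t sat_u8(int16_t v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Model of 512-bit packuswb: it works per 128-bit lane, so result lane k is
 * 8 bytes from a's lane k followed by 8 bytes from b's lane k. */
static void packuswb_512(uint8_t out[64], const int16_t a[32], const int16_t b[32])
{
    for (int lane = 0; lane < 4; lane++)
        for (int j = 0; j < 8; j++) {
            out[16 * lane + j]     = sat_u8(a[8 * lane + j]);
            out[16 * lane + 8 + j] = sat_u8(b[8 * lane + j]);
        }
}

/* Qword permute turning a0 b0 a1 b1 a2 b2 a3 b3 into a0 a1 a2 a3 b0 b1 b2 b3. */
static void fix_lane_order(uint8_t out[64], const uint8_t in[64])
{
    static const int qidx[8] = { 0, 2, 4, 6, 1, 3, 5, 7 };
    for (int q = 0; q < 8; q++)
        memcpy(out + 8 * q, in + 8 * qidx[q], 8);
}

/* Runs both models on one 64-pixel row (128 UYVY bytes) and compares the
 * results with the scalar reference. Returns 1 if everything matches. */
static int simd_models_match_reference(void)
{
    uint8_t src[128], y_ref[64], u_ref[32], v_ref[32];
    uint8_t idx[64], y_perm[64], packed[64], uv[64];
    int16_t ca[32], cb[32];

    for (int i = 0; i < 128; i++)
        src[i] = (uint8_t)(i * 7 + 3);
    uyvy_to_yuv422_ref(y_ref, u_ref, v_ref, src, 64, 1, 64, 32, 128);

    /* Y: gather the odd bytes of both 64-byte vectors with one permute. */
    for (int i = 0; i < 64; i++)
        idx[i] = (uint8_t)(2 * i + 1);
    permute2_bytes(y_perm, src, src + 64, idx);
    if (memcmp(y_perm, y_ref, 64) != 0)
        return 0;

    /* U/V: emulate masking each little-endian 16-bit word with 0x00FF,
     * pack both vectors to bytes, then undo the per-lane interleaving. */
    for (int k = 0; k < 32; k++) {
        ca[k] = src[2 * k];
        cb[k] = src[64 + 2 * k];
    }
    packuswb_512(packed, ca, cb);
    fix_lane_order(uv, packed);
    for (int j = 0; j < 32; j++)
        if (uv[2 * j] != u_ref[j] || uv[2 * j + 1] != v_ref[j])
            return 0;
    return 1;
}
```

Without `fix_lane_order`, the packed chroma bytes come out in a0 b0 a1 b1 ... qword order, which is why a packuswb-based path on 512-bit registers needs a cross-lane permute afterwards.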
Diffstat (limited to 'tests/ref/vsynth/vsynth1-dnxhd-1080i-10bit')
0 files changed, 0 insertions, 0 deletions