author:    Shreesh Adiga <16567adigashreesh@gmail.com>  2025-02-20 18:51:38 +0530
committer: Kieran Kunhya <kierank@obe.tv>  2025-03-23 15:25:48 +0000
commit:    26f2f03e0de278f296fbd1e8a09c43245b65f5e3
tree:      0e45865a3b60318487e491d2e11a6a74c711e77f /libavcodec/ffv1_parse.c
parent:    fc44ccd9814f6a7bdbc1cd96d6aa53c299a41272
swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422
Currently, the SIMD loop of the AVX2 version of uyvytoyuv422 does the following:
- 4 vinserti128 to interleave the vector lanes while loading from memory.
- 4 vperm2i128, inside 4 RSHIFT_COPY calls, to achieve the desired layout.
This patch replaces these 8 instructions with 2 vpermq and 2 vpermd
(the latter taking their indices from a vector register), similar to the
AVX512ICL version.
The following benchmark numbers were observed on various microarchitectures
(time, lower is better; speedup over the C version in parentheses):
On an AMD Zen3 laptop:
Before:
uyvytoyuv422_c: 51979.7 ( 1.00x)
uyvytoyuv422_sse2: 5410.5 ( 9.61x)
uyvytoyuv422_avx: 4642.7 (11.20x)
uyvytoyuv422_avx2: 4249.0 (12.23x)
After:
uyvytoyuv422_c: 51659.8 ( 1.00x)
uyvytoyuv422_sse2: 5420.8 ( 9.53x)
uyvytoyuv422_avx: 4651.2 (11.11x)
uyvytoyuv422_avx2: 3953.8 (13.07x)
On an Intel MacBook Pro (2019):
Before:
uyvytoyuv422_c: 185014.4 ( 1.00x)
uyvytoyuv422_sse2: 22800.4 ( 8.11x)
uyvytoyuv422_avx: 19796.9 ( 9.35x)
uyvytoyuv422_avx2: 13141.9 (14.08x)
After:
uyvytoyuv422_c: 185093.4 ( 1.00x)
uyvytoyuv422_sse2: 22795.4 ( 8.12x)
uyvytoyuv422_avx: 19791.9 ( 9.35x)
uyvytoyuv422_avx2: 12043.1 (15.37x)
On an AMD Zen4 desktop:
Before:
uyvytoyuv422_c: 29105.0 ( 1.00x)
uyvytoyuv422_sse2: 3888.0 ( 7.49x)
uyvytoyuv422_avx: 3374.2 ( 8.63x)
uyvytoyuv422_avx2: 2649.8 (10.98x)
uyvytoyuv422_avx512icl: 1615.0 (18.02x)
After:
uyvytoyuv422_c: 29093.4 ( 1.00x)
uyvytoyuv422_sse2: 3874.4 ( 7.51x)
uyvytoyuv422_avx: 3371.6 ( 8.63x)
uyvytoyuv422_avx2: 2174.6 (13.38x)
uyvytoyuv422_avx512icl: 1625.1 (17.90x)
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Diffstat (limited to 'libavcodec/ffv1_parse.c')
0 files changed, 0 insertions, 0 deletions