author: Shreesh Adiga <16567adigashreesh@gmail.com> 2025-02-20 18:51:38 +0530
committer: Kieran Kunhya <kierank@obe.tv> 2025-03-23 15:25:48 +0000
commit: 26f2f03e0de278f296fbd1e8a09c43245b65f5e3
tree: 0e45865a3b60318487e491d2e11a6a74c711e77f /libavcodec/avs3_parser.c
parent: fc44ccd9814f6a7bdbc1cd96d6aa53c299a41272
download: ffmpeg-26f2f03e0de278f296fbd1e8a09c43245b65f5e3.tar.gz

swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the following:
4 vinsertq to have interleaving of the vector lanes during load from memory.
4 vperm2i128 inside 4 RSHIFT_COPY calls to achieve the desired layout.

This patch replaces the above 8 instructions with 2 vpermq and
2 vpermd with a vector register similar to AVX512ICL version.

Observed the following numbers on various microarchitectures:

On AMD Zen3 laptop:
Before:
uyvytoyuv422_c:            51979.7 ( 1.00x)
uyvytoyuv422_sse2:          5410.5 ( 9.61x)
uyvytoyuv422_avx:           4642.7 (11.20x)
uyvytoyuv422_avx2:          4249.0 (12.23x)

After:
uyvytoyuv422_c:            51659.8 ( 1.00x)
uyvytoyuv422_sse2:          5420.8 ( 9.53x)
uyvytoyuv422_avx:           4651.2 (11.11x)
uyvytoyuv422_avx2:          3953.8 (13.07x)

On Intel Macbook Pro 2019:
Before:
uyvytoyuv422_c:           185014.4 ( 1.00x)
uyvytoyuv422_sse2:         22800.4 ( 8.11x)
uyvytoyuv422_avx:          19796.9 ( 9.35x)
uyvytoyuv422_avx2:         13141.9 (14.08x)

After:
uyvytoyuv422_c:           185093.4 ( 1.00x)
uyvytoyuv422_sse2:         22795.4 ( 8.12x)
uyvytoyuv422_avx:          19791.9 ( 9.35x)
uyvytoyuv422_avx2:         12043.1 (15.37x)

On AMD Zen4 desktop:
Before:
uyvytoyuv422_c:            29105.0 ( 1.00x)
uyvytoyuv422_sse2:          3888.0 ( 7.49x)
uyvytoyuv422_avx:           3374.2 ( 8.63x)
uyvytoyuv422_avx2:          2649.8 (10.98x)
uyvytoyuv422_avx512icl:     1615.0 (18.02x)

After:
uyvytoyuv422_c:            29093.4 ( 1.00x)
uyvytoyuv422_sse2:          3874.4 ( 7.51x)
uyvytoyuv422_avx:           3371.6 ( 8.63x)
uyvytoyuv422_avx2:          2174.6 (13.38x)
uyvytoyuv422_avx512icl:     1625.1 (17.90x)

Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
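
For context on what the SIMD loop computes, here is a minimal scalar sketch in C of the deinterleave performed on one row, assuming the usual packed UYVY byte order U0 Y0 V0 Y1. The function name uyvy_row_to_planar and its signature are hypothetical and chosen only for illustration; this is not FFmpeg's C reference, and it ignores strides and odd widths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch (not FFmpeg code): split one row of packed
 * UYVY (byte order U0 Y0 V0 Y1) into planar Y, U and V buffers.
 * `width` is the number of luma samples and is assumed to be even. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *ydst,
                               uint8_t *udst, uint8_t *vdst, size_t width)
{
    /* Each 4-byte group carries two luma samples and one chroma pair. */
    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0];
        ydst[2 * i + 0] = src[4 * i + 1];
        vdst[i]         = src[4 * i + 2];
        ydst[2 * i + 1] = src[4 * i + 3];
    }
}
```

The vector versions produce the same output for many pixels per iteration. Most AVX2 byte shuffle and pack instructions work within each 128-bit lane, which is presumably why lane-crossing permutes such as vpermq and vpermd are needed to put the results in the desired order.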