aboutsummaryrefslogtreecommitdiffstats
path: root/libswscale
Commit message (Collapse)AuthorAgeFilesLines
* swscale/output: don't leave the alpha channel undefined in vuyx and xv36leJames Almer2024-08-131-32/+6
| | | | | | It's non-determistic, as shown by poisoning avfilter buffers instead of zeroing them. Signed-off-by: James Almer <jamrial@gmail.com>
* sws/riscv: depend on RVB and simplify accordinglyRémi Denis-Courmont2024-08-052-7/+5
|
* lavu/riscv: count bytes rather than words for bswap32Rémi Denis-Courmont2024-07-302-3/+2
| | | | This removes the dependency on Zba at essentially zero cost.
* swscale: [loongarch] Fix checkasm-sw_yuv2rgb failure.Shiyou Yin2024-07-282-52/+56
| | | | | Reviewed-by: 陈昊 <chenhao@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* sws/riscv: add forward-edge CFI landing padsRémi Denis-Courmont2024-07-254-0/+24
|
* sws/riscv: require B or zba explicitlyRémi Denis-Courmont2024-07-254-17/+17
|
* swscale/output: Fix integer overflows in yuv2rgba64_X_c_templateMichael Niedermayer2024-07-211-13/+13
| | | | | | | | Fixes: signed integer overflow: -1082982400 + -1068681048 cannot be represented in type 'int' Fixes: 69995/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6285740271534080 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/utils: fix leak on threaded ctx init failureNiklas Haas2024-07-141-2/+1
| | | | | | | | | | | | This count gets incremented after init succeeds, when it should be incremented after *alloc* succeeds. Otherwise, we leak the context on failure. There are no negative consequences of incrementing for allocated-but-not-initialized contexts, as the only functions that reference it will, in the worst case, simply behave as if called on allocated-but-not-initialized contexts, which is in line with expected behavior when sws_init_context() fails.
* swscale: prevent undefined behaviour in the PUTRGBA macroSean McGovern2024-07-101-2/+2
| | | | | | | | | | | For even small values of 'asrc[x]', shifting them by 24 bits or more will cause arithmetic overflow and be caught by GCC's undefined behaviour sanitizer. Ensure the values do not overflow by up-casting the bracketed expressions involving 'asrc' to uint32_t. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/x86/yuv2rgb: DetemplatizeRamiro Polla2024-07-102-193/+162
| | | | | Every function in yuv2rgb_template.c is only compiled exactly once, so detemplatize it.
* swscale: remove unconditional #define DITHER1XBPPRamiro Polla2024-07-106-34/+0
| | | | | This seems to have had an use in the past, but it is now defined unconditionally.
* swscale/swscale: Use ptrdiff_t for linesize computationsMichael Niedermayer2024-07-071-2/+2
| | | | | | | | | | This is unlikely to make a difference Fixes: CID1591896 Unintentional integer overflow Fixes: CID1591901 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/aarch64: Add argb/abgr to yuvZhao Zhili2024-07-052-21/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test on Apple M1 with kperf: : -O3 : -O3 -fno-vectorize abgr_to_uv_8_c : 19.4 : 26.1 abgr_to_uv_8_neon : 29.9 : 51.1 abgr_to_uv_128_c : 146.4 : 558.9 abgr_to_uv_128_neon : 85.1 : 83.4 abgr_to_uv_1080_c : 1162.6 : 4786.4 abgr_to_uv_1080_neon : 819.6 : 826.6 abgr_to_uv_1920_c : 2063.6 : 8492.1 abgr_to_uv_1920_neon : 1435.1 : 1447.1 abgr_to_uv_half_8_c : 16.4 : 11.4 abgr_to_uv_half_8_neon : 35.6 : 20.4 abgr_to_uv_half_128_c : 108.6 : 359.4 abgr_to_uv_half_128_neon : 75.4 : 42.6 abgr_to_uv_half_1080_c : 883.4 : 2885.6 abgr_to_uv_half_1080_neon : 460.6 : 481.1 abgr_to_uv_half_1920_c : 1553.6 : 5106.9 abgr_to_uv_half_1920_neon : 817.6 : 820.4 abgr_to_y_8_c : 6.1 : 26.4 abgr_to_y_8_neon : 40.6 : 6.4 abgr_to_y_128_c : 99.9 : 390.1 abgr_to_y_128_neon : 67.4 : 55.9 abgr_to_y_1080_c : 735.9 : 3170.4 abgr_to_y_1080_neon : 534.6 : 536.6 abgr_to_y_1920_c : 1279.4 : 6016.4 abgr_to_y_1920_neon : 932.6 : 927.6 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* swscale/aarch64: Add bgra/rgba to yuvZhao Zhili2024-07-052-16/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c : 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2 : 4558.2 bgra_to_uv_1080_neon : 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2 : 8115.2 bgra_to_uv_1920_neon : 1437.7 : 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon : 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2 : 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon : 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c : 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c : 1298.7 : 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* swscale/aarch64: Add bgr24 to yuvZhao Zhili2024-07-052-32/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgr24_to_uv_8_c : 28.5 : 52.5 bgr24_to_uv_8_neon : 54.5 : 59.7 bgr24_to_uv_128_c : 294.0 : 830.7 bgr24_to_uv_128_neon : 99.7 : 112.0 bgr24_to_uv_1080_c : 965.0 : 6624.0 bgr24_to_uv_1080_neon : 751.5 : 754.7 bgr24_to_uv_1920_c : 1693.2 : 11554.5 bgr24_to_uv_1920_neon : 1292.5 : 1307.5 bgr24_to_uv_half_8_c : 54.2 : 37.0 bgr24_to_uv_half_8_neon : 27.2 : 22.5 bgr24_to_uv_half_128_c : 127.2 : 392.5 bgr24_to_uv_half_128_neon : 63.0 : 52.0 bgr24_to_uv_half_1080_c : 880.2 : 3329.0 bgr24_to_uv_half_1080_neon : 401.5 : 390.7 bgr24_to_uv_half_1920_c : 1585.7 : 6390.7 bgr24_to_uv_half_1920_neon : 694.7 : 698.7 bgr24_to_y_8_c : 21.7 : 22.5 bgr24_to_y_8_neon : 797.2 : 25.5 bgr24_to_y_128_c : 88.0 : 280.5 bgr24_to_y_128_neon : 63.7 : 55.0 bgr24_to_y_1080_c : 616.7 : 2208.7 bgr24_to_y_1080_neon : 900.0 : 452.0 bgr24_to_y_1920_c : 1093.2 : 3894.7 bgr24_to_y_1920_neon : 777.2 : 767.5 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* swscale/yuv2rgb/x86: remove mmx/mmxext yuv2rgb functionsRamiro Polla2024-07-043-145/+3
| | | | | These functions are either slower or barely faster than the C LUT yuv2rgb code.
* swscale/output: Avoid undefined overflow in yuv2rgb_write_full()Michael Niedermayer2024-06-261-3/+3
| | | | | | | | Fixes: signed integer overflow: -140140 * 16525 cannot be represented in type 'int' Fixes: 68859/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4516387130245120 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/output: alpha can become negative after scaling, use multiplyMichael Niedermayer2024-06-261-6/+6
| | | | | | | | Fixes: left shift of negative value -3245 Fixes: 69047/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6571511551950848 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/yuv2rgb: reindent after previous commitRamiro Polla2024-06-241-39/+39
|
* swscale/yuv2rgb: fix yuv422p input in C codeRamiro Polla2024-06-241-32/+196
| | | | | The C code was silently ignoring the second chroma line on yuv422p input.
* swscale/yuv2rgb: add macros to simplify code generationRamiro Polla2024-06-241-461/+113
|
* swscale/yuv2rgb: fix conversion for widths not aligned to 8Ramiro Polla2024-06-241-8/+93
| | | | | | | | The C code for some pixel formats (rgb555, rgb565, rgb444, and monob) was not converting the last pixels on widths not aligned to 8. NOTE: the last pixel for odd widths is still not converted for any of the pixel formats in the C code for yuv2rgb except for monob.
* swscale/aarch64: add neon {lum,chr}ConvertRangeRamiro Polla2024-06-185-1/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | chrRangeFromJpeg_8_c: 29.2 chrRangeFromJpeg_8_neon: 19.5 chrRangeFromJpeg_24_c: 80.5 chrRangeFromJpeg_24_neon: 34.0 chrRangeFromJpeg_128_c: 413.7 chrRangeFromJpeg_128_neon: 156.0 chrRangeFromJpeg_144_c: 471.0 chrRangeFromJpeg_144_neon: 174.2 chrRangeFromJpeg_256_c: 842.0 chrRangeFromJpeg_256_neon: 305.5 chrRangeFromJpeg_512_c: 1699.0 chrRangeFromJpeg_512_neon: 608.0 chrRangeToJpeg_8_c: 51.7 chrRangeToJpeg_8_neon: 22.7 chrRangeToJpeg_24_c: 149.7 chrRangeToJpeg_24_neon: 38.0 chrRangeToJpeg_128_c: 761.7 chrRangeToJpeg_128_neon: 176.7 chrRangeToJpeg_144_c: 866.2 chrRangeToJpeg_144_neon: 198.7 chrRangeToJpeg_256_c: 1516.5 chrRangeToJpeg_256_neon: 348.7 chrRangeToJpeg_512_c: 3067.2 chrRangeToJpeg_512_neon: 692.7 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 17.0 lumRangeFromJpeg_24_c: 56.7 lumRangeFromJpeg_24_neon: 21.0 lumRangeFromJpeg_128_c: 294.5 lumRangeFromJpeg_128_neon: 76.7 lumRangeFromJpeg_144_c: 332.5 lumRangeFromJpeg_144_neon: 86.7 lumRangeFromJpeg_256_c: 586.0 lumRangeFromJpeg_256_neon: 152.2 lumRangeFromJpeg_512_c: 1190.0 lumRangeFromJpeg_512_neon: 298.0 lumRangeToJpeg_8_c: 31.7 lumRangeToJpeg_8_neon: 19.5 lumRangeToJpeg_24_c: 83.5 lumRangeToJpeg_24_neon: 24.2 lumRangeToJpeg_128_c: 440.5 lumRangeToJpeg_128_neon: 91.0 lumRangeToJpeg_144_c: 504.2 lumRangeToJpeg_144_neon: 101.0 lumRangeToJpeg_256_c: 879.7 lumRangeToJpeg_256_neon: 177.2 lumRangeToJpeg_512_c: 1794.2 lumRangeToJpeg_512_neon: 354.0
* swscale/x86/range_convert: add missing AVX2 preprocessor wrapperJames Almer2024-06-161-0/+2
| | | | | | Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functionsJames Almer2024-06-151-10/+10
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86: add sse2 and avx2 {lum,chr}ConvertRangeRamiro Polla2024-06-165-0/+173
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | chrRangeFromJpeg_8_c: 22.3 chrRangeFromJpeg_8_sse2: 13.3 chrRangeFromJpeg_8_avx2: 13.3 chrRangeFromJpeg_24_c: 72.8 chrRangeFromJpeg_24_sse2: 22.3 chrRangeFromJpeg_24_avx2: 17.5 chrRangeFromJpeg_128_c: 345.5 chrRangeFromJpeg_128_sse2: 106.0 chrRangeFromJpeg_128_avx2: 57.8 chrRangeFromJpeg_144_c: 380.5 chrRangeFromJpeg_144_sse2: 118.5 chrRangeFromJpeg_144_avx2: 62.3 chrRangeFromJpeg_256_c: 646.3 chrRangeFromJpeg_256_sse2: 218.8 chrRangeFromJpeg_256_avx2: 109.0 chrRangeFromJpeg_512_c: 1461.5 chrRangeFromJpeg_512_sse2: 426.5 chrRangeFromJpeg_512_avx2: 211.5 chrRangeToJpeg_8_c: 37.8 chrRangeToJpeg_8_sse2: 10.5 chrRangeToJpeg_8_avx2: 14.0 chrRangeToJpeg_24_c: 114.3 chrRangeToJpeg_24_sse2: 23.5 chrRangeToJpeg_24_avx2: 16.3 chrRangeToJpeg_128_c: 633.5 chrRangeToJpeg_128_sse2: 107.5 chrRangeToJpeg_128_avx2: 55.0 chrRangeToJpeg_144_c: 758.3 chrRangeToJpeg_144_sse2: 132.0 chrRangeToJpeg_144_avx2: 64.5 chrRangeToJpeg_256_c: 1345.0 chrRangeToJpeg_256_sse2: 218.0 chrRangeToJpeg_256_avx2: 105.3 chrRangeToJpeg_512_c: 2524.0 chrRangeToJpeg_512_sse2: 417.0 chrRangeToJpeg_512_avx2: 218.8 lumRangeFromJpeg_8_c: 11.8 lumRangeFromJpeg_8_sse2: 11.0 lumRangeFromJpeg_8_avx2: 10.3 lumRangeFromJpeg_24_c: 38.5 lumRangeFromJpeg_24_sse2: 15.5 lumRangeFromJpeg_24_avx2: 12.5 lumRangeFromJpeg_128_c: 232.3 lumRangeFromJpeg_128_sse2: 60.0 lumRangeFromJpeg_128_avx2: 26.8 lumRangeFromJpeg_144_c: 259.5 lumRangeFromJpeg_144_sse2: 65.3 lumRangeFromJpeg_144_avx2: 29.0 lumRangeFromJpeg_256_c: 464.5 lumRangeFromJpeg_256_sse2: 107.5 lumRangeFromJpeg_256_avx2: 54.0 lumRangeFromJpeg_512_c: 897.5 lumRangeFromJpeg_512_sse2: 224.5 lumRangeFromJpeg_512_avx2: 109.8 lumRangeToJpeg_8_c: 17.8 lumRangeToJpeg_8_sse2: 11.0 lumRangeToJpeg_8_avx2: 11.8 lumRangeToJpeg_24_c: 56.3 lumRangeToJpeg_24_sse2: 11.0 lumRangeToJpeg_24_avx2: 12.5 lumRangeToJpeg_128_c: 333.8 lumRangeToJpeg_128_sse2: 53.3 lumRangeToJpeg_128_avx2: 26.5 lumRangeToJpeg_144_c: 375.5 lumRangeToJpeg_144_sse2: 60.8 lumRangeToJpeg_144_avx2: 29.0 lumRangeToJpeg_256_c: 652.0 lumRangeToJpeg_256_sse2: 109.5 lumRangeToJpeg_256_avx2: 53.5 lumRangeToJpeg_512_c: 1284.3 lumRangeToJpeg_512_sse2: 218.0 lumRangeToJpeg_512_avx2: 108.3
* riscv: probe for Zbb extension at load timeRémi Denis-Courmont2024-06-112-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably: - Zba: address generation extension and - Zbb: basic bit manipulation extension. Most CPUs that would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb. Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it on need basis. This results in 3 instructions overhead on isolated use, e.g.: 1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported) LBU rd, %pcrel_lo(1b)(rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here The C compiler will typically load the flag ahead of time to reducing latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope. For this to work, the flag symbol must be hidden; otherwise the optimisation degrades with a GOT look-up to support interposition: 1: AUIPC rd, GOT_OFFSET_HI LD rd, GOT_OFFSET_LO(rd) LBU rd, (rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.
* sws/range_convert: R-V V to/from JPEGRémi Denis-Courmont2024-06-103-1/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C908 X60 chrRangeFromJpeg_8_c: 2.7 2.5 chrRangeFromJpeg_8_rvv_i32: 1.7 1.5 chrRangeFromJpeg_24_c: 7.5 6.7 chrRangeFromJpeg_24_rvv_i32: 1.7 1.5 chrRangeFromJpeg_128_c: 55.2 34.7 chrRangeFromJpeg_128_rvv_i32: 6.5 3.0 chrRangeFromJpeg_144_c: 44.0 39.2 chrRangeFromJpeg_144_rvv_i32: 7.7 4.5 chrRangeFromJpeg_256_c: 78.2 69.5 chrRangeFromJpeg_256_rvv_i32: 12.2 6.0 chrRangeFromJpeg_512_c: 172.2 138.5 chrRangeFromJpeg_512_rvv_i32: 24.5 11.7 chrRangeToJpeg_8_c: 4.7 4.2 chrRangeToJpeg_8_rvv_i32: 2.0 1.7 chrRangeToJpeg_24_c: 13.7 12.2 chrRangeToJpeg_24_rvv_i32: 2.0 1.5 chrRangeToJpeg_128_c: 72.0 63.7 chrRangeToJpeg_128_rvv_i32: 6.7 3.2 chrRangeToJpeg_144_c: 80.7 71.7 chrRangeToJpeg_144_rvv_i32: 8.5 4.7 chrRangeToJpeg_256_c: 143.2 127.2 chrRangeToJpeg_256_rvv_i32: 13.5 6.5 chrRangeToJpeg_512_c: 285.7 253.7 chrRangeToJpeg_512_rvv_i32: 27.0 13.0 lumRangeFromJpeg_8_c: 1.7 1.5 lumRangeFromJpeg_8_rvv_i32: 1.2 1.0 lumRangeFromJpeg_24_c: 4.2 3.7 lumRangeFromJpeg_24_rvv_i32: 1.2 1.0 lumRangeFromJpeg_128_c: 21.7 19.2 lumRangeFromJpeg_128_rvv_i32: 3.7 1.7 lumRangeFromJpeg_144_c: 24.7 22.0 lumRangeFromJpeg_144_rvv_i32: 4.7 2.7 lumRangeFromJpeg_256_c: 43.7 39.0 lumRangeFromJpeg_256_rvv_i32: 7.5 3.2 lumRangeFromJpeg_512_c: 87.0 77.2 lumRangeFromJpeg_512_rvv_i32: 14.5 6.7 lumRangeToJpeg_8_c: 2.7 2.2 lumRangeToJpeg_8_rvv_i32: 1.0 1.0 lumRangeToJpeg_24_c: 7.2 6.5 lumRangeToJpeg_24_rvv_i32: 1.2 1.0 lumRangeToJpeg_128_c: 37.7 33.7 lumRangeToJpeg_128_rvv_i32: 3.7 2.0 lumRangeToJpeg_144_c: 42.5 37.7 lumRangeToJpeg_144_rvv_i32: 4.7 2.7 lumRangeToJpeg_256_c: 75.0 66.7 lumRangeToJpeg_256_rvv_i32: 7.5 3.5 lumRangeToJpeg_512_c: 149.5 133.0 lumRangeToJpeg_512_rvv_i32: 14.7 7.0
* swscale/aarch64: Add rgb24 to yuv implementationZhao Zhili2024-06-113-0/+228
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test on Apple M1: rgb24_to_uv_8_c: 0.0 rgb24_to_uv_8_neon: 0.2 rgb24_to_uv_128_c: 1.0 rgb24_to_uv_128_neon: 0.5 rgb24_to_uv_1080_c: 7.0 rgb24_to_uv_1080_neon: 5.7 rgb24_to_uv_1920_c: 12.5 rgb24_to_uv_1920_neon: 9.5 rgb24_to_uv_half_8_c: 0.2 rgb24_to_uv_half_8_neon: 0.2 rgb24_to_uv_half_128_c: 1.0 rgb24_to_uv_half_128_neon: 0.5 rgb24_to_uv_half_1080_c: 6.2 rgb24_to_uv_half_1080_neon: 3.0 rgb24_to_uv_half_1920_c: 11.2 rgb24_to_uv_half_1920_neon: 5.2 rgb24_to_y_8_c: 0.2 rgb24_to_y_8_neon: 0.0 rgb24_to_y_128_c: 0.5 rgb24_to_y_128_neon: 0.5 rgb24_to_y_1080_c: 4.7 rgb24_to_y_1080_neon: 3.2 rgb24_to_y_1920_c: 8.0 rgb24_to_y_1920_neon: 5.7 On Pixel 6: rgb24_to_uv_8_c: 30.7 rgb24_to_uv_8_neon: 56.9 rgb24_to_uv_128_c: 213.9 rgb24_to_uv_128_neon: 173.2 rgb24_to_uv_1080_c: 1649.9 rgb24_to_uv_1080_neon: 1424.4 rgb24_to_uv_1920_c: 2907.9 rgb24_to_uv_1920_neon: 2480.7 rgb24_to_uv_half_8_c: 36.2 rgb24_to_uv_half_8_neon: 33.4 rgb24_to_uv_half_128_c: 167.9 rgb24_to_uv_half_128_neon: 99.4 rgb24_to_uv_half_1080_c: 1293.9 rgb24_to_uv_half_1080_neon: 778.7 rgb24_to_uv_half_1920_c: 2292.7 rgb24_to_uv_half_1920_neon: 1328.7 rgb24_to_y_8_c: 19.7 rgb24_to_y_8_neon: 27.7 rgb24_to_y_128_c: 129.9 rgb24_to_y_128_neon: 96.7 rgb24_to_y_1080_c: 995.4 rgb24_to_y_1080_neon: 767.7 rgb24_to_y_1920_c: 1747.4 rgb24_to_y_1920_neon: 1337.2 Note both tests use clang as compiler, which has vectorization enabled by default with -O3. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
* swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2James Almer2024-06-091-0/+2
| | | | | | Fixes old yasm. Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2James Almer2024-06-091-1/+1
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103James Almer2024-06-092-68/+0
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/input: add AVX2 optimized uyvytoyuv422James Almer2024-06-092-8/+30
| | | | | | | | | uyvytoyuv422_c: 23991.8 uyvytoyuv422_sse2: 2817.8 uyvytoyuv422_avx: 2819.3 uyvytoyuv422_avx2: 1972.3 Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/input: add AVX2 optimized RGB32 to YUV functionsJames Almer2024-06-092-28/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | abgr_to_uv_8_c: 43.3 abgr_to_uv_8_sse2: 14.3 abgr_to_uv_8_avx: 15.3 abgr_to_uv_8_avx2: 18.8 abgr_to_uv_128_c: 650.3 abgr_to_uv_128_sse2: 110.8 abgr_to_uv_128_avx: 112.3 abgr_to_uv_128_avx2: 64.8 abgr_to_uv_1080_c: 5456.3 abgr_to_uv_1080_sse2: 888.8 abgr_to_uv_1080_avx: 900.8 abgr_to_uv_1080_avx2: 518.3 abgr_to_uv_1920_c: 9692.3 abgr_to_uv_1920_sse2: 1593.8 abgr_to_uv_1920_avx: 1613.3 abgr_to_uv_1920_avx2: 864.8 abgr_to_y_8_c: 23.3 abgr_to_y_8_sse2: 12.8 abgr_to_y_8_avx: 13.3 abgr_to_y_8_avx2: 17.3 abgr_to_y_128_c: 308.3 abgr_to_y_128_sse2: 67.3 abgr_to_y_128_avx: 66.8 abgr_to_y_128_avx2: 44.8 abgr_to_y_1080_c: 2371.3 abgr_to_y_1080_sse2: 512.8 abgr_to_y_1080_avx: 505.8 abgr_to_y_1080_avx2: 314.3 abgr_to_y_1920_c: 4177.3 abgr_to_y_1920_sse2: 915.8 abgr_to_y_1920_avx: 926.8 abgr_to_y_1920_avx2: 519.3 bgra_to_uv_8_c: 37.3 bgra_to_uv_8_sse2: 13.3 bgra_to_uv_8_avx: 14.8 bgra_to_uv_8_avx2: 19.8 bgra_to_uv_128_c: 563.8 bgra_to_uv_128_sse2: 111.3 bgra_to_uv_128_avx: 112.3 bgra_to_uv_128_avx2: 64.8 bgra_to_uv_1080_c: 4691.8 bgra_to_uv_1080_sse2: 893.8 bgra_to_uv_1080_avx: 899.8 bgra_to_uv_1080_avx2: 517.8 bgra_to_uv_1920_c: 8332.8 bgra_to_uv_1920_sse2: 1590.8 bgra_to_uv_1920_avx: 1605.8 bgra_to_uv_1920_avx2: 867.3 bgra_to_y_8_c: 22.3 bgra_to_y_8_sse2: 12.8 bgra_to_y_8_avx: 12.8 bgra_to_y_8_avx2: 17.3 bgra_to_y_128_c: 291.3 bgra_to_y_128_sse2: 67.8 bgra_to_y_128_avx: 69.3 bgra_to_y_128_avx2: 45.3 bgra_to_y_1080_c: 2357.3 bgra_to_y_1080_sse2: 508.3 bgra_to_y_1080_avx: 518.3 bgra_to_y_1080_avx2: 399.8 bgra_to_y_1920_c: 4202.8 bgra_to_y_1920_sse2: 906.8 bgra_to_y_1920_avx: 907.3 bgra_to_y_1920_avx2: 526.3 Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/input: add AVX2 optimized RGB24 to YUV functionsJames Almer2024-06-092-28/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rgb24_to_uv_8_c: 39.3 rgb24_to_uv_8_sse2: 14.3 rgb24_to_uv_8_ssse3: 13.3 rgb24_to_uv_8_avx: 12.8 rgb24_to_uv_8_avx2: 14.3 rgb24_to_uv_128_c: 582.8 rgb24_to_uv_128_sse2: 127.3 rgb24_to_uv_128_ssse3: 107.3 rgb24_to_uv_128_avx: 111.3 rgb24_to_uv_128_avx2: 62.3 rgb24_to_uv_1080_c: 4981.3 rgb24_to_uv_1080_sse2: 1048.3 rgb24_to_uv_1080_ssse3: 876.8 rgb24_to_uv_1080_avx: 887.8 rgb24_to_uv_1080_avx2: 492.3 rgb24_to_uv_1280_c: 5906.8 rgb24_to_uv_1280_sse2: 1263.3 rgb24_to_uv_1280_ssse3: 1048.3 rgb24_to_uv_1280_avx: 1045.8 rgb24_to_uv_1280_avx2: 579.8 rgb24_to_uv_1920_c: 8665.3 rgb24_to_uv_1920_sse2: 1888.8 rgb24_to_uv_1920_ssse3: 1571.8 rgb24_to_uv_1920_avx: 1558.8 rgb24_to_uv_1920_avx2: 869.3 rgb24_to_y_8_c: 20.3 rgb24_to_y_8_sse2: 11.8 rgb24_to_y_8_ssse3: 10.3 rgb24_to_y_8_avx: 10.3 rgb24_to_y_8_avx2: 10.8 rgb24_to_y_128_c: 284.8 rgb24_to_y_128_sse2: 83.3 rgb24_to_y_128_ssse3: 66.8 rgb24_to_y_128_avx: 64.8 rgb24_to_y_128_avx2: 39.3 rgb24_to_y_1080_c: 2451.3 rgb24_to_y_1080_sse2: 696.3 rgb24_to_y_1080_ssse3: 516.8 rgb24_to_y_1080_avx: 518.8 rgb24_to_y_1080_avx2: 301.8 rgb24_to_y_1280_c: 2892.8 rgb24_to_y_1280_sse2: 816.8 rgb24_to_y_1280_ssse3: 623.3 rgb24_to_y_1280_avx: 616.3 rgb24_to_y_1280_avx2: 350.8 rgb24_to_y_1920_c: 4338.8 rgb24_to_y_1920_sse2: 1210.8 rgb24_to_y_1920_ssse3: 928.3 rgb24_to_y_1920_avx: 920.3 rgb24_to_y_1920_avx2: 534.8 Signed-off-by: James Almer <jamrial@gmail.com>
* sws/input: R-V V 32-bit RGB to halved UVRémi Denis-Courmont2024-06-092-4/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | T-Head C908: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.5 abgr_to_uv_half_128_c: 44.0 abgr_to_uv_half_128_rvv_i32: 13.0 abgr_to_uv_half_1080_c: 245.0 abgr_to_uv_half_1080_rvv_i32: 107.2 abgr_to_uv_half_1920_c: 406.2 abgr_to_uv_half_1920_rvv_i32: 188.7 bgra_to_uv_half_8_c: 2.2 bgra_to_uv_half_8_rvv_i32: 3.5 bgra_to_uv_half_128_c: 26.5 bgra_to_uv_half_128_rvv_i32: 13.0 bgra_to_uv_half_1080_c: 219.7 bgra_to_uv_half_1080_rvv_i32: 107.0 bgra_to_uv_half_1920_c: 406.7 bgra_to_uv_half_1920_rvv_i32: 188.7 SpacemiT X60: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.0 abgr_to_uv_half_128_c: 28.2 abgr_to_uv_half_128_rvv_i32: 5.7 abgr_to_uv_half_1080_c: 235.5 abgr_to_uv_half_1080_rvv_i32: 47.7 abgr_to_uv_half_1920_c: 418.2 abgr_to_uv_half_1920_rvv_i32: 84.0 bgra_to_uv_half_8_c: 2.0 bgra_to_uv_half_8_rvv_i32: 3.0 bgra_to_uv_half_128_c: 23.7 bgra_to_uv_half_128_rvv_i32: 5.7 bgra_to_uv_half_1080_c: 195.5 bgra_to_uv_half_1080_rvv_i32: 47.7 bgra_to_uv_half_1920_c: 346.5 bgra_to_uv_half_1920_rvv_i32: 84.0
* sws/input: R-V V 32-bit RGB to UVRémi Denis-Courmont2024-06-092-0/+60
|
* sws/input: R-V V 32-bit RGB to YRémi Denis-Courmont2024-06-092-14/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | T-Head C908: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.2 abgr_to_y_128_c: 37.0 abgr_to_y_128_rvv_i32: 8.5 abgr_to_y_1080_c: 327.0 abgr_to_y_1080_rvv_i32: 69.5 abgr_to_y_1920_c: 552.0 abgr_to_y_1920_rvv_i32: 122.2 bgra_to_y_8_c: 2.5 bgra_to_y_8_rvv_i32: 2.2 bgra_to_y_128_c: 37.2 bgra_to_y_128_rvv_i32: 8.5 bgra_to_y_1080_c: 310.2 bgra_to_y_1080_rvv_i32: 69.5 bgra_to_y_1920_c: 568.2 bgra_to_y_1920_rvv_i32: 122.5 SpacemiT X60: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.0 abgr_to_y_128_c: 33.0 abgr_to_y_128_rvv_i32: 3.7 abgr_to_y_1080_c: 276.0 abgr_to_y_1080_rvv_i32: 31.5 abgr_to_y_1920_c: 493.7 abgr_to_y_1920_rvv_i32: 55.5 bgra_to_y_8_c: 2.2 bgra_to_y_8_rvv_i32: 2.0 bgra_to_y_128_c: 33.0 bgra_to_y_128_rvv_i32: 3.7 bgra_to_y_1080_c: 276.0 bgra_to_y_1080_rvv_i32: 31.5 bgra_to_y_1920_c: 490.7 bgra_to_y_1920_rvv_i32: 55.5
* swscale/x86/rgb2rgb: DetemplatizeAndreas Rheinhardt2024-06-092-2334/+2253
| | | | | | | | Every function in rgb2rgb_template.c is only compiled exactly once; there is no overlap at all between the MMXEXT and the SSE2 functions, so detemplatize it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12Andreas Rheinhardt2024-06-091-104/+0
| | | | Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASMAndreas Rheinhardt2024-06-092-42/+36
| | | | | | | The SSE2 and AVX versions of deinterleaveBytes are external ASM. Move them out of the inline ASM template. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb_template: Remove unnecessary SFENCEAndreas Rheinhardt2024-06-091-4/+0
| | | | | | | The ff_nv12ToUV_* functions don't use non-temporal stores at all. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* sws/input: R-V V rgb24ToUV_half and bgr24ToUV_halfRémi Denis-Courmont2024-06-082-2/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | T-Head C908: rgb24_to_uv_half_4_c: 2.0 rgb24_to_uv_half_4_rvv_i32: 3.5 rgb24_to_uv_half_64_c: 27.0 rgb24_to_uv_half_64_rvv_i32: 12.5 rgb24_to_uv_half_540_c: 223.7 rgb24_to_uv_half_540_rvv_i32: 105.2 rgb24_to_uv_half_640_c: 265.5 rgb24_to_uv_half_640_rvv_i32: 123.7 rgb24_to_uv_half_960_c: 414.5 rgb24_to_uv_half_960_rvv_i32: 249.5 SpacemiT X60: rgb24_to_uv_half_4_c: 1.7 rgb24_to_uv_half_4_rvv_i32: 4.2 rgb24_to_uv_half_64_c: 24.0 rgb24_to_uv_half_64_rvv_i32: 8.7 rgb24_to_uv_half_540_c: 199.2 rgb24_to_uv_half_540_rvv_i32: 72.5 rgb24_to_uv_half_640_c: 235.7 rgb24_to_uv_half_640_rvv_i32: 85.2 rgb24_to_uv_half_960_c: 353.5 rgb24_to_uv_half_960_rvv_i32: 127.5
* sws/input: R-V V rgb24ToUV and bgr24ToUVRémi Denis-Courmont2024-06-082-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | T-Head C908: rgb24_to_uv_8_c: 2.7 rgb24_to_uv_8_rvv_i32: 3.2 rgb24_to_uv_128_c: 41.0 rgb24_to_uv_128_rvv_i32: 12.7 rgb24_to_uv_1080_c: 342.5 rgb24_to_uv_1080_rvv_i32: 105.7 rgb24_to_uv_1280_c: 406.0 rgb24_to_uv_1280_rvv_i32: 124.2 rgb24_to_uv_1920_c: 626.0 rgb24_to_uv_1920_rvv_i32: 186.0 SpacemiT X60: rgb24_to_uv_8_c: 2.5 rgb24_to_uv_8_rvv_i32: 3.0 rgb24_to_uv_128_c: 36.5 rgb24_to_uv_128_rvv_i32: 5.7 rgb24_to_uv_1080_c: 304.2 rgb24_to_uv_1080_rvv_i32: 49.0 rgb24_to_uv_1280_c: 360.5 rgb24_to_uv_1280_rvv_i32: 57.5 rgb24_to_uv_1920_c: 540.7 rgb24_to_uv_1920_rvv_i32: 86.2
* sws/input: R-V V rgb24ToY & bgr24ToYRémi Denis-Courmont2024-06-085-2/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | T-Head C908: rgb24_to_y_8_c: 2.0 rgb24_to_y_8_rvv_i32: 2.7 rgb24_to_y_128_c: 26.2 rgb24_to_y_128_rvv_i32: 9.2 rgb24_to_y_1080_c: 219.5 rgb24_to_y_1080_rvv_i32: 76.2 rgb24_to_y_1280_c: 276.2 rgb24_to_y_1280_rvv_i32: 89.7 rgb24_to_y_1920_c: 389.7 rgb24_to_y_1920_rvv_i32: 134.2 SpacemiT X60: rgb24_to_y_8_c: 1.7 rgb24_to_y_8_rvv_i32: 2.2 rgb24_to_y_128_c: 23.2 rgb24_to_y_128_rvv_i32: 4.2 rgb24_to_y_1080_c: 195.0 rgb24_to_y_1080_rvv_i32: 33.7 rgb24_to_y_1280_c: 231.0 rgb24_to_y_1280_rvv_i32: 40.0 rgb24_to_y_1920_c: 346.2 rgb24_to_y_1920_rvv_i32: 59.7
* libswscale/x86/yuv_2_rgb: fix some commentsRamiro Polla2024-06-071-4/+4
|
* swscale: [loongarch] Fix undeclared functions prob.Shiyou Yin2024-05-311-0/+2
| | | | | | | Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared. Reviewed-by: 金波 <jinbo@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/yuv2rgb: Use 64bit for brightness computationMichael Niedermayer2024-05-281-1/+1
| | | | | | | | This will not overflow for normal values Fixes: CID1500280 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASEMichael Niedermayer2024-05-281-7/+7
| | | | | | | related: CID1497114 Missing break in switch Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* swscale/riscv: explicitly require Zbb for MINRémi Denis-Courmont2024-05-101-2/+2
|