path: root/libswscale/x86/rgb2rgb.c
Commit message · Author · Age · Files · Lines
* swscale/x86/rgb2rgb: Deduplicate ASM constants · Andreas Rheinhardt · 2025-04-13 · 1 · -2/+2
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422 · Shreesh Adiga · 2025-02-18 · 1 · -0/+6
|   The scalar loop is replaced with masked AVX512 instructions. For extracting the Y from UYVY, vperm2b is used instead of various AND and packuswb. Instead of loading the vectors with interleaved lanes as done in AVX2 version, normal load is used. At the end of packuswb, for U and V, an extra permute operation is done to get the required layout.
|
|   AMD 7950x Zen 4 benchmark data:
|     uyvytoyuv422_c:          29105.0 ( 1.00x)
|     uyvytoyuv422_sse2:        3888.0 ( 7.49x)
|     uyvytoyuv422_avx:         3374.2 ( 8.63x)
|     uyvytoyuv422_avx2:        2649.8 (10.98x)
|     uyvytoyuv422_avx512icl:   1615.0 (18.02x)
|
|   Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
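For reference, the byte layout that these SIMD versions vectorize can be written as a plain scalar loop. This is a minimal sketch, not FFmpeg's C implementation: the function name uyvy_to_yuv422_ref and its simplified signature are hypothetical, and it assumes an even width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a UYVY -> planar YUV 4:2:2 split
 * (not FFmpeg's prototype). Each 4-byte group U0 Y0 V0 Y1 covers two
 * horizontally adjacent pixels that share one U and one V sample.
 * Assumes width is even. */
static void uyvy_to_yuv422_ref(uint8_t *ydst, uint8_t *udst, uint8_t *vdst,
                               const uint8_t *src, int width, int height,
                               ptrdiff_t lum_stride, ptrdiff_t chrom_stride,
                               ptrdiff_t src_stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width / 2; x++) {
            udst[x]         = src[4 * x + 0]; /* U shared by the pair */
            ydst[2 * x]     = src[4 * x + 1]; /* Y of the first pixel */
            vdst[x]         = src[4 * x + 2]; /* V shared by the pair */
            ydst[2 * x + 1] = src[4 * x + 3]; /* Y of the second pixel */
        }
        /* Advance every plane by its own stride. */
        src  += src_stride;
        ydst += lum_stride;
        udst += chrom_stride;
        vdst += chrom_stride;
    }
}
```

In UYVY the Y samples sit at the odd byte offsets, which is why a single byte permute can gather a whole register's worth of them at once.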
* swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes · Shreesh Adiga · 2025-02-03 · 1 · -0/+21
|   On an AMD 7950x Zen 4:
|     shuffle_bytes_0321_c:          56.5 ( 1.00x)
|     shuffle_bytes_0321_ssse3:      15.2 ( 3.70x)
|     shuffle_bytes_0321_avx2:       10.2 ( 5.51x)
|     shuffle_bytes_0321_avx512icl:   9.2 ( 6.11x)
|     shuffle_bytes_1230_c:          84.5 ( 1.00x)
|     shuffle_bytes_1230_ssse3:      14.2 ( 5.93x)
|     shuffle_bytes_1230_avx2:       15.2 ( 5.54x)
|     shuffle_bytes_1230_avx512icl:  11.2 ( 7.51x)
|     shuffle_bytes_2103_c:          48.5 ( 1.00x)
|     shuffle_bytes_2103_ssse3:      21.2 ( 2.28x)
|     shuffle_bytes_2103_avx2:       13.8 ( 3.53x)
|     shuffle_bytes_2103_avx512icl:   9.2 ( 5.24x)
|     shuffle_bytes_3012_c:          84.5 ( 1.00x)
|     shuffle_bytes_3012_ssse3:      14.2 ( 5.93x)
|     shuffle_bytes_3012_avx2:       16.2 ( 5.20x)
|     shuffle_bytes_3012_avx512icl:  10.2 ( 8.24x)
|     shuffle_bytes_3210_c:          89.2 ( 1.00x)
|     shuffle_bytes_3210_ssse3:      24.2 ( 3.68x)
|     shuffle_bytes_3210_avx2:       16.2 ( 5.49x)
|     shuffle_bytes_3210_avx512icl:   9.2 ( 9.65x)
|
|   Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
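The digits in each shuffle_bytes_ABCD name can be read as a byte permutation applied to every 4-byte pixel. A minimal C sketch, assuming output byte i is copied from input byte order[i]; shuffle_bytes_generic and its order parameter are hypothetical, since the log lists one function per pattern.

```c
#include <stdint.h>

/* Hypothetical generic form of the shuffle_bytes_ABCD family: for every
 * 4-byte pixel, output byte i is taken from input byte order[i].
 * Under this reading, shuffle_bytes_2103 corresponds to order = {2, 1, 0, 3},
 * which swaps bytes 0 and 2 (e.g. RGBA <-> BGRA) and keeps byte 3. */
static void shuffle_bytes_generic(const uint8_t *src, uint8_t *dst,
                                  int src_size, const uint8_t order[4])
{
    for (int i = 0; i + 3 < src_size; i += 4) {
        dst[i + 0] = src[i + order[0]];
        dst[i + 1] = src[i + order[1]];
        dst[i + 2] = src[i + order[2]];
        dst[i + 3] = src[i + order[3]];
    }
}
```

The SIMD versions apply the same fixed pattern to many pixels per instruction, typically with a single byte-shuffle against a constant mask.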
* swscale/x86/swscale: Move some constants to rgb2rgb.c · Andreas Rheinhardt · 2025-02-02 · 1 · -10/+10
|   ff_w1111 and ff_bgr2(Y|UV)Offset are only used there (and only on x86-32 since caaec2ea957290941eecfe5d87baf5c0a500b450). Also make them static.
|
|   Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb: add optimized versions of the remaining shuffle_bytes functions · James Almer · 2024-11-02 · 1 · -0/+16
|   Signed-off-by: James Almer <jamrial@gmail.com>
* swscale: Fix aarch64 and i386 compilation failures · Martin Storsjö · 2024-10-08 · 1 · -1/+1
|   This unbreaks builds after c1a0e657638f7007dcc807a2d985c22631fcd6d3, which broke with errors like
|
|     src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
|        66 | ff_rgb24toyv12 = rgb24toyv12;
|           |                ^ ~~~~~~~~~~~
|
|   and
|
|     src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
|       213 | c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
|           |                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|   Signed-off-by: Martin Storsjö <martin@martin.st>
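Both errors are const-qualification mismatches between a function and the function pointer it is assigned to; in the first, the pointer type takes const int32_t * while the function was declared with int32_t *. A minimal sketch of the rule, using hypothetical names (coeff_fn, sum_coeffs, coeff_impl) that do not appear in FFmpeg: the function's parameter types must match the pointer type exactly, const qualifiers included.

```c
#include <stdint.h>

/* Hypothetical pointer type: the coefficient table is read-only. */
typedef int32_t (*coeff_fn)(const int32_t *coeffs, int n);

/* Declared with exactly the parameter types of coeff_fn, so the
 * assignment below is a compatible function pointer assignment. */
static int32_t sum_coeffs(const int32_t *coeffs, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += coeffs[i];
    return sum;
}

static coeff_fn coeff_impl = sum_coeffs;
```

Changing sum_coeffs to take a plain int32_t * should produce the same class of diagnostic on a recent clang, because int32_t * and const int32_t * are not compatible parameter types.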
* swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64 · Ramiro Polla · 2024-09-06 · 1 · -3/+3
|   The mmxext implementation is slower than the C version in x86_64.
|
|                                    m32                 m64
|     rgb24toyv12_16_200_c:       24942.7             14812.6
|     rgb24toyv12_16_200_mmxext:  17857.2 ( 1.40x)    17400.4 ( 0.85x)
|     rgb24toyv12_128_60_c:       56892.9             35616.9
|     rgb24toyv12_128_60_mmxext:  40730.9 ( 1.40x)    39610.4 ( 0.90x)
|     rgb24toyv12_512_16_c:       58402.7             37209.4
|     rgb24toyv12_512_16_mmxext:  44842.4 ( 1.30x)    41136.2 ( 0.90x)
|     rgb24toyv12_1920_4_c:       54827.4             34737.4
|     rgb24toyv12_1920_4_mmxext:  51169.9 ( 1.07x)    34818.9 ( 1.00x)
* swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers · Ramiro Polla · 2024-09-06 · 1 · -1/+6
|
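The log entry does not describe the fix itself. As a generic illustration only, assuming deinterleaveBytes splits interleaved byte pairs into two planes, this sketch shows the usual way a block-based loop avoids writing past the end of each output row: process whole blocks, then finish the remaining width with a scalar tail. deinterleave_row and BLOCK are hypothetical names.

```c
#include <stdint.h>

#define BLOCK 16 /* hypothetical SIMD block width, in output bytes per plane */

/* Split interleaved pairs (a0 b0 a1 b1 ...) into two planes.
 * The outer loop only runs for whole BLOCK-sized chunks (standing in for a
 * SIMD body); the scalar tail handles the remaining width % BLOCK pairs, so
 * nothing past dst1[width - 1] or dst2[width - 1] is ever written. */
static void deinterleave_row(const uint8_t *src, uint8_t *dst1,
                             uint8_t *dst2, int width)
{
    int x = 0;
    for (; x + BLOCK <= width; x += BLOCK)
        for (int i = 0; i < BLOCK; i++) {
            dst1[x + i] = src[2 * (x + i)];
            dst2[x + i] = src[2 * (x + i) + 1];
        }
    for (; x < width; x++) {
        dst1[x] = src[2 * x];
        dst2[x] = src[2 * x + 1];
    }
}
```

The alternative, rounding width up to a multiple of the block size, is only safe when the caller guarantees padding after every row.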
* swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2 · James Almer · 2024-06-09 · 1 · -1/+1
|   Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103 · James Almer · 2024-06-09 · 1 · -4/+0
|   Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/input: add AVX2 optimized uyvytoyuv422 · James Almer · 2024-06-09 · 1 · -0/+6
|     uyvytoyuv422_c:    23991.8
|     uyvytoyuv422_sse2:  2817.8
|     uyvytoyuv422_avx:   2819.3
|     uyvytoyuv422_avx2:  1972.3
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* swscale/x86/rgb2rgb: Detemplatize · Andreas Rheinhardt · 2024-06-09 · 1 · -19/+2253
|   Every function in rgb2rgb_template.c is only compiled exactly once; there is no overlap at all between the MMXEXT and the SSE2 functions, so detemplatize it.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
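The template pattern being removed here compiles one function body several times, once per instruction-set suffix. A minimal single-file sketch, assuming a RENAME()-style macro; DEFINE_ADD_ONE, RENAME_MMXEXT, RENAME_SSE2 and add_one are hypothetical, and a real template would be a separate file that is #included once per suffix.

```c
#include <stdint.h>

/* One function body, instantiated once per suffix via the RENAME argument.
 * A function-like macro stands in for a separate *_template.c file. */
#define DEFINE_ADD_ONE(RENAME)                          \
    static void RENAME(add_one)(uint8_t *buf, int n)    \
    {                                                   \
        for (int i = 0; i < n; i++)                     \
            buf[i]++;                                   \
    }

#define RENAME_MMXEXT(name) name##_mmxext
#define RENAME_SSE2(name)   name##_sse2

DEFINE_ADD_ONE(RENAME_MMXEXT) /* defines add_one_mmxext */
DEFINE_ADD_ONE(RENAME_SSE2)   /* defines add_one_sse2   */
```

When, as the commit message says, every body is instantiated for only one suffix, this indirection buys nothing and each function can be written out directly under its final name.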
* swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM · Andreas Rheinhardt · 2024-06-09 · 1 · -12/+36
|   The SSE2 and AVX versions of deinterleaveBytes are external ASM. Move them out of the inline ASM template.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* swscale/x86/rgb2rgb: Remove obsolete MMX, 3dnow functions · Andreas Rheinhardt · 2022-06-22 · 1 · -26/+0
|   x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2) for x64. So given that the only systems that benefit from these functions are truly ancient 32bit x86s they are removed.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
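The "always overridden" argument follows from how init code commonly assigns function pointers: checks for newer extensions run later and overwrite earlier choices. A minimal sketch with hypothetical flag names and implementations (FLAG_MMXEXT, FLAG_SSE2, impl_*), not FFmpeg's init code.

```c
/* Hypothetical CPU flag bits, for illustration only. */
enum {
    FLAG_MMXEXT = 1 << 0,
    FLAG_SSE2   = 1 << 1,
};

typedef const char *(*impl_fn)(void);

/* Stand-ins for real implementations; each just reports its name. */
static const char *impl_c(void)      { return "c"; }
static const char *impl_mmxext(void) { return "mmxext"; }
static const char *impl_sse2(void)   { return "sse2"; }

/* Checks run from oldest to newest extension, and each match overwrites
 * the previous choice, so the newest supported extension wins. */
static impl_fn select_impl(int cpu_flags)
{
    impl_fn fn = impl_c;
    if (cpu_flags & FLAG_MMXEXT)
        fn = impl_mmxext;
    if (cpu_flags & FLAG_SSE2)
        fn = impl_sse2;
    return fn;
}
```

Since an x86-64 CPU always reports SSE2, the impl_mmxext branch can only win there when SSE2 is explicitly disabled, which is the situation the commit message describes.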
* libswscale/x86/rgb2rgb: add shuffle_bytes avx2 · Wu Jianhua · 2021-10-15 · 1 · -2/+15
|   Performance data (Less is better):
|     shuffle_bytes_ssse3  3.64654
|     shuffle_bytes_avx2   0.94288
|
|   Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
* swscale/x86/rgb2rgb: Remove unused ASM constants · Andreas Rheinhardt · 2021-02-24 · 1 · -8/+0
|   mask24hh etc. are unused since f099fbf5f3ac1d6b3753fc8dfda6558572111fbd, mask32b and mask32r since 296609f859a587575b91fe9e9691f2707d6e8136, mask32g since b38d487466e68bd6baf2889017d2a751831560f0 and mask32 since f8a138be5257f751ef7d3c6b7ab534c0434e90e7.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
* lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump · Anton Khirnov · 2021-01-01 · 1 · -0/+2
|   They are not properly namespaced and not intended for public use.
* swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version · Martin Vignali · 2018-10-13 · 1 · -0/+4
* swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422 · Martin Vignali · 2018-04-22 · 1 · -0/+19
|   and checkasm test
* swscale/rgb : add X86 SIMD (SSSE3), for shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 · Martin Vignali · 2018-03-24 · 1 · -0/+6
* swscale/rgb : add X86 SIMD (SSSE3) for shuffle_bytes_2103 and shuffle_bytes_0321 · Martin Vignali · 2018-03-24 · 1 · -1/+9
|
* Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50' · Hendrik Leppkes · 2016-06-26 · 1 · -1/+0
|\    * commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
| |     Drop unnecessary libavutil/x86/asm.h #includes
| |
| |   Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
| * Drop unnecessary libavutil/x86/asm.h #includes · Diego Biurrun · 2016-05-28 · 1 · -1/+0
| |
| * swscale/x86/rgb2rgb: add support for AVX · Michael Niedermayer · 2014-01-21 · 1 · -0/+11
| |   This does not yet include any actual AVX code
| |
| |   Signed-off-by: Anton Khirnov <anton@khirnov.net>
| * swscale: x86: Consistently use lowercase function name suffixes · Diego Biurrun · 2013-11-22 · 1 · -8/+8
| |
* | Add missing external declarations. · Matt Oliver · 2014-03-17 · 1 · -0/+4
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | swscale/x86/rgb2rgb: Make sure COMPILE_TEMPLATE_AVX is defined · Michael Niedermayer · 2013-12-14 · 1 · -0/+1
| |   Found-by: iive
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | swscale/x86/rgb2rgb: change cpu optim identifiers to lower case · Michael Niedermayer · 2013-11-19 · 1 · -10/+10
| |   This makes the code more similar to the other optims and allows us to use the same macros to build function names
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | swscale/x86/rgb2rgb: extend framework to also include AVX · Michael Niedermayer · 2013-11-19 · 1 · -0/+11
| |   This does not yet include any actual AVX code
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | sws/x86: remove 8bit rgb2yuv coefficient case for rgb24toyv12 special converter · Michael Niedermayer · 2013-04-15 · 1 · -1/+0
| |   This simplifies the code and improves quality at the expense of a slight slowdown of a rarely used function (no fate test uses it).
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415' · Michael Niedermayer · 2012-11-01 · 1 · -2/+2
|\|   * commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
| |     x86: h264_chromamc_10bit: drop pointless PAVG %define
| |     x86: mmx2 ---> mmxext in function names
| |     swscale: do not forget to swap data in formats with different endianness
| |
| |   Conflicts:
| |     libavcodec/x86/dsputil_mmx.c
| |     libavfilter/x86/gradfun.c
| |     libswscale/input.c
| |     libswscale/utils.c
| |     libswscale/x86/swscale.c
| |     tests/ref/lavfi/pixfmts_scale
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: mmx2 ---> mmxext in function names · Diego Biurrun · 2012-10-31 · 1 · -2/+2
| |
* | Merge commit '652f5185945c8405fc57aed353286858df8d066f' · Michael Niedermayer · 2012-10-31 · 1 · -3/+3
|\|   * commit '652f5185945c8405fc57aed353286858df8d066f':
| |     x86: mmx2 ---> mmxext in comments and messages
| |
| |   Conflicts:
| |     libswscale/x86/swscale_template.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: mmx2 ---> mmxext in comments and messages · Diego Biurrun · 2012-10-31 · 1 · -3/+3
| |
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2012-09-09 · 1 · -4/+5
|\|   * qatar/master:
| |     swscale: Provide the right alignment for external mmx asm
| |     x86: Replace checks for CPU extensions and flags by convenience macros
| |     configure: msvc: fix/simplify setting of flags for hostcc
| |     x86: mlpdsp: mlp_filter_channel_x86 requires inline asm
| |
| |   Conflicts:
| |     libavcodec/x86/fft_init.c
| |     libavcodec/x86/h264_intrapred_init.c
| |     libavcodec/x86/h264dsp_init.c
| |     libavcodec/x86/mpegaudiodec.c
| |     libavcodec/x86/proresdsp_init.c
| |     libavutil/x86/float_dsp_init.c
| |     libswscale/utils.c
| |     libswscale/x86/swscale.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Replace checks for CPU extensions and flags by convenience macros · Diego Biurrun · 2012-09-08 · 1 · -4/+5
| |   This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.
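The point about a coalesced check is that knowing the CPU supports an extension is not enough; the matching kind of assembly must also have been built. A minimal sketch of per-kind checks, using hypothetical macro names (BUILT_WITH_INLINE_ASM, USE_EXTERNAL_SSE2 and so on) rather than the macros this commit introduced.

```c
/* Hypothetical build-configuration switches: this build has no GCC-style
 * inline asm but does have an external assembler. */
#define BUILT_WITH_INLINE_ASM   0
#define BUILT_WITH_EXTERNAL_ASM 1

/* Hypothetical runtime CPU flag bit. */
#define CPU_HAS_SSE2 (1 << 1)

/* Each check combines "was this kind of asm built?" with "does the CPU
 * support the extension?", so a call site states which one it relies on. */
#define USE_INLINE_SSE2(flags)   (BUILT_WITH_INLINE_ASM   && ((flags) & CPU_HAS_SSE2))
#define USE_EXTERNAL_SSE2(flags) (BUILT_WITH_EXTERNAL_ASM && ((flags) & CPU_HAS_SSE2))

typedef enum { PATH_C, PATH_INLINE_SSE2, PATH_EXTERNAL_SSE2 } code_path;

static code_path pick_path(int cpu_flags)
{
    if (USE_EXTERNAL_SSE2(cpu_flags))
        return PATH_EXTERNAL_SSE2;
    if (USE_INLINE_SSE2(cpu_flags))
        return PATH_INLINE_SSE2;
    return PATH_C;
}
```

With BUILT_WITH_INLINE_ASM set to 0, as it would be for a compiler without GCC-style inline assembly, USE_INLINE_SSE2 is false even on an SSE2-capable CPU, while the external path remains available.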
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2012-08-09 · 1 · -1/+1
|\|   * qatar/master:
| |     mpegvideo: reduce excessive inlining of mpeg_motion()
| |     mpegvideo: convert mpegvideo_common.h to a .c file
| |     build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
| |     Move MASK_ABS macro to libavcodec/mathops.h
| |     x86: move MANGLE() and related macros to libavutil/x86/asm.h
| |     x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
| |     aacdec: Don't fall back to the old output configuration when no old configuration is present.
| |     rtmp: Add message tracking
| |     rtsp: Support mpegts in raw udp packets
| |     rtsp: Support receiving plain data over UDP without any RTP encapsulation
| |     rtpdec: Remove an unused include
| |     rtpenc: Remove an av_abort() that depends on user-supplied data
| |     vsrc_movie: discourage its use with avconv.
| |     avconv: allow no input files.
| |     avconv: prevent invalid reads in transcode_init()
| |     avconv: rename OutputStream.is_past_recording_time to finished.
| |
| |   Conflicts:
| |     configure
| |     doc/filters.texi
| |     ffmpeg.c
| |     ffmpeg.h
| |     libavcodec/Makefile
| |     libavcodec/aacdec.c
| |     libavcodec/mpegvideo.c
| |     libavformat/version.h
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h · Mans Rullgard · 2012-08-09 · 1 · -1/+1
| |   This puts x86-specific things in the x86/ subdirectory where they belong.
| |
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2012-08-04 · 1 · -6/+6
|\|   * qatar/master:
| |     lavr: fix handling of custom mix matrices
| |     fate: force pix_fmt in lagarith-rgb32 test
| |     fate: add tests for lagarith lossless video codec.
| |     ARMv6: vp8: fix stack allocation with Apple's assembler
| |     ARM: vp56: allow inline asm to build with clang
| |     fft: 3dnow: fix register name typo in DECL_IMDCT macro
| |     x86: dct32: port to cpuflags
| |     x86: build: replace mmx2 by mmxext
| |     Revert "wmapro: prevent division by zero when sample rate is unspecified"
| |     wmapro: prevent division by zero when sample rate is unspecified
| |     lagarith: fix color plane inversion for YUY2 output.
| |     lagarith: pad RGB buffer by 1 byte.
| |     dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
| |
| |   Conflicts:
| |     doc/APIchanges
| |     libavcodec/lagarith.c
| |     libavfilter/x86/gradfun.c
| |     libavutil/cpu.h
| |     libavutil/version.h
| |     libswscale/utils.c
| |     libswscale/version.h
| |     libswscale/x86/yuv2rgb.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: build: replace mmx2 by mmxext · Diego Biurrun · 2012-08-03 · 1 · -6/+6
| |   Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2012-07-23 · 1 · -1/+2
|\|   * qatar/master:
| |     v410dec: Implement explode mode support
| |     zerocodec: fix direct rendering.
| |     wav: init st to NULL to avoid a false-positive warning.
| |     wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
| |     h264: refactor NAL decode loop
| |     RTMPTE protocol support
| |     RTMPE protocol support
| |     rtmp: Add ff_rtmp_calc_digest_pos()
| |     rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
| |     swscale: add missing HAVE_INLINE_ASM check.
| |     lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
| |     vc1: Add a test for interlaced field pictures
| |     swscale: Mark all init functions as av_cold
| |     swscale: x86: Drop pointless _mmx suffix from filenames
| |     lavf: use conditional notation for default codec in muxer declarations.
| |     swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
| |     dsputil: ppc: cosmetics: pretty-print
| |     dsputil: x86: add SHUFFLE_MASK_W macro
| |     configure: respect CC_O setting in check_cc
| |
| |   Conflicts:
| |     Changelog
| |     configure
| |     libavcodec/v410dec.c
| |     libavcodec/zerocodec.c
| |     libavformat/asfenc.c
| |     libavformat/version.h
| |     libswscale/utils.c
| |     libswscale/x86/swscale.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * swscale: Mark all init functions as av_cold · Diego Biurrun · 2012-07-23 · 1 · -1/+2
| |
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2012-07-22 · 1 · -0/+6
|\|   * qatar/master:
| |     x86: swscale: Place inline assembly code under appropriate #ifdefs
| |     rtsp: remove terminal comma in FF_RTP_FLAG_OPTS macro.
| |     configure: Remove redundant RTMPT/RTMPTS dependencies
| |     configure: add filtering of host cflags/ldflags
| |     configure: initialise all flag filters at the same place
| |     configure: add filtering of linker flags
| |     configure: name some variables more consistently
| |     configure: remove filter_cppflags
| |     configure: set icc_version where it is needed
| |     mpegenc: remove disabled code
| |
| |   Conflicts:
| |     configure
| |     libavformat/movenc.c
| |     libswscale/x86/swscale_mmx.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: swscale: Place inline assembly code under appropriate #ifdefs · Ronald S. Bultje · 2012-07-21 · 1 · -0/+6
| |   Fixes compilation for compilers that do not support gcc inline assembly.
| |
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | Use more accurate conversion for rgb15/16 to rgb24/32 (C/MMX). · Themaister · 2011-11-09 · 1 · -0/+3
| |   Fate update by michael.
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
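The log does not say which formula this commit adopted. One widely used accurate method for widening 5- and 6-bit channels is bit replication, sketched below with hypothetical helper names; it assumes the usual RGB565 layout with red in the top 5 bits.

```c
#include <stdint.h>

/* A plain shift maps the maximum 5-bit value 31 to 248, so full intensity
 * never reaches 255. Replicating the top bits into the freed low bits maps
 * 0 -> 0 and 31 -> 255 and spreads the levels evenly in between. */
static uint8_t expand5_shift(uint8_t v)     { return (uint8_t)(v << 3); }
static uint8_t expand5_replicate(uint8_t v) { return (uint8_t)((v << 3) | (v >> 2)); }

/* The 6-bit green channel of RGB565 uses the same idea with a 2-bit shift. */
static uint8_t expand6_replicate(uint8_t v) { return (uint8_t)((v << 2) | (v >> 4)); }

/* Unpack one RGB565 pixel value into 8-bit R, G, B. */
static void rgb565_to_rgb24(uint16_t px, uint8_t out[3])
{
    out[0] = expand5_replicate((uint8_t)((px >> 11) & 0x1f));
    out[1] = expand6_replicate((uint8_t)((px >> 5) & 0x3f));
    out[2] = expand5_replicate((uint8_t)(px & 0x1f));
}
```

The same replication applies to each 5-bit channel of rgb15 (RGB555).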
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2011-06-16 · 1 · -1/+1
|\|   * qatar/master:
| |     ac3enc: use correct alignment and length in channel coupling dsp functions.
| |     ffmpeg: don't abuse a global for passing framerate from input to output
| |     ffmpeg: don't abuse a global for passing channels from input to output
| |     ffmpeg: don't abuse a global for passing samplerate from input to output
| |     ARM: update ff_h264_idct8_add4_neon for 4:4:4 changes
| |     swscale: use SwsContext for av_log when available
| |     swscale: Remove HAVE_MMX from files that are only compiled with MMX enabled.
| |     swscale: Fix compilation with --disable-mmx2.
| |
| |   Conflicts:
| |     ffmpeg.c
| |     libswscale/utils.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * swscale: Remove HAVE_MMX from files that are only compiled with MMX enabled. · Diego Biurrun · 2011-06-15 · 1 · -1/+1
| |
| * rgb2rgb: remove duplicate mmx/mmx2/3dnow/sse2 functions. · Ronald S. Bultje · 2011-05-26 · 1 · -1/+1
| |   Many functions have such a prefix, but do not actually use any instructions or features from that set, thus giving the false impression that swscale is highly optimized for a particular system, whereas in reality it is not.
* | rgb2rgb: remove duplicate mmx/mmx2/3dnow/sse2 functions. · Ronald S. Bultje · 2011-05-28 · 1 · -1/+1
| |   Many functions have such a prefix, but do not actually use any instructions or features from that set, thus giving the false impression that swscale is highly optimized for a particular system, whereas in reality it is not.
* | Merge remote-tracking branch 'qatar/master' · Michael Niedermayer · 2011-05-25 · 1 · -10/+11
|\|   * qatar/master: (22 commits)
| |     configure: enable memalign_hack automatically when needed
| |     swscale: unbreak the build on non-x86 systems.
| |     swscale: remove if(bitexact) branch from functions.
| |     swscale: remove if(canMMX2BeUsed) conditional.
| |     swscale: remove swScale_{c,MMX,MMX2} duplication.
| |     swscale: use emms_c().
| |     Move emms_c() from libavcodec to libavutil.
| |     tiff: set palette in the context when specified in TIFF_PAL tag
| |     rtsp: use strtoul to parse rtptime and seq values.
| |     pgssubdec: fix incorrect colors.
| |     dvdsubdec: fix incorrect colors.
| |     ape: Allow demuxing of files with metadata tags.
| |     swscale: remove dead macro WRITEBGR24OLD.
| |     swscale: remove AMD3DNOW "optimizations".
| |     swscale: remove duplicate code in ppc/ subdirectory.
| |     swscale: remove duplicated x86/ functions.
| |     swscale: force --enable-runtime-cpudetect and remove SWS_CPU_CAPS_*.
| |     vsrc_buffer.h: add file doxy
| |     vsrc_buffer: tweak error message in init()
| |     msmpeg4: reindent.
| |     ...
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>