| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The scalar loop is replaced with masked AVX512 instructions.
For extracting the Y from UYVY, vperm2b is used instead of
various AND and packuswb.
Instead of loading the vectors with interleaved lanes as done
in AVX2 version, normal load is used. At the end of packuswb,
for U and V, an extra permute operation is done to get the
required layout.
AMD 7950x Zen 4 benchmark data:
uyvytoyuv422_c: 29105.0 ( 1.00x)
uyvytoyuv422_sse2: 3888.0 ( 7.49x)
uyvytoyuv422_avx: 3374.2 ( 8.63x)
uyvytoyuv422_avx2: 2649.8 (10.98x)
uyvytoyuv422_avx512icl: 1615.0 (18.02x)
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a AMD 7950x Zen 4
shuffle_bytes_0321_c: 56.5 ( 1.00x)
shuffle_bytes_0321_ssse3: 15.2 ( 3.70x)
shuffle_bytes_0321_avx2: 10.2 ( 5.51x)
shuffle_bytes_0321_avx512icl: 9.2 ( 6.11x)
shuffle_bytes_1230_c: 84.5 ( 1.00x)
shuffle_bytes_1230_ssse3: 14.2 ( 5.93x)
shuffle_bytes_1230_avx2: 15.2 ( 5.54x)
shuffle_bytes_1230_avx512icl: 11.2 ( 7.51x)
shuffle_bytes_2103_c: 48.5 ( 1.00x)
shuffle_bytes_2103_ssse3: 21.2 ( 2.28x)
shuffle_bytes_2103_avx2: 13.8 ( 3.53x)
shuffle_bytes_2103_avx512icl: 9.2 ( 5.24x)
shuffle_bytes_3012_c: 84.5 ( 1.00x)
shuffle_bytes_3012_ssse3: 14.2 ( 5.93x)
shuffle_bytes_3012_avx2: 16.2 ( 5.20x)
shuffle_bytes_3012_avx512icl: 10.2 ( 8.24x)
shuffle_bytes_3210_c: 89.2 ( 1.00x)
shuffle_bytes_3210_ssse3: 24.2 ( 3.68x)
shuffle_bytes_3210_avx2: 16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl: 9.2 ( 9.65x)
Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
|
|
|
|
|
|
|
|
|
| |
ff_w1111 and ff_bgr2(Y|UV)Offset are only used there
(and only on x86-32 since caaec2ea957290941eecfe5d87baf5c0a500b450).
Also make them static.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
| |
functions
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This unbreaks builds after c1a0e657638f7007dcc807a2d985c22631fcd6d3,
which broke with errors like
src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
66 | ff_rgb24toyv12 = rgb24toyv12;
| ^ ~~~~~~~~~~~
and
src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
213 | c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The mmxext implementation is slower than the C version in x86_64.
m32 m64
rgb24toyv12_16_200_c: 24942.7 14812.6
rgb24toyv12_16_200_mmxext: 17857.2 ( 1.40x) 17400.4 ( 0.85x)
rgb24toyv12_128_60_c: 56892.9 35616.9
rgb24toyv12_128_60_mmxext: 40730.9 ( 1.40x) 39610.4 ( 0.90x)
rgb24toyv12_512_16_c: 58402.7 37209.4
rgb24toyv12_512_16_mmxext: 44842.4 ( 1.30x) 41136.2 ( 0.90x)
rgb24toyv12_1920_4_c: 54827.4 34737.4
rgb24toyv12_1920_4_mmxext: 51169.9 ( 1.07x) 34818.9 ( 1.00x)
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
| |
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
| |
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
| |
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
|
|
|
| |
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2) for x64. So given that the only systems that
benefit from these functions are truely ancient 32bit x86s
they are removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
|
|
|
|
|
|
|
|
| |
Performance data(Less is better):
shuffle_bytes_ssse3 3.64654
shuffle_bytes_avx2 0.94288
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
|
|
|
|
|
|
|
|
|
| |
mask24hh etc. are unused since f099fbf5f3ac1d6b3753fc8dfda6558572111fbd,
mask32b and mask32r since 296609f859a587575b91fe9e9691f2707d6e8136,
mask32g since b38d487466e68bd6baf2889017d2a751831560f0 and mask32 since
f8a138be5257f751ef7d3c6b7ab534c0434e90e7.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
|
|
|
|
| |
They are not properly namespaced and not intended for public use.
|
|
|
|
| |
inline asm version
|
|
|
|
| |
and checkasm test
|
|
|
|
| |
shuffle_bytes_3012, shuffle_bytes_3210
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
Drop unnecessary libavutil/x86/asm.h #includes
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
|
| | |
|
| |
| |
| |
| |
| |
| | |
This does not yet include any actual AVX code
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
| | |
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Found-by: iive
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
This makes the code more similar to the other optims and allows us
to use the same macros to build function names
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
This does not yet include any actual AVX code
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
This simplifies the code and improves quality at the expense of a slight
slowdown of a rarely used function (no fate test uses it).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
x86: h264_chromamc_10bit: drop pointless PAVG %define
x86: mmx2 ---> mmxext in function names
swscale: do not forget to swap data in formats with different endianness
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/gradfun.c
libswscale/input.c
libswscale/utils.c
libswscale/x86/swscale.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '652f5185945c8405fc57aed353286858df8d066f':
x86: mmx2 ---> mmxext in comments and messages
Conflicts:
libswscale/x86/swscale_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
swscale: Provide the right alignment for external mmx asm
x86: Replace checks for CPU extensions and flags by convenience macros
configure: msvc: fix/simplify setting of flags for hostcc
x86: mlpdsp: mlp_filter_channel_x86 requires inline asm
Conflicts:
libavcodec/x86/fft_init.c
libavcodec/x86/h264_intrapred_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/mpegaudiodec.c
libavcodec/x86/proresdsp_init.c
libavutil/x86/float_dsp_init.c
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
This separates code relying on inline from that relying on external
assembly and fixes instances where the coalesced check was incorrect.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
mpegvideo: reduce excessive inlining of mpeg_motion()
mpegvideo: convert mpegvideo_common.h to a .c file
build: factor out mpegvideo.o dependencies to CONFIG_MPEGVIDEO
Move MASK_ABS macro to libavcodec/mathops.h
x86: move MANGLE() and related macros to libavutil/x86/asm.h
x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h
aacdec: Don't fall back to the old output configuration when no old configuration is present.
rtmp: Add message tracking
rtsp: Support mpegts in raw udp packets
rtsp: Support receiving plain data over UDP without any RTP encapsulation
rtpdec: Remove an unused include
rtpenc: Remove an av_abort() that depends on user-supplied data
vsrc_movie: discourage its use with avconv.
avconv: allow no input files.
avconv: prevent invalid reads in transcode_init()
avconv: rename OutputStream.is_past_recording_time to finished.
Conflicts:
configure
doc/filters.texi
ffmpeg.c
ffmpeg.h
libavcodec/Makefile
libavcodec/aacdec.c
libavcodec/mpegvideo.c
libavformat/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
This puts x86-specific things in the x86/ subdirectory where they
belong.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
lavr: fix handling of custom mix matrices
fate: force pix_fmt in lagarith-rgb32 test
fate: add tests for lagarith lossless video codec.
ARMv6: vp8: fix stack allocation with Apple's assembler
ARM: vp56: allow inline asm to build with clang
fft: 3dnow: fix register name typo in DECL_IMDCT macro
x86: dct32: port to cpuflags
x86: build: replace mmx2 by mmxext
Revert "wmapro: prevent division by zero when sample rate is unspecified"
wmapro: prevent division by zero when sample rate is unspecified
lagarith: fix color plane inversion for YUY2 output.
lagarith: pad RGB buffer by 1 byte.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
Conflicts:
doc/APIchanges
libavcodec/lagarith.c
libavfilter/x86/gradfun.c
libavutil/cpu.h
libavutil/version.h
libswscale/utils.c
libswscale/version.h
libswscale/x86/yuv2rgb.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| | |
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
v410dec: Implement explode mode support
zerocodec: fix direct rendering.
wav: init st to NULL to avoid a false-positive warning.
wavpack: set bits_per_raw_sample for S32 samples to properly identify 24-bit
h264: refactor NAL decode loop
RTMPTE protocol support
RTMPE protocol support
rtmp: Add ff_rtmp_calc_digest_pos()
rtmp: Rename rtmp_calc_digest to ff_rtmp_calc_digest and make it global
swscale: add missing HAVE_INLINE_ASM check.
lavfi: place x86 inline assembly under HAVE_INLINE_ASM.
vc1: Add a test for interlaced field pictures
swscale: Mark all init functions as av_cold
swscale: x86: Drop pointless _mmx suffix from filenames
lavf: use conditional notation for default codec in muxer declarations.
swscale: place inline assembly bilinear scaler under HAVE_INLINE_ASM.
dsputil: ppc: cosmetics: pretty-print
dsputil: x86: add SHUFFLE_MASK_W macro
configure: respect CC_O setting in check_cc
Conflicts:
Changelog
configure
libavcodec/v410dec.c
libavcodec/zerocodec.c
libavformat/asfenc.c
libavformat/version.h
libswscale/utils.c
libswscale/x86/swscale.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
x86: swscale: Place inline assembly code under appropriate #ifdefs
rtsp: remove terminal comma in FF_RTP_FLAG_OPTS macro.
configure: Remove redundant RTMPT/RTMPTS dependencies
configure: add filtering of host cflags/ldflags
configure: initialise all flag filters at the same place
configure: add filtering of linker flags
configure: name some variables more consistently
configure: remove filter_cppflags
configure: set icc_version where it is needed
mpegenc: remove disabled code
Conflicts:
configure
libavformat/movenc.c
libswscale/x86/swscale_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| | |
Fixes compilation for compilers that do not support gcc inline assembly.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| |
| |
| |
| |
| |
| | |
Fate update by michael.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
ac3enc: use correct alignment and length in channel coupling dsp functions.
ffmpeg: don't abuse a global for passing framerate from input to output
ffmpeg: don't abuse a global for passing channels from input to output
ffmpeg: don't abuse a global for passing samplerate from input to output
ARM: update ff_h264_idct8_add4_neon for 4:4:4 changes
swscale: use SwsContext for av_log when available
swscale: Remove HAVE_MMX from files that are only compiled with MMX enabled.
swscale: Fix compilation with --disable-mmx2.
Conflicts:
ffmpeg.c
libswscale/utils.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
Many functions have such a prefix, but do not actually use any
instructions or features from that set, thus giving the false
impression that swscale is highly optimized for a particular
system, whereas in reality it is not.
|
| |
| |
| |
| |
| |
| |
| | |
Many functions have such a prefix, but do not actually use any
instructions or features from that set, thus giving the false
impression that swscale is highly optimized for a particular
system, whereas in reality it is not.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master: (22 commits)
configure: enable memalign_hack automatically when needed
swscale: unbreak the build on non-x86 systems.
swscale: remove if(bitexact) branch from functions.
swscale: remove if(canMMX2BeUsed) conditional.
swscale: remove swScale_{c,MMX,MMX2} duplication.
swscale: use emms_c().
Move emms_c() from libavcodec to libavutil.
tiff: set palette in the context when specified in TIFF_PAL tag
rtsp: use strtoul to parse rtptime and seq values.
pgssubdec: fix incorrect colors.
dvdsubdec: fix incorrect colors.
ape: Allow demuxing of files with metadata tags.
swscale: remove dead macro WRITEBGR24OLD.
swscale: remove AMD3DNOW "optimizations".
swscale: remove duplicate code in ppc/ subdirectory.
swscale: remove duplicated x86/ functions.
swscale: force --enable-runtime-cpudetect and remove SWS_CPU_CAPS_*.
vsrc_buffer.h: add file doxy
vsrc_buffer: tweak error message in init()
msmpeg4: reindent.
...
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|