* qatar/master:
x86: more AVX2 framework
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
x86inc: FMA3/4 Support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '206895708ea2b464755d340e44501daf9a07c310':
x86inc: Remove our FMA4 support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
x86inc: Use VEX-encoded instructions in AVX functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
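The selection rule described above can be modelled in a few lines of C. This is only a sketch of the idea, not the x86inc macro code: the table `vex_capable` is a small illustrative subset, and `has_vex_form` and `select_mnemonic` are hypothetical names introduced here. It relies on the fact that the VEX-encoded form of an SSE instruction uses the same mnemonic with a `v` prefix.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset of SSE mnemonics that have a VEX-encoded counterpart.
 * The real list is much longer; this table is an assumption for the sketch. */
static const char *const vex_capable[] = { "movaps", "paddw", "pxor", "addps" };

/* Returns 1 if op appears in the illustrative table above. */
static int has_vex_form(const char *op)
{
    for (size_t i = 0; i < sizeof vex_capable / sizeof vex_capable[0]; i++)
        if (strcmp(op, vex_capable[i]) == 0)
            return 1;
    return 0;
}

/* Writes the mnemonic to emit into out and returns out.
 * In an AVX-class function, instructions with a VEX form get the "v" prefix;
 * everything else is emitted unchanged. */
static const char *select_mnemonic(const char *op, int avx_function,
                                   char *out, size_t out_size)
{
    if (avx_function && has_vex_form(op))
        snprintf(out, out_size, "v%s", op);
    else
        snprintf(out, out_size, "%s", op);
    return out;
}
```

With this model, `paddw` written inside an AVX function comes out as `vpaddw`, while the same source line in an SSE2 function stays `paddw`, so one body of code can serve both variants.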
* qatar/master:
x86inc: Remove .rodata kludges
Conflicts:
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
x86inc: remove misaligned cpu flag
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '71155665414b551ad350622d5abed20e58371fbf':
x86inc: various minor backports from x264
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
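The arithmetic behind this can be checked with a short C sketch. The 32-byte shadow space and its position directly above the return address come from the Win64 calling convention; placing XMM6 in the first 16 bytes and XMM7 in the second is an assumption made here for illustration, and the names below are hypothetical.

```c
#include <stdint.h>

enum {
    RET_ADDR_SIZE = 8,   /* return address pushed by the call instruction */
    SHADOW_SIZE   = 32,  /* Win64 shadow space reserved by the caller */
    XMM_SIZE      = 16   /* one XMM register */
};

/* Offset from rsp at function entry of the slot for XMM6 (slot 0)
 * or XMM7 (slot 1). The slot order is an assumption of this sketch. */
static int shadow_slot_offset(int slot)
{
    return RET_ADDR_SIZE + slot * XMM_SIZE;
}

/* Two XMM registers exactly fill the shadow space. */
static int slots_fit(void)
{
    return 2 * XMM_SIZE <= SHADOW_SIZE;
}

/* At entry rsp is 16n+8, because the caller keeps rsp 16-byte aligned
 * before the call pushes the return address. Returns 1 if the slot
 * address is 16-byte aligned, so an aligned store can be used. */
static int slot_is_aligned(uint64_t rsp_at_entry, int slot)
{
    return (rsp_at_entry + (uint64_t)shadow_slot_offset(slot)) % XMM_SIZE == 0;
}
```

Both slots land on 16-byte boundaries without touching rsp, which is why no stack adjustment is needed when only XMM6 and XMM7 have to be preserved.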
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
x86inc: create xm# and ym#, analogous to m#
Merged-by: Michael Niedermayer <michaelni@gmx.at>
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
x86inc: fix some corner cases of SWAP
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Both SWAP with >=3 named (rather than numbered) args and
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
x86inc: Use SSE instead of SSE2 for copying data
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Reduces code size because movaps/movups are one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
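The one-byte difference is visible in the instruction encodings. The sketch below lists the register-to-register forms for `xmm0, xmm1`; the array and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Encodings of "xmm0 <- xmm1" for each move. The SSE2 integer forms
 * carry a mandatory prefix byte (66 or F3) that the SSE forms lack. */
static const unsigned char enc_movaps[] = { 0x0F, 0x28, 0xC1 };       /* movaps xmm0, xmm1 */
static const unsigned char enc_movups[] = { 0x0F, 0x10, 0xC1 };       /* movups xmm0, xmm1 */
static const unsigned char enc_movdqa[] = { 0x66, 0x0F, 0x6F, 0xC1 }; /* movdqa xmm0, xmm1 */
static const unsigned char enc_movdqu[] = { 0xF3, 0x0F, 0x6F, 0xC1 }; /* movdqu xmm0, xmm1 */

/* Bytes saved per instruction by choosing the SSE form (aligned case). */
static size_t bytes_saved_aligned(void)
{
    return sizeof enc_movdqa - sizeof enc_movaps;
}

/* Bytes saved per instruction by choosing the SSE form (unaligned case). */
static size_t bytes_saved_unaligned(void)
{
    return sizeof enc_movdqu - sizeof enc_movups;
}
```

For a plain copy the data is moved bit for bit either way, so the shorter encoding is a pure code-size win.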
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
x86inc: Set ELF hidden visibility for global constants
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
x86inc: activate REP_RET automatically
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
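The check can be modelled in C as a position comparison. This is a sketch of the idea only, not the macro implementation: `Emitter` and its functions are hypothetical names, and `last_branch_end` plays the role of the label that the commit message says is placed after each branch.

```c
#include <stddef.h>

typedef struct {
    size_t pos;              /* current output position, counted in instructions */
    size_t last_branch_end;  /* position right after the most recent branch */
    int    have_branch;      /* set once any branch has been emitted */
} Emitter;

/* Any instruction that is neither a branch nor a return. */
static void emit_other(Emitter *e)
{
    e->pos++;
}

/* A branch records where it ends, like a label placed right after it. */
static void emit_branch(Emitter *e)
{
    e->pos++;
    e->last_branch_end = e->pos;
    e->have_branch = 1;
}

/* RET picks "rep ret" only when nothing was emitted since the last branch. */
static const char *emit_ret(Emitter *e)
{
    int after_branch = e->have_branch && e->last_branch_end == e->pos;
    e->pos++;
    return after_branch ? "rep ret" : "ret";
}
```

A return that is itself a branch target is not detected by this comparison, which matches the note that REP_RET is still needed manually in that case.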
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
* qatar/master:
avutil: Fix compilation with inline asm disabled on mingw
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '79aec43ce813a3e270743ca64fa3f31fa43df80b':
x86: Add and use more convenience macros to check CPU extension availability
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e':
avutil: Refactor CPU extension availability macros
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b':
avutil: Move internal CPU detection function declarations to private header
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The bug has been fixed in c8b920a9b7fa534a6141695ace4e8c2dfcd56cee by Loren Merritt
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee':
lls/x86: use 3-operator vaddpd in ADDPD_MEM
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This reverts commit 247425241cb3b2b76df1c2aced5ce0d56126b82d.
* qatar/master:
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It just segfaults on 32-bit, so it's disabled until someone fixes it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'b545179fdff1ccfbbb9d422e4e9720cb6c6d9191':
x86: lpc: simd av_evaluate_lls
Conflicts:
libavutil/x86/lls.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
1.5x-1.8x faster on Sandy Bridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The code doesn't build with yasm from Ubuntu 12.04
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>