We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this, a fifth
argument serving as a temporary register may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
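
The aliasing problem can be seen in a small register model. This Python sketch assumes a multiply-accumulate dst = a * b + c built from 2-operand steps (mov, mul, add); registers are dict entries, and naive_macc/emulate_macc are hypothetical helpers, not x86inc macros. When dst is the same register as c (the first argument equal to the fourth), the naive sequence overwrites the addend before reading it, so the product is built in a temporary instead.

```python
# Toy model: registers are dict entries and every step is a 2-operand
# "op dst, src" as in SSE. Illustration only, not x86inc's macro.

def naive_macc(regs, dst, a, b, c):
    """mov dst, a / mul dst, b / add dst, c: wrong when dst == c."""
    regs[dst] = regs[a]
    regs[dst] *= regs[b]
    regs[dst] += regs[c]
    return regs[dst]


def emulate_macc(regs, dst, a, b, c, tmp=None):
    """Emulate dst = a * b + c, using tmp when dst aliases the addend c."""
    if dst == c:
        if tmp is None or tmp in (dst, a, b):
            raise ValueError("dst == c needs a separate temporary register")
        regs[tmp] = regs[a]          # mov tmp, a
        regs[tmp] *= regs[b]         # mul tmp, b
        regs[dst] += regs[tmp]       # add dst, tmp (dst still holds c)
    else:
        if dst == b:
            a, b = b, a              # multiply commutes: avoid clobbering b
        regs[dst] = regs[a]          # mov dst, a (no-op if dst == a)
        regs[dst] *= regs[b]         # mul dst, b
        regs[dst] += regs[c]         # add dst, c
    return regs[dst]
```

With x=2, y=3 and z=10 used as both destination and addend, the naive sequence yields 12 instead of 16.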
* qatar/master:
x86: add detection for Bit Manipulation Instruction sets
Conflicts:
libavutil/x86/cpu.c
See: 0bc3de19ffe296254f214dc7615e624d8e401bcb
Merged-by: Michael Niedermayer <[email protected]>
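
Since Python cannot execute CPUID, this sketch only decodes register values assumed to have been read elsewhere. The bit positions (leaf 7, subleaf 0, EBX bit 3 for BMI1 and bit 8 for BMI2) are the documented CPUID assignments; decode_bmi is a hypothetical helper, and the actual libavutil/x86/cpu.c logic may apply further conditions.

```python
# Documented CPUID bit positions: leaf 7 (ECX=0), register EBX.
BMI1_BIT = 3
BMI2_BIT = 8


def decode_bmi(max_std_leaf, leaf7_ebx):
    """Return the BMI flags implied by CPUID values obtained elsewhere."""
    if max_std_leaf < 7:
        return set()                 # leaf 7 is not implemented on this CPU
    flags = set()
    if leaf7_ebx & (1 << BMI1_BIT):
        flags.add("BMI1")
    if leaf7_ebx & (1 << BMI2_BIT):
        flags.add("BMI2")
    return flags
```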
Based on x264 code
Signed-off-by: James Almer <[email protected]>
* commit '1b932eb1508f550fac9e911923a0383efda53aa3':
x86: add detection for FMA3 instruction set
Conflicts:
configure
libavutil/cpu.h
libavutil/x86/cpu.c
See: a2af8eddab75f1eac712411e4dde89823c0845e8
Merged-by: Michael Niedermayer <[email protected]>
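
FMA3 instructions are VEX-encoded and operate on YMM state, so a detection routine typically also checks that AVX is reported and that the OS saves XMM/YMM state (OSXSAVE plus XCR0 bits 1 and 2). A hedged Python sketch over CPUID/XGETBV values assumed to be obtained elsewhere; fma3_usable is hypothetical, and whether the merged code gates FMA3 exactly this way is an assumption.

```python
# Documented CPUID leaf 1 ECX bits and XCR0 state bits.
FMA_BIT = 12
OSXSAVE_BIT = 27
AVX_BIT = 28
XCR0_SSE_AVX_STATE = 0x6   # bit 1: XMM state, bit 2: YMM state


def fma3_usable(leaf1_ecx, xcr0):
    """xcr0 is assumed to be 0 when OSXSAVE is clear (XGETBV unavailable)."""
    def has(bit):
        return bool(leaf1_ecx & (1 << bit))

    os_saves_ymm = has(OSXSAVE_BIT) and (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE
    return has(AVX_BIT) and os_saves_ymm and has(FMA_BIT)
```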
Based on x264 code
Signed-off-by: James Almer <[email protected]>
Signed-off-by: James Almer <[email protected]>
vector_fmul and vector_fmac_scalar are guaranteed to be able to process input
in batches of 16 elements, but their SSE versions only process 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <[email protected]>
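
A minimal Python model of the unrolling, assuming vector_fmac_scalar computes dst[i] += src[i] * mul and that the length is a multiple of 16, as the commit states. The inner loops only stand in for 4-wide SSE registers; the function names are hypothetical, not FFmpeg's C reference code.

```python
LANES = 4    # one SSE register holds four single-precision floats
UNROLL = 4   # registers per iteration: 4 * 4 = 16 elements


def vector_fmac_scalar_ref(dst, src, mul):
    for i in range(len(dst)):
        dst[i] += src[i] * mul


def vector_fmac_scalar_unrolled(dst, src, mul):
    step = LANES * UNROLL
    if len(dst) % step:
        raise ValueError("length must be a multiple of 16")
    for base in range(0, len(dst), step):
        # Each r models one mulps + addps on a 4-lane register.
        for r in range(UNROLL):
            lo = base + r * LANES
            for i in range(lo, lo + LANES):
                dst[i] += src[i] * mul
```

Both versions compute the same result; covering 16 elements per outer iteration means the loop overhead is paid half as often as with 8. The quoted 299 to 261 change is a reduction of about 12.7% (38 / 299).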
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.
Signed-off-by: Diego Biurrun <[email protected]>
Based on x264 code
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
Based on x264 code
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
vector_fmul and vector_fmac_scalar are guaranteed to be able to process input
in batches of 16 elements, but their SSE versions only process 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <[email protected]>
Support the cases where the first and last operands of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
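
vpmacsdql (XOP) multiplies the low signed doubleword of each quadword (elements 0 and 2) and adds the products to a quadword accumulator without saturation. A Python sketch comparing a reference model with one possible non-XOP emulation, pmuldq (SSE4.1) followed by paddq; which sequence x86inc actually uses is an assumption here, and the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpmacsdql_ref(src1, src2, src3):
    """src1/src2: 4 dwords (element 0 first); src3: 2 qwords. Returns 2 qwords:
    signed dwords 0 and 2 multiplied, plus the accumulator, with wraparound."""
    return [to_signed(to_signed(src1[2 * q], 32) * to_signed(src2[2 * q], 32)
                      + src3[q], 64) for q in range(2)]


def pmuldq(a, b):
    """SSE4.1 pmuldq: signed dword elements 0 and 2 -> two signed qwords."""
    return [to_signed(a[i], 32) * to_signed(b[i], 32) for i in (0, 2)]


def paddq(a, b):
    """Packed 64-bit add with wraparound."""
    return [to_signed(x + y, 64) for x, y in zip(a, b)]


def vpmacsdql_emulated(src1, src2, src3):
    return paddq(pmuldq(src1, src2), src3)
```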
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
different ABI
Also remove a failed attempt at a compatibility layer; the code simply cannot work.
Signed-off-by: Michael Niedermayer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
libavutil: x86: Add AVX2 capable CPU detection.
Conflicts:
libavutil/cpu.c
libavutil/cpu.h
libavutil/x86/cpu.c
See: 865b70bc5d1cf37ec6d6cb729a69dda2cca28bd5
Merged-by: Michael Niedermayer <[email protected]>
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <[email protected]>
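
AVX2 is reported in CPUID leaf 7 (subleaf 0), EBX bit 5, and is only usable when AVX itself is (CPU support plus OS-enabled YMM state). A minimal sketch with the AVX check assumed to be precomputed by the caller; avx2_usable is a hypothetical name, not the libavutil function.

```python
AVX2_BIT = 5   # CPUID leaf 7 (ECX=0), EBX bit 5


def avx2_usable(avx_usable, max_std_leaf, leaf7_ebx):
    """avx_usable is assumed precomputed: the CPU reports AVX and the OS saves
    YMM state (OSXSAVE set and XCR0 bits 1 and 2 both set)."""
    return avx_usable and max_std_leaf >= 7 and bool(leaf7_ebx & (1 << AVX2_BIT))
```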
Signed-off-by: Michael Niedermayer <[email protected]>
* qatar/master:
x86: more AVX2 framework
Merged-by: Michael Niedermayer <[email protected]>
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
x86inc: FMA3/4 Support
Merged-by: Michael Niedermayer <[email protected]>
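
Background for the FMA3/4 distinction: AMD's FMA4 has a non-destructive four-operand form (dst = a * b + c), while FMA3 overwrites its first operand and offers three operand orderings (132, 213, 231). This Python sketch shows one way to pick an FMA3 form for an FMA4-style request depending on which source dst aliases; lower_fmaddps and run are hypothetical, and x86inc's actual macro may choose differently. Integer register values keep the arithmetic exact, whereas real FMA rounds once.

```python
# FMA3 overwrites its first operand. Intel's digit naming:
#   132: op1 = op1 * op3 + op2
#   213: op1 = op2 * op1 + op3
#   231: op1 = op2 * op3 + op1
FMA3 = {
    "vfmadd132ps": lambda o1, o2, o3: o1 * o3 + o2,
    "vfmadd213ps": lambda o1, o2, o3: o2 * o1 + o3,
    "vfmadd231ps": lambda o1, o2, o3: o2 * o3 + o1,
}


def lower_fmaddps(dst, a, b, c):
    """Hypothetical lowering of FMA4-style 'vfmaddps dst, a, b, c'
    (dst = a * b + c) to FMA3, choosing the form by register aliasing."""
    if dst == c:
        return [("vfmadd231ps", dst, a, b)]   # dst = a * b + dst
    if dst == a:
        return [("vfmadd213ps", dst, b, c)]   # dst = b * dst + c
    if dst == b:
        return [("vfmadd213ps", dst, a, c)]   # dst = a * dst + c
    return [("movaps", dst, a), ("vfmadd213ps", dst, b, c)]


def run(program, regs):
    """Execute the lowered instructions on a dict of register values."""
    for op, *args in program:
        if op == "movaps":
            regs[args[0]] = regs[args[1]]
        else:
            o1, o2, o3 = args
            regs[o1] = FMA3[op](regs[o1], regs[o2], regs[o3])
    return regs
```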
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '206895708ea2b464755d340e44501daf9a07c310':
x86inc: Remove our FMA4 support
Merged-by: Michael Niedermayer <[email protected]>
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
x86inc: Use VEX-encoded instructions in AVX functions
Merged-by: Michael Niedermayer <[email protected]>
Automatically use VEX encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <[email protected]>
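
A sketch of the idea behind writing 3-operand forms once and lowering them per instruction set: with VEX (AVX) the operation is non-destructive, while legacy SSE needs dst to equal the first source, possibly after a copy. lower_three_operand and its commutative flag are hypothetical; how x86inc handles dst aliasing the second source of a non-commutative instruction is an assumption (this sketch simply raises an error).

```python
def lower_three_operand(op, dst, src1, src2, avx, commutative):
    """Hypothetical lowering of a 3-operand 'op dst, src1, src2'."""
    if avx:
        return [f"v{op} {dst}, {src1}, {src2}"]   # VEX form is non-destructive
    if dst == src1:
        return [f"{op} {dst}, {src2}"]
    if dst == src2:
        if not commutative:
            # Assumption: refuse instead of silently requiring a temporary.
            raise ValueError(f"{op}: dst aliases src2 and {op} is not commutative")
        return [f"{op} {dst}, {src1}"]
    return [f"movaps {dst}, {src1}", f"{op} {dst}, {src2}"]
```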
* qatar/master:
x86inc: Remove .rodata kludges
Conflicts:
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <[email protected]>
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
x86inc: remove misaligned cpu flag
Merged-by: Michael Niedermayer <[email protected]>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '71155665414b551ad350622d5abed20e58371fbf':
x86inc: various minor backports from x264
Merged-by: Michael Niedermayer <[email protected]>
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
Merged-by: Michael Niedermayer <[email protected]>
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <[email protected]>
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <[email protected]>
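
The arithmetic behind this: the Win64 caller reserves 32 bytes of shadow space directly above the return address, and xmm6-xmm15 are callee-saved, so exactly two 16-byte registers (XMM6 and XMM7) fit there. Because rsp is 16-byte aligned before the call, it is 8 mod 16 on entry, which makes rsp+8 and rsp+24 aligned slots. A Python sketch; the helper names are hypothetical and the real stack layout in x86inc may differ.

```python
SHADOW_SPACE = 32            # bytes the Win64 caller reserves for the callee
XMM_BYTES = 16
FIRST_CALLEE_SAVED_XMM = 6   # xmm6-xmm15 must be preserved on Win64


def xmm_save_plan(num_xmm_regs):
    """For a function using xmm0..xmm{n-1}: how many callee-saved registers
    fit in the shadow space, and how many extra stack bytes the rest need
    (alignment padding not included)."""
    to_save = max(0, num_xmm_regs - FIRST_CALLEE_SAVED_XMM)
    in_shadow = min(to_save, SHADOW_SPACE // XMM_BYTES)
    return in_shadow, (to_save - in_shadow) * XMM_BYTES


def shadow_slots_aligned(rsp_at_entry):
    """The shadow space starts just above the return address, at rsp + 8."""
    return [(rsp_at_entry + 8 + i * XMM_BYTES) % 16 == 0
            for i in range(SHADOW_SPACE // XMM_BYTES)]
```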
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
x86inc: create xm# and ym#, analogous to m#
Merged-by: Michael Niedermayer <[email protected]>
For when we want to mix SIMD sizes within one function.
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
x86inc: fix some corner cases of SWAP
Merged-by: Michael Niedermayer <[email protected]>
Two cases used to produce the wrong permutation:
SWAP with >=3 named (rather than numbered) args, and
PERMUTE followed by SWAP with 2 named args.
Signed-off-by: Derek Buitenhuis <[email protected]>
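
In x86inc, m0, m1, ... are names mapped to physical registers, and SWAP/PERMUTE rewrite that mapping at assembly time instead of emitting moves. A minimal Python model of the intended semantics, assuming SWAP a, b, c is a chain of pairwise swaps (a, b) then (b, c), and PERMUTE takes (destination, source) pairs applied simultaneously. It covers only numbered arguments, and swap/permute are hypothetical names.

```python
def swap(mapping, *names):
    """Chain swap: SWAP a, b, c acts like SWAP a, b followed by SWAP b, c."""
    for x, y in zip(names, names[1:]):
        mapping[x], mapping[y] = mapping[y], mapping[x]


def permute(mapping, *pairs):
    """PERMUTE a, b, c, d: simultaneously m_a <- old m_b and m_c <- old m_d."""
    old = dict(mapping)
    for dst, src in zip(pairs[0::2], pairs[1::2]):
        mapping[dst] = old[src]
```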
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
x86inc: Use SSE instead of SSE2 for copying data
Merged-by: Michael Niedermayer <[email protected]>
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <[email protected]>
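
The saved byte comes from the mandatory prefix: movdqa and movdqu need a 0x66 or 0xF3 prefix byte that movaps and movups do not. This sketch hand-encodes the register-to-register forms; it covers only xmm0-xmm7 (no REX prefix), and encode_reg_reg is a hypothetical helper.

```python
# Legacy-SSE opcode bytes for "op xmm, xmm/m128" (load direction).
OPCODES = {
    "movaps": [0x0F, 0x28],
    "movups": [0x0F, 0x10],
    "movdqa": [0x66, 0x0F, 0x6F],   # mandatory 0x66 prefix
    "movdqu": [0xF3, 0x0F, 0x6F],   # mandatory 0xF3 prefix
}


def encode_reg_reg(mnemonic, dst, src):
    """Encode 'mnemonic xmm<dst>, xmm<src>' for xmm0-xmm7 (no REX prefix)."""
    if not (0 <= dst < 8 and 0 <= src < 8):
        raise ValueError("xmm8-xmm15 would need a REX prefix")
    modrm = 0xC0 | (dst << 3) | src   # mod=11 (register), reg=dst, rm=src
    return bytes(OPCODES[mnemonic] + [modrm])
```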
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
x86inc: Set ELF hidden visibility for global constants
Merged-by: Michael Niedermayer <[email protected]>
Signed-off-by: Derek Buitenhuis <[email protected]>
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
x86inc: activate REP_RET automatically
Merged-by: Michael Niedermayer <[email protected]>
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <[email protected]>
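
Background, not stated in the commit: AMD K8/K10-era branch predictors handle a one-byte ret poorly when it directly follows a branch or is a branch target, and the two-byte rep ret sidesteps that. The toy emitter below sketches the automatic check: each branch records the address just past itself (presumably the role of the spurious labels mentioned above), and ret emits rep ret only when it lands exactly at that address. The class, the fixed instruction sizes and the shift expression (a branch-free way to turn the comparison into a 0/1 repeat count) are assumptions for illustration, not x86inc's macros.

```python
class Emitter:
    """Toy assembler that tracks the address just past the last branch."""

    def __init__(self):
        self.lines = []
        self.addr = 0                # current address, like NASM's $
        self.last_branch_adr = -1    # address right after the latest branch

    def emit(self, text, size):
        self.lines.append(text)
        self.addr += size

    def branch(self, mnemonic, target):
        self.emit(f"{mnemonic} {target}", 2)   # assume a 2-byte short jump
        self.last_branch_adr = self.addr       # the "label" after the branch

    def ret(self):
        # 1 when ret sits directly after the last branch, else 0
        # (Python's >> is an arithmetic shift, so a negative gap gives -1).
        rep_count = ((self.last_branch_adr - self.addr) >> 31) + 1
        if rep_count:
            self.emit("rep ret", 2)
        else:
            self.emit("ret", 1)
```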
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
* qatar/master:
avutil: Fix compilation with inline asm disabled on mingw
Merged-by: Michael Niedermayer <[email protected]>
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <[email protected]>