| Commit message | Author | Age | Files | Lines |
|\
| | |
* commit '79793f833784121d574454af4871866576c0749d':
Update Fiona's name in copyright statements.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
|
| | |
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
It was lost during the port.
Should fix fate on 3dnowext machines.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
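The figures above are easier to compare as ratios. The following is only a small sketch that divides each older-version count by the matching AVX2 count; the struct, the table, and the function names are made up for illustration and are not part of FFmpeg. Only the numbers are copied from the message.

```c
#include <assert.h>
#include <stdio.h>

/* One before/after pair copied from the benchmark figures above. */
struct bench {
    const char *name;   /* predictor and the two instruction sets compared */
    int before;         /* decicycles, older SIMD version */
    int after;          /* decicycles, AVX2 version */
};

/* Hypothetical table: only the numbers come from the commit message. */
const struct bench benches[] = {
    { "ipred_dc_32x32      (ssse3 -> avx2)", 1219,  439 },
    { "ipred_dc_top_32x32  (ssse3 -> avx2)", 3570, 2494 },
    { "ipred_dc_left_32x32 (ssse3 -> avx2)", 1419,  717 },
    { "ipred_tm_32x32      (avx -> avx2)",   2737, 2088 },
    { "ipred_v_32x32       (avx -> avx2)",   3090, 2226 },
    { "ipred_h_32x32       (avx -> avx2)",   1565,  922 },
};

/* Ratio of the old cost to the new cost; above 1.0 means AVX2 is faster. */
double speedup(int before, int after)
{
    assert(before > 0 && after > 0);
    return (double)before / (double)after;
}

/* Print one line per predictor with its speedup factor. */
void print_speedups(void)
{
    size_t n = sizeof(benches) / sizeof(benches[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-38s %.2fx\n", benches[i].name,
               speedup(benches[i].before, benches[i].after));
}
```

Calling print_speedups() from a main() lists the six factors. Note that the run counts differ widely between predictors (512 to 131070), so the ratios are not equally reliable.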
|
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
No actual changes to the assembly.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Also port relevant AVX2/XOP optimizations from x264, with permission
from the corresponding authors to relicense to LGPL.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
~6% faster SSE2 performance. AVX/FMA3 are unaffected.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
The mova is unnecessary
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
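For context, "OS support" for AVX conventionally means that the OS has enabled XSAVE and set the SSE and AVX state bits in XCR0. The sketch below models that check as a pure function over values the caller has already read with CPUID and XGETBV. The function and macro names are hypothetical, and this is not the code from libavutil/x86/cpu.c.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28)  /* CPU implements AVX */

/* XCR0 state-component bits (the register read by XGETBV with ECX = 0). */
#define XCR0_SSE_STATE (UINT64_C(1) << 1)       /* XMM state saved by the OS */
#define XCR0_AVX_STATE (UINT64_C(1) << 2)       /* upper YMM state saved by the OS */

/*
 * Hypothetical helper: returns 1 when AVX is both implemented by the CPU
 * and enabled by the OS, 0 otherwise. XGETBV is only valid when OSXSAVE is
 * set, so a caller should pass xcr0 = 0 when that bit is clear.
 */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_cpu = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_os  = XCR0_SSE_STATE | XCR0_AVX_STATE;

    assert(need_cpu != 0 && need_os != 0);
    if ((cpuid1_ecx & need_cpu) != need_cpu)
        return 0;
    return (xcr0 & need_os) == need_os;
}
```

Once a flag such as AV_CPU_FLAG_AVX has been gated this way, later code can test the flag alone without repeating the OS check, which appears to be the point of the message above.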
|
| | |
Symbol references in inline asm are not supported.
This is part of the patch set for Intel C inline asm support on Windows.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
~7% faster than AVX
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
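The aliasing problem described above can be shown with a scalar model in C. This is a sketch, not the x86inc macro: it assumes a multiply-accumulate of the form dst = a * b + c on four 32-bit lanes (as in XOP's pmacsdd), and all type and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 128-bit register holding four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec4;

/* Reference semantics: every source is read before dst is written. */
void pmacsdd_ref(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a->lane[i] * b->lane[i] + c->lane[i];
    *dst = r;
}

/*
 * Naive two-instruction emulation: multiply into dst, then add c.
 * Wrong when dst and c are the same register, because the multiply
 * overwrites c before the add reads it.
 */
void pmacsdd_naive(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c)
{
    for (int i = 0; i < 4; i++)      /* models: pmulld dst, a, b */
        dst->lane[i] = a->lane[i] * b->lane[i];
    for (int i = 0; i < 4; i++)      /* models: paddd dst, dst, c */
        dst->lane[i] += c->lane[i];
}

/*
 * Emulation with a fifth, temporary operand. tmp must not alias any
 * other operand; with it, dst may safely be the same register as c.
 */
void pmacsdd_tmp(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c,
                 vec4 *tmp)
{
    assert(tmp != dst && tmp != a && tmp != b && tmp != c);
    for (int i = 0; i < 4; i++)      /* models: pmulld tmp, a, b */
        tmp->lane[i] = a->lane[i] * b->lane[i];
    for (int i = 0; i < 4; i++)      /* models: paddd dst, tmp, c */
        dst->lane[i] = tmp->lane[i] + c->lane[i];
}

/* Lane-wise equality, used to compare an emulation with the reference. */
int vec4_equal(const vec4 *x, const vec4 *y)
{
    for (int i = 0; i < 4; i++)
        if (x->lane[i] != y->lane[i])
            return 0;
    return 1;
}
```

Passing the same vec4 as both dst and c makes pmacsdd_naive diverge from pmacsdd_ref, while pmacsdd_tmp still matches. This is why an emulation that cannot take a temporary does not preserve the instruction's semantics.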
|
|\|
| | |
* qatar/master:
x86: add detection for Bit Manipulation Instruction sets
Conflicts:
libavutil/x86/cpu.c
See: 0bc3de19ffe296254f214dc7615e624d8e401bcb
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\|
| | |
* commit '1b932eb1508f550fac9e911923a0383efda53aa3':
x86: add detection for FMA3 instruction set
Conflicts:
configure
libavutil/cpu.h
libavutil/x86/cpu.c
See: a2af8eddab75f1eac712411e4dde89823c0845e8
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
|
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| | |
vector_fmul and vector_fmac_scalar are guaranteed to be able to process
elements in batches of 16, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
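The unrolling described above can be sketched in plain C. This is an illustration rather than the actual assembly: fmac4 stands in for one 4-float SSE register operation, and the function names and the len % 16 assertion are assumptions derived from the "batches of 16" guarantee in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one SSE register's worth of work: 4 floats. */
void fmac4(float *dst, const float *src, float mul)
{
    for (int i = 0; i < 4; i++)
        dst[i] += src[i] * mul;
}

/* Before: two registers (8 floats) per loop iteration. */
void fmac_scalar_x8(float *dst, const float *src, float mul, size_t len)
{
    assert(len % 8 == 0);
    for (size_t i = 0; i < len; i += 8) {
        fmac4(dst + i,     src + i,     mul);
        fmac4(dst + i + 4, src + i + 4, mul);
    }
}

/*
 * After: four registers (16 floats) per iteration. The loop counter
 * update and branch now run half as often for the same amount of data.
 */
void fmac_scalar_x16(float *dst, const float *src, float mul, size_t len)
{
    assert(len % 16 == 0);
    for (size_t i = 0; i < len; i += 16) {
        fmac4(dst + i,      src + i,      mul);
        fmac4(dst + i + 4,  src + i + 4,  mul);
        fmac4(dst + i + 8,  src + i + 8,  mul);
        fmac4(dst + i + 12, src + i + 12, mul);
    }
}
```

Both versions produce the same results. The unrolled one is only valid because callers guarantee lengths that are multiples of 16.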
|
| | |
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
| | |
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Based on x264 code
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
vector_fmul and vector_fmac_scalar are guaranteed to be able to process
elements in batches of 16, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
|
| | |
different ABI
Also remove the failed attempt at a compatibility layer; the code simply cannot work.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| | |
* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
libavutil: x86: Add AVX2 capable CPU detection.
Conflicts:
libavutil/cpu.c
libavutil/cpu.h
libavutil/x86/cpu.c
See: 865b70bc5d1cf37ec6d6cb729a69dda2cca28bd5
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
| | |
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| | |
* qatar/master:
x86: more AVX2 framework
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
x86inc: FMA3/4 Support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* commit '206895708ea2b464755d340e44501daf9a07c310':
x86inc: Remove our FMA4 support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
x86inc: Use VEX-encoded instructions in AVX functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* qatar/master:
x86inc: Remove .rodata kludges
Conflicts:
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
x86inc: remove misaligned cpu flag
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|
|\|
| | |
* commit '71155665414b551ad350622d5abed20e58371fbf':
x86inc: various minor backports from x264
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| | |
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
|