path: root/libavutil/x86
Commit message (Author, Age, Files, Lines)
* Merge commit '79793f833784121d574454af4871866576c0749d' (Michael Niedermayer, 2014-07-01, 2 files, -2/+2)
|\
| |     * commit '79793f833784121d574454af4871866576c0749d':
| |       Update Fiona's name in copyright statements.
| |     Merged-by: Michael Niedermayer <[email protected]>
| * Update Fiona's name in copyright statements. (Diego Biurrun, 2014-07-01, 2 files, -2/+2)
| |
* | x86util: add and use RSHIFT/LSHIFT macros (Christophe Gisquet, 2014-06-15, 1 file, -0/+16)
| |     Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions.
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: add missing femms (James Almer, 2014-06-08, 1 file, -0/+3)
| |     It was lost during the port.
| |     Should fix fate on 3dnowext machines.
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: port vector_fmul_window to yasm (James Almer, 2014-06-08, 2 files, -73/+63)
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/vp9: inital AVX2 intra_pred (James Almer, 2014-06-08, 1 file, -0/+1)
| |     tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
| |
| |     1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
| |      439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
| |     3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
| |     2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
| |     1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
| |      717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
| |     2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
| |     2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
| |     3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
| |     2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
| |     1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
| |      922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
| |
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: hpeldsp: better factorization (Christophe Gisquet, 2014-05-29, 1 file, -1/+9)
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} (James Almer, 2014-05-28, 1 file, -0/+5)
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | inline asm: fix arrays as named constraints. (Matt Oliver, 2014-05-07, 1 file, -0/+6)
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: remove duplicated code from vector_dmul_scalar (James Almer, 2014-04-19, 1 file, -8/+3)
| |     Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
| |     No actual changes to the assembly.
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: move horizontal add macros to x86util (James Almer, 2014-04-17, 1 file, -0/+33)
| |     Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors
| |     Signed-off-by: James Almer <[email protected]>
| |     Reviewed-by: "Ronald S. Bultje" <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: unroll loop in vector_fmac_scalar (James Almer, 2014-04-16, 1 file, -18/+26)
| |     ~6% faster SSE2 performance. AVX/FMA3 are unaffected.
| |     Signed-off-by: James Almer <[email protected]>
| |     Reviewed-by: Christophe Gisquet <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: use SWAP in vector_fmac_scalar Win64 (James Almer, 2014-04-16, 1 file, -3/+3)
| |     The mova is unnecessary
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/cpu: check for OS support before enabling AVX2 (James Almer, 2014-03-25, 1 file, -1/+1)
| |     AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported. (Matt Oliver, 2014-03-18, 1 file, -1/+35)
| |     This is part of the patch-set for intel C inline asm on windows support
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 (James Almer, 2014-03-13, 2 files, -1/+31)
| |     ~7% faster than AVX
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil/timer: Fix units for x86 after c708b5403346255ea5adc776645616cc7c61f078 (Michael Niedermayer, 2014-03-09, 1 file, -0/+1)
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: Move XOP emulation to x86util (James Almer, 2014-02-24, 2 files, -19/+19)
| |     We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed.
| |     Emulation that doesn't obey the original instruction semantics can't be in x86inc.
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2014-02-23, 1 file, -5/+4)
|\|
| |     * qatar/master:
| |       x86: add detection for Bit Manipulation Instruction sets
| |     Conflicts:
| |       libavutil/x86/cpu.c
| |     See: 0bc3de19ffe296254f214dc7615e624d8e401bcb
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86: add detection for Bit Manipulation Instruction sets (James Almer, 2014-02-23, 1 file, -6/+11)
| |     Based on x264 code
| |     Signed-off-by: James Almer <[email protected]>
* | Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3' (Michael Niedermayer, 2014-02-23, 1 file, -1/+1)
|\|
| |     * commit '1b932eb1508f550fac9e911923a0383efda53aa3':
| |       x86: add detection for FMA3 instruction set
| |     Conflicts:
| |       configure
| |       libavutil/cpu.h
| |       libavutil/x86/cpu.c
| |     See: a2af8eddab75f1eac712411e4dde89823c0845e8
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86: add detection for FMA3 instruction set (James Almer, 2014-02-23, 2 files, -1/+7)
| |     Based on x264 code
| |     Signed-off-by: James Almer <[email protected]>
| * x86: add missing XOP checks and macros (James Almer, 2014-02-23, 1 file, -0/+3)
| |     Signed-off-by: James Almer <[email protected]>
| * x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-20, 1 file, -16/+24)
| |     vector_fmul and vector_fmac_scalar are guaranteed that they can process in batch of 16 elements, but their SSE versions only does 8 at a time. Therefore, unroll them a bit.
| |     299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| |     Signed-off-by: Janne Grunau <[email protected]>
| * x86inc: Speed up assembling with Yasm (Loren Merritt, 2014-01-26, 1 file, -23/+23)
| |     Work around Yasm's inefficiency with handling large numbers of variables in the global scope.
| |     Signed-off-by: Diego Biurrun <[email protected]>
* | x86: add detection for Bit Manipulation Instruction sets (James Almer, 2014-02-22, 1 file, -5/+11)
| |     Based on x264 code
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: add detection for FMA3 instruction set (James Almer, 2014-02-22, 2 files, -1/+7)
| |     Based on x264 code
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-15, 1 file, -16/+24)
| |     vector_fmul and vector_fmac_scalar are guaranteed that they can process in batch of 16 elements, but their SSE versions only does 8 at a time. Therefore, unroll them a bit.
| |     299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86inc: Extend FMA_INSTR functionality (James Almer, 2014-02-13, 1 file, -0/+4)
| |     Support the cases where the first and last operand of the XOP instruction are the same.
| |     Also add vpmacsdql emulation.
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: add missing XOP checks and macros (James Almer, 2014-02-11, 1 file, -0/+3)
| |     Signed-off-by: James Almer <[email protected]>
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | x86inc: speed up compilation with yasm (Loren Merritt, 2014-01-18, 1 file, -23/+23)
| |     Work around yasm's inefficiency with handling large numbers of variables in the global scope.
* | rename new lls code to lls2 to avoid conflict with the old which has a different ABI (Michael Niedermayer, 2013-11-17, 2 files, -8/+8)
| |     also remove failed attempt at a compatibility layer, the code simply cannot work
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil: rename lls to lls2 (Michael Niedermayer, 2013-11-17, 1 file, -1/+1)
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4' (Michael Niedermayer, 2013-10-26, 1 file, -2/+3)
|\|
| |     * commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
| |       libavutil: x86: Add AVX2 capable CPU detection.
| |     Conflicts:
| |       libavutil/cpu.c
| |       libavutil/cpu.h
| |       libavutil/x86/cpu.c
| |     See: 865b70bc5d1cf37ec6d6cb729a69dda2cca28bd5
| |     Merged-by: Michael Niedermayer <[email protected]>
| * libavutil: x86: Add AVX2 capable CPU detection. (Kieran Kunhya, 2013-10-25, 2 files, -0/+11)
| |     Patch based on x264's AVX2 detection
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection (Kieran Kunhya, 2013-10-26, 2 files, -0/+10)
| |     Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2013-10-14, 1 file, -0/+11)
|\|
| |     * qatar/master:
| |       x86: more AVX2 framework
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86: more AVX2 framework (Jason Garrett-Glaser, 2013-10-14, 1 file, -0/+11)
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497' (Michael Niedermayer, 2013-10-14, 2 files, -1/+45)
|\|
| |     * commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
| |       x86inc: FMA3/4 Support
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: FMA3/4 Support (Jason Garrett-Glaser, 2013-10-14, 2 files, -1/+45)
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge commit '206895708ea2b464755d340e44501daf9a07c310' (Michael Niedermayer, 2013-10-14, 2 files, -11/+16)
|\|
| |     * commit '206895708ea2b464755d340e44501daf9a07c310':
| |       x86inc: Remove our FMA4 support
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: Remove our FMA4 support (Derek Buitenhuis, 2013-10-14, 2 files, -11/+16)
| |     This is so we can sync to x264's version of FMA4 support.
| |     This partialy reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098' (Michael Niedermayer, 2013-10-14, 1 file, -84/+169)
|\|
| |     * commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
| |       x86inc: Use VEX-encoded instructions in AVX functions
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: Use VEX-encoded instructions in AVX functions (Henrik Gramner, 2013-10-14, 1 file, -84/+169)
| |     Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exists in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2.
| |     Also add support for AVX emulation of a few instructions that were missing before.
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2013-10-09, 1 file, -30/+5)
|\|
| |     * qatar/master:
| |       x86inc: Remove .rodata kludges
| |     Conflicts:
| |       libavutil/x86/x86inc.asm
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: Remove .rodata kludges (Henrik Gramner, 2013-10-09, 1 file, -11/+5)
| |     The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore.
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge commit '3e2fa991db7ef172579422accd61624d52777e5a' (Michael Niedermayer, 2013-10-08, 1 file, -5/+4)
|\|
| |     * commit '3e2fa991db7ef172579422accd61624d52777e5a':
| |       x86inc: remove misaligned cpu flag
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: remove misaligned cpu flag (Henrik Gramner, 2013-10-07, 1 file, -5/+4)
| |     Prevents a crash if the misaligned exception mask bit is cleared for some reason.
| |     Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is miniscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether.
| |     Signed-off-by: Derek Buitenhuis <[email protected]>
* | Merge commit '71155665414b551ad350622d5abed20e58371fbf' (Michael Niedermayer, 2013-10-08, 1 file, -3/+2)
|\|
| |     * commit '71155665414b551ad350622d5abed20e58371fbf':
| |       x86inc: various minor backports from x264
| |     Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: various minor backports from x264 (Jason Garrett-Glaser, 2013-10-07, 1 file, -4/+3)
| |     Small backports that sneaked into other asm commits in x264.
| |     Signed-off-by: Derek Buitenhuis <[email protected]>