path: root/libavutil/x86
Commit message (Author, Age, Files, Lines)
* Merge commit '79793f833784121d574454af4871866576c0749d' (Michael Niedermayer, 2014-07-01, 2 files, -2/+2)
|\
| |   * commit '79793f833784121d574454af4871866576c0749d':
| |     Update Fiona's name in copyright statements.
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * Update Fiona's name in copyright statements. (Diego Biurrun, 2014-07-01, 2 files, -2/+2)
| |
* | x86util: add and use RSHIFT/LSHIFT macros (Christophe Gisquet, 2014-06-15, 1 file, -0/+16)
| |   Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
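The byte-count convention from the RSHIFT/LSHIFT commit above can be illustrated with a small C sketch. It assumes that the MMX whole-register shifts (psrlq/psllq) expect a bit count while the SSE2 ones (psrldq/pslldq) expect a byte count. The enum and function names are hypothetical and are not taken from x86util.

```c
/* Hypothetical model of the operand conversion; not FFmpeg code. */
enum sketch_regsize { SKETCH_MMX_8BYTE, SKETCH_SSE2_16BYTE };

/* The caller always passes a byte count; the return value is the
 * immediate that the underlying shift instruction would be given. */
unsigned sketch_shift_operand(enum sketch_regsize size, unsigned bytes)
{
    if (size == SKETCH_MMX_8BYTE)
        return bytes * 8;   /* psrlq/psllq count bits */
    return bytes;           /* psrldq/pslldq count bytes */
}
```

With a single byte-based argument, code written once can be assembled for either register width without the caller adjusting the shift amount.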
* | x86/float_dsp: add missing femms (James Almer, 2014-06-08, 1 file, -0/+3)
| |   It was lost during the port. Should fix fate on 3dnowext machines.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/float_dsp: port vector_fmul_window to yasm (James Almer, 2014-06-08, 2 files, -73/+63)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/vp9: initial AVX2 intra_pred (James Almer, 2014-06-08, 1 file, -0/+1)
| |   tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz
| |
| |   1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
| |    439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips
| |   3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
| |   2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips
| |   1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
| |    717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips
| |   2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
| |   2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips
| |   3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
| |   2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips
| |   1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
| |    922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips
| |
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: hpeldsp: better factorization (Christophe Gisquet, 2014-05-29, 1 file, -1/+9)
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} (James Almer, 2014-05-28, 1 file, -0/+5)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | inline asm: fix arrays as named constraints. (Matt Oliver, 2014-05-07, 1 file, -0/+6)
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/float_dsp: remove duplicated code from vector_dmul_scalar (James Almer, 2014-04-19, 1 file, -8/+3)
| |   Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
| |   No actual changes to the assembly.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: move horizontal add macros to x86util (James Almer, 2014-04-17, 1 file, -0/+33)
| |   Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/float_dsp: unroll loop in vector_fmac_scalar (James Almer, 2014-04-16, 1 file, -18/+26)
| |   ~6% faster SSE2 performance. AVX/FMA3 are unaffected.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/float_dsp: use SWAP in vector_fmac_scalar Win64 (James Almer, 2014-04-16, 1 file, -3/+3)
| |   The mova is unnecessary
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/cpu: check for OS support before enabling AVX2 (James Almer, 2014-03-25, 1 file, -1/+1)
| |   AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
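The dependency described in the commit above (AVX2 is only reported once AVX has passed the OS-support check) can be sketched in C as a pure function over raw CPUID/XGETBV values. This is an illustration under stated assumptions, not the code in libavutil/x86/cpu.c: the flag constants and the function name are hypothetical, and the caller is assumed to have already read the registers.

```c
#include <stdint.h>

/* Hypothetical flag values for this sketch; not FFmpeg's AV_CPU_FLAG_* constants. */
enum { SKETCH_FLAG_AVX = 1 << 0, SKETCH_FLAG_AVX2 = 1 << 1 };

/*
 * ecx_leaf1: ECX from CPUID leaf 1 (bit 27 = OSXSAVE, bit 28 = AVX)
 * ebx_leaf7: EBX from CPUID leaf 7, subleaf 0 (bit 5 = AVX2)
 * xcr0:      XGETBV result for ECX = 0 (bit 1 = XMM state, bit 2 = YMM state)
 */
int sketch_detect_avx(uint32_t ecx_leaf1, uint32_t ebx_leaf7, uint64_t xcr0)
{
    const uint32_t osxsave_and_avx = (1u << 27) | (1u << 28);
    int flags = 0;

    /* AVX needs CPU support and an OS that saves XMM and YMM state. */
    if ((ecx_leaf1 & osxsave_and_avx) == osxsave_and_avx && (xcr0 & 6) == 6)
        flags |= SKETCH_FLAG_AVX;

    /* AVX2 is gated on the AVX flag, so it inherits the OS-support check. */
    if ((flags & SKETCH_FLAG_AVX) && (ebx_leaf7 & (1u << 5)))
        flags |= SKETCH_FLAG_AVX2;

    return flags;
}
```

Gating on the already-computed AVX flag avoids repeating the XGETBV test and keeps AVX2 from being reported on systems where the OS does not preserve YMM registers.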
* | Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported. (Matt Oliver, 2014-03-18, 1 file, -1/+35)
| |   This is part of the patch set for Intel C inline asm support on Windows.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 (James Almer, 2014-03-13, 2 files, -1/+31)
| |   ~7% faster than AVX
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil/timer: Fix units for x86 after c708b5403346255ea5adc776645616cc7c61f078 (Michael Niedermayer, 2014-03-09, 1 file, -0/+1)
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: Move XOP emulation to x86util (James Almer, 2014-02-24, 2 files, -19/+19)
| |   We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed.
| |   Emulation that doesn't obey the original instruction semantics can't be in x86inc.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
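The aliasing problem mentioned in the commit above can be illustrated with a C model of a four-operand multiply-accumulate (dst = a * b + c per 32-bit lane) emulated as a separate multiply and add. When dst and c are the same register, the multiply would overwrite the accumulator before it is added, so the product goes to a temporary instead. The type and function names are hypothetical; this is a sketch of the idea, not the x86util macro.

```c
#include <stdint.h>

/* Hypothetical 4-lane register model for this sketch. */
typedef struct { uint32_t v[4]; } sketch_vec4;

/* Emulates dst = a * b + c in two steps, using tmp when dst aliases c. */
void sketch_emulated_macc(sketch_vec4 *dst, const sketch_vec4 *a,
                          const sketch_vec4 *b, const sketch_vec4 *c,
                          sketch_vec4 *tmp)
{
    /* If dst is also the accumulator, keep it intact during the multiply. */
    sketch_vec4 *prod = (dst == c) ? tmp : dst;

    for (int i = 0; i < 4; i++)          /* multiply step */
        prod->v[i] = a->v[i] * b->v[i];
    for (int i = 0; i < 4; i++)          /* add step */
        dst->v[i] = prod->v[i] + c->v[i];
}
```

The extra tmp parameter changes the operand list compared with the native instruction, which matches the commit's point that such an emulation does not follow the original instruction semantics.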
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2014-02-23, 1 file, -5/+4)
|\|
| |   * qatar/master:
| |     x86: add detection for Bit Manipulation Instruction sets
| |   Conflicts:
| |     libavutil/x86/cpu.c
| |   See: 0bc3de19ffe296254f214dc7615e624d8e401bcb
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add detection for Bit Manipulation Instruction sets (James Almer, 2014-02-23, 1 file, -6/+11)
| |   Based on x264 code
| |   Signed-off-by: James Almer <jamrial@gmail.com>
* | Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3' (Michael Niedermayer, 2014-02-23, 1 file, -1/+1)
|\|
| |   * commit '1b932eb1508f550fac9e911923a0383efda53aa3':
| |     x86: add detection for FMA3 instruction set
| |   Conflicts:
| |     configure
| |     libavutil/cpu.h
| |     libavutil/x86/cpu.c
| |   See: a2af8eddab75f1eac712411e4dde89823c0845e8
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add detection for FMA3 instruction set (James Almer, 2014-02-23, 2 files, -1/+7)
| |   Based on x264 code
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| * x86: add missing XOP checks and macros (James Almer, 2014-02-23, 1 file, -0/+3)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| * x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-20, 1 file, -16/+24)
| |   vector_fmul and vector_fmac_scalar are guaranteed that they can process in batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit.
| |   299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| |   Signed-off-by: Janne Grunau <janne-libav@jannau.net>
| * x86inc: Speed up assembling with Yasm (Loren Merritt, 2014-01-26, 1 file, -23/+23)
| |   Work around Yasm's inefficiency with handling large numbers of variables in the global scope.
| |   Signed-off-by: Diego Biurrun <diego@biurrun.de>
* | x86: add detection for Bit Manipulation Instruction sets (James Almer, 2014-02-22, 1 file, -5/+11)
| |   Based on x264 code
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: add detection for FMA3 instruction set (James Almer, 2014-02-22, 2 files, -1/+7)
| |   Based on x264 code
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-15, 1 file, -16/+24)
| |   vector_fmul and vector_fmac_scalar are guaranteed that they can process in batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit.
| |   299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
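The unrolling described in the commit above can be sketched in plain C. The real change is in yasm assembly; here a 4-float helper stands in for one SSE register's worth of work, and the loop body handles 16 elements per iteration, relying on the stated guarantee that the length is a multiple of 16. Function names are hypothetical.

```c
#include <assert.h>

/* Scalar stand-in for one 4-float SSE multiply-accumulate. */
static void sketch_fmac4(float *dst, const float *src, float mul)
{
    for (int j = 0; j < 4; j++)
        dst[j] += src[j] * mul;
}

/* dst[i] += src[i] * mul, processing 16 elements per loop iteration. */
void sketch_vector_fmac_scalar(float *dst, const float *src, float mul, int len)
{
    assert(len >= 0 && len % 16 == 0);   /* the guarantee the unrolling relies on */

    for (int i = 0; i < len; i += 16) {
        sketch_fmac4(dst + i,      src + i,      mul);
        sketch_fmac4(dst + i + 4,  src + i + 4,  mul);
        sketch_fmac4(dst + i + 8,  src + i + 8,  mul);
        sketch_fmac4(dst + i + 12, src + i + 12, mul);
    }
}
```

Doing four register-sized steps per iteration halves the loop overhead compared with an 8-element body, which is the kind of saving the commit's cycle counts reflect.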
* | x86inc: Extend FMA_INSTR functionality (James Almer, 2014-02-13, 1 file, -0/+4)
| |   Support the cases where the first and last operand of the XOP instruction are the same.
| |   Also add vpmacsdql emulation.
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: add missing XOP checks and macros (James Almer, 2014-02-11, 1 file, -0/+3)
| |   Signed-off-by: James Almer <jamrial@gmail.com>
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86inc: speed up compilation with yasm (Loren Merritt, 2014-01-18, 1 file, -23/+23)
| |   Work around yasm's inefficiency with handling large numbers of variables in the global scope.
* | rename new lls code to lls2 to avoid conflict with the old which has a different ABI (Michael Niedermayer, 2013-11-17, 2 files, -8/+8)
| |   also remove failed attempt at a compatibility layer, the code simply cannot work
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | avutil: rename lls to lls2 (Michael Niedermayer, 2013-11-17, 1 file, -1/+1)
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4' (Michael Niedermayer, 2013-10-26, 1 file, -2/+3)
|\|
| |   * commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
| |     libavutil: x86: Add AVX2 capable CPU detection.
| |   Conflicts:
| |     libavutil/cpu.c
| |     libavutil/cpu.h
| |     libavutil/x86/cpu.c
| |   See: 865b70bc5d1cf37ec6d6cb729a69dda2cca28bd5
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * libavutil: x86: Add AVX2 capable CPU detection. (Kieran Kunhya, 2013-10-25, 2 files, -0/+11)
| |   Patch based on x264's AVX2 detection
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection (Kieran Kunhya, 2013-10-26, 2 files, -0/+10)
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2013-10-14, 1 file, -0/+11)
|\|
| |   * qatar/master:
| |     x86: more AVX2 framework
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: more AVX2 framework (Jason Garrett-Glaser, 2013-10-14, 1 file, -0/+11)
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497' (Michael Niedermayer, 2013-10-14, 2 files, -1/+45)
|\|
| |   * commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
| |     x86inc: FMA3/4 Support
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: FMA3/4 Support (Jason Garrett-Glaser, 2013-10-14, 2 files, -1/+45)
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge commit '206895708ea2b464755d340e44501daf9a07c310' (Michael Niedermayer, 2013-10-14, 2 files, -11/+16)
|\|
| |   * commit '206895708ea2b464755d340e44501daf9a07c310':
| |     x86inc: Remove our FMA4 support
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: Remove our FMA4 support (Derek Buitenhuis, 2013-10-14, 2 files, -11/+16)
| |   This is so we can sync to x264's version of FMA4 support.
| |   This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098' (Michael Niedermayer, 2013-10-14, 1 file, -84/+169)
|\|
| |   * commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
| |     x86inc: Use VEX-encoded instructions in AVX functions
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: Use VEX-encoded instructions in AVX functions (Henrik Gramner, 2013-10-14, 1 file, -84/+169)
| |   Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
| |   This change makes it easier to extend existing code to use AVX2.
| |   Also add support for AVX emulation of a few instructions that were missing before.
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2013-10-09, 1 file, -30/+5)
|\|
| |   * qatar/master:
| |     x86inc: Remove .rodata kludges
| |   Conflicts:
| |     libavutil/x86/x86inc.asm
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: Remove .rodata kludges (Henrik Gramner, 2013-10-09, 1 file, -11/+5)
| |   The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore.
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge commit '3e2fa991db7ef172579422accd61624d52777e5a' (Michael Niedermayer, 2013-10-08, 1 file, -5/+4)
|\|
| |   * commit '3e2fa991db7ef172579422accd61624d52777e5a':
| |     x86inc: remove misaligned cpu flag
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: remove misaligned cpu flag (Henrik Gramner, 2013-10-07, 1 file, -5/+4)
| |   Prevents a crash if the misaligned exception mask bit is cleared for some reason.
| |   Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether.
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* | Merge commit '71155665414b551ad350622d5abed20e58371fbf' (Michael Niedermayer, 2013-10-08, 1 file, -3/+2)
|\|
| |   * commit '71155665414b551ad350622d5abed20e58371fbf':
| |     x86inc: various minor backports from x264
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: various minor backports from x264 (Jason Garrett-Glaser, 2013-10-07, 1 file, -4/+3)
| |   Small backports that sneaked into other asm commits in x264.
| |   Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>