path: root/libavutil/x86
Commit message [Author, Age, Files, Lines]
* Add macros to x86util.asm. [Ivan Kalvachev, 2017-08-18, 1 file, -8/+98]
|   Improved version of VBROADCASTSS that works like the avx2 instruction.
|   Emulation of vpbroadcastd.
|   Horizontal sum HSUMPS that places the result in all elements.
|   Emulation of blendvps and pblendvb.
|
|   Signed-off-by: Ivan Kalvachev <[email protected]>
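The lane behaviour of two of these macros can be modelled in plain Python as a reading aid. This is a sketch, not the macro code: the function names are hypothetical, vectors are modelled as lists of lanes, and the AND/ANDN/OR fallback shown for the blend is an assumption about how such an emulation is commonly written when every mask lane is either all-ones or all-zeros.

```python
LANE_MASK = 0xFFFFFFFF  # one 32-bit lane


def hsum_all_lanes(v):
    """Model of a horizontal sum whose result is placed in every lane."""
    total = sum(v)
    return [total] * len(v)


def blendv_emulated(a, b, mask):
    """Per-lane select: take b where the mask lane is set, otherwise a.

    Assumption (not stated in the commit): the fallback computes
    (a AND NOT mask) OR (b AND mask), which only matches a real variable
    blend when each mask lane is all-ones or all-zeros.
    """
    return [((x & ~m) | (y & m)) & LANE_MASK for x, y, m in zip(a, b, mask)]
```

For example, `hsum_all_lanes([1.0, 2.0, 3.0, 4.0])` yields four copies of the sum, which is the property the commit message highlights for HSUMPS.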
* x86inc: don't use read-only data sections on COFF targets [James Almer, 2017-06-27, 1 file, -0/+2]
|   Yasm:
|   src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
|   src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
|
|   Nasm:
|   src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
|   src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
|
|   Tested-by: Clément Bœsch <[email protected]>
|   Signed-off-by: James Almer <[email protected]>
* build: Generalize yasm/nasm-related variable names [Diego Biurrun, 2017-06-21, 2 files, -4/+4]
|   None of them are specific to the YASM assembler.
|
|   (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
|   Signed-off-by: James Almer <[email protected]>
* x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} [James Almer, 2017-06-18, 1 file, -6/+9]
|   About 2x faster than the C version.
* x86inc: Add some additional cpuflag relations [Henrik Gramner, 2017-06-12, 1 file, -19/+19]
|   Simplifies writing assembly code that depends on available instructions.
|
|   LZCNT implies SSE2
|   BMI1 implies AVX+LZCNT
|   AVX2 implies BMI2
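The effect of such relations can be sketched as a transitive closure over an implication table. The table below contains only the three relations listed in this commit message; the names `IMPLIES` and `implied_flags` are hypothetical, and this is not how x86inc itself stores its flags.

```python
# Only the relations named in the commit message; any others are omitted.
IMPLIES = {
    "lzcnt": {"sse2"},
    "bmi1": {"avx", "lzcnt"},
    "avx2": {"bmi2"},
}


def implied_flags(flag):
    """Return the flag plus everything it implies, directly or indirectly."""
    seen = set()
    stack = [flag]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, ()))
    return seen
```

With this table, code written for BMI1 can also assume AVX, LZCNT and, through LZCNT, SSE2.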
* x86inc: Remove argument from WIN64_RESTORE_XMM [Anton Mitrofanov, 2017-06-09, 1 file, -9/+10]
|   The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.
* x86inc: Prefer r14/r15 over r12/r13 on x86-64 [Henrik Gramner, 2017-06-09, 1 file, -8/+8]
|   Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.
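The commit message does not spell out the peculiarity. As general x86-64 encoding background: in the ModR/M byte the rm value 100b is the escape for a SIB byte, and rm 101b with mod 00b selects a displacement-only form, so bases whose low three register bits are 4 or 5 cannot be encoded as a bare `[base]`. A small model, with a hypothetical function name and covering only `[base]` with no index and no displacement:

```python
def extra_encoding_bytes(base_reg):
    """Extra bytes needed to encode `[base]` (no index, no displacement).

    base_reg is the hardware register number (rax=0 ... r15=15).
    """
    low3 = base_reg & 7
    if low3 == 4:
        # rsp, r12: rm=100b means "SIB byte follows", so a SIB byte is needed.
        return 1
    if low3 == 5:
        # rbp, r13: mod=00b with rm=101b is the displacement-only form,
        # so the base must be encoded with an explicit zero disp8.
        return 1
    return 0
```

r14 and r15 have low bits 6 and 7, so they fall through to the zero-overhead case.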
* x86inc: Make REP_RET identical to RET in SSSE3+ functions [Henrik Gramner, 2017-06-09, 1 file, -1/+1]
|   There's no point in emitting a rep prefix before ret on modern CPUs.
* x86inc: Fix call with memory operands [Henrik Gramner, 2017-06-09, 1 file, -2/+6]
|   We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.
* x86/float_dsp: remove usage of integer instructions [James Almer, 2017-05-12, 1 file, -7/+7]
|
* x86/float_dsp: add ff_vector_fmul_reverse_avx2 [James Almer, 2017-04-11, 2 files, -1/+19]
|   ~20% faster than AVX.
|
|   Signed-off-by: James Almer <[email protected]>
* x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} [James Almer, 2017-04-10, 2 files, -0/+73]
|
* Merge commit '99434f4df81b6801b2b535d5b9143305595784f6' [Clément Bœsch, 2017-03-30, 1 file, -1/+1]
|\
| |   * commit '99434f4df81b6801b2b535d5b9143305595784f6':
| |     float_dsp: Have implementation match function pointer prototype
| |
| |   Merged-by: Clément Bœsch <[email protected]>
| * float_dsp: Have implementation match function pointer prototype [Diego Biurrun, 2016-11-03, 1 file, -1/+1]
| |   libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
| |   libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
* | Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8' [James Almer, 2017-03-23, 2 files, -4/+4]
|\|
| |   * commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
| |     emms: Give avpriv_emms_yasm() a more general name
| |
| |   Merged-by: James Almer <[email protected]>
| * emms: Give avpriv_emms_yasm() a more general name [Diego Biurrun, 2016-10-18, 2 files, -4/+4]
| |
* | Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4' [James Almer, 2017-03-23, 1 file, -2/+2]
|\|
| |   * commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
| |     x86: Add missing colons after assembly labels
| |
| |   Merged-by: James Almer <[email protected]>
| * x86: Add missing colons after assembly labels [Diego Biurrun, 2016-10-17, 1 file, -2/+2]
| |   This fixes many warnings of the sort
| |
| |   warning: label alone on a line without a colon might be in error
* | avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same [James Almer, 2017-03-21, 1 file, -0/+2]
| |   Reviewed-by: Henrik Gramner <[email protected]>
| |   Signed-off-by: James Almer <[email protected]>
* | Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793' [Clément Bœsch, 2017-03-20, 1 file, -0/+10]
|\|
| |   * commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
| |     x86util: Document SBUTTERFLY macro
| |
| |   Merged-by: Clément Bœsch <[email protected]>
| * x86util: Document SBUTTERFLY macro [Alexandra Hájková, 2016-09-19, 1 file, -0/+10]
| |   Signed-off-by: Luca Barbato <[email protected]>
* | Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5' [Clément Bœsch, 2017-03-20, 3 files, -0/+104]
|\|
| |   * commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
| |     imgutils: add a function for copying image data from GPU mapped memory
| |
| |   Merged-by: Clément Bœsch <[email protected]>
| * imgutils: add a function for copying image data from GPU mapped memory [Anton Khirnov, 2016-08-31, 3 files, -0/+104]
| |   See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
* | avcodec/h264: sse2, avx h luma mbaff deblock/loop filter [James Darnley, 2017-02-18, 1 file, -0/+15]
| |   x86-64 only
| |
| |   Yorkfield:
| |    - sse2: ~2.17x (434 vs. 200 cycles)
| |
| |   Nehalem:
| |    - sse2: ~2.94x (409 vs. 139 cycles)
| |
| |   Skylake:
| |    - sse2: ~3.10x (370 vs. 119 cycles)
| |    - avx: ~3.29x (370 vs. 112 cycles)
* | x86util: import MOVHL macro [James Darnley, 2017-02-18, 1 file, -0/+12]
| |   Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
| |
| |   x86: Avoid some bypass delays and false dependencies
| |
| |   A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
* | avcodec/x86: deduplicate PASS8ROWS macro [James Darnley, 2017-02-18, 1 file, -0/+5]
| |
* | Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537' [James Almer, 2017-01-31, 1 file, -0/+6]
|\|
| |   * commit '8e9cd81d291b1010c625b2766058aadf4affb537':
| |     x86: cpu: Detect Conroe CPUs and their slow shuffle unit
| |
| |   Merged-by: James Almer <[email protected]>
| * x86: cpu: Detect Conroe CPUs and their slow shuffle unit [Fiona Glaser, 2016-07-20, 1 file, -0/+6]
| |
* | Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332' [James Almer, 2017-01-31, 1 file, -0/+6]
|\|
| |   * commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
| |     x86: Add SSSE3_SLOW CPU flag and related convenience macros
| |
| |   Merged-by: James Almer <[email protected]>
| * x86: Add SSSE3_SLOW CPU flag and related convenience macros [Diego Biurrun, 2016-07-20, 1 file, -0/+6]
| |
| * x86util: Extend SPLATW for avx2 [James Almer, 2016-07-18, 1 file, -1/+3]
| |   Integration to Libav by Josh de Kock <[email protected]>.
| |
| |   Signed-off-by: Alexandra Hájková <[email protected]>
| * asm: FF_-prefix internal macros used in inline assembly [Diego Biurrun, 2016-05-28, 2 files, -35/+35]
| |   These macros conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.
| * x86inc: Enable AVX emulation in additional cases [Anton Mitrofanov, 2016-05-16, 1 file, -8/+13]
| |   Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
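The case described here can be illustrated with a simplified model of lowering a three-operand instruction to two-operand form. This is a sketch under assumptions: `emulate_3op` is a hypothetical helper, instructions are plain strings, and the single `mova` copy stands in for whatever move the real macros select for the operand type.

```python
def emulate_3op(op, dst, src1, src2, commutative):
    """Lower `op dst, src1, src2` to two-operand instructions (as strings)."""
    if dst == src1:
        # dst already holds the first source.
        return [f"{op} {dst}, {src2}"]
    if dst == src2:
        # Copying src1 into dst would clobber src2, so the operands must be
        # swapped instead, which is only valid for commutative operations.
        if not commutative:
            raise ValueError("dst == src2 requires a commutative instruction")
        return [f"{op} {dst}, {src1}"]
    # General case: copy the first source, then operate in place.
    return [f"mova {dst}, {src1}", f"{op} {dst}, {src2}"]
```

For the example in the message, `emulate_3op("addps", "m0", "m1", "m0", True)` reduces to the single instruction `addps m0, m1`.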
| * x86inc: Improve handling of %ifid with multi-token parameters [Anton Mitrofanov, 2016-05-16, 1 file, -2/+2]
| |   The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
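The difference between testing the first token and testing the whole parameter can be modelled as follows. This is only an illustration: both function names are hypothetical, the regular expression is a rough approximation of assembler identifier rules, and it does not show how the macro itself was changed.

```python
import re

# Rough approximation of an assembler identifier (an assumption, not the
# preprocessor's exact rule).
IDENT = re.compile(r"[A-Za-z_.?][A-Za-z0-9_.?$#@~]*")


def first_token_is_id(param):
    """Models a check that only looks at the first token of the parameter."""
    return IDENT.match(param.strip()) is not None


def whole_param_is_id(param):
    """Stricter check: the entire parameter must be a single identifier."""
    return IDENT.fullmatch(param.strip()) is not None
```

With these definitions `dword [rax]` passes the first-token check but fails the whole-parameter check, which is the mismatch the message describes.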
| * x86inc: Fix AVX emulation of some instructions [Anton Mitrofanov, 2016-05-16, 1 file, -20/+24]
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Fix AVX emulation of scalar float instructions [Henrik Gramner, 2016-05-16, 1 file, -14/+14]
| |   Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
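A lane-level model makes the point concrete. Vectors are modelled as Python lists and the function names simply mirror the instruction mnemonics; this is an illustration of the semantics, not assembler code.

```python
def addps(a, b):
    """Packed add: every lane is a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]


def addss(a, b):
    """Scalar add: lane 0 is a[0] + b[0]; the other lanes are copied from a."""
    return [a[0] + b[0]] + list(a[1:])
```

Swapping the operands of `addps` gives the same vector, while swapping the operands of `addss` changes which source supplies the upper lanes, so the operand-swap shortcut is not valid for the scalar form.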
* | x86inc: Avoid using eax/rax for storing the stack pointer [Henrik Gramner, 2017-01-09, 1 file, -0/+7]
| |   When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later.
| |
| |   If we choose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value.
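The arithmetic behind this can be sketched in a few lines. Everything here is hypothetical naming for illustration: `allocate_aligned` models the pointer adjustment, and `pick_scratch` models choosing a register for the saved pointer while skipping reserved ones such as rax; the candidate list is an assumption, not the order x86inc uses.

```python
def allocate_aligned(rsp, size, align):
    """Return (new_rsp, saved_rsp) for an over-aligned stack allocation."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    saved_rsp = rsp  # the original value must be kept to restore it later
    new_rsp = (rsp - size) & ~(align - 1)  # reserve space, then round down
    return new_rsp, saved_rsp


def pick_scratch(candidates, reserved):
    """Pick the first candidate register that is not reserved."""
    for reg in candidates:
        if reg not in reserved:
            return reg
    raise ValueError("no free scratch register")
```

Because rounding down discards the low bits of the pointer, the original value cannot be recomputed from the new one, which is why a copy has to live somewhere that survives until the function returns.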
* | avutil/x86/emms: Document the emms_c() vs alloc/free relation. [Michael Niedermayer, 2016-10-23, 1 file, -0/+2]
| |   Reviewed-by: Andreas Cadhalpun <[email protected]>
| |   Signed-off-by: Michael Niedermayer <[email protected]>
* | vp9: add 16x16 idct avx2 (8-bit). [Ronald S. Bultje, 2016-07-11, 1 file, -1/+68]
| |   checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
| |
| |   nop: 24.6
| |   vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
| |   vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
| |   vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
| |   vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
| |   vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
| |   vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
| |   vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
| |   vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
| |   vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
| |   vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
| |   vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
| |   vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
| |   vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
| |   vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
| |   vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
| |   vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
| |   vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
| |   vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
| |   vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
| |   vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
| |   vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
| |   vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
| |   vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
| |   vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
| |   vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
* | asm: FF_-prefix internal macros used in inline assembly [Matthieu Bouron, 2016-06-27, 2 files, -35/+35]
| |   See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
* | Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' [Clément Bœsch, 2016-06-21, 1 file, -1/+1]
|\|
| |   * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
| |     cosmetics: Fix spelling mistakes
| |
| |   Merged-by: Clément Bœsch <[email protected]>
| * cosmetics: Fix spelling mistakes [Vittorio Giovara, 2016-05-04, 1 file, -1/+1]
| |   Signed-off-by: Diego Biurrun <[email protected]>
| * x86: Add ymm_reg struct [James Almer, 2016-01-28, 1 file, -0/+1]
| |   Needed to declare 32-byte long constants
| |
| |   Signed-off-by: James Almer <[email protected]>
| |   Signed-off-by: Luca Barbato <[email protected]>
| * x86inc: Add debug symbols indicating sizes of compiled functions [Geza Lore, 2016-01-23, 1 file, -0/+23]
| |   Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
| |
| |   Currently only implemented for ELF.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Avoid creating unnecessary local labels [Henrik Gramner, 2016-01-23, 1 file, -2/+4]
| |   The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus.
| |
| |   Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Simplify AUTO_REP_RET [Henrik Gramner, 2016-01-23, 1 file, -4/+2]
| |   cpuflags is never undefined any more, it's set to 0 instead.
| |
| |   Also fix an incorrect comment.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Use more consistent indentation [Henrik Gramner, 2016-01-23, 1 file, -67/+67]
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Preserve arguments when allocating stack space [Henrik Gramner, 2016-01-23, 1 file, -2/+5]
| |   When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
| * x86inc: Improve FMA instruction handling [Henrik Gramner, 2016-01-23, 1 file, -40/+37]
| |    * Correctly handle FMA instructions with memory operands.
| |    * Print a warning if FMA instructions are used without the correct cpuflag.
| |    * Simplify the instantiation code.
| |    * Clarify documentation.
| |
| |   Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
| |
| |   Signed-off-by: Anton Khirnov <[email protected]>
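The reordering described here can be sketched as a small operand-selection routine. This is a model under assumptions, not the x86inc implementation: `fma4_to_fma3`, `is_mem` and `eval_fma3` are hypothetical helpers, operands are strings, a memory operand is anything starting with `[`, and the FMA4-style input `dst, a, b, c` means dst = a*b + c with dst equal to one of the sources. The three FMA3 operand orders used are 132 (dst = dst*src3 + src2), 213 (dst = src2*dst + src3) and 231 (dst = src2*src3 + dst).

```python
def is_mem(op):
    return op.startswith("[")


def fma4_to_fma3(dst, a, b, c):
    """Return (form, dst, src2, src3) computing dst = a*b + c.

    Only src3, the last operand, may be a memory operand.
    """
    if dst == c:
        # 231: dst = src2*src3 + dst. Multiplication commutes, so a memory
        # factor can always be moved into the last position.
        if is_mem(a):
            a, b = b, a
        return ("231", dst, a, b)
    if dst == b:
        a, b = b, a  # commutativity again: make dst the first factor
    if dst != a:
        raise ValueError("dst must equal one of the source operands")
    if is_mem(b):
        return ("132", dst, c, b)  # dst = dst*b + c, memory factor last
    return ("213", dst, b, c)      # dst = b*dst + c, addend may be memory


def eval_fma3(form, dst, src2, src3, values):
    """Evaluate an FMA3 form on scalar values, to check equivalence."""
    d, s2, s3 = values[dst], values[src2], values[src3]
    if form == "132":
        return d * s3 + s2
    if form == "213":
        return s2 * d + s3
    return s2 * s3 + d  # 231
```

For instance, `fma4_to_fma3("m0", "[mem]", "m0", "m1")` swaps the two factors and returns the 132 form with `[mem]` as the last operand.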
| * x86inc: Be more verbose in assertion failures [Henrik Gramner, 2016-01-23, 1 file, -1/+1]
| |   Signed-off-by: Anton Khirnov <[email protected]>