path: root/libavutil/x86
Commit message  Author  Age  Files  Lines
* lavu/intmath.h: Move x86 only msvc/icl functions to x86 specific header.Matt Oliver2015-10-191-0/+20
| | | | Signed-off-by: Matt Oliver <[email protected]>
* lavu/intmath.h: Add msvc/icl ctzll optimisations.Matt Oliver2015-10-191-0/+35
| | | | Signed-off-by: Matt Oliver <[email protected]>
* x86inc: Make cpuflag() and notcpuflag() return 0 or 1Henrik Gramner2015-10-011-2/+3
| | | | Makes it possible to use them in arithmetic expressions.
* avutil/attributes: add AV_GCC_VERSION_AT_MOSTJames Almer2015-09-181-4/+4
| | | | | Reviewed-by: Michael Niedermayer <[email protected]> Signed-off-by: James Almer <[email protected]>
* x86: port PSIGNW to cpuflagsJames Almer2015-09-111-5/+5
| | | | | Reviewed-by: Ronald S. Bultje <[email protected]> Signed-off-by: James Almer <[email protected]>
* avutil/x86/asm: rename REG_SP to REG_spGanesh Ajjanagadde2015-08-221-2/+3
| | | | | | | | | REG_SP is defined by Solaris system headers. This fixes a sea of warnings while building on Solaris: http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3 Signed-off-by: Ganesh Ajjanagadde <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* x86inc: warn if XOP integer FMA instruction emulation is impossibleAnton Mitrofanov2015-08-051-1/+3
| | | | Signed-off-by: Henrik Gramner <[email protected]>
* x86inc: Drop SECTION_TEXT macroHenrik Gramner2015-08-042-13/+1
| | | | | The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* x86inc: Support arbitrary stack alignmentsHenrik Gramner2015-08-041-22/+40
| | | | | | Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
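The align-then-allocate order described above can be modelled with plain pointer arithmetic. This is only a C sketch of the idea, not the ALLOC_STACK macro itself: `alloc_stack_aligned` is a hypothetical name, the stack pointer is passed as an integer, and rounding the size up to the alignment is an assumption made here so that the result stays aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "align first, then allocate" on a downward-growing
 * stack. Returns the new stack pointer value. */
static uintptr_t alloc_stack_aligned(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    /* The alignment must be a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Step 1: round the stack pointer down to the requested alignment. */
    sp &= ~(align - 1);

    /* Step 2 (assumption of this sketch): pad the size to a multiple of the
     * alignment so the pointer is still aligned after the subtraction. */
    size = (size + align - 1) & ~(align - 1);

    /* Step 3: allocate by moving the stack pointer down. */
    return sp - size;
}
```

With a starting value of 0x1008, a request of 40 bytes and a 32-byte alignment, the pointer is first rounded down to 0x1000 and then moved down by 64 bytes.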
* x86: move XOP emulation code back to x86incJames Almer2015-08-032-19/+16
| | | | | | | | | | Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <[email protected]> Signed-off-by: James Almer <[email protected]>
* x86inc: Various minor backports from x264Henrik Gramner2015-08-031-11/+21
| | | | | Reviewed-by: "Ronald S. Bultje" <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* x86inc: Disable vpbroadcastq workaround in newer yasm versionsHenrik Gramner2015-08-031-9/+11
| | | | | | | The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. Reviewed-by: "Ronald S. Bultje" <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* x86/float_dsp: add missing colon to labelsJames Almer2015-07-261-1/+1
| | | | | | Silences warnings with Nasm Signed-off-by: James Almer <[email protected]>
* avutil/x86/bswap: force inline asm versions with ICCJames Almer2015-07-181-1/+1
| | | | | | | | Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't optimize the generic C versions of av_bswap*() on their own. Reviewed-by: Michael Niedermayer <[email protected]> Signed-off-by: James Almer <[email protected]>
* Merge commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59'Michael Niedermayer2015-07-091-1/+6
|\ | | | | | | | | | | | | * commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59': x86: Serialize rdtsc in read_time() Merged-by: Michael Niedermayer <[email protected]>
| * x86: Serialize rdtsc in read_time()Henrik Gramner2015-07-091-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improves the accuracy of measurements, especially in short sections. To quote the Intel 64 and IA-32 Architectures Software Developer's Manual: "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC." SSE2 is a requirement for lfence so only use it on SSE2-capable systems. Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems. Signed-off-by: Luca Barbato <[email protected]>
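The LFENCE;RDTSC sequence quoted above can be sketched in C with compiler intrinsics. This is not the code from the commit: `read_time_serialized` is a hypothetical name, and the sketch assumes an x86-64 target (where SSE2, and therefore lfence, is always available) built with a GCC- or Clang-compatible compiler.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang header providing _mm_lfence() and __rdtsc() */

/* Hypothetical helper: read the time stamp counter only after all earlier
 * instructions have completed locally. */
static inline uint64_t read_time_serialized(void)
{
    _mm_lfence();     /* LFENCE: wait for preceding instructions to complete */
    return __rdtsc(); /* RDTSC: read the 64-bit time stamp counter */
}
```

A measurement would typically call the helper before and after the section of interest and subtract the two values. On 32-bit builds, a runtime or compile-time SSE2 check would be needed before using lfence, as the commit message notes.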
| * x86: check for AV_CPU_FLAG_AVXSLOW where usefulJames Almer2015-05-312-2/+2
| | | | | | | | | | Signed-off-by: James Almer <[email protected]> Signed-off-by: Luca Barbato <[email protected]>
* | avutil/x86/intmath: add missing check for inline assemblyJames Almer2015-06-271-1/+1
| | | | | | | | Signed-off-by: James Almer <[email protected]>
* | avutil/x86/intmath: use bzhi gcc builtin in av_mod_uintp2()James Almer2015-06-271-0/+7
| | | | | | | | Signed-off-by: James Almer <[email protected]>
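For context, av_mod_uintp2(a, p) reduces a modulo 2^p, which amounts to keeping the low p bits of a. The BMI2 bzhi instruction zeroes the bits from position p upwards and so gives the same result for p below 32. A portable C sketch of that operation follows; `mod_uintp2` is a hypothetical stand-in written for this illustration, and it assumes p is less than 32.

```c
#include <assert.h>

/* Hypothetical portable equivalent: keep the low p bits of a (a mod 2^p).
 * Assumes 0 <= p < 32 so that the shift is well defined. */
static inline unsigned mod_uintp2(unsigned a, unsigned p)
{
    assert(p < 32);
    return a & ((1U << p) - 1);
}
```

For example, `mod_uintp2(0x12345678, 16)` keeps the low 16 bits and yields 0x5678.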
* | x86: check for AV_CPU_FLAG_AVXSLOW where usefulJames Almer2015-06-012-3/+3
| | | | | | | | | | Signed-off-by: James Almer <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4'Michael Niedermayer2015-05-311-0/+18
|\| | | | | | | | | | | | | * commit 'cae39851201b7781f1262e1c23627b45e6e80bb4': x86: Add helper macros to check for slow cpuflags Merged-by: Michael Niedermayer <[email protected]>
| * x86: Add helper macros to check for slow cpuflagsJames Almer2015-05-311-0/+18
| | | | | | | | | | Signed-off-by: James Almer <[email protected]> Signed-off-by: Luca Barbato <[email protected]>
| * x86: add AV_CPU_FLAG_AVXSLOW flagJames Almer2015-05-311-3/+14
| | | | | | | | | | Signed-off-by: James Almer <[email protected]> Signed-off-by: Luca Barbato <[email protected]>
| * x86inc: Clear __SECT__Timothy Gu2015-05-281-1/+5
| | Silences warning(s) like: libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro `__SECT__`, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning. That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2]. That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`). [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4 [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3 Signed-off-by: Luca Barbato <[email protected]>
| * v210enc: Add SIMD optimised 8-bit and 10-bit encodersKieran Kunhya2014-12-051-0/+5
| | | | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]> Signed-off-by: Vittorio Giovara <[email protected]>
| * x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflagsHenrik Gramner2014-09-091-19/+22
| | | | | | | | | | | | Previously there was a limit of two cpuflags. Signed-off-by: Diego Biurrun <[email protected]>
| * x86inc: Free up variable name "n" in global namespaceLoren Merritt2014-09-091-9/+9
| | | | | | | | Signed-off-by: Diego Biurrun <[email protected]>
| * x86inc: Make ym# behave the same way as xm#Henrik Gramner2014-09-091-4/+4
| | | | | | | | | | | | This makes more sense for future implementations of templates with zmm registers. Signed-off-by: Diego Biurrun <[email protected]>
* | x86inc: Clear __SECT__Timothy Gu2015-05-281-1/+5
| | This commit silences warning(s) like: libavcodec/x86/fft.asm:93: warning: section flags ignored on section redeclaration The cause of this warning is that `struc` and `endstruc` attempt to revert to the previous section state [1]. The section state is stored in the macro `__SECT__`, defined by x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice (once in x86inc.asm, once during `endstruc`), causing the warning. That is the first part of the commit: using the primitive `[section]` format for .note.GNU-stack etc., which does not update `__SECT__` [2]. That fixes only half of the problem. Even without any `SECTION` directives, `__SECT__` is predefined as `.text`, which conflicts with the later `SECTION_TEXT` (which expands to `.text align=16`). [1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4 [2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3 Signed-off-by: Michael Niedermayer <[email protected]>
* | x86/cpu: add AV_CPU_FLAG_AVXSLOW flagJames Almer2015-05-271-3/+14
| | | | | | | | | | Reviewed-by: Michael Niedermayer <[email protected]> Signed-off-by: James Almer <[email protected]>
* | avutil/x86/Makefile: fix conditional x86/emms.o buildMichael Niedermayer2015-04-091-2/+2
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil/x86/Makefile: Make building and linking of emms.c conditionalRonald S. Bultje2015-04-081-1/+3
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | libavutil: add bmi2 optimized av_mod_uintp2James Almer2015-03-201-2/+22
| | | | | | | | | | Reviewed-by: Michael Niedermayer <[email protected]> Signed-off-by: James Almer <[email protected]>
* | pixelutils: Comment on (lack of) sad_8x8_sse2Peter Cordes2015-03-041-0/+6
| | | | | | | | Signed-off-by: Peter Cordes <[email protected]>
* | libavutil: add x86 optimized av_popcountJames Almer2015-02-251-0/+38
| | | | | | | | | | Reviewed-by: Ronald S. Bultje <[email protected]> Signed-off-by: James Almer <[email protected]>
* | x86inc: Correctly warn on use of SSE2 instructions in SSE functionsChristophe Gisquet2015-02-171-2/+4
| | | | | | | | | | | | | | | | SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2 instructions did not issue warnings when used in SSE functions. Handle it by also checking the register type when such instructions are used. Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: lavu/x264asm: fix ymm register instantiationChristophe Gisquet2015-02-041-1/+1
| | | | | | | | | | | | | | | | This mimics what is done for the other instruction sets. Tested-by: James Almer <[email protected]> Tested-by: Mickaël Raulet <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* | lavu/x86/x86inc: deprecate INIT_AVXJames Darnley2015-02-021-8/+0
| | | | | | | | | | | | The same can be done with INIT_XMM avx Signed-off-by: Michael Niedermayer <[email protected]>
* | x264asm: warn when inappropriate instruction used in function with specified cpuflagsAnton Mitrofanov2015-02-021-286/+295
| | | | | | | | | | | | | | Requested-by: Christophe Gisquet <[email protected]> Requested-by: "Ronald S. Bultje" <[email protected]>
* | x86/swr: add SSE2/AVX pack_8ch functionsJames Almer2014-12-301-0/+37
| | | | | | | | | | | | Reviewed-by: Michael Niedermayer <[email protected]> Reviewed-by: Ronald S. Bultje <[email protected]> Signed-off-by: James Almer <[email protected]>
* | v210enc: Add SIMD optimised 8-bit and 10-bit encodersKieran Kunhya2014-11-261-0/+5
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil/lls: Make unchanged function arguments constMichael Niedermayer2014-09-281-3/+3
| | | | | | | | | | Reviewed-by: Paul B Mahol <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil/x86/cpu: fix cpuid sub-leaf selectionlvqcl2014-09-271-1/+1
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflagsHenrik Gramner2014-09-051-19/+22
| | | | | | | | | | | | Previously there was a limit of two cpuflags. Signed-off-by: Michael Niedermayer <[email protected]>
* | x86inc: Make ym# behave the same way as xm#Henrik Gramner2014-09-051-4/+4
| | | | | | | | | | | | This makes more sense for future implementations of templates with zmm registers. Signed-off-by: Michael Niedermayer <[email protected]>
* | x86inc: free up variable name "n" in global namespaceLoren Merritt2014-09-051-9/+9
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil/pixelutils: faster pixelutils_sad_16x16Clément Bœsch2014-08-231-5/+11
| | | | | | | | | | | | 501 to 439 decicycles. See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
* | avutil/pixelutils: faster pixelutils_sad_[au]_16x16Clément Bœsch2014-08-231-5/+9
| | | | | | | | | | | | | | | | | | | | ~560 → ~500 decicycles This is following the comments from Michael in https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html Using 2 registers for accumulator didn't help. On the other hand, some re-ordering between the movs and psadbw allowed going ~538 to ~500.
* | drop LLS1, rename LLS2 to LLSMichael Niedermayer2014-08-092-9/+9
| | | | | | | | Signed-off-by: Michael Niedermayer <[email protected]>
* | avutil: add pixelutils APIClément Bœsch2014-08-054-0/+243