path: root/libavutil/x86
Commit message / Author / Age / Files / Lines
* x86: use the new helper macros where usefulJames Almer2016-02-142-2/+2
| | | | | Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
* x86: add some more helper macros to check for slow cpuflagsJames Almer2016-02-141-0/+4
| | | | | Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
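A rough sketch of what a "fast" cpuflag helper can look like in C: the extension counts as usable only when its flag is set and the matching "slow" flag is not. The `DEMO_*` names and the flag values are invented for this illustration and are not the macros or constants touched by these commits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch only; not FFmpeg's real values. */
#define DEMO_CPU_FLAG_AVX     0x1u
#define DEMO_CPU_FLAG_AVXSLOW 0x2u

/* True when the extension is present and not marked slow. */
#define DEMO_CPUEXT_FAST(flags, ext)              \
    ((((flags) & DEMO_CPU_FLAG_##ext) != 0) &&    \
     (((flags) & DEMO_CPU_FLAG_##ext##SLOW) == 0))

/* Returns 1 when the 256-bit code path should be selected, 0 otherwise. */
static int demo_use_256bit_path(uint32_t flags)
{
    return DEMO_CPUEXT_FAST(flags, AVX);
}

/* Usage: a CPU flagged AVX + AVXSLOW falls back to the narrower path. */
static void demo_slow_flag_check(void)
{
    assert(demo_use_256bit_path(DEMO_CPU_FLAG_AVX) == 1);
    assert(demo_use_256bit_path(DEMO_CPU_FLAG_AVX | DEMO_CPU_FLAG_AVXSLOW) == 0);
}
```

Wrapping the test in a macro keeps the call sites short, and token pasting lets one definition serve every extension that has a `SLOW` counterpart.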
* x86/cpu: set avxslow cpuflag on btver2 CPUsJames Almer2016-02-071-6/+4
| | | | | | | They are also slow when using 256-bit wide registers. Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/emms: empty the mmx state unconditionally on supported targetsJames Almer2016-02-041-0/+6
| | | | | Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
* all: Add missing header guardsTimothy Gu2016-01-281-0/+5
|
* x86inc: Add debug symbols indicating sizes of compiled functionsGeza Lore2016-01-211-0/+23
| | | | | | | | | Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata, e.g. this fixes `gdb` but not `perf`. Currently only implemented for ELF.
* x86inc: Avoid creating unnecessary local labelsHenrik Gramner2016-01-211-2/+4
| | | | | | | | | | The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus. Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything.
* x86inc: Simplify AUTO_REP_RETHenrik Gramner2016-01-211-4/+2
| | | | | | cpuflags is never undefined any more, it's set to 0 instead. Also fix an incorrect comment.
* x86inc: Use more consistent indentationHenrik Gramner2016-01-211-67/+67
|
* x86inc: Preserve arguments when allocating stack spaceHenrik Gramner2016-01-211-2/+5
| | | | | | When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments.
* x86inc: Improve FMA instruction handlingHenrik Gramner2016-01-211-40/+37
|     * Correctly handle FMA instructions with memory operands.
|     * Print a warning if FMA instructions are used without the correct cpuflag.
|     * Simplify the instantiation code.
|     * Clarify documentation.
|
|     Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
* x86inc: Be more verbose in assertion failuresHenrik Gramner2016-01-211-1/+1
|
* x86/intmath: disable sse av_clip functions when using ICCJames Almer2016-01-211-2/+2
| | | | | | | | | It seems to miscompile them. Should fix fate-ra-288 and fate-twinvq. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/fixed_dsp: add ff_butterflies_fixed_sse2James Almer2016-01-163-0/+85
| | | | | Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* lavu/x86/lls: add fma3 optimizations for update_llsGanesh Ajjanagadde2016-01-152-2/+61
|     This improves accuracy (very slightly) and speed for processors having fma3.
|
|     Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
|     old:
|       5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
|       5951528 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
|     new:
|       5252410 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
|       5232869 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
|
|     Tested with FATE and --disable-fma3, also examined contents of lavu/lls-test.
|
|     Reviewed-by: James Almer <jamrial@gmail.com>
|     Reviewed-by: Henrik Gramner <henrik@gramner.com>
|     Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
* x86/intmath: add missing early clobber to output operandsJames Almer2016-01-151-2/+2
| | | | Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: zero extend offset from ff_scalarproduct_float_sseJames Almer2016-01-081-3/+3
| | | | | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: zero extend len from ff_butterflies_float_sse implicitlyJames Almer2016-01-081-4/+1
| | | | | Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: remove len check from ff_butterflies_float_sseJames Almer2016-01-081-3/+0
| | | | | | | The function documentation explicitly mentions it needs to be a multiple of 4. Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86/intmath: add sse optimized av_clipf and av_clipdJames Almer2016-01-071-0/+33
| | | | | | Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/bswap: Remove warning about bswap intrinsics with msvc.Matt Oliver2015-11-231-0/+3
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/x86/intmath: Fix intrinsic header include when using newer gcc with older icc.Matt Oliver2015-11-121-1/+1
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/x86/bswap: Add msvc bswap instrinsics.Matt Oliver2015-11-121-1/+24
| | | | | | This adds msvc optimisations as well as fixing an error in icl whereby it will generate invalid code otherwise. Signed-off-by: Matt Oliver <protogonoi@gmail.com>
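For illustration, a 32-bit byte swap that uses a compiler intrinsic where one is known to exist and plain shifts otherwise might look like the sketch below. `demo_bswap32` is a made-up name; `_byteswap_ulong` is the MSVC intrinsic declared in `<stdlib.h>`. This is not the code added by the commit.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* declares _byteswap_ulong() on MSVC */
#endif

/* Reverse the byte order of a 32-bit value. */
static uint32_t demo_bswap32(uint32_t x)
{
#if defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(x);
#else
    /* Portable fallback: move each byte to its mirrored position. */
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
#endif
}

/* Usage: swapping twice yields the original value. */
static void demo_bswap_check(void)
{
    assert(demo_bswap32(0x11223344u) == 0x44332211u);
    assert(demo_bswap32(demo_bswap32(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```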
* avutil/x86/intmath: Disable use of tzcnt on older intel compilers.Matt Oliver2015-11-111-1/+1
| | | | | | | ICC versions older than at least 12.1.6 don't have the tzcnt intrinsics. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/x86/intmath: Correct intrinsic headers for older compilers.Matt Oliver2015-11-091-2/+6
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/x86/intmath: Add missing header.Matt Oliver2015-11-011-0/+3
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* avutil/x86/intmath: Use tzcnt in place of bsf.Matt Oliver2015-10-311-39/+15
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
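One behavioural difference between the two instructions is the zero input: `tzcnt` returns the operand width, while `bsf` leaves its destination undefined. A portable C model of the `tzcnt` behaviour, with the hypothetical name `demo_ctz32`, could be sketched like this (it is only a reference model, not the header's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; returns 32 for a zero input, as tzcnt does. */
static int demo_ctz32(uint32_t v)
{
    int n = 0;

    if (v == 0)
        return 32;
    while ((v & 1u) == 0) {   /* shift until the lowest set bit reaches bit 0 */
        v >>= 1;
        n++;
    }
    return n;
}

/* Usage: the result is the index of the lowest set bit. */
static void demo_ctz_check(void)
{
    assert(demo_ctz32(8u) == 3);
    assert(demo_ctz32(0u) == 32);
}
```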
* lavu: add AESNI CPU flagRodger Combs2015-10-283-6/+12
|
* lavu/intmath.h: Move x86 only msvc/icl functions to x86 specific header.Matt Oliver2015-10-191-0/+20
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* lavu/intmath.h: Add msvc/icl ctzll optimisations.Matt Oliver2015-10-191-0/+35
| | | | Signed-off-by: Matt Oliver <protogonoi@gmail.com>
* x86inc: Make cpuflag() and notcpuflag() return 0 or 1Henrik Gramner2015-10-011-2/+3
| | | | Makes it possible to use them in arithmetic expressions.
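The benefit of a strict 0-or-1 result can be shown with a small C analogue. The names and flag values below are hypothetical, and the real `cpuflag()` is an assembler macro, so this only mirrors the idea:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define DEMO_FLAG_SSE2 0x0010u
#define DEMO_FLAG_AVX  0x4000u

/* Returns exactly 1 when every bit of mask is set in flags, else 0. */
static unsigned demo_cpuflag(uint32_t flags, uint32_t mask)
{
    assert(mask != 0);
    return (flags & mask) == mask;
}

/* Arithmetic use: 16 bytes per vector, plus 16 more when AVX is available. */
static unsigned demo_vector_bytes(uint32_t flags)
{
    return 16u + 16u * demo_cpuflag(flags, DEMO_FLAG_AVX);
}
```

Had the test returned the raw masked value (for example `0x4000`), the multiplication above would produce a meaningless size; normalizing to 0 or 1 makes such expressions safe.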
* avutil/attributes: add AV_GCC_VERSION_AT_MOSTJames Almer2015-09-181-4/+4
| | | | | Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* x86: port PSIGNW to cpuflagsJames Almer2015-09-111-5/+5
| | | | | Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/asm: rename REG_SP to REG_spGanesh Ajjanagadde2015-08-221-2/+3
| | | | | | | | | REG_SP is defined by Solaris system headers. This fixes a sea of warnings while building on Solaris: http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3 Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86inc: warn if XOP integer FMA instruction emulation is impossibleAnton Mitrofanov2015-08-051-1/+3
| | | | Signed-off-by: Henrik Gramner <henrik@gramner.com>
* x86inc: Drop SECTION_TEXT macroHenrik Gramner2015-08-042-13/+1
| | | | | The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
* x86inc: Support arbitrary stack alignmentsHenrik Gramner2015-08-041-22/+40
| | | | | | Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
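The "align first, then allocate" order can be modelled with plain integer arithmetic. The sketch below assumes a downward-growing stack and a power-of-two alignment; `demo_alloc_stack` and its parameters are hypothetical and only mimic the idea, not the macro's actual expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the new stack pointer after aligning sp and reserving size bytes. */
static uintptr_t demo_alloc_stack(uintptr_t sp, size_t size, size_t align)
{
    /* The alignment must be a non-zero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Step 1: align the stack pointer downwards. */
    sp &= ~(uintptr_t)(align - 1);

    /* Step 2: round the size up to a multiple of the alignment so the
     * pointer is still aligned after the subtraction. */
    size = (size + align - 1) & ~(align - 1);

    return sp - size;
}
```

For example, `demo_alloc_stack(0x1008, 40, 32)` first aligns the pointer to `0x1000`, pads the request from 40 to 64 bytes, and returns `0xFC0`.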
* x86: move XOP emulation code back to x86incJames Almer2015-08-032-19/+16
| | | | | | | | | | Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
* x86inc: Various minor backports from x264Henrik Gramner2015-08-031-11/+21
| | | | | Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86inc: Disable vpbroadcastq workaround in newer yasm versionsHenrik Gramner2015-08-031-9/+11
| | | | | | | The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* x86/float_dsp: add missing colon to labelsJames Almer2015-07-261-1/+1
| | | | | | Silences warnings with Nasm. Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/bswap: force inline asm versions with ICCJames Almer2015-07-181-1/+1
| | | | | | | | Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't optimize the generic C versions of av_bswap*() on their own. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
* Merge commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59'Michael Niedermayer2015-07-091-1/+6
|\ | | | | | | | | | | | | * commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59': x86: Serialize rdtsc in read_time() Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Serialize rdtsc in read_time()Henrik Gramner2015-07-091-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improves the accuracy of measurements, especially in short sections. To quote the Intel 64 and IA-32 Architectures Software Developer's Manual: "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. If software requires RDTSC to be executed only after all previous instructions have completed locally, it can either use RDTSCP (if the processor supports that instruction) or execute the sequence LFENCE;RDTSC." SSE2 is a requirement for lfence so only use it on SSE2-capable systems. Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
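The `LFENCE;RDTSC` sequence quoted above can be approximated with compiler intrinsics. The sketch below assumes GCC or Clang (`<x86intrin.h>` providing `__rdtsc()` and `_mm_lfence()`), decides SSE2 availability at compile time via `__SSE2__`, and falls back to `clock()` on other architectures. All `demo_*` names are hypothetical, and this is not the inline assembly used by the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>      /* GCC/Clang: __rdtsc(), _mm_lfence() */
#define DEMO_HAVE_RDTSC 1
#else
#define DEMO_HAVE_RDTSC 0
#endif

/* Read a timestamp, waiting for earlier instructions to complete first. */
static uint64_t demo_read_time(void)
{
#if DEMO_HAVE_RDTSC
#if defined(__SSE2__)
    _mm_lfence();           /* lfence requires SSE2, so it is guarded */
#endif
    return __rdtsc();
#else
    return (uint64_t)clock();   /* coarse fallback for non-x86 targets */
#endif
}

/* Measure how many ticks a callback takes. */
static uint64_t demo_time_call(void (*fn)(void))
{
    uint64_t start;

    assert(fn != NULL);
    start = demo_read_time();
    fn();
    return demo_read_time() - start;
}
```

Without the fence, the counter read may be reordered ahead of the instructions being measured, which matters most for short sections, as the commit message notes.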
| * x86: check for AV_CPU_FLAG_AVXSLOW where usefulJames Almer2015-05-312-2/+2
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* | avutil/x86/intmath: add missing check for inline assemblyJames Almer2015-06-271-1/+1
| | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com>
* | avutil/x86/intmath: use bzhi gcc builtin in av_mod_uintp2()James Almer2015-06-271-0/+7
| | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com>
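BZHI clears the bits of its source from a given bit index upwards, which is the same as reducing an unsigned value modulo a power of two. Assuming `av_mod_uintp2(a, p)` keeps the low `p` bits of `a`, a portable reference version (hypothetical name `demo_mod_uintp2`) can be sketched as:

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low p bits of a, i.e. compute a mod 2^p, for p in [0, 31]. */
static uint32_t demo_mod_uintp2(uint32_t a, unsigned p)
{
    assert(p < 32);                 /* a shift by 32 would be undefined */
    return a & ((UINT32_C(1) << p) - 1u);
}
```

For instance, `demo_mod_uintp2(0xABCD, 8)` yields `0xCD`, and `p = 0` always yields 0.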
* | x86: check for AV_CPU_FLAG_AVXSLOW where usefulJames Almer2015-06-012-3/+3
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4'Michael Niedermayer2015-05-311-0/+18
|\| | | | | | | | | | | | | * commit 'cae39851201b7781f1262e1c23627b45e6e80bb4': x86: Add helper macros to check for slow cpuflags Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Add helper macros to check for slow cpuflagsJames Almer2015-05-311-0/+18
| | | | | | | | | | Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>