| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
They are also slow when using 256 bit wide registers
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
|
|
|
|
|
|
|
|
|
|
| |
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
|
|
|
|
|
|
| |
cpuflags is never undefined any more, it's set to 0 instead.
Also fix an incorrect comment.
|
| |
|
|
|
|
|
|
| |
When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
|
| |
|
|
|
|
|
|
|
|
|
| |
It seems to miscompile them
Should fix fate-ra-288 and fate-twinvq
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves accuracy (very slightly) and speed for processors having
fma3.
Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
old:
5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5951528 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
new:
5252410 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5232869 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
Tested with FATE and --disable-fma3, also examined contents of
lavu/lls-test.
Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
|
|
|
|
| |
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
| |
The function documentation explicitly mentions it needs to be a multiple of 4.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
|
|
| |
older icc.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
|
|
| |
This adds msvc optimisations as well as fixing an error in icl whereby it will generate invalid code otherwise.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
|
|
|
| |
ICC versions older than atleast 12.1.6 dont have the tzcnt intrinsics.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
| |
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
| |
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
|
|
|
|
| |
Makes it possible to use them in arithmetic expressions.
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
|
| |
REG_SP is defined by Solaris system headers.
This fixes a sea of warnings while building on Solaris:
http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
| |
Signed-off-by: Henrik Gramner <henrik@gramner.com>
|
|
|
|
|
| |
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
|
|
|
|
|
|
| |
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
|
|
|
|
|
|
|
|
|
|
| |
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
| |
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
|
| |
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
|
|
|
|
| |
Silences warnings with Nasm
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
|
|
|
|
|
|
| |
Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't
optimize the generic C versions of av_bswap*() on their own.
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|\
| |
| |
| |
| |
| |
| | |
* commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59':
x86: Serialize rdtsc in read_time()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Improves the accuracy of measurements, especially in short sections.
To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."
SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'cae39851201b7781f1262e1c23627b45e6e80bb4':
x86: Add helper macros to check for slow cpuflags
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|