| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Matt Oliver <[email protected]>
|
|
|
|
| |
Signed-off-by: Matt Oliver <[email protected]>
|
|
|
|
| |
Makes it possible to use them in arithmetic expressions.
|
|
|
|
|
| |
Reviewed-by: Michael Niedermayer <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ronald S. Bultje <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
REG_SP is defined by Solaris system headers.
This fixes a sea of warnings while building on Solaris:
http://fate.ffmpeg.org/report.cgi?time=20150820233505&slot=x86-opensolaris-gcc4.3
Signed-off-by: Ganesh Ajjanagadde <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
|
|
|
| |
Signed-off-by: Henrik Gramner <[email protected]>
|
|
|
|
|
| |
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
|
|
|
|
|
|
| |
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
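The ordering described above can be illustrated with plain pointer arithmetic in C. This is only a sketch of the "align first, then allocate" idea, not the ALLOC_STACK macro itself; the function name `align_then_alloc` and its parameters are hypothetical, and it assumes a downward-growing stack and a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models "align the stack pointer, then reserve space".
 * sp    - current stack pointer value (stack grows towards lower addresses)
 * size  - number of bytes to reserve
 * align - required alignment, assumed to be a power of two */
static inline uintptr_t align_then_alloc(uintptr_t sp, size_t size, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    sp &= ~(align - 1);   /* step 1: round the stack pointer down to the alignment */
    sp -= size;           /* step 2: reserve the requested space below it */
    return sp;
}
```

With a size that is a multiple of the alignment, the resulting pointer stays aligned: for example, `align_then_alloc(0x1003c, 0x40, 32)` first rounds down to `0x10020` and then subtracts `0x40`.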
|
| |
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
| |
Reviewed-by: "Ronald S. Bultje" <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
|
|
|
|
|
|
| |
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Reviewed-by: "Ronald S. Bultje" <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
|
|
|
|
|
| |
Silences warnings with Nasm
Signed-off-by: James Almer <[email protected]>
|
| |
Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't
optimize the generic C versions of av_bswap*() on their own.
Reviewed-by: Michael Niedermayer <[email protected]>
Signed-off-by: James Almer <[email protected]>
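For context, a generic C byte swap and a compiler-assisted variant might look like the sketch below. This is not the code from the commit: the names `bswap32_generic` and `bswap32` are hypothetical, and selecting `__builtin_bswap32` on the reported GCC version is an assumption made for illustration (the builtin exists in GCC and Clang; whether a given ICC release accepts it is not confirmed by the commit message).

```c
#include <assert.h>
#include <stdint.h>

/* Generic C version: relies on the compiler to recognise the shift/mask
 * pattern and turn it into a single bswap instruction. */
static inline uint32_t bswap32_generic(uint32_t x)
{
    return ((x << 24) & 0xff000000u) |
           ((x <<  8) & 0x00ff0000u) |
           ((x >>  8) & 0x0000ff00u) |
           ((x >> 24) & 0x000000ffu);
}

/* Assumption: use the builtin when the compiler reports GCC >= 4.5,
 * otherwise fall back to the generic version. */
static inline uint32_t bswap32(uint32_t x)
{
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
    return __builtin_bswap32(x);
#else
    return bswap32_generic(x);
#endif
}

/* Small self-check: both paths must agree. */
static inline void bswap32_selfcheck(void)
{
    assert(bswap32(0x11223344u) == bswap32_generic(0x11223344u));
}
```

Both functions return the same value; the difference is only in how much the result depends on the compiler's pattern matching.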
|
|\
| |
| |
| |
| |
| |
| | |
* commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59':
x86: Serialize rdtsc in read_time()
Merged-by: Michael Niedermayer <[email protected]>
|
| | |
Improves the accuracy of measurements, especially in short sections.
To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."
SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
Signed-off-by: Luca Barbato <[email protected]>
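The LFENCE;RDTSC sequence quoted above can be sketched in C with compiler intrinsics. This is a minimal illustration, not the `read_time()` implementation from the commit: the names `read_time_serialized` and `elapsed_ticks` are hypothetical, the intrinsics `_mm_lfence()` and `__rdtsc()` come from `<x86intrin.h>` as provided by GCC and Clang, and the non-x86 fallback using `timespec_get()` is an assumption added so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
#include <x86intrin.h>
#define HAVE_SERIALIZED_TSC 1
#else
#define HAVE_SERIALIZED_TSC 0
#endif

/* Hypothetical helper: read a timestamp only after earlier instructions
 * have completed locally. */
static inline uint64_t read_time_serialized(void)
{
#if HAVE_SERIALIZED_TSC
    _mm_lfence();        /* lfence needs SSE2, hence the guard above */
    return __rdtsc();    /* read the time-stamp counter */
#else
    /* Assumed fallback for other targets: wall-clock nanoseconds. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical usage: time a short section of code. */
static inline uint64_t elapsed_ticks(void (*fn)(void))
{
    assert(fn);
    uint64_t start = read_time_serialized();
    fn();
    uint64_t end = read_time_serialized();
    return end - start;
}
```

The fence matters most for short sections, where instructions drifting across an unserialized counter read can be a large fraction of the measured interval.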
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'cae39851201b7781f1262e1c23627b45e6e80bb4':
x86: Add helper macros to check for slow cpuflags
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
| |
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
| | |
Silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on
section redeclaration
The cause of this warning is that `struc` and `endstruc`
attempt to revert to the previous section state [1].
The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].
Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Luca Barbato <[email protected]>
|
| |
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
Signed-off-by: Vittorio Giovara <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Previously there was a limit of two cpuflags.
Signed-off-by: Diego Biurrun <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Diego Biurrun <[email protected]>
|
| |
| |
| |
| |
| |
| | |
This makes more sense for future implementations of templates with zmm registers.
Signed-off-by: Diego Biurrun <[email protected]>
|
| | |
This commit silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on section
redeclaration
The cause of this warning is that `struc` and `endstruc` attempt to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Peter Cordes <[email protected]>
|
| |
| |
| |
| |
| | |
Reviewed-by: Ronald S. Bultje <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This mimics what is done for the other instruction sets.
Tested-by: James Almer <[email protected]>
Tested-by: Mickaël Raulet <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
The same can be done with INIT_XMM avx
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| | |
cpuflags
Requested-by: Christophe Gisquet <[email protected]>
Requested-by: "Ronald S. Bultje" <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Reviewed-by: Michael Niedermayer <[email protected]>
Reviewed-by: Ronald S. Bultje <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| | |
Reviewed-by: Paul B Mahol <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Previously there was a limit of two cpuflags.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
This makes more sense for future implementations of templates with zmm registers.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
501 to 439 decicycles.
See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
|
| | |
~560 → ~500 decicycles
This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html
Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|