| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Anton Khirnov <[email protected]>
|
|
|
|
|
|
|
| |
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Signed-off-by: Anton Khirnov <[email protected]>
|
|
|
|
|
|
| |
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Signed-off-by: Anton Khirnov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Henrik Gramner <[email protected]>
Signed-off-by: Anton Khirnov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Henrik Gramner <[email protected]>
Signed-off-by: Anton Khirnov <[email protected]>
|
|
|
|
|
|
|
|
| |
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
Signed-off-by: Anton Khirnov <[email protected]>
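The ordering described above can be illustrated with a small C model that treats the stack pointer as a plain integer. This is only a sketch: the helper names `align_down` and `alloc_stack` are hypothetical and not part of x86inc, and the model assumes the requested size is a multiple of the alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of align (align must be a power of two). */
static uintptr_t align_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Hypothetical model of "align first, then allocate":
 * the stack pointer is aligned before size bytes are reserved.
 * size is assumed to be a multiple of align, so the result stays aligned. */
static uintptr_t alloc_stack(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    assert(size % align == 0);
    sp = align_down(sp, align);
    return sp - size;
}
```

For example, `alloc_stack(0x7ffc1238, 64, 32)` first aligns the pointer to `0x7ffc1220` and then reserves 64 bytes, giving `0x7ffc11e0`.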
|
|
|
|
|
|
|
|
|
|
|
| |
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.
Also add pmacsdql emulation.
Signed-off-by: Henrik Gramner <[email protected]>
Signed-off-by: Anton Khirnov <[email protected]>
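The aliasing problem can be sketched in C by modelling a four-operand multiply-accumulate on four 32-bit lanes. The two-step "multiply, then add" sequence below is an assumption about what an emulation would look like, and the names `vec4`, `macs_reference` and `macs_emulated` are hypothetical; unsigned lanes are used so that wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lane[4]; } vec4;

/* Reference semantics: all inputs are read before dst is written,
 * so dst may alias any of the sources. */
static void macs_reference(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a->lane[i] * b->lane[i] + c->lane[i];
    *dst = r;
}

/* Assumed two-step emulation without a temporary register:
 * step 1 writes the product into dst, step 2 adds c.
 * If dst and c are the same register, step 1 destroys c. */
static void macs_emulated(vec4 *dst, const vec4 *a, const vec4 *b, const vec4 *c)
{
    assert(dst && a && b && c);
    for (int i = 0; i < 4; i++)
        dst->lane[i] = a->lane[i] * b->lane[i];
    for (int i = 0; i < 4; i++)
        dst->lane[i] += c->lane[i];
}
```

With distinct destination and accumulator both functions agree. When the first and fourth arguments are the same object, `macs_emulated` returns twice the product instead of product plus accumulator, which is why a temporary would be needed.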
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improves the accuracy of measurements, especially in short sections.
To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."
SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
Signed-off-by: Luca Barbato <[email protected]>
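A minimal C sketch of the lfence;rdtsc sequence, assuming a GCC or Clang toolchain where `<x86intrin.h>` provides `_mm_lfence()` and `__rdtsc()`. The function names `read_time` and `elapsed` and the `HAVE_TSC` macro are hypothetical, the SSE2 check here is done at compile time only, and the `clock()` fallback for non-x86 targets is an addition for portability rather than something the commit describes.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* GCC/Clang: __rdtsc() and _mm_lfence() */
#define HAVE_TSC 1
#else
#define HAVE_TSC 0
#endif

/* Read a timestamp. On SSE2-capable x86 builds, lfence makes rdtsc wait
 * until preceding instructions have completed locally. */
static uint64_t read_time(void)
{
#if HAVE_TSC && defined(__SSE2__)
    _mm_lfence();
    return __rdtsc();
#elif HAVE_TSC
    return __rdtsc();          /* no SSE2: unserialized read */
#else
    return (uint64_t)clock();  /* non-x86 fallback (assumption) */
#endif
}

/* Difference between two timestamps taken with read_time(). */
static uint64_t elapsed(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```

A measured section would be bracketed by two `read_time()` calls and the difference passed through `elapsed()`.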
|
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on
section redeclaration
The cause of this warning is that `struc` and `endstruc`
attempt to revert to the previous section state [1].
The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].
Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Luca Barbato <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Michael Niedermayer <[email protected]>
Signed-off-by: Vittorio Giovara <[email protected]>
|
|
|
|
|
|
| |
Previously there was a limit of two cpuflags.
Signed-off-by: Diego Biurrun <[email protected]>
|
|
|
|
| |
Signed-off-by: Diego Biurrun <[email protected]>
|
|
|
|
|
|
| |
This makes more sense for future implementations of templates with zmm registers.
Signed-off-by: Diego Biurrun <[email protected]>
|
| |
|
|
|
|
|
|
| |
Based on x264 code
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
|
| |
Based on x264 code
Signed-off-by: James Almer <[email protected]>
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <[email protected]>
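As a scalar illustration of the loop shape, the following C sketch processes 16 elements per outer iteration and relies on the length being a multiple of 16. It is not the SSE code from the commit; the name `vector_fmac_scalar_sketch` is hypothetical and the inner loop merely stands in for the unrolled vector operations.

```c
#include <assert.h>

/* dst[i] += src[i] * mul for len elements, 16 elements per outer iteration.
 * The caller must guarantee that len is a multiple of 16. */
static void vector_fmac_scalar_sketch(float *dst, const float *src,
                                      float mul, int len)
{
    assert(len % 16 == 0);
    for (int i = 0; i < len; i += 16) {
        /* Stands in for four 4-float SSE multiply/add pairs. */
        for (int j = 0; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Because the batch size is guaranteed by the caller, no tail loop for leftover elements is needed.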
|
|
|
|
|
|
|
| |
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.
Signed-off-by: Diego Biurrun <[email protected]>
|
|
|
|
|
|
| |
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
| |
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
| |
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
| |
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
| |
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
| |
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
| |
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <[email protected]>
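The arithmetic behind this can be sketched in C. The sketch assumes the Win64 convention of a 32-byte shadow space located directly above the 8-byte return address, and a stack pointer that is 8 modulo 16 at function entry; the constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SHADOW_SPACE_BYTES = 32, /* Win64 shadow space reserved by the caller */
    XMM_BYTES          = 16, /* size of one XMM register */
    RET_ADDR_BYTES     = 8   /* return address sits at [rsp] on entry */
};

/* Offset from the entry stack pointer of the n-th (0-based) XMM slot
 * inside the shadow space. Two slots fit: 32 / 16 = 2. */
static int shadow_slot_offset(int n)
{
    assert(n >= 0 && n < SHADOW_SPACE_BYTES / XMM_BYTES);
    return RET_ADDR_BYTES + n * XMM_BYTES;
}

/* Aligned 16-byte stores require the slot address to be a multiple of 16. */
static int slot_is_aligned(uintptr_t rsp_at_entry, int n)
{
    return (rsp_at_entry + (uintptr_t)shadow_slot_offset(n)) % XMM_BYTES == 0;
}
```

Under these assumptions the two slots sit at offsets 8 and 24 from the entry stack pointer, both 16-byte aligned, so two registers can be saved without moving the stack pointer.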
|
|
|
|
|
|
| |
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
| |
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
| |
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <[email protected]>
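The size difference can be seen from the legacy (non-VEX) encodings of the register-to-register forms, written out here as C byte arrays for `xmm0, xmm1`. The array and function names are hypothetical; the bytes follow the standard SSE/SSE2 opcode maps, where the integer moves carry an extra mandatory prefix byte.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy SSE encodings of "op xmm0, xmm1" (ModRM byte 0xC1). */
static const unsigned char movaps_xmm0_xmm1[] = { 0x0F, 0x28, 0xC1 };
static const unsigned char movups_xmm0_xmm1[] = { 0x0F, 0x10, 0xC1 };
static const unsigned char movdqa_xmm0_xmm1[] = { 0x66, 0x0F, 0x6F, 0xC1 };
static const unsigned char movdqu_xmm0_xmm1[] = { 0xF3, 0x0F, 0x6F, 0xC1 };

/* Bytes saved per instruction when the float move replaces the integer move. */
static size_t aligned_move_saving(void)
{
    assert(sizeof movdqa_xmm0_xmm1 > sizeof movaps_xmm0_xmm1);
    return sizeof movdqa_xmm0_xmm1 - sizeof movaps_xmm0_xmm1;
}

static size_t unaligned_move_saving(void)
{
    assert(sizeof movdqu_xmm0_xmm1 > sizeof movups_xmm0_xmm1);
    return sizeof movdqu_xmm0_xmm1 - sizeof movups_xmm0_xmm1;
}
```

Both helpers return 1, matching the one-byte saving stated in the commit message.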
|
|
|
|
| |
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
|
|
|
|
|
| |
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <[email protected]>
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <[email protected]>
|
| |
|
|
|
|
|
|
| |
1.5x-1.8x faster on sandybridge
Signed-off-by: Luca Barbato <[email protected]>
|
|
|
|
|
|
| |
4x-6x faster on sandybridge
Signed-off-by: Luca Barbato <[email protected]>
|
| |
|
|
|
|
|
| |
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
|
|
|
|
| |
Signed-off-by: Martin Storsjö <[email protected]>
|
|
|
|
|
|
| |
cmp{p,s}{s,d} instructions do take an imm8 operand.
Signed-off-by: Diego Biurrun <[email protected]>
|