| Commit message (Collapse) | Author | Age | Files | Lines |
|
Based on x264 code
Signed-off-by: James Almer <[email protected]>
|
Based on x264 code
Signed-off-by: James Almer <[email protected]>
|
Signed-off-by: James Almer <[email protected]>
|
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <[email protected]>
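The unrolling idea above can be sketched in plain C. This is only an illustration, not the SSE assembly from the commit: the function names `fmac_scalar_ref` and `fmac_scalar_x16` are hypothetical, and the two inner groups of 8 stand in for the two 8-element SSE steps that one unrolled iteration would perform.

```c
#include <assert.h>
#include <stddef.h>

/* Plain reference loop: dst[i] += src[i] * mul. */
static void fmac_scalar_ref(float *dst, const float *src, float mul, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* Hypothetical unrolled variant: one outer iteration handles 16 elements,
 * relying on the caller's guarantee that len is a multiple of 16. */
static void fmac_scalar_x16(float *dst, const float *src, float mul, size_t len)
{
    assert(len % 16 == 0);
    for (size_t i = 0; i < len; i += 16) {
        /* First group of 8 elements. */
        for (size_t j = 0; j < 8; j++)
            dst[i + j] += src[i + j] * mul;
        /* Second group of 8 elements, added by the unrolling. */
        for (size_t j = 8; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Because the length guarantee already holds, the unrolled loop needs no tail handling; it simply halves the number of loop-control steps per element.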
|
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.
Signed-off-by: Diego Biurrun <[email protected]>
|
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <[email protected]>
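For context, a commonly used AVX2 check can be sketched as a pure C function over already-read register values. This is an assumption-laden illustration rather than the code from this patch or from x264: the name `avx2_usable` and its parameters are hypothetical, and the caller is assumed to have obtained the CPUID leaf 1 ECX value, the CPUID leaf 7 (subleaf 0) EBX value, and XCR0 via XGETBV.

```c
#include <stdint.h>

/* Hypothetical helper: decide whether AVX2 may be used, given raw values.
 *   cpuid1_ecx: ECX from CPUID leaf 1
 *   cpuid7_ebx: EBX from CPUID leaf 7, subleaf 0
 *   xcr0:       value of XCR0 as read by XGETBV */
static int avx2_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0)
{
    const uint32_t OSXSAVE = 1u << 27; /* OS has enabled XSAVE/XGETBV */
    const uint32_t AVX     = 1u << 28; /* CPU supports AVX */
    const uint32_t AVX2    = 1u << 5;  /* CPU supports AVX2 */
    const uint64_t XMM_YMM = 0x6;      /* XCR0 bits 1 and 2: SSE and AVX state */

    /* XGETBV is only meaningful when OSXSAVE is set, and AVX2 builds on AVX. */
    if ((cpuid1_ecx & (OSXSAVE | AVX)) != (OSXSAVE | AVX))
        return 0;
    /* The OS must save and restore both XMM and YMM registers. */
    if ((xcr0 & XMM_YMM) != XMM_YMM)
        return 0;
    return (cpuid7_ebx & AVX2) != 0;
}
```

Keeping the decision separate from the CPUID and XGETBV reads makes the logic easy to test on any host.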
|
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Signed-off-by: Derek Buitenhuis <[email protected]>
|
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <[email protected]>
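The one-byte difference can be seen by comparing the legacy (non-VEX) SSE encodings of the register-to-register forms. The sketch below only lists those encodings as byte arrays. The array and function names are hypothetical, and `xmm0, xmm1` is an arbitrary operand choice; the integer moves carry a mandatory 0x66 or 0xF3 prefix that the float moves lack.

```c
#include <stddef.h>

/* Legacy SSE encodings of "op xmm0, xmm1" (ModRM byte 0xC1). */
static const unsigned char enc_movaps[] = { 0x0F, 0x28, 0xC1 };       /* movaps */
static const unsigned char enc_movups[] = { 0x0F, 0x10, 0xC1 };       /* movups */
static const unsigned char enc_movdqa[] = { 0x66, 0x0F, 0x6F, 0xC1 }; /* movdqa */
static const unsigned char enc_movdqu[] = { 0xF3, 0x0F, 0x6F, 0xC1 }; /* movdqu */

/* Bytes saved per instruction by using the float move for aligned data. */
static size_t bytes_saved_aligned(void)
{
    return sizeof enc_movdqa - sizeof enc_movaps;
}

/* Bytes saved per instruction by using the float move for unaligned data. */
static size_t bytes_saved_unaligned(void)
{
    return sizeof enc_movdqu - sizeof enc_movups;
}
```

The saving applies only to the legacy encodings shown here; the prefix is folded into the VEX prefix in VEX-encoded forms, so those have equal lengths.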
|
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <[email protected]>
|
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <[email protected]>
|
1.5x-1.8x faster on sandybridge
Signed-off-by: Luca Barbato <[email protected]>
|
4x-6x faster on sandybridge
Signed-off-by: Luca Barbato <[email protected]>
|
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
|
Signed-off-by: Martin Storsjö <[email protected]>
|
cmp{p,s}{s,d} instructions do take an imm8 operand.
Signed-off-by: Diego Biurrun <[email protected]>
|
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <[email protected]>
|
This makes the aac decoder and all voice codecs independent of dsputil.
|
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there as well.
|
This provides a fallback when building with Yasm enabled, but neither
inline assembly nor the _mm_empty intrinsic is available or enabled.
Signed-off-by: Diego Biurrun <[email protected]>
|
This allows defining externally visible library symbols.
Signed-off-by: Diego Biurrun <[email protected]>
|
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <[email protected]>
|
This fixes builds on 64-bit MSVC.
Signed-off-by: Martin Storsjö <[email protected]>
|
Signed-off-by: Luca Barbato <[email protected]>
|