ffmpeg - Mirror of FFmpeg git repo

	Commit message	Author	Age	Files	Lines
*	Add macros to x86util.asm .	Ivan Kalvachev	2017-08-18	1	-8/+98
\|	Improved version of VBROADCASTSS that works like the AVX2 instruction.
\|	Emulation of vpbroadcastd.
\|	Horizontal sum HSUMPS that places the result in all elements.
\|	Emulation of blendvps and pblendvb.
\|	Signed-off-by: Ivan Kalvachev <[email protected]>
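The semantics named in this commit message can be modelled in plain C. The following is only a scalar sketch of what a broadcast, an all-lanes horizontal sum, and a sign-bit blend compute; it is not the macro code from x86util.asm. The type `vec4f` and the function names are hypothetical, and the summation order used here is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Broadcast: copy one scalar into every lane (what vbroadcastss computes). */
static vec4f broadcast_ss(float x)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = x;
    return r;
}

/* Horizontal sum whose result ends up in all lanes, not just lane 0.
 * The pairing order below is an assumption made for this sketch. */
static vec4f hsum_ps_all(vec4f v)
{
    float sum = (v.lane[0] + v.lane[1]) + (v.lane[2] + v.lane[3]);
    return broadcast_ss(sum);
}

/* blendvps-style select: take the lane from b where the mask lane has
 * its sign bit (bit 31) set, otherwise keep the lane from a. */
static vec4f blendv_ps(vec4f a, vec4f b, vec4f mask)
{
    vec4f r;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &mask.lane[i], sizeof(bits)); /* read the raw float bits */
        r.lane[i] = (bits >> 31) ? b.lane[i] : a.lane[i];
    }
    return r;
}
```

Note that only the sign bit of each mask lane matters in the blend, so a mask lane of -0.0 selects from `b` just as -1.0 does.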
*	x86inc: don't use read-only data sections on COFF targets	James Almer	2017-06-27	1	-0/+2
\|	Yasm:
\|	src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
\|	src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
\|	Nasm:
\|	src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
\|	src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
\|	Tested-by: Clément Bœsch <[email protected]>
\|	Signed-off-by: James Almer <[email protected]>
*	build: Generalize yasm/nasm-related variable names	Diego Biurrun	2017-06-21	2	-4/+4
\| \| \| \| \| \| \| \|	None of them are specific to the YASM assembler. (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1) Signed-off-by: James Almer <[email protected]>
*	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}	James Almer	2017-06-18	1	-6/+9
\| \| \| \|	About 2x faster than the c version.
*	x86inc: Add some additional cpuflag relations	Henrik Gramner	2017-06-12	1	-19/+19
\|	Simplifies writing assembly code that depends on available instructions.
\|	LZCNT implies SSE2
\|	BMI1 implies AVX+LZCNT
\|	AVX2 implies BMI2
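One way to picture these relations is as bit masks in which a flag's value already contains the bits of everything it implies. The C sketch below encodes only the three relations listed in the commit message; the constant names, bit positions, and the `cpu_has` helper are hypothetical and do not reproduce the definitions in x86inc.asm, which contain further relations omitted here.

```c
/* Base bits, one per feature (positions chosen arbitrarily for this sketch). */
enum {
    BIT_SSE2  = 1 << 0,
    BIT_LZCNT = 1 << 1,
    BIT_AVX   = 1 << 2,
    BIT_BMI1  = 1 << 3,
    BIT_BMI2  = 1 << 4,
    BIT_AVX2  = 1 << 5
};

/* Composite flags: each value includes the bits of the features it implies. */
enum {
    CPUFLAG_SSE2  = BIT_SSE2,
    CPUFLAG_LZCNT = BIT_LZCNT | CPUFLAG_SSE2,               /* LZCNT implies SSE2      */
    CPUFLAG_AVX   = BIT_AVX,
    CPUFLAG_BMI1  = BIT_BMI1 | CPUFLAG_AVX | CPUFLAG_LZCNT, /* BMI1 implies AVX+LZCNT  */
    CPUFLAG_BMI2  = BIT_BMI2,
    CPUFLAG_AVX2  = BIT_AVX2 | CPUFLAG_BMI2                 /* AVX2 implies BMI2       */
};

/* True if every bit of `wanted` is present in `flags`. */
static int cpu_has(unsigned flags, unsigned wanted)
{
    return (flags & wanted) == wanted;
}
```

With this layout, code built for BMI1 can test for SSE2 with a single mask comparison, because the implication is transitive through LZCNT.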
*	x86inc: Remove argument from WIN64_RESTORE_XMM	Anton Mitrofanov	2017-06-09	1	-9/+10
\| \| \| \| \|	The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.
*	x86inc: Prefer r14/r15 over r12/r13 on x86-64	Henrik Gramner	2017-06-09	1	-8/+8
\| \| \| \| \| \| \|	Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.
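The peculiarity can be illustrated by counting the addressing bytes for a `[base + disp]` operand. In the ModR/M byte only the low three bits of the register number are stored: the value 100 (rsp, r12) signals that a SIB byte follows, and the value 101 with mod=00 (rbp, r13) is reserved for a 32-bit displacement form, so those bases need an explicit zero displacement byte. The function below is a sketch under those rules; its name is hypothetical, it ignores index registers, and it does not count the REX prefix that r8 to r15 need in any case.

```c
#include <stdint.h>

/*
 * Bytes used by ModR/M, an optional SIB byte and an optional displacement
 * for a memory operand [base + disp] without an index register, in 64-bit
 * mode. `base` is the hardware register number: rax=0, rsp=4, rbp=5,
 * r12=12, r13=13, r14=14, r15=15.
 */
static int modrm_operand_bytes(int base, int32_t disp)
{
    int low3  = base & 7; /* only the low 3 bits go into ModR/M.rm */
    int bytes = 1;        /* the ModR/M byte itself */

    if (low3 == 4)        /* rm=100 means "SIB byte follows": rsp, r12 */
        bytes += 1;

    /* mod=00 encodes "no displacement", except that rm=101 is taken by
     * the disp32 form, so rbp and r13 fall through to mod=01. */
    if (disp == 0 && low3 != 5)
        return bytes;

    if (disp >= -128 && disp <= 127)
        return bytes + 1; /* mod=01: 8-bit displacement */

    return bytes + 4;     /* mod=10: 32-bit displacement */
}
```

Under this model `[r12]` and `[r13]` each take two bytes while `[r14]` and `[r15]` take one, and `[r12+8]` takes three bytes against two for `[r14+8]`. `[r13+8]` costs the same as `[r14+8]`, which is why the extra byte appears only sometimes.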
*	x86inc: Make REP_RET identical to RET in SSSE3+ functions	Henrik Gramner	2017-06-09	1	-1/+1
\| \| \| \|	There's no point in emitting a rep prefix before ret on modern CPUs.
*	x86inc: Fix call with memory operands	Henrik Gramner	2017-06-09	1	-2/+6
\| \| \| \| \| \|	We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.
*	x86/float_dsp: remove usage of integer instructions	James Almer	2017-05-12	1	-7/+7
\|
*	x86/float_dsp: add ff_vector_fmul_reverse_avx2	James Almer	2017-04-11	2	-1/+19
\| \| \| \| \| \|	~20% faster than AVX. Signed-off-by: James Almer <[email protected]>
*	x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}	James Almer	2017-04-10	2	-0/+73
\|
*	Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'	Clément Bœsch	2017-03-30	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	* commit '99434f4df81b6801b2b535d5b9143305595784f6': float_dsp: Have implementation match function pointer prototype Merged-by: Clément Bœsch <[email protected]>
\| *	float_dsp: Have implementation match function pointer prototype	Diego Biurrun	2016-11-03	1	-1/+1
\| \|	libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
\| \|	libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
* \|	Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'	James Almer	2017-03-23	2	-4/+4
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8': emms: Give apriv_emms_yasm() a more general name Merged-by: James Almer <[email protected]>
\| *	emms: Give apriv_emms_yasm() a more general name	Diego Biurrun	2016-10-18	2	-4/+4
\| \|
* \|	Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'	James Almer	2017-03-23	1	-2/+2
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4': x86: Add missing colons after assembly labels Merged-by: James Almer <[email protected]>
\| *	x86: Add missing colons after assembly labels	Diego Biurrun	2016-10-17	1	-2/+2
\| \|	This fixes many warnings of the sort
\| \|	warning: label alone on a line without a colon might be in error
* \|	avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same	James Almer	2017-03-21	1	-0/+2
\| \|	Reviewed-by: Henrik Gramner <[email protected]>
\| \|	Signed-off-by: James Almer <[email protected]>
* \|	Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'	Clément Bœsch	2017-03-20	1	-0/+10
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793': x86util: Document SBUTTERFLY macro Merged-by: Clément Bœsch <[email protected]>
\| *	x86util: Document SBUTTERFLY macro	Alexandra Hájková	2016-09-19	1	-0/+10
\| \| \| \| \| \| \| \|	Signed-off-by: Luca Barbato <[email protected]>
* \|	Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'	Clément Bœsch	2017-03-20	3	-0/+104
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5': imgutils: add a function for copying image data from GPU mapped memory Merged-by: Clément Bœsch <[email protected]>
\| *	imgutils: add a function for copying image data from GPU mapped memory	Anton Khirnov	2016-08-31	3	-0/+104
\| \| \| \| \| \| \| \|	See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
* \|	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter	James Darnley	2017-02-18	1	-0/+15
\| \|	x86-64 only
\| \|	Yorkfield:
\| \|	- sse2: ~2.17x (434 vs. 200 cycles)
\| \|	Nehalem:
\| \|	- sse2: ~2.94x (409 vs. 139 cycles)
\| \|	Skylake:
\| \|	- sse2: ~3.10x (370 vs. 119 cycles)
\| \|	- avx: ~3.29x (370 vs. 112 cycles)
* \|	x86util: import MOVHL macro	James Darnley	2017-02-18	1	-0/+12
\| \|	Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
\| \|	x86: Avoid some bypass delays and false dependencies
\| \|	A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
* \|	avcodec/x86: deduplicate PASS8ROWS macro	James Darnley	2017-02-18	1	-0/+5
\| \|
* \|	Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'	James Almer	2017-01-31	1	-0/+6
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '8e9cd81d291b1010c625b2766058aadf4affb537': x86: cpu: Detect Conroe CPUs and their slow shuffle unit Merged-by: James Almer <[email protected]>
\| *	x86: cpu: Detect Conroe CPUs and their slow shuffle unit	Fiona Glaser	2016-07-20	1	-0/+6
\| \|
* \|	Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332'	James Almer	2017-01-31	1	-0/+6
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '7d7355aa92bb36ca0765c49a569a999bcb96f332': x86: Add SSSE3_SLOW CPU flag and related convenience macros Merged-by: James Almer <[email protected]>
\| *	x86: Add SSSE3_SLOW CPU flag and related convenience macros	Diego Biurrun	2016-07-20	1	-0/+6
\| \|
\| *	x86util: Extend SPLATW for avx2	James Almer	2016-07-18	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Integration to Libav by Josh de Kock <[email protected]>. Signed-off-by: Alexandra Hájková <[email protected]>
\| *	asm: FF_-prefix internal macros used in inline assembly	Diego Biurrun	2016-05-28	2	-35/+35
\| \| \| \| \| \| \| \| \| \|	These macros conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.
\| *	x86inc: Enable AVX emulation in additional cases	Anton Mitrofanov	2016-05-16	1	-8/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Improve handling of %ifid with multi-token parameters	Anton Mitrofanov	2016-05-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Fix AVX emulation of some instructions	Anton Mitrofanov	2016-05-16	1	-20/+24
\| \| \| \| \| \| \| \|	Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Fix AVX emulation of scalar float instructions	Henrik Gramner	2016-05-16	1	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. Signed-off-by: Anton Khirnov <[email protected]>
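The difference between packed and scalar float instructions can be shown with a small scalar model in C. This is a sketch of the three-operand semantics only, not of the x86inc emulation code; `vec4f`, `vaddps`, and `vaddss` are hypothetical names for the model.

```c
/* Hypothetical scalar model of a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Packed add: every lane is summed, so swapping the sources changes nothing. */
static vec4f vaddps(vec4f src1, vec4f src2)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = src1.lane[i] + src2.lane[i];
    return r;
}

/* Scalar add: only lane 0 is summed; lanes 1..3 are copied from src1.
 * Swapping the sources therefore changes the upper lanes of the result. */
static vec4f vaddss(vec4f src1, vec4f src2)
{
    vec4f r = src1;
    r.lane[0] = src1.lane[0] + src2.lane[0];
    return r;
}
```

For `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `vaddss(a, b)` and `vaddss(b, a)` agree in lane 0 but carry the upper lanes of `a` and `b` respectively, so reordering operands is only safe for the packed form.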
* \|	x86inc: Avoid using eax/rax for storing the stack pointer	Henrik Gramner	2017-01-09	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose we should not pick eax/rax since it can be overwritten as a return value.
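The reason a copy of the stack pointer is needed can be seen from the arithmetic: rounding down to a larger alignment discards low bits, so the original value cannot be recomputed from the aligned one. The C sketch below models this with plain integers; the struct and function names are hypothetical, and `align` is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical model of the two values a prologue has to keep track of. */
typedef struct {
    uintptr_t saved_sp;   /* original stack pointer, needed by the epilogue */
    uintptr_t aligned_sp; /* new stack pointer after reserving aligned space */
} stack_frame;

/* Reserve `size` bytes below `sp` and round down to a multiple of `align`
 * (assumed to be a power of two). The masking step loses information,
 * which is why the original pointer is stored separately. */
static stack_frame alloc_aligned_stack(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    stack_frame f;
    f.saved_sp   = sp;
    f.aligned_sp = (sp - size) & ~(align - 1);
    return f;
}
```

In the assembly the saved copy lives in a register or stack slot; the commit message explains why eax/rax is a poor choice for that register.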
* \|	avutil/x86/emms: Document the emms_c() vs alloc/free relation.	Michael Niedermayer	2016-10-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Reviewed-by: Andreas Cadhalpun <[email protected]> Signed-off-by: Michael Niedermayer <[email protected]>
* \|	vp9: add 16x16 idct avx2 (8-bit).	Ronald S. Bultje	2016-07-11	1	-1/+68
\| \|	checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
\| \|	nop: 24.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
\| \|	vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
\| \|	vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
\| \|	vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
\| \|	vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
\| \|	vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
\| \|	vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
\| \|	vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
\| \|	vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
\| \|	vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
\| \|	vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
\| \|	vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
\| \|	vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
\| \|	vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
\| \|	vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
\| \|	vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
\| \|	vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
\| \|	vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
\| \|	vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
* \|	asm: FF_-prefix internal macros used in inline assembly	Matthieu Bouron	2016-06-27	2	-35/+35
\| \| \| \| \| \| \| \|	See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
* \|	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'	Clément Bœsch	2016-06-21	1	-1/+1
\|\\| \| \| \| \| \| \| \| \| \| \| \| \|	* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <[email protected]>
\| *	cosmetics: Fix spelling mistakes	Vittorio Giovara	2016-05-04	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Diego Biurrun <[email protected]>
\| *	x86: Add ymm_reg struct	James Almer	2016-01-28	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Needed to declare 32-byte long constants Signed-off-by: James Almer <[email protected]> Signed-off-by: Luca Barbato <[email protected]>
\| *	x86inc: Add debug symbols indicating sizes of compiled functions	Geza Lore	2016-01-23	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Avoid creating unnecessary local labels	Henrik Gramner	2016-01-23	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus. Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Simplify AUTO_REP_RET	Henrik Gramner	2016-01-23	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cpuflags is never undefined any more, it's set to 0 instead. Also fix an incorrect comment. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Use more consistent indentation	Henrik Gramner	2016-01-23	1	-67/+67
\| \| \| \| \| \| \| \|	Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Preserve arguments when allocating stack space	Henrik Gramner	2016-01-23	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Improve FMA instruction handling	Henrik Gramner	2016-01-23	1	-40/+37
\| \|	* Correctly handle FMA instructions with memory operands.
\| \|	* Print a warning if FMA instructions are used without the correct cpuflag.
\| \|	* Simplify the instantiation code.
\| \|	* Clarify documentation.
\| \|	Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand.
\| \|	Signed-off-by: Anton Khirnov <[email protected]>
\| *	x86inc: Be more verbose in assertion failures	Henrik Gramner	2016-01-23	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Anton Khirnov <[email protected]>