ffmpeg - Mirror of FFmpeg git repo

	Commit message	Author	Age	Files	Lines
*	get_cabac_inline_x86: Don't inline the assembly function on 32 bit	Christopher Degawa	2023-04-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While the inline cabac assembly has historically worked correctly in i386 builds, recent compiler updates have started to expose issues with it when the function gets inlined into larger contexts that fail to provide as many free registers as this function requires. This was an issue with Clang on Windows on i386, which was fixed in c6d284b945324a7bc70ea8b9056040c8148aa835. However, the same issues have recently started showing up with GCC as well (both for Windows and Linux). Whether the issue appears seems to depend on a lot of optimizer tuning (e.g. the issue appears or goes away depending on the combination of -march= and -mtune= options), potentially due to the compiler making different decisions on how much to inline. Fixes: https://trac.ffmpeg.org/ticket/8903 Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86: replace explicit REP_RETs with RETs	Lynne	2023-02-01	39	-163/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From x86inc: > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either > a branch or a branch target. So switch to a 2-byte form of ret in that case. > We can automatically detect "follows a branch", but not a branch target. > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.) x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch. The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively. In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.
*	avcodec/x86: add avx512icl function for v210dec	James Darnley	2022-12-20	2	-2/+68
\| \| \| \|	Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
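The "2.01x faster" figure above can be reproduced from the two cycle counts. A minimal sketch, assuming the first number (1147) is the avx2 baseline and the second (571) is the new avx512icl function; the helper name `speedup` is hypothetical and not part of FFmpeg or checkasm:

```c
#include <assert.h>

/* Speedup factor = baseline time / optimized time.
 * Both arguments are mean timings in the same unit (here: decicycles). */
static double speedup(double baseline, double optimized)
{
    assert(baseline > 0.0 && optimized > 0.0);
    return baseline / optimized;
}
```

With the numbers from this commit, `speedup(1147.0, 571.0)` is roughly 2.009, which rounds to the quoted 2.01x. The ± values are the measurement spread and are not used in the ratio.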
*	avcodec/x86/v210: add some comments to the improved avx2 function	James Darnley	2022-12-20	1	-6/+6
\|
*	avcodec/x86/Makefile: Don't build empty files	Andreas Rheinhardt	2022-12-13	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	simple_idct.asm is 32 bit-only since bfb28b5ce89f3e950214b67ea95b45e3355c2caf, whereas simple_idct10.asm is x64-only. So don't build the ultimately unneeded and empty files, as some linkers complain about this: "ranlib: file: libavcodec/libavcodec.a(simple_idct.o) has no symbols" (this is from an Xcode toolchain as reported by Ronald S. Bultje). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/x86/v210enc: change '0b' binary constant prefix to 'b' suffix	James Darnley	2022-12-03	1	-2/+2
\| \| \| \|	For compatibility with yasm from 0.7.0
*	avcodec/x86/v210enc: remove unneeded instruction	James Darnley	2022-12-01	1	-1/+0
\|
*	avcodec/x86/v210enc: expand and correct comments	James Darnley	2022-12-01	1	-4/+4
\|
*	avcodec/v210enc: add new 10-bit function for avx512 avx512icl	James Darnley	2022-12-01	2	-0/+111
\| \| \| \| \| \| \| \|	avx512 on Skylake-X (Xeon D-2123IT): 1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2. avx512icl on Ice Lake (Xeon Silver 4316): 2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2.
*	avcodec/x86/v210enc: replace register use with named register	James Darnley	2022-12-01	1	-1/+1
\|
*	avcodec/x86/cavsdsp: Remove unused 3DNow-macro	Andreas Rheinhardt	2022-11-09	1	-4/+0
\| \| \| \| \| \|	Forgotten in 3221aba87989742ea22b639a7bb4af69f4eaa0e7. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	libavcodec: remove mdct15	Lynne	2022-11-06	3	-327/+0
\| \| \| \| \|	It's neither needed nor used by anything anymore; lavu/tx is faster and better in every way. RIP.
*	dca_core: convert to lavu/tx	Lynne	2022-11-06	1	-3/+4
\| \| \| \| \|	Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.
*	avcodec/v210enc: add new function for avx2 avx512 avx512icl	James Darnley	2022-11-04	2	-2/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and Broadwell (Xeon E5-2620 v4): 1690±4.3 decicycles vs. 1693±78.4; 1439±31.1 decicycles vs. 1429±16.7. Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT): 1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2. Better speedup with avx512icl on Ice Lake (Xeon Silver 4316): 1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2. Co-authors: Henrik Gramner <henrik@gramner.com>, Kieran Kunhya <kierank@obe.tv>
*	avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 only	Andreas Rheinhardt	2022-10-20	2	-5/+4
\| \| \| \| \| \| \| \|	It is only used by gmc/gmc1 which is only used by the MPEG-4 decoder, so move it to Mpeg4DecContext and rename it to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/svq1enc: Add SVQ1EncDSPContext, make codec context private	Andreas Rheinhardt	2022-10-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, SVQ1EncContext is defined in a header that is also included by the arch-specific code that initializes the one and only dsp function that this encoder uses directly. But the arch-specific functions to set this dsp function do not need anything from SVQ1EncContext. This commit therefore adds a small SVQ1EncDSPContext whose only member is said function pointer and renames svq1enc.h to svq1encdsp.h to avoid exposing unnecessary internals to these init functions (and the whole mpegvideo with it). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	lavc/x86/simple_idct: Fix linking shared libavcodec with MS link.exe	Carl Eugen Hoyos	2022-10-10	1	-1/+1
\| \| \| \| \| \|	link.exe hangs on empty simple_idct.o. Fixes ticket #9909.
*	avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dsp	Andreas Rheinhardt	2022-10-09	1	-2/+2
\| \| \| \| \| \|	It is the only thing that is actually used. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/ac3dsp: Remove unused parameter	Andreas Rheinhardt	2022-09-29	1	-1/+1
\| \| \| \| \| \|	Forgotten in fd98594a8831ce037a495b6d7e090bd8f81e83a1. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/dirac_dwt: Avoid conversions between function pointers and void*	Andreas Rheinhardt	2022-09-28	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pointers to void can be converted to any pointer to incomplete or object type and back; but they are nevertheless not completely generic pointers: There is no provision in the C standard that guarantees their convertibility with function pointers. C90 lacks a generic function pointer, C99 made every function pointer a generic function pointer and still disallows the convertibility with void *. Both GCC and Clang warn about this when using -pedantic. Therefore use unions to avoid these conversions. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
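The union approach described in this message can be sketched as follows. This is an illustration only: the names `GenericSlot`, `binary_op`, `make_func_slot` and `call_slot` are hypothetical and are not taken from dirac_dwt; the point is just that a function pointer is stored and read through its own union member, so no conversion to or from `void *` ever takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, for illustration only. */
typedef int (*binary_op)(int, int);

/* A slot that can hold either an object pointer or a function pointer.
 * Each kind of pointer has its own member, so code never needs to cast
 * a function pointer to void * or back. */
typedef union GenericSlot {
    void     *obj;
    binary_op func;
} GenericSlot;

static int add_ints(int a, int b) { return a + b; }

/* Store the function pointer through the function-pointer member ... */
static GenericSlot make_func_slot(binary_op f)
{
    GenericSlot s;
    s.func = f;
    return s;
}

/* ... and read it back through the same member before calling it. */
static int call_slot(GenericSlot s, int a, int b)
{
    assert(s.func != NULL);
    return s.func(a, b);
}
```

The caller has to know which member was last written; the union only removes the pointer conversion, it does not add any type tagging.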
*	x86/lpc: use fused negative multiply-add instructions where useful	James Almer	2022-09-22	1	-2/+15
\| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
*	avcodec/lpc: zero the middle odd sample in the output	James Almer	2022-09-22	1	-3/+7
\| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
*	avcodec/lpc: use ptrdiff_t for length parameters	James Almer	2022-09-22	2	-4/+3
\| \| \| \|	Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/aacpsdsp: add ps_hybrid_analysis_fma3	James Almer	2022-09-22	2	-23/+25
\| \| \| \| \| \|	This replaces the sse3 version, which was not really faster than the sse one. Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/aacpsdsp: precompute constant factors	James Almer	2022-09-22	1	-18/+24
\| \| \| \| \| \|	Inspired by the optimization done to the C version by Rémi Denis-Courmont. Signed-off-by: James Almer <jamrial@gmail.com>
*	x86/lpc: Fix parameter sign extension, unbreaking checkasm-lpc on x86_64 windows	Martin Storsjö	2022-09-22	1	-0/+1
\| \| \| \|	Signed-off-by: Martin Storsjö <martin@martin.st>
*	x86/lpc: fix even scalar loop overreads/writes	Lynne	2022-09-22	1	-13/+19
\| \| \| \|	Passes checkasm with valgrind, tested to sizes of more than 4000 samples.
*	x86/lpc: fix odd scalar loop overreads/writes	Lynne	2022-09-22	1	-5/+4
\|
*	avcodec/fmtconvert: Remove unused AVCodecContext parameter	Andreas Rheinhardt	2022-09-21	1	-1/+1
\| \| \| \| \| \| \|	Unused since d74a8cb7e42f703be5796eeb485f06af710ae8ca. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/blockdsp: Remove unused AVCodecContext parameter	Andreas Rheinhardt	2022-09-21	1	-2/+1
\| \| \| \| \| \| \|	Possible since be95df12bb06b183c8d2aea3b0831fdf05466cf3. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/cavsdsp: Remove unused function parameter	Andreas Rheinhardt	2022-09-21	1	-4/+3
\| \| \| \| \|	Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	x86/lpc: implement a new Welch windowing function	Lynne	2022-09-21	3	-58/+258
\| \| \| \| \| \| \| \| \| \| \| \|	Old one was written with the assumption only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version by 33%. Also checkasm is provided to make sure this monstrosity works. This fixes some FATE tests.
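For orientation, a plain C sketch of a Welch window that handles both even and odd lengths. It uses the textbook definition w(n) = 1 - ((n - h) / h)^2 with h = (len - 1) / 2, which is an assumption here: the actual FFmpeg C reference and assembly may scale the window or treat individual samples differently. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook Welch weight for sample n of a len-sample window.
 * Symmetric around the centre for even and odd len alike. */
static double welch_weight(ptrdiff_t n, ptrdiff_t len)
{
    assert(len > 1 && n >= 0 && n < len);
    double half = (len - 1) / 2.0;
    double t    = (n - half) / half;
    return 1.0 - t * t;
}

/* Multiply len integer samples by the window, writing doubles to out. */
static void apply_welch(const int32_t *in, ptrdiff_t len, double *out)
{
    for (ptrdiff_t i = 0; i < len; i++)
        out[i] = in[i] * welch_weight(i, len);
}
```

A scalar loop like this is the kind of reference a SIMD version would be checked against; it makes no attempt at the vectorization the commit is about.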
*	lavc/vorbisdsp: use ptrdiff_t rather than intptr_t	Rémi Denis-Courmont	2022-09-19	1	-1/+1
\| \| \| \|	... for a difference between pointers.
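A short sketch of why `ptrdiff_t` is the natural type here: it is the type the C standard gives to the result of subtracting two pointers, whereas `intptr_t` is meant for holding a pointer value converted to an integer. The helper names below are hypothetical and not taken from vorbisdsp.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements between two pointers into the same array.
 * Pointer subtraction yields ptrdiff_t by definition. */
static ptrdiff_t span_len(const float *begin, const float *end)
{
    return end - begin;
}

/* Hypothetical DSP-style helper taking its length as ptrdiff_t. */
static void scale_block(float *dst, const float *src, ptrdiff_t len, float k)
{
    assert(len >= 0);
    for (ptrdiff_t i = 0; i < len; i++)
        dst[i] = src[i] * k;
}
```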
*	avcodec/x86/audiodsp: add scalarproduct avx2	Paul B Mahol	2022-09-13	2	-0/+24
\|
*	avcodec/vp8dsp: Constify src in vp8_mc_func	Andreas Rheinhardt	2022-09-11	2	-30/+30
\| \| \| \| \| \|	Reviewed-by: Peter Ross <pross@xvid.org> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/x86/flacdsp_init: Remove double ';'	Andreas Rheinhardt	2022-09-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Inside a function, the second ';' in ";;" is just a null statement, but it is actually illegal outside of functions. Compilers nevertheless accept it without warning, except when in -pedantic mode when e.g. Clang emits a -Wextra-semi warning. Therefore remove the unnecessary ';'. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
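The distinction drawn in this message, shown on a toy example. The functions are hypothetical and unrelated to flacdsp_init; the invalid file-scope form is kept in a comment so that the snippet itself compiles cleanly.

```c
#include <assert.h>

/* At file scope only declarations and function definitions may appear.
 * Writing
 *     static int table_size = 8;;
 * would leave a stray ';' that is not a valid declaration in ISO C,
 * even though compilers usually accept it silently. */
static int twice(int x) { return 2 * x; }

static int twice_plus_one(int x)
{
    /* Inside a function the second ';' is simply a null statement. */
    int y = twice(x);;
    return y + 1;
}

/* Small self-check of the two helpers above. */
static void self_check(void)
{
    assert(twice_plus_one(3) == 7);
}
```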
*	avcodec/x86/flacdsp: fix bug in decorrelation	Paul B Mahol	2022-09-05	2	-20/+44
\| \| \| \|	Fixes #9297
*	avutil/mem_internal: Fix headers	Andreas Rheinhardt	2022-08-24	1	-0/+1
\| \| \| \| \| \| \|	Including avassert.h is unnecessary since commit 786be70e28fe739b8e49893fa13ae4652a68d1ea. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	x86: Don't hardcode the height to 8 in sad8_xy2_mmx	Martin Storsjö	2022-08-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The height is hardcoded in some of the me_cmp functions, but not in all of them. But in the case of all other functions, it's hardcoded in the same place in SIMD functions as in the C reference functions, while this one function differs from the behaviour of the C code. (Before 542765ce3eccbca587d54262a512cbdb1407230d, there were a couple other sad8_*_mmx functions with similar hardcoded height.) Signed-off-by: Martin Storsjö <martin@martin.st>
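To illustrate what "not hardcoding the height" means, here is a simplified C reference for an 8-pixel-wide sum of absolute differences that honours its `h` argument. This is a sketch only: it omits the half-pel x/y interpolation that the xy2 variant performs, and `sad8_ref` is a hypothetical name, not the actual me_cmp function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a block 8 pixels wide and h rows high.
 * The row count comes from the caller instead of being fixed at 8. */
static int sad8_ref(const uint8_t *a, const uint8_t *b, ptrdiff_t stride, int h)
{
    int sum = 0;

    assert(h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(a[x] - b[x]);
        a += stride;   /* advance both blocks to the next row */
        b += stride;
    }
    return sum;
}
```

A SIMD version that always processes 8 rows would agree with this reference only when `h == 8`, which is the kind of mismatch the commit removes.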
*	avcodec/flacdsp: Split encoder-only parts into a ctx of its own	Andreas Rheinhardt	2022-08-05	3	-13/+40
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/flacdsp: Remove unused function parameter	Andreas Rheinhardt	2022-08-05	1	-2/+1
\| \| \| \| \| \|	Forgotten in e609cfd697f8eed7325591f767585041719807d1. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/h264chroma: Constify src in h264_chroma_mc_func	Andreas Rheinhardt	2022-08-05	5	-23/+23
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/hevcdsp: Constify src pointers	Andreas Rheinhardt	2022-08-05	6	-126/+119
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/mpegvideodsp: Constify src pointers	Andreas Rheinhardt	2022-07-31	1	-1/+1
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/mpegvideoencdsp: Allow pointers to const where possible	Andreas Rheinhardt	2022-07-31	3	-7/+7
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/me_cmp: Constify me_cmp_func buffer parameters	Andreas Rheinhardt	2022-07-31	2	-48/+48
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/cfhdencdsp: Constify input pointers	Andreas Rheinhardt	2022-07-31	1	-2/+2
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/lossless_videoencdsp: Constify src sub_left_predict	Andreas Rheinhardt	2022-07-31	2	-2/+2
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/videodsp: Constify buf in VideoDSPContext.prefetch	Andreas Rheinhardt	2022-07-31	1	-1/+1
\| \| \| \|	Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
*	avcodec/vp56: Move VP5-9 range coder functions to a header of their own	Andreas Rheinhardt	2022-07-28	1	-6/+8
\| \| \| \| \| \|	Also use a vpx prefix for them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>