path: root/libavcodec/x86
Commit message  Author  Age  Files  Lines
* get_cabac_inline_x86: Don't inline the assembly function on 32 bitChristopher Degawa2023-04-021-1/+1
    While the inline cabac assembly has historically worked correctly in i386 builds, recent compiler updates have started showing issues with it when the function gets inlined into larger contexts that fail to provide as many free registers as this function requires.

    This was an issue with Clang on Windows on i386, which was fixed in c6d284b945324a7bc70ea8b9056040c8148aa835. However, recently the same issues have also started showing up with GCC (both for Windows and Linux). Whether the issue appears seems dependent on a lot of optimizer tuning (e.g. the issue appears or goes away depending on the combination of -march= and -mtune= options), potentially due to the compiler making different decisions on how much to inline.

    Fixes: https://trac.ffmpeg.org/ticket/8903

    Signed-off-by: Martin Storsjö <martin@martin.st>
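The idea behind this fix can be sketched in C as choosing the inlining attribute by target width. This is only an illustration, assuming GCC/Clang attribute syntax; the macro `CABAC_INLINE_HINT` and the function `decode_step` are hypothetical names, not FFmpeg's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: forbid inlining on 32-bit x86, where free registers
 * are scarce, and force inlining everywhere else. */
#if defined(__i386__)
#  define CABAC_INLINE_HINT __attribute__((noinline))
#else
#  define CABAC_INLINE_HINT inline __attribute__((always_inline))
#endif

/* Hypothetical stand-in for a register-hungry routine. */
static CABAC_INLINE_HINT uint32_t decode_step(uint32_t low, uint32_t range)
{
    assert(range != 0);
    return low % range;
}
```

Keeping the function out of line on 32-bit gives it a fresh register allocation at the cost of a call, which is the trade-off the commit message describes.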
* x86: replace explicit REP_RETs with RETsLynne2023-02-0139-163/+163
    From x86inc:
    > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
    > a branch or a branch target. So switch to a 2-byte form of ret in that case.
    > We can automatically detect "follows a branch", but not a branch target.
    > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)

    x86inc can automatically determine whether to use REP_RET rather than RET in most of these cases, so impact is minimal. Additionally, a few REP_RETs were used unnecessarily, despite the return being nowhere near a branch.

    The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years ago and 12 years ago, respectively.

    In the future, everyone involved with x86inc should consider dropping REP_RETs altogether.
* avcodec/x86: add avx512icl function for v210decJames Darnley2022-12-202-2/+68
    Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2
* avcodec/x86/v210: add some comments to the improved avx2 functionJames Darnley2022-12-201-6/+6
* avcodec/x86/Makefile: Don't build empty filesAndreas Rheinhardt2022-12-131-2/+5
    simple_idct.asm is 32 bit-only since bfb28b5ce89f3e950214b67ea95b45e3355c2caf, whereas simple_idct10.asm is x64-only. So don't build the ultimately unneeded and empty files, as some linkers complain about this: "ranlib: file: libavcodec/libavcodec.a(simple_idct.o) has no symbols" (this is from an Xcode toolchain as reported by Ronald S. Bultje).

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/x86/v210enc: change '0b' binary constant prefix to 'b' suffixJames Darnley2022-12-031-2/+2
    For compatibility with yasm from 0.7.0
* avcodec/x86/v210enc: remove unneeded instructionJames Darnley2022-12-011-1/+0
* avcodec/x86/v210enc: expand and correct commentsJames Darnley2022-12-011-4/+4
* avcodec/v210enc: add new 10-bit function for avx512 avx512iclJames Darnley2022-12-012-0/+111
    avx512 on Skylake-X (Xeon D-2123IT): 1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2
    avx512icl on Ice Lake (Xeon Silver 4316): 2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
* avcodec/x86/v210enc: replace register use with named registerJames Darnley2022-12-011-1/+1
* avcodec/x86/cavsdsp: Remove unused 3DNow-macroAndreas Rheinhardt2022-11-091-4/+0
    Forgotten in 3221aba87989742ea22b639a7bb4af69f4eaa0e7.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* libavcodec: remove mdct15Lynne2022-11-063-327/+0
    It is no longer needed or used by anything; lavu/tx is faster and better in every way. RIP.
* dca_core: convert to lavu/txLynne2022-11-061-3/+4
    Thanks to Martin Storsjö <martin@martin.st> for fixing and testing the arm32 and aarch64 changes.
* avcodec/v210enc: add new function for avx2 avx512 avx512iclJames Darnley2022-11-042-2/+92
    Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and Broadwell (Xeon E5-2620 v4):
        1690±4.3 decicycles vs. 1693±78.4
        1439±31.1 decicycles vs. 1429±16.7

    Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
        1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2

    Better speedup with avx512icl on Ice Lake (Xeon Silver 4316):
        1.77x faster (784±1.8 vs. 442±11.6 decicycles) compared with avx2

    Co-authors:
        Henrik Gramner <henrik@gramner.com>
        Kieran Kunhya <kierank@obe.tv>
* avcodec/mpegvideodsp: Make MpegVideoDSP MPEG-4 onlyAndreas Rheinhardt2022-10-202-5/+4
    It is only used by gmc/gmc1 which is only used by the MPEG-4 decoder, so move it to Mpeg4DecContext and rename it to Mpeg4VideoDSP. Also compile it iff the MPEG-4 decoder is compiled.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/svq1enc: Add SVQ1EncDSPContext, make codec context privateAndreas Rheinhardt2022-10-141-2/+2
    Currently, SVQ1EncContext is defined in a header that is also included by the arch-specific code that initializes the one and only dsp function that this encoder uses directly. But the arch-specific functions to set this dsp function do not need anything from SVQ1EncContext.

    This commit therefore adds a small SVQ1EncDSPContext whose only member is said function pointer and renames svq1enc.h to svq1encdsp.h to avoid exposing unnecessary internals to these init functions (and the whole mpegvideo with it).

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavc/x86/simple_idct: Fix linking shared libavcodec with MS link.exeCarl Eugen Hoyos2022-10-101-1/+1
    link.exe hangs on empty simple_idct.o

    Fixes ticket #9909.
* avcodec/huffyuvencdsp: Pass pix_fmt directly when initing dspAndreas Rheinhardt2022-10-091-2/+2
    It is the only thing that is actually used.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/ac3dsp: Remove unused parameterAndreas Rheinhardt2022-09-291-1/+1
    Forgotten in fd98594a8831ce037a495b6d7e090bd8f81e83a1.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/dirac_dwt: Avoid conversions between function pointers and void*Andreas Rheinhardt2022-09-281-8/+8
    Pointers to void can be converted to any pointer to incomplete or object type and back; but they are nevertheless not completely generic pointers: There is no provision in the C standard that guarantees their convertibility with function pointers. C90 lacks a generic function pointer, C99 made every function pointer a generic function pointer and still disallows the convertibility with void *.

    Both GCC as well as Clang warn about this when using -pedantic. Therefore use unions to avoid these conversions.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
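The union approach described in this entry can be illustrated with a minimal sketch. The type and function names below (`fn_slot`, `binary_fn`, `unary_fn`, `call_binary`) are hypothetical and are not taken from dirac_dwt; the point is only that each signature gets its own union member, so no function pointer ever passes through `void *`.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_fn)(int, int);
typedef int (*unary_fn)(int);

/* Hypothetical slot type: one member per function signature,
 * used where a void * might otherwise have been stored. */
typedef union {
    binary_fn binary;
    unary_fn  unary;
} fn_slot;

static int add(int a, int b) { return a + b; }
static int negate(int a)     { return -a; }

/* The caller must know which member was stored last. */
static int call_binary(fn_slot slot, int a, int b)
{
    assert(slot.binary != NULL);
    return slot.binary(a, b);
}
```

A slot is filled with `slot.binary = add;` or `slot.unary = negate;` and read back through the same member, which compiles without a `-pedantic` warning because no conversion takes place.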
* x86/lpc: use fused negative multiply-add instructions where usefulJames Almer2022-09-221-2/+15
    Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/lpc: zero the middle odd sample in the outputJames Almer2022-09-221-3/+7
    Signed-off-by: James Almer <jamrial@gmail.com>
* avcodec/lpc: use ptrdiff_t for length parametersJames Almer2022-09-222-4/+3
    Signed-off-by: James Almer <jamrial@gmail.com>
* x86/aacpsdsp: add ps_hybrid_analysis_fma3James Almer2022-09-222-23/+25
    This replaces the sse3 version, which was not really faster than the sse one.

    Signed-off-by: James Almer <jamrial@gmail.com>
* x86/aacpsdsp: precompute constant factorsJames Almer2022-09-221-18/+24
    Inspired by the optimization done to the C version by Rémi Denis-Courmont.

    Signed-off-by: James Almer <jamrial@gmail.com>
* x86/lpc: Fix parameter sign extension, unbreaking checkasm-lpc on x86_64 windowsMartin Storsjö2022-09-221-0/+1
    Signed-off-by: Martin Storsjö <martin@martin.st>
* x86/lpc: fix even scalar loop overreads/writesLynne2022-09-221-13/+19
    Passes checkasm with valgrind, tested with sizes of more than 4000 samples.
* x86/lpc: fix odd scalar loop overreads/writesLynne2022-09-221-5/+4
* avcodec/fmtconvert: Remove unused AVCodecContext parameterAndreas Rheinhardt2022-09-211-1/+1
    Unused since d74a8cb7e42f703be5796eeb485f06af710ae8ca.

    Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/blockdsp: Remove unused AVCodecContext parameterAndreas Rheinhardt2022-09-211-2/+1
    Possible since be95df12bb06b183c8d2aea3b0831fdf05466cf3.

    Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/cavsdsp: Remove unused function parameterAndreas Rheinhardt2022-09-211-4/+3
    Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* x86/lpc: implement a new Welch windowing functionLynne2022-09-213-58/+258
    The old one was written with the assumption that only even inputs would be given. This very messy replacement supports even and odd inputs, and supports AVX2 for extra speed. The buffers given are usually quite big (4k samples), so the speedup is worth it. The new SSE version is still faster than the old inline asm version by 33%.

    Also checkasm is provided to make sure this monstrosity works.

    This fixes some FATE tests.
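For orientation, a scalar Welch window that handles both even and odd lengths might look like the sketch below. It uses the textbook definition w(n) = 1 - ((n - c) / c)^2 with c = (len - 1) / 2 and is an assumption-laden illustration only: the function name `apply_welch_window` is hypothetical, and it does not attempt to reproduce FFmpeg's exact coefficients or its handling of the middle sample for odd lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook Welch window: w(n) = 1 - ((n - c) / c)^2, c = (len - 1) / 2.
 * The same loop covers even and odd len; with this definition the centre
 * sample of an odd-length input gets weight 1. */
static void apply_welch_window(const int32_t *in, ptrdiff_t len, double *out)
{
    assert(len > 0);
    if (len == 1) {      /* c would be 0; the end points have weight 0 */
        out[0] = 0.0;
        return;
    }
    const double c = (len - 1) / 2.0;
    for (ptrdiff_t n = 0; n < len; n++) {
        const double x = (n - c) / c;
        out[n] = in[n] * (1.0 - x * x);
    }
}
```

A SIMD version has to process several samples per iteration, which is where the even/odd distinction mentioned in the commit message becomes awkward; the scalar form hides that entirely.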
* lavc/vorbisdsp: use ptrdiff_t rather than intptr_tRémi Denis-Courmont2022-09-191-1/+1
    ... for a difference between pointers.
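As a small illustration of why `ptrdiff_t` is the natural type here: it is the type the C language itself gives to the result of subtracting two pointers, whereas `intptr_t` is meant to hold a pointer value converted to an integer. The helper name `span` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements between two pointers into the same array.
 * The subtraction already has type ptrdiff_t, so no conversion is needed. */
static ptrdiff_t span(const float *begin, const float *end)
{
    assert(begin != NULL && end != NULL);
    return end - begin;
}
```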
* avcodec/x86/audiodsp: add scalarproduct avx2Paul B Mahol2022-09-132-0/+24
* avcodec/vp8dsp: Constify src in vp8_mc_funcAndreas Rheinhardt2022-09-112-30/+30
    Reviewed-by: Peter Ross <pross@xvid.org>
    Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/x86/flacdsp_init: Remove double ';'Andreas Rheinhardt2022-09-051-1/+1
    Inside a function, the second ';' in ";;" is just a null statement, but it is actually illegal outside of functions. Compilers nevertheless accept it without warning, except when in -pedantic mode when e.g. Clang emits a -Wextra-semi warning. Therefore remove the unnecessary ';'.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/x86/flacdsp: fix bug in decorrelationPaul B Mahol2022-09-052-20/+44
    Fixes #9297
* avutil/mem_internal: Fix headersAndreas Rheinhardt2022-08-241-0/+1
    Including avassert.h is unnecessary since commit 786be70e28fe739b8e49893fa13ae4652a68d1ea.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* x86: Don't hardcode the height to 8 in sad8_xy2_mmxMartin Storsjö2022-08-171-2/+1
    The height is hardcoded in some of the me_cmp functions, but not in all of them. But in the case of all other functions, it's hardcoded in the same place in SIMD functions as in the C reference functions, while this one function differs from the behaviour of the C code.

    (Before 542765ce3eccbca587d54262a512cbdb1407230d, there were a couple other sad8_*_mmx functions with similar hardcoded height.)

    Signed-off-by: Martin Storsjö <martin@martin.st>
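To make the height issue concrete, here is a simplified C sketch of an 8-pixel-wide sum of absolute differences that takes its row count from the caller instead of assuming 8. It is a plain SAD, without the half-pel x/y interpolation that the xy2 variant performs, and the name `sad8` and its signature are hypothetical rather than FFmpeg's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain 8-wide SAD. The number of rows h is a parameter; hardcoding it
 * to 8 would silently give wrong results for taller blocks. */
static int sad8(const uint8_t *a, const uint8_t *b, ptrdiff_t stride, int h)
{
    assert(h > 0);
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}
```

A SIMD implementation that differs from the C reference only in where the height comes from will pass tests that use 8 rows and diverge on any other height, which is the kind of mismatch this commit removes.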
* avcodec/flacdsp: Split encoder-only parts into a ctx of its ownAndreas Rheinhardt2022-08-053-13/+40
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/flacdsp: Remove unused function parameterAndreas Rheinhardt2022-08-051-2/+1
    Forgotten in e609cfd697f8eed7325591f767585041719807d1.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/h264chroma: Constify src in h264_chroma_mc_funcAndreas Rheinhardt2022-08-055-23/+23
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/hevcdsp: Constify src pointersAndreas Rheinhardt2022-08-056-126/+119
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/mpegvideodsp: Constify src pointersAndreas Rheinhardt2022-07-311-1/+1
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/mpegvideoencdsp: Allow pointers to const where possibleAndreas Rheinhardt2022-07-313-7/+7
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/me_cmp: Constify me_cmp_func buffer parametersAndreas Rheinhardt2022-07-312-48/+48
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/cfhdencdsp: Constify input pointersAndreas Rheinhardt2022-07-311-2/+2
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/lossless_videoencdsp: Constify src sub_left_predictAndreas Rheinhardt2022-07-312-2/+2
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/videodsp: Constify buf in VideoDSPContext.prefetchAndreas Rheinhardt2022-07-311-1/+1
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* avcodec/vp56: Move VP5-9 range coder functions to a header of their ownAndreas Rheinhardt2022-07-281-6/+8
    Also use a vpx prefix for them.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>