path: root/libavcodec/x86/Makefile
Commit message | Author | Age | Files | Lines
* h264chroma: x86: Fix building with yasm disabled | Martin Storsjö | 2013-02-06 | 1 | -2/+2
  Signed-off-by: Martin Storsjö <martin@martin.st>
* dsputil: Separate h264chroma | Diego Biurrun | 2013-02-06 | 1 | -1/+2
* dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm | Daniel Kang | 2013-01-27 | 1 | -0/+2
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* dsputil: Separate h264 qpel | Mans Rullgard | 2013-01-24 | 1 | -0/+1
  The sh4 optimizations are removed, because the code is 100% identical to the C code, so it is unlikely to provide any real practical benefit.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* vorbisdsp: convert x86 simd functions from inline asm to yasm. | Ronald S. Bultje | 2013-01-22 | 1 | -0/+1
* Move vorbis_inverse_coupling from dsputil to vorbisdspcontext. | Ronald S. Bultje | 2013-01-19 | 1 | -0/+1
  Conveniently (together with Justin's earlier patches), this makes our vorbis decoder entirely independent of dsputil.
* Drop Snow codec | Diego Biurrun | 2013-01-06 | 1 | -1/+0
  Snow is a toy codec with no real-world use and horrible code.
* lavc: introduce VideoDSPContext | Ronald S. Bultje | 2012-12-20 | 1 | -0/+2
  Move some functions from dsputil. The idea is that videodsp contains functions that are useful for a large and varied set of video decoders. Currently, it contains emulated_edge_mc() and prefetch().
  Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* x86: h264: Convert 8-bit QPEL inline assembly to YASM | Daniel Kang | 2012-11-25 | 1 | -1/+2
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86: vc1: call ff_vc1dsp_init_x86() under if (ARCH_X86) | Janne Grunau | 2012-10-08 | 1 | -0/+1
* x86: cavs: call ff_cavsdsp_init_x86() under if (ARCH_X86) | Janne Grunau | 2012-10-08 | 1 | -1/+1
* x86: call most of the x86 dsp init functions under if (ARCH_X86) | Janne Grunau | 2012-10-08 | 1 | -16/+17
  Rename the called dsp init functions to *_init_x86.
* x86: dsputil: Only compile motion_est code when encoders are enabled | Diego Biurrun | 2012-09-10 | 1 | -2/+2
* x86: Always compile files with functions that are called unconditionally | Diego Biurrun | 2012-08-29 | 1 | -3/+3
* x86: avcodec: Drop silly "_mmx" suffixes from filenames | Diego Biurrun | 2012-08-28 | 1 | -8/+8
* x86: avcodec: Drop silly "_sse" suffixes from filenames | Diego Biurrun | 2012-08-28 | 1 | -2/+2
* build: fft: x86: Drop unused YASM-OBJS-FFT- variable | Diego Biurrun | 2012-08-27 | 1 | -2/+1
* x86: mpegvideo: more sensible names for optimization file and init function | Diego Biurrun | 2012-08-24 | 1 | -1/+1
* x86: mpegvideoenc: Split optimizations off into a separate file | Diego Biurrun | 2012-08-24 | 1 | -0/+1
* dnxhdenc: x86: more sensible names for optimization file and init function | Diego Biurrun | 2012-08-24 | 1 | -1/+1
* build: x86: Only compile mpegvideo optimizations when necessary | Diego Biurrun | 2012-08-22 | 1 | -1/+1
* x86: avcodec: Consistently name all init files | Diego Biurrun | 2012-08-16 | 1 | -3/+3
* x86: avcodec: Appropriately name files containing only init functions | Diego Biurrun | 2012-08-15 | 1 | -4/+4
* x86: Drop silly "_yasm" suffixes from filenames | Diego Biurrun | 2012-08-12 | 1 | -3/+3
* x86: remove libmpeg2 mmx(ext) idct functions | Mans Rullgard | 2012-08-02 | 1 | -1/+0
  These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them.
  Signed-off-by: Mans Rullgard <mans@mansr.com>
* fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. | Ronald S. Bultje | 2012-07-31 | 1 | -2/+0
  64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.
* vp3: move idct and loop filter pointers to new vp3dsp context | Mans Rullgard | 2012-07-18 | 1 | -0/+1
  This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa.
  Signed-off-by: Mans Rullgard <mans@mansr.com>
* build: add CONFIG_VP3DSP, reduce repetition in OBJS lists | Mans Rullgard | 2012-07-18 | 1 | -4/+2
  Signed-off-by: Mans Rullgard <mans@mansr.com>
* x86: fft: convert sse inline asm to yasm | Mans Rullgard | 2012-06-25 | 1 | -1/+0
* build: Consistently handle conditional compilation for all optimization OBJS. | Diego Biurrun | 2012-04-12 | 1 | -3/+2
* build: prettyprinting cosmetics | Diego Biurrun | 2012-03-26 | 1 | -47/+40
* x86: conditionally compile H.264 QPEL optimizations | Diego Biurrun | 2012-03-25 | 1 | -1/+1
* SBR DSP x86: implement SSE sbr_sum_square_sse | Christophe GISQUET | 2012-02-23 | 1 | -0/+2
  The 32bits targets have been compiled with -mfpmath=sse for proper reference.
  sbr_sum_square:
    C  /32bits: 82c (unrolled)/102c
    C  /64bits: 69c (unrolled)/82c
    SSE/32bits: 42c
    SSE/64bits: 31c
  Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles.
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* win64: add a XMM clobber test configure option. | Ronald S. Bultje | 2012-02-02 | 1 | -0/+1
  This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof.
  Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
* rv40: x86 SIMD for biweight | Christophe Gisquet | 2012-01-30 | 1 | -1/+2
  Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely).
  *_TIMER report for the 16x16 and 8x8 cases:
  C:
    9015 decicycles in 16, 524257 runs, 31 skips
    2656 decicycles in 8, 524271 runs, 17 skips
  MMX:
    4156 decicycles in 16, 262090 runs, 54 skips
    1206 decicycles in 8, 262131 runs, 13 skips
  MMX on fast-path:
    2760 decicycles in 16, 524222 runs, 66 skips
    995 decicycles in 8, 524252 runs, 36 skips
  SSE2:
    2163 decicycles in 16, 262131 runs, 13 skips
    832 decicycles in 8, 262137 runs, 7 skips
  SSE2 with fast path:
    1783 decicycles in 16, 524276 runs, 12 skips
    711 decicycles in 8, 524283 runs, 5 skips
  SSSE3:
    2117 decicycles in 16, 262136 runs, 8 skips
    814 decicycles in 8, 262143 runs, 1 skips
  SSSE3 with fast path:
    1315 decicycles in 16, 524285 runs, 3 skips
    578 decicycles in 8, 524286 runs, 2 skips
  This means around a 4% speedup for some sequences.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* x86: Give RV40 init file a more suitable name. | Diego Biurrun | 2012-01-30 | 1 | -1/+1
* png: convert DSP functions to yasm. | Ronald S. Bultje | 2012-01-29 | 1 | -0/+1
* png: move DSP functions to their own DSP context. | Ronald S. Bultje | 2012-01-29 | 1 | -0/+1
* rv34: DC-only inverse transform | Christophe GISQUET | 2012-01-12 | 1 | -1/+5
  When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform. This is achieved by:
  - first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts.
  - then, checking low bits for 0 (as they represent the presence of higher frequency coefficients)
  Also provide x86 SIMD code for the DC-only inverse transform.
  Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
* mpegaudiodec: optimized iMDCT transform | Vitor Sessak | 2012-01-08 | 1 | -0/+1
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* x86: conditionally compile dnxhd encoder optimizations | Diego Biurrun | 2011-12-19 | 1 | -2/+2
* build: conditionally compile x86 H.264 chroma optimizations | Diego Biurrun | 2011-12-14 | 1 | -2/+3
* prores: idct sse2/sse4 optimizations. | Ronald S. Bultje | 2011-10-11 | 1 | -0/+2
  ~3.0-3.5x as fast as original C version, 1.6x as fast overall.
* Move RV3/4-specific DSP functions into their own context | Kostya Shishkov | 2011-08-11 | 1 | -0/+2
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* H.264: Add x86 assembly for 10-bit H.264 qpel functions. | Daniel Kang | 2011-07-03 | 1 | -0/+1
  Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author.
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. | Daniel Kang | 2011-06-21 | 1 | -0/+1
  Mainly ported from 8-bit H.264 weight/biweight.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions. | Daniel Kang | 2011-06-18 | 1 | -0/+1
  Mainly ported from 8-bit H.264 MC Chroma.
  Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* Add x86 assembly for some 10-bit H.264 intra predict functions. | Daniel Kang | 2011-06-06 | 1 | -1/+2
  Parts are inspired from the 8-bit H.264 predict code in Libav. Other parts ported from x264 with relicensing permission from author.
  Signed-off-by: Diego Biurrun <diego@biurrun.de>
* Add IDCT functions for 10-bit H.264. | Daniel Kang | 2011-05-31 | 1 | -1/+2
  Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
  Signed-off-by: Ronald S. Bultje <rbultje@google.com>
* dct32: port SSE 32-point DCT to YASM | Vitor Sessak | 2011-05-21 | 1 | -1/+2