path: root/libavcodec/x86
Commit message  Author  Age  Files  Lines
* vp9/x86: idct_add_16x16_ssse3.  Ronald S. Bultje  2013-12-14  2  -9/+275
| Currently only dc-only and full 16x16. Other subforms will follow in the near
| future. Total decoding time of ped1080p.webm goes from 9.7 to 9.3 seconds.
| DC-only goes from 957 -> 131 cycles, and the full IDCT goes from ~4050 to ~745
| cycles.
* Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3'  Michael Niedermayer  2013-12-09  2  -29/+27
|\
| | * commit '4958f35a2ebc307049ff2104ffb944f5f457feb3':
| |   dsputil: Move apply_window_int16 to ac3dsp
| |
| | Conflicts:
| |   libavcodec/arm/ac3dsp_init_arm.c
| |   libavcodec/arm/ac3dsp_neon.S
| |   libavcodec/x86/ac3dsp_init.c
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * dsputil: Move apply_window_int16 to ac3dsp  Diego Biurrun  2013-12-08  2  -29/+27
| | The (optimized) functions are used nowhere else.
* | vp9: implement top/left half (4x4) sub-8x8-IDCT.  Ronald S. Bultje  2013-12-07  1  -2/+41
| | For that specific case (eob>3&&eob<=12), runtime of idct8x8 goes from 668 to
| | 477 cycles. For all idct8x8, runtime goes from 521 to 490 cycles.
* | vp9: split pre-load of 11585x2 out of 1d idct macro.  Ronald S. Bultje  2013-12-07  1  -3/+3
| | This allows us to load it only once, instead of twice, in this function.
* | vp9: minor refactorings in idct ssse3 assembly.  Ronald S. Bultje  2013-12-07  1  -40/+38
| | Make register usage in macros explicit; change mulsub_2w_4x to use 2 instead
| | of 3 temp registers.
* | vp9: split x86 assembly in two files.  Ronald S. Bultje  2013-12-07  3  -259/+282
| | (And in future, loopfilter or intra pred could be put in their own respective
| | files also.)
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2013-12-05  3  -14/+14
|\|
| | * qatar/master:
| |   x86: Initialize mmxext after amd3dnow optimizations
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: Initialize mmxext after amd3dnow optimizations  Diego Biurrun  2013-12-04  3  -14/+14
| | The mmxext optimizations should be at least equally fast if available and
| | amd3dnow optimizations are being deprecated. Thus the former should override
| | the latter, not the other way around.
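The override order described in this commit can be illustrated with a minimal sketch. It assumes a function table that is filled in by successive per-instruction-set init steps, so whichever step runs last wins. The flag names, the table key, and the function names below are hypothetical and are not the identifiers used in libavcodec.

```python
# Hypothetical CPU feature flags; illustrative only, not libavutil's values.
CPU_3DNOW = 1 << 0
CPU_MMXEXT = 1 << 1


def init_dsp(cpu_flags):
    """Fill a function table; later assignments override earlier ones."""
    table = {"avg_pixels8": "avg_pixels8_c"}  # portable fallback
    if cpu_flags & CPU_3DNOW:
        # Deprecated path: initialized first so that it can be overridden.
        table["avg_pixels8"] = "avg_pixels8_3dnow"
    if cpu_flags & CPU_MMXEXT:
        # Initialized last, so it wins when both feature sets are present.
        table["avg_pixels8"] = "avg_pixels8_mmxext"
    return table
```

On a CPU that reports both feature sets, the table ends up pointing at the mmxext entry; a CPU with only 3DNow! keeps the 3dnow entry.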
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2013-12-02  2  -3/+8
|\|
| | * qatar/master:
| |   dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo
| |
| | If someone optimizes dct_quantize for non x86 SIMD, then this probably needs
| | to be reverted.
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo  Diego Biurrun  2013-12-02  2  -3/+8
| | The table is MMX-specific and used nowhere else.
* | Merge commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5'  Michael Niedermayer  2013-11-28  1  -0/+3
|\|
| | * commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5':
| |   x86: dsputil: Suppress deprecation warnings for XvMC bits
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: dsputil: Suppress deprecation warnings for XvMC bits  Diego Biurrun  2013-11-28  1  -0/+3
| | These parts are scheduled for removal on the next version bump.
| |
| | Signed-off-by: Vittorio Giovara <[email protected]>
| * lavc: VP9 decoder  Ronald S. Bultje  2013-11-15  3  -0/+524
| | Originally written by Ronald S. Bultje <[email protected]> and
| | Clément Bœsch <[email protected]>
| |
| | Further contributions by:
| |   Anton Khirnov <[email protected]>
| |   Diego Biurrun <[email protected]>
| |   Luca Barbato <[email protected]>
| |   Martin Storsjö <[email protected]>
| |
| | Signed-off-by: Luca Barbato <[email protected]>
| | Signed-off-by: Anton Khirnov <[email protected]>
* | avcodec/x86/vp9dsp: merge a few SWAP together.  Clément Bœsch  2013-11-21  1  -6/+3
| |
* | avcodec/x86: remove 3 sub in pred4x4_tm_vp8_8.  Clément Bœsch  2013-11-17  1  -4/+1
| | before: 411 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388289 runs, 319 skips
| | after:  389 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388308 runs, 300 skips
| |
| | Tested on i7 920.
* | avcodec/x86/vp9dsp: use EXTERNAL_* macros.  Clément Bœsch  2013-11-16  1  -4/+5
| | Original fix by one of these developers:
| |   Anton Khirnov <[email protected]>
| |   Diego Biurrun <[email protected]>
| |   Luca Barbato <[email protected]>
| |   Martin Storsjö <[email protected]>
| |
| | See 97962b2 / 72ca830
| |
| | Personal guess is Diego Biurrun.
* | Merge commit '458446acfa1441d283dacf9e6e545beb083b8bb0'  Michael Niedermayer  2013-11-15  2  -16/+21
|\|
| | * commit '458446acfa1441d283dacf9e6e545beb083b8bb0':
| |   lavc: Edge emulation with dst/src linesize
| |
| | Conflicts:
| |   libavcodec/cavs.c
| |   libavcodec/h264.c
| |   libavcodec/hevc.c
| |   libavcodec/mpegvideo_enc.c
| |   libavcodec/mpegvideo_motion.c
| |   libavcodec/rv34.c
| |   libavcodec/svq3.c
| |   libavcodec/vc1dec.c
| |   libavcodec/videodsp.h
| |   libavcodec/videodsp_template.c
| |   libavcodec/vp3.c
| |   libavcodec/vp8.c
| |   libavcodec/wmv2.c
| |   libavcodec/x86/videodsp.asm
| |   libavcodec/x86/videodsp_init.c
| |
| | Changes to the asm are not merged, they are left for volunteers or in their
| | absence for later.
| | The changes this merge introduces are reordering of the function arguments
| | See: face578d56c2d1375e40d5e2a28acc122132bc55
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * lavc: Edge emulation with dst/src linesize  Ronald S. Bultje  2013-11-15  2  -541/+489
| | Allow supporting files for which the image stride is smaller than the
| | maximum block size + number of subpel mc taps, e.g. a 64x64 VP9 file or a
| | 16x16 VP8 file with -fflags +emu_edge.
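As a rough illustration of why separate destination and source line sizes help here, the sketch below copies a block whose footprint extends past the picture, replicating the nearest edge pixel. The function name, parameter order, and flat-list buffers are assumptions made for this example and do not reflect the actual signature of the emulated-edge function in libavcodec.

```python
def emulated_edge_copy(dst, dst_linesize, src, src_linesize,
                       block_w, block_h, src_x, src_y, pic_w, pic_h):
    """Copy a block_w x block_h block starting at (src_x, src_y).

    Coordinates outside the pic_w x pic_h picture are clamped, which
    replicates the edge pixels. dst and src use independent line sizes,
    so the temporary buffer may be wider than the source picture.
    """
    for y in range(block_h):
        sy = min(max(src_y + y, 0), pic_h - 1)      # clamp row
        for x in range(block_w):
            sx = min(max(src_x + x, 0), pic_w - 1)  # clamp column
            dst[y * dst_linesize + x] = src[sy * src_linesize + sx]
    return dst


# A 2x2 picture (stride 2) read through a 4x4 window (stride 4).
picture = [1, 2,
           3, 4]
window = emulated_edge_copy([0] * 16, 4, picture, 2, 4, 4, -1, -1, 2, 2)
```

With a single shared line size, the 4-pixel-wide window could not be addressed using the 2-pixel stride of the picture; giving the destination its own stride removes that constraint.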
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2013-11-14  1  -2/+6
|\|
| | * qatar/master:
| |   Deprecate obsolete XvMC hardware decoding support
| |
| | Conflicts:
| |   libavcodec/mpeg12.c
| |   libavcodec/mpeg12dec.c
| |   libavcodec/mpegvideo.c
| |   libavcodec/options_table.h
| |   libavutil/pixdesc.c
| |   libavutil/version.h
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * Deprecate obsolete XvMC hardware decoding support  Diego Biurrun  2013-11-13  1  -2/+6
| | XvMC has long ago been superseded by newer acceleration APIs, such as
| | VDPAU, and few downstreams still support it. Furthermore XvMC is not
| | implemented within the hwaccel framework, but requires its own specific
| | code in the MPEG-1/2 decoder, which is a maintenance burden.
* | Merge commit '0338c396987c82b41d322630ea9712fe5f9561d6'  Michael Niedermayer  2013-11-08  3  -10/+41
|\|
| | * commit '0338c396987c82b41d322630ea9712fe5f9561d6':
| |   dsputil: Split off H.263 bits into their own H263DSPContext
| |
| | Conflicts:
| |   configure
| |   libavcodec/mpegvideo.h
| |   libavcodec/mpegvideo_enc.c
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * dsputil: Split off H.263 bits into their own H263DSPContext  Diego Biurrun  2013-11-08  3  -10/+41
| |
* | avcodec/vp9: add ff_vp9_idct_idct_{4x4,8x8}_ssse3().  Clément Bœsch  2013-11-05  2  -0/+305
| | 1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips
| | 1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips
| | 1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips
| |
| | 529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips
| | 516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips
| | 474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips
| |
| | (~3.9x faster)
| |
| | 7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips
| | 7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips
| | 7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips
| |
| | 1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips
| | 1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips
| | 1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips
| |
| | (~7.1x faster)
| |
| | Overall decode time before commit:
| |   16.48s user 0.03s system 99% cpu 16.526 total
| |   16.54s user 0.01s system 99% cpu 16.566 total
| |   16.46s user 0.03s system 99% cpu 16.511 total
| |
| | Overall decode time after commit:
| |   16.34s user 0.02s system 99% cpu 16.378 total
| |   16.28s user 0.02s system 99% cpu 16.315 total
| |   16.32s user 0.03s system 99% cpu 16.366 total
| |
| | Tested on i7 920 with 40s 1080p footage.
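The quoted speedups can be reproduced from the figures in this commit message. The sketch below assumes the ratios were taken from the last (largest-run) measurement of each group, which is the pairing that matches the stated ~3.9x and ~7.1x; the commit does not say which rows were used. The variable names are chosen for this example.

```python
def speedup(before, after):
    """Ratio of the old cost to the new cost (higher is better)."""
    return before / after


# Decicycle counts from the largest-run measurement of each group.
idct4_speedup = speedup(1864, 474)   # C vs. SSSE3, 4x4
idct8_speedup = speedup(7731, 1086)  # C vs. SSSE3, 8x8

# Wall-clock totals (seconds) of the three decode runs before and after.
before = [16.526, 16.566, 16.511]
after = [16.378, 16.315, 16.366]
mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
saving_pct = 100 * (mean_before - mean_after) / mean_before

print(f"4x4: {idct4_speedup:.1f}x, 8x8: {idct8_speedup:.1f}x, "
      f"overall decode time saved: {saving_pct:.1f}%")
```

The large per-function gains translate into only about a one percent reduction in total decode time, since these two transforms account for a small share of the overall work on this sample.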
* | Merge commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930'  Michael Niedermayer  2013-11-05  1  -5/+1
|\|
| | * commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930':
| |   x86: rv40dsp: Use PAVGB instruction macro where appropriate
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: rv40dsp: Use PAVGB instruction macro where appropriate  Diego Biurrun  2013-11-04  1  -5/+1
| |
| * x86: hpeldsp: Use PAVGB instruction macro where necessary  Mikulas Patocka  2013-11-04  1  -13/+13
| | Signed-off-by: Mikulas Patocka <[email protected]>
| | Signed-off-by: Diego Biurrun <[email protected]>
* | avcodec/x86/hpeldsp: fix crash on AMD K6-3+  Mikulas Patocka  2013-11-03  1  -13/+13
| | There are instructions pavgb and pavgusb. Both instructions do the same
| | operation but they have different encoding. Pavgb exists in SSE (or
| | MMXEXT) instruction set and pavgusb exists in 3D-NOW instruction set.
| |
| | libavcodec uses the macro PAVGB to select the proper instruction. However,
| | the function avg_pixels8_xy2 doesn't use this macro, it uses pavgb
| | directly.
| |
| | As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and
| | K6-3 processors, because they have pavgusb, but not pavgb.
| |
| | This bug seems to have been introduced by commit
| | 71155d7b4157fee44c0d3d0fc1b660ebfb9ccf46, "dsputil: x86: Convert mpeg4
| | qpel and dsputil avg to yasm"
| |
| | Signed-off-by: Mikulas Patocka <[email protected]>
| | Signed-off-by: Michael Niedermayer <[email protected]>
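The situation described in this commit can be modelled in a few lines. Both instructions compute a rounded average of unsigned bytes, so the only question is which mnemonic the target CPU can decode. The helper names below are hypothetical and only mimic the idea of the PAVGB macro; they are not its actual definition.

```python
def pavg_u8(a, b):
    """Rounded average of two unsigned bytes, as pavgb and pavgusb compute it."""
    return (a + b + 1) >> 1


def select_pavg(has_mmxext, has_3dnow):
    """Pick the mnemonic a PAVGB-style macro would emit (illustrative only)."""
    if has_mmxext:
        return "pavgb"
    if has_3dnow:
        return "pavgusb"
    raise ValueError("no packed byte average instruction available")


# A K6-2/K6-3 class CPU: 3DNow! but no MMXEXT.
k6_mnemonic = select_pavg(has_mmxext=False, has_3dnow=True)
```

Hard-coding `pavgb` skips this selection, which is why the unconverted function faulted on CPUs that only implement `pavgusb`.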
* | Merge commit '1700b4e678ed329611a16b20d11e64b7abda4839'  Michael Niedermayer  2013-11-02  3  -1556/+1586
|\|
| | * commit '1700b4e678ed329611a16b20d11e64b7abda4839':
| |   x86: vp8dsp: Split loopfilter code into a separate file
| |
| | Conflicts:
| |   libavcodec/x86/Makefile
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: vp8dsp: Split loopfilter code into a separate file  Diego Biurrun  2013-11-01  3  -1556/+1586
| |
* | avcodec/cabac: support UNCHECKED_BITSTREAM_READER = 0  Michael Niedermayer  2013-10-31  1  -0/+23
| | Fixes overreads in HEVC
| | Fixes Ticket3070
| | Also fixed remaining issues from Ticket3075 and Ticket3076
| |
| | Some lines of code taken from
| |   0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/x86/cabac.h
| | and
| |   0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/cabac_functions.h
| |
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/videodsp: Small speedups in ff_emulated_edge_mc x86 SIMD.  Ronald S. Bultje  2013-10-27  1  -17/+17
| | Don't use word-size multiplications if size == 2, and if we're using SIMD
| | instructions (size >= 8), complete leftover 4byte sets using movd, not mov.
| | Both of these changes lead to minor speedups.
| |
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/videodsp: fix a bug in a %if statement where we used '%%' instead of '&&'.  Ronald S. Bultje  2013-10-27  1  -1/+1
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/cabac: include get_cabac_bypass_sign_x86() under #if !BROKEN_COMPILER  Michael Niedermayer  2013-10-26  1  -1/+1
| | This might fix Ticket2999 as well as some fate clients.
| | Untested, as the original patch submitter no longer has the environment to
| | test. This should be reverted if it does not fix the issues.
| |
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/videodsp: Properly mark sse2 instructions in emulated_edge_mc x86 simd as such.  Ronald S. Bultje  2013-10-24  2  -22/+32
| | Should fix crashes or corrupt output on pre-SSE2 CPUs when they were using
| | SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in hfix or hvar
| | single-edge (left/right) extension functions.
| |
| | Tested-by: Ingo Brückl <[email protected]>
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/dsputil_init: move ff_idct_xvid_mmxext init  Michael Niedermayer  2013-10-15  1  -9/+13
| | This decreases the diff to libav
| |
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/dsputil_init: remove duplicated sse2 idct init  Michael Niedermayer  2013-10-15  1  -6/+1
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | avcodec/x86/dsputil_init: fix cpu flag checks  Michael Niedermayer  2013-10-15  1  -2/+2
| | Fixes linking failure with --disable-sse2
| |
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | libavcodec/x86: Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32.  Ronald S. Bultje  2013-10-10  2  -28/+47
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | x86: Fix compilation with nasm on PPC & OS/2  Ronald S. Bultje  2013-10-08  1  -2/+1
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2013-10-08  1  -32/+42
|\|
| | * qatar/master:
| |   x86: h264_idct: Update comments to match 8/10-bit depth optimization split
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: h264_idct: Update comments to match 8/10-bit depth optimization split  Diego Biurrun  2013-10-07  1  -32/+42
| |
* | Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450'  Michael Niedermayer  2013-10-08  2  -14/+9
|\|
| | * commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
| |   x86inc: Utilize the shadow space on 64-bit Windows
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86inc: Utilize the shadow space on 64-bit Windows  Henrik Gramner  2013-10-07  2  -14/+9
| | Store XMM6 and XMM7 in the shadow space in functions that clobber them.
| | This way we don't have to adjust the stack pointer as often, reducing the
| | number of instructions as well as code size.
| |
| | Signed-off-by: Derek Buitenhuis <[email protected]>
* | avcodec/x86/vp9dsp: Fix compilation with nasm.  Ronald S. Bultje  2013-10-08  1  -3/+3
| | Signed-off-by: Michael Niedermayer <[email protected]>
* | Merge remote-tracking branch 'qatar/master'  Michael Niedermayer  2013-10-07  1  -2/+10
|\|
| | * qatar/master:
| |   x86: fdct: Employ more specific ifdefs
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: fdct: Employ more specific ifdefs  Diego Biurrun  2013-10-06  1  -2/+10
| | This avoids building mmxext and sse2 code when disabled by configure.
* | Merge commit '2ddb35b91131115c094d90e04031451023441b4d'  Michael Niedermayer  2013-10-06  3  -39/+67
|\|
| | * commit '2ddb35b91131115c094d90e04031451023441b4d':
| |   x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx
| |
| | Merged-by: Michael Niedermayer <[email protected]>
| * x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx  Diego Biurrun  2013-10-05  3  -39/+67
| | The function does not depend on MMX and compilation without MMX enabled
| | fails if the function is compiled conditional on MMX availability.
* | Merge commit '258414d0771845d20f646ffe4d4e60f22fba217c'  Michael Niedermayer  2013-10-06  1  -11/+11
|\|
| | * commit '258414d0771845d20f646ffe4d4e60f22fba217c':
| |   x86: fdct: Initialize optimized fdct implementations in the standard way
| |
| | Merged-by: Michael Niedermayer <[email protected]>