path: root/libavcodec/x86/vp8dsp-init.c
Commit message    Author    Age    Files    Lines
* Merge remote-tracking branch 'qatar/master'    Michael Niedermayer    2012-03-05    1    -56/+56
|\
| |   * qatar/master: (27 commits)
| |     cmdutils: use new avcodec_is_decoder/encoder() functions.
| |     lavc: make codec_is_decoder/encoder() public.
| |     lavc: deprecate AVCodecContext.sub_id.
| |     libcdio: add a forgotten AVClass to the private context.
| |     swscale: remove "cpu flags" from -sws_flags description.
| |     proresenc: give user a possibility to alter some encoding parameters
| |     vorbisenc: add output buffer overwrite protection
| |     libopencore-amrnbenc: fix end-of-stream handling
| |     ra144enc: fix end-of-stream handling
| |     nellymoserenc: zero any leftover packet bytes
| |     nellymoserenc: use proper MDCT overlap delay
| |     qpeg: Use bytestream2 functions to prevent buffer overreads.
| |     swscale: make %rep unconditional.
| |     vp8: convert simple loopfilter x86 assembly to use named arguments.
| |     vp8: convert idct x86 assembly to use named arguments.
| |     vp8: convert mc x86 assembly to use named arguments.
| |     vp8: convert loopfilter x86 assembly to use cpuflags().
| |     vp8: convert idct/mc x86 assembly to use cpuflags().
| |     swscale: remove now unnecessary hack.
| |     x86inc: don't "bake" stack_offset in named arguments.
| |     ...
| |
| |   Conflicts:
| |     cmdutils.c
| |     doc/APIchanges
| |     libavcodec/mpeg12.c
| |     libavcodec/options.c
| |     libavcodec/qpeg.c
| |     libavcodec/utils.c
| |     libavcodec/version.h
| |     libavdevice/libcdio.c
| |     tests/lavf-regression.sh
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * vp8: convert idct/mc x86 assembly to use cpuflags().    Ronald S. Bultje    2012-03-03    1    -56/+56
| |
* | Merge remote-tracking branch 'qatar/master'    Michael Niedermayer    2012-03-03    1    -87/+126
|\|
| |   * qatar/master: (29 commits)
| |     amrwb: remove duplicate arguments from extrapolate_isf().
| |     amrwb: error out early if mode is invalid.
| |     h264: change underread for 10bit QPEL to overread.
| |     matroska: check buffer size for RM-style byte reordering.
| |     vp8: disable mmx functions with sse/sse2 counterparts on x86-64.
| |     vp8: change int stride to ptrdiff_t stride.
| |     wma: fix invalid buffer size assumptions causing random overreads.
| |     Windows Media Audio Lossless decoder
| |     rv10/20: Fix slice overflow with checked bitstream reader.
| |     h263dec: Disallow width/height changing with frame threads.
| |     rv10/20: Fix a buffer overread caused by losing track of the remaining buffer size.
| |     rmdec: Honor .RMF tag size rather than assuming 18.
| |     g722: Fix the QMF scaling
| |     r3d: don't set codec timebase.
| |     electronicarts: set timebase for tgv video.
| |     electronicarts: parse the framerate for cmv video.
| |     ogg: don't set codec timebase
| |     electronicarts: don't set codec timebase
| |     avs: don't set codec timebase
| |     wavpack: Fix an integer overflow
| |     ...
| |
| |   Conflicts:
| |     libavcodec/arm/vp8dsp_init_arm.c
| |     libavcodec/fraps.c
| |     libavcodec/h264.c
| |     libavcodec/mpeg4videodec.c
| |     libavcodec/mpegvideo.c
| |     libavcodec/msmpeg4.c
| |     libavcodec/pnmdec.c
| |     libavcodec/qpeg.c
| |     libavcodec/rawenc.c
| |     libavcodec/ulti.c
| |     libavcodec/vcr1.c
| |     libavcodec/version.h
| |     libavcodec/wmalosslessdec.c
| |     libavformat/electronicarts.c
| |     libswscale/ppc/yuv2rgb_altivec.c
| |     tests/ref/acodec/g722
| |     tests/ref/fate/ea-cmv
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * vp8: disable mmx functions with sse/sse2 counterparts on x86-64.    Ronald S. Bultje    2012-03-02    1    -4/+20
| |
| |   x86-64 is guaranteed to have at least SSE2, therefore the MMX/MMX2
| |   functions will never be used in practice.
| * vp8: change int stride to ptrdiff_t stride.    Ronald S. Bultje    2012-03-02    1    -83/+106
| |
| |   On 64bit platforms with 32bit int, this means we won't have to
| |   sign-extend the integer anymore.
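To illustrate the stride change described in the entry above, here is a minimal sketch. It is not taken from vp8dsp-init.c; `sum_column` is a hypothetical helper that only shows the idea of declaring the row stride as `ptrdiff_t`, so that address arithmetic is done at pointer width and negative strides also work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): sums the first byte of each row.
 * stride is the byte distance between consecutive rows and may be negative
 * when the caller walks the picture bottom-up. */
static unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;

    assert(rows >= 0);
    for (ptrdiff_t y = 0; y < rows; y++) {
        /* y * stride is computed in ptrdiff_t, so no 32-bit int has to be
         * widened to pointer size before it is added to src. */
        sum += src[y * stride];
    }
    return sum;
}
```

With an `int` stride, a compiler targeting a 64-bit platform has to widen the value before each address computation; taking `ptrdiff_t` in the prototype moves that conversion to the caller, once.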
* | Merge remote-tracking branch 'qatar/master'    Michael Niedermayer    2011-10-21    1    -1/+1
|\|
| |   * qatar/master: (47 commits)
| |     lavc: hide private symbols.
| |     lavc: deprecate img_get_alpha_info().
| |     lavc: use avpriv_ prefix for ff_toupper4.
| |     lavc: use avpriv_ prefix for ff_copy_bits and align_put_bits.
| |     lavc: use avpriv_ prefix for ff_ac3_parse_header.
| |     lavc: use avpriv_ prefix for ff_frame_rate_tab.
| |     lavc: rename ff_find_start_code to avpriv_mpv_find_start_code
| |     lavc: use avpriv_ prefix for ff_split_xiph_headers.
| |     lavc: use avpriv_ prefix for ff_dirac_parse_sequence_header.
| |     lavc: use avpriv_ prefix for some dv symbols used in lavf.
| |     lavc: use avpriv_ prefix for some flac symbols used in lavf.
| |     lavc: use avpriv_ prefix for some mpeg4audio symbols used in lavf.
| |     lavc: use avpriv_ prefix for some mpegaudio symbols used in lavf.
| |     lavc: use avpriv_ prefix for ff_aac_parse_header().
| |     lavf: hide private symbols.
| |     lavf: use avpriv_ prefix for some dv functions.
| |     lavf: use avpriv_ prefix for ff_new_chapter().
| |     avcodec: add CODEC_CAP_DELAY note to avcodec_decode_audio3() documentation
| |     avcodec: clarify the CODEC_CAP_DELAY note in avcodec_decode_video2()
| |     avcodec: clarify documentation of CODEC_CAP_DELAY
| |     ...
| |
| |   Conflicts:
| |     configure
| |     doc/general.texi
| |     libavcodec/Makefile
| |     libavcodec/aacdec.c
| |     libavcodec/allcodecs.c
| |     libavcodec/avcodec.h
| |     libavcodec/dv.c
| |     libavcodec/dvdata.c
| |     libavcodec/dvdata.h
| |     libavcodec/libspeexenc.c
| |     libavcodec/mpegvideo.c
| |     libavcodec/version.h
| |     libavformat/avidec.c
| |     libavformat/dv.c
| |     libavformat/dv.h
| |     libavformat/flvenc.c
| |     libavformat/mov.c
| |     libavformat/mp3enc.c
| |     libavformat/oggparsespeex.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Move some variable declarations below the appropriate #ifdef.    Diego Biurrun    2011-10-20    1    -1/+1
| |
| |   This avoids some unused variable warnings with YASM disabled.
| * Replace FFmpeg with Libav in licence headers    Mans Rullgard    2011-03-19    1    -4/+4
|/
|     Signed-off-by: Mans Rullgard <mans@mansr.com>
* Move mm_support() from libavcodec to libavutil, make it a public    Stefano Sabatini    2010-09-08    1    -1/+2
|
|     function and rename it to av_get_cpu_flags().
|
|     Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Rename FF_MM_ symbols related to CPU features flags as AV_CPU_FLAG_    Stefano Sabatini    2010-09-04    1    -7/+7
|
|     symbols, and move them from libavcodec/avcodec.h to libavutil/cpu.h.
|
|     Originally committed as revision 25040 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove global mm_flags variable    Måns Rullgård    2010-08-24    1    -1/+1
|
|     Originally committed as revision 24909 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: move zeroing of luma DC block into the WHT    Jason Garrett-Glaser    2010-08-02    1    -0/+2
|
|     Lets us do the zeroing in asm instead of C.
|     Also makes it consistent with the way the regular iDCT code does it.
|
|     Originally committed as revision 24668 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use word-writing instead of dword-writing (with two cached but otherwise    Ronald S. Bultje    2010-07-31    1    -1/+3
|
|     unchanged bytes) in the horizontal simple loopfilter. This makes the filter
|     quite a bit faster in itself (~30 cycles less on Core1), probably mostly
|     because we don't need a complex 4x4 transpose, but only a simple byte
|     interleave. Also allows using pextrw on SSE4, which speeds up even more
|     (e.g. 25% faster on Core i7).
|
|     Originally committed as revision 24638 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: optimize DC-only chroma case in the same way as luma.    Jason Garrett-Glaser    2010-07-23    1    -7/+9
|
|     Add MMX idct_dc_add4uv function for this case.
|     ~40% faster chroma idct.
|
|     Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: 30% faster idct_mb    Jason Garrett-Glaser    2010-07-23    1    -0/+5
|
|     Take shortcuts based on statistically common situations.
|     Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
|     blocks are common.
|     TODO: tie this more directly into the MB mode, since the DC-level transform is
|     only used for non-splitmv blocks?
|
|     Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8: clear DCT blocks in iDCT instead of using clear_blocks.    Jason Garrett-Glaser    2010-07-23    1    -0/+2
|
|     ~0.3% faster overall.
|
|     Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on    Ronald S. Bultje    2010-07-22    1    -0/+4
|
|     CPUs supporting it.
|
|     Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix and enable horizontal >=SSE2 mbedge loopfilter.    Ronald S. Bultje    2010-07-22    1    -6/+6
|
|     Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Various VP8 x86 deblocking speedups    Jason Garrett-Glaser    2010-07-21    1    -60/+40
|
|     SSSE3 versions, improve SSE2 versions a bit.
|     SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
|
|     Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Make mmx VP8 WHT faster    Jason Garrett-Glaser    2010-07-21    1    -2/+2
|
|     Avoid pextrw, since it's slow on many older CPUs.
|     Now it doesn't require mmxext either.
|
|     Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)    Ronald S. Bultje    2010-07-20    1    -0/+42
|
|     and chroma (width=8).
|
|     Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.    Ronald S. Bultje    2010-07-20    1    -0/+19
|
|     Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's    Ronald S. Bultje    2010-07-19    1    -19/+0
|
|     wrong with it tomorrow or so, then re-submit.
|
|     Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Remove FF_MM_SSE2/3 flags for CPUs where this is generally not faster than    Ronald S. Bultje    2010-07-19    1    -2/+5
|
|     regular MMX code. Examples of this are the Core1 CPU. Instead, set a new flag,
|     FF_MM_SSE2/3SLOW, which can be checked for particular SSE2/3 functions that
|     have been checked specifically on such CPUs and are actually faster than
|     their MMX counterparts.
|
|     In addition, use this flag to enable particular VP8 and LPC SSE2 functions
|     that are faster than their MMX counterparts.
|
|     Based on a patch by Loren Merritt <lorenm AT u washington edu>.
|
|     Originally committed as revision 24340 to svn://svn.ffmpeg.org/ffmpeg/trunk
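The flag scheme described in the entry above can be sketched as follows. This is an illustration only, not the code in vp8dsp-init.c: the `CPU_*` bit values, `enum impl`, `pick_generic` and `pick_checked` are all hypothetical names. The sketch assumes a CPU with slow SSE2 reports the SLOW flag instead of the plain SSE2 flag, as the message describes.

```c
/* Hypothetical flag bits for illustration; the real FF_MM_* values differ. */
enum {
    CPU_MMX      = 1 << 0,
    CPU_SSE2     = 1 << 1, /* SSE2 present and generally faster than MMX */
    CPU_SSE2SLOW = 1 << 2  /* SSE2 present but generally not faster than MMX */
};

enum impl { IMPL_C, IMPL_MMX, IMPL_SSE2 };

/* Default selection: a function with no per-CPU benchmarking only takes
 * the SSE2 version when the plain SSE2 flag is set. */
static enum impl pick_generic(int flags)
{
    if (flags & CPU_SSE2)
        return IMPL_SSE2;
    if (flags & CPU_MMX)
        return IMPL_MMX;
    return IMPL_C;
}

/* Selection for a function whose SSE2 version was measured to beat MMX
 * even on slow-SSE2 CPUs: it additionally accepts the SLOW flag. */
static enum impl pick_checked(int flags)
{
    if (flags & (CPU_SSE2 | CPU_SSE2SLOW))
        return IMPL_SSE2;
    return pick_generic(flags);
}
```

On a CPU reporting `CPU_MMX | CPU_SSE2SLOW`, `pick_generic` falls back to the MMX version while `pick_checked` still selects SSE2, which is the opt-in behaviour the message outlines for selected VP8 and LPC functions.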
* Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.    Ronald S. Bultje    2010-07-19    1    -0/+19
|
|     Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Change function prototypes for width=8 inner and mbedge loopfilter functions    Ronald S. Bultje    2010-07-19    1    -18/+18
|
|     so that it does both U and V planes at the same time. This will have speed
|     advantages when using SSE2 (or higher) optimizations, since we can do both
|     the U and V rows together in a single xmm register. This also renames
|     filter16 to filter16y and filter8 to filter8uv so that it's more obvious
|     what each function is used for.
|
|     Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
* VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.    Ronald S. Bultje    2010-07-15    1    -0/+22
|
|     Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).    Ronald S. Bultje    2010-07-03    1    -0/+16
|
|     Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
* SSSE3 versions of vp8 width4 bilinear MC functions    Jason Garrett-Glaser    2010-07-03    1    -2/+11
|
|     Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
* SSSE3 versions of width4 VP8 6-tap MC functions    Jason Garrett-Glaser    2010-07-02    1    -0/+18
|
|     Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
|     non-bitexactness bug.
|     Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.
|
|     Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix 100L in vp8dsp asm init    Jason Garrett-Glaser    2010-07-01    1    -2/+2
|
|     Originally committed as revision 23946 to svn://svn.ffmpeg.org/ffmpeg/trunk
* MMX idct_add for VP8.    Ronald S. Bultje    2010-06-29    1    -0/+2
|
|     Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add mmxext version of VP8 DC Hadamard transform    Jason Garrett-Glaser    2010-06-29    1    -0/+2
|
|     Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Change MMXEXT to MMX2, MMXEXT is deprecated    Baptiste Coudurier    2010-06-28    1    -1/+1
|
|     Originally committed as revision 23865 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add x86 asm functions for VP8 put_pixels    Jason Garrett-Glaser    2010-06-28    1    -0/+19
|
|     Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC    Jason Garrett-Glaser    2010-06-28    1    -96/+124
|
|     Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
* Fix build without yasm    David Conrad    2010-06-27    1    -0/+8
|
|     Originally committed as revision 23816 to svn://svn.ffmpeg.org/ffmpeg/trunk
* First shot at VP8 optimizations:    Jason Garrett-Glaser    2010-06-27    1    -0/+216

      - MMXEXT, SSE2 and SSSE3 MC functions
      - MMX and SSE4 IDCT dc_add functions

      Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

      Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk