author | Christophe Gisquet <christophe.gisquet@gmail.com> | 2012-04-19 22:36:17 +0200 |
---|---|---|
committer | Diego Biurrun <diego@biurrun.de> | 2012-05-10 18:42:43 +0200 |
commit | 110d0cdc9d1ec414a658f841a3fbefbf6f796d61 (patch) | |
tree | d2f80a035204c7a75a6daa5c71357e61817ffd54 /libavcodec/x86/dsputil_mmx.h | |
parent | 706b998cdcea97c50fad2228f67488de0e06b2a2 (diff) | |
download | ffmpeg-110d0cdc9d1ec414a658f841a3fbefbf6f796d61.tar.gz |
rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
Benchmark (rounded to tens of unit):
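| | V8x8 | H8x8 | 2D8x8 | V16x16 | H16x16 | 2D16x16 |
|---|---|---|---|---|---|---|
| C | 445 | 358 | 985 | 1785 | 1559 | 3280 |
| MMX* | 219 | 271 | 478 | 714 | 929 | 1443 |
| SSE2 | 131 | 158 | 294 | 425 | 515 | 892 |
| SSSE3 | 120 | 122 | 248 | 387 | 390 | 763 |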
End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
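The "coefficient redundancy" mentioned in the commit message can be illustrated with a scalar sketch. It assumes the RV40 luma filter is a 6-tap kernel of the form (1, -5, c1, c2, -5, 1), where only the two centre taps and the shift change between sub-pixel positions; that assumption comes from the C reference code, not from this commit page. The function name `rv40_h_lowpass_row` and the helper `clip_u8` are hypothetical and exist only for this illustration; the commit itself implements the filter in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an already non-negative intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar sketch of one filtered row.
 * Assumed taps: (1, -5, c1, c2, -5, 1), normalised by 1 << shift.
 * The outer four taps are fixed and mirrored, so each mirrored pair of
 * samples is summed first and multiplied once.
 * src needs 2 readable pixels before src[0] and 3 after src[w - 1].
 */
static void rv40_h_lowpass_row(uint8_t *dst, const uint8_t *src,
                               int w, int c1, int c2, int shift)
{
    assert(shift >= 1);
    const int rnd = 1 << (shift - 1);          /* round to nearest */

    for (int x = 0; x < w; x++) {
        int outer = src[x - 2] + src[x + 3];   /* taps with weight  1 */
        int inner = src[x - 1] + src[x + 2];   /* taps with weight -5 */
        int sum   = outer - 5 * inner
                  + c1 * src[x] + c2 * src[x + 1] + rnd;

        /* Negative sums clip to 0; avoid right-shifting a negative int. */
        dst[x] = sum < 0 ? 0 : clip_u8(sum >> shift);
    }
}
```

Under the stated assumption, a quarter-pel position would be called with something like `c1 = 52, c2 = 20, shift = 6` and the half-pel position with `c1 = 20, c2 = 20, shift = 5`. Because the outer taps never change, a SIMD version can share that part of the computation across positions, which is presumably the advantage the message refers to for the non-SSSE3 versions.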
Diffstat (limited to 'libavcodec/x86/dsputil_mmx.h')
-rw-r--r-- | libavcodec/x86/dsputil_mmx.h | 5 |
1 files changed, 5 insertions, 0 deletions
diff --git a/libavcodec/x86/dsputil_mmx.h b/libavcodec/x86/dsputil_mmx.h
index 097739cf98..37f4581b9c 100644
--- a/libavcodec/x86/dsputil_mmx.h
+++ b/libavcodec/x86/dsputil_mmx.h
@@ -199,6 +199,11 @@ void ff_avg_cavs_qpel16_mc00_mmx2(uint8_t *dst, uint8_t *src, int stride);
 void ff_put_vc1_mspel_mc00_mmx(uint8_t *dst, const uint8_t *src, int stride, int rnd);
 void ff_avg_vc1_mspel_mc00_mmx2(uint8_t *dst, const uint8_t *src, int stride, int rnd);
 
+void ff_put_rv40_qpel8_mc33_mmx(uint8_t *block, uint8_t *pixels, int line_size);
+void ff_put_rv40_qpel16_mc33_mmx(uint8_t *block, uint8_t *pixels, int line_size);
+void ff_avg_rv40_qpel8_mc33_mmx(uint8_t *block, uint8_t *pixels, int line_size);
+void ff_avg_rv40_qpel16_mc33_mmx(uint8_t *block, uint8_t *pixels, int line_size);
+
 void ff_mmx_idct(DCTELEM *block);
 void ff_mmxext_idct(DCTELEM *block);
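The four prototypes added above share one `(block, pixels, line_size)` signature and differ only in block size (8 or 16) and in the `put`/`avg` prefix. The scalar sketch below shows what that naming convention usually means in this codebase: `put` stores the prediction, `avg` blends it with what is already in the destination. It assumes that the (3,3) sub-pixel position reduces to a rounded 2x2 average, which this header diff does not confirm. The names `put_xy2_sketch`, `avg_xy2_sketch` and `avg2x2` are hypothetical, and the real functions are MMX code defined elsewhere in the commit.

```c
#include <stdint.h>

/* Assumed prediction for the (3,3) position: rounded mean of a 2x2 patch. */
static uint8_t avg2x2(const uint8_t *p, int line_size)
{
    return (uint8_t)((p[0] + p[1] + p[line_size] + p[line_size + 1] + 2) >> 2);
}

/* "put": overwrite the destination block with the prediction. */
static void put_xy2_sketch(uint8_t *block, const uint8_t *pixels,
                           int line_size, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            block[x] = avg2x2(pixels + x, line_size);
        block  += line_size;
        pixels += line_size;
    }
}

/* "avg": blend the prediction with the existing destination, rounding up. */
static void avg_xy2_sketch(uint8_t *block, const uint8_t *pixels,
                           int line_size, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            block[x] = (uint8_t)((block[x] + avg2x2(pixels + x, line_size) + 1) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}
```

Both helpers read one extra column and one extra row of `pixels` beyond the `size` x `size` block, so the source buffer must be padded accordingly. The `size` parameter stands in for the separate `qpel8` and `qpel16` entry points.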