rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC

Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions

Benchmark (rounded to tens of unit):
        V8x8   H8x8   2D8x8   V16x16   H16x16   2D16x16
C        445    358     985     1785     1559      3280
MMX*     219    271     478      714      929      1443
SSE2     131    158     294      425      515       892
SSSE3    120    122     248      387      390       763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
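
One way to read the "coefficient redundancy" mentioned above is sketched below in scalar C. It assumes the RV40 six-tap kernels have the form (1, -5, C1, C2, -5, 1) with (C1, C2, shift) being (52, 20, 6), (20, 20, 5) or (20, 52, 6), so that the four outer taps are shared by every kernel and only the two centre weights change. The names `clip_u8` and `sixtap_row` are hypothetical and are not part of this commit; this is an illustration of the idea, not the assembly that was added.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of one horizontal six-tap pass over a row.
 * Assumed kernel: (1, -5, c1, c2, -5, 1), normalised by 1 << shift.
 * src needs 2 readable pixels to the left and 3 to the right of the row.
 */
static void sixtap_row(uint8_t *dst, const uint8_t *src, int width,
                       int c1, int c2, int shift)
{
    for (int x = 0; x < width; x++) {
        /* These two sums do not depend on the kernel that is selected. */
        int outer = src[x - 2] + src[x + 3];   /* taps with weight  1 */
        int inner = src[x - 1] + src[x + 2];   /* taps with weight -5 */
        /* Only the two centre taps carry kernel-specific weights. */
        int sum = outer - 5 * inner + c1 * src[x] + c2 * src[x + 1]
                + (1 << (shift - 1));          /* rounding term */
        /* Negative sums clip to 0; avoid shifting a negative value. */
        dst[x] = sum < 0 ? 0 : clip_u8(sum >> shift);
    }
}
```

Under this assumption a SIMD version can compute the `outer - 5 * inner` part with constants that never change and handle only the centre pair per kernel, which is presumably what makes the non-SSSE3 paths cheaper than a generic six-coefficient multiply.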
author: Christophe Gisquet <christophe.gisquet@gmail.com> 2012-04-19 22:36:17 +0200
committer: Diego Biurrun <diego@biurrun.de> 2012-05-10 18:42:43 +0200
commit: 110d0cdc9d1ec414a658f841a3fbefbf6f796d61 (patch)
tree: d2f80a035204c7a75a6daa5c71357e61817ffd54 /libavcodec/x86/dsputil_mmx.c
parent: 706b998cdcea97c50fad2228f67488de0e06b2a2 (diff)
download: ffmpeg-110d0cdc9d1ec414a658f841a3fbefbf6f796d61.tar.gz
1 files changed, 16 insertions, 0 deletions
diff --git a/libavcodec/x86/dsputil_mmx.c b/libavcodec/x86/dsputil_mmx.c
index 3ef19c5d13..6377a73555 100644
--- a/libavcodec/x86/dsputil_mmx.c
+++ b/libavcodec/x86/dsputil_mmx.c
@@ -1791,6 +1791,22 @@ QPEL_2TAP(avg_, 16, 3dnow)
 QPEL_2TAP(put_,  8, 3dnow)
 QPEL_2TAP(avg_,  8, 3dnow)
 
+void ff_put_rv40_qpel8_mc33_mmx(uint8_t *dst, uint8_t *src, int stride)
+{
+  put_pixels8_xy2_mmx(dst, src, stride, 8);
+}
+void ff_put_rv40_qpel16_mc33_mmx(uint8_t *dst, uint8_t *src, int stride)
+{
+  put_pixels16_xy2_mmx(dst, src, stride, 16);
+}
+void ff_avg_rv40_qpel8_mc33_mmx(uint8_t *dst, uint8_t *src, int stride)
+{
+  avg_pixels8_xy2_mmx(dst, src, stride, 8);
+}
+void ff_avg_rv40_qpel16_mc33_mmx(uint8_t *dst, uint8_t *src, int stride)
+{
+  avg_pixels16_xy2_mmx(dst, src, stride, 16);
+}
 
 #if HAVE_YASM
 typedef void emu_edge_core_func(uint8_t *buf, const uint8_t *src,
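
The four wrappers added above forward the mc33 position to the existing `*_pixels*_xy2_mmx` helpers. As a rough scalar model of what such an xy2 helper computes, the sketch below assumes the usual rounded 2x2 average for the put case and a rounded average with the existing destination for the avg case. The names `put_xy2_ref` and `avg_xy2_ref` and the explicit width parameter are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Rounded average of each pixel with its right, lower and lower-right
 * neighbours (assumed behaviour of the put xy2 helpers). */
static void put_xy2_ref(uint8_t *dst, const uint8_t *src, int stride,
                        int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            dst[x] = (src[x] + src[x + 1] +
                      src[x + stride] + src[x + stride + 1] + 2) >> 2;
        dst += stride;
        src += stride;
    }
}

/* Same 2x2 average, then a rounded average with what is already in dst
 * (assumed behaviour of the avg xy2 helpers). */
static void avg_xy2_ref(uint8_t *dst, const uint8_t *src, int stride,
                        int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int v = (src[x] + src[x + 1] +
                     src[x + stride] + src[x + stride + 1] + 2) >> 2;
            dst[x] = (dst[x] + v + 1) >> 1;
        }
        dst += stride;
        src += stride;
    }
}
```

In this model, `ff_put_rv40_qpel8_mc33_mmx(dst, src, stride)` would correspond to `put_xy2_ref(dst, src, stride, 8, 8)`; the source block must have one extra readable column and row.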