author | Christophe GISQUET <christophe.gisquet@gmail.com> | 2012-02-23 19:48:58 +0100 |
---|---|---|
committer | Ronald S. Bultje <rsbultje@gmail.com> | 2012-02-23 15:50:06 -0800 |
commit | 34454c761f01275d4adaf40df6d70a59011c4a6c (patch) | |
tree | a25a23c028ddee97c1195567f855ce064bdbe916 /libavcodec/x86/Makefile | |
parent | 2e74a5abc2fda6cfbc86589852d6194d502332cb (diff) | |
download | ffmpeg-34454c761f01275d4adaf40df6d70a59011c4a6c.tar.gz |
SBR DSP x86: implement SSE sbr_sum_square_sse
The 32-bit targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square:
C, 32-bit: 82c (unrolled) / 102c
C, 64-bit: 69c (unrolled) / 82c
SSE, 32-bit: 42c
SSE, 64-bit: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Without unrolling the loop to perform 8 operations per iteration, the routine takes 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
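For readers unfamiliar with the routine, the unrolling effect measured above can be illustrated in portable C. This is only a sketch, not the code added by this commit: it assumes sbr_sum_square accumulates the squared real and imaginary parts of n complex values stored as float pairs, and the names sum_square_plain and sum_square_unrolled are hypothetical. The four independent accumulators loosely mimic the four lanes of an SSE register, with the horizontal add deferred to the end.

```c
/* Illustrative sketch only: function names are hypothetical and the
 * assumed semantics (sum of re^2 + im^2 over n complex values) are
 * not stated in the commit message itself. */

/* Straightforward version: one complex value per iteration. */
float sum_square_plain(float (*x)[2], int n)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Unrolled version: two complex values (four floats) per iteration,
 * each float feeding its own accumulator so the additions do not
 * form a single dependency chain. */
float sum_square_unrolled(float (*x)[2], int n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    int i;

    for (i = 0; i + 2 <= n; i += 2) {
        acc0 += x[i][0]     * x[i][0];
        acc1 += x[i][1]     * x[i][1];
        acc2 += x[i + 1][0] * x[i + 1][0];
        acc3 += x[i + 1][1] * x[i + 1][1];
    }
    /* Leftover complex value when n is odd. */
    for (; i < n; i++) {
        acc0 += x[i][0] * x[i][0];
        acc1 += x[i][1] * x[i][1];
    }
    /* Final horizontal sum of the partial accumulators. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Because floating-point addition is not associative, the two versions may differ in the last bits for general input. They agree exactly only when every intermediate value is representable, as with small integers.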
Diffstat (limited to 'libavcodec/x86/Makefile')
-rw-r--r-- | libavcodec/x86/Makefile | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index fc88433783..e64697aa2b 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -47,6 +47,8 @@ YASM-OBJS-$(CONFIG_PNG_DECODER) += x86/pngdsp.o
 MMX-OBJS-$(CONFIG_PNG_DECODER) += x86/pngdsp-init.o
 YASM-OBJS-$(CONFIG_PRORES_DECODER) += x86/proresdsp.o
 MMX-OBJS-$(CONFIG_PRORES_DECODER) += x86/proresdsp-init.o
+MMX-OBJS-$(CONFIG_AAC_DECODER) += x86/sbrdsp_init.o
+YASM-OBJS-$(CONFIG_AAC_DECODER) += x86/sbrdsp.o
 MMX-OBJS-$(CONFIG_DWT) += x86/snowdsp_mmx.o
 MMX-OBJS-$(CONFIG_VC1_DECODER) += x86/vc1dsp_mmx.o
 YASM-OBJS-$(CONFIG_VP3_DECODER) += x86/vp3dsp.o