path: root/libavutil/x86/float_dsp.asm
Commit message (author, date, files changed, lines removed/added)
* x86/float_dsp: add SSE2 and AVX versions of scalarproduct_double (James Almer, 2024-06-03, 1 file, -0/+52)
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86: replace explicit REP_RETs with RETs (Lynne, 2023-02-01, 1 file, -9/+9)
|   From x86inc:
|   > On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
|   > a branch or a branch target. So switch to a 2-byte form of ret in that case.
|   > We can automatically detect "follows a branch", but not a branch target.
|   > (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
|
|   x86inc can automatically determine whether to use REP_RET rather than RET
|   in most of these cases, so impact is minimal. Additionally, a few REP_RETs
|   were used unnecessarily, despite the return being nowhere near a branch.
|
|   The only CPUs affected were AMD K10s, made between 2007 and 2011, 16 years
|   ago and 12 years ago, respectively.
|
|   In the future, everyone involved with x86inc should consider dropping
|   REP_RETs altogether.
* x86/float_dsp: use three operand form for some instructions (James Almer, 2022-09-13, 1 file, -8/+8)
|   Fixes compilation with old yasm
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* avutil/x86/float_dsp: add fma3 for scalarproduct (Paul B Mahol, 2022-09-13, 1 file, -0/+127)
|
* avutil/x86/float_dsp: Remove obsolete 3dnowext function (Andreas Rheinhardt, 2022-06-22, 1 file, -24/+1)
|   x64 always has MMX, MMXEXT, SSE and SSE2 and this means
|   that some functions for MMX, MMXEXT, SSE and 3dnow are always
|   overridden by other functions (unless one e.g. explicitly
|   disables SSE2). So given that the only systems which benefit
|   from ff_vector_fmul_window_3dnowext are truly ancient 32-bit
|   AMD x86s it is removed.
|
|   Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* libavutil: include assembly with full path from source root (Alexander Kanavin, 2022-02-08, 1 file, -1/+1)
|   Otherwise nasm writes the full host-specific paths into .o
|   output, which breaks binary reproducibility.
|
|   Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
|   Signed-off-by: Anton Khirnov <anton@khirnov.net>
* x86/float_dsp: add ff_vector_dmul_{sse2,avx} (James Almer, 2018-09-14, 1 file, -0/+33)
|   ~3x to 5x faster.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: remove usage of integer instructions (James Almer, 2017-05-12, 1 file, -7/+7)
|
* x86/float_dsp: add ff_vector_fmul_reverse_avx2 (James Almer, 2017-04-11, 1 file, -1/+14)
|   ~20% faster than AVX.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} (James Almer, 2017-04-10, 1 file, -0/+63)
|
* x86/float_dsp: zero extend offset from ff_scalarproduct_float_sse (James Almer, 2016-01-08, 1 file, -3/+3)
|   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: zero extend len from ff_butterflies_float_sse implicitly (James Almer, 2016-01-08, 1 file, -4/+1)
|   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: remove len check from ff_butterflies_float_sse (James Almer, 2016-01-08, 1 file, -3/+0)
|   The function documentation explicitly mentions it needs to be
|   a multiple of 4.
|
|   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Signed-off-by: James Almer <jamrial@gmail.com>
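For context, a minimal scalar sketch of the operation in question, assuming butterflies_float writes the element-wise sum into the first vector and the difference into the second. The name `butterflies_float_ref` is hypothetical and this is not FFmpeg's code; the `assert` only spells out the multiple-of-4 contract that callers are expected to honour once the length check is gone.

```c
#include <assert.h>

/* Hypothetical scalar reference, not taken from FFmpeg.
 * Assumed semantics: v1[i] becomes v1[i] + v2[i], v2[i] becomes v1[i] - v2[i]. */
static void butterflies_float_ref(float *v1, float *v2, int len)
{
    /* The documented contract: len must be a multiple of 4. */
    assert(len % 4 == 0);

    for (int i = 0; i < len; i++) {
        float t = v1[i] - v2[i]; /* difference, computed before v1 is overwritten */
        v1[i] += v2[i];          /* sum goes to the first vector */
        v2[i] = t;               /* difference goes to the second vector */
    }
}
```

A SIMD version that handles four floats per iteration can drop its own length check for the same reason: any tail shorter than four elements would violate the documented contract rather than be a case to handle.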
* x86/float_dsp: add missing colon to labels (James Almer, 2015-07-26, 1 file, -1/+1)
|   Silences warnings with Nasm
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
* x86/float_dsp: add missing femms (James Almer, 2014-06-08, 1 file, -0/+3)
|   It was lost during the port.
|   Should fix fate on 3dnowext machines.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: port vector_fmul_window to yasm (James Almer, 2014-06-08, 1 file, -0/+55)
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: remove duplicated code from vector_dmul_scalar (James Almer, 2014-04-19, 1 file, -8/+3)
|   Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
|   No actual changes to the assembly.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: unroll loop in vector_fmac_scalar (James Almer, 2014-04-16, 1 file, -18/+26)
|   ~6% faster SSE2 performance. AVX/FMA3 are unaffected.
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: use SWAP in vector_fmac_scalar Win64 (James Almer, 2014-04-16, 1 file, -3/+3)
|   The mova is unnecessary
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 (James Almer, 2014-03-13, 1 file, -1/+23)
|   ~7% faster than AVX
|
|   Signed-off-by: James Almer <jamrial@gmail.com>
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* x86: float dsp: unroll SSE versions (Christophe Gisquet, 2014-02-15, 1 file, -16/+24)
|   vector_fmul and vector_fmac_scalar are guaranteed that they can process in
|   batches of 16 elements, but their SSE versions only do 8 at a time.
|   Therefore, unroll them a bit.
|
|   299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
|
|   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
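The effect of the 16-element guarantee can be sketched in plain C, assuming vector_fmac_scalar computes dst[i] += src[i] * mul. The function name `vector_fmac_scalar_ref` is hypothetical, and the inner loop merely stands in for the packed SSE instructions of the real assembly: because the length is a multiple of 16, the outer loop can advance a full 16 elements per iteration with no tail handling.

```c
#include <assert.h>

/* Hypothetical C stand-in for the unrolled loop, not FFmpeg's code.
 * Assumed semantics: dst[i] += src[i] * mul for every element. */
static void vector_fmac_scalar_ref(float *dst, const float *src,
                                   float mul, int len)
{
    /* The guarantee the unrolling relies on: whole batches of 16. */
    assert(len % 16 == 0);

    for (int i = 0; i < len; i += 16) {
        /* One outer iteration covers 16 floats, i.e. four 4-float SSE
         * registers' worth of work instead of two. */
        for (int j = 0; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Halving the number of outer iterations halves the loop-counter and branch overhead, which is the kind of saving the cycle counts in the message reflect.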
* Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f' (Michael Niedermayer, 2013-05-03, 1 file, -15/+17)
|\
| |   * commit '566b7a20fd0cab44d344329538d314454a0bcc2f':
| |     x86: float dsp: butterflies_float SSE
| |
| |   Conflicts:
| |     libavutil/x86/float_dsp.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float dsp: butterflies_float SSE (Christophe Gisquet, 2013-05-03, 1 file, -0/+26)
| |   97c -> 49c
| |   Some codecs could benefit from more unrolling, but AAC doesn't.
* | butterflies_float: replace 2 lea by 2 add (Michael Niedermayer, 2013-04-17, 1 file, -2/+2)
| |   adds are simpler instructions and should be faster or equally fast
| |   on all cpus
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | x86: float dsp: butterflies_float SSE (Christophe Gisquet, 2013-04-17, 1 file, -0/+23)
| |   97c -> 49c
| |   Some codecs could benefit from more unrolling, but AAC doesn't.
| |
| |   Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* | Merge commit '73b704ac609d83e0be124589f24efd9b94947cf9' (Michael Niedermayer, 2013-01-23, 1 file, -1/+27)
|\|
| |   * commit '73b704ac609d83e0be124589f24efd9b94947cf9':
| |     arm: Add some missing header #includes
| |     floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
| |
| |   Conflicts:
| |     libavcodec/acelp_pitch_delay.c
| |     libavcodec/amrnbdec.c
| |     libavcodec/amrwbdec.c
| |     libavcodec/ra288.c
| |     libavcodec/x86/dsputil_mmx.c
| |     libavutil/x86/float_dsp.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move scalarproduct_float from dsputil to avfloatdsp. (Ronald S. Bultje, 2013-01-22, 1 file, -0/+25)
| |   This makes the aac decoder and all voice codecs independent of dsputil.
* | Merge commit '42d324694883cdf1fff1612ac70fa403692a1ad4' (Michael Niedermayer, 2013-01-23, 1 file, -0/+39)
|\|
| |   * commit '42d324694883cdf1fff1612ac70fa403692a1ad4':
| |     floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
| |
| |   Conflicts:
| |     libavcodec/arm/dsputil_init_vfp.c
| |     libavcodec/arm/dsputil_vfp.S
| |     libavcodec/dsputil.c
| |     libavcodec/ppc/float_altivec.c
| |     libavcodec/x86/dsputil.asm
| |     libavutil/x86/float_dsp.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. (Ronald S. Bultje, 2013-01-22, 1 file, -0/+37)
| |   Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
| |   of this patch, wmaprodec also does not depend on dsputil, so I removed
| |   it from there also.
* | Merge commit '55aa03b9f8f11ebb7535424cc0e5635558590f49' (Michael Niedermayer, 2013-01-23, 1 file, -0/+30)
|\|
| |   * commit '55aa03b9f8f11ebb7535424cc0e5635558590f49':
| |     floatdsp: move vector_fmul_add from dsputil to avfloatdsp.
| |
| |   Conflicts:
| |     libavcodec/dsputil.c
| |     libavcodec/x86/dsputil.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * floatdsp: move vector_fmul_add from dsputil to avfloatdsp. (Ronald S. Bultje, 2013-01-22, 1 file, -0/+28)
| |
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-12-08, 1 file, -1/+4)
|\|
| |   * qatar/master:
| |     golomb: use unsigned arithmetics in svq3_get_ue_golomb()
| |     x86: float_dsp: fix loading of the len parameter on x86-32
| |     takdec: fix initialisation of LOCAL_ALIGNED array
| |     takdec: fix initialisation of LOCAL_ALIGNED array
| |
| |   Conflicts:
| |     libavcodec/rv30.c
| |     libavcodec/svq3.c
| |     libavcodec/takdec.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix loading of the len parameter on x86-32 (Justin Ruggles, 2012-12-07, 1 file, -1/+4)
| |
* | Merge commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152' (Michael Niedermayer, 2012-12-07, 1 file, -1/+1)
|\|
| |   * commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152':
| |     fate: dpcm: Add dependencies
| |     SBR DSP x86: implement SSE sbr_hf_gen
| |     AAC SBR: use AVFloatDSPContext's vector_fmul
| |     fate: image: Add dependencies
| |     Changelog: add an entry for deprecating the avconv -vol option
| |     x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
| |
| |   Conflicts:
| |     Changelog
| |     libavutil/x86/float_dsp.asm
| |     tests/fate/image.mak
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32 (Justin Ruggles, 2012-12-06, 1 file, -1/+1)
| |   Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* | Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652' (Michael Niedermayer, 2012-12-06, 1 file, -0/+45)
|\|
| |   * commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
| |     lavu/opt: do not filter out the initial sign character except for flags
| |     eval: treat dB as decibels instead of decibytes
| |     float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
| |
| |   Conflicts:
| |     libavutil/eval.c
| |     tests/ref/fate/eval
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * float_dsp: add vector_dmul_scalar() to multiply a vector of doubles (Justin Ruggles, 2012-12-05, 1 file, -0/+45)
| |   Include x86-optimized versions for SSE2 and AVX.
* | Merge commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9' (Michael Niedermayer, 2012-11-27, 1 file, -0/+29)
|\|
| |   * commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9':
| |     riff: only warn on a bad INFO chunk code size instead of failing
| |     configure: Add separate list for libraries and use where appropriate
| |     x86: float_dsp: add SSE version of vector_fmul_scalar()
| |
| |   Conflicts:
| |     configure
| |     libavformat/riff.c
| |     libavutil/x86/float_dsp.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: add SSE version of vector_fmul_scalar() (Justin Ruggles, 2012-11-26, 1 file, -0/+29)
| |
| * build: Drop AVX assembly ifdefs (Diego Biurrun, 2012-11-11, 1 file, -4/+0)
| |   An assembler able to cope with AVX instructions is now required.
* | Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' (Michael Niedermayer, 2012-10-31, 1 file, -1/+0)
|\|
| |   * commit '6860b4081d046558c44b1b42f22022ea341a2a73':
| |     x86: include x86inc.asm in x86util.asm
| |     cng: Reindent some incorrectly indented lines
| |     cngdec: Allow flushing the decoder
| |     cngdec: Make the dbov variable have the right unit
| |     cngdec: Fix the memset size to cover the full array
| |     cngdec: Update the LPC coefficients after averaging the reflection coefficients
| |     configure: fix print_config() with broke awks
| |
| |   Conflicts:
| |     libavcodec/x86/ac3dsp.asm
| |     libavcodec/x86/dct32.asm
| |     libavcodec/x86/deinterlace.asm
| |     libavcodec/x86/dsputil.asm
| |     libavcodec/x86/dsputilenc.asm
| |     libavcodec/x86/fft.asm
| |     libavcodec/x86/fmtconvert.asm
| |     libavcodec/x86/h264_chromamc.asm
| |     libavcodec/x86/h264_deblock.asm
| |     libavcodec/x86/h264_deblock_10bit.asm
| |     libavcodec/x86/h264_idct.asm
| |     libavcodec/x86/h264_idct_10bit.asm
| |     libavcodec/x86/h264_intrapred.asm
| |     libavcodec/x86/h264_intrapred_10bit.asm
| |     libavcodec/x86/h264_weight.asm
| |     libavcodec/x86/vc1dsp.asm
| |     libavcodec/x86/vp3dsp.asm
| |     libavcodec/x86/vp56dsp.asm
| |     libavcodec/x86/vp8dsp.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: include x86inc.asm in x86util.asm (Diego Biurrun, 2012-10-31, 1 file, -1/+0)
| |   This is necessary to allow refactoring some x86util macros with cpuflags.
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-09-08, 1 file, -3/+3)
|\|
| |   * qatar/master:
| |     mov_chan: Only set the channel_layout if setting it to a nonzero value
| |     mov_chan: Reindent an incorrectly indented line
| |     mp2 muxer: mark as AVFMT_NOTIMESTAMPS.
| |     x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64
| |     x86: more specific checks for availability of required assembly capabilities
| |     x86: avcodec: Drop silly "_mmx" suffix from dsputil template names
| |     fate: Drop redundant setting of FUZZ to 1
| |     cavsdsp: set idct permutation independently of dsputil
| |     x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov
| |
| |   Conflicts:
| |     libavcodec/x86/dsputil_mmx.c
| |     libavformat/mp3enc.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 (Justin Ruggles, 2012-09-07, 1 file, -3/+3)
| |   The SWAP macro does not work for explicit xmm/ymm usage, so instead just
| |   move the scalar value from xmm2 to xmm0.
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-08-31, 1 file, -2/+2)
|\|
| |   * qatar/master:
| |     MSS1 and MSS2: set final pixel format after common stuff has been initialised
| |     MSS2 decoder
| |     configure: handle --disable-asm before check_deps
| |     x86: Split inline and external assembly #ifdefs
| |     configure: x86: Separate inline from standalone assembler capabilities
| |     pktdumper: Use a custom define instead of PATH_MAX for buffers
| |     pktdumper: Use av_strlcpy instead of strncpy
| |     pktdumper: Use sizeof(variable) instead of the direct buffer length
| |
| |   Conflicts:
| |     Changelog
| |     configure
| |     libavcodec/allcodecs.c
| |     libavcodec/avcodec.h
| |     libavcodec/codec_desc.c
| |     libavcodec/dct-test.c
| |     libavcodec/imgconvert.c
| |     libavcodec/mss12.c
| |     libavcodec/version.h
| |     libavfilter/x86/gradfun.c
| |     libswscale/x86/yuv2rgb.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: Split inline and external assembly #ifdefs (Diego Biurrun, 2012-08-31, 1 file, -2/+2)
| |
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-08-07, 1 file, -2/+2)
|\|
| |   * qatar/master:
| |     x86: fix build with nasm 2.08
| |     x86: use nop cpu directives only if supported
| |     x86: fix rNmp macros with nasm
| |     build: add trailing / to yasm/nasm -I flags
| |     x86: use 32-bit source registers with movd instruction
| |     x86: add colons after labels
| |
| |   Conflicts:
| |     Makefile
| |     libavutil/x86/x86inc.asm
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86: add colons after labels (Mans Rullgard, 2012-08-07, 1 file, -2/+2)
| |   nasm prints a warning if the colon is missing.
| |
| |   Signed-off-by: Mans Rullgard <mans@mansr.com>
* | Merge remote-tracking branch 'qatar/master' (Michael Niedermayer, 2012-07-27, 1 file, -10/+0)
|\|
| |   * qatar/master:
| |     proresdsp: port x86 assembly to cpuflags.
| |     lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
| |     lavfi: better channel layout negotiation
| |     alac: check for truncated packets
| |     alac: reverse lpc coeff order, simplify filter
| |     lavr: add x86-optimized mixing functions
| |     x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
| |     tscc2: fix typo in array index
| |     build: use COMPILE template for HOSTOBJS
| |     build: do full flag handling for all compiler-type tools
| |     eval: fix printing of NaN in eval fate test.
| |     build: Rename aandct component to more descriptive aandcttables
| |     mpegaudio: bury inline asm under HAVE_INLINE_ASM.
| |     x86inc: automatically insert vzeroupper for YMM functions.
| |     rtmp: Check the buffer length of ping packets
| |     rtmp: Allow having more unknown data at the end of a chunk size packet without failing
| |     rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets
| |
| |   Conflicts:
| |     Makefile
| |     configure
| |     libavcodec/x86/proresdsp.asm
| |     libavutil/eval.c
| |
| |   Merged-by: Michael Niedermayer <michaelni@gmx.at>
| * x86inc: automatically insert vzeroupper for YMM functions. (Ronald S. Bultje, 2012-07-26, 1 file, -10/+0)
| |