Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Merge remote-tracking branch 'qatar/master' | Michael Niedermayer | 2012-06-22 | 1 | -1/+1 |
|\ | * qatar/master: libspeexenc: add supported sample rates and channel layouts; Replace usleep() calls with av_usleep(); lavu: add av_usleep() function; utvideo: mark interlaced frames as such; utvideo: Fix interlaced prediction for RGB utvideo; cosmetics: do not use full path for local headers; lavu/file: include unistd.h only when available; configure: check for unistd.h; log: include unistd.h only when needed; lavf: include libavutil/time.h instead of redeclaring av_gettime(). Conflicts: configure, doc/APIchanges, ffmpeg.c, ffplay.c, libavcodec/utvideo.c, libavutil/avutil.h. Merged-by: Michael Niedermayer <[email protected]> | ||||
| * | cosmetics: do not use full path for local headers | Diego Biurrun | 2012-06-22 | 1 | -1/+1 |
| | | |||||
* | | libavcodec/x86/rv40dsp_init.c: add missing HAVE_YASM | Michael Niedermayer | 2012-06-10 | 1 | -0/+4 |
|/ | | | | Signed-off-by: Michael Niedermayer <[email protected]> | ||||
* | x86: rv40: Mark rv40_weight functions as MMX2; they use MMX2 instructions. | Michael Kostylev | 2012-05-15 | 1 | -5/+5 |
| | |||||
* | rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC | Christophe Gisquet | 2012-05-10 | 1 | -0/+146 |
| | Code mostly inspired by vp8's MC, however: (1) its MMX2 horizontal filter is worse because it can't take advantage of the coefficient redundancy; (2) that same coefficient redundancy allows better code for non-SSSE3 versions. Benchmark (rounded to tens of unit), columns in order V8x8, H8x8, 2D8x8, V16x16, H16x16, 2D16x16: C: 445, 358, 985, 1785, 1559, 3280; MMX*: 219, 271, 478, 714, 929, 1443; SSE2: 131, 158, 294, 425, 515, 892; SSSE3: 120, 122, 248, 387, 390, 763. End result is overall around a 15% speedup for SSSE3 version (on 6 sequences); all loop filter functions now take around 55% of decoding time, while luma MC dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%. Signed-off-by: Diego Biurrun <[email protected]> | ||||
* | rv40dsp: implement prescaled versions for biweight. | Christophe GISQUET | 2012-04-10 | 1 | -10/+20 |
| | Quite often, the original weights are multiples of 512. Prescaling them by 1/512 when they are computed (once per frame) removes the need for intermediate shifting and for prescaling on each call. The x86 code already used that trick. Signed-off-by: Ronald S. Bultje <[email protected]> | ||||
* | rv34: change most "int stride" into "ptrdiff_t stride". | Ronald S. Bultje | 2012-02-20 | 1 | -2/+2 |
| | | | | | | This prevents having to sign-extend on 64-bit systems with 32-bit ints, such as x86-64. Also fixes crashes on systems where we don't do it and arguments are not in registers, such as Win64 for all weight functions. | ||||
* | rv40: x86 SIMD for biweight | Christophe Gisquet | 2012-01-30 | 1 | -0/+19 |
| | Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely). *_TIMER report for the 16x16 and 8x8 cases: C: 9015 decicycles in 16, 524257 runs, 31 skips; 2656 decicycles in 8, 524271 runs, 17 skips. MMX: 4156 decicycles in 16, 262090 runs, 54 skips; 1206 decicycles in 8, 262131 runs, 13 skips. MMX on fast-path: 2760 decicycles in 16, 524222 runs, 66 skips; 995 decicycles in 8, 524252 runs, 36 skips. SSE2: 2163 decicycles in 16, 262131 runs, 13 skips; 832 decicycles in 8, 262137 runs, 7 skips. SSE2 with fast path: 1783 decicycles in 16, 524276 runs, 12 skips; 711 decicycles in 8, 524283 runs, 5 skips. SSSE3: 2117 decicycles in 16, 262136 runs, 8 skips; 814 decicycles in 8, 262143 runs, 1 skips. SSSE3 with fast path: 1315 decicycles in 16, 524285 runs, 3 skips; 578 decicycles in 8, 524286 runs, 2 skips. This means around a 4% speedup for some sequences. Signed-off-by: Diego Biurrun <[email protected]> | ||||
* | x86: Give RV40 init file a more suitable name. | Diego Biurrun | 2012-01-30 | 1 | -0/+60 |
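
The biweight entries above describe two ideas that a small example may make easier to follow: weights that are multiples of 512 can be prescaled by 1/512 once, so the per-pixel loop needs no intermediate shift, and the stride is passed as `ptrdiff_t` so no sign extension is needed on 64-bit targets. The following is a minimal sketch in C under stated assumptions, not the code from these commits: the function names, the fixed 16x16 block size, the rounding constant and the assumption that the two weights sum to 16384 are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16  /* illustrative block size, not taken from the commits */

/* Hypothetical helper: scale a weight by 1/512 once (e.g. once per frame). */
static int prescale_weight(int w)
{
    assert(w % 512 == 0);  /* only exact when w is a multiple of 512 */
    return w >> 9;
}

/* General path: unscaled weights, each product shifted down by 9 bits.
 * Assumes w1 + w2 == 16384 so the result fits in 8 bits. */
static void biweight_general(uint8_t *dst, const uint8_t *src1,
                             const uint8_t *src2, int w1, int w2,
                             ptrdiff_t stride)
{
    for (int j = 0; j < BLOCK; j++) {
        for (int i = 0; i < BLOCK; i++)
            dst[i] = (uint8_t)((((w2 * src1[i]) >> 9) +
                                ((w1 * src2[i]) >> 9) + 0x10) >> 5);
        dst += stride; src1 += stride; src2 += stride;
    }
}

/* Fast path: weights were already divided by 512, so no inner shift. */
static void biweight_prescaled(uint8_t *dst, const uint8_t *src1,
                               const uint8_t *src2, int w1, int w2,
                               ptrdiff_t stride)
{
    for (int j = 0; j < BLOCK; j++) {
        for (int i = 0; i < BLOCK; i++)
            dst[i] = (uint8_t)((w2 * src1[i] + w1 * src2[i] + 0x10) >> 5);
        dst += stride; src1 += stride; src2 += stride;
    }
}

/* Runs both paths on deterministic test data and counts differing pixels.
 * w1 and w2 are the unscaled weights and must be multiples of 512. */
static int count_mismatches(int w1, int w2)
{
    uint8_t a[BLOCK * BLOCK], b[BLOCK * BLOCK];
    uint8_t out_general[BLOCK * BLOCK], out_fast[BLOCK * BLOCK];
    int mismatches = 0;

    for (int i = 0; i < BLOCK * BLOCK; i++) {
        a[i] = (uint8_t)((i * 7 + 3) & 255);
        b[i] = (uint8_t)((i * 13 + 101) & 255);
    }

    biweight_general(out_general, a, b, w1, w2, BLOCK);
    biweight_prescaled(out_fast, a, b,
                       prescale_weight(w1), prescale_weight(w2), BLOCK);

    for (int i = 0; i < BLOCK * BLOCK; i++)
        mismatches += out_general[i] != out_fast[i];
    return mismatches;
}
```

Calling `count_mismatches(4096, 12288)` compares the two paths for one pair of weights. Because `(512 * k * x) >> 9` equals `k * x` exactly, the paths agree whenever both weights are multiples of 512; for other weights the prescaled form is not applicable and the general path has to be used.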