| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Currently only dc-only and full 16x16. Other subforms will follow in the
near future. Total decoding time of ped1080p.webm goes from 9.7 to 9.3
seconds. DC-only goes from 957 -> 131 cycles, and the full IDCT goes
from ~4050 to ~745 cycles.
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '4958f35a2ebc307049ff2104ffb944f5f457feb3':
dsputil: Move apply_window_int16 to ac3dsp
Conflicts:
libavcodec/arm/ac3dsp_init_arm.c
libavcodec/arm/ac3dsp_neon.S
libavcodec/x86/ac3dsp_init.c
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
The (optimized) functions are used nowhere else.
|
| |
| |
| |
| |
| | |
For that specific case (eob>3&&eob<=12), runtime of idct8x8 goes from
668 to 477 cycles. For all idct8x8, runtime goes from 521 to 490 cycles.
|
| |
| |
| |
| | |
This allows us to load it only once, instead of twice, in this function.
|
| |
| |
| |
| |
| | |
Make register usage in macros explicit; change mulsub_2w_4x to use 2
instead of 3 temp registers.
|
| |
| |
| |
| |
| | |
(And in future, loopfilter or intra pred could be put in their own
respective files also.)
|
|\|
| |
| |
| |
| |
| |
| | |
* qatar/master:
x86: Initialize mmxext after amd3dnow optimizations
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
The mmxext optimizations should be at least equally fast if available and
amd3dnow optimizations are being deprecated. Thus the former should
override the latter, not the other way around.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
dsputil: x86: Move ff_inv_zigzag_direct16 table init to mpegvideo
If someone optimizes dct_quantize for non x86 SIMD, then this
probably needs to be reverted.
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
The table is MMX-specific and used nowhere else.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'cf7860db608df7c76471d8b61f07abbd5aad8dd5':
x86: dsputil: Suppress deprecation warnings for XvMC bits
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
These parts are scheduled for removal on the next version bump.
Signed-off-by: Vittorio Giovara <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Originally written by Ronald S. Bultje <[email protected]> and
Clément Bœsch <[email protected]>
Further contributions by:
Anton Khirnov <[email protected]>
Diego Biurrun <[email protected]>
Luca Barbato <[email protected]>
Martin Storsjö <[email protected]>
Signed-off-by: Luca Barbato <[email protected]>
Signed-off-by: Anton Khirnov <[email protected]>
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
before:
411 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388289 runs, 319 skips
after:
389 decicycles in ff_pred4x4_tm_vp8_8_ssse3, 8388308 runs, 300 skips
Tested on i7 920.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Original fix by one of these developers:
Anton Khirnov <[email protected]>
Diego Biurrun <[email protected]>
Luca Barbato <[email protected]>
Martin Storsjö <[email protected]>
See 97962b2 / 72ca830
Personnal guess is Diego Biurrun.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '458446acfa1441d283dacf9e6e545beb083b8bb0':
lavc: Edge emulation with dst/src linesize
Conflicts:
libavcodec/cavs.c
libavcodec/h264.c
libavcodec/hevc.c
libavcodec/mpegvideo_enc.c
libavcodec/mpegvideo_motion.c
libavcodec/rv34.c
libavcodec/svq3.c
libavcodec/vc1dec.c
libavcodec/videodsp.h
libavcodec/videodsp_template.c
libavcodec/vp3.c
libavcodec/vp8.c
libavcodec/wmv2.c
libavcodec/x86/videodsp.asm
libavcodec/x86/videodsp_init.c
Changes to the asm are not merged, they are left for volunteers or
in their absence for later.
The changes this merge introduces are reordering of the function
arguments
See: face578d56c2d1375e40d5e2a28acc122132bc55
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* qatar/master:
Deprecate obsolete XvMC hardware decoding support
Conflicts:
libavcodec/mpeg12.c
libavcodec/mpeg12dec.c
libavcodec/mpegvideo.c
libavcodec/options_table.h
libavutil/pixdesc.c
libavutil/version.h
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| | |
XvMC has long ago been superseded by newer acceleration APIs, such as
VDPAU, and few downstreams still support it. Furthermore XvMC is not
implemented within the hwaccel framework, but requires its own specific
code in the MPEG-1/2 decoder, which is a maintenance burden.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '0338c396987c82b41d322630ea9712fe5f9561d6':
dsputil: Split off H.263 bits into their own H263DSPContext
Conflicts:
configure
libavcodec/mpegvideo.h
libavcodec/mpegvideo_enc.c
Merged-by: Michael Niedermayer <[email protected]>
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
1789 decicycles in idct_idct_4x4_add_c, 262136 runs, 8 skips
1839 decicycles in idct_idct_4x4_add_c, 524270 runs, 18 skips
1864 decicycles in idct_idct_4x4_add_c, 1048548 runs, 28 skips
529 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 262138 runs, 6 skips
516 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 524282 runs, 6 skips
474 decicycles in ff_vp9_idct_idct_4x4_add_ssse3, 1048565 runs, 11 skips
(~3.9x faster)
7726 decicycles in idct_idct_8x8_add_c, 1048433 runs, 143 skips
7732 decicycles in idct_idct_8x8_add_c, 2096882 runs, 270 skips
7731 decicycles in idct_idct_8x8_add_c, 4193772 runs, 532 skips
1145 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 1048549 runs, 27 skips
1137 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 2097097 runs, 55 skips
1086 decicycles in ff_vp9_idct_idct_8x8_add_ssse3, 4194188 runs, 116 skips
(~7.1x faster)
Overall decode time before commit:
16.48s user 0.03s system 99% cpu 16.526 total
16.54s user 0.01s system 99% cpu 16.566 total
16.46s user 0.03s system 99% cpu 16.511 total
Overall decode time after commit:
16.34s user 0.02s system 99% cpu 16.378 total
16.28s user 0.02s system 99% cpu 16.315 total
16.32s user 0.03s system 99% cpu 16.366 total
Tested on i7 920 with 40s 1080p footage.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'e2b5b097898c9155f4bdff4d83cdc54d5eef6930':
x86: rv40dsp: Use PAVGB instruction macro where appropriate
Merged-by: Michael Niedermayer <[email protected]>
|
| | |
|
| |
| |
| |
| |
| | |
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Diego Biurrun <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There are instructions pavgb and pavgusb. Both instructions do the same
operation but they have different enconding. Pavgb exists in SSE (or
MMXEXT) instruction set and pavgusb exists in 3D-NOW instruction set.
livavcodec uses the macro PAVGB to select the proper instruction. However,
the function avg_pixels8_xy2 doesn't use this macro, it uses pavgb
directly.
As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and
K6-3 processors, because they have pavgusb, but not pavgb.
This bug seems to be introduced by commit
71155d7b4157fee44c0d3d0fc1b660ebfb9ccf46, "dsputil: x86: Convert mpeg4
qpel and dsputil avg to yasm"
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* commit '1700b4e678ed329611a16b20d11e64b7abda4839':
x86: vp8dsp: Split loopfilter code into a separate file
Conflicts:
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <[email protected]>
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes overreads in HEVC
Fixes Ticket3070
Also fixed remaining issues from Ticket3075 and Ticket3076
Some lines of code taken from 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/x86/cabac.h
and 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/cabac_functions.h
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Don't use word-size multiplications if size == 2, and if we're using
SIMD instructions (size >= 8), complete leftover 4byte sets using movd,
not mov. Both of these changes lead to minor speedups.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
instead of '&&'.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
!BROKEN_COMPILER
this might fix Ticket2999 as well as some fate clients
untested as the original patch submitter no longer has the environment to test
this should be reverted if it does not fix the issues
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
x86 simd as such.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.
Tested-by: Ingo Brückl <[email protected]>
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
This decreases the diff to libav
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Fixes linking failure with --disable-sse2
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
instructions on x86-32.
Signed-off-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* qatar/master:
x86: h264_idct: Update comments to match 8/10-bit depth optimization split
Merged-by: Michael Niedermayer <[email protected]>
|
| | |
|
|\|
| |
| |
| |
| |
| |
| | |
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: Michael Niedermayer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* qatar/master:
x86: fdct: Employ more specific ifdefs
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| | |
This avoids building mmxext and sse2 code when disabled by configure.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '2ddb35b91131115c094d90e04031451023441b4d':
x86: dsputil: Separate ff_add_hfyu_median_prediction_cmov from dsputil_mmx
Merged-by: Michael Niedermayer <[email protected]>
|
| |
| |
| |
| |
| | |
The function does not depend on MMX and compilation without MMX enabled
fails if the function is compiled conditional on MMX availability.
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '258414d0771845d20f646ffe4d4e60f22fba217c':
x86: fdct: Initialize optimized fdct implementations in the standard way
Merged-by: Michael Niedermayer <[email protected]>
|