ffmpeg - Mirror of FFmpeg git repo

	Commit message	Author	Age	Files	Lines
*	lavu/fixed_dsp: R-V V fmul_window_scaled	Rémi Denis-Courmont	2023-11-23	2	-1/+54
	vector_fmul_window_scaled_fixed_c: 4393.7
	vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
*	lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window	Rémi Denis-Courmont	2023-11-23	1	-6/+8
	Roll the loop to avoid slow gathers.
	Before:
	vector_fmul_reverse_c: 1561.7
	vector_fmul_reverse_rvv_f32: 2410.2
	vector_fmul_window_c: 2068.2
	vector_fmul_window_rvv_f32: 1879.5
	After:
	vector_fmul_reverse_c: 1561.7
	vector_fmul_reverse_rvv_f32: 916.2
	vector_fmul_window_c: 2068.2
	vector_fmul_window_rvv_f32: 1202.5
*	lavu/fixed_dsp: optimise R-V V fmul_reverse	Rémi Denis-Courmont	2023-11-23	1	-3/+4
	Gathers are (unsurprisingly) a notable exception to the rule that R-V V gets faster with larger group multipliers. So roll the function to speed it up.
	Before:
	vector_fmul_reverse_fixed_c: 2840.7
	vector_fmul_reverse_fixed_rvv_i32: 2430.2
	After:
	vector_fmul_reverse_fixed_c: 2841.0
	vector_fmul_reverse_fixed_rvv_i32: 962.2
	It might be possible to further optimise the function by moving the reverse-subtract out of the loop and adding ad-hoc tail handling.
*	riscv: fix builds without Zbb support	Rémi Denis-Courmont	2023-11-18	1	-0/+5
*	lavu/riscv: fix typo	Rémi Denis-Courmont	2023-10-29	1	-1/+1
*	lavu/fixed_dsp: R-V V vector_fmul_window	Rémi Denis-Courmont	2023-10-09	2	-0/+50
*	lavu/fixed_dsp: R-V V vector_fmul	Rémi Denis-Courmont	2023-10-09	2	-0/+20
	vector_fmul_fixed_c: 4.0
	vector_fmul_fixed_rvv_i64: 0.5
*	lavu/fixed_dsp: R-V V vector_fmul_reverse	Rémi Denis-Courmont	2023-10-09	2	-0/+27
*	lavu/fixed_dsp: R-V V vector_fmul_add	Rémi Denis-Courmont	2023-10-09	2	-0/+26
	vector_fmul_add_fixed_c: 2.2
	vector_fmul_add_fixed_rvv_i64: 0.5
*	lavu/float_dsp: adjust multipler in R-V V fmul_window	Rémi Denis-Courmont	2023-10-09	1	-1/+1
	The gather index vector is only used as double-length (due to register pressure), so there is no need to initialise it for quad-length. Basically, this matches the multiplier in the prologue to the multiplier in the loop.
*	lavu/fixed_dsp: R-V V scalarproduct	Rémi Denis-Courmont	2023-10-07	2	-1/+27
*	lavu/float_dsp: avoid reg-stride in R-V V fmul_window	Rémi Denis-Courmont	2023-10-03	1	-20/+25
*	lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul	Rémi Denis-Courmont	2023-10-03	1	-6/+11
	This reworks the inner loop to reverse the elements within each vector, thus eliminating the negative register stride. Note that RVV does not have a vector reverse instruction, so this uses a gather.
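The block-wise reversal described above can be sketched in portable C. This is an illustration only, not the assembler from the commit: `LANES` and `fmul_reverse_blocks` are hypothetical names, and the sketch assumes the usual meaning of the reverse multiply, `dst[i] = src0[i] * src1[len - 1 - i]`.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical vector length, in elements */

/* Multiply src0 by src1 taken in reverse order, one block at a time. */
static void fmul_reverse_blocks(float *dst, const float *src0,
                                const float *src1, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t vl = len - i < LANES ? len - i : LANES;
        /* Contiguous block from the end of src1: stands in for a
         * unit-stride load instead of a negative-stride load. */
        const float *blk = src1 + len - i - vl;

        /* Reading blk[vl - 1 - l] stands in for the gather that
         * reverses the elements inside the loaded vector. */
        for (size_t l = 0; l < vl; l++)
            dst[i + l] = src0[i + l] * blk[vl - 1 - l];

        i += vl;
    }
}
```

The point of the design is that memory is always read forwards; only the in-register element order is reversed.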
*	riscv: factor out the bswap32 assembler	Rémi Denis-Courmont	2023-10-02	1	-0/+65
*	Revert "lavu/timer: remove gratuitous volatile"	Rémi Denis-Courmont	2023-09-28	1	-2/+2
	It does not make much sense to me, but GCC somehow optimises the inline assembler even though the output is very obviously used and has observable side effects.
	This reverts commit 09731fbfc3a914ec4f6ffad60aa9062db6a8f6aa.
*	lavu/timer: specify RISC-V time unit	Rémi Denis-Courmont	2023-08-24	1	-0/+1
*	lavu/timer: remove gratuitous volatile	Rémi Denis-Courmont	2023-08-24	1	-2/+2
	AV_READ_TIME has no side effects. It does not need to be volatile.
*	lavu/timer: use time for AV_READ_TIME on RISC-V	Rémi Denis-Courmont	2023-08-24	1	-6/+6
	So far, AV_READ_TIME would return the cycle counter. This posed two problems:
	1) On recent systems, it would just raise an illegal instruction exception. Indeed RDCYCLE is blocked in user space to ward off some side channel attacks. In particular, this would cause the random number generator to crash.
	2) It does not match the x86 behaviour and the apparent original intent of AV_READ_TIME in the functional code base (outside test cases).
	So this replaces the cycle counter with the time counter. The unit is a platform-dependent constant fraction of time, and the value should be stable across harts (RISC-V lingo for physical CPU thread).
*	lavu/float_dsp: rework RISC-V V scalar product	Rémi Denis-Courmont	2023-07-20	1	-6/+8
	1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop.
	2) Merge the addition and the multiplication.
	3) Unroll.
	Before:
	scalarproduct_float_rvv_f32: 832.5
	After:
	scalarproduct_float_rvv_f32: 275.2
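The first two steps can be illustrated with a scalar C model. This is a sketch under assumptions, not the committed assembler: `LANES` and `scalarproduct_lanes` are hypothetical names, and the inner loop over lanes stands in for a single vector multiply-accumulate.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical vector length, in elements */

static float scalarproduct_lanes(const float *a, const float *b, size_t len)
{
    float acc[LANES] = { 0 }; /* one partial sum per vector lane */
    float sum = 0;
    size_t i = 0;

    /* Main loop: element-wise multiply-accumulate, no reduction. */
    for (; i + LANES <= len; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += a[i + l] * b[i + l];

    /* Tail: fewer than LANES elements remain, one per lane. */
    for (size_t l = 0; i < len; i++, l++)
        acc[l] += a[i] * b[i];

    /* The reductive sum runs once, after the loop. */
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];

    return sum;
}
```

Note that the partial sums are added in a different order than in a plain sequential loop, so floating-point results may differ in the last bits for general inputs.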
*	lavu/float_dsp: unroll RISC-V V loops	Rémi Denis-Courmont	2023-07-20	1	-10/+10
	butterflies_float_c: 1057.0
	butterflies_float_rvv_f32: 351.0 (before)
	butterflies_float_rvv_f32: 329.5 (after)
	vector_dmac_scalar_c: 819.0
	vector_dmac_scalar_rvv_f64: 670.5 (before)
	vector_dmac_scalar_rvv_f64: 431.0 (after)
	vector_dmul_c: 800.2
	vector_dmul_rvv_f64: 541.5 (before)
	vector_dmul_rvv_f64: 426.0 (after)
	vector_dmul_scalar_c: 545.7
	vector_dmul_scalar_rvv_f64: 670.7 (before)
	vector_dmul_scalar_rvv_f64: 324.7 (after)
	vector_fmac_scalar_c: 804.5
	vector_fmac_scalar_rvv_f32: 412.7 (before)
	vector_fmac_scalar_rvv_f32: 214.5 (after)
	vector_fmul_c: 811.2
	vector_fmul_rvv_f32: 285.7 (before)
	vector_fmul_rvv_f32: 214.2 (after)
	vector_fmul_add_c: 1313.0
	vector_fmul_add_rvv_f32: 349.0 (before)
	vector_fmul_add_rvv_f32: 290.2 (after)
	vector_fmul_reverse_c: 815.7
	vector_fmul_reverse_rvv_f32: 529.2 (before)
	vector_fmul_reverse_rvv_f32: 515.7 (after)
	vector_fmul_scalar_c: 546.0
	vector_fmul_scalar_rvv_f32: 350.2 (before)
	vector_fmul_scalar_rvv_f32: 169.5 (after)
*	lavu: add/use flag for RISC-V Zba extension	Rémi Denis-Courmont	2023-07-19	3	-16/+21
	The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.
*	lavu/fixed_dsp: unroll RISC-V V loop	Rémi Denis-Courmont	2023-07-17	1	-1/+1
	Before:
	butterflies_fixed_c: 804.7
	butterflies_fixed_rvv_i32: 348.2
	After:
	butterflies_fixed_rvv_i32: 308.7
*	riscv/intmath: use builtins for counting ones	Rémi Denis-Courmont	2023-05-02	1	-26/+4
	As with the earlier bswap change, all versions of GCC and Clang that support RISC-V support the popcount built-ins, so we can just use them instead of inline assembler.
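For illustration, a minimal sketch of bit counting through the GCC/Clang built-ins. The wrapper names `popcount32` and `popcount64` are hypothetical and are not the helpers touched by this commit.

```c
#include <stdint.h>

/* Count set bits with the compiler built-ins rather than inline
 * assembler; the compiler picks CPOP when the target has Zbb. */
static inline int popcount32(uint32_t x)
{
    return __builtin_popcount(x);
}

static inline int popcount64(uint64_t x)
{
    return __builtin_popcountll(x);
}
```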
*	riscv/bswap: use compiler builtins	Rémi Denis-Courmont	2023-05-02	1	-47/+5
	av_bswapXX() are used in contexts that expect exact size types, notably variable arguments to av_log(). On Linux RV64, uint_fast32_t is an unsigned long, so the current inline assembler does not work properly.
	Since GCC and Clang gained their byte-swap built-ins before they supported RISC-V, we can simply defer to them. As an added bonus, the compiler can do instruction scheduling, which it couldn't with the Zbb inline assembler.
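A minimal sketch of byte-swap wrappers with exact-width return types, built on the GCC/Clang built-ins. The names `swap16_exact`, `swap32_exact` and `swap64_exact` are hypothetical stand-ins, not the actual av_bswapXX() definitions.

```c
#include <stdint.h>

/* Exact-width wrappers: the result type is fixed by the prototype,
 * so it never widens to a "fast" type such as unsigned long. */
static inline uint16_t swap16_exact(uint16_t x)
{
    return __builtin_bswap16(x);
}

static inline uint32_t swap32_exact(uint32_t x)
{
    return __builtin_bswap32(x);
}

static inline uint64_t swap64_exact(uint64_t x)
{
    return __builtin_bswap64(x);
}
```

With a 32-bit result type, passing the value through a variadic call with a matching 32-bit format specifier stays well defined, which is the concern the commit message raises.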
*	riscv: fix scalar product initialisation	Rémi Denis-Courmont	2022-10-13	1	-1/+1
	'VSETVLI xd, x0, ...' has rather nonobvious semantics:
	- If xd is x0, then it preserves the current vector length.
	- If xd is not x0, it sets the vector length to the supported maximum.
	Also somewhat confusingly, while VMV.X.S always does its thing regardless of the selected vector length, VMV.S.X does _nothing_ if the selected vector length is zero.
	So the current code fails to initialise the accumulator if we are unlucky enough to have a selected vector length of zero on entry. Fix it by forcing the vector length to one.
*	lavu/riscv: helper macro for VTYPE encoding	Rémi Denis-Courmont	2022-10-10	1	-0/+75
	In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction, VSETVL, which takes the vector type from a general purpose register. That is so the type can be selected at run-time.
	This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies.
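To show what such a value looks like, here is a C sketch of the VTYPE bit layout as I understand it from the RISC-V Vector specification (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7). The function `vtype_encode` and its parameters are hypothetical; the commit itself adds an assembler macro, not C code.

```c
/* Encode a VTYPE value; returns -1 for unsupported arguments.
 * sew:       element width in bits (8, 16, 32 or 64).
 * lmul_log2: log2 of the group multiplier, from -3 (1/8) to 3 (8).
 * ta, ma:    tail-agnostic and mask-agnostic policy flags. */
static int vtype_encode(unsigned sew, int lmul_log2, int ta, int ma)
{
    unsigned vsew, vlmul;

    switch (sew) {
    case 8:  vsew = 0; break;
    case 16: vsew = 1; break;
    case 32: vsew = 2; break;
    case 64: vsew = 3; break;
    default: return -1;
    }

    /* -4 would map to the reserved encoding 0b100, so reject it. */
    if (lmul_log2 < -3 || lmul_log2 > 3)
        return -1;

    /* vlmul is log2(LMUL) as a 3-bit two's complement number. */
    vlmul = (unsigned)lmul_log2 & 7;

    return (int)(((unsigned)!!ma << 7) | ((unsigned)!!ta << 6) |
                 (vsew << 3) | vlmul);
}
```

For example, `vtype_encode(32, 2, 1, 1)` corresponds to the operands `e32, m4, ta, ma`.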
*	lavu/riscv: CPU flag for the Zbb extension	Rémi Denis-Courmont	2022-10-05	1	-0/+6
	Unfortunately, it is common, and will remain so, that the bit-manipulation extensions are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset.
	For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.
*	riscv: remove unnecessary #include's	Rémi Denis-Courmont	2022-10-05	3	-4/+0
	Pointed out by Andreas Rheinhardt.
*	lavu/riscv: helper to read the vector length	Rémi Denis-Courmont	2022-09-28	1	-0/+45
*	lavu/fixeddsp: RISC-V V butterflies_fixed	Rémi Denis-Courmont	2022-09-27	3	-1/+81
*	lavu/floatdsp: RISC-V V scalarproduct_float	Rémi Denis-Courmont	2022-09-27	2	-0/+22
*	lavu/floatdsp: RISC-V V vector_fmul_window	Rémi Denis-Courmont	2022-09-27	2	-0/+36
*	lavu/floatdsp: RISC-V V vector_fmul_reverse	Rémi Denis-Courmont	2022-09-27	2	-0/+24
*	lavu/floatdsp: RISC-V V butterflies_float	Rémi Denis-Courmont	2022-09-27	2	-0/+20
*	lavu/floatdsp: RISC-V V vector_fmul_add	Rémi Denis-Courmont	2022-09-27	2	-0/+22
*	lavu/floatdsp: RISC-V V vector_dmac_scalar	Rémi Denis-Courmont	2022-09-27	2	-0/+21
*	lavu/floatdsp: RISC-V V vector_fmac_scalar	Rémi Denis-Courmont	2022-09-27	2	-0/+22
*	lavu/floatdsp: RISC-V V vector_dmul	Rémi Denis-Courmont	2022-09-27	2	-1/+22
*	lavu/floatdsp: RISC-V V vector_fmul	Rémi Denis-Courmont	2022-09-27	2	-1/+22
*	lavu/floatdsp: RISC-V V vector_dmul_scalar	Rémi Denis-Courmont	2022-09-27	2	-0/+23
*	lavu/floatdsp: RISC-V V vector_fmul_scalar	Rémi Denis-Courmont	2022-09-27	3	-1/+81
	This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.
*	lavu/riscv: fallback macros for SH{1, 2, 3}ADD	Rémi Denis-Courmont	2022-09-27	1	-0/+19
	Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.
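For readers unfamiliar with the Zba instructions, SHnADD computes (rs1 << n) + rs2. A C model of that arithmetic follows; `shnadd` is a hypothetical name, and the fallback in the commit is an assembler macro whose exact expansion is not shown in this log.

```c
#include <stdint.h>

/* Model of SH1ADD/SH2ADD/SH3ADD: shift the first operand left by
 * n bits (n = 1, 2 or 3), then add the second operand. A fallback
 * without Zba needs two instructions: a shift and an add. */
static inline uintptr_t shnadd(unsigned n, uintptr_t rs1, uintptr_t rs2)
{
    return (rs1 << n) + rs2;
}
```

A typical use is array indexing, for instance the address of element `i` in an array of 4-byte values is `shnadd(2, i, base)`.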
*	lavu/cpu: CPU flags for the RISC-V Vector extension	Rémi Denis-Courmont	2022-09-27	1	-0/+19
	RVV defines a total of 12 different extensions, including:
	- 5 different instruction subsets:
	  - Zve32x: 8-, 16- and 32-bit integers,
	  - Zve32f: Zve32x plus single precision floats,
	  - Zve64x: Zve32x plus 64-bit integers,
	  - Zve64f: Zve32f plus Zve64x,
	  - Zve64d: Zve64f plus double precision floats.
	- 6 different vector lengths:
	  - Zvl32b (embedded only),
	  - Zvl64b (embedded only),
	  - Zvl128b,
	  - Zvl256b,
	  - Zvl512b,
	  - Zvl1024b,
	- and the V extension proper: equivalent to Zve64d and Zvl128b.
	In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64).
	When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.
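The mapping from instruction subsets to the four flag bits can be sketched as follows. The enum names, bit values and the function `rvv_flags` are hypothetical and chosen for illustration; only the four categories (RVV_I32, RVV_F32, RVV_I64, RVV_F64) come from the commit message.

```c
/* Hypothetical flag bits, one per element type set. */
enum {
    FLAG_RVV_I32 = 1 << 0, /* integers up to 32 bits */
    FLAG_RVV_F32 = 1 << 1, /* single precision floats */
    FLAG_RVV_I64 = 1 << 2, /* 64-bit integers */
    FLAG_RVV_F64 = 1 << 3, /* double precision floats */
};

/* The six possible instruction sets, including the empty one. */
enum rvv_subset { RVV_NONE, ZVE32X, ZVE32F, ZVE64X, ZVE64F, ZVE64D };

static int rvv_flags(enum rvv_subset s)
{
    switch (s) {
    case ZVE32X: return FLAG_RVV_I32;
    case ZVE32F: return FLAG_RVV_I32 | FLAG_RVV_F32;
    case ZVE64X: return FLAG_RVV_I32 | FLAG_RVV_I64;
    case ZVE64F: return FLAG_RVV_I32 | FLAG_RVV_F32 | FLAG_RVV_I64;
    case ZVE64D: return FLAG_RVV_I32 | FLAG_RVV_F32 | FLAG_RVV_I64 |
                        FLAG_RVV_F64;
    default:     return 0;
    }
}
```

Four bits can express sixteen combinations while only six are valid, which is the convenience trade-off the message describes: a caller tests one bit for the element type it needs instead of comparing against a subset hierarchy.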
*	lavu/riscv: initial common header for assembler macros	Rémi Denis-Courmont	2022-09-27	1	-0/+77
*	lavu/cpu: detect RISC-V base extensions	Rémi Denis-Courmont	2022-09-27	2	-0/+57
	This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set.
	But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.
*	lavu/riscv: fix off-by-one in bit-magnitude clip	Rémi Denis-Courmont	2022-09-15	1	-2/+2
*	lavu/riscv: fix av_clip_int16	Rémi Denis-Courmont	2022-09-14	1	-2/+2
	Some serious copy-paste / squash / rebase mismanipulation here.
	Signed-off-by: James Almer <jamrial@gmail.com>
*	lavu/riscv: add <intmath.h> optimisations	Rémi Denis-Courmont	2022-09-13	1	-0/+103
	This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.
*	lavu/riscv: byte-swap operations	Rémi Denis-Courmont	2022-09-13	1	-0/+74
	If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order.
	Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width.
	If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.
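The shift-down step can be modelled in portable C. In this sketch, XLEN = 64 is assumed and `__builtin_bswap64` (a GCC/Clang built-in) stands in for REV8; the function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for REV8 on a 64-bit register (XLEN = 64 assumed):
 * reverses all eight bytes of the register. */
static uint64_t rev8_xlen64(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* After a full-register reversal, the swapped 32-bit value sits in
 * the top half, so shift right by XLEN - 32. */
static uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_xlen64(x) >> 32);
}

/* Same idea for 16-bit data: shift right by XLEN - 16. */
static uint16_t bswap16_via_rev8(uint16_t x)
{
    return (uint16_t)(rev8_xlen64(x) >> 48);
}
```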
*	lavu/riscv: AV_READ_TIME cycle counter	Rémi Denis-Courmont	2022-09-13	1	-0/+53
	This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit modes, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.
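The 32-bit retry loop can be sketched in portable C with the two CSR reads abstracted behind callbacks. Everything named here (`read_half_fn`, `read_counter64`, and the simulated `fake_*` counter) is hypothetical; the real code reads the CSRs with inline assembler.

```c
#include <stdint.h>

/* Hypothetical callbacks standing in for the two 32-bit CSR reads. */
typedef uint32_t (*read_half_fn)(void);

/* Read a 64-bit counter exposed as two 32-bit halves. The high half
 * is read again after the low half; if it changed, the low half
 * wrapped in between and the whole read is retried. */
static uint64_t read_counter64(read_half_fn read_hi, read_half_fn read_lo)
{
    uint32_t hi, lo, check;

    do {
        hi    = read_hi();
        lo    = read_lo();
        check = read_hi();
    } while (hi != check);

    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter for experimentation: it advances by one on every
 * half read and starts just before the low half wraps. */
static uint64_t fake_counter = 0x00000001FFFFFFFFull;

static uint32_t fake_hi(void)
{
    return (uint32_t)(fake_counter++ >> 32);
}

static uint32_t fake_lo(void)
{
    return (uint32_t)fake_counter++;
}
```

With the simulated counter, a single unguarded high/low read pair would combine a stale high half with a wrapped low half; the loop detects the mismatch and reads again.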