path: root/libavutil/riscv
Commit message | Author | Age | Files | Lines
* lavu/fixed_dsp: R-V V fmul_window_scaled | Rémi Denis-Courmont | 2023-11-23 | 2 | -1/+54
| vector_fmul_window_scaled_fixed_c: 4393.7
| vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
* lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window | Rémi Denis-Courmont | 2023-11-23 | 1 | -6/+8
| Roll the loop to avoid slow gathers.
| Before:
| vector_fmul_reverse_c: 1561.7
| vector_fmul_reverse_rvv_f32: 2410.2
| vector_fmul_window_c: 2068.2
| vector_fmul_window_rvv_f32: 1879.5
| After:
| vector_fmul_reverse_c: 1561.7
| vector_fmul_reverse_rvv_f32: 916.2
| vector_fmul_window_c: 2068.2
| vector_fmul_window_rvv_f32: 1202.5
* lavu/fixed_dsp: optimise R-V V fmul_reverse | Rémi Denis-Courmont | 2023-11-23 | 1 | -3/+4
| Gathers are (unsurprisingly) a notable exception to the rule that R-V V gets faster with larger group multipliers. So roll the function to speed it up.
| Before:
| vector_fmul_reverse_fixed_c: 2840.7
| vector_fmul_reverse_fixed_rvv_i32: 2430.2
| After:
| vector_fmul_reverse_fixed_c: 2841.0
| vector_fmul_reverse_fixed_rvv_i32: 962.2
| It might be possible to further optimise the function by moving the reverse-subtract out of the loop and adding ad-hoc tail handling.
* riscv: fix builds without Zbb support | Rémi Denis-Courmont | 2023-11-18 | 1 | -0/+5
|
* lavu/riscv: fix typo | Rémi Denis-Courmont | 2023-10-29 | 1 | -1/+1
|
* lavu/fixed_dsp: R-V V vector_fmul_window | Rémi Denis-Courmont | 2023-10-09 | 2 | -0/+50
|
* lavu/fixed_dsp: R-V V vector_fmul | Rémi Denis-Courmont | 2023-10-09 | 2 | -0/+20
| vector_fmul_fixed_c: 4.0
| vector_fmul_fixed_rvv_i64: 0.5
* lavu/fixed_dsp: R-V V vector_fmul_reverse | Rémi Denis-Courmont | 2023-10-09 | 2 | -0/+27
|
* lavu/fixed_dsp: R-V V vector_fmul_add | Rémi Denis-Courmont | 2023-10-09 | 2 | -0/+26
| vector_fmul_add_fixed_c: 2.2
| vector_fmul_add_fixed_rvv_i64: 0.5
* lavu/float_dsp: adjust multiplier in R-V V fmul_window | Rémi Denis-Courmont | 2023-10-09 | 1 | -1/+1
| The gather index vector is only used as double-length (due to register pressure), so there is no need to initialise it for quad-length. This matches the multiplier in the prologue to the multiplier in the loop.
* lavu/fixed_dsp: R-V V scalarproduct | Rémi Denis-Courmont | 2023-10-07 | 2 | -1/+27
|
* lavu/float_dsp: avoid reg-stride in R-V V fmul_window | Rémi Denis-Courmont | 2023-10-03 | 1 | -20/+25
|
* lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul | Rémi Denis-Courmont | 2023-10-03 | 1 | -6/+11
| This revectors the inner loop to reverse the elements within each vector, thus eliminating the negative register stride. Note that RVV does not have a vector reverse instruction, so this uses a gather.
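The idea of reversing elements within each vector can be sketched in portable C. This is only a scalar model under stated assumptions, not the RVV assembler from the commit: `LANES` is a hypothetical stand-in for the vector length, and `fmul_reverse_sketch` is an illustrative name. Each chunk of `src1` is read forwards (unit stride), and the reversal happens through an index lookup, which plays the role of the gather.

```c
#include <stddef.h>

#define LANES 4 /* hypothetical stand-in for the hardware vector length */

/* Computes dst[i] = src0[i] * src1[len - 1 - i], one chunk at a time. */
void fmul_reverse_sketch(float *dst, const float *src0,
                         const float *src1, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t vl = len - done < LANES ? len - done : LANES;
        /* Forward, unit-stride view of the chunk at the tail of src1. */
        const float *chunk = src1 + (len - done - vl);

        for (size_t l = 0; l < vl; l++) {
            size_t idx = vl - 1 - l; /* gather index: reverses the chunk */
            dst[done + l] = src0[done + l] * chunk[idx];
        }
        done += vl;
    }
}
```

The trade-off is the one the log describes: the memory access becomes a plain forward load, at the cost of a gather per chunk.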
* riscv: factor out the bswap32 assembler | Rémi Denis-Courmont | 2023-10-02 | 1 | -0/+65
|
* Revert "lavu/timer: remove gratuitous volatile" | Rémi Denis-Courmont | 2023-09-28 | 1 | -2/+2
| It does not make much sense to me, but GCC somehow optimises the inline assembler away even though the output is very obviously used and has observable side effects.
| This reverts commit 09731fbfc3a914ec4f6ffad60aa9062db6a8f6aa.
* lavu/timer: specify RISC-V time unit | Rémi Denis-Courmont | 2023-08-24 | 1 | -0/+1
|
* lavu/timer: remove gratuitous volatile | Rémi Denis-Courmont | 2023-08-24 | 1 | -2/+2
| AV_READ_TIME has no side effects. It does not need to be volatile.
* lavu/timer: use time for AV_READ_TIME on RISC-V | Rémi Denis-Courmont | 2023-08-24 | 1 | -6/+6
| So far, AV_READ_TIME would return the cycle counter. This posed two problems:
| 1) On recent systems, it would just raise an illegal instruction exception. Indeed RDCYCLE is blocked in user space to ward off some side channel attacks. In particular, this would cause the random number generator to crash.
| 2) It does not match the x86 behaviour and the apparent original intent of AV_READ_TIME in the functional code base (outside test cases).
| So this replaces the cycle counter with the time counter. The unit is a platform-dependent constant fraction of time, and the value should be stable across harts (RISC-V lingo for hardware threads).
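For illustration, reading the time counter might look roughly like the sketch below. It is an assumption-laden sketch rather than the code from the commit: `read_time_sketch` is a hypothetical name, only the 64-bit RISC-V case is handled by the assembler path, and the `clock()` fallback exists solely so that the example builds on other targets. The `volatile` qualifier is kept on the assembler statement, in line with the later revert in this log.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns a monotonically non-decreasing tick count. */
static inline uint64_t read_time_sketch(void)
{
#if defined(__riscv) && defined(__riscv_xlen) && (__riscv_xlen == 64)
    uint64_t ticks;

    /* RDTIME reads the unprivileged "time" CSR. */
    __asm__ volatile ("rdtime %0" : "=r" (ticks));
    return ticks;
#else
    /* Placeholder for non-RISC-V builds; unit and meaning differ. */
    return (uint64_t)clock();
#endif
}
```

Since the tick unit is platform-dependent, the value is only meaningful for comparing two readings taken on the same system.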
* lavu/float_dsp: rework RISC-V V scalar product | Rémi Denis-Courmont | 2023-07-20 | 1 | -6/+8
| 1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop.
| 2) Merge the addition and the multiplication.
| 3) Unroll.
| Before: scalarproduct_float_rvv_f32: 832.5
| After: scalarproduct_float_rvv_f32: 275.2
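The first two steps above can be modelled in plain C. This is a scalar sketch under assumptions, not the RVV code: `LANES` is a hypothetical lane count, and `scalarproduct_lanes` is an illustrative name. Each lane keeps its own running multiply-accumulate, and the horizontal (reductive) sum happens once, after the loop.

```c
#include <stddef.h>

#define LANES 8 /* hypothetical lane count standing in for the vector length */

float scalarproduct_lanes(const float *a, const float *b, size_t n)
{
    float acc[LANES] = { 0.0f };
    float sum = 0.0f;
    size_t i = 0;

    /* Main loop: per-lane multiply-accumulate, no reduction inside. */
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += a[i + l] * b[i + l];

    /* Single reduction after the loop. */
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];

    /* Scalar tail for the leftover elements. */
    for (; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Note that this changes the order of the floating-point additions, so results may differ from a strictly sequential sum in the last bits.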
* lavu/float_dsp: unroll RISC-V V loops | Rémi Denis-Courmont | 2023-07-20 | 1 | -10/+10
| butterflies_float_c: 1057.0
| butterflies_float_rvv_f32: 351.0 (before)
| butterflies_float_rvv_f32: 329.5 (after)
| vector_dmac_scalar_c: 819.0
| vector_dmac_scalar_rvv_f64: 670.5 (before)
| vector_dmac_scalar_rvv_f64: 431.0 (after)
| vector_dmul_c: 800.2
| vector_dmul_rvv_f64: 541.5 (before)
| vector_dmul_rvv_f64: 426.0 (after)
| vector_dmul_scalar_c: 545.7
| vector_dmul_scalar_rvv_f64: 670.7 (before)
| vector_dmul_scalar_rvv_f64: 324.7 (after)
| vector_fmac_scalar_c: 804.5
| vector_fmac_scalar_rvv_f32: 412.7 (before)
| vector_fmac_scalar_rvv_f32: 214.5 (after)
| vector_fmul_c: 811.2
| vector_fmul_rvv_f32: 285.7 (before)
| vector_fmul_rvv_f32: 214.2 (after)
| vector_fmul_add_c: 1313.0
| vector_fmul_add_rvv_f32: 349.0 (before)
| vector_fmul_add_rvv_f32: 290.2 (after)
| vector_fmul_reverse_c: 815.7
| vector_fmul_reverse_rvv_f32: 529.2 (before)
| vector_fmul_reverse_rvv_f32: 515.7 (after)
| vector_fmul_scalar_c: 546.0
| vector_fmul_scalar_rvv_f32: 350.2 (before)
| vector_fmul_scalar_rvv_f32: 169.5 (after)
* lavu: add/use flag for RISC-V Zba extension | Rémi Denis-Courmont | 2023-07-19 | 3 | -16/+21
| The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.
* lavu/fixed_dsp: unroll RISC-V V loop | Rémi Denis-Courmont | 2023-07-17 | 1 | -1/+1
| Before:
| butterflies_fixed_c: 804.7
| butterflies_fixed_rvv_i32: 348.2
| After:
| butterflies_fixed_rvv_i32: 308.7
* riscv/intmath: use builtins for counting ones | Rémi Denis-Courmont | 2023-05-02 | 1 | -26/+4
| As with the earlier bswap change, all versions of GCC and Clang that support RISC-V support the popcount built-ins, so we can just use them instead of inline assembler.
* riscv/bswap: use compiler builtins | Rémi Denis-Courmont | 2023-05-02 | 1 | -47/+5
| av_bswapXX() are used in contexts that expect exact size types, notably variable arguments to av_log(). On Linux RV64, uint_fast32_t is an unsigned long, so the current inline assembler does not work properly.
| Since GCC and Clang gained their byte-swap built-ins before they supported RISC-V, we can simply defer to them. As an added bonus, the compiler can do instruction scheduling, which it couldn't with the Zbb inline assembler.
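A minimal sketch of byte-swap helpers with exact-width types, built on the GCC/Clang built-ins, could look as follows. The `bswapNN_exact` names are hypothetical and chosen for this example only; the actual FFmpeg macros are not reproduced here.

```c
#include <stdint.h>

/* Exact-width wrappers: the return type matches the operand width, so the
 * result can be passed to a variadic function with a fixed-width format. */
static inline uint16_t bswap16_exact(uint16_t x)
{
    return __builtin_bswap16(x);
}

static inline uint32_t bswap32_exact(uint32_t x)
{
    return __builtin_bswap32(x);
}

static inline uint64_t bswap64_exact(uint64_t x)
{
    return __builtin_bswap64(x);
}
```

With a `uint32_t` return type, a call such as `printf("%08" PRIx32, bswap32_exact(v))` receives an argument of the width the format expects, which is the property the log describes as broken with `uint_fast32_t` on Linux RV64.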
* riscv: fix scalar product initialisation | Rémi Denis-Courmont | 2022-10-13 | 1 | -1/+1
| 'VSETVLI xd, x0, ...' has rather nonobvious semantics:
| - If xd is x0, then it preserves the current vector length.
| - If xd is not x0, it sets the vector length to the supported maximum.
| Also somewhat confusingly, while VMV.X.S always does its thing regardless of the selected vector length, VMV.S.X does _nothing_ if the selected vector length is zero.
| So the current code fails to initialise the accumulator if we are unlucky to have a selected vector length of zero on entry. Fix it by forcing the vector length to one.
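The failure mode can be reproduced with a toy C model of the vector state. Everything here is hypothetical scaffolding for illustration (the `vstate` structure and the function names are not FFmpeg or ISA-defined identifiers), and the `vsetivli` model simply clamps to the maximum, which is one behaviour the specification permits.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the relevant vector state; not a real emulator. */
struct vstate {
    size_t  vl;    /* currently selected vector length */
    size_t  vlmax; /* maximum length for the current vector type */
    int32_t elem0; /* element 0 of the accumulator register */
};

/* Models "vsetvli xd, x0, ...": keep vl if xd is x0, else select vlmax. */
static size_t vsetvli_x0(struct vstate *s, int xd_is_x0)
{
    if (!xd_is_x0)
        s->vl = s->vlmax;
    return s->vl;
}

/* Models "vsetivli xd, avl, ..." by clamping avl to vlmax. */
static size_t vsetivli(struct vstate *s, size_t avl)
{
    s->vl = avl < s->vlmax ? avl : s->vlmax;
    return s->vl;
}

/* Models "vmv.s.x": writes element 0, but does nothing when vl is zero. */
static void vmv_s_x(struct vstate *s, int32_t x)
{
    if (s->vl > 0)
        s->elem0 = x;
}
```

Starting from `vl == 0`, a `vsetvli_x0(&s, 1)` followed by `vmv_s_x(&s, 0)` leaves the stale accumulator value in place, whereas `vsetivli(&s, 1)` followed by the same move clears it.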
* lavu/riscv: helper macro for VTYPE encoding | Rémi Denis-Courmont | 2022-10-10 | 1 | -0/+75
| In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction, VSETVL, which takes the vector type from a general purpose register. That is so the type can be selected at run-time.
| This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies.
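For reference, the value that ends up in the register can be computed as in the C sketch below. The bit layout follows the ratified Vector extension (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7); the enum and function names are hypothetical, and this is not the assembler macro introduced by the commit.

```c
/* Selected element width field (vsew). */
enum vsew { VSEW_E8 = 0, VSEW_E16 = 1, VSEW_E32 = 2, VSEW_E64 = 3 };

/* Group multiplier field (vlmul); fractional multipliers use 5 to 7. */
enum vlmul {
    VLMUL_M1 = 0, VLMUL_M2 = 1, VLMUL_M4 = 2, VLMUL_M8 = 3,
    VLMUL_MF8 = 5, VLMUL_MF4 = 6, VLMUL_MF2 = 7
};

/* Builds a VTYPE value: element size, group multiplier, then the
 * tail-agnostic and mask-agnostic policy bits. */
static inline unsigned vtype_encode(enum vsew sew, enum vlmul lmul,
                                    int tail_agnostic, int mask_agnostic)
{
    return (unsigned)lmul
         | ((unsigned)sew << 3)
         | ((tail_agnostic ? 1u : 0u) << 6)
         | ((mask_agnostic ? 1u : 0u) << 7);
}
```

For example, `vtype_encode(VSEW_E32, VLMUL_M4, 1, 1)` corresponds to the `e32, m4, ta, ma` operands of VSETVLI.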
* lavu/riscv: CPU flag for the Zbb extension | Rémi Denis-Courmont | 2022-10-05 | 1 | -0/+6
| Unfortunately, it is common, and will remain so, that the bit manipulation extensions are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset.
| For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.
* riscv: remove unnecessary #include's | Rémi Denis-Courmont | 2022-10-05 | 3 | -4/+0
| Pointed out by Andreas Rheinhardt.
* lavu/riscv: helper to read the vector length | Rémi Denis-Courmont | 2022-09-28 | 1 | -0/+45
|
* lavu/fixeddsp: RISC-V V butterflies_fixed | Rémi Denis-Courmont | 2022-09-27 | 3 | -1/+81
|
* lavu/floatdsp: RISC-V V scalarproduct_float | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+22
|
* lavu/floatdsp: RISC-V V vector_fmul_window | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+36
|
* lavu/floatdsp: RISC-V V vector_fmul_reverse | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+24
|
* lavu/floatdsp: RISC-V V butterflies_float | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+20
|
* lavu/floatdsp: RISC-V V vector_fmul_add | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+22
|
* lavu/floatdsp: RISC-V V vector_dmac_scalar | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+21
|
* lavu/floatdsp: RISC-V V vector_fmac_scalar | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+22
|
* lavu/floatdsp: RISC-V V vector_dmul | Rémi Denis-Courmont | 2022-09-27 | 2 | -1/+22
|
* lavu/floatdsp: RISC-V V vector_fmul | Rémi Denis-Courmont | 2022-09-27 | 2 | -1/+22
|
* lavu/floatdsp: RISC-V V vector_dmul_scalar | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+23
|
* lavu/floatdsp: RISC-V V vector_fmul_scalar | Rémi Denis-Courmont | 2022-09-27 | 3 | -1/+81
| This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.
* lavu/riscv: fallback macros for SH{1, 2, 3}ADD | Rémi Denis-Courmont | 2022-09-27 | 1 | -0/+19
| Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.
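As a reminder of what these Zba instructions compute, here is a C model of their arithmetic. It is a sketch only: the function names mirror the mnemonics for readability, 64-bit registers are assumed, and it says nothing about how the assembler fallback macros themselves are written.

```c
#include <stdint.h>

/* shNadd rd, rs1, rs2  computes  rd = rs2 + (rs1 << N), modulo 2^64. */
static inline uint64_t sh1add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 1);
}

static inline uint64_t sh2add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 2);
}

static inline uint64_t sh3add(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << 3);
}
```

The typical use is address generation: with an element index in `rs1` and a base address in `rs2`, `sh2add` yields the address of a 32-bit element in a single instruction.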
* lavu/cpu: CPU flags for the RISC-V Vector extension | Rémi Denis-Courmont | 2022-09-27 | 1 | -0/+19
| RVV defines a total of 12 different extensions, including:
| - 5 different instruction subsets:
|   - Zve32x: 8-, 16- and 32-bit integers,
|   - Zve32f: Zve32x plus single precision floats,
|   - Zve64x: Zve32x plus 64-bit integers,
|   - Zve64f: Zve32f plus Zve64x,
|   - Zve64d: Zve64f plus double precision floats.
| - 6 different vector lengths:
|   - Zvl32b (embedded only),
|   - Zvl64b (embedded only),
|   - Zvl128b,
|   - Zvl256b,
|   - Zvl512b,
|   - Zvl1024b,
| - and the V extension proper: equivalent to Zve64f and Zvl128b.
| In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64).
| When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.
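The mapping from instruction subsets to the four type-set bits can be sketched as below. The bit values, the `SKETCH_` prefix and the `rvv_flags_for` helper are all hypothetical and chosen for this example; they are not the values of the real AV_CPU_FLAG_* constants.

```c
#include <string.h>

/* Hypothetical bit assignments, one bit per type set. */
enum {
    SKETCH_RVV_I32 = 1 << 0, /* 8-, 16- and 32-bit integers */
    SKETCH_RVV_F32 = 1 << 1, /* single precision floats */
    SKETCH_RVV_I64 = 1 << 2, /* 64-bit integers */
    SKETCH_RVV_F64 = 1 << 3  /* double precision floats */
};

/* Returns the type-set bits implied by a Zve* subset name, or 0 if the
 * name is not one of the five subsets listed above. */
static unsigned rvv_flags_for(const char *ext)
{
    if (!strcmp(ext, "zve32x"))
        return SKETCH_RVV_I32;
    if (!strcmp(ext, "zve32f"))
        return SKETCH_RVV_I32 | SKETCH_RVV_F32;
    if (!strcmp(ext, "zve64x"))
        return SKETCH_RVV_I32 | SKETCH_RVV_I64;
    if (!strcmp(ext, "zve64f"))
        return SKETCH_RVV_I32 | SKETCH_RVV_F32 | SKETCH_RVV_I64;
    if (!strcmp(ext, "zve64d"))
        return SKETCH_RVV_I32 | SKETCH_RVV_F32 | SKETCH_RVV_I64
             | SKETCH_RVV_F64;
    return 0;
}
```

Four independent bits can express combinations that no subset provides, but each DSP function then only needs to test the one bit matching its element type.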
* lavu/riscv: initial common header for assembler macros | Rémi Denis-Courmont | 2022-09-27 | 1 | -0/+77
|
* lavu/cpu: detect RISC-V base extensions | Rémi Denis-Courmont | 2022-09-27 | 2 | -0/+57
| This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set.
| But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.
* lavu/riscv: fix off-by-one in bit-magnitude clip | Rémi Denis-Courmont | 2022-09-15 | 1 | -2/+2
|
* lavu/riscv: fix av_clip_int16 | Rémi Denis-Courmont | 2022-09-14 | 1 | -2/+2
| Some serious copy-paste / squash / rebase mismanipulation here.
| Signed-off-by: James Almer <jamrial@gmail.com>
* lavu/riscv: add <intmath.h> optimisations | Rémi Denis-Courmont | 2022-09-13 | 1 | -0/+103
| This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.
* lavu/riscv: byte-swap operations | Rémi Denis-Courmont | 2022-09-13 | 1 | -0/+74
| If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width.
| If Zbb is not supported, then this patchset does nothing.
| Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.
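The need for the right shift can be seen in a small C model. This is a sketch assuming XLEN is 64: `rev8_xlen64` is a hypothetical stand-in for the REV8 instruction (implemented here with a compiler built-in), and the other names are equally illustrative.

```c
#include <stdint.h>

/* Stand-in for REV8 on RV64: reverses all 8 bytes of the register. */
static inline uint64_t rev8_xlen64(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* A 32-bit swap: the swapped bytes land in the top half of the register,
 * so shift right by XLEN - 32 to bring them down to the data width. */
static inline uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_xlen64(x) >> 32);
}

/* Likewise for 16 bits, with a shift by XLEN - 16. */
static inline uint16_t bswap16_via_rev8(uint16_t x)
{
    return (uint16_t)(rev8_xlen64(x) >> 48);
}
```

On a 32-bit target the shift amounts would be 0 and 16 respectively, which is why the amount depends on the register size rather than being a fixed constant.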
* lavu/riscv: AV_READ_TIME cycle counter | Rémi Denis-Courmont | 2022-09-13 | 1 | -0/+53
| This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit modes, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.
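The retry loop for the 32-bit case can be illustrated with a simulated counter in portable C. This is a sketch under assumptions: `fake_counter` and its accessors are hypothetical test scaffolding that replaces the two CSR reads, and the counter advances by `step` on every read to mimic time passing between instructions.

```c
#include <stdint.h>

/* Simulated 64-bit counter that ticks on every access. */
struct fake_counter {
    uint64_t value;
    uint64_t step;
};

static uint32_t fake_read_lo(struct fake_counter *c)
{
    c->value += c->step;
    return (uint32_t)c->value;
}

static uint32_t fake_read_hi(struct fake_counter *c)
{
    c->value += c->step;
    return (uint32_t)(c->value >> 32);
}

/* Reads high, low, then high again; retries if the high half changed,
 * which means the low half wrapped between the reads. */
static uint64_t read_counter64(struct fake_counter *c)
{
    uint32_t hi, lo, check;

    do {
        hi    = fake_read_hi(c);
        lo    = fake_read_lo(c);
        check = fake_read_hi(c);
    } while (hi != check);

    return ((uint64_t)hi << 32) | lo;
}
```

Without the second high read, a wrap between the two reads would pair the old high half with the new low half and produce a value that jumps backwards by about 2^32.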