ffmpeg - Mirror of FFmpeg git repo

	Commit message (Collapse)	Author	Age	Files	Lines
*	lavu/float_dsp: fix compilation with RISC-V ILP32 ABI	Rémi Denis-Courmont	2024-11-25	1	-0/+16
\|
*	libavutil/riscv: Make use of elf_aux_info() on FreeBSD / OpenBSD riscv	Brad Smith	2024-11-18	1	-2/+2
\|	FreeBSD/OpenBSD riscv have elf_aux_info().
\|	Signed-off-by: Brad Smith <brad@comstyle.com>
\|	Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
*	lavu/riscv: fix compilation without Vector support	Rémi Denis-Courmont	2024-11-18	1	-1/+1
\|	The half-baked assembler in Clang 16 and earlier can't process our RISC-V assembly. This adds yet another workaround for that.
\|	If you must use Clang, please use version 17 or later.
*	avutil/cpu_internal: Provide ff_getauxval() wrapper for getauxval()	Brad Smith	2024-09-09	1	-1/+1
\|	Initially used for getauxval(), but will be used to add support for other APIs, such as elf_aux_info().
\|	Signed-off-by: Brad Smith <brad@comstyle.com>
*	lavu/riscv: drop probing for zba CPU capability	Rémi Denis-Courmont	2024-08-05	1	-9/+1
\|
*	lavu/riscv: depend on RVB and simplify accordingly	Rémi Denis-Courmont	2024-08-05	3	-5/+4
\|
*	lavc/riscv: drop probing for F & D extensions	Rémi Denis-Courmont	2024-08-01	1	-12/+0
\| \| \| \| \| \| \| \| \| \|	F and D extensions are included in all RISC-V application profiles ever made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be selected at compilation time. Currently, there are no consumers for these two flags. If there is ever a need to reintroduce F- or D-specific optimisations, we can always use __riscv_f or __riscv_d compiler predefined macros respectively.
*	lavu/riscv: fix return type	Rémi Denis-Courmont	2024-08-01	1	-2/+2
\|
*	lavu/riscv: Revert d808070, removing AV_READ_TIME	Nathan E. Egge	2024-07-31	1	-54/+0
\|	The implementation of ff_read_time() for RISC-V uses rdtime, whose precision on existing hardware is too low (!) for benchmarking purposes. Deleting this implementation falls back on clock_gettime(), which was added as the default ff_read_time() implementation in 33e4cc9.
\|
\|	Below are metrics gathered on SpacemiT K1, before and after this commit:
\|
\|	Before:
\|	$ tests/checkasm/checkasm --bench
\|	benchmarking with native FFmpeg timers
\|	nop: 0.0
\|	checkasm: using random seed 3473665261
\|	checkasm: bench runs 1024 (1 << 10)
\|	RVI:
\|	 - pixblockdsp.get_pixels [OK]
\|	 - vc1dsp.mspel_pixels [OK]
\|	RVF:
\|	 - audiodsp.audiodsp [OK]
\|	checkasm: all 4 tests passed
\|	audiodsp.vector_clipf_c: 1388.7
\|	audiodsp.vector_clipf_rvf: 261.5
\|	get_pixels_c: 2.0
\|	get_pixels_rvi: 1.5
\|	vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
\|	vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
\|	vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
\|	vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5
\|
\|	After:
\|	$ tests/checkasm/checkasm --bench
\|	benchmarking with native FFmpeg timers
\|	nop: 56.4
\|	checkasm: using random seed 1021411603
\|	checkasm: bench runs 1024 (1 << 10)
\|	RVI:
\|	 - pixblockdsp.get_pixels [OK]
\|	 - vc1dsp.mspel_pixels [OK]
\|	RVF:
\|	 - audiodsp.audiodsp [OK]
\|	checkasm: all 4 tests passed
\|	audiodsp.vector_clipf_c: 23236.4
\|	audiodsp.vector_clipf_rvf: 11038.4
\|	get_pixels_c: 79.6
\|	get_pixels_rvi: 48.4
\|	vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
\|	vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
\|	vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
\|	vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1
\|
\|	Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
*	lavu/riscv: count bytes rather than words for bswap32	Rémi Denis-Courmont	2024-07-30	1	-5/+5
\| \| \| \|	This removes the dependency on Zba at essentially zero cost.
*	lavu/riscv: implement floating point clips	Rémi Denis-Courmont	2024-07-28	1	-0/+19
\|	Unlike on x86, fmin/fmax are single instructions, not function calls. They are much, much faster than doing a comparison and then branching on its result.
\|	With this, audiodsp.vector_clipf gets almost twice as fast, and a properly unrolled version of it gets 4-5x faster, on SiFive-U74.
\|	This is only the low-hanging fruit: FFMIN and FFMAX are presumably affected as well. This likely applies to other instruction sets with native IEEE floats, especially those lacking a conditional select instruction.
*	lavu/riscv: add forward-edge CFI landing pads	Rémi Denis-Courmont	2024-07-25	3	-0/+19
\|
*	lavu/riscv: assembly for zicfilp LPAD	Rémi Denis-Courmont	2024-07-25	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This instruction, if aligned on a 4-byte boundary, defines a valid target ("landing pad") for an indirect call or jump. Since this instruction is a HINT, it is safe to assemble even if not included in the target instruction set architecture. The necessary alignment is already provided by the `func` macro. However this still lacks the ELF attribute to indicate that the zicfilp is supported in simple mode. This is left for future work as the ELF specification is not ratified as of yet. This will also nonobviously require the assembler to support zicfilp, insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as its temporary register.
*	lavu/riscv: align functions to 4 bytes	Rémi Denis-Courmont	2024-07-25	1	-1/+4
\|	Currently the start of the byte range for each function is aligned to 4 bytes. But this can lead to situations where the function is preceded by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual instruction and the function symbol are only aligned on 2 bytes.
\|	This forcefully disables compression for the alignment and the symbol, thus ensuring that there is no padding before the function.
*	lavu/riscv: add CPU flag for B bit manipulations	Rémi Denis-Courmont	2024-07-25	1	-0/+13
\|	The B extension was finally ratified in May 2024, encompassing:
\|	- Zba (addresses),
\|	- Zbb (basics) and
\|	- Zbs (single bits).
\|	It does not include Zbc (base-2 polynomials).
*	lavu/riscv: remove bespoke SH{1,2,3}ADD assembler	Rémi Denis-Courmont	2024-07-25	1	-19/+0
\| \| \| \| \| \|	configure checks that the assembler supports the B extension (or rather its constituents) anyway. These macros were dodging sanity checks for unsupported instructions and nothing else.
*	lavu/riscv: require B or zba explicitly	Rémi Denis-Courmont	2024-07-25	2	-19/+19
\|
*	lavu/riscv: grok B as an extension	Rémi Denis-Courmont	2024-07-25	1	-1/+6
\| \| \| \| \| \| \| \|	The RISC-V B bit manipulation extension was ratified only two months ago. But it is strictly equivalent to the union of the zba, zbb and zbs extensions which were defined almost 3 years earlier. Rather than require new assembler, we can just match the extension name manually and translate it into its constituent parts.
*	lavu/riscv: allow any number of extensions	Rémi Denis-Courmont	2024-07-25	1	-8/+9
\|	This reworks the func/endfunc macros to support any number of ISA extensions as parameters.
*	lavu/riscv: do not fall back to AT_HWCAP auxiliary vector	Rémi Denis-Courmont	2024-07-22	1	-3/+2
\|	If __riscv_hwprobe() fails, then the kernel version is presumably too old. There is not much point in falling back to the auxiliary vector.
\|	- The Linux kernel requires I, so the flag is always set on Linux, and run-time detection is unnecessary. Our RISC-V assembler does not support targets without I anyway.
\|	- Linux can compile with or without F and D, but it cannot perform run-time detection for them (a kernel with F support will not boot a processor without F). The run-time detection is thus useless in that case. Besides, F and D extensions are used throughout the C code, so their run-time detection would not be practical.
\|	- Support for V was added in a later kernel version than riscv_hwprobe(), so the system call will always be available if the kernel supports V. The only exception would be vendor kernel forks, but those are known to haphazardly pretend to support V on systems without actual V support, or with only a pre-ratification binary-incompatible version.
\|	Furthermore, a large chunk of our optimisations require Zba and/or Zbb, which cannot be detected with HWCAP in those kernels.
\|	For what it is worth, OpenJDK already took a similar action.
\|	Note that this keeps AT_HWCAP usage for platforms with neither C run-time <sys/hwprobe.h> nor kernel <asm/hwprobe.h>, notably kernels other than Linux.
*	lavu/lls: remove useless VSETVL	Rémi Denis-Courmont	2024-06-29	1	-1/+0
\| \| \| \|	This changes neither VL nor VTYPE, so it can safely be removed.
*	avutil/riscv/cpu: fix __riscv_v_min_vlen typo	J. Dekker	2024-06-26	1	-1/+1
\| \| \| \|	Signed-off-by: J. Dekker <jdek@itanimul.li>
*	lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time	Rémi Denis-Courmont	2024-06-11	1	-0/+101
\|	        Zbb static    Zbb dynamic   I baseline
\|	clz     0.668032642   1.336072283   19.552376803
\|	clzl    0.668092643   1.336181786   26.110855571
\|	ctz     1.336208533   3.340209702   26.054869008
\|	ctzl    1.336247784   3.340362457   26.055266290
\|	(seconds for 1 billion iterations on a SiFive-U74 core)
*	lavu/riscv: use Zbb CPOP/CPOPW at run-time	Rémi Denis-Courmont	2024-06-11	1	-4/+69
\|	            Zbb static    Zbb dynamic   I baseline
\|	popcount    1.336129286   3.469067758   20.146362909
\|	popcountl   1.336322291   3.340292968   20.224829821
\|	(seconds for 1 billion iterations on a SiFive-U74 core)
*	lavu/riscv: use Zbb REV8 at run-time	Rémi Denis-Courmont	2024-06-11	1	-2/+42
\|	This adds run-time support to use Zbb REV8 for 32- and 64-bit byte-wise swaps. The result is about five times slower than if targeting Zbb statically, but still a lot faster than the default bespoke C code or a call to GCC run-time functions.
\|	For 16-bit swap, this is however unsurprisingly a lot worse, and so this sticks to the baseline. In fact, even using REV8 statically does not seem to be beneficial in that case.
\|
\|	           Zbb static    Zbb dynamic   I baseline
\|	bswap16:   0.668184765   3.340764069   0.668029012
\|	bswap32:   0.668174014   3.340763319   9.353855435
\|	bswap64:   0.668221765   3.340496313   14.698672283
\|	(seconds for 1 billion iterations on a SiFive-U74 core)
*	riscv: probe for Zbb extension at load time	Rémi Denis-Courmont	2024-06-11	3	-1/+49
\|	Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably:
\|	- Zba: address generation extension and
\|	- Zbb: basic bit manipulation extension.
\|
\|	Most CPUs that it would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb.
\|
\|	Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it on an as-needed basis. This results in 3 instructions of overhead on isolated use, e.g.:
\|
\|	1:  AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
\|	    LBU   rd, %pcrel_lo(1b)(rd)
\|	    BEQZ  rd, non_Zbb_fallback_code
\|	    // Zbb code here
\|
\|	The C compiler will typically load the flag ahead of time to reduce latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope.
\|
\|	For this to work, the flag symbol must be hidden; otherwise the optimisation degrades with a GOT look-up to support interposition:
\|
\|	1:  AUIPC rd, GOT_OFFSET_HI
\|	    LD    rd, GOT_OFFSET_LO(rd)
\|	    LBU   rd, (rd)
\|	    BEQZ  rd, non_Zbb_fallback_code
\|	    // Zbb code here
\|
\|	This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.
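The load-time flag pattern described in this commit message might look roughly as follows in C. This is a minimal sketch under stated assumptions, not the FFmpeg implementation: every `example_*` name is hypothetical, the attributes are GCC/Clang extensions, the probe only records the compile-time `__riscv_zbb` macro instead of querying the operating system, and `__builtin_bswap32()` is merely a placeholder for a Zbb fast path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag. Hidden visibility means the symbol cannot be
 * interposed, so the compiler may address it PC-relatively instead of
 * going through the GOT. */
__attribute__((visibility("hidden"))) bool example_zbb_supported;

/* Runs once when the library (or program) is loaded.
 * Assumption: a real probe would ask the OS; this sketch only records
 * whether Zbb was enabled at compile time. */
static void __attribute__((constructor)) example_probe_zbb(void)
{
#ifdef __riscv_zbb
    example_zbb_supported = true;
#else
    example_zbb_supported = false;
#endif
}

/* Portable shift-and-mask byte swap used as the fallback path. */
static inline uint32_t example_bswap32_generic(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Checks the flag on each use; the compiler can hoist the load. */
static inline uint32_t example_bswap32(uint32_t x)
{
    if (example_zbb_supported)
        return __builtin_bswap32(x); /* placeholder for the Zbb fast path */
    return example_bswap32_generic(x);
}
```

Both branches must return identical results, which keeps the flag a pure performance switch.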
*	lavu/lls: R-V V update_lls	Rémi Denis-Courmont	2024-06-01	3	-1/+99
\|	update_lls_8_c: 7.5
\|	update_lls_8_rvv_f64: 4.2
\|	update_lls_12_c: 14.5
\|	update_lls_12_rvv_f64: 5.7
*	lavu/float_dsp: R-V V scalarproduct_double	Rémi Denis-Courmont	2024-05-31	2	-0/+24
\|	C908:
\|	scalarproduct_double_c: 39.2
\|	scalarproduct_double_rvv_f64: 10.5
\|	X60:
\|	scalarproduct_double_c: 35.0
\|	scalarproduct_double_rvv_f64: 5.2
*	riscv: allow passing addend to vtype_vli macro	Rémi Denis-Courmont	2024-05-30	1	-3/+6
\|	A constant (-1) is added to the length value, so we can have an addend for free, and optimise the addition away if the addend is exactly 1.
*	lavu/riscv: add assembler macros for adjusting vector LMUL	Rémi Denis-Courmont	2024-05-19	1	-49/+117
\| \| \| \| \| \| \| \| \| \| \|	vtype_vli computes the VTYPE value with the optimal LMUL for a given element width, tail and mask policies and a run-time vector length. vtype_ivli does the same, but with the compile-time constant vector length. vwtypei and vntypei can be used to widen or narrow a VTYPE value for use in mixed-width vector-optimised functions.
*	lavu/riscv: fix parsing the unaligned access capability	Rémi Denis-Courmont	2024-05-15	1	-2/+6
\| \| \| \|	Pointed-out-by: Stefan O'Rear <sorear@fastmail.com>
*	lavu/riscv: remove bogus B extension	Rémi Denis-Courmont	2024-05-14	1	-2/+0
\|	The B bit manipulation extension has not been defined to this day, and probably never will be. Instead it was broken down into Zba, Zbb, Zbc and Zbs with no particular blessed set to make up B.
\|	This removes the bogus field test. Linux never set this bit, nor (AFAICT) did FreeBSD or any other OS. We can always add it back in the unlikely event that it gets taken into use.
*	lavu/riscv: CPU flag for fast misaligned accesses	Rémi Denis-Courmont	2024-05-14	1	-0/+3
\|
*	lavu/riscv: fall back to raw hwprobe() system call	Rémi Denis-Courmont	2024-05-14	1	-1/+19
\| \| \| \| \| \| \| \| \| \| \| \|	Not all C run-times support this, and even then, it will be a while before distributions provide recent enough versions thereof. Since this is a trivial system call wrapper, we might just as well call the corresponding kernel system call directly where the C run-time lacks support but the kernel headers are new enough (as is the case on Debian Unstable at the time of writing). In doing so, we need to add a few more guards as the first suitable kernel (headers) release did not expose the V, Zba and Zbb extensions.
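A guarded direct system call along the lines described here could be sketched as below. This is an illustration under assumptions rather than the commit's code: `example_have_zbb` and `EXAMPLE_HAVE_HWPROBE` are hypothetical names, the sketch relies on the compiler providing `__has_include`, and on anything other than RISC-V Linux with sufficiently recent kernel headers it simply reports `false`.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__linux__) && defined(__riscv) && defined(__has_include)
# if __has_include(<asm/hwprobe.h>)
#  include <asm/hwprobe.h>
#  include <sys/syscall.h>
#  include <unistd.h>
#  define EXAMPLE_HAVE_HWPROBE 1
# endif
#endif

/* Hypothetical helper: returns true only if the kernel reports Zbb. */
static bool example_have_zbb(void)
{
/* The extra guard on RISCV_HWPROBE_EXT_ZBB covers kernel headers that
 * provide the system call but predate the Zbb bit. */
#if defined(EXAMPLE_HAVE_HWPROBE) && defined(RISCV_HWPROBE_EXT_ZBB)
    struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

    /* Query all CPUs: no CPU set, no flags. */
    if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0)
        return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
#endif
    return false;
}
```

Returning `false` on failure keeps callers on the baseline code path when the system call is missing or too old.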
*	lavu/riscv: add ff_rv_vlen_least()	Rémi Denis-Courmont	2024-05-13	1	-0/+21
\| \| \| \| \|	This inline function checks that the vector length is at least a given value. With this, most run-time VLEN checks can be optimised away.
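The idea of a check that usually folds away at compile time can be sketched like this. It is a simplified illustration, not the actual ff_rv_vlen_least(): `example_vlen_least` is a hypothetical name, and the run-time vector length in bytes is passed in as a parameter (`vlenb`) instead of being read from the CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is the vector length at least `bits` bits?
 * `vlenb` is the run-time vector register length in bytes. */
static inline bool example_vlen_least(unsigned int bits, size_t vlenb)
{
#ifdef __riscv_v_min_vlen
    /* The compiler guarantees a minimum VLEN for this build, so for a
     * constant `bits` this test is resolved at compile time. */
    if (bits <= __riscv_v_min_vlen)
        return true;
#endif
    return bits <= 8 * vlenb;
}
```

When the function is inlined with a constant `bits` that is no larger than the guaranteed minimum, the whole call reduces to `true` and the run-time comparison disappears.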
*	lavu/riscv: add Zvbb CPU capability detection	Rémi Denis-Courmont	2024-05-11	1	-0/+7
\| \| \| \|	This requires Linux kernel version 6.8 or later.
*	lavu/riscv: remove bespoke assembler for MIN	Rémi Denis-Courmont	2024-05-10	1	-5/+0
\| \| \| \|	This is no longer necessary as Zbb is now always explicitly required.
*	lavu/riscv: allow requesting a second extension	Rémi Denis-Courmont	2024-05-10	1	-3/+6
\|
*	lavu/riscv: fix build without <sys/hwprobe.h>	Rémi Denis-Courmont	2024-05-08	1	-2/+2
\|
*	lavu/riscv: add hwprobe() for CPU detection	Rémi Denis-Courmont	2024-05-06	1	-0/+25
\|	This adds the Linux-specific function call to detect CPU features. Unlike the more portable auxiliary vector, this supports extensions other than single-letter ones.
\|	At this point, FFmpeg already needs this to detect Zba and Zbb at run-time, and probably will need it for Zvbb in the near future.
\|	Support will be available in glibc 2.40 onward.
*	lavu/riscv: indent code	Rémi Denis-Courmont	2024-05-06	1	-13/+15
\| \| \| \| \|	This reindents code to prepare for the next changeset. No functional changes.
*	lavu/fixed_dsp: R-V V fmul_window_scaled	Rémi Denis-Courmont	2023-11-23	2	-1/+54
\|	vector_fmul_window_scaled_fixed_c: 4393.7
\|	vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
*	lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window	Rémi Denis-Courmont	2023-11-23	1	-6/+8
\|	Roll the loop to avoid slow gathers.
\|
\|	Before:
\|	vector_fmul_reverse_c: 1561.7
\|	vector_fmul_reverse_rvv_f32: 2410.2
\|	vector_fmul_window_c: 2068.2
\|	vector_fmul_window_rvv_f32: 1879.5
\|
\|	After:
\|	vector_fmul_reverse_c: 1561.7
\|	vector_fmul_reverse_rvv_f32: 916.2
\|	vector_fmul_window_c: 2068.2
\|	vector_fmul_window_rvv_f32: 1202.5
*	lavu/fixed_dsp: optimise R-V V fmul_reverse	Rémi Denis-Courmont	2023-11-23	1	-3/+4
\|	Gathers are (unsurprisingly) a notable exception to the rule that R-V V gets faster with larger group multipliers. So roll the function to speed it up.
\|
\|	Before:
\|	vector_fmul_reverse_fixed_c: 2840.7
\|	vector_fmul_reverse_fixed_rvv_i32: 2430.2
\|
\|	After:
\|	vector_fmul_reverse_fixed_c: 2841.0
\|	vector_fmul_reverse_fixed_rvv_i32: 962.2
\|
\|	It might be possible to further optimise the function by moving the reverse-subtract out of the loop and adding ad-hoc tail handling.
*	riscv: fix builds without Zbb support	Rémi Denis-Courmont	2023-11-18	1	-0/+5
\|
*	lavu/riscv: fix typo	Rémi Denis-Courmont	2023-10-29	1	-1/+1
\|
*	lavu/fixed_dsp: R-V V vector_fmul_window	Rémi Denis-Courmont	2023-10-09	2	-0/+50
\|
*	lavu/fixed_dsp: R-V V vector_fmul	Rémi Denis-Courmont	2023-10-09	2	-0/+20
\|	vector_fmul_fixed_c: 4.0
\|	vector_fmul_fixed_rvv_i64: 0.5
*	lavu/fixed_dsp: R-V V vector_fmul_reverse	Rémi Denis-Courmont	2023-10-09	2	-0/+27
\|
*	lavu/fixed_dsp: R-V V vector_fmul_add	Rémi Denis-Courmont	2023-10-09	2	-0/+26
\|	vector_fmul_add_fixed_c: 2.2
\|	vector_fmul_add_fixed_rvv_i64: 0.5