| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
with intel AVX512 VNNI
This commit enabled assembly code with intel AVX512 VNNI and added unit test for sobel filter
sobel_c: 4537
sobel_avx512icl 2136
Signed-off-by: bwang30 <[email protected]>
Signed-off-by: Haihao Xiang <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- ff_pix_abs16_neon
- ff_pix_abs16_xy2_neon
In direct micro benchmarks of these ff functions verses their C implementations,
these functions performed as follows on AWS Graviton 3.
ff_pix_abs16_neon:
pix_abs_0_0_c: 141.1
pix_abs_0_0_neon: 19.6
ff_pix_abs16_xy2_neon:
pix_abs_0_3_c: 269.1
pix_abs_0_3_neon: 39.3
Tested with:
./tests/checkasm/checkasm --test=motion --bench --disable-linux-perf
Signed-off-by: Jonathan Swinney <[email protected]>
Signed-off-by: Martin Storsjö <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ben Avison <[email protected]>
Signed-off-by: Martin Storsjö <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real stream decode will fall somewhere
between these two extremes.
Signed-off-by: Ben Avison <[email protected]>
Signed-off-by: Martin Storsjö <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
changes since v2:
* fixed label
changes since v1:
* remove vex intruction on sse4 path
* some load/pack marcos use less intructions
* fixed some typos
yuv2gbrp_full_X_4_512_c: 12757.6
yuv2gbrp_full_X_4_512_sse2: 8946.6
yuv2gbrp_full_X_4_512_sse4: 5138.6
yuv2gbrp_full_X_4_512_avx2: 3889.6
yuv2gbrap_full_X_4_512_c: 15368.6
yuv2gbrap_full_X_4_512_sse2: 11916.1
yuv2gbrap_full_X_4_512_sse4: 6294.6
yuv2gbrap_full_X_4_512_avx2: 3477.1
yuv2gbrp9be_full_X_4_512_c: 14381.6
yuv2gbrp9be_full_X_4_512_sse2: 9139.1
yuv2gbrp9be_full_X_4_512_sse4: 5150.1
yuv2gbrp9be_full_X_4_512_avx2: 2834.6
yuv2gbrp9le_full_X_4_512_c: 12990.1
yuv2gbrp9le_full_X_4_512_sse2: 9118.1
yuv2gbrp9le_full_X_4_512_sse4: 5132.1
yuv2gbrp9le_full_X_4_512_avx2: 2833.1
yuv2gbrp10be_full_X_4_512_c: 14401.6
yuv2gbrp10be_full_X_4_512_sse2: 9133.1
yuv2gbrp10be_full_X_4_512_sse4: 5126.1
yuv2gbrp10be_full_X_4_512_avx2: 2837.6
yuv2gbrp10le_full_X_4_512_c: 12718.1
yuv2gbrp10le_full_X_4_512_sse2: 9106.1
yuv2gbrp10le_full_X_4_512_sse4: 5120.1
yuv2gbrp10le_full_X_4_512_avx2: 2826.1
yuv2gbrap10be_full_X_4_512_c: 18535.6
yuv2gbrap10be_full_X_4_512_sse2: 33617.6
yuv2gbrap10be_full_X_4_512_sse4: 6264.1
yuv2gbrap10be_full_X_4_512_avx2: 3422.1
yuv2gbrap10le_full_X_4_512_c: 16724.1
yuv2gbrap10le_full_X_4_512_sse2: 11787.1
yuv2gbrap10le_full_X_4_512_sse4: 6282.1
yuv2gbrap10le_full_X_4_512_avx2: 3441.6
yuv2gbrp12be_full_X_4_512_c: 13723.6
yuv2gbrp12be_full_X_4_512_sse2: 9128.1
yuv2gbrp12be_full_X_4_512_sse4: 7997.6
yuv2gbrp12be_full_X_4_512_avx2: 2844.1
yuv2gbrp12le_full_X_4_512_c: 12257.1
yuv2gbrp12le_full_X_4_512_sse2: 9107.6
yuv2gbrp12le_full_X_4_512_sse4: 5142.6
yuv2gbrp12le_full_X_4_512_avx2: 2837.6
yuv2gbrap12be_full_X_4_512_c: 18511.1
yuv2gbrap12be_full_X_4_512_sse2: 12156.6
yuv2gbrap12be_full_X_4_512_sse4: 6251.1
yuv2gbrap12be_full_X_4_512_avx2: 3444.6
yuv2gbrap12le_full_X_4_512_c: 16687.1
yuv2gbrap12le_full_X_4_512_sse2: 11785.1
yuv2gbrap12le_full_X_4_512_sse4: 6243.6
yuv2gbrap12le_full_X_4_512_avx2: 3446.1
yuv2gbrp14be_full_X_4_512_c: 13690.6
yuv2gbrp14be_full_X_4_512_sse2: 9120.6
yuv2gbrp14be_full_X_4_512_sse4: 5138.1
yuv2gbrp14be_full_X_4_512_avx2: 2843.1
yuv2gbrp14le_full_X_4_512_c: 14995.6
yuv2gbrp14le_full_X_4_512_sse2: 9119.1
yuv2gbrp14le_full_X_4_512_sse4: 5126.1
yuv2gbrp14le_full_X_4_512_avx2: 2843.1
yuv2gbrp16be_full_X_4_512_c: 12367.1
yuv2gbrp16be_full_X_4_512_sse2: 8233.6
yuv2gbrp16be_full_X_4_512_sse4: 4820.1
yuv2gbrp16be_full_X_4_512_avx2: 2666.6
yuv2gbrp16le_full_X_4_512_c: 10904.1
yuv2gbrp16le_full_X_4_512_sse2: 8214.1
yuv2gbrp16le_full_X_4_512_sse4: 4824.1
yuv2gbrp16le_full_X_4_512_avx2: 2629.1
yuv2gbrap16be_full_X_4_512_c: 26569.6
yuv2gbrap16be_full_X_4_512_sse2: 10884.1
yuv2gbrap16be_full_X_4_512_sse4: 5488.1
yuv2gbrap16be_full_X_4_512_avx2: 3272.1
yuv2gbrap16le_full_X_4_512_c: 14010.1
yuv2gbrap16le_full_X_4_512_sse2: 10562.1
yuv2gbrap16le_full_X_4_512_sse4: 5463.6
yuv2gbrap16le_full_X_4_512_avx2: 3255.1
yuv2gbrpf32be_full_X_4_512_c: 14524.1
yuv2gbrpf32be_full_X_4_512_sse2: 8552.6
yuv2gbrpf32be_full_X_4_512_sse4: 4636.1
yuv2gbrpf32be_full_X_4_512_avx2: 2474.6
yuv2gbrpf32le_full_X_4_512_c: 13060.6
yuv2gbrpf32le_full_X_4_512_sse2: 9682.6
yuv2gbrpf32le_full_X_4_512_sse4: 4298.1
yuv2gbrpf32le_full_X_4_512_avx2: 2453.1
yuv2gbrapf32be_full_X_4_512_c: 18629.6
yuv2gbrapf32be_full_X_4_512_sse2: 11363.1
yuv2gbrapf32be_full_X_4_512_sse4: 15201.6
yuv2gbrapf32be_full_X_4_512_avx2: 3727.1
yuv2gbrapf32le_full_X_4_512_c: 16677.6
yuv2gbrapf32le_full_X_4_512_sse2: 10221.6
yuv2gbrapf32le_full_X_4_512_sse4: 5693.6
yuv2gbrapf32le_full_X_4_512_avx2: 3656.6
Reviewed-by: Paul B Mahol <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
|
|
|
|
|
| |
Also add to `make fate-checkasm' target.
Signed-off-by: J. Dekker <[email protected]>
|
|
|
|
|
|
| |
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
|
|
|
|
|
|
|
| |
This tests the hscale 8bpp to 14/18bpp functions with different filter
sizes.
Signed-off-by: Josh de Kock <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ting Fu <[email protected]>
Signed-off-by: Ruiling Song <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: Ruiling Song <[email protected]>
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
|
|
|
| |
Fixes fate in Windows subsystem for Linux.
|
|
|
|
|
| |
Reviewed-by: Paul B Mahol <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit adff97be5e2ff51c0bb66080c2f904ed40b6c571.
It currently fails on Windows targets.
Signed-off-by: James Almer <[email protected]>
|
| |
|
| |
|
|
|
|
| |
Signed-off-by: James Almer <[email protected]>
|
|\
| |
| |
| |
| |
| |
| | |
* commit '39e16ee2289e4240a82597b97db5541bbbd2b996':
Revert "fate: Skip the checkasm test if CONFIG_STATIC is disabled"
Merged-by: James Almer <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When we use dllexport properly for shared libraries on windows,
there's no longer any issue with linking the object files for
e.g. libavcodec statically into checkasm. (It's still not possible
to link the built object files for e.g. libavformat statically to
libavcodec though, since libavformat exepcts to load av_export_*
symbols from a DLL.)
This reverts commit 4e62b57ee03928c12a3119dcaf78ffa1f4d6985f.
Signed-off-by: Martin Storsjö <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '698ac8f9cabd053f2c19346a77b92f8eae4218fc':
fate: Make null comparison method more useful
Merged-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| | |
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
This includes various fixes and improvements from James Almer.
Signed-off-by: James Almer <[email protected]>
|
| |
| |
| |
| |
| |
| | |
Ported from libavutil/tests/float_dsp.c
Signed-off-by: James Almer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '4e62b57ee03928c12a3119dcaf78ffa1f4d6985f':
fate: Skip the checkasm test if CONFIG_STATIC is disabled
Merged-by: Clément Bœsch <[email protected]>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When building DLLs with MSVC, CONFIG_STATIC is disabled (see
d66c52c2b3694 for a more verbose explanation) since the built
object files can't be linked statically (which checkasm does).
This worked up until recently, only by luck.
Signed-off-by: Martin Storsjö <[email protected]>
|
| |
| |
| |
| |
| | |
Tested-by: Michael Niedermayer <[email protected]>
Signed-off-by: James Almer <[email protected]>
|
|\|
| |
| |
| |
| |
| |
| | |
* commit '4537647c0429fe7c8ee655ac3fda856ba67f58a0':
fate: checkasm: Split monolithic test into individual components
Merged-by: Clément Bœsch <[email protected]>
|
|/ |
|
|
|