author | Martin Storsjö <martin@martin.st> | 2022-07-13 00:48:48 +0300 |
---|---|---|
committer | Martin Storsjö <martin@martin.st> | 2022-07-16 17:26:17 +0300 |
commit | 4136405c86162063e45d40d55c9985f348d4ea0a (patch) | |
tree | bf05ccefe508d21a1078cef20f6ff36968b7bd94 /libavcodec/parsers.c | |
parent | 68a03f64240dcbe408c3fd43d1071a105508a588 (diff) | |
aarch64: me_cmp: Don't do uaddlv once per iteration
The max height is currently documented as 16, and the max difference per
pixel is 255. A .8h element can easily hold 16*255 = 4080, so keep
accumulating in two .8h vectors and do the final accumulation only at the
end. This should work for heights up to 256, since 256*255 = 65280 still
fits in an unsigned 16-bit element.
This requires a minor register renumbering in ff_pix_abs16_xy2_neon.
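The overflow reasoning above can be sketched as a scalar C model. This is only an illustration, not the NEON code from the patch: `sad16_deferred` and `sad16_reference` are hypothetical names, and the two `uint16_t[8]` arrays stand in for the two .8h accumulators, assuming each 16-bit lane receives one absolute difference per row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model: 16-pixel-wide SAD that accumulates into two
 * vectors of eight 16-bit lanes and reduces them only once, after the loop. */
static unsigned sad16_deferred(const uint8_t *a, const uint8_t *b,
                               ptrdiff_t stride, int h)
{
    uint16_t acc_lo[8] = {0}, acc_hi[8] = {0};

    /* Each lane gains at most 255 per row, so 256 rows give at most
     * 256 * 255 = 65280, which still fits in 16 bits. */
    assert(h >= 0 && h <= 256);

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            acc_lo[x] += (uint16_t)abs(a[x] - b[x]);         /* pixels 0..7  */
            acc_hi[x] += (uint16_t)abs(a[x + 8] - b[x + 8]); /* pixels 8..15 */
        }
        a += stride;
        b += stride;
    }

    /* Single horizontal reduction at the end, widened to avoid overflow. */
    unsigned sum = 0;
    for (int x = 0; x < 8; x++)
        sum += (unsigned)acc_lo[x] + acc_hi[x];
    return sum;
}

/* Reference: plain 32-bit accumulation, used to check the model above. */
static unsigned sad16_reference(const uint8_t *a, const uint8_t *b,
                                ptrdiff_t stride, int h)
{
    unsigned sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs(a[x] - b[x]);
        a += stride;
        b += stride;
    }
    return sum;
}
```

With the worst case input (every difference equal to 255) both functions agree for any height up to 256, which is the bound the message relies on.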
Before:            Cortex A53    A72    A73   Graviton 3
pix_abs_0_0_neon:        97.7   47.0   37.5         22.7
pix_abs_0_1_neon:       154.0   59.0   52.0         25.0
pix_abs_0_3_neon:       179.7   96.7   87.5         41.2
After:
pix_abs_0_0_neon:        96.0   39.2   31.2         22.0
pix_abs_0_1_neon:       150.7   59.7   46.2         23.7
pix_abs_0_3_neon:       175.7   83.7   81.7         38.2
Signed-off-by: Martin Storsjö <martin@martin.st>
Diffstat (limited to 'libavcodec/parsers.c')
0 files changed, 0 insertions, 0 deletions