author | Martin Storsjö <martin@martin.st> | 2021-09-03 13:56:05 +0300 |
---|---|---|
committer | Martin Storsjö <martin@martin.st> | 2021-10-18 14:27:58 +0300 |
commit | fd3bd5c492834bd100bb2462d1e1dc25a66f28f8 (patch) | |
tree | fefcefe8304a1cf70ce38914e094ee04173470ba /libavcodec/mathtables.c | |
parent | 2d5a7f6d002813ee67bffa63d4afcd439dd329a3 (diff) | |
aarch64: h264qpel: Do vertical filtering without transposing
This gives rather big speedups on these functions:
Before:
put_h264_qpel_8_mc01_8_neon: 241.0 131.5 138.7
put_h264_qpel_8_mc02_8_neon: 214.7 121.2 127.5
put_h264_qpel_8_mc03_8_neon: 242.5 131.2 135.7
put_h264_qpel_8_mc11_8_neon: 421.2 218.7 251.0
put_h264_qpel_8_mc12_8_neon: 878.0 509.5 537.5
put_h264_qpel_8_mc13_8_neon: 423.7 217.0 252.0
put_h264_qpel_8_mc21_8_neon: 858.2 479.5 514.0
put_h264_qpel_8_mc22_8_neon: 649.7 385.2 403.0
put_h264_qpel_8_mc23_8_neon: 860.2 476.5 517.7
put_h264_qpel_8_mc31_8_neon: 437.2 219.5 252.5
put_h264_qpel_8_mc32_8_neon: 892.5 510.5 546.0
put_h264_qpel_8_mc33_8_neon: 438.2 218.5 257.0
put_h264_qpel_16_mc01_8_neon: 944.2 509.7 546.7
put_h264_qpel_16_mc02_8_neon: 878.7 469.5 509.7
put_h264_qpel_16_mc03_8_neon: 945.7 510.7 557.0
put_h264_qpel_16_mc11_8_neon: 1663.2 858.5 979.5
put_h264_qpel_16_mc12_8_neon: 3510.2 2027.7 2112.7
put_h264_qpel_16_mc13_8_neon: 1664.7 857.5 980.5
put_h264_qpel_16_mc21_8_neon: 3366.2 1928.5 2030.5
put_h264_qpel_16_mc22_8_neon: 2584.7 1514.7 1590.2
put_h264_qpel_16_mc23_8_neon: 3367.7 1927.7 2035.0
put_h264_qpel_16_mc31_8_neon: 1716.7 849.7 997.0
put_h264_qpel_16_mc32_8_neon: 3564.0 2044.2 3835.2
put_h264_qpel_16_mc33_8_neon: 1717.7 863.0 989.5
After:
put_h264_qpel_8_mc01_8_neon: 136.0 73.7 76.0
put_h264_qpel_8_mc02_8_neon: 108.7 65.0 64.0
put_h264_qpel_8_mc03_8_neon: 137.5 72.7 73.0
put_h264_qpel_8_mc11_8_neon: 316.2 159.0 188.5
put_h264_qpel_8_mc12_8_neon: 653.0 375.5 384.7
put_h264_qpel_8_mc13_8_neon: 318.7 165.5 189.5
put_h264_qpel_8_mc21_8_neon: 739.2 385.7 432.5
put_h264_qpel_8_mc22_8_neon: 530.7 295.5 309.5
put_h264_qpel_8_mc23_8_neon: 741.2 393.7 421.0
put_h264_qpel_8_mc31_8_neon: 332.2 162.5 190.0
put_h264_qpel_8_mc32_8_neon: 667.5 378.2 390.5
put_h264_qpel_8_mc33_8_neon: 332.7 166.5 195.5
put_h264_qpel_16_mc01_8_neon: 524.2 285.2 294.0
put_h264_qpel_16_mc02_8_neon: 454.7 252.2 250.2
put_h264_qpel_16_mc03_8_neon: 525.7 286.0 283.0
put_h264_qpel_16_mc11_8_neon: 1243.2 630.7 726.7
put_h264_qpel_16_mc12_8_neon: 2610.2 1479.7 1481.2
put_h264_qpel_16_mc13_8_neon: 1250.5 631.7 727.7
put_h264_qpel_16_mc21_8_neon: 2890.2 1571.2 1679.7
put_h264_qpel_16_mc22_8_neon: 2108.7 1177.5 1223.5
put_h264_qpel_16_mc23_8_neon: 2891.7 1578.7 1667.7
put_h264_qpel_16_mc31_8_neon: 1296.7 630.5 752.5
put_h264_qpel_16_mc32_8_neon: 2664.0 1483.2 1503.5
put_h264_qpel_16_mc33_8_neon: 1297.7 632.5 747.2
I.e. overall a 20%-60% reduction in runtime of these
functions.
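The relative reduction for any single entry can be checked directly from the two tables. The helper below is a minimal sketch for that arithmetic; the name `reduction_percent` is hypothetical and is not part of FFmpeg or checkasm.

```c
/*
 * Hypothetical helper, not part of FFmpeg: relative runtime reduction in
 * percent, given the "Before" and "After" timings of one function.
 */
static double reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For instance, the first column of put_h264_qpel_8_mc02_8_neon goes from 214.7 to 108.7, a reduction of about 49%, and the second column of put_h264_qpel_16_mc11_8_neon goes from 858.5 to 630.7, a reduction of about 27%.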
Signed-off-by: Martin Storsjö <martin@martin.st>
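For context on why vertical filtering does not need a transpose: H.264 half-sample luma interpolation uses the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding (sum + 16) >> 5. In the vertical direction the six taps come from six different rows, so every column is computed independently. The scalar C sketch below illustrates this for the mc02 (vertical half-sample) case. It is an assumed reference for illustration only, not the NEON code from this commit, and the names `clip_u8` and `put_qpel_v_lowpass_ref` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference, not the NEON code from this commit.
 * Vertical half-sample interpolation of a w x h block of 8-bit pixels with
 * the H.264 6-tap filter (1, -5, 20, 20, -5, 1), rounded as (sum + 16) >> 5.
 * src must have 2 readable rows above the block and 3 readable rows below.
 */
static void put_qpel_v_lowpass_ref(uint8_t *dst, const uint8_t *src,
                                   ptrdiff_t dst_stride, ptrdiff_t src_stride,
                                   int w, int h)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *r = src + y * src_stride;
        /* Each column x only reads pixels from the same column in six rows,
         * so a row-wise SIMD version can combine whole rows lane by lane. */
        for (int x = 0; x < w; x++) {
            int sum =      r[x - 2 * src_stride]
                    -  5 * r[x - 1 * src_stride]
                    + 20 * r[x]
                    + 20 * r[x + 1 * src_stride]
                    -  5 * r[x + 2 * src_stride]
                    +      r[x + 3 * src_stride];
            dst[y * dst_stride + x] = clip_u8((sum + 16) >> 5);
        }
    }
}
```

Because the inner loop has no dependency between columns, a vector implementation can load one row per register and apply the taps across registers, which is the general idea behind filtering vertically without transposing first.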