path: root/libavutil/softfloat_tables.h
author Lynne <dev@lynne.ee> 2022-05-20 10:00:43 +0200
committer Lynne <dev@lynne.ee> 2022-05-20 10:12:34 +0200
commit 82a68a8771ca39564f6a74e0f875d6852e7a0c2a
tree 92992c7067cf22f88b701ec6a143496592cac03c /libavutil/softfloat_tables.h
parent 41a558fea06cc0a23b8d2d0dfb03ef6a25cf5100
download ffmpeg-82a68a8771ca39564f6a74e0f875d6852e7a0c2a.tar.gz
x86/tx_float: remove vgatherdpd usage
Its performance ranges from just as fast as individual loads (Skylake), to a few percent slower (Alderlake), to 8% slower (Zen 3), to completely disastrous (older/other CPUs).

Sadly, gathers never panned out fast on x86, even with the benefit of time and implementation experience.

This also saves a register, as there's no need to fill out an additional register mask.

Zen 3 (16384-point transform):
Before: 1561050 decicycles in av_tx (fft), 131072 runs, 0 skips
After: 1449621 decicycles in av_tx (fft), 131072 runs, 0 skips

Alderlake:
2% slower on big transforms (65536), to 1% (131072), to a few percent for smaller sizes.
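
As a sanity check on the Zen 3 figures, the sketch below recomputes the relative cost of the gather version from the two benchmark lines. It is not part of the commit. The helper names are hypothetical, and it assumes that "Before" is the vgatherdpd version, "After" is the individual-load version, and a decicycle is one tenth of a CPU cycle. Under those assumptions the gather version comes out roughly 7.7% slower, in line with the "8% slower (Zen 3)" figure.

```python
# Figures copied from the Zen 3 benchmark lines in the commit message.
before_decicycles = 1561050  # assumed: version using vgatherdpd
after_decicycles = 1449621   # assumed: version using individual loads


def relative_slowdown(slow: int, fast: int) -> float:
    """Return how much slower `slow` is than `fast`, as a percentage of `fast`."""
    return (slow - fast) / fast * 100.0


def decicycles_to_cycles(decicycles: int) -> float:
    """Convert decicycles to cycles, assuming 1 decicycle = 0.1 cycle."""
    return decicycles / 10.0


gather_penalty = relative_slowdown(before_decicycles, after_decicycles)
saved_cycles = decicycles_to_cycles(before_decicycles - after_decicycles)

print(f"gather version slower by {gather_penalty:.1f}%")
print(f"cycles saved per reported run: {saved_cycles:.1f}")
```

The percentage is taken relative to the faster (post-commit) timing, which matches the "N% slower" phrasing of the message. Taking it relative to the slower timing would give about 7.1% instead.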
Diffstat (limited to 'libavutil/softfloat_tables.h')
0 files changed, 0 insertions, 0 deletions