diff options
author | Andreas Rheinhardt <andreas.rheinhardt@outlook.com> | 2022-09-08 05:10:16 +0200 |
---|---|---|
committer | Andreas Rheinhardt <andreas.rheinhardt@outlook.com> | 2022-09-19 23:40:41 +0200 |
commit | 888a02a126522ad1da802eb1f4e4571afc09a14d (patch) | |
tree | daeea69822f74444049f4d5224a8bb9f5189d29b /libavcodec/x86/xvididct.asm | |
parent | 4d7a1a4619c9ed89c5f6dbf8a0ae85b9a8aa056b (diff) | |
download | ffmpeg-888a02a126522ad1da802eb1f4e4571afc09a14d.tar.gz |
swscale/output: Don't call av_pix_fmt_desc_get() in a loop
Up until now, libswscale/output.c used a macro to write
an output pixel which involved a call to av_pix_fmt_desc_get()
to find out whether the input pixel format is BE or LE
despite this being known at compile-time (there are templates
per pixfmt). Even worse, these calls are made in a loop,
so that e.g. there are eight calls to av_pix_fmt_desc_get()
for every pixel processed in yuv2rgba64_X_c_template()
for 64bit RGB formats.
This commit modifies these macros to ensure that isBE()
is evaluated at compile-time. This saved 41184B of .text
for me (GCC 11.2, -O3). Of course, it also improved performance.
E.g. ffmpeg_g -f lavfi -i testsrc2,format=yuva420p -pix_fmt rgba64le \
-threads 1 -t 1:00 -f null - (which uses yuv2rgba64le_X_c,
which is an invocation of yuv2rgba64_X_c_template() mentioned above),
performance improved from 95589 to 41387 decicycles for one call
to yuv2packedX; for the be variant the numbers went down from
76087 to 43024 decicycles.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Diffstat (limited to 'libavcodec/x86/xvididct.asm')
0 files changed, 0 insertions, 0 deletions