path: root/libavcodec/pcm_tablegen.c
author Devin Heitmueller <devin.heitmueller@ltnglobal.com> 2023-05-05 17:54:17 -0400
committer Marton Balint <cus@passwd.hu> 2023-12-28 23:56:14 +0100
commit b2c82b23b9fd9906a98b53af5ee8eadd08eb95d9
tree 0bfb130fa6d39e33bb3e018f2c18fc4d79d8955a /libavcodec/pcm_tablegen.c
parent 059ea1d6f60f5e3eb300041281bf83fb89678f60
avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10

Rework the code a bit to speed up the 10-bit bitpacked decoding routine.
This is probably about as fast as I can get it without switching to
assembly language.

Demonstrable with:

    ./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
    ./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv

On my development system, it went from 80ms for a 2160p frame down to
20ms (i.e. a 4X speedup). Good enough for now, I hope...

Comments from Marton:

Originally on my system better performance could be achieved by simply
switching to the cached bitstream reader, but for Devin it was slower
than his direct byte operations. I changed the order of writing output
from u/y/v/y to u/v/y/y, and that made the code faster than the cached
bitstream reader on my system as well.

TIMER measurement of the decode loop on Ryzen 5 3600 with command line:

    ./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error

Before: 823204127 decicycles in YUV, 256 runs, 0 skips
After:  315070524 decicycles in YUV, 256 runs, 0 skips

Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
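
For readers without the diff at hand, the "direct byte operations" and the u/v/y/y write order mentioned above can be sketched roughly as follows. This is not the committed code: the function name unpack_422_10 and its parameters are hypothetical, and the sketch assumes each 5-byte group carries four 10-bit samples in the order U, Y0, V, Y1, most significant bit first.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the FFmpeg function itself.
 * Assumed layout: every 5 input bytes hold U, Y0, V, Y1 as four
 * consecutive 10-bit big-endian fields. "pairs" is the number of
 * such groups (each group covers two luma samples).
 */
static void unpack_422_10(const uint8_t *src, size_t pairs,
                          uint16_t *y, uint16_t *u, uint16_t *v)
{
    for (size_t i = 0; i < pairs; i++) {
        /* Stores are issued in u/v/y/y order, as described in the
         * commit message, rather than in bitstream order u/y/v/y. */
        *u++ = (uint16_t)((src[0] << 2) | (src[1] >> 6));
        *v++ = (uint16_t)(((src[2] & 0x0f) << 6) | (src[3] >> 2));
        *y++ = (uint16_t)(((src[1] & 0x3f) << 4) | (src[2] >> 4));
        *y++ = (uint16_t)(((src[3] & 0x03) << 8) | src[4]);
        src += 5;
    }
}
```

Reading whole bytes and shifting them into place avoids per-sample bitstream reader calls. Whether the store order helps is system dependent, as the differing results reported by the two authors suggest.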
Diffstat (limited to 'libavcodec/pcm_tablegen.c')
0 files changed, 0 insertions, 0 deletions