| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Speeds up everything on AMD by 3x.
This uses 32 local invocations to load state into cache, as well
as to do the RCT faster.
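For readers unfamiliar with the RCT (reversible colour transform) mentioned above, here is a minimal scalar sketch of the transform as the FFV1 specification describes it. It is not the shader from this commit: the function names are hypothetical, and the 32-invocation work split as well as any offsets or per-bit-depth handling a real codec applies are left out.

```python
# Scalar sketch of the FFV1 reversible colour transform (RCT).
# Function names are hypothetical; this is not the Vulkan shader code.

def rct_forward(r, g, b):
    """Map RGB to (Y, Cb, Cr) using integer-only, lossless arithmetic."""
    cb = b - g
    cr = r - g
    # Python's >> on negative ints is an arithmetic (floor) shift.
    y = g + ((cb + cr) >> 2)
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exactly undo rct_forward."""
    g = y - ((cb + cr) >> 2)
    r = cr + g
    b = cb + g
    return r, g, b

if __name__ == "__main__":
    pixel = (200, 10, 255)
    coded = rct_forward(*pixel)
    print(coded, rct_inverse(*coded))
```

Because the same shifted term is added in the forward direction and subtracted in the inverse, the round trip is exact for any integer input, which is what makes the transform usable in a lossless codec.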
| |
This saves on some VRAM, but mainly allows for a more unified path.
| |
8% speedup on NVIDIA at 4K.
| |
This commit also makes it possible for the encoder to choose a different
quantization table on a per-slice basis, as well as adding this capability
to the decoder.
Also, this commit fully fixes decoding of context=1 encoded files.
| |
This reduces the intermediate VRAM used for RGB decoding by a
factor of 100 for 6k video.
This also speeds the decoder up by 16% for 4k RGB24 and 31% for 6k video.
This is equivalent to what the software decoder does, but with fewer pointers.
| |
This is likely a nano-optimization, but it's more correct.
| |
Without a barrier upfront, the reset shader may read data fields not
yet set by the setup shader.
| |
The commit which added support for host mapping accidentally broke the
original upload route.
Fix it for the (very few) drivers without host mapping.
| |
Rather than always using the maximum allowed slices, just use the number
of slices present in this frame.
| |
Leftover debug macro.
| |
Fixed by previous commit.
|
This patch adds a fully-featured level 3 and 4 decoder for FFv1,
supporting Golomb and all Range coding variants, all pixel formats,
and all features, except for the newly added floating-point formats.
On a 6000 Ada, for 3840x2160 bgr0 content at 50Mbps (standard desktop
recording), it is able to do 400fps.
An Alder Lake with 24 threads can barely do 100fps.