aboutsummaryrefslogtreecommitdiffstats
path: root/libavformat/isom.h
diff options
context:
space:
mode:
authorClément Bœsch <u@pkh.me>2022-02-18 23:02:20 +0100
committerClément Bœsch <u@pkh.me>2022-03-04 15:50:51 +0100
commitab77b878f1205225c6de1370fb0e998dbcc8bc69 (patch)
treec611af68a0fd84f0fd502563faed53d7dbe6c8c2 /libavformat/isom.h
parente05e4398c39b2dafbec748fcd82dcb8709956ed4 (diff)
downloadffmpeg-ab77b878f1205225c6de1370fb0e998dbcc8bc69.tar.gz
avformat/mov: fix seeking with HEVC open GOP files
This was tested with medias recorded from an iPhone XR and an iPhone 13. Here is how a typical stream looks like in coding order: ┌────────┬─────┬─────┬──────────┐ │ sample | PTS | DTS | keyframe | ├────────┼─────┼─────┼──────────┤ ┊ ┊ ┊ ┊ ┊ │ 53 │ 560 │ 510 │ No │ │ 54 │ 540 │ 520 │ No │ │ 55 │ 530 │ 530 │ No │ │ 56 │ 550 │ 540 │ No │ │ 57 │ 600 │ 550 │ Yes │ │ * 58 │ 580 │ 560 │ No │ │ * 59 │ 570 │ 570 │ No │ │ * 60 │ 590 │ 580 │ No │ │ 61 │ 640 │ 590 │ No │ │ 62 │ 620 │ 600 │ No │ ┊ ┊ ┊ ┊ ┊ In composition/display order: ┌────────┬─────┬─────┬──────────┐ │ sample | PTS | DTS | keyframe | ├────────┼─────┼─────┼──────────┤ ┊ ┊ ┊ ┊ ┊ │ 55 │ 530 │ 530 │ No │ │ 54 │ 540 │ 520 │ No │ │ 56 │ 550 │ 540 │ No │ │ 53 │ 560 │ 510 │ No │ │ * 59 │ 570 │ 570 │ No │ │ * 58 │ 580 │ 560 │ No │ │ * 60 │ 590 │ 580 │ No │ │ 57 │ 600 │ 550 │ Yes │ │ 63 │ 610 │ 610 │ No │ │ 62 │ 620 │ 600 │ No │ ┊ ┊ ┊ ┊ ┊ Sample/frame 58, 59 and 60 are B-frames which actually depends on the key frame (57). Here the key frame is not an IDR but a "CRA" (Clean Random Access). Initially, I thought I could rely on the sdtp box (independent and disposable samples), but unfortunately: sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0 sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0 sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0 sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0 sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0 sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0 sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0 sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0 sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0 The information that might have been useful here would have been is_leading, but all the samples are set to 0 so this was unusable. Instead, we need to rely on sgpd/sbgp tables. In my case the video track contained 3 sgpd tables with the following grouping types: tscl, sync and tsas. In the sync table we have the following 2 entries (only): sgpd.sync[1]: sync nal_unit_type:0x14 sgpd.sync[2]: sync nal_unit_type:0x15 (The count starts at 1 because 0 carries the undefined semantic, we'll see that later in the reference table). The NAL unit types presented here correspond to: libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20, libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21, In parallel, the sbgp sync table contains the following: ┌────┬───────┬─────┐ │ id │ count │ gdi │ ├────┼───────┼─────┤ │ 0 │ 1 │ 1 │ │ 1 │ 56 │ 0 │ │ 2 │ 1 │ 2 │ │ 3 │ 59 │ 0 │ │ 4 │ 1 │ 2 │ │ 5 │ 59 │ 0 │ │ 6 │ 1 │ 2 │ │ 7 │ 59 │ 0 │ │ 8 │ 1 │ 2 │ │ 9 │ 59 │ 0 │ │ 10 │ 1 │ 2 │ │ 11 │ 11 │ 0 │ └────┴───────┴─────┘ The gdi column (group description index) directly refers to the index in the sgpd.sync table. This means the first frame is an IDR, then we have batches of undefined frames interlaced with CRA frames. No IDR ever appears again (tried on a 30+ seconds sample). With that information, we can build an heuristic using the presentation order. A few things needed to be introduced in this commit: 1. min_sample_duration is extracted from the stts: we need the minimal step between sample in order to PTS-step backward to a valid point 2. In order to avoid a loop over the ctts table systematically during a seek, we build an expanded list of sample offsets which will be used to translate from DTS to PTS 3. An open_key_samples index to keep track of all the non-IDR key frames; for now it only supports HEVC CRA frames. We should probably add BLA frames as well, but I don't have any sample so I prefered to leave that for later It is entirely possible I missed something obvious in my approach, but I couldn't come up with a better solution. Also, as mentioned in the diff, we could optimize is_open_key_sample(), but the linear scaling overhead should be fine for now since it only happens in seek events. Fixing this issue prevents sending broken packets to the decoder. With FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames are glitching.
Diffstat (limited to 'libavformat/isom.h')
-rw-r--r--libavformat/isom.h5
1 files changed, 5 insertions, 0 deletions
diff --git a/libavformat/isom.h b/libavformat/isom.h
index bbe395ee4b..5caf42b15d 100644
--- a/libavformat/isom.h
+++ b/libavformat/isom.h
@@ -222,6 +222,11 @@ typedef struct MOVStreamContext {
MOVSbgp *sync_group;
uint8_t *sgpd_sync;
uint32_t sgpd_sync_count;
+ int32_t *sample_offsets;
+ int sample_offsets_count;
+ int *open_key_samples;
+ int open_key_samples_count;
+ uint32_t min_sample_duration;
int nb_frames_for_fps;
int64_t duration_for_fps;