doc/ffmpeg: rewrite the detailed description chapter

Split it into sections that describe in detail
* the components of the transcoding pipeline
* the main features it handles, in order of complexity
    * streamcopy
    * transcoding
    * filtering

Replace the current confusing/misleading diagrams with new ones that
actually reflect the program components and data flow between them.
author: Anton Khirnov <anton@khirnov.net> 2024-09-30 14:29:13 +0200
committer: Anton Khirnov <anton@khirnov.net> 2024-10-07 10:45:07 +0200
commit: 794308c61b386b613f7e68b317dfa7debddf2f07 (patch)
tree: e6cdebbf1d5d036ba001e383aaa58eb3a6d03ff3
parent: f339169f35bc3edd250062d613a7de957ad5a877 (diff)
download: ffmpeg-794308c61b386b613f7e68b317dfa7debddf2f07.tar.gz
1 files changed, 366 insertions, 101 deletions
diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index de140067ae..1bb322634d 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -87,140 +87,405 @@ The format option may be needed for raw input files.
 @chapter Detailed description
 @c man begin DETAILED DESCRIPTION
 
-The transcoding process in @command{ffmpeg} for each output can be described by
-the following diagram:
+@command{ffmpeg} builds a transcoding pipeline out of the components listed
+below. The program's operation then consists of input data chunks flowing from
+the sources down the pipes towards the sinks, while being transformed by the
+components they encounter along the way.
 
-@verbatim
- _______              ______________
-|       |            |              |
-| input |  demuxer   | encoded data |   decoder
-| file  | ---------> | packets      | -----+
-|_______|            |______________|      |
-                                           v
-                                       _________
-                                      |         |
-                                      | decoded |
-                                      | frames  |
-                                      |_________|
- ________             ______________       |
-|        |           |              |      |
-| output | <-------- | encoded data | <----+
-| file   |   muxer   | packets      |   encoder
-|________|           |______________|
+The following kinds of components are available:
+@itemize
+@item
+@emph{Demuxers} (short for "demultiplexers") read an input source in order to
+extract
 
+@itemize
+@item
+global properties such as metadata or chapters;
+@item
+a list of input elementary streams and their properties.
+@end itemize
+
+One demuxer instance is created for each @option{-i} option, and sends encoded
+@emph{packets} to @emph{decoders} or @emph{muxers}.
+
+In other literature, demuxers are sometimes called @emph{splitters}, because
+their main function is splitting a file into elementary streams (though some
+files only contain one elementary stream).
 
+A schematic representation of a demuxer looks like this:
+@verbatim
+┌──────────┬───────────────────────┐
+│ demuxer  │                       │ packets for stream 0
+╞══════════╡ elementary stream 0   ├──────────────────────⮞
+│          │                       │
+│  global  ├───────────────────────┤
+│properties│                       │ packets for stream 1
+│   and    │ elementary stream 1   ├──────────────────────⮞
+│ metadata │                       │
+│          ├───────────────────────┤
+│          │                       │
+│          │     ...........       │
+│          │                       │
+│          ├───────────────────────┤
+│          │                       │ packets for stream N
+│          │ elementary stream N   ├──────────────────────⮞
+│          │                       │
+└──────────┴───────────────────────┘
+     ⯅
+     │
+     │ read from file, network stream,
+     │     grabbing device, etc.
+     │
 @end verbatim
 
-@command{ffmpeg} calls the libavformat library (containing demuxers) to read
-input files and get packets containing encoded data from them. When there are
-multiple input files, @command{ffmpeg} tries to keep them synchronized by
-tracking lowest timestamp on any active input stream.
+@item
+@emph{Decoders} receive encoded (compressed) @emph{packets} for an audio, video,
+or subtitle elementary stream, and decode them into raw @emph{frames} (arrays of
+pixels for video, PCM for audio). A decoder is typically associated with (and
+receives its input from) an elementary stream in a @emph{demuxer}, but sometimes
+may also exist on its own (see @ref{Loopback decoders}).
 
-Encoded packets are then passed to the decoder (unless streamcopy is selected
-for the stream, see further for a description). The decoder produces
-uncompressed frames (raw video/PCM audio/...) which can be processed further by
-filtering (see next section). After filtering, the frames are passed to the
-encoder, which encodes them and outputs encoded packets. Finally, those are
-passed to the muxer, which writes the encoded packets to the output file.
+A schematic representation of a decoder looks like this:
+@verbatim
+          ┌─────────┐
+ packets  │         │ raw frames
+─────────⮞│ decoder ├────────────⮞
+          │         │
+          └─────────┘
+@end verbatim
 
-@section Filtering
-Before encoding, @command{ffmpeg} can process raw audio and video frames using
-filters from the libavfilter library. Several chained filters form a filter
-graph. @command{ffmpeg} distinguishes between two types of filtergraphs:
-simple and complex.
+@item
+@emph{Filtergraphs} process and transform raw audio or video @emph{frames}. A
+filtergraph consists of one or more individual @emph{filters} linked into a
+graph. Filtergraphs come in two flavors - @emph{simple} and @emph{complex},
+configured with the @option{-filter} and @option{-filter_complex} options,
+respectively.
+
+A simple filtergraph is associated with an @emph{output elementary stream}; it
+receives the input to be filtered from a @emph{decoder} and sends filtered
+output to that output stream's @emph{encoder}.
+
+A simple video filtergraph that performs deinterlacing (using the @code{yadif}
+deinterlacer) followed by resizing (using the @code{scale} filter) can look like
+this:
+@verbatim
 
-@subsection Simple filtergraphs
-Simple filtergraphs are those that have exactly one input and output, both of
-the same type. In the above diagram they can be represented by simply inserting
-an additional step between decoding and encoding:
+             ┌────────────────────────┐
+             │  simple filtergraph    │
+ frames from ╞════════════════════════╡ frames for
+ a decoder   │  ┌───────┐  ┌───────┐  │ an encoder
+────────────⮞├─⮞│ yadif ├─⮞│ scale ├─⮞│────────────⮞
+             │  └───────┘  └───────┘  │
+             └────────────────────────┘
+@end verbatim
 
+A complex filtergraph is standalone and not associated with any specific stream.
+It may have multiple (or zero) inputs, potentially of different types (audio or
+video), each of which receives data either from a decoder or from another
+complex filtergraph's output. It also has one or more outputs that feed either
+an encoder or another complex filtergraph's input.
+
+The following example diagram represents a complex filtergraph with 3 inputs and
+2 outputs (all video):
 @verbatim
- _________                        ______________
-|         |                      |              |
-| decoded |                      | encoded data |
-| frames  |\                   _ | packets      |
-|_________| \                  /||______________|
-             \   __________   /
-  simple     _\||          | /  encoder
-  filtergraph   | filtered |/
-                | frames   |
-                |__________|
+          ┌─────────────────────────────────────────────────┐
+          │               complex filtergraph               │
+          ╞═════════════════════════════════════════════════╡
+ frames   ├───────┐  ┌─────────┐      ┌─────────┐  ┌────────┤ frames
+─────────⮞│input 0├─⮞│ overlay ├─────⮞│ overlay ├─⮞│output 0├────────⮞
+          ├───────┘  │         │      │         │  └────────┤
+ frames   ├───────┐╭⮞│         │    ╭⮞│         │           │
+─────────⮞│input 1├╯ └─────────┘    │ └─────────┘           │
+          ├───────┘                 │                       │
+ frames   ├───────┐ ┌─────┐ ┌─────┬─╯              ┌────────┤ frames
+─────────⮞│input 2├⮞│scale├⮞│split├───────────────⮞│output 1├────────⮞
+          ├───────┘ └─────┘ └─────┘                └────────┤
+          └─────────────────────────────────────────────────┘
+@end verbatim
+Frames from the second input are overlaid over those from the first. Frames from
+the third input are rescaled, then duplicated into two identical streams. One of
+them is overlaid over the combined first two inputs, with the result exposed as
+the filtergraph's first output. The other duplicate ends up being the
+filtergraph's second output.
 
+@item
+@emph{Encoders} receive raw audio, video, or subtitle @emph{frames} and encode
+them into encoded @emph{packets}. The encoding (compression) process is
+typically @emph{lossy} - it degrades stream quality to make the output smaller;
+some encoders are @emph{lossless}, but at the cost of much higher output size. A
+video or audio encoder receives its input from some filtergraph's output, while
+subtitle encoders receive input from a decoder (since subtitle filtering is not
+supported yet). Every encoder is associated with some muxer's @emph{output
+elementary stream} and sends its output to that muxer.
+
+A schematic representation of an encoder looks like this:
+@verbatim
+             ┌─────────┐
+ raw frames  │         │ packets
+────────────⮞│ encoder ├─────────⮞
+             │         │
+             └─────────┘
 @end verbatim
 
-Simple filtergraphs are configured with the per-stream @option{-filter} option
-(with @option{-vf} and @option{-af} aliases for video and audio respectively).
-A simple filtergraph for video can look for example like this:
+@item
+@emph{Muxers} (short for "multiplexers") receive encoded @emph{packets} for
+their elementary streams from encoders (the @emph{transcoding} path) or directly
+from demuxers (the @emph{streamcopy} path), interleave them (when there is more
+than one elementary stream), and write the resulting bytes into the output file
+(or pipe, network stream, etc.).
 
+A schematic representation of a muxer looks like this:
 @verbatim
- _______        _____________        _______        ________
-|       |      |             |      |       |      |        |
-| input | ---> | deinterlace | ---> | scale | ---> | output |
-|_______|      |_____________|      |_______|      |________|
+                       ┌──────────────────────┬───────────┐
+ packets for stream 0  │                      │   muxer   │
+──────────────────────⮞│  elementary stream 0 ╞═══════════╡
+                       │                      │           │
+                       ├──────────────────────┤  global   │
+ packets for stream 1  │                      │properties │
+──────────────────────⮞│  elementary stream 1 │   and     │
+                       │                      │ metadata  │
+                       ├──────────────────────┤           │
+                       │                      │           │
+                       │     ...........      │           │
+                       │                      │           │
+                       ├──────────────────────┤           │
+ packets for stream N  │                      │           │
+──────────────────────⮞│  elementary stream N │           │
+                       │                      │           │
+                       └──────────────────────┴─────┬─────┘
+                                                    │
+                     write to file, network stream, │
+                         grabbing device, etc.      │
+                                                    │
+                                                    ▼
+@end verbatim
+
+@end itemize
 
+@section Streamcopy
+The simplest pipeline in @command{ffmpeg} is single-stream
+@emph{streamcopy}, that is, copying the packets of one @emph{input elementary
+stream} without decoding, filtering, or encoding them. As an example, consider an input
+file called @file{INPUT.mkv} with 3 elementary streams, from which we take the
+second and write it to file @file{OUTPUT.mp4}. A schematic representation of
+such a pipeline looks like this:
+@verbatim
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │ unused
+╞══════════╡ elementary stream 0 ├────────╳
+│          │                     │
+│INPUT.mkv ├─────────────────────┤          ┌──────────────────────┬───────────┐
+│          │                     │ packets  │                      │   muxer   │
+│          │ elementary stream 1 ├─────────⮞│  elementary stream 0 ╞═══════════╡
+│          │                     │          │                      │OUTPUT.mp4 │
+│          ├─────────────────────┤          └──────────────────────┴───────────┘
+│          │                     │ unused
+│          │ elementary stream 2 ├────────╳
+│          │                     │
+└──────────┴─────────────────────┘
 @end verbatim
 
-Note that some filters change frame properties but not frame contents. E.g. the
-@code{fps} filter in the example above changes number of frames, but does not
-touch the frame contents. Another example is the @code{setpts} filter, which
-only sets timestamps and otherwise passes the frames unchanged.
+The above pipeline can be constructed with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:1 -c copy OUTPUT.mp4
+@end example
 
-@subsection Complex filtergraphs
-Complex filtergraphs are those which cannot be described as simply a linear
-processing chain applied to one stream. This is the case, for example, when the graph has
-more than one input and/or output, or when output stream type is different from
-input. They can be represented with the following diagram:
+In this commandline
+@itemize
+
+@item
+there is a single input @file{INPUT.mkv};
+
+@item
+there are no input options for this input;
+
+@item
+there is a single output @file{OUTPUT.mp4};
+
+@item
+there are two output options for this output:
 
+@itemize
+@item
+@code{-map 0:1} selects the input stream to be used - from input with index 0
+(i.e. the first one) the stream with index 1 (i.e. the second one);
+
+@item
+@code{-c copy} selects the @code{copy} encoder, i.e. streamcopy with no decoding
+or encoding.
+@end itemize
+
+@end itemize
+
+Streamcopy is useful for changing the elementary stream count or the container
+format, or for modifying container-level metadata. Since there is no decoding or
+encoding, it is very fast and there is no quality loss. However, it might not
+work in some cases because of a variety of factors (e.g. certain information
+required by the target container is not available in the source). Applying
+filters is also impossible, since filters work on decoded frames.
+
+More complex streamcopy scenarios can be constructed - e.g. combining streams
+from two input files into a single output:
 @verbatim
- _________
-|         |
-| input 0 |\                    __________
-|_________| \                  |          |
-             \   _________    /| output 0 |
-              \ |         |  / |__________|
- _________     \| complex | /
-|         |     |         |/
-| input 1 |---->| filter  |\
-|_________|     |         | \   __________
-               /| graph   |  \ |          |
-              / |         |   \| output 1 |
- _________   /  |_________|    |__________|
-|         | /
-| input 2 |/
-|_________|
+┌──────────┬────────────────────┐         ┌────────────────────┬───────────┐
+│ demuxer 0│                    │ packets │                    │   muxer   │
+╞══════════╡elementary stream 0 ├────────⮞│elementary stream 0 ╞═══════════╡
+│INPUT0.mkv│                    │         │                    │OUTPUT.mp4 │
+└──────────┴────────────────────┘         ├────────────────────┤           │
+┌──────────┬────────────────────┐         │                    │           │
+│ demuxer 1│                    │ packets │elementary stream 1 │           │
+╞══════════╡elementary stream 0 ├────────⮞│                    │           │
+│INPUT1.aac│                    │         └────────────────────┴───────────┘
+└──────────┴────────────────────┘
+@end verbatim
+that can be built by the commandline
+@example
+ffmpeg -i INPUT0.mkv -i INPUT1.aac -map 0:0 -map 1:0 -c copy OUTPUT.mp4
+@end example
+
+The output @option{-map} option is used twice here, creating two streams in the
+output file - one fed by the first input and one by the second. The single
+instance of the @option{-c} option selects streamcopy for both of those streams.
+You could also use multiple instances of this option together with
+@ref{Stream specifiers} to apply different values to each stream, as will be
+demonstrated in the following sections.
 
+A converse scenario is splitting multiple streams from a single input into
+multiple outputs:
+@verbatim
+┌──────────┬─────────────────────┐          ┌───────────────────┬───────────┐
+│ demuxer  │                     │ packets  │                   │ muxer 0   │
+╞══════════╡ elementary stream 0 ├─────────⮞│elementary stream 0╞═══════════╡
+│          │                     │          │                   │OUTPUT0.mp4│
+│INPUT.mkv ├─────────────────────┤          └───────────────────┴───────────┘
+│          │                     │ packets  ┌───────────────────┬───────────┐
+│          │ elementary stream 1 ├─────────⮞│                   │ muxer 1   │
+│          │                     │          │elementary stream 0╞═══════════╡
+└──────────┴─────────────────────┘          │                   │OUTPUT1.mp4│
+                                            └───────────────────┴───────────┘
 @end verbatim
+built with
+@example
+ffmpeg -i INPUT.mkv -map 0:0 -c copy OUTPUT0.mp4 -map 0:1 -c copy OUTPUT1.mp4
+@end example
+Note how a separate instance of the @option{-c} option is needed for every
+output file even though their values are the same. This is because non-global
+options (which is most of them) only apply in the context of the file before
+which they are placed.
 
-Complex filtergraphs are configured with the @option{-filter_complex} option.
-Note that this option is global, since a complex filtergraph, by its nature,
-cannot be unambiguously associated with a single stream or file.
+These examples can of course be further generalized into arbitrary remappings
+of any number of inputs into any number of outputs.
 
-The @option{-lavfi} option is equivalent to @option{-filter_complex}.
+@section Transcoding
+@emph{Transcoding} is the process of decoding a stream and then encoding it
+again. Since encoding tends to be computationally expensive and in most cases
+degrades the stream quality (i.e. it is @emph{lossy}), you should only transcode
+when you need to and perform streamcopy otherwise. Typical reasons to transcode
+are:
 
-A trivial example of a complex filtergraph is the @code{overlay} filter, which
-has two video inputs and one video output, containing one video overlaid on top
-of the other. Its audio counterpart is the @code{amix} filter.
+@itemize
+@item
+applying filters - e.g. resizing, deinterlacing, or overlaying video; resampling
+or mixing audio;
 
-@section Stream copy
-Stream copy is a mode selected by supplying the @code{copy} parameter to the
-@option{-codec} option. It makes @command{ffmpeg} omit the decoding and encoding
-step for the specified stream, so it does only demuxing and muxing. It is useful
-for changing the container format or modifying container-level metadata. The
-diagram above will, in this case, simplify to this:
+@item
+feeding the stream to something that cannot decode the original codec.
+@end itemize
+Note that @command{ffmpeg} will transcode all audio, video, and subtitle streams
+unless you specify @option{-c copy} for them.
 
+Consider an example pipeline that reads an input file with one audio and one
+video stream, transcodes the video and copies the audio into a single output
+file. This can be schematically represented as follows:
 @verbatim
- _______              ______________            ________
-|       |            |              |          |        |
-| input |  demuxer   | encoded data |  muxer   | output |
-| file  | ---------> | packets      | -------> | file   |
-|_______|            |______________|          |________|
+┌──────────┬─────────────────────┐
+│ demuxer  │                     │       audio packets
+╞══════════╡ stream 0 (audio)    ├─────────────────────────────────────╮
+│          │                     │                                     │
+│INPUT.mkv ├─────────────────────┤ video    ┌─────────┐     raw        │
+│          │                     │ packets  │  video  │ video frames   │
+│          │ stream 1 (video)    ├─────────⮞│ decoder ├──────────────╮ │
+│          │                     │          │         │              │ │
+└──────────┴─────────────────────┘          └─────────┘              │ │
+                                                                     ▼ ▼
+                                                                     │ │
+┌──────────┬─────────────────────┐ video    ┌─────────┐              │ │
+│ muxer    │                     │ packets  │  video  │              │ │
+╞══════════╡ stream 0 (video)    │⮜─────────┤ encoder ├──────────────╯ │
+│          │                     │          │(libx264)│                │
+│OUTPUT.mp4├─────────────────────┤          └─────────┘                │
+│          │                     │                                     │
+│          │ stream 1 (audio)    │⮜────────────────────────────────────╯
+│          │                     │
+└──────────┴─────────────────────┘
+@end verbatim
+and implemented with the following commandline:
+@example
+ffmpeg -i INPUT.mkv -map 0:v -map 0:a -c:v libx264 -c:a copy OUTPUT.mp4
+@end example
+Note how it uses stream specifiers @code{:v} and @code{:a} to select input
+streams and apply different values of the @option{-c} option to them; see the
+@ref{Stream specifiers} section for more details.
+
+
+@section Filtering
+
+When transcoding, audio and video streams can be filtered before encoding, with
+either a @emph{simple} or @emph{complex} filtergraph.
+
+@subsection Simple filtergraphs
 
+Simple filtergraphs are those that have exactly one input and output, both of
+the same type (audio or video). They are configured with the per-stream
+@option{-filter} option (with @option{-vf} and @option{-af} aliases for
+@option{-filter:v} (video) and @option{-filter:a} (audio) respectively). Note
+that simple filtergraphs are tied to their output stream, so e.g. if you have
+multiple audio streams, @option{-af} will create a separate filtergraph for each
+one.
+
+Taking the transcoding example from above, adding filtering (and omitting audio,
+for clarity) makes it look like this:
+@verbatim
+┌──────────┬───────────────┐
+│ demuxer  │               │          ┌─────────┐
+╞══════════╡ video stream  │ packets  │  video  │ frames
+│INPUT.mkv │               ├─────────⮞│ decoder ├─────⮞───╮
+│          │               │          └─────────┘         │
+└──────────┴───────────────┘                              │
+                                  ╭───────────⮜───────────╯
+                                  │   ┌────────────────────────┐
+                                  │   │  simple filtergraph    │
+                                  │   ╞════════════════════════╡
+                                  │   │  ┌───────┐  ┌───────┐  │
+                                  ╰──⮞├─⮞│ yadif ├─⮞│ scale ├─⮞├╮
+                                      │  └───────┘  └───────┘  ││
+                                      └────────────────────────┘│
+                                                                │
+                                                                │
+┌──────────┬───────────────┐ video    ┌─────────┐               │
+│ muxer    │               │ packets  │  video  │               │
+╞══════════╡ video stream  │⮜─────────┤ encoder ├───────⮜───────╯
+│OUTPUT.mp4│               │          │         │
+│          │               │          └─────────┘
+└──────────┴───────────────┘
 @end verbatim
 
-Since there is no decoding or encoding, it is very fast and there is no quality
-loss. However, it might not work in some cases because of many factors. Applying
-filters is obviously also impossible, since filters work on uncompressed data.
+@subsection Complex filtergraphs
+
+Complex filtergraphs are those which cannot be described as simply a linear
+processing chain applied to one stream. This is the case, for example, when the
+graph has more than one input and/or output, or when output stream type is
+different from input. Complex filtergraphs are configured with the
+@option{-filter_complex} option. Note that this option is global, since a
+complex filtergraph, by its nature, cannot be unambiguously associated with a
+single stream or file. Each instance of @option{-filter_complex} creates a new
+complex filtergraph, and there can be any number of them.
+
+A trivial example of a complex filtergraph is the @code{overlay} filter, which
+has two video inputs and one video output, containing one video overlaid on top
+of the other. Its audio counterpart is the @code{amix} filter.
 
+@anchor{Loopback decoders}
 @section Loopback decoders
 While decoders are normally associated with demuxer streams, it is also possible
 to create "loopback" decoders that decode the output from some encoder and allow
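
The rule stated in the new Streamcopy section, that non-global options apply only to the file before which they are placed, can be illustrated with a small sketch that assembles the command lines quoted in the patch. This is not part of the commit: `Output` and `build_argv` are hypothetical helper names introduced here for illustration, and the sketch only builds argument lists without running `ffmpeg`. It assumes that a stream specifier is appended to `-c` after a colon, as in the `-c:v` and `-c:a` forms used in the patch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Output:
    """One output file plus the options that precede it (hypothetical helper)."""
    path: str
    maps: list[str] = field(default_factory=list)          # values for -map
    codecs: dict[str, str] = field(default_factory=dict)   # stream specifier -> -c value


def build_argv(inputs: list[str], outputs: list[Output]) -> list[str]:
    """Assemble an ffmpeg argument list: inputs first, then each output's
    options immediately followed by that output's file name."""
    argv = ["ffmpeg"]
    for path in inputs:
        argv += ["-i", path]
    for out in outputs:
        for stream in out.maps:
            argv += ["-map", stream]
        for spec, codec in out.codecs.items():
            # An empty specifier means the option applies to all streams.
            argv += ["-c" if spec == "" else f"-c:{spec}", codec]
        argv.append(out.path)
    return argv


# Single-stream streamcopy, as in the first Streamcopy example.
print(" ".join(build_argv(
    ["INPUT.mkv"],
    [Output("OUTPUT.mp4", ["0:1"], {"": "copy"})],
)))

# Transcode video, copy audio, as in the Transcoding example.
print(" ".join(build_argv(
    ["INPUT.mkv"],
    [Output("OUTPUT.mp4", ["0:v", "0:a"], {"v": "libx264", "a": "copy"})],
)))
```

Because each `Output` carries its own options, splitting one input into two outputs naturally repeats `-c copy` once per output file, which matches the note in the patch about needing a separate instance of the option for every output.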