This document serves as a starting point for understanding the design and implementation of the Ogg container format. If you're new to Ogg or merely want a high-level technical overview, start reading here. Other documents linked from the index page give distilled technical descriptions and references of the container mechanisms. This document is intended to aid understanding.
Ogg is intended to be a simplest-possible container, concerned only with framing, ordering, and interleave. It can be used as a stream delivery mechanism, for media file storage, or as a building block toward implementing a more complex, non-linear container (for example, see the Skeleton or Annodex/CMML).
The Ogg container is not intended to be a monolithic 'kitchen-sink'. It exists only to frame and deliver in-order stream data and as such is vastly simpler than most other containers. Elementary and multiplexed streams are both constructed entirely from a single building block (an Ogg page) composed of eight fields totalling twenty-seven bytes (the page header), a list of packet lengths (up to 255 bytes), and payload data (up to 65025 bytes). The structure of every page is the same. There are no optional fields or alternate encodings.
Stream and media metadata is carried inside Ogg rather than built into the Ogg container itself. Metadata is thus compartmentalized and layered rather than part of a monolithic design, an especially good idea as no two groups seem able to agree on what a complete or complete-enough metadata set should be. In this way, the container and container implementation are isolated from unnecessary metadata design flux.
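The page layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference parser: the field order and widths are taken from the published Ogg framing layout (RFC 3533) and add up to 27 bytes, while the names `PageHeader` and `parse_page` are hypothetical and no checksum verification is attempted.

```python
import struct
from typing import NamedTuple

# Fixed part of the page header, little-endian, no padding:
# capture pattern (4s), version (B), header-type flags (B),
# granule position (q), stream serial (I), page sequence (I),
# CRC (I), number of lacing values (B).
HEADER = struct.Struct("<4sBBqIIIB")


class PageHeader(NamedTuple):  # hypothetical name, for illustration only
    capture: bytes
    version: int
    flags: int
    granule_position: int
    serial: int
    sequence: int
    crc: int
    segment_count: int


def parse_page(buf: bytes, offset: int = 0):
    """Split one page into (header, lacing table, payload)."""
    hdr = PageHeader(*HEADER.unpack_from(buf, offset))
    if hdr.capture != b"OggS" or hdr.version != 0:
        raise ValueError("not an Ogg page")
    table_start = offset + HEADER.size
    lacing = bytes(buf[table_start:table_start + hdr.segment_count])
    if len(lacing) != hdr.segment_count:
        raise ValueError("truncated lacing table")
    body_start = table_start + hdr.segment_count
    body = bytes(buf[body_start:body_start + sum(lacing)])
    if len(body) != sum(lacing):
        raise ValueError("truncated payload")
    return hdr, lacing, body
```

Because every page has the same shape, the same function serves elementary and multiplexed streams alike; only the `serial` field differs between streams.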
The Ogg container is primarily a streaming format, encapsulating chronological, time-linear mixed media into a single delivery stream or file. The design is such that an application can always encode and/or decode all features of a bitstream in one pass with no seeking and minimal buffering. Seeking to provide optimized encoding (such as two-pass encoding) or interactive decoding (such as scrubbing or instant replay) is not disallowed or discouraged; however, no container feature requires nonlinear access of the bitstream.
Ogg is designed to contain any size data payload with bounded, predictable efficiency. Ogg packets have no maximum size and a zero-byte minimum size. There is no restriction on size changes from packet to packet. Variable size packets do not require the use of any optional or additional container features. There is no optimal suggested packet size, though special consideration was paid to make sure 50-200 byte packets were no less efficient than larger packet sizes. The original design criterion was a 2% overhead at 50 byte packets, dropping to a maximum working overhead of 1% with larger packets, and a typical working overhead of 0.5-0.7% for most practical uses.
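The overhead figures can be checked with some rough arithmetic. The sketch below assumes a 27-byte page header, one lacing byte per started 255 bytes of packet data (plus the terminating value), and pages that are always filled to the 255-entry limit of the lacing table. Real encoders usually flush pages earlier, so this is a best-case estimate; `stream_overhead` is a hypothetical helper name.

```python
import math

HEADER_BYTES = 27    # fixed page header
MAX_SEGMENTS = 255   # lacing values per page


def lacing_bytes(packet_size: int) -> int:
    """Lacing values needed for one packet: a run of 255s plus a final value < 255."""
    return packet_size // 255 + 1


def stream_overhead(packet_size: int, packet_count: int = 10_000) -> float:
    """Framing bytes divided by payload bytes for a run of equal-size packets.

    Assumes every page is packed to the full 255 lacing values (best case).
    """
    if packet_size <= 0 or packet_count <= 0:
        raise ValueError("sizes must be positive")
    segments = lacing_bytes(packet_size) * packet_count
    pages = math.ceil(segments / MAX_SEGMENTS)
    framing = pages * HEADER_BYTES + segments
    return framing / (packet_size * packet_count)


for size in (50, 200, 4096):
    print(size, f"{stream_overhead(size):.2%}")
```

Under these assumptions 50-byte packets come out a little above 2% and 4096-byte packets a little under 0.5%, which is in line with the design targets quoted above.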
Ogg is a byte-aligned container with no context-dependent, optional or variable-length fields. Ogg requires no repacking of codec data. The page structure is written out in-line as packet data is submitted to the streaming abstraction. In addition, it is possible to implement both Ogg mux and demux as MT-hot zero-copy abstractions (as is done in the Tremor sourcebase).
Ogg is designed for efficient and immediate stream capture with high confidence. Although packets have no size limit in Ogg, pages are a maximum of just under 64kB, meaning that any Ogg stream can be captured with confidence after seeing 128kB of data or less [worst case; typical figure is 6kB] from any random starting point in the stream.
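Capture from an arbitrary offset can be sketched as "scan for the capture pattern, then confirm with the page checksum". The code below is a minimal illustration under stated assumptions: the `OggS` pattern, the checksum at byte offset 22, and the CRC parameters (polynomial 0x04C11DB7, zero initial value, no bit reflection, computed with the checksum field zeroed) follow the Ogg framing specification as commonly documented; `find_page` and `page_end` are hypothetical names, and the bitwise CRC is written for clarity rather than speed.

```python
def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32, polynomial 0x04C11DB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
    return crc


def page_end(buf: bytes, pos: int):
    """Return the end offset of a candidate page at pos, or None if truncated."""
    if pos + 27 > len(buf):
        return None
    nsegs = buf[pos + 26]
    table = buf[pos + 27:pos + 27 + nsegs]
    if len(table) < nsegs:
        return None
    end = pos + 27 + nsegs + sum(table)
    return end if end <= len(buf) else None


def find_page(buf: bytes, start: int = 0):
    """Return (begin, end) of the first checksum-verified page at or after start."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        end = page_end(buf, pos)
        if end is not None:
            page = bytearray(buf[pos:end])
            stored = int.from_bytes(page[22:26], "little")
            page[22:26] = b"\x00\x00\x00\x00"  # checksum is computed with this field zeroed
            if ogg_crc(bytes(page)) == stored:
                return pos, end
        pos = buf.find(b"OggS", pos + 1)  # false match: keep scanning
    return None
```

Since a page is at most just under 64kB, a reader that starts mid-page needs to look at no more than about two pages' worth of data before `find_page` either succeeds or the data is known to be damaged.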
Ogg implements simple coarse- and fine-grained seeking by design.
Coarse seeking may be performed by simply 'moving the tone arm' to a new position and 'dropping the needle'. Rapid capture with accompanying timecode from any location in an Ogg file is guaranteed by the stream design. From the acquisition of the first timecode, all data needed to play back from that time code forward is ahead of the stream cursor.
Ogg implements full sample-granularity seeking using an interpolated bisection search built on the capture and timecode mechanisms used by coarse seeking. As above, once a search finds the desired timecode, all data needed to play back from that time code forward is ahead of the stream cursor.
Both coarse and fine seeking use the page structure and sequencing inherent to the Ogg format. All Ogg streams are fully seekable from creation; seekability is unaffected by truncation or missing data, and is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor heuristic.
Seeking without use of an index is a major point of the Ogg design. There are two primary reasons why Ogg transport forgoes an index:
Ogg multiplexes streams by interleaving pages from multiple elementary streams into a multiplexed stream in time order. The multiplexed pages are not altered. Muxing an Ogg AV stream out of separate audio, video and data streams is akin to shuffling several decks of cards together into a single deck; the cards themselves remain unchanged. Demultiplexing is similarly simple (as the cards are marked).
The goal of this design is to make the mux/demux operation as trivial as possible to allow live streaming systems to build and rebuild streams on the fly with minimal CPU usage and no additional storage or latency requirements.
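The card-shuffling analogy maps directly onto a sorted merge. The sketch below is a simplification: it assumes each page already carries a timestamp in seconds (in real Ogg that value comes from the codec's interpretation of the granule position) and it ignores the rules about header pages. `Page` and `mux` are hypothetical names.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Page(NamedTuple):  # hypothetical stand-in for a complete Ogg page
    time: float   # timestamp in seconds, assumed to be supplied by the codec
    serial: int   # stream serial number stamped on the page
    data: bytes   # the page bytes, never modified by the mux


def mux(*streams: Iterable[Page]) -> List[Page]:
    """Interleave already time-ordered elementary streams into one stream."""
    return list(heapq.merge(*streams, key=lambda page: page.time))


audio = [Page(0.0, 1, b"a0"), Page(1.0, 1, b"a1")]
video = [Page(0.5, 2, b"v0"), Page(1.5, 2, b"v1")]
print([page.data for page in mux(audio, video)])
```

The merge only compares timestamps and yields the original page objects, so nothing is re-encoded or copied beyond the references themselves.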
Ogg streams belong to one of two categories, "Continuous" streams and "Discontinuous" streams.
A stream that provides a gapless, time-continuous media type with a fine-grained timebase is considered to be 'Continuous'. A continuous stream should never be starved of data. Examples of continuous data types include broadcast audio and video.
A stream that delivers data in a potentially irregular pattern or with widely spaced timing gaps is considered to be 'Discontinuous'. A discontinuous stream may be best thought of as data representing scattered events; although they happen in order, they are typically unconnected data often located far apart. One example of a discontinuous stream type would be captioning such as Ogg Kate. Although it's possible to design captions as a continuous stream type, it's most natural to think of captions as widely spaced pieces of text with little happening between.
The fundamental reason for distinction between continuous and discontinuous streams concerns buffering.
A continuous stream is, by definition, gapless. Ogg buffering is based on the simple premise of never allowing an active continuous stream to starve for data during decode; buffering works ahead until all continuous streams in a physical stream have data ready and no further.
Discontinuous stream data is not assumed to be predictable. The buffering design takes discontinuous data 'as it comes' rather than working ahead to look for future discontinuous data for a potentially unbounded period. Thus, the buffering process makes no attempt to fill discontinuous stream buffers; their pages simply 'fall out' of the stream when continuous streams are handled properly.
Buffering requirements in this design need not be explicitly declared or managed in the encoded stream. The decoder simply reads as much data as is necessary to keep all continuous stream types gapless and no more, with discontinuous data processed as it arrives in the continuous data. Buffering is implicitly optimal for the given stream. Because all pages of all data types are stamped with absolute timing information within the stream, inter-stream synchronization timing is always maintained without the need for explicitly declared buffer-ahead hinting.
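The buffering rule above can be illustrated with a small read-ahead loop. This is only a sketch of the idea, not how any particular decoder is written: it assumes pages arrive as `(serial, page)` pairs from an iterator and that the caller already knows which serial numbers belong to continuous streams. `read_ahead` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Set, Tuple


def read_ahead(pages: Iterator[Tuple[int, str]],
               continuous: Set[int],
               buffers: Dict[int, Deque[str]]) -> int:
    """Pull pages until every continuous stream has data ready, and no further.

    Pages of discontinuous streams are buffered as they are encountered;
    the loop never reads on just to look for more of them.
    Returns the number of pages consumed.
    """
    consumed = 0
    while not all(buffers.get(serial) for serial in continuous):
        try:
            serial, page = next(pages)
        except StopIteration:
            break  # end of stream: nothing more to buffer
        buffers.setdefault(serial, deque()).append(page)
        consumed += 1
    return consumed


stream = iter([(1, "audio0"), (3, "caption0"), (1, "audio1"), (2, "video0"), (1, "audio2")])
buffers: Dict[int, Deque[str]] = {}
print(read_ahead(stream, {1, 2}, buffers), dict(buffers))
```

In the example, reading stops as soon as both the audio (serial 1) and video (serial 2) buffers hold a page; the caption page for serial 3 is picked up along the way without influencing how far the loop reads.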
Ogg does not replicate codec-specific metadata into the mux layer in an attempt to make the mux and codec layer implementations 'fully separable'. Things like specific timebase, keyframing strategy, frame duration, etc., do not appear in the Ogg container. The mux layer is, instead, expected to query a codec through a centralized interface, left to the implementation, for this data when it is needed.
Though modern design wisdom usually prefers to predict all possible needs of current and future codecs, then embed these dependencies and the required metadata into the container itself, this strategy increases container specification complexity, fragility, and rigidity. The mux and codec code becomes more independent, but the specifications become logically less independent. A codec can't do what a container hasn't already provided for. Novel codecs are harder to support, and you can do fewer useful things with the ones you've already got (e.g., try to make a good splitter without using any codecs. Such a splitter is limited to splitting at keyframes only, or building yet another new mechanism into the container layer to mark what frames to skip displaying).
Ogg's design goes in the opposite direction, where the specification is to be as simple, easy to understand, and 'proofed' against novel codecs as possible. When an Ogg mux layer requires codec-specific information, it queries the codec (or a codec stub). This trades a more complex implementation for a simpler, more flexible specification.
The Ogg container itself does not define a metadata system for declaring the structure and interrelations between multiple media types in a muxed stream. That is, the Ogg container itself does not specify data like 'which stream is the subtitle stream?' or 'which video stream is the primary angle?'. This metadata still exists, but is stored by the Ogg container rather than being built into the Ogg container itself. Xiph specifies the 'Skeleton' metadata format for Ogg streams, but this decoupling of container and stream structure metadata means it is possible to use Ogg with any metadata specification without altering the container itself, or without stream structure metadata at all.
Every Ogg page is stamped with a 64-bit 'granule position' that serves as an absolute timestamp for mux and seeking. A few nifty little tricks are usually also embedded in the granpos state, but we'll leave those aside for the moment (strictly speaking, they're part of each codec's mapping, not Ogg).
As mentioned above, granule positions are mapped into absolute timestamps by the codec, rather than being a hard timestamp. This allows maximally efficient use of the available 64 bits to address every sample/frame position without approximation while supporting new and previously unknown timebase encodings without needing to extend or update the mux layer. When a codec needs a novel timebase, it simply brings the code for that mapping along with it. This is not a theoretical curiosity; new, wholly novel timebases were deployed with the adoption of both Theora and Dirac. "Rolling INTRA" (keyframeless video) also benefits from novel use of the granule position.
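To make the idea of a codec-supplied mapping concrete, here is a sketch of two mappings from granule position to time. The first treats the granule position as a running sample count, which is how audio codecs such as Vorbis use it. The second is a simplified split-field scheme in the style of Theora, where the upper bits hold the frame number of the last keyframe and the lower bits count frames since then; the exact bit split and any frame-numbering offsets are codec header details, so the `shift`, `fps_num` and `fps_den` parameters here are assumptions, as are the function names.

```python
from fractions import Fraction


def sample_granule_to_seconds(granule: int, sample_rate: int) -> Fraction:
    """Granule position as a sample count: time is samples / rate."""
    return Fraction(granule, sample_rate)


def split_granule_to_frame(granule: int, shift: int) -> int:
    """Split-field granule: keyframe index in the high bits, offset in the low bits."""
    keyframe = granule >> shift
    frames_since_keyframe = granule & ((1 << shift) - 1)
    return keyframe + frames_since_keyframe


def split_granule_to_seconds(granule: int, shift: int,
                             fps_num: int, fps_den: int) -> Fraction:
    """Convert a split-field granule to seconds using a rational frame rate."""
    return Fraction(split_granule_to_frame(granule, shift) * fps_den, fps_num)


print(sample_granule_to_seconds(144_000, 48_000))            # audio page
print(split_granule_to_seconds((100 << 6) | 5, 6, 25, 1))    # video page
```

The mux layer never needs to know which of these schemes a stream uses; it only asks the codec (or a stub like the functions above) for the resulting time.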
Ogg codecs place raw compressed data into packets. Packets are octet payloads containing the data needed for a single decompressed unit, e.g., one video frame. Packets have no maximum size and may be zero length. They do not generally have any framing information; strung together, the unframed packets form a logical bitstream of codec data with no internal landmarks.
Packets of raw codec data are not typically internally framed. When they are strung together into a stream without any container to provide framing, they lose their individual boundaries. Seek and capture are not possible within an unframed stream, and for many codecs with variable length payloads and/or early-packet termination (such as Vorbis), it may become impossible to recover the original frame boundaries even if the stream is scanned linearly from beginning to end.
Logical bitstream packets are grouped and framed into Ogg pages along with a unique stream serial number to produce a physical bitstream. An elementary stream is a physical bitstream containing only a single logical bitstream. Each page is a self-contained entity, although a packet may be split and encoded across one or more pages. The page decode mechanism is designed to recognize, verify and handle single pages at a time from the overall bitstream.
The primary purpose of a container is to provide framing for raw packets, marking the packet boundaries so the exact packets can be retrieved for decode later. The container also provides secondary functions such as capture, timestamping, sequencing, stream identification and so on. Not all of these functions are represented in the diagram.
In the Ogg container, pages do not necessarily contain integer numbers of packets. Packets may span across page boundaries or even multiple pages. This is necessary as pages have a maximum possible size in order to provide capture guarantees, but packet size is unbounded.
Ogg Bitstream Framing specifies the page format of an Ogg bitstream, the packet coding process and elementary bitstreams in detail.
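How packet boundaries survive being spread over pages can be sketched with the lacing scheme from the framing specification: a packet is recorded as a run of 255-valued entries followed by one entry below 255 (possibly 0), and a page's table holds at most 255 entries. The code below is an illustration under that assumption; it fills every table completely, which real encoders generally do not, and the function names are hypothetical.

```python
from typing import Iterable, List


def lacing_values(packet_len: int) -> List[int]:
    """Lacing entries for one packet: 255 repeated, then a terminator < 255."""
    return [255] * (packet_len // 255) + [packet_len % 255]


def paginate(packets: Iterable[bytes], max_segments: int = 255) -> List[List[int]]:
    """Distribute lacing entries over per-page tables; packets may span pages."""
    pages: List[List[int]] = []
    table: List[int] = []
    for packet in packets:
        for value in lacing_values(len(packet)):
            if len(table) == max_segments:  # table full: continue on a new page
                pages.append(table)
                table = []
            table.append(value)
    if table:
        pages.append(table)
    return pages


def packet_lengths(tables: Iterable[List[int]]) -> List[int]:
    """Recover the original packet lengths from the tables, across page breaks."""
    lengths: List[int] = []
    current = 0
    for table in tables:
        for value in table:
            current += value
            if value < 255:  # an entry below 255 ends the packet
                lengths.append(current)
                current = 0
    return lengths


tables = paginate([b"x" * 70000, b"y" * 10, b""])
print(len(tables), packet_lengths(tables))
```

A 70000-byte packet needs more lacing entries than one page can hold, so it continues onto a second page, yet the decoder still recovers the exact lengths, including the zero-length packet at the end.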
Multiple logical/elementary bitstreams can be combined into a single multiplexed bitstream by interleaving whole pages from each contributing elementary stream in time order. The result is a single physical stream that multiplexes and frames multiple logical streams. Each logical stream is identified by the unique stream serial number stamped in its pages. A physical stream may include a 'meta-header' (such as the Ogg Skeleton) comprising its own Ogg page at the beginning of the physical stream. A decoder recovers the original logical/elementary bitstreams out of the physical bitstream by taking the pages in order from the physical bitstream and redirecting them into the appropriate logical decoding entity.
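The demultiplexing step described above amounts to reading the serial number from each page and routing the page accordingly. The following sketch assumes a clean, in-memory physical bitstream that starts on a page boundary and uses the header offsets from the framing specification (lacing count at byte 26, serial number at bytes 14 to 17, little-endian); it skips checksum verification, and `split_pages` and `demux` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterator, List


def split_pages(buf: bytes) -> Iterator[bytes]:
    """Yield each complete page from a buffer that starts on a page boundary."""
    pos = 0
    while pos < len(buf):
        if len(buf) < pos + 27 or buf[pos:pos + 4] != b"OggS":
            raise ValueError(f"no page header at offset {pos}")
        nsegs = buf[pos + 26]
        body_len = sum(buf[pos + 27:pos + 27 + nsegs])
        end = pos + 27 + nsegs + body_len
        if end > len(buf):
            raise ValueError("truncated page")
        yield buf[pos:end]
        pos = end


def demux(buf: bytes) -> Dict[int, List[bytes]]:
    """Group unmodified pages by the stream serial number stamped in them."""
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for page in split_pages(buf):
        serial = int.from_bytes(page[14:18], "little")
        streams[serial].append(page)
    return dict(streams)
```

Concatenating the pages stored under one serial number yields that stream's elementary bitstream again, since the pages were never altered by multiplexing.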
Multiple media types are multiplexed into a single Ogg stream by interleaving the pages from each elementary physical stream.
Ogg Bitstream Multiplexing specifies proper multiplexing of an Ogg bitstream in detail.
Multiple Ogg physical bitstreams may be concatenated into a single new stream; this is chaining. The bitstreams do not overlap; the final page of a given logical bitstream is immediately followed by the initial page of the next.
Each logical bitstream in a chain must have a unique serial number within the scope of the full physical bitstream, not only within a particular link or segment of the chain.
Within Ogg, each stream must be declared (by the codec) to be continuous- or discontinuous-time. Most codecs treat all streams they use as either inherently continuous- or discontinuous-time, although this is not a requirement. A codec may, as part of its mapping, choose according to data in the initial header.
Continuous-time pages are stamped by end-time; discontinuous pages are stamped by begin-time. Pages in a multiplexed stream are interleaved in order of the time stamp regardless of stream type. Both continuous and discontinuous logical streams are used to seek within a physical stream; however, only continuous streams are used to determine buffering depth; because discontinuous streams are stamped by start time, they will always 'fall out' at the proper time when buffering the continuous streams. See 'Examples' for an illustration of the buffering mechanism.
Multiplexing requirements within Ogg are straightforward. When constructing a single-link (unchained) physical bitstream consisting of multiple elementary streams:
Multiplexed and/or unmultiplexed bitstreams may be chained consecutively. Such a physical bitstream obeys all the rules of both chained and multiplexed streams. Each link, when unchained, must stand on its own as a valid physical bitstream. Chained streams do not mix or interleave; a new segment may not begin until all streams in the preceding segment have terminated.
Each codec is allowed some freedom in deciding how its logical bitstream is encapsulated into an Ogg bitstream (even if it is a trivial mapping, e.g., 'plop the packets in and go'). This is the codec's mapping. Ogg imposes a few mapping requirements on any codec.
[More to come shortly; this section is currently being revised and expanded]
Below, we present an example of a multiplexed and chained bitstream: