| Topic: |
| |
| Sample granularity editing of a Vorbis file; inferred arbitrary sample |
| length starting offsets / PCM stream lengths |
| |
| Overview: |
| |
| Vorbis, like mp3, is a frame-based* audio compression where audio is |
| broken up into discrete short time segments. These segments are |
| 'atomic' that is, one must recover the entire short time segment from |
| the frame packet; there's no way to recover only a part of the PCM time |
| segment from part of the coded packet without expanding the entire |
| packet and then discarding a portion of the resulting PCM audio. |
| |
| * In mp3, the data segment representing a given time period is called |
| a 'frame'; the roughly equivalent Vorbis construct is a 'packet'. |
| |
| Thus, when we edit a Vorbis stream, the finest physical editing |
| granularity is on these packet boundaries (the mp3 case is |
| actually somewhat more complex and mp3 editing is more complicated |
| than just snipping on a frame boundary because time data can be spread |
| backward or forward over frames. In Vorbis, packets are all |
| stand-alone). Thus, at the physical packet level, Vorbis is still |
| limited to streams that contain an integral number of packets. |
| |
| However, Vorbis streams may still exactly represent and be edited to a |
| PCM stream of arbitrary length and starting offset without padding the |
| beginning or end of the decoded stream or requiring that the desired |
| edit points be packet aligned. Vorbis makes use of Ogg stream |
| framing, and this framing provides time-stamping data, called a |
| 'granule position'; our starting offset and finished stream length may |
| be inferred from correct usage of the granule position data. |
| |
| Time stamping mechanism: |
| |
| Vorbis packets are bundled into into Ogg pages (note that pages do not |
| necessarily contain integral numbers of packets, but that isn't |
| inportant in this discussion. More about Ogg framing can be found in |
| ogg/doc/framing.html). Each page that contains a packet boundary is |
| stamped with the absolute sample-granularity offset of the data, that |
| is, 'complete samples-to-date' up to the last completed packet of that |
| page. (The same mechanism is used for eg, video, where the number |
| represents complete 2-D frames, and so on). |
| |
| (It's possible but rare for a packet to span more than two pages such |
| that page[s] in the middle have no packet boundary; these packets have |
| a granule position of '-1'.) |
| |
| This granule position mechaism in Ogg is used by Vorbis to indicate when the |
| PCM data intended to be represented in a Vorbis segment begins a |
| number of samples into the data represented by the first packet[s] |
| and/or ends before the physical PCM data represented in the last |
| packet[s]. |
| |
| File length a non-integral number of frames: |
| |
| A file to be encoded in Vorbis will probably not encode into an |
| integral number of packets; such a file is encoded with the last |
| packet containing 'extra'* samples. These samples are not padding; they |
| will be discarded in decode. |
| |
| *(For best results, the encoder should use extra samples that preserve |
| the character of the last frame. Simply setting them to zero will |
| introduce a 'cliff' that's hard to encode, resulting in spread-frame |
| noise. Libvorbis extrapolates the last frame past the end of data to |
| produce the extra samples. Even simply duplicating the last value is |
| better than clamping the signal to zero). |
| |
| The encoder indicates to the decoder that the file is actually shorter |
| than all of the samples ('original' + 'extra') by setting the granule |
| position in the last page to a short value, that is, the last |
| timestamp is the original length of the file discarding extra samples. |
| The decoder will see that the number of samples it has decoded in the |
| last page is too many; it is 'original' + 'extra', where the |
| granulepos says that through the last packet we only have 'original' |
| number of samples. The decoder then ignores the 'extra' samples. |
| This behavior is to occur only when the end-of-stream bit is set in |
| the page (indicating last page of the logical stream). |
| |
| Note that it not legal for the granule position of the last page to |
| indicate that there are more samples in the file than actually exist, |
| however, implementations should handle such an illegal file gracefully |
| in the interests of robust programming. |
| |
| Beginning point not on integral packet boundary: |
| |
| It is possible that we will the PCM data represented by a Vorbis |
| stream to begin at a position later than where the decoded PCM data |
| really begins after an integral packet boundary, a situation analagous |
| to the above description where the PCM data does not end at an |
| integral packet boundary. The easiest example is taking a clip out of |
| a larger Vorbis stream, and choosing a beginning point of the clip |
| that is not on a packet boundary; we need to ignore a few samples to |
| get the desired beginning point. |
| |
| The process of marking the desired beginning point is similar to |
| marking an arbitrary ending point. If the encoder wishes sample zero |
| to be some location past the actual beginning of data, it associates a |
| 'short' granule position value with the completion of the second* |
| audio packet. The granule position is associated with the second |
| packet simply by making sure the second packet completes its page. |
| |
| *(We associate the short value with the second packet for two reasons. |
| a) The first packet only primes the overlap/add buffer. No data is |
| returned before decoding the second packet; this places the decision |
| information at the point of decision. b) Placing the short value on |
| the first packet would make the value negative (as the first packet |
| normally represents position zero); a negative value would break the |
| requirement that granule positions increase; the headers have |
| position values of zero) |
| |
| The decoder sees that on the first page that will return |
| data from the overlap/add queue, we have more samples than the granule |
| position accounts for, and discards the 'surplus' from the beginning |
| of the queue. |
| |
| Note that short granule values (indicating less than the actually |
| returned about of data) are not legal in the Vorbis spec outside of |
| indicating beginning and ending sample positions. However, decoders |
| should, at minimum, tolerate inadvertant short values elsewhere in the |
| stream (just as they should tolerate out-of-order/non-increasing |
| granulepos values, although this too is illegal). |
| |
| Beginning point at arbitrary positive timestamp (no 'zero' sample): |
| |
| It's also possible that the granule position of the first page of an |
| audio stream is a 'long value', that is, a value larger than the |
| amount of PCM audio decoded. This implies only that we are starting |
| playback at some point into the logical stream, a potentially common |
| occurence in streaming applications where the decoder may be |
| connecting into a live stream. The decoder should not treat the long |
| value specially. |
| |
| A long value elsewhere in the stream would normally occur only when a |
| page is lost or out of sequence, as indicated by the page's sequence |
| number. A long value under any other situation is not legal, however |
| a decoder should tolerate both possibilities. |
| |
| |