Multi-Grammy Winner, Music Producer & Engineer, and Equipment and Studio Design Engineer


An interview with George Massenburg on surround sound and allied topics is available at www.tmhlabs.com/pub.

5 Delivery Formats

Tips from This Chapter

• The multichannel digital audio consumer media today are Digital Versatile Disc Video (DVD-V), Blu-ray, HD DVD, terrestrial over-the-air and satellite broadcasting, and possibly delivery of these by cable, either copper or fiber optic. Internet downloadable movies are beginning to appear, and consumers expect them to match the facilities of at least one audio stream that the competing media offer.

• Metadata (data about the audio "payload" data), wrappers (the area in a digital bitstream to record the metadata), and data essence (the audio payload or program) are defined.

• Linear PCM (LPCM) has been well studied and characterized, and the factors characterizing it include sample rate (see Appendix 1), word length (see Appendix 2), and the number of audio channels. Redundancy in audio may be exploited to do bit packing much like Zip files do for documents; the underlying audio coding is completely preserved through such processes.

• Word length needs to be longer in the professional domain than on the release media, so that the release may achieve the dynamic range implied by its word length, considering the effects of adding channels together in multitrack mixing.

• Products may advertise longer word lengths than are sensible given their actual dynamic range, because many of the least significant bits may contain only noise. Table 5-1 gives dynamic range versus the effective number of bits.

• Coders other than LPCM have application in many areas where LPCM even with bit-reduction packing is too inefficient. There are several classes of such coders, with different characteristics featuring various tradeoffs of factors such as maximum bit rate reduction, ability to edit, and ability to cascade.

Table 5-1 Number of Bits versus Dynamic Range

Effective number of bits	Dynamic range, dB*
16	93
17	99
18	105
19	111
20	117
21	123
22	129
23	135
24	141
*Includes the effect of triangular probability density function (TPDF) dither, which prevents quantization distortion and noise modulation; this dither raises the noise floor by about 4.8 dB compared with undithered quantization noise, which is why 16 bits yields 93 dB rather than 98 dB.

• One class of such coders, called perceptual coders, utilizes the masking characteristics of human listeners in the frequency and time domains, including the fact that louder sounds tend to obscure softer ones, to make more efficient use of limited channel capacity. Perceptual coders tend to offer the maximum bit rate reduction.

• Multiple tracks containing content intended to make a stereo image must be kept synchronized to the sample. Even a one-sample shift is audible as a move in a phantom image between two adjacent channels.

• Reference level on professional masters varies from -20 dBFS (Society of Motion Picture and Television Engineers, SMPTE), through -18 dBFS (EBU), up to as much as -12 dBFS (some music uses).

• Many track layouts exist, but one of the most common is the one standardized by ITU (International Telecommunication Union) and SMPTE for interchange of program accompanying pictures at least. It is L, R, C, LFE (Low Frequency Enhancement), LS, RS, with tracks 7 and 8 used variably for such ancillary uses as Lt/Rt, or Hearing Impaired (HI) and Visually Impaired (VI) mono mixes.

• Most digital video tape machines have only four audio tracks, thus need compression schemes such as Dolby E to carry 5.1-channel content (in one audio pair).

• DTV, DVD-V, HD DVD, and Blu-ray have the capability for multiple audio streams accompanying picture, which are intended to be selected by the end user.

• Metadata transmits information such as the number of channels and how they are utilized, and information about level, compression, mixdown of multichannel to stereo, and similar features.

• There are three metadata mechanisms that affect level. Dialogue normalization (dialnorm) acts to make programs more interchangeable with each other, and is required of every ATSC TV receiver. Dynamic Range Control (DRC) serves as a compression system that in selected sets may be adjusted by the end user. Mixlevel provides a means for absolute level calibration of the system, traceable to the original mix. When implemented, all three tend to improve on the conditions of NTSC broadcast audio.

• There is a flag to tell receiving equipment about the monitor system in use, whether X curve film monitoring, or "flat" studio and home monitoring. End-user equipment may make use of this flag to set playback parameters to match the program material.

• The 2-channel mode can flag the fact that the resulting mix is an Lt/Rt one intended for subsequent matrix decoding, or is conventional stereo, called Lo/Ro.

• Downmix sets parameters for the level of center and surrounds to appear in Left/Right outputs.

• Film mixes employ a different standard from home video. Thus, transfers to video must adjust the surround level down by 3dB.

• Sync problems between sound and picture are examined for DVD-V and DTV systems. There are multiple sources of error that can even include the model of player.

• Each of the features of multichannel digital audio described above has some variations when applied to DVD-V and Digital Television.

• Intellectual property protection schemes include making digital copies only under specified conditions and watermarking so that even analog copies derived from digital originals can be traced.
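The figures in Table 5-1 above can be reproduced with a short calculation. What follows is only a sketch under stated assumptions: an ideal uniform quantizer, a full-scale sine wave as the reference signal, and TPDF dither of 2 LSB peak-to-peak, which triples the total noise power (about 4.8 dB). The function name `dynamic_range_db` is illustrative and does not come from the chapter.

```python
import math


def dynamic_range_db(bits: int, tpdf_dither: bool = True) -> float:
    """Full-scale sine to noise-floor ratio, in dB, for an ideal quantizer."""
    # Undithered: 20*log10(2**bits) + 10*log10(1.5), i.e. about 6.02*bits + 1.76 dB.
    undithered = 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)
    # Assumption: TPDF dither triples the total noise power (about 4.77 dB).
    penalty = 10 * math.log10(3) if tpdf_dither else 0.0
    return undithered - penalty


for bits in range(16, 25):
    print(bits, round(dynamic_range_db(bits)))
```

Each added bit contributes about 6 dB, which is the step between adjacent rows of the table.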

Introduction

There are various delivery formats for multichannel audio available today, for broadcast media, packaged media, and downloadable media. Most of the delivery formats carry, in addition to the audio, metadata, or data about the audio data, in order that the final end-user equipment be able to best reproduce the producer's intent. So in the multichannel world, not only does information about the basic audio, such as track formats, have to be transmitted from production or postproduction to mastering and/or encoding stages, but also, information about how the production was done needs to be transmitted. While today such information has to be supplied in writing so that the person doing the mastering or encoding to the final release format can "fill in the blanks" about metadata at the input of the encoder, it is expected that this transmission of information will take place electronically in the future, as a part of a database accompanying the program. Forms that can be used to transmit the information today are given at the end of this chapter.

The various media for multichannel audio delivery to consumer end users, and their corresponding audio coding methods, in order of introduction, are as follows:

• Laser Disc: Dolby Digital, Digital Theater Systems (DTS).

• DTS CD: DTS Digital Surround (see Appendix 3).

• US Digital Television: Dolby Digital.

• Digital Versatile Disc Video (DVD-V): LPCM (Linear PCM), Dolby Digital, DTS, MPEG-2 Musicam Surround.

• Super Audio Compact Disc (SACD): DSD.

• Digital Versatile Disc Audio (DVD-A): LPCM, LPCM with MLP bit packing, and others (see Appendix 3).

• HD DVD: LPCM, Dolby Digital, Dolby Digital Plus, Dolby TrueHD, DTS Digital Surround, DTS-HD High Resolution Audio, DTS-HD Master Audio.

• Blu-ray: LPCM, Dolby Digital, Dolby Digital Plus, Dolby TrueHD, DTS Digital Surround, DTS-HD High Resolution Audio, DTS-HD Master Audio.

• Internet Downloadable Video and Audio: Various codecs.

• Digital Cinema: LPCM.

• There are proposals for multichannel digital radio. Some of them utilize subcarriers on existing radio stations and rely on Lt/Rt matrix encoding to offer a soft failure by reversion to analog when reception conditions warrant.

This chapter begins with information about new terminology, audio coding, sample rate, and word length requirements in postproduction (supplemented with Appendixes 1 and 2, respectively), inter-track synchronization requirements, and reference level issues which the casual reader may wish to skip.

New Terminology

In today's thinking, there is a lot of new terminology in use by standards committees that has not yet made it into common usage. The audio payload of a system, stripped of metadata, error codes, etc., is called data essence in this new world. I, for one, don't think practitioners are ever going to call their product data essence, but that is what the standards organizations call it. Metadata, or data about the essence, is supplied in the form of wrappers, the place where the metadata is stored in a bit stream.

Audio Coding

Audio coding methods start with LPCM, the oldest and most researched digital conversion method. The stages in converting an analog audio signal to LPCM include anti-alias filtering,¹ then sampling the signal in the time domain at a uniform sample rate, and finally quantizing or "binning" the signal in the level domain to the required word length. A uniform step-size device called a quantizer assigns the signal a number for the closest step available, with the addition of dither to linearize the "steps" of the quantizer. Appendix 1 discusses sample rate and anti-aliasing; Appendix 2 explains word length and quantization. LPCM is usually preferred for professional use up to the point of encoding for the release channel because, although it is not very efficient, it is the most mathematically simple for such processes as equalization, compared to other digital coding schemes. Also, the consequences of performing processes such as mixing are well understood, so that requirements can be set in a straightforward manner. For instance, adding two equal, high-level signals together can create a level that is greater than the coding range of one source channel, but the amount is easily predicted and can be accounted for in design by "turning the signals down" before addition. Likewise, the addition of noise due to adding source channels together is well understood, along with requirements for re-dithering level-reduced signals to prevent quantization distortion of the output DAC (see Appendix 2), although not all equipment is necessarily well designed with such issues in mind.
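The level and noise growth from summing channels can be predicted with simple arithmetic. The sketch below assumes the worst case for signal level (identical, in-phase signals, whose amplitudes add) and uncorrelated noise floors (whose powers add). The function names are illustrative only.

```python
import math


def coherent_sum_gain_db(n_channels: int) -> float:
    """Level increase when n identical, in-phase signals are added (amplitudes add)."""
    return 20 * math.log10(n_channels)


def noise_sum_gain_db(n_channels: int) -> float:
    """Noise-floor increase when n equal, uncorrelated noise floors are added (powers add)."""
    return 10 * math.log10(n_channels)


def worst_case_pre_gain(n_channels: int) -> float:
    """Linear gain applied to each source so the worst-case sum stays within full scale."""
    return 1.0 / n_channels


for n in (2, 4, 8):
    print(n, round(coherent_sum_gain_db(n), 1), round(noise_sum_gain_db(n), 1))
```

Two equal full-scale signals therefore need about 6 dB of attenuation before addition to avoid overload, while their combined noise floor rises by only about 3 dB.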

One alternate approach to LPCM is called 1-bit ΔΣ (delta-sigma) conversion. Such systems sample the audio at a much higher frequency than conventional systems, such as at 2.82 Mbits/s (hereafter Mbps) with 1-bit resolution, to produce a different set of conversion tradeoffs than LPCM. Super Audio CD by Sony and Philips uses such a system. ΔΣ systems generally increase the bit rate compared to LPCM, so are of perhaps limited appeal for multichannel audio.
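To put the bit-rate comparison in numbers, here is a small sketch. It assumes the 1-bit stream runs at 64 times 44.1 kHz, which gives the 2.82 Mbps per channel quoted above; the LPCM formats chosen for comparison are arbitrary examples, not taken from the chapter.

```python
def lpcm_rate_bps(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Raw LPCM payload rate in bits per second."""
    return sample_rate_hz * bits * channels


# Assumption: 1-bit delta-sigma stream at 64 x 44.1 kHz, per channel.
ONE_BIT_RATE_BPS = 64 * 44_100 * 1

for fs, bits in ((44_100, 16), (48_000, 24), (96_000, 24)):
    ratio = ONE_BIT_RATE_BPS / lpcm_rate_bps(fs, bits)
    print(f"{fs} Hz / {bits} bit: 1-bit stream is {ratio:.2f}x the LPCM rate")
```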

LPCM is conceptually simple, and its manipulation is well understood. On the other hand, it cannot be said to be perceptually efficient, because only its limits on frequency and dynamic ranges are adjusted to human hearing, not the basic method. In this way, it can be seen that even LPCM is "perceptually coded," that is, by selecting a sample rate, word length, and number of channels, one is tuning the choices made to human perception. In fact, DVD-A gives the producer a track-by-track decision to make on these three items, factors that can even be adjusted on the basis of particular channel combinations.

¹A steep filter is required so that content at greater than 1/2 the sample rate is excluded from the conversion process. For example, if an input 28-kHz tone were not suppressed in a system employing 48-kHz sampling, that tone would be rendered as a 20-kHz one on the output! Effectively, frequencies higher than one-half the sample rate "bounce" off the folding frequency, which is 1/2 the sample rate, and wind up at lower frequencies.
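The "bounce" off the folding frequency described in the footnote can be computed directly. This is a minimal sketch for an ideal sampler with no anti-alias filter; `aliased_frequency_hz` is an illustrative name.

```python
def aliased_frequency_hz(f_in_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone appears after sampling without an anti-alias filter."""
    # Sampling cannot distinguish frequencies that differ by a multiple of the sample rate.
    f = f_in_hz % sample_rate_hz
    # Anything above the folding frequency (half the sample rate) is mirrored back down.
    folding_hz = sample_rate_hz / 2
    return sample_rate_hz - f if f > folding_hz else f


print(aliased_frequency_hz(28_000, 48_000))
print(aliased_frequency_hz(20_000, 48_000))
```

The first call reproduces the footnote's example: an unsuppressed 28-kHz tone sampled at 48 kHz comes out at 20 kHz, indistinguishable from a genuine 20-kHz tone.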

A major problem for LPCM is that it can be said to be very inefficient when it comes to coding for human listeners. Since only the bounds are set psychoacoustically, the internal coding within the bounds could be made more efficient. More efficient coding could offer the representation of greater bandwidth, more dynamic range, and/or a larger number of channels within the constraints of a given bit rate. Thus, it may well be that the "best" digital audio method for a given channel is found to be another type of coding; for now LPCM is a conservative approach to take in original recording and mixing of the multigenerations needed in large-scale audio production. This is because LPCM can be cascaded for multiple generations with only known difficulties cropping up, such as the accumulation of noise and headroom limitations as channels are added together. In the past, working with multigeneration analog could sometimes surprise listeners with how distorted a final generation might sound compared to earlier generations.² What was happening was that distortion was accumulating more or less evenly among multiple generations, so long as they were well engineered, but at one particular generation the amount of distortion was "going over the top" and becoming audible. There is no corresponding mechanism in multiple generation LPCM, so noise and distortion do not accumulate except for the reasons given above.

A variety of means have been found to reduce the bit rate delivered by a given original rate of PCM conversion to save storage space on media or improve transmission capability of a limited bit rate channel. It may be viewed that within a given bit rate, the maximum audio quality may be achieved with more advanced coding than LPCM. These various means exploit the fact that audio signals are not completely random, that is, they contain redundancy. Their predictability leads to ways to reduce the bit rate of the signal. There is a range of bit-rate-reduction ratios possible, called "coding gain," from as little as 2:1 to as much as 15:1, and a variety of coders are used depending on the needs of the channel and program. For small amounts of coding gain, completely reversible processes are available, called lossless coders, that use software programs like computer compression/decompression systems such as ZIP to reduce file size or bit rate.

²It did surprise me. The 70mm prints of Return of the Jedi had audible IM distortion, and yet an improved oxide was in use compared to earlier prints. I worked hard to understand why, when this generation measured about the same as the foregoing analog generations, it sounded distorted. What I concluded was that each generation was adding distortion and yet remained inaudible until a threshold was crossed in this final generation to audibility. One year later, with a much improved oxide for the postproduction magnetic generations on Indiana Jones and the Temple of Doom, the audible IM distortion was gone from the 70mm prints, even though it contained a passage of chorus over strong bass that would normally reveal IM distortion well.

One method of doing lossless compression is Meridian Lossless Packing, MLP. It provides a variable amount of compression depending on program content, producing a variable bit rate, in order to be the most efficient on media. The electrical interfaces to and from the media, on the other hand, are at a constant bit rate, to simplify the interfaces. Auxiliary features of MLP include added layers of error coding so that the signal is better protected against media or transmission errors than previous formats, support for up to 64 channels (on media other than DVD-A, which has its own limits), flags for speaker feed identification, and many others.

In order to provide more channels at longer word lengths on a given medium such as the Red Book CD with its 1.411-Mbps payload bit rate, redundancy removal may be accomplished with a higher coding gain than lossless coders using a method such as splitting the signal up into frequency bands, and using prediction from one sample to the next within each band. Since sample-by-sample digital audio tends to be highly correlated, coding the difference between adjacent samples with a knowledge of their history instead of coding their absolute value leads to a coding gain (the difference signals are smaller than the original samples and thus need less range in the quantizer, i.e., fewer bits). As an example, DTS Coherent Acoustics uses 32 frequency subbands and differential PCM within each band to deliver 5.1 channels of up to 24-bit words on the Red Book CD medium formerly limited to 2 channels of 16-bit words.
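The gain from coding differences rather than absolute values can be seen in a toy example. This sketch is not the DTS algorithm: it uses a plain first-order difference on a single wideband 1-kHz test tone, with no subband filter bank and no adaptive prediction, and all names in it are illustrative.

```python
import math
from itertools import accumulate


def bits_needed(values) -> int:
    """A sign bit plus enough magnitude bits to hold the largest value (slightly conservative)."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1


SAMPLE_RATE_HZ = 48_000
TONE_HZ = 1_000

# One hundredth of a second of a full-scale 16-bit sine wave.
samples = [round(32767 * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ))
           for n in range(480)]

# Encoder: send the first sample, then only the change from the previous sample.
diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

# Decoder: a running sum restores the original samples exactly.
decoded = list(accumulate(diffs))

print(bits_needed(samples), bits_needed(diffs), decoded == samples)
```

For this tone the difference signal fits in 14 bits instead of 16, and the round trip is exact. Lower-frequency or band-split content changes even less from sample to sample, so the saving grows.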

A 5.1-channel, 16-bit, 48-kHz sample rate program recorded in LPCM requires 3.84 Mbps of storage, and the same transfer rate to and from storage. Since the total payload capacity of a Digital Television station to the ATSC standard is 19 Mbps, LPCM would require about 20% of the channel capacity: too high to be practical. Thus bit-rate-reduction methods with high coding gain were essential if multichannel audio was to accompany video. One of the basic schemes is to send, instead of LPCM data that is the value for each sample in time, a successive series of spectrum analyses, which are then converted back into level versus time by the decoder. These methods are based on the fact that the time domain and the frequency domain are different representations of the same thing, and transforms between them can be used as the basis for coding gains; data reduction can be achieved in either domain. As the need for coding gain increases under conditions of lower available bit rates, bit-rate-reduction systems exploit the characteristics of human perceptual masking.
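The bit-rate figures quoted in this section follow from channels times word length times sample rate. A small sketch follows, with one assumption to note: the 3.84 Mbps figure corresponds to five full-rate channels, so the band-limited LFE channel is treated here as adding a negligible amount. Counting it as a sixth full-rate channel would give about 4.6 Mbps instead.

```python
def lpcm_bit_rate_mbps(channels: int, bits: int, sample_rate_hz: int) -> float:
    """Raw LPCM payload rate in megabits per second."""
    return channels * bits * sample_rate_hz / 1_000_000


ATSC_PAYLOAD_MBPS = 19.0  # total channel payload quoted in the text

# Assumption: five full-rate channels, LFE contribution treated as negligible.
surround = lpcm_bit_rate_mbps(5, 16, 48_000)
# Red Book CD: two channels of 16-bit words at 44.1 kHz.
red_book_cd = lpcm_bit_rate_mbps(2, 16, 44_100)

print(f"5-channel LPCM: {surround:.2f} Mbps, "
      f"{100 * surround / ATSC_PAYLOAD_MBPS:.0f}% of the ATSC payload")
print(f"Red Book CD:    {red_book_cd:.3f} Mbps")
```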

Human listening includes masking effects: loud sounds cover up soft ones, especially those close to the loud sound in frequency. Making use of masking means having to code only the loudest sound in a frequency range at any one time, because that sound will cover up softer ones nearby. It also means that fewer bits are needed to code a high-level frequency component, because quantizing noise in the frequency range of the component is masked. Complications include the fact that the successive spectra are taken in frames of time, and the frame length is switched to longer for more steady-state sound (giving greater frequency resolution) or shorter for more transient sound (giving better time resolution). Temporal masking, sometimes called non-simultaneous masking, is also a feature of human hearing that is exploited in perceptual coders. A loud sound will cover up a soft one that occurs after it, but surprisingly also before it! This is called backwards masking, and it occurs because the brain registers the loud sound more quickly than the preceding soft one, covering it up. The time frame of backwards masking is short, but there is time enough to overcome the "framing" problem of transform coders by mostly hiding transient quantization noise underneath backwards masking.

Transform codecs like Dolby Digital and MPEG AAC use the effects of frequency and temporal masking to produce up to a 15:1 reduction in bits, with little impact on most sounds. However, there are certain pathological cases that cause the coders to deviate from transparency, including solo harpsichord music due to its simultaneous transient and tonal nature, pitch pipe because it reveals problems in the filter bank used in the conversion to the frequency domain, and others. Thus low-bit-rate coders are used where they are needed for transmission or storage of multichannel audio within the constraints of accompanying a picture in a limited capacity channel, downloading audio from web sites, or as accompanying data streams for backwards compatibility such as in the case of DVD-A discs for playback on DVD-V players.
