Gramophone, November 1999
© Haymarket Magazines Ltd
PACKING IT ALL IN
The ultra high resolution requirements of DVD-Audio and Super Audio CD have necessitated the development of lossless digital audio compression systems, as Keith Howard explains
Audio history, like history in general, does not always proceed in logical fashion. So it is that, with DVD-Audio and Super Audio CD almost upon us, we are belatedly being introduced to lossless digital audio compression systems following some years of prior exposure to their lossy relatives, as incorporated in the perceptual coding processes that underpin MiniDisc, Digital Radio and Dolby Digital.
There would have been less potential for confusion had matters developed logically, in reverse order. In the mind of dyed-in-the-wool audio enthusiasts 'compression' is already a dirty word because of its association with the lossy form: a term which smacks of compromise in a context where an uncompromising approach to sound quality is a central tenet. Little wonder in the circumstances that lossless compression (which, unlike the lossy alternative, involves no compromise of signal quality) has been alternatively termed 'lossless packing', in a deliberate attempt to distance it from those technologies which discard notionally inaudible signal components, such as ATRAC, MPEG and AC-3.
Compression is a potentially confusing term in any case because it has two distinct meanings in the audio context. In its old usage compression refers to a reduction in dynamic range, deliberate or otherwise. In digital audio, however, it is also used as a shortened form of 'data compression' and refers to methods of trimming back on the large amount of data required to represent the signal. If this reduction is achieved without any modification to the signal content (in other words, if the decompressed signal is a bit-exact reconstruction of the input) then the compression is lossless; if output and input are not identical then the compression is lossy.
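The distinction can be illustrated with a minimal Python sketch. The sample values are hypothetical, the general-purpose zlib compressor stands in for a lossless scheme, and crude requantisation stands in for a lossy one; neither is how MLP, DST or any perceptual coder actually works.

```python
import zlib

# Hypothetical 16-bit PCM samples, packed as little-endian bytes.
samples = [0, 1200, 2350, 3100, 2350, 1200, 0, -1200]
pcm = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Lossless: the decompressed data is a bit-exact copy of the input.
restored = zlib.decompress(zlib.compress(pcm))
is_lossless = restored == pcm

# Lossy (crude stand-in): throw away the low 8 bits of every sample.
requantised = [(s >> 8) << 8 for s in samples]
is_lossy = requantised != samples
```

The test for losslessness is simply equality of input and output, which is the 'bit-exact reconstruction' referred to above.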
The latter form of compression has been more widely used to date because of the limitations in data capacity imposed by various means of music delivery. To compress a two-channel, full-spectrum, wide dynamic range audio signal on to MiniDisc, for example, and still retain CD-competitive playing time requires data compression of such an order that it cannot consistently be achieved without data loss. Likewise fitting 5.1 channels of high quality sound on to a film print or broadcasting two channels of Digital Radio from terrestrial transmitter sites. In all these cases the compression process generally involves loss of signal data, necessitated by limitations on data capacity. (The word generally is appropriate here because lossy compression schemes usually incorporate lossless encoding techniques, which potentially means that simple signals will be encoded without data loss. In the PASC lossy compression system of Digital Compact Cassette, for example, half the 4:1 data compression was achieved by lossless encoding processes.)
Lossless compression is now making an appearance within DVD-A and SACD because, with their vastly increased data storage capability, these high-density media significantly reduce the compression requirement. Although data compression is still employed to provide the desired combination of sound quality, channel provision and playing time, the amount of data saving required has fallen sufficiently for lossless compression to suffice, guaranteeing the uncompromising hi-fi requirement that input and output be identical.
To understand why compression is still required for DVD-A and SACD it's only necessary to perform some simple arithmetic. Let's take DVD-A as the example. On a single-sided disc the maximum data capacity is 4.7 gigabytes (4.7GB). The maximum supported sampling rate is 192kHz (i.e. 192,000 samples per second) and the maximum supported resolution 24-bit. Using these figures we can calculate the maximum playing time for a two-channel audio signal stored at the highest available quality (ignoring, for the sake of convenience, the additional data required for error correction and other subcode purposes). Each channel of 24-bit/192kHz audio generates (24 x 192000 =) 4,608,000 bits per second, equivalent to 562.5 kilobytes (KB). For two channels the total data rate is therefore 1.1MB per second. At that rate the 4.7GB capacity of the disc is used up in 4,380 seconds or 73 minutes. For a two-channel signal that might suffice, but any multi-channel provision would clearly demand an unacceptable reduction in maximum playing time and/or sacrifice in either the signal's resolution and/or sampling rate. It's to offset this compromise between signal quality and playing time while maintaining signal integrity that DVD-A and SACD both incorporate lossless compression.
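The arithmetic can be reproduced in a few lines of Python. The sketch below assumes binary units (1KB = 1,024 bytes, 1GB = 1,024^3 bytes), which is the reading under which the figures above work out exactly; the variable names are illustrative only. If the 4.7GB capacity is instead read as 4.7 thousand million bytes, the same sum gives roughly 68 minutes.

```python
BITS_PER_SAMPLE = 24
SAMPLE_RATE_HZ = 192_000
CHANNELS = 2

KB = 1024          # assumption: binary kilobyte
GB = 1024 ** 3     # assumption: binary gigabyte

bits_per_second_per_channel = BITS_PER_SAMPLE * SAMPLE_RATE_HZ
bytes_per_second_per_channel = bits_per_second_per_channel / 8
kb_per_second_per_channel = bytes_per_second_per_channel / KB

total_bytes_per_second = bytes_per_second_per_channel * CHANNELS

# Playing time with the disc capacity read as 4.7 binary gigabytes.
playing_time_s = 4.7 * GB / total_bytes_per_second
playing_time_min = playing_time_s / 60

# Same calculation with the capacity read as 4.7e9 bytes.
playing_time_min_decimal = 4.7e9 / total_bytes_per_second / 60

print(f"{bits_per_second_per_channel:,} bits/s per channel")
print(f"{kb_per_second_per_channel} KB/s per channel")
print(f"{playing_time_s:.0f} s, about {playing_time_min:.0f} minutes")
print(f"about {playing_time_min_decimal:.0f} minutes with decimal gigabytes")
```

Either way, the conclusion in the text is unchanged: uncompressed two-channel audio at the highest quality more or less fills the disc on its own.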
Unsurprisingly, given that SACD uses 1-bit DSD coding while DVD-A uses linear PCM, the two utilise different compression schemes. SACD's scheme goes by the name of DST (a potential confusion here with both DSD and DTS) and was developed by Philips. For DVD-A a competition was organised by Working Group 4 of the DVD Forum to assess the best compression technology, and interested parties were invited to submit their offerings for independent testing. Four did so, the eventual winner being Meridian Lossless Packing, a technology developed in the UK principally by the late Michael Gerzon, Peter Craven of Algol Applications and Bob Stuart of Meridian. Although the other three competitors remain officially unidentified I understand them to have been Digital Theater Systems (DTS), JVC and Matsushita: information which was not, I should stress, given me by Meridian.
Although they specify the compression figure in different ways, DST and MLP appear to achieve broadly similar orders of data saving. In the case of DST the typical compression ratio is quoted as 2.3-2.6 to 1, i.e. the signal data is reduced to 38-43 per cent of its original size. Figures for MLP are quoted in terms of bit reduction per sample per channel and vary according to the sampling rate of the input signal. At 48kHz the average reduction is 5-11 bits, rising to 9-13 bits at 96kHz and 9-14 bits at DVD-A's maximum permitted sampling rate of 192kHz. A 12-bit saving per sample on a 24-bit input signal corresponds to a compression ratio of 2 to 1.
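The two ways of quoting compression are easily related. The sketch below converts a compression ratio to a percentage of original size and a per-sample bit saving to a ratio; the function names are illustrative and the 24-bit word length is the one used in the example above.

```python
def ratio_to_percent(ratio):
    """Compressed size as a percentage of the original, given an N:1 ratio."""
    return 100.0 / ratio


def bits_saved_to_ratio(bits_saved, word_length=24):
    """Compression ratio implied by saving bits_saved bits per sample."""
    return word_length / (word_length - bits_saved)


dst_low = ratio_to_percent(2.6)          # roughly 38 per cent
dst_high = ratio_to_percent(2.3)         # roughly 43 per cent
mlp_example = bits_saved_to_ratio(12)    # 12 bits saved on a 24-bit word
```

On this basis a 9-bit saving on a 24-bit signal corresponds to 1.6 to 1 and a 14-bit saving to 2.4 to 1.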
How is this order of data reduction achieved without any compromise to signal content? Before exploring this using MLP as the example, first a terminological aside about the use of the word 'entropy' in this context. If you know something of thermodynamics you'll understand entropy to be a measure of disorder, the entropy of a gas, for example, being higher than that of a solid because of its lack of an organised structure. In communication theory the term is used similarly, as a measure of the disordered nature of a signal. Disorder and the transmission of information might intuitively seem incompatible (disorder suggests noise), but for information to be conveyed disorder is essential. The steady, unvarying carrier wave of a radio transmitter, for example, conveys no information other than that the transmitter is active: only if the carrier wave is interrupted (e.g. Morse code) or modulated (e.g. AM or FM radio) can it convey information. In what follows, then, entropy and information can be regarded as synonyms.
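The idea can be made concrete with Shannon's standard measure of entropy, in bits per symbol. The sketch below is a textbook calculation on two hypothetical symbol sequences, not anything specific to MLP or DST: an unvarying sequence, like the steady carrier wave, scores zero, while a sequence that uses four values equally often needs two bits per symbol.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the observed symbol frequencies, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


steady = [1] * 16            # unvarying, like an unmodulated carrier
varied = [0, 1, 2, 3] * 4    # four values, each equally likely

steady_entropy = entropy_bits_per_symbol(steady)   # 0.0
varied_entropy = entropy_bits_per_symbol(varied)   # 2.0
```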
All lossless compression systems incorporate three key functional elements: a framing process (which divides up the incoming signal into appropriately sized chunks for processing), a predictor and an entropy encoder (Figure 1). The predictor is alternatively called a decorrelator but for the general reader the former term gives a more ready insight into its function. It and the entropy encoder operate in series to reduce the signal's data requirement at two distinct levels. First the predictor reduces the amount of data required to describe the signal waveform itself, then the entropy encoder reduces the data required to represent the output of the predictor. Essentially the entropy encoder is analogous to the 'zipping' software used to compress files on computer. Although the algorithms used in the audio and general data contexts may differ, the process is essentially the same. What distinguishes dedicated audio compression from zipping is the former's signal predictor element, without which the amount of compression that can be achieved is much lower, as anyone who has tried zipping computer sound files will know. Whereas dedicated real-time audio compression systems can achieve compression ratios in excess of 2:1, general purpose compression algorithms typically perform only half as well, despite the inherent advantage of processing off-line.
As its name suggests, the role of the predictor is to estimate what the signal will do next. To do this it analyses the signal using a suite of digital filters; in the case of MLP a suite of both FIR (finite impulse response) and IIR (infinite impulse response) filters is available, of up to eighth-order. Having made its estimate, the predictor then generates an error signal which represents the difference between its prediction and the actual signal waveform. These two pieces of information (prediction and error) almost but not quite constitute the predictor's output; not quite because if they did there would be no data saving. Instead the predictor outputs the error signal plus the rules it used to generate the prediction, which the decoder can later employ to rebuild the predicted signal. In this way a significant data saving can be achieved.
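A minimal sketch of the principle follows. It assumes a fixed second-order predictor that simply extrapolates the last two samples in a straight line, which is far cruder than MLP's adaptive FIR and IIR filters, and the sample values are hypothetical. Because the rule is fixed, encoder and decoder share it implicitly and only the error signal need be passed on.

```python
def predict(history):
    """Fixed rule: extrapolate linearly from the last two samples."""
    if len(history) >= 2:
        return 2 * history[-1] - history[-2]
    if len(history) == 1:
        return history[-1]
    return 0


def encode(samples):
    """Return the error (residual) between each sample and its prediction."""
    return [s - predict(samples[:n]) for n, s in enumerate(samples)]


def decode(residuals):
    """Rebuild the samples by re-running the same predictor."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples


signal = [0, 100, 198, 292, 380, 460, 530, 588]   # hypothetical smooth waveform
residuals = encode(signal)    # [0, 100, -2, -4, -6, -8, -10, -12]
rebuilt = decode(residuals)   # identical to signal
```

Once the predictor has settled, the residuals are far smaller than the samples themselves and so need fewer bits, yet the decoder recovers the original exactly.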
The output of the predictor then enters the second stage of the compression process, the entropy encoder. What this does is look for patterns in the predictor output which can be exploited to reduce the data requirement still further. Various methods of doing this are provided within MLP, a proprietary algorithm first examining the data to decide which of them (Huffman coding, run-length coding, etc.) will provide the most effective data reduction in each instance.
Entropy encoding is a subject in itself but a simple example suffices to illustrate the basic concept. Imagine you have to code the English alphabet digitally. As there are 26 available letters (ignoring upper/lower case distinctions) you would in the normal way require a 5-bit digital word to identify each uniquely. For example, you could arrange for 'a' to be represented as 00001, 'b' as 00010, 'c' as 00011, etc. But in any sufficiently large average English text we know 'e' will be the most frequently occurring letter, which means 00101 will appear more times than any other data sequence. If we code this most common sequence as, say, 1, and the next most common letter ('t') as 10, etc., then we will only have to use a full 5-bit word to represent the most infrequently occurring letters. In this way we can potentially save a lot of data without losing any information. This is an example of Huffman coding, which is ideally suited to any input, like language, which is highly variable from sample to sample but conforms to a statistical pattern overall.
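For readers who want to see the idea at work, here is a compact sketch of the textbook Huffman construction applied to a short, made-up sentence. It is not MLP's implementation. Note that a practical Huffman code must be prefix-free (no codeword may form the start of another, so that the decoder knows where each one ends), which means the codewords it generates differ from the simplified '1' and '10' above, and the rarest letters may even receive words longer than five bits.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bit string} from symbol counts."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codewords.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "the entropy encoder exploits letter frequencies in the text"
letters = [c for c in text if c.isalpha()]

code = huffman_code(letters)
encoded_bits = sum(len(code[c]) for c in letters)
fixed_bits = 5 * len(letters)   # the plain 5-bit-per-letter scheme

print(f"'e' -> {code['e']}, 'q' -> {code['q']}")
print(f"{encoded_bits} bits against {fixed_bits} bits at 5 bits per letter")
```

The most common letter receives one of the shortest codewords and the total falls below the fixed 5-bit cost, with nothing lost.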
Another possible pattern type is temporal: for example, the multiple repetitions of pixel colour that commonly occur in a raster (bitmap) image, much of which may comprise sky or sea or other large areas of consistent colour. In this case run-length coding is likely to be the most efficient method of compressing the data. Instead of sending multiple repeats of a particular code sequence you simply send it once, appending an instruction to the decoder as to how many times to repeat it. Still other methods of entropy coding are particularly well suited to other situations, depending on the nature of the patterns within the data.
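The 'send it once, with a repeat count' idea takes only a few lines to sketch. The row of pixel colours below is hypothetical and the function names are illustrative.

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


row = ["blue"] * 6 + ["white"] * 2 + ["blue"] * 4   # sky with a small cloud
encoded = rle_encode(row)    # [("blue", 6), ("white", 2), ("blue", 4)]
decoded = rle_decode(encoded)
```

Twelve pixel values are reduced to three pairs, and decoding restores the row exactly.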
While framing, prediction and entropy encoding are common features of any lossless compression system, individual realisations will differ both in the details of these processes and in the provision of other processing elements which may be added to enhance performance. If we look at a block diagram of the MLP encoder (Figure 2) we see the expected predictor (decorrelator) and entropy encoder stages, but there are other processing elements too. Preceding the predictor stage are channel remap, shift and lossless matrix stages, while after the entropy encoder there is an optional output buffer stage (not illustrated). The first three assist the data compression or expansion processes while the last tackles another important issue, that of data rate.
Channel remapping, the first of the additional elements, has the capability to subdivide incoming channels into two or more data substreams. This allows the compressed signal to be recovered using a simpler decoder architecture, thereby saving on cost. A shift process is then applied to each data channel to recover any unused bit depth capacity, which occurs either when the input data is of less than 24-bit precision or when the channel is not fully modulated, as is the case for much of the time with typical audio content. Lastly, before passing to the predictor stage, the data channels are processed by a lossless matrix which exploits any correlation between the signal content of different channels to cut the data requirement still further. In a conventional stereo recording, for example, correlation is typically high between the two channels as a result of central images being represented by signals of similar amplitude and phase in either channel. Similar correlations usually exist in multi-channel recordings also.
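Two of these ideas can be sketched simply, with the caveat that the details of MLP's own shift and matrix stages are not described here and the code below is an assumed stand-in. The shift example covers only the first case mentioned (data of less than full precision, here 16-bit values carried in a longer word, so that every sample ends in eight zero bits). The matrix example replaces the right channel by its difference from the left, an integer operation that can be undone exactly. All sample values are hypothetical.

```python
def common_shift(samples):
    """Count the low-order zero bits shared by every sample."""
    combined = 0
    for s in samples:
        combined |= s
    if combined == 0:
        return 0
    shift = 0
    while combined & 1 == 0:
        combined >>= 1
        shift += 1
    return shift


# Shift: strip the unused low-order bits, then restore them on decoding.
padded = [256, -512, 768]                  # 16-bit values padded with 8 zero bits
shift = common_shift(padded)               # 8
shifted = [s >> shift for s in padded]     # [1, -2, 3]
unshifted = [s << shift for s in shifted]  # identical to padded

# Matrix: send left plus (right - left) instead of left plus right.
left = [1000, 2000, 3000]
right = [1010, 1990, 3005]
difference = [r - l for l, r in zip(left, right)]            # [10, -10, 5]
right_restored = [l + d for l, d in zip(left, difference)]   # identical to right
```

Where the two channels are strongly correlated, as with a central image, the difference signal is small and correspondingly cheap to encode.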
An additional path from the lossless matrix, labelled LSB bypass in the diagram, is provided to route the least significant bits of the signal around both the predictor and entropy encoder stages. The signal at these low levels typically comprises noise (often deliberately added dither noise), a high-entropy signal component that can advantageously bypass the data compression process.
It's a feature of lossless compression that the output data rate is variable. Whereas in a lossy compression system more or less information can be discarded in order to keep the output data rate constant, in a lossless process the amount of data in the output necessarily reflects the entropy of the input signal. When the amount of information in the signal (its entropy) is low, so is the output data rate, but when the signal entropy is high the output data rate must increase to reflect this.
In the case of a transmission channel or storage medium with no limit on data rate capability, this characteristic of lossless compression is of academic interest only. It becomes very important, though, if the channel or medium has a data rate limit sufficient to accommodate the average requirement (as of course it must) but which is less than the maximum that the lossless coder might generate on certain high-entropy signals. This is the case with DVD-A which has a maximum data rate of 9.6Mbps (megabits per second) but is specified to carry up to six channels of 24-bit/96kHz data, which potentially demand a peak data rate of 13.824Mbps.
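The peak figure is a one-line calculation, sketched below with illustrative variable names. Dividing it by the 9.6Mbps limit gives the minimum compression ratio the coder must sustain, on average, if six channels at 24-bit/96kHz are to fit.

```python
CHANNELS = 6
BITS_PER_SAMPLE = 24
SAMPLE_RATE_HZ = 96_000
DVD_A_LIMIT_BPS = 9_600_000   # 9.6Mbps maximum data rate quoted above

peak_bps = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ   # 13,824,000 bits/s
exceeds_limit = peak_bps > DVD_A_LIMIT_BPS
minimum_ratio = peak_bps / DVD_A_LIMIT_BPS               # about 1.44 to 1

print(f"peak rate {peak_bps / 1e6:.3f}Mbps, ratio needed {minimum_ratio:.2f}:1")
```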
This is where the provision of buffering becomes important. If the data rate from the entropy encoder exceeds the maximum allowable, the excess data is temporarily diverted to a FIFO (first in, first out) buffer memory and only read out again once the data rate has fallen sufficiently. Figure 3 shows an example of buffering at work in MLP, the signal in question being a 30-second excerpt from a six-channel 24-bit/96kHz recording which features closely-miked cymbals in all six channels. Because of the virtually random nature of this signal its entropy is unusually high and the underlying compressed data rate reaches 12.03Mbps. As soon as the output of the entropy encoder exceeds 9.2Mbps, however (just below the maximum 9.6Mbps data rate supported by DVD-A), data begins to accumulate in the buffer, awaiting sufficient fall in the entropy of the input signal. When this occurs the buffer is progressively emptied again. In the example the required buffer memory is around 85kB and the graph scale goes up to 256kB, but Meridian declines to identify just how large a buffer MLP incorporates for DVD-A. In the extreme case of the data rate requirement exceeding the buffer provision, MLP offers the recording engineer various options for reducing data within the source signal, by trimming back the sampling rate or reducing the bit depth on a channel by channel basis. This provision also allows a producer to increase playing time if required.
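The behaviour of the buffer can be mimicked with a toy model that tracks only how full it is. The figures below are hypothetical (arbitrary units of data per interval, not those of Figure 3) and the model makes no attempt to reproduce MLP's actual buffering strategy.

```python
def simulate_buffer(produced, limit):
    """Return the buffer occupancy after each interval.

    produced: data emitted by the encoder in each interval (hypothetical units)
    limit:    data the channel can carry away in each interval
    """
    occupancy = 0
    history = []
    for amount in produced:
        # Excess over the channel limit accumulates; spare capacity drains it.
        occupancy = max(0, occupancy + amount - limit)
        history.append(occupancy)
    return history


encoder_output = [80, 90, 120, 120, 110, 70, 60, 60]   # burst in the middle
CHANNEL_LIMIT = 96

occupancy = simulate_buffer(encoder_output, CHANNEL_LIMIT)
# [0, 0, 24, 48, 62, 36, 0, 0]
peak_occupancy = max(occupancy)   # the buffer must hold at least this much
```

The buffer fills while the encoder output exceeds the limit and empties once it falls back, and the peak occupancy shows how much memory the passage demands.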
A block diagram of the MLP decoder (Figure 4) reveals, as you would expect, a mirror image of the encoder structure. What isn't apparent from the diagram is the decoder's relative simplicity, a key practical requirement since decoder complexity determines the cost of implementation in the end product. Meridian says that the computing power required to decode a two-channel data stream at 192kHz sampling rate is 27MIPS (millions of instructions per second), while six channels at 96kHz requires 40MIPS. These figures are well within the capability of inexpensive modern DSP chips.
Dolby Laboratories is handling the licensing of MLP and will provide technical support in the same manner as for its own products. To date ten semiconductor manufacturers have expressed an interest in developing and selling MLP decoders, two of whom, Motorola and Cirrus Logic (Crystal Semiconductor), have publicly announced that they will do so. With DVD-Audio set for launch next year, it isn't long before the first chips will be needed.