Three-Minute Tech: Audio compression

[In our Three-Minute Tech series, we tell you everything you really need to know about a technology in three minutes or less.]

The day may come when increased storage capacities and unfettered Internet bandwidth make uncompressed audio files the norm. But for now, audio files are generally compressed to save space and to make downloads faster. And while many people automatically assume that audio compression makes music sound worse, this isn’t always the case. Here’s a quick look at the types of audio compression used and how they work.

Lossy vs. lossless

There are two types of audio compression: lossy and lossless. Lossy, which includes formats such as MP3 and AAC that you’re very familiar with, is a type of compression in which some data from the original music is removed during the compression process. When the music is decompressed for playback, this data remains “lost” and can’t be recovered. This may or may not affect the actual sound; I’ll get to that in a minute.

Lossless compression, however, ensures that the same data that is compressed is restored when you play back a file. The idea is the same as zip compression for computer files; when you compress an image or document, it shrinks to save space, but when you expand it, you have every bit and character that was in the original. The most common lossless audio formats are Apple Lossless, created by Apple and used by iTunes; and FLAC (Free Lossless Audio Codec), an open-source format with widespread device support that is used for many live concert recordings as well as high-resolution classical and popular music downloads.

What’s on a CD?

A CD contains a PCM (pulse-code modulation) data stream. If you copy this uncompressed data to a computer, it will be wrapped in a container as either an AIFF or WAV file. These files are simply that raw PCM stream with headers that allow them to be read on computers. The bit rate (measured in kilobits per second, or kbps) for uncompressed audio copied from CDs is 1411 kbps.
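The 1411 figure follows from the CD audio format itself: 44,100 samples per second, 16 bits per sample, two channels. Here is a minimal sketch of that arithmetic; the function name is just for illustration.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Return the bit rate of uncompressed PCM audio in kilobits per second."""
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    return bits_per_second / 1000

# CD audio: 44.1 kHz sampling, 16-bit samples, stereo.
cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
print(cd_kbps)  # 1411.2
```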

How does lossless compression work?

Like compression used for graphics formats such as TIFF and PNG, lossless compression schemes look for redundant data and replace that data with shorter strings that, when decoded, result in the exact same data as the original. Lossless compression for music is not that different from the types of compression used for data and graphics. Lossless compression can shrink music files by between 40 percent and 75 percent, depending on the complexity of the music.
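To see the "exact same data" property in action, here is a small sketch that uses Python's general-purpose zlib module as a stand-in. Real lossless audio codecs such as FLAC and Apple Lossless use methods tuned for audio, and the byte pattern below is made up for illustration, but the round trip works the same way: what comes out is bit-for-bit what went in.

```python
import zlib

# Hypothetical stand-in for audio data: a short pattern repeated many times,
# which gives the compressor plenty of redundancy to exploit.
original = bytes([0, 10, 20, 30, 20, 10]) * 5000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed is far smaller
print(restored == original)            # True: nothing was lost
```

Real music is much less repetitive than this pattern, so it won't shrink nearly as dramatically.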

What’s lost in lossy compression?

Lossy compression doesn’t provide the exact same data when decompressed. The objective of this type of compression is to approximate the original, within certain limits (bit rate or file size). Much can be lost when a CD is ripped to an AAC or MP3 file, but if the compression is at a high enough bit rate, it can be unnoticeable to most listeners. In any case, the ultimate goal is to reduce the size of files, to store more music on a device, or to download faster.

Here are three files of orchestral music, exactly 10 minutes long. The first is uncompressed, the second compressed in Apple Lossless format, and the third with AAC at 256 kbps.
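As a rough check, the sizes of the uncompressed and 256-kbps files can be estimated from bit rate and duration alone. This sketch ignores file headers and metadata, and it can't predict the Apple Lossless size, which depends on the music itself. The function name is hypothetical.

```python
def audio_size_mb(bit_rate_kbps, seconds):
    """Estimate audio data size in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bit_rate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000

ten_minutes = 10 * 60  # duration in seconds

uncompressed_mb = audio_size_mb(1411.2, ten_minutes)  # CD-quality PCM
aac_mb = audio_size_mb(256, ten_minutes)              # AAC at 256 kbps

print(round(uncompressed_mb, 1))  # 105.8
print(round(aac_mb, 1))           # 19.2
```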

The basic principle of lossy audio compression is that it uses perceptual coding and psychoacoustic models to determine which sounds and frequencies humans can’t hear. While the difference in bit rate—and therefore in file size—between an original file and its compressed version may be important, the difference perceived by listeners may not. For example, with “joint stereo” compression, low frequencies may be stored in mono, rather than stereo, because the sound waves at these frequencies are so long that listeners can’t tell if they’re in stereo or not. (A 100Hz sound wave is about 3.4 meters—11.15 feet—long.) This is why your surround sound system only has one sub-woofer, and why its precise position is unimportant. Very high frequencies may also be removed; most people have a hard time hearing sounds above 16,000Hz to 18,000Hz, so frequencies above this range are usually discarded.
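The wavelength figure above comes from dividing the speed of sound by the frequency. A quick sketch, assuming sound travels at about 343 meters per second (dry air at roughly room temperature), shows how much shorter the waves get as frequency rises:

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C

def wavelength_m(frequency_hz):
    """Return the wavelength in meters of a sound at the given frequency."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz

for hz in (100, 1_000, 16_000):
    print(hz, round(wavelength_m(hz), 3))
# 100 3.43
# 1000 0.343
# 16000 0.021
```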

Other tricks to further compress audio

Variable bit rate (VBR) encoding may also be employed. This lets the encoder use lower bit rates for parts of an audio file that are less complex, and higher bit rates when needed.
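To get a feel for how VBR affects the overall bit rate, here is a toy calculation. The segment lengths and bit rates are invented for illustration; a real encoder makes these decisions many times per second, not per passage.

```python
# Hypothetical segments of one track: (duration in seconds, bit rate in kbps).
segments = [
    (60, 128),   # quiet, simple passage
    (120, 256),  # typical passage
    (60, 320),   # dense, complex passage
]

total_seconds = sum(seconds for seconds, _ in segments)
total_kilobits = sum(seconds * kbps for seconds, kbps in segments)

# The average is weighted by how long each bit rate is in use.
average_kbps = total_kilobits / total_seconds
print(average_kbps)  # 240.0
```

In this made-up case the file averages 240 kbps, even though its most complex passage gets 320 kbps.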

In addition, some lossy compression codecs may further reduce size by applying an extra layer of lossless compression. So a 256-kbps file, which is about 18 percent of the size of an original uncompressed CD recording (256 divided by 1411), may actually contain much more data because of this second level of lossless compression.

These settings provide very compact files of spoken word recordings.

There is an even larger difference in file size when you compress voice recordings, such as when you rip audiobooks from CDs. iTunes, for example, offers a Spoken Podcast import setting. This compresses files at a very low bit rate (32-kbps mono or 64-kbps stereo), but uses special voice filtering. Since the human voice has a very limited fundamental frequency range (from about 85Hz to 255Hz), even adding room for overtones, one can limit the frequency range to remove a huge amount of data without affecting the perceived sound.

Back in the Napster days, MP3 files of music at 64 kbps or 96 kbps were common. These files generally sounded inferior to the original CDs. But today the iTunes Store sells music at 256 kbps, and Amazon and other retailers offer files at around the same bit rate, or quality. In addition, audio compression has improved through the use of better MP3 encoders and more sophisticated formats such as AAC, which can provide better sound at lower bit rates.

While some listeners are convinced they can hear a difference between compressed music files and the original, uncompressed files, it’s a good idea to try a blind listening test; you may be surprised.

[Note: I have greatly over-simplified what is an extremely complex process. If you are interested in learning the details of this process, I recommend reading Ken C. Pohlmann’s Principles of Digital Audio.]