The A-Zed of Audio
Every day we deal with the problem of squeezing more of our world into smaller containers. How many books can you fit on your shelf? How many hours of sleep can you fit into your busy schedule? How many clothes can you cram into your suitcase as you pack for a trip? As time goes by, we gradually become crazier with this, always trying to squeeze a little more into a little less: we squeeze the books a bit closer, fold the shirts a bit tighter...you get the idea.
When it comes to audio, we do exactly the same thing, though you wouldn't know it. If the world of physical sound (that is, vibrations in the air) is one shore and the world of digital audio (that is, the way we represent those vibrations numerically) is the other, we need some sort of boat or bridge to shuttle between them, otherwise we’re in for a mean soaking.
The world of audio recording is like that person we've seen too often, getting camera-happy at a trendy restaurant – he feels compelled to snap photos of his food, and perhaps if he's really compulsive (or in real need of boosting his Instagram followers) he might take a photo of his plate after every bite. If this sounds insane, audio recording is full-blown nuts. It involves snapping tens of thousands of shots every second to track exactly what was going on with the air molecules as they went through their wild disturbances.
Let's suppose our Instagram devotee takes only two pictures – one when his meal arrives, and one of the empty plate at the end. If all we had were those two photos and we had an extremely limited experience of the way the world works (i.e. if we were computers) we’d suppose he devoured the meal in a single bite. That's the simplest explanation. But let's imagine that this fine diner had taken 8 photos throughout his meal. In this case we could look at the series and know he had started with the steak, then taken a few bites of his potatoes, then worked his way back to the steak, and finally gone on to the salad. Still, 8 photos require us to do a fair bit of filling in and there’s a lot of which we can't be sure – but they do paint a more complete picture. As the photos increase, there’s less and less guesswork to do.
Maybe somewhere around 1 photo every 10 seconds will allow us to reconstruct the meal accurately, bite-for-bite. But let's push past this and imagine the extreme case: our phone photographer went absolutely insane with excitement and took 10 photos per second. Now we're dealing with a lot of photos (or information) we don't need; the additional photos don't provide anything new if all that concerns us is the order in which he took his bites.
If this makes sense to you, you can get the point of a nervous-breakdown-worth of high octane math that goes into the Nyquist-Shannon Sampling Theorem, and understand that there's an "ideal" amount of information if we want to document an event accurately. When it comes to recording sound, everything has to do with vibrating molecules. If a sound is vibrating the air around it at a rate of 1000 cycles per second, this means that our molecules are getting quite a workout. In one cycle they move forward, back to their resting place, then backward by the same amount they moved forward, and finally back to their resting place. In the case of our 1000 Hz vibration, they do this 1000 times in a second. What’s the minimum number of snapshots per second we need if we want to be sure what these molecules were up to? It starts with a two, and ends with a thousand.
If you know how to play connect-the-dots, you'll understand what’s going on here – we end up with our snapshots, and then choose the simplest path to get from the first to the second, the second to the third, &c. &c.
If we only have 1000 snapshots per second then we'll only see the molecules in their 'forward' position once we connect the dots, and we can't really be sure they’re even moving – maybe their resting place is just a tiny distance forward from where it actually is. If we have only 40 snapshots per second and then try to connect the dots – or interpolate – we'll get an entirely different story of what happened, and the frequency of our sound will be far different than it actually was during the original event. This error is called aliasing.
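For the curious, here is a rough sketch in Python of the connect-the-dots problem described above. It is an illustration rather than anything from Nyquist or Shannon: the function names `sample_sine` and `apparent_frequency` are made up for this example, the sound is assumed to be a pure sine wave whose first snapshot lands on the 'forward' peak, and the 1500 snapshots per second case is an extra value chosen only to show an alias that isn't zero.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Take n_samples snapshots of a sine wave, starting at its 'forward' peak."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + math.pi / 2)
        for n in range(n_samples)
    ]

def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency that connecting the dots would suggest for a pure tone.

    Assumption: a single sine wave; the tone 'folds' around multiples
    of the sample rate.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# 1000 snapshots per second of a 1000 Hz tone: every snapshot catches
# the same 'forward' position, so the molecules appear not to move.
snapshots = sample_sine(1000, 1000, 5)
print([round(s, 6) for s in snapshots])

print(apparent_frequency(1000, 1000))  # 0    (looks motionless)
print(apparent_frequency(1000, 40))    # 0    (also looks motionless)
print(apparent_frequency(1000, 1500))  # 500  (the wrong note entirely)
print(apparent_frequency(1000, 4000))  # 1000 (the real frequency)
```

At 40 snapshots per second the 1000 Hz tone completes exactly 25 cycles between snapshots, so it looks just as motionless as it does at 1000 snapshots per second. At 1500 per second it masquerades as a 500 Hz tone instead.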
On the other hand, if we have 4000 snapshots per second – or samples – we'll have more information that'll confirm the story we pieced together from the 2000 samples/second, but the extra 2000 aren’t necessary.
This matter of necessity is important to pick up on, because what we're talking about here concerns efficiency, and the minimum possible information to convey a signal accurately. Harry Nyquist and Claude Shannon – the namesakes of the mathematically supercharged theorem – were brilliant information scientists who worked for Bell Laboratories and its precursor in the early and middle parts of the 20th century, respectively.
When Nyquist wrote his riveting paper (whose excitement can be judged on its title alone), "Certain Topics in Telegraph Transmission Theory" in 1928, he wasn’t talking about music or anything even close to music. He was figuring out how to maximize the number of signals that could be sent along Bell's heavily used communication wires. It was a matter of economics – Bell had a resource for which the demand exceeded the supply, and Harry Nyquist developed a theory, later proven by Claude Shannon in 1949, concerning how to maximize the use of these lines. Neither of these men was a musician; they were engineers. Yet all digital music lives and dies by the Nyquist Rate.
The Nyquist Rate says that it's necessary to have a sample rate – that is, the number of snapshots you take per second – which is at least double the highest frequency you aim to record (or transmit). In the case of music we can generally hear up to 20 000 Hz, so CDs – the first mass-produced digital audio medium – sported a sample rate of 44 100 Hz (the extra 4100 Hz allows for less precise parts to be used in the digital conversion, thereby lowering the cost of these devices).
This all boils down to a useful figure which is the reverse of the Nyquist Rate: that is, the Nyquist Frequency. Whenever you hear of a system's sample rate, you can figure out the Nyquist Frequency, which is the maximum frequency that system can record, just by dividing the sample rate by two. Today, 96 000 Hz sample rates are fairly common, and these allow for a frequency of 48 000 Hz to be accurately recorded. You may reasonably ask why on earth we care to do this, since it exceeds the upper range of a healthy person's hearing by 28 000 Hz. For one thing, our cats and dogs are probably enjoying our recordings a whole lot more, since we’re now including that sonic world we were previously chopping off. But there are also cases to be made for greater accuracy within the audible range. Remember, the Nyquist Rate is designed for efficiency above all else; it has the attitude of doing just enough to get by. For a long time, we were okay with this.
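The arithmetic in the last two paragraphs is simple enough to write down as a pair of one-liners. This is only a sketch: `nyquist_rate` and `nyquist_frequency` are hypothetical helper names, and the 20 000 Hz hearing limit is the round figure used above rather than a precise measurement.

```python
HEARING_LIMIT_HZ = 20_000  # round figure for the top of human hearing (assumption)

def nyquist_rate(highest_freq_hz):
    """Minimum sample rate needed to capture frequencies up to highest_freq_hz."""
    return 2 * highest_freq_hz

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a system with this sample rate can record."""
    return sample_rate_hz / 2

print(nyquist_rate(HEARING_LIMIT_HZ))                # 40000
print(44_100 - nyquist_rate(HEARING_LIMIT_HZ))       # 4100 (the CD's spare room)
print(nyquist_frequency(44_100))                     # 22050.0
print(nyquist_frequency(96_000))                     # 48000.0
print(nyquist_frequency(96_000) - HEARING_LIMIT_HZ)  # 28000.0
```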
Our move toward higher sample rates is a step away from what's necessary in the eyes of an engineer, to what's desirable in the ears of a musician. The super high sample rates we're now using show off our digital affluence. As technology develops, as storage space drops in price, we start to experience what those who've acquired wealth experience: a distance from necessity. An ability to eat at restaurants that underfeed you, but in an Instagram-worthy way. An ability to have a front lawn that sits bereft of gardens and does nothing productive. The Nyquist Rate represents a point of just getting by, making the monthly payments on time, but not having much left over. Now we've got extra binary income, and we're showing off our excess by spending it on sample rates we don't need, but which make our world a little more comfortable.
Jordan Mandel is a Digital Media Lab Instructor at the UW Stratford Campus, and writes for this blog regularly. His hobbies include the manufacture of non-spherical ball bearings, speculation on collector coins, and snow melting. More of his work can be found at jordanmandel.com/blog, which is home to the award-winning satire rag, The Outa Times.