The A-Zed of Audio
We see in two dimensions. It sounds strange, but have a look around you. What you see with your two eyes is that some things are smaller and some are larger, but the only data that comes to us through our eyes is height and width. Depth is a magical addition we give the world through years of experience in knowing whether a small thing is far away, or just small. If you throw a baseball to a friend standing 10 metres away, the only information each eye gives you is that the ball is shrinking until it arrives at your friend, who’s smaller than she’d be if she were closer to you, and larger than she’d be if she were farther away.
That’s the one-eyed picture of the world. But once we add a second eye, which gets a slightly different image because it sits a small distance from the first, depth just seems to appear. The most miraculous thing about the optical illusion that we call our perception of the world is that we think nothing of it. Yet the ability to assemble two slightly different images of the world and – fairly accurately – interpret a third dimension gives a person a huge advantage over cyclopes and pirates, who can’t be sure whether pterodactyls are swooping down at them or just growing really quickly.
When it comes to hearing, our situation is pretty similar and just as incredible. There’s always some variation in what each ear hears and just like with our eyes, we’re so good at interpreting these subtle differences we don’t even realize we’re doing it. But with these differences between what each ear hears we can interpret all sorts of things like the direction of a sound, the size of a room, the materials of walls, and whether that’s a wrecking ball coming at us or just Miley Cyrus.
We can use microphones to record all these sounds and often we use only a single microphone, which is like listening with only a single ear or seeing with only a single eye. With a vocalist – whose mouth is a single sound source – a monaural (or monophonic, aka mono) channel does the trick. But with a big instrument like a piano or a drum kit, some dramatically different things happen depending on your vantage point.
For this reason, ingenious engineers have developed methods of stereo recording, where two channels of sound are recorded and then mixed together to give an illusion of space, much like how stereoscopic pictures combine two images taken from slightly different positions to create the illusion of depth.
Now, the matter of how to set up these two microphones may seem like it’s straightforward but it’s not. Give engineers the opportunity to make the world a more complex place and you can be sure of one thing: they will.
Each of the many stereo recording methods comes with its own advantages and pitfalls. The simplest setup is when the capsules of two microphones are placed as close as possible (without touching), and the mics themselves are placed at right angles to each other – this is known as the X-Y configuration. In this case, the centre line between the two mics should be aimed at whatever it is you’re hoping to record. We end up with a recording composed of the X-signal on one side and the Y-signal on the other. When we listen back, we can hear differences as the two of them mimic the way we’d hear sounds with the two ears on either side of our heads.
Because the capsules of the mics are placed so close to one another we call them a coincident pair. The advantage to this is that the air molecules from the guitar or the pirate ship or dinosaur you’re recording will be pushing and pulling in pretty much the same pattern when they arrive at the two microphones since they’re so close. Of course, the disadvantage is the stereo image isn’t nearly as wide as we’d expect since our ears aren’t attached to our heads like this.
There’s an array of other solutions to the challenge of stereo recording including a handful of techniques developed by radio networks around the world in which the microphones are spaced at some carefully researched distance and angle. From the fact that there are so many of these standards (ORTF from France, NOS from Holland, DIN from Germany, etc.) you might reasonably conclude that there really isn’t a best way to do this.
The problem in all of these spaced – or non-coincident – arrangements is that the farther apart the capsules are, the greater the chance that the air molecules are going to be doing a different dance when they arrive at the microphones. If they’re moving in exactly the same way we can say they’re in phase, and if they’re moving oppositely they’re out of phase; they can also be anywhere in between.

It’s like a group of people doing the Macarena in a giant field with an impossibly loud stack of speakers. Sound moves at about 340 m/s, so if you’re standing 340 metres from the stage, your dance is going to be a second behind a person standing right at the stage and having their insides turned to jelly from the sound power. A full cycle of that terrible dance (including four 90° turns after every time through it) lasts about 37 seconds, so a person standing 12.6 km from these colossal speakers would be back in phase with the people in front of the speakers, because they’d be exactly 37 seconds behind. Everyone between them would be at different stages – some of them might be turning their hands while others are touching their elbows, some facing left while others are facing right. If you took a recording of just one person at the speakers and another 1.5 km away, they’d be out of phase, but not as out of phase as the person at the speakers and someone 6.3 km away, who would be completely out of phase: they’d be doing the exact same moves but one would be facing backward while the other is facing forward.
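The arithmetic behind the Macarena analogy can be sketched in a few lines. This is just an illustration using the numbers above (340 m/s and a 37-second cycle); the function names are mine, not standard audio terminology.

```python
# A rough sketch of the delay arithmetic in the Macarena analogy:
# sound at ~340 m/s, and a "dance" that repeats every 37 seconds.

SPEED_OF_SOUND = 340.0  # metres per second, approximate

def delay_seconds(distance_m):
    """How long the sound takes to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND

def phase_fraction(distance_m, period_s):
    """Where in its cycle the dance is when the sound arrives:
    0.0 means in phase, 0.5 means completely out of phase."""
    return (delay_seconds(distance_m) / period_s) % 1.0

period = 37.0  # seconds per full cycle of the dance

print(delay_seconds(340))              # 1.0 second behind the stage
print(phase_fraction(12_580, period))  # 0.0 -> back in phase at ~12.6 km
print(phase_fraction(6_290, period))   # 0.5 -> completely out of phase at ~6.3 km
```

The same fraction works for sound waves: just swap the 37-second dance for the period of whatever frequency you care about.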
This is a lot like what happens with sound in the air if you set up microphones at a distance from each other and the air molecules hit each microphone at slightly different points in their dance. The result – instead of a bunch of out-of-step dancers – is that some frequencies can start to cancel each other out, and if they’re completely out of phase you’ll get silence.
Still, even with these potential phase problems there are good reasons to use a non-coincident pair of mics. Maybe the coolest is a species of microphone designed to mimic a human head: an actual dummy head with microphones placed inside the ears. There are a variety of these, some of which have a full-blown face and nose and sinus cavity, and some simpler ones that have two silicone ears at the end of a pole.
The recordings from these dummy head mics are uncanny when listened to through headphones. If you search for binaural recordings on YouTube you’ll find everything from haircuts to traffic sounds to the more esoteric end of things that includes a completely unproven method of replicating drug experiences called binaural beats, and another called Autonomous Sensory Meridian Response that seems to always involve a Russian woman whispering to you in stereo as she brushes the hair of some mannequin. Things get strange quickly.
It’s funny, what lengths we’ll go to in order to replicate the world around us when all we have to do to get the most perfect binaural stereo sound is take off our headphones and listen to the world around us. Whether we see or hear in two or in three or in nine dimensions, the difficulties of producing convincing stereo recordings (or 3D images) can give you a real appreciation of how complex our natural processes really are, and raise a really good question: why are we so taken by what’s synthetic and convincing, when we have the real thing all around us? It’s like dying of thirst while wading in a pool of Coca-Cola.
Jordan Mandel is a Creative Media Instructor at the UW Stratford Campus, and writes for this blog regularly. His hobbies include Swiss ball-bearings, juggling, and Kijiji speculation. More of his work can be found at jordanmandel.com/blog, which is home to the award-winning satire rag, The Outa Times.