The Audio Revolution, Part One

Two Truths and a Take, Season 1 Episode 26

Alex Danco

Oct 13, 2019

If I told you about a piece of consumer electronics technology that:

A billion+ people own and use every day
Has changed those people and their world in some pretty radical and consequential ways
Gets more important every year but receives little attention, and the little attention we give it is mostly a sideshow that misses the real story in plain sight

I’d be talking about these.

Headphones.

In the next two weeks of Two Truths and a Take, we’re going to talk about how headphones, and the audio they hiss into our ears, changed everything. Our social values and instincts have changed because of headphones. Populism and politics have changed because of headphones. I think there’s even a case to be made that Donald Trump is president because of headphones. The audio revolution happened while everyone looked elsewhere.

This is a big topic, and we’re splitting it over two weeks to do it justice. The punchline is next week, and today is the setup. And we’ll need a lot of setup for this one.

The Basics of Information

To really understand the impact of audio, we need to go back to basics and understand how audio works as a medium, independent of its content. What does audio have to say? What does it do to us, in plain sight, that's gone unnoticed? We need to go deep into some Marshall McLuhan territory, and appreciate what he meant by his famous line The Medium Is the Message.

McLuhan is one of two 20th-century figures - the other is Claude Shannon - to truly grasp how and why information technology works. Claude Shannon laid the groundwork for McLuhan by founding Information Theory, and defining information in a counterintuitive but powerful way: as resolution of uncertainty.

Compare these two sentences: “Let’s meet tonight at my house at 7:30” versus “Let’s do tonight, maybe.” Which one contains more information? The first one. It resolves uncertainty to a higher degree, which is why we say it’s “higher resolution”.

If you’re told “Let’s meet tonight at my house at 7:30”, you’ve received a pretty complete, high-resolution dose of information. On the other hand, “Let’s do tonight, maybe” is lower-resolution, with some gaps you’ll need to fill in yourself. It could mean yes, it could mean no. Eventually you’ll figure it out, but it requires active work on your part to interpret your friend’s communication style and understand the message correctly.
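To make Shannon’s definition concrete, here is a small Python sketch that measures information as the drop in uncertainty (entropy, in bits) a message produces. The setup is a hypothetical assumption, not anything from Shannon or this essay: before the message, the listener treats every combination of 7 evenings, 4 places and 8 start times as equally likely. The helper names (`entropy_bits`, `uniform`, `bits_resolved`) are made up for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform(n):
    """A distribution over n equally likely outcomes."""
    return [1 / n] * n

def bits_resolved(prior, posterior):
    """Information gained = uncertainty before the message minus uncertainty after it."""
    return entropy_bits(prior) - entropy_bits(posterior)

# Hypothetical setup: every (evening, place, start time) plan is equally likely.
EVENINGS, PLACES, TIMES = 7, 4, 8
before = uniform(EVENINGS * PLACES * TIMES)  # 224 possible plans

# "Let's meet tonight at my house at 7:30" leaves exactly one plan open.
after_precise = uniform(1)

# "Let's do tonight, maybe" only fixes the evening (we generously ignore
# the "maybe"), so every place and start time tonight is still open.
after_vague = uniform(PLACES * TIMES)  # 32 plans remain

print(f"precise: {bits_resolved(before, after_precise):.2f} bits")
print(f"vague:   {bits_resolved(before, after_vague):.2f} bits")
```

Run as-is, it reports about 7.81 bits for the precise invitation and about 2.81 bits for the vague one: the first message rules out far more possibilities, which is what “higher resolution” means here. The exact numbers depend entirely on the assumed option counts.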

We live in a world of information, and we often think of information in terms of sensory input coming at us. But that’s not really information. Information isn’t what we’re told; it’s what we understand.

Hot and Cool Media

Now let’s add McLuhan to the picture. McLuhan’s first insight here is that different forms of media create different kinds of spaces and stages for information and understanding, regardless of the content. You can arrange them on a spectrum, from high-resolution to low-resolution. McLuhan labeled this spectrum “Hot” to “Cool”.

Some forms of media and communication inherently transmit information in high definition, where what’s being communicated is right in your face. Uncertainty is resolved immediately and thoroughly. The media yells at you, like a newspaper or an action movie: it doesn’t hold back. There’s no guesswork or participation required on your part. McLuhan calls this “Hot” media.

Other forms of media and communication transmit information in lower definition. The participants have to do work to integrate several different pieces or senses, including gaps in information that must be filled in or genre conventions that must be followed, in order to complete the picture. A typical telephone conversation is lower resolution media, because a large part of the message being communicated is obscured or unsaid: it isn’t in the words, but in the gaps we must fill in. This is “Cool” media.

The concept of Hot and Cool media took me a long time to really understand. But when it suddenly clicked, it clicked all at once. I think some people have a hard time figuring it out because McLuhan’s illustrative examples in Understanding Media are from another era. “The Waltz is a Hot dance, because it’s unambiguous mechanical mashing, whereas the Twist is a Cool dance, because you have to integrate information and fill in gaps in real time” was a great example then, but less so now. People also get thrown off by his description of TV as a “cool, tactile medium”. Remember, back then, TV was a glowing fuzz of white dots and muffled audio you had to piece together - a totally different medium from film (hot back then, and now) or TV today (which has heated up a lot since McLuhan’s day).

So here’s an explanation in terms of media we know today: texting, Twitter, Instagram, Facebook and YouTube.

Texting: ice cold. The entire point of texting, particularly for young people, is that it’s a way to communicate that reveals very little information. Uncertainty and ambiguity are the point. Texting, especially a group chat, is often like a game of “what’s said versus unsaid”, where gaps must be filled in. It demands active participation on your part to complete the picture of what’s being communicated. (The dreaded “…” in iMessage, which says so little but draws us in, is Cool Media.)

Twitter: cool. Similarly, Twitter is a low-resolution, character-limited format where the majority of what’s being communicated is actually just offscreen, out of the picture. The greatest tweets and the funniest jokes on Twitter are incomplete information: they’re pure punchline. The setup goes unsaid; you have to already know it, or go figure it out. It takes a lot of work to use Twitter successfully and you have to fluently understand its genre conventions in order for it to make sense. Twitter, when used optimally, is classic Cool Media.

Instagram: warm. Instagram is higher-resolution than Twitter. The main content being communicated is all visual, and you don’t need to understand genre conventions as much. Instagram in its early photo filter days was fairly hot media, but it cooled down when it became the de facto social status app. Now there's interplay between what’s posted and how many likes it gets, and from whom, and other social dynamics like private versus public posting. There is still some ambiguity, but as a medium it’s more information-complete than Twitter or texting.

Facebook: hot. Unlike Twitter, which is a muttering mass of inside jokes, or Instagram, which is warmer but still has some cool elements to it, Facebook is more like a newspaper. It’s not holding anything back. It’s a patchwork mosaic of yelling: Acknowledge this! Be angry at this! Celebrate this! There’s not a lot of mystery on Facebook, and it doesn’t take much fluency to use it correctly. The information being communicated is all right there, blasted at you. Facebook may have started out cooler, back when it was college kids navigating social status (as Instagram is used now). But it’s heated up steadily since then.

YouTube: scorching hot. We’re going to talk about YouTube later.

Now, remember: when we say Hot and Cool media, we’re not talking about the content. We’re talking about the medium itself. What The Medium Is the Message means is that the choice of media creates a stage for what follows. Hot media creates space for hot communication; cool media creates space for cool communication. Hot media heats things up; cool media cools things down.

Think about the difference between communicating by texting (cool) versus email (hot). Typographically, there’s no difference between the two. But email is understood to be a single-shot method of communication, which is hot and high-resolution, whereas texting is understood to be a dialogue: it’s a cool, chatty medium by nature, where little information is actually exchanged. Communicating by email, regardless of the content, will generally heat things up and force directness. Communicating by text will generally cool things down and invite ambiguity.

Meanwhile, the physical properties of the medium you choose will also influence the temperature of what’s being communicated. A photograph is hotter than a pencil sketch: both are pictures, but the sketch is low-resolution and the photograph is high-definition.

What’s hottest? You might think the highest-resolution format of all would be visual, whether typographic or video. But it’s not. It’s audio.

Audio: the hottest format of all

Audio, especially verbal speech, is tremendously high in information content. Most people are unaware of this. We mistakenly think of information as sensory input being thrown at us, usually with a bias towards our visual senses. But information isn’t what we’re told; it’s what we understand. Audio and speech resolve uncertainty and communicate meaning more powerfully than any other format.

Audible speech burns hot with information. Intonation, accents, innuendo, vocal phrasing, emphasis and pauses all communicate far more than a transcript can. Audio is the format for “You all know exactly what I’m talking about, because of the way I’m saying it.” Audio is how you communicate what you really mean, straight into our ears, headphones and car radios, intimately and directly. Music is good at this, but speech is even better.

Here's an exercise you can do: speaking out loud, say the word “tonight” twenty different ways, where each way is communicating something distinct. You can say “tonight” in a way that’s intrigued, satisfied, tired, horny, dejected, anxious, suspicious, hesitant, desperate, or any number of ways - and the person you’re talking to will know exactly what you mean. You can’t do that easily with image or text. A transcript of the word tonight just says tonight: flat, ambiguous. Our eyes treat it neutrally. But our ears don’t. Our ears are hyper-discriminatory.

Whatever it is that’s being communicated, audio will heat it up. Imagine you’re in a confrontation with your landlord, and you can communicate either by text message or by phone (cooler, back-and-forth dialogue), or by email or voicemail (hotter, one-shot blasts). Within each pair, text keeps things chill, whereas audio forces the issue.

When you present information in an audio-first format, or especially in an audio-only format, it heats up what’s being communicated, and saturates its information content. What may have seemed ambiguous or flat when presented in text or mixed media format won’t be interpreted ambiguously by your ears. Your ears understand what’s really being said, and they seek hot content.

There’s a famous story about the Nixon-Kennedy debates that I misunderstood for a long time. Following a presidential debate between Richard Nixon and JFK, those who had listened over the radio overwhelmingly felt that Nixon had won, whereas those who watched on TV felt that JFK won. I remember originally hearing this story and thinking that the point was somehow that TV was more “superficial” than radio, and that JFK’s handsome face or easy on-screen charm somehow overruled the debate’s substance on TV but not on the radio.

I’ve now come to understand that this wasn’t the point at all. The lesson has nothing to do with the content of what either of them was saying. The content doesn’t matter. What matters is that Nixon was a Hot candidate: sharp, saturated with information, abrasive, and in your face. But JFK was a Cool candidate: relaxed, speaking easy, in slogans that invited multiple interpretations, creating plenty of gaps for the audience to fill in themselves.

Hot, high-resolution media like radio created space for a hot style and messenger like Nixon really well. But cool, low-resolution media like 1960s TV rejected him. Nixon sounded powerful and alive on the radio, but abrasive and mismatched on TV. Meanwhile, Kennedy seemed slow, empty and lethargic on a hot medium like radio, but fit smoothly and confidently on TV. It couldn’t matter less what they said: our cool and neutral eyes liked Kennedy; our hot and discriminatory ears liked Nixon.

Hot media seeks and creates hot content and hot messengers. A voice like Howard Stern’s, coming straight into our hyper-discriminating ears, is a powerful thing, and when we hear it, we want more of it. Put headphones on, turn off the lights, and put Howard’s voice in your ears - audio only, in the dark - and you’ll experience heat. Cool messages and messengers won’t cut it anymore - not on hot media; not on headphones. They feel flat and dead.

On other forms of media, cool messengers and cool messages and cool values and cool society do well, because there’s a cool environment for them that fits right. Barack Obama was a successful Cool candidate. Yes We Can was a perfectly cool message: it doesn’t really say anything, but helpfully leaves a gap for us to fill in however we’d like. That message fit perfectly on the cool format of late-2000s mixed internet media, where it worked as a blank canvas. It’s an entirely different temperature from Make America Great Again. There’s no ambiguity there. We know exactly what Make America Great Again means. If you’re not sure, go listen to it spoken out loud, on talk radio.

A good match between message and medium goes a long way; a bad match usually fizzles out fast. That’s why The Medium Is the Message - dominant cool media mean popular cool messages, which in the long run - averaged out over all their content - just means a cool temperature. Hot media mean hot messages, hot temperature, and hot consequences. It’s relatively rare for mismatches to thrive.

There’s a possible exception here worth noting, which is Donald Trump’s Twitter account. It’s quite ironic, actually, that people think of Trump - a supernaturally hot entity who rides a hot political wave and a hot tide of resentment - as somehow this great master of Twitter, one of the coldest forms of mass media today.

Here’s the thing: he’s not actually a good fit for Twitter. His tweets are a jarring spectacle, clashing badly with the way the medium normally works. Trump’s tweets only really work because he’s already president, and because the clash is part of the show. And even then, Twitter is not how Trump actually talks to his base or flexes populist power. He did not rise to the presidency because of Twitter.

You want to know where he sounds positively presidential? On the radio.

Trump sounds incredible on the radio.

Come back next week for part 2: when you put headphones on everybody, and put a hot, private channel in everyone’s pocket all day, should we be surprised that things seem to be heating up?

In other news and notes this week, in Scarcity in the Software Century we got to a nice recap point where we can look at how everything fits together in context so far:

Over the rest of the book, we’ll be using this template an awful lot, as we fill in what’s been happening with software, the internet, and the modern innovation economy. If you haven’t already, you can sign up at scarcity.substack.com for weekly-ish chapters and discussion forums.

Also, don’t miss today’s bonus issue: a very special interview with Brent Beshore, coming to your inbox just a few minutes after this one. You can find a permalink to the interview here:

An interview with Brent Beshore | alexdanco.com

Here’s an interesting head-scratcher:

Combining probability forecasts: 60% + 60% = 60%, but "Likely" + "Likely" = "Very Likely" | Mislavsky & Gaertig

What's going on here is simple but interesting. When we give a probability forecast for something (e.g. "How likely is it that Trump will get reelected?"), we usually see answers fall into one of two categories: either numeric / quantitative ("60%") or verbal / qualitative ("Likely"). A lot of the time, we have to rely on more than one probability forecast - we make up our mind by integrating several different sources of information and then deciding for ourselves what to conclude.

This paper asked something simple but important: do we combine probability forecasts differently if they're presented numerically versus verbally? Their answer was yes, in an interesting way:

When we hear multiple numeric predictions, we take a numeric average: if you hear "60%" and then next hear "65%", we integrate that into a combined probability assessment of ~63% or something. Makes sense.

But when we hear multiple verbal predictions, we do something else: we combine them in a way that's more additive. So if you hear "Likely" and then next hear "Likely", we mentally integrate that into a probability assessment of "Very Likely". (On the low end it works the same way: Unlikely + Unlikely = Very Unlikely.)
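One way to see why the two styles of combining give different answers is a toy Python sketch. It assumes, purely for illustration, that "Likely" and "Unlikely" stand for 60% and 40%, and it models the verbal, additive style as pooling each forecast as independent evidence in log-odds against a 50% prior. That pooling rule is a swapped-in textbook technique, not the model used by Mislavsky & Gaertig, and the function names are hypothetical.

```python
import math

def average(probs):
    """Numeric-style combination: a plain mean of the forecasts."""
    return sum(probs) / len(probs)

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Inverse of logit: converts log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def pool_as_independent_evidence(probs, prior=0.5):
    """Verbal-style combination (an assumed model): treat each forecast as a
    separate, independent piece of evidence and add up how far each one
    shifts the log-odds away from the prior."""
    total = logit(prior) + sum(logit(p) - logit(prior) for p in probs)
    return sigmoid(total)

LIKELY, UNLIKELY = 0.6, 0.4  # hypothetical numeric stand-ins for the words

print(average([LIKELY, LIKELY]))                           # stays at 0.6
print(pool_as_independent_evidence([LIKELY, LIKELY]))      # ~0.69: "Very Likely"
print(pool_as_independent_evidence([UNLIKELY, UNLIKELY]))  # ~0.31: "Very Unlikely"
```

Averaging two 60% forecasts leaves you at 60%, while treating them as two independent confirmations pushes you to 9/13, roughly 69%, and two 40% forecasts drop to 4/13, roughly 31%. The difference lies entirely in the assumption: averaging treats the forecasts as noisy readings of one number, while pooling treats each one as new evidence that should move you further.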

If you know anything about statistics, something very interesting has happened here. When we're presented with numerical forecasts, our brains default to a Fisherian worldview. It's a worldview that basically says, "I acknowledge that data is noisy, but that at least we're asking the right question." To a Fisherian, repeated confirmations of "60%" tell you: "Yeah, 60% is probably correct."

Whereas when we're given verbal forecasts, our brains become more Bayesian. Bayesians have a lot more faith in observational data, but less faith that the question being asked was correct in the first place. To a Bayesian, repeated confirmations of "Likely" tell you, "This is probably the right track. I'm becoming more confident in our understanding of the situation. Likely is upgraded to Very Likely."

An interesting side note is that this flies right in the face of celebrity statisticians like Nate Silver, who are on two quests: 1) to get us to think more numerically, and 2) to get us to think more like Bayesians. The two goals appear to be in opposition.

Alex Danco's Newsletter
