> Hello, my name is Julian. This is my lifelog and digital playground.

The case against conversational interfaces

01 Intro

Conversational interfaces are a bit of a meme. Every couple of years a shiny new AI development emerges and people in tech go “This is it! The next computing paradigm is here! We’ll only use natural language going forward!”. But then nothing actually changes and we continue using computers the way we always have, until the debate resurfaces a few years later.

We’ve gone through this cycle a couple of times now: Virtual assistants (Siri), smart speakers (Alexa, Google Home), chatbots (“conversational commerce”), AirPods-as-a-platform, and, most recently, large language models.

I’m not entirely sure where this obsession with conversational interfaces comes from. Perhaps it’s a type of anemoia, a nostalgia for a future we saw in Star Trek that never became reality. Or maybe it’s simply that people look at the term “natural language” and think “well, if it’s natural then it must be the logical end state”.

I’m here to tell you that it’s not.

02 Data transfer mechanisms

When people say “natural language” what they mean is written or verbal communication. Natural language is a way to exchange ideas and knowledge between humans. In other words, it’s a data transfer mechanism.

Data transfer mechanisms have two critical factors: speed and lossiness.

Speed determines how quickly data is transferred from the sender to the receiver, while lossiness refers to how accurately the data is transferred. In an ideal state, you want data transfer to happen at maximum speed (instant) and with perfect fidelity (lossless), but these two attributes are often a bit of a trade-off.

Let’s look at how well natural language does on the speed dimension:

The first thing I should note is that these data points are very, very simplified averages. The important part to take away from this table is not the accuracy of individual numbers, but the overall pattern: We are significantly faster at receiving data (reading, listening) than sending it (writing, speaking). This is why we can listen to podcasts at 2x speed, but not record them at 2x speed.

To put the writing and speaking speeds into perspective, we form thoughts at 1,000-3,000 words per minute. Natural language might be natural, but it’s a bottleneck.
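To make the bottleneck concrete, here’s a quick back-of-the-envelope sketch (TypeScript, purely for illustration) using the rough averages this essay works with – typing at ~60 words per minute, speaking at ~150, thinking at ~1,000-3,000:

```typescript
// Back-of-the-envelope numbers, not measurements: how long does it take to
// get a 500-word idea out of your head through each channel?
const ratesWpm: Record<string, number> = {
  typing: 60,     // rough desktop keyboard average
  speaking: 150,  // rough conversational speech average
  thinking: 2000, // midpoint of the ~1,000-3,000 wpm estimate
};

const ideaLengthWords = 500;

for (const [channel, wpm] of Object.entries(ratesWpm)) {
  console.log(`${channel}: ${(ideaLengthWords / wpm).toFixed(1)} minutes`);
}
// typing: 8.3 minutes, speaking: 3.3 minutes, thinking: 0.3 minutes.
// The output channel is the bottleneck, not the idea.
```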

And yet, if you think about your day-to-day interactions with other humans, most communication feels really fast and efficient. That’s because natural language is only one of many data transfer mechanisms available to us.

For example, instead of saying “I think what you just said is a great idea”, I can just give you a thumbs up. Or nod my head. Or simply smile.

Gestures and facial expressions are effectively data compression techniques. They encode information in a more compact, but lossier, form to make it faster and more convenient to transmit.

Natural language is great for data transfer that requires high fidelity (or as a data storage mechanism for async communication), but whenever possible we switch to other modes of communication that are faster and more effortless. Speed and convenience always win.

My favorite example of truly effortless communication is a memory I have of my grandparents. At the breakfast table, my grandmother never had to ask for the butter – my grandfather always seemed to pass it to her automatically, because after 50+ years of marriage he just sensed that she was about to ask for it. It was like they were communicating telepathically.

*That* is the type of relationship I want to have with my computer!

03 Human Computer Interaction

Similar to human-to-human communication, there are different data transfer mechanisms to exchange information between humans and computers. In the early days of computing, users interacted with computers through a command line. These text-based commands were effectively a natural language interface, but required precise syntax and a deep understanding of the system.

The introduction of the GUI primarily solved a discovery problem: Instead of having to memorize exact text commands, you could now navigate and perform tasks through visual elements like menus and buttons. This didn’t just make things easier to discover, but also more convenient: It’s faster to click a button than to type a long text command.

Today, we live in a productivity equilibrium that combines graphical interfaces with keyboard-based commands.

We still use our mouse to navigate and tell our computers what to do next, but routine actions are typically communicated in the form of quick-fire keyboard presses: ⌘b to format text as bold, ⌘t to open a new tab, ⌘c/⌘v to quickly copy things from one place to another, etc.

These shortcuts are not natural language though. They are another form of data compression. Like a thumbs up or a nod, they help us to communicate faster.

Modern productivity tools take these data compression shortcuts to the next level. In tools like Linear, Raycast or Superhuman every single command is just a keystroke away. Once you’ve built the muscle memory, the data input feels completely effortless. It’s almost like being handed the butter at the breakfast table without having to ask for it.
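If you want a mental model for what these tools do, here’s a tiny, made-up sketch of a shortcut-driven command registry – the bindings and commands are illustrative, not the actual APIs of Linear, Raycast or Superhuman:

```typescript
// Made-up sketch of a shortcut-driven command registry (not the actual API of
// Linear, Raycast or Superhuman). One keystroke maps directly to one action.
type Command = { key: string; description: string; run: () => void };

const commands: Command[] = [
  { key: "b", description: "Toggle bold", run: () => console.log("bold") },
  { key: "t", description: "New tab", run: () => console.log("new tab") },
  { key: "c", description: "Copy selection", run: () => console.log("copy") },
];

function handleKeypress(key: string): void {
  // No parsing, no ambiguity: the keystroke *is* the command.
  const command = commands.find((c) => c.key === key);
  command?.run();
}

handleKeypress("b"); // "bold"
```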

Touch-based interfaces are considered the third pivotal milestone in the evolution of human computer interaction, but they have always been more of an augmentation of desktop computing rather than a replacement for it. Smartphones are great for “away from keyboard” workflows, but important productivity work still happens on desktop.

That’s because text is not a mobile-native input mechanism. A physical keyboard can feel like a natural extension of your mind and body, but typing on a phone is always a little awkward – and it shows in data transfer speeds: Average typing speeds on mobile are just 36 words-per-minute, notably slower than the ~60 words-per-minute on desktop.

We’ve been able to replace natural language with mobile-specific data compression algorithms like emojis or Snapchat selfies, but we’ve never found a mobile equivalent for keyboard shortcuts. Guess why, almost 20 years after the introduction of the iPhone, we still don’t have a truly mobile-first productivity app?

“But what about speech-to-text,” you might say, pointing to reports about increasing usage of voice messaging. It’s true that speaking (~150 wpm) is a faster data transfer mechanism than typing (~60 wpm), but that doesn’t automatically make it a better method to interact with computers.

We keep telling ourselves that previous voice interfaces like Alexa or Siri didn’t succeed because the underlying AI wasn’t smart enough, but that’s only half of the story. The core problem was never the quality of the output function, but the inconvenience of the input function: A natural language prompt like “Hey Google, what’s the weather in San Francisco today?” just takes 10x longer than simply tapping the weather app on your homescreen.

LLMs don’t solve this problem. The quality of their output is improving at an astonishing rate, but the input modality is a step backwards from what we already have. Why should I have to describe my desired action using natural language, when I could simply press a button or keyboard shortcut? Just pass me the goddamn butter.

04 Conversational UI as Augmentation

None of this is to say that LLMs aren’t great. I love LLMs. I use them all the time. In fact, I wrote this very essay with the help of an LLM.

Instead of drafting a first version with pen and paper (my preferred writing tools), I spent an entire hour walking outside, talking to ChatGPT in Advanced Voice Mode. We went through all the fuzzy ideas in my head, clarified and organized them, explored some additional talking points, and eventually pulled everything together into a first outline.

This wasn’t just a one-sided “Hey, can you write a few paragraphs about x” prompt. It felt like a genuine, in-depth conversation and exchange of ideas with a true thought partner. Even weeks later, I’m still amazed at how well it worked. It was one of those rare, magical moments where software makes you feel like you’re living in the future.

In contrast to typical human-to-computer commands, however, this workflow is not defined by speed. Like writing, my ChatGPT conversation is a thinking process – not an interaction that happens post-thought.

It should also be noted that ChatGPT does not replace any existing software workflow in this example. It’s a completely new use case.

This brings me to my core thesis: The inconvenience and inferior data transfer speeds of conversational interfaces make them an unlikely replacement for existing computing paradigms – but what if they complement them?

The most convincing conversational UI I have seen to date was at a hackathon where a team turned Amazon Alexa into an in-game voice assistant for StarCraft II. Rather than replacing mouse and keyboard, voice acted as an additional input mechanism. It increased the bandwidth of the data transfer.

You could see the same pattern work for any type of knowledge work, where voice commands are available while you are busy doing other things. We will not replace Figma, Notion, or Excel with a chat interface. It’s not going to happen. Neither will we forever continue the status quo, where we constantly have to switch back and forth between these tools and an LLM.

Instead, AI should function as an always-on command meta-layer that spans across all tools. Users should be able to trigger actions from anywhere with simple voice prompts without having to interrupt whatever they are currently doing with mouse and keyboard.
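To make this less abstract, here’s a hypothetical sketch of such a meta-layer: a voice prompt is parsed into a structured intent and routed to whichever tool can handle it, while you keep working in the foreground app. The tool names, intent format and handlers below are made up, not real integrations:

```typescript
// Hypothetical sketch of an OS-level command meta-layer: a voice prompt is
// parsed into an intent and routed to whichever tool can handle it, while the
// user keeps working with mouse and keyboard in the foreground app.
type Intent = { action: string; target: string; payload: string };

// Illustrative handlers – not real Linear or Notion integrations.
const handlers: Record<string, (intent: Intent) => void> = {
  linear: (i) => console.log(`[linear] ${i.action}: "${i.payload}"`),
  notion: (i) => console.log(`[notion] ${i.action}: "${i.payload}"`),
};

// Stand-in for an LLM call that turns a transcript into a structured intent.
function parsePrompt(transcript: string): Intent {
  if (transcript.toLowerCase().includes("issue")) {
    return { action: "create issue", target: "linear", payload: transcript };
  }
  return { action: "append note", target: "notion", payload: transcript };
}

function dispatch(transcript: string): void {
  const intent = parsePrompt(transcript);
  handlers[intent.target]?.(intent); // no app switch, no interruption
}

dispatch("Create an issue for the login bug");
// [linear] create issue: "Create an issue for the login bug"
```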

For this future to become reality, AI needs to work at the OS level. It’s not meant to be an interface for a single tool, but an interface across tools. Kevin Kwok famously wrote that “productivity and collaboration shouldn’t be two separate workflows”. And while he was referring to human-to-human collaboration, the statement is even more true in a world of human-to-AI collaboration, where the lines between productivity and coordination are becoming increasingly blurry.

The second thing we need to figure out is how we can compress voice input to make it faster to transmit. What’s the voice equivalent of a thumbs-up or a keyboard shortcut? Can I prompt Claude faster with simple sounds and whistles? Should ChatGPT have access to my camera so it can change its answers in realtime based on my facial expressions?

Even as a secondary interface, speed and convenience are all that matter.

05 Closing thoughts

I admit that the title of this essay is a bit misleading (made you click though, didn’t it?). This isn’t really a case against conversational interfaces, it’s a case against zero-sum thinking.

We spend too much time thinking about AI as a substitute (for interfaces, workflows, and jobs) and too little time about AI as a complement. Progress rarely follows a simple path of replacement. It unlocks new, previously unimaginable things rather than merely displacing what came before.

The same is true here. The future isn’t about replacing existing computing paradigms with chat interfaces, but about enhancing them to make human-computer interaction feel effortless – like the silent exchange of butter at a well-worn breakfast table.

Thanks to Blake Robbins, Chris Paik, Jackson Dahl, Johannes Schickling, Jordan Singer, and signüll for reading drafts of this post.

Mar 27, 2025  ×  Berlin, DE

Thoughts on Clubhouse

Today marks the 77th day since I got onboarded to Clubhouse.

Clubhouse, for those not familiar with it, is essentially an audio-first social network. It’s kind of a mix between Reddit and a podcast. An interactive radio show. The app lets you jump into different chat rooms and participate in – or just listen to – live audio conversations around different topics.

It’s an exciting product.
And yet, in these last 77 days, I have actively used Clubhouse exactly three times.

The problem with Clubhouse is that you can only listen to conversations live as they happen. Given that the majority of the current user base is in North America, the most interesting conversations usually happen in the middle of the (European) night when I’m asleep.

The first thing I see on my phone after I wake up is a handful of Clubhouse notifications telling me about all the interesting conversations I missed. I wish I could just download these conversations as podcasts and listen to them later.

Some have pointed out that the live nature of Clubhouse is exactly what makes it so special, comparing it to the ephemerality of Snapchat. And while I disagree on the Snapchat comparison (ephemerality ≠ synchronous creation and consumption), I do think it makes sense for Clubhouse to find its own native format rather than compete with podcasts directly.

While Clubhouse feels like live podcasts at the moment, I think over time it will probably evolve into something else. Something more unique.

The current state of Clubhouse reminds me of the early days of Twitter: People knew it was a unique new form factor, but they didn’t know how to use it yet. Most tweets were just short status updates. It took some time until the platform found its current form and use cases.

One of those use cases is “Twitter as a second screen”: live commentary on TV shows and sports events. I strongly suspect that this will become one of Clubhouse’s main use cases as well.

As I pointed out in AirPods as a Platform, I see audio primarily as a secondary interface: You listen to music while you’re working out, for example. You consume podcasts while you are driving or commuting. You talk on Discord while you’re playing a game.

Audio is a medium that supports and augments other activities.

So instead of thinking about whether Clubhouse should make conversations available as downloads, a perhaps more interesting question is what activities could best be augmented with live audio? What does Clubhouse as an audio layer for other content look like?

The most obvious use case seems to be sports (and other events that have to be consumed live). I would love to replace the audio track of my TV sports broadcasters with a select group of (Clubhouse) experts whose opinions I’m actually interested in.

I wonder what other events or activities this would work for.
Do you have any ideas?

Dec 21, 2020  ×  Basel, CH

AirPods as a Platform

01 Intro

One of the favorite activities of tech analysts, VCs and similar Twitter armchair experts is to predict what the next big technology platform might be.

The usual suspects that come up in these conversations are VR/AR, crypto, smart speakers and similar IoT devices. A new contestant that I’ve seen come up more frequently in these debates recently is Apple’s AirPods.

Calling AirPods “the next big platform” is interesting because at the moment, they are not even a small platform. They are no platform at all. They are just a piece of hardware.

But that doesn’t mean they can’t become a platform.

02 What is a platform?

Let’s first take a look at what a platform actually is.

At its core, a platform is something that others can build on top of. A classic example would be an operating system like iOS: By providing a set of APIs, Apple created a playground for developers to build and run applications on. In fact, new input capabilities such as the touch interface, gyroscope sensor and camera allowed developers to create unique applications that weren’t possible before.

Platforms are subject to network effects: More applications attract more users to the platform, while more users in turn attract more developers who build more apps.

It’s a classic flywheel effect that creates powerful winner-takes-all dynamics. This explains why there are only two (meaningful) mobile operating systems – iOS and Android.

It also explains why everyone is so interested in upcoming platforms – and why Apple might be interested in making AirPods a platform.

03 Why AirPods aren’t a platform

In their current form, AirPods are not a platform. They don’t provide any unique input or output functionalities that developers could leverage. Active Noise Cancellation and Transparency Mode are neat but not new or AirPods-exclusive features – other headphones offer exactly the same. In either case, developers don’t have any control over them and thus can’t build applications that use these functionalities.

Some say that AirPods will give rise to more audio apps because they are “always in”, which in turn will lead to more (and perhaps new forms of) audio content. That might be true – content providers are always looking for alternative routes to get consumers’ attention – but, again, it does not make AirPods a platform. You can use any other pair of headphones with these audio apps as well.

If Apple wants to make AirPods a platform, it needs to open up some part of the AirPods experience to developers so that they can build new things on top of it.

04 On Siri & Voice Platforms

The most obvious choice here is Siri, which is already integrated into every pair of AirPods.

In contrast to other voice assistants like Alexa and Google Assistant, Apple has never really opened up Siri for 3rd-party developers. If they did, it would create a new platform that could have its own ecosystem of apps and developers.

But I’m not convinced that this is Apple’s best option.
Let me explain why.

Opening up Siri wouldn’t make AirPods a platform, it would make Siri a platform. This might sound like a technicality, but I think it’s an important difference. As Jan König brilliantly summarized in this article, voice isn’t an interface for one device – it’s an interface across devices. It’s more of a meta-layer that should tie different products together to enable multi-device experiences.

This means Apple has little interest in making Siri an AirPods-exclusive. Voice-based computing works best when it’s everywhere. It’s about reach, not exclusivity. This is part of the reason why Google and Amazon excel at it.

At the moment, Siri’s capabilities are considerably behind those of Google Assistant and Alexa. Again, this isn’t overly surprising: Google’s and Amazon’s main job is finding the right answers to users’ questions. The required ML capabilities for a smart assistant are among the core competencies of these two companies.

But even Amazon and Google haven’t really figured out the platform part yet, as indicated by the lack of breakout 3rd-party voice applications. It seems like the two platforms are still looking for their product-market-fit beyond being just cheap speakers that you can also control with your voice.

This is partly because the above-mentioned use case of voice as a cross-device layer isn’t something developers can build with the current set of APIs.

The other big reason I see is that people are mistaking voice for a replacement for other interfaces. Movies like Her paint a future where human-computer interaction primarily occurs via voice-powered smart assistants, but in reality, voice isn’t great as a primary or stand-alone interface. It works best as an *additional* input/output channel that augments whatever else you are doing.

Let me give you an example: Saying “Hey Google, turn up the volume” takes 10x longer than simply pressing the volume-up button on your phone. It only makes sense when your hands are busy doing other things (kitchen work, for example).

The most convincing voice app I have seen to date was at a hackathon where a team used the StarCraft API to build voice-enabled game commands. Not to replace your mouse and keyboard but to give you an additional input mechanism. Actual multitasking.

05 What Apple Should Build

I’m not against Apple opening Siri for developers. On the contrary, given that AirPods are meant to be worn all the time, a voice interface for situations that require multitasking is actually a very good idea. But voice input should remain the exceptional case. And it shouldn’t be what makes AirPods a platform.

Instead of voice, I’d love to see other input mechanisms that allow developers to build new ways for users to interact with the audio content they consume.

Most headsets currently on the market offer the following actions with one (or multiple) clicks of a physical button: play/pause, skip to the next or previous track, answer or end calls, and summon the voice assistant.

These inputs were invented a long time ago and there has been almost zero innovation since. Why has no one thought about additional buttons or click mechanisms that allow users to interact with the actual content?

For example, when listening to podcasts I often find myself wanting to bookmark things that are being talked about. It would be amazing if I could simply tap a button on my headphones which would add a timestamp to a bookmarks section of my podcast app. (Or, even better, a transcript of the ~15 seconds of content before I pressed the button, which is then also automatically added to my notes app via an Apple Shortcut.)
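Here’s a rough sketch of what that could look like – entirely hypothetical, since none of these names map to a real AirPods or Shortcuts API:

```typescript
// Hypothetical sketch: a headphone button press captures the current playback
// position plus roughly the last ~15 seconds of transcript. None of these
// names map to a real AirPods or Shortcuts API.
type Bookmark = { episode: string; timestampSeconds: number; excerpt: string };

const bookmarks: Bookmark[] = [];

function onBookmarkButtonPressed(
  episode: string,
  positionSeconds: number,
  transcriptBuffer: string[], // rolling buffer of recent transcript snippets
): Bookmark {
  const bookmark: Bookmark = {
    episode,
    timestampSeconds: positionSeconds,
    excerpt: transcriptBuffer.slice(-3).join(" "), // roughly the last ~15s
  };
  bookmarks.push(bookmark);
  return bookmark; // could then be forwarded to a notes app via a Shortcut
}

console.log(
  onBookmarkButtonPressed("Some podcast episode", 1835, [
    "a few snippets of",
    "automatically transcribed audio",
    "from the last fifteen seconds",
  ]),
);
```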

Yes, you could build the same with voice as the input mechanism, but as we discussed earlier, saying “Hey Siri, please bookmark this!” just doesn’t seem very convenient.

While podcast apps might use the additional button as a bookmarking feature, Spotify could make it a Like button to quickly add songs to your Favorites playlist. Other developers could build completely new applications: Think about interactive audiobooks or similar two-way audio experiences, for example.

This is the beauty of platforms: You just provide developers with a set of tools and they will come up with use cases you hadn’t even thought about. Crowdsourced value creation.

06 Closing Thoughts

(1) The input mechanism I describe doesn’t have to be a physical button. In fact, gesture-based inputs might be even more convenient. If AirPods had built-in accelerometers, users could interact with audio content by nodding or shaking their heads. Radar-based sensors like Google’s Motion Sense could also create an interesting new interaction language for audio content. (A toy sketch of what such nod detection could look like follows after these notes.)

(2) You could also think about the Apple Watch as the main input device. In contrast to the AirPods, Apple opened the Watch for developers from the start, but it hasn’t really seen much success as a platform. Perhaps a combination of Watch and AirPods has a better chance of creating an ecosystem with its own unique applications?

(3) One thing to keep in mind is that Apple doesn’t really have an interest in making AirPods a standalone platform. The iPhone (or rather iOS) will always be the core platform that Apple cares about. Instead of separate iPhone, Watch and AirPods ecosystems, think about Apple’s strategy as more of a multi-platform bundle. Even as a platform, AirPods will remain more of an accessory that adds stickiness to the existing iPhone ecosystem.
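As mentioned in note (1), here’s a toy sketch of what nod detection from hypothetical accelerometer data could look like – the API, units and thresholds are made up:

```typescript
// Toy sketch of head-nod detection from accelerometer pitch samples (degrees).
// Hypothetical and illustrative only – not based on any real AirPods sensor API.
// A nod is approximated as the pitch dipping below a threshold and recovering.
function detectNod(pitchSamples: number[], dipThresholdDeg = -15): boolean {
  const dipIndex = pitchSamples.findIndex((p) => p < dipThresholdDeg);
  if (dipIndex === -1) return false;
  // Did the head come back up after the dip?
  return pitchSamples.slice(dipIndex).some((p) => p > -5);
}

console.log(detectNod([0, -5, -20, -18, -2, 1])); // true  -> e.g. "like this song"
console.log(detectNod([0, -3, -4, -2, 0, 1]));    // false -> no action
```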

Do you have thoughts or feedback on this post?
If so, I’d love to hear it!

Thanks to Jan König for reading drafts of this post.
Apr 19, 2020  ×  Berlin, DE