Written by Greg Martin
20 min read
APR 30 2026

Making AI Sound (Almost) Human

Crafting realistic conversations using AI voice in ElevenLabs Creative.

In April 2026, I was asked to lead a workshop on using ElevenLabs Creative to craft realistic conversations with AI voice. What follows is that presentation adapted into article format.

We've all heard bad artificial voices; in fact, some of us grew up with them. So I was genuinely surprised when, in late 2024, a project needing voiceover led me to investigate how much AI voice had improved.

AI voice today kind of blows my mind.

Ok, it has improved a LOT. Far more than I expected, especially given how often YouTube and TikTok still serve up robotic, lifeless VO.

Is it human quality? Yes, in small doses. Can you still tell it's AI? Almost always. But it's getting better fast. To appreciate where AI voice stands today, two things are worth keeping in mind. First, you have to not hate AI. If you do (no judgment), no amount of progress is going to impress you. Second, consider what it's replacing. Is AI voice better than a real human performance? Almost certainly not. Is it a massive improvement over the robotic voices we've lived with until now? Yes. A thousand times over.

Introducing ElevenLabs Creative

If you haven't heard of ElevenLabs, you've likely heard their work, whether in a more human-sounding call agent or while listening to a news article. They're not the only player in the AI voice space, but they're a significant one. Since launching in 2022, they've built a multifaceted suite of APIs, agents, and creative tools for multimedia production. For this article, I'm focusing purely on the latter: the ElevenLabs Creative toolset, which I spent considerable time with on a recent project.

So let's talk a little about that project, and where AI voice played a role.

Let's build an engaging educational platform for beginner day traders.

In 2019, I did some work for a client building a trading journal tool for day traders. We reconnected in 2024 to explore other opportunities in that space, and the one we latched onto was education. Day trading is a dense, often intimidating topic. Beginners looking for information have to sift through a lot of noise to find something useful. There's a lot of ego, and a lot of paywalls, and very few well-crafted educational experiences. That last part is surprising, given how much the educational space has grown and evolved. So we started exploring what it would take to build a platform for day trader education, with three goals: make it informative, make it engaging, and make it accessible.

I want to share a little about this project so you have context for why I reached for AI voice as a tool, and what I learned along the way.

Learning from the best.

Every project starts with a bit of research, and we had a wealth of experiences to learn from (pun definitely intended). Here are three of our top inspirations, and why.

Duolingo is something of a gold standard for education, teaching through trial and error. It has some lovely UI patterns for quizzing users, which we took notes on. Its approach to pedagogy, however, wasn't the best fit for what we were trying to do. Duolingo emphasizes learning through quiz questions, but doesn't do much to explain concepts beforehand. Still, lots to learn from.

Codecademy has a lovely blend of explanation and applied learning throughout. I particularly liked their use of "fill in the blank" quizzes at the end of more conceptual lessons. Unlike matching or multiple choice, this format pushes users to recall answers on their own, without a crutch, which is a more reliable way to verify that key ideas have sunk in.

We loved Brilliant's focus on STEM topics. The tighter scope makes the offering easier to understand and, like a restaurant with a focused menu, allows it to craft better materials. We also appreciated its pattern of explanation followed by quiz questions, which keeps lessons interactive and prevents them from becoming just another textbook experience. I particularly liked how it turns both correct and incorrect quiz answers into learning moments, rather than penalizing or monetizing students the way Duolingo does.

What’s our ideal educational format?

When thinking about the ideal format for making complex topics accessible, we immediately thought of podcasts. I'm a reflective learner, meaning I learn better from dialogue that approaches a topic from multiple angles and probes with questions, rather than a straight lecture from a single voice.

“Hey, that’s kind of what podcasts do.”

Podcasts with more than one person, whether a straight discussion or an interview, fit this mold perfectly. Having two voices explore a topic is not only engaging (assuming the hosts are enthusiastic) but allows for questions and illuminating back-and-forth, where complex ideas can be picked apart from multiple directions. It makes me wish all my college lectures had been taught by two or three-person panels. So many topics would have benefited from the energy multiple voices bring.

Combining this with our research into existing educational platforms, we quickly aligned on a lesson format built around a series of voice and motion vignettes, broken up by short quiz sections. This would give us a "learn a little, then apply it" cadence, with each segment teaching just enough to quiz on before moving to the next.

[Diagram of lesson structure]

But there we ran into a problem: neither of us wanted to record our own voices for a podcast-style lesson. Since the AI renaissance was, and still is, in full swing, we decided to explore what AI voice could do instead.

How good could a podcast built on AI voice actually be?

Enter NotebookLM. If you haven't looked into this tool from Google, you should. It's kind of awesome, allowing you to gather insights and query any informational content you put into it. One of its features is the ability to generate an AI-driven podcast from your content. We used this as a quick proof of concept for how our lesson content might sound as a podcast, and as a personal proof point that AI voice could create an engaging educational experience. And it delivered. From a rough draft of a single lesson, NotebookLM generated a 28-minute podcast that, aside from a few artifacts and glitches, was a thoroughly engaging ramble through our source material.

This made us feel solid on the hypothesis that AI voices can be crafted into podcast-style lessons that are engaging and educational, without distracting from what's being taught.

The only downside we saw in NotebookLM was the utter lack of control over how the information is presented, but we'll get into that in a moment. As a proof of concept, we thought it was brilliant.

Prototyping before the deep dive.

But we still needed to test our lesson format before committing to a full build, and that's where we come back to ElevenLabs and crafting conversations with AI voice. In February 2025, we took our working curriculum, selected a single lesson as our test case, and built it out as a fully dressed experience to put in front of others and see how our assumptions played out. Below are the first few vignettes from that prototype lesson (covering candlestick charts) to give you a sense of how the final prototyped experience looked, felt, and sounded.

Let’s talk about how I built this.

There's a lot more to our prototype lesson than I'm sharing here (it runs about 10 to 15 minutes), but I want to move away from the product discussion and start digging into how I built the experience — and what I learned about crafting conversations along the way. Specifically, I want to cover writing a script, selecting voices, tuning each line to sound human, and editing those lines into a realistic podcast-style conversation.

Writing our conversational script.

First, we need words. Without them, our AI voices have nothing to say. And as it turns out, writing a script is actually a challenge if you don't already do that sort of thing. It took a few trial runs, but the process my partner and I eventually landed on was simple: he would write the lesson content the way he wanted it, then hand it to me to translate into scripted dialogue while keeping his intent intact. Here's an example of his output:

A Candlestick Chart is a type of financial chart that provides the same information as a bar chart (Open, High, Low, and Close, or OHLC) but presents it in a visually intuitive format. Its design, which uses color-coded candlesticks, makes it easier to interpret trends and market sentiment at a glance.

Candlestick Body: The rectangular portion of the candlestick, known as the body, represents the range between the opening and closing prices during a specific timeframe.

Compact Display of Information: A single candlestick contains all four key price points (OHLC) within a specific timeframe.
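
For the code-inclined, the OHLC idea in that excerpt maps neatly onto a tiny data structure. Here's a minimal sketch in Python; the field and method names are mine, not from any trading library.

```python
from dataclasses import dataclass


@dataclass
class Candlestick:
    """One OHLC snapshot for a single timeframe (e.g. one minute)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def bullish(self) -> bool:
        # Price closed above where it opened: typically drawn green.
        return self.close > self.open

    @property
    def body(self) -> float:
        # The body spans the open-to-close range described above.
        return abs(self.close - self.open)


candle = Candlestick(open=100.0, high=104.0, low=99.5, close=102.5)
print(candle.bullish, candle.body)  # True 2.5
```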

Dry, textbook-style content. Not bad by any means, but not engaging — nor was it supposed to be. Here's what I would translate it into:

A: Alright, let's begin with the basics.

B: What exactly is a candlestick, and why does everyone use them in trading?

A: Candlesticks are a visual representation of a stock’s price used to show us how much it moved during a specific timeframe.

B: Right, so if you’re looking at, say, a one minute timeframe, then every candlestick represents how price has changed over the course of 60 seconds. Kind of like a snapshot.

A: Exactly. Each candlestick captures where the price opened at the beginning of the minute.

B: Known as its opening price.

A: Right, as well as how much it had moved 60 seconds later.

B: ...known as its closing price.

A: And they also capture the highest and lowest point the price reached in that time.

B: Oh, vocabulary time! The opening, high, low, and closing prices are collectively known as OHLC in trader parlance. You’ll probably hear that acronym a lot.

A: Oh yeah... you'll definitely hear that acronym a lot.

B: So, y'know, throw it around if you want to sound fancy.

A: Anyway, candlesticks capture all of this in a simple visual format we can read at a glance.

B: They're kind of awesome in a deeply nerdy way.

You get the idea. We went this route because we had clear opinions on how we wanted our lessons to flow — what we covered, how we explained it, and in what order we unpacked it so each segment built on the last. I'd take his block of dry, textbook-style content with annotated visual sketches and rewrite it so two imagined podcast hosts could make the material more engaging and accessible.

If we were doing this today, there would actually be three ways to generate a script, and they run on a spectrum from control to convenience.

Write your own (100% Control)

When you need absolute control over your script, there's no substitute for owning it yourself.


Co-author with LLM support (50/50 Control + Convenience)

Give your LLM of choice a solid prompt for your topic, any information sources (docs, URLs), and notes on how you'd like to shape the conversation. Include not only how you'd like the topic unpacked, but direction notes for the style of conversation: how many speakers, their relative personalities, how the voices should complement each other. In short, not just what you want the conversation to cover, but the experience you want it to present to your audience. Then prepare to tweak iteratively. A lot. At the end you should have something ready to paste into an ElevenLabs Creative project. Honestly, this is probably the route I'd take if I were doing this project over again today.


Let AI generate your podcast (100% Convenience)

ElevenLabs has a built-in podcast script generator that lets you pull in an information source (a doc, a URL, etc.), pick two voices, and hit the ground running. This approach is great for getting something on the board quickly, but it gives you very little control over the initial script. That might be fine for your purposes; just prepare yourself for a lot of editing.


We’ve got our script, now we need voices.


It's time to dig into ElevenLabs and their extensive voice library. Selecting one voice is hard enough (the dilemma of choice is real). But if you're building a conversation with two voices, how those voices work together matters a lot, because the voices you select tell a story. This means selecting two or more voices is a full-on casting exercise, and it helps to have a clear set of criteria for what you're looking for, even if that changes later.

Here are a few key qualities I considered while shopping for voices, and why they mattered for my specific project. Yours may differ.


Age

Just like in real life, the impression of age can affect the perceived credibility of your voices.

For my project we wanted voices that sounded like they were in their late twenties to early thirties. Not super young, not super old. That range lends credibility when discussing complex financial topics, while still leaving room for levity to keep things engaging.


Energy + Personality

Every voice comes with an energy level and an ingrained personality that you can enhance or subvert with your script.



High energy or bubbly personalities mesh well with social media or popular topics, while more serious discussions like politics or STEM may benefit from a calmer, more measured tone. Voices with wildly different personalities give you a hook to play off, whether for comedic effect or to represent two different audience perspectives. A classic example: a younger, more exuberant host interviewing an older, calmer voice to make a complex topic more accessible to a younger audience.



For my project we wanted relatively calm voices with room to stretch into humor and levity. Not stoic or bland, but not excessively bubbly either. Think calm, with the potential for dry wit. We also wanted them to be complementary, since we planned to have them trade speaking roles throughout the lessons.


Gender

All men? All women? Carefully neutral? Mixed? Like age, this decision is shaped by your topic and the kind of representation you want your conversation to project.

For my project, having a female voice was important — day trading is a male-dominated space and we wanted to push back on that. Our view was that having both male and female voices speak with equal confidence and authority would make the content more accessible to our entire audience, not just aspiring finance bros.


It's also worth noting that not all voices in ElevenLabs' library have the same audio quality. Some sound like they were recorded in a professional booth, while others have reverb or ambient noise, as if recorded in a kitchen. This is a minor thing, but unless you want to do extra audio cleanup, it's worth keeping an ear out for when pairing voices. That said, maybe having people sound like they're calling in from home is actually a good thing for your project.


Test-driving your voices.


As nice as the sample audio is for each voice in the ElevenLabs library, you have to take them for a spin to see how they mesh. Here's the sample script I used to test our finalist voice pairings. It not only removed the variable of random dialogue during the selection process, but also combined informational speech with more natural, human asides.


A

We shouldn't feel super confident we know what's about to happen unless we have more than one piece of evidence.

B

It’s super important. One of the most common ways beginner traders lose money is by getting overly excited and acting on a single piece of information.

A

Yeah... don’t do that.

B

Yup, nope, not a good idea.

And here's the sample audio for our three finalist pairings, along with notes on how each was assessed for our specific needs.



Mark + Alania

These two work really well together because they feel about the same age and energy level. Mark has a calm confidence, while Alania is laid back and quiet (she'd be amazing in a jazz podcast). We ended up using Mark for our male voice, but as a pairing they were a little too similar, like vanilla ice cream with white sprinkles.



Finn + Jessica

These two are noticeably younger, with a higher energy level. They complement each other nicely, but sound a bit younger than we wanted for the topic of day trading. They'd be great for a lighter topic or a younger audience.



Mark + Juniper (winner)

Compared to Alania, Juniper is a much better complement to Mark. Both share the same calm confidence, but Juniper comes across as a bit more emphatic, which gives them slightly different personalities. Both also have the ability to slide into humor where appropriate: Mark holds down the dry wit, Juniper leans into a little sass. Perfect for our needs.


Building lines and conversations.


We have a script and we have our voices. It's time to build our podcast dialogue. This is where things get technical with the ElevenLabs toolset. Once I'd pasted the script into an ElevenLabs project and assigned my voices, I needed to answer two questions:

  1. How do I get my voices to say their lines the way I want them to?

  2. How do I get my scripted conversation to feel natural?



Let's tackle these one at a time.


A new project in ElevenLabs showing a script with voices assigned to their respective lines.


We'll start by pulling our script into an ElevenLabs project and assigning our voices to their respective lines.


How do I get my voices to say things the way I want them to?


First, hats off to ElevenLabs. The first time you generate a line of voice, there's a good chance it'll sound great. Not perfect, but better than good. There's still work to be done to make each line sound as human and engaging as possible, and that work happens on a line-by-line basis. Here are the tools at your disposal:

  1. Regeneration

  2. Formatting and Punctuation

  3. Overrides

  4. Voice Models and Audio Tags


Serendipity is still your friend.


The first tool for getting your voices to say lines the way you want is simple: regenerate the line. There's a certain amount of variation in how the AI will read a given line, not unlike how a human might vary in repetition. This means you can get significantly different results from the same scripted line. Here's an example.



This is the same line regenerated over and over. As with most generative AI, you can get fairly different results just by giving the AI another go.


The power of formatting and punctuation.


The way you write a line has a huge impact on how the AI voice reads it. Simple things like punctuation and capitalization signal to the AI where to place emphasis and where to pause. And it's not just normal grammar rules. To get your AI to say things the way you want, your script might need to look a little weird.

Here's a quick snippet of script using our Juniper voice, written normally.



Juniper

(original)

They're like a dynamic map that shows us where prices have traveled, and give us hints for where they might head next!


And here's a second take with some extra punctuation and formatting applied. It may be subtle, but notice how the timing and emphasis change.



Juniper

(adjusted)

They're like a "dynamic map" that shows us where prices have traveled ...and give us HINTS for where they might head next!


Intonation, emphasis, and timing are huge components of how we speak when we're not just reading lines. They're also critical for directing your audience's attention to what matters.

Here are a few examples of formatting tweaks you can use for voice direction:


Emphasis

‘single quotes’
“double quotes”
CAPITALIZE


Pauses

ellipsis...
comma,
dash —


Inflection

period.
ellipsis...
exclamation!
question?
no punctuation

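Because all of these cues are just plain text, they're easy to apply programmatically if you're pre-processing a long script before pasting it in. A tiny illustrative sketch (the helper names are mine, not part of any ElevenLabs tooling):

```python
def emphasize(line: str, word: str) -> str:
    """Upper-case one word so the voice model leans on it."""
    return line.replace(word, word.upper(), 1)

def pause_before(line: str, word: str) -> str:
    """Prefix a word with an ellipsis to cue a beat before it."""
    return line.replace(word, "..." + word, 1)

# The Juniper tweak from earlier, reproduced mechanically:
line = "and give us hints for where they might head next!"
line = emphasize(line, "hints")
line = pause_before(line, "and")
print(line)  # ...and give us HINTS for where they might head next!
```

In practice I still do most of this tuning by hand, line by line, but a helper like this keeps a house style consistent across dozens of lessons.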

Playing with overrides.


In addition to script formatting, ElevenLabs also provides a set of override tools to change how your voice speaks. Here’s the set you can play with for the Multilingual v2 Model (more on models in a moment) and broadly what they do for you. It’s important to note that there are no specific recipes for specific outcomes here — these are broadly “play around and see what you get” tools.


ElevenLabs voice overrides for the v2 model.

Speed

How fast is the voice reading the line? This is useful for two things. First, some voices naturally talk slower or faster than others, so this is a good way to normalize them if you want multiple voices speaking at the same pace. Second, dialogue speed can be a solid emotional cue. Slower reading can come across as intentional or thoughtful, while faster delivery may read as nervous or excited.


Stability

How consistently does the voice read the line between regenerations? This is similar to the --chaos attribute in Midjourney, increasing or decreasing variation between generations so you can explore broader or more specific nuance between takes.


Similarity

How clear and consistent is the voice across a line? I'll hedge on this one, but I believe this slider is most useful for longer blocks of dialogue, helping the voice remain consistent from start to finish. Most of my dialogue has been shorter segments, so I haven't seen it make much of a difference in my own work. Your mileage may vary.


Style Exaggeration

How exaggerated is the voice style on a given line? Every voice has a certain degree of personality baked in, and this slider dials that up or down, affecting intonation and inflection. It's a great tool for getting more (or less) human variation out of your voice, but pushed to the extreme it can introduce erratic, stuttering artifacts.

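If you ever drive these sliders through the ElevenLabs API instead of the UI, they correspond to fields on the voice_settings object, all 0 to 1 floats (Similarity is similarity_boost, Style Exaggeration is style). Here's a minimal sketch of that mapping; it's just a dictionary builder, and the example values are my own, not recommended defaults:

```python
def voice_settings(stability: float, similarity: float, style: float) -> dict:
    """Map the three UI sliders onto the ElevenLabs voice_settings payload.

    All values are 0-1 floats. Field names follow the public
    text-to-speech API: similarity_boost is the Similarity slider,
    style is Style Exaggeration.
    """
    for name, value in (
        ("stability", stability),
        ("similarity", similarity),
        ("style", style),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "stability": stability,
        "similarity_boost": similarity,
        "style": style,
    }

# A steadier read with a touch of exaggeration, e.g. for narration:
settings = voice_settings(stability=0.7, similarity=0.8, style=0.3)
```

The nice part of scripting this is repeatability: once you find a combination that works for a voice, you can pin it per character instead of nudging sliders by hand.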

Playing with voice models.


The default voice model in ElevenLabs is v2, which provides consistent, studio-quality output with lifelike intonation and the ability to evoke emotion. All of the examples so far have used this model, including the lesson prototype (v3 came out right as we finished our work).

At the time of writing, there is also a v3 model. It's much more expressive and can be directed via prompts for emotion, tone, cadence, and other non-verbal elements. In short, it's pretty great. Let's listen to how v2 compares to v3, using Juniper again to demonstrate.

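For anyone generating lines programmatically, the model is just another parameter on the same text-to-speech endpoint. Here's a sketch that assembles the request without sending it, using only the standard library. The voice ID and API key are placeholders, and the model IDs (eleven_multilingual_v2 for v2, eleven_v3 for v3) reflect my understanding at the time of writing; check the current docs before relying on them:

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, text: str, model_id: str, api_key: str):
    """Assemble (url, headers, body) for the ElevenLabs text-to-speech
    endpoint. Nothing is sent here; hand the result to any HTTP client."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"text": text, "model_id": model_id})
    return url, headers, body

# v2 for steady narration; v3 when you want audio tags to take effect:
url, headers, body = build_tts_request(
    voice_id="YOUR_VOICE_ID",      # placeholder
    text="[cheerful] They're like a dynamic map!",
    model_id="eleven_v3",          # assumed model ID; verify in the docs
    api_key="YOUR_API_KEY",        # placeholder
)
```

Keeping the request assembly separate from the send makes it easy to batch a whole script's worth of lines and review them before spending generation credits.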

ElevenLabs voice models

Juniper

v2 Model

They're like a dynamic map that shows us where prices have traveled, and give us hints for where they might head next!

And here's Juniper using the full range of what v3 provides:


Juniper

v3 Model

[sassy and faster][laugh] They're like a dynamic map that shows us where prices have traveled [inhale][slowing down][cheerful] and give us hints for where they might head next!

A significant difference, not only in how it sounds (I wouldn't advise mixing v2 and v3 models in long-form audio) but in how it can be directed. Let's talk about those bracketed elements. These are audio tags: small inline prompts that let you directly instruct the AI voice on cadence, flow, tone, emotional state, and non-verbal exclamations. Here are a few examples:


Story Beats

[pause] [continues softly] [hesitates] [resigned]


Tone

[dramatic tone] [lighthearted] [reflective] [serious tone]


Emotion

[awe] [sarcastic tone] [wistful] [matter-of-fact]


Rhythm + Flow

[slows down] [rushed] [emphasized]


Nonverbal

[laugh] [sigh] [snort] [cough] [gasp]


I'm just scratching the surface here, but you can dive deeper by reading ElevenLabs' blog post on audio tags.

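Since audio tags are just bracketed text inline with the script, you can keep your prose clean and layer the tags on at render time. A tiny helper, which is my own convention rather than anything in the ElevenLabs toolset:

```python
def tagged(line: str, *tags: str) -> str:
    """Prefix a script line with v3 audio tags like [laugh] or [wistful].

    Pure string assembly; the tags only mean something to the v3 model
    at generation time.
    """
    prefix = "".join(f"[{t}]" for t in tags)
    return f"{prefix} {line}".strip()

tagged("They're like a dynamic map!", "sassy and faster", "laugh")
# -> "[sassy and faster][laugh] They're like a dynamic map!"
```

Keeping tags out of the source script also makes it painless to try the same line with several different emotional reads, which is where v3 really earns its keep.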

A project in ElevenLabs showing a script with assigned and tuned voices.

How do I get my scripted conversation to feel natural?


diagram of linear conversation

We've done a lot of work getting our individual lines out of the uncanny valley. Now, how do we stitch them into a realistic conversational flow?

If you play through your scripted conversation without any editing, it'll sound pretty flat, with each line coming one after the other. Even with well-tuned lines, this feels off. That's because this isn't how people normally speak, especially when they're engaged conversational partners. They probably should let each other finish. But they usually don't.

Here’s a sample playback of a few tuned lines, unedited.


Linear playback of the conversation prior to editing.

Engaged partners jump in.


diagram of conversation with tighter structure

Engaged partners tumble over each other, building on or reacting to what the other is saying, often before the previous line has finished. This speeds up the conversation, injecting enthusiasm and energy into the dialogue. It's how people talk when they're excited about a topic and actively engaged. Maybe not all the time, but often.


Engaged partners backchannel.


diagram of conversation with backchannel VO

If jumping in is how engaged partners show enthusiasm, backchannel VO is how they show they're listening. Backchannel voice-over is all the little vocalizations person B makes while reacting to what person A is saying, communicating agreement, disagreement, shock, or surprise. Things like "mm-hmm," or "yeah," or "no..." in the background. These need to be used strategically to keep from feeling rude or distracting, but they make a huge difference in communicating the dynamic between your voices.

Here's the same conversational sample with tighter editing and backchannel elements baked in:


Same conversation with editing and a few backchannel VO elements.
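The editing itself happens on a timeline in your audio tool of choice, but the underlying arithmetic is simple: each clip starts a little before the previous one ends. Here's a sketch of that scheduling logic in plain Python; the durations and lead-in values are invented for illustration:

```python
def schedule(lines):
    """Compute start times for clips that 'jump in' on each other.

    lines: sequence of (duration_s, lead_in_s), where lead_in_s is how
    early the clip starts before the current end of the timeline
    (0 = wait politely for the previous line to finish).
    Returns start times in seconds.
    """
    starts, timeline_end = [], 0.0
    for duration, lead_in in lines:
        start = max(0.0, timeline_end - lead_in)
        starts.append(start)
        # A short backchannel can end before the line it overlaps,
        # so the timeline end only ever moves forward.
        timeline_end = max(timeline_end, start + duration)
    return starts

# A's opener, B jumping in half a second early, and a quick
# backchannel landing 1.5s before the timeline's current end:
schedule([(3.0, 0.0), (2.0, 0.5), (0.5, 1.5)])
# -> [0.0, 2.5, 3.0]
```

Even if you never script it, thinking in these terms (how early does each reply start, how deep inside a line does each backchannel sit) makes the manual editing much faster.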

Engaged partners have human moments, and maybe even a little fun.


A

So... candlesticks are basically little mood rings for the market, right?

B

...wow, our metaphor game is on point today.

A

Right? We are FANCY.

We have one more tool to deploy in our quest for human-sounding conversations: unscripted human moments. Interruptions, talking over each other, quick apologies, the odd tangent or joke. These are things you'll probably need to add to your script — snippets incidental to the main point of dialogue. But honestly, they're some of the most fun elements to build.


A fully tuned and edited project in ElevenLabs.

If we pull all of that together (the tuned lines, the tighter editing, the backchannel vocalizations, and a few human moments), we can get some truly convincing conversation. Here's a final clip, built using the ElevenLabs v3 model with all of the above learnings applied.


Full conversation with editing, backchannel VO, and a few human moments.

A few closing insights.


AI voice has come a long way, and tools like ElevenLabs are making it more accessible and more capable by the month. But the most important thing I took away from my experience so far is that the technology is only part of the equation. The craft of writing, casting, and editing a conversation is where the real work happens, and where the real returns are. Get that right, and the result can be genuinely compelling. With that, here are a few closing thoughts from everything I learned along the way.


The perceived quality and impact of handcrafted conversations is highly subjective.

Humor, cadence, and topic will all land differently with different people. And from my own experience, you become intensely biased while building these things, because you have the context of where it started and how much it improved. So always test your crafted conversations with fresh ears before calling them done.


AI voice is best used to make artificial experiences sound more human, not to replace existing human voices.

I'll go on record: at the time of writing, AI voice is tremendous at making imperfect artificial interactions feel more human. It is not an effective replacement for human voices where real human dialogue is expected and valued, largely because of how much work it takes just to break even on the "convincing humanity" factor.


Even the most human-sounding AI voice won't convince an audience determined to hate AI.

Trained on countless poorly executed examples, these folks will actively seek out any hint of uncanny valley. There is precious little middle ground here, and the inclusion of AI voice may cause a non-trivial portion of your audience to disengage from your content.


The real power of AI voice isn't creative puppetry. It's building behavior and patterns to make agents more realistic and, where control matters more than convenience, directable.

I thoroughly enjoyed the process of writing and tuning conversations, likening it to building a kind of voice diorama. But even I have to admit it's a lot of work to make something sound human when I could just record humans. Many people I talked with during my workshop asked whether AI could learn ideal conversational patterns and take the tuning work out of the equation. That would be amazing: using voice tuning to build real, repeatable personalities would be an incredible boost to AI voice as a production tool. And if I’m thinking it, then by the inviolable laws of the internet I’m sure someone is already working on it.


If you’re looking for a design partner, leader, or collaborator, we should talk.

Whether you’re building a team, launching a product, or shaping your vision, I’m here to help — and I’m always up for a great conversation.

© 2025 Greg Martin. All rights reserved.

Find Me
