6 Best Text to Speech Generators for 2023

This article reviews six of the leading Text to Speech (TTS) platforms. I spent a few hours on each to evaluate their performance and features.

My approach included signing up for free or trial plans and testing them using both original text and a standard movie script for consistency.

I primarily tested the tools in English, but also gave them a spin in Norwegian or Spanish when I could, to see how they performed in different languages.

I have focused particularly on the quality of the generated speech, the level of customisation available, user-friendliness and pricing. The aim is to provide you with a helpful, unbiased guide to the best text to speech software on the market, so you can find the one that best meets your needs.

What are text to speech generators?

Text-to-Speech (TTS) technologies have advanced in both quality and usage, thanks to the rise of artificial intelligence. Text to speech generators are tools that convert text into spoken language and offer synthesised voices that sound increasingly human-like.

Text to speech generators can serve a variety of purposes:

  • aiding those with reading disabilities or dyslexia
  • enabling you to listen to news and articles instead of reading them
  • helping content creators generate high-quality work more efficiently than traditional methods

The use cases for businesses are extensive, encompassing everything from narrating audiobooks to creating social media content like YouTube videos—essentially, any scenario where realistic voiceovers could elevate the content and expedite the production process.

These platforms now offer natural-sounding speech in multiple languages and accents.

Best Text to Speech Generators – Summary

Eleven Labs boasts the most natural-sounding speech in this comparison. Its contextual awareness and the many settings for tuning each voice make it a powerful tool for generating realistic speech from text. Eleven Labs is limited, however, by supporting fairly few languages compared to other platforms.

Speechify is excellent for reading, offering a seamless cross-platform experience allowing you to push play on everything from blog articles to your e-mail. Furthermore, Speechify Studio offers advanced, quick and natural-sounding Voice Cloning.

Genny, a smaller player in text to speech, is ideal for creators seeking an all-in-one platform, offering speech, AI text, and image generation. Despite lacking some refined adjustment options, it provides a generous free trial of its Pro plan for two weeks to evaluate whether or not you wish to upgrade.

Comparison Table

Platform | Price (cheapest plan) | Voice generation (cheapest plan) | Voice cloning | Number of languages | Recommended for
--- | --- | --- | --- | --- | ---
Eleven Labs | $5/month | 30,000 characters | Yes | 28 | Overall
Speechify | $99/month | 4 hours | Yes | 20+ | Reading
Genny | $25/month | 2 hours | Yes | 100+ | All-in-one content creation platform
Murf | $29/month | 2 hours | No | 20+ | Video editing
Synthesia | $22.50/month | 120 mins video/year | No | 120+ | Talking avatar videos
PlayHT | $39/month | 50,000 words | Yes | 130+ | Developers
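
Because the cheapest plans are metered in different units (characters, words and hours), a rough conversion helps when comparing them. The sketch below is only a back-of-the-envelope estimate: the speaking rate of 150 words per minute and the average of 6 characters per word (spaces included) are assumptions, not figures published by any of the platforms. Synthesia is left out because its allowance is video minutes per year rather than audio per month.

```python
# Rough cost per hour of generated audio on each platform's cheapest plan.
# ASSUMPTIONS (not from the platforms): 150 spoken words per minute and
# 6 characters per word, spaces included.
WORDS_PER_MINUTE = 150
CHARS_PER_WORD = 6


def hours_from_words(words: float) -> float:
    """Convert a word allowance into estimated hours of speech."""
    return words / WORDS_PER_MINUTE / 60


def hours_from_chars(chars: float) -> float:
    """Convert a character allowance into estimated hours of speech."""
    return hours_from_words(chars / CHARS_PER_WORD)


# (monthly price in USD, monthly allowance converted to hours)
cheapest_plans = {
    "Eleven Labs": (5.0, hours_from_chars(30_000)),
    "Speechify": (99.0, 4.0),
    "Genny": (25.0, 2.0),
    "Murf": (29.0, 2.0),
    "PlayHT": (39.0, hours_from_words(50_000)),
}


def cost_per_hour(price: float, hours: float) -> float:
    """Price divided by the estimated hours of audio it buys."""
    return price / hours


if __name__ == "__main__":
    for name, (price, hours) in cheapest_plans.items():
        print(f"{name}: ~${cost_per_hour(price, hours):.2f} per hour of audio")
```

Changing the two constants at the top shifts the character-based and word-based plans up or down, so treat the result as a ranking aid rather than an exact price.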

What are the Best AI text to Speech Generators for 2023?

  1. Eleven Labs
  2. Speechify
  3. Genny by LOVO
  4. Murf
  5. Synthesia
  6. PlayHT

1. Eleven Labs

Eleven Labs is a highly popular choice for text to speech, with its site drawing nearly 10 million monthly visitors. The platform is a comprehensive suite of text to speech tools that offers reasonable pricing and strong value for money.

Specialising in human-like voices, the platform understands the emotional and contextual nuances of the text. This means it doesn’t just read the words; it adds context-appropriate intonation and rhythm as a human would. This makes it more comfortable to listen to, which could be a particularly important factor if you plan to use it for making engaging content, such as narrating audiobooks or voicing a game character.

Their latest model supports a total of 28 languages with diverse accents for each. However, Eleven Labs does not limit you to only these voices; the pre-installed voices can be remixed using a set of easily adjustable parameters, or you can pick one of the voices made by the community in the Voice Library.

The voice cloning features enable you to clone your own voice and then use it in any of the supported languages.

You can use the basic speech synthesis model on the free plan, which might be good enough for personal use. However, using the advanced speech synthesis editor (ideal for long-form content and full document conversions) requires a subscription to at least the Creator tier. Starter gives you Instant Voice Cloning, and Creator adds the Professional Voice Cloning feature.

Pricing

  • Free: 10,000 characters (no commercial license)
  • Starter: $5/month for 30,000 characters
  • Creator: $22/month for 100,000 characters
  • Independent Publisher: $99/month for 500,000 characters
  • Growing Business: $330/month for 2 million characters
  • Enterprise: custom pricing
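
To see how the paid tiers compare on volume alone, the prices above can be converted to a price per 1,000 characters. This is plain arithmetic on the listed figures; the function name `price_per_1k_chars` is just an illustrative label.

```python
# Monthly price (USD) and character allowance for each paid tier,
# copied from the pricing list above.
tiers = {
    "Starter": (5, 30_000),
    "Creator": (22, 100_000),
    "Independent Publisher": (99, 500_000),
    "Growing Business": (330, 2_000_000),
}


def price_per_1k_chars(price: float, characters: int) -> float:
    """Monthly price divided by the allowance, scaled to 1,000 characters."""
    return price / characters * 1000


for name, (price, characters) in tiers.items():
    print(f"{name}: ${price_per_1k_chars(price, characters):.3f} per 1,000 characters")
```

On these numbers the per-character price does not fall steadily with tier size, and Creator comes out highest. That suggests the step up from Starter is paying for features such as Professional Voice Cloning as much as for volume.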

Pros

  • Contextual understanding makes the voice come alive
  • Easy to tweak settings for each voice including stability, clarity and style exaggeration.
  • Large selection of readily available voices
  • Neatly organised Voice Library makes it easy to discover new voices from the community
  • Instant Voice Cloning is accessible already on the Starter package

Cons

  • Professional Voice Cloning requires a Creator subscription, which may be a bit steep for some users
  • Limited number of languages

Who is this best for?

Eleven Labs is a great choice for content creators that are looking for high-quality, lifelike voices to make engaging content.

What we think of Eleven Labs

Eleven Labs claims to be one of the most advanced text to speech and voice cloning tools ever made. The platform arguably delivered the best AI-generated speech of the platforms compared.

While there’s a range of voices to choose from, the voice generation feature lets you easily create new ones by selecting parameters like gender, age, accent and accent strength. You can then use that voice to generate text to speech. What pleasantly surprised me was the Voice Library, where you can discover and use voices generated by the community; these are categorised and sortable in a very useful manner.

The platform is purely browser based and built as a tool for content creators. The professional voice cloning feature and the level of control you have over the speech generation are well-suited for making engaging content. Despite the many customisation options, the platform has a good user interface that makes it easy to use and learn.

The contextual awareness isn’t perfect, but the generated voice does a decent job of following the sentiment of the text. There were also numerous ways to tweak the output under voice settings, such as adjusting the stability, clarity or style exaggeration. The output is easily downloadable as mp3 files.

The free plan gives you 10,000 characters, which is enough to test and play with it, but might be too little for any serious work. While Eleven Labs doesn’t have an app or browser extension as Speechify does, it’s a great choice if you are serious about generating high quality voice content.

2. Speechify

Speechify, with a user base of over 20 million, is a popular option for avid readers and content creators. The platform offers two main products: Text to Speech for listening to your content on any platform, as well as Speechify Studio for advanced voice-over and cloning capabilities. While the free versions offer a taste, the true power unlocks with paid subscriptions, albeit at a premium cost.

Text to Speech allows you to easily convert nearly any text into audio.

  • Available for your browser and mobile devices (iOS and Android). It also allows document uploads.
  • The Chrome extension is highly useful. It adds a play button to web pages with content, even within ChatGPT.
  • Wide range of voices in 20+ languages available. Premium voices, including those of celebrities like Snoop Dogg, sound impressively natural.

Speechify Studio is an advanced tool that lets you do voice overs and clone voices, as well as handle transcription and video dubbing.

  • The free plan gives you access to 200+ voices, but only allows 10 minutes of voice generation per month.
  • If you want more generations, you’ll need to upgrade, which can be a bit pricey for casual users.

Pricing

Text to Speech
  • Free
  • Premium: $159/year – gives you access to high-quality voices.
Speechify Studio
  • Free: 10 minutes of voice generation per month
  • Basic: $99/month for 4 hours of voice generation
  • Professional: $129/month for 8 hours of voice generation. Includes voice cloning.
  • Enterprise: custom pricing

Pros

  • Great cross-platform reading experience
  • Wide range of voices in 20+ languages.
  • High quality, human-like voices on paid plans.
  • Adjustable listening speeds (up to 5x on the premium plan).
  • User-friendly interface. Even Speechify Studio is easy to navigate despite having a lot of features.

Cons

  • Limited voice generation on free plans
  • For casual users, the pricing on paid plans is quite steep

Who is this best for?

  • Text to Speech is a great choice for avid readers, perfect for having articles or documents read aloud to you.
  • Speechify Studio is well-suited for content creators, and can be used to narrate videos, make ads and tutorials, and much more.

What we think of Speechify

Speechify offers intuitive text-to-speech for avid readers but shines most with a paid subscription. Speechify Studio is a fully fledged AI voice studio that targets content creators willing to pay top dollar. Budget-conscious or casual users are likely to find better value elsewhere.

3. Genny by LOVO

Genny is a great choice for creating content in multiple formats, with good text to speech capabilities, albeit lacking some of the sophisticated adjustment possibilities of competing platforms.

The platform, created by Lovo.ai, currently has more than 700,000 users. Its simple mode is great for creating quick, short voice overs, while the advanced mode is a fully fledged audio/video editor with advanced features for text to speech generation.

The interface makes it easy to handle multiple voices in the same speech generation. Each block of your text can have a different voice, and you can easily change the playback sequence for different blocks inside the editor with drag and drop.

The library of voices inside Genny is very well organized, and has a search function and a bookmarking functionality to easily find and save the voices you’re looking for. If you own any NFT voices, you can also access these by connecting directly to MetaMask and WalletConnect.

While the platform has text to speech at its core, it positions itself as a versatile tool for content creation, with diverse, additional functionalities like an audio/video editor, image generator and an AI text writer.

Pricing

  • Free: Includes 14-day Pro trial
  • Basic: $25/month for 2 hours voice generation
  • Pro: $48 first month for 5 hours voice generation
  • Pro+: $149/month for 20 hours voice generation
  • Enterprise: Custom pricing

Pros

  • Best free text to speech option – lets you test almost every feature of the platform for 2 weeks
  • Easily find the right voices through searching, filtering by use case and bookmarking
  • Beginner friendly: Lets you choose between simple mode or advanced mode
  • Easy to handle and arrange multiple voices in the same text to speech generation
  • Easy and quick voice cloning feature
  • A lot of features for content creators beyond just text to speech

Cons

  • Limited customisation options for voices – not possible to adjust emphasis and pause for Pro voices
  • The Advanced platform has a lot of functionality, making it feel a bit complex at times
  • Voice cloning feature lacks some customisation options

Who is this best for?

Great for content creators looking for a text to speech solution within a full suite of content creation tools, but without the need for extensive voice customisation.

What we think of Genny by Lovo.ai

Genny is comprehensive in terms of features, yet fairly easy to use. The voice cloning feature was quick and of relatively good quality (I tested it on my own voice and in less than 2 minutes I was generating text to speech with it), but it lacks some of the adjustment options other platforms have.

The same goes for the other voices: there are fewer customisation options than on other platforms, such as Murf.

My favourite part of Genny’s text to speech features was how easy it was to handle and arrange multiple voices simultaneously in the editor, which must be a time saver if you are making a voice over for something like an interview or podcast with multiple speakers.

Genny does have a wide range of functionality for content creation beyond its more than decent text to speech features. It offers an easy-to-use audio/video editor, an image generator and AI writer, so you can create engaging multimedia content without leaving the platform.

4. Murf

Murf is a great overall choice for a text to speech generator, and gives you granular control over each voice; it also has a good multimedia editor that lets you add videos and soundtracks to your generated speech.

The platform has more than 1 million users worldwide and offers text-to-speech in 20+ languages. It also comes with 120+ pre-made voices, and it’s possible to filter them by language, gender and age. Another useful feature is that the voices are categorised by use case; this lets you find enthusiastic voices to use in advertisements, peculiar voices for a game character, calming voices for meditation, and more. The paid plans also give you access to a library of soundtracks, so you can add music to the speech you generate. The editor also makes it easy to add your own video or soundtrack to your generated speech.

You can easily tweak things like pitch and speed of each voice, as well as mood and the amount of pauses used. A great feature is also the ability to adjust the emphasis and pronunciation of specific words. The text to speech editor also automatically blocks your paragraphs into separate chunks, which you can play separately, meaning you don’t have to wait for a full text render each time to test your latest edits.
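
The benefit of working in blocks can be illustrated with a short sketch. This is not Murf’s code, only a minimal illustration of the idea, assuming blocks are paragraphs separated by blank lines; the function names `split_into_blocks` and `changed_blocks` are hypothetical. Once a script is split up front, only the blocks you edited need to be rendered again.

```python
import re


def split_into_blocks(script: str) -> list[str]:
    """Split a script into paragraph blocks on blank lines (an assumed rule)."""
    blocks = re.split(r"\n\s*\n", script.strip())
    return [block.strip() for block in blocks if block.strip()]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Return indexes of blocks in `new` that differ from `old` and need re-rendering."""
    return [i for i, block in enumerate(new) if i >= len(old) or old[i] != block]


script_v1 = "Welcome to the show.\n\nToday we talk about speech synthesis."
script_v2 = "Welcome to the show.\n\nToday we talk about voice cloning."

# Only the second block (index 1) was edited, so only it needs a new render.
print(changed_blocks(split_into_blocks(script_v1), split_into_blocks(script_v2)))
```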

The voice changer (available in the Pro plan) enables you to upload low quality audio, such as audio recorded directly on your laptop or phone, and use AI to recreate it in professional quality. This can also be handy if you have a recording with background noise.

Pricing

  • Free: 10 mins voice generation
  • Basic: $29/month for 24 hours voice generation/year
  • Pro: $39/month for 48 hours voice generation/year
  • Enterprise: $75/month for unlimited voice generation

Pros

  • Easy to tweak voices to your needs, such as pitch, speed, mood and pauses.
  • Built-in multimedia editor lets you add video and soundtracks
  • Extensive library of ready-to-use voices
  • Voice Changer can make your low quality recordings sound professional

Cons

  • No voice cloning feature in Studio, only available on request by contacting their sales team
  • The Voice Changer feature requires a Pro subscription, which may be a bit pricey for casual users

Who is this best for?

Content creators aiming to produce high-quality speech content and wanting a granular level of control over their text. However, the platform doesn’t have a self-service voice cloning feature; it is only available on request.

What we think of Murf

Murf does a good job of being a versatile and flexible platform for text to speech generation, with a wide array of customisation options. The many voices available, combined with options that make them easy to tweak, give you ample creative freedom. Furthermore, the Studio editor gives you room for playing with different multimedia types: importing text, videos and soundtracks, and allowing you to export to many different formats, depending on your needs.

I found the contextual awareness to be okay; sometimes the voice varies the mood a bit too much within the same paragraph. An annoying thing is the censorship Murf uses; when trying to generate speech for some excerpts from a movie scene (Pulp Fiction!), I had to remove several words in order to be able to generate the speech.

All in all, Murf is a good choice if you want to edit using several types of media and want granular, block-based level of control of pronunciation of your text content.

5. Synthesia

Synthesia is a great choice for businesses looking to create realistic videos of talking avatars for a variety of use cases such as sales pitches, how-to videos, online courses, and more.

For generating its videos, the platform lets you choose between a variety of customisable templates, selecting your preferred talking AI avatar, and typing in the text you want. It supports over 120 languages in different accents and narration styles. The canvas lets you modify all visual elements of the video, such as backgrounds and text to fit with your brand; it’s also possible to add a soundtrack. It’s easy to make reusable templates for quick video production in the future.

Pricing

  • Personal: $22.50/month, 120 mins video/year
  • Enterprise: Custom pricing

Pros

  • 140+ varieties of talking avatars that look and sound great
  • All necessary customisation options for making your talking avatar video look professional and on-brand

Cons

  • No trial feature except the ability to generate a short demo video
  • Focused on just talking avatars, not recommended for other use cases of text to speech generation

Who is this best for?

Companies looking to use text to speech specifically for talking avatar videos in their business, products and marketing.

What we think of Synthesia

The videos look professional and pleasing to the eye, although it is noticeable that they are AI generated. The customisable templates, which can be easily tailored to fit your brand, could be a major time saver for businesses wanting to create short videos; the platform makes it easy to produce professional-looking how-to instructions for products, sales pitches, or other marketing material.

While other platforms also offer an audio/video editor alongside their text to speech generation, Synthesia’s narrow focus on talking avatars makes it easier to use, and it still has everything you could need in terms of customisation.

6. PlayHT

PlayHT delivers high quality text to speech generation, but lacks some of the features and customisation options other platforms have.

PlayHT is a popular platform for text to speech with more than 3 million monthly visitors to its site, and it just launched a new and redesigned editor. It has a number of pre-made voices to choose from, in addition to basic and high fidelity voice cloning features.

For developers looking to integrate text to speech inside applications, the company has made this easy through its API. Additionally, there is a convenient embed feature that lets you take your generated speech and put it on a website, which could be useful for cases like a blog or news article.

Pricing

  • Free: 2,500 words (no commercial use)
  • Creator: $39/month for 50,000 words
  • Pro: $99/month for 200,000 words
  • Enterprise: Custom pricing

Pros

  • API access enables text to speech conversions directly in your application
  • Instant Voice Cloning available even on free plan

Cons

  • Beyond speed, it lacks granular level adjustment parameters for voices
  • Somewhat confusing user interface

Who is this best for?

Developers looking to easily integrate text to speech in their own application.

What we think of PlayHT

The quality of the speech generation is excellent, but the platform currently lacks some customisation options. I wasn’t able to test the platform’s emotion functionality, where you select an emotion for the speech you generate, as it was still in private beta.

The user interface is a bit hard to understand, and having both the new editor (where the emotions feature is still in private beta) and the old editor adds to the confusion. This makes the product come across as somewhat lacking in functionality, especially considering that the paid plans start at $39.

However, the platform features its API prominently and has ample documentation for it. It might be a good choice for businesses considering including AI voice generation within their app.

FAQ

Which AI tool is best for voice cloning?

Speechify has a high quality voice cloning feature inside Speechify Studio. The platform lets you test it free of charge; you simply make a 1 minute recording of your own voice, and you can listen to it reading a standardised piece of text. If you want to actually use it for your own text, though, you’ll have to upgrade to the Professional plan at $129/month.
If you are looking to test voice cloning more in depth for free, I recommend checking out Genny. It offers quick voice cloning of decent quality in its free trial.

What is the most realistic text to speech software?

Eleven Labs delivered the best AI-generated speech of the platforms compared in this review. The platform’s contextual awareness makes the voice sound more natural, and the speech can easily be tweaked further through various adjustment parameters.

A close runner-up is Speechify, with the premium voices on the platform being very lifelike.

Is there a Free AI Text to Speech Generator?

Most text to speech generators on the market offer a free version or a free trial. The major drawback of free versions tends to be the amount of voice generation you can do during a month.
Of the platforms reviewed, Speechify, Eleven Labs, Murf and PlayHT have a free version available. Genny has a generous free trial giving you all features of the Pro plan for two weeks.

What do YouTubers use TTS for?

YouTubers increasingly utilise Text-to-Speech (TTS) in order to simplify the video creation process. It allows them to generate high-quality speech quickly, which can save both time and costs compared to traditional recording. It could help content creators expand their reach by ensuring high-quality audio with consistency in tone and style.

Is TTS legal on YouTube?

TTS is permitted on YouTube as long as it adheres to YouTube’s community guidelines. Creators can monetize videos that use TTS; however, copyright issues could arise if you use someone else’s material without permission. The right to use TTS voices also depends on the text to speech provider’s terms and conditions, and the rights they have for their voices.

What are the disadvantages of TTS?

When using text to speech for content creation, the potential disadvantages lie in the user experience. If the voice sounds unnatural or robotic, it may be perceived as not engaging or as lacking authenticity compared to a real voice. Another disadvantage is that while TTS may sound good for one piece of text, it might be of lower quality for another, which demands time for quality checks and tuning to get a high quality output.

Conclusion

The landscape of text-to-speech generators is diverse. While some platforms excel in speech quality and the ability to customize, others focus on user-friendliness and additional features like generating images and video along with speech.

Although the tested platforms vary in terms of their ability to adjust voices, affordability, and advanced features, all generated impressively natural-sounding voices that, in some cases, were hard to distinguish from actual human voices.

Dario Chincha

Dario Chincha is a solopreneur with a deep fascination for AI and technology. With a long background researching trends and markets, he simplifies the complex world of AI for a wide audience. He is the creator of whatplugin.ai, a popular hub for discovering ChatGPT plugins. He also curates the top stories in AI in his weekly newsletter, What's Brewing in AI, for 3,000+ readers.
