Using pronunciation dictionaries — ElevenLabs Documentation

In this tutorial, you’ll learn how to use a pronunciation dictionary with the ElevenLabs Python SDK. Pronunciation dictionaries let you control how specific words are pronounced, which is useful for rare or unusual terms such as personal names or company names. We support both the IPA and CMU alphabets. For example, the word nginx could be pronounced incorrectly, so we can supply our own pronunciation instead: in IPA, nginx is pronounced /ˈɛndʒɪnˈɛks/. Finding the IPA or CMU transcription of a word manually can be difficult; LLMs like ChatGPT can make the search easier.

We’ll start by adding rules to the pronunciation dictionary from a file and comparing text-to-speech results generated with and without the dictionary. After that, we’ll discuss how to remove rules from and add rules to an existing dictionary.

If you want to jump straight to the finished repo you can find it here

Phoneme tags only work with the `eleven_turbo_v2` and `eleven_monolingual_v1` models. If you use phoneme tags with any other model, the model will silently skip the word.
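Because an unsupported model skips the word without raising an error, it can help to check the model ID before sending text that relies on phoneme rules. The following is a minimal sketch and not part of the SDK: the names `PHONEME_TAG_MODELS`, `supports_phoneme_tags` and `require_phoneme_support` are hypothetical, and the set of models simply mirrors the note above, so it may need updating as models change.

```python
# Models named in the note above as supporting phoneme tags.
# Copied from the tutorial text; this set is not queried from the API.
PHONEME_TAG_MODELS = frozenset({"eleven_turbo_v2", "eleven_monolingual_v1"})


def supports_phoneme_tags(model_id: str) -> bool:
    """Return True if model_id is one of the models named in the note."""
    return model_id in PHONEME_TAG_MODELS


def require_phoneme_support(model_id: str) -> str:
    """Return model_id unchanged, or raise instead of failing silently."""
    if not supports_phoneme_tags(model_id):
        supported = ", ".join(sorted(PHONEME_TAG_MODELS))
        raise ValueError(
            f"{model_id!r} would silently skip phoneme rules; use one of: {supported}"
        )
    return model_id
```

Calling `require_phoneme_support("eleven_turbo_v2")` before a request turns a silent mispronunciation into an explicit error during development.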

Requirements

An ElevenLabs account with an API key (here’s how to find your API key).

Python installed on your machine

FFMPEG to play audio

Setup

Installing our SDK

Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK to update the pronunciation dictionary and to convert text to speech. You can install it using pip:

$ pip install elevenlabs

Additionally, install python-dotenv to manage your environment variables:

$ pip install python-dotenv

Next, create a .env file in your project directory and fill it with your credentials like so:

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Initiate the Client SDK

We’ll start by initializing the client SDK.

import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file into the environment
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)
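Note that `os.getenv` returns `None` when the variable is not set, so a missing key would only surface later as a failed request. A small check can fail earlier with a clearer message. This is a sketch under stated assumptions: the helper name `read_api_key` is hypothetical, and it accepts any mapping so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def read_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the ElevenLabs API key, or raise if it is missing or blank."""
    # Fall back to the process environment when no mapping is supplied.
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; add it to your .env file or environment"
        )
    return key
```

The returned value can then be passed as `api_key` when constructing the client.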

Create a Pronunciation Dictionary From a File

To create a pronunciation dictionary from a file, we’ll create a .pls file for our rules.

The rules use the IPA alphabet and set the pronunciation for both tomato and Tomato. PLS files are case sensitive, which is why we include the word both with and without a capital “T”. Save the file as dictionary.pls.

dictionary.pls

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
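Writing both spellings of every word by hand becomes tedious as the word list grows, so the file can also be generated. The sketch below uses only the Python standard library; the function name `build_pls` is hypothetical, and for brevity it omits the optional `xsi:schemaLocation` attributes shown above. Whether the service accepts a lexicon without them is an assumption to verify.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_pls(rules, alphabet="ipa", lang="en-US"):
    """Return PLS XML text for a {word: phoneme} mapping.

    Each word is written in lowercase and capitalized form because
    PLS files are case sensitive.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<lexicon version="1.0" xmlns="{PLS_NS}"',
        f"      alphabet={quoteattr(alphabet)} xml:lang={quoteattr(lang)}>",
    ]
    for word, phoneme in rules.items():
        # dict.fromkeys drops the duplicate when both forms are identical
        for grapheme in dict.fromkeys([word.lower(), word.capitalize()]):
            lines += [
                "  <lexeme>",
                f"    <grapheme>{escape(grapheme)}</grapheme>",
                f"    <phoneme>{escape(phoneme)}</phoneme>",
                "  </lexeme>",
            ]
    lines.append("</lexicon>")
    text = "\n".join(lines) + "\n"
    # Parsing the result raises ParseError if the XML is not well formed.
    ET.fromstring(text.encode("utf-8"))
    return text
```

For example, writing the result of `build_pls({"tomato": "/tə'meɪtoʊ/"})` to dictionary.pls with UTF-8 encoding produces the same two lexemes as the file above.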

In the following snippet, we add the rules from the file and keep the returned dictionary. We then generate and play two text-to-speech clips, one without and one with the dictionary, to compare the pronunciation.

from elevenlabs import play, PronunciationDictionaryVersionLocator

with open("dictionary.pls", "rb") as f:
    # this dictionary changes how tomato is pronounced
    pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
        file=f.read(), name="example"
    )

# baseline: no dictionary attached
audio_1 = client.generate(
    text="Without the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
)

# same request, with the uploaded dictionary version attached
audio_2 = client.generate(
    text="With the dictionary: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary.id,
            version_id=pronunciation_dictionary.version_id,
        )
    ],
)

# play the audio
play(audio_1)
play(audio_2)

Remove Rules From a Pronunciation Dictionary

To remove rules from a pronunciation dictionary, call the remove_rules_from_the_pronunciation_dictionary method in the pronunciation dictionary module. In the following snippet, we remove rules by their rule strings and keep the updated result. We then generate and play another text-to-speech clip to hear the difference. The example takes the version ID from the remove_rules_from_the_pronunciation_dictionary response, because every change to a pronunciation dictionary creates a new version and we need the latest one. The old version is still available.

pronunciation_dictionary_rules_removed = (
    client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
        pronunciation_dictionary_id=pronunciation_dictionary.id,
        rule_strings=["tomato", "Tomato"],
    )
)

audio_3 = client.generate(
    text="With the rule removed: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
            version_id=pronunciation_dictionary_rules_removed.version_id,
        )
    ],
)

play(audio_3)
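The versioning behaviour described in this section can be pictured with a toy in-memory model. It is purely illustrative and makes no claim about how the ElevenLabs service stores dictionaries; the class `ToyVersionedDictionary` and its methods are hypothetical. It only shows why the version returned by the most recent call is the one to pass on, while earlier versions remain readable.

```python
import uuid


class ToyVersionedDictionary:
    """In-memory stand-in: every change stores a new, separate version."""

    def __init__(self, rules):
        self._versions = {}
        self.version_id = self._store(dict(rules))

    def _store(self, rules):
        # Each stored snapshot gets a fresh identifier.
        version_id = uuid.uuid4().hex
        self._versions[version_id] = rules
        return version_id

    def rules_at(self, version_id):
        """Return a copy of the rules as they were at version_id."""
        return dict(self._versions[version_id])

    def remove_rules(self, rule_strings):
        current = self.rules_at(self.version_id)
        for rule_string in rule_strings:
            current.pop(rule_string, None)
        self.version_id = self._store(current)
        return self.version_id

    def add_rules(self, rules):
        current = self.rules_at(self.version_id)
        current.update(rules)
        self.version_id = self._store(current)
        return self.version_id
```

In this model, the version ID captured before `remove_rules` still resolves to the original rules, which matches the statement that the old version is still available.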

Add Rules to Pronunciation Dictionary

We can add rules directly to the pronunciation dictionary by building them with the PronunciationDictionaryRule_Phoneme class and calling add_rules_to_the_pronunciation_dictionary. The following snippet adds rules with this class and keeps the updated result. We then generate and play another text-to-speech clip to hear the difference. This example also uses the version returned from add_rules_to_the_pronunciation_dictionary to ensure we use the latest dictionary version.

from elevenlabs import PronunciationDictionaryRule_Phoneme

pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
    rules=[
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
        PronunciationDictionaryRule_Phoneme(
            type="phoneme",
            alphabet="ipa",
            string_to_replace="Tomato",
            phoneme="/tə'meɪtoʊ/",
        ),
    ],
)

audio_4 = client.generate(
    text="With the rule added again: tomato",
    voice="Rachel",
    model="eleven_turbo_v2",
    pronunciation_dictionary_locators=[
        PronunciationDictionaryVersionLocator(
            pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
            version_id=pronunciation_dictionary_rules_added.version_id,
        )
    ],
)

play(audio_4)
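Since each spelling of a word needs its own rule, the rule list can be generated instead of written out by hand. The sketch below builds plain dictionaries with the same four fields used in the snippet above (type, alphabet, string_to_replace, phoneme). The helper name `phoneme_rule_kwargs` is hypothetical, and unpacking each dictionary into the class, as in `PronunciationDictionaryRule_Phoneme(**kwargs)`, is assumed to be equivalent to the keyword-argument calls shown above.

```python
from typing import Dict, List


def phoneme_rule_kwargs(
    word: str, phoneme: str, alphabet: str = "ipa"
) -> List[Dict[str, str]]:
    """Return keyword arguments for one rule per spelling of word."""
    # Lowercase and capitalized forms; dict.fromkeys drops a duplicate
    # when both forms turn out to be identical.
    spellings = dict.fromkeys([word.lower(), word.capitalize()])
    return [
        {
            "type": "phoneme",
            "alphabet": alphabet,
            "string_to_replace": spelling,
            "phoneme": phoneme,
        }
        for spelling in spellings
    ]
```

Under that assumption, the two rules for tomato could be built from `phoneme_rule_kwargs("tomato", "/tə'meɪtoʊ/")` and passed as the `rules` argument.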

Conclusion

You now know how to use a pronunciation dictionary when generating text-to-speech audio. These functionalities let you generate audio based on your own pronunciation rules, making text-to-speech more flexible for your use case.

For more details, visit our example repo to see the full project files, which give a clear structure for setting up your application.
