Adding Text-to-Speech to your project will not only make it more impactful but will also be a lot of fun. You can use Text-to-Speech to announce the class your model has predicted, or to read out the caption your machine learning model has generated for an image. There are many other interesting ways to use this feature.

We will be using gTTS (Google Text-to-Speech), a Python library and CLI tool that interfaces with Google Translate's text-to-speech API. It writes spoken mp3 data to a file or to a file-like object (bytestring) for further audio manipulation. For more information see

Use pip to install the latest version of gTTS

pip install gTTS

To play the audio from within the Python program, we will use pygame v1.9.6. Pygame is a Python wrapper module for the SDL multimedia library. It provides Python functions and classes that let you use SDL's support for playing CD-ROMs, audio and video output, and keyboard, mouse and joystick input. Alternatively, you can save the audio as an mp3 file and play it with your preferred multimedia library.
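As a rough sketch of how pygame plays an mp3, the function below uses `pygame.mixer.music` to load and play a file, then waits until playback ends. The function name `play_audio` is hypothetical, and the sketch assumes a working audio output device and a pygame build whose SDL mixer can decode mp3 (mp3 support is more reliable in pygame 2 than in 1.9.x).

```python
import time

import pygame


def play_audio(source):
    """Play an mp3 given as a file path or a file-like object, blocking until it ends.

    Hypothetical helper; assumes an audio device is available.
    """
    if not pygame.mixer.get_init():      # initialise the mixer only once
        pygame.mixer.init()
    pygame.mixer.music.load(source)      # accepts a file name or a file-like object
    pygame.mixer.music.play()            # playback runs in the background
    while pygame.mixer.music.get_busy(): # True while the track is still playing
        time.sleep(0.1)                  # poll ten times a second instead of busy-looping
```

For example, `play_audio("welcome.mp3")` would play a previously saved file. Polling with a short sleep keeps the program alive until the sound finishes without burning CPU in an empty loop.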

Use pip to install the latest version of pygame

pip install pygame

You can either save the audio generated by gTTS as an mp3 file, or write it to an in-memory io.BytesIO stream and play it directly from memory. We will look at both options in this post.
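The choice between the two options can be wrapped in a small helper, sketched below. The name `prepare_audio_source` is hypothetical; it only assumes that `tts` behaves like a gTTS object, i.e. provides `write_to_fp(fp)` and `save(path)`. The return value is either an in-memory stream or a file name, both of which `pygame.mixer.music.load()` can accept.

```python
import io


def prepare_audio_source(tts, use_io_stream=True, filename="welcome.mp3"):
    """Return an mp3 source: a rewound BytesIO stream or the path of a saved file.

    Hypothetical helper; `tts` is assumed to offer gTTS-style write_to_fp() and save().
    """
    if use_io_stream:
        buffer = io.BytesIO()
        tts.write_to_fp(buffer)  # mp3 bytes are written straight into memory
        buffer.seek(0)           # rewind, otherwise a reader would start at the end
        return buffer
    tts.save(filename)           # mp3 bytes are written to a file on disk
    return filename
```

The `seek(0)` call matters for the in-memory option: after writing, the stream position sits at the end of the data, so a player reading from it would otherwise see nothing.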

gTTS requires an internet connection, because it sends the text to Google Translate's text-to-speech API for conversion.

from gtts import gTTS
import pygame
import io

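# Set to True to keep the audio in an in-memory stream instead of saving an MP3 file
use_io_stream = True
mytext = "Hello world"

# Convert the text to English speech at normal speed (requires internet access)
myObj = gTTS(text=mytext, lang='en', slow=False)

if use_io_stream:
    # Write the MP3 data into memory and rewind so it can be read from the start
    file = io.BytesIO()
    myObj.write_to_fp(file)
    file.seek(0)
else:
    # Save the MP3 data to disk and keep the file name for playback
    file = "welcome.mp3"
    myObj.save(file)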