Quest for understanding the fundamentals of Linux Operating System and Embedded Systems: Speech to Text and Text to Speech on Raspberry Pi

Speech to Text and Text to Speech on Raspberry Pi

Note: The steps below were tested on a Raspberry Pi 3 Model B+ board.

Prerequisites:

USB sound card, mic, speaker

Check whether the USB sound card is detected by the Raspberry Pi:

pi@raspberrypi:/dev $ lsusb

Bus 001 Device 009: ID 8086:0808 Intel Corp.
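If you want to do this check from a script rather than by eye, the lsusb output can be parsed. The following is only a minimal sketch: the function name parse_lsusb and the dictionary keys are my own choices, and it assumes every device line follows the "Bus ... Device ...: ID vvvv:pppp name" layout shown above.

```python
import re

# Matches lines such as: "Bus 001 Device 009: ID 8086:0808 Intel Corp."
LSUSB_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(output):
    """Return one dict per device line found in lsusb output (hypothetical helper)."""
    devices = []
    for line in output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match:
            bus, device, vendor, product, name = match.groups()
            devices.append({
                "bus": int(bus),
                "device": int(device),
                "vendor_id": vendor.lower(),
                "product_id": product.lower(),
                "name": name.strip(),
            })
    return devices


# Sample line copied from the lsusb output above
sample = "Bus 001 Device 009: ID 8086:0808 Intel Corp."
print(parse_lsusb(sample))
```

On the Pi itself, the string could come from subprocess.check_output(["lsusb"], universal_newlines=True) instead of the hard-coded sample.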

On newer systems, alsa.conf is located in /usr/share/alsa:

pi@raspberrypi:/dev $ cd /usr/share/alsa

pi@raspberrypi:/usr/share/alsa $ ls

alsa.conf alsa.conf.d cards init pcm smixer.conf sndo-mixer.alisp speaker-test topology ucm utils.sh

Copy the lines below into /etc/modprobe.d/alsa-base.conf:

# Load USB audio before the internal soundcard

options snd_usb_audio index=0

options snd_usb_audio index=1

options snd_bcm2835 index=2

# Make sure the sound cards are ordered the correct way in ALSA

options snd slots=snd_usb_audio,snd_usb_audio,snd_bcm2835

Command to list the sound cards installed on the Raspberry Pi:

pi@raspberrypi:~/demo_python_codes $ cat /proc/asound/cards

0 [ALSA ]: bcm2835_alsa - bcm2835 ALSA

bcm2835 ALSA

1 [Device ]: USB-Audio - USB PnP Sound Device

C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.4, full spe

pi@raspberrypi:~ $ cat /proc/asound/modules

0 snd_bcm2835 <-------------------- default sound card

1 snd_usb_audio <-------------------- usb audio card
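To confirm that the alsa-base.conf ordering has taken effect, the contents of /proc/asound/modules can be checked from Python. This is a small sketch, not a definitive tool: the function names are hypothetical, and the "after" sample is an illustrative guess at what the file looks like once the USB devices come first.

```python
def parse_asound_modules(text):
    """Map ALSA card index -> kernel module name (hypothetical helper)."""
    modules = {}
    for line in text.splitlines():
        parts = line.split()
        # A valid line looks like "<index> <module>"; anything else is skipped.
        if len(parts) >= 2 and parts[0].isdigit():
            modules[int(parts[0])] = parts[1]
    return modules


def usb_audio_is_first(text):
    """True when card 0 is driven by snd_usb_audio."""
    return parse_asound_modules(text).get(0) == "snd_usb_audio"


# Output shown above: the onboard card is still card 0
before = " 0 snd_bcm2835\n 1 snd_usb_audio\n"
# Illustrative output with the USB devices loaded first (assumed, not captured)
after = " 0 snd_usb_audio\n 1 snd_usb_audio\n 2 snd_bcm2835\n"

print(usb_audio_is_first(before))
print(usb_audio_is_first(after))
```

On the Pi, read the real file with open("/proc/asound/modules").read() and pass the result to usb_audio_is_first.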

In a terminal, run the command below:

alsamixer

Increase the volume for the mic and the speaker.

Testing microphone

Next, set the mic recording volume to high. To do this, enter the command "alsamixer" in the terminal. In the text-mode interface that shows up, press the up/down arrow keys to set the volume. Press F6 (select sound card), then select the webcam or mic from the list. Again, use the up arrow key to set the recording volume to high.

Use "audacity" to test the microphone and speaker connected to the USB sound card:

sudo apt-get install audacity

Speech Recognition in python

Install the following packages (Raspberry Pi comes with Python 2.7 and 3.5):

sudo pip --no-cache-dir install SpeechRecognition

sudo apt-get install python-pyaudio python3-pyaudio

sudo python -m pip install --upgrade --force setuptools

sudo python -m pip install --upgrade --force pip

Command to view the sound cards connected to the Raspberry Pi:

pi@raspberrypi:~ $ cat /proc/asound/cards

0 [Device ]: USB-Audio - USB PnP Sound Device

C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.5, full spe

1 [U0x46d0x825 ]: USB-Audio - USB Device 0x46d:0x825

USB Device 0x46d:0x825 at usb-3f980000.usb-1.3, high speed

2 [ALSA ]: bcm2835_alsa - bcm2835 ALSA

bcm2835 ALSA
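The card number in the first column is what ALSA device strings such as plughw:0,0 refer to, so it can be handy to pull it out programmatically. Here is a rough sketch that parses the listing above; parse_asound_cards and its field names are my own, and the column spacing in the sample is assumed to match the real file.

```python
import re

# Header lines look like: " 0 [Device         ]: USB-Audio - USB PnP Sound Device"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.+)$")


def parse_asound_cards(text):
    """Return a list of dicts, one per sound card (hypothetical helper)."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, card_id, driver, name = match.groups()
            cards.append({
                "index": int(index),
                "id": card_id,
                "driver": driver,
                "name": name.strip(),
            })
    return cards


sample = """\
 0 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.5, full spe
 1 [U0x46d0x825    ]: USB-Audio - USB Device 0x46d:0x825
                      USB Device 0x46d:0x825 at usb-3f980000.usb-1.3, high speed
 2 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA
                      bcm2835 ALSA
"""

for card in parse_asound_cards(sample):
    # Device 0 on each card is assumed here, as in plughw:<card>,0
    print("plughw:{0},0 -> {1} ({2})".format(card["index"], card["name"], card["driver"]))
```

The indented description lines are skipped on purpose, because only the header line of each card carries the index.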

A FLAC encoder is required to encode the audio data to send to the API

sudo apt-get install flac

Run the Python code below to convert speech to text. If the code returns an error, check the mic device name in the list that the code prints on the terminal, and set mic_name to match it exactly, e.g.:

mic_name = "USB PnP Sound Device: Audio (hw:0,0)"

Neelkanth $:cat basic_STT_and_TTS_testing_with_dictionary.py

########################################################################
# Mission Speech Recognition [Lakshya: Personal Assistant]
########################################################################

# Include Python libraries
import speech_recognition as sr
import os

#########################################################################
# INITIALIZING SPEECH RECOGNITION ENGINE
#########################################################################

# Enter the name of the USB microphone exactly as it appears in the
# list printed by list_microphone_names() below
mic_name = "USB PnP Sound Device: Audio (hw:0,0)"

# Sample rate is how many samples are recorded per second
sample_rate = 48000

# Chunk is like a buffer. It holds 2048 samples here.
# It is advisable to use powers of 2 such as 1024 or 2048
chunk_size = 2048

# Initialize the recognizer
r = sr.Recognizer()

############################################################################
# INITIALIZING MIC
############################################################################

# Generate a list of all audio cards/microphones
mic_list = sr.Microphone.list_microphone_names()
print("#####################################################################")
print("The mics available on the Raspberry Pi are listed below")
print(mic_list)
print("#####################################################################")

# The following loop sets the device ID of the mic that
# we specifically want to use, to avoid ambiguity.
for i, microphone_name in enumerate(mic_list):
    if microphone_name == mic_name:
        device_id = i

######################## END OF ALL INITIALIZATIONS ######################

############################################################################
# Actual program starts here... infinite while loop
############################################################################
while True:
    # Use the microphone as the source for input. We also specify which
    # device ID to look for. If mic_name was not found in the list above,
    # Python stops here with an error saying device_id is not defined.
    with sr.Microphone(device_index=device_id, sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        # Wait for a second to let the recognizer adjust the
        # energy threshold based on the surrounding noise level
        r.adjust_for_ambient_noise(source)

        # Now speak into the microphone so that Lakshya can hear
        # and respond
        print("Say Something")
        os.system('espeak "Hi Neelkanth.. I am Lakshya"')

        # Listen for the user's input
        audio = r.listen(source)

        try:
            text = r.recognize_google(audio)
            print("you said: " + text)
        # Raised when Google could not understand what was said
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        # Raised when the Google service could not be reached
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

You will get output like the listing below. The first entry in the list is the mic connected to the USB sound card.

*********************************************************************

[u'USB PnP Sound Device: Audio (hw:0,0)', -----> this one is the mic

u'USB Device 0x46d:0x825: Audio (hw:1,0)',

u'bcm2835 ALSA: - (hw:2,0)',

u'bcm2835 ALSA: IEC958/HDMI (hw:2,1)',

u'sysdefault',

u'front',

u'surround40',

u'iec958',

u'spdif',

u'dmix',

u'default']

*********************************************************************
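The script above compares the full device name, including the "(hw:0,0)" suffix, so it stops matching as soon as the card order changes. One possible workaround, sketched below, is to match on part of the name only. The helper find_mic_index is hypothetical (it is not part of the SpeechRecognition library), and the list is simply the output above typed in by hand.

```python
def find_mic_index(mic_list, wanted):
    """Return the index of the first entry containing `wanted`, or None."""
    wanted = wanted.lower()
    for index, name in enumerate(mic_list):
        if wanted in name.lower():
            return index
    return None


# The list printed by the script, copied from the output above
mic_list = [
    "USB PnP Sound Device: Audio (hw:0,0)",
    "USB Device 0x46d:0x825: Audio (hw:1,0)",
    "bcm2835 ALSA: - (hw:2,0)",
    "bcm2835 ALSA: IEC958/HDMI (hw:2,1)",
    "sysdefault", "front", "surround40", "iec958", "spdif", "dmix", "default",
]

device_id = find_mic_index(mic_list, "USB PnP Sound Device")
if device_id is None:
    print("Microphone not found, check the printed list")
else:
    print("Using device index {0}".format(device_id))
```

Returning None instead of leaving device_id undefined also makes it possible to print a readable message when the mic is missing.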

######################################################################

What to do when the program is stuck at r.listen, and how to resolve it

######################################################################

Check the input volume of your microphone. It is set to 0 by default.

If the program gets stuck on the line "audio = r.listen(source)", the microphone is not picking up any voice input. The "listen" function also has problems with environment noise, because it cannot tell where speech starts and ends. In both cases the running program just sits there with a blinking cursor, waiting.

Use this ambient noise adjustment before listening:

r.adjust_for_ambient_noise(source, duration=5)
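Another way to avoid waiting forever is to give listen() a timeout. As far as I know, Recognizer.listen accepts timeout and phrase_time_limit arguments and raises sr.WaitTimeoutError when no speech starts in time. The wrapper below is only a sketch under that assumption; the name listen_with_retries and the default values are mine.

```python
def listen_with_retries(recognizer, source, attempts=3, timeout=5,
                        phrase_time_limit=10):
    """Try to capture one phrase; return the audio, or None if every attempt times out."""
    # Imported here so the helper can be defined without the package loaded yet.
    import speech_recognition as sr

    for attempt in range(1, attempts + 1):
        # Re-measure the background noise before each attempt.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        try:
            return recognizer.listen(source, timeout=timeout,
                                     phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            print("No speech detected (attempt {0} of {1})".format(attempt, attempts))
    return None


# Usage sketch (needs a working microphone):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone(device_index=device_id) as source:
#       audio = listen_with_retries(r, source)
#       if audio is None:
#           print("Giving up, check the mic volume in alsamixer")
```

With a timeout in place, a muted microphone shows up as a clear message after a few seconds instead of a program that appears frozen.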

How to record and play back audio on the Raspberry Pi [provided a USB sound card is connected]

arecord temp.wav

Use the following commands to record and play using alsa-utils:

arecord --device=plughw:0,0 --format S16_LE --rate 44100 -c1 test.wav

aplay --device=plughw:0,0 test.wav
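The same record-and-play test can be driven from Python with subprocess. This is a sketch only: the helper names are hypothetical, the card and device numbers default to the plughw:0,0 used above, and I have added arecord's --duration option so that the recording stops on its own after a few seconds.

```python
import subprocess


def alsa_device(card=0, device=0):
    """Build an ALSA device string such as plughw:0,0."""
    return "plughw:{0},{1}".format(card, device)


def arecord_command(path, card=0, device=0, rate=44100, seconds=5):
    """Argument list for arecord, mirroring the command shown above."""
    return ["arecord", "--device=" + alsa_device(card, device),
            "--format", "S16_LE", "--rate", str(rate), "-c1",
            "--duration={0}".format(seconds), path]


def aplay_command(path, card=0, device=0):
    """Argument list for aplay, mirroring the command shown above."""
    return ["aplay", "--device=" + alsa_device(card, device), path]


def record_and_play(path="test.wav", card=0, seconds=5):
    """Record from the given card, then play the file back on it."""
    subprocess.check_call(arecord_command(path, card=card, seconds=seconds))
    subprocess.check_call(aplay_command(path, card=card))


# Print the commands instead of running them, so this is safe to try anywhere.
print(" ".join(arecord_command("test.wav")))
print(" ".join(aplay_command("test.wav")))
```

Call record_and_play() on the Pi once the printed commands look right for your card number.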

#####################################################################

Text to Speech : This works !!!

#####################################################################

To convert text to speech, install the 'eSpeak' utility.

sudo apt-get install espeak

To test espeak, invoke the espeak command with some text.

espeak "Hello World"

Setting for Female Voice [in Text to speech]

espeak -ven-us+f4 -s170 "I am rockstar"

Use the -v option to specify a voice. After it, give the language, such as en or en-us. Then add a plus, followed by either m (male) or f (female), and a variant number from 1 to 5.

The -s option lets you set the speed in words per minute. Unfortunately, most of the voices sound pretty bad…
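Putting the voice string together by hand is easy to get wrong, so here is a small sketch that builds the espeak arguments from their parts. The function espeak_args is hypothetical, and the 1 to 5 limit simply follows the description above.

```python
def espeak_args(text, language="en-us", gender="f", variant=4, speed=170):
    """Build the espeak argument list, e.g. -ven-us+f4 -s170 (hypothetical helper)."""
    if gender not in ("m", "f"):
        raise ValueError("gender must be 'm' or 'f'")
    if not 1 <= variant <= 5:
        raise ValueError("variant must be between 1 and 5")
    voice = "{0}+{1}{2}".format(language, gender, variant)
    return ["espeak", "-v" + voice, "-s{0}".format(speed), text]


# Same settings as the female voice example above
print(espeak_args("I am rockstar"))
# A slower male British English voice
print(espeak_args("Hello World", language="en", gender="m", variant=3, speed=140))
```

The returned list can be passed straight to subprocess.call() on a machine where espeak is installed.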

###################################################################

Text to speech Python APIs

###################################################################

sudo pip install pyttsx [ No internet needed for this ]

Python code

import time

import pyttsx

# Text to speak (example value)
text = "Hello World"

engine = pyttsx.init()

# List of the voices available to the engine
voices = engine.getProperty('voices')

engine.setProperty('voice', 'english+f5')

engine.say(text)

engine.runAndWait()

time.sleep(1)

sudo pip install gTTS [ Google APIs. Internet needed for this, but the voice clarity is awesome ]

Python code

from gtts import gTTS

import os

tts = gTTS(text='Good morning', lang='en')

tts.save("good.mp3")

os.system("mpg321 good.mp3")

#########################################################################

Using espeak in Python code: [TTS]

#########################################################################

import os

text = "Hi Neelkanth. How are you doing"

cmd_string = 'espeak -ven+f5 "{0}" >/dev/null'.format(text)

print(cmd_string)

os.system(cmd_string)
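One caveat with the snippet above: if the text itself contains a double quote, the shell command breaks. A possible fix, sketched here for Python 3, is to quote the text with shlex.quote before building the string. The function name espeak_shell_command is my own.

```python
import shlex


def espeak_shell_command(text, voice="en+f5"):
    """Build a shell-safe espeak command line (hypothetical helper)."""
    return "espeak -v{0} {1} >/dev/null".format(shlex.quote(voice),
                                                shlex.quote(text))


cmd_string = espeak_shell_command('Neelkanth said "hello"')
print(cmd_string)

# On the Pi, run it exactly as before:
#   import os
#   os.system(cmd_string)
```

shlex.quote wraps the text in single quotes and escapes anything the shell would otherwise interpret, so the spoken text reaches espeak unchanged.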