Speech to Text and Text to Speech on Raspberry Pi

Note: The steps below have been tested on a Raspberry Pi 3 Model B+ board.

Prerequisites:
USB sound card, microphone, speaker

1. Check whether the USB sound card is detected by the Raspberry Pi
pi@raspberrypi:/dev $ lsusb
Bus 001 Device 009: ID 8086:0808 Intel Corp. 


On newer systems, alsa.conf is located in the directory below
pi@raspberrypi:/dev $ cd /usr/share/alsa

pi@raspberrypi:/usr/share/alsa $ ls
alsa.conf  alsa.conf.d  cards  init  pcm  smixer.conf  sndo-mixer.alisp  speaker-test  topology  ucm  utils.sh


Copy the lines below into /etc/modprobe.d/alsa-base.conf

# Load USB audio before the internal soundcard
options snd_usb_audio index=0
options snd_usb_audio index=1
options snd_bcm2835 index=2
# Make sure the sound cards are ordered the correct way in ALSA
options snd slots=snd_usb_audio,snd_usb_audio,snd_bcm2835

Command to check the list of installed sound cards on the Raspberry Pi:
pi@raspberrypi:~/demo_python_codes $ cat /proc/asound/cards
 0 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA
                      bcm2835 ALSA
 1 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.4, full spe

pi@raspberrypi:~ $ cat /proc/asound/modules 
 0 snd_bcm2835
 1 snd_usb_audio

 0 snd_bcm2835             <-------------------- default sound card
 1 snd_usb_audio           <-------------------- usb audio card 
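
The card numbers shown by /proc/asound/cards can also be read from a script, which is handy when you need the USB card's index for a "hw:N,0" device name. The code below is only a minimal sketch: parse_cards and usb_card_indexes are hypothetical helper names, and the regular expression assumes the file has the same layout as the output shown above.

```python
import re

# Matches the first line of each card entry, e.g.
# " 1 [Device         ]: USB-Audio - USB PnP Sound Device"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.*)$")


def parse_cards(text):
    """Return a list of (index, id, driver, name) tuples, one per sound card."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:  # continuation lines do not start with a card number
            index, card_id, driver, name = match.groups()
            cards.append((int(index), card_id, driver, name.strip()))
    return cards


def usb_card_indexes(text):
    """Return the ALSA indexes of all cards handled by the USB-Audio driver."""
    return [index for index, _, driver, _ in parse_cards(text)
            if driver == "USB-Audio"]


# Sample text copied from the "cat /proc/asound/cards" output above.
SAMPLE = """\
 0 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA
                      bcm2835 ALSA
 1 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.4, full spe
"""

print(usb_card_indexes(SAMPLE))  # prints [1]
```

On the Pi itself, pass the contents of the real file instead of SAMPLE, for example usb_card_indexes(open("/proc/asound/cards").read()).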


2. In a terminal, run the command below
alsamixer
and increase the volume for the mic and the speaker

Testing microphone
Next, we need to set the mic recording volume to high.
To do this, enter the command "alsamixer" in the terminal.
On the interface that shows up, press the up/down
arrow keys to set the volume. Press F6 (select sound card), then
choose the webcam or mic from the list. Again, use the up arrow key to
set the recording volume to high (F4 switches to the capture controls).

Use "audacity" to test the microphone and speaker connected to the USB sound card
sudo apt-get install audacity


3. Speech Recognition in Python

Install the following packages [Raspberry Pi comes with Python 2.7 and 3.5]
sudo pip --no-cache-dir install SpeechRecognition 
sudo apt-get install python-pyaudio python3-pyaudio
sudo python -m pip install --upgrade --force setuptools
sudo python -m pip install --upgrade --force pip

Command to view the sound cards connected to the Raspberry Pi:
pi@raspberrypi:~ $ cat /proc/asound/cards

 0 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.5, full spe
 1 [U0x46d0x825    ]: USB-Audio - USB Device 0x46d:0x825
                      USB Device 0x46d:0x825 at usb-3f980000.usb-1.3, high speed
 2 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA
                      bcm2835 ALSA

A FLAC encoder is required to encode the audio data to send to the API
sudo apt-get install flac


Run the Python code below to convert speech to text. If the code returns an error,
check the 'mic' device names that the code prints on the terminal and set mic_name
to the one that matches your microphone,
e.g.
mic_name = "USB PnP Sound Device: Audio (hw:0,0)"

Neelkanth $:cat basic_STT_and_TTS_testing_with_dictionary.py 
########################################################################
#        Mission Speech Recognition [Lakshya: Personal Assistant]
########################################################################

#########################################################################
#        Include Python Libraries
#########################################################################
import speech_recognition as sr
import os

#########################################################################
#             INITIALIZING SPEECH RECOGNITION ENGINE
#########################################################################

#Enter the name of the USB microphone exactly as it appears in the
#list of mics printed below (not the name shown by lsusb)
mic_name = "USB PnP Sound Device: Audio (hw:0,0)"
#Sample rate is how often values are recorded (samples per second)
sample_rate = 48000
#Chunk is like a buffer. It stores 2048 samples (not bytes) here.
#It is advisable to use powers of 2 such as 1024 or 2048
chunk_size = 2048
#Initialize the recognizer
r = sr.Recognizer()
############################################################################
#            INITIALIZING MIC
############################################################################

#generate a list of all audio cards/microphones
mic_list = sr.Microphone.list_microphone_names()
print("#####################################################################")
print("Below is the list of mics available on the Raspberry Pi")
print(mic_list)
print("#####################################################################")

############################################################################
#    The following loop aims to set the device ID of the mic that
#    we specifically want to use to avoid ambiguity.
############################################################################
for i, microphone_name in enumerate(mic_list):
    if microphone_name == mic_name:
        device_id = i
########################   END OF ALL INITIALIZATIONS ######################


############################################################################
# Actual Program starts here... Infinite while loop
############################################################################

while True:
    #####################################################################
    # Use the microphone as source for input. Here, we also specify
    # which device ID to look for. If the microphone name was not found
    # in the list above, this line fails with
    # "name 'device_id' is not defined"
    #####################################################################
    with sr.Microphone(device_index=device_id, sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        ###################################################################
        # Wait for a second to let the recognizer adjust the
        # energy threshold based on the surrounding noise level
        ###################################################################
        r.adjust_for_ambient_noise(source)

        ###################################################################
        #   Now speak into the microphone so that Lakshya can hear you
        #   and respond
        ###################################################################
        print("Say Something")
        os.system('espeak "Hi Neelkanth.. I am Lakshya"')

        # Listen for the user's input
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print("you said: " + text)
        # Raised when Google could not understand what was said
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        # Raised when the request to the Google service itself fails
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))

You will get output like the one below. The first entry in the list is the mic connected to the USB sound card.

*********************************************************************
[u'USB PnP Sound Device: Audio (hw:0,0)',    -----> this one is the mic 
u'USB Device 0x46d:0x825: Audio (hw:1,0)', 
u'bcm2835 ALSA: - (hw:2,0)', 
u'bcm2835 ALSA: IEC958/HDMI (hw:2,1)', 
u'sysdefault', 
u'front', 
u'surround40', 
u'iec958', 
u'spdif', 
u'dmix', 
u'default']
*********************************************************************
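
The exact-name comparison in the script fails if the name differs even slightly, for instance when the "(hw:0,0)" part changes after replugging the card. As a sketch of a more forgiving lookup, the code below matches on a part of the name instead. find_device_index is a hypothetical helper, not part of the SpeechRecognition library, and the list is a shortened copy of the output above.

```python
def find_device_index(mic_list, wanted):
    """Return the index of the first device whose name contains `wanted`
    (case-insensitive), or None if no device matches."""
    wanted = wanted.lower()
    for index, name in enumerate(mic_list):
        if wanted in name.lower():
            return index
    return None


# Shortened copy of the list printed by the script above.
mic_list = [
    "USB PnP Sound Device: Audio (hw:0,0)",
    "USB Device 0x46d:0x825: Audio (hw:1,0)",
    "bcm2835 ALSA: - (hw:2,0)",
    "sysdefault",
    "default",
]

device_id = find_device_index(mic_list, "USB PnP Sound Device")
if device_id is None:
    # Stop with a clear message instead of a NameError later on.
    raise SystemExit("Microphone not found; check the names printed above")
print(device_id)  # prints 0
```

In the real script, mic_list would come from sr.Microphone.list_microphone_names(), and device_id would be passed to sr.Microphone(device_index=device_id, ...) as before.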

######################################################################
What happens when the program is stuck at r.listen(), and how to resolve it
######################################################################
If the program gets stuck on the line "audio = r.listen(source)",
the microphone is not picking up any voice input, or the recognizer
cannot tell speech apart from the background noise.

First, check the input volume of your microphone in alsamixer.
It is set to 0 by default.

Second, the "listen" function has problems with environment noise.
If the noise stays above its energy threshold, it may never detect
the silence that ends a phrase, so the running program just sits
there waiting with a blinking cursor.

Let the recognizer adjust to the ambient noise for longer before listening:
r.adjust_for_ambient_noise(source, duration=5)

How to record and play back audio on the Raspberry Pi [provided the USB sound card is connected]
arecord temp.wav 

Use the following commands to record and play using alsa-utils: 
arecord --device=plughw:0,0 --format S16_LE --rate 44100 -c1 test.wav
aplay --device=plughw:0,0 test.wav
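
To check whether a recording actually contains sound (a mic volume of 0 gives a silent file), you can measure the signal level of the WAV file. This is a rough sketch using only the Python standard library. wav_levels and looks_silent are hypothetical helper names, the 0.01 threshold is an arbitrary assumption, and the code expects 16-bit samples as produced by "--format S16_LE".

```python
import array
import math
import sys
import wave


def wav_levels(path):
    """Return (peak, rms) of a 16-bit PCM WAV file, each scaled to 0.0-1.0."""
    wav = wave.open(path, "rb")
    try:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples (arecord --format S16_LE)")
        frames = wav.readframes(wav.getnframes())
    finally:
        wav.close()
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()              # WAV data is little-endian
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / float(len(samples))) / 32768.0
    return peak, rms


def looks_silent(path, threshold=0.01):
    """Guess whether the recording is silent; the threshold is an assumption."""
    peak, _ = wav_levels(path)
    return peak < threshold


# Example use on the Pi after recording test.wav with arecord:
#   peak, rms = wav_levels("test.wav")
#   print(peak, rms, looks_silent("test.wav"))
```

A peak close to 0.0 suggests the mic is muted or the wrong card is selected; a peak close to 1.0 suggests the recording volume is too high and the audio is clipping.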


4. 
#####################################################################
Text to Speech : This works !!!
#####################################################################
To convert text to speech, install the 'espeak' utility.
sudo apt-get install espeak

To test espeak,  invoke the espeak command with some text.
espeak "Hello World"

Setting a female voice [in text to speech]
espeak -ven-us+f4 -s170  "I am rockstar"

Use the -v option to specify a voice. After it, give the language, such as en or en-us.
Then add a plus sign, then either m or f (male or female),
and a number from 1 to 5 to pick the voice variant.
The -s option sets the speed in words per minute. Unfortunately, most of the voices sound pretty bad…
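
The same options can be put together from Python. The code below is a sketch only: espeak_args and speak are hypothetical helper names, and the defaults simply repeat the command line shown above. Passing the arguments as a list to subprocess avoids shell quoting problems when the text itself contains quote characters.

```python
import subprocess


def espeak_args(text, language="en-us", gender="f", variant=4, speed=170):
    """Build the espeak argument list, e.g. -ven-us+f4 -s170 "text"."""
    if gender not in ("m", "f"):
        raise ValueError("gender must be 'm' or 'f'")
    voice = "{0}+{1}{2}".format(language, gender, variant)
    return ["espeak", "-v" + voice, "-s{0}".format(speed), text]


def speak(text, **options):
    """Speak the text aloud; espeak must be installed for this to work."""
    return subprocess.call(espeak_args(text, **options))


print(espeak_args("I am rockstar"))
# prints ['espeak', '-ven-us+f4', '-s170', 'I am rockstar']
```

On the Pi, speak("I am rockstar") runs the command, and speak("Hello", gender="m", variant=3, speed=140) picks a slower male voice.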


###################################################################
Text to speech Python APIs
###################################################################
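sudo pip install pyttsx    [ No internet needed for this ]

Python code
import time
import pyttsx

# Example text to speak
text = "Good morning"
engine = pyttsx.init()
# Installed voices (not used below; handy for picking a voice id)
voices = engine.getProperty('voices')
engine.setProperty('voice',  'english+f5')
engine.say(text)
engine.runAndWait()
time.sleep(1)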

sudo pip install gTTS     [ Google API. Internet needed for this, but the voice clarity is awesome ]

Python code
from gtts import gTTS
import os
tts = gTTS(text='Good morning', lang='en')
tts.save("good.mp3")
# Play the saved file (requires mpg321: sudo apt-get install mpg321)
os.system("mpg321 good.mp3")

#########################################################################
Using espeak in Python code:  [TTS]
#########################################################################
import os
text = "Hi Neelkanth.   How are you doing"
cmd_string = 'espeak -ven+f5 "{0}" >/dev/null'.format(text)
print(cmd_string)
os.system(cmd_string)