How To Use The Speech Recognition API For Voice-Driven Web Apps

By Aditya Samanta

The Web Speech API opens up a whole new dimension of interactivity for web developers to explore. It is a specification that defines a JavaScript API for speech recognition and speech synthesis in web pages. In this article, we will see how to use the Web Speech API to create web apps with native browser support for speech recognition.

Web Speech API

The Web Speech API specification was introduced in late 2012. The API provides native browser support for speech recognition and speech synthesis. Speech recognition is the process of converting spoken words into text, while speech synthesis is the process of converting written language into speech.

This is an experimental technology. At the time of writing, Google Chrome is the only browser that supports speech recognition. The API has been available in Chrome since version 25, but only through the vendor-prefixed version for now. Speech synthesis support landed in Firefox 44+, but speech recognition is not yet available there. You can check full browser support here.

Adding Speech Recognition To Your Site

The first thing we need to do is check for browser support. We do this with the following code snippet. Note that we are checking for either the standard or the vendor-prefixed version.

var speechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
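Since support is so limited, it is worth failing gracefully when neither constructor exists. A minimal sketch — `getSpeechRecognition` is a hypothetical helper name, not part of the API:

```javascript
// Hypothetical helper: return whichever SpeechRecognition constructor
// the browser exposes (standard or webkit-prefixed), or null if neither.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In the browser we would call it with the global window object:
// var SpeechRecognitionCtor = getSpeechRecognition(window);
// if (!SpeechRecognitionCtor) {
//   output.textContent = "Sorry, your browser does not support speech recognition.";
// }
```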

Next, we create an object that provides the speech interface to control voice recognition in our web application.

var recognition = new speechRecognition();

Now we will set a few properties on our object.

recognition.continuous = false;
recognition.interimResults = false;
recognition.lang = "en-US";

By setting the continuous property to false, we tell our object to return no more than one result per recognition. We also set the interimResults property to false, which prevents the API from returning interim results – results that are not yet final. Finally, we set the language for recognition, which otherwise defaults to the value of the HTML lang attribute.
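If you want a live-dictation experience rather than a single utterance, the same flags can simply be flipped. A sketch, assuming a `recognition` object created as above:

```javascript
// Sketch: continuous dictation with live feedback.
// Assumes `recognition` was created with `new speechRecognition()` as above.
recognition.continuous = true;     // keep listening after each final result
recognition.interimResults = true; // also deliver partial, not-yet-final results
recognition.lang = "en-US";
```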

Now it’s time to start listening for speech input. We do this using the start() method of the SpeechRecognition interface. Here we use an onclick event listener to start speech recognition when the user clicks on the body.

document.body.onclick = function() {
  recognition.start();
};

recognition.onresult = function(e) {
  var transcript = e.results[0][0].transcript;
  output.textContent = transcript; // "output" is the element that displays the result
};

Whenever a word or phrase is detected by our recognition service, it returns the result to our app and triggers the onresult event. The event's results property holds a SpeechRecognitionResultList, a list of SpeechRecognitionResult objects. It can be accessed like an array, so the first 0 selects the first SpeechRecognitionResult object. Every SpeechRecognitionResult object in turn contains SpeechRecognitionAlternative objects that store the individual alternatives; again, it can be accessed like an array. Each SpeechRecognitionAlternative object has a transcript property, a string representing the recognised words. So e.results[0][0].transcript retrieves the transcript of the first alternative of the first recognized result. Finally, we just set that value in a text field.
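The nested indexing above is easy to get wrong, so it can help to wrap it in a small helper. `firstTranscript` and `mockResults` below are hypothetical names used for illustration; the nested shape mirrors the list-of-lists structure described above:

```javascript
// Hypothetical helper: return the transcript of the first alternative
// of the first result, or an empty string if nothing was recognised.
function firstTranscript(results) {
  if (!results || !results[0] || !results[0][0]) {
    return "";
  }
  return results[0][0].transcript;
}

// A mock with the same nested shape the API returns:
var mockResults = [[{ transcript: "hello world", confidence: 0.9 }]];
console.log(firstTranscript(mockResults)); // → "hello world"
```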

Here is the complete demo on JSFiddle. You must allow access to your microphone for the app to work!
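If the user denies microphone access, or recognition fails for another reason, the SpeechRecognition object fires an onerror event whose error property carries a code defined by the specification. A sketch of handling this — `errorMessage` is a hypothetical helper name, and the wiring is shown in comments since it needs a live recognition object:

```javascript
// Hypothetical helper: map a SpeechRecognitionErrorEvent error code
// (e.error) to a user-facing message. The codes below come from the spec.
function errorMessage(code) {
  var messages = {
    "not-allowed": "Microphone access was denied.",
    "no-speech": "No speech was detected. Please try again.",
    "audio-capture": "No microphone was found.",
    "network": "A network error interrupted recognition."
  };
  return messages[code] || "Recognition error: " + code;
}

// In the app, this would be wired up as:
// recognition.onerror = function(e) {
//   output.textContent = errorMessage(e.error);
// };
```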