How can one utilise today’s TTS (Text To Speech) technology when creating an app prototype that speaks?

There are several ways to make an app speak, and the choice depends first on the platform the app runs on. Most platforms (Windows, Mac) have built-in text-to-speech functionality that can be used straight from the terminal. When you focus on building a web app, like I did for Chef, the options narrow down a little.
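To make the "straight from the terminal" idea concrete, here is a minimal sketch that builds the command a script could run on each platform. It assumes the stock `say` tool on a Mac and the .NET System.Speech assembly reached through PowerShell on Windows; the `buildSpeakCommand` helper and the `SpeakCommand` type are hypothetical names made up for this illustration, not part of any library.

```typescript
// Command plus argument list, ready to hand to something like
// Node's child_process.execFile. (Hypothetical type for this sketch.)
interface SpeakCommand {
  command: string;
  args: string[];
}

// Build the OS-specific command that speaks `text` with the built-in voice.
// `platform` takes the same values as Node's process.platform.
function buildSpeakCommand(platform: string, text: string): SpeakCommand {
  if (platform === "darwin") {
    // A Mac ships with the `say` command-line tool.
    return { command: "say", args: [text] };
  }
  if (platform === "win32") {
    // Windows exposes speech through the .NET System.Speech assembly.
    // Single quotes are doubled so the text stays inside the PowerShell string.
    const escaped = text.replace(/'/g, "''");
    const script =
      "Add-Type -AssemblyName System.Speech; " +
      "(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('" +
      escaped +
      "')";
    return { command: "powershell", args: ["-NoProfile", "-Command", script] };
  }
  // Other platforms are outside the scope of this sketch.
  throw new Error("No built-in speech command known for platform: " + platform);
}
```

Passing the text as a separate argument rather than pasting it into one shell string keeps punctuation in the spoken sentence from being interpreted by the shell.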

There are both paid and free text-to-speech services. Google offers a Web Speech API for both TTS and STT (Speech To Text). For speaking, I am currently using a Scottish solution from the Edinburgh-based company Cereproc. Matthew Aylett gave a talk about speech synthesis at DJCAD last year and I found it quite inspiring. It also feels good to be able to offer a Scottish voice to the local people who will experience the prototype in Dundee at our Degree Show, and I hope it will have a positive effect on the overall usability of the concept 🙂

Cereproc’s API offers multiple voices, which you can try directly on their website. One way to use the API is through SOAP, an XML-based communication protocol. Since there is a jQuery SOAP plugin, it’s easy (after reading Cereproc’s CereCloud guide) to set your app up, start firing requests and receive back .ogg sound, which you can play in a standard HTML5 Audio element in the browser.
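For readers who have not met SOAP before, here is a rough sketch of what such a request boils down to: an XML envelope with the operation and its parameters inside the body. It builds the envelope by hand instead of using the jQuery SOAP plugin mentioned above. The operation name `speak` and the fields `voice` and `text` are placeholders made up for this illustration; the real names, credentials and endpoint come from the CereCloud guide. The `playAudio` helper is likewise hypothetical and simply hands a sound URL to the browser's Audio element.

```typescript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap one operation and its parameters in a SOAP 1.1 envelope.
// Operation and parameter names are supplied by the caller; the ones used
// in the usage note are placeholders, not the actual CereCloud names.
function buildSoapEnvelope(
  operation: string,
  params: { [name: string]: string }
): string {
  const fields = Object.keys(params)
    .map((name) => "<" + name + ">" + escapeXml(params[name]) + "</" + name + ">")
    .join("");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">' +
    "<soap:Body><" + operation + ">" + fields + "</" + operation + ">" +
    "</soap:Body></soap:Envelope>"
  );
}

// Play a sound file URL through the HTML5 Audio element.
// Returns false when no Audio constructor exists (for example outside a browser).
function playAudio(url: string): boolean {
  const AudioCtor = (globalThis as any).Audio;
  if (typeof AudioCtor !== "function") return false;
  // Browsers may refuse playback that was not triggered by a user gesture.
  new AudioCtor(url).play();
  return true;
}
```

A call such as `buildSoapEnvelope("speak", { voice: "ExampleVoice", text: "Preheat the oven" })` gives the string that would be POSTed to the service. Once the response has been parsed, the sound URL it contains can be passed to `playAudio`.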

So far I haven’t managed to control the intonation and the other subtleties of speech. I hope to do that in the near future; it should be possible with the “extended” speaking method.

Other Text To Speech technologies

Among the other technologies available to developers online, I’d like to mention IBM Watson. Watson is a vast and very interesting project that makes heavy use of speech as one means of human-computer interaction. Watson can handle very large learning tasks, so the people at IBM are most likely investing considerable time in refining its speaking and listening capabilities.