Enrique Mendez, 9, and his older brother, Cristian, 11, sorted through a plastic bin of toys in their New Jersey home. "I want to play with the wrestling guys," said Enrique in a voice not quite his own, but pretty close.
Enrique has Down syndrome and speech apraxia, which means that he cannot speak, aside from a few grunts and "Ma" in the word "Mama." He was able to speak to his brother, though, with an iPad loaded with the latest version of a widely used text-to-speech application, Proloquo2Go. "The voice now matches the boy," said John Mendez, Enrique's father.
Until recently, devices that help children like Enrique speak used modified adult voices. The effect can be startling to those listening because it doesn't sound like a child's voice. Most existing children's voices sound "like adults on helium," said David Niemeijer, chief executive and lead developer at AssistiveWare, which developed the software Enrique tested.
AssistiveWare and its partner, Acapela Group, developed the next version, Proloquo2Go 2.1, which features two children's voices, known as Josh and Ella, actually recorded by children. The $190 (Rs 10,574) application went on sale on iTunes last Wednesday, but people who already own the app can add the latest voices at no charge.
Few, if any, other companies offer true children's voices, largely because of the challenges of recording children. The average 10-year-old cannot spend hundreds of hours in a sound booth recording the library of phrases needed to create a synthetic child's voice.
Sound engineering can manipulate adult voices, adding filters that adjust for the higher pitch of a child's voice, for example. But without a baseline recording, the voices to date have lacked the natural sound of a child's voice. With little competitive pressure to replicate children's voices, most companies decided children could get by with the altered adult voices.
The release of Proloquo2Go's boy and girl voices (the company also has two other children's voices with a British accent for that market) is an indicator of new progress in the decades-old text-to-speech industry.
The progress is, in part, a side effect of the adoption of automated voices in everything from credit card company service lines to grocery store checkout kiosks. Faster computer processors with more memory have also enabled sound engineers to make artificial voices sound more human. Many of the larger voice companies, like Nuance, in Massachusetts, and Ivona, in Poland, now offer voices in multiple languages and accents.
Proloquo2Go, which runs on Apple's mobile devices, is used by tens of thousands of children with disabilities like autism and cerebral palsy. Proloquo2Go "can be a good fit for some people, but not for everyone," said Janice C. Light, a professor at Pennsylvania State University.
Said Niemeijer, the AssistiveWare chief: "A degree of assessment is definitely necessary because the parents often just go out and buy the device and it doesn't work out. Parents often have too high hopes."
During the recording sessions for Proloquo2Go 2.1, audio engineers collected several thousand phrases and hundreds of words. From this bank of recordings, the application can synthesize any word in the English language. For example, the word "impressive" is stitched together from pieces of the words "impossible," "president" and "detective."
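For readers curious about the stitching idea, here is a minimal toy sketch in Python. It works on spelled-out letters rather than recorded sounds, an assumption made purely for illustration. The article does not describe how Proloquo2Go or Acapela actually select and join audio units, and the function name `stitch` and its greedy longest-match rule are hypothetical.

```python
def stitch(target, bank):
    """Cover `target` with fragments that each appear inside some word in `bank`.

    Toy stand-in for concatenative synthesis: letters play the role of
    recorded sound units. Greedy longest-match is an assumed strategy,
    not the method used by any real product.
    """
    pieces = []
    pos = 0
    while pos < len(target):
        best = None
        # Try the longest remaining fragment first, then shrink it.
        for end in range(len(target), pos, -1):
            fragment = target[pos:end]
            source = next((word for word in bank if fragment in word), None)
            if source is not None:
                best = (fragment, source)
                break
        if best is None:
            raise ValueError(f"no recording covers {target[pos]!r}")
        pieces.append(best)
        pos += len(best[0])
    return pieces


bank = ["impossible", "president", "detective"]
pieces = stitch("impressive", bank)
for fragment, source in pieces:
    print(f"{fragment!r} taken from {source!r}")
```

Joined in order, the fragments spell the target word, and all three source words contribute at least one piece. A real system would match on sounds rather than spelling and would have to smooth the seams between recordings, which this sketch does not attempt.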
Most text-to-speech devices do give users the ability to say almost anything, and many allow users to choose whether they want to sound happy, angry or sad. The challenge facing the industry is how to develop text-to-speech technologies that can predict the emotion, or tone, a person might want to use.
Many in the industry agree that a synthetic voice, even one that expresses basic emotions, is barely adequate to allow someone with a speech disability to speak normally. "You often can't really chip in sharp/sarcastic comments," wrote Martin Pistorius, a 36-year-old Web developer and author, in an email. He lost his voice after contracting meningitis when he was 12 and has been using text-to-speech technology for 10 years. "By the time you've composed it, the moment has gone so it wouldn't really be funny or appropriate any more."
"I'm pretty quick at getting my message out, but even so I still can't keep up with the pace of normal conversation," he wrote.