Judging by last week’s crowded SpeechTek conference in San Francisco, plenty of companies are still chasing the long-held dream of building computers that can understand what their owners are saying. But building the technology to decipher human speech isn't getting any easier.
“Speech has been a problem very smart people have been working on for decades,” said Microsoft chairman Bill Gates in his keynote address at the conference.
True, it might be years before computers can perfectly understand the mumblings of the human voice, but that doesn’t mean there isn’t money to be made. Startups can tackle plenty of niches in the speech market today. One example is Nexidia, a small company that wants to rewrite the way government intelligence agencies and businesses trawl through audio recordings to identify specific words or phrases.
Currently, tracking down information in audio files is difficult. The most rudimentary method is for transcribers to manually convert audio files to written text, which can then be searched for specific terms. A slightly more advanced method is to assign labels or “meta tags” to individual segments of sound, which can then be searched for relevant content. More sophisticated systems automatically convert speech into text files, which are searched for the words or phrases needed.
The problem is that these methods are either laborious, inaccurate, or some combination of the two.
Nexidia’s software takes a different approach. It searches audio files using phonemes, which are the fundamental building blocks of speech. A phoneme is the smallest phonetic unit in a language capable of conveying a distinction in meaning. In plain English: the m of mat and the b of bat. Because there are typically between 35 and 40 phonemes in a single language, and about 400 phonemes in all the world’s languages, Nexidia claims its software can search more languages more efficiently than anything else on the market.
The software, which was developed at the Georgia Institute of Technology, first creates an index of the audio file to be searched – it does this about 10 times faster than real time. Using this index file, which is typically 10 percent the size of the original, the software searches for individual words or phrases by matching phonemes. Once the original audio is indexed, searching is fast – 30 hours of, say, Al Jazeera footage takes less than a second to search.
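The index-then-search idea can be sketched in a few lines of Python. This is a toy illustration only, not Nexidia's method: it assumes the audio has already been reduced to a clean sequence of phoneme symbols, and the pronunciation table, the symbol names, and the functions `to_phonemes`, `build_index`, and `search` are all hypothetical. A real system would have to cope with uncertain, noisy phoneme recognition rather than exact matches.

```python
# Hypothetical pronunciation table; a real system would use a full lexicon
# or letter-to-sound rules. Phoneme symbols here are illustrative only.
PRONUNCIATIONS = {
    "the": ["dh", "ah"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "sat": ["s", "ae", "t"],
    "on": ["aa", "n"],
}


def to_phonemes(phrase):
    """Convert a phrase to a flat list of phoneme symbols."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def build_index(stream):
    """Index step: map each phoneme to the positions where it occurs."""
    index = {}
    for position, phoneme in enumerate(stream):
        index.setdefault(phoneme, []).append(position)
    return index


def search(index, stream, query):
    """Search step: return start positions where the query's phonemes match."""
    target = to_phonemes(query)
    if not target:
        return []
    hits = []
    # Only positions holding the query's first phoneme can start a match.
    for start in index.get(target[0], []):
        if stream[start:start + len(target)] == target:
            hits.append(start)
    return hits


# Stand-in for the phonemes recognized in a recording (assumed, not real data).
stream = to_phonemes("the bat sat on the mat")
index = build_index(stream)

print(search(index, stream, "mat"))
print(search(index, stream, "the mat"))
```

Because the query is converted to phonemes rather than looked up as a word, "mat" and "bat" are kept apart by a single symbol, which is the distinction the phoneme definition above describes. The index is built once, and each later query only visits positions that hold its first phoneme.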
The government has taken note – unnamed U.S. government agencies are among Nexidia’s biggest customers. The languages Nexidia has trained its software to understand also reveal something about how it is used – they include Spanish, modern Arabic, Iraqi, and Korean, with four more on the way.
The software was also installed on laptop computers U.S. Marines took to Iraq last year.
The 28-person company, which was founded in 2000 as Fast-Talk Communications, is based in Atlanta. It has raised more than $18 million in funding from investors including H.I.G. Ventures, Boston Millennia Partners, Paladin Capital Group, SAIC’s venture arm, Cordova Ventures, and Atlanta Technology Angels. Kenneth A. Minihan, a principal with Paladin Capital Group and former director of the U.S. National Security Agency, sits on Nexidia’s board.
“The traditional means of searching speech are very laborious,” says Alf Andreassen, a principal with Paladin Capital and a member of its Homeland Security Fund Investment Committee. Nexidia’s software is “a way of looking through voice information at unbelievable speeds,” he says.
Nexidia is also looking to expand its base of corporate customers. For example, call center managers can use it to search for specific words or phrases in thousands of hours of phone conversations, to check that agents are sticking to their scripts. Nexidia says it is piloting the software with several Fortune 500 companies. Customers include electronics giant NEC and NetRoadshow, an online financial conferencing company.
Ray Naeini, Nexidia’s president and CEO, says the company continues to advance its technology in collaboration with the Georgia Institute of Technology. In the future, the company’s software could, for example, analyze the stresses in speakers’ voices – to tell whether they are angry or scared or drunk. The next step would be for the software to identify individuals by the sound of their voice.
Mr. Naeini says the company will spend the next year developing its “intelligent mining” software. Instead of just searching for individual terms, the software will sound an alert if it finds specific terms in a section of speech in a specific context. “That’s actionable knowledge,” says Mr. Naeini. With 9/11 inquiries continuing, this could be music to the ears of the intelligence community.
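The article does not say how the planned software will define “context.” As one possible reading, the sketch below treats context as proximity in time: it raises an alert when a watched term is found within a few seconds of any term from a context list. The function `find_alerts`, the `(time, term)` hit format, and the 10-second window are all assumptions made for illustration, not details of Nexidia’s product.

```python
def find_alerts(hits, term, context_terms, window_seconds=10.0):
    """Return (time, nearby_context_terms) for each contextual match.

    hits: list of (time_in_seconds, term) tuples, assumed to come from an
    earlier search pass over the audio (hypothetical format).
    """
    alerts = []
    for time, found in hits:
        if found != term:
            continue
        # Collect context terms spoken within the window around this hit.
        nearby = {
            other
            for other_time, other in hits
            if other in context_terms and abs(other_time - time) <= window_seconds
        }
        if nearby:
            alerts.append((time, sorted(nearby)))
    return alerts


# Made-up search hits for illustration only.
hits = [
    (3.0, "transfer"),
    (5.5, "account"),
    (40.0, "transfer"),
    (120.0, "offshore"),
]

print(find_alerts(hits, "transfer", {"account", "offshore"}))
```

Here only the first “transfer” triggers an alert, because “account” is spoken 2.5 seconds later; the second occurrence has no context term within the window and is ignored.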
Published three times a week on Mondays, Wednesdays, and Fridays, Next Wave is an online column profiling the latest generation of startup companies and the issues they face. Have thoughts or suggestions for Next Wave? Email the column editor at jthaw@redherring.com. For more information on the column, see Next Wave: Red Herring’s Startup Journal.