Microsoft Research just dropped a new app called Engkoo. Here’s how they describe it:
Free dictionary and translation based on new technology from Microsoft Research. Currently supporting English-English and English-Chinese (Simplified). Define/translate a word, phrase or sentence. Features include: fresh definitions and sample sentences mined from the web, pronunciation (audio), autocomplete with built-in support for spelling correction, wildcard (*, ?), and pinyin input. Learn more about the Engkoo project @ http://research.microsoft.com/projects/engkoo.
The project page explains why they’re working on this:
Engkoo is a technology for exploring and learning language, now powering the Bing Dictionary product in China. It is built primarily by mining translation knowledge from billions of web pages – using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future.
At a system level, Engkoo is an application platform that supports a multitude of NLP and Speech technologies such as cross language retrieval, alignment, sentence classification, statistical machine translation, text-to-speech, and phonetic search. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build the world’s largest lexicon linking both Chinese and English together – at the same time covering the most up-to-date terms as captured by the net. In addition, our data set is intelligently merged with licensed data from sources including Microsoft Office and Encarta. Finally, the resulting vast, ranked, high quality composite data set is analyzed by a machine learning based classifier, allowing users to filter down sample sentences by combinable categories.
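To make the "web pages that contain both Chinese and English" step a bit more concrete, here is a minimal Python sketch of the very first filter such a pipeline might apply: keeping only lines that mix Chinese characters with English words. This is an illustration, not Engkoo's actual method. The function names (`is_bilingual`, `candidate_lines`) are made up for this example, and using the basic CJK Unified Ideographs block as a stand-in for "Chinese" is an assumption. The real system's parallelism analysis and alignment are far more involved and are not described in enough detail to reproduce.

```python
import re

# Assumption: the basic CJK Unified Ideographs block stands in for "Chinese".
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Assumption: two or more consecutive ASCII letters count as an English word.
LATIN_WORD = re.compile(r"[A-Za-z]{2,}")


def is_bilingual(text: str) -> bool:
    """Return True if text has at least one CJK character and one Latin word."""
    return bool(CJK_CHAR.search(text)) and bool(LATIN_WORD.search(text))


def candidate_lines(page_text: str) -> list[str]:
    """Keep lines that mix Chinese and English, stripped of surrounding whitespace.

    These are only candidates: a later step would still have to check
    whether the two halves are actually translations of each other.
    """
    return [line.strip() for line in page_text.splitlines() if is_bilingual(line)]


if __name__ == "__main__":
    page = "Hello world\n你好 (hello)\n\n再见\n  machine translation 机器翻译  "
    for line in candidate_lines(page):
        print(line)
```

Run against the sample string, the filter keeps the two mixed lines and drops the English-only, Chinese-only, and empty ones. Everything interesting (deciding whether the pairs are real translations, ranking them, classifying sentences) happens after a step like this.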
So yeah, the long game seems to be a language-independent platform: Chinese and English today, other language pairs later…