ロゼッタ石碑

real time translator

July 14, 2008

so while i was walking home from class, i was suddenly hit with the idea that in the future, speech recognition and human interface software will be so advanced that interpreting and translating will be left entirely up to computers and machines.

this idea crossed my mind when i was thinking about how far tts has come in the past two years (especially the engines developed by microsoft’s competitors), and about the fact that it IS entirely possible to teach a computer to speak a (human) language fluently. what is impossible is giving it its own tongue and a mind to communicate with other humans (and by communicating i mean the exchange of semantically and logically irrelevant language, the kind only humans are capable of engaging in).

also, the fact that a computer is supposedly incapable of independent bias led me to believe that in the future, translation and interpretation will all be outsourced to computers and machinery, unless the field of computational linguistics hits a huge brick wall and fails to progress from now until the end of time.

so in hypothesizing such a future, i worked out the possible inner mechanism of a computer/program/machine capable of such a feat, and it scared me to think that such a machine could easily be built should the idea catch the attention of interested parties, or that there may already BE developers/inventors who could easily build such a mechanism… which led me to come up with a little blueprint of my own…

these are the key components of my “real time translator”:

1. speech to text
using speech-to-text technology, the machine will record the speech, analyze it acoustically, and convert it into data, most likely some form of text. http://www.brothersoft.com/downloads/speech-to-text.html is an example of the speech-to-text technology being developed all around the world.

2. sentence breaker/pos tagger/word breaker
after the speech is transformed into analyzable data, it is run through a sentence breaker which, given its knowledge of the syntactic structure of the language being spoken, breaks the speech cluster down into sentences. after the speech is broken down into simple sentences, each word is separated and given a “part of speech” tag, depending on the word’s placement within the sentence and the context of the sentence. the NLP project has demos for pos taggers, and it is widely known that nuance and microsoft have both been working on sentence breakers/word breakers for a long time now.

3. lexicalization
after each word has been separated and tagged with a part of speech, the word is looked up in the main language lexicon, which is basically a huge dictionary that stores information such as how each word is pronounced, how frequently it is used in the language, how it is used in different parts of speech (where applicable), and so on. once that information is acquired from the main language lexicon, it needs to be cross-referenced with a lexicon containing the same information in the target language so that “translation” can take place.

4. pos tagger/syntax builder
now that the “translated” data is available in the target language, another pos tagger needs to be applied to correctly label the new data, which will then be fed through a syntax builder so that it can be formed into a correct, logical sentence in the target language.

5. text to speech
once the sentence is completely translated into the target language and is found to be syntactically and semantically accurate, it needs to be fed through a text to speech engine, which will relay the speech to the target audience. text to speech can be found everywhere in the modern computer age, from global navigation systems to registry id calls, and even in windows pc’s, which come standard with a mediocre version of it in every copy (if you’re bored, go to accessories > accessibility options > text-to-speech).
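to make the middle of the blueprint (steps 2 through 4) a bit more concrete, here is a minimal python sketch. everything in it is an assumption made up for illustration: the five-word english-to-spanish lexicon, the function names, and the single reordering rule are all hypothetical, and the speech-to-text and text-to-speech ends are left out entirely. it also cheats by reading the part-of-speech tag straight out of the lexicon instead of working it out from context, and by carrying that tag over to the target side instead of running a second tagger.

```python
import re

# hypothetical toy lexicon: source word -> (part of speech, target word)
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
    "is": ("VERB", "es"),
    "fast": ("ADJ", "veloz"),
}

def break_sentences(text):
    """step 2: split the recognized text into sentences on end punctuation."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]

def break_words(sentence):
    """step 2: split a sentence into lowercase words."""
    return sentence.lower().split()

def tag_and_translate(words):
    """steps 2-3: look each word up and return (pos, target word) pairs.
    words missing from the lexicon are tagged UNK and passed through."""
    return [LEXICON.get(w, ("UNK", w)) for w in words]

def build_syntax(tagged):
    """step 4: apply one toy reordering rule, ADJ NOUN -> NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out

def translate(text):
    """run every sentence of the text through steps 2-4."""
    results = []
    for sentence in break_sentences(text):
        tagged = tag_and_translate(break_words(sentence))
        results.append(" ".join(word for _, word in build_syntax(tagged)))
    return results

print(translate("The red car is fast. The car is red."))
```

a real system would need far more than one reordering rule and a context-aware tagger, but the shape of the pipeline (break, tag, look up, rebuild) is the same as in the list above.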

the known difficulties of this project are numerous and large in scale: the lexicon will have to be updated on a regular basis to account for new words, terms, and definitions; machine translations will often lack variety and be monotonous in nature; the machine will need some way to deal with terms and data that may not be in the lexicon (e.g. names of people, locations, new things that may seem obscure); the irregularity of language will most definitely throw the machine off course; and the huge amount of processing power required would make instant translation/interpretation very hard or almost impossible.
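one simple way to handle words that are not in the lexicon is to pass them through untranslated and keep a list of them for the next lexicon update. the sketch below only illustrates that one idea; the tiny lexicon and the function name are hypothetical, and a real system would probably want something smarter for names (transliteration, for example).

```python
# hypothetical two-entry lexicon, just enough to show the fallback
SMALL_LEXICON = {"lives": "vive", "in": "en"}

def translate_with_fallback(words, lexicon):
    """translate known words; pass unknown ones through unchanged
    and collect them so the lexicon can be updated later."""
    translated, unknown = [], []
    for word in words:
        key = word.lower()
        if key in lexicon:
            translated.append(lexicon[key])
        else:
            translated.append(word)  # keep the original form, e.g. a name
            unknown.append(word)
    return translated, unknown

result, missing = translate_with_fallback(
    ["Keiko", "lives", "in", "Osaka"], SMALL_LEXICON
)
print(result)
print(missing)
```

the list of missing words doubles as a to-do list for whoever maintains the lexicon, which ties the third difficulty back to the first.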

but the benefits of such a machine would be endless, as it would bridge countless gaps that form because of language barriers, although it could effectively mean that what was once a proud oral tradition of humankind will be lost and permanently outsourced to heartless machines.

oh, and i’d be out of a job too, but that’s beside the point…

Categories: Language