For a long time, people have been trying hard to overcome human language barriers. In recent years, IT giants such as Google and Microsoft have dedicated huge amounts of resources to leapfrogging these language difficulties, and seamless voice translation may now seem possible in the near future. As much as we would like it to be true, does this mean we have reached a breakthrough in creating a world without language barriers?
Although instant translation is not perfect, we can now speak directly into the Google Translate mobile app and have our spoken words translated into another language. For most of us, this makes travelling in foreign countries a whole lot easier. In 2015, Microsoft released a new version of Skype that lets users speak in their native language and have their speech automatically translated into another language. (https://support.skype.com/en/faq/FA34542/how-do-i-set-up-and-use-skype-translator)
Google Translate has been translating more than 100 million words a day since its introduction in 2011. In addition to text translation, Google Translate's text recognition technology can capture an image from the camera and translate the foreign text in it (https://support.google.com/translate/answer/6142483?hl=en). This is especially helpful when reading a menu in a foreign restaurant.
Although Google has made remarkable improvements in instant translation, there is still a lot to do. To bring the application to the next level, software engineers have to extract high-quality voice samples from their data and add them to Google's voice databank. The difficulty lies in the vast amount of data and the variety of accents and vocalizations across languages, which has a great impact on the quality of the translated language (the target language).
Microsoft spent a decade developing instant translation for Skype, building on its research into semi-supervised training of Gaussian mixture model (GMM) hidden Markov model (HMM) and deep neural network (DNN)-HMM acoustic models. In 2010, Microsoft made a breakthrough in cross-language communication with "The Translating Telephone", which transcribes speech into text (https://www.microsoft.com/en-us/research/video/the-translating-telephone/). The transcribed text is then translated live using machine translation to provide speech-to-text translation, and further fed into a text-to-speech system to realize speech-to-speech translation.
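The cascade described above (speech recognition, then machine translation, then text-to-speech) can be sketched as a simple composition of three stages. This is a minimal illustration under stated assumptions, not Microsoft's actual implementation: the `CascadeTranslator` class and the stage signatures are hypothetical, and the toy stubs stand in for real recognition, translation and synthesis models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: audio is modelled as raw bytes, text as str.
Recognizer = Callable[[bytes], str]    # speech -> source-language text (ASR)
Translator = Callable[[str], str]      # source text -> target text (MT)
Synthesizer = Callable[[str], bytes]   # target text -> speech (TTS)


@dataclass
class CascadeTranslator:
    """Hypothetical pipeline chaining caller-supplied ASR, MT and TTS stages."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def speech_to_text(self, audio: bytes) -> str:
        # Speech-to-text translation: transcribe, then machine-translate.
        return self.translate(self.recognize(audio))

    def speech_to_speech(self, audio: bytes) -> bytes:
        # Speech-to-speech translation: feed the translated text into TTS.
        return self.synthesize(self.speech_to_text(audio))


# Toy stand-ins for real models, for illustration only.
TOY_LEXICON = {"hello": "hola", "friend": "amigo"}


def toy_recognize(audio: bytes) -> str:
    # Pretend the "audio" bytes already hold UTF-8 text.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_synthesize(text: str) -> bytes:
    # Pretend "speech" is just the encoded target text.
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = CascadeTranslator(toy_recognize, toy_translate, toy_synthesize)
    print(pipeline.speech_to_text(b"hello friend"))    # hola amigo
    print(pipeline.speech_to_speech(b"hello friend"))  # b'hola amigo'
```

Keeping each stage behind its own interface makes the components easy to swap, but it also means any recognition error is passed straight into translation, which is one reason the variety of accents and vocalizations matters so much for the final output.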