How We Created an AR Universal Translator in 10 Days
Language barriers come up frequently in today’s society, both between people who speak different languages and between people who speak the same one. They affect all of us, whatever our job. But instead of discussing the problem, let’s consider the possible solutions.
The one and only: the “Universal Translator”
As most of us are fans of Star Trek, we couldn’t help but think about the Universal Translator. A device used by Captain Kirk became the holy grail for enthusiasts interested in tech and sci-fi. Today there are numerous variations on this theme, and although they can’t translate alien languages just yet, they are close to something we might use in the near future.
Companies such as LineApps and Shine Apps Solution already offer versions of universal translators through Google Play. Skype has introduced a translator feature that covers both voice and text. These solutions are great, but they aren’t suitable for everyday use: Skype’s translator is available only for online calls, while the rest of the apps require a handheld device. But if things keep developing at this pace, fairly accurate universal-translator-like apps might appear in our lifetime.
We gave it a go
In our office, it started with a pair of Nreal glasses we acquired. It’s “A small step to Augmented Reality,” as they claim on their website, but a big one in terms of design and tech possibilities. They are among the first socially accepted AR glasses, since they don’t diverge too much from an everyday pair of sunglasses.
The key is simplicity; few users like bulky eyewear, so Nreal did it right. This encouraged us to create something everyone can use to simplify communication. Darian, our CEO and founder, had the brilliant idea of using an AI service to display subtitles in AR underneath the person you are talking to. We decided to use Microsoft’s Azure Cognitive Services, specifically its speech-to-text and speech translation features. They describe it as “a Speech service feature that accurately transcribes spoken audio to text,” and it does that, with just a slight delay.
The way the system works is relatively simple:
1. Cognitive services capture speech.
2. Speech is translated and turned into text.
3. Subtitles appear underneath the person talking.
4. With basic face tracking, the subtitles are positioned below the person’s face.
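To make the last step more concrete, here is a minimal Python sketch of how a subtitle could be placed below a tracked face. It is not the code from our prototype: the `FaceBox` type, the `subtitle_anchor` function, the pixel coordinate system and the `margin` value are all assumptions made for illustration, and the face box is assumed to come from whatever face tracker is in use.

```python
from dataclasses import dataclass


@dataclass
class FaceBox:
    """Hypothetical face bounding box in screen pixels (origin at top-left)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def subtitle_anchor(face, screen_w, screen_h, sub_w, sub_h, margin=12.0):
    """Return the top-left corner of a subtitle centred below the face."""
    # Centre the subtitle horizontally on the face.
    x = face.x + face.width / 2 - sub_w / 2
    # Put it just under the bottom edge of the face box.
    y = face.y + face.height + margin
    # Clamp so the subtitle stays inside the visible area.
    x = max(0.0, min(x, screen_w - sub_w))
    y = max(0.0, min(y, screen_h - sub_h))
    return x, y
```

For a face box at (500, 200) that is 200 by 240 pixels on a 1920 by 1080 display, a 400 by 60 subtitle would be anchored at (400.0, 452.0). Clamping keeps the text readable when the person stands near the edge of the field of view.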
A working prototype was finished in just ten days, and it attracted a lot of interest on social media. We just couldn’t be happier. The application we’ve created still requires some polishing, since at present the system only works with a single speaker. But we will surely figure it out soon. It’s a prototype, after all.
Obstacles to overcome
The biggest obstacle right now is that the two people talking have to be alone in a separate room, since the universal translator picks up sound from all around. Several people talking simultaneously can interfere with the translation, but this could be fixed with voiceprint tracking: the application would recognise the person speaking and treat only their voice as relevant for translation.
When our tweet about the prototype went live, the reaction of our followers was amazing. Many experts in this field became engaged in exciting discussions with other experts all around the globe, and the overall response we got was beyond positive.
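The voiceprint idea mentioned above could look roughly like the following Python sketch. We did not implement this in the prototype, so treat it as an illustration only: it assumes some speaker-embedding model (not shown, hypothetical) has already turned each audio segment into a numeric vector, and the function names and the `threshold` value are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_voiceprint(segments, enrolled, threshold=0.8):
    """Keep the text of segments whose embedding matches the enrolled voice.

    segments: list of (text, embedding) pairs, one per recognised utterance.
    enrolled: embedding of the speaker we want to translate (assumed given).
    threshold: illustrative cut-off; a real system would need to tune it.
    """
    return [
        text
        for text, embedding in segments
        if cosine_similarity(embedding, enrolled) >= threshold
    ]
```

Only utterances that sound like the enrolled speaker would then be passed on for translation, and everything else in the room would be ignored.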
Now the big question is, why haven’t we continued development? The answer lies in the fact that we created it in only ten days by using off-the-shelf systems from large corporations. Because of that, our app wouldn’t be free. And our objective was to present a solution available for everyone, everywhere, and free of charge. We knew it was just a matter of time before one of the giants came out with a product. And one did, just nine months later: Google came out with its AR translator. The app is free, but that is not surprising, with all the data Google collects. This app created massive hype, and in the video below you can see why.
A big part of what we do at Delta Reality is constantly innovating and building something new for the market. Filip, one of our developers, worked on this project, so we asked him to tell us a bit about how it went. He said: “Working on the universal translator was a fun and exciting experience. It showed me the power and potential of Azure AI and opened my eyes to a world of possible applications and useful gadgets that we could make to bring people closer and help each other every day.”
Although we did not take the universal translator to market, we did implement a version of it in our client’s product.