![automated lip reading](https://i.ytimg.com/vi/28U6EwfKois/maxresdefault.jpg)
Many deaf people can lip-read, but there are situations when it is a struggle. Now Artificial Intelligence like Google's DeepMind is getting its virtual teeth into the challenge – and doing an even better job than humans. So what does this mean for disabled people, TV subtitling and the shady world of cloak-and-dagger espionage?
Researchers at Oxford University used Google's DeepMind to watch more than 5,000 hours of TV – the biggest TV binge-fest in history – including shows such as Newsnight, BBC Breakfast and Question Time for the 'Lip Reading Sentences in the Wild' study.
The AI analysed a total of 118,000 sentences – a much larger sample than in previous pieces of research such as the LipNet study, which contained only 51 unique words. The sample used in this DeepMind study comprised no fewer than 17,500 unique words, which made it a significantly harder challenge but ultimately resulted in a much more accurate algorithm.
What added to the task was the fact that the video and audio in the recordings were often out of sync by up to a second. To prepare the samples for the machine-learning process, DeepMind first had to assume that the majority of clips were in sync, watch them all and learn from them a basic relationship between mouth shapes and sounds, and then, using that knowledge, rewatch every clip and correct the audio wherever the lips were out of sync with the speech. Only then was it able to go through all 5,000 hours once more to do the deep analysis of learning exactly which words related to which mouth shapes and movements.
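The study describes this preparation only at a high level, but the two-pass structure is worth spelling out. Here is a minimal sketch of that idea in Python – every helper named here (train_rough, estimate_offset, shift_audio, train_full) is a hypothetical stand-in, not DeepMind's actual code:

```python
# Hypothetical sketch of the two-pass preparation, not DeepMind's real pipeline.
def prepare_and_train(clips, train_rough, estimate_offset, shift_audio, train_full):
    """All four callables are assumed helpers passed in by the caller."""
    # Pass 1a: learn a coarse relationship between mouth shapes and sounds,
    # assuming the majority of clips are already in sync.
    rough_model = train_rough(clips)

    # Pass 1b: rewatch every clip and correct the audio wherever the lips
    # are out of sync with the speech (offsets of up to a second).
    realigned = []
    for clip in clips:
        offset = estimate_offset(rough_model, clip)   # seconds of audio lag/lead
        realigned.append(shift_audio(clip, -offset))  # pull audio back into sync

    # Pass 2: go through all the footage once more for the deep analysis of
    # which words relate to which mouth shapes and movements.
    return train_full(realigned)
```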
#A deeply impressive result – not just lip-service

The result of this research and development was a system that can interpret human speech across a wide range of speakers in a variety of lighting and filming environments. DeepMind significantly outperformed a professional lip-reader and all other automatic systems. Given 200 randomly selected clips from the data set to decipher, the professional translated just 12.4% of words without errors, while the AI correctly translated 46.8% – and many of its mistakes were very small, such as missing an 's' off the end of a word. The system successfully deciphered phrases such as "We know there will be hundreds of journalists here as well" and "According to the latest figures from the Office of National Statistics".
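To make that 46.8% figure concrete: the metric is the share of spoken words the system gets exactly right. A toy version of such a measure (my illustration, not the paper's actual scoring code) can be written in a few lines of Python:

```python
import difflib

def word_accuracy(reference: str, hypothesis: str) -> float:
    """Percentage of reference words reproduced without errors, using a
    simple word-level alignment; a rough stand-in for the study's metric."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)

# A made-up example in the spirit of the article: the AI's typical mistake
# is tiny, e.g. dropping an 's' off the end of a word.
ref = "according to the latest figures from the office of national statistics"
hyp = "according to the latest figure from the office of national statistics"
print(f"{word_accuracy(ref, hyp):.1f}% of words correct")  # 90.9%
```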
![automated lip reading](https://mediacloud.theweek.co.uk/image/private/s--9e4DIhqG--/f_auto,t_primary-image-mobile@1/v1603029135/theweek/7/94/140915-lips_0.jpg)
#Going mobile with DeepMind's lip-reading smarts

For people with hearing loss the benefits of such tech are obvious. There has long been voice recognition, and it can aid with the real-time translation of speech into text – as we see in this video of someone using Google Glass to subtitle a conversation with a colleague. This approach, however, relies on someone being able to speak clearly into a microphone (in this case a linked smartphone) – but what about a noisy office or hallway? In such a situation, the ability to use the head-mounted camera (which is unaffected by noise or the proximity of the speaker) combined with lip-reading software would give a similar result without the restrictions.
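The microphone half of that set-up is easy to prototype today. As a rough sketch (assuming the third-party SpeechRecognition package and a working microphone via PyAudio – none of this is from the article), live captioning can look like this:

```python
# Minimal live-captioning sketch, akin to the Google Glass demo.
# Requires: pip install SpeechRecognition pyaudio
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
    print("Listening...")
    while True:
        audio = recognizer.listen(source, phrase_time_limit=5)
        try:
            # Send the captured phrase to Google's free web recogniser.
            print(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            print("[inaudible]")  # exactly the failure lip reading could cover
```

As the article notes, this breaks down in a noisy office or hallway – which is where a camera-based lip reader would take over.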
#Could DeepMind's lip reading skills also help blind people and those with sight loss?

As a blind person I'd also find such a set-up extremely useful as, for me, hearing people in a noisy environment is twice as hard as it is for someone who can see the speaker's lips. Sighted, hearing people lip-read too, however subconsciously. If you fit this description and don't believe me, next time you are in that situation, try closing your eyes and see if you can still hear the person next to you.