Neural networks have been surprising us with their abilities lately: could you have believed ten years ago that a computer would be able to "revive" portraits of Dostoevsky and Marilyn Monroe? Get ready to be surprised again, because researchers from MIT have created a neural network, Speech2Face, that can draw portraits of people simply by listening to their voices. The technology is still far from ideal, but its ability to determine a person's gender, ethnicity, and age is impressive.
To train the neural network, the researchers used the AVSpeech dataset, which contains a million short videos of thousands of people speaking. The video and audio tracks were separated so that the system could examine each type of material in as much detail as possible. In the first stage, the VGG-Face algorithm analyzed the video and produced portraits of the people appearing in it, shown front-facing and with a neutral expression. Another part of the algorithm analyzed the spectrogram of the voice and applied further changes to the portraits. The result was a rough portrait of each speaker.
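For readers curious about what a spectrogram of a voice actually is, here is a minimal sketch in Python with NumPy. It is not the researchers' code: the function name `magnitude_spectrogram`, the 400-sample frame, the 160-sample hop, and the synthetic 440 Hz tone standing in for a voice clip are all assumptions chosen for illustration.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Return a (num_frames, frame_len // 2 + 1) array of STFT magnitudes.

    frame_len and hop are illustrative defaults, not values from Speech2Face.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one with the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # One real FFT per frame; the magnitude discards phase information.
    return np.abs(np.fft.rfft(frames, axis=1))


# Synthetic stand-in for a voice clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = magnitude_spectrogram(tone)
# Frequency bin with the most energy, averaged over all frames.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 400
print(spec.shape, peak_hz)
```

Each row of `spec` describes how the energy of the sound is spread across frequencies during one short slice of time. A two-dimensional array of this kind is the sort of input a network can process in much the same way as an image.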
A neural network that creates portraits from a voice is already a reality
If you compare a person's face from the video with the version proposed by the algorithm, you can find many differences. However, the researchers say they never set out to produce the closest possible likeness of a person: the tone and intonation of a human voice are influenced by many factors, so an ideal result would not have been achievable anyway. What the neural network does handle is the part that matters to the researchers, namely determining gender, ethnicity, and age.
The authors noted that at the moment the algorithm is rather weak at determining age, but they expect to be able to improve its accuracy. It was also found that the algorithm is better at reconstructing faces of European and Asian appearance, but this is only because the training videos did not contain an equal number of people of different ethnicities.
Why is this neural network needed?
How might this technology be useful in the future? For example, a service might someday be created in which a user's virtual avatar is generated automatically from their voice. The new study also has great scientific value: after studying the data, scientists may be able to find correlations between a person's appearance and their voice. You can listen to the voices and see the portraits reconstructed from them on the project website.
What uses can you think of for such a neural network? Share your boldest ideas in the comments and join the conversation in our Telegram chat.