Speech recognition technology has been around for a long time, and so has image recognition. So why not combine the two and improve both? Apparently that is what researchers at the Massachusetts Institute of Technology (MIT) thought when they developed an algorithm that can identify individual objects in an image based only on a spoken description.

Engadget reports on this interesting technology. The structure of the neural network is fairly simple: it consists of two interacting parts. The first works directly with the image, dividing it into a grid of cells, while the second processes the audio signal. The incoming speech is split into short segments of 1-2 seconds. The program then checks how well each cell of the divided image corresponds to each 1-2 second audio segment. The developers themselves compare this method to teaching a child, when you show the child objects and say their names.
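To make the cell-by-segment comparison concrete, here is a minimal sketch in Python. It is not the MIT implementation: every function name is hypothetical, and the "features" are toy averages standing in for the representations that the two parts of the real network would learn. It only illustrates the shape of the procedure: cut the image into a grid, cut the audio into short segments, and score every cell against every segment.

```python
from typing import List, Sequence

Vector = List[float]


def split_into_cells(image: Sequence[Sequence[float]], rows: int, cols: int):
    """Cut a 2D image (a list of pixel rows) into rows*cols cells, row by row."""
    cell_h, cell_w = len(image) // rows, len(image[0]) // cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            band = image[r * cell_h:(r + 1) * cell_h]
            cells.append([row[c * cell_w:(c + 1) * cell_w] for row in band])
    return cells


def split_into_segments(samples: Sequence[float], sample_rate: int, seconds: float):
    """Cut an audio signal into consecutive segments of the given duration."""
    step = int(sample_rate * seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def toy_feature(values: Sequence[float]) -> Vector:
    """Assumption: a stand-in for a learned embedding (here just the mean)."""
    return [sum(values) / len(values)]


def score(a: Vector, b: Vector) -> float:
    """Assumption: higher means a better match (negative squared distance)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def match_matrix(cells, segments) -> List[List[float]]:
    """Score every image cell against every audio segment."""
    cell_vecs = [toy_feature([p for row in cell for p in row]) for cell in cells]
    seg_vecs = [toy_feature(seg) for seg in segments]
    return [[score(c, s) for s in seg_vecs] for c in cell_vecs]


def best_cell_per_segment(scores: List[List[float]]) -> List[int]:
    """For each audio segment, return the index of the best-matching cell."""
    n_segments = len(scores[0])
    return [max(range(len(scores)), key=lambda i: scores[i][j])
            for j in range(n_segments)]


# Toy data: a 4x4 "image" and an 8-sample "recording" at 2 samples per second.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [1, 1, 5, 5],
         [1, 1, 5, 5]]
audio = [9, 9, 9, 9, 1, 1, 1, 1]

cells = split_into_cells(image, rows=2, cols=2)
segments = split_into_segments(audio, sample_rate=2, seconds=2.0)
scores = match_matrix(cells, segments)
print(best_cell_per_segment(scores))
```

In the real system, the toy averages would be replaced by the outputs of the image and audio parts of the network, and the scoring rule would be whatever the researchers trained. The grid, the short segments, and the all-pairs comparison are the parts taken from the description above.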

The new technology has a fairly wide range of applications. The most obvious is integration into search systems, but the developers prefer to see the system as a translation tool that can recognize the language and find the right words with up to 100% accuracy.

“Instead of having the system perform a ‘direct’ translation, you can make it context-sensitive and teach it to translate the descriptions and names of objects into different languages depending on the situation.”

