An insight into learning representations – ICLR 2017
I was pleased to discover that the topics discussed at this year’s International Conference on Learning Representations (ICLR 2017) tie in closely with some of the key projects the 10x team are working on day-to-day at Ocado Technology.
10x focuses on creating transformational improvements that have the potential to revolutionise our business; this article offers a good overview of the ten times vs ten percent principle. Working in 10x requires a special set of skills, including the ability to identify challenges and to think outside the box in order to establish innovative solutions. It also means that the focus of 10x is at the cutting edge of technology, looking far into the future and turning ambitious dreams into realities. Machine learning is unsurprisingly one of these cutting-edge research areas, which makes ICLR 2017 a very enticing event (that and the fact that it involves a three-day trip to the south of France).
A few of the team members attended this year’s event. Read on to hear more about the cutting edge of learning representations and get involved with experiments you can try yourself at home.
Images as textures – Eero Simoncelli
There are two very different approaches to machine learning: you can either train a machine to process information like a brain, or train it to behave more like, well… a machine. The first keynote speaker – Eero Simoncelli, professor of neural science, mathematics and psychology at New York University – is an advocate of the first approach. However, we have still only scratched the surface when it comes to understanding the in-depth processes of the brain, so how can we model machines closely on our own minds?
Eero focussed specifically on vision processes for this keynote speech. When it comes to vision, there’s a lot we don’t yet know. Much of the process, from the first stage where our eyes see the world around us right up until our brains have fully processed the information, remains a puzzle. So far we only really understand the first few steps: our retinas transfer information via the million or so ganglion cells in the optic nerve to the primary visual cortex (V1), located at the back of the brain. Here the neurones filter the information by orientation, identifying vertical, horizontal and diagonal lines to condense the data down into more manageable pieces. Unfortunately we’re not nearly as clear on the subsequent processes going on in the higher-order visual cortices (secondary visual cortex V2, third visual cortex V3, etc.), or indeed the inferior temporal cortex (ITC).
Because these later processes are much foggier, we have to give up, to a certain extent, on biologically plausible representations when we return to training our machine.
Eero described how images, when perceived by human brains, appear as if they are covered in textures (the definition of texture here is a little imprecise, but most agree that an image of visual texture should be “spatially homogeneous, and typically contain repeated structures, often with some random variation, such as varying positions, orientations or colours”). If this is the case, we should be able to take the statistics of an original image – for example, how often a white pixel appears, how often a black pixel appears, and so on – and then synthesise a copy of the image from those statistics. The copy should then be perceived by the human brain as the same image. If successful, this could have applications in image compression, as you could simply store the statistics rather than the image itself.
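As a minimal sketch of the statistics-then-synthesise idea (the image and names here are made up for illustration; real texture models, such as Simoncelli’s, match far richer statistics than this), consider matching only the pixel-value histogram:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "original" image: a 64x64 vertical-stripe texture.
original = np.tile(np.array([0, 255], dtype=np.uint8), (64, 32))

# Step 1: measure a (very crude) statistic -- the pixel-value histogram.
values, counts = np.unique(original, return_counts=True)
probs = counts / counts.sum()

# Step 2: "synthesise" a new image by sampling pixels from that histogram.
synth = rng.choice(values, size=original.shape, p=probs)

# The first-order statistics match...
assert abs(float(synth.mean()) - float(original.mean())) < 15
# ...but the spatial structure (the stripes) is lost, which is why serious
# texture models also match joint statistics of multi-scale, oriented
# filter responses rather than raw pixel frequencies alone.
```

The point of the sketch is that first-order pixel statistics alone are nowhere near enough – the interesting part of texture synthesis is choosing which richer statistics to match.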
Shall we put it to the test?
Look briefly at the two images below, one after the other. In both cases, focus on the red dot in the centre of the image.
If you fought the urge to cheat and kept your focus on the red dot, you probably assumed the two images were the same. Now look again in more detail…
The synthesised image on the right is more than a little creepy, with a lot of distortion, but your brain tricked you into assuming those morphed blurs were people.
Similar trickery also works with text. Now you know what’s coming it will be harder not to cheat, I know, but try to focus solely on the red dot again.
You probably read the word ‘myself’ but did you notice the surrounding words? Take another look.
They’re all a combination of nonsense and random symbols which your brain fools you into thinking are text.
Both these experiments show that we analyse far less of the information we receive than we think we do, and can easily be fooled by an image with a similar texture. However, the synthesised image in the first example also illustrates that the statistics of an original image do not, on their own, yield a usable copy.
It is nonetheless interesting to try and scrape the surface of what we don’t yet know about how we perceive the world.
Words in vector space – Babylon Health
Alongside the talks given at ICLR 2017, I also asked the team to highlight any posters they thought stood out as particularly interesting. The one that caught our attention was from Babylon Health in London, which covered some groundbreaking material in natural language AI tools.
The company is trying to answer one simple question: what if you could map out words in terms of their relations to each other? The AI team at Babylon Health have created a neural network based on natural language. With their new digital diagnosis app – the doctor in your pocket, if you like – they have explored the ability to embed words in a multidimensional vector space, where each word sits next to other words with obvious connections. For example, if you find the word ‘cooking’ at a particular coordinate in the vector space, you would expect to find the words ‘recipe’ and ‘saucepan’ located nearby. In this way you can potentially map out an entire language. You can also train your machine to establish words using relations in the form of simple equations, such as:
‘king’ – ‘male’ + ‘female’ = ‘queen’
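This analogy arithmetic can be sketched with toy, hand-made 3-dimensional vectors (purely illustrative – trained embeddings such as word2vec use hundreds of dimensions learned from large corpora, but the nearest-neighbour search works the same way):

```python
import numpy as np

# Made-up word vectors, chosen so the analogy works -- not trained values.
vecs = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "male":   np.array([0.1, 0.9, 0.0]),
    "female": np.array([0.1, 0.0, 0.9]),
    "recipe": np.array([0.8, 0.5, 0.5]),
}

def cosine(a, b):
    """Cosine similarity, the usual distance measure for word vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'king' - 'male' + 'female' should land closest to 'queen'.
target = vecs["king"] - vecs["male"] + vecs["female"]
best = max((w for w in vecs if w not in {"king", "male", "female"}),
           key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```

Note that the query words themselves are excluded from the search, as is standard practice when evaluating analogies.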
These elements have the potential to create a very comprehensive natural language AI tool. The multidimensional vector space also has neat applications in language analysis: by rotating and squashing the vector space of one language you can map it onto another language’s vector space, which lets you establish how similar two languages are. Having done this, Babylon noticed a very close relation between Bosnian and Serbian, and a much weaker one between Catalan and Russian.
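The rotate-and-map idea can be sketched with orthogonal Procrustes alignment, a standard technique for fitting one point cloud onto another with a rotation. The data below is simulated (not Babylon’s – real cross-lingual mappings are fitted on dictionaries of translation pairs), with “language B” built as a rotated, slightly noisy copy of “language A”:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 "word" embeddings for language A (3-D for brevity).
X = rng.normal(size=(50, 3))

# Language B: the same 50 "words", rotated and lightly perturbed,
# standing in for translations of the language-A words.
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
Y = X @ R_true.T + rng.normal(scale=0.01, size=X.shape)

# Orthogonal Procrustes: the rotation W minimising ||X @ W.T - Y|| is
# U @ Vt, where U, S, Vt is the SVD of Y.T @ X.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt

# A small residual means the two "languages" share geometry,
# like Bosnian and Serbian; dissimilar languages leave a large one.
residual = np.linalg.norm(X @ W.T - Y) / np.linalg.norm(Y)
print(round(residual, 3))
```

The residual after the best rotation is what gives a single number for "how alike" two vector spaces are.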
Both these topics give a fascinating insight into the world of machine learning. Events like ICLR let us think beyond our exact project specifications and look at cutting-edge technologies with a broader scope – exactly the kind of out-of-the-box thinking our 10x team thrives on.
We can’t wait to see the next generation of technology they create.
Holly Godwin, Technology Communications Assistant