Our eyes make almost instant sense of the world around us in ways that even the most sophisticated machines cannot match. So what cues do we pick up that let us see the wood for the trees?
One of the unforeseen boons of research on artificial intelligence is that it has revealed much about our own intelligence. Some aspects of human perception and thought can be mimicked easily, indeed vastly surpassed, by machines, while others are extremely hard to reproduce. Take visual processing. We can give a satellite an artificial eye that can photograph your backyard from space, but making machines that can interpret what they “see” is still very challenging.
That realisation should make us appreciate our own virtuosity in making almost instant sense of the world around us, even of scenes as complex as woods or forests crammed with trees that overlap, are occluded or moving, or are viewed at odd angles or in poor light. This ability to deconstruct immense visual complexity is usually regarded as an exquisite refinement of the neural circuitry of the human brain: in other words, it’s all in the head. Rarely does anyone ask what rules govern the visual stimulus itself; that is, what patterns allow us to see the wood and the trees.
But a new study stands image analysis on its head by asking not how we solve the problem of interpreting the world, but what sort of problem it is in the first place. What hidden patterns exist in the visual stimulus?
Answering that question draws on a remarkable confluence of scientific concepts. There is a growing awareness that how data is encoded, inter-converted and transported, whether in computers, genes or the quantum states of atoms, is closely linked to thermodynamics, the field originally devised to understand how heat flows in engines and other machinery. For example, erasing information, say resetting a bit in a computer’s binary memory to 0, must generate a minimum amount of heat.
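Physicists know this minimum heat cost as Landauer’s limit: erasing one bit at temperature T releases at least k_B T ln 2 of heat, where k_B is Boltzmann’s constant. As a rough illustration (the function name and the 300 K “room temperature” are our own choices, not taken from the research described here), the bound can be evaluated like this:

```python
import math

# Boltzmann constant in joules per kelvin (exact value in the 2019 SI)
K_B = 1.380649e-23


def landauer_limit(temperature_k: float) -> float:
    """Minimum heat (in joules) released when one bit is erased at temperature_k."""
    return K_B * temperature_k * math.log(2)


if __name__ == "__main__":
    room_temp = 300.0  # kelvin; an illustrative choice of room temperature
    print(f"{landauer_limit(room_temp):.3e} J per erased bit")
```

The result, roughly 3 × 10⁻²¹ joules per bit, is far below what real computer chips actually dissipate per operation.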
A team at Princeton University led by William Bialek now integrates these ideas with concepts from image processing and neuroscience. The consequences are striking. Bialek and his colleagues Greg Stephens, Thierry Mora and Gasper Tkacik find that in a pixellated monochrome image of the woods in Hacklebarney State Park, New Jersey, some groups of black and white pixels are more common than other, seemingly similar ones. And they argue that such images can be assigned a kind of “temperature”, which reflects the way the black and white pixels are distributed across the visual field.
Their use of temperature to characterise the distribution of light and dark patches within an image is more than a vague metaphor: the distribution is analogous to what is found in physical systems at a so-called critical temperature, where two different states of the system merge into one. A fluid such as water has a critical temperature at which its liquid and gas states become indistinguishable. A magnet such as iron has a critical temperature at which it loses its magnetism: the magnetic poles of its constituent atoms are no longer aligned but are scrambled by the heat. For a selection of woodland images, the researchers show that the distributions of light and dark patches have the same kinds of statistical behaviour as a theoretical model of a two-dimensional magnet near its critical temperature.
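The textbook example of a two-dimensional magnet is the Ising model: a grid of atomic “spins” that each point up or down (much like white or black pixels), with neighbours preferring to align. In units where the coupling strength and Boltzmann’s constant are both 1, its exact critical temperature is T_c = 2/ln(1 + √2) ≈ 2.269. The sketch below uses the standard Metropolis Monte Carlo method on a small grid with wrap-around edges; the grid size, number of sweeps and temperatures are arbitrary illustrative choices, and this is not the researchers’ own analysis code.

```python
import math
import random

# Exact critical temperature of the 2D square-lattice Ising model (coupling = k_B = 1)
T_C = 2.0 / math.log(1.0 + math.sqrt(2.0))


def metropolis_ising(size: int, temperature: float, sweeps: int, seed: int = 0) -> list[list[int]]:
    """Run Metropolis updates on a size x size grid of +1/-1 spins with periodic edges."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for _ in range(sweeps * size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        # Sum of the four nearest neighbours, wrapping around the edges
        neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                      + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        delta_e = 2 * spins[i][j] * neighbours  # energy change if spin (i, j) flips
        # Always accept flips that lower the energy; otherwise accept with Boltzmann probability
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    return spins


def magnetisation(spins: list[list[int]]) -> float:
    """Average spin: close to +1 or -1 when aligned, close to 0 when scrambled."""
    n = len(spins) * len(spins[0])
    return sum(map(sum, spins)) / n


if __name__ == "__main__":
    for t in (1.5, T_C, 4.0):
        grid = metropolis_ising(size=32, temperature=t, sweeps=200)
        print(f"T = {t:.3f}  |m| = {abs(magnetisation(grid)):.2f}")
```

Well below T_c the grid usually settles into a mostly aligned state (a small grid can occasionally get stuck in striped domains), while well above it the spins look like random noise; near T_c, clusters of many different sizes coexist.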
So, what are the fundamental patterns from which these images are composed? When the researchers looked for the most common types of pixel patch (for example, 4×4 groups of pixels), they found something surprising. Fully black or fully white patches are very common, but as the patches are split into increasingly complex arrangements of black and white pixels, not all arrangements are equally likely: certain forms turn up significantly more often than others. In other words, natural images seem to be constituted from special “building blocks”.
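A census of patches like this can be sketched in a few lines. This is a minimal illustration, assuming the image is a grid of grey levels that is reduced to black and white by thresholding at the median brightness (a common choice, but our assumption), and that 4×4 patches are taken at every position; the function names are hypothetical.

```python
from collections import Counter
from statistics import median


def binarise(image: list[list[float]]) -> list[list[int]]:
    """Map each grey level to 1 (white) or 0 (black), using the median as threshold."""
    threshold = median(v for row in image for v in row)
    return [[1 if v > threshold else 0 for v in row] for row in image]


def patch_counts(binary: list[list[int]], k: int = 4) -> Counter:
    """Count every k x k patch (stored as a tuple of rows) at every position."""
    counts: Counter = Counter()
    height, width = len(binary), len(binary[0])
    for i in range(height - k + 1):
        for j in range(width - k + 1):
            patch = tuple(tuple(binary[i + di][j:j + k]) for di in range(k))
            counts[patch] += 1
    return counts


if __name__ == "__main__":
    # Toy 6x6 "image": a dark left half and a bright right half
    toy = [[0.1, 0.2, 0.1, 0.9, 0.8, 0.9] for _ in range(6)]
    counts = patch_counts(binarise(toy), k=4)
    for patch, n in counts.most_common(3):
        print(n, ["".join(map(str, row)) for row in patch])
```

A 4×4 patch has 2^16 = 65,536 possible black-and-white arrangements, so a real photograph has to be large before such counts say anything reliable about which arrangements are favoured.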
If natural images really are assembled from a preferred set of patches, Bialek and colleagues think the brain might exploit this fact to aid visual perception by filtering out “noise” that occurs naturally on the retina. If the brain were to attune groups of neurons to these privileged “patches”, then it would be easier to distinguish two genuinely different images (made up of the “special” patches) from two versions of the same image corrupted by random noise (which would include “non-special” patches). In other words, natural images may offer a ready-made error-correction scheme that helps us interpret what we see.
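The error-correction idea can be illustrated with a toy decoder. Assuming a hypothetical list of “special” patches (in practice these would come from counting patches in natural images), a noisy patch is replaced by the nearest special patch, where distance is the number of pixels that differ (the Hamming distance). This generic nearest-codeword scheme is chosen purely for illustration; it is not a model of what neurons actually do.

```python
# A patch is a 4x4 grid of 0/1 pixels, flattened row by row into a 16-character string


def hamming(a: str, b: str) -> int:
    """Number of pixel positions at which two equal-length patches differ."""
    return sum(x != y for x, y in zip(a, b))


def snap_to_special(noisy: str, special: list[str]) -> str:
    """Return the special patch closest to the noisy one (ties go to the earlier entry)."""
    return min(special, key=lambda s: hamming(noisy, s))


# Hypothetical "special" patches: all black, all white, and a vertical edge
SPECIAL = [
    "0" * 16,
    "1" * 16,
    "0011" * 4,
]

if __name__ == "__main__":
    clean = "0011" * 4
    noisy = "0111" + "0011" * 3  # the edge patch with one pixel flipped by noise
    print(snap_to_special(noisy, SPECIAL) == clean)
```

Because the special patches differ from one another in many pixels, a single flipped pixel still leaves the noisy patch far closer to its original than to any other special patch, which is what lets the decoder undo the noise.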