Interpreting the patterns of light that reach our eyes is a very difficult problem, requiring about a third of our brains’ information-processing capacity. When performed accurately, this process allows us to perceive many features of every object we see: its color, shape, identity, orientation, and position, as well as the spatial relationships between objects. This happens so quickly and flawlessly that we don’t even notice it happening.
In the past week, a set of trippy images revealed on Google’s research blog brought the complexity of the human visual system—as simulated by an artificial neural network called GoogLeNet, developed by Google software engineers—to widespread attention. Attempts to match the performance of human vision using computers constitute a major scientific field, one that uses some of the world’s most powerful computers. Right now, the leading efforts come from GoogLeNet, which mimics the visual brain’s processing to recognize the objects in natural images better than other methods, and with less computing power.
Ostensibly, Google wants to do this so users can search the internet’s images without a human manually tagging every cat, exposed breast, and selfie-with-brunch. But an interesting side effect of the project is that it shows computers being visually creative, using the stimuli or images they "see" to create new ones in ways that mimic the human imagination. The resulting images recall the hallmarks of artistic movements like Symbolism or Impressionism, the hidden images in Surrealism, or the "cells" of a Chuck Close painting. These are just some of the diverse strategies artists have used to interpret and represent the world around them, filtering what they see through their own neural networks and imaginations. The GoogLeNet images also recall the reported visual effects of psychedelic drugs—and there's a reason for that.
How do these visual distortions occur? In both GoogLeNet and the brain, there are many interacting layers of processing happening at once. The lower layers do really simple calculations: detecting motion, finding edges, analyzing local changes in color. By the later layers, the brain cells and their simulated cousins respond to the presence of specific object classes, like faces, indoor scenes, animals, and tools. This transformation is complicated because two images of cats may look nothing like each other in the early layers: a cat can have any orientation, position, color, or motion.
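The kind of simple calculation the lower layers perform can be illustrated with a toy convolution. The sketch below, in NumPy, slides a hand-written vertical-edge kernel over a tiny synthetic image; it is an illustration of the principle, not one of GoogLeNet's actual learned filters, and all values are made up for the example.

```python
import numpy as np

# A toy "early layer": convolve an image with a vertical-edge kernel.
# This is an illustrative sketch, not GoogLeNet's actual learned filters.
EDGE_KERNEL = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

def detect_edges(image):
    """Valid-mode 2D convolution; strong responses mark vertical edges."""
    h, w = image.shape
    kh, kw = EDGE_KERNEL.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * EDGE_KERNEL)
    return out

# An image that is dark on the left, bright on the right: one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
response = detect_edges(img)  # peaks where the edge falls, zero elsewhere
```

The response map is flat over the uniform regions and lights up only where brightness changes from left to right, which is all an edge detector needs to report to the layers above it.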
To make this problem easier, the brain/computer relies on tricks so that it does not need to process the image completely. Perhaps the most interesting of these relies on feedback from later areas in the visual system to earlier ones. When we recognize an object, we don’t need to process all the little details: we can assume our cat is furry, and the details of the fur pattern don’t change how we interact with the cat. So when we see a pattern that looks like a cat, later processing stages amplify the patterns they seem to be receiving and send these back to the earlier stages. Now the earlier stages don’t need to fill in the basic cat structure, which cuts down on the neural processing required.
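This feedback loop can be sketched in a few lines. In the toy model below, a "later stage" stores a learned prototype pattern; when the input resembles it, the prototype is blended back into the early representation, sharpening it. The prototype, input values, and gain are all hypothetical, chosen only to make the effect visible.

```python
import numpy as np

# Toy sketch of top-down feedback (all values are illustrative).
# A "later stage" stores a learned prototype; when the input resembles it,
# the prototype is fed back and blended into the early representation.
prototype = np.array([1.0, 0.0, 1.0, 0.0])    # the remembered "cat" pattern
noisy_input = np.array([0.9, 0.2, 0.8, 0.1])  # a degraded view of that pattern

def feedback_pass(x, proto, gain=0.5):
    """One round of predictive feedback: amplify the matched pattern."""
    match = np.dot(x, proto) / np.dot(proto, proto)  # how cat-like is x?
    return x + gain * match * proto                  # blend prediction back in

sharpened = feedback_pass(noisy_input, prototype)
```

After one pass, the components that belong to the stored pattern are boosted while the off-pattern components are untouched: the early stage no longer has to reconstruct the basic cat structure from scratch.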
Original "red tree" image run through an artificial neural network, asking it to recognize images not contained in it. Images via Google Inceptionism Gallery
Such “predictive” processing has led to the understanding that the brain becomes a mirror of the outside world, and our perception of the world is viewed through that mirror. When learning about the world, we see new patterns and classify them into distinct types. This strengthens the connections between brain cells representing the pattern, so that commonly seen patterns get written into the brain’s architecture of neural connections. The brain then analyzes its visual input through these neural connections: it imposes its architecture, and our previous experience, onto our view of the world.
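The idea that repeated exposure writes a pattern into the connections can be sketched with a Hebbian-style learning rule ("cells that fire together wire together"). Everything below is a hypothetical toy: four neurons, one pattern, an outer-product weight update.

```python
import numpy as np

# Illustrative Hebbian-style learning (all values hypothetical): each time
# a pattern is seen, connections between co-active units strengthen, so
# frequent patterns get "written into" the connection weights.
weights = np.zeros((4, 4))                # connections among 4 toy neurons
pattern = np.array([1.0, 0.0, 1.0, 0.0])  # a commonly seen pattern

def hebbian_update(w, x, rate=0.1):
    """'Fire together, wire together': outer-product weight update."""
    return w + rate * np.outer(x, x)

for _ in range(20):                       # see the pattern repeatedly
    weights = hebbian_update(weights, pattern)

# The learned weights now impose the stored pattern on new input:
# even a partial cue (only the first unit active) recalls the whole pattern.
cue = np.array([1.0, 0.0, 0.0, 0.0])
recall = weights @ cue
```

After training, a partial cue activates every unit of the stored pattern through the strengthened connections, which is exactly the sense in which experience gets imposed on new input.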
GoogLeNet, extrapolating from human-tagged images in a "training set"—this is a cat, this is a tree, this is a car—does the same thing. It can recognize and identify object types based on what it's "seen" before. Google’s most recent trick asks what happens when an image is run through a circuit representing a chosen object type that is not present in the image. The network then sees the image through the filter of that object type, imposing it anywhere it might be a valid interpretation of that part of the image. For example, when the system is asked to recognize animals, animal faces emerge from the random patterns of clouds or tree branches.
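At its core, this trick is gradient ascent on the image itself: pick the unit for the chosen object type, ask how each pixel should change to make that unit fire harder, and nudge the pixels in that direction, over and over. In the real network the pixel gradient comes from backpropagation; the sketch below, with hypothetical weights and a toy linear "animal" unit (whose gradient is simply its weight vector), shows only the shape of the loop.

```python
import numpy as np

# Minimal sketch of the "dreaming" loop, assuming a toy linear detector.
# For a linear unit, the gradient of its activation with respect to the
# image is just the unit's weight vector; the real network gets this
# gradient via backpropagation instead.
animal_weights = np.array([0.5, -0.2, 0.8, 0.1])  # hypothetical "animal" unit
image = np.array([0.1, 0.4, 0.2, 0.9])            # starting "image" (pixels)

def dream_step(img, weights, step=0.1):
    """One step of gradient ascent on the chosen unit's activation."""
    gradient = weights            # d(activation)/d(img) for a linear unit
    return img + step * gradient  # nudge pixels toward "more animal"

before = np.dot(animal_weights, image)
for _ in range(10):
    image = dream_step(image, animal_weights)
after = np.dot(animal_weights, image)  # the unit now fires harder
```

Each pass makes the image look a little more like whatever the chosen unit prefers, which is why faces and animals seem to crystallize out of clouds and branches.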
The original waterfall becomes an enchanted woodland glade. Images via Google Inceptionism Gallery
Many people who have taken hallucinogenic drugs find that the resulting images look just like things they have seen while tripping, as comments on Slashdot and the Guardian show. This is a testament to the accuracy of GoogLeNet in mimicking the human visual brain. Many drugs interfere with our perceptual processing in simple ways, like making the room appear to spin. However, the class of hallucinogens that contains magic mushrooms (psilocybin), LSD, mescaline, and DMT alters perception in this specific way. These drugs impose patterns from things we have seen before onto our visual input, making us see faces in the clouds or intricate Oriental rug patterns on fields of grass and canopies of trees. These patterns are constantly shifting as the brain changes which patterns of feedback are activated. The resulting hallucinations vary from simple distortions of edges and colors at low doses (see the Seurat image, above), to dream-like scenes (at top) with no relationship to the incoming visual image at high doses. GoogLeNet’s outputs can mimic either, depending on which layers of the network are activated.
(left) Cats by Louis Wain (right) Vincent van Gogh, The Starry Night
This class of hallucinogen activates the higher levels of our visual processing by activating a type of serotonin receptor. Many of the drugs used to treat schizophrenia act, in part, by blocking the same receptor. It seems that some of the hallucinations seen in schizophrenia may arise through mechanisms similar to those of the hallucinogenic drugs. This may help us understand why some of GoogLeNet’s output reminds us of the distortions of reality seen in Van Gogh’s brushwork in The Starry Night and Cypresses, and Louis Wain’s later drawings of cats. Both artists spent time in mental institutions, and diagnoses of schizophrenia—both during their lives and posthumously—have been put forward as explanations for the swirling, kaleidoscopic passages in their artwork.
Google’s engineers and researchers have developed an excellent tool to classify image content on the internet. But GoogLeNet also offers unexpected insight into the workings of the system it aims to mimic—the human brain—allowing us to simulate experiments we simply could not perform on humans or animals. While the psychedelic images released last week might appear novel or gimmicky to some, or like art to others, exercises like these are actually bridging the gap between human and computer visual systems. And it seems that when they mimic the brain closely enough, artificial intelligences not only see like we see, but also trip like we trip.
—Ben M. Harvey
Ben M. Harvey is a researcher in the Department of Psychology and Educational Sciences at the University of Coimbra.