When it comes to one of humanity’s most important features, machines can grasp small patterns but not the unifying whole.
By Kyle Chayka, The New Yorker, Rabbit Holes
It’s a classic exercise in high-school art class: a student sits at her desk, charcoal pencil held in one hand, poised over a sheet of paper, while the other hand lies outstretched in front of her, palm up, fingers relaxed so that they curve inward. Then she uses one hand to draw the other. It’s a beginner’s assignment, but the task of depicting hands convincingly is one of the most notorious challenges in figurative art. I remember it being incredibly frustrating—getting the angles and proportion of each finger right, determining how the thumb connects to the palm, showing one finger overlapping another just so. Too often, I would end up with a bizarrely long pinky, or a thumb jutting out at an impossible angle like a broken bone. “That’s how students start learning how to draw: learning to look closely,” Kristi Soucie, my high-school art teacher, in Connecticut, told me when I called her up recently. “Everyone assumes they know what a hand looks like, but until you really do look at it you don’t understand.”
Artificial intelligence is facing a similar problem. Newly accessible tools such as Midjourney, Stable Diffusion, and dall-e are able to render a photorealistic landscape, copy a celebrity’s face, remix an image in any artist’s style, and seamlessly replace image backgrounds. Last September, an A.I.-generated image won first prize for digital art at the Colorado State Fair. But when confronted with a request to draw hands, the tools have spat out a range of nightmarish appendages: hands with a dozen fingers, hands with two thumbs, hands with more hands sprouting from them like some botanical mutant. The fingers have either too many joints or none at all. They look like diagrams in a medical textbook from an alien world. The machines’ ineptitude at this particular task has become a running joke about the shortcomings of A.I. As one person put it on Twitter, “Never ask a woman her age or an AI model why they’re hiding their hands.”
As others have reported, the hand problem has to do, in part, with the way the generators extrapolate information from the vast data sets of images they have been trained on. When a user types a text prompt into a generator, it draws on countless related images and replicates the patterns it has learned. But, like an archaeologist trying to translate Egyptian hieroglyphs from the Rosetta Stone, the machine can deduce only from its given material, and there are gaps in its knowledge, particularly when it comes to understanding complex organic shapes holistically. Flawed or incomplete data sets produce flawed outputs. As the linguist Noam Chomsky and his co-authors argued in a recent Times Op-Ed, machines and humans learn differently. “The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data,” they wrote. Instead, it “operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations.”
A generator can compute that hands have fingers, but it’s harder to train it to know that there should be only five, or that the digits have more or less set lengths in relation to one another. After all, hands look very different from different angles. Looking down at my own pair as I type this on my laptop keyboard, my fingers are foreshortened and half obscured by my palms; an observer wouldn’t be able to determine their exact X-ray structure from a static image. Peter Bentley, a professor of computer science at University College London, told me that A.I. tools “have learned that hands have elements such as fingers, nails, palms. But they have no understanding of what a hand really is.” The same problem sometimes occurs when A.I. tries to render smaller features such as ears, which appear as fleshy whirlpools without the intricate cartilage structure; or teeth, which sit incorrectly in the mouth; or pupils, which turn out as caprine blobs. A.I. can grasp visual patterns but not the underlying biological logic.
Part of the problem is that most images of people don’t focus on their hands. We’re not awash in closeups of fingers the way we are in pictures of faces. “If the data set was one hundred per cent hands, I think it would do much better, as the model would allocate more of its capacity to hands,” Alex Champandard, the co-founder of a company called Creative.ai, which develops tools for creative industries, told me. One solution may be to train A.I. programs on specialized monographic data sets. (At his company, Champandard is currently building training sets made up entirely of asphalt or brick images so that filmmakers or video-game developers can quickly add surface texture.) Another might be to add three-dimensional renderings to A.I. data sets, Bentley told me. There is currently no 3-D equivalent of a well-tagged Getty Images archive that an A.I. tool can be trained on, but last December the Microsoft-supported startup OpenAI published a paper teasing a tool that creates three-dimensional models, which could help give image generators more spatial awareness—a knowledge of the skeletal structure beneath 2-D skin.
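A rough sense of what such a specialized data set might involve: before any training happens, a general captioned-image corpus could be pared down to the slice that actually describes hands. The sketch below is a minimal illustration of that filtering step; the sample records, field names, and keyword list are hypothetical stand-ins, not anything Champandard or the image-generator makers actually use.

```python
# Minimal sketch: carve a hands-focused subset out of a generic captioned-image
# corpus before training. The records, field names ("caption", "url"), and
# keyword list are hypothetical; a real pipeline would stream millions of rows.

HAND_KEYWORDS = {
    "hand", "hands", "finger", "fingers", "thumb", "palm",
    "fist", "knuckle", "knuckles", "wave", "waving", "pointing",
}

def mentions_hands(caption: str) -> bool:
    """Return True if the caption names a hand or a recognizable hand part or gesture."""
    words = caption.lower().replace(",", " ").split()
    return any(word in HAND_KEYWORDS for word in words)

def build_hand_subset(records):
    """Keep only the (caption, url) pairs whose captions focus on hands."""
    return [r for r in records if mentions_hands(r["caption"])]

if __name__ == "__main__":
    # Toy stand-ins for rows of a captioned-image data set.
    corpus = [
        {"caption": "A clenched fist raised against a blue sky", "url": "img001.jpg"},
        {"caption": "Sunset over a mountain lake", "url": "img002.jpg"},
        {"caption": "Close-up of wrinkled hands folding laundry", "url": "img003.jpg"},
    ]
    subset = build_hand_subset(corpus)
    print(f"{len(subset)} of {len(corpus)} images kept for the hands-only set")
```

The point of the filter is simply to concentrate the model’s capacity, as Champandard described: the fewer lakes and skylines in the training set, the more of the model’s attention goes to knuckles and thumbs.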
When writing prompts for A.I. generators, users often aren’t very exact. They might enter the word “hand” without specifying what said hand should be doing or how it should be posed. Jim Nightingale, a former copywriter living in New Zealand who has become an A.I. consultant, told me that he advises people to “imagine how the training images might’ve been labelled, and reverse engineer your prompt from there.” Nightingale suggested naming “recognizable gestures,” such as a clenched fist, and traits, such as hairy knuckles, to help generators isolate more specific or detailed source imagery. Such tricks don’t always work, however. One client of Nightingale’s was an author who needed a digital book cover. The A.I. generated a convincing human figure but had trouble producing a specific hand gesture that the author had in mind, so Nightingale brought on a freelance human artist to paint the hands into the A.I. image manually.
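Nightingale’s advice amounts to writing the prompt the way a training caption might have been written. A small illustrative sketch of that “reverse engineering,” with entirely hypothetical function and parameter names, which only composes the text and does not call any image generator:

```python
# Illustrative sketch of the prompting advice: describe the hand the way a
# training image might have been captioned, naming a recognizable gesture and
# concrete traits rather than just writing "hand". All names here are
# hypothetical; this builds a prompt string only.

def hand_prompt(subject: str, gesture: str, traits: list[str], style: str) -> str:
    """Compose a caption-like prompt around a specific, nameable hand gesture."""
    trait_text = ", ".join(traits)
    return f"{subject}, {gesture}, {trait_text}, {style}"

if __name__ == "__main__":
    vague = "a man's hand"
    specific = hand_prompt(
        subject="close-up photograph of a man's right hand",
        gesture="clenched into a fist, thumb wrapped over the fingers",
        traits=["hairy knuckles", "short fingernails", "soft window light"],
        style="85mm lens, shallow depth of field",
    )
    print("Vague prompt:   ", vague)
    print("Specific prompt:", specific)
```

The particular wording matters less than the principle: each clause corresponds to something a human captioner might plausibly have labelled, which gives the generator more specific source imagery to draw on.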
At least thus far in generative A.I.’s life span, users tend to seek images that get as close to reality as possible. We judge A.I. based on how precisely it replicates what we’ve already seen. Looking at gnarled A.I. hands, we fall into the uncanny valley and experience a visceral sense of disgust. The hands are both real—textured, wrinkled, spotted, with more detail than most human artists could achieve—and totally at odds with the way hands are supposed to be. The machine’s failure is comforting, in a way. Hands are a symbol of humanity, “a direct correspondence between imagination and execution,” as Patti Smith recently wrote. As long as we are the only ones who understand them, perhaps our computers won’t wholly supplant us. The strange contortions of A.I. hands make me feel a sense of anticipatory nostalgia, for a future when the technology inevitably improves and we will look back on such flaws as a kitschy relic of the “early A.I.” era, the way grainy digital-camera photos are redolent of the two-thousands.
Over time, we’ll have fewer clues about which images were generated by A.I. and which were made by human hands. As Champandard told me, of the proliferation of odd fingers and incomplete claws, “I think this is a temporary problem.” Soucie, my art teacher, pinpointed a similar novice’s problem in the A.I. images and in her pupils’ drawings. “A student who’s in eighth or ninth grade, when they draw their hand, they always concentrate on the contour,” she said. A young artist tracking the wiggly line of wrinkled skin gets distracted from thinking about the hand’s over-all form, its three-dimensional quality. Like any struggling art student, A.I. tools will benefit from more training. “There’s a point when the structure and the contour come together for a student,” Soucie said. “That’s usually, like, the second year of college.” ♦