Both drawing and language are fundamental and unique to humans as a species. Just as language is a representational system that uses systematic sounds (or manual/bodily signs) to express concepts, drawing is a means of graphically expressing concepts. Yet, unlike language, we consider it normal for people not to learn to draw, and consider those who do to be exceptional. Why do we consider drawing to be so different from language? This paper argues that the structure and development of drawing are indeed analogous to those of language. Because drawings express concepts in the visual-graphic modality using patterned schemas stored in a graphic lexicon that combine using ‘syntactic’ rules, development thus requires acquiring a vocabulary of these schemas from the environment. Without sufficient practice and exposure to an external system, a basic system persists despite arguably impoverished developmental conditions. Such a drawing system is parallel to the resilient systems of language that appear when children are not exposed to a linguistic system within a critical developmental period. Overall, this approach draws equivalence between drawing and the cognitive attributes of other domains of human expression.
Along with speaking and gesturing, drawing is a representational system for expressing concepts that is fundamental – and unique – to humans as a species. However, despite this primacy, we conceive of the structure and development of drawing as very different from language. Language development has been viewed as an innate capacity for acquiring an external system of schematic patterns from a community. Development progresses within a critical period that lasts until puberty, after which the ability to learn language rapidly declines [e.g., Lenneberg, 1967; Newport, 1990]. We consider it the norm for people to acquire language and the exception for them not to.
In contrast, drawing development has been viewed as the growth of an individualistic skill at which some are more proficient than others. From birth until puberty, children consistently improve in drawing ability [Kindler & Darras, 1997; Lowenfeld, 1947; Willats, 2005], but they face a ‘period of oppression’ between ages 11 and 14, when their progression abruptly slows and stagnates [e.g., Davis, 1997c; Davis & Gardner, 1992; Gardner, 1980, 1990; Read, 1958; Rosenblatt & Winner, 1988]. This drop-off has been attributed to a lack of interest or motivation [Arnheim, 1997; Read, 1958]. Those who overcome this stagnation are believed either to have some innate talent for artistry or to have worked diligently to overcome it [Davis, 1997c]. If this stage is not surpassed, an individual’s drawing ability will remain the same throughout later life [Kindler & Darras, 1997], with progress becoming significantly more difficult. Unlike with language, we consider it normal for people not to learn to draw and label those who do as ‘gifted’ or ‘talented’.
Despite the conception of drawing as universally accessible but requiring additional stimulation and effort to progress past puberty, this course of development does not seem to appear in all cultures. Children in Japan do not stagnate in their drawing ability [Toku, 1998, 2001a, 2001b; Wilson, 1997, 1999]. Their continual progression in learning to draw has been attributed to the high rates at which Japanese children copy drawings from comics [Toku, 2001b; Wilson, 1988, 1997, 1999]. In Japan, comics are a ubiquitous part of society, are read by people of all ages, and comprise nearly one third of all printed material [Gravett, 2004; Schodt, 1983, 1996]. The high degree of exposure to and imitation of Japanese comics has been argued to stymie this drop-off in development.
Why would copying comics enable Japanese children to progress in drawing abilities while drawing practices in other parts of the globe lead to stagnation? An answer to this question lies in situating drawing alongside developmental trajectories of other representational capacities. In particular, I will argue that the drawing system is structured like the linguistic system and, thereby, has an analogous development. The comparison between drawing and language is not new [e.g., Arnheim, 1974; Willats, 2005; Wilson & Wilson, 1977], and this paper attempts to articulate further the idea that drawing is similar to language in function, form, and development.
Structure and Development of Drawing
Drawing involves many interacting components, including the perceptual system, fine and gross motor skills, perceptual feedback, interaction with the drawings of a culture, social interactions and motivations, emotional valence, and others. The development of drawing must then engage with these multiple factors. The most widely held view has looked at drawing as a method to articulate perception. That is, ‘life drawing’ involves looking at objects in the world and then presenting them graphically (also in the literature called visual realism or view-based depiction). Alternatively, ‘drawing from memory’ involves the conceptualization of an object in the mind in lieu of having it perceptually available (also called intellectual realism or object-based depiction).
Psychological theories have often emphasized this perceptual view of drawing [Farah, 1984; Guérin, Ska, & Belleville, 1999; Kosslyn & Koenig, 1992; van Sommers, 1989; Willats, 1997, 2005]. Many cognitive models of drawing integrate Marr’s [1982] hierarchic model of visual perception with action production and planning. Marr’s model theorizes several levels of the understanding of visual perception, progressing from a flat perceivable image (the ‘2½D Sketch’) through an abstract 3D Model of spatial concepts about objects. These cognitive models of drawing link mental imagery with articulation: drawing from life articulates the 2½D Sketch of the perceived visual surface, and drawing from memory directly articulates the 3D Model.
The perceptual viewpoint of drawing (either from the eye or the mind’s eye) is an intuitive one because it matches the phenomenological experience of drawing. Nevertheless, it cannot capture several important traits of drawing. For example, why do people draw differently? It is unlikely that people actually perceive (or conceive of) the world in a different way, so are differences in drawing ability solely related to motor skill? The reverse is also a query: Why do individuals in one culture draw similarly, but then draw differently from individuals in other cultures?
In contrast to the perceptual viewpoint (i.e., ‘drawing is for representing what I see [by eye or in my mind]’), I will explore an alternative function of drawing more in line with the function of language. Language is a representational system that uses systematic sounds (or manual/bodily signs) to express concepts [Jackendoff, 2002]. These schemas are stored in memory and can be combined to create infinitely possible novel expressions. By and large, these expressions use symbolic reference, although language incorporates a fair amount of indexical and iconic reference as well [Clark, 1997; Liddell, 2003].
Similarly, drawing is a means of graphically expressing concepts, even though it largely does this through iconic reference. If drawing is limited to representing what we see, the meaning of drawings (or the access to the meanings of drawings) should be universally similar. In fact, like language, drawings are highly conventionalized to individuals and groups to the extent that sometimes even iconic representations cannot be discerned [Wilkins, 1997]. For example, people can readily identify the geographic and temporal origins of drawing ‘styles.’ It should be easy to recognize at a glance whether a drawing comes from the ancient Mayans, Greeks, or Chinese, or modern Japanese comics versus American comics. The very notion of cultural drawing styles belies a purely perception-based model of drawing because styles are built from conventional patterns shared by people of a culture.
There are many examples of more specific graphic schemas. Drawing teacher Mark Kistler  describes an activity that highlights people’s conventional schematic knowledge for drawing. When he asks people to very quickly draw a house, an airplane, and a person, they consistently draw the same conventionalized representations (fig. 1). While these figures are iconic images of houses, airplanes, and people, they reflect conventionalized patterns of drawing (whose house actually looks like that?). The ‘stick figure’ is a conventionalized representation of a person. Indeed, cultures use diverse yet systematic ways of drawing people [Cox, 1998; Cox, Koyasu, Hiranuma, & Perara, 2001; Paget, 1932; Wilson, 1988], and some even have difficulty discerning what the American style stick figure represents [Wilkins, 1997].
Additionally, various artists use schematic representations in their drawings that combine to make larger novel forms, thus masking their systematic nature. Take, for example, the representations of hands in figure 2 by three comic artists. The first row depicts consistent patterns used for drawing open hands and fists by Jack ‘King’ Kirby, who is regarded as one of the most influential artists in American comic history and is attributed with defining the aesthetic of superhero comics [Duncan & Smith, 2009]. Below Kirby’s drawings are examples of hands drawn by two other popular contemporary artists. Not only were all three artists internally consistent with their own patterns, but the contemporary artists also clearly used the same schematic representations that Kirby used, in a sense validating his influence (at least in the small scale). This imitation alone does not suggest that drawings are as fully conventionalized as languages, but it does imply a tendency toward systematicity within and between artists.
Evidence for schematic graphic representations is not limited to a few artists within a culture. In a seminal study, Wilson and Wilson  examined the drawings of American high school students and found that nearly all of their drawings were in some way imitative of other drawings, particularly comics and cartoons. They further examined the drawings of children throughout the world and found both consistency within and variability between the drawings of various cultures. For example, in Japan, children’s drawings are highly consistent because they are imitative of Japanese comics [Cox et al., 2001; Wilson, 1988, 1997, 1999]. Meanwhile, rectangular bodies are drawn by children in Islamic countries [Wilson, 1988; Wilson & Wilson, 1984] while bottle-shaped bodies were drawn by English children in the late 1800s [Sully, 1896]. In another corpus, 100% of drawings by Mexican children in California from the 1920s used a method of drawing legs with crossing lines that attached to the torso in a systematic pattern [Wilson, 1988]. Indeed, Wilson described an example of these drawings in which an initial attempt to draw legs another way was only to be erased and completed in the ‘correct’ conventional method.
These findings overall led Wilson and colleagues to conclude that drawing involves the transmission of culture-specific schemas, not the representation of perception [Wilson, 1988]. They argued that people store hundreds to thousands of these mental models in long-term memory and then combine these parts to create what on the whole appears to be a novel representation [Wilson & Wilson, 1977]. As a result, drawing becomes similar to language in that it uses a ‘lexicon’ of schematic models stored in memory that combine generatively to create innumerable novel images. This view goes against the perceptual viewpoint of drawing because schematic information often looks very little like the way things actually look in perception or in visual memory.
Cognitive Model of Drawing
Given these data, a cognitive model of drawing must not only account for the articulation of perception but also for schematic graphic representations. Schematic patterns have been included in some models of drawing [Guérin et al., 1999; van Sommers, 1989], but they are considered as production schemes – routinized procedural memories of familiar methods of drawing, not a ‘vocabulary’ of stored graphic forms. However, the aforementioned instances of patterns provide evidence that this knowledge is stored in long-term memory beyond just procedures. We therefore explore an alternative model for the cognition of drawing that incorporates a lexicon of schemas, a sketch of which appears in figure 3. This model expands on the work of previous researchers who also noted the language-like qualities of drawing, in particular Willats’s [1997, 2005] insightful work on the cognition of drawing, and Wilson’s observations about graphic schemas [e.g., Wilson, 1988; Wilson & Wilson, 1977].
This model contains three major parts: (a) the perceptual system, (b) the drawing system, and (c) meaning. The perceptual system involves Marr’s [1982] components of vision: the 2½D Sketch is the form of visual perception, and the 3D Model is the full spatial understanding (thereby part of the system of meaning). Further details of Marr’s model that are unnecessary for this discussion are subsumed by the dotted lines [see Jackendoff, 1987; Marr, 1982].
The ‘Image’ Image. The component of most interest is the drawing system. First, the 2½D Sketch is linked to the ‘Image’ Image, which is the mental image in working memory of what a person may intend to draw. Often, drawing requires planning and problem solving related to holding intended objects or scenes in mind while only articulating a part at a time, in addition to operationalizing how to draw spatial relations such as occluded objects [e.g., Morra, 2002; Morra, Angi, & Tomat, 1996; Morra, Moizo, & Scopesi, 1988] or the order of producing drawings [Freeman, 1980]. For example, children who have a larger capacity for working memory more accurately draw objects as occluded, as opposed to keeping them as separate objects [e.g., Morra, 2002; Morra et al., 1988, 1996] because they are better able to engage the procedures necessary for depicting their spatial relations. This ‘Image’ Image interfaces with the two primary components of drawing stored in long-term memory: a graphic lexicon that encodes various schemas and a graphic syntax that specifies the combinatorial principles for drawing.
The Graphic Lexicon. The graphic lexicon stores the visual vocabulary used for drawings. In verbal language, lexical items range in size, including individual phonemes, morphemes, words, idioms, schematic constructions, and possibly whole sentences [Goldberg, 1995; Jackendoff, 2002]. Similarly, ‘graphic lexical items’ range in size as well. The lexicon must include individual graphemes that compose the basic graphic parts of a representation (i.e., dots, lines, curves, circles, squares, etc.) – similar to Willats’s  ‘picture primitives’ (i.e., dots, lines, or areas) that denote edges, corners, and faces of denoted objects. Just as children begin speaking by playing with the phonemes of speech, children begin drawing by scribbling, which has been argued as laying the foundation for grapheme production [Golomb, 1992; Kellogg, 1969; Kindler & Darras, 1997; Matthews, 1983].
At a larger level from individual primitives, the graphic lexicon would also need to include schematic parts of images (such as the hands in fig. 2) that can be used combinatorially in a larger picture, as discussed by Wilson [e.g., Wilson, 1988; Wilson & Wilson, 1977]. Some whole simple representations (such as the houses and stick figures) and even some whole systematized full drawings (such as if a person memorized how to draw a particular scene in a certain way) might also be stored. In addition, this lexicon would be necessary for storing the schematic parts of written languages such as letters and whole word forms (which thus would interface with phonology).
Iconic graphic schemas interface with the 3D Model to get their meaning, and abstract elements with no iconic meaning (such as symbolic hearts, stars to denote impacts in comics, peace signs, etc.) interface with the nonspatial semantic memory of conceptual structure. Willats  has addressed how iconic representations correspond to the 3D Model in detail, and he has hypothesized that the mapping of graphic marks to spatial knowledge may actually change throughout development [Willats, 2005]. Children begin by using lines to depict whole regions and faces of objects, and they only use lines to depict contours and edges as they grow older. This change relates to the interface between graphic marks and their meanings.
Graphic Syntax. Graphic syntax includes combinatorial aspects of image-making. These include rules like adjacency and occlusion, rules regulating size to depict depth, and rules for combining different schemas together (such as attaching hand schemas to arm schemas). For example, Willats  has carefully detailed the ways that lines conjoin in T-shaped junctions to depict occlusion, Y-shaped junctions to show corners, and L-shaped junctions to show edges. ‘Ungrammatical’ images occur when, instead of using a T-shaped junction of intersecting lines, Y- or plus-shaped junctions make occlusion impossible [Huffman, 1971; Willats, 1997]. These types of combinatorial adjacency between lines differ from simply drawing contiguous lines and, indeed, occur much later in the acquisition of drawing abilities [Willats, 2005]. These aspects of drawing deal with the syntax of how lines and schemas link together.
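The junction rules described above can be restated as a toy lookup, purely for illustration. The table and the function name `is_grammatical` are assumptions introduced here, not part of Willats’s account or of the model itself; the sketch only records which junction type conventionally depicts which spatial relation and flags a junction that cannot depict the intended one.

```python
# Toy encoding (an assumption for illustration): each junction type is
# paired with the spatial relation it conventionally depicts.
JUNCTION_DEPICTS = {
    "T": "occlusion",  # one line ends against another
    "Y": "corner",     # three lines meet at a point
    "L": "edge",       # two lines meet at their ends
}


def is_grammatical(junction: str, intended_relation: str) -> bool:
    """Return True if this junction type can depict the intended relation."""
    return JUNCTION_DEPICTS.get(junction) == intended_relation


print(is_grammatical("T", "occlusion"))  # True
print(is_grammatical("Y", "occlusion"))  # False: a Y-junction reads as a corner
print(is_grammatical("+", "occlusion"))  # False: plus-shaped junctions are not listed
```

A lookup of this kind captures only the pairing of junctions with relations; it says nothing about how such rules are acquired or applied during drawing.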
Production Scripts. Beyond encoding the form of a drawing, articulation of those schemas may involve procedural memory [Guérin et al., 1999; van Sommers, 1989]. Production scripts specify the order in which schemas are drawn. This is important because, while many drawings end up and are comprehended as purely static spatial representations, they are produced in a temporal span. For example, most people would agree that drawing a stick figure begins with a circle for the head, then a straight line for the body, then either both legs or both arms (fig. 4). Moreover, for most people, it would be strange to start with the legs then work upward or to alternate between drawing each leg and arm. This order is not based on the actual schematic elements of the representation but on the procedure for articulating it. Other schemas may be ad hoc, requiring no intrinsic ordering. For these, drawing may use a default script, such as moving from top to bottom [Freeman, 1980], which may then include backtracking to fill in details, and can be overridden by the need to attend to certain salient features [Golomb, 1981, 1992]. Another default script may push a drawer to maintain complete objects throughout production, which results in ungrammatical orderings if forms are left unspecified for too long [Willats, 2005].
Production scripts connect the drawing system to the motor system where drawings are finally articulated. From here, an additional arrow connects vision with motor skill, representing the visual feedback that people use as they see their hands moving while producing a drawing, thereby allowing the system to make adjustments. This link would thus be inhibited for blind individuals or for nonblind individuals during drawing practices that involve closing one’s eyes.
Graphic Lexical Items. A lexical item of drawing would cut across these major components in long-term memory. This is similar to how linguistic lexical items involve syntactic, phonological, and conceptual structures [Jackendoff, 2002]. For example, in language, the word dog is a noun (syntax), is pronounced /dawg/ (phonology), and means something like ‘a four-legged furry canine that is man’s best friend’ (conceptual structures). Similarly, as depicted in figure 4, a stick figure would consist of a circle and several lines (graphemes) that are stored in a particular configuration (lexicon) that need to be adjacent to each other but not overlapping (syntax), that are produced in a particular order (production script), and that represent a particular meaning (conceptual structure/3D Model).
However, not all lexical items are encoded in all structures. Simple graphemes such as lines and circles have no conceptual meaning, just as simple phonemes are purely stored aspects of phonology. Purely syntactic rules such as adjacency or perspective cannot be encoded in the lexicon because they have no particular schema to store, only a general algorithm for application (as will be discussed further below). Some schemas may also not have particular production scripts since the order in which they are drawn does not matter. Thus, graphic lexical items may be cross-listed in all structures, but they are not required to be.
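The idea that a graphic lexical item is cross-listed in several structures, but need not be listed in all of them, can be sketched as a record with optional fields. This is a minimal illustration under stated assumptions: the class name `GraphicLexicalItem`, its field names, and the example values are hypothetical and are not drawn from the model’s formal specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GraphicLexicalItem:
    """Hypothetical record for one entry in the graphic lexicon."""
    name: str
    graphemes: Tuple[str, ...]                           # basic marks that make up the form
    syntax: Optional[str] = None                         # combinatorial constraint, if any
    production_script: Optional[Tuple[str, ...]] = None  # drawing order, if fixed
    meaning: Optional[str] = None                        # link to conceptual structure / 3D Model

    def encoded_structures(self) -> List[str]:
        """List the structures in which this item is actually encoded."""
        fields = {
            "graphemes": self.graphemes,
            "syntax": self.syntax,
            "production_script": self.production_script,
            "meaning": self.meaning,
        }
        return [label for label, value in fields.items() if value]


# A stick figure is listed in every structure.
stick_figure = GraphicLexicalItem(
    name="stick figure",
    graphemes=("circle", "line", "line", "line", "line", "line"),
    syntax="parts adjacent but not overlapping",
    production_script=("head", "body", "arms", "legs"),  # one acceptable order
    meaning="person",
)

# A bare line is a grapheme only: no meaning, syntax, or fixed script.
plain_line = GraphicLexicalItem(name="line", graphemes=("line",))

print(stick_figure.encoded_structures())
print(plain_line.encoded_structures())
```

Making the fields optional mirrors the point that simple graphemes carry no conceptual meaning and that some schemas have no fixed drawing order.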
Perceptual Viewpoints of Drawing, Revisited
How can this model account for aspects of drawing? When people draw, the schemas stored in long-term memory in their graphic lexicon are combined using rules from their graphic syntax. This combined information is then held in working memory as the ‘Image’ Image, their mental conception of their intended drawing. From here, the drawing is then articulated using production scripts enacted through motor skills.
In the case of life drawing, the ‘Image’ Image uses schematic information to draw the perceived surface in vision (i.e., the 2½D Sketch). However, because drawing always involves schematic information, life drawing is the attempted inhibition of the graphic lexicon in order to draw the 2½D Sketch as it appears (and it possibly even attempts to inhibit the 3D Model so as not to let the understanding of objects interfere with how they look via the eye). Nevertheless, the degree to which the lexicon can truly be inhibited is an open question. People draw differently from each other while still viewing the same visual object, yet their perceptual and conceptual systems most likely do not differ. Their style reflects the influence of their individual stored patterns of drawing despite their perception.
In contrast to life drawing, drawing from memory is the uninhibited articulation of schematic patterns connected to the spatial knowledge of visual objects. Here, meaning comes from the 3D Model, which is formulated through the combination of schemas (graphic lexicon) and combinatorial rules (graphic syntax) as held in working memory by the ‘Image’ Image. Again, this information is articulated by motor skills using production scripts. Importantly, this type of drawing does not interface with the 2½D Sketch to draw what is seen. The 2½D Sketch is only interfaced in order to get perceptual feedback on the drawing as it unfolds.
Additionally, it has long been noted and experienced that many drawers are frustrated when they cannot accurately draw their mental imagery. This feeling reflects the link between the ‘Image’ Image and the 3D Model. The perceptual viewpoint of drawing leads people to believe that this link is direct. That is, they should be able to draw objects as they conceive of them. However, the graphic lexicon intervenes between these components, requiring drawers to use schematic patterns in their representations. Thus, despite believing in a direct link between articulation and conceptualization, drawers may not have graphic schemas that allow for fully representing their intended meanings.
This model also allows us a better understanding of the treatment of depth and shading. Some aspects of depth can be stored in the graphic lexicon, such as prototypical patterns for drawing cubes or cylinders (fig. 5a). However, point perspective cannot be stored in memory as a schema because it must be applied uniquely to each scene using reference to a vanishing point. Rather, perspective must be stored in syntax as a routinized algorithm that must be applied in novel ways to each scene (fig. 5b). Aspects of shading can also be stored in the lexicon. This includes drop shadows, cross-hatching, and systematic methods of shading such as when artists use exactly the same shading pattern every time they draw a backlit three-quarter perspective on a face (fig. 5c).
However, ‘realistic’ shading involves light sources and their interactions. This too would involve applying a syntactic algorithm anew to each scene (fig. 5d). Including perspective and realistic shading into graphic syntax also makes sense of why they are so difficult to learn and why they need to be invented or discovered despite being commonplace aspects of visual perception: they cannot be stored easily amongst other schemas in the graphic lexicon.
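The contrast between a stored schema and a rule applied anew to each scene can be made concrete with one-point perspective. The sketch below uses the standard pinhole projection formula, which is an assumption brought in for illustration rather than anything specified in this paper; the function name `project_one_point` and its parameters are likewise hypothetical. The point is that the output depends on the vanishing point and depth of each scene, so it has to be computed rather than retrieved.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_one_point(
    point: Point3D,
    vanishing_point: Point2D = (0.0, 0.0),
    eye_distance: float = 1.0,
) -> Point2D:
    """Project a scene point onto the picture plane in one-point perspective.

    `point` is (x, y, z), where z >= 0 is the depth behind the picture plane.
    `eye_distance` is the viewer's distance in front of the picture plane.
    Lines that recede in depth converge on `vanishing_point`.
    """
    x, y, z = point
    if eye_distance <= 0 or z < 0:
        raise ValueError("eye_distance must be positive and depth non-negative")
    vx, vy = vanishing_point
    scale = eye_distance / (eye_distance + z)  # shrinks toward 0 as depth grows
    return (vx + (x - vx) * scale, vy + (y - vy) * scale)


# The same corner of an object lands in different places at different depths.
print(project_one_point((2.0, 0.0, 0.0)))  # (2.0, 0.0): on the picture plane
print(project_one_point((2.0, 0.0, 1.0)))  # (1.0, 0.0): halfway to the vanishing point
```

A memorized way of drawing a cube, by contrast, would be a fixed set of marks that is reused unchanged regardless of where the vanishing point lies.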
Finally, strengthening or inhibiting these connections can possibly account for testable differences in proficiencies or deficits regarding drawing abilities. For example, one possibility is that gifted drawers simply retain visual memories longer (‘Image’ Image) and thus have greater access to articulating them [Munro, Lark-Horovitz, & Barnhart, 1942]. Another possibility is that they rapidly acquire a large set of graphic schemas and their combinatorial properties (graphic lexicon and syntax) [Wilson, 1974; Wilson & Wilson, 1977, 1982a]. Alternatively, Milbrath  indicates that talented drawers may be more ‘attentive to perception and the act of drawing’ (vision-motor control interface) and be able to make more varied use of visual representations (graphic lexicon/syntax-3D Model interface). All of these ways (and others) could characterize drawing proficiencies, and it may be the case that giftedness in drawing enhances different aspects of this model for different individuals.1
This model also allows for us to characterize drawing deficits further. For example, children with Down syndrome and Williams syndrome have comparable levels of overall intelligence yet differ in their capacity for various skills, including drawing [Bellugi, Bihrle, Jernigan, Trauner, & Doherty, 1990; Bellugi, Korenberg, & Klima, 2001]. Children with Down syndrome are able to draw with relatively the same ability as typical children. We might expect that their drawing system remains intact. However, although children with Williams syndrome can draw simplified parts of an object (such as bicycle handles, wheels, and frame), they cannot organize those parts into a coherent whole. With this model, this deficit within the drawing system can be diagnosed as a problem with graphic syntax.
Convergence and Divergence of Language and Drawing
We have now established a model of drawing structure that is analogous to that of language. Despite their similarities, it is worth addressing some divergences between these domains before addressing issues of development. First, while all three modalities of expression (verbal, manual, and graphic) use all three types of reference (symbolic, indexical, and iconic), each modality appears to favor one type of reference in particular. For example, speech dominantly uses symbolic reference, manual communication makes great use of indexicality, and drawing dominantly uses iconic reference. This iconicity means that drawings must map graphic marks to perceptual meanings that reflect this interface of resemblance [Willats, 1997], which differs from the more arbitrary form-meaning mappings in speech. This iconic reference leads to the wider universality of comprehending drawings and, concomitantly, to the perspective that drawings simply reflect perception of the world. Importantly, conventionality is orthogonal to these types of reference, meaning that both iconic and symbolic signs can be highly conventional [Peirce, 1931]. Yet, only symbols derive their meaning from conventionality, implying that for symbolic forms like most words, knowledge of that conventionality is important for both comprehension and production. In contrast, the iconicity of drawings means that their comprehension can rely on more general aspects of perception, although their production still involves patterned schemas stored in memory.
Second, unlike drawings, the verbal form forces a linear reception of spoken language. Producing and hearing verbalized sounds occur throughout a temporal span, making the linearity of language an artifact of its medium. In contrast, although drawings do unfurl temporally, this linearity may disappear once the production script has ended, leaving a spatially preserved whole image. In many cases, the perceiver of a drawing never sees the temporal unfolding of the image, meaning that production and comprehension of that production remain separated both in nature (socially interactive vs. noninteractive communication) and processing (seeing an unfolding representation vs. a whole representation). Beyond comprehension, this leads to further limitations on development: a learner never sees (and thereby cannot acquire) the production script from only seeing a finished representation.
Finally, a third difference between the structure of language and the structure of drawings comes from the nature of their syntax. Clearly, the graphic syntax presented here for individual images differs greatly from the grammatical system used to concatenate words in sentences.2 In the verbal form, syntactic structure organizes meaning into a coherent presentation [Jackendoff, 2002]. However, the graphic syntax described here does not organize meanings but, instead, combines graphic lines – the sensory component of drawing – in perceptible ways. This system more closely resembles the one governing the sounds of speech: phonology. Like phonology, the combinatorial system described here operates on the form of the visual-graphic representation, so phonology may be a closer analogue to graphic syntax than sentence-level grammar [as was also argued by Willats, 1997]. For example, ungrammatical line junctions are more akin to illegal phonological strings (like starting a word with tf) than to errors in word order. Indeed, all levels of linguistic structure use a combinatorial system, not just syntactic structure built of nouns and verbs [Jackendoff, 2002].
Drawing Development with a Graphic Lexicon
We now have a view of drawing as the graphic conveyance of conceptual knowledge through an articulation of schematic patterns. The process of drawing development then involves acquiring and producing graphic schemas [Callaghan, 1999; Wilson & Wilson, 1977] just as the acquisition of language involves acquiring the lexicon and grammar in a child’s environment. Drawing ‘fluency’ reflects achieving a command of articulating this schematic information (as opposed to achieving visual realism and/or effective, aesthetic, or emotive artistry) just as fluency in language is the ability to effectively construct coherent sentences (as opposed to producing aesthetic poetry or clever prose).
The limitations children face along the way to this fluency can reflect various facets of the model. For example, one possibility is that children actually have complete conceptions of what they are trying to draw (3D Model), but they may not have the schematic or syntactic ability to do so in generative ways [Freeman, 1980; Freeman & Cox, 1985; Karmiloff-Smith, 1990]. Alternatively, they may be fully capable of conceiving meanings – and they may or may not have schemas in mind to do so – but their motor skills have not yet developed the dexterity to articulate them properly. Another possibility is that children lack the working memory (i.e., in the ‘Image’ Image) to hold schematic and syntactic information in order to articulate the conceptions [e.g., Eng, 1931; Freeman, 1980; Morra, 2002; Morra et al., 1988, 1996]. Many of these various hypotheses for drawing development have analogues in language acquisition research. This model allows us a way to frame these varying hypotheses in a testable way (and integrate them together).
With the view that drawing development at its basis involves learning schemas, imitation must be a driving mechanism behind this acquisition. Although imitation of others’ drawings has often been disparaged by Western art education [e.g., Arnheim, 1978; Cižek, 1927; Lowenfeld, 1947], transmission of cultural conventions through imitation of external stimuli has long been recognized as a spontaneous and common trait of development in general [Piaget, 1951]. Copying drawings highly benefits learning [Gardner, 1980; Wilson & Wilson, 1982b], and imitation of external graphic sources is largely motivated through observation and modeling, not necessarily with explicit art instruction through pedagogy [Lamme & Thompson, 1994; Wilkins, 1997; Wilson, 1997; Wilson & Wilson, 1977]. This benefit is enhanced when a learner interacts with the process of drawing and not just with the end result [Pemberton & Nelson, 1987]. Therefore, learning the production script along with the schema facilitates drawing ability. Also, contrary to the traditional belief that imitation limits children’s creativity [e.g., Arnheim, 1978; Cižek, 1927; Lowenfeld, 1947], recent work suggests that imitation actually fosters creativity for drawing [Huntsinger, Jose, Krieg, & Luo, 2011; Ishibashi & Okado, 2004]. Finally, imitation is important for socialization [Korzenik, 1979], which thereby facilitates drawing ability [Callaghan, 1999]. Although younger children copy as a means of acquiring knowledge, older children do so to adapt to the conventions of their culture [Smith, 1985]. In sum, imitation appears central to children’s natural development of drawing ability.
If drawing development involves the acquisition of schemas through imitation, we now have a way to understand the difference between the developmental trajectories of children in America and Japan: Japanese children receive the requisite practice of and exposure to a visual vocabulary, but American children do not. Comics in Japan provide a rich visual language that children can acquire through imitation [Cox et al., 2001; Toku, 1998, 2001b; Wilson, 1988, 1997, 1999]. Studies have shown that over two-thirds of Japanese children’s drawings are imitative of comics [Wilson, 1999], and nearly all Japanese 6-year-olds are capable of drawing complex graphic narratives in comics, yet fewer than half of 12-year-olds in other countries have this proficiency [Wilson, 1988]. Importantly, the acquisition of the comics’ style of drawing is largely motivated by Japanese children’s own exposure to and imitation of comics. Institutional art classes in Japanese schools focus mostly on the same ideals heralded by Western art education and, on the whole, do not promote or practice the drawing of comics [Wilson, 1997].
One reason imitation is easier in Japan is that Japanese comics predominantly use a stereotypical style featuring large eyes, pointy chins, and big hair [Cohn, 2010; Gravett, 2004; Schodt, 1983]. This style is not constrained to comics and recurs ubiquitously in cartoons, advertisements, and visual culture. Indeed, one is hard-pressed in Japan to find graphic representations that do not use this style. This style originated with the Japanese ‘God of Comics’ Osamu Tezuka, who was influenced by Walt Disney and Western cartoonists [Gravett, 2004]. Due to his unprecedented popularity at the birth of the contemporary Japanese comic industry, many other comic authors imitated his style. As the industry grew and developed, the style came to be associated not with any individual author but with a conventionalized representation characteristic of the whole nation – just like a language.
In contrast, stylistic consistency like this does not occur in American visual culture. Although some conventionalization does exist between American comic artists (for example, among studios of comic artists), great diversity is observed among drawing styles throughout the culture. Comics, cartoons, advertisements, etc., all use vastly different graphic styles within and between them. Without a ubiquitous consistent style, American children face a harder time acquiring an external system of schemas. Which style, if any, do they choose?
Social motivation also plays a role based on the prevalence of a consistent style. By imitating the style from their comics, Japanese children participate in the visual language of their culture. In contrast, without a consistent style that reflects their social community, American children do not have a social motivation to imitate unless they belong to a subculture that values some visual style, such as American readers of Japanese comics [Sell, 2011].
In sum, comics provide Japanese children with a consistent visual vocabulary they can acquire. As a result, they have greater proficiency in drawing than American children and an absence of a drop-off in drawing development [Toku, 1998, 2001a, 2001b; Wilson, 1997, 1999]. By comparison, learning to draw from life drawing does not facilitate the same graphic fluency as imitation of schemas. Because each visual percept of objects in the world is different, a learner is charged with figuring out how to represent the very unsystematic aspects of a visual surface [Arnheim, 1969, 1974], thereby not learning a vocabulary of graphic schemas. For someone who has relied on representing such visual perception in learning to draw, drawing from memory then becomes a harder task because no vocabulary of schemas has been acquired.
Impoverished Systems of Expression
According to this account, drawing development resembles language development: it involves the acquisition of a visual vocabulary of patterns. So, what happens when a user does not acquire enough schemas to attain a level of proficiency (i.e., fluency)? In order to gain a better understanding of such ‘impoverished’ abilities in drawing, we will first discuss the current understanding of drawing development followed by comparison to other linguistic domains that lack adequate stimulus. Again, it is worth stressing at the outset that we should not necessarily predict an exact correlation between the development of drawing and language even though some research has looked at possible correlations [Kindler & Darras, 1997]. However, as will be emphasized, important parallels in the overall course of development deserve to be highlighted.
Since the middle of the twentieth century, the development of drawing has often been recognized as involving a linear trajectory towards realism [Lowenfeld, 1947]. These theories of development agree that children begin in infancy with a stage of uncontrolled scribbling, a near equivalent to vocal babbling [Kindler & Darras, 1997] in which the foundation of graphemes is established [Kellogg, 1969; Matthews, 1983]. This is followed by a stage of graphic babbling, where shapes are repeated but do not necessarily form recognizable objects. By ages 2–3, children begin depicting actions through basic shapes and scribbles and then shift to using drawings to represent objects rather than actions. Between ages 5 and 8, they begin treating graphics as an autonomous system with increasing detail, realism, and complexity.
Not everyone has agreed that development follows a linear trajectory toward realism. Luquet [1927/2001] was among the first to acknowledge variation in development. He noted that children often produce different styles showing their conceptualizations of objects before settling on a full aim towards perceptual realism. For example, when young children draw cubes set in front of them (e.g., boxes, dice, etc.), they will represent all the full faces, including those that are imperceptible, resulting in an ‘impossible’ object [Luquet, 1927/2001; Willats, 2005]. In essence, Luquet theorized that children progress from using mental models to drawing from perception.
Luquet’s observations were supported experimentally by Freeman and his colleagues [Freeman, 1980; Freeman & Cox, 1985], who focused on how children’s drawings depart from realistic representation (such as unrealistic body proportions). Freeman hypothesized that children are impaired in their production and draw from (and thus emphasize) prototypical concepts rather than a complete visual model. This difference was also addressed by Milbrath, who argued that ordinary children rely on conceptualized imagery, while gifted drawers concentrate earlier on forms, apparent shapes, and perceptual realism. Again, these approaches involve a progression broadly from drawing from conceptualized imagery (from memory) to drawing a perceived surface (life drawing).
Willats [1997, 2005] described a more detailed model of the actual process of drawing tied to Marr’s theory of perception. He too links drawing to cognitive manifestations of life drawing (view-centered depiction) and drawing from memory (object-centered depiction), and he readily acknowledges the role of schemas, although their role in development remains unspecified. To him, the development of drawing reflects attempts to represent ever-increasing levels of perceptual understanding. He argued that the scribbles made between ages 1 and 3 are actually attempts to draw full regions, not random lines or exploratory antecedents to actual representations [as in Kellogg, 1969]. Children then progress between ages 3 and 8 to drawing regions as bounded areas. As early as age 6, they begin to use faces of objects to denote regions (rather than contours) and by age 8 they begin smoothing the outlines of objects. Only by around age 10 do children begin to use lines to depict edges and contours, evidenced by their use of line junctions and occlusion. In terms of the model described above, Willats’s model outlines that children first use their basic graphemes to depict whole objects, then proceed to basic schemas, and finally combine those schemas with syntactic rules of adjacency and occlusion.
With this in mind, it is important to note that other researchers have emphasized developmental trajectories that do not reflect a linear trend. Gardner and Winner argued that drawings by young children are aesthetically comparable to those by mature artists, while drawings from adolescents undergo a period of poorer aesthetic power. Experimentation did show evidence for this U-shaped development when fine art judges rated the drawings by children, adolescents, and mature artists [Davis, 1991, 1997a, 1997b]. However, later experiments failed to replicate these findings when judges did not expressly value Modernist style art as ‘good drawing’ [Kindler, 2000; Pariser, Kindler, & van den Berg, 2007; Pariser & van den Berg, 1997]. These counter experiments emphasized that children draw upon numerous influences in nonlinear development of drawing, engaging in different repertoires that suit cultural contexts and artistic aims [Kindler, 2003; Kindler & Darras, 1997, 1998]. Under this view, imitation of cultural influences plays a very important role.
Although some research has highlighted the importance of cultural influence, imitation, and drawing from schemas, models do not directly incorporate the role of imitation into their trajectories of drawing development. Some theories admirably stress diverse cultural influences, but they often do not describe how these cultural repertoires interact with the reasonably well-attested aspects of stage theories. Essentially, by so readily acknowledging the diversity offered through culturally contextual drawing influences, important details about imitative versus nonimitative development may be overlooked or blurred. However, theories that ignore cultural influence altogether actually describe how drawing persists without imitation and schemas. In essence, these theories frame what an impoverished system of drawing looks like.
Verbal and Manual Communication
We can now compare the development of drawing with that of other modalities. Children’s development of language relies greatly on exposure to and practice with an external system. Although researchers debate the degree to which the underlying structures of language are genetically endowed [e.g., Pinker, 1994] or culturally acquired [e.g., Tomasello, 2000], there is little argument that children must acquire the linguistic system in their environment to learn to speak. A child born in Boston will likely learn English while a child in Tokyo will learn Japanese. Additionally, language learning progresses within a critical period that extends until puberty, after which the ability to learn rapidly declines, and full linguistic competency will be unattainable [e.g., Lenneberg, 1967; Newport, 1990].
It is important to note that language researchers do not have hard and fast evidence for the existence of a critical period in language development. Indeed, the ‘forbidden’ experiment that deprives children of language would be inhumane. However, the accumulation of numerous examples has provided fairly convincing evidence that language learning after puberty becomes manifestly more difficult. For example, immigrants who move to a new country as adults often struggle to become fluent in the language of their new home. Even with explicit instruction in vocabulary and grammar, they may never sound truly native. However, if children move within their critical period, they have no trouble becoming native-like speakers [Klein & Perdue, 1997].
Other examples come from children who, for unfortunate reasons, were not exposed to a language at all within their critical periods. The most famous of these cases is ‘Genie’ [Curtiss, 1977], a girl who was isolated from human contact from age 2 and discovered at age 13 having never truly been exposed to language. After subsequently receiving intense language instruction, she did develop a significant vocabulary, although she never fully attained proficiency with even simple rules of grammar. Another case was a woman who did have regular social contact, but was believed to be retarded until age 31, when she was discovered to have been deaf [Curtiss, 1994]. With hearing aids, she also acquired a basic vocabulary, but her grammatical abilities were even more dramatically impoverished than Genie’s. Overall, late learning of verbal language appears to yield a rudimentary vocabulary but with impairments in being able to create sentences.
Additional research on the manual modality has further described what happens when people face a lack of linguistic input. Children, whether hearing or deaf, will learn sign language if they are exposed to it, although greater fluency comes with exposure earlier in life [Newport, 1990]. Without depending on sign language, speaking individuals also learn the gestures of their society. These signs either consist of spontaneous novel gesticulations (e.g., hands widening to express big or moving to imitate the motion of an action) or conventionalized emblems (e.g., thumbs up for good or a checkmark in the air to ask for a restaurant check) [McNeill, 1992]. However, these signs are usually produced in isolation without any sort of syntactic sequence and appear at a rate of roughly one sign per spoken clause [McNeill, 1992].
A more extreme scenario occurs when deaf children are raised by hearing parents and are never exposed to a sign language. These children have no external linguistic input, but instead they create their own manual sign systems. These ‘homesigns’ usually include several hundred signs and are generally limited to expressing ‘sentences’ only up to three signs in length [Goldin-Meadow, 2003; Goldin-Meadow & Feldman, 1977; Goldin-Meadow & Mylander, 1990]. Although some conventionalized emblems from a culture do appear in homesign, most of these signs are invented by the children, not the parents, as evidenced by the greater proficiency that children have with the signs than their parents [Goldin-Meadow & Mylander, 1983]. These findings suggest that, in the absence of a system for deaf children to acquire sign language, they will invent systematic signs to communicate despite such systems being limited in nature.
Research on impoverished linguistic abilities has led Goldin-Meadow to make an important contrast between resilient and fragile properties of language. Resilient properties of language withstand impoverished developmental conditions and emerge despite the absence of an external input. These properties appear in homesigners and other instances of impoverished language learning; they include features like basic segmentation of signs into contrasting meanings and rudimentary patterns for ordering signs into simple sequences.
In contrast, fragile properties of language are those that do not survive impoverished conditions and require external input such as an extensive lexicon and syntactic complexity in multisign sentences. These conventional aspects of language must be learned within the critical period of language development; otherwise only the resilient properties will remain. The persistence of resilient properties has led some researchers to implicate these features as reflecting an innate system of ‘protolanguage’ that evolutionarily predated the conventionalized fragile features which build on top of this core system [Bickerton, 1990; Jackendoff, 2002].
In this light, how can we interpret the drawing abilities of individuals who do not receive adequate exposure to or practice in drawing? Individuals who do learn to draw proficiently copy other drawings to acquire a lexicon of graphic schemas. Alternatively, some people may diligently create their own systems where they invent schemas for their personal graphic idiolects. If neither of these occurs, individuals are left with a ‘homedraw’ (table 1) – an impoverished system of drawing that consists of simple, ubiquitous conventionalized signs (such as stick figures and canonical houses), a remedial capacity for novel signs, and spontaneously created drawings.
We can think of this impoverished system of drawing as similar to Goldin-Meadow’s resilient properties of language. These are the aspects of drawing that pervade despite lacking an external input. Some resilient features may permeate across domains, and some may be specific to drawing. For example, homesigns, sign languages, and even spoken languages such as Navajo and Japanese use ‘classifier’ systems that map a variety of meanings to basic shapes and image schemas [Armstrong & Wilcox, 2007; Goldin-Meadow, 2003]. Similar mappings appear in the sand drawings of native Australians, where simple circles and lines can stand for any number of meanings [Munn, 1962, 1986], and with early stages in drawing development in which graphic marks map to basic shapes and volumes in the denotation system [Willats, 1997]. If, as Armstrong and Wilcox suggest, these classifier systems reflect basic persisting strategies of mapping form to meaning across modalities, they may be included as resilient properties of drawing as well. Just as the resilient features of language have been revealed through studying impoverished populations, the resilient features of drawing can be studied through cross-cultural and historical corpus analysis, by studying the representations of individuals who live in graphically impoverished environments, and by studying representations from individuals who might face deficits to drawing development such as blindness.
As in language, we might speculate that resilient features form an innate core of abilities for the faculty of drawing, onto which more fragile features can be built. Although children across the globe receive very different exposure to and instruction in drawing, consistent features and developmental steps have been recognized for drawings [Golomb, 2002; Kindler & Darras, 1997; Willats, 2005]. An innate faculty like this would explain why certain features of basic drawings pervade different locations, cultures, and time periods across human history [Golomb, 1992; Paget, 1932]. It would also explain why all normally developing individuals have at least some capacity to draw, even if they do not become fully fluent, and most children naturally make graphic marks with no prompting. All of this implies at least some biological basis for an innate drawing faculty in cognition.
By contrast, graphic fluency involves learning the more fragile properties of a drawing system, the set of cultural conventions that facilitate a broader system of expression. Such features may either be learned through imitating the signs of a culture (passively acquired and/or instructed) or, less likely, be fully developed by a learner. Some even more fragile aspects of drawing may never be acquirable through imitation alone. For example, perspective and realistic shading, which involve vanishing points or light sources, may have to be learned or instructed from an external source. This would make sense, since these aspects of drawing were invented or discovered in the Renaissance and are not prevalent in historically earlier drawing systems or in most children’s drawings. These fragile features may be built on top of the resilient features and serve as an overriding system to those more basic functions. For example, if the resilient properties included a basic strategy of using size to depict depth (i.e., smaller is farther), this may be overridden by culturally learned schemas to show volume or depth, or even full perspective.
Reexamining the Drop-Off
Now we might ask, if drawing involves resilient and fragile features, does that also mean it involves a critical period? With the overall theory that drawing abilities are comparable to linguistic abilities, we can reinterpret the drop-off in the developmental trajectory of drawing ability. Instead of a period of oppression where puberty halts an individual’s progress, this can perhaps be viewed as the end of a critical period at puberty, similar to that found in language development. If learners do not acquire sufficient exposure to and practice with schemas (by not imitating), they will reach the end of their critical period without attaining a level of proficiency, and thus their ability appears to ‘drop off,’ leaving them with only the resilient features of the drawing system (or the features they acquired up to that point).
If drawing does involve a critical period like language, it is worth comparing the environments and usages in which children learn to speak and draw. Children are immersed in language, often with constant encouragement to speak from communicative interactions with older speakers. They acquire language by observing and engaging in socially interactive communicative acts using that system [Kuhl, 2007] in which they are constantly encouraged to produce it in real-time exchanges. By contrast, in American and European cultures, children are not exposed to an environment that uses drawing as an everyday communicative activity. Children engage in few, if any, interactions encouraging them to imitate to acquire drawing as a way to communicate directly with adults and peers. The instances in which they are able to develop their drawing abilities are few and far between, relegated to culturally specific contexts, and usually outside of actual person-to-person interactions (from which, if they are imitating, they might only see the finished product of a drawing, and thus have no access to learning a production script).
Also, despite the sense that the United States and European countries are rich in images, these environments do not necessarily use a consistent visual vocabulary across all of their visual culture. Rather, diverse graphic styles permeate these societies, meaning that a child is exposed to numerous graphic dialects. This diversity would be analogous in language to a child raised in a society where every individual speaks a different language. Which one do they acquire? This is unlike Japan, where the same style extends through nearly all facets of visual culture. Here, drawing in this style is not just ‘learning to draw’ as a skill, it is partaking in the visual language of the community. In essence, the visual environment and usage of drawings in America and Europe is significantly more impoverished for drawing development than language development.3
In this light, American and European children’s stagnation in drawing ability is the end of a critical period in which they did not receive proper exposure to and practice with patterns of drawing, and their development goes no further than the resilient features of the drawing system (thereby appearing to drop off). In contrast, Japanese children are at least immersed in a visual culture that has a consistent, rich set of schemas. As a result, Japanese children have substantially acquired the schemas of their visual language by the end of the critical period (meaning no drop-off).
Another contrasting environment comes from communities of native people in Central Australia. These communities use systems of sand drawings that feature highly conventionalized schemas used alongside speech and an auxiliary sign language that is used in multimodal storytelling as well as in everyday communication [Cox, 1998; Green, in press; Munn, 1962, 1986; Wilkins, 1997]. In fact, drawing is so intertwined with communication that speakers are not considered truly fluent unless they also draw with speech [Wilkins, 1997]. Children in these cultures learn the system without instruction, through exposure alone, just as they do for language [Wilkins, 1997]. In contrast to this system where drawings function as part of interactive linguistic acts, American culture seems severely impoverished for exposure and usage of drawing.
If the development of drawing does fall within a critical period, its end would not necessarily render further learning impossible in later life. Following this time period, drawing development would simply become manifestly more difficult (and more challenging still if the learner does not imitate a vocabulary of graphic schemas) and would likely require concerted effort, in contrast to the relative ease within the critical period. This is the same as in spoken language. Learning a language becomes significantly more difficult after puberty for most people, as many adults learning a second language can attest.
Finally, it should be acknowledged that, like language, evidence for a critical period for drawing has not been explored specifically (and certainly not with this perspective in mind). However, the cultural context of drawing may make it more feasible to investigate drawing than to investigate language. People who have not learned language by the end of their critical periods are extremely rare. The vast majority of human beings are fluent in a language, and impoverished abilities remain the exception. In contrast, by this account, most individuals are not fluent in a visual language of drawing because the majority of people have impoverished drawing abilities. Because of this difference, the experimental testability of a critical-period hypothesis may be more feasible for drawing than for language. It would be inhumane to deprive a child of learning language, but, since many children are already deprived of learning to draw, it can only benefit them for researchers to provide the necessary environment to test the possibility of a critical period.
Altogether, this article has argued that the cognitive system of drawing is similar to language. Functionally, drawing allows people to express their concepts visually. Learning takes place primarily over a critical developmental period that reaches its end at puberty. Throughout this period, proficiency in drawing relies on the exposure to and acquisition of schematic mental models that are constructed by imitating other people’s drawings. If learners do not acquire schemas and reach the end of their critical period without sufficient development, their drawing abilities stagnate for the remainder of their lives.
The development of drawing is an interaction between nature and nurture. Humans do seem to have an innate capacity for representing concepts graphically, but attaining full proficiency requires interacting with an external system of representations. This requires that a learner is exposed to a rich graphic environment with learnable signs and has the motivation (such as social acceptance) to acquire fluency. Nevertheless, if such fluency is not reached, people are left with resilient abilities that reflect an innate core of the drawing capacity.
This approach highlights parallels between drawing and other expressive capacities, particularly language in the verbal and manual modalities. This perspective assumes a degree of equivalence across all modalities for structure, development, and processing. That is, we should expect that the mind/brain treats all expressive capacities in similar ways, given modality-specific constraints. For example, elements such as the encoding of schemas, combinatorial rule systems, and development across critical periods may cut across all expressive modalities. However, the precise nature of those schemas and combinatorial rules differs between language and drawing because the verbal-auditory and visual-graphic modalities put different demands on their expressive systems (for example, the verbal-auditory channel forces a linear sequence while the visual-graphic channel does not). A comparable assumption has underlined research studying other domains of expression, such as music, with successful results [e.g., Jackendoff & Lerdahl, 2006; Lerdahl & Jackendoff, 1982; Patel, 2003]. Given this context, contrary theories should bear the onus of answering the key question: why should the mind/brain not treat drawing like other expressive capacities?
If parallels do exist between the structure and development of drawing and language, it is worth questioning the ramifications. Does this mean that the notion of language must be expanded? Does it mean that domain-general aspects of cognition motivate the development of both language and drawing? Are there other domains that share the same types of learning trajectories, and what does that say about development in general? These are important questions posed to future research, and addressing them requires a more thorough understanding of the drawing system and its developmental trajectory – especially regarding the interaction of fragile and resilient properties of drawing.
Finally, this theory further implicates drawing as a core, rather than peripheral, cognitive ability. Peripheral abilities (like riding a bike) often come much later in development, can be learned outside of any critical period, and do not leave behind latent resilient features if not developed. In contrast, if drawing requires a narrow time period to fully activate, this would suggest it is as essential to human cognition as other core functions like verbal or manual linguistic systems, which do leave behind resilient artifacts. Thus, any complete understanding of the mind must incorporate this capacity without treating it as an ancillary or peripheral system tied only to aesthetic or expressionistic intents. Rather, drawing serves as another avenue for conveying concepts, and its study is embedded into the understanding of human communication, human cognition, and human nature.
This paper was made possible by funding from the Tufts Center for Cognitive Studies. Thanks go to Naomi Berlove, Emily Bushnell, Stephanie Gottwald, Ray Jackendoff, and Eva Wittenberg for helpful feedback on earlier drafts and for helping to formulate the cognitive model presented herein. I also appreciate the feedback of my two anonymous reviewers.
It is worth remembering that the existing notions of ‘giftedness’ and ‘talent’ in the literature are relative to a perspective that does not conceive of drawing as a visual language. Thus, a reconceptualization of ‘giftedness’ and ‘talent’ may be required when taking into account the types of differences between ‘impoverished’ and ‘fluent’ drawers outlined in this paper.
However, a system for sequential images may be somewhat more analogous to that of sequential words [Cohn, in press; Cohn, Paczynski, Jackendoff, Holcomb, & Kuperberg, 2012].
It is important to distinguish an ‘impoverished’ learning environment from the argument of a ‘poverty of stimulus’ discussed in language acquisition [e.g., Chomsky, 1959; Pinker, 1994]. The poverty-of-stimulus argument holds that children do not receive enough exposure to aspects of language in order to acquire their structures by repeated exposure alone. For example, children may never hear certain limitations on syntactic anaphora, yet they still fully acquire the knowledge that some instances are grammatical while others are not. In this case though, children are exposed to a consistent language, and the ‘poverty’ refers to the fact that they cannot hear all instances of structure within that system (i.e., a counter to behaviorist accounts of rote learning). This is different from an ‘impoverished’ environment found in drawing (or homesign) where a child is exposed to no consistent system or is exposed to multiple systems with no distinction of value between them.