For forty years it has been assumed that joint attention is a driving force in the development of early communication (Bates, 1979), and a substantial body of evidence supports this idea. Shared attention at the end of the infant’s first year has been found to relate to the infant’s first words (Baldwin, 1995; Bates, 1979; Bruner, 1974; Markus et al., 2000; Tomasello, 1988, 1995; Tomasello & Todd, 1983), early learning (Striano et al., 2006), emotion regulation (Morales et al., 2005), social development (Mundy & Sigman, 2006; Vaughan Van Hecke et al., 2007) and early symbolic thinking (Mundy & Jarrold, 2010).
A few recent investigations have proposed an alternative position: that it is infants’ ability to sustain attention, rather than to share a focus with their caregiver, that may reveal the underlying mechanism for these achievements, particularly early vocabulary growth (Yu et al., 2019). Although this literature is far smaller, it has attracted considerable interest. Not only has sustained attention been used to explain individual differences in vocabulary acquisition (Brooks et al., 2018), it has also been shown to relate to cognitive performance, notably problem solving in the later toddler period (Choudhury & Gorman, 2000). This research poses a challenge to traditional theories, suggesting that joint attention may merely be a proxy for the true driving force behind later developmental abilities: the ability to sustain a focus on objects (toys and people). Given the centrality of joint attention as a construct (Carpenter et al., 1998), the theoretical implications could be major.
Sustained attention is characterised by a focus or fixation on a particular stimulus. The recent literature is not wholly clear about what this reveals, but the implication is that sustained attention demonstrates a closer engagement with objects and, therefore, an ability to use the information they provide to build more complex associations, what Richards and Casey (1992) term “information processing.” Yet if sustained attention is to compete with, or replace, joint attention, the construct needs much more critical analysis. Its problems fall into 2 main areas, measurement and definition, which we explore in turn before considering some deeper theoretical concerns.
Methodological Issues in the Measurement of Sustained Attention
Broadly, there have been 3 approaches to the measurement of sustained attention. Some authors describe the internal processes posited to take place during visual fixation (Richards & Casey, 1992), some identify sustained attention through associated physiological responses such as heart rate (Curtindale et al., 2019), while others use looking time, setting a duration threshold for what qualifies as sustained attention (Yu et al., 2019). Each approach draws on, or implies, a different definition of the construct, so we examine them in turn. None of them is problem free.
Although recent papers often treat sustained attention as a new notion, there is a literature on the topic that dates back over 30 years. First, many of the early studies refer to the internal processes that supposedly underpin the ability to attend. For example, Richards and Casey (1992) assumed that sustained attention is a behaviour which “represents encoding of stimulus information” (p. 48) and requires “subject-controlled cognitive processing” (p. 38). However, the justification for these claims is incomplete and, as we discuss below, not all the evidence is compatible with them. With an adult sample it may be possible to establish what fixation on an object implies, as participants could be asked to report details of the stimulus, revealing how much information processing had taken place. With an infant population, from whom such reports cannot be obtained, it is not possible to identify the cognitive processes taking place during a fixation period.
Second, other early studies used physiological responses to measure sustained attention. Richards (1989) suggests that heart rate shows a prolonged decrease when attention becomes directed. Whilst this measure is preferable to assumptions about depth of processing, there is no consensus on when a lowered heart rate is in evidence. Casey and Richards (1988) found that heart rate decelerations vary with the child’s age. Fourteen-, 20- and 26-week-old infants showed average heart rate decelerations of 6.82, 7.22 and 9.89 beats per minute (bpm), respectively, during sustained attention. Curtindale et al. (2019) have recently used heart rate, combined with looking, to measure infant attentional phases, taking 5 consecutive beats below the median heart rate as the threshold for sustained attention. This criterion specifies how long the deceleration must last (5 beats) but not how large it must be, whereas Casey and Richards’ data show that the typical size of the deceleration grows with age. In a group of 26-week-olds, for example, whose average deceleration is 9.89 bpm, a shallow drop that nonetheless satisfies the 5-beat criterion may not truly reflect a period of sustained attention. There is, therefore, little consensus about how to assess the physiological correlates of attention.
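To make the difference between the duration and the size of a deceleration concrete, the following Python sketch applies a run-length rule of the kind described for Curtindale et al. (2019): a period counts as sustained attention once 5 consecutive beats fall below the median of a baseline. It is an illustrative simplification, not the authors’ pipeline. The function names, the use of per-beat heart rate in bpm (published pipelines may work with inter-beat intervals instead), the baseline window and all of the values are assumptions made for illustration.

```python
from statistics import median


def find_sustained_attention(heart_rate, baseline, run_length=5):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `run_length` consecutive beats below the baseline median.

    Hypothetical helper: `heart_rate` and `baseline` are per-beat values in bpm.
    """
    threshold = median(baseline)
    periods = []
    start = None
    for i, bpm in enumerate(heart_rate):
        if bpm < threshold:
            if start is None:
                start = i  # a run of below-median beats begins here
        else:
            # The run has ended; keep it only if it was long enough.
            if start is not None and i - start >= run_length:
                periods.append((start, i))
            start = None
    # Close a run that lasts until the end of the recording.
    if start is not None and len(heart_rate) - start >= run_length:
        periods.append((start, len(heart_rate)))
    return periods


def mean_deceleration(heart_rate, baseline, period):
    """Baseline median minus mean heart rate within a detected period (bpm)."""
    start, end = period
    segment = heart_rate[start:end]
    return median(baseline) - sum(segment) / len(segment)


# Synthetic, made-up values for illustration only.
baseline = [150, 152, 148, 151, 149]           # median = 150 bpm
shallow = [150, 149, 148, 148, 149, 148, 151]  # small drop below baseline
deep = [150, 141, 140, 139, 140, 141, 151]     # large drop below baseline

for label, series in [("shallow", shallow), ("deep", deep)]:
    for period in find_sustained_attention(series, baseline):
        drop = mean_deceleration(series, baseline, period)
        print(label, period, round(drop, 1))
```

Both synthetic series meet the criterion over the same beats, although one falls by about 1.6 bpm and the other by about 9.8 bpm. Any requirement on the size of the deceleration, for instance one scaled to age norms, would have to be added as a separate rule.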
Third, sustained attention has been assessed simply in terms of looking time. This should be the simplest and most reliable means of assessing the construct, yet even this approach is open to question. Some coding schemes remain vague, such as “extended … duration of visual attention to (an) object” (Wass et al., 2018, p. 1). Other researchers are more precise and specify a time threshold. In their influential paper, Yu et al. (2019, p. 2) define sustained attention as “the stabilization of visual attention to an object for long durations (e.g., > 3 s),” although their use of “e.g.” rather than “i.e.” leaves it unclear whether 3 s is meant as a fixed criterion or only as an illustration. A duration threshold of this kind is much more objective and allows infant looking to be coded with high interrater reliability. However, as with heart rate thresholds, there is evidence that the look durations characteristic of sustained attention vary with age. Ruff and Lawson (1990) found a mean look duration of about 3 s (3.33 s) only in infants around 1 year of age; by 2 and 3.5 years, mean look durations had increased to 5.36 and 8.17 s, respectively. For older children, therefore, a 3-s look may not truly reflect a period of sustained attention, and a blanket rule for measuring it may undermine the validity of the results.
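The age problem can be illustrated with a short Python sketch that classifies the same set of invented look durations in two ways: with a fixed 3-s threshold, following the example value in Yu et al. (2019), and with a hypothetical age-relative rule that counts only looks longer than the age-group mean reported by Ruff and Lawson (1990). The age-relative rule and the look durations are assumptions introduced purely for illustration; neither is taken from the studies discussed here.

```python
# Mean look durations (s) by age, as reported by Ruff and Lawson (1990).
MEAN_LOOK_S = {"1 year": 3.33, "2 years": 5.36, "3.5 years": 8.17}

# Fixed threshold, following the example value given by Yu et al. (2019).
FIXED_THRESHOLD_S = 3.0


def sustained_fixed(looks, threshold=FIXED_THRESHOLD_S):
    """Looks counted as sustained attention under a single fixed threshold."""
    return [d for d in looks if d > threshold]


def sustained_age_relative(looks, age):
    """Hypothetical alternative: count only looks longer than the age-group mean."""
    return [d for d in looks if d > MEAN_LOOK_S[age]]


# Invented look durations (s), identical for every age group.
looks = [1.2, 3.5, 4.0, 6.1, 9.0]

for age in MEAN_LOOK_S:
    print(age, len(sustained_fixed(looks)), len(sustained_age_relative(looks, age)))
```

Under the fixed rule, the same 4 looks count as sustained attention at every age. Under the age-relative rule the count falls from 4 at 1 year to 2 at 2 years and 1 at 3.5 years, which is the kind of divergence a blanket threshold conceals.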
Theoretical Inconsistencies?
The recent literature has consistently suggested that sustained attention indicates deeper thinking about the objects being inspected. However, when research over the past 30 years is considered, a strange disjuncture appears in the findings. Thirty years ago, sustained attention, assessed using a variety of methods and looking times, was found to be associated with poorer intellectual outcomes in the child. For example, Slater (1995) reviewed a group of studies linking attention behaviours in infancy to later IQ. Fixation time in infancy was found to correlate negatively (r = –0.29) with Stanford-Binet IQ at 5 years of age (Sigman et al., 1986). Slater distinguishes between “short lookers” and “long lookers,” equating protracted fixation with slower information processing. Gunderson et al. (1987) attempted to study the physiological basis of this distinction in infant pigtailed macaques, classified as at high or low risk of developmental problems on the basis of symptoms such as hypoxia (which is associated with cognitive delay in human infants). Low-risk macaques scored higher on a test of human infant intelligence and more readily differentiated targets in a habituation task (they were “short lookers”), whereas the high-risk group scored lower and looked longer during the visual task.
Colombo et al. (1991) also suggest that infants who take longer to habituate are slower at processing information. Colombo (1993) reviewed 13 studies examining the validity of infant fixation duration and identified correlations with concurrent measures including (but not limited to) recognition memory, motor development, reaction time and visual discrimination. In all cases except one (visual exploration), longer fixation durations were associated with poorer performance on these tasks. Colombo also identified a body of research that consistently showed longer looking in developmentally at-risk populations (Cohen, 1981; Fantz & Fagan, 1975; Rose, 1981).
It is possible that the fixation behaviour described in the earlier studies, summarised by Slater (1997) and Colombo (1993), is something different from the extended attentional fixation discussed by recent researchers. Both Colombo (1993) and Slater (1997) used the term to describe prolonged visual attention during habituation paradigms, and it is in these paradigms that longer fixations have indicated weaker information processing. In more recent research, by contrast, the term describes looking at objects during joint attention interactions (Yu et al., 2019), where it is treated as a marker of more advanced processing. This difference may explain the problems we have identified in interpreting what each behavioural measure means, and until it is resolved we cannot reach a firm understanding of how the concept should be defined.
Possible Explanations for the Discrepancies in Findings and Recommendations for Future Research
The long-term outcomes associated with looking at an object during joint attention and with simply habituating to a visual stimulus contrast so starkly that the difference may itself provide a clue to the nature of both activities. To understand this difference, we need to be able to discern reliably between the two, which requires analysis of the measures and definitions of each. Although on the surface they appear to be easily distinguishable constructs, they may well be harder to tease apart than this suggests.
Take, for example, the heart rate measures that have been used to assess sustained attention. Even if agreement could be reached on a reliable and valid measurement criterion, we would still be likely to run into problems when comparing attention across the two types of experience. Many features of social interaction, including infant-directed speech and talk with affective content, can influence infant heart rate (Santesso et al., 2007). Such data make it difficult to treat a “sustained lowering of the heart rate” as a measure of attention to an object when the same change can be produced by other aspects of the social interaction. Similarly, with looking times, the infant’s fixations on the object of shared attention are highly likely to be influenced by the social support they are currently receiving from their caregiver. Yu et al. (2019) construct separate measures of sustained and joint attention, but the two may be too intertwined and complex to be fully dissociable.
It may be the case, therefore, that a term like “sustained attention” is too broad to apply uniformly to behaviours as distinct as object habituation and triadic interaction. For a start, it seems logical to divide the construct into social and non-social subtypes, as this distinction may be important (Greene et al., 2009). Social stimuli would include faces, whereas the non-social category would include stimuli such as geometric shapes (Sherrod, 1979). Matters become more complicated, however, because habituation studies can present both social and non-social stimuli, so infants may attend to a stimulus, such as a picture of a face, that is social in content but not part of a social interaction. Such a schematic representation is unlikely to elicit the same reaction as a non-social stimulus.
We make 2 proposals to address the methodological and theoretical problems described above. First, we should consider the stimulus to which the infant is attending. It would be logical to divide sustained attention into interactive and non-interactive settings, but a classification system may need to be more complex still. If the infant is attending to a laptop or tablet screen, for example, the stimuli presented often have interactive qualities; mobile media devices may even present images that invite a response via a particular tap or swipe. We need to take several possible influences into account rather than simply dividing stimuli into social and inanimate.
Second, we suggest that sustained attention needs to be defined according to the number of “stimuli,” or attentional items, that occupy the infant at a time. During habituation studies the infant is focused on a single object, usually displayed on an otherwise blank screen, whereas during triadic interactions there is more than 1 focus of the infant’s attention (the other person, the possible items of shared attention and any actions the other person performs on them). This may account for differences in how sustained attention is expressed and in what each type predicts. Within social interaction, we may need to factor out the influence of the other interactant in order to identify the infant’s unscaffolded sustained attention. To do this effectively, studies would probably require sequential data analysis to identify the influence of the other interactant’s social gestures, and so derive a purer measure of this type of sustained attention. It may even be that the internal processes involved in a joint interaction and in looking at a screen are so different that any attempt to operationalise them as a single construct would be subjective and uninterpretable.
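A minimal Python sketch of one step such an analysis might involve is shown below: infant looks are split according to whether they begin shortly after a caregiver bid (for example, a point or a verbal prompt), so that looks occurring without recent scaffolding can be examined separately. This is a crude windowing rule, not a full lag-sequential analysis, and the `Look` type, the function name, the 2-s window and the event times are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Look:
    onset: float     # seconds from the start of the session
    duration: float  # seconds


def split_by_scaffolding(looks, bid_times, window=2.0):
    """Split looks into (scaffolded, unscaffolded).

    A look is treated as scaffolded if it begins within `window` seconds
    after any caregiver bid; the window length is an arbitrary assumption.
    """
    scaffolded, unscaffolded = [], []
    for look in looks:
        prompted = any(0.0 <= look.onset - t <= window for t in bid_times)
        (scaffolded if prompted else unscaffolded).append(look)
    return scaffolded, unscaffolded


# Invented session data for illustration only.
looks = [Look(1.0, 4.2), Look(10.5, 3.1), Look(20.0, 6.0)]
caregiver_bids = [0.2, 19.1]

scaffolded, unscaffolded = split_by_scaffolding(looks, caregiver_bids)
print("scaffolded onsets:", [look.onset for look in scaffolded])
print("unscaffolded onsets:", [look.onset for look in unscaffolded])
```

A measure of unscaffolded sustained attention could then be computed from the second list alone, for instance with a look-duration threshold, although choosing that threshold raises the age-related issues discussed earlier.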
The contrast between recent studies and research from 30 years ago alerts us to a problem with the idea that sustained attention during triadic (caregiver-child-object) interaction tells us more about early development than the interaction itself. First, it suggests that sustained attention must be divided into more fine-grained categories. These categories need to be robust, mutually exclusive and well established so that they can be used reliably within their respective contexts, given that infants engage their attention differently depending on the stimuli and experimental set-up. Second, more work is needed to establish how the measures of sustained attention (e.g., looking time and heart rate) relate to one another, and these measures should be coordinated within the same study, using the same infants in both triadic interaction and habituation. As has recently been suggested, joint interaction can guide and aid an infant’s sustained attention (Wass et al., 2018). Within an interaction, periods of sustained attention may provide markers of successful joint attention, thus predicting later abilities (Yu et al., 2019). Alternatively, joint attention may serve as a proxy for sustained attention, acting as a facilitator of later developmental abilities rather than as the key mechanism that predicts them. Finally, we need further clarification of why the correlation between time spent in sustained attention and later cognitive ability differs, and even reverses in direction, between habituation and joint interaction. Only by dedicating more research to the nature of sustained attention in each setting will we be able to clarify what the concept means, let alone identify its role (or roles) in early development.
Statement of Ethics
No ethical approval was required or obtained for the preparation of this (conceptual) paper.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
K.M.H. is currently receiving a Leverhulme Trust linked PhD Studentship from the Lancaster University Department of Psychology.
Author Contributions
Author contribution was 60:40 between K.M.H. and C.L., respectively.