In the article by Navarrete et al., entitled “Primate Brain Anatomy: New Volumetric MRI Measurements for Neuroanatomical Studies,” the need for corrections and clarifications became apparent after publication. The authors apologize for these changes.
In our main data compilation (Table 1), the article presented measurements of 53 brains from 39 species: 46 brains from the Netherlands Institute of Neuroscience Primate Brain Bank (PBB) and 7 MRI “scan donations.” The online supplementary material (for all online suppl. material, see www.karger.com/doi/10.1159/000488136) presented partial measurements of an additional 8 brains, 5 from the PBB and 3 “scan donations.” A miscommunication led to the 10 “scan donations” not being appropriately described or attributed. The “scan donations” are more appropriately termed “MRI scans from other sources” (henceforth “other sources”), with the postmortem great ape scans originating from the Great Ape Neuroscience Project (GANP; contributed by C.C. Sherwood and P.R. Hof) and the in vivo Cebus apella scans contributed by K.A. Phillips. One of the orangutans, the bonobo, 2 of the chimpanzees, and 1 of the gorillas from the GANP collection were wild-born individuals. They were captured as infants and then lived the rest of their lives in captivity. The new Table 1 presented here and as online supplementary Table S1 correct specimen descriptions for the “other sources” scans. Our original article does not clearly state that the MRI scans we made publicly available were the scans of the 51 brains originating from the PBB, not the “other sources” scans.
The article compares brain component measurements with previously published data, reporting several discrepancies, particularly in some brain components. Given such discrepancies, we have received a request for additional measurement details and figures illustrating regions of interest, as well as queries on how to combine our dataset with existing published data. Unfortunately, we have been unable to reestablish contact with the first author after she left St. Andrews, or to otherwise access the necessary computer files. Our scans of the PBB brains are, however, available with the article.
In a new online supplementary Table S3 we now also make available the specimen-level data for the interobserver comparisons between our data and those of de Viet and Todorov. These authors made measurements on the same scans independently of Navarrete and with a different advisory team, and their measurements thus provide useful indicators of interobserver reliability. Since mean percent errors can average out variation between specimens or observers, and certain species or specimens may be more difficult to measure, the specimen-level data in the online supplementary material provide a fuller picture of interobserver reliabilities than the means and ranges we provide in the main paper.
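The way a mean percent error can mask specimen-level disagreement can be illustrated with a short sketch. It assumes a signed percent error defined as 100 × (ours − reference) / reference, a convention chosen here for illustration that may differ from the definition used in the article. The volumes are invented and are not taken from online supplementary Table S3.

```python
from statistics import mean


def percent_error(ours, reference):
    """Signed percent error of `ours` relative to `reference` (assumed convention)."""
    return 100.0 * (ours - reference) / reference


# Hypothetical volumes (cm^3) for the same four scans measured by two observers.
ours = [110.0, 90.0, 105.0, 95.0]
reference = [100.0, 100.0, 100.0, 100.0]

errors = [percent_error(o, r) for o, r in zip(ours, reference)]

signed_mean = mean(errors)                    # errors of opposite sign cancel
absolute_mean = mean(abs(e) for e in errors)  # typical size of the disagreement
error_range = (min(errors), max(errors))      # spread across specimens

print(f"signed mean   = {signed_mean:.1f}%")
print(f"mean absolute = {absolute_mean:.1f}%")
print(f"range         = {error_range[0]:.1f} to {error_range[1]:.1f}%")
```

In this invented case the signed mean is zero even though individual specimens differ by up to 10%, which is why specimen-level values and ranges are informative alongside the mean.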
To further address data reliability, in online supplementary Table S3 we present new comparisons of our measurements with measurements of the total brain, neocortex, striatum, cerebellum, hippocampus, and thalamus for 6 GANP brains studied in previous publications [Sherwood et al., 2004; Barks et al., 2015]. These authors, like us, measured neocortical grey matter rather than grey plus white matter combined [Sherwood pers. commun.]. C.C. Sherwood provided the unpublished volumes for individual brains from Barks et al. [2015]. These volumes are not corrected for shrinkage, but shrinkage is expected to have been low [Sherwood pers. commun.]. We compared our cerebellum volume to the sum of the two hemispheres plus vermis in Barks et al. [2015]. We found good correspondence between these data for the total brain volumes (mean percent error = 3.4%, n = 6, range = 1.2 to 5.8%), neocortex volumes (mean percent error = –4.3%, n = 3, range = –1.7 to –6.4%), and cerebellum volumes (mean percent error = 2.3%, n = 6, range = –4.7 to 8.3%). These percent errors were smaller than those comparing our measurements to the histological measurements of other individuals of the same species (online suppl. Table S2). However, hippocampus volume measurements were consistently lower in our dataset (mean percent error = 35.8%, n = 5, range = 20.0 to 46.9%) than in the data of Sherwood et al. [2004], similar to the pattern reported in online supplementary Table S2. Striatum measurements corresponded well for Gorilla but poorly for Pan and Pongo (mean percent error = –13.3%, n = 5, range = –24.7 to 2.9%), suggesting difficulties in applying measurement criteria across species. Thalamus measurements corresponded poorly and not in a consistent direction (mean percent error = 2.6%, n = 5, range = –29.1 to 32.3%).
The original online supplementary Table S2 used total neocortex volumes from Stephan et al. [1981] and Zilles and Rehkämper [1988], rather than the appropriate neocortical grey volumes. This error is corrected in the revised online supplementary Table S2, taking neocortical grey measurements from Frahm et al. [1982] and Zilles and Rehkämper [1988]. As expected, this improves the correspondence between our data and the histological measurements (mean percent error = –11.9%, n = 8, range = –1.4 to –21.4%).
We thus provide five indicators of data reliability. Four are comparisons of our measurements with measurements of the same scans: (1) de Viet, (2) Todorov, (3) Pagnotta’s independent measurements for our article, and (4) the published work of Sherwood et al. [2004] and Barks et al. [2015]. The fifth is a comparison of our measurements to histological measurements of different individuals of the same species [Stephan et al., 1981; Frahm et al., 1982; Zilles and Rehkämper, 1988].
A major message of our article is the need to examine the reliability of brain component measures. Much work on primate brain evolution has been based on relatively small samples of species and individuals, with the possibility that the data source impacts findings [Powell et al., 2017]. Variation between data sources can be considerable (for example, see the compilation in Reader and MacDonald). Work is needed to examine within-species variation in brain component volumes and to compare measurement techniques on the same brains. Presently, it is not possible to fully determine the extent to which differences between studies are due to measurement differences, specimen treatment, or within-species variation. The discrepancies between datasets we describe in our article reinforce the point that caution is needed when combining datasets from different sources.
In conclusion, we note the following:
We strongly encourage independent replication of our measurements. Our scans of PBB specimens are available as online supplementary material with our article and via a repository specific to primate MRI data, the PRIMatE Data Exchange [Milham et al., 2018].
Data reliability, measurement technique, and data source should all be considered when using these data in comparative analyses. Of the Table 1 measures, we would be comfortable in using the total brain, telencephalon, neocortex, and cerebellum measures together with other comparative datasets, with appropriate statistical treatment, due to the good correspondence between our measurements and multiple independent sets of measurements of the same specimens. Without further work, we are less comfortable recommending that the other measurements detailed in online supplementary Table S1 be combined with existing datasets due to the fact that we have only one set of interobserver reliabilities for these measures.
For clarity in presentation of the data we consider most reliable, we have removed from Table 1 and online supplementary Table S1 the brain components that we had noted were frequently damaged or where boundary discriminations were difficult (medulla, pons, mesencephalon, diencephalon), where the mean percent error between our measurements and any of the four independent sets of measurements was over 10% (striatum, hippocampus, claustrum), or where percent errors had a substantial range (thalamus). The interobserver reliability data remain in order to make explicit the issue of measurement variation.
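The quantitative part of these exclusion criteria can be sketched as a simple filter. The sketch below applies it only to the GANP comparison values quoted earlier in this erratum, whereas the criteria as stated refer to any of the four independent sets of measurements. The 10% limit on the mean comes from the text; the cutoff for a "substantial" range is not specified in the article, so `SPAN_LIMIT` is a hypothetical value chosen for illustration. The qualitative criteria (frequent damage, difficult boundaries) are not modelled.

```python
# (mean percent error, lowest, highest) from the GANP comparison quoted in this erratum
ganp = {
    "total brain": (3.4, 1.2, 5.8),
    "neocortex": (-4.3, -6.4, -1.7),
    "cerebellum": (2.3, -4.7, 8.3),
    "hippocampus": (35.8, 20.0, 46.9),
    "striatum": (-13.3, -24.7, 2.9),
    "thalamus": (2.6, -29.1, 32.3),
}

MEAN_LIMIT = 10.0  # from the text: mean percent error over 10%
SPAN_LIMIT = 40.0  # hypothetical cutoff for a "substantial range"; not from the article


def exclusion_reason(mean_err, low, high):
    """Return why a component would be excluded, or None if it would be retained."""
    if abs(mean_err) > MEAN_LIMIT:
        return "mean"
    if high - low > SPAN_LIMIT:
        return "range"
    return None


reasons = {name: exclusion_reason(*stats) for name, stats in ganp.items()}
retained = [name for name, reason in reasons.items() if reason is None]

for name, reason in reasons.items():
    print(f"{name}: {'retained' if reason is None else 'excluded (' + reason + ')'}")
```

Under these assumptions the filter retains total brain, neocortex, and cerebellum, and flags hippocampus and striatum on the mean and thalamus on the range, in line with the components named above.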
While additional methodological details would clearly strengthen the value of our measurements for comparative work, the scans, measurements, and analyses of variation between data sources we present should nonetheless provide a useful resource for future work.