Abstract
Background: Body composition is increasingly recognized as an important prognostic factor for health outcomes in patients with cancer, liver cirrhosis, and critical illness. Computed tomography (CT) scans, when acquired as part of routine care, provide an excellent opportunity to precisely measure the quantity and quality of skeletal muscle and adipose tissue. However, manual analysis of CT scans is costly and time-intensive, limiting the widespread adoption of CT-based measurements of body composition. Summary: Advances in deep learning have shown considerable success in biomedical image analysis. Several recent publications have reported excellent accuracy, relative to human raters, for the measurement of skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue at the level of the lumbar vertebrae, indicating that analysis of body composition may be successfully automated using deep neural networks. Key Messages: The high accuracy and drastically improved speed of CT body composition analysis (<1 s/scan for neural networks vs. 15 min/scan for human analysis) suggest that neural networks may aid researchers and clinicians in better understanding the role of body composition in clinical populations by enabling cost-effective, large-scale research studies. As the role of body composition in clinical settings and the field of automated analysis advance, it will be critical to examine how clinicians interact with these systems and to evaluate whether these technologies improve treatment and health outcomes for patients.
Body Composition Assessment in Clinical Populations
Body composition, defined as the proportions of skeletal muscle and adipose tissue, is increasingly recognized by researchers and clinicians as a critical component in disease prognosis and treatment planning [1]. Several clinical populations, including patients with liver cirrhosis, various cancers, and critical illness, display worse clinical outcomes in the presence of abnormal body composition features (e.g., low skeletal muscle mass) [2, 3]. Identifying patients with these abnormal features can be a challenge in clinical settings because of limited access to precise body composition modalities (e.g., dual-energy X-ray absorptiometry) or technical expertise, so assessments are typically limited to body mass index. However, body mass index is a crude measure of body size that cannot distinguish the contributions of skeletal muscle and adipose tissue to body weight. In contrast, computed tomography (CT) scans, which are often acquired for clinical purposes (e.g., monitoring disease progression), can be retrospectively analyzed to precisely segment skeletal muscle, intermuscular adipose tissue, visceral adipose tissue, and subcutaneous adipose tissue (Fig. 1) [4]. Typically, a single two-dimensional CT slice at a lumbar vertebra (e.g., the third lumbar vertebra) is used, as this level provides a consistent bony landmark and the skeletal muscles and adipose tissue in this region display strong associations with whole-body metrics of body composition [5]. While retrospective analysis of these scans has been instrumental in our understanding of the intricate relationships between body composition and clinical outcomes, it requires laborious manual segmentation of specific tissues using specialized software. Segmentation of a single scan may require 15–20 min of analysis time, which can rapidly become time- and cost-prohibitive in large-scale investigations (e.g., 4,000 CT scans require approximately 1,000–1,300 h of analysis time).
Fig. 1. Computed tomography body composition analysis at the third lumbar vertebra.
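To illustrate what a finished segmentation yields, the sketch below shows how a labeled tissue mask can be converted into a cross-sectional area using the scan's in-plane pixel spacing. This is a minimal illustration with invented values (mask size, pixel spacing, patient height), not the workflow of any specific software; the height-adjusted skeletal muscle index at the end is included only as one commonly reported derived metric.

```python
# Minimal sketch: turn a binary tissue mask from one axial CT slice into a
# cross-sectional area (cm^2). Mask, spacing, and height are illustrative only.
import numpy as np

def cross_sectional_area_cm2(mask, pixel_spacing_mm):
    """Area of the labeled tissue (mask == 1) for a single axial slice."""
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    return mask.astype(bool).sum() * pixel_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2

# Example: a toy muscle mask with 20,000 labeled pixels at 0.78 x 0.78 mm spacing
muscle_mask = np.zeros((512, 512), dtype=np.uint8)
muscle_mask[:40, :500] = 1  # 20,000 labeled pixels
area = cross_sectional_area_cm2(muscle_mask, (0.78, 0.78))
height_m = 1.70
smi = area / height_m**2  # skeletal muscle index (cm^2/m^2), height-adjusted
print(f"area = {area:.1f} cm^2, SMI = {smi:.1f} cm^2/m^2")
```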
Overview of Automated Frameworks for Body Composition Analysis of CT Scans
Development of a framework for the automated analysis of CT scans for body composition would enable clinicians and researchers to better leverage the immense repository of available scans (approximately 5.6 million CT scans performed per year in Canada [6]) for investigating the role of body composition in clinical populations. While a few approaches have proposed sophisticated, hand-crafted features for automatically analyzing CT scans [7], recent advances in deep learning have demonstrated immense success in biomedical image segmentation. Deep learning is a subfield of machine learning that utilizes deep neural networks to make predictions (e.g., segmentation maps) based on input data (e.g., CT scans). A basic neural network is a series of stacked layers containing multiple neurons, which process information (using simple mathematical operations) and then transmit it to subsequent connected layers [8]. A deep neural network is composed of an input layer, multiple hidden layers, and an output layer [9]. The hidden layers are fully connected, meaning that the outputs of these neurons are shared with all neurons of the subsequent layer (i.e., the neuron outputs of one layer become the inputs to the neurons in the next layer), allowing important features of the input data to be combined and amplified across layers. The final output layer assigns a probability to each candidate label for classification.
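As a concrete illustration of this layered structure, the sketch below builds a small fully connected network in PyTorch. The layer sizes, the two-class output, and the use of PyTorch are arbitrary choices for illustration and are not drawn from the cited publications.

```python
# A minimal fully connected ("dense") network: input -> hidden layers -> output.
import torch
import torch.nn as nn

class SimpleDenseNet(nn.Module):
    def __init__(self, n_inputs=512, n_classes=2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 128),  # hidden layer 1: every input feeds every neuron
            nn.ReLU(),
            nn.Linear(128, 64),        # hidden layer 2
            nn.ReLU(),
            nn.Linear(64, n_classes),  # output layer: one score per candidate label
        )

    def forward(self, x):
        logits = self.layers(x)
        # softmax converts the output scores into a probability for each label
        return torch.softmax(logits, dim=-1)

# Example: a batch of 4 flattened inputs with 512 features each
probs = SimpleDenseNet()(torch.randn(4, 512))
print(probs.shape)  # torch.Size([4, 2]); each row sums to 1
```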
Advances in learning algorithms and network architectures have led to variants of basic neural networks that are optimized for specific tasks. Convolutional neural networks (CNNs) are a class of deep neural networks most commonly applied to computer vision problems (e.g., image classification, image segmentation). CNNs implement convolutional layers in place of fully connected hidden layers, which assemble simple image features into increasingly complex patterns in deeper layers [9]. LeCun et al. [10] pioneered a seven-layer CNN in 1998 to classify handwritten digits, demonstrating the potential of deep learning for image classification tasks. In 2015, Long et al. [11] and Ronneberger et al. [12] introduced the fully convolutional network (FCN) and UNet architectures, variants of CNNs designed for image segmentation (assigning a class to every pixel) rather than whole-image classification.
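The sketch below gives a minimal flavor of such a segmentation network: a small convolutional encoder-decoder in the spirit of FCN/UNet (though without UNet's skip connections) that maps a single-channel CT slice to a per-pixel class map. The four-class output, layer sizes, and 512 × 512 input are assumptions for illustration and do not correspond to the architectures in the cited papers.

```python
# Toy encoder-decoder segmentation network: convolutions extract features,
# pooling downsamples, and a transposed convolution upsamples back to the
# input resolution so that every pixel receives a class score.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, n_classes=4):  # e.g., background, muscle, VAT, SAT
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.Conv2d(16, n_classes, kernel_size=1),  # per-pixel class scores
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Example: one 512 x 512 CT slice -> a 4-class segmentation map of the same size
scores = TinySegNet()(torch.randn(1, 1, 512, 512))
pred_map = scores.argmax(dim=1)  # predicted tissue label for every pixel
print(pred_map.shape)  # torch.Size([1, 512, 512])
```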
Several recent publications have applied FCN and UNet architectures to automate segmentation of skeletal muscle and adipose tissue from CT scans (Table 1) [13-18]. These deep learning approaches leverage large datasets of CT scans that have been previously analyzed by human raters and train neural networks to learn image features that best segment muscle and adipose tissue. The trained networks can then be used to predict segmentation maps for skeletal muscle and adipose tissue on new CT scans. To assess the performance of these automated approaches, the agreement between human-analyzed and predicted segmentations is often quantified using the Dice similarity coefficient, a statistic that measures the degree of overlap between two segmentation maps (0 indicating no overlap, 1 indicating perfect overlap). Network performance should be assessed on data that were not used during training, as neural networks tend to overfit the training data. This is typically achieved using several different approaches, including cross-validation, splitting the original data into training and test sets, and testing accuracy on an independent dataset.
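For reference, the Dice similarity coefficient for two binary masks can be computed as in the short sketch below; the toy 3 × 3 masks are invented purely to show the calculation.

```python
# Dice similarity coefficient between a predicted and a reference binary mask:
# Dice = 2|A ∩ B| / (|A| + |B|); 0 = no overlap, 1 = perfect overlap.
import numpy as np

def dice_coefficient(pred, reference):
    pred = pred.astype(bool)
    reference = reference.astype(bool)
    intersection = np.logical_and(pred, reference).sum()
    total = pred.sum() + reference.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Example with two toy 3 x 3 masks (1 = muscle, 0 = background)
a = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0]])
print(round(dice_coefficient(a, b), 2))  # 0.67
```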
Deep Learning and CT-Based Body Composition Analysis
Lee et al. [13] developed an FCN to segment skeletal muscle using 250 CT scans from lung cancer patients. This network achieved high test accuracy (Dice: 0.93; n = 150 CT scans); however, the authors observed slightly decreased performance in patients with obesity (Dice: 0.92), likely because subcutaneous soft tissue edema was segmented as muscle [13]. Recently, Dabiri et al. [17] developed a combined FCN and UNet architecture for segmentation of skeletal muscle using three datasets (totaling 6,221 CT scans) spanning multiple cancer types. This network demonstrated high accuracy (Dice: 0.98) on a large test set of 2,958 CT scans. Similar approaches have also been applied to segmenting adipose tissue [17]. Wang et al. [14] created a segmentation network using data from 40 ovarian cancer patients (20 training and 20 test) that demonstrated accurate segmentation of visceral (Dice: 0.92) and subcutaneous (Dice: 0.98) adipose tissue in the test patients. Several recent publications have demonstrated that skeletal muscle, visceral adipose tissue, and subcutaneous adipose tissue can be accurately segmented within the same neural network [15, 16], which may further improve the computational speed of body composition analysis. Overall, these networks provide excellent segmentation accuracy (Table 1) while drastically improving the speed of analysis (e.g., 4,000 CT scans require approximately 15 min of analysis time), demonstrating that the laborious task of manually segmenting CT scans for body composition analysis can be automated with high success.
While several publications have demonstrated good generalizability of their neural networks on test datasets for body composition analysis of CT scans (some using large datasets), neither the networks nor the training data have been made publicly available, limiting the implementation and reproducibility of these approaches. Release of an anonymized dataset of segmented CT scans would enable direct comparisons between approaches and confirmation of their reproducibility.
Moving Forward: What Is the Future of Deep Learning and Body Composition in Clinical Settings?
These advances in the automated segmentation of CT scans may aid clinicians and researchers in performing large-scale research studies investigating the influence of body composition in health and disease. While the current role of body composition in clinical settings is typically limited to retrospective research, if future studies are able to leverage these automated approaches for prospective research, body composition may be further validated as a clinically important metric to consider in the decision-making process (e.g., dosing chemotherapy based on skeletal muscle mass to minimize the risk of chemotherapy toxicity). However, given the “black box” nature of neural networks, it will become critical to examine how health care providers interact with these systems. Visual confirmation of the segmentation results by clinicians during routine CT interpretation may be useful both for ensuring accuracy and for instilling confidence in the results for treatment planning and clinical decisions. Ultimately, if body composition features emerge as an important metric in the prognosis or treatment of clinical populations, it will be necessary to evaluate whether these automated analysis systems are beneficial to clinicians and patients in improving outcomes.
Disclosure Statement
The authors have no conflicts of interest to declare.
Funding Sources
The authors received no specific funding for this work.