Background: While skin cancers are less prevalent in people with skin of color, they are more often diagnosed at later stages and have a poorer prognosis. The use of artificial intelligence (AI) models can potentially improve early detection of skin cancers; however, the lack of skin color diversity in training datasets may only widen the pre-existing racial discrepancies in dermatology.

Objective: The aim of this study was to systematically review the technique, quality, accuracy, and implications of studies using AI models trained or tested in populations with skin of color for classification of pigmented skin lesions.

Methods: PubMed was used to identify any studies describing AI models for classification of pigmented skin lesions. Only studies that used training datasets with at least 10% of images from people with skin of color were eligible. Outcomes on study population, design of AI model, accuracy, and quality of the studies were reviewed.

Results: Twenty-two eligible articles were identified. The majority of studies were trained on datasets obtained from Chinese (7/22), Korean (5/22), and Japanese populations (3/22). Seven studies used diverse datasets containing Fitzpatrick skin type I–III in combination with at least 10% from black Americans, Native Americans, Pacific Islanders, or Fitzpatrick IV–VI. AI models producing binary outcomes (e.g., benign vs. malignant) reported an accuracy ranging from 70% to 99.7%. Accuracy of AI models reporting multiclass outcomes (e.g., specific lesion diagnosis) was lower, ranging from 43% to 93%. Reader studies, where dermatologists' classification is compared with AI model outcomes, reported similar accuracy in one study, higher AI accuracy in three studies, and higher clinician accuracy in two studies. A quality review revealed that dataset description and variety, benchmarking, public evaluation, and healthcare application were frequently not addressed.

Conclusions: While this review provides promising evidence of accurate AI models in populations with skin of color, the majority of the studies reviewed were obtained from East Asian populations and therefore provide insufficient evidence to comment on the overall accuracy of AI models for darker skin types. Large discrepancies remain in the number of AI models developed in populations with skin of color (particularly Fitzpatrick type IV–VI) compared with those of largely European ancestry. A lack of publicly available datasets from diverse populations is likely a contributing factor, as is the inadequate reporting of patient-level metadata relating to skin color in training datasets.

Skin cancer is the most common malignancy worldwide, with melanoma representing the deadliest form. While skin cancers are less prevalent in people with skin of color, they are more often diagnosed at a later stage and have a poorer prognosis when compared to Caucasian populations [1‒3]. Even when diagnosed at the same stage, Hispanic, Native, Asian, and African Americans have significantly shorter survival times than Caucasian Americans (p < 0.05) [4]. Skin cancers in people with skin of color often present differently from those in Caucasian skin and are underrepresented in dermatology training [5, 6].

The use of artificial intelligence (AI) algorithms for image analysis and detection of skin cancer has the potential to decrease healthcare disparities by removing unintended clinician bias and improving accessibility and affordability [7]. To date, skin lesion classification by AI algorithms has performed on par with [8], and in some cases better than, dermatologists [9]. Human-computer collaboration can increase diagnostic accuracy further [10]. However, most AI advances have used homogeneous datasets [11‒15] collected from countries with predominantly European ancestry [16]. Excluding skin of color from training datasets poses the risk of incorrect diagnoses or of missing skin cancers entirely [8] and risks widening the racial disparities that already exist in dermatology [8, 17].

While multiple reviews have compared AI-based model performances for skin cancer detection [18‒20], the use of AI in populations with skin of color has not been evaluated. The objective of this study was to systematically review the current literature on AI models for the classification of pigmented skin lesion images in populations with skin of color.

Literature Search

The systematic review follows the PRISMA guidelines [21]. A protocol was registered with PROSPERO (International Prospective Register of Systematic Reviews) and can be accessed at https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021281347.

A PubMed search in March 2021 used search terms relating to artificial intelligence, skin cancer, and skin lesions (search strings in online suppl. eTable; for all online suppl. material, see www.karger.com/doi/10.1159/000530225). No date range was applied, language was restricted to English, and only original research was included. Covidence software was used for screening administration. Titles and abstracts of the search results were screened by two independent reviewers (Y.L.: 100%; B.B.S.: 20%) using the eligibility criteria described in Table 1. The remaining articles were assessed for eligibility by reviewing their methods or full text. Disagreements were resolved following discussions with a third independent reviewer (C.P.).

Table 1.

Inclusion and exclusion criteria used for screening and assessing eligibility of articles

Inclusion criteria:
1. Any computer modeling or use of AI for the diagnosis of skin conditions.
2. Datasets provide information on the population (racial or Fitzpatrick skin type breakdown), or datasets were obtained from countries with a predominantly skin of color population.
3. Uses dermoscopic, clinical, 3D, or other photographic images of the skin surface.
4. Includes the assessment of malignant and/or non-malignant pigmented skin lesions.

Exclusion criteria:
1. No population description of the training datasets (demographic, racial, or ethnicity breakdown), or datasets from a country with a predominantly Caucasian population of European ancestry.
2. Dataset description with >90% Caucasian population, fair skin type, or Fitzpatrick skin type I–III.
3. Solely used images from ISIC [56], PH2 [13], IAD [57], ISBI [58], HAM10000 [12], MED-NODE [14], ILSVRC [59], DermaQuest [60], DERMIS [61], DERM IQA [62], DermNet NZ [63], or other datasets known to be of predominantly European ancestry.

DERM IQA, Dermoscopic Image Quality Assessment [62]; DERMIS, dermatology information system; ISIC, International Skin Imaging Collaboration [56]; DermNet NZ, DermNet New Zealand [63]; IAD, Interactive Atlas of Dermoscopy [57]; ISBI, International Symposium on Biomedical Imaging [58]; ILSVRC, ImageNet Large Scale Visual Recognition Challenge [59]; HAM10000, Human Against Machine with 10,000 training images [12]; MED-NODE, computer-assisted MElanoma Diagnosis from NOn-DErmoscopic images [14].

Data Extraction and Synthesis

Data extraction was performed by author Y.L. using a standardized form and confirmed by V.K. The following parameters were recorded: reference, ethnicity/ancestry/race, lesion number, sex, age, location, skin condition, public availability of dataset, number of images, type of images, methods of confirmation, deep learning system, model output, comparison with human input, and any missing data reported. Algorithm performance was recorded as accuracy, sensitivity, specificity, and/or area under the receiver operating characteristic curve (AUC). A narrative synthesis of the extracted data was used to present findings, as a meta-analysis was not feasible due to heterogeneity of study designs, AI systems, skin lesions, and outcomes.
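To illustrate how these performance measures relate to one another, the minimal sketch below (assuming scikit-learn and an entirely hypothetical set of binary model outputs, not data from any reviewed study) computes accuracy, sensitivity, specificity, and AUC for a benign/malignant classifier.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical ground truth (1 = malignant, 0 = benign) and model-predicted
# probabilities of malignancy for ten lesions.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_prob = np.array([0.91, 0.12, 0.78, 0.45, 0.30, 0.08, 0.66, 0.52, 0.19, 0.84])
y_pred = (y_prob >= 0.5).astype(int)  # binary call at a 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)  # proportion of correct calls
sensitivity = tp / (tp + fn)                # malignant lesions correctly flagged
specificity = tn / (tn + fp)                # benign lesions correctly cleared
auc = roc_auc_score(y_true, y_prob)         # threshold-independent discrimination

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, AUC={auc:.2f}")
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on a particular decision threshold, which is one reason several reviewed studies report it alongside or instead of accuracy.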

Quality Assessment

Quality was assessed using the Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology (CLEAR Derm) Consensus Guidelines [22]. This 25-point checklist offers comprehensive recommendations on factors critical to the development, performance, and application of image-based AI algorithms in dermatology [22].

Author Y.L. performed the quality assessment of all included studies, and author B.B.S. assessed 20%. The inter-rater agreement rate was 87%, with disagreements resolved via a third independent reviewer (V.K.). Each criterion was evaluated as fully, partially, or not addressed and scored 1, 0.5, or 0, respectively, using the scoring rubric in online supplementary eTable 2.
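As a concrete illustration of this rubric, the short sketch below (with hypothetical ratings, not the actual per-study assessments) shows how the fully/partially/not-addressed ratings map to a score out of 25.

```python
# CLEAR Derm rubric: each of the 25 checklist items contributes 1, 0.5, or 0 points.
SCORES = {"fully": 1.0, "partially": 0.5, "not": 0.0}

def quality_score(ratings):
    """Sum the rubric scores over the 25 checklist items."""
    assert len(ratings) == 25, "CLEAR Derm has 25 checklist items"
    return sum(SCORES[r] for r in ratings)

# Hypothetical study: 14 items fully, 7 partially, and 4 not addressed.
ratings = ["fully"] * 14 + ["partially"] * 7 + ["not"] * 4
print(quality_score(ratings))  # 17.5 out of a possible 25
```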

The database search identified 993 articles, including 13 duplicates. After screening titles/abstracts, 535 records were excluded, and the remaining 445 records were screened by methods, with 63 articles reviewed by full text. Forward and backward citations search revealed no additional articles. A total of 22 studies were included in the final review (PRISMA flow diagram in online supplementary eFig. 1).

Study Design

All 22 studies were performed between 2002 and 2021 [23‒32], with 11 (50%) studies published between 2020 and 2021 [33‒44]. An overview of study characteristics is displayed in Table 2. The median number of total images used in each study for all datasets combined was 5,846 (range: 212–185,192). The median dataset sizes for training, testing, and validation were 4,732 images (range: 247–22,608), 362 (range: 100–40,331), and 1,258 (range: 109–14,883), respectively.

Table 2.

Overview of study characteristics

First author, year | Patient population: ethnicity/ancestry/race/location | Patient population: dataset information | Public availability of dataset | Image type / image numbers | Validation (H = histology, C = clinical diagnosis) | Deep learning system | Model output
Piccolo et al. (2002) [23] | Fitzpatrick I–V | Lesion n = 341; patient n = 289; F 65% (n = 188); average age 33.6 | No | Dermoscopy; total 341, training 0, testing 341, validation 0 | All: H | Network architecture: DEM-MIPS (artificial neural network designed to evaluate different colorimetric and geometric parameters) | Binary (melanoma, non-melanoma)
Iyatomi et al. (2008) [24] | Italian, Austrian, Japanese | n/a | No | Dermoscopy; total 1,258, training 247, testing NA, validation 1,258 | Datasets A and B: H; dataset C: H + C | Network architecture: ANN (back-propagation artificial neural networks) | Binary (malignant or benign); malignancy (risk) score (0–100)
Chang et al. (2013) [25] | Taiwanese | Lesion n = 769; patient n = 676; F 56% (n = 380); average age 47.6 | No | Clinical; total 1,899, training NA, testing NA, validation NA | Benign: C; malignant: H | Network architecture: computer-aided diagnosis (CADx) system built on 91 conventional features of shape, texture, and colors (developing software: MATLAB) | Binary (benign or malignant)
Chen et al. (2016) [26] | American Indian, Alaska Native, Asian, Pacific Islander, black or African American, Caucasian | Community dataset: patient n = 1,900; F 52.3% (n = 993); age >50% under 35 | Partially (DermNet NZ: yes; community: no) | Clinical; total 12,000, training 11,780, testing 337, validation NA | Community dataset (benign and malignant): C; DermNet: H | Network architecture: patented image-search algorithm that builds on proven computer vision methods from the field of CBIR | Binary (melanoma and non-melanoma)
Yang et al. (2017) [27] | Korean | Patient n = 110; F 50% (n = 55) | No | Dermoscopy; total 297, training 0, testing 297, validation 0 | All: H | Network architecture: 3-stage algorithm of pre-processing, stripe pattern detection, and automatic discrimination (MATLAB) | Binary (LM, nevus)
Han et al. (2018) [28] | Korean, Caucasian | n/a | Partially (MED-NODE, Atlas, Edinburgh, Dermofit: yes; others: no) | Clinical; total 182,044, training 178,875, testing 1,276, validation 22,728 | ASAN: C + H; multiple other datasets (5) used with unclear validation | Neural network: CNN; network architecture: Microsoft ResNet-152, Google Inception | 12-class skin tumor types
Yu et al. (2018) [29] | Korean | Lesion n = 275 | No | Dermoscopy; total 724, training 372, testing 362, validation 109 | All: H | Neural network: CNN; network architecture: modified VGG-16 | Binary (melanoma/non-melanoma)
Zhang et al. (2018) [30] | Chinese | Lesion n = 1,067 | No | Dermoscopy; total 1,067, training 4,867, testing 1,142, validation NA | Benign and malignant: C; where three dermatologists disagreed: H | Neural network: CNN; network architecture: GoogLeNet Inception v3 | Four-class classifier (BCC, SK, melanocytic nevus, and psoriasis)
Fujisawa et al. (2019) [31] | Japanese | Patients n = 2,296 | No | Clinical; total 6,009, training 4,867, testing 1,142, validation NA | Melanocytic nevus, split nevus, lentigo simplex: C; others: PE | Neural network: DCNN; network architecture: GoogLeNet DCNN model | 1. Two-class classifier (benign vs. malignant); 2. four-class classifier (malignant epithelial lesion, malignant melanocytic lesion, benign epithelial lesion, benign melanocytic lesion); 3. 14-class classification; 4. 21-class classification
Jinnai et al. (2019) [38] | Japanese | Patient n = 3,551 | No | Dermoscopy; total 5,846, training 4,732, testing 666, validation NA | Malignant: H; benign tumor: C | Neural network: FRCNN; network architecture: VGG-16 | Binary (benign/malignant); six-class classification (6 skin conditions)
Zhao et al. (2019) [32] | Chinese | n/a | No | Clinical; total 4,500, training 3,375, testing 1,125, validation NA | Benign: C; malignant: PE | Neural network: CNN; network architecture: Xception | Risk (low/high/dangerous)
Cho et al. (2020) [33] | Korean | Patient n = 404 | No | Clinical; total 2,254, training 1,629, testing 625, validation NA | Benign: C; malignant: H | Neural network: DCNN; network architecture: Inception-ResNet-V2 | Binary classification (benign or malignant)
Han et al. (2020) [36] | Korean, Caucasian | ASAN, Normal, SNU: patient n = 28,222; F 55% (n = 15,522); average age 40. MED-NODE, Web, Edinburgh: NA | Partially (Edinburgh: yes; SNU: upon request) | Clinical; total 224,181, training 220,680, testing 2,441, validation 3,501 | ASAN: C + H; Edinburgh: H; MED-NODE: H; SNU: C + H; Web: image finding | Neural network: CNN; network architecture: SENet, SE-ResNet-50, Visual Geometry Group (VGG-19) | Binary (malignant, non-malignant); binary (steroids, antibiotics, antivirals, antifungals); multiclass classification (134 skin disorders)
Han et al. (2020) [35] | Korean | Patients n = 673; lesions n = 673; F 54% (n = 363); average age 58 | No | Clinical; total 185,192, training 182,348, testing NA, validation 2,844 | All: H | Neural network: CNN; network architecture: SENet, SE-ResNeXt-50, SE-ResNet-50 | Risk output (risk of malignancy)
Han et al. (2020) [34] | Korean | Patient n = 9,556; lesion n = 10,426; F 55% (n = 5,255); average age 52 | No | Clinical; total 40,331, training 1,106,886a, testing NA, validation 40,331 | All: H | Neural network: RCNN; network architecture: SENet, SE-ResNeXt-50 | Binary (malignant, non-malignant); 32-class classification
Huang et al. (2020) [37] | Chinese | Lesion n = 1,225 | No | Clinical; total 3,299, training 2,474, testing 825, validation NA | All: PE | Neural network: CNN; network architecture: Inception V3, Inception-ResNet V2, DenseNet 121, ResNet 50 | Binary (SK/BCC)
Li (2020) [44] | Chinese | Patient n = 106 | No | Dermoscopy and clinical; total 212, training 200,000a, testing 212, validation NA | All: H | Network architecture: Youzhi AI software (Shanghai Maise Information Technology Co., Ltd., Shanghai, China) | Binary (benign or malignant); 14-class classification
Liu et al. (2020) [39] | Fitzpatrick type I–VI | Patient n = 15,640; lesion n = 20,676 | No | Clinical; total 79,720, training 64,837, testing NA, validation 14,483 | Benign: C; malignant: H | Neural network: DLS; network architecture: Inception-v4 | 26-class classification (primary output); 419-class classification (secondary output)
Wang et al. (2020) [40] | Chinese, with Fitzpatrick type IV | n/a | No | Dermoscopy; total 10,307, training 8,246, testing 1,031, validation 1,031 | BCC: C + H; others: C | Neural network: CNN; network architecture: GoogLeNet Inception v3 | Binary classification (psoriasis and others); multiclass classification
Huang et al. (2021) [37] | Taiwanese, Caucasian | KCGMH: patient n = 1,222; F 52.4% (n = 640); average age 62. HAM10000: n/a | Partially (KCGMH: no; HAM10000: yes) | Clinical; total 1,287, training 1,031, testing 128, validation 128 | All: H | Neural network: CNN; network architecture: DenseNet 121 (binary classification), EfficientNet E4 (five-class classification) | Binary (benign/malignant); 5-class classification (BCC, BK, MM, NV, SCC); 7-class classification (AK, BCC, BKL, SK, DF, MM, NV)
Minagawa et al. (2021) [42] | Caucasian, Japanese | Patient n = 50 | Partially (ISIC: yes; Shinshu: no) | Dermoscopy; total 12,948, training 12,848, testing 100, validation NA | Benign: C; malignant: H | Neural network: DNN; network architecture: Inception-ResNet-V2 | 4-class classification (MM/BCC/MN/BK)
Yang et al. (2021) [43] | Chinese | n/a | No | Clinical; total 12,816, training 10,414, testing 300, validation 2,102 | All: C | Neural network: DCNN; network architecture: DenseNet-96, ResNet-152, ResNet-99, converged network (DenseNet-ResNet fusion) | 6-class classification (nevi, melasma, cafe-au-lait, SK, and acquired nevi)

BCC, basal cell carcinoma; BK, benign keratosis; CNN, convolutional neural network; DCNN, deep convolutional neural network; DF, dermatofibroma; SCC, squamous cell carcinoma; SK, seborrheic keratosis; MM, melanoma; MN, melanocytic nevus; PE, pathological examination (insufficient information provided on whether histopathology and/or clinical evaluation was used); n/a, not available; n, number; CBIR, content-based image retrieval.

aAlgorithm previously trained on a different dataset; therefore, dataset numbers are not included.

The majority of studies (15/22, 68%) analyzed clinical images (i.e., wide-field or regional images), while seven studies analyzed dermoscopy images [23, 24, 27, 29, 30, 40, 42], and one study included both [44]. All but one study included both malignant and benign pigmented skin lesions, with one investigating only benign pigmented facial lesions [43].

Histopathology was used as the ground truth in 15 studies for all malignant lesions and partially in two studies [24, 26], while one study only used histopathology to resolve clinician disagreements [23]. Seven studies used histopathology as ground truth for benign lesions [23, 27, 29, 34, 35, 41, 44]. In nine studies, ground truth was established by consensus of experienced dermatologists [25, 30‒32, 38‒40, 42, 43]. Other studies used a mix of both [24, 26, 33, 36] or were not clearly defined [28, 37].

The number of pigmented skin lesion classifications used for AI model evaluation ranged from binary outcomes (e.g., benign vs. malignant) to classification of up to 419 skin conditions [39]. While most studies (19/22, 86%) evaluated lesions across all body sites, one study exclusively analyzed the lips/mouth [33], another assessed only facial skin lesions [43], and one study specifically addressed acral melanoma [29].

Population

Homogeneous datasets were collected from Chinese/Taiwanese (n = 8, 36%) [25, 30, 32, 37, 40, 41, 43, 44], Korean (n = 5, 23%) [27‒29, 33‒35], and Japanese (n = 3, 14%) [31, 38, 42] populations. Seven studies (32%) included populations from Caucasians/Fitzpatrick skin type I–III [23, 24, 26, 28, 36, 39, 42], with at least 10% American Indian [26], Alaska Native [26], black or African American [26], Pacific Islander [26], Native American [26], or Fitzpatrick IV–VI [23, 39] in the training and/or test set (Table 2).

The majority of studies did not specify the sex distribution (n = 13, 59%) or participant age (n = 15, 68%). Seven studies specified participant age, ranging from 18 to older than 85 years [23, 25, 26, 34‒36, 41].

Outcome and Performance

The classification algorithms produced either a diagnostic output, a categorical risk output (e.g., low, medium, or high), or a combination of both. An overview of AI model performance is provided in Table 3. The majority of studies (20/22, 91%) used a diagnostic model, either with binary classification of benign or malignant [23‒27, 29, 33, 35, 37], multiclass classification of specific lesion diagnoses [28, 30, 32, 39, 42, 43], or both [31, 34, 36, 38, 40, 41, 44]. One study used categorical risk as the outcome [32], and another reported both a diagnostic model and a categorical risk model [24].
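To make the distinction between these output types concrete, the sketch below (with a hypothetical four-class label set and hypothetical network logits, not taken from any reviewed model) shows how a single softmax output can be read as a multiclass diagnosis or collapsed into a binary benign/malignant call.

```python
import numpy as np

CLASSES = ["melanoma", "bcc", "nevus", "seborrheic_keratosis"]  # hypothetical label set
MALIGNANT = {"melanoma", "bcc"}

logits = np.array([2.1, 0.3, 1.2, -0.5])       # hypothetical network output for one lesion
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the lesion classes

# Multiclass output: the single most probable diagnosis.
multiclass_dx = CLASSES[int(np.argmax(probs))]

# Binary output: sum the probability mass of the malignant classes.
p_malignant = sum(p for c, p in zip(CLASSES, probs) if c in MALIGNANT)
binary_dx = "malignant" if p_malignant >= 0.5 else "benign"

print(multiclass_dx, binary_dx, f"p(malignant)={p_malignant:.2f}")
```

A categorical risk output can be obtained analogously by binning the malignancy probability into bands, although the reviewed risk-scoring models may differ in how they derive their categories.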

Table 3.

Measures of output and performance for AI models included in the review

Reference | Accuracy (%) | Sensitivity (%) | Specificity (%) | AUC

Binary classification models
Piccolo et al. (2002) [23] | n/a | 92 | 74 | n/a
Iyatomi et al. (2008) [24] | n/a | 86 | 86 | 0.93
Chang et al. (2013) [25] | 91 | 86 | 88 | 0.95
Chen et al. (2016) [26] | 91 | 90 | 92 | n/a
Yang et al. (2017) [27] | 99.7 | 100 | 99 | n/a
Yu et al. (2018) [29] | 82 | 93 | 72 | 0.80
Cho et al. (2020) [33] | n/a | Dataset 1: 76; dataset 2: 70 | Dataset 1: 80; dataset 2: 76 | Dataset 1: 0.83; dataset 2: 0.77
Huang et al. (2020) [37] | 86 | n/a | n/a | 0.92
Han et al. (2020) [35] | n/a | 77 | 91 | 0.91
Fujisawa et al. (2019) [31] | 93 | 96 | 90 | n/a
Jinnai et al. (2019) [38] | 92 | 83 | 95 | n/a
Han et al. (2020) [36] | n/a | n/a | n/a | Edinburgh dataset: 0.93; SNU dataset: 0.94
Han et al. (2020) [34] | n/a | Top 1: 63 | Top 1: 90 | 0.86
Li et al. (2020) [44] | 86 | 75 | 93 | n/a
Wang et al. (2020) [40] | 77 | n/a | n/a | n/a

Multiclass classification models (AUC not reported)
Han et al. (2018) [28] | n/a | ASAN dataset: 86; Edinburgh dataset: 85 | ASAN dataset: 86; Edinburgh dataset: 81
Zhang et al. (2018) [30] | Dataset A: 87; dataset B: 87 | n/a | n/a
Fujisawa et al. (2019) [31] | 77 | n/a | n/a
Jinnai et al. (2019) [38] | 87 | 86 | 87
Liu et al. (2020) (26-class model) [39] | Top 1: 71; top 3: 93 | Top 1: 58; top 3: 88 | n/a
Han et al. (2020) [36] | Top 1: Edinburgh 57, SNU 45; top 3: Edinburgh 84, SNU 69; top 5: Edinburgh 92, SNU 78 | n/a | n/a
Han et al. (2020) [34] | Top 1: 43; top 3: 62 | n/a | n/a
Li et al. (2020) [44] | 73 | n/a | n/a
Wang et al. (2020) [40] | 82 | n/a | n/a
Minagawa et al. (2021) [42] | 90 | n/a | n/a
Yang et al. (2021) [43] | Algorithm A: 88; B: 77; C: 90; D: 87 | Algorithm A: 83; B: 63; C: 81; D: 80 | Algorithm A: 98; B: 90; C: 99; D: 98
Huang et al. (2021) [37] | 5-class (KCGMH dataset): 72; 7-class (HAM10000 dataset): 86 | n/a | n/a

Risk categorical classification
Zhao et al. (2019) [32] | 83 | Benign: 93; low risk: 85; high risk: 86 | Benign: 88; low risk: 85; high risk: 91 | Benign: 0.96; low risk: 0.92; high risk: 0.95

Top-(n) accuracy indicates that the correct diagnosis is among the n highest-probability predictions output by the model. For example, top-3 accuracy means that any of the model's three highest-probability predictions matches the expected diagnosis.

AUC, area under the curve.
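To illustrate the top-n accuracy defined in the table notes above, here is a minimal sketch (with hypothetical class probabilities, not outputs of the reviewed models) computing top-1 and top-3 accuracy with NumPy.

```python
import numpy as np

def top_n_accuracy(probs, labels, n):
    """Fraction of cases whose true class is among the n highest-probability predictions."""
    top_n = np.argsort(probs, axis=1)[:, -n:]  # indices of the n largest probabilities per row
    hits = [label in row for row, label in zip(top_n, labels)]
    return float(np.mean(hits))

# Hypothetical 4-class probabilities for three lesions (each row sums to 1).
probs = np.array([[0.70, 0.10, 0.15, 0.05],
                  [0.05, 0.20, 0.40, 0.35],
                  [0.25, 0.30, 0.25, 0.20]])
labels = np.array([0, 3, 2])  # true diagnoses

print(top_n_accuracy(probs, labels, 1))  # ~0.33: only the first lesion is a top-1 hit
print(top_n_accuracy(probs, labels, 3))  # 1.0: every true class appears in the top 3
```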

The AI models using binary classification (16/22) reported an accuracy ranging from 70% to 99.7%. Of these studies, 6/16 reported ≥90% accuracy [25‒27, 31, 38, 41], three studies reported between 80 and 90% accuracy [29, 37, 44], and one study reported <80% accuracy [40]. Twelve AI models reported sensitivity and specificity as a measure of performance, which ranged from 58 to 100% and 72 to 99%, respectively. Eight studies provided an area under the curve (AUC) with 5/8 reporting values >0.9 [24, 25, 35‒37], with the remaining three models scoring between 0.77 and 0.86 [29, 33, 34].

For the 13 studies using multiclass output (i.e., >2 diagnoses), accuracy of models ranged from 43% to 93%. Six of these studies (6/13) scored <80% accuracy [31, 34, 36, 39, 41, 44], six others scored between 80 and 90% accuracy [30, 32, 38, 40, 42, 43], and one provided sensitivity and specificity of 86% and 86%, respectively, as a measure of performance [28].

Reader Studies

Reader studies, where the performance of AI models is compared with clinician classification, were performed in 14/22 studies, with results provided in Table 4 [23, 25, 29, 31‒39, 42, 44]. Six studies compared AI outcomes to classification by experts, e.g., dermatologists [25, 32, 34, 36, 42, 44]. Eight studies compared outcomes for both experts and non-experts, e.g., dermatology residents and general practitioners [23, 29, 31, 33, 35, 37‒39].

Table 4.

Reader studies between AI models and human experts (e.g., dermatologists), and non-experts (e.g., dermatology residents, GPs)

Reference | AI performance | Expert performance | Non-expert performance
Piccolo et al. (2002) [23] | Sensitivity: 92%; specificity: 74% | Sensitivity: 92%; specificity: 99% | Sensitivity: 69%; specificity: 94%
Chang et al. (2013) [25] | Accuracy: melanoma 91%, non-melanoma 83%; sensitivity: 86%; specificity: 88% | Accuracy: 81%; sensitivity: 83%; specificity: 86% | not assessed
Yu et al. (2018) [29] | Accuracy: 82%; sensitivity: 93%; specificity: 72%; AUC: 0.80 | Accuracy: 81%; sensitivity: 97%; specificity: 67%; AUC: 0.80 | Accuracy: 65%; sensitivity: 45%; specificity: 84%; AUC: 0.65
Huang et al. (2020) [37] | Sensitivity: 90%; AUC: 0.94 | Sensitivity: 85%; specificity: 90% | Sensitivity: 66%; specificity: 72%
Han et al. (2020) [35] | Sensitivity: 89%; specificity: 78%; AUC: 0.92 | Sensitivity: 95%; specificity: 72%; ROC: 0.91 | Accuracy: dermatology resident 94%, non-dermatology clinician 77%; sensitivity: dermatology resident 69%, non-dermatology clinician 65%; AUC: dermatology resident 0.88, non-dermatology clinician 0.73
Fujisawa et al. (2019) [31] | Accuracy: binary 92%, multiclass 75% | Accuracy: binary 85%, multiclass 60% | Accuracy: binary 74%, multiclass 42%
Jinnai et al. (2019) [38] | Accuracy: 92%; sensitivity: 83%; specificity: 95% | Accuracy: 87%; sensitivity: 86%; specificity: 87% | Accuracy: 85%; sensitivity: 84%; specificity: 86%
Zhao et al. (2019) [32] | Sensitivity: benign 90%, low risk 90%, high risk 75% | Sensitivity: benign 61%, low risk 50%, high risk 64% | not assessed
Cho et al. (2020) [33] | Sensitivity: dataset 1 76%, dataset 2 70%; specificity: dataset 1 80%, dataset 2 76%; AUC: dataset 1 0.83, dataset 2 0.77 | Sensitivity: 90% without algorithm, 90% with algorithm; specificity: 58% without, 61% with | Sensitivity: dermatology resident 80% without algorithm, 85% with; non-dermatology clinician 65% without, 74% with; specificity: dermatology resident 53% without, 71% with; non-dermatology clinician 46% without, 49% with; AUC: dermatology resident 0.33 without, 0.42 with; non-dermatology clinician 0.11 without, 0.23 with
Han et al. (2020) [36] | Multiclass accuracy: top 1 45%, top 3 69%, top 5 78% | Multiclass accuracy without algorithm: top 1 50%, top 3 67%; with algorithm: top 1 53%, top 3 74%; binary accuracy: 77% without algorithm, 85% with | not assessed
Han et al. (2020) [34] | Binary: sensitivity 67%, specificity 87%; multiclass accuracy: top 1 50%, top 3 70% | Binary: sensitivity 66%, specificity 67%; multiclass accuracy: top 1 38%, top 3 53% | not assessed
Li et al. (2020) [44] | Accuracy: binary 73%, multiclass 86% | Accuracy: binary 83%, multiclass 74% | not assessed
Liu et al. (2020) [39] | Accuracy: top 1 66%, top 3 90% | Accuracy: top 1 63%, top 3 75% | Accuracy: primary care physician top 1 44%, top 3 60%; nurse practitioner top 1 40%, top 3 55%
Minagawa et al. (2021) [42] | Accuracy: 71% | Accuracy: 90% | not assessed

In reader studies comparing binary classification between AI and experts (n = 11), one study reported similar diagnostic accuracy/specificity [29], three showed higher accuracy for AI models [25, 31, 38], and two reported higher accuracy for experts [42, 44]. Five studies reported specificity, sensitivity, and AUC instead of accuracy, with varying outcomes [23, 32, 33, 35, 37]. In reader studies between AI and non-experts (n = 7), AI showed higher accuracy, specificity, sensitivity, and AUC in most studies [23, 29, 31, 33, 35, 37, 38]. In reader studies that compared multiclass classification between AI and expert readers (n = 5), three studies reported higher top-1 diagnosis (i.e., the diagnosis with the highest statistical probability) accuracy for AI [31, 34, 44], in one study the readers performed better [36], and one study reported similar results for AI and readers [39]. In multiclass reader studies with non-experts (n = 2), AI reported higher accuracy in both studies [31, 39].

One study compared categorical risk classification with experts and showed AI to have higher sensitivity across all categories [32]. Two studies additionally showed a significant increase in accuracy when human experts were aided by AI outcomes [33, 36].

Quality Assessments

Studies included in the review were evaluated against the 25-point CLEAR Derm Consensus Guidelines [22], covering four domains (data, technique, technical assessment, and application). For each checklist item, each study was assessed as having fully, partially, or not addressed the criterion. A summary of results is provided in Table 5, with individual study scores available in online supplementary eTable 3. Overall, an average score of 17.4 (±2.2) out of a possible 25 points was obtained.

Table 5.

Summary of quality assessment on AI models reviewed

Domain | Item | Checklist item | Fully addressed, n (%) | Partially addressed, n (%) | Not addressed, n (%)
Data | 1 | Image types | 21 (95) | 1 (5) | 0 (0)
Data | 2 | Image artifacts | 12 (55) | 5 (23) | 5 (23)
Data | 3 | Technical acquisition details | 22 (100) | 0 (0) | 0 (0)
Data | 4 | Pre-processing procedures | 20 (91) | 0 (0) | 2 (9)
Data | 5 | Synthetic images made public if used | 22 (100)a | 0 (0) | 0 (0)
Data | 6 | Public images adequately referenced | 22 (100) | 0 (0) | 0 (0)
Data | 7 | Patient-level metadata | 5 (23) | 17 (77) | 0 (0)
Data | 8 | Skin tone information and procedure by which skin tone was assessed | 3 (14) | 16 (73) | 3 (14)
Data | 9 | Potential biases that may arise from use of patient information and metadata | 9 (41) | 7 (32) | 6 (27)
Data | 10 | Dataset partitions | 12 (55) | 9 (41) | 1 (5)
Data | 11 | Sample sizes of training, validation, and test sets | 7 (32) | 14 (64) | 1 (5)
Data | 12 | External test set | 3 (14) | 2 (9) | 17 (77)
Data | 13 | Multivendor images | 20 (91) | 2 (9) | 0 (0)
Data | 14 | Class distribution and balance | 5 (23) | 15 (68) | 2 (9)
Data | 15 | OOD images | 2 (9) | 7 (32) | 13 (59)
Technique | 16 | Labeling method (ground truth, who did it) | 15 (68) | 7 (32) | 0 (0)
Technique | 17 | References to common/accepted diagnostic labels | 22 (100) | 0 (0) | 0 (0)
Technique | 18 | Histopathologic review for malignancies | 16 (73) | 2 (9) | 4 (18)
Technique | 19 | Detailed description of algorithm development | 14 (64) | 6 (27) | 2 (9)
Technical assessment | 20 | How to publicly evaluate algorithm | 5 (23) | 0 (0) | 17 (77)
Technical assessment | 21 | Performance measures | 9 (41) | 13 (59) | 0 (0)
Technical assessment | 22 | Benchmarking, technical comparison, and novelty | 15 (68) | 0 (0) | 7 (32)
Technical assessment | 23 | Bias assessment | 10 (45) | 6 (27) | 6 (27)
Application | 24 | Use cases and target conditions (inside distribution) | 16 (73) | 6 (27) | 0 (0)
Application | 25 | Potential impacts on the healthcare team and patients | 3 (14) | 13 (59) | 6 (27)

aNo studies included synthetic images (checklist item 5), therefore marked as “fully addressed” to not negatively impact quality score.

OOD, out of distribution.

Data

The data domain consists of 15 checklist items; of these, six items were fully addressed by >90% of publications, including checklist items (1) image types, (3) technical acquisition details, (4) pre-processing procedures, and (6) public images adequately referenced. No studies included synthetic images (item 5). Checklist items (2) image artifacts, (9) potential biases, and (10) dataset partitions (e.g., lesion distribution) were fully addressed by only approximately half of the studies. About a third of studies fully addressed the sample sizes of the training, validation, and test sets (item 11) [28, 29, 34, 35, 39, 40, 43]. The most poorly addressed criteria were patient-level metadata (e.g., sex, gender, ethnicity; item 7), skin color information (item 8), use of an external test dataset (item 12), class distribution and balance of the images (item 14), and out-of-distribution images (item 15).

Patient-level metadata (item 7) was partially reported in most studies. While geographical location, hospital location, and race were adequately reported, sex and age specifications were limited, as were anatomical location, relevant past medical history, and history of the presenting illness. Skin color information was reported as Fitzpatrick skin type in only three studies [23, 39, 40].

Technique

All four checklist items in the technique domain were fully addressed by most publications. Image labeling method (item 16), e.g., information on ground truth, was fully addressed in 68% (n = 15) of publications. All studies used commonly accepted diagnostic labels (item 17). Histopathologic review (item 18) was fully addressed in 73% (n = 16) of studies. A detailed description of algorithm development (item 19) was provided in 64% (n = 14) of studies.

Technical Assessment

Of the four checklist items in this domain, the most poorly addressed was public evaluation of the model (item 20), with only five papers fully addressing this item [29, 34‒36, 39] and the majority (n = 17, 77%) not addressing it at all. Four studies reported public availability of their AI algorithm or offered a public-facing test interface for external testing [28, 34‒36]. Performance measures (item 21) were fully addressed by nine studies. Benchmarking, technical comparison, and novelty (item 22) relative to previously developed algorithms and human specialists were fully addressed by 15 studies, and technical bias assessment (item 23) was discussed by 10 studies.

Application

The application domain consisted of two checklist items. Use cases and target conditions (item 24) were fully addressed by 16 (73%) studies, while potential impacts on the healthcare team and patients (item 25) were fully addressed by only three studies [38, 39, 41].

We present, to our knowledge, the first systematic review summarizing existing AI image-based algorithms for the classification of skin lesions in people with skin of color. Our review identified 22 articles where the algorithms were primarily developed on or tested on images from people with skin of color.

Diversity and Transparency

Within the identified AI studies involving skin of color populations, there is further under-representation of darker skin types. We found that Asian populations (Chinese, Korean, and Japanese) were the major ethnic groups included, with limited datasets involving populations from the Middle East, India, Africa, the Pacific Islands, and Hispanic and Native Americans. Only three studies specifically reported skin color categories of type IV–VI and/or black or African American. Two North American studies reported only 4.3% [26] and 6.8% [39] black or African American images, and one Italian-led study [23] reported 26.5% of images with Fitzpatrick skin type IV/V.

Adding to the issue is a lack of transparency in reporting skin color in image-based AI studies. A recently published scoping review [17] of 70 studies involving image-based AI algorithms for classification of skin conditions found that only 14 (20%) provided details about race and seven (10%) provided details about skin color. Furthermore, only three studies partially included Fitzpatrick skin type IV–VI populations. The lack of diversity in studies is likely fueled by the shortage of diverse publicly available datasets. A recent systematic review of 39 open access datasets and skin atlases reported that 12 of the 14 datasets that reported country of origin were from countries of predominantly European ancestry [16]. Only one dataset, from Brazil, reported that Fitzpatrick skin type IV–VI images comprised 5% of its population [45], and another dataset comprised images from a South Korean population [36].

Model Performance

AI models in this review showed reasonably high classification performance using both dermoscopic and clinical images in populations with skin of color. Accuracy was greater than 70% for all binary models reviewed, which is similar to models developed using Caucasian datasets [46]. It has previously been suggested that instead of training algorithms on more diverse datasets, it may be beneficial to develop separate algorithms for different populations [7]. Han et al. [28] demonstrated a significant difference in the performance of an AI model trained on skin images from exclusively Asian populations: it had greater accuracy for classifying basal cell carcinomas and melanomas in Asian populations (96% and 96%, respectively) than in Caucasian populations (90% and 88%, respectively).

There is some evidence that adding out-of-distribution images (i.e., those not originally represented in the training dataset) to a pre-existing dataset and re-training can improve classification accuracy among the newly added group while not affecting accuracy among the original group [22]. Another previously suggested method is to use artificially darkened lesion images in the training dataset [47].
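As a rough illustration of the artificially darkened training images suggested in [47], the minimal sketch below (assuming the Pillow imaging library; the file names are hypothetical) generates progressively darker copies of a lesion photograph for training-set augmentation. Note that uniform brightness scaling is only a crude proxy for skin tone, and the cited work's exact transformation may differ.

```python
from PIL import Image, ImageEnhance

def darkened_variants(path, factors=(0.8, 0.6, 0.4)):
    """Return progressively darker copies of an image for training-set augmentation."""
    image = Image.open(path).convert("RGB")
    # enhance(1.0) returns the original image; factors below 1.0 darken it
    return [ImageEnhance.Brightness(image).enhance(f) for f in factors]

# Hypothetical usage: augment one lesion image and save the darkened variants.
for i, variant in enumerate(darkened_variants("lesion_0001.jpg")):
    variant.save(f"lesion_0001_dark{i}.jpg")
```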

Quality Assessments

Our review identified several critical items from the CLEAR Derm checklist that were insufficiently addressed by the majority of AI imaging studies in populations with skin of color. Overall, we found discrepancies in image artifact description, a lack of skin color information and other patient-level metadata, a missing rationale for dataset sizes, and limited inclusion of external test sets. Image artifacts have previously been shown to affect algorithm performance in both dermatological images [48, 49] and clinical photographs [50]. Unclear descriptions of image pre-processing can also lead to incorrect diagnoses: subtle alterations in color balance or rotation can result in a melanoma being misclassified as a benign nevus [49]. Furthermore, less than half of the studies reviewed reported the anatomical locations of imaged lesions. The body site of a lesion can be informative; for example, skin of color populations have a high prevalence of acral melanomas, which are commonly found on the palms, soles, subungual region, and mucous membranes [51]. Future studies should consider adopting established consensus on image acquisition metrics [52] or dermatology-specific guidelines [22] to standardize images and develop robust archives of skin images for research.

A major concern is that most AI algorithms have only been tested on internal datasets, i.e., those sampled from the same or similar populations. This issue was similarly highlighted in a recent review [18]. AI algorithms are prone to overfitting, which can result in inflated accuracy measures when models are tested only on internal datasets and a significant performance drop when they are tested on images from a different source [9]. The lack of standardized performance measures, publicly available diverse datasets, test interfaces, and reader studies presents a barrier to the comparability and reproducibility of AI algorithms. An alternative solution is for algorithms to be evaluated on standardized public test datasets, with transparency on algorithm performance [22]. Testing models on external datasets and/or out-of-distribution images is a better indicator of AI model performance in a real-world setting. Given that public dermoscopic datasets with diverse skin colors are now available, future algorithms should include external testing as a gold-standard evaluation [53].
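A minimal sketch of this internal-versus-external evaluation (entirely synthetic features standing in for image data, and a generic scikit-learn classifier rather than the deep networks used in the reviewed studies) might look as follows; a marked drop from the internal to the external AUC flags overfitting to the source population.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_population(n, shift):
    """Simulate lesion features; `shift` models a population/domain difference."""
    y = rng.integers(0, 2, n)                                  # 1 = malignant, 0 = benign
    X = rng.normal(loc=y[:, None] * 1.5 + shift, size=(n, 8))  # 8 synthetic features
    return X, y

# Internal data (development population) and external data (shifted population).
X_int, y_int = make_population(600, shift=0.0)
X_ext, y_ext = make_population(300, shift=1.0)

X_train, X_test, y_train, y_test = train_test_split(
    X_int, y_int, test_size=0.25, stratify=y_int, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

for name, X, y in [("internal test", X_test, y_test), ("external test", X_ext, y_ext)]:
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"{name} AUC: {auc:.2f}")
```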

Lastly, only a few of the studies included in the review addressed the potential impacts on the healthcare team and patients. Future studies would benefit from considering the clinical translation of AI models during the early stages of development and from evaluating the clinical utility of AI models, along with their performance, in conjunction with their intended users [36].

Limitations and Outlook

Our systematic review is not without limitations. First, our search was limited to articles published in English. This is particularly of note for populations with skin of color, where English may often not be the primary language. Additionally, as many studies did not report on skin of color status, inclusion was based on assumptions using ethnicity and geographical location. There can be significant variability in skin color even for a population of the same race or geographical location [54]. Reporting bias may also influence the review, as higher performing algorithms are more likely to be reported and accepted for publication.

AI algorithms have great potential to complement skin cancer screening and the diagnosis of pigmented lesions, leading to improved dermatology workflows and patient outcomes. The performance and reliability of an image-based AI algorithm depend significantly on the quality of the data on which it is trained and tested. While this review shows promising development of AI algorithms for skin of color in East Asian populations, there are still significant discrepancies in the number of algorithms developed in populations with skin of color, particularly in darker skin types. Without the inclusion of images from populations with skin of color, the incorporation of AI models into clinical practice could lead to missed and delayed diagnoses of neoplasms in people with skin of color, further widening existing health outcome disparities [55].

An ethics statement is not applicable because this study is based exclusively on published literature.

H. Peter Soyer is a shareholder of MoleMap NZ Ltd. and e-derm-consult GmbH and undertakes regular teledermatological reporting for both companies. H. Peter Soyer is a medical consultant for Canfield Scientific Inc., MoleMap Australia Pty Ltd., and Blaze Bioscience Inc. and a medical advisor for First Derm. The remaining authors have no conflicts of interests to declare.

H. Peter Soyer holds an NHMRC MRFF Next Generation Clinical Researchers Program Practitioner Fellowship (APP1137127). Clare Primiero is supported by an Australian Government Research Training Program Scholarship.

Yuyang Liu: conceptualization (equal), data curation (lead), formal analysis (equal), methodology (lead), validation (equal), and writing – original draft (lead). Clare Primiero: conceptualization (equal), funding acquisition (equal), methodology (equal), supervision (equal), and writing – review and editing (equal). Vishnutheertha Kulkarni: data curation (supporting), formal analysis (supporting), validation (equal), and writing – review and editing (supporting). H. Peter Soyer: conceptualization (supporting), funding acquisition (equal), supervision (supporting), and writing – review and editing (supporting). Brigid Betz-Stablein: conceptualization (equal), data curation (supporting), formal analysis (supporting), methodology (equal), supervision (equal), and writing – review and editing (equal).

This review is based exclusively on published literature. No publicly available datasets were used and no data were generated. Further inquiries can be directed to the corresponding author.

1. Cormier JN, Xing Y, Ding M, Lee JE, Mansfield PF, Gershenwald JE, et al. Ethnic differences among patients with cutaneous melanoma. Arch Intern Med. 2006;166(17):1907–14.
2. Jackson C, Maibach H. Ethnic and socioeconomic disparities in dermatology. J Dermatolog Treat. 2016;27(3):290–1.
3. Howlader N, Noone AM, Krapcho M, Miller D, Brest A, Yu M, Ruhl J, editors. SEER cancer statistics review, 1975–2018. Bethesda, MD: National Cancer Institute; 2021.
4. Dawes SM, Tsai S, Gittleman H, Barnholtz-Sloan JS, Bordeaux JS. Racial disparities in melanoma survival. J Am Acad Dermatol. 2016;75(5):983–91.
5. Adamson AS. Should we refer to skin as "ethnic"? J Am Acad Dermatol. 2017;76(6):1224–5.
6. Diao JA, Adamson AS. Representation and misdiagnosis of dark skin in a large-scale visual diagnostic challenge. J Am Acad Dermatol. 2022;86(4):950–1.
7. Adamson AS, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol. 2018;154(11):1247–8.
8. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–8.
9. Tschandl P, Codella N, Akay BN, Argenziano G, Braun RP, Cabo H, et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 2019;20(7):938–47.
10. Tschandl P, Rinner C, Apalla Z, Argenziano G, Codella N, Halpern A, et al. Human-computer collaboration for skin cancer recognition. Nat Med. 2020;26(8):1229–34.
11. Rotemberg V, Kurtansky N, Betz-Stablein B, Caffery L, Chousakos E, Codella N, et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci Data. 2021;8(1):34.
12. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data. 2018;5:180161.
13. Mendonca T, Ferreira PM, Marques JS, Marcal ARS, Rozeira J. PH2: a dermoscopic image database for research and benchmarking. Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:5437–40.
14. Giotis I, Molders N, Land S, Biehl M, Jonkman MF, Petkov N. MED-NODE: a computer-assisted melanoma diagnosis system using non-dermoscopic images. Expert Syst Appl. 2015;42(19):6578–85.
15. de Faria SMM, Henrique M, Filipe JN, Pereira PMM, Tavora LMN, Assuncao PAA. Light field image dataset of skin lesions. Annu Int Conf IEEE Eng Med Biol Soc. 2019;2019:3905–8.
16. Wen D, Khan SM, Ji Xu A, Ibrahim H, Smith L, Caballero J, et al. Characteristics of publicly available skin cancer image datasets: a systematic review. Lancet Digit Health. 2022;4(1):e64–74.
17. Daneshjou R, Smith MP, Sun MD, Rotemberg V, Zou J. Lack of transparency and potential bias in artificial intelligence data sets and algorithms: a scoping review. JAMA Dermatol. 2021;157(11):1362–9.
18. Haggenmuller S, Maron RC, Hekler A, Utikal JS, Barata C, Barnhill RL, et al. Skin cancer classification via convolutional neural networks: systematic review of studies involving human experts. Eur J Cancer. 2021;156:202–16.
19. Thomsen K, Iversen L, Titlestad TL, Winther O. Systematic review of machine learning for diagnosis and prognosis in dermatology. J Dermatolog Treat. 2020;31(5):496–510.
20. Freeman K, Dinnes J, Chuchu N, Takwoingi Y, Bayliss SE, Matin RN, et al. Algorithm based smartphone apps to assess risk of skin cancer in adults: systematic review of diagnostic accuracy studies. BMJ. 2020;368:m127.
21. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration.
.
Bmj
.
2009
339
b2700
.
22.
Daneshjou
R
,
Barata
C
,
Betz-Stablein
B
,
Celebi
ME
,
Codella
N
,
Combalia
M
.
Checklist for evaluation of image-based artificial intelligence reports in dermatology: CLEAR derm consensus guidelines from the international skin imaging collaboration artificial intelligence working group
.
JAMA Dermatol
.
2022
;
158
(
1
):
90
6
.
23.
Piccolo
D
,
Ferrari
A
,
Peris
K
,
Diadone
R
,
Ruggeri
B
,
Chimenti
S
.
Dermoscopic diagnosis by a trained clinician vs. a clinician with minimal dermoscopy training vs. computer-aided diagnosis of 341 pigmented skin lesions: a comparative study
.
Br J Dermatol
.
2002
;
147
(
3
):
481
6
.
24.
Iyatomi
H
,
Oka
H
,
Celebi
ME
,
Hashimoto
M
,
Hagiwara
M
,
Tanaka
M
.
An improved Internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm
.
Comput Med Imaging Graph
.
2008
;
32
(
7
):
566
79
.
25.
Chang
WY
,
Huang
A
,
Yang
CY
,
Lee
CH
,
Chen
YC
,
Wu
TY
.
Computer-aided diagnosis of skin lesions using conventional digital photography: a reliability and feasibility study
.
PLoS One
.
2013
;
8
(
11
):
e76212
.
26.
Chen
RH
,
Snorrason
M
,
Enger
SM
,
Mostafa
E
,
Ko
JM
,
Aoki
V
.
Validation of a skin-lesion image-matching algorithm based on computer vision technology
.
Telemed J E Health
.
2016
;
22
(
1
):
45
50
.
27.
Yang
S
,
Oh
B
,
Hahm
S
,
Chung
KY
,
Lee
BU
.
Ridge and furrow pattern classification for acral lentiginous melanoma using dermoscopic images
.
Biomed Signal Process Control
.
2017
;
32
:
90
6
.
28.
Han
SS
,
Kim
MS
,
Lim
W
,
Park
GH
,
Park
I
,
Chang
SE
.
Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm
.
J Invest Dermatol
.
2018
;
138
(
7
):
1529
38
.
29.
Yu
C
,
Yang
S
,
Kim
W
,
Jung
J
,
Chung
K-Y
,
Lee
SW
.
Acral melanoma detection using a convolutional neural network for dermoscopy images
.
PLoS One
.
2018
;
13
(
3
):
e0193321
.
30.
Zhang
X
,
Wang
S
,
Liu
J
,
Tao
C
.
Towards improving diagnosis of skin diseases by combining deep neural network and human knowledge
.
BMC Med Inform Decis Mak
.
2018
18
Suppl 2
59
–.
31.
Fujisawa
Y
,
Otomo
Y
,
Ogata
Y
,
Nakamura
Y
,
Fujita
R
,
Ishitsuka
Y
.
Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis
.
Br J Dermatol
.
2019
;
180
(
2
):
373
81
.
32.
Zhao
XY
,
Wu
X
,
Li
FF
,
Li
Y
,
Huang
WH
,
Huang
K
.
The application of deep learning in the risk grading of skin tumors for patients using clinical images
.
J Med Syst
.
2019
;
43
(
8
):
283
.
33.
Cho
SI
,
Sun
S
,
Mun
JH
,
Kim
C
,
Kim
SY
,
Cho
S
.
Dermatologist-level classification of malignant lip diseases using a deep convolutional neural network
.
Br J Dermatol
.
2020
;
182
(
6
):
1388
94
.
34.
Han
SS
,
Moon
IJ
,
Kim
SH
,
Na
J-I
,
Kim
MS
,
Park
GH
.
Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: a retrospective validation study
.
PLoS Med
.
2020
;
17
(
11
):
e1003381
.
35.
Han
SS
,
Moon
IJ
,
Lim
W
,
Suh
IS
,
Lee
SY
,
Na
JI
.
Keratinocytic skin cancer detection on the face using region-based convolutional neural network
.
JAMA Dermatol
.
2020
;
156
(
1
):
29
37
.
36.
Han
SS
,
Park
I
,
Eun Chang
S
,
Lim
W
,
Kim
MS
,
Park
GH
.
Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders
.
J Invest Dermatol
.
2020
;
140
(
9
):
1753
61
.
37.
Huang
K
,
He
X
,
Jin
Z
,
Wu
L
,
Zhao
X
,
Wu
Z
.
Assistant diagnosis of basal cell carcinoma and seborrheic keratosis in Chinese population using convolutional neural network
.
J Healthc Eng
.
2020
;
2020
:
1713904
.
38.
Jinnai
S
,
Yamazaki
N
,
Hirano
Y
,
Sugawara
Y
,
Ohe
Y
,
Hamamoto
R
.
The development of a skin cancer classification system for pigmented skin lesions using deep learning
.
Biomolecules
.
2020
;
10
(
8
):
1123
.
39.
Liu
Y
,
Jain
A
,
Eng
C
,
Way
DH
,
Lee
K
,
Bui
P
.
A deep learning system for differential diagnosis of skin diseases
.
Nat Med
.
2020
;
26
(
6
):
900
8
.
40.
Wang
S-Q
,
Zhang
X-Y
,
Liu
J
,
Tao
C
,
Zhu
C-Y
,
Shu
C
.
Deep learning-based, computer-aided classifier developed with dermoscopic images shows comparable performance to 164 dermatologists in cutaneous disease diagnosis in the Chinese population
.
Chin Med J
.
2020
;
133
(
17
):
2027
36
.
41.
Huang
HW
,
Hsu
BWY
,
Lee
CH
,
Tseng
VS
.
Development of a light-weight deep learning model for cloud applications and remote diagnosis of skin cancers
.
J Dermatol
.
2021
;
48
(
3
):
310
6
.
42.
Minagawa
A
,
Koga
H
,
Sano
T
,
Matsunaga
K
,
Teshima
Y
,
Hamada
A
.
Dermoscopic diagnostic performance of Japanese dermatologists for skin tumors differs by patient origin: a deep learning convolutional neural network closes the gap
.
J Dermatol
.
2021
;
48
(
2
):
232
6
.
43.
Yang
Y
,
Ge
Y
,
Guo
L
,
Wu
Q
,
Peng
L
,
Zhang
E
.
Development and validation of two artificial intelligence models for diagnosing benign, pigmented facial skin lesions
.
Skin Res Technol
.
2021
;
27
(
1
):
74
9
.
44.
Li
CX
,
Fei
WM
,
Shen
CB
,
Wang
ZY
,
Jing
Y
,
Meng
RS
.
Diagnostic capacity of skin tumor artificial intelligence-assisted decision-making software in real-world clinical settings
.
Chin Med J
.
2020
;
133
(
17
):
2020
6
.
45.
Pacheco
AGC
,
Lima
GR
,
Salomão
AS
,
Krohling
B
,
Biral
IP
,
de Angelo
GG
.
PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones
.
Data Brief
.
2020
;
32
:
106221
.
46.
Takiddin
A
,
Schneider
J
,
Yang
Y
,
Abd-Alrazaq
A
,
Househ
M
.
Artificial intelligence for skin cancer detection: scoping review
.
J Med Internet Res
.
2021
;
23
(
11
):
e22934
.
47.
Aggarwal
P
,
Papay
FA
.
Artificial intelligence image recognition of melanoma and basal cell carcinoma in racially diverse populations
.
J Dermatolog Treat
.
2021
1
6
.
48.
Winkler
JK
,
Fink
C
,
Toberer
F
,
Enk
A
,
Deinlein
T
,
Hofmann-Wellenhof
R
.
Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition
.
JAMA Dermatol
.
2019
;
155
(
10
):
1135
41
.
49.
Du-Harpur
X
,
Arthurs
C
,
Ganier
C
,
Woolf
R
,
Laftah
Z
,
Lakhan
M
.
Clinically relevant vulnerabilities of deep machine learning systems for skin cancer diagnosis
.
J Invest Dermatol
.
2021
;
141
(
4
):
916
20
.
50.
Phillips
M
,
Marsden
H
,
Jaffe
W
,
Matin
RN
,
Wali
GN
,
Greenhalgh
J
.
Assessment of accuracy of an artificial intelligence algorithm to detect melanoma in images of skin lesions
.
JAMA Netw Open
.
2019
;
2
(
10
):
e1913436
.
51.
Bellew
S
,
Del Rosso
JQ
,
Kim
GK
.
Skin cancer in asians: part 2: melanoma
.
J Clin Aesthet Dermatol
.
2009
;
2
(
10
):
34
6
.
52.
Katragadda
C
,
Finnane
A
,
Soyer
HP
,
Marghoob
AA
,
Halpern
A
,
Malvehy
J
.
Technique standards for skin lesion imaging: a delphi consensus statement
.
JAMA Dermatol
.
2017
;
153
(
2
):
207
13
.
53.
Daneshjou
R
,
Vodrahalli
K
,
Liang
W
,
Novoa
R
,
Jenkins
M
,
Rotemberg
V
Disparities in dermatology AI: assessments using diverse clinical images
.
2021
.
54.
Wei
L
,
Xuemin
W
,
Wei
L
,
Li
L
,
Ping
Z
,
Yanyu
W
.
Skin color measurement in Chinese female population: analysis of 407 cases from 4 major cities of China
.
Int J Dermatol
.
2007
;
46
(
8
):
835
9
.
55.
Adamson
AS
,
Essien
U
,
Ewing
A
,
Daneshjou
R
,
Hughes-Halbert
C
,
Ojikutu
B
.
Diversity, race, and health
.
Med
.
2021
;
2
(
1
):
6
10
.
56.
International Skin Imaging Collaboration [Available from: https://www.isic- archive.com/#!/topWithHeader/onlyHeaderTop/gallery].
57.
Argenziano
G
,
Soyer
HP
,
De Giorgio
V
Interactive atlas of dermoscopy
.
2000
.
58.
NCF
Codella
,
Gutman
D
,
Celebi
ME
,
Helba
B
,
Marchetti
MA
,
Dusza
SW
.
Skin lesion analysis toward melanoma detection: a challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC)
.
IEEE 15th Int Symposium Biomed Imaging (ISBI 2018)
.
2018
168
72
.
59.
Olga Russakovsky*
JD
,
Su
H
,
Krause
J
,
Satheesh
S
,
Ma
S
,
Huang
Z
.
(* = equal contribution) ImageNet large scale visual recognition challe. ImageNet large scale visual recognition challenge
.
IJCV
.
2015
.
60.
DermQuest. [Online].(2012). [Available from: http://www.dermquest.com].
61.
Dermatology information system
. [Online].(2012). [Available from: http://www.dermis.net].
62.
Alves
J
,
Moreira
D
,
Alves
P
,
Rosado
L
,
Vasconcelos
MJM
.
Automatic focus assessment on dermoscopic images acquired with smartphones
.
Sensors
.
2019
;
19
(
22
):
4957
.
63.
DermNet
NZ
. [Online]. [Available from: https://dermnetnz.org/.