Introduction: The application of artificial intelligence (AI) algorithms to serous fluid cytology remains limited by the lack of standardized, publicly available datasets. Here, we develop a novel public serous effusion cytology dataset and apply AI algorithms to it to test its diagnostic utility and safety in clinical practice.

Methods: The work is divided into three phases. Phase 1 entails building the dataset based on the multitiered, evidence-based classification system proposed by The International System for Reporting Serous Fluid Cytopathology (TIS), along with ground-truth tissue diagnosis for malignancy. To ensure reliable results in future AI research on this dataset, we carefully consider every step of preparation and staining from a real-world cytopathology perspective. In phase 2, we pay special attention to the image acquisition pipeline to ensure image integrity, and then use transfer learning, taking the convolutional layers of the VGG16 deep learning model for feature extraction. Finally, in phase 3, we apply a random forest classifier to the constructed dataset.

Results: The dataset comprises 3,731 images distributed among the four TIS diagnostic categories. The model achieves 74% accuracy on this multiclass classification problem. Using a one-versus-all classifier, the fall-out rate for images misclassified as negative for malignancy despite carrying a higher-risk diagnosis is 0.13. Most of these misclassified images (77%) belong to the atypia of undetermined significance category, in concordance with real-life statistics.

Conclusion: This is the first and largest publicly available serous fluid cytology dataset based on a standardized diagnostic system. It is also the first dataset to include various types of effusions and pericardial fluid specimens, and the first to include the diagnostically challenging atypical categories. AI algorithms applied to this novel dataset show reliable results that can be incorporated into actual clinical practice with minimal risk of missing a diagnosis of malignancy. This work provides a foundation for researchers to develop and test further AI algorithms for the diagnosis of serous effusions.
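
The fall-out figure reported in the Results can be read as a one-versus-all false positive rate in which "negative for malignancy" is the positive class: of all images that truly belong to a higher-risk category, it is the fraction predicted as negative for malignancy. The following is a minimal sketch of that calculation, not the authors' code. The function name `one_vs_all_fallout`, the category abbreviations used as label strings (NFM, AUS, SFM, MAL), and the toy labels are assumptions made for illustration and are not taken from the dataset.

```python
from typing import Sequence


def one_vs_all_fallout(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> float:
    """Fall-out (false positive rate) with `positive` as the positive class.

    FP: true label is not `positive`, but the prediction is `positive`.
    TN: true label is not `positive`, and the prediction is not `positive`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    if fp + tn == 0:
        raise ValueError("no samples outside the positive class; fall-out is undefined")
    return fp / (fp + tn)


# Toy labels for illustration only (hypothetical, not from the dataset).
# NFM = negative for malignancy, AUS = atypia of undetermined significance,
# SFM = suspicious for malignancy, MAL = malignant.
y_true = ["NFM", "NFM", "AUS", "AUS", "SFM", "MAL", "MAL", "AUS"]
y_pred = ["NFM", "AUS", "NFM", "AUS", "SFM", "MAL", "MAL", "AUS"]

# Fraction of higher-risk images that were predicted as NFM.
fallout = one_vs_all_fallout(y_true, y_pred, positive="NFM")
print(f"fall-out for NFM: {fallout:.3f}")
```

In this toy example, six images carry a higher-risk label and one of them is predicted as NFM, so the function returns one sixth. Low values of this metric matter clinically because each false positive here is a potentially missed malignancy.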
