Background: Modern machine learning and deep learning algorithms require large amounts of data; however, data sharing between multiple healthcare institutions is limited by privacy and security concerns. Summary: Federated learning provides a functional alternative to the single-institution approach while avoiding the pitfalls of data sharing. In cross-silo federated learning, the raw data never leave the site of collection; models are created and updated locally at each site to achieve a learning objective. We demonstrate a use case with COVID-19-associated acute kidney injury (AKI). Federated models outperformed their local counterparts, even when evaluated on local test data, and their performance was comparable to that of models trained on pooled data. Performance gains at a given hospital were inversely proportional to that hospital's dataset size, suggesting that hospitals with smaller datasets stand to gain the most from federated learning approaches. Key Messages: This short article provides an overview of federated learning, presents a use case in COVID-19-associated AKI, and details outstanding issues along with some potential solutions.

The advent of large machine learning models in healthcare has changed the way prognostic and diagnostic models take advantage of electronic health record data. With deep learning methods such as large convolutional neural networks [1] and large language models [2], the approach toward prognostic models in medicine is becoming increasingly data-driven. Deep learning is a type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher-level features (and performance) from data [3]. Unlike earlier machine learning methods, multilayer perceptrons (MLPs) have demonstrated a scalable capacity to represent increasingly complex data, with relatively few bounds on model generalizability as sample size increases. As a result, healthcare has pivoted from the traditional framework of sample size estimation and effect size calculation, in which limited data were collected to mitigate patient-specific confounders, toward collecting as much data as possible to represent the full spectrum of potential disease states.

Thus, sample size is one of the critical roadblocks to unlocking the full potential of machine learning in medicine. However, collecting a wide range of data modalities across the spectrum of potential patients faces challenges. Specifically, three primary challenges confront data sharing across institutions: privacy, interoperability, and intellectual property [4]. First, inter-institutional platforms such as Google Health and Microsoft HealthVault have failed to satisfy basic privacy concerns, putting both individuals and institutions at risk [5]. Second, poorly interoperable systems make data sharing difficult, as institutions may hold conflicting or out-of-date patient information [6]. Third, intellectual property filed by individual institutions may be contingent on individual patient data, and this intellectual property provides funding for clinical trials that ultimately benefit patients [7].

Because of these limitations on data sharing across institutions, one potential solution is to train on a patient population belonging to a single hospital. However, single-hospital training data render the model vulnerable to biases inherent to that site. In the field of kidney disease, external validation remains an unresolved issue and can lead to detrimental decreases in performance across different patient populations [8].

Federated learning provides a functional alternative to the single-institution approach while avoiding the pitfalls of data sharing. In cross-silo federated learning, the data do not leave a site: each client's raw data are stored at the site of collection, where models are created and updated locally to achieve a learning objective. Instead of the data, the model's parameters, a representation of how the model maps inputs to outputs, are sent to a central server, where they are aggregated with parameters sent from other sites. The aggregated model can then be sent back to each local site and combined with the local model for future prediction (Fig. 1).

Fig. 1.

Simplified overview of federated learning. This schematic shows how models can be federated across institutions, with weights collected at a central aggregator using averaging, without any data being shared between institutions. The light blue arrows represent the weights of the joint model being shared with individual institutions, and the dark blue arrows represent the weights returned to the central aggregator after being trained on an individual institution's data. The pink arrow with the red circle represents the inability to share data between hospitals.

A model relevant to healthcare, for example, could be an MLP that maps a chest X-ray to a diagnosis of pneumonia. Every time a chest X-ray is collected at a hospital and a diagnosis is made, the model is updated at that hospital site via an iterative optimization procedure such as stochastic gradient descent. After a set time frame (a few weeks, for example), the model parameters are sent to a central server, which aggregates the models sent from the different sites and then returns the aggregated model to each site. By sharing only the weights of the models rather than the actual data, federated learning avoids the pitfalls of data sharing while preserving the scalability of dataset size on important diagnostic and prognostic tasks.
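The aggregation step described above is commonly implemented as federated averaging (FedAvg), in which each site's parameters are weighted by its local sample size. The following is a minimal sketch, not the study's implementation; it assumes each site's model is a list of NumPy parameter arrays, and the site sizes are hypothetical:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-site model parameters, weighting each site's
    contribution by its local dataset size (FedAvg)."""
    total = sum(client_sizes)
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Three hypothetical hospital sites, each with one weight matrix.
site_weights = [[np.full((2, 2), v)] for v in (1.0, 2.0, 3.0)]
site_sizes = [100, 100, 200]  # larger sites contribute proportionally more
global_weights = federated_average(site_weights, site_sizes)
# Weighted average: 1.0 * 0.25 + 2.0 * 0.25 + 3.0 * 0.5 = 2.25 per entry
```

The central server would then broadcast `global_weights` back to each site for the next round of local training.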

COVID-19 presents with a variety of phenotypic expressions, but a common complication is acute kidney injury (AKI). AKI prevalence in individuals with COVID-19 has been reported as high as 46%, and AKI-associated mortality in individuals with COVID-19 has ranged from 30 to 70% [9]. Preemptively stratifying the risk of AKI can be valuable in resource-constrained settings and was especially important during the COVID-19 pandemic, when the NYC hospital system was often strained. We utilized COVID-19-associated AKI as a benchmark to evaluate how federated learning approaches compare with both individual-institution and pooled approaches.

The patient population included 4,029 individuals without a history of transplant, a diagnosis of kidney failure, or an admission shorter than 48 h, spread across five distinct hospital sites in the Mount Sinai Hospital system. The key input data included demographics, comorbidities, laboratory values, and vital signs. The key output data were the presence or absence of AKI within 3 and 7 days of admission. There was significant variability in both the input data and the outcome, with the prevalence of AKI ranging from 30 to 70% across sites.

Two federated models were trained: an MLP and a Lasso regression model. Three distinct training strategies were utilized: local, federated, and pooled. Local models were trained only on data from a single hospital; they require no cross-site interoperability but have the least data from which to generalize. The pooled model reflected an ideal data-sharing scenario, where the model was trained on all the data across all the hospitals. In practice, implementing a cross-site pooled model is impractical due to rigorous data-sharing restrictions between hospitals, but it was made plausible in this scenario by the centralization of the Sinai Hospital System. Federated models were trained via decentralized updates followed by centralized aggregation. Federated models are both practical, in that they do not require interoperable hospitals, and high-performing, as they utilize all available data jointly; however, they require additional considerations, which are discussed further below. Model performance was evaluated using AUROC bootstrapped 100× with a 70–30 train-test split.
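The bootstrapped-AUROC evaluation can be sketched as follows. This is an illustrative example, not the study's actual code: the rank-based AUROC implementation, the synthetic labels and risk scores, and the resampling loop are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def auroc(y_true, y_score):
    """Rank-based AUROC (Mann-Whitney U statistic); assumes no tied scores."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auroc(y_true, y_score, n_boot=100):
    """Resample the test set with replacement and recompute AUROC each time,
    yielding a mean estimate and a 95% percentile confidence interval."""
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip degenerate resamples containing a single class
        scores.append(auroc(y_true[idx], y_score[idx]))
    return np.mean(scores), np.percentile(scores, [2.5, 97.5])

# Synthetic test set: informative but noisy risk scores.
y = rng.integers(0, 2, 500)
s = y + rng.normal(0, 1.2, 500)
mean_auc, ci = bootstrap_auroc(y, s)
```

Reporting the percentile interval alongside the mean conveys how stable the estimate is on a test set of this size.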

We showed that federated models, both Lasso and MLP, outperformed their local counterparts, even when evaluated on local data in the test dataset, and their performance was comparable to that of pooled models. Performance gains at a given hospital were inversely proportional to that hospital's dataset size, which suggests that hospitals with smaller datasets have significant room for growth with federated learning approaches. Larger hospitals are more likely to receive transfer admissions from various institutions, which may predispose them to greater intrinsic dataset variability and thus generalizability. In addition to the out-of-hospital validation cohorts that federated learning utilizes, future studies should validate models on out-of-system data as per the TRIPOD guidelines [10].

Despite its promise, we identify three main problems that still exist in the field of federated learning, especially when applied to the healthcare space. These problems are model inversion attacks, man-in-the-middle attacks, and adversarial triggers.

Model inversion attacks are especially problematic in the healthcare space because patient data are sensitive and leakage can have serious consequences. Model inversion attacks allow an attacker to reconstitute an individual sample from the model parameters [11]; in facial recognition systems, for example, such attacks can reconstitute an individual's face [12]. Fortunately, several countermeasures have been proposed. Because reconstruction depends on precise knowledge of the model parameters and update rules, perturbing the parameters, for example by adding Gaussian noise before they leave the site, can mitigate the risk of model inversion. Gaussian noise is intrinsically random, and reconstructing samples from perturbed weights becomes a significantly more difficult task.
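A minimal sketch of this countermeasure, assuming the local model is a list of NumPy parameter arrays; the noise scale `sigma` is a hypothetical value, and in practice it would be calibrated against the accuracy cost (as in differential privacy):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_weights(weights, sigma=0.01):
    """Add zero-mean Gaussian noise to each parameter tensor before it
    leaves the site; larger sigma means more privacy but less accuracy."""
    return [w + rng.normal(0.0, sigma, size=w.shape) for w in weights]

local = [np.ones((3, 3))]              # a single local parameter matrix
shared = perturb_weights(local, sigma=0.05)  # the noised copy sent upstream
```

The aggregator averages many such noised updates, so much of the noise cancels out while any single site's exact parameters are never exposed.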

Second, a man-in-the-middle attack occurs when parameters are intercepted en route to the facility. These attacks intercept models exchanged between clients and replace them with malicious model updates [13]. The malicious models may underperform on a given target and result in worse diagnoses and prognoses. Because these models are currently utilized as decision support systems rather than standalone products, such attacks are not yet as consequential, but they pose future threats. To hijack model updates, adversaries must strip the encryption and re-encrypt the payload in transit; improvements in transport encryption and blockchain-based approaches may mitigate the risk of man-in-the-middle attacks [14].

Third, training data can be strategically altered so that the global model makes incorrect predictions. For example, generative models such as variational autoencoders can be used to generate fake training data [15]. While the local update may appear to improve performance according to the local objective, the performance of the aggregated classifier can deteriorate rapidly. However, in the same manner that these generative models can produce fake data, an adversarial model can simultaneously be used to protect against adversarial attacks [16]. A second solution is to validate aggregated models locally before deployment, so that poisoned updates are detected and rejected.
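The local-validation solution can be sketched as a simple acceptance rule: before adopting an aggregated model, a site compares its performance on local held-out data with that of the current local model. The function name and tolerance below are hypothetical illustrations, not a prescribed protocol:

```python
def accept_update(local_auroc, aggregated_auroc, tolerance=0.02):
    """Reject an aggregated model whose AUROC on the site's own held-out
    data drops by more than `tolerance` relative to the current local
    model; a simple guard against poisoned aggregation rounds."""
    return aggregated_auroc >= local_auroc - tolerance

# A small drop within tolerance is accepted; a large drop is rejected.
ok = accept_update(0.80, 0.79)       # True: within the 0.02 tolerance
poisoned = accept_update(0.80, 0.70) # False: suspiciously large drop
```

A rejected round simply keeps the previous model in service, trading a little convergence speed for robustness.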

The advent of deep learning has created a need for massive amounts of data to train a model. However, pooling data across multiple institutions remains difficult because of the problems associated with data sharing, such as intellectual property, poor interoperability, and privacy. Federated learning remains a promising approach to meeting the increasing need to represent dataset complexity without the pitfalls of data sharing. We highlight a case-based example in which federated learning is utilized to maximize the performance of an AKI diagnostic model across relatively diverse sites in the Mount Sinai Healthcare System. Finally, we present problems that we anticipate will become increasingly prevalent as federated learning becomes more widespread, along with the state-of-the-art solutions associated with them. Federated learning remains a very promising solution to better account for data diversity in the increasingly data-driven space of diagnostic medicine.

Since this was a review and used published data, ethics approval was not needed.

Mr. F.F. Gulamali has no conflicts of interest. Dr. G.N. Nadkarni has received consulting fees from AstraZeneca, Reata, BioVie, Daiichi Sankyo, Qiming Capital, and GLG Consulting; has received financial compensation as a scientific board member and advisor to Renalytix; and owns equity in Renalytix, Nexus iConnect, Data2Wisdom, and Pensieve Health as a cofounder.

Dr. G.N. Nadkarni is supported by R01DK127139 and R01HL155915.

Mr. F.F. Gulamali and Dr. G.N. Nadkarni made substantial contributions to the conception or design of the work and drafting the work and have given final approval of the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

No primary data were used for this work.

1. Lipkova J, Chen TY, Lu MY, Chen RJ, Shady M, Williams M, et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat Med. 2022 Mar;28(3):575–82.

2. Steinberg E, Jung K, Fries JA, Corbin CK, Pfohl SR, Shah NH. Language models are an effective representation learning technique for electronic health record data. J Biomed Inform. 2021 Jan;113:103637.

3. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015 May 27;521(7553):436–44.

4. Cole CL, Sengupta S, Rossetti Née Collins S, Vawdrey DK, Halaas M, Maddox TM, et al. Ten principles for data sharing and commercialization. J Am Med Inform Assoc. 2020 Nov 13;28(3):646–9.

5. Haas S, Wohlgemuth S, Echizen I, Sonehara N, Müller G. Aspects of privacy for electronic health records. Int J Med Inform. 2011 Feb;80(2):e26–31.

6. Staroselsky M, Volk LA, Tsurikova R, Pizziferri L, Lippincott M, Wald J, et al. Improving electronic health record (EHR) accuracy and increasing compliance with health maintenance clinical guidelines through patient access and input. Int J Med Inform. 2006 Oct;75(10–11):693–700.

7. Heus JJ, de Pauw ES, Mirjam L, Margherita M, Michael RH, Michal H. Importance of intellectual property generated by biomedical research at universities and academic hospitals. J Clin Transl Res. 2017 May 24;3(2):250–9.

8. Tangri N, Kitsios GD, Inker LA, Griffith J, Naimark DM, Walker S, et al. Risk prediction models for patients with chronic kidney disease: a systematic review. Ann Intern Med. 2013 Apr 16;158(8):596.

9. Chan L, Chaudhary K, Saha A, Chauhan K, Vaid A, Zhao S, et al. AKI in hospitalized patients with COVID-19. J Am Soc Nephrol. 2021 Jan;32(1):151–60.

10. Collins GS, Reitsma JB, Altman DG, Moons KGM; members of the TRIPOD Group. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Eur Urol. 2015 Jun;67(6):1142–51.

11. Fredrikson M, Jha S, Ristenpart T. Model inversion attacks that exploit confidence information and basic countermeasures. In: CCS’15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. New York, NY, USA: Association for Computing Machinery; 2015. p. 1322–33.

12. Khosravy M, Nakamura K, Hirose Y, Nitta N, Babaguchi N. Model inversion attack: analysis under gray-box scenario on deep learning based face recognition system. KSII Trans Internet Inf Syst. 2021;15(3):1100–18.

13. Bouacida N, Mohapatra P. Vulnerabilities in federated learning. IEEE Access. 2021;9:63229–49.

14. Doku R, Rawat DB, Liu C. Towards federated learning approach to determine data relevance in big data. In: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI); 2019. p. 184–92.

15. Wang D, Li C, Wen S, Nepal S, Xiang Y. Man-in-the-middle attacks against machine learning classifiers via malicious generative models. arXiv [cs.CR]. 2019. Available from: http://arxiv.org/abs/1910.06838.

16. Samangouei P, Kabkab M, Chellappa R. Defense-GAN: protecting classifiers against adversarial attacks using generative models. arXiv [cs.CV]. 2018. Available from: http://arxiv.org/abs/1805.06605.