Speaker
Description
Data privacy is a growing concern in real-world machine learning (ML) applications, particularly in sensitive domains like healthcare. Federated learning (FL) offers a promising solution by enabling model training across decentralized, private data sources. However, both traditional ML and FL approaches typically assume access to fully labeled datasets, an assumption that rarely holds in practice. Users often lack the time, motivation, or expertise to label their data, making labeled examples scarce.
This paper proposes a federated semi-supervised learning (FSSL) framework that learns from a small set of labeled data alongside a large volume of unlabeled data. Our approach combines FL with VIME, a leading semi-supervised learning (SSL) method for tabular data. Unlike image or text data, tabular data presents unique challenges for SSL due to the absence of transferable pretext tasks.
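The sketch below (not the authors' code) illustrates the general shape of such a framework under assumed details: each client optimizes a supervised regression loss on its few labeled rows plus a VIME-style corruption-and-reconstruction loss on its unlabeled rows, and a FedAvg-style weighted average aggregates the client models. The masking rate, loss weight `beta`, and network sizes are illustrative assumptions, and the simplified reconstruction loss stands in for VIME's full self-/semi-supervised objective.

```python
import copy
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Shared encoder with a regression head and a reconstruction decoder."""
    def __init__(self, n_features: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.head = nn.Linear(32, 1)               # severity score (regression)
        self.decoder = nn.Linear(32, n_features)   # reconstructs corrupted inputs

    def forward(self, x):
        z = self.encoder(x)
        return self.head(z), self.decoder(z)

def corrupt(x, p_m=0.3):
    """VIME-style corruption: mask a fraction of features and replace them
    with values drawn from the empirical marginals (column-wise shuffle)."""
    mask = (torch.rand_like(x) < p_m).float()
    shuffled = x[torch.randperm(x.size(0))]
    return (1 - mask) * x + mask * shuffled

def local_update(model, x_lab, y_lab, x_unlab, beta=1.0, epochs=5, lr=1e-3):
    """One client's local training: supervised MSE + unsupervised reconstruction."""
    local = copy.deepcopy(model)
    opt = torch.optim.Adam(local.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        pred, _ = local(x_lab)
        sup_loss = nn.functional.mse_loss(pred.squeeze(-1), y_lab)
        _, recon = local(corrupt(x_unlab))
        unsup_loss = nn.functional.mse_loss(recon, x_unlab)
        (sup_loss + beta * unsup_loss).backward()
        opt.step()
    return local.state_dict(), x_lab.size(0) + x_unlab.size(0)

def fedavg(global_model, client_updates):
    """FedAvg: average client parameters weighted by local sample counts."""
    total = sum(n for _, n in client_updates)
    avg = copy.deepcopy(client_updates[0][0])
    for key in avg:
        avg[key] = sum(sd[key] * (n / total) for sd, n in client_updates)
    global_model.load_state_dict(avg)
    return global_model

# Toy usage with synthetic clients (illustrative only).
torch.manual_seed(0)
n_features = 16
global_model = MLP(n_features)
clients = []
for _ in range(3):
    x_lab, y_lab = torch.randn(20, n_features), torch.randn(20)  # scarce labels
    x_unlab = torch.randn(200, n_features)                        # abundant unlabeled rows
    clients.append((x_lab, y_lab, x_unlab))

for rnd in range(3):  # a few federated rounds
    updates = [local_update(global_model, *c) for c in clients]
    global_model = fedavg(global_model, updates)
```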
We evaluate our method on the task of predicting Parkinson’s disease severity and show that it outperforms both supervised federated learning and standalone SSL baselines across varying proportions of labeled data. In the most challenging setting, with only 10% of the data labeled, the model achieves an RMSE of 7.74 and an MAE of 6.26, demonstrating its strength under limited supervision. These results show that our method effectively leverages unlabeled data to enhance predictive performance in a privacy-preserving, real-world setting.
| Special/Invited session | QSR |
|---|---|
| Classification | Both methodology and application |
| Keywords | federated learning, semi-supervised learning, AI |