Federated Learning with Uncertainty-Based Client Clustering for Fleet-Wide Fault Diagnosis
Date
2023-04-26
Publisher
arXiv
Abstract
Operators from various industries have been pushing the adoption of wireless sensing nodes for
industrial monitoring, and such efforts have produced sizeable condition monitoring datasets that
can be used to build diagnosis algorithms capable of warning maintenance engineers of impending
failure or identifying current system health conditions. However, single operators may not have
sufficiently large fleets of systems or component units to collect enough data to develop data-driven
algorithms. Collecting a satisfactory quantity of fault patterns for safety-critical systems is
particularly difficult due to the rarity of faults. One potential solution to overcome the challenge of
having limited or not sufficiently representative datasets is to merge datasets from multiple operators
with the same type of assets. This could provide a feasible approach to ensure datasets are large
enough and representative enough. However, directly sharing data across company boundaries
raises privacy concerns. Federated learning (FL) has emerged as a promising solution to leverage
datasets from multiple operators to train a decentralized asset fault diagnosis model while maintaining
data confidentiality. However, there are still considerable obstacles to overcome when it comes to
optimizing the federation strategy without leaking sensitive data and addressing the issue of client
dataset heterogeneity. This is particularly prevalent in fault diagnosis applications due to the high
diversity of operating conditions and system configurations. To address these two challenges, we
propose a novel clustering-based FL algorithm where clients are clustered for federating based on
dataset similarity. To quantify dataset similarity between clients without explicitly sharing data,
each client sets aside a local test dataset and evaluates the other clients’ model prediction accuracy
and uncertainty on this test dataset. Clients are then clustered for FL based on relative prediction
accuracy and uncertainty. Experiments on three bearing fault datasets, two publicly available and
one newly collected for this work, show that our algorithm significantly outperforms FedAvg and a
cosine similarity-based algorithm by 5.1% and 30.7% on average over the three datasets. Further,
using a probabilistic classification model has the additional advantage of accurately quantifying its
prediction uncertainty, which we show it does exceptionally well.
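
The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dissimilarity score (accuracy drop plus uncertainty rise relative to a client's own model), the threshold-based grouping, and all names and numbers below are hypothetical choices made only for illustration. Only summary statistics of cross-evaluation are exchanged here, never raw data.

```python
from typing import List

Matrix = List[List[float]]


def pairwise_distance(acc: Matrix, unc: Matrix) -> Matrix:
    """Symmetric dissimilarity between clients from cross-evaluation results.

    acc[i][j]: accuracy of client j's model on client i's local test set.
    unc[i][j]: mean predictive uncertainty of client j's model on that set.
    The scoring rule is an assumption, not taken from the paper.
    """
    n = len(acc)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Compare the visiting model against client i's own model.
            acc_drop = max(0.0, acc[i][i] - acc[i][j])
            unc_rise = max(0.0, unc[i][j] - unc[i][i])
            d[i][j] = acc_drop + unc_rise
    # Average both directions so the measure is symmetric.
    return [[0.5 * (d[i][j] + d[j][i]) for j in range(n)] for i in range(n)]


def cluster_clients(dist: Matrix, threshold: float) -> List[int]:
    """Group clients linked by dissimilarity below `threshold`
    (connected components; a hypothetical stand-in for the clustering step)."""
    n = len(dist)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def federated_average(weights: List[List[float]], sizes: List[int]) -> List[float]:
    """FedAvg-style mean of flat parameter vectors, weighted by dataset size."""
    total = sum(sizes)
    dim = len(weights[0])
    return [sum(w[k] * s for w, s in zip(weights, sizes)) / total for k in range(dim)]


# Made-up cross-evaluation results for four clients (illustrative only).
acc = [[0.95, 0.93, 0.40, 0.35],
       [0.92, 0.96, 0.38, 0.42],
       [0.30, 0.33, 0.94, 0.91],
       [0.36, 0.31, 0.90, 0.95]]
unc = [[0.10, 0.12, 0.90, 0.95],
       [0.11, 0.09, 0.88, 0.85],
       [0.92, 0.87, 0.08, 0.10],
       [0.90, 0.93, 0.12, 0.09]]

labels = cluster_clients(pairwise_distance(acc, unc), threshold=0.5)
print(labels)  # [0, 0, 1, 1]

# Federate only within a cluster, here the two clients labelled 0.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], sizes=[1, 3]))  # [2.5, 3.5]
```

In this toy setup, clients 0 and 1 perform well and remain confident on each other's test sets, so they are grouped together, and the same holds for clients 2 and 3. Model averaging then happens separately inside each group.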
Type
Preprint
Comments
This is a pre-print of the article Lu, Hao, Adam Thelen, Olga Fink, Chao Hu, and Simon Laflamme. "Federated Learning with Uncertainty-Based Client Clustering for Fleet-Wide Fault Diagnosis." arXiv preprint arXiv:2304.13275 (2023).
DOI: 10.48550/arXiv.2304.13275.
Copyright 2023 The Authors.
Posted with permission.