Active Learning with semi-supervised Gaussian Mixture Model on Network Flow Detection

Date
2023-05
Authors
Song, Wenfei
Major Professor
Advisor
Li, Qi
Cai, Ying
Zhang, Wensheng
Committee Member
Abstract
One of the biggest challenges in network flow detection is that the data distribution is extremely imbalanced. As a result, traditional unsupervised learning cannot find data features effectively, while semi-supervised and supervised methods require a large number of labeled samples to learn from. Active learning can reduce labeling cost through an iterative cycle of training, labeling the most valuable samples, and retraining. Existing work focuses mainly on pool-based active learning in related scenarios and mostly ignores the cold-start problem that arises when no labeled samples are available initially. This report simulates real-world application scenarios more closely through a special pool-stream input setting, focuses on solving the cold-start problem on extremely imbalanced datasets, and proposes a new active learning selection strategy that combines uncertainty and marginal information. The rest of the report is organized as follows. Chapter 1 briefly discusses the background of active learning in network flow detection, and Chapter 2 reviews related work. Chapter 3 presents the preliminary work, and Chapter 4 gives a formal definition of the problem. Chapter 5 describes our work in detail, and Chapter 6 presents the results. Finally, Chapter 7 concludes the report.
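As a rough illustration of what combining uncertainty and margin information in a selection step could look like, the sketch below scores each unlabeled sample from its class posteriors (such as those a Gaussian mixture model might produce) and picks the top k for labeling. This is not the strategy defined in the thesis: the function names, the equal weighting `alpha`, and the example posteriors are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy of a posterior distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin(probs):
    """Gap between the two largest posteriors (smaller = closer to a boundary)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def select_queries(posteriors, k, alpha=0.5):
    """Return indices of the k samples with the highest combined score.

    alpha is a hypothetical weighting, not a value taken from the thesis.
    Assumes every posterior has at least two classes.
    """
    max_entropy = math.log(len(posteriors[0]))
    scores = []
    for i, p in enumerate(posteriors):
        u = entropy(p) / max_entropy   # uncertainty, normalized to [0, 1]
        m = 1.0 - margin(p)            # 1.0 when the top two classes tie
        scores.append((alpha * u + (1 - alpha) * m, i))
    # Highest score first; ties broken by lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]

# Made-up posteriors for four unlabeled flows over two classes.
posteriors = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
picked = select_queries(posteriors, k=2)
print(picked)
```

In an active learning loop, the selected indices would be sent to an annotator, the new labels added to the training set, and the model retrained before the next selection round.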
Type
thesis