Reducing labeling complexity in streaming data mining

Izenov, Yesdaulet

Date

2018-01-01

Authors

Izenov, Yesdaulet

Advisor

Srikanta Tirthapura

Abstract

Supervised machine learning is an approach in which an algorithm estimates a mapping function from labeled data, i.e., from data attributes paired with target values. One of the major obstacles in supervised learning is the labeling step: obtaining labeled data is expensive because it typically requires human effort. A model trained on too little data tends to overfit, so a minimum number of labeled examples is needed to achieve reasonable prediction accuracy. This is also true for streaming machine learning models, whose key concepts are maintaining a model without rebuilding it and performing prediction without ever storing input samples. A successful and widely used streaming model is the Hoeffding tree, which has high labeling complexity. In this work, we present the Frugal Hoeffding tree, a variation of the Hoeffding tree that uses less labeled data while providing performance similar to that of the original Hoeffding tree. We conduct experiments on large real-world datasets, comparing the performance of traditional batch decision trees, the Hoeffding tree, and the Frugal Hoeffding tree. We show that the Frugal Hoeffding tree consumes less labeled data yet can achieve classification performance similar to the Hoeffding tree.
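As background on why a Hoeffding tree needs many labels, the following is a minimal sketch of the standard Hoeffding bound that such trees use to decide when a leaf has seen enough labeled examples to split. It illustrates the baseline Hoeffding tree only; the abstract does not describe how the Frugal Hoeffding tree reduces label use, so nothing here should be read as the thesis's method. The function names, the example gain values, and the choice of delta are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability at least 1 - delta, the true
    mean of a variable with the given range lies within this epsilon of the
    mean observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split rule of a plain Hoeffding tree (hypothetical helper): split a leaf
    once the observed gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def labels_needed(gap: float, value_range: float, delta: float) -> int:
    """Smallest number of labeled examples n for which epsilon drops strictly
    below the given gap (obtained by solving the bound for n)."""
    threshold = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gap ** 2)
    return math.floor(threshold) + 1


if __name__ == "__main__":
    # Assumed illustration values: gain range 1.0, delta = 1e-7.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(can_split(0.30, 0.15, 1.0, 1e-7, 1000))
    print(labels_needed(0.1, 1.0, 1e-7))
```

Because the required n grows with 1/gap^2, attributes whose gains are close together force a leaf to consume many labeled examples before it can split, which is one way to see the labeling cost that the abstract refers to.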

Academic or Administrative Unit

Department of Electrical and Computer Engineering

Type

thesis

Copyright

2018-05-01