Space‐efficient tracking of persistent items in a massive data stream

dc.contributor.author Tirthapura, Srikanta
dc.contributor.department Computer Science
dc.contributor.department Electrical and Computer Engineering
dc.date 2018-04-29T11:15:34.000
dc.date.accessioned 2020-06-30T02:02:34Z
dc.date.available 2020-06-30T02:02:34Z
dc.date.copyright Tue Jan 01 00:00:00 UTC 2013
dc.date.issued 2014-01-01
dc.description.abstract Motivated by scenarios in network anomaly detection, we consider the problem of detecting persistent items in a data stream, which are items that occur ‘regularly’ in the stream. In contrast with heavy hitters, persistent items do not necessarily contribute significantly to the volume of a stream, and may escape detection by traditional volume‐based anomaly detectors. We first show that any online algorithm that tracks persistent items exactly must necessarily use a large workspace, and is therefore infeasible to run on a traffic monitoring node. In light of this lower bound, we introduce an approximate formulation of the problem and present a small‐space algorithm to approximately track persistent items over a large data stream. We experimented with three different datasets to see how the accuracy and memory footprint of the algorithm vary with the skewness of the dataset. Our algorithms performed best on the two of the three datasets that had the highest skewness of persistence and the lowest mean persistence. To our knowledge, this is the first systematic study of the problem of detecting persistent items in a data stream, and our work can help detect anomalies that are temporal, rather than volume‐based.
dc.description.comments This is the peer-reviewed version of the following article: Lahiri, Bibudh, Srikanta Tirthapura, and Jaideep Chandrashekar. "Space‐efficient tracking of persistent items in a massive data stream." Statistical Analysis and Data Mining: The ASA Data Science Journal 7, no. 1 (2014): 70-92, which has been published in final form at DOI: 10.1002/sam.11214 (http://dx.doi.org/10.1002/sam.11214). This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/ece_pubs/177/
dc.identifier.articleid 1177
dc.identifier.contextkey 12009715
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath ece_pubs/177
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/21002
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/ece_pubs/177/2014_Tirthapura_SpaceEfficient.pdf|||Fri Jan 14 21:27:40 UTC 2022
dc.source.uri 10.1002/sam.11214
dc.subject.disciplines Electrical and Computer Engineering
dc.subject.disciplines Systems and Communications
dc.subject.keywords Data streams
dc.subject.keywords persistence
dc.subject.keywords sketches
dc.subject.keywords hash-based filters
dc.title Space‐efficient tracking of persistent items in a massive data stream
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication b0235db2-0a72-4dd1-8d5f-08e5e2e2bf7d
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
File
Original bundle
Name: 2014_Tirthapura_SpaceEfficient.pdf
Size: 532.95 KB
Format: Adobe Portable Document Format