Identifying correlated heavy-hitters in a two-dimensional data stream

Lahiri, Bibudh; Mukherjee, Arko; Tirthapura, Srikanta; Tirthapura, Srikanta

Identifying correlated heavy-hitters in a two-dimensional data stream

dc.contributor.author	Lahiri, Bibudh
dc.contributor.author	Mukherjee, Arko
dc.contributor.author	Tirthapura, Srikanta
dc.contributor.author	Tirthapura, Srikanta
dc.contributor.department	Electrical and Computer Engineering
dc.date	2018-02-19T07:26:29.000
dc.date.accessioned	2020-06-30T02:02:20Z
dc.date.available	2020-06-30T02:02:20Z
dc.date.copyright	Fri Jan 01 00:00:00 UTC 2016
dc.date.issued	2016-07-01
dc.description.abstract	<p>We consider online mining of correlated heavy-hitters (CHH) from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes an aggregate along a secondary dimension. Prior work on identifying heavy-hitters in streams has almost exclusively focused on identifying heavy-hitters on a single dimensional stream, and these yield little insight into the properties of heavy-hitters along other dimensions. In typical applications however, an analyst is interested not only in identifying heavy-hitters, but also in understanding further properties such as: what other items appear frequently along with a heavy-hitter, or what is the frequency distribution of items that appear along with the heavy-hitters. We consider queries of the following form: “In a stream <em>S</em> of (<em>x</em>, <em>y</em>) tuples, on the substream <em>H</em> of all <em>x</em> values that are heavy-hitters, maintain those <em>y</em> values that occur frequently with the <em>x</em> values in <em>H</em>”. We call this problem as CHH. We formulate an approximate formulation of CHH identification, and present an algorithm for tracking CHHs on a data stream. The algorithm is easy to implement and uses workspace much smaller than the stream itself. We present provable guarantees on the maximum error, as well as detailed experimental results that demonstrate the space-accuracy trade-off.</p>
dc.description.comments	<p>This is an Accepted Manuscript of an article published by Taylor & Francis as Lahiri, Bibudh, Arko Provo Mukherjee, and Srikanta Tirthapura. "Identifying correlated heavy-hitters in a two-dimensional data stream." Data Mining and Knowledge Discovery 30, no. 4 (2016): 797-818. Available online: https://doi.org/<a href="http://dx.doi.org/10.1007" target="_blank">10.1007/s10618-015-0438-6</a>. Posted with permission.</p>
dc.format.mimetype	application/pdf
dc.identifier	archive/lib.dr.iastate.edu/ece_pubs/148/
dc.identifier.articleid	1151
dc.identifier.contextkey	11358086
dc.identifier.s3bucket	isulib-bepress-aws-west
dc.identifier.submissionpath	ece_pubs/148
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/20970
dc.language.iso	en
dc.source.bitstream	archive/lib.dr.iastate.edu/ece_pubs/148/2016_Tirthapura_IdentifyingCorrelated.pdf\|\|\|Fri Jan 14 20:26:48 UTC 2022
dc.source.uri	10.1007/s10618-015-0438-6
dc.subject.disciplines	Electrical and Computer Engineering
dc.subject.keywords	Data stream mining
dc.subject.keywords	Correlation
dc.subject.keywords	Heavy-hitters
dc.title	Identifying correlated heavy-hitters in a two-dimensional data stream
dc.type	article
dc.type.genre	article
dspace.entity.type	Publication
relation.isAuthorOfPublication	b0235db2-0a72-4dd1-8d5f-08e5e2e2bf7d
relation.isOrgUnitOfPublication	a75a044c-d11e-44cd-af4f-dab1d83339ff

File

Original bundle

Now showing 1 - 1 of 1

Name:: 2016_Tirthapura_IdentifyingCorrelated.pdf
Size:: 228.47 KB
Format:: Adobe Portable Document Format
Description:

Download

Collections

Publications