Improved triangle counting in graph streams: Neighborhood multi-sampling
In this thesis, we study the problem of estimating the number of triangles of an undirected graph in the data stream model. Some of the well-known streaming algorithms work as follows: Sample a single triangle with high enough probability and repeat this basic step to obtain a global triangle count. For example, the neighborhood sampling algorithm attempts to sample a triangle by randomly choosing a single edge e, a single neighbor f of e and waits for a third edge that completes the triangle. The basic sampling step in the algorithm is repeated multiple times to obtain an estimate for the global triangle count in the input graph stream. In this work, we propose a multi-sampling variant of this algorithm. We provide a theoretical analysis of the algorithm and prove that it improves upon the known space and accuracy bounds. We experimentally show that this algorithm outperforms several well-known triangle counting streaming algorithms.