Persistence homology with applications in analysis and data science

Przybylski, Lee

File

Przybylski_iastate_0097E_19878.pdf (2.56 MB)

Date

2021-12

Authors

Przybylski, Lee

Advisor

Weber, Eric

Keinert, Fritz

Herzog, David

Catanzaro, Michael

Nettleton, Daniel

Abstract

This thesis is a compilation of three research projects that I developed as a graduate student at Iowa State. The main results of these projects are presented in the final three chapters, each of which begins with its own overview introducing the ideas unique to that project. The first chapter gives an overview of the three projects and introduces some of the motivating concepts and notation. Chapter 2 provides the background on topological data analysis needed to understand chapters 3 and 4. We now briefly describe the major contributions of each project.

Topological data analysis (TDA) is often described as a way to characterize the shape of data. This is done by taking a set of data points, computing its Čech complex across a range of resolutions, and recording how the homology groups change in what is called a persistence landscape. This can be done for any collection of points in a metric space, and we apply it to fractals that are the invariant sets of an iterated function system (IFS). Using the self-similarity properties of these fractals and some assumptions about the distance between the images of the similitudes in our IFS, we prove several isomorphisms that allow us to establish a formula for the persistence landscape of the invariant set of an IFS.

Another tool from TDA is the persistence diagram. In this project we develop a novel algorithm for anomaly detection in time-series data using persistence diagrams compared under the bottleneck distance. Specifically, we collect the data at random into reference bags. Then, for each data point, we replace a randomly chosen point in each reference bag with the data point of interest to create a modified bag. The anomaly score of the data point is computed from the bottleneck distances between the persistence diagrams generated by the reference bags and those generated by their corresponding modified bags. These anomaly scores are proven to stabilize as the number of reference bags increases. The algorithm is demonstrated with an application to traffic data to detect known incidents.

The final project deviates from topological data analysis and discusses the Kaczmarz reconstruction for functions in the Hardy space. The Kaczmarz algorithm is an iterative method for reconstructing vectors from inner products, and it is known to be unstable in the presence of noise. We show that, using the relaxed Kaczmarz algorithm, the reconstruction can be stabilized. We also show that for certain noise profiles, such as noise in finite almost everywhere or from certain subspaces of the Hardy space, the relaxed version of the Kaczmarz algorithm can fully remove the corruption by noise in the inner products. By using the spectral representation of stationary sequences, we show that our relaxed version of the Kaczmarz algorithm also stabilizes the reconstruction of Fourier series expansions related to a singular measure on the unit circle.
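For readers unfamiliar with the Kaczmarz iteration, the following is a minimal sketch of its classical finite-dimensional form with a relaxation parameter. It is an illustration under stated assumptions, not the Hardy-space construction studied in the thesis: the function name `relaxed_kaczmarz`, the parameter names `omega` and `sweeps`, and the cyclic row order are all choices made here. Each step moves the current estimate toward the hyperplane defined by one inner product, scaled by `omega` (with `omega = 1` giving the unrelaxed projection).

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def relaxed_kaczmarz(
    rows: Sequence[Sequence[float]],
    data: Sequence[float],
    omega: float = 1.0,
    sweeps: int = 50,
) -> List[float]:
    """Estimate x from measurements data[i] ~ <rows[i], x>.

    Finite-dimensional illustration only (an assumption of this sketch).
    omega is the relaxation parameter; 0 < omega < 2 is the classical
    range in which the iteration converges for consistent data.
    """
    x = [0.0] * len(rows[0])
    for _ in range(sweeps):
        for a, b in zip(rows, data):
            # Relaxed step toward the hyperplane {y : <a, y> = b}.
            step = omega * (b - dot(a, x)) / dot(a, a)
            x = [xi + step * ai for xi, ai in zip(x, a)]
    return x


if __name__ == "__main__":
    # Hypothetical consistent system whose exact solution is (2, -1).
    rows = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    data = [2.0, 1.0, -1.0]
    print(relaxed_kaczmarz(rows, data, omega=0.5, sweeps=100))
```

Choosing `omega` below 1 shortens each step, which is the usual way a relaxed iteration trades speed for reduced sensitivity to perturbed inner products. How the relaxation is defined and analyzed in the thesis may differ from this sketch.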

Type

dissertation