Exploratory analysis of high-dimensional data with visual tools

Jeppson, Haley
Major Professor: Hofmann, Heike
Committee Members: Berg, Emily; Carriquiry, Alicia; Ommen, Danica; Olafsson, Sigurdur
This body of work combines statistical models and data visualizations in new ways. Exploratory data analysis and dimension reduction techniques serve as the backbone for developing visual tools designed to foster new ideas and an understanding of the underlying phenomenon of interest. Chapter 1 presents a thorough review of the literature on exploratory data analysis and graphical methods. Chapter 2 focuses on graphical methods for categorical variables and introduces an R implementation of generalized mosaic plots built on ggplot2 (Wickham 2016), the popular R plotting package based on the grammar of graphics. We develop novel uses of mosaic plots that illustrate how much room multidimensional categorical data visualization methods still have to grow. We conclude with a Shiny application that helps users understand the many forms a mosaic plot can take: with a few keystrokes, users can search through the variables and make structural changes to the plot. Chapter 3 further explores multidimensional categorical data visualizations and develops an approach for using mosaic plots to assess the lack of fit of a given ordinal model. We identify visual indicators for parameters in different models and extend the connection between mosaic plots of binary tables and odds ratios to logistic regression models with ordinal variables. We then extend the concept to an ordinal response variable, which requires cumulative odds ratios and the proportional odds model; with these we assess higher-dimensional interaction terms. The second part of Chapter 3 develops techniques for visual model diagnostics and selection with mosaic plots, resulting in a graphical forward stepwise selection procedure.
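As a small illustration of the quantities named above (not code from the dissertation, whose implementation is in R), the Python sketch below computes the odds ratio of a 2×2 table and the cumulative odds ratios of a 2×J table whose columns are ordered response categories. Each cumulative odds ratio collapses the ordinal response at a cut point j into Y ≤ j versus Y > j and takes the 2×2 odds ratio of the collapsed table; a proportional odds model assumes these ratios are equal across cut points. The function names and example counts are hypothetical.

```python
from fractions import Fraction


def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]].

    Uses exact fractions; assumes b*c is nonzero (no empty off-diagonal cells).
    """
    (a, b), (c, d) = table
    return Fraction(a * d, b * c)


def cumulative_odds_ratios(table):
    """Cumulative odds ratios of a 2 x J table.

    Rows are the two levels of a binary explanatory variable; columns are the
    ordered response categories 1..J. For each cut point j = 1..J-1 the columns
    are collapsed into (Y <= j, Y > j) and the 2x2 odds ratio is returned.
    """
    row0, row1 = table
    n_levels = len(row0)
    ratios = []
    for j in range(1, n_levels):
        collapsed = [
            [sum(row0[:j]), sum(row0[j:])],
            [sum(row1[:j]), sum(row1[j:])],
        ]
        ratios.append(odds_ratio(collapsed))
    return ratios
```

For the hypothetical counts `[[10, 20, 30], [30, 20, 10]]`, both cut points give a cumulative odds ratio of 1/5, the constant pattern that a proportional odds model assumes. Unequal ratios across cut points would instead point toward lack of fit of that assumption.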
We connect the model space and the data space by representing model residuals with jittered points on the mosaic plot, extending methods suggested by Theus and Lauer (1999) and Friendly (2002). We conclude by adding a second phase of backward steps that tightens the model constraints, replacing some parameters with the structured terms for ordinal classifications used in association models. Chapter 4 turns to exploratory analysis of high-dimensional numeric data using tours. We present a new type of tour that incorporates novelty detection, providing a versatile approach to unsupervised dimension reduction. The method fuses multiple projection pursuit indices while maintaining a short memory, avoiding both the randomness of the grand tour and the narrow view of projection pursuit optimization. We evaluate the method with simulated data, and the results highlight its flexibility. This work expands visual methods for exploring diverse types of high-dimensional data.
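Projection pursuit, which the tour in Chapter 4 builds on, scores low-dimensional projections of the data with an index and favors projections that score well. The Python sketch below is only a plain random search over 2-D projections for readers new to the idea; it is not the novelty-detection tour described above. The function names, the holes-style index, and the per-column standardization (used here in place of full sphering) are illustrative assumptions.

```python
import numpy as np


def random_basis(p, d=2, rng=None):
    """Random p x d matrix with orthonormal columns (a projection basis)."""
    rng = np.random.default_rng(rng)
    q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return q


def holes_index(proj):
    """Holes-style index for projected data of shape (n, d).

    The value grows when projected points avoid the center of the projection
    and is 0 when every point sits at the origin.
    """
    n, d = proj.shape
    num = 1.0 - np.mean(np.exp(-0.5 * np.sum(proj ** 2, axis=1)))
    den = 1.0 - np.exp(-d / 2.0)
    return num / den


def random_search_pp(X, index, n_bases=500, d=2, seed=0):
    """Score many random d-dimensional projections; return the best basis and score.

    Columns of X are centered and scaled to unit variance first (a simplification:
    correlated columns are not decorrelated, so this is not full sphering).
    """
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    best_basis, best_value = None, -np.inf
    for _ in range(n_bases):
        basis = random_basis(X.shape[1], d, rng)
        value = index(Xs @ basis)
        if value > best_value:
            best_basis, best_value = basis, value
    return best_basis, best_value
```

A random search like this keeps only the single best view and forgets the path that led there; a grand tour, by contrast, moves through projections without any index guiding it. The tour in Chapter 4 is described as sitting between these two extremes.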