Explorations of the lineup protocol for visual inference: application to high dimension, low sample size problems and metrics to assess the quality

dc.contributor.advisor Dianne Cook
dc.contributor.author Roy Chowdhury, Niladri
dc.contributor.department Statistics
dc.date 2018-08-11T14:34:31.000
dc.date.accessioned 2020-06-30T02:53:34Z
dc.date.available 2020-06-30T02:53:34Z
dc.date.copyright Wed Jan 01 00:00:00 UTC 2014
dc.date.embargo 2015-07-30
dc.date.issued 2014-01-01
dc.description.abstract <p>Statistical graphics play an important role in exploratory data analysis, model checking and diagnosis. Recent developments suggest that visual inference helps to quantify the significance of findings made from graphics. In visual inference, lineups embed the plot of the data among a set of null plots and engage a human observer to select the plot that is most different from the rest. If the observer selects the data plot, this corresponds to rejecting the null hypothesis. With high-dimensional data, statistical graphics are obtained by plotting low-dimensional projections; for example, in classification tasks, projection pursuit is used to find low-dimensional projections that reveal differences between labelled groups. In many contemporary data sets the number of observations is small relative to the number of variables, a setting known as the high dimension, low sample size (HDLSS) problem. The research described in this thesis explores the use of visual inference for understanding low-dimensional pictures of HDLSS data. This approach may help broaden the data analysis community's understanding of issues related to HDLSS data. Methods are illustrated using data from a published paper that erroneously reported real separation in microarray data. The thesis also describes metrics developed to support the use of lineups for making inferential statements. These metrics measure the quality of a lineup and help explain what people see in the data plots. The null plots represent a finite sample from a null distribution, and the particular sample drawn can affect how easy or difficult a lineup is. Distance metrics are designed to describe how close the true data plot is to the null plots, and how close the null plots are to each other.
The distribution of the distance metrics is studied to learn how well it matches what people detect in the plots, the effect of the null-generating mechanism, and the effect of plot choice for particular tasks. The analysis drew on data from Amazon Mechanical Turk studies in which lineups were used to study an array of exploratory data analysis tasks. Finally, the thesis presents an R package that provides open source tools for using visual inference and distance metrics.</p>
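The two quantitative ideas in the abstract, turning observer picks into a test and comparing a data plot with its null plots, can be sketched briefly. This is a minimal illustration under stated assumptions, not the thesis's method or its R package: the p-value assumes K observers judge an m-panel lineup independently and that, under the null, each panel is equally likely to be picked, so the count of data-plot picks is Binomial(K, 1/m). The binned distance is a hypothetical stand-in for the thesis's distance metrics (scatterplot points assumed scaled to [0, 1]). All function names are illustrative.

```python
import itertools
import math
import statistics
from math import comb


def lineup_pvalue(x: int, k: int, m: int = 20) -> float:
    """P(X >= x) for X ~ Binomial(k, 1/m).

    Assumption: k independent observers, m panels, and under the null
    every panel is equally likely to be chosen. The m=20 default is an
    illustrative choice of lineup size.
    """
    if not 0 <= x <= k:
        raise ValueError("need 0 <= x <= k")
    p = 1.0 / m
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(x, k + 1))


def binned_counts(points, nbins=5, lo=0.0, hi=1.0):
    """Hypothetical helper: count (x, y) points in an nbins x nbins grid."""
    counts = [0] * (nbins * nbins)
    width = (hi - lo) / nbins
    for x, y in points:
        # Clamp so points on the upper edge fall in the last bin.
        i = min(max(int((x - lo) / width), 0), nbins - 1)
        j = min(max(int((y - lo) / width), 0), nbins - 1)
        counts[i * nbins + j] += 1
    return counts


def binned_distance(a, b, nbins=5):
    """Hypothetical metric: Euclidean distance between binned counts."""
    ca, cb = binned_counts(a, nbins), binned_counts(b, nbins)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(ca, cb)))


def lineup_distances(data_plot, null_plots, nbins=5):
    """Mean distance data-to-nulls, and mean distance among null pairs."""
    to_data = [binned_distance(data_plot, n, nbins) for n in null_plots]
    among_nulls = [
        binned_distance(a, b, nbins)
        for a, b in itertools.combinations(null_plots, 2)
    ]
    return statistics.mean(to_data), statistics.mean(among_nulls)
```

For example, `lineup_pvalue(1, 1)` is 0.05: a single observer picking the data plot from a 20-panel lineup matches a 5% test. In the distance sketch, a data plot that sits far from the nulls while the nulls sit close to one another suggests an easy lineup; the reverse suggests a hard one.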
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/13988/
dc.identifier.articleid 4995
dc.identifier.contextkey 6199714
dc.identifier.doi https://doi.org/10.31274/etd-180810-3358
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/13988
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/28175
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/13988/RoyChowdhury_iastate_0097E_14402.pdf|||Fri Jan 14 20:05:19 UTC 2022
dc.subject.disciplines Statistics and Probability
dc.subject.keywords distance metrics
dc.subject.keywords lineup
dc.subject.keywords projection pursuit
dc.subject.keywords statistical graphics
dc.subject.keywords visualization
dc.title Explorations of the lineup protocol for visual inference: application to high dimension, low sample size problems and metrics to assess the quality
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
Original bundle
2.93 MB
Adobe Portable Document Format