Resampling methods for score likelihood ratio based inference for source attribution problems
Date
2024-12
Authors
Veneri Guarch, Federico Alejandro
Major Professor
Advisor
Ommen, Danica M.
Vivekananda, Roy
Carriquiry, Alicia
Niemi, Jarad
Nordman, Daniel
Committee Member
Abstract
This dissertation addresses source attribution problems, an inferential task that contrasts two opposing propositions regarding the origin of items. These inferential problems arise in multiple domains but play a key role in forensic science.
Because the evidence found in practical applications is complex, machine learning has been proposed as a way to evaluate the similarity between items when it is not feasible to build the probabilistic model that a traditional likelihood ratio requires. Score-based likelihood ratio inference thus provides an alternative framework for assessing the strength of statistical evidence in this context.
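As a rough illustration of the general score-based likelihood ratio idea (not the dissertation's specific method), the sketch below reduces a pair of items to a similarity score and then compares the density of that score under the same-source and different-source propositions. The score function, the normal density fits, and the toy data are all assumptions chosen for brevity; in practice the score might come from a machine learning model and the densities from a more flexible estimator.

```python
from statistics import NormalDist


def score(a, b):
    """Hypothetical similarity score for two scalar measurements.

    Values closer to zero mean the two items are more alike.
    """
    return -abs(a - b)


def fit_slr(same_source_scores, diff_source_scores):
    """Fit one normal density per proposition and return a score -> SLR function.

    Assumption: scores under each proposition are roughly normal. This is a
    simplification made only for this sketch.
    """
    same = NormalDist.from_samples(same_source_scores)
    diff = NormalDist.from_samples(diff_source_scores)

    def slr(s):
        # Ratio of score densities: values above 1 favor the same-source
        # proposition, values below 1 favor the different-source proposition.
        return same.pdf(s) / diff.pdf(s)

    return slr


# Toy training scores (made up for illustration only).
same_scores = [-0.1, -0.2, -0.05, -0.15, -0.3]
diff_scores = [-1.0, -2.0, -1.5, -0.8, -2.5]

slr = fit_slr(same_scores, diff_scores)
close_pair = slr(score(1.0, 1.1))  # two similar items
far_pair = slr(score(1.0, 3.0))    # two dissimilar items
```

Note that this sketch treats the training scores as independent. Scores built from pairs that share items or sources are not independent, which is the dependence structure the resampling plans in this dissertation are designed to address.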
Our work focuses on the common source and specific source inferential problems and addresses the dependence structure that arises when training and estimation sets are created to develop these inferential systems. We present resampling plans that account for this dependence and show how ensemble learning approaches could strengthen current methods.
Chapter 2 introduces Strong Source Resampling (SSR), a source-aware resampling plan for the common source problem. This idea is extended to Weak Source Resampling (WSR) in Chapter 4. These resampling plans are used to develop base systems, which are then combined into a final value of evidence using the ensemble learning approach proposed in Chapter 2.
Chapter 3 focuses on the specific source problem, introducing synthetic source anchoring, which uses synthetic items as data augmentation, allowing the development of specific source score likelihood ratios.
Lastly, Chapter 4 introduces discrepancy metrics for score likelihood ratio-based inference that can be used to study model misspecification and the effects of not accounting for dependence.
Simulation results and applications in both chapters suggest that combining ensemble learning with source-aware resampling could provide a stronger, more stable value of evidence in the correct direction for both machine learning and simple score-based likelihood ratios. Chapter 5 provides general conclusions and some avenues for further research.
Type
dissertation