Resampling methods for score likelihood ratio based inference for source attribution problems

Veneri Guarch, Federico Alejandro

File

VeneriGuarch_iastate_0097E_21857.pdf (10.8 MB)

Date

2024-12

Authors

Veneri Guarch, Federico Alejandro

Advisor

Ommen, Danica M.

Roy, Vivekananda

Carriquiry, Alicia

Niemi, Jarad

Nordman, Daniel

Abstract

This dissertation addresses source attribution problems, an inferential task that contrasts two opposing propositions regarding the origin of items. These inferential problems arise in multiple domains but play a key role in forensic science. Because the evidence found in practical applications is complex, it is often not feasible to construct the probabilistic model that a traditional likelihood ratio requires. Machine learning has therefore been proposed as an alternative way to evaluate the similarity between items, and score-based likelihood ratio inference provides an alternative framework to assess the strength of statistical evidence in this context. Our work focuses on the common source and specific source inferential problems. It addresses the dependence structure that arises when training and estimation sets are created to develop these inferential systems. We present resampling plans that remedy the shortcomings this dependence causes and show how ensemble learning approaches could strengthen current methods. Chapter 2 introduces Strong Source Resampling (SSR), a source-aware resampling plan for the common source problem, and proposes an ensemble learning approach that combines base systems built from these resampling plans into a final value of evidence. Chapter 3 focuses on the specific source problem and introduces synthetic source anchoring, which uses synthetic items as data augmentation so that specific source score likelihood ratios can be developed. Chapter 4 extends the SSR idea to Weak Source Resampling (WSR) and introduces discrepancy metrics for score likelihood ratio-based inference that can be used to study model misspecification and the effects of not accounting for dependence. Simulation results and applications in both chapters suggest that combining ensemble learning with source-aware resampling could provide stronger, more stable values of evidence in the correct direction, for both machine learning and simple score-based likelihood ratios.
Chapter 5 provides general conclusions and some avenues for further research.
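The sketch below is a minimal illustration of the general ideas named in the abstract. It is not the dissertation's SSR, WSR, or ensemble method. The following choices are all assumptions made for illustration:

* items are scalar measurements;
* the score is an absolute difference;
* score densities are Gaussian kernel density estimates;
* the resampling step is a plain bootstrap over whole sources;
* the combination step is a simple bagged average of log10 score-based likelihood ratios (SLRs).

All function names (`kde`, `score`, `score_sets`, `slr`, `bagged_log10_slr`) are hypothetical. The example shows why the resampling unit is the source. Pairwise scores that share items or sources are dependent, so the sketch resamples whole sources rather than individual scores.

```python
import math
import random
import statistics
from itertools import combinations


def kde(samples):
    """Gaussian kernel density estimate with the normal-reference (1.06 * sd * n^-1/5) bandwidth."""
    n = len(samples)
    h = max(1.06 * statistics.stdev(samples) * n ** (-0.2), 1e-6)
    norm = n * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm


def score(x, y):
    """Hypothetical dissimilarity score: absolute difference of two scalar measurements."""
    return abs(x - y)


def score_sets(data, source_ids):
    """Build same-source and different-source score sets from the listed (possibly repeated) sources."""
    same, diff = [], []
    for sid in source_ids:
        same.extend(score(a, b) for a, b in combinations(data[sid], 2))
    for i, j in combinations(range(len(source_ids)), 2):
        if source_ids[i] == source_ids[j]:
            continue  # two bootstrap copies of one source are not a different-source pair
        diff.extend(score(a, b) for a in data[source_ids[i]] for b in data[source_ids[j]])
    return same, diff


def slr(s, same, diff, floor=1e-300):
    """Score-based likelihood ratio: estimated density of s under same source over different source."""
    num = max(kde(same)(s), floor)  # floor avoids division by zero when a density underflows
    den = max(kde(diff)(s), floor)
    return num / den


def bagged_log10_slr(s, data, n_resamples=50, seed=0):
    """Average log10 SLR over base systems fitted to source-level bootstrap resamples."""
    rng = random.Random(seed)
    ids = list(data)
    logs = []
    for _ in range(n_resamples):
        draw = rng.choices(ids, k=len(ids))  # resample whole sources, not individual items or scores
        same, diff = score_sets(data, draw)
        if len(same) < 2 or len(diff) < 2:
            continue  # too few scores to fit a density
        logs.append(math.log10(slr(s, same, diff)))
    return statistics.mean(logs)
```

For example, with toy data such as `{k: [10.0 * k + 0.1 * j for j in range(4)] for k in range(6)}`, a small observed score like `0.2` gives a positive bagged log10 SLR, which supports the same-source proposition. A large score like `30.0` gives a negative one.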

Academic or Administrative Unit

Statistics (LAS)

Type

dissertation