Ensemble learning for score likelihood ratios under the common source problem

Date
2023-08-04
Authors
Veneri, Federico
Ommen, Danica M.
Publisher
Wiley Periodicals LLC
Organizational Units
Organizational Unit
Center for Statistics and Applications in Forensic Evidence
The Center for Statistics and Applications in Forensic Evidence (CSAFE) carries out research on the scientific foundations of forensic methods, develops novel statistical methods, and transfers knowledge and technological innovations to the forensic science community. We collaborate with more than 80 researchers across six universities to develop solutions that support our forensic community partners through accessible tools, open-source databases, and educational opportunities.
Organizational Unit
Statistics

The Department of Statistics seeks to teach students the theory and methodology of statistics and statistical analysis, preparing them for entry-level work in business, industry, commerce, government, or academia.

History
The Department of Statistics was formed in 1948, emerging from the functions performed at the Statistics Laboratory. Originally included in the College of Sciences and Humanities, in 1971 it became co-directed with the College of Agriculture.

Dates of Existence
1948-present

Abstract
Machine learning-based score likelihood ratios (SLRs) have emerged as alternatives to traditional likelihood ratios and Bayes factors to quantify the value of evidence when contrasting two opposing propositions. When developing a conventional statistical model is infeasible, machine learning can be used to construct a (dis)similarity score for complex data and estimate the ratio of the conditional distributions of the scores. Under the common source problem, the opposing propositions address whether two items come from the same source. To develop their SLRs, practitioners create datasets using pairwise comparisons from a background population sample. These comparisons result in a complex dependence structure that violates the independence assumption made by many popular methods. We propose a resampling step to remedy this lack of independence and an ensemble approach to enhance the performance of SLR systems. First, we introduce a source-aware resampling plan to construct datasets where the independence assumption is met. Using these newly created sets, we train multiple base SLRs and aggregate their outputs into a final value of evidence. Our experimental results show that this ensemble SLR can outperform a traditional SLR approach in terms of the rate of misleading evidence and discriminatory power, and that it yields more consistent results.
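The resample-train-aggregate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the synthetic two-dimensional background data, the Euclidean distance as dissimilarity score, the rule that each source contributes to at most one comparison per resampled set, the normal densities fitted to the scores, and the median of the base log-SLRs as the aggregation rule. The function names (`make_background`, `source_aware_sample`, `fit_base_slr`, `ensemble_log_slr`) are hypothetical.

```python
import math
import random
import statistics


def make_background(rng, n_sources=40, n_items=5):
    """Synthetic background sample (assumption): source -> list of 2-D items."""
    background = {}
    for s in range(n_sources):
        cx, cy = rng.gauss(0, 3), rng.gauss(0, 3)  # source-level centre
        background[s] = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
                         for _ in range(n_items)]
    return background


def score(a, b):
    """Dissimilarity score (assumption): Euclidean distance between two items."""
    return math.dist(a, b)


def source_aware_sample(background, rng):
    """Draw comparisons so that each source is used in at most one pair."""
    sources = list(background)
    rng.shuffle(sources)
    half = len(sources) // 2
    # First half of the sources: one same-source comparison each.
    same = [score(*rng.sample(background[s], 2)) for s in sources[:half]]
    # Remaining sources: disjoint pairs give different-source comparisons.
    rest = sources[half:]
    diff = [score(rng.choice(background[s]), rng.choice(background[t]))
            for s, t in zip(rest[0::2], rest[1::2])]
    return same, diff


def log_pdf(dist, x):
    """Log density of a fitted normal, computed directly to avoid underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)


def fit_base_slr(same, diff):
    """Base SLR (assumption): ratio of normal densities fitted to the scores."""
    num = statistics.NormalDist.from_samples(same)
    den = statistics.NormalDist.from_samples(diff)
    return lambda x: log_pdf(num, x) - log_pdf(den, x)


def ensemble_log_slr(background, n_base=25, seed=0):
    """Train n_base SLRs on independent resamples; aggregate by the median."""
    rng = random.Random(seed)
    base = [fit_base_slr(*source_aware_sample(background, rng))
            for _ in range(n_base)]
    return lambda x: statistics.median(f(x) for f in base)


background = make_background(random.Random(1))
ensemble = ensemble_log_slr(background, n_base=25, seed=2)
# A small score should favour "same source", a large score "different sources".
print(ensemble(0.3), ensemble(6.0))
```

In this sketch a positive log-SLR supports the same-source proposition and a negative one supports the different-source proposition. Other score functions, density estimators, or aggregation rules could be substituted without changing the overall structure.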
Comments
This article is published as F. Veneri and D. M. Ommen, Ensemble learning for score likelihood ratios under the common source problem, Stat. Anal. Data Min.: ASA Data Sci. J. 16 (2023), 528–546. https://doi.org/10.1002/sam.11637. © 2023 The Authors. Posted with permission of CSAFE.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.