Likelihood ratios for categorical count data with applications in digital forensics

Thumbnail Image
Longjohn, Rachel
Smyth, Padhraic
Stern, Hal S.
Major Professor
Committee Member
Journal Title
Journal ISSN
Volume Title
© The Authors (2022)
Research Projects
Organizational Units
Organizational Unit
Center for Statistics and Applications in Forensic Evidence
The Center for Statistics and Applications in Forensic Evidence (CSAFE) carries out research on the scientific foundations of forensic methods, develops novel statistical methods and transfers knowledge and technological innovations to the forensic science community. We collaborate with more than 80 researchers and across six universities to drive solutions to support our forensic community partners with accessible tools, open-source databases and educational opportunities.
Journal Issue
Is Version Of
We consider the forensic context in which the goal is to assess whether two sets of observed data came from the same source or from different sources. In particular, we focus on the situation in which the evidence consists of two sets of categorical count data: a set of event counts from an unknown source tied to a crime and a set of event counts generated by a known source. Using a same-source versus different-source hypothesis framework, we develop an approach to calculating a likelihood ratio. Under our proposed model, the likelihood ratio can be calculated in closed form, and we use this to theoretically analyse how the likelihood ratio is affected by how much data is observed, the number of event types being considered, and the prior used in the Bayesian model. Our work is motivated in particular by user-generated event data in digital forensics, a context in which relatively few statistical methodologies have yet been developed to support quantitative analysis of event data after it is extracted from a device. We evaluate our proposed method through experiments using three real-world event datasets, representing a variety of event types that may arise in digital forensics. The results of the theoretical analyses and experiments with real-world datasets demonstrate that while this model is a useful starting point for the statistical forensic analysis of user-generated event data, more work is needed before it can be applied for practical use.
This article is published as Rachel Longjohn and others, Likelihood ratios for categorical count data with applications in digital forensics, Law, Probability and Risk, Volume 21, Issue 2, June 2022, Pages 91–122, Posted with permission of CSAFE.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.