PREMIER — PRobabilistic error-correction using Markov inference in errored reads

Date
2013-01-01
Authors
Yin, Xin
Song, Zhao
Dorman, Karin
Abstract

In this work we present a flexible, probabilistic, and reference-free method of error correction for high-throughput DNA sequencing data. The key is to exploit the high coverage of sequencing data and to model short sequence outputs (reads) as independent realizations of a hidden Markov model (HMM). We pose error correction of reads as a maximum likelihood sequence detection problem over this HMM. Because time and memory considerations rule out implementing the full Baum-Welch algorithm (for parameter estimation) and the optimal Viterbi algorithm (for error correction), we propose low-complexity approximate versions of both. Specifically, for error correction we propose an approximate Viterbi algorithm and an algorithm based on sequential decoding. Our results show that, compared with Reptile, a state-of-the-art error correction method, our methods consistently achieve superior performance on both simulated and real data sets.
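
To make the idea of error correction as maximum likelihood sequence detection concrete, the following is a minimal sketch of standard (exact) Viterbi decoding on a toy HMM. It is not the approximate algorithm proposed in the paper. Everything in it is an assumption made for illustration: the hidden state is a single true base rather than whatever state space the paper uses, errors are uniform substitutions at a fixed rate, and the names `viterbi_correct`, `NEXT`, `trans`, and `init` are hypothetical.

```python
import math

BASES = "ACGT"


def viterbi_correct(read, init, trans, error_rate):
    """Return the most likely true sequence for an observed read.

    Toy assumptions: the hidden state is the true base; the observed base
    equals it with probability 1 - error_rate and is otherwise one of the
    three other bases, chosen uniformly.
    """
    def log_emit(true_base, observed):
        p = 1.0 - error_rate if true_base == observed else error_rate / 3.0
        return math.log(p)

    # score[s] = best log-probability of any state path ending in s
    score = {s: math.log(init[s]) + log_emit(s, read[0]) for s in BASES}
    backpointers = []
    for observed in read[1:]:
        new_score, pointers = {}, {}
        for s in BASES:
            # Best predecessor of s under the current scores
            prev = max(BASES, key=lambda r: score[r] + math.log(trans[r][s]))
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + log_emit(s, observed))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state
    state = max(BASES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical transition model: the sequence strongly favors A->C->G->T->A.
NEXT = {"A": "C", "C": "G", "G": "T", "T": "A"}
trans = {r: {s: (0.97 if NEXT[r] == s else 0.01) for s in BASES} for r in BASES}
init = {s: 0.25 for s in BASES}

# The read carries one substitution error (T observed as A at index 7).
corrected = viterbi_correct("ACGTACGAACGT", init, trans, error_rate=0.05)
print(corrected)
```

In this toy setting the decoder weighs one unlikely emission against two unlikely transitions and prefers the former, so the erroneous base is replaced. The state space here has only four states; the abstract's point is that for realistic models the exact recursion becomes too costly in time and memory, which motivates the approximate variants.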

Type
article
Comments

This is a manuscript of a proceeding from the IEEE Global Conference on Signal and Information Processing 2013: 73, doi:10.1109/ISIT.2013.6620502. Posted with permission.

Rights Statement
Copyright
2013