Bioinformatics Support of Genome Sequencing Projects

Date
2002-01-01
Abstract

The genome of an organism is the "book of life": it encodes the complete set of genetic instructions for the development of the organism. Structurally, a genome is a linear sequence of nucleotides, and determining that sequence lays the foundation for understanding biology at the molecular level. With current biotechnology, determining the sequence of a genome is a challenging task. A sequencing machine can read the sequence of a piece of DNA only up to about 1000 bp (base pairs), whereas genomes are very large. For example, the genome of the bacterium E. coli is about 4 Mb (million base pairs) in size, the genome of the nematode C. elegans is about 100 Mb, and the human genome is about 3 Gb. Because sequencing machines cannot produce long sequences directly, long sequences must be reconstructed from short sequence reads. A shotgun sequencing strategy is widely used to determine the sequence of a long segment of DNA. In this strategy, multiple copies of the DNA segment are randomly cut into small pieces, the sequence of each piece is read by an automated sequencing machine, and a computer program reconstructs the sequence of the large DNA segment from the short reads. The sequence assembly problem is to assemble these short reads into long sequences. What makes the problem non-trivial is that there is no information about how the short reads are ordered with respect to the DNA segment.
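To make the assembly problem concrete, the following is a minimal Python sketch of shotgun sampling followed by a simple greedy overlap-merge assembler. The greedy heuristic is a textbook illustration chosen for brevity; it is not necessarily the method described in this chapter. The function names (`shotgun_reads`, `overlap`, `drop_contained`, `greedy_assemble`) and the `min_len` overlap threshold are hypothetical. The sketch assumes error-free reads taken from a single DNA strand.

```python
import random
from itertools import permutations


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Simulate shotgun sequencing by sampling reads at random start positions.

    The reads carry no record of where they came from: this is the missing
    ordering information that makes assembly non-trivial.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Candidate positions are where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def drop_contained(reads: list[str]) -> list[str]:
    """Remove duplicate reads and reads lying entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest suffix-prefix overlap.

    Returns the resulting contigs; more than one contig means some pieces
    could not be joined (for example, because of a coverage gap).
    """
    contigs = drop_contained(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_len)
            if olen > best_len:
                best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlap of at least min_len bases
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])
    return contigs


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The greedy choice is only safe when every overlap reflects a true adjacency in the DNA segment. Repeats longer than the overlap threshold, sequencing errors, and reads from the opposite strand can all lead it to mis-assemble, which is why practical assemblers use more careful overlap detection and layout.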

Comments

This chapter was published as Huang, Xiaoqiu. "Bioinformatics Support of Genome Sequencing Projects." In Thomas Lengauer (Ed.), Bioinformatics - From Genomes to Drugs (2002): 25-48. Copyright Wiley-VCH Verlag GmbH & Co. KGaA. Reproduced with permission.

Copyright
2002