Algorithms for hierarchical clustering of gene expression data

Date
2004-01-01
Authors
Komarina, Srikanth
Organizational Unit
Electrical and Computer Engineering

The Department of Electrical and Computer Engineering (ECpE) contains two focuses. The focus on Electrical Engineering teaches students in the fields of control systems, electromagnetics and non-destructive evaluation, microelectronics, electric power & energy systems, and the like. The Computer Engineering focus teaches in the fields of software systems, embedded systems, networking, information security, computer architecture, etc.

History
The Department of Electrical Engineering was formed in 1909 from the division of the Department of Physics and Electrical Engineering. In 1985 its name changed to Department of Electrical Engineering and Computer Engineering. In 1995 it became the Department of Electrical and Computer Engineering.

Dates of Existence
1909-present

Historical Names

  • Department of Electrical Engineering (1909-1985)
  • Department of Electrical Engineering and Computer Engineering (1985-1995)

Department
Electrical and Computer Engineering
Abstract

Genes are the parts of the genome that encode proteins in an organism. Proteins play an important part in many biological processes in any organism. Measuring the expression level of a gene helps biologists estimate the amount of protein produced by that gene. Microarrays can be used to measure the expression levels of thousands of genes in a single experiment. Using additional techniques such as clustering, various correlations among genes of interest can be found. The most commonly used clustering technique for microarray data analysis is hierarchical clustering. Various metrics, such as Euclidean distance, Manhattan distance, and the Pearson correlation coefficient, have been used to measure (dis)similarity between genes. A commonly used software package for hierarchical clustering based on the Pearson correlation coefficient takes O(N^3) time to cluster N genes, even though there are algorithms that can reduce the runtime to O(N^2). In this thesis, we show how the runtime can be reduced to O(N log N) by using a geometric interpretation of the Pearson correlation coefficient, and we show that this runtime is optimal.
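The abstract does not spell out which geometric interpretation the thesis uses. As an illustration only, here is one standard geometric reading of the Pearson correlation coefficient: if each expression profile is centered to zero mean and scaled to unit length, the correlation r of two profiles equals their dot product, and their squared Euclidean distance is 2(1 - r). The function names and the expression values in this sketch are made up for the example and are not taken from the thesis.

```python
import math


def standardize(profile):
    """Center a profile to zero mean and scale it to unit Euclidean length."""
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant profile: Pearson correlation is undefined")
    return [c / norm for c in centered]


def pearson(a, b):
    """Pearson correlation, computed as the dot product of standardized profiles."""
    return sum(x * y for x, y in zip(standardize(a), standardize(b)))


def squared_distance(a, b):
    """Squared Euclidean distance between the standardized profiles."""
    ua, ub = standardize(a), standardize(b)
    return sum((x - y) ** 2 for x, y in zip(ua, ub))


# Hypothetical expression profiles for three genes across four experiments.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.1, 5.9, 8.2]
gene_c = [4.0, 3.0, 2.0, 1.0]

for other in (gene_b, gene_c):
    r = pearson(gene_a, other)
    d2 = squared_distance(gene_a, other)
    # The identity d^2 = 2 * (1 - r) holds up to floating-point error.
    assert math.isclose(d2, 2.0 * (1.0 - r), abs_tol=1e-12)
```

Under this reading, highly correlated genes are nearby points on a unit sphere, so ranking gene pairs by correlation is the same as ranking them by Euclidean distance after standardization. Whether the thesis builds its O(N log N) algorithm on exactly this identity is an assumption here, not something the abstract states.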

Copyright
2004