Exploring statistical methods for analysis of microarray data
The expansion of molecular biology in recent years has produced a growing volume of data and growing interest in specialized tools to analyze them. Many of these data come from high-throughput technologies that measure hundreds or thousands of variables at the same time. One such technology currently in use is the microarray. The three major objectives in expression analysis are preprocessing the data, identifying differential expression, and grouping genes by common behavior. Preprocessing is needed because extracting useful information on gene expression from the raw output is not trivial: the data collection process is noisy, and non-biological bias may be introduced at a number of points by the operators or the technology. Identifying differential expression is an important step in reducing the number of variables of interest, p, to a manageable scale. It requires distinguishing random variation in expression measurements from the signal of interest. Most statistical research so far has focused on this problem, and many methods exist for making the determination. Finally, grouping genes has biological importance in identifying the function of uncharacterized genes and the interconnections between biological systems. We focus on the first and last of these objectives and use relatively standard methods for the second.