Application of gene ontology and rough sets theory in predicting molecular functions and biological processes for microarray gene expression data
The functions of most plant genes are unknown, even in the best-studied organisms. Determining or estimating the molecular functions and biological processes in which a gene is involved can help interpret and understand metabolic pathways, and generating hypotheses about the functions or processes of unknown genes helps in designing more meaningful experiments. Traditional methods of microarray data analysis assume that genes with similar expression profiles share similar functions or biological processes, or conversely that genes with similar functions or processes share similar expression profiles. In fact, genes with different functions or in different processes may have similar expression profiles, and genes with similar profiles may have entirely different functions or be involved in different processes. To avoid relying on this assumption, we used a supervised method. Unlike the commonly used clustering methods, which start the analysis directly from the expression profiles, our approach used both background knowledge (Gene Ontology annotations) and expression profiles. First, the genes in the expression data were annotated with a small set of broad Gene Ontology (GO) terms, chosen according to their positions in the Directed Acyclic Graph (DAG) of GO. Then rough set theory was applied to generate rules that characterize each class, so that the resulting classifier can assign unknown or unclear genes to these broad GO classes. Finally, the trained classifier was used to predict the classes of the unknown genes. The method gave reasonable results on both a yeast cell cycle data set and an Arabidopsis time-course data set.
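The two main steps described in the abstract (mapping specific GO annotations to broad classes through the GO DAG, then inducing rough-set rules from expression profiles) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the GO fragment, term IDs, gene names, and the discretized "up"/"down"/"flat" profiles are hypothetical toy data, and the rules use all attributes rather than a computed reduct, which a full rough set analysis would normally search for to shorten the rules.

```python
from collections import defaultdict

# Hypothetical toy fragment of the GO DAG: child term -> list of parent terms.
# Real GO terms can have several parents and several relation types.
GO_PARENTS = {
    "GO:A1": ["GO:A"],
    "GO:A2": ["GO:A1"],
    "GO:B1": ["GO:B"],
    "GO:B2": ["GO:B1"],
}
BROAD = {"GO:A", "GO:B"}  # hypothetical broad classes chosen high in the DAG


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def broad_class(term, broad_terms, parents):
    """Map a specific GO annotation to the broad class(es) above it."""
    return sorted(ancestors(term, parents) & broad_terms)


def indiscernibility_classes(table):
    """Group genes whose condition attributes (profiles) are identical."""
    blocks = defaultdict(set)
    for gene, (profile, _label) in table.items():
        blocks[profile].add(gene)
    return blocks


def lower_upper(table, label):
    """Rough-set lower and upper approximations of one decision class."""
    target = {g for g, (_, lab) in table.items() if lab == label}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table).values():
        if block <= target:   # block certainly belongs to the class
            lower |= block
        if block & target:    # block possibly belongs to the class
            upper |= block
    return lower, upper


def certain_rules(table):
    """One 'if profile then class' rule per block lying inside a single class."""
    rules = {}
    for profile, block in indiscernibility_classes(table).items():
        labels = {table[g][1] for g in block}
        if len(labels) == 1:
            rules[profile] = labels.pop()
    return rules


def classify(profile, rules):
    """Predict the broad class of a gene, or None if no certain rule matches."""
    return rules.get(profile)


# Hypothetical training genes: specific GO annotation and an expression
# profile discretized into 'up'/'down'/'flat' at two time points.
ANNOTATIONS = {"g1": "GO:A2", "g2": "GO:A1", "g3": "GO:B2",
               "g4": "GO:B1", "g5": "GO:A2"}
PROFILES = {"g1": ("up", "down"), "g2": ("up", "down"), "g3": ("down", "up"),
            "g4": ("flat", "up"), "g5": ("flat", "up")}

# Decision table: gene -> (profile, broad class). Each toy term has exactly
# one broad ancestor, so taking element [0] is safe here.
TABLE = {g: (PROFILES[g], broad_class(ANNOTATIONS[g], BROAD, GO_PARENTS)[0])
         for g in PROFILES}
RULES = certain_rules(TABLE)

print(RULES)                            # {('up', 'down'): 'GO:A', ('down', 'up'): 'GO:B'}
print(classify(("up", "down"), RULES))  # GO:A
print(classify(("flat", "up"), RULES))  # None
```

In this toy table, genes g4 and g5 share the profile ("flat", "up") but fall under different broad classes, so they lie in the boundary region (upper minus lower approximation) and no certain rule is produced for that profile. A new gene with that profile is left unclassified rather than assigned a guessed class.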