WPI - Computer Science and Biology Departments
Project Description


DNA sequencing and gene expression studies have produced enormous databases that contain raw genetic sequence information together with gene expression data (information about which tissues activate particular genes). Progress in computational techniques for analyzing such databases has been much slower. Most studies have been limited to applying known statistical techniques to individual genes or functional sites. Yet there is every reason to believe that interactions between multiple sites are crucially important, for example in controlling the transcription of DNA into RNA. At present, very few automated techniques are capable of performing such multi-point analysis.

The objective of our project is to develop new algorithms and visualization tools for multi-point analysis of genomic data. Although the techniques developed in this work will have broad applicability, significant progress can only be made with guidance from target applications. Our work aims to contribute to two major problems in molecular biology: how multiple single nucleotide polymorphisms (SNPs) interact in their contributions to disease, and how gene expression can be predicted on the basis of multiple interacting DNA sequences. Our approach will require significant theoretical as well as algorithmic innovations, synthesizing techniques from data mining, machine learning, pattern recognition, and scientific visualization.

