Research Projects

Our work focuses on the application of computational techniques to solve problems in biology. We also collaborate directly with biologists to help them solve real-life biology problems.


Multiple Sequence Alignment

The focus is on investigating alternative formulations of multiple sequence alignment and on utilizing additional sequences from biological databases to improve alignment performance. Dissatisfied with the NP-hardness of all previous formulations, we have proposed a polynomial time solvable formulation of multiple alignment that achieves comparable performance with the best heuristics. This result challenges the common conception that the multiple alignment problem is hard and provides significant theoretical advances. A second project is based on the observation that alignment of divergent sequences can be improved by searching for and incorporating information in additional sequences from database search. Instead of using hits directly from database search, we have investigated the use of intermediate sequences to remove excessive noise from the hits and achieved significant improvements in performance over current state-of-the-art approaches. The most recent project makes better use of horizontal information by taking into account alignment of neighboring residues when aligning two residues.

Papers

Software


Motif Finding with Applications to Predicting Transcription Factor Binding Sites

The focus is on developing new models and formulations to improve motif finding performance. We have developed an algorithm that finds DNA motifs by skipping nonconserved positions, an algorithm that integrates sample-driven and pattern-driven approaches to significantly improve computation time, and an improved pattern-driven algorithm that achieves very good biological performance.

Papers

Software


Biological Network Analysis

The focus is on developing efficient algorithms to extract functional substructures from biological networks and on identifying related subnetworks from a given network. We have considered the problem of identifying complexes from a protein interaction network by considering different types of complexes separately. We have also considered the problem of identifying linear biological pathways within a network and developed a new divide-and-conquer technique to obtain improved probabilistic and deterministic algorithms with a much lower time complexity than previous approaches. In another direction, we have investigated the problem of finding a path within a given graph that is most similar to another given path, with applications to identifying conserved pathways in different organisms.

Papers

Software


Identification of Gene Clusters

The focus is on investigating improved formulations to better model gene clusters. We have developed a supervised protein family classification algorithm that overcomes the problems of existing supervised and unsupervised algorithms, an algorithm that finds conserved gene clusters on multiple genomes, an algorithm that allows large-scale analysis of gene clustering in bacteria, and an algorithm that extracts statistically significant gene clusters from a genome with the help of gene ontology annotations.

Papers

Software


Collaborations with Biologists

Papers