SI2-SSE: Reducing the Complexity of Comparative Genomics with Online Analytical Processing

  • Gibas, Cynthia J. (PI)
  • Kosara, Robert R. (CoPI)

Project Details

Description

Genome comparison is a common bioinformatics analysis task, but a survey of the literature suggests that comparative genomic studies are done in an ad hoc, investigator-dependent, and non-reproducible fashion. Comparative genomics analysis questions can generally be formulated as set queries: what differentiates genome A from genome B, or from a broader group of its taxonomic neighbors? This project will develop a data warehouse-type database system optimized for comparative genomics that is particularly suited to answer these kinds of questions. It will store sequence-linked biological data in a way that supports OLAP (On-Line Analytical Processing) and complex set-based queries. A workflow tool will be developed for guiding the user through core comparative genomic operations, and will serve as an interface for populating the data warehouse. An interactive query tool will allow the user to easily construct complex questions about the data. Set-based as well as individual record results will be presented to the user in a way that can be easily browsed, compared, and exported.
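As a rough illustration of the set-query formulation described above, the following minimal Python sketch treats each genome as a set of feature identifiers and answers "what differentiates genome A from genome B, or from a broader group of neighbors?" with ordinary set operations. The genome names, the feature identifiers, and the helper functions unique_to and shared_by_all are hypothetical and chosen only for illustration; the sketch does not represent the project's actual schema or software.

```python
# Hypothetical feature sets: each genome is represented by the set of
# gene-family identifiers annotated in it (all names are illustrative).
genome_features = {
    "genome_A": {"fam001", "fam002", "fam003", "fam007"},
    "genome_B": {"fam001", "fam002", "fam004"},
    "genome_C": {"fam001", "fam003", "fam005"},
}


def unique_to(target, others, features):
    """Return features present in `target` but absent from every genome in `others`."""
    seen_elsewhere = set().union(*(features[name] for name in others))
    return features[target] - seen_elsewhere


def shared_by_all(names, features):
    """Return features present in every genome listed in `names` (must be non-empty)."""
    return set.intersection(*(features[name] for name in names))


# What differentiates genome A from genome B?
a_vs_b = unique_to("genome_A", ["genome_B"], genome_features)

# What differentiates genome A from a broader group of neighbors?
a_vs_group = unique_to("genome_A", ["genome_B", "genome_C"], genome_features)

# Which features form the common core of all three genomes?
core = shared_by_all(["genome_A", "genome_B", "genome_C"], genome_features)
```

In a data warehouse setting, the same questions would presumably be expressed as set-based queries over stored feature tables rather than over in-memory sets; the sketch only shows the logical shape of such queries.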

The system will enable the user to generate and compare genomic feature sets following a guided workflow defined by and incorporating common elements of analysis used in current microbial genome studies. Parameters and results from each step in the process will be tracked for later reporting. The system will enable both biology-driven comparison of genomic feature sets and, perhaps more importantly, systematic inquiry into and comparison of bioinformatics analysis results obtained at each workflow stage. All software and database structures developed in this project will be made available under an open-source license and as runnable virtual machine images. The latter will make it possible for scientists to get started without complicated installation and configuration procedures, and instead focus on their research questions. All user-facing parts of the software will be accessible through a web browser, making use of the latest developments in browser-based interaction and HTML5. A diverse group of graduate students will receive interdisciplinary training in this project, and K-12 and underrepresented groups will be included through ongoing partnerships with the STARS Alliance program.

Status: Finished
Effective start/end date: 15/9/10 to 31/8/15

Funding

  • National Science Foundation: US$448,253.00

ASJC Scopus Subject Areas

  • Genetics
  • Molecular Biology
  • Computer Science(all)
