ITR: The GriPhyN Project: Towards Petascale Virtual-Data Grids

Project Details

Description

The goal of this project is to provide the Information Technology (IT) advances required for petabyte-scale, data-intensive science in the 21st century. Driving the research are unprecedented requirements for geographically dispersed extraction of complex scientific information from very large collections of measured data. To meet these requirements, the GriPhyN (Grid Physics Network) team of seven IT research groups and four frontier physics experiments will pursue IT advances centered on the creation of Petascale Virtual Data Grids (PVDG). Only PVDG technology can meet the data-intensive computational needs of a diverse community of thousands of scientists spread across the globe.

GriPhyN's physicists (the CMS and ATLAS experiments at the Large Hadron Collider, the Laser Interferometer Gravitational-wave Observatory (LIGO), and the Sloan Digital Sky Survey (SDSS)) are about to enter a new era of exploration of the fundamental forces of nature and the structure of the universe. The data analysis for these experiments presents enormous IT challenges. Thousands of scientists, connected worldwide by various networks, need to perform computationally demanding analyses of data sets growing from 100 terabytes to 100 petabytes. The scale of this task far outpaces our current ability to manage and process data in a distributed environment, demanding fundamental IT advances.

To meet these challenges, GriPhyN will pursue an aggressive program of IT research to realize the concept of Virtual Data. This encompasses the definition and delivery of a (potentially unlimited) virtual space of data products derived from experimental data. In this virtual data space, requests can be satisfied by direct access and/or by computation, depending on the requirements of local and global resource management and security policies. GriPhyN's IT researchers will target advances in the areas of:

  • Virtual data technologies, including information models and new methods of managing software components for virtual data manipulation,

  • Request planning and scheduling, including mechanisms for requesting and enforcing policy constraints in a networked environment, and

  • Task execution, including agent computing and other paradigms for meeting user requirements for performance, reliability, and cost.
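
The virtual data idea described above, in which a request is satisfied either by direct access to a stored product or by computing it from its inputs, can be illustrated with a minimal sketch. This is not GriPhyN's implementation or the Virtual Data Toolkit API: the `VirtualDataCatalog` class, its method names, and the sample data are all hypothetical, and the simple `cache` flag only stands in for the resource management and security policies mentioned in the text.

```python
from typing import Callable, Dict, Sequence, Tuple


class VirtualDataCatalog:
    """Hypothetical catalog: a name maps to stored data or to a derivation recipe."""

    def __init__(self) -> None:
        self._materialized: Dict[str, object] = {}
        self._recipes: Dict[str, Tuple[Callable[..., object], Tuple[str, ...]]] = {}
        self.computations = 0  # counts how many derivations were actually executed

    def store(self, name: str, value: object) -> None:
        """Record a data product that already exists (e.g. measured data)."""
        self._materialized[name] = value

    def register(self, name: str, func: Callable[..., object],
                 inputs: Sequence[str]) -> None:
        """Record how a product could be derived from other named products."""
        self._recipes[name] = (func, tuple(inputs))

    def request(self, name: str, cache: bool = True) -> object:
        """Satisfy a request by direct access if possible, else by computation."""
        if name in self._materialized:
            return self._materialized[name]  # direct access
        if name not in self._recipes:
            raise KeyError(f"no data and no recipe for {name!r}")
        func, inputs = self._recipes[name]
        args = [self.request(dep, cache) for dep in inputs]  # resolve inputs first
        self.computations += 1
        value = func(*args)
        if cache:  # stand-in for a policy decision on whether to keep the result
            self._materialized[name] = value
        return value


# Illustrative data only: a raw product plus two derived products.
catalog = VirtualDataCatalog()
catalog.store("raw_events", [3, 8, 15, 4, 23])
catalog.register("selected", lambda ev: [e for e in ev if e > 5], ["raw_events"])
catalog.register("mean_selected", lambda s: sum(s) / len(s), ["selected"])

result = catalog.request("mean_selected")  # derived on demand via "selected"
again = catalog.request("mean_selected")   # now served by direct access
```

In this sketch the first request triggers two derivations, while the repeated request triggers none because the result was kept. A real planner would weigh recomputation against data transfer and storage costs rather than always caching.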

To apply these advances to experimental data analysis problems, GriPhyN will package them in a multi-faceted, domain-independent Virtual Data Toolkit and use this toolkit to prototype the PVDG technology for the CMS, ATLAS, LIGO, and SDSS data analysis tasks. In the process, it will create PVDG systems for community use. This work will also educate a new generation of interdisciplinary scientists with expertise in the critical area of data-intensive computing. The benefits will not be unique to physics, but will also apply to problems in biology (e.g., the Human Genome Project), the environment (e.g., the Earth Observing System), and many other areas.

Status: Finished
Effective start/end date: 1/9/00 to 31/8/07

Funding

  • National Science Foundation: US$12,274,841.00

ASJC Scopus Subject Areas

  • Physics and Astronomy(all)
  • Computer Science(all)
