CyberTraining: DSE: Cyber Carpentry: Data Life-Cycle Training using the Datanet Federation Consortium Platform

  • Rajasekar, Arcot A.K. (PI)
  • Xu, Hao H. (CoPI)
  • Feinberg, Melanie M. (CoPI)

Project Details

Description

The emergence of massive data collections has ushered in a paradigm shift in the way scientific research is conducted and new knowledge is discovered. This shift requires that students be trained in team-based, interdisciplinary, complex data-oriented approaches designed to translate scientific data into new solutions in order to promote the progress of science; to advance the national health, prosperity and welfare; and to secure the national defense. The proliferation of cyberinfrastructure (CI) tools necessitates addressing the needs of domain scientists from multiple angles, including data access, metadata management, large-scale analytics and workflows, data and application discovery and sharing, and data preservation. Training with such a holistic perspective is daunting when the landscape of tools and solutions is still fragmented. Integrated solutions, such as the Datanet Federation Consortium (DFC) Platform, ease this overload and touch upon all of these needed functionalities. The aim of this project is to make it easier for the next-generation workforce in STEM disciplines to learn all aspects of data-intensive computing environments and, more importantly, to work together with other researchers with complementary expertise.

Students in STEM disciplines need to be educated in (i) practices of data organization, (ii) the importance of provenance, metadata and ontology, (iii) conformance to authentication, authorization and access control protocols, (iv) models for data sharing, discovery and curation, (v) the necessity of reproducible data science workflows, (vi) practices for large-scale data computation using supercomputers and cloud computing, and (vii) distributed data management practices. The Datanet Federation Consortium (DFC) is an NSF-funded project that has implemented a data-centered cyber platform integrating tools for end-to-end data life-cycle management and data-intensive high-performance computation. This project aims to use the DFC Platform to train STEM graduate students in leading-edge data-intensive practices, covering all aspects of data-intensive computing environments. The training workshops will be multi-disciplinary, including earth system sciences, biological sciences, social and information sciences, marine sciences and engineering. The short-term goal of the project is to provide intensive, short-duration, discipline-centric training workshops, called Cyber Carpentries. These workshops will lead to Certificates in Data Science, preparing a better scientific workforce with advanced data-intensive CI capabilities. In the long term, the project plans to develop self-paced tutorials and a sequence of courses that can be adapted to different STEM disciplines, with a concentration in data life-cycle management and data-intensive computing. The practicums will involve large datasets from multiple science data repositories, including several NSF-funded large-scale cyberinfrastructure projects such as iRODS, CyVerse, DataONE, SEAD, TerraPop, DataVerse and HydroShare, all of which are integrated through the DFC Platform.
The project will recruit students from HBCUs and MSIs for its workshops and will work closely with faculty from these universities to help them adopt the courses developed through this project. All material developed as part of the project will be made available as open course material.

Status: Finished
Effective start/end date: 1/11/17 to 31/10/21

Funding

  • National Science Foundation: US$499,641.00

ASJC Scopus Subject Areas

  • Earth and Planetary Sciences(all)
  • Computer Science(all)
