Matching and Data Integration

  • Ma, Zongming (Principal Investigator)

Project details

Description

Demand for matching and integrating datasets arises in a wide range of application fields where the data collection process is distributed and cumulative, where data related to different aspects of a complex system must be collected separately through different protocols, and where anonymity is maintained in data communication. There is a great need to develop efficient algorithms and sound theoretical understanding for matching in these settings. Ideally, these developments should be based on concrete application scenarios, such as those arising in single-cell biology and privacy-aware social network analysis. The project will provide modeling, methods, theory, and software implementations for matching and data integration to researchers in the broader scientific community, including but not limited to cell biology, psychology, telecommunications engineering, and medicine. These methods will be especially attractive to medical researchers and cell biologists as they bear the potential of unleashing the full power of single-cell data these researchers have accumulated over time with substantial human resources and financial costs. This project will also make contributions to human resource development. The investigator will focus on improving diversity in statistics and data science research through active recruitment of students to work on the project.

This project will pursue three progressively more challenging goals. The first is to study the matching of two datasets with partially overlapping features, emphasizing the benefit of including non-overlapping features. The second is to develop methods for matching two datasets with disjoint feature sets. They jointly serve as preparatory steps toward the third goal: to develop theoretically and/or empirically justifiable pipelines for matching more than two datasets and integrating them into a single dataset to be used in downstream analyses. The developed methods and pipelines will be benchmarked and validated on real data through close collaborations with field experts and research labs at Stanford University, the University of Pennsylvania, and the University of North Carolina.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Start date/end date: 1/7/22 to 31/10/23 (day/month/year)

Funding

  • National Science Foundation: USD 239,925.00

ASJC Scopus Subject Areas

  • Medicine (all)
  • Mathematics (all)
  • Physics and Astronomy (all)
