Semiparametric Regression Analysis of Interval-Censored Data in Current Cohort Studies

  • Zeng, Donglin D (PI)

Project Details

Description

PROJECT SUMMARY
In epidemiological cohort studies, the onset of an asymptomatic disease (e.g., diabetes, hypertension, chronic
obstructive pulmonary disease, HIV infection, SARS-CoV-2 infection, cancer, or dementia) cannot be observed
directly but rather is known to occur sometime between two consecutive clinical examinations. The two examinations
bookend a time interval, such that the event time is “interval-censored”. It is highly challenging to analyze
interval-censored data because none of the event times is exactly known; therefore, investigators have resorted
to statistical methods that are unreliable or even invalid. The broad, long-term objectives of this research project
are to develop semiparametric regression models, with associated inference procedures and numerical algorithms,
for analyzing interval-censored data from current epidemiological investigations. The specific aims of the
project are: (1) to explore semiparametric regression models for assessing the impact of an interval-censored
event (e.g., onset of diabetes) on future outcomes (e.g., stroke, heart attack, methylation level); (2) to build a
system of proportional intensity models with random effects for analyzing interval-censored multi-state processes
that characterize disease progression over time; (3) to provide graphical and numerical techniques for checking
the adequacy of semiparametric regression models with interval-censored data; and (4) to relax the proportional
hazards assumption by allowing time-varying regression coefficients. All of these aims are motivated by the
unmet methodological needs in the cohort studies that the investigators are currently conducting and address
the most timely and important issues in human population health research. The estimation of model parameters
is based on nonparametric likelihood (with an arbitrary event-time distribution) and other sound statistical
principles. The large-sample properties of the estimators will be established rigorously through innovative use
of modern empirical process theory, semiparametric efficiency theory, and other advanced mathematical arguments.
Computationally efficient and stable algorithms will be created to implement the inference procedures.
The operating characteristics of the numerical algorithms and inference procedures will be evaluated extensively
through simulation studies that mimic real data. The proposed methods will be applied to the Atherosclerosis Risk
in Communities Study and the SubPopulations and InteRmediate Outcome Measures In COPD Study, both of
which are being carried out at the University of North Carolina at Chapel Hill. These studies exemplify the broad
challenges and opportunities arising from modern epidemiological research. The results will be published in both
statistical and medical journals. Efficient, reliable, user-friendly, open-access, and well-documented R packages
will be produced and disseminated to the broad scientific community. This research will create new paradigms
in survival analysis, advance population health research in the United States and elsewhere, and accelerate the
search for effective strategies to prevent and treat many diseases of critical importance to public health, including
cardiovascular disease, lung disease, diabetes, cancer, HIV/AIDS, and dementia.
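
To make the term "interval-censored" concrete: a subject whose event is known only to fall in the interval (L, R] contributes P(L < T <= R) = S(L) - S(R) to the likelihood, where S is the survival function. The following is a minimal sketch of that idea, assuming an exponential event-time distribution purely for illustration (the project itself leaves the event-time distribution arbitrary). The function name and the example intervals are hypothetical and are not taken from the project.

```python
import math


def interval_censored_loglik(rate, intervals):
    """Log-likelihood of interval-censored event times.

    Assumes, for illustration only, an exponential model with hazard `rate`.
    Each interval is (left, right): the event occurred in (left, right].
    right = math.inf means no event had been seen by the last examination.
    """
    def survival(t):
        # S(t) = exp(-rate * t); S(inf) = 0.
        return 0.0 if math.isinf(t) else math.exp(-rate * t)

    total = 0.0
    for left, right in intervals:
        # Each subject contributes P(left < T <= right) = S(left) - S(right).
        total += math.log(survival(left) - survival(right))
    return total


# Hypothetical examination intervals, in years since enrollment.
example_intervals = [(0.0, 2.0), (2.0, 5.0), (5.0, math.inf)]
print(interval_censored_loglik(0.2, example_intervals))
```

Maximizing this quantity over `rate` would give a fully parametric estimate; the semiparametric approach described in the summary instead avoids fixing the form of the event-time distribution.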
Status: Active
Effective start/end date: 10/3/24 → 28/2/25

Funding

  • National Heart, Lung, and Blood Institute: US$396,020.00

ASJC Scopus Subject Areas

  • Public Health, Environmental and Occupational Health
