BBB Research: Longitudinal Data Analysis

Many of the Division's studies are longitudinal and involve sampling frameworks such as schools, families (parent-child triads), couples, maternal/fetal pairs, or individuals. Longitudinal studies pose inherent methodological challenges, including attrition, difficulty in making statistical inferences when data are correlated, and difficulty in characterizing complex longitudinal patterns. Many of the Branch's independent research projects address one or more of these issues in the context of substantive problems in the Division's studies. Some examples of the BBB's research on longitudinal data analysis are briefly described below.

Modeling Longitudinal Menstrual Cycle Data
Joint Modeling of Time-to-Event and Longitudinal Data
Modeling Longitudinal Data with an Informative Number of Measurements
Analyzing High-dimensional Continuously Collected Longitudinal Data in Small Samples

Modeling Longitudinal Menstrual Cycle Data

The BioCycle Study, conducted by the Epidemiology Branch (EB), is a longitudinal cohort study aimed at determining the association between oxidative stress levels and endogenous reproductive hormone levels. Characterizing the longitudinal profiles of multiple biomarkers over the course of the menstrual cycle is a challenging methodological problem. BBB investigators, along with those in EB, are working to develop new model classes for characterizing menstrual cycle variation in longitudinal biomarkers when measurements occur at irregular time points. The length of the menstrual cycle may be related to the underlying cyclic patterns of important hormone levels. BBB and EB investigators are currently seeking new approaches for jointly modeling the length of a menstrual cycle and the underlying cyclic pattern of important biomarkers. These joint models will help investigators to better understand the interrelationships between hormonal patterns and the length of the menstrual cycle, and to make valid inferences about the hormonal patterns during the menstrual cycle when measurements are at fixed, irregular time points during the cycle.

Joint Modeling of Time-to-Event and Longitudinal Data

Understanding the relationships between longitudinal data and the time to an event (such as time to ovulation or time to pregnancy) poses difficult methodological challenges. The longitudinal data are often subject to measurement error, which must be accounted for in the analysis. Joint modeling of event-time and longitudinal data is one approach for dealing with the measurement error problem. BBB investigators are currently working on approaches for jointly modeling the two processes in two settings: (1) when the longitudinal data are binary indicators of behavior, such as intercourse, in a time-to-pregnancy study; and (2) when the longitudinal data are high-dimensional continuous variables, such as a multiplex panel of cytokines in a longitudinal biomarker study. These statistical methodologies will not only be important tools for Division studies, such as the Longitudinal Investigation of Fertility and the Environment (LIFE) Study and the BioCycle Study, but will also be applicable across a wide range of applications in epidemiologic and clinical research.

Modeling Longitudinal Data with an Informative Number of Measurements

Most methodology for analyzing longitudinal data assumes that the number and timing of follow-up measurements are not related to the underlying longitudinal response for an individual. In many Division studies, however, follow-up measurements are taken only when a follow-up visit is clinically indicated. For example, in studying the trend of fetal growth in terms of body weight, normal fetuses (those with normal body weights) might get fewer ultrasound tests than abnormal fetuses (those with large or small body weights). As a result, normally growing fetuses might contribute fewer observations (i.e., ultrasound readings) than abnormally growing fetuses. Thus, a standard approach to modeling fetal weight that ignores this informative number of ultrasound readings would yield biased estimates. Any accurate analysis approach has to properly account for this informative observation problem. The PRB longitudinal trial of behavioral intervention for Type 1 diabetes provides another example of this "number of measurements" problem. In this case, the number of follow-up visits varies across individuals and is determined by usual care. BBB investigators are currently developing new statistical methods that account for this type of observation process.
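
The bias described in the fetal-growth example can be illustrated with a small simulation. The sketch below is not the BBB's methodology; it is a minimal illustration under assumed values. Each simulated fetus has a true growth z-score drawn from a standard normal distribution, fetuses with an absolute z-score above 1.5 receive six ultrasound readings while all others receive two, and each reading adds independent measurement noise. The function names, the threshold, the scan counts, and the noise level are all hypothetical. The quantity estimated is the population mean square of the readings, which is 1.09 under these assumptions. Pooling all readings over-represents the abnormal fetuses, while giving every fetus equal total weight (inverse cluster-size weighting, one of several known remedies) does not.

```python
import random


def simulate(n_fetuses=20000, seed=0):
    """Simulate ultrasound readings with an informative number of scans.

    Assumptions (illustrative only): true z-score ~ N(0, 1); reading noise
    ~ N(0, 0.3^2); 6 scans if |z| > 1.5 ("abnormal"), otherwise 2 scans.
    Returns a list of clusters, one list of readings per fetus.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_fetuses):
        z = rng.gauss(0.0, 1.0)
        n_scans = 6 if abs(z) > 1.5 else 2
        clusters.append([z + rng.gauss(0.0, 0.3) for _ in range(n_scans)])
    return clusters


def pooled_mean_square(clusters):
    """Naive estimate: treat every reading as an equally weighted observation."""
    readings = [y for cluster in clusters for y in cluster]
    return sum(y * y for y in readings) / len(readings)


def cluster_weighted_mean_square(clusters):
    """Average the per-fetus mean square, so each fetus counts once
    regardless of how many scans it received."""
    per_fetus = [sum(y * y for y in c) / len(c) for c in clusters]
    return sum(per_fetus) / len(per_fetus)


if __name__ == "__main__":
    data = simulate()
    true_value = 1.0 + 0.3 ** 2  # variance of z plus noise variance
    print("true mean square:        ", round(true_value, 3))
    print("pooled estimate:         ", round(pooled_mean_square(data), 3))
    print("cluster-weighted estimate:", round(cluster_weighted_mean_square(data), 3))
```

In this setup the pooled estimate overstates the spread of fetal weights because fetuses with extreme z-scores contribute three times as many readings as the others. The cluster-weighted estimate stays close to the true value, but only because the number of scans here depends on the fetus-level z-score alone; more complex observation processes would need a model for the visit process itself.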

Analyzing High-dimensional Continuously Collected Longitudinal Data in Small Samples

This research is motivated by the Naturalistic Teenage Driving Study, which is led by PRB with collaborators at the Virginia Tech Transportation Institute. The study followed 41 drivers during the first 18 months of independent, licensed driving using an in-vehicle data-recording system installed in participants' vehicles. The instrumentation included accelerometers, cameras, Global Positioning Systems (GPS), front radar, and a lane tracker. The study produced a rich set of data that could shed light on many interesting scientific questions, such as whether and how teenagers' driving performance varies over time, and whether poor driving performance predicts a crash. Because the data consist of extensive observations on a small number of teen drivers, their analysis poses unique statistical challenges. Synthesizing the large amount of data poses some of the same challenges encountered in analyzing "-omics" data. For example, analyzing the extensive information collected on each of potentially thousands of distinct car trips over an 18-month period raises problems similar to those encountered in analyzing high-dimensional genomic data. BBB investigators are exploring new statistical approaches for making inferences in these types of situations.