The analysis of genetic data is an active area of biostatistics research that presents unique opportunities along with statistical challenges, especially for data related to birth defects. For example, genetic information is typically available for an affected child along with one or both parents (triads), resulting in up to three genomes for study. Incorporating such genetic information into the analysis is complex, but it reflects the family structure within which birth defects arise. Some examples of current research include:
- Multiple Comparisons in Genetic Testing and Methods of Genetic Association Testing
- Statistical Methodology for Understanding Copy Number Variation
- Statistical Methods for Mendelian Randomization in Case-Control Studies
Multiple Comparisons in Genetic Testing and Methods of Genetic Association Testing
BBB investigators are addressing the need for new approaches to the multiple comparison problems inherent in the large number of statistical tests conducted in many genetic epidemiology studies. For example, this methodology will be important in analyzing genetic association studies that involve a large number of single nucleotide polymorphisms (SNPs), such as studies of genetic effects on neural tube defects (NTDs).
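To make the multiple comparison problem concrete, the sketch below applies two standard textbook corrections, Bonferroni and Benjamini-Hochberg, to a list of per-SNP p-values. These are common baselines only and are not the new approaches under development at BBB; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four SNPs (illustrative only, not real data).
snp_p_values = [0.001, 0.012, 0.03, 0.5]
print(bonferroni(snp_p_values))
print(benjamini_hochberg(snp_p_values))
```

Bonferroni controls the chance of any false positive and becomes very conservative as the number of SNPs grows, while Benjamini-Hochberg controls the expected proportion of false positives among the rejected tests and typically rejects more.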
BBB investigators are also working on new approaches to testing for genetic associations in genetic epidemiologic studies of triads (case child and parents). These methods aim to exploit the triad structure, providing a more powerful test of genetic association compared to standard genetic association tests. These new approaches were directly motivated by BBB staff collaborations with Epidemiology Branch (EB) investigators who are studying NTDs.
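As a point of reference for triad-based testing, the sketch below implements the classical transmission disequilibrium test (TDT), a standard test that uses the triad structure. It is not the new BBB methodology. It assumes a biallelic SNP with genotypes coded as the number of copies (0, 1, or 2) of the allele of interest, complete triads, and hypothetical function names.

```python
import math


def count_transmissions(triads):
    """Count transmissions from heterozygous parents.

    Each triad is (mother, father, child), coded as copies of the allele
    of interest. Returns (b, c): how often heterozygous parents did (b)
    and did not (c) transmit that allele to the affected child.
    """
    b = c = 0
    for mother, father, child in triads:
        n_het = sum(1 for g in (mother, father) if g == 1)
        if n_het == 0:
            continue  # no heterozygous parent: triad is uninformative
        # A homozygous parent always passes on g // 2 copies (0 or 1).
        from_homozygous = sum(g // 2 for g in (mother, father) if g != 1)
        from_het = child - from_homozygous
        if not 0 <= from_het <= n_het:
            raise ValueError("genotypes inconsistent with Mendelian inheritance")
        b += from_het
        c += n_het - from_het
    return b, c


def tdt(b, c):
    """McNemar-type TDT statistic and p-value (chi-square, 1 df)."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical triads (illustrative only, not real data).
example_triads = [(1, 0, 1), (1, 2, 1), (1, 1, 2), (1, 1, 1), (0, 0, 0)]
b, c = count_transmissions(example_triads)
print(b, c, tdt(b, c))
```

Because the test compares transmitted with untransmitted alleles within families, it is not confounded by population stratification, which is one reason triad designs are attractive.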
Statistical Methodology for Understanding Copy Number Variation
Copy number variations, the varying number of copies of small chromosome segments, are more useful for explaining larger phenotype variations than the commonly used SNPs. However, the actual copy numbers of chromosome segments are usually not directly observable; instead, the measured signal intensities at each SNP reflect the underlying copy number state. Statistical methods that use hidden Markov models (HMMs) to infer the unobserved copy number state from the observed signal intensities have been proposed for the study of unrelated individuals. Researchers also want to extend the existing approach to related individuals, such as family members. To address this issue, BBB investigators developed HMMs that incorporate Mendelian inheritance information among family members and, hence, can more accurately uncover the latent copy number states. These new approaches will be useful for analyzing genome-wide association studies (GWAS) in future Division and Institute studies.
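The idea of inferring a hidden copy number state from signal intensities can be sketched with a small HMM decoded by the Viterbi algorithm. This sketch covers only a single unrelated individual and does not include the Mendelian inheritance extension described above. The three states, the Gaussian emission means, the standard deviation, and the transition probability are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical model: copy number states and the mean signal intensity
# (for example, a log ratio) expected in each state.
STATES = (1, 2, 3)                    # deletion, normal, duplication
MEANS = {1: -0.6, 2: 0.0, 3: 0.4}     # assumed emission means
SD = 0.2                              # assumed common standard deviation
STAY = 0.98                           # assumed probability of keeping the same state


def log_emission(x, state):
    """Log density of intensity x under a Gaussian centered at the state mean."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(intensities):
    """Return the most likely copy number state at each SNP."""
    if not intensities:
        return []
    # Uniform prior over the starting state.
    score = {s: -math.log(len(STATES)) + log_emission(intensities[0], s)
             for s in STATES}
    back = []
    for x in intensities[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(x, cur))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical intensities along a chromosome (illustrative only).
signal = [0.02, -0.05, 0.01, -0.62, -0.58, -0.65, -0.55, 0.03, -0.01]
print(viterbi(signal))
```

The high probability of staying in the same state makes the decoder favor contiguous segments, so a sustained drop in intensity is called as a deletion while a single noisy SNP is not.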
Statistical Methods for Mendelian Randomization in Case-Control Studies
Given their susceptibility to confounding and reverse causation, the effects of intermediate phenotypes (e.g., fasting insulin levels) on disease (e.g., Type 2 diabetes) estimated from case-control or other observational studies are often difficult to interpret. Mendelian randomization is a technique that adjusts for known or unknown confounding in a case-control setting by carefully selecting a gene as an instrumental variable in the causal pathway, and then using the gene-phenotype association and the gene-disease association to obtain a consistent estimate of the phenotype-disease association.
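In the simplest single-gene case, the gene-phenotype and gene-disease associations are commonly combined through the ratio (Wald) estimator, sketched below. This is the standard textbook estimator rather than the specific BBB methodology. The function name and input values are hypothetical, and the standard error is a first-order approximation that ignores uncertainty in the gene-phenotype estimate. For a binary disease outcome, the gene-disease association would typically be a log odds ratio, in which case the ratio is itself an approximation.

```python
def wald_ratio(beta_gene_disease, se_gene_disease, beta_gene_phenotype):
    """Ratio estimate of the phenotype-disease association.

    beta_gene_disease:   estimated gene-disease association
    se_gene_disease:     its standard error
    beta_gene_phenotype: estimated gene-phenotype association
    """
    if beta_gene_phenotype == 0:
        raise ValueError("gene-phenotype association must be nonzero")
    estimate = beta_gene_disease / beta_gene_phenotype
    # First-order approximation: treats beta_gene_phenotype as known.
    std_error = se_gene_disease / abs(beta_gene_phenotype)
    return estimate, std_error


# Hypothetical summary estimates (illustrative only, not real data).
estimate, std_error = wald_ratio(0.10, 0.02, 0.5)
print(estimate, std_error)
```

The estimate is only as trustworthy as the instrument: a gene that is weakly associated with the phenotype gives a small denominator and therefore an unstable ratio.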
BBB investigators, in collaboration with EB investigators, are developing statistical methodology that focuses on deriving Mendelian randomization estimates of phenotype-disease association for discrete disease outcomes and for various scenarios, such as multiple genes, multiple phenotypes, longitudinal phenotype data, and phenotype data with measurement errors.