BBB Research: Constrained Statistical Inference with Applications

Parameter constraints arise naturally in many applications because of the underlying scientific question of interest (e.g., an order-restricted hypothesis), the study design (e.g., dose-response studies, ordered experimental groups, or cell-cycle and circadian clock studies), or the underlying science or context (e.g., the average height of an adolescent). In some applications, the observed data and the corresponding statistical parameters of interest add up to a constant; such data are called compositional data. For example, a person's daily diet may consist of 40% protein, 40% carbohydrates, 10% fat, and 10% other, with the total adding up to 100%. Compositional data are special because they lie in a simplex (for example, a triangle for three components or a tetrahedron for four). Because the components must sum to a constant, they cannot vary independently: an increase in one share forces a decrease in at least one other. Standard statistical methods that ignore this constraint are therefore not appropriate for analyzing such data. Compositional data arise naturally in studies of the microbiome, diet and nutrition, body fat distribution, and so forth.
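To make the sum-to-one constraint concrete, the sketch below maps the diet example into the simplex (closure) and then applies the centered log-ratio (clr) transform, a standard device in compositional data analysis that works with ratios of parts rather than raw shares. This is a minimal illustration that assumes every part is strictly positive; the function names `closure` and `clr` are our own, and the code is not the ANCOM or ANCOM-BC method.

```python
import math


def closure(parts):
    """Rescale nonnegative parts so they sum to 1, i.e., map them into the simplex."""
    total = sum(parts)
    if total <= 0 or any(p < 0 for p in parts):
        raise ValueError("parts must be nonnegative with a positive total")
    return [p / total for p in parts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log.

    Assumes strictly positive parts; zeros (including structural zeros)
    would need separate handling before this transform can be applied.
    """
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [lp - mean_log for lp in logs]


# Daily diet from the text: protein, carbohydrates, fat, other.
diet = closure([40, 40, 10, 10])
print(diet)  # [0.4, 0.4, 0.1, 0.1]

# Only relative information matters: doubling every part gives the same
# composition, and hence the same clr coordinates.
doubled = closure([80, 80, 20, 20])
print(all(math.isclose(a, b) for a, b in zip(clr(diet), clr(doubled))))  # True

# The clr coordinates sum to (numerically) zero, so a constraint persists.
print(abs(sum(clr(diet))) < 1e-12)  # True
```

The difference of two clr coordinates is the log-ratio of the corresponding parts (here, protein versus fat gives log 4), which is why analyses built on log-ratios are unaffected by the arbitrary total.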

Theoretical constraints on ROC curves or surfaces, usually arising from orderings among distribution functions, are also common in diagnostic accuracy analysis. Constraints can be formulated as mathematical inequalities between parameters or nonparametrically, using concepts such as stochastic ordering between random variables, among other approaches. Statistical methods that exploit such constraints are more efficient than methods that ignore them, such as the standard analysis of variance. This gain in efficiency translates into a smaller sample size needed to achieve the same statistical power. Furthermore, constrained inference methods can yield results that are more scientifically meaningful and interpretable.
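As an illustration of an inequality constraint between parameters, consider a simple order on group means, mu_1 <= mu_2 <= ... <= mu_k, as in a dose-response study. Unrestricted analysis of variance estimates each mean by its sample mean; a standard order-restricted alternative is isotonic regression via the pool-adjacent-violators algorithm (PAVA). The sketch below is a minimal version that assumes independent groups with equal variances, so that sample sizes serve as weights; the data are hypothetical, and this is not the CLME package or any specific BBB method.

```python
def pava(means, weights):
    """Weighted least-squares fit of group means under the simple order
    mu_1 <= mu_2 <= ... <= mu_k, via the pool-adjacent-violators algorithm."""
    if len(means) != len(weights) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per group mean")
    # Each block is [pooled mean, pooled weight, number of groups in the block].
    blocks = []
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # Merge the last two blocks while they violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            pooled_w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / pooled_w, pooled_w, n1 + n2])
    # Expand each block back to one fitted value per original group.
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted


# Hypothetical sample means for four increasing dose groups, 10 subjects each.
sample_means = [1.0, 3.0, 2.0, 4.0]
print(pava(sample_means, [10, 10, 10, 10]))  # [1.0, 2.5, 2.5, 4.0]
```

Groups 2 and 3 violate the order in the raw means, so PAVA pools them into their weighted average, 2.5, while the means that already satisfy the order are left unchanged.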

BBB investigators are developing constrained inference methods for a wide range of applications. For example, we developed Constrained Linear Mixed Effects (CLME), a general-purpose R package for analyzing linear mixed-effects models under inequality constraints. For microbiome data, we developed the Analysis of Composition of Microbiomes (ANCOM) method and software and its bias-corrected extension, ANCOM-BC. In addition, BBB-developed R packages for constrained ROC analyses are available on GitHub.

Principal Investigators

Zhen Chen, Ph.D., and Aiyi Liu, Ph.D.

Selected Publications

Lin, H., & Peddada, S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), 3514. PMID: 32665548

Zhang, W., Tang, L. L., Li, Q., Liu, A., & Lee, M. L. T. (2020). Order-restricted inference for clustered ROC data with application to fingerprint matching accuracy. Biometrics, 76(3), 863-873. PMID: 31725175

Chen, Z., Hwang, B. S., & Kim, S. (2019). A correlated Bayesian rank likelihood approach to multiple ROC curves for endometriosis. Statistics in Medicine, 38(8), 1374-1385. PMID: 30421556

Davidov, O., Jelsema, C. M., & Peddada, S. D. (2018). Testing for inequality constraints in singular models by trimming or winsorizing the variance matrix. Journal of the American Statistical Association, 113(522), 906-918. PMID: 33093735

Kaul, A., Davidov, O., & Peddada, S. D. (2017). Structural zeros in high-dimensional data with applications to microbiome studies. Biostatistics, 18(3), 422-433. PMID: 28065879

Jelsema, C., & Peddada, S. D. (2016). CLME: an R package for linear mixed effects models under inequality constraints. Journal of Statistical Software, 75(1), 1-32. PMID: 32655332

Hwang, B. S., & Chen, Z. (2015). An integrated Bayesian nonparametric approach for stochastic and variability orders in ROC curve estimation: an application to endometriosis diagnosis. Journal of the American Statistical Association, 110(511), 923-934. PMID: 26839441. PMCID: PMC4733471

Mandal, S., Van Treuren, W., White, R. A., Eggesbø, M., Knight, R., & Peddada, S. D. (2015). Analysis of composition of microbiomes: a novel method for studying microbial composition. Microbial Ecology in Health and Disease, 26, 1-7. PMID: 26028277

Davidov, O., & Peddada, S. D. (2013). The linear stochastic order and directed inference for multivariate ordered distributions. Annals of Statistics, 41(1), 1-40. PMID: 23543786

Davidov, O., & Peddada, S. D. (2011). Order-restricted inference for multivariate binary data with application to toxicology. Journal of the American Statistical Association, 106(496), 1394-1404. PMID: 22973069

Liu, A., Liu, C., Li, Q., Yu, K. F., & Yuan, V. W. (2010). A threshold sample-enrichment approach in a clinical trial with heterogeneous subpopulations. Clinical Trials: Journal of the Society for Clinical Trials, 7(5), 537-545. PMID: 20685769

Peddada, S. D., Dinse, G. E., & Kissling, G. E. (2007). Incorporating historical control data when comparing tumor incidence rates. Journal of the American Statistical Association, 102(480), 1212-1220. PMID: 20396669

Hwang, J. T. G., & Peddada, S. D. (1994). Confidence interval estimation subject to order restrictions. Annals of Statistics, 22(1), 67-93.