Parameter constraints arise naturally in many applications because of the scientific question of interest (e.g., the hypothesis being tested), the study design (e.g., dose-response studies, ordered experimental groups, cell-cycle or circadian clock studies), or the underlying science or context (e.g., the average height of an adolescent). In some applications, the observed data and the corresponding statistical parameters of interest add up to a constant. Such data are called compositional data. For example, a person's daily diet may consist of 40% protein, 40% carbohydrates, 10% fat, and 10% other, with the total adding up to 100%. Compositional data are special because they lie on a simplex: a triangle for three components, a tetrahedron for four. Because the components must sum to a constant, they cannot vary independently, so standard statistical methods that ignore this constraint are not appropriate for analyzing such data. Compositional data arise naturally in many studies, such as those of the microbiome, diet, nutrition, and body fat distribution.
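A composition carries only relative information, so a common generic approach (Aitchison's log-ratio analysis) works with log-ratios of the parts rather than with raw proportions. The Python sketch below illustrates that idea on the diet example. It is a textbook illustration, not the ANCOM or ANCOM-BC methodology. The helper names `closure` and `clr` are hypothetical, and the sketch assumes all parts are strictly positive, because zeros need separate treatment.

```python
import math
from typing import Sequence


def closure(parts: Sequence[float], total: float = 1.0) -> list[float]:
    """Rescale strictly positive parts so that they sum to `total` (the simplex constraint)."""
    if any(p <= 0 for p in parts):
        raise ValueError("log-ratio methods assume strictly positive parts")
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts: Sequence[float]) -> list[float]:
    """Centered log-ratio: log of each part divided by the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Diet example: protein, carbohydrates, fat, other (percentages).
diet = closure([40, 40, 10, 10])
print(diet)                              # [0.4, 0.4, 0.1, 0.1]
print([round(v, 4) for v in clr(diet)])  # [0.6931, 0.6931, -0.6931, -0.6931]

# Only ratios matter: the same diet recorded in grams gives the same clr coordinates.
grams = [400, 400, 100, 100]
print(all(math.isclose(a, b) for a, b in zip(clr(diet), clr(grams))))  # True
```

The clr coordinates always sum to zero, which reflects the single constraint lost to the fixed total. They are also unchanged when every part is multiplied by the same constant. This scale invariance is why log-ratios suit data whose totals are arbitrary.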
Theoretical constraints on ROC curves or surfaces are also common in diagnostic accuracy analysis. These constraints usually arise from orderings among distribution functions. Constraints can be formulated parametrically, as mathematical inequalities between parameters. They can also be formulated nonparametrically, using concepts such as stochastic ordering between random variables, among other approaches. Statistical methods that exploit such constraints can be more efficient than methods that ignore them, such as standard analysis of variance. Gains in efficiency appear as a smaller sample size needed to achieve the same statistical power. Furthermore, constrained inference methods may yield results that are more scientifically meaningful and interpretable.
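Order-restricted estimation is easiest to see in the simplest case: estimating k group means, for example at increasing doses, under the simple order μ1 ≤ μ2 ≤ ... ≤ μk. The classical least-squares solution is weighted isotonic regression. It is computed by the pool-adjacent-violators algorithm (PAVA), which pools adjacent groups whose sample means violate the order into their weighted average.

The Python sketch below is a generic textbook illustration, not the algorithm used by CLME. The function name `pava` and the example means are hypothetical. Weighting by group sample size assumes equal within-group variances.

```python
from typing import Sequence


def pava(means: Sequence[float], weights: Sequence[float]) -> list[float]:
    """Weighted least-squares fit of group means under mu_1 <= mu_2 <= ... <= mu_k,
    computed with the pool-adjacent-violators algorithm."""
    if len(means) != len(weights) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per group mean")
    # Each block is (pooled mean, total weight, number of groups pooled).
    blocks: list[tuple[float, float, int]] = []
    for m, w in zip(means, weights):
        blocks.append((m, w, 1))
        # Pool backwards while the last two blocks violate the nondecreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w12 = w1 + w2
            blocks.append(((w1 * m1 + w2 * m2) / w12, w12, n1 + n2))
    # Expand each pooled block back to one fitted value per group.
    fitted: list[float] = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted


# Hypothetical sample means for four ordered dose groups of five subjects each.
dose_means = [1.0, 3.0, 2.0, 4.0]
group_sizes = [5, 5, 5, 5]
print(pava(dose_means, group_sizes))  # [1.0, 2.5, 2.5, 4.0]
```

Unlike the unconstrained sample means, the fitted values never contradict the assumed ordering. Here, groups 2 and 3 are pooled because their sample means are out of order. Estimators and tests built on this idea use the ordering as extra information, which is the source of the efficiency gains described above when the ordering holds.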
BBB investigators are developing constrained inference methods for a wide range of applications. For example, we developed CLME (Constrained Linear Mixed Effects), a general-purpose R package for analyzing linear mixed-effects models under inequality constraints. For analyzing microbiome data, we developed the methods and software Analysis of Composition of Microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC). In addition, BBB-developed R packages for constrained ROC analysis are available on GitHub.
Lin, H., & Peddada, S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nature Communications, 11(1), 3514. PMID: 32665548
Zhang, W., Tang, L. L., Li, Q., Liu, A., & Lee, M. L. T. (2020). Order-restricted inference for clustered ROC data with application to fingerprint matching accuracy. Biometrics, 76(3), 863-873. PMID: 31725175
Chen, Z., Hwang, B. S., & Kim, S. (2019). A correlated Bayesian rank likelihood approach to multiple ROC curves for endometriosis. Statistics in Medicine, 38(8), 1374-1385. PMID: 30421556
Davidov, O., Jelsema, C. M., & Peddada, S. D. (2018). Testing for inequality constraints in singular models by trimming or winsorizing the variance matrix. Journal of the American Statistical Association, 113(522), 906-918. PMID: 33093735
Kaul, A., Davidov, O., & Peddada, S. D. (2017). Structural zeros in high-dimensional data with applications to microbiome studies. Biostatistics, 18(3), 422-433. PMID: 28065879
Jelsema, C., & Peddada, S. D. (2016). CLME: an R package for linear mixed effects models under inequality constraints. Journal of Statistical Software, 75(1), 1-32. PMID: 32655332
Hwang, B. S., & Chen, Z. (2015). An integrated Bayesian nonparametric approach for stochastic and variability orders in ROC curve estimation: an application to endometriosis diagnosis. Journal of the American Statistical Association, 110(511), 923-934. PMID: 26839441. PMCID: PMC4733471
Mandal, S., Van Treuren, W., White, R. A., Eggesbø, M., Knight, R., & Peddada, S. D. (2015). Analysis of composition of microbiomes: a novel method for studying microbial composition. Microbial Ecology in Health and Disease, 26, 1-7. PMID: 26028277
Davidov, O., & Peddada, S. D. (2013). The linear stochastic order and directed inference for multivariate ordered distributions. Annals of Statistics, 41(1), 1-40. PMID: 23543786
Davidov, O., & Peddada, S. D. (2011). Order-restricted inference for multivariate binary data with application to toxicology. Journal of the American Statistical Association, 106(496), 1394-1404. PMID: 22973069
Liu, A., Liu, C., Li, Q., Yu, K. F., & Yuan, V. W. (2010). A threshold sample-enrichment approach in a clinical trial with heterogeneous subpopulations. Clinical Trials: Journal of the Society for Clinical Trials, 7(5), 537-545. PMID: 20685769
Peddada, S. D., Dinse, G. E., & Kissling, G. E. (2007). Incorporating historical control data when comparing tumor incidence rates. Journal of the American Statistical Association, 102(480), 1212-1220. PMID: 20396669
Hwang, J. T. G., & Peddada, S. D. (1994). Confidence interval estimation subject to order restrictions. Annals of Statistics, 22(1), 67-93.