A data standard is an agreed-upon approach that allows for consistent measurement, qualification, or exchange of an object, process, or unit of information. Data standards include documented agreements on the representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data (see the National Library of Medicine (NLM) definition and the Environmental Protection Agency (EPA) definition for details).
Why use data standards?
The goal of data standards is to improve the usability of data and enable researchers to easily combine and co-analyze multiple datasets (i.e., improve interoperability).
Using data standards throughout the data lifecycle can reduce time and costs by minimizing implementation of new or customized approaches for similar research areas.
Data standards should be used and preserved across the data lifecycle, including during:
Collection, such as using standard forms, questions, and formats
Harmonization and curation, such as mapping to ontologies or controlled vocabularies and implementing standard workflows
Submission to data repositories, for example by annotating common data elements (CDEs) in data dictionaries and providing metadata
Sharing, through data repositories that enable use by other researchers
What data standards are relevant to NICHD?
Please note: NICHD does not approve or disapprove data standards. Instead, the institute encourages the use of data standards suitable to the study design, the type of data collected, characteristics of the dataset, and best practices of the respective research community.
Some data repositories may encourage or require the use of specific data standards. Check the NICHD Data Repository Finder for the metadata requirements and data formats used by some NICHD-relevant data repositories.
The following existing data standards may be relevant to researchers who receive or are applying for NICHD support. Several categories of data standards exist, and sometimes a given standard falls into several categories. Although this information will be updated regularly, it is not intended to be comprehensive.
Metadata standards specify minimal elements to describe data, and how those elements are formatted. Examples include:
Social, Behavioral, Economic, and Health Sciences Data: Data Documentation Initiative (DDI) offers standards for data collected by surveys and other observational methods.
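To make the idea of "minimal elements" concrete, the following Python sketch checks a metadata record against a list of required elements. The element names here are hypothetical and chosen only for illustration; they are not taken from DDI or any other published standard, which define their own required fields and formats.

```python
# Hypothetical minimal elements; a real metadata standard defines its own list.
REQUIRED_ELEMENTS = ("title", "creator", "collection_date", "description")


def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required element names that are absent or empty in record."""
    missing = []
    for name in required:
        value = record.get(name)
        # Treat None and whitespace-only values as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Illustrative record with one element left blank and one omitted.
example_record = {
    "title": "Example survey dataset",
    "creator": "Example Lab",
    "collection_date": "",
}
```

Calling `missing_elements(example_record)` lists the elements a submitter would still need to supply before the record meets the hypothetical minimum.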
Controlled vocabularies, terminologies, and ontologies offer a consistent way to describe data by defining its semantic or contextual meaning. Controlled vocabularies, such as indices and subject headings, and terminologies, such as thesauri, help make data more sharable and searchable. Ontologies go a step further by describing the relationship between terms and concepts. Examples and resources include (in alphabetical order):
Current Procedural Terminology (CPT®), from the American Medical Association, offers a uniform process for coding medical and other health care services.
Disease Ontology provides consistent, reusable, and sustainable descriptions of human disease terms, phenotype characteristics, and related medical vocabularies.
Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease; coordinates with Mondo Disease Ontology.
Ontology Lookup Service, from the European Molecular Biology Laboratory’s European Bioinformatics Institute, provides a single point of access to the latest ontology databases.
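As a minimal sketch of how a controlled vocabulary can make free-text data consistent, the Python example below maps entries as collected to a single term identifier. The vocabulary, its `EX:` identifiers, and the synonym lists are made up for this illustration and do not come from any of the resources listed above.

```python
# Hypothetical controlled vocabulary: identifier -> preferred label and synonyms.
VOCABULARY = {
    "EX:0001": {"label": "headache", "synonyms": {"head pain"}},
    "EX:0002": {"label": "fever", "synonyms": {"pyrexia", "high temperature"}},
}


def build_lookup(vocabulary):
    """Index every label and synonym (lowercased) by its term identifier."""
    lookup = {}
    for term_id, entry in vocabulary.items():
        for text in {entry["label"], *entry["synonyms"]}:
            lookup[text.lower()] = term_id
    return lookup


def map_term(text, lookup):
    """Return the term identifier for a free-text value, or None if unmapped."""
    return lookup.get(text.strip().lower())


LOOKUP = build_lookup(VOCABULARY)
```

Values that return `None` from `map_term` are the ones a curator would need to review by hand or map to a new term.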
CDEs are standardized, precisely defined questions paired with a set of specific, allowable responses that are used systematically across studies to ensure consistent data collection (see also: NOT-LM-21-005: Request for Information: Use of CDEs in NIH-Funded Research). CDE resources include (in alphabetical order):
NIH CDE Repository is the primary NIH repository for CDEs that NIH institutes and centers and other organizations recommend or require for use in research and for other purposes.
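The definition of a CDE given above (a precisely defined question paired with a set of specific, allowable responses) can be sketched in Python as follows. The element name, question wording, and response values are hypothetical and are not taken from the NIH CDE Repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A hypothetical CDE: one question plus its permitted responses."""
    name: str
    question: str
    allowed_responses: frozenset

    def is_valid(self, response):
        # A response conforms only if it is one of the permitted values.
        return response in self.allowed_responses


# Illustrative element; the response values are made up for this sketch.
smoking_status = CommonDataElement(
    name="smoking_status",
    question="Do you currently smoke tobacco products?",
    allowed_responses=frozenset({"yes", "no", "unknown"}),
)


def find_invalid(cde, responses):
    """Return the collected responses that fall outside the permitted values."""
    return [r for r in responses if not cde.is_valid(r)]
```

Because every study that adopts the same element accepts the same responses, a check like `find_invalid` can flag nonconforming values before datasets are combined.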