In early 2009, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) initiated an ongoing effort to establish a core library of harmonized pediatric terms that enables clinical investigators to more readily compare and aggregate data across clinical research portfolios.
This effort drew on an analysis of reference terminologies and literature sources, and it used modeling, annotation, and semantic integration tools to produce newly harmonized pediatric concepts and their corresponding data elements. As a starting point, the NICHD focused on developing the terminology and metadata for content-specific data acquisition tools for neonatal and infant examinations.
The content of the newborn examination form was modeled on newborn metabolic screening submission forms, which served as a template for outlining a comprehensive examination that fully assesses a newborn's health. The initiative identified specific concepts and terms within the neonatal and infant examination domain and drew on existing resources, such as the National Cancer Institute Thesaurus (NCIt) and the Systematized Nomenclature of Medicine (SNOMED), to harmonize the various fields and develop consistent terminology for these areas. The neonatal and infant examination concepts were developed as part of a broader pediatric terminology built on a developmental stage framework, which follows a child through the successive life stages from pre-birth to 21 years of age.
Developing the Terminology in the NCI Thesaurus:
The terms identified from the concept outline were represented as a terminology describing the concepts and their attributes. They were then incorporated into the NCI Thesaurus in collaboration with curators from NCI Enterprise Vocabulary Services (EVS). The neonatal and infant examination terms are organized as a subset within the existing NCI Thesaurus hierarchy. Each term is represented as a separate concept and is assigned a unique code and a definition, plus synonyms or abbreviations where required. Each newborn examination concept carries the association “Concept in Subset: Newborn Screening,” so that the full terminology can be retrieved easily.
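The concept structure and subset association described above can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions and not how the NCI Thesaurus is actually stored. The `Concept` class, the `concepts_in_subset` helper, the concept names, and the `HYPO-` codes are all hypothetical placeholders rather than real NCIt entries. The sketch shows why tagging every concept with the same subset association makes the whole terminology retrievable with one filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One thesaurus-style concept: code, preferred name, definition, synonyms."""
    code: str
    preferred_name: str
    definition: str
    synonyms: tuple = ()
    # Subsets this concept is linked to through a "Concept in Subset" association.
    subsets: frozenset = frozenset()


def concepts_in_subset(concepts, subset_name):
    """Return every concept carrying a 'Concept in Subset' link to subset_name."""
    return [c for c in concepts if subset_name in c.subsets]


# Hypothetical placeholder codes and definitions, NOT real NCIt entries.
terminology = [
    Concept("HYPO-0001", "Newborn Head Circumference", "Placeholder definition.",
            ("Head Circ",), frozenset({"Newborn Screening"})),
    Concept("HYPO-0002", "Newborn Skin Color", "Placeholder definition.",
            (), frozenset({"Newborn Screening"})),
    Concept("HYPO-0003", "Unrelated Concept", "Placeholder definition."),
]

newborn = concepts_in_subset(terminology, "Newborn Screening")
print([c.code for c in newborn])  # ['HYPO-0001', 'HYPO-0002']
```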
Developing the UML Model for the Neonatal and Infant Examination Concepts:
A terminology model was developed in the Unified Modeling Language (UML) and organized by the dimensions of the original concept outline. Within each dimension, the relevant concepts and their attributes were identified and defined. The structure of the concepts and the relationships among them, both within and across dimensions, were then depicted graphically. Because the concepts were represented in this model format, the existing semantic infrastructure of the NCI Center for Biomedical Informatics and Information Technology (NCI-CBIIT) could be used to automate metadata generation. This made the modeling process more efficient and scalable.
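The following sketch illustrates why a model-driven approach scales. Once classes and attributes are captured in a structured model, candidate metadata can be derived mechanically rather than written by hand. It rests on several assumptions. `ModelClass` and `derive_data_elements` are hypothetical names, the dimension, class, and attribute names are invented for illustration, and pairing a class with an attribute is only loosely modeled on the ISO/IEC 11179 pattern of combining an object class with a property. The NCI-CBIIT tooling itself works from UML/XMI files, not Python objects.

```python
from dataclasses import dataclass


@dataclass
class ModelClass:
    """A UML-style class: its name, the outline dimension it belongs to, its attributes."""
    name: str
    dimension: str
    attributes: list


def derive_data_elements(model):
    """Derive one (dimension, candidate data element name) pair per class attribute.

    Adding a class or attribute to the model automatically yields new
    candidate metadata, with no separate hand-maintained list.
    """
    return [(cls.dimension, f"{cls.name} {attr}")
            for cls in model
            for attr in cls.attributes]


# Hypothetical dimensions, classes, and attributes for illustration only.
model = [
    ModelClass("HeadExam", "Physical Examination", ["circumference", "fontanelle"]),
    ModelClass("SkinExam", "Physical Examination", ["color"]),
]

for dimension, element in derive_data_elements(model):
    print(dimension, "->", element)
```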
Loading Metadata and Generating the Data Acquisition Tool:
NCI EVS services were used to annotate the UML model with reference terminology from the NCI Thesaurus. The NCI Semantic Integration Workbench (SIW) is an EVS tool that adds concepts to a UML model by matching model attributes with similar items in the NCI Thesaurus, using concept codes as the reference. Semantic Connector, a component of the SIW, was run to perform an initial concept annotation of classes and attributes that had not previously been matched at the data element level. The UML Loader was then used to load the associated metadata from the annotated model's XMI file into the cancer Data Standards Repository (caDSR). OpenClinica, an open source clinical research tool developed by Akaza Research, was used to locate the data elements of interest in the caDSR and to generate a mock web-based case report form (CRF) for the neonatal and infant examination.
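The annotation step pairs each model attribute with a candidate thesaurus concept. The sketch below shows that idea using Python's standard `difflib` string similarity as a stand-in matcher. This is an assumption for illustration only: it is not the SIW's or Semantic Connector's actual matching algorithm. The `THESAURUS` excerpt, its `HYPO-` codes, the attribute names, and the `suggest_concept` helper are all hypothetical. Unmatched attributes return `None`, which mirrors the need for curators to review anything the automated pass cannot annotate.

```python
import difflib

# Hypothetical thesaurus excerpt: preferred name -> placeholder concept code.
THESAURUS = {
    "head circumference": "HYPO-0001",
    "skin color": "HYPO-0002",
    "heart rate": "HYPO-0003",
}


def normalize(name):
    """Turn a model attribute such as 'head_circumference' into 'head circumference'."""
    return name.replace("_", " ").strip().lower()


def suggest_concept(attribute, thesaurus=THESAURUS, cutoff=0.8):
    """Return (matched term, concept code) for the closest thesaurus name, or None.

    difflib similarity is a swapped-in technique for this sketch, not the SIW's method.
    """
    matches = difflib.get_close_matches(normalize(attribute), list(thesaurus),
                                        n=1, cutoff=cutoff)
    if not matches:
        return None  # left for manual curation
    return matches[0], thesaurus[matches[0]]


for attr in ["head_circumference", "skin_colour", "apgar_score"]:
    print(attr, "->", suggest_concept(attr))
```

Raising `cutoff` makes the matcher more conservative, trading fewer false matches for more attributes left to manual review.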
Generation of Metadata for Neonatal and Infant Examination:
Through the Neonatal and Infant Examination Terminology Prototype effort, 108 new data elements were registered in the caDSR and made available through the Common Data Element (CDE) Browser. The caDSR framework is based on the ISO/IEC 11179 metadata registry standard, in which content is organized into ‘Contexts’. Every administered component in the database is associated with a Context, either the one in which it originated or one in which it is used. A new NICHD Context was added to the caDSR to store the NICHD Pediatric Terminology metadata as the effort continues to grow. All CDEs produced by the NICHD are associated with the "NICHD" Context.
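The Context rule described above (an administered item belongs to the Context in which it originated and may also be used in others) can be expressed as a simple filter. This is a simplified sketch, not the caDSR schema. Real ISO/IEC 11179 records also carry a data element concept, a value domain, versions, and workflow status. The `DataElement` class, its fields, the `elements_in_context` helper, and the `HYPO-DE-` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Simplified ISO/IEC 11179-style administered item (hypothetical fields)."""
    public_id: str
    long_name: str
    owning_context: str               # Context in which the item originated
    used_by: frozenset = frozenset()  # other Contexts in which the item is used


def elements_in_context(elements, context):
    """Return items that originated in, or are used by, the given Context."""
    return [e for e in elements
            if e.owning_context == context or context in e.used_by]


# Hypothetical identifiers and names, NOT actual caDSR registrations.
registry = [
    DataElement("HYPO-DE-1", "Newborn Head Circumference", "NICHD"),
    DataElement("HYPO-DE-2", "Newborn Skin Color", "NICHD"),
    DataElement("HYPO-DE-3", "Shared Element", "OTHER", frozenset({"NICHD"})),
    DataElement("HYPO-DE-4", "Unrelated Element", "OTHER"),
]

print([e.public_id for e in elements_in_context(registry, "NICHD")])
# ['HYPO-DE-1', 'HYPO-DE-2', 'HYPO-DE-3']
```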
Using a UML model of the neonatal and infant examination terminology to generate metadata allows the metadata abstraction to be automated. This makes annotation errors easier to identify early in the process and gives subject matter experts a visual way to validate the terminology.
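Because the metadata is generated from a structured model, simple automated checks can catch annotation errors before anything is loaded into a repository. The sketch below flags model elements that have no concept code, or whose code does not appear in the reference terminology. It assumes the annotated model has been flattened into a dictionary keyed by `Class` or `Class.attribute`. That flattening, the `find_annotation_errors` helper, and all names and codes are hypothetical and not part of the NCI tooling.

```python
def find_annotation_errors(annotations, known_codes):
    """Report model elements whose concept code is missing or not in the thesaurus.

    annotations maps "Class" or "Class.attribute" to an assigned code, or None.
    Elements are checked in sorted order so reports are stable between runs.
    """
    errors = []
    for element, code in sorted(annotations.items()):
        if code is None:
            errors.append(f"{element}: no concept code assigned")
        elif code not in known_codes:
            errors.append(f"{element}: code {code} not found in thesaurus")
    return errors


# Hypothetical thesaurus codes and annotated model elements.
known = {"HYPO-0001", "HYPO-0002"}
annotations = {
    "HeadExam": "HYPO-0001",
    "HeadExam.circumference": "HYPO-0002",
    "HeadExam.fontanelle": None,
    "SkinExam": "HYPO-9999",
}

for problem in find_annotation_errors(annotations, known):
    print(problem)
```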
Figure: Metadata Generation Workflow
The prototype effort was conducted in partnership with the National Cancer Institute (NCI), which hosts infrastructure and terminology and metadata services that are widely used by the biomedical community.