Teaching Children to Read
In an action critical to its Congressional charge, the NRP elected to develop and adopt a set of rigorous methodological standards for research. These standards, which are defined in this section, guided the screening of the research literature relevant to each topic area addressed by the Panel. This screening process identified a final set of experimental or quasi-experimental research studies that were then subjected to detailed analysis. The evidence-based methodological standards adopted by the Panel are essentially those normally used in research studies of the efficacy of interventions in psychological and medical research. These include behaviorally based interventions, medications, or medical procedures proposed for use in fostering robust health and psychological development and in preventing or treating disease. It is the view of the Panel that the efficacy of materials and methodologies used in the teaching of reading and in the prevention or treatment of reading disabilities should be tested no less rigorously. However, such standards have not been universally accepted or used in reading education research. Unfortunately, only a small fraction of the total reading research literature met the Panel's standards for use in the topic analyses.
With this as background, the Panel understood that criteria had to be developed as it considered which research studies would be eligible for assessment. There were two reasons for determining such guidelines or rules a priori. First, the use of common search, selection, analysis, and reporting procedures would ensure that the Panel’s efforts could proceed, not as a diverse collection of independent—and possibly uneven—synthesis papers, but as parts of a greater whole. The use of common procedures permitted a more unified presentation of the combined methods and findings. Second, the amount of research synthesis that had to be accomplished was substantial. Consequently, the Panel had to work in diverse subgroups to identify, screen, and evaluate the relevant research to complete their respective reports. Moreover, the Panel also had to arrive at findings that all or nearly all of the members of the NRP could endorse. Common procedures, grounded in scientific principles, helped the Panel to reach final agreements.
Each subgroup conducted a search of the literature using common procedures, describing in detail the basis and rationale for its topical term selections, the strategies employed for combining terms or delimiting searches, and the search procedures used for each topical area.
Each subgroup limited the period covered by its searches on the basis of how recent the literature was and how much literature the search generated. For example, in some cases it was decided to limit the years searched to the number of most recent years that would identify between 300 and 400 potential sources. This scope could be expanded in later iterations if it appeared that the nature of the research had changed qualitatively over time, if the proportion of usable research identified was small (e.g., less than 25%), or if the search represented too limited a proportion of the total set of identifiable studies. Although the number of years searched varied among subgroup topics, decisions regarding the number of years to be searched were made in accord with shared criteria.
The initial criteria were established to focus the efforts of the Panel. First, any study selected had to focus directly on children’s reading development from preschool through grade 12. Second, the study had to be published in English in a refereed journal. At a minimum, each subgroup searched both PsycINFO and ERIC databases for studies meeting these initial criteria. Subgroups could, and did, use additional databases when appropriate. Although the use of a minimum of two databases identified duplicate literature, it also afforded the opportunity to expand perspective and locate articles that would not be identifiable through a single database.
Identification of each study selected was documented for the record, and each was assigned to one or more members of the subgroup, who examined the title and abstract. Based on this examination, the subgroup member(s) determined, if possible at this stage, whether the study addressed issues within the purview of the research questions being investigated. If it did not, the study was excluded and the reason(s) for the exclusion were detailed and documented for the record. If it did address reading instructional issues relevant to the Panel’s selected topic areas, the study underwent further examination.
Following initial examination, if the study had not been excluded in accord with the preceding criteria, the full study report was located and examined in detail to determine whether the following criteria were met:
These criteria for evaluating research literature are widely accepted by scientists in disciplines involved in medical, behavioral, and social research. The application of these criteria increases the probability that objective, rigorous standards were used and that therefore the information obtained from the studies would contribute to the validity of any conclusions drawn.
If a study did not meet these criteria or could not be located, it was excluded from subgroup analysis, and the reason(s) for its exclusion were detailed and documented for the record. If the study was located and met the criteria, it became one of the subgroup's core working set of studies. The core working sets of studies gathered by the subgroups were coded as described below and then analyzed to address the questions posed in the introduction and in the charge to the Panel.
If a core set of studies identified by the subgroup was insufficient to answer critical instructional questions, less recent studies were screened for eligibility for, and inclusion in, the core working sets of studies. This second search used the reference lists of all core studies and known literature reviews. This process identified cited studies that could meet the Panel’s methodological criteria for inclusion in the subgroups’ core working sets of studies. Any second search was described in detail and applied precisely the same search, selection, exclusion, and inclusion criteria and documentation requirements as were applied in the subgroups’ initial searches.
Manual searches, again applying precisely the same search, selection, exclusion, and inclusion criteria and documentation requirements as were applied in the subgroups’ electronic searches, were also conducted to supplement the electronic database searches. Manual searching of recent journals that publish research on specific NRP subgroup topics was performed to compensate for the delay in appearance of these journal articles in the electronic databases. Other manual searching was carried out in relevant journals to include eligible articles that should have been selected, but were missed in electronic searches.
The subgroup searches focused exclusively on research that had been published or had been scheduled for publication in refereed (peer-reviewed) journals. The Panel reached consensus that determinations and findings for claims and assumptions guiding instructional practice depended on such studies. Any search or review of studies that had not been published through the peer review process but was consulted in any subgroup's review was treated as separate and distinct from evidence drawn from peer-reviewed sources (i.e., placed in an appendix) and was not referenced in the Panel's report. These non-peer-reviewed data were treated as preliminary/pilot data that might illuminate potential trends and areas for future research. Information derived in whole or in part from such studies was not to be represented at the same level of certainty as findings derived from the analysis of refereed articles.
Different types of research (e.g., descriptive-interpretive, correlational, experimental) lay claim to particular warrants, and these warrants differ markedly. The Panel felt that it was important to use a wide range of research, but that the research be used in accordance with the purposes and limitations of the various research types.
To make a determination that any instructional practice could be or should be adopted widely to improve reading achievement requires that the belief, assumption, or claim supporting the practice be causally linked to a particular outcome. The highest standard of evidence for such a claim is the experimental study, in which it is shown that the treatment can produce such changes and effect such outcomes. Sometimes, when it is not feasible to do a randomized experiment, a quasi-experimental study is conducted. This type of study provides a standard of evidence that, while not as high, is acceptable, depending on the study design.
To sustain a claim of effectiveness, the Panel felt it necessary that there be experimental or quasi-experimental studies of sufficient size or number, and scope (in terms of population served), and that these studies be of moderate to high quality. When there were too few studies of this type, or they were too narrowly cast, or they were of marginally acceptable quality, then it was essential that the Panel have substantial correlational or descriptive studies that concurred with the findings if a claim was to be sustained. No claim could be established on the basis of descriptive or correlational research alone. The use of these procedures increased the possibility of reporting findings with a high degree of internal validity.
Characteristics and outcomes of each study that met the screening criteria described above were coded and analyzed, unless otherwise authorized by the Panel. The data gathered on these coding forms constituted the information submitted to the final analyses. The coding was carried out in a systematic and reliable manner.
The various subgroups relied on a common coding form developed by a working group of the Panel’s scientist members and modified and endorsed by the Panel. However, some changes could be made to the common form by the various subgroups for addressing different research issues. As coding forms were developed, any changes to the common coding form were shared with and approved by the Panel to ensure consistency across various subgroups.
Unless specifically identified and substantiated as unnecessary or inappropriate by a subgroup and agreed to by the Panel, each form for analyzing studies was coded for the following categories:
If text was a variable, the coding indicated what was known about the difficulty level and nature of the texts being used. Any use of special personnel to deliver an intervention, use of special materials, staff development, or other features of the intervention that represented potential cost were noted. Finally, various threats to reliability and internal or external validity (group assignment, teacher assignment, fidelity of treatment, and confounding variables including equivalency of subjects prior to treatment and differential attrition) were coded. Each subgroup also coded additional items deemed appropriate or valuable to the specific question being studied by the subgroup members.
A study could be excluded at the coding stage only if it was found to have so serious a fundamental flaw that its use would be misleading. The reason(s) for exclusion of any such study was detailed and documented for the record. When quasi-experimental studies were selected, it was essential that each study included both pre-treatment and post-treatment evaluations of performance and that there was a comparison group or condition.
Each subgroup conducted an independent re-analysis of a randomly designated 10% sample of studies. Absolute rating agreement was calculated for each category (not for forms). If absolute agreement fell below 0.90 for any category for occurrence or nonoccurrence agreement, the subgroup took some action to improve agreement (e.g., multiple readings with resolution, improvements in coding sheet).
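As a sketch of this reliability check, per-category absolute agreement between two coders can be computed as the proportion of studies on which their ratings match, with any category falling below the 0.90 criterion flagged for remediation. The data layout and names below (category_agreement, flag_low_agreement, the example category labels) are illustrative assumptions, not the Panel's actual coding forms.

```python
# Illustrative sketch of the per-category absolute agreement check.
# Each coder's ratings are a list of dicts, one dict per study; the
# dict keys are coding categories. Names and layout are assumptions.

def category_agreement(coder_a, coder_b):
    """Proportion of studies on which the two coders agree, per category."""
    agreement = {}
    for cat in coder_a[0]:
        matches = sum(1 for a, b in zip(coder_a, coder_b) if a[cat] == b[cat])
        agreement[cat] = matches / len(coder_a)
    return agreement

def flag_low_agreement(agreement, threshold=0.90):
    """Categories whose absolute agreement falls below the 0.90 criterion,
    triggering corrective action (e.g., multiple readings with resolution)."""
    return [cat for cat, p in agreement.items() if p < threshold]
```

For example, if two coders agree on "grade" for both of two studies but on "design" for only one, category_agreement reports 1.0 and 0.5 respectively, and flag_low_agreement flags "design".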
Upon completion of the coding for recently published studies, a letter was sent to the first author of the study requesting any missing information. Any information that was provided by authors was added to the database.
After its search, screening, and coding, a subgroup determined whether for a particular question or issue a meaningful meta-analysis could be completed or whether it was more appropriate to conduct a literature analysis of that issue or question without meta-analysis, incorporating all of the information gained. The full Panel reviewed and approved or modified each decision.
When appropriate and feasible, effect sizes were calculated for each intervention or condition in experimental and quasi-experimental studies. The subgroups used the standardized mean difference formula as the measure of treatment effect. The formula was:
(M_t - M_c) / [0.5 (sd_t + sd_c)]

where
M_t is the mean of the treated group,
M_c is the mean of the control group,
sd_t is the standard deviation of the treated group, and
sd_c is the standard deviation of the control group.
When means and standard deviations were not available, the subgroups followed the guidelines for the calculation of effect sizes as specified by Cooper and Hedges (1994).
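The standardized mean difference above is straightforward to compute; as a minimal sketch (the function name effect_size is ours, not the Panel's):

```python
def effect_size(m_t, m_c, sd_t, sd_c):
    """Standardized mean difference used by the subgroups: the treatment-
    control mean difference divided by the average of the two SDs."""
    return (m_t - m_c) / (0.5 * (sd_t + sd_c))

# For example, a treated-group mean of 110 against a control-group mean
# of 100, with both standard deviations equal to 10, yields an effect
# size of 1.0 (a one-standard-deviation advantage for the treatment).
```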
The subgroups weighted effect sizes by numbers of subjects in the study or comparison to prevent small studies from overwhelming the effects evident in large studies.
Each subgroup used median and/or average effect sizes when a study had multiple comparisons, and each subgroup only employed the comparisons that were specifically relevant to the questions under review by the subgroup.
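The two aggregation rules above (sample-size weighting across studies, and median or average collapsing of multiple comparisons within a study) can be sketched as follows; the function names and data layout are illustrative assumptions, not the subgroups' actual software.

```python
# Illustrative sketch of the subgroups' aggregation rules. Names are ours.

def weighted_mean_effect(effects, ns):
    """Average effect size weighted by study sample size, so that small
    studies do not overwhelm the effects evident in large studies."""
    return sum(e * n for e, n in zip(effects, ns)) / sum(ns)

def study_effect(comparison_effects):
    """Collapse a study's multiple relevant comparisons to a single value;
    the subgroups used the median and/or average (median shown here)."""
    s = sorted(comparison_effects)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])
```

With effects of 1.0 (n = 10) and 0.2 (n = 90), the weighted mean is 0.28, much closer to the large study's estimate than the unweighted mean of 0.6.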
Analyses of effect sizes were undertaken with several goals in mind. First, overall effect sizes of related studies were calculated across subgroups to determine the best estimate of a treatment’s impact on reading. These overall effects were examined with regard to their difference from zero (i.e., does the treatment have an effect on reading?), strength (i.e., if the treatment has an effect, how large is that effect?), and consistency (i.e., did the effect of the treatment vary significantly from study to study?). Second, the Panel compared the magnitude of a treatment’s effect under different methodological conditions, program contexts, program features, and outcome measures and for students with different characteristics. The appropriate moderators of a treatment’s impact were drawn from the distinctions in studies recorded on the coding sheets. In each case, a statistical comparison was made to examine the impact of each moderator variable on average effect sizes for each relevant outcome variable. These analyses enabled the Panel to determine the conditions that alter a program’s effects and the types of individuals for whom the program is most and least effective. Within-group average effect sizes were examined as were overall effect sizes for differences from zero and for strength. The analytic procedures were carried out using the techniques described by Cooper and Hedges (1994).
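The test of whether an overall effect differs from zero can be sketched with a standard inverse-variance (fixed-effect) combination of the kind described by Cooper and Hedges (1994); the variance approximation and function names below are our assumptions, not the Panel's exact procedure.

```python
import math

def d_variance(d, n_t, n_c):
    """Approximate sampling variance of a standardized mean difference
    with group sizes n_t and n_c (a common large-sample approximation)."""
    return (n_t + n_c) / (n_t * n_c) + d * d / (2.0 * (n_t + n_c))

def combined_effect(ds, group_sizes):
    """Fixed-effect combination: inverse-variance weighted mean effect,
    its standard error, and a z statistic for the difference from zero."""
    ws = [1.0 / d_variance(d, nt, nc) for d, (nt, nc) in zip(ds, group_sizes)]
    mean = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    return mean, se, mean / se
```

A z statistic well beyond about 2 in absolute value would indicate an overall effect reliably different from zero; heterogeneity across studies (the consistency question above) would be examined separately.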