Next-Generation Sequencing and the Molecular Genetics Laboratory
Since the Division of Intramural Research recently obtained its own Applied Biosystems SOLiD 4 next-generation sequencing instrument, we have supported several whole-exome sequencing projects. To date, we have sequenced 48 whole human exomes and 24 human gene libraries targeting the coding regions of 300 genes of interest to NICHD Principal Investigators. For targeted enrichment of the selected regions, we apply the SureSelect protocol provided by Agilent Technologies. Recently, to achieve higher throughput and to permit several simultaneous applications, we upgraded the sequencing platform to the SOLiD 5500.
Analyses to date have included mapping the sequence data and calculating exon coverage, both for the selected regions and for whole exomes. We annotate the resulting sequencing reads in search of novel and previously reported genetic variants potentially involved in disease etiology. In addition, we developed web-friendly tools that enable investigators to examine observed polymorphisms using industry-standard tools such as the UCSC Genome Browser and the Broad Institute's Integrative Genomics Viewer (IGV).
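As an illustration of the exon-coverage calculation mentioned above, the following is a minimal sketch and not the laboratory's actual pipeline. It assumes that mapped reads and exons are already available as half-open coordinate intervals on the same chromosome; the function names `per_base_depth`, `mean_coverage`, and `fraction_covered` are hypothetical.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def per_base_depth(exon: Interval, reads: List[Interval]) -> List[int]:
    """Return the read depth at every base of the exon."""
    exon_start, exon_end = exon
    if exon_end <= exon_start:
        raise ValueError("exon must have positive length")
    depth = [0] * (exon_end - exon_start)
    for read_start, read_end in reads:
        # Clip the read to the exon; the range is empty if they do not overlap.
        for pos in range(max(exon_start, read_start), min(exon_end, read_end)):
            depth[pos - exon_start] += 1
    return depth


def mean_coverage(exon: Interval, reads: List[Interval]) -> float:
    """Average depth across the exon."""
    depth = per_base_depth(exon, reads)
    return sum(depth) / len(depth)


def fraction_covered(exon: Interval, reads: List[Interval], min_depth: int = 1) -> float:
    """Fraction of exon bases covered by at least min_depth reads."""
    depth = per_base_depth(exon, reads)
    return sum(1 for d in depth if d >= min_depth) / len(depth)


if __name__ == "__main__":
    exon = (100, 200)
    reads = [(90, 150), (120, 220), (300, 350)]
    print(mean_coverage(exon, reads))
    print(fraction_covered(exon, reads, min_depth=2))
```

A per-base depth array is simple but memory-hungry; for whole exomes, an interval-based tool or a sweep over sorted read boundaries would be the more practical choice.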
We continue to collaborate with the Mass Spectrometry Core Facility. Our previous development of software for the de novo identification of peptides included a large data set (LIPCUT) that exhaustively enumerates all possible amino acid combinations falling into a given mass range. We recently developed software to enumerate the elemental composition of peptides in an analogous fashion and were able to leverage favorable combinatorics to perform such enumeration for higher-mass peptides (4,500 daltons versus 1,750 daltons for LIPCUT). We are seeking to combine these resources with a newly developed isotopic clustering algorithm to improve existing peptide-fragmentation database searches by submitting refined versions of the original spectrum to the search engine.
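To make the idea of exhaustively enumerating amino acid combinations within a mass range concrete, here is a minimal sketch. It is not the LIPCUT implementation, whose design is not described here. It assumes standard monoisotopic residue masses, treats leucine and isoleucine as a single symbol `L` because they are isobaric, and uses a hypothetical function name, `enumerate_compositions`.

```python
from typing import Iterator, List

# Standard monoisotopic residue masses in daltons (L stands for both Leu and Ile).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def enumerate_compositions(min_mass: float, max_mass: float) -> Iterator[str]:
    """Yield every unordered amino acid combination whose peptide mass
    lies in [min_mass, max_mass]. Each combination is reported once,
    as a string with residues in order of increasing mass."""
    residues = sorted(RESIDUE_MASS.items(), key=lambda kv: kv[1])

    def extend(start: int, mass: float, chosen: List[str]) -> Iterator[str]:
        if chosen and min_mass <= mass + WATER <= max_mass:
            yield "".join(chosen)
        # Only add residues at index >= start so each multiset appears once.
        for i in range(start, len(residues)):
            symbol, residue_mass = residues[i]
            if mass + residue_mass + WATER > max_mass:
                break  # residues are sorted, so all heavier ones overshoot too
            chosen.append(symbol)
            yield from extend(i, mass + residue_mass, chosen)
            chosen.pop()

    yield from extend(0, 0.0, [])


if __name__ == "__main__":
    for combo in enumerate_compositions(132.0, 132.1):
        print(combo)
```

The number of combinations grows very quickly with mass, which is why a precomputed data set is attractive for this kind of search. Enumerating elemental compositions instead collapses many amino acid combinations into one entry, as the single residue N and the pair GG illustrate: they have the same elemental composition and hence the same mass.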
Our current work also uses a database of known human proteins to help determine whether a given isotope cluster is likely to have been derived from an unmodified peptide or from a peptide carrying one or more common post-translational modifications.
We are also working on a Bayesian approach to add value to mass-spectral database searching by considering which peptides are associated with each candidate protein and the relative spectral intensity of those peptides.
Short-read Genome Assembly and Analysis
We are actively involved in a joint project with the Program in Genomics of Differentiation's Section on Molecular and Cell Biology (Rich Maraia and colleagues). So far, we have aligned Solexa reads of S. pombe to its reference genome and have remediated a set of S. pombe third-party genome corrections. We are working with three strains of S. pombe, one of which is a parent of the other two strains. We have identified differences between the parent strain and the canonical reference strain and, more importantly, differences between the parent strain and its two mutants. We resolved previously identified artifacts of this next-generation sequencing and applied de novo assembly methods to resolve sections of poor sequence coverage. To learn more about how S. pombe acquires its natural resistance to rapamycin, we recently began work on a new set of strains.
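The comparison between the parent strain and its mutants can be pictured as a set operation on variant calls. The sketch below is only an illustration under stated assumptions: each strain's differences from the reference genome are assumed to be already called and represented as (chromosome, position, reference allele, alternate allele) records, and the names `Variant`, `private_to_mutant`, and the example coordinates are hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def private_to_mutant(parent: Set[Variant], mutant: Set[Variant]) -> Set[Variant]:
    """Variants seen in the mutant but not in its parent.

    Differences shared with the parent reflect divergence of the parent
    from the reference strain, not changes that arose in the mutant."""
    return mutant - parent


if __name__ == "__main__":
    # Made-up example records, not real S. pombe variants.
    parent = {Variant("I", 1000, "A", "G"), Variant("II", 2500, "C", "T")}
    mutant = parent | {Variant("III", 42, "G", "A")}
    print(private_to_mutant(parent, mutant))
```

In practice, regions of poor coverage in either strain would need to be masked first, since a variant missing from the parent's calls may simply not have been sequenced deeply enough to detect.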
Radiation Hybrid Mapping
Despite the availability of a draft zebrafish genome, we continue to see demand for radiation hybrid mapping in zebrafish. We recently dismantled the LN54 radiation hybrid panel mapping web site (for technical reasons) and now perform radiation hybrid mapping semi-manually on request.
We provided ongoing consultation on DNA and protein sequence analysis and on general bioinformatics issues to the Program in Genomics of Differentiation and have consulted with regard to evolving high-throughput DNA sequencing technology.