Retroelements are a large class of genetic elements that multiply by the reverse transcription of an RNA intermediate. The resulting cDNA is incorporated into the genome of host cells. In eukaryotes, the widespread success of long terminal repeat (LTR)-containing retroelements has led to replication mechanisms that are conserved among diverse families of retrotransposons and retroviruses. The medical importance of retroviruses such as HIV has intensified the need to understand the molecular details of the mechanisms responsible for the propagation of LTR-retroelements. Since LTR-retrotransposons exist in yeast, the powerful techniques of yeast genetics can be applied to answer basic questions about the function of LTR-retroelements. In turn, this fundamental information may identify new antiviral targets or strategies that can be used to combat the spread of retroviruses such as HIV.
The similarities of retrotransposons to retroviruses include the presence of two long terminal repeats (LTRs) and open reading frames (ORFs) with coding sequences homologous to retroviral protease (PR), reverse transcriptase (RT), and integrase (IN). The first step in the transposition pathway, as shown in Figure 1, is synthesis of a full-length mRNA with a sequence that begins in the 5' LTR and terminates in the 3' LTR. This is directly analogous to the initial step in retrovirus particle formation. Retroviral and retrotransposon mRNAs are translated into proteins that assemble along with the mRNA into large particle structures. Both retroviral and retrotransposon particles undergo a maturation process that includes the proteolytic processing of precursor proteins and the reverse transcription of the mRNA. Retrotransposon particles complete transposition by simply inserting their DNA into the genome of the original host cell. Because each step in retrotransposition is directly related to a retrovirus process, results from the investigation of yeast retrotransposition are relevant to aspects of retrovirus behavior.
Figure 1. Synthesis of a full-length mRNA with sequence that begins in the 5' LTR and terminates in the 3' LTR.
The retrotransposon we study is the Tf1 element of the fission yeast Schizosaccharomyces pombe. The transposon is 5 kb and contains a single ORF with coding sequences for Gag, PR, RT, and IN (5). Using an in vivo assay based on a neo-marked version of Tf1, we previously demonstrated that at least one of our cloned copies is active and can transpose at a significant frequency (3, 4). Figure 2 shows that cells that contain transposition events grow readily on the drug G418. This system allows us to use a battery of powerful molecular and genetic techniques to identify and characterize factors that contribute to the transposition process. The following are summaries of our current research projects.
Figure 2. Cells containing transposition events grow readily on the drug G418.
For more information, see the Section on Eukaryotic Transposable Elements’ current research projects:
The IN of Tf1 contains a Zn finger-like motif and a DDE motif, two sequences typical of retroelement integrases (Levin, Weaver et al. 1990 9). Further analysis of its sequence revealed that Tf1 IN also contains a chromodomain (CHD) at its C-terminus (Malik and Eickbush 1999 10). CHDs are present in chromatin-modifying proteins such as HP1 and interact directly with histone proteins that carry specific post-translational modifications. The presence of a CHD suggests Tf1 IN may interact directly with specific nucleosomes during integration. This is particularly interesting in the case of Tf1 because such an interaction may play a role in the insertion preference for Pol II promoters. In these first studies of the CHD, we examined its role in the catalytic activities of IN.
The full-length IN and IN lacking the CHD (CH-) were expressed in bacteria and purified extensively with cobalt agarose and heparin sepharose. To measure catalytic activity and optimize reaction conditions we used the disintegration assay. This assay measures the reverse of integration and was used initially because the substrate sequence and structural requirements are less stringent than for the forward reaction. The model substrate consisted of a 76 nt oligonucleotide that mimics the end of the LTR attached to the bottom strand of a target. A 20 nt oligonucleotide is added to mimic the top strand of the target. IN activity will catalyze a transesterification reaction that joins the 20 nt top substrate to the bottom strand of the target. This converts the 20 nt DNA into a 63 nt product, a change that can be readily detected on polyacrylamide gels. These experiments revealed that IN possessed substantial levels of catalytic activity. Surprisingly, the CH- protein was significantly more active than the full-length IN.
To test the IN proteins for their ability to catalyze the forward reaction, a 24 nt double-stranded DNA was used as the substrate in strand transfer assays. The DNA contained the sequence of either the U3 or U5 ends of the Tf1 LTR and served both as donor and target. IN exhibited substantial strand transfer activity as indicated by the production of higher molecular weight DNAs. Once again, the IN lacking the CHD, CH-, had substantially more activity than the full-length IN. These data substantiated that Tf1 IN was active as a recombinant protein and that the CHD functioned to inhibit this activity.
The strand transfer activities of retrovirus and retrotransposon INs require that the highly conserved “CA” dinucleotide be present at the 3’ end of the donor substrate. We tested whether the IN of Tf1 had this same requirement for “CA” and whether the CHD contributed to this specificity. IN and CH- were tested for whether each of the last three nucleotides of the LTRs, “ACA”, was necessary for strand transfer. All the modifications introduced in the sequence led to a dramatic reduction of the strand transfer activity of IN. In sharp contrast to IN, CH- retained high levels of activity with most of the substituted substrates. For example, CH- retained 50% of its activity with the substrate that had a minus one transition (AtA), whereas the full-length IN had just 1% of its activity with this substrate. This large difference in the specificity of the two related proteins shows that IN was far more stringent than the CH- counterpart in selecting the DNA donor with the correct 3’ end.
Both contributions of the CHD, limiting IN activity and increasing sequence specificity, may result from a single mechanism that restricts access to the active site of IN. One intriguing hypothesis is that the CHD restricts IN activity until it interacts with histone H3 or some other factor, and that this interaction relieves the inhibitory function of the CHD and allows integration to occur.
The INs of retroviruses have a 3’ processing activity that removes the two or three nucleotides 3’ of the “CA” so that strand transfer can occur. The primer for the minus strand of Tf1 is positioned adjacent to the LTR, allowing the cDNA to terminate with the “CA” that is necessary for strand transfer. Thus it was predicted that no processing activity would exist (Levin 1995 7; Levin 1996 8). However, sequences of Tf1 cDNA extracted from particles revealed the surprising result that 85% of the 3’ ends had one or more untemplated nucleotides (Atwood-Moore, Ejebe et al. 2005 1). Because these nucleotides are expected to block strand transfer, we tested whether Tf1 IN had a processing activity. In this experiment, the 5’-end labeled substrates contained nucleotides positioned 3’ to the critical “CA”. The data from these experiments showed that IN did have processing activity capable of removing as many as 5 nucleotides from the 3’ end of the substrate. The processing activity could in theory remove the 3’ nontemplated nucleotides and allow the bulk of the cDNA to participate in integration. This model raises the question of why nontemplated nucleotides would be added only to be removed by IN. One possibility is that the nontemplated nucleotides protect the conserved “CA” from attack by nonspecific 3’ exonucleases.
The C-terminal domains in the INs of LTR-retrotransposons and retroviruses are not well conserved. However, close examination of C-termini did identify one motif that exists in a wide variety of INs (Malik and Eickbush 1999 10). This module, termed the GPY/F motif, is present in the INs of a diverse set of LTR-retrotransposons in the Metaviridae family (formerly Ty3/gypsy) and in the gamma class of retroviruses (Fig. 3A) (Malik and Eickbush 1999 10; Jern, Sperber et al. 2005 5). The function of this motif has not been studied. The IN of Tf1, as a recombinant protein, is highly soluble and possesses robust catalytic activity (Hizi and Levin 2005 4). These properties motivated us to consider the IN of Tf1 as a potential model to study the function of the GPY/F domain.
Since little was known about the structure of Tf1 IN, our initial experiments tested whether it was sufficiently similar to other INs to serve as a model. IN purified from bacteria was subjected to partial proteolysis with trypsin to determine whether it possessed the three-domain architecture found in other INs. The N-terminal amino acid sequences of the protein fragments showed that cleavage occurred at amino acids 110 and 354. This confirmed that IN was composed of N-terminal, central core, and C-terminal domains (Fig. 3B). When the full Tf1 IN is aligned with other INs, its conserved residues and the size of its domains closely resemble those of the IN of Moloney murine leukemia virus (M-MuLV) (Fig. 3B). The cleavage between the central and C-terminal domains occurred in the middle of the GPY/F motif. This indicated that the GPY/F region assembles into two stable segments split by less structured residues.
Figure 3. The GPY/F motif of IN. A. An alignment of INs from the Metavirus family of transposons and the Gamma family of retroviruses shows the conserved residues of the motif (yellow). B. A scaled diagram of INs showing the three domain architecture reveals similarities between the INs of Tf1 and M-MuLV.
To validate whether the domains of Tf1 IN possessed the functions associated with the domains of other INs, and to study the function of the GPY/F motif, a series of recombinant proteins consisting of different sections of Tf1 IN were expressed in bacteria and purified (Fig. 4). The central core domain of HIV-1 IN contains the catalytic module that is sufficient to support strand cleavage and joining as measured with the disintegration assay. However, in the IN of M-MuLV the domains are larger, and in this case both the central and C-terminal domains are necessary for catalytic activity. We tested which portions of Tf1 IN were required for catalytic activity using the same disintegration assay described in the previous section. The central domain by itself lacked activity. Sequential deletions revealed that the N-terminal domain and the GPY/F motif were necessary for activity.
Figure 4. Sections of Tf1 IN produced as recombinant proteins.
Figure 5. Gel filtration of IN proteins revealed that the GPY/F fragment formed multimers.
In solution the INs of HIV-1, M-MuLV, and avian sarcoma virus (ASV) form a dimer-tetramer equilibrium. In initial experiments to test the IN of Tf1 for the propensity to multimerize, we tested full-length IN for interactions with the individual portions of the protein. Using a precipitation procedure and our recombinant proteins, we found that the N-terminal domain, the central core, and the 71 amino acid GPY/F fragment all bound to the full-length IN. To test directly for stable multimers we performed gel filtration with Superdex 200. In a buffer of 50 mM HEPES, pH 7.5, 0.5 M NaCl, and 1% (v/v) glycerol, IN at 1 mg/ml eluted as a single peak with an observed molecular weight of 126.5 kDa, the size predicted for a dimer. The central core by itself was also observed to form a stable dimer. This ability of the central core and the full-length proteins to dimerize was typical of other INs.
To investigate the contribution of the C-terminal domains to the multimerization of IN, the GPY/F fragment and the CHD were subjected to gel filtration with Superdex 75 (Fig. 5A). The CHD had an estimated molecular weight of 11.7 kDa, indicating it was a monomer. The profile produced by the GPY/F fragment included three major peaks (Fig. 5B). The apparent sizes of these species were monomer, dimer, and trimer. To test whether the GPF residues in the center of the motif contributed to multimerization, single amino acid substitutions were generated. Both substitutions, GPF to APF and GPF to GAF, completely disrupted multimerization of the GPY/F fragment (Figs. 5C and 5D). These data indicate that the GPY/F residues play a central role in promoting multimerization. In separate experiments to test for multimers we subjected the GPY/F fragment to the chemical cross-linker dithiobis succinimidyl propionate. Gel electrophoresis of the cross-linked sample indicated the protein was in an equilibrium of monomers, dimers, trimers, and tetramers.
The C-terminal domains of INs are known to bind DNA without sequence specificity. To map which sections of Tf1 IN interact with DNA, each of the individual domains was assayed for DNA binding. Labeled oligonucleotides were mixed with the individual domains and the mixtures were cross-linked by UV. The full-length IN, CH-, core, and the GPY/F fragment had substantial DNA binding activity. These DNA binding activities correspond well to what has been described for other INs. Interestingly, CH- bound significantly more DNA than the full-length IN. This indicated that the inhibitory activity of the CHD may act by blocking DNA binding.
In additional experiments the single amino acid substitutions in the GPF residues did not reduce DNA binding. This result indicates that other sequences in the GPY/F fragment mediated the DNA binding activity and the contribution of the GPF residues in the GPY/F fragment appears to be specific for promoting multimerization.
The contribution to catalysis of the GP residues was tested by generating recombinant IN with the substitutions GPF to APF and GPF to GAF. Both of these mutations abolished the strand transfer and disintegration activities. This requirement of the GPF residues for catalysis, together with their contribution to multimerization, indicated that the principal function of the GPY/F motif was to promote multimerization of the C-terminus. Our finding that the GPF to APF and GPF to GAF mutations did not reduce DNA binding indicated that multimerization was not required for DNA binding. The widespread conservation of the GPY/F motif supports the conclusion that these amino acids play a critical role in the function of IN.
Many transposable elements protect the coding capacity of their host by directing integration to nonessential regions of the genome. The preference of Tf1 for integrating upstream of genes is a mechanism that protects the coding sequences of S. pombe. This preference may be due to interactions of IN with chromatin factors. This possibility is supported by the presence in the IN of a CHD that may bind histones with specific modifications (Malik and Eickbush 1999 10). Alternatively, the integration of Tf1 upstream of ORFs could be due to interaction with chromatin remodeling complexes or transcription factors. To identify what determinants specify the positions of integration we developed a plasmid-based targeting assay in cells induced for Tf1 transposition.
To identify what factors determine the insertion sites of Tf1 integration, we developed a plasmid-based targeting assay that measured the integration activity of specific sequences. A strain of S. pombe was generated that contained a plasmid for the expression of Tf1 and a plasmid for target sequences. Tf1 was marked with neo so that integration would result in resistance to G418. After transcription of Tf1-neo was induced, the expression plasmid was removed by counter selection and cells with insertion events were selected on medium containing G418. Target plasmids extracted from individual patches were introduced into bacteria by selecting for resistance to ampicillin. Due to the presence of neo in Tf1, target plasmids with insertions resulted in colonies that were resistant to both ampicillin and kanamycin. These plasmids were readily identified and the positions of their insertions were sequenced.
Figure 6. Integration pattern of Tf1 into a target plasmid containing ade6.
To test the integration preferences of Tf1, a target plasmid was created that contained a potential target consisting of the ORF of ade6, its upstream intergenic region, and a portion of the adjacent ORF, bub1 (Fig. 6). The ade6 and bub1 genes were in divergent orientation so the 405 bp intergenic sequence contained two promoters. This particular segment of the S. pombe genome contains chromatin structure that is maintained when placed on a plasmid. The only other portion of the target plasmid that contained a promoter was LEU2d, a selectable gene from S. cerevisiae that has a damaged promoter. The target plasmid was introduced into a strain of S. pombe and individual patches of cells were induced for the expression of Tf1-neo. No selection was placed on the function of ade6. The patches were then replica printed to medium containing G418 to select for cells with integration events. Target plasmids were isolated by preparing DNA from each G418R patch and electroporating the DNA into bacteria. Colonies resistant to ampicillin and kanamycin contained target plasmids with insertions of Tf1-neo. Of 43 independent insertions of Tf1 isolated in the plasmid, 41 (95%) occurred in the intergenic region between bub1 and ade6 (Fig. 6). These results demonstrated that the preference of Tf1 for integration upstream of ORFs was reconstituted within the context of a plasmid. All 41 insertions in the intergenic region occurred within a 160 bp window. Interestingly, the insertions clustered at specific nucleotides where no preferences for orientation were observed.
Figure 7. Integration frequency and target efficiency of plasmids with deletions in ade6. The 160 bp target window (gray line) in the promoter of ade6 is the only sequence required for Tf1 integration.
To identify which sections of bub1-ade6 were important for targeted integration, we made a series of deletions and tested their impact on the frequency of integration (Fig. 7). These data revealed that the 160 bp target window was the only portion of bub1-ade6 that was important for efficient integration. To test whether the insertion window was sufficient to be an integration target, a plasmid was created that contained just the 160 bp window. These experiments showed that just 173 bp of the promoter sequence functioned as a highly specific target for integration.
The targeting of Tf1 to the region of the ade6 and bub1 promoters suggests that the transcription of these two genes may play a critical role in integration. This possibility was tested by comparing the amounts of mRNA produced by the deletion plasmids to their efficiency of targeted integration. Two deletions in the promoter sequences exhibited sharp reductions in the transcription from both promoters and yet both plasmids supported efficient targeting. This indicates that the transcription activity of the promoters per se did not play an important role in targeted integration.
The result that active transcription was not important for efficient integration into the promoters of bub1-ade6 suggests that features such as DNA binding proteins may be recognized by Tf1. To determine whether nucleosomes or DNA-bound transcription factors were positioned at integration sites we mapped the chromatin at the promoters of bub1-ade6 with micrococcal nuclease. These experiments revealed four supersensitive sites clustered together within the intergenic sequence. The intensity of these bands and their close spacing within the promoter region suggests they were created by transcription factors bound to regulatory sequences. Of the five major sites of insertion, four corresponded closely with positions of micrococcal sensitivity. This close association suggests the promoter binding proteins responsible for the micrococcal sites played a role in recruiting IN to the sites of integration. This is consistent with the finding that 160 bp of the bub1-ade6 promoters was the only sequence required for targeted integration. Unfortunately, the transcription factors that bind the promoters of bub1-ade6 are unknown. As a result, we chose to study the integration activity of a promoter for which the transcription factors are known.
To test whether Tf1 integration was directed by transcription factors, we studied the insertion activity of a promoter that has been characterized extensively. The promoter of fbp1 is highly regulated and contains an upstream activating sequence (UAS1) that consists of an eight bp binding site for the transcription activator Atf1p (Neely and Hoffman 2000 11). In addition, UAS2 consists of a short binding sequence for the transcription activator Rst2p as well as other factors (Neely and Hoffman 2000 11; Higuchi, Watanabe et al. 2002 3). We tested whether the promoter of fbp1 was a target for Tf1 integration using the target plasmid assay. Eighty-six percent of the insertions in the plasmid occurred within the promoter region of fbp1. Thirteen (62%) of the 21 insertions isolated in the plasmid occurred at the two dominant positions, 4923 and 4933 (Fig. 8A). These positions were just 30 and 40 bp downstream of UAS1. The lack of any insertions in the open reading frame and the clustering of inserts upstream of the open reading frame indicated that the insertions were specific for the promoter of fbp1. The role of transcription factors in integration was further supported by the concentration of integrations that occurred adjacent to the binding site of Atf1p at UAS1.
Figure 8. Integration sites in the fbp1 promoter. A. The wild type promoter has two dominant insertion sites. B. The eight nt mutation of UAS1 reduced integration at the dominant sites by 9-fold.
The integration activity of fbp1 and the published information about UAS1 allowed us to ask whether the clustering of inserts in the promoter region was the result of UAS1 function. We changed the sequence of all eight nucleotides in UAS1 of the fbp1 plasmid and used the target assay to ask whether the pattern of integration was changed. In the plasmid with the mutated UAS1, just three of 42 insertions (7%) occurred at positions 4923 and 4933 (Fig. 8B). Thus, the mutated UAS1 resulted in a 9-fold reduction in integration at the two dominant positions of insertion. The defect in the ability of UAS1 to be recognized by Tf1 allowed the secondary targets 5183 and 5149 to become major sites of integration. This result demonstrated that a functional binding site for the transcription factor Atf1p plays an important role in the targeting of Tf1 integration to the two major insertion sites in the fbp1 promoter.
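The 9-fold figure follows directly from the insertion counts given for the wild-type and mutated promoters. A minimal Python sketch of that arithmetic, using only the counts reported in the text (the function name `fraction_at_dominant_sites` is illustrative and not part of any published analysis):

```python
def fraction_at_dominant_sites(hits: int, total: int) -> float:
    """Fraction of independent insertions found at the dominant positions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total

# Counts reported in the text for positions 4923 and 4933.
wild_type = fraction_at_dominant_sites(13, 21)  # wild-type UAS1
mutant = fraction_at_dominant_sites(3, 42)      # UAS1 with all eight nt changed

# Ratio of the two fractions gives the fold reduction.
fold_reduction = wild_type / mutant

print(f"wild type:      {wild_type:.0%}")
print(f"mutated UAS1:   {mutant:.0%}")
print(f"fold reduction: {fold_reduction:.1f}")
```

The ratio comes out at about 8.7, which rounds to the 9-fold reduction quoted for the two dominant sites.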
The integration of Tf1 in the promoter of fbp1 may result from a tethering of IN to UAS1 by Atf1p. An interaction of Atf1p with IN could be direct or through proteins that bind Atf1p, such as the coactivator Pcr1p. Immunoprecipitation of Atf1p from cell extracts revealed that IN was in a complex with Atf1p. Further experiments tested whether Atf1p itself contributed to integration in the promoter of fbp1. Results from plasmid target experiments revealed that in the strain lacking Atf1p, insertions were no longer directed to the fbp1 promoter. Together, these results indicate that Atf1p mediates integration of Tf1 in fbp1 by tethering IN to UAS1. Whether integration at other promoters is due to Atf1p or other transcription factors remains to be tested.
The relationship between transposable elements and their hosts allows for integration of the transposon that does not impair the fitness of the host. The insertion of Tf1-neo introduces approximately 6 kb of DNA into the promoter region of genes. This form of DNA damage might be expected to disrupt promoter function. For example, one deletion we made in the promoter of ade6 revealed key promoter elements were upstream of all the major sites of integration. We tested whether the integration of Tf1-neo at the primary site of insertion, nucleotide 4573, disrupted the function of the ade6 promoter. Target plasmids with Tf1-neo insertions were introduced into S. pombe and the levels of ade6 mRNA were measured on RNA blots. The ade6 mRNA was quantified relative to the amounts of actin mRNA. The copy number of the plasmids in each transformant was determined on DNA blots and these values were used to adjust the levels of ade6 mRNA. Surprisingly, insertion of Tf1-neo actually stimulated ade6 transcription. Insertion in the left orientation at nucleotide 4573 increased ade6 mRNA by 2.1-fold while insertion in the right orientation increased ade6 expression by 3.6-fold. Even though the insertions were downstream of essential promoter elements, ade6 expression was not reduced. The absence of defects in the expression of ade6 together with the increases observed suggests the intriguing possibility that Tf1 carries promoter elements that are capable of repairing the promoters disrupted by the transposon.
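The normalization described above, ade6 signal relative to actin and then adjusted for plasmid copy number, can be sketched as follows. The function name and every input value here are hypothetical placeholders chosen for illustration; they are not measurements from these experiments, and the actual quantification may have differed in detail.

```python
def normalized_expression(ade6_signal: float, actin_signal: float,
                          copy_number: float) -> float:
    """ade6 mRNA per plasmid copy, relative to the actin loading control."""
    if actin_signal <= 0 or copy_number <= 0:
        raise ValueError("actin signal and copy number must be positive")
    return ade6_signal / actin_signal / copy_number

# Hypothetical blot intensities (arbitrary units) and plasmid copy numbers.
no_insertion = normalized_expression(ade6_signal=400.0, actin_signal=1000.0,
                                     copy_number=8.0)
with_insertion = normalized_expression(ade6_signal=600.0, actin_signal=1000.0,
                                       copy_number=6.0)

# Fold change of the insertion plasmid relative to the plasmid without Tf1-neo.
fold_change = with_insertion / no_insertion
print(f"fold change in ade6 mRNA: {fold_change:.1f}")
```

Dividing by copy number matters because a transformant carrying more plasmid copies would otherwise appear to express more ade6 even if each promoter were equally active.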
We tested directly whether Tf1 was capable of activating a damaged version of the ade6 promoter that lacked key promoter elements. A deletion that removed the sequences between the target site 4573 and the ATG of bub1 reduced levels of ade6 mRNA 4.3-fold. An identical construct except that it contained Tf1-neo inserted at nucleotide 4573 produced 7.2 and 2.9 times more ade6 mRNA depending on the orientation of Tf1-neo. These data demonstrate that Tf1-neo introduced promoter elements that substituted for the key transcription components located upstream of nucleotide 4573.
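Combining these fold changes gives a rough sense of how far Tf1-neo restored expression. The sketch below rests on two assumptions that the text does not state explicitly: that the 7.2- and 2.9-fold increases are measured relative to the deletion construct without an insertion, and that the intact promoter can be taken as a reference level of 1.0. The text does not say which value belongs to which orientation, so the results are labeled by value only.

```python
# Reference level: ade6 mRNA from the intact promoter (assumed to be 1.0).
INTACT = 1.0

# The deletion reduced ade6 mRNA 4.3-fold.
deletion_level = INTACT / 4.3

# Assumption: both increases are relative to the deletion construct.
fold_increases = (7.2, 2.9)
restored_levels = [deletion_level * f for f in fold_increases]

print(f"deletion alone: {deletion_level:.2f} x intact level")
for f, level in zip(fold_increases, restored_levels):
    print(f"deletion + Tf1-neo ({f}-fold): {level:.2f} x intact level")
```

Under these assumptions one orientation brings expression above the level of the intact promoter while the other restores roughly two thirds of it.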
The insertion of Tf1 sequences could cause the target promoters to adopt the regulatory properties of Tf1 transcription. If Tf1 expression is similar to that of the closely related element Tf2, conditions of oxidative stress would induce a wave of transposition that in turn could cause cellular genes to become regulated by stress. This process could provide a selective advantage for the host by inducing genes that could increase survival. This would be an evolutionary strategy for creating populations of cells expressing different genes in response to stress, improving the chances of survival for the species.