Mouse Enhancer Screen Handbook and Methods
Introduction and Caveats
The goal of this project is to identify distant-acting transcriptional enhancers in the Human Genome by coupling the identification of extremely evolutionarily conserved noncoding sequence with a relatively high-throughput mouse transgenic enhancer assay.
There are two types of data found at this site. They are (1) an "Experimental Dataset" of conserved noncoding human sequences which have been tested for enhancer activity in transgenic mice and (2) a "Computational Dataset" of whole human genome conserved noncoding elements focused on Chicken, Frog, Zebrafish, and Fugu. Details on each and their availability in the Website can be found below. The Experimental Dataset is also integrated into the Computational Dataset to facilitate the various search options described below.
(1) Experimental Dataset: This section is dedicated to all the elements that have been tested for enhancer activity in transgenic mice. As of August 2005, approximately 150 elements can be found in this portion of the database and this number is expected to steadily grow.
Within the Experimental Dataset (accessible from the Enhancer Browser Homepage), a list of tested enhancer elements is provided. The left-hand column indicates the location of the tested elements based on build hg17 (May 2004 UCSC). The adjacent (second) column describes known genes flanking the defined element (if the element is intra-gene, the same gene is shown as flanking). The third column indicates whether the element is a positive enhancer (blue mouse), a negative enhancer (white mouse), or in progress for testing (question mark, "?"). Finally, the right-most column indicates the species displaying conservation with the human element. The term "ULTRA" denotes "Ultra-conserved Elements" (defined as >200bp and 100% identical between human/mouse/rat) (Bejerano, et al. Science. 304(5675):1321-5).
By clicking on the coordinates hyperlink (left column), the user is taken to the existing data for that given element. Features of these drill-down pages include the element's:
- coordinates
- flanking genes
- representative positive transgenic embryos
- observed general expression pattern
- fasta sequence of the interval
- primers used for enhancer cloning
- a UCSC Vista Browser overview and clickable hyperlink.
(2) Computational Dataset: Our primary goal is to facilitate the identification of distant acting gene enhancers in the human genome. Our approach has been to identify anciently conserved human sequences (with chicken, frog, zebrafish, and fugu) and to filter these data for any evidence of coding or exonic sequence. More details on the Methods can be found below. Based on our goals and methods, the user should be aware that a significant portion of conserved noncoding sequences have been filtered out of this database and that each putative element found within should be manually examined for coding sequence and the quality of the alignment. The data can be accessed in several ways from the Enhancer Browser homepage (http://enhancer.lbl.gov). They include entry by:
A. Keyword, such as Gene Symbol, GenBank Accession, or Locus Link Number. This is the most restrictive search. Since a significant proportion of genes in the human genome do not contain ancient conservation in noncoding sequence, the user should not be surprised or discouraged by the lack of elements in our dataset. In such cases, consider using other datasets such as the UCSC (http://genome.ucsc.edu/) or VISTA (http://pipeline.lbl.gov/) Genome Browsers, which incorporate more closely related species including mammals and primates.
B. Chromosome. Users can select a human chromosome to enter the "Computational Database" and the cataloged elements will appear based on their consecutive coordinates.
C. All. The complete dataset can be found by clicking on All. Again, elements will appear ranked based on their linear location in the human genome.
After entering the database through one of the methods above, the user is provided with a list of candidate enhancer elements. The left-hand column indicates the location of the elements based on build hg17 (May 2004 UCSC). The adjacent (middle) column describes known genes flanking the defined element (if the element is intra-gene, the same gene is shown as flanking). The right-most column indicates the species displaying conservation with the human element.
Within this page, the user can also select subsets of data by clicking on header boxes to sort based on:
- conservation in fugu
- conservation in zebrafish
- conservation in frog
- conservation in chicken
- elements supported by previous publications
- elements tested in our Experimental Dataset
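The listing and subset selection described above can be pictured with a small sketch. This is an illustration only, not the browser's actual code: the Element record, its field names, and the select helper are hypothetical, and the gene names in any usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one candidate element (hg17 coordinates)."""
    chrom: str
    start: int
    end: int
    flanking_genes: tuple      # same gene twice if the element is intra-gene
    conserved_in: frozenset    # e.g. frozenset({"chicken", "fugu"})
    tested: bool = False       # present in the Experimental Dataset


def chrom_key(chrom):
    """Order chr1..chr22 numerically, then named chromosomes (X, Y)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def select(elements, species=None, tested_only=False):
    """Return the matching elements ranked by linear genomic location."""
    hits = [
        e for e in elements
        if (species is None or species in e.conserved_in)
        and (not tested_only or e.tested)
    ]
    return sorted(hits, key=lambda e: (chrom_key(e.chrom), e.start))
```

For example, select(elements, species="fugu") would mimic clicking the fugu header box, and select(elements, tested_only=True) would restrict the list to elements in the Experimental Dataset.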
Finally, clicking on a coordinates hyperlink will transport the user to the UCSC Vista Browser where a variety of overlapping genomic data can be accessed. More details are available through their website.
Materials and Methods
The basic flow-chart of events for this project is depicted in Figure 1.
Figure 1. Flow-chart of our strategy to couple extreme comparative genomic conservation to a high throughput mouse transgenic enhancer screen. We currently have an existing set of ~1500 human-fugu conserved noncoding elements that can readily be PCR amplified, cloned, microinjected, and assayed for enhancer activity at e11.5 in less than 3 weeks.
The Dataset and Computational Filtering of Elements for In Vivo Analysis
To highlight the evolutionary history and constraint on the finished Human Genome, we performed comparative analysis between this genome and a wide range of available species (chimpanzee, mice, rats, chicken, frog and fugu) (http://paragon.lbl.gov/pgs). Since our goal is the identification of distant-acting gene enhancer sequences, these comparative alignments were filtered for overlap with known proteins, exons or spliced "Expressed Sequence Tags." In general, our primary focus is to study human-fugu conserved noncoding sequences where the human version is tested in our transgenic enhancer assay. In addition, we have utilized other comparative genomic datasets of extreme conservation such as all noncoding "Ultra-conserved Elements" (defined as >200bp and 100% identical between human/mouse/rat) (Bejerano, et al. Science. 304(5675):1321-5). In total, this comparative analysis and exonic filtering resulted in the identification of ~2,800 human-fish conserved noncoding and 256 ultra-conserved elements in the Human Genome. As of August 2005, ~150 of these elements have been tested in transgenic mice and deposited in our Enhancer Browser Database (http://enhancer.lbl.gov).
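The exonic filtering step can be illustrated with a minimal sketch. It assumes that conserved alignments and annotated features (proteins, exons, spliced ESTs) are available as (chromosome, start, end) tuples with half-open coordinates, and that any shared base counts as overlap. The function names are hypothetical and this is not the pipeline's actual implementation.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def filter_noncoding(conserved, annotated):
    """Keep conserved intervals that overlap no annotated feature.

    Both arguments are iterables of (chrom, start, end) tuples.
    Assumption: a single shared base is enough to discard an element.
    """
    # Group annotated features by chromosome so each element is only
    # compared against features on its own chromosome.
    by_chrom = {}
    for chrom, start, end in annotated:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for chrom, start, end in conserved:
        features = by_chrom.get(chrom, [])
        if not any(overlaps(start, end, fs, fe) for fs, fe in features):
            kept.append((chrom, start, end))
    return kept
```

A linear scan per chromosome is adequate for a sketch; a genome-wide run would more likely use sorted intervals or an interval tree.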
Summary Statistics of the Human Noncoding "Computational Dataset":
All elements: 145,165 (defined in Martin et al., and available through http://paragon.lbl.gov/pgs)
conserved in chicken: 38,237
conserved in frog: 12,503
conserved in fugu: 4,043
conserved in zebrafish: 6,756
conserved in fish (fugu and zebrafish): 2,869 (overlap considered alignment of 20 bp or more)
Martin, J., (+ 120 authors), and Pennacchio, L. A. 2004. The sequence and analysis of duplication rich human chromosome 16, Nature 432(7020):988-94.
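One possible reading of the 20 bp overlap criterion used for the "conserved in fish" count is sketched below. It assumes that fugu and zebrafish alignments are both expressed as half-open intervals in human coordinates, and that an element counts as conserved in fish when a fugu interval and a zebrafish interval share at least 20 bases. The exact rule used to produce the published count is not specified here, so this is an interpretation.

```python
MIN_OVERLAP_BP = 20  # threshold quoted in the summary statistics


def overlap_length(a, b):
    """Shared bases between half-open intervals a=(start, end) and b=(start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def conserved_in_both(fugu, zebrafish, min_overlap=MIN_OVERLAP_BP):
    """Return fugu-conserved intervals that overlap a zebrafish-conserved
    interval on the same chromosome by at least `min_overlap` bases.

    Inputs are lists of (chrom, start, end) tuples in human coordinates.
    """
    hits = []
    for chrom, start, end in fugu:
        for z_chrom, z_start, z_end in zebrafish:
            if chrom == z_chrom and overlap_length(
                (start, end), (z_start, z_end)
            ) >= min_overlap:
                hits.append((chrom, start, end))
                break  # one qualifying partner is enough
    return hits
```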
Molecular Biology
A. PCR primers are designed to amplify candidate human gene enhancers selected based on extreme evolutionary conservation.
We should emphasize that these lists are not static but will change as additional genomic data become available. We will also manually examine elements to be tested for potential missed or new coding sequence evidence or false-positive alignments. As is found in all genome-wide filtering and alignment strategies, we observe occasional sequences that do not meet our original criteria. We are also interested in providing outside investigators the opportunity to prioritize elements for transgenic testing through requests at our website. Our current pipeline processes batches of 48 or 96 elements for testing.
Primers are designed in bulk using a local installation of the program primer3. The program is fed a flat text file of repeat-masked conserved sequence plus 200-400 bp of flanking sequence on each end and uses a uniform Tm of 60°C. Primers are ordered in a 96-well format, and forward oligos are 5' tailed with the sequence (CACC) required for cloning into Gateway vectors. PCRs are tested on human genomic DNA (BD Biosciences) and size confirmed, and successful products are cloned into the standard Gateway entry vector (pENTR/D-TOPO vector, Invitrogen) using the manufacturer's recommended protocol. The Gateway system makes it simple to shuttle these insert sequences into other vectors of choice, and these vectors are available by request. Two clones from each reaction are sequence validated. Failed PCR primers go through two rounds of optimization, and primers that continue to fail are re-ordered.
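As a rough illustration of the bulk primer design step, the sketch below writes one primer3 input record per element and adds the CACC tail to a forward primer. The tag names follow the Boulder-IO format of Primer3 2.x; the primer3 version used in the pipeline is not stated and may expect different tag names. Treating the conserved element as the target, so that primers fall in the flanking sequence, is an assumption, and the function names are hypothetical.

```python
GATEWAY_TAIL = "CACC"  # 5' tail required for directional cloning into pENTR/D-TOPO


def tail_forward_primer(forward):
    """Prepend the Gateway tail to a forward primer sequence."""
    return GATEWAY_TAIL + forward.upper()


def primer3_record(seq_id, left_flank, element, right_flank, opt_tm=60.0):
    """Build one Boulder-IO record for primer3_core (Primer3 2.x tag names).

    Assumption: the conserved element is the target, so primers are
    picked in the 200-400 bp flanks on either side.
    """
    template = left_flank + element + right_flank
    target_start = len(left_flank)  # 0-based, the primer3_core default
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={target_start},{len(element)}",
        f"PRIMER_OPT_TM={opt_tm}",
        "=",  # record terminator
    ]
    return "\n".join(lines) + "\n"
```

Concatenating the records for a batch of 48 or 96 elements into a single file would give the kind of flat text input described above.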
B. Fragments are cloned into a Gateway compatible hsp 68-lacZ reporter vector.
In order to increase the throughput of reporter construct generation, we are exploiting the Gateway Recombination System (Invitrogen). We have generated a Gateway-compatible reporter vector (Gateway-HSP68-LacZ), and each Entry clone is transferred into the destination reporter vector using the LR recombination reaction. Vector PCR and/or restriction enzyme analysis is used to confirm successful insert transfer.
C. Ultra-pure construct DNA is microinjected into fertilized eggs to generate >5 independent transgenic mice for each construct to assess gene enhancer activities.
Plasmid DNA is purified using the EndoFree plasmid maxi kit (Qiagen) and 100 µg of each plasmid is linearized with XhoI or HindIII, followed by purification on Micropure EZ columns and Montage PCR filter units (Millipore). The purified DNA is dialyzed for 24 h against injection buffer (10 mM Tris, pH 7.5; 0.1 mM EDTA), and its concentration is determined fluorometrically and by agarose gel electrophoresis. The DNA is diluted to a concentration of 1.5 to 2 ng/µl and used for pronuclear injections of FVB embryos in accordance with standard protocols approved by the Lawrence Berkeley National Laboratory.
D. Embryos are harvested at embryonic day 11.5, PCR genotyped, and stained for beta-galactosidase activity.
Embryos are harvested at embryonic day 11.5 and dissected in cold 100 mM phosphate buffer pH 7.3, followed by 30 min of incubation with 4% paraformaldehyde at 4°C. The embryos are washed three times for 30 min with wash buffer (2 mM MgCl2; 0.01% deoxycholate; 0.02% NP-40; 100 mM phosphate buffer, pH 7.3). Embryos are stained for 24 h at room temperature with freshly made staining solution (0.8 mg/ml X-gal; 4 mM potassium ferrocyanide; 4 mM potassium ferricyanide; 20 mM Tris, pH 7.5 in wash buffer), followed by 3 rinses in 100 mM phosphate buffer pH 7.3, and post-fixed in 4% paraformaldehyde. Yolk sacs are carefully dissected from embryos and DNA is prepared by boiling the tissue for 20 min in 75 µl of solution 1 (25 mM NaOH; 0.2 mM EDTA), followed by neutralization with 75 µl of solution 2 (40 mM Tris-HCl). Yolk sac DNA is screened by PCR with LacZ primers (LacZ-F: 5'-TTTCCATGTTGCCACTCGC; LacZ-R: 5'-AACGGCTTGCCGTTCAGCA).
E. Each positive developmental enhancer is annotated based on the observed spatial pattern of expression. Digital images as well as preserved embryos and vector constructs are maintained indefinitely.
Over the past six months, we have developed an in-house pipeline for general binning of enhancer spatial patterns of expression at embryonic day 11.5. We also have strong affiliations with Brian Black, a developmental biologist at UCSF, and have quarterly meetings with this group to present novel expression patterns. Whole-mount digital photos will be taken weekly for all positive staining embryos and stored in our internal database. As discussed, 'group meeting' jamborees are performed routinely to confirm proper annotation and to maintain group consistency as to what constitutes a bona fide enhancer versus a possible position effect. We have found that the stringent requirement of 3 independent embryos with the same pattern of expression for inclusion in the database is a reasonable choice. Elements with positive staining embryos that do not fulfill this stringent requirement are also deposited and maintained in our database but are not immediately publicly available. Upon request, these data can be provided to outside investigators for their own examination and interpretation. Our primary reason for not making all the raw data publicly available upfront is to provide the community with a clean resource that minimizes false-positive data (we err on the side of being too stringent).
F. While all embryos are preserved for more in depth analysis, a subset of positive embryos available on request will be provided to outside investigators and/or sectioned for higher resolution assessment of enhancer activities at LBNL.
Through the Contact Us link on our website, we provide a means of contacting our group directly to examine all the raw data associated with an enhancer element. As described above, we require at least 3 independent transgenic mice with the same expression pattern to assign an element a given spatial property; we chose this threshold to minimize false positives and provide a robust public dataset. Nevertheless, we do identify enhancers with less reproducible characteristics and will provide access to these digital archives for investigators interested in examining our raw transgenic data. In addition, we will provide beta-galactosidase stained embryos to outside investigators on request and have a small capacity to perform in-house thin section analyses for groups lacking such resources.
General Spatial Enhancer Annotation Rules
In general each construct is tested in a single transgenic experiment at a single timepoint (embryonic day 11.5 (e11.5); aka dpc11.5). Each transgenic embryo is the result of a separate transgene integration event and thus can be scored as an independent founder animal.
Our scoring criteria are as follows:
- INCONCLUSIVE EXPERIMENT: Fewer than 3 total transgenic mice (as determined by PCR).
- NEGATIVE ENHANCER: 3 or more transgenic mice but no reproducible pattern of expression at e11.5. This does not mean these elements are not enhancers at other time-points or at levels of resolution beyond our sensitivity to detect.
- POSITIVE ENHANCER: 3 or more transgenic mice with at least 3 displaying similar patterns of expression. However, in a few instances, highly specific expression patterns found in only 2 transgenics are scored as positive.
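The scoring criteria above can be summarized in a short sketch. It assumes each PCR-positive transgenic embryo is recorded either as a pattern label (a string such as "limb") or as None when no staining pattern is seen, and it reduces "similar patterns" to identical labels, which is a simplification of what is in practice a judgment made by annotators. The exception for highly specific patterns seen in only 2 transgenics is a manual call and is deliberately not encoded. The function name and labels are hypothetical.

```python
from collections import Counter

MIN_TRANSGENICS = 3  # fewer PCR-positive embryos: experiment is inconclusive
MIN_CONCORDANT = 3   # embryos that must share a pattern for a positive call


def score_element(patterns):
    """Classify one element from its transgenic embryos at e11.5.

    `patterns` has one entry per PCR-positive embryo: a pattern label,
    or None if the embryo shows no staining pattern.
    """
    if len(patterns) < MIN_TRANSGENICS:
        return "INCONCLUSIVE"
    counts = Counter(p for p in patterns if p is not None)
    most_common = max(counts.values(), default=0)
    return "POSITIVE" if most_common >= MIN_CONCORDANT else "NEGATIVE"
```

Because each transgenic embryo results from a separate integration event, every entry in the list is treated as an independent observation.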
Final Comments
We envision this dataset to be of value (but not limited) to:
- Better annotate the human genome and the location of distant-acting gene regulatory elements and their functional properties in vivo.
- Provide biologists with experimentally tested enhancer elements that drive specific cell type or tissue expression. Such a resource could be used for applications such as cell lineage markers (by driving GFP) or cell lineage specific gene knockouts (by driving Cre in loxP flanked conditional alleles).
- Serve as a substrate for computational biologists to better understand the rules that govern gene expression by examining similarly expressing elements for over-represented sequence-based features.
- Provide the identification of functional gene regulatory sequences for exploration of DNA sequence variation that may affect human biology and/or disease.
Given these wide-ranging applications, the annotation of this dataset should be used with caution and with an understanding of its attributes and limitations. For instance, to cater to developmental biologists, we do our best to capture moderately reproducible enhancers, since we expect such parties would be highly motivated to follow up novel enhancer elements and to refute or support their validity. In contrast, we also cater to computational biologists who desire robust enhancer training (and negative) sets for informatics-driven analysis. Each user has a different goal and should understand and carefully examine all the underlying data to best decide what to include versus avoid for further analyses.