References to the Alternative Splicing Database:

ASDB: database of alternatively spliced genes
I. Dralyuk, M. Brudno, M. S. Gelfand, M. Zorn, and I. Dubchak (2000) Nucleic Acids Research 28(1), 296-297.
M. S. Gelfand, I. Dubchak, I. Dralyuk, and M. Zorn (1999) Nucleic Acids Research 27(1), 301.
Alternative pre-mRNA splicing is an important mechanism for regulating gene expression in higher eukaryotes. By recent estimates, the primary transcripts of ~30% of human genes are subject to alternative splicing, often regulated in specific spatial/temporal patterns during normal development. Alternative splicing plays a major role in sex determination in Drosophila, antibody response in humans, and other tissue- or developmental-stage-specific processes (Chabot 1996; Breitbart et al. 1987; Smith et al. 1989). In complex genes, alternative splicing can generate dozens or even hundreds of different mRNA isoforms from a single transcript (Breitbart and Nadal-Ginard 1987; Missler and Sudhof 1998; Gascard et al. 1998). In many cases the alternatively spliced exon encodes a protein domain that is functionally important for catalytic activity or binding interactions, so the resulting proteins can exhibit different or even antagonistic activities.

The Alternative Splicing Database (ASDB) has been established to assemble, at a central, publicly accessible site, information about alternatively spliced genes, their products, and their expression patterns. The current ASDB format was established without explicit funding for this project and should be viewed as an early prototype rather than a completed project. When such funding is obtained, we plan to greatly expand and update the splicing information by curating data from the literature and from sequence databases, and to improve the tools so that users can search for all products of alternative splicing produced in a particular tissue or a given organism, or for all variants generated from a particular transcript. The database should thus be useful not only to molecular biologists studying splicing, but also to developmental biologists, geneticists, cell biologists, and others.

Database content and composition
Version 2.1 of ASDB consists of two divisions: ASDB(proteins), which contains amino acid sequences,
and ASDB(nucleotides), which contains genomic sequences.
SWISS-PROT uses two formats for describing alternative splicing. The protein sequences were
therefore selected from SWISS-PROT using a full-text search for both the words "alternative
splicing" (usually in the CC lines) and "varsplic" (in the FT lines). Some entries describe just
one alternatively spliced variant; others indicate several. To group proteins that could arise by
alternative splicing of the same gene, we developed a clustering procedure. Two proteins were
linked if they had a common fragment of at least 20 amino acids, and clusters were initially
defined as maximal connected groups of linked proteins. It turned out that some clusters were
chimeric, in the sense that they contained members of multigene families rather than alternatively
spliced variants of one gene. The multiple alignments were therefore subjected to additional
analysis aimed at detecting chimeric clusters.
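The linking rule described above can be illustrated with a short Python sketch. It is not the ASDB implementation: it assumes that "common fragment" means an exact shared substring, and the function name `cluster_proteins` and its input format are hypothetical. Under that assumption, two sequences share a fragment of at least 20 residues exactly when they share at least one 20-residue window, so the clusters are the connected components of a graph built from shared windows.

```python
from collections import defaultdict

MIN_FRAGMENT = 20  # minimum shared fragment length, as stated in the text


def cluster_proteins(sequences, k=MIN_FRAGMENT):
    """Group proteins that share an exact fragment of at least k residues.

    `sequences` maps an entry identifier to its amino acid sequence
    (hypothetical input format). Returns a sorted list of clusters,
    each a sorted list of identifiers.
    """
    parent = {name: name for name in sequences}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first entry seen for each k-residue window; any later
    # entry containing the same window is linked to it.
    first_owner = {}
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            owner = first_owner.setdefault(window, name)
            if owner != name:
                union(owner, name)

    # Clusters are the connected components of the link graph.
    clusters = defaultdict(list)
    for name in sequences:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Because linking is transitive, two proteins can end up in the same cluster without sharing a fragment directly, which is one way members of a multigene family can be merged into a chimeric cluster.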
Each cluster is represented by a multiple alignment of its members constructed using CLUSTALW.
The distribution of cluster sizes, the representation of species, and other relevant statistics of
ASDB(proteins) can be accessed through the links below.
This processing covers the cases in which alternatively spliced variants are described in separate
SWISS-PROT entries. The other kind of ASDB record, originating from SWISS-PROT entries with the
"varsplic" field in the feature table, usually describes proteins that are not part of any
cluster. In these cases, the information on the variable fragments of the several proteins that
result from alternative splicing of a single gene is contained in the entry itself.
ASDB(proteins) entries are marked with different symbols to make it easy to distinguish the three
types: proteins that are part of ASDB clusters and the corresponding multiple alignments, proteins
whose variants are described in the associated SWISS-PROT entries, and proteins for which
information on the variants is not available at present. ASDB contains internal links between
entries and/or clusters, as well as external links to Medline, GenBank and SWISS-PROT entries.
The ASDB(nucleotides) division was generated by collecting all GenBank entries containing the words
"alternative splicing" and then selecting those entries that contain complete gene sequences (all
CDS fields are complete, i.e. they do not have continuation signs).
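The selection of complete entries can be sketched in Python as follows. This is an illustration, not the ASDB code. It assumes that "continuation signs" refers to the `<` and `>` markers that GenBank uses for partial feature locations, and that records are supplied as flat-file text with the standard column layout (feature key in column 6, location in column 22). The function names are hypothetical.

```python
def mentions_alternative_splicing(record_text):
    """True if the record contains the words 'alternative splicing'."""
    # Collapse whitespace so a phrase wrapped across two lines still matches.
    return "alternative splicing" in " ".join(record_text.lower().split())


def cds_locations(record_text):
    """Return the location string of every CDS feature in one flat-file record."""
    locations = []
    lines = record_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("     CDS "):
            location = line[21:].strip()
            i += 1
            # A long location continues on indented lines until the first
            # qualifier line (which starts with "/").
            while (i < len(lines)
                   and lines[i].startswith(" " * 21)
                   and not lines[i].strip().startswith("/")):
                location += lines[i].strip()
                i += 1
            locations.append(location)
        else:
            i += 1
    return locations


def has_complete_cds(record_text):
    """True if the record has at least one CDS and none is marked partial."""
    locations = cds_locations(record_text)
    return bool(locations) and all(
        "<" not in loc and ">" not in loc for loc in locations
    )


def select_for_nucleotide_division(record_text):
    """Apply both criteria from the text: keyword match and complete CDS."""
    return (mentions_alternative_splicing(record_text)
            and has_complete_cds(record_text))
```

A production pipeline would more likely rely on an established GenBank parser than on column slicing; the hand-written version is used here only to keep the example self-contained.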
The search string allows for queries with boolean logic, and supports parenthetical statements for
greater flexibility and quoted strings for exact matches. We support the standard logical primitives
(AND, OR) which can be parenthesized to indicate precedence. The following are two examples of valid
searches:
An in-depth description of the syntax is available from the interactive help window by selecting "All Fields".
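To make the query rules concrete, here is a minimal Python sketch of how such a search string could be evaluated. It is not the ASDB search engine. Several details are assumptions: AND is taken to bind more tightly than OR when no parentheses are given, the operators must be written in upper case, and a term matches when it occurs as a case-insensitive substring of the searched text. The queries in the usage note are hypothetical.

```python
import re

# One token per match: "(", ")", a quoted string, or a bare word.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def tokenize(query):
    """Split a query into (kind, value) pairs."""
    tokens, pos, query = [], 0, query.strip()
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if not match:
            raise ValueError(f"cannot parse query at position {pos}")
        lparen, rparen, quoted, word = match.groups()
        if lparen:
            tokens.append(("(", None))
        elif rparen:
            tokens.append((")", None))
        elif quoted is not None:
            tokens.append(("TERM", quoted))  # quoted string: exact phrase
        elif word in ("AND", "OR"):
            tokens.append((word, None))
        else:
            tokens.append(("TERM", word))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a nested tuple tree; AND binds tighter than OR (assumption)."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        node = parse_and()
        while peek() == "OR":
            pos += 1
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        nonlocal pos
        node = parse_atom()
        while peek() == "AND":
            pos += 1
            node = ("AND", node, parse_atom())
        return node

    def parse_atom():
        nonlocal pos
        kind = peek()
        if kind == "TERM":
            value = tokens[pos][1]
            pos += 1
            return ("TERM", value)
        if kind == "(":
            pos += 1
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        raise ValueError("expected a search term or '('")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token after end of query")
    return tree


def matches(tree, text):
    """Evaluate a parsed query against one text field."""
    if tree[0] == "TERM":
        return tree[1].lower() in text.lower()
    left, right = matches(tree[1], text), matches(tree[2], text)
    return (left and right) if tree[0] == "AND" else (left or right)


def search(query, text):
    return matches(parse(tokenize(query)), text)
```

With these assumptions, `search('human AND ("alternative splicing" OR varsplic)', text)` requires the word "human" together with either the exact phrase or the keyword, while `search('mouse OR human AND kinase', text)` is read as `mouse OR (human AND kinase)`.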
Note that matches are marked blue, while mismatches are black. Short runs of matches (fewer than three amino acids) are marked red.

Warning: The formal clustering procedure used in version 2.1 of ASDB sometimes merges members of multigene families into a single cluster, so some clusters do not correspond to alternatively spliced genes. These situations can be recognized by numerous scattered mismatches. In bona fide alternative splicing, by contrast, the alignment shows longer mismatching segments and exact matches in the remaining regions.

Further development