Alternative Splicing DB (ASDB)



References to the Alternative Splicing Database:

ASDB: database of alternatively spliced genes

I. Dralyuk, M. Brudno, M. S. Gelfand, M. Zorn, and I. Dubchak (2000) Nucleic Acids Research, 28(1), 296-297.

M. S. Gelfand, I. Dubchak, I. Dralyuk and M. Zorn (1999) Nucleic Acids Research, 27(1), 301.


Alternative pre-mRNA splicing is an important mechanism for regulating gene expression in higher eukaryotes. By recent estimates, the primary transcripts of ~30% of human genes are subject to alternative splicing, often regulated in specific spatial or temporal patterns during normal development. Alternative splicing plays a major role in sex determination in Drosophila, antibody response in humans, and other tissue-specific or developmental-stage-specific processes (Chabot 1996; Breitbart et al. 1987; Smith et al. 1989). In complex genes, alternative splicing can generate dozens or even hundreds of different mRNA isoforms from a single transcript (Breitbart and Nadal-Ginard 1987; Missler and Sudhof 1998; Gascard et al. 1998). In many cases the alternatively spliced exon encodes a protein domain that is functionally important for catalytic activity or binding interactions, so the resulting proteins can exhibit different or even antagonistic activities.

The Alternative Splicing DB (ASDB) has been established to assemble, in a central, publicly accessible site, information about alternatively spliced genes, their products and expression patterns. The current ASDB format was established without explicit funding for this project and should be viewed as an early prototype rather than a completed project. When such funding is obtained, we plan to greatly expand and update the splicing information by curating data from the literature and from sequence databases, and to improve the tools so that users can search for all products of alternative splicing produced in a particular tissue or a given organism, or for all variants generated from a particular transcript. The database should thus be useful not only for molecular biologists studying splicing, but also for developmental biologists, geneticists, cell biologists and others.


Database content and composition

Version 2.1 of ASDB consists of two divisions: ASDB(proteins), which contains amino acid sequences, and ASDB(nucleotides), which contains genomic sequences.

SWISS-PROT uses two formats to describe alternative splicing. The protein sequences were therefore selected from SWISS-PROT by a full-text search for both the phrase "alternative splicing" (usually in the CC lines) and the word "varsplic" (in the FT lines). Some entries describe just one alternatively spliced variant; others indicate several. To group proteins that could arise by alternative splicing of the same gene, we developed a clustering procedure. Two proteins were linked if they had a common fragment of at least 20 amino acids, and clusters were initially defined as maximal connected groups of linked proteins. It turned out that some clusters were chimeric, in the sense that they contained members of multigene families rather than alternatively spliced variants of one gene. The multiple alignments were therefore subjected to additional analysis aimed at detecting chimeric clusters.
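The linking and clustering step described above can be sketched as follows. This is a minimal illustration, not the code used to build ASDB: it assumes that a "common fragment" means an identical substring, and the names shares_fragment and cluster_proteins are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

MIN_SHARED = 20  # minimum length of a common fragment, as stated in the text


def shares_fragment(a: str, b: str, k: int = MIN_SHARED) -> bool:
    """Return True if a and b contain an identical substring of length k.

    Assumption: "common fragment" is read as an exact shared substring.
    """
    if len(a) < k or len(b) < k:
        return False
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def cluster_proteins(seqs: dict[str, str], k: int = MIN_SHARED) -> list[set[str]]:
    """Return the connected components of the 'shares a fragment' graph."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of proteins that shares a fragment.
    for a, b in combinations(seqs, 2):
        if shares_fragment(seqs[a], seqs[b], k):
            parent[find(a)] = find(b)

    # Collect the members of each component.
    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

The pairwise comparison is quadratic in the number of proteins, which keeps the sketch short; an index from fragments to proteins would scale better on a full SWISS-PROT selection.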

Each cluster is represented by a multiple alignment of its members constructed using CLUSTALW. The distribution of cluster sizes, the representation of species and other relevant statistics of ASDB(proteins) can be accessed through the links below.

This processing covers the cases in which alternatively spliced variants are described in separate SWISS-PROT entries. The other kind of ASDB record, originating from SWISS-PROT entries with the "varsplic" field in the feature table, usually describes proteins that are not part of any cluster. In these cases, the information on the variable fragments of the several proteins that result from alternative splicing of a single gene is contained in the entry itself. ASDB(proteins) entries are marked with different symbols so that the three types can be told apart easily: proteins that are part of ASDB clusters and the corresponding multiple alignments, proteins whose associated SWISS-PROT entries carry the information on the different variants, and proteins for which information on the variants is not available at present. ASDB contains internal links between entries and/or clusters, as well as external links to Medline, GenBank and SWISS-PROT entries.

The ASDB(nucleotides) division was generated by collecting all GenBank entries containing the words "alternative splicing" and then selecting those entries that contain complete gene sequences (all CDS fields are complete, i.e. they do not have continuation signs).
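The selection rule for ASDB(nucleotides) can be sketched as a filter over GenBank flat-file records. This is a rough sketch under stated assumptions: "continuation signs" is taken to mean the "<" and ">" partial-location markers of the GenBank feature table, the text search is case-insensitive, and the function names are hypothetical.

```python
from __future__ import annotations


def cds_locations(record: str) -> list[str]:
    """Extract the location string of every CDS feature in a flat-file record.

    Feature keys start in column 6 and locations in column 22; a long
    location may continue on following lines until the first qualifier.
    """
    locations: list[str] = []
    in_location = False
    for line in record.splitlines():
        if line.startswith("     CDS "):
            locations.append(line[21:].strip())
            in_location = True
        elif in_location and line.startswith(" " * 21) and not line[21:].startswith("/"):
            locations[-1] += line[21:].strip()  # continued location line
        else:
            in_location = False
    return locations


def has_complete_cds(record: str) -> bool:
    """True if the record has at least one CDS and none is marked partial.

    Assumption: '<' or '>' in a location is the "continuation sign".
    """
    locations = cds_locations(record)
    return bool(locations) and all(
        "<" not in loc and ">" not in loc for loc in locations
    )


def qualifies(record: str) -> bool:
    """Apply both criteria from the text to one GenBank record."""
    return "alternative splicing" in record.lower() and has_complete_cds(record)
```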

Some database statistics     Figure 1    Figure 2


How to use the database

You can search the database by Medline unique identifiers, GenBank accession numbers, SWISS-PROT identifiers and accession numbers, species names (including higher-order taxonomic groups), description, keywords and comments. The search page contains two separate search boxes, one for each division, and provides interactive help, available by checking the "Show Help" checkbox. This opens a help window, which updates automatically when a different search field is chosen. The help window explains the selected field and gives example queries for it.

The search string supports Boolean logic, parenthesized expressions for greater flexibility, and quoted strings for exact matches. The standard logical operators (AND, OR) are supported and can be parenthesized to indicate precedence. The following are two examples of valid searches:

An in-depth description of the syntax is available from the interactive help window by selecting "All Fields".
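To illustrate how a query of this form can be evaluated, here is a small sketch of a recursive-descent evaluator. It is not the ASDB search engine. Several points are assumptions made for the example: AND binds more tightly than OR when no parentheses are given, matching is case-insensitive, an unquoted word matches as a substring, and a quoted string must occur verbatim as a phrase. The sample queries and record text are hypothetical.

```python
from __future__ import annotations

import re

# A quoted phrase, a parenthesis, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')


def tokenize(query: str) -> list[tuple[str, str]]:
    tokens = []
    for phrase, paren, word in TOKEN.findall(query):
        if paren:
            tokens.append(("PAREN", paren))
        elif word and word.upper() in ("AND", "OR"):
            tokens.append(("OP", word.upper()))
        elif word:
            tokens.append(("TERM", word))
        else:
            tokens.append(("TERM", phrase))  # quoted phrase, kept whole
    return tokens


def matches(query: str, text: str) -> bool:
    """Evaluate a Boolean query against one record's text."""
    tokens = tokenize(query)
    haystack = text.lower()
    pos = 0

    def peek() -> tuple:
        return tokens[pos] if pos < len(tokens) else (None, None)

    def factor() -> bool:
        nonlocal pos
        kind, value = peek()
        pos += 1
        if kind == "TERM":
            return value.lower() in haystack
        if (kind, value) == ("PAREN", "("):
            result = expr()
            if peek() != ("PAREN", ")"):
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        raise ValueError(f"unexpected token: {value!r}")

    def term() -> bool:
        nonlocal pos
        result = factor()
        while peek() == ("OP", "AND"):
            pos += 1
            rhs = factor()  # parse before combining, so nothing is skipped
            result = result and rhs
        return result

    def expr() -> bool:
        nonlocal pos
        result = term()
        while peek() == ("OP", "OR"):
            pos += 1
            rhs = term()
            result = result or rhs
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

With a hypothetical record text such as "Troponin T, alternative splicing in Homo sapiens muscle", the query `"alternative splicing" AND (human OR sapiens)` matches under these assumptions, while `troponin AND drosophila` does not.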

Search results page:

  • ASDB(proteins):
    Note that those entries that are in a cluster are marked with a rhomboid, while those with a VARSPLIC field in the feature table are marked with a circle. Those that do not belong to either of the above groups are unmarked.

    Each entry starts with the name of the protein. If the protein is a member of a cluster, a link to the cluster is given; if it is not, this is stated explicitly. The entry also contains a link to the corresponding SWISS-PROT entry, as well as links to relevant GenBank and Medline entries, if any.

  • ASDB(nucleotides):
    The search results are a list of relevant GenBank links.

Cluster page:

Note that the matches are marked blue while the mismatches are black. Short runs of matches (fewer than three amino acids) are marked red.
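The colour scheme can be expressed as a small classification over alignment columns. The sketch below is an illustration only. It assumes that a column counts as a match when every aligned sequence carries the same residue and no gap; the page does not state how matches are defined for clusters with more than two members. The function name color_columns is hypothetical.

```python
from __future__ import annotations

from itertools import groupby


def color_columns(aligned: list[str], min_run: int = 3) -> list[str]:
    """Assign 'blue', 'red' or 'black' to each column of an alignment.

    aligned: equal-length rows of a multiple alignment, '-' for gaps.
    min_run: match runs shorter than this are coloured red.
    """
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # Assumption: a match is a gap-free column with one distinct residue.
    is_match = [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]

    colors: list[str] = []
    for matched, run in groupby(is_match):
        length = len(list(run))
        if not matched:
            colors += ["black"] * length
        elif length < min_run:
            colors += ["red"] * length
        else:
            colors += ["blue"] * length
    return colors
```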

Warning:

The formal clustering procedure used in version 2.1 of ASDB sometimes merges members of multigene families into clusters. Thus some clusters do not correspond to alternatively spliced genes. These situations can be diagnosed by numerous scattered mismatches (as opposed to longer mismatching segments and exact matches in the remaining regions, as seen in bona fide alternative splicing).
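The diagnostic described in this warning can be approximated by looking at the lengths of mismatching runs in a cluster alignment. The following is a heuristic sketch, not the additional analysis actually applied to ASDB clusters. The thresholds short_run and max_short_runs are hypothetical values chosen for illustration and would need tuning on real alignments.

```python
from __future__ import annotations

from itertools import groupby


def mismatch_runs(aligned: list[str]) -> list[int]:
    """Lengths of consecutive mismatching columns in an alignment.

    A column mismatches when its rows are not all identical, so a
    skipped exon (residues against gaps) forms one long run.
    """
    is_match = [len(set(col)) == 1 for col in zip(*aligned)]
    return [len(list(run)) for matched, run in groupby(is_match) if not matched]


def looks_chimeric(
    aligned: list[str], short_run: int = 3, max_short_runs: int = 5
) -> bool:
    """Flag alignments with many short, scattered mismatch runs.

    Both thresholds are assumptions made for this sketch.
    """
    short = [n for n in mismatch_runs(aligned) if n <= short_run]
    return len(short) > max_short_runs
```

An alignment of two isoforms that differ by one skipped exon yields a single long run and is not flagged, whereas two paralogs with substitutions spread along their length yield many short runs.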


Further development

  • Classification of basic types of alternative splicing
  • Incorporation of data about aberrant splicing and splicing mutations
  • Manual curation of the database


Copyright © 1998, 1999 Lawrence Berkeley National Laboratory

Authors:  Igor Dralyuk, Michael Brudno, Inna Dubchak, Mikhail Gelfand.


