SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms.
A domain is the smallest unit of evolution; a large protein can be split into smaller domains.
Domains can occur by themselves or in combination with other domains. A superfamily groups together domains of different
families which have a common evolutionary ancestor based on structural, functional and evolutionary data.
The SUPERFAMILY web site and database provide protein domain assignments, at the SCOP
'superfamily' and 'family' levels, for the predicted protein sequences in over 900 organisms
(plus sequence collections such as UniProt).
Please contact us if you think we have missed any organisms. SUPERFAMILY domain assignments
are generated using an expert-curated set of profile hidden Markov models. All models and structural assignments are available for
browsing and download. Sophisticated tools are provided for the analysis of superfamily (and family) domain assignments.
SUPERFAMILY is a member of the InterPro consortium of protein annotation databases, and has been integrated into the Ensembl eukaryotic genome project and The Arabidopsis Information Resource. To date, the SUPERFAMILY publications have been cited over 400 times. SUPERFAMILY has been used in structural, functional, evolutionary and phylogenetic research projects.
Purpose
The purpose of this server is to provide structural (and hence implied functional) assignments to protein
sequences primarily at the SCOP superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary
ancestor. What this service offers is sophisticated and expertly chosen remote homology detection. What it
does not offer is an improvement in speed, or the assignment of superfamilies that have no known structure.
There is a facility to compute assignments for your own DNA or protein sequences, and there is access to genome
assignments and to multiple sequence alignments of SCOP superfamilies. If you
have an interest in running large numbers of sequences, then please don't hesitate to
contact us via superfamily@mrc-lmb.cam.ac.uk.
The web site includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches.
Sequence Search Description
The sequence search method uses a library (covering all proteins of known structure) consisting of
1539 SCOP superfamilies from classes a to g. Each superfamily is represented by a
group of hidden Markov models. Your query sequences will be assigned
e-value scores for all models, and the significant ones will be returned. Each sequence may well hit a
superfamily more than once, as there are several overlapping models for each superfamily; however, it is the
hit to the superfamily which is meaningful. Each model is created from a seed sequence which is aligned to
many superfamily homologues. The model is built from the alignment (please see the SAM website for a detailed explanation).
A hit to a model is not a hit to the seed but is a hit to the superfamily which the model
represents. You may view the sequences aligned to the models, which represent a
view of the superfamily, although that view may be biased towards the seed. You may also see the genome
assignments for each superfamily or view alignments of the genome
sequences.
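Because several overlapping models can hit the same superfamily, the per-model hits are usually read at the superfamily level. The following is a minimal sketch of that reduction, not the server's own code: the tuple layout `(model_id, superfamily_id, evalue)`, the identifiers in the example and the cutoff of `1e-4` are all assumptions made for illustration.

```python
def best_hit_per_superfamily(hits, evalue_cutoff=1e-4):
    """Collapse per-model hits into one best hit per superfamily.

    hits: iterable of (model_id, superfamily_id, evalue) tuples
          (hypothetical layout, not the server's output format).
    evalue_cutoff: assumed significance threshold; hits with a larger
          e-value are discarded.
    Returns a dict mapping superfamily_id -> (model_id, evalue).
    """
    best = {}
    for model_id, superfamily_id, evalue in hits:
        if evalue > evalue_cutoff:
            continue  # not significant under the assumed cutoff
        current = best.get(superfamily_id)
        # Keep the model with the lowest e-value for this superfamily.
        if current is None or evalue < current[1]:
            best[superfamily_id] = (model_id, evalue)
    return best


# Hypothetical example: two models of one superfamily hit the query,
# and a third model from another superfamily scores above the cutoff.
example_hits = [
    ("model_A1", "sf_1", 3e-12),
    ("model_A2", "sf_1", 5e-20),
    ("model_B1", "sf_2", 0.3),
]
summary = best_hit_per_superfamily(example_hits)
```

Here `summary` holds a single entry for `sf_1`, taken from the better-scoring model; which model produced the hit matters less than the superfamily it represents.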
The SUPERFAMILY server is based upon release 1.69 of the
SCOP structural classification of
proteins, the corresponding sequences from ASTRAL, and the SAM hidden Markov model software.
Comparative Genomics Tools
The SUPERFAMILY web site provides a number of comparative genomics tools for the analysis of
superfamily, and family, domains from across the tree of life. These tools include: lists of unusual
(over- and under-represented) superfamilies and families, adjacent domain pair lists and graphs, unique domain pairs,
domain combinations, domain architecture co-occurrence networks and domain distribution across taxonomic kingdoms for
each organism. A detailed description of what these tools can do, and how to use them, can be found on the
comparative genomics page.
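As an illustration of one of these tools, adjacent domain pairs can be counted from a list of domain architectures. This is only a sketch under stated assumptions: each architecture is taken to be an ordered list of superfamily identifiers along the protein (N- to C-terminus), the identifiers are made up, and the site's own handling of gaps and unassigned regions is not reproduced.

```python
from collections import Counter


def adjacent_domain_pairs(architectures):
    """Count ordered pairs of neighbouring domains.

    architectures: iterable of lists of superfamily identifiers, one
                   list per protein, in sequence order (assumed input
                   format, not the database's export format).
    Returns a Counter mapping (left_domain, right_domain) -> count.
    """
    counts = Counter()
    for architecture in architectures:
        # zip pairs each domain with its immediate successor.
        for left, right in zip(architecture, architecture[1:]):
            counts[(left, right)] += 1
    return counts


# Hypothetical architectures for three proteins.
proteins = [
    ["sf_1", "sf_2", "sf_3"],
    ["sf_1", "sf_2"],
    ["sf_4"],  # single-domain protein contributes no pairs
]
pair_counts = adjacent_domain_pairs(proteins)
```

In this example the pair `("sf_1", "sf_2")` is counted twice and `("sf_2", "sf_3")` once. Pairs are kept ordered, so `("sf_2", "sf_1")` would be counted separately.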
Downloads
Downloads are instantly available upon application for a free license. The model library, genome assignments and some software are available. Genome assignments are updated weekly. There is
a low traffic announcement mailing list for notification of updates/changes.
Citation
Groups using results derived from this project for publication are asked to cite:
A detailed list of the SUPERFAMILY publications can be found here.