Download SUPERFAMILY Models, Database Dump and Genome Assignments
Access to downloads from the SUPERFAMILY ftp server requires a
license agreement, which is free for
academic and commercial use.
Filling in a short registration form
gives immediate access to the downloads via ftp. To access the downloads, follow
the instructions provided during registration; the files will then appear on the private
ftp site as described below.
The SUPERFAMILY package does not include all of the software required to use the models.
This must be obtained from elsewhere; users are advised to consider
SAM (recommended)
and HMMER.
It is also possible to use PSI-BLAST, but this does not work with any of the parsing
scripts and is only advised if you really know what you're doing.
The file and directory paths given below refer to the ftp site.
- Models
The hidden Markov models are available for download from the models directory in SAM
(/models/1.73/sam_1.73.tar.gz), HMMER (/models/1.73/hmmer_1.73.tar.gz) and, now,
PSI-BLAST (/models/1.73/psimodlib_1.73.tar.gz) format.
Either the /models/1.73/models.tab file or the SQL database is needed in addition to the
model library. The model library will be updated with every release of
SCOP. It is also strongly advised to get the SCOP
files (cla, des, hie). The family level classification
requires the self hits file (/models/1.73self_hits.tab.gz) in addition to the
above-mentioned SCOP parseable files. There is a description of
how to download, set up and run the models.
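As a rough sketch of fetching the files listed above, the following Python uses the standard-library `ftplib` module. The paths are copied from the text; the helper names `model_files` and `fetch` are invented for this example, and the host name, user name and password are placeholders for whatever details are supplied at registration.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath

RELEASE = "1.73"

# Library archive for each search tool, as named in the text above.
LIBRARIES = {
    "sam": "sam_{r}.tar.gz",
    "hmmer": "hmmer_{r}.tar.gz",
    "psiblast": "psimodlib_{r}.tar.gz",
}


def model_files(fmt, release=RELEASE):
    """Return the remote paths of one model library plus models.tab."""
    if fmt not in LIBRARIES:
        raise ValueError(f"unknown format: {fmt!r}")
    base = f"/models/{release}"
    return [f"{base}/{LIBRARIES[fmt].format(r=release)}", f"{base}/models.tab"]


def fetch(host, user, password, remote_paths, dest="."):
    """Download each remote path into dest over FTP in binary mode.

    host, user and password are placeholders (assumptions); use the
    details given during registration.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login(user, password)
        for remote in remote_paths:
            local = dest_dir / PurePosixPath(remote).name
            with open(local, "wb") as out:
                ftp.retrbinary(f"RETR {remote}", out.write)
```

A call such as `fetch("ftp.example.org", "user", "secret", model_files("hmmer"))` would then retrieve the HMMER library and models.tab, with all three connection values replaced by the real ones.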
- Scripts
The /scripts/superfamily.pl file is a wrapper script that helps you run
SUPERFAMILY in either HMMER or SAM mode. It should be very useful for getting things
running.
There is also an important script which is very strongly recommended for
parsing the scores and alignments generated by a complete SUPERFAMILY model library search.
It is /scripts/assignment.pl and is designed to be run on a directory containing score
files for each model. The output is an '.ass' file which should contain a list of
non-conflicting domain assignments. Currently it only supports directories with files
containing alignments. The family level classification
requires the /scripts/familyassignment.pl script in place of the standard assignment.pl
script. Read the documentation for more information and
help on using the scripts with the hidden Markov models.
There are also other useful scripts here.
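To illustrate what a list of non-conflicting domain assignments means, here is a minimal Python sketch. It is not a description of how assignment.pl works: the `Hit` record, the rule that any overlap counts as a conflict, and the greedy choice by E-value are all assumptions made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One model match on a sequence (hypothetical record layout)."""
    model: str
    start: int      # first residue covered, inclusive
    end: int        # last residue covered, inclusive
    evalue: float


def overlaps(a, b):
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def non_conflicting(hits):
    """Greedily keep the lowest E-value hits that do not overlap each other."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the surviving hits in sequence order.
    return sorted(kept, key=lambda h: h.start)
```

Given three hits where a weaker one overlaps two stronger ones, only the two stronger hits are kept. A real parser may tolerate small overlaps or use a different scoring rule.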
- Genome assignments
The genome assignments are contained in the SQL database, but may
also be accessed in flat file format in the /genomes directory. The file
/genomes/ass_date.tab contains the genome assignments for all genomes, and the file
/genomes/genome.tab contains information on the genomes. This directory may not be as up
to date as the web site; if you would like some data which is on the web site but not yet
in the ftp directory, then e-mail
superfamily@mrc-lmb.cam.ac.uk.
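The flat files can be read with any tab-delimited parser. The sketch below uses Python's standard `csv` module; since the column layout is not described here, it makes no assumption about what each field means and only offers a helper for checking how many fields the lines contain. The names `read_tab` and `field_counts` are invented for this example.

```python
import csv
import io
from collections import Counter


def read_tab(handle):
    """Yield each non-empty line of a tab-delimited file as a list of fields."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


def field_counts(rows):
    """Count how many rows have each number of fields, to inspect the layout."""
    return Counter(len(row) for row in rows)


# Made-up sample data, only to show the calls; real files come from /genomes.
sample = io.StringIO("a\tb\tc\n\nd\te\tf\n")
rows = list(read_tab(sample))
```

For a downloaded file, open it with `open(path, newline="")` and pass the handle to `read_tab` in the same way.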
- SUPERFAMILY MySQL database
There is a MySQL dump of the SUPERFAMILY database in the
/sql/supfam_date.sql.gz file, where the appropriate date must be substituted for 'date'. This contains
all of the genome assignments and associated information. It is recommended that this be
downloaded and installed if the genome assignments are to be used. There is a description of
how to download, install and query the database.
Each of the database tables is described in turn
here, and an entity relationship model
diagram is included.
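Before the dump can be loaded it has to be decompressed. A minimal sketch using Python's standard `gzip` and `shutil` modules follows; the function name `decompress_dump` is invented for this example, and the file name passed in is whatever dated dump was actually downloaded.

```python
import gzip
import shutil
from pathlib import Path


def decompress_dump(gz_path, sql_path=None):
    """Decompress a gzipped SQL dump and return the path of the .sql file.

    If sql_path is not given, the trailing '.gz' is dropped from gz_path.
    """
    gz_path = Path(gz_path)
    sql_path = Path(sql_path) if sql_path else gz_path.with_suffix("")
    # Stream the data so that a large dump is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(sql_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sql_path
```

The resulting .sql file can then be fed to the `mysql` command-line client by redirecting it to standard input, for example `mysql superfamily < supfam_date.sql`, where the database name `superfamily` is an assumption and the database must already exist.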
- Seed sequences of SUPERFAMILY models
The SCOP domains used as seed sequences to build the SUPERFAMILY
models are available for download. They have been filtered to different levels of
redundancy, and can be found in the /sequences directory.
The SCOP parseable files
(cla, des, hie) are necessary to make sense of the SCOP identifiers used by the SUPERFAMILY
database. A detailed description may be found here.
The genome assignments and MySQL database dump are updated weekly.