Download SUPERFAMILY Models, Database Dump and Genome Assignments
N.B. It is no longer necessary to install SUPERFAMILY locally. Instead you may use our EC2 AWS cloud image.
To gain access to downloads from the SUPERFAMILY ftp server a
license agreement must be obtained, which is free for
academic and commercial use.
Filling in the short registration form
gives immediate access to the downloads via ftp. Follow
the instructions provided during registration, and the files will appear on the private
ftp site as described below.
The SUPERFAMILY package does not include all of the software required to use the models.
This must be obtained from elsewhere; users are advised to consider
SAM
and HMMER3 (strongly recommended).
It is also possible to use PSI-BLAST, but this does not work with any of the parsing
scripts and is only advised if you really know what you're doing (N.B. the parsing scripts no longer parse SAM output either).
Files and directories refer to the ftp site.
- Models
The hidden Markov models are available for download in HMMER3 format (/models/hmmlib_1.75.gz) in the
models directory.
Either the /models/model.tab file or the SQL database is needed in addition to the
model library. The model library will be updated with every release of
SCOP. It is also strongly
advised to get the
SCOP
files (cla, des, hie). The family level classification
requires the self hits file (/models/self_hits.tab.gz) in addition to the
above-mentioned SCOP parseable files. There is a description of
how to download, set up and run the models.
- Scripts
There is no longer a wrapper script for running the HMMs; simply use the hmmscan program from HMMER3.
One script is very strongly recommended for
parsing the scores and alignments generated by a complete SUPERFAMILY model library search.
It is in /scripts/ass3.pl and is designed to be run on the output from hmmscan (HMMER3).
The output is an '.ass' file which should contain a list of
non-conflicting domain assignments. N.B. to run small numbers of sequences turn file checking off, because one of the checks is for a minimum number of sequences in the FASTA file.
In addition you may run sequences through the script /scripts/superfamily.pl. The superfamily.pl script is a wrapper that calls several other programs to format the sequences, run HMMER3, parse the results and create an HTML-formatted version of the output (using /scripts/ass_to_html.pl). See the instructions on how to use the script on the Amazon cloud for a detailed description of the scripts and the output files.
There are also other useful scripts here.
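As a rough sketch of the HMMER3 step described above, the Python helpers below decompress the model library and assemble the hmmpress and hmmscan command lines without running them. The file names seqs.fa and seqs.res, the E-value threshold and the CPU count are illustrative assumptions, not values prescribed by SUPERFAMILY. The options accepted by ass3.pl are not shown here; consult the script and its instructions for them.

```python
import gzip
import shutil
from typing import List


def decompress_library(gz_path: str, out_path: str) -> str:
    """Gunzip the downloaded model library so HMMER3 can read it."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def hmmpress_command(hmmlib: str) -> List[str]:
    """hmmscan needs a pressed (indexed) library; hmmpress creates it."""
    return ["hmmpress", hmmlib]


def hmmscan_command(hmmlib: str, fasta: str, out_file: str,
                    evalue: float = 10.0, cpus: int = 1) -> List[str]:
    """Build an hmmscan call that writes its main output to out_file.

    The default evalue and cpus are assumptions chosen for illustration.
    """
    return ["hmmscan", "-o", out_file, "-E", str(evalue),
            "--cpu", str(cpus), hmmlib, fasta]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # even where HMMER3 is not installed.
    print(" ".join(hmmpress_command("hmmlib_1.75")))
    print(" ".join(hmmscan_command("hmmlib_1.75", "seqs.fa", "seqs.res")))
```

Once HMMER3 is installed, each list can be passed to subprocess.run(cmd, check=True); the resulting output file is what a parser such as ass3.pl would then be pointed at.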
- Genome assignments
The genome assignments are contained in the SQL database, but may
also be accessed in the /genomes directory in flat file format. The file
(/genomes/ass_date.tab) contains the genome assignments for all genomes. The file
/genomes/genome.tab contains information on the genomes. This directory may not be as up
to date as the web site. If you would like some data which is on the web site but not yet
in the ftp directory, e-mail
superfamily@cs.bris.ac.uk.
- SUPERFAMILY MySQL database
There is a MySQL dump of the SUPERFAMILY database (in the
/sql/supfam_date.sql.gz file, where the appropriate date must be substituted). This contains
all of the genome assignments and associated information. It is recommended that this be
downloaded and installed if the genome assignments are to be used. There is a description of
how to download, install and query the database.
Each of the database tables is described in turn
here, and an entity-relationship model
diagram is included.
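One possible way to install the gzipped dump is to stream it straight into the mysql command-line client, sketched below in Python. The database name "superfamily" and the user name are hypothetical, the target database is assumed to exist already, and the dump file name must carry the actual date in place of "date". The official installation description remains the authority on the exact procedure.

```python
import gzip
import shutil
import subprocess
from typing import List


def mysql_command(database: str, user: str) -> List[str]:
    """Build a mysql client call for the given (hypothetical) database.

    --password without a value makes the client prompt for the password.
    """
    return ["mysql", "--user=" + user, "--password", database]


def load_dump(dump_gz: str, command: List[str]) -> int:
    """Decompress dump_gz on the fly and feed it to command's stdin.

    Returns the exit status of the client process.
    """
    with gzip.open(dump_gz, "rb") as dump:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE)
        shutil.copyfileobj(dump, proc.stdin)
        proc.stdin.close()
        return proc.wait()


if __name__ == "__main__":
    # Only print the command here; calling load_dump needs a running
    # MySQL server and a downloaded dump file.
    print(" ".join(mysql_command("superfamily", "some_user")))
```

Streaming avoids writing the uncompressed SQL file to disk, which can matter for a dump that holds all of the genome assignments.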
- Seed sequences of SUPERFAMILY models
The SCOP domains used as seed sequences to build the SUPERFAMILY
models are available for download. They have been filtered to different levels of
redundancy, and can be found in the /sequences directory.
- EC2 AWS cloud image
The SUPERFAMILY pipeline for analysing a FASTA format file of protein sequences is available pre-installed on an image suitable for the Amazon EC2 AWS cloud computing facility. All you need is an account with Amazon, and you can upload your sequences and run our pipeline directly using one command. This is significantly easier than downloading and installing the SUPERFAMILY package. After registration you will be given access to the image, and there are also instructions on how to use the image on the Amazon cloud.
The
SCOP parsable files
(cla, des, hie) are necessary to make sense of the SCOP identifiers used by the SUPERFAMILY
database. A detailed description may be found
here.
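To illustrate how the cla file links SCOP identifiers to the classification, here is a minimal Python sketch of a parser. It assumes the usual layout of the SCOP parseable cla file: comment lines starting with "#", then six tab-separated columns (domain sid, PDB code, region, sccs, sunid, and a comma-separated list of level=sunid pairs). The record type and function names are made up for this example, and the sample line is illustrative.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class ClaRecord(NamedTuple):
    sid: str                   # domain identifier, e.g. d1dlwa_
    pdb_id: str                # PDB entry code
    region: str                # chain and residue range
    sccs: str                  # class.fold.superfamily.family string
    sunid: int                 # sunid of the domain itself
    hierarchy: Dict[str, int]  # level code (cl, cf, sf, fa, ...) to sunid


def parse_cla(lines: Iterable[str]) -> Iterator[ClaRecord]:
    """Yield one record per data line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        sid, pdb_id, region, sccs, sunid, levels = line.split("\t")
        hierarchy = {}
        for pair in levels.split(","):
            level, value = pair.split("=")
            hierarchy[level] = int(value)
        yield ClaRecord(sid, pdb_id, region, sccs, int(sunid), hierarchy)


if __name__ == "__main__":
    sample = [
        "# illustrative comment line",
        "d1dlwa_\t1dlw\tA:\ta.1.1.1\t14982\t"
        "cl=46456,cf=46457,sf=46458,fa=46459,dp=46460,sp=46461,px=14982",
    ]
    for record in parse_cla(sample):
        # The 'sf' entry is the superfamily-level sunid.
        print(record.sid, record.sccs, record.hierarchy["sf"])
```

A dictionary built from these records, keyed on the superfamily sunid, is one way to attach sccs strings to identifiers found in the assignment files.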
The genome assignments and MySQL database dump are updated weekly.