All file and directory paths below are relative to the ftp site.
- Models
The hidden Markov models are available for download from the models directory in SAM (/models/sam_1.69.tar.gz), HMMER (/models/hmmer_1.69.tar.gz) and, now, PSI-BLAST (/models/psimodlib_1.69.tar.gz) format. In addition to the model library, you need either the /models/models.tab file or the SQL database. The model library is updated with every release of SCOP. It is also strongly advised to get the SCOP parseable files (cla, des, hie). The new family level classification requires the self hits file (/models/self_hits.tab.gz) in addition to these SCOP files. There is a page describing how to download, set up and run the models.
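The file requirements described above can be summarised in a short sketch. It only restates the paths given in this section; the function name required_files, the format keys and the choice of models.tab over the SQL database are assumptions made for illustration.

```python
from typing import List

RELEASE = "1.69"  # release number taken from the archive names in this section

# Model library archive for each format (paths relative to the ftp site).
MODEL_ARCHIVES = {
    "sam": f"/models/sam_{RELEASE}.tar.gz",
    "hmmer": f"/models/hmmer_{RELEASE}.tar.gz",
    "psiblast": f"/models/psimodlib_{RELEASE}.tar.gz",
}


def required_files(fmt: str, family_level: bool = False) -> List[str]:
    """Return the ftp paths needed for one model format.

    Assumption: models.tab is used rather than the SQL database.
    The SCOP files (cla, des, hie) are obtained separately and are not listed.
    """
    if fmt not in MODEL_ARCHIVES:
        raise ValueError(f"unknown format: {fmt!r}")
    files = [MODEL_ARCHIVES[fmt], "/models/models.tab"]
    if family_level:
        # Family level classification also needs the self hits file.
        files.append("/models/self_hits.tab.gz")
    return files


if __name__ == "__main__":
    for path in required_files("hmmer", family_level=True):
        print(path)
```

The release number is kept in one constant so the list can be regenerated when the model library is updated.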
- Scripts
The wrapper script /scripts/superfamily.pl helps you run SUPERFAMILY in either HMMER or SAM mode. It is a good starting point for getting things running.
The script /scripts/assignment.pl is very strongly recommended for parsing the scores and alignments generated by a complete SUPERFAMILY model library search. It is designed to be run on a directory containing score files for each model. The output is a '.ass' file containing a list of non-conflicting domain assignments. Currently it only supports directories whose files contain alignments. The new family level classification requires the /scripts/familyassignment.pl script in place of the standard assignment.pl script. Read the documentation for more information and help on using this script with the hidden Markov models.
There are also other useful scripts here.
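To illustrate what "non-conflicting domain assignments" means, here is a minimal sketch of a simple greedy selection: hits are taken in order of increasing E-value, and any hit that overlaps an already accepted region is dropped. This is not the method implemented in assignment.pl, which may use different criteria; the Hit fields and function names are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    model: str     # model identifier (hypothetical field)
    start: int     # first residue of the matched region, inclusive
    end: int       # last residue of the matched region, inclusive
    evalue: float  # lower is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two regions share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def non_conflicting(hits: List[Hit]) -> List[Hit]:
    """Greedy selection: best E-value first, skip hits that overlap a kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the accepted hits in sequence order.
    return sorted(kept, key=lambda h: h.start)


if __name__ == "__main__":
    example = [
        Hit("m1", 1, 100, 1e-20),
        Hit("m2", 50, 150, 1e-5),    # overlaps m1 and scores worse
        Hit("m3", 160, 250, 1e-10),
    ]
    for hit in non_conflicting(example):
        print(hit.model, hit.start, hit.end, hit.evalue)
```

The example values are made up. A real pipeline should rely on the '.ass' file produced by the supplied script.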
- Genome assignments
The genome assignments are contained in the SQL database, but may also be accessed in the /genomes directory in flat file format. The file /genomes/ass_date.tab contains the genome assignments for all genomes, and the file /genomes/genome.tab contains information on the genomes. This directory may not be as up to date as the web site. If you would like data that is on the web site but not yet in the ftp directory, e-mail superfamily@mrc-lmb.cam.ac.uk.
- SUPERFAMILY MySQL database
A MySQL dump of the SUPERFAMILY database is available in the /sql/supfam_date.sql.gz file, where "date" must be replaced by the appropriate date. This contains all of the genome assignments and associated information. It is recommended that this be downloaded and installed if the genome assignments are to be used. There is documentation describing how to download, install and query the database. Each of the database tables is described in turn here, and an entity relationship model diagram is included.
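As a rough sketch of handling the dump once downloaded, the helpers below build the file path for a given date and decompress the gzipped file. The date format used in the real file names is not stated in this section, so the date string must be copied from the ftp listing; the function names are illustrative.

```python
import gzip
import shutil
from pathlib import Path


def dump_path(date: str) -> str:
    """Build the ftp path of the dump for a given date string.

    The date must be supplied exactly as it appears on the ftp site;
    its format is not assumed here.
    """
    return f"/sql/supfam_{date}.sql.gz"


def gunzip(src: Path, dest: Path) -> Path:
    """Decompress a .gz file to dest without reading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

The resulting .sql file can then be loaded with the standard mysql command-line client, for example by redirecting the file into a database you have created for it. Follow the linked documentation for the recommended installation steps.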
- Seed sequences of SUPERFAMILY models
The SCOP domains used as seed sequences to build the SUPERFAMILY models are available for download. They have been filtered to different levels of redundancy, and can be found in the /sequences directory.
- SCOP files
The SCOP parseable files (cla, des, hie) are necessary to make sense of the SCOP identifiers used by the SUPERFAMILY database. A detailed description may be found here.
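A minimal sketch of reading the cla file follows. It assumes the layout of SCOP's classification file: comment lines starting with '#', then six tab-separated columns (domain identifier, PDB code, region, sccs string, sunid, and a comma-separated list of level=sunid pairs). Check the detailed description against your SCOP release before relying on it; the record and function names are illustrative.

```python
from typing import Dict, Iterator, NamedTuple


class ClaRecord(NamedTuple):
    sid: str                   # SCOP domain identifier
    pdb_id: str                # PDB entry code
    region: str                # chain and residue range
    sccs: str                  # class.fold.superfamily.family string
    sunid: int                 # integer identifier of this domain
    hierarchy: Dict[str, int]  # level code (cl, cf, sf, fa, ...) -> sunid


def parse_cla_line(line: str) -> ClaRecord:
    """Parse one data line, assuming six tab-separated columns."""
    sid, pdb_id, region, sccs, sunid, levels = line.rstrip("\n").split("\t")
    hierarchy: Dict[str, int] = {}
    for item in levels.split(","):
        key, value = item.split("=")
        hierarchy[key] = int(value)
    return ClaRecord(sid, pdb_id, region, sccs, int(sunid), hierarchy)


def read_cla(path: str) -> Iterator[ClaRecord]:
    """Yield records from a cla file, skipping comments and blank lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_cla_line(line)


if __name__ == "__main__":
    # Made-up line in the assumed layout, not real SCOP data.
    sample = "d0xxxa_\t0xxx\tA:\ta.1.1.1\t100\tcl=1,cf=2,sf=3,fa=4,dm=5,sp=6,px=100"
    record = parse_cla_line(sample)
    print(record.sid, record.sccs, record.hierarchy["sf"])
```

Under this layout, the sf entry of the hierarchy gives the superfamily-level identifier and the fa entry the family-level one.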
The genome assignments and MySQL database dump are updated weekly.