SUPERFAMILY 1.75 HMM library and genome assignments server


Generate SCOP Domain Assignments using the SUPERFAMILY Models

This page describes how to produce SCOP protein domain assignments using the SUPERFAMILY hidden Markov models (HMMs) and associated scripts.

Introduction

The process involves running a set of FASTA-formatted protein sequences against the models using the provided scripts. The result is a set of SCOP superfamily- and family-level domain assignments.
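Before running the models it can help to confirm that the input really is FASTA. This is a minimal sketch, assuming a plain (uncompressed) FASTA file in which every record begins with a '>' header line; `check_fasta` is a hypothetical helper, not one of the SUPERFAMILY scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SUPERFAMILY distribution): report the
# number of sequences in a FASTA file, or fail if it does not look like FASTA.
check_fasta() {
    file=$1
    # The first non-blank line of a FASTA file must be a '>' header
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    case $first in
        '>'*) ;;
        *) echo "$file: first record does not start with '>'" >&2
           return 1 ;;
    esac
    # Each sequence record has exactly one header line
    count=$(grep -c '^>' "$file")
    echo "$file: $count sequences"
}

# Example: check_fasta human.fa
```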

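This page is divided into three main sections:

   1: setting up the models and scripts
   2: using the scripts to produce domain assignments
   3: the domain assignment output formats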

Setting up the models and scripts is a multi-step process, and problems may arise on some combinations of hardware and operating system. If you have read this page and the relevant sections of the HMMER3 documentation and still have a problem, please contact us at superfamily@cs.bris.ac.uk or via the feedback form.

Alternatively, we can produce domain assignments for your sequences. All we require is a set of protein sequences in FASTA format.


1: Set up models and scripts

The scripts are written in Perl; any recent version of Perl should work. Around 500 MB of disk space is required. We assume you are using a Linux/Unix environment.
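The requirements above can be checked before starting. This is a sketch assuming a POSIX shell and a `df` that supports `-P -k`; the helper names `require_cmd` and `has_free_mb` are hypothetical, and the 500 MB figure is the one quoted on this page.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the requirements listed above.

# Succeed if command $1 is on PATH; otherwise print a message and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "required command not found: $1" >&2
    return 1
}

# Succeed if the filesystem holding directory $1 has at least $2 MB free.
has_free_mb() {
    # With -P -k, line 2, column 4 of df output is the available space in KB
    avail_kb=$(df -P -k "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Example: require_cmd perl && has_free_mb . 500 && echo "ready to install"
```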

1.1 Register for a SUPERFAMILY license (free for academic and commercial use).
Download the SUPERFAMILY models and scripts:

   wget --http-user USERNAME --http-password PASSWORD -r -np -nd -e robots=off \
   -R 'index.html*' 'http://supfam.org/SUPERFAMILY/downloads/license/supfam-local-1.75/'
Replace USERNAME and PASSWORD with the username and password you receive after registering for a license.

If wget is unavailable on your system, the required files can be downloaded individually. You will require all files in the models and scripts directories, as well as sequences/pdbj95d.gz.
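Once the download finishes, a quick check can confirm that the files unpacked in step 1.4 are present. This sketch assumes everything sits in one directory, as the `wget -nd` command above produces (if you download individually, move sequences/pdbj95d.gz alongside the other files first); `check_download` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check that the files unpacked in step 1.4 were downloaded
# into directory $1 (default: the current directory).
check_download() {
    dir=${1:-.}
    missing=0
    for f in hmmlib_1.75.gz model.tab.gz self_hits.tab.gz pdbj95d.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # The Perl scripts (such as superfamily.pl and ass3.pl) end in .pl;
    # if the glob matches nothing, $1 stays as the literal pattern
    set -- "$dir"/*.pl
    if [ ! -e "$1" ]; then
        echo "missing: Perl scripts (*.pl)" >&2
        missing=1
    fi
    return $missing
}

# Example: check_download . && echo "all files present"
```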

1.2 The hmmscan program from the HMMER3 software package is recommended [ PubMed12364612 ] for scoring sequences against the SUPERFAMILY models.
Download HMMER3 and follow the installation instructions that come with it. The scripts that run the models require the hmmscan program to be on your PATH, i.e. in one of the directories listed in the PATH environment variable.
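To confirm that the hmmscan found on your PATH is the HMMER3 one, a check like the following may help. It is a sketch assuming that `hmmscan -h` prints a banner line containing "HMMER 3", as HMMER 3.x releases do; `check_hmmer3` is a hypothetical name and the install directory in the comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical check that hmmscan is on PATH and comes from HMMER3.
check_hmmer3() {
    if ! command -v hmmscan >/dev/null 2>&1; then
        echo "hmmscan not found; add the HMMER3 bin directory to PATH" >&2
        return 1
    fi
    # HMMER 3.x programs print a banner line like "# HMMER 3.x (...)"
    if hmmscan -h 2>/dev/null | grep -q 'HMMER 3'; then
        return 0
    fi
    echo "hmmscan on PATH does not report HMMER 3" >&2
    return 1
}

# Example (the install directory below is a placeholder):
#   PATH="$HOME/hmmer/bin:$PATH"; export PATH
#   check_hmmer3 && command -v hmmscan
```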

1.3 Download the SCOP 1.75 dir.des.scop.txt and dir.cla.scop.txt files:

   wget http://scop.mrc-lmb.cam.ac.uk/scop/parse/dir.des.scop.txt_1.75
   wget http://scop.mrc-lmb.cam.ac.uk/scop/parse/dir.cla.scop.txt_1.75
   mv dir.des.scop.txt_1.75 dir.des.scop.txt
   mv dir.cla.scop.txt_1.75 dir.cla.scop.txt
These files are required for the family level classification [ PubMed16877569 ].
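For inspecting results by hand, a SCOP identifier can be looked up in dir.des.scop.txt. This sketch assumes the standard SCOP parseable-file layout (tab-separated sunid, entry type, sccs, sid and description, with '#' header lines) and assumes that the 'SCOP Family ID' column of the output holds a SCOP sunid; `scop_describe` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical lookup of a SCOP sunid in dir.des.scop.txt.
# Assumes tab-separated columns: sunid, entry type (cl/cf/sf/fa/dm/sp/px),
# sccs, sid, description; header lines start with '#'.
# Prints "type<TAB>sccs<TAB>description"; fails if the sunid is not found.
scop_describe() {
    awk -F '\t' -v id="$1" '
        /^#/ { next }                        # skip header comments
        $1 == id { print $2 "\t" $3 "\t" $5; found = 1; exit }
        END { exit !found }
    ' "${2:-dir.des.scop.txt}"
}

# Example: scop_describe 46459
```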

1.4 Set up the infrastructure required by the scripts:

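   # Decompress the downloaded data files
   gunzip pdbj95d.gz
   gunzip model.tab.gz
   gunzip hmmlib_1.75.gz
   # Rename the model library to hmmlib
   mv hmmlib_1.75 hmmlib
   gunzip self_hits.tab.gz
   # Create the scratch directory the scripts expect
   mkdir scratch
   # Make the Perl scripts executable
   chmod u+x *.pl

   # Build the binary index files HMMER3 needs to search hmmlib
   hmmpress hmmlib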
N.B. you must run hmmpress on the hmmlib file before it can be used with HMMER3.
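hmmpress writes four binary files next to the library (hmmlib.h3m, hmmlib.h3i, hmmlib.h3f and hmmlib.h3p), and it will not overwrite existing ones unless run with `-f`. The sketch below checks that all four exist and are non-empty; `check_pressed` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check that hmmpress has produced the four binary files
# HMMER3 writes next to the model library (.h3m, .h3i, .h3f, .h3p).
check_pressed() {
    lib=${1:-hmmlib}
    status=0
    for ext in h3m h3i h3f h3p; do
        if [ ! -s "$lib.$ext" ]; then
            echo "missing or empty: $lib.$ext (run: hmmpress $lib)" >&2
            status=1
        fi
    done
    return $status
}

# Example: re-press (overwriting any partial index) if a file is missing
#   check_pressed hmmlib || hmmpress -f hmmlib
```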


2: Use scripts to produce domain assignments

Run superfamily.pl to produce the domain assignments:

   # Simplest case: assign domains to the sequences in human.fa
   ./superfamily.pl human.fa

N.B. All the scripts must either be in the working directory (with './' in your PATH) or be in a directory listed in your PATH.
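Instead of adding './' to PATH (which makes command lookup depend on whichever directory you are in), one option is to add the absolute path of the script directory. A sketch assuming a POSIX shell; `path_prepend` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prepend directory $1 to PATH unless already present.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                         # already on PATH; leave it unchanged
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Example, run from the directory holding the SUPERFAMILY scripts:
#   path_prepend "$PWD"
#   superfamily.pl human.fa
```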


3: Domain assignment output formats

The output is a tab-delimited file of domains, one domain per line.
A sequence can have more than one domain, and some sequences may have no domain assignment at all.
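Both cases can be summarised from the output with standard tools. This sketch assumes the first column of the output is the first word of each FASTA header line and that the file has no header row; the helper names and the output file name in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for summarising a tab-delimited domain output file.
# Assumes column 1 is the first word of each FASTA header line.

# Print "sequence_id<TAB>number_of_domains" for every sequence with a domain.
domains_per_sequence() {
    awk -F '\t' '{ n[$1]++ } END { for (s in n) print s "\t" n[s] }' "$1" |
        LC_ALL=C sort
}

# Print the IDs in FASTA file $1 that have no domain in output file $2.
unassigned_sequences() {
    fasta_ids=$(mktemp) || return 1
    ass_ids=$(mktemp) || { rm -f "$fasta_ids"; return 1; }
    # Keep the text between '>' and the first whitespace of each header line
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | LC_ALL=C sort -u > "$fasta_ids"
    cut -f1 "$2" | LC_ALL=C sort -u > "$ass_ids"
    # comm -23 prints lines that appear only in the first (FASTA) list
    LC_ALL=C comm -23 "$fasta_ids" "$ass_ids"
    rm -f "$fasta_ids" "$ass_ids"
}

# Example (output file name is a placeholder):
#   domains_per_sequence human.ass
#   unassigned_sequences human.fa human.ass
```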

The columns of the machine-readable 'ass' file produced by ass3.pl (the default output format) are:

   1. Sequence ID
   2. SUPERFAMILY model ID
   3. Match region
   4. E-value score
   5. Model match start position
   6. Alignment to model
   7. Family E-value
   8. SCOP family ID
   9. SCOP domain ID of closest structure (px value)
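Because the file is tab-delimited, domains can be filtered on the E-value in the fourth column with awk. This sketch assumes there is no header row and that column 4 holds a plain number such as 0.0012 or 3.4e-15; `filter_by_evalue` is a hypothetical helper, and the threshold in the example is a placeholder, not a recommended cut-off.

```shell
#!/bin/sh
# Hypothetical filter: keep only domains whose E-value (column 4) is at or
# below threshold $2. Adding 0 forces a numeric comparison in awk, so values
# in exponent notation (e.g. 3.4e-15) compare correctly.
filter_by_evalue() {
    awk -F '\t' -v max="$2" '($4 + 0) <= (max + 0)' "$1"
}

# Example (file name and threshold are placeholders):
#   filter_by_evalue human.ass 0.0001 > human.strict.ass
```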

The columns of the HTML output are:

Sequence ID
Match region
E-value score
SCOP superfamily
Family E-value
SCOP family
Closest structure
Alignment


If you have further questions, suggestions or comments, then please contact us using the feedback form or via email superfamily@cs.bris.ac.uk.