dcGO Predictor: Extracting knowledge of function, phenotype and disease from genome sequences
For a query sequence (see Protein sequence), the dcGO Predictor uses the following procedure to predict the ontological terms associated with the query:
First, it obtains the Domain architecture and its domains and supra-domains from the SUPERFAMILY database.
Then, it uses the domain-centric annotations to predict the ontology terms relevant to the query:
- If the query contains a domain/supra-domain, then all ontology terms associated with that domain/supra-domain are transferred to the query (together with their hypergeometric scores, h-scores);
- When a term-to-query transfer is supported by one or more domains/supra-domains, the h-scores are summed to give a predictive score (P-score);
- The P-score is then rescaled to the range 0-1. For each namespace (e.g., each of the three sub-ontologies of GO), P-score=(SUM-MIN)/(MAX-MIN), where SUM is the sum of all h-scores supporting the transfer of a term to the query, and MIN and MAX are respectively the minimum and maximum of SUM over the whole list of terms predicted for the query.
Finally, the rescaled P-score is used to rank the predictions. A higher P-score indicates a better-supported prediction.
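The scoring procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the dcGO implementation: the function name, the data layout (a mapping from each domain/supra-domain to a list of namespace, term and h-score triples), the identifiers and the h-score values are all hypothetical. The handling of the case MAX=MIN is also an assumption, since it is not specified above.

```python
from collections import defaultdict


def predict_terms(query_domains, domain_annotations):
    """Sum h-scores per term and rescale them to 0-1 within each namespace.

    query_domains: domains/supra-domains found in the query (hypothetical IDs).
    domain_annotations: dict mapping a domain/supra-domain to a list of
        (namespace, term, h_score) tuples (hypothetical layout).
    Returns a dict: namespace -> list of (term, P-score), best first.
    """
    # SUM: total h-score of all domains/supra-domains supporting each term.
    sums = defaultdict(lambda: defaultdict(float))
    for domain in query_domains:
        for namespace, term, h_score in domain_annotations.get(domain, []):
            sums[namespace][term] += h_score

    ranked = {}
    for namespace, term_sums in sums.items():
        lowest = min(term_sums.values())   # MIN over predicted terms
        highest = max(term_sums.values())  # MAX over predicted terms
        span = highest - lowest
        scores = {
            # Assumption: if all sums are equal, every term gets 1.0.
            term: (total - lowest) / span if span > 0 else 1.0
            for term, total in term_sums.items()
        }
        # Rank the predictions by decreasing P-score.
        ranked[namespace] = sorted(
            scores.items(), key=lambda item: item[1], reverse=True
        )
    return ranked


# Made-up annotations for one domain and one supra-domain.
annotations = {
    "d1": [("BP", "GO:A", 2.0), ("BP", "GO:B", 1.0)],
    "d1,d2": [("BP", "GO:A", 3.0), ("BP", "GO:C", 2.0)],
}
result = predict_terms(["d1", "d1,d2"], annotations)
print(result["BP"])
```

In this toy example GO:A is supported by both the domain and the supra-domain, so its summed h-score is the largest and it is ranked first after rescaling.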
The dcGO Predictor can be accessed in three ways:
- Via Single Query from the Faceted Search, by clicking the logo.
- Via Batch Query, allowing the submission of up to 10000 sequences, producing results that:
1) give a summary of the prediction content by counting the number/percentage of sequences annotated by ontological terms at four levels (i.e., slim version);
2) offer a download of predictions for both the slim version and the full version;
3) provide links to explore the prediction details for each of the input sequences.
- Via Hyperlink, which takes the form of a URL (no submission form is needed; the query is given as comma-separated sequence identifiers). For example, the prediction for Q01826, Q8TCS8 and O75376: http://supfam.org/SUPERFAMILY/cgi-bin/dcpredictormain.cgi?seq_query=Q01826,Q8TCS8,O75376
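A hyperlink of this form can also be assembled programmatically. The following is a minimal sketch that only builds the URL string from a list of sequence identifiers and does not contact the server; the helper name build_predictor_url is hypothetical, while the base URL and the seq_query parameter are taken from the example above.

```python
from urllib.parse import urlencode

# Base URL as given in the example above.
BASE_URL = "http://supfam.org/SUPERFAMILY/cgi-bin/dcpredictormain.cgi"


def build_predictor_url(identifiers):
    """Return a dcGO Predictor hyperlink for the given sequence identifiers."""
    # Identifiers are joined with commas; safe="," keeps the commas unescaped.
    query = urlencode({"seq_query": ",".join(identifiers)}, safe=",")
    return f"{BASE_URL}?{query}"


url = build_predictor_url(["Q01826", "Q8TCS8", "O75376"])
print(url)
```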