dcGO - A comprehensive domain-centric ontology resource for post-genomic research on functions, phenotypes, diseases and more   
  
  

Download dcGO

Annotations for SCOP domains and supra-domains

Domain2GO

GO annotations for individual domains

The dcGO provides two versions of mappings between individual domains and GO: high-quality mappings and high-coverage mappings.

  • High-quality mappings are those that are supported both when considering single-domain proteins only and when considering all proteins (including multi-domain proteins). They are labelled as 'Domain2GO_supported_by_both'.
  • High-coverage mappings include those which are not supported when only considering single-domain proteins. In other words, high-coverage domain-centric GO annotations require us to take into account all UniProt sequences (including multidomain proteins). They are labelled as 'Domain2GO_supported_only_by_all'.
The high-quality mappings are more strictly domain-centric, whereas the high-coverage mappings are more useful for large-scale studies, particularly when some accuracy can be traded for coverage.

For each version, both the full annotations and the slim version are provided. They are available in two formats (i.e., Plain files and MySQL tables).

Plain files

High-quality mappings

Domain2GO supported by both
  • High-quality truly domain-centric GO annotations supported by single-domain sequences and when considering all sequences (including multidomain proteins) are available in the Domain2GO_supported_by_both.txt file.

  • Statistics for the Domain2GO annotations are summarized in two forms: 1) SCOP hierarchy with the number of GO terms (direct and inherited; three GO sub-ontologies: BP, MF and CC), available in the Domain2GO_SCOP.both.obo file. 2) GO hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2GO_GO.both.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • GO terms which are regarded as SDFO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDFO.both.txt file. We strongly recommend that users use these GO terms together with their annotating domains from Domain2GO_supported_by_both.txt. Unlike the whole GO hierarchy, these GO terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDFO corresponds to each of three GO sub-ontologies (i.e., BP, MF, and CC) at each of two SCOP domain types (i.e., FA and SF).

High-coverage mappings

Domain2GO supported only by all
  • High-coverage domain-centric GO annotations supported only by all UniProts (including multidomain UniProts) are available in the Domain2GO_supported_only_by_all.txt file.

  • Statistics for the Domain2GO annotations are summarized in two forms: 1) SCOP hierarchy with the number of GO terms (direct and inherited; three GO sub-ontologies: BP, MF and CC), available in the Domain2GO_SCOP.all.obo file. 2) GO hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2GO_GO.all.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • GO terms which are regarded as SDFO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDFO.all.txt file. We strongly recommend that users use these GO terms together with their annotating domains from Domain2GO_supported_only_by_all.txt. Unlike the whole GO hierarchy, these GO terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDFO corresponds to each of three GO sub-ontologies (i.e., BP, MF, and CC) at each of two SCOP domain types (i.e., FA and SF).

MySQL tables

Four tables (available at Domain2GO.sql.gz) are used to store both versions of mappings.

GO_info

Containing info about GO terms
    > DESC GO_info;
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    | Field      | Type                                                                        | Null | Key | Default | Extra |
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    | go         | int(7) unsigned zerofill                                                    | NO   | PRI | NULL    |       | 
    | namespace  | enum('biological_process','molecular_function','cellular_component')        | NO   | MUL | NULL    |       | 
    | name       | varchar(255)                                                                | NO   |     | NULL    |       | 
    | synonym    | text                                                                        | YES  |     | NULL    |       | 
    | definition | text                                                                        | YES  |     | NULL    |       | 
    | distance   | tinyint(3) unsigned                                                         | NO   |     | NULL    |       | 
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    
  • The go column is the numeric part of GO id. It is browsable via GO Hierarchy.
  • The namespace column can be one of three GO sub-ontologies.
  • The name column shows the full name of GO terms.
  • The synonym column is the synonym of GO terms.
  • The definition column is the definition of GO terms.
  • The distance column shows the distance of GO terms to the corresponding sub-ontology.
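
Because the go column holds only the numeric part of the identifier, the full GO id has to be rebuilt by zero-padding to seven digits and adding the 'GO:' prefix. A minimal Python sketch of that conversion follows; the helper names are hypothetical and are not part of dcGO.

```python
def go_id_from_int(go: int) -> str:
    """Rebuild a full GO id (e.g. 'GO:0008150') from the numeric 'go' column."""
    if not 0 <= go <= 9_999_999:
        raise ValueError(f"GO number out of range: {go}")
    return f"GO:{go:07d}"


def go_int_from_id(go_id: str) -> int:
    """Inverse conversion: drop the 'GO:' prefix and the leading zeros."""
    prefix, _, digits = go_id.partition(":")
    if prefix != "GO" or len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"Not a GO id: {go_id!r}")
    return int(digits)


print(go_id_from_int(8150))          # GO:0008150
print(go_int_from_id("GO:0003674"))  # 3674
```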

GO_hie

Containing info about GO hierarchy
    > DESC GO_hie;
    +----------+--------------------------+------+-----+---------+-------+
    | Field    | Type                     | Null | Key | Default | Extra |
    +----------+--------------------------+------+-----+---------+-------+
    | parent   | int(7) unsigned zerofill | NO   | PRI | NULL    |       | 
    | child    | int(7) unsigned zerofill | NO   | PRI | NULL    |       | 
    | distance | tinyint(3) unsigned      | NO   | PRI | NULL    |       | 
    +----------+--------------------------+------+-----+---------+-------+
    
  • The parent column is the numeric part of parental GO id.
  • The child column is the numeric part of child GO id.
  • The distance column shows the distance from the parental GO id to the child GO id: 1 indicates a direct parent-child relationship, and other values indicate the existence of a path between them (reachable but indirect). Notably, each edge in the GO DAG can be one of three relationships: 'is_a', 'part_of', and 'regulates'. Here, we only consider the first two (i.e., 'is_a' and 'part_of') and treat them equally.
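
The parent/child/distance layout makes ancestor lookups a single filter on the child column. The sketch below illustrates this with Python's sqlite3 module standing in for MySQL; the three rows are toy data with made-up term numbers, and the function name is hypothetical. DISTINCT is used on the assumption that a parent-child pair may be stored with more than one distance, since distance is part of the primary key.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database; the rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GO_hie (parent INTEGER, child INTEGER, distance INTEGER)")
conn.executemany(
    "INSERT INTO GO_hie VALUES (?, ?, ?)",
    [
        (1, 2, 1),  # term 1 is a direct parent of term 2
        (2, 3, 1),  # term 2 is a direct parent of term 3
        (1, 3, 2),  # term 1 reaches term 3 indirectly
    ],
)


def ancestors(conn, child, direct_only=False):
    """Return the parent terms of `child`, optionally direct parents only."""
    sql = "SELECT DISTINCT parent FROM GO_hie WHERE child = ?"
    if direct_only:
        sql += " AND distance = 1"
    sql += " ORDER BY parent"
    return [row[0] for row in conn.execute(sql, (child,))]


print(ancestors(conn, 3))                    # [1, 2]
print(ancestors(conn, 3, direct_only=True))  # [2]
```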

GO_mapping

Containing info about domain-centric GO annotations
    > DESC GO_mapping;
    +--------------------+---------------------------+------+-----+---------+-------+
    | Field              | Type                      | Null | Key | Default | Extra |
    +--------------------+---------------------------+------+-----+---------+-------+
    | id                 | mediumint(8) unsigned     | NO   | PRI | NULL    |       |
    | level              | enum('sf','fa')           | NO   |     | NULL    |       |
    | go                 | int(7) unsigned zerofill  | NO   | PRI | NULL    |       |
    | single_fdr         | double                    | NO   |     | 1       |       |
    | all_fdr            | double                    | NO   |     | 1       |       |
    | inherited_from     | text                      | YES  |     | NULL    |       |
    | inherited_from_all | text                      | YES  |     | NULL    |       |
    | all_fdr_min        | double                    | YES  |     | NULL    |       |
    | all_hscore_max     | double                    | YES  |     | NULL    |       |
    +--------------------+---------------------------+------+-----+---------+-------+
    
  • The id is the SCOP unique identifier, sunid. It is browsable via SCOP Hierarchy.
  • The level in the SCOP hierarchy. Can be one of 'sf' for superfamily, 'fa' for family.
  • The go column is the numeric part of GO id.
  • The single_fdr column is the FDR supported by singleton domain UniProts.
  • The all_fdr column is the FDR supported by all UniProts (including multidomain UniProts).
  • The inherited_from column marks the status of Domain2GO predicted annotations supported by both. 1) If it is marked with 'directed' (i.e., both 'single_fdr'<0.001 and 'all_fdr'<0.001), the Domain2GO annotation is significantly supported both by singleton domain UniProts and by all UniProts (including multidomain UniProts). 2) If it is a comma-separated list of GO ids (numeric part; 'single_fdr' and 'all_fdr' are not both less than 0.001), the annotation is inherited from descendant GO terms (significantly associated) by applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the list of Domain2GO annotations supported by both can be obtained by selecting rows whose 'inherited_from' column is NOT EMPTY.
  • The inherited_from_all column marks the status of Domain2GO predicted annotations supported by all. 1) If it is marked with 'directed' (i.e., 'all_fdr'<0.001), the Domain2GO annotation is significantly supported only by all UniProts (including multidomain UniProts). 2) If it is a comma-separated list of GO ids (numeric part; 'all_fdr' is not less than 0.001), the annotation is inherited from descendant GO terms (significantly associated) by applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the list of Domain2GO annotations supported only by all can be obtained by selecting rows whose 'inherited_from_all' column is NOT EMPTY.
  • The all_fdr_min column is the minimum FDR over all descendant terms annotating that individual domain. Instead of using 'all_fdr' column, the full lists of annotations can also be obtained by selecting the column 'all_fdr_min' < 0.001.
  • The all_hscore_max column is the maximum hypergeometric score (h-score) over all descendant terms annotating that individual domain. Complementing the column 'all_fdr_min', this column is to indicate the strength, the higher for the stronger association. It is preferably used for ranking the domain-based protein function prediction.
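
The two versions of the mappings can be pulled from GO_mapping by filtering on the relevant status column. The following is a minimal sketch, not the authors' own query: it uses Python's sqlite3 module in place of MySQL, keeps only the columns needed for the filter, fills the table with toy rows, and assumes that "NOT EMPTY" means neither NULL nor an empty string. The function name is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GO_mapping ("
    "id INTEGER, level TEXT, go INTEGER, "
    "inherited_from TEXT, inherited_from_all TEXT)"
)
conn.executemany(
    "INSERT INTO GO_mapping VALUES (?, ?, ?, ?, ?)",
    [
        (100, "sf", 10, "directed", "directed"),  # toy row: both columns filled
        (100, "sf", 11, "", "directed"),          # toy row: empty string in inherited_from
        (100, "sf", 12, None, "11"),              # toy row: inherited from term 11
        (200, "fa", 10, None, None),              # toy row: both columns empty
    ],
)


def mappings(conn, column):
    """Return (id, go) pairs whose status column is neither NULL nor ''."""
    if column not in ("inherited_from", "inherited_from_all"):
        raise ValueError(f"unexpected column: {column}")
    sql = (
        f"SELECT id, go FROM GO_mapping "
        f"WHERE {column} IS NOT NULL AND {column} <> '' "
        f"ORDER BY id, go"
    )
    return conn.execute(sql).fetchall()


print(mappings(conn, "inherited_from"))      # [(100, 10)]
print(mappings(conn, "inherited_from_all"))  # [(100, 10), (100, 11), (100, 12)]
```

The column name is checked against a fixed list before being placed in the SQL string, because column names cannot be passed as bound parameters.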

GO_ic

Containing info about GO slim
    > DESC GO_ic;
    +---------+---------------------------------------------------------------+------+-----+---------+-------+
    | Field   | Type                                                          | Null | Key | Default | Extra |
    +---------+---------------------------------------------------------------+------+-----+---------+-------+
    | level   | enum('sf','fa','sf_all','fa_all')                             | NO   | PRI | NULL    |       |
    | go      | int(7) unsigned zerofill                                      | NO   | PRI | NULL    |       |
    | ic      | double                                                        | YES  |     | NULL    |       |
    | include | tinyint(2)                                                    | YES  | MUL | NULL    |       |
    +---------+---------------------------------------------------------------+------+-----+---------+-------+
    
  • The level in the SCOP hierarchy. Since this table stores both results (SDFO from Domain2GO supported by both, and SDFO from Domain2GO supported only by all), the level for former SDFO can be one of 'sf' for superfamily, 'fa' for family, and the level for latter SDFO can be one of 'sf_all' for superfamily, 'fa_all' for family.
  • The go column is the numeric part of GO id.
  • The ic column shows the information content of the GO term.
  • The include column indicates whether or not the GO term belongs to the SDFO. If the column is set to '0' then it is not a member of SDFO. Otherwise, '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, '4' for highly informative (i.e., the most specific).
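
Selecting a slim from GO_ic therefore means fixing a level and keeping rows whose include value is at least 1. The sketch below shows one way to do that and to translate the code into its label. It is illustrative only: sqlite3 replaces MySQL, the rows are toy data, the function and dictionary names are hypothetical, and rows with a NULL include value are treated as not belonging to the slim, which is an assumption.

```python
import sqlite3

# Labels for the 'include' codes, as described in the column notes above.
SLIM_LABELS = {
    1: "least informative",
    2: "moderately informative",
    3: "informative",
    4: "highly informative",
}

# In-memory SQLite stands in for MySQL; the rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE GO_ic (level TEXT, go INTEGER, ic REAL, "include" INTEGER)')
conn.executemany(
    "INSERT INTO GO_ic VALUES (?, ?, ?, ?)",
    [
        ("sf", 10, 1.2, 1),
        ("sf", 11, 3.4, 4),
        ("sf", 12, 0.5, 0),      # include = 0: not a member of the slim
        ("fa_all", 11, 2.8, 3),
    ],
)


def slim_terms(conn, level):
    """Return (go, label) pairs for slim members at the given level."""
    rows = conn.execute(
        'SELECT go, "include" FROM GO_ic '
        'WHERE level = ? AND "include" > 0 ORDER BY go',
        (level,),
    )
    return [(go, SLIM_LABELS[code]) for go, code in rows]


print(slim_terms(conn, "sf"))      # [(10, 'least informative'), (11, 'highly informative')]
print(slim_terms(conn, "fa_all"))  # [(11, 'informative')]
```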

Supra-domain2GO

GO annotations for supra-domains ('SP')

In addition to associations of individual domains, dcGO also provides associations to supra-domains. In general, supra-domains are defined as commonly occurring combinations of two or more successive domains. In dcGO, supra-domains are restricted to those that do not contain gaps, i.e., a sequential order of fully annotated domains with no unknown domains between them. As with individual domains, dcGO provides not only the mappings to supra-domains but also the corresponding GO slims for the supra-domains. They are available in two formats (i.e., Plain files and MySQL tables).

Plain files

GO

Gene Ontology (GO)
  • Full supra-domains (including individual superfamilies) GO annotations are available in the SP2GO.txt file.

  • GO terms which are regarded as SPFO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPFO.txt file. Unlike the whole GO hierarchy, these GO terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPFO corresponds to each of three GO sub-ontologies (i.e., BP, MF, and CC) only at the SCOP superfamily level.

  • We recommend that users use the GO terms in SPFO.txt together with their associations to supra-domains extracted from SP2GO.txt. These are of potential use in comparative functional genomics, particularly for understanding how multi-domain proteins have evolved under functional constraints along the tree of life.


MySQL tables

Four tables (available at SP2GO.sql.gz) are used to store both the annotations and the ontology slim.

GO_info

Containing info about GO terms
    > DESC GO_info;
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    | Field      | Type                                                                        | Null | Key | Default | Extra |
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    | go         | int(7) unsigned zerofill                                                    | NO   | PRI | NULL    |       | 
    | namespace  | enum('biological_process','molecular_function','cellular_component')        | NO   | MUL | NULL    |       | 
    | name       | varchar(255)                                                                | NO   |     | NULL    |       | 
    | synonym    | text                                                                        | YES  |     | NULL    |       | 
    | definition | text                                                                        | YES  |     | NULL    |       | 
    | distance   | tinyint(3) unsigned                                                         | NO   |     | NULL    |       | 
    +------------+-----------------------------------------------------------------------------+------+-----+---------+-------+
    
  • The go column is the numeric part of GO id. It is browsable via GO Hierarchy.
  • The namespace column can be one of three GO sub-ontologies.
  • The name column shows the full name of GO terms.
  • The synonym column is the synonym of GO terms.
  • The definition column is the definition of GO terms.
  • The distance column shows the distance of GO terms to the corresponding sub-ontology.

GO_hie

Containing info about GO hierarchy
    > DESC GO_hie;
    +----------+--------------------------+------+-----+---------+-------+
    | Field    | Type                     | Null | Key | Default | Extra |
    +----------+--------------------------+------+-----+---------+-------+
    | parent   | int(7) unsigned zerofill | NO   | PRI | NULL    |       | 
    | child    | int(7) unsigned zerofill | NO   | PRI | NULL    |       | 
    | distance | tinyint(3) unsigned      | NO   | PRI | NULL    |       | 
    +----------+--------------------------+------+-----+---------+-------+
    
  • The parent column is the numeric part of parental GO id.
  • The child column is the numeric part of child GO id.
  • The distance column shows the distance from the parental GO id to the child GO id: 1 indicates a direct parent-child relationship, and other values indicate the existence of a path between them (reachable but indirect). Notably, each edge in the GO DAG can be one of three relationships: 'is_a', 'part_of', and 'regulates'. Here, we only consider the first two (i.e., 'is_a' and 'part_of') and treat them equally.

GO_mapping_supradomain

Containing info about domain-centric annotations
    > DESC GO_mapping_supradomain;
    +----------------+---------------------------+------+-----+---------+-------+
    | Field          | Type                      | Null | Key | Default | Extra |
    +----------------+---------------------------+------+-----+---------+-------+
    | supradomain    | text                      | NO   | MUL | NULL    |       |
    | level          | enum('sf')                | NO   |     | NULL    |       |
    | go             | int(7) unsigned zerofill  | NO   |     | NULL    |       |
    | all_score      | double                    | NO   |     | 1       |       |
    | inherited_from | text                      | YES  |     | NULL    |       |
    +----------------+---------------------------+------+-----+---------+-------+
    
  • The supradomain is a comma separated list of the SCOP unique identifier, sunid. It is browsable via SCOP Hierarchy.
  • The level in the SCOP hierarchy. Can only be 'sf' for superfamily.
  • The go column is the numeric part of GO id.
  • The all_score column is the FDR supported by all UniProts (including multidomain UniProts).
  • The inherited_from column is to mark the status of SP2GO predicted annotations. 1) If it is marked with 'directed' (i.e., 'all_score'<0.001), SP2GO is significantly supported by all UniProts (including multidomain UniProts). 2) If it is a comma separated list of GO id (numeric part; the column 'all_score' is not less than 0.001), SP2GO is inherited from any descendant GO terms (significantly associated) when applying true-path rule in DAG. 3) Empty otherwise. Hence, the lists of SP2GO supported only by all can be obtained by selecting the column 'inherited_from' with NOT EMPTY.
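
Since the supradomain column packs several sunids into one comma-separated string, client code usually needs to split it back into an ordered list of domains. A small Python sketch of that step follows; the function names and the sunids in the example are made up for illustration. Entries with a single sunid are treated as individual superfamilies, which the SP2GO annotations also cover.

```python
from typing import Tuple


def parse_supradomain(value: str) -> Tuple[int, ...]:
    """Split a comma-separated 'supradomain' string into ordered sunids."""
    return tuple(int(part) for part in value.split(",") if part.strip())


def is_individual_superfamily(value: str) -> bool:
    """True when the entry consists of a single domain rather than a combination."""
    return len(parse_supradomain(value)) == 1


# The sunids below are placeholders, not real SCOP identifiers.
print(parse_supradomain("11111,22222"))          # (11111, 22222)
print(is_individual_superfamily("11111"))        # True
print(is_individual_superfamily("11111,22222"))  # False
```

The result is kept as a tuple because the order of domains within a supra-domain is meaningful.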

GO_ic_supra

Containing info about ontology slim
    > DESC GO_ic_supra;
    +---------+---------------------------+------+-----+---------+-------+
    | Field   | Type                      | Null | Key | Default | Extra |
    +---------+---------------------------+------+-----+---------+-------+
    | level   | enum('sf')                | NO   | PRI | NULL    |       |
    | go      | int(7) unsigned zerofill  | NO   | PRI | NULL    |       |
    | ic      | double                    | YES  |     | NULL    |       |
    | include | tinyint(2)                | YES  | MUL | NULL    |       |
    +---------+---------------------------+------+-----+---------+-------+
    
  • The level in the SCOP hierarchy. Can only be 'sf' for superfamily.
  • The go column is the numeric part of GO id.
  • The ic column shows the information content of the GO term.
  • The include column indicates whether or not the GO term belongs to the SPFO. If the column is set to '0' then it is not a member of SPFO. Otherwise, '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, '4' for highly informative (i.e., the most specific).

Domain2BO

BO annotations for individual domains

For each of the Biomedical Ontologies (BO), both the associations and a slim version of the ontology are provided. They are available in two formats (i.e., Plain files and MySQL tables).

Plain files

DO

Disease Ontology (DO)
  • High-coverage domain-centric DO annotations are available in the Domain2DO.txt file.

  • Statistics for the Domain2DO annotations are summarized in two forms: 1) SCOP hierarchy with the number of DO terms (direct and inherited), available in the Domain2DO_SCOP.obo file. 2) DO hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2DO_DO.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • DO terms which are regarded as SDDO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDDO.txt file. We strongly recommend that users use these DO terms together with their annotating domains from Domain2DO.txt. Unlike the whole DO hierarchy, these DO terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDDO corresponds to each of two SCOP domain types (i.e., FA and SF).

HP

Human Phenotype (HP)
  • High-coverage domain-centric HP annotations are available in the Domain2HP.txt file.

  • Statistics for the Domain2HP annotations are summarized in two forms: 1) SCOP hierarchy with the number of HP terms (direct and inherited; three HP sub-ontologies: MI, ON and PA), available in the Domain2HP_SCOP.obo file. 2) HP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2HP_HP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • HP terms which are regarded as SDHP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDHP.txt file. We strongly recommend that users use these HP terms together with their annotating domains from Domain2HP.txt. Unlike the whole HP hierarchy, these HP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDHP corresponds to each of three HP sub-ontologies (i.e., MI, ON and PA) at each of two SCOP domain types (i.e., FA and SF).

MP

Mouse Phenotype (MP)
  • High-coverage domain-centric MP annotations are available in the Domain2MP.txt file.

  • Statistics for the Domain2MP annotations are summarized in two forms: 1) SCOP hierarchy with the number of MP terms (direct and inherited), available in the Domain2MP_SCOP.obo file. 2) MP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2MP_MP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • MP terms which are regarded as SDMP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDMP.txt file. We strongly recommend that users use these MP terms together with their annotating domains from Domain2MP.txt. Unlike the whole MP hierarchy, these MP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDMP corresponds to each of two SCOP domain types (i.e., FA and SF).

WP

Worm Phenotype (WP)
  • High-coverage domain-centric WP annotations are available in the Domain2WP.txt file.

  • Statistics for the Domain2WP annotations are summarized in two forms: 1) SCOP hierarchy with the number of WP terms (direct and inherited), available in the Domain2WP_SCOP.obo file. 2) WP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2WP_WP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • WP terms which are regarded as SDWP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDWP.txt file. We strongly recommend that users use these WP terms together with their annotating domains from Domain2WP.txt. Unlike the whole WP hierarchy, these WP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDWP corresponds to each of two SCOP domain types (i.e., FA and SF).

YP

Yeast Phenotype (YP)
  • High-coverage domain-centric YP annotations are available in the Domain2YP.txt file.

  • Statistics for the Domain2YP annotations are summarized in two forms: 1) SCOP hierarchy with the number of YP terms (direct and inherited), available in the Domain2YP_SCOP.obo file. 2) YP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2YP_YP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • YP terms which are regarded as SDYP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDYP.txt file. We strongly recommend that users use these YP terms together with their annotating domains from Domain2YP.txt. Unlike the whole YP hierarchy, these YP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDYP corresponds to each of two SCOP domain types (i.e., FA and SF).

FP

Fly Phenotype (FP)
  • High-coverage domain-centric FP annotations are available in the Domain2FP.txt file.

  • Statistics for the Domain2FP annotations are summarized in two forms: 1) SCOP hierarchy with the number of FP terms (direct and inherited), available in the Domain2FP_SCOP.obo file. 2) FP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2FP_FP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • FP terms which are regarded as SDFP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDFP.txt file. We strongly recommend that users use these FP terms together with their annotating domains from Domain2FP.txt. Unlike the whole FP hierarchy, these FP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDFP corresponds to each of two SCOP domain types (i.e., FA and SF).

FA

Fly Anatomy (FA)
  • High-coverage domain-centric FA annotations are available in the Domain2FA.txt file.

  • Statistics for the Domain2FA annotations are summarized in two forms: 1) SCOP hierarchy with the number of FA terms (direct and inherited), available in the Domain2FA_SCOP.obo file. 2) FA hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2FA_FA.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • FA terms which are regarded as SDFA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDFA.txt file. We strongly recommend that users use these FA terms together with their annotating domains from Domain2FA.txt. Unlike the whole FA hierarchy, these FA terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDFA corresponds to each of two SCOP domain types (i.e., FA and SF).

ZA

Zebrafish Anatomy (ZA)
  • High-coverage domain-centric ZA annotations are available in the Domain2ZA.txt file.

  • Statistics for the Domain2ZA annotations are summarized in two forms: 1) SCOP hierarchy with the number of ZA terms (direct and inherited), available in the Domain2ZA_SCOP.obo file. 2) ZA hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2ZA_ZA.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • ZA terms which are regarded as SDZA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDZA.txt file. We strongly recommend that users use these ZA terms together with their annotating domains from Domain2ZA.txt. Unlike the whole ZA hierarchy, these ZA terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDZA corresponds to each of two SCOP domain types (i.e., FA and SF).

XA

Xenopus Anatomy (XA)
  • High-coverage domain-centric XA annotations are available in the Domain2XA.txt file.

  • XA terms which are regarded as SDXA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDXA.txt file. We strongly recommend that users use these XA terms together with their annotating domains from Domain2XA.txt. Unlike the whole XA hierarchy, these XA terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDXA corresponds to each of two XA sub-ontologies (i.e., XAN and XDE) at each of two SCOP domain types (i.e., FA and SF).

AP

Arabidopsis Plant (AP)
  • High-coverage domain-centric AP annotations are available in the Domain2AP.txt file.

  • Statistics for the Domain2AP annotations are summarized in two forms: 1) SCOP hierarchy with the number of AP terms (direct and inherited; two AP sub-ontologies: AN and DE), available in the Domain2AP_SCOP.obo file. 2) AP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), available in the Domain2AP_AP.obo file. With the help of OBO-Edit, it is easy to browse these two obo format files.
  • AP terms which are regarded as SDAP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDAP.txt file. We strongly recommend that users use these AP terms together with their annotating domains from Domain2AP.txt. Unlike the whole AP hierarchy, these AP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDAP corresponds to each of two AP sub-ontologies (i.e., PAN and PDE) at each of two SCOP domain types (i.e., FA and SF).

EC

Enzyme Commission (EC)
  • High-coverage domain-centric EC annotations are available in the Domain2EC.txt file.

  • EC terms which are regarded as SDEC (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDEC.txt file. We strongly recommend that users use these EC terms together with their annotating domains from Domain2EC.txt. Unlike the whole EC hierarchy, these EC terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDEC corresponds to each of two SCOP domain types (i.e., FA and SF).

DB

DrugBank ATC_code (DB)
  • High-coverage domain-centric DB annotations are available in the Domain2DB.txt file.

  • DB terms which are regarded as SDDB (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDDB.txt file. We strongly recommend that users use these DB terms together with their annotating domains from Domain2DB.txt. Unlike the whole DB hierarchy, these DB terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDDB corresponds to each of two SCOP domain types (i.e., FA and SF).

KW

UniProtKB KeyWords (KW)
  • High-coverage domain-centric KW annotations are available in the Domain2KW.txt file.

  • KW terms which are regarded as SDKW (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDKW.txt file. We strongly recommend that users use these KW terms together with their annotating domains from Domain2KW.txt. Unlike the whole KW hierarchy, these KW terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDKW corresponds to each of two SCOP domain types (i.e., FA and SF).

UP

UniProtKB UniPathway (UP)
  • High-coverage domain-centric UP annotations are available in the Domain2UP.txt file.

  • UP terms regarded as SDUP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDUP.txt file. We highly recommend using these UP terms together with their annotating domains from Domain2UP.txt. Unlike the whole UP hierarchy, these UP terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDUP corresponds to each of the two SCOP domain types (i.e., FA and SF).

CD

CTD Diseases (CD)
  • High-coverage domain-centric CD annotations are available in the Domain2CD.txt file.

  • CD terms regarded as SDCD (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDCD.txt file. We highly recommend using these CD terms together with their annotating domains from Domain2CD.txt. Unlike the whole CD hierarchy, these CD terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDCD corresponds to each of the two SCOP domain types (i.e., FA and SF).

CC

CTD Chemicals (CC)
  • High-coverage domain-centric CC annotations are available in the Domain2CC.txt file.

  • CC terms regarded as SDCC (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SDCC.txt file. We highly recommend using these CC terms together with their annotating domains from Domain2CC.txt. Unlike the whole CC hierarchy, these CC terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SDCC corresponds to each of the two SCOP domain types (i.e., FA and SF).

MySQL tables

Four tables (available at Domain2BO.sql.gz) are used to store both the annotations and the ontology slim.

BO_info

Containing info about ontological terms
    > DESC BO_info;
    +------------+---------------------+------+-----+---------+-------+
    | Field      | Type                | Null | Key | Default | Extra |
    +------------+---------------------+------+-----+---------+-------+
    | obo        | char(2)             | NO   | PRI | NULL    |       |
    | bo         | varchar(20)         | NO   | PRI | NULL    |       |
    | namespace  | varchar(50)         | NO   |     | NULL    |       |
    | name       | varchar(255)        | NO   | MUL | NULL    |       |
    | synonym    | text                | YES  |     | NULL    |       |
    | definition | text                | YES  |     | NULL    |       |
    | distance   | tinyint(3) unsigned | NO   |     | NULL    |       |
    +------------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id. It is browsable via BO Hierarchy.
  • The namespace column is one of the sub-ontologies, or root otherwise.
  • The name column shows the full name of the BO term.
  • The synonym column holds the synonyms of the BO term.
  • The definition column holds the definition of the BO term.
  • The distance column shows the distance of the BO term to the corresponding sub-ontology.
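The columns above are enough to look up terms by name within one ontology. The following is a minimal sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL (column types are simplified), the rows are made-up toy data rather than content of Domain2BO.sql.gz, and the helper name find_terms is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BO_info (
        obo        TEXT NOT NULL,
        bo         TEXT NOT NULL,
        namespace  TEXT NOT NULL,
        name       TEXT NOT NULL,
        synonym    TEXT,
        definition TEXT,
        distance   INTEGER NOT NULL,
        PRIMARY KEY (obo, bo)
    )
""")

# Hypothetical rows for illustration only (not taken from the real dump).
conn.executemany(
    "INSERT INTO BO_info VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("DO", "TERM:1", "root", "toy disease", None, None, 0),
        ("DO", "TERM:2", "root", "toy heart disease", "toy cardiac disease", None, 1),
        ("HP", "TERM:3", "root", "toy heart phenotype", None, None, 1),
    ],
)


def find_terms(conn, obo, keyword):
    """Return (bo, name, distance) for terms of one ontology whose name contains keyword."""
    sql = (
        "SELECT bo, name, distance FROM BO_info "
        "WHERE obo = ? AND name LIKE ? ORDER BY distance, bo"
    )
    return conn.execute(sql, (obo, f"%{keyword}%")).fetchall()


hits = find_terms(conn, "DO", "heart")
print(hits)
```

Filtering on obo first matters because the same keyword can match terms in several ontologies, as the toy 'HP' row shows.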

BO_hie

Containing info about hierarchy
    > DESC BO_hie;
    +----------+---------------------+------+-----+---------+-------+
    | Field    | Type                | Null | Key | Default | Extra |
    +----------+---------------------+------+-----+---------+-------+
    | obo      | char(2)             | NO   | PRI | NULL    |       |
    | parent   | varchar(20)         | NO   | PRI | NULL    |       |
    | child    | varchar(20)         | NO   | PRI | NULL    |       |
    | distance | tinyint(3) unsigned | NO   | PRI | NULL    |       |
    +----------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The parent column is the parental BO id.
  • The child column is the child BO id.
  • The distance column shows the distance from the parental BO id to the child BO id: 1 for direct parent-child relationships, and other values indicating the existence of a path between them (reachable but indirect).
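Because the distance column separates direct from indirect relationships, one table can answer both "direct children" and "all reachable descendants" queries. The sketch below assumes a toy chain A -> B -> C with made-up ids, uses sqlite3 in place of MySQL, and the helper name children is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BO_hie (
        obo      TEXT NOT NULL,
        parent   TEXT NOT NULL,
        child    TEXT NOT NULL,
        distance INTEGER NOT NULL,
        PRIMARY KEY (obo, parent, child, distance)
    )
""")

# Toy chain A -> B -> C, stored with both direct and indirect pairs.
conn.executemany(
    "INSERT INTO BO_hie VALUES (?, ?, ?, ?)",
    [
        ("DO", "A", "B", 1),  # direct
        ("DO", "B", "C", 1),  # direct
        ("DO", "A", "C", 2),  # reachable but indirect
    ],
)


def children(conn, obo, parent, direct_only=True):
    """Return child ids of a term: direct ones only, or every reachable one."""
    sql = "SELECT DISTINCT child FROM BO_hie WHERE obo = ? AND parent = ?"
    if direct_only:
        sql += " AND distance = 1"
    sql += " ORDER BY child"
    return [row[0] for row in conn.execute(sql, (obo, parent))]


direct = children(conn, "DO", "A")
reachable = children(conn, "DO", "A", direct_only=False)
print(direct, reachable)
```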

BO_mapping

Containing info about domain-centric annotations
    > DESC BO_mapping;
    +----------------+---------------------------+------+-----+---------+-------+
    | Field          | Type                      | Null | Key | Default | Extra |
    +----------------+---------------------------+------+-----+---------+-------+
    | id             | mediumint(8) unsigned     | NO   | PRI | NULL    |       |
    | level          | enum('sf','fa')           | NO   |     | NULL    |       |
    | obo            | char(2)                   | NO   |     | NULL    |       |
    | bo             | varchar(20)               | NO   | PRI | NULL    |       |
    | fdr            | double                    | NO   |     | 1       |       |
    | inherited_from | text                      | YES  |     | NULL    |       |
    | fdr_min        | double                    | YES  |     | NULL    |       |
    | hscore_max     | double                    | YES  |     | NULL    |       |
    +----------------+---------------------------+------+-----+---------+-------+
    
  • The id column is the SCOP unique identifier (sunid). It is browsable via SCOP Hierarchy.
  • The level column is the level in the SCOP hierarchy. It can be one of 'sf' for superfamily, 'fa' for family.
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id.
  • The fdr column is the FDR supported by all longest-transcript human genes/proteins (including multidomain proteins).
  • The inherited_from column marks the status of Domain2BO predicted annotations. 1) If it is marked with 'directed' (i.e., 'fdr' < 0.001), Domain2BO is significantly supported only by all longest-transcript human genes/proteins (including multidomain proteins). 2) If it is a comma-separated list of BO ids (the column 'fdr' is not less than 0.001), Domain2BO is inherited from descendant BO terms (significantly associated) when applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the list of Domain2BO annotations can be obtained by selecting the rows whose 'inherited_from' column is not empty.
  • The fdr_min column is the minimum FDR over all descendant terms annotating that individual domain. Instead of using the 'fdr' column, the full list of annotations can also be obtained by selecting the rows with 'fdr_min' < 0.001.
  • The hscore_max column is the maximum hypergeometric score (h-score) over all descendant terms annotating that individual domain. Complementing the 'fdr_min' column, it indicates the strength of the association: the higher the score, the stronger the association. It is best used for ranking domain-based protein predictions of functions, phenotypes and diseases.
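The two selection routes described above (a non-empty 'inherited_from', or 'fdr_min' < 0.001) and the ranking by 'hscore_max' can be sketched as follows. This is an illustration under assumptions: sqlite3 stands in for MySQL, the sunids, BO ids and scores are invented, an "empty" inherited_from is treated as either NULL or an empty string, and the helper names are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BO_mapping (
        id             INTEGER NOT NULL,
        level          TEXT NOT NULL,
        obo            TEXT NOT NULL,
        bo             TEXT NOT NULL,
        fdr            REAL NOT NULL DEFAULT 1,
        inherited_from TEXT,
        fdr_min        REAL,
        hscore_max     REAL
    )
""")

# Toy rows: a direct annotation, one inherited from descendant term C,
# one non-significant row, and a direct annotation for a second domain.
conn.executemany(
    "INSERT INTO BO_mapping VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1001, "sf", "DO", "C", 1e-5, "directed", 1e-5, 2.5),
        (1001, "sf", "DO", "B", 0.2, "C", 1e-5, 2.5),
        (1001, "sf", "DO", "X", 0.5, None, 0.5, 0.1),
        (1002, "sf", "DO", "B", 1e-4, "directed", 1e-4, 1.2),
    ],
)


def domains_for_term(conn, obo, bo, cutoff=0.001):
    """Domains annotated by a term (fdr_min route), strongest association first."""
    sql = (
        "SELECT id, inherited_from, hscore_max FROM BO_mapping "
        "WHERE obo = ? AND bo = ? AND fdr_min < ? "
        "ORDER BY hscore_max DESC"
    )
    return conn.execute(sql, (obo, bo, cutoff)).fetchall()


def annotated_pairs(conn):
    """All (domain, term) annotations via the non-empty inherited_from route."""
    sql = (
        "SELECT id, bo FROM BO_mapping "
        "WHERE inherited_from IS NOT NULL AND inherited_from <> '' "
        "ORDER BY id, bo"
    )
    return conn.execute(sql).fetchall()


ranked = domains_for_term(conn, "DO", "B")
pairs = annotated_pairs(conn)
print(ranked)
print(pairs)
```

In this toy data both routes keep the same three annotations and drop the non-significant row for term X.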

BO_ic

Containing info about ontology slim
    > DESC BO_ic;
    +---------+---------------------------+------+-----+---------+-------+
    | Field   | Type                      | Null | Key | Default | Extra |
    +---------+---------------------------+------+-----+---------+-------+
    | level   | enum('sf','fa')           | NO   | PRI | NULL    |       |
    | obo     | char(2)                   | NO   |     | NULL    |       |
    | bo      | varchar(20)               | NO   | PRI | NULL    |       |
    | ic      | double                    | YES  |     | NULL    |       |
    | include | tinyint(2)                | YES  | MUL | NULL    |       |
    +---------+---------------------------+------+-----+---------+-------+
    
  • The level column is the level in the SCOP hierarchy. It can be one of 'sf' for superfamily, 'fa' for family.
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id.
  • The ic column shows the information content of the BO term.
  • The include column indicates whether or not the BO term belongs to the SDBO. A value of '0' means it is not a member of SDBO. Otherwise, it is '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, or '4' for highly informative (i.e., the most specific).
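As a rough illustration of how the include codes map onto the four slim levels, the sketch below groups slim terms by level for one ontology and one SCOP level. It assumes sqlite3 in place of MySQL and invented rows; the SLIM_LEVELS mapping and the slim_terms helper are hypothetical names. The include column name is double-quoted in the SQL as a precaution.

```python
import sqlite3

# Labels for the include codes, as listed in the column description.
SLIM_LEVELS = {
    1: "least informative",
    2: "moderately informative",
    3: "informative",
    4: "highly informative",
}

# In-memory SQLite stand-in for the MySQL table; types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BO_ic (
        level     TEXT NOT NULL,
        obo       TEXT NOT NULL,
        bo        TEXT NOT NULL,
        ic        REAL,
        "include" INTEGER
    )
""")

# Toy rows for illustration only.
conn.executemany(
    "INSERT INTO BO_ic VALUES (?, ?, ?, ?, ?)",
    [
        ("sf", "DO", "A", 0.5, 1),
        ("sf", "DO", "B", 1.8, 3),
        ("sf", "DO", "C", 3.2, 4),
        ("sf", "DO", "X", 2.0, 0),  # not a slim member
        ("fa", "DO", "B", 1.5, 2),
    ],
)


def slim_terms(conn, level, obo):
    """Group slim-member terms by their slim level label."""
    sql = """
        SELECT bo, "include" FROM BO_ic
        WHERE level = ? AND obo = ? AND "include" > 0
        ORDER BY "include", bo
    """
    grouped = {}
    for bo, code in conn.execute(sql, (level, obo)):
        grouped.setdefault(SLIM_LEVELS[code], []).append(bo)
    return grouped


sf_slim = slim_terms(conn, "sf", "DO")
print(sf_slim)
```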

Supra-domain2BO

BO annotations for supra-domains ('SP')

For each of the Biomedical Ontologies (BO), both the associations and a slim version of the ontology are provided. They are available in two formats (i.e., Plain files and MySQL tables).

Plain files

DO

Disease Ontology (DO)
  • Full supra-domain (including individual superfamilies) DO annotations are available in the SP2DO.txt file.

  • DO terms regarded as SPDO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPDO.txt file. Unlike the whole DO hierarchy, these DO terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPDO corresponds only to the SCOP superfamily level.

  • We highly recommend using the DO terms in SPDO.txt together with their annotating supra-domains extracted from SP2DO.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

HP

Human Phenotype (HP)
  • Full supra-domain (including individual superfamilies) HP annotations are available in the SP2HP.txt file.

  • HP terms regarded as SPHO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPHO.txt file. Unlike the whole HP hierarchy, these HP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPHO corresponds only to the SCOP superfamily level.

  • We highly recommend using the HP terms in SPHO.txt together with their annotating supra-domains extracted from SP2HP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

MP

Mouse Phenotype (MP)
  • Full supra-domain (including individual superfamilies) MP annotations are available in the SP2MP.txt file.

  • MP terms regarded as SPMP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPMP.txt file. Unlike the whole MP hierarchy, these MP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPMP corresponds only to the SCOP superfamily level.

  • We highly recommend using the MP terms in SPMP.txt together with their annotating supra-domains extracted from SP2MP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

WP

Worm Phenotype (WP)
  • Full supra-domain (including individual superfamilies) WP annotations are available in the SP2WP.txt file.

  • WP terms regarded as SPWP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPWP.txt file. Unlike the whole WP hierarchy, these WP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPWP corresponds only to the SCOP superfamily level.

  • We highly recommend using the WP terms in SPWP.txt together with their annotating supra-domains extracted from SP2WP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

YP

Yeast Phenotype (YP)
  • Full supra-domain (including individual superfamilies) YP annotations are available in the SP2YP.txt file.

  • YP terms regarded as SPYP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPYP.txt file. Unlike the whole YP hierarchy, these YP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPYP corresponds only to the SCOP superfamily level.

  • We highly recommend using the YP terms in SPYP.txt together with their annotating supra-domains extracted from SP2YP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

FP

Fly Phenotype (FP)
  • Full supra-domain (including individual superfamilies) FP annotations are available in the SP2FP.txt file.

  • FP terms regarded as SPFP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPFP.txt file. Unlike the whole FP hierarchy, these FP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPFP corresponds only to the SCOP superfamily level.

  • We highly recommend using the FP terms in SPFP.txt together with their annotating supra-domains extracted from SP2FP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

FA

Fly Anatomy (FA)
  • Full supra-domain (including individual superfamilies) FA annotations are available in the SP2FA.txt file.

  • FA terms regarded as SPFA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPFA.txt file. Unlike the whole FA hierarchy, these FA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPFA corresponds only to the SCOP superfamily level.

  • We highly recommend using the FA terms in SPFA.txt together with their annotating supra-domains extracted from SP2FA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

ZA

Zebrafish Anatomy (ZA)
  • Full supra-domain (including individual superfamilies) ZA annotations are available in the SP2ZA.txt file.

  • ZA terms regarded as SPZA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPZA.txt file. Unlike the whole ZA hierarchy, these ZA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPZA corresponds only to the SCOP superfamily level.

  • We highly recommend using the ZA terms in SPZA.txt together with their annotating supra-domains extracted from SP2ZA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

XA

Xenopus Anatomy (XA)
  • Full supra-domain (including individual superfamilies) XA annotations are available in the SP2XA.txt file.

  • XA terms regarded as SPXA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPXA.txt file. Unlike the whole XA hierarchy, these XA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPXA corresponds only to the SCOP superfamily level.

  • We highly recommend using the XA terms in SPXA.txt together with their annotating supra-domains extracted from SP2XA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

AP

Arabidopsis Plant (AP)
  • Full supra-domain (including individual superfamilies) AP annotations are available in the SP2AP.txt file.

  • AP terms regarded as SPAP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPAP.txt file. Unlike the whole AP hierarchy, these AP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPAP corresponds only to the SCOP superfamily level.

  • We highly recommend using the AP terms in SPAP.txt together with their annotating supra-domains extracted from SP2AP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

EC

Enzyme Commission (EC)
  • Full supra-domain (including individual superfamilies) EC annotations are available in the SP2EC.txt file.

  • EC terms regarded as SPEC (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPEC.txt file. Unlike the whole EC hierarchy, these EC terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPEC corresponds only to the SCOP superfamily level.

  • We highly recommend using the EC terms in SPEC.txt together with their annotating supra-domains extracted from SP2EC.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

DB

DrugBank ATC_code (DB)
  • Full supra-domain (including individual superfamilies) DB annotations are available in the SP2DB.txt file.

  • DB terms regarded as SPDB (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPDB.txt file. Unlike the whole DB hierarchy, these DB terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPDB corresponds only to the SCOP superfamily level.

  • We highly recommend using the DB terms in SPDB.txt together with their annotating supra-domains extracted from SP2DB.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

KW

UniProtKB KeyWords (KW)
  • Full supra-domain (including individual superfamilies) KW annotations are available in the SP2KW.txt file.

  • KW terms regarded as SPKW (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPKW.txt file. Unlike the whole KW hierarchy, these KW terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPKW corresponds only to the SCOP superfamily level.

  • We highly recommend using the KW terms in SPKW.txt together with their annotating supra-domains extracted from SP2KW.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

UP

UniProtKB UniPathway (UP)
  • Full supra-domain (including individual superfamilies) UP annotations are available in the SP2UP.txt file.

  • UP terms regarded as SPUP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPUP.txt file. Unlike the whole UP hierarchy, these UP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPUP corresponds only to the SCOP superfamily level.

  • We highly recommend using the UP terms in SPUP.txt together with their annotating supra-domains extracted from SP2UP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

CD

CTD Diseases (CD)
  • High-coverage domain-centric CD annotations are available in the SP2CD.txt file.

  • CD terms regarded as SPCD (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPCD.txt file. We highly recommend using these CD terms together with their annotating domains from SP2CD.txt. Unlike the whole CD hierarchy, these CD terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SPCD corresponds to each of the two SCOP domain types (i.e., FA and SF).

CC

CTD Chemicals (CC)
  • High-coverage domain-centric CC annotations are available in the SP2CC.txt file.

  • CC terms regarded as SPCC (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPCC.txt file. We highly recommend using these CC terms together with their annotating domains from SP2CC.txt. Unlike the whole CC hierarchy, these CC terms at different granularities are representative and comprehensive in terms of their relevance to domains (not proteins). Keep in mind that SPCC corresponds to each of the two SCOP domain types (i.e., FA and SF).

MySQL tables

Four tables (available at SP2BO.sql.gz) are used to store both the annotations and the ontology slim.

BO_info

Containing info about ontological terms
    > DESC BO_info;
    +------------+---------------------+------+-----+---------+-------+
    | Field      | Type                | Null | Key | Default | Extra |
    +------------+---------------------+------+-----+---------+-------+
    | obo        | char(2)             | NO   | PRI | NULL    |       |
    | bo         | varchar(20)         | NO   | PRI | NULL    |       |
    | namespace  | varchar(50)         | NO   |     | NULL    |       |
    | name       | varchar(255)        | NO   | MUL | NULL    |       |
    | synonym    | text                | YES  |     | NULL    |       |
    | definition | text                | YES  |     | NULL    |       |
    | distance   | tinyint(3) unsigned | NO   |     | NULL    |       |
    +------------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id. It is browsable via BO Hierarchy.
  • The namespace column is one of the sub-ontologies, or root otherwise.
  • The name column shows the full name of the BO term.
  • The synonym column holds the synonyms of the BO term.
  • The definition column holds the definition of the BO term.
  • The distance column shows the distance of the BO term to the corresponding sub-ontology.

BO_hie

Containing info about hierarchy
    > DESC BO_hie;
    +----------+---------------------+------+-----+---------+-------+
    | Field    | Type                | Null | Key | Default | Extra |
    +----------+---------------------+------+-----+---------+-------+
    | obo      | char(2)             | NO   | PRI | NULL    |       |
    | parent   | varchar(20)         | NO   | PRI | NULL    |       |
    | child    | varchar(20)         | NO   | PRI | NULL    |       |
    | distance | tinyint(3) unsigned | NO   | PRI | NULL    |       |
    +----------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The parent column is the parental BO id.
  • The child column is the child BO id.
  • The distance column shows the distance from the parental BO id to the child BO id: 1 for direct parent-child relationships, and other values indicating the existence of a path between them (reachable but indirect).

BO_mapping_supradomain

Containing info about domain-centric annotations
    > DESC BO_mapping_supradomain;
    +----------------+---------------------------+------+-----+---------+-------+
    | Field          | Type                      | Null | Key | Default | Extra |
    +----------------+---------------------------+------+-----+---------+-------+
    | supradomain    | text                      | NO   | MUL | NULL    |       |
    | level          | enum('sf')                | NO   |     | NULL    |       |
    | obo            | char(2)                   | NO   | MUL | NULL    |       |
    | bo             | varchar(20)               | NO   |     | NULL    |       |
    | fdr            | double                    | NO   |     | 1       |       |
    | inherited_from | text                      | YES  |     | NULL    |       |
    | fdr_min        | double                    | YES  |     | NULL    |       |
    | hscore_max     | double                    | YES  |     | NULL    |       |
    +----------------+---------------------------+------+-----+---------+-------+
    
  • The supradomain column is a comma-separated list of SCOP unique identifiers (sunids). It is browsable via SCOP Hierarchy.
  • The level column is the level in the SCOP hierarchy. It can only be 'sf' for superfamily.
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id.
  • The fdr column is the FDR supported by all longest-transcript worm genes/proteins (including multidomain proteins).
  • The inherited_from column marks the status of SP2BO predicted annotations. 1) If it is marked with 'directed' (i.e., 'fdr' < 0.001), SP2BO is significantly supported by all longest-transcript worm genes/proteins (including multidomain proteins). 2) If it is a comma-separated list of BO ids (numeric part; the column 'fdr' is not less than 0.001), SP2BO is inherited from descendant BO terms (significantly associated) when applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the list of SP2BO annotations supported only by all can be obtained by selecting the rows whose 'inherited_from' column is not empty.
  • The fdr_min column is the minimum FDR over all descendant terms annotating that supra-domain. Instead of using the 'fdr' column, the full list of annotations can also be obtained by selecting the rows with 'fdr_min' < 0.001.
  • The hscore_max column is the maximum hypergeometric score (h-score) over all descendant terms annotating that supra-domain. Complementing the 'fdr_min' column, it indicates the strength of the association: the higher the score, the stronger the association. It is best used for ranking domain-based protein predictions of functions, phenotypes and diseases.
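Since the supradomain column packs several sunids into one text value, finding the supra-domains that contain a given superfamily is safer after splitting the list than with a substring match. The sketch below shows one way to do this; sqlite3 stands in for MySQL, the sunids and scores are made up, and parse_supradomain and supradomains_with are hypothetical helper names.

```python
import sqlite3


def parse_supradomain(text):
    """Split the comma-separated supradomain value into a tuple of sunids."""
    return tuple(int(part) for part in text.split(","))


# In-memory SQLite stand-in for the MySQL table; types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE BO_mapping_supradomain (
        supradomain    TEXT NOT NULL,
        level          TEXT NOT NULL,
        obo            TEXT NOT NULL,
        bo             TEXT NOT NULL,
        fdr            REAL NOT NULL DEFAULT 1,
        inherited_from TEXT,
        fdr_min        REAL,
        hscore_max     REAL
    )
""")

# Toy rows; "11001,1004" contains the substring "1001" but not the sunid 1001.
conn.executemany(
    "INSERT INTO BO_mapping_supradomain VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("1001", "sf", "DO", "B", 1e-4, "directed", 1e-4, 1.0),
        ("1001,1002", "sf", "DO", "B", 1e-6, "directed", 1e-6, 3.0),
        ("1002,1003", "sf", "DO", "B", 0.3, None, 0.3, 0.2),
        ("11001,1004", "sf", "DO", "B", 1e-5, "directed", 1e-5, 2.0),
    ],
)


def supradomains_with(conn, obo, bo, sunid, cutoff=0.001):
    """Significant supra-domains for a term that contain the given sunid."""
    sql = (
        "SELECT supradomain, hscore_max FROM BO_mapping_supradomain "
        "WHERE obo = ? AND bo = ? AND fdr_min < ? "
        "ORDER BY hscore_max DESC"
    )
    hits = []
    for text, hscore in conn.execute(sql, (obo, bo, cutoff)):
        members = parse_supradomain(text)
        if sunid in members:  # exact sunid match, not a substring match
            hits.append((members, hscore))
    return hits


hits = supradomains_with(conn, "DO", "B", 1001)
print(hits)
```

A single-element tuple corresponds to an individual superfamily, which the supra-domain annotations also include.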

BO_ic_supra

Containing info about ontology slim
    > DESC BO_ic_supra;
    +---------+---------------------------+------+-----+---------+-------+
    | Field   | Type                      | Null | Key | Default | Extra |
    +---------+---------------------------+------+-----+---------+-------+
    | level   | enum('sf')                | NO   | PRI | NULL    |       |
    | obo     | char(2)                   | NO   | PRI | NULL    |       |
    | bo      | varchar(20)               | NO   | PRI | NULL    |       |
    | ic      | double                    | YES  |     | NULL    |       |
    | include | tinyint(2)                | YES  | MUL | NULL    |       |
    +---------+---------------------------+------+-----+---------+-------+
    
  • The level column is the level in the SCOP hierarchy. It can only be 'sf' for superfamily.
  • The obo column indicates the type of BO. It can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
  • The bo column is the corresponding BO id.
  • The ic column shows the information content of the BO term.
  • The include column indicates whether or not the BO term belongs to the SDBO. A value of '0' means it is not a member of SDBO. Otherwise, it is '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, or '4' for highly informative (i.e., the most specific).
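The recommendation to use slim terms together with their annotating supra-domains amounts to joining the mapping table with the slim table. The following sketch shows such a join under assumptions: sqlite3 replaces MySQL, all rows are invented, and the slim_annotations helper and its min_level parameter are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-ins for the two MySQL tables; types are simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BO_mapping_supradomain (
        supradomain    TEXT NOT NULL,
        level          TEXT NOT NULL,
        obo            TEXT NOT NULL,
        bo             TEXT NOT NULL,
        fdr            REAL NOT NULL DEFAULT 1,
        inherited_from TEXT,
        fdr_min        REAL,
        hscore_max     REAL
    );
    CREATE TABLE BO_ic_supra (
        level     TEXT NOT NULL,
        obo       TEXT NOT NULL,
        bo        TEXT NOT NULL,
        ic        REAL,
        "include" INTEGER,
        PRIMARY KEY (level, obo, bo)
    );
""")

# Toy rows for illustration only.
conn.executemany(
    "INSERT INTO BO_mapping_supradomain VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("1001,1002", "sf", "DO", "B", 1e-6, "directed", 1e-6, 3.0),
        ("1001,1002", "sf", "DO", "A", 0.4, "B", 1e-6, 3.0),
        ("1003", "sf", "DO", "C", 1e-4, "directed", 1e-4, 1.5),
        ("1003", "sf", "DO", "X", 1e-4, "directed", 1e-4, 1.1),
    ],
)
conn.executemany(
    "INSERT INTO BO_ic_supra VALUES (?, ?, ?, ?, ?)",
    [
        ("sf", "DO", "A", 0.5, 1),
        ("sf", "DO", "B", 1.8, 3),
        ("sf", "DO", "C", 3.2, 4),
        ("sf", "DO", "X", 2.0, 0),  # not a slim member
    ],
)


def slim_annotations(conn, obo, min_level=1, cutoff=0.001):
    """Significant annotations restricted to slim terms at or above min_level."""
    sql = """
        SELECT m.supradomain, m.bo, s."include", m.hscore_max
        FROM BO_mapping_supradomain AS m
        JOIN BO_ic_supra AS s
          ON s.level = m.level AND s.obo = m.obo AND s.bo = m.bo
        WHERE m.obo = ? AND m.fdr_min < ? AND s."include" >= ?
        ORDER BY s."include" DESC, m.hscore_max DESC
    """
    return conn.execute(sql, (obo, cutoff, min_level)).fetchall()


specific = slim_annotations(conn, "DO", min_level=3)
print(specific)
```

Raising min_level keeps only the more specific slim terms; with the default of 1, every slim member is kept and only non-members (include = 0) are dropped.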

Annotations for PFAM domains and supra-domains

GO annotations for PFAM individual domains and supra-domains

The associations/mappings between ontological terms (i.e., those from GO) and PFAM individual domains and supra-domains (i.e., combinations of two or more successive domains) are available in two parsable formats (i.e., plain files and MySQL tables).

Plain files

GO

Gene Ontology (GO)

HP

Human Phenotype (HP)
  • Full annotations between PFAM domains (including individual domains and supra-domains) and HP terms (including three sub-ontologies) are available in the PFAM2HP.txt file.

  • The slim version of HP has four levels of terms with increasing granularity: highly general (least informative), general (moderately informative), specific (informative), and highly specific (highly informative), which can be found in the PFAMHP.txt file. Unlike the whole HP hierarchy, these HP terms at different granularities are representative and comprehensive in terms of their relevance to PFAM-based annotations.

  • In addition to the full annotation results, more specific annotations are also provided below for each object of PFAM and for each subontology of HP:

    Mappings from individual domains to three sub-ontologies of HP

    Mappings from supra-domains to three sub-ontologies of HP


MySQL tables

Six tables (available at PFAM2OBO.sql.gz) are used to store relevant information. They include two tables for PFAM info (i.e., PFAM_info and PFAM_hie), two tables for ontology info (i.e., OBO_info and OBO_hie), one table for the mappings between them (i.e., OBO_mapping), and one table for the ontology slim version (i.e., OBO_ic).

Notably, OBO stands for Open Biological Ontologies, which here include GO and HP. The generic name leaves room for future extension, should users request other ontologies for annotating PFAM domains.

PFAM_info

Containing info about PFAM domains at the pfam family and clan levels
    > DESC PFAM_info;
    +-------------+---------------------+------+-----+---------+-------+
    | Field       | Type                | Null | Key | Default | Extra |
    +-------------+---------------------+------+-----+---------+-------+
    | level       | enum('pfam','clan') | NO   | MUL | pfam    |       |
    | acc         | varchar(7)          | NO   | PRI | NULL    |       |
    | id          | varchar(40)         | NO   |     | NULL    |       |
    | description | varchar(100)        | NO   | MUL | NULL    |       |
    +-------------+---------------------+------+-----+---------+-------+
    
  • The level column indicates the level in the PFAM hierarchy. It can be 'pfam' for pfam family or 'clan' for clan level.
  • The acc column is the accession number of PFAM domains.
  • The id column is the corresponding PFAM id.
  • The description column shows the full description of PFAM domains.

PFAM_hie

Containing info about PFAM hierarchy
    > DESC PFAM_hie;
    +----------+---------------------+------+-----+---------+-------+
    | Field    | Type                | Null | Key | Default | Extra |
    +----------+---------------------+------+-----+---------+-------+
    | parent   | varchar(20)         | NO   | PRI | NULL    |       |
    | child    | varchar(20)         | NO   | PRI | NULL    |       |
    | distance | tinyint(3) unsigned | NO   |     | NULL    |       |
    +----------+---------------------+------+-----+---------+-------+
    
  • The parent column is the accession number of pfam family.
  • The child column is the accession number of clan.
  • The distance column is always 1.
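
As an illustration of how PFAM_info and PFAM_hie fit together, the sketch below lists the families linked to a clan. It is a minimal example under stated assumptions: an in-memory SQLite database stands in for the MySQL tables, the accessions and descriptions are invented, and the helper name families_in_clan is hypothetical. The parent/child orientation follows the column descriptions above.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (assumption: SQLite has no
# ENUM type, so 'level' is stored as plain text here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PFAM_info (level TEXT, acc TEXT PRIMARY KEY, id TEXT, description TEXT);
CREATE TABLE PFAM_hie (parent TEXT, child TEXT, distance INTEGER,
                       PRIMARY KEY (parent, child));
""")

# Hypothetical rows, invented purely for illustration.
conn.executemany("INSERT INTO PFAM_info VALUES (?, ?, ?, ?)", [
    ("pfam", "PF99991", "Example_fam_A", "Hypothetical family A"),
    ("pfam", "PF99992", "Example_fam_B", "Hypothetical family B"),
    ("clan", "CL9999", "Example_clan", "Hypothetical clan"),
])
# As described above: parent = family accession, child = clan accession.
conn.executemany("INSERT INTO PFAM_hie VALUES (?, ?, ?)", [
    ("PF99991", "CL9999", 1),
    ("PF99992", "CL9999", 1),
])

def families_in_clan(conn, clan_acc):
    """Return (acc, id) for every pfam family linked to the given clan."""
    sql = """
        SELECT f.acc, f.id
        FROM PFAM_hie AS h
        JOIN PFAM_info AS f ON f.acc = h.parent AND f.level = 'pfam'
        WHERE h.child = ?
        ORDER BY f.acc
    """
    return conn.execute(sql, (clan_acc,)).fetchall()

clan_families = families_in_clan(conn, "CL9999")
print(clan_families)
```

The SELECT statement itself should carry over to MySQL with the placeholder style changed to whatever the chosen client library expects.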

OBO_info

Containing info about ontological terms
    > DESC OBO_info;
    +------------+---------------------+------+-----+---------+-------+
    | Field      | Type                | Null | Key | Default | Extra |
    +------------+---------------------+------+-----+---------+-------+
    | obo        | char(2)             | NO   | PRI | NULL    |       |
    | id         | varchar(20)         | NO   | PRI | NULL    |       |
    | namespace  | varchar(50)         | NO   |     | NULL    |       |
    | name       | varchar(255)        | NO   | MUL | NULL    |       |
    | synonym    | text                | YES  |     | NULL    |       |
    | definition | text                | YES  |     | NULL    |       |
    | distance   | tinyint(3) unsigned | NO   |     | NULL    |       |
    +------------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of OBO. Currently it only includes 'GO' for 'Gene Ontology' and 'HP' for 'Human Phenotype'.
  • The id column is the corresponding OBO id.
  • The namespace column can be one of three GO sub-ontologies, otherwise root.
  • The name column shows the full name of OBO terms.
  • The synonym column is the synonym of OBO terms.
  • The definition column is the definition of OBO terms.
  • The distance column shows the distance of OBO terms to the root of the corresponding sub-ontology.

OBO_hie

Containing info about the ontology hierarchy
    > DESC OBO_hie;
    +----------+---------------------+------+-----+---------+-------+
    | Field    | Type                | Null | Key | Default | Extra |
    +----------+---------------------+------+-----+---------+-------+
    | obo      | char(2)             | NO   | PRI | NULL    |       |
    | parent   | varchar(20)         | NO   | PRI | NULL    |       |
    | child    | varchar(20)         | NO   | PRI | NULL    |       |
    | distance | tinyint(3) unsigned | NO   | PRI | NULL    |       |
    +----------+---------------------+------+-----+---------+-------+
    
  • The obo column indicates the type of OBO. Currently it only includes 'GO' for 'Gene Ontology' and 'HP' for 'Human Phenotype'.
  • The parent column is the parental OBO id.
  • The child column is the child OBO id.
  • The distance column shows the shortest distance between the parental OBO id and the child OBO id: 1 for direct parent-child relationships, other values indicating the existence of a path between them (reachable but indirect).
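
Because OBO_hie stores indirect (reachable) pairs as well as direct ones, all descendants of a term can be fetched with a single non-recursive query. The sketch below is a minimal example under stated assumptions: SQLite stands in for MySQL, the term ids are invented, and the helper name descendants is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OBO_hie (
        obo TEXT, parent TEXT, child TEXT, distance INTEGER,
        PRIMARY KEY (obo, parent, child, distance)
    )
""")

# Hypothetical hierarchy: 9000001 -> 9000002 -> 9000003, and 9000001 -> 9000004.
# The indirect pair (9000001, 9000003) is stored with distance 2.
conn.executemany("INSERT INTO OBO_hie VALUES (?, ?, ?, ?)", [
    ("GO", "GO:9000001", "GO:9000002", 1),
    ("GO", "GO:9000002", "GO:9000003", 1),
    ("GO", "GO:9000001", "GO:9000003", 2),
    ("GO", "GO:9000001", "GO:9000004", 1),
])

def descendants(conn, obo, term, direct_only=False):
    """Return (child, distance) pairs below 'term', nearest first."""
    sql = "SELECT child, distance FROM OBO_hie WHERE obo = ? AND parent = ?"
    if direct_only:
        # distance = 1 marks a direct parent-child relationship
        sql += " AND distance = 1"
    sql += " ORDER BY distance, child"
    return conn.execute(sql, (obo, term)).fetchall()

all_below = descendants(conn, "GO", "GO:9000001")
direct_children = descendants(conn, "GO", "GO:9000001", direct_only=True)
print(all_below)
print(direct_children)
```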

OBO_mapping

Containing info about mappings from PFAM to ontological terms
    > DESC OBO_mapping;
    +----------------+-------------+------+-----+---------+-------+
    | Field          | Type        | Null | Key | Default | Extra |
    +----------------+-------------+------+-----+---------+-------+
    | supradomain    | text        | NO   | MUL | NULL    |       |
    | obo            | char(2)     | NO   | MUL | NULL    |       |
    | id             | varchar(20) | NO   |     | NULL    |       |
    | fdr            | double      | NO   |     | 1       |       |
    | inherited_from | text        | YES  |     | NULL    |       |
    | fdr_min        | double      | YES  |     | NULL    |       |
    | hscore_max     | double      | YES  |     | NULL    |       |
    +----------------+-------------+------+-----+---------+-------+
    
  • The supradomain column is a comma-separated list of PFAM accession numbers (individual domains are included as single-item lists).
  • The obo column indicates the type of OBO. Currently it only includes 'GO' for 'Gene Ontology' and 'HP' for 'Human Phenotype'.
  • The id column is the corresponding OBO id.
  • The fdr column is the false discovery rate (FDR), indicating significance of association.
  • The inherited_from column marks the status of predicted annotations. 1) If it is marked with 'directed', the annotation is significantly (i.e., 'fdr'<0.001) supported by the protein-level annotations. 2) If it is a comma-separated list of OBO ids (the column 'fdr' is not less than 0.001), the annotation is inherited from descendant OBO terms (significantly associated) when applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the full list of annotations can be obtained by selecting the rows in which the column 'inherited_from' is not empty.
  • The fdr_min column is the minimum FDR over all descendant terms annotating that individual domain/supra-domain. Instead of using 'fdr' column, the full lists of annotations can also be obtained by selecting the column 'fdr_min' < 0.001.
  • The hscore_max column is the maximum hypergeometric score (h-score) over all descendant terms annotating that individual domain/supra-domain. Complementing the column 'fdr_min', this column indicates the strength of the association (the higher, the stronger). It is preferably used for ranking in domain-based protein function prediction.
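
The two ways of retrieving the full list of annotations described above (non-empty 'inherited_from', or 'fdr_min' < 0.001) can be sketched as follows, with results ranked by 'hscore_max'. This is a minimal example under stated assumptions: SQLite stands in for MySQL, all rows are invented, the helper names are hypothetical, and "empty" is treated as either NULL or an empty string since the text does not say which is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OBO_mapping (
        supradomain TEXT NOT NULL, obo TEXT NOT NULL, id TEXT NOT NULL,
        fdr REAL NOT NULL DEFAULT 1, inherited_from TEXT,
        fdr_min REAL, hscore_max REAL
    )
""")

# Hypothetical rows, invented purely for illustration.
conn.executemany("INSERT INTO OBO_mapping VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("PF99991", "GO", "GO:9000003", 0.0001, "directed", 0.0001, 5.2),
    ("PF99991", "GO", "GO:9000002", 0.2, "GO:9000003", 0.0001, 5.2),
    ("PF99991", "GO", "GO:9000004", 0.5, None, 0.5, 0.3),
    ("PF99991", "GO", "GO:9000005", 0.7, "", 0.7, 0.1),
    ("PF99991,PF99992", "GO", "GO:9000003", 0.0005, "directed", 0.0005, 7.1),
])

def full_annotations(conn, supradomain, obo):
    """Direct plus inherited annotations, strongest association first."""
    sql = """
        SELECT id, inherited_from, hscore_max
        FROM OBO_mapping
        WHERE supradomain = ? AND obo = ?
          AND inherited_from IS NOT NULL AND inherited_from <> ''
        ORDER BY hscore_max DESC, id
    """
    return conn.execute(sql, (supradomain, obo)).fetchall()

def annotated_ids_by_fdr_min(conn, supradomain, obo, cutoff=0.001):
    """Alternative selection using the fdr_min column."""
    sql = """
        SELECT id FROM OBO_mapping
        WHERE supradomain = ? AND obo = ? AND fdr_min < ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (supradomain, obo, cutoff))]

annotations = full_annotations(conn, "PF99991", "GO")
ids_by_fdr_min = annotated_ids_by_fdr_min(conn, "PF99991", "GO")
print(annotations)
print(ids_by_fdr_min)
```

On this toy data both selections return the same set of terms, which is the equivalence the column descriptions suggest. Note that the supradomain value is matched as a whole string, so 'PF99991' does not match the supra-domain 'PF99991,PF99992'.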

OBO_ic

Containing info about ontology slim
    > DESC OBO_ic;
    +---------+-------------+------+-----+---------+-------+
    | Field   | Type        | Null | Key | Default | Extra |
    +---------+-------------+------+-----+---------+-------+
    | obo     | char(2)     | NO   | PRI | NULL    |       |
    | id      | varchar(20) | NO   | PRI | NULL    |       |
    | ic      | double      | YES  |     | NULL    |       |
    | include | tinyint(2)  | YES  | MUL | NULL    |       |
    +---------+-------------+------+-----+---------+-------+
    
  • The obo column indicates the type of OBO. Currently it only includes 'GO' for 'Gene Ontology' and 'HP' for 'Human Phenotype'.
  • The id column is the corresponding OBO id.
  • The ic column shows the information content of the OBO term.
  • The include column indicates whether or not the OBO term belongs to the slim version. If the column is set to '0' then it is not a member of slim version. Otherwise, '1' for least informative (i.e., highly general), '2' for moderately informative (general), '3' for informative (specific), '4' for highly informative (i.e., highly specific).
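
Selecting slim terms from OBO_ic then reduces to a filter on the include column. The sketch below is a minimal example under stated assumptions: SQLite stands in for MySQL, the rows are invented, and the helper name slim_terms is hypothetical. The column name is double-quoted as a precaution in SQLite; in MySQL, backticks would play the same role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OBO_ic (
        obo TEXT, id TEXT, ic REAL, "include" INTEGER,
        PRIMARY KEY (obo, id)
    )
""")

# Hypothetical rows, invented purely for illustration.
conn.executemany("INSERT INTO OBO_ic VALUES (?, ?, ?, ?)", [
    ("GO", "GO:9000001", 0.5, 1),   # highly general
    ("GO", "GO:9000002", 1.2, 2),   # general
    ("GO", "GO:9000003", 2.4, 4),   # highly specific
    ("GO", "GO:9000004", 1.9, 0),   # not part of the slim version
    ("HP", "HP:9000001", 0.8, 1),
])

def slim_terms(conn, obo, level=None):
    """Return (id, include) for slim terms, optionally for one level (1-4)."""
    sql = 'SELECT id, "include" FROM OBO_ic WHERE obo = ?'
    params = [obo]
    if level is None:
        sql += ' AND "include" > 0'      # any slim level
    else:
        sql += ' AND "include" = ?'      # one specific level
        params.append(level)
    sql += ' ORDER BY "include", id'
    return conn.execute(sql, params).fetchall()

go_slim = slim_terms(conn, "GO")
go_highly_specific = slim_terms(conn, "GO", level=4)
print(go_slim)
print(go_highly_specific)
```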