Electronic supplementary information (ESI) website

A disease-drug-phenotype matrix inferred by walking on a functional domain network

Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK

Protein domains are classified as units of structure, evolution and function, and thus form the molecular backbone of the biosphere. Although functional networks at the protein level have been reported to be of value in predicting diseases (phenotypes or drugs), they have not previously been applied at sub-protein resolution (the protein domain in this case). We herein introduce a domain network with a functional perspective. This network, which we henceforth call “dcGOnet”, has nodes consisting of protein domains (at the superfamily/evolutionary level) and edges weighted by semantic similarity according to domain-centric Gene Ontology (dcGO) annotations. By globally exploring this network via a random walk, we demonstrate its predictive value on disease, drug, or phenotype-related ontologies. On cross-validation recovering ontology labels for domains, we achieve an overall area under the ROC curve of 89.0% for drugs, 87.3% for diseases, 87.6% for human phenotypes and 88.2% for mouse phenotypes. We show that performance using global information from this network is significantly better than using local information, and illustrate that this improvement is not sensitive to network size or the choice of algorithm parameters, and holds across different ontologies. Based on dcGOnet and its global properties, we further develop an approach to build a disease-drug-phenotype matrix. The predicted interconnections are statistically supported using a novel randomization procedure, and are also empirically supported by inspection for biological relevance. Most of the high-ranking predictions recover connections that are well known, but others uncover connections that have only suggestive or obscure support in the literature; we show that these are missed by simpler methods, in particular for drug-disease connections.
The value of this work is threefold: we describe a general methodology and make the software available, we provide the functional domain network itself, and the ranked drug-disease-phenotype matrix provides rich targets for investigation.

For details, please refer to Fang H, Gough J. Molecular BioSystems 2013, 9(7):1686-96. DOI: 10.1039/c3mb25495j (PDF)

dcGOnet: domain-centric Gene Ontology-derived network

    dcGOnet is a weighted, undirected graph containing 289,528 edges among 1,075 nodes.
    Save the cys file and open dcGOnet with Cytoscape.
    dcGOnet itself is provided for download in two formats.

RWR: Random Walk with Restart using an in-house Perl module

    This script (here) runs the RWR algorithm (using threads) on a given input network (SIF format) and outputs an affinity matrix. The pre-computed affinity matrix stores the affinity scores between every pair of nodes in the network and, as demonstrated here, can be reused for multiple purposes.

    See the Readme for usage instructions.
    Basic usage: "perl rwr_thread.pl -e your_input_network_in_SIF_format". Protocols on how to use this Perl module for other applications are still in preparation.
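
    The idea behind the affinity matrix can be illustrated with a minimal Python sketch. This is not the code of rwr_thread.pl: the function names (parse_sif, rwr, affinity_matrix) are hypothetical, the restart probability of 0.5 is an arbitrary illustrative value, the sketch assumes that the middle column of each SIF line holds a numeric edge weight, and it runs single-threaded.

```python
from collections import defaultdict


def parse_sif(lines):
    """Read 'nodeA weight nodeB' lines into a symmetric adjacency dict.

    Assumption: the middle SIF column is a positive numeric edge weight.
    """
    adj = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        a, w, b = parts[0], float(parts[1]), parts[2]
        if w <= 0:
            continue  # ignore non-positive weights
        adj[a][b] = w
        adj[b][a] = w  # undirected graph: store both directions
    return adj


def rwr(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until convergence."""
    nodes = sorted(adj)
    # Total edge weight at each node, used to normalise outgoing steps.
    strength = {u: sum(adj[u].values()) for u in nodes}
    p = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(max_iter):
        # The walker jumps back to the seed with probability `restart`.
        nxt = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            if p[u] == 0.0:
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def affinity_matrix(adj, restart=0.5):
    """Affinity of every node to every seed: one RWR run per seed node."""
    return {seed: rwr(adj, seed, restart) for seed in sorted(adj)}


if __name__ == "__main__":
    toy = parse_sif(["A 1.0 B", "B 1.0 C"])
    for seed, row in affinity_matrix(toy).items():
        print(seed, {node: round(score, 4) for node, score in row.items()})
```

    Because each seed is processed independently, the per-seed runs are a natural unit to distribute across threads, and the resulting matrix can be stored once and reused.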

Connectivity Matrix

    Disease-phenotype-drug matrix

    Connectivity matrix/map among diseases (Disease Ontology, DO), drugs (DrugBank ATC code, DB) and phenotypes (Human Phenotype, HP; Mouse Phenotype, MP).

    The color bar represents the connectivity score (higher is stronger); "+" marks connections that are statistically significant (FDR<0.05).
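
    As an illustration of how an FDR<0.05 cutoff can be applied to a set of p-values, the sketch below uses the Benjamini-Hochberg adjustment. This page does not state which FDR procedure was used, so the choice of Benjamini-Hochberg, the function name and the example p-values are all assumptions made for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four connections (not data from the study).
    example = [0.01, 0.04, 0.03, 0.5]
    fdr = benjamini_hochberg(example)
    significant = [q < 0.05 for q in fdr]
    print(fdr, significant)
```

    In this hypothetical example, only connections whose adjusted value falls below 0.05 would receive a "+" mark.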


    Disease-drug connectivity matrix


    Disease-phenotype connectivity matrix


    Disease-phenotype connectivity matrix


    Drug-phenotype connectivity matrix


    Drug-phenotype connectivity matrix


    Cross-species phenotype connectivity matrix


    Full matrix involving 4 ontologies.

    Pink dots mark statistically significant connections (FDR<0.05).

    Matrix involving 12 ontologies in dcGO

      Pink dots mark statistically significant connections (FDR<0.05).

      • DO: Disease Ontology;
      • DB: DrugBank ATC code;
      • HP: Human Phenotype;
      • MP: Mouse Phenotype;
      • WP: Worm Phenotype;
      • FP: Fly Phenotype;
      • FA: Fly Anatomy;
      • ZA: Zebrafish Anatomy;
      • XA: Xenopus Anatomy;
      • YP: Yeast Phenotype;
      • EC: Enzyme Commission;
      • UP: UniProtKB UniPathway.