Installation and Guidance (GO TO New Website)

supraHex: A supra-hexagonal map for modelling, analysing and visualising tabular omics data

Computational Genomics Group, Department of Computer Science, University of Bristol, Merchant Venturers Building, Bristol BS8 1UB, UK

We introduce an open-source package called supraHex. As an open-source R package, supraHex is distributed as part of the Bioconductor (see the stable release version and the latest development version) and as part of R-Forge (see project page). It names after a supra-hexagon that is a giant hexagon on a 2-dimensional (2D) map grid seamlessly consisting of smaller hexagons. This 2D giant hexagon is intended to train/model, analyse and visualise a high-dimensional omics data, which usually involve a large number of genomic coordinates (e.g. genes) but a much smaller number of samples. With the supraHex, users are able to easily and intuitively carry out integrated tasks such as: simultaneous analysis of gene clustering and sample correlation, and the overlaying of additional data (if any) onto the map for multilayer omics data comparisons. These functionalities are demonstrated in a tutorial-like analysis of several real-world datasets (see Quick Starts and Real Cases below). This package is inspired by the prevalence and symmetric beauty in many natural and man-made objects (show info).


Citation and Slides

    Please cite this package as: Fang H, Gough J. (2013) supraHex: an R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map. Biochemical and Biophysical Research Communications, doi: http://dx.doi.org/10.1016/j.bbrc.2013.11.103.

    Also, we have prepared several slides to introduce the supraHex package.

Requirement and Installation

    1: R requirement:

    R (www.r-project.org) is a language and environment for statistical computing and graphics. We assume R (version 3.0.2) has been installed in your local machine. The current version 3.0.2 can be installed following quick instructions below for different platforms (Windows, Mac, and Linux).

    Quick link for Windows and Mac: Download R for Windows and Download R for Mac.

    Shell command lines for R installation in Terminal (for both Linux and Mac):

      Assuming you have a ROOT (sudo) privilege:

      hfang@gg-pc4:~ $ sudo su
      hfang@gg-pc4:~ $ wget http://www.stats.bris.ac.uk/R/src/base/R-3/R-3.0.2.tar.gz
      hfang@gg-pc4:~ $ tar xvfz R-3.0.2.tar.gz
      hfang@gg-pc4:~ $ cd R-3.0.2
      hfang@gg-pc4:~ $ ./configure
      hfang@gg-pc4:~ $ make
      hfang@gg-pc4:~ $ make check
      hfang@gg-pc4:~ $ make install
      hfang@gg-pc4:~ $ R # start R
      

      Without a ROOT privilege and R installation under your home directory (here '/home/hfang', which should be replaced with yours):

      hfang@gg-pc4:~ $ wget http://www.stats.bris.ac.uk/R/src/base/R-3/R-3.0.2.tar.gz
      hfang@gg-pc4:~ $ tar xvfz R-3.0.2.tar.gz
      hfang@gg-pc4:~ $ cd R-3.0.2
      hfang@gg-pc4:~ $ ./configure --prefix=/home/hfang/R-3.0.2
      hfang@gg-pc4:~ $ make
      hfang@gg-pc4:~ $ make check
      hfang@gg-pc4:~ $ make install
      hfang@gg-pc4:~ $ /home/hfang/R-3.0.2/bin/R # start R
      
    2: Install the package from Bioconductor (or R-Forge):

    Using the stable release version (officially released on 15-10-2013):

    source("http://bioconductor.org/biocLite.R")
    biocLite("supraHex")
    

    Using the development version (Recommend to use it for the benefit of the latest improvements):

    install.packages("hexbin",repos="http://www.stats.bris.ac.uk/R")
    install.packages("supraHex",repos="http://R-Forge.R-project.org", type="source")
    

    Note: More instructions on how to install the development version from the local source, click here.

    3: Load the package supraHex:
    library(supraHex)
    library(help="supraHex") # get a quick overview of the package content
    browseVignettes("supraHex") # list the package vignette in an HTML browser for User Manual (PDF) and R codes
    

User Manual and Reference Manual

  • The vignette (user manual) gives the essential step-by-step instructions of how the functions contained in the package can be used to better understand the input data.
    Note: Highly recommend to start with this first as it provides a task-oriented description of the package functionality.

  • The reference manual provides a greater detail of specifications of functions.
    Note: Highly recommend to use this as a reference if you are unhappy with results under the default setting.

Online HTML Documentation

    For easy reference online, this Online HTML Documentation is also provided. It is exactly the same as the pdf version of reference manual above but is documented as HTML pages.

Quick Starts and Real Cases

    Raw omics data may differ in preprocessing but analysable data can be uniformly stored in a table-like matrix, which digitises expression levels or other bioactivities of genomic coordinates (genes for simplicity hereinafter) across samples in question. This tabular matrix of genes X samples usually contains a large number of genes (i.e. large p features) but a much smaller number of samples (i.e. small n samples). The package 'supraHex' offers users a unique framework for the ultrafast understanding of any tabular omics data with 'large p, small n', both artistically and scientifically. The package itself contains several accompanying datasets, but users can also prepare their own dataset (preferably a tab-delimited text file: 1st row intended for sample names, 1st column for gene names, and the top-left entry being left empty or any text), which can be imported using:

    data <- read.table(file="you_input_tab_delimited_data_file", header=T, row.names=1, sep="\t")
    

    Note: It is advisable to appropriately transform input data matrix so as to be symmetric around the center. The common transformations are: 2-base logarithmic transformation (with all positive values), row/gene centering, being rescaled to the 0-1 range and so forth. However, how to make transformations is more about meanings of data itself rather than the requirements of the package.

    Below are applications in several real-world datasets and a collection of functions (together with optimised arguments) to analyse them. The end users are encouraged to reproduce the results and also to adapt them for analysing their own datasets: run them first, then get down to details. For example, sPipeline uses batch training algorithm (by default as it is much fast and can accommodate a large dataset); however, if your input data is highly asymmetric, you might switch to use sequential training algorithm instead via sMap <- sPipeline(data, algorithm="sequential"). We only give a brief summary of commands and output (i.e. files and figures). On the meanings and interpretations at great details, please refer to vignette (user manual).

    Note: All examples below are based on the latest development version (see above for the installation of this latest version).

    Fang et al

    This human embryo expression dataset (available from http://www.ncbi.nlm.nih.gov/pubmed/20643359) involves six successive developmental stages (S9-S14) with three replicates (R1-R3) for each stage, including:

    • Fang: an expression matrix of 5,441 genes X 18 samples;
    • Fang.geneinfo: a matrix of 5,441 X 3 containing gene information;
    • Fang.sampleinfo: a matrix of 18 X 3 containing sample information.

    (I) Load the package and import data
      library(supraHex)
      data(Fang) # import aforementioned three variables ('Fang', 'Fang.geneinfo' and 'Fang.sampleinfo')
      # a matrix of 5,441 genes expressed in 18 samples
      # transform data by row/gene centering
      data <- Fang - matrix(rep(apply(Fang,1,mean),ncol(Fang)),ncol=ncol(Fang))
      
    (II) Train the supra-hexagonal map with input data only
      sMap <- sPipeline(data)
      visHexMulComp(sMap, title.rotate=10) # see Figure 1
      sWriteData(sMap, data, filename="Output_Fang.txt") 
      
      Note: In your screen, you will see a figure displaying the multiple components of trained map in a sample-specific manner (see Figure 1 below). You will also see that a file (Output_Fang.txt) has been saved in your disk. The output file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters). You can also force the input data to be output; type ?sWriteData for details.

    (III) Visualise the map, including built-in indexes, data hits/distributions, distance between map nodes, and codebook matrix
      visHexMapping(sMap, mappingType="indexes") # see Figure 2
      visHexMapping(sMap, mappingType="hits") # see Figure 3
      visHexMapping(sMap, mappingType="dist") # see Figure 4
      visHexPattern(sMap, plotType="lines") # see Figure 5
      visHexPattern(sMap, plotType="bars") # see Figure 6
      
      Note: In Figure 2, the smaller hexagons in the supra-hexagonal map are indexed as follows: start from the center, and then expand circularly outwards, and for each circle increase in an anti-clock order. In Figure 3, the number represents how many input data vectors are hitting each hexagon, the size of which is proportional to the number of hits. In Figure 4, map distance tells how far each hexagon is away from its neighbors, and the size of each hexagon is proportional to this distance. In Figure 5, line plot displays the patterns associated with the codebook matrix. If multple colors are given, the points are also plotted. When the pattern involves both positive and negative values, zero horizental line is also shown. In Figure 6, bar plot displays the patterns associated with the codebook matrix. When the pattern involves both positive and negative values, the zero horizental line is in the middle of the hexagon; otherwise at the top of the hexagon for all negative values, and at the bottom for all positive values.

    (IV) Perform partitioning operation on the map to obtain continuous clusters (i.e. gene meta-clusters) as they are different from gene clusters in an individual map node
      sBase <- sDmatCluster(sMap)
      visDmatCluster(sMap, sBase) # see Figure 7
      sWriteData(sMap, data, sBase, filename="Output_base_Fang.txt")
      
      Note: In Figure 7, each cluster is filled with the same continuous color, and the cluster index is marked in the seed node. Although different clusters are coded using different colors (randomly generated), it is unavoidable to have very similar colors filling in neighbouring clusters. In other words, neighbouring clusters are visually indiscernible. In this confusing situation, you can rerun the command visDmatCluster(sMap, sBase) until neighbouring clusters are indeed filled with very different colors. An output file (Output_base_Fang.txt) has been saved in your disk. This file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters), and 3rd column for the cluster bases (i.e. gene meta-clusters). You can also force the input data to be output; type ?sWriteData for details.

    (V) Reorder the sample-specific components of the map to delineate relationships between samples
      sReorder <- sCompReorder(data, metric="euclidean") # see Figure 8
      visCompReorder(sMap, sReorder, title.rotate=15)
      
      Note: In Figure 8, you will see reordered components of trained map. Each component illustrates a sample-specific map and is placed within a two-dimensional rectangular lattice. Across components/samples, genes with similar expression patterns are mapped onto the same position of the map. Geometric locations of components delineate relationships between components/samples, that is, samples with the similar expression profiles are placed closer to each other.

    Listed below are browsable output figures. For viewing, please click the figure itself.

    Golub et al

    This leukemia patient expression dataset (the learning set, available from http://www.ncbi.nlm.nih.gov/pubmed/10521349) contains an expression matrix of 3,051 genes X 38 samples, involving two types of leukemia: 11 acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL). These 27 ALL are further subtyped into 19 B-cell ALL (ALL_B) and 8 T-cell ALL (ALL_T).

    (I) Load the package and import data
      library(supraHex)
      data(Golub)
      data <- Golub # a matrix of 3,051 genes expressed in 38 samples
      
    (II) Train the supra-hexagonal map with input data only
      sMap <- sPipeline(data)
      visHexMulComp(sMap,title.rotate=10,colormap="darkgreen-lightgreen-lightpink-darkred") # see Figure 1
      sWriteData(sMap, data, filename="Output_Golub.txt") 
      
      Note: In your screen, you will see a figure displaying the multiple components of trained map in a sample-specific manner (see Figure 1 below). You will also see that a file (Output_Golub.txt) has been saved in your disk. The output file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters). You can also force the input data to be output; type ?sWriteData for details.

    (III) Visualise the map, including built-in indexes, data hits/distributions, distance between map nodes, and codebook matrix
      visHexMapping(sMap, mappingType="indexes") # see Figure 2
      visHexMapping(sMap, mappingType="hits") # see Figure 3
      visHexMapping(sMap, mappingType="dist") # see Figure 4
      visHexPattern(sMap, plotType="lines") # see Figure 5
      visHexPattern(sMap, plotType="bars") # see Figure 6
      
      Note: In Figure 2, the smaller hexagons in the supra-hexagonal map are indexed as follows: start from the center, and then expand circularly outwards, and for each circle increase in an anti-clock order. In Figure 3, the number represents how many input data vectors are hitting each hexagon, the size of which is proportional to the number of hits. In Figure 4, map distance tells how far each hexagon is away from its neighbors, and the size of each hexagon is proportional to this distance. In Figure 5, line plot displays the patterns associated with the codebook matrix. If multple colors are given, the points are also plotted. When the pattern involves both positive and negative values, zero horizental line is also shown. In Figure 6, bar plot displays the patterns associated with the codebook matrix. When the pattern involves both positive and negative values, the zero horizental line is in the middle of the hexagon; otherwise at the top of the hexagon for all negative values, and at the bottom for all positive values.

    (IV) Perform partitioning operation on the map to obtain continuous clusters (i.e. gene meta-clusters) as they are different from gene clusters in an individual map node
      sBase <- sDmatCluster(sMap)
      visDmatCluster(sMap, sBase) # see Figure 7
      sWriteData(sMap, data, sBase, filename="Output_base_Golub.txt")
      
      Note: In Figure 7, each cluster is filled with the same continuous color, and the cluster index is marked in the seed node. Although different clusters are coded using different colors (randomly generated), it is unavoidable to have very similar colors filling in neighbouring clusters. In other words, neighbouring clusters are visually indiscernible. In this confusing situation, you can rerun the command visDmatCluster(sMap, sBase) until neighbouring clusters are indeed filled with very different colors. An output file (Output_base_Golub.txt) has been saved in your disk. This file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters), and 3rd column for the cluster bases (i.e. gene meta-clusters). You can also force the input data to be output; type ?sWriteData for details.

    (V) Reorder the sample-specific components of the map to delineate relationships between samples
      sReorder <- sCompReorder(data,metric="pearson") # see Figure 8
      visCompReorder(sMap,sReorder,title.rotate=15,
      colormap="darkgreen-lightgreen-lightpink-darkred")
      
      Note: In Figure 8, you will see reordered components of trained map. Each component illustrates a sample-specific map and is placed within a two-dimensional rectangular lattice. Across components/samples, genes with similar expression patterns are mapped onto the same position of the map. Geometric locations of components delineate relationships between components/samples, that is, samples with the similar expression profiles are placed closer to each other.

    Listed below are browsable output figures. For viewing, please click the figure itself.

    Hiratani et al

    This dataset (available from http://www.ncbi.nlm.nih.gov/pubmed/19952138) involves genome-wide replication-timing profiles of 22 cell lines from early mouse embryogenesis. These cell lines can be categorised into: 1) pluripotent cells, including ESCs (ESC_46C, ESC_D3 and ESC_TT2) and iPSCs (iPSC, iPSC_1D4 and iPSC_2D4); 2) partially-reprogrammed iPSCs (piPSC_1A2, piPSC_1B3 and piPSC_V3); 3) early epiblast (EPL and EMB3_D3); 4) late epiblast (EpiSC5 and EpiSC7); 5) Ectoderm (EBM6_D3, EBM9_D3, NPC_46C and NPC_TT2); 6) Mesoderm and Endoderm; and 7) late Mesoderm (Myoblast, MEF_female and MEF_male).

    The whole dataset

    The whole dataset is LOESS normalized, scaled and smoothed (but is not yet arbitrarily averaged into windows of 200kb), including:

    • RT: a replication timing matrix of 384,849 probes X 22 samples;
    • RT.probeinfo: a matrix of 384,849 X 2 containing probe information (mouse genomic coordinates based on build mm8, NCBI36);
    • RT.sampleinfo: a matrix of 22 X 4 containing sample information.

    (I) Load the package and import data remotely placed in the server here
      library(supraHex)
      URL <- url("http://supfam.org/SUPERFAMILY/dcGO/supraHex/realcases/GSE17983.Rda") # URL for 'GSE17983.Rda'
      load(URL) # remotely import aforementioned three variables ('RT', 'RT.probeinfo' and 'RT.sampleinfo')
      close(URL) # since url() always opens the connection
      ls() # you should see three variables: 'RT', 'RT.probeinfo' and 'RT.sampleinfo'
      data <- RT # a matrix of 384,849 probes X 22 samples
      colnames(data) <- RT.sampleinfo$Name # replace with informative sample names
      
    (II) Train the supra-hexagonal map with input data only
      # it takes a while (~20 min under 64-bit LINUX) but already ultrafast
      # as this dataset (>8 million entries) is computationally prohibitive by conventional clustering
      sMap <- sPipeline(data)
      visHexMulComp(sMap, title.rotate=5, gp=grid::gpar(cex=0.8,font=2)) # see Figure 1
      sWriteData(sMap, data, filename="Output_Hiratani.txt") # be patient or skip this output
      
      Note: In your screen, you will see a figure displaying the multiple components of trained map in a sample-specific manner (see Figure 1 below). You will also see that a file (Output_Hiratani.txt) has been saved in your disk. The output file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters). You can also force the input data to be output; type ?sWriteData for details.

    (III) Visualise the map, including built-in indexes, data hits/distributions, distance between map nodes, and codebook matrix
      visHexMapping(sMap, mappingType="indexes", gp=grid::gpar(cex=0.25)) # see Figure 2
      visHexMapping(sMap, mappingType="hits", gp=grid::gpar(cex=0.25,font=2), fill.color="yellow") # see Figure 3
      visHexPattern(sMap, plotType="lines") # see Figure 5
      visHexPattern(sMap, plotType="bars") # see Figure 6
      
      Note: In Figure 2, the smaller hexagons in the supra-hexagonal map are indexed as follows: start from the center, and then expand circularly outwards, and for each circle increase in an anti-clock order. In Figure 3, the number represents how many input data vectors are hitting each hexagon, the size of which is proportional to the number of hits. In Figure 5, line plot displays the patterns associated with the codebook matrix. If multple colors are given, the points are also plotted. When the pattern involves both positive and negative values, zero horizental line is also shown. In Figure 6, bar plot displays the patterns associated with the codebook matrix. When the pattern involves both positive and negative values, the zero horizental line is in the middle of the hexagon; otherwise at the top of the hexagon for all negative values, and at the bottom for all positive values.

    (IV) Perform partitioning operation on the map to obtain continuous clusters (i.e. probe meta-clusters) as they are different from probe clusters in an individual map node
      sBase <- sDmatCluster(sMap)
      visDmatCluster(sMap, sBase, gp=grid::gpar(cex=0.4)) # see Figure 7
      sWriteData(sMap, data, sBase, filename="Output_base_Hiratani.txt")
      
      Note: In Figure 7, each cluster is filled with the same continuous color, and the cluster index is marked in the seed node. Although different clusters are coded using different colors (randomly generated), it is unavoidable to have very similar colors filling in neighbouring clusters. In other words, neighbouring clusters are visually indiscernible. In this confusing situation, you can rerun the command visDmatCluster(sMap, sBase) until neighbouring clusters are indeed filled with very different colors. An output file (Output_base_Hiratani.txt) has been saved in your disk. This file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. probe clusters), and 3rd column for the cluster bases (i.e. probe meta-clusters). You can also force the input data to be output; type ?sWriteData for details.

    (V) Reorder the sample-specific components of the map to delineate relationships between samples
      sReorder <- sCompReorder(sMap, amplifier=4, metric="mi") # see Figure 8
      visCompReorder(sMap, sReorder, title.rotate=10, gp=grid::gpar(cex=0.6,font=2))
      
      Note: In Figure 8, you will see reordered components of trained map. Each component illustrates a sample-specific map and is placed within a two-dimensional rectangular lattice. Across components/samples, genes with similar expression patterns are mapped onto the same position of the map. Geometric locations of components delineate relationships between components/samples, that is, samples with the similar expression profiles are placed closer to each other.

    Listed below are browsable output figures. For viewing, please click the figure itself.

    The refseq dataset

    The refseq dataset is the same as the whole dataset but only for RefSeq gene TSS locations, including:

    • RT: a replication timing matrix of 17,292 genes X 22 samples;
    • CpG: a matrix of 17,292 genes X 1 containing gene additional information on promoter CpG classification (see http://www.ncbi.nlm.nih.gov/pubmed/17603471), with '1' for HCP (high CpG density promoters), '-1' for LCP (low CpG density promoters), '0' for ICP (intermediate CpG density promoters), and 'NA' for unclassified;
    • EX: a expression matrix of 17,292 genes X 8 samples, and samples include pluripotent cells (ESC_D3); early epiblast (EMB3_D3); late epiblast (EpiSC7); Ectoderm (EBM6_D3 and EBM9_D3); Mesoderm and Endoderm.

    (I) Load the package and import data remotely placed in the server here
      library(supraHex)
      URL <- url("http://supfam.org/SUPERFAMILY/dcGO/supraHex/realcases/Hiratani_TableS1.Rda")
      load(URL)
      close(URL)
      ls() # you should see three variables: 'RT', 'CpG' and 'EX'
      data <- RT # a replication timing matrix of 17,292 genes X 22 samples
      
    (II) Train the supra-hexagonal map with input data only
      sMap <- sPipeline(data)
      visHexMulComp(sMap, title.rotate=5, gp=grid::gpar(cex=0.8), zlim=c(-2,2), colormap="darkblue-white-darkorange") # see Figure 1
      sWriteData(sMap, data, filename="Output_Hiratani_TableS1.txt")
      
      Note: In your screen, you will see a figure displaying the multiple components of trained map in a sample-specific manner (see Figure 1 below). You will also see that a file (Output_Hiratani_TableS1.txt) has been saved in your disk. The output file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters). You can also force the input data to be output; type ?sWriteData for details.

    (III) Visualise the map, including built-in indexes, data hits/distributions, distance between map nodes, and codebook matrix
      visHexMapping(sMap, mappingType="indexes", gp=grid::gpar(cex=0.5)) # see Figure 2
      visHexMapping(sMap, mappingType="hits", gp=grid::gpar(cex=0.5)) # see Figure 3
      visHexMapping(sMap, mappingType="dist") # see Figure 4
      visHexPattern(sMap, plotType="lines") # see Figure 5
      visHexPattern(sMap, plotType="bars") # see Figure 6
      
      Note: In Figure 2, the smaller hexagons in the supra-hexagonal map are indexed as follows: start from the center, and then expand circularly outwards, and for each circle increase in an anti-clock order. In Figure 3, the number represents how many input data vectors are hitting each hexagon, the size of which is proportional to the number of hits. In Figure 4, map distance tells how far each hexagon is away from its neighbors, and the size of each hexagon is proportional to this distance. In Figure 5, line plot displays the patterns associated with the codebook matrix. If multple colors are given, the points are also plotted. When the pattern involves both positive and negative values, zero horizental line is also shown. In Figure 6, bar plot displays the patterns associated with the codebook matrix. When the pattern involves both positive and negative values, the zero horizental line is in the middle of the hexagon; otherwise at the top of the hexagon for all negative values, and at the bottom for all positive values.

    (IV) Perform partitioning operation on the map to obtain continuous clusters (i.e. probe meta-clusters) as they are different from probe clusters in an individual map node
      sBase <- sDmatCluster(sMap)
      visDmatCluster(sMap, sBase) # see Figure 7
      sWriteData(sMap, data, sBase, filename="Output_base_Hiratani_TableS1.txt")
      
      Note: In Figure 7, each cluster is filled with the same continuous color, and the cluster index is marked in the seed node. Although different clusters are coded using different colors (randomly generated), it is unavoidable to have very similar colors filling in neighbouring clusters. In other words, neighbouring clusters are visually indiscernible. In this confusing situation, you can rerun the command visDmatCluster(sMap, sBase) until neighbouring clusters are indeed filled with very different colors. An output file (Output_base_Hiratani_TableS1.txt) has been saved in your disk. This file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters), and 3rd column for the cluster bases (i.e. gene meta-clusters). You can also force the input data to be output; type ?sWriteData for details.

    (V) Reorder the sample-specific components of the map to delineate relationships between samples
      sReorder <- sCompReorder(sMap, metric="mi") # see Figure 8
      visCompReorder(sMap, sReorder, title.rotate=10, gp=grid::gpar(cex=0.6,font=2),zlim=c(-2,2), colormap="darkblue-white-darkorange")
      
      Note: In Figure 8, you will see reordered components of trained map. Each component illustrates a sample-specific map and is placed within a two-dimensional rectangular lattice. Across components/samples, genes with similar expression patterns are mapped onto the same position of the map. Geometric locations of components delineate relationships between components/samples, that is, samples with the similar expression profiles are placed closer to each other.

    (VI) Overlay the CpG additional data to the trained map to view the distribution of CpG data
      additional_HCP <- as.numeric(CpG==1 & !is.na(CpG))
      additional_LCP <- as.numeric(CpG==-1 & !is.na(CpG))
      additional <- cbind(additional_HCP, additional_LCP)
      colnames(additional) <- c("HCP","LCP")
      sOverlay <- sMapOverlay(sMap=sMap, data=data, additional=additional) 
      visHexMulComp(sOverlay, colormap="lightyellow-darkred", zlim=c(0, signif(max(sOverlay$codebook),1))) # see Figure 9
      
    (VII) Overlay the expression additional data to the trained map to view the distribution of expression data, which is further used to explore relationships between samples
      sOverlay <- sMapOverlay(sMap=sMap, data=data, additional=EX)
      sReorder <- sCompReorder(sOverlay, metric="none") # here reorder samples based on overlaid distribution data rather than expression data
      visCompReorder(sOverlay, sReorder, title.rotate=10, colormap="darkgreen-lightgreen-lightpink-darkred") # see Figure 10
      

    Listed below are browsable output figures. For viewing, please click the figure itself.

    Xiang et al

    This arabidopsis embryo expression dataset (available from http://www.ncbi.nlm.nih.gov/pubmed/21402797) contains gene expression levels (3,625 genes and 7 embryo stages). Only genes with at least 2-fold changes (in any stage) as compared to the average over embryo stages are considered. These embryos are at: early embryonic stage (i.e. Zygote and Quadrant), mid-embryonic stage (i.e. Globular, Heart and Torpedo), and late stages from Bent to Mature.

    (I) Load the package and import data
      library(supraHex)
      data(Xiang)
      data <- Xiang # a matrix of 3,625 genes expressed in 7 samples
      
    (II) Train the supra-hexagonal map with input data only
      sMap <- sPipeline(data)
      visHexMulComp(sMap,title.rotate=10,colormap="darkblue-white-darkorange") # see Figure 1
      sWriteData(sMap, data, filename="Output_Xiang.txt") 
      
      Note: In your screen, you will see a figure displaying the multiple components of trained map in a sample-specific manner (see Figure 1 below). You will also see that a file (Output_Xiang.txt) has been saved in your disk. The output file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters). You can also force the input data to be output; type ?sWriteData for details.

    (III) Visualise the map, including built-in indexes, data hits/distributions, distance between map nodes, and codebook matrix
      visHexMapping(sMap, mappingType="indexes") # see Figure 2
      visHexMapping(sMap, mappingType="hits") # see Figure 3
      visHexMapping(sMap, mappingType="dist") # see Figure 4
      visHexPattern(sMap, plotType="lines") # see Figure 5
      visHexPattern(sMap, plotType="bars",colormap="rainbow",legend.cex=0.5) # see Figure 6
      
      Note: In Figure 2, the smaller hexagons in the supra-hexagonal map are indexed as follows: start from the center, and then expand circularly outwards, and for each circle increase in an anti-clock order. In Figure 3, the number represents how many input data vectors are hitting each hexagon, the size of which is proportional to the number of hits. In Figure 4, map distance tells how far each hexagon is away from its neighbors, and the size of each hexagon is proportional to this distance. In Figure 5, line plot displays the patterns associated with the codebook matrix. If multple colors are given, the points are also plotted. When the pattern involves both positive and negative values, zero horizental line is also shown. In Figure 6, bar plot displays the patterns associated with the codebook matrix. When the pattern involves both positive and negative values, the zero horizental line is in the middle of the hexagon; otherwise at the top of the hexagon for all negative values, and at the bottom for all positive values.

    (IV) Perform partitioning operation on the map to obtain continuous clusters (i.e. gene meta-clusters) as they are different from gene clusters in an individual map node
      sBase <- sDmatCluster(sMap)
      visDmatCluster(sMap, sBase) # see Figure 7
      sWriteData(sMap, data, sBase, filename="Output_base_Xiang.txt")
      
      Note: In Figure 7, each cluster is filled with the same continuous color, and the cluster index is marked in the seed node. Although different clusters are coded using different colors (randomly generated), it is unavoidable to have very similar colors filling in neighbouring clusters. In other words, neighbouring clusters are visually indiscernible. In this confusing situation, you can rerun the command visDmatCluster(sMap, sBase) until neighbouring clusters are indeed filled with very different colors. An output file (Output_base_Xiang.txt) has been saved in your disk. This file has 1st column for your input data ID (an integer; otherwise the row names of input data matrix), and 2nd column for the corresponding index of best-matching hexagons (i.e. gene clusters), and 3rd column for the cluster bases (i.e. gene meta-clusters). You can also force the input data to be output; type ?sWriteData for details.

    (V) Reorder the sample-specific components of the map to delineate relationships between samples
      sReorder <- sCompReorder(sMap, metric="pearson") # see Figure 8
      visCompReorder(sMap,sReorder,title.rotate=10,colormap="darkblue-white-darkorange")
      
      Note: In Figure 8, you will see reordered components of trained map. Each component illustrates a sample-specific map and is placed within a two-dimensional rectangular lattice. Across components/samples, genes with similar expression patterns are mapped onto the same position of the map. Geometric locations of components delineate relationships between components/samples, that is, samples with the similar expression profiles are placed closer to each other.

    Listed below are browsable output figures. For viewing, please click the figure itself.