Correlation Network Analysis of Biological Data

Zhaolong Yu
Fudan University, Shanghai, 200433

Abstract

Correlation network analysis has been widely used to find clusters, or modules, in complex networks, especially biological networks and stock networks. Based on correlations between quantitative measurements, weighted correlation network analysis can identify modules formed by highly correlated elements, such as genes or proteins in biological networks. This method allows us to explore the system-level functionality of particular genes. In this article, we use weighted correlation network analysis to investigate gene co-expression networks in the context of the transcriptional response of cells to changing conditions.

Introduction

Networks provide a straightforward representation of the interactions between the elements of a system, which lets us gain insight into the dynamics of complex systems under various conditions. Over the past ten years, network-based methods have proved useful in many domains, including the analysis of social, physical and biological systems. In social networks, for example, network-based methods can help predict potential links between two people, detect highly connected communities and identify the most influential “superstars”. In biology, with the rapid development of biomedical science, more and more networks have been identified, such as gene co-expression networks, protein-protein interaction networks and cell-cell interaction networks.

Earlier research on biological processes fell short of accurately quantifying biological molecules and tracking biological reactions from a systematic view. Simply measuring the expression of a few genes and investigating the molecular mechanisms of one or two pathways does not necessarily explain a complex biological process in which thousands of reactions proceed at the same time. Every cell contains a gene regulation network in which around 20,000 genes and millions of RNAs and proteins interact with each other and reach a balance; network analyses make it possible to take these different kinds of biological components into consideration and to probe the underlying mechanisms of gene expression and regulation. In many real networks, the probability p(k) that a node is connected with k other nodes decays as a power law. Many biological networks follow the same structure
where the topology is dominated by a few highly connected nodes (hubs) that link the remaining, less connected nodes. For example, analysis of the protein-protein interaction network revealed that highly connected nodes are more likely to be essential for survival, namely housekeeping genes or proteins.

To understand biological networks better, one of the most important tasks is to work out the relationships between the different components inside the cell. Correlation network analysis is an effective method for measuring such relationships and detecting functional clusters. Correlation networks are constructed from correlations between quantitative measurements that can be described by an n × m matrix X, where the row indices (i = 1, 2, 3, ..., n) correspond to network nodes and the column indices (l = 1, 2, 3, ..., m) correspond to sample measurements. The rationale behind correlation network methodology is to use network language to find clusters (modules) of interconnected nodes, that is, sets of nodes that are closely connected according to a suitably defined measure of interconnectedness (correlation). A second use of correlation networks is to identify significant modules among all the modules computed by the analysis pipeline: given a node significance measure, modules with high average node significance are identified as significant. Correlation networks also make it easy to annotate network nodes with respect to functional modules, so that the potential functions of particular genes or proteins in a biological process can be identified. This can be accomplished by defining a fuzzy measure of module membership that generalizes the binary module membership indicator to a quantitative measure.
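The fuzzy module membership just described can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a module is summarized by the mean profile of its member rows (a simple stand-in for the module eigengene used later in this article) and that membership is the Pearson correlation between a node's profile and that summary. The function names and the toy matrix are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def module_profile(X, members):
    """Mean profile of the member rows (assumed stand-in for an eigengene)."""
    n_samples = len(X[0])
    return [sum(X[i][l] for i in members) / len(members)
            for l in range(n_samples)]

def module_membership(X, members):
    """Quantitative membership of every node: correlation with the module profile."""
    profile = module_profile(X, members)
    return [pearson(row, profile) for row in X]

# Toy n x m matrix X: rows are nodes (genes), columns are samples.
X = [
    [1.0, 2.0, 3.0, 4.0],  # node 0, assigned to the module
    [2.0, 4.1, 6.2, 7.9],  # node 1, assigned to the module
    [4.0, 3.0, 2.0, 1.0],  # node 2, anti-correlated with the module
]
mm = module_membership(X, members=[0, 1])
```

A binary indicator would only say that nodes 0 and 1 belong to the module and node 2 does not; the quantitative measure also shows how strongly, and in which direction, each node follows the module profile.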
In conclusion, correlation network analysis lets us gain deeper insight into the biological regulation network and try to predict what is happening inside the cell.

Materials and Methods

In this article, we used the weighted correlation network analysis pipeline to investigate a gene co-expression network and to explain the regulatory relationships between the different players in the gene regulation network.

First, we define a measure of similarity between gene expression profiles. This similarity measures the concordance between gene expression levels over a period of time or across different experimental conditions, for example the expression profile of the gene p53 during tumor pathogenesis or the expression level of the gene HuR under different concentrations of ATP. Specifically, for each pair of genes i and j, we denote the similarity by $s_{ij}$ and define it as the absolute value of the Pearson correlation, $s_{ij} = |\mathrm{cor}(i, j)|$. The Pearson correlations are calculated from an n × m matrix X where the
row indices (i = 1, 2, 3, ..., n) correspond to network nodes and the column indices (l = 1, 2, 3, ..., m) correspond to different sample measurements of the same node. We denote the similarity matrix by $S = [s_{ij}]$.

Second, we transform the similarity matrix into an adjacency matrix. Because unweighted networks cannot reflect the continuous nature of the underlying co-expression information, we use a soft-thresholding strategy to generate the adjacency matrix of a weighted network instead of hard thresholding, which would yield an unweighted network. The weighted network adjacency is defined by raising the co-expression similarity to a power,

$$a_{ij} = s_{ij}^{\beta}, \qquad \beta \ge 1.$$

The parameter β is returned by the R function pickSoftThreshold. The weighted adjacency $a_{ij}$ between two genes is proportional to their similarity on a logarithmic scale, $\log(a_{ij}) = \beta \times \log(s_{ij})$.

Third, we use the topological overlap dissimilarity measure to identify functional modules, which consist of densely interconnected genes, without the use of a priori defined gene sets. The default method is hierarchical clustering with the standard R function hclust; branches of the resulting dendrogram correspond to modules and can be identified using one of a wide range of branch cutting methods, including the constant-height cut and the Dynamic Tree Cut method. The topological overlap of two nodes reflects their relative interconnectedness, and the topological overlap matrix (TOM) $\Omega = [\omega_{ij}]$ provides a similarity measure (the opposite of dissimilarity) that has been found useful in both unweighted and weighted networks:

$$\omega_{ij} = \frac{l_{ij} + a_{ij}}{\min\{k_i, k_j\} + 1 - a_{ij}},$$

where $l_{ij} = \sum_{u} a_{iu} a_{uj}$ and $k_i = \sum_{u} a_{iu}$ is the node connectivity. The topological overlap-based dissimilarity measure is then defined as $d_{ij} = 1 - \omega_{ij}$.

Once the gene modules have been determined, the next step is to relate them to external information.
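The three steps above (similarity, soft-thresholded adjacency, topological overlap) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WGCNA implementation: the diagonal of the adjacency matrix is set to zero so that the sums for the shared-neighbour term and the connectivity exclude self-connections, the power beta = 6 is an arbitrary example value, and all function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def similarity_matrix(X):
    """s_ij = |cor(i, j)| for every pair of rows of X."""
    n = len(X)
    return [[abs(pearson(X[i], X[j])) for j in range(n)] for i in range(n)]

def adjacency_matrix(S, beta):
    """a_ij = s_ij ** beta, with a zero diagonal (assumed convention)."""
    n = len(S)
    return [[0.0 if i == j else S[i][j] ** beta for j in range(n)]
            for i in range(n)]

def tom_dissimilarity(A):
    """d_ij = 1 - omega_ij, with omega_ij the topological overlap."""
    n = len(A)
    k = [sum(row) for row in A]  # node connectivity k_i
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a node has zero dissimilarity to itself
            l_ij = sum(A[i][u] * A[u][j] for u in range(n))  # shared neighbours
            omega = (l_ij + A[i][j]) / (min(k[i], k[j]) + 1.0 - A[i][j])
            D[i][j] = 1.0 - omega
    return D

# Toy expression matrix: 4 genes (rows) x 5 samples (columns).
X = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.3, 2.9, 4.2, 4.8],  # tracks gene 0
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [4.8, 1.2, 4.1, 1.9, 3.2],  # tracks gene 2
]
beta = 6  # example soft-thresholding power (assumption)
S = similarity_matrix(X)
A = adjacency_matrix(S, beta)
D = tom_dissimilarity(A)
```

In this toy example genes 0 and 1 end up with a small dissimilarity and genes 0 and 2 with a dissimilarity close to 1, so a hierarchical clustering of D would place each correlated pair on its own branch.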
Based on the gene modules generated in the previous step, we can perform functional enrichment analysis to find out whether the genes in a module are enriched for particular cellular functions. Furthermore, we need to identify biologically or clinically significant modules and genes, which is a major goal of gene expression analysis. The definition of biological or clinical significance depends on the research question under consideration. Abstractly speaking, we define a gene significance measure as a function that assigns a non-negative number to each gene; the higher the value, the more biologically significant the gene. In gene knockout experiments, gene significance could indicate knockout essentiality, while a microarray sample trait T can be used to define a trait-based gene significance measure as the absolute correlation between the trait and the expression profile. For a functional
module, a measure of module significance can be defined as the average gene significance across the module genes.

Next, studying the topological properties of biological networks is also of great importance. Many topological properties can be succinctly described using network concepts, also known as network statistics, including whole-network connectivity (degree), intramodular connectivity, topological overlap, the clustering coefficient and density. Differential analysis of network concepts such as connectivity may reveal regulatory changes in the expression of certain genes. The WGCNA package for R implements several functions for computing these network statistics, such as softConnectivity, intramodularConnectivity, TOMsimilarity, clusterCoef and networkConcepts. Basic R functions can be used to compute summary statistics of these concepts and to test their differences across networks.

Results and Discussions

1. Data cleaning and preprocessing

We downloaded the gene expression data (microarray data of female liver cells and of male liver cells) from the online microarray database. The two data sets contain roughly 130 samples each. Each row corresponds to a gene and each column to a sample or other experimental information. We extracted the expression data from the raw files into a multi-set format suitable for consensus analysis. Because of the large amount of missing data, we used the R function goodSamplesGenesMS to filter out samples with an excessive number of missing values. Moreover, we used Euclidean distance-based sample clustering to filter out outlier samples; one sample, F2_221, appeared to be an outlier in the female liver data. After this quality control, the two data sets were ready for further analysis.

Figure 1 Sample clustering result (sample clustering on all genes in female liver)

We also downloaded the gene annotation file and the clinical traits file so that we
could match this information to the expression data.

2. Network construction

Network construction is the most important step in correlation network analysis. Since we chose the one-step soft-thresholding strategy to generate the adjacency matrix for the network, the construction step entails choosing the soft-thresholding power β to which the co-expression similarity is raised to calculate adjacency. Given that gene regulation follows a power-law distribution, we chose the soft-thresholding power based on the criterion of approximate scale-free topology, using the function pickSoftThreshold, which performs the analysis of network topology. Among the candidate powers from 1 to 15, the values 6, 7 and 8 appeared to be suitable. To speed up the calculation and fit the scale-free topology model better, we chose 7 as the soft-thresholding power.

Figure 2 Soft-thresholding power test (panels: scale-free topology model fit, mean connectivity, median connectivity and max connectivity, each against the soft threshold power)

3. Functional module detection

Based on the previous results, we chose a soft-thresholding power of 7, a minimum module size of 30 and a module detection sensitivity (deepSplit) of 2. For the merging parameters, we set the cut height for merging modules to 0.20, which means that modules whose gene expressions are correlated above 1 − 0.2 = 0.8 are merged. Roughly 11 gene modules (gene clusters) were identified from the weighted correlation networks constructed from the gene expression data. In reality, 17 gene modules had been found, however
there are only 11

Figure 3 Gene modules (consensus gene dendrogram and module colors)

Table 1 Network construction results

                   Fundamental   Eigengene-based   Conformity-Based
Density            0.2075515     0.1308766         0.2055690
Centralization     0.1277317     0.2047174         0.1506362
Heterogeneity      0.2467851     0.6172681         0.2818983
Mean ClusterCoef   0.2516585     0.2495406         0.2395485
Mean Connectivity  746.9778522   471.0247652       739.8428846

                           Length  Class   Mode
colors                     3600    -none-  numeric
unmergedColors             3600    -none-  numeric
multiMEs                   2       -none-  list
goodSamples                2       -none-  list
goodGenes                  3600    -none-  logical
dendrograms                1       -none-  list
TOMFiles                   0       -none-  NULL
blockGenes                 1       -none-  list
blocks                     3600    -none-  numeric
originCount                2       -none-  numeric
networkCalibrationSamples  0       -none-  NULL
individualTOMInfo          11      -none-  list
consensusTOMInfo           0       -none-  NULL
consensusQuantile          1       -none-  numeric

4. Relating modules to external information

After basic data processing, network construction and module detection, the modules should be related to external information in order to better understand the biological functions underlying the gene modules calculated from the weighted correlation networks. Since we had several clinical traits available, which correspond to columns of the raw data, and the clustering of gene expression had been computed, we could relate the traits to the consensus modules in each of the two sets. Although the consensus modules are a single module assignment for all genes, the module eigengenes represent the modules in each of the two sets. Therefore, we need the trait data matched separately to the female and to the male expression data.

Figure 4 Module-trait relationships (from female liver cells). Relationships of consensus module eigengenes and clinical traits in the female data. Each row in the table corresponds to a consensus module, and each column to a trait. Numbers in the table report the correlations of the corresponding module eigengenes and traits, with the p-values printed below the correlations in parentheses. The table is color coded by correlation according to the color legend.
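The module-trait table described in the caption can be sketched as follows. The sketch assumes the module eigengenes are already available as one value per sample, and it approximates the p-value with the Fisher z-transformation and a normal distribution; that choice is an assumption about how such p-values may be obtained, not a statement of what was used for the figure. All names and numbers are hypothetical toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def fisher_p_value(r, n_samples):
    """Two-sided p-value for a correlation via the Fisher z-transformation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n_samples - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def module_trait_table(eigengenes, traits):
    """Map (module, trait) to (correlation, p-value)."""
    table = {}
    for module, eigengene in eigengenes.items():
        for trait, values in traits.items():
            r = pearson(eigengene, values)
            table[(module, trait)] = (r, fisher_p_value(r, len(values)))
    return table

# Toy data for 8 samples (invented for illustration only).
eigengenes = {
    "ME_example": [-0.4, -0.3, -0.1, 0.0, 0.1, 0.2, 0.2, 0.3],
}
traits = {
    "trait_a": [20.0, 22.0, 25.0, 27.0, 28.0, 30.0, 31.0, 33.0],
    "trait_b": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 5.0, 2.0],
}
table = module_trait_table(eigengenes, traits)
```

Each entry of `table` corresponds to one cell of a module-trait figure: the correlation on top and its p-value in parentheses below it.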
Figure 5 Module-trait relationships (from male liver cells). Relationships of consensus module eigengenes and clinical traits in the male data. Each row in the table corresponds to a consensus module, and each column to a trait. Numbers in the table report the correlations of the corresponding module eigengenes and traits, with the p-values printed below the correlations in parentheses. The table is color coded by correlation according to the color legend.

From the module-trait relationship figures, we could readily identify which molecular functions or pathways the gene modules participate in. Moreover, the consensus relationship table isolates the module-trait relationships that are present in both sets. For example, the turquoise, purple and red modules are highly related to body size in both sets: the associated trait terms cluster around a few keywords such as fat, weight, length and leptin (a protein that could help mammals lose weight). The pink module was highly related to insulin levels in female liver cells but not in male liver cells, and the light cyan module was highly related to triglyceride (Trigly) levels in male liver cells but not to insulin levels in female liver cells. Therefore, some gene expression patterns are common to both sexes, whereas other gene expression profiles differ completely between female and male liver cells. The table also shows that genes work together, which means that certain genes may participate in the same biological process. This is why our bodies can adapt to different environments: the gene regulation networks inside our cells provide many tools for doing the same job, so that our physiological systems are robust to a wide range of changes.
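The idea of a consensus relationship, one that is present in both sets, can be made concrete with a short sketch. It assumes a convention in which a module-trait relationship counts as shared only when the two correlations have the same sign, and is then summarized by the weaker of the two correlations and the larger of the two p-values. This convention, the function names and the numbers below are illustrative assumptions and are not taken from the analysis above.

```python
def consensus_relationship(cor_a, cor_b, p_a, p_b):
    """Combine one module-trait cell from two data sets.

    Returns (correlation, p_value) when both correlations share a sign,
    otherwise None to mark the relationship as not shared.
    """
    if cor_a > 0 and cor_b > 0:
        return min(cor_a, cor_b), max(p_a, p_b)  # weaker positive correlation
    if cor_a < 0 and cor_b < 0:
        return max(cor_a, cor_b), max(p_a, p_b)  # weaker negative correlation
    return None

def consensus_table(table_a, table_b):
    """Apply the rule to every (module, trait) key present in both tables."""
    shared_keys = sorted(set(table_a) & set(table_b))
    return {
        key: consensus_relationship(
            table_a[key][0], table_b[key][0], table_a[key][1], table_b[key][1]
        )
        for key in shared_keys
    }

# Invented example values: (correlation, p-value) per (module, trait).
set_a = {
    ("module_1", "trait_x"): (0.62, 1e-5),
    ("module_2", "trait_y"): (0.55, 1e-4),
    ("module_3", "trait_x"): (-0.40, 2e-3),
}
set_b = {
    ("module_1", "trait_x"): (0.48, 1e-3),
    ("module_2", "trait_y"): (-0.05, 0.6),
    ("module_3", "trait_x"): (-0.51, 1e-4),
}
consensus = consensus_table(set_a, set_b)
```

Under this rule a relationship that is strong in one set but absent or reversed in the other, like `module_2` and `trait_y` in the toy data, drops out of the consensus table, which mirrors the set-specific relationships discussed above.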
Table 2 Some module results (chosen from 3600 genes)

Probe        GeneSymbol     EntrezID  ModuleLabel  ModuleColor
MMT00000044  1700007N18Rik  69339     0            grey
MMT00000046  Mast2          17776     7            black
MMT00000051  Ankrd32        105377    9            magenta
MMT00000076  NA             383154    0            grey
MMT00000080  Ldb2           16826     8            pink
MMT00000102  Rdhs           216453    3            brown
MMT00000149  Ak2            11637     4            yellow
MMT00000159  Cdc2a          12534     10           purple
MMT00000207  Akap13         233400    8            pink
MMT00000212  2610029K21Rik  66614     0            grey
MMT00000231  Pa2g4          18813     7            black
MMT00000241  NA             NA        2            blue
MMT00000268  NA             NA        2            blue
MMT00000283  2810043G22Rik  72682     4            yellow
MMT00000334  Brp44l         55951     5            green
MMT00000365  Gltp           56356     2            blue
MMT00000368  Spry1          24063     2            blue
MMT00000373  Eomes          13813     6            red
MMT00000384  Ebi3           50498     3            brown
MMT00000401  Slc38a4        69354     6            red
MMT00000418  NA             NA        11           greenyellow
MMT00000464  Srebf2         20788     7            black
MMT00000517  Magee1         107528    13           salmon
MMT00000525  NA             NA        11           greenyellow
MMT00000549  NA             213043    1            turquoise
MMT00000550  NA             NA        4            yellow
MMT00000602  Scrg3          20286     13           salmon
MMT00000608  Ccl5           20304     3            brown
MMT00000701  V1rc13         171186    0            grey
MMT00000713  Slc7a9         30962     0            grey
MMT00000719  Snrpa          53607     2            blue
MMT00000743  Sqle           20775     16           lightcyan
MMT00000792  NA             233121    2            blue
MMT00000793  C330027C09Rik  224171    10           purple
MMT00000801  4632419K20Rik  74349     0            grey
MMT00000840  Col5a3         53867     0            grey
MMT00000864  BC022744       234542    0            grey
MMT00000887  Gne            50798     1            turquoise
MMT00000963  Serpine1       18787     3            brown
MMT00000988  Cxcl10         15945     6            red
MMT00000996  Tmem25         71687     5            green
MMT00001022  NA             NA        9            magenta
MMT00001077  Ngfrap1        12070     3            brown
MMT00001085  1700001C14Rik  75458     6            red
MMT00001100  Mcoln1         94178     13           salmon
MMT00001110  Galt           14430     2            blue
MMT00001154  2600001J17Rik  70385     10           purple
MMT00001185  Nrarp          67122     1            turquoise
MMT00001190  NA             NA        4            yellow
MMT00001245  Cdca3          14793     10           purple
MMT00001260  Sgol2          68549     10           purple
MMT00001291  NA             NA        2            blue
MMT00001298  D630032B01Rik  214579    0            grey
MMT00001318  Lsm8           76522     9            magenta
MMT00001373  Rnps1          19826     7            black
MMT00001387  Ly108          30925     3            brown
MMT00001394  Frk            14302     11           greenyellow
MMT00001397  Pbef1          59027     5            green
MMT00001423  NA             237119    2            blue
MMT00001434  Dars           226414    9            magenta
MMT00001486  9930023K05Rik  226245    1            turquoise
MMT00001496  BC017158       233913    7            black
MMT00001510  Tcf1           21405     5            green
MMT00001545  NA             NA        2            blue
MMT00001555  Tle1           21885     4            yellow
MMT00001587  Rps3a          20091     5            green
MMT00001596  Myh7           140781    15           midnightblue
MMT00001613  Surf6          20935     7            black
MMT00001646  Hmgb1          15289     11           greenyellow
MMT00001675  C330018D20Rik  77422     0            grey
MMT00001698  Top2a          21973     10           purple
MMT00001714  Gpr48          107515    3            brown
MMT00001732  NA             229076    2            blue
MMT00001791  4930544G21Rik  77629     8            pink
MMT00001806  Cd84           12523     6            red
MMT00001923  Ctrb           66473     12           tan
MMT00001947  Pa2g4          18813     11           greenyellow
MMT00001949  Rpl3l          66211     15           midnightblue
MMT00001995  NA             NA        3            brown
MMT00002002  Unc5b          107449    1            turquoise
MMT00002004  LOC14433       14433     2            blue
MMT00002021  Saa2           20209     17           grey60
MMT00002022  3110050K21Rik  67302     5            green
MMT00002037  Sulf1          240725    1            turquoise