Functional Enrichment Analysis

Results are located in the folder 5.functional-enrichment.

Each pairwise comparison between two sample groups has its own subfolder of results. If an organism has no available functional annotations (for example, a de novo assembly), this entire folder is missing. If an organism has GO annotations but no KEGG annotations, as is the case for most less-common organisms, only the GO enrichment files are included. For organisms among the 1,309 eukaryotic and 8,898 prokaryotic genomes in the KEGG database, a KEGG pathway enrichment analysis is also carried out.

Both GO and KEGG analyses are performed with the R package clusterProfiler. For GO enrichment, the returned files include the lists of significant GO terms identified in the data. These lists are simplified into representative terms, because higher-level GO terms can be statistically significant merely because one or two of their child terms are over-represented. The GO results also include bar plots; network plots that link differentially expressed genes with enriched GO terms; mapping plots that connect GO term nodes based on the overlap between their gene sets; and dot plots of over- and under-expressed genes for each comparison.
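clusterProfiler's over-representation analysis is based on a hypergeometric test, with p-values adjusted for multiple testing (Benjamini-Hochberg by default in enrichGO and enrichKEGG). The Python sketch below illustrates that idea on plain sets of gene IDs. It is not clusterProfiler's code, and the function names are hypothetical. It also simplifies the background: clusterProfiler restricts the universe to annotated genes and filters terms by gene-set size, which is omitted here.

```python
from math import comb


def hypergeom_upper_tail(k, n, M, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background universe
    M: background genes annotated to the term
    n: differentially expressed (DE) genes
    k: DE genes annotated to the term
    """
    upper = min(n, M)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(de_genes, universe, term_to_genes):
    """Test every term that contains at least one DE gene; sort by adjusted p-value."""
    universe = set(universe)
    de = set(de_genes) & universe
    N, n = len(universe), len(de)
    rows = []
    for term, genes in term_to_genes.items():
        term_genes = set(genes) & universe
        k = len(de & term_genes)
        if k == 0:
            continue  # terms with no DE hits are not reported
        rows.append({
            "term": term,
            "k": k,
            "M": len(term_genes),
            "pvalue": hypergeom_upper_tail(k, n, len(term_genes), N),
        })
    for row, padj in zip(rows, benjamini_hochberg([r["pvalue"] for r in rows])):
        row["p_adjust"] = padj
    return sorted(rows, key=lambda r: r["p_adjust"])
```

Calling `enrich(de_genes, all_genes, go_terms)` with a dictionary mapping term IDs to gene lists returns one row per term with at least one DE hit, most significant first.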

Sample functional enrichment figures
Top left: bar plot showing the number of differentially expressed genes per function, colored by adjusted p-value;
Top right: mapping plot connecting enriched functions that have overlapping gene sets, to visualize clusters of similar functional terms;
Bottom left: gene/function network plot showing which differentially expressed genes are associated with each enriched function;
Bottom right: dot plot in which the x-axis is the ratio of the number of differentially expressed genes associated with the function to the number of genes associated with the function in the total gene set, the dot size is the number of differentially expressed genes associated with the function, and the dot color is the adjusted p-value of the function’s enrichment.
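In the mapping plot, an edge between two enriched functions reflects how many genes they share. As an illustration only (not enrichplot's implementation), the sketch below scores each pair of terms by the Jaccard index of their DE gene sets, one of the similarity measures enrichplot can use, and keeps pairs at or above a threshold. The 0.2 cutoff and the function names are assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes in either set (0.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def term_similarity_edges(term_to_de_genes, min_similarity=0.2):
    """Return (term_a, term_b, similarity) for every pair of terms whose
    DE gene sets overlap at least min_similarity (hypothetical cutoff)."""
    edges = []
    for (t1, g1), (t2, g2) in combinations(sorted(term_to_de_genes.items()), 2):
        sim = jaccard(g1, g2)
        if sim >= min_similarity:
            edges.append((t1, t2, sim))
    return edges
```

Terms that share most of their DE genes end up joined by heavy edges, which is why related functions form visible clusters in the plot.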

For KEGG enrichment, the results include a diagram of each significantly enriched pathway, with differentially expressed genes highlighted, and a CSV file containing metrics for all significantly enriched pathways.
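The KEGG CSV can be post-processed with any tool. The sketch below assumes the file follows the layout of clusterProfiler's enrichment result tables (columns such as ID, Description, GeneRatio, BgRatio, p.adjust and geneID, with ratios written as "k/n" and gene IDs joined by "/"). Check the actual header before relying on it. The sketch filters pathways by adjusted p-value and derives a fold enrichment, (k/n)/(M/N), from the two ratio columns. The function name and the 0.05 cutoff are hypothetical.

```python
import csv
from fractions import Fraction


def load_kegg_results(lines, padj_cutoff=0.05):
    """Parse a clusterProfiler-style enrichment table (assumed column names)
    from an open file or any iterable of CSV lines."""
    rows = []
    for row in csv.DictReader(lines):
        padj = float(row["p.adjust"])
        if padj > padj_cutoff:
            continue  # keep only pathways below the (assumed) cutoff
        gene_ratio = Fraction(row["GeneRatio"])  # k/n: DE genes in pathway / DE genes
        bg_ratio = Fraction(row["BgRatio"])      # M/N: background genes in pathway / background
        rows.append({
            "id": row["ID"],
            "description": row["Description"],
            "p_adjust": padj,
            "fold_enrichment": float(gene_ratio / bg_ratio),
            "genes": row["geneID"].split("/"),
        })
    return sorted(rows, key=lambda r: r["p_adjust"])
```

For example, `with open("path/to/kegg_results.csv", newline="") as f: pathways = load_kegg_results(f)` (hypothetical path) yields the enriched pathways, most significant first, each with its list of DE genes.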

