This module is used for metagenomics, paired metagenomics/metatranscriptomics, and viral-enriched metagenomics workflows.
DNA Abundance Profiling
Using the predicted proteins and each sample's alignment file against the co-assembly, we generate a count matrix representing the entire dataset. Both the annotated and unannotated versions of the matrix are included in the data return.
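As a rough illustration of what such a matrix looks like, the sketch below combines per-sample gene counts into one gene-by-sample table and then attaches annotations. It is not the pipeline's actual code: the function names, the sample and gene identifiers, the `KO_demo1` label, and the choice to leave unannotated genes with an empty annotation are all assumptions made for this example.

```python
def build_count_matrix(per_sample_counts):
    """Combine {sample: {gene: count}} into (samples, {gene: [count per sample]}).

    Genes missing from a sample are given a count of 0.
    """
    samples = sorted(per_sample_counts)
    genes = sorted({g for counts in per_sample_counts.values() for g in counts})
    matrix = {
        g: [per_sample_counts[s].get(g, 0) for s in samples]
        for g in genes
    }
    return samples, matrix


def annotate(matrix, annotations):
    """Attach an annotation to each gene row (assumed: empty string if none)."""
    return {g: (annotations.get(g, ""), row) for g, row in matrix.items()}


# Hypothetical per-sample counts of reads aligned to co-assembly genes.
per_sample_counts = {
    "sampleA": {"gene1": 5, "gene2": 3},
    "sampleB": {"gene1": 2, "gene3": 7},
}
# Hypothetical annotation table: only gene1 has a functional label.
annotations = {"gene1": "KO_demo1"}

samples, matrix = build_count_matrix(per_sample_counts)
annotated = annotate(matrix, annotations)

# Print the unannotated matrix as tab-separated text.
print("\t".join(["gene"] + samples))
for gene, row in matrix.items():
    print("\t".join([gene] + [str(c) for c in row]))
```

Each row is a gene from the co-assembly and each column is a sample, so the same table covers the whole dataset.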
Using the annotated gene count matrix, we extract a list of KEGG terms for each sample. To visualize the completeness of each KEGG functional module in each sample, we generate PCA plots for each metadata category, a bar plot and a heatmap with the 100 most variable KEGG modules across the dataset. The KEGG functional terms, pathway group, and percent completeness for each sample are provided in tabular format along with the figures.
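To make "percent completeness" and "most variable modules" concrete, here is a minimal sketch. It assumes a simplified definition in which a module's completeness is the percentage of its listed KEGG terms found in a sample; real KEGG module definitions also allow alternative steps, and the pipeline may compute completeness differently. The module names, KO labels, and function names are hypothetical.

```python
from statistics import pvariance


def module_completeness(sample_kos, module_definitions):
    """Percent of each module's required terms present in one sample.

    Simplifying assumption: a module is a flat list of required terms.
    """
    result = {}
    for module, required in module_definitions.items():
        required = set(required)
        result[module] = 100.0 * len(required & sample_kos) / len(required)
    return result


def most_variable_modules(completeness_by_sample, top_n=100):
    """Rank modules by variance of completeness across samples."""
    modules = sorted({m for row in completeness_by_sample.values() for m in row})
    variances = {
        m: pvariance([row.get(m, 0.0) for row in completeness_by_sample.values()])
        for m in modules
    }
    # Highest variance first; ties are broken by module name.
    return sorted(modules, key=lambda m: (-variances[m], m))[:top_n]


# Hypothetical module definitions and per-sample KEGG term lists.
module_definitions = {
    "M_A": ["KO_1", "KO_2"],
    "M_B": ["KO_3", "KO_4"],
    "M_C": ["KO_1"],
}
kos_by_sample = {
    "sample1": {"KO_1", "KO_2", "KO_3"},
    "sample2": {"KO_1"},
}

completeness = {
    sample: module_completeness(kos, module_definitions)
    for sample, kos in kos_by_sample.items()
}
top_modules = most_variable_modules(completeness, top_n=2)
print(completeness)
print(top_modules)
```

In this toy data, `M_C` is fully complete in both samples, so it has zero variance and is dropped when only the two most variable modules are kept. A heatmap restricted to the most variable modules works the same way: modules that look identical across samples carry no contrast and are left out.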
Comparing the co-assembly-based functional PCA plot to the read-based taxonomic PCA plot generated earlier in the workflow can be informative! Two samples can have different population taxonomies but still have similar functional potential. Differences between sample groups that show up in one method but not the other can help identify whether particular species or particular functions correspond better to the observed categorical differences.
Example Folder Structure
--DNA-abundance-profiling
|--pcoa_plot_category.png
|--...per category
|--annotated.gene.count.matrix.tsv
|--gene.count.matrix.tsv
|--kegg.heatmap.pdf
|--ko_map_barplot.pdf
|--ko_map_module_completeness.tab
Next module for metagenomics: Binning for MAG Identification
Next module for paired metagenomic/metatranscriptomics: Binning for MAG Identification
Next module for viral-enriched metagenomics: Binning for MAG Identification