This module is used in the metagenomics, paired metagenomics/metatranscriptomics, and viral-enriched metagenomics workflows.
Assembly-Free Taxonomic Classification with Kraken
This taxonomic classification is a read-based method that uses Kraken2, Bracken, and KrakenTools. Individual reads are assigned to taxa based on k-mer similarity to sequences in the Kraken reference database, and the population abundance of the identified taxa is then determined at each taxonomic level.
From the species-level population abundance for each sample, we calculate alpha and beta diversity metrics (alpha diversity is reported as the Shannon, Simpson, Fisher, Berger-Parker, and Inverse Simpson indices). Each categorical variable in the project metadata is plotted as a box-violin plot for each alpha diversity metric, while each quantitative variable is plotted as a scatter plot. The box-violin plots include pairwise statistical comparisons between groups and report p-values for statistically significant differences. When a linear relationship between a quantitative variable and alpha diversity appears to be present, we use a linear regression model to evaluate the strength of the correlation. For beta diversity, we use a correlation plot to visualize similarities between samples.
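As an illustration of what the alpha and beta diversity metrics measure, here is a minimal Python sketch that works on a list of species-level read counts. It is not the module's own code. The definitions are assumptions: Simpson is taken as 1 minus the sum of squared proportions, Inverse Simpson as the reciprocal of that sum, and Berger-Parker as the proportion of the most abundant taxon. Fisher's alpha is left out because it needs a numerical solve. Bray-Curtis dissimilarity is shown only as a common example of a beta diversity measure; the text above does not state which measure the module uses.

```python
import math


def alpha_diversity(counts):
    """Return several alpha diversity indices for one sample.

    counts: iterable of non-negative read counts, one per species.
    The index definitions used here are assumptions (see the lead-in).
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no classified reads")
    props = [c / total for c in counts]
    sum_sq = sum(p * p for p in props)
    return {
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - sum_sq,
        "inverse_simpson": 1.0 / sum_sq,
        "berger_parker": max(props),
    }


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} dictionaries."""
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total
```

For a sample with two equally abundant species, this sketch gives a Shannon index of ln 2 and an Inverse Simpson index of 2. Two samples with no taxa in common have a Bray-Curtis dissimilarity of 1.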
Finally, we use Krona to create interactive sunburst visualizations of population abundance for each sample, providing useful and beautiful graphics representing the complexity of hierarchical taxonomy within individual samples.
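The classification, abundance estimation, and Krona steps described above could be chained roughly as in the Python sketch below, which only builds the command lines and does not run them. It is not the module's actual script. The database path, read length, thread count, and the helper name `build_commands` are hypothetical, and the output names simply mirror the example file names used in this document. The flags are the ones documented for Kraken2, Bracken, KrakenTools (`kreport2krona.py`), and Krona (`ktImportText`); check them against your installed versions.

```python
import shlex


def build_commands(sample, reads_1, reads_2, db, read_len=150, level="S", threads=8):
    """Build command lines (as argument lists) for one paired-end sample.

    All default values are placeholders, not the module's real settings.
    """
    prefix = f"{sample}_SQP_k2pf_pe"
    kraken = [
        "kraken2", "--db", str(db), "--threads", str(threads), "--paired",
        # For paired input, kraken2 replaces '#' with _1 and _2.
        "--classified-out", f"{sample}_k2pf_pe_clsf#.fq",
        "--unclassified-out", f"{sample}_k2pf_pe_uncl#.fq",
        "--output", f"{prefix}.kraken2",
        "--report", f"{prefix}.k2report",
        str(reads_1), str(reads_2),
    ]
    bracken = [
        "bracken", "-d", str(db), "-i", f"{prefix}.k2report",
        "-o", f"{prefix}_{level}.bracken",
        "-w", f"{prefix}_{level}.breport",
        "-r", str(read_len), "-l", level,
    ]
    krona_text = [
        "kreport2krona.py", "-r", f"{prefix}_{level}.breport",
        "-o", f"{prefix}_{level}.b.krona.txt",
    ]
    krona_html = [
        "ktImportText", f"{prefix}_{level}.b.krona.txt",
        "-o", f"{prefix}_{level}.b.krona.html",
    ]
    return [kraken, bracken, krona_text, krona_html]


if __name__ == "__main__":
    # Hypothetical inputs, shown only to print the resulting commands.
    for cmd in build_commands("sampleID", "sampleID_1.fq", "sampleID_2.fq", "/path/to/kraken-db"):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to inspect or log before handing them to `subprocess.run`, and repeating the Bracken step once per taxonomic level would yield one raw file and one report per level.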
Example Folder Structure
The output from these scripts is returned in a folder structure similar to the one shown below. In addition to the classified reads used for the downstream diversity and population abundance calculations, unclassified reads are returned in case you want to attempt classification with alternative methods, now or in the future, as the reference databases continue to be updated.
For similar reasons, the intermediate raw and report files from Kraken and Bracken are also included. Raw Kraken files contain the classification for each read, and the Kraken reports contain the number of reads identified for each classification. Bracken raw files contain the number of reads identified for each taxon at the specified taxonomic level (this includes the Kraken reads identified for that taxon as well as reads assigned by Kraken to any sublevel of that taxon), and the Bracken reports include the expanded hierarchy information for the raw read counts.
-kraken-assembly-free-analysis
|--classified-fastq
|  |--sampleID_k2pf_pe_clsf_1.fq
|  |--sampleID_k2pf_pe_clsf_2.fq
|  |--...per sample
|--unclassified-fastq
|  |--sampleID_k2pf_pe_uncl_1.fq
|  |--sampleID_k2pf_pe_uncl_2.fq
|  |--...per sample
|--kraken-raws
|  |--sampleID_SQP_k2pf_pe.kraken2
|  |--...per sample
|--kraken-reports
|  |--sampleID_SQP_k2pf_pe.k2report
|  |--...per sample
|--bracken-raws
|  |--sampleID_SQP_k2pf_pe_D.bracken
|  |--sampleID_SQP_k2pf_pe_K.bracken
|  |--sampleID_SQP_k2pf_pe_P.bracken
|  |--sampleID_SQP_k2pf_pe_C.bracken
|  |--sampleID_SQP_k2pf_pe_O.bracken
|  |--sampleID_SQP_k2pf_pe_F.bracken
|  |--sampleID_SQP_k2pf_pe_G.bracken
|  |--sampleID_SQP_k2pf_pe_S.bracken
|  |--...per sample
|--bracken-reports
|  |--sampleID_SQP_k2pf_pe_D.breport
|  |--sampleID_SQP_k2pf_pe_K.breport
|  |--sampleID_SQP_k2pf_pe_P.breport
|  |--sampleID_SQP_k2pf_pe_C.breport
|  |--sampleID_SQP_k2pf_pe_O.breport
|  |--sampleID_SQP_k2pf_pe_F.breport
|  |--sampleID_SQP_k2pf_pe_G.breport
|  |--sampleID_SQP_k2pf_pe_S.breport
|  |--...per sample
|--bracken-diversity
|  |--berger-category.png
|  |--fisher-category.png
|  |--inverse.simpson-category.png
|  |--shannon-category.png
|  |--simpson-category.png
|  |--...per category
|  |--beta-diversity.pdf
|  |--beta-diversity.txt
|--krona-tabular-output
|  |--sampleID_SQP_k2pf_pe_S.b.krona.txt
|  |--...per sample
|--krona-html-plots
|  |--sampleID_SQP_k2pf_pe_S.b.krona.html
|  |--...per sample
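If you want to work with the Bracken raw files in the bracken-raws folder yourself, a minimal Python sketch such as the following could load one of them into a dictionary of estimated read counts per taxon. It assumes the standard tab-separated Bracken output with a header row that includes the `name` and `new_est_reads` columns. The helper name `read_bracken` and the two example rows are made up for illustration and are not real results.

```python
import csv
import io


def read_bracken(handle):
    """Map taxon name to Bracken's estimated read count.

    handle: an open text file (or file-like object) in Bracken's
    tab-separated output format, assumed to have a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: int(row["new_est_reads"]) for row in reader}


# Illustrative content only; the numbers below are invented.
example = io.StringIO(
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t900\t100\t1000\t0.80000\n"
    "Bacillus subtilis\t1423\tS\t200\t50\t250\t0.20000\n"
)
species_counts = read_bracken(example)
```

With a real file, you would pass `open("sampleID_SQP_k2pf_pe_S.bracken", newline="")` instead of the in-memory example.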
Next module for metagenomics: Assembly-Free Taxonomic and Functional Classification with HUMAnN
Next module for paired metagenomic/metatranscriptomics: Assembly-Free Taxonomic and Functional Classification with HUMAnN
Next module for viral-enriched metagenomics: Assembly-Free Taxonomic and Functional Classification with HUMAnN