Science Spotlight: Integrating Function and Phylogeny in Metagenomics Analysis

For the past month I’ve been mostly immersed in developing and testing a whole-genome metagenomics workflow for the core, and I’ve noticed that available open-source tools tend to fit into two categories: annotation of functional genes and differentiation between functional pathway abundances, or identification of taxonomic and phylogenetic categories in the microbial population. What can be more difficult is identifying which specific functions are common because they confer a selective advantage in a given ecosystem, as opposed to functions that are prevalent simply because they are present in a taxon that is dominating the ecosystem for an unrelated reason.

That is where POMS comes in!

Described this November in “Integrating phylogenetic and functional data in microbiome studies” by Gavin Douglas, Molly Hayes, Morgan Langille, and Elhanan Borenstein, POMS (Phylogenetic Organization of Metagenomic Signals) is an R package that seeks to identify functional pathways that are over-represented across multiple taxa in one sample group as compared to another, and which are therefore reasonably associated with the selective pressures imposed upon that sample group as opposed to the other. Using assembled microbial genomes from all samples in a metagenomic dataset, POMS creates a phylogenetic tree and evaluates the composition of each sample group against that tree, marking nodes at which all taxa in a subtree have increased representation in a given sample group. Next, POMS takes functional abundance information, obtained from the gene annotations of the assembled genomes, to determine which nodes mark subtrees with increased representation of a given function. The combination of skews in taxonomic representation with skews in functional representation allows the software to highlight functions that are likely to be biologically relevant to the difference between the sample groups being compared.
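To make the two-step idea above more concrete, here is a toy Python sketch of the logic as I understand it from the description. It is not POMS code and does not use the POMS API: the tree, the abundance table, the function names (`taxon_skew`, `function_skew`, `tally`) and the fixed cutoffs are all made up for illustration, and the cutoffs stand in for the proper statistical tests that a real tool would use. The tree is also hard-coded here rather than built from assembled genomes.

```python
import math
from statistics import mean

# Toy rooted binary tree: a leaf is a genome name, an internal node is a (left, right) pair.
TREE = ((("g1", "g2"), "g3"), ("g4", ("g5", "g6")))

# Hypothetical genome abundances per sample, and the group each sample belongs to.
ABUNDANCE = {
    "g1": {"s1": 50, "s2": 60, "s3": 2, "s4": 1},
    "g2": {"s1": 40, "s2": 30, "s3": 1, "s4": 3},
    "g3": {"s1": 2, "s2": 1, "s3": 40, "s4": 50},
    "g4": {"s1": 45, "s2": 55, "s3": 2, "s4": 2},
    "g5": {"s1": 1, "s2": 2, "s3": 30, "s4": 40},
    "g6": {"s1": 2, "s2": 1, "s3": 35, "s4": 45},
}
GROUPS = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

# Genomes whose annotations include the (hypothetical) function of interest.
ENCODERS = {"g1", "g2", "g4"}


def leaves(node):
    """Return the genome names under a node."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def internal_nodes(node):
    """Yield every internal (left, right) node, root first."""
    if isinstance(node, str):
        return
    yield node
    yield from internal_nodes(node[0])
    yield from internal_nodes(node[1])


def taxon_skew(node, abundance, groups, min_diff=1.0):
    """Return the group ('A' or 'B') in which the left subtree is relatively
    more abundant than the right one, or None if the difference is small.
    The min_diff cutoff is arbitrary and replaces a real statistical test."""
    left, right = leaves(node[0]), leaves(node[1])
    ratios = {"A": [], "B": []}
    for sample, group in groups.items():
        l = sum(abundance[g][sample] for g in left) + 1  # +1 pseudocount
        r = sum(abundance[g][sample] for g in right) + 1
        ratios[group].append(math.log(l / r))
    diff = mean(ratios["A"]) - mean(ratios["B"])
    if abs(diff) < min_diff:
        return None
    return "A" if diff > 0 else "B"


def function_skew(node, encoders, min_diff=0.5):
    """Return 'left' or 'right' for the subtree in which a larger fraction of
    genomes encode the function, or None if the fractions are similar."""
    left, right = leaves(node[0]), leaves(node[1])
    frac_left = sum(g in encoders for g in left) / len(left)
    frac_right = sum(g in encoders for g in right) / len(right)
    if abs(frac_left - frac_right) < min_diff:
        return None
    return "left" if frac_left > frac_right else "right"


def tally(tree, abundance, groups, encoders):
    """Count nodes where the function-enriched subtree is also the subtree
    that is relatively more abundant in group A, or in group B."""
    counts = {"A": 0, "B": 0}
    for node in internal_nodes(tree):
        group_for_left = taxon_skew(node, abundance, groups)
        side = function_skew(node, encoders)
        if group_for_left is None or side is None:
            continue  # node is uninformative for taxa, for the function, or both
        if side == "left":
            counts[group_for_left] += 1
        else:
            counts["B" if group_for_left == "A" else "A"] += 1
    return counts


print(tally(TREE, ABUNDANCE, GROUPS, ENCODERS))
```

With this toy data, the two nodes that are informative for both taxa and function each point toward group A, in separate parts of the tree. That kind of consistent signal across multiple lineages is what distinguishes a function that tracks a sample group from one that is simply carried along by a single dominant taxon.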

In the paper, the authors evaluated both simulated and real metagenomic datasets with POMS as well as multiple differential expression tools (originally developed for the RNA-seq world) and phylogenetic regression. Unsurprisingly, POMS performed much better than the differential expression tools, which flagged almost every gene function as significant and made it difficult to interpret which functions were meaningfully differentially expressed. Also unsurprisingly, POMS was unable to match the resolution provided by phylogenetic regression, given that it can only identify functions that are widely but not universally present across taxa, and is therefore unable to classify anything ubiquitous or taxonomically unique as significantly differentially expressed. However, what POMS was able to do well was identify meaningful differential functional terms (both those that had been chosen to be significant in the simulated tests, and functions that made biological sense for the validation data) in a way that was more straightforward to interpret than either of the other popular methods.

These results suggest that POMS is a useful tool to add to our metagenomics workflow here at the core, allowing us to step beyond characterization of individual samples’ metagenomes into between-group comparisons with more confidence in our methodology and more biological value in our conclusions. I’m planning on implementing it with our current project and would love to hear from anyone who has experience with this or similar tools!

Citations

Gavin M Douglas, Molly G Hayes, Morgan G I Langille, Elhanan Borenstein, Integrating phylogenetic and functional data in microbiome studies, Bioinformatics, Volume 38, Issue 22, 15 November 2022, Pages 5055–5063, https://doi.org/10.1093/bioinformatics/btac655
