Most people are familiar with the idea of the microbiome – the communities of microbes that populate just about everything, from soil to air conditioning ducts and from plant surfaces to animal guts. We’re coming to understand that these communities can profoundly influence their environments or serve as markers of environmental health. But it is still difficult to accurately and completely characterize these communities. Traditional methods of isolating, culturing, and identifying each microbial species individually are time-consuming and laborious, and fail to capture the many microbes that can’t easily be grown in a lab environment.
One method of identifying microbes in a population uses the 16S ribosomal RNA gene, a highly conserved region of the genome that codes for part of the ribosome (the protein production factory of the cell), to quickly and affordably obtain a representation of the population. Since portions of the 16S ribosomal RNA gene are almost identical in most bacterial species, and are interspersed with more variable regions, we can target the identical sequences as endpoints of short DNA copies containing a more variable region, and then sequence just those specific regions instead of the entire genome. Once the DNA sequences have been read, we can count the individual variants and compare the similarity between them to get an idea of the diversity of a microbial population, both in terms of the evenness of variant distribution and the distance between those variants. Variants can also be compared to reference databases to assign taxonomic information to the population data.
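To make the idea of counting variants and measuring evenness and distance a little more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular pipeline works: the reads are invented toy sequences, the function names are hypothetical, and the choice of Shannon diversity, Pielou evenness, and a simple mismatch fraction is an assumption about which metrics one might pick (real analyses often use phylogeny-aware distances instead).

```python
import math
from collections import Counter


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over variants with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(number of observed variants)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not meaningful with fewer than two variants
    return shannon_diversity(counts) / math.log(observed)


def mismatch_fraction(seq_a, seq_b):
    """Fraction of differing positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Toy data (invented): ten short "reads" covering three sequence variants.
reads = ["ACGTACGT"] * 6 + ["ACGTACGA"] * 3 + ["TCGAACGA"] * 1
variant_counts = Counter(reads)

counts = list(variant_counts.values())
print("variants observed:", len(variant_counts))
print("Shannon diversity:", round(shannon_diversity(counts), 3))
print("Pielou evenness:", round(pielou_evenness(counts), 3))
print("distance between two variants:", mismatch_fraction("ACGTACGT", "ACGTACGA"))
```

A population dominated by one variant scores low on evenness even if many variants are present, which is why evenness and between-variant distance are reported alongside a simple variant count.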
Advantages of this kind of targeted metagenomic sequencing (often just referred to as 16S sequencing, after the name of the gene) include affordability and analytic simplicity. The sample preparation is straightforward and the required read depth is low, allowing hundreds of samples to be sequenced at a time, while tools like QIIME 2 and robust reference databases like SILVA make it easy for researchers to analyze the data without needing much specialized informatic training or high-performance computing resources. On the other hand, 16S sequencing is constrained by the inherently limited variability of the 16S gene itself. It is not usually possible to identify bacteria at the species level, as there isn’t enough difference between species in the same genus over the short sections of the gene typically used, and the low read depth often means that rare taxa are left unrepresented. Additionally, 16S sequencing is limited to diversity and taxonomy analyses, and is unable to capture functional information (for example, the presence and prevalence of antibiotic resistance genes in a population, or of genes involved in processing specific metabolites).
Shotgun metagenomic sequencing, also called whole genome metagenomic sequencing, seeks to address both of those shortcomings. With this method, all the DNA from a microbial population is sequenced at once, and then reassembled into the individual genomes. The genome sequences can be matched to known genomes for taxonomic identification, and their relative abundances can be used for the same kind of diversity analyses that are possible with 16S sequencing. However, because the entire genome is sequenced, it is also possible to measure antibiotic resistance genes or compare enriched functional pathways between microbiome samples, both of which can be very informative regarding the microbiome’s interaction with the wider ecosystem.
Not unexpectedly, the drawbacks of whole genome metagenomic sequencing mirror the advantages of 16S metagenomic sequencing: it is more expensive due to higher read depth requirements, and the necessary bioinformatics is far more complicated and time-consuming. While a decent overview of a microbial population can be gained with only 10,000 to 20,000 16S reads, a reasonably complex microbiome would need 10 to 12 million reads to obtain usable whole genome metagenomic results. There is some economy of scale involved in the sequencing cost, but this is still a much larger investment. Similarly, whole genome metagenomic analysis requires more powerful computing resources, more familiarity and comfort with computer programming and bioinformatics, and more time. Many individual tools often need to be used in concert to obtain the best results, and the field is still developing.
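The gap in read depth is easier to appreciate as arithmetic. The short Python sketch below uses the read ranges quoted above; the run capacity is a purely hypothetical figure chosen for illustration (it does not describe any specific instrument), and the function names are made up for this example. In practice, the number of available sample barcodes also caps how many samples can share a run.

```python
def fold_difference(shotgun_reads, amplicon_reads):
    """How many times more reads a shotgun sample needs than a 16S sample."""
    return shotgun_reads / amplicon_reads


def samples_per_run(run_capacity_reads, reads_per_sample):
    """Whole samples that fit in one run at the given per-sample depth."""
    return run_capacity_reads // reads_per_sample


# Per-sample read ranges taken from the text above (low, high).
AMPLICON_READS = (10_000, 20_000)
SHOTGUN_READS = (10_000_000, 12_000_000)

# Hypothetical run capacity, an assumption made only for this illustration.
RUN_CAPACITY_READS = 25_000_000

# Narrowest gap: least shotgun depth against most 16S depth, and vice versa.
lowest_fold = fold_difference(SHOTGUN_READS[0], AMPLICON_READS[1])
highest_fold = fold_difference(SHOTGUN_READS[1], AMPLICON_READS[0])
print(f"shotgun needs {lowest_fold:.0f}x to {highest_fold:.0f}x more reads per sample")

print("16S samples per run:", samples_per_run(RUN_CAPACITY_READS, AMPLICON_READS[1]))
print("shotgun samples per run:", samples_per_run(RUN_CAPACITY_READS, SHOTGUN_READS[1]))
```

Under these assumptions a shotgun sample needs somewhere between 500 and 1,200 times as many reads as a 16S sample, which is the main reason the per-sample cost differs so much.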
Because of all these differences, there isn’t one best choice between these options; it depends on your experimental aims, as well as your available funding. If you are primarily interested in taxonomy and ecological diversity, it may be preferable to choose 16S sequencing and take advantage of the affordability to sequence many more replicates and gain statistical confidence. On the other hand, if you want to know about antibiotic resistance or other functional aspects of the microbiomes you’re studying, and if you have bioinformatics expertise and computing resources available to you, then whole genome metagenomic sequencing can give you data and insights that 16S sequencing cannot provide.