I had the great privilege of attending the Intelligent Systems for Molecular Biology conference in Madison, July 10–14. Here, I’m sharing something interesting I learned there!
Because single-cell data represents a very complex space, models that accurately describe a dataset are typically highly multi-dimensional, which makes them difficult to visualize. Traditional methods like PCA plots, which are very useful for quickly and easily visualizing the differences between bulk RNA-seq samples, are often far too reductive for single-cell RNA sequencing data. The newly developed R package r-APL, published in the Journal of Molecular Biology this June by Gralinska et al., offers Association Plots as an alternative visualization method that avoids the information loss associated with dimensionality reduction and simplifies the process of identifying cluster-specific genes.
r-APL begins by mapping all cells as well as all genes into the same multi-dimensional correspondence analysis space, such that the distance from a gene to a cell reflects the expression of that gene in that cell. Each cell cluster is then represented by the vector from the origin to the centroid of the cluster, and each gene is assigned two coordinates: its distance along that vector (x) and its distance from that vector (y). In the resulting Association Plot for that cluster, the genes most characteristic of the cluster are located far to the right along the x-axis.
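To make the geometry concrete, here is a minimal sketch in Python with NumPy. It is not r-APL's code or API: the function names, the choice of scaling (genes in principal coordinates, cells in standard coordinates), and the toy matrix are all my own assumptions for illustration. It runs a plain correspondence analysis on a genes-by-cells count matrix, then projects each gene onto the direction of a cluster's centroid.

```python
import numpy as np


def correspondence_analysis(counts, n_dims):
    """Plain correspondence analysis of a genes x cells count matrix.

    Returns gene principal coordinates and cell standard coordinates
    (an assumed scaling choice, not taken from the package).
    Assumes there are no all-zero genes or cells.
    """
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # gene (row) masses
    c = P.sum(axis=0)                  # cell (column) masses
    expected = np.outer(r, c)          # independence model
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    U, sing, Vt = U[:, :n_dims], sing[:n_dims], Vt[:n_dims]
    genes = (U / np.sqrt(r)[:, None]) * sing   # principal coordinates
    cells = Vt.T / np.sqrt(c)[:, None]         # standard coordinates
    return genes, cells


def association_plot_coords(genes, cells, cluster_cells):
    """x = distance along the cluster centroid vector, y = distance from it."""
    centroid = cells[cluster_cells].mean(axis=0)
    unit = centroid / np.linalg.norm(centroid)
    x = genes @ unit
    # Pythagoras: squared length minus squared projection, clipped at zero
    # to guard against tiny negative values from rounding.
    y = np.sqrt(np.clip(np.sum(genes**2, axis=1) - x**2, 0.0, None))
    return x, y


# Hypothetical toy data: genes 0-1 are high in cells 0-2, genes 2-3 in cells 3-5.
toy_counts = np.array([
    [9, 9, 9, 1, 1, 1],
    [9, 9, 9, 1, 1, 1],
    [1, 1, 1, 9, 9, 9],
    [1, 1, 1, 9, 9, 9],
])
genes, cells = correspondence_analysis(toy_counts, n_dims=1)
x, y = association_plot_coords(genes, cells, cluster_cells=[0, 1, 2])
```

In this toy case the two genes that are high in cells 0 to 2 end up with positive x and the other two with negative x, which is the "to the right" behavior described above. The number of dimensions to keep matters on real data, since trailing dimensions with near-zero singular values carry only noise.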
Additionally, r-APL computes a score Sα for each gene, where a score greater than 1 indicates, with 99% confidence (as determined by random permutations of the data), that the gene's association with the cluster is not due to random chance alone. This allows the genes to be color-coded by association strength, which makes the Association Plots easier to interpret. These genes can then be compared to known marker genes to determine the cell type of the cluster – or can be used to identify novel marker genes for a cell type or subtype. Finally, GO-based functional enrichment analysis of these genes can be mapped onto the same Association Plots to visualize how strongly specific functional terms are represented by cluster-specific genes, as shown in this image from the publication.
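For intuition about how a permutation-based score like this can work, here is a rough Python sketch. I'm assuming a score of the form x − y / tan(α), where α is an angle from the x-axis chosen so that only about 1% of genes from permuted data fall below it. Both the formula and the way α is picked are assumptions on my part, and the function names are hypothetical, so check the publication for the exact definition and scale the package reports.

```python
import numpy as np


def cutoff_angle(x_perm, y_perm, keep=0.99):
    """Pick an angle so that a fraction `keep` of permuted genes lie at or above it.

    x_perm, y_perm: Association Plot coordinates computed on permuted data
    (assumed to be available already; y is a distance, so y >= 0).
    """
    angles = np.arctan2(y_perm, x_perm)        # angle of each gene from the x-axis
    alpha = np.quantile(angles, 1.0 - keep)    # e.g. the 1st percentile
    # Keep alpha strictly inside (0, pi/2) so tan(alpha) is positive and finite.
    return float(np.clip(alpha, 1e-6, np.pi / 2 - 1e-6))


def s_alpha(x, y, alpha):
    """Assumed score: positive only for genes below the cutoff angle."""
    return np.asarray(x) - np.asarray(y) / np.tan(alpha)


# Hypothetical stand-in for coordinates from permuted data.
rng = np.random.default_rng(0)
x_perm = rng.normal(size=1000)
y_perm = np.abs(rng.normal(size=1000))

alpha = cutoff_angle(x_perm, y_perm)
scores_perm = s_alpha(x_perm, y_perm, alpha)
```

Under this construction, genes that sit far to the right and close to the x-axis get the largest scores, while roughly 99% of the permuted genes score at or below zero, which is what makes the score usable for color-coding and ranking.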
From an end-user perspective, I appreciate that the plots generated by r-APL are interactive: they can be zoomed and dragged, and they show gene information when the cursor hovers over a point on the graph. Also, since the whole package runs in R, the tabular data is readily accessible, so exporting ranked lists of cluster-specific differentially expressed genes is straightforward. I’m looking forward to incorporating this tool into our core’s single-cell pipeline!