
ISMB 2022: r-APL for Visualizing Single-cell Data

I had the great privilege of attending the Intelligent Systems for Molecular Biology conference in Madison, July 10–14. Here, I’m sharing something interesting I learned there!

Because single-cell data represents a very complex space, models that accurately describe a dataset are typically highly multi-dimensional, which makes them difficult to visualize. Traditional methods like PCA plots, which are very useful for quickly and easily visualizing the differences between bulk RNA-seq samples, are often far too reductive for single-cell RNA sequencing data. The newly developed R package r-APL, published in the Journal of Molecular Biology this June by Gralinska et al., offers Association Plots as an alternative visualization method that avoids the information loss associated with this kind of dimensionality reduction and simplifies the process of identifying cluster-specific genes.

r-APL begins by mapping all cells and all genes into the same multi-dimensional correspondence analysis (CA) space, such that the distance from a gene to a cell reflects the expression of that gene in that cell. Each cell cluster is then represented by the vector from the origin to the centroid of the cluster, and each gene is assigned two coordinates: its distance along that vector and its perpendicular distance from it. In the resulting Association Plot for that cluster, the genes most characteristic of the cluster are located far to the right, close to the x-axis.

Graphical description of how to create association plots, from the original publication
Figure 1. Association Plots delineate cluster-specific genes. (a) In a high-dimensional CA space a cluster of cells (orange dots) defines a direction, here represented by the orange line pointing from the origin to the centroid of the cell cluster. The genes (black dots) associated to this cluster of cells are located close to this line along its direction. (b) For the Association Plot we only use the length from the origin to the gene’s projection onto the orange line (d*cos(φ)) as the first coordinate of the gene in the Association Plot. The length of the perpendicular distance from the gene to the line (d*sin(φ)) is the second coordinate. Thus, the x-axis of the Association Plot corresponds to the line pointing towards the cluster centroid shown at the end of the vector. Image and caption from Gralinska et al., under a non-commercial Creative Commons license.
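
To make the geometry concrete, here is a minimal sketch of the projection described above, written in plain Python rather than R. It is not r-APL's code or API: the function names (`centroid_of`, `association_plot_coords`) and the toy three-dimensional coordinates are made up for illustration, and a real analysis would start from the correspondence analysis coordinates that the package computes.

```python
import math


def centroid_of(cells):
    """Average the coordinates of a cluster's cells, dimension by dimension."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]


def association_plot_coords(gene, centroid):
    """Return (x, y) for one gene relative to one cluster.

    x: signed length of the gene's projection onto the line from the
       origin to the cluster centroid (d * cos of the enclosed angle).
    y: perpendicular distance from the gene to that line (d * sin).
    """
    centroid_norm = math.sqrt(sum(c * c for c in centroid))
    if centroid_norm == 0:
        raise ValueError("centroid must not sit at the origin")
    # Dot product with the unit vector pointing at the centroid.
    x = sum(g * c for g, c in zip(gene, centroid)) / centroid_norm
    # Pythagoras: d^2 = x^2 + y^2. Clamp to guard against rounding below zero.
    d_squared = sum(g * g for g in gene)
    y = math.sqrt(max(d_squared - x * x, 0.0))
    return x, y


# Hypothetical coordinates, invented for this example.
cluster_cells = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
centroid = centroid_of(cluster_cells)

genes = {
    "geneA": [5.0, 0.0, 0.0],   # lies exactly on the cluster direction
    "geneB": [1.0, 2.0, 2.0],   # mostly off the cluster direction
    "geneC": [-1.0, 0.0, 3.0],  # points away from the cluster
}
coords = {name: association_plot_coords(vec, centroid) for name, vec in genes.items()}

for name, (x, y) in coords.items():
    print(f"{name}: x={x:.3f}, y={y:.3f}")
```

In this toy example, geneA ends up far to the right with y = 0, which is where a strongly cluster-specific gene would sit, while geneB and geneC land closer to (or left of) the origin with larger y values.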

Additionally, r-APL computes a score Sα for each gene. A score greater than 1 means a 99% chance that the gene’s association with the cluster is not due to random chance alone, as determined by random permutations of the data. The score allows the genes to be color-coded by association strength, which makes the Association Plots easier to interpret. The highest-scoring genes can then be compared to known marker genes to determine the cell type of the cluster, or used to identify novel marker genes for a cell type or subtype. Finally, GO-based functional enrichment analysis of these genes can be mapped onto the same Association Plots to visualize how strongly specific functional terms are represented by cluster-specific genes, as shown in this image from the publication.

Association Plot with marked genes for a specific GO term of interest
Figure 9. Location of genes annotated to the GO term ‘GO:0050853 B cell receptor signaling pathway’ in the Association Plot for the lymphoid cells. Genes belonging to this GO category are marked using black stars. Image and caption from Gralinska et al., under a non-commercial Creative Commons license.

From an end-user perspective, I appreciate that the plots generated by r-APL are interactive: they can be zoomed and dragged, and they show gene information when you hover the cursor over a point on the graph. Also, since the whole package runs in R, the tabular data is readily accessible, so exporting ranked lists of cluster-specific differentially expressed genes is straightforward. I’m looking forward to incorporating this tool into our core’s single-cell pipeline!

 

 
