Before investing in a large and expensive RNA sequencing experiment, it’s a good idea to ensure your sample size is sufficient to obtain good statistical power for your differential expression analysis (typically, a power of at least 0.8). This can be particularly tricky for RNAseq experiments in general due to variation in coverage depth, sample size, and natural population variability — as well as the thousands of individual gene comparisons that are involved.

One way to accommodate these factors is to calculate power-to-sample-size relationships using simulations based on model data with similar parameters to your experiment of interest, and the R package PROPER (PROspective Power Evaluation for RNAseq) makes this process quick and painless.

To begin, let’s install PROPER:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("PROPER") library("PROPER")

You’ll also need to install and load either edgeR or DEseq — whichever tool you are planning to use for your own downstream analysis. Here, I’ll be choosing edgeR.

BiocManager::install("edgeR") library("edgeR")

PROPER offers four build-in data sets to use for simulation modeling. The first, called “Cheung”, has the largest dispersion and represents unrelated human individuals; the third, called “Bottomly”, has a fairly small dispersion as the sources are inbreed mice. The second (“Gilad”) has a mid-range dispersion, and the final option (“MAQC”) consists of technical replicates and thus has almost no variability. If you’re unsure where your experiment falls, you can use the Cheung model to get the most conservative predictions.

So, first we’ll define our simulation model using the included Cheung data:

sim.opts.Cheung = RNAseq.SimOptions.2grp(ngenes = 20000, p.DE=0.05, lOD="cheung", lBaselineExpr="cheung")

Next, we actually run the simulation; this is the most time-consuming step, as the software is creating simulated data and calculated differential expression values using the RNAseq package we specify. Here, we can include a vector of sample sizes we want to test power for; these should be personalized according to what is reasonable for your experimental design (and funding!). The PROPER manual recommends running at least 20 simulations to obtain a stablized model, but I’ve found more consistent results with 100 simulations. On my MacBook, it takes about 30 minutes to run 100 simulations with 5 different group sizes, which is pretty reasonable (especially since 5 group sizes is a bit more than you typically need to run).

simres.C = runSims(Nreps = c(3, 5, 7, 10, 15), sim.opts=sim.opts.Cheung, DEmethod="edgeR", nsims=100)

One of the nice aspects of PROPER is that once these simulations have been run, the power calculations themselves are very quick, and can be modified and run multiple times if different parameters are desired. Here, I’ve chosen to use an FDR value of 0.1 for calling DE genes, but you can also call by p-value. Gene stratification (binning for downstream plots) can be done according to expression — which is a great way to visualize how greater coverage impacts the power of your study — or over-dispersion. The “delta” field lets you decide how much of a fold-change you consider to be biologically relevant, and can give you some additional flexibility with otherwise under-powered experiments. In addition, you can choose to filter your data, reflecting the removal of low-count genes that is typically performed in RNAseq analysis.

Let’s run a few different options to see how they affect the calculations:

# default power calculations from the PROPER vignette powers.C = comparePower(simres.C, alpha.type="fdr", alpha.nominal=0.1, stratify.by="expr", delta=0.5) # decreasing the acceptable FDR for DE genes powers.C.stringent = comparePower(simres.C, alpha.type="fdr", alpha.nominal=0.05, stratify.by="expr", delta=0.5) # increasing the targeted fold-change powers.C.largeD = comparePower(simres.C, alpha.type="fdr", alpha.nominal=0.1, stratify.by="expr", delta=2) # filtering out genes with read depth <10 powers.C.filt = comparePower(simres.C, alpha.type="fdr", alpha.nominal=0.1, strata = c(0, 10, 2^(1:7)*10, Inf), filter.by="expr", strata.filtered=1, stratify.by="expr", delta=0.5)

The quickest way to view the results is with summaryPowers, though PROPER also includes a number of useful plotting tools. The following table shows the output of summaryPowers, while the plot below shows the results of plotAll for the filtered Cheung dataset. Each of the graphs could also be made independently.

summaryPower(powers.C) summaryPower(powers.C.stringent) summaryPower(powers.C.largeD) summaryPower(powers.C.filt) plotAll(powers.C.filt)

As an aside, the plot of false discovery rate here shows just how important sample replication is for the confidence of RNAseq results: going from just 3 to 5 replicates, with an average read depth of 10–20, has a lower FDR than 3 replicates at an average depth of more than 640 reads! It’s also probably going to be more cost-effective, despite the increased library preparation cost.

In general, however, what this analysis tells us is that for a pairwise differential expression experiment in highly-variable human samples, filtering out low-count genes, targeting a false discovery rate of 0.1, and retaining all genes with a log2 fold-change greater than 0.5 in either direction, we would ideally use at least 7 replicates to obtain a power of 80%. Obviously, that’s a lot of conditions! But by the same token, that is a lot of built-in flexibility that you can take advantage of to make your power analysis as relevant as possible to your experiment.