Wednesday 19 June 2013

qPCR normalisation

This post will discuss the normalisation of qPCR results using both the Delta Delta CT (Livak) and Standard Curve (Pfaffl) methods as applied to qPCR of RNA and ChIP.  It will also discuss some of the common pitfalls when considering replicates of qPCR data.

While this isn't a cutting edge technique, the motivation for this post is the number of qPCR spreadsheets I see that get this wrong.

Delta Delta CT


The Livak method is more commonly known as the "Delta Delta CT" (ΔΔCT) method.  It makes one important assumption about the PCR: the amplification efficiencies of the reference control gene and the target gene of interest must be approximately equal.  Specifically, Delta Delta CT assumes that each PCR cycle will exactly double the amount of material in your sample (amplification efficiency = 100%).

$$$ ΔΔCT = ΔCT (untreated sample) - ΔCT (treated sample) $$$

where $$$ ΔCT(sample) = CT(target) - CT(ref) $$$, therefore

$$$ ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated)) $$$

where

$$$CT(target,untreated)$$$ = CT value of gene of interest in untreated sample
$$$CT(ref,untreated)$$$ = CT value of control gene in untreated sample
$$$CT(target,treated)$$$ = CT value of gene of interest in treated sample
$$$CT(ref,treated)$$$ =  CT value of control gene in treated sample

We can then calculate the ratio of our target gene in our treated sample relative to our untreated sample by taking $$$2^{ΔΔCT}$$$.

A quick worked example:

              Untreated   Treated
Ref Gene      16.17       15.895
Target Gene   21.225      19.763


$$$$ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))$$$$
$$$$ΔΔCT = (21.225 - 16.17) - (19.763 - 15.895)$$$$
$$$$ΔΔCT = (5.055) - (3.868)$$$$
$$$$ΔΔCT = 1.187$$$$
$$$$2^{ΔΔCT} = 2^{1.187} = 2.277$$$$

So our gene of interest is increased 2.277-fold in our treated sample versus our untreated sample.
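
As a sanity check, here's the same calculation as a short Python sketch, using the CT values from the table above:

    # Delta Delta CT, assuming 100% amplification efficiency (a doubling per cycle)
    ct = {
        ("target", "untreated"): 21.225, ("ref", "untreated"): 16.17,
        ("target", "treated"):   19.763, ("ref", "treated"):   15.895,
    }

    # Within-sample normalisation: Delta CT = CT(target) - CT(ref)
    d_ct_untreated = ct[("target", "untreated")] - ct[("ref", "untreated")]  # 5.055
    d_ct_treated   = ct[("target", "treated")]   - ct[("ref", "treated")]    # 3.868

    dd_ct = d_ct_untreated - d_ct_treated   # 1.187
    print(2 ** dd_ct)                       # ~2.277, treated relative to untreated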

The exact order in which you do the subtractions doesn't actually make a huge difference, and you'll probably see other people do it slightly differently.  Consider these possibilities:

ΔUntreated vs ΔTreated

$$$ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))$$$
$$$ΔΔCT = (21.225 - 16.17) - (19.763 - 15.895)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,treated) - CT(target,treated)) - (CT(ref,untreated) - CT(target,untreated))$$$
$$$ΔΔCT = (15.895 - 19.763) - (16.17 - 21.225)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,untreated) - CT(target,untreated)) - (CT(ref,treated) - CT(target,treated))$$$
$$$ΔΔCT = (16.17 - 21.225) - (15.895 - 19.763)$$$
$$$ΔΔCT = -1.187$$$

$$$ΔΔCT = (CT(target,treated) - CT(ref,treated)) - (CT(target,untreated) - CT(ref,untreated))$$$
$$$ΔΔCT = (19.763 - 15.895) - (21.225 - 16.17)$$$
$$$ΔΔCT = -1.187$$$


ΔReference Gene vs ΔTarget Gene


$$$ΔΔCT = (CT(target,untreated) - CT(target,treated)) - (CT(ref,untreated) - CT(ref,treated))$$$
$$$ΔΔCT = (21.225 - 19.763) - (16.17 - 15.895)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,treated) - CT(ref,untreated)) - (CT(target,treated) - CT(target,untreated))$$$
$$$ΔΔCT = (15.895 - 16.17) - (19.763 - 21.225)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,untreated) - CT(ref,treated)) - (CT(target,untreated) - CT(target,treated))$$$
$$$ΔΔCT = (16.17 - 15.895) - (21.225 - 19.763)$$$
$$$ΔΔCT = -1.187$$$

$$$ΔΔCT = (CT(target,treated) - CT(target,untreated)) - (CT(ref,treated) - CT(ref,untreated))$$$
$$$ΔΔCT = (19.763 - 21.225) - (15.895 - 16.17)$$$
$$$ΔΔCT = -1.187$$$

As long as you stay consistent with the two ΔCT subtractions (i.e. always subtract treated from untreated, or reference gene from target gene, or vice versa) then the magnitude of the ΔΔCT will always be the same.  The only thing that changes is the sign.

This is why you'll see some people will calculate the expression ratio as $$$2^{ΔΔCT}$$$ and others will do it as $$$2^{-ΔΔCT}$$$.  The difference is basically just how they set up their initial equation - effectively which direction they are comparing the samples.

To see why Delta Delta CT works we have to consider what's actually going on under the hood.

The amount of starting material in each sample for a given primer pair is proportional to $$$2^{-CT}$$$ (each extra cycle needed to reach the detection threshold means half as much starting template).  We normalise our genes to a reference gene within each sample to ensure that we don't have any systematic errors due to differences between each sample (an internal control).

So the ratio of target gene to reference gene in each sample is therefore $$$2^{-CT(target)} / 2^{-CT(ref)}$$$.

However, because $$$\frac{b^c}{b^d} = b^{c-d}$$$ we can rewrite this as $$$2^{CT(ref) - CT(target)}$$$.

We then calculate the ratio between our two samples by dividing the target:reference ratio of the treated sample by that of the untreated sample:

$$$$Ratio = {2^{CT(ref,treated) - CT(target,treated)} \over 2^{CT(ref,untreated) - CT(target,untreated)}}$$$$

which by the same identity rule we applied before is $$$2$$$ raised to our ΔΔCT expression:

$$$(CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))$$$.
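
You can check numerically that this quotient and $$$2^{ΔΔCT}$$$ give the same number:

    # Amounts of starting material are proportional to 2**-CT
    untreated_ratio = 2 ** -21.225 / 2 ** -16.17    # target:ref in untreated
    treated_ratio   = 2 ** -19.763 / 2 ** -15.895   # target:ref in treated

    print(treated_ratio / untreated_ratio)               # ~2.277
    print(2 ** ((21.225 - 16.17) - (19.763 - 15.895)))   # ~2.277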


Standard Curve

An improvement to the Delta Delta CT method was introduced by Pfaffl to account for PCR efficiencies deviating from the theoretical 100% efficient reaction.

To measure how efficient our PCR is for a given amplicon we run a template dilution series and see how closely our idealised PCR compares to real life.

If we run a dilution series of (0.25, 0.5, 1, 2) we would expect a one-CT difference between each successive sample in the ideal 100% efficient reaction, as shown:

CT Value   Concentration
36         0.25
35         0.5
34         1
33         2

However, if our actual measured CT values indicate a larger difference then our PCR reaction has been less efficient than we hoped.

CT Value   Concentration
36         0.25
34.9       0.5
33.8       1
32.7       2

We can work out exactly how much less efficient by comparing the CT values and the log of the Concentration.

We can do this on any log scale you like, although commonly it'll be done on either a $$$log_2$$$ or $$$log_{10}$$$ scale.  The result will come out the same either way.  We will use $$$log_2$$$ from now on since it fits well with the doubling behaviour of PCR.



If we plot the CT values against $$$log_2(Concentration)$$$ and fit a straight line, what we are interested in is the slope of that trend line.  In the case above the slope is -1.1.  We can then work out the efficiency of the reaction as

$$$Efficiency = 2^{-1/slope} = 2^{-1/-1.1} = 1.878$$$

Therefore, each PCR cycle generates 1.878 copies of our template rather than the theoretical 2 copies of the ideal case.  The efficiency is also often represented on a 0-1 scale, which you can obtain by subtracting 1 from the value calculated above.
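
As a sketch, the slope and efficiency calculation with numpy, using the dilution series from the second table:

    import numpy as np

    concentration = np.array([0.25, 0.5, 1.0, 2.0])
    ct            = np.array([36.0, 34.9, 33.8, 32.7])

    # Fit a straight line of CT against log2(concentration)
    slope, intercept = np.polyfit(np.log2(concentration), ct, 1)

    efficiency = 2 ** (-1 / slope)
    print(slope, efficiency)   # ~-1.1 and ~1.878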

If we do a standard curve for each primer set we have (reference gene and target gene) then we can incorporate them into our Delta Delta CT equation to get:

$$$$Ratio = {Efficiency(Target)^{CT(target,untreated) - CT(target,treated)} \over Efficiency(Ref)^{CT(ref,untreated) - CT(ref,treated)}}$$$$

Note that the order of subtraction matters more here: we must make sure that the exponent of each calculated efficiency contains only the CTs which were produced by that primer pair.
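
As a sketch, the efficiency-corrected ratio as a small Python function; the reference gene efficiency of 1.95 below is a made-up illustrative value:

    def pfaffl_ratio(e_target, ct_target_untreated, ct_target_treated,
                     e_ref, ct_ref_untreated, ct_ref_treated):
        # Each efficiency is only ever raised to a difference of CTs
        # produced by its own primer pair
        target = e_target ** (ct_target_untreated - ct_target_treated)
        ref    = e_ref    ** (ct_ref_untreated    - ct_ref_treated)
        return target / ref

    # With 100% efficient reactions (E = 2) this reduces to 2**ddCT
    print(pfaffl_ratio(2, 21.225, 19.763, 2, 16.17, 15.895))         # ~2.277

    # With measured efficiencies the corrected ratio differs
    print(pfaffl_ratio(1.878, 21.225, 19.763, 1.95, 16.17, 15.895))  # ~2.09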

ChIP-PCR

Unlike RT-PCR for gene expression quantitation, ChIP-PCR uses a single primer pair per region of interest.  However, we will usually also include an Input sample against which the ChIP sample is compared.

Input usually comprises a huge amount of DNA, so it is necessary to take a subset of the Input as the actual sample we PCR.  The amount used is typically between 1% and 10%.

To account for this we should apply an input adjustment.  This involves calculating a dilution factor equal to $$$1 / \text{Fraction Input}$$$.  For example, if you have 1% input then your dilution factor (DF) is 1/0.01 = 100.  We then calculate our CT correction factor as $$$log_{Efficiency}(\text{Dilution Factor})$$$.

Worked example:  5% input is a DF of 1/0.05 = 20.  For the standard curve described above this is a CT correction of $$$log_{1.878}(20) = 4.75$$$.  This is then subtracted from each of the Input CTs before continuing as per the Standard Curve approach described above.

Because we only have a single primer pair we can use the same efficiency for the PCR throughout which allows us to simplify the Standard Curve approach to $$$Efficiency^{ΔΔCT}$$$.

Often you'll want to represent your result as a percent of input, and this can be calculated for each condition as $$$100 \times Efficiency^{ΔCT}$$$, where ΔCT is the corrected Input CT minus the ChIP CT.
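
Here's a sketch of the whole input adjustment in Python, reusing the 1.878 efficiency from the standard curve above; the raw ChIP and Input CT values are hypothetical:

    import math

    efficiency     = 1.878   # from the standard curve above
    fraction_input = 0.05    # 5% input

    dilution_factor = 1 / fraction_input                      # 20
    ct_correction   = math.log(dilution_factor, efficiency)   # ~4.75

    ct_input = 30.5   # hypothetical raw Input CT
    ct_chip  = 33.2   # hypothetical ChIP CT

    # Adjust the Input CT as if the full (100%) input had been used
    ct_input_adjusted = ct_input - ct_correction

    # Percent of input for this condition
    print(100 * efficiency ** (ct_input_adjusted - ct_chip))   # ~0.9%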

Replicates & Error Propagation

One place where a number of mistakes seem to creep in is the treatment of replicates within PCR experiments.

The most frequent mistake I've seen is the use of the wrong type of mean when calculating the average ratio.  The key fact to remember is to always use the arithmetic mean on anything on a linear scale and the geometric mean on anything on an exponential scale.

More concisely, if it's CT values (or differences thereof), then use an arithmetic mean.  If it's concentration values (anything that is $$$Efficiency^{CT}$$$) then use the geometric mean.
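
For example, with fold-change ratios from three replicates (made-up values), the two means disagree, and it is the geometric mean that matches averaging on the CT scale:

    import math

    ratios = [2.1, 2.4, 3.9]   # fold changes from three hypothetical replicates

    arithmetic = sum(ratios) / len(ratios)                 # 2.8 - wrong here
    geometric  = math.prod(ratios) ** (1 / len(ratios))    # ~2.7 - correct

    # Averaging the ddCT values arithmetically and then exponentiating
    # gives exactly the geometric mean of the ratios
    dd_cts = [math.log2(r) for r in ratios]
    print(geometric, 2 ** (sum(dd_cts) / len(dd_cts)))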

I also frequently encounter incorrect treatment of error bars.  Often people will take the standard deviation (or standard error) of concentration data, or ratios, and depict them directly in their bar charts.  However, these metrics assume normally distributed data, which is only the case for the CT values themselves, not the ratios or concentrations.  Briefly, if the y axis on your chart is on a linear scale and your error bars are the same size on both sides, then something is wrong.

Finally we should deal with error propagation.  The error in your experiment is composed of the errors of the four measurements: $$$(Untreated,Ref)$$$, $$$(Untreated,Target)$$$, $$$(Treated,Ref)$$$ and $$$(Treated,Target)$$$.  This will usually be more than the error implied by taking the standard error of the ΔΔCT values for each replicate.  For uncorrelated variables we can propagate additive errors using the formula:

$$$Error(a+b) = \sqrt{ Error(a)^2 + Error(b)^2 }$$$
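
Applied to the four CT standard errors (hypothetical values), the combined error looks like this, and converting it to the ratio scale gives the asymmetric error bars mentioned above:

    import math

    # Standard errors of the four mean CTs (hypothetical values)
    se_cts = [0.12, 0.18, 0.09, 0.15]

    # ddCT is a sum/difference of four CTs, so uncorrelated errors
    # combine in quadrature
    se_dd_ct = math.sqrt(sum(se ** 2 for se in se_cts))   # ~0.28 cycles

    # Asymmetric error bars on the (linear) ratio scale
    dd_ct = 1.187
    print(2 ** (dd_ct - se_dd_ct), 2 ** dd_ct, 2 ** (dd_ct + se_dd_ct))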

The quadrature formula works well for Delta Delta CT as the error there is just the combination of multiple additive terms.  However, for the Standard Curve approach we also have error from the Efficiency values we calculated for our curve, and this can't be expressed as an additive factor.

Instead we need to use a Taylor series to propagate the error.  This allows all six sources of variance (4x CTs + 2x Efficiencies) to be included in the final error calculation.

Control Gene Choice

One final factor is the choice of a suitable housekeeping gene as a control.  If you choose a reference housekeeping gene whose expression changes wildly between your untreated and treated samples then the results for your target gene, measured relative to it, will be wrong.

One approach to reduce this effect is to use the geometric mean of multiple housekeeping genes as your reference instead of choosing just one.  Even so, it's important to choose reference genes that are stable in your particular experiment.
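
Because quantity is proportional to $$$Efficiency^{-CT}$$$, taking the geometric mean of the reference gene quantities is the same as taking the arithmetic mean of their CT values, so the combined reference drops straight into the equations above.  A minimal sketch with made-up CTs:

    # CTs of three housekeeping genes in one sample (hypothetical values)
    ref_cts = [16.17, 17.02, 15.48]

    # Arithmetic mean of CTs = geometric mean of the underlying quantities
    ct_ref_combined = sum(ref_cts) / len(ref_cts)

    # Use this wherever CT(ref) appears in the equations above
    print(21.225 - ct_ref_combined)   # Delta CT against the combined reference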

Software

Another approach is just to use some of the software that's already out there.  One that's widely used is REST 2009, which supports virtually anything you'll want to do.  Similarly there are open source solutions such as pyQPCR.  If you are willing to pay money you can also use something like GenEx.

If you do decide to make your own spreadsheet you should, at the very least, confirm that the answers you get from it are the same as the answers you get from these programs.

I've uploaded a spreadsheet which illustrates most of the things I've talked about in this post.  It has examples for calculating standard curves, ΔΔCT, Standard Curve and Input Correction for ChIP-PCR.  It also has worked examples for error propagation for both ΔΔCT (via simple additive error propagation) and for Standard Curves (via a Taylor series).

Realistically though, for almost all cases, and particularly anything RT-PCR related, I recommend using something off the shelf like REST 2009.  There aren't many reasons to make your own spreadsheets - unless of course you are trying to explain how PCR normalisation works in a blog post.

Note that I've previously had errors in this spreadsheet myself.  I take this as a sign of just how easy it is to make these mistakes, and also a reminder that there could well be errors still lurking in this sheet (and in any other spreadsheet or software you've used).  Thanks most recently to Duncan Brian (UCL) for highlighting that the SE from the Taylor series I took from REST 384 was inversely correlated with the SE of the individual components.  I've updated the sheet to include both the Taylor series which matches the output of REST 384 and the Taylor series described by the Gene Quantification Platform.

References

Gene Quantification Platform. Real Time PCR amended: A useful new approach? Statistical problems? http://www.gene-quantification.com/avery-rel-pcr-errors.pdf. Last accessed 2017.

Livak, Kenneth J., and Thomas D. Schmittgen. Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the $$$2^{-ΔΔCT}$$$ Method. Methods 25.4 (2001): 402-408.

Pfaffl, Michael W. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research 29.9 (2001): e45.

Vandesompele, Jo, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biology 3.7 (2002): research0034.