qPCR normalisation

Wednesday, 19 June 2013

This post will discuss the normalisation of qPCR results using both the Delta Delta CT (Livak) and Standard Curve (Pfaffl) methods as applied to qPCR of RNA and ChIP. It will also discuss some of the common pitfalls when considering replicates of qPCR data.

While this isn't a cutting-edge technique, the motivation for this post is the number of qPCR spreadsheets I see that get it wrong.

Delta Delta CT

The Livak method is more commonly known as the "Delta Delta CT" (ΔΔCT) method. It makes one important assumption about the PCR: the amplification efficiencies of the reference (control) gene and the target gene of interest must be approximately equal. Specifically, ΔΔCT assumes that each PCR cycle exactly doubles the amount of material in your sample (amplification efficiency = 100%).

$$$ ΔΔCT = ΔCT (untreated sample) - ΔCT (treated sample) $$$

where $$$ ΔCT(sample) = CT(target) - CT(ref) $$$, therefore

$$$ ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated)) $$$

where

$$$CT(target,untreated)$$$ = CT value of gene of interest in untreated sample
$$$CT(ref,untreated)$$$ = CT value of control gene in untreated sample
$$$CT(target,treated)$$$ = CT value of gene of interest in treated sample
$$$CT(ref,treated)$$$ = CT value of control gene in treated sample

We can then calculate the ratio of our target gene in our treated sample relative to our untreated sample by taking $$$2^{ΔΔCT}$$$.

A quick worked example:

	Untreated	Treated
Ref Gene	16.17	15.895
Target Gene	21.225	19.763

$$$$ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))$$$$
$$$$ΔΔCT = (21.225 - 16.17) - (19.763 - 15.895)$$$$
$$$$ΔΔCT = (5.055) - (3.868)$$$$
$$$$ΔΔCT = 1.187$$$$
$$$$2^{ΔΔCT} = 2^{1.187} = 2.277$$$$

So expression of our gene of interest is 2.277 times higher in our treated sample than in our untreated sample.
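The worked example can be reproduced in a few lines of Python. This is a minimal sketch assuming the convention used in this post (the treated ΔCT is subtracted from the untreated ΔCT) and 100% efficiency for both primer pairs; the function names are illustrative, not from any library.

```python
def delta_delta_ct(ct_target_untreated, ct_ref_untreated,
                   ct_target_treated, ct_ref_treated):
    """ΔΔCT with this post's convention: ΔCT(untreated) - ΔCT(treated)."""
    dct_untreated = ct_target_untreated - ct_ref_untreated
    dct_treated = ct_target_treated - ct_ref_treated
    return dct_untreated - dct_treated


def fold_change(ddct, efficiency=2.0):
    """Treated/untreated ratio, assuming both primer pairs have efficiency E."""
    return efficiency ** ddct


ddct = delta_delta_ct(21.225, 16.17, 19.763, 15.895)
ratio = fold_change(ddct)
print(f"ddCT = {ddct:.3f}, ratio = {ratio:.3f}")  # ddCT = 1.187, ratio = 2.277
```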

The exact order in which you do the subtractions only affects the sign of ΔΔCT, so you'll probably see other people set it up slightly differently. Consider these possibilities:

ΔUntreated vs ΔTreated

$$$ΔΔCT = (CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))$$$
$$$ΔΔCT = (21.225 - 16.17) - (19.763 - 15.895)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,treated) - CT(target,treated)) - (CT(ref,untreated) - CT(target,untreated))$$$
$$$ΔΔCT = (15.895 - 19.763) - (16.17 - 21.225)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,untreated) - CT(target,untreated)) - (CT(ref,treated) - CT(target,treated))$$$
$$$ΔΔCT = (16.17 - 21.225) - (15.895 - 19.763)$$$
$$$ΔΔCT = -1.187$$$

$$$ΔΔCT = (CT(target,treated) - CT(ref,treated)) - (CT(target,untreated) - CT(ref,untreated))$$$
$$$ΔΔCT = (19.763 - 15.895) - (21.225 - 16.17)$$$
$$$ΔΔCT = -1.187$$$

ΔReference Gene vs ΔTarget Gene

$$$ΔΔCT = (CT(target,untreated) - CT(target,treated)) - (CT(ref,untreated) - CT(ref,treated))$$$
$$$ΔΔCT = (21.225 - 19.763) - (16.17 - 15.895)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,treated) - CT(ref,untreated)) - (CT(target,treated) - CT(target,untreated))$$$
$$$ΔΔCT = (15.895 - 16.17) - (19.763 - 21.225)$$$
$$$ΔΔCT = 1.187$$$

$$$ΔΔCT = (CT(ref,untreated) - CT(ref,treated)) - (CT(target,untreated) - CT(target,treated))$$$
$$$ΔΔCT = (16.17 - 15.895) - (21.225 - 19.763)$$$
$$$ΔΔCT = -1.187$$$

$$$ΔΔCT = (CT(target,treated) - CT(target,untreated)) - (CT(ref,treated) - CT(ref,untreated))$$$
$$$ΔΔCT = (19.763 - 21.225) - (15.895 - 16.17)$$$
$$$ΔΔCT = -1.187$$$

As long as you stay consistent between the two ΔCT subtractions (i.e. always subtract treated from untreated, or reference gene from target gene, or vice versa), the magnitude of ΔΔCT will always be the same. The only thing that changes is the sign.

This is why some people calculate the expression ratio as $$$2^{ΔΔCT}$$$ and others as $$$2^{-ΔΔCT}$$$. The difference simply reflects how they set up the initial equation, i.e. in which direction they are comparing the samples. Livak and Schmittgen, for example, subtract the untreated ΔCT from the treated ΔCT and therefore use $$$2^{-ΔΔCT}$$$.
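A short Python sketch of the point about ordering, using the CTs from the worked example. It assumes 100% efficiency; the dictionary layout and variable names are purely illustrative.

```python
# CTs from the worked example, keyed by (gene, condition)
ct = {
    ("target", "untreated"): 21.225,
    ("ref", "untreated"): 16.17,
    ("target", "treated"): 19.763,
    ("ref", "treated"): 15.895,
}

# Convention used in this post: untreated minus treated
ddct_a = ((ct["target", "untreated"] - ct["ref", "untreated"])
          - (ct["target", "treated"] - ct["ref", "treated"]))

# Livak & Schmittgen style: treated minus untreated
ddct_b = ((ct["target", "treated"] - ct["ref", "treated"])
          - (ct["target", "untreated"] - ct["ref", "untreated"]))

# Grouping by gene instead of by sample gives the same value as ddct_a
ddct_c = ((ct["target", "untreated"] - ct["target", "treated"])
          - (ct["ref", "untreated"] - ct["ref", "treated"]))

print(round(ddct_a, 3), round(ddct_b, 3), round(ddct_c, 3))  # 1.187 -1.187 1.187

# Same fold change, provided the exponent's sign matches the convention
print(round(2 ** ddct_a, 3), round(2 ** -ddct_b, 3))  # 2.277 2.277
```

Whichever convention you pick, the fold change only comes out right if the sign in the exponent matches it.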

To see why ΔΔCT works, we have to consider what's going on under the hood.

The starting amount of template for each primer pair in each sample is inversely proportional to $$$2^{CT}$$$: every extra cycle needed to reach the threshold means half as much starting material. We normalise our target gene to a reference gene within each sample (an internal control) so that systematic differences between samples, such as the amount of RNA loaded, cancel out.

Because the amount is proportional to $$$2^{-CT}$$$, the ratio of reference gene to target gene in each sample is $$$2^{CT(target)} / 2^{CT(ref)}$$$.

Since $$$\frac{b^c}{b^d} = b^{c-d}$$$, we can rewrite this as $$$2^{CT(target) - CT(ref)} = 2^{ΔCT}$$$.

We then compare our two samples by dividing the untreated sample's value by the treated sample's value. Because each value is an inverse (reference-to-target) ratio, this quotient is the target-to-reference ratio in the treated sample relative to the untreated sample:

$$$$Ratio = {2^{CT(target,untreated) - CT(ref,untreated)} \over 2^{CT(target,treated) - CT(ref,treated)}}$$$$

which, by the same identity, equals 2 raised to our ΔΔCT expression:

$$$2^{(CT(target,untreated) - CT(ref,untreated)) - (CT(target,treated) - CT(ref,treated))} = 2^{ΔΔCT}$$$.
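As a numerical sanity check of this derivation, the sketch below (assuming 100% efficiency and the CTs from the worked example) computes relative amounts as $$$2^{-CT}$$$, normalises target to reference within each sample, and compares the treated/untreated quotient with $$$2^{ΔΔCT}$$$. The helper `amount` is a hypothetical name.

```python
def amount(ct):
    """Relative starting amount, proportional to 2**(-CT) at 100% efficiency."""
    return 2.0 ** -ct


ct_t_u, ct_r_u = 21.225, 16.17   # untreated: target, reference
ct_t_t, ct_r_t = 19.763, 15.895  # treated: target, reference

# Target-to-reference ratio within each sample (the internal control step)
norm_untreated = amount(ct_t_u) / amount(ct_r_u)
norm_treated = amount(ct_t_t) / amount(ct_r_t)

# Treated relative to untreated, computed directly from the amounts
ratio_direct = norm_treated / norm_untreated

# The same quantity via the ΔΔCT shortcut
ddct = (ct_t_u - ct_r_u) - (ct_t_t - ct_r_t)
ratio_ddct = 2.0 ** ddct

print(round(ratio_direct, 3), round(ratio_ddct, 3))  # 2.277 2.277
```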

Standard Curve

An improvement to the ΔΔCT method was introduced by Pfaffl to account for PCR efficiencies that deviate from the theoretical 100% efficient reaction.

To measure how efficient our PCR is for a given amplicon, we run a template dilution series and compare the measured CT values with those expected from an ideal reaction.

If we run a twofold dilution series (0.25, 0.5, 1, 2), we would expect a difference of one CT between consecutive dilutions in an ideal, 100% efficient reaction:

CT Value	Concentration
36	0.25
35	0.5
34	1
33	2

However, if the measured CT values show a larger difference between dilutions, then our PCR has been less efficient than we hoped.

CT Value	Concentration
36	0.25
34.9	0.5
33.8	1
32.7	2

We can work out exactly how much less efficient by plotting the CT values against the log of the concentration and fitting a trend line.

We can do this on any log scale, although it is most commonly done on a $$$log_2$$$ or $$$log_{10}$$$ scale. The efficiency comes out the same either way, provided the base used in the efficiency formula matches the base of the log scale (e.g. $$$10^{-1/slope}$$$ for a $$$log_{10}$$$ scale). We will use $$$log_2$$$ from now on since it fits naturally with PCR doublings.

What we are interested in is the slope of the trend line. In the case above the slope is -1.1. We can then work out the efficiency of the reaction as

$$$Efficiency = 2^{-1/slope} = 2^{-1/(-1.1)} = 1.878$$$

Therefore, in each PCR cycle every template molecule gives rise to 1.878 copies rather than the theoretical 2 copies of the ideal case. Efficiency is also often expressed on a 0 to 1 scale, obtained by subtracting 1 from the value above (here 0.878, i.e. 87.8%).
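The slope and efficiency can be computed with an ordinary least-squares fit. This Python sketch uses the example dilution series above; `fit_slope` and `efficiency_from_curve` are illustrative helpers, and fitting on both log2 and log10 scales shows that the efficiency is unchanged as long as the matching base is used in $$$E = base^{-1/slope}$$$.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def efficiency_from_curve(concentrations, cts, base=2.0):
    """Efficiency (copies per template per cycle) from a dilution series.

    CT is regressed on log_base(concentration), and the same base is used
    in E = base ** (-1 / slope).
    """
    log_conc = [math.log(c, base) for c in concentrations]
    slope = fit_slope(log_conc, cts)
    return base ** (-1.0 / slope), slope


conc = [0.25, 0.5, 1, 2]
cts = [36, 34.9, 33.8, 32.7]

e2, slope2 = efficiency_from_curve(conc, cts, base=2)
e10, slope10 = efficiency_from_curve(conc, cts, base=10)
print(f"log2:  slope = {slope2:.3f}, E = {e2:.3f}")    # log2:  slope = -1.100, E = 1.878
print(f"log10: slope = {slope10:.3f}, E = {e10:.3f}")  # log10: slope = -3.654, E = 1.878
print(f"0-1 scale: {e2 - 1:.3f}")                      # 0-1 scale: 0.878
```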

If we run a standard curve for each primer pair (reference gene and target gene), we can incorporate both efficiencies into the ΔΔCT equation to get:

$$$$Ratio = {Efficiency(Target)^{CT(target,untreated) - CT(target,treated)} \over Efficiency(Ref)^{CT(ref,untreated) - CT(ref,treated)}}$$$$

Note that the order of subtraction matters more here: the exponent of each efficiency must contain only the CTs produced by that primer pair. With both efficiencies equal to 2, this reduces to $$$2^{ΔΔCT}$$$.
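A minimal Python version of the efficiency-corrected ratio above. The function name is hypothetical, efficiencies are on the 1 to 2 scale, and the reference efficiency of 1.95 in the second call is made up for illustration.

```python
def pfaffl_ratio(e_target, e_ref,
                 ct_target_untreated, ct_target_treated,
                 ct_ref_untreated, ct_ref_treated):
    """Efficiency-corrected ratio of target expression, treated vs untreated.

    Efficiencies are on the 1-2 scale (2.0 = perfect doubling). Each efficiency
    is raised only to the CT difference measured with its own primer pair.
    """
    target_term = e_target ** (ct_target_untreated - ct_target_treated)
    ref_term = e_ref ** (ct_ref_untreated - ct_ref_treated)
    return target_term / ref_term


# With both efficiencies at 2.0 this reduces to 2**ΔΔCT from the worked example
print(round(pfaffl_ratio(2.0, 2.0, 21.225, 19.763, 16.17, 15.895), 3))  # 2.277

# Hypothetical efficiencies: target 1.878 (from the curve above), reference 1.95
print(round(pfaffl_ratio(1.878, 1.95, 21.225, 19.763, 16.17, 15.895), 3))
```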

ChIP-PCR

Unlike RT-qPCR for gene expression, where the target is normalised to a reference gene amplified with a second primer pair, ChIP-PCR usually uses a single primer pair per region of interest. Instead, the ChIP sample is compared with an Input sample.

Input usually comprises a huge amount of DNA, so it is usually necessary to PCR only a fraction of it, typically between 1% and 10%.

To account for this we apply an input adjustment. First calculate a dilution factor equal to $$$1 / {Fraction Input}$$$; for example, 1% input gives a dilution factor (DF) of 1/0.01 = 100. The CT correction factor is then $$$log_{Efficiency}(Dilution Factor)$$$.

Worked example: 5% input is a DF of 1/0.05 = 20. For the standard curve described above this is a CT correction of $$$log_{1.878}(20) = 4.75$$$. This is then subtracted from each of the Input CTs before continuing as per the Standard Curve approach described above.
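The input adjustment as a small Python function, a sketch assuming efficiency on the 1 to 2 scale; `input_ct_correction` is an illustrative name and the measured Input CT below is a made-up value.

```python
import math


def input_ct_correction(fraction_input, efficiency=2.0):
    """CT shift from a partial Input to a 100% Input: log_E(1 / fraction)."""
    dilution_factor = 1.0 / fraction_input
    return math.log(dilution_factor) / math.log(efficiency)


# 5% input with the efficiency from the standard curve above
correction = input_ct_correction(0.05, efficiency=1.878)
print(f"{correction:.2f}")  # 4.75

# 1% input at 100% efficiency is log2(100)
print(f"{input_ct_correction(0.01):.2f}")  # 6.64

# The correction is subtracted from each measured Input CT
ct_input_measured = 27.4  # hypothetical value, for illustration only
ct_input_adjusted = ct_input_measured - correction
```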

Because we only have a single primer pair, we can use the same efficiency throughout, which simplifies the Standard Curve approach to $$$Efficiency^{ΔΔCT}$$$.

Often you'll want to represent your result as a percent of input. For each condition the fraction of input recovered is $$$Efficiency^{ΔCT}$$$, where $$$ΔCT = CT(Input, adjusted) - CT(ChIP)$$$; multiply by 100 to express it as a percentage.
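Putting the input adjustment and the percent-of-input calculation together, here is a sketch for a single primer pair. The CT values are hypothetical, chosen so the arithmetic is easy to follow: with 1% input and 100% efficiency, the adjusted Input CT is $$$26 - log_2(100)$$$, so the recovered fraction is $$$2^{-4}/100$$$, i.e. 0.0625% of input. The function name `percent_input` is illustrative.

```python
import math


def percent_input(ct_chip, ct_input, fraction_input, efficiency=2.0):
    """Percent of input recovered by a ChIP for one primer pair and condition.

    The Input CT is adjusted to 100% input first; the recovered fraction is
    E**(CT_input_adjusted - CT_chip), and multiplying by 100 gives a percentage.
    """
    ct_input_adjusted = ct_input - math.log(1.0 / fraction_input) / math.log(efficiency)
    return 100.0 * efficiency ** (ct_input_adjusted - ct_chip)


# Hypothetical CTs for illustration: 1% input, 100% efficiency
untreated = percent_input(ct_chip=30.0, ct_input=26.0, fraction_input=0.01)
treated = percent_input(ct_chip=28.5, ct_input=26.2, fraction_input=0.01)
print(round(untreated, 4))  # 0.0625

# Comparing conditions uses a single efficiency, so the ratio is E**ΔΔCT
print(round(treated / untreated, 3))
```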

Replicates & Error Propagation

One place where mistakes often creep in is the treatment of replicates in qPCR experiments.

The most frequent mistake I've seen is using the wrong type of mean when calculating the average ratio. The key fact to remember is to use the arithmetic mean for anything that varies linearly with cycle number and the geometric mean for anything that varies exponentially with it.

More concisely, if it's CT values (or differences thereof), then use an arithmetic mean. If it's concentration values (anything that is $$$Efficiency^{CT}$$$) then use the geometric mean.
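The difference between the two kinds of mean is easy to see numerically. In this sketch the three replicate ΔΔCT values are hypothetical; it uses `mean` and `geometric_mean` from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import geometric_mean, mean

# Hypothetical ΔΔCT values from three biological replicates
ddcts = [0.9, 1.2, 1.5]
fold_changes = [2 ** d for d in ddcts]

# Right: arithmetic mean on the CT scale, then transform...
from_mean_ct = 2 ** mean(ddcts)
# ...which equals (up to rounding) the geometric mean of the fold changes
geo = geometric_mean(fold_changes)
# Wrong: the arithmetic mean of fold changes is pulled up by the largest values
arith = mean(fold_changes)

print(round(from_mean_ct, 3), round(geo, 3), round(arith, 3))  # 2.297 2.297 2.331
```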

I also frequently encounter incorrect treatment of error bars. People often take the standard deviation (or standard error) of concentrations or ratios and plot it directly on their bar charts. However, these metrics assume normally distributed data, which is (approximately) the case for the CT values themselves but not for the ratios or concentrations derived from them. Put briefly: if the y axis of your chart is on a linear scale and your error bars are the same size on both sides, something has gone wrong.
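One way to get error bars that respect the CT scale, sketched in Python with hypothetical replicate values: compute the mean and standard deviation of the ΔΔCT values, then transform the centre and each bound separately. The resulting bars are asymmetric on a linear axis.

```python
from statistics import mean, stdev

# Hypothetical replicate ΔΔCT values, for illustration only
ddcts = [0.9, 1.2, 1.5]

m = mean(ddcts)
sd = stdev(ddcts)  # sample SD on the CT scale, where normality is plausible

# Transform the centre and each bound separately
fold_change = 2 ** m
upper = 2 ** (m + sd)
lower = 2 ** (m - sd)

# Error bar lengths to draw around the fold change on a linear axis
err_up = upper - fold_change
err_down = fold_change - lower
print(round(fold_change, 3), round(err_down, 3), round(err_up, 3))
```

If you plot with Matplotlib, `errorbar` accepts `yerr` as a 2×N array whose first row holds the lower errors and second row the upper errors, e.g. `yerr=[[err_down], [err_up]]`.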

Finally, we should deal with error propagation. The error in your experiment is composed of the errors in four groups of CT values: $$$(Untreated,Ref)$$$, $$$(Untreated,Target)$$$, $$$(Treated,Ref)$$$ and $$$(Treated,Target)$$$. This combined error will usually be larger than the error implied by taking the standard error of the per-replicate ΔΔCT values. For uncorrelated variables, errors in a sum (or difference) propagate as:

$$$Error(a+b) = \sqrt{ Error(a)^2 + Error(b)^2 }$$$
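Applied to ΔΔCT, this formula combines the standard errors of the four mean CTs. A minimal Python sketch, assuming the four groups are uncorrelated; the standard errors below are hypothetical and `propagate_sum` is an illustrative helper.

```python
import math


def propagate_sum(*errors):
    """Error of a sum or difference of uncorrelated terms (root sum of squares)."""
    return math.sqrt(sum(e ** 2 for e in errors))


# Hypothetical standard errors of the mean CT for each of the four groups
se_untreated_ref = 0.10
se_untreated_target = 0.15
se_treated_ref = 0.12
se_treated_target = 0.20

ddct = 1.187  # ΔΔCT from the worked example
se_ddct = propagate_sum(se_untreated_ref, se_untreated_target,
                        se_treated_ref, se_treated_target)

# Transform the bounds separately, giving asymmetric limits on the ratio scale
ratio = 2 ** ddct
upper = 2 ** (ddct + se_ddct)
lower = 2 ** (ddct - se_ddct)
print(f"SE(ddCT) = {se_ddct:.3f}; ratio {ratio:.3f} ({lower:.3f} to {upper:.3f})")
```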

This works well for ΔΔCT, which is just a sum and difference of four CT values, so its error is the root sum of squares of their four errors. However, standard curve approaches also carry error from the efficiency values calculated from each curve, and this can't be expressed as an additive term.

Instead, we need to use a (first-order) Taylor series to propagate the error. This allows all six sources of variance (4x CTs + 2x efficiencies) to be included in the final error calculation.
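A sketch of one first-order Taylor series (delta method) for the Pfaffl ratio, assuming all six inputs are uncorrelated. It works on the natural log of the ratio, whose partial derivatives are simple, and converts back using $$$SE(R) \approx R \cdot SE(\ln R)$$$. This is one standard formulation and is not guaranteed to match the output of REST or other software; the function name and all standard errors in the example are hypothetical.

```python
import math


def pfaffl_ratio_with_se(e_target, se_e_target, e_ref, se_e_ref,
                         ct_target_untreated, se_target_untreated,
                         ct_target_treated, se_target_treated,
                         ct_ref_untreated, se_ref_untreated,
                         ct_ref_treated, se_ref_treated):
    """Pfaffl ratio and its first-order Taylor series (delta method) SE.

    ln(R) = d_target * ln(E_target) - d_ref * ln(E_ref), with all six inputs
    treated as uncorrelated; SE(R) is approximated as R * SE(ln R).
    """
    d_target = ct_target_untreated - ct_target_treated
    d_ref = ct_ref_untreated - ct_ref_treated
    ratio = e_target ** d_target / e_ref ** d_ref

    # Each term: (partial derivative of ln R w.r.t. one input) * (SE of that input)
    terms = [
        math.log(e_target) * se_target_untreated,
        -math.log(e_target) * se_target_treated,
        -math.log(e_ref) * se_ref_untreated,
        math.log(e_ref) * se_ref_treated,
        d_target / e_target * se_e_target,
        -d_ref / e_ref * se_e_ref,
    ]
    se_ln_ratio = math.sqrt(sum(term ** 2 for term in terms))
    return ratio, ratio * se_ln_ratio


# CTs from the worked example; all standard errors here are hypothetical
cts = dict(ct_target_untreated=21.225, se_target_untreated=0.15,
           ct_target_treated=19.763, se_target_treated=0.20,
           ct_ref_untreated=16.17, se_ref_untreated=0.10,
           ct_ref_treated=15.895, se_ref_treated=0.12)

# Perfect efficiency with no efficiency error
ratio, se = pfaffl_ratio_with_se(2.0, 0.0, 2.0, 0.0, **cts)
# Same CTs, hypothetical efficiencies each with a standard error of 0.015
ratio_e, se_e = pfaffl_ratio_with_se(1.878, 0.015, 1.95, 0.015, **cts)
print(f"{ratio:.3f} ± {se:.3f}; {ratio_e:.3f} ± {se_e:.3f}")
```

Setting both efficiency errors to zero with E = 2 reduces this to the additive ΔΔCT propagation scaled by $$$R \ln 2$$$, which is a handy check.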

Control Gene Choice

One final factor is the choice of a suitable housekeeping gene as a control. If you choose a reference gene whose expression changes substantially between your untreated and treated samples, the normalised results for your target gene will be wrong.

One way to reduce this risk is to use the geometric mean of multiple housekeeping genes as your reference instead of a single one (Vandesompele et al., 2002). Even so, it's important to choose your reference genes carefully for stability in your particular experiment.
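A sketch of normalising to several reference genes at once, in the spirit of Vandesompele et al.: each reference gene is converted to a relative quantity $$$E_i^{-CT_i}$$$, and the geometric mean of those quantities is used as the reference. The function names and every CT other than 16.17, 15.895, 21.225 and 19.763 (from the worked example) are hypothetical. When all efficiencies are equal, this is the same as using the arithmetic mean of the reference CTs, consistent with the rules on means above.

```python
import math
from statistics import geometric_mean


def reference_quantity(cts, efficiencies):
    """Combined reference: geometric mean of E_i**(-CT_i) across reference genes."""
    return geometric_mean([e ** -ct for ct, e in zip(cts, efficiencies)])


def normalised_target(ct_target, e_target, ref_cts, ref_efficiencies):
    """Target quantity E**(-CT) divided by the combined reference quantity."""
    return e_target ** -ct_target / reference_quantity(ref_cts, ref_efficiencies)


# Two reference genes in one sample (the second CT is made up)
ref_cts = [16.17, 18.40]
combined = reference_quantity(ref_cts, [2.0, 2.0])
# With equal efficiencies this equals 2**(-mean CT)
print(math.isclose(combined, 2.0 ** -(sum(ref_cts) / len(ref_cts))))  # True

# Treated vs untreated ratio for the target, normalised to both reference genes
untreated = normalised_target(21.225, 2.0, [16.17, 18.40], [2.0, 2.0])
treated = normalised_target(19.763, 2.0, [15.895, 18.10], [2.0, 2.0])
print(round(treated / untreated, 3))
```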

Software

Another approach is simply to use existing software. A widely used option is REST 2009, which supports virtually anything you'll want to do. There are also open-source solutions such as pyQPCR, and if you are willing to pay, commercial packages such as GenEx.

If you do decide to make your own spreadsheet, you should, at the very least, confirm that it gives the same answers as these programs.

I've uploaded a spreadsheet which illustrates most of the things I've talked about in this post. It has examples for calculating standard curves, ΔΔCT, Standard Curve normalisation and Input Correction for ChIP-PCR. It also has worked examples of error propagation for both ΔΔCT (via simple additive error propagation) and Standard Curves (via a Taylor series).

Realistically though, for almost all cases, and particularly anything RT-PCR related, I recommend using something off the shelf like REST 2009. There aren't many reasons you need to make your own spreadsheets, unless of course you are trying to explain how PCR normalisation works in a blog post.

Note that I've previously had errors in this spreadsheet myself. I take this as a sign of just how easy it is to make these mistakes, and also as a reminder that there could well be errors still lurking in this sheet (and in any other spreadsheet or software you've used). Thanks most recently to Duncan Brian (UCL) for highlighting that the SE from the Taylor series I took from REST 384 was inversely correlated with the SE of the individual components. I've updated the sheet to include both the Taylor series that matches the output of REST 384 and the Taylor series described in the Gene Quantification Platform document.

References

Gene Quantification Platform, Real Time PCR amended, A useful new approach? Statistical problems? http://www.gene-quantification.com/avery-rel-pcr-errors.pdf, Last Accessed 2017.

Livak, Kenneth J., and Thomas D. Schmittgen. Analysis of Relative Gene Expression Data Using Real-Time Quantitative PCR and the $$$2^{-ΔΔCT}$$$ Method. Methods 25.4 (2001).

Pfaffl, Michael W. A new mathematical model for relative quantification in real-time RT–PCR. Nucleic Acids Research 29.9 (2001).

Vandesompele, Jo, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biology 3.7 (2002).

82 comments:

ChatMai, 18 October 2013 at 07:09
Hi there,

Thank you for your article. It is very helpful. I have a question. I have more than one housekeeping genes, 2 genes at the moment. How can I use both genes to calculate the efficiency of housekeeping genes? I think qbase software can do it, but our project is on a budget now and I just hope there will be simple (or rather complicated) equations to be able to calculate it too. If the software can do, there will be another way to calculate it manually too. Thank you.

Chat
Anonymous, 7 November 2013 at 23:26
Hey,

very well written article! I'm still having problems in calculating error propagation. I could basically understand your calculations in your uploaded spreadsheet, but in your example, only one Ref-Gene is considered! How do I account for errors if e.g. 2 Ref_Genes are used, and if these are averaged before the first dCT is calculated! So how do i have to calculate to account for individual std errors or deviations of the Ref-Genes that these dont get lost after averaging and calculatind dCT? Hope it is clear what my problem is!
Anonymous, 9 November 2013 at 11:38
Thanks for your comment! If I get this right, a = mean CTs Ref a and error a = std err of mean Ref a, while 2g = 2*gmean of Ref a,b, right? So if want to just calculate the sub err of the first dCT (GOI-Ref) for condition x, that would be: squroot (Error(g/(a,b))*Error(g/(a,b))+((std err c)*(std err c)), c being the target gene? I just incorporated the sub err of ref genes a,b into the equation you mentioned in you upper text (replicates and error propagation)!? If this is right, how to calculate error propagation if one uses the arithmetic mean of refgenes a,b and not the gmean? Since the arithmetic mean is used in genex! and as you mentioned in your text, one should always use arithmetic mean, when data is in log scale - Cts are on log scale! What do you say? And thanks already for your answer!
Anonymous, 9 November 2013 at 18:55
Thanks Tony! I'll have a look at the paper!
Anonymous, 19 November 2013 at 22:04
Hi Tony,
Your spreadsheet helped me a lot with my qPCR calculations. I recently decided to use another reference gene. My question is how can I get the ratio of the genes if I am using two reference genes using the standard curve method? I cannot use the ddCt method because my primers have different efficiencies. So, do I need to calculate the geometric average of the Eff values for the reference genes?
Thanks in advance,
Unknown, 27 November 2013 at 18:27
Hi there,
I´m working on a qPCR protocol. I want to do a comparative method but using concentration of total DNA. How can I normalize the Ct numbers with the DNA concentration?
Thank you!
Unknown, 29 November 2013 at 13:21
Hello,
Could you comment on how the calculations will work on a time-course experiment ? I have a wild type (control) and knockout cell lines. For the first part, I normalize my samples to a single internal control(dCT). For my second part, I have 2 options - Either to normalize against my wild type (control) cell line or to normalize against Day 0 for each cell line. I am not sure which one is the most useful. Also the calculation of errors is something I am having problems with. Do I simply perform SD calculations on biological+technical replicates. If so, which values should I use ?

Cheers
Shredz
n0idbi0n, 29 November 2013 at 15:16
Thanks for the very helpful article, but I'm a bit confused with this statement:

"The most frequent mistake I've seen is the use of the wrong type of mean when calculating the average ratio. The key fact to remember is to always use the arithmetic mean on anything on a linear scale and always to use the geometric mean on anything on an exponential scale.

More concisely, if it's CT values (or differences thereof), then use an arithmetic mean. If it's concentration values (anything that is $$$Efficiency^{CT}$$$) then use the geometric mean."

Aren't the Ct values already on an exponential (ie. log2) scale, and by taking E^Ct, one is transforming to a linear scale? If this is the case, then the opposite is true: use the arithmetic mean on log scale and geometric mean on linear scale.
n0idbi0n, 29 November 2013 at 16:12
Thanks that does make it more clear. I think you're right that the confusion comes from the terminology, and my poor grasp of it: I'm thinking that since the Ct values are actually indicating a doubling of the copy number, they are naturally *in* an exponential scale - which, as you say, allows them to be plotted *on* a linear scale (without log transforming first). Maybe pedantic.
Anonymous, 8 January 2014 at 14:17
Hi! I found your page useful. If you have time to answer: I "inherited" a spreadsheet in which the application of the ddCt method is different.

1. A delta Ct value is calculated for every biological replicate. (after technical duplicates being averaged).

2. The average delta Ct of the control group is calculated and this is used to calculate the deltadelta Ct value for each sample (including the members of the control groups) separately. (sample=biological replicate=sample from one animal).

That way, an individual fold change value is gained for every biological replicate. These fold changes are then averaged in the control and the treated group. Standard error of the mean or other statistics are calculated after that. I found there are slight differences in the results compared to the standard ddCt. Mathematically it may matter at which step the averaging is done but I am not sure if valuable data is lost or not or some kind of bias is caused or not.

This method looks like a mess at first and it seems to be a hybrid method (delta Ct values are averaged in the control group but not in the treated group). What's your view? Could it work? I have a lot of data already analysed that way.
Anonymous, 14 January 2014 at 08:05
Thank you for your clear explanation.
Jonas, 20 January 2014 at 21:30
Hi Tony.
Thanks for a good explanation, I do however have a question.
I just performed a qPCR experiment, but im a bit confused about the use of the Delta delta Ct method.

I have following results:

Normal conditions: 16.6 and 17.6 (Target gene)

Stress conditions: 18.24 and 17.3 (Target gene)

Normal conditions: 20.55 and 20.89 (Ref gene)

Stress conditions: 19.9 and 20.26 (Ref gene)

So initial what i want to do it see if the expression of the stress gene pr ref gene is higher in stress conditions than in normal condition.

Which by looking at the numbers should be, since the difference in Ct values for the samples are more closely related right?

However when I do the delta delta calculation (I use the average of the dublicates)

17.1 - 20.7 = -3.6

17.7-20.08 = -2.38

2^-(-2.38-(-3.6)) = 0.35

So as far as I understand the delta delta Ct this means that the expression in stress genes is 35% lower than in normal conditions right? Which I dont think really understand according to when i just look at the numbers. Is it because it uses total expression? The reason why I dont want this is that my organism may die under the more stress full conditions so the total expression might be lower.
rashmi, 27 January 2014 at 07:28
Hi Tony
Thanks for all the information. How do this analysis work if I have two groups of patients (healthy and diseased, say 10 patients in each group)?

Unknown, 25 February 2014 at 08:01
Hi Tony,

Thanks for all the info about qpcr analysis. I have a question regarding the quantification of gene expression only at one time point, such as only at baseline (time=0). Let me summarize below:

1- I see papers that report gene expression results as GOI/reference and report higher expression with higher values. In this case, wouldnt higher values mean less expression? For exp: GOI1= 30, GOI2= 20, REF= 10. Isn't GOI2 expressed more than GOI1?

2- If using GOI/REF is ok, should I log-transform the results?

3- Would it be better to use 2^-dCt to quantify baseline gene expression?

4- Can I use these results in a correlation?

Thanks very much in advance!
Lucretia, 18 March 2014 at 19:15
This has been very helpful and clear.

For the ChIP % input, using the example in the file, if you get a % input of 0.02506 is that 0.02% of the input or 2.5%? I have seen other formulas that make the conversion and I am unclear which is more accurate.

Also, I don't which error to use if I want to display my ChIP data in % input. Is that what the % input upper and lower are? Could you elaborate or send me in the right direction?

Thank you so much for shedding light on qPCR and other topics.
Anonymous, 3 April 2014 at 01:27
Thanks for the helpful tutorial Tony.

I just wondered why the initial equation must be swapped around from treated - untreated in the first line (ΔΔCT = ΔCT(treated sample) − ΔCT(untreated sample)), to untreated - treated in the last line (ΔΔCT = (CT(target,untreated) − CT(ref,untreated)) − (CT(target,treated) − CT(ref,treated)))?
Unknown, 11 April 2014 at 12:26
Hello Tony, Im glad I found this informal site for qPCR.

I am recently learning the method and have some questions.

I have a reference gene with efficiency of 2 and a taget gene with efficiency of 1.79. I heard this can be used with a specific formula which I have found it here on this site also.

But concerning the analysis of the graphs with the CT-values. Should I use fit point (set the threshold manually) or the second derivative (set the threshold automatically)?

Thank you in advance!

Pez, 7 May 2014 at 13:14
Hi Tony,

Thanks a lot for this post, it is very useful. One question about your suggestion of normalising to the geometric mean of multiple housekeeping genes. Presumably if you were to do this averaging of housekeeping genes at the Ct level, you would be taking the arithmetic mean of the Cts and then use this as the ref Ct for the delta-delta Ct calculation? This would give the same result as calculating the fold changes for each housekeeping gene individually and taking the geometric mean of the fold change results. I am basing this assumption on your advice of using arithmetic means for Ct and geometric means for fold changes. I took a look at the reference you gave but it wasn't clear given that they don't really discuss the delta-delta Ct method.

Thanks,

MFP
Sabe, 20 May 2014 at 14:11
Hello,

This is an interesting post, but I'm just a beginner and I am having problems with the technical replicates. After identifying the bad replicates (difference larger than 0.5) with the package Easyqpcr in R, I proceeded to eliminate them. However, this leaves the data unbalanced (different number of Cq between genes), and it seems that REST 2009 doesn't like that. Do you have any advice that could help me? (I decided to delete both replicates when they were too different since I didn't have a good enough reason to choose one of them)

Thank you very much, I would appreciate any help,
Sabela.
Anonymous, 21 May 2014 at 08:11
Hi, I have a couple of questions about your spreadsheet if you have time.

First, in the 'ddCt RNA - Error Prop 2 Refs' sheet, when calculating 'Ref gene std err', the formula reads:
=(1/(2*C22))*SQRT( (B19*B21)^2 + (C19*C21^2) )

Should it read like this instead?
=(1/(2*C22))*SQRT( (B19*B21)^2 + (C19*C21)^2 ) (moved the final ^2)

Second, I'm very new to this type of analysis. Could you explain the role of the first term in that equation for me? (1/(2*C22))

Thank you very much for your help. Your post and spreadsheet have helped me see the logic in all of this.
Thanks,
Nick.
Unknown, 31 May 2014 at 20:33
Dear Tony,
I came here by accident! What a interesting blog you have, congratulations. Everything very detailed, nevertheless I haven´t found the answer to my question, let´s see if you can help me. In my case I´m studying organ profile, i.e. relative expression level for 5 genes in 6 different organs, I´m using one gene as a housekeeping and biological triplicates. At the beginning, in my ignorance, my idea was to use the Standard Curve method because I have the Efficiencies for all genes (Just to inform that the Ef are on the range of 95-101% for all Primers tested)... But now I just realized that in my particular case I don´t have TREATED and/or UNTREATED templates to perform a delta Ct. You see I only have templates collected from different tissues/organs (a Ct!). How do you suggest to estimate the ration between my Test gene and my Housekeeping gene in this particular case (considering that they have different Ef!)? Should I use the same equation as Pfafll considering that the treated sample is either for Test and reference genes 0 (zero)? Thanking in advance, hope you can help me here. Best regards. Marco.
Unknown, 2 June 2014 at 20:50
Hi Tony,
Thanks for the helpful post! One (maybe naive) question: in your Taylor Series you use the error of your primer efficiencies, but how do you calculate those errors? In your spreadsheet they are listed as 0.015. I have 3 technical replicates for my standard curve, so do I calculate primer efficiency with each replicate and then find the standard error of those calculated efficiencies?

Thanks!
Anne
Manyan F, 15 June 2014 at 21:55
Hi Tony!

I'm a Master's student trying to finish up my thesis and my qPCR data are driving me bananas. I analyzed 7 genes with 1 housekeeping gene (only one because b-actin was the only gene tested in another experiment that was stable for ozone condition in mouse lung tissues).

The premise of my experiment was to test the efficacy of 2 antioxidant diets (20% or 100%) in male and female after exposing them to either air or ozone.

For example:

MAC = Male Air Control
MOC = Male Ozone Control
MA20 = Male Air 20% diet
MO20 = Male Ozone 20% diet
MA100 = Male Air 100% diet
MO100 = Male Ozone 100% diet

I analyzed all of my data already in Excel according to this, page 15:

http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_040980.pdf

I used MAC (or FAC if females) as my untreated,control for everything. I don't know if this is correct since the instructions do not incorporate geometric mean. Also, I need to find out the p-value so that I can compare among my diet groups. How do I get the p-value on the ONE delta delta CT value if it was calculated based on a average of bunch of CT values (I took the average of all my biological replicates early on so the rest of my calculations are just working using ONE value).

I have my Excel files here and if I could send one to you as an example then you'll understand what I mean and point out any mistakes I am making.

Thank you for this informative bog post btw!
Samantha Moxed, 22 August 2014 at 20:16
Hi! Great information on this site. I'm new to qPCR and was a bit confused regarding the normalization, as different sources have advised me differently.

I did an RNA-IP, IPs were done with three different antibodies: antibody A, antibody B, and IgG.

For my RNA-IP experiment, I had whole cell lysate (total input) which I ran over a size exclusion column, a fraction (pool) of which contained my RNAs of interest. The IPs were setup with that pool.

I'm interested in determining whether a specific RNA is enriched in the IP over the pool or the total lysate input.

For the RT-qPCR, do I need to do any scaling prior to running the qPCR? For instance, each IP contains about 5% of the total input, or about 30% of the pool. Do I need to adjust the amount of cDNA I put into the qPCR reaction or is all scaling done in the post analysis (the %input formula you wrote about). Do I load the same amount (ng) of cDNA into each qPCR reaction?

Thank you for your help!
Raj the king, 31 August 2014 at 19:30
Best blog ever on this aspect..Simple and clear,,You have answers for every doubt of mine..Thanks,,Good Luck,
cherry, 31 August 2014 at 22:57
Hi, Tony, Great information you are having. Your post does helped me a lot. Nevertheless, I am still having some doubts with normalization of qPCR.

Please bare with me as the following questions that I am going to ask might be naive as I am still new in qPCR. I have read up a lot of articles and forums on how to analyze the qPCR result, but I am just couldn't figure out the overflow/steps/fundamental of normalizing the qPCR result. Perhaps you could help me on this? By telling me the flow of analyzing the qPCR data?

1.) One of my experimental objective is to study the expression of 8 ncRNA genes to check whether they are either up or down-regulated subjected in different conditions. I have 2 conditions in my experiment which is glucose as my experimental control and polyethylene(PE) powder as my treated condition. For your information, I am quantifying using absolute quantification, 3 housekeeping genes(HKGs) were tested together in this experiment. I have decided to choose ddCt method and geometric mean to normalize my data. You talked about the ratio which is below:

$$$Ratio = 2^{CT(target,untreated) - CT(target,treated)} / 2^{CT(ref,untreated) - CT(ref,treated)}$$$
For my case, I should do it like this:
$$$Ratio = 2^{CT(target,glucose) - CT(target,PE)} / 2^{CT(ref,glucose) - CT(ref,PE)}$$$
Am I right?

2.) Apparently, I found another similar equation which is below:

$$$Ratio = 2^{CT(target,untreated) - CT(target,treated)} / \sqrt[3]{H(X)_{g1} \cdot H(X)_{g2} \cdot H(X)_{g3}}$$$
the denominator is geometric mean of multiple HKGs.
Is it correct if I normalize my result using only second equation and ignore the first one as I am having 3 HKGs? The first equation is meant for single gene normalization?

3.) Just let's say I am having 12 samples, 6 for each glucose and PE. After substituting my Ct values into the ratio's equation and I got 12 respective ratios. Should I proceed to SPSS independent t-test analysis or I should use another program called NormFinder? I got to know this program via published papers, but I do not really understand the principle and functions behind it. Hope that you can help me explain on this.

4.) Is it necessary for us to normalize our data first before proceed into t-test analysis? Can I just do t-test directly without doing ddCT method first? Which means I have my Ct result of target gene and HKGs obtained from qPCR experiment. Can I just take the Ct values and perform t-test analysis directly without doing the ddCT method?

5.) A lot of papers mention p-values. Could you please tell me what a p-value is? How can I find it?

6.) Is the purpose of normalization to validate our HKGs?

7.) How do I know whether a gene is up- or down-regulated based on the ratio?
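As the post's worked example shows, a ratio above 1 means the gene is higher in the treated sample than in the untreated one, and a ratio below 1 means it is lower. A minimal sketch (the `classify` helper is hypothetical) that also reports the log2 fold change, on which a doubling (+1) and a halving (-1) are symmetric. Whether a given change is meaningful is a separate, statistical question that the ratio alone does not answer.

```python
import math


def classify(ratio):
    """Direction of change for a treated/untreated ratio (1.0 = no change)."""
    log2_fc = math.log2(ratio)
    if log2_fc > 0:
        return "up", log2_fc
    if log2_fc < 0:
        return "down", log2_fc
    return "unchanged", log2_fc


for r in (2.277, 1.0, 0.25):
    print(r, classify(r))
```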

8.) Based on my understanding, the workflow would be:
Ct values obtained > substitute into ddCt / geometric mean equation > t-test analysis > ANOVA > determine whether the results are significantly different?
Please tell me if I have missed anything.

I would very much appreciate your help on this. I know that I am very weak in the fundamentals, so sorry if I have caused any inconvenience.
Thank you.

Cheers,
Cheryl =)
Unknown, 23 September 2014 at 15:03
Thank you very much for this article. Even the article by Livak and Schmittgen [1], which I used as a reference, uses wrong formulas based on arithmetic means and standard deviations instead of the correct geometric metrics. No wonder there is so much confusion, given the lack of clear and correct articles!

I want to plot concentration values with standard deviation (not standard error) bars, because I need a measure of the spread (something many people also seem to be confused about, thinking the SEM is a suitable measure of spread [2]).

Thus, I calculated geometric means and geometric standard deviations from the final concentration values 2^(–ΔΔCₜ). However, one thing is still not clear to me:

If I am right, the spread of the final concentration values is made up of two components: (a) the “biological” variability already present in the population, and (b) the “technical error” variability due to technical/methodological inaccuracies. If we assume that the real concentrations in the original population are normally distributed, the “biological” spread should follow a normal distribution at the level of the final concentration values, while the “technical error” spread should follow a normal distribution at the level of the Cₜ values and should thus follow a log-normal distribution at the level of the final concentration values.

Accordingly, if the “biological” spread is (much) bigger than the “technical error” spread, it would be more appropriate to calculate arithmetic standard deviations for the final concentration values, wouldn’t it? If both “biological” spread and “technical error” spread contribute substantially to the data dispersion, I really do not know how to calculate proper standard deviations ...
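For reference, the geometric mean and geometric SD mentioned above are computed on the log scale and transformed back, giving a multiplicative range (mean ÷ GSD to mean × GSD) rather than a symmetric ± bar. A minimal sketch with hypothetical fold-change values; it only shows the calculation and does not settle the mixed biological/technical variance question.

```python
import math
import statistics


def geometric_summary(fold_changes):
    """Geometric mean and geometric SD of positive values such as 2^(-ddCT)."""
    logs = [math.log(x) for x in fold_changes]
    gmean = math.exp(statistics.mean(logs))
    gsd = math.exp(statistics.stdev(logs))  # sample SD (n - 1) of the logs
    return gmean, gsd


fc = [1.8, 2.3, 2.9, 2.1]  # hypothetical 2^(-ddCT) values
gm, gsd = geometric_summary(fc)
lower, upper = gm / gsd, gm * gsd  # multiplicative "1 SD" range
print(gm, gsd, lower, upper)
```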

[1] http://www.ncbi.nlm.nih.gov/pubmed/11846609
[2] http://www.sportsci.org/resource/stats/meansd.html
Unknown, 1 October 2014 at 09:50
Hi, thanks for a very helpful post. I have a few questions I am trying to get my head around.

First, I have a question regarding the delta CT SEM value. In your spreadsheet you calculate the SEM for condition A and condition B and then pool these into a common delta CT SEM.
In another guide, from Applied Biosystems, they seem to use the delta CT SEM from just the treated condition when calculating the delta delta CT SEM and, from there, the fold-change variation.
http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_042380.pdf
(p. 58). Would you say that your calculation is the correct one?
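For reference, the standard way to combine two SEMs, assuming the two ΔCT means are independent, is to add them in quadrature: $$$SEM_{ΔΔCT} = \sqrt{SEM_A^2 + SEM_B^2}$$$. This sketch assumes that rule; it does not reproduce the spreadsheet, whose exact pooling is not shown in this thread. Using only the treated SEM amounts to treating the calibrator's ΔCT as exactly known. The helper names are hypothetical, and the fold-change range follows the post's $$$2^{ΔΔCT}$$$ convention.

```python
import math


def combined_sem(sem_a, sem_b):
    """SEM of a difference of two independent means: sqrt(SEM_a^2 + SEM_b^2)."""
    return math.hypot(sem_a, sem_b)


def fold_change_range(ddct, sem):
    """Asymmetric fold-change range 2^(ddCT - SEM), 2^ddCT, 2^(ddCT + SEM)."""
    return 2 ** (ddct - sem), 2 ** ddct, 2 ** (ddct + sem)


# Hypothetical SEMs of the dCT means in conditions A and B
sem = combined_sem(0.3, 0.4)
print(sem, fold_change_range(1.187, sem))
```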

My second question is about the t-test. For the t-test you use the built-in formula in Excel. This calculates the SD and mean from the given dCT values in the gray box. Are these the correct SD values to use in the t-test, or should you use delta CT SD values calculated the same way as the "Sub errs" (but with SD instead of std err, of course)?
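For reference, a two-sample t-test on per-sample dCT values works from the sample SDs of those same dCT values, which is what a spreadsheet t-test applied to the dCT column does. A minimal sketch, assuming equal variances (Student's form; which Excel test type the spreadsheet uses isn't stated here). The function name and data are hypothetical; it returns the t statistic and degrees of freedom, and the p-value would then come from the t distribution, e.g. via a statistics package.

```python
import math
import statistics


def student_t(dct_a, dct_b):
    """Two-sample Student's t statistic on per-sample dCT values (equal variances).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(dct_a), len(dct_b)
    var_a = statistics.variance(dct_a)  # sample variance (n - 1)
    var_b = statistics.variance(dct_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(dct_a) - statistics.mean(dct_b)) / se
    return t, df


# Hypothetical dCT values for three replicates per condition
print(student_t([5.0, 5.2, 4.8], [3.9, 3.7, 4.1]))
```

Testing on the dCT (log) scale rather than on the exponentiated ratios keeps the data closer to the symmetric spread a t-test assumes.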

I hope you can understand what I am asking!

Kind regards,
Alexander Hedbrant
Unknown, 6 October 2014 at 20:17
I have a comparative question for which there may be help in the community. I am trying to compare the expression of multiple genes in brain tissue across two different mammalian species. Because any particular gene will have nucleotide differences between species, I don't want to assume equal amplification efficiencies for any gene. I have collected CTs and efficiencies for multiple genes in both species, including those for one or two genes that appear to serve as viable housekeeping genes for normalization. The trick is how to correctly present the data and draw biological conclusions. I am not aware of any software package or published protocol for relative quantification of gene X in samples A and B when the efficiency may differ between samples.
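For reference, the post's Standard Curve (Pfaffl) method relaxes ΔΔCT's equal-100%-efficiency assumption by using a measured amplification factor per gene: $$$Ratio = \frac{E_{target}^{CT(target,control) - CT(target,sample)}}{E_{ref}^{CT(ref,control) - CT(ref,sample)}}$$$. A minimal sketch of that form, with a hypothetical function name. Note that it still assumes one efficiency per assay across both samples, so it does not by itself handle the case raised here, where the same gene's efficiency differs between species; that would need an extra assumption about how CT values from different assays relate.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Efficiency-corrected expression ratio (Pfaffl form).

    e_target, e_ref: amplification factor per cycle (2.0 = 100% efficiency,
    1.9 = 90%), assumed identical for a given assay in both samples.
    """
    target_term = e_target ** (ct_target_control - ct_target_sample)
    ref_term = e_ref ** (ct_ref_control - ct_ref_sample)
    return target_term / ref_term


# With both efficiencies at 2.0 this reduces to 2^ddCT (post's worked example)
print(pfaffl_ratio(2.0, 21.225, 19.763, 2.0, 16.17, 15.895))
```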

Any advice is appreciated.

C.A. Baker
Unknown, 10 October 2014 at 14:20
Hi,
Can someone help me with a problem?
If I am using real-time PCR just to check whether one or more genes are present or not, do the Ct values for the present genes need to be the same, or almost the same?

Thank you in advance
R.Carvalho
L, 16 February 2015 at 21:12
hi Tony,
this post is very helpful.

two questions about ChIP qPCR data and % input

I was wondering what exactly you mean by "subset of the Input". For my ChIP protocol, I perform immunoprecipitation on a 1 ml aliquot of cell lysate. I also take a 40 ul aliquot of cell lysate as my "total input"; this second aliquot is not IPed but is otherwise treated identically. I do not otherwise dilute the total input sample. So in this case, my input dilution factor would be 25 (1000 ul / 40 ul), correct?

Second question: what exactly does % input mean? Is it accurate to say that % input is the percentage of your target amplicon that is bound to the IPed protein in your cells?
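For reference, a common way to calculate % input (assumed here; the post's own spreadsheet formula isn't reproduced in this thread) first adjusts the input CT to what the full amount of starting chromatin would give, by subtracting $$$\log_2$$$ of the input dilution factor, then compares the IP CT against it. The result is the amount of target recovered in the IP as a percentage of the amount in the chromatin that went into the IP. A minimal sketch assuming ~100% efficiency; the function name and CT values are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_dilution_factor):
    """% input for ChIP-qPCR, assuming ~100% amplification efficiency.

    input_dilution_factor: how many times smaller the input aliquot is than the
    material that was immunoprecipitated (e.g. 1000 ul / 40 ul = 25).
    """
    # Input CT corrected to the full amount of starting chromatin
    adjusted_input_ct = ct_input - math.log2(input_dilution_factor)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical CTs: IP = 28.0, 40 ul input = 24.0, 1 ml IPed
print(percent_input(28.0, 24.0, 1000 / 40))  # about 0.25 (%)
```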

best,
L
Nashar, 17 April 2015 at 17:42
Hi Tony, I usually go to ResearchGate to get answers, but found this blog interesting. I read that it is best to normalize gene expression to more than one reference gene. We are comparing target gene expression in treated and untreated animals. We would like to use one in-house reference gene and another reference (the same target gene in another organ). So let's say we are looking at expression of the target gene in the spleen and comparing it to the intestine, pancreas, etc. In this case we use b-actin as the in-house reference and lung as the organ reference for all the other organs mentioned above. How do you calculate DDCT between treated and untreated control animals with the 2 references above? My reasoning is as follows: 1) In "untreated", normalize to the b-actin reference (thus calculating DCT1). Then normalize the same "untreated" to the lungs (thus calculating DCT2). Call the difference DDCTa. 2) Do the same for "treated" and call it DDCTb. 3) Calculate the difference between DDCTa and DDCTb. Does this make any sense? Thank you
Unknown, 18 May 2015 at 08:39
Hello,

Well-written article! I have a question regarding my qPCR experiment and the application of the delta delta CT method to my data. I have the CT values, efficiency and slope for 3 reference genes and 3 target genes. However, the cDNA for the reference genes is added to the SYBR Green mix at a 500x dilution (I started with 200 ng cDNA in 20 ul), while the cDNA for my target genes is diluted only 20x. Is it possible to apply the method when I use different dilutions? How can I calculate back to be able to compare the data?
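For reference, ΔΔCT subtracts the reference CT in both conditions, so a constant CT shift applied to every reference reaction cancels. This sketch assumes the 500x and 20x dilutions are used identically for every sample, ~100% efficiency, and that all reactions stay in the assay's linear range; the extra 25-fold dilution then adds $$$\log_2(25) \approx 4.64$$$ cycles to each reference CT. Correcting an individual ΔCT back to a common dilution would need that offset (or $$$\log_E(25)$$$ for amplification factor $$$E$$$). The `ddct` helper is hypothetical; the CTs are the post's worked example.

```python
import math


def ddct(ct_target_u, ct_ref_u, ct_target_t, ct_ref_t):
    """ddCT as in the post: (target - ref) untreated minus (target - ref) treated."""
    return (ct_target_u - ct_ref_u) - (ct_target_t - ct_ref_t)


# Assumption: the extra 500/20 = 25x dilution of the reference reactions
# shifts every reference CT by log2(25) cycles at 100% efficiency
offset = math.log2(500 / 20)

base = ddct(21.225, 16.17, 19.763, 15.895)
shifted = ddct(21.225, 16.17 + offset, 19.763, 15.895 + offset)
print(base, shifted)  # equal up to rounding: the constant offset cancels
```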

Thank you!
Preet, 8 June 2015 at 14:46
Hi!
Often times in our experiments, we are running into a problem where the mean fold change value for the control is greater than 1. For example, recently in a qPCR run with four control samples and four treatments, the reaction was run in triplicates. The triplicates gave highly similar C(t) values, with the largest observed difference being 0.84 and the smallest being 0.09. If our calculations have been checked multiple times and the control being utilized is reliable, what could be any potential reason why the mean fold change is greater than 1 (a value 1.13 for the previously described experiment)? Is there anything we can do to avoid this in the future?
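One possible explanation, assuming each control sample's fold change was calculated against the mean control ΔCT and the fold changes were then averaged arithmetically: exponentiating spreads values above and below the calibrator unequally, so whenever the control ΔCTs vary at all, the arithmetic mean of their fold changes exceeds 1 (the AM-GM inequality). Averaging on the ΔCT scale, or taking the geometric mean of the fold changes, returns exactly 1. A minimal sketch with hypothetical ΔCT values.

```python
import math
import statistics

# Hypothetical dCT values (target - ref) for four control samples
control_dct = [4.8, 5.0, 5.5, 4.7]
calibrator = statistics.mean(control_dct)  # mean control dCT used as the baseline

# Fold change of each control relative to the calibrator: 2^(calibrator - dCT)
control_fc = [2 ** (calibrator - d) for d in control_dct]

arith_mean = statistics.mean(control_fc)  # > 1 whenever the dCTs vary
geo_mean = math.exp(statistics.mean([math.log(x) for x in control_fc]))  # 1.0
print(arith_mean, geo_mean)
```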

Thanks!
Tony McBryan, 18 July 2015 at 17:07
Note: I no longer work in the bioinformatics field and I'm generally unable to respond to comments on this post in reasonable timescales.

I'm also cognisant that my specific knowledge in the field is getting slowly more fuzzy due to lack of use as well as more out of date.

For these reasons I'm going to disable future comments on this article and recommend that anyone with questions should attempt to contact a local bioinformatician or computational biologist within their institution. If your institution does not have anyone who matches this description then you should probably start advocating for such a position to be created - as you've probably found out, it's getting harder to do modern biology without this type of support.