Comments on McBryan's Musings: The G-Test and Python
Blog author: Tony McBryan. 5 comments. Feed last updated 2020-05-23T15:41:39.112+01:00.

Comment posted 2014-02-10T03:45:38.613+00:00

Thanks. Here is a quick-and-dirty Chi-square computation for independence of two samples (for 2xN contingency tables, where N is the number of rows), after http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm

Although the G-test is preferable in most cases, I would argue that the way the Chi-square statistic handles zero entries here is a bit neater. Apologies for the formatting, not sure how to enter pre-formatted text into these comments.

    import math

    def chisqtest2(a):
        """Compute Chi-square statistic for independence of two samples"""
        def compute_row(row):
            # An all-zero row contributes nothing instead of dividing by zero.
            row_total = math.fsum(row)
            return (K1 * row[0] - K2 * row[1]) ** 2 / row_total \
                if row_total > 0 \
                else 0.0

        # Totals of the two samples; each row of a is a pair of counts.
        sum_r = math.fsum(row[0] for row in a)
        sum_s = math.fsum(row[1] for row in a)
        K1 = math.sqrt(sum_s / sum_r)
        K2 = math.sqrt(sum_r / sum_s)
        return math.fsum(map(compute_row, a))

Posted by bluekeybox

Comment posted 2013-10-14T12:14:37.687+01:00

Hi,

I've only shown the proof for a 2x2 table above. I think, but have not proven explicitly, that the same trick can be applied to contingency tables of any size.

i.e. G = 2 * (celltotals + total - (rowtotals + columntotals)) for a contingency table of any size.

If you only have a small number of tables to do, it might be easier to use a spreadsheet such as the one at the Handbook of Biological Statistics (http://udel.edu/~mcdonald/statgtestind.html).

I notice they use exactly the same approach for a larger table in their spreadsheet, so I'm obviously not the first to notice this pattern.

I've updated the post above to include a gtest function which should work for a contingency table of any size if my assumption proves to be correct. Note that I've not tested this latest function in any particularly robust way, and it is provided entirely as is and without any warranty as to correctness.

It does seem to agree with likelihood.test from the Deducer package in R for the few samples I've tested it with.

Tony

Posted by McBryan

Comment posted 2013-10-14T10:58:09.303+01:00

Hi Tony,

Thanks for your reply.
So using a contingency table of a size other than 2x2 (e.g. 5 columns and 12 rows) would require a different implementation, right?

Nikita

Posted by Anonymous

Comment posted 2013-10-08T12:37:38.971+01:00

Hi,

I've used the G-Test for calculating the significance of methylation changes in whole genome sequencing experiments (http://blog.mcbryan.co.uk/2013/07/the-g-test-and-bisulphite-sequencing.html).

That link is a specific (although complicated) example of how you might use the G-Test in practice.

If you already have a 2x2 contingency table generated in some Python code of your own, you should be able to just paste the two code samples on this page into your script and call gtest() to get a G-value, followed by GtoP() to convert the G-value to a p-value.

Tony

Posted by McBryan

Comment posted 2013-10-08T12:20:49.324+01:00

Hi,

Thanks for this code. Could you provide an example using your solution? I am kind of confused about how to actually calculate the G-value and P-value for my data.

Posted by Anonymous
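
For readers with the same question, here is a minimal, unofficial sketch of the approach discussed in this thread. It is not the gtest() and GtoP() code from the post itself: the names xlogx, g_statistic, chi2_sf and g_to_p are made up for this illustration, and the counts in the usage lines are invented. It assumes that each term in G = 2 * (celltotals + total - (rowtotals + columntotals)) stands for a sum of x * ln(x) over the corresponding counts, with 0 * ln(0) taken as 0. The p-value uses the usual chi-square approximation with (rows - 1) * (columns - 1) degrees of freedom, computed with the standard library only.

```python
import math


def xlogx(x):
    """Return x * ln(x), treating 0 * ln(0) as 0."""
    return x * math.log(x) if x > 0 else 0.0


def g_statistic(table):
    """G statistic for a contingency table given as a list of rows of counts."""
    row_totals = [math.fsum(row) for row in table]
    col_totals = [math.fsum(col) for col in zip(*table)]
    total = math.fsum(row_totals)
    cells = math.fsum(xlogx(x) for row in table for x in row)
    rows = math.fsum(xlogx(r) for r in row_totals)
    cols = math.fsum(xlogx(c) for c in col_totals)
    # Rounding can leave a tiny negative value for an exactly independent table.
    return max(0.0, 2.0 * (cells + xlogx(total) - (rows + cols)))


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: df = 2 gives exp(-y), df = 1 gives erfc(sqrt(y)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Each step raises df by 2 via Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def g_to_p(g, table):
    """Convert a G value to a p-value for the given table's dimensions."""
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_sf(g, df)


# Usage with invented counts: rows are groups, columns are outcomes.
observed = [[12, 30], [25, 18]]
g = g_statistic(observed)
print("G = %.4f, p = %.4g" % (g, g_to_p(g, observed)))
```

The same two calls work for larger tables, such as the 12 x 5 case asked about above, as long as every row has the same number of columns. For anything important it is worth cross-checking the result against an established implementation, such as the likelihood.test function from the Deducer package mentioned earlier in the thread.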