Difference between revisions of "ANOVA"

From Clinfowiki
Revision as of 14:04, 20 October 2011

ANOVA (analysis of variance) tests hypotheses about differences between two or more means. If independent estimates of variance can be obtained from the data, ANOVA compares the means of different groups by comparing those variance estimates: the variation between groups is set against the variation within groups. There are two models for ANOVA, the fixed effects model and the random effects model (in the latter, the treatment levels are not fixed but are treated as a random sample from a larger set of possible levels).

ANOVA assumptions

  • Cases are independent
  • The data within each group are normally distributed
  • The variances of the groups are homogeneous (equal)

The one-way ANOVA test compares several groups of observations, all of which are independent but possibly with different group means. Two-way ANOVA studies the effects of two factors separately (their main effects) and together (their interaction effect).

Statistical vocabulary

  • The t-test is a powerful statistical test that can be used to test differences between two means.
  • The null hypothesis claims that there is no difference between the quantities being compared (here, the group means).
  • The object of our testing is to either reject or fail to reject the null hypothesis; a test cannot prove the null hypothesis true.
  • The p-value is the probability of obtaining a result at least as extreme as the one actually observed, assuming the null hypothesis is true.
  • A Type I error occurs when we reject a null hypothesis that is in fact true.

History

ANOVA was initially suggested by the English statistician Sir Ronald Aylmer Fisher in the 1920s. He was educated at Harrow and Cambridge and was very interested in genetics. ANOVA uses Fisher's F-distribution as part of the test of statistical significance. Some of his famous papers include "On the mathematical foundations of theoretical statistics", published in the Philosophical Transactions of the Royal Society in 1922, and "Applications of Student's distribution", published in 1925.


Purpose

It is possible to use the t-test to compare more than two means by testing each pair of groups, but this method raises the rate of Type I errors. ANOVA (analysis of variance) is used to test differences among multiple means without increasing the Type I error rate.

As the number of groups increases, the number of pairwise comparisons grows quickly and the calculations soon become unwieldy. If we test enough pairs, some comparison is likely to appear significant by chance alone, even when no real difference exists. ANOVA combines all the data into a single F statistic and gives us a single p-value with which to test the null hypothesis.
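The growth in pairwise comparisons, and its effect on the Type I error rate, can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the significance level of 0.05 is an assumed default, and the error-rate formula treats the tests as independent, which pairwise t-tests on shared groups are not exactly, so the figures are approximations.

```python
from math import comb


def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups: k choose 2."""
    return comb(k, 2)


def familywise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across m tests.

    Assumes the m tests are independent (an approximation for
    pairwise t-tests that share groups).
    """
    return 1.0 - (1.0 - alpha) ** m


# Show how quickly the problem grows with the number of groups.
for k in (3, 5, 10):
    m = pairwise_comparisons(k)
    print(f"{k} groups -> {m} t-tests, "
          f"approx. familywise error rate {familywise_error_rate(m):.3f}")
```

Under these assumptions, ten groups already require 45 t-tests, and the chance of at least one spurious "significant" result is far above the nominal 5%.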


Advantages

  • robust design
  • increases statistical power

In addition, a two-way ANOVA:

  • looks at the interaction between factors
  • reduces random variability
  • can examine the effect of a second variable after controlling for the first variable

Shortcomings

  • If the null hypothesis is rejected, we know that at least one group differs from the others, but a one-way ANOVA with multiple groups does not by itself show which group is different
  • The assumptions need to be fulfilled

Examples

  1. Rennie CA, Hannan S, Maycock N, Kang C. Age-related macular degeneration: what do patients find on the internet? J R Soc Med. 2007 Oct;100(10):473-7. Internet sites were scored for technical information, quality, and SMOG (Simple Measure of Gobbledygook) using one-way ANOVA tests.
  2. Petrovecki M, Rahelic D, Bilic-Zulle L, Jelec V. Factors influencing medical informatics examination grade--can biorhythm, astrological sign, seasonal aspect, or bad statistics predict outcome? Croat Med J. 2003 Feb;44(1):69-74.

The Petrovecki paper is an interesting study (though probably one with limited academic value). It looked at how "pseudoscientific variables" such as zodiac sign or biorhythm cycles affected a medical informatics exam grade.

A total of 382 second-year undergraduate students at the Rijeka University School of Medicine, spanning the 1996/97 to 2000/01 academic years, were asked to fill out an anonymous questionnaire about their attitude toward learning medical informatics after taking a Medical Informatics exam.

The answer: general learning capacity and computer habits correlated with exam grades, but there was no correlation between grades and zodiac sign, biorhythm, student sex, or the time of year when the exam was taken (so I guess my zodiac sign and the fact that I once lived in Finchley, London, the same place where R.A. Fisher was born, had nothing to do with my selection of this study). However, the authors also came up with this masterfully understated statement: "Inadequate statistical analysis can always confirm false conclusions".

Quantitative technique: One-Way Analysis of Variance (ANOVA)

The analysis of variance is a partitioning of the total variance in a set of data into a number of component parts, so that the relative contributions of identifiable sources of variation to the total variation in measured responses can be determined. From this partition, suitable F-tests can be derived that allow differences between sets of means to be assessed. [1]
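For the one-way case, the partition described here can be sketched in a few lines: the total sum of squares splits into a between-group part and a within-group part, and the F statistic is the ratio of their mean squares. This is a minimal illustration in plain Python, not a reference implementation; the function name and the sample data are invented for the example, and it stops at the F statistic rather than converting it to a p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_statistic) for a list of groups.

    Each group is a sequence of numeric observations.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)     # degrees of freedom: n - k
    return ss_between, ss_within, ms_between / ms_within


# Made-up data: three groups of three observations each.
ss_b, ss_w, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_b, ss_w, f_stat)
```

The F statistic is then compared with the F-distribution on (k - 1, n - k) degrees of freedom to obtain the p-value; in practice a statistics library routine such as SciPy's `scipy.stats.f_oneway` would normally do both steps.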

Thus ANOVA is a biostatistical method for determining whether a difference exists between the means of three or more independent populations. Expressed mathematically, for three populations it tests the null hypothesis H0: μ1 = μ2 = μ3. The one-way ANOVA parametric test will result in either rejecting or failing to reject this null hypothesis. If we reject the null hypothesis, then we can conclude that the population means are not all equal. We do not know, however, whether all the means are different from one another or only some of them are different. This additional specificity is determined by conducting multiple comparison procedures, i.e. additional statistical tests. [2]
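The source does not say which multiple comparison procedure to use. As one simple illustration, the Bonferroni correction (chosen here only because it is the easiest to show) divides the overall significance level by the number of pairwise tests. The function name, group labels, and the 0.05 level below are assumptions made for the sketch.

```python
from itertools import combinations


def bonferroni_pairs(group_names, alpha=0.05):
    """List every pair of groups and the per-test significance level.

    Bonferroni correction: each pairwise test is judged at
    alpha / (number of pairs) so the overall rate stays near alpha.
    """
    pairs = list(combinations(group_names, 2))
    per_test_alpha = alpha / len(pairs)
    return pairs, per_test_alpha


# Hypothetical group labels for a three-group study.
pairs, per_test_alpha = bonferroni_pairs(["A", "B", "C"])
for first, second in pairs:
    print(f"compare {first} vs {second} at alpha = {per_test_alpha:.4f}")
```

Each listed pair would then be compared with its own test (for example a t-test), with a difference declared only when that test's p-value falls below the adjusted level.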

History

The phrase “analysis of variance” was coined by Sir Ronald Aylmer Fisher, a statistician of the twentieth century, who defined it as “the separation of variance ascribable to one group of causes from the variance ascribable to the other groups.” [1]

Principal use

One-way ANOVA is used when the researcher is comparing multiple groups (more than two) because it can control the overall Type I error rate.

Advantages:

  • It provides the overall test of equality of group means
  • It can control the overall type I error rate (i.e. false positive finding)
  • It is a parametric test, so it is more powerful than nonparametric alternatives when the normality assumption holds

Shortcomings:

  • Requires that the population distributions are normal
  • It assumes equality of variances for each group

Examples in bioinformatics

Maker VK, Donnelly MB. Surgical resident peer evaluations – what have we learned. J Surg Educ. 2008 Jan-Feb;65(1):8-16.

McCloskey DJ. Nurses’ perceptions of research utilization in a corporate health care system. J Nurs Scholarsh. 2008;40(1):39-45.

Cohen A, Fleischer JB, Johnson MK, Brown IN, Joe AK, Hershman DL, McMahon DJ, Silverberg SJ. Prevention of bone loss after withdrawal of tamoxifen. Endocr Pract. 2008 Mar;14(2):162-7.

Sources

  1. Landau S, Everitt BS. A Handbook of Statistical Analyses Using SPSS, Chapman & Hall/CRC, 2004.
  2. Pagano M, Gauvreau K. Principles of Biostatistics, 2nd Edition, Duxbury Press, Pacific Grove, CA, 2000.


References

  1. http://digital.library.adelaide.edu.au/coll/special//fisher/18pt1.pdf
  2. http://digital.library.adelaide.edu.au/coll/special/fisher/43.pdf