Basic statistical concepts

Statistical principles

In his autobiography, Mark Twain identified three types of lies: “lies, damned lies, and statistics.” (Twain, 2012) Statistical misrepresentation has continued into modern medicine as well. In an interview with The Atlantic, Dr. John Ioannidis had this to say about current scientific rigor and bias: “At every step in the process, there is room to distort results, a way to make a stronger claim or to select what is going to be concluded.” (Freedman, 2010) In September 2014, JAMA published a review of reanalyses of randomized clinical trial data; 35% of the reanalyses reached conclusions that differed from those of the original publications. (Ebrahim et al., 2014) With medicine’s current focus on delivering improved quality of care through evidence-based medicine (EBM), clinicians should have a basic understanding of key statistical concepts.

Sensitivity

At its most basic, sensitivity is a measure of how well a test finds the people with the disease in the entire population. In a test with low sensitivity, there will be people in the screened population who have a negative test despite having the disease. In a highly sensitive test, nearly everyone in the tested population who has the disease will have a positive test.

2 x 2 table (figure one)

            Disease +            Disease -
Test +      True positive (A)    False positive (B)
Test -      False negative (C)   True negative (D)

Looking at the results of figure one, you can calculate the sensitivity of a test by dividing the number of true positive results by the total number of people with the disease (true positives plus false negatives). Done mathematically, the sensitivity is A/(A+C).

Applying this to clinical research, one could look at the prostate-specific antigen (PSA). PSA levels rise in prostate cancer, so the test has been used for screening. Initially, a PSA level of 4 ng/ml or higher was considered a positive screen. Because some cases of prostate cancer were missed with this cutoff, there was discussion of lowering the threshold to 2.5 ng/ml. This would decrease the number of missed patients with prostate cancer, thereby increasing the sensitivity of the test. (Welch, Schwartz, & Woloshin, 2005) Not only was the threshold not lowered to 2.5 ng/ml, the US Preventive Services Task Force eventually recommended against PSA-based screening altogether. This has to do in part with the “specificity” of the PSA screen.

Specificity

Whereas sensitivity focuses on not missing cases of the disease, specificity focuses on not reporting that a patient has the disease when he or she does not. A test with low specificity, such as PSA, produces many positive results in individuals who do not have the disease. By contrast, the newborn screen for phenylketonuria (PKU) is over 99% specific. (Kwon & Farrell, 2000) Specificity is calculated as true negatives divided by true negatives plus false positives. Referring back to our two by two table in figure one, specificity is D/(B+D).
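
To make the arithmetic concrete, here is a minimal Python sketch that computes both sensitivity and specificity from the four cells of the 2 x 2 table; the counts A, B, C, and D below are hypothetical and chosen only for illustration.

 def sensitivity(true_pos, false_neg):
     # A / (A + C): the fraction of people with the disease whom the test catches
     return true_pos / (true_pos + false_neg)

 def specificity(true_neg, false_pos):
     # D / (B + D): the fraction of people without the disease whom the test clears
     return true_neg / (true_neg + false_pos)

 # Hypothetical counts for the four cells of the 2 x 2 table
 A, B, C, D = 90, 50, 10, 850  # true +, false +, false -, true -

 print(sensitivity(A, C))  # 90 / (90 + 10)   = 0.90
 print(specificity(D, B))  # 850 / (850 + 50) ~ 0.944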

Bayes’ Theorem

In the 18th century, Rev. Thomas Bayes developed a theorem regarding conditional probabilities. Among other applications, it has been used to interpret the sensitivity and specificity of a diagnostic test. The two values most commonly derived from it are the positive predictive value (PPV) and negative predictive value (NPV). Putting the positive predictive value in clinical terms: what is the probability that the patient has the disease given a positive test? The negative predictive value represents the probability that the patient does not have the disease given a negative test.

Positive predictive value

Representing this in plain language, the PPV is the prevalence of the disease multiplied by the sensitivity of the test; you then divide that number by itself plus the proportion of the population without the disease multiplied by the false positive rate. In the following formula, T represents the test and D represents the disease; the + and – indicate whether each is positive or negative. The notation P(T+|D-) is read as the probability that the test is positive given that the disease is absent; this false positive rate equals 1 – specificity. Similarly, P(T-|D+), the probability of a negative test given that the disease is present, equals 1 – sensitivity.

PPV = (P(D+) * P(T+|D+)) / (P(D+) * P(T+|D+) + P(D-) * P(T+|D-))

To simplify this even further, the positive predictive value goes up when the test is more sensitive, the disease is more prevalent, and the false positive rate is lower.
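
A minimal Python sketch of the same Bayes’ theorem calculation, assuming the prevalence, sensitivity, and specificity are already known (the function name and example numbers are illustrative, not taken from any real test):

 def ppv(prevalence, sensitivity, specificity):
     # P(D+|T+) = P(D+)*P(T+|D+) / [P(D+)*P(T+|D+) + P(D-)*P(T+|D-)]
     true_pos = prevalence * sensitivity
     false_pos = (1 - prevalence) * (1 - specificity)
     return true_pos / (true_pos + false_pos)

 # A 90% sensitive, 90% specific test for a disease affecting 1 in 1,000 people
 print(ppv(0.001, 0.9, 0.9))  # ~0.0089: a positive test indicates disease less than 1% of the time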

Returning to our example of the PKU newborn screen: although the test is highly specific, the prevalence of the disease is very low. This leads to a relatively low positive predictive value despite the high specificity.

Negative predictive value

The negative predictive value is calculated similarly.

NPV = (P(D-) * P(T-|D-)) / (P(D-) * P(T-|D-) + P(D+) * P(T-|D+))

Negative predictive values increase when the prevalence of the disease is low and the test is highly sensitive, so that false negatives are rare.
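
The corresponding sketch for the negative predictive value, using the same assumed inputs as the ppv sketch above:

 def npv(prevalence, sensitivity, specificity):
     # P(D-|T-) = P(D-)*P(T-|D-) / [P(D-)*P(T-|D-) + P(D+)*P(T-|D+)]
     true_neg = (1 - prevalence) * specificity
     false_neg = prevalence * (1 - sensitivity)
     return true_neg / (true_neg + false_neg)

 # The same 90% sensitive, 90% specific test, 1-in-1,000 prevalence
 print(npv(0.001, 0.9, 0.9))  # ~0.9999: a negative test is very reassuring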

Practical applications

High vs low prevalence

For the following two situations, we’ll be looking at a hypothetical disease called Kirk Syndrome, which causes a dyspraxic, halting speech. Prior to 1966, the prevalence of the disease was 1 per 100,000. From 1966 on, there was a dramatic increase in prevalence, likely secondary to some unknown environmental factor; the rate skyrocketed to 1 in every 1,000. The screening test was 90% sensitive and 90% specific.

Using Bayes’ theorem with the pre-1966 data:

PPV = (0.9 * 0.00001) / (0.00001 * 0.9 + 0.99999 * 0.1) = 0.009%

NPV = (0.9 * (1 – 0.00001)) / (0.9 * (1 – 0.00001) + 0.00001 * (1 – 0.9)) = 99.9999%

When we use post-1966 data:

PPV = (0.9 * 0.001) / (0.001 * 0.9 + 0.999 * 0.1) = 0.893%

NPV = (0.9 * (1 – 0.001)) / (0.9 * (1 – 0.001) + 0.001 * (1 – 0.9)) = 99.989%

Although there was only a slight change in the negative predictive value for this test, there was a dramatic change in the positive predictive value. In the pre-1966 data only 0.009% of positive tests would represent true disease, but after the disease became more prevalent that figure rose to nearly 0.893%, roughly a 100-fold increase.
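
The pre- and post-1966 figures above can be reproduced with the ppv and npv sketches from the earlier sections, using the prevalences and test characteristics exactly as stated in the example:

 for label, prevalence in [("pre-1966", 1 / 100000), ("post-1966", 1 / 1000)]:
     print(label, ppv(prevalence, 0.9, 0.9), npv(prevalence, 0.9, 0.9))
 # pre-1966:  PPV ~ 0.00009 (0.009%),  NPV ~ 0.999999 (99.9999%)
 # post-1966: PPV ~ 0.00893 (0.893%),  NPV ~ 0.99989  (99.989%)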

High vs low sensitivity or specificity

As time went on, there was significant pressure for early diagnosis and management of Kirk Syndrome. It turns out halting speech increased the spread of sexually communicable alien diseases after warp technology was developed. The prevalence stayed at one per 1,000, but the test was tweaked to increase sensitivity to 99%, with a concomitant decrease in specificity to 80%.

PPV = (0.99 * 0.001) / (0.001 * 0.99 + 0.999 * 0.2) = 0.493%

NPV = (0.8 * (1 – 0.001)) / (0.8 * (1 – 0.001) + 0.001 * (1 – 0.99)) = 99.999%

This increase in sensitivity raised the negative predictive value by only about 0.01 percentage points, since a negative test was already highly reassuring, while a positive test now indicated disease only 0.493% of the time. The gain in sensitivity nudged the negative predictive value upward, but the accompanying loss of specificity cut the positive predictive value nearly in half.
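
Again using the ppv and npv sketches from above, the trade-off after the test was tweaked (99% sensitivity, 80% specificity, prevalence unchanged at 1 in 1,000) looks like this:

 prevalence = 1 / 1000
 print(ppv(prevalence, 0.99, 0.80))  # ~0.00493 (0.493%): most positive tests are still false alarms
 print(npv(prevalence, 0.99, 0.80))  # ~0.99999 (99.999%): a negative test is slightly more reassuring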

References

Ebrahim, S., Sohani, Z., Montoya, L., Agarwal, A., Thorlund, K., Mills, E., & Ioannidis, J. (2014). Reanalyses of randomized clinical trial data. JAMA, 312(10), 1024–1032. doi:10.1001/jama.2014.9646

Freedman, D. H. (2010, November). Lies, Damned Lies, and Medical Science. The Atlantic. Retrieved from http://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/

Kwon, C., & Farrell, P. (2000). The magnitude and challenge of false-positive newborn screening test results. Archives of Pediatrics & Adolescent Medicine, 154(7), 714–8.

Twain, M. (2012). Autobiography of Mark Twain: Volume 1, Reader’s Edition. (H. E. Smith, B. Griffin, V. Fischer, M. B. Frank, S. Goetz, & L. D. Myrick, Eds.) (Reprint edition.). Berkeley, Calif.; London: University of California Press.

Related posts

Evidence based medicine: EBM

Analysis of variance: ANOVA

Submitted by Benj Barsotti