Basic statistical concepts
In his autobiography, Mark Twain identified three types of lies: “lies, damned lies, and statistics.” (Twain, 2012) Statistical misrepresentation persists in modern medicine as well. In an interview with The Atlantic, Dr John Ioannidis had this to say about current scientific rigor and bias: “At every step in the process, there is room to distort results, a way to make a stronger claim or to select what is going to be concluded.” (Freedman, 2010) In September 2014, JAMA published a review of re-analyses of randomized clinical trial data; 35% of the re-analyses had changed conclusions based on independent review of the data. (Ebrahim et al., 2014) With medicine’s current focus on delivering improved quality of care through evidence-based medicine (EBM), clinicians should have a basic understanding of key statistical concepts.
At its most basic, sensitivity is a measurement of how well a test does at finding the people with the disease in the entire population. In a test with low sensitivity, there will be people in the screened population who have a negative test despite having the disease. In a highly sensitive test, nearly everyone in the tested population who has the disease will have a positive test.
| ||Disease +||Disease -|
|Test +||True + (A)||False + (B)|
|Test -||False - (C)||True - (D)|
Looking at the results of figure one, you can calculate the sensitivity of a test by dividing the true positive results by the entire population with the disease. Expressed mathematically, the sensitivity is A/(A+C).
Applying this to clinical research, consider prostate-specific antigen (PSA). PSA rises in cases of prostate cancer, so it has been used as a screening test. Initially, a PSA level of 4 ng/ml or higher was considered a positive screen. Because some cases of prostate cancer were missed with this cutoff, lowering the threshold to 2.5 ng/ml was discussed. This would decrease the number of missed patients with prostate cancer, thereby increasing the sensitivity of the test. (Welch, Schwartz, & Woloshin, 2005) Not only was the threshold not lowered to 2.5 ng/ml, the US Preventive Services Task Force recommended against PSA screening altogether. This has to do in part with the “specificity” of the PSA screen.
Whereas sensitivity focuses on not missing cases of the disease, specificity focuses on not reporting that a patient has the disease when he does not. A test with low specificity, such as PSA, produces many positive results in individuals who do not have the disease. By contrast, the newborn screen for phenylketonuria is over 99% specific. (Kwon & Farrell, 2000) Specificity is calculated as true negatives divided by the sum of true negatives and false positives. Referring back to our two-by-two table in figure one, specificity is D/(B+D).
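The two ratios can be sketched in a few lines of Python. The cell counts below are hypothetical values chosen only for illustration, and the function names are not taken from any library; A, B, C and D refer to the cells of the two-by-two table in figure one.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of people with the disease who test positive: A / (A + C)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of people without the disease who test negative: D / (B + D)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for cells A, B, C, D of the two-by-two table.
A, B, C, D = 90, 50, 10, 850

print(f"sensitivity = {sensitivity(A, C):.3f}")  # 90 / (90 + 10)
print(f"specificity = {specificity(D, B):.3f}")  # 850 / (850 + 50)
```

Note that each ratio uses only one column of the table, which is why neither value depends on how common the disease is.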
In the 18th century, Rev. Thomas Bayes developed a theorem regarding conditional probabilities. Among other applications, it has been used to interpret the sensitivity and specificity of a diagnostic test. The two values most commonly derived from it are the positive predictive value (PPV) and the negative predictive value (NPV). In clinical terms, the positive predictive value answers the question: what is the probability that the patient has the disease given a positive test? The negative predictive value represents the probability that the patient does not have the disease given a negative test.
Positive predictive value
In plain language, the PPV is the prevalence of the disease multiplied by the sensitivity of the test, divided by that same product plus the proportion of patients without the disease multiplied by the false positive rate. In the following formula, T represents the test and D represents the disease; the + and – indicate whether each is positive or negative. The notation P(T-|D+) is read as the probability that the test is negative given that the disease is present; this is equal to 1 – sensitivity. Likewise, P(T+|D-) is the false positive rate, which is equal to 1 – specificity.
PPV = (P(D+) * P(T+|D+)) / (P(D+) * P(T+|D+) + P(D-) * P(T+|D-))
To simplify this even further, the positive predictive value goes up when the test is more sensitive, the disease is more prevalent, and the false positive rate is low.
Returning to our example of the highly specific PKU newborn screen, although the test is highly specific, the prevalence is low. This leads to a relatively low positive predictive value despite the high specificity.
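As a minimal sketch of the PPV formula, the following Python function applies Bayes' theorem to a prevalence, sensitivity and specificity. The 99% sensitivity and specificity and the three prevalences are assumed, illustrative inputs, not published figures for any real screening test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D+ | T+) computed with Bayes' theorem."""
    true_pos = prevalence * sensitivity               # P(D+) * P(T+|D+)
    false_pos = (1 - prevalence) * (1 - specificity)  # P(D-) * P(T+|D-)
    return true_pos / (true_pos + false_pos)


# Assumed illustrative inputs: the same test applied at three prevalences.
for prevalence in (0.0001, 0.01, 0.1):
    ppv = positive_predictive_value(prevalence, 0.99, 0.99)
    print(f"prevalence {prevalence}: PPV = {ppv:.1%}")
```

Holding the test fixed and varying only the prevalence shows how strongly a rare disease pulls the PPV down, even when the test itself is very good.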
Negative predictive value
The negative predictive value is calculated similarly.
NPV = (P(D-) * P(T-|D-)) / (P(D-) * P(T-|D-) + P(D+) * P(T-|D+))
Negative predictive values increase when the prevalence of the disease is low and the test is highly sensitive (that is, when the false negative rate is low).
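The NPV formula can be sketched the same way. The inputs below (10% prevalence, 90% specificity, and two different sensitivities) are assumptions chosen only to show which direction the NPV moves when sensitivity changes.

```python
def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(D- | T-) computed with Bayes' theorem."""
    true_neg = (1 - prevalence) * specificity    # P(D-) * P(T-|D-)
    false_neg = prevalence * (1 - sensitivity)   # P(D+) * P(T-|D+)
    return true_neg / (true_neg + false_neg)


# Assumed illustrative inputs: same prevalence and specificity,
# two different sensitivities.
for sens in (0.50, 0.99):
    npv = negative_predictive_value(0.10, sens, 0.90)
    print(f"sensitivity {sens:.2f}: NPV = {npv:.2%}")
```

A more sensitive test produces fewer false negatives, so a negative result becomes more trustworthy.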
In plain language, the likelihood ratio is the likelihood of a given test result in patients with a variable (disease/risk), compared to the likelihood of the same result in patients without the variable. In clinical practice, likelihood ratios are used to assess the utility of a diagnostic test and to gauge how likely it is that a patient has a disease given a particular risk or previous test result (for example, to evaluate pre-test probability). Both a positive and a negative likelihood ratio are calculated, corresponding to a positive and a negative test result.
LR+ = sensitivity / (1 - specificity)
LR- = (1 - sensitivity) / specificity
An LR+ above 10 or an LR- below 0.1 is indicative of a useful test. LRs can be multiplied by the pre-test odds to estimate the post-test odds.
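The conversion from pre-test probability to post-test probability goes through odds, which is easy to get wrong by hand. The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity and a hypothetical pre-test probability of 20%; the function names are illustrative only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test and patient.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.90)
pre_test = 0.20

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"after a positive test: {post_test_probability(pre_test, lr_pos):.1%}")
print(f"after a negative test: {post_test_probability(pre_test, lr_neg):.1%}")
```

The likelihood ratio must be applied to odds rather than to the probability directly; multiplying a probability by an LR can give a value greater than 1.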
In basic statistics, risk is quantified using an exposure (risk factor or intervention) by outcome contingency table.
| ||Outcome +||Outcome -|
|Exposure +||a||b|
|Exposure -||c||d|
Odds ratio
The odds of an outcome given an exposure versus the odds of the same outcome without that exposure. Typically used in case-control studies, where the outcome is known.
OR = (a * d) / (b * c)
Example: In a case-control study, 50/100 postmenopausal women and 25/100 premenopausal women developed osteoporosis. The OR is thus (50 * 75) / (50 * 25) = 3. Thus the odds of a postmenopausal woman getting osteoporosis are 3 times the odds of a premenopausal woman.
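The osteoporosis example can be checked with a short Python sketch. It assumes the usual layout of the contingency table: a and b are the exposed subjects with and without the outcome, and c and d are the unexposed subjects with and without the outcome.

```python
def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed (a/b) over the unexposed (c/d)."""
    return (a * d) / (b * c)


# Postmenopausal: 50 with osteoporosis, 50 without.
# Premenopausal: 25 with osteoporosis, 75 without.
print(odds_ratio(a=50, b=50, c=25, d=75))  # prints 3.0
```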
Relative risk
The risk of the outcome in an exposed group divided by the risk of the outcome in the unexposed group. Typically used in cohort studies, where the exposure is known.
RR = (a / (a + b)) / (c / (c + d))
Example: In a cohort study, 25/100 women given estrogen vs 50/100 women not given estrogen developed osteoporosis. The RR is thus (25 / (25 + 75)) / (50 / (50 + 50)) = (25/100) / (50/100) = 0.5. Thus women given estrogen have a 50% relative reduction in the risk of developing osteoporosis.
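The same calculation as a small Python sketch, again assuming that a and b are the exposed subjects with and without the outcome and that c and d are the unexposed subjects with and without the outcome:

```python
def relative_risk(a, b, c, d):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Estrogen: 25 with osteoporosis, 75 without.
# No estrogen: 50 with osteoporosis, 50 without.
print(relative_risk(a=25, b=75, c=50, d=50))  # prints 0.5
```

Unlike the odds ratio, the relative risk uses the row totals as denominators, so it is only meaningful when the study sampled subjects by exposure rather than by outcome.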
Relative risk reduction
The proportion of risk reduction that is attributable to the exposure or intervention.
RRR = 1 - RR
Absolute risk reduction
The absolute difference in risk between exposed and unexposed groups.
ARR = c/(c+d) - a/(a+b)
Number needed to treat
Number of patients who need to be treated for 1 patient to benefit.
NNT = 1/ARR
Number needed to harm
Number of patients who need to be exposed for 1 patient to be harmed.
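NNH = 1/AR, where AR is the attributable risk (the absolute increase in risk of harm with exposure), a/(a+b) - c/(c+d)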
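These measures all derive from the same two risks, so they can be computed together. The sketch below reuses the estrogen cohort figures from the relative risk example; the adverse-event counts used for the number needed to harm are hypothetical, and the function names are illustrative only.

```python
def risk_summary(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return RR, RRR, ARR and NNT for a beneficial exposure."""
    exposed_risk = exposed_events / exposed_total
    unexposed_risk = unexposed_events / unexposed_total
    rr = exposed_risk / unexposed_risk
    arr = unexposed_risk - exposed_risk
    # NNT is undefined when the two risks are equal (ARR of zero).
    return {"RR": rr, "RRR": 1 - rr, "ARR": arr, "NNT": 1 / arr}


def number_needed_to_harm(exposed_harms, exposed_total, unexposed_harms, unexposed_total):
    """1 / attributable risk, where the exposure increases the risk of harm."""
    attributable_risk = exposed_harms / exposed_total - unexposed_harms / unexposed_total
    return 1 / attributable_risk


# Estrogen cohort: 25/100 treated vs 50/100 untreated developed osteoporosis.
summary = risk_summary(25, 100, 50, 100)
print(summary)  # RR 0.5, RRR 0.5, ARR 0.25, NNT 4.0

# Hypothetical adverse event: 6/100 treated vs 2/100 untreated.
print(f"NNH is about {number_needed_to_harm(6, 100, 2, 100):.0f}")
```

In this example a 50% relative risk reduction corresponds to an absolute reduction of 25 percentage points, so 4 women would need to be treated to prevent one case.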
Incidence and Prevalence
Incidence is the number of new cases per unit of time.
Incidence = # new cases in given time frame / population at risk
Prevalence is a cross section of all current cases in the population at a given point in time.
Prevalence = # total cases / total population
Precision and Accuracy
Precision is the reliability or consistency of a test. Increased precision means less random variation and a lower standard deviation. Accuracy is the trueness or validity of the test, that is, how close its results are to the true value.
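The distinction can be illustrated with two sets of repeated measurements of a known reference value. All of the readings below are invented for illustration; the sketch uses only the mean and sample standard deviation from Python's standard library.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical known reference value

# Hypothetical repeated readings from two instruments.
readings = {
    "precise but inaccurate": [110.1, 109.9, 110.0, 110.2, 109.8],
    "accurate but imprecise": [92.0, 108.0, 100.0, 95.0, 105.0],
}

for label, values in readings.items():
    bias = mean(values) - TRUE_VALUE  # accuracy: closeness to the true value
    spread = stdev(values)            # precision: random variation
    print(f"{label}: bias = {bias:+.1f}, standard deviation = {spread:.2f}")
```

The first instrument gives tightly clustered readings that are all about 10 units too high; the second is centered on the true value but scatters widely around it.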
High vs low prevalence
For the following two situations, we’ll be looking at a hypothetical disease called Kirk Syndrome, which causes dyspraxic, halting speech. Prior to 1966, the prevalence of the disease was 1 per 100,000. From 1966 on, there was a dramatic increase in prevalence, likely secondary to some unknown environmental factor; the rate skyrocketed to 1 per 1000. The screening test was 90% sensitive and 90% specific.
Using Bayes' theorem with the pre-1966 data:
PPV = (0.9 * 0.00001) / (0.00001 * 0.9 + 0.99999 * 0.1) = 0.009%
NPV = (0.9 * (1 – 0.00001)) / (0.9 * (1 – 0.00001) + 0.00001 * (1 – 0.9)) = 99.9999%
When we use post-1966 data:
PPV = (0.9 * 0.001) / (0.001 * 0.9 + 0.999 * 0.1) = 0.893%
NPV = (0.9 * (1 – 0.001)) / (0.9 * (1 – 0.001) + 0.001 * (1 – 0.9)) = 99.989%
Although there was only a slight change in the negative predictive value for this test, there was a dramatic change in the positive predictive value. In the pre-1966 data, only 0.009% of people with a positive test would have the disease; after the disease became more prevalent, the figure was 0.893%, a nearly 100-fold increase.
High vs low sensitivity or specificity
As time has gone on, there has been significant pressure for early diagnosis and management of Kirk Syndrome. It turns out halting speech has increased the spread of sexually communicable alien diseases since warp technology was developed. The prevalence has stayed at 1 per 1000, but the test was tweaked to increase sensitivity to 99%, with a concomitant decrease in specificity to 80%.
PPV = (0.99 * 0.001) / (0.001 * 0.99 + 0.999 * 0.2) = 0.493%
NPV = (0.8 * (1 – 0.001)) / (0.8 * (1 – 0.001) + 0.001 * (1 – 0.99)) = 99.999%
This increase in sensitivity raised the negative predictive value by only about 0.01 percentage points, so a negative test became slightly more reassuring, but it left only a 0.493% chance that a positive test meant that the person had the disease. Increasing sensitivity at the cost of specificity increased the negative predictive value while decreasing the positive predictive value.
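One way to check the Kirk Syndrome figures is to build the expected two-by-two table for a large screened population and read the predictive values off the counts. The sketch below assumes a cohort of 1,000,000 people, a size chosen only to keep the counts easy to read; the prevalence, sensitivity and specificity values are the ones given in the three scenarios above.

```python
def two_by_two(population, prevalence, sensitivity, specificity):
    """Expected counts (A, B, C, D) for a screened population."""
    diseased = population * prevalence
    healthy = population - diseased
    a = diseased * sensitivity  # true positives
    c = diseased - a            # false negatives
    d = healthy * specificity   # true negatives
    b = healthy - d             # false positives
    return a, b, c, d


def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from the cell counts."""
    return a / (a + b), d / (c + d)


scenarios = {
    "pre-1966": (0.00001, 0.90, 0.90),
    "post-1966": (0.001, 0.90, 0.90),
    "post-1966, retuned test": (0.001, 0.99, 0.80),
}

for name, (prev, sens, spec) in scenarios.items():
    a, b, c, d = two_by_two(1_000_000, prev, sens, spec)
    ppv, npv = predictive_values(a, b, c, d)
    print(f"{name}: TP={a:.0f} FP={b:.0f} FN={c:.0f} TN={d:.0f} "
          f"PPV={ppv:.3%} NPV={npv:.4%}")
```

Laid out as counts, the retuned test misses about 10 cases per million screened instead of about 100, but it roughly doubles the number of false positives, from about 99,900 to about 199,800.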
Ebrahim, S., Sohani, Z., Montoya, L., Agarwal, A., Thorlund, K., Mills, E., & Ioannidis, J. (2014). Reanalyses of randomized clinical trial data. JAMA, 312(10), 1024–32. doi:10.1001/jama.2014.9646
Freedman, D. H. (2010, November). Lies, Damned Lies, and Medical Science. The Atlantic. Retrieved from http://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/
Kwon, C., & Farrell, P. (2000). The magnitude and challenge of false-positive newborn screening test results. Archives of Pediatrics & Adolescent Medicine, 154(7), 714–8.
Twain, M. (2012). Autobiography of Mark Twain: Volume 1, Reader’s Edition. (H. E. Smith, B. Griffin, V. Fischer, M. B. Frank, S. Goetz, & L. D. Myrick, Eds.) (Reprint edition.). Berkeley, Calif.; London: University of California Press.
Pagano, M. (2000). Principles of Biostatistics, Second edition. Cengage Learning.
Evidence-based medicine: EBM
Analysis of variance: ANOVA
Submitted by Benj Barsotti