The trouble with p-values

An annoying T-shirt

Nature has two pieces this week on how p-values are commonly misused to distort scientific results. I’ve often been annoyed by casual remarks like “what’s your p-value?”, which are sometimes dropped as a quasi-scientific rebuttal in online discussions. Nature’s editors encourage us all to dive a little deeper into the foundations of statistical methods.

The first piece, an editorial called “Number Crunch,” issues a call to action for practicing scientists and educators:

The first step towards solving a problem is to acknowledge it. In this spirit, Nature urges all scientists to read the News Feature and its summary of the problems of the P value, if only to refresh their memories.

The second step is more difficult, because it involves finding a solution. Too many researchers have an incomplete or outdated sense of what is necessary in statistics; this is a broader problem than misuse of the P value.

Department heads, lab chiefs and senior scientists need to upgrade a good working knowledge of statistics from the ‘desirable’ column in job specifications to ‘essential’. But that, in turn, requires universities and funders to recognize the importance of statistics and provide for it. Nature is trying to do its bit and to acknowledge its own shortcomings. Better use of statistics is a central plank of a reproducibility initiative that aims to boost the reliability of the research that we publish (see Nature 496, 398; 2013).

The second piece is a more detailed article by Regina Nuzzo, titled “Scientific Methods: Statistical Errors.” Nuzzo gives a straightforward explanation of the problem:

It turned out that the problem was not in the data or in Motyl’s analyses. It lay in the surprisingly slippery nature of the P value, which is neither as reliable nor as objective as most scientists assume. “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.

For many scientists, this is especially worrying in light of the reproducibility concerns. In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false [2]; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.

P values have always had critics. In their almost nine decades of existence, they have been likened to mosquitoes (annoying and impossible to swat away), the emperor’s new clothes (fraught with obvious problems that everyone ignores) and the tool of a “sterile intellectual rake” who ravishes science but leaves it with no progeny [3]. One researcher suggested rechristening the methodology “statistical hypothesis inference testing” [3], presumably for the acronym it would yield.

The article goes on to dissect several common distortions that result from researchers’ pursuit of results with low p-values. The point is well taken, and is a reminder for us all to spend some quality time examining our foundations.
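To make the worry about low p-values concrete, here is a minimal sketch in Python of how the share of false positives among "significant" results depends on how plausible the tested hypotheses were to begin with. The function name and all of the numbers (prior plausibility, statistical power of 0.8, a 0.05 significance threshold) are assumptions chosen for illustration; they are not taken from the Nature pieces.

```python
def false_discovery_share(prior_true, power, alpha):
    """Fraction of 'significant' results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are really true
    power:      assumed chance a real effect reaches significance
    alpha:      significance threshold (chance a null effect reaches it)
    """
    true_hits = prior_true * power            # real effects flagged as significant
    false_hits = (1.0 - prior_true) * alpha   # null effects flagged as significant
    return false_hits / (true_hits + false_hits)


# Illustrative priors only: a toss-up, a long shot, and a very long shot.
for prior in (0.5, 0.1, 0.01):
    share = false_discovery_share(prior, power=0.8, alpha=0.05)
    print(f"prior={prior:.2f}  share of significant results that are false={share:.2f}")
```

Under these assumed numbers, when only one in ten tested hypotheses is true, roughly 36% of the results that clear the 0.05 threshold are false positives, even though every one of them has a "good" p-value.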

Structured reasoning with ProbLog

ProbLog is a simple language for probabilistic reasoning. It allows you to write logic programs that account for uncertainty among the facts and propositions, and it can calculate the probabilities of hypotheses or events based on your model. An online tutorial with a web-based calculator is available here.
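To give a feel for the kind of calculation involved, here is a rough sketch in Python (not ProbLog syntax) that computes the probability of a query by enumerating every possible world of a tiny model. The model itself, with its burglary, earthquake and alarm and their probabilities, is a hypothetical example invented for illustration, and the helper names are my own. This brute-force approach is only practical for a handful of facts.

```python
from itertools import product

# Hypothetical model: independent probabilistic facts and their probabilities.
FACTS = {"burglary": 0.1, "earthquake": 0.2}


def alarm(world):
    """Deterministic rule: the alarm rings if either cause is present."""
    return world["burglary"] or world["earthquake"]


def worlds(facts):
    """Yield every truth assignment to the facts, with its probability."""
    names = list(facts)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            p = facts[name]
            weight *= p if world[name] else 1.0 - p
        yield world, weight


def probability(query, evidence=lambda world: True):
    """P(query | evidence): sum the weights of the matching worlds."""
    joint = sum(wt for w, wt in worlds(FACTS) if evidence(w) and query(w))
    norm = sum(wt for w, wt in worlds(FACTS) if evidence(w))
    return joint / norm


print("P(alarm) =", probability(alarm))
print("P(burglary | alarm) =",
      probability(lambda w: w["burglary"], evidence=alarm))
```

With these made-up numbers the alarm rings with probability 0.28, and observing the alarm raises the probability of a burglary from 0.1 to about 0.36. Conditioning on evidence in this way is the same move that Bayesian inference makes.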

Of all the intellectual techniques available to those who pursue a rational worldview, I believe none is more important than Bayesian inference. By calling it “most important,” I don’t mean to disregard the fundamentals of logic, mathematics, probability, statistics, etc.; those are all prerequisites to understanding Bayesian techniques. There’s been a lot of talk about Bayesianism recently in skeptical circles, with Richard Carrier drawing a lot of attention for his advocacy of Bayesian methods in history. I appreciate that there are a lot of philosophical arguments surrounding the use and application of Bayes’ theorem by Carrier and others. As with all reasoning techniques, Bayesian reasoning can be used well or it can be used poorly.

There is a difference between science and debate

The goal of science is to build knowledge. The goal of a debate is to win. While these can sometimes look very similar, they are not the same. Online media have created platforms where superficial debates can flourish at the expense of scientific understanding. Skeptical communities who advocate the scientific worldview already promote a widely known collection of rules for debate. They should also consider articulating “best practices” for constructive discussion, recognizing that most discussions do not need to become debates.

Science concerns itself with analysis, observation, demonstration and refutation with the goal of building knowledge. While scientific discussions often involve the juxtaposition of arguments, they rarely proceed in the manner of formal debates unless the occasion calls for an immediate decision or action — as in a peer review decision or the adoption of a research agenda. When debate happens, it requires extensive preparation by the participants and is usually moderated by a decision maker, such as a journal editor, program manager or committee chair.