Non-testable facts are commonplace in mathematically driven science

An observation that lacks a theory

At first inspection, the scientific method seems to dictate that all accepted facts should rest on concrete observations. Based on this notion, some skeptics are quick to dismiss the scientific legitimacy of mathematically driven research. But there are many examples of important scientific findings that are essentially mathematical theorems with no prospect for physical falsification. A simple class of examples is the family of bounds and asymptotes. In this post I’ll examine a couple of specific examples from information science and engineering.

There’s an interesting set of articles that recently appeared on Pigliucci’s ScientiaSalon site. The first of these articles, titled “The multiverse as a scientific concept,” defends a mathematically-driven hypothesis that has no prospect for empirical validation. This article was authored by Coel Hellier, a professor of astrophysics at Keele University in the UK. The second article, titled “The evidence crisis,” offers a highly skeptical critique of the mathematical research methods used by string theorists, who introduce unobservable physical dimensions (and perhaps other controversial features) in order to produce a self-consistent mathematical theory that unifies the known physical laws. The second article is by Jim Baggott, who holds an Oxford PhD in physical chemistry, and has authored some critical books on modern physics, like this one.

I am very interested in the relationship between empirical and mathematical research. At just this moment, I have two article revisions in progress on my desktop. The first article provides an almost entirely empirical approach to validate a new heuristic technique; the reviewers are upset that I only have empirically tested results without a single mathematical theorem to back them up. The second article is more theoretically driven, but has limited empirical results; the reviewers complain that the experimental results are inadequate. This is a very typical situation for my field. There is an expectation of balance between theory and experiment. Purely empirical results can easily represent experimental or numerical mistakes, so you should ideally have a predictive theory to cohere with the observations. On the other hand, a strictly theoretical or mathematical result may not have any practical utility, so should be connected to some empirical demonstration (I am in an engineering field, after all).

Since I’m not a physicist, I won’t weigh in on the merits of string theory or the multiverse. In thinking about these topics, however, it occurs to me that there are a lot of scientific concepts that are purely mathematical results, and are effectively unfalsifiable. I think one such example is Shannon’s Capacity Theorem, which plays a foundational role in information theory. Simply put, Shannon’s Theorem predicts that any communication channel should have a maximum information capacity, i.e. a maximum rate at which information can be reliably communicated. There is a whole branch of information science devoted to modeling channels, solving for their capacity, and devising practical information coding techniques that push toward those capacity limits. A large amount of work in this field is purely mathematical.

With regard to empiricism, here are the features that I think are interesting about the capacity theorem: First, capacity is a limit. It tells us that we can’t achieve higher rates on a given channel. In terms of empirical testing, all we can do is build systems and observe that they don’t beat the capacity limit. That is not really an empirical test of the limit itself. Second, we usually don’t measure capacity directly. Instead, we use an assumed model for a hypothetical physical channel, and then apply some mathematical optimization theory to predict or infer the capacity limit.
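To make the second point concrete, here is a minimal sketch in Python, assuming the simplest textbook model: a binary symmetric channel that flips each transmitted bit with probability `eps`. The function names and the grid search are illustrative choices, not a standard API. The capacity is obtained by maximizing the mutual information over input distributions, and the sketch compares a brute-force maximization against the known closed form C = 1 - H2(eps). Nothing here is measured; the "limit" comes entirely from the assumed model.

```python
import math


def binary_entropy(p):
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_mutual_information(q, eps):
    """I(X;Y) in bits for a binary symmetric channel with crossover
    probability eps when the input satisfies P(X = 1) = q."""
    p_y1 = q * (1.0 - eps) + (1.0 - q) * eps  # probability that Y = 1
    return binary_entropy(p_y1) - binary_entropy(eps)


def bsc_capacity_numeric(eps, steps=1000):
    """Capacity estimated by a grid search over input distributions."""
    return max(bsc_mutual_information(k / steps, eps) for k in range(steps + 1))


def bsc_capacity_closed_form(eps):
    """Known closed form for the BSC: C = 1 - H2(eps)."""
    return 1.0 - binary_entropy(eps)


if __name__ == "__main__":
    for eps in (0.0, 0.05, 0.11, 0.5):
        print(eps, bsc_capacity_numeric(eps), bsc_capacity_closed_form(eps))
```

The grid search and the closed form should agree because the maximizing input for this channel is the uniform one. For a real channel, the hard part is justifying the model, not running the optimization.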

Given these two features, I think the capacity theorem — along with a huge body of related research — is not truly testable or falsifiable in the way many empiricists would prefer (and I think that’s okay). Here are some specific points:

  1. We cannot falsify the proposition that every channel has a capacity. It is a consequence of the same mathematics that grounds all of probability theory and statistics research. In order to falsify the capacity theorem, we have to discard most other modern scientific practices as well. It is interesting to me that this is a strictly mathematical theorem, yet it forces inescapable conclusions about the physical world.
  2. If we did observe a system that beats capacity, we would assume that the system was measured improperly or used an incorrect channel model. Nearly every graduate student finds a way to “beat” the capacity limit early in their studies, but this is always because they made some mistake in their simulations or measurements. Even if we keep beating capacity and never find any fault in the measurements or models, it still would not suffice to falsify the capacity theorem. It’s a theorem — you can’t contradict it! Not unless you revise the axioms that lie at the theorem’s foundations. Such a revision would be amazing, but it would still have to be consistent with the traditional axioms as a degenerate case, because those axioms generate a system of theories that are overwhelmingly validated across many fields. This revision could therefore not be considered a falsification, but should rather be thought of as an extension to the theory.

The point of this analysis is to show that an unfalsifiable, untestable mathematical result is perfectly fine, if the result arises from a body of theory that is already solidly in place. To add another example, I mentioned earlier that some researchers try to find information coding schemes that achieve the capacity bound. For a long time (about 50 years), the coding community took a quasi-empirical approach to this problem, devising dozens (maybe even hundreds or thousands) of coding schemes and testing them through pure analysis and simulations on different channel models. In the 1990s, several methods were finally discovered that come extremely close to capacity on some of the most important channels. To some researchers, these methods were not good enough, since they only appear to approach capacity based on empirical observations. To these researchers, it would be preferable to construct a coding method that is mathematically proven to exactly achieve capacity.

In 2009, a method known as Polar Coding appeared, which was rigorously shown to asymptotically achieve capacity, i.e. its performance should get better and better as the amount of coded data grows, and in the limit of infinitely many coded bits it should operate at a rate equal to capacity. This was hailed as a great advance in coding and information theory, but again the asymptotic claim is not truly verifiable through empirical methods. We can’t measure what happens when the information size reaches infinity. We can only make mathematical projections. Because of this, some researchers I know have quietly criticized the value of polar codes, calling them meaningless from a practical standpoint. I disagree; I value the progress of mathematical insight in concert with empirical research and practical applications.
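The asymptotic flavor of this result can be illustrated with a small Python sketch, assuming the simplest case: a binary erasure channel with erasure probability `eps`, whose capacity is 1 - eps. In that special case each polarization step turns a channel with erasure probability z into a worse one (2z - z²) and a better one (z²). The function names and the threshold below are hypothetical choices for illustration, and this is not a working encoder or decoder.

```python
def polarize_bec(eps, levels):
    """Erasure probabilities of the 2**levels synthetic channels obtained
    by recursively polarizing a binary erasure channel BEC(eps)."""
    zs = [eps]
    for _ in range(levels):
        # Each channel splits into a degraded one and an upgraded one.
        zs = [w for z in zs for w in (2.0 * z - z * z, z * z)]
    return zs


def fraction_good(zs, threshold=1e-3):
    """Fraction of synthetic channels that are nearly erasure-free."""
    return sum(1 for z in zs if z < threshold) / len(zs)


if __name__ == "__main__":
    eps = 0.5  # assumed erasure probability, so capacity is 1 - eps = 0.5
    for levels in (4, 8, 12, 16):
        zs = polarize_bec(eps, levels)
        print(levels, len(zs), fraction_good(zs))
```

The printed fraction of nearly perfect channels creeps toward the capacity as the number of levels grows, but any finite run only ever shows it approaching. The statement that it gets there is a limit argument, which is exactly the kind of claim no simulation can confirm.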

To conclude, I want to offer one further observation about the mathematical system from which these theorems arise. When studying the axiomatic development of probability theory, statistics, and stochastic processes, I was really struck by how little attachment they have to empirical observations. They are mathematical frameworks with a number of fill-in-the-gap places where you specify, for instance, a physically plausible probability distribution (a commenter on Baggott’s article similarly described string theory as a mathematical framework for building theories, rather than a single fully-qualified physical theory). But even the physical probability distributions are frequently replaceable by a priori concepts derived, say, from the Bernoulli distribution (i.e. the coin toss process), or the Gaussian distribution under support from the Central Limit Theorem (another purely mathematical result!).

While we like to think that the history of science is a story of theories devised to explain observations (which may be true in some sciences), in many fields the story is partially reversed. The sciences of probability, statistics, and information theory (among many others) developed first from a priori mathematical considerations which defined the experimental procedures to be used for empirical studies. This history is chronicled in two of my favorite books on scientific history — The Emergence of Probability and The Taming of Chance — both written by philosopher Ian Hacking (who has authored a number of other interesting books worth examining).

Some may rightly argue that these claims are not totally unfalsifiable, since they are anchored to a theory that could have been independently falsified. The main point of my post, however, is to point out that a purely mathematical exposition can expose novel, very real truths about the physical world — truths that cannot be verified or falsified on their own.


Is this scientism?


Is there science happening here? I need a biologist to tell me.

PZ Myers and Laurence Moran say “Physicians and engineers are not scientists” (a point argued with, I think, malicious intent). Meanwhile Jerry Coyne and others think that car mechanics and plumbers are doing “science, broadly construed.” Sam Harris and Steven Pinker suggest (or at least imply) that scientists will ultimately overtake the humanities; Massimo Pigliucci has strenuously critiqued this latter view, calling it “scientism.”

This debate revolves around a basic rhetorical fallacy: the claim that “scientists” have a unique legitimacy attached to their beliefs, together with a claim of demarcational privilege to decide who is and isn’t a scientist. The arational imposition of intellectual privilege is, I think, the essence of the fuzzily defined “scientism” that non-scientists find threatening. It’s threatening because it is a threat. It attacks the legitimacy of entire classes of scholarship, and the Myers/Moran attack on engineers is one example.

This style of argument is used to de-legitimize a perceived opponent, or (as in Pigliucci’s case) to defend the legitimacy of his own profession. Such defenses are, according to Coyne, “defensive” — check out Coyne’s reaction to a historian who proposed that scientists might benefit from studying history. To paraphrase his position: we (scientists) don’t need you (non-scientists), you need us. On this level, the debate has nothing to do with science or the quality of ideas; instead it is a purely sophistic (and egoistic) effort to disqualify others.

I’ll pause now to remind the reader that I’m an engineer. Speaking as an engineer, I think there is a clear distinction between engineering and science: engineers have to actually get things right or they may suffer immediate economic, functional or ethical consequences. Scientists, on the other hand, have to pass their work through a process of critical review by their peers. The latter process is important to the long-term filtering of ideas, but peer review doesn’t have the same falsifying power as a collapsing bridge, an exploding boiler, a crashing train, a killer radiation leak or a misfired missile. So if we’re talking about legitimacy, I’d sooner trust the beliefs of a randomly selected engineer over those of a random scientist.

But Moran and Myers think engineers are something less. They are annoyed by Ken Ham’s claim that creationists can be successful in scientific careers, something that was argued during the Bill Nye / Ken Ham debate. They are so annoyed by the creationists that they are willing to degrade entire classes of scholars in order to win a fake point.


The trouble with p-values

An annoying T-shirt

Nature has two pieces this week on how p-values are commonly misused to distort scientific results. I’ve often been annoyed by casual statements like “what’s your p-value?” which is sometimes dropped as a quasi-scientific rebuttal in online discussions. Nature’s editors encourage us all to dive a little deeper into the foundations of statistical methods.

The first piece, an editorial called Number Crunch, issues a call to action for practicing scientists and educators:

The first step towards solving a problem is to acknowledge it. In this spirit, Nature urges all scientists to read the News Feature and its summary of the problems of the P value, if only to refresh their memories.

The second step is more difficult, because it involves finding a solution. Too many researchers have an incomplete or outdated sense of what is necessary in statistics; this is a broader problem than misuse of the P value.

Department heads, lab chiefs and senior scientists need to upgrade a good working knowledge of statistics from the ‘desirable’ column in job specifications to ‘essential’. But that, in turn, requires universities and funders to recognize the importance of statistics and provide for it. Nature is trying to do its bit and to acknowledge its own shortcomings. Better use of statistics is a central plank of a reproducibility initiative that aims to boost the reliability of the research that we publish (see Nature 496, 398; 2013).

The second piece is a more detailed article by Regina Nuzzo, titled “Scientific Methods: Statistical Errors.” Nuzzo gives a straightforward explanation of the problem:

It turned out that the problem was not in the data or in Motyl’s analyses. It lay in the surprisingly slippery nature of the P value, which is neither as reliable nor as objective as most scientists assume. “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.

For many scientists, this is especially worrying in light of the reproducibility concerns. In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.

P values have always had critics. In their almost nine decades of existence, they have been likened to mosquitoes (annoying and impossible to swat away), the emperor’s new clothes (fraught with obvious problems that everyone ignores) and the tool of a “sterile intellectual rake” who ravishes science but leaves it with no progeny. One researcher suggested rechristening the methodology “statistical hypothesis inference testing”, presumably for the acronym it would yield.

The article goes on to dissect several common distortions that result from researchers’ pursuit of results with low p-values. The point is well taken, and is a reminder for us all to spend some quality time examining our foundations.

I’m with Massimo

Massimo Pigliucci has taken a lot of heat for his criticisms of “New Atheists.” Pigliucci accuses NAs of being overconfident scientists who tread naively onto philosophical turf. I’m inclined to agree with him: NAs are sometimes loudly making basic errors, inappropriately associating their views with “science,” and are sometimes making sophisticated excuses to rationalize their lack of rigor. For those who hope to have correct beliefs, a more cautious approach is warranted.

A few weeks ago, Jerry Coyne published a critique on his blog directed at an essay that Pigliucci published last September, titled “New Atheism and the Scientistic Turn in the Atheism Movement.” Among the various blogs I read, I’ve generally found Pigliucci’s blog, Rationally Speaking, to be one of the most intellectually satisfying. But Coyne disagrees, saying:

I’ve been put off by [Massimo’s] arrogance, attack-dogishness (if you want a strident atheist, look no further than Massimo), and his repeated criticisms of New Atheists because We Don’t Know Enough Philosophy. (If you substituted “Theology” for “Philosophy” there, you’d pretty much have Terry Eagleton).

The parenthetical phrase made me wince, since it alludes to the Courtier’s Reply argument that can be used as a sophistic excuse for lack of rigor. It is also pretty rude to equate the professional discipline of philosophy with that of theology, which Coyne believes is utterly vacuous. Later in the same post, Coyne made another alarming remark:

Note to readers: when you see the word “nuanced” used in criticism of atheism, run!

This sounds juvenile to me. All mature fields have nuances — “minor distinctions; subtlety or fine detail” — and you can’t just barge into an established field without carefully navigating those nuances. But that’s exactly how NA scientists sometimes sound when they make overreaching philosophical pronouncements.


Quasi-fallacies: the courtier’s reply and credential mongering

Look at all that science!

Skeptical arguments generally live in the domain of rhetoric and informal logic. Most informal arguments hinge on the correct identification of logical fallacies. There has been a slow growth in the number of alleged fallacies since the dawn of internet debate. Novel fallacies are usually a re-branding of established fallacies, with the goal of simplified rhetorical clarity. I’m concerned that this also promotes a false confidence that leads to shallow thinking and mis-identification. To paraphrase Occam, “fallacies are not to be multiplied beyond necessity.”

In this post, I’m going to pick on two examples: Prothero’s observations about credential mongering, and Myers’s anti-theology “courtier’s reply” argument that has been referenced by Dawkins and others. I chose these specific examples because they seem to be shaky arguments that can be aimed against each other. I don’t disagree with the conclusions of these arguments in their original context, but these arguments are not able to live independently as authentic fallacies.


Structured reasoning with ProbLog


ProbLog is a simple language for probabilistic reasoning. It allows you to write logic programs that account for uncertainty among the facts and propositions, and it can calculate the probabilities of hypotheses or events based on your model. An online tutorial with a web-based calculator is available here.
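To give a feel for the kind of calculation involved, here is a rough Python sketch (not ProbLog syntax, and not how ProbLog is implemented) that assumes a toy model with two independent, biased coin flips. The fact names, probabilities and helper function are all hypothetical. It enumerates every possible combination of true and false facts, weights each one by its probability, and adds up the weights of the combinations in which a query holds.

```python
from itertools import product


def query_probability(facts, query):
    """Probability that query(world) is true, where each named fact is
    independently true with the probability given in the facts dict."""
    names = list(facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name, value in world.items():
            weight *= facts[name] if value else 1.0 - facts[name]
        if query(world):
            total += weight
    return total


# Hypothetical model: two coins that land heads with these probabilities.
facts = {"heads1": 0.5, "heads2": 0.6}


def some_heads(world):
    """Rule: at least one of the two coins shows heads."""
    return world["heads1"] or world["heads2"]


p_some = query_probability(facts, some_heads)
p_joint = query_probability(facts, lambda w: w["heads1"] and some_heads(w))
p_heads1_given_some = p_joint / p_some  # conditioning on the evidence

if __name__ == "__main__":
    print(p_some, p_heads1_given_some)
```

Brute-force enumeration like this grows exponentially with the number of facts, so it only works for toy models. The last line of the model shows conditioning on evidence as a ratio of two such sums.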

Of all the intellectual techniques available for those who pursue a rational worldview, I believe none is more important than Bayesian inference. By saying this is “most important,” I don’t mean to disregard the fundamentals of logic, mathematics, probability, statistics, etc.; those are all prerequisites to understanding Bayesian techniques. There’s been a lot of talk about Bayesianism recently in skeptical circles, with Richard Carrier drawing a lot of attention for his advocacy of Bayesian methods in history. I appreciate that there are a lot of philosophical arguments surrounding Carrier and others’ use and application of Bayes’ theorem. As with all reasoning techniques, Bayesian reasoning can be used well or it can be used poorly.

There is a difference between science and debate

The goal of science is to build knowledge. The goal of a debate is to win. While these can sometimes look very similar, they are not the same. Online media have created platforms where superficial debates can flourish at the expense of scientific understanding. Skeptical communities who advocate the scientific worldview already promote a widely known collection of rules for debate. They should also consider articulating “best practices” for constructive discussion, recognizing that most discussions do not need to become debates.

Science concerns itself with analysis, observation, demonstration and refutation with the goal of building knowledge. While scientific discussions often involve the juxtaposition of arguments, they rarely proceed in the manner of formal debates unless the occasion calls for an immediate decision or action — as in a peer review decision or the adoption of a research agenda. When debate happens, it requires extensive preparation by the participants and is usually moderated by a decision maker, such as a journal editor, program manager or committee chair.