An observation that lacks a theory

*At first inspection, the scientific method seems to dictate that all accepted facts should rest on concrete observations. Based on this notion, some skeptics are quick to dismiss the scientific legitimacy of mathematically driven research. But there are many examples of important scientific findings that are essentially mathematical theorems with no prospect for physical falsification. A simple class of examples is the family of bounds and asymptotes. In this post I’ll examine a couple of specific examples from information science and engineering.*

There’s an interesting set of articles that recently appeared on Pigliucci’s ScientiaSalon site. The first of these articles, titled “The multiverse as a scientific concept,” defends a mathematically driven hypothesis that has no prospect for empirical validation. This article was authored by Coel Hellier, a professor of astrophysics at Keele University in the UK. The second article, titled “The evidence crisis,” offers a highly skeptical critique of the mathematical research methods used by string theorists, who introduce unobservable physical dimensions (and perhaps other controversial features) in order to produce a self-consistent mathematical theory that unifies the known physical laws. The second article is by Jim Baggott, who holds an Oxford PhD in physical chemistry and has authored several critical books on modern physics.

I am very interested in the relationship between empirical and mathematical research. At just this moment, I have two article revisions in progress on my desktop. The first article takes an almost entirely empirical approach to validating a new heuristic technique; the reviewers are upset that I only have empirically tested results without a single mathematical theorem to back them up. The second article is more theoretically driven, but has limited empirical results; the reviewers complain that the experimental results are inadequate. This is a very typical situation for my field. There is an expectation of balance between theory and experiment. Purely empirical results can easily represent experimental or numerical mistakes, so you should ideally have a predictive theory to cohere with the observations. On the other hand, a strictly theoretical or mathematical result may not have any practical utility, so it should be connected to some empirical demonstration (I am in an engineering field, after all).

Since I’m not a physicist, I won’t weigh in on the merits of string theory or the multiverse. In thinking about these topics, however, it occurs to me that there are a lot of scientific concepts that are purely mathematical results, and are effectively unfalsifiable. I think one such example is Shannon’s Capacity Theorem, which plays a foundational role in information theory. Simply put, Shannon’s Theorem predicts that any communication channel should have a maximum information capacity, i.e. *a maximum rate at which information can be reliably communicated*. There is a whole branch of information science devoted to modeling channels, solving for their capacity, and devising practical *information coding techniques* that push toward those capacity limits. A large amount of work in this field is purely mathematical.

With regard to empiricism, here are the features that I think are interesting about the capacity theorem: First, capacity is a *limit*. It tells us that we can’t achieve higher rates on a given channel. In terms of empirical testing, all we can do is build systems and observe that they don’t beat the capacity limit. That is not really an empirical test of the limit itself. Second, we usually don’t *measure* capacity directly. Instead, we use an assumed model for a hypothetical physical channel, and then apply some mathematical optimization theory to predict or infer the capacity limit.
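To make the second point concrete, here is a minimal Python sketch of what "solving for capacity" looks like on the simplest textbook model, the binary symmetric channel (each bit is flipped with probability `p`). The channel model, the function names, and the brute-force search are my own illustrative choices, not taken from any particular paper. Capacity is the maximum of the mutual information between input and output over all input distributions; for this channel the maximum is known in closed form as one minus the binary entropy of `p`, and the sketch checks that a numerical search agrees.

```python
import math


def binary_entropy(p):
    """Entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_mutual_information(q, p):
    """I(X;Y) in bits when P(X=1) = q and the channel flips bits w.p. p."""
    prob_y1 = q * (1 - p) + (1 - q) * p      # P(Y=1)
    return binary_entropy(prob_y1) - binary_entropy(p)  # H(Y) - H(Y|X)


def bsc_capacity_closed_form(p):
    """Textbook result for the assumed model: C = 1 - H2(p)."""
    return 1.0 - binary_entropy(p)


def bsc_capacity_numeric(p, steps=1000):
    """Brute-force maximisation of I(X;Y) over a grid of input biases q."""
    return max(bsc_mutual_information(i / steps, p) for i in range(steps + 1))


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.11, 0.5):
        print(p, bsc_capacity_closed_form(p), bsc_capacity_numeric(p))
```

Notice that nothing in this calculation is measured. The number that comes out is a property of the assumed model, and it is only as physically meaningful as the model itself.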

Given these two features, I think the capacity theorem — along with a huge body of related research — is not truly testable or falsifiable in the way many empiricists would prefer (and I think that’s okay). Here are some specific points:

- We cannot falsify the proposition that every channel has a capacity. It is a consequence of the same mathematics that grounds all of probability theory and statistics research. In order to falsify the capacity theorem, we would have to discard most other modern scientific practices as well. It is interesting to me that this is a strictly mathematical theorem, yet it forces inescapable conclusions about the physical world.
- If we did observe a system that beats capacity, we would assume that the system was measured improperly or used an incorrect channel model. Nearly every graduate student finds a way to “beat” the capacity limit early in their studies, but this is *always* because they made some mistake in their simulations or measurements. Even if we kept beating capacity and never found any fault in the measurements or models, it still would not suffice to falsify the capacity theorem. It’s a theorem; you can’t contradict it! Not unless you revise the axioms that lie at the theorem’s foundations. Such a revision would be amazing, but it would still have to be consistent with the traditional axioms as a degenerate case, because those axioms generate a system of theories that are overwhelmingly validated across many fields. This revision could therefore not be considered a falsification, but should rather be thought of as an extension to the theory.

The point of this analysis is to show that an unfalsifiable, untestable mathematical result is perfectly fine, if the result arises from a body of theory that is already solidly in place. To add another example, I mentioned earlier that some researchers try to find information coding schemes that achieve the capacity bound. For a long time (about 50 years), the coding community took a quasi-empirical approach to this problem, devising dozens (maybe even hundreds or thousands) of coding schemes and testing them through pure analysis and simulations on different channel models. In the 1990s, several methods were finally discovered that come extremely close to capacity on some of the most important channels. To some researchers, these methods were not good enough, since they only *appear* to approach capacity based on empirical observations. To these researchers, it would be preferable to construct a coding method that is *mathematically proven to exactly achieve capacity*.

In 2009, a method known as Polar Coding appeared, which was rigorously shown to *asymptotically* achieve capacity, i.e. its performance should get better and better as the amount of coded data goes to infinity, and when the amount of data *reaches infinity,* then it should work at a rate equal to capacity. This was hailed as a great advance in coding and information theory, but again the asymptotic claim is not truly verifiable through empirical methods. We can’t measure what happens when the information size reaches infinity. We can only make mathematical projections. Because of this, some researchers I know have quietly criticized the value of polar codes, calling them meaningless from a practical standpoint. I disagree; I value the progress of mathematical insight in concert with empirical research and practical applications.
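For readers who want to see what an asymptotic claim of this kind looks like, here is a small Python sketch of the polarization effect on the binary erasure channel, the case where the recursion is simplest. Assuming an erasure probability `epsilon`, each polarization step turns a channel with erasure probability z into a worse one (2z − z²) and a better one (z²). The function names and the threshold `delta` are hypothetical choices of mine for illustration. The fraction of nearly perfect synthetic channels creeps toward the capacity 1 − `epsilon` as the number of levels grows, but at any finite size it stays short of it.

```python
def polarize_bec(epsilon, levels):
    """Erasure probabilities of the 2**levels synthetic channels obtained by
    recursively polarizing a binary erasure channel with erasure prob. epsilon."""
    z = [epsilon]
    for _ in range(levels):
        nxt = []
        for e in z:
            nxt.append(2 * e - e * e)  # the "worse" synthetic channel
            nxt.append(e * e)          # the "better" synthetic channel
        z = nxt
    return z


def fraction_nearly_perfect(epsilon, levels, delta=1e-3):
    """Fraction of synthetic channels whose erasure probability is below delta."""
    z = polarize_bec(epsilon, levels)
    return sum(1 for e in z if e < delta) / len(z)


if __name__ == "__main__":
    epsilon = 0.5  # assumed channel; its capacity is 1 - epsilon = 0.5
    for levels in (4, 8, 12, 16):
        print(levels, fraction_nearly_perfect(epsilon, levels))
```

Every run of this script is a finite computation, so it can only suggest the trend. The statement that the fraction actually converges to capacity comes from the proof, not from the printout.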

To conclude, I want to offer one further observation about the mathematical system from which these theorems arise. When studying the axiomatic development of probability theory, statistics, and stochastic processes, I was really struck by how little attachment they have to empirical observations. They are mathematical frameworks with a number of fill-in-the-gap places where you specify, for instance, a physically plausible probability distribution (a commenter on Baggott’s article similarly described string theory as a mathematical framework for building theories, rather than a single fully-qualified physical theory). But even the physical probability distributions are frequently replaceable by *a priori* concepts derived, say, from the Bernoulli distribution (i.e. the coin toss process), or the Gaussian distribution under support from the Central Limit Theorem (another purely mathematical result!).

While we like to think that the history of science is a story of theories devised to explain observations (which may be true in some sciences), in many fields the story is partially reversed. The sciences of probability, statistics, and information theory (among many others) developed first from *a priori* mathematical considerations which *defined* the experimental procedures to be used for empirical studies. This history is chronicled in two of my favorite books on scientific history, *The Emergence of Probability* and *The Taming of Chance*, both written by philosopher Ian Hacking (who has authored a number of other interesting books worth examining).

Some may rightly argue that these claims are not totally unfalsifiable, since they are anchored to a theory that could have been independently falsified. The main point of my post, however, is to point out that a purely mathematical exposition can expose novel, very real truths about the physical world — truths that cannot be verified or falsified on their own.