Escaping the traps of Facebook, Google and other centralized data hoards

A furor erupted this week over a research project conducted by Facebook in which the company manipulated the feeds of over 600,000 users in order to measure their emotional responses. To many, this sounds like a trivial intrusion, perhaps on par with the insertion of advertising content. But several scientists have argued that it constitutes a serious breach of established research ethics, namely the requirement for informed consent. In the world of scientific research, the bar for informed consent is quite high. Facebook chose to rely on its Terms of Use as a proxy for informed consent, but that is unacceptable and would establish a dangerous precedent for eroding the rights of future study participants. An author at the Skepchick network contributed this critique of Facebook’s behavior:

What’s unethical about this research is that it doesn’t appear that Facebook actually obtained informed consent. The claim in the paper is that the very vague blanket data use policy constitutes informed consent, but if we look at the typical requirements for obtaining informed consent, it becomes very clear that their policy falls way short. The typical requirements for informed consent include:

  • Respect for the autonomy of individual research participants
  • Fully explain the purposes of the research that people are agreeing to participate in in clear, jargonless language that is easy to understand
  • Explain the expected duration of the study
  • Describe the procedures that will happen during the study
  • Identify any experimental protocols that may be used
  • Describe any potential risks and benefits for participation
  • Describe how confidentiality will be maintained
  • A statement acknowledging that participation is completely voluntary, that a participant may withdraw participation at any time for any or no reason, and that any decision not to continue participating will incur no loss of benefits or other penalty.

Of course this level of detail cannot be covered by blanket “Terms of Use” that apply to all users of a general-purpose communication platform. Slate’s Katy Waldman agrees that Facebook’s study was unethical:

Here is the only mention of “informed consent” in the paper: The research “was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.”

That is not how most social scientists define informed consent.

Here is the relevant section of Facebook’s data use policy: “For example, in addition to helping people see and find things that you do and share, we may use the information we receive about you … for internal operations, including troubleshooting, data analysis, testing, research and service improvement.”

So there is a vague mention of “research” in the fine print that one agrees to by signing up for Facebook. As bioethicist Arthur Caplan told me, however, it is worth asking whether this lawyerly disclosure is really sufficient to warn people that “their Facebook accounts may be fair game for every social scientist on the planet.”

Of course Facebook is no stranger to deceptive and unethical behavior. We may recall their 2012 settlement with the Federal Trade Commission, which charged “that Facebook deceived consumers by telling them they could keep their information on Facebook private, and then repeatedly allowing it to be shared and made public.”

The problem is simple: Facebook is a centralized service that aggregates intimate data on millions of users. They need to find ways to profit from that data (our data), and we have little control over how their activity might disadvantage or manipulate the users. Their monetization strategies go beyond their already troubling project to facilitate targeted ads from third party apps, apps that you might assume have no relationship to your Facebook activities. Facebook also manages the identity and contact networks of those users, making it difficult to leave the platform without becoming disconnected from your social network. It is a trap. Last week a Metro editorial claimed that it’s getting worse, and recommended that we all quit “cold turkey.” Some users have migrated over to Google services as an escape, but Google has faced similar FTC charges that reveal it isn’t any better. So Google is just another mask on the same fundamental problems.

So what is the fix? I’m putting my money on The Red Matrix, a solution that supports distributed identity, decentralized social networking, content rights management and cloud data services.


The core idea behind the Red Matrix is to provide an open specification and protocol for delivering contemporary internet services in a portable way, so that users are not tied to a single content provider. The underlying protocol, called “zot,” is designed to support a mix of public and privately shared content, providing encryption and separating a user’s identity from their service provider.

While still in its early stages, the Red Matrix provides core features comparable to WordPress, Drupal, Dropbox, Evernote and of course social networking capabilities. It is hard to summarize the possibilities of this emerging platform. I’m still discovering new ways to leverage the platform for things ranging from personal note management to blogging. Although the Red Matrix is small, it is an open source project with a fanatical base of users and developers, which makes it likely to endure and grow.

This seems like a good time to announce the Red Matrix companion channel for this site: This channel acts as a “stream of consciousness” for material related to this blog, containing supplemental information, technical posts, short comments, reposts of news items, and other miscellanea. The primary WordPress site will be reserved for more detailed posts. Any readers are welcome to comment or otherwise interact by joining the Red Matrix at my server or one of the other public servers in the Red Matrix network.

Non-testable facts are commonplace in mathematically driven science

Observation that lacks a theory

At first inspection, the scientific method seems to dictate that all accepted facts should rest on concrete observations. Based on this notion, some skeptics are quick to dismiss the scientific legitimacy of mathematically driven research. But there are many examples of important scientific findings that are essentially mathematical theorems with no prospect for physical falsification. A simple class of examples is the family of bounds and asymptotes. In this post I’ll examine a couple of specific examples from information science and engineering.

There’s an interesting set of articles that recently appeared on Pigliucci’s ScientiaSalon site. The first of these articles, titled “The multiverse as a scientific concept,” defends a mathematically-driven hypothesis that has no prospect for empirical validation. This article was authored by Coel Hellier, a professor of astrophysics at Keele University in the UK. The second article, titled “The evidence crisis,” offers a highly skeptical critique of the mathematical research methods used by string theorists, who introduce unobservable physical dimensions (and perhaps other controversial features) in order to produce a self-consistent mathematical theory that unifies the known physical laws. The second article is by Jim Baggott, who holds an Oxford PhD in physical chemistry, and has authored some critical books on modern physics, like this one.

I am very interested in the relationship between empirical and mathematical research. At just this moment, I have two article revisions in progress on my desktop. The first article provides an almost entirely empirical approach to validate a new heuristic technique; the reviewers are upset that I only have empirically tested results without a single mathematical theorem to back them up. The second article is more theoretically driven, but has limited empirical results; the reviewers complain that the experimental results are inadequate. This is a very typical situation for my field. There is an expectation of balance between theory and experiment. Purely empirical results can easily represent experimental or numerical mistakes, so you should ideally have a predictive theory to cohere with the observations. On the other hand, a strictly theoretical or mathematical result may not have any practical utility, so should be connected to some empirical demonstration (I am in an engineering field, after all).

Since I’m not a physicist, I won’t weigh in on the merits of string theory or the multiverse. In thinking about these topics, however, it occurs to me that there are a lot of scientific concepts that are purely mathematical results, and are effectively unfalsifiable. I think one such example is Shannon’s Capacity Theorem, which plays a foundational role in information theory. Simply put, Shannon’s Theorem predicts that any communication channel should have a maximum information capacity, i.e. a maximum rate at which information can be reliably communicated. There is a whole branch of information science devoted to modeling channels, solving for their capacity, and devising practical information coding techniques that push toward those capacity limits. A large amount of work in this field is purely mathematical.

With regard to empiricism, here are the features that I think are interesting about the capacity theorem: First, capacity is a limit. It tells us that we can’t achieve higher rates on a given channel. In terms of empirical testing, all we can do is build systems and observe that they don’t beat the capacity limit. That is not really an empirical test of the limit itself. Second, we usually don’t measure capacity directly. Instead, we use an assumed model for a hypothetical physical channel, and then apply some mathematical optimization theory to predict or infer the capacity limit.
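To make the second feature concrete, here is a minimal sketch of what "inferring capacity from an assumed model" looks like for two textbook channels: the binary symmetric channel, with capacity 1 - H(p), and the real-valued Gaussian-noise channel, with capacity 0.5 * log2(1 + SNR). The function names are my own, chosen only for illustration. Notice that nothing here is measured. Every number comes from a formula applied to a hypothetical channel.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(crossover: float) -> float:
    """Capacity (bits per channel use) of a binary symmetric channel
    that flips each transmitted bit with probability `crossover`."""
    return 1.0 - binary_entropy(crossover)


def awgn_capacity(snr_linear: float) -> float:
    """Capacity (bits per real channel use) of an additive Gaussian-noise
    channel at the given signal-to-noise ratio (linear scale, not dB)."""
    return 0.5 * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.11, 0.5):
        print(f"BSC  crossover={p:<4}  capacity={bsc_capacity(p):.4f} bits/use")
    for snr_db in (0, 10, 20):
        snr = 10 ** (snr_db / 10)
        print(f"AWGN SNR={snr_db:>2} dB     capacity={awgn_capacity(snr):.4f} bits/use")
```

The only empirical question left is whether the assumed model (a fixed flip probability, Gaussian noise) describes the physical channel. The limit itself is pure mathematics.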

Given these two features, I think the capacity theorem — along with a huge body of related research — is not truly testable or falsifiable in the way many empiricists would prefer (and I think that’s okay). Here are some specific points:

  1. We cannot falsify the proposition that every channel has a capacity. It is a consequence of the same mathematics that grounds all of probability theory and statistics research. In order to falsify the capacity theorem, we have to discard most other modern scientific practices as well. It is interesting to me that this is a strictly mathematical theorem, yet it forces inescapable conclusions about the physical world.
  2. If we did observe a system that beats capacity, we would assume that the system was measured improperly or used an incorrect channel model. Nearly every graduate student finds a way to “beat” the capacity limit early in their studies, but this is always because they made some mistake in their simulations or measurements. Even if we keep beating capacity and never find any fault in the measurements or models, it still would not suffice to falsify the capacity theorem. It’s a theorem — you can’t contradict it! Not unless you revise the axioms that lie at the theorem’s foundations. Such a revision would be amazing, but it would still have to be consistent with the traditional axioms as a degenerate case, because those axioms generate a system of theories that are overwhelmingly validated across many fields. This revision could therefore not be considered a falsification, but should rather be thought of as an extension to the theory.

The point of this analysis is to show that an unfalsifiable, untestable mathematical result is perfectly fine, if the result arises from a body of theory that is already solidly in place. To add another example, I mentioned earlier that some researchers try to find information coding schemes that achieve the capacity bound. For a long time (about 50 years), the coding community took a quasi-empirical approach to this problem, devising dozens (maybe even hundreds or thousands) of coding schemes and testing them through pure analysis and simulations on different channel models. In the 1990s, several methods were finally discovered that come extremely close to capacity on some of the most important channels. To some researchers, these methods were not good enough, since they only appear to approach capacity based on empirical observations. To these researchers, it would be preferable to construct a coding method that is mathematically proven to exactly achieve capacity.

In 2009, a method known as Polar Coding appeared, which was rigorously shown to asymptotically achieve capacity, i.e. its performance should get better and better as the amount of coded data grows, and in the limit as the amount of data goes to infinity, it should work at a rate equal to capacity. This was hailed as a great advance in coding and information theory, but again the asymptotic claim is not truly verifiable through empirical methods. We can’t measure what happens when the information size reaches infinity. We can only make mathematical projections. Because of this, some researchers I know have quietly criticized the value of polar codes, calling them meaningless from a practical standpoint. I disagree; I value the progress of mathematical insight in concert with empirical research and practical applications.
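The asymptotic flavor of the polar coding result can be sketched numerically on the simplest case I know, the binary erasure channel. One polarization step turns two copies of a channel with erasure probability z into a worse channel with erasure probability 2z - z² and a better one with z². The sketch below is only an illustration under that assumption. The function names and the "nearly perfect" threshold are my own choices, not part of any standard.

```python
def polarize_bec(erasure_prob: float, levels: int) -> list[float]:
    """Erasure probabilities of the 2**levels synthetic channels obtained by
    recursively polarizing a binary erasure channel."""
    channels = [erasure_prob]
    for _ in range(levels):
        next_channels = []
        for z in channels:
            next_channels.append(2 * z - z * z)  # degraded ("minus") channel
            next_channels.append(z * z)          # upgraded ("plus") channel
        channels = next_channels
    return channels


def fraction_nearly_perfect(channels: list[float], threshold: float = 1e-3) -> float:
    """Fraction of synthetic channels whose erasure probability is below
    an arbitrary illustrative threshold."""
    return sum(z < threshold for z in channels) / len(channels)


if __name__ == "__main__":
    eps = 0.5  # the capacity of this erasure channel is 1 - eps = 0.5
    for levels in (4, 8, 12, 16):
        chans = polarize_bec(eps, levels)
        print(levels, len(chans), round(fraction_nearly_perfect(chans), 4))
```

As the number of levels grows, the fraction of nearly perfect channels should creep toward the capacity, but it never arrives at any finite block length. That is exactly the sense in which the theorem can be projected but not measured.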

To conclude, I want to offer one further observation about the mathematical system from which these theorems arise. When studying the axiomatic development of probability theory, statistics, and stochastic processes, I was really struck by how little attachment they have to empirical observations. They are mathematical frameworks with a number of fill-in-the-gap places where you specify, for instance, a physically plausible probability distribution (a commenter on Baggott’s article similarly described string theory as a mathematical framework for building theories, rather than a single fully-qualified physical theory). But even the physical probability distributions are frequently replaceable by a priori concepts derived, say, from the Bernoulli distribution (i.e. the coin toss process), or the Gaussian distribution under support from the Central Limit Theorem (another purely mathematical result!).

While we like to think that the history of science is a story of theories devised to explain observations (which may be true in some sciences), in many fields the story is partially reversed. The sciences of probability, statistics, and information theory (among many others) developed first from a priori mathematical considerations which defined the experimental procedures to be used for empirical studies. This history is chronicled in two of my favorite books on scientific history — The Emergence of Probability and The Taming of Chance — both written by philosopher Ian Hacking (who has authored a number of other interesting books worth examining).

Some may rightly argue that these claims are not totally unfalsifiable, since they are anchored to a theory that could have been independently falsified. The main point of my post, however, is to point out that a purely mathematical exposition can expose novel, very real truths about the physical world — truths that cannot be verified or falsified on their own.

Why school led prayer is wrong

This post comes from my perspective as an educator, and also as a frequent unwilling participant in publicly led prayers. There are two simple reasons why institutionally directed prayer is wrong: equal rights and individual privacy. We often hear about the equal rights aspect, but I think the invasion of privacy is more powerful and more troubling. When Kevin Lowery, principal of Lebanon High School in Missouri, led a prayer at the school’s graduation ceremony, he did more than alienate a few non-Christian students and parents. He faced those students and their parents with an awkward choice: they could either bow their heads in grudging compliance with Mr. Lowery’s religious exercise and join in the subsequent applause, or they could stand visibly apart from the crowd and thereby out themselves as detractors. This presents the non-Christians (or non-participating Christians) with a crisis in which they must either go against their consciences, or be spotlighted as outsiders. That is coercive and it is wrong.

Suppose that instead of praying, Mr. Lowery had asked all the Christians in the room to stand. Suppose he had asked all the non-Christians to stand. How about just the Muslims or the Jews? How about all the Atheists? These would be pretty intrusive requests, but leading a prayer from the pulpit is no different. It is extremely easy to sort out who is a believer from who isn’t by watching what people do when the praying starts. It may be the case that many evangelical Christians enjoy having their religious views exposed as publicly as possible, but most people prefer to keep their views and affiliations quiet, and they shouldn’t need to give any reason for exercising their own privacy.

Atheist advocate Jerry Coyne has been heavily pursuing this matter at Lebanon High School, and has received many responses from people on “both” sides of the issue. One of the first responses Coyne received was from a school board member, Mr. Kim Light, who asked: “My question is whether or not this is funded and/or supported by the University of Chicago and is this YouTube viewing conducted using university resources and conducted during time that could be used for instructional or research time.”

Mr. Light’s “question” was in fact a thinly veiled threat, probing at the blurry line that separates academic freedom from (potentially) objectionable outside interests. Mr. Light’s threatening stance is not uncommon coming from those who want to constrain the speech of university professors. Here is my response to that kind of threat: I am knowingly and deliberately writing this post during “business hours” from my university office, using my university computer while sitting on my university chair. The purpose of this post is to give a message to any of my students or colleagues who may be reading: I respect your religious privacy and autonomy. I will not ever force you to publicly expose your religious views in any university function. I will not ever impose a religious exercise on you or on the members of any captive audience. If you have special needs associated with your religious views, I will make an effort (within reason) to discreetly accommodate those needs.

I feel it is important for all educators to acknowledge that their students and peers comprise a diverse population, and we have a professional obligation to be neutral in all respects except for scholarly performance and professional conduct. If a student wants to approach me about religion, then we can have a free discussion. If a student wants to say a quiet personal prayer before an exam, that’s their prerogative (luckily I’m an Atheist or else I might consider that to be cheating). But I understand it is not my place to corner people into a religiously themed exercise.

This isn’t a hard concept at all.

The Noble Jerry Coyne

Jerry Coyne, a prominent “New Atheist” and author of the popular book Why Evolution is True, is seriously immersing himself in theology by studying Hart’s The Experience of God: Being, Consciousness, Bliss. Coyne is in the vanguard of “evangelizing” Atheists, who are often criticized for not sufficiently understanding the positions of sophisticated, modern theologians. By taking time to study and respond to these ivory tower materials, Coyne shows that he’s a genuine class act. I want to applaud his decision, as it presents a positive contrast to the Courtier’s Reply position (in essence, “I don’t have to study something if I already know it’s false”) that has been creeping like a nasty weed through Atheist circles. It’s nice to see someone taking the high road of intellectual engagement.



Is this scientism?


Is there science happening here? I need a biologist to tell me.

PZ Myers and Laurence Moran say “Physicians and engineers are not scientists” (a point argued with, I think, malicious intent). Meanwhile Jerry Coyne and others think that car mechanics and plumbers are doing “science, broadly construed.” Sam Harris and Steven Pinker suggest (or at least imply) that scientists will ultimately overtake the humanities; Massimo Pigliucci has strenuously critiqued this latter view, calling it “scientism.”

This debate revolves around a basic rhetorical fallacy: the claim that “scientists” have a unique legitimacy attached to their beliefs, together with a claim of demarcational privilege to decide who is and isn’t a scientist. The arational imposition of intellectual privilege is, I think, the essence of the fuzzily defined “scientism” that non-scientists find threatening. It’s threatening because it is a threat. It attacks the legitimacy of entire classes of scholarship, and the Myers/Moran attack on engineers is one example.

This style of argument is used to de-legitimize a perceived opponent, or (as in Pigliucci’s case) to defend the legitimacy of his own profession. Such defenses are, according to Coyne, “defensive” — check out Coyne’s reaction to a historian who proposed that scientists might benefit from studying history. To paraphrase his position: we (scientists) don’t need you (non-scientists), you need us. On this level, the debate has nothing to do with science or the quality of ideas; instead it is a purely sophistic (and egoistic) effort to disqualify others.

I’ll pause now to remind the reader that I’m an engineer. Speaking as an engineer, I think there is a clear distinction between engineering and science: engineers have to actually get things right or they may suffer immediate economic, functional or ethical consequences. Scientists, on the other hand, have to pass their work through a process of critical review by their peers. The latter process is important to the long-term filtering of ideas, but peer review doesn’t have the same falsifying power as a collapsing bridge, an exploding boiler, a crashing train, a killer radiation leak or a misfired missile. So if we’re talking about legitimacy, I’d sooner trust the beliefs of a randomly selected engineer over those of a random scientist.

But Moran and Myers think engineers are something less. They are annoyed by Ken Ham’s claim that creationists can be successful in scientific careers, something that was argued during the Bill Nye / Ken Ham debate. They are so annoyed by the creationists that they are willing to degrade entire classes of scholars in order to win a fake point.


The trouble with p-values

An annoying T-shirt

Nature has two pieces this week on how p-values are commonly misused to distort scientific results. I’ve often been annoyed by casual statements like “what’s your p-value?” which is sometimes dropped as a quasi-scientific rebuttal in online discussions. Nature’s editors encourage us all to dive a little deeper into the foundations of statistical methods.

The first piece, an editorial called “Number Crunch,” issues a call to action for practicing scientists and educators:

The first step towards solving a problem is to acknowledge it. In this spirit, Nature urges all scientists to read the News Feature and its summary of the problems of the P value, if only to refresh their memories.

The second step is more difficult, because it involves finding a solution. Too many researchers have an incomplete or outdated sense of what is necessary in statistics; this is a broader problem than misuse of the P value.

Department heads, lab chiefs and senior scientists need to upgrade a good working knowledge of statistics from the ‘desirable’ column in job specifications to ‘essential’. But that, in turn, requires universities and funders to recognize the importance of statistics and provide for it. Nature is trying to do its bit and to acknowledge its own shortcomings. Better use of statistics is a central plank of a reproducibility initiative that aims to boost the reliability of the research that we publish (see Nature 496, 398; 2013).

The second piece is a more detailed article by Regina Nuzzo, titled “Scientific Methods: Statistical Errors.” Nuzzo gives a straightforward explanation of the problem:

It turned out that the problem was not in the data or in Motyl’s analyses. It lay in the surprisingly slippery nature of the P value, which is neither as reliable nor as objective as most scientists assume. “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.

For many scientists, this is especially worrying in light of the reproducibility concerns. In 2005, epidemiologist John Ioannidis of Stanford University in California suggested that most published findings are false; since then, a string of high-profile replication problems has forced scientists to rethink how they evaluate results.

P values have always had critics. In their almost nine decades of existence, they have been likened to mosquitoes (annoying and impossible to swat away), the emperor’s new clothes (fraught with obvious problems that everyone ignores) and the tool of a “sterile intellectual rake” who ravishes science but leaves it with no progeny. One researcher suggested rechristening the methodology “statistical hypothesis inference testing”, presumably for the acronym it would yield.

The article goes on to dissect several common distortions that result from researchers’ pursuit of results with low p-values. The point is well taken, and is a reminder for us all to spend some quality time examining our foundations.
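One distortion of this kind is easy to simulate. The scenario is my own illustration and is not taken from Nuzzo's article: a researcher measures many outcomes that are all pure noise and reports only the smallest p-value. The sketch assumes a two-sided z-test with known unit variance, and every function name is hypothetical.

```python
import math
import random


def two_sided_p_value(sample, sigma=1.0):
    """p-value of a z-test of the hypothesis 'the true mean is zero',
    assuming the standard deviation sigma is known."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


def smallest_p_of_many(rng, num_outcomes, sample_size):
    """Measure several pure-noise outcomes and keep only the best-looking p."""
    return min(
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(num_outcomes)
    )


def false_positive_rate(num_outcomes, trials=2000, sample_size=30, seed=0):
    """Fraction of no-effect studies that still report p < 0.05."""
    rng = random.Random(seed)
    hits = sum(
        smallest_p_of_many(rng, num_outcomes, sample_size) < 0.05
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"outcomes tested={k:>2}  studies with p<0.05: {false_positive_rate(k):.3f}")
```

With a single outcome the rate should sit near the nominal 5%. With twenty independent outcomes, theory predicts roughly 1 - 0.95^20 ≈ 0.64, even though no real effect exists anywhere in the data.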

I’m with Massimo

Massimo Pigliucci has taken a lot of heat for his criticisms of “new Atheists.” Pigliucci accuses NAs of being overconfident scientists who tread naively onto philosophical turf. I’m inclined to agree with him: NAs are sometimes loudly making basic errors, inappropriately associating their views with “science,” and are sometimes making sophisticated excuses to rationalize their lack of rigor. For those who hope to have correct beliefs, a more cautious approach is warranted.

A few weeks ago, Jerry Coyne published a critique on his blog directed at an essay that Pigliucci published last September, titled “New Atheism and the Scientistic Turn in the Atheism Movement.” Among the various blogs I read, I’ve generally found Pigliucci’s blog, Rationally Speaking, to be one of the most intellectually satisfying. But Coyne disagrees, saying:

I’ve been put off by [Massimo’s] arrogance, attack-dogishness (if you want a strident atheist, look no further than Massimo), and his repeated criticisms of New Atheists because We Don’t Know Enough Philosophy. (If you substituted “Theology” for “Philosophy” there, you’d pretty much have Terry Eagleton).

The parenthetical phrase made me wince, since it alludes to the Courtier’s Reply argument that can be used as a sophistic excuse for lack of rigor. It is also pretty rude to equate the professional discipline of philosophy with that of theology, which Coyne believes is utterly vacuous. Later in the same post, Coyne made another alarming remark:

Note to readers: when you see the word “nuanced” used in criticism of atheism, run!

This sounds juvenile to me. All mature fields have nuances — “minor distinctions; subtlety or fine detail” — and you can’t just barge into an established field without carefully navigating those nuances. But that’s exactly how NA scientists sometimes sound when they make overreaching philosophical pronouncements.
