The new (and very old) political responsibility of data scientists

2017-01-31

We still have a responsibility to prevent the ethical misuse of new technologies, as well as to help make their impact on human welfare a positive one. But we now have a more fundamental challenge: to help defend the very concept and practice of the measurement and analysis of quantitative fact.

To be sure, a big part of practicing data science consists of dealing with the multiple issues and limitations we face when trying to observe and understand the world. Data seldom means what its name implies it means; there are qualifications, measurement biases, unclear assumptions, etc. And that's even before we engage in the useful but tricky work of making inferences from that data.

But the end result of what we do — and not only, or even mainly us, for this collective work of observation and analysis is one of the common threads and foundations of civilization — is usually a pretty good guess, and it's always better than closing your eyes and picking whatever number provides you with an excuse to do what you'd rather do. Deliberately messing with the measurement of physical, economic, or social data is a lethal attack on democratic practices, because it makes it impossible for citizens to evaluate government behavior. Defending the impossibility of objective measurement (as opposed to acknowledging and adapting to the many difficulties involved) is simply to give up on any form of societal organization different from mystical authoritarianism.

Neither attitude is new, but both have gained dramatically in visibility and influence during the last year. This adds to the existing ethical responsibilities of our profession a new one, unavoidably in tension with them. We not only need to fight against over-reliance on algorithmic governance driven by biased data (e.g. predicting behavior from records compiled by historically biased organizations) or the unethical commercial and political usage of collected information, but also, paradoxically, we need to defend and collaborate in the use of data-driven governance based on best-effort data and models.

There are forms of tyranny based on the systematic deployment of ubiquitous algorithmic technologies, and there are forms of obscurantism based on the use of cargo cult pseudo-science. But there are also forms of tyranny and obscurantism predicated on the deliberate corruption of data or even the negation of the very possibility of collecting it, and it's part of our job to resist them.

Economists and statisticians in Argentina, when previous governments deliberately altered some national statistics and stopped collecting others, rose to the challenge by providing parallel, and much more widely believed, numbers (among the first, the journalist and economist — a combination of skills more necessary with every passing year — Sebastián Campanario). Theirs weren't the kind of arbitrary statements that are frequently part of political discourse, nor did they reject official statistics because they didn't match ideological preconceptions or it was politically convenient to do so. Official statistics were technically wrong in their process of measurement and analysis, and for any society that aspires to meaningful self-government the soundness and availability of statistics about itself are an absolute necessity.

Data scientists are increasingly involved in the process of collection and analysis of socially relevant metrics, both in the private and the public sectors. We need to consistently refuse to do it wrong, and to do our best to do it correctly even, and especially, when we suspect other people are choosing not to. Nowcasting, inferring the present from the available information, can be as much of a challenge, and as important, as predicting the future. The fact that we might end up having to do it without the assumption of possibly flawed but honest data is a problem we have already begun to work on in other contexts. Some of the earliest applications of modern data-driven models in finance, after all, were in fraud detection.
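The post doesn't prescribe any particular method, but one classic fraud-detection sanity check that has been applied to official statistics is Benford's law: in many naturally occurring datasets, the leading digit 1 appears far more often than 9, and fabricated figures often fail to match that distribution. A minimal sketch, purely illustrative (the function name and the deviation measure are my own choices, not anything from the source):

```python
import math

def benford_deviation(values):
    """Total absolute deviation between the observed first-digit
    frequencies of `values` and the proportions predicted by
    Benford's law. A larger result suggests the figures deserve a
    closer look; it is a screening heuristic, not proof of tampering.
    """
    # Extract the first significant digit of each nonzero value.
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    n = len(digits)
    deviation = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford's expected proportion
        observed = digits.count(d) / n
        deviation += abs(observed - expected)
    return deviation
```

Growth-driven series (populations, prices under inflation, powers of a constant) tend to score low, while uniformly invented numbers score noticeably higher; in practice one would follow up a high score with proper statistical tests rather than treat it as a verdict.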

We are all potentially climate scientists now, massive observational efforts to be refuted based on anecdotes, disingenuous visualizations to be touted as definitive proof, and eventually the very possibility of quantitative understanding to be violently mocked. We (still) have to make sure the economic and social impact of things like ubiquitous predictive surveillance and technology-driven mass unemployment are managed in positive ways, but this new responsibility isn't one we can afford to ignore.