Russia 1, Data Science 0

Both sides in the 2016 election had access to the best statistical models and databases money could buy. If Russian influence (which as far as we know involved little more than the well-timed dumping of not exactly military grade hacked information, plus some Twitter bots and Facebook ads) was at any level decisive, then it's a slap in the face for data-driven campaigning, which apparently hasn't rendered obsolete the old art of manipulating cognitive blind spots in media coverage and political habits ("they used Facebook and Twitter" explains nothing: so did all US candidates, in theory with better data and technology, and so do small Etsy shops; it should've made no difference).

The lessons, I suspect, are three:

  • The theory and practice of data-driven campaigning is still very immature. Algorithmize the Breitbart-Russia-Assange-Fox News maneuver, and you'll have something far ahead of the state of the art. (I believe this will come from more sophisticated psychological modeling, rather than more data.)
  • If a country's political process is as vulnerable as the US's was to what the Russians did, then how will it do against an external actor properly leveraging the kind of tools you can develop at the intersection of obsessive data collection, an extremely Internet-focused government, cutting-edge AI, and an assertive foreign policy?
  • You know, like China. Hypothetically.

Whenever this happens, the proper reaction isn't to get angry, but to recognize that a political system proved embarrassingly vulnerable, and to take measures to improve it. That said, this is slightly less likely to happen when those informational vulnerabilities are also used by the same local actors that are partially responsible for fixing them.

(As an aside, "our under-investment in security / deliberate exploiting of regulatory gaps we lobbied for / cover-up of known vulnerabilities would've been fine if not for those dastardly hackers" is also the default response of large companies to this kind of thing; this isn't a coincidence, but a shared ethos.)

Probability-as-logic vs probability-as-strategy vs probability-as-measure-theory

Attention conservation notice: Elementary (and possibly not-even-right) if you have the relevant mathematical background, pointless if you don't. Written to help me clarify to myself a moment of categorical (pun not intended) confusion.

What's a possible way to understand the relationship between probability as the (by Cox) extension of classical logic, probability as an optimal way to make decisions, and probability in the frequentist usage? Not in any deep philosophical sense, just in terms of pragmatics.

I like to begin from the Bayes/Jaynes/Cox view: if you take classical logic as valid (which I do in daily life) and want to extend it in a consistent way to continuous logic values (which I also do), then you end up with continuous logic/certainty values we unfortunately call probability due to historical reasons.

Perhaps surprisingly, its relationship with frequentist probability isn't necessarily contentious. You can take the Kolmogorov axioms as, roughly speaking, helping you define a sort of functor (awfully, based on shared notation and vocabulary, an observation that made me shudder a bit — it's almost magical thinking) between the place where you do probability-as-logic and a place where you can exploit the machinery of measure theory. This is a nice place to be when you have to deal with an asymptotically large number of propositions; possibly the Probability Wars were driven mostly by doing this so implicitly that we aren't clear about what we're putting *into* this machinery, and then, because the notation is similar, forgetting to explicitly go back to the world of propositions, which is where we want to be once we're done with the calculations.

What made me stare a bit at the wall is the other correspondence: Let's say that for some proposition $A$, $P[A] > P[\neg A]$ in the Bayesian sense (we're assuming the law of excluded middle, etc; this is about a different kind of confusion). Why should I bet that $A$? In other words, why the relationship between probability-as-certainty and probability-as-strategy? You can define probability based on a decision theoretic point of view (and historically speaking, that's how it was first thought of), but why the correspondence between those two conceptually different formulations?

It's a silly non-question with a silly non-answer, but I want to record it because it wasn't the first thing I thought of. I began by thinking about $P[\text{win} \mid (P[A] > P[\neg A]) \wedge \text{bet on } A]$, but that leads to a lot of circularity. It turns out that the forehead-smacking way to do it is simply to observe that "the best strategy is to bet on A" is true iff A. This isn't circular if we haven't yet assumed that probability-as-strategy is the same as probability-as-logic; rather, it's a non-tautological consequence of the assumed psychology and sociology of what "bet on" means: I should've done whatever ended up working, regardless of what the numbers told me (I'll try to feel less upset the next time somebody tells me that).

But then, in the sense of probability-as-logic, $P[\text{the best strategy is to bet on } A] = P[A]$ by substituting propositions (and hence without resorting to any frequentist assumption about repeated trials and the long term) so, generally speaking, you end up with probability-as-strategy being part of probability-as-logic. I'm likely counting angels dancing on infinitesimals here, but it's something that felt less clear to me earlier today: probability-as-strategy is probability-as-logic, you're just thinking about propositions about strategies, which, confusingly, in the simplest cases end up having the same numerical certainty values as the propositions the strategies are about. But those aren't the same propositions, although I'm not entirely sure that in practice, given the fundamentally intuitive nature of "bet on" (insert here very handwavy argument from evolutionary psychology about how we all descend from organisms who got this well enough not to die before reproducing), you get in trouble by not taking this into account.
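A toy way to see the substitution, as a minimal sketch rather than anything rigorous: the worlds, their credences, and the even-odds payoff below are all made-up assumptions for illustration. The proposition "the best strategy is to bet on A" is defined through payoffs, not through A directly, and its credence is then computed by summing over worlds, with no repeated trials involved.

```python
from fractions import Fraction

# Hypothetical credences over mutually exclusive worlds: (credence, is A true?).
WORLDS = [
    (Fraction(3, 10), True),
    (Fraction(3, 10), True),
    (Fraction(1, 4), False),
    (Fraction(3, 20), False),
]

def payoff(bet_on_a, a_is_true):
    """Assumed even-odds bet: win 1 if the bet matches the truth of A, else lose 1."""
    return 1 if bet_on_a == a_is_true else -1

def best_bet(a_is_true):
    """The bet that 'ended up working' in a given world, found by comparing payoffs."""
    return max((True, False), key=lambda bet: payoff(bet, a_is_true))

def credence(proposition):
    """Total credence of the worlds in which the proposition holds."""
    return sum((p for p, a in WORLDS if proposition(a)), Fraction(0))

p_a = credence(lambda a: a)
p_not_a = credence(lambda a: not a)
# A different proposition, about strategies rather than about A itself.
p_best_is_a = credence(lambda a: best_bet(a) is True)

print(p_a, p_not_a, p_best_is_a)
```

The two propositions are defined differently in the code, yet in this simplest case they pick out the same worlds, so their credences coincide. With asymmetric payoffs or odds that need not stay true, which is the "those aren't the same propositions" caveat.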

Tom Sawyer, Bilingual

Following a friend's suggestion, here's a comparison of phrase length distributions between the English and German versions of The Adventures of Tom Sawyer:

[Figure: Tom Sawyer Phrase Lengths]
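For reference, a comparison like this can be sketched in a few lines. This is not the code behind the plot; in particular, treating a "phrase" as a run of words between punctuation marks is my assumption here, and the two sample strings are toy stand-ins, not text from either edition.

```python
import re
from collections import Counter

# Assumption: a "phrase" is whatever sits between punctuation marks.
PHRASE_BREAK = re.compile(r"[.,;:!?]+")
WORD = re.compile(r"\w+")

def phrase_lengths(text):
    """Return the number of words in each non-empty phrase of the text."""
    lengths = []
    for chunk in PHRASE_BREAK.split(text):
        n_words = len(WORD.findall(chunk))
        if n_words:
            lengths.append(n_words)
    return lengths

def length_distribution(text):
    """Map each phrase length to its relative frequency in the text."""
    lengths = phrase_lengths(text)
    counts = Counter(lengths)
    return {n: counts[n] / len(lengths) for n in sorted(counts)}

# Toy stand-ins; replace with the full English and German texts.
toy_english = "The boy ran, and the dog followed. Nobody saw them!"
toy_german = "Der Junge rannte, und der Hund folgte ihm. Niemand sah sie!"

for name, text in [("English", toy_english), ("German", toy_german)]:
    print(name, length_distribution(text))
```

A different notion of phrase (full sentences, or clauses from a parser) would change the shape of the distributions, so the splitting rule is worth stating alongside any plot.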

It could be interesting to parametrize these distributions and try to characterize languages in terms of some sort of encoding mechanism (e.g., assume phrase semantics are drawn randomly from a language-independent distribution and renderings in specific languages are mappings from that distribution to sequences of words, and handwave about what cost metric the mapping is trying to minimize).
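As a first stab at the parametrization step, here is a hedged sketch that summarizes a list of phrase lengths with two numbers by fitting a log-normal. The log-normal family is purely my assumption for illustration; nothing above establishes that it fits these distributions well, and the sample lengths are made up.

```python
import math
import statistics

def fit_lognormal(lengths):
    """Maximum-likelihood log-normal fit: mean and std of the log lengths."""
    logs = [math.log(n) for n in lengths]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma

# Made-up phrase lengths (in words) standing in for two languages.
toy_lengths = {
    "language_a": [3, 4, 3, 7, 5, 2, 9],
    "language_b": [3, 5, 3, 8, 6, 2, 11],
}

for name, lengths in toy_lengths.items():
    mu, sigma = fit_lognormal(lengths)
    print(f"{name}: mu={mu:.3f}, sigma={sigma:.3f}, median={math.exp(mu):.2f}")
```

Comparing the fitted parameters across languages would be one crude way to start characterizing the hypothesized encoding; a discrete family, or a goodness-of-fit check, would be the obvious next step.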