The changing clusters of terrorism

I’ve been looking at the data set from the Global Terrorism Database, an impressively detailed register of terrorism events worldwide…

A short note to myself on Propp-Wilson sampling

Most of the explanations I’ve read of Propp-Wilson sampling describe the method in terms of “sampling from the past,” in…

The Aliens/The Unbearable Lightness of Being classification space of movies

Still playing with the GroupLens movie data set, I implemented a couple of ideas from Shailesh Kumar, one of…

Latent mini-clusters of movies

Still playing with logical itemset mining, I downloaded one of the data sets from GroupLens that records movie ratings…

Finding latent clusters of side effects

One of the interesting things about logical itemset mining, besides its conceptual simplicity, is the scope of potential applications. Besides…

A thing I did

Timey-Wimey Stuff: Battles Edition. Basically, you are shown two battles or conflicts and have to say which one happened earlier…

Tom Sawyer, Bilingual

Following a friend’s suggestion, here’s a comparison of phrase length distributions between the English and German versions of The Adventures…

A first look at phrase length distribution

Here’s a sentence length vs. frequency distribution graph for Chesterton, Poe, and Swift, plus Time of Punishment. A few observations:…

The Premier League: United vs. City championship chances

Using the same model as in previous posts (and, I'd say, not going against any intuition), the leading candidate to win…

Chesterton’s magic word squares

Here are the magic word squares for a few of Chesterton’s books. Whether and how they reflect characteristics that differentiate…