The changing clusters of terrorism
I’ve been looking at the data set from the Global Terrorism Database, an impressively detailed register of terrorism events worldwide…
A short note to myself on Propp-Wilson sampling
Most of the explanations I’ve read of Propp-Wilson sampling describe the method in terms of “sampling from the past,” in…
The Aliens/The Unbearable Lightness of Being classification space of movies
Still playing with the Group Lens movies data set, I implemented a couple of ideas from Shailesh Kumar, one of…
Latent mini-clusters of movies
Still playing with logical itemset mining, I downloaded one of the data sets from Group Lens that records movie ratings…
Finding latent clusters of side effects
One of the interesting things about logical itemset mining, besides its conceptual simplicity, is the scope of potential applications. Besides…
A thing I did
Timey-Wimey Stuff: Battles Edition. Basically: you are shown two battles or conflicts, and have to say which one happened earlier…
Tom Sawyer, Bilingual
Following a friend’s suggestion, here’s a comparison of phrase length distributions between the English and German versions of The Adventures…
A first look at phrase length distribution
Here’s a sentence length vs. frequency distribution graph for Chesterton, Poe, and Swift, plus Time of Punishment. A few observations:…
The Premier League: United vs. City championship chances
Using the same model as in previous posts (and, I’d say, not going against any intuition), the leading candidate to win…
Chesterton’s magic word squares
Here are the magic word squares for a few of Chesterton’s books. Whether and how they reflect characteristics that differentiate…