Latent mini-clusters of movies

Still playing with logical itemset mining, I downloaded one of the GroupLens data sets of movie ratings from MovieLens. The basic idea is the same as with clustering drug side effects: movies that users consistently rate similarly are linked, and clusters in the resulting graph suggest “micro-genres” of movies that are homogeneous from a ratings point of view.
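To make the "link, then cluster" idea concrete, here is a minimal sketch in Python. It is not the logical itemset mining procedure itself: it swaps in a much cruder stand-in (pairwise rating agreement plus connected components), and the function name `movie_clusters`, its thresholds, and the toy ratings are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def movie_clusters(ratings, min_support=3, min_agreement=0.8, tolerance=0.5):
    """Group movies that the same users tend to rate alike.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    Returns a list of sets of movie ids, one per cluster of two or more movies.
    All thresholds are illustrative assumptions, not tuned values.
    """
    # Collect each user's ratings as {movie_id: rating}.
    by_user = defaultdict(dict)
    for user, movie, rating in ratings:
        by_user[user][movie] = rating

    co_rated = defaultdict(int)  # pair -> users who rated both movies
    agreed = defaultdict(int)    # pair -> users who rated both about the same
    for user_ratings in by_user.values():
        for a, b in combinations(sorted(user_ratings), 2):
            co_rated[(a, b)] += 1
            if abs(user_ratings[a] - user_ratings[b]) <= tolerance:
                agreed[(a, b)] += 1

    # Union-find: linked movies end up under the same root.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link a pair when enough users rated both and most of them agreed.
    for (a, b), n in co_rated.items():
        if n >= min_support and agreed[(a, b)] / n >= min_agreement:
            parent[find(a)] = find(b)

    # Connected components of the link graph are the clusters.
    groups = defaultdict(set)
    for movie in list(parent):
        groups[find(movie)].add(movie)
    return [g for g in groups.values() if len(g) > 1]


# Toy data: three users who each rate a movie and its sequel identically.
toy_ratings = [
    ("u1", "G2", 5), ("u1", "G3", 5), ("u1", "K1", 2), ("u1", "K2", 2),
    ("u2", "G2", 4), ("u2", "G3", 4), ("u2", "K1", 1), ("u2", "K2", 1),
    ("u3", "G2", 1), ("u3", "G3", 1), ("u3", "K1", 4), ("u3", "K2", 4),
]
clusters = movie_clusters(toy_ratings)
```

One caveat with this shortcut: connected components will merge two groups as soon as a single link joins them, so on real data the thresholds would need care to keep clusters small.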

Here are a few of the clusters I got, with practically no fine-tuning of parameters:

  • Parts II and III of the Godfather trilogy
  • Ben-Hur and Spartacus
  • The first three Indiana Jones movies
  • Dick Tracy, Batman Forever, and Batman Returns
  • The Devil’s Advocate and The Game
  • The 1960s Lolita, the 1997 remake, and 1998’s Return to Paradise
  • The first two Karate Kid movies
  • Analyze This and Analyze That
  • The 1960s Lord of the Flies, the 1990 remake, and 1998’s Apt Pupil

As movie clusters go, these are not particularly controversial; I found it interesting how originals and sequels or remakes seemed to be co-clustered, at least superficially. And thinking about it, clustering Apt Pupil with both Lord of the Flies movies is reasonable…

Media recommendation is by now a relatively mature field, and no single, untuned algorithm is going to be competitive with what’s already deployed. However, given how simple and computationally manageable basic clustering and recommendation algorithms are, I expect they’ll become even more ubiquitous over time, much as autocomplete in input boxes did.