Still playing with logical itemset mining, I downloaded one of the GroupLens data sets of movie ratings from MovieLens. The basic idea is the same as with clustering drug side effects: movies that users consistently rate similarly are linked, and clusters in the resulting graph suggest "micro-genres" of movies that are homogeneous from a ratings POV.
Here are a few of the clusters I got, with practically no fine-tuning of parameters:
- Parts II and III of the Godfather trilogy
- Ben-Hur and Spartacus
- The first three Indiana Jones movies
- Dick Tracy, Batman Forever, and Batman Returns
- The Devil's Advocate and The Game
- The '60s Lolita, the 1997 remake, and 1998's Return to Paradise
- The first two Karate Kid movies
- Analyze This and Analyze That
- The '60s Lord of the Flies, the 1990 remake, and 1998's Apt Pupil
As movie clusters go, these are not particularly controversial. I found it interesting that originals tended to be co-clustered with their sequels or remakes, at least superficially. And thinking about it, clustering Apt Pupil with both Lord of the Flies movies is reasonable...
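For illustration only, here is a minimal Python sketch of the linking idea described at the top of the post. It is not the logical itemset mining code that produced the clusters above. It swaps in a much cruder rule: two movies are linked when enough users rated both and their ratings differ little on average, and clusters are the connected components of that graph. The function name `cluster_movies`, the `(user, movie, rating)` input format, and the two thresholds are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cluster_movies(ratings, min_common=3, max_mean_diff=0.5):
    """Group movies whose shared raters scored them similarly.

    ratings: iterable of (user, movie, rating) triples (assumed format).
    min_common, max_mean_diff: hypothetical thresholds, not tuned values.
    """
    # movie -> {user: rating}
    by_movie = defaultdict(dict)
    for user, movie, rating in ratings:
        by_movie[movie][user] = rating

    # Union-find structure to track connected components of the link graph.
    parent = {movie: movie for movie in by_movie}

    def find(movie):
        while parent[movie] != movie:
            parent[movie] = parent[parent[movie]]  # path halving
            movie = parent[movie]
        return movie

    # Brute-force pass over all movie pairs (quadratic; fine for a sketch).
    for a, b in combinations(sorted(by_movie), 2):
        common = by_movie[a].keys() & by_movie[b].keys()
        if len(common) < min_common:
            continue
        mean_diff = sum(abs(by_movie[a][u] - by_movie[b][u]) for u in common) / len(common)
        if mean_diff <= max_mean_diff:
            parent[find(a)] = find(b)  # link the two movies

    clusters = defaultdict(list)
    for movie in sorted(by_movie):
        clusters[find(movie)].append(movie)

    # Only groups of two or more movies count as clusters.
    return sorted(c for c in clusters.values() if len(c) > 1)
```

Taking connected components means that a chain of pairwise links is enough to merge movies into one cluster, so the thresholds would have to be fairly strict on real data to keep the groups small.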
Media recommendation is by now a relatively mature field, and no single, untuned algorithm is going to be competitive against what's already deployed. However, given the simplicity and computational manageability of basic clustering and recommendation algorithms, I expect they'll become even more ubiquitous over time (much as autocomplete in input boxes did).