Finding latent clusters of side effects

One of the interesting things about logical itemset mining, besides its conceptual simplicity, is the scope of its potential applications. Beyond the usual ones, such as finding common sets of purchased goods or descriptive tags, the underlying idea that each observation is a mixture of projections of latent [subsets] is a very powerful one (arguably, experiment design is so important and so difficult precisely because most real-world observations contain partial data from more than one simultaneous process or effect).

To play with this idea, I developed a quick-and-dirty implementation of the paper's algorithm and applied it to the data set from the paper "Predicting drug side-effect profiles: a chemical fragment-based approach". The data set covers 1385 different types of side effects potentially caused by 888 different drugs. The logical itemset mining algorithm quickly found the following latent groups of side effects:

  • hyponatremia, hyperkalemia, hypokalemia
  • impotence, decreased libido, gynecomastia
  • nightmares, psychosis, ataxia, hallucinations
  • neck rigidity, amblyopia, neck pain
  • visual field defect, eye pain, photophobia
  • rhinitis, pharyngitis, sinusitis, influenza, bronchitis

The groups seem reasonable enough (although finding hyperkalemia and hypokalemia in the same cluster looks somewhat weird to my medically untrained eyes). Note the small size of the clusters and the specificity of the symptoms; most drugs induce fairly generic side effects, but the algorithm filters those out, with the strictness of the filtering controlled by its parameters.
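
To make the filtering idea concrete, here is a minimal Python sketch. It is not the implementation used above, nor the paper's exact algorithm; it only assumes the general recipe of scoring pairwise co-occurrence, thresholding the scores, and reading groups off as maximal cliques. The function names, the use of normalized pointwise mutual information (NPMI) as the score, the thresholds `min_count` and `min_npmi`, and the toy drug profiles are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log


def cooccurrence_graph(profiles, min_count=2, min_npmi=0.3):
    """Link two side effects when they co-occur often enough AND more than chance predicts.

    profiles: dict mapping drug name -> set of side effects (hypothetical input format).
    Returns an adjacency dict: side effect -> set of linked side effects.
    """
    n = len(profiles)
    item_counts = Counter()
    pair_counts = Counter()
    for effects in profiles.values():
        item_counts.update(effects)
        pair_counts.update(combinations(sorted(effects), 2))

    graph = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # Skip rare pairs, and pairs present in every profile (NPMI is undefined there).
        if count < min_count or p_ab == 1.0:
            continue
        pmi = log(p_ab / ((item_counts[a] / n) * (item_counts[b] / n)))
        npmi = pmi / -log(p_ab)  # normalized to the range [-1, 1]
        if npmi >= min_npmi:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Made-up toy profiles, NOT taken from the real data set.
toy_profiles = {
    "drug_a": {"nausea", "hyponatremia", "hyperkalemia"},
    "drug_b": {"nausea", "hyponatremia", "hyperkalemia"},
    "drug_c": {"nausea", "impotence", "gynecomastia"},
    "drug_d": {"nausea", "impotence", "gynecomastia"},
}

groups = maximal_cliques(cooccurrence_graph(toy_profiles))
for group in groups:
    print(sorted(group))
```

In this toy input "nausea" appears in every profile, so its co-occurrence with any other side effect is exactly what chance would predict. Its NPMI with everything else is zero and it never enters a group, while the two specific pairs survive. Raising or lowering `min_npmi` and `min_count` is the kind of parametric control over generic side effects mentioned above.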