The AI killer app isn’t answers, it’s questions

To understand the scale of the upcoming speed-up in research itself, you can do worse than to look at this paper about how bad we are at it: The Clinical Trials Puzzle: How Network Effects Limit Drug Discovery. The abstract itself is quietly damning:

The depth of knowledge offered by post-genomic medicine has carried the promise of new drugs, and cures for multiple diseases. To explore the degree to which this capability has materialized, we extract meta-data from 356,403 clinical trials spanning four decades, aiming to offer mechanistic insights into the innovation practices in drug discovery. We find that convention dominates over innovation, as over 96% of the recorded trials focus on previously tested drug targets, and the tested drugs target only 12% of the human interactome. If current patterns persist, it would take 170 years to target all druggable proteins. We uncover two network-based fundamental mechanisms that currently limit target discovery: preferential attachment, leading to the repeated exploration of previously targeted proteins; and local network effects, limiting exploration to proteins interacting with highly explored proteins. We build on these insights to develop a quantitative network-based model of drug discovery. We demonstrate that the model is able to accurately recreate the exploration patterns observed in clinical trials. Most importantly, we show that a network-based search strategy can widen the scope of drug discovery by guiding exploration to novel proteins that are part of under explored regions in the human interactome.
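The paper's first mechanism, preferential attachment, is easy to sketch as a toy simulation (the numbers below are hypothetical, chosen only to illustrate the dynamic, not taken from the paper): when each new trial picks its target in proportion to how often that target has already been tested, exploration collapses onto a handful of proteins.

```python
import random

def simulate_trials(n_proteins=1000, n_trials=5000, novelty_rate=0.04, seed=0):
    """Toy preferential-attachment model of target selection.

    Each trial either explores a uniformly random protein (with small
    probability `novelty_rate`) or, far more often, re-tests a protein
    with probability proportional to its past trial count.
    """
    rng = random.Random(seed)
    counts = [0] * n_proteins
    for _ in range(n_trials):
        if rng.random() < novelty_rate or sum(counts) == 0:
            target = rng.randrange(n_proteins)  # rare novel exploration
        else:
            # preferential attachment: weight by prior trial counts
            target = rng.choices(range(n_proteins), weights=counts)[0]
        counts[target] += 1
    tested = sum(1 for c in counts if c > 0)
    return tested, counts

tested, _ = simulate_trials()
print(f"{tested} of 1000 proteins ever tested")
```

Even with thousands of trials, only a small fraction of the protein pool is ever touched; the rest of the "interactome" stays dark, which is the paper's 12% figure in miniature.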

At the tactical level there are good reasons for this conservative bias: trials are very expensive and have many steps, so the optimal choice for an individual researcher, team, or company is often to study something similar to what has already been tested. If that's something that works… or kind of works… or at least doesn't kill people, then you're probably ahead of most compounds out there, even if the best you can hope for from your new trial is not much better than what you already have.

At the strategic level — and in medicine this means humanity as a whole — this is a disaster by omission; if justice delayed is justice denied, unnecessarily slow medical research kills.

The authors of the paper describe some simple ways to counter part of this bias, which can be linked to the growing use of computationally intensive experiment design algorithms in chemistry, biology, and adjacent fields. The underlying idea is that figuring out the most informative experiment — the most interesting question you can ask the universe, given its costs and everything you know — is hard. The naive strategy of gathering "all the data" and then running models on it doesn't really work in complex fields, or in any field once the low-hanging fruit is gone. Useful innovations require knowledge that can only be acquired through deliberate experimentation, not just passive data collection — after all, you can't test airplane designs by watching birds all day — and designing optimal experiments in an expensive and dangerous world can be as algorithmically complex as analyzing the data after you have it.
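One standard way to make "the most informative experiment" precise is Bayesian experiment design: choose the experiment whose outcome is expected to shrink your uncertainty the most. The sketch below is a minimal illustration of that idea with made-up likelihoods, not the paper's method; `exp_A` and `exp_B` are hypothetical experiments over two competing hypotheses.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_posterior_entropy(prior, likelihood):
    """Expected remaining uncertainty after running one experiment.

    likelihood[h][o] = P(outcome o | hypothesis h) for this experiment.
    Lower is better: it means the experiment is expected to be informative.
    """
    n_h, n_o = len(prior), len(likelihood[0])
    total = 0.0
    for o in range(n_o):
        p_o = sum(prior[h] * likelihood[h][o] for h in range(n_h))
        if p_o == 0:
            continue
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(n_h)]
        total += p_o * entropy(posterior)
    return total

# Two hypotheses, uniform prior. Experiment A barely discriminates them;
# experiment B separates them cleanly (all numbers are illustrative).
prior = [0.5, 0.5]
exp_A = [[0.55, 0.45], [0.45, 0.55]]
exp_B = [[0.95, 0.05], [0.10, 0.90]]

best = min([("A", exp_A), ("B", exp_B)],
           key=lambda e: expected_posterior_entropy(prior, e[1]))
print("most informative experiment:", best[0])  # → B
```

The hard part in real settings is exactly what the paragraph above says: the space of candidate experiments is enormous, each has a cost, and evaluating expected information gain is itself computationally expensive — which is why this is an algorithms problem, not just a statistics problem.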

This is something most organizations using or deploying AI tend to under-invest in. Designing an experiment, or a semi-autonomous AI-controlled dynamic set of experiments, complements data gathering and analysis but requires different tools and frameworks. It's easy, almost a reflex by now, to ask for more detail on any given phenomenon, to ask for more observations, or to test minor variations of what you are already doing.

It’s much harder to look at the blanks in the map and apply algorithmic tools to decide how best to explore them. Most of the time organizations aren’t even aware that they can do this, much less how to. But the world is wider than your data lake, and most of what you need to know can only be learned by looking outside. AI-guided experimental design, against the stereotype about AIs, helps us ask not just more efficient questions but more original ones.