Quick link: Knowing what your algorithm thinks it knows

The paper: What’s in a Prior? Learned Proximal Networks for Inverse Problems.

What it does: This will be clearer with a bit of context, so let's take it step by step:

  • Many algorithmic problems belong to the category of inverse problems: You have a noisy or limited measurement (e.g. a blurry photograph) of a more interesting original source (somebody’s face), and you want to recover the original source from the noisy version. This is a classic CSI problem, but pretty much every measurement, sensor, or survey can be looked at in this way.
  • The basic problem is that there’s always more than one possible original source for any corrupted version. For example, a sound recording of Beethoven’s Fifth with plenty of static could be a corrupted version of a clean recording, or it could be a perfect recording of an experimental version of the symphony with static added on purpose. This might seem facetious, but it’s a fundamental mathematical feature of the problem.
  • Every solver for an inverse problem, then, needs some sort of understanding of which original sources are more likely than others. When these expectations (this prior) match reality well, the solver will work well. If they don’t (say you’re trying to use an ordinary denoiser to clean up your experimental noisecore songs), it’ll actually make things worse.
  • The problem is that for most inverse problem solvers you can’t “read” those assumptions from your trained algorithm. Let’s say you train a de-blurring algorithm for your image sensors with a combination of a large data set and some regularization term in the loss function. Even with this full knowledge of everything that went into building the algorithm, there’s no direct way to figure out whether the algorithm thinks a priori that straight lines are more common than curved lines (and will therefore tend to “fix” slightly curved lines) or whether it thinks that most people have blue eyes and will “correct” eye colors in photos accordingly.
  • In short (too late, I know): the paper uses some very elegant mathematics to figure out a way to train solvers for inverse problems that do let you ask how the a priori probability of one original source compares to that of another. In other words, they come with readable assumptions about the world; there’s a minimal sketch of what that buys you right after this list.
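
To make the idea of a “readable” prior a bit more concrete, here’s a minimal sketch in plain NumPy. To be clear, this is not the paper’s learned proximal network: the toy blur operator, the hand-written roughness prior `prior_energy`, and the gradient-descent loop are all stand-ins I’m assuming for illustration. What it shows is the general shape of a variational inverse-problem solver (a data-fit term plus a prior term) where the prior is an explicit function you can query to compare how plausible two candidate sources are a priori.

```python
import numpy as np

# Toy 1-D "deblurring" inverse problem: y = A @ x_true + noise.
# A is a simple 5-tap moving-average blur; x_true is a signal with sharp edges.
rng = np.random.default_rng(0)
n = 64
x_true = np.zeros(n)
x_true[20:40] = 1.0
A = np.zeros((n, n))
for i in range(n):
    A[i, max(0, i - 2):min(n, i + 3)] = 1.0 / 5.0
y = A @ x_true + 0.02 * rng.standard_normal(n)

# A hand-written, readable prior: smooth signals (small squared differences)
# count as more likely a priori. In the paper this role is played by a learned
# regularizer that can be recovered from the trained network; here it's a toy.
def prior_energy(x):
    return np.sum(np.diff(x) ** 2)

# Variational solver: minimize data fit + lam * prior via gradient descent.
lam = 1.0

def gradient(x):
    data_grad = 2.0 * A.T @ (A @ x - y)
    d = np.diff(x)
    prior_grad = np.zeros_like(x)
    prior_grad[:-1] -= 2.0 * d
    prior_grad[1:] += 2.0 * d
    return data_grad + lam * prior_grad

x = np.zeros(n)
for _ in range(2000):
    x -= 0.05 * gradient(x)

# The payoff of a readable prior: you can query it on candidate sources and
# compare their a priori plausibility directly.
print("prior energy of reconstruction:", prior_energy(x))
print("prior energy of raw measurement:", prior_energy(y))
```

The paper’s contribution, very roughly, is a way to train solvers whose learned prior is just as queryable as the hand-written `prior_energy` above, instead of being baked invisibly into the network’s weights.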

That was a lot for a quick link. Why does it matter? It’s a neat technique that’s worth exploring by anybody training models from scratch (I’m pretty sure we’ll see quite a bit of that over the next few years), but it’s also a good strategic reminder that the opacity of contemporary models is a technical side effect of how we build them, not something inherent in the problem itself. Regulatory and practice frameworks built upon the assumption that any sufficiently powerful AI will have to be an unreadable black box might become obsolete sooner rather than later. Truth is, we’ve just started to figure out this style of software building, and our hardware, data, enthusiasm, and money outpace our experience and understanding. This won’t last forever. Future AI systems are likely to be not just more powerful but also more transparent and better understood. The unreadable complexity of our models isn’t a sign of sophistication but a reminder of their still experimental nature.