A quick reminder that deep learning is a hack

Here’s a fun paper from Google Research: The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric. The gist is that a task that’s usually approached with lots of data to train complex architectures, or with hand-crafted, somewhat ad hoc methods, can be done in a very efficient and simple way, without a lot of the fun but (at least in this case) unnecessary complexity.
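
To make the contrast concrete, here’s a toy sketch of the general idea (not the paper’s actual metric; read it for the real formulation): fit, per image and with plain least squares, a linear predictor of each pixel from a small causal neighborhood, then compare two images by how badly each one’s predictor explains the other. Everything below (the neighborhood size, the grayscale inputs, the cross-prediction score) is my own illustrative assumption, not something taken from the paper.

```python
import numpy as np

def causal_patches(img, k=2):
    """(context, target) pairs: each pixel paired with the k pixels to its
    left and the k pixels above it, a crude causal neighborhood."""
    H, W = img.shape
    feats, targets = [], []
    for i in range(k, H):
        for j in range(k, W):
            context = np.concatenate([img[i, j - k:j], img[i - k:i, j]])
            feats.append(context)
            targets.append(img[i, j])
    return np.asarray(feats), np.asarray(targets)

def fit_linear_predictor(img, k=2):
    """Least-squares weights predicting a pixel from its causal context,
    fitted on this single image: no training set, no learned features."""
    X, y = causal_patches(img, k)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def prediction_error(img, w, k=2):
    """Mean squared error of the predictor w on image img."""
    X, y = causal_patches(img, k)
    return float(np.mean((X @ w - y) ** 2))

def toy_linear_prediction_distance(a, b, k=2):
    """Symmetric cross-prediction score: how badly each image's own
    predictor explains the other. Bigger should roughly mean "more
    structurally different" (in this toy sense only)."""
    wa, wb = fit_linear_predictor(a, k), fit_linear_predictor(b, k)
    return prediction_error(b, wa, k) + prediction_error(a, wb, k)

if __name__ == "__main__":
    # A smooth synthetic "image", a lightly noised copy, and a pixel-shuffled
    # copy that destroys all local structure.
    x = np.linspace(0, 1, 64)
    ref = np.outer(np.sin(4 * np.pi * x), np.cos(3 * np.pi * x))
    rng = np.random.default_rng(0)
    noisy = ref + 0.05 * rng.standard_normal(ref.shape)
    shuffled = rng.permutation(ref.ravel()).reshape(ref.shape)
    print("ref vs noisy:   ", toy_linear_prediction_distance(ref, noisy))
    print("ref vs shuffled:", toy_linear_prediction_distance(ref, shuffled))
```

The point of the toy isn’t the numbers; it’s that there is no dataset, no training loop, and no network anywhere in it.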

If you’re working on image retrieval and identification this is of immediate application, but, more generally, this is a reminder that it’s very unlikely that current deep learning architectures, or even the basics of these models, are anywhere near the best way to do the things they do: they are just the way we can do them now. We’re substituting a deep understanding of each task with (a lot of) data and (a lot of) computation, and if it works it works, of course, but it’s always a bad idea to believe that one is the same as the other.

At the level of personal, business, and even societal strategy, it’s salutary to keep in mind that having a black box that does a thing and understanding a thing are very different as long-term investments: knowledge builds upon knowledge in a way that’s deeper and exponentially more powerful than just stacking black boxes with blindly trained response patterns. Truly superhuman AI capabilities — and I do agree that their development is one of the three or four key determinants of the rest of the century — won’t come through huge neural networks doing things faster and more scalably than humans do, but rather as software developing, leveraging, and building upon knowledge — not data, not software, knowledge — that’s richer and more complex than humans can handle unaided.

(For what it’s worth, we’ve done this before, many times: language, writing, bureaucracies, and laboratories are all technologies of superhuman cognition. We’ll do it again. Given the complexity of the problems we face, we’d better do it.)