Deep Learning as the apotheosis of Test-Driven Development

Even if you aren’t interested in data science, Deep Learning is an interesting programming paradigm; you can see it as “doing test-driven development with a ludicrously large number of tests, an IDE that writes most of the code, and a forgiving client.” No wonder everybody’s pouring so much money and brains into it! Here’s a way of thinking about Deep Learning not as an application you’re asked to code, but a language to code with.

Deep Learning applies test-driven development as we’re all taught to (and not always do): first you write the tests, and then you move from code that fails all of them to one that passes them all. One difference from the usual way of doing it, and the most obvious, is that you’ll usually have anything from hundreds of thousands to Google-scale numbers of test cases in the way of pairs (picture of a cat, type of cute thing the cat is doing), or even a potentially infinite number that look like pairs (anything you try, how badly Donkey Kong kills you). This gives you a good chance that, if you selected or generated them intelligently, the test cases represent the problem well enough that a program that passes them will work in the wild, even if the test cases are all you know about the problem. It definitely helps that for most applications the client doesn’t expect perfect performance. In a way, this lets you get away with the problem of having to get and document domain knowledge, at least for reasonable-but-not-state-of-the-art levels of performance, which is specially hard to do for to things like understanding cat pictures, because we just don’t know how we do it.

The second difference between test-driven development with the usual tools and test-driven development with Deep Learning languages and runtimes is that the latter are differentiable. Forget the mathematical side of that: the code monkey aspect of it is that when a test case fails, the compiler can fix the code on its own.

Yep.

Once you stop thinking about neural networks as “artificial brains” or data science-y stuff, and look at them as a relatively unfamiliar form of bytecode — but, as bytecode goes, also a fantastically simple one — then all that hoopla about backpropagation algorithms is justified, because they do pretty much what we do: look at how a test failed and then work backwards through the call stack, tweaking things here and there, and then running the test suite again to see if you fixed more tests than you broke. But they do it automatically and very quickly, so you can dedicate yourself to collecting the tests and figuring out the large scale structure of your program (e.g. the number and types of layers in your network, and their topology) and the best compiler settings (e.g., optimizing hyperparameters and setting up TensorFlow or whatever other framework you’re using; they are labeled as libraries and frameworks, but they can also be seen as compilers or code generators that go from data-shaped tests to network-shaped bytecode).

One currently confusing fact is that this is all rather new, so very often the same people who are writing a program are also improving the compiler or coming up with new runtimes, so it looks like that’s what programming with Deep Learning is about. But that’s just a side effect of being in the early “half of writing the program is improving gcc so it can compile it” days of the technology, where things improve by leaps and bounds (we have both a fantastic new compiler and the new Internet-scale computers to run it), but are also rather messy and very fun.

To go back to the point: from a programmer’s point of view, Deep Learning isn’t just a type of application you might be asked to implement. They are also a language to write things with, one with its own set of limitations and weak spots, sure, but also with the kind of automated code generation and bug fixing capabilities that programmers have always dreamed of, but by and large avoid because doing it with our usual languages involves a lot of maths and the kind of development timelines that makes PMs either laugh or cry.

Well, it still does, but with the right language the compiler takes care of that, and you can focus on high-level features and getting the test cases right. It isn’t the most intuitive way of working for programmers trained as we were, and it’s not going to fully replace the other languages and methods in our toolset, but it’s solving problems that we thought were impossible. How can a code monkey not be fascinated by that?