Category Archives: Programming

The job of the future isn't creating artificial intelligences, but keeping them sane

Once upon a time, we thought there was such a thing as bug-free programming. Some organizations still do — and woe betide their customers — but after a few decades hitting that particular wall, the profession has by and large accepted that writing software is such an extremely complex intellectual endeavor that errors and unfounded assumptions are unavoidable. Even the most mathematically solid of formal methods has, if nothing else, to interact with a world of unstable platforms and unreliable humans, and what worked today will fail tomorrow.

So we spend time and resources maintaining what we already "finished," fixing bugs as they are found, and adapting programs to new realities as they develop. We have to, because when we don't, as when physical infrastructure isn't maintained, we save resources in the short term, but only in our way towards protracted ruin.

It's no surprise that this also happens with our most sofisticated data-driven algorithms. CVs and scrum boards are filled with references to the maintenance of this or that prediction or optimization algorithm.

But there's a subtle, not universal but still very prevalent, problem: those aren't software bugs. This isn't to say that implementations don't have bugs; being software, they do. But they are computer programs implementing inference algorithms, which work at a higher level of abstraction, and those have their own kinds of bugs, and those don't leave stack traces behind.

A clear example is the experience of Google. PageRank was, without a doubt, among the most influential algorithms in the history of the internet, not to mention the most profitable, but as Google took the internet over by storm, gaming PageRank became such an important business activity that "SEO" became a commonplace word.

From an algorithmic point of view this simply a maintenance problem: PageRank assumed a certain relationship between link structure and relevance, based on the assumption that website creators weren't trying to fool it. Once this assumption became untenable, the algorithm had to be modified to cope with a world of link farms and text written with no human reader in mind.

In (very loosely equivalent) software terms, there was a new threat model, so Google had to figure out and apply a security patch. This is, for any organization facing a simular issue, a continual business-critical process, and one that could make or break a company's profitability (just ask anybody working on high-frequency trading). But not all companies deploy the same sort of detailed, continuous instrumentalization, and development and testing methodologies that they use to monitor and fix their software systems to their data driven algorithms independently of their implementations. The same data scientist who developed an algorithm is often in charge of monitoring its performance on a more or less regular basis; or, even worse, it's only a hit to business metrics what makes companies reassingn their scarce human resources towards figuring out what's going wrong. Either monitoring and maintenance strategy would amount to criminal malpractice if we were talking about software, yet there are companies for which is this is the norm.

Even more prevalent is the lack of automatic instrumentalization for algorithms mirroring that for servers. Any organization with a nontrivial infrastructure is well aware of, and has analysis tools and alarms for, things like server load or application errors. There are equivalent concepts for data-driven algorithms — quantitative statistical assumptions, wildly erroneous predictions — that should, also, be monitored in real time, and not collected (when the data is there) by a data scientist only after the situation has become bad enough to be noticed.

None of this is news to anybody working with big data, particularly in large organizations centered around this technology, but we have still to settle on a common set of technologies and practices, and even just on an universal agreement on its need.

These days nobody would dare deploy a web application trusting only server logs at the operating system level. Applications have their own semantics, after all, and everything in the operating system working perfectly is no guarantee that the app is working at all.

Large-scale prediction and optimization algorithms are just the same; they are often an abstraction running over the application software that implements them. They can be failing wildly, statistical assumptions unmet and parameters converging to implausible values, with nothing in the application layer logging even a warning of any kind.

Most users forgive a software bug much more easily than unintelligent behavior in avowedly intelligent software. As a culture, we're getting used to the fact that software fails, but many still buy the premise that artificial intelligence doesn't (this is contradictory, but so are all myths). Catching these errors as early as possible can only be done while algorithms are running in the real world, where the weird edge cases and the malicious users are, and this requires metrics, logs, and alarms that speak of what's going on in the world of mathematics, not software.

We haven't converged yet on a standard set of tools and practices for this, but I know many people who'll sleep easier once we have.

Quantitatively understanding your (and others') programming style

I'm not, in general, a fan of code metrics in the context of project management, but there's something to be said for looking quantitatively at the patterns in your code, specially if by comparing them with those of better programmers, you can get some hopefully useful ideas on how to improve.

(As an aside, the real possibilities in computer-assisted learning won't come from lower costs, but rather by a level of adaptability that so far not even one-on-one tutoring has allowed; if the current theories about expertise are more or less right, data-driven adaptive learning, if implemented at the right granularity level and with the right semantics model behind, could change the speed and depth the way we learn in a dramatic way... but I digress.)

Focusing on my ongoing learning of Hy, I haven't used it in any paid project so far, but I've been able to play a bit with it now and then, and this has generated a very small code base, which I was curious to compare with code written by people who actually know the language. To do that, I downloaded the source code of a few Hy projects on GitHub (hyway, hygdrop, and adderall), and wrote some code (of course, in Hy) to extract code statistics.

Hy being a Lisp, its syntax is beautifully regular, so you can start by focusing on basic but powerful questions. The first one I wanted to know was: which functions am I using the most? And how does this distribution compares with that of the (let's call it) canon Hy code?

My top five functions, in decreasing frequency: setv, defn, get, len, for.

Canon's top five functions, in decreasing frequency: ≡, if, unquote, get, defn_alias.

Yikes! Just from this, it's obvious that there are some serious stylistic differences, which probably reflect my still un-lispy understanding of the language (for example, I'm not using aliases, for should probably be replaced by more functional patterns, and the way I use setv, well, it definitely points out to the same). None of this is a "sin", nor points clearly to how I could improve (which a sufficiently good learning assistant would have), but the overall trust of the data is a good indicator of where I still have a lot of learning to do. Fun times ahead!

For another angle at the quantitative differences between my newbie-to-lisp coding style and more accomplished programmers, here are the histograms of the log mean size of subexpressions for each function (click to expand):

log (mean subexpression size)

"Canonical" code shows a longer right tail, which shows that experienced programmers are not afraid of occasionally using quite large S-expressions... something I still clearly I'm still working my way up to (alternatively, which I might need to reconsider my aversion to).

In summary: no earth-shattering discoveries, but some data points that suggests specific ways in which my coding practice in Hy differs from that of more experienced programmers, which should be helpful as general guidelines as I (hopefully) improve over the long term. Of course, all metrics are projections (in the mathematical sense) — they hide more information than they preserve. I could make my own code statistically indistinguishable from the canon for any particular metric, and still have it be awful. Except for well-analyzed domains where known metrics are sufficient statistics for the relevant performance (and programming is very much not one of those domains, despite decades of attempts), this kind of analysis will always be about suggesting changes, rather than guaranteeing success.

Hi, Hy!

Currently trying Hy as a drop-in replacement for Python in a toy project. It's interesting how much of the learning curve for Lisp goes away once you have access to an underlying runtime you're familiar with; the small stuff generates more friction than the large differences (which makes sense, as we do the small stuff more often).

A flow control structure that never makes mistakes (sorta)

I've been experimenting with Lisp-style ad-hoc flow control structures. Nothing terribly useful, but nonetheless amusing. E.g., here's a dobest() function that always does the best thing (and only the best thing) among the alternatives given to it — think of the mutant in Philip K. Dick's The Golden Man, or Nicolas Cage in the awful "adaptation" Next.

Here's how you use it:

if __name__ == '__main__':
    def measure_x():
        "Metric function: the value of x"
        global x
        return x
    def increment_x():
        "A good strategy: increment x"
        global x
        x += 1
    def decrement_x():
        "A bad strategy: decrement x"
        global x
        x -= 1
    def fail():
        "An even worse strategy"
        global x
        x = x / 0
    x = 1
    # assert(x == 1)
    dobest(measure_x, increment_x, decrement_x, fail)
    # assert(x == 2)

You give it a metric, a function that returns how good you think the current world is, and one or more functions that operate on the environment. Perhaps disappointingly dobest() doesn't actually see the future; rather, it executes each function on a copy of the current environment, and only transfers to the "real" one the environment with the highest value of metric().

Here's the ugly detail (do point out errors, but please don't mock too much; I haven't played much with Python scopes):

def dobest(metric, *branches):
    "Apply every function in *branches to a copy of the caller's environnment, only do 'for real' the best one according to the result of running metric()."
    world = copy.copy(dict(inspect.getargvalues(sys._getframe(1)).locals))
    alts = []
    for branchfunction in branches:
            # Run branchfunction in a copy of the world
            ns = copy.copy(world)
            exec branchfunction.__code__ in ns, {}
        except: # We ignore worlds where things explode
    # Sort worlds according to metric()
    alts.sort(key=lambda ns: eval(metric.__code__, ns, {}), reverse=True)
    for key in alts[0]:
        sys._getframe(1).f_globals[key] = alts[0][key]

One usability point is that the functions you give to dobest() have to explicitly access variables in the environment as global; I'm sure there are cleaner ways to do it.

Note that this also can work a bit like a try-except with undo, a la

dobest(bool, function_that_does_something, function_that_reports_an_error)

This would work like try-except, because dobest ignores functions that raise exceptions, but with the added benefit that dobest would clean up everything done by function_that_does_something.

Of course, and here's the catch, "everything" is kind of limited — I haven't precisely gone out of my way to track and catch all side effects, not that it'd even be possible without some VM or even OS support. Point is, the more I get my ass saved by git, the more I miss it in my code, or even when doing interactive data analysis with R. As the Doctor would say, working on just one timeline can be so... linear.

A mostly blank slate

The combination of a tablet and a good Javascript framework makes it very easy to deploy algorithms to places where so far they have been scarce, like meetings, notetaking, and so on. The problem lies in figuring out what those algorithms should be; just as we had to have PCs for a few years before we started coming up with things to do with them (not that we have even scratched the surface), we still don't have much of a clue about how to use handheld thinking machines outside "traditional thinking machine fields."

Think about it this way: computers have surpassed humans in the (Western Civilization's) proverbial game of strategy and conflict, chess, and are useful enough in games of chance that casinos and tournament organizers are prone to use anything from violence to lawyers to keep you from using them. So the fact that we aren't using a computer when negotiating or coordinating says something about us.

The bottleneck, Cassius would say nowadays, is not in our tech, but in our imaginations.

Men's bathrooms are (quantum-like) universal computers

As it's well documented, men choose urinals to maximize their distance from already occupied urinals. Letting u_i be 1 or 0 depending on whether urinal $i$ is occupied, and setting \sigma_{i,j} to the distances between urinals i and j, male urinal choosing behavior can be seen as an annealing heuristic maximizing

\sum u_i u_j \sigma_{i,j}

(summing over repeating indexes as per Einstein's convention). And this is obviously equivalent to the computational model implemented by D-Wave's quantum computers! Hardware implementations of urinal-based computers might be less compact than quantum ones (and you might need bathrooms embedded in non-Euclidean spaces in order to implement certain programs), but they are likely to be no more error-prone, and they are certain to escalate better.