Why the most influential business AIs will look like spellcheckers (and a toy example of how to build one)

Forget voice-controlled assistants. At work, AIs will turn everybody into functional cyborgs through squiggly red lines under everything you type. Let’s look at a toy example I just built (mostly to play with deep learning along the way).

As a data set, I chose Patrick Martinchek’s collection of Facebook posts from news organizations. It’s a very useful resource, covering more than a dozen organizations, with interesting metadata for each post, but for this toy model I focused exclusively on the headlines of CNN’s posts. Let’s say you’re a journalist/editor/social network specialist working for CNN, and part of your job is to write good headlines. In this context, a good headline could be defined as one having a lot of shares. How would you use an AI to help you with that?

The first step is simply to teach the AI about good and bad headlines. Patrick’s data set included 28,300 posts with both the headline and the count of shares (posts with parsing errors I simply discarded; in a production project the number of posts would’ve been larger). As what counts as a good headline depends on the organization, I defined a good headline as one that got a number of shares in the top 5% for the data set. This simplifies the task from predicting a number (how many shares) to a much simpler classification problem (good vs. bad headline).
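As a rough sketch of what that labeling step looks like (the file path and column names here are my own placeholders, not the data set’s actual schema):

```python
import pandas as pd

# Minimal sketch of the labeling step; "cnn_facebook_posts.csv", "headline"
# and "shares" are illustrative names, not the data set's real schema.
posts = pd.read_csv("cnn_facebook_posts.csv")

# Drop rows that failed to parse (missing headline or share count).
posts = posts.dropna(subset=["headline", "shares"])

# A "good" headline is one whose share count falls in the top 5%.
threshold = posts["shares"].quantile(0.95)
posts["label"] = (posts["shares"] >= threshold).astype(int)

print(posts["label"].value_counts())  # roughly 95% bad (0) vs. 5% good (1)
```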

The script I used to train the network to perform this classification was Denny Britz’s classic Implementing a CNN for text classification in TensorFlow example. It’s an introductory model, not meant to have production-level performance (also, it was posted in December 2015, and sixteen months in this field is a very long time), but the code is elegant, well-documented, and easy to understand and modify, so it was the obvious choice for this project. The only changes I made were adapting it to train the network without having to load all of the data in memory at once, and replacing the parser with one of NLTK’s.
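For the curious, those two modifications amount to roughly this; the function names, file format (tab-separated label and headline), and batch size are illustrative assumptions, not lifted from the actual code:

```python
import nltk

nltk.download("punkt", quiet=True)  # tokenizer model used by word_tokenize

def tokenize(headline):
    # Stand-in for the tutorial's regex-based parser: NLTK's tokenizer instead.
    return [token.lower() for token in nltk.word_tokenize(headline)]

def batch_iter(path, batch_size=64):
    """Yield small batches of (tokens, label) pairs so the whole file
    never has to sit in memory at once."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, headline = line.rstrip("\n").split("\t", 1)
            batch.append((tokenize(headline), int(label)))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch
```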

After an hour of training on my laptop, testing the model against out-of-sample data gives 93% accuracy and 9% precision for the class of good headlines. The latter is the metric I cared about for this model: it means that 9% of the headlines the model marks as good are, in fact, good. That’s about 80% better than random chance, which is… well, not that impressive. But that’s after an hour of training with a tutorial example, and rather better than what you’d get from that data set using most other modeling approaches.
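The back-of-the-envelope arithmetic behind that “80% better than chance” claim, using the numbers quoted above:

```python
# Picking headlines at random, 5% would be "good" (that's how the label was defined).
base_rate = 0.05
# Of the headlines the model flags as good, 9% actually are.
model_precision = 0.09

improvement = model_precision / base_rate - 1
print(f"{improvement:.0%} better than random chance")  # -> 80%
```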

In any case, the point of the exercise wasn’t to get awesome numbers, but to be able to do the next step, which is where this kind of model moves from being a tool used by CNN’s data scientists to one that turns writers into cyborgs.

Reaching again into NLTK’s impressive bag of tricks, I used its part-of-speech tagger to identify the nouns in every bad headline, and then a combination of WordNet’s tools for finding synonyms and the pluralizer in CLiPS’ Pattern Python module to generate a number of variants of each headline through simple rewrites of the original.
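A simplified sketch of that generator might look like the following; it assumes the relevant NLTK resources (punkt, the averaged perceptron tagger, and WordNet) have already been downloaded, and the function name is mine rather than the original script’s:

```python
import nltk
from nltk.corpus import wordnet as wn
from pattern.en import pluralize

def headline_variants(headline):
    """Generate rewrites of a headline by swapping each noun for its WordNet synonyms."""
    tokens = nltk.word_tokenize(headline)
    tagged = nltk.pos_tag(tokens)
    variants = set()
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith("NN"):          # only rewrite nouns
            continue
        for synset in wn.synsets(word, pos=wn.NOUN):
            for lemma in synset.lemma_names():
                synonym = lemma.replace("_", " ")
                if synonym.lower() == word.lower():
                    continue
                if tag == "NNS":              # keep plural nouns plural
                    synonym = pluralize(synonym)
                variants.add(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return sorted(variants)

print(headline_variants("What people across the globe think of Donald Trump"))
```

Each variant can then be run through the trained classifier, keeping only the rewrites it predicts will be good.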

So for What people across the globe think of Donald Trump, the program suggested What people across the Earth think of Donald Trump and What people across the world think of Donald Trump. What’s more, while the original headline was “bad,” the model predicts that the last variation will be good. With a 9% precision for the class, it’s not a sure thing, but it’s almost twice the a priori probability of the original, which isn’t something to sneeze at.

In another case, the program took Dog sacrifices life to save infant in fire, and suggested Dog sacrifices life to save baby in fire. The point of the model is to improve on intuition, and I don’t have the experience of whoever writes CNN’s post headlines, but that does look like it’d work better.

What moves this from a tool for data analysts to something that changes how almost everybody works is that nothing prevents a trained model from running in the background, constantly checking what you’re writing (the headline for your post, for example) and suggesting alternatives. To grasp the true power a tool like this could have, don’t imagine a web application that suggests changes to your headline, or even a plugin for your CMS or text editor, but something more like your spellchecker. For example, the “headline” field in your web app will have a model attached, trained on your organization’s own data (and/or on open data sets), which will underline your text in red if it predicts the headline won’t work well. Right-click on the text, and it’ll show you some alternatives.

Or if the response to a customer you’re typing might make them angry.

Or if the presentation you’re building has the sort of look that works well on SlideShare.

Or if the code you’re writing is similar to the kind of code that breaks your application’s test suite.

Or if there’s something fishy in the spreadsheet you’re looking at.

Or… You get the idea. Whenever you have a classification model and a way to generate alternatives, you have a tool that can help knowledge workers do their work better, a tool that gets better over time (learning not just from its own experience, as humans do, but from the collective experience of the entire organization), and no reason not to use it.

“Artificial intelligence,” or whatever label you want to apply to the current crop of technologies, is something that can, does, and will work invisibly as part of our infrastructure, and it’s at the core of dedicated data analysis, but it will also change the way everybody works by having domain-specific models look in real time at everything you’re seeing and doing, and make suggestions and comments. Microsoft’s Clippy might have been the most universally reviled digital character before Jar Jar Binks, but we’ve come to depend on unobtrusive but superhuman spellcheckers, GPS guides, etc. Image editors already work this way, applying lots of domain-specific smarts to assist and subtly guide your work. As building models for human or superhuman performance on very specific tasks becomes accessible to every organization, the same will apply to almost every task.

It’s already beginning to. We don’t yet have the Microsoft Office of domain-specific AIs, and I’m not sure what that would look like, but, unavoidably, the fact that we can teach programs to outperform humans on a list of “real-world” tasks that grows almost every week means that organizations that routinely do so (companies that don’t wait for fully artificial employees, but also don’t neglect to enhance their employees with every better-than-human narrow AI they can build right now) have an increasing advantage over those that don’t. The interfaces are still clumsy, there’s no explicit business function or fancy LinkedIn title for it, and most workers, ironically enough including knowledge workers and people in leadership and strategic roles, still have to be convinced that cyborgization, ego issues aside, is a better career choice than eventual obsolescence. But the same barriers applied when business software first became available, and the crushing economic and business advantages made them irrelevant in a very short amount of time.

The bottom line: Even if you won’t be replaced by an artificial intelligence, there are many specific aspects of your work that AIs can already do, or soon will do, better than you, and if you can’t or won’t work with them as part of your daily routine, there’s somebody who will. Knowing how to train and team up with software in an effective way will be one of the key work skills of the near future, and whether explicit or not, the “AI Resources Department” — a business function focused on constantly building, deploying, and improving programs with business-specific knowledge and skills — will be at the center of any organization’s efforts to become and remain competitive.