The most useful and most dangerous concept in AI is the loss function. To train or fit a model is to modify it with the goal of making some number, a mathematical function of what the model did and what it should have done, as small as possible. Much attention is generally paid to the data (and, less often, to experimental setups), to model architectures, and to training infrastructure, and almost none to how loss will be quantified, and therefore how it will be avoided. Most of the time a default loss function is used without considering alternatives; when there is a choice, it's made on technical grounds such as numerical stability and the behavior of gradients.
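To make the mechanics concrete, here is a minimal sketch (NumPy, with made-up data) of that generic machinery at work. The loss is mean squared error, a typical default, and nothing in the loop knows or cares what the errors mean outside the number being shrunk:

```python
import numpy as np

# A minimal sketch of what "training" means here: nudge parameters to shrink
# one number. The loss is the usual default (mean squared error); the loop
# never sees anything but that number and its gradient.

def mse(pred, target):
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # made-up features
y = X @ np.array([2.0, -1.0, 0.5])     # made-up targets
w = np.zeros(3)                        # model parameters

for _ in range(200):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the MSE
    w -= 0.1 * grad                        # one generic optimization step

print(mse(X @ w, y))  # the only thing the machinery ever "sees"
```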
The unspoken assumption is that as long as the loss function is smaller when you're closer to the expected outcome, that's enough. And it's not always wrong. But the loss function hides inside itself many key facts about the world (that's what it's there for: to let the generic machinery of numerical optimization work without needing to know anything else), and in the real world not all losses are commensurable:
- A loss can be a suboptimal operation: money left on the table. You sell an ad slot for less than you could have, or automatically assign a task to somebody good but not ideal.
- But a loss can also be an existential risk for the company. Get the leverage wrong at your high-speed trading fund, or build an auto-replying chatbot that gets people fired, and your business might not exist this time next quarter.
- And sometimes, not in every business but in many of them, the loss is somebody else's in a deep sense. Somebody gets killed because your self-driving car isn't, or sees their life destroyed because they were falsely flagged as a criminal risk.
Those three are not the same. They are not on the same scale. Yet they can all come out of the operation of what's thought of as a single AI.
The game-theoretic mathematician and the microeconomist will tell you, and not be wrong in a technical sense, that you can always encode any self-consistent combination of those preferences in a single utility function. Make killing somebody carry infinite loss, say, make business-existential risk a function of your preferred company valuation, and there you are.
This is true as mathematics but false in context. Some of the reasons for this, in order of increasing difficulty of solution:
Loss functions that simultaneously encode all these different levels of risk are computationally tricky to use. Training algorithms don't like large differences in scale, deal badly with huge gradients, and very often just blow up when facing infinities (a toy sketch below makes the problem concrete). There are ways to work around these problems, but
Most AI engineers aren't trained to think in this way. Ask an engineer to build a facial recognition program to match customers against a database of suspected shoplifters, and chances are they will aim to minimize one of the standard classification loss functions rather than ask about, and account for, things like the asymmetrical personal and legal impact of false positives: what's the loss function that makes sure you don't get somebody arrested, or at least publicly escorted out of a business, because they happen to look like somebody else through a bad camera at a poor angle? What are the practices, risks, and impacts in the context where it will be deployed? There are people who could help with this, but ultimately neither the time nor the human resources are there, because
Projects treat non-operational losses as a secondary concern. The way most companies build their AIs (and this isn't a matter of resources or technical expertise) is either to minimize operational losses and then try to build guardrails on top of that to contain business-existential and human losses, or to train models on smaller data sets that try to avoid the "riskier" cases in something of an ad hoc, "if nobody dies in the last round of simulations it's ready for production" sort of way.
(The relatively ethical companies, at least. Many just build something more or less safe in the first sense and leave it at that.)
This is the most common strategy, but it's simultaneously too risky (you are building washing machines that can explode, and then trying to reduce the probability that they will) and not aggressive enough (you use tame optimization algorithms in contexts where you could learn radically new processes through wildly more creative ones, with nothing more at risk than a short-term drop in a dashboard only you and your manager care about).
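The toy sketch promised earlier: NumPy, with entirely made-up numbers, a hypothetical match classifier, and an arbitrary `CATASTROPHE` exchange rate between harms, showing what happens when an operational cost and a catastrophic one are folded into a single number:

```python
import numpy as np

# Toy sketch, not any real system: one loss that adds an "operational" cost and
# a "catastrophic" cost at wildly different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                    # made-up features
w = 0.1 * rng.normal(size=8)                      # some current model weights
y = rng.integers(0, 2, size=1000).astype(float)   # 1 = genuine match, 0 = innocent person

p = 1.0 / (1.0 + np.exp(-(X @ w)))                # predicted probability of "match"

missed_match = np.mean(y * (1 - p))               # operational loss: matches you miss
false_flag = np.mean((1 - y) * p)                 # human loss: innocent people you flag
CATASTROPHE = 1e9                                 # assumed exchange rate between the two harms

combined = missed_match + CATASTROPHE * false_flag
print(f"operational term: {missed_match:.4f}")
print(f"catastrophe term: {CATASTROPHE * false_flag:.4e}")
# In float32, the default in most deep-learning stacks, the operational term is
# far below one unit in the last place of the catastrophe term, so it vanishes:
print(np.float32(combined) == np.float32(CATASTROPHE * false_flag))  # almost certainly True
```

A model trained against `combined` gets essentially no signal about doing its ordinary job well; at single precision the operational term will typically disappear entirely, and a learning rate sized for one term is absurd for the other. This is the computational face of the incommensurability, before you even reach the cultural one.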
Different types of losses are incommensurable. Yet that's fine: there are qualitatively different types of AI and ways to build and deploy them.
- Systems with nothing at risk except optimality are ideal for aggressive black-box algorithms. You sacrifice baseline performance at the beginning, you gain a lot of new knowledge, and over the longer term you can do things you couldn't before. Here "loss" is another word for "cost of learning."
- Where the very existence of the business can be at risk, you do not attempt to build a single model that both optimizes operations and prevents blowing up the company. Just as it was before computers, the safer solution is to create two systems, one dedicated to optimization, the other dedicated to modeling and preventing existential risks, and to give the latter veto power over the former (a minimal sketch of the arrangement follows this list). And just as it was before computers, the difficulty isn't with technology but with culture and incentive alignment: people who veto profitable actions with a 1% risk of leading to bankruptcy don't get bonuses or promotions, and neither do the people who write AIs to do it.
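A minimal sketch of that arrangement, with hypothetical names (an `Optimizer` that proposes actions, a `RiskModel` with an assumed 1% ruin threshold) standing in for whatever the real systems would be. The point is structural: the veto sits outside the objective being optimized instead of being one more term inside it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    description: str
    expected_profit: float

class Optimizer:
    """Stand-in for whatever aggressive model maximizes operational value."""
    def propose(self, state) -> Action:
        return Action("increase leverage", expected_profit=1_000_000.0)

class RiskModel:
    """Separately built, separately owned estimate of existential risk."""
    RUIN_THRESHOLD = 0.01  # assumed policy: more than a 1% chance of ruin is unacceptable

    def probability_of_ruin(self, state, action: Action) -> float:
        return 0.02  # placeholder estimate, for illustration only

def decide(state, optimizer: Optimizer, risk: RiskModel) -> Optional[Action]:
    action = optimizer.propose(state)
    if risk.probability_of_ruin(state, action) > risk.RUIN_THRESHOLD:
        return None  # hard veto: no expected profit can override it, by construction
    return action

print(decide(state={}, optimizer=Optimizer(), risk=RiskModel()))  # None: vetoed
```

Nothing in it is technically hard, which is exactly the point: the difficulty is whether anyone rewards the people who build and obey the veto.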
What about human risk? The shorthand that "AI" means "large generative models," as useful as it is for marketing, confuses the issue. Nothing about AI in the sense of Artificial Intelligence — an always-evolving set of technologies and approaches that predates Silicon Valley by centuries — necessarily implies opaqueness or unpredictability. Software, complex decision-making software embodying expertise and making difficult decisions, can be made not just faster and more scalable than humans but also more reliable. Not with the same implications in costs, time, types of expertise, and marketing possibilities, but also not with the same risks.
The boring fact is that most of the damage caused by AIs in the real world, as opposed to old sci-fi turned into unconsciously accepted folk tales, can be traced to the use of whatever form of AI is most fashionable at the moment instead of the one that's most appropriate to the use case. (Sometimes, of course, there's currently no AI that can handle a certain problem. If that puts human life or welfare at risk, then don't do it.)
Very few companies operate in environments with only one type of potential loss; some are exposed to all three. Most ethical debates about the safe use of AI are stuck on the assumption that an organization, product, or process must use or be driven by a single AI, and that all AIs are of the same type, which in 2024 often means very large, opaque, hard-to-control generative models or big brute-force classification algorithms. Unbundling a process into different parts according to the types of risk involved (the different scales of loss) and using different types of AI, up to and including "none," reduces the problem to already well-known questions of contextually acceptable risk levels. There's nothing ethically wrong, and much to be gained, in using black-box models in some places, while only the most carefully designed, formally verified, and externally verifiable software should be deployed in others.
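A sketch of what that unbundling can look like at its crudest, with hypothetical decision categories and handlers; the real work is the organizational act of classifying every decision a process makes before automating any of it:

```python
from enum import Enum, auto

class RiskClass(Enum):
    OPERATIONAL = auto()  # worst case: money left on the table
    EXISTENTIAL = auto()  # worst case: the company
    HUMAN = auto()        # worst case: somebody else's life

# Hypothetical mapping from risk class to the kind of automation allowed there.
HANDLERS = {
    RiskClass.OPERATIONAL: "aggressive black-box optimizer",
    RiskClass.EXISTENTIAL: "optimizer plus independent risk model with veto",
    RiskClass.HUMAN: "verified deterministic rules, with human review",
}

# Hypothetical decisions inside what would usually be sold as one "AI".
DECISIONS = {
    "ad_slot_pricing": RiskClass.OPERATIONAL,
    "inventory_credit_line": RiskClass.EXISTENTIAL,
    "shoplifter_match": RiskClass.HUMAN,
}

def route(decision: str) -> str:
    return HANDLERS[DECISIONS[decision]]

print(route("shoplifter_match"))  # -> verified deterministic rules, with human review
```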
You want human-like software on your car's entertainment console, not on your brakes. Both can be "AI" in interesting, profitable, ground-breaking senses. Just not the same.
Before making choices about model architectures or data sets, the first step in any organization's path to AI as a baseline, instead of a bolt-on or self-sabotage, is to understand itself well enough to know which is which, and thence to have the discipline to avoid the possibility of harm everywhere it should and the daring to go all-in everywhere it can. Not to do the former would be ethically unthinkable; not to do the latter, strategically unsound.