Software Engineering as Risk Management

During the Q&A; at the Niklaus Wirth talk last night someone mentioned formal methods from the Hoare school of work. Such as Hoare Logic. In particular they were asking why the work there had fallen out of favor, and wasn’t used more widely. Which got me thinking about the cost of axiomatic proof of computer systems. Which got me thinking about my background in computer security.

Before I had a job within a large company doing computer security, I used to think that security professionals spent most of their time updating access control lists, looking at packet logs, auditing application code, and doing all sorts of technical stuff. I thought that computer security within organizations was all about keeping people out of your data and off your systems. It’s not. It’s about spending as little as possible on issues related to the security of your computers. It’s not all about keeping people off your systems. It’s about figuring out how much it costs the company if people do make it onto your systems, and then figuring out how much it keeps to keep those people off your systems, and then picking whichever of those two costs less. Like insurance, it’s all about risk management.

Of course all kinds of issues get wrapped up in figuring out how much a compromise costs. If it’s a public compromise there’s image damage to take into account, in particular the effect on customer confidence. But at heart it’s a weighing of the expected losses due to intrustion (chance of occurance times cost per incident) against the cost of prevention. If it costs more to prevent than it does to sweep under the rug, you take the chance with the unprotected system and hope for the best. The security guys make jokes about the only secure computer being one that’s kept locked in the basement, disconnected from the network, and switched off. That’s the cost of 100% security, 0 productivity. That’s one of the nasty points of security, it is impossible to reduce the exposure risk to 0% and still get work done with the system. No matter how much money you put in, there’s always a chance of compromise. And the cost grows exponentially as you get closer to the low end of the risk scale. More and more money buys you less and less relative change in security.

Given this environment any business doing work on the Internet would be crippled with security costs if they were trying for absolute security. Nothing would ever get done. Ever. So the solution to costs spiralling out of control is the risk management approach. An implementation plan is based not on doing things absolutely correctly, but based on what can be done that will cost less than the perceived cost of doing nothing at all. I see a direct correlation here with software engineering. The question is not how to build a system which is 100% correct, but how to build a system that delivers more value than it costs to develop. In my mind that’s why the ideas from the formal verification end of computer science aren’t in more general use. The cost of training people with these tools, and then using them in building systems, raises the cost of systems beyond the cost of just not writing them at all.

Most business people understand the concept, but I think it’s probably so ingrained that it rarely get articulated explicitly. And this is probably partially responsible for the tension between business and technical concerns in most projects. It doesn’t help that most software engineering practices don’t acknowlege this end of the engineering tradeoff at all (with the very notable exception of agile practices, something that I’m just realizing as I’m writing this). It’s hard to come up with numbers attached to the exposure risk in security, and that analysis is normally done at a stage when the issue is much better understood than it is during the development stage. In the absence of those numbers, it seems as if there are very few instances where system development rigor makes it all the way out to formal verification and still stays below the cost of dealing with errors. Given the huge cost associated with changing development practices, a very strong case would have to be made before the attempt could be considered.