Thursday, May 08, 2008

More on Application Security Metrics

Eric Bidstrup of Microsoft has a blog entry up titled "How Secure is Secure?" In it he makes a number of points about, essentially, how to measure the security of software and what the appropriate metrics might be.

I'd been asking the Microsoft guys for a while whether they had any decent metrics to break down the difference between:
  • Architectural/Design Defects
  • Implementation Defects
I hadn't gotten good answers up to this point because measuring those internally during the development process is a constantly moving target. If your testing methodology is always changing, then it's hard to say whether you're seeing more or fewer defects of a given type than before, especially as a percentage. That is, if you weren't catching a certain class of issue with the previous version of a static analysis tool but now you are, it's hard to compare the results against previous versions of the software.

Eric says:
Microsoft has been releasing security bulletins since 1999. Based on some informal analysis that members of our organization have done, we believe well over 50% of *all* security bulletins have resulted from implementation vulnerabilities and by some estimates as high as 70-80%. (Some cases are questionable and we debate if they are truly “implementation issues” vs. “design issues” – hence this metric isn’t precise, but still useful). I have also heard similar ratios described in casual discussions with other software developers.
In general I think you're likely to find this trend across the board. Part of the reason is that implementation defects tend to be easier to find and exploit. Exploiting an input validation failure that results in a buffer overflow is a lot easier than mounting a complicated business logic attack, a multi-step attack against a distributed system, etc.
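To make the distinction concrete, here's a minimal, hypothetical sketch in C of what I mean by an implementation defect: the design ("copy the caller's name into a fixed-size record") is perfectly sound, but the implementation forgets the bounds check. The struct and function names are invented for illustration.

/* Hypothetical sketch of a classic implementation defect: the design
 * ("copy the user-supplied name into a record") is fine, but the
 * implementation skips the bounds check. */
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

struct record {
    char name[NAME_LEN];
};

/* Defective: strcpy() writes past 'name' if the input is longer than
 * NAME_LEN - 1 bytes -- an overflow, even though the design is fine. */
void set_name_defective(struct record *r, const char *input)
{
    strcpy(r->name, input);
}

/* Fixed: same design, defect removed by bounding the copy. */
void set_name_fixed(struct record *r, const char *input)
{
    snprintf(r->name, sizeof r->name, "%s", input);
}

int main(void)
{
    struct record r;
    set_name_fixed(&r, "a perfectly reasonable name");
    printf("%s\n", r.name);
    return 0;
}

Finding that kind of bug is largely mechanical (fuzzing, static analysis), which is part of why implementation defects dominate the bulletin counts.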

We haven't answered whether there are more Architectural/Design defects or Implementation defects, but from an exploitability standpoint, it's fairly clear that implementation defects are probably the first issues we want to fix.

At the same time, we do need to balance that against the damage that can be done by an architectural flaw, and just how difficult one can be to fix, especially in deployed software. Take Lanman authentication as an example. Even if implemented without defects, the security design isn't nearly good enough to resist attack. Completely removing Lanman authentication from Windows and getting everyone switched over to stronger alternatives has taken an extremely long time in most businesses because of legacy deployments, etc. So, as much as implementation defects are the ones generally exploited and that need patching, architectural defects can in some cases cause a lot more damage and be harder to address/remediate once discovered/exploited.
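To put a rough number on how bad the Lanman design is even with a flawless implementation, here's a back-of-the-envelope sketch (my figures, not Microsoft's) comparing the attacker's work against an LM hash, which uppercases the password and splits it into two independently crackable 7-character halves, with the work against a single hash of the same 14-character password. The character-set sizes are approximations.

/* Back-of-the-envelope estimate of Lanman's design-level weakness.
 * Character-set sizes are rough assumptions for illustration. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double lm_charset   = 69.0;  /* uppercase + digits + symbols (approx.) */
    const double full_charset = 95.0;  /* printable ASCII (approx.) */

    /* LM: attacker cracks two independent 7-character halves. */
    double lm_bits = log2(2.0 * pow(lm_charset, 7.0));

    /* A single hash over the same 14-character, case-sensitive password. */
    double full_bits = log2(pow(full_charset, 14.0));

    printf("LM effective strength   : ~%.0f bits\n", lm_bits);
    printf("14-char, single hash    : ~%.0f bits\n", full_bits);
    return 0;
}

Roughly 44 bits versus roughly 92 bits of work, and no bug fix closes that gap; only replacing the design does.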

Another defect to throw into this category would be something like WEP. Standard WEP implementations aren't defect-ridden. They don't suffer from buffer overflows, race conditions, etc. They suffer from design defects that can't be corrected without a fundamental rewrite of the protocol. The number of attacks resulting from WEP probably isn't known, but even throwing out high-profile cases such as TJ Maxx and Home Depot, I'm guessing the damage done is substantial.
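One concrete example of WEP's design-level weakness, independent of any implementation: its 24-bit IV space. The sketch below uses the standard birthday approximation to estimate how quickly IVs start repeating on a busy network, which is when keystream-reuse attacks become possible; the frame rate is an assumption I've picked for illustration.

/* Rough sketch of one well-known WEP design flaw: the 24-bit IV space.
 * Birthday approximation for the probability of an IV repeat. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double iv_space       = pow(2.0, 24.0); /* 24-bit IV */
    const double frames_per_sec = 500.0;          /* assumed busy 802.11b network */

    /* P(collision) ~ 1 - exp(-n^2 / (2 * iv_space)), solved for n. */
    for (double p = 0.1; p < 1.0; p += 0.4) {
        double n = sqrt(-2.0 * iv_space * log(1.0 - p));
        printf("P(IV reuse) = %.0f%% after ~%.0f frames (~%.0f seconds)\n",
               p * 100.0, n, n / frames_per_sec);
    }
    return 0;
}

On those assumptions an IV repeat is a coin flip after only a few thousand frames, i.e. seconds of traffic, and no amount of careful coding changes that.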

So far, then, things aren't looking good for using implementation defects as a measuring stick of how secure a piece of software is, especially for widely deployed products with long lifetimes and complicated architectures.

Though I suppose I can come up with counter-examples as well. SQL Slammer, after all, was a worm that exploited a buffer overflow in MS-SQL Server via a service that was open by default to the world. It was one of the biggest worms ever (if not the biggest; I stopped paying attention years ago), and it exploited an implementation defect, though one that was exploitable because it was part of the unauthenticated attack surface of the application - a design defect.

All this really proves is that determining which of these types of defects to measure, prioritize, and fix is a tricky business, and, as always, your mileage may vary.

As Eric clearly points out, the threat landscape isn't static either. So what you think is a priority today might change tomorrow. And it's different for different types of software: the appropriate methodology for assessing and prioritizing defects for a desktop application is substantially different from that for a centrally hosted web application, because of differences in exploitability, time-to-fix, and so on.

More on that in a post to follow.

1 comment:

Gunnar said...

Related point

http://www.mail-archive.com/sc-l@securecoding.org/msg00881.html

bad screw vs. bad table.