Rants of a deranged squirrel.

An Open Letter on the Interpretation of “Vulnerability Statistics”

[This was originally published on the OSVDB blog.]

Steve Christey (CVE Editor) wrote an open letter to several mailing lists regarding the nature of vulnerability statistics. What he said is spot on, and most of what I would have pointed out had my previous rant been more broad, and not a direct attack on a specific group. I am posting his entire letter here, because it needs to be said, read, understood, and drilled into the heads of so many people. I am reformatting this for the blog, you can read an original copy via a mail list.


Open Letter on the Interpretation of “Vulnerability Statistics”

Author: Steve Christey, CVE Editor
Date: January 4, 2006

As the new year begins, there will be many temptations to generate, comment, or report on vulnerability statistics based on totals from 2005. The original reports will likely come from publicly available Refined Vulnerability Information (RVI) sources – that is, vulnerability databases (including CVE/NVD), notification services, and periodic summary producers.

RVI sources collect unstructured vulnerability information from Raw Sources. Then, they refine, correlate, and redistribute the information to others. Raw sources include mailing lists like Bugtraq, Vulnwatch, and Full-Disclosure, web sites like PacketStorm and Securiteam, blogs, conferences, newsgroups, direct emails, etc.

In my opinion, RVI sources are still a year or two away from being able to produce reliable, repeatable, and COMPARABLE statistics. In general, consumers should treat current statistics as suggestive, not conclusive.

Vulnerability statistics are difficult to interpret due to several factors:

EDITORIAL POLICY VARIATIONS

This is just a sample of variations in editorial policy. There are legitimate reasons for each variation, usually due to audience needs or availability of analytical resources.

COMPLETENESS (what is included):

  1. SEVERITY. Some RVIs do not include very low-risk items such as a bug that causes path disclosure in an error message in certain non-operational configurations. Secunia and SecurityFocus do not do this, although they might note this when other issues are identified. Others include low-risk issues, such as CVE, ISS X-Force, US-CERT Security Bulletins, and OSVDB.
  2. VERACITY. Some RVIs will only publish vulnerabilities when they are confident that the original, raw report is legitimate – or if they’re verified it themselves. Others will publish reports when they are first detected from the raw sources. Still others will only publish reports when they are included in other RVIs, which makes them subject to the editorial policies of those RVIs unless care is taken. For example, US-CERT’s Vulnerability Notes have a high veracity requirement before they are published; OSVDB and CVE have a lower requirement for veracity, although they have correction mechanisms in place if veracity is questioned, and CVE has a two-stage approach (candidates and entries).
  3. PRODUCT SPACE. Some RVIs might omit certain products that have very limited distribution, are in the beta development stage, or are not applicable to the intended audience. For example, version 0.0.1 of a low-distribution package might be omitted, or if the RVI is intended for a business audience, video game vulnerabilities might be excluded. On the other hand, some “beta” products have extremely wide distribution.
  4. OTHER VARIATIONS. Other variations exist but have not been studied or categorized at this time. One example, though, is historical completeness. Most RVIs do not cover vulnerabilities before the RVI was first launched, whereas others – such as CVE and OSVDB – can include issues that are older than the RVI itself. As another example: a few years ago, Neohapsis made an editorial decision to omit most PHP application vulnerabilities from their summaries, if they were obscure products, or if the
    vulnerability was not exploitable in a typical operational configuration.   ABSTRACTION (how vulnerabilities are “counted”):
  5. VULNERABILITY TYPE. Some RVIs distinguish between types of vulnerabilities (e.g. buffer overflow, format string, symlink, XSS, SQL injection). CVE, OSVDB, ISS X-Force, and US-CERT Vulnerability Notes perform this distinction; Secunia, FrSIRT, and US-CERT Cyber Security Bulletins do not. Bugtraq IDs vary. As vulnerability classification becomes more detailed, there is more room for variation (e.g. integer overflows and off-by-ones might be separated from “classic” overflows).
  6. REPLICATION. Some RVIs will produce multiple records for the same core vulnerability, even based on the RVI’s own definition. Usually this is done when the same vulnerability affects multiple vendors, or if important information is released at a later date. Secunia and US-CERT Security Bulletins use replication; so might vendor advisories (for each supported distribution). OSVDB, Bugtraq ID, CVE, US-CERT Vulnerability Notes, and ISS X-Force do not – or, they use different replication than others. Replication’s impact on statistics is not well understood.
  7. OTHER VARIATIONS. Other abstraction variations exist but have not been studied or categorized at this time. As one example, if an SQL injection vulnerability affects multiple executables in the same product, OSVDB will create one record for each affected program, whereas CVE will combine them. TIMELINESS:
  8. RVIs differ in how quickly they must release vulnerability information. While this used to vary significantly in the past, these days most public RVIs have very short timelines, from the hour of release to within a few days. Vulnerability information can be volatile in the early stages, so an RVI’s requirements for timeliness directly affects its veracity and completeness. REALITY:
  9. All RVIs deal with limited resources or time, which significantly affects completeness, especially with respect to veracity, or timeliness (which is strongly associated with the ability to achieve completeness). Abstraction might also be affected, although usually to a lesser degree, except in the case of large-scale disclosures.

Conclusion

In my opinion:

You should not interpret any RVI’s statistics without considering its editorial policy. For example, the US-CERT Cyber Security Bulletin Summary for 2005 uses statistics that include replication. (As a side note, a causal glance at the bulletin’s contents makes it clear that it cannot be used to compare Windows to Linux as operating systems.)

In addition, you should not compare statistics from different RVIs until (a) the RVIs are clear about their editorial policy and (b) the differences in editorial policy can be normalized. Example: based on my PRELIMINARY investigations of a few hours’ work, OSVDB would have about 50% more records than CVE, even though it has the same underlying number of vulnerabilities and the same completeness policy for recent issues.

Third, for the sake of more knowledgeable analysis, RVIs should consider developing and publishing their own editorial policies.
(Note that based on CVE’s experience, this can be difficult to do.) Consumers should be aware that some RVIs might not be open about their raw sources, veracity analysis, and/or completeness.

Finally: while RVIs are not yet ready to provide usable, conclusive statistics, there is a solid chance that they will be able to do so in the near future. Then, the only problem will be whether the statistics are properly interpreted. But that is beyond the scope of this letter.

Steve Christey
CVE Editor

P.S. This post was written for the purpose of timely technical exchange. Members of the press are politely requested to consult me before directly attributing quotes from this article, especially with respect to stated opinion.

Exit mobile version