[This was originally published on the OSVDB blog.]
Earlier this week, Michael Roytman of Kenna Security wrote a blog post with more details about the vulnerability section of the Verizon DBIR report, partially in response to my last blog here questioning how some of the data was generated and the conclusions put forth. The one real criticism I will note is that Roytman’s post does not acknowledge or warn that the list of CVE IDs included in the DBIR had typos, causing the wrong IDs to be included. In the world of vulnerability databases, that unique ID is designed specifically to avoid such confusion, and carrying the wrong IDs undermines the integrity of the data being presented.
In addition to my comments, Roytman had a long call with Adrian Sanabria which led to generating a new set of data with a different scope. From the Kenna blog:
We had an excellent offline discussion in which he dove deeply into the assumptions of my work, asked thoughtful, deep questions in private, and together, we came up with a better metric for generating a top 10 vulnerabilities list. To address these issues, I scaled the total successful exploitation count for every vulnerability in 2015 by the number of observed occurrences of that vulnerability in Kenna’s aggregate dataset. Sifting through 265 million vulnerabilities gives us a top 10 list perhaps more in line with what was expected – but equally unexpected! The takeaway here is that datasets like the one explored in the DBIR might be noisy, might have false positives and the like, but carefully applied to your enterprise the additional context successful exploitation data lends to vulnerability management is priceless.
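To make sure we are talking about the same metric, here is one plausible reading of that scaling step. The exact formula is not spelled out in the Kenna post, so the sketch below and its inputs are my assumptions for illustration, not Kenna’s implementation.

```python
from collections import Counter

def scaled_top10(exploit_events, open_vulns):
    """Hypothetical reading of the metric: divide each CVE's count of
    successful exploitation events by the number of times that CVE was
    observed open in the aggregate dataset, then rank the results.

    exploit_events: iterable of CVE IDs, one entry per successful exploitation
    open_vulns:     iterable of CVE IDs, one entry per observed open instance
    """
    exploit_counts = Counter(exploit_events)
    occurrence_counts = Counter(open_vulns)
    scores = {
        cve: count / occurrence_counts[cve]
        for cve, count in exploit_counts.items()
        if occurrence_counts.get(cve)  # ignore CVEs never seen open
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:10]
```

Whether the scaling is a straight division like this or something weighted is exactly the kind of detail that determines whether noisy input data sinks or survives the analysis.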
I won’t go into much detail in this blog, but I will say that I disagree with the notion that severely flawed data can produce takeaways that are “priceless”. Organizations acting on these top 10 lists may spend time and resources chasing vulnerabilities that do not impact them, or that pose very little risk compared to other threats they face. That action most certainly has a price, one that can be quantified to some degree via the cost of the employee time required. With that in mind, let’s look at the methodology, which is spelled out in more detail than in the DBIR, before we consider the new top 10 list Roytman generated. First, his notes on the data examined:
The first is a convenience sample that includes 2,442,792 assets (defined as: workstations, servers, databases, ips, mobile devices, etc) and 264,912,235 vulnerabilities associated to those assets. The vulnerabilities are generated by 8 different scanners, they are: Beyond Security, Tripwire, McAfee VM, Qualys, Tenable, BeyondTrust, Rapid7, and OpenVAS. This dataset is used in determining remediation rates and the normalized open rate of vulnerabilities.
I am curious if it is normal practice to consider an IP address an asset, when it is the system behind that IP that is generally considered the asset. Moving past that, one point that sticks in my mind is the set of tools generating the data. From the list above, consider that Beyond Security claims to have “what is arguably the world’s most complete database of security vulnerabilities.” Click around their site and you see that “the AVDS database includes over 10,000 known vulnerabilities and the updates include discoveries by our own team and those discovered by corporate and private security teams around the world.” That is less than 25% of what CVE has, and less than 10% of what VulnDB has. They even show that they only cover 200 CVE IDs for 2016, as compared to the 1,474 open 2016 CVEs.
The list doesn’t specify which Tripwire product or which Qualys product was used, and it includes McAfee Vulnerability Manager, which was declared End of Life in October of last year. So it is a start as far as explaining which tools generate the data, but it still leaves a lot of guesswork.
Roytman describes the second data set used as:
The second is a convenience sample that includes 3,615,706,022 successful exploitation events which all take place in 2015 which come from Dell Secureworks’ Counter-Threat Unit and Alienvault’s Open Threat Exchange.
The third qualification, describing the methodology, is perhaps the most important, and it was lacking in the DBIR:
Please note the methodology of data collection: Successful Exploitation is defined as one successful technical exploitation of a vulnerability on one machine at a particular timestamp. The event is defined as: 1. An asset has a known CVE open. 2. An attack come in that matches the signature for that CVE on that asset and 3. One or more IOCs are detected/correlated post attack. It is not necessarily a loss of a data, or even root on a machine. It is just the successful use of whatever condition is outlined in a CVE.
I have reached out to Michael and requested a sampling of data for two of the CVE IDs on his new list, and he has agreed to provide it. In the meantime, I had a discussion with several people more familiar with IDS than I am and asked how they would detect attacks against CVE-2013-0229 and CVE-2001-0540, as an example. Detecting a specific type of packet meant to trigger these is one thing, but what is the threshold for the IDS to call it an attack when exploitation requires “a large number of malformed Remote Desktop Protocol (RDP) requests”? Is there a specific number of packets before it flags an attack in progress? If that number is too low, the signature may be prone to a high rate of false positives; if too high, it may miss a successful exploitation of the issue. That leads into the second part of the methodology, correlating the attack with “one or more IOCs [that] are detected/correlated post attack”. In this case the IOC would presumably be the targeted service no longer responding, which could be detected in a number of ways (e.g. probing the port, seeing specific errors in the logs, noticing a given process not running). Once the service is down, is an IDS that isn’t aware of the host’s state still cataloging a single attack, or is it generating alerts every X minutes while it sees the attack continuing? Hopefully the data Michael sends will help me better understand how that correlation is being made, as it represents an enormous source of potential bias in the resulting analysis.
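To make the concern concrete, here is a hypothetical sketch of the kind of threshold-and-correlation logic in question for a DoS like CVE-2001-0540. None of it reflects how any of the contributing sensors actually work; the threshold, the re-alert window, and the port probe are all invented for illustration, and they are exactly the knobs that determine how many ‘successful exploitation’ events one sustained attack turns into.

```python
import socket
import time
from collections import defaultdict

THRESHOLD = 500        # malformed RDP requests before flagging an attack (arbitrary)
REALERT_SECONDS = 300  # how long to suppress duplicate alerts for the same pair (arbitrary)

malformed_counts = defaultdict(int)
last_alert = {}

def rdp_service_up(host, port=3389, timeout=2):
    """Post-attack IOC check: is the targeted service still answering?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def on_malformed_rdp_request(src, dst):
    """Called once per malformed request seen between a source and a target."""
    malformed_counts[(src, dst)] += 1
    if malformed_counts[(src, dst)] < THRESHOLD:
        return None  # below threshold: no attack declared yet
    now = time.time()
    if now - last_alert.get((src, dst), 0) < REALERT_SECONDS:
        return None  # same ongoing attack, or a new event? the choice skews the count
    last_alert[(src, dst)] = now
    # Correlate an IOC after the attack: the targeted service no longer responding.
    if not rdp_service_up(dst):
        return ("successful_exploitation", "CVE-2001-0540", src, dst)
    return ("attack_detected", "CVE-2001-0540", src, dst)
```

Change THRESHOLD or REALERT_SECONDS and the same traffic produces a very different event count, which is the kind of detail I hope the sample data clarifies.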
Moving on to Roytman’s new list using the above data and methodology, here is the top 10 list he sees in the data:
- 2015-03-05 – 2015-1637 – Microsoft Windows Secure Channel (Schannel) RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2015-01-06 – 2015-0204 – OpenSSL RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2014-04-07 – 2014-0160 – OpenSSL TLS Heartbeat Extension Packets Handling Out-of-bounds Read Remote Memory Disclosure (Heartbleed)
- 2012-03-13 – 2012-0152 – Microsoft Windows Remote Desktop Protocol Terminal Server RDP Packet Parsing Remote DoS
- 2009-05-16 – 2013-0229 – MiniUPnPd SSDP Handler minissdp.c ProcessSSDPRequest Function Malformed Input Handling Remote DoS
- 2002-02-12 – 2002-0012 – Multiple Vendor Malformed SNMP Trap Handling DoS
- 2002-02-12 – 2002-0013 – Multiple Vendor Malformed SNMP Message-Handling Remote DoS
- 2001-12-20 – 2001-0876 – Microsoft Windows Universal Plug and Play NOTIFY Directive URL Handling Remote Overflow
- 2001-12-20 – 2001-0877 – Microsoft Windows Universal Plug and Play NOTIFY Request Remote DoS
- 2001-07-25 – 2001-0540 – Microsoft Windows Terminal Server RDP Request Handling Memory Exhaustion Remote DoS
Roytman prefaces that list with a comment that the “top 10 list perhaps more in line with what was expected – but equally unexpected!” Indeed, that is certainly true. Expected? A high-profile vulnerability disclosed in 2014 (Heartbleed) that was widely exploited then, and in subsequent years; something sorely missing from the DBIR list even after the corrected CVE IDs were factored in. This is the type of vulnerability almost everyone expected to top the list, given how easy it was to exploit and how heavily it was known to have been used. This speaks to the original point of, and reason for, the first blog: all the data ‘science’ in the world producing highly questionable results should not be taken as gospel. Even if the methodology were sound, that does not mean the data being used was.
Unfortunately, Roytman’s revised list deviates quickly into the “equally unexpected”. The first DBIR list had a single denial of service (DoS) vulnerability on it, which stood out as odd. The revised list bumped that number to two, which was a bit stranger. The most recent list expands that to six DoS issues, which is highly questionable on more than one front. First, you can question the data set and methodology leading to this conclusion, but let’s say that passes muster solely for the sake of argument. Second, you can question why so many denial of service issues are on a top 10 most-exploited vulnerabilities list, framed in a context that uses the word ‘compromise’ and riding on the back of a report centered around data breaches. These attacks do not give attackers privileges or data. They are a nuisance most of the time, and a component of serious DoS campaigns at other times. It makes one question why DoS issues weren’t dropped from the results completely and disclaimed as such, or why two lists weren’t generated: one based on the raw data, DoS and all, and a second focused on vulnerabilities that may allow privileges to be gained and a real compromise to happen.
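Producing that second list is not hard, which makes its absence more puzzling. As a rough sketch, assuming you approximate ‘DoS-only’ as a CVSSv2 impact of availability only (my assumption, not how Verizon or Kenna classify anything), the split is a few lines:

```python
def split_out_dos(top_list, cvss_vectors):
    """top_list: ordered CVE IDs; cvss_vectors: maps CVE ID -> CVSSv2 vector string."""
    def dos_only(cve):
        vec = cvss_vectors.get(cve, "")
        # Availability impact only: no confidentiality or integrity impact.
        return "C:N" in vec and "I:N" in vec and ("A:P" in vec or "A:C" in vec)

    raw_list = list(top_list)                                    # DoS and everything
    compromise_list = [c for c in top_list if not dos_only(c)]   # what can lead to a real compromise
    return raw_list, compromise_list
```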
I cannot stress this enough. Using a term like ‘Indicator of Compromise’ (IOC) in the context of DoS attacks is disingenuous and misleading. Going back to Roytman’s introduction to this section, where he makes a comment about seeing the trees (referring to the classic metaphor), I find it ironic, as that sums up the purpose of my blog. The DBIR was written as if its authors could only see the trees (data points) and not the forest (the bigger picture), which is what many people took issue with.
One point that I overlooked on the original list, and that still appears on the new list, is the presence of FREAK (twice, even). Fortunately for me, Thomas Ptacek does a great job explaining why FREAK is likely on the list but absolutely should not be. Using Roytman’s blog and data, he calculates that attackers would have spent $332,183,325 on Amazon EC2 to exploit FREAK. He continues by citing one of the researchers who discovered FREAK, who explains one way a high number of false positives can be generated for that particular vulnerability. He goes on to drive the point home, quoting the researcher and noting that FREAK has likely not been exploited in the wild by any average attacker.
Roytman essentially dismisses all of this in his blog post while saying that I am correct, that “IDS alerts generate a ton of false positives, vulnerability scanners often don’t revisit signatures, CVE is not a complete list of vulnerability definitions. But those are just the trees, and we’ll get to them later.” Unfortunately, he doesn’t get to them later in a way that provides meaningful insight into the questions about the data and conclusions. Dan Guido wrote an excellent summary of why the DBIR vulnerability section has issues, one that factors in Roytman’s latest blog and breaks it all down in a manner that highlights the flaws. Even the revised list is still missing the US-CERT top 30 previously cited, the Microsoft data, and the recently disclosed ‘top PoC exploits distributed on social media’. At some point, one would logically conclude that these lists should have more overlap. One thing I would love to see from Verizon and Kenna is a detailed explanation of their methodology as it relates to detecting client-side exploits, which appear to be the de facto standard for infecting tens, perhaps hundreds of thousands, of hosts every year to create botnets.
I want to look at this from one more perspective, because I think it beautifully highlights how vulnerability analysis is a moving target, in this case for all the wrong reasons. While most vulnerability aggregators and analysts are constantly adapting to new variations of vulnerabilities, new sources of vulnerability information, and new players with wildly different styles of disclosure, those who come along after the data is generated and do the analysis frequently seem, in my experience, to lose perspective. I believe this is such a case, and it is best illustrated by what the DBIR top 10 looks like across three revisions in less than two weeks. Yes, I know Roytman’s list isn’t officially the DBIR list, but he generated the initial data and then opted to perform a different form of analysis, putting it forth as more representative because it was applied to a more limited dataset that he presumably trusts more (i.e. the Kenna aggregate dataset). One has to wonder if this was brought up to Verizon as a better way to approach the list, and if so, why it was rejected.
The following table shows the evolution of the CVEs that appear in the top 10, with strikeout denoting the typos between the original DBIR and Gabe’s clarification (which is reflected in current downloads), underlining marking denial of service issues, and bold marking the new CVE IDs that appear with Roytman’s reworking.

| DBIR | DBIR Revised | Roytman Blog |
| --- | --- | --- |
| 2015-1637 | 2015-1637 | 2015-1637 |
| 2015-0204 | 2015-0204 | 2015-0204 |
| ~~2012-1054~~ | 2002-1054 | **2014-0160** |
| ~~2011-0877~~ | <u>2001-0877</u> | **<u>2012-0152</u>** |
| 2003-0818 | 2003-0818 | **<u>2013-0229</u>** |
| 2002-0126 | 2002-0126 | **<u>2002-0012</u>** |
| 2002-0953 | 2002-0953 | **<u>2002-0013</u>** |
| 2001-0876 | 2001-0876 | 2001-0876 |
| 2001-0680 | 2001-0680 | <u>2001-0877</u> |
| <u>1999-1058</u> | <u>1999-1058</u> | **<u>2001-0540</u>** |
In my mail to Roytman asking for a sample data set, I suggested that it would be interesting to see him generate a list using his methodology but with the DoS issues removed (six of his ten), so that the top list only included exploits that could achieve remote privileges of some sort. He replied to me with:
… and again, awesome idea. One of those all-too-simple in retrospect, damnit why didn’t I think of it earlier things.
I found this interesting, thinking back to his use of the forest-and-trees metaphor.