GVD Discussion – Round Two

jericho

2 years ago

Tom Alrich published a blog titled “The Global Vulnerability Database won’t be a “database” at all” on November 10, 2023. In the blog Tom lays out some ideas for how this “database” would operate and the advantages he sees. I didn’t see this blog until early May and posted my “Thoughts on Tom Alrich’s “Global Vulnerability Database””. Tom and I had a little back and forth on LinkedIn as he was pressed for time, but said he would reply to my comments when possible. Shortly after, Tom posted “Is it time to seriously discuss the Global Vulnerability Database?” and then a few weeks later, Tom posted a new blog titled “Clarifying the Global Vulnerability Database”. I really appreciate the time and thought Tom is putting into this idea and putting it out there for feedback. Ball is in my court I guess, since I have to give thought to both!

I will start with the first, “Is it time to seriously discuss the Global Vulnerability Database?”.

“However, it seems the cavalry lost its way, because – despite Tanya’s promise at VulnCon in late March that the Consortium would be announced in the Federal Register imminently – nothing more has appeared on that front (I think the cavalry got enticed by Las Vegas when they galloped through it. They haven’t been heard from since).”

Worse, Tanya told attendees at VulnCon that it would be a year before any of it really got moving. On the back of the NVD “slowdown”, which was really more of a shutdown, this was definitely no longer a cavalry. Because a cavalry typically shows up in time.

“The database should follow the naming conventions we suggested in our 2022 white paper, or something close to them. Of course, there are still a lot of details to be filled out regarding those suggestions.”

For ease of reference, that white paper is titled “A Proposal to Operationalize Component Identification for Vulnerability Management” and can be found here. It starts by summarizing the problem: “One of the biggest problems in software security today, and a huge inhibitor of the automation of production and use of software bills of materials (SBOMs), is the “naming problem” – that is, the fact that a single software product is known by different names in different ecosystems, and there is no “canonical” name that will work everywhere. This problem is most acute in the National Vulnerability Database (NVD), the most widely used vulnerability database in the world.”

I quote all of this because it is a significant problem that will be revisited below.

“However, given the fact that so many organizations worldwide have been using the NVD for free for many years, and had an overall good experience, without once being asked to contribute a dime, I was – and remain – sure that the support will be there when we ask for it.”

I would disagree here. Organizations were not asked to contribute a dime directly, but NIST NVD is taxpayer funded. We are already paying for it, like it or not.

“Thus, there can never be a one-to-one “mapping” of CPEs to purls (or vice versa), meaning harmonization of OSS identifiers in a single database would be impossible.”

I mention this later replying to the next blog, and quote it here because it is an extremely important point. At this point we could conceivably stop here in this whole discussion. If this problem can’t be solved, can the GVD truly solve the problem Tom wants it to?

“Today, a single database can very easily have multiple identifiers for single products, without breaking a sweat.”

Say it loud for the MITRE kids in the back! =)

And now for select quotes and thoughts on Tom’s second post:

“When I read Brian’s post, I realized that a number of his objections wouldn’t have been valid if I’d rewritten the post so that it focused on the single idea of a switching hub, not an actual database – although I presume I won’t be thrown in jail if users think of it as a single database.”

Not at all, but it is good to be as precise as possible here. You unfortunately are coming after MITRE and their insistence that CVE is not a database, rather, a dictionary. I have long disagreed with that given the context and purpose of CVE. Over the years, their quality assurance on entries has declined, so it is even easier to argue that CVE is not a dictionary. Or if it is, it is one that doesn’t even define the vulnerability, given such poor descriptions and links that are not accessible. So for the GVD, whether it is a ‘switching hub’ or ‘database’ ultimately doesn’t matter. What matters is that it delivers thorough vulnerability intelligence in a timely manner that is easily consumed. So for the sake of this discussion, we need to look at it as a specialized search engine or an actual database, depending on the model and approach used. Either way, that is a pedantic argument and not a big concern at all.

“The point is that it should be possible in 2024 to field diverse queries – regarding different types of products (open source software, proprietary or “closed source” software, and intelligent devices), different types of vulnerabilities (CVE, OSV, GitHub Security Advisories, etc.), and different identifiers (CPE and purl) – and have an intelligent engine that decides, for each query, which is the best database or combination of databases to resolve the query.”

I fully agree here. Unfortunately, because the CVE ecosystem has not evolved at all, it has fallen short over the years and become less usable. While MITRE is minting CNAs at a record pace lately, that isn’t improving the quality of the intelligence, as CNAs are not held to the rules they originally agreed upon. With the recent NVD issues and subsequent fallout with CISA, who is doing “their own” enrichment (that is outsourced actually), the community has lost faith in what is easily the “no child left behind” of vulnerability databases. Over time, this has led to other databases and initiatives popping up, such as the ones Tom quotes. MITRE has brought us here, and the community needs to remember that.
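To make the “intelligent engine” idea a bit more concrete, the very first decision it has to make is what kind of identifier it has been handed. Here is a minimal sketch of that step only. The backend names in the table are placeholders I picked for illustration, not a statement of what any real service indexes, and the prefix checks are a simplification.

```python
from typing import Dict, List

# Hypothetical routing table: which sources to ask for each identifier kind.
# The backend names are illustrative assumptions, not a real configuration.
BACKENDS: Dict[str, List[str]] = {
    "purl": ["osv", "oss-index", "github-advisories"],
    "cpe": ["nvd"],
    "cve": ["cve.org", "nvd"],
    "ghsa": ["github-advisories", "osv"],
}


def classify(identifier: str) -> str:
    """Guess the identifier kind from its prefix (a deliberate simplification)."""
    lowered = identifier.strip().lower()
    if lowered.startswith("pkg:"):
        return "purl"
    if lowered.startswith("cpe:2.3:") or lowered.startswith("cpe:/"):
        return "cpe"
    if lowered.startswith("cve-"):
        return "cve"
    if lowered.startswith("ghsa-"):
        return "ghsa"
    return "unknown"


def route(identifier: str) -> List[str]:
    """Return the backends to query; an unrecognized identifier routes nowhere."""
    return BACKENDS.get(classify(identifier), [])
```

Note that a bare product name such as "Microsoft Word" classifies as unknown and routes to nothing, which is the naming problem showing up at the front door.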

“There would need to be a lot of intelligence behind both of these steps, since they won’t be easy at all (and they will require quite a lot of prior knowledge, such as whether a report in OSS Index that a particular software product – identified with a purl – is affected by a CVE has the same status as a report in CVE.org that the same purl is affected by the same CVE, since they will have been derived very differently).”

Saying this will need a lot of intelligence is accurate, and probably an understatement. One thing Tom has mentioned before is the problem of ways to identify software. While a Package URL (PURL) is one solution, it isn’t the only one used by vulnerability databases. So not only do you need to have an intelligent mapping from PURL to PURL, you also need it from CPE to PURL, and possibly other identifiers. It’s easy to have multiple valid PURLs all for the same piece of software. In theory there should only be one CPE, but since NIST has held CPE hostage for so long, many solutions have to use their own for software that does not appear in NVD. All of this will be a huge undertaking, something that will require constant updating, and become a new headache if not implemented very well from the start.
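As a rough illustration of what that mapping layer has to hold, here is a sketch of an alias index that files several identifiers (PURLs, CPEs, anything else) under one internal product record. The class, the product ID scheme, and the example identifiers are all hypothetical. Lowercasing the keys is a shortcut, since some identifier types are case-sensitive.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class AliasIndex:
    """Maps many external identifiers onto one internal product ID."""

    def __init__(self) -> None:
        self._product_of: Dict[str, str] = {}
        self._aliases: Dict[str, Set[str]] = defaultdict(set)

    def add(self, product_id: str, identifier: str) -> None:
        # Lowercasing is a simplification; real identifiers may be case-sensitive.
        key = identifier.strip().lower()
        self._product_of[key] = product_id
        self._aliases[product_id].add(key)

    def resolve(self, identifier: str) -> Optional[str]:
        """Return the internal product ID, or None if nobody has mapped it yet."""
        return self._product_of.get(identifier.strip().lower())

    def aliases(self, identifier: str) -> Set[str]:
        """Return every known identifier for the same product."""
        product_id = self.resolve(identifier)
        return set(self._aliases[product_id]) if product_id else set()


# Hypothetical product with two PURLs and one CPE.
index = AliasIndex()
index.add("product-0001", "pkg:npm/examplelib")
index.add("product-0001", "pkg:github/example-org/examplelib")
index.add("product-0001", "cpe:2.3:a:example:examplelib:*:*:*:*:*:*:*:*")
```

The data structure is the easy part. Every `add` call is a human or a heuristic deciding that two identifiers mean the same thing, and that is where the constant updating comes in.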

“The NVD may have some sort of criteria they follow (e.g., “Always put a comma before ‘Inc’ and a period after it.”), but they’re clearly just rough rules of thumb if they exist at all, since CPE names vary for seemingly random reasons.”

Somewhere there are / were CPE specifications, likely before NVD took control of it. Early in the VulnDB days, we used them so we could generate our own CPE for products that didn’t appear in NVD. The fact that a seasoned vulnerability practitioner isn’t sure standards exist speaks volumes to how poorly NVD has managed CPE.

“You might say something like, “What does Microsoft call the product on their web site?” And I ask, which of the Microsoft web sites are you referring to? Is Microsoft going to enforce standard naming across all web sites worldwide? And what about blog posts on the Microsoft sites? Will they follow some sort of internal Microsoft standard? Etc., etc.”

It’s worse than that, as some vendors will refer to their own name differently on the same home page. Variations in caps, spacing, the use of e.g. Inc. or LLC, and even at times not updating one instance of a prior name.
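For a sense of how far simple cleanup gets you, here is a small sketch that normalizes the cosmetic variations (caps, spacing, a trailing Inc. or LLC). The suffix list and the example company name are my own assumptions for illustration, and nothing here handles a vendor that has actually been renamed.

```python
import re

# Illustrative, incomplete list of legal suffixes to strip.
_SUFFIXES = r"(?:inc|llc|ltd|corp|corporation|co|gmbh)"
_SUFFIX_RE = re.compile(rf"[\s,]+{_SUFFIXES}\.?$", re.IGNORECASE)


def normalize_vendor(name: str) -> str:
    """Collapse whitespace, drop one trailing legal suffix, and casefold."""
    cleaned = " ".join(name.split())       # collapse runs of whitespace
    cleaned = _SUFFIX_RE.sub("", cleaned)  # remove e.g. ", Inc." or " LLC"
    return cleaned.rstrip(" ,.").casefold()
```

With this, "Acme Widgets, Inc." and "ACME  Widgets LLC" both come out as "acme widgets", but "AcmeWidgets" does not, and a prior company name never will. That gap is what the expensive auxiliary database would have to cover.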

“The company will hopefully rigorously enforce use of their chosen names, and the CNAs will be severely disciplined if they use any other in naming their products in a CVE report…And while we’re at it, the lion will lie down with the lamb and I will study war no more and people will stop having loud cell phone conversations on trains; that is, all the world’s problems will be solved…”

That’s right! The most laughable part of this is thinking that CNAs are held accountable in any way. Unfortunately they are not, and MITRE has shown no interest in doing so.

“As long as you know the package manager (or source repository) that you downloaded an open source component from, as well as the name and version string in that package manager, you can create a purl that will always let you locate the exact component in a vulnerability database. This is why purl has literally won the battle to be the number one software identifier in vulnerability databases worldwide, and literally the only alternative to CPE.”

Unless… you end up having half a dozen PURLs for the same package, because it is available on a vendor’s page, GitHub, GitLab, Gitee, and every package manager out there. Now, let’s jump back just one paragraph:

“By the way, who will pay for that inordinately expensive database of product and company names? It will cost a huge amount of money, both to put together and to maintain – much more than the cost of the NVD and CVE.org databases combined. Face it: an identifier that requires an expensive auxiliary database to make it work is a dead end.”

Who will maintain this epic list of PURLs? As of this blog, there are only 379 CNAs with tens of thousands of software companies out there. Not to mention the over one hundred million repositories on GitHub alone. While a PURL may be an open standard where CPE is not, it forces the community to set a PURL for every instance of the location of that software. That sounds like the big database you don’t think is viable?
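To show how quickly the PURLs multiply, here is a sketch that builds one for each place a single hypothetical package might live. It follows the basic `pkg:type/namespace/name@version` layout and skips the per-type normalization rules in the purl specification. The package, the organization name, and the choice of type strings are made up for illustration.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str, namespace: str = "") -> str:
    """Build a basic purl string; per-type normalization rules are not applied."""
    parts = [pkg_type.lower()]
    if namespace:
        parts.extend(quote(seg, safe="") for seg in namespace.split("/"))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


# One hypothetical package, three places to download it from.
locations = [
    ("npm", "", "examplelib"),
    ("github", "example-org", "examplelib"),
    ("gitlab", "example-org", "examplelib"),
]
purls = [make_purl(t, n, "1.2.3", namespace=ns) for t, ns, n in locations]
```

Same code, same version, three different strings, and nothing in the strings themselves says they are the same software. Somebody has to record that.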

“Currently, there are no purls in CVE.org. However, the fact that CVE now supports purl in CVE Format 5.1 (formerly “CVE JSON spec v5.1) – a change requested by the SBOM Forum two years ago – means there will be purls when the CNAs start adding them to their CVE reports (which unfortunately will probably not be soon, given the substantial training that will need to be conducted.”

It also means that the last 25 years of CVE IDs will not receive PURLs either, or not in any significant volume. And it means that only CNAs that voluntarily opt in to the use of PURLs will begin populating them, which again, is a drop in the bucket.

“However, there is one big fly in the purl ointment: It currently doesn’t support proprietary (or “closed source”) software.”

And the other shoe drops. =) So, this is not a critique by any means, just highlighting the problems the community faces. The problems we faced 10 years ago have just compounded and here we are. Not that there were realistic solutions to all of these problems back then, and even if there were, we certainly didn’t address them then.

“I think this is a solvable problem, but it will depend – as a lot of worthwhile practices do – on a lot of people taking a little time every day to solve a problem for everybody. In this case, software suppliers will need to create a SWID tag for every product and version that they produce or that they still support. They might put all of these in a file called SWID.txt at a well-known location on their web site. An API in a user tool, when prompted with the name and version number of the product (which the user presumably has), would go to the site and download the SWID tag – then create the purl based on the contents (there are only about four fields needed for the purl, not the 80 or so in the original SWID spec).”

Unfortunately, I think at this point, this is a pipe dream. I am quite literally discovering new .well-known “standards” only by seeing them as requests ending in a 404 response in my web logs. So any such solution based on .well-known I think isn’t viable now, and likely won’t be moving forward.
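To be fair to Tom, the mechanical step he describes, going from a SWID tag to a PURL, is small. Here is a sketch using only the handful of fields he mentions. The field layout follows my reading of the `swid` type in the purl specification and should be checked against it before anyone relies on it. The sample tag, vendor, and tag ID are invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace of the ISO/IEC 19770-2:2015 SWID schema.
SWID_NS = "{http://standards.iso.org/iso/19770/-2/2015/schema.xsd}"


def swid_to_purl(swid_xml: str) -> str:
    """Derive a pkg:swid purl from creator name, regid, product name, version, tagId."""
    root = ET.fromstring(swid_xml)
    creator = None
    for entity in root.findall(f"{SWID_NS}Entity"):
        if "softwareCreator" in entity.get("role", "").split():
            creator = entity
            break
    if creator is None:
        raise ValueError("no softwareCreator Entity in SWID tag")
    segments = [creator.get("name", ""), creator.get("regid", ""), root.get("name", "")]
    path = "/".join(quote(seg, safe="") for seg in segments)
    version = quote(root.get("version", ""), safe="")
    tag_id = quote(root.get("tagId", ""), safe="")
    return f"pkg:swid/{path}@{version}?tag_id={tag_id}"


# Invented sample tag for a hypothetical product.
SAMPLE_TAG = """
<SoftwareIdentity xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="ExampleServer" tagId="example-tag-0001" version="1.0.0">
  <Entity name="Acme" regid="example.com" role="tagCreator softwareCreator"/>
</SoftwareIdentity>
"""
purl = swid_to_purl(SAMPLE_TAG)
```

So the conversion is not the obstacle. The obstacle is getting every supplier to publish and maintain the tags somewhere a tool can reliably find them.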

All of that said, I am encouraged to see continuing discussion on how to start to remedy these problems. One thing I will put out there, as a career pessimist, is this: what if we’re so far into this mess, and the problems have compounded so much, that there is not a viable grand solution to everything here? I think that while pessimistic, it is important to keep discussions grounded in reality and have some alternate plans. I do think that the idea of the GVD has a lot of merit, while facing challenges. Perhaps if done right, it would alleviate a lot of the problems and technical debt that have accrued on the back of government mismanagement of what has become the most used and referenced vulnerability database. Or perhaps not?

Plan for the worst, hope for the best.
