cyclonedx-maven-plugin: Aggregate BOMs cannot handle components with differing dependency trees in different modules

When processing aggregate BOMs, it is possible to encounter projects in which the resolved dependencies of a component differ between modules, for example:

  • during the resolution process, with different sets of transitive dependencies
  • using dependency management
  • using exclusions

An example of an exclusion could be represented by the following dependency trees, where dependency_F declares an exclusion on dependency_B that removes dependency_E from the dependency graph:

    com.example.dependency_trees.exclusion:dependency_A:jar:1.0.0
    \- com.example.dependency_trees.exclusion:dependency_B:jar:1.0.0:compile
       +- com.example.dependency_trees.exclusion:dependency_C:jar:1.0.0:compile
       |  \- com.example.dependency_trees.exclusion:dependency_D:jar:1.0.0:compile
       \- com.example.dependency_trees.exclusion:dependency_E:jar:1.0.0:compile

and

    com.example.dependency_trees.exclusion:dependency_F:jar:1.0.0
    \- com.example.dependency_trees.exclusion:dependency_B:jar:1.0.0:compile
       \- com.example.dependency_trees.exclusion:dependency_C:jar:1.0.0:compile
          \- com.example.dependency_trees.exclusion:dependency_D:jar:1.0.0:compile
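To make the difference concrete, here is a small Python sketch (not part of the plugin) that parses tree text in the `mvn dependency:tree` format shown above and lists the direct dependencies of dependency_B under each root. It assumes each tree level is indented by three characters, as in the trees above, and shortens every coordinate to `artifactId@version` for readability.

```python
import re


def coordinate(text):
    """Shorten 'group:artifact:type:version[:scope] ...' to 'artifact@version'."""
    parts = text.split()[0].split(":")
    return f"{parts[1]}@{parts[3]}"


def parse_tree(text):
    """Parse dependency:tree-style text into {node: [direct children]}."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    root = coordinate(lines[0])
    edges = {root: []}
    stack = [root]  # stack[d] holds the most recent node seen at depth d
    for line in lines[1:]:
        marker = re.search(r"[+\\]- ", line)  # '+- ' or '\- ' precedes each node
        depth = marker.start() // 3 + 1       # each tree level is 3 characters wide
        node = coordinate(line[marker.end():])
        del stack[depth:]                     # pop back to this node's parent
        edges.setdefault(stack[-1], []).append(node)
        edges.setdefault(node, [])
        stack.append(node)
    return edges


tree_a = parse_tree(r"""
com.example.dependency_trees.exclusion:dependency_A:jar:1.0.0
\- com.example.dependency_trees.exclusion:dependency_B:jar:1.0.0:compile
   +- com.example.dependency_trees.exclusion:dependency_C:jar:1.0.0:compile
   |  \- com.example.dependency_trees.exclusion:dependency_D:jar:1.0.0:compile
   \- com.example.dependency_trees.exclusion:dependency_E:jar:1.0.0:compile
""")
tree_f = parse_tree(r"""
com.example.dependency_trees.exclusion:dependency_F:jar:1.0.0
\- com.example.dependency_trees.exclusion:dependency_B:jar:1.0.0:compile
   \- com.example.dependency_trees.exclusion:dependency_C:jar:1.0.0:compile
      \- com.example.dependency_trees.exclusion:dependency_D:jar:1.0.0:compile
""")

print(tree_a["dependency_B@1.0.0"])  # ['dependency_C@1.0.0', 'dependency_E@1.0.0']
print(tree_f["dependency_B@1.0.0"])  # ['dependency_C@1.0.0']
```

The same component coordinate, dependency_B@1.0.0, has two different sets of direct dependencies depending on which root it was resolved under.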

An example of version management could be represented by the following dependency trees, where dependency_E manages the version of dependency_C so that 2.0.0 is used instead of 1.0.0:

    com.example.dependency_trees.managed:dependency_A:jar:1.0.0
    \- com.example.dependency_trees.managed:dependency_B:jar:1.0.0:compile
       \- com.example.dependency_trees.managed:dependency_C:jar:1.0.0:compile
          \- com.example.dependency_trees.managed:dependency_D:jar:1.0.0:compile

and

    com.example.dependency_trees.managed:dependency_E:jar:1.0.0
    \- com.example.dependency_trees.managed:dependency_B:jar:1.0.0:compile
       \- com.example.dependency_trees.managed:dependency_C:jar:2.0.0:compile (version managed from 1.0.0)
          \- com.example.dependency_trees.managed:dependency_D:jar:1.0.0:compile

Note that this last example could also arise from the dependency resolution process alone, since each root provides a different resolution context and could therefore resolve to a different set of artifacts.

The aggregate SBOM should be able to represent all the valid dependency hierarchies, which means that each component with an alternative dependency hierarchy would appear multiple times (same purl but differing bom-ref/ref).

In both of the above examples we would expect to see two components included for dependency_B, each with a different reference (bom-ref) and a different dependency hierarchy (ref).
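One way to produce that representation is sketched below in Python. This is an illustration, not the plugin's implementation, and it rests on a few assumptions: each module's resolved graph is available as a map from purl to direct dependency purls, the graphs are acyclic, and bom-refs for extra copies use a made-up `|variant-N` suffix (any string unique within the document would do). A purl is duplicated only when its subtree differs, so identical hierarchies are still merged.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # purl -> direct dependency purls, within one module


def merge_modules(modules: List[Tuple[str, Graph]]):
    """Merge per-module graphs, keeping one component per distinct subtree.

    Returns (components, dependencies): components maps bom-ref -> purl and
    dependencies maps bom-ref -> child bom-refs. Assumes acyclic graphs.
    """
    variants: Dict[str, List[Tuple[str, ...]]] = {}  # purl -> distinct child-ref tuples
    components: Dict[str, str] = {}
    dependencies: Dict[str, List[str]] = {}

    def visit(purl: str, graph: Graph, memo: Dict[str, str]) -> str:
        if purl in memo:  # already resolved within this module
            return memo[purl]
        # Resolve children first: a difference anywhere below this node changes
        # its key, and therefore its bom-ref, all the way up to the root.
        key = tuple(sorted(visit(child, graph, memo) for child in graph.get(purl, [])))
        seen = variants.setdefault(purl, [])
        if key not in seen:
            seen.append(key)
        index = seen.index(key)
        # Hypothetical bom-ref scheme: the first variant keeps the bare purl.
        ref = purl if index == 0 else f"{purl}|variant-{index}"
        components[ref] = purl
        dependencies[ref] = list(key)
        memo[purl] = ref
        return ref

    for root, graph in modules:
        visit(root, graph, {})
    return components, dependencies


def maven_purl(name: str, version: str = "1.0.0") -> str:
    """Hypothetical helper: purl for the managed-version trees above."""
    return f"pkg:maven/com.example.dependency_trees.managed/{name}@{version}?type=jar"


module_a = (maven_purl("dependency_A"), {
    maven_purl("dependency_A"): [maven_purl("dependency_B")],
    maven_purl("dependency_B"): [maven_purl("dependency_C")],
    maven_purl("dependency_C"): [maven_purl("dependency_D")],
})
module_e = (maven_purl("dependency_E"), {
    maven_purl("dependency_E"): [maven_purl("dependency_B")],
    maven_purl("dependency_B"): [maven_purl("dependency_C", "2.0.0")],
    maven_purl("dependency_C", "2.0.0"): [maven_purl("dependency_D")],
})

components, dependencies = merge_modules([module_a, module_e])
print(len(components))  # 7: dependency_B appears twice, with different bom-refs
print(dependencies[maven_purl("dependency_E")])
```

Keying each node on its children's bom-refs (rather than their purls) is what makes a difference deep in the tree propagate upwards; dependency_D is shared because its subtree is identical in both modules.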

About this issue

  • State: open
  • Created a year ago
  • Comments: 37 (28 by maintainers)

Most upvoted comments

@jkowalleck @aloubyansky This was intended to be a discussion about the underlying issue, with any spec changes being discussed elsewhere. Can we please keep to this?

There was an initial proposal I made to Steve about possible spec changes, trying to remain backward compatible with the current use. The idea was that we would keep ref as the primary attribute for each dependency, as the linkage between them, but allow an additional (optional) attribute (componentRef was suggested, but Steve wasn’t keen on the name) which could then override the reference to the component.

  • ref would remain unique within the dependencies section
  • by default ref would reference the component bom-ref
  • if we needed to include the component multiple times then we would choose unique ref attributes and then override the component relationship using the second attribute
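To illustrate how a consumer might resolve that override, here is a Python sketch over dict-shaped CycloneDX JSON. `componentRef` is only the name floated in the proposal; it is not part of the CycloneDX specification, and the data below is made up for illustration.

```python
def resolve_components(components, dependencies):
    """Map each dependency `ref` to the bom-ref of the component it describes.

    `componentRef` is the hypothetical optional override from the proposal;
    without it, `ref` is taken to be the component's bom-ref (current behaviour).
    """
    bom_refs = {c["bom-ref"] for c in components}
    resolved = {}
    for dep in dependencies:
        ref = dep["ref"]
        if ref in resolved:  # ref must stay unique within the dependencies section
            raise ValueError(f"duplicate ref in dependencies: {ref}")
        target = dep.get("componentRef", ref)
        if target not in bom_refs:
            raise ValueError(f"ref {ref} points at unknown component {target}")
        resolved[ref] = target
    return resolved


components = [{"bom-ref": "dependency_B"}, {"bom-ref": "dependency_C"}, {"bom-ref": "dependency_E"}]
dependencies = [
    {"ref": "dependency_B", "dependsOn": ["dependency_C", "dependency_E"]},
    # Second hierarchy for the same component: unique ref, overridden component link.
    {"ref": "dependency_B-2", "componentRef": "dependency_B", "dependsOn": ["dependency_C"]},
    {"ref": "dependency_C", "dependsOn": []},
    {"ref": "dependency_E", "dependsOn": []},
]
resolved = resolve_components(components, dependencies)
print(resolved["dependency_B-2"])  # dependency_B
```

The component itself appears once, while the dependencies section carries both hierarchies; documents without the extra attribute resolve exactly as they do today.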

Steve had some other ideas, including looking at how we could use some potential changes already coming in 1.5, but this particular discussion is really about what we can do now and not what the future direction would be.

The suggestions for what we could do now really focussed on composition, but I’m not really a fan of these since they explode the size of the SBOM and make it even more unwieldy than it currently is. I have some things to test out to compare the approaches; unfortunately I’ve been offline this week because of family in the UK (still here for a few days).

I’ll get back to this soon.

This issue is getting too mixed. I would suggest converting it to a discussion, so we could use threads to stay focussed and split topics at the same time. @CycloneDX/java-maven-maintainers

Here is how it could look for the following two-module framework

+1, this is the proposal I originally made in my blog post.

Hiya Alexey

Could we clarify the following: could a single SBOM document contain multiple components that share the same purl but have unique bom-refs?

Yes, the npm SBOMs are already doing this.

AFAIU, the answer is yes. And if so, there is no problem representing multiple variations in direct dependencies of a component with the same purl in a single SBOM. As to how often this will happen, it depends on what a given project represents.

The current discussion is more to do with how this is represented, whether we try to de-duplicate and flatten the representation in the SBOM (as this PR does) or generate the hierarchy through the assembly approach leading to the exploded dependency graph.

If a build of a project produces a single and flat (classloading-wise if we are talking about Java) runtime then, I suppose, what counts is what ends up in that single runtime, which would typically be represented as a single module (even if it’s a multi-module project) and, I would argue, there shouldn’t be any aggregation happening. So the issue raised wouldn’t occur in this case.

In this case it would be a normal SBOM rather than the aggregate one (leaving aside the “build time vs runtime” discussion for now).

If the produced runtime is not flat classloading-wise or a project produces multiple root components that could be consumed in any combination by target users then we’d either need a separate SBOM per root component or an aggregate one, in which dependency variations for components with the same purl will be a common case and must be supported. Having an option to “suppress” them in some way will be a major flaw in the tool.

+1, which is what this PR is addressing, albeit using a flat structure.

The issue with the current cyclonedx-maven-plugin implementation is that the dependency hierarchies for aggregated SBOMs are unreliable; I wrote this up in a blog post.

In the maven plugin case these components are not duplicated in the filesystem but do exist multiple times in the build, with different hierarchies, and the current SBOM generation is representing the build time resolution. Consumers of the artifact see a different graph, which is the subject of a different issue (#312).

Note that if each of these aggregated builds were packaged up (i.e. in a zip/tar etc) then it would be the same situation.

re: https://github.com/CycloneDX/cyclonedx-maven-plugin/issues/310#issuecomment-1482648661

PURL is the identifier used by the maven plugin, and this is only a proposal for the maven plugin. The npm tooling handles the generation the same way but uses a different mechanism to generate its unique bom-refs. This does not need to be standardized across every provider (the npm approach shows this is already not the case); a bom-ref only has to make sense within the context of the SBOM in which it is referenced.

Correct me if I’m wrong, @stevespringett

Each bom-ref’s value must be unique in the CycloneDX document it is defined in. That is asserted by schema rules - like here: https://github.com/CycloneDX/specification/blob/ccbf7b5781ef534cd62616e3c4221004c7c82a66/schema/bom-1.4.xsd#L2402-L2405

The value of a bom-ref has no meaning outside the CycloneDX document. Its only purpose is to be an anchor that can be referenced (via ref) inside the document, or from another document via BOM-Link (read https://cyclonedx.org/capabilities/bomlink/). You could use any string as the bom-ref value. Usually people derive the value from the PURL, in the hope that it is unique in the context of the BOM document they build.
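As a rough illustration of that rule, the following Python sketch checks a CycloneDX JSON document (loaded as a dict) for duplicate bom-ref values among its components and for dependency refs that match no bom-ref. It is a partial check only: it looks at `metadata.component` and the (possibly nested) `components` arrays, and ignores services and other elements that can also carry bom-refs. The sample document is invented for illustration.

```python
import json
from collections import Counter


def iter_components(components):
    """Yield components depth-first, including nested `components` arrays."""
    for component in components:
        yield component
        yield from iter_components(component.get("components", []))


def check_bom_refs(bom):
    """Return (duplicate bom-refs, dependency refs with no matching bom-ref)."""
    found = list(iter_components(bom.get("components", [])))
    if "component" in bom.get("metadata", {}):
        found.append(bom["metadata"]["component"])
    counts = Counter(c["bom-ref"] for c in found if "bom-ref" in c)
    duplicates = sorted(ref for ref, n in counts.items() if n > 1)
    referenced = {ref for dep in bom.get("dependencies", [])
                  for ref in [dep["ref"], *dep.get("dependsOn", [])]}
    dangling = sorted(referenced - set(counts))
    return duplicates, dangling


bom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "dependency_B", "version": "1.0.0",
     "bom-ref": "pkg:maven/com.example/dependency_B@1.0.0"},
    {"type": "library", "name": "dependency_B", "version": "1.0.0",
     "bom-ref": "pkg:maven/com.example/dependency_B@1.0.0"}
  ],
  "dependencies": [
    {"ref": "pkg:maven/com.example/dependency_B@1.0.0",
     "dependsOn": ["pkg:maven/com.example/dependency_C@1.0.0"]}
  ]
}
""")
duplicates, dangling = check_bom_refs(bom)
print(duplicates)
print(dangling)
```

Here the second dependency_B reuses the first one's bom-ref, which is exactly what the schema forbids; giving it a distinct value (however it is derived) resolves the duplicate.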

@jkowalleck I’ve finally published the blog post, I hope this helps to explain the current issues with the aggregated SBOM. I included a number of examples in the blog, what I feel are the more common scenarios, and there are two more in the PR tests covering version management/exclusion.

It’s absolutely related. If you take an application that has a multi-module build, such as Webgoat, the SBOM that I generate at build represents the build that is potentially delivered to customers. Webgoat does not deliver multiple versions of commons-io for example. It delivers only one in the resulting artifact. Having multiple of the same component will be a non-starter for many/most orgs.

I’m fine with making this configurable, but I think the only way this makes sense is if we then fail the build when we know that it will generate SBOMs that are not consistent. Webgoat is a good project to discuss, since there are certainly inconsistencies in those projects which show up through the aggregated SBOM.

One example is auth-bypass; when it is built, it has the following direct dependencies:

    <dependency ref="pkg:maven/org.owasp.webgoat.lesson/auth-bypass@v8.0.0.M15?type=jar">
      <dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar"/>
      <dependency ref="pkg:maven/org.owasp.encoder/encoder@1.2?type=jar"/>
      <dependency ref="pkg:maven/com.thoughtworks.xstream/xstream@1.4.7?type=jar"/>
      <dependency ref="pkg:maven/org.projectlombok/lombok@1.16.20?type=jar"/>
      <dependency ref="pkg:maven/org.apache.commons/commons-exec@1.3?type=jar"/>
    </dependency>

and when consumed by webgoat-server it has the following direct dependencies

    <dependency ref="pkg:maven/org.owasp.webgoat.lesson/auth-bypass@v8.0.0.M15?type=jar">
      <dependency ref="pkg:maven/org.owasp.encoder/encoder@1.2?type=jar"/>
      <dependency ref="pkg:maven/com.thoughtworks.xstream/xstream@1.4.7?type=jar"/>
      <dependency ref="pkg:maven/org.apache.commons/commons-exec@1.3?type=jar"/>
    </dependency>

In this example the versions of the three artifacts above happen to be the same in each, however that is not guaranteed to be the case.
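To compare the two hierarchies mechanically rather than by eye, a short Python sketch using the standard library's `xml.etree.ElementTree` can diff the direct dependencies of the two fragments above. It parses the fragments exactly as quoted, without a namespace; in a complete CycloneDX XML BOM the elements live in the CycloneDX schema namespace, so lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# The two fragments quoted above, un-namespaced exactly as shown.
AS_BUILT = """
<dependency ref="pkg:maven/org.owasp.webgoat.lesson/auth-bypass@v8.0.0.M15?type=jar">
  <dependency ref="pkg:maven/org.owasp.webgoat/webgoat-container@v8.0.0.M15?type=jar"/>
  <dependency ref="pkg:maven/org.owasp.encoder/encoder@1.2?type=jar"/>
  <dependency ref="pkg:maven/com.thoughtworks.xstream/xstream@1.4.7?type=jar"/>
  <dependency ref="pkg:maven/org.projectlombok/lombok@1.16.20?type=jar"/>
  <dependency ref="pkg:maven/org.apache.commons/commons-exec@1.3?type=jar"/>
</dependency>
"""
AS_CONSUMED = """
<dependency ref="pkg:maven/org.owasp.webgoat.lesson/auth-bypass@v8.0.0.M15?type=jar">
  <dependency ref="pkg:maven/org.owasp.encoder/encoder@1.2?type=jar"/>
  <dependency ref="pkg:maven/com.thoughtworks.xstream/xstream@1.4.7?type=jar"/>
  <dependency ref="pkg:maven/org.apache.commons/commons-exec@1.3?type=jar"/>
</dependency>
"""


def direct_dependencies(fragment):
    """Return (ref, set of direct dependency refs) for one <dependency> element."""
    node = ET.fromstring(fragment.strip())
    # findall("dependency") matches direct children only, i.e. direct dependencies.
    return node.get("ref"), {child.get("ref") for child in node.findall("dependency")}


ref, built = direct_dependencies(AS_BUILT)
_, consumed = direct_dependencies(AS_CONSUMED)
only_when_built = sorted(built - consumed)
print(ref)
for dep in only_when_built:
    print("  only in the module's own build:", dep)
```

For these fragments the difference is webgoat-container and lombok; a similar comparison over every module could flag components whose hierarchy would be misrepresented by a single merged entry.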

If this change is implemented, I will need a way to disable this functionality for use with my own employer. I suspect other Java shops will need the same since they do not want to erroneously include duplicate components. If a workaround is not provided, I will no longer be able to use the plugin myself, and will have to find alternatives.

We need to think about all the use cases that are affected by a change like this. While this change works for some use cases, it doesn’t for others.

Agreed, although I think this is really only an issue when these differences exist and are misrepresented in the aggregate SBOM. In those situations the aggregate SBOM could easily mislead someone into thinking they know everything that has gone into the build when perhaps they don’t. I believe the only safe approach we currently have is to rely on the individual BOMs.

We have a call tomorrow with @hboutemy, perhaps this is easier to go through in person.

Note: prior to https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/306 the aggregate SBOM would only include one of the dependency trees. After this PR is applied, the SBOM will contain multiple components for dependency_B, with the same purl but differing bom-refs, and each component will be included in a separate dependency tree.