syft: Package duplicated by different cataloger

What happened:

I scanned the almalinux:latest image with several tools to compare the SBOMs they generate. At first sight Syft looked better because it identified more components in total. A deeper analysis, however, showed that Syft had reported the same packages twice: once from the rpm cataloger and once from the python cataloger.

Example Rpm cataloger finding:

   "name": "libcomps",
   "version": "0.1.16-2.el8",
   "type": "rpm",
   "foundBy": "rpmdb-cataloger",
   "locations": [
    {
     "path": "/var/lib/rpm/Packages"
    }
   ],
   "licenses": [],
   "language": "",
   "cpes": [
    "cpe:2.3:a:almalinux:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:rpm/almalinux/libcomps@0.1.16-2.el8?arch=x86_64&upstream=libcomps-0.1.16-2.el8.src.rpm&distro=almalinux-8.5",

Example Python Cataloger finding:

   "name": "libcomps",
   "version": "0.1.16",
   "type": "python",
   "foundBy": "python-package-cataloger",
   "locations": [
    {
     "path": "/usr/lib64/python3.6/site-packages/libcomps-0.1.16-py3.6.egg-info"
    }
   ],
   "licenses": [
    "GPLv2+"
   ],
   "language": "python",
   "cpes": [
    "cpe:2.3:a:rpm_software_management:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_software_management:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_software_management:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python-libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python_libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:python-libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:python_libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_ecosystem:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:libcomps:0.1.16:*:*:*:*:*:*:*"
   ],
   "purl": "pkg:pypi/libcomps@0.1.16",
    "license": "GPLv2+",

In fact, this Python package is installed by the rpm above, so both findings refer to the same software.
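
To see how many packages are affected, the generated bom.json can be post-processed. The following is only a rough sketch: it assumes the Syft JSON output keeps its packages in a top-level "artifacts" array with the "name", "version", "type" and "foundBy" fields shown above, and the helper names are made up for illustration. Matching on the name alone is a heuristic, since an rpm and the Python package it ships do not always share a name.

```python
import json
from collections import defaultdict


def load_sbom(path):
    """Load a Syft JSON document from disk (e.g. the bom.json written by the scan)."""
    with open(path) as handle:
        return json.load(handle)


def find_cross_cataloger_duplicates(sbom):
    """Return {name: [(type, version, foundBy), ...]} for every package name
    that is reported under more than one package type."""
    by_name = defaultdict(list)
    for artifact in sbom.get("artifacts", []):  # assumed top-level key
        by_name[artifact["name"]].append(
            (artifact.get("type"), artifact.get("version"), artifact.get("foundBy"))
        )
    return {
        name: entries
        for name, entries in by_name.items()
        if len({pkg_type for pkg_type, _, _ in entries}) > 1
    }


# Trimmed-down stand-in for bom.json: the two findings from this issue plus
# one made-up entry ("example-only-rpm") that has no duplicate.
sample = {
    "artifacts": [
        {"name": "libcomps", "version": "0.1.16-2.el8", "type": "rpm",
         "foundBy": "rpmdb-cataloger"},
        {"name": "libcomps", "version": "0.1.16", "type": "python",
         "foundBy": "python-package-cataloger"},
        {"name": "example-only-rpm", "version": "1.0", "type": "rpm",
         "foundBy": "rpmdb-cataloger"},
    ]
}

duplicates = find_cross_cataloger_duplicates(sample)
for name, entries in duplicates.items():
    print(name, entries)
```

For a real scan, `find_cross_cataloger_duplicates(load_sbom("bom.json"))` would replace the sample data.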

What you expected to happen:

I am not sure whether it is good or bad to have duplicate entries for the same package. Note that the purl and CPE values differ between the two findings.

Searching NVD (https://nvd.nist.gov/products/cpe/search/results?namingFormat=2.3&keyword=libcomps) returns only rpm-based CPEs (cpe:2.3:a:rpm:libcomps:...), so it seems that only the rpm finding is necessary. Perhaps Syft could provide a way to drop a finding when the same package is also owned by a package from another package manager.
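
The CPE comparison can also be done mechanically. This is a minimal sketch, assuming a naive split on ":" is good enough (it ignores escaped colons, which the CPE 2.3 format allows) and taking the NVD vendor/product pair rpm/libcomps from the search above rather than from any API. The function name is hypothetical.

```python
def cpe_vendor_product(cpe):
    """Return (vendor, product) from a CPE 2.3 formatted string.

    Naive parser: assumes no escaped ':' inside any component.
    """
    parts = cpe.split(":")
    # "cpe", "2.3" and the 11 attribute fields give 13 parts in total.
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return parts[3], parts[4]


# Vendor/product as listed by NVD according to the search in this issue.
NVD_VENDOR_PRODUCT = ("rpm", "libcomps")

# Both CPEs from the rpm finding, and a subset of those from the python finding.
generated_cpes = [
    "cpe:2.3:a:almalinux:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*",
    "cpe:2.3:a:libcomps:libcomps:0.1.16-2.el8:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm_software_management:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:rpm-ecosystem:libcomps:0.1.16:*:*:*:*:*:*:*",
    "cpe:2.3:a:python:libcomps:0.1.16:*:*:*:*:*:*:*",
]

matching = [
    cpe for cpe in generated_cpes
    if cpe_vendor_product(cpe) == NVD_VENDOR_PRODUCT
]
print(matching)
```

With the CPEs quoted in this issue, none of the generated candidates uses the vendor rpm, for either finding.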

How to reproduce it (as minimally and precisely as possible):

docker run \
            --rm \
            -it \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -v $PWD:/tmp/workdir \
            anchore/syft:latest \
            -v \
            packages \
            -s Squashed \
            -o json \
            --file /tmp/workdir/bom.json \
            docker:almalinux:latest

Anything else we need to know?:

Environment:

  • Output of syft version: 0.38.0 (and same result with 0.42.4)
  • OS (e.g: cat /etc/os-release or similar): N/A

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

I’m starting to see the case for why a binary package in particular should probably not be included if there is an owning package… since the “binary packages” were entirely synthesized by syft. That may call for excluding them by default.

Let users decide the order in which package cataloger discoveries are prioritized (which one takes precedence).

This resonates with me too. Here’s an option of what that might look like in syft configuration:

drop-packages-with-ownership-overlap:
  - parent-type: class:os
    type: binary

Where class:os is shorthand for ["apk", "alpm", "rpm", "dpkg", "portage"], so the full expression would be:

drop-packages-with-ownership-overlap:
  - type: binary
    parent-type:
    - "apk"
    - "alpm"
    - "rpm"
    - "dpkg"
    - "portage"
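
As an illustration of how such a shorthand might be resolved, here is a small sketch. The option name drop-packages-with-ownership-overlap and the class:os alias are only the proposal from this comment, not an existing syft setting, and the function names are hypothetical.

```python
# Alias table for the proposed shorthand; only class:os is defined in this comment.
TYPE_CLASSES = {
    "class:os": ["apk", "alpm", "rpm", "dpkg", "portage"],
}


def expand_types(value):
    """Accept a single string or a list and return a flat list of package
    types with any known class alias expanded (order kept, no repeats)."""
    values = [value] if isinstance(value, str) else list(value)
    expanded = []
    for item in values:
        for pkg_type in TYPE_CLASSES.get(item, [item]):
            if pkg_type not in expanded:
                expanded.append(pkg_type)
    return expanded


def normalize_rule(rule):
    """Turn the short form of a rule into the fully spelled-out form."""
    return {
        "type": expand_types(rule["type"]),
        "parent-type": expand_types(rule["parent-type"]),
    }


short_rule = {"parent-type": "class:os", "type": "binary"}
full_rule = normalize_rule(short_rule)
print(full_rule)
```

Normalizing every rule to lists up front means the matching code never has to distinguish between the short and the long spelling.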

This could be the default configuration. However, we could allow for simple expressions like:

drop-packages-with-ownership-overlap:
  # drop any python package that is owned by an RPM package
  - parent-type: rpm
    type: python
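
To make the intended semantics concrete, here is a rough sketch of how such rules could be applied. Everything in it is an assumption for illustration: the in-memory shapes (packages keyed by id, ownership as parent/child id pairs) and the function names are invented, and this is not how syft implements anything today.

```python
def _as_list(value):
    """Rules may give a single type or a list of types; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def drop_owned_packages(packages, ownership, rules):
    """Drop every child package whose owning parent matches one of the rules.

    packages:  {package_id: {"name": ..., "type": ...}}
    ownership: iterable of (parent_id, child_id) pairs
    rules:     list of {"parent-type": ..., "type": ...} mappings
    """
    dropped = set()
    for parent_id, child_id in ownership:
        parent, child = packages[parent_id], packages[child_id]
        for rule in rules:
            if (child["type"] in _as_list(rule["type"])
                    and parent["type"] in _as_list(rule["parent-type"])):
                dropped.add(child_id)
                break
    return {pid: pkg for pid, pkg in packages.items() if pid not in dropped}


# The two libcomps findings from this issue, plus a made-up python package
# ("example-unowned") that no rpm owns and that should therefore survive.
packages = {
    "rpm-1": {"name": "libcomps", "type": "rpm"},
    "py-1": {"name": "libcomps", "type": "python"},
    "py-2": {"name": "example-unowned", "type": "python"},
}
ownership = [("rpm-1", "py-1")]
rules = [{"parent-type": "rpm", "type": "python"}]

remaining = drop_owned_packages(packages, ownership, rules)
print(sorted(remaining))
```

Only the owned python package is removed; the rule keys off the ownership relationship and the two package types, not the package name.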

Alternatively, we could allow for something as agnostic as dropping packages based on more generic criteria:

drop-packages:
  - relationship-type: ownership-by-file-overlap 
    parent-type: class:os
    type: binary

But this is starting to get into something like https://github.com/anchore/syft/issues/31, which I’d like to avoid: #31 is really about how to apply hints for a specific image, whereas this issue is about ignoring a class of packages based on structural elements, regardless of the specifics of an image.

@wagoodman alright… so there is nothing we could do in that particular case (and other similar ones). Thanks for your investigation and for the great product 👍