stackrox: central crashes at startup with quay.io/stackrox-io/main:3.72.1 (also: 3.72.0)

Hello,

The Central pod crashes while starting up:

cve/fetcher: 2022/10/20 09:34:33.286511 orchestrator.go:237: Info: Successfully fetched 0 OpenShift CVEs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x33b2efd]

goroutine 117 [running]:
github.com/stackrox/rox/central/cve/converter/utils.nvdCvssv2ToProtoCvssv2(0x0)
	github.com/stackrox/rox/central/cve/converter/utils/convert_utils.go:173 +0x1d

This installation used the Helm chart stackrox-central-services-71.0.0 and was upgraded to stackrox-central-services-72.0.0. It started crashing less than a day after the upgrade, and it still crashes with 72.1.0.

The Helm values look like this:

image:
  registry: <proxy for quay.io>/stackrox-io

env:
  proxyConfig: |
    url: http://...
    excludes:
    - ...

central:
  exposure:
    loadBalancer:
      enabled: true

scanner:
  autoscaling:
    disable: true

This is the full log for central: crash.log

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hello folks,

thank you for reporting this incident, and our sincere apologies for the disruption this caused.

Due to an unexpected schema change in an upstream vulnerability feed, a corrupted CVE data file has been published to https://definitions.stackrox.io/ and consumed by a large number of Central instances. As a result of the data corruption, Central crashes. To the best of our knowledge, this affects all Central versions.

While we have already taken steps to ensure a non-corrupted version is served from https://definitions.stackrox.io/, affected Centrals will not be able to get out of this crashloop state without manual intervention in order to delete the previously downloaded, corrupted file.

In order to get Central back to a working state, please follow these instructions to delete the file: https://gist.github.com/misberner/c43a666fc0a6ff335925b9800473d489

We have already identified further steps to prevent issues of this kind from happening in the future, specifically ensuring that Central self-heals when a dependency corruption is fixed.

Same issue happening with version 3.70.1 without any recent changes in configuration/version.

Thank you for reporting! The team got notified about this issue by internal monitoring and is currently looking into this.

Working like a charm! Thank you very much for the quick workaround.

Thanks for providing information about the issue. The team has found the root cause and is currently working on releasing a workaround and a patch fix as soon as possible.

Unfortunately, all versions of Central may be affected. We understand the root cause and have a recovery command on the way. It will be shared here soon. Sorry for the downtime.

Same here on 3.71.0.

Looking at the stack trace, I’m guessing something has changed in an external vulnerability database which is now making the StackRox processing crash? And that’s why it started happening suddenly without any changes and affects several versions.

I can confirm it works.

Works for me as well. Thank you for the fix. 🙂