dependency-track: Dependency Track High CPU Behaviour

Issue Type:

  • defect report
  • enhancement request

Current Behavior:

Dependency-Track v3.3.1 was running with 25 projects (all Maven) successfully analysed (manual upload of CycloneDX BOM) and with a total of 1000 components.

CPU usage was around 3%.

On 14 December, I then successfully analysed my first npm project (also using CycloneDX, but this time via a Jenkins pipeline) with 1003 components.

Three days later (after a weekend, and with me now on vacation), the owner of the npm project disabled dependencyTrackPublisher in the pipeline, as it was endlessly stuck on…

[DependencyTrack] Polling Dependency-Track for BoM processing status
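
For context, my understanding is that the Jenkins step is simply uploading the BOM and then repeatedly asking the server whether processing has finished. A rough Python sketch of the equivalent REST calls follows (the server URL, API key and project UUID are placeholders, and the exact calls the plugin makes may differ); the loop at the end is what never completed:

import base64
import time

import requests

DT_URL = "http://dtrack.example.local:8080"                 # placeholder
API_KEY = "REPLACE_WITH_API_KEY"                            # placeholder
PROJECT_UUID = "00000000-0000-0000-0000-000000000000"       # placeholder

# Read and base64-encode the CycloneDX BOM for upload
with open("bom.xml", "rb") as f:
    bom_b64 = base64.b64encode(f.read()).decode()

# PUT /api/v1/bom returns a token identifying the server-side processing job
resp = requests.put(
    f"{DT_URL}/api/v1/bom",
    headers={"X-Api-Key": API_KEY},
    json={"project": PROJECT_UUID, "bom": bom_b64},
)
resp.raise_for_status()
token = resp.json()["token"]

# GET /api/v1/bom/token/{token} reports whether the BOM is still being
# processed; the plugin polls this, producing the "Polling..." console lines
while True:
    status = requests.get(
        f"{DT_URL}/api/v1/bom/token/{token}",
        headers={"X-Api-Key": API_KEY},
    ).json()
    if not status.get("processing", False):
        break
    time.sleep(10)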

The DT server logs show that the last occurrence of a DependencyCheckScanAgent event was on 19 December.

I returned to work 3 weeks later, on 7th January, to find the DT UI displaying…

[screenshot of the DT UI]

There were no errors in the logs from 14th December to 7th January apart from a couple of PostgreSQL connection errors (SQLSTATE 08003 and 08006). Mostly, things looked OK (lots of NistMirrorTask entries, etc).

After upgrading Dependency-Track to v3.4.0 on 7th January, the UI looked OK… but things were still not working properly.

A manual upload of a CycloneDX BOM reported success in the UI, but resulted in no analysis being performed. A dependencyTrackPublisher sync of a CycloneDX BOM from Jenkins resulted in endless “Polling” entries in the Jenkins console output. A restart of DT followed by an immediate manual CycloneDX BOM upload did produce an analysis… but only temporarily.
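
One way to verify that no analysis actually ran (treat this as a sketch; the endpoint and the lastOccurrence field are my best recollection of the REST API) is to check when the project metrics were last recalculated after an upload:

import datetime

import requests

DT_URL = "http://dtrack.example.local:8080"                 # placeholder
API_KEY = "REPLACE_WITH_API_KEY"                            # placeholder
PROJECT_UUID = "00000000-0000-0000-0000-000000000000"       # placeholder

# Current metrics for the project; lastOccurrence is (to my understanding)
# the time the metrics were last recalculated, in epoch milliseconds
metrics = requests.get(
    f"{DT_URL}/api/v1/metrics/project/{PROJECT_UUID}/current",
    headers={"X-Api-Key": API_KEY},
).json()

last = datetime.datetime.fromtimestamp(metrics["lastOccurrence"] / 1000)
print(f"metrics last updated: {last}")  # a stale timestamp => no analysis ran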

I think I have solved the problem. I found that DT was still running on an Azure B Series burstable VM.

This gave the following CPU usage…

[screenshot: CPU usage graph]

Thus, I upgraded the server to a higher spec and things look OK so far. I have read #255 and the response (worker threads, etc), but upgrading the VM took only a few minutes to do!

In retrospect there are still a couple of things that were not as expected…

Expected Behavior:

  1. Should an npm project have such a huge performance impact? (See the CPU graph.) Although it doubled the number of components known to the system, the biggest visible difference seems to be that almost every licence is “resolved” (has a link), whereas the Maven projects all list licences as text alone.
  2. The DT server should spot and log CPU problems. In my environment I have access to the logs, but I had to request the CPU graph from someone else. Covered by #260?
  3. The dependency-track-plugin should not wait so long when polling the DT server before failing (see the sketch after this list).
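
A minimal sketch of what I mean by point 3, reusing the same (placeholder) token-status endpoint as in the earlier sketch; the 600-second deadline is an arbitrary example, not an existing plugin setting:

import time

import requests

def wait_for_bom_processing(dt_url, api_key, token, deadline=600, interval=10):
    # Poll GET /api/v1/bom/token/{token} until processing finishes, but give
    # up with a clear error once the deadline (in seconds) has passed.
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        status = requests.get(
            f"{dt_url}/api/v1/bom/token/{token}",
            headers={"X-Api-Key": api_key},
        ).json()
        if not status.get("processing", False):
            return
        time.sleep(interval)
    raise TimeoutError("BOM processing did not complete before the deadline")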

Environment:

  • Dependency-Track Version: 3.3.1 and then 3.4.0
  • Distribution: Executable WAR
  • BOM Format & Version: CycloneDX
  • Database Server: PostgreSQL

Other Details:

This occurred a couple of times between 14th and 18th December, but I am not sure if there is any connection to this issue. The database is “Azure PostgreSQL” (i.e., separate from the DT VM) and uptime is supposed to be really good.

2018-12-14 18:08:03,169 [] WARN [com.zaxxer.hikari.pool.ProxyConnection] HikariPool-2 - Connection org.postgresql.jdbc.PgConnection@62928ff4 marked as broken because of SQLSTATE(08006), ErrorCode(0)
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:335)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 28 (19 by maintainers)

Most upvoted comments

I have performed testing with v3.4.1 and everything looks great so far.

No issues with 3.4.1; NPM Audit has run and completed numerous times and it seems much more stable. Thanks again, Steve!

Thanks, Steve. Just tested with the new 3.4.1 container and activated npm scanner. It’s working fine. The CPU load drops after the scan is completed.

Thanks Steve! I’ll get it installed and tested today and will be back later today with results. Need several hours to see if it goes off to la-la land.

3.4.1 is released. I would normally close this issue at this point, but I’d like confirmation from @msymons and a few of the other folks that the issue has been resolved.

I’m able to reproduce the issue in v3.4 and v3.5-snapshot. v3.5-snapshot doesn’t behave quite as badly, but still has the issue. I’ll be diving deep over the next few days trying to determine the cause.

  • Dependency-Check enabled: Yes
  • NPM Audit enabled: Yes
  • OSS Index enabled: No
  • Notifications enabled:
    • Slack/Microsoft Teams/Webhooks: Yes
    • Email: Yes
    • Console: No
  • VulnDB enabled: Not sure
  • LDAP enabled: Yes
  • LDAP synchronization enabled: Yes
  • LDAP user provisioning enabled: Yes
  • Database vendor/version: PostgreSQL (ALPINE_DATABASE_DRIVER=org.postgresql.Driver, ALPINE_DATABASE_DRIVER_PATH=/extlib/postgresql-42.2.5.jar)
  • Java vendor/version: openjdk version “1.8.0_181” (owasp/dependency-track image)
  • Host OS vendor/version: Docker image owasp/dependency-track
  • Distribution: Docker
  • Proxy server enabled: Yes (Nginx)

In my observations, the NPM Audit analyzer is having some issues, and in some instances goes into an endless loop. This will slowly cause thread exhaustion. However, the issue is compounded by the database connection pool, which begins to consume threads for database connections it cannot get back. How fast or how often this occurs may be database specific, but fixing the cause (npm) will likely fix the entire cascading effect it’s having.
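
As a toy illustration of that cascade (this is not Dependency-Track’s actual code, just the shape of the failure described above): each stuck task keeps hold of a pooled connection, so the pool drains until unrelated work can no longer borrow one.

import queue

POOL_SIZE = 5                          # toy pool; real pools are configurable
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(f"conn-{i}")

def start_stuck_task():
    # Borrow a connection and never return it, standing in for an analyzer
    # run that has gone into an endless loop
    return pool.get_nowait()

held = [start_stuck_task() for _ in range(POOL_SIZE)]   # pool is now empty

try:
    pool.get(timeout=2)                # any other database work now blocks...
except queue.Empty:
    print("pool exhausted - unrelated requests start failing")  # ...and fails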

Experiencing the same thing, where Dependency-Track gets stuck in a loop consuming 30-40% CPU; the UI is down and returns a simple text error message. Can’t even make it a day without this occurring. The last thing in the log is always “Executing metrics update” on the NPM project. Had to resort to implementing an automated restart every 5 hours, which alleviates the problem.

  • Dependency-Check enabled: No
  • NPM Audit enabled: Yes
  • OSS Index enabled: Yes
  • Notifications enabled: No (bugs where implied filters don’t work and I end up getting spammed as a result, so I removed them all)
  • VulnDB enabled: No
  • LDAP enabled: Yes
  • LDAP synchronization enabled: No
  • LDAP user provisioning enabled: No
  • Database vendor/version: Microsoft SQL Server 2016
  • Java vendor/version: Oracle 8u202
  • Host OS vendor/version: Windows Server 2016
  • Distribution: Executable (embedded Jetty) WAR 3.4.0
  • Proxy server enabled: No

Finally, are you able to test 3.5.0 SNAPSHOT releases on non-production data? No

Hello,

I’m also experiencing the high CPU load with the following setup:

  • Dependency-Check enabled: no
  • NPM Audit enabled: yes
  • OSS Index enabled: yes
  • Notifications enabled: no
  • VulnDB enabled: no
  • LDAP enabled: yes
  • LDAP synchronization enabled: no
  • LDAP user provisioning enabled: no
  • Database vendor/version: postgres:10
  • Java vendor/version:
  • Host OS vendor/version:
  • Distribution:
    • Docker: owasp/dependency-track (3.4)
  • Proxy server enabled: yes (Apache reverse proxy)

Constant high CPU usage here as well:

  • Dependency-Check enabled: No (but was enabled some weeks ago)
  • NPM Audit enabled: yes
  • OSS Index enabled: yes
  • Notifications enabled: Slack
  • VulnDB enabled: No
  • LDAP enabled: No
  • Database vendor/version: SqlServer 2012 (com.microsoft.sqlserver.jdbc.SQLServerDriver from sqljdbc4-6.2.1.jar)
  • Java vendor/version: Oracle 8u192
  • Host OS vendor/version: Windows Server 2012 R2
  • alpine.worker.thread.multiplier=1 (40 cores)
  • Distribution: Executable (embedded Jetty) WAR
  • Proxy server enabled: No
  • Finally, are you able to test 3.5.0 SNAPSHOT releases on non-production data? Sure!