django-DefectDojo: Deduplication stopped working (APIv2)
I am using API v2 with a docker-compose setup on the current dev branch and am trying to import Dependency Check results.
Deduplication is not triggered, although I believe it was being triggered before.
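For reference, the import is done roughly like this (a minimal sketch with placeholder values, not the exact client used in the original report; the endpoint and field names follow the v2 import-scan API):

```python
# Sketch of the /api/v2/import-scan/ call; URL, token and engagement id are placeholders.
import requests

DOJO_URL = "http://localhost:8080"   # assumed local docker-compose instance
API_KEY = "<api-key>"                # hypothetical API v2 token

resp = requests.post(
    DOJO_URL + "/api/v2/import-scan/",
    headers={"Authorization": "Token " + API_KEY},
    data={
        "scan_type": "Dependency Check Scan",
        "engagement": 1,             # hypothetical engagement id
        "minimum_severity": "Low",
        "active": "true",
        "verified": "true",
    },
    files={"file": open("dependency-check-report.xml", "rb")},
)
print(resp.status_code, resp.json())
```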
nginx_1 | 2019/08/10 14:45:01 [warn] 6#6: *691 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000006, client: 192.168.48.1, server: , request: "POST /api/v2/import-scan/ HTTP/1.1", host: "localhost:8080"
uwsgi_1 | /usr/local/lib/python3.5/site-packages/django/db/models/fields/__init__.py:1363: RuntimeWarning: DateTimeField Test.target_start received a naive datetime (2019-08-10 00:00:00) while time zone support is active.
uwsgi_1 | RuntimeWarning)
uwsgi_1 | /usr/local/lib/python3.5/site-packages/django/db/models/fields/__init__.py:1363: RuntimeWarning: DateTimeField Test.target_end received a naive datetime (2019-08-10 00:00:00) while time zone support is active.
uwsgi_1 | RuntimeWarning)
uwsgi_1 | applying rules
(... the "applying rules" line repeats 44 times ...)
nginx_1 | 192.168.48.1 - - [10/Aug/2019:14:45:03 +0000] "POST /api/v2/import-scan/ HTTP/1.1" 201 240 "-" "Java/1.8.0_222" "-"
uwsgi_1 | [pid: 439|app: 0|req: 5/5] 192.168.48.1 () {40 vars in 722 bytes} [Sat Aug 10 14:45:01 2019] POST /api/v2/import-scan/ => generated 240 bytes in 2006 msecs (HTTP/1.1 201) 4 headers in 128 bytes (1 switches on core 0)
uwsgi_1 | [pid: 439|app: 0|req: 6/6] 192.168.48.1 () {36 vars in 524 bytes} [Sat Aug 10 14:45:03 2019] GET /api/v2/products/?name=test3 => generated 488 bytes in 14 msecs (HTTP/1.1 200) 4 headers in 134 bytes (1 switches on core 0)
nginx_1 | 192.168.48.1 - - [10/Aug/2019:14:45:03 +0000] "GET /api/v2/products/?name=test3 HTTP/1.1" 200 488 "-" "Java/1.8.0_222" "-"
uwsgi_1 | [pid: 439|app: 0|req: 7/7] 192.168.48.1 () {36 vars in 562 bytes} [Sat Aug 10 14:45:03 2019] GET /api/v2/engagements?product=1&limit=50&offset=0 => generated 0 bytes in 1 msecs (HTTP/1.1 301) 3 headers in 153 bytes (1 switches on core 0)
nginx_1 | 192.168.48.1 - - [10/Aug/2019:14:45:03 +0000] "GET /api/v2/engagements?product=1&limit=50&offset=0 HTTP/1.1" 301 0 "-" "Java/1.8.0_222" "-"
uwsgi_1 | [pid: 439|app: 0|req: 8/8] 192.168.48.1 () {36 vars in 564 bytes} [Sat Aug 10 14:45:03 2019] GET /api/v2/engagements/?product=1&limit=50&offset=0 => generated 938 bytes in 14 msecs (HTTP/1.1 200) 4 headers in 134 bytes (1 switches on core 0)
nginx_1 | 192.168.48.1 - - [10/Aug/2019:14:45:03 +0000] "GET /api/v2/engagements/?product=1&limit=50&offset=0 HTTP/1.1" 200 938 "-" "Java/1.8.0_222" "-"
uwsgi_1 | [pid: 439|app: 0|req: 9/9] 192.168.48.1 () {48 vars in 1107 bytes} [Sat Aug 10 14:45:04 2019] GET /alerts/count => generated 12 bytes in 28 msecs (HTTP/1.1 200) 4 headers in 114 bytes (1 switches on core 0)
nginx_1 | 192.168.48.1 - - [10/Aug/2019:14:45:04 +0000] "GET /alerts/count HTTP/1.1" 200 12 "http://localhost:8080/api/key-v2" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/76.0.3809.87 Chrome/76.0.3809.87 Safari/537.36" "-"
celeryworker_1 | [2019-08-10 14:45:07,523: INFO/MainProcess] Received task: dojo.tasks.async_dupe_delete[b5f16008-4738-432b-88d7-00db23867ebf]
celeryworker_1 | [2019-08-10 14:45:07,524: INFO/MainProcess] dojo.tasks.async_dupe_delete[b5f16008-4738-432b-88d7-00db23867ebf]: delete excess duplicates
celeryworker_1 | [2019-08-10 14:45:07,529: INFO/MainProcess] Task dojo.tasks.async_dupe_delete[b5f16008-4738-432b-88d7-00db23867ebf] succeeded in 0.00508868200085999s: None
It might be related to https://github.com/DefectDojo/django-DefectDojo/pull/1395 (@Maffooch)
Deduplication without line numbers worked fine before #1395. I think #1395 should be reverted and reviewed/tested thoroughly before being merged back into dev. It breaks too many things that were working (or at least mostly working) before.
@Maffooch nope, the fourth bullet reads “done OUTSIDE of the parser” 😃 The only thing the parsers need to do is fill out the finding fields. Based on those fields and a global configuration, an outside process generates the hash_code and performs the deduplication.
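A rough illustration of that split (not the actual DefectDojo code): the parser only populates finding fields, and a separate step derives hash_code from a globally configured list of fields. The configuration name and field set below are made up for this sketch.

```python
# Illustration only: dedup hash computed outside the parser from a global
# field configuration. Field names mirror common Finding attributes.
import hashlib

HASHCODE_FIELDS = ["title", "cwe", "file_path", "line"]  # hypothetical global config

def compute_hash_code(finding: dict, fields=HASHCODE_FIELDS) -> str:
    """Concatenate the configured fields and hash them to get a dedup key."""
    material = "|".join(str(finding.get(f, "")) for f in fields)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

finding = {"title": "CVE-2019-0001 in libfoo", "cwe": 937, "file_path": "pom.xml", "line": None}
print(compute_hash_code(finding))
```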
It’s true that #1395 was merged a bit fast. Abandoning both “skip_duplicates” and “close_old_findings” should have been discussed before merging, as many people were probably relying on them. Maybe we should revert, agree on the exact behavior, and take the time to develop something that works for everyone? The migration part can probably be left in place, though, to avoid adding another “rollback migration”.
Yes, for Dependency Check it is easy to uniquely identify a vulnerability. For source code analysis it may be trickier: the line number may change while the vulnerability stays the same, and it should still be identified as a duplicate. However, we could get help from additional report fields. For example, Checkmarx already has an advanced algorithm to track issues even in changing code (I hope so at least!). Its reports include a “NodeId” which probably uniquely identifies a vulnerability location, independently of the line number; I will need a bit of testing to make sure it does what I think. There is another hiccup: in DefectDojo we aggregate multiple Checkmarx findings and end up with fewer of them. This is what Maffooch was talking about regarding the first deduplication pass inside the parser. For example, in the unit test file multiple_findings.xml there are 3 findings from Checkmarx with 3 unique NodeIds, but we actually keep only two of them because we only look at “sink filename” + “sink line number”, while Checkmarx also looks at “source filename” + “source line number”. So it’s not always easy, but having a configurable hash_code per parser could answer most of these scenarios.
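To make that last point concrete, a per-parser hash configuration could look roughly like this. This is a sketch only; the setting name and field lists are illustrative, not the actual DefectDojo settings.

```python
# Sketch of a configurable hash_code per parser: each scan type lists the
# fields that identify a unique finding. Names and field choices are illustrative.
HASHCODE_FIELDS_PER_SCANNER = {
    # Dependency Check: the CVE plus the affected component is usually enough.
    "Dependency Check Scan": ["cve", "component_name", "component_version"],
    # Checkmarx: include both source and sink location so the three findings in
    # multiple_findings.xml stay distinct instead of being aggregated to two.
    "Checkmarx Scan": ["cwe", "source_file", "source_line", "sink_file", "sink_line"],
}

def hash_fields_for(scan_type: str) -> list:
    # Fall back to a generic field set for parsers without a specific entry.
    return HASHCODE_FIELDS_PER_SCANNER.get(scan_type, ["title", "cwe", "file_path", "line"])
```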