druid: Druid Router UI throwing 504 when there are too many tasks
Affected Version
0.19.1
Description
We have a use case where we submit 10k+ tasks per day. When loading the Router UI, the Tasks tile usually throws a 504 after a long wait. The same happens when we open the Ingestion tab.
Is there a way to prevent the Router from loading all tasks at once, and rather just lazy load?
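For context on the question above, here is a minimal sketch of asking the Overlord for a bounded task list directly instead of the full set. It assumes the state, max and datasource query parameters documented for the Overlord tasks API (max applies to completed tasks, as far as I know); the host name is hypothetical, and whether 0.19.1 behaves exactly this way is an assumption. It does not change what the Router UI itself requests.

```python
from urllib.parse import urlencode

# Hypothetical Overlord address; replace with your own host and port.
OVERLORD = "http://overlord.example.com:8090"


def tasks_url(state="complete", max_tasks=100, datasource=None):
    """Build a URL for the Overlord task list that asks for a bounded result."""
    # Assumed query parameters: state, max, datasource.
    params = {"state": state, "max": max_tasks}
    if datasource is not None:
        params["datasource"] = datasource
    return f"{OVERLORD}/druid/indexer/v1/tasks?{urlencode(params)}"


# Only builds the URL; issuing the request is left to the caller.
print(tasks_url(max_tasks=50))
```

Fetching a URL like this with any HTTP client should return at most the requested number of completed tasks, which is one way to check whether the slowness comes from the size of the task list.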
A workaround is to set druid.indexer.storage.recentlyFinishedThreshold to a lower value. But we were wondering if there is a better way of doing this.
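For reference, the workaround would look roughly like this in the Overlord's runtime.properties. The value PT1H is only an illustration (an ISO 8601 period; the default is 24 hours, as far as I know), so pick whatever window still lets you see the tasks you care about.

```properties
# Overlord runtime.properties
# Keep recently finished tasks for 1 hour instead of the default window.
# PT1H is an illustrative value, not a recommendation.
druid.indexer.storage.recentlyFinishedThreshold=PT1H
```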
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 15 (12 by maintainers)
I have done some profiling on our stack here, my analysis follows. We are configured with HeapMemoryTaskStorage. By issuing repeated SQL requests against broker (such as with ab) we can see the workload increase on overlord. Taking a CPU profile of the overlord host and focusing on the CPU related to the /tasks endpoint gives a view that over 50% of the CPU load is in HeapMemoryTaskStorage::getTasks, and only a small % of time in serialization.
Notice specifically in the before/after below that the % of time (width of bar) of getCompletedTaskInfo... (highlighted in a magenta-ish colour), and that the bulk of the time is in sortedCopy.
Before
After changes, getCompletedTaskInfo... is significantly reduced as a % of the overall CPU time, so much that serialization is now far larger than the query time.
Hey @jasonk000 we’re using Postgres on AWS RDS as our druid.indexer.storage.type, i.e., metadata
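To illustrate the profiling observation above (most of the time spent in sortedCopy), here is a rough sketch of why sorting every stored task on each request is expensive compared with filtering first and keeping only the newest few. This is not the actual Druid change; TaskInfo, recent_sorted_copy and recent_bounded are hypothetical names made up for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    """Hypothetical stand-in for a completed task record."""
    task_id: str
    created_ms: int


def recent_sorted_copy(tasks, cutoff_ms):
    """Costly shape: sort every stored task, then drop the old ones."""
    ordered = sorted(tasks, key=lambda t: t.created_ms, reverse=True)
    return [t for t in ordered if t.created_ms >= cutoff_ms]


def recent_bounded(tasks, cutoff_ms, max_tasks):
    """Cheaper shape: filter first, then keep only the newest max_tasks."""
    recent = (t for t in tasks if t.created_ms >= cutoff_ms)
    # nlargest returns its result sorted from newest to oldest.
    return heapq.nlargest(max_tasks, recent, key=lambda t: t.created_ms)


tasks = [TaskInfo(f"task-{i}", created_ms=i) for i in range(10_000)]
newest = recent_bounded(tasks, cutoff_ms=9_000, max_tasks=3)
print([t.task_id for t in newest])
```

With 10k+ tasks per day, the first shape does a full sort on every request, while the second only touches the tasks inside the time window and keeps a small heap.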