druid: Druid Router UI throwing 504 when there are too many tasks
Affected Version
0.19.1
Description
We have a use case where we submit 10k+ tasks per day. When we load the Router UI, the Tasks tile usually fails with a 504 after a long wait. The same happens when we open the Ingestion tab.


Is there a way to prevent the Router from loading all tasks at once and lazy-load them instead?
A workaround is to set druid.indexer.storage.recentlyFinishedThreshold to a lower value. But we were wondering if there is a better way of doing this.
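The lazy-loading idea in the question amounts to the UI asking for one page of tasks at a time instead of the full list. Below is a minimal Java sketch of offset/limit paging over an in-memory task list. TaskPager, TaskSummary and Page are hypothetical names used only for illustration; they are not Druid's API, and the Overlord's real task endpoints may expose different parameters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TaskPager {
    // Hypothetical minimal task summary; not Druid's TaskInfo/TaskStatus classes.
    record TaskSummary(String id, long createdMillis) {}

    // One page of results; nextOffset is what a client would send to fetch the next page.
    record Page(List<TaskSummary> items, int nextOffset, boolean hasMore) {}

    private final List<TaskSummary> newestFirst;

    // Sort once, newest first, so each page request is only a cheap slice.
    TaskPager(List<TaskSummary> tasks) {
        newestFirst = new ArrayList<>(tasks);
        newestFirst.sort(Comparator.comparingLong(TaskSummary::createdMillis).reversed());
    }

    // Returns at most `limit` tasks starting at `offset` instead of the whole list.
    Page page(int offset, int limit) {
        if (offset < 0 || limit <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and limit > 0");
        }
        int from = Math.min(offset, newestFirst.size());
        int to = Math.min(from + limit, newestFirst.size());
        List<TaskSummary> items = List.copyOf(newestFirst.subList(from, to));
        return new Page(items, to, to < newestFirst.size());
    }

    public static void main(String[] args) {
        List<TaskSummary> tasks = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            tasks.add(new TaskSummary("task-" + i, 1_000L * i));
        }
        TaskPager pager = new TaskPager(tasks);
        Page first = pager.page(0, 10);
        System.out.println(first.items().get(0).id() + " " + first.hasMore()); // task-24 true
    }
}
```

Only the requested slice is copied and serialized per request, which is what keeps the response small; the cost of producing the sorted list in the first place is a separate problem.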
About this issue
- State: closed
- Created 3 years ago
- Reactions: 4
- Comments: 15 (12 by maintainers)
I have done some profiling on our stack here; my analysis follows. We are configured with HeapMemoryTaskStorage. By issuing repeated SQL requests against the broker (for example with ab, ApacheBench), we can see the workload increase on the overlord. Taking a CPU profile of the overlord host and focusing on the CPU attributed to the /tasks endpoint shows that over 50% of the CPU load is in HeapMemoryTaskStorage::getTasks, and only a small % of time is in serialization. Notice specifically, in the before/after flame graphs below, the share of time (width of bar) spent in getCompletedTaskInfo... (highlighted in a magenta-ish colour), and that the bulk of that time is in sortedCopy.
[Flame graph: before]
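The profile points at sortedCopy, i.e. sorting every completed task on each request. When only the newest k tasks are actually needed, one common alternative to a full O(n log n) sort is a bounded min-heap, which costs O(n log k). The Java sketch below illustrates that technique under that assumption; it is not the change that was made here, and RecentTasks, FinishedTask and newest are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class RecentTasks {
    // Hypothetical finished-task record; not Druid's TaskInfo class.
    record FinishedTask(String id, long finishedMillis) {}

    // Returns the k most recently finished tasks, newest first, without sorting the whole input.
    static List<FinishedTask> newest(List<FinishedTask> tasks, int k) {
        if (k <= 0) {
            return List.of();
        }
        Comparator<FinishedTask> byTime = Comparator.comparingLong(FinishedTask::finishedMillis);
        // Min-heap on finish time: the root is the oldest of the k tasks kept so far.
        PriorityQueue<FinishedTask> heap = new PriorityQueue<>(k, byTime);
        for (FinishedTask t : tasks) {
            if (heap.size() < k) {
                heap.add(t);
            } else if (byTime.compare(t, heap.peek()) > 0) {
                // t is newer than the oldest kept task, so it replaces it.
                heap.poll();
                heap.add(t);
            }
        }
        // Only the k survivors are sorted here, so this final sort is O(k log k).
        List<FinishedTask> result = new ArrayList<>(heap);
        result.sort(byTime.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<FinishedTask> tasks = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            tasks.add(new FinishedTask("task-" + i, i));
        }
        // Prints task-9999, task-9998, task-9997 on separate lines.
        newest(tasks, 3).forEach(t -> System.out.println(t.id()));
    }
}
```

With 10k+ tasks per day and a small page size, k is far smaller than n, which is where the heap pays off; if every completed task really must be returned, a full sort is unavoidable and the threshold workaround is what bounds n.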
After changes, the share of overall CPU time spent in getCompletedTaskInfo... is significantly reduced, so much so that serialization now takes far more time than the query.

Hey @jasonk000, we're using Postgres on AWS RDS as our metadata store, i.e. our druid.indexer.storage.type is metadata