presto: Memory leak in `com.facebook.presto.memory.ClusterMemoryManager#changeListeners`
Example log: https://api.travis-ci.org/v3/job/389838962/log.txt Commit: https://github.com/prestodb/presto/commit/48ea8f3eaa9a3047893d0acd5304035912ebb681
Example PR: https://github.com/prestodb/presto/pull/10808 log: https://api.travis-ci.org/v3/job/390987547/log.txt
log tail:
2018-06-12T06:46:47.197+0545 INFO Major GC: application 12ms, stopped 6406ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.197+0545 INFO Major GC: application 12ms, stopped 6406ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.197+0545 INFO Major GC: application 12ms, stopped 6406ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:47.198+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 6ms, stopped 6222ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.067+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:53.068+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:46:59.581+0545 WARNING Node state update request to http://127.0.0.1:44031/v1/info/state has not returned in 18.21s
2018-06-12T06:46:59.582+0545 WARNING Error fetching node state from http://127.0.0.1:36641/v1/info/state: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
2018-06-12T06:46:59.584+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:37427/v1/service/presto/general
2018-06-12T06:47:49.816+0545 INFO Discovery server connect succeeded for refresh (presto/general)
2018-06-12T06:48:13.880+0545 SEVERE Expected service announcement after 8000.00ms, but announcement was delayed 1.02m
2018-06-12T06:48:25.391+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:48:36.835+0545 SEVERE Expected service announcement after 8000.00ms, but announcement was delayed 28.75s
2018-06-12T06:48:42.965+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:37427/v1/service/presto/general
2018-06-12T06:48:54.927+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:41456/v1/service/presto/general
2018-06-12T06:49:06.700+0545 SEVERE Cannot connect to discovery server for announce: Announcement failed for http://127.0.0.1:41456
2018-06-12T06:49:06.700+0545 SEVERE Service announcement failed after 2.62m. Next request will happen within 8000.00ms
2018-06-12T06:49:06.700+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:41456/v1/service/presto/general
2018-06-12T06:49:06.703+0545 INFO Discovery server connect succeeded for refresh (presto/general)
2018-06-12T06:49:06.703+0545 INFO Discovery server connect succeeded for refresh (presto/general)
2018-06-12T06:49:12.678+0545 SEVERE Expected service announcement after 8000.00ms, but announcement was delayed 1.84m
2018-06-12T06:49:12.679+0545 SEVERE Cannot connect to discovery server for announce: Announcement failed for http://127.0.0.1:37427
2018-06-12T06:49:12.679+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:37427/v1/service/presto/general
2018-06-12T06:49:12.679+0545 SEVERE Service announcement failed after 2.72m. Next request will happen within 8000.00ms
2018-06-12T06:49:18.676+0545 SEVERE Service announcement failed after 5.19m. Next request will happen within 2.00ms
2018-06-12T06:49:24.589+0545 SEVERE Expected service announcement after 8000.00ms, but announcement was delayed 1.94m
2018-06-12T06:49:24.591+0545 SEVERE Expected service announcement after 8000.00ms, but announcement was delayed 1.94m
2018-06-12T06:49:24.591+0545 SEVERE Cannot connect to discovery server for refresh (presto/general): Lookup of presto failed for http://127.0.0.1:37427/v1/service/presto/general
2018-06-12T06:49:24.593+0545 SEVERE Cannot connect to discovery server for announce: Announcement failed for http://127.0.0.1:37427
2018-06-12T06:49:30.664+0545 SEVERE Service announcement failed after 3.02m. Next request will happen within 8000.00ms
2018-06-12T06:49:54.974+0545 WARNING Error fetching memory info from http://127.0.0.1:36641/v1/memory: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
2018-06-12T06:50:00.882+0545 SEVERE Expected service announcement after 4.00ms, but announcement was delayed 42.21s
2018-06-12T06:50:07.174+0545 WARNING Node state update request to http://127.0.0.1:44031/v1/info/state has not returned in 205.80s
2018-06-12T06:50:07.175+0545 WARNING Error fetching node state from http://127.0.0.1:44031/v1/info/state: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
2018-06-12T06:50:07.175+0545 INFO Major GC: application 22ms, stopped 6435ms:: 2.12GB -> 2.12GB
2018-06-12T06:50:07.175+0545 INFO Major GC: application 7ms, stopped 5955ms:: 2.12GB -> 2.12GB
Should we increase tests memory limit? Could there be some mem leak?
CC: @electrum
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 1
- Comments: 15 (15 by maintainers)
I see now. 10M
ClusterMemoryManager$$Lambda$7855
are queued executions incom.facebook.presto.memory.ClusterMemoryManager#listenerExecutor
I didn’t dig why the executions queue up (might be a bottleneck, since the executor is single threaded, or some kind of a lock).
Recently the problem became more server and more easily reproducible locally. I am under impression that
MemoryAwareQueryExecution
adds listeners inClusterMemoryManager
:that are never going to be removed.
I am not entirely sure yet, since all info I got so far is the 10M instances of
com.facebook.presto.memory.ClusterMemoryManager$$Lambda$7855
cc @nezihyigitbasi @raghavsethi