classgraph: OutOfMemoryError spins library out of control
Issue summary
`nonapi.io.github.classgraph.fastzipfilereader.NestedJarHandler#close` may never complete if the thread was interrupted, which is always the case when scanning failed. See below for details.
Background
I was writing some unit tests for a really basic Neo4J graph, but while Neo4J-OGM was looking for my entities during test setup, it just never completed. With some very roundabout debugging (all the async and OOM interplay was making me tear my hair out), I managed to narrow down the root cause to how classgraph works. I have 32G of memory, and Java defaults to using 1/4 of it per process, which means I cannot run much in parallel, so I have a global 256M override for each process. It works fine for Gradle, Neo4J, and other traditionally resource-hungry setups. It even works for my test setup most of the time, so for this repro I lowered the memory limit further to make the failure consistent.
Repro Setup
Trivial library usage:
```java
// implementation "io.github.classgraph:classgraph:4.8.62"
ScanResult result = new ClassGraph()
        .whitelistPackages("pack.age")
        .verbose()
        .scan();
```
plus a dependency with a nested jar in it:
```groovy
runtimeOnly "org.neo4j:neo4j-lucene-upgrade:3.4.9"
```
and the non-trivial part: resource-constrained memory via `-Xmx128M` (see `build.gradle`).
Full minimal repro can be found here: https://github.com/TWiStErRob/repros/tree/master/classgraph/oom-NestedJarHandler-spin
Repro steps
```shell
gradlew run
```
or
- Import the project into IntelliJ
- Execute `Main.main` (should pass, depending on machine setup)
- Edit the Run Configuration to have VM options: `-Xmx128M`
- Run or debug again (will fail; expected: print a line)
Pieces of the puzzle
Here is what I found to be contributing to the issue.
Nested Jar files
```groovy
runtimeOnly "org.neo4j:neo4j-lucene-upgrade:3.4.9"
```
is an example from Neo4J/OGM which contains `!/lib/lucene-backward-codecs-5.5.0.jar`. This is required to trigger the
```java
} else {
    // This path has one or more '!' sections.
```
code path in `NestedJarHandler`.
Large allocation to trigger OOM
This is baked into the library. `MappedByteBufferResources` allocates 64M per JAR file, which is quite large considering there could be many jars like this on the classpath, so a lot of allocations could be happening at the same time, consuming tons of memory. Note: I have a 14-core/28-logical-core processor, and the library runs on 39 threads. So if all of those are processing nested JARs, 2.5G of memory is required. That doesn’t sound “ultra-lightweight” 😃
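The 2.5G figure above is just the thread count times the per-jar buffer size; a quick sanity check of that arithmetic (the numbers are taken from this report):

```java
// Back-of-the-envelope check of the worst-case buffer footprint described above.
public class BufferFootprint {
    public static void main(String[] args) {
        int threads = 39;                       // observed worker thread count
        long perJarBuffer = 64L * 1024 * 1024;  // 64 MB buffer per nested jar
        long worstCaseMb = threads * perJarBuffer / (1024 * 1024);
        System.out.println(worstCaseMb + " MB"); // prints "2496 MB", i.e. ~2.5 GB
    }
}
```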
This allocation throws an OOM, which is caught by `AutoCloseableExecutorService.afterExecute`, and the thread is interrupted. I think this interrupt doesn’t hurt `NestedJarHandler.close`, so it could be an irrelevant red herring, but this location came up during the investigation so I thought I would mention it.
`Scanner.call` catches and cleans up
When an exception occurs (e.g. our OOM), `removeTemporaryFilesAfterScan` is set to `true` and threads are interrupted. `finally` then comes and tries to close `nestedJarHandler`. The relevant parts of the method:
```java
@Override
public ScanResult call() throws InterruptedException, CancellationException, ExecutionException {
    try {
        scanResult = openClasspathElementsThenScan();
    } catch (final Throwable e) {
        // Since an exception was thrown, remove temporary files
        removeTemporaryFilesAfterScan = true;
        // Stop any running threads (should not be needed, threads should already be quiescent)
        interruptionChecker.interrupt();
    } finally {
        if (removeTemporaryFilesAfterScan) {
            // If removeTemporaryFilesAfterScan was set, remove temp files and close resources,
            // zipfiles and modules
            nestedJarHandler.close(topLevelLog);
        }
    }
    return scanResult;
}
```
`NestedJarHandler#close` spins
See the numbered inline comments for an explanation.
```java
if (canonicalFileToPhysicalZipFileMap != null) {
    // (4) spins out of control, because of (3)
    while (!canonicalFileToPhysicalZipFileMap.isEmpty()) {
        try {
            // (1) throws InterruptedException if the thread was interrupted,
            //     which is always, because Scanner's error handling interrupted it
            for (final Entry<File, PhysicalZipFile> ent : canonicalFileToPhysicalZipFileMap.entries()) {
                final PhysicalZipFile physicalZipFile = ent.getValue();
                physicalZipFile.close();
                // (3) never executed, because of the interruption handling in
                //     SingletonMap (backed by ConcurrentHashMap)
                canonicalFileToPhysicalZipFileMap.remove(ent.getKey());
            }
        } catch (final InterruptedException e) {
            // (2) re-interrupts the Thread
            interruptionChecker.interrupt();
        }
    }
    canonicalFileToPhysicalZipFileMap = null;
}
```
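One possible shape for a non-spinning close loop (my own illustrative sketch against plain JDK types, not the library's `SingletonMap`/`PhysicalZipFile` API): remove each entry before the interruptible close call so the loop always makes progress, and restore the interrupt flag only once, at the end.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cleanup that terminates even when the calling thread
// is already interrupted. The two key changes versus the spinning loop above:
// the interrupt flag is cleared up front (and restored afterwards), and each
// entry is removed BEFORE its close() call, so a failing close cannot stall
// the loop on the same entry forever.
public class SafeClose {
    static int closeAll(Map<String, Closeable> open) {
        boolean wasInterrupted = Thread.interrupted(); // clear flag so close() can run
        int closed = 0;
        for (var it = open.entrySet().iterator(); it.hasNext(); ) {
            var ent = it.next();
            it.remove();                // progress is guaranteed regardless of close()
            try {
                ent.getValue().close();
                closed++;
            } catch (IOException ignored) {
                // best-effort cleanup: keep going
            }
        }
        if (wasInterrupted) {
            Thread.currentThread().interrupt(); // restore interrupt status for callers
        }
        return closed;
    }

    public static void main(String[] args) {
        Map<String, Closeable> open = new ConcurrentHashMap<>();
        open.put("a.jar", () -> {});
        open.put("b.jar", () -> { throw new IOException("boom"); });
        Thread.currentThread().interrupt(); // simulate the interrupted scan thread
        System.out.println(closeAll(open) + " closed, map empty=" + open.isEmpty());
        // prints "1 closed, map empty=true"
    }
}
```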
Proposed solutions
- Quick treatment, with the library gracefully executing to successful completion: catch the OOM explicitly when allocating `buf` in `MappedByteBufferResources`, and spill to disk if that happens. This would be in line with how `IOException` from `makeTempFile` is caught.
- Lower memory usage (could be combined with the previous): `fastZipEntryToZipFileSliceMap` knows the size of the nested jar file (`childZipEntry`); if that was passed down, the decision to spill to disk could be made immediately, before allocating and without reading the whole stream.
- Don’t spin: in any case, `NestedJarHandler.close` should probably be able to close `canonicalFileToPhysicalZipFileMap` even in an interrupted state, so that failed classpath scans can receive the exceptional termination and handle it accordingly.
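The first suggestion could look roughly like this (all names here are illustrative, not the library's actual `MappedByteBufferResources` code): attempt the in-RAM buffer, and fall back to a temp file when the allocation itself fails with `OutOfMemoryError`.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "catch OOM and spill to disk". The method returns
// either a byte[] (nested jar buffered in RAM) or a File (spilled to disk),
// mirroring how an IOException from temp-file creation is already handled.
public class SpillOnOom {
    static Object bufferOrSpill(InputStream in, int sizeHint) throws IOException {
        try {
            byte[] buf = new byte[sizeHint]; // may throw OutOfMemoryError
            int n = 0, r;
            while (n < sizeHint && (r = in.read(buf, n, sizeHint - n)) > 0) {
                n += r;
            }
            return buf;
        } catch (OutOfMemoryError oom) {
            // Recovery path: copy the remaining stream to a temp file instead
            File tmp = File.createTempFile("nested-jar-", ".jar");
            tmp.deleteOnExit();
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    public static void main(String[] args) throws IOException {
        Object result = bufferOrSpill(new ByteArrayInputStream(new byte[] { 1, 2, 3 }), 3);
        System.out.println(result instanceof byte[] ? "buffered" : "spilled");
        // prints "buffered" (the tiny allocation succeeds)
    }
}
```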
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 2
- Comments: 18 (18 by maintainers)
Commits related to this issue
- Repro for classgraph/classgraph#400 — committed to TWiStErRob/repros by TWiStErRob 4 years ago
- Trim 64MB buffer if jar is smaller; use InputStream size hint (#400) — committed to classgraph/classgraph by lukehutch 4 years ago
- Add `ClassGraph#setMaxBufferedJarRAMSize(int)` (#400) — committed to classgraph/classgraph by lukehutch 4 years ago
- Add disableMemoryMapping() to use RandomAccessFile instead of mmap: #400 — committed to classgraph/classgraph by lukehutch 4 years ago
- Merge pull request #401 from TWiStErRob/issue400_test Add performance regression test for #400 — committed to classgraph/classgraph by lukehutch 4 years ago
- Avoid infinite loop on interruption (#400) — committed to classgraph/classgraph by lukehutch 4 years ago
- Replace file slicing layer and remove mmap support (#400) — committed to classgraph/classgraph by lukehutch 4 years ago
- Add mmap support back as an option (#400) — committed to classgraph/classgraph by lukehutch 4 years ago
- GH-762 - Update classgraph dependency to start up in low-memory environments. See https://github.com/classgraph/classgraph/issues/400 for more details. Also be precise in our internal docs why we ar... — committed to neo4j/neo4j-ogm by TWiStErRob 4 years ago
- GH-762 - Update classgraph dependency to start up in low-memory environments. See https://github.com/classgraph/classgraph/issues/400 for more details. Also be precise in our internal docs why we ar... — committed to neo4j/neo4j-ogm by TWiStErRob 4 years ago
- [SVN] r2932 Cinema - Heroku in /Projects/Cinema/ [FIX] Increase Gradle daemon and hence Unit runner memory to avoid https://github.com/classgraph/classgraph/issues/400 — committed to TWiStErRob/net.twisterrob.cinema by TWiStErRob 4 years ago
After working around #403 by using JDK 9 for the runtime, I can confirm that a complex Neo4J-OGM, Kotlin, Ktor app runs happily with `-Xmx64M`; `-Xmx32M` still runs into OOM, but doesn’t spin.
You’re welcome! That’s the goal. Especially when somebody files a bug that is as detailed and deeply researched as yours is!
I usually try to ensure any bug that is reported is fixed within 24 hours, and a new release pushed out – but this particular bug report has triggered a whole set of domino changes that affect ClassGraph at a very deep level, and it might take a few days to get all the changes in place. I’ll try to get some more of that done tomorrow.
Right, but I thought you were saying `-Xmx` sets the maximum limit for all allocations, both heap and memory-mapped files? Or do memory mappings fall outside the `-Xmx` limit (while still being limited, probably relative to RAM size)? (I guess I could just go test this, but it seems like you know the answer…)

Yes, this was a major oversight, that I forgot to trim that array! I think that alone is the source of the sometimes dramatic drop in memory consumption you saw in your latest comment.
I assumed that zipping a zipfile rarely if ever resulted in further compression, but I guess if a zipfile already contains stored (uncompressed) content, it should be compressible. Or maybe some zipfiles contain central directories that are large enough that zip thinks it is worth zipping the file for the compression gains. However, I just looked at my `~/.m2/repository` cache, and there isn’t a single jar in there that contains a nested jar. So I think it is only web applications, Spring-Boot applications, etc. (with custom classloaders) that cause this issue to arise, and not libraries in general.

Well, that’s even more amazing then, that you put so much time and effort into tracking down the source of the problem, when it was a dependency of a dependency that was causing it!
I always thought so before, but I’m not so sure anymore. Actually the new `RandomAccessFile` method only ever relies on operating system buffers, which will only ever use unused RAM, so the memory overhead is basically zero.

Yes, and this is done lazily. But I didn’t know that OS file buffering worked so well with `RandomAccessFile` – I thought it would cause many additional disk reads, leading to greater overhead.

I should test this on at least Windows before I totally pull the plug on `MappedByteBuffer`, but to be honest the mmap support in ClassGraph has caused a neverending series of headaches over the years, e.g. because without using `misc.Unsafe`, you can’t forcibly unmap mapped byte buffers – you just have to wait for the GC to decide to collect them, then they’re freed – and if you don’t unmap them, Windows holds a file lock on them, which can cause files that were supposed to be deleted to be left behind. Anyway, although ClassGraph seems to work flawlessly with mmap now across all three major operating systems (and it took a lot of work to get there), I won’t shed any tears ripping all of it out as long as the performance is not impacted.

Thanks, I really appreciate the contribution!
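The forcible-unmap workaround mentioned above can be sketched as follows (my own illustrative code, not ClassGraph's): on JDK 9+, `sun.misc.Unsafe.invokeCleaner` releases a mapping immediately, which is what makes the mapped file deletable on Windows without waiting for GC.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: forcibly releasing a MappedByteBuffer via sun.misc.Unsafe (JDK 9+).
// Without this, the mapping lives until the GC collects the buffer, and on
// Windows the mapped file cannot be deleted in the meantime.
public class Unmap {
    static void unmap(MappedByteBuffer buf) throws Exception {
        Field f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Object unsafe = f.get(null);
        Method invokeCleaner = unsafe.getClass()
                .getMethod("invokeCleaner", java.nio.ByteBuffer.class);
        invokeCleaner.invoke(unsafe, buf); // releases the mapping immediately
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("mapped", ".bin");
        Files.write(tmp, new byte[] { 42 });
        MappedByteBuffer buf;
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, 1);
        }
        System.out.println(buf.get(0));        // prints 42
        unmap(buf);                            // buf must never be touched after this!
        Files.delete(tmp);                     // now safe to delete, even on Windows
        System.out.println(Files.exists(tmp)); // prints false
    }
}
```

Note the hazard the comment hints at: touching the buffer after `invokeCleaner` crashes the JVM, which is part of why mmap support is such a headache.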
I played around a bit more; I was curious whether I could reproduce what I jokingly referred to above.
I changed the terminating code to:
and gave the process much more memory with `-Xmx10G`.
The numbers are: before GC/after GC on 4.8.62 → before GC/after GC on 4.8.63-SNAPSHOT, in megabytes.
- Without any dependency: 9M/2M → 9M/2M
- With `runtimeOnly "org.neo4j:neo4j-lucene-upgrade:3.4.9"`: 208M/202M → 15M/4M
- With `runtimeOnly files("libs/nested-stored.jar")`: 7M/3M → 7M/3M
- With `runtimeOnly files("libs/nested-deflated.jar")`: 8482M/8453M → 14M/4M

The last two are artificial files with 128 nested JAR files, which I can’t attach here 😦 I know it’s unrealistic, but it proves your fix is working, and it is working really well! See also PR #401.
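For reference, a minimal sketch of how such before-GC/after-GC heap numbers can be collected (my illustrative code; the actual terminating code used for the measurement is not shown above):

```java
// Reports used heap in MB before and after requesting a GC. System.gc() is
// only a hint, so the "after" number is best-effort, but it is good enough
// to show whether large buffers are still reachable after a scan.
public class HeapReport {
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        long before = usedMb();
        System.gc(); // request (not force) a collection
        long after = usedMb();
        System.out.println("before GC: " + before + "M, after GC: " + after + "M");
    }
}
```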
@TWiStErRob you sir deserve a medal for this bug report. I wish all bug reports were this detailed and well-researched!!
Not quite – firstly, this is virtual memory space that is used, not allocated RAM. The full 2.5GB will only become “resident” (swapped in) if the entire range is in constant active use – and under RAM pressure, virtual memory pages that haven’t been used recently can simply be swapped back out to disk if they have changed, or in the case of jars (which never change), simply dropped from RAM. Also, unless you are running multiple copies of the same program in completely different processes, memory-mapped jars for a single `ClassGraph` instance are shared between threads in the same process (e.g. the ClassGraph worker threads) – and even if you do have multiple processes mapping the same file to RAM, any modern OS (at least Linux, but I assume also Mac OS X and Windows?) should see that the same file is already mapped when a request to map it again is issued, and re-use the mapping without allocating more virtual memory range. This should also cause mapping reuse across `ClassGraph` instances within or across processes.

It’s still possible to run out of virtual memory space easily on a 32-bit OS though, and it’s possible to hit an OOM condition by running out of Java memory-map file handles on Linux (there’s a limit of 64k of them).
(Update, based on your suggestion to use `-Xmx128M`: Oh, I hadn’t considered that Java limits not just the resident size but also the entire accessible virtual memory size with `-Xmx`! That’s actually pretty horrifying if true…)

ClassGraph can still consume an enormous amount of RAM simply to build the graph of `ClassInfo` objects, and if you have multiple instances of this, and/or you’re scanning a large number of classes, this can quickly consume all RAM (#338, #371). `ClassInfo` objects cannot be reused across `ClassGraph` instances or across processes. I spent a lot of time carefully looking at RAM usage for #338, and reduced RAM usage as much as possible, but there was really no way to reduce it further without the caller taking actions to reduce the size of the `ScanResult` by using more aggressive whitelisting etc. – so further lowering memory usage in general is not really a possibility.

This is in tension with the fact that some users of ClassGraph really, really don’t want ClassGraph ever writing anything to disk, or perhaps don’t even have a writeable temporary directory at all (#236). The 64MB default was intentionally set high to cause ClassGraph to avoid writing to disk wherever possible. Also, what you’re suggesting – lowering the 64M and spilling to disk more readily – would absolutely kill performance for some runtimes with lots of nested jars, e.g. a Spring-Boot application, because right now, as long as a nested jar is stored and not deflated, the nested jar can be read directly, using file slicing, when its own entries need to be deflated. If the 64M were lowered, then it would be more common that the inner jar would need to be copied to a temporary file on disk just to have its entries read, which would be wasteful.
Probably though this should be configurable on a per-application basis, so it’s a good idea to add a configuration option for it. I’ll do that.
On your suggestions:
Lots to do here – thanks for bringing these issues to my attention!