spring-boot: Iterating over JarFile.entries() is very slow.
Iterating over org.springframework.boot.loader.jar.JarFile#entries() is very slow. The successive calls to JarFileEntries#getEntry by EntryIterator#next cause the underlying RandomAccessFile to jump back and forth in order to re-read the file headers from the central directory of the jar file. This could be much faster if JarFileEntries#visitFileHeader stored the complete FileHeader instead of just its offset and the hash of its name.
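To make the two strategies concrete, here is a minimal sketch in plain Java. It is not the spring-boot-loader code: the class HeaderIndexSketch, its Header type and the toy record layout (a UTF string followed by a long) are all hypothetical stand-ins for the central directory. Variant A keeps only offsets and seeks on every access, variant B parses everything once and keeps the objects.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in spring-boot-loader.
final class HeaderIndexSketch {

    /** Toy stand-in for a parsed central directory record. */
    static final class Header {
        final String name;
        final long size;

        Header(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    /** Writes one toy record per name and returns the offset at which each record starts. */
    static long[] writeRecords(File file, String[] names) throws IOException {
        long[] offsets = new long[names.length];
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            for (int i = 0; i < names.length; i++) {
                offsets[i] = out.size(); // bytes written so far
                out.writeUTF(names[i]);
                out.writeLong(names[i].length());
            }
        }
        return offsets;
    }

    /** Variant A: only the offset is kept, so every access seeks and re-parses. */
    static Header readAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        String name = raf.readUTF();
        long size = raf.readLong();
        return new Header(name, size);
    }

    /** Variant B: one sequential pass, then all parsed records stay in memory. */
    static List<Header> readAll(RandomAccessFile raf, int count) throws IOException {
        raf.seek(0);
        List<Header> headers = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String name = raf.readUTF();
            long size = raf.readLong();
            headers.add(new Header(name, size));
        }
        return headers;
    }
}
```

Variant A costs one seek plus one parse per lookup but stores only a long per record. Variant B avoids the repeated seeks at the price of one retained object per record.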
Some Background
While working on https://github.com/joinfaces/joinfaces/pull/565 I noticed that ClassGraph is about 500ms slower when scanning a repackaged Spring Boot application than when scanning the same application in its unpacked form. So I had a look at the code of ClassGraph and saw that it extracted all the nested jars in order to scan them. Hoping to improve the performance of scanning nested jars, I prepared the following patch to use the JarFile implementation of spring-boot-loader in order to avoid the extra cost of extracting all the nested jars:
https://github.com/larsgrefer/classgraph/compare/ba4c69347eaf915571e9f5142e09f7a481471570...cfb317aff4d6949afbddcc1fd0ad78b118ef52ec?expand=1
Surprisingly, this approach is even a bit slower, so I dug deeper into the code and traced the performance difference down to the iteration done here: https://github.com/classgraph/classgraph/blob/b170d2bebb871824f7d53d54aa7a9b6939f25cf0/src/main/java/io/github/classgraph/utils/JarfileMetadataReader.java#L148
In my tests, iterating over an org.springframework.boot.loader.jar.JarFile is about 5 to 10 times slower than iterating over a java.util.zip.ZipFile. This performance impact is so severe that it's even faster to extract the nested jar first in order to be able to use java.util.zip.ZipFile.
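A comparison like this can be reproduced with a small timing harness. The sketch below is not the benchmark used for the numbers above; the class EntryIterationTimer and its methods are hypothetical, and it only exercises java.util.zip.ZipFile from the JDK. Assuming the spring-boot-loader JarFile is a subclass of java.util.jar.JarFile (and therefore of ZipFile), an instance of it could be passed to the same methods to compare both implementations.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical harness; the names are illustrative and not part of any library.
final class EntryIterationTimer {

    /** Walks all entries once and returns how many were seen. */
    static int countEntries(ZipFile zip) {
        int count = 0;
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            // Touch the name so the entry metadata is actually read.
            if (entry.getName() != null) {
                count++;
            }
        }
        return count;
    }

    /** Repeats the iteration and returns the fastest round in nanoseconds. */
    static long bestIterationNanos(ZipFile zip, int rounds) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            countEntries(zip);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EntryIterationTimer <path-to-jar>");
            return;
        }
        try (ZipFile zip = new ZipFile(new File(args[0]))) {
            int entries = countEntries(zip);
            long nanos = bestIterationNanos(zip, 10);
            System.out.println(entries + " entries, best round: " + (nanos / 1_000) + " us");
        }
    }
}
```

Taking the best of several rounds reduces the influence of JIT warm-up and file system caching, but this is still a rough measurement and not a substitute for a proper benchmark framework.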
Conclusion
After reading the commit message of https://github.com/spring-projects/spring-boot/commit/e2368b909b46bc5bcec6792fb208ba9bd0fe6aaa, it seems to me that memory efficiency is more important to you than iteration performance. So my question is whether you would accept a pull request that changes the current behavior, or that allows ClassGraph to change the behavior of the JarFileEntries implementation, and what this PR should look like.
About this issue
- State: closed
- Created 6 years ago
- Comments: 24 (21 by maintainers)
While working on this I noticed that my idea would have a much bigger impact on memory usage than I thought it would. In the current implementation only one CentralDirectoryFileHeader instance is created for each jar file and subsequently filled with the information of the jar file entries. With my implementation I'd have to create one instance per entry.
In the meantime we found two other solutions to our initial problem:
With the build-time scan we reduced this part of the application startup to ~200ms for reading a list of class names from a text file and loading them.
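For illustration, the runtime half of such a build-time scan might look roughly like the sketch below. This is not the actual JoinFaces code: the class ClassListLoader and the file format (one fully qualified class name per line, blank lines ignored) are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the one-name-per-line format is an assumption.
final class ClassListLoader {

    /** Reads class names, one per line, skipping blank lines. */
    static List<String> readClassNames(Reader source) throws IOException {
        List<String> names = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Loads each named class without running its static initializers. */
    static List<Class<?>> loadAll(List<String> names, ClassLoader loader)
            throws ClassNotFoundException {
        List<Class<?>> classes = new ArrayList<>(names.size());
        for (String name : names) {
            classes.add(Class.forName(name, false, loader));
        }
        return classes;
    }
}
```

The point of this approach is that no jar has to be iterated at startup: the expensive scan happens once at build time, and the application only pays for reading a short text file and resolving the listed classes.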