liquibase: Too many open files error on application server: LiquibaseEntityResolver opens too many file descriptors and never closes them
Environment
- Java version: OpenJDK 11
- Liquibase Version: newest (4.3.3)
- Liquibase Integration & Version: Java API (we’re using new
new Liquibase().listUnrunChangeSets
manually). It’s called from within a webapp with Jetty 9 / Tomcat 8 / Tomcat 9 / JSF 2.3 / Weld 3.1.7. - Database Vendor & Version: tested with MySQL 8 and PostgreSQL 12, but probably independent of DBMS
- Operating System Type & Version: Ubuntu (tested on 18.04 and 20.10)
Description
Symptoms are the same as #1449, but not sure if it’s the same cause (we’re not using Spring). When running Liquibase, it opens many file decriptors and never closes them. They only get closed when the Garbabe Collector runs, which may be a long in the future. This results in our application server crashing.
I first observed this on a test environment server with Tomcat 8 after upgrading vom liquibase 3 to liquibase 4. The webapp failed to start, and the log contained many too many open files
messages. Checking via /proc/pid/fd
indeed revealed that the number of file descriptors had reached the limit. At first I could not reproduce this locally on my machine, but now I can and I think I know why it happens:
The problematic part is in LiquibaseEntitiyResolve, which is called when Liquibase parses the database changelogs (we’ve got 11 XML changelog files) after we call `Liquibase#listUnrunChangeSets´.
When the LiquibaseEntityResolver
is constructed, it also creates a fallbackResourceAccessor
of type ClassLoaderResourceAccessor
. The ClassLoaderResourceAccessor
loads all JARs of the application, but does so only lazily only when its #openStreams
method is called.
Next the LiquibaseEntityResolver
creates a CompositeResourceAccessor
with the resource loader from the current scope and the fallback resource loader. Now, and that’s where the error starts happening, the composite resource accessor always calls openStreams
on all accessors, regardless of whether an earlier found something or not.
This means that init
method of the fallback ClassLoaderResourceAccessor
is always called, loading all JARs.
In our case the LiquibaseEntityResolver
is created about ~20 times, and each time it loads ~300 JARs. The fiel descriptor limit on the server is 4000, so it crashes before liquibase is done.
What I had been wondering is why I cannot observe this behavior locally, but now I’ve got an explanation for that. The same process I described above actually does happen on my machine, but: the file descriptors are closed implicitly by the finalize
of the ZipFileSystem
. When I debug into the application, I can observe the file descriptors growing on my machine as well. It’s just that I seem to get lucky that the garbage collector kicks in early enough before even more file descriptors are opened. When
I increased the file descriptor limit on the test server, it finish running liquibase, but it only ran the finalizer and closed the file descriptor several minutes later (peak was ~6500 open file descriptors)
Is that enough information to reproduce? Not sure what the best solution here might be, but I guess some things that could be done are (i) closing the created file systems explicitly, (ii) not creating a new fallback accessor each time, (iii) not using the fallback accessor when the resource accessor from the current scope already returns a result.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 20 (4 by maintainers)
Commits related to this issue
- Too many open files error on application server: LiquibaseEntityResolver opens too many file descriptors and never closes them #1815 LB-1547 — committed to liquibase/liquibase by nvoxland 3 years ago
- Rolled back to 3.10.3 liquibase https://github.com/liquibase/liquibase/issues/1815. Liquibase have a PR open that will be fixed in the next release and apparently is affecting 4.X versions ( https:... — committed to ORCID/ORCID-Source by Camelia-Orcid 3 years ago
Hi all, when can we expect this fix to be released?
We have experienced similar performance issues as described in #1754 and the changes seem to have fixed it. I tested with the artifact linked by @nvoxland above.
You can get a build with the changes off https://github.com/liquibase/liquibase/actions?query=branch%3ALB-1547
Can you try from that?