json-schema-validator: 1.0.68 introduces memory leak
Version 1.0.68 introduces a memory leak.
When called repeatedly, the JSON Schema validator consumes all available memory until it crashes with java.lang.OutOfMemoryError: Java heap space.
The following script demonstrates it (uses Ammonite on JDK 17, Scala 2.13).
Run the script after setting JAVA_OPTS=-Xmx256M.
This assumes you're parsing fairly large JSON files (about 500 KB in my case) against a JSON Schema of some complexity. I assume you have examples like this on hand.
Note that if you revert the version back to 1.0.67, the script runs successfully.
#!/usr/bin/env amm
interp.load.ivy("com.fasterxml.jackson.core" % "jackson-databind" % "2.13.2.1")
interp.load.ivy("com.networknt" % "json-schema-validator" % "1.0.68")
@
import java.nio.file.{Files, Path}
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.networknt.schema.{JsonSchema, JsonSchemaFactory, SpecVersion}

val bigFile = Files.readString(Path.of("big-json-file.json"))

// In Scala, an object is similar to Java static classes, I think?
object JsonSchemaCheck {
  private val jsonSchemaContent = Files.readString(Path.of("schema.json"))
  private val validator = getJsonSchema(jsonSchemaContent)

  private def getJson(content: String): JsonNode = {
    val mapper = new ObjectMapper
    mapper.readTree(content)
  }

  private def getJsonSchema(schema: String): JsonSchema = {
    val factory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7)
    factory.getSchema(schema)
  }

  def check(rawDoc: String) = {
    val json = getJson(rawDoc)
    validator.validate(json)
  }
}

@main
def main(): Unit = {
  Range(1, 500).foreach { i =>
    println(i)
    JsonSchemaCheck.check(bigFile)
  }
  println("Done")
}
About this issue
- State: closed
- Created 2 years ago
- Comments: 33 (19 by maintainers)
Running with the current master version, I’m not seeing memory exhaustion. That’s good.
Yes, it was supposed to be updated. However, we clearly mentioned that there are ThreadLocal usages in the library. I will try to make that more explicit in the documentation.
Anyway, thank you for making a fix. I look forward to it.
When you say it was always recommended… Where was that recommendation made?
Let me check the documentation again…
reset() and it also comes back with no results apart from the actual method itself.
@TJC sorry for the inconvenience, and thank you for confirming. CollectorContext was added long ago, and the documentation notes that its data is stored in ThreadLocal. Please note the issue could be caused by UnevaluatedProperties accumulating in CollectorContext. CollectorContext is used in this library for several use cases, such as adding defaults.
Also, HTTP servers based on Java Servlets tend to reuse the same threads for processing requests, so the underlying ThreadLocal may be reused as well, and the older CollectorContext is not cleared from it. It is always recommended to reset the CollectorContext explicitly.
Please note that another class from the library, ValidatorState, is also stored in ThreadLocal.
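To illustrate the thread-reuse point, here is a minimal sketch using only the JDK and the Scala standard library. FakeContext is a hypothetical stand-in for per-thread state such as CollectorContext; it is not the library's class, and the names and sizes are made up for the example. It shows how data stored in a ThreadLocal keeps growing when the same worker thread serves many requests, unless it is reset after each one.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for thread-bound state (NOT the library's CollectorContext).
object FakeContext {
  private val local: ThreadLocal[ArrayBuffer[String]] =
    ThreadLocal.withInitial[ArrayBuffer[String]](() => ArrayBuffer.empty[String])

  def add(item: String): Unit = local.get() += item
  def size: Int = local.get().size
  def reset(): Unit = local.get().clear()
}

object ThreadLocalDemo {
  // Runs `requests` tasks on one reused worker thread and returns how many
  // items the thread-local held while the last task was running.
  def run(requests: Int, resetAfterEach: Boolean): Int = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      val sizes = (1 to requests).map { i =>
        pool.submit(new Callable[Int] {
          override def call(): Int = {
            FakeContext.add(s"request-$i")
            val seen = FakeContext.size
            if (resetAfterEach) FakeContext.reset()
            seen
          }
        }).get()
      }
      sizes.last
    } finally pool.shutdown()
  }
}
```

Without a reset, the count seen by each task grows with every request handled by that thread; with a reset after each task, every request starts from an empty context.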
Also, I see there is no documentation for every new keyword that is added. I can add one for this, though.
I would also recommend testing in your test/perf environments with a load approximating production in future.
Yes, reset will be the default behavior in my PR next week.
Agreed, @AndreasALoew: we can add additional validate and walk methods that will NOT reset the CollectorContext, based on user input.
@TJC, can you quickly verify whether resetting the CollectorContext fixes your issue?
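For anyone wanting to try the suggested reset, a rough sketch of what it might look like as an Ammonite script follows. It assumes that CollectorContext.getInstance() returns the thread-bound context (possibly null) and that reset() clears it, as in the 1.0.x sources; please check both against the version you actually run. The inline schema and the checkAndReset name are made up for the example, and whether this resolves the leak on 1.0.68 is exactly what still needs verifying.

```scala
#!/usr/bin/env amm
interp.load.ivy("com.fasterxml.jackson.core" % "jackson-databind" % "2.13.2.1")
interp.load.ivy("com.networknt" % "json-schema-validator" % "1.0.68")
@
import com.fasterxml.jackson.databind.ObjectMapper
import com.networknt.schema.{CollectorContext, JsonSchemaFactory, SpecVersion}

// Tiny inline schema so the sketch does not depend on external files.
val schema = JsonSchemaFactory
  .getInstance(SpecVersion.VersionFlag.V7)
  .getSchema("""{"type": "object"}""")

val mapper = new ObjectMapper

// Validate, then clear the thread-bound CollectorContext even if validation throws.
// Assumption: getInstance() may return null when nothing is bound to this thread.
def checkAndReset(rawDoc: String) =
  try schema.validate(mapper.readTree(rawDoc))
  finally Option(CollectorContext.getInstance()).foreach(_.reset())
```

In the reproduction script, the same try/finally could wrap validator.validate(json) inside JsonSchemaCheck.check.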
Upon further testing, it seems the memory leak was introduced in 1.0.68. Reverting back to .67 fixes the problem.