komga: [Bug] Komga fails to close PDF files, may cause OOM

Komga environment

  • Komga version:
  • I am running Komga with Docker
    • Docker image tag [e.g. latest, beta]: 5f5804bbebbf
  • I am running Komga from the jar
    • Java version:
  • I have a problem in the web interface
    • Browser (with version):
  • I have a problem with an OPDS client application
    • OPDS Application (with version):
  • I have a problem with the Tachiyomi extension
    • Tachiyomi version:
    • Tachiyomi extension version:

Describe the bug

My log files are full of this warning while analyzing files:

2022-01-27 13:00:23.493  WARN 1 --- [Finalizer] org.apache.pdfbox.cos.COSDocument        : Warning: You did not close a PDF Document

Which seems to lead to this warning:

2022-01-27 17:49:38.937  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.k.infrastructure.jms.ArtemisConfig   : Java heap space

Komga then crashes because it runs out of memory. I am unsure whether the two are actually connected, so I am just reporting the failure to close PDF files.
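The PDFBox warning is logged from the finalizer thread when a document is garbage collected without having been closed. A minimal sketch of the usual remedy, try-with-resources, is shown below. `FakeDocument` and `CloseDemo` are hypothetical stand-ins so the example runs without PDFBox on the classpath; PDFBox's `PDDocument` implements `Closeable`, so the same pattern should apply to it. This is not Komga's actual code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a PDF document handle (not a PDFBox class).
class FakeDocument implements AutoCloseable {
    // Number of documents that are currently open.
    static final AtomicInteger OPEN = new AtomicInteger();
    private final int pages;

    FakeDocument(int pages) {
        this.pages = pages;
        OPEN.incrementAndGet();
    }

    int pageCount() {
        // A negative page count simulates a corrupt file.
        if (pages < 0) throw new IllegalStateException("corrupt document");
        return pages;
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class CloseDemo {
    // Returns the page count, or -1 if analysis fails.
    // The document is closed on both paths, before the catch block runs.
    static int analyze(int pages) {
        try (FakeDocument doc = new FakeDocument(pages)) {
            return doc.pageCount();
        } catch (IllegalStateException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze(12));
        System.out.println(analyze(-1));
        System.out.println(FakeDocument.OPEN.get());
    }
}
```

Running `CloseDemo` prints 12, -1 and 0: no document stays open, even when analysis throws.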

Steps to reproduce

  1. Add many PDF documents to library
  2. Scan the library for new documents, check the logs
  3. See the failure to close PDF files and, with enough PDFs, increased memory usage

Expected behavior

No warning log

Actual behavior

The above warning log

Log file

2022-01-27 17:49:32.161  WARN 1 --- [Finalizer] org.apache.pdfbox.cos.COSDocument        : Warning: You did not close a PDF Document
2022-01-27 17:49:32.161  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#2-5] o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener invoker failed for destination 'sse' - trying to recover. Cause: Java heap space
2022-01-27 17:49:32.252  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.k.infrastructure.jms.ArtemisConfig   : Java heap space
2022-01-27 17:49:32.256  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.application.tasks.TaskHandler  : Executing task: AnalyzeBook(bookId='07KYJJR9VC2XK', priority='4')
2022-01-27 17:49:32.257  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookLifecycle   : Analyze and persist book: Book(name=JJBA 5 - Stone Ocean 13, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2013.pdf, fileLastModified=2014-12-27T16:21:11, fileSize=165094497, fileHash=131jxpc, number=13, id=07KYJJR9VC2XK, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.696)
2022-01-27 17:49:32.258  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Trying to analyze book: Book(name=JJBA 5 - Stone Ocean 13, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2013.pdf, fileLastModified=2014-12-27T16:21:11, fileSize=165094497, fileHash=131jxpc, number=13, id=07KYJJR9VC2XK, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.696)
2022-01-27 17:49:32.260  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Detected media type: application/pdf
2022-01-27 17:49:38.850  WARN 1 --- [Finalizer] org.apache.pdfbox.cos.COSDocument        : Warning: You did not close a PDF Document
2022-01-27 17:49:38.936  WARN 1 --- [Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@30a62a5b)] org.apache.activemq.artemis.core.server  : AMQ222149: Message Reference[10737423979]:RELIABLE:CoreMessage[messageID=10737423979,durable=true,userID=262fc483-7fdc-11ec-8f52-0242ac110002,priority=4, timestamp=Thu Jan 27 17:46:45 PST 2022,expiration=0, durable=true, address=tasks.background,size=625,properties=TypedProperties[subtype=AnalyzeBook,_AMQ_GROUP_ID=D,__AMQ_CID=fe3f1351-7fd9-11ec-8f52-0242ac110002,unique_id=ANALYZE_BOOK_07KYJJR9VC2XK,_AMQ_ROUTING_TYPE=1,type=task]]@277474725 has reached maximum delivery attempts, sending it to Dead Letter Address DLQ from tasks.background
2022-01-27 17:49:38.937  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.k.infrastructure.jms.ArtemisConfig   : Java heap space
2022-01-27 17:49:38.941  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.application.tasks.TaskHandler  : Executing task: AnalyzeBook(bookId='07KYJJR9VC2XM', priority='4')
2022-01-27 17:49:38.942  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookLifecycle   : Analyze and persist book: Book(name=JJBA 5 - Stone Ocean 14, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2014.pdf, fileLastModified=2014-12-27T16:22:34, fileSize=153702496, fileHash=py6jwd, number=14, id=07KYJJR9VC2XM, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.697)
2022-01-27 17:49:38.944  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Trying to analyze book: Book(name=JJBA 5 - Stone Ocean 14, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2014.pdf, fileLastModified=2014-12-27T16:22:34, fileSize=153702496, fileHash=py6jwd, number=14, id=07KYJJR9VC2XM, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.697)
2022-01-27 17:49:39.205  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Detected media type: application/pdf
2022-01-27 17:49:53.380  WARN 1 --- [Finalizer] org.apache.pdfbox.cos.COSDocument        : Warning: You did not close a PDF Document
2022-01-27 17:49:53.383  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#0-6] o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener invoker failed for destination 'sse' - trying to recover. Cause: Java heap space
2022-01-27 17:49:53.574  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.k.infrastructure.jms.ArtemisConfig   : Java heap space
2022-01-27 17:49:53.579  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.application.tasks.TaskHandler  : Executing task: AnalyzeBook(bookId='07KYJJR9VC2XM', priority='4')
2022-01-27 17:49:53.580  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookLifecycle   : Analyze and persist book: Book(name=JJBA 5 - Stone Ocean 14, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2014.pdf, fileLastModified=2014-12-27T16:22:34, fileSize=153702496, fileHash=py6jwd, number=14, id=07KYJJR9VC2XM, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.697)
2022-01-27 17:49:53.582  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Trying to analyze book: Book(name=JJBA 5 - Stone Ocean 14, url=file:/data/Manga/JJBA%206%20-%20Stone%20Ocean/JJBA%205%20-%20Stone%20Ocean%2014.pdf, fileLastModified=2014-12-27T16:22:34, fileSize=153702496, fileHash=py6jwd, number=14, id=07KYJJR9VC2XM, seriesId=07KYJJR9QC4FD, libraryId=07KYHZ1G3C8VS, deletedDate=null, createdDate=2022-01-27T14:39:09, lastModifiedDate=2022-01-27T17:46:42.697)
2022-01-27 17:49:53.584  INFO 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.komga.domain.service.BookAnalyzer    : Detected media type: application/pdf
2022-01-27 17:50:01.762  WARN 1 --- [org.springframework.jms.JmsListenerEndpointContainer#1-3] o.g.k.infrastructure.jms.ArtemisConfig   : Java heap space

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 18 (16 by maintainers)

Most upvoted comments

Oh, GitHub search is global, so a few keywords surface new public issues mentioning Caffeine. Then, if I might be able to help resolve confusion or catch a bug, I’ll reply. It’s useful feedback to know what might need improvement.

I’m really impressed, and big thanks for making Caffeine. I use it in a few projects and really like it 😃

My understanding seems to be different from yours. What the doc says is that the weight is not used to decide which entry is removed first; that is still decided by the policy, such as last accessed or last written. However, it doesn’t say that the cache entry wouldn’t be removed if its size is over the maximum.

That is correct. The cache will load the entry and return it to the caller, but it may be immediately eligible for removal by an eviction policy. In your case, if the entry’s weight exceeds the maximum, then the cache will discard it in preference to retaining it and clearing the rest of the cache. Obviously retaining what we can is preferable, as the new entry can’t be held regardless.

Otherwise the eviction decision is based on recency and frequency. A future version will likely incorporate the weight in order to make a smarter decision and increase hit rates. This was explored in the paper Lightweight Robust Size Aware Cache Management. That improvement offers a modest boost, but is a bit harder when considering concurrency and has not yet been raised by users as a concern.
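The behaviour described in the two comments above can be illustrated with a small weight-bounded cache. This is a simplified sketch, not Caffeine's code: plain LRU order is swapped in for Caffeine's recency and frequency policy, and the class name `WeightedLruSketch` and its methods are hypothetical. It shows two points only: an entry heavier than the maximum is dropped rather than emptying the cache, and the eviction victim is chosen by access order, not by weight.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration only; LRU stands in for Caffeine's real policy.
class WeightedLruSketch {
    private final long maxWeight;
    private long totalWeight;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    WeightedLruSketch(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    // Stores the value if possible; returns true if it was retained.
    // The weight of an entry is assumed to be its length in bytes.
    boolean put(String key, byte[] value) {
        byte[] old = entries.remove(key);
        if (old != null) totalWeight -= old.length;

        if (value.length > maxWeight) {
            // Oversize: drop the newcomer and keep the existing entries.
            return false;
        }
        entries.put(key, value);
        totalWeight += value.length;

        // Evict in LRU order until the total fits; weight does not pick the victim.
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            totalWeight -= it.next().getValue().length;
            it.remove();
        }
        return true;
    }

    boolean contains(String key) {
        return entries.containsKey(key);
    }

    long weight() {
        return totalWeight;
    }
}
```

With a maximum weight of 10, inserting two 4-byte entries and then an 11-byte entry leaves the two small entries in place and rejects the large one.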

I checked the PDDocument docs, and it looks like you can supply a MemoryUsageSetting, which would allow you to restrict the amount of data you load into memory and store the rest in temporary files.

In my own usage of PDFBox (unrelated to Komga), I had to use mixed memory usage settings and disable the resource cache. The resource cache is soft-reference based and causes OOMEs, as images are humongous objects in G1’s terminology and GCs do not (or did not at the time) handle those well. If I recall correctly, the resource cache is document-specific, so the likelihood of cache hits was negligible in my use case (converting PDFs from or to images). The mixed setting was less helpful than I had hoped, but moving my work to AWS Lambda was much better thanks to isolated failures, minimized risk of resource exhaustion (each instance serves only one request), and cheap scalability. Unfortunately we moved to Google Cloud Run, which offers only an in-memory file system, so MemoryUsageSetting lost its benefit. PDFBox also generates very bloated files, so I had to post-process them using Ghostscript to keep outbound emails from failing due to attachment limits. Eventually, for that usage, I’ll stop dragging that original code forward and replace it all with mupdf-tools.
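For reference, the settings mentioned in the last two comments might look like the following sketch. It assumes PDFBox 2.0.x on the classpath (the loading API differs in 3.x), and the class name `PdfPageCounter` and the 64 MiB budget are illustrative assumptions, not values taken from Komga or from the comments.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;

class PdfPageCounter {
    // Assumed budget for illustration: up to 64 MiB in memory, the rest in temp files.
    private static final long MAX_MAIN_MEMORY_BYTES = 64L * 1024 * 1024;

    static int countPages(File pdf) throws IOException {
        MemoryUsageSetting memory = MemoryUsageSetting.setupMixed(MAX_MAIN_MEMORY_BYTES);
        // try-with-resources closes the document even if reading it fails.
        try (PDDocument document = PDDocument.load(pdf, memory)) {
            // Passing null disables the per-document resource cache.
            document.setResourceCache(null);
            return document.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countPages(new File(args[0])));
    }
}
```

As the comment above notes, the mixed setting only helps when temporary files live on real disk; on an in-memory file system the spilled data still counts against memory.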