komga: [Bug] Book analysis is slow for webp images

Komga environment

  • OS: Debian 10
  • Komga version: 0.57.0
  • I am running Komga with Docker
    • Docker image tag [e.g. latest, beta]:
  • I am running Komga from the jar
    • Java version: 11.0.8
  • I have a problem in the web interface
    • Browser (with version):
  • I have a problem with an OPDS client application
    • OPDS Application (with version):
  • I have a problem with the Tachiyomi extension
    • Tachiyomi version:
    • Tachiyomi extension version:

Describe the bug

Earlier today, a library scan on a fresh installation of Komga was stuck for several hours counting the pages inside my CBZ files. I aborted the scan and set up a new installation of Komga for this bug report. I then created a new library containing a single CBZ file (10 MB, 35 pages), started a library scan, and observed that Komga took almost two minutes (112 s in the log below) to complete the “AnalyzeBook” task. The thumbnail generation task, meanwhile, finished in just a couple of seconds.

Steps to reproduce

  1. Create a fresh installation of Komga v0.57.0.
  2. Download this .zip file, rename it to .cbz and create a new folder for it: Locke & Key - Welcome to Lovecraft v01.zip
  3. Create a new library containing only this CBZ file.
  4. Wait a few minutes for Komga to complete the library scan.
  5. Check the log files.

Expected behavior

Komga only takes a fraction of a second to finish counting the number of pages inside the CBZ file.

Actual behavior

Komga takes almost two minutes to finish counting the number of pages inside the CBZ file. While testing different CBZ files in my collection, I observed that this time grows roughly linearly with the number of pages. For example, it took Komga almost 20 minutes to finish scanning a 221-page, 70 MB CBZ file.
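To put the numbers above on a common scale, here is a minimal sketch that converts a task duration and a page count into seconds per page. The function name `per_page_seconds` is made up for illustration; the 112 s figure is the AnalyzeBook duration reported in the log below.

```shell
#!/bin/bash
# Hypothetical helper: average analysis time per page, in seconds.
# Arguments: total duration in seconds, number of pages.
per_page_seconds() {
  local total_seconds="$1" pages="$2"
  # Refuse a zero or negative page count to avoid dividing by zero.
  [ "$pages" -gt 0 ] || return 1
  # LC_ALL=C keeps "." as the decimal separator regardless of locale.
  LC_ALL=C awk -v t="$total_seconds" -v p="$pages" 'BEGIN { printf "%.2f\n", t / p }'
}

# Usage:
#   per_page_seconds 112 35    # the 35-page test file: prints 3.20
#   per_page_seconds 1200 221  # "almost 20 minutes" taken as roughly 1200 s
```

A few seconds per page is what makes large libraries take hours to scan.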

Additional context

I think this issue is at least tangentially related to #278. I’ve been using Komga with the same media collection on the same hardware since about December 2019 and never had any issues with these files. In older versions, it only took Komga a fraction of a second each to finish the “AnalyzeBook” task for the CBZ files in my collection.

Edit: It looks like this issue does not affect PDF files. Komga takes only 270 ms to finish scanning a PDF file with more than 50 pages for me.

Log file

I’ve noticed that Komga prints the following error message exactly once per page in the CBZ file, 35 times in total: ERROR 30813 --- [DefaultMessageListenerContainer-1] unknown.jul.logger: TODO

2020-08-20 15:03:37.867  INFO 30813 --- [http-nio-7264-exec-7] o.g.k.domain.service.LibraryLifecycle    : Adding new library: Comics with root folder: file:/mnt/storage/media/literature/Comics/
2020-08-20 15:03:37.899  INFO 30813 --- [http-nio-7264-exec-7] o.g.k.application.tasks.TaskReceiver     : Sending task: ScanLibrary(libraryId=02AV46FH6BCM3)
2020-08-20 15:03:37.995  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: ScanLibrary(libraryId=02AV46FH6BCM3)
2020-08-20 15:03:38.008  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.LibraryScanner  : Updating library: Library(name=Comics, root=file:/mnt/storage/media/literature/Comics/, importComicInfoBook=true, importComicInfoSeries=true, importComicInfoCollection=true, importComicInfoReadList=true, importEpubBook=true, importEpubSeries=true, importLocalArtwork=true, scanForceModifiedTime=false, scanDeep=false, id=02AV46FH6BCM3, createdDate=2020-08-20T15:03:37, lastModifiedDate=2020-08-20T15:03:37)
2020-08-20 15:03:38.009  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.FileSystemScanner   : Scanning folder: /mnt/storage/media/literature/Comics
2020-08-20 15:03:38.010  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.FileSystemScanner   : Supported extensions: [cbz, zip, cbr, rar, pdf, epub]
2020-08-20 15:03:38.012  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.FileSystemScanner   : Excluded patterns: [#recycle, @eaDir]
2020-08-20 15:03:38.012  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.FileSystemScanner   : Force directory modified time: false
2020-08-20 15:03:38.038  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.FileSystemScanner   : Scanned 1 series and 1 books in 20.9ms
2020-08-20 15:03:38.044  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.LibraryScanner  : Adding new series: Series(name=Locke & Key, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/, fileLastModified=2020-08-20T15:01:09.700904, id=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38.033306, lastModifiedDate=2020-08-20T15:03:38.033311)
2020-08-20 15:03:38.133  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: RefreshBookMetadata(bookId=02AV46G62B30H)
2020-08-20 15:03:38.139  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.LibraryScanner  : Library updated in 130ms
2020-08-20 15:03:38.144  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: AnalyzeBook(bookId=02AV46G62B30H)
2020-08-20 15:03:38.148  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task ScanLibrary(libraryId=02AV46FH6BCM3) executed in 147ms
2020-08-20 15:03:38.159  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: RefreshBookMetadata(bookId=02AV46G62B30H)
2020-08-20 15:03:38.162  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.MetadataLifecycle   : Refresh metadata for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:03:38.176  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.i.m.l.LocalArtworkProvider         : Looking for local thumbnails for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:03:38.183  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: RefreshSeriesMetadata(seriesId=02AV46G66B9VF)
2020-08-20 15:03:38.186  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task RefreshBookMetadata(bookId=02AV46G62B30H) executed in 26.9ms
2020-08-20 15:03:38.191  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: AnalyzeBook(bookId=02AV46G62B30H)
2020-08-20 15:03:38.195  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookLifecycle   : Analyze and persist book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:03:38.196  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookAnalyzer    : Trying to analyze book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:03:38.252  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookAnalyzer    : Detected media type: application/zip
2020-08-20 15:03:38.581 ERROR 30813 --- [DefaultMessageListenerContainer-1] unknown.jul.logger                       : TODO [note: this error message is printed exactly 35 times]
2020-08-20 15:05:30.078  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookAnalyzer    : Book has 35 pages
2020-08-20 15:05:30.095  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: GenerateBookThumbnail(bookId=02AV46G62B30H)
2020-08-20 15:05:30.102  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: RefreshBookMetadata(bookId=02AV46G62B30H)
2020-08-20 15:05:30.106  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task AnalyzeBook(bookId=02AV46G62B30H) executed in 112s
2020-08-20 15:05:30.122  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: RefreshSeriesMetadata(seriesId=02AV46G66B9VF)
2020-08-20 15:05:30.124  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.MetadataLifecycle   : Refresh metadata for series: Series(name=Locke & Key, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/, fileLastModified=2020-08-20T15:01:09.700, id=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38)
2020-08-20 15:05:30.171  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.i.m.l.LocalArtworkProvider         : Looking for local thumbnails for series: Series(name=Locke & Key, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/, fileLastModified=2020-08-20T15:01:09.700, id=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38)
2020-08-20 15:05:30.176  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task RefreshSeriesMetadata(seriesId=02AV46G66B9VF) executed in 53.8ms
2020-08-20 15:05:30.180  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: GenerateBookThumbnail(bookId=02AV46G62B30H)
2020-08-20 15:05:30.182  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookLifecycle   : Generate thumbnail and persist for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:05:30.183  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookAnalyzer    : Generate thumbnail for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:05:30.259 ERROR 30813 --- [DefaultMessageListenerContainer-1] unknown.jul.logger                       : TODO
2020-08-20 15:05:32.613  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookLifecycle   : House keeping thumbnails for book: 02AV46G62B30H
2020-08-20 15:05:32.616  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.domain.service.BookLifecycle   : Book has bo selected thumbnail, choosing one automatically
2020-08-20 15:05:32.620  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task GenerateBookThumbnail(bookId=02AV46G62B30H) executed in 2.44s
2020-08-20 15:05:32.623  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: RefreshBookMetadata(bookId=02AV46G62B30H)
2020-08-20 15:05:32.625  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.MetadataLifecycle   : Refresh metadata for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:05:32.632  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.i.m.l.LocalArtworkProvider         : Looking for local thumbnails for book: Book(name=Locke & Key - Welcome to Lovecraft v01, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/Locke%20&%20Key%20-%20Welcome%20to%20Lovecraft%20v01.cbz, fileLastModified=2020-03-23T10:36:42, fileSize=10309532, number=1, id=02AV46G62B30H, seriesId=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38.118)
2020-08-20 15:05:32.632  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.application.tasks.TaskReceiver     : Sending task: RefreshSeriesMetadata(seriesId=02AV46G66B9VF)
2020-08-20 15:05:32.633  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task RefreshBookMetadata(bookId=02AV46G62B30H) executed in 9.73ms
2020-08-20 15:05:32.638  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Executing task: RefreshSeriesMetadata(seriesId=02AV46G66B9VF)
2020-08-20 15:05:32.639  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.domain.service.MetadataLifecycle   : Refresh metadata for series: Series(name=Locke & Key, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/, fileLastModified=2020-08-20T15:01:09.700, id=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38)
2020-08-20 15:05:32.663  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.k.i.m.l.LocalArtworkProvider         : Looking for local thumbnails for series: Series(name=Locke & Key, url=file:/mnt/storage/media/literature/Comics/Locke%20&%20Key/, fileLastModified=2020-08-20T15:01:09.700, id=02AV46G66B9VF, libraryId=02AV46FH6BCM3, createdDate=2020-08-20T15:03:38, lastModifiedDate=2020-08-20T15:03:38)
2020-08-20 15:05:32.664  INFO 30813 --- [DefaultMessageListenerContainer-1] o.g.komga.application.tasks.TaskHandler  : Task RefreshSeriesMetadata(seriesId=02AV46G66B9VF) executed in 25.6ms
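To check whether the error count matches the page count on another installation, the `unknown.jul.logger` lines can be counted with `grep`. This is a minimal sketch: the function name `count_webp_errors` and the log path in the usage comment are assumptions, not something Komga provides.

```shell
#!/bin/bash
# Hypothetical helper: count the "unknown.jul.logger ... TODO" ERROR lines
# in a Komga log file whose path is passed as the first argument.
count_webp_errors() {
  local log_file="$1"
  # grep -c prints the number of matching lines. It exits non-zero when
  # there are none, so "|| true" keeps a zero count from aborting callers
  # that run under "set -e".
  grep -c 'ERROR.*unknown\.jul\.logger.*: TODO' "$log_file" || true
}

# Usage (the path is an assumption; use wherever your log is written):
#   count_webp_errors /path/to/komga.log
```

For the unabridged log of the 35-page test file, this should print 35, the same number as in the "Book has 35 pages" line.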

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 27 (13 by maintainers)

Most upvoted comments

Since the author of https://github.com/sejda-pdf/webp-imageio did not reply to my PR, I have forked the repo and published my own version onto JCenter.

Komga will use the native library if possible, and fall back on the (slower) Java implementation if the native library is not available or cannot be loaded.

Currently the following OS/Arch are supported:

  • Mac: x86_64
  • Windows: x86, x86_64
  • Linux: arm, armv7, arm64, ppc64, x86, x86_64
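As a quick way to read the list above, the following sketch checks whether an OS/architecture pair is on it. The function name `native_webp_supported` is hypothetical, and the names are the ones used in the list; they do not necessarily match what `uname -s` or `uname -m` print (for example, `uname -m` commonly reports `aarch64` rather than `arm64`), so any mapping from `uname` output is left to the reader.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the OS/arch pair appears in the
# supported list from this comment, fails (exit 1) otherwise.
native_webp_supported() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    Mac/x86_64) return 0 ;;
    Windows/x86 | Windows/x86_64) return 0 ;;
    Linux/arm | Linux/armv7 | Linux/arm64 | Linux/ppc64 | Linux/x86 | Linux/x86_64) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if native_webp_supported Linux arm64; then
#     echo "native decoder expected"
#   else
#     echo "Java fallback expected (slower)"
#   fi
```

On a platform outside the list, Komga is described as falling back on the slower Java implementation, so analysis still works there, only more slowly.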

That heavily depends on the input file, @whalehub. I’ve run tests over hundreds of heterogeneous images using combinations of MozJPEG, jpegoptim, jpeg-recompress from jpeg-archive (my preferred JPEG tool, which internally uses MozJPEG and manages to keep closer to the original while usually producing smaller images when using MS-SSIM or SmallFry), libwebp etc. Sometimes one wins over the other, but it’s hard to compare apples to apples, since the quality metric does not necessarily mean the same thing for all of them. In my experience, webp@q80 still beats jpeg-recompress@q80 by 18-25% overall in file size while retaining better color fidelity.

That being said, it still makes the most sense to keep the original files around as long as enough space is available. I use webp for remote disaster recovery backups, where using webp versus jpg means I can cut my storage costs in half; I wouldn’t even consider going a lossy-to-lossy re-encoding route if I didn’t archive the original files.

With Docker, you can use a bind mount to replace just that single file.

In your docker-compose.yml add the following in your volumes:

- type: bind
  source: /path/to/this/webp-imageio-0.1.6.jar
  target: /app/BOOT-INF/lib/webp-imageio-decoder-plugin-0.2.jar
  read_only: true

Recreate the container. That’s it!

@mihailim I could kiss you right now. Komga is analyzing my library at breakneck speeds with this build. 😁

@blkjack410 Here’s a little Bash script I wrote to automate the build process for new Komga releases.

Run it like this: sudo ./build-komga.sh 0.62.5

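#!/bin/bash
# Rebuilds a Komga release jar with webp-imageio 0.1.6 in place of the bundled decoder plugin.
# Usage: sudo ./build-komga.sh <komga-version>
set -euo pipefail  # stop at the first failing step, like the original && chain

VERSION="${1:?usage: $0 <komga-version>}"
BUILD_PATH="/tmp/komga-build"

mkdir "${BUILD_PATH}"
cd "${BUILD_PATH}"

# Fetch and unpack the release jar; -f makes curl fail on HTTP errors instead of saving the error page.
curl -fsSL -o komga-tmp.jar "https://github.com/gotson/komga/releases/download/v${VERSION}/komga-${VERSION}.jar"
unzip -q komga-tmp.jar

# Swap the bundled decoder for webp-imageio 0.1.6 (URL quoted so the shell does not treat "?" as a glob).
curl -fsSL -o BOOT-INF/lib/webp-imageio-0.1.6.jar "https://search.maven.org/remotecontent?filepath=org/sejda/imageio/webp-imageio/0.1.6/webp-imageio-0.1.6.jar"
rm BOOT-INF/lib/webp-imageio-decoder-plugin-0.2.jar
sed -i 's|webp-imageio-decoder-plugin-0.2.jar|webp-imageio-0.1.6.jar|g' BOOT-INF/classpath.idx

# Repack without compression (-0): Spring Boot needs nested jars to be stored, not deflated.
zip -q -0 -r komga.jar META-INF org BOOT-INF
mv komga.jar /usr/local/bin/komga.jar
rm -rf "${BUILD_PATH:?}"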

@mihailim Thanks soooo much!! It’s so fast now and I finally can upgrade komga to the last version.

The current library is the issue, but I plan to switch to a native library once https://github.com/sejda-pdf/webp-imageio/pull/6 is merged. The current version doesn’t support all the architectures that Komga supports.

I just did some tests with a 236-page book with webp images: the native library performs the analysis in 4s.

It is really not wise to convert from a lossy format like JPEG to another format. You should only ever convert from lossless to lossy, never from lossy to anything else!