ArchiveBox: ArchiveBox v0.6.2 on bare metal is unable to use newest Chromium version v114 (fails to archive PDF, Screenshot, or DOM)

I have installed ArchiveBox and everything works except Chromium, even though CHROME_BINARY is reported as valid. I’ve also set CHROME_USER_DATA_DIR to /tmp/chrome-profile, but that doesn’t make any difference.

ArchiveBox v0.6.2
Cpython Linux Linux-5.10.0-23-amd64-x86_64-with-glibc2.31 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox
 √  PYTHON_BINARY         v3.9.2          valid     /usr/bin/python3.9
 √  DJANGO_BINARY         v3.1.14         valid     /home/debian/.local/lib/python3.9/site-packages/django/bin/django-admin.py
 √  CURL_BINARY           v7.74.0         valid     /usr/bin/curl
 √  WGET_BINARY           v1.21           valid     /usr/bin/wget
 √  NODE_BINARY           v18.16.0        valid     /home/debian/.nvm/versions/node/v18.16.0/bin/node
 √  SINGLEFILE_BINARY     v1.0.33         valid     ./node_modules/single-file/cli/single-file
 √  READABILITY_BINARY    v0.0.6          valid     ./node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js
 √  GIT_BINARY            v2.30.2         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2021.12.17     valid     /usr/local/bin/youtube-dl
 √  CHROME_BINARY         v114.0.5735.106  valid     /usr/bin/chromium
 √  RIPGREP_BINARY        v12.1.1         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/local/lib/python3.9/dist-packages/archivebox
 √  TEMPLATES_DIR         3 files         valid     /usr/local/lib/python3.9/dist-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled

[i] Secrets locations:
 √  CHROME_USER_DATA_DIR  27 files        valid     /tmp/chrome-profile
 √  COOKIES_FILE          951.0 Bytes     valid     /home/debian/Documents/cookies.txt

[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     /home/debian/archivebox
 √  SOURCES_DIR           13 files        valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           1 files         valid     ./archive
 √  CONFIG_FILE           223.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             216.0 KB        valid     ./index.sqlite3

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

Have you considered using browserless/chrome?

https://github.com/browserless/chrome/

I think this is related to the following issue:

Orphan chromium processes continue running after ArchiveBox snapshot jobs complete

Running docker top archivebox shows that the chromium processes keep running after archiving finishes. When this happens, the SingletonLock file is left behind in the Chrome profile directory and is never removed.
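
To check whether a leftover lock is what is blocking new Chromium runs, something like the rough Python sketch below may help. It is not part of ArchiveBox. It assumes that on Linux the SingletonLock entry in the profile directory is a symlink whose target has the form hostname-pid, which is worth verifying against your own profile first. The function names are made up for illustration.

```python
import os
import socket
from pathlib import Path


def parse_lock_target(target):
    """Split an assumed 'hostname-pid' symlink target; return None if malformed."""
    host, sep, pid = target.rpartition("-")
    if not sep or not pid.isdigit():
        return None
    return host, int(pid)


def pid_is_running(pid):
    """Return True if a process with this PID exists on the local machine."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def inspect_singleton_lock(profile_dir):
    """Describe the SingletonLock in profile_dir, or return None if there is none.

    'stale' is True when the lock names this host and a PID that is gone,
    False when that PID is still alive, and None when it cannot be decided
    (malformed target, or a lock written on a different host or container).
    """
    lock = Path(profile_dir) / "SingletonLock"
    if not lock.is_symlink():
        return None
    parsed = parse_lock_target(os.readlink(lock))
    if parsed is None:
        return {"path": str(lock), "host": None, "pid": None, "stale": None}
    host, pid = parsed
    stale = None
    if host == socket.gethostname():
        stale = not pid_is_running(pid)
    return {"path": str(lock), "host": host, "pid": pid, "stale": stale}


if __name__ == "__main__":
    # /tmp/chrome-profile is the CHROME_USER_DATA_DIR from the report above.
    print(inspect_singleton_lock("/tmp/chrome-profile"))
```

With Docker, this would need to run inside the container, since the hostname and PIDs recorded in the lock are the container's. The sketch only reports; it deliberately does not delete the lock or kill any process.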