ArchiveBox: ArchiveBox v0.6.2 on bare metal is unable to use newest Chromium version v114 (fails to archive PDF, Screenshot, or DOM)
I have installed ArchiveBox and everything works fine except the Chromium-based extractors (PDF, Screenshot, and DOM), even though CHROME_BINARY is reported as valid. I've also set CHROME_USER_DATA_DIR to /tmp/chrome-profile, but that doesn't make any difference.
ArchiveBox v0.6.2
Cpython Linux Linux-5.10.0-23-amd64-x86_64-with-glibc2.31 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep
[i] Dependency versions:
√ ARCHIVEBOX_BINARY v0.6.2 valid /usr/local/bin/archivebox
√ PYTHON_BINARY v3.9.2 valid /usr/bin/python3.9
√ DJANGO_BINARY v3.1.14 valid /home/debian/.local/lib/python3.9/site-packages/django/bin/django-admin.py
√ CURL_BINARY v7.74.0 valid /usr/bin/curl
√ WGET_BINARY v1.21 valid /usr/bin/wget
√ NODE_BINARY v18.16.0 valid /home/debian/.nvm/versions/node/v18.16.0/bin/node
√ SINGLEFILE_BINARY v1.0.33 valid ./node_modules/single-file/cli/single-file
√ READABILITY_BINARY v0.0.6 valid ./node_modules/readability-extractor/readability-extractor
√ MERCURY_BINARY v1.0.0 valid ./node_modules/@postlight/mercury-parser/cli.js
√ GIT_BINARY v2.30.2 valid /usr/bin/git
√ YOUTUBEDL_BINARY v2021.12.17 valid /usr/local/bin/youtube-dl
√ CHROME_BINARY v114.0.5735.106 valid /usr/bin/chromium
√ RIPGREP_BINARY v12.1.1 valid /usr/bin/rg
[i] Source-code locations:
√ PACKAGE_DIR 23 files valid /usr/local/lib/python3.9/dist-packages/archivebox
√ TEMPLATES_DIR 3 files valid /usr/local/lib/python3.9/dist-packages/archivebox/templates
- CUSTOM_TEMPLATES_DIR - disabled
[i] Secrets locations:
√ CHROME_USER_DATA_DIR 27 files valid /tmp/chrome-profile
√ COOKIES_FILE 951.0 Bytes valid /home/debian/Documents/cookies.txt
[i] Data locations:
√ OUTPUT_DIR 8 files valid /home/debian/archivebox
√ SOURCES_DIR 13 files valid ./sources
√ LOGS_DIR 1 files valid ./logs
√ ARCHIVE_DIR 1 files valid ./archive
√ CONFIG_FILE 223.0 Bytes valid ./ArchiveBox.conf
√ SQL_INDEX 216.0 KB valid ./index.sqlite3
About this issue
- State: closed
- Created a year ago
- Comments: 17 (5 by maintainers)
Have you considered using browserless/chrome?
https://github.com/browserless/chrome/
I think this is related to the following issue:
Orphan chromium processes continue running after ArchiveBox snapshot jobs complete
By running docker top archivebox I can see that the chromium processes keep running after archiving has finished. When this happens, the SingletonLock file in the Chrome user data directory is left behind and never removed.
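For anyone who wants to check whether a leftover lock is what blocks the next Chromium run, here is a minimal Python sketch. It is not part of ArchiveBox; the helper names (lock_owner_pid, pid_is_alive, clear_stale_lock) are made up for illustration. It assumes a Linux host where Chromium writes SingletonLock as a symlink whose target looks like "<hostname>-<pid>", and it only removes the lock when that PID no longer exists.

```python
import os
from pathlib import Path
from typing import Optional


def lock_owner_pid(profile_dir: str) -> Optional[int]:
    """Return the PID recorded in SingletonLock, or None if no usable lock exists."""
    lock = Path(profile_dir) / "SingletonLock"
    if not lock.is_symlink():
        return None
    # Assumption: the symlink target has the form "<hostname>-<pid>".
    target = os.readlink(lock)
    _, _, pid_text = target.rpartition("-")
    return int(pid_text) if pid_text.isdigit() else None


def pid_is_alive(pid: int) -> bool:
    """Check for a process with this PID without actually signalling it."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clear_stale_lock(profile_dir: str) -> bool:
    """Remove SingletonLock only if its owner is gone. Returns True if removed."""
    lock = Path(profile_dir) / "SingletonLock"
    pid = lock_owner_pid(profile_dir)
    if pid is None or pid_is_alive(pid):
        return False
    lock.unlink()
    return True
```

Calling clear_stale_lock("/tmp/chrome-profile") before an archive run would drop a lock left by a crashed browser. It does nothing about orphan chromium processes that are still alive: in that case the lock is legitimately held, and those processes have to be stopped first. If the recorded hostname differs from the current one (for example, a profile shared between a container and the host), the PID check is not meaningful, so treat the result with care.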