ArchiveBox: Bugfix: django branch start_ts error on init
Describe the bug
When attempting to run archivebox init with version 0.4.3 in an old archive, archivebox fails at "Collecting links from any existing indexes and archive folders..." with KeyError: 'start_ts'.
Steps to reproduce
- Installed Django branch with git clone and pip install .
- Navigated to old archive directory.
- Ran archivebox init
- archivebox goes through most of the import process, and then dies with the error listed below.
Screenshots or log output
Traceback (most recent call last):
  File "/home/USERNAME/.local/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/__init__.py", line 126, in main
    pwd=pwd or OUTPUT_DIR,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/cli/archivebox_init.py", line 34, in main
    out_dir=pwd or OUTPUT_DIR,
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/util.py", line 108, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/main.py", line 316, in init
    for link in load_main_index(out_dir=out_dir, warn=False)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/util.py", line 108, in typechecked_function
    return func(*args, **kwargs)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/__init__.py", line 250, in load_main_index
    all_links = list(parse_json_main_index(out_dir))
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/json.py", line 52, in parse_json_main_index
    yield Link.from_json(link_json)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/schema.py", line 203, in from_json
    cast_result = ArchiveResult.from_json(json_result)
  File "/home/USERNAME/.local/lib/python3.7/site-packages/archivebox/index/schema.py", line 62, in from_json
    info['start_ts'] = parse_date(info['start_ts'])
KeyError: 'start_ts'
Software versions
- OS: Ubuntu 18.04.4 LTS
- ArchiveBox version: 848977e
- Python version: Python 3.7.8
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (8 by maintainers)
Awesome, that’s a relief to hear. We were worried it was a regression from the latest version. I’m going to close this issue for now but I’ll keep responding to your comments here, don’t worry.
If you post a ZIP (or email me) of a handful of those swapped folders I’ll write you a bash script that fixes it.
@drpfenderson one more try please. Also, if you install it with pip install -e . you will always have installed the version of the code you are currently running (i.e. no need to pip install again after changing branches).
Perfect, thanks for those samples. It confirms our suspicion that you had a few links archived with a very old version before we introduced start_ts. We’ll add a workaround that will handle that older schema and upgrade those files to the new style.
(Also thanks for the sponsorship @drpfenderson!)
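For anyone hitting the same error, here is a minimal sketch of the kind of workaround described above. It is not the actual ArchiveBox patch: the helper name upgrade_result_json, the fallback timestamp, and the assumption that end_ts may be missing as well are all illustrative. The only thing taken from the traceback is that ArchiveResult.from_json reads info['start_ts'] unconditionally.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def upgrade_result_json(info: Dict[str, Any], fallback_ts: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of an archive result dict that always has start_ts and end_ts.

    Hypothetical helper: entries written before start_ts existed get a
    placeholder value instead of triggering KeyError on lookup.
    """
    upgraded = dict(info)  # work on a copy, leave the caller's dict untouched
    if fallback_ts is None:
        # Assumption: the time of the upgrade is an acceptable placeholder.
        fallback_ts = datetime.now(timezone.utc).isoformat()
    # setdefault only writes when the key is absent, so real values survive.
    upgraded.setdefault('start_ts', fallback_ts)
    # Assumption: if end_ts is also absent, reuse start_ts.
    upgraded.setdefault('end_ts', upgraded['start_ts'])
    return upgraded


# Example with a made-up old-style entry that has no timestamps.
old_style = {'cmd': ['wget', 'https://example.com'], 'status': 'succeeded'}
new_style = upgrade_result_json(old_style, fallback_ts='2019-01-01T00:00:00+00:00')
print(new_style['start_ts'])
```

Filling in a placeholder keeps the old entries loadable, at the cost of recording a timestamp that does not reflect when the link was actually archived.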
Here is a snippet from the beginning of the main index.json file. Here is another snippet from later in the file. Let me know if you would like/need more, or are looking for something in particular.
The index.html says that it was created with version a3a048d4. Here is a gist containing the output of one of the most recent index.json files, with redacted personal info.
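To narrow down which entries come from the older schema, a small scan like the following may help. It is a sketch under an assumed index layout (a top-level "links" list where each link has a "url" and a "history" mapping of method name to a list of result dicts); the function name find_results_missing_key and the sample data are made up for illustration, so adjust the keys to match your own index.json.

```python
from typing import Any, Dict, Iterator, Tuple


def find_results_missing_key(index: Dict[str, Any], key: str = 'start_ts') -> Iterator[Tuple[str, str]]:
    """Yield (url, method) for every history entry that lacks `key`.

    Assumed layout: {"links": [{"url": ..., "history": {method: [result, ...]}}]}
    """
    for link in index.get('links', []):
        history = link.get('history') or {}  # tolerate a missing or null history
        for method, results in history.items():
            for result in results:
                if key not in result:
                    yield link.get('url', '<unknown>'), method


# Made-up sample: one new-style result and one old-style result.
sample_index = {
    'links': [
        {
            'url': 'https://example.com',
            'history': {
                'wget': [{'start_ts': '2019-01-01T00:00:00', 'status': 'succeeded'}],
                'pdf': [{'status': 'succeeded'}],
            },
        },
    ],
}
missing = list(find_results_missing_key(sample_index))
print(missing)
```

To run it against a real archive, load the file with json.load and pass the resulting dict in place of sample_index.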