snakemake: JSONDecodeError while handling job success on LSF
Snakemake version 6.12.3
Describe the bug This problem happens when running a Snakemake pipeline on LSF. The pipeline exits immediately, and the exit is not reported as a job failure.
Logs
Traceback (most recent call last):
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/__init__.py", line 783, in snakemake
keepmetadata=keep_metadata,
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/workflow.py", line 1073, in execute
success = self.scheduler.schedule()
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/scheduler.py", line 440, in schedule
self._finish_jobs()
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/scheduler.py", line 529, in _finish_jobs
self.get_executor(job).handle_job_success(job)
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 871, in handle_job_success
job, upload_remote=False, handle_log=False, handle_touch=False
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 272, in handle_job_success
keep_metadata=self.keepmetadata,
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/jobs.py", line 1033, in postprocess
self, keep_metadata=keep_metadata
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/persistence.py", line 239, in finished
starttime = self._read_record(self._metadata_path, f).get(
File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/persistence.py", line 414, in _read_record_uncached
return json.load(f)
File "/home/anaconda3/envs/gtex/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/home/anaconda3/envs/gtex/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/home/anaconda3/envs/gtex/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/anaconda3/envs/gtex/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
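For context, this is exactly what json.load raises when the metadata record under .snakemake/metadata/ is empty or starts with non-JSON bytes. A minimal stand-alone reproduction, using a throwaway file rather than a real metadata record:

import json
import tempfile

# An empty metadata record, e.g. one left behind by an interrupted or
# conflicting write, fails to parse with the exact message seen above.
with tempfile.NamedTemporaryFile("w+", suffix=".json") as f:
    try:
        json.load(f)  # persistence._read_record_uncached does json.load(f)
    except json.JSONDecodeError as e:
        print(e)  # Expecting value: line 1 column 1 (char 0)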
Minimal example I am running snakemake with these options:
snakemake -j 1000 --latency-wait 90 -p --keep-going --config config1="config1" --cluster "rm logs/cluster/project_{rule}/{rule}.{wildcards}.out logs/cluster/project_{rule}/{rule}.{wildcards}.err ; mkdir -p logs/cluster/project_{rule}; bsub -R \"rusage[mem={resources.mem_mb}] select[mem>{resources.mem_mb}] span[hosts=1]\" -M {resources.mem_mb} -m \"modern_hardware\" -n {threads} -J \"project_{rule}.{wildcards}\" -G ${group} -o logs/cluster/project_{rule}/{rule}.{wildcards}.out -e logs/cluster/project_{rule}/{rule}.{wildcards}.err" -s project.Snakefile
Additional context I can’t post the input files, as they live on the LSF cluster.
Any help or pointers would be much appreciated.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 7
- Comments: 33 (14 by maintainers)
Commits related to this issue
- fix: Adding file system based locks for meta data (#1342) The locks are based on the fasteners package and will be enabled automatically in cluster mode. API users can explicitly enable and disable ... — committed to snakemake/snakemake by holtgrewe 2 years ago
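The fix referenced above serializes writers with an inter-process file lock from the fasteners package. A minimal sketch of that pattern (the path and record contents here are illustrative, not Snakemake's actual code):

import json
import os
import fasteners  # third-party package the committed fix is based on

record_path = ".snakemake/metadata/some_record"  # illustrative path
os.makedirs(os.path.dirname(record_path), exist_ok=True)

# Only one process at a time may rewrite the record; concurrent writers
# from other cluster jobs block on the lock file instead of interleaving
# their writes into the same record.
with fasteners.InterProcessLock(record_path + ".lock"):
    with open(record_path, "w") as f:
        json.dump({"external_jobid": None}, f)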
My colleagues and I observe the same issue on SLURM, but not all of us hit it. Completely reinstalling Snakemake helped for a while, but after some time we ran into it again.
We were wondering if it’s some package/library/Python issue…
I also see this with the latest Snakemake (7.3.x) on SLURM, after a few runs of the same workflow.
It is nearly the same error; the differences may be due to different Snakemake versions. I use 6.13.1 and Python 3.7.
I got this error as well with the latest Snakemake version (7.3.2) on an LSF cluster. I was able to bypass the error by deleting the Snakemake directory (rm -rf .snakemake/).
This line: https://github.com/snakemake/snakemake/blob/8c5c5e8261ae12fc40c3c293045dea6af7df0e01/snakemake/persistence.py#L393 writes the external_jobid to this file several times. First it writes something like {"external_jobid": "2131274"} (after submitting the job to the cluster). Then it writes null to it several times, i.e. {"external_jobid": null}, basically because no job submission is done. But it seems that the old file is NOT overwritten: the write simply starts from the beginning and puts {"external_jobid": null} on top of the existing content, resulting in something like {"external_jobid": null}74"}. This might be an issue when multiple subprocesses write to that file, so my guess is that it is a concurrency issue.
I believe a lock is necessary around here:
https://github.com/snakemake/snakemake/blob/994b1510766083df7f22d10c0e6e4bb65ffdd710/snakemake/persistence.py#L396
OR it must be guaranteed that the file is never written to twice…
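The second alternative can be approximated by making every write atomic: write the record to a temporary file and rename it over the target, so a reader sees either the complete old record or the complete new one, never a blend. A sketch of that pattern (a hypothetical helper, not Snakemake's actual code):

import json
import os
import tempfile

def atomic_write_json(path, obj):
    # Write to a temp file in the same directory, then rename it over the
    # target. os.replace is atomic on POSIX, so a concurrent reader never
    # observes a partially written or mixed record.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise

atomic_write_json("external_jobid.json", {"external_jobid": None})

Note that rename atomicity can be weaker on some networked file systems common on clusters, which is one argument for the explicit fasteners locks used in the committed fix.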