snakemake: JSONDecodeError while handling job success on LSF

Snakemake version 6.12.3

Describe the bug This problem happens when running a snakemake pipeline on LSF. The pipeline exits immediately, and the error is not reported as a job failure.

Logs

Traceback (most recent call last):
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/__init__.py", line 783, in snakemake
    keepmetadata=keep_metadata,
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/workflow.py", line 1073, in execute
    success = self.scheduler.schedule()
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/scheduler.py", line 440, in schedule
    self._finish_jobs()
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/scheduler.py", line 529, in _finish_jobs
    self.get_executor(job).handle_job_success(job)
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 871, in handle_job_success
    job, upload_remote=False, handle_log=False, handle_touch=False
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 272, in handle_job_success
    keep_metadata=self.keepmetadata,
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/jobs.py", line 1033, in postprocess
    self, keep_metadata=keep_metadata
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/persistence.py", line 239, in finished
    starttime = self._read_record(self._metadata_path, f).get(
  File "/home/anaconda3/envs/gtex/lib/python3.7/site-packages/snakemake/persistence.py", line 414, in _read_record_uncached
    return json.load(f)
  File "/home/anaconda3/envs/gtex/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/anaconda3/envs/gtex/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/anaconda3/envs/gtex/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/anaconda3/envs/gtex/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Minimal example I am running snakemake with these options:

snakemake -j 1000 --latency-wait 90 -p --keep-going --config config1="config1" --cluster "rm logs/cluster/project_{rule}/{rule}.{wildcards}.out logs/cluster/project_{rule}/{rule}.{wildcards}.err ; mkdir -p logs/cluster/project_{rule}; bsub -R \"rusage[mem={resources.mem_mb}] select[mem>{resources.mem_mb}] span[hosts=1]\" -M {resources.mem_mb} -m \"modern_hardware\" -n {threads} -J \"project_{rule}.{wildcards}\" -G ${group} -o logs/cluster/project_{rule}/{rule}.{wildcards}.out -e logs/cluster/project_{rule}/{rule}.{wildcards}.err" -s project.Snakefile

Additional context I can’t post the input files as they are on the LSF cluster.

Any help or pointers would be much appreciated.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 7
  • Comments: 33 (14 by maintainers)

Most upvoted comments

My colleagues and I observe the same issue on SLURM, though not all of us hit it. Completely reinstalling snakemake helped for a while, but after some time we ran into it again.

We were wondering if it’s some package/library/python issue…

I also see this in the latest snakemake (7.3.x) + slurm, after a few runs of the same workflow.

It is nearly the same error; the differences may be due to different snakemake versions. I use 6.13.1 and Python 3.7.

Full Traceback (most recent call last):
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/__init__.py", line 783, in snakemake
    keepmetadata=keep_metadata,
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/workflow.py", line 1073, in execute
    success = self.scheduler.schedule()
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/scheduler.py", line 440, in schedule
    self._finish_jobs()
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/scheduler.py", line 529, in _finish_jobs
    self.get_executor(job).handle_job_success(job)
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 875, in handle_job_success
    job, upload_remote=False, handle_log=False, handle_touch=False
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 272, in handle_job_success
    keep_metadata=self.keepmetadata,
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/jobs.py", line 1033, in postprocess
    self, keep_metadata=keep_metadata
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/persistence.py", line 239, in finished
    starttime = self._read_record(self._metadata_path, f).get(
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/site-packages/snakemake/persistence.py", line 414, in _read_record_uncached
    return json.load(f)
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/fast/users/schubacm_c/work/miniconda3/lib/python3.7/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 3047 (char 3046)

I got this error as well with the latest snakemake version (7.3.2) on an LSF cluster. I was able to bypass it by deleting the snakemake metadata directory (rm -rf .snakemake/).

This line: https://github.com/snakemake/snakemake/blob/8c5c5e8261ae12fc40c3c293045dea6af7df0e01/snakemake/persistence.py#L393 writes the external_jobid to this file several times. First it writes something like {"external_jobid": 2131274"} (after submitting the job to the cluster). Then it writes {"external_jobid": null} several times, presumably because no job submission takes place. But the old file content is NOT truncated: each write starts again from the beginning of the file, so the shorter null record only overwrites a prefix of the longer one, leaving something like {"external_jobid": null}74"}.

This might happen when multiple subprocesses write into that file, so my guess is that it is a concurrency issue.
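The overwrite-without-truncation mechanism described above can be reproduced in a few lines. This is a hedged sketch of the suspected failure mode, not snakemake's actual code; the file path and record values are illustrative:

```python
import json
import os
import tempfile

# A long record is written first, then a shorter one is written over it
# WITHOUT truncating the file, leaving stale trailing bytes behind.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w") as f:
    f.write('{"external_jobid": "2131274"}')   # first writer: 29 bytes

with open(path, "r+") as f:                    # reopened without truncation
    f.write('{"external_jobid": null}')        # 24 bytes, overwrites only a prefix

with open(path) as f:
    data = f.read()

print(data)  # {"external_jobid": null}274"}  -- no longer valid JSON
try:
    json.loads(data)
except json.JSONDecodeError as e:
    print(e)  # an "Extra data" error, like the second traceback above
os.remove(path)
```

Depending on where the parser stops, this produces either the "Extra data" error from the SLURM traceback or, if the prefix itself is mangled, the "Expecting value" error from the original report.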

I believe a lock is necessary around here:

https://github.com/snakemake/snakemake/blob/994b1510766083df7f22d10c0e6e4bb65ffdd710/snakemake/persistence.py#L396

OR it must be guaranteed that the file is never written to twice…
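To illustrate the suggested fix, here is a minimal sketch of a locked, truncating write. The function name and structure are hypothetical, not snakemake's actual implementation; note that `fcntl.flock` is POSIX-only and may not serialize writers across nodes on some network filesystems, in which case writing to a temporary file and atomically renaming it with `os.replace` is a common alternative:

```python
import fcntl
import json
import os

def write_record_locked(path: str, record: dict) -> None:
    """Hypothetical sketch: hold an exclusive flock while rewriting the
    record, and truncate first so a shorter payload cannot leave stale
    bytes from a previous, longer write."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block concurrent writers
        f.seek(0)
        f.truncate()                   # drop any previous content
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
        # the lock is released when the file is closed
```

With this, the second (shorter) write from the scenario above leaves a clean `{"external_jobid": null}` instead of a corrupted record.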