gcsfs: HttpError with no message halfway through large GS write workload

Similar to https://github.com/dask/gcsfs/issues/315, I saw this today after 10s of GB of data had already been written to a Zarr archive (i.e. this isn’t a problem with initial writes, it appears to be something spurious in long-running jobs):

Traceback (most recent call last):
  File "scripts/convert_genetic_data.py", line 312, in <module>
    fire.Fire()
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/convert_genetic_data.py", line 296, in bgen_to_zarr
    ds = rechunk_dataset(
  File "scripts/convert_genetic_data.py", line 217, in rechunk_dataset
    res = fn(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/sgkit/io/bgen/bgen_reader.py", line 519, in rechunk_bgen
    rechunked.execute()
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/rechunker/api.py", line 76, in execute
    self._executor.execute_plan(self._plan, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/rechunker/executors/dask.py", line 24, in execute_plan
    return plan.compute(**kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/array/core.py", line 3724, in store_chunk
    return load_store_chunk(x, out, index, lock, return_stored, False)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/dask/array/core.py", line 3713, in load_store_chunk
    out[index] = np.asanyarray(x)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync
    self.chunk_store[ckey] = cdata
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/mapping.py", line 154, in __setitem__
    self.fs.pipe_file(key, value)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file
    return await simple_upload(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload
    j = await fs._call(
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call
    raise e
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call
    self.validate_response(status, contents, json, path, headers)
  File "/workdir/.snakemake/conda/0a479a2e/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: Required

Any ideas what this could be or if it should be caught/retried somewhere?

gcsfs version: 0.7.1

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 27 (24 by maintainers)

Most upvoted comments

Awesome, with #380 and #385 I’m now able to smoothly write multi-terabyte zarr arrays direct to GCS.

Is there a way to do that when working through Xarray

I’m not sure - it’s an optional argument to .compute().