pangeo-forge-recipes: 0.6.0 release causes authentication issue with fsspec_open_kwargs used in FilePattern
Moving the discussion of this issue over from https://github.com/pangeo-forge/pangeo-forge-recipes/pull/192 for clarity, as per @rabernat's comment https://github.com/pangeo-forge/pangeo-forge-recipes/pull/192#issuecomment-923497234. cache_input tasks fail when fsspec_open_kwargs is passed to a FilePattern. The same recipe code using pangeo-forge-recipes==0.5.0, with fsspec_open_kwargs passed directly in the recipe args, works. I reproduced this issue using pangeo-forge-recipes==0.6.0 (potentially introduced via https://github.com/pangeo-forge/pangeo-forge-recipes/pull/167) with the recipe code below:
import aiohttp
import pandas as pd
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe
# TODO: replace with ENV vars
username = password = "pangeo@developmentseed.org"
def make_filename(time):
    input_url_pattern = (
        "https://arthurhouhttps.pps.eosdis.nasa.gov/gpmdata/{yyyy}/{mm}/{dd}/"
        "imerg/3B-HHR.MS.MRG.3IMERG.{yyyymmdd}-S{sh}{sm}00-E{eh}{em}59.{MMMM}.V06B.HDF5"
    ).format(
        yyyy=time.strftime("%Y"),
        mm=time.strftime("%m"),
        dd=time.strftime("%d"),
        yyyymmdd=time.strftime("%Y%m%d"),
        sh=time.strftime("%H"),
        sm=time.strftime("%M"),
        eh=time.strftime("%H"),
        em=(time + pd.Timedelta("29 min")).strftime("%M"),
        MMMM=f"{(time.hour*60 + time.minute):04}",
    )
    return input_url_pattern
dates = pd.date_range("2000-06-01T00:00:00", "2021-05-31T23:59:59", freq="30min")
time_concat_dim = ConcatDim("time", dates, nitems_per_file=1)
pattern = FilePattern(
    make_filename,
    time_concat_dim,
    fsspec_open_kwargs={"auth": aiohttp.BasicAuth(username, password)},
)
recipe = XarrayZarrRecipe(
    pattern,
    xarray_open_kwargs={"group": "Grid", "decode_coords": "all"},
    inputs_per_chunk=1,
    copy_input_to_local_file=True,
)
@rabernat As per your question from the referenced comment, the secrets are hardcoded in the flow as username and password (these are throwaway credentials we included with an email alias, as we haven’t finalized the pangeo-forge recipe author secrets management approach).
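For the "replace with ENV vars" TODO in the recipe, one possible approach is sketched below. This is only an illustration: the helper name load_credentials and the variable names GPM_USERNAME and GPM_PASSWORD are hypothetical and not part of pangeo-forge-recipes or any agreed secrets management approach.

```python
import os
from typing import Mapping, Tuple


def load_credentials(
    env: Mapping[str, str] = os.environ,
    user_var: str = "GPM_USERNAME",      # hypothetical variable name
    password_var: str = "GPM_PASSWORD",  # hypothetical variable name
) -> Tuple[str, str]:
    """Return (username, password), raising if either variable is unset or empty."""
    missing = [name for name in (user_var, password_var) if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env[user_var], env[password_var]


# An explicit mapping is passed here so the sketch runs without
# touching the real process environment.
username, password = load_credentials(
    {"GPM_USERNAME": "someone@example.org", "GPM_PASSWORD": "someone@example.org"}
)
```

The returned pair could then be handed to aiohttp.BasicAuth(username, password) exactly as in the recipe above.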
For reference the DEBUG logs are
| | 2021-09-21T16:34:11.472462183Z stdout F [2021-09-21 16:34:11+0000] INFO - prefect.CloudTaskRunner \| Task 'cache_input[0]': Starting task run...
| | 2021-09-21T16:34:11.472524083Z stderr F INFO:prefect.CloudTaskRunner:Task 'cache_input[0]': Starting task run...
| | 2021-09-21T16:34:11.701348823Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Caching input 'time-0'
| | 2021-09-21T16:34:11.701679328Z stderr F INFO:pangeo_forge_recipes.storage:Caching file 'https://arthurhouhttps.pps.eosdis.nasa.gov/gpmdata/2000/06/01/imerg/3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5'
| | 2021-09-21T16:34:11.858015704Z stderr F INFO:pangeo_forge_recipes.storage:Copying remote file 'https://arthurhouhttps.pps.eosdis.nasa.gov/gpmdata/2000/06/01/imerg/3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5' to cache
| | 2021-09-21T16:34:12.695390428Z stderr F DEBUG:pangeo_forge_recipes.storage:entering fs.open context manager for abfs://pangeof-bakery-flow-cache-container/executor_test/gpm-imerge-hhr_executor_test/cache/ea77c75397a9c15b17ff4a2746ba5ac2-https_arthurhouhttps.pps.eosdis.nasa.gov_gpmdata_2000_06_01_imerg_3b-hhr.ms.mrg.3imerg.20000601-s000000-e002959.0000.v06b.hdf5
| | 2021-09-21T16:34:12.696209939Z stderr F DEBUG:pangeo_forge_recipes.storage:FSSpecTarget.open yielding <File-like object AzureBlobFileSystem, pangeof-bakery-flow-cache-container/executor_test/gpm-imerge-hhr_executor_test/cache/ea77c75397a9c15b17ff4a2746ba5ac2-https_arthurhouhttps.pps.eosdis.nasa.gov_gpmdata_2000_06_01_imerg_3b-hhr.ms.mrg.3imerg.20000601-s000000-e002959.0000.v06b.hdf5>
| | 2021-09-21T16:34:12.800140319Z stdout F [2021-09-21 16:34:12+0000] ERROR - prefect.CloudTaskRunner \| Task 'cache_input[0]': Exception encountered during task execution!
| | 2021-09-21T16:34:12.80019312Z stdout F Traceback (most recent call last):
| | 2021-09-21T16:34:12.80020332Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 861, in get_task_run_state
| | 2021-09-21T16:34:12.80020932Z stdout F value = prefect.utilities.executors.run_task_with_timeout(
| | 2021-09-21T16:34:12.800232921Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
| | 2021-09-21T16:34:12.800238121Z stdout F return task.run(*args, **kwargs) # type: ignore
| | 2021-09-21T16:34:12.800247021Z stdout F File "/Users/seanharkins/projects/pangeo_forge_prefect/pangeo_forge_prefect/flow_manager.py", line 74, in wrapper
| | 2021-09-21T16:34:12.800253621Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 185, in cache_input
| | 2021-09-21T16:34:12.800258921Z stdout F input_cache.cache_file(
| | 2021-09-21T16:34:12.800263921Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/storage.py", line 165, in cache_file
| | 2021-09-21T16:34:12.800268521Z stdout F _copy_btw_filesystems(input_opener, target_opener)
| | 2021-09-21T16:34:12.800271621Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/storage.py", line 38, in _copy_btw_filesystems
| | 2021-09-21T16:34:12.800274621Z stdout F data = source.read(BLOCK_SIZE)
| | 2021-09-21T16:34:12.800278621Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/implementations/http.py", line 498, in read
| | 2021-09-21T16:34:12.800282621Z stdout F return super().read(length)
| | 2021-09-21T16:34:12.800287021Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/spec.py", line 1487, in read
| | 2021-09-21T16:34:12.800250521Z stderr F ERROR:prefect.CloudTaskRunner:Task 'cache_input[0]': Exception encountered during task execution!
| | 2021-09-21T16:34:12.800318622Z stderr F Traceback (most recent call last):
| | 2021-09-21T16:34:12.800344722Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/prefect/engine/task_runner.py", line 861, in get_task_run_state
| | 2021-09-21T16:34:12.800353322Z stderr F value = prefect.utilities.executors.run_task_with_timeout(
| | 2021-09-21T16:34:12.800365822Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
| | 2021-09-21T16:34:12.800371722Z stderr F return task.run(*args, **kwargs) # type: ignore
| | 2021-09-21T16:34:12.800378022Z stderr F File "/Users/seanharkins/projects/pangeo_forge_prefect/pangeo_forge_prefect/flow_manager.py", line 74, in wrapper
| | 2021-09-21T16:34:12.800383523Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 185, in cache_input
| | 2021-09-21T16:34:12.800292921Z stdout F out = self.cache._fetch(self.loc, self.loc + length)
| | 2021-09-21T16:34:12.800400323Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/caching.py", line 376, in _fetch
| | 2021-09-21T16:34:12.800419423Z stdout F self.cache = self.fetcher(start, bend)
| | 2021-09-21T16:34:12.800428723Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 99, in wrapper
| | 2021-09-21T16:34:12.800432223Z stdout F return sync(self.loop, func, *args, **kwargs)
| | 2021-09-21T16:34:12.800437223Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 80, in sync
| | 2021-09-21T16:34:12.800440623Z stdout F raise result[0]
| | 2021-09-21T16:34:12.800444923Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 30, in _runner
| | 2021-09-21T16:34:12.800455723Z stdout F result[0] = await coro
| | 2021-09-21T16:34:12.800461324Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/implementations/http.py", line 537, in async_fetch_range
| | 2021-09-21T16:34:12.800466224Z stdout F r.raise_for_status()
| | 2021-09-21T16:34:12.800388723Z stderr F input_cache.cache_file(
| | 2021-09-21T16:34:12.800470924Z stdout F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 1000, in raise_for_status
| | 2021-09-21T16:34:12.800481224Z stdout F raise ClientResponseError(
| | 2021-09-21T16:34:12.800477924Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/storage.py", line 165, in cache_file
| | 2021-09-21T16:34:12.800508024Z stderr F _copy_btw_filesystems(input_opener, target_opener)
| | 2021-09-21T16:34:12.800513924Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/pangeo_forge_recipes/storage.py", line 38, in _copy_btw_filesystems
| | 2021-09-21T16:34:12.800519424Z stderr F data = source.read(BLOCK_SIZE)
| | 2021-09-21T16:34:12.800488124Z stdout F aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://arthurhouhttps.pps.eosdis.nasa.gov/gpmdata/2000/06/01/imerg/3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5')
| | 2021-09-21T16:34:12.800524524Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/implementations/http.py", line 498, in read
| | 2021-09-21T16:34:12.800545425Z stderr F return super().read(length)
| | 2021-09-21T16:34:12.800550425Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/spec.py", line 1487, in read
| | 2021-09-21T16:34:12.800555125Z stderr F out = self.cache._fetch(self.loc, self.loc + length)
| | 2021-09-21T16:34:12.800558325Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/caching.py", line 376, in _fetch
| | 2021-09-21T16:34:12.800561725Z stderr F self.cache = self.fetcher(start, bend)
| | 2021-09-21T16:34:12.800565425Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 99, in wrapper
| | 2021-09-21T16:34:12.800569425Z stderr F return sync(self.loop, func, *args, **kwargs)
| | 2021-09-21T16:34:12.800574225Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 80, in sync
| | 2021-09-21T16:34:12.800579125Z stderr F raise result[0]
| | 2021-09-21T16:34:12.800583925Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/asyn.py", line 30, in _runner
| | 2021-09-21T16:34:12.800588625Z stderr F result[0] = await coro
| | 2021-09-21T16:34:12.800595425Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/fsspec/implementations/http.py", line 537, in async_fetch_range
| | 2021-09-21T16:34:12.800599025Z stderr F r.raise_for_status()
| | 2021-09-21T16:34:12.800602625Z stderr F File "/srv/conda/envs/notebook/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 1000, in raise_for_status
| | 2021-09-21T16:34:12.800606125Z stderr F raise ClientResponseError(
| | 2021-09-21T16:34:12.800609326Z stderr F aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://arthurhouhttps.pps.eosdis.nasa.gov/gpmdata/2000/06/01/imerg/3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5')
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (15 by maintainers)
@rabernat @cisaacstern After some investigation I was able to isolate this issue to copy_pruned.
Fails the assertion. This is occurring as we don’t forward all the original args through to our newly constructed FilePattern in prune_pattern https://github.com/pangeo-forge/pangeo-forge-recipes/blob/0352ca0a5c078288444d757167a1c5e9424e1039/pangeo_forge_recipes/patterns.py#L270
Amazing work here folks. We caught a bug.
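The dropped-kwargs behavior described above can be illustrated with a small stand-in. This is a sketch only: MiniPattern and both prune functions are hypothetical and deliberately do not reproduce the real FilePattern or prune_pattern code. The point is that a pattern rebuilt from only some of its constructor arguments silently loses fsspec_open_kwargs, so the later HTTP request goes out without auth.

```python
from typing import Any, Callable, Dict, List, Optional


class MiniPattern:
    """Hypothetical stand-in for FilePattern; not the real class."""

    def __init__(
        self,
        format_function: Callable[[int], str],
        keys: List[int],
        fsspec_open_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.format_function = format_function
        self.keys = keys
        self.fsspec_open_kwargs = fsspec_open_kwargs or {}


def prune_dropping_kwargs(pattern: MiniPattern, nkeep: int = 2) -> MiniPattern:
    # Mirrors the reported problem: the open kwargs are not forwarded.
    return MiniPattern(pattern.format_function, pattern.keys[:nkeep])


def prune_forwarding_kwargs(pattern: MiniPattern, nkeep: int = 2) -> MiniPattern:
    # One possible remedy: forward every original constructor argument.
    return MiniPattern(
        pattern.format_function,
        pattern.keys[:nkeep],
        fsspec_open_kwargs=dict(pattern.fsspec_open_kwargs),
    )


original = MiniPattern(
    lambda i: f"https://example.com/file-{i}.nc",
    keys=[0, 1, 2, 3],
    fsspec_open_kwargs={"auth": ("user", "secret")},
)

buggy = prune_dropping_kwargs(original)
fixed = prune_forwarding_kwargs(original)

print(buggy.fsspec_open_kwargs)  # auth is gone
print(fixed.fsspec_open_kwargs)  # auth survives pruning
```

A regression test along these lines would assert that the pruned pattern's open kwargs equal the original's.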
If I had provided a more complete reproducer in the first place I’m sure I would have noticed it immediately 🤦. I would potentially err on the side of having you tackle this as I’m not fully versed in all the changes in #167. I’d be happy to coordinate though and I’ll see if I can whip up a failing test.
Good point Charles. Let’s discuss this issue today.
Gotcha! 🙃
I would start by looking at serialization. My first guess was that aiohttp.BasicAuth would purge credentials from its serialized form. I checked this, and it appears not to be the case. The credentials are still there.
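A check of this kind might look like the sketch below, assuming the flow is serialized with something pickle-compatible (an assumption, since the thread does not say which serializer was inspected). The credentials are placeholders. The namedtuple fallback is only there so the snippet runs where aiohttp is not installed; it is a stand-in, not the real class.

```python
import pickle

try:
    from aiohttp import BasicAuth
except ImportError:
    # Stand-in with the same three fields, used only if aiohttp is unavailable.
    from collections import namedtuple

    BasicAuth = namedtuple("BasicAuth", ["login", "password", "encoding"])

# Placeholder credentials for illustration only.
auth = BasicAuth("user@example.org", "not-a-real-password", "latin1")

# Round-trip through pickle and inspect what comes back.
restored = pickle.loads(pickle.dumps(auth))
print(restored.login, restored.password)
```

If the login and password come back unchanged, serialization of the auth object itself is not what strips the credentials.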