filesystem_spec: http implementation returns an error when the URL contains "?"
When I try to fetch a URL containing a “?”, like this one:
uri = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()"
which is a simple web API request, the http implementation throws an error because it can’t “expand” this path:
import fsspec
fs = fsspec.filesystem('http')
fs.get(uri, "toto.nc")
returns the error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-48-ee357ddd0be1> in <module>
5 uri = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()"
6
----> 7 fs.get(uri, "toto.nc")
~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/asyn.py in get(self, rpath, lpath, recursive, **kwargs)
240 rpath = self._strip_protocol(rpath)
241 lpath = make_path_posix(lpath)
--> 242 rpaths = self.expand_path(rpath, recursive=recursive)
243 lpaths = other_paths(rpaths, lpath)
244 return sync(self.loop, self._get, rpaths, lpaths)
~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/spec.py in expand_path(self, path, recursive, maxdepth)
722 """Turn one or more globs or directories into a list of all matching files"""
723 if isinstance(path, str):
--> 724 out = self.expand_path([path], recursive, maxdepth)
725 else:
726 out = set()
~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/spec.py in expand_path(self, path, recursive, maxdepth)
737 out.add(p)
738 if not out:
--> 739 raise FileNotFoundError(path)
740 return list(sorted(out))
741
FileNotFoundError: ['https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()']
Going further, I found that expand_path fails because glob.has_magic returns True for the path string:
glob.has_magic(uri)
>>> True
and finally, glob.has_magic returns True because of the “?” in the URI:
glob.has_magic(uri.replace("?",""))
>>> False
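The check above can be reproduced offline with the standard library alone. This is a minimal sketch; the shortened URL is a hypothetical stand-in for the full ERDDAP request, and it only shows that `glob.has_magic` flags any string containing `*`, `?` or `[`, regardless of whether the `?` starts a query string.

```python
import glob

# Hypothetical, shortened stand-in for the ERDDAP request used above.
url = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&distinct()"

# glob treats "*", "?" and "[" as wildcard characters, so the "?" that
# starts the query string is enough to mark the URL as a glob pattern.
with_query = glob.has_magic(url)

# Without the "?", no wildcard character is left in the string.
without_question_mark = glob.has_magic(url.replace("?", ""))

print(with_query, without_question_mark)
```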
So, is this a bug in the internal fsspec machinery, which should take more care with URLs in this case, or am I fetching this kind of web API the wrong way?
Thanks for your help! g
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (8 by maintainers)
I can also ping you periodically @gmaze !
OK, I should not be surprised by that. Can you please look into how much work it would be to override the glob functionality in HTTPFileSystem? That would presumably involve a custom has_magic and glob re-replacement patterns. You would presumably need class attributes for the regex to use when checking for magics… sounds a little painful.
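To make the "custom has_magic" idea concrete, here is one possible shape for such a check. It is only a sketch, not fsspec's implementation: the helper name `url_has_magic` is hypothetical, and it assumes that wildcards are only meaningful in the path part of a URL.

```python
import glob
from urllib.parse import urlsplit


def url_has_magic(url: str) -> bool:
    """Hypothetical helper: report glob wildcards in the URL path only."""
    parts = urlsplit(url)
    # Everything after the first "?" lands in parts.query and is ignored,
    # so a query string no longer makes the URL look like a glob pattern.
    return glob.has_magic(parts.path)


# A web API request with a query string is not treated as a glob.
api_call = url_has_magic(
    "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&distinct()"
)

# A "*" in the path is still detected.
star_glob = url_has_magic("https://example.com/data/*.nc")

print(api_call, star_glob)
```

The trade-off is that "?" can then no longer act as a single-character wildcard in http paths, since `urlsplit` always reads the first "?" as the start of the query.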
By the way, using the _file non-expanding version is a perfectly fine solution.
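For reference, a sketch of that workaround, assuming the "_file" method meant here is `get_file`, which copies a single remote file and does not go through `expand_path`. The wrapper name `download_without_expansion` is hypothetical, and fsspec is imported inside the function so that the definition itself has no third-party dependency.

```python
def download_without_expansion(url: str, local_path: str) -> None:
    """Hypothetical wrapper: fetch one URL without any glob expansion."""
    import fsspec  # third-party; needs the http extras (aiohttp, requests)

    fs = fsspec.filesystem("http")
    # get_file handles exactly one remote path, so the "?" in the query
    # string is passed through to the server unchanged.
    fs.get_file(url, local_path)
```

Calling `download_without_expansion(uri, "toto.nc")` with the `uri` from the report would take the place of `fs.get(uri, "toto.nc")`.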