filesystem_spec: http implementation returns an error when the URL contains "?"

When I try to fetch a URL containing a “?”, like this one:

uri = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()"

which is a simple web API request, the http implementation throws an error because it can’t “expand” this path:

import fsspec
fs = fsspec.filesystem('http')
fs.get(uri, "toto.nc")

returns the error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-48-ee357ddd0be1> in <module>
      5 uri = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()"
      6 
----> 7 fs.get(uri, "toto.nc")

~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/asyn.py in get(self, rpath, lpath, recursive, **kwargs)
    240         rpath = self._strip_protocol(rpath)
    241         lpath = make_path_posix(lpath)
--> 242         rpaths = self.expand_path(rpath, recursive=recursive)
    243         lpaths = other_paths(rpaths, lpath)
    244         return sync(self.loop, self._get, rpaths, lpaths)

~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/spec.py in expand_path(self, path, recursive, maxdepth)
    722         """Turn one or more globs or directories into a list of all matching files"""
    723         if isinstance(path, str):
--> 724             out = self.expand_path([path], recursive, maxdepth)
    725         else:
    726             out = set()

~/anaconda/envs/obidam36/lib/python3.6/site-packages/fsspec/spec.py in expand_path(self, path, recursive, maxdepth)
    737                 out.add(p)
    738         if not out:
--> 739             raise FileNotFoundError(path)
    740         return list(sorted(out))
    741 

FileNotFoundError: ['https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_center&time%3E=2026-12-20T00%3A00%3A00Z&time%3C=2026-12-27T14%3A48%3A20Z&distinct()']

Digging further, I found that expand_path fails because glob.has_magic returns True for this path string:

import glob
glob.has_magic(uri)
>>> True

and finally, has_magic returns True because of the “?” in the URI:

glob.has_magic(uri.replace("?",""))
>>> False
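
To narrow this down a bit more, here is a minimal sketch using only the standard library. It splits the URI with urllib.parse.urlsplit and runs glob.has_magic on each piece separately. The `checks` dictionary is just an illustrative name; the point is that neither the path nor the query string contains a wildcard character on its own, so the “?” separator is the only thing being read as a glob.

```python
import glob
from urllib.parse import urlsplit

uri = (
    "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc"
    "?data_center&time%3E=2026-12-20T00%3A00%3A00Z"
    "&time%3C=2026-12-27T14%3A48%3A20Z&distinct()"
)

# urlsplit drops the "?" separator and returns path and query separately
parts = urlsplit(uri)

# Run the same glob check on the whole URI and on each component
checks = {
    "full": glob.has_magic(uri),
    "path": glob.has_magic(parts.path),
    "query": glob.has_magic(parts.query),
}
print(checks)
```

Only the "full" entry comes out True, which matches the two has_magic calls above.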

So, is this a bug in the internal fsspec machinery, which should take more care with URLs in this case, or am I fetching this kind of web API the wrong way?

Thanks for your help! g

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I can also ping you periodically, @gmaze!

OK, I should not be surprised by that. Can you please look into how much work it would be to override the glob functionality in HTTPFileSystem, which would presumably involve a custom has_magic and glob re-replacement patterns? You would presumably need class attributes for the regex to use when checking for magics… sounds a little painful.

By the way, using the _file non-expanding version is a perfectly fine solution.
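
For the custom has_magic idea, a rough sketch of what a query-aware check might look like is given here. This is not fsspec code: `has_magic_ignoring_query` and `_MAGIC` are hypothetical names, and the regex simply lists the three characters (`*`, `?`, `[`) that the standard-library glob module treats as wildcards. The assumption is that everything after the first “?” in an HTTP URL is a query string and should never be globbed.

```python
import re
from urllib.parse import urlsplit

# The characters the standard-library glob module treats as wildcards
_MAGIC = re.compile(r"[*?\[]")


def has_magic_ignoring_query(url: str) -> bool:
    """Hypothetical helper: look for glob characters in the URL path only.

    urlsplit puts everything after the first "?" into the query (or
    fragment), so those parts are never inspected.
    """
    return _MAGIC.search(urlsplit(url).path) is not None


# A web API request with a query string is not treated as a glob
print(has_magic_ignoring_query("https://example.com/data/file.nc?var=a&distinct()"))
# A wildcard in the path itself is still detected
print(has_magic_ignoring_query("https://example.com/data/*.nc"))
```

The trade-off is that “?” could no longer be used as a single-character wildcard in HTTP paths, since it would always be read as the start of the query string.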