pyproj: UnicodeDecodeError when run multithreaded

Code Sample, a copy-pastable example if possible

# TODO: working on it

Problem description

This is something that’s been noticed in Satpy specifically and is being tracked here: https://github.com/pytroll/satpy/issues/1114

The bottom line is that a couple of our users have been getting UnicodeDecodeErrors or errors about bad proj definitions. The really annoying bit is that it seems to be some sort of race condition or other multithreading-related issue. We are using xarray+dask and have a pyproj CRS object in the .coords of our DataArrays. We get errors like:

    return [_execute_task(a, cache) for a in arg]
  File "/work/geo2grid/lib/python3.7/site-packages/dask/core.py", line 122, in _execute_task
    elif arg in cache:
  File "/work/geo2grid/lib/python3.7/site-packages/pyproj/crs/crs.py", line 869, in __hash__
    return hash(self.to_wkt())
  File "pyproj/_crs.pyx", line 451, in pyproj._crs.Base.to_wkt
  File "pyproj/_crs.pyx", line 120, in pyproj._crs._to_wkt
  File "pyproj/_crs.pyx", line 24, in pyproj._crs.cstrdecode
  File "/work/geo2grid/lib/python3.7/site-packages/pyproj/compat.py", line 21, in pystrdecode
    return cstr.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 0: invalid continuation byte
Command exited with non-zero status 1

Or:

  File "C:\ProgramData\Miniconda3\lib\site-packages\pyresample\geometry.py", line 1012, in invproj
    target_proj = Proj(proj_dict)
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyresample\_spatial_mp.py", line 121, in __init__
    **kwargs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyproj\proj.py", line 171, in __init__
    super().__init__(cstrencode(projstring.strip()))
  File "pyproj/_proj.pyx", line 30, in pyproj._proj.Proj.__init__
pyproj.exceptions.ProjError: Invalid projection b'C'.: (Internal Proj Error: proj_create: unrecognized format / unknown name)

Other times it prints the invalid projection string with stray characters mixed in where they shouldn’t be. These are very clearly wrong changes, for example the p in +proj=merc being replaced by some odd Unicode character.
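For illustration only (this is not a reproduction of the bug): the message in the first traceback is what Python reports when a byte string that should hold a proj definition has had a byte overwritten with a value that is not valid UTF-8. The corrupted buffer below is a made-up example, assuming the first byte was replaced by 0xf0.

```python
def try_decode(raw: bytes) -> str:
    """Decode bytes as UTF-8, returning the error text instead of raising."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# An intact definition, as the C library would normally hand it back.
intact = b"+proj=merc"

# Hypothetical corruption: the first byte has been overwritten with 0xf0.
# 0xf0 starts a 4-byte UTF-8 sequence, but the next byte is plain ASCII,
# so the decoder rejects it.
corrupted = b"\xf0" + intact[1:]

print(try_decode(intact))
print(try_decode(corrupted))
```

The second call fails inside the decoder, just as pystrdecode does in the traceback, which would be consistent with the underlying C string being modified while another thread is reading it.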

I’m trying my best to reproduce this but have been unsuccessful so far, which is why I don’t have a reproducible example yet. I’ve only ever noticed this in logs.

Expected Output

No error.

Environment Information

  • Output from: python -m pyproj -v
pyproj info:
    pyproj: 2.5.0
      PROJ: 6.3.0
  data dir: /data1/users/davidh/miniconda3/envs/geo2grid_dist/share/proj

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48)  [GCC 7.3.0]
executable: /data1/users/davidh/miniconda3/envs/geo2grid_dist/bin/python
   machine: Linux-2.6.32-573.12.1.el6.x86_64-x86_64-with-centos-6.10-Final

Python deps:
       pip: 20.0.2
setuptools: 45.2.0.post20200209
    Cython: None

Specific conda-forge builds:

proj                      6.3.0                hc80f0dc_0    conda-forge
pyproj                    2.5.0            py37h8ff28aa_0    conda-forge

Installation method

  • conda, pip wheel, from source, etc…

Conda environment information (if you installed with conda):

I mentioned specific conda packages above, but we’ve seen this now on Ubuntu, Windows, and a CentOS 7 docker container running a conda-pack’d version of a conda-forge environment.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

Sorry, I thought I closed this already. This was our fault for using a CRS object with a dask map_blocks function (passing CRS objects between threads).
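A minimal sketch of one way to avoid sharing a CRS object between threads, assuming it is acceptable to hand each worker the WKT string and let it build its own CRS. The helper names (crs_for_thread, make_crs, process_block, run_blocks) are hypothetical and not part of pyproj, dask, or Satpy. A plain ThreadPoolExecutor stands in for the dask threaded scheduler, and the per-block work is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each thread gets its own attribute namespace on this object.
_local = threading.local()


def crs_for_thread(crs_wkt, factory):
    """Return an object built from crs_wkt, cached separately per thread."""
    cache = getattr(_local, "crs_cache", None)
    if cache is None:
        cache = _local.crs_cache = {}
    if crs_wkt not in cache:
        cache[crs_wkt] = factory(crs_wkt)
    return cache[crs_wkt]


def make_crs(crs_wkt):
    """Default factory: build a pyproj CRS from a WKT string."""
    # Imported lazily so the helpers above do not require pyproj themselves.
    from pyproj import CRS

    return CRS.from_wkt(crs_wkt)


def process_block(block, crs_wkt, factory=make_crs):
    """Placeholder per-block work that needs a CRS."""
    crs = crs_for_thread(crs_wkt, factory)
    # Return only plain Python values so no CRS object leaves this thread.
    return len(block), str(crs)


def run_blocks(blocks, crs_wkt, factory=make_crs, max_workers=4):
    """Process blocks in a thread pool, passing the WKT string, not a CRS."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: process_block(b, crs_wkt, factory), blocks))
```

With pyproj installed, a caller would pass crs.to_wkt() instead of the CRS itself, for example run_blocks(blocks, crs.to_wkt()). The same idea should carry over to dask by passing the WKT string as the extra argument to map_blocks, though that is not tested here.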

If I were to implement the same strategy for spatial_ref in satpy and geoxarray, do you think that would be a good idea?

I think that would be great 👍. The more consistency across libraries the better.

Would you do anything differently if you could now?

Nothing that I can think of at the moment.