netcdf-c: reproducible Zarr reading error

This is my first attempt at reading zarr through netCDF, and I have found a simple, reproducible error on both my Apple silicon and Ubuntu 20.04 machines. It exists in both 4.9.0 and 4.8.1, in different ways. I create a simple little pure-zarr dataset as follows, with zarr 2.11.3 or zarr 2.8.1:

import zarr

# create a 10x10 float array named 'z', filled with 42, in a directory store
store = zarr.DirectoryStore('simple.zarr')
root = zarr.group(store=store)
z = root.create('z', shape=(10, 10), dtype='f', overwrite=True)
z[:] = 42
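
A minimal read-back check (using zarr's standard open_group) is:

import zarr

# open the store read-only and dump the array
root = zarr.open_group('simple.zarr', mode='r')
print(root['z'][:])   # 10x10 array of 42.0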

Reading it back with zarr in Python indeed gives an array filled with 42.0, as expected. But when I use the command ncdump 'file://simple.zarr#mode=nczarr,zarr' with 4.8.1, on either Ubuntu 20.04 or Apple silicon, I get

netcdf simple {
dimensions:
        .zdim_10 = 10 ;
variables:
        float z(.zdim_10, .zdim_10) ;
data:

 z =
  2.080671e-36, 5.605194e-43, 5.605194e-43, 6.305843e-44, 2.802597e-44, 
    2.942727e-44, 9.187894e-41, 3.087947e-38, 39.82812, 1.36231e+10,
  48.5647, 9.24857e-44, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ;
}

When I try to read it with 4.9.0 on Apple silicon using the same command, it hangs and never returns.

Ultimately, my goal is to make my zarr files available to R users via netCDF, but figuring this out seems like a logical first step. I hope this bug report is helpful and that I am not doing something stupid.

Jamie

P.S. For 4.9.0, nc-config --all gives

This netCDF 4.9.0 has been built with the following features: 

  --cc            -> /usr/bin/clang
  --cflags        -> -I/opt/local/include
  --libs          -> -L/opt/local/lib -lnetcdf
  --static        -> -lhdf5_hl -lhdf5 -lz -ldl -lm -lzstd -lbz2 -lcurl -lxml2

  --has-c++       -> no
  --cxx           -> 

  --has-c++4      -> no
  --cxx4          -> 

  --has-fortran   -> no
  --has-dap       -> yes
  --has-dap2      -> yes
  --has-dap4      -> yes
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> no
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> no
  --has-parallel4 -> no
  --has-parallel  -> no
  --has-nczarr    -> yes

  --prefix        -> /opt/local
  --includedir    -> /opt/local/include
  --libdir        -> /opt/local/lib
  --version       -> netCDF 4.9.0

and on Ubuntu, netCDF 4.8.1 gives

This netCDF 4.8.1 has been built with the following features: 

  --cc            -> x86_64-conda-linux-gnu-cc
  --cflags        -> -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
  --libs          -> -L/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib -lnetcdf
  --static        -> -lmfhdf -ldf -lhdf5_hl -lhdf5 -lm -lcurl -lzip

  --has-c++       -> no
  --cxx           -> 

  --has-c++4      -> no
  --cxx4          -> 

  --has-fortran   -> yes
  --fc            -> /home/conda/feedstock_root/build_artifacts/netcdf-fortran_1642696590650/_build_env/bin/x86_64-conda-linux-gnu-gfortran
  --fflags        -> -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
  --flibs         -> -L/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib -lnetcdff -lnetcdf -lnetcdf -lnetcdff_c
  --has-f90       -> TRUE
  --has-f03       -> yes

  --has-dap       -> yes
  --has-dap2      -> yes
  --has-dap4      -> yes
  --has-nc2       -> yes
  --has-nc4       -> yes
  --has-hdf5      -> yes
  --has-hdf4      -> yes
  --has-logging   -> no
  --has-pnetcdf   -> no
  --has-szlib     -> no
  --has-cdf5      -> yes
  --has-parallel4 -> no
  --has-parallel  -> no
  --has-nczarr    -> yes

  --prefix        -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022
  --includedir    -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
  --libdir        -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib
  --version       -> netCDF 4.8.1

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 41 (16 by maintainers)

Most upvoted comments

So @JamiePringle, I’ll be updating our documentation to clarify a few things, but essentially, what’s going on here is this:

The libnetcdf.so library cannot talk to the blosc library directly; it requires an "interface" library, which acts as a go-between. This interface library is built as part of the netCDF build, if blosc is detected at configure/build time.

From scratch, the steps to get this to work (on my system) were as follows; they assume libhdf5 was installed (although that is not strictly necessary).

  1. Install blosc, blosc development headers.
  2. Configure netCDF with --enable-plugins and --with-plugin-dir=$HOME/netcdf-plugins
  3. Ensure blosc is specified in the generated libnetcdf.settings file.
  4. Run make, make install.

Once netCDF was built and installed, I set the environment variable HDF5_PLUGIN_PATH=$HOME/netcdf-plugins. With that in place, I can run ncdump and access the files; a condensed transcript follows.
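
Condensed, with illustrative paths, the whole sequence is roughly this (a sketch, not a definitive recipe; the configure flags are the ones named above):

# from a netcdf-c source tree, with blosc and its headers installed
./configure --enable-plugins --with-plugin-dir=$HOME/netcdf-plugins
make
make install

# tell the filter machinery where the wrappers live, then read the file
export HDF5_PLUGIN_PATH=$HOME/netcdf-plugins
ncdump 'file://simple.zarr#mode=nczarr,zarr'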

This works because:

  1. NetCDF builds the interface library.
  2. ncdump knows where to find the interface library because HDF5_PLUGIN_PATH is set.

As a reminder, note that HDF5 and NCZarr do not invoke a compressor such as libblosc.21.1.dylib directly. Rather, the compression code is wrapped in code that translates the HDF5/NCZarr API to the libblosc.21.1.dylib compressor code. If NCZarr (and HDF5) cannot find those wrappers, then it cannot apply compression to the dataset. I assume that the --enable-filters option is enabled; these wrappers are then created in the netcdf-c/plugins directory, so you can point HDF5_PLUGIN_PATH at that directory if you want. If you want to install them in some other place, use the --with-plugin-dir="dir" option to install them in directory "dir". Alternatively, specify --with-plugin-dir=yes and the wrappers will be installed in a standard location, in which case you do not need to set HDF5_PLUGIN_PATH at all.

Also, if you look above, I link to the C library that numcodecs uses to do the blosc compression. The key thing to understand is that NCZarr uses the existing HDF5 compression mechanism, which is, unfortunately, a bit complicated, though we have attempted to simplify things. As noted above, if NCZarr (and HDF5) cannot find the wrapper libraries, they cannot apply compression to the dataset. Apparently the conda netcdf-c package does not (yet) install these wrappers, so you will need to take some extra steps to install them yourself. The first step is to find them: if you can locate the build directory where libnetcdf was built, there should be a directory there called "plugins". See if you can find it and send us a listing of the contents of that directory.
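
For context, zarr 2.x compresses chunks with Blosc by default, which is presumably why the ncdump output above shows garbage: the chunk bytes are read without being decompressed. A quick check, plus a workaround sketch that sidesteps the plugins entirely (the simple_nocomp.zarr name is illustrative):

import zarr

# inspect the compressor zarr chose for the original array;
# zarr 2.x defaults to Blosc
z = zarr.open_group('simple.zarr', mode='r')['z']
print(z.compressor)   # e.g. Blosc(cname='lz4', clevel=5, ...)

# workaround: write an uncompressed copy that netCDF can read
# without any filter plugins
store = zarr.DirectoryStore('simple_nocomp.zarr')
root = zarr.group(store=store)
z2 = root.create('z', shape=(10, 10), dtype='f', compressor=None, overwrite=True)
z2[:] = 42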