netcdf-c: reproducible Zarr reading error
This is my first attempt at reading Zarr from netCDF, and I have found a simple, reproducible error on both my Apple Silicon and Ubuntu 20.04 machines. It appears in both 4.9.0 and 4.8.1, but in different ways. I create a small pure-Zarr dataset as follows, with zarr 2.11.3 or zarr 2.8.1:
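import zarr
# Create a directory-backed store and a root group in it.
store = zarr.DirectoryStore('simple.zarr')
root = zarr.group(store=store)
# 10x10 float32 array; no compressor is given, so zarr applies its default.
z = root.create('z', shape=(10, 10), dtype='f', overwrite=True)
z[:] = 42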
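Because the array is created without an explicit compressor, it may help to check what zarr actually recorded on disk. The sketch below is a minimal, standard-library-only check that assumes the Zarr v2 directory layout, in which each array directory holds a JSON file named .zarray with a "compressor" entry. The helper name compressor_id is hypothetical.

```python
import json
from pathlib import Path


def compressor_id(array_dir):
    """Return the compressor id stored in a Zarr v2 array's .zarray file.

    Returns None when the metadata records no compressor (JSON null).
    """
    meta = json.loads((Path(array_dir) / ".zarray").read_text())
    compressor = meta.get("compressor")
    return None if compressor is None else compressor["id"]


# Example call, assuming the script above was run in the current directory:
#     compressor_id("simple.zarr/z")
```

For an array created as above, I would expect this to report the library's default compressor rather than None, but that is an assumption to verify against the file itself.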
When I read this back in with zarr in Python, I get an array filled with 42.0, as expected. When I run ncdump 'file://simple.zarr#mode=nczarr,zarr' with 4.8.1, on either Ubuntu 20.04 or Apple Silicon, I get:
netcdf simple {
dimensions:
.zdim_10 = 10 ;
variables:
float z(.zdim_10, .zdim_10) ;
data:
z =
2.080671e-36, 5.605194e-43, 5.605194e-43, 6.305843e-44, 2.802597e-44,
2.942727e-44, 9.187894e-41, 3.087947e-38, 39.82812, 1.36231e+10,
48.5647, 9.24857e-44, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ;
}
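One way to probe where the nonzero leading values come from is to reinterpret them as 32-bit integers. The sketch below rests on an assumption that this report does not confirm: that ncdump printed the raw bytes of a Blosc-compressed chunk. A Blosc 1 header stores the uncompressed size, block size, and compressed size as little-endian 32-bit integers after four one-byte fields. The helper name float32_bits_as_uint is hypothetical.

```python
import struct


def float32_bits_as_uint(value):
    """Round value to float32 and return its bit pattern as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


# Second to fourth values printed by ncdump 4.8.1 in the dump above.
printed = [5.605194e-43, 5.605194e-43, 6.305843e-44]
nbytes, blocksize, cbytes = (float32_bits_as_uint(v) for v in printed)

# Under the Blosc-header assumption these would be the uncompressed size,
# the block size, and the compressed size of the chunk, all in bytes.
print(nbytes, blocksize, cbytes)
```

The first two integers come out as 400, which matches the 10 x 10 x 4 bytes of the float32 array if it is stored as a single chunk. That is consistent with the dump showing undecoded compressed data, but it is not proof of it.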
When I try to read it with 4.9.0 on Apple Silicon using the same command, it hangs and never returns.
Ultimately, my goal is to make my Zarr files available to R users via netCDF, but figuring this out seems a logical first step. I hope this bug report is helpful, and that I am not doing something stupid.
Jamie
P.S. For 4.9.0, nc-config --all gives:
This netCDF 4.9.0 has been built with the following features:
--cc -> /usr/bin/clang
--cflags -> -I/opt/local/include
--libs -> -L/opt/local/lib -lnetcdf
--static -> -lhdf5_hl -lhdf5 -lz -ldl -lm -lzstd -lbz2 -lcurl -lxml2
--has-c++ -> no
--cxx ->
--has-c++4 -> no
--cxx4 ->
--has-fortran -> no
--has-dap -> yes
--has-dap2 -> yes
--has-dap4 -> yes
--has-nc2 -> yes
--has-nc4 -> yes
--has-hdf5 -> yes
--has-hdf4 -> no
--has-logging -> no
--has-pnetcdf -> no
--has-szlib -> no
--has-cdf5 -> no
--has-parallel4 -> no
--has-parallel -> no
--has-nczarr -> yes
--prefix -> /opt/local
--includedir -> /opt/local/include
--libdir -> /opt/local/lib
--version -> netCDF 4.9.0
On Ubuntu, nc-config --all for netCDF 4.8.1 gives:
This netCDF 4.8.1 has been built with the following features:
--cc -> x86_64-conda-linux-gnu-cc
--cflags -> -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
--libs -> -L/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib -lnetcdf
--static -> -lmfhdf -ldf -lhdf5_hl -lhdf5 -lm -lcurl -lzip
--has-c++ -> no
--cxx ->
--has-c++4 -> no
--cxx4 ->
--has-fortran -> yes
--fc -> /home/conda/feedstock_root/build_artifacts/netcdf-fortran_1642696590650/_build_env/bin/x86_64-conda-linux-gnu-gfortran
--fflags -> -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include -I/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
--flibs -> -L/home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib -lnetcdff -lnetcdf -lnetcdf -lnetcdff_c
--has-f90 -> TRUE
--has-f03 -> yes
--has-dap -> yes
--has-dap2 -> yes
--has-dap4 -> yes
--has-nc2 -> yes
--has-nc4 -> yes
--has-hdf5 -> yes
--has-hdf4 -> yes
--has-logging -> no
--has-pnetcdf -> no
--has-szlib -> no
--has-cdf5 -> yes
--has-parallel4 -> no
--has-parallel -> no
--has-nczarr -> yes
--prefix -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022
--includedir -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/include
--libdir -> /home/pringle/anaconda3/envs/py3_parcels_mpi_bleedingApr2022/lib
--version -> netCDF 4.8.1
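The two listings are long enough that differences are easy to miss by eye. The sketch below compares them mechanically, assuming the "--flag -> value" line format shown above. The function names parse_nc_config and differing_flags are hypothetical, and the sample text is only a three-line excerpt of each listing.

```python
def parse_nc_config(text):
    """Map each '--flag -> value' line of nc-config --all output to a dict entry."""
    features = {}
    for line in text.splitlines():
        flag, sep, value = line.partition("->")
        if sep and flag.strip().startswith("--"):
            features[flag.strip()] = value.strip()
    return features


def differing_flags(a, b, prefix="--has-"):
    """Return {flag: (value_in_a, value_in_b)} for prefixed flags that differ."""
    keys = sorted(k for k in set(a) | set(b) if k.startswith(prefix))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts copied from the two listings above.
macos = parse_nc_config("""\
--has-hdf4 -> no
--has-cdf5 -> no
--has-nczarr -> yes
""")
ubuntu = parse_nc_config("""\
--has-hdf4 -> yes
--has-cdf5 -> yes
--has-nczarr -> yes
""")
print(differing_flags(macos, ubuntu))
```

Both builds report --has-nczarr as yes, and neither listing has a line about plugins or Blosc, so a comparison like this cannot show by itself whether compression support is available.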
About this issue
- State: closed
- Created 2 years ago
- Comments: 41 (16 by maintainers)
So @JamiePringle, I’ll be updating our documentation to clarify a few things, but essentially, what’s going on here is this:
The libnetcdf.so library cannot talk to the blosc library directly; it requires an “interface” library, which acts as a go-between. This interface library is built by the netCDF library if blosc is detected at configure/build time.
From scratch, the steps to get this to work (on my system) were as follows, assuming libhdf5 was installed (although that is not strictly necessary):
- Configure netCDF with --enable-plugins and --with-plugin-dir=$HOME/netcdf-plugins.
- Confirm that blosc is specified in the generated libnetcdf.settings file.
Once built and installed, I set the environment variable HDF5_PLUGIN_PATH=$HOME/netcdf-plugins. Once this is done, I can run ncdump and access the files.
The reason this works is:
- ncdump knows where to find the interface library because HDF5_PLUGIN_PATH is set.
As a reminder, note that HDF5 and NCZarr do not invoke a compressor such as libblosc.21.1.dylib directly. Rather, the compression code is wrapped in code that translates the HDF5/NCZarr API to the libblosc.21.1.dylib compressor code. If NCZarr (and HDF5) cannot find those wrappers, then they cannot apply compression to the dataset. I assume that the --enable-filters option is enabled. Then these wrappers are created in the netcdf-c/plugins directory, so you can point HDF5_PLUGIN_PATH to that directory if you want. If you want to install them in some other place, then use the --with-plugin-dir="dir" option to install them in directory "dir". You can alternatively specify --with-plugin-dir=yes and the wrappers will be installed in a standard location, in which case you do not need to set HDF5_PLUGIN_PATH.
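The environment-variable step can be scripted so that the plugin path travels with each ncdump call. The sketch below is one way to do that from Python. It takes the URL form from the original report and the plugin directory from the steps above. The function names ncdump_invocation and run_ncdump are hypothetical, and the timeout is an added precaution because the 4.9.0 run was reported to hang.

```python
import os
import subprocess


def ncdump_invocation(store_path, plugin_dir):
    """Build the ncdump argument list and environment for a pure-Zarr store."""
    url = f"file://{store_path}#mode=nczarr,zarr"
    # Copy the current environment and add the plugin search path to it.
    env = dict(os.environ, HDF5_PLUGIN_PATH=os.path.expanduser(plugin_dir))
    return ["ncdump", url], env


def run_ncdump(store_path, plugin_dir, timeout=60):
    """Run ncdump with the plugin path set; requires ncdump on PATH.

    Raises subprocess.TimeoutExpired if ncdump does not return in time.
    """
    cmd, env = ncdump_invocation(store_path, plugin_dir)
    return subprocess.run(cmd, env=env, capture_output=True, text=True,
                          timeout=timeout)


# Example call:
#     print(run_ncdump("simple.zarr", "~/netcdf-plugins").stdout)
```

Setting the variable only for the child process keeps the rest of the session untouched, which makes it easier to compare runs with and without the plugin directory.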
Also, if you look above, I link to the C library that numcodecs uses to do the blosc compression.
The key thing to understand is that NCZarr uses the existing HDF5 compression mechanism, which is, unfortunately, a bit complicated. We have attempted to simplify things. The important thing to note is that HDF5 and NCZarr do not invoke a compressor such as c-blosc directly. Rather, the compression code is wrapped in code that translates the HDF5/NCZarr API to the c-blosc compressor code. If NCZarr (and HDF5) cannot find those wrappers, then they cannot apply compression to the dataset. Apparently the conda netcdf-c packager does not (yet) install these wrappers, so you will need to make some extra effort to do that installation. The first step is to find those wrappers. If you can find the build directory where libnetcdf was built, then there should be a directory there called “plugins”. See if you can find it and send us a listing of the contents of that directory.
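To produce the requested listing, a short script can collect the shared-library files in a candidate plugins directory. This is a minimal sketch: the helper name list_plugin_libraries is hypothetical, and filtering on the .so, .dylib, and .dll suffixes is an assumption about how the wrappers are named, not something stated in this thread.

```python
from pathlib import Path

# Assumed shared-library suffixes on Linux, macOS, and Windows.
SHARED_LIB_SUFFIXES = {".so", ".dylib", ".dll"}


def list_plugin_libraries(plugin_dir):
    """Return sorted names of shared-library files directly inside plugin_dir."""
    return sorted(
        p.name
        for p in Path(plugin_dir).iterdir()
        if p.is_file() and SHARED_LIB_SUFFIXES & set(p.suffixes)
    )


# Example call, with a placeholder path:
#     print("\n".join(list_plugin_libraries("/path/to/build/plugins")))
```

An empty result from a directory that exists would suggest the wrappers were not built or not installed there, which is itself useful information to report back.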