netcdf-c: Help with slow open performance with netCDF4 Files
Environment Information
- What platform are you using? (please provide specific distribution/version in summary)
- Linux
- Windows
- OSX
- Other
- NA
- 32 and/or 64 bit?
- 32-bit
- 64-bit
- What build system are you using?
- autotools (configure)
- cmake
- Can you provide a sample netCDF file or C code to recreate the issue?
- Yes (please attach to this issue, thank you!)
- No
- Not at this time
Summary of Issue
I work on a climate model and I/O is slowly becoming a major bottleneck. By default we try to use nc4 files as our main format (a few Fortran binaries remain, but we are trying to rid ourselves of that). We recently did an experiment to see if our slow performance might be due to the fact that many files we write are written with an unlimited time dimension, so when we read them, they are unlimited. So, we fixed the time dimension, no change. We also compress, chunk, etc. the files but still no big changes. Finally, a colleague suggested converting to netCDF Classic, and, hooboy. Muuuuuuch faster.
Eventually I went whole hog. I took our NC4 file, decompressed it, converted it to nc3, nc5, nc6, and nc7 using nccopy -k ncX MERR.nc4 MERR.ncX, and hacked together a simple program to test it out, which you can see here:
https://github.com/mathomp4/netcdfopentester/blob/master/testCNetcdf.c
All it does is open and close a file 1000 times (might be overkill). If I run this on my test files, I get the following output:
(1929)(master) $ ./testCNetcdf.exe
nc3 unlimit dim open Time = 0.820 seconds.
nc4 unlimit dim open Time = 19.870 seconds.
nc5 unlimit dim open Time = 1.020 seconds.
nc6 unlimit dim open Time = 1.130 seconds.
nc7 unlimit dim open Time = 20.950 seconds.
nc4 unlimit dim open with hdf5 Time = 0.250 seconds.
I added what I think is the correct HDF5 code to open and close a file, but I mainly program netCDF in Fortran and Python, so even this program is a huge success for me! (Also: why it’s ugly.) It seems like a netCDF open and an HDF5 open are ~2 orders of magnitude in speed difference. Is there a way to tune netCDF to open like HDF5? Or do you pay for HDF5’s quick opening later?
Or, assuming I might have got the HDF5 wrong, why does netCDF Classic open so much faster as well?
Note: The versions of the libraries I’m using are netCDF-C 4.4.1.1 and HDF5 1.8.19. I compiled with Intel 17.0.4 compilers (backed up by GCC 5.3.0). The file is one from the MERRA2 data, which can be retrieved (some basic instructions are in the README of the boring repo mentioned above).
About this issue
- State: closed
- Created 7 years ago
- Comments: 91 (53 by maintainers)
No, the coordinates attribute is a hidden attribute (called “_Netcdf4Coordinates”). You can see it with h5dump, but not with ncdump, because netCDF hides it. It is not the same as the “Coordinates” attribute you mention.
Once #1262 is merged, there is no need to turn this speedup on; it will be available for all newly created netCDF-4 files. You don’t have to write the _Netcdf4Coordinates attribute; the library will do it automatically.
The presence of _Netcdf4Coordinates will also be detected automatically when reading a file, and it will be used to speed up file opens. If it is not present, files are opened in the usual (slower) way.