netcdf-c: NetCDF 4.9.0: segmentation fault after repeatedly opening a NetCDF 4 file, reading a vector and closing the file
The Julia user @sjdaines reported this segmentation fault (https://github.com/Alexander-Barth/NCDatasets.jl/issues/187), which occurs when repeatedly opening a NetCDF 4 file, reading a vector and closing the file. After doing this ~1 000 000 times, the program crashes with a segmentation fault. In the original use case, the error occurs much earlier.
- the version of the software with which you are encountering an issue
NetCDF 4.9.0 with HDF5 1.12.1 on Linux 5.15.0 with gcc 5.2.0 or gcc 12.1.0.
- a description of the issue with the steps needed to reproduce it
NetCDF 4.9.0 is compiled with:
export CPPFLAGS="-I/workspace/destdir/include"
export CFLAGS="-std=c99"
export LDFLAGS="-L/workspace/destdir/lib"
./configure --prefix=/workspace/destdir --build=x86_64-linux-musl --host=x86_64-linux-gnu --enable-shared --disable-static --disable-dap-remote-tests --disable-plugins
The segmentation fault can also be reproduced with the following C code:
#include <stdlib.h>
#include <stdio.h>
#include <netcdf.h>

#define FILE_NAME "coords.nc"
#define NX 90
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(2);}

int main() {
    int ncid, varid;
    float data_in[NX];
    int x, y, retval, niter;

    niter = 0;
    while (1) {
        if (niter % 1000 == 0) {
            printf("niter: %d\n",niter);
        }
        if ((retval = nc_open(FILE_NAME, NC_NOWRITE, &ncid)))
            ERR(retval);
        if ((retval = nc_inq_varid(ncid, "latitude", &varid)))
            ERR(retval);
        if ((retval = nc_get_var_float(ncid, varid, &data_in[0])))
            ERR(retval);
        if ((retval = nc_close(ncid)))
            ERR(retval);
        niter += 1;
    }
    return 0;
}
Compiled with:
gcc -g test_segfault6.c $(nc-config --cflags --libs)
After niter: 944000, the output is Segmentation fault (core dumped). Running the program under gdb, we see the following stack trace:
[...]
niter: 944000
niter: 945000
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a991fa in __GI___libc_free (mem=0xffffffff80000400) at malloc.c:3255
3255 malloc.c: No such file or directory.
(gdb)
(gdb) where
#0 0x00007ffff7a991fa in __GI___libc_free (mem=0xffffffff80000400) at malloc.c:3255
#1 0x00007ffff7cef600 in nc4_rec_grp_del () from /workspace/destdir/lib/libnetcdf.so.19
#2 0x00007ffff7cefa2b in nc4_nc4f_list_del () from /workspace/destdir/lib/libnetcdf.so.19
#3 0x00007ffff7c975ee in nc4_close_netcdf4_file () from /workspace/destdir/lib/libnetcdf.so.19
#4 0x00007ffff7c976d5 in nc4_close_hdf5_file () from /workspace/destdir/lib/libnetcdf.so.19
#5 0x00007ffff7c97cd6 in NC4_close () from /workspace/destdir/lib/libnetcdf.so.19
#6 0x00007ffff7c3581a in nc_close () from /workspace/destdir/lib/libnetcdf.so.19
#7 0x0000000000400a42 in main () at test_segfault6.c:28
On a different system with HDF5 1.10.0 this error could not be reproduced (tested up to 5 000 000 iterations).
The NetCDF file is available at: https://github.com/Alexander-Barth/NCDatasets.jl/files/9393436/coords.zip and contains the following data:
$ ncdump -h -s coords.nc
netcdf coords {
dimensions:
latitude = 90 ;
longitude = 144 ;
bnds = 2 ;
variables:
float latitude(latitude) ;
latitude:axis = "Y" ;
latitude:units = "degrees_north" ;
latitude:standard_name = "latitude" ;
latitude:_Storage = "contiguous" ;
latitude:_Endianness = "little" ;
float longitude(longitude) ;
longitude:axis = "X" ;
longitude:units = "degrees_east" ;
longitude:standard_name = "longitude" ;
longitude:_Storage = "contiguous" ;
longitude:_Endianness = "little" ;
// global attributes:
:source = "Data from Met Office Unified Model" ;
:um_version = "11.9" ;
:Conventions = "CF-1.7" ;
:_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.12.0," ;
:_SuperblockVersion = 0 ;
:_IsNetcdf4 = 1 ;
:_Format = "netCDF-4" ;
}
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (20 by maintainers)
I recall now. It turns out that, as you note, a number of functions like strdup are not officially part of C99. However, it is also the case that, at least for gcc, the C library actually contains a strdup implementation, and if you add extern char* strdup(const char*) to a header, it is actually found. If you look at ncconfig.h, you will see a number of declarations that try to deal with this.
OK, this is a very interesting find! So strdup is not part of C99, and indeed the compiler emitted a warning which I did not see.
What is surprising is that, according to config.h, strdup is available (but it is not), despite the test using the option -std=c99.
For reference, there is previous discussion about this: https://github.com/Unidata/netcdf-c/issues/1408
I am closing this issue because the option is not necessary anymore in NetCDF 4.9.0. After intensive testing by @sjdaines, all reported failure cases are fixed by dropping -std=c99. Thanks a lot to all for your valuable help!!!
(And I learned a lot too; above all that C is really hard 😃)
@Alexander-Barth This is great, actually; it will help a lot to be able to replicate the environment. I will take a look at this tomorrow; my day-to-day machine is ARM, so I will move over to an x86-64 machine to test this out. I will also try under emulation if it comes down to it. Thanks!