deeplearning4j: ImageRecordReader crashes JVM with loaded Keras model in 1.0.0-beta7
Issue Description
I encountered a strange problem in 1.0.0-beta7 while trying to run a Keras model loaded from a .h5 file (e.g., VGG16.h5 from here) - this model previously ran fine in 1.0.0-beta6.
Calling computationGraph.feedForward(features, false) crashes the JVM (error log attached) when using this code snippet:
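import java.io.File;
import java.util.Arrays;
import org.datavec.api.split.FileSplit;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.factory.Nd4j;
// Create VGG16 from a Keras .h5 file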
ComputationGraph tmpModel = KerasModelImport.importKerasModelAndWeights("VGG16.h5");
tmpModel.init();
ImageRecordReader reader = new ImageRecordReader(224, 224, 3);
reader.initialize(new FileSplit(new File("img_125_5.jpg"))); // Test with a single image
DataSetIterator it = new RecordReaderDataSetIterator(reader, 1);
// Keras model has wrong channel order, so flip it at the reader level
reader.setNchw_channels_first(false);
INDArray features = it.next().getFeatures();
// INDArray features = Nd4j.rand(1, 224, 224, 3); // Runs fine when initializing from random array of same size
System.out.println(Arrays.toString(features.shape())); // prints [1, 224, 224, 3]
tmpModel.feedForward(features, false);
The crash happens specifically within the ComputationGraph class at line 1976 - I found this by stepping through the code in IntelliJ.
Strangely though, the code snippet above runs fine if you use a random INDArray of the same shape (so the issue isn’t caused by the shape of the features). Looking into the values of the features returned by the DataSetIterator, there aren’t any NaNs or weird values (all are between 0 and 1).
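The value check described above can be sketched in plain Java. This is only an illustration of the kind of scan meant by "no NaNs or weird values", not the code actually used for this report: FeatureStats, summarize and isCleanUnitRange are hypothetical names, and the sample array is stand-in data. With ND4J, the flat values could presumably be copied out with something like features.dup().data().asFloat(), but that call chain is an untested assumption here.

```java
// Hypothetical helper, not part of DL4J/ND4J: scans a flat float buffer
// for NaN/Inf values and records the range of the finite values.
final class FeatureStats {
    final int nanCount;
    final int infCount;
    final int finiteCount;
    final float min;
    final float max;

    FeatureStats(int nanCount, int infCount, int finiteCount, float min, float max) {
        this.nanCount = nanCount;
        this.infCount = infCount;
        this.finiteCount = finiteCount;
        this.min = min;
        this.max = max;
    }

    /** Single pass over the buffer; min/max only consider finite values. */
    static FeatureStats summarize(float[] values) {
        int nan = 0;
        int inf = 0;
        int finite = 0;
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            if (Float.isNaN(v)) {
                nan++;
            } else if (Float.isInfinite(v)) {
                inf++;
            } else {
                finite++;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new FeatureStats(nan, inf, finite, min, max);
    }

    /** True only if there is at least one value and every value lies in [0, 1]. */
    boolean isCleanUnitRange() {
        return finiteCount > 0 && nanCount == 0 && infCount == 0 && min >= 0f && max <= 1f;
    }

    public static void main(String[] args) {
        // Stand-in data; in the report the values would come from the features array.
        float[] values = {0.0f, 0.25f, 1.0f};
        FeatureStats s = summarize(values);
        System.out.println("NaN=" + s.nanCount + " Inf=" + s.infCount
                + " min=" + s.min + " max=" + s.max
                + " clean=" + s.isCleanUnitRange());
    }
}
```

A clean result from a scan like this only rules out bad values; it says nothing about how the array is laid out in memory.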
Also interesting to note is that the .h5 model can be saved in beta6 to a zip using model.save(new File("VGG.zip")), then loaded in beta7, and the above snippet works fine (swapping the KerasModelImport... for ComputationGraph.load(new File("beta6KerasVGG.zip"), true);).
Another note, the above snippet works fine if using a different model (e.g., ResNet50.h5) - so it’s not all Keras models that this problem occurs with.
Conclusion
On one hand, it seems like the problem is caused by updates to the KerasModelImport process - a .h5 file which loaded and ran fine in 1.0.0-beta6 now no longer works in 1.0.0-beta7. Additionally, saving a .zip file of the beta6 version and loading a new ComputationGraph in beta7 circumvents the above problem.
However, it also seems like the ImageRecordReader or DataSetIterator could be the culprit - when those are taken out of the equation (by using a random INDArray) no errors occur.
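For readers unfamiliar with the channel-order flag used in the snippet, the following plain-Java sketch shows what the two layouts mean for a flat buffer: NCHW (channels first) versus NHWC (channels last, matching the printed shape [1, 224, 224, 3]). It does not diagnose the crash and does not reproduce what ImageRecordReader does internally; LayoutSketch and nchwToNhwc are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration, not DL4J code: reorders a flat NCHW buffer
// into a new, contiguous NHWC buffer.
final class LayoutSketch {

    static float[] nchwToNhwc(float[] src, int n, int c, int h, int w) {
        if (src.length != n * c * h * w) {
            throw new IllegalArgumentException("buffer length does not match n*c*h*w");
        }
        float[] dst = new float[src.length];
        for (int b = 0; b < n; b++) {
            for (int ch = 0; ch < c; ch++) {
                for (int y = 0; y < h; y++) {
                    for (int x = 0; x < w; x++) {
                        int from = ((b * c + ch) * h + y) * w + x; // NCHW offset
                        int to = ((b * h + y) * w + x) * c + ch;   // NHWC offset
                        dst[to] = src[from];
                    }
                }
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // One image, 3 channels, 1 x 2 pixels, stored channel by channel:
        // channel 0 = {1, 2}, channel 1 = {3, 4}, channel 2 = {5, 6}.
        float[] nchw = {1, 2, 3, 4, 5, 6};
        float[] nhwc = nchwToNhwc(nchw, 1, 3, 1, 2);
        // In NHWC each pixel's three channel values sit next to each other.
        System.out.println(Arrays.toString(nhwc)); // prints [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]
    }
}
```

Two arrays can report the same shape and hold sensible values while still differing in how they were produced or laid out, which may be one more thing to compare between the reader output and the random INDArray.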
Version Information
Please indicate relevant versions, including, if relevant:
- Deeplearning4j version - 1.0.0-beta7
- Platform information (OS, etc) - Ubuntu 18.04
- CUDA version, if used
- NVIDIA driver version, if in use
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (11 by maintainers)
I’ve made a simple Gradle project to demonstrate this and help you reproduce it.
Instructions
- Run the main() method in Main.java. The project initializes using beta6, so the main() method should complete successfully.
- In build.gradle, change the nd4j and dl4j versions from 1.0.0-beta6 to 1.0.0-beta7. Let your IDE import these changes.
- Run main() again. This should now cause the program to crash (a JVM crash on Ubuntu 18.04 (log file attached) and a nondescript Gradle error on Windows 10).
In Main.java, I’ve also written some different scenarios that I tried while debugging the issue; most notable is Scenario 3, which is the duplicating fix mentioned above.
Hopefully this can be reproduced on your machine; let me know if there’s any other info you’d like 😃
Attached Files
hs_err_pid17974.log