spark-deep-learning: Unexpected results when loading images with readImages
Hi,
I am getting unexpected results when loading images (*.jpg) with readImages() from a directory in HDFS: I get nulls instead of the binary image data.
Below are the results, stored as a *.csv file with image_df.repartition(1).write.format("csv").save("/path/to/output_csv"):
hdfs://path/to/test_data/Koala.jpg,"[[B@26e4b33,768,null,3,null]"
hdfs://path/to/test_data/Hydrangeas.jpg,"[[B@170b0284,768,null,3,null]"
hdfs://path/to/test_data/Lighthouse.jpg,"[[B@53d233ed,768,null,3,null]"
hdfs://path/to/MA/test_data/Desert.jpg,"[[B@dcacdc,768,null,3,null]"
hdfs://path/to/MA/test_data/Jellyfish.jpg,"[[B@17d89ff5,768,null,3,null]"
hdfs://path/to/test_data/Penguins.jpg,"[[B@eed9a0e,768,null,3,null]"
hdfs://path/to/test_data/Chrysanthemum.jpg,"[[B@111d14ca,768,null,3,null]"
hdfs://path/to/test_data/Tulips.jpg,"[[B@c3fe28d,768,null,3,null]"
ENV: CDH 5.7.0, Spark 1.6.0, Anaconda 4.2.0
Any help is appreciated.
About this issue
- State: open
- Created 7 years ago
- Comments: 15 (4 by maintainers)
I have not investigated further than that, but Spark is not behaving as you would like here. It writes a string representation of the Java object that holds the row, and by default Java prints arrays as a type name plus a hash code rather than their contents (the [[B@6ad79116 elements). It seems to be a Spark issue independent of images, which I will try to reproduce.
In the meantime, though, can you store your images in the Parquet format, for example? It is more compact, and high-quality readers exist for various languages.
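The Parquet suggestion could look roughly like the sketch below. It is a minimal illustration, not a confirmed fix for this issue: the helper names save_images_as_parquet and load_images_from_parquet are hypothetical, image_df is assumed to be the DataFrame returned by readImages(), and sql_context is assumed to be an existing SQLContext (or a SparkSession on newer Spark versions). Only the standard DataFrameWriter and DataFrameReader Parquet calls are used.

```python
def save_images_as_parquet(image_df, output_path):
    """Write an image DataFrame to Parquet (hypothetical helper).

    Parquet has a binary column type, so the raw image bytes are stored
    as-is instead of being converted to a string as in the CSV output.
    """
    # mode("overwrite") replaces any existing output at output_path.
    image_df.write.mode("overwrite").parquet(output_path)


def load_images_from_parquet(sql_context, input_path):
    """Read the Parquet data back into a DataFrame (hypothetical helper).

    sql_context is assumed to be a SQLContext or SparkSession; both
    expose a DataFrameReader through the `.read` attribute.
    """
    return sql_context.read.parquet(input_path)
```

With the DataFrame from the original post, this would be called as save_images_as_parquet(image_df, "/path/to/output_parquet"), where the output path is only a placeholder in the same style as the CSV path above. Reading the data back with load_images_from_parquet and inspecting a few rows would show whether the image bytes survive the round trip.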