spark-deep-learning: Unexpected results when loading images with readImages

Hi,

I am getting unexpected results when loading images (*.jpg) with readImages() from a directory in HDFS: I get nulls instead of the binary image data.

Below are the results, stored as a CSV file with image_df.repartition(1).write.format("csv").save("/path/to/output_csv"):

hdfs://path/to/test_data/Koala.jpg,"[[B@26e4b33,768,null,3,null]"
hdfs://path/to/test_data/Hydrangeas.jpg,"[[B@170b0284,768,null,3,null]"
hdfs://path/to/test_data/Lighthouse.jpg,"[[B@53d233ed,768,null,3,null]"
hdfs://path/to/MA/test_data/Desert.jpg,"[[B@dcacdc,768,null,3,null]"
hdfs://path/to/MA/test_data/Jellyfish.jpg,"[[B@17d89ff5,768,null,3,null]"
hdfs://path/to/test_data/Penguins.jpg,"[[B@eed9a0e,768,null,3,null]"
hdfs://path/to/test_data/Chrysanthemum.jpg,"[[B@111d14ca,768,null,3,null]"
hdfs://path/to/test_data/Tulips.jpg,"[[B@c3fe28d,768,null,3,null]"

ENV: CDH 5.7.0, Spark 1.6.0, Anaconda 4.2.0

Any help is appreciated.

About this issue

State: open
Created 7 years ago
Comments: 15 (4 by maintainers)

Most upvoted comments

I have not investigated this further, but Spark is not behaving as you would like here. When writing the CSV, it takes the string representation of the Java object that holds the row, and by default Java prints an array as its type tag plus a hash code rather than its contents (the [[B@6ad79116 elements). This looks like a Spark issue independent of images, which I will try to reproduce.
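
To illustrate the Java behavior described here, the following is a minimal standalone sketch, not Spark's actual CSV writer code. The class and method names are hypothetical and chosen only for this example, and it assumes Java 8 or later for java.util.Base64. It shows that calling toString() on a byte[] yields only a type tag and hash code, while Arrays.toString or a Base64 encoding preserves the contents.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; not part of Spark or spark-deep-learning.
class ByteArrayToStringDemo {

    // Default Object.toString(): "[B" (the JVM name for byte[]) + "@" + hex hash code.
    // The actual bytes are not included.
    static String defaultString(byte[] data) {
        return data.toString();
    }

    // Element-by-element rendering of the array contents.
    static String contentString(byte[] data) {
        return Arrays.toString(data);
    }

    // Text-safe encoding that can be decoded back into the original bytes.
    static String base64String(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] pixels = {1, 2, 3};
        System.out.println(defaultString(pixels)); // starts with "[B@", hash varies per run
        System.out.println(contentString(pixels)); // [1, 2, 3]
        System.out.println(base64String(pixels));  // AQID
    }
}
```

The first form is what ends up in a text format when binary data is stringified implicitly, which is why a binary-capable format is the safer choice for image bytes.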

In the meantime, though, can you store your images in the Parquet format instead? It is more compact, and high-quality readers exist for various languages.

thunterdb on Aug 18, 2017