imageio: volread on tifffile produces the wrong shape

This is a regression from v2.9.0, first shared by @mkcor in https://github.com/scikit-image/scikit-image/pull/5262.

Reproducing example:

import imageio as iio

img = iio.volread('http://cmci.embl.de/sampleimages/NPCsingleNucleus.tif')
img.shape
# in v2.10.3: (30, 180, 183)
# in v2.9.0:  (15, 2, 180, 183)

It traces back to behavior in v2.9.0 that, at least to me, is unexpected: calling imageio.imread on the above TIFF (in either version of ImageIO) returns a single page of the file, not a single image. That is:

import imageio as iio
img = iio.imread('http://cmci.embl.de/sampleimages/NPCsingleNucleus.tif')
img.shape  # (180, 183) not the expected (2, 180, 183)

Consequently, iterating over single images and stacking them doesn’t yield a stack of channel-first images, but a stack of pages:

import numpy as np

reader = iio.get_reader('http://cmci.embl.de/sampleimages/NPCsingleNucleus.tif', mode="i")
img = np.stack([reader.get_data(index=x) for x in range(reader.get_length())])
img.shape  # (30, 180, 183)

Reading a volume, on the other hand, does the expected thing in v2.9.0 and produces a stack of channel-first images:

img = iio.volread('http://cmci.embl.de/sampleimages/NPCsingleNucleus.tif')
img.shape  # (15, 2, 180, 183)

In v2.10.3, however, volread stacks pages instead:

img = iio.volread('http://cmci.embl.de/sampleimages/NPCsingleNucleus.tif')
img.shape  # (30, 180, 183)
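
For reference, the two shapes describe the same 30 pages, just grouped differently. A minimal sketch of the relationship, using a synthetic array instead of the real file and assuming the pages are stored with the channel index varying fastest (c0 of image 0, c1 of image 0, c0 of image 1, ...); that ordering and the channel count of 2 are assumptions here, not something read from the file:

```python
import numpy as np

# Synthetic stand-in for the 30 pages returned in v2.10.3 (no download needed).
pages = np.arange(30 * 180 * 183, dtype=np.uint32).reshape(30, 180, 183)

# Assumption: 2 channels per image, stored channel-fastest.
n_channels = 2

# Group consecutive pages into channel-first images.
images = pages.reshape(-1, n_channels, *pages.shape[1:])

print(images.shape)  # (15, 2, 180, 183)
```

Under that assumption, images[k, c] is simply page 2 * k + c of the flat stack, so no data is copied or reordered.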

@cgohlke @almarklein @mkcor (and others who use TIFF more than me) What is the expected behavior here? Would you expect imread to return a single image (shape: (2, 180, 183)) or a single page (shape: (180, 183))? I am leaning towards a single image, but I am open to comments here.

Depending on the answer, I would either look into a bugfix for get_data(index=...) (any version) or for volread(...) (v2.10.3).

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (14 by maintainers)

Most upvoted comments

@GenevieveBuckley The fix should roll out to PyPI tonight.

Nope, sorry 😦 It looks like that really is it for ImageJ files

Indeed. I also just found the relevant section of code in our vendored tifffile: It literally just parses the metadata string and sets the shape according to the channel and frame entries...

https://github.com/imageio/imageio/blob/master/imageio/plugins/_tifffile.py#L2172-L2252
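
To make that concrete, here is a rough sketch of the idea, not the vendored code itself. The description string below is a hypothetical example of "key=value" metadata, not copied from the sample file, and the helper names, the axis order (frames, slices, channels), and the dropping of length-1 axes are assumptions for illustration:

```python
def parse_description(description: str) -> dict:
    """Parse newline-separated 'key=value' entries into a dict of strings."""
    result = {}
    for line in description.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without a 'key=value' structure
        result[key.strip()] = value.strip()
    return result


def hyperstack_shape(meta: dict, page_shape: tuple) -> tuple:
    """Derive the full array shape from the metadata and one page's shape."""
    frames = int(meta.get("frames", 1))
    slices = int(meta.get("slices", 1))
    channels = int(meta.get("channels", 1))
    # Assumed axis order: frames, slices, channels; length-1 axes are dropped.
    leading = tuple(n for n in (frames, slices, channels) if n > 1)
    return leading + tuple(page_shape)


# Hypothetical metadata string for a 2-channel, 15-frame hyperstack.
description = "ImageJ=1.47a\nimages=30\nchannels=2\nframes=15\nhyperstack=true\n"

meta = parse_description(description)
print(hyperstack_shape(meta, (180, 183)))  # (15, 2, 180, 183)
```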

On the bright side though, it makes the fix efficient, because we don’t have to parse the full file to figure out the image’s shape. All that is needed is to read the first page, and from it we learn how many pages to read for an imread(index=N) call.
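
A sketch of the index arithmetic this implies, assuming one "image" spans a fixed number of consecutive pages (for the file above that would be 2, one page per channel). The function name and its parameters are made up for illustration and are not part of ImageIO:

```python
def pages_for_image(index: int, pages_per_image: int, total_pages: int) -> range:
    """Return the page indices that make up image number `index`."""
    if pages_per_image <= 0 or total_pages % pages_per_image != 0:
        raise ValueError("total_pages must be a positive multiple of pages_per_image")
    n_images = total_pages // pages_per_image
    if not 0 <= index < n_images:
        raise IndexError(f"image index {index} out of range (0..{n_images - 1})")
    start = index * pages_per_image
    return range(start, start + pages_per_image)


# 30 pages, 2 channels per image: image 3 lives on pages 6 and 7.
print(list(pages_for_image(3, pages_per_image=2, total_pages=30)))  # [6, 7]
```

Only the first page has to be inspected to obtain pages_per_image and total_pages; the remaining pages are touched only when they are actually requested.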

I especially liked “The ImageJ file format is undocumented.”

🤣 lol.

Makes you wonder how @cgohlke figured out how to correctly read ImageJ hyperstacks in the first place. Big props.

I just ran into this as well, thanks for fixing it

Hi @FirefoxMetzger,

@mkcor Did @kmilos’s comment help? We need to choose a value for PhotometricInterpretation among the list of existing ones when creating a TIFF. This value gives us a sense of what the image data inside a single page should look like.

Yes, absolutely! Thank you, @kmilos. This sentence I had to read a couple times:

It’s just that the baseline defined PhotometricInterpretation values assume a presence of minimal prescribed SamplesPerPixel, and baseline only readers could ignore any superfluous ones.

Visible hyphenation helps understanding as you read (rather than afterwards):

It’s just that the baseline-defined PhotometricInterpretation values assume a presence of minimal prescribed SamplesPerPixel, and baseline-only readers could ignore any superfluous ones.

🙏

How do you deal with RGB images, where you also need to ‘reconstruct the original image’ from three grayscale images? Conceptually and computationally, it shouldn’t be any different… 🤔

As per the above. We could set the page’s PhotometricInterpretation to 2, which implies RGB. Then, we can directly store the image on a single page instead of 3 pages.

Didn’t you mean “on 2 pages instead of 3 pages?” We want to access either channel when manipulating the image. Ok, I guess you meant (30, 180, 183) or (15, 2, 180, 183) as opposed to (30, 180, 183, 3) or (15, 2, 180, 183, 3).

Yes, this is what SubIFDs are for - see the previously linked TIFFPM6.pdf document describing “TIFF Trees” for all the details.

Huh, my bad. I somehow thought it was the actual TIFF spec rather than a tech note on SubIFDs. That’s a pretty cool feature actually that I should start using. The more you know 💯

On that note, @almarklein I think it is time to deprecate ImageIO’s vendored tifffile library. It doesn’t support SubIFDs, but the actual library has since added support for them. We already prefer a pip-installed tifffile over our vendored one, but I think we would make our lives easier if we simply installed the latest tifffile from PyPI instead, e.g., via something like pip install imageio[tifffile].

the completely arbitrary metadata scheme mentioned by @mkcor, which is even less “standard” and therefore impossible to support generally

I agree, there is definitely a better way to do this, assuming it actually does rely on metadata. It is the way ImageJ does it though, which makes it standard in medical image analysis … by virtue of popularity >.< … and important enough for us to support since medical imaging is one of the areas relying on ImageIO.


On a more general note:

I removed my installation of tifffile and switched to our vendored version (no SubIFD support). I then tested the reading again and iio.volread still produces the desired shape of (15, 2, 180, 183). From this, I conclude that the image, unfortunately, isn’t using SubIFDs to organize the data. More evidence towards it relying on the custom metadata string to format the data…

@kmilos Do you have any other ideas on where the file might store relevant information for stacking, or happen to know how ImageJ solves this? (please don’t say “well, they use the metadata string you already found” xD)