tensorflow: tf.image.decode_png doesn't work for palette-based images

System information

  • Have I written custom code
  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow installed from: source
  • TensorFlow version: v1.12.0-0-ga6d8ffae09 1.12.0
  • Python version: 3.6.7
  • CUDA/cuDNN version: V10.0.130
  • GPU model and memory: RTX2080 Ti

Describe the current behavior

Pixel values differ depending on whether the image is loaded with PIL or with TF.

Describe the expected behavior

Pixel values should be the same regardless of whether the image is loaded with PIL or with TF.

Code to reproduce the issue

from PIL import Image
import numpy as np

import tensorflow as tf
tf.enable_eager_execution()

PATH = '/tmp/42313738-65c10f7c-807e-11e8-8f11-9db821e3c3cc.png'

# PIL keeps this PNG in palette mode ('P'), so the array holds palette indices.
im = Image.open(PATH)
ar = np.asarray(im)
pil_max = np.max(ar)
print(pil_max)

# TensorFlow expands the palette to RGB and, with channels=1, reduces to one channel.
im = tf.gfile.FastGFile(PATH, 'rb').read()
ar = tf.image.decode_png(im, channels=1)
tf_max = tf.reduce_max(ar)
print(tf_max)

assert tf_max == pil_max  # fails: the two maxima differ

image: here

Other info / logs

I suspect that the problem is caused by TensorFlow loading the first RGB channel (i.e. the red channel) instead of the color indexes for palette-based PNG images like the given example.
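
A quick way to check that suspicion (a minimal sketch, assuming the same file as above and TF 1.x eager execution) is to compare TF's single-channel output against both the raw palette indices and the red channel of the expanded image:

from PIL import Image
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()

PATH = '/tmp/42313738-65c10f7c-807e-11e8-8f11-9db821e3c3cc.png'

raw = tf.gfile.FastGFile(PATH, 'rb').read()
tf_gray = tf.image.decode_png(raw, channels=1).numpy()[..., 0]  # what TF returns

pil_img = Image.open(PATH)
indices = np.asarray(pil_img)                        # raw palette indices
red = np.asarray(pil_img.convert('RGB'))[..., 0]     # red channel after palette expansion

print(np.array_equal(tf_gray, indices))  # False here: TF does not return the indices
print(np.array_equal(tf_gray, red))      # likely also False: channels=1 takes a weighted
                                         # average of R, G, B rather than the red channel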

related to #20028

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

ok, forget the segmentation part.

Could you just answer this question: How can I achieve the same behavior with tensorflow as I had with PIL?

Because what you suggested before is the opposite: you modified the PIL code to match the TensorFlow implementation.

You can do something similar using:

import tensorflow as tf
img = tf.image.decode_png(tf.io.read_file('/path/to/png/file'))

This is not a Build/Installation or Bug/Performance issue. Please post this kind of support question on Stack Overflow, where there is a big community to support you and learn from your questions. GitHub is mainly for addressing bugs in installation and performance. Thanks!

1. That is not the problem, and the code doesn't work as expected if I use channels=0:
   with channels=0 I again get an RGB image with 3 channels, which I don't want.
   I want the single-channel, index-based image.

2. The documentation doesn't mention anything about taking a weighted average of the channels.

3. I am sorry, the image I provided is not the best; please try this code:
from PIL import Image
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()

PATH = '/tmp/xX4Stvh.png'

im = tf.gfile.FastGFile(PATH, 'rb').read()
ar = tf.image.decode_png(im, channels=0)
tf_max = tf.reduce_max(ar)
print(tf_max)

im = Image.open(PATH)
ar = np.asarray(im)
pil_max = np.max(ar)
print(pil_max)

on this image: https://i.imgur.com/xX4Stvh.png

It has only three colors, so the difference in behaviour will be more apparent (PIL reports the maximum palette index, while TensorFlow reports the maximum of the expanded pixel intensities).

Sorry for the confusion. The main reason for this discrepancy is that PIL opens such images in palette mode: each color is mapped to an entry in a color palette, and the palette index is stored at each pixel location. TensorFlow, by contrast, decodes the image into channels and doesn't use the concept of a palette. So when the Pillow Image object is converted to a NumPy array, the values at the various positions are palette indices rather than the pixel intensities of a color channel, which is what TensorFlow produces.

The correct way to verify this is:

from PIL import Image
import numpy as np
import tensorflow as tf
tf.enable_eager_execution()

PATH = '/tmp/xX4Stvh.png'

im = tf.gfile.FastGFile(PATH, 'rb').read()
ar = tf.image.decode_png(im, channels=0)
tf_max = tf.reduce_max(ar)
print(tf_max)

im = Image.open(PATH).convert('RGB') # convert() maps the image from PIL's
                                     # palette mode ('P') to channel (RGB) mode
ar = np.asarray(im)
pil_max = np.max(ar)
print(pil_max)

And about the weighted average: when you load a color image in channel mode and request a grayscale result (channels=1 does that), a weighted average over the channels (R, G, B) is computed to produce the single-channel value. (That's why a grayscale image has 1 channel and an RGB color image has 3.) See the short sketch after the references. Reference:

  1. https://stackoverflow.com/a/52307690/9947584
  2. https://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
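
A minimal sketch of that weighted average, assuming the common ITU-R BT.601 luma coefficients (the exact weights TF/libpng use may differ slightly; this only illustrates why the single-channel result is neither a palette index nor simply the red channel):

# Hypothetical RGB pixel produced by expanding a palette entry.
r, g, b = 200, 100, 50

gray = 0.299 * r + 0.587 * g + 0.114 * b  # weighted average of the three channels
print(round(gray))                        # 124 -- not 200 (red) and not a small index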

A clean solution would be to re-implement a custom op to decode a PNG without palette conversion.

Currently, the conversion is done at core level:

  // convert palette to rgb(a) if needs be.
  if (context->color_type == PNG_COLOR_TYPE_PALETTE)
    png_set_palette_to_rgb(context->png_ptr);
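
Conceptually, png_set_palette_to_rgb replaces each stored index with the palette entry it points to. A rough NumPy sketch of that lookup (the palette and index values below are made up purely for illustration):

import numpy as np

# Hypothetical 3-entry palette: index -> RGB triple.
palette = np.array([[0, 0, 0],       # index 0: black
                    [255, 0, 0],     # index 1: red
                    [0, 255, 0]],    # index 2: green
                   dtype=np.uint8)

# Hypothetical 2x3 image stored as palette indices (what PIL's mode 'P' exposes).
indices = np.array([[0, 1, 2],
                    [2, 1, 0]], dtype=np.uint8)

# The palette-to-RGB expansion the decoder performs is a simple lookup.
rgb = palette[indices]              # shape (2, 3, 3)
print(indices.max(), rgb.max())     # 2 255 -- indices vs. intensities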

If you are on TF 1.x, you can wrap the PIL call with tf.py_func in order to get the desired behavior, like:

import numpy as np
import tensorflow as tf
from PIL import Image

def read_png(mask):
    def read_fn(p):
        return np.asarray(Image.open(p))  # PIL keeps the palette indices (class IDs)
    return tf.py_func(read_fn, [mask], tf.uint8)

and then build your pipeline, like:


ar = read_png(PATH)  # pass the file path (not the raw PNG bytes) to the wrapper
tf_max = tf.reduce_max(ar)

with tf.Session() as sess:
    print(sess.run(tf_max))

Note: in TF 1.x this works only in graph mode. In TF 2.x a similar trick should be possible with tf.numpy_function or tf.py_function.
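
For reference, a minimal TF 2.x sketch of the same idea using tf.py_function (the mask glob below is just a placeholder; tf.numpy_function can be used the same way):

import numpy as np
import tensorflow as tf
from PIL import Image

def read_indices(path):
    # Decode with PIL so the palette indices (the class IDs) are preserved.
    def _read(p):
        return np.asarray(Image.open(p.numpy().decode('utf-8')), dtype=np.uint8)
    mask = tf.py_function(_read, [path], tf.uint8)
    mask.set_shape([None, None])  # height/width only known at runtime
    return mask

ds = tf.data.Dataset.list_files('/path/to/masks/*.png').map(read_indices)
for mask in ds.take(1):
    print(tf.reduce_max(mask))    # largest class ID, not an RGB/grayscale intensity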

Thanks for the response, capitan-pool. So the whole point is that I actually only care about these palette indexes, not about the RGB values. Those palette index values are the target class IDs for semantic segmentation.