dlib: Facial landmark detection failing when RGBA is converted to RGB

Hi, I understand that facial landmark detection fails if I give it an RGBA image, since it expects either a greyscale or an RGB image. But when I try removing the alpha channel, it still fails.

import os
from shutil import copyfileobj

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

import dlib
from skimage.io import imread

def download(url, filename, overwrite=False):
    if not os.path.exists(filename) or overwrite:
        response = urlopen(url)
        with open(filename, 'wb') as out_file:
            copyfileobj(response, out_file)

download('https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/Beatrix_Podolska_pedagog_muzykolog_Krakow_2008.xcf/720px-Beatrix_Podolska_pedagog_muzykolog_Krakow_2008.xcf.png',
         'Beatrix.png')

im = imread('Beatrix.png')
detector = dlib.get_frontal_face_detector()

print('Shape of image data', im.shape)
im = im[:, :, :3]  # drop the alpha channel (note: slicing returns a view, not a copy)
print('Shape of image data after processing', im.shape)
print(detector.run(im))

Here is an example of a PNG which is downloaded, it initially has the shape (1024, 720, 4) and after removing the alpha channel has (1024, 720, 3). But it still gives:

Shape of image data (1024, 720, 4)
Shape of image data after processing (1024, 720, 3)
Traceback (most recent call last):
  File "beatrix.py", line 32, in <module>
    print(detector.run(im))
RuntimeError: Unsupported image type, must be 8bit gray or RGB image.

EDIT: Note that this works perfectly fine if I give it a PNG with RGB only, or even a JPEG.
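For anyone else hitting this: a workaround is to copy the sliced array into a fresh C-contiguous buffer before passing it to the detector. A minimal sketch using NumPy only (the zero-filled array here is a stand-in for the downloaded PNG):

```python
import numpy as np

# Stand-in for the decoded RGBA PNG from the report.
rgba = np.zeros((1024, 720, 4), dtype=np.uint8)

# Slicing off the alpha channel yields a *view* that is no longer
# C-contiguous, which is what the detector chokes on.
rgb_view = rgba[:, :, :3]
print(rgb_view.flags['C_CONTIGUOUS'])  # False

# Copying the view into a new buffer restores C-contiguity.
rgb = np.ascontiguousarray(rgb_view)
print(rgb.flags['C_CONTIGUOUS'])  # True
```

With the copied array, `detector.run(rgb)` no longer raises the "Unsupported image type" error.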

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

Got bored of preparing for the exam. LOL

It’s just copying the data into another buffer.

I doubt that. It seems that Dlib simply passes a numpy_rgb_image, which holds a pointer to the underlying buffer of a Py_Buffer object, to an object_detector, and I didn't see any copying until the FHOG features are computed. From what I understand, if an array is non-contiguous and shares a buffer with its 'base' array, Dlib can't handle it. The image_view type requires a C-contiguous buffer to correctly compute the offset of a pixel.
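To illustrate the offset problem: a C-contiguous reader computes a pixel's byte offset from the shape alone, while the view's real offset is determined by its strides. A small sketch with made-up dimensions (uint8, so strides are in single bytes):

```python
import numpy as np

rgba = np.zeros((4, 5, 4), dtype=np.uint8)
rgb = rgba[:, :, :3]  # non-contiguous view, strides unchanged from rgba

rows, cols, ch = rgb.shape
# Offset a C-contiguous reader would assume for pixel (1, 2), channel 0:
assumed = (1 * cols + 2) * ch              # 21
# Actual offset implied by the view's strides:
actual = 1 * rgb.strides[0] + 2 * rgb.strides[1]  # 28
print(assumed, actual)  # 21 28 -- the lookup lands on the wrong bytes
```

This is exactly why the landmark detector reads garbage (and rejects the array) for sliced views.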

The solution is obvious: copy the array to a contiguous buffer if necessary. We can use either PyBuffer_ToContiguous or PyBuffer_GetPointer.

Things become more complex when advanced indexing is used. But thanks to NumPy's optimizations, if a view has discontiguous columns (like array B below), it becomes Fortran-contiguous, and if a view has both discontiguous rows and columns (like array C below), a new contiguous buffer is allocated for it. However, I haven't verified that this holds in general.

>>> A = numpy.arange(16, dtype=numpy.uint8).reshape(4, 4)
>>> A
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=uint8)
>>> B = A[:, [0, 1, 3]]
>>> B
array([[ 0,  1,  3],
       [ 4,  5,  7],
       [ 8,  9, 11],
       [12, 13, 15]], dtype=uint8)
>>> B.flags  # B doesn't share its buffer with A (already verified)
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
>>> C = B[[0, 2, 3], :]
>>> C
array([[ 0,  1,  3],
       [ 8,  9, 11],
       [12, 13, 15]], dtype=uint8)
>>> C.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True

To conclude, the array passed in is either contiguous (C or Fortran) or has the same strides as its 'base' array. So in theory we can handle any array with the help of strides. The remaining problem is the way Dlib views an image buffer. For example, rgb_pixel assumes the R, G and B samples of a pixel are contiguous in memory, which obviously may not be true. So for multi-channel images with a discontiguous third dimension, we have no choice but to copy the buffer?
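To illustrate that last case, here is a hypothetical view whose third dimension is discontiguous (every other channel of a small made-up array); no row/column stride trick can make the samples of one pixel adjacent, so a copy is the only option:

```python
import numpy as np

rgba = np.zeros((4, 5, 4), dtype=np.uint8)

# A view with a discontiguous channel axis: every other channel.
# Its channel stride is 2 bytes, so the samples of one pixel are
# not adjacent in memory and rgb_pixel-style access cannot work.
v = rgba[:, :, ::2]
print(v.strides)      # (20, 4, 2) -- channel stride is 2, not 1

# Copying produces a buffer where the channel stride is 1 again.
fixed = np.ascontiguousarray(v)
print(fixed.strides)  # (10, 2, 1)
```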

Correct me if I'm wrong.
