pdfcpu: unable to extract images from a PDF where BitsPerComponent is not 8
Hi, I’ve encountered this error:
I’m trying to extract images from a PDF file where one of the images has a bpc
of 1. I’m willing to implement the solution for my use-case, but I’ll have to understand how the image is encoded if possible.
About this issue
- State: closed
- Created 3 years ago
- Comments: 18
Thanks for reporting this. I’ll check it out.
Thanks so much for your contribution! Very much appreciated 💚
I can confirm that it is! Thank you!
This is fixed with the latest commit!
I’ll try to come back to this issue soon.
I’m new to dealing with PDFs. All the PDFs I uploaded above were made using either MS Word or Adobe Acrobat then edited by hand. I just learned about the reference tables and all of this stuff. I’m going to come back to this issue with a proper PDF file and an initial implementation once I get some time. Thank you for your help.
There are a lot of issues with your hand-coded attempt for an XRefStm - it is highly invalid. Don’t even consider what Chrome does as a remote indication of a valid PDF file (https://twitter.com/angealbertini/status/939555534625177602?s=20):
/Index [7 7] --> “The first integer shall be the first object number in the subsection; the second integer shall be the number of entries in the subsection”. You have objects with object numbers that are both less than 7 and greater than 14. Thus your /Size entry is also incorrect
/Prev 7704 --> this file offset lands in the middle of the XMP data stream, so it is incorrect. Most likely you mean 10164 for this specific file (until you edit it, in which case that value will shift)
/W [1 2 0] --> “The sum of the items shall be the total length of each entry; it can be used with the Index array to determine the starting position of each subsection.” So each entry in your XRefStm should be 3 bytes yet your deFLATEd data stream is 28 bytes long which is (a) not a multiple of 3 and (b) doesn’t match with what /Index suggests. And when I hex dump the deFLATEd data it makes no sense according to Table 18 “Entries in a cross-reference stream”.
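The /W and /Index checks described above can be done mechanically before trying to interpret any entries. Here is a rough sketch in Go; `checkXRefStream` is a hypothetical helper written for illustration, not part of pdfcpu's API, and it assumes the stream data has already been decoded (inflated).

```go
package main

import (
	"errors"
	"fmt"
)

// checkXRefStream is a hypothetical helper (not a pdfcpu function).
// w holds the three /W field widths, index holds the /Index pairs
// (first object number, entry count), and dataLen is the length of
// the decoded stream data in bytes.
func checkXRefStream(w [3]int, index []int, dataLen int) error {
	// The sum of the /W items is the total length of each entry.
	entryLen := w[0] + w[1] + w[2]
	if entryLen <= 0 {
		return errors.New("/W must describe entries of at least one byte")
	}
	if len(index)%2 != 0 {
		return errors.New("/Index must contain pairs of integers")
	}
	// The decoded data must be a whole number of entries.
	if dataLen%entryLen != 0 {
		return fmt.Errorf("decoded length %d is not a multiple of entry length %d", dataLen, entryLen)
	}
	// The second integer of every /Index pair is an entry count;
	// together they must account for every entry in the stream.
	want := 0
	for i := 1; i < len(index); i += 2 {
		want += index[i]
	}
	if got := dataLen / entryLen; got != want {
		return fmt.Errorf("stream holds %d entries but /Index declares %d", got, want)
	}
	return nil
}

func main() {
	// Values from this thread: /W [1 2 0], /Index [7 7], 28 decoded bytes.
	fmt.Println(checkXRefStream([3]int{1, 2, 0}, []int{7, 7}, 28))
	// A consistent combination for comparison: 7 entries of 3 bytes each.
	fmt.Println(checkXRefStream([3]int{1, 2, 0}, []int{7, 7}, 21))
}
```

With the values from the file discussed here, the first call reports an error because 28 is not a multiple of 3; the second call returns nil.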
bytesPerPixel := (bpc*colors + 7) / 8
This is the general way to round up so you don’t miss the last byte. E.g. for bpc=1, colors=3, plain integer division (bpc*colors / 8) would give bytesPerPixel = 0, while the formula above gives 1.
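To make the difference concrete, here is a small Go sketch comparing truncating division with the rounded-up form. The function names are made up for illustration and are not taken from pdfcpu.

```go
package main

import "fmt"

// truncatedBytesPerPixel uses plain integer division, which drops any
// partial byte. Illustrative name, shown only to demonstrate the problem.
func truncatedBytesPerPixel(bpc, colors int) int {
	return bpc * colors / 8
}

// bytesPerPixel adds 7 before dividing so that any partial byte is
// rounded up to a whole one.
func bytesPerPixel(bpc, colors int) int {
	return (bpc*colors + 7) / 8
}

func main() {
	// Each case is {bits per component, number of color components}.
	cases := [][2]int{{1, 1}, {1, 3}, {4, 3}, {8, 3}, {16, 3}}
	for _, c := range cases {
		bpc, colors := c[0], c[1]
		fmt.Printf("bpc=%d colors=%d truncated=%d roundedUp=%d\n",
			bpc, colors,
			truncatedBytesPerPixel(bpc, colors),
			bytesPerPixel(bpc, colors))
	}
}
```

For bit depths that are multiples of 8 the two forms agree, so the rounding only matters for sub-byte cases such as bpc=1 or bpc=4.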