pdfminer.six: AttributeError: 'PDFStream' object has no attribute 'replace'
Hello everybody,
At the moment I’m parsing tons of PDFs, but pdfminer.six fails on one of them. Any suggestions? I can open the PDF, so perhaps pdfminer.six just can’t handle it properly. All the other PDFs originate from the same author/organization…
Traceback (most recent call last):
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 136, in <module>
    if __name__ == '__main__': sys.exit(main())
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 131, in main
    outfp = extract_text(**vars(A))
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 63, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/high_level.py", line 82, in extract_text_to_fp
    interpreter.process_page(page)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 852, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 862, in render_contents
    self.init_resources(resources)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 362, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 212, in get_font
    font = self.get_font(None, subspec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 203, in get_font
    font = PDFCIDFont(self, spec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdffont.py", line 658, in __init__
    self.cmap = CMapDB.get_cmap(name)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/cmapdb.py", line 259, in get_cmap
    data = klass._load_data(name)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/cmapdb.py", line 233, in _load_data
    name = name.replace("\0", "")
AttributeError: 'PDFStream' object has no attribute 'replace'
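Until a fix is available, a batch job over many PDFs can keep going by isolating each file, so one bad document is logged instead of aborting the whole run. The sketch below is a minimal, hypothetical helper: `extract_all`, `fake_extract` and `FakeStream` are illustrative names, not part of pdfminer.six. In practice `extract` could be `pdfminer.high_level.extract_text` in recent pdfminer.six releases; here a fake extractor reproduces the same kind of AttributeError so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, Tuple


def extract_all(
    paths: Iterable[str], extract: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `extract` on every path; collect texts and failures separately.

    Hypothetical helper: a failing PDF is recorded instead of stopping the batch.
    """
    texts: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            texts[path] = extract(path)
        except Exception as exc:  # malformed PDFs can raise many exception types
            failures[path] = f"{type(exc).__name__}: {exc}"
    return texts, failures


class FakeStream:
    """Hypothetical stand-in for pdfminer's PDFStream: it has no str methods."""


def fake_extract(path: str) -> str:
    """Hypothetical extractor that fails the way cmapdb.py does above."""
    if path == "bad.pdf":
        name = FakeStream()
        # FakeStream has no replace(), so this raises AttributeError
        return name.replace("\0", "")
    return f"text of {path}"


texts, failures = extract_all(["a.pdf", "bad.pdf", "b.pdf"], fake_extract)
print(sorted(texts))  # ['a.pdf', 'b.pdf']
print(failures)
```

Catching the broad `Exception` is a deliberate choice for bulk jobs: it keeps the run alive, and the `failures` dictionary gives a list of files to inspect or report upstream afterwards.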
About this issue
- State: closed
- Created 6 years ago
- Comments: 19 (6 by maintainers)
Commits related to this issue
- Fix failure on Actuate-generated PDFs See https://github.com/pdfminer/pdfminer.six/issues/210 — committed to herrdiener/pdfminer3 by herrdiener 5 years ago
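The traceback shows `CMapDB.get_cmap(name)` receiving a `PDFStream` where a CMap name string was expected, i.e. the CIDFont's `/Encoding` entry was an embedded CMap stream rather than a name such as `/Identity-H`. The commit message above does not describe its change, so the following is only a sketch of one way to guard against this, assuming the embedded stream carries a `/CMapName` entry (the PDF specification lists it as a required key of a CMap stream dictionary). `PSLiteral`, `Stream` and `resolve_cmap_name` here are simplified, hypothetical stand-ins, not pdfminer.six's real classes or API.

```python
from typing import Any, Dict, Union


class PSLiteral:
    """Hypothetical stand-in for a PDF name object such as /Identity-H."""

    def __init__(self, name: str) -> None:
        self.name = name


class Stream:
    """Hypothetical stand-in for an embedded stream: a dictionary plus raw data."""

    def __init__(self, attrs: Dict[str, Any], data: bytes = b"") -> None:
        self.attrs = attrs
        self.data = data


def resolve_cmap_name(
    encoding: Union[PSLiteral, Stream, str], default: str = "unknown"
) -> str:
    """Return a CMap name string for a CIDFont's /Encoding value.

    /Encoding may be a name (predefined CMap) or an embedded CMap stream;
    for a stream, the name is read from its /CMapName entry instead of
    treating the stream object itself as a string.
    """
    if isinstance(encoding, Stream):
        cmap_name = encoding.attrs.get("CMapName")
        if cmap_name is None:
            return default
        encoding = cmap_name
    if isinstance(encoding, PSLiteral):
        return encoding.name
    if isinstance(encoding, str):
        return encoding
    return default


print(resolve_cmap_name(PSLiteral("Identity-H")))  # Identity-H
embedded = Stream({"Type": PSLiteral("CMap"), "CMapName": PSLiteral("Custom-H")})
print(resolve_cmap_name(embedded))  # Custom-H
print(resolve_cmap_name(Stream({"Type": PSLiteral("CMap")})))  # unknown
```

Resolving the name only avoids the crash: an embedded CMap (here the hypothetical `Custom-H`) is generally not one of the predefined CMaps, so a lookup by that name may still miss, and a complete fix would parse the stream's own CMap data.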
#228 works fine.
@vinayak-mehta Someone (I think @goulu) merged some bugfix PRs and then added me to the org back in early 2017, but, like you, I just depend on pdfminer, so I’m not comfortable (not to mention I don’t have time) taking on responsibility for it. I don’t know anything about the PyPI package.