pdfminer.six: AttributeError: 'PDFStream' object has no attribute 'replace'

Hello everybody,

At the moment I’m parsing tons of PDFs, but pdfminer.six fails on one of them. Any suggestions? I can open the PDF, but maybe pdfminer.six can’t handle it properly. All the other PDFs originate from the same author/organization…

Traceback (most recent call last):
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 136, in <module>
    if __name__ == '__main__': sys.exit(main())
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 131, in main
    outfp = extract_text(**vars(A))
  File "/home/felix/anaconda3/bin/pdf2txt.py", line 63, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/high_level.py", line 82, in extract_text_to_fp
    interpreter.process_page(page)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 852, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 862, in render_contents
    self.init_resources(resources)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 362, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 212, in get_font
    font = self.get_font(None, subspec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 203, in get_font
    font = PDFCIDFont(self, spec)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/pdffont.py", line 658, in __init__
    self.cmap = CMapDB.get_cmap(name)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/cmapdb.py", line 259, in get_cmap
    data = klass._load_data(name)
  File "/home/felix/anaconda3/lib/python3.6/site-packages/pdfminer/cmapdb.py", line 233, in _load_data
    name = name.replace("\0", "")
AttributeError: 'PDFStream' object has no attribute 'replace'
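
Reading the last frame of the traceback: `_load_data` calls `name.replace("\0", "")`, so the CMap name that reached it was a `PDFStream` object rather than a string. The snippet below is only a sketch of the kind of defensive normalization that would avoid this error. It is not pdfminer.six code and not necessarily how the maintainers fixed it. `StreamLike` and `normalize_cmap_name` are hypothetical names made up for illustration, and the sketch assumes the stream's dictionary carries a `CMapName` entry (the key the PDF format uses for embedded CMap streams), represented here as a plain string for simplicity.

```python
class StreamLike:
    """Hypothetical stand-in for a PDF stream: dict-like attributes, no str methods."""

    def __init__(self, attrs):
        self.attrs = attrs

    def get(self, key, default=None):
        return self.attrs.get(key, default)


def normalize_cmap_name(name, fallback="unknown"):
    """Return a plain str CMap name, whatever type was passed in."""
    if isinstance(name, bytes):
        name = name.decode("latin-1")
    if isinstance(name, str):
        # Same cleanup the traceback shows, but only applied to real strings.
        return name.replace("\0", "")
    # Not a string: assume a dict-like object and look for a CMapName entry.
    getter = getattr(name, "get", None)
    if callable(getter):
        inner = getter("CMapName")
        if isinstance(inner, (str, bytes)):
            return normalize_cmap_name(inner, fallback)
    # Nothing usable found: fall back instead of raising AttributeError.
    return fallback


print(normalize_cmap_name("Identity-H\0"))
print(normalize_cmap_name(StreamLike({"CMapName": "Adobe-Identity-UCS"})))
print(normalize_cmap_name(StreamLike({})))
```

With a guard like this, a stream-valued encoding is either resolved to its declared name or mapped to a fallback, so one unusual font definition no longer aborts the whole extraction.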

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 19 (6 by maintainers)

Most upvoted comments

#228 works fine.

@vinayak-mehta Someone (I think @goulu) merged some bugfix PRs and then added me to the org back in early 2017, but like you I just depend on pdfminer, so I’m not comfortable taking on responsibility for it (not to mention I don’t have the time). I don’t know anything about the PyPI package.