WeasyPrint: Not defining the document encoding can be slow when chardet is installed
I have a 9-page PDF document that includes 5 images. The images are embedded inline as base64-encoded sources.
- When I exclude those images during rendering, the entire pipeline takes about 16 seconds.
- When I include those images, it takes 49 seconds.
That’s a fairly big performance hit. Are there any tools to optimize this? The PDF is attached; it is not really complex, and is more or less a test document for our printing.
We use WeasyPrint behind a REST API like this:
from flask import Flask, make_response, request
from weasyprint import CSS, HTML

app = Flask(__name__)

@app.route('/pdf', methods=['POST'])
def generate():
    name = request.args.get('filename', 'unnamed.pdf')
    app.logger.info('POST /pdf?filename=%s' % name)
    # request.data is the raw POST body (bytes), passed to WeasyPrint as-is
    html = HTML(string=request.data)
    document = html.render(stylesheets=[CSS('css/local.css')], presentational_hints=True)
    pdf = document.write_pdf(zoom=0.7936507936507937)
    response = make_response(pdf)
    response.headers['Content-Type'] = 'application/pdf'
    response.headers['Content-Disposition'] = 'inline;filename=%s' % name
    app.logger.info(' ==> POST /pdf?filename=%s ok' % name)
    return response
This whole service runs on a developer machine (Linux desktop) under Docker:
- Intel® Core™ i7-8565U CPU @ 1.80GHz
- with 32 GB RAM
- very fast M.2 SSD
I would have expected it to be quicker than that, but it seems those images have a huge impact.
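One way to see where the time goes before guessing at the images is to run the rendering step under Python's built-in profiler. The following is only a sketch: `profile_call` is a hypothetical helper name, not part of WeasyPrint or Flask, and it works on any callable.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report).

    The report is a text table of the 15 entries with the highest
    cumulative time, which is usually enough to spot one slow step.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue()
```

In the service above this could wrap the render call, for example `document, report = profile_call(html.render, presentational_hints=True)`, with the report written to the log. A single library dominating the cumulative column would point at the real cause.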
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (10 by maintainers)
Great that we could nail this one down. Since this Docker image is only meant for WeasyPrint, I’m surprised that it has more than the required dependencies. Alpine usually tries to keep packages tiny and slim, but I don’t know where it comes from.
Having this in the docs, both installation and usage, makes a lot of sense.
Thank you so much for your time!
Problem solved: chardet is slow for your document.
Chardet is an optional dependency of html5lib that tries to detect a document’s encoding. It’s not slow for me because it’s not installed, and that’s why I had to use -e utf8 to get the correct rendering. You don’t need the -e option, because chardet is installed on your system and (very slowly) detects the right encoding.
That’s a good question. I really don’t know why chardet is installed on your system, because it’s not a dependency of gunicorn, Flask or WeasyPrint. It’s probably installed as a dependency of your Alpine packages.
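For the REST service, the equivalent of the -e option is to state the encoding yourself so nothing has to be detected. The sketch below assumes the posted HTML is UTF-8; `chardet_installed`, `decode_html` and `render_pdf` are hypothetical helper names used only for illustration. Decoding the request body to text before handing it to `HTML(string=...)` should mean the parser receives a `str` and has no bytes to sniff. Passing `encoding='utf8'` to `HTML` together with the raw bytes is the other option.

```python
import importlib.util


def chardet_installed() -> bool:
    """Report whether the optional chardet package can be imported here."""
    return importlib.util.find_spec("chardet") is not None


def decode_html(data: bytes, encoding: str = "utf-8") -> str:
    """Decode the raw request body to text (assumption: the body is UTF-8)."""
    return data.decode(encoding)


def render_pdf(data: bytes) -> bytes:
    """Render posted HTML bytes to PDF without relying on encoding detection."""
    # Imported lazily so the helpers above also work where WeasyPrint is absent.
    from weasyprint import HTML

    return HTML(string=decode_html(data)).write_pdf()
```

Running `chardet_installed()` inside the container is a quick way to confirm whether the image pulls chardet in at all.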
By the way, the documentation will be rewritten. I can keep this ticket open to add a comment about this.