WeasyPrint: Not defining the document encoding can be slow when chardet is installed

I have a 9-page PDF document that includes 5 images. All 5 images are embedded inline as base64-encoded sources.

  • When I exclude those images during rendering, the entire pipeline takes about 16 seconds.
  • When I include those images, it takes 49 seconds.

That’s a fairly big performance hit. Are there any tools to optimize this? The PDF is attached; it is not really complex, it is more or less a test document for our print.

We are using WeasyPrint behind a REST API like this:


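from flask import Flask, make_response, request
from weasyprint import CSS, HTML

app = Flask(__name__)


@app.route('/pdf', methods=['POST'])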
def generate():
    name = request.args.get('filename', 'unnamed.pdf')
    app.logger.info('POST  /pdf?filename=%s' % name)

    html = HTML(string=request.data)
    document = html.render(stylesheets=[CSS('css/local.css')], presentational_hints=True)
    pdf = document.write_pdf(zoom=0.7936507936507937)

    response = make_response(pdf)
    response.headers['Content-Type'] = 'application/pdf'
    response.headers['Content-Disposition'] = 'inline;filename=%s' % name
    app.logger.info(' ==> POST  /pdf?filename=%s  ok' % name)
    return response

This whole service runs on a developer machine (Linux desktop) under Docker:

  • Intel® Core™ i7-8565U CPU @ 1.80GHz
  • 32 GB RAM
  • very fast M.2 SSD

I would have expected it to be quicker than that, but it seems those images have a huge impact.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

Great that we could nail this one down. Since this Docker image is only meant for WeasyPrint, I’m surprised that it has more than the required dependencies. Alpine usually tries to keep packages tiny and slim, but well, I don’t know where it comes from.

Having this in the docs makes a lot of sense, in both the installation and the usage docs.

Thank you so much for your time!

Problem solved: chardet is slow for your document.

Chardet is an optional dependency of html5lib that tries to detect a document encoding. It’s not slow for me, because it’s not installed, and that’s why I had to use -e utf8 to get the correct rendering. You don’t need the -e option, because chardet is installed on your system and (very slowly) detects the right encoding.
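To illustrate the point about encoding detection: one way to sidestep the guessing is to decode the request body yourself before handing it to WeasyPrint. The sketch below is only an illustration. `decode_html_body` is a hypothetical helper, not part of WeasyPrint or Flask, and it assumes the client sends UTF-8 unless told otherwise. My understanding is that an already-decoded `str` leaves nothing for the parser to sniff, but I have not measured that on the setup from the question.

```python
from typing import Optional


def decode_html_body(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode a request body to text with a known charset (UTF-8 by default).

    Hypothetical helper for illustration; not a WeasyPrint or Flask API.
    """
    # errors="strict" makes a wrong charset fail loudly instead of
    # silently producing garbled text in the rendered PDF.
    return raw.decode(charset or "utf-8", errors="strict")


# Stand-in for request.data in the Flask view from the question.
body = "<p>Grüße aus Köln</p>".encode("utf-8")
markup = decode_html_body(body)
print(markup)

# In the view, this would replace HTML(string=request.data):
#     html = HTML(string=decode_html_body(request.data))
# Alternatively, keep the bytes and pass WeasyPrint's `encoding` argument,
# which is the Python-side counterpart of the `-e utf8` command-line option:
#     html = HTML(string=request.data, encoding="utf8")
```

Either variant tells the parser the encoding up front, so whether chardet happens to be installed should no longer matter for this request.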

Well, actually the question may be whether we should document the importance of setting the encoding.

That’s a good question. I really don’t know why chardet is installed on your system, because it’s not a dependency of gunicorn, flask or WeasyPrint. It’s probably installed as a dependency of your Alpine packages.
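If you want to find out what pulled chardet in, a rough check from inside the container is to ask Python's own package metadata which installed distributions declare it as a requirement. This is only a sketch using the standard library's `importlib.metadata`. The helper names are made up for illustration, and it only sees Python-level metadata, so a package installed purely through an apk dependency may not show up here.

```python
import re
from importlib import metadata
from typing import List


def _normalize(name: str) -> str:
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string like 'chardet (>=3.0)'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return _normalize(match.group(1)) if match else ""


def dists_requiring(package: str) -> List[str]:
    """Return names of installed distributions that list `package` as a requirement."""
    wanted = _normalize(package)
    found = set()
    for dist in metadata.distributions():
        try:
            requirements = dist.requires or []
            name = dist.metadata.get("Name") or "<unknown>"
        except Exception:
            # Skip distributions whose metadata is missing or unreadable.
            continue
        if any(requirement_name(req) == wanted for req in requirements):
            found.add(name)
    return sorted(found)


print(dists_requiring("chardet"))
```

An empty list would suggest chardet was installed directly (for example by a system package) rather than as a declared dependency of another Python distribution.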

By the way, the documentation will be rewritten. I can keep this ticket open to add a comment about this.