xhtml2pdf: Problems with some Unicode characters

Hi, I’m using the latest xhtml2pdf (0.2b1) & reportlab (3.4.0) through django-easy-pdf (0.1.0) on Python 3.6.0 and it’s working great for the most part! One problem I am still experiencing, though, is that some Unicode characters are not rendering properly (šŠčČćĆđĐžŽ):

[Screenshot 2017-03-29 at 16:38:36]

I’m using the default django-easy-pdf base template and I found that I can somewhat repair things if I override it to declare the html encoding:

{% block extra_style %}
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
{% endblock %}

This results in some characters, like Š and Ž, being rendered correctly, but not all of them (Č, Ć, Đ are still blacked out).

[Screenshot 2017-03-29 at 16:38:19]

I tried experimenting with different font declarations (sans-serif, serif, external fonts), but I can’t seem to fix this. The characters are never rendered correctly. I don’t know if I’m missing some xhtml2pdf / Reportlab setting here. Do you maybe have an idea of a possible solution?

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 10
  • Comments: 62 (18 by maintainers)

Most upvoted comments

Any news on fixing the bug? I tried all of the solutions I found on the web and none of them work. I still get nasty boxes instead of non-Latin characters.

In my case, this helps:

<style>
@font-face {
    font-family: Roboto;
    src: url("C://Users//user//Desktop//Project//static//fonts//Roboto-Regular.ttf");
}
body {
    font-family: "Roboto", sans-serif;
}
</style>

I have run into this problem thousands of times. There are a couple of things you can try:

  1. Use encoding metadata in the template
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta charset="UTF-8">
    
  2. Try setting the encoding in the CreatePDF call (the metadata in the template should be enough, but it is worth trying anyway)
      pisa.CreatePDF(
         template, dest=response, link_callback=link_callback, encoding='UTF-8')
    
  3. Change your font (this solves the issue almost every time). Keep in mind that some fonts just don’t work even though they claim to support the special characters, so you may have to try many of them (I have sometimes tried around 20 fonts before I solved this issue, so don’t give up 😃). The font that never disappoints me is Roboto from Google: it looks good and it has worked for me every time 😄. Download the .ttf file of a font and then declare it like this (I have only tested .ttf, so other formats may or may not work):
        @font-face {
            font-family: Roboto;
            src: url(static/fonts/Roboto-Regular.ttf);
        }

        @font-face {
            font-family: Roboto;
            src: url(static/fonts/Roboto-Bold.ttf);
            font-weight: bold;
        }

Don’t forget to use this font:

 body {
     font-family: "Roboto", sans-serif;
  }

See https://xhtml2pdf.readthedocs.io/en/latest/reference.html#fonts for details. If this does not work for you, make sure the path to the .ttf file is correct (for example, add an image to the “fonts” folder and render it into the PDF to confirm that the static path is set up correctly), and check that steps 1 and 2 are applied. Otherwise … well, God help you.

  4. Try a different library.
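Steps 1 to 3 above can be combined into one small script. This is only a minimal sketch, not a confirmed fix: `build_html` and `render_pdf` are hypothetical helper names, the font path is whatever .ttf file you downloaded, and it assumes xhtml2pdf can open that path directly (in a Django project you would typically resolve it through `link_callback` instead).

```python
import html
import io


def build_html(body_text: str, font_path: str, family: str = "Roboto") -> str:
    """Return an HTML page that declares UTF-8 and a custom TrueType font."""
    # Step 1: encoding metadata. Step 3: @font-face plus a body rule using it.
    return f"""<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta charset="UTF-8">
    <style>
        @font-face {{
            font-family: {family};
            src: url("{font_path}");
        }}
        body {{
            font-family: "{family}", sans-serif;
        }}
    </style>
</head>
<body><p>{html.escape(body_text)}</p></body>
</html>"""


def render_pdf(page: str) -> bytes:
    """Render HTML to PDF bytes, passing the encoding explicitly (step 2)."""
    # Imported lazily so build_html can be used without xhtml2pdf installed.
    from xhtml2pdf import pisa

    buffer = io.BytesIO()
    status = pisa.CreatePDF(page, dest=buffer, encoding="UTF-8")
    if status.err:
        raise RuntimeError(f"xhtml2pdf reported {status.err} error(s)")
    return buffer.getvalue()
```

To try it, call `render_pdf(build_html("šŠčČćĆđĐžŽ", "static/fonts/Roboto-Regular.ttf"))` and write the returned bytes to a file. If the characters still show up as boxes, the font path is the first thing to check.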

As already mentioned above: please create a new issue with a clear and concise description and corresponding (minimal) code that reproduces your issue. While personal support does happen in some cases, FOSS projects generally live on public, community-driven support, and it helps when an issue is generic and easy for future readers to grasp.

@akashsagar8 Please open a new issue with a standalone example so that we don’t lose the overview. By the way: with the < > button above the GitHub editor you can format your code snippets nicely.

@pedroszg @WinThuLatt

As far as I can tell, this isn’t about rendering fonts from right to left. According to Wikipedia, Burmese (I guess this is the language we are talking about) is written from left to right.

Burmese doesn’t seem to be supported by any standard font like Arial, Helvetica etc., so you have to define your own font. See: https://xhtml2pdf.readthedocs.io/en/latest/reference.html#using-custom-fonts

I was able to render some Burmese text by using the font “Myanmar Text”, which was already on my Windows PC.

So use the following in the CSS part:

@font-face { font-family: MyBurmeseFont; src: url('C:\\Windows\\Fonts\\mmrtext.ttf') }
p { font-family: MyBurmeseFont }

In case you don’t have the mmrtext.ttf file: you can use any other font you find on the internet that supports Burmese.

Here is a test rendering that I did: [image]
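Before trying a font in a PDF, it can help to check whether the .ttf file maps the characters you need at all. The following is a rough sketch: `missing_characters` and `font_codepoints` are hypothetical helper names, and `font_codepoints` assumes that reportlab's `TTFont` exposes the parsed character map as `face.charToGlyph`. A character being present in the font does not guarantee correct output for scripts that need shaping, so treat this as a quick filter only.

```python
from typing import Iterable, List, Set


def missing_characters(text: str, supported: Iterable[int]) -> List[str]:
    """Return the distinct non-space characters of `text` the font lacks."""
    supported_set = set(supported)
    seen: Set[str] = set()
    missing: List[str] = []
    for ch in text:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in supported_set:
            missing.append(ch)
    return missing


def font_codepoints(ttf_path: str) -> Set[int]:
    """Collect the code points that a TrueType file maps to glyphs."""
    # Assumption: reportlab keeps the parsed cmap in face.charToGlyph.
    # Imported lazily so missing_characters works without reportlab.
    from reportlab.pdfbase.ttfonts import TTFont

    font = TTFont("CoverageProbe", ttf_path)
    return set(font.face.charToGlyph)
```

For example, `missing_characters("šŠčČćĆđĐžŽ", font_codepoints("mmrtext.ttf"))` would list any of those characters that the font file does not cover.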

@Hamza-abughazaleh For Arabic text you have to add a custom tag. Check this: https://xhtml2pdf.readthedocs.io/en/latest/reference.html#asian-fonts-support

That also doesn’t work 🤷‍♂️ [Screenshot 2020-09-25 at 12:16:05 PM]

You are using different text here. Could you paste the text itself?

{% load staticfiles %}
{% load i18n %}
{% get_current_language as LANGUAGE_CODE %}
<!DOCTYPE>
<html>
<head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
	<meta charset="UTF-8">
    <style type="text/css">
		{% if LANGUAGE_CODE == 'ar' %}
			@font-face {
        		font-family: DejaVuSans;
        		src: url('{% static "fonts/DejaVuSans.ttf" %}');
    		}
			body {
			    font-family: DejaVuSans;
				font-size: 14px;
				dir: rtl;
			}
		{% else %}
			body {
				font-size: 14px;
				dir: ltr;
			}
		{% endif %}
    </style>
</head>
<body>
<div>
	<pdf:language name="arabic"/>
    <div>
        <h3> 1# فاتورة للطلب</h3>
    </div>
     <hr/>
     <div>معلومات الطلب</div>
</div>
</body>
</html>

image

is this the result you want?

Yes. What must I change? @pedroszg

Me too. With Slavic characters, 3 of 5 work and the other 2 do not. Changing fonts causes even bigger problems for me.

I’m having the same problem guys…

Changing fonts may work:

<meta http-equiv=Content-Type content="text/html;charset=utf-8">
@font-face {
  font-family: Microsoft Yahei Mono;
  src: url("download the YaHei Mono.ttf");
}

* {
  font-family: Microsoft Yahei Mono
}

The Microsoft YaHei Mono font is from https://github.com/Microsoft/WSL/issues/2463#issuecomment-334692823

I ran into this issue, too. Sadly, Korean characters were just printed as black boxes, e.g. “안녕하세요” (it means “hello”). [screenshot]

I had the same issue after updating xhtml2pdf to v0.2b1, and I fixed the problem by adding <meta charset="UTF-8"> to my HTML templates.
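If you render many templates, the charset fix can also be applied in code before the HTML is handed to xhtml2pdf. This is just a sketch with a hypothetical helper name, `ensure_charset_meta`. It uses a simple regular expression rather than a real HTML parser, and it has not been verified that xhtml2pdf honors the tag when the document has no head element.

```python
import re

# Matches any <meta ...> tag that already declares a charset.
_CHARSET_RE = re.compile(r"<meta[^>]+charset\s*=", re.IGNORECASE)
# Matches the opening <head> tag (but not <header>).
_HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)

_META_TAG = '<meta charset="UTF-8">'


def ensure_charset_meta(page: str) -> str:
    """Insert a UTF-8 meta tag unless the page already declares a charset."""
    if _CHARSET_RE.search(page):
        return page
    head = _HEAD_RE.search(page)
    if head:
        # Place the tag directly after the opening <head> tag.
        return page[:head.end()] + _META_TAG + page[head.end():]
    # No <head> found: prepend the tag as a fallback.
    return _META_TAG + page
```

The result of `ensure_charset_meta(rendered_template)` can then be passed to `pisa.CreatePDF` in place of the original string.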