WeasyPrint: CSS Line Breaks/New Line doesn't work for non-latin chars

Hello.

The \A newline escape in CSS raises an AssertionError if the first character of the second line is non-Latin, in my case an Arabic letter. This is what I get:

2019-03-18 15:33:39,460  [ERROR]  Error occured: Traceback (most recent call last):
  File "D:\my\project\pdf.py", line 126, in generate
    pdf = html.write_pdf(stylesheets=stylesheets)
  File "C:\Python37\lib\site-packages\weasyprint\__init__.py", line 198, in write_pdf
    font_config=font_config).write_pdf(
  File "C:\Python37\lib\site-packages\weasyprint\__init__.py", line 159, in render
    font_config)
  File "C:\Python37\lib\site-packages\weasyprint\document.py", line 361, in _render
    [Page(p, enable_hinting) for p in page_boxes],
  File "C:\Python37\lib\site-packages\weasyprint\document.py", line 361, in <listcomp>
    [Page(p, enable_hinting) for p in page_boxes],
  File "C:\Python37\lib\site-packages\weasyprint\layout\__init__.py", line 184, in layout_document
    make_margin_boxes(context, page, state))
  File "C:\Python37\lib\site-packages\weasyprint\layout\pages.py", line 435, in make_margin_boxes
    yield margin_box_content_layout(context, page, box)
  File "C:\Python37\lib\site-packages\weasyprint\layout\pages.py", line 444, in margin_box_content_layout
    absolute_boxes=[], fixed_boxes=[])
  File "C:\Python37\lib\site-packages\weasyprint\layout\blocks.py", line 363, in block_container_layout
    for line, resume_at in lines_iterator:
  File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 56, in iter_line_boxes
    device_size, absolute_boxes, fixed_boxes, first_letter_style)
  File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 105, in get_next_linebox
    waiting_floats, line_children=[])
  File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 745, in split_inline_box
    line_placeholders, child_waiting_floats, line_children))
  File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 583, in split_inline_level
    context, box, max_x - position_x, skip)
  File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 1020, in split_text_box
    'Expected nothing or a preserved line break' % (between,))
AssertionError: Got '\n التاريخ: الأحد 3 مارس 2019' between two lines. Expected nothing or a preserved line break

I’m using:

WeasyPrint==45
cairocffi==1.0.2
CairoSVG==2.3.0

The part of CSS string causing the issue:

@top-right {{
    padding: 0.5cm;
    font-family: 'Dubai-Medium', sans-serif;
    content: " {doc_ref}: {ref} \A {doc_date}: {weekday} {day} {month} {year} \A {doc_addressee}: {addressee} ";
    display: block;
    white-space: pre;
    vertical-align: top;
}}
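
For readers trying to reproduce this: the doubled braces suggest the CSS above is a Python str.format() template, although the report does not say so. The sketch below rests on that assumption. The names MARGIN_BOX_TEMPLATE, build_margin_css and render_pdf, the enclosing @page rule and the shortened content string are all hypothetical, not the reporter's code. A raw string keeps \A as a literal backslash escape for the CSS parser, and WeasyPrint is imported only inside render_pdf so the string handling can be checked without it.

```python
# Hypothetical reconstruction, NOT the reporter's code: it assumes the CSS
# is a str.format() template, which is why its braces are doubled.
MARGIN_BOX_TEMPLATE = r"""
@page {{
    @top-right {{
        content: " {doc_ref}: {ref} \A {doc_date}: {date} ";
        white-space: pre;
    }}
}}
"""


def build_margin_css(doc_ref, ref, doc_date, date):
    """Fill the template; '{{' and '}}' come out as literal CSS braces."""
    return MARGIN_BOX_TEMPLATE.format(
        doc_ref=doc_ref, ref=ref, doc_date=doc_date, date=date)


def render_pdf(css_text, body_html="<p>test</p>"):
    """Render a PDF as bytes; WeasyPrint is imported lazily on purpose."""
    from weasyprint import CSS, HTML
    return HTML(string=body_html).write_pdf(stylesheets=[CSS(string=css_text)])


# The second line of the margin box starts with an Arabic letter,
# which is the condition described in the report.
css = build_margin_css("Ref", "42", "التاريخ", "الأحد 3 مارس 2019")
```

Calling render_pdf(css) on an affected WeasyPrint version would be one way to check whether the assertion is triggered. That depends on the installed version and fonts, so no result is claimed here.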

About this issue

  • Original URL: https://github.com/Kozea/WeasyPrint/issues/828
  • State: closed
  • Created 5 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

It’s now fixed, but the fix is probably bad for performance. I’ll release version 47 and try to improve performance later.

Version 46 has been released!

I want to thank you all for fixing it!

Hello

We have the same problem. With an older version of WeasyPrint our RTL documents were generated (not perfectly, but we got PDFs). Now we get an error:

    line_children))
  File "/usr/local/lib/python3.7/dist-packages/weasyprint/layout/inlines.py", line 584, in split_inline_level
    context, box, max_x - position_x, skip)
  File "/usr/local/lib/python3.7/dist-packages/weasyprint/layout/inlines.py", line 1024, in split_text_box
    'Expected nothing or a preserved line break' % (between,))
AssertionError: Got ' فرنسي- أردني بإدار' between two lines. Expected nothing or a preserved line break

The message above appears on stderr. CMD: /usr/local/bin/weasyprint --encoding utf8 --format pdf

As for AjawadMahmoud, the problem was bypassed by commenting out lines 1022 to 1024 in inlines.py, but that is not a long-term solution, and we have not evaluated the side effects yet.

We have a lot of Arabic documents, and it is a real problem if we can’t get PDF versions.

Do you plan to fix this bug soon? We ask only so that we can plan what to do in the meantime. Thank you for your answer. Have a good day.

I understand this. I would be glad if I could contribute something to this great project that has been helping me for years. I’ll try to work something out.

On Fri, Mar 22, 2019 at 6:38 PM Guillaume Ayoub wrote:

Could you give it a shot with some basic lipsum Arabic text like:

I can reproduce your bug, thank you.

We were very lucky to have your original bug fixed, but as I said earlier: right-to-left scripts are poorly supported. The code is full of assumptions that only work with Latin or Cyrillic scripts, and even some left-to-right languages are likely to fail.
