pyfakefs: Can't write xml files via "lxml" package when using pyfakefs

I’m really sorry for such a broad and unspecific bug report. So far I have only been able to narrow it down to this: the problem occurs only when I use pyfakefs; if I run (nearly) the same test against the real filesystem, everything is fine.

Do you have any idea what could cause this side effect or how I could go on with my investigation?

Maybe pyfakefs doesn’t write a real file to the fake filesystem? Can I check that somehow?

The last lines of the raised error

The raised error seems to have nothing to do with pyfakefs or my own package.

  File "/usr/lib/python3/dist-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
    ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 40, in __init__
    self._get_size()
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 46, in _get_size
    dimensions = parser.parse_dimensions()
  File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_reader.py", line 164, in parse_dimensions
    for _event, element in it:
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
    root = pullparser._close_and_return_root()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
    root = self._parser.close()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
    self._raiseerror(v)
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

----------------------------------------------------------------------
Ran 1 test in 0.691s

FAILED (errors=1)
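The ParseError at the end of that traceback is the message the standard-library parser gives for empty input, which suggests (but does not prove) that openpyxl was handed a worksheet XML member with no bytes in it. Here is a minimal sketch that reproduces just that message with the standard library alone. pyfakefs, pandas and openpyxl are not involved, and the helper name parse_error_for is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_error_for(data: bytes) -> str:
    """Return the ParseError message raised for `data`, or '' if it parses."""
    try:
        # iterparse is the same entry point that shows up in the traceback.
        for _event, _element in ET.iterparse(io.BytesIO(data)):
            pass
    except ET.ParseError as err:
        return str(err)
    return ""


print(parse_error_for(b""))              # empty input: a "no element found" error
print(parse_error_for(b"<worksheet/>"))  # well-formed input: empty string
```

If that reading is right, the question becomes why the sheet XML is empty when it is read back, not why the XML is malformed.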

Description

The unit test checks whether an Excel file can be read. It performs these steps:

1. Create a pandas.DataFrame.
2. Store it as an Excel file (via pandas.DataFrame.to_excel()).
3. Read the Excel file back in (via pandas.read_excel()).
4. Compare the initial and the returned data frame.

Of course, in the real tests a lot more happens between steps 2 and 3: I have a wrapper around pandas.DataFrame.to_excel().

The unittests

import pathlib
import unittest
import pandas
import pyfakefs.fake_filesystem_unittest as pyfakefs_ut

class Works(unittest.TestCase):

    def test_simple(self):
        """Simple excel."""
        excel_path = pathlib.Path('foobar.xlsx')
        if excel_path.exists():
            excel_path.unlink()

        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)

        self.assertTrue(excel_path.exists())

        df = pandas.read_excel(excel_path)

        self.assertEqual(df.shape, (3, 3))


class Problem(pyfakefs_ut.TestCase):

    def setUp(self):
        self.setUpPyfakefs(allow_root_user=False)

    def test_simple(self):
        excel_path = pathlib.Path('foobar.xlsx')
        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)

        self.assertTrue(excel_path.exists())

        # HERE comes the ERROR
        df = pandas.read_excel(excel_path)

        self.assertEqual(df.shape, (3, 3))
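One way to check whether the file written in the fake filesystem is complete: an .xlsx file is a zip archive, so its members and their sizes can be listed with the standard zipfile module. The sketch below is only a diagnostic idea, not something taken from pyfakefs. The helper name empty_members is hypothetical, and it assumes that zipfile goes through the Python file API that pyfakefs patches.

```python
import zipfile


def empty_members(xlsx_path):
    """List the names of zero-byte members inside a .xlsx (zip) archive."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            # Directory entries are legitimately empty, so skip them.
            if not info.is_dir() and info.file_size == 0
        ]
```

Calling self.assertEqual(empty_members(excel_path), []) right after df_init.to_excel(excel_path) in both test classes would show whether the sheet XML is already empty on disk, or only goes wrong when it is read back.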

Environment

Initially this problem occurred with older versions of Pandas (1.3.5), Numpy and openpyxl. Just for this bug report I updated everything possible (except my operating system and the Python interpreter) to the current stable release. The error is still there.

Debian 11 (arm)
Python 3.9.2 (via debian repo)
Pandas 1.4.4 (via pip)
Numpy 1.19.5 (via pip)
openpyxl 3.0.10 (via pip)

Full error output

python3 -m unittest tests.test_bandas.Problem
E
======================================================================
ERROR: test_simple (tests.test_bandas.Problem)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1727, in close
    self.parser.Parse(b"", True) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/ownCloud/my.work/buhtzology/tests/test_bandas.py", line 1139, in test_simple
    df = pandas.read_excel(excel_path)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
    self.book = self.load_workbook(self.handles.handle)
  File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
    return load_workbook(
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 282, in read
    self.read_worksheets()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
    ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 41, in __init__
    self._get_size()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 47, in _get_size
    dimensions = parser.parse_dimensions()
  File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 166, in parse_dimensions
    for _event, element in it:
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
    root = pullparser._close_and_return_root()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
    root = self._parser.close()
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
    self._raiseerror(v)
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

----------------------------------------------------------------------
Ran 1 test in 0.699s

FAILED (errors=1)

Misc

Referenced by https://codeberg.org/buhtz/buhtzology/issues/28

About this issue

State: closed
Created 2 years ago
Comments: 27 (17 by maintainers)

Most upvoted comments

When there is so much patching I wonder if my tests are still “real”.

Well, if you are going to patch the file system access, this is somewhat unavoidable. As I wrote, patching the fs access of some common libraries is a convenience, as most users won’t have the time or patience to do it themselves, but this is currently only done for pandas and the related xlrd, and for django.

In the case of pandas, xlrd (and probably openpyxl), the patch just ensures that the library uses Python for filesystem access, which it would do anyway in some environments. This way, you won’t catch potential bugs in the implementation of pandas or related libraries that would only occur when using the C libraries, but that shouldn’t be in the scope of your tests. If you need completely realistic tests, you have to check in the real fs; a RAM disk could help to do that faster.
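For the real-fs route, a throwaway directory keeps such a test from leaving files behind. This is a minimal sketch using only the standard library. The class name is made up, and the byte payload is a stand-in for whatever to_excel() would write, so it shows the test layout and not the pandas round trip itself.

```python
import pathlib
import tempfile
import unittest


class RealFilesystemRoundTrip(unittest.TestCase):
    """Round trip on the real filesystem, inside a throwaway directory."""

    def setUp(self):
        # Created on the real filesystem and removed again after each test.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.workdir = pathlib.Path(tmp.name)

    def test_roundtrip(self):
        target = self.workdir / "foobar.bin"
        payload = b"stand-in for the bytes written by to_excel()"
        target.write_bytes(payload)
        self.assertEqual(target.read_bytes(), payload)
```

tempfile.TemporaryDirectory() also accepts a dir= argument, so on a system that has a RAM-backed mount (often /dev/shm on Linux, but that is an assumption about the machine) the same test could be pointed there.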

Technically your fake-fs is in RAM, right?

Yes, it is.

mrbean-bremen on Sep 7, 2022