pyfakefs: Can't write xml files via "lxml" package when using pyfakefs
I’m really sorry for such a broad and unspecific bug report. I could narrow it down to this: the problem occurs only when I use pyfakefs. If I run (nearly) the same test against a real filesystem, everything is fine.
Do you have any idea what could cause this side effect or how I could go on with my investigation?
Maybe pyfakefs doesn’t write a real file to the fake filesystem? Can I check that somehow?
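One way to check this from inside the test is to look at the file itself. An .xlsx file is a zip archive, so its total size and the size of each member can be read with the standard library. The sketch below is only an illustration: the helper name inspect_xlsx is hypothetical, and it assumes that the os, pathlib and open calls used by zipfile are among those pyfakefs redirects to the fake filesystem.

```python
import pathlib
import zipfile


def inspect_xlsx(path):
    """Return (size_in_bytes, {member_name: uncompressed_size}) for an .xlsx file."""
    path = pathlib.Path(path)
    size = path.stat().st_size
    members = {}
    # An .xlsx workbook is a zip archive; anything else yields an empty dict.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                members[info.filename] = info.file_size
    return size, members
```

Calling inspect_xlsx(excel_path) right after to_excel() in the failing test shows whether the archive exists and whether any member, such as a worksheet XML part, has size 0. An empty member would be consistent with the "no element found" error, though it would not by itself show what caused it.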
The last lines of the raised error
The raised error seems to have nothing to do with pyfakefs or my own package.
File "/usr/lib/python3/dist-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 40, in __init__
self._get_size()
File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_read_only.py", line 46, in _get_size
dimensions = parser.parse_dimensions()
File "/usr/lib/python3/dist-packages/openpyxl/worksheet/_reader.py", line 164, in parse_dimensions
for _event, element in it:
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
root = pullparser._close_and_return_root()
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
root = self._parser.close()
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
self._raiseerror(v)
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
----------------------------------------------------------------------
Ran 1 test in 0.691s
FAILED (errors=1)
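For context, this message is what the standard library's XML parser reports when it receives no data at all. That suggests, but does not prove, that the worksheet XML inside the workbook was read as empty. A minimal sketch that triggers the same kind of error without pandas, openpyxl or pyfakefs (the function name parse_error_for is hypothetical):

```python
import io
import xml.etree.ElementTree as ET


def parse_error_for(data):
    """Return the ParseError message for the given bytes, or None if parsing succeeds."""
    try:
        # iterparse is lazy, so the stream is only consumed while iterating.
        for _event, _element in ET.iterparse(io.BytesIO(data)):
            pass
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error_for(b""))      # empty input, like an empty worksheet part
print(parse_error_for(b"<a/>"))  # well-formed input parses without error
```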
Description
The unittest checks whether an Excel file can be read. The test performs these steps:
1. Create a pandas.DataFrame.
2. Store it as an Excel file (via pandas.DataFrame.to_excel()).
3. Read the Excel file back (via pandas.read_excel()).
4. Compare the initial and the returned data frame.
Of course, in the real tests a lot more happens between steps 2 and 3. I have a wrapper around pandas.DataFrame.to_excel().
The unittests
import pathlib
import unittest

import pandas
import pyfakefs.fake_filesystem_unittest as pyfakefs_ut


class Works(unittest.TestCase):
    """Round trip on the real filesystem."""

    def test_simple(self):
        """Simple excel."""
        excel_path = pathlib.Path('foobar.xlsx')
        if excel_path.exists():
            excel_path.unlink()
        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)
        self.assertTrue(excel_path.exists())
        df = pandas.read_excel(excel_path)
        self.assertEqual(df.shape, (3, 3))


class Problem(pyfakefs_ut.TestCase):
    """Same round trip on the pyfakefs fake filesystem."""

    def setUp(self):
        self.setUpPyfakefs(allow_root_user=False)

    def test_simple(self):
        excel_path = pathlib.Path('foobar.xlsx')
        df_init = pandas.DataFrame({'FOO': range(3), 'BAR': list('ABC')})
        df_init.to_excel(excel_path)
        self.assertTrue(excel_path.exists())
        # HERE comes the ERROR
        df = pandas.read_excel(excel_path)
        self.assertEqual(df.shape, (3, 3))
Environment
Initially this problem occurred with older versions of Pandas (1.3.5), Numpy and openpyxl. Just for this bug report I updated everything possible (except my operating system and the Python interpreter) to the currently available stable releases, but the error is still there.
- Debian 11 (arm)
- Python 3.9.2 (via debian repo)
- Pandas 1.4.4 (via pip)
- Numpy 1.19.5 (via pip)
- openpyxl 3.0.10 (via pip)
Full error output
python3 -m unittest tests.test_bandas.Problem
E
======================================================================
ERROR: test_simple (tests.test_bandas.Problem)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1727, in close
self.parser.Parse(b"", True) # end of data
xml.parsers.expat.ExpatError: no element found: line 1, column 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/ownCloud/my.work/buhtzology/tests/test_bandas.py", line 1139, in test_simple
df = pandas.read_excel(excel_path)
File "/home/user/.local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "/home/user/.local/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
reader.read()
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 282, in read
self.read_worksheets()
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 219, in read_worksheets
ws = ReadOnlyWorksheet(self.wb, sheet.name, rel.target, self.shared_strings)
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 41, in __init__
self._get_size()
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_read_only.py", line 47, in _get_size
dimensions = parser.parse_dimensions()
File "/home/user/.local/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py", line 166, in parse_dimensions
for _event, element in it:
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1260, in iterator
root = pullparser._close_and_return_root()
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1307, in _close_and_return_root
root = self._parser.close()
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1729, in close
self._raiseerror(v)
File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1629, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
----------------------------------------------------------------------
Ran 1 test in 0.699s
FAILED (errors=1)
Misc
Referenced by https://codeberg.org/buhtz/buhtzology/issues/28
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 27 (17 by maintainers)
Well, if you are going to patch the file system access, this is somewhat unavoidable. As I wrote, patching the fs access of some common libraries is a convenience, as most users won’t have the time or patience to do it themselves, but this is currently only done for pandas and the related xlrd, and django.
In the case of pandas, xlrd (and probably openpyxl) the patch just ensures that the library uses Python for filesystem access, which it would do anyway in some environments. This way, you won’t catch some potential bugs in the implementation of pandas or related libraries that would only occur when using the C libraries, but that shouldn’t be in the scope of the tests. If you need completely realistic tests, you have to check in the real fs; a RAM disk could help to do that faster.
Yes, it is.
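The suggestion above to run fully realistic tests on the real filesystem, optionally on a RAM disk, could look roughly like the sketch below. It assumes a Linux-style tmpfs mounted at /dev/shm and falls back to the default temporary directory otherwise. The helper name real_fs_workdir and the test class are hypothetical, and the plain-text round trip only stands in for the to_excel()/read_excel() pair.

```python
import contextlib
import os
import pathlib
import tempfile
import unittest

# Assumption: on many Linux systems /dev/shm is a RAM-backed tmpfs.
RAM_DISK = pathlib.Path("/dev/shm")


@contextlib.contextmanager
def real_fs_workdir():
    """Yield a fresh directory on the real filesystem, on the RAM disk if usable."""
    use_ram_disk = RAM_DISK.is_dir() and os.access(RAM_DISK, os.W_OK)
    base = str(RAM_DISK) if use_ram_disk else None
    # The directory and its contents are removed when the block exits.
    with tempfile.TemporaryDirectory(dir=base) as tmp:
        yield pathlib.Path(tmp)


class RealFsRoundTrip(unittest.TestCase):
    def test_round_trip(self):
        with real_fs_workdir() as workdir:
            target = workdir / "foobar.txt"
            target.write_text("FOO,BAR\n0,A\n")
            self.assertEqual(target.read_text(), "FOO,BAR\n0,A\n")
```

Because every test gets its own temporary directory, the cleanup step from the Works test (unlinking a leftover foobar.xlsx) is no longer needed.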