pandas: BUG: "with pd.ExcelWriter" produces a corrupt Excel file in case of .xlsm extension
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
excel_path = r"D:\my_test.xlsm"
with pd.ExcelWriter(excel_path, mode='a') as writer:
# df.to_excel(writer, sheet_name='Sheet1')
pass
df_excel = pd.read_excel(excel_path, 'Sheet1')
Issue Description
The specified Excel file gets corrupted, despite the append mode. Changing its extension to .xlsx can retrieve the raw data from it, but the macro is forever gone.
In my example I used with and had only a pass statement inside it, yet the file became useless.
Q1: Can we know for sure that the default engine is used for .xlsm files? Q2: Should I contact OpenPyxl instead of you with this issue?
Expected Behavior
As I’ve read since my first encounter of this bug, ExcelWriter doesn’t support macros, but a warning of some kind would be nice before wiping out a carefully written VBA code from the Excel file.
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5 python : 3.7.4.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19041 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : None.None pandas : 1.3.4 numpy : 1.17.3 pytz : 2021.3 dateutil : 2.8.2 pip : 19.3.1 setuptools : 40.8.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : None fsspec : None fastparquet : None gcsfs : None matplotlib : None numexpr : None odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None xlwt : None numba : None
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 32 (13 by maintainers)
I think I understand now. xlsm / xlsb files can contain macros, whereas xlsx cannot. Openpyxl allows the engine argument
keep_vba
. Running this on the test file here preserves the macro:What happens when you run this on Windows @maximrobi?
In order to have a warning, we’d need to inspect the file as well as the engine_kwargs to determine if macros will be lost, warning when they are. +1 on adding such a warning.
Do not save the writer, close it:
writer.close()
What I do to create a valid .xlsm file directly in python is also with openpyxl I do something like
The first part with
Workbook()
creates the file, but it is corrupted, however, when you open it withload_workbook
and thekeep_vba=True
arg, the file is magically fixed and ready to be manipulated.The most I could do with this was to modify
pandas.io.excel._openpyxl.py.OpenpyxlWriter.__init__.book (line 65)
to directly add"keep_vba = True"
since this line does not include any motor_kwargs. And from what I tried this doesn’t affect .xlsx filesHowever to append data to existing sheet, you should do something like:
or modify again
_openpyxl.py
to add something like an"append"
option todef write_cells()
We always write the workbook back, even if no changes are made. That’s how we lose features that are not supported by an engine.
The truncate shouldn’t cause an issue here (it is actually needed). It is there in case the original file is larger than the new file (for example, if unsupported features are dropped by the engine). Otherwise, we would simply overwrite the first n-bits of the file and leave the remaining bits untouched, truncate removes those remaining bits.