PhpSpreadsheet: Incorrect date rendering
This is:
- [X] a bug report
- [ ] a feature request
- [X] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)
What is the expected behavior?
If I open “dateSample.xlsx” in my LibreOffice or in Excel, I can read exactly the following (yes, I’m French) :
| date formats | standard format |
|---|---|
| 16-mai | 16-mai |
| 16/05/2022 | 16/05/2022 |
| lundi 16 mai 2022 | lundi 16 mai 2022 |
| 16/05/22 | 16/05/22 |
As you can see, both columns show the same, even if first column is a date format (value 44697) in the sheet1.xml file.
What is the current behavior?
If I open and read the file with PHPSpreadsheet with this code :
<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Reader\Xlsx;
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReaderForFile("dateSample.xlsx");
$reader->setReadDataOnly(false);
$spreadsheet=$reader->load("dateSample.xlsx");
$sheetData = $spreadsheet->getActiveSheet()->toArray(null, true, true, true);
var_dump($sheetData);
I obtain :
array(5) {
[1]=>
array(2) {
["A"]=>
string(12) "date formats"
["B"]=>
string(15) "standard format"
}
[2]=>
array(2) {
["A"]=>
string(6) "16-May"
["B"]=>
string(6) "16-mai"
}
[3]=>
array(2) {
["A"]=>
string(9) "5/16/2022"
["B"]=>
string(10) "16/05/2022"
}
[4]=>
array(2) {
["A"]=>
string(20) "Monday, May 16, 2022"
["B"]=>
string(17) "lundi 16 mai 2022"
}
[5]=>
array(2) {
["A"]=>
string(8) "16/05/22"
["B"]=>
string(8) "16/05/22"
}
}
Q1) is not a bug for 3A ? why 5/16 (month/day) while 5A 16/05 (day/month) is the expected order ?
Q2) 2A and 4A are not in French, how to get them translated ?
What are the steps to reproduce?
The source code provided should be used for reproducing with the enclosed excel file.
What features do you think are causing the issue
- Reader
- Writer
- Styles
- Data Validations
- Formula Calulations
- Charts
- AutoFilter
- Form Elements
Does an issue affect all spreadsheet file formats? If not, which formats are affected?
The same file open and saved as ODS via libreoffice render each date as “16-May-22” … which is different (but not really better)
Which versions of PhpSpreadsheet and PHP are affected?
I’m using phpoffice/phpspreadsheet 1.23.0 and PHP 8.1.5 (cli)
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 26 (5 by maintainers)
@AnwarQasem
Dates in MS Excel 101
MS Excel uses a serialized timestamp to represent a date value, so the date/time value is stored in the file as a number like
44186. e.g.Combined with a number format mask, this can be rendered as a date value like ``21/12/2021`, and my description in a previous response to this thread explains how that serialized timestamp value is linked to a format mask.
By defalt, PhpSpreadsheet loads both the value (number
44186) and the format mask (e.g.dd/mm/yy;@) for the cell (unless you tell it to load data only from the file, when it only loads the value).If you make a call to
getValue()for that cell, PhpSPreadsheet will return the serialized timestamp, the number44186; if you callgetFormattedValue()then PhpSpreadsheet will return a date string formatted according to the mask (e.g.21/12/2021) that it is linked to.This is similar to using Unix timestamps in standard PHP, although MS Excel counts in days rather than seconds, and uses the float part of its timestamp to represent the time; and the base date could be either 1/1/1900 or 1/1/1904 (depending on whether the file was originally created in Windows Excel or Mac Excel) rather than the 1/1/1970 used by a unix timestamp.
If you load the file with
readDataOnlyset totrue, then PhpSpreadsheet will only load the cell values, but not load style information like the number format masks, and there is no way to identify that44186should be formatted as a date, because it can’t be distinguished from any other number. Even then, the methods in the Shared/Date class allow you to convert that number to a unix timestamp, or to a PHP DateTime object that you can then format however you choose; but because you have explicitly told PhpSpreadsheet not to load style information, we have to assume that this is what you want.So there is nothing to “fix” in PhpSpreadsheet, the data as it is in Excel itself (the number
44186), the datatype is matched as an Excel datatype number, the format mask is read and associated with the cell (unless you tell PhpSpreadseet not to make this association viareadDataOnly). We provide methods to read both the raw numeric value read from Excel, and the formatted “date” value. We provide methods to convert Excel timestamp values to unix timestamp values or PHP DateTime objects so that you can manipulate or format the values using standard PHP functions if you wish.And all this is described in our documentation.
The cell entries in
sheet1has ansattribute that is a style reference:This points to an entry in the
cellXfselement of the styles:So for cell
A3with thesattribute value of3, this points to the 4th entry in thatcellXfslist (offsets counted from 0)with
applyNumberFormatset totrue, and anumFmtIdof14, which is a built-in format code, as defined in the ECMA specification.For cell
A5with thesattribute value of7, this points to the 8th entry in thatcellXfslist:with
applyNumberFormatset totrue, and anumFmtIdof166, which is defined in thenumFmtsentry in styles with the explicitly defined Id of166:so that
formatCodevalue is the number format mask used for the value in cellA5There is no option for this. The Calculation Engine supports locale settings for formulae, allowing you to use French-language names for Excel functions, and
;as a function argument separator; and we have recently provided some element of support for locale when reading data from csv (such as handling,as a decimal separator rather than., and recognisingVRAIas a boolean true), but this is still fairly limited. I did explore using PHP’s Intl extension for handling formatting, but it is still over-complicated and buggy, and prone to issues depending on the version of the underlying ICU version. Enforcing it as a requirement for PhpSpreadsheet when it creates more problems than it might resolve didn’t seem like a good trade-off.That depends on the exact format mask that was stored in the Excel file. The code used for A5 in this file is an explicitly defined code of
dd/mm/yy;@. Cell A3 uses format id 14, one of the MS pre-defined styles (which can be found in https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Style/NumberFormat.php#L260) and maps tom/d/yyyy(officially this should bemm-dd-yyaccording to the ECMA specification, but MS overrides that, and we map to the MS format)… note that this is also one of Microsoft’s “locale aware” formatsThe region itself is never stored in the Excel file; this is purely a feature of the Excel application itself (it is possible to change region within the application, or to send files to people who have different region settings for their Excel, and values will be rendered according to the locale settings in their Excel)
My current settings are English/UK
Internally, PhpSpreadsheet uses
en_US.The point where this can be a problem is when trying to convert a string value like
11/05/2022… is this the 11th of May 2022? Or is it the 5th of November 2022.One other potential issue with number format masking for dates is that some of the MS defined formats are locale-aware, others are not:
Those formats shown with a ‘*’ in the type selector are locale-aware, and will display according to locale settings in MS Excel; those displayed without the ‘*’ are “absolute” and should always appear in exactly that format. PhpSpreadsheet doesn’t support this difference when rendering dates.
When I open your spreadsheet (my setting is English US):
This is, of course, different than your setting because of the different regional settings, but it is entirely consistent with the results you’re seeing. Excel (and therefore PhpSpreadsheet) does not save the regional settings with the file.