PhpSpreadsheet: Incorrect date rendering

This is:

- [X] a bug report
- [ ] a feature request
- [X] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

dateSample.xlsx

If I open “dateSample.xlsx” in my LibreOffice or in Excel, I can read exactly the following (yes, I’m French) :

date formats standard format
16-mai 16-mai
16/05/2022 16/05/2022
lundi 16 mai 2022 lundi 16 mai 2022
16/05/22 16/05/22

As you can see, both columns show the same, even if first column is a date format (value 44697) in the sheet1.xml file.

What is the current behavior?

If I open and read the file with PHPSpreadsheet with this code :

<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Reader\Xlsx;
$reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReaderForFile("dateSample.xlsx");
$reader->setReadDataOnly(false);
$spreadsheet=$reader->load("dateSample.xlsx");
$sheetData = $spreadsheet->getActiveSheet()->toArray(null, true, true, true);
var_dump($sheetData);

I obtain :

array(5) {
  [1]=>
  array(2) {
    ["A"]=>
    string(12) "date formats"
    ["B"]=>
    string(15) "standard format"
  }
  [2]=>
  array(2) {
    ["A"]=>
    string(6) "16-May"
    ["B"]=>
    string(6) "16-mai"
  }
  [3]=>
  array(2) {
    ["A"]=>
    string(9) "5/16/2022"
    ["B"]=>
    string(10) "16/05/2022"
  }
  [4]=>
  array(2) {
    ["A"]=>
    string(20) "Monday, May 16, 2022"
    ["B"]=>
    string(17) "lundi 16 mai 2022"
  }
  [5]=>
  array(2) {
    ["A"]=>
    string(8) "16/05/22"
    ["B"]=>
    string(8) "16/05/22"
  }
}

Q1) is not a bug for 3A ? why 5/16 (month/day) while 5A 16/05 (day/month) is the expected order ?

Q2) 2A and 4A are not in French, how to get them translated ?

What are the steps to reproduce?

The source code provided should be used for reproducing with the enclosed excel file.

What features do you think are causing the issue

  • Reader
  • Writer
  • Styles
  • Data Validations
  • Formula Calulations
  • Charts
  • AutoFilter
  • Form Elements

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

The same file open and saved as ODS via libreoffice render each date as “16-May-22” … which is different (but not really better)

Which versions of PhpSpreadsheet and PHP are affected?

I’m using phpoffice/phpspreadsheet 1.23.0 and PHP 8.1.5 (cli)

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 26 (5 by maintainers)

Most upvoted comments

@AnwarQasem

Until this is fixed, is there a way to import with all fields as strings, and be takes as they are in the excel ?

Dates in MS Excel 101

MS Excel uses a serialized timestamp to represent a date value, so the date/time value is stored in the file as a number like 44186. e.g.

<row r="3" x14ac:dyDescent="0.25" spans="1:2">
    <c r="A3" s="3">
        <v>44697</v>
    </c>
</row>

Combined with a number format mask, this can be rendered as a date value like ``21/12/2021`, and my description in a previous response to this thread explains how that serialized timestamp value is linked to a format mask.

By defalt, PhpSpreadsheet loads both the value (number 44186) and the format mask (e.g. dd/mm/yy;@) for the cell (unless you tell it to load data only from the file, when it only loads the value).

If you make a call to getValue() for that cell, PhpSPreadsheet will return the serialized timestamp, the number 44186; if you call getFormattedValue() then PhpSpreadsheet will return a date string formatted according to the mask (e.g. 21/12/2021) that it is linked to.

This is similar to using Unix timestamps in standard PHP, although MS Excel counts in days rather than seconds, and uses the float part of its timestamp to represent the time; and the base date could be either 1/1/1900 or 1/1/1904 (depending on whether the file was originally created in Windows Excel or Mac Excel) rather than the 1/1/1970 used by a unix timestamp.

If you load the file with readDataOnly set to true, then PhpSpreadsheet will only load the cell values, but not load style information like the number format masks, and there is no way to identify that 44186 should be formatted as a date, because it can’t be distinguished from any other number. Even then, the methods in the Shared/Date class allow you to convert that number to a unix timestamp, or to a PHP DateTime object that you can then format however you choose; but because you have explicitly told PhpSpreadsheet not to load style information, we have to assume that this is what you want.

So there is nothing to “fix” in PhpSpreadsheet, the data as it is in Excel itself (the number 44186), the datatype is matched as an Excel datatype number, the format mask is read and associated with the cell (unless you tell PhpSpreadseet not to make this association via readDataOnly). We provide methods to read both the raw numeric value read from Excel, and the formatted “date” value. We provide methods to convert Excel timestamp values to unix timestamp values or PHP DateTime objects so that you can manipulate or format the values using standard PHP functions if you wish.

And all this is described in our documentation.

The cell entries in sheet1 has an s attribute that is a style reference:

<row r="3" x14ac:dyDescent="0.25" spans="1:2">
    <c r="A3" s="3">
        <v>44697</v>
    </c>
    <c r="B3" t="s" s="2">
        <v>1</v>
    </c>
</row>

This points to an entry in the cellXfs element of the styles:

<cellXfs count="8">
    <xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0"/>
    <xf numFmtId="16" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" quotePrefix="1"/>
    <xf numFmtId="14" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="165" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
    <xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" applyAlignment="1">
        <alignment horizontal="center"/>
    </xf>
    <xf numFmtId="0" borderId="0" fillId="0" fontId="0" xfId="0" quotePrefix="1" applyAlignment="1">
        <alignment horizontal="center"/>
    </xf>
    <xf numFmtId="166" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>
</cellXfs>

So for cell A3 with the s attribute value of 3, this points to the 4th entry in that cellXfs list (offsets counted from 0)

<xf numFmtId="14" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>

with applyNumberFormat set to true, and a numFmtId of 14, which is a built-in format code, as defined in the ECMA specification.

For cell A5 with the s attribute value of 7, this points to the 8th entry in that cellXfs list:

<xf numFmtId="166" borderId="0" fillId="0" fontId="0" xfId="0" applyNumberFormat="1"/>

with applyNumberFormat set to true, and a numFmtId of 166, which is defined in the numFmts entry in styles with the explicitly defined Id of 166:

<numFmt formatCode="dd/mm/yy;@" numFmtId="166"/>

so that formatCode value is the number format mask used for the value in cell A5

Internally, PhpSpreadsheet uses en_US

1/ is there a way to change this ?

There is no option for this. The Calculation Engine supports locale settings for formulae, allowing you to use French-language names for Excel functions, and ; as a function argument separator; and we have recently provided some element of support for locale when reading data from csv (such as handling , as a decimal separator rather than ., and recognising VRAI as a boolean true), but this is still fairly limited. I did explore using PHP’s Intl extension for handling formatting, but it is still over-complicated and buggy, and prone to issues depending on the version of the underlying ICU version. Enforcing it as a requirement for PhpSpreadsheet when it creates more problems than it might resolve didn’t seem like a good trade-off.

2/ how does PhpSpreadsheet decide to convert to “5/16/2022” for A3 or “16/05/22” for A5 ?

That depends on the exact format mask that was stored in the Excel file. The code used for A5 in this file is an explicitly defined code of dd/mm/yy;@. Cell A3 uses format id 14, one of the MS pre-defined styles (which can be found in https://github.com/PHPOffice/PhpSpreadsheet/blob/master/src/PhpSpreadsheet/Style/NumberFormat.php#L260) and maps to m/d/yyyy (officially this should be mm-dd-yy according to the ECMA specification, but MS overrides that, and we map to the MS format)… note that this is also one of Microsoft’s “locale aware” formats

The region itself is never stored in the Excel file; this is purely a feature of the Excel application itself (it is possible to change region within the application, or to send files to people who have different region settings for their Excel, and values will be rendered according to the locale settings in their Excel)

My current settings are English/UK image Internally, PhpSpreadsheet uses en_US.

The point where this can be a problem is when trying to convert a string value like 11/05/2022… is this the 11th of May 2022? Or is it the 5th of November 2022.

One other potential issue with number format masking for dates is that some of the MS defined formats are locale-aware, others are not:

image

Those formats shown with a ‘*’ in the type selector are locale-aware, and will display according to locale settings in MS Excel; those displayed without the ‘*’ are “absolute” and should always appear in exactly that format. PhpSpreadsheet doesn’t support this difference when rendering dates.

When I open your spreadsheet (my setting is English US): image This is, of course, different than your setting because of the different regional settings, but it is entirely consistent with the results you’re seeing. Excel (and therefore PhpSpreadsheet) does not save the regional settings with the file.