msphpsql: "right truncation" error from sqlsrv_fetch_array of long VARCHAR(max) column containing 8-bit characters
PHP Driver version or file name
sqlsrv support => enabled ExtensionVer => 5.9.0+14119
Directive => Local Value => Master Value sqlsrv.WarningsReturnAsErrors => On => On sqlsrv.LogSeverity => 0 => 0 sqlsrv.LogSubsystems => 0 => 0 sqlsrv.ClientBufferMaxKBSize => 10240 => 10240 sqlsrv.SetLocaleInfo => 2 => 2
SQL Server version
15.0.2000.5 (earlier versions also repro)
Client operating system
Ubuntu 20.04.2 LTS
PHP version
PHP 7.4.3 (cli) (built: Oct 6 2020 15:47:56) ( NTS ) Copyright © The PHP Group Zend Engine v3.4.0, Copyright © Zend Technologies with Zend OPcache v7.4.3, Copyright ©, by Zend Technologies with Xdebug v2.9.2, Copyright © 2002-2020, by Derick Rethans
Microsoft ODBC Driver version
unixodbc-dev | 2.3.7 | https://packages.microsoft.com/ubuntu/20.04/prod focal/main amd64 Packages unixodbc-dev | 2.3.6-0.1build1 | http://archive.ubuntu.com/ubuntu focal/main amd64 Packages
Problem description
When using sqlsrv_fetch_array fails with String data, right truncation under the following conditions:
- The column is type
VARCHAR(max)or the column type is just barely big enough to fit the exact data - The column is over about 2048 bytes long
- The column contains ASCII values > 127
- The collation of the column is
NOT LIKE '%UTF8'(i.e. it is something likeSQL_Latin1_General...and notLatin1_General_100_CI_AS_SC_UTF8) - The connection options
CharacterSetis not set or set toSQLSRV_ENC_CHAR(and not set to'UTF-8')
In this case the column’s binary data happens to be UTF-8, but I don’t think that matters since all byte sequences are legal Latin1 and I think it’s converting from suspected-Latin1 to UTF-8.
I suspect that some code that does dynamic allocation to fetch results is starting with a 2048 byte buffer and mis-handling the case where the Latin1->UTF-8 conversion causes the string to grow. In the VARCHAR(n) cases that same code probably does exact allocation and avoids the bug whenever n is big enough to contain the encoded result.
Expected behavior and actual behavior
Expected successful read of string data (possibly, in this case, with the 8-bit characters presumed to be Latin1 and actually turned into UTF-8 codepoints).
Actually got String data, right truncation
Repro code or steps to reproduce
Note that the 8 bit characters ñ represent n-with-tilde-on-top as raw UTF-8-expressed-as-Latin1
$connInfo = array(
'Database' => 'Test',
'UID' => 'test',
'PWD' => 'test',
'LoginTimeout' => 5,
'CharacterSet' => SQLSRV_ENC_CHAR
);
$conn = sqlsrv_connect($host, $connInfo);
# leading string >= 2045 leading to result length > 2048
# column must be VARCHAR(MAX) or VARCHAR(2048) (starts working with bigger VARCHAR(n), e.g. 2060)
# 'CharacterSet' connInfo must not be set to SQLSRV_ENC_CHAR (works with UTF-8)
# COLLATE must not be %UTF8% (e.g. Latin1_General_100_CI_AS_SC_UTF8 works)
$sql = "DROP TABLE IF EXISTS #tmpTest;
SET NOCOUNT ON;
DECLARE @val VARCHAR(8000) = REPLICATE('a', 2045) + 'ñ';
CREATE TABLE #tmpTest (testCol VARCHAR(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS);
INSERT INTO #tmpTest (testCol) VALUES (@val);
SELECT * from #tmpTest;";
$stmt = sqlsrv_query($conn, $sql);
$row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC);
$errors = sqlsrv_errors(SQLSRV_ERR_ERRORS);
# String data, right truncation
var_dump($row, $errors);
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (10 by maintainers)
@bjj FYI, in light of ODBC driver’s autotranslation when using different codepages, we have decided to fix this issue in PHP drivers to make it more flexible in handling the possible UTF-8 characters when fetching ASCII data.
My mistake – I did the test in a container that did not have
en_US.ISO-8859-1configured. I added it and I see now and it does make a difference.