msphpsql: "right truncation" error from sqlsrv_fetch_array of long VARCHAR(max) column containing 8-bit characters

PHP Driver version or file name

sqlsrv support => enabled ExtensionVer => 5.9.0+14119

Directive => Local Value => Master Value sqlsrv.WarningsReturnAsErrors => On => On sqlsrv.LogSeverity => 0 => 0 sqlsrv.LogSubsystems => 0 => 0 sqlsrv.ClientBufferMaxKBSize => 10240 => 10240 sqlsrv.SetLocaleInfo => 2 => 2

SQL Server version

15.0.2000.5 (earlier versions also repro)

Client operating system

Ubuntu 20.04.2 LTS

PHP version

PHP 7.4.3 (cli) (built: Oct 6 2020 15:47:56) ( NTS ) Copyright © The PHP Group Zend Engine v3.4.0, Copyright © Zend Technologies with Zend OPcache v7.4.3, Copyright ©, by Zend Technologies with Xdebug v2.9.2, Copyright © 2002-2020, by Derick Rethans

Microsoft ODBC Driver version

unixodbc-dev | 2.3.7 | https://packages.microsoft.com/ubuntu/20.04/prod focal/main amd64 Packages unixodbc-dev | 2.3.6-0.1build1 | http://archive.ubuntu.com/ubuntu focal/main amd64 Packages

Problem description

When using sqlsrv_fetch_array fails with String data, right truncation under the following conditions:

  • The column is type VARCHAR(max) or the column type is just barely big enough to fit the exact data
  • The column is over about 2048 bytes long
  • The column contains ASCII values > 127
  • The collation of the column is NOT LIKE '%UTF8' (i.e. it is something like SQL_Latin1_General... and not Latin1_General_100_CI_AS_SC_UTF8)
  • The connection options CharacterSet is not set or set to SQLSRV_ENC_CHAR (and not set to 'UTF-8')

In this case the column’s binary data happens to be UTF-8, but I don’t think that matters since all byte sequences are legal Latin1 and I think it’s converting from suspected-Latin1 to UTF-8.

I suspect that some code that does dynamic allocation to fetch results is starting with a 2048 byte buffer and mis-handling the case where the Latin1->UTF-8 conversion causes the string to grow. In the VARCHAR(n) cases that same code probably does exact allocation and avoids the bug whenever n is big enough to contain the encoded result.

Expected behavior and actual behavior

Expected successful read of string data (possibly, in this case, with the 8-bit characters presumed to be Latin1 and actually turned into UTF-8 codepoints).

Actually got String data, right truncation

Repro code or steps to reproduce

Note that the 8 bit characters ñ represent n-with-tilde-on-top as raw UTF-8-expressed-as-Latin1

 $connInfo = array(
    'Database' => 'Test',
    'UID' => 'test',
    'PWD' => 'test',
    'LoginTimeout' => 5,
    'CharacterSet' => SQLSRV_ENC_CHAR
);

$conn = sqlsrv_connect($host, $connInfo);

# leading string >= 2045 leading to result length > 2048
# column must be VARCHAR(MAX) or VARCHAR(2048) (starts working with bigger VARCHAR(n), e.g. 2060)
# 'CharacterSet' connInfo must not be set to SQLSRV_ENC_CHAR (works with UTF-8)
# COLLATE must not be %UTF8% (e.g. Latin1_General_100_CI_AS_SC_UTF8 works)
$sql = "DROP TABLE IF EXISTS #tmpTest;
		SET NOCOUNT ON;
        DECLARE @val VARCHAR(8000) =  REPLICATE('a', 2045) + 'ñ';
        CREATE TABLE #tmpTest (testCol VARCHAR(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS);
        INSERT INTO #tmpTest (testCol) VALUES (@val);
        SELECT * from #tmpTest;";
$stmt = sqlsrv_query($conn, $sql);
$row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC);
$errors = sqlsrv_errors(SQLSRV_ERR_ERRORS);
# String data, right truncation
var_dump($row, $errors);

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@bjj FYI, in light of ODBC driver’s autotranslation when using different codepages, we have decided to fix this issue in PHP drivers to make it more flexible in handling the possible UTF-8 characters when fetching ASCII data.

using setlocale() (must be set before making the first connection) seems to solve the issue on my side

My mistake – I did the test in a container that did not have en_US.ISO-8859-1 configured. I added it and I see now and it does make a difference.