efcore: Incorrect string matching behavior on SQL Server with whitespace patterns
I’m getting wrong results from database when trying to filter string (Contains
, StartsWith
, EndsWith
, etc) with spaces.
Steps to reproduce
I’m build query like:
var test = " ";
var venues = await context.Venues.Where(x => x.Name.Contains(test)).ToListAsync(cancellationToken);
Which gets translated to:
SELECT [v].[Id], [v].[Name]
FROM [Venues] AS [v]
WHERE (@__test_0 = N'') OR CHARINDEX(@__test_0, [v].[Name]) > 0
This query returns all rows from database when @__test_0
contains spaces.
Further technical details
EF Core version: 3.1.0 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET Core 3.1
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 24 (20 by maintainers)
Commits related to this issue
- Rewrite string equality with LIKE on SQL Server Since on SQL Server the equality operator ignores trailing whitespace, we can use LIKE when comparing to constants. Fixes #19402 Note: this is WIP, t... — committed to dotnet/efcore by roji 4 years ago
- Rewrite string equality with LIKE on SQL Server Since on SQL Server the equality operator ignores trailing whitespace, we can use LIKE when comparing to constants. Fixes #19402 Note: this is WIP, t... — committed to dotnet/efcore by roji 4 years ago
- Rewrite string equality with LIKE on SQL Server Since on SQL Server the equality operator ignores trailing whitespace, we can use LIKE when comparing to constants. Fixes #19402 Note: this is WIP, t... — committed to dotnet/efcore by roji 4 years ago
- Fix SQL Server Contains whitespace issues * Translate to LIKE instead of CHARINDEX for constant patterns * Change empty string compensation from equality to LIKE Fixes #19402 — committed to dotnet/efcore by roji 4 years ago
- Fix SQL Server Contains whitespace issues * Translate to LIKE instead of CHARINDEX for constant patterns * Change empty string compensation from equality to LIKE Fixes #19402 — committed to dotnet/efcore by roji 4 years ago
- Fix SQL Server Contains whitespace issues * Translate to LIKE instead of CHARINDEX for constant patterns * Change empty string compensation from equality to LIKE Fixes #19402 — committed to dotnet/efcore by roji 4 years ago
- Fix SQL Server Contains whitespace issues (#21097) * Translate to LIKE instead of CHARINDEX for constant patterns * Change empty string compensation from equality to LIKE Fixes #19402 — committed to dotnet/efcore by roji 4 years ago
@zachrybaker my message above may have been a bit inaccurate - in some cases, the translations of Contains, StartsWith and EndsWith will work correctly with regards to trailing whitespace (i.e. when patterns are constant we will generate LIKE, which doesn’t ignore trailing whitespace). Once we implement client-side parameter transformations (which we do intend to do), the same technique can be used for parameter patterns as well.
Re other workaround with len… First, your stack overflow link seems broken. Second, that elaborate workaround still returns a false positive when the value is shorter than the pattern:
It’s possible to add a condition for that too, but at some point that kind of complexity does not seem appropriate for translating a very common .NET method like StartsWith (and properly has some perf overhead, with all the casting, concatenation, etc. etc). In any case, as you say, it’s always possible for people to express this themselves with EF.Functions.Like.
The larger picture here is that it’s impossible to handle this well for much more basic cases without a significant perf impact - I’m thinking about simple equality. When users compare two VARCHAR columns, trailing SQL will be truncated. Any attempt to compensate for that with elaborate logic such as the above would prevent index usage - which is the last thing you want to do for the default text equality operator. Since it’s not possible to fix this for basic equality, I don’t think it makes sense to go so far for StartsWith either.
Since
a = b
gives incorrect results when either of them have trailing whitespace, we should do it for any case we can improve. That means, if column comparison is also hitting bug then we convert that too.Equality can be generated outside of SqlTranslator.VisitBinary too. So SqlTranslator wouldn’t work. Option would be either SqlExpressionFactory.Equal generates different for string or post-processor would take care of rewriting. PostProcessor would be slightly more preferable as it avoids .Equal not returning SqlBinary.
Summary
On SQL Server,
LEN(col) = 0
suffers from the same issue ascol = ''
: it identifies spaces-only strings a having length 0 (incidentally it also does not utilize indexes).However,
col LIKE ''
may be a good solution: it only identifies truly empty strings and uses an index. At least on PostgreSQL (where this whole trailing spaces madness doesn’t happen), LIKE also uses an index.We should ideally verify also on Sqlite and MySQL, though I’d have to spend more time to understand how to analyze queries there. We can postpone dealing with this to see whether we end up doing #17598 for 5.0 (which I’d really like to).
SQL Server
Behavior around trailing spaces
Index usage
PostgreSQL
Behavior around trailing spaces
Index usage
Sqlite
Behavior around trailing spaces
Index usage
Thanks for explaining, that makes sense. Right, I guess only parameter expressions can feasibly be escaped.
The current
LEN/LEFT
based translation is not SARGable. This means that an index cannot be used to satisfy aStartsWith
predicate. I am currently facing this as a performance issue in a LINQ to SQL migration. L2S translatedStartsWith
toLIKE
with escaping for wildcards. This has been proposed above as well.SQL Server is capable of seeking on an index with a parameterized
LIKE
pattern. The pattern does not need to be constant.This is implemented by SQL Server inserting an internal function into the query plan. This function takes the (dynamic)
LIKE
pattern and computes at runtime upper and lower seek bounds. This turns out to be efficient for commonly used patterns.I’d be curious if
LIKE
orCHARINDEX
are about equal in performance forContains
searches. If yes, thenLIKE
could be used as a common workhorse for all string searching operations.I believe that
LIKE
has better behavior with respect to whitespace as well.For a simple equality comparison (
entity.Column == parameter)
this should be translated to a normal=
on the SQL side.Design decisions:
Set up a quick EF6 test project for Contains (with .NET Core on Linux!), results are below.
tl;dr EF6 seems to be doing the right thing, always using LIKE for Contains, escaping wildcards for both constants and parameters. For columns it falls back to CHARINDEX, but does not compensate for empty string (so it has a false negative bug).
For EF Core, here’s what I think we should do:
Constant pattern:
Parameter pattern:
Parameterized pattern with wildcards:
Column pattern:
Post-processor indeed sounds better to me (but should be before relational command cache etc.). Will prepare this.
Note for team: I was able to reproduce this with 3.1. More minimal repro and output: