efcore: IN() list queries are not parameterized, causing increased SQL Server CPU usage

Currently, EF Core is not parameterizing IN(...) queries created from .Contains() (and maybe other cases). This has a very detrimental impact on SQL Server itself because:

The query hash is different for each distinct set of IDs passed in.
SQL Server needs to calculate a plan for each of these distinct combinations (CPU hit).
The query plans created occupy an entry in SQL Server’s plan cache (unless ad-hoc behavior is on), fighting for resources on other plans and cascading CPU issues by re-calculations due to evictions.

Note: when SQL Server has memory pressure, plan cache is the first thing to empty. So this has a profound impact at scale, doubly so when things have gone sideways.

Steps to reproduce

Here’s a reduced down version of the problem:

var ids = Enumerable.Range(1, 1000).ToList();
var throwAwary = context.Users.Where(u => ids.Contains(u.Id)).ToList();

This results in the following:

SELECT [u].[Id], [u].[AcceptRateAccepted], [u].[AcceptRateAsked], [u].[AccountId], [u].[AnswerCount], [u].[BronzeBadges], [u].[CreationDate], [u].[DaysVisitedConsecutive], [u].[DaysVisitedTotal], [u].[DisplayName], [u].[Email], [u].[Flags], [u].[GoldBadges], [u].[HasAboutMeExcerpt], [u].[HasReplies], [u].[IsVeteran], [u].[JobSearchStatus], [u].[LastAccessDate], [u].[LastDailySiteAccessDate], [u].[LastEmailDate], [u].[LastLoginDate], [u].[LastLoginIP], [u].[LastModifiedDate], [u].[Location], [u].[OptInEmail], [u].[PreferencesRaw], [u].[ProfileImageUrl], [u].[QuestionCount], [u].[RealName], [u].[Reputation], [u].[ReputationMonth], [u].[ReputationQuarter], [u].[ReputationSinceLastCheck], [u].[ReputationToday], [u].[ReputationWeek], [u].[ReputationYear], [u].[SignupStarted], [u].[SilverBadges], [u].[TeamId], [u].[TeamName], [u].[TimedPenaltyDate], [u].[Title], [u].[UserTypeId], [u].[WebsiteUrl]
FROM [Users] AS [u]
WHERE [u].[Id] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000);

Further technical details

EF Core version: 2.1.4 Database Provider: Microsoft.EntityFrameworkCore.SqlServer Operating system: Windows 2016/Windows 10 IDE: Visual Studio 2017 15.9

Proposal

EF Core should parameterize here. Instead of IN (1, 2, 3, ...) we should see IN (@__ids1, @__ids2, @__ids3, ...) or similar. This would allow query plan cache to be shared. For example if we ran this 1,000 times to fetch 1,000,000 users in batches, we’d have 1 plan in cache, whereas today we have 1,000 plans. Let’s say a user gets removed (or added!) on page 2 of 1,000…today we’d calculate and cache another 999 plans next run.

To further address the cardinality problem, an approach similar to what Dapper does would be at least a starting point. We generate only lists of certain sizes, let’s just say 5, 10, 50, 100, 500, 1000. Here’s an example with 3 parameters:

SELECT [u].[Id], [u].[AcceptRateAccepted], [u].[AcceptRateAsked], [u].[AccountId], [u].[AnswerCount], [u].[BronzeBadges], [u].[CreationDate], [u].[DaysVisitedConsecutive], [u].[DaysVisitedTotal], [u].[DisplayName], [u].[Email], [u].[Flags], [u].[GoldBadges], [u].[HasAboutMeExcerpt], [u].[HasReplies], [u].[IsVeteran], [u].[JobSearchStatus], [u].[LastAccessDate], [u].[LastDailySiteAccessDate], [u].[LastEmailDate], [u].[LastLoginDate], [u].[LastLoginIP], [u].[LastModifiedDate], [u].[Location], [u].[OptInEmail], [u].[PreferencesRaw], [u].[ProfileImageUrl], [u].[QuestionCount], [u].[RealName], [u].[Reputation], [u].[ReputationMonth], [u].[ReputationQuarter], [u].[ReputationSinceLastCheck], [u].[ReputationToday], [u].[ReputationWeek], [u].[ReputationYear], [u].[SignupStarted], [u].[SilverBadges], [u].[TeamId], [u].[TeamName], [u].[TimedPenaltyDate], [u].[Title], [u].[UserTypeId], [u].[WebsiteUrl]
FROM [Users] AS [u]
WHERE [u].[Id] IN (@ids1, @ids2, @ids3, @ids4, @ids5);

The 5 Ids are so that 1-5 values all use the same plan. Anything < n in length repeats the last value (don’t use null!). In this case let’s say our values are 1, 2, and 3. Our parameterization would be:

@ids1 = 1;
@ids2 = 2;
@ids3 = 3;
@ids4 = 3;
@ids5 = 3;

This fetches the users we want, is friendly to the cache, and lessens the amount of generated permutations for all layers.

To put this in perspective, at Stack Overflow scale we’re generating millions of one-time-use plans needlessly in EF Core (the parameterization in Linq2Sql lowered this to only cardinality permutations). We alleviate the cache issue by enabling ad-hoc query mode on our SQL Servers, but that doesn’t lessen the CPU use from all the query hashes involved except in the (very rare) reuse case of the exact same list.

This problem is dangerous because it’s also hard to see. If you’re looking at the plan cache, you’re not going to see it by any sort of impact query analysis. Each query, hash, and plan is different. There’s no sane way to group them. It’s death by a thousand cuts you can’t see. I’m currently finding and killing as many of these in our app as I can, but we should fix this at the source for everyone.

About this issue

Original URL
State: closed
Created 6 years ago
Reactions: 273
Comments: 123 (76 by maintainers)

Links to this issue

Commits related to this issue

Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago
Support primitive collections Closes #29427 Closes #30426 Closes #13617 — committed to roji/efcore by roji a year ago

Most upvoted comments

Everyone,

We spent some time investigating this; below is a write-up of the current situation, the possible ways forward and some first conclusions. The tl;dr is that we think it makes sense to implement an OPENJSON-based approach, where the (array parameter (e.g. int[]) is transmitted as a JSON string parameter (`[1,2,3]') and unpacked to a pseudo-table, which we can then be used in the SQL query as usual. As many people have shown an interest here, it would be good to receive feedback on the below!

The problem

It’s very common to filter rows based on an array of (primitive) values that exists on the client. In an EF LINQ query, this looks like the following:

var ids = new[] { 10, 20, 30 };
var customers = await context.Customers.Where(c => ids.Contains(c.Id)).ToArray();

This conceptually translates to a SQL IN expression. However, since array parameters aren’t supported by most databases, EF unpacks the array into the SQL, currently inserting its values as constants:

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (10, 20, 30);

This technique unfortunately means that different array values cause different SQL to get generated. This can cause severe performance problems, since it pollutes database query plans, causing re-planning for each execution and evicts plans for other queries. Less importantly, EF cannot (currently) cache the SQL, as it depends on parameter values; this means some query processing must occur for each execution.

In the benchmark below, this technique is called In_with_constants.

IN expression with parameters

To improve this, we could simply embed the array values as parameters instead of constants:

DECLARE @id1 INT = 10, @id2 INT = 20, @id3 INT = 30;

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (@id1, @id2, @id3);

For all arrays of identical lengths (3 in our case), this produces the same SQL and is therefore optimal. However, varying array lengths still cause differing SQLs, triggering the performance issues. Also, EF still cannot cache the SQL.

In the benchmark below, this technique is called In_with_padded_parameters with PaddingCount = 0.

Padding

A common way to alleviate the differing length problem is to “pad” the IN expression with extra “useless” parameters, up to an arbitrary boundary. For example, the following pads our 3-value array to 5 values:

DECLARE @id1 INT = 10, @id2 INT = 20, @id3 INT = 30, @i4 INT = 30, @i5 INT = 30;

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (@id1, @id2, @id3, @id4, @id5);

Array lengths of 5 and below therefore use the same SQL, at the cost of sending duplicated data to the server. The bigger the padding boundary, the less different SQLs are produced for different array lengths, but the more padding parameters are added on the average, increasing query overhead.

This padding technique has a long history and can be used with various ORMs (nHibernate, jooq, Dapper), though it sometimes requires an opt-in.

In the benchmark below, this technique is called In_with_padded_parameters with PaddingCount > 0.

Sending the values as an array

Rather than expanding an array parameter out to individual values in a SQL IN expression (whether as constants or as parameters), ideally we’d be able to just send the array parameter directly. Doing this directly usually isn’t supported, since databases don’t typically support array types in any way. The two exceptions to this are:

PostgreSQL, which has native support for arrays. The EF PostgreSQL provider already translates LINQ Contains to WHERE x = ANY (@arrayParam), where @arrayParam is a simply .NET array. This yields optimal performance with optimal simplicity.
SQL Server, which supports table-valued parameters or TVPs, which allows sending an array of arbitrary user-defined types. However, this requires pre-creation of the user-defined type (via CREATE TYPE), making it generally unsuitable for EF, and does not perform well (see Temporary_table_with_inner_join in the benchmark below).

However, all modern databases support parsing a JSON string representing an array and evaluating it as a relational resultset (e.g. pseudo-table); this allows us to send a simple string parameter containing e.g. [10,20,30], and use a table-valued SQL function to extract the values out as a table with 3 rows containing those values. Once that’s done, regular SQL can be used to implements the Contains logic. The following uses an inner join to filter out only customers with IDs matching the incoming JSON array (this technique is Array_param_with_inner_join in the benchmarks):

DECLARE @ids NVARCHAR(MAX) = '[10,20,30]';

SELECT * FROM [Customers] AS [c]
JOIN OpenJson(@ids) WITH ([Value] integer '$') AS v ON v.Value = [c].[Id];

A subquery-based approach was also benchmarked (Array_param_with_exists_subquery):

DECLARE @ids NVARCHAR(MAX) = '[10,20,30]';

SELECT * FROM [Customers] AS [c]
WHERE EXISTS (
    SELECT 1
    FROM OpenJson(@ids) WITH ([Value] integer '$') AS [v]
    WHERE v.[Value] = [c].[Id]);

This technique has the following advantages compared to the expansion-based techniques discussed above:

Only one SQL is every needed, regardless of the values or the number. This is optimal both for the database server (single plan), and for EF (it can cache the SQL).
Since no parameter padding is needed, no extra parameter data ever needs to be sent.

Links to the JSON-unpacking functions in the different databases:

SQL Server has OPENJSON. This function is only available with compatibility level 130 or higher (Azure SQL Database and SQL Server 2016 and later; the previous version - SQL Server 2014 goes out of support in 2024). An alternative OPENXML also exists with better compatibility, but performs significantly worse.
SQLite has json_each.
MySQL and MariaDB have json_table.
PostgreSQL has jsonb_to_recordset, though it isn’t required thanks to the native support for arrays.
Oracle has JSON_TABLE.

Benchmark

Benchmark results

Method	Database	NumSearchValues	Found	PaddingCount	Mean	Error	StdDev	Median
In_with_constants	SQLServer	2	False	0	401.3 μs	7.60 μs	9.05 μs	401.5 μs
In_with_padded_parameters	SQLServer	2	False	0	656.1 μs	30.38 μs	89.59 μs	642.9 μs
In_with_padded_parameters	SQLServer	2	False	5	509.0 μs	15.92 μs	46.93 μs	525.0 μs
In_with_padded_parameters	SQLServer	2	False	10	803.3 μs	28.04 μs	82.23 μs	804.9 μs
In_with_padded_parameters	SQLServer	2	False	50	1,314.9 μs	50.84 μs	149.10 μs	1,321.6 μs
In_with_padded_parameters	SQLServer	2	False	100	681.3 μs	26.25 μs	77.41 μs	704.8 μs
In_with_padded_parameters	SQLServer	2	False	500	1,933.4 μs	38.61 μs	84.75 μs	1,951.3 μs
Array_param_with_exists_subquery	SQLServer	2	False	0	964.6 μs	32.73 μs	96.00 μs	961.2 μs
Array_param_with_inner_join	SQLServer	2	False	0	964.8 μs	28.93 μs	85.30 μs	972.4 μs
OpenXml_with_inner_join	SQLServer	2	False	0	1,598.6 μs	36.22 μs	105.67 μs	1,597.8 μs
Temporary_table_with_inner_join	SQLServer	2	False	0	1,067.0 μs	35.11 μs	102.41 μs	1,071.4 μs
In_with_constants	SQLServer	2	True	0	603.1 μs	28.55 μs	84.18 μs	593.9 μs
In_with_padded_parameters	SQLServer	2	True	0	671.0 μs	30.51 μs	89.96 μs	666.1 μs
In_with_padded_parameters	SQLServer	2	True	5	520.4 μs	21.97 μs	64.77 μs	539.6 μs
In_with_padded_parameters	SQLServer	2	True	10	803.4 μs	29.80 μs	87.86 μs	808.9 μs
In_with_padded_parameters	SQLServer	2	True	50	1,260.5 μs	52.32 μs	154.26 μs	1,272.0 μs
In_with_padded_parameters	SQLServer	2	True	100	727.2 μs	19.48 μs	57.45 μs	742.5 μs
In_with_padded_parameters	SQLServer	2	True	500	1,943.0 μs	47.99 μs	141.51 μs	1,954.0 μs
Array_param_with_exists_subquery	SQLServer	2	True	0	958.9 μs	23.60 μs	69.57 μs	955.3 μs
Array_param_with_inner_join	SQLServer	2	True	0	963.7 μs	28.06 μs	82.28 μs	965.0 μs
OpenXml_with_inner_join	SQLServer	2	True	0	1,576.0 μs	49.12 μs	143.28 μs	1,566.8 μs
Temporary_table_with_inner_join	SQLServer	2	True	0	1,066.0 μs	40.26 μs	118.06 μs	1,060.8 μs
In_with_constants	SQLServer	10	False	0	426.8 μs	12.03 μs	35.48 μs	436.6 μs
In_with_padded_parameters	SQLServer	10	False	0	773.0 μs	28.31 μs	83.46 μs	775.3 μs
In_with_padded_parameters	SQLServer	10	False	5	844.6 μs	32.35 μs	95.38 μs	857.9 μs
In_with_padded_parameters	SQLServer	10	False	10	946.5 μs	45.03 μs	129.93 μs	953.0 μs
In_with_padded_parameters	SQLServer	10	False	50	1,338.0 μs	50.94 μs	150.20 μs	1,324.4 μs
In_with_padded_parameters	SQLServer	10	False	100	768.6 μs	25.58 μs	75.43 μs	789.9 μs
In_with_padded_parameters	SQLServer	10	False	500	1,963.7 μs	46.30 μs	136.51 μs	1,999.9 μs
Array_param_with_exists_subquery	SQLServer	10	False	0	977.4 μs	30.95 μs	90.78 μs	976.6 μs
Array_param_with_inner_join	SQLServer	10	False	0	969.2 μs	22.15 μs	64.97 μs	967.7 μs
OpenXml_with_inner_join	SQLServer	10	False	0	1,643.6 μs	39.89 μs	116.98 μs	1,641.3 μs
Temporary_table_with_inner_join	SQLServer	10	False	0	1,112.9 μs	40.38 μs	117.16 μs	1,095.6 μs
In_with_constants	SQLServer	10	True	0	668.2 μs	27.62 μs	80.99 μs	661.5 μs
In_with_padded_parameters	SQLServer	10	True	0	768.7 μs	24.35 μs	70.65 μs	769.3 μs
In_with_padded_parameters	SQLServer	10	True	5	858.4 μs	31.42 μs	92.64 μs	865.1 μs
In_with_padded_parameters	SQLServer	10	True	10	947.6 μs	29.00 μs	85.06 μs	938.9 μs
In_with_padded_parameters	SQLServer	10	True	50	1,320.1 μs	56.63 μs	166.99 μs	1,309.1 μs
In_with_padded_parameters	SQLServer	10	True	100	1,250.5 μs	57.92 μs	169.86 μs	1,271.9 μs
In_with_padded_parameters	SQLServer	10	True	500	2,692.6 μs	128.12 μs	377.78 μs	2,776.5 μs
Array_param_with_exists_subquery	SQLServer	10	True	0	1,026.5 μs	32.06 μs	94.54 μs	1,012.8 μs
Array_param_with_inner_join	SQLServer	10	True	0	996.8 μs	24.75 μs	72.20 μs	988.7 μs
OpenXml_with_inner_join	SQLServer	10	True	0	1,642.7 μs	40.46 μs	119.29 μs	1,642.5 μs
Temporary_table_with_inner_join	SQLServer	10	True	0	1,115.2 μs	38.13 μs	111.83 μs	1,110.0 μs
In_with_constants	SQLServer	100	False	0	524.9 μs	24.01 μs	70.42 μs	533.1 μs
In_with_padded_parameters	SQLServer	100	False	0	1,226.8 μs	47.63 μs	139.70 μs	1,236.9 μs
In_with_padded_parameters	SQLServer	100	False	5	1,245.0 μs	61.92 μs	182.56 μs	1,207.2 μs
In_with_padded_parameters	SQLServer	100	False	10	1,247.6 μs	53.19 μs	156.01 μs	1,248.4 μs
In_with_padded_parameters	SQLServer	100	False	50	1,279.4 μs	72.58 μs	214.01 μs	1,240.0 μs
In_with_padded_parameters	SQLServer	100	False	100	1,548.1 μs	94.59 μs	278.91 μs	1,593.0 μs
In_with_padded_parameters	SQLServer	100	False	500	2,832.0 μs	160.80 μs	474.12 μs	2,998.7 μs
Array_param_with_exists_subquery	SQLServer	100	False	0	1,088.3 μs	33.54 μs	97.83 μs	1,088.1 μs
Array_param_with_inner_join	SQLServer	100	False	0	1,057.4 μs	26.71 μs	78.76 μs	1,060.5 μs
OpenXml_with_inner_join	SQLServer	100	False	0	2,192.2 μs	60.30 μs	177.80 μs	2,186.1 μs
Temporary_table_with_inner_join	SQLServer	100	False	0	1,926.1 μs	44.66 μs	131.69 μs	1,919.2 μs
In_with_constants	SQLServer	100	True	0	1,001.9 μs	34.08 μs	99.95 μs	985.6 μs
In_with_padded_parameters	SQLServer	100	True	0	1,275.6 μs	62.02 μs	182.86 μs	1,249.4 μs
In_with_padded_parameters	SQLServer	100	True	5	1,252.4 μs	56.43 μs	166.39 μs	1,232.0 μs
In_with_padded_parameters	SQLServer	100	True	10	1,242.3 μs	45.00 μs	132.68 μs	1,230.9 μs
In_with_padded_parameters	SQLServer	100	True	50	1,341.4 μs	79.00 μs	231.69 μs	1,329.1 μs
In_with_padded_parameters	SQLServer	100	True	100	1,514.7 μs	91.66 μs	270.25 μs	1,525.2 μs
In_with_padded_parameters	SQLServer	100	True	500	2,793.8 μs	160.65 μs	473.68 μs	2,931.8 μs
Array_param_with_exists_subquery	SQLServer	100	True	0	1,100.8 μs	38.30 μs	112.93 μs	1,091.5 μs
Array_param_with_inner_join	SQLServer	100	True	0	1,093.7 μs	32.36 μs	94.90 μs	1,073.9 μs
OpenXml_with_inner_join	SQLServer	100	True	0	2,183.7 μs	55.15 μs	159.12 μs	2,175.9 μs
Temporary_table_with_inner_join	SQLServer	100	True	0	1,970.0 μs	38.97 μs	107.98 μs	1,963.2 μs
In_with_constants	SQLServer	1000	False	0	1,184.9 μs	38.29 μs	112.90 μs	1,182.0 μs
In_with_padded_parameters	SQLServer	1000	False	0	4,039.2 μs	242.04 μs	713.66 μs	4,242.7 μs
In_with_padded_parameters	SQLServer	1000	False	5	3,866.4 μs	270.36 μs	797.17 μs	4,164.5 μs
In_with_padded_parameters	SQLServer	1000	False	10	3,941.7 μs	237.62 μs	700.64 μs	4,137.6 μs
In_with_padded_parameters	SQLServer	1000	False	50	4,207.9 μs	200.56 μs	591.37 μs	4,433.0 μs
In_with_padded_parameters	SQLServer	1000	False	100	4,226.5 μs	178.61 μs	526.65 μs	4,351.0 μs
In_with_padded_parameters	SQLServer	1000	False	500	4,970.2 μs	319.84 μs	943.04 μs	5,245.2 μs
Array_param_with_exists_subquery	SQLServer	1000	False	0	1,848.8 μs	47.27 μs	139.37 μs	1,846.0 μs
Array_param_with_inner_join	SQLServer	1000	False	0	1,848.6 μs	51.46 μs	151.72 μs	1,834.2 μs
OpenXml_with_inner_join	SQLServer	1000	False	0	11,844.7 μs	235.72 μs	406.61 μs	11,815.6 μs
Temporary_table_with_inner_join	SQLServer	1000	False	0	3,937.7 μs	92.23 μs	271.94 μs	3,926.6 μs
In_with_constants	SQLServer	1000	True	0	1,354.4 μs	34.48 μs	98.39 μs	1,358.8 μs
In_with_padded_parameters	SQLServer	1000	True	0	4,012.3 μs	231.55 μs	682.72 μs	4,225.7 μs
In_with_padded_parameters	SQLServer	1000	True	5	3,975.6 μs	259.31 μs	764.59 μs	4,135.4 μs
In_with_padded_parameters	SQLServer	1000	True	10	3,980.3 μs	221.32 μs	652.58 μs	4,172.7 μs
In_with_padded_parameters	SQLServer	1000	True	50	4,390.7 μs	257.50 μs	759.25 μs	4,638.6 μs
In_with_padded_parameters	SQLServer	1000	True	100	4,275.2 μs	216.18 μs	637.40 μs	4,378.2 μs
In_with_padded_parameters	SQLServer	1000	True	500	5,316.2 μs	362.07 μs	1,067.56 μs	5,700.8 μs
Array_param_with_exists_subquery	SQLServer	1000	True	0	2,053.9 μs	64.06 μs	188.88 μs	2,057.4 μs
Array_param_with_inner_join	SQLServer	1000	True	0	2,089.3 μs	68.79 μs	202.82 μs	2,084.0 μs
OpenXml_with_inner_join	SQLServer	1000	True	0	12,200.8 μs	231.06 μs	204.83 μs	12,152.9 μs
Temporary_table_with_inner_join	SQLServer	1000	True	0	4,130.5 μs	82.56 μs	232.86 μs	4,106.4 μs

Benchmark sources

using System;
using System.Data;
using System.Data.Common;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Dapper;
using Microsoft.Data.SqlClient;
using Microsoft.Data.SqlClient.Server;
using Npgsql;

BenchmarkRunner.Run<Benchmark>();

// Uncomment the following for quick testing that all tests technically work
// [SimpleJob(warmupCount: 1, targetCount: 1, invocationCount: 1)]
public class Benchmark
{
    // [Params(Database.SQLServer, Database.PostgreSQL)]
    [Params(Database.SQLServer)]
    public Database Database { get; set; }

    [Params(2, 10, 100, 1000)]
    public int NumSearchValues { get; set; }

    [Params(true, false)]
    public bool Found { get; set; }

    private int[] _searchValues = null!;

    private const int TotalRows = 1000;

    [Params(0, 5, 10, 50, 100, 500)]
    public int PaddingCount { get; set; }

    private DbConnection _connection = null!;
    private DbCommand _command = null!;

    public void CommonSetup()
    {
        _connection = Database switch
        {
            Database.SQLServer => new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false"),
            Database.PostgreSQL => new NpgsqlConnection("Host=localhost;Username=test;Password=test;SSL Mode=disable"),
            _ => throw new UnreachableException()
        };
        _connection.Open();
        _command = _connection.CreateCommand();

        _connection.Execute("DROP TABLE IF EXISTS data");
        _connection.Execute("CREATE TABLE data (id INTEGER, int INTEGER)");
        _connection.Execute("CREATE INDEX IX_data ON data(int)");

        var builder = new StringBuilder("INSERT INTO data (id, int) VALUES");
        for (var i = 0; i < TotalRows; i++)
        {
            if (i > 0)
                builder.Append(",");
            builder.Append($" ({i}, {i + 100})");
        }

        _command.CommandText = builder.ToString();
        Console.WriteLine("Seed SQL: " + builder);
        _connection.Execute(builder.ToString());

        _searchValues = new int[NumSearchValues];
        for (var i = 0; i < _searchValues.Length; i++)
        {
            _searchValues[i] = Random.Shared.Next(TotalRows) + (Found ? 0 : 1000);
        }
    }

    [GlobalSetup(Target = nameof(In_with_constants))]
    public void In_with_constants_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();

        CommonSetup();

        var builder = new StringBuilder("SELECT id FROM data WHERE int IN (");
        for (var i = 0; i < _searchValues.Length; i++)
        {
            if (i > 0)
                builder.Append(",");
            builder.Append(_searchValues[i]);
        }

        builder.Append(")");
        _command.CommandText = builder.ToString();
        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(PG_any_with_parameter))]
    public void PG_any_with_parameter_setup()
    {
        if (Database != Database.PostgreSQL || PaddingCount != 0)
            throw new NotSupportedException();

        CommonSetup();

        _command.CommandText = "SELECT id FROM data WHERE int = ANY($1)";
        var p = _command.CreateParameter();
        p.Value = _searchValues;
        _command.Parameters.Add(p);

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }


    [GlobalSetup(Target = nameof(In_with_padded_parameters))]
    public void In_with_padded_parameters_setup()
    {
        CommonSetup();

        var builder = new StringBuilder("SELECT id FROM data WHERE int IN (");
        for (var i = 0; i < _searchValues.Length + PaddingCount; i++)
        {
            if (i > 0)
                builder.Append(",");

            var searchValue = i < _searchValues.Length ? _searchValues[i] : _searchValues[^1];

            switch (Database)
            {
                case Database.SQLServer:
                    var parameterName = $"@p{i + 1}";
                    builder.Append(parameterName);
                    _command.Parameters.Add(
                        new SqlParameter { ParameterName = parameterName, Value = searchValue });
                    break;
                case Database.PostgreSQL:
                    builder.Append($"${i + 1}");
                    _command.Parameters.Add(new NpgsqlParameter { Value = searchValue });
                    break;
                default:
                    throw new ArgumentOutOfRangeException();
            }
        }

        if (Database == Database.SQLServer)
        {
            ((SqlCommand)_command).EnableOptimizedParameterBinding = true;
        }

        builder.Append(")");
        _command.CommandText = builder.ToString();
        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Array_param_with_exists_subquery))]
    public void Array_param_with_exists_subquery_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        switch (Database)
        {
            case Database.SQLServer:
                _command.CommandText =
"""
SELECT id FROM data AS d WHERE EXISTS (
    SELECT 1
    FROM OpenJson(@p) WITH ([Value] integer '$') AS v
    WHERE v.Value = d.int)
""";
                _command.Parameters.Add(new SqlParameter("p", JsonSerializer.Serialize(_searchValues)));
                break;
            case Database.PostgreSQL:
                _command.CommandText =
"""
SELECT id FROM data AS d WHERE EXISTS (
    SELECT 1
    FROM unnest($1) AS v
    WHERE v.v = d.int)
""";
                _command.Parameters.Add(new NpgsqlParameter { Value = _searchValues });
                break;
        }

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Array_param_with_inner_join))]
    public void Array_param_with_inner_join_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        switch (Database)
        {
            case Database.SQLServer:
                _command.CommandText =
"""
SELECT id FROM data AS d
JOIN OpenJson(@p) WITH ([Value] integer '$') AS v ON v.Value = d.int;
""";
                _command.Parameters.Add(new SqlParameter("p", JsonSerializer.Serialize(_searchValues)));
                break;
            case Database.PostgreSQL:
                _command.CommandText =
"""
SELECT id FROM data AS d
JOIN unnest($1) AS v ON v.v = d.int;
""";
                _command.Parameters.Add(new NpgsqlParameter { Value = _searchValues });
                break;
        }

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(OpenXml_with_inner_join))]
    public void OpenXml_with_inner_join_setup()
    {
        if (Database != Database.SQLServer || PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        _command.CommandText =
"""
DECLARE @idoc INT;
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc;
SELECT id FROM data AS d
JOIN OpenXML(@idoc, '/Root/Value', 1) WITH ([value] integer) AS v ON v.Value = d.int;
EXEC sp_xml_removedocument @idoc;
""";
        var builder = new StringBuilder("<Root>");
        for (var i = 0; i < _searchValues.Length; i++)
            builder.Append($@"<Value value=""{_searchValues[i]}""/>");
        builder.Append("</Root>");

        _command.Parameters.Add(new SqlParameter("doc", builder.ToString()));

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Temporary_table_with_inner_join))]
    public void Temporary_table_with_inner_join_setup()
    {
        if (Database != Database.SQLServer || PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        // Note: this creates the TVP's type once at setup. This is because it's not possible to batch the creation
        // of the type and its use in the same SqlCommand, so that would require multiple roundtrips (better batching
        // via TDS RpcMessage would like make this work).
        // So for now, we're creating it just once, although that isn't usable by EF.
        _command.CommandText =
"""
DROP TYPE IF EXISTS int_wrapper;
CREATE TYPE int_wrapper AS TABLE (int INT);
""";
        _command.ExecuteNonQuery();

        // Note that we also create the SqlDataRecords only once at setup, which is also unrealistic and makes TVP
        // look much better than it is.
        var records = _searchValues.Select(i =>
        {
            var record = new SqlDataRecord(new SqlMetaData("int", SqlDbType.Int));
            record.SetInt32(0, i);
            return record;
        }).ToArray();

        _command.CommandText = "SELECT id FROM data AS d JOIN @ints AS ints ON ints.int = d.int";
        _command.Parameters.Add(new SqlParameter("ints", SqlDbType.Structured)
        {
            TypeName = "int_wrapper",
            Value = records
        });

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [Benchmark]
    public void In_with_constants() => _command.ExecuteNonQuery();

    [Benchmark]
    public void PG_any_with_parameter() => _command.ExecuteNonQuery();

    [Benchmark]
    public void In_with_padded_parameters() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Array_param_with_exists_subquery() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Array_param_with_inner_join() => _command.ExecuteNonQuery();

    [Benchmark]
    public void OpenXml_with_inner_join() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Temporary_table_with_inner_join() => _command.ExecuteNonQuery();

    [GlobalCleanup]
    public void Cleanup()
    {
        _command.Dispose();
        _connection.Dispose();
    }
}

public enum Database
{
    SQLServer,
    PostgreSQL
}

Observations:

In_with_constants (the current EF technique) is included only for completeness. Although it is always the fastest technique, this does not represent actual performance in a real application; since the benchmark repeatedly executes the same query, the performance issues related to constantly changing SQL don’t manifest.
Table-valued parameters perform surprisingly badly; this benchmarks creates the TVP type once at setup, and even caches the SqlDataRecord and SqlMetaData instances.
There are some anomalous results for increasing parameter padding values
For smaller value counts, In_with_padded_parameters is faster than the OPENJSON-based techniques (Array_param_with_inner_join, Array_param_with_exists_subquery). However, In_with_padded_parameters time grows much faster as the number of values increases, making OPENJSON better in the general case, with arbitrary value counts.

Detailed benchmarking of IN with parameterized vs. OPENJSON

The following benchmark show IN with parameters vs. OPENJSON with JOIN. No parameter padding is done for IN, so this optimistically assumes value counts are always at the padding boundaries. This means that results below show IN as better than it actually is, since in the real world padding will occur. In addition, this doesn’t take into account EF SQL caching, which is possible with OPENJSON but not possible with IN; this again means IN looks better than it really is.

Method	Database	NumSearchValues	Found	Mean	Error	StdDev
In_with_padded_parameters	SQLServer	6	False	515.9 us	20.68 us	60.99 us
Array_param_with_inner_join	SQLServer	6	False	993.2 us	27.36 us	80.25 us
In_with_padded_parameters	SQLServer	6	True	717.3 us	27.04 us	79.31 us
Array_param_with_inner_join	SQLServer	6	True	1,006.2 us	27.56 us	80.82 us
In_with_padded_parameters	SQLServer	7	False	509.3 us	22.65 us	66.79 us
Array_param_with_inner_join	SQLServer	7	False	984.1 us	23.06 us	67.64 us
In_with_padded_parameters	SQLServer	7	True	732.6 us	29.34 us	86.51 us
Array_param_with_inner_join	SQLServer	7	True	1,003.6 us	35.69 us	105.25 us
In_with_padded_parameters	SQLServer	8	False	504.1 us	27.04 us	79.73 us
Array_param_with_inner_join	SQLServer	8	False	996.2 us	34.32 us	101.18 us
In_with_padded_parameters	SQLServer	8	True	745.8 us	31.89 us	94.02 us
Array_param_with_inner_join	SQLServer	8	True	995.3 us	30.75 us	89.22 us
In_with_padded_parameters	SQLServer	9	False	740.3 us	27.35 us	79.77 us
Array_param_with_inner_join	SQLServer	9	False	1,000.6 us	33.03 us	96.88 us
In_with_padded_parameters	SQLServer	9	True	743.6 us	24.49 us	71.82 us
Array_param_with_inner_join	SQLServer	9	True	990.7 us	31.31 us	91.81 us
In_with_padded_parameters	SQLServer	10	False	766.2 us	30.93 us	91.18 us
Array_param_with_inner_join	SQLServer	10	False	997.7 us	28.99 us	85.03 us
In_with_padded_parameters	SQLServer	10	True	781.9 us	26.71 us	76.63 us
Array_param_with_inner_join	SQLServer	10	True	1,006.9 us	29.86 us	87.57 us
In_with_padded_parameters	SQLServer	11	False	789.3 μs	28.94 μs	83.95 μs
Array_param_with_inner_join	SQLServer	11	False	998.1 μs	35.75 μs	104.28 μs
In_with_padded_parameters	SQLServer	11	True	795.3 μs	29.90 μs	88.15 μs
Array_param_with_inner_join	SQLServer	11	True	991.0 μs	26.36 μs	76.05 μs
In_with_padded_parameters	SQLServer	12	False	787.7 μs	29.89 μs	87.68 μs
Array_param_with_inner_join	SQLServer	12	False	975.9 μs	30.89 μs	90.61 μs
In_with_padded_parameters	SQLServer	12	True	800.7 μs	28.99 μs	85.48 μs
Array_param_with_inner_join	SQLServer	12	True	985.3 μs	25.13 μs	73.69 μs
In_with_padded_parameters	SQLServer	13	False	815.3 μs	28.79 μs	83.97 μs
Array_param_with_inner_join	SQLServer	13	False	995.7 μs	36.50 μs	107.63 μs
In_with_padded_parameters	SQLServer	13	True	820.5 μs	31.98 μs	94.29 μs
Array_param_with_inner_join	SQLServer	13	True	1,006.5 μs	27.29 μs	80.46 μs
In_with_padded_parameters	SQLServer	14	False	865.0 μs	33.55 μs	98.94 μs
Array_param_with_inner_join	SQLServer	14	False	1,021.3 μs	33.39 μs	98.45 μs
In_with_padded_parameters	SQLServer	14	True	849.6 μs	31.38 μs	92.52 μs
Array_param_with_inner_join	SQLServer	14	True	1,013.0 μs	34.27 μs	100.51 μs
In_with_padded_parameters	SQLServer	15	False	824.8 μs	33.58 μs	99.01 μs
Array_param_with_inner_join	SQLServer	15	False	1,027.6 μs	27.90 μs	80.50 μs
In_with_padded_parameters	SQLServer	15	True	841.5 μs	29.20 μs	86.10 μs
Array_param_with_inner_join	SQLServer	15	True	1,011.8 μs	35.77 μs	105.46 μs
In_with_padded_parameters	SQLServer	16	False	911.6 μs	31.84 μs	93.88 μs
Array_param_with_inner_join	SQLServer	16	False	998.5 μs	31.49 μs	91.86 μs
In_with_padded_parameters	SQLServer	16	True	912.9 μs	31.90 μs	94.06 μs
Array_param_with_inner_join	SQLServer	16	True	1,004.1 μs	34.38 μs	100.82 μs
In_with_padded_parameters	SQLServer	17	False	903.2 μs	27.24 μs	80.31 μs
Array_param_with_inner_join	SQLServer	17	False	987.0 μs	25.22 μs	74.36 μs
In_with_padded_parameters	SQLServer	17	True	914.9 μs	29.68 μs	87.05 μs
Array_param_with_inner_join	SQLServer	17	True	989.6 μs	28.45 μs	82.09 μs
In_with_padded_parameters	SQLServer	18	False	936.2 μs	33.73 μs	98.91 μs
Array_param_with_inner_join	SQLServer	18	False	984.6 μs	24.66 μs	71.93 μs
In_with_padded_parameters	SQLServer	18	True	904.2 μs	31.40 μs	92.09 μs
Array_param_with_inner_join	SQLServer	18	True	983.9 μs	28.37 μs	81.41 μs
In_with_padded_parameters	SQLServer	19	False	937.8 μs	33.65 μs	99.20 μs
Array_param_with_inner_join	SQLServer	19	False	976.7 μs	30.03 μs	88.06 μs
In_with_padded_parameters	SQLServer	19	True	954.2 μs	38.87 μs	114.62 μs
Array_param_with_inner_join	SQLServer	19	True	1,022.6 μs	30.70 μs	89.55 μs
In_with_padded_parameters	SQLServer	20	False	947.5 μs	35.27 μs	103.43 μs
Array_param_with_inner_join	SQLServer	20	False	1,006.5 μs	23.44 μs	68.74 μs
In_with_padded_parameters	SQLServer	20	True	932.7 μs	40.55 μs	119.57 μs
Array_param_with_inner_join	SQLServer	20	True	1,018.0 μs	36.61 μs	107.94 μs
In_with_padded_parameters	SQLServer	21	False	943.2 μs	33.65 μs	99.21 μs
Array_param_with_inner_join	SQLServer	21	False	998.6 μs	24.13 μs	71.16 μs
In_with_padded_parameters	SQLServer	21	True	982.7 μs	40.14 μs	118.37 μs
Array_param_with_inner_join	SQLServer	21	True	1,025.1 μs	26.94 μs	78.17 μs
In_with_padded_parameters	SQLServer	22	False	1,007.9 μs	29.37 μs	86.13 μs
Array_param_with_inner_join	SQLServer	22	False	1,062.7 μs	33.69 μs	99.34 μs
In_with_padded_parameters	SQLServer	22	True	1,023.2 μs	41.94 μs	123.67 μs
Array_param_with_inner_join	SQLServer	22	True	1,062.4 μs	33.19 μs	97.34 μs
In_with_padded_parameters	SQLServer	23	False	969.7 μs	36.09 μs	106.42 μs
Array_param_with_inner_join	SQLServer	23	False	1,029.8 μs	33.79 μs	99.64 μs
In_with_padded_parameters	SQLServer	23	True	999.0 μs	33.08 μs	97.54 μs
Array_param_with_inner_join	SQLServer	23	True	1,037.8 μs	33.28 μs	97.59 μs
In_with_padded_parameters	SQLServer	24	False	976.2 μs	47.49 μs	138.52 μs
Array_param_with_inner_join	SQLServer	24	False	1,006.7 μs	25.46 μs	74.67 μs
In_with_padded_parameters	SQLServer	24	True	998.9 μs	44.81 μs	131.43 μs
Array_param_with_inner_join	SQLServer	24	True	1,015.5 μs	28.66 μs	83.61 μs
In_with_padded_parameters	SQLServer	25	False	1,010.6 μs	44.62 μs	130.86 μs
Array_param_with_inner_join	SQLServer	25	False	1,043.4 μs	35.54 μs	104.80 μs
In_with_padded_parameters	SQLServer	25	True	1,047.1 μs	44.93 μs	132.47 μs
Array_param_with_inner_join	SQLServer	25	True	1,013.7 μs	37.25 μs	109.83 μs
In_with_padded_parameters	SQLServer	26	False	998.2 μs	45.84 μs	135.15 μs
Array_param_with_inner_join	SQLServer	26	False	1,007.1 μs	28.03 μs	82.66 μs
In_with_padded_parameters	SQLServer	26	True	1,039.2 μs	33.40 μs	98.47 μs
Array_param_with_inner_join	SQLServer	26	True	1,016.4 μs	33.10 μs	97.07 μs
In_with_padded_parameters	SQLServer	27	False	1,028.9 μs	33.98 μs	100.20 μs
Array_param_with_inner_join	SQLServer	27	False	1,017.6 μs	26.44 μs	77.14 μs
In_with_padded_parameters	SQLServer	27	True	1,023.6 μs	33.19 μs	97.85 μs
Array_param_with_inner_join	SQLServer	27	True	1,023.4 μs	35.33 μs	103.62 μs
In_with_padded_parameters	SQLServer	28	False	986.5 μs	40.78 μs	120.23 μs
Array_param_with_inner_join	SQLServer	28	False	1,012.8 μs	25.03 μs	73.42 μs
In_with_padded_parameters	SQLServer	28	True	1,053.5 μs	31.16 μs	91.88 μs
Array_param_with_inner_join	SQLServer	28	True	1,034.4 μs	34.10 μs	100.53 μs
In_with_padded_parameters	SQLServer	29	False	1,034.8 μs	33.66 μs	99.26 μs
Array_param_with_inner_join	SQLServer	29	False	1,012.9 μs	33.75 μs	98.98 μs
In_with_padded_parameters	SQLServer	29	True	1,069.7 μs	32.05 μs	94.00 μs
Array_param_with_inner_join	SQLServer	29	True	1,045.3 μs	21.60 μs	62.66 μs
In_with_padded_parameters	SQLServer	30	False	1,043.5 μs	40.67 μs	119.27 μs
Array_param_with_inner_join	SQLServer	30	False	1,033.1 μs	31.54 μs	92.49 μs
In_with_padded_parameters	SQLServer	30	True	1,064.4 μs	31.34 μs	92.41 μs
Array_param_with_inner_join	SQLServer	30	True	1,052.1 μs	21.56 μs	63.56 μs
In_with_padded_parameters	SQLServer	31	False	1,071.2 us	34.23 us	100.92 us
Array_param_with_inner_join	SQLServer	31	False	1,026.3 us	26.02 us	76.30 us
In_with_padded_parameters	SQLServer	31	True	1,096.5 us	50.68 us	149.42 us
Array_param_with_inner_join	SQLServer	31	True	1,043.7 us	27.62 us	81.45 us
In_with_padded_parameters	SQLServer	32	False	1,091.2 us	46.56 us	137.29 us
Array_param_with_inner_join	SQLServer	32	False	1,024.3 us	30.00 us	88.47 us
In_with_padded_parameters	SQLServer	32	True	1,120.6 us	44.00 us	129.72 us
Array_param_with_inner_join	SQLServer	32	True	1,051.4 us	34.78 us	102.56 us
In_with_padded_parameters	SQLServer	33	False	1,063.9 us	35.94 us	104.85 us
Array_param_with_inner_join	SQLServer	33	False	994.8 us	35.37 us	103.75 us
In_with_padded_parameters	SQLServer	33	True	1,069.1 us	41.82 us	123.30 us
Array_param_with_inner_join	SQLServer	33	True	1,063.6 us	33.36 us	98.37 us
In_with_padded_parameters	SQLServer	34	False	1,090.3 us	43.93 us	129.53 us
Array_param_with_inner_join	SQLServer	34	False	1,015.3 us	22.14 us	64.24 us
In_with_padded_parameters	SQLServer	34	True	1,096.9 us	36.54 us	106.00 us
Array_param_with_inner_join	SQLServer	34	True	1,016.5 us	35.40 us	104.38 us
In_with_padded_parameters	SQLServer	35	False	1,092.8 us	44.31 us	129.96 us
Array_param_with_inner_join	SQLServer	35	False	1,024.5 us	29.75 us	87.73 us
In_with_padded_parameters	SQLServer	35	True	1,128.4 us	50.91 us	149.31 us
Array_param_with_inner_join	SQLServer	35	True	1,034.2 us	29.22 us	84.77 us

Summary

Unfortunately, there isn’t one technique which performs best in all scenarios; for smaller value counts (< 15), padded parameters performs better, whereas with larger ones OPENJSON performs better.
Ideally, a hybrid approach would be implemented, using parameter padding in lower counts and automatically switching to OPENJSON as counts increase. However, EF’s query and caching architecture makes this non-trivial.
Since OPENJSON scales better in general scenario (arbitrary value numbers), and since we want to unlock arbitrary querying of OPENJSON-based array parameters (#30426), we think it makes sense to start with the OPENJSON-based implementation. Then, we can look at adding parameterized IN with padding as an optimization for lower value counts.
Note that OPENJSON can’t always be used (older SQL Server versions, spatial types are incompatible with it). For those cases we’ll currently revert back to the current behavior (IN with constants). We may improve this in the future to IN with parameters.

+22

roji on Nov 4, 2023

I wrote an extension to EF6 to handle this exact problem for our product and apparently it is extremely similar to what Nick is proposing above. By replacing this where clause: .Where(entity => myEnumerable.Contains(entity.prop)) with this extension method: .BatchedWhereKeyInValues(entity => entity.prop, myEnumerables, batchSize: 100), my solution will break the list of values into batches, and then produce a properly parameterized query that generates the desired SQL.

For example, this expression: query.BatchedWhereKeyInValues(q => q.Id, values: {1,2,5,7,8,9}, batchSize: 3)

would effectively become the following two queryables, where the numbers are supplied as variables that allow EF to parameterize the query rather than embed them as constants: query.Where(q => q.Id == 1 || q.Id == 2 || q.Id == 5) query.Where(q => q.Id == 7 || q.Id == 8 || q.Id == 9)

EF then converts those LINQ expressions into this SQL expression: WHERE Id IN (@p__linq__0, @p__linq__1, @p__linq__2)

Check out the source here: https://gist.github.com/kroymann/e57b3b4f30e6056a3465dbf118e5f13d

+18

kroymann on Oct 13, 2018

Here’s an alternative solution I discovered thanks to a semi-related SO post by @davidbaxterbrowne:

var records = await _context.Items
	.AsNoTracking()
	.FromSql(
		"SELECT * FROM dbo.Items WHERE Id IN (SELECT value FROM OPENJSON({0}))",
		JsonConvert.SerializeObject(itemIds.ToList()))
	.ToListAsync();

It’s specific to SQL Server 2017+ (or Azure SQL Server), and in my testing with a list of 50,000 IDs against a remote Azure SQL database it was 5x faster than a standard batching solution (dividing into lists of 1,000 IDs).

+16

IGx89 on May 10, 2019

@kroymann Works like a charm! Big Thanks!

public static class QueryableExtension<TQuery>
    {
        internal static IEnumerable<IQueryable<TQuery>> WhereIn<TKey>(IQueryable<TQuery> queryable,
            Expression<Func<TQuery, TKey>> keySelector, IEnumerable<TKey> values, int batchSize)
        {
            List<TKey> distinctValues = values.Distinct().ToList();
            int lastBatchSize = distinctValues.Count % batchSize;
            if (lastBatchSize != 0)
            {
                distinctValues.AddRange(Enumerable.Repeat(distinctValues.Last(), batchSize - lastBatchSize));
            }

            int count = distinctValues.Count;
            for (int i = 0; i < count; i += batchSize)
            {
                var body = distinctValues
                    .Skip(i)
                    .Take(batchSize)
                    .Select(v =>
                    {
                        // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                        Expression<Func<TKey>> valueAsExpression = () => v;
                        return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                    })
                    .Aggregate((a, b) => Expression.OrElse(a, b));
                if (body == null)
                {
                    yield break;
                }

                var whereClause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);
                yield return queryable.Where(whereClause);
            }
        }

// doesn't use batching
        internal static IQueryable<TQuery> WhereIn<TKey>(IQueryable<TQuery> queryable,
            Expression<Func<TQuery, TKey>> keySelector, IEnumerable<TKey> values)
        {
            TKey[] distinctValues = values.Distinct().ToArray();


            int count = distinctValues.Length;
            for (int i = 0; i < count; ++i)
            {
                var body = distinctValues
                    .Select(v =>
                    {
                        // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                        Expression<Func<TKey>> valueAsExpression = () => v;
                        return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                    })
                    .Aggregate((a, b) => Expression.OrElse(a, b));

                var whereClause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);
                return queryable.Where(whereClause);
            }

            return Enumerable.Empty<TQuery>().AsQueryable();
        }
    }

Usage:

int[] a = Enumerable.Range(1, 10).ToArray();
var queries = QueryableExtension<User>.WhereIn(dbContext.Users, t => t.Id, a, 5);
foreach (var queryable in queries)
{
    _ = queryable.ToArray();
}

var everything = QueryableExtension<User>.WhereIn(dbContext.Users, t => t.Id, a);
_ = everything.ToArray();

+10

StasPerekrestov on Oct 13, 2018

@yv989c great to hear! In my benchmark above, XML performed worse than than the current EF technique (expansion to constants), but I may have not optimized that technique as much as I should have. In any case, I think that at the point where we are, simply supporting JSON and allowing an opt-out back to the current technique for old SQL Server versions is good enough.

I’m currently actively working on this, as part of a much larger change that should allow fully queryable parameters, constants and even columns across different providers. Stay tuned.

roji on Mar 28, 2023

I have updated one of @divega 's proposals above for 3.1: https://erikej.github.io/efcore/sqlserver/2020/03/30/ef-core-cache-pollution.html

ErikEJ on May 12, 2020

It is good to see that “advanced” solutions have been proposed here. Is it feasible to create a rather simple but quick fix first? Simply making the list a bunch of parameters (e.g. @p0, @p1, ...) would be much better than the literals. That way there’s one plan for each list count in contrast to one plan for each set of constants.

GSPP on Jun 23, 2020

Also such queries don’t aggregate well in logging systems like Application Insights, cause every query has unique sql.

SonicGD on Nov 2, 2018

roji on Apr 26, 2023

@mrlife unfortunately not - this is quite a non-trivial optimization, and we’re now in a feature freeze, stabilizing the release. I’m very hopeful we’ll be able to tackle this for 7.0.

roji on Sep 8, 2021

Been investigating solutions in the meantime and found that the string_split function idea from above can be separated out into its own queryable for flexible reuse. For example…

Add keyless entity to your model:

public class StringSplitResult
{
    public string Value { get; set; }
}

Some methods like this on your context:

public IQueryable<string> AsQuery(IEnumerable<string> values) =>
    SplitString(string.Join(",", values));

public IQueryable<string> SplitString(string values, string separator = ",") =>
    Set<StringSplitResult>()
        .FromSqlInterpolated($"SELECT Value FROM string_split({values}, {separator})")
        .Select(x => x.Value);

You could do something like this:

var codes = new[] {"CODE1", "CODE2"};
var item = context.Set<Entity>()
    .Where(entity => context.AsQuery(codes).Contains(entity.Code))
    .FirstOrDefault();

Which produces SQL:

Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (21ms) [Parameters=[p0='CODE1,CODE2' (Size = 4000), p1=',' (Size = 4000)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [e].[Id], [e].[Code]
FROM [Entities] AS [e]
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT Value FROM string_split(@p0, @p1)
    ) AS [s]
    WHERE ([s].[Value] = [e].[Code]) OR ([s].[Value] IS NULL AND [e].[Code] IS NULL))

stevendarby on Oct 24, 2020

@snappyfoo we’re still working out the mechanics in our planning process and the Github project board. The important thing is that this issue has the consider-for-next-release label, which means it’s definitely on the table for 6.0.

roji on Oct 16, 2020

I just wanted to share here some lexperiments I did some time ago with different alternative approaches to Enumerable.Contains using existing EF Core APIs:

https://github.com/divega/ContainsOptimization/

The goal was both to find possible workarounds, and to explore how we could deal with Contains in the future.

I timed the initial construction and first execution of the query with different approaches. This isn’t a representative benchmark because there is no data and it doesn’t measure the impact of caching, but I think it is still interesting to see the perf sensitivity to the number of parameters.

The code tests 4 approaches:

Test("standard Contains", context.People.Where(p => (ids.Contains(p.Id))));

Test("Parameter rewrite", context.People.In(ids, p => p.Id));

Test("Split function", 
    context.People.FromSql(
        $"select * from People p where p.Id in(select value from string_split({string.Join(",", ids)}, ','))"));

Test("table-valued parameter", 
    context.People.FromSql(
        $"select * from People p where p.Id in(select * from {context.CreateTableValuedParameter(ids, "@p0")})"));

Standard Contains

This is included as a baseline and is what happens today when you call Contains with a list: the elements of the list get in-lined as constants in the SQL as literals and the SQL cannot be reused.

Parameter rewrite

This approaches tries to rewrite the expression to mimic what the compiler would produce for a query like this:

var p1 = 1;
var p2 = 2; 
var p3 = 3;
context.People.Where(p => (new List<int> {p1, p2, p3}.Contains(p.Id))).ToList();

It also “bucketizes” the list, meaning that it only produces lists of specific lengths (powers of 2), repeating the last element as necessary, with the goal of favoring caching.

This approach only works within the limits of 2100 parameters in SQL Server, and sp_executesql takes a couple of parameters, so the size of the last possible bucket is 2098.

Overall this seems to be the most expensive approach using EF Core, at least on initial creation and execution.

Split function

This approach was mentioned by @NickCraver and @mgravell as something that Dapper can leverage. I implemented it using FromSql. It is interesting because it leads to just one (potentially very long) string parameter and it seems to perform very well (at least in the first query), but the STRING_SPLIT function is only available in SQL Server since version 2016.

Table-valued parameter

For this I took advantage of the ability to create table-valued parameters that contain an IEnumerable<DbDataRecord> (I used a list rather than a streaming IEnumerable, but not sure that matters in this case) and pass them as a DbParameter in FromSql. I also added a hack to define the table type for the parameter if it isn’t defined. Overall this seems to be the second most lightweight approach after the split function.

divega on Dec 16, 2018

@yv989c @pfritschi I’ve done some work in the recent days to improve the SQL quality around this, specifically #30976 which switches to using IN instead of EXISTS where possible, and #30983 which switches to OPENJSON with WITH where ordering preservation isn’t required (e.g. with IN/EXISTS). All this work should hopefully make it into preview6, and should make the EF-generated SQL as optimal as possible.

#30984 tracks synthesizing an ordering value for parameter collections where ordering is needed; I don’t intend to work on this at the moment.

roji on May 28, 2023

Hey @roji , maybe I can further elaborate on the issues described by @m-gasser:

Our initial implementation did not have the WITH clause and we had issues in production this way. So at least for us, it will not be good enough without the WITH clause, we tried.

Can you share more details about this? What specific problems did you have?

If you use OPENJSON without the data type definition for the values, SQL Server will throw the Warning Type conversation in expression (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[value],0)) may affect "CardinalityEstimate" in query plan choice. As the warning explains, having an incorrect cardinality estimation can lead to inefficient query plans, because this estimation is a key factor in determining which join and index operations are used. You will notice this issue if you work with bigger data sets or have many joins in your query.

simple_query_no_with

However, if you define the data type of your JSON values with WITH (...), SQL Server is able to estimate the cardinality correctly and you will get a more stable query execution plan.

We also size the json parameter to VARCHAR(100), VARCHAR(4000) or VARCHAR(max), depending on how many parameters are provided. I can not remember the details but I guess this allows SQL Server to keep different query plans optimized for varying number of parameters.

Here as well I’d be interesting in understanding this better - any concrete info would be very helpful.

When your JSON values are strings and you want to provide the data type (eg. OPENJSON(@display_names) WITH([value] VARCHAR(100) '$') AS [n] WHERE [n].[value] = [u].[DisplayName], you have to make sure, that the length of VARCHAR (in this example 100) is consistent for the same query. You should use the target column definition if possible, or if you cannot use the column definition you should try to use gradations.

query_plan_text

In the picture above I have executed the same queries with different parameters, one time with a consistent VARCHAR length and one time dynamically calculated depended on the longest string in the parameter. As you can see, if you have a different VARCHAR lengths, SQL Server will think that it is a different query and generate a new query plan. This causes a lot of overhead for each execution and also prevents SQL Server form using the Query Store and Plan Cache efficiently.

query_plan_generated

After further tests I also noticed that (as mentioned by @m-gasser), you get different query execution plans, if the json string is defined with different lengths. So EXEC sp_executesql '...', N'@json_string VARCHAR(4000)', @json_string='["a","b","c"]' will generate a different query plan then EXEC sp_executesql '...', N'@json_string VARCHAR(2000)', @json_string='["a","b","c"]', even if the query and data type definitions are exactly the same.

pfritschi on May 24, 2023

NOTE: See additional benchmarking below with SqlClient EnableOptimizedParameterBinding

Did a quick perf investigation of parameterized vs. non-parameterized WHERE x IN (...) with lots of values.

Key notes and takeaways:

This benchmark does not factor in the price of plan cache pollution, so it shows non-parameterized queries in a better light than the reality. In a real application with many query types, the non-parameterized IN queries would cause other queries to get evicted, affecting their perf in a potentially significant way. In this benchmark there’s just one SQL, so that obviously doesn’t show.
Parameterized queries are indeed slower on SQL Server. A somewhat significant gap starts to show even with 10 values (~230us or ~28%), with the gap seemingly widening with the number of values. The 1000 element case seems anomalous with a huge gap, but only in the “not found case”.
In PG they’re also a tiny bit slower, but far less than SQL Server. With 1000 IN values, the difference is 871us (8%) and 283us (2%) for the “found” and “not found” cases. This is almost sure to not be worth the price of losing command preparation (due to pollution), for the general case.
Note that PG already has an efficient translation for this LINQ construct, since it has 1st class array support.

To summarize, I hope we can end up with a solution that doesn’t use the SQL IN construct at all, thereby avoiding all of this (see the INNER JOIN-based solutions above).

Method	NumValues	Database	IsFound	IsParameterized	Mean	Error	StdDev	Median
Test	1	PostgreSQL	False	False	304.7 us	8.26 us	24.36 us	306.6 us
Test	1	PostgreSQL	False	True	304.9 us	8.67 us	25.56 us	311.0 us
Test	1	PostgreSQL	True	False	311.0 us	8.43 us	24.71 us	314.0 us
Test	1	PostgreSQL	True	True	314.7 us	9.22 us	27.20 us	317.2 us
Test	10	PostgreSQL	False	False	331.9 us	14.88 us	43.88 us	343.4 us
Test	10	PostgreSQL	False	True	344.7 us	11.92 us	35.16 us	346.0 us
Test	10	PostgreSQL	True	False	350.0 us	12.44 us	36.69 us	356.4 us
Test	10	PostgreSQL	True	True	357.2 us	11.32 us	33.37 us	359.6 us
Test	100	PostgreSQL	False	False	671.5 us	34.80 us	101.51 us	656.9 us
Test	100	PostgreSQL	False	True	724.5 us	48.19 us	142.10 us	716.4 us
Test	100	PostgreSQL	True	False	719.6 us	50.41 us	147.83 us	695.3 us
Test	100	PostgreSQL	True	True	730.6 us	50.48 us	148.86 us	715.9 us
Test	1000	PostgreSQL	False	False	9,738.7 us	53.92 us	45.02 us	9,746.8 us
Test	1000	PostgreSQL	False	True	10,021.4 us	199.92 us	273.65 us	9,981.0 us
Test	1000	PostgreSQL	True	False	9,299.2 us	182.40 us	355.76 us	9,247.7 us
Test	1000	PostgreSQL	True	True	10,170.5 us	201.69 us	207.12 us	10,190.8 us
Test	1	SQLServer	False	False	541.9 us	10.80 us	18.34 us	548.0 us
Test	1	SQLServer	False	True	546.4 us	10.88 us	11.64 us	546.3 us
Test	1	SQLServer	True	False	547.8 us	10.86 us	25.81 us	554.7 us
Test	1	SQLServer	True	True	552.8 us	10.89 us	16.63 us	554.0 us
Test	10	SQLServer	False	False	582.3 us	11.44 us	20.63 us	590.0 us
Test	10	SQLServer	False	True	808.9 us	23.28 us	68.65 us	825.0 us
Test	10	SQLServer	True	False	587.7 us	11.72 us	28.53 us	595.7 us
Test	10	SQLServer	True	True	817.7 us	26.45 us	77.99 us	845.9 us
Test	100	SQLServer	False	False	1,046.6 us	63.10 us	186.05 us	1,111.1 us
Test	100	SQLServer	False	True	1,853.9 us	76.86 us	226.62 us	1,878.4 us
Test	100	SQLServer	True	False	1,088.7 us	57.03 us	168.16 us	1,134.6 us
Test	100	SQLServer	True	True	1,900.9 us	99.60 us	293.69 us	1,949.8 us
Test	1000	SQLServer	False	False	2,522.5 us	67.90 us	199.15 us	2,524.7 us
Test	1000	SQLServer	False	True	20,636.4 us	329.94 us	275.52 us	20,653.9 us
Test	1000	SQLServer	True	False	20,642.5 us	410.18 us	770.42 us	20,602.6 us
Test	1000	SQLServer	True	True	20,818.2 us	412.19 us	677.24 us	20,777.1 us

Results as a text-only table

| Method | NumValues |   Database | IsFound | IsParameterized |        Mean |     Error |    StdDev |      Median |
|------- |---------- |----------- |-------- |---------------- |------------:|----------:|----------:|------------:|
|   Test |         1 | PostgreSQL |   False |           False |    304.7 us |   8.26 us |  24.36 us |    306.6 us |
|   Test |         1 | PostgreSQL |   False |            True |    304.9 us |   8.67 us |  25.56 us |    311.0 us |
|   Test |         1 | PostgreSQL |    True |           False |    311.0 us |   8.43 us |  24.71 us |    314.0 us |
|   Test |         1 | PostgreSQL |    True |            True |    314.7 us |   9.22 us |  27.20 us |    317.2 us |
|   Test |        10 | PostgreSQL |   False |           False |    331.9 us |  14.88 us |  43.88 us |    343.4 us |
|   Test |        10 | PostgreSQL |   False |            True |    344.7 us |  11.92 us |  35.16 us |    346.0 us |
|   Test |        10 | PostgreSQL |    True |           False |    350.0 us |  12.44 us |  36.69 us |    356.4 us |
|   Test |        10 | PostgreSQL |    True |            True |    357.2 us |  11.32 us |  33.37 us |    359.6 us |
|   Test |       100 | PostgreSQL |   False |           False |    671.5 us |  34.80 us | 101.51 us |    656.9 us |
|   Test |       100 | PostgreSQL |   False |            True |    724.5 us |  48.19 us | 142.10 us |    716.4 us |
|   Test |       100 | PostgreSQL |    True |           False |    719.6 us |  50.41 us | 147.83 us |    695.3 us |
|   Test |       100 | PostgreSQL |    True |            True |    730.6 us |  50.48 us | 148.86 us |    715.9 us |
|   Test |      1000 | PostgreSQL |   False |           False |  9,738.7 us |  53.92 us |  45.02 us |  9,746.8 us |
|   Test |      1000 | PostgreSQL |   False |            True | 10,021.4 us | 199.92 us | 273.65 us |  9,981.0 us |
|   Test |      1000 | PostgreSQL |    True |           False |  9,299.2 us | 182.40 us | 355.76 us |  9,247.7 us |
|   Test |      1000 | PostgreSQL |    True |            True | 10,170.5 us | 201.69 us | 207.12 us | 10,190.8 us |

|   Test |         1 |  SQLServer |   False |           False |    541.9 us |  10.80 us |  18.34 us |    548.0 us |
|   Test |         1 |  SQLServer |   False |            True |    546.4 us |  10.88 us |  11.64 us |    546.3 us |
|   Test |         1 |  SQLServer |    True |           False |    547.8 us |  10.86 us |  25.81 us |    554.7 us |
|   Test |         1 |  SQLServer |    True |            True |    552.8 us |  10.89 us |  16.63 us |    554.0 us |
|   Test |        10 |  SQLServer |   False |           False |    582.3 us |  11.44 us |  20.63 us |    590.0 us |
|   Test |        10 |  SQLServer |   False |            True |    808.9 us |  23.28 us |  68.65 us |    825.0 us |
|   Test |        10 |  SQLServer |    True |           False |    587.7 us |  11.72 us |  28.53 us |    595.7 us |
|   Test |        10 |  SQLServer |    True |            True |    817.7 us |  26.45 us |  77.99 us |    845.9 us |
|   Test |       100 |  SQLServer |   False |           False |  1,046.6 us |  63.10 us | 186.05 us |  1,111.1 us |
|   Test |       100 |  SQLServer |   False |            True |  1,853.9 us |  76.86 us | 226.62 us |  1,878.4 us |
|   Test |       100 |  SQLServer |    True |           False |  1,088.7 us |  57.03 us | 168.16 us |  1,134.6 us |
|   Test |       100 |  SQLServer |    True |            True |  1,900.9 us |  99.60 us | 293.69 us |  1,949.8 us |
|   Test |      1000 |  SQLServer |   False |           False |  2,522.5 us |  67.90 us | 199.15 us |  2,524.7 us |
|   Test |      1000 |  SQLServer |   False |            True | 20,636.4 us | 329.94 us | 275.52 us | 20,653.9 us |
|   Test |      1000 |  SQLServer |    True |           False | 20,642.5 us | 410.18 us | 770.42 us | 20,602.6 us |
|   Test |      1000 |  SQLServer |    True |            True | 20,818.2 us | 412.19 us | 677.24 us | 20,777.1 us |

Benchmark code

BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    private DbConnection _connection;
    private DbCommand _command;

    private const int Rows = 100000;

    [Params(1, 10, 100, 1000)]
    public int NumValues { get; set; }

    [Params("SQLServer", "PostgreSQL")]
    public string Database { get; set; }

    [Params(true, false)]
    public bool IsFound { get; set; }

    [Params(true, false)]
    public bool IsParameterized { get; set; }

    private Guid _existingGuid;

    [GlobalSetup]
    public Task Setup()
        => Database switch
        {
            "SQLServer" => SetupSqlServer(),
            "PostgreSQL" => SetupPostgreSQL(),
            _ => throw new NotSupportedException("Unknown database: " + Database)
        };

    private async Task SetupSqlServer()
    {
        var connection = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false");
        await connection.OpenAsync();
        var command = connection.CreateCommand();
        command.CommandText =
"""
DROP TABLE IF EXISTS data;
CREATE TABLE data (id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID(), some_int INT);
""";
        await command.ExecuteNonQueryAsync();

        foreach (var chunk in Enumerable.Range(0, Rows).Chunk(1000))
        {
            var builder = new StringBuilder("INSERT INTO data (some_int) VALUES ");
            for (var i = 0; i < chunk.Length; i++)
            {
                if (i > 0)
                    builder.Append(", ");
                builder.Append($"({chunk[i]})");
            }

            command.CommandText = builder.ToString();
            await command.ExecuteNonQueryAsync();
        }

        command.CommandText = "SELECT TOP(1) id FROM data";
        _existingGuid = (Guid)(await command.ExecuteScalarAsync())!;

        var builder2 = new StringBuilder("SELECT some_int FROM data WHERE id IN (");

        command.Parameters.Clear();

        if (IsParameterized)
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");

                builder2.Append("@p" + i);

                command.Parameters.AddWithValue("p" + i, i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid());
            }
        }
        else
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");
                builder2
                    .Append("'")
                    .Append(i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid())
                    .Append("'");
            }
        }

        builder2.Append(")");
        command.CommandText = builder2.ToString();

        _connection = connection;
        _command = command;
    }

    private async Task SetupPostgreSQL()
    {
        var connection = new NpgsqlConnection("Server=localhost;Username=test;Password=test");
        await connection.OpenAsync();
        var command = connection.CreateCommand();
        command.CommandText =
"""
DROP TABLE IF EXISTS data;
CREATE TABLE data (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), some_int INT);
""";
        await command.ExecuteNonQueryAsync();

        foreach (var chunk in Enumerable.Range(0, Rows).Chunk(1000))
        {
            var builder = new StringBuilder("INSERT INTO data (some_int) VALUES ");
            for (var i = 0; i < chunk.Length; i++)
            {
                if (i > 0)
                    builder.Append(", ");
                builder.Append($"({chunk[i]})");
            }

            command.CommandText = builder.ToString();
            await command.ExecuteNonQueryAsync();
        }

        command.CommandText = "SELECT id FROM data LIMIT 1";
        _existingGuid = (Guid)(await command.ExecuteScalarAsync())!;

        var builder2 = new StringBuilder("SELECT some_int FROM data WHERE id IN (");

        command.Parameters.Clear();

        if (IsParameterized)
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");

                builder2.Append("$" + (i + 1));

                command.Parameters.Add(new()
                {
                    Value = i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid()
                });
            }
        }
        else
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");
                builder2
                    .Append("'")
                    .Append(i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid())
                    .Append("'");
            }
        }

        builder2.Append(")");
        command.CommandText = builder2.ToString();

        _connection = connection;
        _command = command;
    }

    [Benchmark]
    public Task<object> Test()
        => _command.ExecuteScalarAsync();

    [GlobalCleanup]
    public ValueTask Cleanup()
        => _connection.DisposeAsync();
}

roji on Dec 15, 2022

I see @roji. Thanks for taking the time to explain. On my side I see how my current approach might lead to unexpected results when evaluating Skip/Take on the server-side.

Hey @roji, I have fixed this in the latest release of QueryableValues. It now provides ordering guarantees, among other fixes and enhancements. Besides higher memory allocations, the performance cost resulting from the bigger payload and additional complexity was minimal.

Note that if there’s evidence of an actual performance difference between the two variants, there’s the option of emitting OPENJSON with WITH when ordering is irrelevant (e.g. for Contains), but without it where ordering needs to be preserved. But since this can add some complexity, I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

I ran a slim version of my Contains benchmarks using the latest version of QueryableValues with Microsoft.EntityFrameworkCore.SqlServer version 7.0.5 and 8.0.0-preview.4.23259.3. Unfortunately, the preview version performed significantly worse on types other than Int32. However, it does fix the query plan cache pollution issue.

Here are the BenchmarkDotNet results:

Server instance specs

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
Oct  8 2022 05:58:25
Copyright (C) 2022 Microsoft Corporation
Express Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 22621: ) (Hypervisor)

Benchmarks

Using Microsoft.EntityFrameworkCore.SqlServer v7.0.5 Open Interactive Chart
Using Microsoft.EntityFrameworkCore.SqlServer v8.0.0-preview.4.23259.3 Open Interactive Chart

yv989c on May 20, 2023

I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

In our company we currently us a custom SqlServerQuerySqlGenerator to generate WHERE conditions in the form Foo IN(SELECT Value FROM OPENJSON(@jsonstr) WITH (Value CHAR(14)). Our initial implementation did not have the WITH clause and we had issues in production this way. So at least for us, it will not be good enough without the WITH clause, we tried. I hope this anecdotal evidence is good enough for you to implement this.

We also size the json parameter to VARCHAR(100), VARCHAR(4000) or VARCHAR(max), depending on how many parameters are provided. I can not remember the details but I guess this allows SQL Server to keep different query plans optimized for varying number of parameters.

m-gasser on May 17, 2023

Hello everyone! This is my take on providing a solution to this problem. It’s now even more flexible by allowing you to compose a complex type.

I just added some benchmarks showing great results.

Please give it a try. I’m looking forward to your feedback.

Links:

yv989c on Jan 18, 2022

I’d like to contribute some insights that might be related to the issue @michaelmesser is discussing, as I encountered a similar problem while implementing OPENJSON in QueryableValues.

During performance tests, I discovered that the XML approach outperformed OPENJSON significantly when handling a large number of elements. The issue became apparent when I composed the output from the OPENJSON function against a physical table. The larger the physical table, the worse OPENJSON performed. A closer examination of the execution plan revealed that the query engine was reevaluating/parsing –my best interpretation– the JSON data provided to OPENJSON for each row –kinda– in the physical table. The following image showcases the metrics from my reproduction of the issue:

Per the execution plan, OPENJSON was being executed multiple times, despite its input being constant. If this is true, then CPU cycles were being wasted.

After extensive experimentation and trying various complex approaches, I managed to persuade the query engine to execute the OPENJSON function only once. The outcome can be seen in the following image:

After this tweak, OPENJSON surpassed the performance of all other methods. This improvement was achieved by projecting a dummy column and subsequently sorting the OPENJSON output by it.

This was my experience when testing with the following SQL Server instance specs:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
Oct  8 2022 05:58:25
Copyright (C) 2022 Microsoft Corporation
Express Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 22621: ) (Hypervisor)

I hope this information proves helpful.

yv989c on May 2, 2023

Up to 1 GB ???

ErikEJ on Apr 26, 2023

@StefanOverHaevgRZ we’ll try to put this in for preview4, but there’s a chance that might not happen and it’ll only be in preview5.

And yes - the new solution utilizes JSON, so the number of values in the array is no longer important.

roji on Apr 26, 2023

Per the above, JSON seems to be the right candidate for the job because of its availability across DB technologies […]

FWIW the other candidate (parameterized IN) is also available across all DBs.

On the other hand, instead of scoping this to only the use of Contains, would it be possible to consider leveraging the power of the JSON format to allows us to compose any in-memory collection with the DB entities?

That’s #30426, which I mentioned in the summary; and yes, the OPENJSON approach wouldn’t be limited only to Contains. However, at this point it’s unlikely we’ll focus on collection parameters which aren’t over simple primitive types (e.g. composite keys which involve tuples), since that will probably introduce some significant additional complexity. I’ll definitely keep it in mind during implementation, but as @ErikEJ wrote, it’s important to focus on this very specific issue (the 2nd most highly voted in this repo) and not let scope creep compromise it.

roji on Mar 16, 2023

Note that OPENJSON can’t always be used (older SQL Server versions, spatial types are incompatible with it). For those cases we’ll currently revert back to the current behavior (IN with constants).

How do you plan to implement this? Via an DbOptions setting for older compat levels? Or query the compat level?

A DbOptions setting. We’ll mostly likely default to using OPENJSON by default, and you’d have a context option to request compatibility with older versions (minor breaking breaking change for people using older databases). I think we’ve already had other cases where this was needed, so I think it makes sense.

Maybe: This function is only available with compatibility level 130 or higher (SQL Server 2016 and later and Azure SQL Database)

Thanks, edited the text.

roji on Mar 16, 2023

IIRC SQL Server TVPs require an explicit head-of-time CREATE TYPE, so probably not appropriate for this kind of thing. But JSON is probably a possibility (via OPENJSON).

roji on Mar 11, 2023

https://gist.github.com/ErikEJ/6ab62e8b9c226ecacf02a5e5713ff7bd

This one, with great comments and suggestions @mrlife

ErikEJ on Sep 14, 2021

Interesting discussion.

It would be interesting to see benchmark results with the upcoming JSON native type which “will allow you to store JSON documents in a native binary format that is optimized for storage and query performance”. It’s probably more relevant for column data type, but one can also declare JSON parameters/variables.

DECLARE @json JSON = N'
[
    {
        "OrderNumber": "S043659",
        "Date":"2022-05-24T08:01:00",
        "AccountNumber":"AW29825",
        "Price":59.99,
        "Quantity":1
    },
    {
        "OrderNumber": "S043661",
        "Date":"2022-05-20T12:20:00",
        "AccountNumber":"AW73565",
        "Price":24.99,
        "Quantity":3
    }
]';

Although I’m not sure if it’s possible to use this type via c# code.

clement911 on Oct 24, 2023

@canadar only in EF Core 8 - this is quite a serious change/feature, we don’t backport this kind of thing to patch releases (see our planning/release docs).

roji on Jun 5, 2023

@yv989c apologies for the long delay, I was travelling.

I’m not an expert on SQL Server internals by any means. I mainly rely on the information provided by the execution plan and stats (e.g. SET STATISTICS) in an effort to make an educated assessment.

Oh absolutely - and all your comments/feedback here is greatly appreciated; you’ve certainly gone deeper here than I have.

IME the more info you provide to the query engine, the better it can optimize its plan.

That’s definitely true and I also would prefer to use OPENJSON with WITH rather than apply casting in the projection, if only for SQL cleanliness. Note that if there’s evidence of an actual performance difference between the two variants, there’s the option of emitting OPENJSON with WITH when ordering is irrelevant (e.g. for Contains), but without it where ordering needs to be preserved. But since this can add some complexity, I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

roji on May 16, 2023

@michaelmesser thanks for your repro - I can see the (quite strange) phenomenon occuring. I’ll investigate this internally and post back here on the results.

@yv989c yeah, the OPENJSON form with WITH unfortunately doesn’t provide the array index to order by. Note that Azure Synapse does provide an array identity function (docs), but this unfortunately isn’t yet supported elsewhere. I’ve received confirmation that one should not rely on the order of rows being the same as the input array - an ORDER BY is necessary in order to guarantee that; as a result I’ve implemented this without the WITH clause, and added casting on the projected column instead.

roji on May 5, 2023

I attached SQL to reproduce OPENJSON with WithUnorderedPrefetch and without WithUnorderedPrefetch. Run TestCreate.sql, TestInsert.sql, and then TestQuery.sql. While the difference in this example is hundreds of milliseconds, the performance difference with the real data is many seconds.

Test.zip

michaelmesser on May 3, 2023

I tried to use OPENJSON and STRING_SPLIT to work around this issue. With all the relevant pages in memory everything works as expected. However, when pages have to be read from disk the performance is very slow. OPENJSON and STRING_SPLIT have low estimates. This leads to the INNER LOOP JOIN not having WithUnorderedPrefetch enabled. Without prefetching, the query has physical reads instead of read-ahead reads. Tricking the cardinality estimator into giving a higher estimate fixes the issue, but results in less readable code. How will EF Core’s solution handle this case?

michaelmesser on May 1, 2023

Oh I see…

I’m not sure to what extent we need to actually look at the SQL Server compatibility level in those cases… The thing with this issue is that there’s a fallback: in modern SQL Server we use JSON, otherwise we fall back to IN+constants (since Contains must work, we can’t have it broken on old versions).

With STRING_AGG and CONCAT_WS there’s no “bad” translation to fall back to if they’re not supported, AFAIK. So a compatibility check would at most allow us to provide the user with a more informative “you need to upgrade” error, rather than a cryptic SQL Server “syntax error”.

In any case, CONCAT_WS is tracked by #28899, I already have an old PR #28900 which I’ll need to finish at some point. STRING_AGG is already in - we can indeed add a check to throw an informative message there (though that seems somewhat low-value).

roji on Apr 26, 2023

@Meligy we’re definitely aware of that approach - see the very first post above.

I’ve been doing some research on this question and will soon post a plan of action here.

roji on Mar 13, 2023

@ErikEJ true - ideally we’d have raw, ADO.NET benchmarks that show bare database performance; but they do provide an indication. We’ll like do some such benchmarking when this is actually implemented for EF.

roji on Mar 12, 2023

Thanks!

roji on Mar 11, 2023

Thanks, curious as to how that actually works, will dig into it

@danielgreen I’m glad you’re curious. You can find those details here. 🙂

yv989c on Mar 11, 2023

@danielgreen looks like it is passing a xml string, not a tvp

ErikEJ on Mar 11, 2023

I’d like to be able to run an adhoc query passing in an array of ints, the way you can pass a table valued parameter to a stored proc. Does the QueryableValues extension enable that?

@danielgreen You can find that precise example here.

yv989c on Mar 11, 2023

I’d like to be able to run an adhoc query passing in an array of ints, the way you can pass a table valued parameter to a stored proc. Does the QueryableValues extension enable that?

danielgreen on Mar 11, 2023

Credit where credit is due on the OPENJSON suggestion 😃 https://github.com/dotnet/efcore/issues/13617#issuecomment-491399486

Indeed!

[…] whether anything can be done to influence the estimate, as EF Core would know the actual number before sending the query.

Sure… Though we’d probably want to avoid anything that involves embedding a (constant) query hint or similar in the SQL, as that would bring us back to the original problem (SQL which depends on the collection size).

Thanks @ErikEJ for the link! Yeah, ideally JSON would perform at least nearly as well, allowing us to just always use that and not worry about delimiters.

roji on Jan 21, 2023

@stevendarby Looks like @aaronbertrand did quite a lot of testing in this area already, as always “it depends” but json is not all bad: https://sqlperformance.com/2016/04/sql-server-2016/string-split-follow-up-2

ErikEJ on Jan 21, 2023

@roji I said as much in my posts so don’t think I’ve oversold it as an option for some to explore- definitely some work to do for the general case 😃

stevendarby on Dec 15, 2022

@stevendarby it’s an interesting direction, but at least in the general case there are also varying number of values in the IN clause, so it’s limited in any case. That’s why an INNER JOIN-based solution could be a better way forward here.

roji on Dec 15, 2022

@cjstevenson - The value you provided as example does not cause any SQL injection. EF Core escapes quotes properly in the SQL. Generated SQL for example

      SELECT [b].[Id], [b].[Value]
      FROM [Blogs] AS [b]
      WHERE [b].[Value] IN (N'legit_value\''); SELECT * FROM some_table limit 10; -- ')

This is not a security issue. This issue has nothing to do with values and SQL injection, please file a new issue with repro steps if you are seeing a bug. Please refer to security policy if you are submitting a report for security issue.

smitpatel on Oct 1, 2020

Might be interesting for others, here’s how we solved the problem by using LINQKit.Core:

var predicate = PredicateBuilder.New<User>();
foreach (var id in ids)
{
	predicate = predicate.Or(user => user.Id == id);
}

var throwAway = context.Users.Where(predicate).ToList();

This will generate a parameterized query that looks like this:

SELECT ...
FROM   [Users] AS [u]
WHERE  [u].[Id] = @id1 
    OR [u].[Id] = @id2
    OR [u].[Id] = @id3
    ...

thomasdc on Sep 17, 2020

ErikEJ on Oct 13, 2018