efcore: IN() list queries are not parameterized, causing increased SQL Server CPU usage

Currently, EF Core is not parameterizing IN(...) queries created from .Contains() (and maybe other cases). This has a very detrimental impact on SQL Server itself because:

  1. The query hash is different for each distinct set of IDs passed in.
  2. SQL Server needs to calculate a plan for each of these distinct combinations (CPU hit).
  3. The query plans created each occupy an entry in SQL Server’s plan cache (unless ad-hoc behavior is on), competing with other plans for cache space and cascading into further CPU cost as evictions force plan re-compilations.

Note: when SQL Server is under memory pressure, the plan cache is the first thing to be emptied. So this has a profound impact at scale, doubly so when things have gone sideways.

Steps to reproduce

Here’s a reduced version of the problem:

var ids = Enumerable.Range(1, 1000).ToList();
var throwAway = context.Users.Where(u => ids.Contains(u.Id)).ToList();

This results in the following:

SELECT [u].[Id], [u].[AcceptRateAccepted], [u].[AcceptRateAsked], [u].[AccountId], [u].[AnswerCount], [u].[BronzeBadges], [u].[CreationDate], [u].[DaysVisitedConsecutive], [u].[DaysVisitedTotal], [u].[DisplayName], [u].[Email], [u].[Flags], [u].[GoldBadges], [u].[HasAboutMeExcerpt], [u].[HasReplies], [u].[IsVeteran], [u].[JobSearchStatus], [u].[LastAccessDate], [u].[LastDailySiteAccessDate], [u].[LastEmailDate], [u].[LastLoginDate], [u].[LastLoginIP], [u].[LastModifiedDate], [u].[Location], [u].[OptInEmail], [u].[PreferencesRaw], [u].[ProfileImageUrl], [u].[QuestionCount], [u].[RealName], [u].[Reputation], [u].[ReputationMonth], [u].[ReputationQuarter], [u].[ReputationSinceLastCheck], [u].[ReputationToday], [u].[ReputationWeek], [u].[ReputationYear], [u].[SignupStarted], [u].[SilverBadges], [u].[TeamId], [u].[TeamName], [u].[TimedPenaltyDate], [u].[Title], [u].[UserTypeId], [u].[WebsiteUrl]
FROM [Users] AS [u]
WHERE [u].[Id] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 
418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 
818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000);

Further technical details

EF Core version: 2.1.4
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows 2016/Windows 10
IDE: Visual Studio 2017 15.9

Proposal

EF Core should parameterize here. Instead of IN (1, 2, 3, ...) we should see IN (@__ids1, @__ids2, @__ids3, ...) or similar. This would allow the query plan to be shared across executions. For example, if we ran this 1,000 times to fetch 1,000,000 users in batches, we’d have 1 plan in cache, whereas today we have 1,000 plans. And say a user gets removed (or added!) on page 2 of 1,000…today we’d calculate and cache another 999 plans on the next run.

To further address the cardinality problem, an approach similar to what Dapper does would be at least a starting point: generate parameter lists only in certain sizes, say 5, 10, 50, 100, 500, and 1000. Here’s an example with 3 values padded to the 5-parameter size:

SELECT [u].[Id], [u].[AcceptRateAccepted], [u].[AcceptRateAsked], [u].[AccountId], [u].[AnswerCount], [u].[BronzeBadges], [u].[CreationDate], [u].[DaysVisitedConsecutive], [u].[DaysVisitedTotal], [u].[DisplayName], [u].[Email], [u].[Flags], [u].[GoldBadges], [u].[HasAboutMeExcerpt], [u].[HasReplies], [u].[IsVeteran], [u].[JobSearchStatus], [u].[LastAccessDate], [u].[LastDailySiteAccessDate], [u].[LastEmailDate], [u].[LastLoginDate], [u].[LastLoginIP], [u].[LastModifiedDate], [u].[Location], [u].[OptInEmail], [u].[PreferencesRaw], [u].[ProfileImageUrl], [u].[QuestionCount], [u].[RealName], [u].[Reputation], [u].[ReputationMonth], [u].[ReputationQuarter], [u].[ReputationSinceLastCheck], [u].[ReputationToday], [u].[ReputationWeek], [u].[ReputationYear], [u].[SignupStarted], [u].[SilverBadges], [u].[TeamId], [u].[TeamName], [u].[TimedPenaltyDate], [u].[Title], [u].[UserTypeId], [u].[WebsiteUrl]
FROM [Users] AS [u]
WHERE [u].[Id] IN (@ids1, @ids2, @ids3, @ids4, @ids5);

The 5 parameters are so that lists of 1–5 values all use the same plan. Anything shorter than the bucket size repeats the last value (don’t use null!). In this case, let’s say our values are 1, 2, and 3. Our parameterization would be:

  • @ids1 = 1;
  • @ids2 = 2;
  • @ids3 = 3;
  • @ids4 = 3;
  • @ids5 = 3;

This fetches the users we want, is friendly to the plan cache, and reduces the number of generated permutations at every layer.
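A minimal sketch of the bucketing scheme described above (the helper name PadToBucket is illustrative, not an EF Core or Dapper API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class InListPadding
{
    // Allowed parameter-list sizes; all lists up to 1,000 values map
    // onto just six distinct SQL shapes.
    static readonly int[] Buckets = { 5, 10, 50, 100, 500, 1000 };

    public static List<T> PadToBucket<T>(IReadOnlyList<T> values)
    {
        if (values.Count == 0 || values.Count > Buckets[^1])
            throw new ArgumentOutOfRangeException(nameof(values));

        // Smallest bucket that fits: 1-5 values share one plan, 6-10 another, etc.
        var size = Buckets.First(b => b >= values.Count);

        var padded = new List<T>(size);
        padded.AddRange(values);
        // Repeat the last value rather than padding with null, so the
        // extra parameters are harmless duplicates.
        while (padded.Count < size)
            padded.Add(values[^1]);
        return padded;
    }
}
```

A query generator would then emit one parameterized IN (...) per bucket size, and bind the padded list to those parameters.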

To put this in perspective, at Stack Overflow scale we’re generating millions of one-time-use plans needlessly in EF Core (the parameterization in Linq2Sql lowered this to only the cardinality permutations). We alleviate the cache issue by enabling ad-hoc query mode on our SQL Servers, but that doesn’t lessen the CPU cost of hashing and compiling every distinct query, except in the (very rare) case where the exact same list is reused.

This problem is dangerous because it’s also hard to see. If you’re looking at the plan cache, you won’t spot it with any sort of impact analysis: each query, hash, and plan is different, and there’s no sane way to group them. It’s death by a thousand cuts you can’t see. I’m currently finding and killing as many of these in our app as I can, but we should fix this at the source for everyone.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 273
  • Comments: 123 (76 by maintainers)

Most upvoted comments

Everyone,

We spent some time investigating this; below is a write-up of the current situation, the possible ways forward, and some first conclusions. The tl;dr is that we think it makes sense to implement an OPENJSON-based approach, where the array parameter (e.g. int[]) is transmitted as a JSON string parameter (`[1,2,3]`) and unpacked to a pseudo-table, which can then be used in the SQL query as usual. As many people have shown an interest here, it would be good to receive feedback on the below!

The problem

It’s very common to filter rows based on an array of (primitive) values that exists on the client. In an EF LINQ query, this looks like the following:

var ids = new[] { 10, 20, 30 };
var customers = await context.Customers.Where(c => ids.Contains(c.Id)).ToArray();

This conceptually translates to a SQL IN expression. However, since array parameters aren’t supported by most databases, EF unpacks the array into the SQL, currently inserting its values as constants:

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (10, 20, 30);

This technique unfortunately means that different array values cause different SQL to get generated. This can cause severe performance problems, since it pollutes the database plan cache, causing re-planning for each execution and evicting plans for other queries. Less importantly, EF cannot (currently) cache the SQL, since it depends on the parameter values; this means some query processing must occur on every execution.

In the benchmark below, this technique is called In_with_constants.

IN expression with parameters

To improve this, we could simply embed the array values as parameters instead of constants:

DECLARE @id1 INT = 10, @id2 INT = 20, @id3 INT = 30;

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (@id1, @id2, @id3);

For all arrays of identical length (3 in our case), this produces the same SQL and is therefore optimal. However, varying array lengths still produce different SQL, triggering the same performance issues. Also, EF still cannot cache the SQL.

In the benchmark below, this technique is called In_with_padded_parameters with PaddingCount = 0.

Padding

A common way to alleviate the differing length problem is to “pad” the IN expression with extra “useless” parameters, up to an arbitrary boundary. For example, the following pads our 3-value array to 5 values:

DECLARE @id1 INT = 10, @id2 INT = 20, @id3 INT = 30, @id4 INT = 30, @id5 INT = 30;

SELECT * FROM [Customers] AS [c]
WHERE [c].[Id] IN (@id1, @id2, @id3, @id4, @id5);

Array lengths of 5 and below therefore use the same SQL, at the cost of sending duplicated data to the server. The bigger the padding boundary, the fewer distinct SQL statements are produced for different array lengths, but the more padding parameters are sent on average, increasing query overhead.

This padding technique has a long history and can be used with various ORMs (NHibernate, jOOQ, Dapper), though it sometimes requires an opt-in.

In the benchmark below, this technique is called In_with_padded_parameters with PaddingCount > 0.

Sending the values as an array

Rather than expanding an array parameter out to individual values in a SQL IN expression (whether as constants or as parameters), ideally we’d be able to just send the array parameter directly. This usually isn’t supported, since most databases don’t support array types in any way. The two exceptions are:

  • PostgreSQL, which has native support for arrays. The EF PostgreSQL provider already translates LINQ Contains to WHERE x = ANY (@arrayParam), where @arrayParam is simply a .NET array. This yields optimal performance with optimal simplicity.
  • SQL Server, which supports table-valued parameters (TVPs), allowing an array of an arbitrary user-defined type to be sent. However, this requires pre-creating the user-defined type (via CREATE TYPE), making it generally unsuitable for EF, and it does not perform well (see Temporary_table_with_inner_join in the benchmark below).
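For illustration, the PostgreSQL translation mentioned above looks roughly like this (parameter name illustrative):

```sql
-- PostgreSQL: the entire array travels as a single parameter, so one plan
-- (and one cached SQL string) serves any combination of values and lengths.
SELECT * FROM "Customers" AS c
WHERE c."Id" = ANY (@arrayParam);
```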

However, all modern databases support parsing a JSON string representing an array and evaluating it as a relational resultset (i.e. a pseudo-table); this allows us to send a simple string parameter containing e.g. [10,20,30], and use a table-valued SQL function to extract the values out as a table with 3 rows containing those values. Once that’s done, regular SQL can be used to implement the Contains logic. The following uses an inner join to filter out only customers with IDs matching the incoming JSON array (this technique is Array_param_with_inner_join in the benchmarks):

DECLARE @ids NVARCHAR(MAX) = '[10,20,30]';

SELECT * FROM [Customers] AS [c]
JOIN OpenJson(@ids) WITH ([Value] integer '$') AS v ON v.Value = [c].[Id];

A subquery-based approach was also benchmarked (Array_param_with_exists_subquery):

DECLARE @ids NVARCHAR(MAX) = '[10,20,30]';

SELECT * FROM [Customers] AS [c]
WHERE EXISTS (
    SELECT 1
    FROM OpenJson(@ids) WITH ([Value] integer '$') AS [v]
    WHERE v.[Value] = [c].[Id]);

This technique has the following advantages compared to the expansion-based techniques discussed above:

  • Only one SQL statement is ever needed, regardless of the values or their number. This is optimal both for the database server (a single plan) and for EF (it can cache the SQL).
  • Since no parameter padding is needed, no extra parameter data ever needs to be sent.

Links to the JSON-unpacking functions in the different databases:

  • SQL Server has OPENJSON. This function is only available at compatibility level 130 or higher (Azure SQL Database, and SQL Server 2016 and later; the previous version, SQL Server 2014, goes out of support in 2024). An alternative, OPENXML, also exists with better compatibility, but performs significantly worse.
  • SQLite has json_each.
  • MySQL and MariaDB have json_table.
  • PostgreSQL has jsonb_to_recordset, though it isn’t required thanks to the native support for arrays.
  • Oracle has JSON_TABLE.
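As a sketch of how the same approach looks outside SQL Server, here is a SQLite equivalent using json_each (table and column names follow the benchmark schema below):

```sql
-- SQLite: json_each(@ids) expands the JSON array parameter into rows
-- whose "value" column holds each element.
-- @ids = '[10,20,30]'
SELECT d.id
FROM data AS d
JOIN json_each(@ids) AS v ON v.value = d.int;
```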

Benchmark

Benchmark results
Method Database NumSearchValues Found PaddingCount Mean Error StdDev Median
In_with_constants SQLServer 2 False 0 401.3 μs 7.60 μs 9.05 μs 401.5 μs
In_with_padded_parameters SQLServer 2 False 0 656.1 μs 30.38 μs 89.59 μs 642.9 μs
In_with_padded_parameters SQLServer 2 False 5 509.0 μs 15.92 μs 46.93 μs 525.0 μs
In_with_padded_parameters SQLServer 2 False 10 803.3 μs 28.04 μs 82.23 μs 804.9 μs
In_with_padded_parameters SQLServer 2 False 50 1,314.9 μs 50.84 μs 149.10 μs 1,321.6 μs
In_with_padded_parameters SQLServer 2 False 100 681.3 μs 26.25 μs 77.41 μs 704.8 μs
In_with_padded_parameters SQLServer 2 False 500 1,933.4 μs 38.61 μs 84.75 μs 1,951.3 μs
Array_param_with_exists_subquery SQLServer 2 False 0 964.6 μs 32.73 μs 96.00 μs 961.2 μs
Array_param_with_inner_join SQLServer 2 False 0 964.8 μs 28.93 μs 85.30 μs 972.4 μs
OpenXml_with_inner_join SQLServer 2 False 0 1,598.6 μs 36.22 μs 105.67 μs 1,597.8 μs
Temporary_table_with_inner_join SQLServer 2 False 0 1,067.0 μs 35.11 μs 102.41 μs 1,071.4 μs
In_with_constants SQLServer 2 True 0 603.1 μs 28.55 μs 84.18 μs 593.9 μs
In_with_padded_parameters SQLServer 2 True 0 671.0 μs 30.51 μs 89.96 μs 666.1 μs
In_with_padded_parameters SQLServer 2 True 5 520.4 μs 21.97 μs 64.77 μs 539.6 μs
In_with_padded_parameters SQLServer 2 True 10 803.4 μs 29.80 μs 87.86 μs 808.9 μs
In_with_padded_parameters SQLServer 2 True 50 1,260.5 μs 52.32 μs 154.26 μs 1,272.0 μs
In_with_padded_parameters SQLServer 2 True 100 727.2 μs 19.48 μs 57.45 μs 742.5 μs
In_with_padded_parameters SQLServer 2 True 500 1,943.0 μs 47.99 μs 141.51 μs 1,954.0 μs
Array_param_with_exists_subquery SQLServer 2 True 0 958.9 μs 23.60 μs 69.57 μs 955.3 μs
Array_param_with_inner_join SQLServer 2 True 0 963.7 μs 28.06 μs 82.28 μs 965.0 μs
OpenXml_with_inner_join SQLServer 2 True 0 1,576.0 μs 49.12 μs 143.28 μs 1,566.8 μs
Temporary_table_with_inner_join SQLServer 2 True 0 1,066.0 μs 40.26 μs 118.06 μs 1,060.8 μs
In_with_constants SQLServer 10 False 0 426.8 μs 12.03 μs 35.48 μs 436.6 μs
In_with_padded_parameters SQLServer 10 False 0 773.0 μs 28.31 μs 83.46 μs 775.3 μs
In_with_padded_parameters SQLServer 10 False 5 844.6 μs 32.35 μs 95.38 μs 857.9 μs
In_with_padded_parameters SQLServer 10 False 10 946.5 μs 45.03 μs 129.93 μs 953.0 μs
In_with_padded_parameters SQLServer 10 False 50 1,338.0 μs 50.94 μs 150.20 μs 1,324.4 μs
In_with_padded_parameters SQLServer 10 False 100 768.6 μs 25.58 μs 75.43 μs 789.9 μs
In_with_padded_parameters SQLServer 10 False 500 1,963.7 μs 46.30 μs 136.51 μs 1,999.9 μs
Array_param_with_exists_subquery SQLServer 10 False 0 977.4 μs 30.95 μs 90.78 μs 976.6 μs
Array_param_with_inner_join SQLServer 10 False 0 969.2 μs 22.15 μs 64.97 μs 967.7 μs
OpenXml_with_inner_join SQLServer 10 False 0 1,643.6 μs 39.89 μs 116.98 μs 1,641.3 μs
Temporary_table_with_inner_join SQLServer 10 False 0 1,112.9 μs 40.38 μs 117.16 μs 1,095.6 μs
In_with_constants SQLServer 10 True 0 668.2 μs 27.62 μs 80.99 μs 661.5 μs
In_with_padded_parameters SQLServer 10 True 0 768.7 μs 24.35 μs 70.65 μs 769.3 μs
In_with_padded_parameters SQLServer 10 True 5 858.4 μs 31.42 μs 92.64 μs 865.1 μs
In_with_padded_parameters SQLServer 10 True 10 947.6 μs 29.00 μs 85.06 μs 938.9 μs
In_with_padded_parameters SQLServer 10 True 50 1,320.1 μs 56.63 μs 166.99 μs 1,309.1 μs
In_with_padded_parameters SQLServer 10 True 100 1,250.5 μs 57.92 μs 169.86 μs 1,271.9 μs
In_with_padded_parameters SQLServer 10 True 500 2,692.6 μs 128.12 μs 377.78 μs 2,776.5 μs
Array_param_with_exists_subquery SQLServer 10 True 0 1,026.5 μs 32.06 μs 94.54 μs 1,012.8 μs
Array_param_with_inner_join SQLServer 10 True 0 996.8 μs 24.75 μs 72.20 μs 988.7 μs
OpenXml_with_inner_join SQLServer 10 True 0 1,642.7 μs 40.46 μs 119.29 μs 1,642.5 μs
Temporary_table_with_inner_join SQLServer 10 True 0 1,115.2 μs 38.13 μs 111.83 μs 1,110.0 μs
In_with_constants SQLServer 100 False 0 524.9 μs 24.01 μs 70.42 μs 533.1 μs
In_with_padded_parameters SQLServer 100 False 0 1,226.8 μs 47.63 μs 139.70 μs 1,236.9 μs
In_with_padded_parameters SQLServer 100 False 5 1,245.0 μs 61.92 μs 182.56 μs 1,207.2 μs
In_with_padded_parameters SQLServer 100 False 10 1,247.6 μs 53.19 μs 156.01 μs 1,248.4 μs
In_with_padded_parameters SQLServer 100 False 50 1,279.4 μs 72.58 μs 214.01 μs 1,240.0 μs
In_with_padded_parameters SQLServer 100 False 100 1,548.1 μs 94.59 μs 278.91 μs 1,593.0 μs
In_with_padded_parameters SQLServer 100 False 500 2,832.0 μs 160.80 μs 474.12 μs 2,998.7 μs
Array_param_with_exists_subquery SQLServer 100 False 0 1,088.3 μs 33.54 μs 97.83 μs 1,088.1 μs
Array_param_with_inner_join SQLServer 100 False 0 1,057.4 μs 26.71 μs 78.76 μs 1,060.5 μs
OpenXml_with_inner_join SQLServer 100 False 0 2,192.2 μs 60.30 μs 177.80 μs 2,186.1 μs
Temporary_table_with_inner_join SQLServer 100 False 0 1,926.1 μs 44.66 μs 131.69 μs 1,919.2 μs
In_with_constants SQLServer 100 True 0 1,001.9 μs 34.08 μs 99.95 μs 985.6 μs
In_with_padded_parameters SQLServer 100 True 0 1,275.6 μs 62.02 μs 182.86 μs 1,249.4 μs
In_with_padded_parameters SQLServer 100 True 5 1,252.4 μs 56.43 μs 166.39 μs 1,232.0 μs
In_with_padded_parameters SQLServer 100 True 10 1,242.3 μs 45.00 μs 132.68 μs 1,230.9 μs
In_with_padded_parameters SQLServer 100 True 50 1,341.4 μs 79.00 μs 231.69 μs 1,329.1 μs
In_with_padded_parameters SQLServer 100 True 100 1,514.7 μs 91.66 μs 270.25 μs 1,525.2 μs
In_with_padded_parameters SQLServer 100 True 500 2,793.8 μs 160.65 μs 473.68 μs 2,931.8 μs
Array_param_with_exists_subquery SQLServer 100 True 0 1,100.8 μs 38.30 μs 112.93 μs 1,091.5 μs
Array_param_with_inner_join SQLServer 100 True 0 1,093.7 μs 32.36 μs 94.90 μs 1,073.9 μs
OpenXml_with_inner_join SQLServer 100 True 0 2,183.7 μs 55.15 μs 159.12 μs 2,175.9 μs
Temporary_table_with_inner_join SQLServer 100 True 0 1,970.0 μs 38.97 μs 107.98 μs 1,963.2 μs
In_with_constants SQLServer 1000 False 0 1,184.9 μs 38.29 μs 112.90 μs 1,182.0 μs
In_with_padded_parameters SQLServer 1000 False 0 4,039.2 μs 242.04 μs 713.66 μs 4,242.7 μs
In_with_padded_parameters SQLServer 1000 False 5 3,866.4 μs 270.36 μs 797.17 μs 4,164.5 μs
In_with_padded_parameters SQLServer 1000 False 10 3,941.7 μs 237.62 μs 700.64 μs 4,137.6 μs
In_with_padded_parameters SQLServer 1000 False 50 4,207.9 μs 200.56 μs 591.37 μs 4,433.0 μs
In_with_padded_parameters SQLServer 1000 False 100 4,226.5 μs 178.61 μs 526.65 μs 4,351.0 μs
In_with_padded_parameters SQLServer 1000 False 500 4,970.2 μs 319.84 μs 943.04 μs 5,245.2 μs
Array_param_with_exists_subquery SQLServer 1000 False 0 1,848.8 μs 47.27 μs 139.37 μs 1,846.0 μs
Array_param_with_inner_join SQLServer 1000 False 0 1,848.6 μs 51.46 μs 151.72 μs 1,834.2 μs
OpenXml_with_inner_join SQLServer 1000 False 0 11,844.7 μs 235.72 μs 406.61 μs 11,815.6 μs
Temporary_table_with_inner_join SQLServer 1000 False 0 3,937.7 μs 92.23 μs 271.94 μs 3,926.6 μs
In_with_constants SQLServer 1000 True 0 1,354.4 μs 34.48 μs 98.39 μs 1,358.8 μs
In_with_padded_parameters SQLServer 1000 True 0 4,012.3 μs 231.55 μs 682.72 μs 4,225.7 μs
In_with_padded_parameters SQLServer 1000 True 5 3,975.6 μs 259.31 μs 764.59 μs 4,135.4 μs
In_with_padded_parameters SQLServer 1000 True 10 3,980.3 μs 221.32 μs 652.58 μs 4,172.7 μs
In_with_padded_parameters SQLServer 1000 True 50 4,390.7 μs 257.50 μs 759.25 μs 4,638.6 μs
In_with_padded_parameters SQLServer 1000 True 100 4,275.2 μs 216.18 μs 637.40 μs 4,378.2 μs
In_with_padded_parameters SQLServer 1000 True 500 5,316.2 μs 362.07 μs 1,067.56 μs 5,700.8 μs
Array_param_with_exists_subquery SQLServer 1000 True 0 2,053.9 μs 64.06 μs 188.88 μs 2,057.4 μs
Array_param_with_inner_join SQLServer 1000 True 0 2,089.3 μs 68.79 μs 202.82 μs 2,084.0 μs
OpenXml_with_inner_join SQLServer 1000 True 0 12,200.8 μs 231.06 μs 204.83 μs 12,152.9 μs
Temporary_table_with_inner_join SQLServer 1000 True 0 4,130.5 μs 82.56 μs 232.86 μs 4,106.4 μs
Benchmark sources
using System;
using System.Data;
using System.Data.Common;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Dapper;
using Microsoft.Data.SqlClient;
using Microsoft.Data.SqlClient.Server;
using Npgsql;

BenchmarkRunner.Run<Benchmark>();

// Uncomment the following for quick testing that all tests technically work
// [SimpleJob(warmupCount: 1, targetCount: 1, invocationCount: 1)]
public class Benchmark
{
    // [Params(Database.SQLServer, Database.PostgreSQL)]
    [Params(Database.SQLServer)]
    public Database Database { get; set; }

    [Params(2, 10, 100, 1000)]
    public int NumSearchValues { get; set; }

    [Params(true, false)]
    public bool Found { get; set; }

    private int[] _searchValues = null!;

    private const int TotalRows = 1000;

    [Params(0, 5, 10, 50, 100, 500)]
    public int PaddingCount { get; set; }

    private DbConnection _connection = null!;
    private DbCommand _command = null!;

    public void CommonSetup()
    {
        _connection = Database switch
        {
            Database.SQLServer => new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false"),
            Database.PostgreSQL => new NpgsqlConnection("Host=localhost;Username=test;Password=test;SSL Mode=disable"),
            _ => throw new UnreachableException()
        };
        _connection.Open();
        _command = _connection.CreateCommand();

        _connection.Execute("DROP TABLE IF EXISTS data");
        _connection.Execute("CREATE TABLE data (id INTEGER, int INTEGER)");
        _connection.Execute("CREATE INDEX IX_data ON data(int)");

        var builder = new StringBuilder("INSERT INTO data (id, int) VALUES");
        for (var i = 0; i < TotalRows; i++)
        {
            if (i > 0)
                builder.Append(",");
            builder.Append($" ({i}, {i + 100})");
        }

        _command.CommandText = builder.ToString();
        Console.WriteLine("Seed SQL: " + builder);
        _connection.Execute(builder.ToString());

        _searchValues = new int[NumSearchValues];
        for (var i = 0; i < _searchValues.Length; i++)
        {
            _searchValues[i] = Random.Shared.Next(TotalRows) + (Found ? 0 : 1000);
        }
    }

    [GlobalSetup(Target = nameof(In_with_constants))]
    public void In_with_constants_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();

        CommonSetup();

        var builder = new StringBuilder("SELECT id FROM data WHERE int IN (");
        for (var i = 0; i < _searchValues.Length; i++)
        {
            if (i > 0)
                builder.Append(",");
            builder.Append(_searchValues[i]);
        }

        builder.Append(")");
        _command.CommandText = builder.ToString();
        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(PG_any_with_parameter))]
    public void PG_any_with_parameter_setup()
    {
        if (Database != Database.PostgreSQL || PaddingCount != 0)
            throw new NotSupportedException();

        CommonSetup();

        _command.CommandText = "SELECT id FROM data WHERE int = ANY($1)";
        var p = _command.CreateParameter();
        p.Value = _searchValues;
        _command.Parameters.Add(p);

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }


    [GlobalSetup(Target = nameof(In_with_padded_parameters))]
    public void In_with_padded_parameters_setup()
    {
        CommonSetup();

        var builder = new StringBuilder("SELECT id FROM data WHERE int IN (");
        for (var i = 0; i < _searchValues.Length + PaddingCount; i++)
        {
            if (i > 0)
                builder.Append(",");

            var searchValue = i < _searchValues.Length ? _searchValues[i] : _searchValues[^1];

            switch (Database)
            {
                case Database.SQLServer:
                    var parameterName = $"@p{i + 1}";
                    builder.Append(parameterName);
                    _command.Parameters.Add(
                        new SqlParameter { ParameterName = parameterName, Value = searchValue });
                    break;
                case Database.PostgreSQL:
                    builder.Append($"${i + 1}");
                    _command.Parameters.Add(new NpgsqlParameter { Value = searchValue });
                    break;
                default:
                    throw new ArgumentOutOfRangeException();
            }
        }

        if (Database == Database.SQLServer)
        {
            ((SqlCommand)_command).EnableOptimizedParameterBinding = true;
        }

        builder.Append(")");
        _command.CommandText = builder.ToString();
        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Array_param_with_exists_subquery))]
    public void Array_param_with_exists_subquery_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        switch (Database)
        {
            case Database.SQLServer:
                _command.CommandText =
"""
SELECT id FROM data AS d WHERE EXISTS (
    SELECT 1
    FROM OpenJson(@p) WITH ([Value] integer '$') AS v
    WHERE v.Value = d.int)
""";
                _command.Parameters.Add(new SqlParameter("p", JsonSerializer.Serialize(_searchValues)));
                break;
            case Database.PostgreSQL:
                _command.CommandText =
"""
SELECT id FROM data AS d WHERE EXISTS (
    SELECT 1
    FROM unnest($1) AS v
    WHERE v.v = d.int)
""";
                _command.Parameters.Add(new NpgsqlParameter { Value = _searchValues });
                break;
        }

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Array_param_with_inner_join))]
    public void Array_param_with_inner_join_setup()
    {
        if (PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        switch (Database)
        {
            case Database.SQLServer:
                _command.CommandText =
"""
SELECT id FROM data AS d
JOIN OpenJson(@p) WITH ([Value] integer '$') AS v ON v.Value = d.int;
""";
                _command.Parameters.Add(new SqlParameter("p", JsonSerializer.Serialize(_searchValues)));
                break;
            case Database.PostgreSQL:
                _command.CommandText =
"""
SELECT id FROM data AS d
JOIN unnest($1) AS v ON v.v = d.int;
""";
                _command.Parameters.Add(new NpgsqlParameter { Value = _searchValues });
                break;
        }

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(OpenXml_with_inner_join))]
    public void OpenXml_with_inner_join_setup()
    {
        if (Database != Database.SQLServer || PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        _command.CommandText =
"""
DECLARE @idoc INT;
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc;
SELECT id FROM data AS d
JOIN OpenXML(@idoc, '/Root/Value', 1) WITH ([value] integer) AS v ON v.Value = d.int;
EXEC sp_xml_removedocument @idoc;
""";
        var builder = new StringBuilder("<Root>");
        for (var i = 0; i < _searchValues.Length; i++)
            builder.Append($@"<Value value=""{_searchValues[i]}""/>");
        builder.Append("</Root>");

        _command.Parameters.Add(new SqlParameter("doc", builder.ToString()));

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [GlobalSetup(Target = nameof(Temporary_table_with_inner_join))]
    public void Temporary_table_with_inner_join_setup()
    {
        if (Database != Database.SQLServer || PaddingCount != 0)
            throw new NotSupportedException();
        CommonSetup();

        // Note: this creates the TVP's type once at setup. This is because it's not possible to batch the creation
        // of the type and its use in the same SqlCommand, so doing it per query would require multiple roundtrips
        // (better batching via TDS RpcMessage would likely make this work).
        // So for now, we're creating it just once, although that isn't usable by EF.
        _command.CommandText =
"""
DROP TYPE IF EXISTS int_wrapper;
CREATE TYPE int_wrapper AS TABLE (int INT);
""";
        _command.ExecuteNonQuery();

        // Note that we also create the SqlDataRecords only once at setup, which is also unrealistic and makes TVP
        // look much better than it is.
        var records = _searchValues.Select(i =>
        {
            var record = new SqlDataRecord(new SqlMetaData("int", SqlDbType.Int));
            record.SetInt32(0, i);
            return record;
        }).ToArray();

        _command.CommandText = "SELECT id FROM data AS d JOIN @ints AS ints ON ints.int = d.int";
        _command.Parameters.Add(new SqlParameter("ints", SqlDbType.Structured)
        {
            TypeName = "int_wrapper",
            Value = records
        });

        Console.WriteLine("Benchmark query: " + _command.CommandText);
    }

    [Benchmark]
    public void In_with_constants() => _command.ExecuteNonQuery();

    [Benchmark]
    public void PG_any_with_parameter() => _command.ExecuteNonQuery();

    [Benchmark]
    public void In_with_padded_parameters() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Array_param_with_exists_subquery() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Array_param_with_inner_join() => _command.ExecuteNonQuery();

    [Benchmark]
    public void OpenXml_with_inner_join() => _command.ExecuteNonQuery();

    [Benchmark]
    public void Temporary_table_with_inner_join() => _command.ExecuteNonQuery();

    [GlobalCleanup]
    public void Cleanup()
    {
        _command.Dispose();
        _connection.Dispose();
    }
}

public enum Database
{
    SQLServer,
    PostgreSQL
}

Observations:

  • In_with_constants (the current EF technique) is included only for completeness. Although it is always the fastest technique, this does not represent actual performance in a real application; since the benchmark repeatedly executes the same query, the performance issues related to constantly changing SQL don’t manifest.
  • Table-valued parameters perform surprisingly badly; this benchmark creates the TVP type once at setup, and even caches the SqlDataRecord and SqlMetaData instances.
  • There are some anomalous results for increasing parameter padding values.
  • For smaller value counts, In_with_padded_parameters is faster than the OPENJSON-based techniques (Array_param_with_inner_join, Array_param_with_exists_subquery). However, In_with_padded_parameters time grows much faster as the number of values increases, making OPENJSON better in the general case, with arbitrary value counts.
Detailed benchmarking of parameterized IN vs. OPENJSON

The following benchmark shows IN with parameters vs. OPENJSON with JOIN. No parameter padding is done for IN, so this optimistically assumes value counts always fall exactly on the padding boundaries. This means the results below show IN as better than it actually is, since in the real world padding will occur. In addition, this doesn’t take EF SQL caching into account, which is possible with OPENJSON but not with IN; this again means IN looks better than it really is.
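To make the comparison concrete, here is a rough sketch of the two command shapes being compared (illustrative only — the `data`/`int`/`id` names mirror the benchmark schema above, and the SQL is built by hand rather than by EF):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Parameterized IN: one parameter per value, so the SQL text (and therefore the
// cached plan) is different for every distinct value count.
static (string Sql, List<KeyValuePair<string, int>> Parameters) InWithParameters(int[] values)
{
    var parameters = values
        .Select((v, i) => new KeyValuePair<string, int>($"@p{i}", v))
        .ToList();
    var sql = $"SELECT id FROM data WHERE int IN ({string.Join(", ", parameters.Select(p => p.Key))})";
    return (sql, parameters);
}

// OPENJSON: a single JSON string parameter, so the SQL text is identical no
// matter how many values are searched.
static (string Sql, string JsonParameter) InWithOpenJson(int[] values) =>
    ("SELECT id FROM data AS d JOIN OPENJSON(@p) WITH ([Value] int '$') AS v ON v.[Value] = d.int",
     JsonSerializer.Serialize(values));
```

With the first shape, 10 values and 11 values produce different SQL strings; with the second, every count shares one string and one plan.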

Method Database NumSearchValues Found PaddingCount Mean Error StdDev
In_with_padded_parameters SQLServer 6 False 0 515.9 us 20.68 us 60.99 us
Array_param_with_inner_join SQLServer 6 False 0 993.2 us 27.36 us 80.25 us
In_with_padded_parameters SQLServer 6 True 0 717.3 us 27.04 us 79.31 us
Array_param_with_inner_join SQLServer 6 True 0 1,006.2 us 27.56 us 80.82 us
In_with_padded_parameters SQLServer 7 False 0 509.3 us 22.65 us 66.79 us
Array_param_with_inner_join SQLServer 7 False 0 984.1 us 23.06 us 67.64 us
In_with_padded_parameters SQLServer 7 True 0 732.6 us 29.34 us 86.51 us
Array_param_with_inner_join SQLServer 7 True 0 1,003.6 us 35.69 us 105.25 us
In_with_padded_parameters SQLServer 8 False 0 504.1 us 27.04 us 79.73 us
Array_param_with_inner_join SQLServer 8 False 0 996.2 us 34.32 us 101.18 us
In_with_padded_parameters SQLServer 8 True 0 745.8 us 31.89 us 94.02 us
Array_param_with_inner_join SQLServer 8 True 0 995.3 us 30.75 us 89.22 us
In_with_padded_parameters SQLServer 9 False 0 740.3 us 27.35 us 79.77 us
Array_param_with_inner_join SQLServer 9 False 0 1,000.6 us 33.03 us 96.88 us
In_with_padded_parameters SQLServer 9 True 0 743.6 us 24.49 us 71.82 us
Array_param_with_inner_join SQLServer 9 True 0 990.7 us 31.31 us 91.81 us
In_with_padded_parameters SQLServer 10 False 0 766.2 us 30.93 us 91.18 us
Array_param_with_inner_join SQLServer 10 False 0 997.7 us 28.99 us 85.03 us
In_with_padded_parameters SQLServer 10 True 0 781.9 us 26.71 us 76.63 us
Array_param_with_inner_join SQLServer 10 True 0 1,006.9 us 29.86 us 87.57 us
In_with_padded_parameters SQLServer 11 False 0 789.3 us 28.94 us 83.95 us
Array_param_with_inner_join SQLServer 11 False 0 998.1 us 35.75 us 104.28 us
In_with_padded_parameters SQLServer 11 True 0 795.3 us 29.90 us 88.15 us
Array_param_with_inner_join SQLServer 11 True 0 991.0 us 26.36 us 76.05 us
In_with_padded_parameters SQLServer 12 False 0 787.7 us 29.89 us 87.68 us
Array_param_with_inner_join SQLServer 12 False 0 975.9 us 30.89 us 90.61 us
In_with_padded_parameters SQLServer 12 True 0 800.7 us 28.99 us 85.48 us
Array_param_with_inner_join SQLServer 12 True 0 985.3 us 25.13 us 73.69 us
In_with_padded_parameters SQLServer 13 False 0 815.3 us 28.79 us 83.97 us
Array_param_with_inner_join SQLServer 13 False 0 995.7 us 36.50 us 107.63 us
In_with_padded_parameters SQLServer 13 True 0 820.5 us 31.98 us 94.29 us
Array_param_with_inner_join SQLServer 13 True 0 1,006.5 us 27.29 us 80.46 us
In_with_padded_parameters SQLServer 14 False 0 865.0 us 33.55 us 98.94 us
Array_param_with_inner_join SQLServer 14 False 0 1,021.3 us 33.39 us 98.45 us
In_with_padded_parameters SQLServer 14 True 0 849.6 us 31.38 us 92.52 us
Array_param_with_inner_join SQLServer 14 True 0 1,013.0 us 34.27 us 100.51 us
In_with_padded_parameters SQLServer 15 False 0 824.8 us 33.58 us 99.01 us
Array_param_with_inner_join SQLServer 15 False 0 1,027.6 us 27.90 us 80.50 us
In_with_padded_parameters SQLServer 15 True 0 841.5 us 29.20 us 86.10 us
Array_param_with_inner_join SQLServer 15 True 0 1,011.8 us 35.77 us 105.46 us
In_with_padded_parameters SQLServer 16 False 0 911.6 us 31.84 us 93.88 us
Array_param_with_inner_join SQLServer 16 False 0 998.5 us 31.49 us 91.86 us
In_with_padded_parameters SQLServer 16 True 0 912.9 us 31.90 us 94.06 us
Array_param_with_inner_join SQLServer 16 True 0 1,004.1 us 34.38 us 100.82 us
In_with_padded_parameters SQLServer 17 False 0 903.2 us 27.24 us 80.31 us
Array_param_with_inner_join SQLServer 17 False 0 987.0 us 25.22 us 74.36 us
In_with_padded_parameters SQLServer 17 True 0 914.9 us 29.68 us 87.05 us
Array_param_with_inner_join SQLServer 17 True 0 989.6 us 28.45 us 82.09 us
In_with_padded_parameters SQLServer 18 False 0 936.2 us 33.73 us 98.91 us
Array_param_with_inner_join SQLServer 18 False 0 984.6 us 24.66 us 71.93 us
In_with_padded_parameters SQLServer 18 True 0 904.2 us 31.40 us 92.09 us
Array_param_with_inner_join SQLServer 18 True 0 983.9 us 28.37 us 81.41 us
In_with_padded_parameters SQLServer 19 False 0 937.8 us 33.65 us 99.20 us
Array_param_with_inner_join SQLServer 19 False 0 976.7 us 30.03 us 88.06 us
In_with_padded_parameters SQLServer 19 True 0 954.2 us 38.87 us 114.62 us
Array_param_with_inner_join SQLServer 19 True 0 1,022.6 us 30.70 us 89.55 us
In_with_padded_parameters SQLServer 20 False 0 947.5 us 35.27 us 103.43 us
Array_param_with_inner_join SQLServer 20 False 0 1,006.5 us 23.44 us 68.74 us
In_with_padded_parameters SQLServer 20 True 0 932.7 us 40.55 us 119.57 us
Array_param_with_inner_join SQLServer 20 True 0 1,018.0 us 36.61 us 107.94 us
In_with_padded_parameters SQLServer 21 False 0 943.2 us 33.65 us 99.21 us
Array_param_with_inner_join SQLServer 21 False 0 998.6 us 24.13 us 71.16 us
In_with_padded_parameters SQLServer 21 True 0 982.7 us 40.14 us 118.37 us
Array_param_with_inner_join SQLServer 21 True 0 1,025.1 us 26.94 us 78.17 us
In_with_padded_parameters SQLServer 22 False 0 1,007.9 us 29.37 us 86.13 us
Array_param_with_inner_join SQLServer 22 False 0 1,062.7 us 33.69 us 99.34 us
In_with_padded_parameters SQLServer 22 True 0 1,023.2 us 41.94 us 123.67 us
Array_param_with_inner_join SQLServer 22 True 0 1,062.4 us 33.19 us 97.34 us
In_with_padded_parameters SQLServer 23 False 0 969.7 us 36.09 us 106.42 us
Array_param_with_inner_join SQLServer 23 False 0 1,029.8 us 33.79 us 99.64 us
In_with_padded_parameters SQLServer 23 True 0 999.0 us 33.08 us 97.54 us
Array_param_with_inner_join SQLServer 23 True 0 1,037.8 us 33.28 us 97.59 us
In_with_padded_parameters SQLServer 24 False 0 976.2 us 47.49 us 138.52 us
Array_param_with_inner_join SQLServer 24 False 0 1,006.7 us 25.46 us 74.67 us
In_with_padded_parameters SQLServer 24 True 0 998.9 us 44.81 us 131.43 us
Array_param_with_inner_join SQLServer 24 True 0 1,015.5 us 28.66 us 83.61 us
In_with_padded_parameters SQLServer 25 False 0 1,010.6 us 44.62 us 130.86 us
Array_param_with_inner_join SQLServer 25 False 0 1,043.4 us 35.54 us 104.80 us
In_with_padded_parameters SQLServer 25 True 0 1,047.1 us 44.93 us 132.47 us
Array_param_with_inner_join SQLServer 25 True 0 1,013.7 us 37.25 us 109.83 us
In_with_padded_parameters SQLServer 26 False 0 998.2 us 45.84 us 135.15 us
Array_param_with_inner_join SQLServer 26 False 0 1,007.1 us 28.03 us 82.66 us
In_with_padded_parameters SQLServer 26 True 0 1,039.2 us 33.40 us 98.47 us
Array_param_with_inner_join SQLServer 26 True 0 1,016.4 us 33.10 us 97.07 us
In_with_padded_parameters SQLServer 27 False 0 1,028.9 us 33.98 us 100.20 us
Array_param_with_inner_join SQLServer 27 False 0 1,017.6 us 26.44 us 77.14 us
In_with_padded_parameters SQLServer 27 True 0 1,023.6 us 33.19 us 97.85 us
Array_param_with_inner_join SQLServer 27 True 0 1,023.4 us 35.33 us 103.62 us
In_with_padded_parameters SQLServer 28 False 0 986.5 us 40.78 us 120.23 us
Array_param_with_inner_join SQLServer 28 False 0 1,012.8 us 25.03 us 73.42 us
In_with_padded_parameters SQLServer 28 True 0 1,053.5 us 31.16 us 91.88 us
Array_param_with_inner_join SQLServer 28 True 0 1,034.4 us 34.10 us 100.53 us
In_with_padded_parameters SQLServer 29 False 0 1,034.8 us 33.66 us 99.26 us
Array_param_with_inner_join SQLServer 29 False 0 1,012.9 us 33.75 us 98.98 us
In_with_padded_parameters SQLServer 29 True 0 1,069.7 us 32.05 us 94.00 us
Array_param_with_inner_join SQLServer 29 True 0 1,045.3 us 21.60 us 62.66 us
In_with_padded_parameters SQLServer 30 False 0 1,043.5 us 40.67 us 119.27 us
Array_param_with_inner_join SQLServer 30 False 0 1,033.1 us 31.54 us 92.49 us
In_with_padded_parameters SQLServer 30 True 0 1,064.4 us 31.34 us 92.41 us
Array_param_with_inner_join SQLServer 30 True 0 1,052.1 us 21.56 us 63.56 us
In_with_padded_parameters SQLServer 31 False 0 1,071.2 us 34.23 us 100.92 us
Array_param_with_inner_join SQLServer 31 False 0 1,026.3 us 26.02 us 76.30 us
In_with_padded_parameters SQLServer 31 True 0 1,096.5 us 50.68 us 149.42 us
Array_param_with_inner_join SQLServer 31 True 0 1,043.7 us 27.62 us 81.45 us
In_with_padded_parameters SQLServer 32 False 0 1,091.2 us 46.56 us 137.29 us
Array_param_with_inner_join SQLServer 32 False 0 1,024.3 us 30.00 us 88.47 us
In_with_padded_parameters SQLServer 32 True 0 1,120.6 us 44.00 us 129.72 us
Array_param_with_inner_join SQLServer 32 True 0 1,051.4 us 34.78 us 102.56 us
In_with_padded_parameters SQLServer 33 False 0 1,063.9 us 35.94 us 104.85 us
Array_param_with_inner_join SQLServer 33 False 0 994.8 us 35.37 us 103.75 us
In_with_padded_parameters SQLServer 33 True 0 1,069.1 us 41.82 us 123.30 us
Array_param_with_inner_join SQLServer 33 True 0 1,063.6 us 33.36 us 98.37 us
In_with_padded_parameters SQLServer 34 False 0 1,090.3 us 43.93 us 129.53 us
Array_param_with_inner_join SQLServer 34 False 0 1,015.3 us 22.14 us 64.24 us
In_with_padded_parameters SQLServer 34 True 0 1,096.9 us 36.54 us 106.00 us
Array_param_with_inner_join SQLServer 34 True 0 1,016.5 us 35.40 us 104.38 us
In_with_padded_parameters SQLServer 35 False 0 1,092.8 us 44.31 us 129.96 us
Array_param_with_inner_join SQLServer 35 False 0 1,024.5 us 29.75 us 87.73 us
In_with_padded_parameters SQLServer 35 True 0 1,128.4 us 50.91 us 149.31 us
Array_param_with_inner_join SQLServer 35 True 0 1,034.2 us 29.22 us 84.77 us

Summary

  • Unfortunately, there isn’t one technique which performs best in all scenarios; for smaller value counts (< 15), padded parameters perform better, whereas with larger ones OPENJSON performs better.
  • Ideally, a hybrid approach would be implemented, using parameter padding in lower counts and automatically switching to OPENJSON as counts increase. However, EF’s query and caching architecture makes this non-trivial.
  • Since OPENJSON scales better in the general scenario (arbitrary value counts), and since we want to unlock arbitrary querying of OPENJSON-based array parameters (#30426), we think it makes sense to start with the OPENJSON-based implementation. Then we can look at adding parameterized IN with padding as an optimization for lower value counts.
  • Note that OPENJSON can’t always be used (older SQL Server versions; spatial types are incompatible with it). For those cases we’ll fall back to the current behavior (IN with constants) for now. We may improve this in the future to IN with parameters.

I wrote an extension to EF6 to handle this exact problem for our product and apparently it is extremely similar to what Nick is proposing above. By replacing this where clause: .Where(entity => myEnumerable.Contains(entity.prop)) with this extension method: .BatchedWhereKeyInValues(entity => entity.prop, myEnumerables, batchSize: 100), my solution will break the list of values into batches, and then produce a properly parameterized query that generates the desired SQL.

For example, this expression: query.BatchedWhereKeyInValues(q => q.Id, values: {1,2,5,7,8,9}, batchSize: 3)

would effectively become the following two queryables, where the numbers are supplied as variables that allow EF to parameterize the query rather than embed them as constants:

query.Where(q => q.Id == 1 || q.Id == 2 || q.Id == 5)
query.Where(q => q.Id == 7 || q.Id == 8 || q.Id == 9)

EF then converts those LINQ expressions into this SQL expression: WHERE Id IN (@p__linq__0, @p__linq__1, @p__linq__2)

Check out the source here: https://gist.github.com/kroymann/e57b3b4f30e6056a3465dbf118e5f13d

Here’s an alternative solution I discovered thanks to a semi-related SO post by @davidbaxterbrowne:

var records = await _context.Items
	.AsNoTracking()
	.FromSql(
		"SELECT * FROM dbo.Items WHERE Id IN (SELECT value FROM OPENJSON({0}))",
		JsonConvert.SerializeObject(itemIds.ToList()))
	.ToListAsync();

It’s specific to SQL Server 2017+ (or Azure SQL Server), and in my testing with a list of 50,000 IDs against a remote Azure SQL database it was 5x faster than a standard batching solution (dividing into lists of 1,000 IDs).

@kroymann Works like a charm! Big Thanks!

public static class QueryableExtension<TQuery>
    {
        internal static IEnumerable<IQueryable<TQuery>> WhereIn<TKey>(IQueryable<TQuery> queryable,
            Expression<Func<TQuery, TKey>> keySelector, IEnumerable<TKey> values, int batchSize)
        {
            List<TKey> distinctValues = values.Distinct().ToList();
            int lastBatchSize = distinctValues.Count % batchSize;
            if (lastBatchSize != 0)
            {
                distinctValues.AddRange(Enumerable.Repeat(distinctValues.Last(), batchSize - lastBatchSize));
            }

            int count = distinctValues.Count;
            for (int i = 0; i < count; i += batchSize)
            {
                var body = distinctValues
                    .Skip(i)
                    .Take(batchSize)
                    .Select(v =>
                    {
                        // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                        Expression<Func<TKey>> valueAsExpression = () => v;
                        return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                    })
                    .Aggregate((a, b) => Expression.OrElse(a, b));
                if (body == null)
                {
                    yield break;
                }

                var whereClause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);
                yield return queryable.Where(whereClause);
            }
        }

        // doesn't use batching
        internal static IQueryable<TQuery> WhereIn<TKey>(IQueryable<TQuery> queryable,
            Expression<Func<TQuery, TKey>> keySelector, IEnumerable<TKey> values)
        {
            TKey[] distinctValues = values.Distinct().ToArray();
            if (distinctValues.Length == 0)
            {
                return Enumerable.Empty<TQuery>().AsQueryable();
            }

            var body = distinctValues
                .Select(v =>
                {
                    // Create an expression that captures the variable so EF can turn this into a parameterized SQL query
                    Expression<Func<TKey>> valueAsExpression = () => v;
                    return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                })
                .Aggregate((a, b) => Expression.OrElse(a, b));

            var whereClause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);
            return queryable.Where(whereClause);
        }
    }

Usage:

int[] a = Enumerable.Range(1, 10).ToArray();
var queries = QueryableExtension<User>.WhereIn(dbContext.Users, t => t.Id, a, 5);
foreach (var queryable in queries)
{
    _ = queryable.ToArray();
}

var everything = QueryableExtension<User>.WhereIn(dbContext.Users, t => t.Id, a);
_ = everything.ToArray(); 

@yv989c great to hear! In my benchmark above, XML performed worse than the current EF technique (expansion to constants), but I may not have optimized that technique as much as I should have. In any case, I think that at the point where we are, simply supporting JSON and allowing an opt-out back to the current technique for old SQL Server versions is good enough.

I’m currently actively working on this, as part of a much larger change that should allow fully queryable parameters, constants and even columns across different providers. Stay tuned.

It is good to see that “advanced” solutions have been proposed here. Is it feasible to create a rather simple but quick fix first? Simply making the list a bunch of parameters (e.g. @p0, @p1, ...) would be much better than the literals. That way there’s one plan for each list count in contrast to one plan for each set of constants.

Also, such queries don’t aggregate well in logging systems like Application Insights, because every query has unique SQL.

@mrlife unfortunately not - this is quite a non-trivial optimization, and we’re now in a feature freeze, stabilizing the release. I’m very hopeful we’ll be able to tackle this for 7.0.

Been investigating solutions in the meantime and found that the string_split function idea from above can be separated out into its own queryable for flexible reuse. For example…

Add keyless entity to your model:

public class StringSplitResult
{
    public string Value { get; set; }
}

Some methods like this on your context:

public IQueryable<string> AsQuery(IEnumerable<string> values) =>
    SplitString(string.Join(",", values));

public IQueryable<string> SplitString(string values, string separator = ",") =>
    Set<StringSplitResult>()
        .FromSqlInterpolated($"SELECT Value FROM string_split({values}, {separator})")
        .Select(x => x.Value);

You could do something like this:

var codes = new[] {"CODE1", "CODE2"};
var item = context.Set<Entity>()
    .Where(entity => context.AsQuery(codes).Contains(entity.Code))
    .FirstOrDefault();

Which produces SQL:

Microsoft.EntityFrameworkCore.Database.Command: Information: Executed DbCommand (21ms) [Parameters=[p0='CODE1,CODE2' (Size = 4000), p1=',' (Size = 4000)], CommandType='Text', CommandTimeout='30']
SELECT TOP(1) [e].[Id], [e].[Code]
FROM [Entities] AS [e]
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT Value FROM string_split(@p0, @p1)
    ) AS [s]
    WHERE ([s].[Value] = [e].[Code]) OR ([s].[Value] IS NULL AND [e].[Code] IS NULL))

@snappyfoo we’re still working out the mechanics in our planning process and the GitHub project board. The important thing is that this issue has the consider-for-next-release label, which means it’s definitely on the table for 6.0.

I just wanted to share here some experiments I did some time ago with different alternative approaches to Enumerable.Contains using existing EF Core APIs:

https://github.com/divega/ContainsOptimization/

The goal was both to find possible workarounds, and to explore how we could deal with Contains in the future.

I timed the initial construction and first execution of the query with different approaches. This isn’t a representative benchmark because there is no data and it doesn’t measure the impact of caching, but I think it is still interesting to see the perf sensitivity to the number of parameters.

The code tests 4 approaches:

Test("standard Contains", context.People.Where(p => (ids.Contains(p.Id))));

Test("Parameter rewrite", context.People.In(ids, p => p.Id));

Test("Split function", 
    context.People.FromSql(
        $"select * from People p where p.Id in(select value from string_split({string.Join(",", ids)}, ','))"));

Test("table-valued parameter", 
    context.People.FromSql(
        $"select * from People p where p.Id in(select * from {context.CreateTableValuedParameter(ids, "@p0")})"));

Standard Contains

This is included as a baseline and is what happens today when you call Contains with a list: the elements of the list get inlined into the SQL as literals, so the SQL cannot be reused.

Parameter rewrite

This approach tries to rewrite the expression to mimic what the compiler would produce for a query like this:

var p1 = 1;
var p2 = 2; 
var p3 = 3;
context.People.Where(p => (new List<int> {p1, p2, p3}.Contains(p.Id))).ToList();

It also “bucketizes” the list, meaning that it only produces lists of specific lengths (powers of 2), repeating the last element as necessary, with the goal of favoring caching.

This approach only works within the limits of 2100 parameters in SQL Server, and sp_executesql takes a couple of parameters, so the size of the last possible bucket is 2098.

Overall this seems to be the most expensive approach using EF Core, at least on initial creation and execution.
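The bucketization step described above can be sketched like this (a hedged illustration, not the repo’s actual code; assumes a non-empty value list):

```csharp
using System;
using System.Collections.Generic;

// Pad the value list up to the next power of two by repeating the last element,
// so only one SQL string per bucket size (1, 2, 4, 8, ...) ever needs to be
// compiled and cached. The filler duplicates don't change the IN(...) result.
static List<T> Bucketize<T>(IReadOnlyList<T> values)
{
    if (values.Count == 0)
        throw new ArgumentException("Bucketization assumes at least one value.");

    var bucketSize = 1;
    while (bucketSize < values.Count)
        bucketSize <<= 1;

    var padded = new List<T>(values);
    while (padded.Count < bucketSize)
        padded.Add(values[^1]);

    return padded;
}
```

So 5 values become a bucket of 8, with the last value repeated three times, and any list between 5 and 8 values reuses the same cached SQL and plan.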

Split function

This approach was mentioned by @NickCraver and @mgravell as something that Dapper can leverage. I implemented it using FromSql. It is interesting because it leads to just one (potentially very long) string parameter and it seems to perform very well (at least in the first query), but the STRING_SPLIT function is only available in SQL Server 2016 and later.

Table-valued parameter

For this I took advantage of the ability to create table-valued parameters that contain an IEnumerable<DbDataRecord> (I used a list rather than a streaming IEnumerable, but not sure that matters in this case) and pass them as a DbParameter in FromSql. I also added a hack to define the table type for the parameter if it isn’t defined. Overall this seems to be the second most lightweight approach after the split function.

@yv989c @pfritschi I’ve done some work in the recent days to improve the SQL quality around this, specifically #30976 which switches to using IN instead of EXISTS where possible, and #30983 which switches to OPENJSON with WITH where ordering preservation isn’t required (e.g. with IN/EXISTS). All this work should hopefully make it into preview6, and should make the EF-generated SQL as optimal as possible.

#30984 tracks synthesizing an ordering value for parameter collections where ordering is needed; I don’t intend to work on this at the moment.

Hey @roji , maybe I can further elaborate on the issues described by @m-gasser:

Our initial implementation did not have the WITH clause and we had issues in production this way. So at least for us, it will not be good enough without the WITH clause, we tried.

Can you share more details about this? What specific problems did you have?

If you use OPENJSON without a data type definition for the values, SQL Server will emit the warning Type conversion in expression (CONVERT_IMPLICIT(int,OPENJSON_DEFAULT.[value],0)) may affect "CardinalityEstimate" in query plan choice. As the warning explains, an incorrect cardinality estimate can lead to inefficient query plans, because this estimate is a key factor in determining which join and index operations are used. You will notice this issue if you work with bigger data sets or have many joins in your query.

[screenshot: simple_query_no_with]

However, if you define the data type of your JSON values with WITH (...), SQL Server is able to estimate the cardinality correctly and you will get a more stable query execution plan.

We also size the JSON parameter to VARCHAR(100), VARCHAR(4000) or VARCHAR(max), depending on how many parameters are provided. I cannot remember the details, but I guess this allows SQL Server to keep different query plans optimized for varying numbers of parameters.

Here as well I’d be interested in understanding this better - any concrete info would be very helpful.

When your JSON values are strings and you want to provide the data type (e.g. OPENJSON(@display_names) WITH ([value] VARCHAR(100) '$') AS [n] WHERE [n].[value] = [u].[DisplayName]), you have to make sure that the VARCHAR length (in this example 100) is consistent for the same query. You should use the target column definition if possible, or if you cannot use the column definition, you should try to use gradations.

[screenshot: query_plan_text]

In the picture above I have executed the same queries with different parameters, once with a consistent VARCHAR length and once with a length calculated dynamically from the longest string in the parameter. As you can see, if you use different VARCHAR lengths, SQL Server will think that it is a different query and generate a new query plan. This causes a lot of overhead for each execution and also prevents SQL Server from using the Query Store and Plan Cache efficiently.

[screenshot: query_plan_generated]

After further tests I also noticed that (as mentioned by @m-gasser) you get different query execution plans if the JSON string is declared with different lengths. So EXEC sp_executesql '...', N'@json_string VARCHAR(4000)', @json_string='["a","b","c"]' will generate a different query plan than EXEC sp_executesql '...', N'@json_string VARCHAR(2000)', @json_string='["a","b","c"]', even if the query and data type definitions are exactly the same.
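The size “gradation” idea described above can be sketched as a tiny helper (illustrative only; the thresholds are the ones mentioned in the comment, and SqlParameter conventionally uses Size = -1 for VARCHAR(max)):

```csharp
// Grade the declared size of the JSON string parameter into a few fixed tiers,
// so sp_executesql sees one of only a handful of type declarations, instead of
// one declaration (and one query plan) per distinct payload length.
static int GradedParameterSize(int jsonLength) =>
    jsonLength <= 100 ? 100
    : jsonLength <= 4000 ? 4000
    : -1; // -1 maps to VARCHAR(max) in SqlParameter.Size
```

Every payload up to 100 characters then shares one declaration (and plan), every payload up to 4000 another, and everything larger goes to VARCHAR(max).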

NOTE: See additional benchmarking below with SqlClient EnableOptimizedParameterBinding

Did a quick perf investigation of parameterized vs. non-parameterized WHERE x IN (...) with lots of values.

Key notes and takeaways:

  • This benchmark does not factor in the price of plan cache pollution, so it shows non-parameterized queries in a better light than the reality. In a real application with many query types, the non-parameterized IN queries would cause other queries to get evicted, affecting their perf in a potentially significant way. In this benchmark there’s just one SQL, so that obviously doesn’t show.
  • Parameterized queries are indeed slower on SQL Server. A somewhat significant gap starts to show even with 10 values (~230us or ~28%), with the gap seemingly widening with the number of values. The 1000 element case seems anomalous with a huge gap, but only in the “not found case”.
  • In PG they’re also a tiny bit slower, but far less so than on SQL Server. With 1000 IN values, the difference is 871us (8%) and 283us (2%) for the “found” and “not found” cases. This small difference is almost certainly not worth the price of losing command preparation (due to pollution) in the general case.
  • Note that PG already has an efficient translation for this LINQ construct, since it has 1st class array support.
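For reference, the efficient PostgreSQL translation mentioned above looks roughly like this with Npgsql (a sketch; the table and column names are placeholders):

```csharp
using Npgsql;

// A single array parameter with = ANY(...): the SQL text is identical for any
// number of values, so the prepared statement and its plan can always be reused.
static NpgsqlCommand BuildAnyCommand(NpgsqlConnection connection, int[] values)
{
    var command = connection.CreateCommand();
    command.CommandText = "SELECT id FROM data WHERE int = ANY(@p)";
    command.Parameters.AddWithValue("p", values);
    return command;
}
```

Because PostgreSQL has first-class array types, the whole value list travels as one typed parameter rather than as N scalar parameters or inlined literals.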

To summarize, I hope we can end up with a solution that doesn’t use the SQL IN construct at all, thereby avoiding all of this (see the INNER JOIN-based solutions above).

Results as a text-only table
| Method | NumValues |   Database | IsFound | IsParameterized |        Mean |     Error |    StdDev |      Median |
|------- |---------- |----------- |-------- |---------------- |------------:|----------:|----------:|------------:|
|   Test |         1 | PostgreSQL |   False |           False |    304.7 us |   8.26 us |  24.36 us |    306.6 us |
|   Test |         1 | PostgreSQL |   False |            True |    304.9 us |   8.67 us |  25.56 us |    311.0 us |
|   Test |         1 | PostgreSQL |    True |           False |    311.0 us |   8.43 us |  24.71 us |    314.0 us |
|   Test |         1 | PostgreSQL |    True |            True |    314.7 us |   9.22 us |  27.20 us |    317.2 us |
|   Test |        10 | PostgreSQL |   False |           False |    331.9 us |  14.88 us |  43.88 us |    343.4 us |
|   Test |        10 | PostgreSQL |   False |            True |    344.7 us |  11.92 us |  35.16 us |    346.0 us |
|   Test |        10 | PostgreSQL |    True |           False |    350.0 us |  12.44 us |  36.69 us |    356.4 us |
|   Test |        10 | PostgreSQL |    True |            True |    357.2 us |  11.32 us |  33.37 us |    359.6 us |
|   Test |       100 | PostgreSQL |   False |           False |    671.5 us |  34.80 us | 101.51 us |    656.9 us |
|   Test |       100 | PostgreSQL |   False |            True |    724.5 us |  48.19 us | 142.10 us |    716.4 us |
|   Test |       100 | PostgreSQL |    True |           False |    719.6 us |  50.41 us | 147.83 us |    695.3 us |
|   Test |       100 | PostgreSQL |    True |            True |    730.6 us |  50.48 us | 148.86 us |    715.9 us |
|   Test |      1000 | PostgreSQL |   False |           False |  9,738.7 us |  53.92 us |  45.02 us |  9,746.8 us |
|   Test |      1000 | PostgreSQL |   False |            True | 10,021.4 us | 199.92 us | 273.65 us |  9,981.0 us |
|   Test |      1000 | PostgreSQL |    True |           False |  9,299.2 us | 182.40 us | 355.76 us |  9,247.7 us |
|   Test |      1000 | PostgreSQL |    True |            True | 10,170.5 us | 201.69 us | 207.12 us | 10,190.8 us |

|   Test |         1 |  SQLServer |   False |           False |    541.9 us |  10.80 us |  18.34 us |    548.0 us |
|   Test |         1 |  SQLServer |   False |            True |    546.4 us |  10.88 us |  11.64 us |    546.3 us |
|   Test |         1 |  SQLServer |    True |           False |    547.8 us |  10.86 us |  25.81 us |    554.7 us |
|   Test |         1 |  SQLServer |    True |            True |    552.8 us |  10.89 us |  16.63 us |    554.0 us |
|   Test |        10 |  SQLServer |   False |           False |    582.3 us |  11.44 us |  20.63 us |    590.0 us |
|   Test |        10 |  SQLServer |   False |            True |    808.9 us |  23.28 us |  68.65 us |    825.0 us |
|   Test |        10 |  SQLServer |    True |           False |    587.7 us |  11.72 us |  28.53 us |    595.7 us |
|   Test |        10 |  SQLServer |    True |            True |    817.7 us |  26.45 us |  77.99 us |    845.9 us |
|   Test |       100 |  SQLServer |   False |           False |  1,046.6 us |  63.10 us | 186.05 us |  1,111.1 us |
|   Test |       100 |  SQLServer |   False |            True |  1,853.9 us |  76.86 us | 226.62 us |  1,878.4 us |
|   Test |       100 |  SQLServer |    True |           False |  1,088.7 us |  57.03 us | 168.16 us |  1,134.6 us |
|   Test |       100 |  SQLServer |    True |            True |  1,900.9 us |  99.60 us | 293.69 us |  1,949.8 us |
|   Test |      1000 |  SQLServer |   False |           False |  2,522.5 us |  67.90 us | 199.15 us |  2,524.7 us |
|   Test |      1000 |  SQLServer |   False |            True | 20,636.4 us | 329.94 us | 275.52 us | 20,653.9 us |
|   Test |      1000 |  SQLServer |    True |           False | 20,642.5 us | 410.18 us | 770.42 us | 20,602.6 us |
|   Test |      1000 |  SQLServer |    True |            True | 20,818.2 us | 412.19 us | 677.24 us | 20,777.1 us |
Benchmark code
using System.Data.Common;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;
using Npgsql;

BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    private DbConnection _connection;
    private DbCommand _command;

    private const int Rows = 100000;

    [Params(1, 10, 100, 1000)]
    public int NumValues { get; set; }

    [Params("SQLServer", "PostgreSQL")]
    public string Database { get; set; }

    [Params(true, false)]
    public bool IsFound { get; set; }

    [Params(true, false)]
    public bool IsParameterized { get; set; }

    private Guid _existingGuid;

    [GlobalSetup]
    public Task Setup()
        => Database switch
        {
            "SQLServer" => SetupSqlServer(),
            "PostgreSQL" => SetupPostgreSQL(),
            _ => throw new NotSupportedException("Unknown database: " + Database)
        };

    private async Task SetupSqlServer()
    {
        var connection = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false");
        await connection.OpenAsync();
        var command = connection.CreateCommand();
        command.CommandText =
"""
DROP TABLE IF EXISTS data;
CREATE TABLE data (id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWSEQUENTIALID(), some_int INT);
""";
        await command.ExecuteNonQueryAsync();

        foreach (var chunk in Enumerable.Range(0, Rows).Chunk(1000))
        {
            var builder = new StringBuilder("INSERT INTO data (some_int) VALUES ");
            for (var i = 0; i < chunk.Length; i++)
            {
                if (i > 0)
                    builder.Append(", ");
                builder.Append($"({chunk[i]})");
            }

            command.CommandText = builder.ToString();
            await command.ExecuteNonQueryAsync();
        }

        command.CommandText = "SELECT TOP(1) id FROM data";
        _existingGuid = (Guid)(await command.ExecuteScalarAsync())!;

        var builder2 = new StringBuilder("SELECT some_int FROM data WHERE id IN (");

        command.Parameters.Clear();

        if (IsParameterized)
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");

                builder2.Append("@p" + i);

                command.Parameters.AddWithValue("p" + i, i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid());
            }
        }
        else
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");
                builder2
                    .Append("'")
                    .Append(i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid())
                    .Append("'");
            }
        }

        builder2.Append(")");
        command.CommandText = builder2.ToString();

        _connection = connection;
        _command = command;
    }

    private async Task SetupPostgreSQL()
    {
        var connection = new NpgsqlConnection("Server=localhost;Username=test;Password=test");
        await connection.OpenAsync();
        var command = connection.CreateCommand();
        command.CommandText =
"""
DROP TABLE IF EXISTS data;
CREATE TABLE data (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), some_int INT);
""";
        await command.ExecuteNonQueryAsync();

        foreach (var chunk in Enumerable.Range(0, Rows).Chunk(1000))
        {
            var builder = new StringBuilder("INSERT INTO data (some_int) VALUES ");
            for (var i = 0; i < chunk.Length; i++)
            {
                if (i > 0)
                    builder.Append(", ");
                builder.Append($"({chunk[i]})");
            }

            command.CommandText = builder.ToString();
            await command.ExecuteNonQueryAsync();
        }

        command.CommandText = "SELECT id FROM data LIMIT 1";
        _existingGuid = (Guid)(await command.ExecuteScalarAsync())!;

        var builder2 = new StringBuilder("SELECT some_int FROM data WHERE id IN (");

        command.Parameters.Clear();

        if (IsParameterized)
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");

                builder2.Append("$" + (i + 1));

                command.Parameters.Add(new()
                {
                    Value = i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid()
                });
            }
        }
        else
        {
            for (var i = 0; i < NumValues; i++)
            {
                if (i > 0)
                    builder2.Append(", ");
                builder2
                    .Append("'")
                    .Append(i == NumValues - 1 && IsFound ? _existingGuid : Guid.NewGuid())
                    .Append("'");
            }
        }

        builder2.Append(")");
        command.CommandText = builder2.ToString();

        _connection = connection;
        _command = command;
    }

    [Benchmark]
    public Task<object> Test()
        => _command.ExecuteScalarAsync();

    [GlobalCleanup]
    public ValueTask Cleanup()
        => _connection.DisposeAsync();
}

I see @roji. Thanks for taking the time to explain. On my side I see how my current approach might lead to unexpected results when evaluating Skip/Take on the server-side.

Hey @roji, I have fixed this in the latest release of QueryableValues. It now provides ordering guarantees, among other fixes and enhancements. Besides higher memory allocations, the performance cost resulting from the bigger payload and additional complexity was minimal.

Note that if there’s evidence of an actual performance difference between the two variants, there’s the option of emitting OPENJSON with WITH when ordering is irrelevant (e.g. for Contains), but without it where ordering needs to be preserved. But since this can add some complexity, I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

I ran a slim version of my Contains benchmarks using the latest version of QueryableValues with Microsoft.EntityFrameworkCore.SqlServer version 7.0.5 and 8.0.0-preview.4.23259.3. Unfortunately, the preview version performed significantly worse on types other than Int32. However, it does fix the query plan cache pollution issue.

Here are the BenchmarkDotNet results:

Server instance specs

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
Oct  8 2022 05:58:25
Copyright (C) 2022 Microsoft Corporation
Express Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 22621: ) (Hypervisor)

Benchmarks

I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

In our company we currently use a custom SqlServerQuerySqlGenerator to generate WHERE conditions in the form Foo IN (SELECT Value FROM OPENJSON(@jsonstr) WITH (Value CHAR(14))). Our initial implementation did not have the WITH clause, and we had issues in production that way. So at least for us, it will not be good enough without the WITH clause; we tried. I hope this anecdotal evidence is good enough for you to implement this.

We also size the JSON parameter as VARCHAR(100), VARCHAR(4000), or VARCHAR(max), depending on how many parameters are provided. I cannot remember the details, but I guess this allows SQL Server to keep different query plans optimized for varying numbers of parameters.
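That sizing strategy can be sketched roughly as follows (the bucket thresholds come from the comment above; the helper name is hypothetical):

```csharp
// Hypothetical helper: declare the JSON parameter with the smallest bucket
// size that fits the payload, so that queries whose payloads land in the
// same bucket can share a cached plan.
static string GetJsonParameterType(int payloadLength) => payloadLength switch
{
    <= 100 => "VARCHAR(100)",
    <= 4000 => "VARCHAR(4000)",
    _ => "VARCHAR(MAX)"
};
```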

Hello everyone! This is my take on providing a solution to this problem. It’s now even more flexible by allowing you to compose a complex type.

I just added some benchmarks showing great results.

Please give it a try. I’m looking forward to your feedback.

Links:

I’d like to contribute some insights that might be related to the issue @michaelmesser is discussing, as I encountered a similar problem while implementing OPENJSON in QueryableValues.

During performance tests, I discovered that the XML approach significantly outperformed OPENJSON when handling a large number of elements. The issue became apparent when I composed the output of the OPENJSON function against a physical table: the larger the physical table, the worse OPENJSON performed. A closer examination of the execution plan revealed that the query engine was reevaluating/parsing (my best interpretation) the JSON data provided to OPENJSON for each row (kinda) in the physical table. The following image showcases the metrics from my reproduction of the issue: [image: execution-plan metrics]

Per the execution plan, OPENJSON was being executed multiple times, despite its input being constant. If this is true, then CPU cycles were being wasted.

After extensive experimentation and trying various complex approaches, I managed to persuade the query engine to execute the OPENJSON function only once. The outcome can be seen in the following image: [image: execution-plan metrics after the tweak]

After this tweak, OPENJSON surpassed the performance of all other methods. This improvement was achieved by projecting a dummy column and subsequently sorting the OPENJSON output by it.
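A minimal sketch of that tweak, with hypothetical table and parameter names (the exact SQL QueryableValues emits may differ):

```sql
-- Hypothetical sketch: project a dummy column from OPENJSON and sort by it.
-- This coaxes the engine into materializing the JSON output once instead of
-- re-parsing the input for each outer row. (TOP is needed to make the
-- ORDER BY legal inside the subquery.)
SELECT d.some_int
FROM data AS d
WHERE d.id IN (
    SELECT TOP (9223372036854775807) CAST(j.[value] AS uniqueidentifier)
    FROM (
        SELECT [value], 0 AS [dummy]
        FROM OPENJSON(@ids)
    ) AS j
    ORDER BY j.[dummy]
);
```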

This was my experience when testing with the following SQL Server instance specs:

Microsoft SQL Server 2022 (RTM) - 16.0.1000.6 (X64)
Oct  8 2022 05:58:25
Copyright (C) 2022 Microsoft Corporation
Express Edition (64-bit) on Windows 10 Pro 10.0 <X64> (Build 22621: ) (Hypervisor)

I hope this information proves helpful.

Up to 1 GB ???

@StefanOverHaevgRZ we’ll try to put this in for preview4, but there’s a chance that might not happen and it’ll only be in preview5.

And yes - the new solution utilizes JSON, so the number of values in the array is no longer important.

Per the above, JSON seems to be the right candidate for the job because of its availability across DB technologies […]

FWIW the other candidate (parameterized IN) is also available across all DBs.

On the other hand, instead of scoping this to only the use of Contains, would it be possible to consider leveraging the power of the JSON format to allow us to compose any in-memory collection with the DB entities?

That’s #30426, which I mentioned in the summary; and yes, the OPENJSON approach wouldn’t be limited only to Contains. However, at this point it’s unlikely we’ll focus on collection parameters which aren’t over simple primitive types (e.g. composite keys which involve tuples), since that will probably introduce some significant additional complexity. I’ll definitely keep it in mind during implementation, but as @ErikEJ wrote, it’s important to focus on this very specific issue (the 2nd most highly voted in this repo) and not let scope creep compromise it.

Note that OPENJSON can’t always be used (older SQL Server versions; spatial types are incompatible with it). In those cases we’ll fall back to the current behavior (IN with constants).

How do you plan to implement this? Via a DbOptions setting for older compat levels? Or by querying the compat level?

A DbOptions setting. We’ll most likely default to using OPENJSON, and you’d have a context option to request compatibility with older versions (a minor breaking change for people using older databases). I think we’ve already had other cases where this was needed, so it makes sense.
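As a sketch, such a context option could look like the following (the option name and shape are illustrative, not a committed API):

```csharp
using Microsoft.EntityFrameworkCore;

public class MyContext : DbContext
{
    private const string ConnectionString = "...";

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        // Hypothetical shape: requesting an older compatibility level would
        // make the provider translate Contains as IN with constants instead
        // of OPENJSON (which requires SQL Server 2016+).
        => optionsBuilder.UseSqlServer(
            ConnectionString,
            o => o.UseCompatibilityLevel(120));
}
```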

Maybe: This function is only available with compatibility level 130 or higher (SQL Server 2016 and later and Azure SQL Database)

Thanks, edited the text.

IIRC SQL Server TVPs require an explicit ahead-of-time CREATE TYPE, so probably not appropriate for this kind of thing. But JSON is probably a possibility (via OPENJSON).

Interesting discussion.

It would be interesting to see benchmark results with the upcoming JSON native type which “will allow you to store JSON documents in a native binary format that is optimized for storage and query performance”. It’s probably more relevant for column data type, but one can also declare JSON parameters/variables.

DECLARE @json JSON = N'
[
    {
        "OrderNumber": "S043659",
        "Date":"2022-05-24T08:01:00",
        "AccountNumber":"AW29825",
        "Price":59.99,
        "Quantity":1
    },
    {
        "OrderNumber": "S043661",
        "Date":"2022-05-20T12:20:00",
        "AccountNumber":"AW73565",
        "Price":24.99,
        "Quantity":3
    }
]';

Although I’m not sure if it’s possible to use this type via C# code.

@canadar only in EF Core 8 - this is quite a serious change/feature, we don’t backport this kind of thing to patch releases (see our planning/release docs).

@yv989c apologies for the long delay, I was travelling.

I’m not an expert on SQL Server internals by any means. I mainly rely on the information provided by the execution plan and stats (e.g. SET STATISTICS) in an effort to make an educated assessment.

Oh absolutely - and all your comments/feedback here is greatly appreciated; you’ve certainly gone deeper here than I have.

IME the more info you provide to the query engine, the better it can optimize its plan.

That’s definitely true and I also would prefer to use OPENJSON with WITH rather than apply casting in the projection, if only for SQL cleanliness. Note that if there’s evidence of an actual performance difference between the two variants, there’s the option of emitting OPENJSON with WITH when ordering is irrelevant (e.g. for Contains), but without it where ordering needs to be preserved. But since this can add some complexity, I’d rather have some proof that it’s worth it first… Any help you can provide on this would be greatly appreciated - otherwise I can do some investigating too.

@michaelmesser thanks for your repro - I can see the (quite strange) phenomenon occurring. I’ll investigate this internally and post back here with the results.

@yv989c yeah, the OPENJSON form with WITH unfortunately doesn’t provide the array index to order by. Note that Azure Synapse does provide an array identity function (docs), but this unfortunately isn’t yet supported elsewhere. I’ve received confirmation that one should not rely on the order of rows being the same as the input array - an ORDER BY is necessary in order to guarantee that; as a result I’ve implemented this without the WITH clause, and added casting on the projected column instead.
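Concretely, the WITH-less form described above might look like this sketch (parameter name illustrative): [key] is OPENJSON’s array index, so ordering by it preserves the input array’s order.

```sql
-- Hypothetical sketch: no WITH clause; cast in the projection, and order
-- by the array index ([key]) to guarantee the input array's order.
SELECT CAST([j].[value] AS int) AS [Value]
FROM OPENJSON(@__ints) AS [j]
ORDER BY CAST([j].[key] AS int)
```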

I attached SQL to reproduce OPENJSON with WithUnorderedPrefetch and without WithUnorderedPrefetch. Run TestCreate.sql, TestInsert.sql, and then TestQuery.sql. While the difference in this example is hundreds of milliseconds, the performance difference with the real data is many seconds.

Test.zip

I tried to use OPENJSON and STRING_SPLIT to work around this issue. With all the relevant pages in memory everything works as expected. However, when pages have to be read from disk the performance is very slow. OPENJSON and STRING_SPLIT have low estimates. This leads to the INNER LOOP JOIN not having WithUnorderedPrefetch enabled. Without prefetching, the query has physical reads instead of read-ahead reads. Tricking the cardinality estimator into giving a higher estimate fixes the issue, but results in less readable code. How will EF Core’s solution handle this case?

Oh I see…

I’m not sure to what extent we need to actually look at the SQL Server compatibility level in those cases… The thing with this issue is that there’s a fallback: in modern SQL Server we use JSON, otherwise we fall back to IN+constants (since Contains must work, we can’t have it broken on old versions).

With STRING_AGG and CONCAT_WS there’s no “bad” translation to fall back to if they’re not supported, AFAIK. So a compatibility check would at most allow us to provide the user with a more informative “you need to upgrade” error, rather than a cryptic SQL Server “syntax error”.

In any case, CONCAT_WS is tracked by #28899, I already have an old PR #28900 which I’ll need to finish at some point. STRING_AGG is already in - we can indeed add a check to throw an informative message there (though that seems somewhat low-value).

@Meligy we’re definitely aware of that approach - see the very first post above.

I’ve been doing some research on this question and will soon post a plan of action here.

@ErikEJ true - ideally we’d have raw ADO.NET benchmarks that show bare database performance; but they do provide an indication. We’ll likely do some such benchmarking when this is actually implemented for EF.

Thanks!

Thanks, curious as to how that actually works, will dig into it

@danielgreen I’m glad you’re curious. You can find those details here. 🙂

@danielgreen looks like it is passing an XML string, not a TVP

I’d like to be able to run an ad-hoc query passing in an array of ints, the way you can pass a table-valued parameter to a stored proc. Does the QueryableValues extension enable that?

@danielgreen You can find that precise example here.


Credit where credit is due on the OPENJSON suggestion 😃 https://github.com/dotnet/efcore/issues/13617#issuecomment-491399486

Indeed!

[…] whether anything can be done to influence the estimate, as EF Core would know the actual number before sending the query.

Sure… Though we’d probably want to avoid anything that involves embedding a (constant) query hint or similar in the SQL, as that would bring us back to the original problem (SQL which depends on the collection size).

Thanks @ErikEJ for the link! Yeah, ideally JSON would perform at least nearly as well, allowing us to just always use that and not worry about delimiters.

@stevendarby Looks like @aaronbertrand did quite a lot of testing in this area already, as always “it depends” but json is not all bad: https://sqlperformance.com/2016/04/sql-server-2016/string-split-follow-up-2

@roji I said as much in my posts, so I don’t think I’ve oversold it as an option for some to explore; definitely some work to do for the general case 😃

@stevendarby it’s an interesting direction, but at least in the general case there are also varying numbers of values in the IN clause, so it’s limited in any case. That’s why an INNER JOIN-based solution could be a better way forward here.

@cjstevenson - The value you provided as an example does not cause any SQL injection. EF Core escapes quotes properly in the SQL. Generated SQL, for example:

      SELECT [b].[Id], [b].[Value]
      FROM [Blogs] AS [b]
      WHERE [b].[Value] IN (N'legit_value\''); SELECT * FROM some_table limit 10; -- ')

This is not a security issue. This issue has nothing to do with values and SQL injection; please file a new issue with repro steps if you are seeing a bug. Please refer to the security policy if you are submitting a report for a security issue.

This might be interesting for others: here’s how we solved the problem using LINQKit.Core:

var predicate = PredicateBuilder.New<User>();
foreach (var id in ids)
{
	predicate = predicate.Or(user => user.Id == id);
}

var throwAway = context.Users.Where(predicate).ToList();

This will generate a parameterized query that looks like this:

SELECT ...
FROM   [Users] AS [u]
WHERE  [u].[Id] = @id1 
    OR [u].[Id] = @id2
    OR [u].[Id] = @id3
    ...