efcore: [Performance] GetManyByKeys - Am I missing something? It´s fast!
Hey,
I´ve recently posted some question here and there in the efcore repo. After some testing/benchmarking and refactoring… I came to the conclusion that the performance is huge!
We took our current ORM to a comparison. Same Keys, same data, same structure and so on. We generated 100.000 Product objects with no navigation properties or whatsoever.
Also we took AsNoTracking and Tracking into account which wasn´t such of a big deal.
*WithTracking - 25.000 objects - no Includes
As you can see, the first two rows represents a custom implementation of GetByKeys in efcore. In this scenario, we´ve took 25.000 keys! (I did it with 50.000 and 100.000 items, too)
It´s literally 3 times faster than our current ORM! Holy smoke!
Now, I came to the conclusion that if a price is too good to be true, then there might be something off. That´s why I am here to ask you, if my implementation cries to be refactored asap.
This is the code
public virtual async ValueTask<ICollection<TEntity>> GetObjectsByKeysAsync<TEntity>(ICollection keys, bool asNoTracking, CancellationToken cancellationToken = default) where TEntity : BaseEntity, new() {
//Check for local entries first and exclude them from the query
LocalView<TEntity> localSet = _context.Set<TEntity>().Local;
List<TEntity> localEntities = localSet.Where(x => keys.OfType<object>().Any(y => x.KeyValue.Equals(y))).ToList();
List<object> missingKeys = keys.OfType<object>().Where(x => !localEntities.Any(y => y.KeyValue.Equals(x))).ToList();
if (!missingKeys.Any())
return localEntities;
//some keys are still missing
string primaryKeyName = _sessionConfigurationProvider.MetadataStore.GetKeyName<TEntity>();
List<TEntity> retrievedValues = new List<TEntity>();
//this chunk is irrelevant to the query, because it seems that efcore generates multiple queries when reaching a certain amount of objects
//I´ve kept it, because otherwise at around 100.000 objects, the dbcommand runs into a timeout
IEnumerable<object[]> keyChunk = missingKeys.Chunk(50000);
foreach (object[] objects in keyChunk) {
Expression<Func<TEntity, bool>> expression = ExpressionTreeResolver.GenerateWhereInExpression<TEntity>(objects,primaryKeyName);
retrievedValues.AddRange(await _context.Set<TEntity>().AsNoTracking(asNoTracking).Where(expression).ToListAsync(cancellationToken));
}
return retrievedValues;
}
This is the ExpressionTree
public static Expression<Func<TEntity, bool>> GenerateWhereInExpression<TEntity>(ICollection objects, string propertyName) where TEntity : BaseEntity {
List<object> keyCollection = objects.OfType<object>().ToList();
ParameterExpression parameterExpression = Expression.Parameter(typeof(TEntity), "entity");
ConstantExpression keysParameter = Expression.Constant(keyCollection, typeof(List<object>));
MemberExpression memberExpression = Expression.Property(parameterExpression, propertyName);
Expression convertExpression = Expression.Convert(memberExpression, typeof(object));
MethodCallExpression containsExpression = Expression.Call(keysParameter, "Contains", new Type[] { }, convertExpression);
return Expression.Lambda<Func<TEntity, bool>>(containsExpression, parameterExpression);
}
It just works. But I´m literally scared to use it 😄 I would really like to get to know if that thing smells or could actually be used.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 23 (16 by maintainers)
Hey @ajcvickers,
thank you for your response. I think we will stick to it until #13617 is resolved. We won´t use it often and it´s likely a rare scenario to actually load such amount of data. Fingers crossed.
The linear search on the tracked entities however, can be refactored to be more efficient. I have got a few ideas in mind already.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.data.sqlclient.sqlcommand.enableoptimizedparameterbinding?view=sqlclient-dotnet-standard-5.0
@mkalinski93 sure thing - and thanks for your comments (I wasn’t trying to imply anything was wrong/problematic in what you wrote).
I think we went off on a tangent here with the parameterization question, which doesn’t seem to have anything to do with your original question. I’ll do some minimal benchmarking specifically for the parameterization, but we should concentrate on your original code, whether it’s correct and whether the performance indicates anything to be concerned about.
@roji Note that in the benchmarks in the attached project, the context is never actually tracking any entities, so while the code to look up local values would probably be quite slow, it is not measured in these benchmarks, which essentially just do a single no-tracking query with a big Contains.
I was going to talk about this with the team in triage. 😃