efcore: Cosmos: Deal with missing property values
Since Cosmos DB is schema-less database, it is possible to reference a property that is not defined in all documents in a query. For example, the following query:
SELECT c.MissingProperty FROM Customers c
Returns the following results for a collection that contains three documents that do not contain MissingProperty:
[
{},
{},
{}
]
When EF Core models mapped to Cosmos DB evolve, we expect that it will be common to use a new version of an entity type that contains a property that is not defined in existing documented already stored in the database.
From the perspective of materialization, this could be dealt with by just skipping properties that are missing in the store. This would result in the properties on the objects to keep whatever value they were initialized to. For example, for optional properties, a missing value in the store would become equivalent to the property being null.
However there is an important caveat with this approach: because of how indexing works in Cosmos DB, queries that reference the missing property somewhere else than in the projection could return unexpected results. For example:
-
If a property that is missing in some documents is referenced in a predicate that tests it against null, only documents that contain the property will be returned
-
If a property that is missing in some documents is referenced in the sort expression or an ORDER BY clause, documents that contain the property with any value (including null) will be sorted, but documents that do not define the properties will be filtered out because they are not in the index used to resolve ORDER BY
Although for ORDER BY there is a way we could compensate by issuing two separate queries (the first one to get all the data, and the second one to get the order an potentially less data), it seems that this could be relatively expensive. This approach would not help for the WHERE clause case because it could require all the data from the collection to be retrieved.
But for WHERE we could use IS_DEFINED (see https://github.com/aspnet/EntityFrameworkCore/issues/13131#issuecomment-418491753).
What we can do?
The alternatives I can see are:
- Make sure the behavior is well documented
- At some point come up with “schema” evolution tooling that makes sure new properties are added to existing documents
- Try to figure out a way (probably with help of annotations in the model) we can warn when properties are used in queries which could be missing values in existing documents.
- Add an extension method to EF.Functions that maps to IS_DEFINED() (see https://github.com/aspnet/EntityFrameworkCore/issues/13131#issuecomment-418491753) so that it can be explicitly used in LINQ queries.
- Compensate for null navigation properties (see https://github.com/aspnet/EntityFrameworkCore/issues/13131#issuecomment-416392815) and scalar properties by expanding IS NULL to also check for NOT IS_DEFINED().
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 6
- Comments: 22 (16 by maintainers)
Triage decisions:
IS_DEFINED(<property>)on EF.Functions for Cosmos DBc.Address.City == addresswill expand toc.Address == null OR c.Address.City == nullwhenaddressis equal to null.@AndriySvyryd, ok, I will.
@NickSevens I got an answer that might be useful:
Another related aspect of this is shown in the following query:
This query will only return documents that have an Address in which the City set to null. If the whole address is null or missing in a document, that document will not be returned.
Assuming that we agree that getting the documents in which the whole Address is null is the most expected behavior, we can easily compensate by rewritting the predicate to add “null protection”:
But still, for any document in which the Address property is completely missing, there doesn’t seem to be anything reasonable we can do to compensate.