dapr: [Proposal] - New Building Block: DocumentStore
Background:
As one of the maintainers of components-contrib, it is very apparent to me that some state stores and their respective use cases are unlike others. MongoDB, RethinkDB, and some uses of PostgreSQL are examples: they store data (optionally, in the case of PostgreSQL) very differently from other state store components.
A DocumentStore allows accessing individual nested properties of a document. These can also be queried. Importantly, data types are retained on the nested properties in many document stores.
Interface
All DocumentStore components should have the following:
- Get Document (by key)
- Multi Get Documents (by key)
- Create Document
- Replace Document
- Delete Document
- Multi Delete Documents
- Query (Find) Documents: Query API support (native support for searching within documents)
- Update Document: an HTTP PATCH operation which can be used to replace nested document attributes. This should support a query filter, since many document stores allow updating multiple documents matching a query.
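The operation list above could be expressed as a component interface along the following lines. This is only a sketch: the class name, method names, parameters, and return types are hypothetical and are not defined by the proposal.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Sequence

Document = Dict[str, Any]


class DocumentStore(ABC):
    """Hypothetical interface; all names and signatures are illustrative."""

    @abstractmethod
    def get(self, collection: str, key: str,
            fields: Optional[Sequence[str]] = None) -> Optional[Document]:
        """Get Document (by key), optionally projecting `fields`."""

    @abstractmethod
    def multi_get(self, collection: str, keys: Sequence[str],
                  fields: Optional[Sequence[str]] = None) -> List[Document]:
        """Multi Get Documents (by key)."""

    @abstractmethod
    def create(self, collection: str, document: Document) -> str:
        """Create Document; assumed here to return the new document's key."""

    @abstractmethod
    def replace(self, collection: str, key: str, document: Document) -> None:
        """Replace Document."""

    @abstractmethod
    def delete(self, collection: str, key: str) -> None:
        """Delete Document."""

    @abstractmethod
    def multi_delete(self, collection: str, keys: Sequence[str]) -> None:
        """Multi Delete Documents."""

    @abstractmethod
    def query(self, collection: str, filter: Document,
              fields: Optional[Sequence[str]] = None) -> List[Document]:
        """Query (Find) Documents using the store's native search."""

    @abstractmethod
    def update(self, collection: str, filter: Document,
               changes: Document) -> int:
        """Update Document(s) matching `filter`; assumed to return a count."""
```

A concrete component (for example one backed by MongoDB) would subclass this and implement every method; the abstract base class cannot be instantiated on its own.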
The Query and Get operations should support projection, i.e. filtering which attributes/properties are returned. The DocumentStore does this natively where supported; where it is not natively supported, the component implementation has to do it.
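For stores without native projection, the component-side fallback might look roughly like this. It is a minimal sketch that assumes fields are requested as dot-separated paths (such as `address.city`); the proposal does not specify a path syntax, and `project` is a hypothetical helper name.

```python
import copy
from typing import Any, Dict, Iterable


def project(document: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a new document containing only the requested dot-separated paths.

    Paths that do not exist in the document are silently skipped.
    """
    result: Dict[str, Any] = {}
    for path in fields:
        parts = path.split(".")

        # Walk down the source document to find the requested value.
        value: Any = document
        found = True
        for part in parts:
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                found = False
                break
        if not found:
            continue

        # Recreate the same nesting in the result and store a copy of the value.
        target = result
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = copy.deepcopy(value)
    return result


doc = {"name": "x", "age": 3, "address": {"city": "Oslo", "zip": "0150"}}
projected = project(doc, ["name", "address.city", "missing.field"])
```

Here `projected` contains `name` and the nested `address.city` only; the original document is left unchanged.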
Note: There is no intention to be compatible with data written / stored via state stores as this can lead to inefficient and complex design / implementation decisions as well as anti-patterns. However, the DocumentStore should be able to read data created by non-Dapr sources.
Content Type support requirements
- BSON (application/bson): This is the default content type that all document stores must support, as it contains data type information.
- JSON (application/json): This should be supported, but its use is generally discouraged as it is a lossy format which, for example, cannot distinguish between integer and float data types.
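To illustrate the type information point, the sketch below hand-encodes a single numeric BSON element. In BSON every element starts with a type byte (0x01 for a 64-bit double, 0x10 for a 32-bit integer, 0x12 for a 64-bit integer), whereas JSON only has one "number" type. The helper name `encode_bson_number` is made up for this example; a real component would use a BSON library rather than this code.

```python
import json
import struct


def encode_bson_number(name: str, value) -> bytes:
    """Encode one numeric BSON element: type byte, C-string key, value bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans are a separate BSON type, not handled here")
    key = name.encode("utf-8") + b"\x00"
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)   # double
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)      # int64
    raise TypeError(f"unsupported type: {type(value).__name__}")


as_int = encode_bson_number("n", 1)      # first byte is 0x10
as_float = encode_bson_number("n", 1.0)  # first byte is 0x01

# A JSON text of `1` carries no such tag: the reader sees an integer even if
# the writer held a float and its encoder printed it without a fraction.
parsed = json.loads('{"n": 1}')["n"]
```

Some JSON encoders do print a float such as 1.0 as `1` (JavaScript's `JSON.stringify` is one), so after a round trip the reader has no way to tell which type was intended.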
As a consequence of this proposal:
- The Query API (Alpha) should eventually be deprecated from State Store (it can coexist until DocumentStores are stable).
- MongoDB, RethinkDB, PostgreSQL, AWS DocumentDB, Azure CosmosDB, and possibly others should also be made available as DocumentStores.
- SDKs need to implement support for this new building block and support BSON encoding/decoding.
Potential REST API (request parameters and details not included here).
Create document:
POST http://localhost:<daprPort>/v1.0/document/<storename>/<collection>
Replace Document:
PUT http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>
Update Document:
PATCH http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>
Get Document by ID:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>
Get Multiple Documents by ID:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>
Delete Document by ID:
DELETE http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/<docid>
Delete Multiple Documents by ID:
DELETE http://localhost:<daprPort>/v1.0/document/<storename>/<collection>
Query Documents:
GET http://localhost:<daprPort>/v1.0/document/<storename>/<collection>/query
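As a rough illustration of how a client might map operations onto the routes listed above, here is a small request builder. The operation names, the `build_request` helper, and the port value used in the example are assumptions; request bodies and parameters are left out because the proposal does not define them.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# operation name (hypothetical) -> (HTTP method, whether a document ID is required)
ROUTES = {
    "create": ("POST", False),
    "replace": ("PUT", True),
    "update": ("PATCH", True),
    "get": ("GET", True),
    "multi_get": ("GET", False),
    "delete": ("DELETE", True),
    "multi_delete": ("DELETE", False),
    "query": ("GET", False),
}


def build_request(operation: str, dapr_port: int, store: str, collection: str,
                  doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (method, url) for one of the proposed DocumentStore routes."""
    method, needs_id = ROUTES[operation]
    if needs_id and doc_id is None:
        raise ValueError(f"{operation} requires a document ID")
    url = "http://localhost:{}/v1.0/document/{}/{}".format(
        dapr_port, quote(store, safe=""), quote(collection, safe=""))
    if needs_id:
        url += "/" + quote(doc_id, safe="")
    elif operation == "query":
        url += "/query"
    return method, url


# Example values only: port 3500, store "mystore", collection "users".
get_req = build_request("get", 3500, "mystore", "users", "42")
query_req = build_request("query", 3500, "mystore", "users")
```

Path segments are percent-encoded so that store names, collection names, or document IDs containing `/` or spaces do not change the route.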
About this issue
- State: open
- Created 2 years ago
- Reactions: 18
- Comments: 27 (20 by maintainers)
Well… the community has spoken! I think this looks like a clear mandate that this proposal needs revisiting and executing!
I think this is a good direction as it brings focus and expands Dapr’s state management features:
The Query API allows querying Redis, which is a very useful feature in use today. Will this ability be retained with Redis as a document store?
Added Bulk Save and Bulk Delete to be clearer.
I need to look a bit more at common APIs. We could require Save, Get, Delete etc to support one or more documents. Not sure that these really need to be distinct APIs. They could all support one or more items.
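One way to picture the "one or more items" idea is an API where each operation accepts either a single key or a collection of keys. The sketch below uses an in-memory dictionary as a stand-in for a real store; `InMemoryCollection` and `normalize_keys` are hypothetical names, not part of any Dapr API.

```python
from typing import Any, Dict, Iterable, List, Union

Keys = Union[str, Iterable[str]]


def normalize_keys(keys: Keys) -> List[str]:
    """Treat a single key as a one-element list."""
    if isinstance(keys, str):
        return [keys]
    return list(keys)


class InMemoryCollection:
    """Toy stand-in for a document collection, for illustration only."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def save(self, docs: Dict[str, Dict[str, Any]]) -> None:
        # One call covers both single and bulk save: pass one or many entries.
        self._docs.update(docs)

    def get(self, keys: Keys) -> Dict[str, Dict[str, Any]]:
        # Missing keys are simply absent from the result.
        return {k: self._docs[k] for k in normalize_keys(keys) if k in self._docs}

    def delete(self, keys: Keys) -> int:
        removed = 0
        for k in normalize_keys(keys):
            if k in self._docs:
                del self._docs[k]
                removed += 1
        return removed


users = InMemoryCollection()
users.save({"1": {"name": "a"}, "2": {"name": "b"}, "3": {"name": "c"}})
one = users.get("1")
many = users.get(["2", "3", "missing"])
removed = users.delete(["1", "2"])
```

With this shape there is no separate bulk method per operation; whether that is preferable to distinct single and multi endpoints is exactly the open question in the comment above.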
No reason, these are perfectly valid document stores.
That was my concern as well when first reading this, but unlike blobs, which fit existing k/v semantics in terms of API, the document store interface has enough distinct and domain-specific endpoints/methods to justify a new API. As far as users are concerned, there won’t be much difference between adding the endpoints to the existing API and creating a new one, save for a new component type. Even that, I think, goes in the direction of clarity, as users won’t need to look at yet another column in the state components table to see which components support document operations.
@berndverst I noticed the work going on around improving the Cosmos Query API, good stuff. I am however concerned about the strong guidance you’ve expressed a few times to avoid the Query API. Do you know if there has been any discussion around the DocumentStore building block recently in light of the above community vote?
I was just listening to the discussion in the community call 71 and I would vote that Dapr keeps its own query language abstraction. If some of the component manifestations then in turn can use MongoDB API to talk to multiple providers - fine - but I would not make it any kind of a dependency.
@fabistb this proposal would also make more sense for most of our state store scenarios
Yes, data written by non-Dapr users in general should be compatible. We should entirely shy away from specialized data representation @yaron2.
So in that sense state store data can be accessed too but you would need to understand the internals of Dapr state store to do that.
I’m lazy and didn’t feel like listing every component. Yes those components you mentioned also should be supported DocumentStores. Couldn’t think of them in the moment.
It’s not a good experience when only a small subset of state store components has a certain feature. As @yaron2 mentioned, the feature matrix gets too complex.
What is worse, it is incredibly confusing when querying of documents is completely dependent on the content-type chosen when saving state. This way it is currently possible to save state with MongoDB without actually being able to query the state.
DocumentStores should only support content types that are guaranteed to be queryable.
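A component could enforce this at write time with a check along these lines. This is a sketch under the assumption that the queryable set is exactly the two content types named in the proposal and that a missing content type falls back to BSON; `require_queryable` is a hypothetical helper.

```python
from typing import Optional

# Assumed set, taken from the content types named in the proposal.
QUERYABLE_CONTENT_TYPES = frozenset({"application/bson", "application/json"})
DEFAULT_CONTENT_TYPE = "application/bson"


def require_queryable(content_type: Optional[str]) -> str:
    """Return the normalized media type, or raise if it cannot be queried."""
    if not content_type:
        return DEFAULT_CONTENT_TYPE
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in QUERYABLE_CONTENT_TYPES:
        raise ValueError(f"content type {media_type!r} is not queryable")
    return media_type


accepted = require_queryable("Application/JSON; charset=utf-8")
defaulted = require_queryable(None)
```

Rejecting the write up front avoids the situation described above, where data is saved successfully but can never be found through a query.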
Many DocumentStores have additional capabilities to update multiple documents at once (all that match a certain query / condition), only replacing certain sub properties - that capability for example would not make sense within the context of the current state stores.
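The multi-document update described here can be pictured with an in-memory stand-in. The sketch assumes changes are given as dot-separated paths mapped to new values and that the filter is a plain Python predicate; real document stores express both in their own query languages, and `update_many` and `set_path` are hypothetical names.

```python
from typing import Any, Callable, Dict


def set_path(document: Dict[str, Any], path: str, value: Any) -> None:
    """Set a nested property, creating intermediate objects where needed."""
    parts = path.split(".")
    target = document
    for part in parts[:-1]:
        child = target.get(part)
        if not isinstance(child, dict):
            child = {}
            target[part] = child
        target = child
    target[parts[-1]] = value


def update_many(documents: Dict[str, Dict[str, Any]],
                predicate: Callable[[Dict[str, Any]], bool],
                changes: Dict[str, Any]) -> int:
    """Apply `changes` to every document matching `predicate`; return the count."""
    matched = 0
    for doc in documents.values():
        if predicate(doc):
            for path, value in changes.items():
                set_path(doc, path, value)
            matched += 1
    return matched


docs = {
    "1": {"status": "new", "meta": {"owner": "a", "tier": 1}},
    "2": {"status": "done", "meta": {"owner": "b", "tier": 1}},
}
count = update_many(docs, lambda d: d["status"] == "new", {"meta.tier": 2})
```

Only the targeted sub-property changes; sibling properties such as `meta.owner` are preserved, which a key/value state store that replaces whole values cannot express.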
The fact that the Query API is currently part of the State Store interface has led to some hacky attempts at implementing Query API support for state stores which do not truly support it. We should provide a more consistent experience in querying documents, and we should not let a user write data to a DocumentStore which subsequently cannot be queried.
Redis with RedisJSON is a document store – so I think we could add that particular flavor as a supported DocumentStore @yaron2