amplify-cli: Appsync api _version with Elastic Search @searchable directive (trigger fails).

Describe the bug: When I add a document with a _version property to a type annotated with @model and @searchable in the GraphQL schema, the DbToEs trigger fails with the following error:

{
    "index": {
        "_index": "entity",
        "_type": "doc",
        "_id": "Google_123123123123123123|User",
        "status": 400,
        "error": {
            "type": "illegal_argument_exception",
            "reason": "Field [_version] is defined twice in [doc]"
        }
    }
}

Without the _version property there are no errors.
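
The error above has the shape of a single item in an Elasticsearch bulk response. As a minimal sketch, assuming the trigger has the parsed response available as a dict, failed items could be collected for logging like this. The helper name `failed_items` and the sample response are hypothetical and not part of the generated Lambda.

```python
def failed_items(bulk_response):
    """Return (doc_id, reason) pairs for every bulk item that reports an error."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item has one key naming the action (index, create, update, delete).
        for result in item.values():
            error = result.get("error")
            if error:
                failures.append((result.get("_id"), error.get("reason")))
    return failures


# Hypothetical response: one successful item and one rejected item.
response = {
    "errors": True,
    "items": [
        {"index": {"_index": "entity", "_type": "doc", "_id": "a", "status": 201}},
        {"index": {"_index": "entity", "_type": "doc", "_id": "b", "status": 400,
                   "error": {"type": "illegal_argument_exception",
                             "reason": "Field [_version] is defined twice in [doc]"}}},
    ],
}

for doc_id, reason in failed_items(response):
    print(f"{doc_id}: {reason}")
```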

About this issue

State: closed
Created 4 years ago
Comments: 37 (17 by maintainers)

Most upvoted comments

@sacrampton In the file amplify/backend/api/<api-name>/transform.conf.json, remove the “ResolverConfig” key, then run amplify api gql-compile followed by amplify push.

kaustavghosh06 on Apr 1, 2020

@sacrampton Understood. It might take a couple of weeks to bake this into the CLI, as the work needs to be added to our sprint.

yuth on Apr 22, 2020

I opened an issue aws-amplify/amplify-cli#3818 which I now believe is identical to this one.

I believe that _version is a metadata property in ElasticSearch and that the record is therefore not syncing.

I’ve enabled Conflict Resolution with amplify update api, and I have no idea how to disable it, given that for now it doesn’t work with ElasticSearch and that ElasticSearch is core to our solution.

If _version is reserved in ElasticSearch, can we change the mapping in the Lambda function so that the value is written to version instead of _version in ElasticSearch?

sacrampton on May 17, 2022
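
The renaming suggested in the comment above could look roughly like the sketch below. It assumes the streaming function holds each record as a plain dict before sending it to ElasticSearch. The names `RESERVED_FIELD_RENAMES` and `rename_reserved_fields` are hypothetical and do not come from the generated Lambda.

```python
# Hypothetical mapping: source field name -> name to use in the indexed document.
RESERVED_FIELD_RENAMES = {"_version": "version"}


def rename_reserved_fields(doc, renames=RESERVED_FIELD_RENAMES):
    """Return a copy of doc with reserved top-level keys renamed.

    Note: if doc already contains the target name (for example both
    "_version" and "version"), the key that appears later wins.
    """
    return {renames.get(key, key): value for key, value in doc.items()}


record = {"id": "Google_123|User", "name": "Example", "_version": 3}
print(rename_reserved_fields(record))
```

Only top-level keys are touched here, which matches where DataStore places _version on a record; nested objects are passed through unchanged.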

Hi @edwardfoyle - I have tested this and you have made some progress.

I tested this with a deleted record, in which case _deleted is set to true and _ttl is added. But it fails when you try to stream a record with a _ttl field (see the CloudWatch error below). So at the moment we can’t deal with deleted items in ElasticSearch.

            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "Field [_ttl] is a metadata field and cannot be added inside a document. Use the index API request parameters."
            }

Once the above is fixed, we need list and search queries to be able to filter on _deleted, and we need to be able to “undelete” a record by setting _deleted and _ttl to null.

sacrampton on May 11, 2020

Also note, if you go the route of implementing @yuth’s solution, you will need to change the first part of the path on this line to be your ElasticSearch index name

edwardfoyle on Apr 28, 2020

@undefobj - just taking a moment to answer your specific questions:

Are you doing anything that needs specific Elasticsearch querying capabilities such as Geo-search or can you use @key sorting & filtering of DynamoDB?

=> The primary thing that we need on the app is Geo-search. If I’m online I can search for the ID’s of records that are within a distance of where I’m standing in my app. I can then locate those ID’s in DataStore for editing - and the edits on DataStore should sync back to DynamoDB.

=> I am not looking to do any ElasticSearch type queries when offline. However, if I did want to do a powerful ElasticSearch query when online I could similarly get that ID and locate that record ID in datastore for further editing.

If you are leveraging specific full text search functionality, do they need to happen when the system is offline? We might be able to add this in the future but need to understand it better.

=> I don’t need this capability to happen offline. I don’t have any expectation that this sort of capability - including geo queries - is available offline. Of course it would be nice if it did, but I don’t expect that and can most certainly live without it.

If you’re looking to just have the results populated back down to the client from Elasticsearch as offline persistence, would it be ok if it’s “read only”? Meaning Elasticsearch results could come down to clients with a DataStore.query but you cannot update data. This might be a possible roadmap item.

=> I’m not overly fussed on how we populate/hydrate the DataStore cache. At the moment we hydrate our offline cache with @searchable queries and connected DynamoDB gets/lists (i.e. search for all assets of a particular type at a particular facility and get connected photos, defects, classes & characteristics, documents, etc. for those searched assets).

It would be nice to hydrate DataStore from an ElasticSearch query. We never write back to ElasticSearch - it always goes back to DynamoDB and the lambda function sends it to ElasticSearch. But that is not a high priority. Hydrating the cache from ElasticSearch just gives you more flexibility to select items to go into the cache. My only priority is to stop the _version conflict stopping streaming from DynamoDB to ElasticSearch.

sacrampton on Apr 21, 2020
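
For readers unfamiliar with the geo-search use case described in the comment above (find the IDs of records within a distance of the current position), here is a minimal sketch of an Elasticsearch geo_distance query body. The field name `location`, the default distance, and the helper `nearby_ids_query` are assumptions for illustration. The field would need to be mapped as a geo_point in the index.

```python
import json


def nearby_ids_query(lat, lon, distance="500m", field="location"):
    """Build a search body matching documents within `distance` of (lat, lon).

    "_source": False asks Elasticsearch to return hits without the document
    body, so each hit carries only metadata such as _id.
    """
    return {
        "_source": False,
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        },
    }


print(json.dumps(nearby_ids_query(-31.95, 115.86, distance="1km"), indent=2))
```

The returned IDs could then be looked up in DataStore for editing, as the comment describes.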

Yes this might be possible, we assumed you didn’t want that but if you do then we can look at making that the preferred mechanism.

undefobj on Apr 21, 2020

@undefobj - didn’t mean to diminish the size of the effort here - you are right, I’m not qualified to make that assessment.

Rather than strip out the _version field, would it be possible to enable external versioning in ElasticSearch when your function sends the data? That way the _version field in both DynamoDB and ElasticSearch would be the same.

If stripping out the _version is the only option at least it keeps it working - but it would be awesome if we could keep the _version the same in ES & DDB.

sacrampton on Apr 21, 2020
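
The external-versioning idea in the comment above might be sketched as follows: the streaming function removes _version from the document body and passes its value in the bulk action metadata with version_type set to external. This is an illustration under assumptions, not the CLI's implementation. The helper `bulk_index_lines` is hypothetical, the `version`/`version_type` metadata keys should be checked against the Elasticsearch version in use (older releases also used underscore-prefixed spellings), and `_type` only applies to clusters that still have mapping types.

```python
import json


def bulk_index_lines(index, doc_id, doc, doc_type="doc"):
    """Build the two newline-delimited JSON lines for one bulk index action."""
    # Keep _version out of the document body, where Elasticsearch rejects it.
    source = {key: value for key, value in doc.items() if key != "_version"}
    meta = {"_index": index, "_type": doc_type, "_id": doc_id}
    version = doc.get("_version")
    if version is not None:
        # Let DynamoDB's version drive Elasticsearch's internal version number.
        meta["version"] = version
        meta["version_type"] = "external"
    return json.dumps({"index": meta}) + "\n" + json.dumps(source) + "\n"


print(bulk_index_lines("entity", "Google_123|User", {"id": "Google_123|User", "_version": 4}))
```

With external versioning, Elasticsearch only accepts a write whose version is higher than the one it has stored, so out-of-order stream records would be rejected rather than overwrite newer data.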

@undefobj - in your separate message “…I am looking into one possible solution that might allow for a write-through from DynamoDB to Elasticsearch, so the versions are controlled in DynamoDB…”

That is precisely what I’ve been saying all along - it would appear that ElasticSearch allows this, provided your streaming function tells ElasticSearch this is the case. This to me should be a simple fix to this problem (I hope).

Today we hydrate the cache by doing ElasticSearch queries (ie. get all assets in a facility downloaded to the cache). It would be nice to allow ElasticSearch to hydrate a DataStore database - but that’s not critical. What is critical is to simply stop the streaming lambda function crashing because of the _version field.

If you can just get it to where conflict detection is not killing ElasticSearch, that will be a huge step forward.

sacrampton on Apr 21, 2020

Hi @kaustavghosh06 / @ammarkarachi - can I get an update on this critical bug? It means that anyone using ElasticSearch cannot properly implement DataStore, which is a HUGE issue, and my progress toward implementing DataStore and releasing my product into the market is dead in its tracks.

There are issues, then there are critical issues - it’s hard to get a more critical issue than killing ElasticSearch - so I’d be grateful if you could get me an update on this.

By removing conflict detection I could return my functionality to be non-DataStore enabled - but I really do need to be able to implement DataStore to get off-line first capability properly working.

At a minimum, if a fix to this is not imminent I suggest you have a big warning on https://aws-amplify.github.io/docs/js/datastore#conflict-resolution to tell people that this is NOT compatible with ElasticSearch and not to proceed with implementing it until it is fixed.

Thanks - appreciate any input you can provide.

sacrampton on Apr 17, 2020