elasticsearch-net: Deserialization in parallel causes invalid objects

NEST/Elasticsearch.Net version: 5.6.6

Elasticsearch version: 5.6.3

Description of the problem including expected versus actual behavior: Actual: When deserializing a GeoJSON object using NTS, the properties data is copied from one entity to another. Expected: Data should not “leak” between entities.

Steps to reproduce:

  1. Query the database using search in parallel
  2. Test the results

It doesn’t happen every time; it is intermittent, which I believe points to a timing/multithreading issue. The code can be seen here: https://github.com/IsraelHikingMap/Site/blob/master/IsraelHiking.DataAccess/ElasticSearchGateway.cs#L381L401

Expected behavior: There shouldn’t be a “memory override”.

Provide ConnectionSettings (if relevant): https://github.com/IsraelHikingMap/Site/blob/master/IsraelHiking.DataAccess/ElasticSearchGateway.cs#L66L75

Most of the information can be found in the following thread. I don’t know whether the problem is here or in the deserialization code in the NTS library… 😦 The linked comment and the discussion that follows it are the relevant part. There is also an Elasticsearch dump there. https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/issues/46#issuecomment-636530155

Provide DebugInformation (if relevant): Can be found in the above issue.

Any help would be greatly appreciated. Running on .NET Core 3.1 on Windows Server 2012 R2.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 25 (8 by maintainers)

Most upvoted comments

@HarelM Stating that no one wants to take care of it is a mischaracterisation. At the moment, there is not a clear reproducible example to demonstrate the issue, requiring an investment in time and effort to come up with one and investigate. In addition, the issue is for a version no longer supported and it’s not clear that the issue is with the client. This is why I think a minimal reproducible example is so important; it removes all extraneous variables, reducing the problem space to one that can expedite investigation 😃

The fact that ES is not creating a new serializer every time is what I believe is causing this issue.

Json.NET’s JsonSerializer is thread safe as far as I know. The use of StreamingContext makes assumptions about how JsonSerializer is consumed, which may not hold for a JsonSerializer outside of the control of where such assumptions are made. For example, in the case of MultiSearch where a given search response is tied to a document type T, a new serializer may be created to handle that

https://github.com/elastic/elasticsearch-net/blob/641df53f5fa646648bc9cfb8d14d5ef11041f30a/src/Nest/CommonAbstractions/ConnectionSettings/ConnectionSettingsBase.cs#L100-L101

which may break the assumptions that the usage of StreamingContext is relying on. A simple reproducible example on a supported version would help to better understand what is at play.
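To make the suspected failure mode concrete, here is a minimal, language-agnostic sketch in Python. It is not Json.NET, NEST, or NTS code. It assumes, purely for illustration, that a deserializer stashes parsed properties on state owned by a shared serializer instance; the names `SharedContext`, `SharedSerializer`, `read_properties`, `build_feature`, `deserialize_interleaved` and `deserialize_isolated` are all hypothetical. The interleaving of two parallel calls is simulated deterministically instead of using real threads, so the outcome does not depend on timing.

```python
class SharedContext:
    """Hypothetical stand-in for mutable state attached to one serializer."""

    def __init__(self):
        self.pending_properties = None


class SharedSerializer:
    """Hypothetical serializer instance reused across requests."""

    def __init__(self):
        self.context = SharedContext()

    def read_properties(self, doc):
        # Step 1: stash the parsed properties on the shared context.
        self.context.pending_properties = dict(doc["properties"])

    def build_feature(self, doc):
        # Step 2: build the feature from whatever the context holds now.
        return {"id": doc["id"], "properties": self.context.pending_properties}


def deserialize_interleaved(serializer, doc_a, doc_b):
    """Simulate two parallel deserializations whose steps interleave."""
    serializer.read_properties(doc_a)
    serializer.read_properties(doc_b)  # overwrites the state stored for doc_a
    feature_a = serializer.build_feature(doc_a)
    feature_b = serializer.build_feature(doc_b)
    return feature_a, feature_b


def deserialize_isolated(doc):
    """Same work, but all intermediate state is local to the call."""
    properties = dict(doc["properties"])
    return {"id": doc["id"], "properties": properties}
```

With two documents `a` and `b`, the interleaved version returns feature `a` carrying the properties of `b`, which is the kind of leak described in this issue. The isolated version cannot leak because nothing outlives the call.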

Great to hear that the issue is now resolved, @HarelM. Closing this issue.

I have updated the server with the package and removed my workaround, and it looks like this issue is resolved. Let me know if you would like me to close it, or do anything else. Thanks everyone for the time and effort in helping me address this issue. Truly amazing teamwork! 😃

I’ve opened NetTopologySuite/NetTopologySuite.IO.GeoJSON#59 to remove the use of serializer context; its usage isn’t integral to NetTopologySuite’s deserialization approach, and can be rewritten to not use it.

NetTopologySuite.IO.GeoJSON v2.0.4 includes this fix.