influxdb: Concurrent read and write requests lead to error "query interrupted" on 1.4

System info: InfluxDB 1.4.2 and 1.4.1 on Debian Linux

Steps to reproduce:

Run many query and write requests concurrently against the HTTP endpoint. I was able to reproduce the problem with 3 parallel threads doing SELECT queries and 2 parallel threads doing writes against the same set of series.

Unfortunately I can’t easily provide a shell script to reproduce this, because I’m running a Windows .NET application as the HTTP client. But I will provide a Wireshark file with the observed behaviour.
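A rough stand-in for the load described above could be sketched in Python as follows. This is not the original .NET client; it only assumes the standard InfluxDB 1.x HTTP endpoints (`/query` and `/write`). The host and port, the database name `repro`, the measurement name `load_test`, and the request count are all hypothetical placeholders, and the database is assumed to exist already.

```python
import json
import threading
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8086"  # assumption: InfluxDB listening on its default port
DB = "repro"                        # hypothetical database name, must already exist
MEASUREMENT = "load_test"           # hypothetical measurement name


def query_url(base, db, q):
    """Build a /query URL with the database and InfluxQL statement URL-encoded."""
    return base + "/query?" + urllib.parse.urlencode({"db": db, "q": q})


def write_url(base, db):
    """Build a /write URL for the given database."""
    return base + "/write?" + urllib.parse.urlencode({"db": db})


def reader(base, db, n, errors):
    """Issue n SELECT queries and collect errors embedded in HTTP 200 bodies."""
    url = query_url(base, db, "SELECT * FROM " + MEASUREMENT + " LIMIT 100")
    for _ in range(n):
        # urlopen raises HTTPError on 4xx/5xx, so only 2xx bodies reach json.load.
        with urllib.request.urlopen(url) as resp:
            body = json.load(resp)
        for result in body.get("results", []):
            if "error" in result:
                errors.append(result["error"])


def writer(base, db, n, worker_id):
    """Write n points in line protocol to the same measurement the readers query."""
    url = write_url(base, db)
    for i in range(n):
        line = "{},worker={} value={}".format(MEASUREMENT, worker_id, i).encode()
        req = urllib.request.Request(url, data=line, method="POST")
        urllib.request.urlopen(req).close()


def run(base=BASE_URL, db=DB, readers=3, writers=2, n=1000):
    """Start reader and writer threads together and return the collected errors."""
    errors = []
    threads = [threading.Thread(target=reader, args=(base, db, n, errors))
               for _ in range(readers)]
    threads += [threading.Thread(target=writer, args=(base, db, n, w))
                for w in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `run()` against a test instance returns the list of error strings found inside successful responses. Whether this particular load pattern triggers the problem is untested here; the thread counts only mirror the ones mentioned in the report.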

Expected behavior:

All queries should get a proper result. If server errors occur, they should be logged in the journal and counted in the _internal,http measurement.

Actual behavior:

Some of the queries get an HTTP 200 response, but the response body contains the following JSON string: {"results":[{"statement_id":0, "error":"query interrupted"}]}
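Because the failure arrives inside a successful HTTP response, a client that checks only the status code will miss it. A minimal sketch of a body check, assuming the response shape shown above (the function name `statement_errors` is made up for illustration):

```python
import json


def statement_errors(body_text):
    """Return (statement_id, message) pairs for every error found in a /query body."""
    doc = json.loads(body_text)
    found = []
    # Some failures are reported as a single top-level "error" key.
    if "error" in doc:
        found.append((None, doc["error"]))
    # Others are attached to individual statements inside "results".
    for result in doc.get("results", []):
        if "error" in result:
            found.append((result.get("statement_id"), result["error"]))
    return found
```

An empty list means no error was reported in the body; any other value should be treated as a failed query even though the status code was 200.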

See the following packet numbers in the attached Wireshark file:

  • 204
  • 430
  • 1797

Additional info:

If I run the same (or even a much higher) load against InfluxDB 1.3.7 on the same machine, all queries return the expected result. I also couldn’t see any effect from changing the max-concurrent-queries parameter in the coordinator section of the InfluxDB configuration file. While I recorded the Wireshark file, this parameter was set to 0.

You should easily be able to reproduce this by creating a load similar to the one in the Wireshark file. If you can’t, let me know and I’ll provide additional information.

Attachment: Influxdb-quer_interrupted.pcapng.gz

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

I am going to be investigating how to get the clients to work properly. If I either can’t figure it out or find the conclusion unsatisfying, we’re going to back this out so that it works properly in 1.4.3.

The main reason to keep this is that otherwise a bunch of connections stay open for too long, and you end up with a bunch of open sockets that are stuck in TIME_WAIT.