influxdb-client-python: Occasionally losing data points along with error message: "The batch item wasn't processed successfully because: (400) {"code":"invalid","message":"writing requires points"}"

I’ve been encountering occasional errors with a very simple Python program that writes batches of points. My usage is about as basic as it gets, so I’m not clear why this is happening. Perhaps the batching machinery is improperly creating empty batches and dropping points along the way?

I get many log messages like the following:

The batch item wasn't processed successfully because: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Sat, 11 Apr 2020 23:14:18 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '54', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'x-platform-error-code': 'invalid'})
HTTP response body: {"code":"invalid","message":"writing requires points"}

Observe how the chronograf data CSV is missing some values, such as 0, 9, 27, 30, and 36.

I’ve attached some sample code, the sample local output, and a sample CSV exported from the InfluxDB Explorer UI. The same files are also in this Gist for nicer formatting.

SampleCode.py.txt LocalOutput.txt 2020-04-11-16-47_chronograf_data.csv.txt
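
For readers who can’t open the attachments, here is a minimal sketch of the pattern in question (simplified for readability; parameter values, field names, and connection details are illustrative placeholders, and the attached SampleCode.py.txt is the real code):

import time
from datetime import datetime
from influxdb_client import InfluxDBClient, Point, WriteOptions

# Placeholders; the real values come from my configuration.
client = InfluxDBClient(url="...", token="...", org="...")
write_api = client.write_api(write_options=WriteOptions(batch_size=8, flush_interval=8))

for i in range(50):
    point = Point("pressure").tag("sensor", "sensor1").field("PSI", float(i)).time(datetime.utcnow())
    write_api.write(bucket="...", record=point)
    time.sleep(0.5)

# Note: neither the client nor the batching write_api is explicitly closed in this sketch.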

Configuration Info:

  • InfluxDB version: InfluxDB Cloud 2.0
  • influxdb_client python module version: 1.5.0
  • Python version: 3.7.3
  • OS: Raspbian Linux (Buster)

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

The issue seems to be fixed. Please use a with statement for initializing both the client and the batching write_api; exiting the with block closes the write_api, which flushes any points still sitting in a partially filled batch.

The following script was used for testing:

import time
from datetime import datetime
from influxdb_client import InfluxDBClient, WriteOptions, Point

url = "https://us-west-2-1.aws.cloud2.influxdata.com"
token = "..."
org = "..."
bucket = "..."
measurement = "python-loosing-data_" + str(datetime.now())

# Exiting the with blocks closes the write_api (flushing any buffered batches) and then the client.
with InfluxDBClient(url=url, token=token, debug=False) as client:
    options = WriteOptions(batch_size=8, flush_interval=8, jitter_interval=0, retry_interval=1000)
    with client.write_api(write_options=options) as write_api:
        for i in range(50):
            valOne = float(i)
            valTwo = float(i) + 0.5
            pointOne = Point(measurement).tag("sensor", "sensor1").field("PSI", valOne).time(time=datetime.utcnow())
            pointTwo = Point(measurement).tag("sensor", "sensor2").field("PSI", valTwo).time(time=datetime.utcnow())

            write_api.write(bucket, org, [pointOne, pointTwo])
            print("PSI Readings: (%f, %f)" % (valOne, valTwo))
            time.sleep(0.5)

    # Count the points that actually arrived in the bucket.
    query = f'from(bucket: "{bucket}") |> range(start: 0) |> filter(fn: (r) => r["_measurement"] == "{measurement}") |> count()'
    tables = client.query_api().query(query, org)
    for table in tables:
        for record in table.records:
            print(f'{record.get_measurement()}: {record.get_field()} count: {record.get_value()}')

print("end")

Hi @bednar, SYNCHRONOUS works, thanks!
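
For anyone else who lands here, a minimal sketch of switching to the SYNCHRONOUS write path (connection details, bucket, and field names are placeholders):

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="...", token="...", org="...") as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    point = Point("pressure").tag("sensor", "sensor1").field("PSI", 1.0)
    # A synchronous write blocks until the HTTP request completes, so nothing sits in an internal buffer.
    write_api.write(bucket="...", record=point)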

I am facing the same / very similar issue. I am using the following parameters:

batch_size=1_000, flush_interval=10, retry_interval=1_000

I tried flush intervals of 1, 5, 10, 20, 25 … and didn’t find the exact level at which I start losing data. I don’t need to flush my data this fast, but as @joeyhagedorn said, this shouldn’t occur.

Even though not all of the data is saved to InfluxDB, I don’t get any errors (debug mode is enabled). It might be a memory leak. I don’t see much load on my InfluxDB server. On my “client” server, one core runs at about 80% (I should look into whether I can parallelize the workload).
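
For completeness, those options are applied roughly like this sketch (batch_size, flush_interval, and retry_interval are the values quoted above; everything else is a placeholder):

from influxdb_client import InfluxDBClient, WriteOptions

write_options = WriteOptions(batch_size=1_000, flush_interval=10, retry_interval=1_000)

# debug=True logs every request and response, which is how I verified that no errors are reported.
with InfluxDBClient(url="...", token="...", org="...", debug=True) as client:
    with client.write_api(write_options=write_options) as write_api:
        ...  # write points here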

FYI, server specs (InfluxDB):

  • 16 virtual cores (Intel Xeon Platinum 8176)
  • 64GB RAM (about 12GB in use)
  • SSD storage

The client runs in another VM on the same server, with 32GB RAM.

I’m using client version 1.8.0.dev0; the server is InfluxDB 2.0.0-beta.10 (I will check whether beta 12 solves the issue).