lorawan-stack: Class C messages intermittently fail to schedule with invalid absolute time

Summary

Class C messages intermittently fail to schedule with “invalid absolute time set in application downlink” when closely spaced.

Is there a minimum time between messages to the same device? Ideally we would have them scheduled 130ms apart but have tried to increase the spacing to 1000ms to mitigate the issue

Steps to Reproduce

  1. Schedule downlink at current time + 7 sec to gateway X
  2. Schedule downlink at current time + 8 sec to gateway X
  3. Wait for 10 seconds and observe the mqtt topic DowlinkFailed
  4. If required run in a loop to increase the chance of reproducing the error

We see this approximately every 15 dowlinks

What do you see now?

17/11/2020 14:09:30.783 +1300	Sending TTN V3 multicast. topic: v3/halter/devices/2704aee7-7c77-4c99-8d26-cf110b1a90a7/down/push. Payload: {"downlinks":[{"f_port":1,"frm_payload":"ChQKEgoQEg4IAxIFDbQiy10YASDjAQ==","priority":"ABOVE_NORMAL","class_b_c":{"absolute_time":"2020-11-17T01:09:37.000Z","gateways":[{"gateway_ids":{"gateway_id":"eui-00800000a0003582"}}]}}]}
17/11/2020 14:09:30.883 +1300	Sending TTN V3 multicast. topic: v3/halter/devices/2704aee7-7c77-4c99-8d26-cf110b1a90a7/down/push. Payload: {"downlinks":[{"f_port":1,"frm_payload":"ChQKEgoQEg4IAxIFDbQiy10YASDjAQ==","priority":"ABOVE_NORMAL","class_b_c":{"absolute_time":"2020-11-17T01:09:38.000Z","gateways":[{"gateway_ids":{"gateway_id":"eui-00800000a0005314"}}]}}]}
17/11/2020 14:09:37.068 +1300	Serial 2704aee7-7c77-4c99-8d26-cf110b1a90a7: Receiving DownlinkFailed. Payload: {"end_device_ids":{"device_id":"2704aee7-7c77-4c99-8d26-cf110b1a90a7","application_ids":{"application_id":"halter"},"dev_addr":"01415A3E"},"correlation_ids":["as:downlink:01EQ9VZT44AARKQ3V190WBSXZA","as:up:01EQ9W005A64G2W1Y3GQA9S3SR"],"received_at":"2020-11-17T01:09:37.066774351Z","downlink_failed":{"downlink":{"f_port":1,"f_cnt":56,"frm_payload":"ChQKEgoQEg4IAxIFDbQiy10YASDjAQ==","class_b_c":{"gateways":[{"gateway_ids":{"gateway_id":"eui-00800000a0005314"}}],"absolute_time":"2020-11-17T01:09:38Z"},"priority":"ABOVE_NORMAL","correlation_ids":["as:downlink:01EQ9VZT44AARKQ3V190WBSXZA"]},"error":{"namespace":"pkg/networkserver","name":"absolute_time","message_format":"invalid absolute time set in application downlink","code":3}}}
17/11/2020 14:09:37.068 +1300	Serial 2704aee7-7c77-4c99-8d26-cf110b1a90a7: Receiving DownlinkFailed. Payload: {"end_device_ids":{"device_id":"2704aee7-7c77-4c99-8d26-cf110b1a90a7","application_ids":{"application_id":"halter"},"dev_addr":"01415A3E"},"correlation_ids":["as:downlink:01EQ9VZT44AARKQ3V190WBSXZA","as:up:01EQ9W005A64G2W1Y3GQA9S3SR"],"received_at":"2020-11-17T01:09:37.066774351Z","downlink_failed":{"downlink":{"f_port":1,"f_cnt":56,"frm_payload":"ChQKEgoQEg4IAxIFDbQiy10YASDjAQ==","class_b_c":{"gateways":[{"gateway_ids":{"gateway_id":"eui-00800000a0005314"}}],"absolute_time":"2020-11-17T01:09:38Z"},"priority":"ABOVE_NORMAL","correlation_ids":["as:downlink:01EQ9VZT44AARKQ3V190WBSXZA"]},"error":{"namespace":"pkg/networkserver","name":"absolute_time","message_format":"invalid absolute time set in application downlink","code":3}}}

What do you want to see instead?

No downlink failures and messages emitted from the gateway

Environment

The Things Stack for LoRaWAN: ttn-lw-stack Version: 3.9.4 Build date: 2020-09-23T09:56:19Z Git commit: c4be55c Go version: go1.15.2 OS/Arch: linux/amd64

How do you propose to implement this?

Determine the causes for the invalid absolute time and resolve if there is a bug

How do you propose to test this?

Happy to test a PR in our dev environment

Can you do this yourself and submit a Pull Request?

@rvolosatovs ?

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 15 (15 by maintainers)

Commits related to this issue

Most upvoted comments

@virtualguy @kurtmc can you verify that this is resolved on latest v3.11?

Note the minor bump, you need to run DB migrations, see https://github.com/TheThingsNetwork/lorawan-stack/blob/e56f7f70e60dba8c1ad584411fb63a8c35659e7c/CHANGELOG.md#3110---2021-02-10

We’ll be rolling a 3.11.1 release today.

fyi we are having some troubling running binaries we have built so blocked on #3736 before we can gather logs from your patch @johanstokking

@virtualguy I added some extra debug lines. Pick any of:

  1. Apply patch investigate-3487.txt
  2. Cherry-pick https://github.com/TheThingsNetwork/lorawan-stack/commit/f9bbda5db5090dd7eb7bc2d798c92bed82efc2dd
  3. Check-out https://github.com/TheThingsNetwork/lorawan-stack/tree/investigate/3487-abs-downlink-timing (based on v3.10.7)

Please grep output by #3487 and copy here. It’s also the only output that goes to stdout (as logs go to stderr).

If it says in ScheduleAt that there are too few RTTs, you might want to increase TTN_LW_EXP_RTT_TTL (duration, like 6h for 6 hours, default is 30m for 30 minutes) and/or decrease TTN_LW_EXP_SCHEDULE_MIN_RTT_COUNT to 3 or something (default is 5). These are temporary feature flags.

cc @ymgupta