mapillary_tools: [BUG] Interrupted uploads do not resume

Basic information

  • Release version: 0.9.5
  • System: Linux, probably any
  • Capture Device: any

Steps to reproduce behavior

  1. Start uploading a batch of images, ideally several gigabytes per sequence.
  2. Wait for the server to choke, or deliberately throttle your network bandwidth (potentially down to 0). Pressing Ctrl+C and then resuming with exactly the same command (while the upload session is still alive on the server) should have the same effect.
  3. Observe that either the server or the client resumes at 'offset': 0.

It is unclear whether this is a client or a server bug. My gut feeling says it is a server bug, but I may be wrong. The server may change the chunk size during an upload depending on load, and the client should adapt to that. The server should also always respond with the correct offset, independent of the current chunk size.
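To make the expected flow concrete, here is a minimal sketch of what resuming should look like from the client side: fetch the current offset from the server, seek to it, and keep streaming chunks of whatever size applies at that moment. This is not the actual mapillary_tools code; upload_chunk is a hypothetical placeholder and the session URL is illustrative. Only the GET-for-offset response shape matches the debug logs quoted later in this report.

import requests

# Illustrative session URL; the real per-sequence handle comes from the upload service.
SESSION_URL = "https://rupload.facebook.com/mapillary_public_uploads/<session-handle>"

def fetch_offset(session: requests.Session) -> int:
    """Ask the upload server how many bytes of this session it already has."""
    resp = session.get(SESSION_URL)
    resp.raise_for_status()
    # The server answers with JSON like {"dc": "lla2c16", "offset": 10076160}
    return int(resp.json()["offset"])

def upload_chunk(session: requests.Session, offset: int, chunk: bytes) -> None:
    """Hypothetical placeholder for the actual chunked-upload request."""
    raise NotImplementedError

def resume_upload(path: str, chunk_size: int = 4 * 1024 * 1024) -> None:
    with requests.Session() as session, open(path, "rb") as fp:
        offset = fetch_offset(session)   # should be the offset of the first incomplete chunk
        fp.seek(offset)                  # skip everything the server has already confirmed
        while True:
            chunk = fp.read(chunk_size)  # the server may dictate a different chunk size mid-upload
            if not chunk:
                break
            upload_chunk(session, offset, chunk)
            offset += len(chunk)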

Expected behavior

Per-sequence uploads should resume at the offset of the first incomplete chunk.

Actual behavior

2022-10-13 14:11:23,512 - DEBUG   - Sending upload_fetch_offset via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 0, 'offset': 0, 'retries': 0, 'upload_last_restart_time': 1665663083.5125833, 'upload_first_offset': 0}
2022-10-13 14:11:37,891 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 0, 'offset': 3792867, 'retries': 0, 'upload_last_restart_time': 1665663083.5125833, 'upload_first_offset': 0, 'chunk_size': 3792867}
2022-10-13 14:11:53,101 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 0, 'offset': 7585734, 'retries': 0, 'upload_last_restart_time': 1665663083.5125833, 'upload_first_offset': 0, 'chunk_size': 3792867}
2022-10-13 14:12:08,341 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 0, 'offset': 11378601, 'retries': 0, 'upload_last_restart_time': 1665663083.5125833, 'upload_first_offset': 0, 'chunk_size': 3792867}
2022-10-13 14:12:22,579 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 0, 'offset': 15171468, 'retries': 0, 'upload_last_restart_time': 1665663083.5125833, 'upload_first_offset': 0, 'chunk_size': 3792867}
2022-10-13 14:12:25,712 - WARNING - Error uploading chunk_size 3792867 at offset 0: HTTPError: 412 Client Error: Precondition Failed for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_c043f21e7f5d2f4ab25c1ba5099bf5a7.zip
2022-10-13 14:12:25,712 - INFO    - Retrying in 2 seconds (1/200)
2022-10-13 14:12:28,367 - DEBUG   - Sending upload_fetch_offset via IPC: {'total_sequence_count': 2, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mapillary_public_uploads/mly_tools_00c8a153ca702f5b1d714bd83d0e8362.zip', 'sequence_image_count': 163, 'entity_size': 618237384, 'md5sum': 'c043f21e7f5d2f4ab25c1ba5099bf5a7', 'upload_start_time': 1665663081.9819736, 'upload_total_time': 62.19941592216492, 'offset': 0, 'retries': 1, 'upload_first_offset': 0, 'chunk_size': 3792867, 'upload_last_restart_time': 1665663148.3675609}

Note the 'offset': 0 in the last upload_fetch_offset message; it should have been 'offset': 15171468, because that is the offset of the first incomplete chunk.

All of the above makes it extremely difficult to upload large sequences over low-bandwidth connections: as soon as the load on the server changes (which is very likely during long uploads), the chunk size changes and the client has to restart uploading from offset 0. This is a huge waste of resources!

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 24 (10 by maintainers)

Most upvoted comments

If either of you needs a reference point (debug logs) to work with: I have been using the tools to upload BlackVue videos 24/7 for two weeks now, around 870 GB and 180,000 mp4s so far, and that seems to be working okay. I have not been watching the logs closely, but I have only seen maybe 20 ConnectionErrors similar to the above.

No doubt the fact that I am uploading mp4s rather than jpgs gives a lower chance of failure.

Drop me a note if you would like any analysis or a dump.

@GITNE the offset reset likely happens because when mapillary_tools resumes uploading from a different IP, it may hit a different host, and the offset is not shared across hosts. Hence offset 0.

Note these logs:

2022-10-24 19:40:41,313 - DEBUG   - GET https://rupload.facebook.com/mapillary_public_uploads/mly_tools_test_a981130cf2b7e09f4686dc273cf7187e
2022-10-24 19:40:43,019 - DEBUG   - HTTP response 200: b'{"dc":"lla2c16","offset":10076160}'

Do you get a different dc when the offset becomes 0?
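If it helps, here is a rough diagnostic sketch for watching that: it polls the session endpoint and prints whenever dc changes. It reuses the session URL from the logs above and omits whatever auth headers mapillary_tools normally sends, so treat it as an illustration rather than a ready-made tool.

import time
import requests

# Session URL taken from the logs above; substitute your own session handle.
SESSION_URL = "https://rupload.facebook.com/mapillary_public_uploads/mly_tools_test_a981130cf2b7e09f4686dc273cf7187e"

last_dc = None
for _ in range(60):                                # poll for about a minute
    info = requests.get(SESSION_URL).json()        # e.g. {"dc": "lla2c16", "offset": 10076160}
    if info.get("dc") != last_dc:
        print(f"dc changed: {last_dc!r} -> {info.get('dc')!r} at offset {info.get('offset')}")
        last_dc = info.get("dc")
    time.sleep(1)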

While we are fixing the issue with the team, the best we can do for now is:

  1. retry on these 412 errors immediately (no sleep wait)
  2. upload from a stable network

Note on 1: this does not solve the underlying problem (the same data still gets uploaded multiple times), but it makes uploading faster; see the sketch below.
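Roughly, the idea behind 1 looks like this. This is a minimal sketch, not the actual mapillary_tools retry code; the function name, request shape, and backoff values are illustrative.

import time
import requests

def put_chunk_with_retry(session: requests.Session, url: str, data: bytes,
                         headers: dict, max_retries: int = 200) -> requests.Response:
    """Retry a chunk upload; skip the backoff sleep for 412 responses."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = session.post(url, data=data, headers=headers)
            resp.raise_for_status()
            return resp
        except requests.HTTPError as exc:
            status = exc.response.status_code if exc.response is not None else None
            if status == 412:
                continue                           # 412: retry immediately, no sleep
            time.sleep(min(2 ** attempt, 64))      # other errors: back off as before
    raise RuntimeError("chunk upload failed after retries")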

The fix is released here https://github.com/mapillary/mapillary_tools/releases/tag/v0.9.5a1

BTW it would be great if you could set up a local dev environment (https://github.com/mapillary/mapillary_tools#development) so we can test branches without releasing binaries (faster iteration).

Just a quick check @GITNE: were you uploading this sequence from multiple processes or machines simultaneously?

No, single instance on one machine only.

@GITNE I suspect it’s related to the HTTP 412 error; curious to see the payload.

I’m trying to add a check on retries: if the offset fetched from the server does not advance as expected, exit with the full HTTP response printed out.

I will make a new alpha release and let you know soon.