tusd: Azure storage does not support concatenation

Technical Knowledge of User

I’m new to tusd and tus-js-client, so I’d be happy if anybody could point out any problems in my test setup.

Describe the bug

While running tusd with an Azure Storage backend, I found that the transfer always fails on the last HTTP POST with a "missing or invalid Upload-Length header" error. This only occurs when the number of parallel uploads (connections) is set to 2 or more.

Snippet of tusd logs at the end of a transfer:

[tusd] 2022/10/31 04:54:05.053405 event="ResponseOutgoing" status="201" method="POST" path="" requestId="" body="" 
[tusd] 2022/10/31 04:54:05.063695 event="RequestIncoming" method="PATCH" path="14ef5595737703227cbbe4c006cc7920" requestId="" 
[tusd] 2022/10/31 04:54:05.064181 event="RequestIncoming" method="PATCH" path="9d5eb446624ade5a25d719f07865fdd8" requestId="" 
[tusd] 2022/10/31 04:54:05.105852 event="ChunkWriteStart" id="9d5eb446624ade5a25d719f07865fdd8" maxSize="57738292" offset="0" 
[tusd] 2022/10/31 04:54:05.107953 event="ChunkWriteStart" id="14ef5595737703227cbbe4c006cc7920" maxSize="57738292" offset="0" 
[tusd] 2022/10/31 04:54:13.957541 event="ChunkWriteComplete" id="14ef5595737703227cbbe4c006cc7920" bytesWritten="57738292" 
[tusd] 2022/10/31 04:54:13.976012 event="ResponseOutgoing" status="204" method="PATCH" path="14ef5595737703227cbbe4c006cc7920" requestId="" body="" 
[tusd] 2022/10/31 04:54:13.976043 event="UploadFinished" id="14ef5595737703227cbbe4c006cc7920" size="57738292" 
[tusd] 2022/10/31 04:54:17.708767 event="ChunkWriteComplete" id="9d5eb446624ade5a25d719f07865fdd8" bytesWritten="57738292" 
[tusd] 2022/10/31 04:54:17.740995 event="ResponseOutgoing" status="204" method="PATCH" path="9d5eb446624ade5a25d719f07865fdd8" requestId="" body="" 
[tusd] 2022/10/31 04:54:17.741020 event="UploadFinished" id="9d5eb446624ade5a25d719f07865fdd8" size="57738292" 
[tusd] 2022/10/31 04:54:17.742362 event="RequestIncoming" method="POST" path="" requestId="" 
[tusd] 2022/10/31 04:54:17.742385 event="ResponseOutgoing" status="400" method="POST" path="" requestId="" body="ERR_INVALID_UPLOAD_LENGTH: missing or invalid Upload-Length header

Client-side output (Node.js, using tus-js-client):

0 115476584 0.00%
0 115476584 0.00%
65536 115476584 0.06%
131072 115476584 0.11%
31260672 115476584 27.07%
63569920 115476584 55.05%
90113076 115476584 78.04%
115476584 115476584 100.00%
Failed because: Error: tus: unexpected response while creating upload, originated from request (method: POST, url: http://127.0.0.1:47280/files/, response code: 400, response text: ERR_INVALID_UPLOAD_LENGTH: missing or invalid Upload-Length header
, request id: n/a)

To Reproduce

Steps to reproduce the behavior:

  1. Start up tusd server with Azure backend storage.
    podman run -p 47280:47280 -v /tmp/host/tus:/srv/ --env AZURE_STORAGE_ACCOUNT=<storage account name> --env AZURE_STORAGE_KEY=<azure storage key> docker.io/tusproject/tusd:v1.10.0 -port=47280 -azure-blob-access-tier=hot -azure-object-prefix=tus-prefix -azure-storage=tus-file-container -azure-endpoint=https://<storage account name>.blob.core.windows.net
  2. Run the client demo or a Node.js client with parallel uploads set to 2.
  3. Upload a file.
  4. Watch the last POST request fail, and the file fail to combine - even though all the parts have been sent successfully.

Expected behavior

I expect parallel uploads to work.

Setup details

Please provide the following details, if applicable to your situation:

  • Operating System: Linux (Ubuntu 22.04 LTS) on the host; tusd itself runs inside the docker.io/tusproject/tusd container image
  • Used tusd version: Tried both tusd:v1.10.0 and tusd:2.0.0rc17
  • Used tusd data storage: Azure Storage
  • Used tusd configuration: -port=47280 -azure-blob-access-tier=hot -azure-object-prefix=tus-prefix -azure-storage=tus-file-container -azure-endpoint=https://<storage account name>.blob.core.windows.net
  • Used tus client library: tus-js-client, both from Node.js and via the tus-js-client demo

NodeJS client code

const fs = require('fs');
const tus = require("tus-js-client");

const path = "file.txt"
const file = fs.createReadStream(path)

const upload = new tus.Upload(file, {
  endpoint: 'http://127.0.0.1:47280/files/',
  parallelUploads: 2,

  // Callback for errors which cannot be fixed using retries
  onError: function(error) {
    console.log("Failed because: " + error)
  },
  
  // Callback for reporting upload progress
  onProgress: function(bytesUploaded, bytesTotal) {
    const percentage = (bytesUploaded / bytesTotal * 100).toFixed(2)
    console.log(bytesUploaded, bytesTotal, percentage + "%")
  },
  
  // Callback for once the upload is completed
  onSuccess: function() {
    console.log("Download %s from %s", upload.file.name, upload.url)
  }
});

// Check if there are any previous uploads to continue.
upload.findPreviousUploads()
  .then((previousUploads) => {

    // Found previous uploads so we select the first one. 
    if (previousUploads.length) {
      upload.resumeFromPreviousUpload(previousUploads[0])
    }

    // Start the upload
    upload.start()
})
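
Since the failure only appears with parallelUploads set to 2 or more, a client could check whether the server advertises the tus concatenation extension before enabling parallel uploads. The following is a minimal sketch, assuming the server lists its extensions in the Tus-Extension header of an OPTIONS response as the tus protocol describes. The helper names supportsConcatenation and pickParallelUploads are hypothetical and not part of tus-js-client.

```javascript
// Hypothetical helper: returns true if a Tus-Extension header value
// (a comma-separated list such as "creation,termination") names "concatenation".
function supportsConcatenation(tusExtensionHeader) {
  if (!tusExtensionHeader) return false
  return tusExtensionHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .includes('concatenation')
}

// Hypothetical helper: fall back to a single connection when the server
// does not advertise concatenation of partial uploads.
function pickParallelUploads(tusExtensionHeader, desired) {
  return supportsConcatenation(tusExtensionHeader) ? desired : 1
}

// Usage sketch (needs a running server and a Node.js version with global fetch):
// const res = await fetch('http://127.0.0.1:47280/files/', { method: 'OPTIONS' })
// const parallelUploads = pickParallelUploads(res.headers.get('Tus-Extension'), 2)
```

The resulting value could then be passed as the parallelUploads option in the client code above, so the upload degrades to a single connection instead of failing on the final POST.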

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 20 (14 by maintainers)

Most upvoted comments

Every discussion on this topic has happened in this issue, so there is nothing hidden from the public 😃 We all agree that we would like to accept support for concatenation in the Azure storage. Somebody just needs to have the time to implement it. Let me know if you are interested in doing so!

The Azure store does use a block list:

https://github.com/tus/tusd/blob/7225439860d8675b231f408f8e9a26b3fadcf1e2/pkg/azurestore/azureservice.go#L206-L213

https://github.com/tus/tusd/blob/7225439860d8675b231f408f8e9a26b3fadcf1e2/pkg/azurestore/azureservice.go#L235-L244

I guess I could take a run at it, but I can’t promise any timeframe as I am a bit busy with selling my apartment and moving at the moment.

Sure, the entire relevant code for parallel uploads and concatenation is at https://github.com/tus/tus-js-client/blob/a456406f8e4232db416705952da5ef49a661c7ef/lib/upload.js#L255-L364.

It begins by slicing the entire file into multiple parts and uploading each part concurrently. Once the upload URL is available for a single part, it is stored in an array at the index that matches the part's position in the entire file: https://github.com/tus/tus-js-client/blob/a456406f8e4232db416705952da5ef49a661c7ef/lib/upload.js#L315

Finally, these URLs are POSTed to the tus server for concatenation: https://github.com/tus/tus-js-client/blob/a456406f8e4232db416705952da5ef49a661c7ef/lib/upload.js#L335-L336
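
The flow described above can be sketched roughly as follows. This is an illustration and not the tus-js-client source: splitIntoParts and buildFinalConcatHeader are hypothetical names, and the part URLs are placeholders. It relies on the tus concatenation extension, where the final POST carries an Upload-Concat header of the form "final;" followed by the space-separated part URLs and no Upload-Length header, which would explain the 400 response from a store that does not support concatenation. For the 115476584-byte file in this report, two parts come out at 57738292 bytes each, matching the maxSize values in the tusd log.

```javascript
// Hypothetical helper: split totalSize bytes into partCount contiguous ranges.
// The last part absorbs any remainder from the integer division.
function splitIntoParts(totalSize, partCount) {
  const partSize = Math.floor(totalSize / partCount)
  const parts = []
  for (let i = 0; i < partCount; i++) {
    const start = partSize * i
    const end = i === partCount - 1 ? totalSize : partSize * (i + 1)
    parts.push({ index: i, start, end })
  }
  return parts
}

// Hypothetical helper: build the Upload-Concat value for the final POST.
function buildFinalConcatHeader(partUrls) {
  return 'final;' + partUrls.join(' ')
}

const parts = splitIntoParts(115476584, 2)

// Parts may finish in any order; each URL is stored at its part's index
// so the original order is preserved. The URLs below are placeholders.
const partUrls = new Array(parts.length)
partUrls[1] = '/files/part-b'
partUrls[0] = '/files/part-a'

console.log(parts.map((p) => p.end - p.start))
console.log(buildFinalConcatHeader(partUrls))
```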

I was looking at implementing it, but I was a bit unsure exactly how to do it. Theoretically, you could upload the chunks in any order you’d like, as long as you know their original order before you commit the blob. So I chose not to implement it at the time.
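
To illustrate the ordering idea in the previous comment: Azure block blobs stage blocks under Base64-encoded block IDs (all of the same length within one blob), and the order of the committed block list determines the final content. The sketch below only models that bookkeeping in plain Node.js. It makes no Azure SDK calls, is not the tusd implementation (which is written in Go), and the names makeBlockId and orderedBlockIds are hypothetical.

```javascript
// Hypothetical helper: fixed-width, Base64-encoded block ID for a position.
// Zero-padding keeps every ID the same length.
function makeBlockId(position) {
  return Buffer.from(String(position).padStart(8, '0')).toString('base64')
}

// Hypothetical helper: given blocks staged in arbitrary order, return the
// block IDs sorted by original position, ready to be committed as a list.
function orderedBlockIds(stagedBlocks) {
  return [...stagedBlocks]
    .sort((a, b) => a.position - b.position)
    .map((block) => block.blockId)
}

// Simulate three chunks that finished uploading out of order.
const staged = [2, 0, 1].map((position) => ({
  position,
  blockId: makeBlockId(position),
}))

const commitOrder = orderedBlockIds(staged)
console.log(commitOrder.map((id) => Buffer.from(id, 'base64').toString()))
```

For concatenation, the same idea would apply one level up: the blocks of each partial upload would need to be committed in the order given by the final request, which is the part this sketch leaves open.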