azure-storage-azcopy: [Question] Slow performance of sync - Local File to Blob

V10.0.2 Preview - Win 7

.\azcopy sync "C:\GCDS_dev" "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?--Key Retracted--" --recursive

When syncing larger amounts of data (>1 GB, local file to blob), sync seems to take a long time to even prep the job (i.e., syncing 1.4 GB of data seems to take more than 30 minutes to even start the job).

The copy command, by contrast, seems to start almost straight away.

I know the sync command obviously has some file comparison work to do before it can do anything, but it still seems extraordinarily slow to begin.

Any idea what could be causing the delay?

Is it possible to report file conflict check progress to the command line with a flag?

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 23 (10 by maintainers)

Most upvoted comments

Consider this issue closed!!

@zezha-msft and @prjain-msft … just got back from holidays… will check now… I'm excited

Hi @VelizarVESSELINOV, thanks for your feedback.

To clarify though: if you only want to copy files, you should use the copy command, not sync, which has significant overhead because it has to compare the contents of the source and destination to figure out exactly what to transfer or delete. copy, on the other hand, simply transfers the source to the destination. With the help of the --overwrite=false flag, copy can also avoid overwriting files that already exist at the destination.
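To illustrate why that comparison is costly, here is a simplified sketch (not azcopy's actual algorithm) of a sync-style diff: both sides must be fully enumerated and compared by name and last-modified time before a single byte moves, whereas a plain copy can start transferring immediately.

```python
# Simplified sketch of a sync-style diff (NOT azcopy's actual implementation).
# Both sides must be fully enumerated up front, which is where the long
# "prep" phase before any transfer starts comes from.

def plan_sync(source, destination):
    """source/destination map file name -> last-modified timestamp."""
    to_transfer, to_delete = [], []
    for name, src_mtime in source.items():
        dst_mtime = destination.get(name)
        if dst_mtime is None or src_mtime > dst_mtime:
            to_transfer.append(name)   # new at source, or newer than destination
    for name in destination:
        if name not in source:
            to_delete.append(name)     # exists only at the destination
    return to_transfer, to_delete

src = {"a.csv": 200, "b.csv": 100}
dst = {"b.csv": 100, "old.csv": 50}
print(plan_sync(src, dst))  # (['a.csv'], ['old.csv'])
```

A copy with --overwrite=false skips this planning step entirely: it enumerates only the source and lets the destination reject overwrites per file.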

@VelizarVESSELINOV take into account that sync is a new feature that's only "in preview" right now. The team is still testing it and optimizing its performance.

This thread is focused on the sync command's initial comparison between source and destination being slow, not on the file transfer operation itself. Can I suggest you post the multithreading performance issues as a separate issue?

Hi @zezha-msft, thanks for the quick answer. In my process explorer I saw a lot of threads running, but CPU usage was limited. Is there an option to control parallel execution? Maybe the user interface is just not showing enough of what is currently being done in parallel and/or chunked. As for failures, I often get this error:

   ERROR:
-> github.com/Azure/azure-storage-azcopy/ste.newAzcopyHTTPClientFactory.func1.1, /go/src/github.com/Azure/azure-storage-azcopy/ste/mgr-JobPartMgr.go:95
HTTP request failed

The CPU is often low (~3%), but the tool is obviously consuming a lot of some other resource, because a few minutes after execution starts, VS Code and other applications become unresponsive, which is annoying.

Compared to gsutil rsync, azcopy sync performance is really very bad (using macOS Mojave). Since azcopy is written largely in Go, I expected higher performance than gsutil/boto, which are written in Python.

Extra observations related to the slow performance:

  • missing clear intermediate output to follow what the program is doing, especially during the diff-analysis phase (try gsutil if you want to understand what I'm talking about)
  • missing compression option when copying compressible files like CSV
  • too many file transfer failures
  • missing chunking of large files
  • missing multi-threaded option
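On the multi-threading point: AzCopy v10 reads an AZCOPY_CONCURRENCY_VALUE environment variable to cap the number of concurrent requests. I'm assuming here that the build in question honors it (check your version's documentation); a minimal sketch:

```shell
# Cap AzCopy's concurrent requests via an environment variable.
# Assumption: AZCOPY_CONCURRENCY_VALUE is supported by this AzCopy v10 build.
export AZCOPY_CONCURRENCY_VALUE=16
echo "concurrency=$AZCOPY_CONCURRENCY_VALUE"

# Then run the transfer, e.g. (placeholders, not a real account/SAS):
# azcopy sync "/local/path" "https://<account>.blob.core.windows.net/<container>?<SAS>" --recursive
```

Lowering the value can also reduce the resource pressure that makes other applications unresponsive, at the cost of throughput.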

Well, I tested two scenarios: the local folder empty, and the blob container empty. (Definitely reproducible.)

When syncing from the local folder to the blob (with the blob empty), it took 31 minutes before the job even started. But syncing back down from the blob to the local folder was lightning fast. See below.

  • 10,000 files on local folder (source), empty blob (destination): 31 minutes to start the job
  • 10,000 files on blob (source), empty local folder (destination): less than 1 minute to start

It seems like it's taking a long time to queue the transfers to the blob when syncing.

~10,000 files, 1.4 GB. An azcopy copy took less than 2 minutes. Basically, azcopy sync accepts the command and prints nothing for about 30 minutes.

How can I check whether low throughput is the problem? Is throughput measured in files/sec? I posted the log above.
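For a rough sense of scale, a back-of-the-envelope files/sec rate can be derived from the numbers reported in this thread (10,000 files, ~31 minutes of sync prep vs. a copy that finished in under 2 minutes):

```python
# Back-of-the-envelope throughput from the figures reported above.
files = 10_000

sync_prep_secs = 31 * 60   # sync: ~31 minutes before the job even started
copy_total_secs = 2 * 60   # copy: entire transfer finished in under 2 minutes

sync_rate = files / sync_prep_secs   # files "processed" per second during sync prep
copy_rate = files / copy_total_secs  # effective files per second for copy

print(f"sync prep: {sync_rate:.1f} files/sec")  # sync prep: 5.4 files/sec
print(f"copy:      {copy_rate:.1f} files/sec")  # copy:      83.3 files/sec
```

If the prep phase really is processing only ~5 files per second, that points at per-file overhead (enumeration or comparison) rather than network throughput.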