azure-storage-azcopy: [Question] Slow performance of sync - Local File to Blob
V10.0.2 Preview - Win 7
.\azcopy sync "C:\GCDS_dev" "https://azgcdsdevst1.blob.core.windows.net/gcdstest2?--Key Retracted--" --recursive
When syncing larger amounts of data (>1 GB) from local files to Blob, sync seems to take a long time to even prep the job (e.g. syncing 1.4 GB of data seems to take more than 30 minutes to even start the job).
The copy function, by contrast, seems to start almost straight away.
I know the sync command obviously has some file comparison work to do before it can do anything, but it still seems extraordinarily slow to begin.
Any idea what could be causing the delay?
Is it possible to report file conflict check progress to the command line with a flag?
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 2
- Comments: 23 (10 by maintainers)
Consider this issue closed!!
@zezha-msft and @prjain-msft … just got back from holidays… will check now… I'm excited
Hi @VelizarVESSELINOV, thanks for your feedback.
To clarify though, if you only wanted to copy files, you should use the copy command, not sync, which has severe overhead because we have to compare the contents of the source and destination to figure out exactly what to transfer or delete. On the other hand, copy simply transfers the source to the destination. With the help of the --overwrite=false flag, copy can also avoid overwriting existing files at the destination.
@VelizarVESSELINOV take into account that sync is a new feature that's only "in preview" right now. The guys are still testing it and optimizing its performance.
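The overhead described above can be sketched in a few lines: sync must enumerate both sides and diff them before any byte moves, whereas copy only enumerates the source. This is an illustrative sketch, not azcopy's actual internals; the function and field names are hypothetical.

```python
def plan_sync(source_files, dest_files):
    """Given {path: last_modified} maps for both sides, decide what to
    transfer and what to delete at the destination (the comparison phase
    that a sync-style tool must finish before transfers can start)."""
    to_transfer = [p for p, mtime in source_files.items()
                   if p not in dest_files or mtime > dest_files[p]]
    to_delete = [p for p in dest_files if p not in source_files]
    return to_transfer, to_delete

# Example: one new file, one stale file, one file removed from the source.
src = {"a.txt": 2, "b.txt": 5}
dst = {"b.txt": 3, "c.txt": 1}
transfer, delete = plan_sync(src, dst)
print(transfer)  # ['a.txt', 'b.txt']
print(delete)    # ['c.txt']
```

Note that building the destination map requires a full listing of the blob container (or local tree), which is why the prep time grows with file count even when nothing ends up being transferred.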
This thread is focused on an issue with the sync command's initial comparison between source and destination being slow, not the file transfer operation itself. Can I suggest you post issues with multi-threading performance as a separate issue?
Hi @zezha-msft, thanks for the quick answer. In my process explorer, I saw a lot of threads running, but CPU usage was limited. Is there an option to control parallel execution? Maybe the user interface is not showing enough of what is currently being done in parallel and/or chunked. For failures, I often get this error
CPU usage is often low (3%), but the tool is obviously using a lot of some other resource, so a few minutes after execution starts, VSCode and other applications switch to not-responding mode, which is annoying.
Compared to gsutil sync, the azcopy sync performance is really very bad. Using macOS Mojave. Since azcopy is written largely in Go, I expected higher performance than gsutil/boto, which are written in Python.
Related to the slow performance, some extra observations:
Well, I tested two scenarios: the local file share empty, and the blob empty (definitely reproducible).
When syncing from the local files to the blob (with the blob empty), it took 31 minutes before the job even started. But syncing back down to the local files from the blob was lightning fast. See below.
- 10,000 files on local file share (source), empty blob (destination): 31 minutes to start the job
- 10,000 files on blob (source), empty local file share (destination): less than 1 minute to start
It seems like it's taking a long time to queue the transfer to the blob when syncing.
~10,000 files, 1.4 GB. An azcopy copy took less than 2 minutes. Basically, azcopy accepts the sync command and prints nothing for about 30 minutes.
How can I check whether low throughput is the problem? Is throughput = files/sec? I posted the log above.
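Throughput here usually means files (or bytes) per second over the life of the job. As a rough sanity check, you can compute it by hand from the counts and elapsed time; this is a hand-rolled sketch using the numbers reported in this thread, not an azcopy feature.

```python
def throughput(files_done, bytes_done, elapsed_seconds):
    """Return (files/sec, MB/sec) for a completed or in-flight job."""
    return (files_done / elapsed_seconds,
            bytes_done / elapsed_seconds / 1e6)

# Using the run reported above: ~10,000 files, ~1.4 GB,
# copied in under 2 minutes (120 s used as the estimate).
fps, mbps = throughput(10_000, 1.4e9, 120)
print(round(fps, 1), round(mbps, 1))  # 83.3 11.7
```

By that measure the copy run was fine; the 30-minute silent period before sync starts would show up as near-zero throughput during the comparison phase, not as a transfer bottleneck.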