azure-storage-azcopy: Azcopy leaks memory and crashes when copying a large number of files

Which version of the AzCopy was used?

10.2.1.

Which platform are you using? (ex: Windows, Mac, Linux)

Linux

What command did you run?

azcopy copy "monitor" "https://container?SAS" --overwrite=false --recursive --include "*/2019-*"

What problem was encountered?

AzCopy starts, runs for some time, then begins to leak memory, slows almost to a stop, consumes more and more memory, and crashes. The machine has 16 GB of memory. The near-stop happens at:

392 Done, 2 Failed, 31798 Pending, 178338 Skipped, 210530 Total, 2-sec Throughput (Mb/s): 2.5495
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                 
27310 root      20   0 5308436 2,902g  53432 S   9,2 18,6  10:18.43 azcopy    

How can we reproduce the problem in the simplest way?

Try to copy roughly 200,000 files from a large directory structure.

Have you found a mitigation/solution?

No

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 22 (14 by maintainers)

Most upvoted comments

I had very similar problems with AzCopy 10.3.3 on Windows running out of memory, with the following exception output:

.\azcopy.exe : runtime: VirtualAlloc of 8192 bytes failed with errno=1455
At line:1 char:1
+ .\azcopy.exe copy 'redacted....
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (runtime: Virtua...with errno=1455:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

My transfer had around 4 million small files. They had been uploaded to Azure Storage using 10.3.3 with no problems; the issue happened when downloading from storage to a VM.

The amount of virtual memory used by AzCopy kept increasing until it hit the size of the page file and the above exception occurred.

I noticed this in the log:

2019/12/27 14:57:08 Max open files when downloading: 2147483311 (auto-computed)

Setting AZCOPY_CONCURRENT_FILES=50 (the default is 2**31) and AZCOPY_CONCURRENCY_VALUE=4 seemed to fix the problem.

In my case I think the issue stemmed from a very large number of pending files: the rate at which new files were found by scanning was much higher than the download rate. Even with these settings, Task Manager showed an unusually high number of handles opened by AzCopy (around 35,000).
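For reference, this is roughly how I applied those settings, in the same PowerShell session that launched AzCopy. The account/container URL and destination path below are placeholders, not my real values:

# PowerShell: cap concurrency for this session, then run the download (placeholder URL and path)
$env:AZCOPY_CONCURRENT_FILES = "50"
$env:AZCOPY_CONCURRENCY_VALUE = "4"
.\azcopy.exe copy "https://<account>.blob.core.windows.net/<container>?<SAS>" "D:\restore" --recursive

As I understand it, AZCOPY_CONCURRENT_FILES limits how many files are in flight at once, which seems to be what kept the handle count and memory under control for me.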

So, somewhere, you have a “simple” issue with AzCopy not seeing your environment variable change. Are you running AzCopy from the same command window where you set the environment variable? If you’re not, then it won’t see it.

That's it. I set the value from CMD and then ran the prepared upload script from PowerShell, so I have to set the buffer value for each upload session, i.e. put it into my script. Sorry about that, I have too little experience with AzCopy. Thank you for the help, John!
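For anyone else tripping over the same thing, a minimal sketch of what that looks like inside the PowerShell script itself, so every upload session picks the value up. This assumes the buffer value being discussed is the AZCOPY_BUFFER_GB variable mentioned further down; the source path and SAS URL are placeholders:

# PowerShell upload script: set the buffer limit inside the script so each run sees it
$env:AZCOPY_BUFFER_GB = "1"
.\azcopy.exe copy "C:\data\monitor" "https://<container>?<SAS>" --overwrite=false --recursive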

@grzeg1 I’ve done a bunch of tests and can’t see anything out of the ordinary in them. Usage always stabilizes at around the expected level. In fact the memory usage that you mention above, 2.9 GB, is within the bounds of normal behaviour. E.g. on a 4-core machine, AzCopy will allocate up to 2 GB for its buffers, and with some overhead for the rest of the app’s memory usage, that takes it up to the 2.9 GB that you see.

I think the most likely cause of the issue you are seeing is that the default memory usage of AzCopy is simply too much for your machine. I.e. AzCopy isn’t actually leaking, because it’s using 2.9 GB on purpose, and its memory growth will eventually stop. But in your case, the amount it’s trying to use is just too much. Maybe you have other processes running on the machine, or some other reason why its default usage of 0.5 GB per CPU is too much.

In version 10.3, we are going to introduce a new environment variable called AZCOPY_BUFFER_GB. In cases such as yours, you can set it to a figure lower than the default, e.g. 1 or even 0.5. I believe that will solve the problem you have reported. Please let us know what you think of that solution.
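On Linux, that suggestion would look something like the following, reusing the command from the original report; the SAS token is still redacted and 0.5 is just the example figure from above:

# bash: cap AzCopy's buffer usage at 0.5 GB for this session, then rerun the original copy
export AZCOPY_BUFFER_GB=0.5
azcopy copy "monitor" "https://container?SAS" --overwrite=false --recursive --include "*/2019-*"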

We will leave this issue open until 10.3 is released.