moby: Unable to react to graceful shutdown of (Windows) container

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:15:28 2016
 OS/Arch:      windows/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 21:15:28 2016
 OS/Arch:      windows/amd64

Output of docker info:

Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 86
Server Version: 1.12.0
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: nat null overlay
Swarm: inactive
Security Options:
Kernel Version: 10.0 14300 (14300.1045.amd64fre.rs1_release_svc.160705-1059)
Operating System: Windows Server 2016 Standard Technical Preview 5
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 8 GiB
Name: slc-dev-s16p-2
ID: H3YG:MO32:XSEU:NGD4:Z7FC:4SMS:EYPL:RGHO:DGWY:W767:O3L2:HNXZ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

I am unable to react to a graceful shutdown of my application running inside a (Windows) container. I have tried SetConsoleCtrlHandler(), but my handler is never called. I have tried signal(), but no SIGTERM is received. I have tried running a message loop, but WM_CLOSE is never received.

The work of shutting down the container is (apparently) done by the ShutdownComputeSystem routine from vmcompute.dll (this is from zhcsshim.go), but I cannot find any documentation or other information on what ShutdownComputeSystem does. It has been suggested that @jhowardmsft would know what’s going on.

Please help!

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 21
  • Comments: 95 (40 by maintainers)

Most upvoted comments

Just a note here for anyone who finds this issue and is as confused as I was about all this:

  1. For Windows containers, the -t parameter of docker stop seems to do nothing. There seems to be no way to specify that value from outside the container. The command issues a CTRL_SHUTDOWN_EVENT immediately regardless of any --time set.
  2. The first registry value change, RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 7200, causes Windows to delay sending the shutdown event for the specified number of seconds. I’m not exactly sure how useful this is - unless you’ve used some other channel to tell your process to shut down.
  3. The second registry value change is more useful: RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 7200000 /f. This causes Windows to wait the specified number of milliseconds before killing a process after issuing a CTRL_SHUTDOWN_EVENT - essentially the rough equivalent of docker stop -t.
  4. If you don’t delay while handling the CTRL_SHUTDOWN_EVENT your process will stop immediately, regardless of any delay set (much like how handling SIGTERM would work).

I hope this helps someone!
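The two registry values in points 2 and 3 use different units (seconds for ProcessShutdownTimeoutSeconds, a string of milliseconds for WaitToKillServiceTimeout), which is easy to get wrong. Here is a minimal Python sketch that derives both Dockerfile lines from a single value. The function name `shutdown_timeout_run_lines` is hypothetical; the registry paths and flags are copied from the commands quoted above.

```python
def shutdown_timeout_run_lines(timeout_seconds):
    """Build the two Dockerfile RUN lines quoted above from one timeout value."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    # WaitToKillServiceTimeout is stored as a string of milliseconds,
    # while ProcessShutdownTimeoutSeconds is a DWORD of seconds.
    timeout_ms = timeout_seconds * 1000
    return [
        r"RUN reg add hklm\system\currentcontrolset\services\cexecsvc"
        f" /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d {timeout_seconds}",
        r"RUN reg add hklm\system\currentcontrolset\control"
        f" /v WaitToKillServiceTimeout /t REG_SZ /d {timeout_ms} /f",
    ]
```

Calling `shutdown_timeout_run_lines(7200)` yields the same 7200 / 7200000 pair used in the list above.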

Hey all, we’ve had a delay in getting this fix out for our container images. This fix is not available in the Windows Server 2019 release of our container base images that rolled out the door last week, but we’re working diligently to make sure it will be present in a future Patch Tuesday release of our images.

I’ll update this thread when we get specific confirmation of the arrival. Right now, I’d forecast the first months of 2019. @rgl this is a Windows-side fix, so there’s no revision to reference.

@godefroi I’d be happy to provide that information when the image with fix is in-bound.

(@taylorb-microsoft and @jhowardmsft for FYI) Some clarification: 1803 does indeed include code that gracefully shuts down the container. As part of this, it signals individual processes created by docker using the CTRL_SHUTDOWN_EVENT. However, this notification is done with only a 5 second timeout, so not all processes are given enough lead time before the system itself is gracefully shut down.

The work Taylor identified in his comment would address this timeout, but is only available in a newer API that won’t be used until future versions of docker. We did end up building a workaround that works with current docker deployments, and is expected to be present in the November 1809 image updates. To use the longer timeout for docker processes during shutdown, three things need to be used:

  1. Registry keys for container process timeout and service timeout need to be updated and the system rebooted. This can be accomplished with the following line in a dockerfile:
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 60 && \
    reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 60000 /f

This sets both timeouts to 60 seconds, but you can replace the values with your preferred timeout.

  2. Docker stop should use a large enough timeout to accommodate the extended shutdown. By default, after 2 minutes waiting for clean shutdown docker will terminate a container. Using docker stop -t <seconds> will override this timeout.

  3. Docker cannot launch the process using terminal emulation (-t on a docker run command). Terminal emulation changes the way process signals are delivered in Windows and will cause the signal to get intercepted before it reaches the process. Using -i for interactive execution still works for shutdown signal propagation.

If your process needs an extended period of time to respond to a docker stop and clean up before the container is shut down, using these steps with the November update of the 1809 image should provide the needed extended timeout.
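The three conditions above interact, so it can help to check them together before relying on a long shutdown. The following is a rough Python sketch of such a pre-flight check. `check_graceful_stop_plan` and its parameter names are hypothetical and not part of Docker; the function only encodes the rules stated in this comment.

```python
def check_graceful_stop_plan(process_timeout_s, service_timeout_ms,
                             docker_stop_timeout_s, uses_tty):
    """Return a list of problems with a planned graceful-stop setup.

    Encodes only the three conditions described in the comment above;
    an empty list means none of them is violated.
    """
    problems = []
    # Longest time the registry settings allow the container to keep running.
    needed_s = max(process_timeout_s, service_timeout_ms / 1000)
    if docker_stop_timeout_s < needed_s:
        problems.append(
            f"docker stop -t {docker_stop_timeout_s} is shorter than the "
            f"{needed_s:g}s configured in the registry"
        )
    if uses_tty:
        problems.append("docker run -t intercepts the shutdown signal; drop -t")
    return problems
```

For the 60-second values in the Dockerfile line above, `check_graceful_stop_plan(60, 60000, 120, False)` returns an empty list, while a stop timeout of 10 seconds or a container started with -t is reported.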

@weijuans-msft We need it in all windows images. So, yes in both Nano Server and in Server Core.

Hello,

I am running the following powershell script in a Windows Container:

try
  {
    Write-Host "3. Configuring Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\config.cmd --unattended `
      --agent "$(if (Test-Path Env:AZP_AGENT_NAME) { ${Env:AZP_AGENT_NAME} } else { ${Env:computername} })" `
      --url "$(${Env:AZP_URL})" `
      --auth PAT `
      --token "$(Get-Content ${Env:AZP_TOKEN_FILE})" `
      --pool "$(if (Test-Path Env:AZP_POOL) { ${Env:AZP_POOL} } else { 'Default' })" `
      --work "$(if (Test-Path Env:AZP_WORK) { ${Env:AZP_WORK} } else { '_work' })" `
      --replace
  
    Write-Host "4. Running Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\run.cmd
  } 
  finally
  {
    Write-Host "Cleanup. Removing Azure Pipelines agent..." -ForegroundColor Cyan
  
    .\config.cmd remove --unattended `
      --auth PAT `
      --token "$(Get-Content ${Env:AZP_TOKEN_FILE})"
  }

My container is using the windowsservercore-1903 image.

Unfortunately, when I stop the container using docker stop, with or without -t 60, the finally block is never triggered. Can someone help me with this?

@OnurGumus Glad to have that confirmed! I do remember running into that difference when testing this, but couldn’t root cause why terminal emulation behaved differently in nanoserver vs servercore. But since removing -t worked across both, it seemed like the right general solution. Hopefully this unblocks you from being able to rely on the correct shutdown behavior in your usage.

@OnurGumus @swernli I just wanted to comment in case this helps others too. If you don’t add an event handler to CancelKeyPress, RunConsoleAsync just hangs after calling the hosted service StopAsync

System.Console.CancelKeyPress += (e, s) => { };

Here is how I FINALLY achieved graceful shutdown on windows containers after 2 days of beating my head against the wall.

Using the following…

USER ContainerAdministrator
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 7200  
RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 7200000 /f

Starting the container with

docker run -id

Stopping the container with

docker stop -t 7200

This was all in conjunction with using SetConsoleCtrlHandler.
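For readers who want to see the handler side, here is a minimal Python sketch using ctypes. It is an illustration under assumptions rather than the code used in this comment: the names `on_console_ctrl`, `install_handler`, `shutdown_requested` and `cleanup_done` are hypothetical, and the 60-second wait is an arbitrary default. The handler blocks until cleanup finishes because, as noted elsewhere in this thread, the container stops as soon as the process returns from the handler.

```python
import sys
import threading

# Console control codes from the Windows SDK (wincon.h).
CTRL_C_EVENT = 0
CTRL_BREAK_EVENT = 1
CTRL_CLOSE_EVENT = 2
CTRL_LOGOFF_EVENT = 5
CTRL_SHUTDOWN_EVENT = 6

shutdown_requested = threading.Event()  # set by the handler
cleanup_done = threading.Event()        # set by the main thread when finished


def on_console_ctrl(ctrl_type, wait_seconds=60.0):
    """Handle a console control event; True means 'handled'."""
    if ctrl_type in (CTRL_CLOSE_EVENT, CTRL_SHUTDOWN_EVENT):
        shutdown_requested.set()
        # Returning lets Windows end the process, so block until cleanup is done.
        cleanup_done.wait(timeout=wait_seconds)
        return True
    return False


def install_handler():
    """Register on_console_ctrl via SetConsoleCtrlHandler (Windows only)."""
    if sys.platform != "win32":
        return None  # nothing to register on other platforms
    import ctypes
    from ctypes import wintypes

    handler_type = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.DWORD)
    callback = handler_type(on_console_ctrl)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    if not kernel32.SetConsoleCtrlHandler(callback, True):
        raise ctypes.WinError(ctypes.get_last_error())
    return callback  # the caller must keep this reference alive
```

In use, the main thread would keep the value returned by `install_handler()`, wait on `shutdown_requested`, perform its cleanup, and then call `cleanup_done.set()` so the handler can return.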

Hi folks, providing an update from my earlier comment in November. This was officially resolved in the January Patch Tuesday event for our Windows Server 2019-based containers. Using the latest patched Windows containers & instructions provided by Stefan above should give you the expected behavior.

Thank you for your patience.

@olandese Did you ever manage to get this to work?

Update: My mistake was adding the -t switch when running docker. Without the -t switch the code runs fine, though Server Core can handle the -t switch too.

Here are my findings as of today:

mcr.microsoft.com/windows/servercore:1809: receives the CTRL_SHUTDOWN_EVENT notification but is killed after about 5 seconds. But adding the registry settings

RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 60
RUN reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 60000 /f

makes it work properly. The container stops either after 60 seconds or when the app exits, whichever happens first.

mcr.microsoft.com/windows/nanoserver:1809: does not receive CTRL_SHUTDOWN_EVENT at all.

Adding the registry settings makes the stop wait for 60 seconds, and then the application dies without being notified.

Is there any place to report this issue about nanoserver?

@jhowardmsft Can we get a concise description of what needs to be in place for a container to receive CTRL_SHUTDOWN_EVENT? What version of Windows Server does the host need to be running, and what version of the images do the containers need to be running?

Can’t wait to try those fixes, would be amazing if it made into 1809 and subsequently into 2019. It’s quite a big deal to be able to shut down containers gracefully.

@jhowardmsft - the Windows VS is 18604546. The OS platform work is in and being merged upstream - there is some Moby work required to connect it up.

Quick update - we have a PR in review for Windows. Typically code flow takes ~2-3 weeks for fixes to make it out to insider builds. Once the PR is merged and upstreamed I should be able to provide a min build # with the fix.

First, I should’ve been more clear, in that the fix for this is in the container base image, so the only host OS version requirements are that the host OS must be equal to or newer than the container OS version. The first container OS image with this fix was the initial release of microsoft/windowsservercore:1709 and microsoft/nanoserver:1709. As a result, this fix is available starting on hosts running Windows Version 1709 or newer, as long as the container base image is also 1709 or newer.

There are two levels to this fix, with slightly different (but improving) levels of support for these notifications.

The first fix for this was in Version 1709, as you’ve noted. It notifies the initial process only, with a CTRL_CLOSE_EVENT sent to ConsoleCtrlHandler. The initial process in this case means the first process started by either docker run or docker exec. That is, docker run <image> shutdown.exe will get the notification, but docker run <image> cmd.exe /c shutdown.exe will NOT get the notification (cmd.exe will get notified, but will not pass that on to shutdown.exe). As per above, this fix will work when the host OS AND container OS image version are both Windows 1709.

The second fix for this was in Windows 10 Build 17074, which changes two things. First, the notification is changed to CTRL_SHUTDOWN_EVENT for both windowsservercore and nanoserver based images. Second, it extends the notification in windowsservercore based images to affect all processes running in the container, including sending service shutdown notifications to services running in the container. This is currently available in insider builds, and will be available in Windows Version 1803, in the 1803 based container images.

I hope this clears up the questions regarding what Windows versions have this fix. Let me know if you have any more questions.
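As a compact restatement of the two levels described above, here is a hedged Python sketch of a lookup by container image build. The function and type names are hypothetical. Build 17074 comes from this comment; 16299 is assumed here as the OS build number of Windows version 1709. The nanoserver scope after build 17074 follows a later comment in this thread, which says only the initial process is notified there.

```python
from typing import NamedTuple, Optional

BUILD_1709 = 16299        # assumption: OS build number of Windows version 1709
BUILD_SECOND_FIX = 17074  # build named in the comment above


class Notification(NamedTuple):
    event: str
    scope: str


def expected_notification(image: str, container_build: int) -> Optional[Notification]:
    """Summarise which shutdown event a container image should receive.

    image is "windowsservercore" or "nanoserver"; builds older than 1709
    are not covered by the comment above, so None is returned for them.
    """
    if container_build < BUILD_1709:
        return None
    if container_build < BUILD_SECOND_FIX:
        return Notification("CTRL_CLOSE_EVENT", "initial process")
    if image == "windowsservercore":
        return Notification("CTRL_SHUTDOWN_EVENT", "all processes and services")
    return Notification("CTRL_SHUTDOWN_EVENT", "initial process")
```

The host requirement from the earlier paragraph (host OS equal to or newer than the container OS) is not modelled here.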

@jhowardmsft @PatrickLang any news about this?

I’ve tested the current behavior from plain C applications and posted the results at rgl/docker-windows-2016-vagrant, essentially:

Windows containers cannot be gracefully shut down: either there is no shutdown notification or they are forcefully terminated after a while.

The next table describes whether a docker stop --time 30 <container> will gracefully shut down a container that is running a console, gui, or service app.

| base image | app | behaviour |
| --- | --- | --- |
| nanoserver | console | does not receive the shutdown notification |
| windowsservercore | console | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | gui | fails to run RegisterClass (there’s no GUI support in nano) |
| windowsservercore | gui | receives the shutdown notification but is killed after about 5 seconds |
| nanoserver | service | does not receive the shutdown notification |
| windowsservercore | service | does not receive the shutdown notification |

@darrenstahlmsft Could you please comment on the last two answers?

And anyone: the Docker ecosystem on windows is challenging (to say the least) to follow from a consumer standpoint. I’ve read the entire thread 10 times and I still don’t understand if there are any requirements related to the host OS. Can someone with a deeper knowledge of this summarize the following:

  1. Are there any host OS version requirements in order to enable this feature?

  2. What’s the relationship between the Insider Build version numbers and the windowsservercore:1709 shown above?

Cheers

Hello everyone - I wanted to let you know we have now fixed this issue in the recent Server Insider build 17074 or higher. You can get them here: https://hub.docker.com/r/microsoft/windowsservercore-insider/tags/. Give it a try and let us know if we indeed did the job 😃.

There are other fixes and improvements on Server Core container. Find out more in the blog here: https://blogs.technet.microsoft.com/virtualization/2018/01/22/a-smaller-windows-server-core-container-with-better-application-compatibility/

@rgl I had time to test this today, and I can confirm that the upcoming Fall update to Windows Server and Windows 10 now correctly sends a notification to nanoserver as well as windowsservercore containers. Note that it still terminates after ~5 seconds.

On a Windows Insider build: graceful-terminating-console-application-windows based on microsoft/nanoserver logs this:

C:\host>type graceful-terminating-console-application-windows.log
2017-09-15 14:27:36 Running (pid=10980)... press CTRL+C to terminate.
2017-09-15 14:27:36 fd 0 is a pipe at
2017-09-15 14:27:36 fd 1 is a pipe at
2017-09-15 14:27:36 fd 2 is a pipe at
2017-09-15 14:28:24 Received the console CTRL_CLOSE_EVENT, gracefully terminating the application...
2017-09-15 14:28:24 Gracefully terminating the application in T-10...
2017-09-15 14:28:25 Gracefully terminating the application in T-9...
2017-09-15 14:28:26 Gracefully terminating the application in T-8...
2017-09-15 14:28:27 Gracefully terminating the application in T-7...
2017-09-15 14:28:28 Running (pid=1084)... press CTRL+C to terminate.
2017-09-15 14:28:28 fd 0 is a pipe at
2017-09-15 14:28:28 fd 1 is a pipe at
2017-09-15 14:28:28 fd 2 is a pipe at

It is still terminating after ~5 seconds, which for this release is expected, but now containers based off all container base images will get a notification before being shut down. I’m working with the kernel team to get the proper graceful shutdown including service shutdown notifications (and as a result, respect the Docker stop timeout correctly) done.

the shutdown is now much better than it was when this issue was opened! 😃

there’s still a pending issue, docker stop --time 600 does not wait that amount of time for the containers to terminate, can this also be fixed?

for reference, I’m testing this on my windows 2019 (1809) vagrant environment, here’s the resulting summary.

Graceful Container Shutdown

Windows containers cannot be gracefully shut down because they are forcefully terminated after a while. Check the moby issue 25982 for progress.

The next table describes whether a docker stop --time 600 <container> will gracefully shut down a container that is running a console, gui, or service app.

| base image | app | behavior |
| --- | --- | --- |
| mcr.microsoft.com/windows/nanoserver:1809 | console | receives the CTRL_SHUTDOWN_EVENT notification but is killed after about 5 seconds |
| mcr.microsoft.com/windows/servercore:1809 | console | receives the CTRL_SHUTDOWN_EVENT notification but is killed after about 5 seconds |
| mcr.microsoft.com/windows:1809 | console | receives the CTRL_SHUTDOWN_EVENT notification but is killed after about 5 seconds |
| mcr.microsoft.com/windows/nanoserver:1809 | service | receives the SERVICE_CONTROL_PRESHUTDOWN notification but is killed after about 15 seconds |
| mcr.microsoft.com/windows/servercore:1809 | service | receives the SERVICE_CONTROL_PRESHUTDOWN notification but is killed after about 15 seconds |
| mcr.microsoft.com/windows:1809 | service | receives the SERVICE_CONTROL_PRESHUTDOWN notification but is killed after about 20 seconds |
| mcr.microsoft.com/windows/nanoserver:1809 | gui | fails to run because there are no GUI support libraries in the base image |
| mcr.microsoft.com/windows/servercore:1809 | gui | does not receive the shutdown messages WM_QUERYENDSESSION or WM_CLOSE |
| mcr.microsoft.com/windows:1809 | gui | does not receive the shutdown messages WM_QUERYENDSESSION or WM_CLOSE |

NB setting WaitToKillServiceTimeout (e.g. Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control -Name WaitToKillServiceTimeout -Value '450000') does not have any effect on extending the kill service timeout.

NB setting WaitToKillAppTimeout (e.g. New-ItemProperty -Force -Path 'HKU:\.DEFAULT\Control Panel\Desktop' -Name WaitToKillAppTimeout -Value '450000' -PropertyType String) does not have any effect on extending the kill application timeout.

@darrenstahlmsft Thank you for the comprehensive explanation. I have host Windows 10 1709 and image: microsoft/dotnet (which is Nano Server 1709). A .NET Core console app is actually a dll, so it cannot be run directly; it is run by dotnet.exe: ENTRYPOINT ["dotnet", "TestNetCore.dll"]. Doesn’t dotnet.exe forward CTRL_CLOSE_EVENT to the .NET Core console app?

And a second question - will Console.CancelKeyPress be invoked in Version 1803? It has been the standard way to handle exit for ages. Having to use [DllImport("Kernel32")] SetConsoleCtrlHandler is a shame for such basic functionality.

Thanks for updating this thread @weijuans-msft!

Just a small followup with some more info: The process shutdown notification has changed in the above insider update. Now, all processes in windowsservercore based containers will receive CTRL_SHUTDOWN_EVENT, rather than the previous CTRL_CLOSE_EVENT. Note there is a small bug in that insider build that causes the initial process (the same one that worked in the above shutdown trapping samples) to still receive a CTRL_CLOSE_EVENT. I have fixed this internally to now send CTRL_SHUTDOWN_EVENT like all other processes, and it will be hitting an insider build soon.

In the above insider build (17074), services in windowsservercore based containers will now receive the SERVICE_CONTROL_SHUTDOWN notification if they register for shutdown notifications.

The only change in nanoserver based containers is that the initial process will now receive CTRL_SHUTDOWN_EVENT rather than CTRL_CLOSE_EVENT. There’s still more work in the platform to enable shutdown notifications to all processes and services in nanoserver based containers.

@riverar Sorry for missing this question earlier. That is correct. If the initial process exits (by not handling the event, or returning from the event handler) the container will immediately shut down without waiting the full 5 seconds.

@sandersaares That is my understanding with current versions, yes.