IIS.ServiceMonitor: ServiceMonitor causes microsoft/iis docker container to exit with error

Containers die with the following log:

Service 'w3svc' has been stopped
APPCMD failed with error code 259
Failed to update IIS configuration

Not sure if this is related at all to #29 and #4, but I have been seeing this when running the latest tag of microsoft/iis on a Windows Server 2016 AMI on AWS EC2.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 45 (9 by maintainers)

Most upvoted comments

For me this issue keeps happening. No AWS here. Just bare metal.

@peterngai yes, but I do it in a run script. Below are my Dockerfile and run.ps1. Next to the Dockerfile I have a rootfs directory containing run.ps1 and Wait-Service.ps1; its contents are copied in their entirety to the root of the image. The Dockerfile also expects the application code to be in an app directory next to it, but you can easily change that.

./Dockerfile

FROM microsoft/aspnet:4.7.1-windowsservercore-ltsc2016

SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

# Copy the installers (IIS URL Rewrite MSI) into the image
COPY installers /installers

RUN \
# Install IIS Rewrite module
    & 'c:/installers/rewrite_amd64_en-US.msi' /qn /quiet /norestart; Get-Process -Name "msiexec" | Wait-Process; \
# Cleanup
    Remove-Item -Path /installers -Recurse;\
# setup IIS
    Install-WindowsFeature NET-Framework-45-ASPNET ; \
    Install-WindowsFeature Web-Asp-Net45; \
    Remove-WebSite -Name 'Default Web Site';\
    mkdir 'c:\app'; \
    New-Website -Name 'Default Web Site' -Port 80 -PhysicalPath 'c:\app' -ApplicationPool 'DefaultAppPool';

COPY rootfs /
COPY app /app

ENTRYPOINT ["powershell", "c:/run.ps1"]

./rootfs/run.ps1

if( Test-Path env:TZ) {
  Write-Output "Setting TimeZone: $env:TZ"
  Set-TimeZone "$env:TZ"
}

# ToDo: This should remove already set environmentVariables otherwise it will fail setting duplicates
Write-Output "Setting Environment Variables for Default Web App..."
$exclusionList = @(
  'ProgramFiles(x86)', 'CommonProgramFiles(x86)', 'TMP', 'TEMP', 'USERNAME', 'USERPROFILE',
  'APPDATA', 'LOCALAPPDATA', 'PROGRAMDATA', 'PSMODULEPATH', 'PUBLIC', 'USERDOMAIN',
  'ALLUSERSPROFILE', 'PATHEXT', 'PATH', 'COMPUTERNAME', 'COMSPEC', 'OS',
  'PROCESSOR_IDENTIFIER', 'PROCESSOR_LEVEL', 'PROCESSOR_REVISION', 'PROGRAMFILES',
  'PROGRAMW6432', 'SYSTEMDRIVE', 'WINDIR', 'NUMBER_OF_PROCESSORS', 'PROCESSOR_ARCHITECTURE',
  'SYSTEMROOT', 'COMMONPROGRAMFILES', 'COMMONPROGRAMW6432', 'DRIVERDATA'
)
$envVars = $(gci env:* | where-object { $_.Name -notin $exclusionList}) | ForEach-Object { "/+`"[name='DefaultAppPool'].environmentVariables.[name='$($_.Name)',value='$($_.Value)']`"" }

& 'C:/windows/system32/inetsrv/appcmd' 'set' 'config' '-section:system.applicationHost/applicationPools' $envVars '/commit:apphost'

Write-Output "Post Config"
$varOut = $(& 'C:/windows/system32/inetsrv/appcmd' @('list', 'config', '-section:system.applicationHost/applicationPools'))
Write-Output $varOut

Write-Output "Starting Service Monitor..."
.\Wait-Service.ps1 -ServiceName W3SVC
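
The ToDo in run.ps1 about duplicates could be approached by skipping variables the app pool already has, rather than removing them first (a different technique from what the ToDo describes). A minimal sketch, assuming the caller can supply the names already present in the pool configuration; New-AppPoolEnvArgs and its -AlreadySet parameter are hypothetical names, not part of the original script:

```powershell
# Hypothetical helper: builds the appcmd "/+" arguments used in run.ps1,
# leaving out excluded names and names the caller reports as already set.
function New-AppPoolEnvArgs {
    param(
        [Parameter(Mandatory)][System.Collections.IDictionary]$Variables,
        [string[]]$Exclude = @(),
        [string[]]$AlreadySet = @(),   # assumption: supplied by the caller
        [string]$AppPool = 'DefaultAppPool'
    )
    foreach ($name in ($Variables.Keys | Sort-Object)) {
        # -in compares case-insensitively, matching Windows env var semantics
        if ($name -in $Exclude -or $name -in $AlreadySet) { continue }
        "/+`"[name='$AppPool'].environmentVariables.[name='$name',value='$($Variables[$name])']`""
    }
}
```

In run.ps1 this could stand in for the $envVars pipeline, for example $envVars = New-AppPoolEnvArgs -Variables ([Environment]::GetEnvironmentVariables()) -Exclude $exclusionList -AlreadySet $namesFromConfig, where $namesFromConfig is a placeholder for however the existing entries are read. Like the original, it does not escape single quotes in values.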

Hopefully I’ve got good news (sorry for formatting - GitHub is not cooperating today). I was digging through the ECS Agent source code and found a reference to issue #1127. I’ve also observed that my ECS-launched containers have CpuPercent=1, which indeed seems to be the root of the problem: the container just doesn’t get enough computing power (and that is why it works on larger instances). Now, if you look into the actual “temporary fix” (agent version 1.17.1), there is a magic environment variable, ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND. I set it to true and restarted the ECS Agent, and all my container instances started working (docker inspect shows CpuPercent=0). So:

  1. Leave “Task CPU” in your task definition blank.
  2. On the EC2 instance hosting ECS, add the environment variable ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND=true.
  3. Restart the ECS Agent service.

The drawback of this workaround is that your container instances will use all available CPUs.
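
Steps 2 and 3, plus the docker inspect check, might look like this in PowerShell on the container instance. This is a sketch under assumptions: the agent is assumed to run as a Windows service named AmazonECS and to pick up machine-level environment variables, and Enable-EcsCpuUnboundedWorkaround and Get-ContainerCpuPercent are hypothetical helper names.

```powershell
# Steps 2 and 3: set the variable machine-wide, then restart the agent.
# Assumptions: elevated session; the agent service is named 'AmazonECS'.
function Enable-EcsCpuUnboundedWorkaround {
    param([string]$AgentService = 'AmazonECS')
    [Environment]::SetEnvironmentVariable('ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND', 'true', 'Machine')
    Restart-Service -Name $AgentService
}

# Verification: read HostConfig.CpuPercent from `docker inspect` JSON output.
function Get-ContainerCpuPercent {
    param([Parameter(Mandatory)][string]$InspectJson)
    # docker inspect prints a JSON array with one element per container.
    $containers = $InspectJson | ConvertFrom-Json
    foreach ($c in $containers) { [int]$c.HostConfig.CpuPercent }
}
```

After restarting the agent and relaunching the task, something like Get-ContainerCpuPercent -InspectJson (docker inspect $containerId | Out-String) should report 0 rather than 1, going by the observation above.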

You’re perfectly right, @me-viper!

🐰 First, I want to confirm that the Docker image containing my app is valid: when I docker run it manually, there is no problem. Even when I try directly on the ECS host instance, my app runs just fine.

That fact, plus your last comment (“weird interaction with Amazon ECS”), led me to conclude that the problem only occurs when my app container is started BY the ECS agent.

So I wondered how the ECS agent could affect the container’s execution context, and I remembered that the agent actually uses the values I entered myself in the Task Definition panel to run my container: network mode, memory and CPU limits, etc.

I eventually managed to reach a stable state by tweaking my task definition (memory and CPU limits), and now my ECS service runs just fine! 👍👍

Here you can see the values I used, which made the service stable again:

[image: amazon_ecs]

@chrisjohnson00 for me, using an ASP.NET MVC app, I settled on about 200 millicores and 400 megabytes of memory. If you don’t have limits and your node has plenty of capacity, maybe your issue is something else, but try giving it at least those values in the requests and see what happens. Good luck!

It looks like multiple issues were being discussed here, but the most prominent one, caused by the bug in Amazon ECS, is resolved by the suggested workarounds. If there are other unresolved issues, please file them separately.

I eventually gave up on using the built-in service monitor and instead used a combination of this script, Wait-Service.ps1, and the following code to handle IIS environment variables.

Write-Output "Setting Environment Variables for Default Web App..."
$exclusionList = @(
  'ProgramFiles(x86)', 'CommonProgramFiles(x86)', 'TMP', 'TEMP', 'USERNAME', 'USERPROFILE',
  'APPDATA', 'LOCALAPPDATA', 'PROGRAMDATA', 'PSMODULEPATH', 'PUBLIC', 'USERDOMAIN',
  'ALLUSERSPROFILE', 'PATHEXT', 'PATH', 'COMPUTERNAME', 'COMSPEC', 'OS',
  'PROCESSOR_IDENTIFIER', 'PROCESSOR_LEVEL', 'PROCESSOR_REVISION', 'PROGRAMFILES',
  'PROGRAMW6432', 'SYSTEMDRIVE', 'WINDIR', 'NUMBER_OF_PROCESSORS', 'PROCESSOR_ARCHITECTURE',
  'SYSTEMROOT', 'COMMONPROGRAMFILES', 'COMMONPROGRAMW6432', 'DRIVERDATA'
)
$envVars = $(gci env:* | where-object { $_.Name -notin $exclusionList}) | ForEach-Object { "/+`"[name='DefaultAppPool'].environmentVariables.[name='$($_.Name)',value='$($_.Value)']`"" }

& 'C:/windows/system32/inetsrv/appcmd' 'set' 'config' '-section:system.applicationHost/applicationPools' $envVars '/commit:apphost'
.\Wait-Service.ps1 -ServiceName W3SVC

This doesn’t handle duplicates properly. I gave up on fixing that since it wasn’t a use case we needed to support, but I hope this gives someone a way out of this ServiceMonitor headache.
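
Wait-Service.ps1 itself is not reproduced in this thread. For readers who want the general idea, a polling loop along these lines can keep the container’s entrypoint alive while the service runs. This is a hypothetical stand-in, not the linked script; Wait-ServiceExit and its parameters are illustrative names:

```powershell
# Hypothetical stand-in for a service-watching entrypoint helper.
# Blocks while the service reports 'Running' and returns once it does not.
function Wait-ServiceExit {
    param(
        [Parameter(Mandatory)][string]$ServiceName,
        [int]$PollSeconds = 5,
        # Injectable so the loop can be exercised without a real service;
        # by default it asks the Service Control Manager via Get-Service.
        [scriptblock]$GetStatus = { param($name) (Get-Service -Name $name).Status }
    )
    while ($true) {
        $status = & $GetStatus $ServiceName
        if ("$status" -ne 'Running') {
            Write-Output "Service '$ServiceName' is $status"
            return
        }
        Start-Sleep -Seconds $PollSeconds
    }
}
```

In an entrypoint script this would be called as Wait-ServiceExit -ServiceName W3SVC, typically followed by exit 1 so the container stops when IIS does.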

One more bit that makes the picture complete. The task definition UI is pretty misleading, to say the least. There are two things that affect CPU:

  1. Task definition itself: [image: taskcpu]

  2. Container definition: [image: containercpu]

From what I observe, if you set Task CPU > 0 (screen 1), you have to set CPU units to something > 0 (screen 2); otherwise you’ll end up with CpuPercent=1. If you don’t set Task CPU, you need to set ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND=true, or you again end up with CpuPercent=1.

I guess that makes some sense, but for me personally it was far from obvious.
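
The rule described in this comment can be written down as a small predicate. This is only a restatement of one commenter’s observation, not documented ECS behavior; Test-CpuPercentStuckAtOne is a hypothetical name, and 0 stands for a field left blank:

```powershell
# Returns $true when, per the observation above, the container would end up
# with CpuPercent=1. A value of 0 means the field was left blank in the UI.
function Test-CpuPercentStuckAtOne {
    param(
        [int]$TaskCpu = 0,                    # "Task CPU" in the task definition
        [int]$ContainerCpu = 0,               # "CPU units" in the container definition
        [bool]$UnboundedWorkaround = $false   # ECS_ENABLE_CPU_UNBOUNDED_WINDOWS_WORKAROUND
    )
    if ($TaskCpu -gt 0) {
        # Task CPU set: CPU units must also be set to something > 0.
        return ($ContainerCpu -le 0)
    }
    # Task CPU blank: the workaround variable must be true.
    return (-not $UnboundedWorkaround)
}
```

For example, Test-CpuPercentStuckAtOne -TaskCpu 1024 -ContainerCpu 0 gives $true, matching the first case in the comment.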

I have the same issue: “APPCMD failed with error code 259.” For me the problem reproduces only when the containers are managed by AWS ECS. When I start the same container manually on the same EC2 instance (i.e. docker run …), everything works just fine.