aws-sam-cli: Running sam build from a container gets stuck
Description:
Hello. I’m trying to Dockerize a tool (essentially a wrapper script) that uses aws-sam-cli. I’m unable to successfully execute sam build from inside my “wrapper” container, as it gets stuck at this point:
Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container
From that point, the build doesn’t move forward (I waited for over an hour). I’ve also tried different versions of aws-sam-cli, with the same result. Running other containers inside of my container setup works as expected.
Steps to reproduce:
- Build a Docker image with aws-sam-cli==1.36.0 installed
- Run build inside of a container:
docker run -it --rm -v /source/my-application:/my-application -v /var/run/docker.sock:/var/run/docker.sock wrapper build
wrapper build executes the following:
sam build \
--region ${AWS_REGION_CODE} \
--profile ${AWS_PROFILE} \
--template "$function_src_dir"/template.yaml \
--build-dir "${AWS_SAM_BUILD_DIR}" \
--use-container --parallel --debug
The Docker image is based on ubuntu:20.04 with python3 installed.
Observed result:
2021-12-02 15:19:31,364 | Building codeuri: /source/my-application runtime: python3.9 metadata: {} architecture: x86_64 functions: ['MyFunction']
2021-12-02 15:19:31,365 | Building to following folder /root/aws-sam-generated-artifacts/123456789012/ireland/my-application/MyFunction
2021-12-02 15:19:31,365 | Waiting for async results
Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container
From this point on, the build gets stuck.
Expected result:
sam build should work when run from inside of a container.
Additional environment details (Ex: Windows, Mac, Amazon Linux etc)
- OS: macOS 12
- sam --version: 1.36.0
- AWS region: eu-west-1
About this issue
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 24 (5 by maintainers)
Hi there,
There are a couple of different use cases here; please provide your specific example below or create a new issue so that we can check them separately.
For the most common use case (or issue): in order to execute any cross-architecture Docker image on Linux, you have to run multiarch/qemu-user-static once for each run to enable emulation. See https://github.com/multiarch/qemu-user-static. This is due to a limitation of Docker on Linux and does not apply to macOS or Windows instances.
I’ve created the following GitHub repository for this example: https://github.com/mndeveci/example-arm-lambda-build
If I just use sam build -u without the step above, the build process gets stuck since the host can’t execute cross-architecture instructions. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792694386/job/21251164713
But if I add
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
just before the build step here, then my build succeeds. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792717799/job/21251231671
Is there any update on this? I am facing the same issue on Ubuntu 22.04.
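For readers scripting the maintainer's workaround above, here is a minimal sketch of how the emulation step could be made conditional. It only registers QEMU when the host is Linux and the target architecture differs from the host's, which matches the statement that the step is not needed on macOS or Windows. The function names (normalize_arch, needs_emulation, build_with_emulation) are hypothetical and not part of aws-sam-cli; the target architecture is assumed to be passed in by the caller.

```shell
# Map the different spellings reported by `uname -m` and used by SAM
# onto one name per architecture so they can be compared.
normalize_arch() {
  case "$1" in
    x86_64|amd64) echo "x86_64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "$1" ;;
  esac
}

# Succeeds (exit 0) only when the host is Linux and the host architecture
# differs from the target architecture.
# Arguments: host OS (as printed by `uname -s`), host arch, target arch.
needs_emulation() {
  host_os="$1"
  host_arch="$(normalize_arch "$2")"
  target_arch="$(normalize_arch "$3")"
  [ "$host_os" = "Linux" ] && [ "$host_arch" != "$target_arch" ]
}

# Hypothetical wrapper: register QEMU handlers if required, then build.
# The docker and sam commands are the ones quoted in the comment above.
build_with_emulation() {
  if needs_emulation "$(uname -s)" "$(uname -m)" "$1"; then
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
  fi
  sam build -u
}
```

A wrapper script would then call, for example, build_with_emulation arm64. Nothing runs until that function is called, so the helpers can be sourced safely.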
I’m re-opening the issue as the problem still persists. The reason why my previous build worked was that the CF stack didn’t contain any Lambda function. After trying to run sam build for a stack which contains a Lambda function from a container, I experienced the same problem: the build gets stuck.
From this point, the build doesn’t progress. I also tried to upgrade the runtime to python3.9 but that didn’t work either. My current sam version is aws-sam-cli==1.51.0.
Can we get any update on this, please?
I did some investigation and it looks like this gets hung when the docker engine is unable to create the container from the container environment for various reasons. It appears sam gets hung at this call: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/local/docker/container.py#L284
That call is wrapped in this try/finally statement, which has no exception handling: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/lib/build/app_builder.py#L844
I suspect that because docker couldn’t create the container, sam hangs when attempting to stop it. After some debugging in an environment that reliably reproduced this error, I could see that during the create step the docker engine returns a status code 500 and then runs through a series of other API calls before it hangs completely. None of the calls after the 500 returned an error code. I was unable to track down where in sam or docker these additional API calls are made after the create fails.
So the initial issue appears to be caused by a bug when running with the python-docker library in certain environments when run from a container and then there appears to be a bug in how sam handles this particular type of failure.
I wasn’t able to confirm this, but most of the reports of this problem come from either a Windows machine or a MacBook. I haven’t seen a Linux docker-in-docker setup fail, so there could be something in the way the socket is shared from the host to the container. Part of the reason I suspect this issue is OS-related is that the error didn’t occur in a docker-in-docker environment built on Linux, but it did happen on my MacBook.
For anyone else who runs into this issue, try setting up the docker-in-docker environment on a Linux host instead of Windows or macOS.
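Because the hang described in this thread produces no error and no exit, a wrapper script may want to at least bound the build with a deadline so that the failure becomes visible. The following is a minimal sketch, not a fix for the underlying problem. It assumes GNU coreutils timeout is available (it is on ubuntu:20.04, but not on macOS by default); run_with_deadline is a hypothetical helper name.

```shell
# Run a command, but give up after a deadline instead of waiting forever.
# Arguments: deadline in seconds, then the command and its arguments.
# Returns the command's own exit status, or 124 if the deadline was hit
# (124 is the status GNU `timeout` uses for a timed-out command).
run_with_deadline() {
  seconds="$1"
  shift
  status=0
  timeout "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command timed out after ${seconds}s: $*" >&2
  fi
  return "$status"
}
```

A wrapper could then call, for example, run_with_deadline 900 sam build --use-container --debug and treat exit status 124 as "the build hung" rather than waiting indefinitely.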