aws-sam-cli: Running sam build from a container gets stuck

Description:

Hello. I’m trying to Dockerize a tool (essentially a wrapper script) that utilizes aws-sam-cli. I’m unable to successfully execute sam build from inside of my “wrapper” container as it gets stuck at this point:

Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container

From that point, the build doesn’t move forward (I tried to wait for over an hour). I’ve also tried different versions of aws-sam-cli but with the same result. Running other containers inside of my container setup works as expected.

Steps to reproduce:

  1. Build a Docker image with aws-sam-cli==1.36.0 installed
  2. Run build inside of a container docker run -it --rm -v /source/my-application:/my-application -v /var/run/docker.sock:/var/run/docker.sock wrapper build

wrapper build executes the following

sam build \
    --region ${AWS_REGION_CODE} \
    --profile ${AWS_PROFILE} \
    --template "$function_src_dir"/template.yaml \
    --build-dir "${AWS_SAM_BUILD_DIR}" \
    --use-container --parallel --debug

The Docker image is based on ubuntu:20.04 with python3 installed.

Observed result:

2021-12-02 15:19:31,364 | Building codeuri: /source/my-application runtime: python3.9 metadata: {} architecture: x86_64 functions: ['MyFunction']
2021-12-02 15:19:31,365 | Building to following folder /root/aws-sam-generated-artifacts/123456789012/ireland/my-application/MyFunction
2021-12-02 15:19:31,365 | Waiting for async results

Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container

From this point on, the build gets stuck.

Expected result:

sam build should work when run from inside of a container

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

  1. OS: MacOS 12
  2. sam --version: 1.36.0
  3. AWS region: eu-west-1

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 24 (5 by maintainers)

Most upvoted comments

Hi there,

There are couple different use cases here, please provide your specific example below or create a new issue so that we can check them separately.

For most common use case (or issue), in order to execute any cross arch docker image on Linux, have to run multiarch/qemu-user-static single time for each run in order to enable emulation. See; https://github.com/multiarch/qemu-user-static. This is due to limitation of Docker Linux and this is not applicable for MacOS or Windows instances.

I’ve created following Github repository for this example: https://github.com/mndeveci/example-arm-lambda-build

If I just use sam build -u without the step above, the build process is stuck since host can’t execute cross arch instructions. See example build; https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792694386/job/21251164713

But if I add docker run --rm --privileged multiarch/qemu-user-static --reset -p yes just before the build step here, then my build succeeds. See example build; https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792717799/job/21251231671

Is there any update on this? I am facing the same issue on Ubuntu 22.04

I’m re-opening the issue as the problem still persists. The reason why my previous build worked was that the CF stack didn’t contain any Lambda function. After trying to run sam build for a stack which contains a Lambda function from a container, I experienced the same problem - the build gets stuck.

2022-06-13 12:46:48,791 | Async execution started
2022-06-13 12:46:48,791 | Invoking function functools.partial(<bound method DefaultBuildStrategy.build_single_function_definition of <samcli.lib.build.build_strategy.DefaultBuildStrategy object at 0x7fdf18178d00>>, <samcli.lib.build.build_graph.FunctionBuildDefinition object at 0x7fdf18178b20>)
2022-06-13 12:46:48,792 | Building codeuri: /src/my_application runtime: python3.7 metadata: {} architecture: x86_64 functions: ['ClzSecHubIncidentsCloseFunction']
2022-06-13 12:46:48,792 | Building to following folder /root/aws-sam-generated-artifacts/633965134003/ireland/my_application/MyApplicationFunction
2022-06-13 12:46:48,792 | Waiting for async results

Fetching public.ecr.aws/sam/build-python3.7:latest-x86_64 Docker container image..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
2022-06-13 12:47:40,626 | Mounting /src/my_application as /tmp/samcli/source:ro,delegated inside runtime container

From this point, the build doesn’t progress. I also tried to upgrade the runtime to python3.9 but that didn’t work either. My current sam version is aws-sam-cli==1.51.0.

Can we get any update on this, please?

I did some investigation and it looks like this gets hung when the docker engine is unable to create the container from the container environment for various reasons. It appears sam gets hung at this call: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/local/docker/container.py#L284

Which is entrapped in this try:/finally statement with no exception capturing here: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/lib/build/app_builder.py#L844

I suspect that because docker couldn’t create the container it hangs when attempting to stop. After some debugging on an environment that reliably had this error I could see that during the create step the docker engine returns a status code 500 and then the docker engine runs through a series of other apis before it hangs completely. There was no error code in the other calls after the 500 is returned. I was unable to track down where in sam or docker where these new apis are being called after the create fails.

So the initial issue appears to be caused by a bug when running with the python-docker library in certain environments when run from a container and then there appears to be a bug in how sam handles this particular type of failure.

I wasn’t able to confirm this but most of the issues that report this problem are run from either a windows or a macbook. I haven’t seen a linux docker in docker fail so there could be something in the way the socket is shared from the host to the container. Part of the reason I suspect this issue is os related is this error didn’t occur in a docker in docker environment built on linux but it did happen on my macbook.

For anyone else that runs into this issue try setting up the docker in docker environment on a linux host instead of windows or macos.