polyaxon: Build fails due to memory.
Describe the bug
The build fails when a Job/Experiment is run right after polyaxon upload.
Traceback (most recent call last):
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/init.py", line 97, in cmd
    nvidia_bin=nvidia_bin)
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/init.py", line 37, in init
    commit=commit)
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/download.py", line 18, in download
    extract_path=extract_path
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/api/project.py", line 210, in download_repo
    extract_path=extract_path)
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/transport/http_transport.py", line 211, in download
    extract_path=extract_path)
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/transport/http_transport.py", line 232, in untar
    tar.extractall(extract_path)
  File "/usr/local/lib/python3.7/tarfile.py", line 2002, in extractall
    numeric_owner=numeric_owner)
  File "/usr/local/lib/python3.7/tarfile.py", line 2044, in extract
    numeric_owner=numeric_owner)
  File "/usr/local/lib/python3.7/tarfile.py", line 2114, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/lib/python3.7/tarfile.py", line 2163, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/usr/local/lib/python3.7/tarfile.py", line 250, in copyfileobj
    dst.write(buf)
OSError: [Errno 12] Cannot allocate memory
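The last frame shows the failure surfacing as an OSError with errno 12 (ENOMEM) raised by a write during tar extraction. As a rough sketch only, and not the code polyaxon_client actually uses, this is how a caller could tell that error apart from other extraction failures. The helper name describe_extract_error and the labels it returns are hypothetical.

```python
import errno
import os


def describe_extract_error(exc):
    """Return a short label for an OSError raised while extracting an archive.

    Hypothetical helper for illustration; the labels are made up.
    """
    if exc.errno == errno.ENOMEM:
        # Errno 12, the value reported in the traceback.
        return "out-of-memory"
    if exc.errno == errno.ENOSPC:
        # Included for contrast: the target filesystem is full.
        return "disk-full"
    return "other"


# Build an error with the same errno as the one in the traceback.
err = OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))
print(describe_extract_error(err))
```

Classifying the error does not explain why the allocation failed; it only makes it easier to log or retry on this specific condition.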
To Reproduce
I'm using Ubuntu 16.04. I run jobs through a wrapper script, invoked as: script.sh job.yaml -u (script.sh is shown below).
How I get the error: the code I upload contains a Python file with a print('hello world'). When I run the script with the command above, the build fails. Rerunning the same command works. If I then change the code and run the command again, the build fails again; running it once more works.
TL;DR: the first job after a change fails; rerunning the job works.
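Since the failure disappears on a second attempt, one possible stopgap is to retry the failing step once. The sketch below is a generic retry wrapper, not a Polyaxon feature and not a fix for the underlying problem. The names run_with_retry and flaky_build are hypothetical, and flaky_build only simulates the "fails once, then works" behaviour described here.

```python
import time


def run_with_retry(action, attempts=2, delay_seconds=0.0):
    """Call action() until it succeeds or attempts run out.

    Re-raises the last error if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


# Simulated build step: fails on the first call, succeeds afterwards.
calls = {"count": 0}


def flaky_build():
    calls["count"] += 1
    if calls["count"] == 1:
        raise OSError(12, "Cannot allocate memory")
    return "build succeeded"


print(run_with_retry(flaky_build))
```

In practice action would wrap whatever submits the job. A retry hides the symptom, so it is worth logging each failed attempt.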
script.sh
#!/bin/bash
set -e
cd "$PWD"
# get yaml
# determine if xp or job
YAML=$1
shift 1
if test -f "$YAML"; then
    echo "$YAML exists"
else
    echo "File $YAML does not exist."
    exit 1
fi
# optional -u flag: upload the code before running
if [[ $# -ge 1 ]] && [[ $1 == '-u' ]]; then
    echo "Running: polyaxon upload"
    polyaxon upload
    shift 1
fi
# extract the value of "kind:" from the yaml
TYPE=$(grep -ozP "version:\W*[0-9]*\W*kind:\W*\K([\w-_\\.@#\\$%&\\^]*).*" <"$YAML")
echo "Type of yaml given <$TYPE>."
OUTPUT=$(polyaxon run -f "$YAML")
echo "$OUTPUT"
case $TYPE in
    job)
        # TODO: get the number of job
        COUNT=$(grep -ozP "Job\W*\K(?:[0-9]*)" <<< "$OUTPUT")
        polyaxon job -j "$COUNT" logs
        ;;
    experiment)
        COUNT=$(grep -ozP "Experiment\W*\K(?:[0-9]*)" <<< "$OUTPUT")
        polyaxon experiment -xp "$COUNT" logs
        ;;
    group)
        COUNT=$(grep -ozP "Group\W*\K(?:[0-9]*)" <<< "$OUTPUT")
        echo "No logs will be played. Group number is $COUNT."
        ;;
    *)
        echo "Error: $TYPE is not supported."
        exit 64
        ;;
esac
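The script relies on two grep patterns: one to read the kind from the yaml and one to pull the run number out of the CLI output. As an illustration of the same idea, here is a minimal Python sketch of both extractions. The function names parse_kind and parse_run_id are hypothetical, and the sample CLI line is made up for the example; it is not real polyaxon output.

```python
import re


def parse_kind(yaml_text):
    """Return the value after a top-level 'kind:' key, or None if absent."""
    match = re.search(r"^kind:\s*([\w-]+)", yaml_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def parse_run_id(cli_output, label):
    """Return the first integer that follows label (e.g. 'Job'), or None."""
    match = re.search(re.escape(label) + r"\W*(\d+)", cli_output)
    return int(match.group(1)) if match else None


# The yaml header matches the job.yaml in this issue.
sample_yaml = "---\nversion: 1\nkind: job\n"
# Made-up line standing in for whatever the CLI prints.
sample_output = "Job 42 was created"

print(parse_kind(sample_yaml))
print(parse_run_id(sample_output, "Job"))
```

Returning None when nothing matches lets the caller decide how to report an unsupported or missing kind, instead of passing an empty value to the next command.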
Result: the build fails with the error shown in the traceback above.
Expected behavior
The build should not crash.
Environment
job.yaml
---
version: 1
kind: job
environment:
  resources:
    cpu:
      limits: 1
      requests: 1
build:
  dockerfile: polyaxon/Dockerfile
run:
  cmd:
    - ls polyaxon
    - python polyaxon/temp.py
    - sleep 10
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 15 (5 by maintainers)
@mouradmourafiq Not sure. I wasn't able to reproduce it anymore.