polyaxon: Build fails due to memory.

Describe the bug

The build fails when a Job/Experiment is run right after polyaxon upload.

Traceback (most recent call last):
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/init.py", line 97, in cmd
    nvidia_bin=nvidia_bin)
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/init.py", line 37, in init
    commit=commit)
  File "/polyaxon/polyaxon/dockerizer/dockerizer/initializer/download.py", line 18, in download
    extract_path=extract_path
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/api/project.py", line 210, in download_repo
    extract_path=extract_path)
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/transport/http_transport.py", line 211, in download
    extract_path=extract_path)
  File "/usr/local/lib/python3.7/site-packages/polyaxon_client/transport/http_transport.py", line 232, in untar
    tar.extractall(extract_path)
  File "/usr/local/lib/python3.7/tarfile.py", line 2002, in extractall
    numeric_owner=numeric_owner)
  File "/usr/local/lib/python3.7/tarfile.py", line 2044, in extract
    numeric_owner=numeric_owner)
  File "/usr/local/lib/python3.7/tarfile.py", line 2114, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/lib/python3.7/tarfile.py", line 2163, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/usr/local/lib/python3.7/tarfile.py", line 250, in copyfileobj
    dst.write(buf)
OSError: [Errno 12] Cannot allocate memory

To Reproduce

I'm using Ubuntu 16.04. I run jobs through a wrapper script, invoked as: script.sh job.yaml -u (script.sh is shown below).

How I get the error: the code I upload contains a Python file with a print('hello world') in it. Running the script with the command above breaks during the build. Rerunning the same command works. If I then make a change to the code, the next run fails to build, and running it again works.

TL;DR: the first job fails, rerunning the job works.

script.sh

#!/bin/bash
set -e
cd $PWD
# get yaml
# determine if xp or job
YAML=$1
shift 1

if test -f "$YAML"; then
    echo "$YAML exists"
else
    echo "File $YAML does not exist."
    exit 1
fi

if [[ $# -ge 1 ]] && [[ $1 == '-u' ]]; then
  echo "Running: polyaxon upload"
  polyaxon upload
  shift 1
fi

TYPE=$(grep -ozP "version:\W*[0-9]*\W*kind:\W*\K([\w-_\\.@#\\$%&\\^]*).*" <"$YAML")
echo "Type of yaml given <$TYPE>."

# Quote expansions so paths with spaces and multi-line output survive intact
OUTPUT=$(polyaxon run -f "$YAML")
echo "$OUTPUT"
case $TYPE in 

  job)
      # TODO: get the number of job
      COUNT=$(grep -ozP "Job\W*\K(?:[0-9]*)" <<< "$OUTPUT")
      polyaxon job -j "$COUNT" logs
      ;;
  experiment)
      COUNT=$(grep -ozP "Experiment\W*\K(?:[0-9]*)" <<< "$OUTPUT")
      polyaxon experiment -xp "$COUNT" logs
      ;;
  group)
      COUNT=$(grep -ozP "Group\W*\K(?:[0-9]*)" <<< "$OUTPUT")
      echo "No logs will be played. Group number is $COUNT."
      ;;
  *)
      echo "Error: $TYPE is not supported."
      exit 64
      ;;
  esac
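
As a side note on the script itself (unrelated to the memory error): the grep -ozP pattern for the kind only matches when version: directly precedes kind: in the YAML. A minimal sketch of a more forgiving way to read both values follows. The helper names yaml_kind and first_id_after are hypothetical, and the sketch assumes kind: sits at column 0, as in job.yaml, and that the run output contains the label followed by a number, as the original patterns imply.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print the value of
# the first top-level "kind:" key found in the YAML file given as $1.
yaml_kind() {
  sed -n 's/^kind:[[:space:]]*\([A-Za-z_-]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: read text on stdin and print the first number that
# follows the label given as $1 (for example "Job" or "Experiment").
first_id_after() {
  grep -oE "$1[^[:alnum:]_]*[0-9]+" | head -n 1 | grep -oE '[0-9]+$'
}
```

With these in place, TYPE=$(yaml_kind "$YAML") and COUNT=$(first_id_after Job <<< "$OUTPUT") could stand in for the corresponding grep calls.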

build error.

Expected behavior

The build should not crash.

Environment

job.yaml

---
version: 1
kind: job
environment:
  resources:
    cpu:
      limits: 1
      requests: 1
build:
  dockerfile: polyaxon/Dockerfile
run:
  cmd:
    - ls polyaxon
    - python polyaxon/temp.py
    - sleep 10

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

@mouradmourafiq Not sure. I wasn't able to reproduce it anymore.