polyaxon: Can't use TPU

Describe the bug

I tried to use a Cloud TPU, but the experiment failed and the error below appeared in Stackdriver Logging. It seems that the TensorFlow version must be specified through a pod annotation.

HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs","reason":"InternalError","details":{"causes":[{"message":"admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs"}]},"code":500}
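
For reference, the annotation named in the error is set on the pod's metadata. Below is a minimal sketch of a plain Kubernetes pod that would satisfy the webhook. It is not what Polyaxon generates. The pod name, container name, and the "1.12" version string are illustrative assumptions chosen to match the image used in this report. The cloud-tpus.google.com/v2 resource key is the one GKE documents for Cloud TPU v2 devices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tpu-example            # hypothetical name
  annotations:
    # Annotation required by the pod-init.cloud-tpus.google.com webhook
    tf-version.cloud-tpus.google.com: "1.12"   # assumed to match the TF 1.12.0 image
spec:
  restartPolicy: Never
  containers:
    - name: trainer            # hypothetical name
      image: tensorflow/tensorflow:1.12.0
      command: ["python", "test.py"]
      resources:
        limits:
          # Request 8 Cloud TPU v2 cores
          cloud-tpus.google.com/v2: 8
```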

To Reproduce

YAML

---
version: 1

kind: experiment

environment:
  resources:
    cpu:
      requests: 4
      limits: 4
    memory:
      requests: 15000
      limits: 15000
    tpu:
      requests: 8
      limits: 8

build:
  image: tensorflow/tensorflow:1.12.0
  build_steps:
    - pip install --no-cache-dir -r requirements.txt

run:
  # this is just a dummy python file.
  cmd: python test.py

requirements.txt

polyaxon-client==0.3.8
polyaxon-cli==0.3.8
jupyter
google-cloud-storage

Expected behavior

The experiment starts and a Cloud TPU is created for it.

Environment

  • Polyaxon: 0.3.8

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

I understand that we must use a fixed TPU type and TF version at the moment. I look forward to it, and I am glad we are on the same page. Thanks!

Ah, I see. Look at tpuTensorflowVersion and tpuResourceKey in your deployment config file; those are the default values. This will be reflected in the docs ASAP.
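
As a rough sketch of what that could look like in the deployment config: the two key names come from the comment above, but their placement at the top level of the file and the values shown here are assumptions, so check them against the chart's own defaults.

```yaml
# Polyaxon deployment config (sketch; key placement and values are assumptions)
tpuTensorflowVersion: "1.12"              # assumed; should match the TF version in the experiment image
tpuResourceKey: cloud-tpus.google.com/v2  # assumed; GKE resource name for Cloud TPU v2
```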

OK, thanks. I just wanted to make sure, because for me it stays pending for a long time.

The real experiment pod is different.