polyaxon: Can't use TPU
Describe the bug
I tried to use a Cloud TPU, but I got the error below in Stackdriver Logging and the experiment failed. It seems that we need to specify the TensorFlow version with an annotation.
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs","reason":"InternalError","details":{"causes":[{"message":"admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs"}]},"code":500}
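For reference, here is a minimal sketch of what the webhook seems to expect on a pod that requests Cloud TPUs on GKE. It is not what Polyaxon generates. The pod and container names are hypothetical, the version value "1.12" is an assumption chosen to match the tensorflow/tensorflow:1.12.0 image used in the reproduction, and cloud-tpus.google.com/v2 is assumed as the resource key for the TPU type.

```yaml
# Illustrative pod spec only; names and values are assumptions, not Polyaxon output.
apiVersion: v1
kind: Pod
metadata:
  name: tpu-example              # hypothetical name
  annotations:
    # Annotation named in the webhook error; "1.12" is assumed to match the image.
    tf-version.cloud-tpus.google.com: "1.12"
spec:
  restartPolicy: Never
  containers:
    - name: trainer              # hypothetical name
      image: tensorflow/tensorflow:1.12.0
      command: ["python", "test.py"]
      resources:
        limits:
          # Assumed GKE resource key for Cloud TPU v2; 8 mirrors the tpu limit in the report.
          cloud-tpus.google.com/v2: 8
```

Without the annotation under metadata, the admission webhook rejects the pod with the message shown above.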
To Reproduce
YAML
---
version: 1
kind: experiment
environment:
  resources:
    cpu:
      requests: 4
      limits: 4
    memory:
      requests: 15000
      limits: 15000
    tpu:
      requests: 8
      limits: 8
build:
  image: tensorflow/tensorflow:1.12.0
  build_steps:
    - pip install --no-cache-dir -r requirements.txt
run:
  # this is just a dummy python file.
  cmd: python test.py
requirements.txt
polyaxon-client==0.3.8
polyaxon-cli==0.3.8
jupyter
google-cloud-storage
Expected behavior
The experiment should start and be able to use a Cloud TPU.
Environment
- Polyaxon: 0.3.8
About this issue
- State: closed
- Created 5 years ago
- Comments: 20 (20 by maintainers)
I understand that we must use a fixed TPU type and TF version for now. I look forward to it, and I am glad we are on the same page. Thanks!
Ah I see,
tpuTensorflowVersion and tpuResourceKey in your deployment config file. Those are the default values. It will be reflected in the docs ASAP.
OK, thanks. Just to make sure, because for me it stays pending for a long time.
The real experiment pod is different.