spark-on-k8s-operator: Cannot launch driver after Spark default CPU value changed to int32
Hey there!
Firstly, thank you for everything you are trying to achieve. Projects like this surely make for a great way for new players in the data engineering space to get started running clusters and forging great data experiences 😃
Here’s the problem…
I’ve noticed that the Spark on k8s operator specification has changed the number of driver CPUs (`.spec.driver.cores`) from a float to an integer. With the release of the latest API version this has been reflected in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/578.
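For illustration, here is roughly how I understand the schema change (a hypothetical manifest sketch; the `spark-pi` name and the surrounding fields are my own example, not taken from the PR):

```yaml
# Sketch only: illustrating the cores type change between API versions.
# Under v1beta1, cores was a float, so fractional CPUs were accepted:
apiVersion: sparkoperator.k8s.io/v1beta1
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  driver:
    cores: 0.1   # float32: fractional values were valid
---
# Under v1beta2, cores is an int32, so the smallest valid value is 1:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  driver:
    cores: 1     # int32: values like 0.1 are now rejected
```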
I’m fairly new to Kubernetes, but this seems to create a conflict when the operator launches the driver, where I’m experiencing…
```
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.96.0.1/api/v1/namespaces/spark-operator/pods. Message: Pod "spark-pi-driver" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].resources.requests, message=Invalid value: "1": must be less than or equal to cpu limit, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=spark-pi-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "spark-pi-driver" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit, metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
```
The key part seems to be `Invalid value: "1": must be less than or equal to cpu limit`. The new minimum value of 1 appears to be used as the default CPU request, but Kubernetes requires a container’s CPU request to be less than or equal to its CPU limit, which in my case is evidently below 1. Perhaps this is related to https://github.com/kubernetes/kubernetes/issues/51430. I’m not sure how to resolve this; my Kubernetes configuration may be more at fault here than anything.
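To make the clash concrete, here is a hedged sketch of a driver spec that would, as far as I can tell, reproduce the 422 above (the `coreLimit` value is an assumption on my part, not copied from my actual manifest):

```yaml
# Hypothetical v1beta2 driver spec reproducing the request/limit conflict.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  driver:
    cores: 1           # int32 minimum; becomes resources.requests.cpu: "1"
    coreLimit: "500m"  # assumed value; becomes resources.limits.cpu: "500m"
                       # request "1" > limit "500m", so the API server rejects
                       # the pod: "must be less than or equal to cpu limit"
```

If that reading is right, raising `coreLimit` to at least `"1"` (or omitting it) should let the driver pod be created.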
Environment
Here is the environment I’m running (from the pod)…
```
λ kubectl describe pod spark-sparkoperator-7c6d6f9cfd-6257n
Name:               spark-sparkoperator-7c6d6f9cfd-6257n
Namespace:          spark-operator
Priority:           0
PriorityClassName:  <none>
Node:               docker-desktop/192.168.65.3
Start Time:         Sun, 08 Dec 2019 01:10:57 +0000
Labels:             app.kubernetes.io/name=sparkoperator
                    app.kubernetes.io/version=v1beta2-1.0.1-2.4.4
                    pod-template-hash=7c6d6f9cfd
Annotations:        prometheus.io/path: /metrics
                    prometheus.io/port: 10254
                    prometheus.io/scrape: true
Status:             Running
IP:                 10.1.0.9
Controlled By:      ReplicaSet/spark-sparkoperator-7c6d6f9cfd
Containers:
  sparkoperator:
    Container ID:  docker://96d7a6908bad62e35fcfd530ca5337073a27602c468f2b7580f65cce4c48fd38
    Image:         gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4
    Image ID:      docker-pullable://gcr.io/spark-operator/spark-operator@sha256:ce769e5c6a5d8fa78ceb1a0abaf961fb2424767f9535c97baac04a18169654bd
    Port:          10254/TCP
    Host Port:     0/TCP
    Args:
      -v=2
      -namespace=
      -ingress-url-format=
      -controller-threads=10
      -resync-interval=30
      -logtostderr
      -enable-metrics=true
      -metrics-labels=app_type
      -metrics-port=10254
      -metrics-endpoint=/metrics
      -metrics-prefix=
    State:          Running
      Started:      Sun, 08 Dec 2019 01:10:58 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from spark-sparkoperator-token-w7dmr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  spark-sparkoperator-token-w7dmr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-sparkoperator-token-w7dmr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                     Message
  ----    ------     ----   ----                     -------
  Normal  Scheduled  3m11s  default-scheduler        Successfully assigned spark-operator/spark-sparkoperator-7c6d6f9cfd-6257n to docker-desktop
  Normal  Pulled     3m10s  kubelet, docker-desktop  Container image "gcr.io/spark-operator/spark-operator:v1beta2-1.0.1-2.4.4" already present on machine
  Normal  Created    3m10s  kubelet, docker-desktop  Created container sparkoperator
  Normal  Started    3m10s  kubelet, docker-desktop  Started container sparkoperator
```
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 17 (8 by maintainers)
This is due to a recent change in https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/578 that introduced the `v1beta2` version of the API. The change was to make the type of `.spec.driver.cores` an integer, to be consistent with the Spark config property `spark.driver.cores`, which is an integer. Spark 3.0 will have a new config property `spark.kubernetes.driver.request.cores` for setting the CPU request for the driver pod. We will add support for that soon. This new config property supports Kubernetes-conformant values, e.g., `0.1` and `100m`, and is used for specifying the CPU request of the driver pod independently from `spark.driver.cores`, which `.spec.driver.cores` maps to.
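Once Spark 3.0 is out, setting a fractional driver CPU request might look roughly like this (a sketch, assuming the property is simply passed through `.spec.sparkConf`; the exact wiring may differ once operator support is added):

```yaml
# Hypothetical v1beta2 manifest targeting Spark 3.0: the driver pod's CPU
# request is set independently of spark.driver.cores.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  sparkConf:
    "spark.kubernetes.driver.request.cores": "100m"  # driver pod CPU request
  driver:
    cores: 1   # still an integer; maps to spark.driver.cores
```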