spark-on-k8s-operator: Spark submit in operator fails
Hi all, I seem to be having some issues getting a Spark application up and running; I'm hitting errors like this:
21/06/04 07:42:53 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/06/04 07:42:53 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [my-ns] failed.
I have Istio on the cluster, hence I also tried the following settings, to no avail:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: spark-pi
namespace: my-ns
spec:
type: Scala
mode: cluster
image: "gcr.io/spark-operator/spark:v3.1.1"
imagePullPolicy: Always
mainClass: org.apache.spark.examples.SparkPi
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-v3.1.1.jar"
sparkVersion: "3.1.1"
batchScheduler: "volcano" #Note: the batch scheduler name must be specified with `volcano`
restartPolicy:
type: Never
volumes:
- name: "test-volume"
hostPath:
path: "/tmp"
type: Directory
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.1.1
annotations:
sidecar.istio.io/inject: "false"
serviceAccount: default-editor
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
executor:
cores: 1
instances: 1
memory: "512m"
labels:
version: 3.1.1
annotations:
sidecar.istio.io/inject: "false"
volumeMounts:
- name: "test-volume"
mountPath: "/tmp"
So somehow it seems like the application is not able to communicate with the Kubernetes API. The default-editor service account has the following rules:
- apiGroups:
    - sparkoperator.k8s.io
  resources:
    - sparkapplications
    - scheduledsparkapplications
    - sparkapplications/status
    - scheduledsparkapplications/status
  verbs:
    - '*'
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
I also added the following AuthorizationPolicy to allow traffic for the webhook & operator:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: spark-operator
  namespace: spark
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spark-operator
  rules:
    - {}
If anyone has seen this before or has any valuable pointers, that would be much appreciated.
k8s: 1.19, operator version: "v1beta2-1.2.3-3.1.1", chart: 1.1.3, istio: 1.19
The PROTOCOL_ERROR at the bottom of this stack trace might also be a pointer towards the underlying issue:
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:349)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:84)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:139)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2611)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
https://github.com/fabric8io/kubernetes-client/issues/3176#issuecomment-853915701 is a good write-up of the root-cause.
In short, fabric8’s kubernetes-client cannot communicate with a Kubernetes API server where the weak TLS cipher TLS_RSA_WITH_AES_256_GCM_SHA384 has been disabled. Disabling HTTP2 is a work-around.
So the issue is related to https://github.com/fabric8io/kubernetes-client/issues/2212#issuecomment-628551315
In order to make it work, we had to add the following to the spark-operator, driver, and executor:
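The exact snippet was not included above, so here is a minimal sketch of what that change could look like, assuming the workaround is the HTTP2_DISABLE environment variable described in the linked fabric8 issues (which the fabric8 kubernetes-client reads to skip HTTP/2). The variable has to be visible to every JVM that builds a fabric8 client: the operator pod (which runs spark-submit), the driver, and the executors. Field and container names below are illustrative, not taken from the original comment:

# Sketch: in the spark-operator Deployment (the pod that runs spark-submit)
spec:
  template:
    spec:
      containers:
        - name: spark-operator   # container name assumed; check your Deployment
          env:
            - name: HTTP2_DISABLE
              value: "true"

# Sketch: in the SparkApplication, so the driver (which creates executor pods
# through the fabric8 client) and the executors also avoid HTTP/2
spec:
  driver:
    envVars:
      HTTP2_DISABLE: "true"
  executor:
    envVars:
      HTTP2_DISABLE: "true"

With HTTP/2 disabled, the client falls back to HTTP/1.1 and the PROTOCOL_ERROR stream resets reported above should no longer occur.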
Wow! That worked, @JunaidChaudry. However, I'm confused as to why.
I literally spun up a whole new EKS cluster just this March and have been using it as our official QA cluster. Deployments there are still going as smooth as butter.
I suddenly started running into precisely this problem after I spun up another cluster a couple of days back. The interesting thing is that deployments on the old cluster are still working fine.
I read through the conversation in your linked issue, and indeed a new version of the node AMI was released on May 1, after which this issue started manifesting.
Thank you so much for your help.