spark-on-k8s-operator: On forceful deletion of a driver, the driver is not restarted.

I have a Spark Streaming job that I am trying to submit through the spark-on-k8s-operator. I have set the restart policy to Always. However, when I manually delete the driver pod, the driver is not restarted.

My YAML:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-v2
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "com/test:v1.0"
  imagePullPolicy: Never
  mainClass: com.test.TestStreamingJob
  mainApplicationFile: "local:///opt/spark-2.4.5/work-dir/target/scala-2.12/test-0.1.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Always
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.5
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
    terminationGracePeriodSeconds: 60
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 2.4.5
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Spark version: 2.4.5
apiVersion: "sparkoperator.k8s.io/v1beta2"

Steps which I followed:

  1. Create the resource via kubectl apply -f examples/spark-test.yaml. The pod is created successfully.
  2. Delete the driver manually.
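
The two steps above can be sketched as a small shell function, which may help others reproduce the behavior. This is only a sketch under assumptions: the `spark-role=driver` label selector is the label Spark normally puts on driver pods and is used here instead of guessing the pod name, the function name `reproduce` is made up for illustration, and the final two commands are an extra inspection step that was not part of the original report.

```shell
#!/bin/sh
# Names taken from the manifest in this issue; adjust for your own setup.
APP_NAME="test-v2"
NAMESPACE="default"
MANIFEST="examples/spark-test.yaml"

# Hypothetical helper: nothing runs until you call `reproduce` yourself.
reproduce() {
  # Step 1: submit the SparkApplication, then list the driver pod.
  # The driver may take a few seconds to appear after the apply.
  kubectl apply -f "$MANIFEST"
  kubectl get pods -n "$NAMESPACE" -l spark-role=driver

  # Step 2: delete the driver pod manually.
  # Assumes this is the only Spark driver running in the namespace.
  kubectl delete pod -n "$NAMESPACE" -l spark-role=driver

  # Extra step (not in the original report): check what state the
  # operator recorded for the application after the deletion.
  kubectl get sparkapplication "$APP_NAME" -n "$NAMESPACE" -o yaml
  kubectl describe sparkapplication "$APP_NAME" -n "$NAMESPACE"
}
```

Calling `reproduce` against a test cluster and comparing the recorded application state with the operator log line quoted below should show whether the operator treated the deletion as a failure or as a termination it would not restart.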

Expected behavior: A new driver pod should be started as per the restart policy.

Actual behavior: The driver and executor pods were deleted.

Environment: Testing this with Docker on Mac, with 4 CPUs and 8 GB of memory.

Logs from spark-operator: {FAILING driver pod failed with ExitCode: 143, Reason: Error}

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 17 (12 by maintainers)

Most upvoted comments

Sorry to be so imprecise. I want something to always restart on success or failure. This is what Always means, true? And that didn’t work for the OP. So would something like this accomplish that?

restartPolicy:
  type: Always
  onFailureRetries: 1000000
  onFailureRetryInterval: 10
  onSubmissionFailureRetries: 3
  onSubmissionFailureRetryInterval: 10