spark-on-k8s-operator: What am I doing wrong?
I am trying to run a PySpark application using the operator. It runs perfectly if I bake the Python application into the Spark image, but when I try to fetch the files from S3 I run into all sorts of issues. Please advise what I am doing wrong.
Here is my YAML file:
apiVersion: "sparkoperator.k8s.io/v1beta1"
kind: SparkApplication
metadata:
  name: generic-pyspark2.4.4
  namespace: random
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "pyspark-2.4.4-hadoop-2.7:v0.1"
  imagePullPolicy: Always
  class: org.apache.spark.deploy.PythonRunner
  mainApplicationFile: "s3a://buckets/pyspark/model.py"
  sparkConf:
    "spark.hadoop.fs.s3a.aws.credentials.provider": com.amazonaws.auth.InstanceProfileCredentialsProvider
    "spark.hadoop.fs.s3a.impl": org.apache.hadoop.fs.s3a.S3AFileSystem
    "spark.shuffle.service.enabled": "false"
    "spark.speculation": "false"
  deps:
    pyFiles:
      - "s3a://buckets/pyspark/aws_utils.py"
      - "s3a://buckets/pyspark/dataset.py"
  sparkVersion: "2.4.4"
  driver:
    cores: 2
    # coreLimit: "1200m"
    memory: "1024m"
    labels:
      version: 2.4.4
    serviceAccount: sparkoperator
  executor:
    cores: 4
    instances: 5
    memory: "10240m"
    labels:
      version: 2.4.4
I am using:
- Spark 2.4.4
- aws-java-sdk-1.7.3.jar
- hadoop-aws-2.7.3.jar
- Scala 2.11
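For reference, a minimal sketch of how these jars might be baked into the Spark image (the base image name is an assumption taken from the manifest above; note that hadoop-aws 2.7.x was compiled against aws-java-sdk 1.7.4, so a 1.7.3 SDK jar is worth double-checking, since version mismatches commonly surface as NoSuchMethodError from the S3A filesystem):

# Sketch only: the base image name is an assumption, substitute your own.
FROM pyspark-2.4.4-hadoop-2.7:v0.1

# Put the S3A dependencies on Spark's classpath (/opt/spark/jars is the
# usual location in Spark images). hadoop-aws 2.7.x was built against
# aws-java-sdk 1.7.4.
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar /opt/spark/jars/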
Ahh… that's exactly what it was. When I updated my Helm installation command to provide a pre-existing service account with an IAM role attached to it, it worked fine. Thanks a ton for the guidance @bbenzikry. Very grateful 😃 Cheers 😃
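For anyone landing here later, on EKS the "pre-existing service account with an IAM role attached" piece looks roughly like the sketch below (the name, namespace, account ID, and role name are placeholders); the driver spec then points at it via serviceAccount:

# Sketch: a service account bound to an IAM role via IRSA on EKS.
# The account ID, role name, name, and namespace are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: random
  annotations:
    # EKS injects web-identity credentials into pods that use this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<spark-s3-role>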
@JunaidChaudry You can take a look at https://github.com/bbenzikry/spark-eks/blob/main/docker/spark3.Dockerfile for reference
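With an IRSA-backed service account in place, the credentials provider in sparkConf also has to change, because InstanceProfileCredentialsProvider only reads the node's instance profile. A sketch, assuming an AWS SDK recent enough to ship WebIdentityTokenCredentialsProvider (1.7.x is not):

sparkConf:
  "spark.hadoop.fs.s3a.impl": org.apache.hadoop.fs.s3a.S3AFileSystem
  # With IRSA, pods authenticate through the projected web-identity token
  # rather than the node's instance profile:
  "spark.hadoop.fs.s3a.aws.credentials.provider": com.amazonaws.auth.WebIdentityTokenCredentialsProvider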