spark-bigquery-connector: Missing maven dependencies when using --packages and ClassNotFound when using --jars

Hi,

I wanted to try out the BigQuery connector on AWS EMR 5.24.1 with Spark 2.4.2, so I ran this command: pyspark --packages com.google.cloud.spark:spark-bigquery_2.11:0.9.1-beta. However, dependency resolution fails because the following three artifacts seem to be missing from Maven Central:

  • javax.jms#jms;1.1!jms.jar
  • com.sun.jdmk#jmxtools;1.2.1!jmxtools.jar
  • com.sun.jmx#jmxri;1.2.1!jmxri.jar

As a workaround, I downloaded the JAR from https://console.cloud.google.com/storage/browser/spark-lib/bigquery and added it to the classpath with this command: pyspark --jars spark-bigquery-latest.jar. But when I try to read a table from BigQuery, I get this error: ClassNotFoundException: Failed to find data source: com.google.cloud.spark.bigquery.

I also tried to use com.google.cloud.spark.bigquery instead of just “bigquery” in format(), without success.
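
For the --packages failure described above, one thing that might be worth trying is telling Spark to skip the three unresolved artifacts with the --exclude-packages flag of spark-submit/pyspark. The snippet below is only a sketch that assembles such a command line. The helper name build_pyspark_command is made up for illustration, and nothing in this thread confirms that the connector works once these artifacts are excluded.

```python
import shlex

# Coordinates taken from the resolution errors listed above, rewritten from
# Ivy's "org#name" form to the "groupId:artifactId" form that
# --exclude-packages expects.
MISSING_ARTIFACTS = [
    "javax.jms:jms",
    "com.sun.jdmk:jmxtools",
    "com.sun.jmx:jmxri",
]


def build_pyspark_command(package, excludes):
    """Hypothetical helper: argv for a pyspark launch that excludes artifacts.

    Assumption: the excluded artifacts are not needed at runtime. This has
    not been verified against the connector version discussed here.
    """
    return [
        "pyspark",
        "--packages", package,
        "--exclude-packages", ",".join(excludes),
    ]


command = build_pyspark_command(
    "com.google.cloud.spark:spark-bigquery_2.11:0.9.1-beta",
    MISSING_ARTIFACTS,
)
# Print the command as a single shell-ready line.
print(shlex.join(command))
```

If the exclusion turns out to remove classes the connector needs, the --jars approach remains the fallback.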

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Okay, now it works with:

pyspark \
    --files <path-to-credential-file> \
    --conf spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS=<name-of-credential-file> \
    --conf spark.yarn.appMasterEnv.GOOGLE_APPLICATION_CREDENTIALS=<name-of-credential-file> \
    --jars <path-to-bigquery-lib-jar>

And in the code just:

spark.read.format("bigquery").option("table", "publicdata.samples.shakespeare").load()
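
For reuse across several tables, the one-liner above could be wrapped in a small function. This is just a sketch: read_bigquery_table is a hypothetical name, and it assumes a session started as shown above, with the connector JAR on the classpath and the credentials variables set.

```python
def read_bigquery_table(spark, table):
    """Hypothetical wrapper: load a BigQuery table through the connector.

    `spark` is an existing SparkSession (the `spark` object that the pyspark
    shell provides); `table` is a table reference such as
    "publicdata.samples.shakespeare".
    """
    return spark.read.format("bigquery").option("table", table).load()


# Usage inside the pyspark shell (needs the connector JAR and credentials):
#   df = read_bigquery_table(spark, "publicdata.samples.shakespeare")
#   df.printSchema()
```

Passing the session in as an argument keeps the function free of any setup logic, so the same call works from the shell or from a submitted job.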

Thanks a lot for your support!

Created #72 to handle the --packages issue