python-deequ: TypeError: 'JavaPackage' object is not callable when running pydeequ
Describe the bug I get an exception when I try to run pydeequ: "TypeError: 'JavaPackage' object is not callable".
To Reproduce Steps to reproduce the behavior:
- pip install pydeequ==0.1.5
- Code:
from pyspark.sql import SparkSession, Row
import pydeequ

spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())

df = spark.sparkContext.parallelize([
    Row(a="foo", b=1, c=5),
    Row(a="bar", b=2, c=6),
    Row(a="baz", b=3, c=None)]).toDF()

from pydeequ.analyzers import *

analysisResult = AnalysisRunner(spark) \
    .onData(df) \
    .addAnalyzer(Size()) \
    .addAnalyzer(Completeness("b")) \
    .run()

analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()
- Execute the code above
- See error: TypeError: 'JavaPackage' object is not callable
Expected behavior I was expecting the results of the analyzer.

Desktop (please complete the following information):
- Apache Spark 3.0.0
- Scala 2.12
- PyDeequ 0.1.5
Additional context I’m running it on a Databricks cluster.
Thank you for your help.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 10
- Comments: 33 (6 by maintainers)
Experiencing the same issue. Solved by using
pyspark --jars /path-to-the-jar/deequ-1.0.5.jar
More info: Python version 3.7.9, Spark version 2.4.7, Scala version 2.13.4.
I installed the following Maven package directly instead of pydeequ.deequ_maven_coord:
com.amazon.deequ:deequ:1.1.0_spark-3.0-scala-2.12
You need to check whether there is an exact match for your cluster's Spark and Scala versions and add it as a Maven package on the Databricks cluster. @anusha610 if you are running it locally (using dbconnect), build the spark object as follows:
spark = (SparkSession
    .builder
    .config("spark.jars.packages", "com.amazon.deequ:deequ:1.1.0_spark-3.0-scala-2.12")
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())
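For the pyspark --jars route mentioned a couple of comments up, here is a minimal standalone-script sketch; the local path is hypothetical and the jar version must match your cluster:

from pyspark.sql import SparkSession

# Hypothetical local path; point this at wherever you downloaded the deequ jar.
deequ_jar = "/path-to-the-jar/deequ-1.0.5.jar"

spark = (SparkSession
    .builder
    .config("spark.jars", deequ_jar)  # put the deequ classes on the driver and executor classpath
    .getOrCreate())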
We have not tested with Databricks yet, but here is how you'd get started with an Amazon EMR cluster; I presume there is some overlap here. Copied and pasted below:
Your EMR cluster must be running Spark v2.4.6 in order to work with PyDeequ. Once you have a running cluster with those components and a SageMaker notebook with the necessary permissions, you can configure a SparkSession object from the template below to connect to your cluster. If you need a refresher on how to connect a SageMaker notebook to EMR, check out the AWS blog post on using Sparkmagic.
Once you're in the SageMaker notebook, run the configuration JSON in a cell before you start your SparkSession to configure your EMR cluster (a sketch follows this comment).
Start your SparkSession object in a cell after the above configuration by running spark, then use the SparkContext (named sc by default) to install PyDeequ onto your cluster.
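The configuration cell and install step did not survive in this thread. A minimal sketch of what they might look like with Sparkmagic, assuming a Spark 2.4 cluster and the deequ 1.0.3 artifact mentioned later in the thread (the coordinate and the exclude are assumptions to adapt to your setup):

%%configure -f
{
  "conf": {
    "spark.jars.packages": "com.amazon.deequ:deequ:1.0.3",
    "spark.jars.excludes": "net.sourceforge.f2j:arpack_combined_all"
  }
}

Then, in a later cell, once spark and sc are available:

sc.install_pypi_package("pydeequ")  # EMR notebooks helper; installs PyDeequ on the cluster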
Using
pyspark --jars {PATH_TO_DEEQ_JAR}
resolves this error for me. I think this should be added to the installation steps.
It works fine with the following configuration: use https://mvnrepository.com/artifact/com.amazon.deequ/deequ to pick the deequ version that matches your Spark version for spark.jars.packages. Also, if you are using Databricks, make sure that you install it to the cluster libraries as a Maven package.
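As an illustration of matching the coordinate to the cluster, a sketch that picks it from the installed PySpark version; the mapping below uses only the two coordinates mentioned in this thread and is an assumption to verify on mvnrepository:

import pyspark
from pyspark.sql import SparkSession

# Assumed version-to-artifact mapping; check mvnrepository for your exact Spark/Scala combination.
if pyspark.__version__.startswith("3.0"):
    deequ_coord = "com.amazon.deequ:deequ:1.1.0_spark-3.0-scala-2.12"
else:  # e.g. Spark 2.4.x
    deequ_coord = "com.amazon.deequ:deequ:1.0.3"

spark = (SparkSession
    .builder
    .config("spark.jars.packages", deequ_coord)
    .getOrCreate())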
@MOHACGCG @vinura - Could you please suggest the script changes for this fix?
Issue: I am facing a similar error using Databricks with the pydeequ version below. Error: TypeError: 'JavaPackage' object is not callable
Tried: downloaded the suggested jars, uploaded them to the Databricks FileStore, and passed the same to the Spark session.
Attached a screenshot of the error.
Could you please suggest the appropriate version, steps, and scripts for a Databricks implementation?
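A minimal sketch of that jar-from-FileStore approach; the DBFS path is hypothetical and the jar must match your cluster's Spark and Scala versions:

from pyspark.sql import SparkSession

# Hypothetical DBFS location; replace with wherever you uploaded the deequ jar.
deequ_jar = "dbfs:/FileStore/jars/deequ-1.1.0_spark-3.0-scala-2.12.jar"

spark = (SparkSession
    .builder
    .config("spark.jars", deequ_jar)
    .getOrCreate())

Note that on an interactive Databricks cluster the Spark session already exists before this code runs, so configs passed to getOrCreate may be ignored; installing the matching deequ artifact as a cluster Maven library, as suggested above, is usually the more reliable route.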
@SerenaLin2020 We have not tested PyDeequ with deequ-1.0.5.jar, so some functionality may be impaired. Please try with deequ-1.0.3.jar and keep us updated! 😄
Thanks @gucciwang for the insight. However, it was not working automatically as intended; perhaps that was to do with my setup. I therefore followed @MOHACGCG's instructions in the comment above and it works now. Kindly make a note of this in the README for the benefit of a larger audience.
I just downloaded the jar from here and passed it on.
Same issue on Spark version 2.4.3. I'm using 2.4.3 hoping to load PyDeequ into a Glue ETL job. Do you know if deequ is compatible with Glue v2?