trino: Scala backend can't load a DataFrame while using the Presto driver (java.sql.SQLException)

This is where the test breaks:

    val df = spark.read.format("jdbc")
      .option("url", url_)
      .option("dbtable", dbtable_)
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
      .load()

**The error I got:**

    java.sql.SQLException: Unsupported type ARRAY

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@AssouliDFK Could you tell me why you gave a thumbs down? Please correct me if my understanding is wrong.

Please try with .option("dbtable", "pg_type").

    val df = spark.read
      .format("jdbc")
      .option("url", URL_)
      .option("dbtable", "pg_type")
      .option("driver", "org.postgresql.Driver")
      .load()

scala> var df = spark.read.format("jdbc").option("url", "jdbc:postgresql://localhost:15432/test?user=test&password=test").option("dbtable", "pg_type").option("driver", "org.postgresql.Driver").load()
java.sql.SQLException: Unsupported type ARRAY
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided

It appears Spark doesn’t support ARRAY columns generically over JDBC; the support exists only in database-specific dialects such as the PostgreSQL one. This is why it works for PostgreSQL but not for Presto.
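Concretely, Spark resolves unrecognized JDBC types through a per-database `JdbcDialect`, and only the PostgreSQL dialect maps `ARRAY`. As a hedged sketch (not a tested implementation), registering a custom dialect that maps Presto's ARRAY columns to plain strings would let schema resolution succeed, though whether the row values then deserialize cleanly depends on the driver's `getString` behavior for arrays:

    import java.sql.Types
    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
    import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

    // Sketch of a custom dialect: tell Spark to treat Presto ARRAY columns
    // as strings during schema resolution instead of throwing
    // "Unsupported type ARRAY".
    object PrestoDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:presto")

      override def getCatalystType(
          sqlType: Int, typeName: String, size: Int,
          md: MetadataBuilder): Option[DataType] =
        if (sqlType == Types.ARRAY) Some(StringType) else None
    }

    // Register before calling spark.read so JDBCRDD.resolveTable sees it.
    JdbcDialects.registerDialect(PrestoDialect)

The safer route, though, is the `json_format` cast described below, since it keeps the conversion inside Presto.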

Other than changing Spark to add support for Presto, you might be able to work around this by converting the array column to JSON text in the SQL query: json_format(cast(x as json))
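A minimal sketch of that workaround, assuming a hypothetical table `events` with an ARRAY column `tags` (URL, catalog, and column names are placeholders, not from the original report). The cast runs inside Presto, so Spark only ever sees a VARCHAR column:

    // The subquery is pushed down to Presto, which serializes the array
    // to JSON text before Spark reads it.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:presto://localhost:8080/hive/default")
      .option("dbtable",
        "(SELECT id, json_format(CAST(tags AS json)) AS tags_json FROM events) t")
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
      .load()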

@AssouliDFK However, JdbcUtils.scala#L205 shows that Spark doesn’t support the java.sql.Types.ARRAY type. Also, the same issue with PostgreSQL x Spark was reported in https://stackoverflow.com/questions/50613977/unsupported-array-error-when-reading-jdbc-source-in-pyspark. What do you think about it?
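To see this from the driver's side, a quick check with plain JDBC metadata (URL, credentials, and table name are hypothetical) can confirm that the column is reported as `java.sql.Types.ARRAY` (2003), which Spark rejects unless a dialect maps it:

    import java.sql.{DriverManager, Types}

    // Inspect which java.sql.Types constant the Presto driver reports for
    // each column of the problematic table.
    Class.forName("com.facebook.presto.jdbc.PrestoDriver")
    val conn = DriverManager.getConnection(
      "jdbc:presto://localhost:8080/hive/default", "spark", null)
    val rs = conn.createStatement().executeQuery("SELECT * FROM events WHERE 1 = 0")
    val md = rs.getMetaData
    for (i <- 1 to md.getColumnCount) {
      println(s"${md.getColumnName(i)}: ${md.getColumnType(i)} (${md.getColumnTypeName(i)})")
    }
    conn.close()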

@ebyhr so, as @Oshimada told you, the problem is not while loading the DataFrame using PostgreSQL (it works successfully), but when trying the same logic with Presto it fails without loading anything. Thanks!

Yo, the bug doesn’t exist on PostgreSQL, but it shows up when loading the DataFrame from Hive through Presto. PS: the DataFrame loading is done the same way as with PostgreSQL.

@ebyhr No offense, but I don’t think it’s a Spark bug, because I implemented the same thing with other distributed query engines and it worked without any bugs; that’s why I think it’s not a Spark bug. And I’m sorry for the thumbs down 😇

Could you also share the DDL? As far as I tested, this issue happens if the table has an ARRAY type, and the exception is thrown on the Spark side (I suppose this isn’t a Presto bug).

https://github.com/apache/spark/blob/a834dba120e3569e44c5e4b9f8db9c6eef58161b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L205

    [info] java.sql.SQLException: Unsupported type ARRAY
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    [info]   at scala.Option.getOrElse(Option.scala:121)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    [info]   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    [info]   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    [info]
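For concreteness, a table like the following is enough to trigger the exception above (all names hypothetical; this is a sketch, not the reporter's actual schema):

    import java.sql.DriverManager

    // Create a Hive table with an ARRAY column through Presto; loading it
    // with the spark.read snippet at the top of this issue then fails
    // during schema resolution.
    val conn = DriverManager.getConnection(
      "jdbc:presto://localhost:8080/hive/default", "spark", null)
    conn.createStatement().execute(
      "CREATE TABLE events (id bigint, tags array(varchar))")
    conn.close()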

@AssouliDFK Please include the full stacktrace of the error when running Spark with Presto JDBC 326.