trino: Scala backend can't load a DataFrame while using the Presto driver (java.sql.SQLException)

This is where the test breaks:

    val df = spark.read.format("jdbc")
      .option("url", url_)
      .option("dbtable", dbtable_)
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
      .load()

**The error I got:**

    java.sql.SQLException: Unsupported type ARRAY

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

@AssouliDFK Could you tell me why you gave a thumbs down? Please correct me if my understanding is wrong.

Please try with .option("dbtable", "pg_type").

    val df = spark.read
      .format("jdbc")
      .option("url", URL_)
      .option("dbtable", "pg_type")
      .option("driver", "org.postgresql.Driver")
      .load()

scala> var df = spark.read.format("jdbc").option("url", "jdbc:postgresql://localhost:15432/test?user=test&password=test").option("dbtable", "pg_type").option("driver", "org.postgresql.Driver").load()
java.sql.SQLException: Unsupported type ARRAY
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided

It appears Spark doesn’t support ARRAY columns generically over JDBC; the support exists only in database-specific dialects such as the PostgreSQL one. This is why it works for PostgreSQL but not for Presto.
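Concretely, Spark resolves unrecognized JDBC types through a per-database `JdbcDialect`, and only the PostgreSQL dialect maps `ARRAY`. As a hedged sketch (not a tested implementation), registering a custom dialect that maps Presto's ARRAY columns to plain strings would let schema resolution succeed, though whether the row values then deserialize cleanly depends on the driver's `getString` behavior for arrays:

    import java.sql.Types
    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
    import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

    // Sketch of a custom dialect: tell Spark to treat Presto ARRAY columns
    // as strings during schema resolution instead of throwing
    // "Unsupported type ARRAY".
    object PrestoDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:presto")

      override def getCatalystType(
          sqlType: Int, typeName: String, size: Int,
          md: MetadataBuilder): Option[DataType] =
        if (sqlType == Types.ARRAY) Some(StringType) else None
    }

    // Register before calling spark.read so JDBCRDD.resolveTable sees it.
    JdbcDialects.registerDialect(PrestoDialect)

The safer route, though, is the `json_format` cast described below, since it keeps the conversion inside Presto.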

Other than changing Spark to add support for Presto, you might be able to work around this by converting the array column to JSON text in the SQL query: json_format(cast(x as json))
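A minimal sketch of that workaround, assuming a hypothetical table `events` with an ARRAY column `tags` (URL, catalog, and column names are placeholders, not from the original report). The cast runs inside Presto, so Spark only ever sees a VARCHAR column:

    // The subquery is pushed down to Presto, which serializes the array
    // to JSON text before Spark reads it.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:presto://localhost:8080/hive/default")
      .option("dbtable",
        "(SELECT id, json_format(CAST(tags AS json)) AS tags_json FROM events) t")
      .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
      .load()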

@AssouliDFK However, JdbcUtils.scala#L205 shows that Spark doesn’t support the java.sql.Types.ARRAY type. Also, the same issue with PostgreSQL x Spark was reported in https://stackoverflow.com/questions/50613977/unsupported-array-error-when-reading-jdbc-source-in-pyspark. What do you think about it?
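To see this from the driver's side, a quick check with plain JDBC metadata (URL, credentials, and table name are hypothetical) can confirm that the column is reported as `java.sql.Types.ARRAY` (2003), which Spark rejects unless a dialect maps it:

    import java.sql.{DriverManager, Types}

    // Inspect which java.sql.Types constant the Presto driver reports for
    // each column of the problematic table.
    Class.forName("com.facebook.presto.jdbc.PrestoDriver")
    val conn = DriverManager.getConnection(
      "jdbc:presto://localhost:8080/hive/default", "spark", null)
    val rs = conn.createStatement().executeQuery("SELECT * FROM events WHERE 1 = 0")
    val md = rs.getMetaData
    for (i <- 1 to md.getColumnCount) {
      println(s"${md.getColumnName(i)}: ${md.getColumnType(i)} (${md.getColumnTypeName(i)})")
    }
    conn.close()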

@ebyhr so, as @Oshimada told you, the problem is not while loading the DataFrame using PostgreSQL (it works successfully), but when trying the same logic with Presto it fails without loading anything. Thanks!

Yo, the bug doesn’t exist on PostgreSQL, but it shows up when loading the DataFrame from Hive through Presto. PS: the DataFrame loading is done the same way as with PostgreSQL.

@ebyhr No offense, but I don’t think it’s a Spark bug, because I implemented the same thing with other distributed query engines and it worked without any bugs; that’s why I think it’s not a Spark bug. And I’m sorry for the thumbs down 😇

Could you also share the DDL? As far as I tested, this issue happens if the table has an ARRAY type, and the exception is thrown on the Spark side (I suppose this isn’t a Presto bug).

https://github.com/apache/spark/blob/a834dba120e3569e44c5e4b9f8db9c6eef58161b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L205

    [info] java.sql.SQLException: Unsupported type ARRAY
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:251)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
    [info]   at scala.Option.getOrElse(Option.scala:121)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
    [info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
    [info]   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
    [info]   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    [info]
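For concreteness, a table like the following is enough to trigger the exception above (all names hypothetical; this is a sketch, not the reporter's actual schema):

    import java.sql.DriverManager

    // Create a Hive table with an ARRAY column through Presto; loading it
    // with the spark.read snippet at the top of this issue then fails
    // during schema resolution.
    val conn = DriverManager.getConnection(
      "jdbc:presto://localhost:8080/hive/default", "spark", null)
    conn.createStatement().execute(
      "CREATE TABLE events (id bigint, tags array(varchar))")
    conn.close()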

@AssouliDFK Please include the full stacktrace of the error when running Spark with Presto JDBC 326.