superset: Unable to read S3 parquet files using DuckDB as Database Connector

I was following the DuckDB setup as per this PR and was able to load the DB file and create charts.

As a next step I wanted to load S3 parquet files in to Superset using DuckDB in memory option duckdb:///:memory:

Before trying with Superset, I used below Python code to check the DuckDB S3 parquet loading and found it to be working:

import duckdb
cursor = duckdb.connect()
cursor.execute("INSTALL httpfs;")
cursor.execute("LOAD httpfs;")
cursor.execute("SET s3_region='******'")
cursor.execute("SET s3_access_key_id=''**************")
cursor.execute("SET s3_secret_access_key='*****************************'")
cursor.execute("PRAGMA enable_profiling;")
cursor.execute("SELECT count(*) FROM read_parquet('s3://<bucket>/prefix/*.parquet'")
# pip install duckdb-engine
import sqlalchemy as sa


engine = sa.create_engine("duckdb:///:memory:")

with engine.connect() as conn:
    conn.execute(
        f"""
            INSTALL httpfs;

            LOAD httpfs;

            SET s3_region='ap-south-1';
            SET s3_access_key_id='****************';
            SET s3_secret_access_key='*****************************';
        """
    )
    r = conn.execute(
        """
           SELECT count(*) FROM read_parquet('s3://<bucket>/prefix/*.parquet');
        """
    ).fetchall()
    for row in r:
      print(dict(row))

When I tried to SET the S3 environment values in SQL Editor I was getting the below error on the UI:

Error:

DuckDB Error
duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"

Did you mean: "TimeZone"


This may be triggered by:
Issue 1002 - The database returned an unexpected error. 

See less

Full Trace:


superset_cache          | 1:M 14 Jul 2022 14:18:00.250 * Background saving terminated with success
superset_app            | 127.0.0.1 - - [14/Jul/2022:14:18:24 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/7.74.0"
superset_app            | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
superset_app            | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
superset_app            | Triggering query_id: 60
superset_app            | 2022-07-14 14:18:35,809:INFO:superset.sqllab.command:Triggering query_id: 60
superset_app            | Query 60: Executing 1 statement(s)
superset_app            | 2022-07-14 14:18:35,942:INFO:superset.sql_lab:Query 60: Executing 1 statement(s)
superset_app            | Query 60: Set query to 'running'
superset_app            | 2022-07-14 14:18:35,943:INFO:superset.sql_lab:Query 60: Set query to 'running'
superset_app            | Query 60: Running statement 1 out of 1
superset_app            | 2022-07-14 14:18:36,083:INFO:superset.sql_lab:Query 60: Running statement 1 out of 1
superset_app            | Query 60: <class 'RuntimeError'>
superset_app            | Traceback (most recent call last):
superset_app            |   File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
superset_app            |     db_engine_spec.execute(cursor, sql, async_=True)
superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1261, in execute
superset_app            |     raise cls.get_dbapi_mapped_exception(ex)
superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1259, in execute
superset_app            |     cursor.execute(query)
superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
superset_app            |     raise e
superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
superset_app            |     self.c.execute(statement)
superset_app            | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
superset_app            | 
superset_app            | Did you mean: "TimeZone"
superset_app            | 2022-07-14 14:18:36,157:ERROR:superset.sql_lab:Query 60: <class 'RuntimeError'>
superset_app            | Traceback (most recent call last):
superset_app            |   File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
superset_app            |     db_engine_spec.execute(cursor, sql, async_=True)
superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1261, in execute
superset_app            |     raise cls.get_dbapi_mapped_exception(ex)
superset_app            |   File "/app/superset/db_engine_specs/base.py", line 1259, in execute
superset_app            |     cursor.execute(query)
superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
superset_app            |     raise e
superset_app            |   File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
superset_app            |     self.c.execute(statement)
superset_app            | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
superset_app            | 
superset_app            | Did you mean: "TimeZone"
superset_app            | [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app            | Traceback (most recent call last):
superset_app            |   File "/app/superset/views/base.py", line 207, in wraps
superset_app            |     return f(self, *args, **kwargs)
superset_app            |   File "/app/superset/utils/log.py", line 244, in wrapper
superset_app            |     value = f(*args, **kwargs)
superset_app            |   File "/app/superset/views/core.py", line 2412, in sql_json
superset_app            |     command_result: CommandResult = command.run()
superset_app            |   File "/app/superset/sqllab/command.py", line 114, in run
superset_app            |     raise ex
superset_app            |   File "/app/superset/sqllab/command.py", line 98, in run
superset_app            |     status = self._run_sql_json_exec_from_scratch()
superset_app            |   File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
superset_app            |     raise ex
superset_app            |   File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
superset_app            |     return self._sql_json_executor.execute(
superset_app            |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app            |     raise SupersetErrorsException(
superset_app            | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app            | 2022-07-14 14:18:36,229:WARNING:superset.views.base:[SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app            | Traceback (most recent call last):
superset_app            |   File "/app/superset/views/base.py", line 207, in wraps
superset_app            |     return f(self, *args, **kwargs)
superset_app            |   File "/app/superset/utils/log.py", line 244, in wrapper
superset_app            |     value = f(*args, **kwargs)
superset_app            |   File "/app/superset/views/core.py", line 2412, in sql_json
superset_app            |     command_result: CommandResult = command.run()
superset_app            |   File "/app/superset/sqllab/command.py", line 114, in run
superset_app            |     raise ex
superset_app            |   File "/app/superset/sqllab/command.py", line 98, in run
superset_app            |     status = self._run_sql_json_exec_from_scratch()
superset_app            |   File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
superset_app            |     raise ex
superset_app            |   File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
superset_app            |     return self._sql_json_executor.execute(
superset_app            |   File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app            |     raise SupersetErrorsException(
superset_app            | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 22 (7 by maintainers)

Most upvoted comments

@Mageswaran1989 , @Mause

I tested with minio,

/usr/local/opt/minio/bin/minio server --config-dir=/usr/local/etc/minio --address=:9900 /usr/local/var/minio

and it works,

install 'httpfs';
load 'httpfs';
SET s3_endpoint='127.0.0.1:9900';
SET s3_access_key_id='minioadmin';
SET s3_secret_access_key='minioadmin';
SET s3_url_style = 'path';
SET s3_use_ssl=false;
select count(*) from 's3://ontime/*.parquet';
image

you need check Allow DML

image

attention, superset config does not support boolean, must be string

If it’s JSON, it probably just needs to be false instead of False