superset: Unable to read S3 parquet files using DuckDB as Database Connector
I was following the DuckDB setup as per this PR and was able to load the DB file and create charts.
As a next step I wanted to load S3 parquet files in to Superset using DuckDB in memory option duckdb:///:memory:
Before trying with Superset, I used below Python code to check the DuckDB S3 parquet loading and found it to be working:
- Without duckdb-engine
import duckdb
cursor = duckdb.connect()
cursor.execute("INSTALL httpfs;")
cursor.execute("LOAD httpfs;")
cursor.execute("SET s3_region='******'")
cursor.execute("SET s3_access_key_id=''**************")
cursor.execute("SET s3_secret_access_key='*****************************'")
cursor.execute("PRAGMA enable_profiling;")
cursor.execute("SELECT count(*) FROM read_parquet('s3://<bucket>/prefix/*.parquet'")
- With duckdb-engine
# pip install duckdb-engine
import sqlalchemy as sa
engine = sa.create_engine("duckdb:///:memory:")
with engine.connect() as conn:
conn.execute(
f"""
INSTALL httpfs;
LOAD httpfs;
SET s3_region='ap-south-1';
SET s3_access_key_id='****************';
SET s3_secret_access_key='*****************************';
"""
)
r = conn.execute(
"""
SELECT count(*) FROM read_parquet('s3://<bucket>/prefix/*.parquet');
"""
).fetchall()
for row in r:
print(dict(row))
When I tried to SET the S3 environment values in SQL Editor I was getting the below error on the UI:
Error:
DuckDB Error
duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"
Did you mean: "TimeZone"
This may be triggered by:
Issue 1002 - The database returned an unexpected error.
See less
Full Trace:
superset_cache | 1:M 14 Jul 2022 14:18:00.250 * Background saving terminated with success
superset_app | 127.0.0.1 - - [14/Jul/2022:14:18:24 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/7.74.0"
superset_app | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
superset_app | 172.20.0.1 - - [14/Jul/2022:14:18:34 +0000] "GET /api/v1/me/ HTTP/1.1" 200 150 "http://localhost:8088/superset/sqllab/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
superset_app | Triggering query_id: 60
superset_app | 2022-07-14 14:18:35,809:INFO:superset.sqllab.command:Triggering query_id: 60
superset_app | Query 60: Executing 1 statement(s)
superset_app | 2022-07-14 14:18:35,942:INFO:superset.sql_lab:Query 60: Executing 1 statement(s)
superset_app | Query 60: Set query to 'running'
superset_app | 2022-07-14 14:18:35,943:INFO:superset.sql_lab:Query 60: Set query to 'running'
superset_app | Query 60: Running statement 1 out of 1
superset_app | 2022-07-14 14:18:36,083:INFO:superset.sql_lab:Query 60: Running statement 1 out of 1
superset_app | Query 60: <class 'RuntimeError'>
superset_app | Traceback (most recent call last):
superset_app | File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
superset_app | db_engine_spec.execute(cursor, sql, async_=True)
superset_app | File "/app/superset/db_engine_specs/base.py", line 1261, in execute
superset_app | raise cls.get_dbapi_mapped_exception(ex)
superset_app | File "/app/superset/db_engine_specs/base.py", line 1259, in execute
superset_app | cursor.execute(query)
superset_app | File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
superset_app | raise e
superset_app | File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
superset_app | self.c.execute(statement)
superset_app | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
superset_app |
superset_app | Did you mean: "TimeZone"
superset_app | 2022-07-14 14:18:36,157:ERROR:superset.sql_lab:Query 60: <class 'RuntimeError'>
superset_app | Traceback (most recent call last):
superset_app | File "/app/superset/sql_lab.py", line 272, in execute_sql_statement
superset_app | db_engine_spec.execute(cursor, sql, async_=True)
superset_app | File "/app/superset/db_engine_specs/base.py", line 1261, in execute
superset_app | raise cls.get_dbapi_mapped_exception(ex)
superset_app | File "/app/superset/db_engine_specs/base.py", line 1259, in execute
superset_app | cursor.execute(query)
superset_app | File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 97, in execute
superset_app | raise e
superset_app | File "/usr/local/lib/python3.8/site-packages/duckdb_engine/__init__.py", line 85, in execute
superset_app | self.c.execute(statement)
superset_app | RuntimeError: Catalog Error: unrecognized configuration parameter "s3_region"
superset_app |
superset_app | Did you mean: "TimeZone"
superset_app | [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app | Traceback (most recent call last):
superset_app | File "/app/superset/views/base.py", line 207, in wraps
superset_app | return f(self, *args, **kwargs)
superset_app | File "/app/superset/utils/log.py", line 244, in wrapper
superset_app | value = f(*args, **kwargs)
superset_app | File "/app/superset/views/core.py", line 2412, in sql_json
superset_app | command_result: CommandResult = command.run()
superset_app | File "/app/superset/sqllab/command.py", line 114, in run
superset_app | raise ex
superset_app | File "/app/superset/sqllab/command.py", line 98, in run
superset_app | status = self._run_sql_json_exec_from_scratch()
superset_app | File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
superset_app | raise ex
superset_app | File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
superset_app | return self._sql_json_executor.execute(
superset_app | File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app | raise SupersetErrorsException(
superset_app | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app | 2022-07-14 14:18:36,229:WARNING:superset.views.base:[SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app | Traceback (most recent call last):
superset_app | File "/app/superset/views/base.py", line 207, in wraps
superset_app | return f(self, *args, **kwargs)
superset_app | File "/app/superset/utils/log.py", line 244, in wrapper
superset_app | value = f(*args, **kwargs)
superset_app | File "/app/superset/views/core.py", line 2412, in sql_json
superset_app | command_result: CommandResult = command.run()
superset_app | File "/app/superset/sqllab/command.py", line 114, in run
superset_app | raise ex
superset_app | File "/app/superset/sqllab/command.py", line 98, in run
superset_app | status = self._run_sql_json_exec_from_scratch()
superset_app | File "/app/superset/sqllab/command.py", line 151, in _run_sql_json_exec_from_scratch
superset_app | raise ex
superset_app | File "/app/superset/sqllab/command.py", line 146, in _run_sql_json_exec_from_scratch
superset_app | return self._sql_json_executor.execute(
superset_app | File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app | raise SupersetErrorsException(
superset_app | superset.exceptions.SupersetErrorsException: [SupersetError(message='duckdb error: Catalog Error: unrecognized configuration parameter "s3_region"\n\nDid you mean: "TimeZone"', error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'DuckDB', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 22 (7 by maintainers)
@Mageswaran1989 , @Mause
I tested with minio,
and it works,
you need check
Allow DMLIf it’s JSON, it probably just needs to be
falseinstead ofFalse