duckdb: [Java] IO Error: Connection error for HTTP HEAD

What happens?

COPY (
    SELECT * FROM read_json_auto(${s3_src_file}, format='newline_delimited')
) TO ${s3_dst_file} (FORMAT PARQUET);

The statement runs fine directly in DuckDB and in a demo that uses the same API as the failing program. In our own program, we use the AWS Java API to upload files, and DuckDB is used to perform a row-to-column conversion. There it reports the following exception.

Caused by: java.sql.SQLException: Invalid Error: IO Error: Connection error for HTTP HEAD to '<src>.json'
	at org.duckdb.DuckDBNative.duckdb_jdbc_prepare(Native Method)
	at org.duckdb.DuckDBPreparedStatement.prepare(DuckDBPreparedStatement.java:106)

To Reproduce

I’m sorry, it’s quite hard to reproduce. The same command runs well both directly in DuckDB and in our unit tests (UT). The only similar issues I found are #9232 and #9647. A reasonable guess is that our S3 client occupies some connection resources and DuckDB reaches its limit.
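If the failure really is transient (for example, caused by temporary connection pressure as guessed above), one possible workaround is to retry the statement at the application level. The following is only a minimal sketch in Python, not a fix for the underlying problem: `run_with_retry`, its parameters, and the substring used to detect a transient error are all hypothetical and are not part of any DuckDB API.

```python
import time


def run_with_retry(action, attempts=3, base_delay=0.5, is_transient=None, sleep=time.sleep):
    """Call action() and retry it when it raises a transient-looking error.

    Hypothetical helper: nothing here is DuckDB API. `action` is any
    zero-argument callable, e.g. one that executes the COPY statement.
    """
    if is_transient is None:
        # Assumption: transient failures contain the message text seen in this issue.
        def is_transient(exc):
            return "Connection error for HTTP" in str(exc)

    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            # Give up on the last attempt or on errors that do not look transient.
            if attempt == attempts or not is_transient(exc):
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With the Python client this could be used as `run_with_retry(lambda: con.execute(copy_sql))`, where `con` is an open connection and `copy_sql` holds the COPY statement. The same pattern would carry over to the JDBC client by catching `java.sql.SQLException` around the statement.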

OS:

macOS, Linux

DuckDB Version:

0.9.2

DuckDB Client:

java

Full Name:

yanhui chen

Affiliation:

ApeCloud

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have

About this issue

  • State: open
  • Created 6 months ago
  • Reactions: 2
  • Comments: 15 (5 by maintainers)

Most upvoted comments

@samansmink http_keep_alive isn’t available in the latest dev version either (in python).

I am facing the same issue when using Python:

import duckdb

con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")
con.execute("SELECT * FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';")

Failed after about 300 seconds of execution

IO Error: SSLConnection error for HTTP HEAD to 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv'

Python 3.10
duckdb                    0.8.1
duckdb-engine             0.9.2
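To narrow down whether the HEAD request itself fails from this environment, independent of DuckDB, a plain HEAD request can be sent with the standard library. This is only a diagnostic sketch: `build_head_request` and `head_status` are hypothetical helper names, and a successful result here does not rule out a problem inside the httpfs extension.

```python
import urllib.error
import urllib.request


def build_head_request(url):
    """Build a request object that uses the HEAD method instead of GET."""
    return urllib.request.Request(url, method="HEAD")


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request, or a failure message."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 404.
        return exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No usable response: DNS failure, refused connection, TLS error, timeout.
        return f"failed: {exc}"
```

Calling `head_status(...)` with the same URL that DuckDB reports in the error shows whether the endpoint is reachable at all from the affected machine.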