duckdb: Missing values returned as float NaN in non-numeric columns
When I query a Parquet file with non-numeric columns, for example a column holding lists of strings, missing values in those columns come back as float NaN instead of the corresponding nullable value (None or pd.NA in Python).
Example:
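import duckdb
import pandas as pd

# Write a list-of-strings column whose last row is missing.
pd.DataFrame({
"col1" : [["a"], ["a","b"], ["a","b","c"], None]
}).to_parquet("df.parquet")
# The missing row comes back as float NaN instead of None or pd.NA.
duckdb.query("select * from read_parquet('df.parquet')").to_df()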
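
Until the conversion itself is fixed, one possible workaround is to post-process the returned values and turn the float NaN placeholders back into None. The following is only a minimal sketch and not part of duckdb or pandas: the helper name `nan_to_none` is hypothetical, and the sample list stands in for the column contents described in this report.

```python
import math


def nan_to_none(values):
    """Return a list in which float NaN placeholders are replaced by None."""
    cleaned = []
    for value in values:
        # Only a scalar float NaN counts as missing; lists and strings pass through.
        if isinstance(value, float) and math.isnan(value):
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned


# Hypothetical column contents matching the report: the missing row is a float NaN.
column = [["a"], ["a", "b"], ["a", "b", "c"], float("nan")]
print(nan_to_none(column))
```

On a DataFrame, this could be applied per object column, for example `df["col1"] = pd.Series(nan_to_none(df["col1"]), index=df.index, dtype=object)`. Passing `dtype=object` is meant to keep pandas from coercing the None values again.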

About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (8 by maintainers)
Commits related to this issue
- Reporting: Unbound DNS - duckdb version upgrade handling o make sure DbConnection() throws a new StorageVersionException when storage versions mismatch o add restore_database() function to overwrite ... — committed to opnsense/core by AdSchellevis a year ago
I filed an issue on pandas: https://github.com/pandas-dev/pandas/issues/51872