duckdb: Missing values returned as float NaN in non-numeric columns

When I query a parquet file that has non-numeric columns (for example, a column of string lists), missing values in those columns are returned as float NaN instead of a null value appropriate to the column type (None or pd.NA in Python).

Example:

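import pandas as pd
import duckdb

# Write a list-of-strings column whose last row is missing.
pd.DataFrame({
    "col1": [["a"], ["a", "b"], ["a", "b", "c"], None]
}).to_parquet("df.parquet")

# The missing row comes back as float NaN rather than None / pd.NA.
duckdb.query("select * from read_parquet('df.parquet')").to_df()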


About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments