koalas: ValueError when reading dict with None
I find that reading a dict

```python
row = {'a': [1], 'b': [None]}
ks.DataFrame(row)
```

raises

```
ValueError: can not infer schema from empty or null dataset
```

but for pandas there is no error:

```python
row = {'a': [1], 'b': [None]}
print(pd.DataFrame(row))
```

```
   a     b
0  1  None
```
I have tried setting dtype=np.int64 but this has not helped.
About this issue
- State: closed
- Created 5 years ago
- Reactions: 2
- Comments: 19 (7 by maintainers)
@ederfdias Here is a possible workaround. Specify converters like below:
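The original snippet isn't preserved here; a minimal sketch of the idea, assuming a data.xlsx file whose column b has no values:

```python
import databricks.koalas as ks

# Hypothetical file and column names. Converting the all-empty column to str
# gives every cell a concrete type, so PySpark no longer has to infer a
# schema from nulls alone.
kdf = ks.read_excel("data.xlsx", converters={"b": str})
```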
JFYI: using the read_csv() function with a column without values, I don't get any errors, but with read_excel() the same error is raised.
Now it works properly on pandas-on-Spark (it's available in Apache Spark 3.2 and above).
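For example, a quick check (a sketch, assuming pyspark 3.2+ is installed):

```python
import pyspark.pandas as ps

# The same dict that fails in Koalas; expected to mirror pandas here.
row = {'a': [1], 'b': [None]}
print(ps.DataFrame(row))
#    a     b
# 0  1  None
```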
I'd recommend using pandas-on-Spark rather than Koalas, since Koalas is now in maintenance mode.
Apparently, np.NaN does the trick.
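A minimal sketch of that workaround (np.NaN is a float, so the column is no longer "all nulls of unknown type"):

```python
import numpy as np
import databricks.koalas as ks

# Using np.NaN instead of None gives PySpark a float value to look at,
# so it can infer a double type for the column instead of failing.
row = {'a': [1], 'b': [np.NaN]}
print(ks.DataFrame(row))
#    a    b
# 0  1  NaN
```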
It’s because PySpark, by default, tries to infer the type from the given data. If there’s no data or only nulls in the column, PySpark cannot infer its data type for a DataFrame.
pandas has an object type that can contain everything, whereas PySpark does not have such a type. So it's actually an issue in PySpark.
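The same failure can be reproduced, and avoided, directly in PySpark; a sketch, assuming a local SparkSession:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Fails: every value in column 'b' is null, so its type cannot be inferred.
# spark.createDataFrame([(1, None)], ["a", "b"])

# Supplying an explicit schema sidesteps inference entirely.
df = spark.createDataFrame([(1, None)], schema="a long, b string")
df.show()
# +---+----+
# |  a|   b|
# +---+----+
# |  1|null|
# +---+----+
```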