cudf: [BUG] cudf.DataFrame.merge should support implicit type conversions on join columns
When merging on columns with different dtypes, pandas implicitly converts them to a matching dtype and runs the merge:
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['id'] = [0, 1, 2]
>>> df['val'] = [9, 9, 9]
>>> df_2 = pd.DataFrame()
>>> df_2['id'] = [0, 1, 2]
>>> df.dtypes
id int64
val int64
dtype: object
>>> df_2.dtypes
id int64
dtype: object
>>> df_2['id'] = df_2['id'].astype('float64')
>>> df.merge(df_2, on=['id'])
id val
0 0 9
1 1 9
2 2 9
cudf does not perform this conversion, and the merge fails with a dtype mismatch error:
>>> import cudf
>>> df = cudf.from_pandas(df)
>>> df_2 = cudf.from_pandas(df_2)
>>> df.dtypes
id int64
val int64
dtype: object
>>> df_2.dtypes
id float64
dtype: object
>>> df.merge(df_2, on=['id']).to_pandas()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/conda/envs/rapids/lib/python3.7/site-packages/cudf-0.9.0a0+1094.gdcdf3596a.dirty-py3.7-linux-x86_64.egg/cudf/dataframe/dataframe.py", line 1934, in merge
lhs._cols, rhs._cols, left_on, right_on, how, method
File "cudf/bindings/join.pyx", line 26, in cudf.bindings.join.join
File "cudf/bindings/join.pyx", line 124, in cudf.bindings.join.join
File "cudf/bindings/cudf_cpp.pyx", line 487, in cudf.bindings.cudf_cpp.check_gdf_error
cudf.bindings.GDFError.GDFError: b'GDF_DTYPE_MISMATCH'
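Until the merge handles this implicitly, one possible user-side workaround is to cast the join columns to a common dtype before merging. The sketch below is only an illustration, not the fix that went into cudf: `align_join_dtypes` is a hypothetical helper name, and the common dtype is chosen with NumPy's promotion rules via `np.result_type`, which is an assumption about what "matching types" should mean here. It is written against pandas so it runs without a GPU; the same `astype` and `merge` calls should apply to cudf DataFrames with numeric key columns.

```python
import numpy as np
import pandas as pd


def align_join_dtypes(left, right, on):
    """Hypothetical helper: return copies of both frames with each
    join column cast to a common dtype chosen by NumPy promotion."""
    left, right = left.copy(), right.copy()
    for col in on:
        # np.result_type applies NumPy's promotion rules,
        # e.g. int64 and float64 promote to float64.
        common = np.result_type(left[col].dtype, right[col].dtype)
        left[col] = left[col].astype(common)
        right[col] = right[col].astype(common)
    return left, right


# Same data as in the report: int64 keys on the left, float64 on the right.
df = pd.DataFrame({"id": [0, 1, 2], "val": [9, 9, 9]})
df_2 = pd.DataFrame({"id": [0.0, 1.0, 2.0]})

lhs, rhs = align_join_dtypes(df, df_2, on=["id"])
merged = lhs.merge(rhs, on=["id"])
```

Note that promoting int64 keys to float64 can lose precision for values above 2**53, so this approach is only safe when the key values are known to be small enough.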
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (11 by maintainers)
@kkraus14 I will try it out.
Working on this 👍