cudf: [BUG] cudf.DataFrame.merge should support implicit type conversions on join columns

When merging on columns with different dtypes, pandas implicitly converts them to a matching type and runs the merge:

>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['id'] = [0, 1, 2]
>>> df['val'] = [9, 9, 9]

>>> df_2 = pd.DataFrame()
>>> df_2['id'] = [0, 1, 2]

>>> df.dtypes
id     int64
val    int64
dtype: object

>>> df_2.dtypes
id    int64
dtype: object
>>> df_2['id'] = df_2['id'].astype('float64')

>>> df.merge(df_2, on=['id'])
   id  val
0   0    9
1   1    9
2   2    9

cuDF does not do this, and the same merge fails with a dtype mismatch error:

>>> import cudf
>>> df = cudf.from_pandas(df)
>>> df_2 = cudf.from_pandas(df_2)
>>> df.dtypes
id     int64
val    int64
dtype: object
>>> df_2.dtypes
id    float64
dtype: object
>>> df.merge(df_2, on=['id']).to_pandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.7/site-packages/cudf-0.9.0a0+1094.gdcdf3596a.dirty-py3.7-linux-x86_64.egg/cudf/dataframe/dataframe.py", line 1934, in merge
    lhs._cols, rhs._cols, left_on, right_on, how, method
  File "cudf/bindings/join.pyx", line 26, in cudf.bindings.join.join
  File "cudf/bindings/join.pyx", line 124, in cudf.bindings.join.join
  File "cudf/bindings/cudf_cpp.pyx", line 487, in cudf.bindings.cudf_cpp.check_gdf_error
cudf.bindings.GDFError.GDFError: b'GDF_DTYPE_MISMATCH'
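
Until the merge handles this itself, one possible workaround is to cast the join columns to a common dtype before merging. The sketch below is a minimal illustration, not the fix adopted in cuDF: `merge_with_common_dtype` is a hypothetical helper name, and it is written against pandas so it runs without a GPU. It assumes the same `astype` and `merge` calls behave equivalently on cuDF DataFrames, and it uses NumPy's promotion rules to pick the common dtype, which may differ from whatever rules cuDF ends up applying.

```python
import numpy as np
import pandas as pd


def merge_with_common_dtype(left, right, on, **merge_kwargs):
    """Hypothetical helper: cast each join column to a shared dtype, then merge."""
    # Work on copies so the callers' frames are not modified.
    left, right = left.copy(), right.copy()
    for col in on:
        # np.result_type applies NumPy promotion rules (int64 + float64 -> float64).
        common = np.result_type(left[col].dtype, right[col].dtype)
        left[col] = left[col].astype(common)
        right[col] = right[col].astype(common)
    return left.merge(right, on=on, **merge_kwargs)


df = pd.DataFrame({"id": [0, 1, 2], "val": [9, 9, 9]})
df_2 = pd.DataFrame({"id": [0.0, 1.0, 2.0]})
merged = merge_with_common_dtype(df, df_2, on=["id"])
```

Note that promoting int64 keys to float64 can lose precision for integers larger than 2**53, so an explicit cast is only safe when the key values are known to fit.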

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@kkraus14 I will try it out.

working on this 👍