cudf: [BUG] Non-determinism in `cudf.MultiIndex.from_product`
Describe the bug
We are seeing failures of test_multiindex_from_product in a PR ( https://github.com/rapidsai/cudf/pull/4567 ) that appear unrelated to that change. This looks like non-determinism in the cudf.MultiIndex.from_product constructor. Here's an example failure that demonstrates it:
=================================== FAILURES ===================================
_________________________ test_multiindex_from_product _________________________
    def test_multiindex_from_product():
        arrays = [["a", "a", "b", "b"], ["house", "store", "house", "store"]]
        pmi = pd.MultiIndex.from_product(arrays, names=["alpha", "location"])
        gmi = cudf.MultiIndex.from_product(arrays, names=["alpha", "location"])
>       assert_eq(pmi, gmi)
E       AssertionError: MultiIndex level [0] are different
E
E       MultiIndex level [0] values are different (50.0 %)
E       [left]:  Index(['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b',
E              'b', 'b'],
E             dtype='object', name='alpha')
E       [right]: Index(['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b',
E              'a', 'b'],
E             dtype='object', name='alpha')
cudf/tests/test_multiindex.py:384: AssertionError
Steps/Code to reproduce bug
Running test_multiindex_from_product a few times appears to be sufficient to trigger the failure.
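A rough way to quantify the flakiness outside of pytest is to build the index repeatedly and tally the distinct results. The sketch below is only an illustration: `count_distinct_results` and `build_with_cudf` are hypothetical helper names, not part of cudf or its test suite, and `build_with_cudf` assumes a machine with a GPU and cudf installed, so it is defined here but not called.

```python
from collections import Counter


def count_distinct_results(build, runs=20):
    """Call build() `runs` times and tally the distinct results by repr."""
    return Counter(repr(build()) for _ in range(runs))


def build_with_cudf():
    """Build the MultiIndex from the failing test (assumes cudf + GPU)."""
    import cudf  # imported lazily so the helper above works without cudf

    arrays = [["a", "a", "b", "b"], ["house", "store", "house", "store"]]
    gmi = cudf.MultiIndex.from_product(arrays, names=["alpha", "location"])
    # Convert to host-side tuples so results can be compared by repr.
    return gmi.to_pandas().tolist()
```

On an affected build, something like `count_distinct_results(build_with_cudf)` would presumably report more than one distinct result; a deterministic constructor should yield exactly one key.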
Expected behavior
cudf.MultiIndex.from_product should behave deterministically.
Environment overview (please complete the following information)
- Environment location: gpuCI
Environment details
Other details should be in the log. If not, maybe we can ask OPS for more details.
Additional context
NA
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (11 by maintainers)
This ended up being a gather; we had totally overengineered and overthought the problem. Thanks @jrhemstad and @harrism!
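For readers unfamiliar with the idea, here is a minimal pure-Python sketch of how a Cartesian product can be expressed as one gather per level. It is not the actual cudf fix, which is not shown in this thread; `product_gather_maps` and `from_product_levels` are hypothetical names chosen for illustration. Each level gets a deterministic gather map computed from the row number alone, so the output order cannot depend on execution order.

```python
from itertools import product


def product_gather_maps(lengths):
    """Return, for each level, the source position to gather for every row."""
    total = 1
    for n in lengths:
        total *= n
    if total == 0:
        # Any empty level makes the whole product empty.
        return [[] for _ in lengths]
    maps = []
    stride = total
    for n in lengths:
        stride //= n  # number of consecutive rows sharing one value of this level
        maps.append([(row // stride) % n for row in range(total)])
    return maps


def from_product_levels(arrays):
    """Gather each input array through its map to get the per-level columns."""
    maps = product_gather_maps([len(a) for a in arrays])
    return [[values[i] for i in gmap] for values, gmap in zip(arrays, maps)]


# Same input as the failing test; duplicates in the inputs are kept as-is.
arrays = [["a", "a", "b", "b"], ["house", "store", "house", "store"]]
alpha, location = from_product_levels(arrays)

# The gathered columns line up with itertools.product, row for row.
assert list(zip(alpha, location)) == list(product(*arrays))
```

With this input, `alpha` comes out as eight `'a'` values followed by eight `'b'` values, which is the `[left]` (pandas) ordering in the failure above.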
All details should be in the linked log in the OP. From what I can see, it says ubuntu16.04.