chispa: The it_does_not_throw_with_different_schema test exposes a bug

This test shouldn’t be passing:

    def it_does_not_throw_with_different_schema():
        data1 = [(1.0, "jose"), (1.1, "li"), (1.2, "laura"), (None, None)]
        df1 = spark.createDataFrame(data1, ["num", "expected_name"])
        data2 = [("li", 1.05), ("laura", 1.2), (None, None), ("jose", 1.0)]
        df2 = spark.createDataFrame(data2, ["another_name", "same_num"])
        assert_approx_df_equality(df1, df2, 0.1, ignore_schema=True)

ignore_row_order=True isn’t set, so the rows are compared in order, and the rows of df1 and df2 are in a different order. The assertion should fail.

The cause is that d1.keys() & d2.keys() returns an empty set when the column names are different. With no shared keys, the conditions are never checked and the row comparison returns True.
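To illustrate the mechanism, here is a minimal sketch that uses plain dicts in place of Spark rows. The function name `rows_approx_equal_sketch` and its body are assumptions made for illustration, not chispa's actual code. It only demonstrates how iterating over the intersection of the keys becomes a no-op when no column names match.

```python
def rows_approx_equal_sketch(d1: dict, d2: dict, precision: float) -> bool:
    """Hypothetical stand-in (not chispa's code): compare values for shared keys only."""
    all_equal = True
    # When the dicts have no key in common, this loop body never executes.
    for key in d1.keys() & d2.keys():
        v1, v2 = d1[key], d2[key]
        if isinstance(v1, float) and isinstance(v2, float):
            if abs(v1 - v2) > precision:
                all_equal = False
        elif v1 != v2:
            all_equal = False
    return all_equal


# First rows of df1 and df2 from the test, written as dicts.
row1 = {"num": 1.0, "expected_name": "jose"}
row2 = {"another_name": "li", "same_num": 1.05}

# No column name is shared, so the intersection is empty.
shared_keys = row1.keys() & row2.keys()

# Nothing is compared, and the function falls through to True.
vacuous_result = rows_approx_equal_sketch(row1, row2, 0.1)

# With matching column names, the mismatch is detected.
same_name_result = rows_approx_equal_sketch(
    row1, {"num": 5.0, "expected_name": "li"}, 0.1
)
```

One possible fix, under the same assumptions, would be to compare values by position rather than by name when ignore_schema=True, or to fail when the key intersection is empty.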

About this issue

State: closed
Created a year ago
Comments: 17 (12 by maintainers)

Most upvoted comments

I can work on it! 😃

robertkossendey on Feb 21, 2023